1
00:00:11,660 --> 00:00:19,190
In this lecture we are going to discuss how to improve our model using a process known as data augmentation.

2
00:00:19,190 --> 00:00:21,350
Consider this picture of a cat.

3
00:00:21,470 --> 00:00:26,940
You and I can agree that it's a cat. Now consider this other picture of a cat.

4
00:00:26,960 --> 00:00:29,240
It's pretty obvious that this is the same cat.

5
00:00:30,320 --> 00:00:35,460
Unfortunately, while this is easy for us, it's not so easy for a computer.

6
00:00:35,660 --> 00:00:40,820
Now, the neural network may be able to identify this because of convolution, but perhaps there's a way

7
00:00:40,820 --> 00:00:46,400
to help the neural network learn this translational invariance using data augmentation.

8
00:00:51,590 --> 00:00:53,780
As you may have heard, one of the reasons

9
00:00:53,780 --> 00:00:59,840
deep learning has been so successful these days is that we have so much more data. Deep learning continues

10
00:00:59,840 --> 00:01:02,750
to improve as it trains on more and more data,

11
00:01:02,930 --> 00:01:06,340
while other machine learning algorithms seem to hit a plateau.

12
00:01:06,890 --> 00:01:10,810
One disadvantage of tabular data is that you just have what you have.

13
00:01:11,420 --> 00:01:16,670
If we're doing classification to see whether you'll pass my math exam, and I base that on how many hours you

14
00:01:16,670 --> 00:01:22,670
study and how many hours you play video games, that can only depend on the data I collected through my

15
00:01:22,670 --> 00:01:23,200
survey.

16
00:01:23,780 --> 00:01:31,450
I can't just make up new data, which would necessarily include my own biases about student performance.

17
00:01:31,560 --> 00:01:36,780
I can't invent a new student who studied this much and played video games this much, and then decide whether

18
00:01:36,780 --> 00:01:37,780
he passed or failed.

19
00:01:37,800 --> 00:01:39,120
That wouldn't make any sense.

20
00:01:44,240 --> 00:01:44,950
However,

21
00:01:44,990 --> 00:01:48,910
images are an important class of data: we can see them.

22
00:01:48,980 --> 00:01:51,290
We know that both these images are cats.

23
00:01:51,290 --> 00:01:54,450
I know that if I rotate this cat a little bit, it's still a cat.

24
00:01:55,010 --> 00:01:59,290
I know that if I flip a car upside down or left to right, it's still a car.

25
00:01:59,810 --> 00:02:03,910
Although you should probably be worried if you encounter an upside-down car.

26
00:02:04,040 --> 00:02:08,890
In other words, what I'm trying to say is that with images, it is OK to invent new data.
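
To make "inventing new data" concrete, here is a minimal NumPy sketch (not code from the lecture) that turns one image into two new training examples by flipping and shifting it, copying the label unchanged. The tiny 4x4 array, the "cat" label, and the helper names flip_left_right and shift_right are illustrative assumptions.

```python
import numpy as np

# Stand-in "image": a 4x4 grayscale array (a real photo would be H x W x 3).
image = np.arange(16, dtype=np.uint8).reshape(4, 4)
label = "cat"  # hypothetical label; these augmentations never change it

def flip_left_right(img):
    # Reverse the column order: the same scene seen in a mirror.
    return img[:, ::-1]

def shift_right(img, pixels):
    # Move the content `pixels` columns to the right and fill the vacated
    # columns with 0 (np.roll would wrap pixels around the edge instead).
    out = np.zeros_like(img)
    out[:, pixels:] = img[:, :img.shape[1] - pixels]
    return out

# Two "invented" training examples, both still labelled "cat".
new_examples = [(flip_left_right(image), label), (shift_right(image, 1), label)]
for new_image, new_label in new_examples:
    print(new_label, new_image[0])  # first row of each new image
```

Zero-filling the shifted-in columns is one choice among several; libraries typically also offer reflection or edge padding.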

27
00:02:14,070 --> 00:02:16,520
Now, here's one problem with this approach.

28
00:02:16,590 --> 00:02:24,530
Data takes up space, so the more new data I invent, the more space it takes up. On top of that, there

29
00:02:24,530 --> 00:02:27,910
is an endless number of ways I can invent new data.

30
00:02:27,980 --> 00:02:33,770
I can shift the image to the left one pixel, two pixels, three pixels, four pixels, five pixels, and so forth,

31
00:02:34,200 --> 00:02:36,410
and I can also go in the other direction.

32
00:02:36,410 --> 00:02:40,670
I can also rotate one degree, two degrees, three degrees, and so on.

33
00:02:40,670 --> 00:02:44,030
Do I really want to store all these combinations on my hard drive?

34
00:02:44,570 --> 00:02:46,220
Probably not.

35
00:02:46,220 --> 00:02:50,090
Also, Google Colab probably wouldn't have space for that anyway.
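
A back-of-the-envelope calculation shows how fast "all these combinations" add up. The ranges below (shifts of -5 to +5 pixels in each direction, rotations of -10 to +10 degrees in 1-degree steps) are illustrative assumptions; the dataset size and image size are those of the CIFAR-10 training set (50,000 images of 32x32 RGB, one byte per channel).

```python
# Storage needed to precompute every augmented copy.
# The augmentation ranges are illustrative assumptions, not numbers from the lecture.
num_images = 50_000            # CIFAR-10 training set size
bytes_per_image = 32 * 32 * 3  # 32x32 RGB, one byte per channel

shifts_x = 11    # shift -5..+5 pixels horizontally
shifts_y = 11    # shift -5..+5 pixels vertically
rotations = 21   # rotate -10..+10 degrees in 1-degree steps

variants_per_image = shifts_x * shifts_y * rotations
total_bytes = num_images * bytes_per_image * variants_per_image

print(variants_per_image)         # 2541
print(total_bytes / 1e9, "GB")    # roughly 390 GB
```

Adding flips or color changes multiplies this further, which is why precomputing everything is impractical.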

36
00:02:50,090 --> 00:02:51,100
So what can you do?

37
00:02:56,280 --> 00:03:00,030
Luckily, none of this is work you have to do yourself.

38
00:03:00,060 --> 00:03:02,890
Have you ever considered how to rotate an image?

39
00:03:02,910 --> 00:03:05,520
Well, it's not as straightforward as it sounds.

40
00:03:05,520 --> 00:03:10,050
Instead, we can use the torchvision API to do all this work for us.
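
For a sense of what torchvision spares us, here is a deliberately naive NumPy sketch of rotating a grayscale image about its center with nearest-neighbor sampling. It is an illustration only, not how torchvision implements rotation; the function name rotate_nearest and the sign convention are assumptions. Even this simple version has to use inverse mapping, round coordinates, and decide what to put in the corners that fall outside the source.

```python
import numpy as np

def rotate_nearest(img, degrees):
    """Rotate a 2-D array about its center: nearest neighbor, zero-filled corners."""
    h, w = img.shape
    t = np.deg2rad(degrees)
    cos_t, sin_t = np.cos(t), np.sin(t)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = np.zeros_like(img)
    # Inverse mapping: for every OUTPUT pixel, ask which INPUT pixel lands there.
    # Mapping input -> output instead would leave holes after rounding.
    for y in range(h):
        for x in range(w):
            dy, dx = y - cy, x - cx
            src_x = cos_t * dx + sin_t * dy + cx
            src_y = -sin_t * dx + cos_t * dy + cy
            sx, sy = int(round(src_x)), int(round(src_y))
            if 0 <= sy < h and 0 <= sx < w:   # outside the source: leave as 0
                out[y, x] = img[sy, sx]
    return out

img = np.arange(1, 17).reshape(4, 4)
print(rotate_nearest(img, 90))   # with this sign convention, equals np.rot90(img, k=-1)
print(rotate_nearest(img, 30))   # the four corners fall outside the source and become 0
```

Real libraries add smoother interpolation (e.g. bilinear), configurable fill values, and vectorized or compiled loops.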

41
00:03:11,670 --> 00:03:18,030
To understand how it's going to work, we have to discuss the concept of generators and iterators. For

42
00:03:18,030 --> 00:03:19,560
the purpose of this lecture,

43
00:03:19,560 --> 00:03:21,540
we won't really differentiate between the two.

44
00:03:26,650 --> 00:03:32,620
Consider what you do when you want to write a for loop using indices from 0 up to, let's say, 10.

45
00:03:32,770 --> 00:03:33,310
We would write

46
00:03:33,310 --> 00:03:35,830
for i in range(10).

47
00:03:35,860 --> 00:03:42,100
Now, in Python 2, you would not want to use range(10), because that would return an actual list of numbers

48
00:03:42,160 --> 00:03:43,450
from 0 up to 9.

49
00:03:44,180 --> 00:03:49,660
Instead, you would use xrange, which does not create a list but rather generates those numbers on the

50
00:03:49,660 --> 00:03:50,930
fly.

51
00:03:50,950 --> 00:03:55,970
In Python 3, this is the default behavior for the range function.

52
00:03:56,020 --> 00:04:00,460
If you print out range(10), you will not see a list of numbers from 0 up to 9,

53
00:04:00,460 --> 00:04:06,520
as you might expect. Conceptually that's what it is, but physically that's not what is stored.

54
00:04:06,520 --> 00:04:10,910
Instead, it just prints range(0, 10), which doesn't tell us much.

55
00:04:10,970 --> 00:04:12,790
So what is a range?

56
00:04:12,790 --> 00:04:17,340
Well, it appears to be some kind of object on which you can call instance methods and so forth.
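
A quick way to see this in Python 3 is sketched below. Strictly speaking, a Python 3 range is a lazy sequence rather than a generator, but, as noted above, this lecture does not need that distinction.

```python
import sys

r = range(10)
print(r)                      # range(0, 10): the numbers are not materialized
print(list(r))                # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]: now they are
print(len(r), r[3], 7 in r)   # 10 3 True: methods work without building a list

# The object stays the same size no matter how long the range is.
print(sys.getsizeof(range(10)) == sys.getsizeof(range(10**9)))  # True
```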

57
00:04:22,570 --> 00:04:26,450
What you might not have known is that you can actually write your own generators.

58
00:04:26,530 --> 00:04:32,620
For example, suppose you wanted to write a loop where the loop variable was a randomly generated number.

59
00:04:32,830 --> 00:04:40,570
So "for x in my_random_generator()" would be a loop where, on each iteration of the loop, x would be a random

60
00:04:40,570 --> 00:04:41,650
number.

61
00:04:41,680 --> 00:04:42,870
Can we do this in Python?

62
00:04:47,980 --> 00:04:52,670
The answer is yes: we can accomplish exactly this using the yield keyword.

63
00:04:53,320 --> 00:04:58,900
So here is how you would write a my_random_generator function that generates 10 random numbers on the

64
00:04:58,900 --> 00:05:07,740
fly. As you can see, the entire list of values does not have to be computed beforehand,

65
00:05:08,410 --> 00:05:13,510
and, more importantly, the entire list of values does not have to be stored in memory.

66
00:05:13,540 --> 00:05:18,460
Instead, this takes up a constant amount of memory, because as soon as the value for each iteration

67
00:05:18,460 --> 00:05:21,280
is computed and used, it is no longer needed.
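
The slide's exact code is not in the transcript, so the following is one plausible way to write my_random_generator, assuming random.random() as the source of random numbers and a default count of 10.

```python
import random

def my_random_generator(n=10):
    # Each `yield` hands one value to the caller and pauses here until the
    # next iteration; nothing is precomputed and no list is ever stored.
    for _ in range(n):
        yield random.random()

for x in my_random_generator():
    print(x)   # a different random float in [0, 1) on each iteration
```

Calling my_random_generator() runs none of the body yet; it returns a generator object that produces values only as the for loop asks for them.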

68
00:05:26,120 --> 00:05:32,780
So in PyTorch, when we want to do data augmentation, it's a similar concept: instead of precalculating

69
00:05:32,780 --> 00:05:37,800
all of our augmented images beforehand, we are going to create them on the fly.

70
00:05:37,910 --> 00:05:44,690
You can imagine that it looks something like this: for each batch of data, we augment x_batch using some

71
00:05:44,690 --> 00:05:49,880
randomly generated values, then we yield x_batch and y_batch.

72
00:05:49,910 --> 00:05:53,920
In other words, only a small slice of X_train is ever augmented

73
00:05:53,920 --> 00:05:56,000
on each iteration.

74
00:05:56,000 --> 00:06:01,700
More importantly, it's done randomly, so you get something different each time, which helps with generalization.

75
00:06:03,250 --> 00:06:08,350
What you should notice is that this looks curiously like our data loader, which we have already used

76
00:06:08,350 --> 00:06:09,850
for generating batches

77
00:06:09,850 --> 00:06:11,440
when we do batch gradient descent.
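
The conceptual loop described above might look like this in plain NumPy. This is a sketch under stated assumptions: grayscale images stored as an (N, H, W) array, a random left-right flip as the only augmentation, and a hypothetical function name, augmented_batches.

```python
import numpy as np

def augmented_batches(X_train, Y_train, batch_size, rng):
    """Yield (x_batch, y_batch) pairs, augmenting each batch only when it is requested."""
    for start in range(0, len(X_train), batch_size):
        # Copy the slice so the original X_train is never modified.
        x_batch = X_train[start:start + batch_size].copy()
        y_batch = Y_train[start:start + batch_size]
        # Randomly pick roughly half the images and flip them left to right.
        flip = rng.random(len(x_batch)) < 0.5
        x_batch[flip] = x_batch[flip][:, :, ::-1]
        yield x_batch, y_batch

# Usage: 100 fake 8x8 grayscale images; only one batch is augmented at a time.
X_train = np.random.default_rng(1).random((100, 8, 8))
Y_train = np.arange(100)
rng = np.random.default_rng(0)
for x_batch, y_batch in augmented_batches(X_train, Y_train, batch_size=32, rng=rng):
    print(x_batch.shape, y_batch.shape)
```

Because the flips are redrawn on every pass, a second epoch over the same data sees a different set of flipped images.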

78
00:06:16,650 --> 00:06:19,220
So how does this actually work in PyTorch?

79
00:06:19,410 --> 00:06:22,410
What we've learned so far in this lecture is just conceptual.

80
00:06:22,440 --> 00:06:28,800
Practically speaking, the answer lies in functionality we have already been making use of. Specifically,

81
00:06:28,860 --> 00:06:34,410
you know that when we load in the CIFAR dataset, it's just a single call, and that call accepts as input

82
00:06:34,440 --> 00:06:36,810
an argument called transform.

83
00:06:36,810 --> 00:06:38,410
At this point, you should ask yourself:

84
00:06:38,490 --> 00:06:40,200
I wonder what this does.

85
00:06:40,410 --> 00:06:45,770
In addition, as you may recall, we learned in the last section that the data loader object yields your data

86
00:06:45,770 --> 00:06:50,160
set in batches as you loop over it, applying the transform function on the fly.

87
00:06:55,430 --> 00:06:56,960
So, to be explicit,

88
00:06:56,960 --> 00:07:02,480
all we have to do is modify the transform argument and instruct torchvision on how to augment the

89
00:07:02,480 --> 00:07:09,510
data. Specifically, torchvision comes with a whole collection of different transformations you can apply,

90
00:07:09,900 --> 00:07:15,690
such as slightly changing the color, cropping, flipping both horizontally and vertically, rotating, and

91
00:07:15,690 --> 00:07:17,110
so forth.

92
00:07:17,130 --> 00:07:21,990
I like the idea of changing color, because I don't think people mention it that often, yet it seems

93
00:07:21,990 --> 00:07:23,100
very important.

94
00:07:23,370 --> 00:07:28,220
You would want your model to perform well under different lighting conditions and different colors.

95
00:07:28,320 --> 00:07:33,450
I would recommend checking out the documentation and experimenting with each of them in different combinations

96
00:07:33,750 --> 00:07:37,890
to get a feel for what has the most impact on your particular dataset.

97
00:07:37,920 --> 00:07:42,870
As I mentioned early on in this course, PyTorch gives us a lot of building blocks from which we can

98
00:07:42,870 --> 00:07:49,190
compose more complex things. Just as we can compose complex neural networks from simple layers,

99
00:07:49,230 --> 00:07:54,760
we can also compose complex transformations made up from these basic ones.

100
00:07:54,900 --> 00:08:00,510
All we have to do is wrap them in a torchvision.transforms.Compose object and pass this into

101
00:08:00,510 --> 00:08:02,960
the transform argument when we load in the data.

102
00:08:07,900 --> 00:08:13,420
Now, because all of the data augmentation transformations are specified in the dataset function itself,

103
00:08:13,480 --> 00:08:19,390
everything that happens after that does not need to change. Specifically, consider what the

104
00:08:19,390 --> 00:08:22,740
data loader would look like: exactly the same as before.

105
00:08:23,020 --> 00:08:29,270
All the instructions for transforming the data belong to the dataset object itself, not the data loader.

106
00:08:29,380 --> 00:08:35,760
If you recall, the data loader just asks the dataset for samples, and the dataset applies the transform to each one.

107
00:08:35,770 --> 00:08:40,520
Similarly, we should expect that the training loop also does not change.

108
00:08:40,570 --> 00:08:43,770
We still have a nested for loop to do batch gradient descent.

109
00:08:44,170 --> 00:08:49,930
The first loop is still for i in range(epochs), and the second loop is still for inputs, targets in the

110
00:08:49,930 --> 00:08:52,220
data loader.

111
00:08:52,400 --> 00:08:57,770
The only thing to be careful about is that, while you do want to augment the training set, you probably

112
00:08:57,770 --> 00:09:02,840
don't want to augment the test set. Otherwise, you'd get a different answer each time you evaluate it, which

113
00:09:02,840 --> 00:09:03,770
is undesirable.

114
00:09:08,940 --> 00:09:09,690
After all this,

115
00:09:09,750 --> 00:09:11,980
you might still have one remaining question.

116
00:09:12,300 --> 00:09:17,790
So far, we've loaded all our image data from pre-built functions in torchvision, but what if we have

117
00:09:17,790 --> 00:09:19,740
our own image dataset?

118
00:09:19,740 --> 00:09:25,920
Sure, I can load images into NumPy arrays using a library like Pillow, but once I've done that, how can

119
00:09:25,920 --> 00:09:29,490
I apply these transformations that torchvision provides?

120
00:09:29,520 --> 00:09:34,740
It seems like, in order to use them, I must have a torchvision dataset object.

121
00:09:34,740 --> 00:09:39,560
Luckily, we will cover this later in the course, but for now you'll just have to wait and see.