1
00:00:11,680 --> 00:00:18,010
In this lecture, we're going to look at a Colab notebook that does transfer learning with data augmentation.

2
00:00:18,010 --> 00:00:23,020
This lecture is going to walk you through a prepared Colab notebook, although a very good exercise

3
00:00:23,020 --> 00:00:28,270
which I always recommend, once you know how this is done, is to try to recreate it yourself with as few

4
00:00:28,270 --> 00:00:30,080
references as possible.

5
00:00:30,310 --> 00:00:35,170
As usual, you can look at the title of the notebook to determine which notebook we're currently looking

6
00:00:35,170 --> 00:00:36,130
at.

7
00:00:36,130 --> 00:00:39,910
It's called PyTorch transfer learning with data augmentation.

8
00:00:44,630 --> 00:00:47,490
So at the top, we have all our usual imports.

9
00:00:47,600 --> 00:00:54,030
We have one extra special thing, which comes from the torchvision library, and that's the models module.

10
00:00:54,050 --> 00:00:58,050
This is, of course, where pretrained image networks are accessible from.

11
00:00:58,160 --> 00:01:02,420
We have some other minor new imports as well, but you'll see how those are used as we go along.

12
00:01:06,870 --> 00:01:11,190
Next, we download the data, which is called the Food-5K dataset.

13
00:01:11,280 --> 00:01:18,660
This dataset is a set of images of food and non-food items, so it's a binary classification dataset, and

14
00:01:18,660 --> 00:01:26,040
your job is to classify whether an image is food or not food.

15
00:01:26,240 --> 00:01:31,850
The next step is to unzip the file. Since there are a lot of files, we're going to use the -qq option

16
00:01:31,910 --> 00:01:33,110
to suppress the output.
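For reference, `-qq` is the quieter of `unzip`'s two quiet levels. If you'd rather do this step from Python, a rough equivalent is the standard library's `zipfile`, which prints nothing while extracting. This is a minimal sketch; the archive name `Food-5K.zip` in the demo is a hypothetical stand-in, and the demo builds its own tiny archive so it runs without the real dataset:

```python
import os
import tempfile
import zipfile


def unzip_quietly(zip_path: str, dest_dir: str) -> int:
    """Extract every member of zip_path into dest_dir without printing anything.

    Returns the number of archive members extracted.
    """
    with zipfile.ZipFile(zip_path) as zf:
        members = zf.namelist()
        zf.extractall(dest_dir)
    return len(members)


if __name__ == "__main__":
    # Build a tiny stand-in archive so the example runs without the real dataset.
    workdir = tempfile.mkdtemp()
    zip_path = os.path.join(workdir, "Food-5K.zip")  # hypothetical file name
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("training/0_0.jpg", b"fake image bytes")
        zf.writestr("training/1_0.jpg", b"fake image bytes")
    n = unzip_quietly(zip_path, workdir)
    print(n, sorted(os.listdir(os.path.join(workdir, "training"))))
```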

17
00:01:38,170 --> 00:01:41,740
Next, we're going to run ls to see what got unzipped.

18
00:01:41,740 --> 00:01:48,080
As you can see, there are a few new folders here, specifically evaluation, training, and validation.

19
00:01:48,640 --> 00:01:50,800
So let's check and see what's in the training folder.

20
00:01:53,870 --> 00:01:56,840
So, as you can see, it's just a bunch of image files.

21
00:01:56,840 --> 00:01:58,160
So what do we know right away?

22
00:01:58,850 --> 00:02:05,270
Well, we can see that this is not in the format we need in order to use the ImageFolder dataset object.

23
00:02:05,270 --> 00:02:11,300
What we're looking for is one folder for each class of images. What it looks like, though, is that all

24
00:02:11,300 --> 00:02:16,790
the images for one class start with a 0 in the file name, whereas all the images for the other class

25
00:02:16,850 --> 00:02:18,150
start with a 1.
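That naming convention can be made explicit in a few lines. A minimal sketch, assuming each file name begins with its class digit (1 for food, 0 for non-food, which the lecture goes on to check visually) and using hypothetical class folder names `food` and `nonfood`:

```python
import os


def label_from_filename(path: str) -> str:
    """Map a file name to a class folder name.

    Assumes the first character of the file name is the class digit:
    '1' for food and '0' for non-food.
    """
    first = os.path.basename(path)[0]
    if first == "1":
        return "food"
    if first == "0":
        return "nonfood"
    raise ValueError(f"unexpected file name: {path!r}")


def count_by_label(filenames):
    """Count how many files fall into each class, as a quick sanity check."""
    counts = {}
    for name in filenames:
        label = label_from_filename(name)
        counts[label] = counts.get(label, 0) + 1
    return counts


print(count_by_label(["training/0_1.jpg", "training/1_1.jpg", "training/1_2.jpg"]))
# {'nonfood': 1, 'food': 2}
```

Counting is no substitute for looking at a few images, but a wildly unbalanced count or a `ValueError` would flag a wrong assumption early.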

26
00:02:18,500 --> 00:02:22,880
Of course, we shouldn't just assume that's true; we should at least do a sanity check

27
00:02:22,880 --> 00:02:24,790
to make sure that this is actually true.

28
00:02:31,770 --> 00:02:33,590
So let's look at an image that starts with 0.

29
00:02:33,600 --> 00:02:34,410
Just to be sure.

30
00:02:37,560 --> 00:02:42,090
So I've chosen 0_808, but you should try other images yourself too.

31
00:02:42,090 --> 00:02:44,160
All right, so this is a flower, which is not food.

32
00:02:48,330 --> 00:02:53,220
Next, let's try an image that starts with 1. I've chosen 1_616.

33
00:02:53,220 --> 00:02:57,150
As you can see, this appears to be food; it looks like pizza.

34
00:02:57,420 --> 00:03:02,970
Again, try a few other images yourself to see whether all the 0s are non-food and all the 1s are food.

35
00:03:10,320 --> 00:03:13,190
Since these images are not organized the way we need,

36
00:03:13,320 --> 00:03:15,810
We're going to have to move the files around a little bit.

37
00:03:16,080 --> 00:03:20,100
So let's start by making a new folder called data. Inside this folder,

38
00:03:20,100 --> 00:03:23,490
we're going to have two more folders: train and test.

39
00:03:23,700 --> 00:03:29,400
Now, the original dataset came with three folders, evaluation, training, and validation, but we don't

40
00:03:29,400 --> 00:03:32,210
have any need for three separate datasets.

41
00:03:32,250 --> 00:03:37,740
So what we're going to do is just use the training folder as the train set and the validation folder

42
00:03:37,860 --> 00:03:44,490
as the test or validation set. We'll ignore the evaluation folder. Inside the train and test folders,

43
00:03:44,490 --> 00:03:50,550
we need to have a folder for each class, so you can see here that I've created two folders inside each

44
00:03:50,550 --> 00:03:53,400
of the train and test folders: food and non-food.

45
00:03:59,400 --> 00:03:59,760
Next.

46
00:03:59,760 --> 00:04:04,980
It's just a matter of moving the images. Moving is better than copying in this case, since we don't want

47
00:04:04,980 --> 00:04:11,550
to take up too much space and waste time replicating the data on disk. So you can see here that I'm taking

48
00:04:11,550 --> 00:04:18,960
all the files that start with 0 and end in .jpg and moving them to the corresponding non-food folders.

49
00:04:18,960 --> 00:04:24,450
I'm taking all the files that start with 1 and end in .jpg and moving them to the corresponding

50
00:04:24,450 --> 00:04:25,440
food folders.
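The reorganization can be sketched with the standard library. This assumes the images sit directly in each source folder and end in `.jpg`, and it uses hypothetical class folder names `food` and `nonfood`. `shutil.move` renames rather than copies when source and destination are on the same filesystem, which is what keeps disk usage down:

```python
import glob
import os
import shutil


def organize(src_dir: str, dest_dir: str) -> None:
    """Move src_dir/0*.jpg into dest_dir/nonfood and src_dir/1*.jpg into dest_dir/food.

    Folder names are illustrative; ImageFolder only needs one subfolder per class.
    """
    for prefix, class_name in (("0", "nonfood"), ("1", "food")):
        target = os.path.join(dest_dir, class_name)
        os.makedirs(target, exist_ok=True)  # creates dest_dir too if needed
        for path in glob.glob(os.path.join(src_dir, f"{prefix}*.jpg")):
            # On the same filesystem this is a rename, so no image data is duplicated.
            shutil.move(path, target)
```

Following the lecture's plan, you'd call `organize("training", "data/train")` and `organize("validation", "data/test")`, and leave the evaluation folder alone.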

51
00:04:32,250 --> 00:04:38,340
Next, I'm going to specify the transformations to perform on both the train and test sets.

52
00:04:38,340 --> 00:04:42,840
As usual, you should play around with these data augmentation parameters to see what effects they

53
00:04:42,840 --> 00:04:44,890
may have, if any.

54
00:04:44,970 --> 00:04:49,860
You'll notice that the final transformation is applied to both the train and test sets.

55
00:04:49,860 --> 00:04:55,620
It's a Normalize, which, as you might expect, does normalization, and I've specified what appears to be

56
00:04:56,130 --> 00:04:58,570
just a bunch of random numbers.

57
00:04:58,650 --> 00:05:01,860
So what are these numbers, and why are they there?

58
00:05:01,860 --> 00:05:06,360
Well, you have to remember that these are pretrained networks with very specific weights.

59
00:05:06,360 --> 00:05:11,340
So we can't just pass any image into this neural network in any way we want.

60
00:05:11,340 --> 00:05:16,810
We have to transform our images in the same way that the original authors of the neural network did.

61
00:05:17,370 --> 00:05:20,020
Otherwise, our input won't match what the network was trained on.

62
00:05:20,040 --> 00:05:22,200
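That's the basic idea: these are the standard per-channel ImageNet means and standard deviations that torchvision's pretrained models expect.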
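Concretely, `Normalize(mean, std)` runs on the tensor produced by `ToTensor()`, whose values are already scaled to [0, 1], and computes `(x - mean[c]) / std[c]` for each channel `c`. The sketch below does that by hand for one RGB pixel, assuming the commonly used ImageNet statistics; check the notebook for the exact values it passes:

```python
# Per-channel ImageNet statistics commonly used with torchvision's pretrained models.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb_uint8):
    """Mimic ToTensor() followed by Normalize() for one (R, G, B) pixel in 0..255."""
    scaled = [v / 255.0 for v in rgb_uint8]  # ToTensor(): map 0..255 to 0.0..1.0
    return [(x - m) / s for x, m, s in zip(scaled, IMAGENET_MEAN, IMAGENET_STD)]


print([round(v, 3) for v in normalize_pixel((255, 255, 255))])
print([round(v, 3) for v in normalize_pixel((0, 0, 0))])
```

A pixel close to the ImageNet mean color maps to roughly zero in every channel, which is the input distribution the pretrained weights were tuned for.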

63
00:05:22,310 --> 00:05:27,540
There are some additional details that I don't want to get into now. You'll also notice that we're applying

64
00:05:27,540 --> 00:05:33,300
several transformations to the test set, which may seem odd. You might ask: why are we applying

65
00:05:33,300 --> 00:05:35,430
data augmentation to the test set?

66
00:05:35,940 --> 00:05:37,610
And actually we're not.

67
00:05:37,620 --> 00:05:43,110
These transformations are static: they don't augment the data, they're just necessary

68
00:05:43,110 --> 00:05:44,510
for the neural network to work.

69
00:05:45,330 --> 00:05:50,880
So the first transformation, which of course has to be applied to both datasets, is the resize.

70
00:05:51,180 --> 00:05:54,230
CNNs work on image batches of constant size.

71
00:05:54,270 --> 00:05:57,990
So first we resize to 256 by 256.

72
00:05:57,990 --> 00:06:02,810
After that, we crop the image to 224 by 224.

73
00:06:03,080 --> 00:06:03,360
OK.

74
00:06:03,360 --> 00:06:07,370
So that's why we have some transformations on the test set.

75
00:06:07,380 --> 00:06:10,440
Now, you'll notice that the train transform has a few additional steps in between,

76
00:06:10,440 --> 00:06:13,740
such as random rotation, color jitter, and so forth.

77
00:06:13,740 --> 00:06:16,370
As mentioned you should feel free to play around with these.

78
00:06:16,380 --> 00:06:17,660
Change the order, and so on.

79
00:06:24,860 --> 00:06:27,470
Next, we create our ImageFolder datasets.

80
00:06:27,500 --> 00:06:29,030
This is pretty trivial.

81
00:06:29,030 --> 00:06:37,980
We pass in the paths to the image files and the transforms that we just created.

82
00:06:37,990 --> 00:06:39,670
Next, we create our DataLoader objects.

83
00:06:39,670 --> 00:06:40,390
As usual
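Each `DataLoader` wraps a dataset and yields mini-batches. So that the sketch runs without any images, it uses a `TensorDataset` of random tensors as a hypothetical stand-in for the `ImageFolder` datasets; the batch size is also an illustrative assumption:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in for the ImageFolder datasets: 10 fake 3x224x224 "images" with 0/1 labels.
images = torch.randn(10, 3, 224, 224)
labels = torch.randint(0, 2, (10,))
train_dataset = TensorDataset(images, labels)

# Shuffle the training data every epoch; a test loader would normally use shuffle=False.
batch_size = 4  # illustrative value, not taken from the notebook
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

for inputs, targets in train_loader:
    print(inputs.shape, targets.shape)
```

With 10 samples and a batch size of 4, the last batch holds only the 2 leftover samples; pass `drop_last=True` if every batch must be full.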

84
00:06:43,630 --> 00:06:45,330
Next, we define our model.

85
00:06:45,340 --> 00:06:49,260
So this is going to look very different from what you're used to.

86
00:06:49,270 --> 00:06:56,670
First, we grab the VGG16 model, which is located in the torchvision models module. We set pretrained

87
00:06:56,670 --> 00:07:01,260
equal to True because we want pretrained weights and not random weights.

88
00:07:01,300 --> 00:07:05,860
Next we freeze the weights so that these weights are not updated during training.

89
00:07:05,860 --> 00:07:11,380
We can accomplish this simply by looping over the model parameters one by one and setting the requires_grad

90
00:07:11,380 --> 00:07:14,290
attribute to False.
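The freezing loop is short enough to show in full. To keep the sketch light, it runs on a small hypothetical stand-in network rather than VGG16; the same loop works unchanged on a VGG16 model because it only touches `model.parameters()`:

```python
import torch.nn as nn

# Small stand-in for a pretrained backbone; the loop below doesn't care what the model is.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

# Freeze: no gradients are computed for these parameters, so training won't update them.
for param in model.parameters():
    param.requires_grad = False

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(trainable)  # 0
```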

91
00:07:14,500 --> 00:07:23,050
Next, we print the model, and you can see that it's a very large model.

92
00:07:23,130 --> 00:07:30,270
Importantly, though, it has a structure we've seen before. The first part is a Sequential

93
00:07:30,270 --> 00:07:35,990
that wraps all the convolution and pooling layers. As you can see, this is attached to an attribute called

94
00:07:36,000 --> 00:07:40,950
features.

95
00:07:40,960 --> 00:07:42,700
Next, we have an adaptive average pool 2D.

96
00:07:43,390 --> 00:07:49,350
And finally, we have another Sequential that wraps the dense layers, and this is attached to an attribute

97
00:07:49,350 --> 00:07:50,430
called classifier.

98
00:07:55,600 --> 00:08:01,370
So if we print out model.classifier, we see just the ANN part of the CNN: the dense layers at the end.

99
00:08:01,600 --> 00:08:05,170
This is the part we want to get rid of, because we want to attach our own head.

100
00:08:10,150 --> 00:08:15,350
So next, we're going to get the number of features that are output from the first part of the model and

101
00:08:15,350 --> 00:08:17,400
call that n_features.

102
00:08:17,480 --> 00:08:23,300
We need this because it's going to be the number of inputs into our head. We can get it by indexing

103
00:08:23,300 --> 00:08:29,270
classifier[0], which gives us the first layer, and then reading its in_features attribute.

104
00:08:29,330 --> 00:08:33,690
So we have 25,088 features.
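The 25,088 comes from the shapes inside torchvision's VGG16: the last convolutional block outputs 512 feature maps, and the adaptive average pool resizes each one to 7 by 7 before flattening:

$$
512 \times 7 \times 7 = 25{,}088
$$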

105
00:08:34,110 --> 00:08:39,610
Next, we're going to create our own classifier, which will just be a regular linear layer, i.e. logistic regression,

106
00:08:40,420 --> 00:08:47,340
and we'll assign this to the classifier attribute to replace the old one. So if we print the model again,

107
00:08:48,470 --> 00:08:54,350
we can see that the classifier attribute now refers to our own linear layer.

108
00:08:57,440 --> 00:09:00,020
And finally, we move the model to the GPU as usual.

109
00:09:03,760 --> 00:09:07,620
After this, everything is the same as before.

110
00:09:07,630 --> 00:09:16,780
So we create a loss and optimizer, we create the training function, we call the training function, we look

111
00:09:16,780 --> 00:09:21,580
at the loss per iteration, and we calculate the accuracy.
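For completeness, here is a compact sketch of that standard loop and the accuracy calculation. It assumes `CrossEntropyLoss` on two output logits and an Adam optimizer given only the parameters that require gradients; the optimizer, learning rate, and epoch count are illustrative assumptions. To stay lightweight, the demo trains a stand-in linear model on random features rather than VGG16 on real images:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


def train(model, criterion, optimizer, train_loader, epochs, device):
    """Standard training loop; returns the mean training loss for each epoch."""
    losses = []
    for _ in range(epochs):
        model.train()
        batch_losses = []
        for inputs, targets in train_loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
            batch_losses.append(loss.item())
        losses.append(sum(batch_losses) / len(batch_losses))
    return losses


def accuracy(model, loader, device):
    """Fraction of examples whose highest-scoring logit matches the label."""
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            predictions = model(inputs).argmax(dim=1)
            correct += (predictions == targets).sum().item()
            total += targets.numel()
    return correct / total


if __name__ == "__main__":
    torch.manual_seed(0)
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    # Stand-in data: the label is 1 when the first feature is positive (easy to learn).
    x = torch.randn(256, 4)
    y = (x[:, 0] > 0).long()
    loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)
    model = nn.Linear(4, 2).to(device)
    criterion = nn.CrossEntropyLoss()
    # Only pass parameters that require gradients (the new head, in the transfer-learning case).
    optimizer = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=0.05)
    losses = train(model, criterion, optimizer, loader, epochs=30, device=device)
    print(losses[0], losses[-1], accuracy(model, loader, device))
```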

112
00:09:24,290 --> 00:09:31,130
And these are the results: we get 99% accuracy on this large image dataset using transfer

113
00:09:31,130 --> 00:09:31,580
learning.