1
00:00:00,570 --> 00:00:01,740
Hi, and welcome back.

2
00:00:02,460 --> 00:00:08,940
In this section, we'll take a look at using PyTorch without PyTorch Lightning to implement transfer

3
00:00:08,940 --> 00:00:12,300
learning by feature extraction, as well as fine-tuning.

4
00:00:12,540 --> 00:00:15,390
So it's a very important lesson, a very nice lesson.

5
00:00:15,810 --> 00:00:19,980
So let's begin. Open the notebook; I already have it open here.

6
00:00:20,220 --> 00:00:24,510
And that's because I'm training the network, since it takes a bit of a while to train.

7
00:00:25,290 --> 00:00:26,680
So let's begin the lesson.

8
00:00:26,730 --> 00:00:30,140
So firstly, run all of your imports here.

9
00:00:30,150 --> 00:00:35,110
These are all the libraries we'll be using for this project. Also download your dataset.

10
00:00:35,140 --> 00:00:40,040
So you may notice this is a different dataset. This is the Hymenoptera dataset.

11
00:00:40,050 --> 00:00:45,600
I really can't pronounce that, and I'm probably not doing it justice, but basically it's the ants versus bees

12
00:00:45,600 --> 00:00:46,120
dataset.

13
00:00:46,120 --> 00:00:49,590
So I thought you may have been getting tired of the cats versus dogs dataset.

14
00:00:50,010 --> 00:00:55,470
So now let's go into the ants versus bees dataset. We'll visualize some of those images shortly.

15
00:00:56,010 --> 00:00:59,790
But for now, the first thing we do is set up our transforms.

16
00:01:00,180 --> 00:01:01,890
Now you may notice something here.

17
00:01:02,760 --> 00:01:05,850
You may notice that this line looks a bit weird, doesn't it?

18
00:01:06,450 --> 00:01:11,400
Well, that's because we're subtracting the mean and then dividing by the standard deviation here.

19
00:01:11,430 --> 00:01:14,910
These are the ImageNet standard deviations for the RGB images.

20
00:01:15,240 --> 00:01:18,510
So these are the means for RGB, as well as the standard deviations.

21
00:01:19,020 --> 00:01:25,620
So because we're going to use a pre-trained network, a ResNet-18, in fact, that was trained on

22
00:01:25,620 --> 00:01:31,440
the ImageNet dataset, we have to preprocess our data in the same way that the ImageNet data was preprocessed.
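As a rough sketch of the arithmetic behind that normalization step: each channel has its mean subtracted and is then divided by its standard deviation. The numbers below are the ImageNet statistics commonly used with torchvision's pre-trained models; they are an assumption here, since the lesson does not read them out, and `normalize_pixel` is a hypothetical helper, not a library function.

```python
# ImageNet channel statistics commonly used with torchvision pre-trained
# models (assumed values; the lesson does not read them out).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Subtract the per-channel mean, then divide by the per-channel std.

    `rgb` holds channel values already scaled to the [0, 1] range.
    """
    return tuple((value - m) / s for value, m, s in zip(rgb, mean, std))


# A pixel that sits exactly at the channel means maps to zero everywhere.
print(normalize_pixel(IMAGENET_MEAN))
# A pure white pixel ends up a couple of standard deviations above zero.
print(normalize_pixel((1.0, 1.0, 1.0)))
```

In the notebook this is done for whole image tensors by the normalize transform; the sketch only shows what happens to a single pixel.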

23
00:01:32,070 --> 00:01:37,890
So we have an augmentation here, a random horizontal flip, as well as a random

24
00:01:37,890 --> 00:01:42,240
resized crop, and we're converting it to a tensor, which we've done before.

25
00:01:42,840 --> 00:01:45,660
This is for the training data. For the validation data,

26
00:01:45,660 --> 00:01:47,040
we're doing something a bit different.

27
00:01:47,460 --> 00:01:54,150
Firstly, we're resizing it here to 256, and then we're doing a center crop to 224 afterward.

28
00:01:54,900 --> 00:01:55,620
Don't worry about it.

29
00:01:55,620 --> 00:01:57,000
That's just how we have to do it

30
00:01:57,000 --> 00:02:03,120
when we're loading a pre-trained ImageNet network. Then, again, we convert it

31
00:02:03,120 --> 00:02:06,270
to a tensor, and then we normalize it as well.

32
00:02:06,870 --> 00:02:10,320
So these are standard, relatively simple transformations.

33
00:02:10,740 --> 00:02:14,040
Feel free to add in other transformations here.

34
00:02:14,040 --> 00:02:19,860
If you want random augmentations, there's random brightness, contrast, random rotations, a bunch

35
00:02:19,860 --> 00:02:21,780
of them you can use from PyTorch.

36
00:02:22,320 --> 00:02:25,890
So have fun with them and see if they improve your results.

37
00:02:26,610 --> 00:02:29,320
So now we have to create our data loaders.

38
00:02:29,700 --> 00:02:33,690
So what we do is use a dataset class called ImageFolder.

39
00:02:34,110 --> 00:02:38,910
ImageFolder expects the dataset to be organized in this folder structure.

40
00:02:38,910 --> 00:02:42,930
So you have the dataset name here, which is hymenoptera.

41
00:02:44,400 --> 00:02:50,060
And then we have the train data here, and then we have the labels. We have the ant images here. Notice

42
00:02:50,070 --> 00:02:54,660
that these are all ant images in the ants directory, and these are all the bees in the bees directory.

43
00:02:55,200 --> 00:03:00,210
This is a very nice and simple way to organize your datasets. If you have your own custom dataset,

44
00:03:00,660 --> 00:03:07,110
feel free to just create a folder structure like this, where you have train and val. Or, if you

45
00:03:07,110 --> 00:03:12,900
want, have one big dataset and then use a different function to do the splits. It's up to you.

46
00:03:13,770 --> 00:03:15,570
But this is usually the way I do it.
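To make that folder layout concrete, here is a minimal sketch that builds an empty train/val tree with one sub-folder per class and derives integer labels from the sub-folder names in sorted order, which is how ImageFolder assigns class indices. The helper names `build_demo_tree` and `class_to_index` are made up for illustration, and the tree is created in a temporary directory rather than pointing at the real dataset.

```python
import tempfile
from pathlib import Path


def build_demo_tree(root):
    """Create an empty train/val layout with one sub-folder per class."""
    for split in ("train", "val"):
        for class_name in ("ants", "bees"):
            (Path(root) / split / class_name).mkdir(parents=True)


def class_to_index(split_dir):
    """Map each class sub-folder to an integer label, in sorted order."""
    names = sorted(p.name for p in Path(split_dir).iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(names)}


with tempfile.TemporaryDirectory() as root:
    build_demo_tree(root)
    # One dictionary comprehension covers both splits.
    label_maps = {
        split: class_to_index(Path(root) / split) for split in ("train", "val")
    }

print(label_maps)
```

With a real dataset, the image files would simply sit inside the class sub-folders.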

47
00:03:16,620 --> 00:03:21,390
So what we do here when creating our data loaders is set the data directory path.

48
00:03:21,840 --> 00:03:25,330
Then we use ImageFolder to point to the full dataset here.

49
00:03:25,350 --> 00:03:26,580
That's the train and val

50
00:03:26,580 --> 00:03:28,080
folders, with their lists of filenames.

51
00:03:28,680 --> 00:03:34,140
And then we just use a dictionary comprehension here to create our data loaders for both

52
00:03:34,140 --> 00:03:35,730
train and val in one line.

53
00:03:36,180 --> 00:03:41,370
Normally you might see this done in two lines, but we're using Python, so let's take advantage of

54
00:03:41,370 --> 00:03:44,100
some of the many, many abilities

55
00:03:44,100 --> 00:03:47,460
Python has to do a lot in one line, which is quite cool.

56
00:03:48,120 --> 00:03:52,980
Next, what we'll do is just get the dataset sizes, as well as the class names.

57
00:03:53,430 --> 00:03:59,550
So you can see we only have 224 training images and 153 validation images.

58
00:04:00,090 --> 00:04:05,130
So this is a relatively small dataset, and then we have only two classes, ants and bees.

59
00:04:05,310 --> 00:04:10,570
So you would think we should get some fairly good results, given that it's a simple dataset.

60
00:04:10,590 --> 00:04:13,050
However, notice that we don't have much data.

61
00:04:14,130 --> 00:04:21,210
OK, so now let's move on to visualizing some images here. You can see these are some bees here.

62
00:04:21,240 --> 00:04:22,620
I wish I had an example of ants,

63
00:04:22,620 --> 00:04:28,530
but this is pulling random images from the training dataset, and it just happened to pick a bunch

64
00:04:28,530 --> 00:04:29,070
of bees.

65
00:04:29,760 --> 00:04:33,720
So you can run this again on your own and see if you get some new images here.

66
00:04:33,720 --> 00:04:34,500
Some new ants.

67
00:04:35,550 --> 00:04:37,950
Now, let's create a training function.

68
00:04:38,130 --> 00:04:40,650
So we're also introducing two things here.

69
00:04:41,160 --> 00:04:43,110
We're going to use a learning rate scheduler.

70
00:04:43,530 --> 00:04:49,980
If you recall, a learning rate scheduler uses a different learning rate every epoch according

71
00:04:49,980 --> 00:04:53,430
to a preset algorithm or a preset list of values.

72
00:04:54,150 --> 00:04:59,370
And also, we're going to introduce checkpointing, which saves the best model and returns the best

73
00:04:59,370 --> 00:04:59,700
model

74
00:04:59,980 --> 00:05:00,640
at the end of this loop.

75
00:05:01,150 --> 00:05:02,200
So that's pretty cool.

76
00:05:02,230 --> 00:05:03,730
And let's take a look at this training loop.

77
00:05:03,730 --> 00:05:04,850
So I know training loops,

78
00:05:05,420 --> 00:05:10,150
well, training functions here in PyTorch, can be a bit confusing because there's a lot going on.

79
00:05:10,630 --> 00:05:13,450
So I'll do my best to step through this code with you.

80
00:05:14,320 --> 00:05:20,530
So this is just time logging, because we log the time taken for training.

81
00:05:20,530 --> 00:05:23,390
So don't worry too much about that here.

82
00:05:23,410 --> 00:05:30,760
We actually use copy.deepcopy to basically save the state of the initial model here, and then

83
00:05:30,760 --> 00:05:31,840
we keep track of this.

84
00:05:32,140 --> 00:05:33,840
This is how we save the best model.

85
00:05:33,850 --> 00:05:40,780
It's a PyTorch function, a PyTorch object, that you'll be using to return the best model in the

86
00:05:40,780 --> 00:05:44,230
end, as well as just keep track of the best accuracy here.

87
00:05:44,770 --> 00:05:50,890
So now, for the number of epochs we specify, and in this case we have specified 25, which we'll

88
00:05:50,890 --> 00:05:51,430
see below.

89
00:05:52,180 --> 00:05:58,360
We just print the epoch, the current epoch, as well as the number of epochs we have left

90
00:05:58,360 --> 00:05:59,040
to go.

91
00:05:59,060 --> 00:06:01,630
It's like one out of 24, 25.

92
00:06:02,200 --> 00:06:07,750
And then we just use this here with dashes, just to add some separators so we can have a nice,

93
00:06:07,750 --> 00:06:10,690
clean printing output for training.

94
00:06:11,350 --> 00:06:18,120
So next, what we do is basically loop over each phase, that is, training and validation.

95
00:06:18,120 --> 00:06:23,770
Remember, the sequence we follow for each epoch is that we train the model and then we test and evaluate

96
00:06:24,160 --> 00:06:25,300
on the validation dataset.

97
00:06:25,720 --> 00:06:29,050
So for training first, we just set the model to training mode.

98
00:06:29,500 --> 00:06:31,450
Otherwise, it's going to be in evaluation mode here.

99
00:06:32,050 --> 00:06:36,950
We keep track of our loss and the number of predictions that are correct.

100
00:06:37,660 --> 00:06:40,960
Then we just get the inputs and labels from the data loader.

101
00:06:40,990 --> 00:06:42,460
So this is a batch right here.

102
00:06:43,090 --> 00:06:46,480
Send that batch to the GPU or CPU, whatever the device is.

103
00:06:47,020 --> 00:06:49,300
Then we just zero the parameter gradients.

104
00:06:49,300 --> 00:06:51,940
And then, if it's in training mode,

105
00:06:52,690 --> 00:06:54,460
this is where the gradients are enabled.

106
00:06:54,460 --> 00:07:00,580
What it's going to do here is run your input, your batch, through the model,

107
00:07:00,580 --> 00:07:03,280
get the predictions, and get the loss as well.

108
00:07:03,730 --> 00:07:06,100
Again, if it's in training mode, then you do

109
00:07:06,100 --> 00:07:11,230
the backpropagation here with the optimizer, the optimizer being whatever you want it to be, whatever you set

110
00:07:11,230 --> 00:07:17,500
it to be, like stochastic gradient descent or Adam or one of those. Then we just keep track of those

111
00:07:17,500 --> 00:07:19,360
two training statistics here.

112
00:07:20,080 --> 00:07:21,880
So we just keep track of the first one.

113
00:07:21,880 --> 00:07:23,080
We keep track of this loss.

114
00:07:23,530 --> 00:07:26,460
And then the second one is how many of these predictions were actually correct.

115
00:07:26,500 --> 00:07:32,050
We just get a total, using torch.sum, of the number of predictions that are equal to the labels in the

116
00:07:32,050 --> 00:07:33,250
data in that batch.
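The bookkeeping described here can be sketched without any tensors at all: count the predictions that match their labels in each batch, keep a running total, and divide by the dataset size at the end of the epoch. The notebook does the per-batch count on tensors; the plain-Python version below, with its made-up batches and the hypothetical helper `count_correct`, only illustrates the logic.

```python
def count_correct(predictions, labels):
    """Number of predictions that equal their label in one batch."""
    return sum(int(p == y) for p, y in zip(predictions, labels))


# Two toy batches of class indices (0 = ants, 1 = bees); values are made up.
batches = [
    ([0, 1, 1, 0], [0, 1, 0, 0]),  # (predictions, labels)
    ([1, 1], [1, 0]),
]

running_corrects = sum(count_correct(preds, labels) for preds, labels in batches)
dataset_size = sum(len(labels) for _, labels in batches)

# Epoch accuracy is the running total divided by the number of samples seen.
epoch_accuracy = running_corrects / dataset_size
print(running_corrects, dataset_size, epoch_accuracy)
```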

117
00:07:34,060 --> 00:07:38,470
And then again, if it's in training mode here, we just step the scheduler at this point.

118
00:07:39,100 --> 00:07:40,840
We just do this at every step.

119
00:07:41,140 --> 00:07:42,790
That's for the learning rate adjustments.

120
00:07:42,940 --> 00:07:49,060
So the scheduler just increments that by one, and we just go to the next value that you'll use for

121
00:07:49,060 --> 00:07:53,290
the optimizer, the learning rate selection. Next,

122
00:07:53,410 --> 00:07:57,040
we just keep track of the epoch loss as well as the epoch accuracy,

123
00:07:57,040 --> 00:07:59,350
and then we just print it out here.

124
00:08:00,550 --> 00:08:06,910
And then if it's in validation mode and our epoch accuracy is greater than the best accuracy that we

125
00:08:06,910 --> 00:08:10,660
were keeping track of, then we just create a new copy of the model.

126
00:08:11,140 --> 00:08:14,440
So this is how we actually keep track of the best model.

127
00:08:14,890 --> 00:08:21,220
And if you notice at the end here, when the training is finished, this training loop here, literally

128
00:08:21,220 --> 00:08:24,760
this conditional here, would have kept track of the best model.

129
00:08:25,090 --> 00:08:26,500
Nice little programming tool here.
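A minimal sketch of that best-model bookkeeping, using plain dictionaries as stand-ins for model weights. The function `pick_best`, the toy weights, and the accuracy values are all made up for illustration; the point is only the pattern of comparing against the best accuracy so far and deep-copying when it improves.

```python
import copy


def pick_best(weights_per_epoch, val_accuracy_per_epoch):
    """Return a copy of the weights from the epoch with the best validation accuracy."""
    best_acc = 0.0
    best_weights = copy.deepcopy(weights_per_epoch[0])
    for weights, acc in zip(weights_per_epoch, val_accuracy_per_epoch):
        if acc > best_acc:
            best_acc = acc
            # Deep copy so later changes to `weights` cannot alter the saved best.
            best_weights = copy.deepcopy(weights)
    return best_weights, best_acc


# Toy stand-ins for model weights and validation accuracies (made up).
history = [{"w": [0.1]}, {"w": [0.2]}, {"w": [0.3]}]
best, acc = pick_best(history, [0.95, 0.91, 0.93])
print(best, acc)
```

The deep copy matters because the model keeps training after the best epoch; a plain reference would end up pointing at the final weights instead.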

130
00:08:27,100 --> 00:08:31,870
And this print is just a blank line, so we can skip lines between epochs.

131
00:08:32,380 --> 00:08:35,260
And these are summary statistics at the end.

132
00:08:36,000 --> 00:08:37,510
And so that's it.

133
00:08:37,540 --> 00:08:41,140
That's the whole block of code explained.

134
00:08:41,680 --> 00:08:43,690
Hopefully, you understood everything.

135
00:08:43,690 --> 00:08:51,010
And if you didn't, because I know PyTorch can be confusing because of its low-level style, then

136
00:08:51,010 --> 00:08:54,130
please message me on the Udemy forums. Next,

137
00:08:54,280 --> 00:08:59,260
what we have here is just a quick function called Visualize Predictions, where we can take a

138
00:08:59,260 --> 00:09:04,990
number of images here, and the default is six. For the model, we set the model to evaluation

139
00:09:04,990 --> 00:09:05,350
mode.

140
00:09:05,950 --> 00:09:18,030
And then we just forward propagate the images, the batch, whatever that is, and we just prepare the subplots

141
00:09:18,040 --> 00:09:22,510
at that point here, so you can do whatever number of subplots you want.

142
00:09:22,730 --> 00:09:25,320
The default is six here, and that's it.

143
00:09:25,320 --> 00:09:27,670
That's a cool visualizing function that

144
00:09:27,670 --> 00:09:32,410
we can use to just plot the images along with their predictions.

145
00:09:33,400 --> 00:09:39,550
So we'll stop there for now. Actually, no, sorry, we're actually about to train the

146
00:09:39,550 --> 00:09:40,000
network.

147
00:09:40,540 --> 00:09:41,290
Forgot about that.

148
00:09:41,950 --> 00:09:46,660
So this is how we fine-tune this convolutional neural network.

149
00:09:47,290 --> 00:09:53,790
So remember, in fine-tuning, we load a pre-trained network, but we don't freeze any of the layers,

150
00:09:53,800 --> 00:09:57,710
or we can freeze some layers while leaving some unfrozen.

151
00:09:57,730 --> 00:09:58,990
In this case,

152
00:09:59,620 --> 00:10:04,390
we don't freeze any layers in the network. So we load ResNet-18 here.

153
00:10:05,170 --> 00:10:11,020
Then we just get the number of features here from the model that we just loaded.

154
00:10:11,590 --> 00:10:19,080
And then we use that as the number of input features for our last fully connected layer here.

155
00:10:19,090 --> 00:10:24,070
This is the number of features that the original fully connected layer had.

156
00:10:24,700 --> 00:10:30,610
And then we just set our output size. Our dataset only has two classes, so we set

157
00:10:30,610 --> 00:10:35,890
the final output of the linear layer to two. So this is the input for that layer and this is the output

158
00:10:35,890 --> 00:10:39,300
here, and this input came from the

159
00:10:39,310 --> 00:10:44,020
last fully connected layer's number of features here.

160
00:10:44,320 --> 00:10:48,990
This is just the number of features. Now we just send the model to the GPU.

161
00:10:49,000 --> 00:10:54,040
We set the criterion here, which is the loss criterion, cross-entropy loss.

162
00:10:54,790 --> 00:10:57,570
And then we use stochastic gradient descent.

163
00:10:57,580 --> 00:11:00,100
We use this default learning rate here, with momentum.

164
00:11:01,360 --> 00:11:06,490
However, these are being set up for the learning rate scheduler, which is what we're about to see here.

165
00:11:06,970 --> 00:11:10,870
So this is how we configure the learning rate. Take a look at this.

166
00:11:11,380 --> 00:11:13,240
So we create an optimizer here.

167
00:11:13,870 --> 00:11:19,810
However, what we do is then use this other function from PyTorch, which

168
00:11:19,810 --> 00:11:25,450
we imported at the beginning of this code: lr_scheduler.StepLR, the step learning rate.

169
00:11:25,780 --> 00:11:30,730
That means at every step, or whatever step is set. That's actually the algorithm we're using in this

170
00:11:30,730 --> 00:11:31,690
case for this scheduler,

171
00:11:31,840 --> 00:11:39,580
the step one. We pass in this optimizer, because this can work with any optimizer.

172
00:11:39,580 --> 00:11:42,190
You can use Adam or the others as well.

173
00:11:42,910 --> 00:11:46,150
And we set the step size to seven and the gamma to 0.1.

174
00:11:46,150 --> 00:11:50,320
You can take a look at the function to see how it's configured.

175
00:11:50,320 --> 00:11:52,150
You can see exactly what gamma controls.

176
00:11:52,750 --> 00:11:58,630
I believe gamma controls the decrement at every step, and I think step size means that you change the

177
00:11:59,620 --> 00:12:02,240
learning rate every seven epochs.
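To pin down what those two settings do: with a step schedule, the learning rate is multiplied by gamma once every step-size epochs, so gamma is a multiplicative factor rather than an amount subtracted. The sketch below computes the resulting schedule in plain Python. The starting rate of 0.001 is an assumed value for illustration, since the lesson only calls it the default, and `step_lr` is a hypothetical helper, not the PyTorch scheduler itself.

```python
def step_lr(base_lr, epoch, step_size=7, gamma=0.1):
    """Learning rate in effect at `epoch` under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)


# base_lr=0.001 is an assumed starting value, not taken from the lesson.
for epoch in (0, 6, 7, 13, 14, 21):
    print(epoch, step_lr(0.001, epoch))
```

Over 25 epochs with a step size of seven, the rate is therefore cut three times, after epochs 7, 14, and 21.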

178
00:12:03,430 --> 00:12:04,150
So that's it.

179
00:12:04,750 --> 00:12:06,790
And then now, how do we use that in the model?

180
00:12:06,820 --> 00:12:10,030
Well, remember, we can use our train model function here.

181
00:12:10,450 --> 00:12:16,750
So the model that we defined here, that we created, is this final model that we loaded.

182
00:12:17,110 --> 00:12:17,810
It's quite simple.

183
00:12:18,640 --> 00:12:22,660
Set the loss criterion, set the optimizer, set the learning rate scheduler,

184
00:12:22,660 --> 00:12:25,300
set the number of epochs, and there we go.

185
00:12:25,390 --> 00:12:26,920
So I have it training right now.

186
00:12:27,210 --> 00:12:32,710
It takes a little while, roughly about maybe five minutes, actually less than five minutes an

187
00:12:32,710 --> 00:12:40,080
epoch, maybe about a minute or two an epoch, and it's training for 25 epochs. It starts at zero.

188
00:12:40,540 --> 00:12:42,340
You can add a +1 to this

189
00:12:42,340 --> 00:12:45,660
if you want it to end at 25. I actually would have preferred that.

190
00:12:46,540 --> 00:12:49,110
And there we go, so we can keep track of our accuracy.

191
00:12:49,120 --> 00:12:51,910
So at the end of each epoch... let's take a look at epoch nine.

192
00:12:52,750 --> 00:12:58,090
You can see our training accuracy is 83 percent. For validation,

193
00:12:58,090 --> 00:13:03,280
the loss is quite good, point two, and the validation accuracy is 91.5 percent.

194
00:13:03,400 --> 00:13:04,120
That's pretty good.

195
00:13:04,510 --> 00:13:07,060
And you can see we got a peak of 93.

196
00:13:07,060 --> 00:13:13,930
I believe that's the highest. Oh, no, not 93, it was actually 95 initially, in the first epoch.

197
00:13:14,830 --> 00:13:20,380
OK, so that basically shows you that we don't have enough data for this model to really take advantage

198
00:13:20,380 --> 00:13:26,830
of transfer learning here because you can see after one epoch, we actually got the best results coincidentally,

199
00:13:27,340 --> 00:13:32,350
and that's due to the random initialization of weights that gave us that coincidental best result.

200
00:13:32,350 --> 00:13:39,490
But the fact that it's not getting better after this many epochs means that this is a bit of an

201
00:13:39,490 --> 00:13:41,370
overkill for this dataset.

202
00:13:42,130 --> 00:13:48,670
What you can do, though, is perhaps freeze some layers if you want, and experiment like that.

203
00:13:48,910 --> 00:13:50,470
So that's actually what we're going to do next.

204
00:13:50,470 --> 00:13:56,830
But in the meantime, let's take a look at visualizing our predictions here. You can see it predicted

205
00:13:56,830 --> 00:13:57,460
ants.

206
00:13:57,460 --> 00:13:58,750
And, I don't...

207
00:13:58,750 --> 00:14:00,220
Yeah, there are ants on a leaf.

208
00:14:00,850 --> 00:14:01,730
This is a bee.

209
00:14:01,750 --> 00:14:06,340
This is an ant, and this is... I really don't know, it has to be a bee.

210
00:14:07,330 --> 00:14:10,210
This is an ant, and this is an ant as well.

211
00:14:11,020 --> 00:14:11,760
So that's cool.

212
00:14:11,800 --> 00:14:13,660
So now, what we'll do:

213
00:14:14,170 --> 00:14:20,600
we'll take a look at using a convolutional neural net as a fixed feature extractor. So pause the lesson

214
00:14:20,600 --> 00:14:25,300
now and resume in the next section, because this video has been going on for almost 15 minutes.

215
00:14:25,840 --> 00:14:29,590
So I think you may want a break, and I'll rest my voice a bit as well.

216
00:14:30,040 --> 00:14:36,050
And when we resume, we'll take a look at how we can use our ConvNet as a fixed feature extractor

217
00:14:36,580 --> 00:14:38,530
and apply transfer learning, which is quite easy.

218
00:14:38,530 --> 00:14:40,540
Actually, it's not going to be a very long section.

219
00:14:41,110 --> 00:14:42,280
Thank you, and I'll see you then.

220
00:14:42,550 --> 00:14:42,850
Bye.