1
00:00:00,150 --> 00:00:06,030
Hello, everyone, and welcome to this new and exciting session in which we are going to treat transfer

2
00:00:06,060 --> 00:00:08,280
learning and fine tuning.

3
00:00:08,310 --> 00:00:14,760
Transfer learning can be applied in several domains like computer vision, natural language processing

4
00:00:14,760 --> 00:00:15,870
and speech.

5
00:00:15,870 --> 00:00:23,040
In order to better understand the usefulness of transfer learning, we have to take note that deep learning

6
00:00:23,040 --> 00:00:26,970
models work best when given a lot of data.

7
00:00:27,150 --> 00:00:34,680
And so this means that if you have a data set of only 100 data points, then you are most likely going

8
00:00:34,680 --> 00:00:37,310
to have a poorly performing model.

9
00:00:37,320 --> 00:00:44,670
But what if we tell you that it's possible to train your model or to train a given model on, say,

10
00:00:44,670 --> 00:00:54,990
a million data points and then use that model or adapt that model such that you can now train it on

11
00:00:54,990 --> 00:01:01,590
this very small data set such that you start getting very good results.

12
00:01:01,590 --> 00:01:07,740
This is very possible with transfer learning, and that's what we shall be treating in this section. At

13
00:01:07,740 --> 00:01:08,490
this point,

14
00:01:08,490 --> 00:01:17,070
one question which may be going through your mind is how is it possible to train a model on, say,

15
00:01:17,070 --> 00:01:25,680
a million data points and then use that same model on a smaller data set, which is obviously different

16
00:01:25,680 --> 00:01:31,080
from the large one and has only about 100 data points?

17
00:01:32,790 --> 00:01:37,800
The answer to this question lies in this figure we have right here.

18
00:01:38,370 --> 00:01:42,840
Notice how you have this image here, this image of this truck.

19
00:01:44,430 --> 00:01:51,840
And then we have this model which takes this input and then produces some outputs right here.

20
00:01:53,160 --> 00:02:00,330
The kind of models we use for these image tasks are generally ConvNets, as we've

21
00:02:00,330 --> 00:02:01,800
seen already in this course.

22
00:02:01,800 --> 00:02:06,770
And with ConvNets, we generally have two main sections.

23
00:02:06,780 --> 00:02:11,220
The first section is the feature extractor.

24
00:02:12,030 --> 00:02:18,270
And then as we go towards these final sections here we have the classifier.

25
00:02:18,480 --> 00:02:25,770
And so the very first thing the ConvNet will do is extract low-level features.

26
00:02:25,770 --> 00:02:33,330
And then as we go towards the end, we focus on extracting more high-level features.

27
00:02:33,720 --> 00:02:41,910
Notice how here, with this, let's pick out this feature right here, or this feature map.

28
00:02:41,910 --> 00:02:44,730
You see that we pick up these edges.

29
00:02:44,730 --> 00:02:51,120
You see that for other feature maps, we are actually filtering out some low-level features, or extracting these

30
00:02:51,120 --> 00:02:53,150
low level features from our input.

31
00:02:53,160 --> 00:03:00,810
Then as we get towards the end, we get more high level features like, for example, whole portions

32
00:03:00,810 --> 00:03:03,360
of the image like the tire.

33
00:03:03,600 --> 00:03:12,840
And then after this we generally have a classifier which now permits us to pick between a set of options

34
00:03:12,840 --> 00:03:22,320
which one the model thinks this image actually is. Now, because the ConvNet works this way, it means

35
00:03:22,320 --> 00:03:32,850
that if we have two data sets which are similar, then we could build some sort of feature extractor,

36
00:03:32,850 --> 00:03:41,130
or we could build a model which will extract features from this very large data set.

37
00:03:41,310 --> 00:03:53,220
And then because these weights have been tuned, or have been trained, such that they extract features correctly,

38
00:03:53,220 --> 00:04:03,540
then when we pass in this very small dataset, this section of the model will do just its job, that

39
00:04:03,540 --> 00:04:09,450
is, of extracting useful features from this dataset.

40
00:04:10,260 --> 00:04:14,880
And you see that because these two datasets are similar, it is going to do a great job.

41
00:04:14,880 --> 00:04:27,450
And so we will not need a very large dataset in order to extract features from this small, 100-sample dataset

42
00:04:27,450 --> 00:04:28,170
right here.

43
00:04:29,190 --> 00:04:36,060
And so in fact, what we are saying is we have a model, let's say we have this model, then we have

44
00:04:36,060 --> 00:04:42,240
this feature extractor unit and then we have this classifier unit, which we've seen already.

45
00:04:43,720 --> 00:04:47,890
Generally, this starts after the flattening or the global average pooling.

46
00:04:47,890 --> 00:04:50,110
And then here we have some dense layers.

47
00:04:50,110 --> 00:04:56,380
While with this we have a ConvNet, or some convolutional layers with some max pool and batch norm

48
00:04:56,380 --> 00:04:57,060
layers.

49
00:04:57,070 --> 00:05:05,410
So what we say now is we have our small data set, which we're going to pass in here, and then we'll

50
00:05:05,410 --> 00:05:06,730
get to this point.

51
00:05:06,730 --> 00:05:15,160
So it's here that we're going to get this output from our feature extractor unit.

52
00:05:15,430 --> 00:05:23,350
And then, since generally we pre-trained this model, and note the word pre-trained,

53
00:05:23,350 --> 00:05:29,050
because we are using it for the first time in this course, we pre-trained,

54
00:05:29,050 --> 00:05:37,720
pre meaning that we did the training before, we pre-trained the model on a large or relatively large dataset

55
00:05:37,720 --> 00:05:40,090
like, for example, ImageNet.

56
00:05:40,150 --> 00:05:48,130
Let's suppose that we pre-trained on some large dataset with 1 billion images. Then this

57
00:05:48,460 --> 00:05:54,550
unit here has learned to extract features from whatever image you give it.

58
00:05:54,550 --> 00:06:02,320
And so now when you come with just 100 images, what it does is it extracts those features.

59
00:06:02,320 --> 00:06:09,250
And then, since at the level of this classifier we have a different setup from the pre-training,

60
00:06:09,370 --> 00:06:12,400
we now have to modify this classifier.

61
00:06:12,400 --> 00:06:20,680
So this means that, let's suppose that before, after the global pooling,

62
00:06:20,680 --> 00:06:30,730
we had, say, a 100-unit dense layer and then a 1000-output dense layer right here. Then in our case, where

63
00:06:30,730 --> 00:06:39,040
we have just three outputs, what we'll do now is we'll simply replace this. Here we'd take all this

64
00:06:39,040 --> 00:06:49,060
off, and then now we may pass this, say, directly to the three-output dense layer, or we pass this

65
00:06:49,060 --> 00:06:56,860
first to, say, a 128-unit layer and then to this three-output dense layer.

66
00:06:56,860 --> 00:07:07,270
So from here we see that this new model will focus more on the classification while allowing the previous

67
00:07:07,270 --> 00:07:16,900
pre-trained model to take care of extracting these features from our data.

68
00:07:18,040 --> 00:07:24,040
Now, it should be noted that we generally use this concept of transfer learning when we have a very

69
00:07:24,040 --> 00:07:25,240
small dataset.

70
00:07:25,240 --> 00:07:33,310
And obviously, since deep learning models perform best with larger datasets, we want to get the best out

71
00:07:33,310 --> 00:07:33,910
of them.

72
00:07:33,910 --> 00:07:36,280
And so we want to use transfer learning

73
00:07:36,280 --> 00:07:44,350
when we have a small dataset, as we've said, and we have this model which has been pre-trained to

74
00:07:44,350 --> 00:07:49,930
extract these useful features from those kinds of images.

75
00:07:50,110 --> 00:07:55,930
So this simply means that we should have two similar data sets.

76
00:07:56,140 --> 00:08:05,560
Another advantage of working with transfer learning is that you get to gain in terms of training

77
00:08:05,560 --> 00:08:14,440
compute cost. That is, this model which was pre-trained may have been trained for, say, three days, and then

78
00:08:14,440 --> 00:08:21,550
now all you need to do is just get this pre-trained model and then apply transfer learning on your own

79
00:08:21,550 --> 00:08:23,920
specific and smaller task.

80
00:08:24,700 --> 00:08:32,200
And so when you're running on a limited budget, you find that working with pre-trained models is going

81
00:08:32,200 --> 00:08:33,670
to be really helpful.

82
00:08:34,840 --> 00:08:42,250
Now, apart from transfer learning, we also have fine tuning, which is quite similar, except

83
00:08:42,250 --> 00:08:51,550
that, unlike with transfer learning, where we have here this feature extractor's weights,

84
00:08:51,550 --> 00:08:59,410
which are fixed, and during training we update only the weights of this classification section, with fine

85
00:08:59,410 --> 00:09:00,000
tuning,

86
00:09:00,010 --> 00:09:06,470
what we could do is also update the weights of this feature extractor section.

87
00:09:06,490 --> 00:09:11,350
Now generally we start fine tuning from the top, so we suppose that this is the bottom here.

88
00:09:11,350 --> 00:09:14,770
So we have the input and then we have this final layer.

89
00:09:14,770 --> 00:09:21,250
So you will see that we start fine tuning from these final layers, going towards these initial layers.

90
00:09:21,250 --> 00:09:23,890
So here, we have this fine tuning process.

91
00:09:23,890 --> 00:09:26,350
We could take this first, this top layer.

92
00:09:26,350 --> 00:09:34,000
So that is, we fix these layers here, we keep these layers fixed so the weights aren't updated during

93
00:09:34,000 --> 00:09:37,540
the training process, and then we update these weights.

94
00:09:37,540 --> 00:09:40,450
Obviously, we were updating these weights already.

95
00:09:40,450 --> 00:09:42,550
But the difference is that these weights were

96
00:09:42,620 --> 00:09:44,870
initialized from scratch.

97
00:09:44,870 --> 00:09:53,870
That is, we randomly initialized these weights, whereas these weights are initialized from the pre-trained

98
00:09:53,870 --> 00:09:54,710
model.

99
00:09:54,950 --> 00:09:59,450
Now these weights here are from the pre-trained model, but they are not trained.

100
00:09:59,810 --> 00:10:06,200
Now, again, depending on the kind of results we get, meaning that if we apply fine

101
00:10:06,200 --> 00:10:11,180
tuning to these top layers, or to these final layers, and we get better results, you see that what we could

102
00:10:11,180 --> 00:10:19,400
do is we could keep fine tuning, so we could keep increasing this section of weights we could

103
00:10:19,400 --> 00:10:21,410
update in the feature extractor unit.

104
00:10:21,410 --> 00:10:24,050
So here we'd take this off, take this off.

105
00:10:24,050 --> 00:10:27,470
And now we have this part which we can train.

106
00:10:27,470 --> 00:10:28,670
So this is trainable.

107
00:10:28,670 --> 00:10:31,190
This is not trainable, or not trained.

108
00:10:31,280 --> 00:10:33,270
So we have something like this.

109
00:10:33,290 --> 00:10:41,500
Now, it's also possible for us to go ahead and just say, okay, we're going to train our whole model.

110
00:10:41,510 --> 00:10:43,850
But while carrying out fine tuning,

111
00:10:43,880 --> 00:10:50,980
if there's one thing you need to note, it is that you have to use a very small learning rate.

112
00:10:50,990 --> 00:10:58,880
And the reason why you're doing this is to avoid disrupting these weight values, which have taken very

113
00:10:58,880 --> 00:11:01,400
much time to attain.

114
00:11:02,510 --> 00:11:07,940
And so as we do the fine tuning, we're going to update these weights, but very slowly.

115
00:11:08,180 --> 00:11:16,040
And by updating these weights very slowly, we mean we're going to choose a very small learning

116
00:11:16,040 --> 00:11:22,490
rate and then observe how this affects our model's performance.
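
A minimal sketch of the fine tuning idea described above, assuming a Keras model. The helper names `unfreeze_top_layers` and `compile_for_fine_tuning`, the learning rate of 1e-5, and the categorical cross-entropy loss are illustrative assumptions, not values taken from the video.

```python
import tensorflow as tf


def unfreeze_top_layers(backbone, num_layers):
    """Make only the last `num_layers` layers of `backbone` trainable.

    "Top" here means the layers closest to the output, as in the video.
    """
    backbone.trainable = True
    cutoff = len(backbone.layers) - num_layers
    for index, layer in enumerate(backbone.layers):
        # Layers before the cutoff keep their pre-trained weights fixed.
        layer.trainable = index >= cutoff
    return backbone


def compile_for_fine_tuning(model, learning_rate=1e-5):
    """Compile with a very small learning rate (assumed value) so that
    the pre-trained weights are only nudged, not disrupted."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",  # assumed loss for one-hot labels
        metrics=["accuracy"],
    )
    return model
```

If the results improve, `num_layers` can be increased step by step so that a larger section of the feature extractor gets updated.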

117
00:11:23,240 --> 00:11:28,790
At this point, we'll get straight to the code and we'll look at some pre-trained models.

118
00:11:28,790 --> 00:11:33,080
So here, as you can see, you get into these TensorFlow applications.

119
00:11:33,080 --> 00:11:35,090
You have the ConvNeXt models,

120
00:11:35,090 --> 00:11:41,360
DenseNet models, EfficientNet models, EfficientNetV2. We've seen the EfficientNet already. The Inception

121
00:11:41,360 --> 00:11:49,130
nets, MobileNet, which we've seen, MobileNetV2 and V3, NASNet, and the famous ResNets, which

122
00:11:49,130 --> 00:11:53,600
we've seen already, along with the VGGs and Xception.

123
00:11:53,600 --> 00:11:58,430
So here you have the choice of picking any one of these.

124
00:11:58,430 --> 00:12:05,480
We're going to go straight to the EfficientNet. So we could have this here, or you could

125
00:12:05,480 --> 00:12:09,140
pick the EfficientNetV2; you could pick any one of these.

126
00:12:09,140 --> 00:12:17,750
So here we're going to pick the EfficientNetB4, this one, since it has slightly fewer parameters

127
00:12:17,750 --> 00:12:24,590
compared to the ResNet50, and it outperforms the ResNet50 by a very large margin.

128
00:12:24,590 --> 00:12:27,140
So we're going to pick this EfficientNetB4.

129
00:12:27,560 --> 00:12:33,860
And one great thing about TensorFlow is that you can use these models

130
00:12:33,860 --> 00:12:36,530
without having to code them from scratch.

131
00:12:36,530 --> 00:12:41,750
So as you can see here, we have this tf.keras.applications.efficientnet.EfficientNetB4.

132
00:12:41,750 --> 00:12:43,850
And then we just have these arguments. With this,

133
00:12:43,850 --> 00:12:47,630
we're going to define our EfficientNet model.

134
00:12:47,630 --> 00:12:51,710
We paste this right here and we'll call this the backbone.

135
00:12:51,950 --> 00:12:58,940
So now here, we're not going to include the top. Recall that, as we had seen here,

136
00:12:58,940 --> 00:13:04,370
we have this whole model.

137
00:13:04,370 --> 00:13:09,890
And then, let's take this fine tuning part off, we have this whole model.

138
00:13:09,890 --> 00:13:15,140
And then what we are interested in is this feature extractor unit.

139
00:13:15,140 --> 00:13:18,830
So we'll set include_top to False.

140
00:13:18,830 --> 00:13:19,730
Let's get back here.

141
00:13:19,730 --> 00:13:21,080
We have include_top.

142
00:13:21,080 --> 00:13:24,560
We're not going to include this, so we set this to False.

143
00:13:25,070 --> 00:13:31,550
Let's take this off, and then the weights have been pre-trained on the ImageNet data set.

144
00:13:32,480 --> 00:13:34,550
We want to take this input tensor.

145
00:13:34,550 --> 00:13:40,520
We have the input shape. Now, for this input shape, we have the configuration.

146
00:13:40,520 --> 00:13:43,340
So basically we have configuration.

147
00:13:43,340 --> 00:13:46,910
The image size, the image size.

148
00:13:48,050 --> 00:13:52,790
There we go: the image size from the configuration.

149
00:13:53,630 --> 00:13:58,400
And since we are not including the top, as we've said here, we're not going to take into consideration

150
00:13:58,400 --> 00:14:01,670
the classes nor the classifier activation.

151
00:14:01,670 --> 00:14:09,140
So we have that off, and then we could decide whether to include the pooling layer or not.

152
00:14:09,140 --> 00:14:13,970
And if we want to include the pooling layer, then we'll have to specify which pooling layer we want to work

153
00:14:13,970 --> 00:14:16,850
with, either the average or the max.

154
00:14:16,850 --> 00:14:22,610
So with this, we are just going to take this off and specify that later on.

155
00:14:22,610 --> 00:14:30,200
So here we have our backbone. We run the cell, and then we freeze it; what we call this is freezing.

156
00:14:30,200 --> 00:14:37,400
So what we do to freeze this backbone, such that the weights aren't updated during training, is simply

157
00:14:37,400 --> 00:14:41,510
set this here to False, that is,

158
00:14:41,670 --> 00:14:43,170
set its trainable

159
00:14:43,950 --> 00:14:45,840
parameter to False.

160
00:14:45,840 --> 00:14:48,010
So backbone.trainable

161
00:14:48,030 --> 00:14:49,560
= False.

162
00:14:49,560 --> 00:14:50,280
And that's it.

163
00:14:50,280 --> 00:14:53,360
So this is all we need to do to freeze our model.
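
The backbone definition and freezing step just described could look roughly like the sketch below. The function name `build_frozen_backbone` and the default image size of 256 are assumptions for illustration; the video reads the image size from its own configuration, whose value is not shown here.

```python
import tensorflow as tf


def build_frozen_backbone(image_size=256, weights="imagenet"):
    """Return an EfficientNetB4 feature extractor with frozen weights.

    image_size is an assumed value; pass weights=None to skip
    downloading the ImageNet weights (for a quick structural check).
    """
    backbone = tf.keras.applications.efficientnet.EfficientNetB4(
        include_top=False,  # drop the original 1000-class classifier
        weights=weights,    # "imagenet" loads the pre-trained weights
        input_shape=(image_size, image_size, 3),
    )
    # Freezing: the backbone's weights are not updated during training.
    backbone.trainable = False
    return backbone
```

Leaving `pooling` unset means the backbone returns feature maps, so the pooling layer can be added explicitly in the new classifier.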

164
00:14:53,370 --> 00:15:00,540
Now, having frozen our model, the next thing to do is to add these other layers right here, all these other

165
00:15:00,540 --> 00:15:01,140
layers.

166
00:15:01,140 --> 00:15:02,820
So we'll go straight away.

167
00:15:03,240 --> 00:15:07,710
Then here we're going to define this input with the image size.

168
00:15:08,010 --> 00:15:12,600
Now, once we have this input, we now pass in or we have the backbone.

169
00:15:12,600 --> 00:15:20,970
So we have the backbone here, which has been defined, and its parameters have been set to be frozen.

170
00:15:20,970 --> 00:15:24,510
And then from here we have the global average pooling.

171
00:15:24,510 --> 00:15:27,330
So we have global average pooling here.

172
00:15:27,360 --> 00:15:30,370
Then we now have this dense layer.

173
00:15:30,390 --> 00:15:39,480
Now in the configuration, we set the number of units in dense layer 1 to 1024 and the number of units in dense layer 2

174
00:15:39,480 --> 00:15:41,040
to 128.

175
00:15:41,040 --> 00:15:43,590
So let's get back to this.

176
00:15:43,650 --> 00:15:44,610
There we go.

177
00:15:44,970 --> 00:15:46,020
We have that.

178
00:15:46,020 --> 00:15:50,550
And then from here, we're going to have a batch normalization layer.

179
00:15:52,350 --> 00:15:53,330
There we go.

180
00:15:53,340 --> 00:15:55,650
Let's copy this, paste it out here.

181
00:15:55,650 --> 00:16:00,750
We have another dense layer now, this time around, the second one, that's fine.

182
00:16:00,750 --> 00:16:05,820
And then finally, we have this dense layer with softmax activation.

183
00:16:05,850 --> 00:16:08,430
So here we have our softmax activation.

184
00:16:08,430 --> 00:16:11,010
And then this is the number of classes.

185
00:16:12,090 --> 00:16:15,030
Here we have the number of classes, and that's fine.
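
A sketch of the classifier stacked on the frozen backbone, following the layer order given above (input, backbone, global average pooling, dense, batch normalization, dense, softmax output). The function name `build_transfer_model` and the ReLU activations on the hidden dense layers are assumptions, since the video does not state them.

```python
import tensorflow as tf


def build_transfer_model(backbone, image_size, num_classes,
                         dense_1=1024, dense_2=128):
    """Stack a new classifier on top of a (frozen) feature extractor."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(image_size, image_size, 3)),
        backbone,                                  # pre-trained features
        tf.keras.layers.GlobalAveragePooling2D(),  # feature maps -> vector
        tf.keras.layers.Dense(dense_1, activation="relu"),  # assumed ReLU
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Dense(dense_2, activation="relu"),  # assumed ReLU
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
```

Only the layers added after the backbone are trained; the backbone passed in is expected to have `trainable = False` already.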

186
00:16:15,030 --> 00:16:20,490
So let's take this off now and then run this cell right here.

187
00:16:21,270 --> 00:16:25,560
We get this error because we have to specify this this way.

188
00:16:25,890 --> 00:16:27,960
That's fine and that's it.

189
00:16:27,960 --> 00:16:34,770
So now we have this model, you see total parameters, 19 million, the trainable parameters, just

190
00:16:34,770 --> 00:16:36,660
1.9 million.

191
00:16:36,660 --> 00:16:40,320
And the non trainable parameters are 17.6 million.

192
00:16:40,320 --> 00:16:44,160
So it means that the backbone itself is 17.6 million.

193
00:16:44,160 --> 00:16:50,050
And then these additional layers here come with the remaining 1.9 million parameters.
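
The 1.9 million trainable parameters can be checked by hand. This sketch assumes the new layers are Dense(1024), BatchNormalization, Dense(128), and a 3-class output, fed by the 1792 feature channels that EfficientNetB4 produces without its top.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out


FEATURES = 1792      # channels after global average pooling (EfficientNetB4)
NUM_CLASSES = 3      # three outputs, as stated earlier in the session

dense_1 = dense_params(FEATURES, 1024)    # 1,836,032
batch_norm_trainable = 2 * 1024           # gamma and beta
dense_2 = dense_params(1024, 128)         # 131,200
output = dense_params(128, NUM_CLASSES)   # 387

head_trainable = dense_1 + batch_norm_trainable + dense_2 + output
print(f"{head_trainable:,}")  # 1,969,667
```

The batch normalization layer also holds 2 × 1024 non-trainable moving statistics, which are counted together with the frozen backbone.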

194
00:16:50,070 --> 00:16:57,240
Now, with this, we have our model already set, you see, with minimal code, and we could go ahead

195
00:16:57,240 --> 00:16:58,530
to start the training.

196
00:16:59,250 --> 00:17:02,970
Then here we're going to start training our model. Again,

197
00:17:02,970 --> 00:17:07,530
we compile the model and we run the training process.

198
00:17:07,770 --> 00:17:09,390
Now that training is over,

199
00:17:09,390 --> 00:17:12,120
we can go ahead and evaluate our model.

200
00:17:12,120 --> 00:17:13,770
So let's run this.

201
00:17:13,920 --> 00:17:15,570
And what do we get?

202
00:17:15,840 --> 00:17:25,170
We have close to 85% accuracy and 95.3% top-k accuracy.

203
00:17:25,830 --> 00:17:31,170
And this does slightly better than the previous models which we had worked with.

204
00:17:32,070 --> 00:17:33,420
Let's go ahead and test this.

205
00:17:33,420 --> 00:17:42,300
So we change this to our model and we run this here. We get an incompatible shape error.

206
00:17:42,300 --> 00:17:44,070
This.

207
00:17:44,070 --> 00:17:47,600
We are going to resize this before passing it into the model.

208
00:17:47,610 --> 00:17:56,580
So here we have this image, and just here we have, let's say, our test image, which we'll resize.

209
00:17:56,580 --> 00:18:08,760
So we use OpenCV to resize this image. We pass in our test image and then we specify the image size.

210
00:18:09,120 --> 00:18:15,830
So here we have the image size. Copy this, and there we go.

211
00:18:15,840 --> 00:18:18,510
Let's run this again and here's what we get.

212
00:18:18,570 --> 00:18:21,330
See, we have the sad output.

213
00:18:21,360 --> 00:18:25,440
Now, let's go ahead and check this out here.

214
00:18:26,190 --> 00:18:27,450
We run this.

215
00:18:29,960 --> 00:18:30,950
Here, we have one

216
00:18:30,950 --> 00:18:31,760
miss.

217
00:18:33,320 --> 00:18:33,610
No

218
00:18:33,620 --> 00:18:34,190
miss.

219
00:18:34,220 --> 00:18:34,520
No

220
00:18:34,550 --> 00:18:35,240
miss.

221
00:18:35,360 --> 00:18:36,400
The second miss.

222
00:18:36,410 --> 00:18:36,650
Here,

223
00:18:36,650 --> 00:18:37,280
we have two

224
00:18:37,280 --> 00:18:38,210
misses.

225
00:18:38,330 --> 00:18:39,620
And that's it.

226
00:18:39,620 --> 00:18:43,190
So out of the 16, we have two

227
00:18:43,190 --> 00:18:44,000
misses.

228
00:18:44,090 --> 00:18:45,980
That is 14

229
00:18:45,980 --> 00:18:57,920
divided by 16, about 87.5% accuracy on this small batch of images which we took from the validation

230
00:18:57,920 --> 00:18:58,490
dataset.
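
The arithmetic behind that figure, as a tiny sketch; the helper name `batch_accuracy` is only for illustration.

```python
def batch_accuracy(num_samples, num_misses):
    """Fraction of correctly classified samples in a batch."""
    return (num_samples - num_misses) / num_samples


print(f"{batch_accuracy(16, 2):.1%}")  # 87.5%
```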

231
00:18:59,330 --> 00:19:01,790
We go ahead and check out the confusion matrix.

232
00:19:01,820 --> 00:19:04,730
Oops, let's run this, run this.

233
00:19:05,420 --> 00:19:10,100
Let's get back and use our confusion matrix.

234
00:19:10,100 --> 00:19:12,440
We get even better results.

235
00:19:13,040 --> 00:19:18,950
But one thing we have to note here is that our dataset was not that small.

236
00:19:18,950 --> 00:19:28,130
And so we may not see this change or this difference between training from scratch and using transfer

237
00:19:28,160 --> 00:19:28,830
learning.

238
00:19:28,830 --> 00:19:35,270
So what we'll do is we'll take just, say, ten batches; that's 320.

239
00:19:35,270 --> 00:19:43,160
So we'll have a data set of 320 data points and then we'll see the difference when we train from scratch

240
00:19:43,160 --> 00:19:47,450
and when we train with a pre-trained model.
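
One way to carve out such a small training set, assuming the training data is a batched `tf.data.Dataset` with a batch size of 32 (implied by ten batches giving 320 samples). The helper name `take_small_subset` is hypothetical.

```python
import tensorflow as tf


def take_small_subset(batched_dataset, num_batches=10):
    """Keep only the first `num_batches` batches of a batched dataset."""
    return batched_dataset.take(num_batches)


# Stand-in data for illustration: 1000 dummy samples in batches of 32.
dummy_dataset = tf.data.Dataset.range(1000).batch(32)
small_dataset = take_small_subset(dummy_dataset)
num_samples = sum(int(batch.shape[0]) for batch in small_dataset)
print(num_samples)  # 320
```

Validation can still run on the full validation dataset, so both models are compared on the same data.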

241
00:19:47,630 --> 00:19:51,190
So right here, let's get back to this.

242
00:19:51,200 --> 00:19:55,670
We take this down, and we could use any of these.

243
00:19:55,670 --> 00:20:00,170
So let's pick the LeNet, which is quite simple.

244
00:20:00,170 --> 00:20:12,170
Pick the LeNet, and then here we scroll down. We have training, the loss function the same,

245
00:20:12,170 --> 00:20:15,680
metrics the same, and here we have the LeNet model.

246
00:20:15,680 --> 00:20:20,570
So we run that with the LeNet model.

247
00:20:21,230 --> 00:20:22,340
That's fine.

248
00:20:22,790 --> 00:20:28,700
And so here we'll train on this small part and validate on the full validation data set.

249
00:20:28,730 --> 00:20:30,980
We'll do this for just 20 epochs.

250
00:20:31,790 --> 00:20:40,400
So after training for 20 epochs on that very small data set, you see the model doesn't perform well.

251
00:20:40,400 --> 00:20:46,880
You see it doesn't even get up to 50% validation accuracy while the training accuracy keeps increasing.

252
00:20:46,880 --> 00:20:48,470
So the model's overfitting.

253
00:20:48,500 --> 00:20:53,120
Now, from here, we are now going to change this.

254
00:20:53,120 --> 00:20:57,110
So we're going to use our model, this pre-trained model.

255
00:20:57,110 --> 00:20:59,690
So we have this here; we just run this again.

256
00:20:59,690 --> 00:21:06,830
So we initialize these parameters and then we'll compile the model.

257
00:21:07,130 --> 00:21:08,150
This is model.

258
00:21:08,150 --> 00:21:12,590
So let's call this the pre-trained model.

259
00:21:12,590 --> 00:21:16,340
Let's change this name here to pre-trained model.

260
00:21:16,580 --> 00:21:17,390
There we go.

261
00:21:17,390 --> 00:21:26,240
We have our pre-trained model and we'll get the pre-trained model summary.

262
00:21:27,230 --> 00:21:28,670
That's fine.

263
00:21:29,720 --> 00:21:30,800
Yeah, we have that.

264
00:21:30,800 --> 00:21:36,420
And then here we have the pre-trained model.

265
00:21:36,520 --> 00:21:37,030
Okay.

266
00:21:37,040 --> 00:21:43,190
So here, we're going to run this pre-trained model compile and then we'll start with the training again.

267
00:21:43,970 --> 00:21:53,210
So just recall that we had the validation accuracy below 50% previously, and now we're

268
00:21:53,210 --> 00:21:57,590
going to check out our validation accuracy when working with the pre-trained model.

269
00:21:57,620 --> 00:22:06,650
But already, one thing you could notice just after two epochs, like here, see, after these two epochs,

270
00:22:06,650 --> 00:22:13,490
we see this validation accuracy, which is already greater than 50%; even from the first

271
00:22:13,490 --> 00:22:15,290
epoch it was already greater than 50%.

272
00:22:15,320 --> 00:22:21,230
It shows you the power of working with pre-trained models as we are now making use of those extracted

273
00:22:21,230 --> 00:22:25,680
features to get this much better performing model.

274
00:22:25,710 --> 00:22:27,650
So here the accuracy keeps increasing.

275
00:22:28,880 --> 00:22:30,300
Now we're done with the training.

276
00:22:30,320 --> 00:22:41,330
You could see that this model, which before couldn't cross this 50% mark for the validation accuracy

277
00:22:41,330 --> 00:22:45,710
now is able to cross this mark as we'll see here.

278
00:22:45,750 --> 00:22:52,970
You see, we have the validation accuracy of 71% while just training on 320 data points.

279
00:22:53,120 --> 00:22:56,870
So let's get here; let's run this.

280
00:22:56,990 --> 00:22:59,090
And you could see what we have here.

281
00:22:59,120 --> 00:23:03,170
You see it gets just above 70%.

282
00:23:03,500 --> 00:23:12,410
So now with pre training, we get above 70% and we even got greater than 50% just from the very first

283
00:23:12,410 --> 00:23:13,130
epoch.

284
00:23:14,240 --> 00:23:21,140
And so what we could see from here is that the very first thing is to get as much clean data as possible.

285
00:23:21,140 --> 00:23:26,150
And if you can't lay hands on this, try some data augmentation.

286
00:23:26,150 --> 00:23:34,460
And then from there, if your dataset is still very small, you could then apply transfer learning.

287
00:23:35,600 --> 00:23:41,660
But if you have a relatively large data set, it will be needless applying transfer learning as training

288
00:23:41,660 --> 00:23:45,980
from scratch should normally get you better results.

289
00:23:46,940 --> 00:23:58,490
So here we could evaluate our model. You can see that this, our pre-trained model, with just ten

290
00:23:58,490 --> 00:24:06,170
out of 213 batches, produces 71.3% validation accuracy.
