1
00:00:01,670 --> 00:00:02,870
What's up, everyone?

2
00:00:02,870 --> 00:00:08,330
And welcome to this new section in which we'll build callbacks with TensorFlow.

3
00:00:08,540 --> 00:00:16,070
In this section, we'll look at how to build a callback from scratch by inheriting from the Callback

4
00:00:16,070 --> 00:00:16,910
class.

5
00:00:16,910 --> 00:00:23,120
And then we'll use other callbacks made available by TensorFlow, like the CSV logger,

6
00:00:23,420 --> 00:00:30,650
early stopping, learning rate scheduler, model checkpointing, and finally the reduce learning rate on plateau

7
00:00:30,650 --> 00:00:31,430
callback.

9
00:00:38,730 --> 00:00:44,400
Callbacks are methods that are called during training, evaluation, or prediction.

10
00:00:44,970 --> 00:00:51,090
These callbacks let us extract useful information from the processes we just listed, or even

11
00:00:51,090 --> 00:00:53,820
carry out changes on those processes.

12
00:00:53,820 --> 00:00:59,190
Here in the TensorFlow documentation, we have tf.keras, and then we have the callbacks.

13
00:00:59,190 --> 00:01:02,580
So this is what we have for the documentation.

14
00:01:02,580 --> 00:01:09,000
We could go through each and every one of these, but for the sake of this course we are going to look

15
00:01:09,000 --> 00:01:10,650
at the key ones.

16
00:01:10,650 --> 00:01:13,650
So right here we have this callback.

17
00:01:13,650 --> 00:01:16,920
But before we get into this, we are going to look at the History callback.

18
00:01:16,920 --> 00:01:21,060
The reason why we're looking at history first is actually because we've used it already.

19
00:01:21,060 --> 00:01:25,830
So we used this callback without really knowing that we were using callbacks.

20
00:01:26,430 --> 00:01:32,070
So here, we're told this is a callback that records events into a History object.

21
00:01:32,070 --> 00:01:35,520
So remember the times we were having this here?

22
00:01:35,520 --> 00:01:43,020
So while we're training, we store some information in the history.

23
00:01:43,020 --> 00:01:50,100
And then after training, we're able to come up with plots like this because we have this information

24
00:01:50,100 --> 00:01:51,960
stored in this history right here.

25
00:01:52,050 --> 00:01:58,620
So that said, as you can see in this example, which is similar to what we've been seeing

26
00:01:58,620 --> 00:02:06,690
so far, we have this history, and it collects all the values we've been storing during training, and then we

27
00:02:06,690 --> 00:02:13,530
could print out history.params to get the params, and then we could also get the keys.

28
00:02:14,160 --> 00:02:15,810
So that's how this works.

29
00:02:15,810 --> 00:02:17,250
We've seen this already.

30
00:02:17,250 --> 00:02:22,800
And then we could look at this callback class right here.

31
00:02:22,830 --> 00:02:29,880
Now, the reason why this is the most important of all these different callback classes is because

32
00:02:29,880 --> 00:02:37,530
this is kind of the master class, or the abstract base class, which can be used in building new

33
00:02:37,530 --> 00:02:38,250
callbacks.

34
00:02:38,250 --> 00:02:44,940
So if you want to build a callback which isn't listed here, you could always get back to this and then

35
00:02:44,940 --> 00:02:47,000
build that callback from scratch.

36
00:02:47,010 --> 00:02:55,410
So here we have this callback class which has attributes, as you could see, params and model, and

37
00:02:55,410 --> 00:03:01,110
then it also has these methods, which we'll see how to implement very easily.

38
00:03:01,110 --> 00:03:07,680
If you look at on_batch_begin and on_batch_end, you'll see that they take similar arguments;

39
00:03:07,680 --> 00:03:11,250
here we have the batch, and then we have the logs.

40
00:03:11,250 --> 00:03:20,100
Then on_epoch_begin and on_epoch_end actually take in the epoch and the logs. That said,

41
00:03:20,100 --> 00:03:22,530
we'll import the Callback class.

42
00:03:22,530 --> 00:03:33,600
So here, from tensorflow.keras.callbacks, we're going to import Callback, this class which we're

43
00:03:33,600 --> 00:03:36,120
going to be using in creating our callbacks.

44
00:03:36,120 --> 00:03:42,510
We run that, then we define this callback class, which we'll call LossCallback.

45
00:03:42,510 --> 00:03:49,290
We have this LossCallback class, which inherits from the Callback class we've just imported, and then

46
00:03:49,290 --> 00:03:54,140
we will make use of the different methods which have been given to us in the documentation.

47
00:03:54,150 --> 00:03:58,470
Let's start with on_epoch_end.

48
00:03:58,470 --> 00:03:58,800
So, on_epoch_end.

49
00:03:58,800 --> 00:04:00,480
And we take in the epoch.

50
00:04:00,480 --> 00:04:08,250
And then what we're going to do is print out the loss values at the end of an epoch.

51
00:04:08,250 --> 00:04:11,910
So what we're doing is similar to what we already have here.

52
00:04:11,910 --> 00:04:19,110
So let's do that and have print epoch number.

53
00:04:19,800 --> 00:04:30,270
Let's say epoch number this, and then "has a loss of". So: for epoch number

54
00:04:30,270 --> 00:04:41,640
this, the model has a loss of this. And then we just format. So there we go: we pass in the epoch,

55
00:04:41,640 --> 00:04:45,540
let's have this here, and the logs.

56
00:04:45,540 --> 00:04:50,970
So just as in the documentation, where we had the epoch and the logs, now we are going to pass in

57
00:04:50,970 --> 00:04:53,160
the epoch and then the logs.

58
00:04:53,160 --> 00:04:57,330
But since we want to get just the loss, we could get the loss from this.

59
00:04:57,330 --> 00:05:01,290
So we have this dictionary here and we pick out just the loss.
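As a rough sketch of the callback described so far, assuming TensorFlow 2.x is installed: the class name LossCallback and the message wording follow the narration, while the hand-written logs dictionary at the end is a made-up value that only illustrates what Keras would pass in.
```python
from tensorflow.keras.callbacks import Callback
class LossCallback(Callback):
    """Prints the loss found in `logs` at the end of every epoch."""
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # `logs` is a dictionary of metric values; we pick out just the loss.
        print("For epoch number {}, the model has a loss of {}".format(epoch, logs.get("loss")))
# Illustration only: call the hook by hand with a made-up logs dictionary.
loss_callback = LossCallback()
loss_callback.on_epoch_end(0, {"loss": 0.5})
```
During real training, Keras calls on_epoch_end itself and fills in the logs dictionary.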

60
00:05:01,320 --> 00:05:10,560
Now let's run this and then we'll see how to include a callback in the training process.

61
00:05:10,560 --> 00:05:13,650
So just right here, we're going to have callbacks.

62
00:05:13,650 --> 00:05:16,770
So the callbacks argument and we have this list.

63
00:05:16,770 --> 00:05:20,460
So in this list we're going to insert this callback we've just created here.

64
00:05:20,460 --> 00:05:24,390
So we have the LossCallback, which we just pass in here, and that's fine.

65
00:05:24,390 --> 00:05:26,130
So now let's rerun this.

66
00:05:26,130 --> 00:05:30,090
Let's take this for, say, three epochs and then see what we get.

67
00:05:30,420 --> 00:05:35,670
We get this error. We click "Search Stack Overflow" and then click on this.

68
00:05:35,880 --> 00:05:37,810
You see that we have a

69
00:05:38,100 --> 00:05:39,380
solution

70
00:05:39,390 --> 00:05:47,760
just here, and what is said here is that what you're going to pass is the object and not the class itself.

71
00:05:47,760 --> 00:05:53,340
So that said, instead of passing the LossCallback class as we just did, we're going to pass in an object.

72
00:05:53,340 --> 00:05:55,580
So we're going to put in the parentheses.
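A small sketch of that fix, assuming TensorFlow 2.x: the toy data and the one-layer model are hypothetical stand-ins for the course model, there only so that fit() has something to train on.
```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import Callback
class LossCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        print("For epoch number {}, the model has a loss of {}".format(epoch, (logs or {}).get("loss")))
# Hypothetical toy data and model (not the course model).
x = np.random.rand(64, 4).astype("float32")
y = np.random.randint(0, 2, size=(64, 1)).astype("float32")
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy")
# Pass an instance, LossCallback(), and not the class LossCallback itself.
history = model.fit(x, y, epochs=3, batch_size=32, verbose=0, callbacks=[LossCallback()])
```
The callbacks argument takes a list, so several callback objects can be passed together.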

73
00:05:55,590 --> 00:05:59,280
Let's modify this print here.

74
00:06:01,030 --> 00:06:02,650
And add some space.

75
00:06:02,650 --> 00:06:08,410
So we're going to have that go to the next line, and then we run that again.

76
00:06:08,410 --> 00:06:10,690
So let's run that and that's fine.

77
00:06:11,230 --> 00:06:17,680
Then, after training for three epochs, we see that, unlike before, where we just had

78
00:06:17,680 --> 00:06:24,730
this output, now we have this message output; that is, for epoch number zero,

79
00:06:24,760 --> 00:06:29,530
the model has a loss of this, then the same for epoch number one, and then for epoch number two.

80
00:06:29,560 --> 00:06:32,080
Now what we could do is we could add this.

81
00:06:32,080 --> 00:06:36,850
So we have this formatted normally, so we don't start from zero, but instead from one.

82
00:06:36,850 --> 00:06:38,650
So we could say plus one right here.

83
00:06:38,650 --> 00:06:39,580
So that's it.

84
00:06:39,610 --> 00:06:48,940
Now another thing we could do is have on_batch_end. So, on_batch_end, we have that, and then

85
00:06:48,940 --> 00:06:51,880
we have the batch, and we pass in the logs.

86
00:06:51,880 --> 00:06:55,870
This time around, what we want to do is kind of similar to what we've seen already here.

87
00:06:55,870 --> 00:06:58,240
So we just print this out.

88
00:06:58,330 --> 00:06:59,350
That's fine.

89
00:06:59,590 --> 00:07:00,520
Take that off.

90
00:07:00,520 --> 00:07:06,310
So for the batch number, for this batch number, the model has a loss of this.

91
00:07:06,310 --> 00:07:08,650
So there we take in the batch.

92
00:07:08,650 --> 00:07:09,640
So that's it.

93
00:07:09,640 --> 00:07:11,620
And then we log out the loss.

94
00:07:11,650 --> 00:07:14,170
Now, here, let's just pick out this loss.

95
00:07:14,170 --> 00:07:14,710
Totally.

96
00:07:14,710 --> 00:07:17,740
So we have the logs, and then that's fine.
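Here is a sketch of the callback with both hooks, assuming TensorFlow 2.x. The plus one on the batch number is an assumption that mirrors what was done for the epoch number, and the logs dictionaries passed by hand are made-up values.
```python
from tensorflow.keras.callbacks import Callback
class LossCallback(Callback):
    def on_epoch_end(self, epoch, logs=None):
        # Epochs are counted from zero, so we add one for display.
        print("\n For epoch number {}, the model has a loss of {}".format(epoch + 1, (logs or {}).get("loss")))
    def on_batch_end(self, batch, logs=None):
        # Called after every training batch; assumed here to be displayed from one as well.
        print("\n For batch number {}, the model has a loss of {}".format(batch + 1, (logs or {}).get("loss")))
# Illustration only: call both hooks by hand with made-up values.
callback = LossCallback()
callback.on_batch_end(0, {"loss": 0.7})
callback.on_epoch_end(0, {"loss": 0.6})
```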

97
00:07:17,740 --> 00:07:24,520
So here, we've put in the batch, just like here where we pass in the epoch. We run this again and run

98
00:07:24,520 --> 00:07:26,110
this, run this.

99
00:07:27,220 --> 00:07:32,920
You'll notice that this time around we have much more information which has been logged out.

100
00:07:33,340 --> 00:07:36,730
Let's take this here and get right to the top.

101
00:07:36,730 --> 00:07:40,330
What we have here is for batch number one, the model has this loss.

102
00:07:40,330 --> 00:07:43,990
So you see that this is after each and every batch.

103
00:07:43,990 --> 00:07:45,910
So here's the first batch.

104
00:07:45,910 --> 00:07:51,880
We have this log, and then the next batch we have this log and so on and so forth.

105
00:07:52,660 --> 00:08:02,020
And since our batch size is equal to 32, this simply means that, after working on 32 different data points, we're

106
00:08:02,020 --> 00:08:04,930
going to have this logged out.

107
00:08:04,930 --> 00:08:09,040
And then for the next 32, we have this logged out, right up to the end.

108
00:08:09,280 --> 00:08:13,210
Now for the epochs, you will notice that as we go on.

109
00:08:13,210 --> 00:08:17,830
So we're moving on to let's get right up to 689.

110
00:08:19,450 --> 00:08:20,440
We have that.

111
00:08:20,440 --> 00:08:24,580
And finally here we have 689.

112
00:08:24,580 --> 00:08:24,930
Okay.

113
00:08:25,000 --> 00:08:30,310
So for this last one here, at the end of the epoch, we now have this logged out.

114
00:08:30,310 --> 00:08:35,710
So here we have "for epoch", whereas previously we had "for batch" for all the different batches.

115
00:08:36,340 --> 00:08:41,080
In case you want to understand how we got this 689 right here, you could take the

116
00:08:41,080 --> 00:08:47,410
total dataset size and then divide by the current batch size, which is 32.

117
00:08:47,440 --> 00:08:51,340
You should get this 689 which has been given right here.
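As a quick worked example of that division: the dataset size below is a hypothetical value, chosen only because it gives 689 batches, since the narration states just the batch size (32) and the batch count (689).
```python
import math
# Hypothetical training-set size; only the batch size and the batch count come from the narration.
dataset_size = 22046
batch_size = 32
# The last batch may be partial, so the number of batches per epoch is rounded up.
steps_per_epoch = math.ceil(dataset_size / batch_size)
print(steps_per_epoch)  # 689
```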

118
00:08:52,030 --> 00:08:54,820
So from this we can look at the CSV logger.

119
00:08:55,180 --> 00:09:01,330
Now with this CSV logger, what we're actually doing is logging out this information in

120
00:09:01,330 --> 00:09:05,200
a CSV file or in some file, which we're going to define.

121
00:09:05,980 --> 00:09:08,500
So what we do is just simply copy this.

122
00:09:08,500 --> 00:09:15,250
Now, this time around, we won't actually have to create a class as we just did with the Callback class,

123
00:09:15,250 --> 00:09:23,020
because that is a general way of creating callbacks. Now, with the CSV logger, we'll actually

124
00:09:23,020 --> 00:09:25,720
create the callback more easily.

125
00:09:25,720 --> 00:09:30,220
So here we have this and then we're going to specify the file name.

126
00:09:30,220 --> 00:09:38,470
So here let's have csv_callback equal to that. Let's take this off, and we specify our file name.

127
00:09:38,470 --> 00:09:46,270
So here, we just have our file, logs.csv, and that should be fine.

128
00:09:46,270 --> 00:09:57,490
So we take that off, take that off, and then up here we have to import CSVLogger.

129
00:09:58,780 --> 00:09:59,710
That's fine.

130
00:09:59,800 --> 00:10:02,440
We run that everything is okay.

131
00:10:02,770 --> 00:10:09,950
We get back to our callback and then we run this cell right here.

132
00:10:09,970 --> 00:10:19,180
Now, note that this append is, as described in the documentation, a boolean which tells us whether

133
00:10:19,180 --> 00:10:27,550
the logs we are currently putting in the CSV file, or in the file in general, are going to be appended

134
00:10:27,550 --> 00:10:30,280
to previously logged content or not.

135
00:10:30,280 --> 00:10:38,650
So when we have this as False, we're supposing that this is empty, and so we are going to be putting

136
00:10:38,650 --> 00:10:41,290
information in this file for the very first time.

137
00:10:41,290 --> 00:10:49,090
So we've run this and then now all we need to do to take this new callback into consideration is to

138
00:10:49,090 --> 00:10:52,030
have here the csv_callback.

139
00:10:52,030 --> 00:11:00,010
So let's take these batch logs out from here. We are not going to take this loss

140
00:11:00,010 --> 00:11:00,490
callback

141
00:11:00,570 --> 00:11:01,440
into consideration.

142
00:11:01,440 --> 00:11:08,700
And with the logger, we run this. After three epochs, we're going to open this up, and then we have this

143
00:11:08,700 --> 00:11:11,400
logs.csv file which has been created.

144
00:11:11,400 --> 00:11:12,600
So let's have that.

145
00:11:12,600 --> 00:11:19,380
And as you can see, we have this file, which we could now download and then view later.

146
00:11:19,380 --> 00:11:28,140
So here we have the accuracy, AUC, and all the other metrics and loss values which we want to store.

147
00:11:28,290 --> 00:11:29,610
So that's fine.

148
00:11:29,610 --> 00:11:36,810
The next thing we could do is get back here again and then set append

149
00:11:36,810 --> 00:11:37,640
to True.

150
00:11:37,650 --> 00:11:42,750
So suppose we've done training the first time and want to redo the training process.

151
00:11:42,750 --> 00:11:45,900
We don't want to erase all the values we had previously.

152
00:11:45,900 --> 00:11:48,990
So here we set append to True and run this.

153
00:11:48,990 --> 00:11:54,810
Again, after three epochs, we open this logs.csv file.

154
00:11:54,810 --> 00:11:56,040
And what do we have here?

155
00:11:56,040 --> 00:12:04,110
You see, we have this new information, which has just been appended to the previous information from

156
00:12:04,110 --> 00:12:05,550
this CSV logger.
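A minimal sketch of the two runs just described, assuming TensorFlow 2.x. The toy data, the one-layer model, and the temporary directory are hypothetical stand-ins; only the CSVLogger arguments (filename, separator, append) come from the documentation being read.
```python
import os
import tempfile
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import CSVLogger
# Hypothetical toy data and model (not the course model).
x = np.random.rand(64, 4).astype("float32")
y = np.random.randint(0, 2, size=(64, 1)).astype("float32")
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1, activation="sigmoid")])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
log_path = os.path.join(tempfile.mkdtemp(), "logs.csv")
# First run: append=False starts the file from scratch.
model.fit(x, y, epochs=3, verbose=0, callbacks=[CSVLogger(log_path, separator=",", append=False)])
# Second run: append=True keeps the earlier rows and writes the new ones below them.
model.fit(x, y, epochs=3, verbose=0, callbacks=[CSVLogger(log_path, separator=",", append=True)])
with open(log_path) as f:
    rows = f.read().strip().splitlines()
print(rows[0])    # header line, starting with the epoch column
print(len(rows))  # header plus three rows from each run
```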

157
00:12:05,550 --> 00:12:09,810
We could now move on to this early stopping callback right here.

158
00:12:11,430 --> 00:12:13,770
To better understand early stopping,

159
00:12:13,770 --> 00:12:17,100
let's get back to these plots which we had previously.

160
00:12:17,100 --> 00:12:24,030
So let's take this plot of the model's accuracy, where we see how the training accuracy keeps increasing

161
00:12:24,030 --> 00:12:29,000
while, after a certain point, let's say this point here,

162
00:12:29,010 --> 00:12:35,730
let's take this off, after, say, this point, or even this point here,

163
00:12:37,200 --> 00:12:42,390
our model's validation accuracy doesn't increase any further.

164
00:12:42,390 --> 00:12:45,990
So what we have here is something like this.

165
00:12:45,990 --> 00:12:56,670
We have this training accuracy which increases and goes towards one, and then the validation accuracy,

166
00:12:56,670 --> 00:13:00,420
which is something like this.

167
00:13:00,510 --> 00:13:05,580
Now, in some other cases you will even have situations where this starts to drop.

168
00:13:05,670 --> 00:13:12,870
Nonetheless, in this case, we have this plot where it just stabilizes and doesn't increase

169
00:13:12,870 --> 00:13:13,890
any further.

170
00:13:13,920 --> 00:13:21,600
Now, note that this kind of situation is known as overfitting. In overfitting,

171
00:13:21,720 --> 00:13:27,810
the model starts to overfit the training data.

172
00:13:27,810 --> 00:13:34,110
So because the model has been trained on the training data and not on the validation data, at a certain

173
00:13:34,110 --> 00:13:42,480
point the model ceases to generalize. The aim of this training process is not to

174
00:13:42,480 --> 00:13:46,790
come up with a model which only performs well on the training data.

175
00:13:46,800 --> 00:13:53,040
We're trying to come up with a model which performs well on any type of data, be it the training, the

176
00:13:53,040 --> 00:13:55,070
validation or the test data.

177
00:13:55,080 --> 00:14:01,860
So if we're able to have a model which has the same performance with the training,

178
00:14:01,860 --> 00:14:06,110
the validation, and the test data, then that model is an ideal one.

179
00:14:06,120 --> 00:14:14,820
But in this case we see that, as we keep on training, the model's parameters have been modified to suit

180
00:14:14,820 --> 00:14:16,260
only the training data.

181
00:14:16,260 --> 00:14:24,810
And this is very dangerous because at a certain point you may feel like, because you're having a high training

182
00:14:25,170 --> 00:14:31,770
accuracy, your model is performing well, whereas this isn't the case, because when your model is

183
00:14:31,770 --> 00:14:41,040
shown new data, like the validation data in this case and the test data later on, it won't perform

184
00:14:41,040 --> 00:14:46,150
as well as it's doing with the training data.

185
00:14:46,170 --> 00:14:55,650
So to avoid this kind of misleading measurement, we tend to stop the training once this overfitting starts

186
00:14:55,650 --> 00:14:56,550
to occur.

187
00:14:56,670 --> 00:15:02,940
So this means that if you're training, and let's suppose we stop at

188
00:15:02,940 --> 00:15:03,720
this point here.

189
00:15:03,720 --> 00:15:11,760
So if you're training and your validation accuracy seems to be constant, whereas

190
00:15:12,000 --> 00:15:19,410
that of the training keeps increasing, then it's better for you to stop training at this point

191
00:15:19,410 --> 00:15:28,080
because after this point the model parameters are just being modified to suit the training data, and the model doesn't

192
00:15:28,080 --> 00:15:36,150
really generalize, which is the case here. We're trying to extract some information from this

193
00:15:36,150 --> 00:15:38,820
data and make the model intelligent.

194
00:15:38,820 --> 00:15:47,700
So the model doesn't become intelligent by only modifying its weights or parameters based on the data

195
00:15:47,700 --> 00:15:56,040
it's been trained upon. It is intelligent because, after being trained, it can perform well on data it has

196
00:15:56,040 --> 00:15:57,630
never, ever seen.

197
00:15:57,630 --> 00:16:05,790
So here, we have this early stopping, where, after we notice that the validation accuracy doesn't

198
00:16:05,790 --> 00:16:07,440
seem to increase any further,

199
00:16:07,440 --> 00:16:14,250
we just stop the training and then use the model parameters from this number of epochs.

200
00:16:14,250 --> 00:16:19,290
So we could say that after, say, 12 epochs, we just stop the training.

201
00:16:19,440 --> 00:16:24,240
Now let's take this off and then replicate something similar for the loss.

202
00:16:24,990 --> 00:16:31,740
So if we're having a loss, we could have something like this and then that.

203
00:16:31,740 --> 00:16:35,700
So yeah, we have the number of epochs and then we have the loss.

204
00:16:35,700 --> 00:16:41,850
So you could have a situation where you have your training loss, which keeps

205
00:16:42,390 --> 00:16:47,550
reducing, whereas for the validation loss you would have something like this.

206
00:16:47,550 --> 00:16:53,410
So this is a typical plot for overfitting.

207
00:16:53,430 --> 00:16:58,980
Nonetheless, in our case, the model overfits, but not that much. In some cases you will have

208
00:16:58,980 --> 00:17:04,980
a situation where the accuracy even starts to drop and where the loss starts to increase

209
00:17:04,980 --> 00:17:10,540
after a certain point. This is the validation loss, and then this is the training loss.

210
00:17:10,560 --> 00:17:17,260
Typically, the training loss keeps reducing because we are training on this training data.

211
00:17:17,280 --> 00:17:26,010
So what we're saying is, at this point where the validation loss stops reducing, it's important to just stop

212
00:17:26,010 --> 00:17:28,860
this training and this is known as early stopping.

213
00:17:28,920 --> 00:17:39,390
Now recall that the aim of callbacks is to be able to modify the training process, the evaluation process,

214
00:17:39,390 --> 00:17:42,030
or the prediction process

215
00:17:43,260 --> 00:17:45,130
in an automatic manner.

216
00:17:45,150 --> 00:17:53,880
So that said, we shall make use of this early stopping callback right here, which will let us stop

217
00:17:53,880 --> 00:18:01,770
training automatically once we notice that a given quantity, like, say, the validation loss,

218
00:18:01,770 --> 00:18:03,930
doesn't drop any longer.

219
00:18:04,710 --> 00:18:06,870
So here we're just going to copy this.

220
00:18:06,900 --> 00:18:13,550
And then we just apply it, similarly to what we did with the CSV callback.

221
00:18:13,560 --> 00:18:18,780
So here, we add this text, and then we add that code. We just paste this in.

222
00:18:19,510 --> 00:18:22,840
Now we define this es_callback, that's the early stopping callback.

223
00:18:22,840 --> 00:18:27,500
And then we'll look at the significance of each and every one of these arguments.

224
00:18:27,520 --> 00:18:32,550
Now, coming back to the documentation, we have monitor: the quantity to be monitored.

225
00:18:32,560 --> 00:18:36,760
So by default, here, we have val_loss.

226
00:18:36,760 --> 00:18:44,230
This means that this callback will simply check on this validation loss right here.

227
00:18:44,230 --> 00:18:51,250
And then once it stops reducing, like, say, at this point, we're going to stop the training.

228
00:18:52,570 --> 00:19:00,640
Whereas if we change this to, say, validation precision or validation accuracy, then what we'll be monitoring

229
00:19:00,640 --> 00:19:05,290
will be that accuracy or precision value.

230
00:19:05,290 --> 00:19:07,460
So if it happens like this, we'll see.

231
00:19:07,480 --> 00:19:14,350
We'll stop around this point right here to ensure that we don't go on and overfit the training data.

232
00:19:14,950 --> 00:19:19,060
The next argument is this min_delta argument right here.

233
00:19:19,060 --> 00:19:28,960
So with the min_delta argument, we are defining a minimum change below which any change is considered

234
00:19:28,960 --> 00:19:30,700
as no improvement.

235
00:19:30,700 --> 00:19:42,460
So if we have a loss like this, and our min_delta is, say, a value of 0.1, then even if this loss

236
00:19:42,460 --> 00:19:52,300
reduces by a value of 0.05, this callback will consider that there has been no decrease in the loss

237
00:19:52,300 --> 00:19:55,260
because the min_delta is 0.1.

238
00:19:55,270 --> 00:19:58,130
Now, by default, the min_delta is set to zero.

239
00:19:58,150 --> 00:20:01,360
This means that any slight change is considered as a drop.

240
00:20:01,360 --> 00:20:09,070
So if we have even 0.0005, then we consider this as a drop in the loss.
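To make the min_delta rule concrete, here is a simplified sketch in plain Python. The helper is_improvement is a hypothetical name and is not the Keras implementation; it only mirrors the idea that, for a loss, a drop counts only if it is larger than min_delta.
```python
def is_improvement(current_loss, best_loss, min_delta=0.0):
    """Return True if the loss dropped by more than min_delta (the case of a 'min' mode)."""
    return current_loss < best_loss - min_delta
# A drop of 0.05 with min_delta=0.1 does not count as an improvement.
print(is_improvement(0.95, 1.0, min_delta=0.1))    # False
# With the default min_delta of zero, even a drop of 0.0005 counts.
print(is_improvement(0.9995, 1.0, min_delta=0.0))  # True
```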

241
00:20:10,300 --> 00:20:17,190
Now, this is important because this callback makes use of patience. With

242
00:20:17,200 --> 00:20:27,940
patience, we are defining the number of epochs after which, if we don't have a decrease in the validation

243
00:20:27,940 --> 00:20:35,200
loss, in the case where we monitor the validation loss, we consider that we can stop the training process.

244
00:20:35,200 --> 00:20:43,870
And for the accuracy, it is the number of epochs after which, if we don't have an increase in the validation

245
00:20:43,870 --> 00:20:50,620
accuracy, if we've picked validation accuracy for the monitor, we have to stop the training.

246
00:20:50,620 --> 00:20:57,160
So we predefine this so that this can run automatically.

247
00:20:57,580 --> 00:21:03,820
Now, for the mode, by default we have the auto mode, but we could specify min or max.

248
00:21:03,850 --> 00:21:13,450
Note that when speaking about the loss, we spoke of the number of epochs after which

249
00:21:13,450 --> 00:21:16,210
the loss doesn't decrease.

250
00:21:16,210 --> 00:21:21,940
So here, we're supposing that the loss is meant to be decreasing, and in that case we have a mode

251
00:21:21,940 --> 00:21:22,930
of min.

252
00:21:22,930 --> 00:21:30,820
Now for the accuracy, we spoke, for the patience, of a number of epochs after which the accuracy doesn't

253
00:21:30,820 --> 00:21:31,720
increase.

254
00:21:31,720 --> 00:21:38,650
So in this case we have the max mode. Now, what TensorFlow permits us to do is to use auto. With

255
00:21:38,650 --> 00:21:44,320
auto, TensorFlow automatically infers whether it's dealing with a min or a max.

256
00:21:44,890 --> 00:21:51,190
So this means that if you pass in, for example, validation precision, auto should be able to understand

257
00:21:51,190 --> 00:21:54,160
that precision should be increasing.

258
00:21:54,230 --> 00:21:55,810
So it's going to use max.

259
00:21:56,830 --> 00:22:00,190
Then we move to the baseline, where the training stops

260
00:22:00,190 --> 00:22:07,870
if the model doesn't show improvement over the baseline. Then finally we have this restore_best_weights.

261
00:22:07,870 --> 00:22:12,250
With restore_best_weights, which by default is False,

262
00:22:12,760 --> 00:22:17,830
we are simply saying that the model is going to keep its final state.

263
00:22:17,830 --> 00:22:25,450
So this means that if we start monitoring the model, say at this point here or at this point where

264
00:22:25,450 --> 00:22:32,980
we have the lowest loss, and then, let's say we have a patience of five, so we

265
00:22:32,980 --> 00:22:37,000
are going to train for five more epochs before stopping.

266
00:22:37,000 --> 00:22:42,130
So if, after five epochs, we are here, let's suppose that this is the epoch we are on.

267
00:22:42,130 --> 00:22:48,790
So we've added five epochs from here, we are here, and we have this loss. It is clear

268
00:22:48,790 --> 00:22:53,710
that this model with this loss is less performant than this one.

269
00:22:53,830 --> 00:23:01,870
Now, if restore_best_weights is set to False, then we'll just take the model's weights here, or

270
00:23:01,900 --> 00:23:05,230
the model weights which give this loss value here.

271
00:23:05,230 --> 00:23:11,920
Whereas if it's set to True, then it means we're going to take the best weights we've had throughout

272
00:23:11,920 --> 00:23:18,490
the training process, which happen to be the weights which provide this loss right here.

273
00:23:19,250 --> 00:23:20,680
That said, here we go.

274
00:23:20,680 --> 00:23:21,550
We have this.

275
00:23:21,550 --> 00:23:31,660
We have, let's say, the patience set to three, verbose set to one, mode auto, baseline None, and restore_best_weights False.
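Roughly, the definition being typed here could look like the sketch below, assuming TensorFlow 2.x. The variable name es_callback and the patience value of three are assumptions read off the narration; the keyword arguments themselves are the documented EarlyStopping parameters discussed above.
```python
from tensorflow.keras.callbacks import EarlyStopping
es_callback = EarlyStopping(
    monitor="val_loss",          # quantity to be monitored
    min_delta=0,                 # any decrease, however small, counts as an improvement
    patience=3,                  # epochs without improvement before stopping (assumed value)
    verbose=1,                   # print a message when training is stopped
    mode="auto",                 # infer min or max from the monitored quantity
    baseline=None,               # no baseline value to beat
    restore_best_weights=False,  # keep the weights from the final epoch
)
```
Since val_loss is monitored, the call to fit needs validation data, and es_callback goes into the same callbacks list as the other callbacks.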

276
00:23:31,660 --> 00:23:37,870
So let's run that, and then all we need to do is to just include this here.

277
00:23:37,870 --> 00:23:44,680
So here we're going to have the es_callback. Now, we could take off the csv_callback, but you could always

278
00:23:44,950 --> 00:23:49,390
put all these together. Let's just leave it so we can see how all that works.

279
00:23:49,390 --> 00:23:55,030
So we just have this list right here, and we have the es_callback with the csv_callback.

280
00:23:55,030 --> 00:23:56,620
Now we run the training.

281
00:23:57,400 --> 00:24:04,450
We didn't train this for long enough to be able to observe any callback changes.

282
00:24:04,450 --> 00:24:11,880
So let's take this to ten epochs, and then we reduce the patience to one or two.

283
00:24:11,890 --> 00:24:13,120
Let's take this to two.

284
00:24:13,780 --> 00:24:16,840
We run that again, so that's fine.

285
00:24:16,840 --> 00:24:19,690
And then we fit our model.

286
00:24:20,020 --> 00:24:28,270
After training for eight epochs, we clearly see how the early stopping callback stops the training process.

287
00:24:28,420 --> 00:24:33,790
Now, let's understand why this training process has been stopped.

288
00:24:33,910 --> 00:24:40,810
If you take a look at this validation loss right here, you'll find that there was a drop here,

289
00:24:40,810 --> 00:24:43,030
a drop, then an increase.

290
00:24:43,030 --> 00:24:44,950
But after this increase, there was a drop.

291
00:24:44,950 --> 00:24:53,890
And since the patience is equal to two, that's the patience we defined here, we have to get

292
00:24:53,890 --> 00:24:57,850
two successive epochs without improvement, that is, two successive increases or

293
00:24:59,050 --> 00:25:03,530
unchanged loss values, before the training process is stopped.

294
00:25:03,550 --> 00:25:06,130
So since, after this, we have a drop,

295
00:25:06,130 --> 00:25:07,570
the training process continues.

296
00:25:07,570 --> 00:25:10,600
Then we have this increase and then we have this drop.

297
00:25:10,600 --> 00:25:11,770
So it continues.

298
00:25:11,770 --> 00:25:15,670
Then here we have this increase and then here again we have this other increase.

299
00:25:15,670 --> 00:25:22,540
So because now we have had these two successive increases, as we could see in the plot right here,

300
00:25:22,600 --> 00:25:23,890
you see right here.

301
00:25:25,160 --> 00:25:29,720
Well, we have this increase, drop, increase and then increase.

302
00:25:30,920 --> 00:25:37,040
We now have the training process, which is being stopped, as you could see here.
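The stopping behaviour just walked through can be simulated in plain Python. This is a simplified sketch and not the Keras source: stopping_epoch is a hypothetical helper, and the validation losses are made-up numbers that only follow the pattern drop, increase, drop, increase, drop, increase, increase.
```python
def stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return the 0-based epoch at which training would stop, or None if it never stops."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, wait = loss, 0  # improvement: remember it and reset the counter
        else:
            wait += 1             # no improvement over the best value seen so far
            if wait >= patience:
                return epoch
    return None
# Made-up validation losses, one per epoch.
losses = [0.60, 0.50, 0.55, 0.45, 0.48, 0.40, 0.43, 0.44]
print(stopping_epoch(losses, patience=2))  # 7, that is, the eighth epoch
```
Note that the counter tracks epochs without improvement over the best value so far, which in this sequence coincides with two successive increases.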

303
00:25:38,090 --> 00:25:40,970
That said, we now move on to the learning rate scheduler.