1
00:00:00,240 --> 00:00:01,260
Hi and welcome back.

2
00:00:01,740 --> 00:00:08,890
Now let's take a look at how we add the regularization techniques that we did in Keras in PyTorch,

3
00:00:08,890 --> 00:00:10,900
with the same Fashion-MNIST dataset.

4
00:00:10,950 --> 00:00:15,520
So let's open this notebook and we'll wait for it to load.

5
00:00:15,540 --> 00:00:16,240
There it goes.

6
00:00:16,260 --> 00:00:17,820
So let's begin the lesson.

7
00:00:18,450 --> 00:00:25,740
So just to recap, the regularization methods that we will be using are L2 regularization, data

8
00:00:26,250 --> 00:00:30,660
augmentation, which is the data manipulation that you've seen previously, dropout,

9
00:00:30,660 --> 00:00:31,560
and batch norm.

10
00:00:32,130 --> 00:00:32,940
So let's begin.

11
00:00:33,150 --> 00:00:37,690
We just do the standard loading of our libraries, which allow us to load the PyTorch models and data

12
00:00:37,730 --> 00:00:38,100
set.

13
00:00:38,580 --> 00:00:40,580
We set our device to GPU.

14
00:00:41,610 --> 00:00:42,060
There we go.

15
00:00:42,060 --> 00:00:46,020
So these things are running and it's going to be finished shortly.

16
00:00:47,490 --> 00:00:52,680
In the meantime, I'm going to talk about how we manipulate our data transforms.

17
00:00:53,130 --> 00:00:56,880
Remember, with Keras, we used the ImageDataGenerator.

18
00:00:57,300 --> 00:00:59,160
Well, PyTorch has that by default.

19
00:00:59,280 --> 00:01:00,810
That's what these transforms are.

20
00:01:01,170 --> 00:01:03,210
And that's what this Compose is, actually.

21
00:01:03,660 --> 00:01:07,980
And the transforms are basically where we can set all of the augmentations we want to do.

22
00:01:08,490 --> 00:01:09,540
Remember, we were doing.

23
00:01:09,850 --> 00:01:14,910
Remember, we did things like random horizontal flips, rotations, grayscaling.

24
00:01:15,270 --> 00:01:16,410
Well, we didn't do those.

25
00:01:16,410 --> 00:01:18,300
In Keras, we used a different set of...

26
00:01:18,630 --> 00:01:21,300
We used some random shifts and skewing and shear.

27
00:01:21,720 --> 00:01:26,910
We can do all of those things in PyTorch. There are functions to support those things in the transforms

28
00:01:27,510 --> 00:01:28,440
library in PyTorch.

29
00:01:29,490 --> 00:01:32,580
However, for this lesson, I decided to use some different augmentations.

30
00:01:33,000 --> 00:01:34,970
And then we'll also visualize these below.

31
00:01:35,010 --> 00:01:37,230
But for now, these are the augmentations.

32
00:01:37,230 --> 00:01:42,840
We're going to do a random affine transformation, which is basically a shearing and warping type of effect.

33
00:01:43,380 --> 00:01:48,600
It also takes degrees, which basically set how much you want to skew it by.

34
00:01:49,050 --> 00:01:52,140
It has some translation effect, as well as the shearing effect here.

35
00:01:52,740 --> 00:01:57,720
We're doing ColorJitter, which basically changes the hue a bit and the saturation a bit, randomly.

36
00:01:58,260 --> 00:02:02,190
Random horizontal flip, random rotations of 15 degrees.

37
00:02:03,390 --> 00:02:04,670
Then we're using grayscale.

38
00:02:04,680 --> 00:02:06,180
We obviously have to do this.

39
00:02:06,180 --> 00:02:10,870
This as well: ToTensor and Normalize. Notice that these are at the bottom here.

40
00:02:10,890 --> 00:02:13,500
It doesn't matter in the pipeline where they are, but...

41
00:02:13,680 --> 00:02:14,610
They do need to be here.

42
00:02:14,720 --> 00:02:19,200
And you can put them at the bottom because it makes more sense to have them there as opposed to having

43
00:02:19,200 --> 00:02:22,470
them in the beginning, just because it's easier to visualize, like what's happening.

44
00:02:23,160 --> 00:02:28,980
And you may have noticed that we have val here, and there's train here and val over here.

45
00:02:29,640 --> 00:02:33,630
Well, these are the transforms for the validation dataset or test dataset as well.

46
00:02:33,630 --> 00:02:37,320
It always has to be treated the same way as the training data, too.

47
00:02:37,680 --> 00:02:43,590
But what I want you to pay attention to is that we aren't doing the augmentations on the validation or

48
00:02:43,590 --> 00:02:44,350
test dataset.

49
00:02:44,370 --> 00:02:48,450
It's just being done on the training dataset because it's a train-time operation.

50
00:02:49,440 --> 00:02:53,220
Next, we create our data loaders, which you've seen before.

51
00:02:54,210 --> 00:02:54,580
Whoops.

52
00:02:54,600 --> 00:02:56,670
I didn't actually run the code above it.

53
00:02:58,080 --> 00:03:00,290
So let's run this block now.

54
00:03:01,650 --> 00:03:03,090
And what this does?

55
00:03:03,210 --> 00:03:03,570
Sorry.

56
00:03:04,140 --> 00:03:10,890
You've seen basically how we actually have to specify the data transforms here instead of pointing it

57
00:03:11,070 --> 00:03:13,710
to the actual transform or having a separate transform.

58
00:03:14,100 --> 00:03:19,500
We use a dictionary to store this information, so we can access it like data transforms and access the

59
00:03:19,500 --> 00:03:23,790
key train here, which contains this object here.

60
00:03:24,630 --> 00:03:30,810
And then similarly, we do it for val, which contains this object here, which contains its transforms.

61
00:03:31,500 --> 00:03:35,760
And we just load the train set, and do this too for the validation set.

62
00:03:36,300 --> 00:03:42,240
Then we create our data loaders here where we shuffle the training dataset, but we're not shuffling

63
00:03:42,240 --> 00:03:43,890
the test dataset.

64
00:03:44,880 --> 00:03:47,880
We set the number of workers this time to two. Remember,

65
00:03:47,880 --> 00:03:48,960
previously we used zero.

66
00:03:49,560 --> 00:03:54,450
Now we can, because this Colab environment has at least two CPUs, so we can do this.

67
00:03:54,960 --> 00:03:56,820
And we set our batch size here.

68
00:03:57,270 --> 00:03:58,110
So let's move on.

69
00:03:58,110 --> 00:04:05,250
Now, let's take a look at how we add dropout and batch norm in the CNN model itself.

70
00:04:05,760 --> 00:04:06,900
It's actually quite easy to do.

71
00:04:07,560 --> 00:04:12,720
So remember, we define our CNN layers here and notice.

72
00:04:12,720 --> 00:04:15,830
I mean, generally, it's good standard practice to put them in the order you want to use them.

73
00:04:15,840 --> 00:04:18,150
But look at what is actually created here.

74
00:04:18,780 --> 00:04:24,600
But notice we have an nn batch norm layer, and with batch norm previously in Keras, we didn't have

75
00:04:24,600 --> 00:04:31,140
to set any parameters that specified how many output filters it was or what the input size was.

76
00:04:31,500 --> 00:04:33,930
But in PyTorch, we do, unfortunately.

77
00:04:33,940 --> 00:04:37,260
So we know the output of this layer is 32 feature maps.

78
00:04:37,270 --> 00:04:40,080
So we have to set it to 32 here for the batch normalization.

79
00:04:40,590 --> 00:04:42,840
Similarly, for this one here, 64.

80
00:04:43,530 --> 00:04:49,350
And notice we don't have the dropout here yet, but we do introduce a dropout down here.

81
00:04:49,920 --> 00:04:55,450
Now the reason for that is because this dropout layer is going to be reused.

82
00:04:55,470 --> 00:04:57,180
And you'll see it's going to be reused here.

83
00:04:57,720 --> 00:04:59,510
We can call this multiple times because,

84
00:04:59,640 --> 00:05:04,710
as I said just earlier, this isn't actually in sequence. Even though it's good practice to put them in

85
00:05:04,710 --> 00:05:06,570
sequence, we can put the dropout below.

86
00:05:06,960 --> 00:05:10,060
It's basically convention in this case with PyTorch.

87
00:05:10,530 --> 00:05:15,420
So what we do finally is, here we have conv1, or conv1 with...

88
00:05:16,410 --> 00:05:17,310
batch norm, I'm sorry.

89
00:05:17,790 --> 00:05:20,790
That's this one here, and we have conv1 inside of it.

90
00:05:20,820 --> 00:05:28,860
Notice that conv1 is inside of the batch norm, and then we have ReLU as the activation

91
00:05:28,860 --> 00:05:30,180
function for all of this here.

92
00:05:31,140 --> 00:05:35,850
So that's basically how we set these things up in PyTorch. This is the input.

93
00:05:35,850 --> 00:05:36,720
This is the output.

94
00:05:36,720 --> 00:05:37,920
And then this is the activation.

95
00:05:38,490 --> 00:05:39,880
Then we take this X.

96
00:05:39,880 --> 00:05:46,110
We give this output to a dropout layer, and the dropout parameter is specified as 0.2 up here.

97
00:05:46,980 --> 00:05:53,520
And so then we have another dropout layer at the end of this one, where we have conv2, batch norm,

98
00:05:53,520 --> 00:05:57,330
and ReLU, and then a dropout layer at the end of all of that.

99
00:05:58,080 --> 00:06:04,710
And then we have a pooling layer, which is max pool, our flatten function, a ReLU on the next FC layer,

100
00:06:04,710 --> 00:06:06,990
which is a 128 node layer.

101
00:06:07,470 --> 00:06:10,230
And then finally, that returns the softmax layer.

102
00:06:10,650 --> 00:06:12,240
So let's create this.

103
00:06:12,240 --> 00:06:14,030
So this could be a bit confusing.

104
00:06:14,040 --> 00:06:15,540
However, it's actually quite simple.

105
00:06:15,900 --> 00:06:21,810
And you will get the hang of PyTorch when creating new CNNs just by practicing this

106
00:06:21,810 --> 00:06:24,360
thing, just creating new CNNs and seeing what happens.

107
00:06:24,720 --> 00:06:27,420
It's always good to experiment because this is a lot of code.

108
00:06:27,420 --> 00:06:29,760
This can be overwhelming for many, many people.

109
00:06:30,180 --> 00:06:32,340
So just break it down into pieces.

110
00:06:32,340 --> 00:06:33,180
Step through it.

111
00:06:33,570 --> 00:06:34,470
Experiment.

112
00:06:34,890 --> 00:06:38,640
IPython notebooks are great for experimenting because you can see the results right below.

113
00:06:39,360 --> 00:06:43,240
So don't feel intimidated yet. I know it can be.

114
00:06:43,260 --> 00:06:44,310
But don't worry about it.

115
00:06:44,970 --> 00:06:47,880
Next, we're going to add L2 regularization.

116
00:06:48,270 --> 00:06:49,530
So how do we do that?

117
00:06:50,190 --> 00:06:51,540
Well, it's actually quite easy.

118
00:06:53,490 --> 00:06:57,210
So how do we introduce L2 regularization in PyTorch?

119
00:06:57,330 --> 00:07:01,980
Well, it's actually a bit different to how we did it in Keras, and that's because we have to do it

120
00:07:01,980 --> 00:07:05,370
within the optimizer itself, the stochastic gradient descent algorithm.

121
00:07:06,000 --> 00:07:06,870
And it isn't.

122
00:07:06,870 --> 00:07:11,760
It's not called L2 in PyTorch; it's called the weight decay parameter, and we set it to point

123
00:07:11,760 --> 00:07:12,900
zero zero one here.

124
00:07:12,910 --> 00:07:18,570
That's the L2. There isn't L1 regularization by default in this optimizer, unfortunately.

125
00:07:18,990 --> 00:07:24,720
However, we can apply it in an alternative way if you want, but we won't go into that here.

126
00:07:25,200 --> 00:07:30,990
So let's just create this with L2 regularization, which is called weight decay right there.

127
00:07:32,400 --> 00:07:36,510
So now we're ready to train our model with all of these regularization methods.

128
00:07:36,960 --> 00:07:37,970
So let's do it.

129
00:07:37,980 --> 00:07:39,770
Let's set it for 15 epochs.

130
00:07:39,780 --> 00:07:41,610
Is that what we did previously?

131
00:07:43,820 --> 00:07:52,310
Let's see what the setting was. 15 epochs, yes. So let's leave this, let's run this, and we'll start

132
00:07:52,310 --> 00:07:59,990
our training process, and our training process code is exactly the same as it was previously, so we'll

133
00:07:59,990 --> 00:08:01,230
wait for these results.

134
00:08:01,250 --> 00:08:06,080
I'll pause the video for now and come back and then we'll analyze the results shortly.

135
00:08:06,350 --> 00:08:08,870
So I'll see you in a couple of minutes.

136
00:08:11,670 --> 00:08:12,990
Hi and welcome back.

137
00:08:13,320 --> 00:08:16,200
So now you can see our model has finished training.

138
00:08:16,290 --> 00:08:21,910
And similarly, just like the Keras instance, you can see our accuracy isn't as good as the non-regularized

139
00:08:21,910 --> 00:08:22,500
model.

140
00:08:22,960 --> 00:08:29,300
Now that might be surprising because, you know, I told you that regularization helps get better performance.

141
00:08:29,340 --> 00:08:34,860
However, to achieve that better performance, you do need to use more epochs.

142
00:08:35,400 --> 00:08:38,790
So that's basically something you can try on your own.

143
00:08:38,800 --> 00:08:45,390
Let's increase this to maybe 25, or 50 epochs better yet, and we can assess the model's accuracy.

144
00:08:45,930 --> 00:08:52,050
One thing that's good to know about regularization is that it's good to experiment a bit initially without

145
00:08:52,060 --> 00:08:52,530
any of it.

146
00:08:52,920 --> 00:08:54,660
See when your model starts overfitting.

147
00:08:54,900 --> 00:09:01,410
Basically, get a baseline model and try to improve iteratively by introducing one regularization method

148
00:09:01,800 --> 00:09:02,460
at a time.

149
00:09:02,910 --> 00:09:06,600
That's how we deep learning practitioners actually get the best models out.

150
00:09:06,990 --> 00:09:10,320
We don't just throw everything at it and hope for the best.

151
00:09:10,680 --> 00:09:15,560
You need to take an experimental, strategic approach to improving your models.

152
00:09:15,570 --> 00:09:19,470
Otherwise you could be overlooking something quite simple.

153
00:09:20,040 --> 00:09:24,270
You could probably get the best model by using something simple sometimes, depending on the dataset.

154
00:09:24,900 --> 00:09:29,740
So for now, let's just get the overall accuracy, which would be the average of all of these test

155
00:09:29,760 --> 00:09:30,780
accuracies here.

156
00:09:31,380 --> 00:09:32,480
89.9-ish.

157
00:09:32,490 --> 00:09:33,210
So that's pretty good.

158
00:09:33,900 --> 00:09:35,130
A little bit better than expected.

159
00:09:35,640 --> 00:09:41,320
So let's take a look at our training plots again, and you can see here.

160
00:09:41,550 --> 00:09:45,230
As expected, it looks like it's going to be overfitting slightly higher overall.

161
00:09:46,140 --> 00:09:47,790
This may just be an anomaly.

162
00:09:48,270 --> 00:09:50,910
It could probably continue to go down.

163
00:09:51,720 --> 00:09:52,110
I'm sorry.

164
00:09:52,110 --> 00:09:52,560
Go up.

165
00:09:52,890 --> 00:09:55,050
Blue is accuracy, so you can.

166
00:09:55,260 --> 00:09:56,520
You can monitor it here.

167
00:09:57,150 --> 00:10:05,070
So I would suggest training this for 50 epochs just to verify that our regularization methods are making

168
00:10:05,070 --> 00:10:05,700
a difference.

169
00:10:05,700 --> 00:10:08,250
And if not, that's just the nature of this.

170
00:10:08,790 --> 00:10:12,030
And basically, that's it for this lesson.

171
00:10:12,420 --> 00:10:19,440
So you would have learned how to add L2 regularization by adding it into the optimizer here as this

172
00:10:19,440 --> 00:10:20,550
weight decay parameter.

173
00:10:20,940 --> 00:10:25,950
And you would have learned how to easily implement things like dropout and batch norm, and where they

174
00:10:25,950 --> 00:10:27,230
need to be placed in the loop.

175
00:10:27,630 --> 00:10:30,360
This is an important part of the code.

176
00:10:30,420 --> 00:10:35,200
Let's take a look at this and remember that when you have a conv layer here, this is the input.

177
00:10:35,220 --> 00:10:36,360
This is the first layer.

178
00:10:36,720 --> 00:10:37,750
This is the second.

179
00:10:37,770 --> 00:10:42,040
This is the third, and that's how PyTorch models are constructed.

180
00:10:42,060 --> 00:10:43,980
You don't have to do it all in one line.

181
00:10:44,340 --> 00:10:51,510
You can do it line by line, where x is equal to self.conv1(x), and then progressively go to

182
00:10:51,510 --> 00:10:53,430
the batch norm layer, then ReLU.

183
00:10:53,760 --> 00:10:59,580
However, this saves space, and this is sort of like how Keras treats one layer anyway.

184
00:10:59,970 --> 00:11:03,180
Actually, no, Keras actually has a separate batch norm layer outside of it.

185
00:11:03,510 --> 00:11:05,370
So just remember those differences anyway.

186
00:11:05,850 --> 00:11:12,330
And lastly, the last regularization method we added was data augmentation, and that was done in the

187
00:11:12,330 --> 00:11:13,770
transforms part of the code.

188
00:11:14,220 --> 00:11:20,610
So you can see we introduced all these random data augmentation techniques to manipulate our training

189
00:11:20,610 --> 00:11:23,970
dataset as data is loaded during the training process.

190
00:11:24,480 --> 00:11:31,920
So we'll stop there for now, and then the next lesson will continue with the visualization of how CNNs

191
00:11:31,920 --> 00:11:32,640
actually learn.

192
00:11:33,090 --> 00:11:37,830
So we'll start with some slides and then I'll dive into the code afterwards for those lessons.

193
00:11:38,280 --> 00:11:40,950
Thank you, and I'll see you in the rest of the course.