1
00:00:04,680 --> 00:00:08,520
Hi and welcome to Lesson six in the deep learning section here.

2
00:00:11,700 --> 00:00:16,500
Hi and welcome to the sixth lesson of the deep learning section of this course.

3
00:00:17,190 --> 00:00:23,520
We're going to start by introducing regularization to the same network we trained just now.

4
00:00:24,120 --> 00:00:26,670
That was the Fashion-MNIST dataset.

5
00:00:27,120 --> 00:00:29,130
Previously, we didn't train it without.

6
00:00:29,280 --> 00:00:31,170
We trained it without any augmentation.

7
00:00:31,500 --> 00:00:32,130
Now we're going.

8
00:00:35,990 --> 00:00:37,280
Hi and welcome back.

9
00:00:37,700 --> 00:00:42,680
So what we're going to do now is we're going to train on the same Fashion-MNIST dataset.

10
00:00:43,100 --> 00:00:48,770
However, we're going to introduce regularization into this model and we'll assess its performance and

11
00:00:48,770 --> 00:00:52,460
you'll see how regularization has improved our accuracy.

12
00:00:53,210 --> 00:00:54,950
So let's firstly get started.

13
00:00:54,960 --> 00:00:57,920
So in this lesson, what are we going to do?

14
00:00:57,950 --> 00:01:00,650
We're going to introduce four types of regularization.

15
00:01:01,190 --> 00:01:04,640
We're going to do L2 regularization, data augmentation.

16
00:01:04,700 --> 00:01:09,950
Remember, this is the image manipulation one where, every time we load data from our dataset,

17
00:01:10,310 --> 00:01:13,010
we just add some random manipulations to it.

18
00:01:13,070 --> 00:01:16,160
So that helps the model generalize much better to unseen data.

19
00:01:16,780 --> 00:01:19,400
Then we're going to use dropout and batch norm.

20
00:01:19,880 --> 00:01:22,400
It's quite simple to introduce in Keras.

21
00:01:22,940 --> 00:01:24,470
So let's firstly begin.

22
00:01:24,470 --> 00:01:30,230
Let's look at our model and you'll see it takes a little while because we have to connect to our instance

23
00:01:30,470 --> 00:01:31,310
on Google Cloud.

24
00:01:32,120 --> 00:01:33,110
And there we go.

25
00:01:33,200 --> 00:01:35,210
So it's going to be loading this dataset.

26
00:01:35,210 --> 00:01:37,610
Now, that finishes quite quickly.

27
00:01:38,420 --> 00:01:41,420
Check to see if we're using a GPU, just to be sure.

28
00:01:43,130 --> 00:01:48,830
And we are: we're using the Tesla P100 GPU. Again, we inspect our data.

29
00:01:49,070 --> 00:01:54,200
I'm going to go through this a bit quickly because you've seen this before, and I don't want to waste

30
00:01:54,200 --> 00:01:55,820
your time and be redundant.

31
00:01:55,850 --> 00:01:59,210
Similarly, we have seen this too: the class plotting examples.

32
00:02:01,520 --> 00:02:07,040
Next, we're going to take a look at something that you haven't seen before, and that's the image data

33
00:02:07,040 --> 00:02:08,660
generator function in Keras.

34
00:02:09,440 --> 00:02:11,810
So what is this image data generator function?

35
00:02:12,320 --> 00:02:19,400
Well, we need it when we're doing the data augmentation part of it because we're loading data.

36
00:02:20,210 --> 00:02:27,560
I mentioned this: we're loading the data in batches, but we want to apply these random data augmentations

37
00:02:27,560 --> 00:02:28,490
to that batch.

38
00:02:28,970 --> 00:02:34,760
The way we do that is we use the image data generator, which basically points to these images, applies the

39
00:02:34,760 --> 00:02:37,220
manipulations and then returns the images.

40
00:02:37,520 --> 00:02:39,800
with those manipulations back to our training loop.

41
00:02:40,280 --> 00:02:41,720
So that's what this function does here.

42
00:02:41,720 --> 00:02:43,610
So, well, we'll get to it shortly.

43
00:02:43,640 --> 00:02:48,050
I'm just talking about it because it's mentioned here, and I just think you would.

44
00:02:48,140 --> 00:02:50,240
You would feel curious about it at this point.

45
00:02:50,780 --> 00:02:53,840
But the rest of the code here is standard stuff you've seen before.

46
00:02:53,880 --> 00:02:58,370
We're reshaping the training dataset and the test dataset.

47
00:02:58,760 --> 00:03:01,850
Then we're going to do the one-hot encoding of the labels.

48
00:03:03,110 --> 00:03:08,560
We're going to get the image rows and columns and create the image input shape variable.

49
00:03:08,630 --> 00:03:10,640
We're going to normalize the data as well.
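
The normalization and one-hot encoding steps mentioned above can be sketched in plain Python. This is only an illustration under stated assumptions: the helper names `normalize` and `one_hot` and the tiny 2x2 "image" are hypothetical stand-ins, not the notebook's actual code, which is not shown in this transcript.

```python
def normalize(image):
    """Scale 8-bit pixel values from the range [0, 255] to [0.0, 1.0]."""
    return [[pixel / 255.0 for pixel in row] for row in image]


def one_hot(label, num_classes):
    """Return a vector with 1.0 at index `label` and 0.0 everywhere else."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector


# Hypothetical 2x2 "image" standing in for a 28x28 Fashion-MNIST sample.
tiny_image = [[0, 255], [51, 102]]
print(normalize(tiny_image))

# Fashion-MNIST has 10 classes, so each label becomes a 10-element vector.
print(one_hot(3, num_classes=10))
```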

50
00:03:13,400 --> 00:03:19,340
Next, we're going to keep track of the number of pixels and classes, as well as do the one-hot encoding.

51
00:03:19,490 --> 00:03:25,820
And next we build our model and notice this model looks a bit longer than our previous model.

52
00:03:26,390 --> 00:03:27,440
What's different about it?

53
00:03:27,560 --> 00:03:31,400
Well, you should have noticed three main things.

54
00:03:31,460 --> 00:03:37,850
Firstly, we now have a parameter defined above here called L2. Remember the L2 and L1 norms?

55
00:03:37,880 --> 00:03:43,320
We're going to be using the L2 norm, so to do that in the Conv2D here,

56
00:03:43,340 --> 00:03:49,760
when we do model.add, remember, we specify the number of filters, the kernel size, the activation, and there's

57
00:03:49,760 --> 00:03:56,690
a parameter we can introduce called kernel_regularizer, and we just use regularizers.l2. Remember,

58
00:03:56,690 --> 00:03:58,460
we import regularizers here.

59
00:03:59,030 --> 00:04:02,390
We could do regularizers.l1 as well.

60
00:04:02,810 --> 00:04:05,350
We give it the L2 variable setting.

61
00:04:05,390 --> 00:04:08,570
We're using point zero zero one here in this instance.

62
00:04:09,320 --> 00:04:14,660
You can try different values. As you remember, the larger the value, the more you penalize the weights,

63
00:04:14,660 --> 00:04:15,680
so the slower.

64
00:04:15,750 --> 00:04:18,590
Generally, this network would converge.
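
To make the effect of the L2 value concrete, here is a small plain-Python sketch of the penalty an L2 regularizer adds to the loss: the strength times the sum of the squared weights. The function names, the example weights and the data loss of 0.40 are hypothetical; only the strength 0.001 comes from the lesson.

```python
def l2_penalty(weights, l2):
    """L2 penalty: the strength `l2` times the sum of the squared weights."""
    return l2 * sum(w * w for w in weights)


def regularized_loss(data_loss, weights, l2=0.001):
    """Total loss the optimizer sees: the data loss plus the weight penalty."""
    return data_loss + l2_penalty(weights, l2)


# Hypothetical weights of a single layer.
weights = [0.5, -1.0, 2.0]

print(regularized_loss(0.40, weights))          # strength used in the lesson
print(regularized_loss(0.40, weights, l2=0.1))  # larger value, larger penalty
```

A larger strength makes big weights more expensive, which is the trade-off described above.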

65
00:04:20,060 --> 00:04:23,340
So now you can see we do it here as well with the second conv layer.

66
00:04:24,080 --> 00:04:29,120
And notice something here. I actually skipped over it accidentally, but we're introducing batch norm.

67
00:04:29,360 --> 00:04:30,530
And you can see how easy.

68
00:04:30,530 --> 00:04:37,790
It's easy to introduce: it's simply just model.add with a BatchNormalization layer, which we import

69
00:04:37,880 --> 00:04:43,880
up here, and you place it between the conv layers, as you can see, after the ReLU layer, which is what

70
00:04:43,880 --> 00:04:45,980
happens in this little block of code here.

71
00:04:46,550 --> 00:04:52,320
This is a Conv2D with the ReLU activation along with the L2 regularizer.

72
00:04:52,970 --> 00:04:56,150
So then we just add a batch norm at the end right here.

73
00:04:57,020 --> 00:04:58,140
Now, note this.

74
00:04:58,160 --> 00:05:05,240
Just one thing to note about all the layers that can take the kernel regularizer:

75
00:05:05,300 --> 00:05:08,690
Those layers need to have the same L2 regularizer set as well.

76
00:05:09,170 --> 00:05:13,910
I mean, yes, you probably can mix L1 and L2, but it's not recommended unless you know what you're doing.

77
00:05:14,390 --> 00:05:17,360
So generally stick to the same regularizer for each layer.

78
00:05:17,930 --> 00:05:19,610
So that's applied in the dense.

79
00:05:20,000 --> 00:05:21,890
And the Conv2D layers. I mean, do you know why?

80
00:05:22,310 --> 00:05:27,740
Because those are the layers here, you can see, besides the batch norm: the Conv2D layers here and the

81
00:05:27,740 --> 00:05:31,340
dense layers are where all of the learnable parameters are, or the bulk of them.

82
00:05:31,790 --> 00:05:33,680
So those are the weights we want to control.

83
00:05:34,490 --> 00:05:37,090
So you can see we have the batch norm here.

84
00:05:37,100 --> 00:05:38,420
We have another batch norm here.

85
00:05:38,960 --> 00:05:40,760
And next we're introducing dropout.

86
00:05:41,720 --> 00:05:44,420
We're introducing dropout after the max pooling layer.

87
00:05:44,420 --> 00:05:51,020
In this case, we can introduce dropout at different points, but it generally works best after the

88
00:05:51,020 --> 00:05:54,760
max pool or after the complete Conv2D output.

89
00:05:55,400 --> 00:05:57,800
So we're setting the dropout rate at zero point two.

90
00:05:58,110 --> 00:06:02,210
Generally, I like to use values of point two to point three. I find those tend to work well.

91
00:06:02,540 --> 00:06:07,250
I don't go up to point five because that slows down convergence time quite a bit.

92
00:06:07,370 --> 00:06:09,410
So point two is a fairly good one to go at.
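
As a rough illustration of what a dropout rate of 0.2 means, the sketch below implements the common "inverted dropout" scheme in plain Python: each activation is zeroed with probability `rate`, and the survivors are scaled up so the expected value stays the same. This is an assumption-laden sketch, not the library's implementation; the function name `dropout` and the sample activations are hypothetical.

```python
import random


def dropout(activations, rate, rng):
    """Zero each activation with probability `rate`; scale the rest by 1 / (1 - rate)."""
    keep_probability = 1.0 - rate
    return [
        value / keep_probability if rng.random() >= rate else 0.0
        for value in activations
    ]


rng = random.Random(0)            # fixed seed so the run is repeatable
activations = [1.0] * 10          # hypothetical layer outputs
print(dropout(activations, rate=0.2, rng=rng))
```

Dropout is only applied during training; at inference time the activations pass through unchanged.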

93
00:06:09,830 --> 00:06:19,040
So you can see already we've introduced one, two, three types of regularization almost effortlessly.

94
00:06:19,310 --> 00:06:21,530
You can see how easy it is to configure CNNs now.

95
00:06:22,100 --> 00:06:28,100
And next, we just do model.compile with a loss, an optimizer and the metrics specified, and print our model summary.

96
00:06:28,670 --> 00:06:29,960
And we get that below.

97
00:06:31,340 --> 00:06:33,510
So now let's begin training our model.

98
00:06:33,830 --> 00:06:39,170
However, let's note something in this code because this code does look a bit longer than our previous

99
00:06:39,170 --> 00:06:44,600
training block of code, which generally just had a model.fit and specified a batch size and epochs.

100
00:06:45,200 --> 00:06:50,360
Well, what we've added here is the image data generator that I mentioned previously.

101
00:06:50,870 --> 00:06:57,140
You can see this is creating a generator here with the specific augmentations that we want to use.

102
00:06:57,560 --> 00:07:00,990
So we're rescaling it, which is normalizing it.

103
00:07:01,020 --> 00:07:07,190
Basically, this is rotations, width shift, height shift, shear, the zoom range, horizontal flip.

104
00:07:07,730 --> 00:07:09,710
All of these should be self-explanatory.

105
00:07:09,740 --> 00:07:16,430
However, if you want to take a look at what they actually do, go to the Keras augmentation documentation.

106
00:07:16,910 --> 00:07:19,720
I'll probably put a link within this notebook if you want to see afterwards.

107
00:07:19,730 --> 00:07:25,110
It's not in my video notebook, but I'll post one in here for you guys so you can take a look and it

108
00:07:25,110 --> 00:07:27,800
will basically show you what these things look like.

109
00:07:28,250 --> 00:07:31,580
You can look at this fill_mode, and what fill_mode nearest means.

110
00:07:32,000 --> 00:07:37,250
Basically, what nearest means is that if you were to zoom out or shift an image left or right, you

111
00:07:37,250 --> 00:07:41,150
would end up with some black pixels, or some blank pixels, basically.

112
00:07:41,600 --> 00:07:44,090
So you can fill that with either the nearest pixel.

113
00:07:44,090 --> 00:07:47,120
So whatever pixel was nearest, it just copies that value.

114
00:07:47,480 --> 00:07:49,640
It's quite good to use nearest, I would think.
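
The "nearest" fill behaviour described above can be sketched in plain Python for the simple case of shifting an image to the right. The helper `shift_right_nearest` is hypothetical and covers only a whole-pixel horizontal shift; the real generator also handles rotation, zoom and shear.

```python
def shift_right_nearest(image, shift):
    """Shift every row right by `shift` pixels and fill the gap on the left
    by repeating the nearest original pixel (the row's first value)."""
    shifted = []
    for row in image:
        kept = row[:len(row) - shift]          # pixels that stay inside the frame
        shifted.append([row[0]] * shift + kept)
    return shifted


# Hypothetical 2x4 image; after a shift of 2, the left edge value is repeated
# instead of leaving blank pixels.
image = [[1, 2, 3, 4],
         [5, 6, 7, 8]]
print(shift_right_nearest(image, 2))
```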

115
00:07:50,450 --> 00:07:54,170
So that's why we have an image data generator returned here.

116
00:07:54,740 --> 00:08:01,070
So now when we're doing model.fit, we use that generator we created up here and do .flow.

117
00:08:01,730 --> 00:08:07,160
So this is just how Keras treats its image generator.

118
00:08:07,580 --> 00:08:09,770
So we use this; it's a function built into it.

119
00:08:09,770 --> 00:08:12,390
So it's the training data generator

120
00:08:12,440 --> 00:08:14,600
variable, followed by .flow.

121
00:08:15,080 --> 00:08:20,480
And then we specify our training data, our training labels, the batch size, and then again, we just

122
00:08:20,480 --> 00:08:23,390
do the epochs, validation, steps per epoch.

123
00:08:23,660 --> 00:08:25,460
Now notice this parameter.

124
00:08:25,460 --> 00:08:28,250
I don't think this was here before in the previous lesson.

125
00:08:29,180 --> 00:08:35,660
Here we have to give it steps per epoch, which is basically a parameter here that takes the x_train

126
00:08:35,660 --> 00:08:38,300
shape, which is 28 divided by two.

127
00:08:38,780 --> 00:08:40,460
I'm sorry, this is not the x_train shape.

128
00:08:40,790 --> 00:08:46,940
The x_train.shape[0] here is the 60,000, sorry, not the 28.

129
00:08:46,940 --> 00:08:53,480
Sixty thousand divided by the batch size is how many steps per epoch: this number is 1875 steps per epoch,

130
00:08:53,510 --> 00:08:56,030
because we're using a smaller batch size this time.

131
00:08:56,720 --> 00:09:00,950
So we need to specify this parameter in here when using the training data generator.
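
The steps-per-epoch arithmetic can be checked in a few lines of Python. The batch size of 32 is an assumption inferred from the numbers quoted in the lesson (60,000 samples and 1875 steps); the variable names are illustrative.

```python
num_train_samples = 60000   # x_train.shape[0] for Fashion-MNIST
batch_size = 32             # assumed: inferred from 60000 / 1875

# One step processes one batch, so an epoch needs this many steps
# to pass over the whole training set once.
steps_per_epoch = num_train_samples // batch_size
print(steps_per_epoch)  # 1875
```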

132
00:09:01,400 --> 00:09:05,070
So let's run this now, and let's note our results.

133
00:09:05,070 --> 00:09:05,960
So let's take a look.

134
00:09:06,050 --> 00:09:08,540
This may take a bit longer, but we'll see.

135
00:09:08,870 --> 00:09:10,310
So let's wait for it.

136
00:09:15,810 --> 00:09:16,200
OK.

137
00:09:16,290 --> 00:09:21,720
So the network has finally finished training, and you would have noticed some things. Definitely

138
00:09:21,720 --> 00:09:22,470
you would have noticed.

139
00:09:22,980 --> 00:09:24,630
One is that it took a lot longer.

140
00:09:24,630 --> 00:09:30,540
To train, it took almost twice the time, and that's because we're doing several things that add computational

141
00:09:30,540 --> 00:09:38,190
load to this training process: that is, data augmentation, the L2 regularization,

142
00:09:38,670 --> 00:09:44,700
the batch norm and the dropout. All of those things contribute to increasing the training time.

143
00:09:45,420 --> 00:09:53,070
However, you can see that while 15 epochs wasn't enough to get similar accuracy on the training dataset,

144
00:09:53,550 --> 00:09:58,110
you can see generally the validation accuracy was quite good: 89 percent.

145
00:09:58,620 --> 00:10:03,600
Let's take a look at what we got previously, because I don't recall; it might have been similar.

146
00:10:09,780 --> 00:10:14,100
Eighty-nine percent, actually close to 90 percent, without regularization.

147
00:10:14,850 --> 00:10:22,560
So while this doesn't look good for regularization methods, it does indicate that when using

148
00:10:22,560 --> 00:10:24,990
regularization, you do have to train for more epochs.

149
00:10:25,020 --> 00:10:26,640
I didn't want to make this lesson too long.

150
00:10:27,060 --> 00:10:33,060
But if you have some time to leave it overnight, I recommend you try this for at least 50 epochs, and

151
00:10:33,060 --> 00:10:40,590
you can see. I'm pretty sure our validation accuracy would be better than if we had trained without regularization.

152
00:10:41,160 --> 00:10:45,180
So give it a try, experiment and see how it goes.

153
00:10:45,600 --> 00:10:49,890
One thing you would note is that our validation accuracy was gradually going up.

154
00:10:50,340 --> 00:10:55,770
It may have had a little peak and a drop here, but generally it was going up again, which is a good

155
00:10:55,770 --> 00:10:56,130
sign.

156
00:10:56,130 --> 00:11:01,320
It means that it wasn't really overfitting to the training data as much as the previous model, which is

157
00:11:01,320 --> 00:11:02,100
what we want.

158
00:11:02,760 --> 00:11:08,010
So generally, this is a good thing and you can see how the loss was zero point four zero three

159
00:11:08,010 --> 00:11:11,880
here, although this actually was quite a bit.

160
00:11:12,270 --> 00:11:14,640
You can definitely see it overfitted at this point here.

161
00:11:15,030 --> 00:11:20,790
So you can see generally that the regularization methods have reduced overfitting on our model.

162
00:11:21,270 --> 00:11:26,280
However, we ended up getting the same accuracy on the validation dataset, so it's almost no difference

163
00:11:26,280 --> 00:11:29,160
in the real world, at least after 15 epochs.

164
00:11:29,160 --> 00:11:34,350
But I would definitely recommend you train for more epochs than this, and I'm pretty sure you will

165
00:11:34,350 --> 00:11:37,320
get a better model out of this than we did before.

166
00:11:37,590 --> 00:11:40,530
OK, so we'll stop there for now.

167
00:11:40,980 --> 00:11:45,090
And next we're going to do the same thing with PyTorch.

168
00:11:45,480 --> 00:11:50,460
So we're going to try PyTorch with the Fashion-MNIST dataset without regularization and then

169
00:11:50,460 --> 00:11:56,580
train one with regularization and compare them, and I get to show you how we introduce

170
00:11:56,670 --> 00:12:01,440
regularization methods in PyTorch, which is a bit different to Keras, but still quite easy.

171
00:12:01,890 --> 00:12:03,490
So I'll see you in the next lesson.

172
00:12:03,570 --> 00:12:03,990
Thank you.