1
00:00:00,990 --> 00:00:04,620
Hi, and welcome to our first regularization lesson.

2
00:00:05,130 --> 00:00:10,380
So in this lesson, what we're going to do is train a model on the Fashion-MNIST

3
00:00:10,380 --> 00:00:10,690
dataset.

4
00:00:10,710 --> 00:00:12,970
Firstly, without any regularization.

5
00:00:13,440 --> 00:00:16,590
So let's begin by opening up this Colab notebook here.

6
00:00:17,010 --> 00:00:18,300
I already have it opened.

7
00:00:18,810 --> 00:00:20,970
And this opens the lesson file.

8
00:00:21,570 --> 00:00:27,270
So this is what the Fashion-MNIST dataset looks like. These are some of the classes

9
00:00:27,270 --> 00:00:27,810
inside of it.

10
00:00:27,870 --> 00:00:29,070
It's 10 classes.

11
00:00:29,550 --> 00:00:32,970
It basically incorporates 10 classes of clothing items.

12
00:00:33,390 --> 00:00:34,110
I don't know them

13
00:00:34,110 --> 00:00:34,530
all right

14
00:00:34,540 --> 00:00:38,250
now, actually. I should have probably prepped that for the lesson, but as you can see, you can visualize

15
00:00:38,250 --> 00:00:38,640
them here.

16
00:00:39,000 --> 00:00:45,630
They are things like dresses, sweaters, shoes, slippers, T-shirts, shirts is another category,

17
00:00:45,630 --> 00:00:49,230
I believe. So that's basically what we're going to do.

18
00:00:49,830 --> 00:00:55,710
It's called Fashion-MNIST mainly because it's in the same style and format as the original

19
00:00:55,710 --> 00:00:58,260
MNIST dataset, which you've seen in the previous lessons.

20
00:00:58,800 --> 00:01:04,740
They are grayscale images, they are 28 by 28 pixels, and there's also the exact same number of

21
00:01:04,740 --> 00:01:13,410
classes. There are 50 thousand, sorry, 60,000 training images and 10,000 test images.

22
00:01:13,590 --> 00:01:15,390
So actually, I'm stupid enough.

23
00:01:15,390 --> 00:01:15,960
I'm stupid.

24
00:01:16,440 --> 00:01:21,810
Here are the classes for Fashion-MNIST that I told you I didn't remember, and they're actually

25
00:01:21,810 --> 00:01:22,650
right here in this code.

26
00:01:23,250 --> 00:01:25,950
So let's begin loading our dataset.

27
00:01:26,070 --> 00:01:27,480
So to load this dataset,

28
00:01:27,870 --> 00:01:31,280
you've seen how we loaded the MNIST dataset with Keras.

29
00:01:31,650 --> 00:01:37,050
It's from tensorflow.keras.datasets, and you import the dataset you want.

30
00:01:38,010 --> 00:01:38,730
Here,

31
00:01:38,730 --> 00:01:45,060
we're downloading Fashion-MNIST, so we have fashion_mnist.load_data(), and it puts the

32
00:01:45,330 --> 00:01:50,220
training data, training labels, test data, and test labels into these variables right here.
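
As a rough sketch of the loading step described here: the import path and load_data() call are the ones named in the lesson, while the variable names (x_train, y_train, x_test, y_test) are assumptions, since the notebook's own names aren't visible in the transcript.

```python
from tensorflow.keras.datasets import fashion_mnist

# Downloads the dataset on first use, then reads it from the local cache.
# Variable names are illustrative; the lesson notebook may use different ones.
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

# Quick sanity check on what was actually loaded.
print("Training images:", x_train.shape, "labels:", y_train.shape)
print("Test images:", x_test.shape, "labels:", y_test.shape)
```

The images come back as 28 by 28 grayscale arrays and the labels as integers from 0 to 9.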

33
00:01:50,730 --> 00:01:54,570
And this is something that they don't provide for us.

34
00:01:54,600 --> 00:02:01,680
We basically have to find this online, maybe in the dataset write-up or README file that describes

35
00:02:01,680 --> 00:02:08,040
what the classes are. And the ordering of these classes is quite important, because whenever

36
00:02:08,040 --> 00:02:13,200
you return probabilities, let's say the max probability for this image is class two.

37
00:02:13,650 --> 00:02:17,160
It needs to correspond to the element at index two in this array.

38
00:02:17,550 --> 00:02:21,690
So this would be index zero, index one, index two.

39
00:02:22,140 --> 00:02:23,880
So the order is important.

40
00:02:23,880 --> 00:02:26,860
The order corresponds to what class it is.

41
00:02:26,880 --> 00:02:32,190
So we have T-shirt, trouser, pullover, dress, coat, sandal, shirt, sneaker, bag and ankle boot.
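
To make the point about ordering concrete, here is a small sketch of mapping a predicted probability vector back to a class name. The class list follows the standard Fashion-MNIST label order; the probability values are made up for illustration and do not come from the lesson's model.

```python
import numpy as np

# Standard Fashion-MNIST label order: index 0 is T-shirt/top, index 9 is Ankle boot.
class_names = ["T-shirt/top", "Trouser", "Pullover", "Dress", "Coat",
               "Sandal", "Shirt", "Sneaker", "Bag", "Ankle boot"]

# Hypothetical softmax output for a single image (values invented for this example).
probs = np.array([0.01, 0.02, 0.80, 0.03, 0.05, 0.01, 0.04, 0.02, 0.01, 0.01])

# The index of the largest probability is the predicted class label.
predicted_index = int(np.argmax(probs))
predicted_name = class_names[predicted_index]
print(predicted_index, predicted_name)
```

If the list were in a different order, the same index would be reported under the wrong name, which is why the order has to match the dataset's labels.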

42
00:02:32,910 --> 00:02:34,800
Apparently, those are the ten classes of clothing.

43
00:02:35,160 --> 00:02:35,580
Who knew?

44
00:02:36,510 --> 00:02:39,870
Anyway, so let's check to see if we're using the GPU.

45
00:02:39,870 --> 00:02:48,390
But let me just run this block of code first. It takes a little while because, you see, it has to connect

46
00:02:48,390 --> 00:02:53,880
to a virtual machine in the cloud somewhere, and then you can actually see the specs of that machine right

47
00:02:53,880 --> 00:02:54,060
here.

48
00:02:54,060 --> 00:02:56,340
How much RAM and how much disk we're using.

49
00:02:56,820 --> 00:03:00,000
And it tells us we're actually using the GPU backend.

50
00:03:00,120 --> 00:03:05,610
That's a quick way to check, as opposed to checking it like this, which is what I've shown you

51
00:03:05,610 --> 00:03:06,330
guys before.

52
00:03:06,870 --> 00:03:13,080
This is apparently new because I don't remember seeing this information when you hover over the RAM

53
00:03:13,080 --> 00:03:14,580
and disk icons up here.

54
00:03:15,270 --> 00:03:20,160
So now that we have downloaded our dataset, let's just... we don't need to do this, to be fair.

55
00:03:20,160 --> 00:03:26,520
But let's just check to make sure TensorFlow is using a GPU, and it is, and it displays the name

56
00:03:26,520 --> 00:03:27,240
of the GPU.
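
One way to do this kind of check in code is sketched below. The exact call used in the lesson notebook isn't shown in the transcript, so tf.config.list_physical_devices is an assumption here. Note that it reports device identifiers such as /physical_device:GPU:0 rather than the card's model name.

```python
import tensorflow as tf

# Ask TensorFlow which GPUs it can see (an empty list means CPU only).
gpus = tf.config.list_physical_devices("GPU")

if gpus:
    for gpu in gpus:
        print("GPU available:", gpu.name)
else:
    print("No GPU found; TensorFlow will run on the CPU.")
```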

57
00:03:27,630 --> 00:03:31,130
This is a Tesla P100, which is a fairly decent GPU.

58
00:03:31,140 --> 00:03:36,360
By the way, if you're using Colab free, you might get a different GPU.

59
00:03:36,360 --> 00:03:39,570
You might get one of the K80-series chips, which are a bit slower.

60
00:03:40,410 --> 00:03:45,810
I'm paying $10 a month for Colab Pro, which I find to be a steal of a deal.

61
00:03:45,810 --> 00:03:51,750
That's quite good value for money, because these GPUs are roughly the speed of, like,

62
00:03:52,230 --> 00:03:59,100
an Nvidia 2060 or 3060 GPU, plus they come with 16 GB of RAM.

63
00:03:59,100 --> 00:04:00,630
So that's quite good.

64
00:04:01,320 --> 00:04:04,020
So we can just print out, like we've done before,

65
00:04:04,320 --> 00:04:05,430
just some specs

66
00:04:05,430 --> 00:04:11,070
of our data: how many training samples, how many test samples, for example, and the size.

67
00:04:11,790 --> 00:04:17,160
So it's always good to do this, always good to just inspect the data shape, to make sure it's what

68
00:04:17,160 --> 00:04:17,940
you expected.

69
00:04:17,940 --> 00:04:23,910
Because many times I've seen guys, myself included, load data and we don't actually load the right

70
00:04:23,910 --> 00:04:29,610
data or sometimes our test data set is blank because we specified a bad path somewhere.

71
00:04:30,240 --> 00:04:36,450
So now let's visualize some of the sample data here, which I mean, basically, you've seen that before

72
00:04:36,450 --> 00:04:42,240
up here, but now we want to visualize it with the class names. And to do that, all we do is loop

73
00:04:42,240 --> 00:04:44,110
through from 1 to 51.

74
00:04:44,910 --> 00:04:49,740
That's because the subplot index has to start at one.

75
00:04:50,280 --> 00:04:51,960
We just load the images here.

76
00:04:52,230 --> 00:04:59,190
So we're just going from 1 to 51, so basically the first 51 images, technically.

77
00:04:59,340 --> 00:04:59,670
But

78
00:04:59,840 --> 00:05:05,390
we're not looking at the zeroth image because we're starting at one, and we're just putting it into subplots with

79
00:05:05,390 --> 00:05:06,380
Matplotlib.

80
00:05:06,620 --> 00:05:12,500
So let's run this and we can see our categories and example images here.

81
00:05:13,250 --> 00:05:14,320
So this is pretty cool.

82
00:05:14,330 --> 00:05:20,150
This is also very useful because it's good to understand how your dataset was labeled because a lot

83
00:05:20,150 --> 00:05:25,340
of times errors can be found in the labeling by just simply inspecting it beforehand.

84
00:05:25,790 --> 00:05:27,160
It's always a good practice.

85
00:05:27,170 --> 00:05:29,660
It's something that we should all be doing.

86
00:05:29,870 --> 00:05:34,250
I don't always do it, and sometimes it comes back to haunt me.

87
00:05:34,640 --> 00:05:41,750
So make sure you do this. Next, you know the data preprocessing that we have to do.

88
00:05:41,960 --> 00:05:43,130
So what do we do with this?

89
00:05:43,130 --> 00:05:49,850
We get our rows and columns, which is 28 by 28, and we stick it into the input shape, because we use that as

90
00:05:49,850 --> 00:05:57,650
a variable and we pass it to the Keras Sequential model when we're building it next.
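
A minimal sketch of that preprocessing step follows. It uses a stand-in array of zeros instead of the real images so it runs on its own. The variable names are assumptions, and the scaling to the 0 to 1 range is a common step that the transcript does not explicitly mention.

```python
import numpy as np

# Stand-in for the real training images: 100 blank 28x28 grayscale images.
x_train = np.zeros((100, 28, 28), dtype=np.uint8)

# Read the image dimensions from the data rather than hard-coding them.
img_rows, img_cols = x_train.shape[1], x_train.shape[2]

# Grayscale images have one channel, so the model input shape is (28, 28, 1).
input_shape = (img_rows, img_cols, 1)

# Add the channel axis and scale pixel values to [0, 1] (assumed step).
x_train = x_train.reshape(x_train.shape[0], img_rows, img_cols, 1)
x_train = x_train.astype("float32") / 255.0

print(input_shape, x_train.shape)
```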

91
00:05:57,710 --> 00:05:59,120
Let's run that. Next,

92
00:05:59,120 --> 00:06:03,380
we have to do our one-hot encoding, which you've understood from before.

93
00:06:03,800 --> 00:06:10,490
So when something is like label four, label five, we have these 10 rows here and they're all

94
00:06:10,490 --> 00:06:14,220
zeros except the four, except the five, except the six.

95
00:06:14,240 --> 00:06:20,000
So you can understand that it's quite simple, and we do that by using the to_categorical function

96
00:06:20,000 --> 00:06:20,870
that we've loaded here.
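
To show what one-hot encoding produces, here is a small NumPy sketch. The lesson itself uses Keras's to_categorical; the one_hot helper below is a hypothetical stand-in written for illustration, not the library function.

```python
import numpy as np

def one_hot(labels, num_classes):
    """Hypothetical helper: turn integer labels into one-hot rows."""
    encoded = np.zeros((len(labels), num_classes), dtype="float32")
    # For row i, set the column given by labels[i] to 1.
    encoded[np.arange(len(labels)), labels] = 1.0
    return encoded

# Labels four, five and six, as in the example above.
y = np.array([4, 5, 6])
y_encoded = one_hot(y, 10)
print(y_encoded)
```

Each row has ten entries, all zero except for a single one at the position of the label.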

97
00:06:21,860 --> 00:06:25,040
We've seen all this before, which is why I'm going a bit fast.

98
00:06:25,040 --> 00:06:30,440
Apologies. If you missed or forgot the other lesson, you can always go back to,

99
00:06:30,680 --> 00:06:32,510
let's see, this one here.

100
00:06:32,680 --> 00:06:34,640
We explain it in quite a bit of detail.

101
00:06:35,570 --> 00:06:38,240
So now let's move on to building our model.

102
00:06:38,600 --> 00:06:44,300
So again, I'm not going to go through all the steps that we did previously where I showed you how to

103
00:06:44,300 --> 00:06:48,470
set the kernel size and set ReLU and all those things. You've seen it before.

104
00:06:48,470 --> 00:06:50,810
And here are some extra notes just in case you forgot.

105
00:06:51,890 --> 00:06:55,190
So let's go ahead and build this model.

106
00:06:55,940 --> 00:06:59,390
And you can see we just have to import the layers beforehand right up here.

107
00:06:59,780 --> 00:07:05,480
We also import SGD, which is the stochastic gradient descent optimizer that we'll be using.

108
00:07:05,990 --> 00:07:12,770
So you may have noticed, if you have a good memory, that this is the same CNN that we trained MNIST on.

109
00:07:13,250 --> 00:07:20,450
So in theory, this same CNN architecture should work well for the Fashion-MNIST dataset, because it's

110
00:07:20,450 --> 00:07:23,960
28 by 28 and it's grayscale images as well.

111
00:07:24,680 --> 00:07:27,860
However, that theory may not always hold, but let's find out.
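
For readers without the notebook in front of them, here is a sketch of what a small CNN of this kind could look like in Keras. The transcript does not spell out the layers, so the architecture (two convolution layers, one pooling layer, one hidden dense layer), the loss, and the SGD settings below are all assumptions, not the lesson's exact model.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import SGD

# Assumed architecture: a small CNN for 28x28 grayscale inputs and 10 classes.
model = Sequential([
    Input(shape=(28, 28, 1)),
    Conv2D(32, kernel_size=(3, 3), activation="relu"),
    Conv2D(64, kernel_size=(3, 3), activation="relu"),
    MaxPooling2D(pool_size=(2, 2)),
    Flatten(),
    Dense(128, activation="relu"),
    Dense(10, activation="softmax"),  # one output probability per class
])

# Assumed training setup: one-hot labels, SGD with illustrative hyperparameters.
model.compile(loss="categorical_crossentropy",
              optimizer=SGD(learning_rate=0.001, momentum=0.9),
              metrics=["accuracy"])

# Prints the layers and the parameter counts.
model.summary()
```

Note that nothing here regularizes the network (no dropout, weight decay, or augmentation), which matches the point of this lesson: train a plain baseline first.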

112
00:07:29,000 --> 00:07:31,010
And now we're ready to compile our model.

113
00:07:32,030 --> 00:07:38,870
So we do that and print out the parameters. You can see it's the same as we saw previously, and now we're

114
00:07:38,870 --> 00:07:40,100
ready to train our model.

115
00:07:40,790 --> 00:07:42,070
So let's train this model now.

116
00:07:42,090 --> 00:07:46,930
It's going to take 21 seconds per epoch, and we're training it for 15 epochs here.

117
00:07:46,940 --> 00:07:48,940
So this will take a little while.

118
00:07:48,950 --> 00:07:52,700
But let's do it, and let's fast forward it to the end when it's complete.

119
00:08:07,830 --> 00:08:08,940
OK, there we go.

120
00:08:09,480 --> 00:08:13,530
So you can see it's finished training and accuracy is actually pretty decent.

121
00:08:14,010 --> 00:08:17,160
However, I want you to note something here that's quite important.

122
00:08:17,730 --> 00:08:21,270
You can see that validation accuracy peaked quite early on.

123
00:08:21,270 --> 00:08:25,310
It peaked maybe at the 7th epoch. And afterward,

124
00:08:25,320 --> 00:08:26,070
What do we see?

125
00:08:26,250 --> 00:08:30,600
You can see it's actually slightly decreasing. It actually peaked here, but then slightly decreased here.

126
00:08:31,230 --> 00:08:34,110
And look at the training accuracy.

127
00:08:34,350 --> 00:08:39,870
You can see the training accuracy got better and better every time, for every epoch.

128
00:08:40,590 --> 00:08:42,570
This is an example of overfitting.

129
00:08:43,080 --> 00:08:45,440
You can see it's overfitting now to the training data.

130
00:08:45,450 --> 00:08:46,390
It's getting quite good.

131
00:08:46,860 --> 00:08:50,460
However, performance on the validation data isn't that great.

132
00:08:50,910 --> 00:08:56,100
So this would be a very good time, if we were training for more epochs, to probably use early

133
00:08:56,100 --> 00:08:59,820
stopping, let's say early stopping with a patience of five,

134
00:09:00,060 --> 00:09:05,910
meaning that if the validation loss, or validation accuracy, whichever one we want to

135
00:09:05,910 --> 00:09:12,720
monitor, doesn't improve for five consecutive epochs, then we stop training.
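
The patience rule just described can be sketched in plain Python as below. This is an illustration of the idea, not the Keras implementation, and the validation losses are invented numbers. In Keras the same behaviour is typically obtained with the EarlyStopping callback, using monitor="val_loss" and patience=5.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch where training would stop, or None if it never does."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # New best validation loss: reset the counter.
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None

# Made-up validation losses: the best value is reached at epoch 3, then no improvement.
val_losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.40]
stop_epoch = early_stopping_epoch(val_losses, patience=5)
print("Training would stop at epoch", stop_epoch)
```

When monitoring accuracy instead of loss, the comparison flips: an improvement is an increase rather than a decrease.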

136
00:09:13,350 --> 00:09:18,080
So you can see here it didn't basically increase by much all this way here.

137
00:09:18,090 --> 00:09:23,640
So most likely, an early stopping algorithm would probably stop around here or here somewhere.

138
00:09:24,360 --> 00:09:26,420
So that brings us to the end of this lesson.

139
00:09:26,430 --> 00:09:27,350
I hope you enjoyed it.

140
00:09:27,360 --> 00:09:32,640
It should have served as a good recap on how to train a CNN on a different dataset.

141
00:09:32,640 --> 00:09:36,130
This time we're using Fashion-MNIST as opposed to the MNIST dataset.

142
00:09:36,630 --> 00:09:38,010
And what will we do next?

143
00:09:38,340 --> 00:09:42,630
Note the performance here: we get roughly 90 percent accuracy after 15 epochs.

144
00:09:43,020 --> 00:09:43,920
What we're going to do is,

145
00:09:43,950 --> 00:09:49,980
we're going to introduce a few of the regularization techniques, and we'll assess the performance and

146
00:09:49,980 --> 00:09:50,580
compare it.

147
00:09:50,910 --> 00:09:52,740
So I'll see you in the next lesson.

148
00:09:52,980 --> 00:09:53,400
Thank you.