1
00:00:00,060 --> 00:00:00,350
Hey.

2
00:00:00,420 --> 00:00:05,790
Welcome back to the course. In this section, we'll take a look at how we can use Keras to implement

3
00:00:05,790 --> 00:00:07,500
transfer learning and fine tuning.

4
00:00:08,040 --> 00:00:12,060
So go ahead and open Notebook 21 and let's get started.

5
00:00:12,810 --> 00:00:17,820
So in this lesson, firstly, let's load our libraries here, and here's what we're going to do.

6
00:00:17,850 --> 00:00:20,550
We're going to explore what trainable layers are.

7
00:00:21,270 --> 00:00:25,590
Take a look at the weights in them, and understand how we can freeze layers in Keras.

8
00:00:25,620 --> 00:00:33,010
So firstly, let's create a very simple dense layer here with four nodes, and then we use layer dot

9
00:00:33,030 --> 00:00:35,040
build to build that network.

10
00:00:35,520 --> 00:00:40,890
And then by doing that, we can now explore, get the weights, get the number of trainable weights and

11
00:00:40,890 --> 00:00:42,660
get the number of non-trainable weights.

12
00:00:42,660 --> 00:00:43,980
Let's run this block of code.

13
00:00:44,550 --> 00:00:49,920
And you can see in this layer, we have two weights which are trainable and zero that are non-trainable

14
00:00:49,920 --> 00:00:50,280
here.
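A minimal sketch of the step just described, assuming TensorFlow's bundled Keras is installed. The input width of 3 is an arbitrary choice for illustration and is not taken from the notebook.

```python
from tensorflow import keras

# A dense layer with four units. build() creates the weights for a known
# input shape (batches of 3-feature vectors; the 3 is an assumption).
layer = keras.layers.Dense(4)
layer.build((None, 3))

# A dense layer holds two weight tensors: the kernel and the bias.
print("weights:", len(layer.weights))
print("trainable weights:", len(layer.trainable_weights))
print("non-trainable weights:", len(layer.non_trainable_weights))
```

Both weight tensors show up as trainable, which is the default for layers like this one.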

15
00:00:50,460 --> 00:00:51,690
So that's quite interesting.

16
00:00:52,320 --> 00:00:56,580
Now note that in general all weights are trainable, with the exception of the batch norm layer.

17
00:00:56,940 --> 00:01:00,810
It uses non-trainable weights to keep track of the mean and variance during training.
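As a rough illustration of the batch norm exception, here is a small sketch that assumes TensorFlow's bundled Keras. The input width of 4 is arbitrary and only there so the layer can be built.

```python
from tensorflow import keras

# Batch normalization keeps a learned scale and shift (trainable) plus a
# moving mean and moving variance (non-trainable, updated during training).
bn = keras.layers.BatchNormalization()
bn.build((None, 4))  # the feature width of 4 is an assumption

print("weights:", len(bn.weights))
print("trainable weights:", len(bn.trainable_weights))
print("non-trainable weights:", len(bn.non_trainable_weights))
```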

18
00:01:01,770 --> 00:01:04,650
So now that you've seen that, let's take a look at this here.

19
00:01:05,010 --> 00:01:09,270
Let's make a model with two layers here, and we're going to freeze one of them here.

20
00:01:09,720 --> 00:01:15,060
So layer one and layer two are both dense layers; this one is using ReLU, this one is

21
00:01:15,060 --> 00:01:16,560
using sigmoid activation.

22
00:01:17,310 --> 00:01:21,750
And then we just use Sequential here to combine them into the model.

23
00:01:22,290 --> 00:01:27,990
And what we'll do, we're going to freeze the first layer. To freeze it, we have to set

24
00:01:27,990 --> 00:01:33,210
the .trainable attribute, that is, setting trainable for this layer to False.

25
00:01:34,170 --> 00:01:37,670
And also, we can keep a copy of the weights of layer one for later reference.

26
00:01:37,670 --> 00:01:42,000
So we just call get_weights on layer one and store the result in a variable here,

27
00:01:42,030 --> 00:01:47,250
initial layer one weights values. Now we can train this network, so we're actually going to compile

28
00:01:47,250 --> 00:01:53,280
it and train it with some random data here, just to check that the weights don't change

29
00:01:53,640 --> 00:01:56,580
for layer one, where we set trainable equal to False.

30
00:01:57,180 --> 00:02:02,460
So that's finished training, because it's a simple prototype here, and now what we can do, we can get

31
00:02:02,730 --> 00:02:03,360
the layer one.

32
00:02:03,360 --> 00:02:07,290
weights now, so we get the final layer one weights, which shouldn't have changed.

33
00:02:07,710 --> 00:02:13,350
And then we can just check them here so we can check each of these little weights here and see if they've

34
00:02:13,350 --> 00:02:14,190
changed or not.

35
00:02:15,150 --> 00:02:16,290
And they haven't changed.

36
00:02:16,530 --> 00:02:17,200
So that's good.
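The freeze-and-check experiment can be sketched roughly like this, assuming TensorFlow's bundled Keras and NumPy. The layer sizes, the optimizer, the loss and the random data are illustrative assumptions rather than the notebook's exact values.

```python
import numpy as np
from tensorflow import keras

# Two dense layers combined into a Sequential model (sizes are assumptions).
layer1 = keras.layers.Dense(3, activation="relu")
layer2 = keras.layers.Dense(3, activation="sigmoid")
model = keras.Sequential([keras.Input(shape=(3,)), layer1, layer2])

# Freeze the first layer and keep a copy of its weights for later reference.
layer1.trainable = False
initial_layer1_weights_values = layer1.get_weights()

# Train briefly on random data.
model.compile(optimizer="adam", loss="mse")
model.fit(np.random.random((2, 3)), np.random.random((2, 3)), verbose=0)

# The frozen layer's kernel and bias should be unchanged after training.
final_layer1_weights_values = layer1.get_weights()
for before, after in zip(initial_layer1_weights_values, final_layer1_weights_values):
    np.testing.assert_allclose(before, after)
print("layer1 weights unchanged")
```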

37
00:02:17,880 --> 00:02:19,080
So what does that mean?

38
00:02:19,230 --> 00:02:21,220
That means that now,

39
00:02:21,240 --> 00:02:27,840
You've understood how we can set different layers to trainable and how we can explore and freeze those

40
00:02:27,840 --> 00:02:29,580
weights in the network while training.

41
00:02:30,090 --> 00:02:32,040
So now let's implement transfer learning.

42
00:02:32,340 --> 00:02:38,790
So this is the transfer learning workflow, which you've seen before: we instantiate a base model, load

43
00:02:38,790 --> 00:02:39,210
pre-trained

44
00:02:39,210 --> 00:02:40,140
weights into it.

45
00:02:40,530 --> 00:02:45,090
We freeze all the layers in the base model by setting trainable to False.

46
00:02:45,720 --> 00:02:50,850
And then we create a new top layer head of the model that's basically a fully connected head.

47
00:02:51,270 --> 00:02:53,520
And then we just start training on our new dataset.

48
00:02:53,670 --> 00:02:57,180
This is using the feature extraction method of transfer learning.

49
00:02:58,260 --> 00:03:02,640
So let's load a base model with our pre-trained ImageNet weights.

50
00:03:05,540 --> 00:03:11,240
So the dataset we'll be working with here is actually the cats vs. dogs dataset that's available

51
00:03:11,240 --> 00:03:17,510
from this package called tensorflow_datasets, which we'll import as tfds, and we

52
00:03:17,510 --> 00:03:19,280
can directly load this dataset here.

53
00:03:19,340 --> 00:03:24,950
There are a number of datasets available here as well, and we can just specify the split that we want right

54
00:03:24,950 --> 00:03:27,550
here and we get our dataset here.

55
00:03:27,560 --> 00:03:28,850
So this is a bit slow.

56
00:03:28,850 --> 00:03:35,330
This will probably download the first time; you can see it downloading there, and the output you're

57
00:03:35,330 --> 00:03:40,970
going to get is, it's going to print the number of training samples in the train set and then

58
00:03:40,970 --> 00:03:42,440
in the validation and the test sets.

59
00:03:42,440 --> 00:03:42,830
Yes.

60
00:03:43,100 --> 00:03:44,450
So let's wait for this.

61
00:03:46,010 --> 00:03:47,480
It should finish shortly.

62
00:03:48,350 --> 00:03:53,420
OK, so we have the dataset loaded here and you can see the number of training samples.

63
00:03:53,420 --> 00:03:58,940
We have nine thousand three hundred and five; validation samples, twenty-three hundred; and twenty-three

64
00:03:58,940 --> 00:04:00,170
hundred as well for the test set.
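A possible way to load the data, sketched under the assumption that the tensorflow_datasets package is installed. The 40% / 10% / 10% split strings are an assumption (they are consistent with the sample counts mentioned above), and the helper name load_cats_vs_dogs is hypothetical.

```python
def load_cats_vs_dogs():
    """Load train, validation and test splits of the cats vs. dogs dataset.

    The import is done lazily because the first call downloads the data.
    """
    import tensorflow_datasets as tfds

    train_ds, validation_ds, test_ds = tfds.load(
        "cats_vs_dogs",
        # Assumed split: 40% for training, 10% each for validation and test.
        split=["train[:40%]", "train[40%:50%]", "train[50%:60%]"],
        as_supervised=True,  # yield (image, label) pairs
    )
    return train_ds, validation_ds, test_ds
```

Calling load_cats_vs_dogs() returns three tf.data datasets, and the sample count of each can be checked with its cardinality() method.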

65
00:04:00,800 --> 00:04:04,610
So let's take a look at visualizing some of the images here.

66
00:04:04,850 --> 00:04:10,190
So let's run this block of code, and how this works

67
00:04:10,190 --> 00:04:11,180
here is that you take,

68
00:04:11,430 --> 00:04:16,190
you use the take function here from the dataset that was loaded here.

69
00:04:16,220 --> 00:04:21,380
That's this object here, train_ds or test_ds, whatever you want to use.

70
00:04:21,860 --> 00:04:25,100
And you can use the .take function here to take a number of images.

71
00:04:25,430 --> 00:04:30,900
And then we just create nine subplots with the Matplotlib code here, and we just set the title: it's

72
00:04:30,900 --> 00:04:37,490
a cat if the label is zero, or else it's a dog, which is one.

73
00:04:38,210 --> 00:04:41,270
So we can see, we can visualize those labels here.

74
00:04:41,810 --> 00:04:44,420
So this is quite a cool and nice way to visualize it.

75
00:04:45,050 --> 00:04:47,540
Now what we do is standardize our data here.

76
00:04:47,540 --> 00:04:55,520
So we're going to use this map lambda function here with tf.image.resize to resize all the images

77
00:04:55,520 --> 00:04:59,920
to this size here in the train, validation and test datasets.

78
00:04:59,930 --> 00:05:05,570
So let's do that, and we'll also normalize the pixel values to between minus one and one.

79
00:05:07,340 --> 00:05:15,410
And now what we do here, we're going to set a batch size here, and we'll just use this prefetch buffer

80
00:05:15,410 --> 00:05:15,710
here.

81
00:05:16,250 --> 00:05:20,460
This may be a bit confusing for you guys; I actually use this occasionally in PyTorch.

82
00:05:20,960 --> 00:05:26,300
This is just a way to load the data into your RAM, it's called caching, so you can

83
00:05:26,300 --> 00:05:29,690
actually get much faster access to it to improve your training speed.
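The resize, batch, cache and prefetch steps might look roughly like this, assuming TensorFlow is installed. The random stand-in images, the batch size of 4 and the buffer size of 10 are assumptions chosen so the sketch runs quickly without the real dataset.

```python
import tensorflow as tf

size = (150, 150)
batch_size = 4  # small value for illustration only

# Stand-in dataset: 8 random "images" of a different size, with 0/1 labels.
images = tf.random.uniform((8, 200, 180, 3), maxval=255.0)
labels = tf.constant([0, 1] * 4)
ds = tf.data.Dataset.from_tensor_slices((images, labels))

# Resize every image to the same size with a map + lambda.
ds = ds.map(lambda x, y: (tf.image.resize(x, size), y))

# Cache the resized data in memory, batch it, and prefetch upcoming batches
# so the next batch is being prepared while the current one is in use.
ds = ds.cache().batch(batch_size).prefetch(buffer_size=10)

for batch_images, batch_labels in ds.take(1):
    print(batch_images.shape, batch_labels.shape)
```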

84
00:05:30,170 --> 00:05:35,120
So this is all part of TensorFlow's tf.data Dataset API here.

85
00:05:35,300 --> 00:05:38,510
So don't worry if it's unfamiliar, it's just because it's part of that.

86
00:05:38,510 --> 00:05:42,470
It's not really part of Keras, so we're just learning new things.

87
00:05:42,740 --> 00:05:47,900
So now what we do here is our augmentation pipeline.

88
00:05:48,050 --> 00:05:53,330
So we're going to introduce random flips, horizontal flips and some random rotation.

89
00:05:53,810 --> 00:06:00,500
You can do any sort of augmentation you want, but we can visualize the augmentations here by running

90
00:06:00,500 --> 00:06:05,750
it through this little loop here, and it's running the augmentations here

91
00:06:06,530 --> 00:06:07,580
to get the augmented image.

92
00:06:08,120 --> 00:06:14,030
So you can see we've taken this dog image here and we've just flipped it around and we're twisting it

93
00:06:14,030 --> 00:06:14,690
slightly.

94
00:06:15,080 --> 00:06:17,900
So we're getting different versions of the same image in our training dataset.

95
00:06:17,900 --> 00:06:18,590
So that's good.
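A small sketch of such an augmentation pipeline, assuming a TensorFlow version where RandomFlip and RandomRotation are available directly under keras.layers. The rotation factor of 0.1 and the random stand-in image are assumptions.

```python
import tensorflow as tf
from tensorflow import keras

# Random horizontal flips plus a small random rotation.
data_augmentation = keras.Sequential(
    [
        keras.layers.RandomFlip("horizontal"),
        keras.layers.RandomRotation(0.1),  # rotation factor is an assumption
    ]
)

# A random stand-in for one 150x150 RGB image (batch dimension of 1).
image = tf.random.uniform((1, 150, 150, 3), maxval=255.0)

# training=True makes the random transformations active outside of fit().
augmented = data_augmentation(image, training=True)
print(augmented.shape)
```

Calling the pipeline several times on the same image gives a different variant each time, which is what the visualization loop relies on.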

96
00:06:19,100 --> 00:06:19,730
Next.

97
00:06:20,060 --> 00:06:23,060
Next, we're going to actually construct a model.

98
00:06:23,150 --> 00:06:29,120
So basically, that means that we are taking the base model that we freeze and attaching our new head to

99
00:06:29,120 --> 00:06:29,370
it.

100
00:06:29,420 --> 00:06:31,410
So let's take a look and see how we do that.

101
00:06:31,410 --> 00:06:36,620
So the model we're going to be loading is the Xception model here, and this is how you load it here,

102
00:06:36,770 --> 00:06:39,410
with keras dot Xception, sorry, keras dot applications

103
00:06:39,410 --> 00:06:40,550
dot Xception.

104
00:06:41,150 --> 00:06:48,350
Then you just point it to the ImageNet weights and set the input image size, which is 150 by 150 by three,

105
00:06:48,350 --> 00:06:49,370
that is, RGB.

106
00:06:49,820 --> 00:06:52,070
And notice this is an important line here.

107
00:06:52,580 --> 00:06:58,680
We set include_top equal to False, so that means that we don't include the top layer; that's solely

108
00:06:58,700 --> 00:06:59,960
the output nodes.

109
00:07:00,440 --> 00:07:04,910
That was for ImageNet, which was a thousand nodes, and we don't include that fully connected layer

110
00:07:04,910 --> 00:07:05,240
there.

111
00:07:06,050 --> 00:07:09,280
So what we do next, we freeze the base model here.

112
00:07:09,290 --> 00:07:11,060
So you would have seen this before.

113
00:07:11,480 --> 00:07:14,030
So we set base_model.trainable to False.

114
00:07:14,030 --> 00:07:18,530
And now what we do next is we create the top model head.

115
00:07:19,070 --> 00:07:20,030
So to do that.

116
00:07:20,270 --> 00:07:21,560
So it's a bit confusing.

117
00:07:21,560 --> 00:07:23,080
But there are many things we have to do here.

118
00:07:23,090 --> 00:07:28,980
So firstly, we have our inputs here and that basically is the input shape of our model.

119
00:07:28,980 --> 00:07:31,520
So we have keras.Input just defining that input shape.

120
00:07:31,560 --> 00:07:35,300
Here we have our data augmentation, which we defined previously.

121
00:07:35,870 --> 00:07:37,550
And then what we do next.

122
00:07:38,240 --> 00:07:45,020
The pre-trained Xception weights require that the input be scaled from zero to 255 to a range of minus

123
00:07:45,020 --> 00:07:46,010
one to plus one.

124
00:07:46,610 --> 00:07:50,320
And then the rescaling layer here basically outputs this here.

125
00:07:50,330 --> 00:07:57,020
So this is how we actually have to use keras.layers.Rescaling, so we scale it to here, the

126
00:07:57,020 --> 00:07:58,040
minus one to one range.
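The arithmetic behind that rescaling can be sketched in plain Python. The scale of 1/127.5 and offset of -1 are the values that map 0 to 255 onto minus one to one; the helper name rescale is hypothetical.

```python
def rescale(pixel, scale=1 / 127.5, offset=-1.0):
    """Map a pixel value in [0, 255] to [-1, 1] as pixel * scale + offset."""
    return pixel * scale + offset


# The darkest pixel lands near -1, mid-gray near 0, the brightest near +1.
for value in (0, 127.5, 255):
    print(value, "->", rescale(value))
```

A Keras Rescaling layer configured with the same scale and offset applies this formula to every pixel of the input.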

127
00:07:58,580 --> 00:08:03,320
So don't worry about this if you don't understand it. There's a lot of tricky, hard to comprehend

128
00:08:03,320 --> 00:08:03,920
code sometimes

129
00:08:04,000 --> 00:08:09,430
with Keras, because there are things that do very specific things like this.

130
00:08:09,730 --> 00:08:11,460
However, don't worry about it too much.

131
00:08:11,470 --> 00:08:16,120
It's boilerplate code that you can reuse easily for your own dataset.

132
00:08:16,750 --> 00:08:18,130
So next, we just create.

133
00:08:18,730 --> 00:08:22,610
We just pass this input x here to that scaling layer here.

134
00:08:22,630 --> 00:08:24,670
This is where all the scaling happens.

135
00:08:25,300 --> 00:08:28,150
And then what we do now, we have our base model here.

136
00:08:28,150 --> 00:08:29,320
So this is the inputs here.

137
00:08:29,800 --> 00:08:35,830
We have inputs going into our base model, and then notice the new head that we've attached to the model.

138
00:08:36,340 --> 00:08:38,890
So we have a global average pooling layer here.

139
00:08:39,430 --> 00:08:44,860
Then that outputs to another layer that has dropout, and then we have a final dense layer, which is just

140
00:08:44,860 --> 00:08:50,050
one output node because it's a binary output model, just two classes.
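Putting those pieces together might look roughly like the sketch below, assuming TensorFlow's bundled Keras. The helper build_transfer_model, the dropout rate of 0.2, the rotation factor and the tiny stand-in base are all assumptions; the stand-in is there only so the sketch runs without downloading weights.

```python
from tensorflow import keras


def build_transfer_model(base_model, input_shape=(150, 150, 3)):
    """Attach augmentation, rescaling and a one-node head to a frozen base."""
    base_model.trainable = False  # freeze every layer of the base

    inputs = keras.Input(shape=input_shape)
    x = keras.layers.RandomFlip("horizontal")(inputs)
    x = keras.layers.RandomRotation(0.1)(x)
    # Scale pixels from [0, 255] to [-1, 1].
    x = keras.layers.Rescaling(scale=1 / 127.5, offset=-1)(x)
    # training=False keeps the base in inference mode.
    x = base_model(x, training=False)
    x = keras.layers.GlobalAveragePooling2D()(x)
    x = keras.layers.Dropout(0.2)(x)  # dropout rate is an assumption
    outputs = keras.layers.Dense(1)(x)  # one output node for two classes
    return keras.Model(inputs, outputs)


# Hypothetical stand-in base. In the lesson the base would instead be:
# keras.applications.Xception(weights="imagenet",
#                             input_shape=(150, 150, 3), include_top=False)
stand_in_base = keras.Sequential(
    [keras.Input(shape=(150, 150, 3)), keras.layers.Conv2D(8, 3, activation="relu")]
)
model = build_transfer_model(stand_in_base)
model.summary()
```

Only the final dense layer contributes trainable weights here, because everything in the base is frozen.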

141
00:08:50,650 --> 00:08:55,060
So let's run this and analyze the summary that comes out of it here.

142
00:08:55,450 --> 00:09:01,270
You can see we only have about two thousand trainable parameters, and we have about 20 million parameters that we

143
00:09:01,270 --> 00:09:02,230
aren't training.

144
00:09:02,470 --> 00:09:05,170
Those are fixed to the ones that have been trained on ImageNet.

145
00:09:05,680 --> 00:09:08,530
So those are frozen and these are the ones that are unfrozen here.
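Where the roughly two thousand trainable parameters come from can be worked out by hand. This sketch assumes the pooled Xception features have 2048 channels and uses the published parameter count of Xception without its top as the frozen total.

```python
# Assumption: global average pooling over Xception's output gives 2048 features.
features = 2048
units = 1  # single output node

# A dense layer has one weight per (feature, unit) pair plus one bias per unit.
head_params = features * units + units

# Assumption: parameter count of Xception without its classification top.
frozen_params = 20_861_480

print("trainable (head):", head_params)
print("frozen (base):", frozen_params)
print("trainable share:", head_params / (head_params + frozen_params))
```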

146
00:09:09,340 --> 00:09:11,590
So now let's train our top layer.

147
00:09:11,770 --> 00:09:13,120
So let's do this.

148
00:09:13,930 --> 00:09:17,940
And you can see we've started training already and this is quite simple.

149
00:09:17,950 --> 00:09:19,600
We just have to compile the model here.

150
00:09:19,990 --> 00:09:25,750
We use the Adam optimizer here, and then we use binary cross-entropy because it's a binary output

151
00:09:25,750 --> 00:09:29,380
model, and we set from_logits equal to True.

152
00:09:29,980 --> 00:09:34,660
And then we just have the metrics here being keras.metrics.BinaryAccuracy.
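Why from_logits is set to True: the final dense layer has no sigmoid, so it outputs raw scores (logits). The sketch below, in plain Python, compares a numerically stable cross-entropy computed directly from a logit with the textbook formula applied after a sigmoid. Both function names are hypothetical, and this illustrates the idea rather than Keras's internal code.

```python
import math


def bce_from_logits(logit, label):
    """Binary cross-entropy computed directly from a raw score (stable form)."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def bce_from_probability(logit, label):
    """Same loss via an explicit sigmoid followed by the textbook formula."""
    p = 1.0 / (1.0 + math.exp(-logit))
    return -(label * math.log(p) + (1 - label) * math.log(1 - p))


# The two versions agree for moderate logits.
for logit, label in [(2.0, 1), (2.0, 0), (-1.5, 1), (0.0, 0)]:
    print(logit, label, bce_from_logits(logit, label), bce_from_probability(logit, label))
```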

153
00:09:35,260 --> 00:09:40,540
Right now, I believe we are training on CPU, which is why it is so slow.

154
00:09:40,570 --> 00:09:41,450
Yes, it is.

155
00:09:41,510 --> 00:09:48,130
So yeah, I'm going to stop this for now because there's no point in training on the CPU, because at ten

156
00:09:48,130 --> 00:09:50,020
minutes per epoch, it's going to take all night.

157
00:09:50,800 --> 00:09:54,190
So what I'll do next is we'll take a look at fine tuning.

158
00:09:54,190 --> 00:09:59,200
So remember, in fine-tuning, we unfreeze the base model, or parts of it.

159
00:10:00,100 --> 00:10:02,500
And then we use a low learning rate here.

160
00:10:03,130 --> 00:10:05,290
So let's take a look and see how we do that.

161
00:10:05,290 --> 00:10:09,790
So now you can see we've set base_model.trainable to be equal to True.

162
00:10:10,270 --> 00:10:10,600
OK.

163
00:10:10,960 --> 00:10:15,940
So in this case, we're training the whole base model, but at a very low learning rate. You can

164
00:10:15,940 --> 00:10:19,970
see we've set the learning rate to one times 10 to the minus five.

165
00:10:19,990 --> 00:10:26,200
It's a tiny learning rate, and let's compile this model and print the model summary here.
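The fine-tuning step could be sketched as follows, assuming TensorFlow's bundled Keras. The helper prepare_for_fine_tuning and the tiny dense stand-in model are hypothetical, used here so the effect of unfreezing can be seen without the real Xception base.

```python
from tensorflow import keras


def prepare_for_fine_tuning(model, base_model, learning_rate=1e-5):
    """Unfreeze the base and recompile with a very low learning rate."""
    base_model.trainable = True
    # Recompiling is needed so the change to trainable takes effect in training.
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate),
        loss=keras.losses.BinaryCrossentropy(from_logits=True),
        metrics=[keras.metrics.BinaryAccuracy()],
    )
    return model


# Hypothetical stand-in: a small frozen base with a one-node head on top.
base_model = keras.Sequential(
    [keras.Input(shape=(8,)), keras.layers.Dense(4, activation="relu")]
)
base_model.trainable = False
inputs = keras.Input(shape=(8,))
x = base_model(inputs, training=False)
outputs = keras.layers.Dense(1)(x)
model = keras.Model(inputs, outputs)

frozen_count = len(model.trainable_weights)  # only the head's kernel and bias
prepare_for_fine_tuning(model, base_model)
unfrozen_count = len(model.trainable_weights)  # head plus the base's weights
print(frozen_count, unfrozen_count)
```

The low learning rate keeps the updates small so the pre-trained weights are adjusted gently rather than overwritten.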

166
00:10:26,800 --> 00:10:28,850
And again, we're on the CPU.

167
00:10:28,870 --> 00:10:33,960
So this is going to take a while, but hopefully you enjoyed this lesson.

168
00:10:33,970 --> 00:10:37,570
It's quite easy to do transfer learning in Keras compared to PyTorch, in my opinion.

169
00:10:37,960 --> 00:10:41,140
However, doing it in PyTorch isn't that bad.

170
00:10:42,640 --> 00:10:46,180
PyTorch Lightning actually does make it a tad confusing sometimes, but it's OK.

171
00:10:47,170 --> 00:10:53,050
As you can see, this is why we don't train on CPUs, especially when we have all these tunable parameters.

172
00:10:53,620 --> 00:10:59,890
This is going to take roughly an hour per epoch, and at ten epochs, that's going to take all night.

173
00:10:59,900 --> 00:11:01,690
I mean, if you wanted to wait, you could.

174
00:11:02,230 --> 00:11:04,900
It's just probably not worth our time right now.

175
00:11:05,500 --> 00:11:07,650
So I'm going to stop this lesson here.

176
00:11:07,690 --> 00:11:09,910
And then next, we'll take a look at.

177
00:11:11,860 --> 00:11:16,390
using CNNs as feature extractors with Keras, so I'll see you in the next lesson.

178
00:11:16,570 --> 00:11:16,990
Thank you.