1
00:00:00,360 --> 00:00:01,110
Hi, guys.

2
00:00:01,110 --> 00:00:06,990
Welcome to this session on modeling, in which we are going to build our own model, which lets us

3
00:00:06,990 --> 00:00:15,330
pass in these kinds of inputs and tells us whether the input is that of a person who is sad, angry,

4
00:00:15,330 --> 00:00:17,700
or happy. In this session,

5
00:00:17,700 --> 00:00:24,090
we are going to start with the LeNet model, which we saw in the previous session, while modifying some

6
00:00:24,090 --> 00:00:31,440
parameters to suit the problem we're trying to solve and then move on to even more complex and better

7
00:00:31,440 --> 00:00:33,390
computer vision models.

8
00:00:33,390 --> 00:00:36,780
We copy the code from the previous session and paste it out here.

9
00:00:36,780 --> 00:00:39,120
So there it is.

10
00:00:39,120 --> 00:00:47,060
We have this LeNet model which we saw in the previous example, and then we also have

11
00:00:47,070 --> 00:00:49,620
this number of classes here which we have to set.

12
00:00:50,610 --> 00:00:54,910
So that said, here we change this number of classes from 1 to 3.

13
00:00:54,930 --> 00:01:00,250
Now, the reason why we're doing this is simply because in the previous session we were building the

14
00:01:00,270 --> 00:01:03,450
LeNet model, which takes in images and outputs

15
00:01:03,450 --> 00:01:12,460
whether these images are those of a parasitic or a non-parasitic cell.

16
00:01:12,480 --> 00:01:19,680
Now, in this example, we're taking in images and our model has to decide whether each is that of an angry

17
00:01:19,680 --> 00:01:24,630
person, a sad person or a happy person.

18
00:01:24,630 --> 00:01:25,620
So we see that here.

19
00:01:25,620 --> 00:01:31,980
We have three different classes, and because of that, we're going to change this from 1 to 3.

20
00:01:31,980 --> 00:01:33,150
So we have this.

21
00:01:33,150 --> 00:01:34,590
We run the cell.

22
00:01:34,590 --> 00:01:35,400
That's fine.

23
00:01:35,400 --> 00:01:41,280
We also have these other configurations: the learning rate, number of epochs, the dropout rate, the

24
00:01:41,280 --> 00:01:50,250
regularization rate, number of filters, kernel size, number of strides, pool size, number of layers,

25
00:01:50,970 --> 00:01:57,210
or rather outputs in this dense layer and a number of outputs in this other dense layer.

26
00:01:57,780 --> 00:02:00,420
So with that, we're just going to run the cell.

27
00:02:00,420 --> 00:02:06,630
Now, note that you could always get back to the previous sessions to understand each and every parameter

28
00:02:06,630 --> 00:02:09,000
we have here as we've discussed this already.

29
00:02:09,150 --> 00:02:13,050
So we're just simply replicating this LeNet model we've seen previously.

30
00:02:13,290 --> 00:02:16,410
Now, that said, you see the kind of output we get here.

31
00:02:16,410 --> 00:02:18,120
We have 6 million parameters.

32
00:02:18,120 --> 00:02:24,750
We have our different layers going from our Conv layers to our fully connected layers.

33
00:02:24,870 --> 00:02:31,050
Now the next point here is this activation right here.

34
00:02:31,050 --> 00:02:37,140
So this activation, in the previous case study, was the sigmoid.

35
00:02:37,140 --> 00:02:43,560
And this was because we were actually deciding whether an output will be one class or the other.

36
00:02:43,560 --> 00:02:51,690
It was a binary classification problem wherein we could use this kind of sigmoid activation function.

37
00:02:51,690 --> 00:02:53,400
With the sigmoid activation function,

38
00:02:53,400 --> 00:02:56,610
if you can recall, we have this input and we have this output.

39
00:02:56,610 --> 00:03:05,280
So let's say x gets in and y goes out, and here's 0, 0.5.

40
00:03:05,310 --> 00:03:09,570
Here, we have this, and 1.

41
00:03:09,570 --> 00:03:17,850
So what goes on here is as we take in higher values of X, the sigmoid function approaches one, while

42
00:03:17,850 --> 00:03:23,430
as we take in large negative values of X, the sigmoid function approaches zero.

43
00:03:23,430 --> 00:03:27,360
And so the range of values here is simply 0 to 1.

44
00:03:27,360 --> 00:03:34,680
And this is logical since in the binary classification kind of setting, we wanted to output a zero

45
00:03:34,680 --> 00:03:37,910
for one class or a one for another class.

46
00:03:37,920 --> 00:03:46,290
Now intermediate values like 0.7 will be taken to be either one of these classes, depending on our

47
00:03:46,290 --> 00:03:47,160
threshold.

48
00:03:47,370 --> 00:03:53,760
But then the role of the sigmoid here is to make sure that, based on all the

49
00:03:53,760 --> 00:04:01,470
inputs, we are able to have an output which always lies between zero and one as we could see.

50
00:04:01,490 --> 00:04:05,880
You see that we are always in this range here, 0 to 1.

51
00:04:06,330 --> 00:04:13,620
Now, in a case where we're having a multiclass problem, like in this case where we have three different

52
00:04:13,620 --> 00:04:21,300
classes like this and that, what we don't want is to just say, for example, here we have an output

53
00:04:21,300 --> 00:04:27,330
0.1, output 0.67, output, say, 1.

54
00:04:27,330 --> 00:04:36,520
So we do not want these kinds of outputs right here, since we're dealing with a multi-class problem with

55
00:04:36,540 --> 00:04:43,590
a single label, since here our output cannot be two of those classes.

56
00:04:43,590 --> 00:04:49,500
So in some problems or in some kind of problems, we may have a situation where the person is maybe

57
00:04:49,500 --> 00:04:53,580
sad and say angry at the same time.

58
00:04:53,730 --> 00:04:59,610
Now in those kinds of situations, you could afford to have these

59
00:04:59,630 --> 00:05:05,420
kinds of outputs where, say, these two here, these two classes, have high values. Nonetheless,

60
00:05:05,420 --> 00:05:13,010
in the kind of problem we're trying to solve, we want the model to choose or pick only one out of

61
00:05:13,010 --> 00:05:14,750
the three different classes.

62
00:05:14,750 --> 00:05:18,380
So we're not going to pick two or three.

63
00:05:18,380 --> 00:05:20,660
We're going to pick only one.

64
00:05:20,660 --> 00:05:27,590
And because we're going to pick only one, what we have here, what we try to do here is to make sure

65
00:05:27,590 --> 00:05:30,440
that our output sums up to one.

66
00:05:30,440 --> 00:05:37,760
So instead of having let's take this off, instead of having this kind of output, we'll have an output

67
00:05:37,760 --> 00:05:46,820
which sums up to one, such that, here, the one with the highest class

68
00:05:46,820 --> 00:05:53,690
or the one with the highest value is considered as that class which the model has selected.

69
00:05:54,170 --> 00:05:57,110
And so we could have different kinds of values.

70
00:05:57,110 --> 00:06:01,910
We could have, say, 0.1 here, 0.2.

71
00:06:01,910 --> 00:06:07,520
But here, since we already have 0.1 and 0.2, we're going to have 0.7.

72
00:06:07,730 --> 00:06:14,240
This is because we want the sum of all these values to give us one. Look at this: 0.3 plus

73
00:06:14,240 --> 00:06:16,130
0.7 gives one.

74
00:06:16,310 --> 00:06:22,790
So we make sure that our values here lie between zero and one.

75
00:06:22,790 --> 00:06:25,610
And when we sum them up, it gives us one.

76
00:06:26,390 --> 00:06:32,750
Now, an activation function which we could use in achieving this is the softmax function.

77
00:06:32,750 --> 00:06:34,910
So let's have here softmax.

78
00:06:34,910 --> 00:06:38,360
So instead of the sigmoid, now we have the softmax.

79
00:06:38,660 --> 00:06:41,720
And to better understand how the softmax works,

80
00:06:41,720 --> 00:06:43,570
so you clearly see the difference,

81
00:06:43,580 --> 00:06:47,720
let's take this example from the Analytics Vidhya website.

82
00:06:48,260 --> 00:06:53,930
In this example, they are considering that we have three classes in this output.

83
00:06:53,930 --> 00:06:59,510
Let's take this off. Here, instead of ten like what we have now, they have just these three classes

84
00:06:59,510 --> 00:07:00,380
right here.

85
00:07:00,590 --> 00:07:08,480
Now, after applying the softmax, we're going to have outputs here, just like when you have some

86
00:07:08,480 --> 00:07:09,110
output.

87
00:07:09,110 --> 00:07:13,400
After applying the sigmoid, you have this other output.

88
00:07:13,400 --> 00:07:19,340
So just like, for example, if you have an output, let's say five, after applying the sigmoid to five,

89
00:07:19,340 --> 00:07:23,210
you have a value like 0.99, whatever.

90
00:07:23,210 --> 00:07:24,860
So it's very, very close to one.

91
00:07:24,860 --> 00:07:31,790
Whereas if you pass in a value, say negative ten after passing in the sigmoid, you're going to have

92
00:07:31,790 --> 00:07:37,710
a value very close to zero, say 0.001. So this is how the sigmoid works.
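
As a quick check of the sigmoid behaviour just described, here is a minimal sketch in plain Python. The function name `sigmoid` and the sample inputs are only for illustration; they are not taken from the session's notebook.

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: squashes any real number into the open range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Large positive inputs land near 1, large negative inputs land near 0.
for x in (-10.0, 0.0, 5.0):
    print(f"sigmoid({x:+.1f}) = {sigmoid(x):.5f}")
```

The exact value for an input of -10 is even smaller than the 0.001 quoted above, but the point is the same: the output never leaves the range 0 to 1.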

93
00:07:37,730 --> 00:07:43,700
Now, the softmax function is quite different here.

94
00:07:43,700 --> 00:07:47,960
What goes on is we're going to make use of this formula here.

95
00:07:47,960 --> 00:07:59,840
So we have e to the power of x_i, that is, x for a particular class, divided by the sum of e to the power of x_c for

96
00:07:59,840 --> 00:08:00,710
all the classes.

97
00:08:00,830 --> 00:08:05,690
Now, this formula shouldn't scare you as we're going to explain how it works in detail here.

98
00:08:05,840 --> 00:08:08,210
Let's say that we have this input.

99
00:08:08,210 --> 00:08:15,080
Just like with the sigmoid, where you have the input coming from the output of the dense layer.

100
00:08:15,170 --> 00:08:20,180
Now here we have, obviously, three inputs because we have three different classes.

101
00:08:20,180 --> 00:08:23,270
So we have this one, this one and this one.

102
00:08:23,270 --> 00:08:25,300
Now you should note the values here.

103
00:08:25,310 --> 00:08:27,560
Let's take this off so you can see the value here.

104
00:08:27,680 --> 00:08:29,390
So you know the values.

105
00:08:29,390 --> 00:08:35,800
This one is 2.33, -1.46, 0.56.

106
00:08:35,810 --> 00:08:40,490
Now, once we have this, we're going to simply apply this formula, and the way it's applied is as such:

107
00:08:40,490 --> 00:08:45,620
you take this value 2.33, we have the exponential of that value.

108
00:08:45,620 --> 00:08:53,690
So you have e to the power of 2.33 as you see here of that particular class divided by the sum of all

109
00:08:53,690 --> 00:08:56,420
these exponentials for all the different classes.

110
00:08:56,420 --> 00:09:07,370
So you have e to the power of 2.33, there it is, divided by e to the power of 2.33 plus, here, e to the power of negative

111
00:09:07,370 --> 00:09:12,290
1.46 plus e to the power of 0.56.

112
00:09:12,290 --> 00:09:14,300
So you have all this summed up.

113
00:09:14,300 --> 00:09:17,720
That's basically what this formula here means for all the different classes.

114
00:09:17,720 --> 00:09:21,680
Sum up e to the power of the values that these classes take up.

115
00:09:21,860 --> 00:09:26,690
Now, once you have that for this first class, class one, you obtain this value.

116
00:09:26,690 --> 00:09:33,890
Now for the next class you have e to the power of -1.46 divided by this same sum here.

117
00:09:33,890 --> 00:09:39,440
So basically this sum is the same everywhere, but the only difference is the numerator.

118
00:09:39,440 --> 00:09:41,480
So you have this numerator which changes.

119
00:09:41,480 --> 00:09:45,590
Here we have 0.56, we have this other numerator and we have this value.

120
00:09:45,620 --> 00:09:53,030
Notice how, as the value goes towards high negative values, these outputs approach zero,

121
00:09:53,030 --> 00:09:56,930
while when the value goes towards a high positive value, the output approaches one.

122
00:09:56,930 --> 00:09:59,420
So this means that if we replace this and put a value,

123
00:09:59,510 --> 00:10:03,910
say ten, we would have a value even closer to one, like, say, 0.9.

124
00:10:04,030 --> 00:10:05,560
So that's it.

125
00:10:05,560 --> 00:10:08,230
And that's basically how the softmax works.

126
00:10:08,230 --> 00:10:16,840
So this means that at every given point when you sum up all these outputs, you have a total value equal to

127
00:10:16,840 --> 00:10:17,380
one.

128
00:10:17,380 --> 00:10:22,030
So you could take the 0.83 plus 0.01, that's 0.84.

129
00:10:22,060 --> 00:10:24,070
Let's work in two decimal places.

130
00:10:24,070 --> 00:10:29,500
So 0.84; 0.84 plus 0.14 is like 0.98.

131
00:10:29,500 --> 00:10:32,580
So you add these other parts up to give you one.

132
00:10:32,590 --> 00:10:38,770
So basically what you're doing with the softmax is you're taking some inputs for, let's say, three

133
00:10:38,770 --> 00:10:43,360
classes and then you're sharing the values.

134
00:10:43,360 --> 00:10:45,700
So you have one that you want to share.

135
00:10:45,700 --> 00:10:49,090
You have this number one which you want to share.

136
00:10:49,090 --> 00:10:54,370
You're going to give this one fraction of this whole one.

137
00:10:54,520 --> 00:10:57,130
In this case, the fraction is 0.83.

138
00:10:57,130 --> 00:11:02,800
And then you're going to give this other one's own fraction, this other one, its own fraction, such

139
00:11:02,800 --> 00:11:06,070
that at the end, all of these here sum up to equal one.
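
To make the softmax walkthrough concrete, here is a small sketch in plain Python that applies exp(x_i) / sum over c of exp(x_c) to the three inputs read out above (2.33, -1.46, 0.56). The function name `softmax` is illustrative, and subtracting the maximum before exponentiating is a common numerical-stability trick that I have added; it does not change the result.

```python
import math

def softmax(logits):
    """Return exp(x_i) / sum_c exp(x_c) for every entry of `logits`."""
    m = max(logits)                          # stability shift (an addition, not from the session)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)                        # the same denominator is shared by every class
    return [e / total for e in exps]

probs = softmax([2.33, -1.46, 0.56])
print([round(p, 2) for p in probs])          # each class gets its fraction of the whole "one"
print(sum(probs))                            # the fractions add up to one (up to rounding)
```

Rounded to two decimals this gives 0.84, 0.02 and 0.14, so the 0.83 and 0.01 quoted above look like truncated rather than rounded values.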

140
00:11:06,070 --> 00:11:11,620
From here, we're going to get straight into the training and start by designing our loss function.

141
00:11:11,620 --> 00:11:17,590
So we have this loss function, which is going to be the categorical cross entropy function.

142
00:11:17,590 --> 00:11:23,680
So unlike before, where we had the binary, now we have this categorical cross entropy. We have it

143
00:11:24,040 --> 00:11:25,480
in the documentation.

144
00:11:25,480 --> 00:11:29,500
You have tf.keras.losses, CategoricalCrossentropy.

145
00:11:29,500 --> 00:11:37,510
And then one of the arguments is this from_logits. Now from_logits, as given here... Oh, let's get back

146
00:11:37,510 --> 00:11:37,900
here.

147
00:11:37,900 --> 00:11:39,970
The from_logits by default is False.

148
00:11:39,970 --> 00:11:44,260
So by default we suppose that we have from_logits to be False.

149
00:11:44,260 --> 00:11:52,630
And as given here, this from_logits says whether the y_pred, and y_pred is the y which is

150
00:11:52,630 --> 00:11:54,280
outputted by the model.

151
00:11:54,280 --> 00:12:02,590
So this is whether the y_pred is expected to be a logits tensor or not, and by default we assume that

152
00:12:02,590 --> 00:12:05,320
y_pred encodes a probability distribution.

153
00:12:05,320 --> 00:12:12,030
So by default we are supposing that what the model is going to output.

154
00:12:12,040 --> 00:12:13,900
In our case we have three classes.

155
00:12:13,900 --> 00:12:20,080
What the model is going to output is going to be such that when we sum up these probabilities, or sum

156
00:12:20,080 --> 00:12:24,550
up these outputs here, it is going to give us a value of one.

157
00:12:26,200 --> 00:12:32,980
And so we have this by default, that's from_logits set to False.

158
00:12:32,980 --> 00:12:39,130
So by default we have from_logits to be False. When we have this, it simply means that we're supposing

159
00:12:39,130 --> 00:12:43,660
that what is going to get into this loss function is not going to be a logits tensor, but probabilities.

160
00:12:43,660 --> 00:12:50,170
And that's our case here because we have this softmax activation.

161
00:12:50,170 --> 00:12:56,320
Now in case we didn't have this softmax activation, then we would have needed to specify this or set

162
00:12:56,320 --> 00:13:02,080
this to True, since what gets into the categorical cross entropy would then be a logits tensor.
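
The two settings of from_logits can be sketched in plain Python. This is only an illustration of the idea, not the Keras implementation: `cce_from_probs` stands for the default case (the model ends in softmax and outputs probabilities), `cce_from_logits` stands for from_logits=True (the loss receives raw dense-layer outputs and applies the softmax itself). The function names and the one-hot label are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cce_from_probs(y_true, probs):
    """Cross entropy when the inputs are already probabilities (like from_logits=False)."""
    return -sum(t * math.log(p) for t, p in zip(y_true, probs) if t > 0)

def cce_from_logits(y_true, logits):
    """Cross entropy on raw scores (like from_logits=True): softmax is folded into the loss."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_sum_exp) for t, x in zip(y_true, logits))

logits = [2.33, -1.46, 0.56]   # raw scores, reusing the softmax example
y_true = [1, 0, 0]             # hypothetical one-hot label

loss_a = cce_from_probs(y_true, softmax(logits))   # softmax in the model, default loss
loss_b = cce_from_logits(y_true, logits)           # no softmax in the model
print(loss_a, loss_b)
```

Both routes give the same loss, which is why the flag only has to match whether or not the last layer already applies softmax.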

163
00:13:03,520 --> 00:13:08,140
Now that said, let's go ahead and test this example.

164
00:13:08,140 --> 00:13:10,090
Here we have this example.

165
00:13:10,090 --> 00:13:14,620
Let's take this example, copy this and then just paste it out here.

166
00:13:14,800 --> 00:13:17,440
So we have this example.

167
00:13:17,620 --> 00:13:18,880
There we go.

168
00:13:18,910 --> 00:13:21,640
Here, we have this categorical cross entropy.

169
00:13:21,640 --> 00:13:24,160
We're going to print these results.

170
00:13:24,160 --> 00:13:27,790
So we print out the results and see what it gives us.

171
00:13:27,790 --> 00:13:31,010
We run that and you see we have a value of 1.17.
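
Here is a plain-Python sketch of what that number measures. I am assuming the pasted example is the one from the TensorFlow documentation for this loss, with y_true = [[0, 1, 0], [0, 0, 1]] and y_pred = [[0.05, 0.95, 0], [0.1, 0.8, 0.1]], since it reproduces the 1.17 shown. The helper name and the clipping constant `eps` are illustrative; the second, closer prediction is a hypothetical set of values.

```python
import math

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over examples of -sum(y_true * log(y_pred)), using the natural log."""
    losses = []
    for t_row, p_row in zip(y_true, y_pred):
        # Clip predictions away from 0 and 1 so log() never receives 0.
        clipped = [min(max(p, eps), 1.0 - eps) for p in p_row]
        losses.append(-sum(t * math.log(p) for t, p in zip(t_row, clipped)))
    return sum(losses) / len(losses)

y_true = [[0, 1, 0], [0, 0, 1]]
y_pred = [[0.05, 0.95, 0.0], [0.1, 0.8, 0.1]]          # second row misses the true class
print(round(categorical_crossentropy(y_true, y_pred), 3))   # 1.177

closer = [[0.05, 0.95, 0.0], [0.1, 0.05, 0.85]]        # hypothetical, much closer prediction
print(round(categorical_crossentropy(y_true, closer), 3))   # 0.107
```

Only the probability assigned to the true class contributes to each example's loss, so moving that probability towards one is what drives the value down.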

172
00:13:31,030 --> 00:13:40,630
Now this value here tells us how close the models prediction is to the true values of y.

173
00:13:40,720 --> 00:13:46,480
Now let's modify our model's predictions such that it's very close to the true values of y.

174
00:13:46,510 --> 00:13:48,070
So here, we have 0, 1, 0;

175
00:13:48,070 --> 00:13:54,310
0.05, 0.95, 0; 0, 0, 1; 0.1, 0.8.

176
00:13:54,310 --> 00:13:56,740
Now we should reduce this 0.8,

177
00:13:56,740 --> 00:13:58,510
so that it's very close to zero now.

178
00:13:58,510 --> 00:13:58,680
Here,

179
00:13:58,690 --> 00:14:01,940
instead of 0.1, let's say we have zero point...

180
00:14:01,960 --> 00:14:08,260
Now let's change this to 0.05, and then here we have 0.85.

181
00:14:08,590 --> 00:14:12,850
So all this sums to one; this too sums to one.

182
00:14:12,850 --> 00:14:13,480
That's it.

183
00:14:13,510 --> 00:14:15,040
You see, they are very close now.

184
00:14:15,040 --> 00:14:17,470
Now with this we run the cell again.

185
00:14:17,470 --> 00:14:21,970
And what you notice, you see the value drops by almost a factor of ten.

186
00:14:21,970 --> 00:14:25,560
So this shows us that these two are very close to each other.

187
00:14:25,570 --> 00:14:37,720
Now, let's change this and say 0.0. Let's have here 1.0, and here 0, 0, and here 1.0.

188
00:14:37,720 --> 00:14:39,940
We run that and what do we get?

189
00:14:39,940 --> 00:14:42,370
You see, you have a value which is practically zero.

190
00:14:42,370 --> 00:14:44,750
And this is because these two values are very close.

191
00:14:44,770 --> 00:14:51,820
Now, if you change this again, so let's have here zero and have this 1.0, let's now make these values

192
00:14:51,820 --> 00:14:53,230
completely different from each other.

193
00:14:53,230 --> 00:14:58,990
So here we have 1.0, and that's it. So this means that here the actual

194
00:14:59,070 --> 00:15:04,240
position is this position here, while for this one, what

195
00:15:04,240 --> 00:15:04,610
the model

196
00:15:04,770 --> 00:15:08,900
predicts is a different position. Now here the actual is this, but the model predicts this.

197
00:15:08,900 --> 00:15:13,790
So the model fails to predict correctly for each case here.

198
00:15:13,790 --> 00:15:18,200
So here, let's run this again and see how high this value is.

199
00:15:18,200 --> 00:15:24,050
So this shows us how this categorical cross entropy loss actually works.

200
00:15:25,430 --> 00:15:31,010
Now, to gain an even more in-depth understanding of how this works, let's consider this formula.

201
00:15:31,010 --> 00:15:36,980
So the categorical cross entropy loss defined here is simply, we have this negative

202
00:15:37,010 --> 00:15:40,850
here, the sum of y_true times log of y_pred.

203
00:15:41,120 --> 00:15:51,530
Now we take this example here for this case, or let's modify this such that this first prediction here

204
00:15:51,530 --> 00:16:01,640
matches this prediction, or rather, this second prediction matches this second prediction, and this

205
00:16:01,640 --> 00:16:02,930
here doesn't match.

206
00:16:02,930 --> 00:16:06,770
So this example... yes, this doesn't match.

207
00:16:06,770 --> 00:16:12,140
So the first example doesn't match while this other example matches.

208
00:16:12,140 --> 00:16:17,480
So what the model predicts matches what the model was meant to predict.

209
00:16:17,600 --> 00:16:24,200
Now, that said, let's get into this formula and see how it works so we understand why we get high

210
00:16:24,200 --> 00:16:28,610
values when there is no matching and very low values when there is a match.

211
00:16:28,610 --> 00:16:39,350
So here, for example, we have zero, this is y_true, just using the formula, zero times log of 0.01.

212
00:16:40,970 --> 00:16:42,260
Now we have the summation.

213
00:16:42,260 --> 00:16:52,010
So plus, for this next one, one, this is y_true, one times log of 0.05.

214
00:16:52,640 --> 00:17:01,030
You have that, plus, for the next one, zero times log of 0.96.

215
00:17:01,040 --> 00:17:03,980
Now, obviously this cancels because we have zero here. Here too,

216
00:17:03,980 --> 00:17:04,610
we have zero.

217
00:17:04,610 --> 00:17:05,720
This goes away.

218
00:17:06,530 --> 00:17:07,490
This goes away.

219
00:17:07,490 --> 00:17:08,810
We're left only with this.

220
00:17:08,810 --> 00:17:10,220
Let's take this off.

221
00:17:10,490 --> 00:17:14,600
So we're left only with this here after this computation.

222
00:17:15,380 --> 00:17:16,040
Okay.

223
00:17:16,040 --> 00:17:23,330
Now log of 0.05, taking the log to base 10, gives us approximately -1.3.

224
00:17:23,330 --> 00:17:27,350
So we have here approximately -1.3.

225
00:17:27,350 --> 00:17:29,540
But with this negative here, it gives us 1.3.

226
00:17:29,540 --> 00:17:31,910
So we have this value of 1.3.

227
00:17:32,030 --> 00:17:40,340
Now let's try the example here for this case where there is a match. We have zero times log of 0.1; because

228
00:17:40,340 --> 00:17:42,380
we have zero, it is not going to be considered here.

229
00:17:42,380 --> 00:17:43,460
We have zero again.

230
00:17:43,460 --> 00:17:45,170
Zero here is going to multiply.

231
00:17:45,170 --> 00:17:46,340
That is going to be zero.

232
00:17:46,340 --> 00:17:48,380
Now, this other one, we have one.

233
00:17:48,380 --> 00:17:49,280
So that's y_true, one.

234
00:17:49,280 --> 00:17:55,670
We have one times log of 0.7; one times log of 0.7.

235
00:17:55,910 --> 00:17:57,800
What do we have here?

236
00:17:59,530 --> 00:18:06,870
This gives us about -0.15 in base 10, so we have approximately 0.15.

237
00:18:06,880 --> 00:18:11,350
So we see that where there is a match, the loss is reduced, and when there is no match,

238
00:18:11,350 --> 00:18:13,060
the loss value is increased.
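
One detail worth checking against the hand computation: the figures 1.3 and 0.15 correspond to base-10 logarithms, whereas the Keras loss is computed with the natural logarithm, so the numbers printed in the notebook will be larger. A short sketch of both, for the two true-class probabilities used above (the loop itself is just for illustration):

```python
import math

# Probability the model assigned to the true class in each worked example.
for p in (0.05, 0.7):
    base10 = -math.log10(p)    # what a base-10 calculator gives
    natural = -math.log(p)     # what a natural-log cross entropy gives
    print(f"p = {p}: -log10(p) = {base10:.3f}, -ln(p) = {natural:.3f}")
```

Either way the conclusion is unchanged: a low probability on the true class (0.05) gives a much larger loss than a high one (0.7).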

239
00:18:13,060 --> 00:18:17,560
So with that, we just simply take this off since it's our default, and that's fine.

240
00:18:17,560 --> 00:18:19,090
So we have this now.

241
00:18:19,450 --> 00:18:25,900
Before continuing, let's also consider a case where our outputs like let's get back here, we're going

242
00:18:25,900 --> 00:18:28,960
to run this again, get the loss we have.

243
00:18:28,960 --> 00:18:35,110
So now we're going to consider that this y_true, that is, our dataset, is going to be designed such that

244
00:18:35,110 --> 00:18:39,790
instead of having this categorical kind of output, we're going to have the integers.

245
00:18:39,790 --> 00:18:44,020
So here instead of 0, 1, 0, we're going to simply have one.

246
00:18:44,230 --> 00:18:47,950
Now, let's get back to why this is so.

247
00:18:47,950 --> 00:18:49,990
We have 0, 1, 0.

248
00:18:49,990 --> 00:18:55,350
That's this for categorical, and for integer,

249
00:18:55,360 --> 00:18:59,410
this is translated as one because the one is at position one.

250
00:18:59,590 --> 00:19:01,230
We're counting from zero: 0, 1, 2.

251
00:19:01,240 --> 00:19:07,720
Now, in the case where we have 0, 0, 1, this is translated as two since this is at position two:

252
00:19:07,720 --> 00:19:09,910
0, 1, 2.

253
00:19:09,910 --> 00:19:10,630
So that's it.

254
00:19:10,630 --> 00:19:13,060
So we had seen this already in the previous session.

255
00:19:13,060 --> 00:19:16,900
Now getting back here, let's modify this.

256
00:19:16,900 --> 00:19:19,480
Instead of having this, let's put one.

257
00:19:19,690 --> 00:19:22,450
And instead of having this, let's put two.

258
00:19:22,450 --> 00:19:29,980
So we're saying that if your data set is designed in this way, recall we have seen it here, just here.

259
00:19:30,010 --> 00:19:32,440
Let's get into dataset loading just here.

260
00:19:32,440 --> 00:19:39,580
If the label_mode is int, then you should have this kind of design. There we go, right here.

261
00:19:39,580 --> 00:19:41,800
Okay, so we have this kind of design instead.

262
00:19:41,800 --> 00:19:43,330
So we just have these integers.

263
00:19:43,330 --> 00:19:47,110
So if you have this, let's run this here. You see, it doesn't work.

264
00:19:47,110 --> 00:19:54,700
Now what we would do in the case where we have this kind of output is we're going to use a sparse categorical.

265
00:19:54,700 --> 00:19:57,850
So instead of the categorical we have the sparse categorical.

266
00:19:57,850 --> 00:20:04,180
You see, you run that and you have the exact same answer you would have in the case where it was categorical.

267
00:20:04,630 --> 00:20:07,840
Let's get back here, run that again.

268
00:20:08,170 --> 00:20:09,730
So you have the exact same answer.

269
00:20:09,730 --> 00:20:13,570
So that said, we could make use of the sparse categorical.

270
00:20:13,570 --> 00:20:17,110
It depends on how we created our dataset.

271
00:20:17,110 --> 00:20:22,780
So here we have sparse categorical, and we should just undo this since we're not making use of that.

272
00:20:22,930 --> 00:20:27,700
Let's take this off now and then we move on to our metrics.

273
00:20:27,700 --> 00:20:29,650
So here we have the metrics.

274
00:20:30,430 --> 00:20:34,330
The metrics we're using here will be the categorical accuracy.

275
00:20:34,330 --> 00:20:37,120
So we have categorical accuracy.

276
00:20:38,530 --> 00:20:46,480
Let's give it a name, let's call that accuracy, and then we will also have the top K accuracy.

277
00:20:46,480 --> 00:20:47,350
There we go.

278
00:20:47,350 --> 00:20:50,800
We have our top K categorical accuracy.

279
00:20:50,830 --> 00:20:55,840
We'll give it a name. Before giving it a name, you need to give it the value of k.

280
00:20:55,840 --> 00:21:04,360
So we'll give a value of k of two, and then we give it a name, top k accuracy.

281
00:21:04,540 --> 00:21:11,050
Now before we move on, let's explain this top K categorical accuracy metric right here.

282
00:21:11,050 --> 00:21:18,430
So unlike with this accuracy, with this categorical accuracy, where if we have, for example, these

283
00:21:18,430 --> 00:21:22,210
four situations here, these four examples.

284
00:21:23,170 --> 00:21:29,560
Those in blue are what the model predicts, and those in red are what the model was expected to

285
00:21:29,560 --> 00:21:30,340
predict.

286
00:21:30,550 --> 00:21:33,160
Our accuracy will be computed as such.

287
00:21:33,430 --> 00:21:36,160
So we'll start with this first one.

288
00:21:36,910 --> 00:21:39,760
The highest here is this 0.5.

289
00:21:39,760 --> 00:21:40,990
The highest here is this.

290
00:21:40,990 --> 00:21:44,980
Since there is no match, we have a zero plus.

291
00:21:45,340 --> 00:21:46,900
We move to this next one.

292
00:21:46,900 --> 00:21:48,190
The highest here is this.

293
00:21:48,190 --> 00:21:52,660
The highest here is this, and the highest here is this.

294
00:21:52,660 --> 00:21:54,610
At this position there is no match.

295
00:21:54,610 --> 00:22:02,500
So we have a zero, and that's because class zero was picked, whereas the expected class should have

296
00:22:02,500 --> 00:22:03,700
been class one.

297
00:22:03,910 --> 00:22:11,290
Now we get here: class zero is predicted by the model, and class zero is also what

298
00:22:11,290 --> 00:22:15,490
was expected, the actual output.

299
00:22:15,490 --> 00:22:20,260
So here we have plus a one because we've gotten this correctly.

300
00:22:20,260 --> 00:22:22,510
So this is correct, this is wrong.

301
00:22:22,510 --> 00:22:23,500
This is wrong.

302
00:22:23,500 --> 00:22:26,140
Now we get here; we have the same situation.

303
00:22:26,140 --> 00:22:27,790
The highest here is 0.7.

304
00:22:27,790 --> 00:22:28,930
The highest here is this.

305
00:22:28,930 --> 00:22:31,720
And the positions are such that there is a match.

306
00:22:31,960 --> 00:22:36,160
So we have plus one now because we have four different examples.

307
00:22:36,160 --> 00:22:40,300
We divide all this by four and multiply by 100.

308
00:22:40,300 --> 00:22:45,250
So this gives us an accuracy of 50%.

309
00:22:45,700 --> 00:22:51,370
Now we get back to this case here for the top k categorical accuracy.

310
00:22:51,790 --> 00:22:58,420
For this first case, the highest class here is this one, and the second highest

311
00:22:58,630 --> 00:23:03,090
is this other one here.

312
00:23:03,100 --> 00:23:07,600
So we have this highest class second and we have this third.

313
00:23:07,600 --> 00:23:13,240
So, in this order. For this one here, this is our first.

314
00:23:14,110 --> 00:23:16,090
We have two of them.

315
00:23:16,090 --> 00:23:17,380
And then this one.

316
00:23:17,380 --> 00:23:20,530
This is our first and this.

317
00:23:20,530 --> 00:23:21,500
This is 0.05.

318
00:23:21,520 --> 00:23:24,360
So here, we have two of them. Here,

319
00:23:24,370 --> 00:23:26,620
this is our first and this is our second.

320
00:23:26,620 --> 00:23:33,340
So because we've selected k equal to two, we're interested in just our two highest predictions.

321
00:23:33,340 --> 00:23:42,070
So with the top K categorical accuracy, we are not interested in making sure that the highest prediction

322
00:23:42,070 --> 00:23:43,480
here matches the highest prediction here.

323
00:23:43,480 --> 00:23:51,670
What we're interested in here is whether any of the two highest matches the highest prediction we

324
00:23:51,670 --> 00:23:52,420
expect.

325
00:23:52,420 --> 00:23:59,830
So if any of the two predictions of the model matches the exact prediction the model should have

326
00:23:59,830 --> 00:24:04,900
predicted, then we will give that or we consider that as a correct prediction.

327
00:24:04,900 --> 00:24:08,320
So in this case here, these first two do not match this.

328
00:24:08,320 --> 00:24:17,470
So this is a wrong prediction and we have a zero, plus, we move to this next one here, where the highest here

329
00:24:17,470 --> 00:24:19,180
is this, and the expected class matches with the second.

330
00:24:19,180 --> 00:24:20,410
So we have a one.

331
00:24:20,410 --> 00:24:25,660
So this is now considered a correct prediction, unlike previously where we would have considered this to

332
00:24:25,660 --> 00:24:26,850
be a wrong prediction.

333
00:24:26,860 --> 00:24:31,600
Now for this one, the first two of the three are there.

334
00:24:31,600 --> 00:24:33,550
So obviously this would be a right prediction.

335
00:24:33,550 --> 00:24:36,490
Here the first matches with this first one here.

336
00:24:36,520 --> 00:24:38,620
So we have that plus one.

337
00:24:39,190 --> 00:24:45,370
And then for this other one here, the highest prediction matches with this.

338
00:24:45,370 --> 00:24:46,570
So we have plus one.

339
00:24:46,570 --> 00:24:52,300
So here we have three divided by four times 100.

340
00:24:52,300 --> 00:25:02,860
This means that the top-2 categorical accuracy in this case is equal to 75%, unlike the categorical

341
00:25:02,860 --> 00:25:04,780
accuracy, which is 50%.
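
The counting above can be sketched in a few lines. This is a minimal NumPy illustration, not the Keras metric itself: the helper name top_k_accuracy and the score values are made up for this example (the on-screen numbers are not in the transcript), chosen so that the four samples reproduce the 50% versus 75% result.

```python
import numpy as np


def top_k_accuracy(y_true, y_pred, k):
    """Fraction of samples whose true class is among the k highest scores."""
    true_classes = np.argmax(y_true, axis=-1)
    # argsort is ascending, so the last k columns hold the k best classes.
    top_k = np.argsort(y_pred, axis=-1)[:, -k:]
    hits = np.any(top_k == true_classes[:, None], axis=-1)
    return float(np.mean(hits))


# Hypothetical one-hot labels and model scores for four samples, three classes.
y_true = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 1, 0]], dtype=float)
y_pred = np.array([
    [0.10, 0.30, 0.60],  # true class ranks last: wrong for top-1 and top-2
    [0.50, 0.40, 0.10],  # true class ranks second: wrong top-1, right top-2
    [0.05, 0.15, 0.80],  # true class ranks first
    [0.20, 0.70, 0.10],  # true class ranks first
])

top1 = top_k_accuracy(y_true, y_pred, k=1)
top2 = top_k_accuracy(y_true, y_pred, k=2)
print(top1, top2)
```

With k equal to one this reduces to the ordinary categorical accuracy, which is why the top-2 value can never be lower than it.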

342
00:25:04,810 --> 00:25:07,650
Now with that, we go ahead and compile the model.

343
00:25:07,660 --> 00:25:15,100
We call the LeNet model's compile method: we have the Adam optimizer with the learning rate we specified in the configuration,

344
00:25:15,100 --> 00:25:19,570
the loss function here, the metrics, and that's it.

345
00:25:19,570 --> 00:25:23,410
So we compile our model and then we are set to train.

346
00:25:23,410 --> 00:25:26,020
There we go, let's paste this out.

347
00:25:26,020 --> 00:25:33,880
And here, instead of this, we should have the number of epochs from the configuration.

348
00:25:33,880 --> 00:25:37,870
Then obviously we have our training data set and we have our validation data set.

349
00:25:37,870 --> 00:25:40,270
So let's run this and see what we get.

350
00:25:40,270 --> 00:25:41,980
Scroll down and that's it.

351
00:25:42,010 --> 00:25:49,720
We see our loss dropping, the accuracy increasing, and the top-k accuracy, which is clearly higher than the accuracy.

352
00:25:49,900 --> 00:25:50,230
With training

353
00:25:50,230 --> 00:25:55,780
now complete, let's plot out the loss curves for the validation and the training.

354
00:25:55,810 --> 00:26:02,830
As you could see here, both the validation and training losses all drop together, while the accuracies

355
00:26:02,830 --> 00:26:07,780
for the validation and training also increase up to this point here.

356
00:26:07,780 --> 00:26:10,480
So it's almost getting to a value of one.

357
00:26:10,480 --> 00:26:19,210
Now, you could check this out here: you see the accuracy, 97.8% here, 98.19%.

358
00:26:19,210 --> 00:26:21,640
So the model is performing quite well.

359
00:26:21,640 --> 00:26:25,060
We could evaluate this model on our validation data.

360
00:26:25,270 --> 00:26:34,030
And here we have the LeNet model's evaluate, so we call the evaluate method and then we pass in the

361
00:26:34,030 --> 00:26:38,590
validation dataset.

362
00:26:39,130 --> 00:26:39,560
Okay.

363
00:26:39,610 --> 00:26:42,220
So we run this and there we go.

364
00:26:42,220 --> 00:26:54,700
We have a loss of 0.35, or rather 0.035, an accuracy of 98.33%, and a top-k, or better still top-2, accuracy of 99.88%.

365
00:26:54,730 --> 00:27:01,000
Now note that, given that the model isn't overfitting, the model's metrics keep

366
00:27:01,000 --> 00:27:01,750
increasing.

367
00:27:01,750 --> 00:27:08,500
What you could do here is increase the number of epochs so we could train for more epochs to get even

368
00:27:08,500 --> 00:27:09,610
better results.

369
00:27:09,700 --> 00:27:15,700
Now we're ready to test out this model on some image in our testing dataset.

370
00:27:15,700 --> 00:27:19,690
So here we're going to have this image or let's call it test image.

371
00:27:19,690 --> 00:27:25,330
We have test image, which is going to be this image that we're going to read

372
00:27:25,330 --> 00:27:27,190
using the OpenCV library.

373
00:27:27,190 --> 00:27:29,230
So here we have cv2.imread.

374
00:27:29,230 --> 00:27:33,640
And then just in here we just open this up and the test.

375
00:27:34,390 --> 00:27:41,860
Let's take, say, happy. We copy this path here and paste it out here.

376
00:27:41,890 --> 00:27:42,880
There we go.

377
00:27:43,060 --> 00:27:46,510
And then we're going to convert this into a tensor.

378
00:27:46,750 --> 00:27:58,270
Let's close this. We have here now our test image, or let's say im is equal to tf.constant of the test image.

379
00:27:58,430 --> 00:28:00,500
And then we're going to specify the data type.

380
00:28:00,500 --> 00:28:03,770
So here it's float32.

381
00:28:04,520 --> 00:28:06,650
And then with that, we're just going to pass this.

382
00:28:06,650 --> 00:28:09,350
So let's let's print out the output.

383
00:28:10,010 --> 00:28:15,500
First of all, let's print out the shape. See, the image shape is 90 by 90 by three.

384
00:28:15,500 --> 00:28:22,670
Then we're going to pass this into our model directly, because as we had designed this, we have put

385
00:28:22,670 --> 00:28:27,540
in the resizing in the model, and the rescaling too.

386
00:28:27,560 --> 00:28:33,980
So we're going to resize and then we're going to rescale in the model such that now we do not need to

387
00:28:33,980 --> 00:28:36,500
do that out of the model.

388
00:28:36,560 --> 00:28:42,070
So that said, let's get back here, and all we need to do is call our LeNet model.

389
00:28:42,080 --> 00:28:50,300
But before calling that, we need to add one dimension, since we pass inputs into the model as

390
00:28:50,300 --> 00:28:50,900
batches.

391
00:28:50,900 --> 00:29:01,400
So we add the batch dimension here with tf.expand_dims, and then we have the image and specify the axis

392
00:29:01,400 --> 00:29:02,270
zero.
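
To make the shape change concrete, here is a small sketch using NumPy's expand_dims as a stand-in for the TensorFlow call used in the video. The zero-filled array is only a placeholder for the decoded 90 by 90 by 3 test image.

```python
import numpy as np

# Placeholder for the decoded test image: height 90, width 90, 3 channels.
image = np.zeros((90, 90, 3), dtype=np.float32)

# Insert a new leading axis so the model sees a batch containing one image.
batched = np.expand_dims(image, axis=0)

print(image.shape)    # shape before adding the batch axis
print(batched.shape)  # shape after adding the batch axis
```

The pixel values are untouched; only a leading axis of size one is added, which is what a model trained on batches expects.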

393
00:29:02,270 --> 00:29:02,900
Okay.

394
00:29:02,900 --> 00:29:10,070
So once we have this, let's have our LeNet model, which takes in that image.

395
00:29:10,070 --> 00:29:15,680
So here we're going to print out this output or print out what our model gives us.

396
00:29:15,680 --> 00:29:22,760
We get this error here, where we're told that there is an incompatibility between the input

397
00:29:22,760 --> 00:29:26,330
image and what the model expects.

398
00:29:27,140 --> 00:29:32,300
That said, we get back to the model and we notice that we didn't actually put the resizing.

399
00:29:32,300 --> 00:29:39,350
So let's get back here. We have these resize-rescale layers, which we had built already, and we put

400
00:29:39,350 --> 00:29:43,820
that instead of this rescaling, we have this resize rescale.

401
00:29:43,820 --> 00:29:47,390
So we make sure we resize and we rescale.

402
00:29:47,540 --> 00:29:48,280
Okay.

403
00:29:48,290 --> 00:29:55,130
Now the next thing we have to do is modify this here, because by doing this, we suppose

404
00:29:55,130 --> 00:29:58,760
that our input is going to be 256 by 256.

405
00:29:58,760 --> 00:30:02,330
But what we will have here instead is None.

406
00:30:02,330 --> 00:30:08,870
So our input could be of any dimension, but we are going to do the resizing here.

407
00:30:08,870 --> 00:30:14,540
So we have the resizing in this resize-rescale layer, and then we also have the rescaling.

408
00:30:14,540 --> 00:30:15,590
So that's it.

409
00:30:15,590 --> 00:30:17,030
Let's run this.

410
00:30:17,720 --> 00:30:18,640
There we go.

411
00:30:18,650 --> 00:30:25,640
As you could see, from here we have this 256 by 256 by three.

412
00:30:25,640 --> 00:30:31,550
And that's because we've actually passed this input into our resize-rescale layers.

413
00:30:31,760 --> 00:30:34,930
So let's go ahead and retrain our model.

414
00:30:35,360 --> 00:30:39,200
Here are the training and validation plots for the loss and accuracy.

415
00:30:39,200 --> 00:30:43,820
And with this, we could go ahead and test our image.

416
00:30:43,820 --> 00:30:45,230
So let's run that.

417
00:30:45,230 --> 00:30:46,760
And this is what we get.

418
00:30:46,760 --> 00:30:53,600
As you could see here, we have practically zero, 0.99, and then almost zero.

419
00:30:53,600 --> 00:31:01,340
So this shows us clearly that the predicted class is class one in this case, because here this is our class

420
00:31:01,340 --> 00:31:02,300
zero.

421
00:31:03,290 --> 00:31:04,370
Take that off.

422
00:31:04,670 --> 00:31:09,220
This is our Class zero, Class one and class two.

423
00:31:09,230 --> 00:31:17,440
So this image is of class one and is correct because this is a happy image.

424
00:31:17,480 --> 00:31:29,150
So basically, you see how to create this image here, this image array from the file path, and then

425
00:31:29,150 --> 00:31:35,840
convert this into a tensor, which is then passed into the model without any preprocessing.

426
00:31:36,380 --> 00:31:39,680
Another thing we want to do is to actually print out the class.

427
00:31:39,680 --> 00:31:44,750
So what we can do here is, instead of just this LeNet model call, we'll have tf.argmax.

428
00:31:44,750 --> 00:31:50,480
So we'll look for the class with the highest probability, which in this case is this one.

429
00:31:50,480 --> 00:31:51,650
We'll look for that class.

430
00:31:51,650 --> 00:31:52,720
So let's have here the

431
00:31:52,760 --> 00:31:55,370
argmax, and we specify the axis.

432
00:31:55,400 --> 00:31:59,720
Now, if you're new to this, you could check out our previous sessions where we treat these kinds of

433
00:31:59,720 --> 00:32:01,820
functions thoroughly.

434
00:32:01,820 --> 00:32:03,560
So here we have that.

435
00:32:03,560 --> 00:32:05,480
Let's run this.

436
00:32:05,510 --> 00:32:07,160
You see, we have zero zero.

437
00:32:07,160 --> 00:32:08,750
No, this is tf.argmax.

438
00:32:08,750 --> 00:32:10,940
This should be negative one or one.

439
00:32:10,940 --> 00:32:13,220
Let's say negative one is the last axis.

440
00:32:13,670 --> 00:32:16,400
Okay, we have that. You see it picks out this one here.

441
00:32:16,400 --> 00:32:19,820
And then from here let's convert this to NumPy.

442
00:32:20,210 --> 00:32:24,800
And then from here we use the class names to get this name.

443
00:32:24,800 --> 00:32:27,350
So there we go, class names.

444
00:32:27,350 --> 00:32:28,430
We run that.

445
00:32:30,560 --> 00:32:32,120
We get this.

446
00:32:33,320 --> 00:32:35,870
Let's see what we obtain before this class names.

447
00:32:35,870 --> 00:32:39,740
Let's take this off and also take this one off.

448
00:32:39,740 --> 00:32:40,910
We run that.

449
00:32:41,300 --> 00:32:44,780
Okay, we have this list, so we should take the zeroth element.

450
00:32:45,290 --> 00:32:46,070
There we go.

451
00:32:46,070 --> 00:32:47,420
Now we have that.

452
00:32:47,420 --> 00:32:52,490
We put in the class names and we get the name.

453
00:32:52,970 --> 00:32:53,720
So that is it.

454
00:32:53,780 --> 00:32:55,340
See, we get the name happy.
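
The lookup we just did can be sketched as follows, with NumPy standing in for tf.argmax followed by the conversion to NumPy. The probabilities are illustrative values consistent with "practically zero, 0.99, almost zero", and the class order angry, happy, sad is the one described for this dataset.

```python
import numpy as np

class_names = ["angry", "happy", "sad"]  # class 0, class 1, class 2

# Illustrative model output for one image: shape (1, 3) because of the batch axis.
probs = np.array([[0.004, 0.99, 0.006]])

# argmax over the last axis gives one class index per image in the batch.
index = np.argmax(probs, axis=-1)

# The result is still an array of length one, so take element zero before indexing.
predicted = class_names[int(index[0])]
print(index, predicted)
```

Taking element zero is the same step as in the video: the argmax keeps the batch axis, so a single image still yields a one-element array.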

455
00:32:55,580 --> 00:32:58,010
Okay, so with that, now let's do one.

456
00:32:58,580 --> 00:33:04,790
Let's test. Let's take sad, for example, and we'll see how easy it is to carry out such tests.

457
00:33:04,820 --> 00:33:06,860
Let's pick out an image here.

458
00:33:07,060 --> 00:33:07,880
Let's take this one.

459
00:33:08,090 --> 00:33:10,160
You could actually view this image here.

460
00:33:10,250 --> 00:33:11,210
So that's it.

461
00:33:12,260 --> 00:33:14,580
This one is the same image.

462
00:33:14,600 --> 00:33:15,920
Take up this other one.

463
00:33:15,950 --> 00:33:17,330
You see, we have this.

464
00:33:17,900 --> 00:33:19,580
So let's copy this path.

465
00:33:19,970 --> 00:33:21,190
Take this off.

466
00:33:21,200 --> 00:33:22,460
Take this off.

467
00:33:22,670 --> 00:33:26,310
Scroll up and then simply paste it here.

468
00:33:26,330 --> 00:33:33,070
So this is the path. We want to know from the model exactly what kind of image it is.

469
00:33:33,080 --> 00:33:33,830
So there it is.

470
00:33:33,830 --> 00:33:34,940
It's a sad image.

471
00:33:34,940 --> 00:33:37,070
And this is from the test set.

472
00:33:37,070 --> 00:33:42,980
So make sure you are doing this kind of testing with data the model has never, ever seen.

473
00:33:43,280 --> 00:33:44,330
Okay, so that's it.

474
00:33:44,640 --> 00:33:46,580
Our model is performing quite well.

475
00:33:47,540 --> 00:33:52,340
We can do something similar to what we had here, but with the difference that instead of just giving

476
00:33:52,340 --> 00:33:57,770
out these labels, we give out not only the labels, but also what the model predicts.

477
00:33:57,770 --> 00:33:59,780
So let's copy this code here.

478
00:33:59,930 --> 00:34:02,420
Get back to our testing.

479
00:34:02,630 --> 00:34:05,450
There we go and paste it out here.

480
00:34:05,870 --> 00:34:10,820
So again, we're going to use the validation dataset.

481
00:34:10,970 --> 00:34:11,720
So there we go.

482
00:34:11,720 --> 00:34:12,560
We have our validation

483
00:34:12,560 --> 00:34:14,690
dataset, and we have this plot.

484
00:34:14,690 --> 00:34:23,960
And then at the level of the title of the plot, let's have this true label right here, and then we will

485
00:34:23,960 --> 00:34:26,450
have the predicted label.

486
00:34:26,450 --> 00:34:27,940
So let's have this.

487
00:34:27,950 --> 00:34:35,480
We move to the next line and then we have our predicted label.

488
00:34:36,770 --> 00:34:38,120
Yeah, predicted label.

489
00:34:38,120 --> 00:34:39,200
There we go.

490
00:34:40,520 --> 00:34:42,350
And that's fine.

491
00:34:42,500 --> 00:34:47,000
So we have this predicted label and we'll get it from here.

492
00:34:47,000 --> 00:34:49,610
So we could simply have the LeNet model.

493
00:34:50,810 --> 00:34:59,330
Here we have our LeNet model, which takes in the images, so it takes in the images, selects

494
00:34:59,330 --> 00:35:03,950
a particular image, and let's expand the dimensions.

495
00:35:03,950 --> 00:35:10,970
So we have expand_dims, which takes in that image.

496
00:35:12,320 --> 00:35:15,500
Axis equals zero and we close that.

497
00:35:16,550 --> 00:35:20,770
So we have this, we pass it to the LeNet model, and we have this output.

498
00:35:20,780 --> 00:35:25,610
Now we are going to do something similar again to what we had here.

499
00:35:26,210 --> 00:35:27,620
Basically, it's just this here.

500
00:35:27,620 --> 00:35:29,120
So let's just copy this.

501
00:35:29,690 --> 00:35:32,420
Let's just copy this and replace it right here.

502
00:35:32,570 --> 00:35:35,080
So it's the same thing we're trying to do.

503
00:35:35,090 --> 00:35:42,050
All we're trying to do here is pass the image into the model and get its class.

504
00:35:42,050 --> 00:35:46,180
So we compare with the actual label.

505
00:35:46,190 --> 00:35:53,780
So here we have these class names, and here we have the plus. That should be fine, and that's it.

506
00:35:54,800 --> 00:35:55,310
With this.

507
00:35:55,310 --> 00:35:58,160
Now let's run this cell and see what we get.

508
00:35:58,220 --> 00:35:59,270
Scroll.

509
00:36:01,390 --> 00:36:02,390
Here is what we get.

510
00:36:02,410 --> 00:36:03,840
You see the model?

511
00:36:03,850 --> 00:36:05,200
This is supposed to be true.

512
00:36:05,230 --> 00:36:10,780
The model here doesn't perform quite well, unlike what we had in this evaluation.

513
00:36:10,780 --> 00:36:14,290
So let's check out our code and see if there are any errors.

514
00:36:15,370 --> 00:36:16,600
Scroll this way.

515
00:36:16,900 --> 00:36:19,360
Here you see, we have im. This was supposed to be images.

516
00:36:19,360 --> 00:36:20,650
So we have that.

517
00:36:20,650 --> 00:36:23,020
That's why, you see, the predicted label is always sad.

518
00:36:23,020 --> 00:36:29,440
So the predicted label is always sad because we had picked just one image

519
00:36:29,440 --> 00:36:30,490
and it's not dynamic.

520
00:36:30,490 --> 00:36:36,760
So here we have these images, we select the particular image, and we run this again.

521
00:36:37,510 --> 00:36:38,830
This should be fine.

522
00:36:39,460 --> 00:36:45,730
We get this error because we didn't add the batch dimension, unlike here where we added a batch

523
00:36:45,730 --> 00:36:47,440
dimension before passing in here.

524
00:36:47,440 --> 00:36:55,060
So let's get back, and then we're going to replace that image with this code here.

525
00:36:55,060 --> 00:36:59,180
So let's get back here and then replace this with this code.

526
00:36:59,200 --> 00:37:04,150
Note that we've treated this already in previous sessions, so you could always check them out if you're new

527
00:37:04,150 --> 00:37:06,340
to methods like expand_dims.

528
00:37:06,460 --> 00:37:11,040
Okay, we run that and see what we get.

529
00:37:11,050 --> 00:37:11,520
Okay.

530
00:37:11,560 --> 00:37:12,370
So that's it.

531
00:37:12,370 --> 00:37:13,210
This is what we get.

532
00:37:13,210 --> 00:37:18,970
You see, happy, happy, angry, happy, true label happy, predicted happy.

533
00:37:19,090 --> 00:37:22,510
Let's see if there are any errors.

534
00:37:22,510 --> 00:37:26,260
The model does quite well. See, no errors.

535
00:37:26,260 --> 00:37:28,090
So it's almost 100%.

536
00:37:28,090 --> 00:37:29,470
In fact, it's 100%.

537
00:37:29,470 --> 00:37:33,010
Although the evaluation here shows 98%.

538
00:37:33,010 --> 00:37:34,570
98.33%.

539
00:37:34,870 --> 00:37:38,200
The next thing we'll do is plot out the confusion matrix.

540
00:37:38,860 --> 00:37:43,450
So we're going to go through our validation data: for im, label

541
00:37:43,450 --> 00:37:52,600
in our validation dataset, we are going to store the labels in this list right here.

542
00:37:52,600 --> 00:38:03,160
So we have labels.append of the label, and then we also have predicted.append of the LeNet model,

543
00:38:03,160 --> 00:38:05,170
which takes in the images.

544
00:38:05,170 --> 00:38:10,150
So here we have what the model predicts and what the model should predict.

545
00:38:11,270 --> 00:38:16,280
So here we have predicted, we create this list, and then labels.

546
00:38:16,280 --> 00:38:17,590
We also have this list.

547
00:38:17,600 --> 00:38:18,350
There we go.

548
00:38:18,350 --> 00:38:20,180
So we have now this.

549
00:38:20,900 --> 00:38:21,500
That's it.

550
00:38:21,500 --> 00:38:23,930
So with this now let's run the cell.

551
00:38:24,680 --> 00:38:29,930
And then before moving on, we will convert this to NumPy format so we could easily manipulate it.

552
00:38:29,930 --> 00:38:33,950
So we have this as NumPy, and we run this again.

553
00:38:34,250 --> 00:38:35,630
That's fine.

554
00:38:36,830 --> 00:38:38,870
Then here we could print out the labels.

555
00:38:38,870 --> 00:38:41,030
We have labels.

556
00:38:41,060 --> 00:38:42,350
Let's print this out.

557
00:38:42,800 --> 00:38:43,580
There we go.

558
00:38:43,580 --> 00:38:45,670
You see the labels, the different labels.

559
00:38:45,680 --> 00:38:49,340
Now let's try to flatten out all the different labels right here.

560
00:38:49,340 --> 00:38:51,610
So let's have this to be flattened.

561
00:38:51,620 --> 00:38:53,150
Now let's scroll up.

562
00:38:53,150 --> 00:38:54,380
See this?

563
00:38:55,500 --> 00:38:57,650
They are in batches of 32, so.

564
00:38:59,320 --> 00:39:02,120
We can actually see that here. Scroll down a bit.

565
00:39:02,140 --> 00:39:03,890
You see, we have this batch here.

566
00:39:03,910 --> 00:39:06,210
Then we have the next batch and so on and so forth.

567
00:39:06,220 --> 00:39:09,670
But then this output format isn't exactly what we want.

568
00:39:09,670 --> 00:39:15,480
What we want is the classes, that is, the class with the highest score.

569
00:39:15,490 --> 00:39:19,660
So instead of this one-hot representation, we want the integer representation.

570
00:39:19,660 --> 00:39:22,330
So what we're going to do here is use the argmax.

571
00:39:22,330 --> 00:39:25,960
So let's print out the argmax of this.

572
00:39:26,410 --> 00:39:28,570
Oops, Let's get back up.

573
00:39:28,570 --> 00:39:30,370
Let's print out the argmax of this.

574
00:39:31,210 --> 00:39:31,960
There we go.

575
00:39:31,960 --> 00:39:34,690
And then we specify the last axis.
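
As a quick illustration of going from the one-hot representation to integers, here is a NumPy sketch with a tiny made-up array of two batches of two labels each. The real batches hold 32 labels, but the axis logic is the same.

```python
import numpy as np

# Hypothetical one-hot labels with shape (num_batches, batch_size, num_classes).
one_hot = np.array([
    [[1, 0, 0], [0, 0, 1]],  # batch 0: classes 0 and 2
    [[0, 1, 0], [0, 1, 0]],  # batch 1: classes 1 and 1
])

# argmax over the last axis replaces each one-hot vector by its class index.
integer_labels = np.argmax(one_hot, axis=-1)  # shape (2, 2)

# flatten merges the batch axis away, leaving one label per sample.
flat_labels = integer_labels.flatten()
print(integer_labels.shape, flat_labels)
```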

576
00:39:34,960 --> 00:39:42,340
So we have that, and we run that. We get this error.

577
00:39:42,370 --> 00:39:44,260
Now let's let's do this.

578
00:39:44,260 --> 00:39:51,010
Let's simply select here up to the last value.

579
00:39:51,010 --> 00:39:57,670
Now, this error should be coming in because we have batches of 32, but the last batch can be smaller.

580
00:39:57,670 --> 00:40:01,210
Let's suppose that you have a dataset of 98 items.

581
00:40:01,210 --> 00:40:06,250
Now you break this into batches of 32.

582
00:40:06,610 --> 00:40:12,370
Okay, So here you have the first batch 32, the next batch 32, the next batch 32.

583
00:40:12,370 --> 00:40:19,600
So here you already have 96 elements, and then the last batch will be two, because you want to

584
00:40:19,600 --> 00:40:20,680
have 98 elements here.

585
00:40:20,680 --> 00:40:23,380
You have 96 plus two, giving 98.

586
00:40:23,380 --> 00:40:27,610
So because of this last batch here, we get this error here.

587
00:40:27,760 --> 00:40:29,850
So because of that, we get the error.
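
The arithmetic for the 98-item example can be checked with a short sketch. The helper batch_sizes is a name introduced here for illustration, and three classes are assumed as in this session. The full batches stack into one regular array, whereas the short last batch does not fit that shape, which is the likely source of the error.

```python
import numpy as np


def batch_sizes(num_items, batch_size):
    """Sizes of the batches obtained when the last one is not dropped."""
    full, remainder = divmod(num_items, batch_size)
    return [batch_size] * full + ([remainder] if remainder else [])


sizes = batch_sizes(98, 32)
print(sizes)

# Placeholder one-hot batches with three classes each.
num_classes = 3
batches = [np.zeros((n, num_classes)) for n in sizes]

# All batches except the last share a shape, so they stack cleanly.
full_batches = np.array(batches[:-1])
print(full_batches.shape, batches[-1].shape)
```

Slicing up to the last batch, as done in the video, works precisely because every remaining batch has the same size.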

588
00:40:29,860 --> 00:40:33,400
So what we could do is print right up to this last batch.

589
00:40:33,400 --> 00:40:36,280
So let's let's run that.

590
00:40:36,280 --> 00:40:38,080
So we're not going to pick out the last batch here.

591
00:40:38,080 --> 00:40:39,550
We're not picking up the last batch.

592
00:40:39,550 --> 00:40:45,520
So we go from the first batch right up to the batch before the last one.

593
00:40:45,520 --> 00:40:48,370
So let's run this now and you see that, that's fine.

594
00:40:48,370 --> 00:40:49,990
So this works out well.

595
00:40:49,990 --> 00:40:53,140
We could even print out, say, the first two batches.

596
00:40:53,140 --> 00:40:54,850
Let's print out the first two batches.

597
00:40:54,850 --> 00:40:56,250
So you see what that looks like.

598
00:40:56,260 --> 00:40:57,730
See, this is the first two batches.

599
00:40:57,730 --> 00:41:00,940
You have this here and this.

600
00:41:00,940 --> 00:41:03,190
Now, from here, you could actually flatten.

601
00:41:03,190 --> 00:41:05,500
So you could do this, you flatten this.

602
00:41:05,500 --> 00:41:11,410
So you get all this in a single one-dimensional list.

603
00:41:11,410 --> 00:41:14,950
So that's what we want to get. Now let's move on.

604
00:41:14,950 --> 00:41:16,780
So let's get back to this.

605
00:41:16,780 --> 00:41:19,030
We have right up to the last batch.

606
00:41:21,520 --> 00:41:24,730
We run this and there we go.

607
00:41:24,730 --> 00:41:25,630
So this is what we get.

608
00:41:25,630 --> 00:41:27,900
So we've actually flattened out all these elements.

609
00:41:27,910 --> 00:41:33,880
Now, if you print out the length of this here, you see you have

610
00:41:33,880 --> 00:41:36,070
6784 elements.

611
00:41:36,070 --> 00:41:40,240
So now basically we want to compare these different predictions here.

612
00:41:40,240 --> 00:41:41,950
We want to compare those predictions.

613
00:41:41,950 --> 00:41:43,510
Let's take this length off.

614
00:41:43,510 --> 00:41:50,290
We want to compare all these labels with the model's predictions.

615
00:41:50,290 --> 00:41:58,390
So here we could repeat the same process with predicted. Let's have predicted.

616
00:41:58,390 --> 00:41:59,620
We run that.

617
00:41:59,920 --> 00:42:01,600
You see, we have these two lists here.

618
00:42:01,600 --> 00:42:04,600
You see already that they are quite similar, although there are some misses.

619
00:42:04,600 --> 00:42:09,490
So what we have is this list and this other list, this one for the predicted and this one for

620
00:42:09,490 --> 00:42:10,180
the labels.

621
00:42:10,180 --> 00:42:18,940
So now that we have this set, let's define our pred. We'll call

622
00:42:18,940 --> 00:42:20,560
pred this one here.

623
00:42:20,560 --> 00:42:22,750
So this is our pred, which we flatten out.

624
00:42:22,750 --> 00:42:25,210
So these are the different predictions by the model.

625
00:42:25,210 --> 00:42:28,720
And then this is what the model was supposed to predict.

626
00:42:28,720 --> 00:42:33,190
So the other is the labels. So, pred and labels.

627
00:42:33,400 --> 00:42:35,920
Okay, so let's have that.

628
00:42:35,920 --> 00:42:41,620
We run that cell and then we get back to the code for the confusion matrix, which we had previously.

629
00:42:41,620 --> 00:42:44,320
So basically, in the previous sessions,

630
00:42:44,320 --> 00:42:50,020
we defined this threshold because we had a binary classification problem. Here, since we have several

631
00:42:50,020 --> 00:42:51,760
classes, we wouldn't define that.

632
00:42:51,760 --> 00:42:57,550
We would simply pass in these different predictions here.

633
00:42:57,550 --> 00:43:02,740
So for the labels and for the predicted, we have that.

634
00:43:02,740 --> 00:43:06,490
Let's simply copy this out and paste out here.

635
00:43:06,490 --> 00:43:07,720
So we've seen this already.

636
00:43:07,720 --> 00:43:09,430
Let's take this off.

637
00:43:09,640 --> 00:43:17,980
Now we have the labels, and then here we have the predictions. There we go. We're going to print out this confusion

638
00:43:17,980 --> 00:43:21,490
matrix, we have the figure, and that's it.

639
00:43:21,490 --> 00:43:24,060
So let's run this and see what we get.
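
For readers who want to see what such a matrix contains, here is a hand-rolled NumPy sketch. It is not necessarily the function used in the video; the helper name confusion_matrix, the tiny label arrays and the layout (rows for the true class, columns for the predicted class) are assumptions made for this example.

```python
import numpy as np


def confusion_matrix(labels, preds, num_classes):
    """Count samples per (true class, predicted class) pair.

    Rows index the true class and columns the predicted class.
    """
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    # add.at accumulates correctly even when an index pair repeats.
    np.add.at(matrix, (labels, preds), 1)
    return matrix


# Hypothetical flattened integer labels and predictions for seven samples.
labels = np.array([0, 0, 1, 1, 1, 2, 2])
preds = np.array([0, 1, 1, 1, 2, 2, 0])

cm = confusion_matrix(labels, preds, num_classes=3)
print(cm)
```

No threshold is needed here, unlike the binary case: the integer class indices are used directly.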

640
00:43:24,070 --> 00:43:24,880
There we go.

641
00:43:24,880 --> 00:43:27,160
We see already here we have this confusion matrix.

642
00:43:27,160 --> 00:43:34,990
And one thing you can notice straight away is that the leading diagonal has the

643
00:43:34,990 --> 00:43:36,490
elements with the highest values.

644
00:43:36,490 --> 00:43:38,860
So you see, we have the highest values here.

645
00:43:38,860 --> 00:43:44,950
And this is normal actually. Let's redraw

646
00:43:44,950 --> 00:43:49,150
this confusion matrix. When you have a confusion matrix like this, this is class zero.

647
00:43:49,150 --> 00:43:50,980
So let's say class zero is angry.

648
00:43:50,980 --> 00:43:52,690
So this is class angry.

649
00:43:54,340 --> 00:43:55,900
Here, happy and sad.

650
00:43:55,900 --> 00:43:56,890
Now here, angry,

651
00:43:56,890 --> 00:43:57,790
happy and sad.

652
00:43:57,790 --> 00:43:58,150
So.

653
00:43:58,270 --> 00:44:04,360
Whenever what the model predicts matches what it was supposed to predict,

654
00:44:04,360 --> 00:44:07,030
you have an additional one, which is added here.

655
00:44:07,030 --> 00:44:14,590
So we simply go through all the different model predictions for the validation and you see that 1472

656
00:44:14,590 --> 00:44:22,930
times the model predicted angry when it was actually angry.

657
00:44:22,930 --> 00:44:25,510
So this is correct. Now for happy,

658
00:44:25,510 --> 00:44:27,250
you see here

659
00:44:27,280 --> 00:44:35,290
it matches: 2890 times the model predicts happy and the actual was happy.

660
00:44:35,290 --> 00:44:41,320
And here, 2198 times the model predicts sad when it's actually sad.

661
00:44:42,610 --> 00:44:50,650
And then here we have 21 times the model predicts angry when it was happy and then 26 times the model

662
00:44:50,650 --> 00:44:52,690
predicts angry when it was sad.

663
00:44:52,780 --> 00:44:57,880
Then we have 20 times the model predicts happy when it was angry.

664
00:44:57,910 --> 00:45:00,020
Here we have 27 times

665
00:45:00,020 --> 00:45:02,210
the model predicts happy when it was sad.

666
00:45:02,230 --> 00:45:04,600
Here we have 30 times the model

667
00:45:04,600 --> 00:45:11,040
predicts sad when it was angry, and we have 100 times the model predicts sad when it was instead happy.

668
00:45:11,050 --> 00:45:18,280
So you see, this is the highest score we obtain for the wrong predictions.

669
00:45:18,280 --> 00:45:25,210
And so this means that the model has that tendency of predicting sad when it's actually happy.
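
The counts we just read out can be put into an array to double-check the reading. This sketch assumes a layout with rows for the actual class and columns for the predicted class (the plot in the video may be oriented differently) and the class order angry, happy, sad.

```python
import numpy as np

class_names = ["angry", "happy", "sad"]

# Counts as read out above; rows are the actual class, columns the predicted class.
cm = np.array([
    [1472,   20,   30],  # actually angry
    [  21, 2890,  100],  # actually happy
    [  26,   27, 2198],  # actually sad
])

total = int(cm.sum())        # number of samples covered by the matrix
correct = int(np.trace(cm))  # leading diagonal: prediction equals label
accuracy = correct / total

# Zero the diagonal to find the most frequent kind of mistake.
mistakes = cm.copy()
np.fill_diagonal(mistakes, 0)
true_idx, pred_idx = np.unravel_index(np.argmax(mistakes), mistakes.shape)

print(total, correct, round(accuracy, 4))
print("most common error: actual", class_names[true_idx],
      "predicted", class_names[pred_idx])
```

The total agrees with the 6784 flattened elements counted earlier, and the largest off-diagonal entry is the happy image predicted as sad.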

670
00:45:26,020 --> 00:45:28,420
You can also observe this plot here.

671
00:45:28,420 --> 00:45:33,160
You see these lighter colors here.

672
00:45:33,160 --> 00:45:40,570
As we go up, we get to these higher values here and then this and this.

673
00:45:40,570 --> 00:45:43,660
So the lighter the color, the higher the score.

674
00:45:43,660 --> 00:45:46,510
And then the darker the color, the lower the score.

675
00:45:47,500 --> 00:45:52,420
Now, obviously, the ideal case will be where we have this purely white.

676
00:45:52,420 --> 00:45:55,600
So we have this purely white, this purely white, this purely white.

677
00:45:55,600 --> 00:45:59,830
And all these here completely dark, with all zero values.

678
00:46:00,100 --> 00:46:07,690
So that said, we've looked at how to obtain the confusion matrix, which is an important evaluation

679
00:46:07,690 --> 00:46:08,440
metric.

680
00:46:08,440 --> 00:46:11,800
And here we could also change this to training.

681
00:46:11,800 --> 00:46:16,750
So you just have to have your training dataset here, and it'll be fine.

682
00:46:16,900 --> 00:46:24,370
Now we have to deal with that last batch, which we did not take into consideration.

683
00:46:24,370 --> 00:46:32,320
So what we could do is some concatenation here, so let's do concatenate, and

684
00:46:32,320 --> 00:46:40,480
then we have that. Then we just copy this and paste

685
00:46:40,480 --> 00:46:41,290
it out here.

686
00:46:41,710 --> 00:46:42,720
There we go.

687
00:46:42,730 --> 00:46:44,800
But then we're taking the last element.

688
00:46:44,800 --> 00:46:49,990
So instead of the elements before the last, now we're taking the last element and that's it.

689
00:46:49,990 --> 00:46:56,380
So we add in the last element here so that we could flatten it out separately

690
00:46:56,380 --> 00:46:59,440
before concatenating it with the previous elements.

691
00:46:59,440 --> 00:47:03,580
We saw that having all of this joined together will cause an error.

692
00:47:03,580 --> 00:47:06,790
So that's why we are doing this here.

693
00:47:06,790 --> 00:47:11,980
Again, we copy this here: let's copy this and then paste it out here.

694
00:47:11,980 --> 00:47:12,970
That should be fine.

695
00:47:13,660 --> 00:47:14,710
This is for the predicted.

696
00:47:14,710 --> 00:47:18,880
So let's change this to predicted.

697
00:47:18,910 --> 00:47:27,940
Okay, so we should have that fine. Scroll this way and then let's have concatenate.

698
00:47:28,900 --> 00:47:32,050
Okay, so let's run this and see what we get.

699
00:47:32,530 --> 00:47:33,850
We get this error.

700
00:47:34,630 --> 00:47:36,280
Let's add this here.

701
00:47:36,790 --> 00:47:37,480
There we go.

702
00:47:37,480 --> 00:47:38,710
We run that again.

703
00:47:39,100 --> 00:47:39,970
That's fine.

704
00:47:39,970 --> 00:47:42,820
And the same should be for the predicted.

705
00:47:42,820 --> 00:47:44,620
So let's run this again.

706
00:47:44,800 --> 00:47:49,900
We get that error, scroll this way and add this here.

707
00:47:51,240 --> 00:47:58,050
Okay, so we have this last batch, which has been added, and then we simply copy this and paste

708
00:47:58,050 --> 00:47:58,440
it here.

709
00:47:58,470 --> 00:48:06,630
So now, instead of having just all the values before the last, we now have all the different batches

710
00:48:06,630 --> 00:48:07,410
together.

711
00:48:07,410 --> 00:48:09,900
So this is for the predicted, not just for the labels.

712
00:48:09,900 --> 00:48:14,850
So let's put this here: we have the labels, and then this is for the predicted.

713
00:48:14,970 --> 00:48:19,290
So copy this, get right to the end and we have that.

714
00:48:19,860 --> 00:48:24,410
So here we have this for the predicted. Paste this here and that should be fine.

715
00:48:24,420 --> 00:48:25,890
Let's run this again.

716
00:48:26,730 --> 00:48:30,770
We run that and then we run this and there we go.

717
00:48:30,780 --> 00:48:33,540
See, we have slightly different answers now.

718
00:48:33,750 --> 00:48:34,890
Okay, so that's it.

719
00:48:34,890 --> 00:48:39,660
We've seen how to plot out the confusion matrix for multiclass classification.

720
00:48:39,780 --> 00:48:45,210
Before we move on, we notice that we've made a very big mistake here.

721
00:48:45,210 --> 00:48:49,890
As the validation dataset, we actually passed in the train dataset.

722
00:48:49,890 --> 00:48:54,690
So let's modify this and make sure you never make this kind of error.

723
00:48:54,690 --> 00:49:00,510
If you make this kind of error, you feel like your model is performing well, whereas you are validating

724
00:49:00,510 --> 00:49:01,820
on the train data.

725
00:49:01,830 --> 00:49:03,920
So let's run this again.

726
00:49:04,200 --> 00:49:05,700
I have to run this again.

727
00:49:06,090 --> 00:49:11,640
Now training is complete and we see that the model wasn't performing as well as we thought.

728
00:49:11,970 --> 00:49:17,550
You see here the validation loss drops and then at some point starts increasing, while that of the training keeps

729
00:49:17,550 --> 00:49:18,240
dropping.

730
00:49:18,240 --> 00:49:24,540
And then for the training we have the accuracy going towards one, as we had previously, whereas for

731
00:49:24,540 --> 00:49:34,350
validation it plateaus at around 75%, as you could see here in these values.

732
00:49:34,350 --> 00:49:35,940
So you see the validation accuracy.

733
00:49:35,940 --> 00:49:42,300
The highest we have here is like 75%, although the top-k accuracy is around 90%.

734
00:49:42,960 --> 00:49:43,950
So that said,

735
00:49:43,950 --> 00:49:45,120
we see that.

736
00:49:45,120 --> 00:49:48,000
We see clearly that our model isn't performing that well.

737
00:49:48,000 --> 00:49:54,810
In the next sessions, we'll see how to improve this model's performance.

738
00:49:54,810 --> 00:50:06,060
We could also run this evaluation. We have the evaluation here: 75%, 90%, and a loss of one. Let's run this testing.

739
00:50:07,350 --> 00:50:16,950
It correctly classifies this. Next, we're going to try this out. Previously we had generally 100% because

740
00:50:16,950 --> 00:50:21,210
that was on the same training data.

741
00:50:21,240 --> 00:50:23,790
Now here, you see, let's start from the top.

742
00:50:23,790 --> 00:50:28,290
You see wrong prediction, right, wrong, wrong.

743
00:50:30,120 --> 00:50:34,590
You have here wrong, right, right, right, wrong.

744
00:50:34,620 --> 00:50:41,850
You see that out of the 16 different predictions, we have ten out of 16 right.

745
00:50:41,850 --> 00:50:48,540
So running that, you see you have about 62% on this little sample which we took here.

746
00:50:48,660 --> 00:50:51,780
Now, we could also plot the confusion matrix.

747
00:50:51,780 --> 00:50:53,160
Let's run this.

748
00:50:53,970 --> 00:50:59,760
And you could see clearly from here that the model isn't performing as well as it used to perform when

749
00:50:59,760 --> 00:51:01,080
we were making that error.