1
00:00:00,110 --> 00:00:04,700
Now we understand in detail how the SegFormer model was built.

2
00:00:04,700 --> 00:00:11,600
We are going to go ahead and create a simple SegFormer model with Hugging Face.

3
00:00:11,870 --> 00:00:17,690
We shall head over to the documentation where we have the SegFormer docs.

4
00:00:17,690 --> 00:00:22,580
Then we click on TFSegformerForSemanticSegmentation.

5
00:00:22,580 --> 00:00:28,400
Now once you click here, simply go ahead and copy out this part where we create the

6
00:00:28,400 --> 00:00:31,610
model from a pre-trained one.

7
00:00:31,610 --> 00:00:35,840
So let's copy this out and then paste it in here.

8
00:00:36,350 --> 00:00:37,160
There we go.

9
00:00:37,160 --> 00:00:42,920
Now it should be noted that this model ID is for the SegFormer B0.

10
00:00:42,920 --> 00:00:47,210
And as we have seen in the paper, SegFormer comes in different variations.

11
00:00:47,210 --> 00:00:50,810
That's from B0 right up to B5.

12
00:00:51,230 --> 00:00:55,730
So getting back to the code, let's copy this out.

13
00:00:56,330 --> 00:00:57,290
Let's take that off.

14
00:00:57,290 --> 00:00:59,870
Let's just call this model ID.

15
00:00:59,870 --> 00:01:00,680
There we go.

16
00:01:00,680 --> 00:01:01,820
So that's our model ID.

17
00:01:02,000 --> 00:01:05,270
And we'll define model ID just right up here.

18
00:01:05,270 --> 00:01:09,020
model_id is equal to this model ID.

19
00:01:09,020 --> 00:01:13,370
And as we've said already, this is the B0.

20
00:01:13,370 --> 00:01:14,930
So we could change it to B5.

21
00:01:14,930 --> 00:01:18,800
And then it's trained on the ADE20K dataset.

22
00:01:18,800 --> 00:01:24,890
Now other parameters which we're going to pass in here will be the number of labels.

23
00:01:24,890 --> 00:01:29,360
So we'll have the number of labels, which we're going to define as num_labels.

24
00:01:29,360 --> 00:01:33,350
And then we'll also define id2label.

25
00:01:33,350 --> 00:01:36,920
So id2label is id2label.

26
00:01:36,920 --> 00:01:43,460
And then we'll also define label2id, and then label2id.

27
00:01:43,820 --> 00:01:45,080
There we go.

28
00:01:45,080 --> 00:01:51,740
Finally we set ignore_mismatched_sizes.

29
00:01:51,980 --> 00:01:52,940
Set that to true.

30
00:01:52,940 --> 00:01:58,700
Now to obtain this id2label, label2id and number of labels, we'll make use of this labels.csv

31
00:01:58,700 --> 00:01:59,510
file.

32
00:01:59,510 --> 00:02:02,120
So you could open that up and check it out.

33
00:02:02,120 --> 00:02:03,260
Let's open this up.

34
00:02:03,260 --> 00:02:05,780
You see, this is our labels CSV file.

35
00:02:05,780 --> 00:02:07,310
Now let's copy the path.

36
00:02:07,430 --> 00:02:13,640
We'll just copy the path and then we'll make use of pandas to read the CSV.

37
00:02:13,640 --> 00:02:15,920
So we have our data frame.

38
00:02:15,920 --> 00:02:18,980
pandas read_csv.

39
00:02:18,980 --> 00:02:21,140
And then we pass in that file path.

40
00:02:21,140 --> 00:02:26,120
So now that we've read that, let's create another cell just below this.

41
00:02:26,600 --> 00:02:28,040
Um there we go.

42
00:02:28,070 --> 00:02:32,840
We create this other cell and then we create our ID to label dictionary.

43
00:02:33,890 --> 00:02:34,370
Oops.

44
00:02:34,370 --> 00:02:35,990
We have this empty dictionary.

45
00:02:35,990 --> 00:02:38,210
And then we go through the data frame.

46
00:02:38,210 --> 00:02:43,460
So we have for i, j in the data frame's iterrows().

47
00:02:44,060 --> 00:02:45,200
There we go.

48
00:02:45,500 --> 00:02:45,980
Oops.

49
00:02:45,980 --> 00:02:47,090
Let's get back.

50
00:02:48,440 --> 00:02:59,150
We are going to say if i is equal to zero, then in that case id2label[i] is going to be equal to NaN.

51
00:02:59,570 --> 00:03:02,000
So NaN, there we go.

52
00:03:02,000 --> 00:03:09,770
Else id2label[i] is going to be equal to j["label_list"].

53
00:03:09,920 --> 00:03:11,450
So we're getting this from here.

54
00:03:11,450 --> 00:03:13,070
So we have this label_list column.

55
00:03:13,250 --> 00:03:17,960
And this permits us now to get the exact label.

56
00:03:17,960 --> 00:03:20,450
So we're going from the IDs to the labels.

57
00:03:20,450 --> 00:03:21,770
So we're matching them up.

58
00:03:21,770 --> 00:03:22,550
That's it.

59
00:03:22,550 --> 00:03:28,700
So now that we have this cell, we could just run this and then print

60
00:03:28,700 --> 00:03:30,950
out our ID to label and see what we get.

61
00:03:31,280 --> 00:03:34,760
Let's run this and then say we have id2label.

62
00:03:35,660 --> 00:03:37,040
Let's see what we obtain.

63
00:03:37,070 --> 00:03:42,200
See, we start from zero right up to 59.

64
00:03:42,740 --> 00:03:43,700
Scroll.

65
00:03:44,240 --> 00:03:45,950
Um, okay.

66
00:03:45,950 --> 00:03:48,830
So, uh, we actually have 59 of them.

67
00:03:48,830 --> 00:03:50,750
So it goes from 0 to 58.

68
00:03:50,750 --> 00:03:51,530
So that's it.

69
00:03:51,530 --> 00:03:53,600
So we now have our id2label.

70
00:03:53,600 --> 00:03:56,750
And then we could obtain from here label2id.

71
00:03:56,750 --> 00:03:59,780
So let's just create this cell.

72
00:03:59,780 --> 00:04:03,320
And then the way we obtain label2id is simple.

73
00:04:03,320 --> 00:04:08,180
So for label2id, we create a dictionary.

74
00:04:08,180 --> 00:04:11,090
And then we match the labels with the IDs.

75
00:04:11,090 --> 00:04:14,510
So it's kind of the reverse of what we had for id2label.

76
00:04:14,510 --> 00:04:25,880
So for id, label in id2label.items(), we go through each and every item.

77
00:04:26,540 --> 00:04:27,470
There we go.

78
00:04:27,470 --> 00:04:27,980
Okay.

79
00:04:27,980 --> 00:04:29,000
So that's it.

80
00:04:29,000 --> 00:04:32,390
So for every item we have here we're just going to reverse.

81
00:04:32,390 --> 00:04:33,470
So it becomes label,

82
00:04:33,470 --> 00:04:38,060
or rather label takes the place of id and id takes the place of label.

83
00:04:38,060 --> 00:04:45,920
So let's run this again and then print out label2id.

84
00:04:48,240 --> 00:04:51,300
Oops, label2id, there we go.

85
00:04:51,300 --> 00:04:54,870
So running that, we could now see what we obtain.

86
00:04:54,960 --> 00:04:55,950
Scroll down.

87
00:04:55,950 --> 00:04:57,930
We see how we obtain our label2id.

88
00:04:58,170 --> 00:04:58,980
So that's it.

89
00:04:58,980 --> 00:05:01,350
We've defined our label2id and id2label.
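
The two dictionaries described above can be sketched as follows. This is a minimal sketch, not the notebook's exact code: it uses the standard-library csv module in place of pandas so that it runs without extra packages, the in-memory CSV text and its label names are hypothetical, and only the column name label_list follows the video.

```python
import csv
import io

# Hypothetical stand-in for labels.csv; only the column name follows the video.
LABELS_CSV = "label_list\nnan\nshirt\nhat\n"

rows = list(csv.DictReader(io.StringIO(LABELS_CSV)))

id2label = {}
for i, row in enumerate(rows):
    # Row 0 is stored under the placeholder name "NaN", as in the video.
    id2label[i] = "NaN" if i == 0 else row["label_list"]

# Reverse the mapping: label names become keys, ids become values.
label2id = {label: idx for idx, label in id2label.items()}

# The number of labels is simply the size of either dictionary.
num_labels = len(id2label)
print(id2label, label2id, num_labels)
```

With the real file, the same loop would run over the data frame's rows instead of the csv reader.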

90
00:05:01,350 --> 00:05:06,930
You see, we could obtain the number of labels by taking the length of label2id, and we

91
00:05:06,930 --> 00:05:08,760
should obtain the length.

92
00:05:08,760 --> 00:05:16,230
So here we just simply have length of label2id or length of id2label.

93
00:05:16,230 --> 00:05:18,780
Take this off and that's fine.

94
00:05:18,780 --> 00:05:22,800
So now we can run this and then create our model.

95
00:05:22,800 --> 00:05:26,700
And obviously after creating your model you could simply do a summary.

96
00:05:27,210 --> 00:05:28,920
We are getting an error.

97
00:05:28,920 --> 00:05:35,040
And what this error says is this model ID we specified is not a local folder.

98
00:05:35,040 --> 00:05:37,950
And it's not a valid model identifier listed on Hugging Face.

99
00:05:37,950 --> 00:05:41,610
So what we'll do is we'll get back to the documentation.

100
00:05:42,060 --> 00:05:43,920
Um, there we go.

101
00:05:43,920 --> 00:05:46,410
We get back here and search for SegFormer.

102
00:05:46,470 --> 00:05:52,350
And you will find that that specific model isn't available in that exact image size.

103
00:05:52,350 --> 00:05:54,780
So it's 640 by 640 instead.

104
00:05:54,780 --> 00:06:02,790
So let's get back here and modify this and have 640 by 640.

105
00:06:02,820 --> 00:06:03,540
There we go.

106
00:06:03,540 --> 00:06:05,580
So let's run again.

107
00:06:06,830 --> 00:06:09,440
And hopefully this time around everything should work.

108
00:06:09,440 --> 00:06:09,830
Fine.

109
00:06:09,830 --> 00:06:10,760
Let's take this off.

110
00:06:12,220 --> 00:06:13,210
Run that again.
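
The model creation step can be sketched like this. It is a hedged sketch under a few assumptions: the checkpoint name is the B5 ADE20K 640 by 640 one discussed above, the helper names model_kwargs and build_model are hypothetical, and it assumes a transformers release that still ships the TensorFlow classes. The import sits inside the function so the cell can be defined without downloading anything.

```python
MODEL_ID = "nvidia/segformer-b5-finetuned-ade-640-640"


def model_kwargs(id2label):
    """Collect the keyword arguments described in the video (hypothetical helper)."""
    label2id = {label: idx for idx, label in id2label.items()}
    return {
        "num_labels": len(id2label),
        "id2label": id2label,
        "label2id": label2id,
        # The pretrained head has ADE20K's classes, so let the new head differ in size.
        "ignore_mismatched_sizes": True,
    }


def build_model(id2label, model_id=MODEL_ID):
    # Imported lazily: requires transformers with TensorFlow support installed.
    from transformers import TFSegformerForSemanticSegmentation

    return TFSegformerForSemanticSegmentation.from_pretrained(
        model_id, **model_kwargs(id2label)
    )
```

Calling build_model(id2label) downloads the checkpoint and returns the model, after which model.summary() can be run.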

111
00:06:13,210 --> 00:06:18,160
So as we were saying, we could obviously do a model summary.

112
00:06:18,760 --> 00:06:22,540
So we could have model.summary(), and we could also test this out.

113
00:06:22,540 --> 00:06:29,020
So we could take in an input of shape batch by number of channels by height by width, and then

114
00:06:29,020 --> 00:06:30,850
see what it produces in the output.

115
00:06:30,850 --> 00:06:33,610
Let's run this and see what we get right here.

116
00:06:33,610 --> 00:06:36,010
As you can see we have our model summary.

117
00:06:36,010 --> 00:06:38,800
It's an 84 million parameter model.

118
00:06:38,800 --> 00:06:42,520
And we also get the shape of our output.

119
00:06:42,520 --> 00:06:46,060
So because we have 59 classes this makes sense.

120
00:06:46,060 --> 00:06:50,050
And then also we are having 128 by 128.

121
00:06:50,050 --> 00:07:03,220
So we're going from an input of 512 by 512 to an output of 128 by 128, which makes sense because

122
00:07:03,220 --> 00:07:14,020
we are going from height by width to height divided by four by width divided by four.

123
00:07:14,020 --> 00:07:17,650
So it makes sense that we have this kind of output.
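
As a quick check of that shape arithmetic, here is a small sketch. The function name is hypothetical; it only encodes the rule stated above, that the logits come out at a quarter of the input height and width, with one channel per class.

```python
def segformer_logits_shape(batch, num_labels, height, width):
    """Expected logits shape: batch, classes, height / 4, width / 4."""
    return (batch, num_labels, height // 4, width // 4)


print(segformer_logits_shape(1, 59, 512, 512))  # (1, 59, 128, 128)
```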

124
00:07:17,980 --> 00:07:19,960
And so that's it for this section.

125
00:07:19,960 --> 00:07:25,510
We will now move on to the next section where we will go ahead and train and then evaluate our

126
00:07:25,510 --> 00:07:26,110
model.

127
00:07:26,680 --> 00:07:31,270
We'll start with the training and evaluation where we'll first of all create the optimizer.

128
00:07:31,270 --> 00:07:36,940
So right here we pass in the initial learning rate number of training steps, which depends on the length

129
00:07:36,940 --> 00:07:39,130
of our training data and the number of epochs.

130
00:07:39,130 --> 00:07:44,440
Then we have the weight decay rate specified and the number of warmup steps set to zero.

131
00:07:44,440 --> 00:07:48,460
So let's run this and then we compile our model.
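
The optimizer setup can be sketched as below. The learning rate and weight decay values are hypothetical, since the video does not state them here, and the helper names are made up for illustration. create_optimizer is the transformers helper for TensorFlow and returns the optimizer together with its learning rate schedule.

```python
def num_train_steps(batches_per_epoch, num_epochs):
    """Total optimizer steps: length of the training data times the epochs."""
    return batches_per_epoch * num_epochs


def build_optimizer(batches_per_epoch, num_epochs, init_lr=6e-5, weight_decay_rate=0.01):
    # Imported lazily: requires transformers with TensorFlow support installed.
    from transformers import create_optimizer

    optimizer, _lr_schedule = create_optimizer(
        init_lr=init_lr,  # hypothetical value
        num_train_steps=num_train_steps(batches_per_epoch, num_epochs),
        weight_decay_rate=weight_decay_rate,  # hypothetical value
        num_warmup_steps=0,
    )
    return optimizer
```

The model is then compiled with model.compile(optimizer=optimizer).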

132
00:07:48,460 --> 00:07:51,310
Once our model is compiled we can start with the training.

133
00:07:51,310 --> 00:07:57,490
But before starting with the training we'll create a callback for our model evaluation.

134
00:07:57,490 --> 00:08:05,050
So after every epoch, we want to take in our validation data and compute the mean

135
00:08:05,050 --> 00:08:07,900
IOU of the model.

136
00:08:07,900 --> 00:08:13,960
To understand how the mean IOU score is computed, let's consider the following example.

137
00:08:13,960 --> 00:08:20,860
Now note that mean IOU stands for the mean intersection over union.

138
00:08:20,860 --> 00:08:25,900
So if you have two examples, let's just draw this before we get to our main example.

139
00:08:25,900 --> 00:08:35,590
If we have these two boxes, one in red, and then we have another one, let's change the color to blue

140
00:08:35,590 --> 00:08:37,480
so you could see that clearly. Okay.

141
00:08:37,480 --> 00:08:46,840
So if you have these two boxes then the intersection over union of them is simply going to be this area.

142
00:08:46,840 --> 00:08:47,980
That's the intersection.

143
00:08:47,980 --> 00:08:49,090
Let's change the color.

144
00:08:49,690 --> 00:08:54,400
This area right here, which is the intersection of the two,

145
00:08:55,920 --> 00:08:58,950
you see, is then divided by their union.

146
00:08:58,950 --> 00:09:02,490
Their union is obviously all this part, the zone.

147
00:09:03,060 --> 00:09:06,390
Let me change the color again so you can see that clearly.

148
00:09:07,200 --> 00:09:08,130
There we go.

149
00:09:08,130 --> 00:09:09,240
We have green. Okay.

150
00:09:09,240 --> 00:09:13,080
So as we were saying, we take all this here.

151
00:09:13,080 --> 00:09:14,130
This is our union.

152
00:09:14,220 --> 00:09:17,160
So all this is their union.

153
00:09:17,160 --> 00:09:22,380
So we take this part in white divided by all this to get the IOU.
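
For two axis-aligned boxes, the intersection over union drawn above can be sketched as follows. The box coordinates are hypothetical, given as (x1, y1, x2, y2), and the function name is made up for illustration.

```python
def box_iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # Width and height of the overlap; zero when the boxes do not touch.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    # Union is both areas minus the overlap, which would otherwise count twice.
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


red = (0, 0, 2, 2)
blue = (1, 1, 3, 3)
print(box_iou(red, blue))  # overlap area 1, union area 7
```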

154
00:09:22,680 --> 00:09:25,860
Let's consider that here we have the target output.

155
00:09:25,860 --> 00:09:31,650
So we expect our model to produce exactly this output.

156
00:09:31,650 --> 00:09:34,470
Here we have two different classes or two different objects.

157
00:09:34,470 --> 00:09:35,070
We could say.

158
00:09:35,070 --> 00:09:37,260
So let's call this object A.

159
00:09:37,260 --> 00:09:40,290
And then let's call this object B this object right here.

160
00:09:40,290 --> 00:09:42,060
The rest is simply the background.

161
00:09:42,060 --> 00:09:44,670
So this grayish zone here is the background.

162
00:09:44,670 --> 00:09:46,800
So we have object a object B.

163
00:09:46,830 --> 00:09:50,700
Then we have a model which we'll call model one.

164
00:09:51,120 --> 00:09:53,280
Then we'll call this model two.

165
00:09:53,280 --> 00:09:55,560
And then we'll call this model three.

166
00:09:55,560 --> 00:09:58,830
So this model one predicts this output.

167
00:09:58,830 --> 00:10:00,540
And then the model two predicts this.

168
00:10:00,540 --> 00:10:02,100
And model three predicts this.

169
00:10:02,100 --> 00:10:07,320
Now before getting to the IOU, let's suppose that we're using a metric like the accuracy.

170
00:10:07,320 --> 00:10:15,840
Now if we're using accuracy then clearly this one here, model one, should have the best score.

171
00:10:15,840 --> 00:10:23,310
And that will simply be because we are going to go pixel by pixel and compare the actual with what the

172
00:10:23,310 --> 00:10:24,150
model predicts.

173
00:10:24,150 --> 00:10:29,610
It should be noted that whatever the model predicts here for object A is in this yellow, and

174
00:10:29,610 --> 00:10:31,470
then for object B it is black.

175
00:10:31,470 --> 00:10:33,690
So this is what the model predicts.

176
00:10:33,690 --> 00:10:34,920
This is what the model predicts.

177
00:10:34,920 --> 00:10:38,280
This is what was expected and this is what's expected.

178
00:10:38,280 --> 00:10:41,460
Well, what was expected here is at the back, behind this.

179
00:10:41,460 --> 00:10:42,690
So that's it.

180
00:10:42,990 --> 00:10:44,100
Let's get back.

181
00:10:44,100 --> 00:10:45,480
Take that off okay.

182
00:10:45,480 --> 00:10:52,770
So as we were saying, with the accuracy we'll go pixel by pixel and say, okay, for this pixel, what does

183
00:10:52,770 --> 00:10:53,970
the model predict?

184
00:10:53,970 --> 00:10:57,900
In this case it predicts the background and what was expected was background.

185
00:10:57,900 --> 00:10:58,890
So that's fine.

186
00:10:59,190 --> 00:11:01,680
We'll do that for all the remaining pixels.

187
00:11:01,680 --> 00:11:08,520
And so the model does a great job at predicting object A, although it makes some

188
00:11:08,520 --> 00:11:09,660
slight error here.

189
00:11:09,660 --> 00:11:16,050
So it's only this portion where we're going to have an issue with the accuracy.

190
00:11:16,050 --> 00:11:21,780
And then for object B, the model doesn't even predict it; it doesn't notice it.

191
00:11:21,780 --> 00:11:23,760
So we also have this zone.

192
00:11:23,760 --> 00:11:27,540
So you find that these are the two zones where we're going to have our issues.

193
00:11:27,540 --> 00:11:30,630
Or maybe this thin line here and this thin line.

194
00:11:30,960 --> 00:11:36,330
Apart from this we do not have any other issues; all the rest is correctly predicted.

195
00:11:36,330 --> 00:11:41,400
So our accuracy is going to be pretty high compared to these two others where we have,

196
00:11:41,400 --> 00:11:50,760
you see, for this one here, this zone which is wrongly predicted, this zone wrongly predicted.

197
00:11:50,760 --> 00:11:56,130
Also this zone, see, wrongly predicted, this zone wrongly predicted.

198
00:11:56,130 --> 00:12:00,240
And then here we have this zone too, wrongly predicted.

199
00:12:00,240 --> 00:12:01,110
So that's it.

200
00:12:01,110 --> 00:12:06,600
You see that this one has the smallest error of them all. For this one too, you see, we're going to have

201
00:12:06,600 --> 00:12:12,630
this zone which is wrongly predicted because this ought to be seen by the model as object A,

202
00:12:12,630 --> 00:12:14,400
but it is not seen as object A.

203
00:12:14,520 --> 00:12:19,980
So if you use accuracy, then you count all these zones as not predicted properly.

204
00:12:19,980 --> 00:12:22,440
And hence this model wins.

205
00:12:22,440 --> 00:12:23,670
Model one wins.

206
00:12:23,880 --> 00:12:30,780
But the problem now is, if you suppose that this model one is the best performing

207
00:12:30,780 --> 00:12:40,440
model, then you would end up with a model which doesn't know how to properly segment object B and in

208
00:12:40,440 --> 00:12:47,190
the case where objects A and B are of equal importance, then this is a poorly performing model

209
00:12:47,220 --> 00:12:52,410
because it doesn't know how to predict or how to segment

210
00:12:52,410 --> 00:12:55,650
one of the major classes.

211
00:12:55,980 --> 00:13:04,440
On the other hand, the accuracy for model two will be the lowest, even though it at least tries to segment

212
00:13:04,440 --> 00:13:06,240
both classes.

213
00:13:06,240 --> 00:13:12,690
That is, class A and class B, compared to model one, which doesn't segment class B.

214
00:13:12,930 --> 00:13:20,250
And so with the mean IOU, we are simply going to compute the IOU for each and every object and then

215
00:13:20,250 --> 00:13:23,340
calculate the average over all the different classes.

216
00:13:23,340 --> 00:13:31,710
So in this example we'll start by computing this area, that is, this intersection.

217
00:13:32,340 --> 00:13:34,050
The intersection is this here.

218
00:13:34,050 --> 00:13:36,150
So here is our intersection.

219
00:13:36,150 --> 00:13:37,800
We change this to red.

220
00:13:37,800 --> 00:13:38,700
So it's clearer.

221
00:13:38,790 --> 00:13:42,570
So as we're seeing here we have this intersection.

222
00:13:42,990 --> 00:13:45,960
You see, there we go.

223
00:13:45,960 --> 00:13:47,070
We have the intersection.

224
00:13:47,070 --> 00:13:51,780
And then we have the union; here is the union.

225
00:13:52,500 --> 00:13:53,640
We compute that.

226
00:13:54,270 --> 00:13:54,870
See.

227
00:13:55,580 --> 00:14:01,070
We compute this intersection divided by the union.

228
00:14:01,070 --> 00:14:02,660
And then we do the same for this.

229
00:14:02,660 --> 00:14:04,790
Let's try to zoom in so you can see that.

230
00:14:04,790 --> 00:14:06,170
So again here we go.

231
00:14:06,170 --> 00:14:10,760
We take this intersection and then divide it by the union.

232
00:14:11,570 --> 00:14:15,350
It's clear that in this case it does very well with class B.

233
00:14:15,350 --> 00:14:18,470
And so here we're going to have a high IOU score.

234
00:14:18,470 --> 00:14:26,030
But nonetheless with this class A, we're going to have a smaller IOU score, but it's

235
00:14:26,030 --> 00:14:27,050
going to be pretty reasonable.

236
00:14:27,050 --> 00:14:30,440
So we could get say 0.55.

237
00:14:30,440 --> 00:14:34,310
So here we get 0.55 for this with the IOU.

238
00:14:34,850 --> 00:14:38,930
And then here we get, say, 0.85.

239
00:14:39,930 --> 00:14:44,040
So 0.85.

240
00:14:44,040 --> 00:14:50,100
Whereas for this example we get, say, 0.9.

241
00:14:50,100 --> 00:14:52,560
And then here we have zero.

242
00:14:52,560 --> 00:14:58,770
So if you sum this up and then you divide... well, let's first get this one. For this we can estimate that

243
00:14:58,770 --> 00:15:04,170
we get an IOU, because here the intersection is simply this part, this intersection.

244
00:15:04,380 --> 00:15:06,630
And then the union obviously is all this.

245
00:15:07,980 --> 00:15:09,090
See all of that.

246
00:15:09,090 --> 00:15:12,090
So here we could get say 0.3.

247
00:15:12,750 --> 00:15:16,500
And then here we could get say zero point.

248
00:15:16,500 --> 00:15:18,510
Let's say 0.9 okay.

249
00:15:18,510 --> 00:15:21,330
So 0.3 plus 0.9 is 1.2.

250
00:15:21,360 --> 00:15:23,160
We have an average of 0.6.

251
00:15:23,160 --> 00:15:25,230
That's for the mean IOU.

252
00:15:25,350 --> 00:15:28,350
And then here it's 0.9 plus 0, divided by two.

253
00:15:28,350 --> 00:15:30,990
We have an average of 0.45.

254
00:15:30,990 --> 00:15:37,530
So you see that the so-called best performing model in terms of accuracy turns out to be the worst

255
00:15:37,530 --> 00:15:40,020
performing model with the mean IOU.

256
00:15:40,020 --> 00:15:43,770
Then here we have 0.55 plus 0.85.

257
00:15:43,770 --> 00:15:46,860
That's essentially 0.9 plus 0.5.

258
00:15:46,860 --> 00:15:48,660
That's 1.4 divided by two.

259
00:15:48,690 --> 00:15:51,780
We have 0.7.

260
00:15:51,780 --> 00:15:55,860
So this turns out to be our best performing model.

261
00:15:56,130 --> 00:16:01,710
as compared to the accuracy, which instead saw this one to be the best performing model.

262
00:16:01,710 --> 00:16:07,680
And this makes more sense because all our different classes are of equal importance, and so the

263
00:16:07,680 --> 00:16:13,440
fact that we did well in predicting the larger object shouldn't make us rush to think that this is the

264
00:16:13,440 --> 00:16:19,170
best performing model, because the classes which appear smaller matter too.
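
The comparison above can be reproduced on a toy example. This is a sketch under stated assumptions: the masks are hypothetical flattened lists of 20 pixels, the helper names are made up, and the mean is taken over the two object classes only, as in the drawing. A library metric may also count the background class.

```python
def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class matches the target."""
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def class_iou(pred, target, cls):
    """Intersection over union of one class between prediction and target."""
    inter = sum(p == cls and t == cls for p, t in zip(pred, target))
    union = sum(p == cls or t == cls for p, t in zip(pred, target))
    return inter / union if union else 0.0


def mean_iou(pred, target, classes):
    """Average of the per-class IoU over the given classes."""
    return sum(class_iou(pred, target, c) for c in classes) / len(classes)


BG, A, B = 0, 1, 2  # background, large object, small object

target = [A] * 10 + [B] * 2 + [BG] * 8
model_1 = [A] * 9 + [BG] * 3 + [BG] * 8            # nearly perfect on A, misses B entirely
model_2 = [A] * 6 + [BG] * 4 + [B] * 3 + [BG] * 7  # rougher on A, but finds B

for name, pred in [("model 1", model_1), ("model 2", model_2)]:
    print(name, pixel_accuracy(pred, target), mean_iou(pred, target, [A, B]))
```

Model 1 wins on pixel accuracy, yet its mean IoU is lower because the missed object B contributes an IoU of zero.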

265
00:16:19,170 --> 00:16:20,820
That said, let's dive into the code.

266
00:16:20,820 --> 00:16:31,350
We're going to import evaluate, and then we are going to have our mean_iou, which is evaluate.load of

267
00:16:31,350 --> 00:16:32,490
"mean_iou".

268
00:16:32,550 --> 00:16:34,050
So let's run that.

269
00:16:34,050 --> 00:16:39,750
See, we now have our mean IOU loaded very easily thanks to Hugging Face.

270
00:16:39,750 --> 00:16:42,540
Let's define our compute_metrics method.

271
00:16:42,540 --> 00:16:47,040
So here we have compute_metrics.

272
00:16:47,040 --> 00:16:53,490
And what this method takes in is simply our predictions and the target output.

273
00:16:53,490 --> 00:16:56,010
So we have logits.

274
00:16:56,010 --> 00:16:57,660
This is what the model produces.

275
00:16:57,660 --> 00:17:01,950
And then the labels are what we expect the model to produce.

276
00:17:01,950 --> 00:17:02,880
So that's it.

277
00:17:02,880 --> 00:17:06,750
And then once we have this the logits will be slightly modified.

278
00:17:06,750 --> 00:17:14,460
If you can see from here you'll notice that we have batch by number of classes by 128 by 128.

279
00:17:14,460 --> 00:17:27,030
But what the mean IOU will need will be an output of shape 1 by 128 by 128 by 59 instead.

280
00:17:27,030 --> 00:17:35,910
And so that said, we are going from batch by number of classes by height by width to batch

281
00:17:35,910 --> 00:17:39,180
by height by width by number of classes.

282
00:17:39,180 --> 00:17:41,790
So that's how we're going to permute this.

283
00:17:41,790 --> 00:17:44,670
So we'll go from zero, then two, three.

284
00:17:44,670 --> 00:17:45,810
So you see two three.

285
00:17:45,810 --> 00:17:50,430
So we now have 0, 2, 3 and then 1.

286
00:17:50,580 --> 00:17:56,310
So that's why we take in these logits and then permute or transpose them.

287
00:17:56,310 --> 00:18:01,050
So we have logits, TensorFlow transpose.

288
00:18:01,050 --> 00:18:12,870
And then we take in the logits and the permutation, and we specify it's going to be 0, 2, 3, 1. Okay.

289
00:18:12,870 --> 00:18:13,650
So that's it.

290
00:18:13,650 --> 00:18:15,180
So now we have these logits.

291
00:18:15,180 --> 00:18:22,440
We're going to resize them because, remember, the target output isn't 128 by 128.

292
00:18:22,440 --> 00:18:25,770
Our target output is 512 by 512.

293
00:18:26,280 --> 00:18:33,930
So because we have this target output and not what this model produces, we will have to upsample

294
00:18:33,930 --> 00:18:34,500
this.

295
00:18:34,500 --> 00:18:39,360
And so that's why we're going to use the simple resize method.

296
00:18:39,360 --> 00:18:42,000
So we have our logits which is now resized.

297
00:18:42,000 --> 00:18:44,940
And then we'll make use of TensorFlow's resize method.

298
00:18:45,420 --> 00:18:46,980
That's tf.image.resize.

299
00:18:47,130 --> 00:18:48,030
There we go.

300
00:18:48,030 --> 00:18:51,120
We take in the unresized logits.

301
00:18:51,120 --> 00:18:53,820
And then we specify the size the output size.

302
00:18:53,820 --> 00:19:00,750
So it's simply going to be the shape of the labels, because the labels already have the shape 512

303
00:19:00,750 --> 00:19:01,800
by 512.

304
00:19:01,800 --> 00:19:08,040
But because we do not want to include the batch, we're going to take from index one onward, and that's it.

305
00:19:08,040 --> 00:19:08,580
Okay.

306
00:19:08,580 --> 00:19:10,230
So there we go.

307
00:19:10,230 --> 00:19:16,620
We specify the method we want to have: we make use of the bilinear interpolation method

308
00:19:16,620 --> 00:19:17,940
for resizing.

309
00:19:17,940 --> 00:19:22,890
Now, bilinear interpolation simply works like this,

310
00:19:22,920 --> 00:19:28,110
as you can see from this paper by Sanjar Kashif.

311
00:19:29,250 --> 00:19:29,670
Yeah.

312
00:19:29,670 --> 00:19:34,440
You have an input, let's say ten, 20, 30, 40.

313
00:19:34,470 --> 00:19:41,970
Then if you want to go from 2 by 2 to 4 by 4, you see, take this 10, place it here, 20 place here,

314
00:19:41,970 --> 00:19:44,280
30 place here, 40 place here.

315
00:19:44,280 --> 00:19:51,480
And then to go from 10 to 20, we simply look for numbers such that the difference between

316
00:19:51,480 --> 00:19:53,820
consecutive numbers is the same.

317
00:19:53,820 --> 00:20:00,270
So ten plus 3.3, 13.3 plus 3.3, 16.6 plus 3.3, 20.

318
00:20:00,270 --> 00:20:05,640
And then we do the same in this direction, the same in this direction, the same this direction, and

319
00:20:05,640 --> 00:20:06,390
so on and so forth.

320
00:20:06,390 --> 00:20:13,650
So that's how we go from this 2 by 2 to a 4 by 4 by bilinear interpolation.
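
The 2 by 2 to 4 by 4 illustration can be reproduced with a small pure-Python sketch. One caveat, stated as an assumption: this version pins the input corners onto the output corners, which matches the drawing, whereas a library resize may place its sample points slightly differently and so give somewhat different border values. The function name is made up for illustration.

```python
def upsample_bilinear(grid, out_h, out_w):
    """Corner-aligned bilinear upsampling of a 2D list of numbers."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for r in range(out_h):
        # Fractional input row for this output row; corners map onto corners.
        y = r * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for c in range(out_w):
            x = c * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate along the row first, then between the two rows.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out


result = upsample_bilinear([[10, 20], [30, 40]], 4, 4)
for row in result:
    print([round(v, 1) for v in row])
```

The first row steps evenly from 10 to 20 and the first column steps evenly from 10 to 30, as in the illustration.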

321
00:20:14,040 --> 00:20:17,730
So getting back to the code we're going to modify this again.

322
00:20:17,730 --> 00:20:19,860
These are our logits or our predictions.

323
00:20:19,860 --> 00:20:20,670
Again.

324
00:20:20,670 --> 00:20:23,610
Here we'll define pred_labels.

325
00:20:23,610 --> 00:20:31,440
And then here we have argmax, and we take in logits_resized. Okay.

326
00:20:31,470 --> 00:20:33,510
Then we'll specify the axis to be the last axis.

327
00:20:33,510 --> 00:20:35,010
So negative one.

328
00:20:35,010 --> 00:20:43,020
Now the reason why we're doing this is also because we do not want 512 by 512 by 59.

329
00:20:43,020 --> 00:20:46,320
We actually want 512 by 512.

330
00:20:46,320 --> 00:20:54,120
And so we use the argmax function such that for each and every position in this output, we are

331
00:20:54,120 --> 00:20:58,440
going to know exactly which class had the highest probability.

332
00:20:58,440 --> 00:20:59,460
So that's it.
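
The shape changes in these steps can be traced with NumPy. This is only a sketch of the shapes, not the notebook's TensorFlow code: the sizes are kept small (8 by 8 logits upsampled by four to 32 by 32) and are hypothetical, and a nearest-neighbour repeat stands in for the bilinear resize.

```python
import numpy as np

rng = np.random.default_rng(0)
num_classes = 59
logits = rng.normal(size=(1, num_classes, 8, 8))  # batch, classes, height, width

# Move the class axis to the end: batch, height, width, classes.
logits_nhwc = np.transpose(logits, (0, 2, 3, 1))

# Stand-in for the resize: repeat each pixel 4 times along height and width.
logits_resized = np.repeat(np.repeat(logits_nhwc, 4, axis=1), 4, axis=2)

# Keep only the index of the highest-scoring class at every pixel.
pred_labels = np.argmax(logits_resized, axis=-1)

print(logits_nhwc.shape, logits_resized.shape, pred_labels.shape)
```

The class axis disappears after the argmax, leaving one class id per pixel.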

333
00:20:59,460 --> 00:21:01,380
We have now our predictions.

334
00:21:01,380 --> 00:21:05,010
And then we could go ahead and compute the mean IOU.

335
00:21:05,010 --> 00:21:09,240
So let's just say metrics is metric.compute.

336
00:21:09,240 --> 00:21:17,340
And then remember, this is not actually metric; here it is going to be mean_iou. Okay.

337
00:21:17,340 --> 00:21:20,880
So we have our mean_iou.compute.

338
00:21:20,880 --> 00:21:23,160
And then here we specify the predictions.

339
00:21:23,550 --> 00:21:24,390
There we go.

340
00:21:24,390 --> 00:21:27,390
We have pred labels which we've just defined.

341
00:21:27,390 --> 00:21:30,540
We have the references.

342
00:21:30,540 --> 00:21:31,980
We have labels.

343
00:21:31,980 --> 00:21:38,010
Obviously from our data set. We have the number of labels, num_labels.

344
00:21:38,640 --> 00:21:45,240
And then we have the ignore_index, which is set to negative one.

345
00:21:45,240 --> 00:21:45,810
Okay.

346
00:21:45,810 --> 00:21:46,740
So that's it.

347
00:21:46,740 --> 00:21:54,180
Now once we are able to compute our metrics, the next thing we want to do is to obtain the per

348
00:21:54,180 --> 00:21:57,540
category IOU and the per category accuracy.

349
00:21:57,540 --> 00:22:00,510
So here you take this metrics.

350
00:22:00,510 --> 00:22:03,210
We get the per category accuracy.

351
00:22:03,210 --> 00:22:06,300
Convert to a list get that of the IOU and convert to a list.

352
00:22:06,330 --> 00:22:09,180
Now let's just change this name to metric actually.

353
00:22:09,180 --> 00:22:13,650
Because even though we load the mean IOU, we could also get a metric like the accuracy.

354
00:22:13,650 --> 00:22:16,950
So let's change this to metric okay.

355
00:22:16,950 --> 00:22:18,300
So that's fine.

356
00:22:18,300 --> 00:22:26,700
Now that we have our per category accuracy and that of the IOU, we are going to now update our

357
00:22:26,700 --> 00:22:30,030
metrics with the accuracy and IOU of each class.

358
00:22:30,030 --> 00:22:35,250
Now, using id2label, we specify the specific class name.

359
00:22:35,460 --> 00:22:41,010
And then we'll return the validation metrics, which are going to be printed out. With the metrics method

360
00:22:41,010 --> 00:22:41,580
defined,

361
00:22:41,580 --> 00:22:45,090
we can now go ahead and define our metric callback.
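
The last part of the metrics method, where the per category lists are unpacked into named entries, can be sketched as below. The function name and the example numbers are hypothetical, and the val_ prefix on the returned keys is an assumption. The keys per_category_accuracy and per_category_iou are the ones returned by the mean IoU metric.

```python
def flatten_metrics(metrics, id2label):
    """Replace per-category lists with one named entry per class."""
    metrics = dict(metrics)  # work on a copy
    per_category_accuracy = list(metrics.pop("per_category_accuracy"))
    per_category_iou = list(metrics.pop("per_category_iou"))

    # Name each per-class value after its label, looked up through id2label.
    metrics.update(
        {f"accuracy_{id2label[i]}": v for i, v in enumerate(per_category_accuracy)}
    )
    metrics.update({f"iou_{id2label[i]}": v for i, v in enumerate(per_category_iou)})

    # Assumed prefix marking these as validation metrics.
    return {f"val_{name}": value for name, value in metrics.items()}


example = {
    "mean_iou": 0.45,
    "per_category_accuracy": [0.9, 0.0],
    "per_category_iou": [0.9, 0.0],
}
out = flatten_metrics(example, {0: "A", 1: "B"})
print(out)
```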

362
00:22:45,090 --> 00:22:47,190
So let's create this new cell.

363
00:22:47,190 --> 00:22:55,890
And then we import KerasMetricCallback from transformers.keras_callbacks.

364
00:22:55,890 --> 00:23:01,140
Let's import the KerasMetricCallback. Okay.

365
00:23:01,140 --> 00:23:08,790
Then we have our metric callback, which is simply going to be metric_callback.

366
00:23:09,990 --> 00:23:10,830
There we go.

367
00:23:10,830 --> 00:23:16,530
Which is simply going to be our KerasMetricCallback, which we've just imported.

368
00:23:16,530 --> 00:23:20,640
And then in here we're going to pass in the metric function we just defined.

369
00:23:20,640 --> 00:23:26,580
We just created this metric function, which is compute_metrics.

370
00:23:26,580 --> 00:23:32,280
And then the evaluation dataset is going to be our validation dataset.

371
00:23:32,280 --> 00:23:37,230
We also specify the batch size which is the batch size we've already set.

372
00:23:37,230 --> 00:23:39,420
So we have batch size.

373
00:23:39,420 --> 00:23:40,950
Well it's actually two.

374
00:23:41,340 --> 00:23:43,470
This is the batch size we set right here.

375
00:23:43,470 --> 00:23:46,890
So we could instead put this just below.

376
00:23:46,890 --> 00:23:51,630
So we just make use of this batch size, which has already been set.

377
00:23:51,630 --> 00:23:53,340
So let's paste it here.

378
00:23:53,880 --> 00:23:54,900
There we go.

379
00:23:54,900 --> 00:24:03,030
We have this pasted, and then we specify our batch size, batch_size, which is set to two.

380
00:24:03,030 --> 00:24:10,620
And then finally we have the label columns, which are simply specified as labels.

381
00:24:10,620 --> 00:24:11,340
That's it.

382
00:24:11,340 --> 00:24:12,270
So that's it.

383
00:24:12,270 --> 00:24:14,160
We've created our metric callback.

384
00:24:14,160 --> 00:24:21,720
We now define this callbacks list, which is going to take in our metric callback.

385
00:24:21,720 --> 00:24:28,080
So this means that you could specify other callbacks and add them to this list. With our callback

386
00:24:28,080 --> 00:24:28,740
created,

387
00:24:28,920 --> 00:24:34,080
we'll go ahead and recompile our model and then start with the training.
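
The callback setup walked through above might look roughly like this. It is a sketch under assumptions: it presumes a Transformers release that still ships TensorFlow support, the wrapper name build_callbacks is hypothetical, and compute_metrics and val_dataset stand for the objects defined earlier in the notebook.

```python
def build_callbacks(compute_metrics, val_dataset, batch_size=2):
    """Create the metric callback and wrap it in a callbacks list."""
    # Imported lazily so the sketch can be read without TensorFlow installed.
    from transformers.keras_callbacks import KerasMetricCallback

    metric_callback = KerasMetricCallback(
        metric_fn=compute_metrics,  # the metric function defined earlier
        eval_dataset=val_dataset,   # evaluated at the end of each epoch
        batch_size=batch_size,      # the batch size already set (two here)
        label_cols=["labels"],      # column holding the segmentation maps
    )
    # Other Keras callbacks could be appended to this list as well.
    return [metric_callback]
```

The returned list is what would be handed to the callbacks argument of model.fit once the model has been recompiled.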

388
00:24:34,990 --> 00:24:36,940
After training for several epochs,

389
00:24:36,940 --> 00:24:38,740
here are the results we obtain.

390
00:24:39,010 --> 00:24:47,620
As you can see, at the 15th epoch we get a loss of 0.28 and a validation loss of 0.36.

391
00:24:47,620 --> 00:24:51,700
The validation mean IOU is quite low.

392
00:24:51,700 --> 00:24:53,800
That's 0.18.

393
00:24:54,130 --> 00:25:00,250
And then the validation mean accuracy is also very low, 0.278.

394
00:25:00,250 --> 00:25:07,840
Then the validation overall accuracy is higher compared to the mean accuracy and the mean IOU.

395
00:25:07,840 --> 00:25:13,840
Now it should be noted that the reason why our mean values are very low is simply because we have so

396
00:25:13,840 --> 00:25:16,780
many classes which aren't represented in our dataset.

397
00:25:16,780 --> 00:25:23,290
So if you look at accessories, you find this very low value for the validation accuracy.

398
00:25:23,290 --> 00:25:25,570
Let's get to the IOUs.

399
00:25:25,990 --> 00:25:29,500
After the accuracies, you have the IOU for each and every class.

400
00:25:29,500 --> 00:25:33,640
So we have more than 50 classes here.

401
00:25:33,640 --> 00:25:37,540
And you see, for pumps, there are no pumps.

402
00:25:37,540 --> 00:25:40,900
So it makes sense that we have a value of zero.

403
00:25:40,900 --> 00:25:46,690
But pants, which are much more common, have a much higher IOU score.

404
00:25:46,690 --> 00:25:55,540
So the reason why we're having these very low values for the mean IOU and the mean accuracy is simply because

405
00:25:55,540 --> 00:26:01,360
we have these many classes which aren't represented, or aren't represented enough, in the dataset.
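
To see how a few empty classes pull the mean down, here is a small worked example. The class names echo the ones mentioned above, but the numbers are made up for illustration and are not the values from this training run.

```python
# Hypothetical per-class IOU values; classes with no usable samples score zero.
per_class_iou = {
    "pants": 0.6,
    "shirt": 0.5,
    "pumps": 0.0,
    "accessories": 0.0,
    "scarf": 0.0,
}


def mean_iou(scores):
    """Plain average over classes: every class counts equally."""
    return sum(scores.values()) / len(scores)


# Averaging over all five classes: (0.6 + 0.5 + 0 + 0 + 0) / 5
mean_all = mean_iou(per_class_iou)

# Averaging only over the classes that scored above zero: (0.6 + 0.5) / 2
represented = {k: v for k, v in per_class_iou.items() if v > 0}
mean_represented = mean_iou(represented)

print(f"mean over all classes:         {mean_all:.2f}")
print(f"mean over represented classes: {mean_represented:.2f}")
```

Because the mean weighs every class equally, three zero-scoring classes cut the average from 0.55 to 0.22 here, even though the common classes are segmented reasonably well.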

406
00:26:01,360 --> 00:26:09,340
That said, we go ahead and save our current weights, and then we now dive into the next part of our

407
00:26:09,340 --> 00:26:09,790
work.

408
00:26:09,790 --> 00:26:13,360
So we've just looked at training and evaluation.

409
00:26:13,360 --> 00:26:17,890
Now we'll dive into evaluation with the FiftyOne platform.
