1
00:00:00,770 --> 00:00:07,640
In a previous section, we saw how to visualize our data samples using the FiftyOne App.

2
00:00:07,640 --> 00:00:09,200
If we click on a sample,

3
00:00:09,200 --> 00:00:17,000
for example, we find the original image and its corresponding ground truth

4
00:00:17,000 --> 00:00:17,750
mask.

5
00:00:17,750 --> 00:00:24,710
Now, what if, apart from visualizing this ground truth mask with its different labels (here we have

6
00:00:24,710 --> 00:00:28,610
a blazer, here a dress, and here a bag),

7
00:00:28,610 --> 00:00:34,910
we also want to visualize what the model predicts?

8
00:00:35,270 --> 00:00:41,420
We can make this work by modifying every sample in our dataset.

9
00:00:41,420 --> 00:00:47,030
If we run dataset.head() right here, we can see a couple of samples.

10
00:00:47,030 --> 00:00:49,400
For each sample, we get the image,

11
00:00:49,580 --> 00:00:56,750
and using its file path, we pass the image into our already trained model.

12
00:00:56,750 --> 00:00:59,570
So here we have our trained model.

13
00:00:59,570 --> 00:01:02,330
We obtain its predictions.

14
00:01:02,330 --> 00:01:10,160
Once we obtain these predictions, we create a new key in our sample dictionary, which we'll call pred.

15
00:01:10,160 --> 00:01:13,730
We've seen that each sample already has the ground truth.

16
00:01:13,730 --> 00:01:20,840
You can see we have the id, media_type, filepath, tags, metadata, and ground_truth fields.

17
00:01:20,870 --> 00:01:23,210
We'll add pred, the prediction.

18
00:01:23,210 --> 00:01:31,130
Its value will be a Segmentation object,

19
00:01:31,130 --> 00:01:33,380
that is, a FiftyOne Segmentation object.

20
00:01:33,380 --> 00:01:39,860
The difference is that instead of setting a mask path, as we did for the ground truth, we

21
00:01:39,860 --> 00:01:41,960
are going to set a mask directly.

22
00:01:41,960 --> 00:01:48,980
This mask is an array that represents our prediction.

23
00:01:49,580 --> 00:01:57,260
Diving into the code, we loop over the samples in the dataset, keeping an index i, since we want to get each and

24
00:01:57,260 --> 00:01:59,990
every sample and modify it.

25
00:01:59,990 --> 00:02:02,930
We are going to start by pre-processing the data.

26
00:02:02,930 --> 00:02:07,880
We reuse the preprocessing method we've already seen.

27
00:02:07,880 --> 00:02:11,990
So let's remove this preprocess function wrapper.

28
00:02:11,990 --> 00:02:15,560
Now we have the sample's file path.

29
00:02:15,890 --> 00:02:17,930
You see, we use sample.filepath

30
00:02:18,290 --> 00:02:20,330
instead of the image path we had before.

31
00:02:20,360 --> 00:02:21,350
There we go.

32
00:02:21,350 --> 00:02:25,250
We now obtain the image from sample.filepath,

33
00:02:25,250 --> 00:02:27,140
after reading and decoding it.

34
00:02:27,140 --> 00:02:31,280
Then we also resize this image.

35
00:02:31,280 --> 00:02:35,240
So here we have image resizing.

36
00:02:35,420 --> 00:02:37,070
We resize the image.

37
00:02:37,070 --> 00:02:44,540
Because our model takes 512 by 512 inputs, we define the new size. Let's

38
00:02:44,540 --> 00:02:54,410
call the height h_resized and the width

39
00:02:55,320 --> 00:02:57,000
w_resized.

40
00:02:57,000 --> 00:02:57,690
There we go.

41
00:02:57,690 --> 00:03:01,800
So both are 512.

42
00:03:01,800 --> 00:03:02,490
That's fine.

43
00:03:02,490 --> 00:03:07,770
Now we pass h_resized and

44
00:03:08,660 --> 00:03:10,970
w_resized.

45
00:03:11,240 --> 00:03:11,990
There we go.

46
00:03:11,990 --> 00:03:19,700
We now resize the image and cast the output to float32.

47
00:03:19,730 --> 00:03:21,800
Then we carry out the transpose.

48
00:03:21,830 --> 00:03:24,350
Remember that our input images,

49
00:03:24,350 --> 00:03:28,460
as we obtain them from our samples, are height by width by three.

50
00:03:28,460 --> 00:03:37,130
But the model instead needs three by height by width, that is, channels by height by width.

51
00:03:37,130 --> 00:03:41,990
So we want to move from height by width by channels to channels by height by width.

52
00:03:41,990 --> 00:03:47,150
To do that, we carry out the transpose, as we've been doing throughout this course.

53
00:03:47,150 --> 00:03:50,420
So we take img, our image,

54
00:03:50,420 --> 00:03:55,760
and transpose it so that it matches what our model takes in.

55
00:03:55,760 --> 00:03:58,190
We reorder the axes accordingly.

56
00:03:58,190 --> 00:03:58,940
There we go.

57
00:03:58,940 --> 00:04:03,920
Once we're done with the transposition, we add an extra dimension.

58
00:04:04,100 --> 00:04:08,090
We use expand_dims.

59
00:04:08,090 --> 00:04:09,500
There we go.

60
00:04:09,500 --> 00:04:11,390
We set axis to zero.

61
00:04:11,390 --> 00:04:18,620
So we move from channels by height by width to one by channels by height by width,

62
00:04:18,650 --> 00:04:22,400
that is, batch by channels by height by width.

63
00:04:22,400 --> 00:04:23,210
So that's it.

64
00:04:23,210 --> 00:04:30,410
In this case it's actually one by three by 512 by 512.
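
A minimal NumPy sketch of the shape changes described above, assuming a zero-filled array stands in for the decoded, resized image. The lesson itself appears to do this with TensorFlow ops; `np.transpose` with axes `(2, 0, 1)` and `np.expand_dims(..., axis=0)` follow the same axis semantics.

```python
import numpy as np

# Placeholder for a decoded image already resized to the model's input size: (H, W, C)
img = np.zeros((512, 512, 3), dtype=np.float32)

# Channels first: (H, W, C) -> (C, H, W)
img_chw = np.transpose(img, (2, 0, 1))

# Leading batch axis: (C, H, W) -> (1, C, H, W)
batch = np.expand_dims(img_chw, axis=0)

print(img_chw.shape)  # (3, 512, 512)
print(batch.shape)    # (1, 3, 512, 512)
```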

65
00:04:30,410 --> 00:04:30,950
Okay.

66
00:04:30,950 --> 00:04:31,910
So that's it.

67
00:04:32,120 --> 00:04:36,920
Once we're done with that, we normalize.

68
00:04:37,040 --> 00:04:43,370
We can now remove the part where we processed annotations before and move on to

69
00:04:43,370 --> 00:04:46,310
passing our inputs into the model.

70
00:04:46,310 --> 00:04:49,460
Now we have this image.

71
00:04:49,460 --> 00:04:52,280
We want to pass this into the model and obtain some output.

72
00:04:52,280 --> 00:04:58,070
Here we have our output, obtained by passing the image into the model.

73
00:04:58,070 --> 00:04:59,600
We take the logits.

74
00:04:59,600 --> 00:05:00,530
There we go.

75
00:05:00,530 --> 00:05:06,560
Now, when you

76
00:05:06,590 --> 00:05:16,040
get the output, its shape is batch by number of classes by 128 by 128,

77
00:05:16,040 --> 00:05:18,020
where 128

78
00:05:18,020 --> 00:05:20,930
is simply 512 divided by four.

79
00:05:20,930 --> 00:05:30,530
Because our output masks are going to be one by 128 by 128, that is, batch by height by width,

80
00:05:30,530 --> 00:05:33,470
where height and width are the output height and width,

81
00:05:33,470 --> 00:05:42,860
we select, for each position in this 2D array,

82
00:05:42,860 --> 00:05:48,260
the class with the highest probability.

83
00:05:48,260 --> 00:05:50,720
Doing that is simple.

84
00:05:50,720 --> 00:05:53,600
We take the output we already have

85
00:05:53,600 --> 00:05:57,080
and pass it through the argmax method.

86
00:05:57,080 --> 00:05:59,360
So here we have output.

87
00:05:59,360 --> 00:06:02,540
And then we specify the axis to be one.

88
00:06:02,540 --> 00:06:07,250
We specify axis one here because that is the axis along which we

89
00:06:07,250 --> 00:06:10,940
select the class with the highest value.

90
00:06:10,940 --> 00:06:17,120
The axes are zero, one, two, and three,

91
00:06:17,270 --> 00:06:22,190
so specifying axis one means we take the maximum along the class axis and remove that axis.
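
To make the shapes concrete, here is a NumPy sketch with random logits standing in for the model output (an assumption; in the lesson they come from the trained model). It also checks that the argmax of the raw logits picks the same class as the argmax after a softmax, since softmax preserves ordering, so no softmax is needed before selecting classes.

```python
import numpy as np

NUM_CLASSES = 59  # number of labels used in this lesson
rng = np.random.default_rng(0)

# Random stand-in for the logits: (batch, classes, 512 / 4, 512 / 4)
logits = rng.standard_normal((1, NUM_CLASSES, 512 // 4, 512 // 4))

# For every pixel, keep the index of the highest-scoring class (axis 1 is the class axis)
pred = np.argmax(logits, axis=1)

# Softmax over the class axis preserves ordering, so its argmax is identical
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)
same_classes = np.array_equal(np.argmax(probs, axis=1), pred)

print(logits.shape)  # (1, 59, 128, 128)
print(pred.shape)    # (1, 128, 128)
print(same_classes)
```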

92
00:06:22,190 --> 00:06:30,950
That's why we make that choice. Also, don't forget to load the model with

93
00:06:30,950 --> 00:06:33,920
the most recent checkpoint.

94
00:06:33,920 --> 00:06:39,470
Now that we have this, we can resize our output.

95
00:06:39,470 --> 00:06:46,400
Remember that our input images were resized to 512 by 512,

96
00:06:46,400 --> 00:06:48,830
because this is what the model takes in.

97
00:06:48,830 --> 00:06:58,370
But what we want to save is 825 by 550, as the image was

98
00:06:58,370 --> 00:06:59,060
originally.

99
00:06:59,060 --> 00:07:03,980
So we have to resize the output mask back to that original size.

100
00:07:03,980 --> 00:07:12,650
Since our image here has shape 825 by 550, and

101
00:07:12,650 --> 00:07:17,000
the ground truth mask is also 825 by 550,

102
00:07:17,030 --> 00:07:24,680
we need to resize the mask we get after passing this image into our trained

103
00:07:24,680 --> 00:07:25,400
model.

104
00:07:25,400 --> 00:07:32,510
After defining the initial height and width, h_init and w_init, as 825 and 550, we can

105
00:07:32,720 --> 00:07:35,870
use TensorFlow's image resize method.

106
00:07:35,870 --> 00:07:39,140
So here we have the resized output.

107
00:07:39,140 --> 00:07:46,640
Our resized output is simply tf.image.resize, which takes in

108
00:07:47,060 --> 00:07:48,080
our output.

109
00:07:48,080 --> 00:07:57,560
We specify that the method is bilinear and that antialias

110
00:07:57,590 --> 00:07:59,390
is set to True.

111
00:07:59,390 --> 00:08:07,910
Note also that this resize method expects its input

112
00:08:07,910 --> 00:08:08,180
in a particular shape.

113
00:08:08,300 --> 00:08:12,170
Let's get back here.

114
00:08:12,170 --> 00:08:17,000
We've converted our output into one by 128 by 128,

115
00:08:17,000 --> 00:08:24,350
but resize takes in one by 128 by 128 by some number of channels,

116
00:08:24,350 --> 00:08:26,090
so we need to modify that.

117
00:08:26,090 --> 00:08:29,480
What we do here is add an extra dimension.

118
00:08:29,480 --> 00:08:31,520
There we go.

119
00:08:31,520 --> 00:08:35,090
We use expand_dims

120
00:08:35,090 --> 00:08:38,450
and add the extra dimension at the end.

121
00:08:38,450 --> 00:08:40,490
So we have that added.

122
00:08:40,490 --> 00:08:41,540
So that's fine.

123
00:08:41,540 --> 00:08:44,120
Now we resize our output.

124
00:08:44,120 --> 00:08:52,280
The next step is to add that prediction to each and

125
00:08:52,280 --> 00:08:53,270
every sample.

126
00:08:53,270 --> 00:08:56,180
All we need to do is take our sample

127
00:08:56,180 --> 00:09:00,770
and specify the prediction key, pred, as we said already.

128
00:09:00,770 --> 00:09:08,960
Then we use the Segmentation object, setting its mask to the prediction,

129
00:09:08,960 --> 00:09:11,660
which is a NumPy array.

130
00:09:11,720 --> 00:09:14,930
Let's not forget to specify the height and width.

131
00:09:14,930 --> 00:09:19,100
Here we have h_init and w_init.

132
00:09:19,100 --> 00:09:20,990
So that's it then.

133
00:09:20,990 --> 00:09:29,720
Since we've resized the output into a new shape, namely

134
00:09:29,720 --> 00:09:42,050
one by 825 by 550 by one, and we need our output without this last dimension,

135
00:09:42,050 --> 00:09:48,590
we squeeze that last axis out.

136
00:09:48,590 --> 00:09:49,850
So there we go.

137
00:09:49,850 --> 00:09:52,370
Let's just do that below.

138
00:09:52,370 --> 00:09:58,610
So resized_output becomes tf.squeeze of

139
00:09:58,940 --> 00:10:00,230
There we go.

140
00:10:00,230 --> 00:10:02,240
resized_output,

141
00:10:02,240 --> 00:10:06,020
with the axis set to the last one,

142
00:10:06,020 --> 00:10:07,250
that is, negative one.

143
00:10:07,250 --> 00:10:09,080
Then we do some casting.

144
00:10:09,080 --> 00:10:10,610
So there we go.

145
00:10:10,610 --> 00:10:12,500
We use tf.cast

146
00:10:12,500 --> 00:10:18,470
and specify that the data type should be uint8.

147
00:10:18,860 --> 00:10:19,760
There we go.

148
00:10:19,760 --> 00:10:20,540
So that's it.

149
00:10:20,540 --> 00:10:25,940
Once we squeeze and cast, our output has shape one by 825 by 550.

150
00:10:25,940 --> 00:10:32,750
But we want our output without the batch axis,

151
00:10:32,750 --> 00:10:36,950
so we take that off by simply indexing with zero.

152
00:10:36,950 --> 00:10:37,460
Okay.

153
00:10:37,460 --> 00:10:39,440
So that's our resized output.

154
00:10:39,440 --> 00:10:40,190
There we go.

155
00:10:40,190 --> 00:10:44,540
We have resized output and that should be fine at this point.
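
A NumPy sketch of this post-processing, assuming a random array in place of the argmax output. The lesson calls `tf.image.resize` with bilinear interpolation and antialiasing; this sketch swaps in nearest-neighbor resizing instead (the hypothetical `resize_nearest` helper below), because the mask holds class IDs: interpolating between, say, class 2 and class 10 at a boundary can yield class 6, which is an unrelated label rather than a blend. In TensorFlow the analogous option would be `method="nearest"`.

```python
import numpy as np

H_INIT, W_INIT = 825, 550  # original image size from the lesson

def resize_nearest(batch_hwc, out_h, out_w):
    """Hypothetical nearest-neighbor resize of an (N, H, W, C) array (stand-in for tf.image.resize)."""
    in_h, in_w = batch_hwc.shape[1:3]
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return batch_hwc[:, rows][:, :, cols]

# Random stand-in for the argmax output: (batch, 128, 128) class IDs
pred = np.random.default_rng(0).integers(0, 59, size=(1, 128, 128))

x = np.expand_dims(pred, axis=-1)            # (1, 128, 128, 1): add the channel axis resize expects
x = resize_nearest(x, H_INIT, W_INIT)        # (1, 825, 550, 1)
x = np.squeeze(x, axis=-1).astype(np.uint8)  # (1, 825, 550), like tf.squeeze + tf.cast(..., tf.uint8)
mask = x[0]                                  # (825, 550): drop the batch axis

print(mask.shape, mask.dtype)
```

Because nearest-neighbor only copies existing pixels, every value in `mask` is a class ID that was already present in `pred`.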

156
00:10:44,870 --> 00:10:49,070
Now that we're done modifying our sample, we can save it.

157
00:10:49,070 --> 00:10:52,130
So we call sample.save().

158
00:10:52,130 --> 00:10:57,980
We'll also say that if i is greater than five, we simply break out of the loop.

159
00:10:57,980 --> 00:11:03,320
Let's run that. We get an error: required broadcastable shapes.

160
00:11:03,320 --> 00:11:08,750
Let's go back to the code and update it so that we get past this error.

161
00:11:08,780 --> 00:11:13,940
Back in the code, we notice that we carry out normalization after the transpose.

162
00:11:13,940 --> 00:11:16,040
We should instead do it before.

163
00:11:16,040 --> 00:11:19,220
Let's move it up here and run this again.

164
00:11:19,220 --> 00:11:20,690
With that, it runs correctly.
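
One plausible cause of the broadcasting error, assuming the normalization subtracts a per-channel mean and divides by a per-channel standard deviation, each of shape `(3,)`: broadcasting aligns trailing axes, so `(3,)` matches an `(H, W, 3)` image but not a `(1, 3, H, W)` one, whose last axis is the width. The sketch below uses hypothetical ImageNet-style statistics and a division by 255; the lesson's actual values may differ.

```python
import numpy as np

# Hypothetical per-channel statistics, used only for illustration
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(img_hwc):
    """Normalize while channels are still last, then go to (1, C, H, W)."""
    img = img_hwc.astype(np.float32) / 255.0
    img = (img - MEAN) / STD            # (H, W, 3) broadcasts against (3,)
    img = np.transpose(img, (2, 0, 1))  # (3, H, W)
    return np.expand_dims(img, axis=0)  # (1, 3, H, W)

def preprocess_wrong_order(img_hwc):
    """Transpose first: the trailing axis is now W, so (3,) no longer lines up."""
    img = np.transpose(img_hwc.astype(np.float32) / 255.0, (2, 0, 1))
    img = np.expand_dims(img, axis=0)
    return (img - MEAN) / STD

image = np.random.default_rng(0).integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
batch = preprocess(image)

error_message = None
try:
    preprocess_wrong_order(image)
except ValueError as err:  # NumPy's broadcasting failure
    error_message = str(err)

print(batch.shape)
print(error_message)
```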

165
00:11:20,690 --> 00:11:23,750
Now let's go ahead and rerun our session.

166
00:11:23,750 --> 00:11:29,210
After rerunning, you can see right here that we have an additional label field.

167
00:11:29,210 --> 00:11:30,260
That is the prediction.

168
00:11:30,260 --> 00:11:31,700
So we have the ground truth.

169
00:11:31,700 --> 00:11:33,440
And now the prediction.

170
00:11:33,440 --> 00:11:38,030
If we run dataset.head(), you'll notice we have the additional pred key.

171
00:11:38,030 --> 00:11:42,080
Again, it's a Segmentation object with this id,

173
00:11:46,700 --> 00:11:49,760
and note that here we have this mask.

174
00:11:49,760 --> 00:11:53,150
Instead of a mask path, here we store the mask itself. Otherwise it's the same as what we had with the ground truth.

175
00:11:53,150 --> 00:11:58,880
This means that if you don't want to specify a path, you can store the mask

176
00:11:58,880 --> 00:12:01,880
directly in your database.

177
00:12:01,880 --> 00:12:03,110
So that's it.

178
00:12:03,110 --> 00:12:06,830
For the next sample, we also have a mask, and so on.

179
00:12:06,830 --> 00:12:08,900
Now we did this only for six samples.

180
00:12:08,900 --> 00:12:15,590
Back in our visualization, you'll notice that if we hide the ground truth, we

181
00:12:15,590 --> 00:12:18,470
only have predictions for these six samples.

182
00:12:18,470 --> 00:12:22,100
Now let's click on this one here and see what we get.

183
00:12:23,000 --> 00:12:23,960
We have ground truth.

184
00:12:23,990 --> 00:12:26,690
You can hide the prediction or view just the original.

185
00:12:26,690 --> 00:12:28,070
So this is the original.

186
00:12:28,070 --> 00:12:30,290
Let's reduce this so you can see it a bit better.

187
00:12:30,290 --> 00:12:33,650
So here we have the original.

188
00:12:33,650 --> 00:12:37,250
If you add the ground truth, you can see that we have the shirt.

189
00:12:37,430 --> 00:12:41,990
Let's reduce it again so you can see better.

190
00:12:41,990 --> 00:12:46,460
As you can see here, we have shirt, skin, and so on.

191
00:12:46,460 --> 00:12:47,510
That's the ground truth.

192
00:12:47,510 --> 00:12:51,170
Then for the prediction, this is what our model predicts.

193
00:12:51,170 --> 00:12:56,570
Our model sees this as a shoe, this as pants, then shirt, skin, and skin.

195
00:12:56,570 --> 00:12:58,730
But it sees this portion as a jacket,

196
00:12:58,730 --> 00:13:03,260
which makes sense, since the shirt and jacket look alike.

196
00:13:03,260 --> 00:13:05,090
There's also the hair.

197
00:13:05,090 --> 00:13:07,880
It doesn't do so well with the hair.

198
00:13:08,300 --> 00:13:15,590
So we've seen how to visualize our ground truth and our model's predictions side by side.

199
00:13:15,800 --> 00:13:23,000
Now, if we go back to our training logs, we'll find our validation mean IOU

200
00:13:23,000 --> 00:13:23,720
scores.

201
00:13:23,720 --> 00:13:30,710
We had the mean accuracy, the overall accuracy, the per-category accuracy, and

202
00:13:30,710 --> 00:13:32,960
the per-category IOU.

203
00:13:32,990 --> 00:13:40,370
Here, for example, the value for bra was zero, for belt 0.1, and so on.

204
00:13:40,370 --> 00:13:50,150
What we want to do now is visualize our data while taking these metrics into consideration.

205
00:13:50,150 --> 00:14:00,410
For example, what if we want to get all the images where

206
00:14:00,410 --> 00:14:07,370
the validation belt accuracy is greater than

207
00:14:07,370 --> 00:14:08,180
0.5?

208
00:14:08,180 --> 00:14:10,850
In that case, we need to filter for those samples.

209
00:14:10,850 --> 00:14:14,960
This can help us make better decisions.

210
00:14:14,960 --> 00:14:21,920
One great thing about working with FiftyOne is that it's really easy to implement these kinds of strategies,

211
00:14:21,920 --> 00:14:28,460
which can be very helpful for improving our existing solutions.

212
00:14:28,460 --> 00:14:32,480
Just as we saw in our section on evaluation, we compute our metrics.

213
00:14:32,480 --> 00:14:41,390
Our predictions will simply be the resized output we computed

214
00:14:41,450 --> 00:14:42,140
above.

215
00:14:42,140 --> 00:14:43,520
So we change this.

216
00:14:43,520 --> 00:14:51,410
We take resized_output, convert it to NumPy, and put it

217
00:14:51,410 --> 00:14:52,310
in a list.

218
00:14:53,220 --> 00:14:54,720
That should be fine.

219
00:14:54,750 --> 00:14:55,740
There we go.

220
00:14:55,740 --> 00:14:57,480
Then we have our references,

221
00:14:57,600 --> 00:14:58,890
which is our ground truth mask.

222
00:14:58,890 --> 00:15:01,770
So here is our mask.

223
00:15:01,770 --> 00:15:03,390
We also put this in a list.

224
00:15:03,750 --> 00:15:11,130
We can get this mask simply by using OpenCV.

225
00:15:11,130 --> 00:15:20,220
So we call cv2.imread on the sample's ground_truth.mask_path.

226
00:15:20,220 --> 00:15:22,740
So we have the mask path.

227
00:15:22,740 --> 00:15:23,670
There we go.

228
00:15:23,670 --> 00:15:30,180
We read it as grayscale, with cv2.IMREAD_GRAYSCALE.

229
00:15:30,180 --> 00:15:31,170
So that's fine.

230
00:15:31,170 --> 00:15:34,020
Now we obtain our mask, which we pass in here.

231
00:15:34,350 --> 00:15:38,070
The number of labels, num_labels, is simply 59,

232
00:15:38,070 --> 00:15:41,160
so len(label2id) will suffice.

233
00:15:41,490 --> 00:15:45,270
We set ignore_index to zero, then nan_to_num.

234
00:15:45,270 --> 00:15:50,640
In case we have NaN, we convert it to zero,

235
00:15:50,640 --> 00:15:56,400
so nan_to_num is set to zero.

236
00:15:56,580 --> 00:15:59,190
Let's continue.

237
00:15:59,190 --> 00:16:03,480
reduce_labels equals False.

238
00:16:03,480 --> 00:16:04,080
That's fine.

239
00:16:04,080 --> 00:16:06,630
Now we have our metrics.

240
00:16:06,630 --> 00:16:08,430
We can now add a new key.

241
00:16:08,430 --> 00:16:13,740
So we set the sample's mean_iou equal to the metrics,

242
00:16:13,740 --> 00:16:17,430
indexed by mean_iou.

243
00:16:17,430 --> 00:16:18,300
There we go.

244
00:16:18,300 --> 00:16:20,580
We could also do the same for the accuracy.

245
00:16:21,530 --> 00:16:24,470
So we add mean_accuracy

246
00:16:24,500 --> 00:16:25,940
from the metrics,

247
00:16:25,940 --> 00:16:30,800
and then also overall_accuracy

248
00:16:30,800 --> 00:16:31,790
from the metrics.

249
00:16:31,790 --> 00:16:32,570
That's fine.

250
00:16:32,570 --> 00:16:35,000
So we have mean accuracy

251
00:16:35,150 --> 00:16:37,730
And then overall.

252
00:16:37,940 --> 00:16:40,340
Overall accuracy.

253
00:16:40,850 --> 00:16:43,370
So let's run that again and see what we get.
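
The arguments mentioned above (`num_labels`, `ignore_index`, `nan_to_num`, `reduce_labels`) match those of the Hugging Face `evaluate` `mean_iou` metric, which this lesson appears to use. As a rough illustration of what the three scalar values and the per-category IOU mean, here is a simplified single-image NumPy re-implementation; it is not the library's code, it skips `reduce_labels`, and the function name `segmentation_metrics` is hypothetical. Pixels whose ground truth equals `ignore_index` are dropped before counting.

```python
import numpy as np

def segmentation_metrics(pred, label, num_labels, ignore_index=0, nan_to_num=0.0):
    """Simplified single-image mean IoU / accuracy computation (illustrative only)."""
    keep = label != ignore_index  # drop pixels whose ground truth is ignore_index
    p = pred[keep].astype(np.int64)
    t = label[keep].astype(np.int64)

    area_intersect = np.bincount(t[p == t], minlength=num_labels)  # correctly labelled pixels per class
    area_pred = np.bincount(p, minlength=num_labels)               # predicted pixels per class
    area_label = np.bincount(t, minlength=num_labels)              # ground-truth pixels per class
    area_union = area_pred + area_label - area_intersect

    with np.errstate(divide="ignore", invalid="ignore"):
        iou = area_intersect / area_union  # NaN for classes absent from both masks
        acc = area_intersect / area_label  # NaN for classes absent from the ground truth
        overall = area_intersect.sum() / area_label.sum()

    metrics = {
        "mean_iou": np.nanmean(iou),
        "mean_accuracy": np.nanmean(acc),
        "overall_accuracy": overall,
        "per_category_iou": iou,
        "per_category_accuracy": acc,
    }
    # Replace any remaining NaN with nan_to_num, as described in the lesson
    return {k: np.nan_to_num(v, nan=nan_to_num) for k, v in metrics.items()}

# Tiny example: class 0 is ignored, classes 1 and 2 are scored
m = segmentation_metrics(np.array([[1, 1], [2, 0]]), np.array([[1, 2], [2, 0]]), num_labels=3)
print(m["mean_iou"], m["mean_accuracy"], m["overall_accuracy"])
```

Overall accuracy pools all kept pixels, while mean accuracy averages per class, which is why the two can differ a lot on samples dominated by one large region.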

254
00:16:43,370 --> 00:16:50,840
Back in our visualizations, you'll notice that we now have these extra primitive fields.

255
00:16:50,840 --> 00:16:52,820
So here we have the mean IOU.

256
00:16:52,850 --> 00:16:59,150
We also have the mean accuracy and the overall accuracy.

257
00:16:59,150 --> 00:17:04,370
So for every sample, we have these values.

258
00:17:04,370 --> 00:17:12,320
So we can filter the samples that have a certain overall accuracy, for example.

259
00:17:12,320 --> 00:17:18,080
So let's look for samples with an overall accuracy between 0.81 and 0.92.

260
00:17:18,110 --> 00:17:23,030
When you select this range, you get all these samples.

261
00:17:23,030 --> 00:17:32,300
Nine out of 77 samples have an overall accuracy between 0.8 and 0.92.

262
00:17:32,330 --> 00:17:35,990
Now let's narrow the range to 0.89 to 0.92.

263
00:17:35,990 --> 00:17:38,180
And you see we have only a single sample.

264
00:17:38,180 --> 00:17:44,900
So this turns out to be our best-performing sample in terms of overall accuracy.

265
00:17:44,900 --> 00:17:51,050
As expected, the ground truth and the model's prediction are quite similar.

266
00:17:51,050 --> 00:17:55,100
You see that we have what the model predicts and then we have the ground truth.

267
00:17:55,370 --> 00:17:56,720
Prediction.

268
00:17:56,930 --> 00:17:57,830
So that's it.

269
00:17:57,830 --> 00:18:01,220
Now let's try the same for the IOU.

270
00:18:01,220 --> 00:18:03,740
Let's clear this filter now.

271
00:18:05,110 --> 00:18:06,040
There we go.

272
00:18:06,040 --> 00:18:09,850
Let's go to the mean IOU and filter on it.

273
00:18:09,880 --> 00:18:11,560
Again, we have one sample here.

274
00:18:11,830 --> 00:18:13,090
We have this one.

275
00:18:13,090 --> 00:18:14,080
Let's

276
00:18:14,080 --> 00:18:14,980
see.

277
00:18:14,980 --> 00:18:15,370
Okay.

278
00:18:15,370 --> 00:18:21,610
You can notice that the sample which performed best in terms of overall accuracy is the

279
00:18:21,610 --> 00:18:28,510
exact same sample which performs best in terms of mean IOU. As before, this sample

280
00:18:28,510 --> 00:18:33,430
should have its ground truth and model prediction

281
00:18:33,550 --> 00:18:35,020
quite similar.

282
00:18:35,410 --> 00:18:36,400
There we go.

283
00:18:36,400 --> 00:18:37,840
Let's take off our prediction.

284
00:18:37,870 --> 00:18:38,740
See this:

285
00:18:38,740 --> 00:18:39,610
The ground truth.

286
00:18:39,610 --> 00:18:42,580
And then this is what the model predicts.

287
00:18:43,360 --> 00:18:44,380
That's fine.

288
00:18:45,040 --> 00:18:51,640
We could check the mean accuracy too. Let's reset this filter.

290
00:18:56,470 --> 00:18:59,080
After resetting, we can also check the samples with the worst overall accuracy.

291
00:18:59,080 --> 00:19:03,880
So let's select the lowest range.

291
00:18:59,080 --> 00:19:03,880
We have five of them between 0.09 and 0.19.

292
00:19:04,420 --> 00:19:09,730
We can click on this one and see how poorly our model performed on it.

293
00:19:09,730 --> 00:19:12,130
So here's our ground truth.

294
00:19:12,370 --> 00:19:15,130
And then here's our prediction.

295
00:19:15,130 --> 00:19:19,180
You can see that the model predicts all of this to be pants,

297
00:19:19,420 --> 00:19:24,670
whereas the ground truth labels it as a romper.

298
00:19:24,700 --> 00:19:31,810
This makes sense, since the romper doesn't appear very often in our dataset.

298
00:19:31,810 --> 00:19:37,750
So the model hasn't learned to classify this kind of dress as a romper.

299
00:19:38,320 --> 00:19:40,000
As for the shoes,

301
00:19:40,000 --> 00:19:41,200
here we have shoes.

301
00:19:41,200 --> 00:19:42,850
Let's see what the model predicted.

302
00:19:43,600 --> 00:19:44,170
And

304
00:19:44,170 --> 00:19:44,530
it did

305
00:19:44,530 --> 00:19:45,550
well with that.

306
00:19:46,000 --> 00:19:49,090
Here, let's check that out.

307
00:19:49,090 --> 00:19:49,690
It wasn't bad.

307
00:19:49,690 --> 00:19:56,830
But what gives this a very bad score is the fact that all of this

309
00:19:56,830 --> 00:20:01,840
is a romper, whereas the model sees this as pants, and this

310
00:20:01,840 --> 00:20:03,670
as a blouse.

311
00:20:03,670 --> 00:20:05,110
You see, here we have blouse.

312
00:20:05,110 --> 00:20:11,410
So all of this is blouse, and then pants, which isn't too different from what we'd expect,

313
00:20:11,410 --> 00:20:13,510
but the ground

314
00:20:13,510 --> 00:20:15,340
truth here is a romper instead.

314
00:20:15,340 --> 00:20:16,480
So that's it.

315
00:20:17,230 --> 00:20:19,210
We could check out this other one.

316
00:20:19,210 --> 00:20:21,730
Let's check out this other sample.

317
00:20:21,730 --> 00:20:24,850
We hide the prediction here.

318
00:20:24,850 --> 00:20:26,950
This is a blazer and a shirt.

319
00:20:27,430 --> 00:20:28,780
Now check out the predictions.

320
00:20:28,780 --> 00:20:33,610
This is a coat and a coat sweater.

321
00:20:33,610 --> 00:20:37,510
That's why the model's score is quite low.

322
00:20:37,510 --> 00:20:39,100
You can check the scores here.

323
00:20:39,100 --> 00:20:42,250
The mean IOU is 0.047.

324
00:20:42,250 --> 00:20:45,280
The mean accuracy is 0.27,

325
00:20:45,280 --> 00:20:47,740
And the overall accuracy is 0.093.

326
00:20:47,740 --> 00:20:48,640
So that's it.

327
00:20:48,640 --> 00:20:59,470
Now, what if we get, for every sample, the IOU for every category, or class?

329
00:20:59,470 --> 00:21:06,610
This means that for a class like shoes, we could get the samples with the worst

330
00:21:06,610 --> 00:21:11,050
predictions for shoes and those with the best predictions for shoes.

330
00:21:11,050 --> 00:21:16,960
So let's modify our code so that we get not only this mean IOU and overall

332
00:21:16,960 --> 00:21:23,170
accuracy, but also a specific IOU for every category.

333
00:21:23,170 --> 00:21:31,570
We'll update the code so that the per-category

334
00:21:31,570 --> 00:21:35,290
IOU is added to our visualizations.

334
00:21:35,650 --> 00:21:43,000
Right here, we loop through the per-category IOU, which we obtain from the metrics.

335
00:21:43,000 --> 00:21:47,440
Just like the mean IOU, we can also get the per-category IOU.

336
00:21:47,440 --> 00:21:57,220
So we loop over each value c in the metrics' per-category IOU.

337
00:21:57,250 --> 00:21:58,540
There we go.

338
00:21:58,660 --> 00:22:11,170
If the value of c is greater than a very small threshold just above zero, we create a new

340
00:22:11,170 --> 00:22:15,910
key, just as we had a key and a corresponding value here.

341
00:22:15,910 --> 00:22:19,450
We'll create a new key and give it a name.

342
00:22:19,450 --> 00:22:23,320
What we want is, for example, shoes_iou.

342
00:22:23,320 --> 00:22:25,510
So that's what we want for our keys.

343
00:22:25,510 --> 00:22:29,500
For that, we use id2label.

344
00:22:29,500 --> 00:22:34,090
We loop over k and c using enumerate,

345
00:22:34,090 --> 00:22:35,470
There we go.

346
00:22:35,470 --> 00:22:42,040
so that for every category we have its corresponding ID.

347
00:22:42,580 --> 00:22:49,720
With id2label we can then get its label, and the key is id2label[k] plus _iou.

348
00:22:49,720 --> 00:22:56,800
So we could have, for example, shoes_iou.

349
00:22:56,800 --> 00:23:05,740
Then we set the sample's shoes_iou, which is simply equal to c, because c comes from the per-category

350
00:23:05,740 --> 00:23:06,910
IOU list.

351
00:23:06,910 --> 00:23:10,510
So here we replace the hard-coded name with key.

352
00:23:10,510 --> 00:23:11,470
So that's fine.

353
00:23:11,470 --> 00:23:12,430
Let's run that again.
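
A minimal pure-Python sketch of the key-building loop, assuming `per_category_iou` is the list returned by the metric and `id2label` maps class IDs to names. The function name `per_category_fields`, the `threshold` default, and the toy labels are hypothetical; the transcript's exact threshold is not recoverable. In the lesson, each resulting pair would then be assigned onto the sample, as in `sample[key] = value`, before `sample.save()`.

```python
def per_category_fields(per_category_iou, id2label, threshold=1e-4):
    """Build {"<label>_iou": value} entries for classes whose IoU exceeds a small threshold."""
    fields = {}
    for k, c in enumerate(per_category_iou):
        # NaN compares False against the threshold, so it is skipped as well
        if c > threshold:
            fields[id2label[k] + "_iou"] = float(c)
    return fields

# Hypothetical subset of the 59 labels
id2label = {0: "background", 1: "shoes", 2: "romper"}
fields = per_category_fields([0.0, 0.76, 0.0], id2label)
print(fields)  # {'shoes_iou': 0.76}
```

One consequence of the threshold: a class that appears in the ground truth but is missed entirely gets an IOU of exactly 0 and is skipped too, so the per-class filters in the App only cover classes with some overlap.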

354
00:23:12,430 --> 00:23:15,490
And then go ahead and check out our visualization.

355
00:23:15,610 --> 00:23:23,650
We now find that, in addition to mean IOU, mean accuracy, and overall accuracy, we have

356
00:23:23,650 --> 00:23:27,490
IOU scores for every category.

357
00:23:27,490 --> 00:23:31,870
Let's pick one category, say shoes.

358
00:23:32,140 --> 00:23:33,160
There we go.

359
00:23:33,160 --> 00:23:34,240
We click on shoes.

360
00:23:34,240 --> 00:23:38,950
You can see the scores range from 0.05 to 0.8.

361
00:23:38,980 --> 00:23:44,530
Now let's select, say, between 0.73 and

362
00:23:44,690 --> 00:23:45,020
0.8.

363
00:23:45,290 --> 00:23:49,580
Here, ten out of 77 samples

364
00:23:49,580 --> 00:23:56,660
get a shoes IOU between 0.73 and 0.8.

365
00:23:56,660 --> 00:23:58,580
So let's click on this first one.

366
00:23:59,210 --> 00:24:04,700
Then we can toggle between the ground truth and the prediction. See, the ground truth.

367
00:24:04,760 --> 00:24:06,440
There we go.

368
00:24:06,440 --> 00:24:07,040
Makes sense.

369
00:24:07,040 --> 00:24:08,360
Here we have shoes.

370
00:24:08,360 --> 00:24:10,400
And then we take off the ground truth.

371
00:24:10,550 --> 00:24:17,750
Although the prediction misses the romper, we have shoes, which are

372
00:24:17,750 --> 00:24:20,600
well predicted, unlike the romper.

373
00:24:20,600 --> 00:24:21,650
So that's it.

374
00:24:21,650 --> 00:24:24,140
We could select another sample.

375
00:24:24,140 --> 00:24:27,080
Let's pick out, say, this one.

376
00:24:28,130 --> 00:24:32,780
We take off the ground truth and the prediction to see the original, then turn on the ground truth.

377
00:24:32,810 --> 00:24:36,080
So here we have the shoes and then the predictions.

378
00:24:36,560 --> 00:24:39,620
We also have the shoes well predicted.

379
00:24:39,620 --> 00:24:47,810
Now let's also look at the samples where the shoes aren't properly predicted.

380
00:24:47,930 --> 00:24:51,380
So now it's between 0.05 and 0.16.

381
00:24:51,380 --> 00:24:52,820
So there are only three samples.
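The sidebar slider effectively keeps the samples whose field lies in a chosen range, as with 0.73 to 0.8 and 0.05 to 0.16 above. A minimal sketch over plain dicts follows; the sample data and the `shoes_iou` field name are hypothetical. Samples that have no shoes field (because no key was created for them) are simply left out. In FiftyOne itself, a comparable view could be built with `dataset.match((F("shoes_iou") >= 0.73) & (F("shoes_iou") <= 0.8))`, where `F` is `fiftyone.ViewField`, and `dataset.bounds("shoes_iou")` returns the minimum and maximum.

```python
def iou_bounds(samples, field):
    """Return (min, max) of a numeric field over the samples that have it."""
    values = [s[field] for s in samples if field in s]
    return (min(values), max(values)) if values else (None, None)


def filter_by_range(samples, field, lo, hi):
    """Keep samples whose field value lies in [lo, hi]; samples without the field are dropped."""
    return [s for s in samples if field in s and lo <= s[field] <= hi]


# Hypothetical per-sample scores.
samples = [
    {"id": "a", "shoes_iou": 0.80},
    {"id": "b", "shoes_iou": 0.75},
    {"id": "c", "shoes_iou": 0.10},
    {"id": "d", "hat_iou": 0.30},  # no shoes_iou key: no shoes field was created for this sample
]

bounds = iou_bounds(samples, "shoes_iou")
selected = filter_by_range(samples, "shoes_iou", 0.73, 0.8)
print(bounds, [s["id"] for s in selected])
```

Both ends of the range are treated as inclusive in this sketch; it makes no claim about how the app's slider treats its endpoints.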

382
00:24:52,820 --> 00:25:00,260
So it turns out that the shoes are, in general, well predicted compared to some other categories

383
00:25:00,260 --> 00:25:02,840
which are less represented.

384
00:25:03,020 --> 00:25:10,190
Just looking at these in general, you would notice

385
00:25:10,190 --> 00:25:19,190
that most of the time, when the shoes aren't correctly predicted, the person is wearing

386
00:25:19,190 --> 00:25:23,420
clothing, such as a dress, that covers the shoe.

387
00:25:23,420 --> 00:25:25,490
So if you click on this one here,

388
00:25:26,450 --> 00:25:29,120
you should see this.

389
00:25:29,210 --> 00:25:30,410
Let's take this off.

390
00:25:30,410 --> 00:25:31,430
Take off the predictions.

391
00:25:31,460 --> 00:25:33,950
You see, we have just this little tip here.

392
00:25:33,950 --> 00:25:36,830
So that's where we have problems with predicting the shoe.

393
00:25:36,830 --> 00:25:42,650
So if you want to improve the shoe predictions, you would have to train on more of these kinds

394
00:25:42,650 --> 00:25:43,400
of data.

395
00:25:43,850 --> 00:25:45,260
Let's look at the prediction.

396
00:25:45,260 --> 00:25:48,980
You see it picks out just this part.

397
00:25:49,250 --> 00:25:53,720
Let's click on zoom.

398
00:25:53,720 --> 00:25:54,740
There we go.

399
00:25:54,740 --> 00:25:55,220
Okay.

400
00:25:55,220 --> 00:26:02,120
So you see it predicts bag, boots, bracelets and blouse, all of that.

401
00:26:02,120 --> 00:26:03,800
So that's not really good.

402
00:26:04,160 --> 00:26:10,160
If we click on the ground truth, obviously there are just the shoes.

403
00:26:10,250 --> 00:26:16,670
So this tells us that with these kinds of samples, our model struggles to notice that this is

404
00:26:16,670 --> 00:26:17,300
a shoe.

405
00:26:17,300 --> 00:26:20,240
Now, let's move to the next one.

406
00:26:21,440 --> 00:26:23,030
We also have a shoe.

407
00:26:23,910 --> 00:26:27,630
Here, the dress doesn't cover the shoe.

408
00:26:28,500 --> 00:26:32,160
Maybe because of the size or the kind of shoe,

409
00:26:32,190 --> 00:26:37,830
we don't predict all of it. Unlike the others, where we had closed shoes,

410
00:26:38,100 --> 00:26:46,800
here we have a ballerina, where the skin occupies much of the zone

411
00:26:46,800 --> 00:26:48,570
where we usually have the shoe.

412
00:26:48,570 --> 00:26:52,830
So, according to the prediction, here we have skin and here we have shoes.

413
00:26:52,830 --> 00:26:54,360
But it doesn't see this part.

414
00:26:54,390 --> 00:27:01,020
Now, since there are three samples, let's move to the next one and

415
00:27:01,020 --> 00:27:02,250
take off the prediction.

416
00:27:02,250 --> 00:27:05,820
You see again that the shoe isn't very visible.

417
00:27:05,820 --> 00:27:09,900
In the ground truth, we have shoes here. Let's take that off.

418
00:27:09,900 --> 00:27:16,110
But normally the ground truth here should have skin, because part of this area is skin.

419
00:27:16,110 --> 00:27:19,320
So the sample wasn't properly annotated.

420
00:27:19,320 --> 00:27:24,450
Nonetheless, let's see what we predict here. We take off the ground truth.

421
00:27:24,600 --> 00:27:28,350
You see it predicts shorts and shoes.

422
00:27:28,350 --> 00:27:32,130
Well, it doesn't do as well as it's supposed to.

423
00:27:32,340 --> 00:27:38,250
Again, as we've said already, this is because the trousers cover the shoe, and it's not close

424
00:27:38,250 --> 00:27:43,590
to the other samples we have in our data set.

425
00:27:43,590 --> 00:27:51,330
So when it comes to ballerinas and sandals, we would have more work to do, or we would have to gather

426
00:27:51,330 --> 00:27:53,340
much more data with respect to that.

427
00:27:53,340 --> 00:27:57,930
Now, let's pick out another category, say, hat.

428
00:27:57,930 --> 00:27:58,710
Let's reset.

429
00:27:58,710 --> 00:28:00,210
We could reset here.

430
00:28:00,540 --> 00:28:01,410
Could reset.

431
00:28:01,410 --> 00:28:02,370
That's fine.

432
00:28:02,490 --> 00:28:07,530
And then we pick, say, hoodie, or rather hat.

433
00:28:07,530 --> 00:28:08,340
So that's it.

434
00:28:08,370 --> 00:28:15,840
Hat ranges from 0.01 to 0.38, meaning that the model doesn't perform so well when it comes to

435
00:28:15,840 --> 00:28:16,500
the hat.

436
00:28:16,890 --> 00:28:18,870
So let's increase this.

437
00:28:18,870 --> 00:28:19,950
There we go.

438
00:28:19,950 --> 00:28:21,840
We should get the number of samples.

439
00:28:21,840 --> 00:28:26,400
We only have a single sample, which does a little bit better than the others.

440
00:28:27,000 --> 00:28:28,470
So let's take off the ground truth,

441
00:28:28,470 --> 00:28:34,260
take off the prediction, see the original image, and then compare the ground truth

442
00:28:34,260 --> 00:28:37,710
and the prediction when it comes to just the hat.

443
00:28:38,370 --> 00:28:39,660
So there we go.

444
00:28:39,660 --> 00:28:41,190
We have ground truth.

445
00:28:41,220 --> 00:28:43,020
See, we have hat here.

446
00:28:43,470 --> 00:28:44,370
Take that off.

447
00:28:44,370 --> 00:28:47,190
We have the prediction, and you see it predicts hat.

448
00:28:47,190 --> 00:28:50,640
But it also predicts some parts as skin.

449
00:28:51,060 --> 00:28:52,710
So that's it.

450
00:28:52,710 --> 00:28:55,980
So it confuses the hat with the skin.

451
00:28:55,980 --> 00:29:02,790
So overall we see that the model struggles to predict hats correctly, though, as you can see,

452
00:29:02,790 --> 00:29:04,650
it does quite well on shoes.

453
00:29:05,490 --> 00:29:06,420
Now that's it.

454
00:29:06,450 --> 00:29:16,080
We have seen how to filter the samples based on a specific category's IOU, or based on the mean

455
00:29:16,080 --> 00:29:23,070
IOU, the mean accuracy or the overall accuracy.

456
00:29:23,370 --> 00:29:28,380
Now we're going to move to the next section where we shall be generating new samples.