1
00:00:00,080 --> 00:00:06,860
In the section on data preparation, we will start by splitting the data, thus creating a train and

2
00:00:06,860 --> 00:00:08,330
validation data set.

3
00:00:08,360 --> 00:00:13,040
Then we'll preprocess the data and visualize it.

4
00:00:13,040 --> 00:00:20,720
And finally we are going to make use of Voxel51 for a much better visualization.

5
00:00:20,930 --> 00:00:26,870
Diving into the splitting of our data set, we have these paths which we're going to create.

6
00:00:26,870 --> 00:00:31,370
So we have the train path which will simply be the image path.

7
00:00:31,370 --> 00:00:33,350
So here we have image path.

8
00:00:33,740 --> 00:00:36,440
And let's copy the path.

9
00:00:36,500 --> 00:00:38,060
Or let's just copy this.

10
00:00:38,060 --> 00:00:39,890
No need to open that up.

11
00:00:39,890 --> 00:00:42,410
So we copy it and paste it in here.

12
00:00:42,410 --> 00:00:46,820
And then we also copy the annotation path.

13
00:00:46,820 --> 00:00:49,520
That's the masks path, and we paste it here.

14
00:00:49,520 --> 00:00:52,250
So we have annotation path.

15
00:00:53,060 --> 00:00:54,230
There we go.

16
00:00:54,260 --> 00:00:57,440
We copy from here and paste in the cell.

17
00:00:58,550 --> 00:01:01,670
And then we are now going to create our validation path.

18
00:01:01,670 --> 00:01:06,800
So let's just copy this and then modify it slightly to val data set.

19
00:01:06,800 --> 00:01:10,460
And instead of path we have the val image path.

20
00:01:10,460 --> 00:01:13,310
And then the val annotation path. Okay.

21
00:01:13,310 --> 00:01:14,780
So that should be fine.

22
00:01:14,780 --> 00:01:15,530
This looks fine.

23
00:01:15,530 --> 00:01:18,140
We have images, images.

24
00:01:18,140 --> 00:01:19,670
This should be masks, actually.

25
00:01:19,670 --> 00:01:21,140
So let's copy this again.

26
00:01:21,140 --> 00:01:25,070
Copy and paste it in here.

27
00:01:25,100 --> 00:01:26,030
There we go.

28
00:01:26,030 --> 00:01:28,340
And then we also paste this in here.

29
00:01:28,730 --> 00:01:30,530
So this oops.

30
00:01:30,650 --> 00:01:35,120
Take this off and paste it in here.

31
00:01:35,120 --> 00:01:39,590
So we have val data set here and val data set there. Okay.

32
00:01:39,590 --> 00:01:41,720
So these are our different paths.

33
00:01:41,720 --> 00:01:47,450
That's the train and the validation paths.

34
00:01:48,050 --> 00:01:54,920
With these paths defined, the next thing we want to do is create the validation data set directory.

35
00:01:55,160 --> 00:01:58,520
So we have the val data set directory which we're going to create.

36
00:01:58,520 --> 00:02:04,340
And then we also have the PNG images directory and the PNG masks directory.

37
00:02:04,340 --> 00:02:09,320
So it's just going to be similar to what we had already from this original data set.

38
00:02:09,320 --> 00:02:12,740
So now these two here will be for our train set.

39
00:02:12,740 --> 00:02:16,040
And then the other two will be for the validation.

40
00:02:16,040 --> 00:02:18,920
Let's run this and there we go.

41
00:02:18,920 --> 00:02:22,910
So let's refresh and then see what we get.

42
00:02:22,910 --> 00:02:26,210
So here we have our data set which is actually our train data set.

43
00:02:26,210 --> 00:02:29,180
And then here we have the validation data set.

44
00:02:29,180 --> 00:02:29,900
We open this up.

45
00:02:29,900 --> 00:02:33,350
You see we have PNG images and PNG masks.
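The directory creation described above can be sketched roughly as follows. This is a minimal sketch, not the exact notebook cell: the folder names (val_dataset, png_images/IMAGES, png_masks/MASKS) and the temporary root are assumptions made for illustration.
```python
import os
import tempfile
# Hypothetical layout: every folder name below is an assumption for illustration.
root = tempfile.mkdtemp()
val_image_path = os.path.join(root, "val_dataset", "png_images", "IMAGES")
val_anno_path = os.path.join(root, "val_dataset", "png_masks", "MASKS")
for folder in (val_image_path, val_anno_path):
    # makedirs creates every missing parent; exist_ok makes reruns harmless.
    os.makedirs(folder, exist_ok=True)
print(sorted(os.listdir(os.path.join(root, "val_dataset"))))
```
Creating the innermost folders in one call means the nested images and masks folders cannot be forgotten.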

46
00:02:33,770 --> 00:02:34,520
That's fine.

47
00:02:34,520 --> 00:02:43,310
Now the next step we want to carry out is to randomly pick out some images from our train data

48
00:02:43,310 --> 00:02:49,220
set and then move them to our validation data set folder.

49
00:02:49,220 --> 00:02:51,140
Now let's select this.

50
00:02:51,170 --> 00:02:54,650
We randomly picked out this list of images.
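One possible way to build such a random list is sketched here. The total of 1000 images comes from the video; the ID range 0000 to 0999, the validation size of 100, and the fixed seed are assumptions, not values stated in the video.
```python
import random
# Assumptions: IDs run from 0000 to 0999 and 100 of them go to validation.
NUM_IMAGES = 1000
NUM_VAL = 100
random.seed(0)  # fixed seed so the split is reproducible
# random.sample draws without replacement, so no ID is picked twice.
val_list = [f"{i:04d}" for i in random.sample(range(NUM_IMAGES), NUM_VAL)]
print(val_list[:5])
```
Each entry is a zero-padded, four-character string, so it can be dropped straight into a file name.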

51
00:02:54,650 --> 00:02:58,190
If you open up this folder,

52
00:02:58,190 --> 00:03:05,120
you will find that all the images and even the masks follow this format.

53
00:03:05,120 --> 00:03:11,000
You can see how we have img, then underscore, then the number, which is one of these.

54
00:03:11,000 --> 00:03:12,290
And dot png.

55
00:03:12,290 --> 00:03:17,000
So that's the formatting of all our images and even the masks.

56
00:03:17,000 --> 00:03:22,640
So if we open up this mask here, you should see that same format.

57
00:03:22,970 --> 00:03:26,660
For the image, it's img underscore number dot png.

58
00:03:26,720 --> 00:03:29,510
And with the mask, it's seg instead of img.

59
00:03:29,510 --> 00:03:30,950
So that's the only difference.

60
00:03:31,400 --> 00:03:40,220
Let's now use the shutil module to move these files, or these images, from this original folder to

61
00:03:40,220 --> 00:03:41,510
the validation folder.

62
00:03:41,510 --> 00:03:44,480
So we'll start by importing shutil.

63
00:03:44,480 --> 00:03:45,380
There we go.

64
00:03:45,380 --> 00:03:47,330
Once we import that we could now make use of it.

65
00:03:47,330 --> 00:03:52,760
So we would loop over all the different elements in here.

66
00:03:52,760 --> 00:03:58,250
So for path, or rather for name, in val list.

67
00:03:58,250 --> 00:04:04,790
So for name in val list, we are going to call shutil.move.

68
00:04:04,910 --> 00:04:11,510
And then we specify the initial path and then the final path.

69
00:04:11,510 --> 00:04:12,200
Okay.

70
00:04:12,200 --> 00:04:17,810
So for our initial path in a case like this, let's suppose we had 0001.

71
00:04:17,810 --> 00:04:24,860
So if we had 0001 as an example, we want to have this path.

72
00:04:24,860 --> 00:04:30,500
Let's take this. That's image path, because up to this point, this is the path.

73
00:04:30,500 --> 00:04:31,850
So we have image path.

74
00:04:31,850 --> 00:04:34,940
Then we simply write im path here.

75
00:04:34,940 --> 00:04:36,410
So we want to have this path.

76
00:04:36,410 --> 00:04:40,490
And then we also want to have img underscore.

77
00:04:40,490 --> 00:04:45,560
So take this off plus the name because we're going through this list.

78
00:04:45,560 --> 00:04:48,980
So plus name for a specific name in that list.

79
00:04:48,980 --> 00:04:52,970
And then plus dot png. Okay.

80
00:04:52,970 --> 00:04:53,810
So that's it.

81
00:04:53,810 --> 00:04:59,450
And then for the final path, the final destination, let's copy this and paste it here.

82
00:04:59,960 --> 00:05:04,910
For our final destination, we would have val image path instead.

83
00:05:04,910 --> 00:05:08,990
And then we'll also have img underscore name dot png.

84
00:05:09,020 --> 00:05:13,520
So that's it, and now we'll just do the same for the annotations.

85
00:05:13,550 --> 00:05:17,180
Remember, for the annotations we had seg instead of img.

86
00:05:17,210 --> 00:05:23,120
So here we have anno path.

87
00:05:23,360 --> 00:05:27,740
And then here we have val anno path.

88
00:05:27,770 --> 00:05:30,200
Okay, so let's take this off.

89
00:05:30,230 --> 00:05:31,520
We now have seg.

90
00:05:31,520 --> 00:05:33,890
And then here we have seg.

91
00:05:34,670 --> 00:05:37,340
And let's run this and then see what we get.

92
00:05:38,060 --> 00:05:39,500
We are getting an error.

93
00:05:39,500 --> 00:05:40,880
Let's look at this.

94
00:05:40,880 --> 00:05:46,490
When we look at this, you see the error is showing "no such file or directory", as shown

95
00:05:46,490 --> 00:05:46,820
here.

96
00:05:46,820 --> 00:05:49,040
We see that there is this slash which we omitted.

97
00:05:49,040 --> 00:05:57,290
So let's scroll up and modify this: just add the slash and then rerun.

98
00:05:57,290 --> 00:06:04,640
So we have a slash here and a slash here, then we run again and move these files.

99
00:06:05,090 --> 00:06:07,160
We still have another error.

100
00:06:07,160 --> 00:06:14,000
And this time around we notice that we also omitted the images folder inside our PNG images folder.

101
00:06:14,000 --> 00:06:17,930
So getting back up we just need to make this directory.

102
00:06:17,930 --> 00:06:19,610
So we have images.

103
00:06:19,610 --> 00:06:22,340
And then we also have masks.

104
00:06:22,520 --> 00:06:24,260
So masks. Okay.

105
00:06:24,260 --> 00:06:25,550
So that's it.

106
00:06:25,550 --> 00:06:26,990
Let's run this again.

107
00:06:27,380 --> 00:06:28,820
That's already been done.

108
00:06:28,820 --> 00:06:32,810
And then now let's move the files.

109
00:06:33,050 --> 00:06:36,140
We have the files which have been moved as we could see here.

110
00:06:36,140 --> 00:06:40,190
If we click Open Images right here, you could find these files.
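The move step described above could look roughly like this. It is a minimal sketch under stated assumptions: the directory names and the three placeholder files are hypothetical, while the img_ and seg_ prefixes and the call to shutil.move follow the video. Using os.path.join instead of string concatenation is a swapped-in choice that avoids the missing-slash error seen earlier.
```python
import os
import shutil
import tempfile
# Hypothetical directories inside a temporary root, standing in for the real data set.
root = tempfile.mkdtemp()
image_path = os.path.join(root, "train", "images")
anno_path = os.path.join(root, "train", "masks")
val_image_path = os.path.join(root, "val", "images")
val_anno_path = os.path.join(root, "val", "masks")
for folder in (image_path, anno_path, val_image_path, val_anno_path):
    os.makedirs(folder, exist_ok=True)
# Empty placeholder files standing in for real images and masks.
for name in ("0001", "0002", "0003"):
    open(os.path.join(image_path, f"img_{name}.png"), "w").close()
    open(os.path.join(anno_path, f"seg_{name}.png"), "w").close()
val_list = ["0002"]  # hypothetical selection
for name in val_list:
    # os.path.join inserts the separator between folder and file name.
    shutil.move(os.path.join(image_path, f"img_{name}.png"),
                os.path.join(val_image_path, f"img_{name}.png"))
    shutil.move(os.path.join(anno_path, f"seg_{name}.png"),
                os.path.join(val_anno_path, f"seg_{name}.png"))
print(os.listdir(val_image_path), os.listdir(val_anno_path))
```
Moving the image and its mask inside the same loop iteration keeps the two folders in sync.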

111
00:06:40,190 --> 00:06:43,130
Now the next step will be to create a TensorFlow data set.

112
00:06:43,130 --> 00:06:50,570
So we'll have our train data set, which is going to be our TensorFlow data set made of the different

113
00:06:50,570 --> 00:06:51,380
file paths.

114
00:06:51,380 --> 00:06:57,200
So let's go ahead and do that with from_tensor_slices.

115
00:06:57,200 --> 00:07:05,210
And then the way we will define this is such that we have this image or the path of an image and its

116
00:07:05,210 --> 00:07:06,650
corresponding annotation.

117
00:07:06,650 --> 00:07:11,270
So here we have im path, that's the image path.

118
00:07:12,050 --> 00:07:19,400
And then for i in the image list, that's the list of all our images.

119
00:07:19,400 --> 00:07:22,010
So we have image path.

120
00:07:22,010 --> 00:07:26,000
And then we'll also have the annotation.

121
00:07:26,000 --> 00:07:27,650
So this is the first slice.

122
00:07:27,650 --> 00:07:29,360
And then our second slice.

123
00:07:29,510 --> 00:07:32,570
Here we have annotation.

124
00:07:32,570 --> 00:07:34,100
Let's just put it below.

125
00:07:34,100 --> 00:07:37,910
So copy this and paste here.

126
00:07:37,910 --> 00:07:44,030
So here we have annotation annotation and annotation.

127
00:07:44,030 --> 00:07:52,760
So for every element in this TensorFlow data set we have an image path and its corresponding annotation

128
00:07:52,760 --> 00:07:53,420
path.

129
00:07:53,480 --> 00:08:01,640
Let's run this and let's do for i in train data set.

130
00:08:01,640 --> 00:08:05,930
Let's take an element and then we print it out.

131
00:08:06,050 --> 00:08:07,460
Well, we have an error.

132
00:08:07,460 --> 00:08:11,300
So let's get back up here and see why we are getting this error.

133
00:08:11,300 --> 00:08:16,010
Here we have list object has no attribute is identifier.

134
00:08:16,040 --> 00:08:19,910
Now let's get back and try to see where we made an error.

135
00:08:20,030 --> 00:08:23,450
Looking at this, we forgot to make this a tuple.

136
00:08:23,450 --> 00:08:33,170
So this is actually a tuple composed of this input or this image and its corresponding annotation.

137
00:08:33,170 --> 00:08:37,280
So let's run this again and see what we get.

138
00:08:37,520 --> 00:08:38,600
There we go.

139
00:08:38,600 --> 00:08:45,080
You can now see that we have this image path. Let's scroll this way.

140
00:08:45,080 --> 00:08:46,400
So we could see that image path.

141
00:08:46,400 --> 00:08:47,990
We have this image path.

142
00:08:47,990 --> 00:08:53,390
And then we also have its corresponding annotation path.

143
00:08:53,840 --> 00:08:57,620
And looking at this annotation path, you will find that it's actually different.

144
00:08:57,620 --> 00:08:59,810
So we should modify this.

145
00:08:59,810 --> 00:09:03,860
Here we have 0987 whereas here we have 0336.

146
00:09:03,860 --> 00:09:11,570
This means that when looping through the image path and the annotation path, we are not getting the exact

147
00:09:11,570 --> 00:09:16,940
same order of the different elements.

148
00:09:16,940 --> 00:09:19,430
That's the different images and the annotations.

149
00:09:19,430 --> 00:09:26,090
So that said, instead of looping through the annotation path here, we want to loop through the image

150
00:09:26,090 --> 00:09:26,540
path.

151
00:09:26,540 --> 00:09:28,430
So we'll still loop through the image path.

152
00:09:28,430 --> 00:09:34,790
But since we do not want the images but the segmentations instead, what we'll do is we

153
00:09:34,790 --> 00:09:38,960
are going to take off part of this i right here.

154
00:09:38,990 --> 00:09:44,000
Now let's go step by step so you understand exactly what's going on.

155
00:09:44,000 --> 00:09:57,650
So if you do for i in the directory listing of image path, and then you print out i and break.

156
00:09:57,740 --> 00:09:59,630
If you print out i, you would

157
00:09:59,750 --> 00:10:00,080
find out

158
00:10:00,080 --> 00:10:01,340
we have this here:

159
00:10:01,370 --> 00:10:07,880
this path, or rather this file name, img_0336.png.

160
00:10:07,910 --> 00:10:14,810
Okay, now the idea here is to take off this img and replace it with seg.

161
00:10:14,810 --> 00:10:17,960
So let me get back to this.

162
00:10:17,960 --> 00:10:20,540
Easier to open this up since there are fewer files.

163
00:10:20,540 --> 00:10:24,350
So let's scroll down and you see here we have seg.

164
00:10:24,350 --> 00:10:33,920
So what we want to do is keep the rest of this name while taking off these first three

165
00:10:33,920 --> 00:10:34,760
letters.

166
00:10:34,760 --> 00:10:37,670
So now what we'll do is start from i.

167
00:10:37,700 --> 00:10:46,490
Let's say we take seg: we want to do seg, like this,

168
00:10:47,090 --> 00:10:52,550
plus i sliced from index three, because the indices go zero, one, two, three.

169
00:10:52,550 --> 00:10:57,170
So it goes from three to the end.

170
00:10:57,170 --> 00:11:02,270
Okay, run that again and you see we have seg 0336.

171
00:11:02,270 --> 00:11:02,930
That's fine.

172
00:11:02,930 --> 00:11:08,450
So getting back here we're going to have an annotation path.

173
00:11:08,450 --> 00:11:13,310
Plus i, but instead of having plus i we're going to have plus seg.

174
00:11:13,520 --> 00:11:14,690
There we go.

175
00:11:14,690 --> 00:11:18,590
And then plus i from three to the end.

176
00:11:18,590 --> 00:11:22,490
So let's run this again and then check this out.

177
00:11:22,850 --> 00:11:26,690
We now see here that we have the exact same file number.

178
00:11:26,690 --> 00:11:32,960
So we have the image and then its corresponding annotation, which matches exactly what we

179
00:11:32,960 --> 00:11:33,470
expect.

180
00:11:33,470 --> 00:11:35,510
So here we have 0336.

181
00:11:35,510 --> 00:11:37,490
And then here we have 0336.

182
00:11:37,490 --> 00:11:39,500
We could print out, let's say, three others.

183
00:11:39,500 --> 00:11:45,080
So we see that all of these match, because this time around we are taking from the same list.

184
00:11:45,080 --> 00:11:46,490
So that's it:

185
00:11:46,490 --> 00:11:52,370
0613, 0703, 0613, 0703.

186
00:11:52,520 --> 00:11:53,510
So that's fine.
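The pairing trick can be sketched in plain Python as below. The helper name paired_paths and the folder names are hypothetical; the "seg" + name[3:] slicing is the step from the video. A tuple of two equal-length lists like this is the kind of input that tf.data.Dataset.from_tensor_slices accepts, yielding one (image path, mask path) pair per element.
```python
import os
def paired_paths(image_dir, anno_dir, image_files):
    """Build matching (image, mask) path lists from the image file names alone."""
    image_paths = [os.path.join(image_dir, f) for f in image_files]
    # "img_0336.png"[3:] is "_0336.png", so prefixing "seg" gives "seg_0336.png".
    anno_paths = [os.path.join(anno_dir, "seg" + f[3:]) for f in image_files]
    return image_paths, anno_paths
# Hypothetical folder names and file list, for illustration only.
images, masks = paired_paths("images", "masks", ["img_0336.png", "img_0613.png"])
for image, mask in zip(images, masks):
    print(image, mask)
```
Because both lists are derived from the same file listing, the order of the images and masks cannot drift apart.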

187
00:11:53,510 --> 00:11:58,790
We just replicate this for the validation and we move to the next step.

188
00:11:58,790 --> 00:12:01,760
So here we have validation.

189
00:12:02,060 --> 00:12:07,670
And then we have val image path val image path.

190
00:12:08,360 --> 00:12:15,440
We also have, oops, val image path and val image path.

191
00:12:15,440 --> 00:12:18,620
Then here we have val image path too. Okay.

192
00:12:18,620 --> 00:12:22,820
So we have the training and the validation sets ready.

193
00:12:22,850 --> 00:12:25,280
Let's run this and take this off.

194
00:12:25,910 --> 00:12:32,420
Then we could print out for i in val data set.

195
00:12:33,080 --> 00:12:34,850
Let's take a single element.

196
00:12:35,450 --> 00:12:36,860
Let's print out.

197
00:12:37,040 --> 00:12:38,780
Let's print out i.

198
00:12:40,480 --> 00:12:41,350
There we go.

199
00:12:41,350 --> 00:12:43,090
We have this now.

200
00:12:43,090 --> 00:12:45,640
So here we have image.

201
00:12:45,850 --> 00:12:50,950
0049, and then here we have seg 0049.

202
00:12:50,950 --> 00:12:52,750
And this is our validation data set.

203
00:12:52,750 --> 00:12:55,060
Whereas here is our train data set.

204
00:12:55,240 --> 00:13:00,610
Let's obtain the length of our train data set and that of the validation data set.

205
00:13:00,610 --> 00:13:02,830
We have len of train data set.

206
00:13:03,280 --> 00:13:06,730
And then len of val data set.

207
00:13:06,730 --> 00:13:07,630
There we go.

208
00:13:07,630 --> 00:13:14,500
So from here we'll see that if we sum these two up we should obtain a thousand different images.

209
00:13:14,500 --> 00:13:19,780
And at this point we are going to dive into the next step, which is preprocessing.

210
00:13:19,780 --> 00:13:21,670
So we've done the splitting.

211
00:13:21,670 --> 00:13:24,820
And now we're heading over to preprocessing.
