1
00:00:00,180 --> 00:00:07,180
Hello, everyone, and welcome to this new and exciting session in which we are going to treat TensorFlow

2
00:00:07,200 --> 00:00:16,110
Records. TensorFlow Records help us build more efficient data pipelines, as they help us store the data,

3
00:00:16,110 --> 00:00:23,340
which we use to train our models, more efficiently, and they also help in parallelizing the reading

4
00:00:23,340 --> 00:00:28,650
of the data, hence helping in speeding up the overall training process.

5
00:00:28,650 --> 00:00:34,650
And so in this section, together with the TensorFlow data sets, which we've seen already, we are

6
00:00:34,650 --> 00:00:41,310
going to see how to implement an efficient training pipeline with TensorFlow.

7
00:00:41,610 --> 00:00:47,010
In the previous sections, we've been working with TensorFlow datasets. In this session,

8
00:00:47,010 --> 00:00:57,420
we'll see how to convert our TensorFlow dataset into a TensorFlow record and then get this TensorFlow

9
00:00:57,420 --> 00:01:03,450
record and convert it back to TensorFlow datasets to be used for training.

10
00:01:03,720 --> 00:01:08,970
Now, the very first question you may ask yourself is, given that we've already carried out this training

11
00:01:08,970 --> 00:01:16,160
process successfully without any problems, why do we need to work with TensorFlow records?

12
00:01:16,170 --> 00:01:16,980
Now,

13
00:01:16,980 --> 00:01:23,340
there are two major problems TensorFlow records come to solve, or two advantages of working with

14
00:01:23,340 --> 00:01:24,510
TensorFlow records.

15
00:01:24,510 --> 00:01:31,050
The very first one is the fact that you can now store your data more efficiently.

16
00:01:31,260 --> 00:01:37,200
Now notice that every time you have to create this data, you're creating this data from this data set

17
00:01:37,200 --> 00:01:44,130
which we've loaded here, which is made up of little files of, say, a few kilobytes. Let's open

18
00:01:44,130 --> 00:01:46,500
this up so you could see for yourself.

19
00:01:46,500 --> 00:01:52,800
We have this here. Let's take this one, open this folder, and then we'll see the size of this.

20
00:01:52,800 --> 00:01:57,350
So you can see the size is about 17.26 kilobytes.

21
00:01:57,360 --> 00:02:04,110
Now, the fact that we have to deal with these kinds of files means that we are not always going to

22
00:02:04,110 --> 00:02:07,650
be able to load this data very efficiently.

23
00:02:07,770 --> 00:02:14,640
Now, it's true, working with TensorFlow datasets brings in some efficiency, but what if we store

24
00:02:14,640 --> 00:02:17,910
this data in a very efficient manner?

25
00:02:17,910 --> 00:02:25,230
That is, instead of storing, let's say, 17-kilobyte files like this, what if we just

26
00:02:25,230 --> 00:02:32,460
store, say, 10-megabyte files or, say, 100-megabyte files?

27
00:02:32,460 --> 00:02:38,430
So every time we want to read from a file, we don't have to read from many of these kinds of files,

28
00:02:38,430 --> 00:02:42,150
but from a single file like this one.

29
00:02:43,080 --> 00:02:50,610
Then another good thing about working with TensorFlow records, which are stored in this kind of form,

30
00:02:50,610 --> 00:03:00,540
is that this time around you could carry out the preprocessing before storing the data so you could

31
00:03:00,930 --> 00:03:02,370
augment your data.

32
00:03:02,370 --> 00:03:09,270
So we have augmented data, which is now stored as TensorFlow records.

33
00:03:09,270 --> 00:03:14,330
So instead of having the images and then, each and every time, having to carry out

34
00:03:14,350 --> 00:03:21,150
augmentation or some preprocessing before training, you store the data, which has been augmented already.

35
00:03:21,150 --> 00:03:28,170
So suppose you have your initial data, so you have your initial data here. It passes through some

36
00:03:28,170 --> 00:03:33,930
preprocessing, so it passes through some preprocessing, and then... let's change this color.

37
00:03:33,930 --> 00:03:40,020
So here we have the data, and then here we have the preprocessing, and then from the preprocessing

38
00:03:40,020 --> 00:03:42,780
we have the augmented data.

39
00:03:42,780 --> 00:03:45,300
So after this the data has been augmented.

40
00:03:45,450 --> 00:03:52,490
What we do is instead of having to go through this each and every time, what we do is we pass through,

41
00:03:52,510 --> 00:03:55,860
we go through this once and we store our data in

42
00:03:55,860 --> 00:04:02,490
this augmented form, such that the next time we want to train our model, all we need to do is just

43
00:04:02,490 --> 00:04:06,390
make use of this augmented data right here.

44
00:04:07,140 --> 00:04:14,250
Apart from this, it should be noted that sometimes we have models or we have sections of a model.

45
00:04:14,250 --> 00:04:15,600
Let's draw it this way.

46
00:04:16,470 --> 00:04:19,740
Let's suppose that this is our model here.

47
00:04:19,740 --> 00:04:23,880
So our model is made of section one and section two.

48
00:04:23,910 --> 00:04:29,100
Section one is fixed, meaning that we are not going to train this part; we're going to train

49
00:04:29,100 --> 00:04:30,180
only this part.

50
00:04:30,210 --> 00:04:36,840
This means that when you pass in data right here, when you take this data and you pass it in, the

51
00:04:36,840 --> 00:04:42,540
output you would have here is going to be the same each and every time,

52
00:04:42,540 --> 00:04:45,900
given that this part here is fixed; the weights here are fixed.

53
00:04:45,900 --> 00:04:52,770
So what we could do is, instead of storing the data, we are going to store these outputs, which we could

54
00:04:52,770 --> 00:04:54,270
call embeddings.

55
00:04:54,270 --> 00:04:59,940
So instead of storing the data, we store these embeddings here, such that we now make use of the

56
00:05:00,070 --> 00:05:08,020
embeddings directly and train our model, or train the part of the model which is actually trainable, on

57
00:05:08,020 --> 00:05:10,870
these embeddings instead of working with the data.

58
00:05:10,870 --> 00:05:18,160
So this time around we see that we have data, the preprocessing, the augmentation, and then we have

59
00:05:18,160 --> 00:05:21,400
the embedded or the embeddings from this data.

60
00:05:21,400 --> 00:05:25,540
So here we have the embeddings, so we can now store this as TensorFlow records.

61
00:05:25,540 --> 00:05:33,490
So you see that it gives you or it gives us some kind of flexibility as to what we can store and be

62
00:05:33,490 --> 00:05:37,690
able to retrieve the stored data and make use of it as we wish.

63
00:05:38,500 --> 00:05:47,080
The second major advantage of working with TensorFlow records, apart from this efficiency in storing

64
00:05:47,080 --> 00:05:52,720
data, is the fact that they encourage the parallelism of reading data.

65
00:05:52,750 --> 00:06:00,430
Now, this means that if we happen to train our model on several hosts, let's say we have four, four

66
00:06:00,520 --> 00:06:06,970
different hosts, so we have four different machines on which we train our model like this, and we

67
00:06:06,970 --> 00:06:09,250
have our data set right here.

68
00:06:09,430 --> 00:06:17,270
What we could do is we could create shards of our TensorFlow record data.

69
00:06:17,290 --> 00:06:23,100
So here, for example, let's suppose that this is our dataset, our complete dataset.

70
00:06:23,110 --> 00:06:25,840
We could break this up into several parts.

71
00:06:25,840 --> 00:06:30,910
So let's say we break this up such that each host takes care of two.

72
00:06:30,910 --> 00:06:34,480
So we have one, two; one, two; one, two; one, two.

73
00:06:34,480 --> 00:06:36,460
Let's add another part here.

74
00:06:36,460 --> 00:06:38,620
So each of these takes care of this.

75
00:06:38,620 --> 00:06:45,160
So this host here trains on this data, this other one trains on this, this other one trains on

76
00:06:45,160 --> 00:06:47,920
this, and this one trains on this.

77
00:06:48,910 --> 00:06:59,200
Now, as a rule of thumb, it is generally advisable to make sure that each host has about ten of these

78
00:06:59,200 --> 00:07:05,290
packs of 10 to 100 megabytes of our dataset.

79
00:07:06,490 --> 00:07:14,650
So that said, if we have a ten-gigabyte dataset (suppose that all this is now ten gigabytes), let's

80
00:07:14,650 --> 00:07:15,940
take this off.

81
00:07:15,940 --> 00:07:19,510
We have a ten-gigabyte dataset.

82
00:07:19,510 --> 00:07:21,100
Let's write this in here.

83
00:07:21,460 --> 00:07:25,450
So we have here a ten-gigabyte dataset.

84
00:07:25,900 --> 00:07:36,420
What we want to do is have each and every host taking care of at least ten packs of our

85
00:07:36,430 --> 00:07:40,660
TensorFlow record, which we've created.

86
00:07:40,660 --> 00:07:43,030
So let's take this off.

87
00:07:43,060 --> 00:07:51,910
It's no longer two; we have to create ten packs or ten shards of our TensorFlow records and allocate

88
00:07:51,910 --> 00:07:55,200
them to each and every host we have here.

89
00:07:55,210 --> 00:08:01,960
Now, that said, given that we have four hosts, then we will have four times ten.

90
00:08:04,280 --> 00:08:05,840
shards to create.

91
00:08:06,110 --> 00:08:12,800
So we'll break this ten gigabyte data set up into 40 different parts.

92
00:08:12,950 --> 00:08:21,740
So we have 40 different packs or shards of our TensorFlow dataset, or rather of our TensorFlow

93
00:08:21,740 --> 00:08:23,600
record right here.

94
00:08:23,600 --> 00:08:34,430
And then each and every one of them will be approximately 250 megabytes.

95
00:08:35,720 --> 00:08:41,240
Since we have ten gigabytes, if we convert this to megabytes, we will have 10,000,

96
00:08:41,390 --> 00:08:45,380
supposing that a thousand megabytes equal a gigabyte.

97
00:08:45,380 --> 00:08:55,550
So we have here 10,000 megabytes divided by 40, which will give us 250 megabytes per shard or per pack

98
00:08:55,550 --> 00:09:00,140
that we've created here, or per split of our TensorFlow records.
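
The shard arithmetic above can be sketched in a few lines of Python. The helper name plan_shards and its parameters are hypothetical, introduced only for illustration; the figures (4 hosts, 10 shards per host, 10 GB taken as 10,000 MB) are the ones used in the session.

```python
def plan_shards(dataset_mb: float, num_hosts: int, shards_per_host: int = 10):
    """Return (total number of shards, approximate size of each shard in MB)."""
    num_shards = num_hosts * shards_per_host
    return num_shards, dataset_mb / num_shards


# 10 GB taken as 10,000 MB, as in the session.
num_shards, shard_mb = plan_shards(dataset_mb=10_000, num_hosts=4)
print(num_shards, shard_mb)  # 40 250.0
```

With these numbers each shard lands at 250 MB, above the 10 to 100 MB range mentioned earlier.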

99
00:09:01,790 --> 00:09:12,230
And so this will lead to some reasonable gains because we can now train our model on this parallelized

100
00:09:12,230 --> 00:09:13,220
data.

101
00:09:13,310 --> 00:09:25,310
And then we could also prefetch huge chunks of our dataset (to be precise, 250 megabytes), such that

102
00:09:25,310 --> 00:09:31,040
once the model is ready, the model just feeds on this data, which has already been prefetched.

103
00:09:31,070 --> 00:09:37,550
Now, you could check out the previous sessions where we talk about prefetching under TensorFlow

104
00:09:37,550 --> 00:09:38,240
datasets.

105
00:09:39,140 --> 00:09:46,820
That said, what we actually store in these TensorFlow records are protocol buffers.

106
00:09:46,940 --> 00:09:56,240
And the way TensorFlow manages this is by making use of this TensorFlow Example class, which is defined

107
00:09:56,240 --> 00:09:57,060
here.

108
00:09:57,080 --> 00:09:58,550
Let's check this out.

109
00:09:58,550 --> 00:10:06,920
Here is the TensorFlow Example, which is defined here as a standard proto storing data for training and

110
00:10:06,920 --> 00:10:07,790
inference.

111
00:10:08,420 --> 00:10:17,120
And so if you have to convert your data into this, into the proto files, you would have to make use

112
00:10:17,120 --> 00:10:26,020
of this TensorFlow Example class, and then you would have to understand this representation right here.

113
00:10:26,030 --> 00:10:33,110
And so in our case, where we're dealing with an image and its corresponding label, let's say we have

114
00:10:33,110 --> 00:10:40,460
the image of this person smiling, then we have the label one. So we have this image and its

115
00:10:40,460 --> 00:10:51,260
corresponding label; we would have to convert this data into this format before creating our TensorFlow

116
00:10:51,260 --> 00:10:52,370
records.

117
00:10:53,030 --> 00:10:55,360
Now here we have this dictionary.

118
00:10:55,370 --> 00:11:02,930
As is said here, it contains a key-value store, Example.features, where each key, which is a string,

119
00:11:02,930 --> 00:11:05,090
maps to a Feature.

120
00:11:06,500 --> 00:11:12,530
Now note that here we have Features, with an s, and here Feature, without an s.

121
00:11:13,100 --> 00:11:19,750
So we have these Features, and each and every one of them is a Feature.

122
00:11:19,760 --> 00:11:25,430
So we could click here, you could click here and you see this feature, you could check out the documentation

123
00:11:25,430 --> 00:11:28,040
and we have this contained.

124
00:11:28,040 --> 00:11:29,330
Let's take this off.

125
00:11:29,330 --> 00:11:33,410
The contained list can be one of three types: a BytesList

126
00:11:33,410 --> 00:11:40,130
(generally, this is our information), a FloatList, or an Int64List.

127
00:11:40,130 --> 00:11:46,920
So you would pick your feature depending on the kind of data you have.

128
00:11:46,940 --> 00:11:53,570
Now here you have these Features, which is like a combination of these different features here.

129
00:11:53,720 --> 00:12:01,370
So you have here the Int64List, you have the float feature, the FloatList, the BytesList, and then

130
00:12:01,370 --> 00:12:08,930
here you create the Features from these different features here.

131
00:12:09,170 --> 00:12:10,820
So from the, from each.

132
00:12:10,820 --> 00:12:16,700
And because this is a feature, this one here is a feature.

133
00:12:16,700 --> 00:12:17,960
Let's take this off.

134
00:12:17,960 --> 00:12:19,430
This one here is a feature.

135
00:12:19,430 --> 00:12:23,900
This is a feature and this is a feature that has been defined already here.

136
00:12:23,900 --> 00:12:27,110
So you can see the int feature is of type Feature; there

137
00:12:27,110 --> 00:12:29,450
it is. The float feature, the same;

138
00:12:29,450 --> 00:12:29,960
there it is.

139
00:12:29,960 --> 00:12:32,780
The bytes feature, the same; there it is.

140
00:12:32,960 --> 00:12:37,300
Now all these combined form features, which is of type Features.

141
00:12:37,310 --> 00:12:38,120
See the difference.

142
00:12:38,120 --> 00:12:43,870
You have this, and this one without the s: obviously that's the singular, and that's the plural.

143
00:12:43,880 --> 00:12:50,990
So that said, if we want to create our TensorFlow records, if we want to convert our data into TensorFlow

144
00:12:50,990 --> 00:12:51,530
records,

145
00:12:51,530 --> 00:12:56,090
we have to take into consideration this formatting of our data.
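
As a minimal sketch of the structure just described (one list type per Feature, several Feature messages combined into a Features, wrapped in an Example), assuming TensorFlow 2.x is installed. The key names "image", "score" and "label" and the values are placeholders, not taken from the session.

```python
import tensorflow as tf

# Each Feature (singular) wraps exactly one of the three list types.
bytes_feature = tf.train.Feature(
    bytes_list=tf.train.BytesList(value=[b"raw image bytes"]))
float_feature = tf.train.Feature(
    float_list=tf.train.FloatList(value=[0.5, 1.5]))
int_feature = tf.train.Feature(
    int64_list=tf.train.Int64List(value=[1]))

# Features (plural) maps string keys to Feature messages.
features = tf.train.Features(feature={
    "image": bytes_feature,   # placeholder key names
    "score": float_feature,
    "label": int_feature,
})
example = tf.train.Example(features=features)

print(example.features.feature["label"].int64_list.value[0])  # 1
```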

146
00:12:56,960 --> 00:13:01,320
So we get back to the code and then we'll add these two imports here.

147
00:13:01,340 --> 00:13:04,160
Here, from tensorflow.train,

148
00:13:04,160 --> 00:13:07,910
we import BytesList, FloatList and Int64List.

149
00:13:07,910 --> 00:13:11,840
So these are the types of our feature.

150
00:13:11,840 --> 00:13:18,890
And then, from tensorflow.train again, we import Example, we import Features and we import

151
00:13:18,890 --> 00:13:19,700
Feature.

152
00:13:19,700 --> 00:13:25,310
So with that, let's run the cell. We get back to our code.

153
00:13:25,310 --> 00:13:27,720
We have here TensorFlow records.

154
00:13:27,740 --> 00:13:31,900
Now what we'll start by doing is unbatching our data.

155
00:13:31,910 --> 00:13:34,970
So we'll start by unbatching our data.

156
00:13:35,320 --> 00:13:40,810
Note that to run this we've taken off this prefetching, as we will not need it; we just

157
00:13:40,810 --> 00:13:42,460
take our data as it is.

158
00:13:42,460 --> 00:13:45,910
So we have our train dataset, which has been augmented.

159
00:13:45,910 --> 00:13:51,730
You could carry out any preprocessing you want before storing this data, and then the validation data

160
00:13:51,730 --> 00:13:52,750
remains validation.

161
00:13:52,750 --> 00:13:55,420
So we have that valid data set there.

162
00:13:55,540 --> 00:14:02,080
Then from here, we'll go ahead and unbatch this data.

163
00:14:02,590 --> 00:14:03,370
There we go.

164
00:14:03,370 --> 00:14:07,660
We have the training dataset. We run this; we have this here.

165
00:14:07,690 --> 00:14:10,540
Then we do the same for the validation.

166
00:14:10,540 --> 00:14:18,100
So here we have the validation dataset, and the validation dataset, unbatch.

167
00:14:18,340 --> 00:14:19,150
There we go.

168
00:14:19,150 --> 00:14:24,390
We have that. You could see for yourself: training dataset.

169
00:14:24,400 --> 00:14:25,810
This is going to be unbatched.

170
00:14:25,810 --> 00:14:27,590
So you have here an unbatched dataset.

171
00:14:27,610 --> 00:14:30,310
Notice that the batch dimension is taken off.

172
00:14:30,610 --> 00:14:30,780
Good.

173
00:14:30,850 --> 00:14:33,300
Similarly, we have this for the validation.

174
00:14:33,310 --> 00:14:34,300
There we go.

175
00:14:34,300 --> 00:14:35,230
Validation.

176
00:14:35,230 --> 00:14:36,430
You run that.

177
00:14:36,550 --> 00:14:38,440
See, that's taken off too.
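
To make the effect of unbatching concrete, here is a small sketch on a stand-in dataset, assuming TensorFlow 2.x. The tiny 4x4 fake images, the labels and the batch size are assumptions for illustration, not the course's actual data.

```python
import tensorflow as tf

# Stand-in for the real dataset: 8 fake 4x4 RGB "images" with integer labels.
images = tf.zeros([8, 4, 4, 3])
labels = tf.range(8)
batched = tf.data.Dataset.from_tensor_slices((images, labels)).batch(4)
print(batched.element_spec[0].shape)    # (None, 4, 4, 3)

# unbatch() splits every batch back into individual (image, label) elements,
# so the leading batch dimension disappears.
unbatched = batched.unbatch()
print(unbatched.element_spec[0].shape)  # (4, 4, 3)
```

Working on single elements is what lets each image and label be written as its own record later on.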

178
00:14:39,370 --> 00:14:42,010
And then we get back to this documentation.

179
00:14:42,010 --> 00:14:48,910
Recall that before creating the TensorFlow record, we need to put our data in a certain format.

180
00:14:48,940 --> 00:14:51,970
More specifically, we need to create proto files.

181
00:14:51,970 --> 00:14:59,650
And the way we do this is by making use of the TensorFlow Example, and then to create this TensorFlow Example,

182
00:14:59,650 --> 00:15:06,280
we have these features, which we need to combine to create these TensorFlow Examples.

183
00:15:06,280 --> 00:15:11,170
And in this documentation here we have all that is needed to create this.

184
00:15:11,170 --> 00:15:13,660
So here you see for example, this example.

185
00:15:13,660 --> 00:15:21,670
You could copy this out simply and then we paste this right here so you could see the int, the float,

186
00:15:21,670 --> 00:15:22,570
the bytes.

187
00:15:22,570 --> 00:15:26,650
But since we are not going to use the floats here, we could take this off.

188
00:15:26,650 --> 00:15:28,360
We're not going to use this float feature.

189
00:15:28,360 --> 00:15:36,430
What we're interested in is the bytes feature because, recall, we have the image and we have the label,

190
00:15:36,430 --> 00:15:43,660
so we have the image of the happy person and then we have the label one.

191
00:15:43,660 --> 00:15:50,470
So this label will be this int feature and the image will be the bytes feature.

192
00:15:50,710 --> 00:15:53,740
So let's go ahead and get this done.

193
00:15:53,740 --> 00:15:57,400
We have here... let's put this first.

194
00:15:57,400 --> 00:16:02,680
So here we have our bytes and then we have our INT.

195
00:16:02,710 --> 00:16:06,820
Now we're going to create an Example. From here you can see clearly we've imported this already, so

196
00:16:06,820 --> 00:16:10,720
we just have that. Even here, we can have this Features.

197
00:16:10,720 --> 00:16:13,810
There we go, here we have BytesList.

198
00:16:13,840 --> 00:16:25,900
See, here we have Feature, here we have Feature, here we have Int64List, a Feature of type Int64List, and

199
00:16:25,900 --> 00:16:26,620
that should be it.

200
00:16:26,800 --> 00:16:31,360
So we have all we need here. We take this off; we don't need that.

201
00:16:31,510 --> 00:16:35,020
So here we change this name.

202
00:16:35,020 --> 00:16:43,000
We'll call this images, and then we'll call this labels.

203
00:16:43,000 --> 00:16:43,900
So that's it.

204
00:16:44,470 --> 00:16:48,670
Okay, so we have this here. Let's get back.

205
00:16:49,390 --> 00:16:50,200
That's fine.

206
00:16:50,200 --> 00:16:51,280
We have that.

207
00:16:51,850 --> 00:16:55,020
The bytes, that's images, and we have the labels.

208
00:16:55,030 --> 00:17:03,010
Now, once we have this, the next thing to do is to put in the correct values in here.

209
00:17:03,010 --> 00:17:09,370
So instead of having this here, we take this off and then here we're going to create a method which

210
00:17:09,370 --> 00:17:18,610
we'll call create_example. This create_example method is going to take our image and also the label.

211
00:17:18,610 --> 00:17:26,500
So it takes the image and the label and then what it returns is our serialized example.

212
00:17:26,500 --> 00:17:33,040
So we have this example which we've created, and to get the serialized example we call the

213
00:17:33,040 --> 00:17:36,580
SerializeToString method right here.

214
00:17:36,580 --> 00:17:43,810
Now, that said, instead of this here we'll take in our image, and then instead of this here we'll take

215
00:17:43,810 --> 00:17:45,550
in our label.

216
00:17:45,640 --> 00:17:47,350
So that should be fine.
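
A minimal sketch of a create_example function along the lines described, assuming TensorFlow 2.x. The keys "images" and "labels" follow the names chosen in the session; the exact body shown on screen is not visible in the transcript, so treat this as one plausible version rather than the course's code. It uses tf.train.* attribute access instead of the separate imports.

```python
import tensorflow as tf


def create_example(image: bytes, label: int) -> bytes:
    """Pack one (image, label) pair into a serialized tf.train.Example."""
    example = tf.train.Example(features=tf.train.Features(feature={
        # The image must already be raw bytes (for instance JPEG-encoded).
        "images": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[image])),
        "labels": tf.train.Feature(
            int64_list=tf.train.Int64List(value=[label])),
    }))
    return example.SerializeToString()


# Round trip with placeholder bytes standing in for a real encoded image.
serialized = create_example(b"fake-jpeg-bytes", 1)
restored = tf.train.Example.FromString(serialized)
print(restored.features.feature["labels"].int64_list.value[0])  # 1
```

Note that BytesList accepts Python bytes, not tensors, which is why the image has to be encoded before it reaches this function.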

217
00:17:47,350 --> 00:17:53,560
We now run this and the next thing we'll do is define the number of shards.

218
00:17:53,560 --> 00:17:53,800
Here.

219
00:17:53,800 --> 00:17:57,250
We have ten shards and then the path.

220
00:17:57,250 --> 00:18:00,520
So you call this TensorFlow records.

221
00:18:00,520 --> 00:18:07,690
Let's create this new folder here, tfrecords. That's okay, tfrecords.

222
00:18:07,690 --> 00:18:08,470
That's fine.

223
00:18:08,470 --> 00:18:14,140
And then we will have the shards, each with its specific number.

224
00:18:14,140 --> 00:18:21,600
So here we have the shard, or basically we have this name, let's have here the file name, and then here

225
00:18:21,610 --> 00:18:24,310
is the extension, .tfrecord.

226
00:18:24,610 --> 00:18:26,020
Okay, so that's it.

227
00:18:26,020 --> 00:18:34,030
And then we run this, and the next thing we want to do is to get back to the documentation,

228
00:18:34,030 --> 00:18:34,980
to the tf.io

229
00:18:34,990 --> 00:18:35,110
module.

230
00:18:35,630 --> 00:18:43,490
And then we get to this TFRecordWriter, where we are going to see how to write to a TFRecord file.

231
00:18:43,490 --> 00:18:51,420
So we have this here: the definition and the arguments.

232
00:18:51,440 --> 00:18:55,010
You specify the path and you also use this example.

233
00:18:55,010 --> 00:18:58,220
So we have here write the records to a file.

234
00:18:58,220 --> 00:19:02,600
So let's simply copy this and then we paste this here.

235
00:19:02,600 --> 00:19:12,950
So we have, with this TFRecordWriter, we specify the path. Here our path is going to be this path.

236
00:19:12,950 --> 00:19:14,480
We have that path.

237
00:19:14,480 --> 00:19:18,650
And then as file writer, we're going to write our information in that.

238
00:19:18,650 --> 00:19:23,270
Now we want our files to get different names.

239
00:19:23,270 --> 00:19:32,980
So here we have path.format, and then we'll specify a given shard, so a given part of our data.

240
00:19:32,990 --> 00:19:35,860
So let's get back here.

241
00:19:35,870 --> 00:19:41,810
Recall that when we create our TensorFlow record, we could create this as a block like this, and then

242
00:19:41,810 --> 00:19:45,680
we'll later on shard this, break it up into different parts.

243
00:19:45,680 --> 00:19:51,980
So here, actually, we have ten shards, so we break this up into these ten different parts.

244
00:19:51,980 --> 00:19:58,460
One, two, three, four, five, six, seven, eight, nine, ten.

245
00:19:58,460 --> 00:19:59,660
Okay, so there we go.

246
00:19:59,660 --> 00:20:07,070
We have these ten different shards, and then we want each shard to have a different file name.

247
00:20:07,070 --> 00:20:16,220
And that's why, if you look at this here, you see we have this formatting such that we pass in a given

248
00:20:16,220 --> 00:20:17,720
shard number in here.

249
00:20:17,720 --> 00:20:19,820
So we have the shard number.
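
The idea of one file name per shard can be sketched with plain string formatting. The folder name, the file name pattern and the zero padding below are assumptions; the transcript only tells us that the path contains a placeholder for the shard number and ends in .tfrecord.

```python
NUM_SHARDS = 10
# Assumed pattern: the placeholder is filled with a zero-padded shard number.
PATH = "tfrecords/shard_{:02d}.tfrecord"

shard_paths = [PATH.format(shard_number) for shard_number in range(NUM_SHARDS)]
print(shard_paths[0])   # tfrecords/shard_00.tfrecord
print(shard_paths[-1])  # tfrecords/shard_09.tfrecord
```

Zero padding keeps the files in shard order when they are listed alphabetically.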

250
00:20:20,210 --> 00:20:21,110
So that's it.

251
00:20:21,110 --> 00:20:35,900
And what we'll do now is, for each shard out of the ten shards here, for shard number in range of the

252
00:20:35,900 --> 00:20:40,640
number of shards (the number of shards specified here was ten).

253
00:20:40,640 --> 00:20:42,470
So it's basically for shard number in range ten.

254
00:20:42,470 --> 00:20:48,770
So we are going to go through, we're going to loop through this and we are going to create a file for

255
00:20:48,770 --> 00:20:49,550
each shard.

256
00:20:49,550 --> 00:20:51,470
So we're breaking up here.

257
00:20:51,470 --> 00:20:56,300
So for each and every one of these, we're going to create a file: this one, a file; this, a file;

258
00:20:56,300 --> 00:20:57,590
and so on and so forth.

259
00:20:58,340 --> 00:21:00,230
So let's take this off.

260
00:21:00,830 --> 00:21:02,090
There we go.

261
00:21:02,240 --> 00:21:08,480
Now, getting back here, once we have this in place, once we've set this up, recall that

262
00:21:09,080 --> 00:21:13,070
we had created this create_example method right here.

263
00:21:13,070 --> 00:21:22,430
And if you notice, you find that this example here is kind of doing the exact

264
00:21:22,430 --> 00:21:27,470
same thing as this create_example, where we have these different features which are created, and then

265
00:21:27,470 --> 00:21:31,040
at the end we have this SerializeToString method, which is called.

266
00:21:31,040 --> 00:21:36,560
So you could see here there's different features, you see these features and then they all combine

267
00:21:36,560 --> 00:21:39,770
to form the example and then serialization.

268
00:21:39,770 --> 00:21:43,730
So this means that all we need to do here is pass in.

269
00:21:43,730 --> 00:21:51,190
Here, instead of this, we have create_example, and then we'll be passing in the image and the label.

270
00:21:51,200 --> 00:21:53,270
Now we'll take this off here.

271
00:21:53,270 --> 00:21:54,740
We do not need this.

272
00:21:54,740 --> 00:21:56,210
Basically, we don't need that.

273
00:21:56,210 --> 00:22:03,680
And then we'll go for image and label in our data set, in our TensorFlow data set.

274
00:22:03,680 --> 00:22:06,590
Let's get back up. What are we dealing with now?

275
00:22:06,590 --> 00:22:07,760
So our training data set.

276
00:22:07,760 --> 00:22:13,820
So for this, in our training data set, that's it.

277
00:22:14,000 --> 00:22:15,650
We're going to write this.

278
00:22:15,650 --> 00:22:23,780
So we're going to have to write this image and this label in the file which will be created,

279
00:22:23,780 --> 00:22:26,660
so in this TensorFlow record file.

280
00:22:27,770 --> 00:22:34,820
Now, given that we have to write a given shard and not the full dataset, here we change

281
00:22:34,820 --> 00:22:38,990
this to the sharded dataset.

282
00:22:38,990 --> 00:22:46,070
And then let's have this here: we have our sharded dataset, which is going to be equal to...

283
00:22:46,520 --> 00:22:54,140
We have that training data set and then we'll shard so we could get to the definition or I could get

284
00:22:54,140 --> 00:22:57,320
to the TensorFlow data.

285
00:22:57,350 --> 00:23:02,090
Let's scroll this tf.data.Dataset page here.

286
00:23:02,090 --> 00:23:06,260
You could find the shard method right here.

287
00:23:06,440 --> 00:23:09,320
Let's scroll down and we'll click here.

288
00:23:09,320 --> 00:23:13,340
You see, we have... let's click on shard.

289
00:23:13,730 --> 00:23:19,730
So you see here we have the definition: it takes the number of shards and then it takes the specific index.

290
00:23:19,730 --> 00:23:21,170
So here we have ten shards.

291
00:23:21,170 --> 00:23:25,160
And then for each index we're going to pass in the value here dynamically.

292
00:23:25,160 --> 00:23:31,820
So here you see that this creates different packs or parts of our dataset.

293
00:23:31,820 --> 00:23:35,090
So here we have each pack which is going to be created.

294
00:23:35,270 --> 00:23:37,850
So we specify the number of shards.

295
00:23:37,850 --> 00:23:38,750
There we go.

296
00:23:38,750 --> 00:23:41,330
And then we specify the shard number.

297
00:23:41,330 --> 00:23:45,380
So that's all we need to create a part of our dataset.
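
Putting the shard loop and the writer together, here is a self-contained sketch, assuming TensorFlow 2.x. To keep it runnable without image data it writes a toy dataset of integers into a temporary folder with three shards; the "value" key, the folder and the file pattern are assumptions, while Dataset.shard and tf.io.TFRecordWriter are the calls named in the session.

```python
import os
import tempfile

import tensorflow as tf

NUM_SHARDS = 3
dataset = tf.data.Dataset.range(10)  # toy stand-in for the training dataset

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "shard_{:02d}.tfrecord")  # assumed pattern

for shard_number in range(NUM_SHARDS):
    # shard() keeps the elements whose position modulo NUM_SHARDS equals index.
    sharded_dataset = dataset.shard(num_shards=NUM_SHARDS, index=shard_number)
    with tf.io.TFRecordWriter(path.format(shard_number)) as file_writer:
        for value in sharded_dataset:
            example = tf.train.Example(features=tf.train.Features(feature={
                "value": tf.train.Feature(int64_list=tf.train.Int64List(
                    value=[int(value.numpy())])),
            }))
            file_writer.write(example.SerializeToString())

first_shard = [int(v.numpy()) for v in dataset.shard(NUM_SHARDS, 0)]
print(first_shard)                  # [0, 3, 6, 9]
print(sorted(os.listdir(out_dir)))
```

Note that shard() interleaves elements rather than cutting the dataset into contiguous blocks, which is fine here since every element still ends up in exactly one file.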

298
00:23:45,380 --> 00:23:46,290
So that's it.

299
00:23:46,310 --> 00:23:51,330
Once we have this, let's now run this and see what we get.

300
00:23:51,340 --> 00:23:54,410
We're getting this error for the shard number.

301
00:23:56,120 --> 00:23:57,650
Run that again.

302
00:23:57,650 --> 00:23:59,000
What do we get?

303
00:23:59,990 --> 00:24:02,120
This says it expected one of type bytes.

304
00:24:02,120 --> 00:24:06,200
So what we're passing is a tensor, but bytes were expected.

305
00:24:06,230 --> 00:24:11,120
Now the next question is, how do we convert the tensors to bytes?

306
00:24:11,750 --> 00:24:16,310
So we do a quick Google search here, convert image to bytes in TensorFlow.

307
00:24:16,340 --> 00:24:17,440
There we go.

308
00:24:17,450 --> 00:24:28,850
We have decode_image, but in fact, what we'll be using to convert this image into bytes is this

309
00:24:28,850 --> 00:24:30,350
encode_jpeg right here.

310
00:24:30,350 --> 00:24:36,530
So we have JPEG images, so we could use this here. Using it

311
00:24:36,530 --> 00:24:37,880
is quite simple.

312
00:24:37,880 --> 00:24:43,880
We simply pass in the image, and then we'll leave all of these at their default values.

313
00:24:43,880 --> 00:24:51,560
So here we have encode_jpeg, and then we're going to create this encode method.

314
00:24:51,560 --> 00:24:57,800
Here, encode_image takes in the image and the label.

315
00:24:57,800 --> 00:25:03,260
And then what we have here is that we're going to start by getting the image.

316
00:25:03,260 --> 00:25:06,020
We say image is going to be the encoded version.

317
00:25:06,020 --> 00:25:11,810
So we have tf.io.encode_jpeg, that's it.

318
00:25:11,810 --> 00:25:17,240
We pass in the image and then we're going to return the image and the label.

319
00:25:17,240 --> 00:25:23,660
So once we have this, let's add this code, so let's run this and then we're going to create a new

320
00:25:23,660 --> 00:25:28,820
encoded dataset, equal to this.

321
00:25:28,970 --> 00:25:32,240
We have training data set.

322
00:25:32,840 --> 00:25:34,100
There we go.

323
00:25:34,220 --> 00:25:38,150
We're going to map encode_image. That's it.

324
00:25:39,560 --> 00:25:47,150
So that each and every time we want to pass this into our file writer, we're going to make sure

325
00:25:47,150 --> 00:25:58,700
that we have this in the form of bytes so that it matches up with this year with this feature, which

326
00:25:58,700 --> 00:26:02,870
is defined in our create example method right here.

327
00:26:02,870 --> 00:26:04,640
So we need to match this up.

328
00:26:04,730 --> 00:26:07,040
So that said, let's get back.

329
00:26:07,040 --> 00:26:10,850
We have our encoded dataset.

330
00:26:11,690 --> 00:26:15,230
Take this off, we run, and we have the encoded dataset.

331
00:26:15,560 --> 00:26:22,010
We're told that this image has type float32, which does not match the expected type of uint8.

332
00:26:22,010 --> 00:26:28,790
So what we need to do here is we need to convert our image first into this unsigned int.

333
00:26:28,790 --> 00:26:32,510
So now, to get a solution,

334
00:26:32,510 --> 00:26:38,090
we'll look at tf.image.convert_image_dtype so that we could convert.

335
00:26:38,090 --> 00:26:43,370
Let's actually look at convert_image_dtype.

336
00:26:43,370 --> 00:26:49,970
So as we were saying, we could actually convert our float32 to the unsigned int. So here all we need

337
00:26:49,970 --> 00:26:56,120
to do, as you can see here, is pass in the tensor and then specify the dtype.

338
00:26:56,120 --> 00:27:03,590
So that said, let's copy this. We're going to take the default

339
00:27:03,590 --> 00:27:04,970
value for the saturate argument.

340
00:27:04,970 --> 00:27:12,860
So let's get back here, and then right here, before the encoding, we're going to convert the image.

341
00:27:12,860 --> 00:27:15,110
So here we have image.

342
00:27:15,830 --> 00:27:19,640
Image equals that and then we pass in the image here.

343
00:27:19,910 --> 00:27:28,550
So we have dtype equals tf.uint8, and close that.

344
00:27:28,550 --> 00:27:31,400
So that's it.

345
00:27:31,400 --> 00:27:37,090
With that set, let's run this and run this again so we could see our encoded dataset.

346
00:27:37,100 --> 00:27:38,480
So that works fine now.
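A minimal sketch of the encode step at this point, not the exact notebook code. It assumes the images are float32 tensors already scaled to [0, 1], which is what convert_image_dtype expects; if the pixels were in [0, 255], a plain tf.cast to tf.uint8 would be the safer choice. The random images and labels are hypothetical stand-ins for the training dataset.
```python
import tensorflow as tf
def encode_image(image, label):
    # Assumption: `image` is float32, shape [H, W, 3], values in [0, 1].
    image = tf.image.convert_image_dtype(image, dtype=tf.uint8)
    # encode_jpeg only accepts uint8 and returns a scalar string (bytes) tensor.
    image = tf.io.encode_jpeg(image)
    return image, label
# Hypothetical stand-in for the training dataset used in the session.
images = tf.random.uniform([4, 32, 32, 3], dtype=tf.float32)
labels = tf.constant([0, 1, 2, 1], dtype=tf.int64)
training_dataset = tf.data.Dataset.from_tensor_slices((images, labels))
encoded_dataset = training_dataset.map(encode_image)
```
Skipping the convert_image_dtype line reproduces the float32 versus uint8 type error described above.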

347
00:27:38,480 --> 00:27:46,850
So we have our encoded dataset, and then in here, instead of having this, we'll pass in the encoded

348
00:27:46,850 --> 00:27:47,270
dataset.

349
00:27:47,510 --> 00:27:53,780
So we have encoded data set and then we'll run this and see what we get.

350
00:27:54,710 --> 00:27:59,870
Value must be iterable, so we must convert this into an iterable dataset.

351
00:27:59,870 --> 00:28:07,160
So to do this we just have here as_numpy_iterator.

352
00:28:07,160 --> 00:28:08,480
We run that again.

353
00:28:09,800 --> 00:28:17,510
We are getting here: 255 has type int, but expected one of: bytes. This error is coming from our create_example

354
00:28:17,510 --> 00:28:18,680
method.

355
00:28:18,680 --> 00:28:21,170
So let's get back to create_example.

356
00:28:21,710 --> 00:28:25,100
Everything looks fine but here we should have a list.

357
00:28:25,100 --> 00:28:26,840
So let's have this here.

358
00:28:27,260 --> 00:28:32,180
There we go and run this again.

359
00:28:33,320 --> 00:28:33,980
And that's it.
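A sketch of the writing step once the list fix is in, under stated assumptions: the feature keys "images" and "labels" follow what is said later in the session, while the file name pattern, the temporary output directory, the use of dataset.shard to split the data, and the tiny stand-in dataset are all hypothetical.
```python
import os
import tempfile
import tensorflow as tf
def create_example(image, label):
    # `image` is a Python bytes object. It must be wrapped in a list, because
    # iterating over bare bytes yields ints (hence "255 has type int").
    feature = {
        "images": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image])),
        "labels": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))
NUM_SHARDS = 10  # the session ends up with ten shard files
out_dir = tempfile.mkdtemp()  # hypothetical output location
PATH = os.path.join(out_dir, "shard_{:02d}.tfrecord")  # hypothetical name pattern
# Hypothetical stand-in for the encoded dataset: (JPEG bytes, integer label) pairs.
pixels = tf.zeros([8, 8, 3], dtype=tf.uint8)
encoded_dataset = tf.data.Dataset.from_tensor_slices(
    tf.range(20, dtype=tf.int64)).map(lambda i: (tf.io.encode_jpeg(pixels), i % 3))
for shard_number in range(NUM_SHARDS):
    # as_numpy_iterator gives plain bytes and NumPy integers we can loop over.
    sharded = encoded_dataset.shard(NUM_SHARDS, shard_number).as_numpy_iterator()
    with tf.io.TFRecordWriter(PATH.format(shard_number)) as file_writer:
        for image, label in sharded:
            file_writer.write(create_example(image, int(label)).SerializeToString())
```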

360
00:28:34,730 --> 00:28:35,030
The

361
00:28:35,080 --> 00:28:40,420
creation of the different files is now complete. You could click open here and you see we have all these

362
00:28:40,420 --> 00:28:42,070
ten different shards.

363
00:28:42,700 --> 00:28:48,430
At this point, what we could do is save these files in the drive such that we could use them

364
00:28:48,430 --> 00:28:50,170
next time for training.

365
00:28:50,350 --> 00:28:56,310
So let's go ahead and see how we could make use of this for our training process.

366
00:28:56,320 --> 00:29:01,180
Now, what we want to do here is to convert this back to a TensorFlow dataset.

367
00:29:01,180 --> 00:29:03,520
So we had our TensorFlow data set.

368
00:29:03,520 --> 00:29:10,420
We converted it into TensorFlow records like we see here, that is, into several different TensorFlow

369
00:29:10,420 --> 00:29:16,060
record files, and now we want to convert these back into our TensorFlow dataset.

370
00:29:17,020 --> 00:29:22,900
Now, to get this reconstructed dataset, we are going to pass the different file names into this

371
00:29:22,900 --> 00:29:24,580
TFRecordDataset right here.

372
00:29:24,580 --> 00:29:29,050
So what we have here is this list made of all these different file names right here.

373
00:29:29,830 --> 00:29:37,660
So this list, let's call it l, will be our path, which we'll format, and then we'll pass

374
00:29:37,660 --> 00:29:45,590
in this variable p, for p in range of the number of shards.

375
00:29:45,610 --> 00:29:53,770
Let's print out this l, run that, and we get this list, which is made up of all the different files here.
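As a small sketch, the list of file names can be built with a list comprehension. The path pattern here is hypothetical, since the actual PATH used in the session is not visible in this part.
```python
NUM_SHARDS = 10
# Hypothetical pattern; substitute the PATH used when the shards were written.
PATH = "tfrecords/shard_{:02d}.tfrecord"
l = [PATH.format(p) for p in range(NUM_SHARDS)]
print(l)
# This list is what gets passed as filenames to tf.data.TFRecordDataset.
```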

376
00:29:53,950 --> 00:29:59,650
So that said, we're going to copy this

377
00:29:59,650 --> 00:30:04,300
here and then simply replace this list with it.

378
00:30:04,450 --> 00:30:06,340
So we have this list now.

379
00:30:06,820 --> 00:30:07,720
Let's have this.

380
00:30:07,720 --> 00:30:12,370
We have this list now, and then we run this and get our reconstructed data set.

381
00:30:12,370 --> 00:30:20,650
But then we need to parse this TF record data set such that we could get our original data, which was

382
00:30:20,650 --> 00:30:27,670
in the form of the image and the label where the image was an array and the label was some integer.

383
00:30:28,090 --> 00:30:33,670
And so for that we have here the parse_single_example method, which we're going to make use of. It

384
00:30:33,670 --> 00:30:39,370
takes an example, which is basically what is contained in our reconstructed dataset,

385
00:30:39,370 --> 00:30:44,830
and lets us split it into the image and the label.

386
00:30:44,830 --> 00:30:52,300
So right here we'll have example, and then we'll get the image. For the image, we'll make use of

387
00:30:52,300 --> 00:30:53,740
the decode_jpeg method.

388
00:30:53,740 --> 00:30:58,480
So previously we encoded the image into JPEG, that is, we converted it into bytes.

389
00:30:58,510 --> 00:31:03,520
Now we're going to convert from bytes back to the unsigned integer.

390
00:31:03,520 --> 00:31:11,470
So here we have decode_jpeg, and then we're going to pass in the images entry of the example.

391
00:31:13,120 --> 00:31:14,470
So we have that image.

392
00:31:14,470 --> 00:31:17,200
We specify the number of channels to be three.

393
00:31:17,470 --> 00:31:23,560
Now that we have all that set, what we have to do is specify this feature description right here.

394
00:31:23,680 --> 00:31:29,470
This feature description is basically what we had in create_example.

395
00:31:30,670 --> 00:31:32,830
And so here we have this dictionary.

396
00:31:33,070 --> 00:31:36,730
Let's have these: images and labels.

397
00:31:36,730 --> 00:31:44,170
We have this dictionary which is made of the images and the labels, and then we have the data types

398
00:31:44,170 --> 00:31:47,320
of the images and the labels respectively.

399
00:31:47,320 --> 00:31:53,410
So we need to pass this feature description into the parse_single_example method.

400
00:31:53,740 --> 00:31:57,760
Now that we have that, we're set to return our output.

401
00:31:57,760 --> 00:32:04,840
So we take example images and then example labels.

402
00:32:04,840 --> 00:32:10,990
So what we're doing here is we take an input example and then we're breaking it up into the images

403
00:32:10,990 --> 00:32:19,420
and the labels, while converting these images from bytes back to unsigned int.

404
00:32:19,420 --> 00:32:20,860
So we have that.

405
00:32:20,860 --> 00:32:22,870
Now let's run the cell.

406
00:32:22,870 --> 00:32:28,600
And then here we have our parsed dataset.

407
00:32:28,900 --> 00:32:29,800
There we go.

408
00:32:29,800 --> 00:32:34,060
We have the parsed dataset and we have the reconstructed dataset.

409
00:32:35,080 --> 00:32:35,860
There we go.

410
00:32:35,860 --> 00:32:38,320
We map this method here.

411
00:32:38,320 --> 00:32:44,290
So we have parse_tfrecords.

412
00:32:44,290 --> 00:32:47,590
So we have this method, let's run this.

413
00:32:47,590 --> 00:32:48,610
That's fine.

414
00:32:48,610 --> 00:32:52,450
And then let's see what is contained in our parsed data set.

415
00:32:52,450 --> 00:32:58,000
So let's take a single value and then print this out.

416
00:32:58,250 --> 00:32:59,320
From that,

417
00:33:00,100 --> 00:33:04,420
we get this error, so we write: for i in this. Let's run that.

418
00:33:04,420 --> 00:33:05,400
And there we go.

419
00:33:05,410 --> 00:33:11,540
As you could see, we have our input and then our output label right here.
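A self-contained sketch of the parsing step described above. The single record written at the top is a hypothetical stand-in for the shard files; the feature keys "images" and "labels" and the name parse_tfrecords follow the session, and the dtypes in the feature description must mirror what create_example wrote.
```python
import os
import tempfile
import tensorflow as tf
# Write one hypothetical record so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "shard_00.tfrecord")
jpeg_bytes = tf.io.encode_jpeg(tf.zeros([8, 8, 3], dtype=tf.uint8)).numpy()
record = tf.train.Example(features=tf.train.Features(feature={
    "images": tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
    "labels": tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
}))
with tf.io.TFRecordWriter(path) as writer:
    writer.write(record.SerializeToString())
def parse_tfrecords(example):
    # Keys and dtypes must match the features stored when writing.
    feature_description = {
        "images": tf.io.FixedLenFeature([], tf.string),
        "labels": tf.io.FixedLenFeature([], tf.int64),
    }
    example = tf.io.parse_single_example(example, feature_description)
    # Decode the JPEG bytes back into a uint8 tensor with three channels.
    image = tf.io.decode_jpeg(example["images"], channels=3)
    return image, example["labels"]
recons_dataset = tf.data.TFRecordDataset(filenames=[path])
parsed_dataset = recons_dataset.map(parse_tfrecords)
for image, label in parsed_dataset.take(1):
    print(image.shape, image.dtype, label.numpy())
```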

420
00:33:11,560 --> 00:33:15,940
Now the next thing we could do is specify the batch size.

421
00:33:15,940 --> 00:33:26,260
So let's have this batch here, with the batch size from our configuration. That's fine.

422
00:33:26,260 --> 00:33:28,150
And then we could do some prefetching.

423
00:33:28,150 --> 00:33:30,550
So let's carry out the prefetching.

424
00:33:31,120 --> 00:33:33,130
Let's just use AUTOTUNE.

425
00:33:33,130 --> 00:33:34,750
AUTOTUNE.

426
00:33:36,340 --> 00:33:38,170
And then run this again.

427
00:33:39,370 --> 00:33:40,140
There we go.

428
00:33:40,150 --> 00:33:42,080
Now, we should have 32 elements.

429
00:33:42,120 --> 00:33:46,300
So let's look at our batch, our parsed data set.

430
00:33:47,860 --> 00:33:48,540
It looks fine.

431
00:33:48,550 --> 00:33:54,580
You see, we have these four different dimensions, and then we have our output here.

432
00:33:55,180 --> 00:33:56,260
So that's it.

433
00:33:56,980 --> 00:34:00,640
We now have our parsed data set and we're ready to train.
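A short sketch of the batching and prefetching step, assuming a batch size of 32 as mentioned above. The all-zero images and labels are a hypothetical stand-in for the parsed dataset.
```python
import tensorflow as tf
BATCH_SIZE = 32  # matches the 32 elements per batch mentioned in the session
# Hypothetical stand-in for the parsed dataset: 100 uint8 images with integer labels.
parsed_dataset = tf.data.Dataset.from_tensor_slices(
    (tf.zeros([100, 16, 16, 3], dtype=tf.uint8), tf.zeros([100], dtype=tf.int64)))
# Group elements into batches and let tf.data pick the prefetch buffer size.
parsed_dataset = parsed_dataset.batch(BATCH_SIZE).prefetch(tf.data.AUTOTUNE)
for images, labels in parsed_dataset.take(1):
    print(images.shape, labels.shape)
```
The image batch now has four dimensions: batch, height, width and channels.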

434
00:34:00,760 --> 00:34:08,140
So as we've said already, it's important for you to save this in some location.

435
00:34:08,190 --> 00:34:12,610
Let's say, for example, in the drive such that the next time when you want to do training, all you

436
00:34:12,610 --> 00:34:15,830
need to do is start from here.

437
00:34:15,850 --> 00:34:19,000
So all you need to do is to reconstruct this.

438
00:34:19,000 --> 00:34:25,120
So you just come and reconstruct this data set, you will parse it, and then you're good to go with

439
00:34:25,120 --> 00:34:31,750
this parsed data set right here so you no longer have to load all these images from memory and all of

440
00:34:31,750 --> 00:34:32,230
that.

441
00:34:32,320 --> 00:34:39,130
Now, before we move on, you should also note that while encoding the data,

442
00:34:39,130 --> 00:34:44,320
that is, the image and the label, what we could also do is take the argmax of the label.

443
00:34:44,320 --> 00:34:50,950
So let's suppose we have an input image like this and then you have an output

444
00:34:50,950 --> 00:35:00,760
label, say [0, 1, 0]. Instead of constraining the labels to be this, we could take the position

445
00:35:00,760 --> 00:35:02,230
with the highest value.

446
00:35:02,230 --> 00:35:06,760
So the positions are 0, 1, 2, so it turns into 1.

447
00:35:06,760 --> 00:35:16,630
Now if we have [0, 0, 1], then after taking the argmax, in this case it's going to be 2, because we have the zeroth

448
00:35:16,630 --> 00:35:21,900
position, the first position, the second position, and this is the one with the highest value.

449
00:35:21,910 --> 00:35:26,140
So this is another way of encoding our labels and that's what we are going to do.
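A minimal sketch of this label encoding, assuming the labels are one-hot vectors. The helper name to_sparse_label and the placeholder images are hypothetical; in the session the argmax is taken inside the existing encode step.
```python
import tensorflow as tf
def to_sparse_label(image, label):
    # Replace a one-hot label such as [0, 1, 0] with the index of its largest entry.
    return image, tf.argmax(label, axis=-1)
one_hot_labels = tf.constant([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=tf.int64)
images = tf.zeros([3, 8, 8, 3], dtype=tf.uint8)  # hypothetical placeholder images
dataset = tf.data.Dataset.from_tensor_slices((images, one_hot_labels)).map(to_sparse_label)
print([int(label) for _, label in dataset])  # [1, 2, 0]
```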

450
00:35:26,140 --> 00:35:34,180
So we will take this and rerun again and create our TensorFlow records, and we

451
00:35:34,180 --> 00:35:36,190
are still going to run these cells again.

452
00:35:36,190 --> 00:35:37,960
So let's rerun this.

453
00:35:37,960 --> 00:35:41,410
And we have our parsed dataset, which you could see here.

454
00:35:41,470 --> 00:35:44,560
See now it's different from what we had before.

455
00:35:44,560 --> 00:35:50,620
So now we have the image or we have the images and then we have the labels.

456
00:35:50,620 --> 00:35:53,410
Then we could go ahead and run the model.

457
00:35:53,410 --> 00:35:55,990
We have the loss function defined.

458
00:35:55,990 --> 00:35:59,680
Now the loss function is sparse categorical cross-entropy.

459
00:35:59,680 --> 00:36:07,480
And this is simply because instead of the one-hot notation, that is, instead of representing our outputs

460
00:36:07,480 --> 00:36:12,850
like this, for example, we're converting them into a single integer.

461
00:36:12,850 --> 00:36:15,640
So like this one, for example, would be two.

462
00:36:15,670 --> 00:36:20,770
If we had [1, 0, 0], then this will be zero.

463
00:36:20,800 --> 00:36:25,780
If we have [0, 1, 0], then this will be one.

464
00:36:25,780 --> 00:36:32,110
So we've seen already that when we have these kinds of outputs, then we use this sparse categorical

465
00:36:32,110 --> 00:36:33,340
cross-entropy.

466
00:36:33,340 --> 00:36:34,270
And so that's it.
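To see why the sparse loss pairs with integer labels, here is a plain-Python sketch of the two cross-entropy forms. These helper functions are illustrative only, not the Keras implementation; in Keras the corresponding class is tf.keras.losses.SparseCategoricalCrossentropy. The predicted probabilities are made up.
```python
import math
def categorical_crossentropy(one_hot, probs):
    # Sum over all classes; only the entry where one_hot is 1 contributes.
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))
def sparse_categorical_crossentropy(class_index, probs):
    # Same quantity, but the true class is given directly as an integer.
    return -math.log(probs[class_index])
probs = [0.1, 0.7, 0.2]  # hypothetical model output for three classes
dense_loss = categorical_crossentropy([0, 1, 0], probs)
sparse_loss = sparse_categorical_crossentropy(1, probs)
print(math.isclose(dense_loss, sparse_loss))  # True
```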

467
00:36:34,300 --> 00:36:37,360
We have this sparse categorical accuracy.

468
00:36:37,360 --> 00:36:40,000
And now let's go ahead and start with the training.

469
00:36:42,010 --> 00:36:43,020
So there we go.

470
00:36:43,030 --> 00:36:47,800
You can see that training has begun and we train, as we usually do.

471
00:36:48,370 --> 00:36:50,890
So that's it for this section on TensorFlow Records.

472
00:36:50,920 --> 00:36:52,210
See you in the next section.