1
00:00:00,510 --> 00:00:05,880
Now, let's take a look at the most important part of this PyTorch project that we're doing.

2
00:00:06,540 --> 00:00:08,400
This is called training the model.

3
00:00:08,970 --> 00:00:12,640
Now, you've seen how we created our PyTorch model class object.

4
00:00:13,050 --> 00:00:18,570
We declared the optimizer, the data loaders, the loss criterion, all of those things.

5
00:00:18,570 --> 00:00:22,590
But how do we now use all of those things to train our model?

6
00:00:23,010 --> 00:00:25,530
Because that's the essential part of what we're trying to do.

7
00:00:26,220 --> 00:00:30,720
So I'm going to take a look at this flowchart-type diagram here, and we'll look at the code in each line

8
00:00:30,720 --> 00:00:38,070
afterward, mainly because it's good to look at this in a simplified way, since the code for this,

9
00:00:38,070 --> 00:00:40,770
as you can see here, can be a bit confusing.

10
00:00:40,770 --> 00:00:45,330
And these comments that I put in for you guys, even though they're quite helpful, can add

11
00:00:45,330 --> 00:00:47,730
some bulk to the code, which could seem intimidating.

12
00:00:48,180 --> 00:00:50,430
So let's start at the top.

13
00:00:51,180 --> 00:00:52,790
We're using these batches.

14
00:00:52,800 --> 00:00:58,770
Remember, our data loader returns data in batches of size 128.

15
00:00:59,190 --> 00:01:03,000
So we get 128 samples of data along with the labels.
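As a quick worked example of what a batch size of 128 means for one pass over the data, here is a minimal sketch. It assumes the 60,000 training samples quoted for this dataset and a loader that keeps the final, smaller batch; the variable names are only illustrative.

```python
import math

# Assumed numbers: 60,000 training samples and a batch size of 128.
num_samples = 60_000
batch_size = 128

# A loader that keeps the final, smaller batch yields ceil(N / B) mini-batches.
batches_per_epoch = math.ceil(num_samples / batch_size)
last_batch_size = num_samples - (batches_per_epoch - 1) * batch_size

print(batches_per_epoch)  # 469
print(last_batch_size)    # 96
```

So one epoch here would be 468 full mini-batches of 128 plus one last mini-batch of 96.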

16
00:01:03,480 --> 00:01:05,940
Well, that's what we do in the first phase of training.

17
00:01:06,060 --> 00:01:10,590
We get a mini-batch; we need to get our data in that sort of mini-batch.

18
00:01:11,070 --> 00:01:13,740
So we get a batch of 128 inputs and their labels.

19
00:01:14,190 --> 00:01:19,170
Next, we have to initialize our gradients to zero for the first run of this.

20
00:01:19,650 --> 00:01:22,170
And afterwards, after each run, we'll be optimizing these weights.

21
00:01:23,160 --> 00:01:26,310
So we set our gradients to zero here.

22
00:01:26,940 --> 00:01:35,760
And next, what we do is forward propagate our batch of data through the network,

23
00:01:36,240 --> 00:01:39,490
get the outputs, and pass the outputs on to get a loss.

24
00:01:39,570 --> 00:01:41,760
We use those outputs to get the loss.

25
00:01:42,210 --> 00:01:47,760
Then once we have that loss, we backpropagate through the network, and then we update the weights using the

26
00:01:47,760 --> 00:01:48,420
optimizer.

27
00:01:48,810 --> 00:01:54,720
And you can see I've matched each step to the code, to show you which part of the code does what.

28
00:01:55,200 --> 00:01:56,230
So you understand fully.

29
00:01:56,250 --> 00:02:01,200
So just take a look at this here. To get the data batch, this is what you need to use.

30
00:02:01,680 --> 00:02:06,960
We just use the enumerate function in the for loop over the data loader, and that outputs

31
00:02:06,960 --> 00:02:08,010
the data.

32
00:02:08,010 --> 00:02:12,060
Each batch is called data in the loop.

33
00:02:12,060 --> 00:02:15,360
Data consists of two parts: the inputs and the labels.

34
00:02:15,810 --> 00:02:21,120
So we just use inputs and labels throughout the loop to feed into the neural net, the CNN.
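The loop over the data loader can be sketched roughly as follows. This is a minimal sketch, not the notebook's actual code: the random dataset, the 1x28x28 image shape, and the name `trainloader` are all assumptions made for illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 300 random "images" with random labels in [0, 9].
dataset = TensorDataset(torch.randn(300, 1, 28, 28),
                        torch.randint(0, 10, (300,)))
trainloader = DataLoader(dataset, batch_size=128, shuffle=True)

batch_sizes = []
for i, data in enumerate(trainloader, 0):
    inputs, labels = data              # each batch is a pair: (inputs, labels)
    batch_sizes.append(inputs.shape[0])

print(batch_sizes)  # [128, 128, 44]
```

With 300 samples and a batch size of 128, the last mini-batch simply holds the 44 samples left over.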

35
00:02:22,110 --> 00:02:25,890
Next, we use optimizer.zero_grad() to zero the gradients.

36
00:02:26,520 --> 00:02:27,610
Next, we actually forward

37
00:02:27,630 --> 00:02:28,980
propagate, which is quite simple.

38
00:02:29,040 --> 00:02:29,820
Surprisingly so, isn't it?

39
00:02:30,040 --> 00:02:30,540
Think about it.

40
00:02:30,990 --> 00:02:37,320
We just take the inputs, our batch of inputs, feed them into the net like this and collect

41
00:02:37,320 --> 00:02:39,990
the outputs at the end. It's very, very simple.

42
00:02:40,590 --> 00:02:43,530
Next, we have to get the loss from the criterion. Again, this is quite simple.

43
00:02:43,950 --> 00:02:48,630
Remember the criterion function we created above, the criterion using cross-entropy loss.

44
00:02:49,350 --> 00:02:51,660
That simply takes the outputs and the labels.

45
00:02:52,140 --> 00:02:53,280
We get the loss value.

46
00:02:53,640 --> 00:02:56,680
Then we use the loss value to do the backpropagation.

47
00:02:57,120 --> 00:03:02,640
So we call loss.backward(), because the criterion returns a loss object, and that loss object has the backward

48
00:03:03,240 --> 00:03:06,270
function. Calling backward on it implements the backpropagation.

49
00:03:07,080 --> 00:03:07,840
So we have that.

50
00:03:07,840 --> 00:03:08,880
That's how we implement that.

51
00:03:09,300 --> 00:03:15,780
And then we just use optimizer.step() to update the weights using the stochastic gradient descent

52
00:03:16,170 --> 00:03:16,680
algorithm.
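Put together, the five steps from the flowchart can be sketched as one training step. This is a minimal sketch under stated assumptions, not the lesson's notebook code: a tiny linear layer stands in for the CNN, one random mini-batch stands in for the data loader, and the learning rate of 0.01 and the 784-feature input size are made up for illustration.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical stand-ins: a tiny linear classifier instead of the CNN,
# and one random mini-batch instead of a batch from the data loader.
net = nn.Linear(784, 10)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)   # lr is an assumed value

inputs = torch.randn(128, 784)          # 128 flattened inputs
labels = torch.randint(0, 10, (128,))   # 128 class labels in [0, 9]

weights_before = net.weight.detach().clone()

optimizer.zero_grad()                # 1. clear gradients from the previous step
outputs = net(inputs)                # 2. forward propagate the batch
loss = criterion(outputs, labels)    # 3. compute the loss from outputs and labels
loss.backward()                      # 4. backpropagate to fill in the gradients
optimizer.step()                     # 5. update the weights from the gradients

weights_changed = not torch.equal(weights_before, net.weight)
print(loss.item(), weights_changed)
```

Note that loss.backward() only computes the gradients; it is optimizer.step() that actually changes the weights.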

53
00:03:17,730 --> 00:03:19,260
So now let's take a look at the code.

54
00:03:20,280 --> 00:03:24,690
So you'll have some extra things in the code, which I'll explain to you, along with what we are using

55
00:03:24,690 --> 00:03:26,520
them for, things like epochs.

56
00:03:26,520 --> 00:03:27,210
So epochs.

57
00:03:27,390 --> 00:03:32,670
Basically, if you remember correctly from the other slides, epochs are how many times we pass

58
00:03:32,670 --> 00:03:36,360
all the data through the network. One epoch means that we have passed,

59
00:03:36,360 --> 00:03:41,280
we've passed all the batches of our data through our network, and that means we have completed one

60
00:03:41,280 --> 00:03:48,420
epoch. One epoch can take anywhere from a second to minutes or even hours, depending on how big

61
00:03:48,420 --> 00:03:53,250
the dataset is and how deep and complex your CNN or neural net is.

62
00:03:53,520 --> 00:03:57,150
So I tend to use maybe around 50 epochs

63
00:03:57,150 --> 00:04:01,710
in most cases. I'm using 10 in this example, just because I don't want you guys to be waiting for an

64
00:04:01,710 --> 00:04:03,510
hour or so for it to finish.

65
00:04:04,080 --> 00:04:07,830
Actually, with 10 epochs it takes about five minutes, so with 50 epochs I think it's about 25 minutes.

66
00:04:08,640 --> 00:04:10,590
So either way, you can set whatever you want.

67
00:04:10,890 --> 00:04:13,500
The more epochs you use, the better your accuracy will tend to be.

68
00:04:13,680 --> 00:04:16,200
However, it starts,

69
00:04:16,200 --> 00:04:17,040
it starts topping out.

70
00:04:17,040 --> 00:04:18,600
You get diminishing returns at some point.

71
00:04:19,020 --> 00:04:22,470
So there's no point in setting epochs too high; you aren't going to get a better network.

72
00:04:22,470 --> 00:04:29,040
And if you set epochs at 50 or 100, the tendency is to overfit by that point, which we'll talk

73
00:04:29,040 --> 00:04:29,940
about, don't worry.

74
00:04:30,630 --> 00:04:35,760
So let's step through this code so you can see we have these empty lists.

75
00:04:35,760 --> 00:04:39,330
We've created an epoch log, a loss log and an accuracy log.

76
00:04:39,330 --> 00:04:40,860
Look, these are lists.

77
00:04:41,130 --> 00:04:45,300
This is necessary as we keep the information stored as we go along.

78
00:04:45,780 --> 00:04:47,670
So we just want to keep a log of all information.

79
00:04:48,210 --> 00:04:53,350
So we just keep track of the information by putting it, appending it into these arrays,

80
00:04:53,640 --> 00:04:55,980
well, lists, upon completion.

81
00:04:56,400 --> 00:04:59,850
So, for epoch in epochs, which means

82
00:04:59,990 --> 00:05:01,910
we're going to loop from one to 10.

83
00:05:02,510 --> 00:05:05,270
We just print this. I add one here because it starts at zero.

84
00:05:05,570 --> 00:05:12,590
So we print epoch plus one, epoch one, just to add a more human element

85
00:05:12,590 --> 00:05:17,720
to it, because we tend to start counting at one, not zero. And we create a function here,

86
00:05:18,560 --> 00:05:23,000
I mean, not a function, a variable here called running loss, and we make it zero.

87
00:05:23,600 --> 00:05:25,750
This stores it cumulatively.

88
00:05:25,760 --> 00:05:28,880
It's the loss after each mini-batch run.

89
00:05:29,300 --> 00:05:34,280
So that's important to use for some of the calculations when we output information.

90
00:05:35,060 --> 00:05:37,190
So this is the part we need to discuss now.

91
00:05:37,610 --> 00:05:42,830
This is the part I talked about, where we get the data from the data loader. You can see enumerate here.

92
00:05:42,830 --> 00:05:50,480
So we have i as an index to use within the loop, and we have data here, which is the batch of

93
00:05:50,480 --> 00:05:55,460
size 128 we extracted, which is comprised of the inputs and the labels.

94
00:05:55,910 --> 00:06:02,140
So we unpack data into those, and now we can move the inputs to the GPU, as well as the labels.
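Unpacking the batch and moving it to the GPU can be sketched like this. It is a minimal sketch assuming a random mini-batch in place of the data loader output; the fallback to the CPU is added here so the snippet also runs on a machine without a GPU.

```python
import torch

# Use the GPU when one is visible to PyTorch, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Hypothetical mini-batch standing in for one item from the data loader.
data = (torch.randn(128, 1, 28, 28), torch.randint(0, 10, (128,)))

inputs, labels = data            # unpack the batch into its two parts
inputs = inputs.to(device)       # .to() returns a tensor on the target device
labels = labels.to(device)

print(inputs.device.type, labels.device.type)
```

The model has to live on the same device as the inputs and labels, otherwise the forward pass raises an error.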

95
00:06:02,140 --> 00:06:04,220
So this is an important line here

96
00:06:04,700 --> 00:06:11,720
if you're using a GPU. Then we set the gradients to zero, if it's a fresh start.

97
00:06:12,170 --> 00:06:12,470
OK?

98
00:06:12,890 --> 00:06:15,830
There are also things where you can pause learning and resume learning.

99
00:06:16,220 --> 00:06:18,260
So that's what this function allows us to do.

100
00:06:19,650 --> 00:06:22,650
So in most cases, you would be setting this to zero.

101
00:06:23,190 --> 00:06:24,180
So just remember that.

102
00:06:25,080 --> 00:06:27,920
So now, this is the process we want to do: forward,

103
00:06:28,050 --> 00:06:33,690
then backprop, then optimize. So we forward propagate here: take the inputs, get the outputs,

104
00:06:34,290 --> 00:06:38,400
pass the outputs into the loss criterion, the cross-entropy loss.

105
00:06:39,000 --> 00:06:40,350
So we get the loss here.

106
00:06:40,710 --> 00:06:44,370
Then we can run backpropagation, and then we can update the weights here.

107
00:06:44,520 --> 00:06:49,830
There is a lot of stuff going on in these four lines of code, done by PyTorch, which makes it quite simple for

108
00:06:49,830 --> 00:06:50,610
us, to be honest.

109
00:06:51,740 --> 00:06:57,350
Next, we keep track of our running loss, so we use plus-equals just to keep adding on to the previous

110
00:06:57,350 --> 00:06:59,780
loss value. So we extract the loss value,

111
00:07:00,210 --> 00:07:03,230
add it here, and that allows us to keep track of the loss

112
00:07:03,980 --> 00:07:08,930
up to the end of the epoch, so that at the end of that epoch, we can see what the overall loss was.

113
00:07:10,070 --> 00:07:16,640
And this function, this line here, is a simple line that tells us, after every 50 mini-batches, what

114
00:07:16,640 --> 00:07:20,300
we'll do, we'll just print some stuff and get some information.

115
00:07:20,330 --> 00:07:21,380
Let's see what it is.

116
00:07:21,830 --> 00:07:22,220
So.

117
00:07:23,590 --> 00:07:25,690
We have a value, a variable called correct.

118
00:07:25,930 --> 00:07:28,540
We make it equal to zero, then total equals zero.

119
00:07:28,600 --> 00:07:29,320
What does that mean?

120
00:07:29,950 --> 00:07:31,480
Well, it means that here

121
00:07:31,630 --> 00:07:35,410
we're going to keep track of accuracy in this loop, in this part of the code.

122
00:07:35,950 --> 00:07:37,540
So how do we keep track of accuracy?

123
00:07:37,960 --> 00:07:42,700
Remember, we're training a model, and we have access to test data, too.

124
00:07:43,060 --> 00:07:46,480
So every 50 mini-batches, what we do here

125
00:07:46,990 --> 00:07:49,810
is use torch.no_grad().

126
00:07:50,860 --> 00:07:53,770
This lets us run the network quickly without tracking gradients.

127
00:07:54,850 --> 00:07:57,820
And basically it's a way to do it without storing them in memory.

128
00:07:58,210 --> 00:07:59,510
So it's quite a fast way to do it.

129
00:07:59,530 --> 00:08:01,810
We can use PyTorch within the loop to do this.

130
00:08:02,260 --> 00:08:05,230
So we use with torch.no_grad().

131
00:08:05,860 --> 00:08:08,340
Then we just take the data from the data loader again.

132
00:08:08,600 --> 00:08:09,250
This returns.

133
00:08:09,610 --> 00:08:11,410
This is the loader over the test dataset

134
00:08:11,410 --> 00:08:11,950
now, sorry.

135
00:08:12,430 --> 00:08:18,670
So there are 10,000 samples in our test dataset and 60,000 in our training dataset.

136
00:08:19,240 --> 00:08:21,900
So we still have the same batch size, 128, now.

137
00:08:22,480 --> 00:08:29,410
So we just take the first batch from our test dataset here, again moving the labels and images to the GPU,

138
00:08:30,070 --> 00:08:31,300
and we forward propagate it here.

139
00:08:31,720 --> 00:08:37,210
This is the part, you know, this is the part I want you to pay attention to. What we do now is, we have

140
00:08:37,210 --> 00:08:38,560
the outputs from here.

141
00:08:39,040 --> 00:08:47,260
We use the torch.max function to get the output node with the maximum probability,

142
00:08:47,950 --> 00:08:50,590
and that allows us to get the predicted class.

143
00:08:50,860 --> 00:08:52,630
So we get the predicted class here.

144
00:08:53,550 --> 00:08:59,700
And this allows us, now, to just keep track, so we know how much the total is.

145
00:09:00,210 --> 00:09:02,070
We're keeping track of the total as it goes along.

146
00:09:02,460 --> 00:09:04,770
That's why we use this accumulator function here,

147
00:09:04,980 --> 00:09:10,650
or operator. And then we just keep track of how many values are predicted correctly.
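The accuracy bookkeeping described here can be sketched on a tiny hand-made batch. This is a minimal sketch with made-up output scores for 4 test images and 3 classes; in the real loop the outputs would come from running the network on a test batch inside the same torch.no_grad() block.

```python
import torch

# Hypothetical network outputs for a batch of 4 test images and 3 classes.
outputs = torch.tensor([[0.1, 2.0, 0.3],
                        [1.5, 0.2, 0.1],
                        [0.0, 0.1, 3.0],
                        [0.9, 0.8, 0.7]])
labels = torch.tensor([1, 0, 2, 1])

correct = 0
total = 0

with torch.no_grad():  # no gradient tracking is needed for evaluation
    # torch.max over dim=1 returns (max values, indices); the index is the class.
    _, predicted = torch.max(outputs, dim=1)
    total += labels.size(0)                         # count samples seen so far
    correct += (predicted == labels).sum().item()   # count matching predictions

accuracy = 100 * correct / total
print(predicted.tolist(), correct, total, accuracy)  # [1, 0, 2, 0] 3 4 75.0
```

The last image is predicted as class 0 but labeled 1, so 3 of 4 predictions are correct and the accuracy is 75 percent.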

148
00:09:11,010 --> 00:09:15,630
This is a good way to monitor your training of the neural net.

149
00:09:16,080 --> 00:09:18,720
You have to monitor its progress while it's training.

150
00:09:19,320 --> 00:09:23,290
So for every 50 mini-batches, we take the test

151
00:09:23,330 --> 00:09:28,320
data, we run it through the network as trained up to that point, and we just see how much it got

152
00:09:28,320 --> 00:09:28,650
right.

153
00:09:29,160 --> 00:09:30,480
That's what all of this is doing here.

154
00:09:30,480 --> 00:09:31,080
It's quite simple.

155
00:09:31,080 --> 00:09:33,450
Actually, we calculate accuracy.

156
00:09:33,930 --> 00:09:38,760
We get the epoch number here, we get the actual loss, and then we just print it out here.

157
00:09:39,270 --> 00:09:45,660
So we just use an f-string to print the epoch number, how many mini-batches were completed,

158
00:09:46,140 --> 00:09:49,350
what the actual loss was, which is the running loss divided by 50.

159
00:09:49,950 --> 00:09:53,490
We divide by 50 because that's what we set this to here.

160
00:09:54,900 --> 00:10:00,720
And then we just get the accuracy as well, which is how much it got correct over the total multiplied

161
00:10:00,720 --> 00:10:01,290
by 100.

162
00:10:02,070 --> 00:10:05,580
And then we reset the running loss to zero at this point, at the end here.
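The running-loss bookkeeping can be sketched without any network at all. This is a minimal sketch in which a list of made-up numbers stands in for the per-mini-batch loss values; the `i % report_every` test is an assumed way of detecting every 50th mini-batch, not necessarily the exact condition in the notebook.

```python
# Hypothetical per-mini-batch loss values standing in for the real losses.
fake_losses = [2.0] * 50 + [1.0] * 50

report_every = 50      # print statistics every 50 mini-batches
running_loss = 0.0
reported = []          # (mini-batches completed, average loss) pairs

for i, batch_loss in enumerate(fake_losses):
    running_loss += batch_loss                  # accumulate with +=
    if i % report_every == report_every - 1:    # true at i = 49, 99, ...
        actual_loss = running_loss / report_every
        reported.append((i + 1, actual_loss))
        print(f"Mini-batches completed: {i + 1}, loss: {actual_loss:.3f}")
        running_loss = 0.0                      # start a fresh window

print(reported)  # [(50, 2.0), (100, 1.0)]
```

Resetting the running loss after each report means every printed value is the average over only the most recent 50 mini-batches.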

163
00:10:06,510 --> 00:10:07,740
And that's it.

164
00:10:07,860 --> 00:10:14,370
Oh, sorry, after all of that, we just append the epoch number to its log, and then the loss,

165
00:10:14,550 --> 00:10:18,750
the actual loss after each epoch, to the loss log here, as well as the accuracy.

166
00:10:19,170 --> 00:10:23,640
This is important because we want to be able to plot our results and visualize them afterward.

167
00:10:23,970 --> 00:10:25,170
Our training results.

168
00:10:25,860 --> 00:10:32,310
So you can see when you run this, I would like you all to go press play here to run this big training

169
00:10:32,670 --> 00:10:33,120
loop here.

170
00:10:34,170 --> 00:10:40,380
And you can see after every 50 mini-batches in epoch one, it tells you how many mini-batches we

171
00:10:40,380 --> 00:10:41,590
completed and gives you

172
00:10:41,590 --> 00:10:43,440
the loss. Our loss goes down.

173
00:10:43,980 --> 00:10:46,200
This is a good thing because we want the loss to go down.

174
00:10:46,800 --> 00:10:50,730
And you can see your test accuracy, which is what we were calculating here.

175
00:10:51,120 --> 00:10:51,900
Its accuracy.

176
00:10:52,800 --> 00:10:53,610
This is the accuracy.

177
00:10:53,610 --> 00:10:57,210
Initially, on the first run, it's 65 percent, which is quite good.

178
00:10:57,540 --> 00:11:01,500
So that's the first quick run, and you can see it gets better and better and better.

179
00:11:01,860 --> 00:11:09,080
And you can see after 10 epochs, we get 97.8 percent accuracy on MNIST.

180
00:11:09,120 --> 00:11:12,600
That's quite good for five minutes of training. By the way,

181
00:11:12,630 --> 00:11:18,450
I think the record right now on MNIST stands at maybe 99.3 percent accuracy.

182
00:11:18,960 --> 00:11:22,620
So feel free to experiment and see how close you can get to that. It would be fun.

183
00:11:23,520 --> 00:11:28,700
So we'll now stop here after this lengthy lesson, and we'll take a look at some of the things in the

184
00:11:28,710 --> 00:11:32,970
next section, which is saving our model, so we can save this model and use it

185
00:11:33,120 --> 00:11:33,660
later.

186
00:11:34,560 --> 00:11:40,320
And also taking a look at how we can visualize the results so we can pass input data into a network

187
00:11:40,620 --> 00:11:41,940
and get the outputs afterwards.

188
00:11:42,330 --> 00:11:44,190
So I'll see you in the next section.