1
00:00:01,080 --> 00:00:04,860
Now we want to understand back propagation.

2
00:00:04,890 --> 00:00:14,190
Back propagation is the method that deep neural networks use to optimize, or train,

3
00:00:14,190 --> 00:00:14,940
the network.

4
00:00:14,940 --> 00:00:22,350
What actually happens in training is simply that the weights and biases change

5
00:00:22,350 --> 00:00:28,530
so that the network maps the inputs to the outputs in the data.

6
00:00:28,830 --> 00:00:30,810
So let's do that.

7
00:00:30,810 --> 00:00:33,690
But let's do it by hand.

8
00:00:33,690 --> 00:00:40,860
To do so, let's say we have an input value x.

9
00:00:41,950 --> 00:00:44,260
X equals one.

10
00:00:44,990 --> 00:00:52,100
And an output value y.

11
00:00:53,390 --> 00:00:55,400
Y equals two.

12
00:00:56,270 --> 00:00:59,810
Now this is what we have.

13
00:00:59,810 --> 00:01:00,560
And.

14
00:01:01,180 --> 00:01:09,280
We need a network that multiplies the input by w and gives the output, which is two.

15
00:01:09,490 --> 00:01:13,600
So what we do is take an input x.

16
00:01:14,230 --> 00:01:18,820
This input x will go into the

17
00:01:19,610 --> 00:01:23,240
neural network, and

18
00:01:23,900 --> 00:01:29,840
the neural network, or rather this operation, needs w, the weight.

19
00:01:30,890 --> 00:01:39,380
So for this weight, let's say its initial value is one.

20
00:01:39,830 --> 00:01:47,750
This is of course the initial value of w. x is one and y is two.

21
00:01:48,530 --> 00:01:51,530
And W initially is one.

22
00:01:52,340 --> 00:01:57,230
Now the question is: is this value of one correct or not?

23
00:01:57,260 --> 00:02:04,220
Well, for back propagation we need to understand how much we need to

24
00:02:04,220 --> 00:02:10,970
change w, that is, how w is related to our loss, the difference between the prediction and the target output.

25
00:02:11,810 --> 00:02:13,880
So let's go step by step now.

26
00:02:13,910 --> 00:02:16,760
We need to multiply w by x.

27
00:02:16,850 --> 00:02:22,100
And this is going to be the operation, which is multiplication.

28
00:02:22,310 --> 00:02:26,330
And we make it a bit bigger I think 45.

29
00:02:27,580 --> 00:02:28,450
Or 50.

30
00:02:29,460 --> 00:02:35,520
So here is the multiplication, and what we will get is

31
00:02:37,410 --> 00:02:41,280
y, or as we call it, y hat.

32
00:02:42,420 --> 00:02:43,950
Y hat.

33
00:02:44,460 --> 00:02:53,490
Now y hat will go into another operation. Why? Because we don't know.

34
00:02:54,100 --> 00:02:57,430
We don't know whether this y hat is correct or not.

35
00:02:57,430 --> 00:03:02,410
We need to compare it with the real value, which is the y value.

36
00:03:02,410 --> 00:03:09,610
So we have the y value which equals of course two.

37
00:03:10,740 --> 00:03:11,220
Or.

38
00:03:11,220 --> 00:03:18,870
Yeah, let's just keep it like this, as y, and it will enter this operation.

39
00:03:18,870 --> 00:03:21,960
And what we need is a subtraction.

40
00:03:22,500 --> 00:03:26,520
Subtract y from y hat and we will get

41
00:03:27,270 --> 00:03:27,810
a value,

42
00:03:27,810 --> 00:03:30,840
another value, the difference, which we call s.

43
00:03:31,890 --> 00:03:35,130
So here is going to be S.

44
00:03:35,640 --> 00:03:38,610
Now, after that, what do we need?

45
00:03:40,230 --> 00:03:44,070
The difference might be negative or it might be positive.

46
00:03:44,250 --> 00:03:49,170
What we care about is not really whether it's positive or negative.

47
00:03:49,170 --> 00:03:52,710
So usually we want to square the value.

48
00:03:53,130 --> 00:03:57,720
We want the squared value of s.

49
00:03:57,720 --> 00:04:04,440
This is like, for example, the mean squared error, which actually squares the error. So we

50
00:04:04,440 --> 00:04:07,230
will do another operation, which is

51
00:04:07,970 --> 00:04:15,710
that squaring, because for the loss itself, whether the number

52
00:04:15,710 --> 00:04:18,620
is higher or lower, both are bad.

53
00:04:18,620 --> 00:04:25,970
So what we care about is minimizing the actual loss, which is the absolute value or, here, the squared

54
00:04:25,970 --> 00:04:28,160
value of the error.

55
00:04:28,520 --> 00:04:30,320
So what we

56
00:04:32,180 --> 00:04:34,760
have is s to the power of two.

57
00:04:35,480 --> 00:04:37,190
Now after that.

58
00:04:38,200 --> 00:04:40,180
After it gets squared,

59
00:04:40,180 --> 00:04:41,620
this is actually

60
00:04:42,300 --> 00:04:46,050
the value we call the loss value.

61
00:04:47,230 --> 00:04:47,980
Loss.

62
00:04:48,850 --> 00:04:49,870
So.

63
00:04:50,420 --> 00:04:53,720
This is the system, like our system.

64
00:04:53,720 --> 00:04:56,180
But so far, what did we actually do?

65
00:04:56,210 --> 00:04:57,050
What did we do?

66
00:04:57,050 --> 00:04:57,980
Did we do any

67
00:04:57,980 --> 00:04:58,970
back propagation?

68
00:04:59,120 --> 00:05:00,350
Not so far.

69
00:05:00,350 --> 00:05:08,090
In order to do back propagation, we first need to do what we call a forward

70
00:05:08,090 --> 00:05:10,520
pass, and then we do the backward pass.

71
00:05:10,520 --> 00:05:21,620
So for this simple system, let's take a slightly thicker line, and let's choose green.

72
00:05:21,620 --> 00:05:24,740
And w, we said, is one.

73
00:05:24,740 --> 00:05:26,450
This is of course the initial value.

74
00:05:26,450 --> 00:05:34,190
And x also equals one. One multiplied by one, so y hat will be equal to...

75
00:05:34,190 --> 00:05:38,690
Well, y hat will be...

76
00:05:38,870 --> 00:05:47,240
Actually, this y hat equals one multiplied by one, which equals one.

77
00:05:47,960 --> 00:05:48,830
y,

78
00:05:48,830 --> 00:06:00,590
on the other hand, has the value two, so y hat minus y is one minus two, which equals minus one,

79
00:06:00,590 --> 00:06:05,120
so s will be minus one.

80
00:06:05,720 --> 00:06:11,600
You square this, and what you get is a loss equal to one.

81
00:06:12,380 --> 00:06:16,790
So this is, you know, a forward pass.
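
As a rough sketch, the forward pass just described can be written in plain Python. The function name `forward` and the variable names are our own choices for illustration; the lecture only works through the numbers by hand.

```python
# Forward pass for the one-weight example: x = 1, y = 2, initial w = 1.
def forward(w, x, y):
    y_hat = w * x      # prediction: the weight multiplied by the input
    s = y_hat - y      # difference between the prediction and the target
    loss = s ** 2      # squared difference, which is the loss
    return y_hat, s, loss

y_hat, s, loss = forward(w=1.0, x=1.0, y=2.0)
print(y_hat, s, loss)  # 1.0 -1.0 1.0
```

The three printed numbers are the y hat, s and loss values from the hand calculation.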

82
00:06:16,820 --> 00:06:21,020
Now, we haven't solved the problem yet.

83
00:06:21,110 --> 00:06:27,140
What we need to know is how much to change w to make

84
00:06:27,140 --> 00:06:28,970
the loss zero.

85
00:06:29,000 --> 00:06:31,490
Usually it's a minimization problem.

86
00:06:31,490 --> 00:06:32,900
So what should I do?

87
00:06:32,930 --> 00:06:36,260
What you should do is basically apply the chain rule.

88
00:06:36,650 --> 00:06:39,980
Or rather, we need to apply back propagation.

89
00:06:39,980 --> 00:06:42,530
But actually the back propagation is based on the chain rule.

90
00:06:42,530 --> 00:06:51,830
In order to connect the loss with w, what we calculate is,

91
00:06:51,830 --> 00:07:01,940
let's say, d loss over d w.

92
00:07:01,970 --> 00:07:08,600
What will it equal? It will equal d loss

93
00:07:11,100 --> 00:07:13,770
over d s,

94
00:07:14,070 --> 00:07:15,270
this one,

95
00:07:16,270 --> 00:07:17,290
this s,

96
00:07:20,100 --> 00:07:22,800
multiplied by, this one,

97
00:07:22,800 --> 00:07:24,540
we put a d s here,

98
00:07:25,090 --> 00:07:28,360
and then divide by d y hat.

99
00:07:29,420 --> 00:07:29,990
d

100
00:07:30,380 --> 00:07:36,350
y hat, not y. This one is y hat, the predicted y.

101
00:07:37,160 --> 00:07:45,560
Multiplied by d y hat over d

102
00:07:46,150 --> 00:07:47,170
w.

103
00:07:47,170 --> 00:07:51,730
And this will equal d loss over d w.

104
00:07:52,900 --> 00:07:59,440
So what we actually did is basically,

105
00:08:00,260 --> 00:08:04,160
well, we kind of expanded the whole thing.

106
00:08:04,160 --> 00:08:11,090
We care about d loss over d w, but we cannot go directly from the loss to w.

107
00:08:11,120 --> 00:08:17,750
We have to go through a process. We know d loss over d s, and we can

108
00:08:17,750 --> 00:08:22,700
connect d s to d y hat and d y hat to d w, but we cannot connect the loss to w directly.

109
00:08:22,700 --> 00:08:23,810
So we use the chain rule.

110
00:08:23,810 --> 00:08:30,530
And the chain rule is basically this: if we have d loss over d s and d s here, this and this cancel each

111
00:08:30,530 --> 00:08:33,170
other, and this and this cancel each other.

112
00:08:33,170 --> 00:08:35,390
And we are left with d loss over d w.

113
00:08:35,420 --> 00:08:38,630
So the idea of the chain rule is quite interesting.

114
00:08:38,630 --> 00:08:41,930
And this is how it works.
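
Written out as an equation, the chain rule described here looks roughly like the following. Using the symbol L for the loss is our own shorthand; s, y hat and w are the quantities from the diagram.

```latex
\frac{\partial L}{\partial w}
  = \frac{\partial L}{\partial s}
    \cdot \frac{\partial s}{\partial \hat{y}}
    \cdot \frac{\partial \hat{y}}{\partial w},
\qquad
L = s^{2}, \quad s = \hat{y} - y, \quad \hat{y} = w\,x .
```

The "cancelling" of d s and d y hat is a way to remember the rule, not a literal division.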

115
00:08:42,830 --> 00:08:44,000
What is the problem?

116
00:08:44,000 --> 00:08:49,850
Now we need to calculate every term here.

117
00:08:53,150 --> 00:08:57,980
What is the derivative of each of these terms?

118
00:08:53,150 --> 00:08:57,980
So basically the loss, we said, equals s squared.

119
00:08:58,010 --> 00:09:01,670
So if we say loss

120
00:09:02,640 --> 00:09:05,190
equals s squared,

121
00:09:05,220 --> 00:09:10,530
what is the derivative of this loss?

122
00:09:12,640 --> 00:09:13,900
Or, d,

123
00:09:14,590 --> 00:09:14,980
well,

124
00:09:14,980 --> 00:09:15,670
loss,

125
00:09:16,690 --> 00:09:22,480
d loss over d s will be...

126
00:09:23,350 --> 00:09:24,640
Will be what?

127
00:09:24,640 --> 00:09:25,660
Two,

128
00:09:26,520 --> 00:09:27,420
two

129
00:09:28,200 --> 00:09:28,650
s.

130
00:09:32,280 --> 00:09:35,790
So the derivative d loss over

131
00:09:36,690 --> 00:09:38,940
d s is two s.

132
00:09:39,580 --> 00:09:42,490
That's quite useful now.

133
00:09:42,580 --> 00:09:45,010
What if we.

134
00:09:45,280 --> 00:09:46,540
What if we put.

135
00:09:46,540 --> 00:09:49,540
Let's say I will take this orange.

136
00:09:49,720 --> 00:09:53,410
So we already know this is the derivative.

137
00:09:53,410 --> 00:09:58,750
So how much is s? s is minus one.

138
00:09:58,750 --> 00:10:03,190
And if we put s in here, what value do we get?

139
00:10:03,190 --> 00:10:06,610
We get minus...

140
00:10:08,170 --> 00:10:12,010
You will get minus two. Just a moment,

141
00:10:12,010 --> 00:10:14,560
we'll just move it over a bit.

142
00:10:14,680 --> 00:10:21,040
Well, basically, if we apply this one, it will be... well, okay,

143
00:10:21,040 --> 00:10:23,080
I'll just write it: minus two.

144
00:10:23,110 --> 00:10:26,440
This is d loss over d s.

145
00:10:26,440 --> 00:10:33,880
This is not d loss over d w; we already know the value of s, and we applied it in this

146
00:10:33,880 --> 00:10:34,600
equation.

147
00:10:34,600 --> 00:10:36,670
And then we get minus two.

148
00:10:36,700 --> 00:10:41,650
This is, again, d loss over d s.

149
00:10:42,470 --> 00:10:43,910
Let's go back here.

150
00:10:43,910 --> 00:10:49,340
What does d s equal? d

151
00:10:50,370 --> 00:10:55,740
s over d, well, y hat.

152
00:10:56,530 --> 00:10:57,850
Y hat.

153
00:10:58,510 --> 00:11:01,180
Of course, these are all partial derivatives.

154
00:11:04,840 --> 00:11:07,660
And we put it here a little bit.

155
00:11:07,660 --> 00:11:08,110
Yeah.

156
00:11:08,110 --> 00:11:10,420
d s over d y hat, what will it equal?

157
00:11:10,420 --> 00:11:12,190
It will equal

158
00:11:13,280 --> 00:11:15,500
d of y hat...

159
00:11:16,370 --> 00:11:17,600
d of...

160
00:11:18,400 --> 00:11:21,430
Well, just make it a bit smaller.

161
00:11:25,720 --> 00:11:26,860
It's too small.

162
00:11:33,700 --> 00:11:37,570
So it will equal d of y hat,

163
00:11:37,600 --> 00:11:38,830
y hat

164
00:11:39,640 --> 00:11:42,910
minus y, minus y,

165
00:11:44,600 --> 00:11:46,070
over d

166
00:11:47,430 --> 00:11:51,930
well, y hat, this one, y hat, the predicted one.

167
00:11:52,800 --> 00:11:53,820
Y hat.

168
00:11:54,840 --> 00:12:00,960
And then what will this derivative be?

169
00:12:01,920 --> 00:12:12,840
d y hat over d y hat equals one, and minus d y over d y hat equals zero,

170
00:12:12,840 --> 00:12:19,530
because y doesn't change with y hat. y is a constant, so it becomes zero.

171
00:12:19,830 --> 00:12:27,600
And now we have this derivative.

172
00:12:28,840 --> 00:12:30,070
Now,

173
00:12:30,460 --> 00:12:33,250
how much is d s over d y hat?

174
00:12:33,280 --> 00:12:37,900
Well, it is one. Whether y hat is one or whatever it is,

175
00:12:37,900 --> 00:12:40,330
this derivative is always equal to one.

176
00:12:40,330 --> 00:12:42,370
So let's write it down.

177
00:12:42,370 --> 00:12:44,650
It equals

178
00:12:45,250 --> 00:12:46,270
One.

179
00:12:47,800 --> 00:12:49,480
What else do we have?

180
00:12:49,480 --> 00:12:55,060
We still have d y hat over d w.

181
00:12:56,940 --> 00:12:59,940
What does it equal?

182
00:13:00,570 --> 00:13:02,280
So d y hat

183
00:13:03,160 --> 00:13:04,750
over d w.

184
00:13:06,350 --> 00:13:08,990
Let's write it a little bit...

185
00:13:10,100 --> 00:13:11,030
Maybe.

186
00:13:11,720 --> 00:13:12,560
How about here?

187
00:13:12,560 --> 00:13:13,520
We put it here.

188
00:13:13,520 --> 00:13:14,900
This one is here.

189
00:13:16,260 --> 00:13:16,890
d

190
00:13:17,890 --> 00:13:20,140
y hat

191
00:13:21,700 --> 00:13:24,640
over d w,

192
00:13:26,240 --> 00:13:27,860
what will it equal?

193
00:13:32,710 --> 00:13:34,060
It will equal.

194
00:13:36,870 --> 00:13:43,230
We cannot take the derivative directly, because we don't know what y hat is unless we write y hat as an equation.

195
00:13:43,620 --> 00:13:51,990
So it becomes d of... y hat is w multiplied by x, so d of w multiplied by x

196
00:13:53,000 --> 00:13:55,220
over d w.

197
00:13:56,790 --> 00:13:59,370
This is now easier.

198
00:14:00,230 --> 00:14:06,980
Differentiate this, again, d y hat over d w,

199
00:14:06,980 --> 00:14:07,700
this one.

200
00:14:08,480 --> 00:14:16,640
When we differentiate w times x with respect to w, d w over d w equals one, and x stays as a factor.

201
00:14:16,640 --> 00:14:20,270
That means the final answer will be x.

202
00:14:22,900 --> 00:14:24,430
And that's it.

203
00:14:24,430 --> 00:14:27,610
That's the value we have. Just a moment,

204
00:14:27,610 --> 00:14:31,390
I'll move these things so they look a little better.

205
00:14:32,950 --> 00:14:36,520
And put this one in its location.

206
00:14:36,760 --> 00:14:37,600
Maybe.

207
00:14:39,040 --> 00:14:39,250
Right.

208
00:14:40,380 --> 00:14:40,800
Here.

209
00:14:42,530 --> 00:14:52,610
So what will the value of this one be? Let's use orange. x is one, so we put the x value

210
00:14:52,610 --> 00:14:53,090
in here.

211
00:14:53,090 --> 00:14:54,140
What do we get?

212
00:14:54,260 --> 00:14:56,690
We get one.

213
00:14:58,440 --> 00:14:59,220
So.

214
00:15:00,430 --> 00:15:02,050
Let's calculate this guy.

215
00:15:02,890 --> 00:15:05,140
Let's use, let's say, blue.

216
00:15:06,390 --> 00:15:07,980
d loss over d s.

217
00:15:08,010 --> 00:15:09,090
How much is it?

218
00:15:09,120 --> 00:15:10,170
Minus two.

219
00:15:11,420 --> 00:15:14,870
Minus two multiplied by

220
00:15:15,380 --> 00:15:19,760
d s over d... what is it? d s over d

221
00:15:20,730 --> 00:15:21,360
y hat,

222
00:15:21,360 --> 00:15:21,780
sorry.

223
00:15:22,260 --> 00:15:24,180
It will be one.

224
00:15:25,740 --> 00:15:33,690
Multiplied by d y hat over d w, of course, which will be one.

225
00:15:35,490 --> 00:15:44,040
What will all of this equal? One multiplied by one is one, multiplied by minus two is minus two.

226
00:15:44,880 --> 00:15:54,180
So basically this is the value we have, which is minus two.
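
The backward pass worked out above can be sketched in plain Python, multiplying the three local derivatives together. The names `backward`, `loss_fn` and the step size `h` are our own, introduced only for illustration; the finite-difference check at the end is an extra sanity test that is not part of the lecture.

```python
# Backward pass for the one-weight example, using the chain rule.
def backward(w, x, y):
    y_hat = w * x
    s = y_hat - y
    dloss_ds = 2 * s     # loss = s^2, so d loss / d s = 2 s
    ds_dyhat = 1.0       # s = y_hat - y, and y is a constant
    dyhat_dw = x         # y_hat = w * x
    return dloss_ds * ds_dyhat * dyhat_dw

def loss_fn(w, x, y):
    return (w * x - y) ** 2

grad = backward(w=1.0, x=1.0, y=2.0)
print(grad)  # -2.0

# Sanity check (an addition): a central finite difference around w = 1
# should give approximately the same slope.
h = 1e-6
numeric = (loss_fn(1.0 + h, 1.0, 2.0) - loss_fn(1.0 - h, 1.0, 2.0)) / (2 * h)
```

Here `numeric` comes out very close to minus two, which agrees with the chain-rule result.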

227
00:15:55,520 --> 00:16:00,200
The loss changes with w at a rate of minus two.

228
00:16:00,230 --> 00:16:05,630
That means if w increases, the loss will decrease.

229
00:16:05,630 --> 00:16:07,880
So what do we need to do?

230
00:16:08,060 --> 00:16:09,920
We need to increase w.

231
00:16:10,520 --> 00:16:12,500
As a matter of fact, that is correct.

232
00:16:12,500 --> 00:16:22,280
If we set w to two, two multiplied by one gives y hat equal to two, and two minus two is going to be equal

233
00:16:22,280 --> 00:16:27,890
to zero, which makes the loss also equal to zero.
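
One way to turn "increase w" into a concrete update is a single gradient descent step, sketched below. The lecture simply sets w to two by hand; the update rule and the `learning_rate` of 0.5 are our own assumptions, picked so that one step lands exactly on w equal to two.

```python
# One gradient descent step on the one-weight example.
x, y = 1.0, 2.0
w = 1.0
learning_rate = 0.5  # assumed value, not given in the lecture

grad = 2 * (w * x - y) * x       # d loss / d w = 2 * s * x
w = w - learning_rate * grad     # move w against the gradient
loss = (w * x - y) ** 2          # loss after the update
print(w, loss)  # 2.0 0.0
```

With a smaller learning rate, w would move toward two over several steps instead of reaching it in one.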

234
00:16:28,130 --> 00:16:36,080
So with this case we understand how we can change w to minimize the loss.

235
00:16:36,970 --> 00:16:41,890
So this is basically the idea of back propagation.

236
00:16:41,890 --> 00:16:48,220
And this is, in principle, how every neural network gets trained.

237
00:16:48,220 --> 00:16:53,260
We just calculate how much every weight is connected to the loss.

238
00:16:53,260 --> 00:16:58,660
And next class we will apply this using PyTorch.
