1
00:00:01,440 --> 00:00:01,770
‫OK.

2
00:00:02,250 --> 00:00:05,700
‫In the last lecture, we learned what gradient descent is.

3
00:00:06,990 --> 00:00:13,980
‫In this lecture, we are going to see how to use this mathematical technique to find the optimum W's

4
00:00:14,100 --> 00:00:14,790
‫and B's.

5
00:00:16,170 --> 00:00:21,690
‫For this, we first need to understand the error function, which we discussed in the last lecture.

6
00:00:22,590 --> 00:00:26,310
‫Here are the five steps that we use to implement gradient descent.

7
00:00:28,200 --> 00:00:31,890
‫The first step is to give random values to all W's and B's in the system.

8
00:00:32,970 --> 00:00:38,850
‫Then we take one training example and put its X values as input to our system.

9
00:00:40,020 --> 00:00:44,400
‫We process it through the entire network to get one predicted value.

10
00:00:46,860 --> 00:00:53,550
‫Now, in this third step, I told you that we measure the distance between the predicted and the actual

11
00:00:53,550 --> 00:00:55,920
‫value using an error function.

12
00:00:57,990 --> 00:00:59,220
‫Let's see what this means.

13
00:01:00,720 --> 00:01:03,870
‫Suppose we predicted an output of zero point three.

14
00:01:04,700 --> 00:01:07,460
‫whereas the actual value is zero.

15
00:01:09,060 --> 00:01:14,190
‫One way of calculating the error of a prediction could be just to subtract these two.

16
00:01:14,580 --> 00:01:21,540
‫That is, finding actual minus predicted, which will be zero minus zero point three, giving us minus

17
00:01:21,660 --> 00:01:22,590
‫zero point three.

18
00:01:24,740 --> 00:01:31,050
‫To remove this negative sign in the error and focus only on its magnitude, we can simply

19
00:01:31,050 --> 00:01:41,170
‫apply an absolute value function or a square function to it. Basically, minus zero point three

20
00:01:41,190 --> 00:01:44,760
‫would become zero point three, or, if it is squared,

21
00:01:44,850 --> 00:01:46,650
‫it will become zero point zero nine.

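The two simple error measures just described can be sketched in Python as follows; the function names are illustrative, not from any library.

```python
def absolute_error(actual: float, predicted: float) -> float:
    """Magnitude of the difference, ignoring its sign."""
    return abs(actual - predicted)


def squared_error(actual: float, predicted: float) -> float:
    """Square of the difference; also never negative."""
    return (actual - predicted) ** 2


actual, predicted = 0.0, 0.3
raw = actual - predicted                     # negative, because the prediction is above the actual value
abs_err = absolute_error(actual, predicted)  # about 0.3
sq_err = squared_error(actual, predicted)    # about 0.09 (floating point)
print(raw, abs_err, round(sq_err, 4))
```

Both functions remove the sign; squaring additionally shrinks small errors and magnifies large ones.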
22
00:01:50,250 --> 00:01:56,610
‫These two are good measures of error, but they do not work well when we are doing classification with

23
00:01:56,610 --> 00:01:57,510
‫neural networks.

24
00:01:59,590 --> 00:02:02,890
‫For this purpose, we use a different function.

25
00:02:06,430 --> 00:02:09,700
‫This one is called the cross-entropy loss function.

26
00:02:10,970 --> 00:02:12,740
‫It is represented by this formula.

27
00:02:14,170 --> 00:02:26,500
‫E is equal to minus y times log y dash, minus (one minus y) times log of (one minus y dash).

28
00:02:27,190 --> 00:02:32,800
‫y represents the actual value and y dash represents the predicted output value.

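As a sketch, the cross-entropy formula translates directly into Python. This assumes the natural logarithm (the lecture just says "log", but the slope of minus one by y dash used later in back propagation only holds for the natural log) and a prediction strictly between zero and one.

```python
import math


def cross_entropy(y: float, y_pred: float) -> float:
    """Binary cross-entropy E = -[y*log(y') + (1 - y)*log(1 - y')].

    y is the actual label (0 or 1); y_pred is the predicted output y',
    which must lie strictly between 0 and 1 so both logarithms are defined.
    """
    return -(y * math.log(y_pred) + (1 - y) * math.log(1 - y_pred))


# The earlier example: predicted 0.3 when the actual value is 0.
print(round(cross_entropy(0, 0.3), 4))
```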
29
00:02:36,070 --> 00:02:42,010
‫I know this looks complex, much more complex than the two error functions that we saw on the last slide.

30
00:02:43,090 --> 00:02:48,980
‫But the reason for using this is that, for a single sigmoid neuron, this function does not have local minima, unlike squared error.

31
00:02:50,650 --> 00:02:57,550
‫That is, the graph of this function looks like this one on the left and not like this one on the right.

32
00:02:59,380 --> 00:03:07,630
‫If a function has local minima, our gradient descent won't work properly, and it might stop here instead

33
00:03:07,630 --> 00:03:09,890
‫of finding the global minimum, which is here.

34
00:03:12,730 --> 00:03:15,760
‫If you don't understand the last comment, don't worry about it.

35
00:03:16,660 --> 00:03:19,870
‫The simple takeaway is that, for classification problems,

36
00:03:20,560 --> 00:03:24,430
‫the error function to be used is this cross-entropy error function.

37
00:03:26,320 --> 00:03:30,110
‫We can take a look at this error function to build some intuition around it.

38
00:03:32,420 --> 00:03:38,180
‫As you know, in binary classification problems, the actual output value is either zero or one.

39
00:03:39,170 --> 00:03:48,470
‫So if the actual output value is one, the second part of this function, that is, one minus y, makes this entire term

40
00:03:48,470 --> 00:03:54,980
‫become zero, because one minus one is zero. If the actual output is zero,

41
00:03:55,700 --> 00:03:58,730
‫then the first term of this equation will become zero,

42
00:03:58,850 --> 00:04:00,680
‫and only the second term will remain.

43
00:04:02,350 --> 00:04:08,320
‫So let's say the actual output is one. For this error function to be at its minimum,

44
00:04:08,530 --> 00:04:11,050
‫the error should be as close to zero as possible.

45
00:04:12,310 --> 00:04:18,610
‫Let's see: if y is equal to one, the error is minus of one times

46
00:04:18,610 --> 00:04:24,790
‫log y dash, plus (one minus one) times log of (one minus y dash). The second term becomes zero.

47
00:04:25,420 --> 00:04:28,870
‫So we are left with only minus of log y dash.

48
00:04:30,390 --> 00:04:35,640
‫So for this error to be small, minus log y dash should be small.

49
00:04:37,260 --> 00:04:40,350
‫This implies that log y dash should be large.

50
00:04:41,850 --> 00:04:44,820
‫This further implies that y dash should be large.

51
00:04:46,470 --> 00:04:53,790
‫Since our predicted output is between zero and one, y dash being large simply means that y dash

52
00:04:54,180 --> 00:04:56,640
‫should be as close to one as possible.

53
00:04:58,780 --> 00:05:05,000
‫Similarly, if the actual value of the output is zero, the first term of this equation will be zero.

54
00:05:07,690 --> 00:05:12,820
‫So the remaining error function would be minus log of one minus y dash.

55
00:05:14,480 --> 00:05:22,780
‫For this error to be small, log of one minus y dash has to be large, implying that one minus y dash

56
00:05:22,780 --> 00:05:26,990
‫has to be large, implying that y dash should be as small as possible.

57
00:05:27,920 --> 00:05:33,350
‫Although I have not given you the mathematical justification for using this function, I guess with

58
00:05:33,350 --> 00:05:41,240
‫these particular examples, you are getting a feel for how minimizing the error, or loss function, is

59
00:05:41,240 --> 00:05:45,750
‫trying to match the predicted output value to the actual value.

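To make the two cases concrete, the following sketch evaluates the cross-entropy formula (natural log assumed) for a few predictions. With y = 1 the error shrinks as y dash approaches one; with y = 0 it shrinks as y dash approaches zero.

```python
import math


def cross_entropy(y: float, y_pred: float) -> float:
    return -(y * math.log(y_pred) + (1 - y) * math.log(1 - y_pred))


predictions = [0.1, 0.5, 0.9, 0.99]
for y_pred in predictions:
    # y = 1: only -log(y') is left.  y = 0: only -log(1 - y') is left.
    e1 = cross_entropy(1, y_pred)
    e0 = cross_entropy(0, y_pred)
    print(f"y' = {y_pred:<4}  E(y=1) = {e1:.3f}  E(y=0) = {e0:.3f}")

errors_y1 = [cross_entropy(1, p) for p in predictions]
errors_y0 = [cross_entropy(0, p) for p in predictions]
```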
60
00:05:50,210 --> 00:05:57,530
‫So, as you may have guessed, the job of gradient descent is to find the minimum of this error function;

61
00:05:58,430 --> 00:06:05,690
‫that is, we will make small changes to the values of weights and biases in the direction where we

62
00:06:05,690 --> 00:06:08,060
‫get the maximum decrease in error.

63
00:06:09,480 --> 00:06:11,660
‫We will continue changing W's and B's

64
00:06:11,780 --> 00:06:14,540
‫until no further decrease in error is possible.

65
00:06:16,130 --> 00:06:18,470
‫This is how the process looks graphically.

66
00:06:21,040 --> 00:06:28,610
‫For ease of understanding, I have represented all of the weights on one axis and all of the biases on another axis,

67
00:06:29,690 --> 00:06:31,040
‫and on the vertical axis

68
00:06:31,100 --> 00:06:32,940
‫we have the corresponding value of error.

69
00:06:35,030 --> 00:06:38,210
‫These values of error are calculated using the error function.

70
00:06:40,320 --> 00:06:40,610
‫OK.

71
00:06:41,000 --> 00:06:44,280
‫So now let's revisit our steps to implement gradient descent.

72
00:06:46,640 --> 00:06:55,910
‫Again, the first step is setting random initial values of W and B; then we go forward to get the predicted

73
00:06:55,910 --> 00:06:56,660
‫output value.

74
00:06:58,220 --> 00:07:02,780
‫Then we put this predicted output value in our loss function to get the error of the prediction.

75
00:07:04,730 --> 00:07:06,020
‫Now we have the error,

76
00:07:06,170 --> 00:07:11,210
‫W's and B's. Say we have a W value between one and two,

77
00:07:12,240 --> 00:07:17,540
‫a bias value between zero and minus one, and an error value near one.

78
00:07:18,500 --> 00:07:21,440
‫So we're near here on this graph.

79
00:07:24,180 --> 00:07:31,940
‫Now, in the fourth step, we do backward propagation to find the direction of movement on this graph.

80
00:07:33,350 --> 00:07:37,140
‫This means we find delta W and delta B,

81
00:07:37,760 --> 00:07:43,100
‫that is, the changes in W's and B's that will take us toward the minimum point.

82
00:07:45,500 --> 00:07:52,810
‫If you look at this graph, you can probably see that by decreasing the weight and increasing the bias

83
00:07:52,820 --> 00:07:53,480
‫values,

84
00:07:54,560 --> 00:07:57,770
‫we'll be moving closer to the lowest point.

85
00:08:00,680 --> 00:08:03,610
‫So basically, we have initial W's and B's.

86
00:08:04,170 --> 00:08:10,350
‫We will be updating our W to W minus alpha times delta W.

87
00:08:12,530 --> 00:08:17,090
‫And we'll be updating our B to B minus alpha times delta B.

88
00:08:18,740 --> 00:08:20,660
‫Here, alpha is called the learning rate.

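The update rule can be sketched on a hypothetical one-parameter error E(w) = (w − 3)², chosen only because its slope, 2(w − 3), is easy to write down. Repeating W ← W − α·ΔW walks w toward the minimum at 3. The toy error and the value α = 0.1 are assumptions for illustration.

```python
def update(param: float, slope: float, alpha: float) -> float:
    """One gradient descent update: step against the slope, scaled by alpha."""
    return param - alpha * slope


def toy_slope(w: float) -> float:
    """Slope of the hypothetical toy error E(w) = (w - 3)**2."""
    return 2 * (w - 3)


w = 0.0
for _ in range(50):
    w = update(w, toy_slope(w), alpha=0.1)
print(w)  # close to 3, the minimum of the toy error
```

The same update applies to every weight and every bias; in a real network only the way the slope is computed changes.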
89
00:08:22,700 --> 00:08:28,370
‫Basically, delta W and delta B are unit steps: slopes of the error that we calculate using calculus.

90
00:08:29,690 --> 00:08:33,920
‫Alpha controls how many of those steps we take in that direction.

91
00:08:35,990 --> 00:08:42,250
‫You can imagine the impact of large versus small values of alpha. If alpha is large,

92
00:08:42,770 --> 00:08:46,100
‫we are taking multiple steps in the direction of gradient descent.

93
00:08:47,210 --> 00:08:49,840
‫This means that we can reach the bottom faster.

94
00:08:51,320 --> 00:08:55,550
‫But the problem with a large alpha is that we can overshoot the minimum.

95
00:08:57,260 --> 00:08:59,750
‫Imagine you're very near to the bottom.

96
00:09:00,080 --> 00:09:02,930
‫But on the next turn you take 50 steps

97
00:09:03,050 --> 00:09:04,190
‫instead of just one.

98
00:09:05,390 --> 00:09:08,530
‫In such a situation, you will climb the curve on the other side.

99
00:09:10,430 --> 00:09:18,500
‫So a large learning rate can help in faster descent, but it can cause issues in the final stages of convergence.

100
00:09:19,940 --> 00:09:23,330
‫Therefore, a moderate value of the learning rate should be used.

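The effect of the learning rate can be seen on a hypothetical toy error, E(w) = (w − 3)². In this sketch a small α creeps toward the minimum from one side, a larger α jumps past the minimum on its first step and then zig-zags in, and an α that is too large makes every step land farther from the minimum than the last. The specific α values are illustrative assumptions; where the boundaries fall depends on the error function.

```python
def descend(alpha: float, steps: int = 20, w: float = 0.0) -> list[float]:
    """Run gradient descent on the toy error E(w) = (w - 3)**2 and record every w."""
    path = [w]
    for _ in range(steps):
        w = w - alpha * 2 * (w - 3)  # slope of the toy error is 2 * (w - 3)
        path.append(w)
    return path


small = descend(alpha=0.1)      # steady approach, never passes the minimum
large = descend(alpha=0.9)      # overshoots w = 3 on the first step, then zig-zags in
too_large = descend(alpha=1.1)  # each jump lands farther away: divergence
for name, path in [("0.1", small), ("0.9", large), ("1.1", too_large)]:
    print(name, [round(v, 2) for v in path[:4]], "final:", round(path[-1], 2))
```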
101
00:09:24,230 --> 00:09:29,450
‫You will see what value of learning rate to use in the practical section of this course.

102
00:09:31,180 --> 00:09:31,660
‫Very well.

103
00:09:31,960 --> 00:09:39,630
‫So the step to be taken in the direction of the descent is alpha times delta W and alpha times delta

104
00:09:39,650 --> 00:09:40,120
‫B.

105
00:09:42,500 --> 00:09:46,770
‫Now, how do we find delta W and delta B?

106
00:09:47,150 --> 00:09:53,570
‫Delta W is the change in weight and delta B is the change in bias.

107
00:09:54,290 --> 00:09:55,710
‫Basically, we will change

108
00:09:55,740 --> 00:10:00,040
‫the initially set W's and B's in an effort to reduce the error.

109
00:10:04,040 --> 00:10:11,910
‫Now, let us see how to find delta W and delta B. These values are found by doing backward propagation,

110
00:10:13,050 --> 00:10:20,310
‫which means we will look back through the network to find the instantaneous slope of the error with respect to each

111
00:10:20,430 --> 00:10:21,780
‫W and B.

112
00:10:24,510 --> 00:10:28,590
‫Let me take an example with a single neuron to show you how this happens.

113
00:10:29,970 --> 00:10:37,500
‫Otherwise, the mathematics and calculus involved can get quite messy and are often overwhelming for some

114
00:10:37,500 --> 00:10:38,010
‫students.

115
00:10:39,930 --> 00:10:46,200
‫If you're comfortable with calculus, you can look at the complete back propagation theory at the link

116
00:10:46,200 --> 00:10:48,150
‫shared in the description of this lecture.

117
00:10:49,940 --> 00:10:56,900
‫However, I think with this simple example, you will get a solid intuition of how back propagation

118
00:10:56,900 --> 00:10:57,350
‫works.

119
00:10:59,440 --> 00:11:03,070
‫Here's a single neuron with two inputs, X1 and X2.

120
00:11:05,580 --> 00:11:07,940
‫It first does a linear calculation.

121
00:11:08,150 --> 00:11:10,330
‫That is, it calculates the value of z,

122
00:11:10,670 --> 00:11:15,080
‫which is equal to W1 X1 plus W2 X2 plus B1.

123
00:11:17,680 --> 00:11:20,950
‫It then applies a sigmoid to this value of z.

124
00:11:24,920 --> 00:11:29,150
‫This sigmoid of z is the predicted output of this neuron.

125
00:11:30,530 --> 00:11:37,670
‫We use this predicted output with the actual output to get the error for this particular training example.

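The two-stage computation of this neuron, a linear part followed by a sigmoid, can be sketched as a small Python function. The function names and the example inputs below are hypothetical, chosen so that z is exactly zero.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic sigmoid: squashes any real z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron_forward(x1: float, x2: float, w1: float, w2: float, b1: float) -> tuple[float, float]:
    """Return (z, y_pred) for a single neuron with two inputs."""
    z = w1 * x1 + w2 * x2 + b1  # linear part
    y_pred = sigmoid(z)         # activation gives the predicted output y'
    return z, y_pred


z, y_pred = neuron_forward(1.0, 1.0, 0.5, -0.5, 0.0)  # hypothetical inputs and weights
print(z, y_pred)  # 0.0 0.5
```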
126
00:11:40,860 --> 00:11:42,280
‫So let's start with step one.

127
00:11:42,730 --> 00:11:48,160
‫Step one is that we have to randomly initialize the values of the weights and bias.

128
00:11:49,460 --> 00:11:50,420
‫We have two weights.

129
00:11:50,960 --> 00:12:00,230
‫and one bias. We randomly initialize W1 to be two, W2 to be three, and the bias value to be

130
00:12:00,230 --> 00:12:00,940
‫minus four.

131
00:12:04,850 --> 00:12:07,880
‫Now, the second step is forward propagation.

132
00:12:08,300 --> 00:12:15,950
‫That is, we will take one training example and feed in the input values of that training example to get

133
00:12:16,040 --> 00:12:17,300
‫a predicted output.

134
00:12:19,100 --> 00:12:27,230
‫We have taken this training example in which the X1 value is 10, the X2 value is minus four, and the output

135
00:12:27,230 --> 00:12:28,040
‫is one.

136
00:12:28,520 --> 00:12:30,560
‫This Y is the actual output.

137
00:12:31,040 --> 00:12:32,030
‫And it is equal to one.

138
00:12:33,110 --> 00:12:35,180
‫So we have the W one value.

139
00:12:35,990 --> 00:12:37,150
‫We have X1.

140
00:12:37,700 --> 00:12:39,720
‫We have W2, x2 and B1.

141
00:12:39,920 --> 00:12:41,180
‫So we can calculate Z.

142
00:12:42,080 --> 00:12:46,400
‫We put in all these values, two times ten plus three times minus four minus four, to get a z value of 4.

143
00:12:48,320 --> 00:12:49,850
‫We apply the activation function.

144
00:12:50,060 --> 00:12:52,550
‫that is, the sigmoid function, on this value of z

145
00:12:53,430 --> 00:13:02,790
‫to get the predicted output of this neuron. So sigmoid of z, that is, sigmoid of 4, gives a predicted output

146
00:13:02,790 --> 00:13:04,390
‫of 0.982.

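The forward-propagation arithmetic for this training example can be checked with a few lines of Python, using the lecture's initial values and inputs:

```python
import math

w1, w2, b1 = 2.0, 3.0, -4.0   # randomly chosen initial weights and bias
x1, x2, y = 10.0, -4.0, 1     # the training example

z = w1 * x1 + w2 * x2 + b1    # 2*10 + 3*(-4) + (-4)
y_pred = 1.0 / (1.0 + math.exp(-z))  # sigmoid(z)
print(z, round(y_pred, 3))    # 4.0 0.982
```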
147
00:13:06,240 --> 00:13:11,730
‫This predicted output value is the y dash value that we will use in the error function.

148
00:13:13,700 --> 00:13:18,020
‫You can see that this value is already very close to the actual output, which is one.

149
00:13:19,000 --> 00:13:21,380
‫But let's see how we can improve this value.

150
00:13:23,490 --> 00:13:25,880
‫Now, the third step is error calculation.

151
00:13:26,300 --> 00:13:27,920
‫We have the error function with us.

152
00:13:28,670 --> 00:13:30,590
‫We have the predicted output value,

153
00:13:30,770 --> 00:13:34,040
‫that is, y dash, as zero point nine eight two.

154
00:13:34,700 --> 00:13:38,240
‫And we have the actual output value for the training example

155
00:13:38,330 --> 00:13:41,180
‫as one. We put these two values

156
00:13:41,360 --> 00:13:47,210
‫into this error function to get a final error value of about zero point zero one eight, using the natural log.

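Here is a quick check of this error value, assuming the cross-entropy formula from earlier in the lecture. The log base matters: the natural logarithm gives about 0.018, while a base-10 logarithm gives about 0.0079. The natural log is the one consistent with the slope of minus one by y dash used in the back propagation step.

```python
import math

y = 1
y_pred = 1.0 / (1.0 + math.exp(-4.0))  # predicted output for z = 4

# Cross-entropy with the natural log (consistent with dE/dy' = -1/y')
error_ln = -(y * math.log(y_pred) + (1 - y) * math.log(1 - y_pred))
# Same formula with a base-10 log, for comparison only
error_log10 = -(y * math.log10(y_pred) + (1 - y) * math.log10(1 - y_pred))
print(round(error_ln, 4), round(error_log10, 4))
```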
157
00:13:51,520 --> 00:13:54,820
‫Now comes the fourth step, which is back propagation.

158
00:13:56,710 --> 00:14:00,820
‫The next few minutes are going to be a little heavy on mathematics.

159
00:14:00,970 --> 00:14:03,500
‫We will cover some basics of calculus here.

160
00:14:05,230 --> 00:14:08,590
‫If you're not comfortable with this part, it is still OK.

161
00:14:09,070 --> 00:14:12,010
‫This is happening in the background and your software is handling this.

162
00:14:12,340 --> 00:14:18,730
‫But if you have some understanding of calculus, looking at this example will tell you how a neuron

163
00:14:18,760 --> 00:14:19,250
‫does

164
00:14:19,270 --> 00:14:20,200
‫back propagation.

165
00:14:21,760 --> 00:14:27,550
‫Do not worry if you do not understand this, because this is happening in the background and your software

166
00:14:27,550 --> 00:14:28,300
‫is handling this.

167
00:14:29,380 --> 00:14:32,950
‫It is good to have this intuition if you know a little bit of mathematics.

168
00:14:35,310 --> 00:14:37,530
‫So let's see how to do backward propagation.

169
00:14:39,000 --> 00:14:40,110
‫We are at the end.

170
00:14:40,380 --> 00:14:49,440
‫We have calculated the error. The first step is finding out the slope of the error with respect to the predicted output,

171
00:14:50,400 --> 00:14:51,300
‫that is, y dash.

172
00:14:53,550 --> 00:14:54,420
‫This symbol here,

173
00:14:54,540 --> 00:15:00,980
‫∂E/∂y′, read "delta E by delta y dash", simply means that we are finding the instantaneous slope of the error

174
00:15:01,440 --> 00:15:04,920
‫with respect to y dash, keeping everything else constant.

175
00:15:06,140 --> 00:15:12,770
‫So if you know calculus, you can find the derivative of this function with respect to y dash when

176
00:15:12,890 --> 00:15:14,150
‫y is equal to one.

177
00:15:15,170 --> 00:15:19,670
‫This gives us an output of minus one by y dash.

178
00:15:20,930 --> 00:15:29,750
‫We go further back in our network and we find out the slope of our output function with respect to

179
00:15:29,870 --> 00:15:30,260
‫Z.

180
00:15:32,200 --> 00:15:34,330
‫The output function is a sigmoid function.

181
00:15:35,230 --> 00:15:39,910
‫The slope of the sigmoid function with respect to z is this value:

182
00:15:40,750 --> 00:15:46,630
‫e raised to the power minus z, divided by (one plus e raised to the power minus z) whole squared.

183
00:15:48,950 --> 00:15:54,320
‫If you know differentiation, you can differentiate this function with respect to z and you will get

184
00:15:54,620 --> 00:15:56,630
‫this value of slope.

185
00:15:58,780 --> 00:16:08,270
‫Lastly, we find the differential of z with respect to W1, W2 and B. So z was equal to W1 times X1

186
00:16:08,560 --> 00:16:11,070
‫plus W2 times X2 plus B1.

187
00:16:11,800 --> 00:16:19,540
‫So when we find the differential with respect to W1, we get X1, which is equal to 10 at this current

188
00:16:19,540 --> 00:16:29,470
‫point. For W2, we get X2, which is equal to minus four, and for B, we get a slope of one.

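For this example, the three local slopes can be evaluated numerically as below. The comparison with y′(1 − y′), an equivalent closed form of the sigmoid slope, and with a central finite difference is an added sanity check, not something from the lecture.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


x1, x2, z = 10.0, -4.0, 4.0
y_pred = sigmoid(z)

dE_dy = -1.0 / y_pred                               # slope of E = -log(y') when y = 1
dy_dz = math.exp(-z) / (1.0 + math.exp(-z)) ** 2    # slope of the sigmoid at z
dz_dw1, dz_dw2, dz_db = x1, x2, 1.0                 # slopes of z = w1*x1 + w2*x2 + b1

h = 1e-6
dy_dz_numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)  # central difference
print(dE_dy, dy_dz, y_pred * (1 - y_pred), dy_dz_numeric)
```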
189
00:16:33,190 --> 00:16:36,790
‫Next comes the process of combining all of this.

190
00:16:37,600 --> 00:16:41,370
‫We moved back through our network to find all these slopes.

191
00:16:41,890 --> 00:16:49,650
‫But the slope we are actually interested in is how the error function changes with respect to W1,

192
00:16:50,290 --> 00:16:52,440
‫how it changes with respect to W2,

193
00:16:52,630 --> 00:16:59,020
‫and how it changes with respect to B. To find the differential of E with respect to W1,

194
00:16:59,950 --> 00:17:06,580
‫we apply the chain rule, which means that if you want to find the differential of E with respect to W1, you can

195
00:17:06,820 --> 00:17:14,800
‫instead find the differential of E with respect to y dash, multiplied by the differential of y dash with respect to

196
00:17:15,160 --> 00:17:20,310
‫z, multiplied by the differential of z with respect to W1.

197
00:17:21,790 --> 00:17:24,910
‫We have calculated all three of these values on our last slide.

198
00:17:25,210 --> 00:17:29,260
‫You can see at the top here that we know the value of y dash.

199
00:17:29,470 --> 00:17:32,710
‫We know the value of Z for this particular training example.

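Plugging the known values into the chain rule can be sketched in Python. This is an illustrative check, assuming the natural-log cross-entropy: it computes the three products and, independently, estimates the same slopes with central finite differences of the error (a technique added here, not used in the lecture). Worked through this way, the slopes come out near −0.180 for W1, 0.0719 for W2 and −0.0180 for B.

```python
import math


def error(w1: float, w2: float, b1: float, x1: float = 10.0, x2: float = -4.0, y: int = 1) -> float:
    """Cross-entropy error (natural log) of the single neuron on one training example."""
    y_pred = 1.0 / (1.0 + math.exp(-(w1 * x1 + w2 * x2 + b1)))
    return -(y * math.log(y_pred) + (1 - y) * math.log(1 - y_pred))


w1, w2, b1 = 2.0, 3.0, -4.0
x1, x2 = 10.0, -4.0
z = w1 * x1 + w2 * x2 + b1
y_pred = 1.0 / (1.0 + math.exp(-z))

# Chain rule: dE/dparam = dE/dy' * dy'/dz * dz/dparam
dE_dy = -1.0 / y_pred                              # y = 1 case
dy_dz = math.exp(-z) / (1.0 + math.exp(-z)) ** 2   # sigmoid slope
dE_dw1 = dE_dy * dy_dz * x1
dE_dw2 = dE_dy * dy_dz * x2
dE_db = dE_dy * dy_dz * 1.0
print(round(dE_dw1, 4), round(dE_dw2, 4), round(dE_db, 4))

# Independent check: central finite differences of the error itself
h = 1e-6
numeric = (
    (error(w1 + h, w2, b1) - error(w1 - h, w2, b1)) / (2 * h),
    (error(w1, w2 + h, b1) - error(w1, w2 - h, b1)) / (2 * h),
    (error(w1, w2, b1 + h) - error(w1, w2, b1 - h)) / (2 * h),
)
print([round(n, 4) for n in numeric])
```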
200
00:17:33,250 --> 00:17:40,090
‫We can put in all these values and calculate this differential, and it comes out to be about minus zero point

201
00:17:40,120 --> 00:17:40,930
‫one eight zero.

202
00:17:45,320 --> 00:17:49,520
‫We can do a similar exercise for W2 and for B

203
00:17:49,940 --> 00:17:59,990
‫as well. The differential of E with respect to W2 comes out to be about zero point zero seven one nine, and the

204
00:17:59,990 --> 00:18:05,090
‫differential of E with respect to B comes out to be about minus zero point zero one eight zero.

205
00:18:06,470 --> 00:18:13,460
‫Now, these three differentials are the unit steps that we are going to take in the direction of our

206
00:18:13,460 --> 00:18:14,000
‫descent.

207
00:18:15,440 --> 00:18:19,640
‫These are delta W1, delta W2 and delta B.

208
00:18:21,320 --> 00:18:26,210
‫We are going to use these delta values to update our weights and biases

209
00:18:27,860 --> 00:18:33,800
‫so that when we move in the direction we have defined, the loss will be less than the loss that we had earlier.

210
00:18:35,390 --> 00:18:37,190
‫This brings us to the last step.

211
00:18:37,970 --> 00:18:47,540
‫The last step is that we have to update W and B. The new W1 would be the previous W1 minus alpha times delta

212
00:18:47,540 --> 00:18:52,430
‫W1. The previous W1 was two. Alpha,

213
00:18:52,520 --> 00:18:53,620
‫we have taken as five.

214
00:18:54,290 --> 00:19:00,790
‫We have taken a learning rate of five here, and we calculated delta W1 as about minus zero point one eight

215
00:19:00,790 --> 00:19:01,220
‫zero.

216
00:19:03,040 --> 00:19:06,480
‫This updates our W1 value to about two point nine.

217
00:19:08,890 --> 00:19:13,260
‫Similarly, we calculate the new W2 value, and it comes out to be about two point six.

218
00:19:14,050 --> 00:19:17,740
‫And we update the B value, and it is now about minus three point nine.

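As a sketch, the update step with the lecture's learning rate of five looks like this. It recomputes the gradients instead of hard-coding them, using the shortcut dE/dz = y′ − y, which is what the lecture's chain-rule product simplifies to for a sigmoid output with cross-entropy error. The new values come out near 2.90, 2.64 and −3.91, which round to the 2.9, 2.6 and −3.9 quoted in the lecture.

```python
import math

alpha = 5.0                        # learning rate used in the lecture
w1, w2, b1 = 2.0, 3.0, -4.0
x1, x2, y = 10.0, -4.0, 1

z = w1 * x1 + w2 * x2 + b1
y_pred = 1.0 / (1.0 + math.exp(-z))
dE_dz = y_pred - y                 # equals dE/dy' * dy'/dz for sigmoid + cross-entropy
dw1, dw2, db = dE_dz * x1, dE_dz * x2, dE_dz

# Gradient descent update: new = old - alpha * slope
w1, w2, b1 = w1 - alpha * dw1, w2 - alpha * dw2, b1 - alpha * db
print(round(w1, 2), round(w2, 2), round(b1, 2))
```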
219
00:19:18,640 --> 00:19:25,210
‫You can compare the previous and new W1, W2 and B values. Earlier, W1 was two;

220
00:19:25,500 --> 00:19:30,040
‫now it is two point nine. Earlier, W2 was three;

221
00:19:30,310 --> 00:19:31,750
‫now it is two point six.

222
00:19:32,770 --> 00:19:34,470
‫Earlier, B was minus four;

223
00:19:34,690 --> 00:19:36,430
‫now it is minus three point nine.

224
00:19:39,460 --> 00:19:46,720
‫Now, since we have updated our W and B values, we have to go back to step two.

225
00:19:47,050 --> 00:19:54,160
‫We have to iterate: we have to do forward propagation again, and we will calculate the predicted output

226
00:19:54,190 --> 00:19:54,550
‫again.

227
00:19:55,840 --> 00:20:04,420
‫So this is the training example: X1 is ten, X2 is minus four, y is one. We put these values in

228
00:20:04,510 --> 00:20:06,550
‫with our updated weights and bias.

229
00:20:07,590 --> 00:20:15,000
‫This time, with these rounded values, the z value comes out to be fourteen point seven. When we apply our activation function

230
00:20:15,030 --> 00:20:18,840
‫on this z value, we get the predicted output value,

231
00:20:18,960 --> 00:20:21,630
‫that is, y dash, as about zero point nine nine nine.

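The second forward pass can be reproduced with the rounded updated values (2.9, 2.6 and −3.9) as a quick check. The helper name forward is hypothetical. The sketch also compares the error before and after the update, using cross-entropy with the natural log and y = 1.

```python
import math


def forward(w1: float, w2: float, b1: float, x1: float = 10.0, x2: float = -4.0) -> tuple[float, float]:
    """Return (z, y') for the single neuron on the lecture's training example."""
    z = w1 * x1 + w2 * x2 + b1
    return z, 1.0 / (1.0 + math.exp(-z))


z_old, y_old = forward(2.0, 3.0, -4.0)
z_new, y_new = forward(2.9, 2.6, -3.9)
loss_old, loss_new = -math.log(y_old), -math.log(y_new)  # cross-entropy with y = 1
print(round(z_new, 2), y_new, loss_old, loss_new)
```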
232
00:20:23,550 --> 00:20:27,690
‫If you remember, last time we got a predicted value of zero point nine eight two.

233
00:20:28,410 --> 00:20:34,350
‫So clearly this is an improvement over the last values of W's and B's.

234
00:20:36,500 --> 00:20:39,470
‫This process is repeated several times

235
00:20:39,770 --> 00:20:41,740
‫until we get the minimum error.

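Putting the five steps together, the repeated process for this single neuron can be sketched as a loop. This is a minimal illustration, not how a software tool implements it: it trains on one example only, the lecture's initial values stand in for step one, and it uses the combined slope dE/dz = y′ − y, which is what the chain rule gives for a sigmoid output with cross-entropy error.

```python
import math


def train_single_neuron(x1, x2, y, w1, w2, b1, alpha, epochs):
    """Repeat steps 2 to 5 of gradient descent on one training example.

    Step 1 (initialization) is done by the caller, who passes starting w1, w2, b1.
    Returns the final (w1, w2, b1) and the error recorded before each update.
    """
    errors = []
    for _ in range(epochs):
        z = w1 * x1 + w2 * x2 + b1               # step 2: forward propagation
        y_pred = 1.0 / (1.0 + math.exp(-z))
        p = min(max(y_pred, 1e-12), 1 - 1e-12)  # keep both logs defined
        errors.append(-(y * math.log(p) + (1 - y) * math.log(1 - p)))  # step 3: error
        dE_dz = y_pred - y                       # step 4: chain rule, combined
        w1 -= alpha * dE_dz * x1                 # step 5: update weights and bias
        w2 -= alpha * dE_dz * x2
        b1 -= alpha * dE_dz
    return (w1, w2, b1), errors


params, errors = train_single_neuron(10.0, -4.0, 1, 2.0, 3.0, -4.0, alpha=5.0, epochs=3)
print(params)
print(errors)  # each error is smaller than the one before
```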
236
00:20:44,370 --> 00:20:52,020
‫If we have a lot of neurons in our network, the same process is followed. In forward propagation, we

237
00:20:52,020 --> 00:20:55,650
‫go to the end to find our predicted output value.

238
00:20:56,370 --> 00:20:59,400
‫We use that predicted output value to find the loss.

239
00:21:00,330 --> 00:21:01,800
‫Then, step by step,

240
00:21:02,170 --> 00:21:11,730
‫we come back, compute these differentials, find the individual slopes of the error function, and

241
00:21:11,730 --> 00:21:18,180
‫then we update our weights and biases so that the final error is reduced.

242
00:21:20,380 --> 00:21:27,400
‫Again, I will repeat that I understand that this lecture was a little mathematics heavy, but if you

243
00:21:27,400 --> 00:21:30,650
‫have some background in calculus, I'm sure you would have understood it.

244
00:21:31,240 --> 00:21:36,880
‫But if you do not have any background in calculus, I understand that you would be facing some difficulty

245
00:21:36,880 --> 00:21:38,920
‫in following all the things that I said.

246
00:21:40,860 --> 00:21:43,190
‫Try to listen to this lecture again

247
00:21:43,260 --> 00:21:48,720
‫if you are facing difficulty. If you're still unable to follow the concept here,

248
00:21:49,530 --> 00:21:50,190
‫do not worry.

249
00:21:51,270 --> 00:21:54,690
‫You can still implement a neural network in a software tool.

250
00:21:55,530 --> 00:21:58,980
‫All this mathematical calculation will be done by this software tool.

251
00:21:59,220 --> 00:22:01,740
‫And you do not have to do anything on your own.

252
00:22:02,820 --> 00:22:04,380
‫That is the beauty of neural networks.

253
00:22:04,470 --> 00:22:08,190
‫If you have to do it by hand, it will take a lot of effort.

254
00:22:08,850 --> 00:22:15,630
‫But with computers, you can have millions of neurons and millions of features and your computer will

255
00:22:15,630 --> 00:22:16,840
‫still be able to solve it.

256
00:22:19,890 --> 00:22:21,630
‫So do focus on the practical lecture.

257
00:22:21,770 --> 00:22:26,060
‫That is where you will learn how to implement these neural networks in a software tool.