1
00:00:00,960 --> 00:00:01,450
Hello.

2
00:00:01,500 --> 00:00:02,710
Welcome back.

3
00:00:02,710 --> 00:00:10,190
In this lesson we shall begin the process of building our own deep learning library from scratch.

4
00:00:10,260 --> 00:00:15,270
Earlier on we saw how to, you know, compute cat or no cat.

5
00:00:15,270 --> 00:00:23,070
We saw how to create a model using logistic regression to predict whether an image has a cat or not.

6
00:00:23,070 --> 00:00:29,130
Now we are going to use a deep neural network to compute the same.

7
00:00:29,720 --> 00:00:30,900
That same problem.

8
00:00:30,900 --> 00:00:35,940
We're going to use a deep neural network for solving the same problem.

9
00:00:36,270 --> 00:00:45,810
So in the first four or five lessons we're going to just create a number of functions for our deep learning

10
00:00:45,810 --> 00:00:50,210
library, and this first network is going to be a two-layer network.

11
00:00:50,520 --> 00:00:55,320
It's gonna have an input layer and a single hidden layer, right.

12
00:00:55,410 --> 00:00:59,370
And then later on we shall create another function

13
00:00:59,390 --> 00:01:06,930
that can take any size, so you can just say you want five hidden layers and you'll be able to call it

14
00:01:06,930 --> 00:01:08,790
with like one line of code.

15
00:01:08,790 --> 00:01:10,600
So later on we shall implement.

16
00:01:10,770 --> 00:01:13,240
implement a function that accepts, as

17
00:01:13,280 --> 00:01:19,190
an argument, the number of layers, but for this first one we'll take a look at two layers.

18
00:01:19,230 --> 00:01:25,790
After this you'll get a grasp of what is required to write one that takes in a number and then creates, um,

19
00:01:25,860 --> 00:01:27,240
the layers.

20
00:01:27,360 --> 00:01:31,230
So I'm gonna start off by opening my IDE over here.

21
00:01:32,010 --> 00:01:33,780
Then I press Ctrl+S to save

22
00:01:38,850 --> 00:01:45,630
and I'm going to call this nn_lib_v1.

23
00:01:46,090 --> 00:01:47,690
Right.

24
00:01:47,920 --> 00:01:49,220
So let's start off.

25
00:01:49,930 --> 00:01:50,290
Okay.

26
00:01:50,350 --> 00:01:58,120
So let's start off by importing NumPy. I'm gonna import numpy as np.

27
00:02:00,840 --> 00:02:01,630
next.

28
00:02:01,680 --> 00:02:08,130
Let's start by writing a function that will initialize the parameters, and our parameters are the weights

29
00:02:08,220 --> 00:02:10,070
and the biases, W and b.

30
00:02:10,180 --> 00:02:14,840
So this function is going to initialize parameters for a two-layer network.

31
00:02:15,180 --> 00:02:16,620
Right.

32
00:02:16,680 --> 00:02:23,130
So, um, as arguments the function is going to take the, um, the size of the input layer, meaning the

33
00:02:23,130 --> 00:02:30,240
number of input nodes, and then we can pass the, um, the size of the hidden layer, meaning the number of nodes

34
00:02:30,240 --> 00:02:34,770
for the hidden layer, and then the number of nodes for the output layer as well.

35
00:02:34,890 --> 00:02:36,480
So I'm gonna come here and say def

36
00:02:45,660 --> 00:02:46,780
my Tab key isn't working.

37
00:02:46,790 --> 00:02:51,440
Sorry about that. And then we're gonna call this initialize_parameters,

38
00:02:56,070 --> 00:02:59,630
and the first argument is the size of the input layer.

39
00:03:00,540 --> 00:03:02,510
I'll call this n_x.

40
00:03:02,730 --> 00:03:14,010
The second argument is the size of the hidden layer, n_h, h for hidden, and last the size of our output layer, n_y, like

41
00:03:14,010 --> 00:03:16,620
this.

42
00:03:16,620 --> 00:03:20,540
So we need to initialize the weights and the biases.

43
00:03:20,700 --> 00:03:31,480
Okay, so I'm gonna seed: I'm gonna say np.random.seed, to make my results repeatable.

44
00:03:33,420 --> 00:03:40,320
And then once that is done I'm gonna say W equals... W1, 'cause we have W1 and W2. I'll say W1

45
00:03:40,320 --> 00:03:40,770
equals

46
00:03:43,710 --> 00:03:59,820
np.random.randn, and then the size I want is n_h, n_x; then I'll multiply by zero point

47
00:03:59,820 --> 00:04:06,600
zero one, 0.01 here. Then W2 equals the same thing, actually:

48
00:04:09,270 --> 00:04:15,780
np.random.randn; this time the size is

49
00:04:18,690 --> 00:04:36,840
n_y and then n_h, and then multiply by 0.01, right. So once we've done that we can create

50
00:04:36,870 --> 00:04:44,010
one for the bias. We have b1 equals... we initialize the bias with zeros and initialize the weights with

51
00:04:44,100 --> 00:04:58,160
random numbers. So b1 equals np.zeros, and the size here is the hidden layer size by 1, and then we've

52
00:04:58,170 --> 00:05:07,260
got to surround it with another pair of brackets here. Next, b2: b2 equals np.zeros,

53
00:05:12,220 --> 00:05:13,690
and the size.

54
00:05:13,690 --> 00:05:24,870
This is also a column vector; this one's size is the size of the output layer by one, right.

55
00:05:25,460 --> 00:05:25,880
Okay.

56
00:05:27,890 --> 00:05:28,970
Um yeah.

57
00:05:29,180 --> 00:05:36,500
So this is all there is. Actually, we have to make sure the function returns this, so we can store, we can

58
00:05:36,500 --> 00:05:37,730
store this in a dictionary.

59
00:05:40,870 --> 00:05:50,160
I'll say parameters, create a dictionary called parameters, and then I'll say W1, key-value.

60
00:05:50,760 --> 00:05:56,040
Um, key-value pair: the key W1 here would hold our W1,

61
00:05:59,670 --> 00:06:04,200
W2 would hold our W2,

62
00:06:09,850 --> 00:06:10,670
b1

63
00:06:13,480 --> 00:06:15,460
should hold our b1,

64
00:06:25,900 --> 00:06:31,260
and then b2 would hold our b2, like this.

65
00:06:31,640 --> 00:06:38,300
Then I'll simply return the dictionary: return parameters, that's the name of the dictionary.

66
00:06:38,710 --> 00:06:39,160
Right.

67
00:06:40,800 --> 00:06:41,220
Okay.

68
00:06:41,250 --> 00:06:47,800
So when we call this function and pass the, um, the size

69
00:06:47,800 --> 00:06:56,050
of the input layer, the hidden layer and the output layer, we get the weight matrices initialized for us, as

70
00:06:56,050 --> 00:06:59,650
well as the bias vectors also initialized.
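
The initialization just described can be sketched roughly as follows. This is a minimal sketch rather than the exact code typed in the video: the seed value (1) and the `seed` keyword are assumptions, since the transcript does not state them, and the 0.01 scale is inferred from the spoken "zero point zero one".

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y, seed=1):
    """Initialize weights and biases for a two-layer network.

    n_x, n_h, n_y: number of nodes in the input, hidden and output layers.
    seed: assumed value, used only to make the results repeatable.
    """
    np.random.seed(seed)
    W1 = np.random.randn(n_h, n_x) * 0.01  # small random weights, shape (n_h, n_x)
    W2 = np.random.randn(n_y, n_h) * 0.01  # small random weights, shape (n_y, n_h)
    b1 = np.zeros((n_h, 1))                # column vector of zeros
    b2 = np.zeros((n_y, 1))                # column vector of zeros
    parameters = {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
    return parameters

params = initialize_parameters(n_x=4, n_h=3, n_y=1)
print({name: value.shape for name, value in params.items()})
```

Note the double brackets in `np.zeros((n_h, 1))`: the shape has to be passed as a single tuple.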

71
00:07:00,670 --> 00:07:01,100
Okay.

72
00:07:01,150 --> 00:07:06,150
Once that is done we can implement the function for forward propagation.

73
00:07:06,250 --> 00:07:09,160
So we're going to implement this function in two stages.

74
00:07:09,160 --> 00:07:16,510
We will implement a function that calculates Z, and then another function that takes the Z value

75
00:07:16,810 --> 00:07:20,070
and computes the um the activation.

76
00:07:20,110 --> 00:07:25,800
It takes the Z value and passes it through the activation function to give us the A value, right.

77
00:07:27,060 --> 00:07:32,550
So I'll come over here, def, and then I'll call this function

78
00:07:32,560 --> 00:07:33,460
linear_forward,

79
00:07:38,820 --> 00:07:47,340
and then it's gonna take the activation from the previous layer, or the inputs.

80
00:07:47,340 --> 00:07:51,700
Remember, the input layer is often denoted as A0.

81
00:07:52,030 --> 00:08:00,460
So we pass this activation from the previous layer, or the input layer, as the input, and then W is

82
00:08:00,460 --> 00:08:15,170
the weight, b is the bias, and you know how to compute Z.

83
00:08:15,470 --> 00:08:21,050
We simply say Z equals W dot

84
00:08:25,530 --> 00:08:27,930
A, plus b, like this.

85
00:08:29,400 --> 00:08:38,300
And then what we're going to do is we're going to store the A, W and b values passed here.

86
00:08:38,310 --> 00:08:43,290
We're going to store them in a list known as cache.

87
00:08:43,320 --> 00:08:46,820
So I'm gonna create a list here or an array if you prefer.

88
00:08:47,340 --> 00:08:57,120
I'm going to store these values, A, W, b. We will use this later for computing backpropagation, and once

89
00:08:57,120 --> 00:09:00,120
we've done this this function is going to return

90
00:09:03,490 --> 00:09:07,200
it's gonna return Z as well as our cache, like this.

91
00:09:08,280 --> 00:09:10,540
Okay.
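
A minimal sketch of the linear step described above, under the assumption that the cache is stored as a tuple (the video speaks of "a list or an array"; any container that can be unpacked later would do):

```python
import numpy as np

def linear_forward(A, W, b):
    """Compute Z = W . A + b for one layer.

    A: activations from the previous layer (or the input), shape (n_prev, m)
    W: weight matrix, shape (n_curr, n_prev)
    b: bias column vector, shape (n_curr, 1)
    """
    Z = np.dot(W, A) + b
    cache = (A, W, b)  # kept for the backward pass
    return Z, cache

A = np.array([[1.0], [2.0]])   # one example with two input features
W = np.array([[0.5, -1.0]])    # one node in the current layer
b = np.array([[0.25]])
Z, cache = linear_forward(A, W, b)
print(Z)  # 0.5*1 + (-1.0)*2 + 0.25 = -1.25
```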

92
00:09:10,890 --> 00:09:14,090
So once we've done this, this will just compute Z for us.

93
00:09:14,100 --> 00:09:21,360
Now we write a function that takes Z and passes it through the activation function.

94
00:09:21,360 --> 00:09:26,130
But before we do that, let's write functions for our activations.

95
00:09:28,690 --> 00:09:35,940
We're going to write one for sigmoid as well as one for the rectified linear unit.

96
00:09:36,120 --> 00:09:38,110
See, I have quite a number of images here.

97
00:09:38,830 --> 00:09:42,880
Okay, so this is the ReLU.

98
00:09:43,230 --> 00:09:48,690
If we take a number and we pass it through the ReLU, we just compute the max, comparing the number

99
00:09:48,690 --> 00:09:50,320
to zero.

100
00:09:50,460 --> 00:09:53,300
If the number is below zero then zero will be returned.

101
00:09:53,310 --> 00:09:57,530
If it's above zero then the number will be returned.

102
00:09:57,690 --> 00:09:59,500
Right.

103
00:09:59,640 --> 00:10:02,220
So I'll come over here and create relu: def

104
00:10:05,480 --> 00:10:12,920
relu, and then it takes an argument Z, and this is as simple as...

105
00:10:13,820 --> 00:10:15,450
This is interesting.

106
00:10:17,520 --> 00:10:18,640
Sorry about that.

107
00:10:18,690 --> 00:10:33,300
This is as simple as: A equals, we can use np.maximum, and then we just find the maximum of 0

108
00:10:33,990 --> 00:10:46,280
and the argument passed. And then we can store the Z value in cache for later use, and then return A as well

109
00:10:46,350 --> 00:10:50,530
as cache, like this.

110
00:10:50,630 --> 00:10:57,400
Next we write a function to compute the sigmoid. We already saw that in the previous example.

111
00:10:58,130 --> 00:11:08,360
So I'm simply going to type it out. We say def sigmoid; it takes an argument Z, and we compute the activation

112
00:11:08,360 --> 00:11:23,930
by saying one divided by, one plus np.exp of minus Z, and this should return the, um, the sigmoid,

113
00:11:25,160 --> 00:11:34,370
and we can store the Z value in a variable called cache. Return both the Z and the A: return A, and then

114
00:11:34,850 --> 00:11:39,140
return cache, like this.
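
The two activation functions described above could look roughly like this. It is a sketch under the assumption that each function returns the activation together with Z as its cache, as the narration suggests:

```python
import numpy as np

def relu(Z):
    """Rectified linear unit: element-wise max(0, Z)."""
    A = np.maximum(0, Z)
    cache = Z  # Z is needed later to compute the ReLU derivative
    return A, cache

def sigmoid(Z):
    """Sigmoid activation: 1 / (1 + exp(-Z))."""
    A = 1 / (1 + np.exp(-Z))
    cache = Z  # Z is needed later to compute the sigmoid derivative
    return A, cache

Z = np.array([[-2.0, 0.0, 3.0]])
A_relu, relu_cache = relu(Z)
A_sigmoid, sigmoid_cache = sigmoid(Z)
print(A_relu)     # negatives become 0, positives pass through
print(A_sigmoid)  # values squashed into (0, 1); sigmoid(0) is 0.5
```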

115
00:11:39,140 --> 00:11:45,560
Okay, so now that we've got our activation functions we can take the Z and pass it through one of the

116
00:11:45,560 --> 00:11:46,910
activation functions.

117
00:11:47,000 --> 00:11:51,620
So we're going to write the activation part the function that performs the activation in such a way

118
00:11:51,620 --> 00:12:01,040
that, um, you can decide, or the user can decide, whether to use ReLU activation or sigmoid activation,

119
00:12:04,500 --> 00:12:04,940
right.

120
00:12:05,580 --> 00:12:12,190
So we are going to write a function here; call this linear_activation_forward.

121
00:12:12,710 --> 00:12:13,400
So, def

122
00:12:16,920 --> 00:12:24,390
linear_activation_forward, and it will take three arguments.

123
00:12:24,390 --> 00:12:30,640
The first argument is the previous A value, which in our case is the Z, right.

124
00:12:30,710 --> 00:12:40,160
But if it's the first layer this is going to be the X. A_prev, and then we pass the weights as arguments

125
00:12:40,170 --> 00:12:40,650
as well

126
00:12:44,320 --> 00:12:48,760
and then we pass b and then activation.

127
00:12:48,760 --> 00:12:56,500
So this third argument, or rather this fourth argument, is going to allow us to choose whether we want to use

128
00:12:56,590 --> 00:12:58,210
a ReLU or a sigmoid.

129
00:13:01,890 --> 00:13:06,630
So we can check if the user selected ReLU or sigmoid.

130
00:13:06,630 --> 00:13:07,620
So I'm gonna say if

131
00:13:10,260 --> 00:13:14,130
activation is sigmoid

132
00:13:19,110 --> 00:13:24,620
then I'm simply going to call our function, this function here,

133
00:13:24,630 --> 00:13:30,930
linear_forward. I'm gonna call this, and then I'm going to pass the arguments A_prev

134
00:13:35,180 --> 00:13:44,270
and then W and then b, and this is going to return Z and cache.

135
00:13:44,270 --> 00:13:52,530
I'm going to store this in two variables, Z and linear_cache.

136
00:13:52,730 --> 00:13:53,700
Okay.

137
00:13:54,080 --> 00:14:02,020
So now I can take the Z and pass it through our sigmoid activation function.

138
00:14:02,660 --> 00:14:14,860
So I'm going to call sigmoid here and then I'll pass Z here; I pass the Z returned from this line, and

139
00:14:14,860 --> 00:14:18,730
the sigmoid is going to return A and cache.

140
00:14:18,770 --> 00:14:20,450
So this is this.

141
00:14:20,450 --> 00:14:21,820
Cache and cache.

142
00:14:21,830 --> 00:14:22,980
We called this linear_cache.

143
00:14:23,000 --> 00:14:30,010
We will call this activation_cache, and we'll say the return is A, comma,

144
00:14:34,180 --> 00:14:36,100
activation, underscore,

145
00:14:36,100 --> 00:14:36,640
cache,

146
00:14:39,400 --> 00:14:39,980
like this.

147
00:14:41,010 --> 00:14:41,530
Okay.

148
00:14:41,560 --> 00:14:45,130
So this is the case for sigmoid.

149
00:14:45,130 --> 00:14:49,030
So let's check: what if the person selected ReLU?

150
00:14:49,450 --> 00:14:53,680
So we can use the else-if, elif,

151
00:14:58,120 --> 00:15:07,410
activation equals relu; then we do the same thing, but this time we pass it through the ReLU.

152
00:15:09,070 --> 00:15:10,810
So we just select this,

153
00:15:24,020 --> 00:15:32,900
paste it over here, and then it's going to return A and activation_cache as well.

154
00:15:36,110 --> 00:15:38,510
The name of the function is relu.

155
00:15:43,000 --> 00:15:45,980
Pass Z, like this.

156
00:15:45,980 --> 00:15:48,760
Right.

157
00:15:49,190 --> 00:15:58,400
So we are going to take our two caches, activation cache and linear cache, um, linear cache, activation

158
00:15:58,400 --> 00:15:58,970
cache.

159
00:15:59,010 --> 00:15:59,780
They both have it.

160
00:15:59,780 --> 00:16:05,460
We're gonna take the linear cache and activation cache and store them in an array or list, and then we'll

161
00:16:05,460 --> 00:16:07,390
return that list as well.

162
00:16:07,410 --> 00:16:09,500
We'll call this cache; cache equals

163
00:16:15,460 --> 00:16:19,990
linear_cache from here and activation_cache,

164
00:16:22,770 --> 00:16:23,770
and then we return

165
00:16:26,680 --> 00:16:27,100
A

166
00:16:30,410 --> 00:16:31,240
and cache.

167
00:16:31,390 --> 00:16:31,810
Right.
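
Putting the linear step and the activation together, the function described above might be sketched as follows. The helper functions are repeated so the example runs on its own. The string values "sigmoid" and "relu" and the `ValueError` for anything else are assumptions; the transcript only says the user chooses between the two activations.

```python
import numpy as np

def linear_forward(A, W, b):
    Z = np.dot(W, A) + b
    return Z, (A, W, b)

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z)), Z

def relu(Z):
    return np.maximum(0, Z), Z

def linear_activation_forward(A_prev, W, b, activation):
    """Linear step followed by the chosen activation for one layer."""
    if activation == "sigmoid":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = sigmoid(Z)
    elif activation == "relu":
        Z, linear_cache = linear_forward(A_prev, W, b)
        A, activation_cache = relu(Z)
    else:
        # Assumed behaviour: fail loudly on an unknown activation name.
        raise ValueError("activation must be 'sigmoid' or 'relu'")
    cache = (linear_cache, activation_cache)  # both are needed for backprop
    return A, cache

A_prev = np.array([[1.0], [2.0]])
W = np.array([[0.5, -1.0]])
b = np.array([[0.25]])
A_relu, cache_relu = linear_activation_forward(A_prev, W, b, "relu")
A_sig, cache_sig = linear_activation_forward(A_prev, W, b, "sigmoid")
print(A_relu, A_sig)  # Z is -1.25, so ReLU gives 0 and sigmoid gives a value below 0.5
```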

168
00:16:33,160 --> 00:16:34,640
So now we have this.

169
00:16:35,510 --> 00:16:39,270
So now let's write a function to compute the cost.

170
00:16:39,550 --> 00:16:42,060
I'm gonna come over here.

171
00:16:42,520 --> 00:16:47,250
def; call the function compute_cost.

172
00:16:47,260 --> 00:16:53,470
And this is going to take Y-hat and then Y,

173
00:16:58,040 --> 00:17:08,030
and we can get the m value, which is the number of training examples, simply by saying m equals... the true,

174
00:17:08,270 --> 00:17:11,810
um, the labels, the labels of the, of the...

175
00:17:12,170 --> 00:17:13,960
That's the labels for the training data.

176
00:17:13,980 --> 00:17:22,190
Meaning the correct answers, which is Y. We say Y.shape[1] will give us the number of examples

177
00:17:22,220 --> 00:17:22,700
we have.

178
00:17:23,090 --> 00:17:29,420
So once that is done we compute the cost using the, um, using the equation we saw earlier.

179
00:17:35,560 --> 00:17:38,810
Um, we use this equation over here for our cost function.

180
00:17:38,810 --> 00:17:39,920
We saw this earlier.

181
00:17:41,030 --> 00:17:43,030
So I'm gonna implement it in Python.

182
00:17:45,080 --> 00:17:49,280
I'll simply say cost equals

183
00:17:52,010 --> 00:18:05,990
1 over m, multiplied by, in brackets, we'll have minus np.dot, then pass Y, comma; we need a log.

184
00:18:06,020 --> 00:18:20,300
So I say np.log of Y-hat, and we're gonna take the transpose of that, and then all of this minus

185
00:18:20,300 --> 00:18:23,900
np.dot of 1 minus Y,

186
00:18:30,820 --> 00:18:40,390
comma, np dot... we need a log: np.log of 1 minus Y-hat,

187
00:18:44,150 --> 00:18:45,230
transpose like this.

188
00:18:47,060 --> 00:18:59,000
Okay, so this should work for us, and what we're gonna do is we're going to squeeze the output, and when we use

189
00:18:59,000 --> 00:19:07,250
the squeeze function it makes the shape what we expect. Um, what happens is we would end up with something

190
00:19:07,250 --> 00:19:07,870
like this.

191
00:19:08,870 --> 00:19:11,000
Let's say [[20]].

192
00:19:11,120 --> 00:19:16,130
But when we squeeze, this becomes simply 20, and this...

193
00:19:16,160 --> 00:19:21,490
This is what our function can accept; a function wouldn't be able to accept this, this data.

194
00:19:21,500 --> 00:19:23,800
This number will be represented as something like this.

195
00:19:24,050 --> 00:19:27,090
But when we squeeze all dimensions it becomes just that number.

196
00:19:27,120 --> 00:19:30,740
So I'm going to call the squeeze function on cost here.

197
00:19:30,910 --> 00:19:33,860
I'll simply say cost,

198
00:19:37,680 --> 00:19:41,790
cost equals np.squeeze;

199
00:19:44,530 --> 00:19:50,300
I'm gonna pass cost in here like this and we're going to return cost.

200
00:19:50,640 --> 00:19:52,630
Right.
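
The cost function described above appears to be the usual cross-entropy cost, J = -(1/m) * sum(y * log(y_hat) + (1 - y) * log(1 - y_hat)). A minimal sketch under that assumption (the leading minus sign is hard to make out in the narration but is required for the formula to be a cost):

```python
import numpy as np

def compute_cost(Y_hat, Y):
    """Cross-entropy cost for predictions Y_hat and labels Y, both of shape (1, m)."""
    m = Y.shape[1]  # number of training examples
    cost = (1 / m) * (-np.dot(Y, np.log(Y_hat).T)
                      - np.dot(1 - Y, np.log(1 - Y_hat).T))
    cost = np.squeeze(cost)  # turns a (1, 1) array such as [[20.]] into a plain scalar
    return cost

Y = np.array([[1.0, 0.0]])
Y_hat = np.array([[0.5, 0.5]])
cost = compute_cost(Y_hat, Y)
print(cost, cost.shape)  # the cost here is ln(2); the squeezed shape is ()
```

Before the squeeze the dot products leave a 1-by-1 matrix, which is the "[[20]] versus 20" point made in the narration.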

201
00:19:53,250 --> 00:20:00,240
So once that is done we can perform our backpropagation using the same backpropagation equations

202
00:20:00,240 --> 00:20:01,750
we saw earlier.

203
00:20:01,860 --> 00:20:09,120
I'll say def. We do it in two stages like we saw earlier, like we did for forward propagation: we do the

204
00:20:09,120 --> 00:20:13,750
linear bit, and then we activate the linear.

205
00:20:14,010 --> 00:20:15,540
So I'll say linear_backward,

206
00:20:18,810 --> 00:20:24,570
and we're gonna pass dZ over here and then the cache.

207
00:20:27,840 --> 00:20:32,380
So it takes dZ and cache as arguments.

208
00:20:33,100 --> 00:20:37,520
And what we're gonna do is we're going to extract the contents of the cache.

209
00:20:37,560 --> 00:20:44,090
Remember, within this cache we would have the previous A as well as the weight and the bias.

210
00:20:44,550 --> 00:20:45,630
So I'm gonna say

211
00:20:48,780 --> 00:20:59,090
I'm gonna say A_prev, comma, W, and then b, equals cache.

212
00:20:59,370 --> 00:21:05,840
Uh, we get them from the cache, and then we can get the number of training examples,

213
00:21:06,270 --> 00:21:08,750
which is represented by m.

214
00:21:08,850 --> 00:21:14,250
m equals A_prev.shape, index one, like this.

215
00:21:15,600 --> 00:21:16,040
Right.

216
00:21:16,430 --> 00:21:32,940
So once that is done we can compute dW. dW equals one divided by m, times np.dot of dZ,

217
00:21:36,360 --> 00:21:39,590
A_prev transpose, like this.

218
00:21:39,870 --> 00:21:41,010
Okay.

219
00:21:41,030 --> 00:21:43,350
Now db equals

220
00:21:49,250 --> 00:21:54,980
one divided by m, times np dot...

221
00:21:57,960 --> 00:22:00,050
times np.sum.

222
00:22:00,300 --> 00:22:05,400
Let's see... so these are the equations we are using,

223
00:22:11,090 --> 00:22:13,020
right.

224
00:22:13,880 --> 00:22:21,850
We are computing dW1 and db1 here. We've already computed dW1, this bit here.

225
00:22:21,950 --> 00:22:25,320
This is what we have over here.

226
00:22:25,430 --> 00:22:34,510
dZ dot product X transpose, and our X is the previous activation, which is just A_prev.

227
00:22:34,680 --> 00:22:37,460
Now we're computing db1 here.

228
00:22:37,460 --> 00:22:38,280
Right.

229
00:22:38,360 --> 00:22:40,630
So I'll say np.sum,

230
00:22:43,790 --> 00:22:53,060
and then I'm gonna pass dZ, and axis equals one, and then I'm going to set keepdims

231
00:22:53,060 --> 00:22:53,570
to True,

232
00:22:59,330 --> 00:23:08,150
like this: one over, uh, m, times np.sum of dZ, axis one, keepdims.

233
00:23:08,300 --> 00:23:13,640
Okay, and now we do dA_prev:

234
00:23:16,550 --> 00:23:26,010
equals np.dot of W transpose, dZ, right.

235
00:23:27,670 --> 00:23:37,900
So now we're going to return the three values: return dA_prev,

236
00:23:40,610 --> 00:23:44,980
dW and db, right.

237
00:23:45,040 --> 00:23:47,190
So that's the linear backpropagation.
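
A minimal sketch of the linear backward step as described: dW = (1/m) dZ · A_prevᵀ, db = (1/m) · sum of dZ over the examples, and dA_prev = Wᵀ · dZ. The example values are made up for illustration.

```python
import numpy as np

def linear_backward(dZ, cache):
    """Backward pass for the linear part of one layer.

    dZ: gradient of the cost with respect to Z, shape (n_curr, m)
    cache: the (A_prev, W, b) tuple stored during the forward pass
    """
    A_prev, W, b = cache
    m = A_prev.shape[1]  # number of training examples
    dW = (1 / m) * np.dot(dZ, A_prev.T)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)  # keepdims keeps it a column vector
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db

A_prev = np.ones((2, 3))          # two nodes in the previous layer, three examples
W = np.array([[1.0, -1.0]])
b = np.array([[0.0]])
dZ = np.array([[1.0, 2.0, 3.0]])
dA_prev, dW, db = linear_backward(dZ, (A_prev, W, b))
print(dW, db)   # each gradient has the same shape as the parameter it updates
print(dA_prev)
```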

238
00:23:47,230 --> 00:23:55,480
Next we're going to take what we get from the linear backpropagation and pass it through the activation.

239
00:23:55,480 --> 00:24:01,420
So I'm going to come over here to create a new function; call this function

240
00:24:03,760 --> 00:24:12,160
linear_activation_backward. Like this one, we have linear_backward only; this one is linear backward,

241
00:24:12,740 --> 00:24:14,170
the activation is gonna be here.

242
00:24:21,300 --> 00:24:29,910
And this will take three arguments: it's gonna take dA, the cache, and the activation type, whether we want

243
00:24:29,910 --> 00:24:40,140
to use ReLU or the sigmoid. So first let's extract the contents of the cache; we have cache over here,

244
00:24:42,630 --> 00:24:50,070
and from that we would extract the linear cache and the activation cache,

245
00:24:55,610 --> 00:24:56,020
right.

246
00:24:56,380 --> 00:25:01,610
So once we've done that we've got to check the activation type selected by the user.

247
00:25:01,870 --> 00:25:17,260
So I'll say if activation equals relu, then we simply compute dZ first. dZ equals... we've got to write

248
00:25:17,260 --> 00:25:21,590
a function for the ReLU backward.

249
00:25:22,150 --> 00:25:25,410
Right.

250
00:25:25,570 --> 00:25:31,300
Basically we need to write functions to compute the derivative of the sigmoid and the ReLU.

251
00:25:31,570 --> 00:25:36,450
So I'm gonna come over here, up here, and then I'll write relu_backward.

252
00:25:38,260 --> 00:25:43,240
And this is going to take dA as well as the cache as arguments,

253
00:25:47,530 --> 00:25:56,880
and we just extract Z from the cache. Once that is done I'm gonna convert the dZ to the

254
00:25:56,890 --> 00:26:12,010
correct object. I simply do this by saying dZ equals np.array, and then we take dA and set copy equals

255
00:26:12,010 --> 00:26:12,840
True here.

256
00:26:13,000 --> 00:26:14,060
Like this.

257
00:26:14,590 --> 00:26:15,300
Right.

258
00:26:15,370 --> 00:26:17,250
And then I'll say dZ,

259
00:26:19,890 --> 00:26:25,050
square bracket, Z is less than or equal to zero.

260
00:26:25,090 --> 00:26:32,240
What I want to do is, when the Z is less than or equal to zero, then dZ should be zero.

261
00:26:32,320 --> 00:26:35,590
So I say dZ, square brackets,

262
00:26:35,730 --> 00:26:40,480
uh, Z is less than or equal to zero. Let me space this out a bit.

263
00:26:40,520 --> 00:26:42,070
I'm simply going to return dZ.

264
00:26:44,800 --> 00:26:45,250
next.

265
00:26:45,350 --> 00:26:46,000
I'll write

266
00:26:46,000 --> 00:26:59,830
the, um, sigmoid_backward, and this is going to take the same arguments, dA and the cache, and we start off

267
00:26:59,830 --> 00:27:01,330
by getting our Z

268
00:27:03,970 --> 00:27:14,560
from the cache, and then we compute the actual sigmoid: s equals one divided by, in brackets,

269
00:27:14,980 --> 00:27:26,250
one plus np.exp of minus Z, and then once that is done we can compute the derivative.

270
00:27:26,260 --> 00:27:27,370
dZ equals

271
00:27:31,440 --> 00:27:43,060
dA multiplied by s, multiplied by one minus s, and we can return dZ.

272
00:27:43,610 --> 00:27:44,900
Right.

273
00:27:44,990 --> 00:27:47,750
So this is relu_backward.

274
00:27:47,740 --> 00:27:49,310
And then there's sigmoid_backward.

275
00:27:49,940 --> 00:27:52,000
Okay.
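
The two derivative helpers described above could be sketched like this, assuming the cache passed in is the Z stored by the matching forward activation function:

```python
import numpy as np

def relu_backward(dA, cache):
    """Gradient of the cost with respect to Z for a ReLU unit."""
    Z = cache
    dZ = np.array(dA, copy=True)  # work on a copy so dA is not modified
    dZ[Z <= 0] = 0                # ReLU passes no gradient where Z <= 0
    return dZ

def sigmoid_backward(dA, cache):
    """Gradient of the cost with respect to Z for a sigmoid unit."""
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    dZ = dA * s * (1 - s)         # sigmoid'(Z) = s * (1 - s)
    return dZ

Z = np.array([[-1.0, 0.0, 2.0]])
dA = np.array([[5.0, 5.0, 5.0]])
dZ_relu = relu_backward(dA, Z)
dZ_sigmoid = sigmoid_backward(dA, Z)
print(dZ_relu)     # gradient is zeroed where Z <= 0
print(dZ_sigmoid)  # at Z = 0 the sigmoid slope is 0.25
```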

276
00:27:52,010 --> 00:27:55,300
So now we can continue with our linear_activation_backward.

277
00:27:59,860 --> 00:28:04,450
So if ReLU is selected we're gonna use the relu_backward function,

278
00:28:10,270 --> 00:28:15,110
pass dA here and then the activation cache.

279
00:28:18,200 --> 00:28:19,510
On the other hand...

280
00:28:20,220 --> 00:28:21,100
Um yeah.

281
00:28:21,230 --> 00:28:32,130
So once we get dZ we can, um... copy this, paste. On the other hand, if sigmoid is selected, if sigmoid is

282
00:28:32,130 --> 00:28:37,080
selected, we simply use sigmoid_backward.

283
00:28:39,780 --> 00:28:46,290
So we get dZ, and once we get dZ we've got to pass it through the, um, the backpropagation

284
00:28:46,290 --> 00:28:46,850
function.

285
00:28:46,860 --> 00:28:52,080
So I'm gonna call the function here, linear_backward,

286
00:28:56,190 --> 00:29:01,020
and this is going to take dZ and then the linear cache,

287
00:29:05,770 --> 00:29:06,250
and

288
00:29:09,000 --> 00:29:17,560
should return the previous dA as well as dW and db.

289
00:29:20,640 --> 00:29:21,080
Right.

290
00:29:24,070 --> 00:29:26,900
I can just copy this and put it under the sigmoid,

291
00:29:29,510 --> 00:29:41,100
so if ReLU is passed this bit will be used; if sigmoid is passed this bit will be used. It has to be elif here,

292
00:29:47,560 --> 00:29:51,470
right.

293
00:29:52,040 --> 00:29:58,020
So now we can return our previous dA, dW and db,

294
00:30:02,280 --> 00:30:03,600
called dA_prev,

295
00:30:07,140 --> 00:30:08,940
dW, db.
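
Combining the pieces, the backward function for one layer might be sketched as follows. The helpers are repeated so the example is self-contained. The cache layout `(linear_cache, activation_cache)` follows the forward function described earlier; the string values and the `ValueError` branch are assumptions, and the numbers are made up.

```python
import numpy as np

def relu_backward(dA, cache):
    dZ = np.array(dA, copy=True)
    dZ[cache <= 0] = 0
    return dZ

def sigmoid_backward(dA, cache):
    s = 1 / (1 + np.exp(-cache))
    return dA * s * (1 - s)

def linear_backward(dZ, cache):
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = (1 / m) * np.dot(dZ, A_prev.T)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    return dA_prev, dW, db

def linear_activation_backward(dA, cache, activation):
    """Backward pass through the activation and then the linear step of one layer."""
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)
        dA_prev, dW, db = linear_backward(dZ, linear_cache)
    else:
        # Assumed behaviour: fail loudly on an unknown activation name.
        raise ValueError("activation must be 'sigmoid' or 'relu'")
    return dA_prev, dW, db

# Two examples, two nodes in the previous layer, one node in this layer.
A_prev = np.array([[3.0, 1.0], [1.0, 2.0]])
W = np.array([[1.0, -1.0]])
b = np.array([[0.0]])
Z = np.dot(W, A_prev) + b          # [[2., -1.]]
cache = ((A_prev, W, b), Z)
dA = np.array([[1.0, 1.0]])
dA_prev, dW, db = linear_activation_backward(dA, cache, "relu")
print(dA_prev, dW, db)
```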

296
00:30:13,920 --> 00:30:20,940
Now let's write a function to update these parameters. After we perform backpropagation

297
00:30:21,750 --> 00:30:23,650
and we compute the error,

298
00:30:23,850 --> 00:30:26,000
we would have to update the parameters.

299
00:30:26,070 --> 00:30:27,900
So I'm going to create a function here.

300
00:30:28,110 --> 00:30:35,330
I'll say def update_parameters.

301
00:30:35,430 --> 00:30:37,370
It's going to take three arguments.

302
00:30:37,550 --> 00:30:44,800
The first argument is gonna be the parameters, then the gradients, and the learning rate alpha,

303
00:30:48,370 --> 00:30:48,880
and

304
00:30:56,180 --> 00:31:02,210
in order to be able to use this function for future neural networks that we will develop with more

305
00:31:02,210 --> 00:31:03,110
than two layers.

306
00:31:03,680 --> 00:31:08,900
I'm going to make sure we get the number of layers from

307
00:31:08,900 --> 00:31:16,370
the parameters. So I'll simply say L equals the length of parameters,

308
00:31:25,170 --> 00:31:31,480
the length of parameters divided by two, rounded down to the nearest whole number.

309
00:31:32,050 --> 00:31:37,240
And then once that is done I'm gonna use a loop.

310
00:31:37,300 --> 00:31:44,400
I'm gonna say for l in range of capital L

311
00:31:47,870 --> 00:31:49,500
and I'm gonna say parameters,

312
00:31:52,440 --> 00:32:00,180
and then in the parameters dictionary I'm going to take the W; I'm going to append the number, whether it's

313
00:32:00,180 --> 00:32:02,010
W1, W2, etc.

314
00:32:02,040 --> 00:32:04,460
I say W, l plus 1,

315
00:32:09,470 --> 00:32:13,400
equals parameters.

316
00:32:18,840 --> 00:32:19,440
W

317
00:32:24,250 --> 00:32:25,230
plus one.

318
00:32:25,270 --> 00:32:27,180
I'm using the string here, str,

319
00:32:30,490 --> 00:32:34,930
l plus one, minus the learning rate multiplied by the gradient,

320
00:32:50,580 --> 00:32:52,000
and in the gradients

321
00:32:52,010 --> 00:32:54,310
we're looking for the dW.

322
00:32:55,800 --> 00:32:59,630
So let me just complete this and then I'll explain further

323
00:33:07,250 --> 00:33:07,820
right.

324
00:33:12,150 --> 00:33:14,020
See what we have here.

325
00:33:17,980 --> 00:33:18,790
Yeah.

326
00:33:19,630 --> 00:33:21,310
The square bracket closes here.

327
00:33:21,310 --> 00:33:28,180
So we have the parameter W equals the same parameter W.

328
00:33:29,080 --> 00:33:36,760
So if it is one: W1 equals W1 minus the learning rate multiplied by dW1.

329
00:33:36,760 --> 00:33:38,620
That is what we're computing, essentially.

330
00:33:44,000 --> 00:33:55,290
Right. Next we're going to compute b, the bias.

331
00:33:56,070 --> 00:33:57,600
So I'll say parameters,

332
00:34:01,400 --> 00:34:09,060
b, and then we can append the, um, the b number, whether it's b1, b2 or b3.

333
00:34:09,180 --> 00:34:13,080
So we say str of l plus one

334
00:34:15,990 --> 00:34:16,590
equals

335
00:34:23,050 --> 00:34:25,210
parameters,

336
00:34:26,310 --> 00:34:33,440
b plus str of l plus 1,

337
00:34:40,760 --> 00:34:46,970
minus the learning rate multiplied by

338
00:34:50,990 --> 00:34:51,360
db.

339
00:34:51,630 --> 00:35:01,890
And to get db we use grads, which is going to be a dictionary: db, and then we get the type of db, whether

340
00:35:01,920 --> 00:35:08,400
it's db1 or db2 or db3, etc.: l plus one, like this,

341
00:35:11,310 --> 00:35:13,260
and we can return parameters

342
00:35:23,740 --> 00:35:24,720
like this.

343
00:35:24,970 --> 00:35:25,470
Right.
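
The update rule spelled out above (each W and b minus the learning rate times its gradient) can be sketched as follows. It assumes the gradients live in a dictionary with keys such as "dW1" and "db1", which matches the narration; the argument names `grads` and `learning_rate` are illustrative.

```python
import numpy as np

def update_parameters(parameters, grads, learning_rate):
    """One gradient-descent step over every layer's W and b."""
    L = len(parameters) // 2  # each layer contributes one W and one b
    for l in range(L):
        key = str(l + 1)      # layers are numbered from 1: W1, b1, W2, b2, ...
        parameters["W" + key] = parameters["W" + key] - learning_rate * grads["dW" + key]
        parameters["b" + key] = parameters["b" + key] - learning_rate * grads["db" + key]
    return parameters

parameters = {
    "W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.5]]),
    "W2": np.array([[3.0]]),      "b2": np.array([[0.0]]),
}
grads = {
    "dW1": np.array([[0.5, -1.0]]), "db1": np.array([[1.0]]),
    "dW2": np.array([[2.0]]),       "db2": np.array([[-1.0]]),
}
updated = update_parameters(parameters, grads, learning_rate=0.1)
print(updated["W1"], updated["b2"])
```

Because the layer count is derived from the dictionary length, the same function should work unchanged for networks with more than two layers.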

344
00:35:25,480 --> 00:35:29,770
So once this is done we've got to write a function to predict for us

345
00:35:32,650 --> 00:35:40,330
This function is going to take our test set, which is X, and then our test labels, y,

346
00:35:40,330 --> 00:35:44,830
and then the parameters, which are the weights and the biases from our trained model.

347
00:35:47,350 --> 00:35:57,640
So I'm gonna come over here and say def predict and I'll pass x y then parameters

348
00:36:01,560 --> 00:36:09,860
and I'm gonna get the number of examples we have by saying m equals X.shape,

349
00:36:13,600 --> 00:36:23,860
index 1, and I'm going to get the number of layers in the neural network as well by getting the parameters

350
00:36:23,890 --> 00:36:29,050
divided by two, rounded down to the, um, the nearest whole number: n equals

351
00:36:32,680 --> 00:36:34,030
the length of parameters

352
00:36:40,150 --> 00:36:48,000
divided by two, rounded down using the double slash, because I'm rounding it down to the nearest whole number,

353
00:36:48,340 --> 00:36:58,330
and now I can perform my forward propagation. To perform forward propagation we'll need a function

354
00:36:58,660 --> 00:37:03,160
that puts everything together right.

355
00:37:05,740 --> 00:37:11,920
Perhaps we can implement this function in, uh, in our neural network file. Remember this file

356
00:37:11,950 --> 00:37:17,610
we're calling our library file, and we're just putting together some popular functions that we'll be using.

357
00:37:17,640 --> 00:37:23,410
There's going to be another, there's gonna be another Python script that will call all the functions

358
00:37:23,410 --> 00:37:28,720
in this file, and in that script perhaps we can include our predict function.

359
00:37:32,520 --> 00:37:33,000
actually.

360
00:37:33,030 --> 00:37:34,050
Let's keep it here.

361
00:37:34,140 --> 00:37:39,750
Let's write another function to perform the complete forward propagation, which we shall pass to this

362
00:37:39,750 --> 00:37:40,980
predict function.

363
00:37:41,200 --> 00:37:46,290
And what I mean by the complete forward propagation is a function that puts together the linear

364
00:37:46,290 --> 00:37:47,440
and the activation bits.

365
00:37:48,120 --> 00:37:50,060
Okay, let's write it quickly.

366
00:37:50,280 --> 00:37:50,830
I'm gonna say.

367
00:37:50,830 --> 00:38:05,580
def model_forward_prop, and this function is going to take X, which is

368
00:38:05,670 --> 00:38:18,190
the input, and then parameters, meaning the weights and the biases. And then I'm gonna have a list,

369
00:38:18,200 --> 00:38:18,900
an empty list here,

370
00:38:18,900 --> 00:38:28,740
known as caches. And then I'm gonna set A equals X, and then I'm going to get the length,

371
00:38:28,740 --> 00:38:33,330
the length being the number of layers

372
00:38:33,330 --> 00:38:33,700
in the

373
00:38:33,720 --> 00:38:41,700
neural network that we pass. We can do that by getting the length of the parameters, len of parameters,

374
00:38:46,850 --> 00:38:48,980
divided by two and then we round it down.

375
00:38:49,400 --> 00:38:55,480
Now once that is done I'm gonna open a for loop: for l in range

376
00:38:59,220 --> 00:38:59,970
1, L

377
00:39:02,800 --> 00:39:09,720
for l in range 1 to L here.

378
00:39:10,170 --> 00:39:27,750
Then we can do A_prev equals A, and then we can perform our linear activation, so I'll say linear_activation_

379
00:39:27,750 --> 00:39:31,230
forward, which is the function we created earlier,

380
00:39:36,030 --> 00:39:40,290
and then I'm going to pass A_prev.

381
00:39:40,290 --> 00:39:47,880
Remember, I just took X here, put it in A, and then I took A and made it A_prev.

382
00:39:48,000 --> 00:39:58,470
So I'm gonna pass this A_prev, and then I'm gonna get the weight from the dictionary,

383
00:39:58,910 --> 00:40:00,130
so I'll say parameters,

384
00:40:03,300 --> 00:40:04,280
the weight, 'W',

385
00:40:09,810 --> 00:40:18,710
and then I'll get the right one, whether it's W1 or W2 etc., by adding str of l over here, like this.

386
00:40:22,050 --> 00:40:27,400
And remember, l is going to be incrementing; it's a loop.

387
00:40:27,450 --> 00:40:28,280
This should be uh

388
00:40:35,700 --> 00:40:44,290
Okay, so we get the weight for layer l, and then I'm gonna get the bias,

389
00:40:44,300 --> 00:40:44,680
'b'

390
00:40:57,760 --> 00:41:00,000
plus l here as well.

391
00:41:00,340 --> 00:41:00,900
Right.

392
00:41:00,910 --> 00:41:03,640
And then I'm going to select the activation type.

393
00:41:03,640 --> 00:41:13,520
We're going to use ReLU for the first activation, so I'll say activation equals 'relu'.

394
00:41:17,610 --> 00:41:22,620
Okay so we've got to perform the next one.

395
00:41:22,620 --> 00:41:24,710
So we take what we get.

396
00:41:24,900 --> 00:41:26,890
This is going to return.

397
00:41:27,790 --> 00:41:28,040
A,

398
00:41:28,050 --> 00:41:29,840
and then cache.

399
00:41:30,030 --> 00:41:41,640
Remember, our linear_activation_forward function returns two objects; we set it to return

400
00:41:41,670 --> 00:41:43,690
the activation and the cache.

401
00:41:43,800 --> 00:41:48,930
And I'm going to take the activation and the cache. Activation and cache, or is it?

402
00:41:48,930 --> 00:41:49,190
Yeah.

403
00:41:49,200 --> 00:41:50,250
Activation and cache.

404
00:41:50,250 --> 00:41:51,390
And I'm not going to

405
00:41:51,450 --> 00:41:52,110
return them.

406
00:41:52,140 --> 00:41:58,350
But what I'm going to do is simply take the cache and append it to caches.

407
00:42:01,560 --> 00:42:05,190
But before we do that we've gotta implement the second layer

408
00:42:11,030 --> 00:42:14,660
Append over here: caches.append.

409
00:42:17,360 --> 00:42:22,310
I want to append cache to caches, right.

410
00:42:22,320 --> 00:42:23,130
Once this is done.

411
00:42:23,160 --> 00:42:27,930
So this is going to go through from layer 1 to L minus 1.

412
00:42:27,960 --> 00:42:35,760
So I'm gonna come out of the loop and compute for layer L. I'll say linear_activation_forward,

413
00:42:41,790 --> 00:42:53,520
and then I'm gonna take the A returned from here; I'm gonna pass it here, A, and then

414
00:42:53,790 --> 00:42:59,790
parameters; this time I'm looking for layer capital L:

415
00:43:04,780 --> 00:43:15,160
'W' plus str of L, and then the bias as well; I'll say parameters,

416
00:43:20,850 --> 00:43:31,140
'b' plus str of capital L, like this, and then the activation is going to be sigmoid.

417
00:43:45,220 --> 00:43:46,450
right.

418
00:43:46,540 --> 00:43:49,630
So this is also going to return two objects.

419
00:43:51,160 --> 00:43:57,070
So this is going to return the activation of the final layer, capital L, and the cache as well.

420
00:43:58,210 --> 00:44:01,890
So this can be the the.

421
00:44:01,970 --> 00:44:07,300
The function is going to return the final activation, the activation of the final layer, and the caches.

422
00:44:07,330 --> 00:44:10,650
So I'll just put that in the return statement here.

423
00:44:10,660 --> 00:44:13,000
Return AL,

424
00:44:13,510 --> 00:44:21,520
and we've got to append this cache to our caches and return caches.

425
00:44:23,050 --> 00:44:30,790
So I'm gonna say caches.append, and we want to append cache,

426
00:44:34,510 --> 00:44:36,190
and then now we can return

427
00:44:41,800 --> 00:44:45,220
AL, caches, like this.
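
[Editor's sketch] Putting the narrated steps together, the complete forward propagation might look like the sketch below. The linear_activation_forward shown here is only a stand-in for the helper written earlier in the lesson; its body and the contents of the cache tuple are assumptions, as are the random demo values. The loop, the 'W' + str(l) lookups, ReLU for layers 1 to L-1, and sigmoid for layer L follow the narration.

```python
import numpy as np

def linear_activation_forward(A_prev, W, b, activation):
    """Stand-in for the earlier helper (assumed): linear step, then activation."""
    Z = W @ A_prev + b
    if activation == "relu":
        A = np.maximum(0, Z)
    elif activation == "sigmoid":
        A = 1 / (1 + np.exp(-Z))
    else:
        raise ValueError("unknown activation: " + activation)
    cache = (A_prev, W, b, Z)  # assumed cache layout
    return A, cache

def model_forward_prop(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2  # number of layers

    # Layers 1 .. L-1 use ReLU
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(
            A_prev, parameters["W" + str(l)], parameters["b" + str(l)], activation="relu"
        )
        caches.append(cache)

    # Final layer L uses sigmoid
    AL, cache = linear_activation_forward(
        A, parameters["W" + str(L)], parameters["b" + str(L)], activation="sigmoid"
    )
    caches.append(cache)
    return AL, caches

# Made-up two-layer example: 4 inputs, 3 hidden units, 1 output, 5 examples
rng = np.random.default_rng(0)
demo_params = {
    "W1": rng.standard_normal((3, 4)) * 0.01, "b1": np.zeros((3, 1)),
    "W2": rng.standard_normal((1, 3)) * 0.01, "b2": np.zeros((1, 1)),
}
X_demo = rng.standard_normal((4, 5))
AL, caches = model_forward_prop(X_demo, demo_params)
print(AL.shape, len(caches))  # (1, 5) 2
```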

428
00:44:46,750 --> 00:44:48,800
Right, before we move on we've got to fix

429
00:44:48,870 --> 00:44:58,600
the quotation marks for accessing the dictionary object; it should be single quotation marks, not double.

430
00:44:58,650 --> 00:45:00,690
I think I've been mixing double and single.

431
00:45:08,520 --> 00:45:10,690
actually.

432
00:45:10,700 --> 00:45:15,230
I think both single and double quotation marks work in Python 3.

433
00:45:15,270 --> 00:45:18,480
So I'll leave it. If we receive an error we can come back and fix it.
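
[Editor's sketch] A quick check of the point about quotation marks: in Python, single- and double-quoted string literals produce the same string, so either style works as a dictionary key. The dictionary below is a made-up placeholder.

```python
params = {"W1": 0.5}  # key written with double quotes

# Both lookups refer to the same key
same_value = params['W1'] == params["W1"]
same_key = ('W' + str(1)) == ("W" + str(1))
print(same_value, same_key)  # True True
```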

434
00:45:19,150 --> 00:45:19,430
Okay.

435
00:45:19,470 --> 00:45:21,660
So now we can come back to our predict function.

436
00:45:21,660 --> 00:45:22,600
Now we have this.

437
00:45:22,620 --> 00:45:26,790
We have forward propagation, the complete forward propagation.

438
00:45:26,850 --> 00:45:32,490
Okay, so we are back to our predict function.

439
00:45:32,700 --> 00:45:42,130
We get the number of test examples, and we get the number of layers.

440
00:45:42,160 --> 00:45:52,380
Now I'm gonna create an array called p. I'll say p equals np.zeros, initialized to zeros.

441
00:45:52,810 --> 00:45:55,590
Its size is gonna be equal to the number of test examples:

442
00:45:55,590 --> 00:45:56,520
1, m.

443
00:45:56,820 --> 00:45:59,070
So as you can see this is a row vector.

444
00:46:01,770 --> 00:46:08,100
And once that is done I'm going to perform the forward propagation or call our newly created function

445
00:46:08,190 --> 00:46:09,730
model_forward_prop.

446
00:46:10,050 --> 00:46:11,620
Bring it over here.

447
00:46:11,790 --> 00:46:19,560
I'm going to pass the X that the user passes over here as the first argument.

448
00:46:19,800 --> 00:46:23,340
The second argument is gonna be the parameters that the user passes.

449
00:46:25,930 --> 00:46:31,820
right and now um this is going to return.

450
00:46:31,890 --> 00:46:37,160
Of course, it's going to return the activation of the final layer as well as the caches.

451
00:46:37,170 --> 00:46:39,030
So I'm going to store this in

452
00:46:42,330 --> 00:46:43,620
probabilities

453
00:46:52,020 --> 00:46:53,770
and then caches.

454
00:46:57,470 --> 00:46:57,890
right.

455
00:46:57,900 --> 00:47:03,370
So activation of the final layer is gonna be stored in a list called probabilities.

456
00:47:03,420 --> 00:47:08,330
We're gonna take the probabilities and convert them to 0 and 1 based on a threshold.

457
00:47:08,520 --> 00:47:14,860
We say: if the probability is greater than 0.5, convert it to one; if it's less, convert it to zero.

458
00:47:15,780 --> 00:47:18,870
So that way we know whether it's a cat or not.

459
00:47:19,050 --> 00:47:31,770
So I'll say for i in range 0 comma; it's a long word, probabilities, so I'll paste it over here,

460
00:47:31,860 --> 00:47:36,100
dot shape, index one.

461
00:47:36,750 --> 00:47:38,310
I'll say if

462
00:47:42,950 --> 00:47:44,240
if probabilities

463
00:47:49,990 --> 00:47:56,340
index 0, i, is greater than 0.5,

464
00:47:57,700 --> 00:48:00,300
then probabilities index 0,

465
00:48:00,310 --> 00:48:07,020
i equals one, else

466
00:48:10,440 --> 00:48:13,570
probabilities index 0,

467
00:48:13,590 --> 00:48:18,710
i equals zero, like this.

468
00:48:20,650 --> 00:48:21,450
Okay.

469
00:48:21,840 --> 00:48:25,890
Now we can print the accuracy if we want.

470
00:48:26,820 --> 00:48:30,060
Um actually we we are going to okay.

471
00:48:30,090 --> 00:48:37,860
We're returning to the probabilities, but we're going to store the converted ones, the ones that

472
00:48:37,860 --> 00:48:41,270
we convert to binary 0 or 1, we're going to store in p.

473
00:48:41,280 --> 00:48:48,000
I'm gonna copy the p over here, and then if the probability is greater than 0.5 I'm gonna store

474
00:48:48,000 --> 00:49:00,690
this in p, then this in p as well. Right, we can compute the total accuracy and print it out: print

475
00:49:07,520 --> 00:49:08,300
accuracy

476
00:49:12,130 --> 00:49:16,850
and then str, then np.sum,

477
00:49:21,980 --> 00:49:33,830
and then we do p == y, then we divide this by m, which is the number of test examples, and we

478
00:49:33,830 --> 00:49:38,680
can return p as well; we can return the entire list.
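
[Editor's sketch] The predict function described above can be sketched as follows. The model_forward_prop here is a compact stand-in so the example runs on its own (its internals are assumptions), and the weights, inputs, and labels are made-up values chosen so the result is easy to check by hand. The 0.5 threshold, the p array of shape (1, m), and the accuracy computed from p == y divided by m follow the narration.

```python
import numpy as np

def model_forward_prop(X, parameters):
    """Compact stand-in (assumed): ReLU for layers 1..L-1, sigmoid for layer L."""
    caches = []
    A = X
    L = len(parameters) // 2
    for l in range(1, L + 1):
        Z = parameters["W" + str(l)] @ A + parameters["b" + str(l)]
        caches.append((A, Z))
        A = np.maximum(0, Z) if l < L else 1 / (1 + np.exp(-Z))
    return A, caches

def predict(X, y, parameters):
    m = X.shape[1]            # number of test examples
    n = len(parameters) // 2  # number of layers (computed as narrated, not used below)
    p = np.zeros((1, m))      # row vector of predictions

    probabilities, caches = model_forward_prop(X, parameters)

    # Threshold each probability at 0.5
    for i in range(0, probabilities.shape[1]):
        if probabilities[0, i] > 0.5:
            p[0, i] = 1
        else:
            p[0, i] = 0

    print("Accuracy: " + str(np.sum(p == y) / m))
    return p

# Made-up two-layer example: final pre-activations are 2, -3, -1
demo_params = {
    "W1": np.eye(2), "b1": np.zeros((2, 1)),
    "W2": np.array([[1.0, -1.0]]), "b2": np.zeros((1, 1)),
}
X_test = np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 2.0]])
y_test = np.array([[1, 0, 1]])
p = predict(X_test, y_test, demo_params)
```

With these values the predictions are 1, 0, 0, so two of the three labels match.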

479
00:49:39,290 --> 00:49:44,580
Okay so we've got all the functions we need right.

480
00:49:48,400 --> 00:49:49,540
Okay okay.

481
00:49:49,610 --> 00:49:50,890
Looks good.

482
00:49:51,710 --> 00:49:57,590
So we've got all the functions we need, and we will find out if these functions

483
00:49:57,590 --> 00:50:03,800
work, or if there are errors, when we call them in our neural network function.

484
00:50:03,830 --> 00:50:10,100
So in the next lesson we shall start building our first neural network with two layers.

485
00:50:10,100 --> 00:50:13,980
The first one that we saw was a logistic regression model.

486
00:50:14,520 --> 00:50:17,790
And this time we do it as an actual neural network.

487
00:50:17,820 --> 00:50:25,580
The first one took a logistic regression approach using, you know, neural networks, but this

488
00:50:25,580 --> 00:50:32,250
one is our classic two-layer neural network, and we shall build this in the next lesson, and we shall make

489
00:50:32,250 --> 00:50:38,120
use of the functions that we've created in this script here, known as nn_lib.

490
00:50:38,660 --> 00:50:43,960
So if you have any questions just send me a message and I'll see you in the next lesson.

491
00:50:44,000 --> 00:50:51,000
So before we move on to use these functions in our next script,

492
00:50:51,440 --> 00:50:51,600
Yeah.

493
00:50:51,610 --> 00:50:52,480
there are a number of errors

494
00:50:52,520 --> 00:50:59,240
I actually found while just looking at the file. For instance, over here I was supposed to return the

495
00:50:59,240 --> 00:51:01,870
activation, but I ended up returning Z.

496
00:51:01,970 --> 00:51:05,410
So don't make this error; just fix it.

497
00:51:05,510 --> 00:51:10,420
The Z being returned should be changed to A; it was Z here.

498
00:51:10,880 --> 00:51:12,200
I meant to return A.

499
00:51:12,230 --> 00:51:15,140
Not Z; we don't need to return Z. Z is the argument.

500
00:51:15,230 --> 00:51:20,540
So this is just a typo and there's another error somewhere.

501
00:51:21,100 --> 00:51:22,240
Over here, the W:

502
00:51:22,250 --> 00:51:30,410
make sure it's capital W. If you type small w, it is different from capital W; the arguments accept

503
00:51:30,410 --> 00:51:31,630
the capital W.

504
00:51:31,700 --> 00:51:32,660
So that's all there is.

505
00:51:32,780 --> 00:51:34,030
Sorry about this.

506
00:51:34,210 --> 00:51:38,300
Yeah okay right.

507
00:51:38,390 --> 00:51:39,980
So I'll see you in the next lesson.