1
00:00:00,750 --> 00:00:01,280
Hello.

2
00:00:01,320 --> 00:00:01,860
Welcome back.

3
00:00:01,860 --> 00:00:07,520
In this lesson we're going to write a function to help us initialize our weight and our bias.

4
00:00:07,530 --> 00:00:09,360
The B parameter.

5
00:00:09,360 --> 00:00:15,930
I'm going to delete the comment from here to create a bit more space and then I'm going to put a function

6
00:00:15,930 --> 00:00:16,980
up here.

7
00:00:17,150 --> 00:00:17,850
I'll say def

8
00:00:22,270 --> 00:00:23,140
initialize

9
00:00:27,210 --> 00:00:32,510
underscore zeros, so initialize_zeros, and then it's going to take the dimension.

10
00:00:32,510 --> 00:00:33,410
Simple as this.

11
00:00:33,740 --> 00:00:37,400
And then what we want to do is use NumPy.

12
00:00:37,400 --> 00:00:43,780
We can simply say w, b, which are going to be the return values.

13
00:00:44,000 --> 00:00:49,870
And then we say over here np.zeros.

14
00:00:50,060 --> 00:00:53,570
And over here we can simply

15
00:00:57,140 --> 00:01:04,270
pass dimension by one over here and then zero.

16
00:01:04,910 --> 00:01:14,300
Once this is done we can simply return w and b, so return w, b here, and then we can test this out.

17
00:01:14,600 --> 00:01:18,980
I can come over here and say

18
00:01:19,890 --> 00:01:23,370
weight, bias equals

19
00:01:24,530 --> 00:01:31,940
just the name of the function, initialize_zeros, and then I'm going to say a dimension, I'm gonna pass a dimension

20
00:01:31,940 --> 00:01:37,610
of 3, and then what I'm going to do next is I'm gonna say print

21
00:01:40,600 --> 00:01:46,140
"weight equals" plus str(weight),

22
00:01:51,110 --> 00:01:51,830
then print

23
00:01:55,290 --> 00:01:56,840
"bias equals" plus str of

24
00:02:05,840 --> 00:02:07,270
bias like this.

25
00:02:07,470 --> 00:02:08,870
Okay let's run this and see

26
00:02:13,220 --> 00:02:14,790
Why are we still printing this?

27
00:02:14,920 --> 00:02:17,840
Anyway as you can see the weights have three dimensions.

28
00:02:17,840 --> 00:02:21,300
The bias has a single dimension right.

29
00:02:21,320 --> 00:02:23,630
Okay looks good.
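
[Editor's sketch] The initialization function and the quick test described above might look roughly like this. It assumes NumPy is imported as np and that the weight is a column vector of shape (dimension, 1) with a scalar bias; the print labels are illustrative.

```python
import numpy as np


def initialize_zeros(dimension):
    # Weight vector of shape (dimension, 1), filled with zeros.
    w = np.zeros((dimension, 1))
    # The bias starts out as the scalar zero.
    b = 0
    return w, b


# Quick check with a dimension of 3, as in the lesson.
weight, bias = initialize_zeros(3)
print("weight = " + str(weight))
print("bias = " + str(bias))
```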

30
00:02:23,630 --> 00:02:25,700
I forgot to clean this print.

31
00:02:26,450 --> 00:02:26,780
Okay

32
00:02:31,300 --> 00:02:32,620
I'm gonna clean this now.

33
00:02:33,200 --> 00:02:34,020
Okay.

34
00:02:34,250 --> 00:02:41,510
So once this is done um we can go on to write our forward propagation function but we're going to need

35
00:02:41,510 --> 00:02:42,740
a sigmoid there.

36
00:02:43,280 --> 00:02:48,100
So we have to write a function to implement the sigmoid in Python.

37
00:02:48,110 --> 00:02:49,160
So let's do that now.

38
00:02:49,220 --> 00:02:55,790
I'm gonna come down here and write a function called sigmoid.

39
00:02:55,910 --> 00:03:05,990
This function is gonna take a variable x, and then what we're going to do is say that, um, we can say

40
00:03:06,620 --> 00:03:24,710
y equals 1.0 divided by one plus the NumPy exponential, np.exp, of minus x, like this, and we can return y

41
00:03:29,510 --> 00:03:30,050
right.

42
00:03:30,140 --> 00:03:39,440
So this is the sigmoid function, and we saw this already. This is the equation for computing the sigmoid;

43
00:03:39,530 --> 00:03:41,700
it's the same thing we've implemented here.

44
00:03:41,810 --> 00:03:51,260
1 over 1 plus e raised to the power of minus the argument passed to the function. Over here it says z; we are passing

45
00:03:51,260 --> 00:03:52,640
x, it's fine.
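
[Editor's sketch] A minimal version of the sigmoid function as described, assuming NumPy is imported as np. The sample inputs are illustrative only.

```python
import numpy as np


def sigmoid(x):
    # 1 / (1 + e^(-x)); works element-wise on NumPy arrays as well as on scalars.
    y = 1.0 / (1.0 + np.exp(-x))
    return y


print(sigmoid(0.0))                          # 0.5
print(sigmoid(np.array([-2.0, 0.0, 2.0])))   # values strictly between 0 and 1
```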

46
00:03:52,640 --> 00:03:53,400
Okay.

47
00:03:53,540 --> 00:03:53,990
Right.

48
00:03:54,350 --> 00:04:01,010
So once this is done we can go on and write our forward propagation function.

49
00:04:01,160 --> 00:04:02,060
Actually, you know what,

50
00:04:02,070 --> 00:04:03,320
we don't need to

51
00:04:03,320 --> 00:04:03,920
write

52
00:04:03,920 --> 00:04:08,680
forward and backward propagation, you know, separately.

53
00:04:08,720 --> 00:04:11,600
How about we just put it all together.

54
00:04:11,990 --> 00:04:17,740
We can implement it in one function. I'll say def.

55
00:04:17,920 --> 00:04:19,520
I'll call this propagation

56
00:04:22,740 --> 00:04:26,910
and then we're going to pass the weight, the weight vector,

57
00:04:26,910 --> 00:04:33,750
the bias, which is a scalar value, our input, and then our labels, which are the Y values.

58
00:04:33,990 --> 00:04:35,920
Right.

59
00:04:36,210 --> 00:04:46,600
And let's start by extracting the, um, the m value, so m equals X.shape, and then we're gonna extract

60
00:04:46,600 --> 00:04:47,760
it from here like this.

61
00:04:48,120 --> 00:04:54,960
Once that is done we can perform the activation and we saw how to compute the activation.

62
00:04:54,960 --> 00:04:56,060
Let's take a look at how it was.

63
00:04:59,030 --> 00:04:59,450
right.

64
00:04:59,460 --> 00:05:09,030
We said we compute z by taking the dot product of the weight and the input x here, the dot product of

65
00:05:09,030 --> 00:05:14,340
the input x and the transpose; this T here implies the transpose of the weight.

66
00:05:14,610 --> 00:05:21,220
Once we've taken the dot product of the input x and the transpose of the weight, we add the bias b, and

67
00:05:21,240 --> 00:05:22,470
then we get Z.

68
00:05:22,740 --> 00:05:30,140
When we take z and we pass it as an argument to the sigmoid function, then we get the value of a. Here

69
00:05:30,220 --> 00:05:36,540
it is showing y hat because it is the final a value, because it's the final one, but this is how we

70
00:05:36,540 --> 00:05:37,320
get the activation.

71
00:05:37,320 --> 00:05:40,680
This is the final activation; that is why it's written here as y hat.

72
00:05:41,280 --> 00:05:48,090
Okay, so we're going to implement w transpose x plus b, and then pass it through the sigmoid, and then we'll

73
00:05:48,090 --> 00:05:51,130
find a, right?

74
00:05:51,700 --> 00:05:52,030
Okay.

75
00:05:52,050 --> 00:05:58,040
For those of you who don't understand why I said a when it says y hat here:

76
00:05:58,080 --> 00:06:02,890
So I'm saying a here and y hat are the same.

77
00:06:03,000 --> 00:06:08,160
I've written y hat, as we saw in the theoretical class; it's y hat here because it's the final activation

78
00:06:08,220 --> 00:06:09,640
of the neural network.

79
00:06:09,730 --> 00:06:14,990
I'll show you a slide that shows that this is the same as a.

80
00:06:15,000 --> 00:06:22,800
This is the slide from our workout neural network, where we're computing hours of workout and hours

81
00:06:22,800 --> 00:06:23,480
of rest.

82
00:06:23,550 --> 00:06:31,710
Over here, as you see, we have a here and then we have y hat. That's because over here we had a hidden

83
00:06:31,710 --> 00:06:34,850
layer before the final activation there.

84
00:06:34,860 --> 00:06:39,060
That is why we have a and y hat. But in our logistic regression

85
00:06:39,060 --> 00:06:41,640
we have just one activation.

86
00:06:41,700 --> 00:06:42,120
Okay.

87
00:06:43,800 --> 00:06:44,130
Okay.

88
00:06:44,160 --> 00:06:51,280
So this is what our neural network, our logistic regression neural network, looks like. This is our

89
00:06:51,280 --> 00:06:57,900
image; we convert it to a vector like we've done already, and then we feed it into a neural network, we

90
00:06:57,900 --> 00:06:58,280
feed it.

91
00:06:58,290 --> 00:07:05,040
This is the input layer, and then we just pass through a single activation, and then we predict: if it's

92
00:07:05,250 --> 00:07:10,510
greater than, if it's greater than zero point five, it means it's a cat; if it's less than zero point

93
00:07:10,510 --> 00:07:10,920
five.

94
00:07:10,920 --> 00:07:11,970
It's not a cat.

95
00:07:12,030 --> 00:07:15,880
So this is what our neural network looks like; that is why over here

96
00:07:15,900 --> 00:07:17,740
y hat is the same as a.

97
00:07:17,850 --> 00:07:24,060
We don't have a1 and then a2, which is the same as y hat. If we had a hidden layer,

98
00:07:24,090 --> 00:07:30,300
then we would have a first activation, and after that activation, the activation after that one, which

99
00:07:30,300 --> 00:07:32,770
could be the final one, would be our y hat.

100
00:07:32,990 --> 00:07:39,930
But over here we simply have our input layer, and then we multiply by the weight and add the bias, and then

101
00:07:39,930 --> 00:07:43,290
we pass it through our activation function.

102
00:07:43,440 --> 00:07:43,900
Right.

103
00:07:43,920 --> 00:07:48,470
So this is our neural network. Let's continue with implementing the propagation.

104
00:07:48,600 --> 00:07:53,830
So we came over here we said we've got to compute the activation which is a.

105
00:07:54,420 --> 00:08:00,210
So I'm going to say A equals, and we're going to call the sigmoid function we wrote, sigmoid. And then, we

106
00:08:00,210 --> 00:08:08,490
said we compute the z value by taking the transpose of the weights and then computing a dot product

107
00:08:08,490 --> 00:08:11,650
with the weight transpose and the inputs.

108
00:08:11,670 --> 00:08:19,920
Once we've done that we add the bias, and that gives us z. To get a, we pass z to the sigmoid function

109
00:08:19,980 --> 00:08:21,660
and the output is a.

110
00:08:21,780 --> 00:08:28,350
So over here I'm gonna call our sigmoid function, and the argument here is np.dot, dot for the dot

111
00:08:28,350 --> 00:08:35,140
product operation, and then I do w.T for the transpose of the weight, and then X over here.

112
00:08:35,190 --> 00:08:45,900
So this will give us, you know, this will compute the weight transpose dot product X, and after that, plus b

113
00:08:45,900 --> 00:08:49,820
here and then all of this will be applied.

114
00:08:50,010 --> 00:08:54,390
We apply the sigmoid to all of this, and what comes out is the a value.
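
[Editor's sketch] The activation step described above, as a small runnable example. It assumes X holds one example per column, so its shape is (features, m), and w is a column vector of shape (features, 1); the toy numbers are made up for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Hypothetical toy data: 2 features, 3 examples (one example per column).
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 0.5, -2.0]])
w = np.zeros((2, 1))
b = 0

# z = w^T X + b, then a = sigmoid(z); b is broadcast across all examples.
A = sigmoid(np.dot(w.T, X) + b)
print(A.shape)  # (1, 3): one activation per example
print(A)        # all 0.5 here, because the weights and bias are zero
```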

115
00:08:54,480 --> 00:08:57,530
Once this is done we can compute our cost.

116
00:08:57,840 --> 00:09:00,000
We saw the equation for our cost function.

117
00:09:00,000 --> 00:09:00,840
Let's take a look.

118
00:09:01,110 --> 00:09:02,690
Let's take a look let's look at it.

119
00:09:04,710 --> 00:09:08,040
So this is the equation for our cost function.

120
00:09:08,040 --> 00:09:11,030
We are going to write this equation in Python right.

121
00:09:12,060 --> 00:09:17,020
Okay so um the return is gonna be stored.

122
00:09:17,020 --> 00:09:17,910
Here it is.

123
00:09:18,090 --> 00:09:23,320
We said let's see let us see again.

124
00:09:25,170 --> 00:09:30,690
So it starts with minus 1 over m m is the number of training examples and then we take the summation

125
00:09:30,690 --> 00:09:32,990
of this entire thing.

126
00:09:34,320 --> 00:09:35,150
Okay.

127
00:09:35,660 --> 00:09:37,480
Oh 2.

128
00:09:38,280 --> 00:09:40,890
Sorry, I've got to go minimize this a bit.

129
00:09:41,150 --> 00:09:42,670
Place our equation here.

130
00:09:45,080 --> 00:09:59,780
Okay, so I'm gonna come here and I'm going to say minus 1 over m times, and then I'm going to use NumPy,

131
00:09:59,780 --> 00:10:10,370
np.sum, that would give me the summation, np.sum, and what I'm gonna do is open brackets,

132
00:10:10,460 --> 00:10:19,280
and within these brackets I'll start off with Y times log A

133
00:10:22,420 --> 00:10:24,570
and A is the same as y hat.

134
00:10:24,610 --> 00:10:25,390
So remember that.

135
00:10:25,390 --> 00:10:25,950
Yeah.

136
00:10:26,090 --> 00:10:44,180
Y times log A, plus 1 minus Y, times np.log of 1 minus A.

137
00:10:44,560 --> 00:10:45,060
Like this.

138
00:10:45,070 --> 00:10:45,860
Let's verify.

139
00:10:48,150 --> 00:11:02,010
We have minus one over m, times np.sum, brackets open, Y times np.log of A, plus 1 minus

140
00:11:03,510 --> 00:11:05,700
Y, times...

141
00:11:06,410 --> 00:11:07,130
Y here.

142
00:11:07,140 --> 00:11:15,780
That should be A: np.log of one minus A.

143
00:11:16,010 --> 00:11:19,910
Okay I'm gonna open this again.

144
00:11:20,300 --> 00:11:20,760
Right.

145
00:11:20,780 --> 00:11:26,270
So this is what we have; this is our cost function, like we saw in the theoretical class.
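
[Editor's sketch] The cost expression spoken above, written out on made-up values. The label and activation arrays are hypothetical, and shape (1, m) is an assumption about how the lesson stores them.

```python
import numpy as np

# Hypothetical labels and activations for m = 3 examples, shape (1, m).
Y = np.array([[1, 0, 1]])
A = np.array([[0.9, 0.2, 0.7]])
m = Y.shape[1]

# Cross-entropy cost: -(1/m) * sum( Y*log(A) + (1 - Y)*log(1 - A) )
cost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
print(cost)
```

Note that np.log(0) is minus infinity, so activations of exactly 0 or 1 would break this expression; the lesson's version does not guard against that and neither does this sketch.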

146
00:11:27,050 --> 00:11:27,730
Okay.

147
00:11:27,800 --> 00:11:38,120
So once this is done we've got to perform the back propagation, we've got to compute dw and db, and

148
00:11:38,120 --> 00:11:40,710
we can take a look at the slide again for those.

149
00:11:43,310 --> 00:11:52,210
So we compute dw by 1 over m, dz times X transpose, over here.

150
00:11:52,460 --> 00:11:53,400
Right.

151
00:11:53,480 --> 00:11:54,730
And then um okay.

152
00:11:54,740 --> 00:12:01,310
I should mention, if you've not seen this slide before, then it would be that I may have rearranged

153
00:12:01,430 --> 00:12:08,030
the course, but I would try to keep the current arrangement of the course. The current arrangement of

154
00:12:08,030 --> 00:12:13,430
the course implies that we have studied this in the theoretical class before getting into this practical

155
00:12:13,430 --> 00:12:20,840
part, but oftentimes after I upload the course, students send suggestions regarding how the arrangement

156
00:12:20,840 --> 00:12:27,500
of the course should go, and I tend to end up rearranging stuff, but I'll endeavour to put

157
00:12:27,590 --> 00:12:33,440
all the theory, the theoretical lessons, at the top so that you go through them before you get to the

158
00:12:33,440 --> 00:12:34,330
practical part.

159
00:12:34,340 --> 00:12:40,880
So there's gonna be some form of long theoretical lessons before coming to the practical part.

160
00:12:41,590 --> 00:12:48,680
Um, yeah, so, um, this is a coding lesson, but I hope you get the point. Anyway, like we saw earlier,

161
00:12:48,710 --> 00:12:54,110
if you've seen the theoretical part about how to perform back propagation, this is how we compute

162
00:12:54,260 --> 00:13:01,850
dw, this is the equation for dw, and we know dz; we compute dz by simply doing A minus Y.

163
00:13:02,300 --> 00:13:02,890
Right.

164
00:13:03,620 --> 00:13:15,590
So I'm gonna minimize this, and I'm gonna come down here and say dw equals 1 over m times np.dot,

165
00:13:27,710 --> 00:13:38,720
X comma, and then dz is A minus Y, transpose; that's what we saw in the equation. And we compute db as well,

166
00:13:39,020 --> 00:13:50,270
like we saw in the theoretical class: 1 over m times np dot... 1 over m times, let's verify that, db

167
00:13:50,310 --> 00:14:01,160
is the summation of dz, okay, np.sum, and then dz is simply A minus Y, right.

168
00:14:01,190 --> 00:14:08,060
So this is going to perform the back propagation for us. Then once that is done we are going to, I'm

169
00:14:09,230 --> 00:14:18,470
going to squeeze it. I'm gonna say cost equals np.squeeze, and this will squeeze the, um, the dimensions

170
00:14:19,130 --> 00:14:19,730
for us

171
00:14:22,610 --> 00:14:33,320
and then once that is done we are going to store this in a dictionary type. I'll say grads; this is

172
00:14:33,470 --> 00:14:38,810
so that we can simply use the word dw and it will fetch dw for us, and we can do the same

173
00:14:41,870 --> 00:14:42,530
for db.

174
00:14:47,560 --> 00:14:51,630
Okay, we store this as a key-value pair.

175
00:14:53,050 --> 00:14:53,420
Yeah.

176
00:14:53,530 --> 00:14:54,880
dw, db, and then

177
00:14:57,760 --> 00:15:00,550
we're going to call this gradient.

178
00:15:00,880 --> 00:15:03,760
So then we can simply return the gradients and the cost.

179
00:15:10,400 --> 00:15:13,040
I see an indentation error here

180
00:15:18,480 --> 00:15:20,270
right.

181
00:15:20,310 --> 00:15:23,430
So this is going to be our propagation function.

182
00:15:23,430 --> 00:15:31,260
We go through forward propagation, we calculate the cost, and then we compute the gradients.
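
[Editor's sketch] Putting the pieces from this lesson together, the propagation function might look roughly like this. It assumes X has shape (features, m) with one example per column, so m is taken from X.shape[1]; the transcript does not say which index is used. The dictionary keys "dw" and "db" follow the names spoken in the lesson, and the toy data is made up.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def propagation(w, b, X, Y):
    m = X.shape[1]  # number of training examples (assumed: one per column)

    # Forward propagation: activation, then cost.
    A = sigmoid(np.dot(w.T, X) + b)
    cost = (-1 / m) * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # Backward propagation: dz = A - Y.
    dw = (1 / m) * np.dot(X, (A - Y).T)
    db = (1 / m) * np.sum(A - Y)

    # Drop any size-one dimensions so the cost is a plain scalar.
    cost = np.squeeze(cost)

    grads = {"dw": dw, "db": db}
    return grads, cost


# Hypothetical toy data: 2 features, 3 examples.
w, b = np.zeros((2, 1)), 0
X = np.array([[1.0, 2.0, -1.0],
              [3.0, 0.5, -2.0]])
Y = np.array([[1, 0, 1]])

grads, cost = propagation(w, b, X, Y)
print("dw = " + str(grads["dw"]))
print("db = " + str(grads["db"]))
print("cost = " + str(cost))
```

Returning the gradients in a dictionary means the update step in the next lesson can look them up by name instead of relying on the order of returned values.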

183
00:15:31,260 --> 00:15:31,620
Okay

184
00:15:34,750 --> 00:15:40,570
so in the next lesson we're going to write a function that would learn from the data and update the

185
00:15:40,570 --> 00:15:44,050
weight and bias to sort of reduce error.

186
00:15:44,200 --> 00:15:51,060
We're going to write a function that will perform the gradient descent and sort of learn, or create, our optimized

187
00:15:51,160 --> 00:15:54,190
model for us.

188
00:15:54,190 --> 00:15:54,700
Okay.

189
00:15:54,760 --> 00:15:56,880
So in the next lesson we're going to continue.
