1
00:00:00,300 --> 00:00:00,710
Hello.

2
00:00:00,720 --> 00:00:01,590
Welcome back.

3
00:00:01,590 --> 00:00:09,120
In this lesson we shall talk about vectorization. The goal of vectorization is to avoid explicit for

4
00:00:09,120 --> 00:00:10,530
loops.

5
00:00:10,590 --> 00:00:12,380
Let me explain.

6
00:00:12,390 --> 00:00:22,490
We know that to derive z, we compute w transpose x plus b. Let's say we have stored our

7
00:00:22,570 --> 00:00:31,870
w values in a one-dimensional array, or column vector, and our x values also in a one-dimensional

8
00:00:32,080 --> 00:00:36,770
array, which is also known as a column vector, like we can see over here.

9
00:00:37,120 --> 00:00:41,690
To compute this, we would use a for loop to do something like this.

10
00:00:42,200 --> 00:00:47,760
This single block shall be used to compute a single z value.

11
00:00:47,770 --> 00:00:53,190
We are doing this computation element by element.

12
00:00:53,260 --> 00:00:57,400
This means that it takes extremely long to compute.

13
00:00:57,870 --> 00:01:02,980
In programming languages such as Python there are libraries that can be used to perform this kind of

14
00:01:02,980 --> 00:01:09,820
computation over 100 times faster. The NumPy library can be used to do this. In NumPy,

15
00:01:09,820 --> 00:01:16,740
we can compute z without going element by element, simply by computing

16
00:01:17,380 --> 00:01:20,140
np.dot(w, x) plus b.
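As a rough sketch of the comparison described above, the for loop and the np.dot version can be written side by side. The array values and the names z_loop and z_vec are illustrative assumptions, not taken from the lesson's slides.

```python
import numpy as np

# Illustrative values (assumed, not from the lesson)
w = np.array([0.5, -1.0, 2.0])   # weights stored as a 1-D array
x = np.array([1.0, 2.0, 3.0])    # features of one example
b = 0.1                          # bias

# Explicit for loop: accumulate w[i] * x[i] element by element
z_loop = 0.0
for i in range(len(w)):
    z_loop += w[i] * x[i]
z_loop += b

# Vectorized: a single call to np.dot replaces the loop
z_vec = np.dot(w, x) + b
```

Both versions give the same z; the second simply hands the whole multiply-and-sum to NumPy.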

17
00:01:20,620 --> 00:01:27,280
Just by using the np.dot function, the NumPy dot function, we are able to do this.

18
00:01:27,370 --> 00:01:32,690
So this, essentially, is what we call the vectorized implementation using NumPy.

19
00:01:32,770 --> 00:01:41,500
Apart from allowing us to compute the entire vector without for loops, the NumPy library also uses the

20
00:01:42,140 --> 00:01:49,360
SIMD instructions of the processor, which makes it optimized for computations such as this one. SIMD

21
00:01:49,370 --> 00:01:56,610
is spelt S-I-M-D and it stands for single instruction, multiple data.

22
00:01:56,650 --> 00:02:04,320
This allows data to be processed in a parallel manner with fewer processor cycles.

23
00:02:04,420 --> 00:02:07,430
Let's say we have some training examples too.

24
00:02:07,600 --> 00:02:10,930
Let's say we have some training examples over here.

25
00:02:10,930 --> 00:02:17,670
So to compute the prediction for the first training example, we compute z.

26
00:02:17,980 --> 00:02:26,620
And then we use z to find a. Note that over here I am using round brackets in the superscript rather

27
00:02:26,620 --> 00:02:28,000
than square brackets.

28
00:02:28,060 --> 00:02:34,200
This is to show that I'm indicating training example numbers, not layer numbers.

29
00:02:34,960 --> 00:02:39,550
We can compute the predictions for the second and the third by doing the same.

30
00:02:39,730 --> 00:02:41,830
Like I've shown over here.

31
00:02:41,830 --> 00:02:50,260
If we have, let's say, ten thousand training examples, we need to do this ten thousand times. We can complete

32
00:02:50,260 --> 00:02:53,190
this faster by applying vectorization.

33
00:02:53,620 --> 00:02:56,720
Remember each example is a column vector.

34
00:02:56,740 --> 00:03:04,570
In other words, a one-dimensional array. The length of each example is equal to the number of features,

35
00:03:05,290 --> 00:03:08,890
which we call n subscript x.

36
00:03:09,280 --> 00:03:16,780
We can put all our training examples together in a single matrix which we can call matrix capital X

37
00:03:16,780 --> 00:03:17,890
like this.

38
00:03:17,890 --> 00:03:22,630
Such that x superscript 1 represents training

39
00:03:22,630 --> 00:03:27,600
example 1, and x superscript 2 represents training example 2.

40
00:03:27,860 --> 00:03:32,730
I repeat: the first column of the matrix represents the first training example.

41
00:03:32,740 --> 00:03:39,130
The second column represents the second training example, all the way to the last training example, which

42
00:03:39,130 --> 00:03:41,510
we have denoted here as M.

43
00:03:41,680 --> 00:03:47,100
Therefore, the size of this matrix is n subscript x by m.

44
00:03:47,230 --> 00:03:51,700
We can also put all the biases b together in a single row vector.

45
00:03:51,790 --> 00:03:59,380
Having done this, we can compute z for all the training examples at once by performing w transpose

46
00:03:59,380 --> 00:04:06,920
capital X, which contains all our training examples, plus row vector b, which contains the biases for

47
00:04:06,960 --> 00:04:10,570
all the training examples.

48
00:04:10,570 --> 00:04:17,020
If we do this, it will give us a row vector, capital Z, as we can see over here,

49
00:04:17,050 --> 00:04:24,310
which contains the small z's, the z results of all the training examples. And if we take this capital

50
00:04:24,310 --> 00:04:31,220
Z and apply our activation function, we get capital A, which is a row vector containing all the

51
00:04:31,480 --> 00:04:39,540
small a values, and the small a values represent the a

52
00:04:39,640 --> 00:04:41,650
value for each training example.
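A minimal sketch of this step, assuming a sigmoid activation (the lesson only says "our activation function") and random illustrative data. The names n_x and m follow the lesson; everything else is made up for the example.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation function
    return 1.0 / (1.0 + np.exp(-z))

n_x, m = 3, 4                        # 3 features, 4 training examples (illustrative)
rng = np.random.default_rng(0)
X = rng.standard_normal((n_x, m))    # each column is one training example
w = rng.standard_normal((n_x, 1))    # column vector of weights
b = np.full((1, m), 0.5)             # row vector holding the bias for every example

Z = np.dot(w.T, X) + b               # shape (1, m): one z per training example
A = sigmoid(Z)                       # shape (1, m): one a per training example
```

In practice a scalar b would also work here, since NumPy broadcasts it across all m columns.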

53
00:04:41,710 --> 00:04:48,100
So by vectorization we can use these capital letters, just by putting everything together at once and

54
00:04:48,100 --> 00:04:55,570
storing it all in a matrix and then computing, such that we can run it faster. Earlier we deduced that

55
00:04:55,590 --> 00:05:03,800
to find the derivative of the loss with respect to z we compute a minus y, which we call dz. So the dz of training

56
00:05:03,850 --> 00:05:05,620
example 1 is written here

57
00:05:05,650 --> 00:05:14,050
as dz superscript 1, and the dz of training example 2 is written here as dz superscript 2. We can put

58
00:05:14,170 --> 00:05:19,370
all the dz's together in a row vector, which we can call capital dZ.

59
00:05:19,730 --> 00:05:22,140
See over here.

60
00:05:23,930 --> 00:05:29,200
Similarly, we can have all the a's.

61
00:05:29,210 --> 00:05:34,570
We can have all the a's in a row vector, which we call capital A.

62
00:05:34,850 --> 00:05:42,950
We can also have all the y's in a row vector, which we call capital Y. We can compute the dz's

63
00:05:43,040 --> 00:05:50,950
of all our training examples by simply writing down capital dZ equals capital A minus capital Y.

64
00:05:52,490 --> 00:06:00,210
We can compute db of all our training examples by finding the average of dZ, like this.

65
00:06:00,270 --> 00:06:09,670
And finally, we can compute dw of all training examples by finding the average of capital X times dZ transpose.
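The three formulas above might look like this in NumPy. The numbers in X, A and Y are made up for illustration; only the formulas for dZ, db and dw come from the lesson.

```python
import numpy as np

# Illustrative values (assumed): 2 features, 3 training examples
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])      # shape (n_x, m)
A = np.array([[0.8, 0.3, 0.6]])      # activations, shape (1, m)
Y = np.array([[1.0, 0.0, 1.0]])      # labels, shape (1, m)
m = X.shape[1]

dZ = A - Y                           # shape (1, m): dz for every example
db = np.sum(dZ) / m                  # average of dZ
dw = np.dot(X, dZ.T) / m             # shape (n_x, 1): average of X times dZ transpose
```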

66
00:06:09,740 --> 00:06:12,480
We shall see how to compute all of this in code.

67
00:06:12,500 --> 00:06:17,960
But before we do that, let's see an example. Let's say we have a two-layer neural network such as this one

68
00:06:18,020 --> 00:06:19,430
over here.

69
00:06:19,430 --> 00:06:28,250
As you know by now, to find a superscript 2, which is equal to y hat, we first find z superscript 1 and

70
00:06:28,250 --> 00:06:34,430
then use that to find a superscript 1 and then use a superscript 1 to find Z superscript 2.

71
00:06:34,460 --> 00:06:40,980
And then finally use z superscript 2 to find a superscript 2.

72
00:06:41,060 --> 00:06:47,900
This is for just one training example. To execute this in code for m training examples,

73
00:06:47,900 --> 00:06:54,700
this is how we shall implement it, which will essentially perform these steps m times.

74
00:06:55,070 --> 00:07:05,100
Just to remind you again, x1, x2, x3 over here do not represent training examples.

75
00:07:05,150 --> 00:07:08,910
These are the features of each training example.

76
00:07:08,960 --> 00:07:14,310
In other words, each training example has three inputs.

77
00:07:15,020 --> 00:07:23,090
So training example 1 will produce y hat 1, which is the same as a superscript square bracket

78
00:07:23,480 --> 00:07:27,230
2, round bracket 1, as it is written here.

79
00:07:27,770 --> 00:07:30,960
The square brackets indicate

80
00:07:31,050 --> 00:07:35,240
the layer number, and the normal brackets indicate the training

81
00:07:35,270 --> 00:07:36,880
example number.

82
00:07:36,920 --> 00:07:41,440
That is why they all have square bracket 2: all the a's have square bracket 2

83
00:07:41,480 --> 00:07:44,860
over here. In the same way, training example 2 will produce

84
00:07:44,930 --> 00:07:50,090
y hat 2, and training example 3 will produce y hat 3.

85
00:07:50,180 --> 00:07:56,570
So to vectorize this, we take all our training examples and put them in a single matrix called capital

86
00:07:56,570 --> 00:07:57,340
X,

87
00:07:57,410 --> 00:08:05,090
like this. We shall also keep all the z superscript 1 results in a matrix like this, the a superscript

88
00:08:05,090 --> 00:08:13,190
1 results in a matrix like this, as well as the z superscript 2 and a superscript 2 in their own

89
00:08:13,220 --> 00:08:14,210
matrices.

90
00:08:14,210 --> 00:08:16,530
We've not shown those matrices here.

91
00:08:16,790 --> 00:08:23,750
We can then simply take these vectorized matrices and compute a superscript 2, which is the same as

92
00:08:23,750 --> 00:08:28,750
Y hat, as we've shown over here in this snippet named vectorized.
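The snippet itself is only on the slide, so the following is a sketch of what the loop and vectorized versions could look like, not the lesson's own code. It assumes a sigmoid activation in both layers, weight matrices W1 and W2 with one row per unit, and a hidden layer of four units; only the three inputs per example come from the lesson.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation function for both layers
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)
n_x, n_h, m = 3, 4, 5    # 3 inputs as in the lesson; hidden size and m are assumed

X = rng.standard_normal((n_x, m))      # each column is one training example
W1 = rng.standard_normal((n_h, n_x))   # layer 1 weights (hypothetical shapes)
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((1, n_h))     # layer 2 weights
b2 = np.zeros((1, 1))

# Loop version: repeat the four steps once per training example
A2_loop = np.zeros((1, m))
for i in range(m):
    x_i = X[:, i:i + 1]                # column i, kept as shape (n_x, 1)
    z1 = W1 @ x_i + b1
    a1 = sigmoid(z1)
    z2 = W2 @ a1 + b2
    A2_loop[:, i:i + 1] = sigmoid(z2)

# Vectorized version: the same four steps applied to all m columns at once
Z1 = W1 @ X + b1
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)                       # shape (1, m): y hat for every example
```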

93
00:08:28,790 --> 00:08:33,230
In summary this is what a complete gradient descent function looks like.

94
00:08:33,260 --> 00:08:36,710
We compute all the z values and then all the a values.

95
00:08:36,710 --> 00:08:40,760
Then we find dz, dw and db.

96
00:08:40,820 --> 00:08:48,320
Once we have found these derivatives, we go ahead to update our weights and biases. Over here it is in

97
00:08:48,440 --> 00:08:55,190
a for loop because we want to perform multiple iterations of gradient descent. We are using a thousand

98
00:08:55,280 --> 00:08:56,550
iterations here.

99
00:08:56,660 --> 00:08:57,960
As an example.
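Putting those steps together, a gradient descent loop for a single unit might be sketched as follows. The toy data, the sigmoid activation and the learning_rate value are assumptions made for the example; the thousand iterations and the order of the steps follow the lesson.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation function
    return 1.0 / (1.0 + np.exp(-z))

# Toy data (assumed): the label is 1 when the single feature is positive
X = np.array([[-2.0, -1.0, 1.0, 2.0]])   # shape (n_x, m) with n_x = 1
Y = np.array([[0.0, 0.0, 1.0, 1.0]])     # shape (1, m)
m = X.shape[1]

w = np.zeros((X.shape[0], 1))
b = 0.0
learning_rate = 0.1                      # assumed value

for _ in range(1000):                    # a thousand iterations of gradient descent
    Z = np.dot(w.T, X) + b               # all z values
    A = sigmoid(Z)                       # all a values
    dZ = A - Y
    dw = np.dot(X, dZ.T) / m
    db = np.sum(dZ) / m
    w -= learning_rate * dw              # update the weights
    b -= learning_rate * db              # update the bias
```

The only for loop left is the one over iterations; nothing inside it loops over training examples.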

100
00:08:58,130 --> 00:08:59,630
Now let's summarize

101
00:08:59,820 --> 00:09:05,510
all that we've learned about forward propagation and backpropagation so far in the next lesson.