1
00:00:00,620 --> 00:00:00,940
Hello.

2
00:00:00,950 --> 00:00:01,610
Welcome back.

3
00:00:01,610 --> 00:00:05,240
In this lesson we are going to give a short revision of calculus.

4
00:00:05,240 --> 00:00:11,390
This will help you picture backpropagation and gradient descent more easily in future

5
00:00:11,390 --> 00:00:12,890
lessons.

6
00:00:12,920 --> 00:00:17,090
You do not need to understand calculus to work with deep learning.

7
00:00:17,090 --> 00:00:22,730
In fact, I would say the majority of people working with deep learning nowadays don't even understand the

8
00:00:22,760 --> 00:00:29,480
calculus behind backpropagation and gradient descent, apart from the hardcore researchers.

9
00:00:29,480 --> 00:00:32,260
So, let's begin.

10
00:00:35,260 --> 00:00:39,470
This function over here says f(a) = 3a.

11
00:00:39,490 --> 00:00:45,970
This means we get the result of the function f(a) by multiplying the number 3 by whatever the value

12
00:00:45,970 --> 00:00:47,280
of a is.

13
00:00:47,770 --> 00:00:48,840
In this graph,

14
00:00:48,830 --> 00:00:51,130
over here, the x-axis represents a.

15
00:00:55,080 --> 00:01:01,930
The x-axis represents a, and the y-axis represents the values of f(a).

16
00:01:01,980 --> 00:01:05,010
For example, let's say a equals 2;

17
00:01:08,200 --> 00:01:10,660
then f(a) is going to be 6,

18
00:01:10,660 --> 00:01:19,080
since we multiply 2 by 3 to get f(a), as we have over here. All right.

19
00:01:19,170 --> 00:01:30,430
And, for example, let's say we increase a by a tiny amount, such as 0.001. If we

20
00:01:30,430 --> 00:01:34,450
increase a by 0.001 and compute f(a).

21
00:01:34,480 --> 00:01:43,500
This is what we get: 6.003, since f(a) = 3a, and we are going

22
00:01:43,500 --> 00:01:49,860
to multiply 3 by 2.001. Right.

23
00:01:51,910 --> 00:01:57,790
We can see that whatever a is, f(a) is three times that value.

24
00:01:58,330 --> 00:02:06,880
When we plot our a values and the corresponding f(a) values, this is what we get: this graph over

25
00:02:06,880 --> 00:02:13,990
here. As you can see, we've got 2 here, and when we increase it we get 2.001. We've got 6 here,

26
00:02:14,010 --> 00:02:21,640
and when we increase a we get 6.003. So when we compute f(2.001),

27
00:02:21,640 --> 00:02:30,670
we get 6.003. Right. We can compute the slope of f(a) by dividing

28
00:02:30,670 --> 00:02:37,900
the vertical change by the horizontal change. Over here, we have named the vertical change "height", as you

29
00:02:38,050 --> 00:02:45,360
can see, and the horizontal change has been named "width". The height is 0.003

30
00:02:45,400 --> 00:02:53,870
and the width is 0.001. When we perform this division, we arrive at the answer 3, as

31
00:02:53,870 --> 00:02:58,130
we have over here, and it is the same thing we get over here.

32
00:02:58,130 --> 00:03:09,170
This is often referred to as df(a)/da, the derivative of f(a) with respect to a, and this is how we write it:

33
00:03:09,170 --> 00:03:10,900
df/da = 3.
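To make the height-over-width idea concrete, here is a minimal Python sketch that reproduces the numbers above, assuming the same starting point a = 2 and the same nudge of 0.001. The names `f`, `nudge`, `height`, and `width` are just illustrative choices, not anything from the slides.

```python
def f(a):
    """The example function from the lesson: f(a) = 3a."""
    return 3 * a

a = 2.0
nudge = 0.001                 # the tiny increase applied to a

height = f(a + nudge) - f(a)  # vertical change: about 6.003 - 6 = 0.003
width = nudge                 # horizontal change: 0.001
slope = height / width        # height divided by width, i.e. df/da

print(f"f({a}) = {f(a)}")
print(f"f({a + nudge:.3f}) = {f(a + nudge):.3f}")
print(f"df/da is approximately {slope:.6f}")
```

Because f(a) = 3a is a straight line, the ratio comes out as 3 (up to floating-point rounding) whatever starting point or nudge size you pick.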

34
00:03:11,030 --> 00:03:17,570
Also, I should point out that you don't always need to plot a graph in order to compute derivatives;

35
00:03:17,600 --> 00:03:22,120
we will define basic rules for finding derivatives later on in the lesson.

36
00:03:22,340 --> 00:03:24,120
Let's take a look at another example.

37
00:03:24,170 --> 00:03:28,390
Over here, we have the function y = x^2. Right.

38
00:03:28,490 --> 00:03:34,970
Another way to find the derivative is the change in y over the change in x, which is the same thing as what

39
00:03:34,970 --> 00:03:41,300
we spoke of earlier, but we can write it with the Greek letter delta (Δ) to represent change.

40
00:03:41,300 --> 00:03:48,260
So, delta y over delta x (Δy/Δx). Over here, we have a number of x values and the corresponding y values from this

41
00:03:48,260 --> 00:03:49,130
graph.

42
00:03:49,130 --> 00:03:56,570
If x is 1, given our equation y = x^2, then y will be 1.

43
00:03:56,570 --> 00:03:59,780
Also, if x is 3, y will be 9.

44
00:03:59,780 --> 00:04:06,180
If we find delta x, meaning the change in x, we get 2. If we find delta y, the change in y,

45
00:04:06,200 --> 00:04:13,490
we get 8. If we perform delta y over delta x, we get 4. This is the slope, delta y over delta

46
00:04:13,580 --> 00:04:16,670
x: 8 divided by 2 equals 4.
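Strictly speaking, delta y over delta x measured over a finite interval is the average slope between the two points (the slope of the straight line joining them). For y = x^2 the derivative is 2x, which changes with x, so the value 4 matches the derivative only at x = 2, the midpoint of the interval. The Python sketch below, with illustrative names `y` and `average_slope`, computes the lesson's numbers and then shrinks an interval starting at x = 1 to show the ratio closing in on the derivative there.

```python
def y(x):
    """The example function from the lesson: y = x^2."""
    return x ** 2

def average_slope(func, x0, x1):
    """Delta y over delta x for func between x0 and x1 (slope of the secant line)."""
    return (func(x1) - func(x0)) / (x1 - x0)

delta_x = 3 - 1            # change in x: 2
delta_y = y(3) - y(1)      # change in y: 9 - 1 = 8
print(delta_y / delta_x)   # 8 / 2 = 4.0

# Shrinking the interval that starts at x = 1: the ratio equals 2 + h here,
# so it tends towards 2, which is 2x evaluated at x = 1.
for h in (1.0, 0.1, 0.001):
    print(h, average_slope(y, 1, 1 + h))
```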

47
00:04:17,030 --> 00:04:24,410
Right. Now let's take a look at some rules of derivatives. The power rule states that to find the derivative

48
00:04:24,530 --> 00:04:27,160
of a function x raised to the power n, x^n,

49
00:04:27,290 --> 00:04:28,400
the answer becomes

50
00:04:28,400 --> 00:04:30,860
n times x raised to the power n minus 1, that is, n x^(n - 1).

51
00:04:32,150 --> 00:04:39,980
If n equals 0, then the derivative is 0, because the derivative of a constant is equal

52
00:04:39,980 --> 00:04:41,450
to zero.
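If you'd like to convince yourself of the power rule, a rough numerical check works well. The sketch below assumes hypothetical helpers `power_rule` and `numerical_derivative`; the second one uses the central-difference formula (f(x + h) - f(x - h)) / 2h, a standard approximation that is not part of the lesson itself.

```python
def power_rule(n):
    """Derivative of x**n via the power rule: d/dx x^n = n * x^(n - 1)."""
    return lambda x: n * x ** (n - 1)

def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

x = 1.5  # any nonzero x; at x = 0 the n = 0 case would evaluate 0 ** -1 and fail
for n in (0, 1, 2, 3, 5):
    exact = power_rule(n)(x)
    approx = numerical_derivative(lambda t: t ** n, x)
    print(f"n={n}: power rule {exact:.6f}, numerical {approx:.6f}")
```

The n = 0 row shows the constant case: x^0 is 1 everywhere, so both columns give 0.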

53
00:04:41,450 --> 00:04:42,450
Right.

54
00:04:42,470 --> 00:04:51,170
If we want to find the derivative of a constant a multiplied by a function f(x), we can find the derivative

55
00:04:51,290 --> 00:04:58,460
of the function f(x) and then multiply the answer by the constant a. The derivative of a function f

56
00:04:58,460 --> 00:05:02,540
of x is also sometimes written as f prime of x, f'(x). The prime

57
00:05:02,540 --> 00:05:06,050
is this apostrophe sign you have up here.

58
00:05:06,200 --> 00:05:08,960
So that's another way of writing the derivative.

59
00:05:08,960 --> 00:05:14,910
The derivative of the sum of two functions f(x) and g(x) is equal to the

60
00:05:14,930 --> 00:05:22,910
derivative of f(x) plus the derivative of g(x), as shown in these two examples over here.
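The sum rule can be spot-checked numerically too. This sketch assumes the example functions f(x) = x^3 and g(x) = 4x (chosen for illustration, not the ones on the slide) and a central-difference helper, so the numbers agree only up to rounding.

```python
def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    return x ** 3      # f'(x) = 3x^2 by the power rule

def g(x):
    return 4 * x       # g'(x) = 4

x = 2.0
lhs = numerical_derivative(lambda t: f(t) + g(t), x)           # derivative of the sum
rhs = numerical_derivative(f, x) + numerical_derivative(g, x)  # sum of the derivatives
print(lhs, rhs, 3 * x ** 2 + 4)   # all three are about 16
```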

61
00:05:23,030 --> 00:05:23,330
Right.

62
00:05:23,360 --> 00:05:29,630
You can pause the video and take a look at the examples. We already mentioned that the derivative of a constant

63
00:05:29,960 --> 00:05:31,820
equals zero.

64
00:05:31,820 --> 00:05:32,390
Right.

65
00:05:32,540 --> 00:05:40,790
The derivative of a function multiplied by a constant equals the derivative of the function alone, then

66
00:05:40,790 --> 00:05:42,080
multiplied by the constant.

67
00:05:42,080 --> 00:05:47,810
So if we have a function multiplied by a constant and we want to find its derivative, we first have to

68
00:05:47,810 --> 00:05:53,060
find the derivative of the function alone, and then we multiply the answer by the constant.
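Here is a similar sketch for the constant multiple rule, assuming the constant a = 5 and f(x) = x^2 purely for illustration, and again using a central-difference estimate rather than an exact derivative.

```python
def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    return x ** 2            # f'(x) = 2x by the power rule

a = 5                        # the constant multiplier
x = 3.0

lhs = numerical_derivative(lambda t: a * f(t), x)   # derivative of a * f(x)
rhs = a * numerical_derivative(f, x)                # a times the derivative of f(x)
print(lhs, rhs, a * 2 * x)   # all three are about 30
```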

69
00:05:53,400 --> 00:05:55,490
The difference rule works the same way.

70
00:05:55,580 --> 00:06:01,670
It works like the sum rule, which we've already seen: we said that the derivative of the sum of two

71
00:06:01,670 --> 00:06:09,880
functions, f(x) plus g(x), equals the derivative of f(x) plus the derivative of g(x), and the difference rule

72
00:06:10,070 --> 00:06:17,870
works exactly the same way as the sum rule, just with a minus sign. The product rule over here says that the derivative

73
00:06:17,870 --> 00:06:24,200
of the product of two functions is equal to the derivative of the first function multiplied by the second

74
00:06:24,200 --> 00:06:32,360
function, plus the first function multiplied by the derivative of the second function. The derivative of

75
00:06:32,360 --> 00:06:41,340
sin x equals cos x, and the derivative of cos x equals minus sin x. Right.
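As a rough check of the product rule, the sketch below assumes f(x) = x^2 and g(x) = sin x (illustrative choices) and compares a central-difference estimate of the derivative of f(x) g(x) against f'(x) g(x) + f(x) g'(x). It uses only Python's standard `math` module.

```python
import math

def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    return x ** 2            # f'(x) = 2x

def g(x):
    return math.sin(x)       # g'(x) = cos(x)

x = 1.2
numeric = numerical_derivative(lambda t: f(t) * g(t), x)
# Product rule: f'(x) * g(x) + f(x) * g'(x)
product_rule = 2 * x * math.sin(x) + x ** 2 * math.cos(x)
print(numeric, product_rule)
```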

76
00:06:41,390 --> 00:06:48,770
And the derivative of e raised to the power x, where e is Euler's number, is the same thing: e raised to the power x. The chain

77
00:06:48,770 --> 00:06:53,350
rule is the one rule that is heavily applied in backpropagation.
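Before moving on to the chain rule, the three standard derivatives quoted above (sin, cos, and e^x) can be spot-checked numerically. This is only a sanity check using a central-difference approximation and Python's `math` module, not a proof.

```python
import math

def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

for x in (-1.0, 0.0, 2.0):
    # d/dx sin x = cos x
    print("sin:", numerical_derivative(math.sin, x), math.cos(x))
    # d/dx cos x = -sin x
    print("cos:", numerical_derivative(math.cos, x), -math.sin(x))
    # d/dx e^x = e^x
    print("exp:", numerical_derivative(math.exp, x), math.exp(x))
```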

78
00:06:53,390 --> 00:07:01,730
It states that we apply the chain rule by multiplying the derivative of the outside function by the

79
00:07:01,730 --> 00:07:03,100
derivative of the inside function.

80
00:07:03,290 --> 00:07:15,230
It is often expressed as df/dx = (df/dy)(dy/dx). Let's say we want to find the derivative of a

81
00:07:15,230 --> 00:07:25,090
function over here: the derivative of bracket open, 3x + 2x^2, bracket

82
00:07:25,160 --> 00:07:27,690
closed, all squared, that is, (3x + 2x^2)^2.

83
00:07:27,980 --> 00:07:36,860
We first find the derivative of the entire function, which gives us 2 times (3x + 2x^2),

84
00:07:37,490 --> 00:07:44,570
and once we've done that, we have to find the derivative of the content inside the

85
00:07:45,050 --> 00:07:46,060
bracket.

86
00:07:47,030 --> 00:07:49,400
And that gives us 3 + 4x.

87
00:07:49,650 --> 00:07:52,970
So first we find the derivative of the entire function.

88
00:07:53,310 --> 00:07:57,220
And then we multiply it by the derivative of the content in the brackets.

89
00:07:57,220 --> 00:08:05,520
Remember, the entire function is bracket open, 3x + 2x^2, bracket closed, all squared.

90
00:08:05,550 --> 00:08:07,680
That is the entire function.

91
00:08:07,680 --> 00:08:13,350
The content of the bracket alone is 3x + 2x^2.

92
00:08:14,220 --> 00:08:20,140
So: the derivative of the entire function, multiplied by the derivative of the content of the bracket.

93
00:08:20,190 --> 00:08:22,370
That is the chain rule. Right.
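Putting the pieces together, with u = 3x + 2x^2 the chain rule gives d/dx (3x + 2x^2)^2 = 2(3x + 2x^2)(3 + 4x). The Python sketch below, with illustrative names `inner`, `outer`, and `chain_rule_derivative`, multiplies the outer and inner derivatives exactly as described and compares the result against a central-difference estimate. This multiply-the-local-derivatives step is the pattern that backpropagation applies over and over.

```python
def numerical_derivative(func, x, h=1e-6):
    """Central-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def inner(x):
    return 3 * x + 2 * x ** 2          # content of the bracket: u = 3x + 2x^2

def outer(u):
    return u ** 2                      # the outside function: u^2

def chain_rule_derivative(x):
    u = inner(x)
    d_outer = 2 * u                    # derivative of u^2 with respect to u
    d_inner = 3 + 4 * x                # derivative of 3x + 2x^2 with respect to x
    return d_outer * d_inner           # 2(3x + 2x^2) * (3 + 4x)

x = 1.0
print(chain_rule_derivative(x))                              # 2 * 5 * 7 = 70.0
print(numerical_derivative(lambda t: outer(inner(t)), x))    # approximately 70
```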
