1
00:00:00,120 --> 00:00:03,240
So we have just defined the error, a function of the fit.

2
00:00:03,660 --> 00:00:08,560
But to be honest, for our algorithm, we don't need the error function itself.

3
00:00:08,580 --> 00:00:12,330
So this was just a practice to define the error function.

4
00:00:12,420 --> 00:00:16,680
What we need instead is the gradient of the error function.

5
00:00:17,250 --> 00:00:22,020
So basically the partial derivatives with respect to our coefficients.

6
00:00:22,860 --> 00:00:26,820
And the reason is our algorithm will work as follows.

7
00:00:27,600 --> 00:00:34,380
We start from some value a zero, which will, for example, be this array here and it will give us

8
00:00:34,650 --> 00:00:35,880
an incorrect fit.

9
00:00:36,660 --> 00:00:42,240
This, we could find out by calculating the error of its value, and it will be a very large number,

10
00:00:42,270 --> 00:00:48,000
much larger than this one, because this one is actually the value that we got for the correct parameters.

11
00:00:49,440 --> 00:00:54,440
And what will happen then is we will loop and in the loop.

12
00:00:54,450 --> 00:01:02,700
We will update our values, our parameters, our coefficients, and we will update them in a way so

13
00:01:02,700 --> 00:01:05,040
that the error function will be decreased.

14
00:01:05,250 --> 00:01:07,230
So it will get smaller and smaller.

15
00:01:08,190 --> 00:01:14,970
And to do this, there exists several reasonable methods and we will later in this course discuss something

16
00:01:14,970 --> 00:01:16,950
called Monte Carlo algorithm.

17
00:01:16,960 --> 00:01:18,660
This is something we could use here.

18
00:01:19,200 --> 00:01:23,670
We would exploit just randomness to decrease the error.

19
00:01:24,210 --> 00:01:28,230
But what I want to show you here is a very typical method that is used very often.

20
00:01:28,240 --> 00:01:31,800
It's called gradient descent or gradient descent methods.

21
00:01:32,760 --> 00:01:39,510
So the thing is, maybe you know this from mathematics when you have a function that depends on several

22
00:01:39,510 --> 00:01:46,710
arguments say, for example, on X and Y, then you can draw something like height map, like when you

23
00:01:46,710 --> 00:01:51,930
have a map, when you go hiking and you have a map that shows you a mountain.

24
00:01:52,980 --> 00:01:58,920
And the idea on this map is that you can draw at every point.

25
00:01:59,280 --> 00:02:06,330
The so-called gradient and the gradient when you're calculated will always point along the direction

26
00:02:06,330 --> 00:02:08,009
of the largest slope.

27
00:02:08,610 --> 00:02:10,880
So when you have the steepest way.

28
00:02:11,310 --> 00:02:14,400
So this will be the fastest way to climb up the mountain.

29
00:02:15,450 --> 00:02:20,100
So this will be a vector, and the components of this vector will be the partial derivatives.

30
00:02:20,940 --> 00:02:23,790
So I think you have heard of this, but if not, it's not a big deal.

31
00:02:24,180 --> 00:02:26,440
We are not really here for the math.

32
00:02:26,610 --> 00:02:29,730
I just give you the math and we are here for the programming, of course.

33
00:02:30,630 --> 00:02:35,810
So the idea would be to calculate the gradient of the error function.

34
00:02:35,820 --> 00:02:37,230
This is what is written down here.

35
00:02:37,240 --> 00:02:39,600
So this is called the novella operator.

36
00:02:40,020 --> 00:02:48,060
And I wrote Down the gradient consists out of elements that are all these partial derivatives of this

37
00:02:48,060 --> 00:02:48,840
error function.

38
00:02:48,960 --> 00:02:53,090
So we have to calculate the derivative with respect to a key.

39
00:02:53,100 --> 00:02:57,090
So there will be several terms of this are a function here.

40
00:02:57,690 --> 00:03:05,310
So we will derive this one respect to a key where the acts are in the function f and then we have this

41
00:03:05,310 --> 00:03:05,820
vector.

42
00:03:06,090 --> 00:03:12,750
There's Nublar acting on the error function and this will be the steepest increase of the error.

43
00:03:13,410 --> 00:03:16,650
So to decrease it, we have to go along the opposite direction.

44
00:03:17,190 --> 00:03:18,360
And this is the whole idea.

45
00:03:18,960 --> 00:03:25,590
We take out parameters a zero, calculate the gradient, go along the opposite direction and update

46
00:03:25,590 --> 00:03:26,550
our parameters.

47
00:03:27,120 --> 00:03:28,580
And then we do it again.

48
00:03:28,590 --> 00:03:34,650
We calculate once again the gradient and we loop over this process several times, and this should bring

49
00:03:34,650 --> 00:03:38,250
us down to the minimum value for the error itself.

50
00:03:38,490 --> 00:03:40,800
And these will then be our final parameters.

51
00:03:42,730 --> 00:03:48,630
So as I wrote, the gradient consists of several elements, which are these partial derivatives, and

52
00:03:48,630 --> 00:03:54,690
what we have to do is we have to calculate the derivative of this function with respect to A.K.

53
00:03:55,650 --> 00:03:58,320
So as I said, these acres are only in the F.

54
00:03:58,740 --> 00:04:06,720
So first of all, we have to calculate this derivative for using the chain rule, and the outer derivative

55
00:04:06,720 --> 00:04:09,720
would be two times this function here.

56
00:04:09,870 --> 00:04:11,370
And of course, the sun sign.

57
00:04:12,030 --> 00:04:15,510
And then we just have to multiply by the inner derivative.

58
00:04:16,230 --> 00:04:23,040
So we have two times the sum of this difference, and then we have the inner derivative, which would

59
00:04:23,040 --> 00:04:25,230
be minus.

60
00:04:25,440 --> 00:04:32,730
And then the narrative of this one would be the river to f with respect to a K.

61
00:04:33,450 --> 00:04:37,950
And if you remember, I will scroll up here a bit to the definition of our model function.

62
00:04:37,980 --> 00:04:45,210
This was the function, and if we derive this with respect to a K, we only get a single term from the

63
00:04:45,210 --> 00:04:48,840
sum and this would be X to the power of K, of course.

64
00:04:50,220 --> 00:04:52,200
So we have now.

65
00:04:56,140 --> 00:04:59,470
Here, the yeah x to the power of K.

66
00:05:00,280 --> 00:05:05,950
And since we have to some sign over the data points, then it also has the Index II, which corresponds

67
00:05:05,950 --> 00:05:06,850
to the data points.

68
00:05:07,600 --> 00:05:13,570
So basically, we just have to calculate our program this term here, which will be one component of

69
00:05:13,570 --> 00:05:14,230
the gradient.

70
00:05:14,680 --> 00:05:22,090
And then the the individual components just differ by the key where key goes from zero to three in our

71
00:05:22,090 --> 00:05:22,450
case.

72
00:05:22,930 --> 00:05:24,160
So it's always the same.

73
00:05:24,160 --> 00:05:28,180
It just differs here by to the power of zero, one, two and three.

74
00:05:29,440 --> 00:05:30,790
So let's go ahead and do this.

75
00:05:31,480 --> 00:05:41,650
We will just right, define our own fit gradient and we have a function f.

76
00:05:42,520 --> 00:05:53,140
We have four coefficients and we have our data and f is to fit to fit function.

77
00:05:53,620 --> 00:05:55,840
The so this is the same thing as here.

78
00:05:56,140 --> 00:06:03,220
So we use to set exactly the same parameters or the same arguments as in the error function.

79
00:06:04,420 --> 00:06:14,560
So now what we have to do is we have to sum up over some terms so we can, of course, take again the

80
00:06:14,560 --> 00:06:16,240
loop over the four.

81
00:06:16,720 --> 00:06:21,910
So we're using a formal, for example, and then looping over not over the error, but looping over

82
00:06:22,360 --> 00:06:26,740
the the gradient or over the partial derivatives.

83
00:06:27,280 --> 00:06:30,880
But I want what I want to do here instead is I want to show you something else.

84
00:06:31,120 --> 00:06:35,860
I want to show you how you can do this even in a single line of code.

85
00:06:36,820 --> 00:06:43,510
And this is because we have to return here, not a single value, but a vector.

86
00:06:43,840 --> 00:06:48,550
So as I said, this one is just a single component of this gradient.

87
00:06:49,000 --> 00:06:54,190
So our actual return will be an array, so it will be and p dot parade.

88
00:06:57,050 --> 00:07:04,070
And will be some court, and then we write for K in range.

89
00:07:05,480 --> 00:07:07,760
Length of coefficients.

90
00:07:09,560 --> 00:07:13,460
So in our case, this will just be a vector with four components.

91
00:07:13,460 --> 00:07:19,340
But since I want to do it here in a very general way where you could also later on say we use a polynomial

92
00:07:19,340 --> 00:07:22,370
of six order, we have to write it this way.

93
00:07:23,330 --> 00:07:27,230
So now just to make it look a bit nicer, we can, for example, write it like this.

94
00:07:28,160 --> 00:07:32,900
And I think know maybe this and then it should work.

95
00:07:32,910 --> 00:07:35,690
And now here we have the actual code from here.

96
00:07:36,710 --> 00:07:38,720
So we write minus two.

97
00:07:38,810 --> 00:07:42,830
This one, we can also write here, but we can also write it here.

98
00:07:42,830 --> 00:07:43,460
It doesn't matter.

99
00:07:44,000 --> 00:07:45,160
Simply write minus two.

100
00:07:45,170 --> 00:07:46,550
And then we just have to say.

101
00:07:48,420 --> 00:07:58,350
Some over these expressions here and now, it would be very tedious to do this using a loop.

102
00:07:58,740 --> 00:08:09,720
So what I do instead is I will sum up over an array, so I write and prod some over and don't pray that

103
00:08:09,720 --> 00:08:11,940
the array will be some values.

104
00:08:11,940 --> 00:08:18,690
And then the index runs for I in range length.

105
00:08:20,580 --> 00:08:25,950
Peter and once again, remember, if you want to have the number of data points, you have to take the

106
00:08:25,950 --> 00:08:28,800
length of data zero and not length of data.

107
00:08:28,830 --> 00:08:30,360
Otherwise, this will just be two.

108
00:08:30,780 --> 00:08:31,980
And no, it's 21.

109
00:08:32,940 --> 00:08:34,500
And now we have to sum.

110
00:08:34,919 --> 00:08:37,080
And now we just have to write this one.

111
00:08:37,860 --> 00:08:38,520
So I write

112
00:08:42,299 --> 00:08:42,929
data.

113
00:08:45,340 --> 00:08:56,380
One Croma eye, which will be the Y component of the IWF data point mines have off date zero comma I

114
00:08:57,850 --> 00:08:59,650
comma coefficients.

115
00:09:02,770 --> 00:09:08,860
And then we multiply by this one day to zero.

116
00:09:09,250 --> 00:09:15,910
Come on, I took the power off key and I think this should be it.

117
00:09:16,990 --> 00:09:20,140
Let's see what happens if we call this our own.

118
00:09:20,770 --> 00:09:23,680
So once again, problem here.

119
00:09:23,690 --> 00:09:28,660
So our fit gradient polynomial.

120
00:09:31,340 --> 00:09:32,750
Come up a zero.

121
00:09:33,000 --> 00:09:42,950
Come on data, um, polynomial, of course, or function is not called polynomial, it's called polynomial

122
00:09:42,950 --> 00:09:43,370
model.

123
00:09:44,480 --> 00:09:52,880
And you see, it is indeed a function that returns an array with four numbers.

124
00:09:53,240 --> 00:09:58,460
So, yeah, at least in terms of the shape of the solution, it looks good.

125
00:09:59,150 --> 00:10:07,670
So this is here, our solution or our gradient of the error function at this particular point.

126
00:10:08,720 --> 00:10:12,660
And we have just programmed here a vector gradient.

127
00:10:12,740 --> 00:10:19,550
So the Knobloch acting on the error function and every component is given by this expression where we

128
00:10:19,550 --> 00:10:23,150
have to change here, the key value when we go to a different component.

129
00:10:23,900 --> 00:10:30,050
And in the next step, we will use this function and use it over and over again to update the value

130
00:10:30,050 --> 00:10:30,590
of a.

