1
00:00:04,350 --> 00:00:09,660
In this video, we will look at how to estimate a constant from a series of noisy measurements

2
00:00:09,660 --> 00:00:11,110
that relate to that constant.

3
00:00:11,730 --> 00:00:17,460
For example, we might have an engine and we might have a temperature sensor attached to the engine.

4
00:00:18,090 --> 00:00:21,740
Now we want to estimate the temperature of the engine using the sensor measurements.

5
00:00:22,170 --> 00:00:26,430
However, the quality of the sensor might not be all that good. Each temperature

6
00:00:26,430 --> 00:00:28,180
measurement might be very noisy.

7
00:00:28,950 --> 00:00:34,410
So the true temperature of the engine might look like this, but each of the measurements that we get

8
00:00:34,410 --> 00:00:36,440
from the sensor might be very noisy.

9
00:00:36,450 --> 00:00:37,490
It might be scattered like this.

10
00:00:37,620 --> 00:00:41,670
Every time we query the sensor, we're going to get a different measurement.

11
00:00:43,090 --> 00:00:48,310
And because of this, we want to take multiple measurements with the sensor to try to get a better estimate

12
00:00:48,310 --> 00:00:50,070
of what the true temperature really is.

13
00:00:52,440 --> 00:00:58,050
So let's put this into mathematical terms. Suppose that we have K measurements, where each measurement

14
00:00:58,050 --> 00:01:04,680
is denoted by Y_i. The temperature measurements in this case are going to be a function of the quantity X,

15
00:01:04,950 --> 00:01:09,730
which is the true temperature, and they are corrupted by some noise V_i.

16
00:01:10,650 --> 00:01:16,980
Now we will assume that V_i is random noise, and this noise has zero mean and is uncorrelated white

17
00:01:16,980 --> 00:01:17,440
noise.

18
00:01:18,030 --> 00:01:22,830
This just means that the expected value for the random variable, which is the noise, is going to be

19
00:01:22,830 --> 00:01:23,660
equal to zero.

20
00:01:24,840 --> 00:01:29,550
Now, to estimate the true temperature from all the measurements, a reasonable idea would be just to

21
00:01:29,550 --> 00:01:32,340
average all the temperature measurements together that have been made.

22
00:01:33,660 --> 00:01:37,380
So if we sum up all the measurements, we're going to have an equation that looks like this.

23
00:01:37,950 --> 00:01:45,030
So the mean measurement, denoted by Y-bar, is just going to be one over K times the sum of all

24
00:01:45,030 --> 00:01:46,220
the different Y_i values.

25
00:01:46,770 --> 00:01:50,600
So we substitute our measurements into this equation here.

26
00:01:51,300 --> 00:01:52,650
We're going to end up with this.

27
00:01:53,070 --> 00:01:58,350
Now we can see that X here is going to be a constant; it doesn't change with each term of the summation.

28
00:01:58,390 --> 00:02:00,420
So we can actually bring it out the front.

29
00:02:01,500 --> 00:02:06,210
So we can see here that the mean of all the sensor measurements is going to be equal to the

30
00:02:06,210 --> 00:02:11,130
true temperature X plus one over K times the summation of all the random noise.

31
00:02:11,670 --> 00:02:15,370
Now, we know from probability that this operator here looks very familiar.

32
00:02:15,450 --> 00:02:17,720
This is actually just the expectation operator.

33
00:02:18,240 --> 00:02:22,360
And we know that the expectation of a random variable that has zero

34
00:02:22,360 --> 00:02:24,560
mean is just going to be equal to zero.

35
00:02:25,050 --> 00:02:30,930
So this is why we can say that the best estimate of the true temperature, so the best estimate being

36
00:02:31,440 --> 00:02:35,700
X-hat here, is just going to be the mean of all the measurements. This is

37
00:02:35,700 --> 00:02:41,370
because if we have enough measurements, the expected value of all the noise values is going to be equal

38
00:02:41,370 --> 00:02:44,980
to zero, since it is zero mean, and we'll be left with just the true temperature.

39
00:02:46,290 --> 00:02:51,570
This relies on the fact that if we take enough measurements, the average value of all the noise should

40
00:02:51,570 --> 00:02:52,460
be zero mean.

41
00:02:52,620 --> 00:02:53,970
So it should all cancel out.

42
00:02:54,540 --> 00:02:59,370
The number of measurements that need to be made will depend on the required accuracy of the estimates

43
00:02:59,370 --> 00:03:02,580
and the size of the noise distribution from the sensor.

44
00:03:03,930 --> 00:03:09,510
Let's extend the estimation of a constant scalar in the previous example to a constant vector.

45
00:03:10,170 --> 00:03:15,300
So suppose that we have the temperature sensor again attached to the engine, but this time we're going

46
00:03:15,300 --> 00:03:19,890
to run the engine at different RPMs, so different speeds, and we're going to see how the temperature changes.

47
00:03:20,700 --> 00:03:23,790
So when we do this, we end up with a graph that looks like this.

48
00:03:23,790 --> 00:03:27,450
We can see the temperature increasing as the engine speed increases.

49
00:03:28,290 --> 00:03:33,060
So now, instead of estimating a single temperature, what we want to do is we want to estimate this

50
00:03:33,060 --> 00:03:33,910
relationship here.

51
00:03:34,620 --> 00:03:42,180
So we want to work out how the temperature changes with the RPM of the engine. This relationship can

52
00:03:42,180 --> 00:03:45,090
be written as the equation of a line.

53
00:03:45,420 --> 00:03:52,920
So the temperature Y is going to be a function of a line parameter X1 times the RPM, plus X2,

54
00:03:52,920 --> 00:03:54,330
another parameter of the line.

55
00:03:54,360 --> 00:03:57,580
So this is going to be the slope of the line, this value up here.

56
00:03:57,870 --> 00:04:01,350
This is going to be the bias of the line, so how far it's shifted from the origin.

57
00:04:02,160 --> 00:04:05,830
So using these two parameters here, we can come up with a line of best fit.

58
00:04:06,270 --> 00:04:11,940
So somehow we have to estimate X1 and X2 from this data set here, from the set of measurements.

59
00:04:13,960 --> 00:04:19,210
So, again, putting this into mathematical terms, we're going to end up with K measurements, and

60
00:04:19,210 --> 00:04:26,140
each temperature measurement Y_i is going to be a function of the RPM measurement, the noise,

61
00:04:26,800 --> 00:04:31,540
and the line parameter for the slope and the line parameter for the bias.

62
00:04:33,570 --> 00:04:38,670
And again, we make the assumption that the noise on the measurement is zero mean, uncorrelated, and

63
00:04:38,670 --> 00:04:44,730
it's white noise. So we can take all the equations for all the different measurements we've made and

64
00:04:44,730 --> 00:04:46,620
put them into a matrix equation.

65
00:04:47,200 --> 00:04:51,390
So this matrix here, Y, is going to be a vector of all the different measurements.

66
00:04:52,020 --> 00:04:56,670
H is going to be the measurement matrix, which is going to describe the linear system that we want

67
00:04:56,670 --> 00:04:57,420
to estimate.

68
00:04:58,020 --> 00:05:03,090
X is going to be the thing that we're actually estimating and V is going to be the vector of all the

69
00:05:03,090 --> 00:05:03,660
noise.

70
00:05:04,560 --> 00:05:07,510
So if you put that into a matrix equation, it's going to look like this.

71
00:05:08,040 --> 00:05:12,990
So this is just a matrix way of writing out all these simultaneous equations.

72
00:05:16,520 --> 00:05:21,200
Now, this is the equation which we would like to solve. Now, if we knew all the different terms

73
00:05:21,200 --> 00:05:25,430
exactly, then we could just rearrange this equation and solve for X directly.

74
00:05:26,240 --> 00:05:29,650
But the problem with this is that we have this random variable here.

75
00:05:29,660 --> 00:05:36,230
We have this noise vector and we don't know what the noise vector is, but we do know some properties

76
00:05:36,230 --> 00:05:37,070
of the noise vector.

77
00:05:37,100 --> 00:05:42,490
We know that the expected value of the random variable for noise is going to be equal to zero.

78
00:05:43,070 --> 00:05:45,960
So we can use this property to come up with an estimate.

79
00:05:46,910 --> 00:05:52,880
So instead of solving for X directly, we actually want to solve for an estimated X, and we can do this

80
00:05:52,880 --> 00:05:54,470
by looking at the measurement residual.

81
00:05:55,430 --> 00:06:00,110
So if you rearrange this equation here, instead of trying to solve for X, we're going to solve for

82
00:06:00,110 --> 00:06:00,900
X-hat.

83
00:06:00,950 --> 00:06:03,490
So this is going to be an estimated value of X.

84
00:06:04,160 --> 00:06:10,070
Now as we make this measurement residual here go towards zero, it basically means that we're going

85
00:06:10,070 --> 00:06:12,710
to minimize all the effects of the noise.

86
00:06:13,610 --> 00:06:19,640
So this equation here is basically just a rearranging of the matrix equation to work out the different

87
00:06:20,090 --> 00:06:22,520
measurement residuals for each of the different measurements.

88
00:06:24,050 --> 00:06:29,420
So we define a cost function as the sum of the squares of all the different measurement residuals.

89
00:06:29,970 --> 00:06:31,610
So that's this equation here.

90
00:06:31,610 --> 00:06:38,000
And this equation here can be written out in a vector form as just the residual vector transpose times

91
00:06:38,000 --> 00:06:39,620
the residual vector again.

92
00:06:39,980 --> 00:06:45,500
So this value here, J, is going to be a scalar, and it's going to be based on the size of the measurement

93
00:06:45,500 --> 00:06:46,210
residuals.

94
00:06:46,730 --> 00:06:52,160
So the larger the residuals, the larger the cost function; the smaller the cost function, the smaller

95
00:06:52,160 --> 00:06:52,750
the residuals.

96
00:06:53,450 --> 00:07:01,010
So the idea behind least squares estimation is to minimize J such that our estimated value for X goes

97
00:07:01,010 --> 00:07:02,660
towards the real value of X.

98
00:07:04,830 --> 00:07:09,390
So now let's have a look at how we mathematically go about minimizing the cost function.

99
00:07:09,990 --> 00:07:13,100
So the first thing we want to do is expand the cost function.

100
00:07:13,110 --> 00:07:18,090
So if you write out the original equation for the cost function, it is a function of the measurement

101
00:07:18,090 --> 00:07:18,910
residual vector.

102
00:07:19,410 --> 00:07:24,600
So if we substitute in an equation for the residual vector, we end up with this equation here.

103
00:07:25,260 --> 00:07:30,950
And again, we can multiply all these terms out and we end up with the full expansion of the cost function.

104
00:07:31,650 --> 00:07:37,740
So we can see that the cost function, our scalar value J, is going to be a function of the measurement

105
00:07:37,740 --> 00:07:42,480
vector Y, our estimate X-hat, and our measurement matrix H.

106
00:07:43,530 --> 00:07:49,410
So what we want to do is find the value of X-hat that minimizes this whole equation

107
00:07:49,410 --> 00:07:49,770
here.

108
00:07:51,870 --> 00:07:57,000
So now we want to find the minimum of the cost function and we can do that in a very similar way as

109
00:07:57,000 --> 00:08:03,200
we would find the minimum of any function. The first thing we want to do is differentiate the cost function.

110
00:08:03,900 --> 00:08:09,870
So if we take this function here and we differentiate it with respect to X-hat, we come up with this equation

111
00:08:09,870 --> 00:08:10,200
here.

112
00:08:11,570 --> 00:08:17,390
So now, from our maths, we know that if we want to find the minimum, we can set

113
00:08:17,390 --> 00:08:20,430
the derivative equal to zero to find a stationary point.

114
00:08:20,780 --> 00:08:26,360
And since this equation is a quadratic equation, we know that there should be only one stationary point,

115
00:08:26,360 --> 00:08:27,770
which is going to be the minimum point.

116
00:08:28,340 --> 00:08:34,190
So we can substitute in zero for this term here and we can rearrange this equation to work out what

117
00:08:34,190 --> 00:08:34,790
X-hat is.

118
00:08:35,120 --> 00:08:39,530
And if we do that, we end up with the least squares equation here.

119
00:08:40,580 --> 00:08:43,390
So this equation is for the least squares solution.

120
00:08:43,790 --> 00:08:46,910
This is the equation that minimizes the sum of the squared error.

121
00:08:47,480 --> 00:08:54,110
For this equation to be tractable, the matrix H here must be full rank, and the inverse of this matrix

122
00:08:54,110 --> 00:08:55,250
here must exist.

123
00:08:56,150 --> 00:09:02,000
So the number of measurements K must be greater than or equal to the number of elements in the X vector that we

124
00:09:02,000 --> 00:09:02,810
want to estimate.

125
00:09:05,780 --> 00:09:11,810
So let's have a look at a concrete example. We've taken a number of sensor measurements of the temperature

126
00:09:11,810 --> 00:09:14,190
at different RPMs and we have a table here.

127
00:09:14,870 --> 00:09:16,310
So these are the temperature measurements.

128
00:09:16,430 --> 00:09:18,380
These are the RPM measurements.

129
00:09:18,650 --> 00:09:23,600
So we can form a Y vector, which is just going to be the elements of the temperature sensor measurements.

130
00:09:24,140 --> 00:09:26,270
And we can form our H matrix.

131
00:09:26,270 --> 00:09:31,630
And we know that the linear relationship that we are trying to estimate is going to be the RPM

132
00:09:32,720 --> 00:09:37,430
and a one, so that when we multiply this equation out here, we get the equation of a line.

133
00:09:37,430 --> 00:09:44,000
So we end up with this matrix here where we have the RPM measurements one, two, three, four and five.

134
00:09:44,360 --> 00:09:50,660
And then we have the constant bias elements, just a column of ones, so that when we multiply this

135
00:09:50,900 --> 00:09:58,430
matrix out here, we're going to get X1 times R1 plus X2, and the same thing here, we're going to get X1 times

136
00:09:58,430 --> 00:10:01,070
R2 plus X2.

137
00:10:02,960 --> 00:10:07,260
Once we have these two equations here, we can go about calculating the least squares solution.

138
00:10:08,040 --> 00:10:13,740
So the first thing we want to do is take the H matrix and multiply H-transpose times H.

139
00:10:14,820 --> 00:10:18,270
If we use the matrix from the previous slide, we're going to come up with this equation here.

140
00:10:18,990 --> 00:10:23,270
Now, the next step in the solution is to work out the inverse of this matrix.

141
00:10:23,430 --> 00:10:26,130
So the inverse is going to be this equation here.

142
00:10:27,480 --> 00:10:31,160
The next step is to multiply the inverse by H-transpose again.

143
00:10:32,010 --> 00:10:33,180
So we get this solution.

144
00:10:34,290 --> 00:10:39,900
And then the last step in the least squares equation is to multiply this matrix by the measurement vector

145
00:10:39,900 --> 00:10:40,380
y.

146
00:10:40,710 --> 00:10:43,770
And again, so this is our final solution here.

147
00:10:44,100 --> 00:10:47,410
This is going to be the parameters for X1 and X2.

148
00:10:48,180 --> 00:10:54,390
So if we take these parameters out from this solution vector, X-hat, we end up with the linear relationship

149
00:10:54,390 --> 00:10:55,340
that we want to estimate.

150
00:10:55,350 --> 00:10:58,530
We want to estimate the temperature versus RPM relationship.

151
00:10:59,130 --> 00:11:03,670
So the temperature for a given RPM is going to be X1, which is 9.1,

152
00:11:04,020 --> 00:11:09,000
so 91 divided by 10, times the RPM, plus our bias term. The bias term

153
00:11:09,000 --> 00:11:14,070
here is 52.7, or 527 divided by 10.

154
00:11:14,400 --> 00:11:16,160
So this is the X1 term here.

155
00:11:16,200 --> 00:11:17,460
This is the X2 term here.

156
00:11:17,910 --> 00:11:23,160
So this equation here is the line of best fit for the data on the previous slide.