1
00:00:11,100 --> 00:00:17,050
So in this video, we will be answering the question: why are state-space models something of interest?

2
00:00:17,790 --> 00:00:22,560
Let's start with what the main points are and then the rest of this video will be expanding on these

3
00:00:22,560 --> 00:00:23,190
points.

4
00:00:24,000 --> 00:00:29,640
OK, so one important application of state-space models is that they allow us to model a wide variety

5
00:00:29,640 --> 00:00:31,050
of real-world systems.

6
00:00:31,440 --> 00:00:37,630
These can be electrical systems, mechanical systems, economic systems and even biological systems.

7
00:00:38,010 --> 00:00:41,760
So basically, anything you care to model can likely be modeled.

8
00:00:42,720 --> 00:00:48,390
Another interesting point is that state-space models can be used for both continuous time and discrete

9
00:00:48,390 --> 00:00:49,440
time systems.

10
00:00:49,890 --> 00:00:55,110
In this video, we'll look at a continuous time system just for fun, even though we don't normally

11
00:00:55,110 --> 00:00:58,200
see such examples in machine learning and data science.

12
00:00:59,100 --> 00:01:04,440
The third point, which is pretty important, is that this representation is the basis for the theory

13
00:01:04,650 --> 00:01:06,560
of how to control these systems.

14
00:01:07,050 --> 00:01:13,290
That is, how can we make these systems do what we want them to do or how can we put the system into

15
00:01:13,290 --> 00:01:14,580
a state of our choice?

16
00:01:19,060 --> 00:01:24,550
OK, so let's start with one surprising fact, which is that state-space models can be applied in both

17
00:01:24,550 --> 00:01:31,000
discrete time and continuous time systems. Continuous time systems are often more realistic because

18
00:01:31,000 --> 00:01:32,830
they correspond to the laws of physics.

19
00:01:32,980 --> 00:01:38,850
But eventually you need to program these systems into a computer which requires a discrete time representation.

20
00:01:39,670 --> 00:01:44,740
The most common type of state-space model is the linear state-space model, which looks like this.

21
00:01:45,310 --> 00:01:50,740
The discrete time version says that the state vector at time t depends on the previous state vector

22
00:01:50,740 --> 00:01:54,730
at time t minus one and some control input vector u of t.

23
00:01:55,450 --> 00:02:02,320
The relationship between these vectors is controlled by the matrices A and B. The output vector y of t

24
00:02:02,320 --> 00:02:06,900
depends on the state vector and the control input through the matrices C and D.

25
00:02:07,750 --> 00:02:12,070
Now don't worry if this seems a bit abstract, since we'll do an example very shortly.
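To make the discrete time equations a bit more concrete, here is a minimal Python sketch of a single time step: the new state is A times the previous state plus B times the input, and the output is C times the state plus D times the input. The matrix values and the helper names mat_vec, vec_add, and step are made up for illustration and are not from the lecture.

```python
# Minimal sketch of one step of a discrete-time linear state-space model.
# All numeric values are illustrative assumptions, not from the lecture.

def mat_vec(M, v):
    """Multiply matrix M (a list of rows) by vector v (a list)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def vec_add(a, b):
    """Add two vectors elementwise."""
    return [a_i + b_i for a_i, b_i in zip(a, b)]

def step(A, B, C, D, x_prev, u):
    """Return (x, y) with x = A x_prev + B u and y = C x + D u."""
    x = vec_add(mat_vec(A, x_prev), mat_vec(B, u))
    y = vec_add(mat_vec(C, x), mat_vec(D, u))
    return x, y

A = [[1.0, 0.5], [0.0, 1.0]]  # state transition matrix
B = [[0.0], [1.0]]            # how the input enters the state
C = [[1.0, 0.0]]              # we only observe the first state component
D = [[0.0]]                   # no direct feed-through from input to output

x, y = step(A, B, C, D, x_prev=[1.0, 2.0], u=[3.0])
print(x, y)
```

Calling step repeatedly, feeding each new state back in as x_prev, rolls the system forward in time.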

26
00:02:16,630 --> 00:02:19,910
So here's a linear state-space model in continuous time.

27
00:02:20,590 --> 00:02:25,660
Basically, it has the same format as the discrete time version, except that on the left side we have

28
00:02:25,660 --> 00:02:28,620
a time derivative instead of just the next time step.

29
00:02:29,260 --> 00:02:35,410
And again, continuous time systems are useful because they can represent real physical laws of nature.

30
00:02:35,860 --> 00:02:39,980
So any analysis we do on these systems is based on real physics.

31
00:02:40,480 --> 00:02:45,570
This is opposed to discrete time systems which are useful for building computer programs.

32
00:02:50,130 --> 00:02:55,530
OK, so as mentioned earlier, although these equations may seem a bit abstract, they are also very

33
00:02:55,530 --> 00:02:56,130
general.

34
00:02:56,760 --> 00:03:02,010
What we discuss in this lecture can really be applied to any field, whether that's electrical systems,

35
00:03:02,010 --> 00:03:06,460
mechanical systems, economic policy, or biological systems.

36
00:03:06,900 --> 00:03:12,930
So, for example, you can model your physiology and determine how to administer a drug to keep your

37
00:03:12,930 --> 00:03:14,290
body in an optimal state.

38
00:03:15,390 --> 00:03:20,460
For this video, we'll be doing a simple example, which only requires a bit of high school physics.

39
00:03:20,790 --> 00:03:22,200
So hopefully it makes sense

40
00:03:22,200 --> 00:03:24,150
and you recognize these concepts.

41
00:03:29,190 --> 00:03:34,060
OK, so one of the most popular systems we study in physics is the mass on a spring.

42
00:03:34,680 --> 00:03:40,270
So let's suppose we have a mass attached to a spring which is attached to a wall on the other end.

43
00:03:41,100 --> 00:03:46,590
Now, as you recall, our goal in such a system is to determine all of the forces that can act on each

44
00:03:46,590 --> 00:03:47,180
mass.

45
00:03:47,550 --> 00:03:49,660
In this case, we only have one mass.

46
00:03:50,100 --> 00:03:55,440
So one force on this mass is the input force u, which is the force that we get to apply.

47
00:03:55,950 --> 00:04:00,870
So you can imagine this like being the output of your computer program that gets to control how much

48
00:04:00,870 --> 00:04:02,550
force is applied to the mass.

49
00:04:03,240 --> 00:04:06,220
Now, this is not the only force applied to the mass.

50
00:04:06,870 --> 00:04:11,310
The spring also applies a force whenever the spring is not in a neutral position.

51
00:04:12,060 --> 00:04:17,670
So let's create a variable x, which represents how far away the mass is from its neutral position.

52
00:04:18,330 --> 00:04:24,660
Thus, x equals zero represents no tension on the spring. We'll assume that the positive direction is

53
00:04:24,660 --> 00:04:29,990
left to right, so that when the mass is to the right of this neutral position, then x is positive.

54
00:04:31,050 --> 00:04:33,660
In this case, the force from the spring is minus

55
00:04:33,660 --> 00:04:36,690
k x, where k is what we call the spring constant.

56
00:04:37,320 --> 00:04:42,630
This force has a minus sign because when x is positive, the spring pulls the mass to the left.

57
00:04:43,680 --> 00:04:47,790
Alternatively, we can just write k x with an arrow going to the left.

58
00:04:48,750 --> 00:04:52,280
Now there is one more force we're going to consider, which is friction.

59
00:04:52,950 --> 00:04:58,950
So let's assume that the mass sits on a table and this friction or damping force is proportional to

60
00:04:58,950 --> 00:04:59,880
the velocity.

61
00:05:00,420 --> 00:05:07,350
We'll call this damping coefficient b, so the force due to friction is minus b times the velocity, or

62
00:05:07,350 --> 00:05:08,110
minus b times x dot.

63
00:05:09,330 --> 00:05:14,550
This has a negative sign because, again, if the mass is moving in the plus direction, this force will be

64
00:05:14,550 --> 00:05:15,730
in the minus direction.

65
00:05:16,440 --> 00:05:20,490
Also, recall that we use a dot as another way of writing the time derivative.

66
00:05:25,200 --> 00:05:31,290
OK, so the next step is to recall Newton's second law of motion. Basically, the force on an object

67
00:05:31,290 --> 00:05:34,280
is equal to its mass times its acceleration.

68
00:05:35,190 --> 00:05:39,420
So on the left side, we have m times x double dot. Again,

69
00:05:39,420 --> 00:05:44,490
since the dot represents the time derivative, x double dot represents acceleration.

70
00:05:45,570 --> 00:05:48,730
On the right side, we have the sum of all the forces on the mass.

71
00:05:49,080 --> 00:05:52,480
So that's u minus k x minus b x dot.

72
00:05:53,700 --> 00:05:56,860
OK, so this is an equation which represents our system.

73
00:05:57,540 --> 00:06:01,710
The next step is to consider how to convert this into state-space form.

74
00:06:06,300 --> 00:06:12,360
So the trick in these kinds of problems is how we define the state. In this case, we're going to define

75
00:06:12,360 --> 00:06:14,820
a new vector containing x and x dot.

76
00:06:15,450 --> 00:06:19,010
So this might be a bit confusing since I've used x for multiple things.

77
00:06:19,260 --> 00:06:24,570
So for the purpose of this derivation only, I'm going to use an underscore to denote the x vector,

78
00:06:24,750 --> 00:06:29,700
which is the state. It contains x and x dot. For convenience,

79
00:06:29,700 --> 00:06:34,470
we can also think of these as x1 and x2, simply the components of the state.

80
00:06:35,820 --> 00:06:40,410
Now one thing we can see right away is that x2 is equal to x1 dot.

81
00:06:41,370 --> 00:06:46,500
The next thing to do is to express our original system in terms of x1 and x2.

82
00:06:47,370 --> 00:06:54,250
When we do that, we get m times x2 dot equals u minus k x1 minus b x2.

83
00:06:55,290 --> 00:07:00,040
We can also simplify this so that we only have x2 dot on the left side.

84
00:07:00,510 --> 00:07:02,340
Basically, we just divide everything by m.

85
00:07:07,440 --> 00:07:12,780
Now, as you recall, with a state-space model, what we would like to have is the derivatives on the

86
00:07:12,780 --> 00:07:15,330
left and the regular variables on the right.

87
00:07:16,080 --> 00:07:20,720
So let's see what happens if we try to put these equations into that format.

88
00:07:22,160 --> 00:07:28,790
Of course, it should be clear that this is a matrix equation. On the left side, we have x1 dot and x2

89
00:07:28,790 --> 00:07:31,880
dot, which is the derivative of our state vector.

90
00:07:33,020 --> 00:07:39,620
On the right side, we have some matrix A multiplied by a state vector plus some matrix B multiplied

91
00:07:39,620 --> 00:07:40,190
by our input

92
00:07:40,190 --> 00:07:43,960
u. You can think of u as a one by one matrix.

93
00:07:44,750 --> 00:07:50,570
So as you can see, both Matrices A and B are determined by actual physical quantities.
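As a rough sketch of how this looks in code, the derivation above gives x1 dot equals x2, and x2 dot equals minus k over m times x1, minus b over m times x2, plus u over m. The Python below builds A and B from m, k, and b and steps the system forward with a simple forward Euler update. The numeric values, the step size, and the function names are assumptions chosen for illustration, and forward Euler is just one simple way to approximate a continuous time system on a computer.

```python
# Sketch of the mass-spring-damper in state-space form.
# State: [x1, x2] = [position, velocity]. Parameter values are illustrative.

def mass_spring_matrices(m, k, b):
    """Return (A, B) for x_dot = A x + B u, from m * x'' = u - k x - b x'."""
    A = [[0.0, 1.0],
         [-k / m, -b / m]]
    B = [[0.0],
         [1.0 / m]]
    return A, B

def state_derivative(A, B, x, u):
    """Compute x_dot = A x + B u for a 2-dimensional state and scalar input u."""
    return [A[i][0] * x[0] + A[i][1] * x[1] + B[i][0] * u for i in range(2)]

def simulate(A, B, x0, u, dt, n_steps):
    """Forward Euler simulation with a constant input u; returns the final state."""
    x = list(x0)
    for _ in range(n_steps):
        dx = state_derivative(A, B, x, u)
        x = [x[0] + dt * dx[0], x[1] + dt * dx[1]]
    return x

A, B = mass_spring_matrices(m=1.0, k=2.0, b=0.5)
# Start stretched by 1 unit, at rest, with no applied force, and run for 20 seconds.
x_final = simulate(A, B, x0=[1.0, 0.0], u=0.0, dt=0.001, n_steps=20000)
print(A, B, x_final)
```

With friction present and no input force, the mass should settle back toward the neutral position, so both components of x_final end up close to zero.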

94
00:07:55,140 --> 00:08:00,380
Now, in the real world, it's often the case that not every state variable can be observed.

95
00:08:00,840 --> 00:08:05,910
So, for example, we might know the position, but not the velocity. In this case,

96
00:08:05,910 --> 00:08:11,760
we let the variable y represent our observation, which is just the matrix C times the state vector

97
00:08:11,760 --> 00:08:15,030
x plus another matrix D times the input

98
00:08:15,030 --> 00:08:21,930
u. Clearly, this is a pretty trivial addition to what we had before, but we now have a full linear

99
00:08:21,930 --> 00:08:23,640
state-space representation.

100
00:08:28,330 --> 00:08:33,640
So this state-space representation looks pretty simple, but why do we actually care about this?

101
00:08:34,240 --> 00:08:37,870
Well, the fact is there is a lot we can do with this. Now,

102
00:08:37,870 --> 00:08:40,810
it requires some linear algebra, but the potential is there.

103
00:08:41,320 --> 00:08:47,310
For example, by looking at the eigenvalues of A, we can determine whether or not the system is stable.
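Here is a small sketch of that stability check for a two by two matrix A, using the standard rule that a continuous time linear system is stable when every eigenvalue of A has a negative real part. For discrete time the rule is different: the eigenvalues must have magnitude less than one. The parameter values and function names are illustrative assumptions, and the eigenvalues come from the characteristic polynomial, which only works this simply in the two by two case.

```python
import cmath

def eigenvalues_2x2(A):
    """Eigenvalues of a 2x2 matrix from lambda^2 - trace * lambda + det = 0."""
    trace = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = cmath.sqrt(trace * trace - 4 * det)  # complex sqrt handles oscillation
    return (trace + disc) / 2, (trace - disc) / 2

def is_stable_continuous(A):
    """Continuous-time stability: all eigenvalues have negative real part."""
    return all(lam.real < 0 for lam in eigenvalues_2x2(A))

m, k, b = 1.0, 2.0, 0.5  # illustrative mass, spring constant, damping

# Mass-spring system with ordinary friction.
A_damped = [[0.0, 1.0], [-k / m, -b / m]]
# Hypothetical variant where the "friction" pushes with the motion instead.
A_negative_damping = [[0.0, 1.0], [-k / m, b / m]]

print(eigenvalues_2x2(A_damped), is_stable_continuous(A_damped))
print(eigenvalues_2x2(A_negative_damping), is_stable_continuous(A_negative_damping))
```

The damped system comes out stable, while flipping the sign of the damping term pushes the real parts positive, so the oscillation grows instead of dying out.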

104
00:08:48,340 --> 00:08:54,040
The most important point is that with this framework, we can build programs to control these systems.

105
00:08:54,550 --> 00:09:00,400
So not only can we forecast the next state of the system, given where it is now, we can actually decide

106
00:09:00,400 --> 00:09:01,660
where we want it to go.

107
00:09:02,470 --> 00:09:09,640
What we would like to do is find some matrix K such that if we set the input u to be K times the observation

108
00:09:09,640 --> 00:09:13,260
Y, we will control the system in some desired way.

109
00:09:14,200 --> 00:09:19,900
This makes sense, since what we want to do, which is u, should be computed from what we observe, which

110
00:09:19,900 --> 00:09:24,400
is y. Of course, the challenge is: how do we actually find K?
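As a hedged illustration of what setting u equal to K times y does, the sketch below assumes the simplest possible case: we observe the full state, so y equals x, and the system becomes x dot equals A plus B K, all times x. It starts from a mass on a spring with no friction at all, and uses a hand-picked K that feeds back the velocity. This K is an assumption chosen to make the effect visible, not a method for finding K, and the function names are hypothetical.

```python
import cmath

def closed_loop_matrix(A, B, K):
    """Return A + B K for a 2x2 A, 2x1 B and 1x2 K (assumes y = x, so u = K x)."""
    return [[A[i][j] + B[i][0] * K[0][j] for j in range(2)] for i in range(2)]

def max_real_eigenvalue_2x2(M):
    """Largest real part among the eigenvalues of a 2x2 matrix."""
    trace = M[0][0] + M[1][1]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    disc = cmath.sqrt(trace * trace - 4 * det)
    return max(((trace + disc) / 2).real, ((trace - disc) / 2).real)

m, k = 1.0, 2.0  # illustrative values; damping b is zero here
A = [[0.0, 1.0], [-k / m, 0.0]]
B = [[0.0], [1.0 / m]]

# Hand-picked gain: push against the velocity, acting like artificial friction.
K = [[0.0, -1.0]]
A_cl = closed_loop_matrix(A, B, K)

print(max_real_eigenvalue_2x2(A))     # open loop: oscillates forever
print(max_real_eigenvalue_2x2(A_cl))  # closed loop: oscillation decays
```

Without control the eigenvalues sit on the imaginary axis, so the mass oscillates forever. With this feedback the real parts become negative, which means the controller is effectively adding friction.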

111
00:09:29,000 --> 00:09:34,760
Now, one interesting fact is that one of the most popular time series models, called ARIMA, came from

112
00:09:34,760 --> 00:09:36,470
a book by Box and Jenkins.

113
00:09:36,890 --> 00:09:41,010
This book is really how ARIMA became as popular as it is today.

114
00:09:41,540 --> 00:09:46,030
The title of this book is Time Series Analysis: Forecasting and Control.

115
00:09:46,820 --> 00:09:52,790
So you can see that even at the origin of ARIMA, it was clear that forecasting is not the only interesting

116
00:09:52,790 --> 00:09:53,660
thing you can do.

117
00:09:54,230 --> 00:09:58,910
Controlling systems, which falls under control theory, was always part of this field.

118
00:10:03,650 --> 00:10:06,620
Now, in the real world, we know that systems have noise.

119
00:10:07,070 --> 00:10:11,300
So what happens when we add noise to our linear state-space model?

120
00:10:12,110 --> 00:10:16,250
In these situations, we often want to answer the question, what is the state,

121
00:10:16,280 --> 00:10:17,720
given our observation?

122
00:10:18,320 --> 00:10:25,010
One common example of this is that y might represent sensor readings, like the GPS on your car. x, which

123
00:10:25,010 --> 00:10:28,420
is the state, might represent the true location of your car.

124
00:10:29,660 --> 00:10:34,970
In this case, there is a popular algorithm called the Kalman filter, which finds the optimal guess for

125
00:10:34,970 --> 00:10:35,630
X.

126
00:10:36,530 --> 00:10:39,200
And keep in mind, this is a sequential model.

127
00:10:39,200 --> 00:10:45,500
So this optimal guess doesn't only depend on the current y; it depends on all the previous y's from time

128
00:10:45,500 --> 00:10:52,100
one up to t. Using this model, we can also forecast the next states at times t plus one, t plus

129
00:10:52,100 --> 00:10:53,360
two and so forth.
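To give a feel for how this works, here is a minimal sketch of a Kalman filter for the simplest possible case, where the state and the observation are both single numbers. It assumes the model x of t equals a times x of t minus one plus noise with variance q, and y of t equals c times x of t plus noise with variance r. All parameter values and function names are assumptions for illustration; a real GPS example would use vectors and matrices.

```python
# Scalar Kalman filter sketch. Parameter defaults are illustrative assumptions.

def kalman_step(mean, var, y, a=1.0, c=1.0, q=0.01, r=1.0):
    """One predict-then-update step; returns the new (mean, variance) of the state."""
    # Predict: push the previous estimate through the state equation.
    mean_pred = a * mean
    var_pred = a * var * a + q
    # Update: blend the prediction with the new observation y.
    gain = var_pred * c / (c * var_pred * c + r)
    mean_new = mean_pred + gain * (y - c * mean_pred)
    var_new = (1.0 - gain * c) * var_pred
    return mean_new, var_new

def kalman_filter(ys, mean0=0.0, var0=1000.0, **params):
    """Run the filter over all observations, carrying the estimate forward."""
    mean, var = mean0, var0
    for y in ys:
        mean, var = kalman_step(mean, var, y, **params)
    return mean, var

def forecast(mean, steps, a=1.0):
    """Forecast the state mean `steps` time steps ahead with no new observations."""
    return (a ** steps) * mean

# A sensor that keeps reading 5.0: the estimate should move from 0 toward 5.
estimate, variance = kalman_filter([5.0] * 50)
print(estimate, variance, forecast(estimate, steps=2))
```

Because the estimate is carried forward from step to step, each new guess depends on every observation seen so far, not just the latest one.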

130
00:10:55,160 --> 00:11:00,560
OK, so I hope this video helps you see why the state-space representation is useful and how it can

131
00:11:00,560 --> 00:11:02,420
be applied in different scenarios.

132
00:11:02,900 --> 00:11:09,350
It really is a universal tool which is studied by both engineers and statisticians and can be applied

133
00:11:09,350 --> 00:11:13,670
to pretty much any field, including biology, economics and business.

134
00:11:14,570 --> 00:11:16,850
Thanks for listening and I'll see you in the next video.