WEBVTT

00:01.050 --> 00:02.100
-: Hello and welcome back to the course

00:02.100 --> 00:03.780
on Artificial Intelligence.

00:03.780 --> 00:06.930
And today we're talking about Markov decision processes

00:06.930 --> 00:08.730
or MDPs.

00:08.730 --> 00:11.400
Let's have a look at what we've got today.

00:11.400 --> 00:14.100
So last time, we stopped on the concept of a map.

00:14.100 --> 00:16.140
So because we've calculated the values

00:16.140 --> 00:17.910
based on the Bellman equation,

00:17.910 --> 00:21.210
we can derive this map for our agent on this maze,

00:21.210 --> 00:25.650
and basically what that means is wherever the agent starts,

00:25.650 --> 00:27.540
so let's say it starts over there,

00:27.540 --> 00:29.610
it knows exactly which steps to take

00:29.610 --> 00:30.870
in order to get to the finish line.

00:30.870 --> 00:35.070
So it just goes up, up, right, right, and done.

00:35.070 --> 00:37.560
And so the question here is, is that it?

00:37.560 --> 00:39.780
Is it really that simple?

00:39.780 --> 00:42.810
Is reinforcement learning really that, you know,

00:42.810 --> 00:44.760
for the lack of a better word, boring?

00:44.760 --> 00:46.410
It's...

00:46.410 --> 00:47.490
Once you have the map, that's it.

00:47.490 --> 00:51.090
All you have to do is, you're done, you just follow the map.

00:51.090 --> 00:55.500
Well, the reality is that it's not actually all that simple.

00:55.500 --> 00:56.333
And that's a good thing,

00:56.333 --> 00:58.260
because it makes this course more interesting for us

00:58.260 --> 01:02.580
and we can actually solve much more complex problems.

01:02.580 --> 01:05.490
So this is where Markov processes come in.

01:05.490 --> 01:07.740
But first we're going to talk about two things,

01:07.740 --> 01:09.660
we're gonna talk about deterministic search

01:09.660 --> 01:11.670
versus non-deterministic search.

01:11.670 --> 01:14.790
So let's talk about the concept of deterministic search.

01:14.790 --> 01:16.680
This is our agent in the maze,

01:16.680 --> 01:18.450
and deterministic search means

01:18.450 --> 01:21.330
that if the agent decides to go up,

01:21.330 --> 01:23.340
then what will happen is,

01:23.340 --> 01:27.000
with a hundred percent probability, it will go up.

01:27.000 --> 01:29.700
That's exactly what will happen. There's no other options.

01:29.700 --> 01:33.690
Once it says go up or clicks the up arrow, it'll go up.

01:33.690 --> 01:35.220
There's no other options.

01:35.220 --> 01:38.790
Now, on the other hand, non-deterministic search

01:38.790 --> 01:42.180
is when our agent says it wants to go up,

01:42.180 --> 01:44.430
there are actually a couple of options.

01:44.430 --> 01:46.740
For example, there could be three options,

01:46.740 --> 01:47.940
and we're going to be looking at an example

01:47.940 --> 01:48.840
where there are three options,

01:48.840 --> 01:50.280
but it doesn't have to be limited to three,

01:50.280 --> 01:52.042
it could be four or it could be, you know,

01:52.042 --> 01:54.480
it depends on the problem.

01:54.480 --> 01:56.160
The randomness could be different

01:56.160 --> 01:58.260
but, in our case, there could be three options.

01:58.260 --> 02:01.860
With an 80% chance he does go up.

02:01.860 --> 02:05.520
But then with a 10% chance when he wants to go up,

02:05.520 --> 02:06.840
he'll actually go to the left,

02:06.840 --> 02:09.360
just because, because that's how the environment works,

02:09.360 --> 02:11.430
that's the world that he lives in.

02:11.430 --> 02:14.850
And with another 10% chance, he'll actually go right,

02:14.850 --> 02:17.850
and, in this case, he'll fall into the fire pit.

02:17.850 --> 02:20.760
So that is how it all works.

02:20.760 --> 02:23.550
That's an example of non-deterministic search,

02:23.550 --> 02:24.900
or a stochastic process.
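
NOTE
A minimal sketch of the noisy movement described above, assuming the 80% / 10% / 10% split from the lecture.
The names SLIPS, outcome_distribution and sample_move are hypothetical, introduced only for illustration.
```python
import random
# Assumption: for each intended move, the two sideways moves the agent may slip into.
SLIPS = {
    "up": ("left", "right"),
    "down": ("right", "left"),
    "left": ("down", "up"),
    "right": ("up", "down"),
}
def outcome_distribution(intended):
    """Return {actual_move: probability} for an intended move."""
    side_a, side_b = SLIPS[intended]
    return {intended: 0.8, side_a: 0.1, side_b: 0.1}
def sample_move(intended, rng=random):
    """Draw the move that actually happens in the noisy environment."""
    dist = outcome_distribution(intended)
    moves = list(dist)
    weights = [dist[m] for m in moves]
    return rng.choices(moves, weights=weights, k=1)[0]
```
Calling sample_move("up") many times should return "up" roughly 80% of the time; other problems may use different probabilities.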

02:24.900 --> 02:29.900
And the point of this is to make a more realistic model

02:30.390 --> 02:34.140
of what could actually happen in the real world,

02:34.140 --> 02:36.540
in a real-world type of problem,

02:36.540 --> 02:39.210
because very rarely do you get situations like this

02:39.210 --> 02:41.460
when you do something and it happens exactly that way.

02:41.460 --> 02:43.620
And even if you think about it in terms of games,

02:43.620 --> 02:46.740
let's say, you've got an agent playing Pac-Man,

02:46.740 --> 02:49.110
well, not always is it the case

02:49.110 --> 02:51.300
that if he's standing in this square, he goes up,

02:51.300 --> 02:54.269
he will get the same exact result every time.

02:54.269 --> 02:55.890
He will indeed go up,

02:55.890 --> 02:59.160
but it may be, in one case, he won't get eaten by a ghost,

02:59.160 --> 03:01.560
in another case, he will get eaten by a ghost.

03:01.560 --> 03:03.600
So as you can see, there's some randomness to it

03:03.600 --> 03:05.820
because it depends on how the ghosts are moving

03:05.820 --> 03:07.350
and they don't always move the same way,

03:07.350 --> 03:09.480
they don't always start in the same locations.

03:09.480 --> 03:12.180
So it is very logical, it's very...

03:12.180 --> 03:14.340
It is fair that there is some randomness.

03:14.340 --> 03:16.860
There's something that is not under the control

03:16.860 --> 03:17.730
of the agent.

03:17.730 --> 03:20.790
And this is just a way for us to represent that

03:20.790 --> 03:23.100
in order for us to learn how we can deal with it

03:23.100 --> 03:25.860
and how that affects the Bellman equation,

03:25.860 --> 03:29.070
how it affects the whole reinforcement learning process.

03:29.070 --> 03:31.290
But at the same time, the randomness is, of course,

03:31.290 --> 03:32.347
not limited to if you go up,

03:32.347 --> 03:34.290
there's a 10% chance you'll go right,

03:34.290 --> 03:35.730
or 10% chance you go left,

03:35.730 --> 03:36.630
or if you go down,

03:36.630 --> 03:38.400
there's a 10% chance you go right or left,

03:38.400 --> 03:40.680
or if you go right, there's a 10% chance you go up or down.

03:40.680 --> 03:42.960
It's not limited to where you're going to end up.

03:42.960 --> 03:44.520
Sometimes you might have a problem

03:44.520 --> 03:45.600
that is exactly like...

03:45.600 --> 03:47.400
Sometimes the probabilities might be different.

03:47.400 --> 03:51.150
Sometimes the randomness might boil down to something else.

03:51.150 --> 03:53.180
It might boil down, like in that Pac-Man example,

03:53.180 --> 03:55.770
to the ghosts eating you or not eating you,

03:55.770 --> 03:58.920
or it might boil down to something different,

03:58.920 --> 04:01.650
for instance, like there's...

04:01.650 --> 04:03.300
if the agent is playing Doom

04:03.300 --> 04:06.360
and then there's something like a monster

04:06.360 --> 04:08.430
which is going to shoot him in one case,

04:08.430 --> 04:11.070
and in another case, there's a probability

04:11.070 --> 04:12.480
with which it will get shot

04:12.480 --> 04:14.940
and with which it won't get shot, and so on.

04:14.940 --> 04:17.670
So something that is out of the control of the agent,

04:17.670 --> 04:19.680
something that it cannot predict,

04:19.680 --> 04:21.150
that's what we are modeling here

04:21.150 --> 04:23.070
in non-deterministic search.

04:23.070 --> 04:25.920
And this is where we have directly approached

04:25.920 --> 04:30.360
two new concepts, Markov processes, a Markov process,

04:30.360 --> 04:32.850
and a Markov decision process.

04:32.850 --> 04:34.140
So let's have a look at these.

04:34.140 --> 04:38.100
And you know how much I don't like to put definitions

04:38.100 --> 04:39.120
and lots of text on the slides,

04:39.120 --> 04:42.300
but in this case, it is necessary for us to go through them.

04:42.300 --> 04:43.380
So let's have a look.

04:43.380 --> 04:46.200
A stochastic process has the Markov property

04:46.200 --> 04:48.750
if the conditional probability distribution

04:48.750 --> 04:50.760
of future states of the process,

04:50.760 --> 04:52.800
conditional on both past and present states,

04:52.800 --> 04:55.020
depends only upon the present state,

04:55.020 --> 04:58.230
not on the sequence of events that preceded it.

04:58.230 --> 05:01.020
A process with this property is called a Markov process.

05:01.020 --> 05:04.560
Very complex definition and it kind of like... (stutters)

05:04.560 --> 05:06.360
A little bit, not contradicts itself,

05:06.360 --> 05:07.920
but feels like it contradicts itself.

05:07.920 --> 05:08.963
So here it says, "conditional on both past

05:08.963 --> 05:10.560
and present states,"

05:10.560 --> 05:12.210
but at the same time,

05:12.210 --> 05:14.490
it "depends only upon the present state."

05:14.490 --> 05:17.460
So don't get too bogged down in that.

05:17.460 --> 05:19.380
I'll break it down in simple terms.

05:19.380 --> 05:23.070
So a Markov property is when your future states,

05:23.070 --> 05:25.320
so not just your choice, but the whole thing,

05:25.320 --> 05:27.230
your choice and the environment,

05:27.230 --> 05:30.690
it will only, like, the results of the action

05:30.690 --> 05:32.220
you take in that environment

05:32.220 --> 05:33.900
will only depend on where you are now,

05:33.900 --> 05:36.060
it will not depend on how you got there.

05:36.060 --> 05:37.410
And that's it. So that's a Markov property.

05:37.410 --> 05:39.390
And a process which has this property

05:39.390 --> 05:40.860
is called a Markov process.
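
NOTE
The Markov property described above can be written compactly as follows.
The symbols S_t (the state at step t) and s' (a possible next state) are notation assumed here for illustration; they are not taken from the slide.
```latex
P(S_{t+1} = s' \mid S_t = s_t, S_{t-1} = s_{t-1}, \dots, S_0 = s_0) = P(S_{t+1} = s' \mid S_t = s_t)
```
In words: once the present state is known, the earlier history adds nothing to the prediction of the next state.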

05:40.860 --> 05:42.960
So in, to put it into an example,

05:42.960 --> 05:44.730
so if your agent is here

05:44.730 --> 05:47.190
and if he goes, if he decides to go up,

05:47.190 --> 05:49.590
he might go, in our case,

05:49.590 --> 05:51.270
in our non-deterministic search example,

05:51.270 --> 05:53.670
he actually might go left or to the right,

05:53.670 --> 05:56.400
that's because we have that stochasticity

05:56.400 --> 05:57.600
inside our environment,

05:57.600 --> 05:59.790
we have that randomness inside our environment.

05:59.790 --> 06:01.830
So any one of these three might happen,

06:01.830 --> 06:05.190
but the key here is that this is a Markov process

06:05.190 --> 06:07.230
because we don't care how he got here.

06:07.230 --> 06:09.030
He could have come from the top, ended up here,

06:09.030 --> 06:10.490
he could have come from the left, ended up here,

06:10.490 --> 06:12.390
he could have come from the bottom, ended up here,

06:12.390 --> 06:13.380
he could have, like,

06:13.380 --> 06:15.660
moved around here a hundred thousand times

06:15.660 --> 06:16.680
and then got here.

06:16.680 --> 06:18.810
It does not matter what happened before.

06:18.810 --> 06:22.500
The only thing that matters is which state he is in now.

06:22.500 --> 06:27.500
And so the probabilities of going left or right, or up,

06:27.750 --> 06:32.670
they will always be the same if he's in this state now.

06:32.670 --> 06:35.370
And so that's basically just saying

06:35.370 --> 06:36.510
doesn't matter what happened before,

06:36.510 --> 06:39.180
we are here now, and this is the state you're in.

06:39.180 --> 06:41.700
And don't forget that state doesn't just mean

06:41.700 --> 06:42.533
where he's standing.

06:42.533 --> 06:45.960
The state is the whole state of the agent

06:45.960 --> 06:46.793
in the environment.

06:46.793 --> 06:48.690
So are there, like, monsters on the right

06:48.690 --> 06:49.980
or are there monsters on the left,

06:49.980 --> 06:51.803
or, you know, is the ghost coming from the top

06:51.803 --> 06:52.740
or from the bottom?

06:52.740 --> 06:54.540
Whatever state you're in now,

06:54.540 --> 06:55.590
doesn't matter how you got there,

06:55.590 --> 06:57.600
doesn't matter how it all came to be

06:57.600 --> 06:59.640
that you are there in that state now.

06:59.640 --> 07:00.600
What will happen in the future

07:00.600 --> 07:03.000
is only determined by the state you're in now,

07:03.000 --> 07:04.500
plus the actions you will take,

07:04.500 --> 07:06.420
then plus, of course, the randomness that is overlaid

07:06.420 --> 07:07.440
on top of that.

07:07.440 --> 07:08.910
So that's a Markov process.

07:08.910 --> 07:12.480
And a Markov decision process, or an MDP,

07:12.480 --> 07:14.370
or Markov decision processes,

07:14.370 --> 07:16.140
provide a mathematical framework

07:16.140 --> 07:17.820
for modeling decision making

07:17.820 --> 07:20.880
in situations where outcomes are partly random

07:20.880 --> 07:23.550
and partly under control of a decision maker.

07:23.550 --> 07:27.210
So, important to understand, that Markov decision processes

07:27.210 --> 07:32.210
are a whole different concept from a Markov process.

07:32.280 --> 07:34.980
They're kind of like a mathematical framework, so...

07:34.980 --> 07:36.540
But at the same time I thought it was important for us

07:36.540 --> 07:38.850
to understand what a Markov process is

07:38.850 --> 07:41.490
because I think it still helps in understanding

07:41.490 --> 07:43.170
a Markov decision process.

07:43.170 --> 07:45.038
So a Markov decision process

07:45.038 --> 07:48.300
is exactly what we've been discussing up 'til now.

07:48.300 --> 07:50.580
So that the agent lives in this environment

07:50.580 --> 07:52.140
where it has control,

07:52.140 --> 07:53.010
like remember previously,

07:53.010 --> 07:55.080
it had full control of what's going on,

07:55.080 --> 07:57.570
but now it has a little bit less control.

07:57.570 --> 08:00.187
It can decide to go up, but it actually knows,

08:00.187 --> 08:02.977
"Okay, so if I go up, there's an 80% chance I'll go up,

08:02.977 --> 08:04.890
there's a 10% chance I'll go left,

08:04.890 --> 08:06.210
10% chance I'll go right."

08:06.210 --> 08:08.940
So not everything is fully under its control.

08:08.940 --> 08:10.680
There is some randomness in this environment

08:10.680 --> 08:13.170
and that's exactly what a Markov decision process is.

08:13.170 --> 08:15.420
A Markov decision process is the framework

08:15.420 --> 08:18.360
that the agent will use in order to understand

08:18.360 --> 08:19.440
what to do in this environment.

08:19.440 --> 08:21.450
So we've got an environment with some stochasticity,

08:21.450 --> 08:23.850
some randomness, and now the agent has to choose,

08:23.850 --> 08:27.390
for instance, should it go up, down, left, or right.

08:27.390 --> 08:30.120
It has to make that decision. It doesn't know what to do.

08:30.120 --> 08:31.500
And in order to make that decision,

08:31.500 --> 08:34.230
it's going to apply a framework,

08:34.230 --> 08:36.900
it's going to be using a Markov decision process

08:36.900 --> 08:38.040
in order to make that decision,

08:38.040 --> 08:40.890
what's going to happen, where it's going to go.

08:40.890 --> 08:44.490
And so basically this environment that poses this problem,

08:44.490 --> 08:47.640
it is referred to as a Markov decision process.

08:47.640 --> 08:50.010
So it's the framework that the agent is using, and

08:50.010 --> 08:51.900
at the same time we say

08:51.900 --> 08:53.490
that the agent is operating

08:53.490 --> 08:56.310
in a Markov decision process environment.

08:56.310 --> 08:58.020
And so basically, here we've got two concepts.

08:58.020 --> 08:59.370
We've got the Markov process,

08:59.370 --> 09:01.800
which is the way this environment is designed,

09:01.800 --> 09:03.050
that the part of the...

09:03.960 --> 09:05.460
What happens from where you are now

09:05.460 --> 09:07.943
doesn't depend on the past. And then at the same time,

09:07.943 --> 09:09.540
we've got the Markov decision process,

09:09.540 --> 09:11.790
which is the framework that the agent is going to be using

09:11.790 --> 09:13.920
in order to solve this environment.

09:13.920 --> 09:16.530
And the good news is that the Markov decision process

09:16.530 --> 09:18.540
or that framework that we're talking about

09:18.540 --> 09:22.200
is actually just an add-on to our Bellman equation,

09:22.200 --> 09:24.780
it's the Bellman equation, just a bit more sophisticated.

09:24.780 --> 09:27.060
So let's have a look at that.

09:27.060 --> 09:28.980
This is our Bellman equation so far.

09:28.980 --> 09:31.050
It's the maximum of all possible actions.

09:31.050 --> 09:32.550
So the value of being in a state

09:32.550 --> 09:34.110
is the maximum of all possible actions

09:34.110 --> 09:36.240
that you can take from that state.

09:36.240 --> 09:39.090
The maximum is taken over the reward that you would get

09:39.090 --> 09:41.160
by taking that action in that state,

09:41.160 --> 09:44.370
plus a discount factor times the value of the next state,

09:44.370 --> 09:45.390
which is S prime.

09:45.390 --> 09:47.370
So that's what we've had so far.
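
NOTE
Written out, the deterministic Bellman equation described above looks like this.
The symbols follow the spoken description: V is the value, R(s, a) the reward for taking action a in state s, \gamma the discount factor, and s' the next state.
```latex
V(s) = \max_{a} \big( R(s, a) + \gamma V(s') \big)
```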

09:47.370 --> 09:50.634
Now because we have some randomness in our whole process,

09:50.634 --> 09:51.930
this part will change

09:51.930 --> 09:54.200
because we don't actually know which state we'll end up in,

09:54.200 --> 09:56.130
we don't know what S prime will be.

09:56.130 --> 09:57.900
Will it be, if we're going up,

09:57.900 --> 09:59.880
will it be up, or will it be left, or will it be right?

09:59.880 --> 10:02.070
So we actually have to replace this

10:02.070 --> 10:04.950
with the expected value of the next state.

10:04.950 --> 10:06.420
So here, we're going to replace this,

10:06.420 --> 10:08.820
so there's three possible states we can end up in.

10:08.820 --> 10:12.480
And so we're going to replace that with some value.

10:12.480 --> 10:15.480
That state has a value of V of S one prime,

10:15.480 --> 10:18.450
that state has a value of V of S two prime,

10:18.450 --> 10:22.620
and this state has a value of V of S three prime.

10:22.620 --> 10:26.070
So now we're going to multiply the value of the state

10:26.070 --> 10:28.590
that we actually are intending to go into by 80%,

10:28.590 --> 10:31.140
because that's our probability of getting into that state,

10:31.140 --> 10:33.570
plus the probability of getting into this state,

10:33.570 --> 10:35.370
10%, plus the probability of getting into this state, 10%.

10:35.370 --> 10:38.040
So this is just our expected value.
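
NOTE
As a worked form of the step above: with the lecture's 80% / 10% / 10% probabilities, the single term V(s') is replaced by the weighted sum below.
The subscripted names s'_1 (the intended state), s'_2 and s'_3 (the two sideways states) are labels assumed to match "S one prime", "S two prime" and "S three prime".
```latex
0.8\,V(s'_1) + 0.1\,V(s'_2) + 0.1\,V(s'_3)
```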

10:38.040 --> 10:41.280
So if, from statistics, we take the expected value

10:41.280 --> 10:45.840
of the state that we'll get into,

10:45.840 --> 10:47.280
so it's kind of like the average,

10:47.280 --> 10:49.650
what's the average of what we'll get.

10:49.650 --> 10:51.990
And then we replace that over here

10:51.990 --> 10:52.950
then we get this equation.

10:52.950 --> 10:54.480
Now it jumps very quickly

10:54.480 --> 10:55.680
just because this equation is bigger,

10:55.680 --> 10:56.700
but if you look at it carefully,

10:56.700 --> 10:57.930
you'll see it's exactly the same thing.

10:57.930 --> 10:59.970
So you've got max here, you've got max here,

10:59.970 --> 11:04.470
then you've got R of S and A, you've got R of S and A.

11:04.470 --> 11:06.360
Here, you've got gamma, you've got gamma.

11:06.360 --> 11:08.640
And then finally, here you've got V,

11:08.640 --> 11:11.700
so when it was a deterministic search,

11:11.700 --> 11:13.620
you knew which state you'll get into,

11:13.620 --> 11:15.090
now you don't know which state you'll get into,

11:15.090 --> 11:16.080
so instead of taking V,

11:16.080 --> 11:18.270
you're taking the expected value

11:18.270 --> 11:19.590
of the state you'll get into,

11:19.590 --> 11:23.520
or of the future state, or, just in simpler terms,

11:23.520 --> 11:26.040
you're just taking the average of what you'll get into.

11:26.040 --> 11:30.090
So, you know, if each one had, like, a 33% chance,

11:30.090 --> 11:31.590
it would be, like, this, plus this, plus this,

11:31.590 --> 11:32.910
divided by three basically.

11:32.910 --> 11:37.110
But in this case, it's not exactly, like, average, average.

11:37.110 --> 11:38.460
It's a weighted average

11:38.460 --> 11:40.410
because of your probabilities here.
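
NOTE
A small sketch of the difference between a plain average and the weighted average described above.
The three state values are hypothetical numbers chosen only for illustration; the probabilities are the lecture's 80% / 10% / 10%.
```python
def expected_value(outcomes):
    """Probability-weighted average of (probability, value) pairs."""
    return sum(p * v for p, v in outcomes)
# Hypothetical values of the three possible next states (illustration only).
v_up, v_left, v_right = 1.0, 0.5, -1.0
# Weighted by the probabilities of actually ending up in each state.
weighted = expected_value([(0.8, v_up), (0.1, v_left), (0.1, v_right)])
# Plain average, which is what you would get if all three were equally likely.
plain = (v_up + v_left + v_right) / 3
print(weighted, plain)
```
With these made-up values the weighted result is about 0.75, while the plain average is about 0.17, so the probabilities clearly matter.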

11:40.410 --> 11:42.090
So here you've got the probability

11:42.090 --> 11:44.708
of, when you're in this state, you take this action,

11:44.708 --> 11:47.280
of getting into state S prime

11:47.280 --> 11:49.980
times the value of S prime and summed across all S primes

11:49.980 --> 11:51.840
that you could possibly get into over here.

11:51.840 --> 11:53.340
So exactly what we had, three,

11:53.340 --> 11:54.840
here, one, two, three,

11:54.840 --> 11:56.490
add them up, multiply by probabilities,

11:56.490 --> 11:58.020
and add them up, same here.

11:58.020 --> 12:00.600
One, two, three, multiply them by probabilities,

12:00.600 --> 12:02.070
and add them up.

12:02.070 --> 12:05.190
And that is your new Bellman equation.
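
NOTE
Written out, the Bellman equation for the stochastic case described above looks like this.
P(s, a, s') denotes the probability of landing in state s' after taking action a in state s; the other symbols are as in the spoken description.
```latex
V(s) = \max_{a} \Big( R(s, a) + \gamma \sum_{s'} P(s, a, s')\, V(s') \Big)
```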

12:05.190 --> 12:06.450
Congratulations.

12:06.450 --> 12:09.120
This is what we are going to be working with, going forward.

12:09.120 --> 12:12.150
And that is the framework that is used

12:12.150 --> 12:13.620
in Markov decision processes.

12:13.620 --> 12:16.500
So that is the framework that solves this...

12:16.500 --> 12:19.680
that agents use to solve this whole, stochastic,

12:19.680 --> 12:21.600
non-deterministic search problem

12:21.600 --> 12:24.060
where there's random events that are happening

12:24.060 --> 12:25.470
that they cannot control.
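
NOTE
A minimal sketch of one application of the stochastic Bellman equation to a single state.
Everything here is an illustrative assumption: the function name bellman_backup, the state names, the zero reward, the discount factor of 0.9 and the next-state values.
```python
def bellman_backup(state, actions, reward, transitions, values, gamma=0.9):
    """Value of one state: best action by reward plus discounted expected value.
    reward(s, a) returns R(s, a); transitions(s, a) returns a list of
    (probability, next_state) pairs. All names here are illustrative.
    """
    return max(
        reward(state, a)
        + gamma * sum(p * values[s2] for p, s2 in transitions(state, a))
        for a in actions
    )
# Hypothetical values already known for the neighbouring states.
values = {"goal": 1.0, "pit": -1.0, "wall": 0.0}
def reward(s, a):
    # Assumption: no immediate reward for any move in this toy example.
    return 0.0
def transitions(s, a):
    # Going "up" is noisy (80% / 10% / 10%); "stay" is a made-up safe action.
    if a == "up":
        return [(0.8, "goal"), (0.1, "wall"), (0.1, "pit")]
    return [(1.0, "wall")]
v_s = bellman_backup("s", ["up", "stay"], reward, transitions, values)
```
With these made-up numbers, going up is worth about 0.9 * (0.8 - 0.1) = 0.63, which beats staying put at 0, so v_s is about 0.63.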

12:25.470 --> 12:26.940
So it's much more complex,

12:26.940 --> 12:30.270
but, as you can see, because we built up slowly to it,

12:30.270 --> 12:32.400
now we already know about this,

12:32.400 --> 12:35.134
we already know about this, we already know about this,

12:35.134 --> 12:36.810
we know about this, we know about this.

12:36.810 --> 12:38.340
So all we did is we just introduced

12:38.340 --> 12:39.600
this part over here

12:39.600 --> 12:43.710
because there are probabilities involved in the action

12:43.710 --> 12:46.230
or the consequences of your action.

12:46.230 --> 12:47.100
They're non-deterministic;

12:47.100 --> 12:49.200
they are based on certain probabilities.

12:49.200 --> 12:50.580
And so, there we go.

12:50.580 --> 12:54.870
That's how a Markov decision process works

12:54.870 --> 12:58.350
and the underlying equation behind it.

12:58.350 --> 12:59.700
Once again, it is something

12:59.700 --> 13:03.900
that more closely resembles real-world problems,

13:03.900 --> 13:06.330
real-world scenarios, or even game scenarios,

13:06.330 --> 13:08.730
because not everything is straightforward.

13:08.730 --> 13:11.730
There is some randomness involved

13:11.730 --> 13:16.347
and not always will taking an action in a certain state,

13:16.347 --> 13:18.840
not always will it lead to the same outcome.

13:18.840 --> 13:20.220
And so this is what

13:20.220 --> 13:21.600
we're going to be dealing with going forward,

13:21.600 --> 13:24.360
and that's gonna make things way more interesting.

13:24.360 --> 13:26.730
So hopefully, you're excited for that

13:26.730 --> 13:29.670
and excited to see what's going to come next.

13:29.670 --> 13:33.600
And in the meantime, I found a really cool paper for you

13:33.600 --> 13:35.250
to have a look at.

13:35.250 --> 13:37.440
This time, it's a very applied paper.

13:37.440 --> 13:40.140
So this one's actually really interesting to read through.

13:40.140 --> 13:41.790
It's called "A Survey of Applications

13:41.790 --> 13:44.448
of Markov Decision Processes,"

13:44.448 --> 13:48.000
and it was written by White in 1993.

13:48.000 --> 13:51.240
There's the link, and it'll show you examples

13:51.240 --> 13:53.580
of where Markov decision processes

13:53.580 --> 13:57.006
actually are used to model real-life scenarios.

13:57.006 --> 13:59.580
I was very excited by this,

13:59.580 --> 14:01.050
I was impressed by some examples.

14:01.050 --> 14:03.750
So population harvesting, for instance.

14:03.750 --> 14:05.940
So, let's say, you have some fish

14:05.940 --> 14:08.070
and you know what the population of the fish is,

14:08.070 --> 14:09.600
you need to decide how many fish

14:09.600 --> 14:13.290
can we fish out this year and what?

14:13.290 --> 14:14.340
So that's your current state,

14:14.340 --> 14:15.630
that's the action that you're taking.

14:15.630 --> 14:17.160
How many can we fish out this year,

14:17.160 --> 14:20.550
and what are the possible outcomes of that?

14:20.550 --> 14:22.140
How many fish will we have next year?

14:22.140 --> 14:23.790
How many fish will we have the year after

14:23.790 --> 14:25.110
and the year after, and so on?

14:25.110 --> 14:26.520
And it's not deterministic

14:26.520 --> 14:28.260
because it's not like if you take out,

14:28.260 --> 14:30.240
I don't know, 90% of the population,

14:30.240 --> 14:32.880
then next year, you will be, you know, back to 100%.

14:32.880 --> 14:34.650
It's not exactly deterministic.

14:34.650 --> 14:36.270
There are certain random factors involved

14:36.270 --> 14:37.740
which are out of our control,

14:37.740 --> 14:39.693
and therefore we have to understand

14:39.693 --> 14:41.340
what's going to happen,

14:41.340 --> 14:42.660
we have to model what's going to happen.

14:42.660 --> 14:44.910
That's where a Markov decision process is used.

14:44.910 --> 14:46.710
Agriculture, there's an example,

14:46.710 --> 14:48.270
like, same thing, harvesting crops.

14:48.270 --> 14:49.440
How many crops do we harvest?

14:49.440 --> 14:51.450
How much do we not harvest?

14:51.450 --> 14:54.720
Another one, which I looked at, finance and investment,

14:54.720 --> 14:57.780
like an insurance company needs to decide

14:57.780 --> 14:59.520
how much of its funds it'll invest

14:59.520 --> 15:03.120
in any given, I think, day or year or some period of time.

15:03.120 --> 15:06.480
And there are certain factors that are out of its control,

15:06.480 --> 15:08.070
for instance, you know, the market movements,

15:08.070 --> 15:09.300
it doesn't know what can happen,

15:09.300 --> 15:11.970
so it needs to actually model that somehow,

15:11.970 --> 15:14.340
and a Markov decision process is used for that.

15:14.340 --> 15:16.890
So here, you can see lots and lots of examples,

15:16.890 --> 15:19.470
and this is the number of examples given, I think,

15:19.470 --> 15:20.610
for each one.

15:20.610 --> 15:22.440
And so, yeah, even sports.

15:22.440 --> 15:24.660
Two examples for sports and epidemics,

15:24.660 --> 15:28.050
and motor insurance claims, inspections,

15:28.050 --> 15:30.090
and maintenance and repair, and so on, and so on.

15:30.090 --> 15:31.020
Very interesting.

15:31.020 --> 15:32.820
Have a look at that, just to give you

15:32.820 --> 15:36.000
an understanding of, "Hey,

15:36.000 --> 15:39.690
this is not just all made up stuff, hypothetical,

15:39.690 --> 15:41.100
the Matrix-type of thing.

15:41.100 --> 15:42.600
This is actually a real-world scenario."

15:42.600 --> 15:44.760
So it'll give you a better understanding

15:44.760 --> 15:46.080
and this is what we talked about

15:46.080 --> 15:48.600
in the promotional video for this course that,

15:48.600 --> 15:49.590
or the description of the course,

15:49.590 --> 15:52.800
that we're going to inspire you and your intuition

15:52.800 --> 15:55.890
to give you ideas for how to use AI in real life.

15:55.890 --> 15:57.720
This is your opportunity.

15:57.720 --> 15:59.767
Look at this paper to understand,

15:59.767 --> 16:00.870
"Okay, so we're gonna be dealing

16:00.870 --> 16:02.880
with Markov decision processes going forward.

16:02.880 --> 16:05.280
That's really cool. What do they look like in real life?"

16:05.280 --> 16:07.470
And this possibly could trigger some ideas for you

16:07.470 --> 16:09.690
for how you could apply AI in the future

16:09.690 --> 16:11.730
to make the world a better place.

16:11.730 --> 16:13.740
And we'd be super happy about that.

16:13.740 --> 16:16.260
We'd be super happy if you could use

16:16.260 --> 16:17.130
what you learn in this course

16:17.130 --> 16:18.750
to make the world a better place with AI.

16:18.750 --> 16:20.370
How fantastic would that be?

16:20.370 --> 16:23.160
So, on that note, I hope you enjoyed today's tutorial.

16:23.160 --> 16:24.600
I look forward to seeing you next time.

16:24.600 --> 16:26.553
And until then, enjoy AI.