1
00:00:11,110 --> 00:00:16,870
OK, so in this lecture, we are going to continue our investigation into applying machine learning

2
00:00:17,140 --> 00:00:19,220
to the airline passengers time series.

3
00:00:19,750 --> 00:00:23,900
So this script is going to have the exact same structure as the previous script.

4
00:00:24,360 --> 00:00:27,790
We're going to go through all the same steps in the exact same order.

5
00:00:28,330 --> 00:00:33,470
The only difference, no pun intended, is that in this case, we are going to make the time series

6
00:00:33,490 --> 00:00:35,910
stationary by taking the first difference.

7
00:00:37,270 --> 00:00:42,610
The implication of this is that when we want to compute our forecast, we have to undo this

8
00:00:42,620 --> 00:00:42,970
differencing.

9
00:00:43,660 --> 00:00:48,370
Now, for some people, it might help to look at some math and for others it might help to just look

10
00:00:48,370 --> 00:00:49,080
at the code.

11
00:00:49,390 --> 00:00:50,840
So we're going to do both.

12
00:00:51,280 --> 00:00:54,870
For me personally, I found just going straight to the code more intuitive.

13
00:00:55,420 --> 00:01:00,760
I didn't write down this math until I realized I had to make this lecture, but I decided to explain

14
00:01:00,760 --> 00:01:04,990
the math, since it really helps to explain why we use those functions in the code.

15
00:01:09,590 --> 00:01:16,280
OK, so suppose we've taken the first difference of our data. We'll say Delta Y of T is just Y of T,

16
00:01:16,280 --> 00:01:17,930
minus Y of T minus one.

17
00:01:19,410 --> 00:01:26,040
Then our differenced data set is just Delta Y one, Delta Y two and so forth. This is the data we will

18
00:01:26,040 --> 00:01:28,350
train on. Because of this,

19
00:01:28,350 --> 00:01:32,350
our model predictions will also be for this differenced time series.

20
00:01:32,640 --> 00:01:36,000
So Delta Y hat one, Delta Y hat two and so forth.
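
As a rough illustration of the differencing step just described, here is a minimal sketch. The numbers are made-up toy values, not the actual airline passengers data, and using NumPy's `np.diff` is an assumption for illustration; the lecture's script may use a different function.

```python
import numpy as np

# Toy values standing in for the time series y (made up for illustration).
y = np.array([112.0, 118.0, 132.0, 129.0, 121.0])

# First difference: dy[i] = y[i + 1] - y[i].
# The differenced series has one fewer element than the original.
dy = np.diff(y)

print(dy)
```

A model trained on `dy` would then output predictions for the differences, not for `y` itself.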

21
00:01:40,650 --> 00:01:46,770
Now, let's suppose we would like to make a train prediction for the original time series. For instance,

22
00:01:46,770 --> 00:01:53,430
I want to know Y hat T. My model does not predict Y hat T directly, but I do have Delta

23
00:01:53,430 --> 00:01:54,350
Y hat T.

24
00:01:54,780 --> 00:02:00,420
I know that Delta Y hat T is a prediction for Y of T minus Y of T minus one.

25
00:02:02,060 --> 00:02:08,210
In this way, I can conceptually move Y of T minus one to the other side and say that my prediction

26
00:02:08,210 --> 00:02:12,290
for Y of T is Y of T minus one plus Delta Y hat of T.

27
00:02:13,760 --> 00:02:17,360
So this is how I would make a forecast for an in-sample prediction.

28
00:02:18,110 --> 00:02:20,360
This is also how to do a one step forecast.

29
00:02:20,990 --> 00:02:27,320
This makes sense because if today is T minus one, then I know Y of T minus one. I do not know Y of

30
00:02:27,320 --> 00:02:30,760
T which is why I want to make a forecast of this value.

31
00:02:31,520 --> 00:02:37,820
So I use my model to predict Delta Y of T, and then I add that to the current Y of T minus one in order

32
00:02:37,820 --> 00:02:39,710
to predict tomorrow's Y of T.
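
The one-step rule above (previous known value plus predicted delta) can be sketched as follows. The array `dy_hat` is a hypothetical stand-in for a model's predicted differences, and all values are made up for illustration.

```python
import numpy as np

# Toy original series y_0 ... y_4 (made-up values).
y = np.array([112.0, 118.0, 132.0, 129.0, 121.0])

# Hypothetical model predictions for the differences at t = 1 ... 4.
dy_hat = np.array([5.0, 10.0, -1.0, -6.0])

# One-step prediction: y_hat[t] = y[t - 1] + dy_hat[t].
# y[:-1] holds the known previous values y_0 ... y_3.
y_hat = y[:-1] + dy_hat

print(y_hat)
```

Note that each prediction uses the true previous value, which is why this only works in-sample or one step ahead.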

33
00:02:44,440 --> 00:02:50,110
The next question to consider is how do we forecast multiple time steps into the future, beyond

34
00:02:50,110 --> 00:02:51,160
the in-sample data?

35
00:02:52,180 --> 00:02:57,940
Well, let's begin with the fact that we know how to make multistep forecasts, either using the incremental

36
00:02:58,120 --> 00:03:03,880
or the multi-output method. So we know how to obtain Delta Y hat of big T plus one,

37
00:03:04,120 --> 00:03:06,760
Delta Y hat of big T plus two and so forth.

38
00:03:08,170 --> 00:03:13,250
So we have all these deltas, but we need to go back to the original undifferenced series.

39
00:03:13,900 --> 00:03:15,800
Let's start with the first prediction, at time

40
00:03:15,800 --> 00:03:16,960
big T plus one.

41
00:03:17,710 --> 00:03:20,800
In this case, we can follow the same formula as before.

42
00:03:21,490 --> 00:03:25,540
We just add the prediction for the next Delta to the last known value.

43
00:03:26,230 --> 00:03:28,870
But how do we get the forecast for big T plus two?

44
00:03:33,660 --> 00:03:38,090
Well, if we knew Y of big T plus one, then we could just follow the usual formula.

45
00:03:38,670 --> 00:03:43,530
But since we do not, then we do the next best thing, which is to plug in our prediction,

46
00:03:43,680 --> 00:03:45,230
Y hat of big T plus one.

47
00:03:45,990 --> 00:03:52,080
However, notice that we can just plug in the previous expression for Y hat of big T plus one in terms

48
00:03:52,080 --> 00:03:54,270
of the last known value Y of Big T.

49
00:03:58,890 --> 00:04:04,980
And of course, we can repeat this pattern for the forecast at time big T plus three. In this case,

50
00:04:04,980 --> 00:04:07,700
it's Y of big T plus the next three deltas.
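
Written out, the substitution steps described in the last few cues look roughly like this. The notation is an assumption based on the spoken description: $y_T$ is the last known value and $\Delta\hat{y}_{T+k}$ is the predicted difference $k$ steps ahead.

```latex
\begin{aligned}
\hat{y}_{T+1} &= y_T + \Delta\hat{y}_{T+1} \\
\hat{y}_{T+2} &= \hat{y}_{T+1} + \Delta\hat{y}_{T+2}
               = y_T + \Delta\hat{y}_{T+1} + \Delta\hat{y}_{T+2} \\
\hat{y}_{T+3} &= \hat{y}_{T+2} + \Delta\hat{y}_{T+3}
               = y_T + \Delta\hat{y}_{T+1} + \Delta\hat{y}_{T+2} + \Delta\hat{y}_{T+3}
\end{aligned}
```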

51
00:04:08,640 --> 00:04:10,250
OK, so I hope you get the idea.

52
00:04:10,440 --> 00:04:16,710
The forecast is just the cumulative sum of the deltas added to the last known value in the time series.
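
To make the cumulative-sum idea concrete, here is a minimal sketch. The values and the name `dy_hat_future` are hypothetical, and `np.cumsum` is used as one natural way to express it; this is not necessarily the exact code from the lecture's script.

```python
import numpy as np

# Last known value of the original series, y_T (made-up value).
last_known = 121.0

# Hypothetical multi-step predictions of the differences for T+1, T+2, T+3.
dy_hat_future = np.array([4.0, -2.0, 7.0])

# Forecast for T+k is y_T plus the sum of the first k predicted deltas.
forecast = last_known + np.cumsum(dy_hat_future)

# Equivalent step-by-step version: plug each prediction back in as the
# "previous value" for the next step.
recursive = []
prev = last_known
for d in dy_hat_future:
    prev = prev + d
    recursive.append(prev)

print(forecast)
print(recursive)
```

Both versions give the same numbers; the cumulative sum is just the recursion written in one line.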
