1
00:00:11,100 --> 00:00:17,100
So in this video, we will learn how to apply machine learning to time series forecasting. We'll

2
00:00:17,100 --> 00:00:21,370
use the airline passengers data, which has been our standard benchmark throughout this course.

3
00:00:22,260 --> 00:00:26,190
So the main idea is that we're going to have two versions of this script.

4
00:00:26,580 --> 00:00:28,890
In this version, we will not use differencing.

5
00:00:29,370 --> 00:00:34,000
We already know why this is suboptimal, but we're going to do it anyway just to see if it will work.

6
00:00:34,500 --> 00:00:39,570
This will also help us get familiar with the basic structure of this code without having to worry about

7
00:00:39,570 --> 00:00:39,870
differencing.

8
00:00:40,950 --> 00:00:44,730
Remember that when you difference your data, you make it stationary, which is good.

9
00:00:45,390 --> 00:00:50,510
But when you want to make a forecast, you then have to undo this differencing, which is non-trivial.

10
00:00:51,870 --> 00:00:57,150
Now, libraries like statsmodels do all that work for you, but because we're now using generic machine

11
00:00:57,150 --> 00:00:59,440
learning methods, we'll need to do it ourselves.

12
00:00:59,940 --> 00:01:05,250
So in order to introduce only one concept at a time, we'll save differencing for the next script.

13
00:01:05,700 --> 00:01:09,350
In this script, we'll test the belief that machine learning can extrapolate.

14
00:01:09,360 --> 00:01:14,610
Well, we kind of already know it can't, but this will help us to confirm what we saw before.

15
00:01:16,140 --> 00:01:19,500
So let's begin by importing NumPy, pandas, and matplotlib.

16
00:01:23,390 --> 00:01:27,740
The next step is to update scikit-learn so that we can import the metric.

17
00:01:34,200 --> 00:01:36,120
The next step is to download our data.

18
00:01:40,030 --> 00:01:43,480
The next step is to load in our data using pd.read_csv.

19
00:01:47,330 --> 00:01:49,640
The next step is to compute the log transform.

20
00:01:52,830 --> 00:01:55,800
The next step is to split our data into train and test.

21
00:01:59,580 --> 00:02:06,090
OK, so the next part is new. In this block of code, we will learn how to convert a time series into

22
00:02:06,090 --> 00:02:07,870
a supervised machine learning data set.

23
00:02:08,610 --> 00:02:11,920
We'll start by converting our log passengers into a NumPy array.

24
00:02:11,920 --> 00:02:15,120
NumPy arrays are a little easier to index.

25
00:02:18,280 --> 00:02:24,460
The next step is to set the number of lags in our model, which is called big T. We also create empty

26
00:02:24,460 --> 00:02:27,070
lists to store inputs and targets.

27
00:02:28,450 --> 00:02:30,630
The next step is to loop through the time series.

28
00:02:31,150 --> 00:02:37,450
Note that we only go up to len(series) minus big T, and I encourage you to print out the final data to

29
00:02:37,450 --> 00:02:38,650
ensure this is correct.

30
00:02:39,670 --> 00:02:46,450
OK, so inside the loop we grab the time series from index little t up to little t plus big T.

31
00:02:47,110 --> 00:02:50,350
This is our input, which is a time series of size big T.

32
00:02:50,980 --> 00:02:54,190
Once we have this input, we append this to our list of inputs.

33
00:02:55,540 --> 00:02:59,800
The next step is to compute the target, which is the next value of the time series.

34
00:03:00,340 --> 00:03:03,690
Now you might be wondering, why is it still little t plus big T?

35
00:03:04,300 --> 00:03:10,860
So remember that when you index a range of values in Python, the last index is exclusive, not inclusive.

36
00:03:11,380 --> 00:03:14,830
So the last data point in the input is not the same as the target.

37
00:03:15,970 --> 00:03:19,300
OK, so that's the target which we append to our list of targets.

38
00:03:21,540 --> 00:03:26,940
Once we are outside the loop, we convert both the inputs and targets into NumPy arrays, which makes

39
00:03:26,940 --> 00:03:28,290
them easier to index.
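Here is a minimal sketch of the loop just described, not the exact notebook code. The toy `series` stands in for the log-passenger values, and the names `T`, `X`, and `Y` are assumptions for illustration.

```python
import numpy as np

# Toy series standing in for the log-passenger values (assumption).
series = np.arange(10, dtype=float)

T = 3  # number of lags used as input features
X, Y = [], []

# Only go up to len(series) - T so that every window has a target after it.
for t in range(len(series) - T):
    X.append(series[t:t + T])  # inputs: indices t .. t+T-1 (end is exclusive)
    Y.append(series[t + T])    # target: the value right after the window

# Convert the lists to NumPy arrays so they are easier to index.
X = np.array(X)
Y = np.array(Y)
print(X.shape, Y.shape)
```

Because the slice end is exclusive, `series[t + T]` is never part of the input window, so the target does not leak into the features.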

40
00:03:33,430 --> 00:03:37,150
The next step is to split our inputs and targets into train and test.

41
00:03:40,970 --> 00:03:43,040
The next step is to fit a linear regression.

42
00:03:46,180 --> 00:03:51,040
As you can see, we get an R-squared of about 0.96, which seems pretty good.

43
00:03:55,240 --> 00:03:59,320
But the test R-squared is only 0.69, which is not that good.

44
00:04:03,040 --> 00:04:09,580
The next step is to create boolean arrays to index both the train and test sets. Note that because we

45
00:04:09,580 --> 00:04:15,280
are using big T lags in our model, we won't be able to make predictions for the first big T values.

46
00:04:15,640 --> 00:04:19,530
So we'll set the first big T values of the train index to false.

47
00:04:20,440 --> 00:04:25,390
This kind of thing should give you a better idea of how statsmodels works behind the scenes, since

48
00:04:25,390 --> 00:04:27,570
it has to contend with the same details.
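The boolean index arrays can be sketched roughly as follows. The sizes and the names `train_idx` and `test_idx` are assumptions chosen for illustration, not taken from the notebook.

```python
import numpy as np

N, Ntest, T = 12, 4, 3  # toy sizes: series length, test length, lags (assumptions)

# True for every row that belongs to the training period.
train_idx = np.zeros(N, dtype=bool)
train_idx[:N - Ntest] = True

# The test period is everything else.
test_idx = ~train_idx

# The first T points have no full lag window, so there is no prediction for them.
train_idx[:T] = False

print(train_idx.sum(), test_idx.sum())
```

With these masks, the number of `True` entries in `train_idx` matches the number of training rows in the lagged data set, so the one-step predictions can be assigned to the data frame without a length mismatch.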

49
00:04:32,130 --> 00:04:35,870
The next step is to assign our one-step forecasts to the data frame.

50
00:04:40,110 --> 00:04:42,600
The next step is to plot our one-step forecast.

51
00:04:47,970 --> 00:04:50,880
Notice how our model seems to underestimate the peaks.

52
00:04:54,540 --> 00:04:57,430
The next step is to compute a multi-step forecast.

53
00:04:58,260 --> 00:05:02,130
So remember that statsmodels does not compute a one-step forecast.

54
00:05:02,310 --> 00:05:04,690
It only computes a multi-step forecast.

55
00:05:05,190 --> 00:05:09,730
So this code is closer to what we were doing before when we looked at ARIMA and ETS.

56
00:05:11,130 --> 00:05:14,940
OK, so we'll start by creating an empty list to store our predictions.

57
00:05:15,600 --> 00:05:21,120
The first step is to obtain the first test input, which is just X test indexed at zero.

58
00:05:21,900 --> 00:05:26,990
We'll assign this to a variable called last_x, since this will be updated as we go through the loop.

59
00:05:27,870 --> 00:05:33,030
The next step is to enter a loop that will continue as long as our list of predictions is shorter than

60
00:05:33,030 --> 00:05:33,840
Ntest.

61
00:05:36,410 --> 00:05:39,740
Inside the loop, we'll call the predict function to get a prediction.

62
00:05:40,580 --> 00:05:46,850
Note that we have to reshape our last_x array because scikit-learn only accepts 2-D arrays as input.

63
00:05:47,660 --> 00:05:53,120
Recall that your data has to be in the form of a table with samples along the rows and features along

64
00:05:53,120 --> 00:05:54,020
the columns.

65
00:05:54,770 --> 00:06:00,830
Basically, we have one sample and T features, so we reshape last_x to one by T.

66
00:06:01,820 --> 00:06:07,040
After we get the prediction, we index the prediction at zero since there is only one prediction.

67
00:06:09,210 --> 00:06:12,660
The next step is to append our prediction to a list of predictions.

68
00:06:16,190 --> 00:06:20,900
Once we've stored our prediction, the next step is to update last_x for the next iteration

69
00:06:20,900 --> 00:06:21,530
of the loop.

70
00:06:22,400 --> 00:06:27,830
Basically, what we want to do is throw out the oldest value and append the newest value, which is

71
00:06:27,830 --> 00:06:28,930
the latest prediction.

72
00:06:29,630 --> 00:06:31,910
But recall that NumPy arrays are constant

73
00:06:31,910 --> 00:06:35,770
size. You can't delete or append anything to an array.

74
00:06:36,470 --> 00:06:41,680
So the simplest way to perform this operation is to rotate all the values by one step.

75
00:06:42,200 --> 00:06:47,510
The oldest value will wrap around to the end, but in the next line, we simply replace it with P.

76
00:06:48,290 --> 00:06:53,080
So this effectively throws out the oldest value and adds the newest prediction to the end.

77
00:06:53,660 --> 00:06:58,820
You should confirm to yourself that this implements the incremental forecasting procedure we discussed.
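One way to check this is a small sketch of the incremental loop. The `predict` function here is a hypothetical stand-in for a fitted model's predict method (it just averages each row), and the starting window stands in for the first test input; both are assumptions for illustration.

```python
import numpy as np

Ntest = 4  # number of steps to forecast (assumption)

def predict(X):
    # Hypothetical stand-in for model.predict: expects a 2-D array
    # (samples x features) and returns one value per row.
    return X.mean(axis=1)

last_x = np.array([1.0, 2.0, 3.0])  # stands in for the first test input
multistep_predictions = []

while len(multistep_predictions) < Ntest:
    # Reshape to 1 x T: one sample with T features; take the single prediction.
    p = predict(last_x.reshape(1, -1))[0]
    multistep_predictions.append(p)

    # Rotate left by one: the oldest value wraps around to the end...
    last_x = np.roll(last_x, -1)
    # ...and is then overwritten by the newest prediction.
    last_x[-1] = p

print(multistep_predictions)
```

After each iteration, `last_x` holds the most recent T values, where any value past the end of the training data is a prediction rather than an observation.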

78
00:07:03,640 --> 00:07:07,140
The next step is to store the multi-step predictions in our data frame.

79
00:07:10,510 --> 00:07:12,430
The next step is to plot our forecast.

80
00:07:16,860 --> 00:07:22,590
As expected, the multi-step forecast seems to be a little worse compared to the one-step forecast.

81
00:07:27,230 --> 00:07:32,840
The next step in this notebook is to create a multi-output model. In order to build this, it's

82
00:07:32,840 --> 00:07:37,020
really just a matter of creating the right data set. In this block of code,

83
00:07:37,040 --> 00:07:44,150
we now have Tx and Ty. Tx represents the number of time steps in the input, and Ty represents

84
00:07:44,150 --> 00:07:45,620
the number of steps in the output.

85
00:07:48,230 --> 00:07:52,910
Now, again, the limit for this loop might be a bit confusing, but I encourage you to print out the

86
00:07:52,910 --> 00:07:57,880
result and maybe also run through the loop by hand to convince yourself that this is correct.

87
00:07:58,700 --> 00:08:02,120
Inside the loop, we grab the input series, which is the same as before.

88
00:08:02,750 --> 00:08:07,790
The output series has length Ty, starting from the index little t plus Tx.

89
00:08:08,540 --> 00:08:13,490
And again, once we're outside the loop, we convert X and Y into NumPy arrays.

90
00:08:19,320 --> 00:08:24,050
OK, and so in the next block, we split our inputs and targets into train and test.

91
00:08:24,660 --> 00:08:30,300
Now, since we only want to forecast for the final Ntest time steps, our test set is effectively

92
00:08:30,300 --> 00:08:31,500
just the final data point.

93
00:08:32,160 --> 00:08:37,580
This is because it's a multi-output data set, and each target itself has Ntest time steps.

94
00:08:38,250 --> 00:08:42,660
Note that I've put an underscore M at the end of these variable names so that we don't overwrite the

95
00:08:42,660 --> 00:08:43,560
old values.
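A rough sketch of the multi-output data set and its split follows. The toy series, the sizes, and the loop bound `len(series) - Tx - Ty + 1` are assumptions worked out for this example, not a transcription of the notebook; printing the result is a good way to check the bound.

```python
import numpy as np

series = np.arange(12, dtype=float)  # toy series (assumption)
Tx, Ty = 3, 4                        # input length and output length

X, Y = [], []
# The last valid start t must leave room for Tx inputs and Ty targets.
for t in range(len(series) - Tx - Ty + 1):
    X.append(series[t:t + Tx])            # input window of length Tx
    Y.append(series[t + Tx:t + Tx + Ty])  # next Ty values as the target

X = np.array(X)
Y = np.array(Y)

# With Ty equal to the test length, the test set is just the final row.
Xtrain_m, Ytrain_m = X[:-1], Y[:-1]
Xtest_m, Ytest_m = X[-1:], Y[-1:]

print(X.shape, Y.shape, Ytest_m)
```

Consecutive targets overlap: row 0 of `Y` is values 3 to 6 and row 1 is values 4 to 7, which is the repetition that later inflates the training R-squared.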

96
00:08:47,160 --> 00:08:50,370
OK, so for the next step, we're going to fit a linear model.

97
00:08:53,610 --> 00:08:56,520
Notice that the R-squared is now a bit better than before.

98
00:08:57,120 --> 00:09:02,370
However, note that this is a bit misleading since the targets actually repeat the same values multiple

99
00:09:02,370 --> 00:09:03,090
times.

100
00:09:03,900 --> 00:09:08,670
For example, you have y1, y2, y3, and then y2, y3, y4, and so on.

101
00:09:13,140 --> 00:09:17,420
OK, so when we try to compute the R-squared of the test set, we get not a number.

102
00:09:17,940 --> 00:09:19,230
So why does this happen?

103
00:09:20,100 --> 00:09:25,290
Well, recall that the R-squared involves computing the SST, which is essentially the sample variance

104
00:09:25,290 --> 00:09:31,290
of the targets, but the sample variance of a single sample is zero since we only have one test data

105
00:09:31,290 --> 00:09:31,680
point.

106
00:09:32,550 --> 00:09:36,150
So unfortunately, the test R-squared cannot be computed this way.

107
00:09:38,950 --> 00:09:44,650
However, an alternative is to simply call the R-squared function ourselves and flatten the input arrays.
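To make the flattening idea concrete, here is a small hand-rolled R-squared in NumPy. This is a stand-in written for illustration, not the library function used in the notebook, and the arrays `Ytest_m` and `Ptest_m` hold made-up values.

```python
import numpy as np

def r2_flat(y_true, y_pred):
    # Flatten so that one sample with Ntest outputs becomes
    # Ntest samples of a single output.
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# One test row with four outputs (made-up values).
Ytest_m = np.array([[1.0, 2.0, 3.0, 4.0]])
Ptest_m = np.array([[1.5, 2.0, 2.5, 4.0]])

score = r2_flat(Ytest_m, Ptest_m)
print(score)
```

Without the flattening, each output column contains a single value, so its total sum of squares is zero and the ratio is undefined. After flattening, the four values vary around their mean and the score is well defined.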

108
00:09:47,290 --> 00:09:51,190
OK, so this gives us an R-squared of 0.8, which is an improvement.

109
00:09:53,300 --> 00:09:57,040
The next step is to save our multi-output forecast to our data frame.

110
00:10:01,140 --> 00:10:03,120
The next step is to plot our forecasts.

111
00:10:08,320 --> 00:10:12,430
OK, so it looks like the multi-output forecast is superior.

112
00:10:15,450 --> 00:10:21,030
Now, just for fun, we're going to use a different metric. In this block of code, we compute the MAPE

113
00:10:21,150 --> 00:10:24,150
for both the incremental and multi-output forecasts.

114
00:10:27,710 --> 00:10:31,030
So the multi-output MAPE is a bit better, as expected.
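For reference, the mean absolute percentage error can be written in a few lines of NumPy. This hand-written `mape` helper is an assumption for illustration rather than the metric function imported in the notebook, and the numbers are made up.

```python
import numpy as np

def mape(y_true, y_pred):
    # Mean of |error| / |actual|, returned as a fraction (0.1 means 10%).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true))

y_true = [100.0, 200.0, 400.0]  # made-up actual values
y_pred = [110.0, 180.0, 400.0]  # made-up forecasts

err = mape(y_true, y_pred)
print(err)
```

Because the error is divided by the actual value, the metric is scale-free, which makes it convenient for comparing the incremental and multi-output forecasts on the same test period.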

115
00:10:34,670 --> 00:10:39,140
So in the next portion of this notebook, what I'd like to do is run through the exact same process

116
00:10:39,140 --> 00:10:45,200
again, but with different machine learning models. Since the code would essentially be the same,

117
00:10:45,360 --> 00:10:48,420
it would be a good idea to make a function to do all the work.

118
00:10:48,950 --> 00:10:51,110
So this function takes in two inputs.

119
00:10:51,410 --> 00:10:57,260
The model we want to use and its name. The name is just a string so we can make sure all the column

120
00:10:57,260 --> 00:10:59,180
names of our data frame are unique.

121
00:10:59,930 --> 00:11:03,020
So basically everything in this function is stuff you just saw.

122
00:11:03,440 --> 00:11:04,790
First we fit a model.

123
00:11:05,210 --> 00:11:07,100
Next we do a one step forecast.

124
00:11:07,580 --> 00:11:10,710
After that, we do an incremental, multi-step forecast.

125
00:11:11,240 --> 00:11:13,980
After that, we store the forecast in the data frame.

126
00:11:14,570 --> 00:11:16,490
After that, we compute the MAPE.

127
00:11:17,030 --> 00:11:19,490
And the final thing is to plot the forecast.

128
00:11:23,730 --> 00:11:27,000
OK, so let's try our function with the support vector machine.

129
00:11:31,140 --> 00:11:36,150
So the support vector machine does not perform that well, but again, remember that this is without

130
00:11:36,150 --> 00:11:36,810
differencing.

131
00:11:40,380 --> 00:11:43,710
The next step is to use the same function with the random forest.

132
00:11:50,130 --> 00:11:55,260
Again, the predictions are pretty bad, but it does seem to fit very well to the train set, which

133
00:11:55,260 --> 00:11:57,090
is pretty common for the random forest.

134
00:12:01,750 --> 00:12:07,540
OK, so the next step is to make another function for the multi-output forecast. Remember that this

135
00:12:07,540 --> 00:12:10,910
requires a different model because it has a different number of outputs.

136
00:12:11,440 --> 00:12:13,210
So the basic steps are the same.

137
00:12:13,300 --> 00:12:16,810
Fit the model, store the forecast, compute the MAPE, make a plot.

138
00:12:21,010 --> 00:12:24,220
OK, so let's try our function with the support vector machine.

139
00:12:27,560 --> 00:12:32,420
Now, I kind of gave it away in the comment here, but the support vector machine does not handle multiple

140
00:12:32,420 --> 00:12:33,100
outputs.

141
00:12:33,590 --> 00:12:38,680
Therefore, this is one example of a machine learning model for which this method will not work.

142
00:12:42,710 --> 00:12:44,810
The next step is to try the random forest.

143
00:12:50,370 --> 00:12:55,120
OK, so luckily the random forest works, but again, it doesn't perform too well.
