1
00:00:11,700 --> 00:00:16,890
In this lecture, we are going to look at the code to implement an autoregressive linear model for time

2
00:00:16,890 --> 00:00:18,390
series prediction.

3
00:00:18,390 --> 00:00:23,100
Since you already know how to write linear regression using PyTorch, that's not going to be much of

4
00:00:23,100 --> 00:00:24,900
a challenge in this lecture.

5
00:00:24,900 --> 00:00:30,680
The challenge in this lecture mostly has to do with processing the data correctly and making forecasts.

6
00:00:30,690 --> 00:00:33,930
This lecture is going to walk you through a prepared Colab notebook.

7
00:00:33,930 --> 00:00:39,600
Although a very good exercise, which I always recommend, is, once you know how this is done, to try and

8
00:00:39,600 --> 00:00:45,870
recreate it yourself with as few references as possible. As usual, you can look at the title of the notebook

9
00:00:46,110 --> 00:00:48,510
to determine what notebook we are currently looking at.

10
00:00:49,200 --> 00:00:50,240
So let's get started.

11
00:00:56,710 --> 00:01:02,020
The new thing here is after the imports we're going to start by creating a synthetic dataset.

12
00:01:02,110 --> 00:01:05,840
Remember how earlier I explained why synthetic datasets are important.

13
00:01:06,040 --> 00:01:12,080
They allow us to carefully study the behaviour of our model under different controlled circumstances.

14
00:01:12,100 --> 00:01:17,980
So in this lecture, we're going to start with a very basic time series: a sine wave. To start, we're going

15
00:01:17,980 --> 00:01:26,400
to create a sine wave with no noise, just a pure sine wave with 1000 points. We also plot the series so

16
00:01:26,400 --> 00:01:27,420
you know what it looks like.

17
00:01:30,720 --> 00:01:31,030
All right.

18
00:01:31,060 --> 00:01:34,380
So there's the sine wave.
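
As a rough sketch of what this cell might look like: the 1000 points come from the lecture, but the 0.1 step size and the variable names `N` and `series` are assumptions chosen for illustration.

```python
import numpy as np

# 1000 points, as in the lecture. The 0.1 step size is an assumption,
# picked only so that several full periods fit in the series.
N = 1000
series = np.sin(0.1 * np.arange(N))

# Plotting is optional; matplotlib is assumed to be installed.
# import matplotlib.pyplot as plt
# plt.plot(series)
# plt.show()
```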

19
00:01:34,580 --> 00:01:40,130
Next, we build our dataset. In this example, we're going to set T equal to 10, which means that we're

20
00:01:40,130 --> 00:01:44,650
going to use 10 previous time steps to predict the next time step.

21
00:01:44,810 --> 00:01:49,090
We'll start by populating X and Y as lists and then cast them into arrays

22
00:01:49,100 --> 00:01:55,160
once we are done. You could alternatively preallocate these as fixed-size NumPy arrays, but I've decided

23
00:01:55,160 --> 00:01:57,590
to take a more lazy approach.

24
00:01:57,590 --> 00:02:02,780
So in this loop we go from zero up to the length of the series minus Big T.

25
00:02:02,780 --> 00:02:05,450
Now, of course, you want to double-check that this is correct

26
00:02:05,540 --> 00:02:08,320
so that we don't have an off-by-one error.

27
00:02:08,390 --> 00:02:09,910
The reasoning is this.

28
00:02:10,160 --> 00:02:14,780
We know that our input x will go from little t up to little t plus big T.

29
00:02:14,810 --> 00:02:23,520
So, for example, zero up to 10. In terms of actual indices, that will be 0, 1, 2, and so on, up to 9 inclusive.

30
00:02:23,570 --> 00:02:28,010
So when we say zero up to 10 it doesn't actually include the index 10.

31
00:02:28,340 --> 00:02:30,920
Instead the target will be at the index 10.

32
00:02:31,760 --> 00:02:36,380
We know that the final index of the original series will be the final target.

33
00:02:36,500 --> 00:02:44,480
That's the index len(series) minus 1. Thus, in order for the final target to be at the length of the series

34
00:02:44,480 --> 00:02:45,540
minus 1,

35
00:02:45,560 --> 00:02:49,850
little t must go up to the length of the series minus big T.

36
00:02:50,780 --> 00:02:58,190
If we plug in the final value of little t, we get len(series) minus T minus 1, plus T, which is equal

37
00:02:58,190 --> 00:03:05,810
to len(series) minus 1. So inside the loop, we assign little x to be the series from little t up to

38
00:03:05,810 --> 00:03:14,260
little t plus big T. Then we append this to big X. Next, we assign little y, which is the target, to be

39
00:03:14,260 --> 00:03:22,660
the series at time little t plus big T. Then we append this to big Y. When we're done with that, we cast X

40
00:03:22,660 --> 00:03:25,360
and Y to NumPy arrays and print out the shapes.

41
00:03:32,120 --> 00:03:42,960
So, as expected, X is of shape N by 10, whereas Y is N by 1.
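
The loop described above can be sketched roughly as follows. The stand-in sine series and its 0.1 step size are assumptions; the windowing logic follows the reasoning in the lecture, with inputs at indices t through t+T-1 and the target at index t+T.

```python
import numpy as np

# Stand-in series (assumed): a clean sine wave with 1000 points.
series = np.sin(0.1 * np.arange(1000))

T = 10  # number of past values used to predict the next one
X, Y = [], []
for t in range(len(series) - T):
    x = series[t:t + T]  # indices t, ..., t+T-1 (index t+T is excluded)
    X.append(x)
    y = series[t + T]    # the target sits at index t+T
    Y.append(y)

# Cast the lists to arrays: X is N x T, Y is N x 1.
X = np.array(X).reshape(-1, T)
Y = np.array(Y).reshape(-1, 1)
N = len(X)
print("X.shape", X.shape, "Y.shape", Y.shape)
```

With 1000 points and T = 10, the last value of t is 989, so the last target is at index 999, which is the final point of the series.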

42
00:03:43,080 --> 00:03:47,550
Next, we build our autoregressive linear model. As you know, it's just an nn.Linear.

43
00:03:51,170 --> 00:03:56,870
As expected, the next step is to create our loss and optimizer. Since this is regression, we'll be using

44
00:03:56,900 --> 00:03:57,920
the mean squared error.
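
A minimal sketch of the model, loss, and optimizer: `nn.Linear` and the mean squared error come from the lecture, while the choice of Adam and the learning rate of 0.1 are assumptions, since the lecture does not name the optimizer.

```python
import torch
import torch.nn as nn

T = 10  # number of input features (past values)

# Autoregressive linear model: T inputs, one output.
model = nn.Linear(T, 1)

# Mean squared error, since this is regression.
criterion = nn.MSELoss()

# Assumption: Adam with lr=0.1; the lecture does not say which optimizer is used.
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)
```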

45
00:04:04,110 --> 00:04:06,850
Next, we split the data into train and test sets.

46
00:04:07,080 --> 00:04:12,210
Here's another important point about training time series models: you don't want to split up your data

47
00:04:12,210 --> 00:04:15,870
randomly, say, using the scikit-learn function train_test_split.

48
00:04:16,170 --> 00:04:21,510
This wouldn't make sense because a real forecasting model has to predict the future; it can't train on

49
00:04:21,510 --> 00:04:23,410
points within that future.

50
00:04:23,490 --> 00:04:30,270
Thus, our model trains only on the first half of the dataset and validates on the second half. Since sine waves

51
00:04:30,270 --> 00:04:31,050
are periodic,

52
00:04:31,050 --> 00:04:34,340
this is a non-issue in this script, but in general this is important.
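
A chronological split might look something like this. The stand-in data and the names `X_train`, `y_train`, `X_test`, and `y_test` are assumptions; the point is only that the first half is used for training and the second half for validation, with no shuffling.

```python
import numpy as np
import torch

# Stand-in data (assumed): a clean sine wave windowed with T = 10.
series = np.sin(0.1 * np.arange(1000))
T = 10
X = np.array([series[t:t + T] for t in range(len(series) - T)])
Y = np.array([series[t + T] for t in range(len(series) - T)]).reshape(-1, 1)
N = len(X)

# Chronological split: first half for training, second half for validation.
half = N // 2
X_train = torch.from_numpy(X[:half].astype(np.float32))
y_train = torch.from_numpy(Y[:half].astype(np.float32))
X_test = torch.from_numpy(X[half:].astype(np.float32))
y_test = torch.from_numpy(Y[half:].astype(np.float32))
```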

53
00:04:43,730 --> 00:04:49,520
Next, we have our training function, which is called full_gd because it does full gradient descent instead

54
00:04:49,520 --> 00:04:51,260
of batch gradient descent.

55
00:04:51,260 --> 00:04:54,820
This is okay for this dataset because this dataset is quite small.
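
A training function along these lines could be sketched as below. The name `full_gd` follows the lecture, but the signature, the 200 epochs, the Adam optimizer, and the stand-in data are all assumptions made for illustration.

```python
import numpy as np
import torch
import torch.nn as nn

def full_gd(model, criterion, optimizer, X_train, y_train, X_test, y_test, epochs=200):
    """Full gradient descent: every step uses the whole training set."""
    train_losses = np.zeros(epochs)
    test_losses = np.zeros(epochs)
    for it in range(epochs):
        optimizer.zero_grad()
        loss = criterion(model(X_train), y_train)
        loss.backward()
        optimizer.step()
        train_losses[it] = loss.item()
        # Track the validation loss without building a graph.
        with torch.no_grad():
            test_losses[it] = criterion(model(X_test), y_test).item()
    return train_losses, test_losses

# Stand-in data (assumed): clean sine wave, T = 10, chronological split.
torch.manual_seed(0)
series = np.sin(0.1 * np.arange(1000)).astype(np.float32)
T = 10
X = torch.from_numpy(np.array([series[t:t + T] for t in range(len(series) - T)]))
Y = torch.from_numpy(np.array([series[t + T] for t in range(len(series) - T)])).view(-1, 1)
half = len(X) // 2

model = nn.Linear(T, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # optimizer choice is an assumption
train_losses, test_losses = full_gd(
    model, nn.MSELoss(), optimizer, X[:half], Y[:half], X[half:], Y[half:]
)
```

The two returned arrays are what you would plot to inspect the loss per iteration.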

56
00:04:59,450 --> 00:05:05,110
All right, so next we call our full_gd function, and training is almost instant.

57
00:05:09,080 --> 00:05:09,420
OK.

58
00:05:09,430 --> 00:05:14,350
So next, we plot our loss as usual. So this looks adequate.

59
00:05:19,850 --> 00:05:20,570
Next,

60
00:05:20,640 --> 00:05:24,150
we're going to do a forecast using the incorrect method.

61
00:05:24,150 --> 00:05:28,610
The reason I want to do this is because I want to show you what not to do.

62
00:05:28,620 --> 00:05:33,810
We're also going to do this calculation manually because the general form of the code is going to be

63
00:05:33,810 --> 00:05:38,130
helpful when we want to make actual forecasts. As an exercise,

64
00:05:38,130 --> 00:05:43,350
you might want to think about how to make this code do the same thing but more efficiently.

65
00:05:43,380 --> 00:05:50,160
So first, we assign the validation target variable, which is just the second half of Y. We'll create

66
00:05:50,160 --> 00:05:56,070
an empty list for the validation predictions and then populate this as we move through the data.

67
00:05:56,070 --> 00:06:02,520
Next, we're going to set the index variable i, which indexes X_test, to 0.

68
00:06:02,520 --> 00:06:07,470
Next we're going to enter a loop that continues while the length of the validation predictions list

69
00:06:07,740 --> 00:06:10,880
is less than the length of the validation target list.

70
00:06:11,010 --> 00:06:15,900
Once they have the same length we know that we're done because we've made all the predictions to correspond

71
00:06:15,900 --> 00:06:16,830
with the targets.

72
00:06:20,690 --> 00:06:21,450
Inside the loop,

73
00:06:21,470 --> 00:06:23,620
we get our model prediction p.

74
00:06:23,660 --> 00:06:25,820
Now this might look weird to you.

75
00:06:25,820 --> 00:06:30,090
Why not just pass in X_test[i] to the model and then assign that to p?

76
00:06:30,830 --> 00:06:32,840
Well, there are two problems there.

77
00:06:32,930 --> 00:06:36,640
First, X_test[i] does not have the right shape.

78
00:06:36,650 --> 00:06:42,680
Remember that the input to the model must be a two-dimensional array, but X_test itself is a two-dimensional

79
00:06:42,680 --> 00:06:43,660
array.

80
00:06:43,670 --> 00:06:45,450
So if we index X_test,

81
00:06:45,500 --> 00:06:48,520
we only get a one-dimensional array of size T.

82
00:06:48,800 --> 00:06:52,220
Therefore, we need to reshape the array to 1 by T.

83
00:06:52,250 --> 00:06:58,580
That means one sample with T features. We can accomplish that by calling the view function and passing

84
00:06:58,580 --> 00:07:03,450
in 1, -1. Then we get the output of the model prediction.

85
00:07:03,580 --> 00:07:09,790
And that's also not the right shape, because remember that in general the model returns an N by K output

86
00:07:09,790 --> 00:07:16,690
if we have N samples and K output nodes. In this particular example, we happen to have one sample and

87
00:07:16,690 --> 00:07:17,670
one output node.

88
00:07:17,800 --> 00:07:24,700
So this returns a 1 by 1 two-dimensional array. Therefore, to get the scalar value contained within it,

89
00:07:24,700 --> 00:07:32,020
we have to index the array at [0, 0]. Finally, since the output value lives in PyTorch land,

90
00:07:32,020 --> 00:07:36,320
we want to bring it back to Python land by calling the item function.

91
00:07:36,340 --> 00:07:51,330
Next, we increment i and we append the prediction p to our validation predictions.

92
00:07:51,330 --> 00:07:54,360
Next, we're going to plot our predictions against the targets.
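
The one-step-at-a-time loop just described (the "wrong" way to forecast, because every input comes from the true data) might be sketched like this. The untrained model and the random `X_test` are stand-ins, not the notebook's actual objects.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T = 10
model = nn.Linear(T, 1)     # untrained stand-in for the fitted model (assumed)
X_test = torch.randn(5, T)  # stand-in for the real test inputs (assumed)

validation_predictions = []
i = 0  # index into X_test

while len(validation_predictions) < len(X_test):
    input_ = X_test[i].view(1, -1)  # shape (T,) -> (1, T): one sample, T features
    p = model(input_)[0, 0].item()  # (1, 1) tensor -> Python float
    i += 1
    validation_predictions.append(p)
```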

93
00:07:58,500 --> 00:07:59,830
So this looks very good.

94
00:07:59,850 --> 00:08:01,650
Pretty much exactly correct.

95
00:08:01,890 --> 00:08:13,050
But remember this is still the wrong way of forecasting.

96
00:08:13,070 --> 00:08:16,260
Next we're going to do forecasting the correct way.

97
00:08:16,260 --> 00:08:22,170
If you just glance at this code, it looks almost exactly the same as what we had before. We start

98
00:08:22,170 --> 00:08:28,050
by grabbing our validation targets as the second half of Y and we initialize validation predictions

99
00:08:28,110 --> 00:08:29,740
as an empty list.

100
00:08:29,850 --> 00:08:31,510
The next line is different.

101
00:08:31,750 --> 00:08:37,800
Before, we just had an index i, and we used i to index X_test, the input data.

102
00:08:37,800 --> 00:08:42,900
But the problem was we shouldn't have been using the true input data to predict future values of the

103
00:08:42,900 --> 00:08:44,450
time series.

104
00:08:44,460 --> 00:08:50,220
Instead, we'll set the variable last_x to be the first input vector. From here on out,

105
00:08:50,280 --> 00:08:54,470
we will no longer take any new values from the actual dataset.

106
00:08:54,510 --> 00:08:57,330
So next, we enter the loop. Inside the loop,

107
00:08:57,330 --> 00:08:59,730
we do the same prediction as before.

108
00:08:59,790 --> 00:09:03,560
Remember that we have to reshape the input to one by T.

109
00:09:03,630 --> 00:09:06,570
Now, this is where things change. First,

110
00:09:06,600 --> 00:09:12,270
after getting the model output, which we'll call p, I'm not going to immediately grab the value and bring

111
00:09:12,270 --> 00:09:18,660
it back to Python land. Instead, I'll do that on the next line, where I append the prediction to the list

112
00:09:18,660 --> 00:09:25,220
of predictions. So I index p at [0, 0] and then I call .item().

113
00:09:25,220 --> 00:09:27,160
So why did I do that?

114
00:09:27,170 --> 00:09:33,410
Well, I still need the p value to be in PyTorch land in order to make our new input on the next iteration

115
00:09:33,410 --> 00:09:34,370
of the loop.

116
00:09:34,430 --> 00:09:39,290
We essentially have to shift all the values of the array to the left and then add the new prediction

117
00:09:39,290 --> 00:09:40,820
at the end.

118
00:09:40,820 --> 00:09:44,800
Of course, this entire array must be in PyTorch land and not NumPy land.

119
00:09:45,350 --> 00:09:53,690
So that's why I kept the p variable as it was returned by the model. Lastly, we use the PyTorch cat function,

120
00:09:53,960 --> 00:09:55,850
which stands for concatenate.

121
00:09:56,120 --> 00:10:01,940
I take everything in the previous input x, throw away the first value, and then concatenate it with p,

122
00:10:01,940 --> 00:10:03,680
the newest prediction.

123
00:10:03,680 --> 00:10:08,270
So that's how we update the variable last_x with our latest forecasted prediction.
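
The multi-step loop described above can be sketched as follows. The untrained model, the random `X_test`, and the name `n_steps` are stand-ins and assumptions. The `torch.no_grad()` wrapper is an addition of mine, used so the loop does not build up a computation graph.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
T = 10
model = nn.Linear(T, 1)      # untrained stand-in for the fitted model (assumed)
X_test = torch.randn(20, T)  # stand-in for the real test inputs (assumed)
n_steps = len(X_test)        # hypothetical name: how many steps to forecast

validation_predictions = []
last_x = X_test[0].view(-1)  # first test window, a 1-D tensor of length T

with torch.no_grad():
    while len(validation_predictions) < n_steps:
        input_ = last_x.view(1, -1)  # (T,) -> (1, T)
        p = model(input_)            # shape (1, 1), kept as a tensor

        # Only now move the scalar back to Python land.
        validation_predictions.append(p[0, 0].item())

        # Shift left by one and append the newest prediction at the end.
        last_x = torch.cat((last_x[1:], p[0]))
```

After the first iteration, every value fed to the model is one of its own earlier predictions, so no new values are taken from the true data.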

124
00:10:08,900 --> 00:10:09,860
So let's run this.

125
00:10:15,760 --> 00:10:22,540
And finally, let's plot the predictions against the targets.

126
00:10:22,570 --> 00:10:23,770
So this is very interesting.

127
00:10:25,000 --> 00:10:30,550
So it appears even though we are now making the forecast using the correct method we still get a perfect

128
00:10:30,550 --> 00:10:32,710
prediction.

129
00:10:32,950 --> 00:10:38,680
Now, remember that this time series has no noise, so you might be wondering what happens if it does have

130
00:10:38,680 --> 00:10:39,790
noise.

131
00:10:39,820 --> 00:10:42,390
Let's go back and add noise, and then run everything again.

132
00:10:45,120 --> 00:10:46,170
So let's scroll back up.

133
00:10:47,550 --> 00:10:48,780
Let's uncomment that line

134
00:10:51,970 --> 00:10:54,340
and then let's go to Runtime and Run after.

135
00:10:57,900 --> 00:10:58,290
OK.

136
00:10:58,320 --> 00:11:00,630
So here's the noisy sine wave.
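
A noisy version of the series might be generated roughly like this. The lecture does not state the noise level, so the Gaussian noise with standard deviation 0.1, the 0.1 step size, and the fixed seed are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

N = 1000
clean = np.sin(0.1 * np.arange(N))

# Assumption: additive Gaussian noise with standard deviation 0.1.
series = clean + rng.normal(scale=0.1, size=N)
```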

137
00:11:08,010 --> 00:11:13,680
Here's the other code. So you see that the loss is a little higher now.

138
00:11:17,460 --> 00:11:19,110
Loss per iteration still looks good.

139
00:11:24,860 --> 00:11:26,860
So here's where things get tricky.

140
00:11:27,020 --> 00:11:32,930
We're going to plot our incorrectly made forecast. As you can see, it appears to look pretty good.

141
00:11:32,930 --> 00:11:39,890
Our forecast even appears to smooth out all the noise, but remember, this is the wrong way to forecast,

142
00:11:39,920 --> 00:11:44,000
so it should be no surprise that these results are misleading.

143
00:11:44,000 --> 00:11:52,320
So let's go down to the correct way of making forecasts.

144
00:11:52,440 --> 00:11:58,160
OK, so now we see what happens when we do a correct forecast, which is what happens in the realistic scenario.

145
00:11:58,500 --> 00:12:02,960
As you can see, the results are no longer as good because of all that noise.

146
00:12:02,970 --> 00:12:07,800
Interestingly, the model does learn that the pattern is periodic and that the true underlying function

147
00:12:07,800 --> 00:12:08,400
is smooth.

148
00:12:09,180 --> 00:12:10,770
So at least that's pretty encouraging.