1
00:00:11,690 --> 00:00:17,050
In this lecture, we are going to discuss forecasting, which will lead us into our next coding example.

2
00:00:18,010 --> 00:00:24,160
This lecture will focus on how to do a correct forecast because unfortunately, a lot of resources do

3
00:00:24,160 --> 00:00:28,960
this in a way that makes your results look really nice, but actually makes zero sense.

4
00:00:30,270 --> 00:00:35,190
So first, what does it mean to forecast in the way that we mean in this course?

5
00:00:35,460 --> 00:00:41,760
It means that we have a time series and our job is to predict the next values of this time series.

6
00:00:42,450 --> 00:00:44,940
Importantly, notice I use the plural there.

7
00:00:45,270 --> 00:00:48,470
We want to predict multiple values, not just one value.

8
00:00:50,270 --> 00:00:56,630
The length of time that you want to predict is called the horizon. In some applications, your horizon

9
00:00:56,900 --> 00:00:58,610
might be three days or five days.

10
00:00:58,790 --> 00:01:04,070
For example, if you're trying to predict the demand for each product at a factory, you might want

11
00:01:04,070 --> 00:01:07,460
to forecast the demand over the next three to five days.

12
00:01:09,370 --> 00:01:11,510
Another pretty obvious example is the weather.

13
00:01:13,290 --> 00:01:17,190
Let's say we want to predict the hourly weather over the next seven days or so.

14
00:01:17,940 --> 00:01:25,140
So if we predict the weather for each hour over seven days, that's seven times 24, which is one hundred

15
00:01:25,140 --> 00:01:26,460
and sixty-eight hours.

16
00:01:28,480 --> 00:01:34,300
So if your sequence step size is hours, that would be predicting 168 steps ahead.

17
00:01:35,290 --> 00:01:36,160
This is important.

18
00:01:36,250 --> 00:01:40,770
So remember, you don't want to just predict one step ahead. As a side note,

19
00:01:41,200 --> 00:01:42,880
forecasting is a huge topic.

20
00:01:43,210 --> 00:01:45,550
You could study an entire book about forecasting.

21
00:01:46,150 --> 00:01:52,090
This course is not that. We're just touching on specific points which are relevant to deep learning and

22
00:01:52,110 --> 00:01:52,900
RNNs.

23
00:01:57,850 --> 00:02:02,660
Now, you might recognize that this section is all about RNNs, recurrent neural networks.

24
00:02:03,050 --> 00:02:04,610
So where are the RNNs?

25
00:02:05,240 --> 00:02:06,200
Well, we are not there yet.

26
00:02:07,250 --> 00:02:12,710
A common approach in machine learning, especially in the industry, is to do the simplest thing possible.

27
00:02:13,490 --> 00:02:17,480
Don't throw an RNN at a problem if there is a simpler method available.

28
00:02:18,380 --> 00:02:24,620
So my question for you now is what is the simplest way to do a forecast given a single one dimensional

29
00:02:24,620 --> 00:02:25,540
time series?

30
00:02:30,650 --> 00:02:35,690
Well, how about we go back to our approach at the very beginning of this course, linear regression?

31
00:02:36,650 --> 00:02:42,800
Linear regression is, in fact, one of the best models to use to study forecasting because it's very intuitive

32
00:02:43,190 --> 00:02:47,270
and there's not that much going on in terms of formulas and manipulating equations.

33
00:02:52,420 --> 00:02:57,380
At this point, you might raise an objection: if our data is of shape N by T by D,

34
00:02:57,790 --> 00:03:03,400
we can't use linear regression because linear regression works on 2D rectangles of shape N by D.

35
00:03:04,210 --> 00:03:06,460
And if you were thinking this, that's a great sign.

36
00:03:06,970 --> 00:03:08,560
That means you're thinking about shapes,

37
00:03:08,890 --> 00:03:10,660
which is exactly what you should be doing.

38
00:03:11,590 --> 00:03:14,290
So how can we reconcile this apparent contradiction?

39
00:03:15,130 --> 00:03:19,440
Well, in fact, linear regression does only work on data of shape N by D.

40
00:03:20,230 --> 00:03:25,210
But remember that in our example, we are considering only a one-dimensional time series,

41
00:03:25,230 --> 00:03:26,590
so D equals one.

42
00:03:27,430 --> 00:03:30,090
As you know, this is just a superfluous dimension.

43
00:03:35,230 --> 00:03:40,720
We can represent the exact same data in a two-dimensional array of shape N by T.

44
00:03:41,470 --> 00:03:46,450
Again, this is analogous to the situation where you have a length-N, one-dimensional array,

45
00:03:46,900 --> 00:03:51,130
or you can have the same array stored in a two-dimensional N by one object.
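
The point about the superfluous dimension can be checked with a minimal NumPy sketch. The sizes N = 5 and T = 3 and the variable names are assumptions chosen only for illustration; they are not from the lecture.

```python
import numpy as np

N, T = 5, 3  # illustrative sizes (assumed)

# A one-dimensional time series stored as N x T x D with D = 1.
data_3d = np.arange(N * T, dtype=float).reshape(N, T, 1)

# Dropping the last dimension gives an N x T array with the same values.
data_2d = data_3d.reshape(N, T)

same_values = np.array_equal(data_3d[:, :, 0], data_2d)
print(data_3d.shape, data_2d.shape, same_values)  # (5, 3, 1) (5, 3) True
```

Nothing is lost in the reshape, which is why the N by T array can be handed to a model that expects N by D input.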

46
00:03:53,080 --> 00:03:55,720
So how do we use linear regression in this situation?

47
00:03:56,680 --> 00:04:01,390
The answer is we pretend that T is D. Or, to put that a different way,

48
00:04:01,750 --> 00:04:06,010
in this scenario, linear regression treats T as if it were D.

49
00:04:06,760 --> 00:04:08,530
This is another instance of my rule:

50
00:04:08,800 --> 00:04:10,030
all data is the same.

51
00:04:15,120 --> 00:04:16,980
Let's think about how our data will look.

52
00:04:17,760 --> 00:04:19,750
Let's suppose we have a time series of length 10.

53
00:04:20,430 --> 00:04:25,810
And we want to train a model to predict the next value in the time series using the past three time

54
00:04:25,810 --> 00:04:26,210
steps.

55
00:04:27,090 --> 00:04:30,600
In this case, our input matrix will be of size N by three.

56
00:04:31,020 --> 00:04:33,120
And our target vector will be of size N.

57
00:04:35,580 --> 00:04:42,060
Now, what is N? As mentioned previously, we can calculate N as the number of length-three windows that

58
00:04:42,060 --> 00:04:44,340
fit into the time series, which is length 10.

59
00:04:44,910 --> 00:04:46,980
So that's ten minus three plus one, which is eight.

60
00:04:47,820 --> 00:04:48,420
But wait.

61
00:04:49,080 --> 00:04:53,130
Remember that we would like to be able to predict the next value, which is the target.

62
00:04:53,820 --> 00:04:59,100
So while we can fit eight windows into the time series, we can't actually use the last window in our

63
00:04:59,100 --> 00:05:01,890
training set because we don't know the next value.

64
00:05:07,080 --> 00:05:10,580
Suppose our data is just x1, x2, all the way up to x10.

65
00:05:11,400 --> 00:05:17,670
In this case, if our input is x8, x9, and x10, we can't have x11 as a target because

66
00:05:17,670 --> 00:05:19,980
we don't have x11 in our time series.

67
00:05:20,520 --> 00:05:22,860
So in actuality, we only have N equals seven.

68
00:05:23,730 --> 00:05:29,090
So really, it's like asking how many windows of length four fit into the time series of length

69
00:05:29,220 --> 00:05:29,600
ten.

70
00:05:30,150 --> 00:05:32,280
That's ten minus four plus one which is seven.

71
00:05:32,670 --> 00:05:35,460
And this is because the fourth value is the target.
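
The window count can be verified with a short sketch. The helper name `make_windows` and the integer series standing in for x1 to x10 are assumptions for illustration, not something shown in the lecture.

```python
import numpy as np

def make_windows(series, T):
    """Build (inputs, targets) from a 1-D series using T past values per row."""
    X, y = [], []
    # Stop early enough that every window still has a known next value.
    for t in range(len(series) - T):
        X.append(series[t:t + T])   # T consecutive inputs
        y.append(series[t + T])     # the value right after them
    return np.array(X), np.array(y)

series = np.arange(1, 11)  # stands in for x1 .. x10
X, y = make_windows(series, T=3)
print(X.shape, y.shape)  # (7, 3) (7,)
print(X[-1], y[-1])      # [7 8 9] 10
```

The last usable row is x7, x8, x9 with target x10, which matches N = 10 - 4 + 1 = 7.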

72
00:05:40,600 --> 00:05:44,980
So what does a linear regression forecasting model look like? At this point,

73
00:05:45,010 --> 00:05:52,930
it should be pretty clear. It's x-hat at time t equal to w0 plus w1 times x at time t minus

74
00:05:52,930 --> 00:05:59,350
one, plus w2 times x at t minus two, plus w3 times x at t minus three.

75
00:06:00,730 --> 00:06:02,320
While I'm using time indices here,

76
00:06:02,770 --> 00:06:07,660
you should still be able to recognize this as linear regression. As a side note,

77
00:06:07,690 --> 00:06:10,600
this is called an autoregressive, or AR, model

78
00:06:10,750 --> 00:06:14,410
in statistics nomenclature. The word auto means self.

79
00:06:14,770 --> 00:06:19,990
So it's a model that tries to predict the value in a time series using its own values.
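
As a rough sketch of fitting such an autoregressive model, the following uses ordinary least squares from NumPy. The sine-wave series, the variable names, and the choice of `np.linalg.lstsq` are assumptions made for illustration; the lecture does not specify how the weights are found.

```python
import numpy as np

series = np.sin(0.1 * np.arange(200))  # toy data (assumed for illustration)
T = 3

# Each row is [x(t-3), x(t-2), x(t-1)]; the matching target is x(t).
X = np.array([series[t:t + T] for t in range(len(series) - T)])
y = series[T:]

# A column of ones lets the first weight play the role of w0.
X_design = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(X_design, y, rcond=None)

# One-step-ahead predictions on the training windows.
one_step_preds = X_design @ w
max_err = np.max(np.abs(one_step_preds - y))
print(w.shape, max_err)
```

Note that the columns here run from oldest to newest, so `w[1]` multiplies x at t minus three and `w[3]` multiplies x at t minus one. A sampled sine wave obeys an exact linear recurrence, so the fit error should be close to zero on this toy data.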

80
00:06:25,140 --> 00:06:27,840
OK, so using this model, how do we forecast?

81
00:06:28,770 --> 00:06:30,080
Here's a common thing I see a lot.

82
00:06:31,020 --> 00:06:34,830
Let's say our test set consists of the data x11 up to x20.

83
00:06:35,370 --> 00:06:38,760
And we want to calculate the test accuracy on our test set.

84
00:06:39,660 --> 00:06:45,570
So you might say, take the data matrix that we see here, x8, x9, x10, then x9, x10, x11, and

85
00:06:45,570 --> 00:06:46,050
so on,

86
00:06:46,470 --> 00:06:48,570
and plug that into model.predict.

87
00:06:49,260 --> 00:06:53,370
Then we'll get the predictions for x11, x12, x13, and so on.

88
00:06:54,180 --> 00:06:55,940
Unfortunately, this is wrong.

89
00:06:57,800 --> 00:07:02,390
As a side note, please be aware that models in PyTorch do not have a predict function.

90
00:07:02,930 --> 00:07:08,300
We're just using this notation for convenience since it allows us to express how to make the model predictions

91
00:07:08,360 --> 00:07:09,680
using one line of code.

92
00:07:10,310 --> 00:07:12,220
You should recognize this notation from scikit-learn,

93
00:07:12,350 --> 00:07:17,840
so you can just frame this lecture in the context of using a scikit-learn model rather than

94
00:07:17,840 --> 00:07:18,410
PyTorch.

95
00:07:23,530 --> 00:07:24,610
So why is it wrong?

96
00:07:25,600 --> 00:07:30,250
Remember that we would like to use our forecasting model to predict multiple steps ahead.

97
00:07:31,060 --> 00:07:33,910
What we just saw was only predicting one step ahead.

98
00:07:34,630 --> 00:07:35,740
This is not good enough.

99
00:07:36,550 --> 00:07:38,590
Let's say we want to predict three days ahead.

100
00:07:38,890 --> 00:07:43,600
We want to use day one, day two, and day three to predict day four, day five, and day six.

101
00:07:44,280 --> 00:07:49,900
What were we doing before? We used day one, day two, and day three to predict day four.

102
00:07:50,080 --> 00:07:50,710
That's fine.

103
00:07:51,460 --> 00:07:54,610
Then we used day two, day three, and day four to predict day five.

104
00:07:54,670 --> 00:07:59,200
That's not fine because remember, we don't know the value on day four yet.

105
00:07:59,590 --> 00:08:01,090
Today is only day three.

106
00:08:01,960 --> 00:08:06,100
Similarly, we can't use day three, day four, day five to predict day six.

107
00:08:06,970 --> 00:08:08,380
So what do we do instead?

108
00:08:13,520 --> 00:08:19,190
If we want to make predictions multiple steps into the future, we must use our own earlier predictions

109
00:08:19,190 --> 00:08:23,360
as input. For example, in order to predict x5,

110
00:08:23,510 --> 00:08:30,670
we use x2, x3, and x-hat 4. x-hat 4 was calculated from x1, x2, and x3.

111
00:08:31,870 --> 00:08:36,760
To predict x6, we'll use x3, x-hat 4, and x-hat 5.

112
00:08:38,380 --> 00:08:40,270
This is the correct way to forecast.

113
00:08:40,930 --> 00:08:42,970
Now, note that this doesn't mean one-step-ahead

114
00:08:42,970 --> 00:08:44,290
predictions aren't useful.

115
00:08:44,740 --> 00:08:48,400
It just means you have to know when to use and when not to use them.

116
00:08:53,510 --> 00:08:55,730
Because we have to make predictions sequentially,

117
00:08:56,180 --> 00:08:59,810
we can't do it all in one call of model.predict like we did before.

118
00:09:00,740 --> 00:09:07,190
Instead, we need to use a loop and call model.predict only on the most recently generated sequence.

119
00:09:07,850 --> 00:09:14,330
In this pseudocode, I'll attempt to express the main idea. Our first sequence X we initialize to

120
00:09:14,330 --> 00:09:19,180
the last values of the train set from which we want to predict the first value of the test set.

121
00:09:21,360 --> 00:09:24,870
We also initialize an empty list to store our predictions.

122
00:09:25,830 --> 00:09:31,200
Then we enter a loop that goes for the number of steps we want to forecast. Inside the loop,

123
00:09:31,230 --> 00:09:35,640
we call model.predict to get only the next immediate value.

124
00:09:36,450 --> 00:09:39,570
We store this in our predictions list. Then,

125
00:09:39,690 --> 00:09:41,070
and this is the important part,

126
00:09:41,550 --> 00:09:43,770
we discard the oldest value in X

127
00:09:44,130 --> 00:09:47,430
and we concatenate the rest of X with the new prediction.

128
00:09:48,000 --> 00:09:49,380
We assign that to X.

129
00:09:50,900 --> 00:09:57,500
Then when we call model.predict on the next round, we are using an X that contains both true values

130
00:09:57,890 --> 00:10:00,190
and predictions for the forecasted values.
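
The loop described here could be sketched as follows. `LinearARModel` is a hypothetical stand-in with a scikit-learn-style `predict` method, and the toy weights and the `forecast` helper name are assumptions for illustration; the lecture's own pseudocode is not reproduced in the transcript.

```python
import numpy as np

class LinearARModel:
    """Hypothetical stand-in exposing a scikit-learn-style predict method."""
    def __init__(self, w):
        # w[0] is the bias; w[1:] multiply the inputs from oldest to newest.
        self.w = np.asarray(w, dtype=float)

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        return self.w[0] + X @ self.w[1:]

def forecast(model, last_window, n_steps):
    """Multi-step forecast that feeds each prediction back in as input."""
    x = np.asarray(last_window, dtype=float)  # last known values of the train set
    predictions = []
    for _ in range(n_steps):
        next_value = model.predict(x.reshape(1, -1))[0]  # next immediate value only
        predictions.append(next_value)
        # Discard the oldest value and append the new prediction.
        x = np.concatenate([x[1:], [next_value]])
    return np.array(predictions)

# Toy weights (assumed): x(t) = 2 * x(t-1) - x(t-2), which continues a straight line.
model = LinearARModel([0.0, 0.0, -1.0, 2.0])
preds = forecast(model, last_window=[8.0, 9.0, 10.0], n_steps=3)
print(preds)  # [11. 12. 13.]
```

After the first iteration the window is 9, 10, and the predicted 11, so from then on the model is working partly from its own outputs rather than from true test values.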

131
00:10:05,360 --> 00:10:10,040
As a final note in this lecture, I want you to think about how we can apply the rule that

132
00:10:10,160 --> 00:10:14,870
all machine learning interfaces are the same in order to build a more powerful predictor.

133
00:10:16,010 --> 00:10:21,890
One limitation of linear regression is that the prediction can only be a linear function of its inputs.

134
00:10:22,700 --> 00:10:28,360
Luckily, we already know how to build a more powerful model that works on the same kind of data: an

135
00:10:28,520 --> 00:10:28,970
ANN.

136
00:10:29,930 --> 00:10:35,840
So by being an expert in deep learning, you can immediately apply this technique without any extra

137
00:10:35,840 --> 00:10:36,260
effort.

138
00:10:36,800 --> 00:10:38,310
That's a pretty powerful approach.

139
00:10:39,380 --> 00:10:45,710
All of a sudden, you are able to convert your classic statistical time series forecasting model into

140
00:10:45,710 --> 00:10:49,710
a modern, super-powered nonlinear neural network forecaster.

141
00:10:50,510 --> 00:10:52,250
And that's all just with one line of code.

142
00:10:52,700 --> 00:10:53,360
Not bad.
