1
00:00:00,800 --> 00:00:02,290
In the previous videos this week,

2
00:00:02,290 --> 00:00:06,860
you saw all of the different factors that
make up the behavior of a time series.

3
00:00:06,860 --> 00:00:09,080
Now we'll start looking at techniques,

4
00:00:09,080 --> 00:00:14,000
given what we know, that can be used
to then forecast that time series.

5
00:00:14,000 --> 00:00:17,200
Let's start with this time series
containing trend, seasonality,

6
00:00:17,200 --> 00:00:19,050
and noise,
which is realistic enough for now.

7
00:00:20,430 --> 00:00:24,692
We could, for example, take the last
value and assume that the next value will

8
00:00:24,692 --> 00:00:27,742
be the same one, and
this is called naive forecasting.

9
00:00:27,742 --> 00:00:31,559
I've zoomed into a part of the data
set here to show that in action.

10
00:00:31,559 --> 00:00:35,134
We can do that to get a baseline at
the very least, and believe it or

11
00:00:35,134 --> 00:00:37,350
not, that baseline can be pretty good.

12
00:00:37,350 --> 00:00:39,905
But how do you measure performance?

13
00:00:39,905 --> 00:00:42,550
To measure the performance
of our forecasting model,

14
00:00:42,550 --> 00:00:46,667
we typically want to split the time
series into a training period,

15
00:00:46,667 --> 00:00:49,103
a validation period and a test period.

16
00:00:49,103 --> 00:00:51,644
This is called fixed partitioning.

17
00:00:51,644 --> 00:00:54,379
If the time series has some seasonality,

18
00:00:54,379 --> 00:00:59,783
you generally want to ensure that each
period contains a whole number of seasons.

19
00:00:59,783 --> 00:01:03,328
For example, one year, or
two years, or three years,

20
00:01:03,328 --> 00:01:06,202
if the time series has
a yearly seasonality.

21
00:01:06,202 --> 00:01:09,665
You generally don't want one year and
a half, or

22
00:01:09,665 --> 00:01:13,650
else some months will be
represented more than others.

23
00:01:13,650 --> 00:01:17,922
While this might appear a little different
from the training, validation, and test split

24
00:01:17,922 --> 00:01:21,441
that you might be familiar with
from non-time-series datasets,

25
00:01:21,441 --> 00:01:26,019
where you just picked random values
out of the corpus to make all three,

26
00:01:26,019 --> 00:01:29,617
you should see that the impact
is effectively the same.

27
00:01:29,617 --> 00:01:32,786
Next, you'll train your model
on the training period, and

28
00:01:32,786 --> 00:01:35,970
you'll evaluate it on
the validation period.

29
00:01:35,970 --> 00:01:39,740
Here's where you can experiment to find
the right architecture for training,

30
00:01:39,740 --> 00:01:42,295
and work on it and your hyperparameters

31
00:01:42,295 --> 00:01:47,044
until you get the desired performance,
measured using the validation set.

32
00:01:47,044 --> 00:01:51,360
Often, once you've done that, you can
retrain using both the training and

33
00:01:51,360 --> 00:01:52,447
validation data,

34
00:01:52,447 --> 00:01:57,240
and then test on the test period to see
if your model will perform just as well.

35
00:01:57,240 --> 00:02:02,093
And if it does, then you could take
the unusual step of retraining again,

36
00:02:02,093 --> 00:02:03,827
using also the test data.

37
00:02:03,827 --> 00:02:05,570
But why would you do that?

38
00:02:05,570 --> 00:02:09,060
Well, it's because the test data
is the closest data you have

39
00:02:09,060 --> 00:02:10,890
to the current point in time.

40
00:02:10,890 --> 00:02:15,654
And as such it's often the strongest
signal in determining future values.

41
00:02:15,654 --> 00:02:21,628
If your model is not trained using that
data, too, then it may not be optimal.

42
00:02:21,628 --> 00:02:25,724
Due to this, it's actually quite common
to forgo a test set altogether

43
00:02:25,724 --> 00:02:29,873
and just train using a training
period and a validation period,

44
00:02:29,873 --> 00:02:31,644
so that the test set is in the future.

45
00:02:31,644 --> 00:02:35,636
We'll follow some of that
methodology in this course.

46
00:02:35,636 --> 00:02:39,306
Fixed partitioning like this is
very simple and very intuitive, but

47
00:02:39,306 --> 00:02:41,470
there's also another way.

48
00:02:41,470 --> 00:02:45,409
We start with a short training period,
and we gradually increase it,

49
00:02:45,409 --> 00:02:48,102
say by one day at a time,
or by one week at a time.

50
00:02:48,102 --> 00:02:51,401
At each iteration,
we train the model on a training period.

51
00:02:51,401 --> 00:02:55,111
And we use it to forecast the following
day, or the following week,

52
00:02:55,111 --> 00:02:57,250
in the validation period.

53
00:02:57,250 --> 00:03:00,222
And this is called
roll-forward partitioning.

54
00:03:00,222 --> 00:03:04,710
You could see it as doing fixed
partitioning a number of times, and

55
00:03:04,710 --> 00:03:07,848
then continually refining
the model as such.

56
00:03:07,848 --> 00:03:11,502
For the purposes of learning time
series prediction in this course,

57
00:03:11,502 --> 00:03:14,599
we'll learn the overall code for
doing series prediction,

58
00:03:14,599 --> 00:03:17,949
which you could then apply yourself
to a roll-forward scenario, but

59
00:03:17,949 --> 00:03:20,232
our focus is going to be
on fixed partitioning.