1
00:00:11,100 --> 00:00:16,080
So in this lecture, we are going to look at how to apply the code we just learned to the champagne

2
00:00:16,080 --> 00:00:17,310
sales time series.

3
00:00:18,240 --> 00:00:23,070
Now, one thing I want you to notice about this lecture is that it does not require any new code.

4
00:00:23,580 --> 00:00:25,360
Only the data set has changed.

5
00:00:25,740 --> 00:00:29,890
In fact, you should already have this data set from the previous exercises.

6
00:00:30,420 --> 00:00:35,580
So if you'd like to stop this video and try to do this yourself, please take this opportunity to do

7
00:00:35,580 --> 00:00:36,290
so now.

8
00:00:36,690 --> 00:00:38,080
Otherwise, we'll continue.

9
00:00:39,270 --> 00:00:44,300
Also note that I'm not exaggerating when I say that this did not require any new code.

10
00:00:44,670 --> 00:00:49,950
I simply took the previous script, changed the URL of the data set and changed the column names

11
00:00:49,950 --> 00:00:51,250
using find and replace.

12
00:00:51,780 --> 00:00:54,890
So if you've ever wondered what I mean by "all data is the same",

13
00:00:55,170 --> 00:00:56,400
this is a great example.

14
00:00:57,900 --> 00:01:00,360
OK, so let's go through everything super quick.

15
00:01:00,570 --> 00:01:01,800
We've got our imports.

16
00:01:03,960 --> 00:01:05,400
Then we download the data.

17
00:01:06,420 --> 00:01:07,800
Then we load in the data.

18
00:01:10,740 --> 00:01:18,300
Then we take the log transform. Note that although the ADF test says the differenced data is stationary,

19
00:01:18,600 --> 00:01:22,980
while the non-differenced data is non-stationary, I did not bother to do any differencing.

20
00:01:23,640 --> 00:01:27,000
So as an exercise, you may want to try that variation.
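
A rough sketch of the variation suggested here, assuming the series is a plain list of positive sales values. The function names and the toy numbers are illustrative and are not taken from the lecture's script, and the ADF test itself is not run here:

```python
import math

def log_transform(series):
    """Take the natural log of every (strictly positive) observation."""
    return [math.log(x) for x in series]

def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# Made-up values growing by 10% per step (not the champagne data).
toy_sales = [100.0, 110.0, 121.0, 133.1]
log_diff = difference(log_transform(toy_sales))
# Each entry of log_diff is approximately log(1.1): the differenced
# log series of steady percentage growth is constant.
```

Note that the differenced series is shorter than the original by `lag` observations, so forecasts made on it would need to be cumulatively summed back before being compared with the original values.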

21
00:01:28,560 --> 00:01:31,500
OK, so then we split the data into train and test.

22
00:01:34,290 --> 00:01:36,720
Then we create our data set for supervised learning.
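
One common way to build such a data set is a sliding window of lagged values, sketched below. The helper name `make_supervised` and the parameter `n_lags` are hypothetical; the lecture's script may build it differently:

```python
def make_supervised(series, n_lags):
    """Turn a 1-D series into (X, y) pairs for supervised learning.

    Each row of X holds n_lags consecutive values; the matching
    entry of y is the value that immediately follows them.
    """
    X, y = [], []
    for t in range(len(series) - n_lags):
        X.append(list(series[t:t + n_lags]))
        y.append(series[t + n_lags])
    return X, y

X, y = make_supervised([1, 2, 3, 4, 5, 6], n_lags=3)
# X == [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
# y == [4, 5, 6]
```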

23
00:01:40,560 --> 00:01:42,240
Then we do another train-test split.

24
00:01:45,220 --> 00:01:46,870
Then we fit a linear regression.

25
00:01:51,520 --> 00:01:54,340
OK, so our model seems to have a decent score.

26
00:01:56,570 --> 00:02:01,220
As before, we create our data frame indices and then we plot the one-step forecast.

27
00:02:10,540 --> 00:02:14,920
OK, so that looks pretty good, but remember, this is only the one-step forecast.

28
00:02:19,170 --> 00:02:21,990
The next step is to create our multi-step forecast.

29
00:02:30,500 --> 00:02:36,230
OK, so as you can see, this also performed pretty well. In fact, maybe even a bit better than the

30
00:02:36,230 --> 00:02:37,430
one-step forecast.
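
The incremental multi-step forecast can be sketched as a loop that feeds each prediction back in as the newest input. Here `predict_one` is a hypothetical callable standing in for a fitted one-step model, and `toy_model` is a made-up rule used only so the example runs on its own:

```python
def incremental_forecast(predict_one, last_window, n_steps):
    """Forecast n_steps ahead by repeatedly applying a one-step model.

    predict_one: callable mapping a list of lagged values to one prediction.
    last_window: the most recent observed values, oldest first.
    """
    window = list(last_window)
    forecasts = []
    for _ in range(n_steps):
        y_hat = predict_one(window)
        forecasts.append(y_hat)
        # Drop the oldest value and append the prediction as the newest lag.
        window = window[1:] + [y_hat]
    return forecasts

def toy_model(window):
    """Stand-in model: last value plus the most recent change."""
    return window[-1] + (window[-1] - window[-2])

preds = incremental_forecast(toy_model, [1.0, 2.0, 3.0], n_steps=3)
# preds == [4.0, 5.0, 6.0]
```

With a fitted scikit-learn regressor called `model`, `predict_one` could be something like `lambda w: model.predict([w])[0]`. Because each step consumes earlier predictions, errors can compound over the horizon.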

31
00:02:40,310 --> 00:02:46,530
OK, so the next step is to try the multi-output forecast. Remember that this is still the linear model.

32
00:02:47,180 --> 00:02:51,770
Again, this requires us to create a new data set with multiple targets per row.
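
A minimal sketch of a data set with multiple targets per row, assuming the same sliding-window idea as before. The names `make_multi_output`, `n_lags`, and `n_outputs` are hypothetical:

```python
def make_multi_output(series, n_lags, n_outputs):
    """Build (X, Y) where each row of Y holds the next n_outputs values.

    Each row of X holds n_lags consecutive values; the matching row of Y
    holds the n_outputs values that follow them, so a single fit can
    predict the whole horizon at once.
    """
    X, Y = [], []
    for t in range(len(series) - n_lags - n_outputs + 1):
        X.append(list(series[t:t + n_lags]))
        Y.append(list(series[t + n_lags:t + n_lags + n_outputs]))
    return X, Y

X, Y = make_multi_output([1, 2, 3, 4, 5, 6], n_lags=3, n_outputs=2)
# X == [[1, 2, 3], [2, 3, 4]]
# Y == [[4, 5], [5, 6]]
```

The longer the horizon, the fewer rows are available, which matters for a short series.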

33
00:02:55,860 --> 00:02:59,010
Again, we're going to split the data into train and test.

34
00:03:02,800 --> 00:03:04,900
And again, we're going to fit a linear model.

35
00:03:10,100 --> 00:03:12,110
So the score appears to be pretty good.

36
00:03:14,540 --> 00:03:17,540
The next step is to plot our multi-output forecast.

37
00:03:24,010 --> 00:03:25,500
Again, it looks pretty accurate.

38
00:03:28,450 --> 00:03:30,210
The next step is to check the MAPE.

39
00:03:33,460 --> 00:03:39,550
So interestingly, the MAPE for the incremental forecast is a bit better than the multi-output forecast.
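
For reference, MAPE (mean absolute percentage error) is the average of |actual - forecast| / |actual| over the test period. A small sketch with made-up numbers, not the lecture's results:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (0.1 == 10%).

    Assumes no entry of y_true is zero.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

score = mape([100.0, 200.0], [110.0, 180.0])
# Both forecasts are off by 10%, so score is approximately 0.1.
```

If the model was fit on log-transformed data, the forecasts would typically be converted back with `exp` before computing MAPE, so that the error is measured on the original sales scale.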

40
00:03:42,460 --> 00:03:47,790
OK, so the next step is to test other models. So we're going to create a helper function to do all

41
00:03:47,790 --> 00:03:48,750
the above work.

42
00:03:54,010 --> 00:03:56,740
So let's check the results for the support vector machine.

43
00:04:01,610 --> 00:04:04,400
OK, so the support vector machine does pretty well.

44
00:04:07,960 --> 00:04:10,150
The next step is to check the random forest.

45
00:04:14,910 --> 00:04:16,170
Again, it does pretty well.

46
00:04:20,390 --> 00:04:24,170
The next step is to create a helper function for the multi-output forecast.

47
00:04:29,240 --> 00:04:34,700
So this time we're just going to skip the support vector machine. If you want to implement the

48
00:04:34,700 --> 00:04:37,430
multi-output SVR wrapper, please feel free to try that.

49
00:04:43,150 --> 00:04:47,200
OK, so when we check the result for the random forest, we see that it does OK.

50
00:04:48,310 --> 00:04:54,160
So what is interesting about this script is that it seems linear regression with an incremental multi-step

51
00:04:54,160 --> 00:04:55,810
forecast was the best.

52
00:04:59,350 --> 00:05:04,510
OK, so as an exercise, there are definitely some things you might want to experiment with that will

53
00:05:04,510 --> 00:05:08,030
not require you to write any new code that you haven't seen before.

54
00:05:08,590 --> 00:05:13,330
For example, comparing these results with previous models and seeing whether or not differencing will

55
00:05:13,330 --> 00:05:13,860
help.

56
00:05:14,710 --> 00:05:18,880
So please try any variations you've thought of and share the results in the Q&A.