1
00:00:11,190 --> 00:00:15,610
So in this lecture, we will be looking at how to do cross-validation in Prophet.

2
00:00:16,290 --> 00:00:19,350
We'll start by importing a function called cross_validation.

3
00:00:24,150 --> 00:00:26,860
The next step is to call the cross-validation function.

4
00:00:27,600 --> 00:00:32,360
This will take in a trained model along with values for initial, period, and horizon.

5
00:00:33,120 --> 00:00:37,810
As you recall, initial represents the number of initial training points,

6
00:00:37,810 --> 00:00:42,690
period represents the step size, and horizon represents the number of steps to forecast.

7
00:00:43,650 --> 00:00:48,020
Notice the syntax for these arguments, which are not just numbers but strings.

8
00:00:48,450 --> 00:00:54,000
I'd recommend looking at the official documentation to check for other examples on how to use this.
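
As a minimal sketch of that string syntax, the call might look like the following. It assumes the `prophet` package (older installs import from `fbprophet` instead) and a model that has already been fitted. The wrapper name `run_cross_validation` and the `initial` and `period` values are placeholders; only the 60-day horizon is stated in this lecture.

```python
def run_cross_validation(model, initial="730 days", period="30 days", horizon="60 days"):
    """Run Prophet's cross-validation on an already fitted model.

    The initial and period defaults are placeholders, not the lecture's values.
    """
    # Imported here so the sketch can be defined before prophet is installed.
    from prophet.diagnostics import cross_validation

    # Each argument is a string that pandas can parse as a Timedelta.
    return cross_validation(model, initial=initial, period=period, horizon=horizon)
```

Calling `run_cross_validation(model)` on a fitted model would return the cross-validation data frame discussed next.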

9
00:01:01,250 --> 00:01:06,570
OK, so the cross_validation function gives us back a data frame, which we've called df_cv.

10
00:01:07,490 --> 00:01:09,740
So the next step is to simply print it out.

11
00:01:14,050 --> 00:01:20,680
OK, so as you can see, this is the format I showed you in the earlier lectures. We have ds, yhat,

12
00:01:20,680 --> 00:01:27,670
yhat_lower, yhat_upper, y itself, and cutoff. Recall the format for these dates: cutoff

13
00:01:27,670 --> 00:01:30,070
tells us where we started the forecast from.

14
00:01:30,610 --> 00:01:34,570
ds tells us the timestamp for each forecasted value.
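
To make the relationship between cutoff and ds concrete, here is a simplified sketch of how rolling-origin cutoffs can be laid out from initial, period, and horizon. It is not Prophet's own code, the helper name `sketch_cutoffs` is hypothetical, and the dates and settings are made up for illustration.

```python
from datetime import date, timedelta


def sketch_cutoffs(first_day, last_day, initial, period, horizon):
    """Simplified rolling-origin cutoffs: step back from the end of the data."""
    cutoffs = []
    cutoff = last_day - horizon  # the latest cutoff leaves one full horizon to score
    # Keep stepping back while at least `initial` of training data remains.
    while cutoff >= first_day + initial:
        cutoffs.append(cutoff)
        cutoff -= period
    return sorted(cutoffs)


# Made-up date range and settings, purely for illustration.
cutoffs = sketch_cutoffs(
    first_day=date(2013, 1, 1),
    last_day=date(2013, 12, 31),
    initial=timedelta(days=180),
    period=timedelta(days=30),
    horizon=timedelta(days=60),
)
for cutoff in cutoffs:
    # Every forecasted ds for this fold lies in (cutoff, cutoff + horizon].
    print(cutoff, "->", cutoff + timedelta(days=60))
```

Each printed line is one fold: the model is trained on data up to the cutoff, and the ds values being forecast fall after it.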

15
00:01:38,230 --> 00:01:42,080
Now, luckily, we don't have to manipulate the previous data frame ourselves.

16
00:01:42,550 --> 00:01:46,330
Instead, we can import this function called performance_metrics.

17
00:01:49,850 --> 00:01:53,010
The next step is to call this function and see what we get back.

18
00:01:53,690 --> 00:01:58,040
Notice that this gives us back another data frame, which I've simply called PM.
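
A minimal sketch of that call, assuming the `prophet` package and the cross-validation data frame from the previous step. The wrapper name `summarize_cv` is hypothetical; `rolling_window=0.1` is the library default and is spelled out only to make the 10 percent window visible.

```python
def summarize_cv(df_cv, rolling_window=0.1):
    """Turn a Prophet cross-validation data frame into per-horizon metrics."""
    # Imported here so the sketch can be defined before prophet is installed.
    from prophet.diagnostics import performance_metrics

    # Returns one row per horizon with the rolling-window error metrics.
    return performance_metrics(df_cv, rolling_window=rolling_window)
```

With this wrapper, `pm = summarize_cv(df_cv)` would give the metrics data frame.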

19
00:02:02,720 --> 00:02:08,510
OK, so notice what's in this data frame. Hopefully you took notes during the theory lecture that you

20
00:02:08,510 --> 00:02:09,370
can refer back to.

21
00:02:09,380 --> 00:02:15,800
Now, the first thing to recognize is that we get a bunch of metrics, including MSE, RMSE, and

22
00:02:15,800 --> 00:02:16,570
so forth.

23
00:02:17,510 --> 00:02:22,520
Note that there's a message here stating that MAPE has been skipped since the target values were close

24
00:02:22,520 --> 00:02:23,180
to zero.

25
00:02:23,900 --> 00:02:26,510
As you recall, this results in an infinite MAPE.
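
A small sketch shows why a zero target breaks MAPE while SMAPE, the metric plotted later in this lecture, stays finite. These are plain-Python versions of the usual formulas, not Prophet's implementation, and the sales and forecast numbers are made up.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error; blows up if any true value is zero."""
    total = 0.0
    for y, yhat in zip(y_true, y_pred):
        if y == 0:
            # Division by zero; reported as infinite in this sketch.
            return float("inf")
        total += abs((y - yhat) / y)
    return total / len(y_true)


def smape(y_true, y_pred):
    """Symmetric MAPE: |y - yhat| divided by the mean of |y| and |yhat|."""
    total = 0.0
    for y, yhat in zip(y_true, y_pred):
        denom = (abs(y) + abs(yhat)) / 2
        # If both values are zero, the error for that point is taken as zero.
        total += abs(y - yhat) / denom if denom else 0.0
    return total / len(y_true)


sales = [100.0, 0.0, 120.0]      # made-up values; the zero stands for a closed day
forecast = [110.0, 15.0, 114.0]  # made-up forecasts
print(mape(sales, forecast))     # inf
print(smape(sales, forecast))    # finite, at most 2 per point
```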

26
00:02:27,620 --> 00:02:31,250
The next thing to notice is that the horizon starts at six days.

27
00:02:31,850 --> 00:02:38,090
Again, recall that all of these metrics are computed from moving-average windows. Since our horizon

28
00:02:38,090 --> 00:02:41,300
was 60 days and the default window size is 10 percent,

29
00:02:41,660 --> 00:02:44,600
the first value we see is at six days.
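
The six-day figure can be checked with a little arithmetic. The sketch below is a simplified model of the rolling window, assuming daily data and the same 60 forecast days in every fold; the function name and the number of cutoffs are hypothetical.

```python
def first_reported_horizon(horizon_days=60, n_cutoffs=5, rolling_window=0.1):
    """Horizon (in days) at which the first full rolling window ends."""
    # One row per (cutoff, forecast day), sorted by how far ahead the forecast is.
    horizons = sorted(h for _ in range(n_cutoffs) for h in range(1, horizon_days + 1))
    # The window holds this fraction of all rows.
    window = max(int(rolling_window * len(horizons)), 1)
    # The first full window ends at row `window`, so its horizon is reported first.
    return horizons[window - 1]


print(first_reported_horizon())  # 6
```

Under these assumptions the answer does not depend on the number of cutoffs, because 10 percent of the rows always covers the first 6 of the 60 forecast days.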

30
00:02:51,230 --> 00:02:57,800
OK, so luckily, Prophet also has a function to plot the cross-validation data frame, called

31
00:02:57,800 --> 00:02:58,510
plot_cross_validation_metric.

32
00:02:59,030 --> 00:03:03,420
So let's import this function and call it using the metric SMAPE.
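
A minimal sketch of the plotting call, assuming the `prophet` package and a version that supports the `smape` metric. The wrapper name `plot_smape` is hypothetical.

```python
def plot_smape(df_cv):
    """Plot rolling-window SMAPE against the forecast horizon."""
    # Imported here so the sketch can be defined before prophet is installed.
    from prophet.plot import plot_cross_validation_metric

    # Takes the cross-validation data frame, not the performance-metrics one.
    return plot_cross_validation_metric(df_cv, metric="smape")
```

The function returns a matplotlib figure, so in a notebook `plot_smape(df_cv)` is enough to display it.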

33
00:03:09,200 --> 00:03:14,880
OK, so we get back a plot showing the rolling-window SMAPE over the horizon. Again,

34
00:03:14,990 --> 00:03:17,090
notice how it starts at six days.

35
00:03:17,880 --> 00:03:22,960
Also notice that the error stays relatively constant over the entire 60-day horizon.

36
00:03:27,380 --> 00:03:32,210
The next step is to call the cross-validation function on our second model, where we ignore the days

37
00:03:32,210 --> 00:03:33,190
on which the store was not open.

38
00:03:41,600 --> 00:03:47,450
The next step is to call performance_metrics on df_cv2. Note that we won't bother printing it out this

39
00:03:47,450 --> 00:03:48,040
time.

40
00:03:51,780 --> 00:03:55,860
The next step is to call plot_cross_validation_metric on df_cv2.

41
00:04:01,280 --> 00:04:03,640
Interestingly, it has a cyclical pattern.

42
00:04:07,170 --> 00:04:12,390
The next step is to do the same thing for model three, which is where we added US holidays.

43
00:04:21,800 --> 00:04:26,360
The next step is to call plot_cross_validation_metric on df_cv3.

44
00:04:32,240 --> 00:04:34,390
Again, we see this cyclical pattern.

45
00:04:37,150 --> 00:04:42,790
The next step is to perform cross-validation on model four, which is the model where we added external

46
00:04:42,790 --> 00:04:43,690
regressors.

47
00:04:51,400 --> 00:04:55,600
The next step is to call plot_cross_validation_metric on df_cv4.

48
00:05:01,320 --> 00:05:07,680
So notice this has a similar pattern to our first model, which included all the zero days. Thus, it

49
00:05:07,680 --> 00:05:14,130
seems that what has the most impact is simply removing any days with zero sales, or in other words,

50
00:05:14,130 --> 00:05:15,510
days when the store is closed.

51
00:05:18,800 --> 00:05:23,810
The final thing we're going to do in this lecture is plot the mean SMAPE for each of the performance

52
00:05:23,810 --> 00:05:25,940
metric data frames we got back.
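
One way to sketch this comparison: take the mean of the `smape` column from each metrics data frame and draw a bar per model. The helper names are hypothetical, matplotlib is assumed for the plot, and the numbers below are made-up stand-ins rather than results from the lecture.

```python
from statistics import fmean


def mean_smape_by_model(metric_frames):
    """Map each model name to the mean of its 'smape' column."""
    # Works with pandas data frames or anything indexable by column name.
    return {name: fmean(frame["smape"]) for name, frame in metric_frames.items()}


def plot_mean_smape(means):
    """Draw one bar per model; assumes matplotlib is installed."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar(list(means), list(means.values()))
    ax.set_ylabel("mean SMAPE")
    return fig


# Made-up numbers standing in for the performance_metrics output of two models.
example = {
    "model 1": {"smape": [0.5, 0.7]},
    "model 2": {"smape": [0.2, 0.4]},
}
means = mean_smape_by_model(example)
print(means)
```

With real data, `metric_frames` would map each model's name to the data frame returned by performance_metrics for it.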

53
00:05:29,840 --> 00:05:36,020
So here we can see that for our first model, we have the worst SMAPE. The second-worst SMAPE is the

54
00:05:36,020 --> 00:05:40,810
one where we added all the external regressors but kept the days where the store was closed.

55
00:05:41,270 --> 00:05:45,770
So it seems that adding all those external regressors only had a tiny effect.

56
00:05:47,780 --> 00:05:52,580
The best SMAPE comes from the second model, where we simply removed any dates where the store was

57
00:05:52,580 --> 00:05:53,390
closed.

58
00:05:54,140 --> 00:05:59,030
Notice that the third model, which is the one where we added US holidays, is slightly worse.

59
00:05:59,480 --> 00:06:02,450
So it appears that adding US holidays did not help.