1
00:00:11,030 --> 00:00:16,490
OK, so in this lecture, we will continue looking at our Colab notebook and building different Prophet

2
00:00:16,490 --> 00:00:18,270
models for our time series.

3
00:00:18,890 --> 00:00:24,020
This lecture will focus on how to incorporate holidays and exogenous regressors, which we have not

4
00:00:24,020 --> 00:00:24,670
yet done.

5
00:00:25,550 --> 00:00:29,630
So we'll start by creating a new Prophet model, which we will call m3.

6
00:00:34,560 --> 00:00:40,290
The next step is to simply call the function add_country_holidays. Now, please note that this is for

7
00:00:40,290 --> 00:00:42,120
demonstration purposes only.

8
00:00:42,660 --> 00:00:45,900
As you recall, our data set is for a drugstore in Europe.

9
00:00:46,350 --> 00:00:52,200
Unfortunately, there is no data about which particular country any store is from, as far as I've looked,

10
00:00:52,600 --> 00:00:55,490
although to be fair, I haven't spent much time doing so.

11
00:00:56,070 --> 00:01:01,890
Thus, in this lecture, we are simply going to pass in the country name "US". In practice,

12
00:01:02,010 --> 00:01:07,200
you will, of course, have a better idea of what the relevant country is, either because you live

13
00:01:07,200 --> 00:01:09,210
there or your client will tell you.

14
00:01:13,950 --> 00:01:19,230
OK, so after this, the steps pretty much remain the same. The next step is to call fit.

15
00:01:19,780 --> 00:01:24,270
We're going to use the second data frame, which does not include the days the store was closed.

16
00:01:29,220 --> 00:01:33,360
The next step is to call make_future_dataframe for a one-year forecast.

17
00:01:37,710 --> 00:01:39,360
The next step is to call predict.

18
00:01:44,080 --> 00:01:47,280
The next step is to plot the forecast and check our results.

19
00:01:52,760 --> 00:01:57,410
Now, it's not clear how well we are actually doing, since this looks pretty similar to our previous

20
00:01:57,410 --> 00:01:57,780
plot.

21
00:01:58,490 --> 00:02:01,100
We'll look at numerical metrics later in this lecture.

22
00:02:05,000 --> 00:02:07,190
The next step is to call plot_components.

23
00:02:12,520 --> 00:02:17,740
So as you can see, there is now one extra component which shows the influence of each holiday.

24
00:02:25,600 --> 00:02:30,730
OK, so in the next portion of code, we're going to look at how to add exogenous regressors.

25
00:02:31,510 --> 00:02:36,490
Now, for the previous example, we did things the lazy way, which was to just call a function that

26
00:02:36,490 --> 00:02:39,930
automatically adds the holidays for a particular country.

27
00:02:40,540 --> 00:02:42,750
If that works for you, then you should use that.

28
00:02:42,760 --> 00:02:48,270
But if not, then this is an example of how you can do a similar thing, but in a more customized way.

29
00:02:49,240 --> 00:02:53,520
So as you recall, the original data set actually includes holidays already.

30
00:02:54,280 --> 00:02:59,740
So the first thing we're going to do here is check the unique values of the column called StateHoliday.

31
00:03:03,680 --> 00:03:10,520
As you can see, there are four different values: 0, a, b and c. Now, you'll have to check Kaggle

32
00:03:10,520 --> 00:03:12,140
for this, but 0 means no

33
00:03:12,140 --> 00:03:14,270
holiday, a means public holiday,

34
00:03:14,510 --> 00:03:17,030
b means Easter and c means Christmas.

35
00:03:19,870 --> 00:03:22,030
The next step is to check SchoolHoliday.

36
00:03:25,560 --> 00:03:28,950
So in this case, we only have two values, zero or one.

37
00:03:32,330 --> 00:03:35,990
The next step is to add these new columns to our Prophet data frame.

38
00:03:36,620 --> 00:03:41,260
So in the basic case, the Prophet data frame should only contain ds and y.

39
00:03:41,720 --> 00:03:45,540
But when you add exogenous regressors, then you can add new columns.

40
00:03:46,190 --> 00:03:51,040
So we'll start by adding open and promo, which are just binary, either zero or one.

41
00:03:52,010 --> 00:03:54,200
The next step is to add the state holiday.

42
00:03:55,070 --> 00:03:57,530
Now, we previously saw that these were letters.

43
00:03:57,680 --> 00:04:03,110
Of course, statistical models don't know what to do with letters since they are mathematical formulas

44
00:04:03,110 --> 00:04:04,730
that only work on numbers.

45
00:04:05,300 --> 00:04:11,750
Thus, what we will do is dummy encode these values using pd.get_dummies. Note that we use

46
00:04:11,750 --> 00:04:16,340
drop_first=True in order to use dummy encoding instead of one-hot encoding.

47
00:04:17,030 --> 00:04:21,830
One-hot encoding will give us four values, but dummy encoding will give us three values.

48
00:04:22,190 --> 00:04:25,190
The fourth value is redundant, so we don't need to include it.

49
00:04:27,260 --> 00:04:33,530
So this will return three columns, which we will assign to be sh1, sh2 and sh3.
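
To make the difference between one-hot and dummy encoding concrete, here is a small sketch on a toy column. The values mirror the four categories mentioned above, but the data itself is made up, and the column names sh1, sh2 and sh3 are my reading of the names used in the notebook.

```python
import pandas as pd

# Toy stand-in for the StateHoliday column: '0' = none, 'a', 'b', 'c' = holidays.
raw = pd.DataFrame({"StateHoliday": ["0", "a", "b", "c", "0"]})
values = raw["StateHoliday"].astype(str)

# One-hot encoding: one column per category, so four columns.
one_hot = pd.get_dummies(values)

# Dummy encoding: drop the first category ('0'), leaving three columns.
dummy = pd.get_dummies(values, drop_first=True).astype(int)
dummy.columns = ["sh1", "sh2", "sh3"]

print(one_hot.shape[1], dummy.shape[1])
print(dummy)
```

A row with all three dummy columns equal to zero means "no holiday", which is why the fourth column carries no extra information.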

50
00:04:34,370 --> 00:04:39,470
The final step is to add the school holiday, which is simpler because it's just zero or one.

51
00:04:45,190 --> 00:04:49,120
The next step is to call the head function to see what our new data frame looks like.

52
00:04:54,420 --> 00:04:59,790
OK, so as you can see, we have some new columns which correspond to the previous code we just ran.

53
00:05:04,420 --> 00:05:10,210
The next step is to create our next model, which we will call m4. This time, what we need to

54
00:05:10,210 --> 00:05:13,900
do before calling fit is to call the function add_regressor.

55
00:05:14,560 --> 00:05:19,550
Note that this takes in one required argument, which is the name of the column for the regressor.

56
00:05:20,200 --> 00:05:25,960
The second argument is optional, and this specifies how the regressor should be incorporated into the

57
00:05:25,960 --> 00:05:26,400
model.

58
00:05:27,070 --> 00:05:32,180
The default is to use the seasonality mode, but if you want to choose this yourself, you can choose

59
00:05:32,200 --> 00:05:33,560
additive or multiplicative.

60
00:05:34,900 --> 00:05:37,780
Now you'll see that I've chosen multiplicative for open.

61
00:05:38,650 --> 00:05:42,880
My hope for this is that the model will figure out that when the store is closed,

62
00:05:42,880 --> 00:05:45,440
it should multiply by zero to get zero.

63
00:05:45,970 --> 00:05:49,870
Of course, there is no guarantee that this is how the model actually works.
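
To see why there is no guarantee, it helps to write out how a multiplicative term enters the prediction. As far as I understand Prophet's prediction step, and using my own notation, with g(t) the trend, a(t) the sum of all additive terms, and x_open(t) the 0/1 open indicator:

```latex
\hat{y}(t) = g(t)\,\bigl(1 + \beta_{\text{open}}\, x_{\text{open}}(t) + \dots\bigr) + a(t)
```

When the store is closed, x_open(t) = 0, so the open term drops out and the factor is 1 plus the other multiplicative terms, not zero. The regressor scales the trend up on open days relative to closed days; it does not multiply the whole forecast by the indicator, and the additive terms a(t) remain either way.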

64
00:05:50,920 --> 00:05:54,430
OK, so once we've added all our regressors, we can call fit.

65
00:06:00,250 --> 00:06:03,880
The next step is to call make_future_dataframe, as we usually do.

66
00:06:08,410 --> 00:06:14,190
The next step is to create a train index and a test index, which will be used to index the future data frame,

67
00:06:14,200 --> 00:06:19,260
since both the in-sample data and out-of-sample data are included in the same object.

68
00:06:19,990 --> 00:06:25,690
What we're going to do here is check whether or not the ds value is in the index for the input data

69
00:06:25,690 --> 00:06:26,120
frame.

70
00:06:26,740 --> 00:06:29,950
If that is the case, then of course it belongs to the train set.

71
00:06:30,550 --> 00:06:32,860
The test index is simply the opposite.
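
Here is a small sketch of that membership check on toy data. It assumes the input data frame is indexed by date; the names df_in, future, train_idx and test_idx are placeholders rather than the notebook's own.

```python
import pandas as pd

# Toy input data: five observed days, indexed by date.
df_in = pd.DataFrame(
    {"y": [10, 12, 11, 13, 9]},
    index=pd.date_range("2015-07-01", periods=5, freq="D"),
)

# Toy future frame: the five observed days plus three days to forecast.
future = pd.DataFrame({"ds": pd.date_range("2015-07-01", periods=8, freq="D")})

# True where the date was part of the input data (in-sample).
train_idx = future["ds"].isin(df_in.index)
# The test index is simply the opposite (out-of-sample).
test_idx = ~train_idx

print(train_idx.sum(), test_idx.sum())
```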

72
00:06:37,140 --> 00:06:42,180
The next step is to add the regressors to the future data frame, which is required in order to make

73
00:06:42,180 --> 00:06:43,060
predictions.

74
00:06:43,770 --> 00:06:46,340
Recall that this is the same as it is with ARIMA.

75
00:06:46,710 --> 00:06:52,170
If you want to use exogenous regressors, then you need to specify the input values to be used for

76
00:06:52,170 --> 00:06:52,900
each prediction.

77
00:06:53,880 --> 00:06:58,650
So in this first block, we're going to do this for the train set, which is easy because it's just a

78
00:06:58,650 --> 00:07:00,180
copy of what we had before.

79
00:07:00,990 --> 00:07:06,210
This will be more difficult for the test set because we don't actually know when the store will be closed

80
00:07:06,360 --> 00:07:09,120
or when there will be a holiday in this European country.

81
00:07:10,480 --> 00:07:13,800
So we'll start by creating a list of exogenous regressors.

82
00:07:14,580 --> 00:07:16,350
The next step is to loop through each one.

83
00:07:17,280 --> 00:07:20,940
Inside the loop, we'll index future4 on the left side by the train

84
00:07:20,940 --> 00:07:25,430
index, specifying the current column r. On the right side,

85
00:07:25,440 --> 00:07:30,450
we simply grab the column r from our input data frame and convert that into a list.

86
00:07:36,500 --> 00:07:42,440
So the difficult part is for the test set. In this case, we're going to take a very lazy approach and

87
00:07:42,470 --> 00:07:48,880
just assume that the store will not be open on Sundays, which we've seen. For all the other regressors,

88
00:07:48,890 --> 00:07:54,070
we are simply going to copy them from the train set for the past 365 days.

89
00:07:54,650 --> 00:07:59,660
Of course, in the real world, your client would simply give you this information or you would get

90
00:07:59,660 --> 00:08:05,000
it yourself based on the specifics of your project. For this example, it doesn't really matter what

91
00:08:05,000 --> 00:08:08,120
they are since we don't have the true targets anyway.
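
The train and test filling steps can be sketched on toy data as follows. This is only an illustration under stated assumptions: the horizon is 4 days instead of 365, the frame and column names are placeholders, and the data is invented.

```python
import pandas as pd

# Toy training data: 10 days starting on Thursday 2015-01-01.
dates = pd.date_range("2015-01-01", periods=10, freq="D")
df2 = pd.DataFrame({
    "ds": dates,
    "open": [1, 1, 1, 0, 1, 1, 1, 1, 1, 1],   # closed on Sunday 2015-01-04
    "promo": [0, 0, 0, 0, 1, 1, 1, 1, 1, 0],
})

horizon = 4  # the lecture uses 365
future = pd.DataFrame(
    {"ds": pd.date_range(dates[0], periods=len(dates) + horizon, freq="D")}
)
train_idx = future["ds"].isin(df2["ds"])
test_idx = ~train_idx

regressors = ["open", "promo"]
for r in regressors:
    future[r] = 0.0
    # Train rows: copy the known values straight from the input data.
    future.loc[train_idx, r] = df2[r].astype(float).tolist()

# Test rows, "open": assume the store is closed on Sundays (dayofweek == 6).
is_sunday = future.loc[test_idx, "ds"].dt.dayofweek == 6
future.loc[test_idx, "open"] = (~is_sunday).astype(float)

# Test rows, other regressors: reuse the last `horizon` values from training.
for r in ["promo"]:
    future.loc[test_idx, r] = df2[r].astype(float).iloc[-horizon:].tolist()

print(future.tail(horizon))
```

Copying past values forward is a placeholder for real calendar and promotion data. It keeps predict from failing on missing regressor values, but the forecast is only as good as these assumed inputs.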

92
00:08:14,220 --> 00:08:18,600
The next step is to call future4.tail() to see what our new data frame looks like.

93
00:08:27,250 --> 00:08:32,650
So notice the difference between the previous example, where we had no regressors, and this example,

94
00:08:32,650 --> 00:08:39,300
where we do. In the previous example, where we had no regressors, the future data frame contained

95
00:08:39,310 --> 00:08:44,970
only the column ds. When we do have regressors, we have to include those regressors.

96
00:08:49,180 --> 00:08:52,060
OK, so the next step is to call predict as usual.

97
00:08:56,460 --> 00:08:58,410
The next step is to plot our forecast.

98
00:09:04,880 --> 00:09:08,450
OK, so notice how these results were not quite what we expected.

99
00:09:09,050 --> 00:09:14,120
The model still gives us negative values and it still doesn't know that the sales are zero when the

100
00:09:14,120 --> 00:09:15,050
store is not open.

101
00:09:15,800 --> 00:09:21,140
What this tells us is that it's probably better to simply remove those days from the time series.

102
00:09:24,900 --> 00:09:27,150
The next step is to call plot_components.

103
00:09:40,040 --> 00:09:45,250
So, again, for the weekly seasonality, we see that Sunday has a very negative influence.

104
00:09:48,710 --> 00:09:53,210
Furthermore, for the yearly seasonality, we still see a peak around the winter holiday.

105
00:09:55,570 --> 00:10:00,230
So we can see that adding the new regressors did not completely account for these events.

106
00:10:03,490 --> 00:10:09,370
Also, notice that when we add extra regressors, their components are also plotted. So here we can

107
00:10:09,370 --> 00:10:13,840
see two separate plots for both the additive and multiplicative components.
