1
00:00:11,090 --> 00:00:18,170
OK, so in this video, we will be looking at how to use Prophet. In the previous video, we got our data

2
00:00:18,170 --> 00:00:21,900
into the right format stored in a data frame called DFB.

3
00:00:22,250 --> 00:00:25,520
So now we can go straight into creating our Prophet instance.

4
00:00:30,640 --> 00:00:33,580
The next step is to call fit, passing in our data frame.

5
00:00:36,840 --> 00:00:42,600
Note that during the fitting process, Prophet printed some information; here it points out

6
00:00:42,600 --> 00:00:45,190
that daily seasonality has been disabled.

7
00:00:45,750 --> 00:00:51,990
Of course, this makes sense since our data is daily. In order to measure any daily seasonality, we

8
00:00:51,990 --> 00:00:53,610
would need data on a finer scale.
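The reason daily seasonality is switched off can be shown with a small check: if no two consecutive timestamps are less than a day apart, there is nothing within a day to model. The helper `has_subdaily_spacing` below is hypothetical. It is a simplified version of the idea, not Prophet's own logic.

```python
from datetime import datetime, timedelta


def has_subdaily_spacing(timestamps):
    """True if any two consecutive timestamps are less than one day apart."""
    ts = sorted(timestamps)
    return any(b - a < timedelta(days=1) for a, b in zip(ts, ts[1:]))


start = datetime(2013, 1, 1)
daily = [start + timedelta(days=i) for i in range(10)]    # one row per day
hourly = [start + timedelta(hours=i) for i in range(48)]  # many rows per day
print(has_subdaily_spacing(daily))   # False
print(has_subdaily_spacing(hourly))  # True
```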

9
00:00:55,650 --> 00:01:01,320
Now that we've fitted our model, the next step is to make a prediction. As you

10
00:01:01,320 --> 00:01:03,840
recall, this actually takes several steps.

11
00:01:04,650 --> 00:01:07,650
So the first step is to call make_future_dataframe.

12
00:01:08,220 --> 00:01:10,470
This takes in an argument called periods.

13
00:01:11,040 --> 00:01:15,480
So periods is the number of steps into the future you want to forecast.
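Prophet's real call here is `m.make_future_dataframe(periods=365)`. By default it keeps the history and uses a daily frequency. The sketch below, assuming daily data, reproduces roughly what that call returns using plain pandas. The function `make_future_frame` is a hypothetical stand-in, not part of Prophet, and the toy values are illustrative only.

```python
import pandas as pd


def make_future_frame(history: pd.DataFrame, periods: int) -> pd.DataFrame:
    """Hypothetical stand-in for make_future_dataframe on daily data.

    Returns a one-column frame ('ds') with every historical date followed by
    `periods` new daily dates. No predictions are included.
    """
    last = history["ds"].max()
    future_dates = pd.date_range(start=last + pd.Timedelta(days=1), periods=periods, freq="D")
    all_dates = pd.concat([history["ds"], pd.Series(future_dates)], ignore_index=True)
    return pd.DataFrame({"ds": all_dates})


history = pd.DataFrame({
    "ds": pd.date_range("2013-01-01", periods=5, freq="D"),
    "y": [10.0, 12.0, 0.0, 11.0, 13.0],  # toy values for illustration only
})
future = make_future_frame(history, periods=3)
print(future)
```

Keeping the historical dates is what later gives in-sample predictions alongside the forecast.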

14
00:01:19,200 --> 00:01:23,420
So the next step is to call future.head() to see what we actually got back.

15
00:01:27,980 --> 00:01:33,080
As you recall, when we make a so-called forecast, this will actually include predictions for both

16
00:01:33,080 --> 00:01:35,160
in-sample data and the future.

17
00:01:35,630 --> 00:01:40,640
This is why we see dates starting from 2013, which is the start of our time series.

18
00:01:41,420 --> 00:01:46,320
Note also that this only contains a single column, which is the timestamp for each prediction.

19
00:01:46,880 --> 00:01:50,240
So this data frame does not actually contain any predictions.

20
00:01:52,680 --> 00:01:54,750
The next step is to call future.tail().

21
00:01:58,300 --> 00:02:04,870
So now we can see dates going up to 2016-07-30, which, as you recall, is three hundred and sixty

22
00:02:04,870 --> 00:02:07,270
five days beyond the end of our time series.
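The final date comes from simple calendar arithmetic. The history's end date of 2015-07-31 is an assumption that is not stated in the video; it matches the Rossmann store-sales data this walkthrough appears to use. Because 2016 is a leap year, adding 365 days lands one day short of the anniversary.

```python
from datetime import date, timedelta

# Assumption: the training history ends on 2015-07-31.
last_observed = date(2015, 7, 31)
periods = 365
print(last_observed + timedelta(days=periods))  # 2016-07-30
```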

23
00:02:10,450 --> 00:02:16,120
The next step is to make the actual forecast by calling m.predict. This takes in the future data

24
00:02:16,120 --> 00:02:21,010
frame we just created and gives us back a new data frame, which we will call forecast.

25
00:02:26,340 --> 00:02:29,990
The next step is to call forecast.tail() to see what we get back.

26
00:02:34,700 --> 00:02:40,970
So as you can see, we get back quite a few columns. The first column is the timestamp, ds, which corresponds

27
00:02:40,970 --> 00:02:43,510
to the timestamps in our data frame called future.

28
00:02:44,480 --> 00:02:45,980
The next column is trend.

29
00:02:46,760 --> 00:02:52,910
So as you recall, Prophet models each component separately, and thus the forecast contains predictions

30
00:02:52,910 --> 00:02:54,460
for each individual component,

31
00:02:54,740 --> 00:03:01,610
in addition to the final prediction. The next two columns are yhat_lower and yhat_upper,

32
00:03:01,820 --> 00:03:04,250
which are the prediction intervals for yhat.

33
00:03:05,570 --> 00:03:10,190
The next two columns are trend_lower and trend_upper, which are the prediction intervals for the

34
00:03:10,190 --> 00:03:10,880
trend.

35
00:03:13,410 --> 00:03:18,120
Now, the next few columns are pretty self-explanatory, so I'll let you go through those on your own

36
00:03:18,120 --> 00:03:24,570
if you choose. The final column is yhat, which of course is our model prediction.
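Usually only a few of these columns are needed. The sketch below uses a made-up stand-in for the frame returned by `m.predict(future)`. Only the column names follow Prophet's output; the numbers are illustrative. It keeps the point prediction and its interval and adds a hypothetical `interval_width` column, which is a quick way to compare how uncertain two models are.

```python
import pandas as pd

# Toy stand-in for the forecast frame; the values are invented for illustration.
forecast = pd.DataFrame({
    "ds": pd.date_range("2015-08-01", periods=3, freq="D"),
    "trend": [5000.0, 4995.0, 4990.0],
    "yhat_lower": [3000.0, 3500.0, -800.0],
    "yhat_upper": [6500.0, 7000.0, 2600.0],
    "yhat": [4800.0, 5200.0, 900.0],
})

# Keep the prediction and its interval, then measure the interval's width.
summary = forecast[["ds", "yhat", "yhat_lower", "yhat_upper"]].copy()
summary["interval_width"] = summary["yhat_upper"] - summary["yhat_lower"]
print(summary)
```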

37
00:03:30,780 --> 00:03:36,210
OK, so the next step is to plot our forecast, which, as you know, contains the in-sample predictions

38
00:03:36,210 --> 00:03:36,750
as well.

39
00:03:37,530 --> 00:03:43,110
As you recall, Prophet makes this very easy since everything is built into this function called plot.

40
00:03:48,300 --> 00:03:52,560
OK, so here we have our model predictions. So what does this tell us?

41
00:03:54,120 --> 00:03:59,810
Firstly, notice that our original time series data appears as a scatterplot of black dots.

42
00:04:00,420 --> 00:04:02,820
Our model predictions are the thick blue line.

43
00:04:03,600 --> 00:04:07,490
The corresponding prediction intervals are shown as the transparent blue band.

44
00:04:08,040 --> 00:04:12,990
And of course, the forecast period is obvious since that's the period with no black dots.

45
00:04:15,730 --> 00:04:21,910
Notice one peculiar feature of this prediction, which is that the model is not able to precisely determine

46
00:04:21,910 --> 00:04:28,060
that sales should be zero on Sundays or when the store is not open. We can see that the prediction

47
00:04:28,060 --> 00:04:32,380
for some of these days is negative and the prediction interval falls below zero.

48
00:04:33,100 --> 00:04:39,280
So in that respect, this model does require human intervention to make sense of the predictions, although

49
00:04:39,280 --> 00:04:41,230
we'll see better ways of dealing with this later.

50
00:04:41,860 --> 00:04:43,930
Right now, we're just demonstrating the steps.
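One simple form of that manual intervention is to floor the prediction and its interval at zero, since sales cannot be negative. This is our own post-processing on a toy frame with invented values; Prophet does not do it for you.

```python
import pandas as pd

# Toy forecast values; in practice these come from m.predict(future).
forecast = pd.DataFrame({
    "ds": pd.date_range("2015-08-01", periods=4, freq="D"),
    "yhat": [5200.0, -150.0, 4800.0, 5100.0],
    "yhat_lower": [3600.0, -1900.0, 3100.0, 3500.0],
    "yhat_upper": [6800.0, 1500.0, 6400.0, 6700.0],
})

# Sales cannot be negative, so clip the prediction and its interval at zero.
cols = ["yhat", "yhat_lower", "yhat_upper"]
forecast[cols] = forecast[cols].clip(lower=0)
print(forecast)
```

Clipping hides the symptom but does not fix the model, which still spends effort fitting the zeros.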

51
00:04:47,270 --> 00:04:50,120
The next step is to call m.plot_components.

52
00:04:55,450 --> 00:04:59,360
So as you can see, this makes a nice plot for each component of the model.

53
00:05:00,100 --> 00:05:05,980
Essentially, this will show the trend, seasonal components and holiday components if they exist.

54
00:05:06,700 --> 00:05:11,120
So the first component is the trend, which we can see is decreasing over time.

55
00:05:12,010 --> 00:05:14,950
Note that there is one point at which the trend changes.

56
00:05:17,820 --> 00:05:24,000
The next component is the weekly component. As you can see, the impact of Sundays is very negative,

57
00:05:24,270 --> 00:05:27,960
which makes sense given that we included Sundays, when the store is closed, in our time series.

58
00:05:31,180 --> 00:05:37,060
The final component is the yearly component. Notice the large peak around the Christmas holiday, which

59
00:05:37,060 --> 00:05:38,350
of course makes sense.

60
00:05:43,450 --> 00:05:48,820
OK, so the next step is to build another model, but to only include the days where the store is open.

61
00:05:49,480 --> 00:05:54,610
This is because we already know that the store will have zero sales on days that it is closed.

62
00:05:56,840 --> 00:06:02,130
So we'll start by selecting only the rows from the store 1 data frame where Open is not zero.

63
00:06:02,840 --> 00:06:05,390
We'll also set the date column to be the index.
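A minimal pandas sketch of this filtering step follows. The column names `Date`, `Sales` and `Open` and the variable names `store1` and `store1_open` are assumptions based on the narration, and the rows are toy data. In this toy data, 2015-07-26 is a Sunday.

```python
import pandas as pd

# Hypothetical miniature of the store 1 data (toy values).
store1 = pd.DataFrame({
    "Date": pd.to_datetime(["2015-07-25", "2015-07-26", "2015-07-27", "2015-07-28"]),
    "Sales": [4500, 0, 5300, 4900],
    "Open": [1, 0, 1, 1],
})

# Keep only the days on which the store was open, then index by date.
store1_open = store1[store1["Open"] != 0].set_index("Date")
print(store1_open.head())
```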

64
00:06:09,490 --> 00:06:13,380
The next step is to call the head function to see what our new data frame looks like.

65
00:06:18,160 --> 00:06:19,990
Notice the absence of zeros.

66
00:06:22,970 --> 00:06:27,410
The next step is to rename our columns to ds and y, as required.
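Prophet expects a `ds` column of timestamps and a `y` column of values. Starting from a date-indexed frame like the one above, a sketch of the renaming looks like this. The names `store1_open`, `Date`, `Sales` and `df_open` are hypothetical, and the values are toy data.

```python
import pandas as pd

# Open-days frame indexed by date (toy values).
store1_open = pd.DataFrame(
    {"Sales": [4500, 5300, 4900]},
    index=pd.DatetimeIndex(["2015-07-25", "2015-07-27", "2015-07-28"], name="Date"),
)

# Move the date index back into a column, then rename to Prophet's expected names.
df_open = (
    store1_open.reset_index()[["Date", "Sales"]]
    .rename(columns={"Date": "ds", "Sales": "y"})
)
print(df_open)
```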

67
00:06:31,760 --> 00:06:37,340
The next step is to run through the usual steps of creating a model, calling fit, and making a prediction.

68
00:06:37,910 --> 00:06:41,740
I've grouped these together since we already know how they work at this point.

69
00:06:49,090 --> 00:06:54,910
OK, so notice that this time our forecast appears to make a lot more sense. This is because it no

70
00:06:54,910 --> 00:06:58,650
longer has to model the zeros on the days when the store is closed.

71
00:06:59,170 --> 00:07:01,370
One way we can see that this model is better

72
00:07:01,420 --> 00:07:04,060
is that the prediction interval is much tighter,

73
00:07:04,570 --> 00:07:07,270
so it no longer goes down to negative values.

74
00:07:08,290 --> 00:07:13,450
However, one thing to be careful of is that the predictions for Sundays and other days on which the

75
00:07:13,450 --> 00:07:16,120
store is not open are no longer valid.

76
00:07:16,660 --> 00:07:22,350
As you recall, the Prophet model is a continuous model, so it will have predictions for those days.

77
00:07:22,900 --> 00:07:27,970
But as an analyst, we would simply ignore those predictions because we know actual sales on those days will be zero.
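Ignoring those predictions can be made explicit by overriding them. The sketch below assumes the store is closed every Sunday (pandas `dayofweek == 6`). It uses a toy frame with invented values and a hypothetical `yhat_adjusted` column. Other closures, such as public holidays, would need their own calendar and are not handled here.

```python
import pandas as pd

# Toy forecast from the open-days model; Prophet still produces a Sunday value.
forecast = pd.DataFrame({
    "ds": pd.date_range("2015-08-01", periods=4, freq="D"),  # Sat, Sun, Mon, Tue
    "yhat": [5100.0, 6200.0, 5600.0, 5300.0],
})

# Assumption: closed every Sunday, so replace those predictions with zero.
is_sunday = forecast["ds"].dt.dayofweek == 6
forecast["yhat_adjusted"] = forecast["yhat"].where(~is_sunday, 0.0)
print(forecast)
```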

78
00:07:31,540 --> 00:07:35,430
The next step is to plot the components of our new model predictions.

79
00:07:40,460 --> 00:07:45,800
So, again, we see that the trend is piecewise linear, with one changepoint: we have a steep

80
00:07:45,800 --> 00:07:49,220
decrease at the start and a slower decrease for the rest.
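A piecewise linear trend with a single changepoint changes slope at that point and stays continuous. The offset shifts by minus the changepoint times the slope change. The function and its parameters below are hypothetical and meant only to show the shape. Prophet itself places many candidate changepoints and fits the size of each slope change, so this is a simplified one-changepoint version of the same idea.

```python
def piecewise_linear_trend(t, k, m, changepoint, delta):
    """Linear trend whose slope changes from k to k + delta at `changepoint`.

    The intercept is shifted by -changepoint * delta so the line stays continuous.
    """
    if t < changepoint:
        return k * t + m
    return (k + delta) * t + (m - changepoint * delta)


# Hypothetical parameters: a steep decline (slope -3) that eases to -1 at t = 10.
values = [
    piecewise_linear_trend(t, k=-3.0, m=100.0, changepoint=10.0, delta=2.0)
    for t in range(0, 21, 5)
]
print(values)  # [100.0, 85.0, 70.0, 65.0, 60.0]
```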

81
00:07:53,630 --> 00:07:59,750
For the weekly component, notice that this is very different from before. The influence of Sunday

82
00:07:59,750 --> 00:08:02,370
is no longer very negative, but rather high.

83
00:08:02,900 --> 00:08:06,200
This is because we don't actually have any data for Sundays.

84
00:08:06,440 --> 00:08:08,450
So the model is simply interpolating.
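Prophet represents weekly seasonality as a Fourier series with a period of 7 days (order 3 by default). This is a smooth function defined at every time point, so it returns a value for Sunday even when no Sunday rows were used in fitting. That Sunday value is simply wherever the fitted curve happens to pass. The coefficients and helper names below are hypothetical. They show only the shape of the calculation, not fitted values.

```python
import math


def fourier_features(t_days, period=7.0, order=3):
    """Fourier terms [sin(2*pi*n*t/P), cos(2*pi*n*t/P)] for n = 1..order."""
    features = []
    for n in range(1, order + 1):
        angle = 2.0 * math.pi * n * t_days / period
        features.extend([math.sin(angle), math.cos(angle)])
    return features


def weekly_effect(t_days, coefs, period=7.0, order=3):
    """Seasonal effect as a weighted sum of Fourier terms; defined for every t."""
    return sum(c * f for c, f in zip(coefs, fourier_features(t_days, period, order)))


# Hypothetical coefficients; in Prophet they are fitted from the days with data.
coefs = [0.5, -0.2, 0.1, 0.3, -0.05, 0.02]
for day in range(7):
    print(day, round(weekly_effect(day, coefs), 3))
```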

85
00:08:09,290 --> 00:08:14,690
Again, if you were an analyst, you would simply ignore this due to your knowledge of the time series.

86
00:08:18,240 --> 00:08:23,490
When we look at the yearly component, we see the same pattern as before, where we see a large increase

87
00:08:23,490 --> 00:08:24,750
over the winter holiday.
