1
00:00:11,090 --> 00:00:15,240
So in this lecture, we will be looking at the concepts behind Facebook Prophet.

2
00:00:15,950 --> 00:00:20,900
Yes, we know it's a time series modeling and forecasting tool, but we'd like to have a bit of detail

3
00:00:20,900 --> 00:00:22,080
about how it works.

4
00:00:22,700 --> 00:00:27,950
Now, if you're not really concerned with how it works, then feel free to skip ahead to the code preparation.

5
00:00:28,650 --> 00:00:35,060
However, be warned that not knowing how it works has led to silly use cases of this library, for instance,

6
00:00:35,060 --> 00:00:36,210
predicting stocks.

7
00:00:36,650 --> 00:00:41,720
In fact, we will go through such an example later in this section to show you that this will actually

8
00:00:41,720 --> 00:00:42,590
perform poorly.

9
00:00:43,430 --> 00:00:49,280
So that's one downside of only looking at the API without any knowledge of how the model works: you

10
00:00:49,280 --> 00:00:51,590
may end up doing things that don't make sense.

11
00:00:52,820 --> 00:00:57,680
However, what I think you will find is that the principles of Prophet are actually quite simple and

12
00:00:57,680 --> 00:00:58,360
intuitive.

13
00:00:58,880 --> 00:01:03,840
We're not going to go over any mathematical details in depth, but just a high level overview.

14
00:01:04,520 --> 00:01:08,480
In fact, Facebook's own paper also does not go into these details.

15
00:01:08,870 --> 00:01:13,610
So it's one of those things where if you really want to know the low level details, you just have to

16
00:01:13,610 --> 00:01:14,510
check the code.

17
00:01:14,910 --> 00:01:17,860
It is open source, so it's available for you to do that.

18
00:01:18,560 --> 00:01:22,430
Otherwise, I think the high level principles are good enough for most of us.

19
00:01:23,150 --> 00:01:25,910
Please note that the paper has been linked in extra reading.

20
00:01:27,290 --> 00:01:29,230
It's called forecasting at scale.

21
00:01:29,630 --> 00:01:33,350
So check that out if you want to read the primary source material.

22
00:01:37,700 --> 00:01:44,030
OK, so as you know, the Facebook Prophet package is optimized for business time series. This means

23
00:01:44,030 --> 00:01:48,950
that a lot of its features are geared towards complexities that businesses have to deal with.

24
00:01:49,580 --> 00:01:52,750
One example is seasonality at multiple scales.

25
00:01:53,180 --> 00:01:59,120
For example, you may have weekly seasonality where you have fewer customers on Sundays, but you also

26
00:01:59,120 --> 00:02:04,610
may have yearly seasonality based around holidays such as Christmas, New Year's, Valentine's Day and

27
00:02:04,610 --> 00:02:05,390
so forth.

28
00:02:06,110 --> 00:02:11,540
Another example is events which are cyclical but not exactly seasonal with a predefined frequency.

29
00:02:12,080 --> 00:02:15,320
Some examples are Thanksgiving, Black Friday and Easter.

30
00:02:15,980 --> 00:02:20,520
You may also have events which are based on the lunar calendar, such as the Lunar New Year.

31
00:02:21,290 --> 00:02:25,700
This would be difficult to model with the other tools we've learned about, but Prophet makes this kind

32
00:02:25,700 --> 00:02:26,480
of thing easy.

33
00:02:28,900 --> 00:02:35,320
Another feature that Prophet can model is changes in trend. This may happen if, for example, you

34
00:02:35,320 --> 00:02:39,040
create a new website design or perhaps you release some new products.

35
00:02:41,470 --> 00:02:46,660
Yet another issue that Facebook Prophet can easily deal with is outliers and missing data.

36
00:02:47,350 --> 00:02:49,750
We'll see how these can be dealt with quite elegantly.

37
00:02:54,480 --> 00:02:59,820
OK, so the one and only equation you really need to know in order to understand how Facebook Prophet

38
00:02:59,820 --> 00:03:06,480
works is this. It says that our model of the time series is simply the sum of three components,

39
00:03:06,720 --> 00:03:08,930
along with an unpredictable error term.

40
00:03:09,930 --> 00:03:14,140
The three components are the trend, the seasonality, and the holidays.

41
00:03:14,730 --> 00:03:19,760
So although some seasonality could be due to holidays, you can see that these are modeled separately.

42
00:03:21,120 --> 00:03:27,750
So in this case, g(t) represents the trend, which captures the non-periodic changes in the series.

43
00:03:28,290 --> 00:03:32,850
s(t) represents periodic changes which may happen at different scales.

44
00:03:33,000 --> 00:03:40,530
For example, weekly or yearly. h(t) represents holidays, which could be irregular and could occur over

45
00:03:40,530 --> 00:03:41,550
more than one day.

46
00:03:42,570 --> 00:03:48,570
Note that, like many other time series models, epsilon_t is assumed to come from a normal distribution.

47
00:03:49,920 --> 00:03:55,860
One interesting fact, which is not totally obvious at this point, is that this model is not autoregressive,

48
00:03:55,860 --> 00:04:00,210
but rather the paper states that time itself is the only regressor.

49
00:04:00,900 --> 00:04:05,280
We'll see exactly what this means when we examine each of the components in more detail.

50
00:04:06,450 --> 00:04:10,190
One consequence of this is that missing data is immaterial.

51
00:04:10,530 --> 00:04:15,420
No prediction depends on any other value, but only some equation based on time itself.

52
00:04:16,050 --> 00:04:20,460
Again, if this is not clear, you'll see what we mean by this in the next few slides.
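
To make the additive idea concrete, here is a minimal sketch in plain Python. The equation on the slide, as given in the paper, is y(t) = g(t) + s(t) + h(t) + epsilon_t. Everything in the code below (the function names, the slope, the amplitude, the holiday on day 10) is made up for illustration and is not Prophet's actual implementation. The only point is that each prediction is computed from the time t alone.

```python
import math

def trend(t):
    """Hypothetical linear trend g(t): slope * t + intercept."""
    return 0.5 * t + 10.0

def seasonality(t, period=7.0):
    """Hypothetical weekly seasonality s(t): a single sine wave."""
    return 2.0 * math.sin(2.0 * math.pi * t / period)

def holiday(t, holiday_days=(10,), effect=5.0):
    """Hypothetical holiday term h(t): a fixed bump on the listed days."""
    return effect if t in holiday_days else 0.0

def predict(t):
    """Noise-free prediction: the sum of the three components at time t."""
    return trend(t) + seasonality(t) + holiday(t)

# Days 3 and 4 are "missing" from the observed data, but predict() never
# looks at neighbouring values, so the gap has no effect on the forecast.
observed_days = [0, 1, 2, 5, 6, 7]
forecast = {t: predict(t) for t in range(12)}
```

Because no prediction depends on a previous value of the series, we can evaluate the model on days 3 and 4 just as easily as on any other day.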

53
00:04:25,060 --> 00:04:31,510
OK, so let's start by discussing the trend, g(t). Essentially, there are two ways that Prophet models

54
00:04:31,510 --> 00:04:32,090
trends.

55
00:04:32,680 --> 00:04:34,740
The first method is piecewise linear.

56
00:04:35,410 --> 00:04:40,600
This automatically presumes that the model will also figure out the boundaries of each linear piece,

57
00:04:40,870 --> 00:04:43,180
something we call change point detection.

58
00:04:45,270 --> 00:04:48,220
The second method is logistic growth or decay.

59
00:04:48,870 --> 00:04:51,240
So an example of this is COVID-19.

60
00:04:51,810 --> 00:04:57,360
Of course, there is a limit to the number of people who can be infected since there is a limited population.

61
00:04:58,530 --> 00:05:00,950
Another example is sales of a product.

62
00:05:01,500 --> 00:05:05,710
The sales of a product would most likely be limited by the size of the market.

63
00:05:06,480 --> 00:05:12,060
The most basic form of logistic growth is shown by the equation here, which you might recognize as

64
00:05:12,060 --> 00:05:13,740
the sigmoid from deep learning.

65
00:05:14,910 --> 00:05:20,460
The value of C is the maximum, and in Prophet's terminology, this is called the carrying capacity.
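
As a rough sketch of that basic logistic form, here is a small Python function. The parameter names (capacity, rate, midpoint) and the numbers are my own choices for illustration. They stand in for the carrying capacity C, the growth rate, and the offset, and this is not how Prophet itself is coded.

```python
import math

def logistic_growth(t, capacity=1.0, rate=1.0, midpoint=0.0):
    """Basic logistic curve: capacity / (1 + exp(-rate * (t - midpoint))).

    With capacity=1, rate=1, midpoint=0 this is the familiar sigmoid.
    """
    return capacity / (1.0 + math.exp(-rate * (t - midpoint)))

# Hypothetical market of 1000 customers, with growth centred on day 10.
values = [logistic_growth(t, capacity=1000.0, rate=0.5, midpoint=10.0)
          for t in range(21)]
```

The curve rises steadily and flattens out as it approaches the capacity, which is the behaviour we want when a market or a population puts a ceiling on growth.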

66
00:05:25,080 --> 00:05:31,170
Now, the actual equations representing piecewise linear trend and logistic trend are a bit more complex,

67
00:05:31,320 --> 00:05:33,630
but they follow the same usual forms.

68
00:05:34,200 --> 00:05:39,720
Note that for both of these equations, you can see that they are not autoregressive, unlike ARIMA

69
00:05:39,720 --> 00:05:41,130
and the other models we've studied.

70
00:05:42,270 --> 00:05:46,360
Instead, you can see that the time t is the only input argument.

71
00:05:47,130 --> 00:05:54,030
So for the linear case, we can see that it has the usual form: slope multiplied by input, plus intercept.

72
00:05:54,780 --> 00:05:56,070
For the logistic case,

73
00:05:56,070 --> 00:05:59,290
if we squint really hard, we can see the same form we saw earlier.

74
00:05:59,820 --> 00:06:04,650
It just allows for some adjustment in the carrying capacity, rate of change, and the offset.

75
00:06:05,370 --> 00:06:09,900
These aren't too important for our use case, but they may be useful if you want to explain the model

76
00:06:09,900 --> 00:06:10,840
to your clients.

77
00:06:11,280 --> 00:06:14,390
So have a look at the paper and read about these other parameters

78
00:06:14,580 --> 00:06:17,190
if your clients are interested in those details.
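
For the piecewise linear case, a minimal sketch might look like the following. It follows the general form described in the paper as I understand it: a base slope and intercept, plus a slope adjustment at each change point, with the intercept corrected so the line stays continuous. The function name, argument names, and numbers are assumptions for illustration, and here the change points are simply given rather than detected.

```python
def piecewise_linear_trend(t, base_slope, base_intercept, changepoints, deltas):
    """Continuous piecewise linear trend evaluated at time t.

    changepoints: times at which the slope changes.
    deltas: slope adjustment applied from each change point onward.
    """
    slope = base_slope
    intercept = base_intercept
    for s_j, delta_j in zip(changepoints, deltas):
        if t >= s_j:
            slope += delta_j
            # Shift the intercept so the two segments meet at t = s_j.
            intercept -= s_j * delta_j
    return slope * t + intercept

# Hypothetical example: slope 1 until day 10, then slope 2 afterwards.
before = piecewise_linear_trend(5, 1.0, 0.0, [10], [1.0])
at_change = piecewise_linear_trend(10, 1.0, 0.0, [10], [1.0])
after = piecewise_linear_trend(12, 1.0, 0.0, [10], [1.0])
```

Again, notice that time t is the only input. The slope adjustments are the quantities that get priors in the model, which is what controls how flexible the trend is.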

79
00:06:21,860 --> 00:06:27,050
So for the seasonal component, we have this interesting equation which should look familiar if you're

80
00:06:27,050 --> 00:06:31,790
an engineer or a physicist. Basically, this is just a Fourier series.

81
00:06:32,300 --> 00:06:35,500
If you're not an engineer or a physicist, that's OK, too.

82
00:06:35,900 --> 00:06:41,090
You can see that it's just the sum of sines and cosines, which, of course, produce a periodic time

83
00:06:41,090 --> 00:06:41,790
series.

84
00:06:43,070 --> 00:06:48,410
What's important to note about this is it's the sum of multiple sines and cosines, and each term in

85
00:06:48,410 --> 00:06:52,670
the sum has a different period thanks to the argument 2πn

86
00:06:52,670 --> 00:06:57,010
over P. In this case, capital P is like the overall period.

87
00:06:57,410 --> 00:07:02,080
So it's 365.25 for yearly data and 7 for weekly data

88
00:07:02,240 --> 00:07:08,330
when the time variable is scaled in days. The parameters to be optimized are the a's and b's, which

89
00:07:08,330 --> 00:07:11,540
are found by optimizing some loss function as per usual.
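
Here is a small sketch of that Fourier series in Python. The coefficient values are invented for illustration. In the real model the a's and b's are fitted to the data, whereas here we just plug in numbers to see that the result repeats every P time units.

```python
import math

def fourier_seasonality(t, period, a, b):
    """Sum over n of a_n * cos(2*pi*n*t/P) + b_n * sin(2*pi*n*t/P)."""
    total = 0.0
    for n, (a_n, b_n) in enumerate(zip(a, b), start=1):
        angle = 2.0 * math.pi * n * t / period
        total += a_n * math.cos(angle) + b_n * math.sin(angle)
    return total

# Hypothetical weekly seasonality with two Fourier terms (P = 7 days).
a_coeffs = [1.0, 0.5]
b_coeffs = [0.3, -0.2]
monday = fourier_seasonality(3.0, 7.0, a_coeffs, b_coeffs)
next_monday = fourier_seasonality(10.0, 7.0, a_coeffs, b_coeffs)
```

The two values agree up to floating point error, since the n-th term has period P / n and every term therefore repeats after P. Adding more terms lets the seasonal pattern take a more complicated shape.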

90
00:07:16,400 --> 00:07:22,070
OK, so the final component is the holiday component, which is essentially represented as a one-hot

91
00:07:22,070 --> 00:07:29,510
vector. Thus, the influence of any single holiday is governed by a single parameter, which is additive.

92
00:07:30,320 --> 00:07:35,930
Note that the paper mentions that the model also accounts for days surrounding the holiday, but it

93
00:07:35,930 --> 00:07:37,790
doesn't specify exactly how.
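
A minimal sketch of the one-hot idea is shown below. The day indices and effect sizes are hypothetical, and listing the day before a holiday as part of the same holiday is just one possible way to handle surrounding days. It is an assumption of mine and not a statement of what Prophet does internally.

```python
def holiday_effect(t, holiday_dates, effects):
    """Additive holiday term: indicator vector for day t times one effect each.

    holiday_dates: one set of day indices per holiday.
    effects: one additive parameter per holiday.
    """
    indicators = [1.0 if t in dates else 0.0 for dates in holiday_dates]
    return sum(z * kappa for z, kappa in zip(indicators, effects))

# Hypothetical calendar: holiday A on days 358 and 359, holiday B on day 0.
dates = [{358, 359}, {0}]
kappas = [5.0, 2.0]
on_holiday_a = holiday_effect(359, dates, kappas)
on_ordinary_day = holiday_effect(100, dates, kappas)
```

On an ordinary day every indicator is zero, so the holiday term contributes nothing to the forecast.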

94
00:07:42,520 --> 00:07:46,840
OK, so one interesting aspect of the Prophet model is that it is Bayesian.

95
00:07:47,500 --> 00:07:53,290
This is not obvious from the way it's been presented so far, but all of the model parameters have priors.

96
00:07:53,830 --> 00:08:00,610
Behind the scenes, Prophet is built using Stan, which is a library for Bayesian machine learning. Although

97
00:08:00,610 --> 00:08:04,660
that sounds complex, it allows for some human-in-the-loop optimization.

98
00:08:05,380 --> 00:08:09,210
So in other words, you can include your client in the modeling process.

99
00:08:09,640 --> 00:08:14,500
For example, if you find that the change points for the trends are too sensitive, you can try different

100
00:08:14,500 --> 00:08:19,350
settings for the priors governing the change points to fit the expectations of your client.
