1
00:00:11,070 --> 00:00:17,310
So in this lecture, we'll be talking about the data model used by AWS Forecast. The main thing

2
00:00:17,310 --> 00:00:22,640
to remember is that AWS Forecast is not an API like scikit-learn or TensorFlow.

3
00:00:23,010 --> 00:00:28,080
So your data is not a NumPy array, and your model is not really an object with fit and predict

4
00:00:28,080 --> 00:00:28,860
functions.

5
00:00:29,280 --> 00:00:33,050
AWS Forecast requires your data to be in a specific format.

6
00:00:33,360 --> 00:00:35,450
So that's what we'll be discussing in this lecture.

7
00:00:36,210 --> 00:00:41,640
By the end of this lecture, you should be able to relate how AWS Forecast treats a time series

8
00:00:41,640 --> 00:00:45,090
data set to the data format we've been using in this course.

9
00:00:45,990 --> 00:00:51,810
To remind you, we've been looking at time series arrays of shape T by D, where T is the number of

10
00:00:51,810 --> 00:00:54,330
time steps and D is the dimensionality.

11
00:00:59,000 --> 00:01:07,400
So the basic format we'll need for AWS Forecast is a CSV. As you recall, a CSV is just a text file organized

12
00:01:07,400 --> 00:01:08,030
as a table.

13
00:01:08,600 --> 00:01:15,680
Each line is a row, and the columns are separated by commas. For a simple one-dimensional time series,

14
00:01:15,800 --> 00:01:17,270
we need three columns.

15
00:01:17,660 --> 00:01:19,310
The first column is the timestamp.

16
00:01:19,700 --> 00:01:26,430
This must be in a typical datetime format with year, month, day and, optionally, hour, minute and second.

17
00:01:27,110 --> 00:01:32,000
So, for example, if you're working with a time series where the time index is just an integer, that

18
00:01:32,000 --> 00:01:36,280
will not work. Your time index has to be in the form of a datetime string.

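To make the timestamp requirement concrete, here is a minimal Python sketch that converts an integer time index into datetime strings. The start date, the one-day step, and the `%Y-%m-%d %H:%M:%S` output pattern are illustrative assumptions: an integer index carries no calendar information, so you have to choose these yourself, and you should check them against the timestamp formats Forecast documents for your dataset. The helper name is hypothetical.

```python
from datetime import datetime, timedelta

def index_to_timestamps(n_steps, start, step=timedelta(days=1)):
    """Map integer time steps 0..n_steps-1 to datetime strings.

    `start` and `step` are assumptions we supply ourselves, because an
    integer index says nothing about dates or spacing.
    """
    return [
        (start + i * step).strftime("%Y-%m-%d %H:%M:%S")
        for i in range(n_steps)
    ]

stamps = index_to_timestamps(3, datetime(2020, 1, 1))
print(stamps)
# ['2020-01-01 00:00:00', '2020-01-02 00:00:00', '2020-01-03 00:00:00']
```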
19
00:01:37,280 --> 00:01:39,320
The second column is the target value.

20
00:01:39,920 --> 00:01:43,050
Basically, this represents the amplitude of your time series.

21
00:01:43,520 --> 00:01:48,500
So if your time series is the price of a stock over time, then this column would store the price.

22
00:01:49,520 --> 00:01:51,430
The third column is the item ID.

23
00:01:52,100 --> 00:01:57,920
Basically, this allows you to have separate time series for multiple items at once, with separate forecasts

24
00:01:57,920 --> 00:01:58,560
for each.

25
00:01:59,090 --> 00:02:04,370
So, for example, if you're a company like Walmart that sells many products, you can have a separate

26
00:02:04,370 --> 00:02:06,440
time series for the sales of each product.

27
00:02:06,740 --> 00:02:10,670
Just give them each a unique item ID. For stock prices,

28
00:02:10,670 --> 00:02:15,680
you could have separate stocks. For example, you could try to make a forecast for all five hundred

29
00:02:15,680 --> 00:02:16,970
stocks in the S&P.

30
00:02:17,840 --> 00:02:21,710
So this is like a vector autoregression, but with a much more powerful model.

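Here is a rough sketch of how the T by D arrays we've used in this course map onto this three-column layout: each of the D columns becomes its own item ID, and its T values become rows. The function name, the toy values, and the column order (timestamp, target value, item ID, following the order described above) are assumptions for illustration. The order in your file has to match the schema you register with Forecast, and this sketch writes no header row.

```python
import csv
import io

def wide_to_long_csv(timestamps, values, item_ids):
    """Flatten a T x D table into (timestamp, target_value, item_id) rows.

    timestamps: list of T datetime strings
    values:     list of T rows, each holding D numbers (one per item)
    item_ids:   list of D item identifiers, e.g. stock tickers
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for d, item in enumerate(item_ids):        # one series per column
        for t, ts in enumerate(timestamps):    # one row per time step
            writer.writerow([ts, values[t][d], item])
    return buf.getvalue()

# Toy data: T = 2 time steps, D = 2 items.
out = wide_to_long_csv(
    ["2020-01-02", "2020-01-03"],
    [[1.0, 10.0],
     [2.0, 20.0]],
    ["A", "B"],
)
print(out, end="")
```

Each item's series ends up as a contiguous block of rows, which is the "long" layout Forecast expects instead of one column per series.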
31
00:02:26,530 --> 00:02:33,000
Now, Amazon Forecast also has a concept for a different kind of time series called a related time series.

32
00:02:33,550 --> 00:02:34,460
As you recall,

33
00:02:34,480 --> 00:02:40,120
One thing we can do in machine learning is to use other features in order to help us predict the value

34
00:02:40,120 --> 00:02:40,930
of our target.

35
00:02:41,500 --> 00:02:46,630
For example, the number of sales of a product might be influenced by its price, and whether it was

36
00:02:46,630 --> 00:02:51,970
raining outside. For us, we'll be using the other columns of the stock price data set.

37
00:02:52,750 --> 00:02:57,750
As you recall, these typically come with open, high, low, close and volume columns.

38
00:02:58,300 --> 00:03:02,100
The close price is typically the column that people are most interested in.

39
00:03:02,230 --> 00:03:04,840
And so that would represent our target time series.

40
00:03:05,020 --> 00:03:06,840
That's the one we're going to try to predict.

41
00:03:07,450 --> 00:03:11,730
The other columns can be used as related time series. As before,

42
00:03:11,770 --> 00:03:15,640
this will be stored as a CSV, and the format will be very similar.

43
00:03:16,270 --> 00:03:20,800
Specifically, we will again require columns for timestamp and item ID.

44
00:03:21,370 --> 00:03:23,440
The other columns can be whatever we choose.

45
00:03:23,830 --> 00:03:27,790
So for the coming example, that would be open, high, low and volume.

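As a sketch of how a single OHLCV table could be split into the two files, assuming the close price is the target and the remaining columns are the related features. The field names, the toy numbers, the item ID, and the column order of each file are hypothetical; they need to match whatever schemas you define for the target and related datasets.

```python
import csv
import io

# Toy OHLCV rows for a single stock; the numbers are made up.
ohlcv = [
    {"timestamp": "2020-01-02", "open": 1.0, "high": 1.5, "low": 0.9, "close": 1.2, "volume": 100},
    {"timestamp": "2020-01-03", "open": 1.2, "high": 1.4, "low": 1.1, "close": 1.3, "volume": 80},
]

def split_target_and_related(rows, item_id):
    """Split OHLCV rows into target rows and related-feature rows."""
    # Target time series: timestamp, target value (close), item ID.
    target = [[r["timestamp"], r["close"], item_id] for r in rows]
    # Related time series: timestamp, item ID, then the remaining features.
    related = [
        [r["timestamp"], item_id, r["open"], r["high"], r["low"], r["volume"]]
        for r in rows
    ]
    return target, related

def to_csv(rows):
    """Serialize a list of rows to CSV text (no header row)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

target, related = split_target_and_related(ohlcv, "STOCK_A")
print(to_csv(target), end="")
print(to_csv(related), end="")
```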
46
00:03:32,440 --> 00:03:38,620
One important fact to note about Amazon Forecast is that, by default, it treats missing data as zero.

47
00:03:39,130 --> 00:03:45,170
This is problematic because we know that stock price data is not available on non-trading days.

48
00:03:45,820 --> 00:03:51,040
Therefore, the time series will effectively fluctuate between very large values and zeroes.

49
00:03:51,940 --> 00:03:57,850
Now, one solution is just to upload the data as is, and see if Amazon Forecast is smart enough to

50
00:03:57,850 --> 00:03:59,440
predict these variations.

51
00:03:59,980 --> 00:04:06,160
Theoretically, this should be possible, since Amazon Forecast is advertised as a state-of-the-art,

52
00:04:06,160 --> 00:04:07,990
no-experience-necessary solution.

53
00:04:08,800 --> 00:04:11,640
Another option is to fill in the missing data ourselves.

54
00:04:12,070 --> 00:04:17,320
Amazon Forecast does have some options for this, but in my opinion they are not that flexible.

55
00:04:18,100 --> 00:04:23,200
For example, we know that for stock prices we would like to carry the previous value forward, but

56
00:04:23,200 --> 00:04:24,950
there is currently no option for this.

57
00:04:25,600 --> 00:04:31,460
Therefore, what we will do is fill in missing data inside our CSV before uploading to S3.

58
00:04:32,020 --> 00:04:36,130
In this way, we can have more control over how missing data is dealt with.

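A minimal sketch of the forward fill described above, in plain Python: it walks every calendar day between the first and last observation and carries the last observed price over non-trading days. It assumes daily frequency and a dict from date to price; `forward_fill_daily` is a hypothetical helper name, not part of any AWS API.

```python
from datetime import date, timedelta

def forward_fill_daily(observations):
    """Carry the last observed price forward over missing calendar days.

    observations: dict mapping datetime.date -> price, with gaps on
    non-trading days. Returns (date, price) pairs for every day from the
    first to the last observed date.
    """
    day, end = min(observations), max(observations)
    filled, last = [], None
    while day <= end:
        if day in observations:
            last = observations[day]   # trading day: update the carried value
        filled.append((day, last))
        day += timedelta(days=1)
    return filled

# Friday 2020-01-03 and Monday 2020-01-06; the weekend is missing.
obs = {date(2020, 1, 3): 10.0, date(2020, 1, 6): 11.0}
for d, price in forward_fill_daily(obs):
    print(d.isoformat(), price)
```

If you are already using pandas, reindexing a date-indexed series to daily frequency with `asfreq("D")` and then calling `ffill()` gives the same result.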
59
00:04:40,630 --> 00:04:46,360
Another question to consider is whether or not we should preprocess our data. As you've seen, this

60
00:04:46,360 --> 00:04:48,760
is quite important when it comes to time series.

61
00:04:49,240 --> 00:04:53,000
For example, we like to difference our data so that it becomes stationary.

62
00:04:53,560 --> 00:04:59,260
We also like to normalize or standardize our data so that each dimension takes on values along the same

63
00:04:59,260 --> 00:04:59,840
scale.

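For reference, here is what the two transformations mentioned above look like, written as a small Python sketch with made-up prices: first differencing (each value minus the one before it) and standardization (subtract the mean, divide by the standard deviation). We won't apply either to the data we upload; this only shows what we are leaving for AWS Forecast to handle.

```python
from statistics import mean, pstdev

def difference(series):
    """First difference: y[t] - y[t-1]; the result is one element shorter."""
    return [b - a for a, b in zip(series, series[1:])]

def standardize(series):
    """Z-score each value using the series' mean and population std dev."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma for x in series]

prices = [10.0, 12.0, 11.0, 15.0]   # toy prices
diffs = difference(prices)          # [2.0, -1.0, 4.0]
z = standardize(diffs)              # mean 0, std dev 1
print(diffs, z)
```

In a real pipeline, the mean and standard deviation should come from the training portion only, so that no information from the forecast period leaks into the model.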
64
00:05:00,430 --> 00:05:04,630
But for this exercise, we will not do any preprocessing on our data.

65
00:05:05,290 --> 00:05:11,200
This will let us confirm or validate whether or not AWS Forecast is smart enough to handle these

66
00:05:11,200 --> 00:05:17,290
transformations automatically. Given that it is a paid solution that takes hours of training time and

67
00:05:17,290 --> 00:05:22,450
several minutes just to preprocess the data and generate forecasts, one might expect this to be the

68
00:05:22,450 --> 00:05:23,120
case.

69
00:05:23,740 --> 00:05:29,320
It's also advertised for those with no machine learning experience, which is another sign that these

70
00:05:29,320 --> 00:05:31,660
transformations should be done automatically.

71
00:05:32,260 --> 00:05:37,830
Someone without machine learning experience wouldn't know about data standardization or stationarity,

72
00:05:38,170 --> 00:05:42,160
so it stands to reason that this would be handled without any intervention.

73
00:05:43,030 --> 00:05:47,170
I think this would be a prerequisite for any enterprise solution such as this.

74
00:05:51,770 --> 00:05:57,860
The final topic I want to discuss in this lecture is what kinds of data AWS Forecast is appropriate

75
00:05:57,860 --> 00:05:59,360
for. For us,

76
00:05:59,370 --> 00:06:01,370
we'll be using it to predict stock prices.

77
00:06:01,850 --> 00:06:06,280
I've chosen this problem because we've already established that it is a hard problem.

78
00:06:06,800 --> 00:06:12,950
So an obvious question to ask is, if we take a powerful enterprise solution and throw it at this problem,

79
00:06:13,280 --> 00:06:19,880
will we get a better result? By contrast, something like airline passengers would not be interesting, because

80
00:06:19,880 --> 00:06:23,040
we can do that on our home computers in a few seconds.

81
00:06:23,600 --> 00:06:27,520
That's also the reason why I'm conflating stock prices with stock returns.

82
00:06:27,980 --> 00:06:33,230
We know that smart models like ARIMA will do differencing automatically, and hence, under the hood,

83
00:06:33,410 --> 00:06:38,480
it really is working with a stationary series of returns rather than actual prices.

84
00:06:38,930 --> 00:06:43,970
However, the final output is still a price because it accumulates the returns and does the reverse

85
00:06:43,970 --> 00:06:44,810
transformation.

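The reverse transformation mentioned above can be sketched in a few lines: starting from the last observed price, each forecasted difference is added to a running total. This assumes plain first differences, which is what ARIMA's differencing step uses; the function name and numbers are hypothetical. If you were working with log returns instead, you would accumulate them, exponentiate, and multiply by the last price.

```python
from itertools import accumulate

def undifference(last_price, forecast_diffs):
    """Rebuild price levels from forecasted first differences.

    Price at step h = last observed price + sum of the first h differences.
    """
    return [last_price + total for total in accumulate(forecast_diffs)]

print(undifference(15.0, [1.0, -0.5, 2.0]))  # [16.0, 15.5, 17.5]
```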
86
00:06:45,680 --> 00:06:52,160
As you saw previously, one common mistake by many beginners and bloggers is that they make one-step forecasts

87
00:06:52,160 --> 00:06:56,150
directly on the price and present them as multistep forecasts.

88
00:06:56,660 --> 00:07:01,220
And what they end up really doing is just building a model that copies the previous value

89
00:07:01,220 --> 00:07:07,670
in the time series. We'll see that Amazon Forecast does not allow any opportunity to make such mistakes

90
00:07:07,880 --> 00:07:13,100
because the forecast is always several steps ahead from the data that you actually provide.

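To see why one-step predictions on raw prices are misleading, compare a naive forecaster that copies the previous value in two settings. In the one-step version it is handed the true value at every step, so it simply lags the data by one step; in the honest multi-step version it only sees history up to a cutoff, so it repeats the last known value. Both functions and the toy prices are hypothetical illustrations, not how Forecast works internally.

```python
def one_step_naive(series):
    """Predict each y[t] (t >= 1) with the true y[t-1].

    Every prediction uses the actual previous value, so the output is just
    the input shifted by one step, which looks deceptively accurate on a plot.
    """
    return series[:-1]

def multi_step_naive(history, horizon):
    """Predict `horizon` future steps using only `history`.

    Nothing after the cutoff is visible, so the naive forecast repeats the
    last known value for every step ahead.
    """
    return [history[-1]] * horizon

prices = [10.0, 12.0, 11.0, 15.0, 14.0]

# Predictions for t = 3 and t = 4 (actual values: 15.0 and 14.0).
print(one_step_naive(prices)[2:])        # [11.0, 15.0], uses true values up to t = 3
print(multi_step_naive(prices[:3], 2))   # [11.0, 11.0], uses data up to t = 2 only
```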
91
00:07:17,730 --> 00:07:24,000
But aside from forecasting stock prices or stock returns, if you look at Amazon's documentation, you'll

92
00:07:24,000 --> 00:07:26,230
see that it's really geared towards businesses.

93
00:07:26,790 --> 00:07:32,490
We see this immediately when we consider that the time index is a timestamp and not just a generic

94
00:07:32,490 --> 00:07:33,480
time index.

95
00:07:33,960 --> 00:07:39,300
For example, if you are analyzing brain signals or the GPS on a car, you don't really care about the

96
00:07:39,300 --> 00:07:40,360
date or the time.

97
00:07:41,310 --> 00:07:44,280
And, of course, those are really non-business applications.

98
00:07:44,950 --> 00:07:52,230
AWS Forecast also allows you to upload item metadata, such as brand, color, model, category and other

99
00:07:52,230 --> 00:07:57,400
features of items that might be sold or used by a business. Some example industries

100
00:07:57,420 --> 00:08:02,150
Amazon mentions are retail, finance, logistics and health care.

101
00:08:02,640 --> 00:08:08,970
Some applications they list are inventory forecasting, workforce forecasting, web traffic forecasting,

102
00:08:09,210 --> 00:08:12,710
service capacity forecasting and financial forecasting.

103
00:08:13,140 --> 00:08:19,380
So, clearly, based on Amazon's own advertising, we can tell that this product is really meant for businesses.

104
00:08:24,150 --> 00:08:30,330
So here are some specific use cases. If you're in retail, you can forecast product demand, which

105
00:08:30,330 --> 00:08:35,330
would allow you to control the amount of inventory and pricing for different store locations.

106
00:08:35,760 --> 00:08:40,920
If you're in supply chain planning, you can forecast the quantity of raw goods and services required

107
00:08:40,920 --> 00:08:41,970
by manufacturing.

108
00:08:42,810 --> 00:08:48,510
For many businesses, forecasting can be used for staffing levels, advertising, energy consumption

109
00:08:48,660 --> 00:08:49,920
and server capacity.

110
00:08:50,790 --> 00:08:53,970
Finally, forecasting can be used for operational planning.

111
00:08:54,630 --> 00:09:00,120
For example, forecasting the amount of web traffic to your website or even your company's usage of

112
00:09:00,120 --> 00:09:01,500
AWS itself.

113
00:09:02,340 --> 00:09:09,240
OK, so hopefully this list gave you some idea of how AWS Forecast could be used in your business and

114
00:09:09,240 --> 00:09:11,700
the different types of time series it can be used for.
