1
00:00:11,680 --> 00:00:17,770
In this lecture we are going to do what I think is a lot of students' favorite exercise: how to predict

2
00:00:17,830 --> 00:00:19,430
stock returns.

3
00:00:19,600 --> 00:00:21,660
We are going to use the LSTM for this.

4
00:00:21,700 --> 00:00:25,930
The most powerful RNN unit we've encountered so far.

5
00:00:25,930 --> 00:00:30,310
I hope that as you go through this lecture you will consider everything you've learned so far.

6
00:00:30,340 --> 00:00:35,590
As with the rest of this section, this lecture is going to walk you through a prepared Colab notebook.

7
00:00:35,710 --> 00:00:41,680
Although a very good exercise, which I always recommend, is, once you know how this is done, to try and

8
00:00:41,680 --> 00:00:47,890
recreate it yourself with as few references as possible. As usual, you can look at the title of the notebook

9
00:00:48,280 --> 00:00:50,940
to determine what notebook we are currently looking at.

10
00:00:56,090 --> 00:01:00,780
First, I want to answer the question: what is the motivation behind this example?

11
00:01:00,800 --> 00:01:01,670
Why are we doing it?

12
00:01:02,540 --> 00:01:04,730
Well the answer may surprise you.

13
00:01:04,760 --> 00:01:10,460
There are a lot of resources out there that will claim to be able to predict stock prices using

14
00:01:10,460 --> 00:01:11,710
LSTMs.

15
00:01:11,720 --> 00:01:17,480
Now you must ask yourself how many of these people have actually tried their techniques in the actual

16
00:01:17,480 --> 00:01:20,400
stock market using their own money.

17
00:01:20,420 --> 00:01:23,320
I can tell you how many: zero.

18
00:01:23,390 --> 00:01:30,290
Unfortunately the methodology that a lot of these resources take is extremely flawed.

19
00:01:30,460 --> 00:01:35,020
You might have even joined this course because you thought it was going to teach you how to predict stock

20
00:01:35,020 --> 00:01:37,040
prices using RNNs.

21
00:01:37,060 --> 00:01:42,850
I mean, it's understandable. If you are a beginner, all you know is that deep learning is this new field

22
00:01:42,850 --> 00:01:47,420
and it's very popular because people have started to realize how powerful it is.

23
00:01:47,590 --> 00:01:53,920
And so you start to believe that because deep learning is so powerful it must be good at making predictions

24
00:01:53,920 --> 00:01:57,430
with all sorts of data including stocks.

25
00:01:57,430 --> 00:02:02,140
You see deep learning and you think this will be a game changer for making predictions in the stock

26
00:02:02,140 --> 00:02:03,100
market.

27
00:02:03,100 --> 00:02:08,860
All you need to do is use this magical LSTM and you'll automatically extrapolate future stock

28
00:02:08,860 --> 00:02:11,690
prices using historical stock prices.

29
00:02:11,740 --> 00:02:20,500
Unfortunately you have fallen for a marketing trap.

30
00:02:20,590 --> 00:02:26,950
So what this lecture is really about is fixing incorrect thinking taught to you by all the marketers

31
00:02:26,950 --> 00:02:29,730
out there pretending to be data scientists.

32
00:02:29,800 --> 00:02:34,180
It's not easy to differentiate between a marketer and a data scientist.

33
00:02:34,180 --> 00:02:37,530
I can make a course on deep learning but so can a marketer.

34
00:02:37,720 --> 00:02:44,170
Sometimes there may be only subtle differences in what we teach because marketers can just hire five

35
00:02:44,170 --> 00:02:50,000
dollar an hour freelancers online who themselves are also not actual data scientists.

36
00:02:50,020 --> 00:02:52,890
The Internet is SEO-centric.

37
00:02:53,080 --> 00:02:58,540
People write blogs not because they are interested in or knowledgeable about the topic but because they

38
00:02:58,540 --> 00:03:03,700
want you to click on their links and visit their Web site and view their advertisements.

39
00:03:03,700 --> 00:03:07,260
What ends up happening is everybody starts copying each other.

40
00:03:07,270 --> 00:03:13,980
This has network effects: if one guy who has a popular blog ends up writing incorrect code,

41
00:03:14,010 --> 00:03:15,540
guess what happens?

42
00:03:15,540 --> 00:03:21,800
Everybody who is a marketer or wants to do SEO is going to copy that guy's incorrect code.

43
00:03:21,930 --> 00:03:28,310
It's going to spread everywhere on the Internet, which, due to SEO, will hide the true answer from you.

44
00:03:28,320 --> 00:03:31,140
I've seen this in several places throughout my career already.

45
00:03:31,470 --> 00:03:38,160
Besides this example, one instance of this is with VGG preprocessing, and another is the

46
00:03:38,160 --> 00:03:41,810
formula that Reddit uses in its ranking algorithm.

47
00:03:41,820 --> 00:03:47,490
Unfortunately when people stop thinking for themselves and resort to simply copying others because they

48
00:03:47,490 --> 00:03:50,990
are popular it is easy to make mistakes.

49
00:03:51,030 --> 00:03:54,720
So this lecture is all about undoing those mistakes.

50
00:03:54,750 --> 00:04:00,950
Now back to the code.

51
00:04:00,990 --> 00:04:04,490
The first step here is to use pandas to grab the data from my GitHub.

52
00:04:05,070 --> 00:04:06,060
Yes lucky you.

53
00:04:06,060 --> 00:04:09,840
I am not going to make you download this data yourself.

54
00:04:09,840 --> 00:04:18,570
This data contains daily stock prices for Starbucks from about February 2013 to February 2018.

55
00:04:18,570 --> 00:04:25,430
So let's do a df.head() and see what's in this DataFrame.

56
00:04:25,550 --> 00:04:31,220
Now first of all, if you are not an expert in finance, you might be surprised by this data. When we talk

57
00:05:31,220 --> 00:05:34,880
about stock prices, we usually think of the stock price.

58
00:04:34,880 --> 00:04:39,160
The stock price is a number and each day it's a different number.

59
00:04:39,500 --> 00:04:43,100
So you might think of this as a single variable time series.

60
00:04:43,100 --> 00:04:45,510
And yet here we have multiple columns.

61
00:04:45,560 --> 00:04:47,750
So what are these columns?

62
00:04:47,780 --> 00:04:48,940
The first one is the date.

63
00:04:48,950 --> 00:04:50,720
That's pretty obvious.

64
00:04:50,720 --> 00:04:56,250
Second, we have four different numerical columns: open, high, low, and close.

65
00:04:56,390 --> 00:04:58,810
Now all of these are stock prices.

66
00:04:58,940 --> 00:05:04,850
So the way the stock market works is it starts early in the morning and it stays open for the workday

67
00:05:05,000 --> 00:05:09,590
and then it ends and stock prices are tracked at a very granular rate.

68
00:05:10,100 --> 00:05:11,630
So it's not just a daily thing.

69
00:05:11,630 --> 00:05:13,050
It's pretty much real time.

70
00:05:13,070 --> 00:05:17,930
So at any split second you can find out the exact stock price.

71
00:05:18,200 --> 00:05:23,920
So to put it in simple terms the open price is the price at the start of the trading day.

72
00:05:24,020 --> 00:05:27,400
The close price is the price at the end of the trading day.

73
00:05:27,440 --> 00:05:33,770
The high price is the maximum price for that day and the low price is the minimum price for that day.

74
00:05:33,800 --> 00:05:38,850
Finally the volume column is the number of shares of the stock that were traded that day.
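
The definitions above can be made concrete with a small sketch that collapses one day's trades into a single row. It is a simplification under stated assumptions: the trade prices and share counts are made-up numbers, the names `DailyBar` and `to_daily_bar` are hypothetical, and real exchanges have their own rules for setting the official opening and closing prices.

```python
from typing import NamedTuple


class DailyBar(NamedTuple):
    """One row of the daily table: open, high, low, close, volume."""
    open: float
    high: float
    low: float
    close: float
    volume: int


def to_daily_bar(prices: list[float], shares: list[int]) -> DailyBar:
    """Hypothetical helper: prices[i] and shares[i] describe the i-th trade of the day, in time order."""
    if not prices or len(prices) != len(shares):
        raise ValueError("need one share count per trade price")
    return DailyBar(
        open=prices[0],       # first trade of the day
        high=max(prices),     # highest trade price of the day
        low=min(prices),      # lowest trade price of the day
        close=prices[-1],     # last trade of the day
        volume=sum(shares),   # total number of shares traded that day
    )


# Made-up intraday trades for a single day.
bar = to_daily_bar([52.10, 52.45, 51.80, 52.30], [300, 150, 500, 200])
print(bar)
```

Note that the high and low always bracket the open and close, because all four come from the same list of trades.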

75
00:05:38,930 --> 00:05:41,910
If this is new to you this is probably very confusing.

76
00:05:42,140 --> 00:05:44,540
But luckily, we can always remember my rule:

77
00:05:44,540 --> 00:05:46,270
All data is the same.

78
00:05:46,400 --> 00:05:49,500
The machine learning algorithm doesn't really care what these things are.

79
00:05:49,520 --> 00:05:51,770
They are just a table of numbers.

80
00:05:51,800 --> 00:05:56,840
All we have to do as machine learning students is make sure those numbers are in the right range for

81
00:05:56,840 --> 00:06:04,070
the neural network. As you can see, the price values are in the tens and the volume values are in the

82
00:06:04,070 --> 00:06:06,010
millions.

83
00:06:06,050 --> 00:06:12,500
So this is an example of where different columns of our data have different scales.

84
00:06:12,520 --> 00:06:16,270
In any case, we can see some of the beginning values, in 2013,

85
00:06:16,270 --> 00:06:24,610
if we do df.head(), and we can see some of the ending values, in 2018, if we do df.tail(). So it ends

86
00:06:24,610 --> 00:06:30,240
on February 7th, 2018. All right, so that's nice for you.

87
00:06:30,240 --> 00:06:35,430
If you've invested in Starbucks, your stock approximately doubled in the past five years.

88
00:06:39,810 --> 00:06:40,350
Next.

89
00:06:40,410 --> 00:06:47,440
Let's begin our work by demonstrating how some resources out there do this the wrong way.

90
00:06:47,540 --> 00:06:54,020
By the way I'm not naming any names but I know that a lot of you taking this course have encountered

91
00:06:54,020 --> 00:06:56,510
a course where they do an example just like this

92
00:06:59,610 --> 00:07:02,850
So we would like a single-variable time series to start with.

93
00:07:02,850 --> 00:07:05,590
So let's just focus on the closing price.

94
00:07:05,790 --> 00:07:10,800
So we're going to do .values to get this as a NumPy array, and then we are going to reshape it to

95
00:07:10,800 --> 00:07:13,050
an N-by-1 matrix.

96
00:07:13,050 --> 00:07:18,090
We need to do this because the next step is to apply the StandardScaler from scikit-learn so that

97
00:07:18,090 --> 00:07:21,280
our data is standardized.

98
00:07:21,310 --> 00:07:24,910
Note that I'm calling the fit function on the first half of the series only.

99
00:07:25,120 --> 00:07:29,260
This is because I don't want to include test data in the training pipeline.

100
00:07:29,680 --> 00:07:33,910
I call transform, however, on the entire dataset.

101
00:07:33,910 --> 00:07:39,280
Also note that because we're going to be taking windows of this time series you would have to do a few

102
00:07:39,280 --> 00:07:45,050
calculations to figure out where exactly the boundary between the train set and the test set is.

103
00:07:45,070 --> 00:07:50,770
I kind of hand-waved this and said let's just stop in the middle, which is probably good enough, but it's

104
00:07:50,770 --> 00:07:56,110
probably off by a little bit. The mean and variance won't change that much if we add a few values or

105
00:07:56,110 --> 00:08:03,520
take away a few values. So once we're done, we flatten the series to get back an N-length vector, since

106
00:08:03,520 --> 00:08:06,850
that's what the code we were working with before expects to see.
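
A minimal NumPy sketch of the fit-on-the-first-half, transform-everything pattern described above. It mirrors what scikit-learn's `StandardScaler` does when you call `fit` on part of the data and `transform` on all of it, but stays in plain NumPy so it needs no extra dependency; the function name and the toy prices are hypothetical.

```python
import numpy as np


def standardize_with_train_stats(close_prices: np.ndarray) -> np.ndarray:
    """Standardize a 1-D price series using mean/std from its first half only."""
    series = np.asarray(close_prices, dtype=float).reshape(-1, 1)  # N x 1, as in the notebook
    train_part = series[: len(series) // 2]   # approximate train region ("stop in the middle")
    mean = train_part.mean(axis=0)
    std = train_part.std(axis=0)              # population std (ddof=0), like StandardScaler
    std[std == 0] = 1.0                       # guard against a constant training half
    return ((series - mean) / std).flatten()  # back to a length-N vector


# Made-up closing prices with an upward trend.
prices = np.array([50.0, 51.0, 52.0, 53.0, 60.0, 70.0])
scaled = standardize_with_train_stats(prices)
print(scaled.round(3))
```

Only the first half ends up with mean 0 and standard deviation 1; the later prices land far outside that range, which can also happen to the test data of a trending stock.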

107
00:08:11,620 --> 00:08:17,410
So next, we have the same code, which turns our time series into a supervised learning dataset.

108
00:08:17,530 --> 00:08:22,840
Basically it's going to take windows of length 10 and the target will be the next value.

109
00:08:22,840 --> 00:08:27,310
So conceptually this is no different from our sine wave time series.

110
00:08:27,310 --> 00:08:29,550
The only difference is the data itself.

111
00:08:29,680 --> 00:08:35,140
It is no longer a sine wave but real data from a real stock.

112
00:08:35,500 --> 00:08:37,890
But again all data is the same.

113
00:08:38,080 --> 00:08:40,800
The algorithms we are using don't care about that.

114
00:08:40,960 --> 00:08:42,970
The code is still exactly the same.
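
Here is a sketch of the windowing step with T = 10, plus one way to split the samples into halves and compute the exact train boundary mentioned earlier. The function name `make_supervised` and the stand-in series are assumptions for illustration, not the notebook's own code.

```python
import numpy as np


def make_supervised(series: np.ndarray, T: int = 10):
    """Turn a 1-D series into (X, Y): each row of X is T past values, Y is the next value."""
    X = np.array([series[t : t + T] for t in range(len(series) - T)])  # shape (N - T, T)
    Y = np.array([series[t + T] for t in range(len(series) - T)])      # shape (N - T,)
    return X, Y


series = np.arange(30, dtype=float)  # stand-in for the standardized closing prices
T = 10
X, Y = make_supervised(series, T)

# First half of the samples for training, second half for testing.
n_train = len(X) // 2
X_train, Y_train = X[:n_train], Y[:n_train]
X_test, Y_test = X[n_train:], Y[n_train:]

# The last training target is series[n_train - 1 + T], so the training data
# actually covers series[: n_train + T], not exactly series[: len(series) // 2].
train_boundary = n_train + T
print(X.shape, Y.shape, train_boundary, len(series) // 2)
```

The gap between `train_boundary` and `len(series) // 2` is the "off by a little bit" from the lecture: a handful of values that barely move the mean and variance.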

115
00:08:48,640 --> 00:08:49,240
Next.

116
00:08:49,300 --> 00:08:52,720
Right off the bat, we are going to use an LSTM network.

117
00:08:52,750 --> 00:08:58,150
Now you could attempt to use a linear model or a simple RNN, but we are going to skip that in the

118
00:08:58,150 --> 00:09:02,500
spirit of following what these other resources are doing. As an exercise,

119
00:09:02,500 --> 00:09:05,620
you might want to try those approaches and see how well they work.
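
For the linear-model exercise, one option is an autoregressive model fit by ordinary least squares: predict the next value as a weighted sum of the previous T values plus a bias. This is a sketch under assumptions (hypothetical function names, NumPy's `lstsq` in place of a neural network), sanity-checked on a noiseless sine wave, which a linear autoregression can predict almost exactly.

```python
import numpy as np


def fit_linear_ar(X_train: np.ndarray, Y_train: np.ndarray) -> np.ndarray:
    """Least-squares fit of Y ~ X @ w + b; returns the parameters stacked as [w..., b]."""
    X_aug = np.hstack([X_train, np.ones((len(X_train), 1))])  # append a bias column
    params, *_ = np.linalg.lstsq(X_aug, Y_train, rcond=None)
    return params


def predict_linear_ar(params: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Apply the fitted weights and bias to each length-T window."""
    return X @ params[:-1] + params[-1]


# Sanity check on a noiseless sine wave.
T = 10
series = np.sin(0.1 * np.arange(200))
X = np.array([series[t : t + T] for t in range(len(series) - T)])
Y = series[T:]
n_train = len(X) // 2
params = fit_linear_ar(X[:n_train], Y[:n_train])
test_mse = float(np.mean((predict_linear_ar(params, X[n_train:]) - Y[n_train:]) ** 2))
print(f"test MSE on the sine wave: {test_mse:.2e}")
```

To try it on the stock data, pass in the windowed `X` and `Y` built from the standardized closing prices; the interesting question is whether it does any better than simply copying the last value.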

120
00:09:09,150 --> 00:09:15,780
So after defining the model, I'm going to instantiate it, and I'm going to move the model to the GPU.

121
00:09:17,060 --> 00:09:22,460
Then I create the loss and optimizer, and then I'm going to define a training function.

122
00:09:25,760 --> 00:09:34,140
Then we're going to do the usual train-test split, we're going to move the data to the GPU, and finally

123
00:09:34,140 --> 00:09:46,150
we train the model. All right, so let's scroll down. The loss here is the loss per iteration.

124
00:09:46,890 --> 00:09:50,600
So we see that the loss does decrease which is promising.

125
00:09:50,700 --> 00:09:56,970
That means our LSTM network can indeed predict something close to the next value in the time series.

126
00:10:01,940 --> 00:10:09,850
And now we have our one-step forecast. Again, this looks pretty darn good. If you are just learning deep

127
00:10:09,850 --> 00:10:12,110
learning and you encounter something like this,

128
00:10:12,280 --> 00:10:16,000
you're probably thinking to yourself: man, I just hit the jackpot.

129
00:10:16,480 --> 00:10:18,340
But then you have to ask yourself.

130
00:10:18,400 --> 00:10:23,080
There must be other people out there in the world who are experts in deep learning.

131
00:10:23,080 --> 00:10:26,740
Are they making millions or billions off this approach?

132
00:10:26,740 --> 00:10:29,870
You can ask yourself this question even without knowing deep learning.

133
00:10:29,890 --> 00:10:32,490
This is just logical reasoning.

134
00:10:32,710 --> 00:10:37,660
Thus we can assume that in reality something tricky is happening here.

135
00:10:37,720 --> 00:10:41,070
Remember that one-step forecasts are not really that useful.

136
00:10:41,410 --> 00:10:47,260
What we would really like to do is forecast for multiple time steps, even just two or three.

137
00:10:47,320 --> 00:10:53,730
So let's see what happens.

138
00:10:53,750 --> 00:10:55,730
So here's our multi-step forecast

139
00:10:59,930 --> 00:11:06,120
And that's it for the second half of the dataset.
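
The difference between the two kinds of forecast comes down to what goes into the input window. In a one-step forecast every window holds true past values; in a multi-step forecast each prediction is appended to the window and fed back in. The sketch below uses a hypothetical stand-in model, `copy_last`, that simply repeats the last value it sees, which is enough to show how a multi-step curve can come out flat.

```python
import numpy as np
from typing import Callable

# A predictor maps one length-T input window to the next value.
Predictor = Callable[[np.ndarray], float]


def one_step_forecast(predict: Predictor, X_test: np.ndarray) -> np.ndarray:
    """Every input window contains only true past values from the data."""
    return np.array([predict(x) for x in X_test])


def multi_step_forecast(predict: Predictor, first_window: np.ndarray, n_steps: int) -> np.ndarray:
    """Each prediction is fed back in as input, so errors can compound."""
    window = np.array(first_window, dtype=float)
    preds = []
    for _ in range(n_steps):
        y_hat = predict(window)
        preds.append(y_hat)
        window = np.append(window[1:], y_hat)  # drop the oldest value, append the prediction
    return np.array(preds)


def copy_last(window: np.ndarray) -> float:
    """Hypothetical stand-in for a trained model: repeat the most recent value."""
    return float(window[-1])


series = np.array([1.0, 2.0, 4.0, 3.0, 5.0, 6.0, 8.0])
T = 3
X_test = np.array([series[t : t + T] for t in range(len(series) - T)])
Y_test = series[T:]
print(one_step_forecast(copy_last, X_test), Y_test)            # follows the data, one step behind
print(multi_step_forecast(copy_last, X_test[0], len(Y_test)))  # a flat line
```

Swapping `copy_last` for a real model's prediction function leaves both loops unchanged; the point is that a one-step forecast always gets to see the true previous value, while a multi-step forecast does not.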

140
00:11:06,140 --> 00:11:06,410
All right.

141
00:11:06,440 --> 00:11:10,840
So now we can see what's happening when we do a multi-step forecast.

142
00:11:10,850 --> 00:11:15,730
All we see is a pretty straight line.

143
00:11:15,850 --> 00:11:17,400
What does that mean?

144
00:11:17,410 --> 00:11:21,060
It means the model isn't really predicting the next value in the time series.

145
00:11:21,070 --> 00:11:23,560
It's just copying the previous value.
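
A practical way to check whether a model is doing more than copying the previous value is to compare its error with that of the naive "tomorrow equals today" forecast. The sketch below defines a skill score, 1 - MSE_model / MSE_naive, and evaluates it on a simulated random walk; the function names, the random-walk data, and both toy "models" are hypothetical stand-ins, not the lecture's LSTM.

```python
import numpy as np


def naive_mse(X: np.ndarray, Y: np.ndarray) -> float:
    """MSE of the naive forecast: predict Y[i] with the last value in window X[i]."""
    return float(np.mean((X[:, -1] - Y) ** 2))


def skill_vs_naive(preds: np.ndarray, X: np.ndarray, Y: np.ndarray) -> float:
    """1 - MSE_model / MSE_naive: above 0 beats copying, 0 ties it, below 0 is worse."""
    model_mse = float(np.mean((preds - Y) ** 2))
    return 1.0 - model_mse / naive_mse(X, Y)


# Simulated random walk as a hypothetical stand-in for a price series.
rng = np.random.default_rng(0)
series = np.cumsum(rng.standard_normal(500))
T = 10
X = np.array([series[t : t + T] for t in range(len(series) - T)])
Y = series[T:]

copy_preds = X[:, -1]         # a "model" that just copies the previous value
mean_preds = X.mean(axis=1)   # a "model" that predicts the window average
print(f"copy-last skill:   {skill_vs_naive(copy_preds, X, Y):+.3f}")
print(f"window-mean skill: {skill_vs_naive(mean_preds, X, Y):+.3f}")
```

A one-step forecast that looks impressive on a plot can still score a skill near zero, which is the kind of situation described here.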

146
00:11:23,770 --> 00:11:26,860
And that actually makes a lot of sense for a time series.

147
00:11:26,860 --> 00:11:33,190
Recall I said earlier that for images and time series signals, the data is highly correlated.

148
00:11:33,220 --> 00:11:38,050
If you're looking at a red car and you find a red pixel, what are the neighboring pixel values?

149
00:11:38,050 --> 00:11:39,720
Probably also red.

150
00:11:39,910 --> 00:11:45,400
And if you're looking at a time series this time series is probably a smooth function over time.
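
The claim that neighboring values are highly correlated can be measured with the lag-1 autocorrelation. This sketch compares independent noise with a random walk built from that same noise; the random walk is only an assumed stand-in for a price series, not the Starbucks data.

```python
import numpy as np


def lag1_autocorr(x: np.ndarray) -> float:
    """Correlation between each value and the value right before it."""
    return float(np.corrcoef(x[:-1], x[1:])[0, 1])


rng = np.random.default_rng(42)
noise = rng.standard_normal(1000)   # independent values: neighbors tell you nothing
random_walk = np.cumsum(noise)      # price-like: each value is the previous one plus a small step
print(f"white noise: {lag1_autocorr(noise):+.2f}")
print(f"random walk: {lag1_autocorr(random_walk):+.2f}")
```

A series whose lag-1 autocorrelation is close to 1 is one where "predict the previous value" already scores well, which is why a smooth, copy-like forecast is so easy for a model to fall into.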

151
00:11:45,400 --> 00:11:48,970
And we can see that that's exactly the model we've created.

152
00:11:48,970 --> 00:11:56,600
This is a smooth orange line. It's not going to jump around from, say, minus one to plus two and back down

153
00:11:56,600 --> 00:12:01,330
to minus three and so forth.

154
00:12:01,330 --> 00:12:07,060
Now, one thing that is nice about this, the last time I ran this example, is that it did capture the trend

155
00:12:07,060 --> 00:12:15,650
that it is going to go up, but other than that it didn't really do anything useful. So it turns out that

156
00:12:15,680 --> 00:12:18,010
this model is not really that great.

157
00:12:18,110 --> 00:12:19,880
In fact it's not really doing much.