1
00:00:11,660 --> 00:00:17,450
Finally, we're going to move on to the next example, where we see how to actually build this model right.

2
00:00:17,570 --> 00:00:20,550
By the way, there isn't really anything wrong with the second model.

3
00:00:20,570 --> 00:00:22,570
It just wasn't a good model.

4
00:00:22,580 --> 00:00:27,860
This is unlike the first model, where there actually were things wrong with it, because it gave us misleading

5
00:00:27,860 --> 00:00:30,810
results and also predicted the wrong thing.

6
00:00:30,920 --> 00:00:37,160
In this final example, we are going to make use of all the data: open, close, high, and low prices, as well

7
00:00:37,160 --> 00:00:37,910
as the volume.

8
00:00:38,600 --> 00:00:43,910
So we are going to have a T by D input, where T equals 10 and D equals 5.

9
00:00:43,940 --> 00:00:48,290
And from this we are not going to try and predict a return.

10
00:00:48,290 --> 00:00:51,920
Instead, we turn this into the simplest problem possible.

11
00:00:51,920 --> 00:00:56,580
Just try to predict whether the price will go up or down.

12
00:00:56,600 --> 00:01:02,720
In other words, binary classification. If you've worked in machine learning long enough, you should have

13
00:01:02,720 --> 00:01:04,600
some intuition about this.

14
00:01:04,970 --> 00:01:10,750
Regression is, in general, a harder problem than classification. With regression,

15
00:01:10,800 --> 00:01:13,300
you have to predict a continuous value.

16
00:01:13,410 --> 00:01:15,470
You can't be too high or too low.

17
00:01:15,510 --> 00:01:22,780
You have to be right on the nose. Classification is easier, especially binary classification.

18
00:01:23,120 --> 00:01:25,240
You don't have to predict the exact number.

19
00:01:25,250 --> 00:01:26,890
You just have to predict a label.

20
00:01:27,020 --> 00:01:29,750
In this case, you only have two choices: up or down.

21
00:01:30,350 --> 00:01:34,910
So this is what we are usually trying to predict when it comes to stock returns.

22
00:01:34,910 --> 00:01:41,330
Let's get back to the code.

23
00:01:41,510 --> 00:01:47,180
First, we are going to start by getting all the data into NumPy arrays. So the input data is going

24
00:01:47,180 --> 00:01:52,380
to be made up of the open, high, low, and close prices, as well as the volume.

25
00:01:52,550 --> 00:01:58,620
The target is going to be based on the return.

26
00:01:58,870 --> 00:02:01,340
Next, we are going to set N, T, and D,

27
00:02:01,660 --> 00:02:04,500
since our data is of shape N by T by D.

28
00:02:04,840 --> 00:02:06,640
T is simple; we can just choose that.

29
00:02:06,640 --> 00:02:13,720
So let's stick with 10, although in the future you may want to try different values like 20, 30, 40, 50,

30
00:02:13,720 --> 00:02:17,200
and so on. D is also simple.

31
00:02:17,330 --> 00:02:22,610
It's just the number of columns in the input data. N is a little tricky.

32
00:02:22,610 --> 00:02:28,670
It's the length of the series minus big T, although this should remind you of each of our time series

33
00:02:28,670 --> 00:02:30,430
examples where we did the same thing.

34
00:02:30,530 --> 00:02:32,960
We just never calculated N explicitly.
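
The three sizes described above can be sketched on a toy series. This is only an illustration: the `series` list is made-up data standing in for the lecture's NumPy array, and plain Python lists are used to keep the sketch dependency-free.

```python
# Hypothetical toy series: 15 time steps, 5 columns
# (open, high, low, close, volume). The values are placeholders.
series = [[float(i + j) for j in range(5)] for i in range(15)]

T = 10                # window length, chosen by hand
D = len(series[0])    # number of input columns
N = len(series) - T   # number of (window, target) pairs

print(N, T, D)
```

N is the series length minus T because each sample needs T input steps plus one later step to supply its target.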

35
00:02:39,440 --> 00:02:42,100
Next, we're going to normalize the inputs.

36
00:02:42,260 --> 00:02:47,420
This time, we're going to make it a little easier on our model. Instead of the first half being the train

37
00:02:47,420 --> 00:02:49,940
set and the second half being the test set,

38
00:02:50,000 --> 00:02:55,660
let's make the first two thirds the train set and the last third the test set. So, as usual, we're going

39
00:02:55,660 --> 00:02:57,920
to normalize our data.

40
00:02:57,940 --> 00:03:11,900
This is especially important now, since the values in the volume column are very large compared to the price columns.
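
As a rough sketch of why normalization matters here, the snippet below standardizes each column to zero mean and unit variance. Fitting the statistics on the first two thirds only is an assumption made for this sketch (a common way to avoid leaking test data), not something the lecture states; the `standardize` helper and the toy `rows` are hypothetical.

```python
from statistics import fmean, pstdev

def standardize(rows, n_fit):
    """Scale each column using the mean and std of the first n_fit rows only."""
    cols = list(zip(*rows[:n_fit]))
    means = [fmean(c) for c in cols]
    stds = [pstdev(c) or 1.0 for c in cols]  # fall back to 1.0 for constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

# Hypothetical toy rows: [close price, volume]. Volume is about 100,000x larger.
rows = [[10.0, 1_000_000.0], [11.0, 3_000_000.0], [12.0, 2_000_000.0],
        [13.0, 5_000_000.0], [12.5, 4_000_000.0], [14.0, 6_000_000.0]]

n_fit = len(rows) * 2 // 3           # first two thirds -> 4 rows
scaled = standardize(rows, n_fit)    # both columns now live on a similar scale
```

After scaling, price and volume have comparable magnitudes, so neither column dominates the model's inputs purely because of its units.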

41
00:03:11,940 --> 00:03:18,840
Next, we create our train set. X train is of shape N train by T by D, and Y train is of shape N train.

42
00:03:19,770 --> 00:03:26,760
Then we do a loop where little t counts up from zero up to N train, and we populate X train and Y train. For

43
00:03:26,760 --> 00:03:33,100
X, we take the input data from little t up to little t plus big T. For Y,

44
00:03:33,100 --> 00:03:38,530
we have a boolean, which is true if the return at little t plus big T is bigger than 0.
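
The loop just described can be sketched as follows. The toy `inputs` and `returns` are made up, T is shortened to 4 to keep the example small, and lists stand in for the preallocated arrays the lecture describes.

```python
# Hypothetical toy data: 14 time steps, D = 2 columns, and one return per step.
inputs = [[float(i), float(i) * 2.0] for i in range(14)]
returns = [0.01, -0.02, 0.03, -0.01, 0.02, -0.03, 0.01,
           0.02, -0.01, 0.04, -0.02, 0.01, -0.03, 0.02]

T = 4
D = len(inputs[0])
N = len(inputs) - T       # 10 samples in total
N_train = N * 2 // 3      # first two thirds -> 6

X_train, Y_train = [], []
for t in range(N_train):
    X_train.append(inputs[t:t + T])      # T rows of D features: steps t .. t+T-1
    Y_train.append(returns[t + T] > 0)   # True if the return at step t+T is positive

print(len(X_train), len(X_train[0]), len(X_train[0][0]))  # N_train, T, D
```

Note that the slice `inputs[t:t + T]` stops just before step t + T, so the target step is never part of its own input window.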

45
00:03:44,950 --> 00:03:46,770
Next, we create the test set.

46
00:03:47,080 --> 00:03:55,450
As you might expect, N test is N minus N train. So X test is of size N minus N train by T by

47
00:03:55,450 --> 00:04:00,410
D, and Y test is of shape N minus N train.

48
00:04:00,420 --> 00:04:08,390
Next, we enter a loop counting from zero up to N minus N train, or equivalently, N test. Inside the

49
00:04:08,390 --> 00:04:08,600
loop,

50
00:04:08,600 --> 00:04:14,900
we actually have two indices. u refers to the index of X test and Y test, which of course will start

51
00:04:14,900 --> 00:04:22,980
from zero. But remember, we are also indexing our original dataset, which must be offset by N train.

52
00:04:22,980 --> 00:04:29,730
So little t is equal to u plus N train, and we use little t to index the original dataset, whereas we use u to

53
00:04:29,730 --> 00:04:31,560
index X test and Y test.
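
The two-index bookkeeping can be sketched like this. The data is a made-up toy series with a single column, and the names `u`, `t`, `N_train`, and `N_test` follow the spoken description rather than any confirmed source code.

```python
# Hypothetical toy data: 14 time steps, D = 1 column, and one return per step.
inputs = [[float(i)] for i in range(14)]
returns = [0.01, -0.02, 0.03, -0.01, 0.02, -0.03, 0.01,
           0.02, -0.01, 0.04, -0.02, 0.01, -0.03, 0.02]

T = 4
N = len(inputs) - T        # 10 samples in total
N_train = N * 2 // 3       # 6
N_test = N - N_train       # 4

X_test = [None] * N_test
Y_test = [None] * N_test
for u in range(N_test):
    t = u + N_train                    # position in the original series
    X_test[u] = inputs[t:t + T]        # u indexes the test arrays
    Y_test[u] = returns[t + T] > 0     # t indexes the original data

print(X_test[0], Y_test)
```

The first test window starts at step N train, right where the train windows stop, so no window is shared between the two sets.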

54
00:04:36,380 --> 00:04:39,320
Once we're done with that, we can make our RNN.

55
00:04:39,530 --> 00:04:46,820
Importantly, remember that we are now doing binary classification. What this means is that our model outputs

56
00:04:46,820 --> 00:04:51,350
are still the same, because no matter what, our final layer will be a linear layer.

57
00:04:51,350 --> 00:04:59,060
What makes it different is the loss function. Our loss is now the binary cross-entropy. So let's run this

58
00:04:59,150 --> 00:05:00,310
and train our model.
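
Because the final layer stays linear, the loss has to accept raw scores (logits) and apply the sigmoid itself. Below is a minimal sketch of such a loss in plain Python, using the usual numerically stable formula. It is similar in spirit to PyTorch's `BCEWithLogitsLoss`, but whether the lecture's code uses that exact class is an assumption, and `bce_with_logits` is a hypothetical helper name.

```python
import math

def bce_with_logits(logits, targets):
    """Mean binary cross-entropy computed directly from raw logits.

    Per sample: max(z, 0) - z*y + log(1 + exp(-|z|)),
    which equals -[y*log(sigmoid(z)) + (1-y)*log(1-sigmoid(z))].
    """
    total = 0.0
    for z, y in zip(logits, targets):
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)

# A logit of 0 means "50/50", so the loss is log(2) whatever the label is.
print(bce_with_logits([0.0, 0.0], [1.0, 0.0]))

# A confident, correct prediction gives a small loss; a confident, wrong one a large loss.
print(bce_with_logits([5.0], [1.0]), bce_with_logits([5.0], [0.0]))
```

To turn a logit into a label, you would predict "up" when the logit is greater than zero, which corresponds to a sigmoid output above 0.5.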

59
00:05:06,010 --> 00:05:09,350
Note that we train for quite a while: 300 epochs.

60
00:05:15,960 --> 00:05:17,830
So what do we see here?

61
00:05:17,970 --> 00:05:24,210
We see that the train loss goes down a little bit, but the validation loss goes up a lot.

62
00:05:24,870 --> 00:05:29,750
So what this tells us is that the model is, again, just overfitting to the noise in the train set.

63
00:05:33,030 --> 00:05:35,330
So that's what it means when the train loss goes down

64
00:05:35,340 --> 00:05:37,220
but the validation loss goes up.

65
00:05:40,570 --> 00:05:44,680
If we look at the accuracy, we see the same thing.

66
00:05:44,950 --> 00:05:50,610
The train accuracy goes up but the test accuracy stays at 50 percent.

67
00:05:50,620 --> 00:05:54,870
Now you might ask, why doesn't the test accuracy go down?

68
00:05:54,940 --> 00:06:01,570
One important thing to realize about binary classification is that zero accuracy is not the worst accuracy.

69
00:06:02,350 --> 00:06:04,420
If your accuracy is zero,

70
00:06:04,450 --> 00:06:08,530
that means you can just reverse all your predictions and you would get one hundred percent.

71
00:06:08,620 --> 00:06:10,870
That would be a perfect model.

72
00:06:10,870 --> 00:06:17,500
In other words, for binary classification, the worst accuracy is actually 50 percent, meaning it's no better

73
00:06:17,500 --> 00:06:19,090
than random guessing.

74
00:06:19,270 --> 00:06:24,580
That's why the test accuracy remains at 50 percent as the model overfits.
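
The point about reversing predictions is easy to check on a toy example. The labels below are made up and the `accuracy` helper is hypothetical; the sketch only shows that a classifier that is always wrong becomes a perfect one once its outputs are flipped.

```python
def accuracy(preds, targets):
    """Fraction of predictions that match the targets."""
    return sum(p == y for p, y in zip(preds, targets)) / len(targets)

# Hypothetical up/down labels (True = up).
targets = [True, False, True, True, False, False]

always_wrong = [not y for y in targets]      # a "0 percent accurate" model
flipped = [not p for p in always_wrong]      # reverse every prediction

print(accuracy(always_wrong, targets))  # 0.0
print(accuracy(flipped, targets))       # 1.0
```

In general, flipping the predictions of a binary classifier with accuracy a gives accuracy 1 - a, so the least useful value is the one in the middle, 50 percent.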

75
00:06:28,590 --> 00:06:30,170
All right, so that's the end of this script.

76
00:06:33,200 --> 00:06:39,380
Let's now discuss and summarize everything we just did, since it was probably a lot to take in. We just

77
00:06:39,380 --> 00:06:44,420
went through three examples of predicting stock returns. In the first example,

78
00:06:44,450 --> 00:06:49,600
we used an LSTM to predict future stock prices from past stock prices.

79
00:06:49,790 --> 00:06:55,760
While this appeared to work well with the one-step forecast, it didn't work so well for multi-step forecasts.

80
00:06:56,600 --> 00:07:02,530
We saw that the model was likely just remembering the previous value and making that its prediction.

81
00:07:02,570 --> 00:07:07,850
This actually makes sense in the context of our second example as well, where we tried to predict the stock

82
00:07:07,850 --> 00:07:09,410
return.

83
00:07:09,470 --> 00:07:14,160
In this example, we saw that we couldn't get the loss to decrease that much.

84
00:07:14,170 --> 00:07:20,230
The third example we did was to consider all the data, not just the previous close prices, and to turn

85
00:07:20,230 --> 00:07:29,060
our model into a binary classifier. We saw that even on the simplest possible task, with the most data,

86
00:07:29,480 --> 00:07:35,670
the model still could not achieve better results than random guessing. What this tells us is that there

87
00:07:35,670 --> 00:07:40,020
isn't any hope of our first two models working in the first place.

88
00:07:40,020 --> 00:07:44,290
Imagine: if you can't even predict whether the stock price will go up or down,

89
00:07:44,340 --> 00:07:50,600
how could you ever predict the numerical value of the return or the next stock price itself?

90
00:07:50,610 --> 00:07:56,220
In other words, if you can't even do the easy task, then surely the harder task is also not going to be

91
00:07:56,220 --> 00:07:56,790
solved.

92
00:08:01,950 --> 00:08:08,310
So be suspicious whenever you encounter an almost perfectly accurate stock price forecast.

93
00:08:08,340 --> 00:08:13,920
If we can't even do binary classification, then you can be sure that forecasting the actual price will

94
00:08:13,920 --> 00:08:19,250
not be so simple. By the way, these results should not be surprising.

95
00:08:19,260 --> 00:08:25,380
I asked you earlier to keep in mind the previous examples we did in this section. If you recall, we had

96
00:08:25,380 --> 00:08:29,010
trouble getting an RNN to predict a noisy sine wave.

97
00:08:29,010 --> 00:08:32,070
The underlying function in a noisy sine wave is a sine wave.

98
00:08:32,670 --> 00:08:39,720
It's a deterministic signal. You can imagine that if we have problems even with that, then in comparison,

99
00:08:39,750 --> 00:08:41,940
stock prices should be a huge problem.

100
00:08:47,090 --> 00:08:52,430
The major problem with trying to predict stock prices from stock prices is that it's a fundamentally

101
00:08:52,430 --> 00:08:54,030
flawed approach.

102
00:08:54,170 --> 00:08:58,190
These approaches don't take into account data about the world itself.

103
00:08:58,220 --> 00:09:01,010
Why does a stock price go up or down?

104
00:09:01,010 --> 00:09:05,080
Many times it's due to some occurrence in the real world.

105
00:09:05,090 --> 00:09:10,700
Imagine how ludicrous it would be to say, "Here are some historical stock prices. From this,

106
00:09:10,770 --> 00:09:12,780
the future is totally deterministic.

107
00:09:12,800 --> 00:09:15,490
We can predict everything that will happen." Now,

108
00:09:15,510 --> 00:09:18,380
that is a pretty crazy line of thought.

109
00:09:18,380 --> 00:09:25,520
Now consider a more realistic scenario, where stock prices depend on the emotions of investors. That could

110
00:09:25,520 --> 00:09:30,740
be based on things like how the company is being portrayed in the media, whether the company has invented

111
00:09:30,740 --> 00:09:33,140
something new and so on.

112
00:09:33,140 --> 00:09:38,510
Even without knowing about deep learning, this should be enough to make you suspicious of any course

113
00:09:38,510 --> 00:09:43,670
that claims to be able to use LSTMs to predict stock prices using historical data.

114
00:09:44,510 --> 00:09:49,760
And yes, that's what a lot of people are doing out there: forecasting stock prices based on a time series

115
00:09:49,760 --> 00:09:50,200
alone.

116
00:09:53,930 --> 00:09:58,670
Instead, we can think about what actually might cause a stock price to go up or down.

117
00:09:58,700 --> 00:10:01,250
We see examples of this all the time.

118
00:10:01,250 --> 00:10:06,890
For example, when Facebook is being investigated by the government, that causes its stock price to go

119
00:10:06,890 --> 00:10:07,770
down.

120
00:10:08,000 --> 00:10:13,630
When Elon Musk tweets while intoxicated, that makes his company's stock price go down.

121
00:10:13,700 --> 00:10:19,220
But when someone at SpaceX invents a new kind of rocket technology, that makes the stock price go up.

122
00:10:24,360 --> 00:10:25,700
Imagine what you are saying

123
00:10:25,730 --> 00:10:32,120
if you try to use historical prices alone to predict future stock prices. That's like saying

124
00:10:32,480 --> 00:10:37,850
that, using historical stock prices, you think you can predict whether Facebook will be investigated or fined

125
00:10:37,850 --> 00:10:38,990
by the government,

126
00:10:38,990 --> 00:10:44,660
that Elon Musk might make a stupid tweet, or that some rocket scientist will invent something. To believe

127
00:10:44,660 --> 00:10:50,770
that you could predict such events from looking at stock prices is surely a laughable idea.

128
00:10:50,870 --> 00:10:55,700
That's like saying different events that happen in the world are preordained by stock prices.

129
00:10:55,700 --> 00:10:57,180
Now that is crazy.

130
00:10:57,440 --> 00:11:03,440
If you really want to learn how to do good stock market predictions, that could be a course all by itself.

131
00:11:03,440 --> 00:11:07,970
Never mind a lifetime of study. Finally,

132
00:11:08,060 --> 00:11:13,640
this clearly isn't to say that LSTMs are not good. LSTMs have proven themselves time and

133
00:11:13,640 --> 00:11:19,950
time again on tasks such as language modeling and machine translation. We've already seen how LSTMs

134
00:11:20,000 --> 00:11:27,420
can be used for image classification which is an application you probably hadn't even considered.

135
00:11:27,680 --> 00:11:33,050
So I hope this lecture helps you understand sort of a bigger picture around deep learning and how to

136
00:11:33,050 --> 00:11:35,530
choose the right resources to learn from.

137
00:11:35,540 --> 00:11:38,540
There are a lot of marketers out there trying to get your attention.

138
00:11:38,720 --> 00:11:40,910
So try not to fall into their trap.