1
00:00:11,080 --> 00:00:16,330
So in this lecture, we are going to do stock predictions once again, but this time our objective is

2
00:00:16,330 --> 00:00:17,340
slightly different.

3
00:00:17,980 --> 00:00:23,530
Instead of trying to forecast the return and effectively the price, we are going to simply try to predict

4
00:00:23,530 --> 00:00:24,380
the direction.

5
00:00:24,880 --> 00:00:26,680
So that should make the task much easier.

6
00:00:27,310 --> 00:00:32,050
Instead of trying to predict a number which has an infinite number of choices, we're just going to

7
00:00:32,050 --> 00:00:34,640
predict up or down, which is two choices.

8
00:00:35,350 --> 00:00:37,520
Furthermore, to make this even easier,

9
00:00:37,570 --> 00:00:40,390
we will only try to predict one step ahead.

10
00:00:40,900 --> 00:00:43,820
So this is effectively the easiest task possible.

11
00:00:44,380 --> 00:00:46,010
Let's see how machine learning will do.

12
00:00:50,760 --> 00:00:56,050
OK, so notice that for our imports, we are now importing classifiers and not regressors.

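A minimal sketch of those imports, assuming scikit-learn (the transcript says "classifiers" but doesn't name the library). The dictionary name `models` is illustrative only:

```python
# Assumption: scikit-learn; the lecture only says it imports classifiers.
from sklearn.linear_model import LogisticRegression   # linear classifier
from sklearn.svm import SVC                           # support vector classifier
from sklearn.ensemble import RandomForestClassifier   # tree-ensemble classifier

# Hypothetical container: one instance of each classifier used later on.
models = {
    "logistic_regression": LogisticRegression(),
    "svm": SVC(),
    "random_forest": RandomForestClassifier(),
}
```

All three share the `fit` / `predict` / `score` interface, so swapping regressors for classifiers changes little else in the script.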
13
00:00:59,790 --> 00:01:04,110
OK, so, again, we're going to download our data and choose IBM as our stock.

14
00:01:09,890 --> 00:01:15,680
OK, so the new part here is how we create the data set. For this data set, we're again going to

15
00:01:15,680 --> 00:01:19,940
use a stationary time series as input, which comes from the log returns.

16
00:01:20,840 --> 00:01:24,890
The target is simply whether the log return is positive or negative.

17
00:01:25,670 --> 00:01:28,970
An easy way to compute this is to check whether the log return is greater than zero.

18
00:01:29,630 --> 00:01:35,330
Multiplying by one is technically not necessary, but I like my targets to be zero and one instead of

19
00:01:35,330 --> 00:01:36,240
true and false.

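The target described above can be sketched in NumPy like this. The price array is made up for illustration; the lecture uses IBM's downloaded prices instead.

```python
import numpy as np

# Hypothetical prices standing in for IBM's (the lecture downloads real data).
prices = np.array([100.0, 101.0, 100.5, 100.5, 102.0])

# Log returns: log(P_t) - log(P_{t-1}), a roughly stationary series.
log_returns = np.diff(np.log(prices))

# True/False "went up" flags; multiplying by 1 turns them into 1/0.
targets = (log_returns > 0) * 1
print(targets)  # [1 0 0 1]
```

Note that a log return of exactly zero maps to 0, so flat days count as "not up".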
20
00:01:37,430 --> 00:01:40,050
Notice how I'm using 21 lags for this script.

21
00:01:40,580 --> 00:01:46,070
This is just to test the idea that maybe we can get better predictions if we use more lags.

22
00:01:49,880 --> 00:01:55,090
OK, so basically the only difference in our loop now is that when we want to get the target, the target

23
00:01:55,090 --> 00:01:57,170
no longer comes from the time series itself.

24
00:01:57,550 --> 00:01:59,410
Instead, it has been precomputed.

25
00:02:00,130 --> 00:02:02,350
Other than that, it's the same as before.

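A sketch of that loop with precomputed targets, assuming the log returns and targets are NumPy arrays of equal length. The helper `make_lagged_dataset` and the synthetic returns are hypothetical, not the course's code:

```python
import numpy as np

def make_lagged_dataset(series, targets, T):
    """Build (X, Y): each row of X holds T consecutive values of `series`,
    and Y holds the precomputed target for the step right after that window."""
    X, Y = [], []
    for t in range(len(series) - T):
        X.append(series[t:t + T])   # inputs: T consecutive log returns
        Y.append(targets[t + T])    # target: looked up, not taken from the series
    return np.array(X), np.array(Y)

# Hypothetical data: synthetic log returns in place of IBM's.
rng = np.random.default_rng(0)
log_returns = rng.normal(0, 0.01, size=200)
targets = (log_returns > 0) * 1

T = 21  # number of lags, as in the lecture
X, Y = make_lagged_dataset(log_returns, targets, T)
print(X.shape, Y.shape)  # (179, 21) (179,)
```

Each row of `X` is a window of 21 past log returns, and the matching entry of `Y` is the direction of the very next return, which is the one-step-ahead setup described earlier.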
26
00:02:07,890 --> 00:02:11,610
OK, so the next step is to split our data into train and test.

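For time series, the split should keep chronological order so the test set comes after the training data. A sketch, assuming a hold-out of the last 100 rows (the lecture doesn't state the size here) and random stand-in data:

```python
import numpy as np

# Hypothetical lagged dataset: 500 rows of 21 features, binary targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 21))
Y = rng.integers(0, 2, size=500)

# Chronological split with no shuffling: the test rows lie strictly in the "future".
Ntest = 100  # assumed hold-out size
Xtrain, Ytrain = X[:-Ntest], Y[:-Ntest]
Xtest, Ytest = X[-Ntest:], Y[-Ntest:]
print(Xtrain.shape, Xtest.shape)  # (400, 21) (100, 21)
```

`sklearn.model_selection.train_test_split` with `shuffle=False` and `test_size=100` would produce the same ordered split.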
27
00:02:16,200 --> 00:02:18,540
The next step is to train a logistic regression.

28
00:02:22,150 --> 00:02:26,410
OK, so the train accuracy is only about 50 percent, not great.

29
00:02:30,330 --> 00:02:36,780
And the test accuracy is also only about 50 percent, so pretty much just as hard as predicting a coin

30
00:02:36,780 --> 00:02:37,380
toss.

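To see why roughly 50 percent is the number to beat, here is a sketch that fits a logistic regression on synthetic i.i.d. returns, where direction is unpredictable by construction, and compares it with a majority-class baseline. It mirrors the setup but is not a reproduction of the IBM result; scikit-learn and the 200-row test size are assumptions:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for IBM: i.i.d. Gaussian log returns, so the sign of the
# next return is independent of the past and ~50% is the best expected accuracy.
rng = np.random.default_rng(42)
log_returns = rng.normal(0, 0.01, size=1021)

T = 21  # lags
X = np.array([log_returns[t:t + T] for t in range(len(log_returns) - T)])
Y = (log_returns[T:] > 0) * 1  # direction of the step after each window

# Chronological split: the last 200 rows form the test set (assumed size).
Ntest = 200
Xtrain, Ytrain = X[:-Ntest], Y[:-Ntest]
Xtest, Ytest = X[-Ntest:], Y[-Ntest:]

model = LogisticRegression()
model.fit(Xtrain, Ytrain)
train_acc = model.score(Xtrain, Ytrain)  # mean accuracy on the training rows
test_acc = model.score(Xtest, Ytest)

# Baseline: always predict the training set's majority class.
majority = int(Ytrain.mean() >= 0.5)
baseline_acc = (Ytest == majority).mean()

print(f"train {train_acc:.3f}  test {test_acc:.3f}  baseline {baseline_acc:.3f}")
```

If the test accuracy doesn't clearly beat the baseline, the model hasn't learned anything useful about direction.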
31
00:02:40,880 --> 00:02:43,310
The next step is to train a support vector machine.

32
00:02:47,010 --> 00:02:51,140
So this does better on the train set, but let's see whether or not it does better on the test set.

33
00:02:54,930 --> 00:03:01,050
Unfortunately, it does not do better on the test set, so we can conclude that the model just overfits.

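Overfitting shows up as a gap between train and test accuracy. A sketch of that check for an RBF-kernel SVM (scikit-learn's default `SVC` kernel), again on synthetic noise returns rather than the IBM data; the split size is an assumption:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic i.i.d. log returns standing in for IBM's.
rng = np.random.default_rng(42)
log_returns = rng.normal(0, 0.01, size=1021)

T = 21
X = np.array([log_returns[t:t + T] for t in range(len(log_returns) - T)])
Y = (log_returns[T:] > 0) * 1

Ntest = 200  # assumed chronological hold-out
Xtrain, Ytrain = X[:-Ntest], Y[:-Ntest]
Xtest, Ytest = X[-Ntest:], Y[-Ntest:]

model = SVC()  # defaults: kernel="rbf", C=1.0, gamma="scale"
model.fit(Xtrain, Ytrain)
svm_train_acc = model.score(Xtrain, Ytrain)
svm_test_acc = model.score(Xtest, Ytest)

# A large positive gap (train much better than test) is the signature of overfitting.
gap = svm_train_acc - svm_test_acc
print(f"train {svm_train_acc:.3f}  test {svm_test_acc:.3f}  gap {gap:.3f}")
```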
34
00:03:03,960 --> 00:03:06,090
The next step is to try a random forest.

35
00:03:10,220 --> 00:03:15,770
OK, so the random forest gets 100 percent on the train set, which is not surprising given what we

36
00:03:15,770 --> 00:03:16,340
have seen.

37
00:03:20,770 --> 00:03:23,870
But the test set is, again, very close to 50 percent.

38
00:03:24,400 --> 00:03:26,560
So, again, just an overfit model.

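A sketch of the same train-versus-test comparison for a random forest, assuming scikit-learn's `RandomForestClassifier` with default settings (fully grown trees) and synthetic noise returns. Fully grown trees can memorize the training set, which is why train accuracy lands near 100 percent:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic i.i.d. log returns standing in for IBM's.
rng = np.random.default_rng(42)
log_returns = rng.normal(0, 0.01, size=1021)

T = 21
X = np.array([log_returns[t:t + T] for t in range(len(log_returns) - T)])
Y = (log_returns[T:] > 0) * 1

Ntest = 200  # assumed chronological hold-out
Xtrain, Ytrain = X[:-Ntest], Y[:-Ntest]
Xtest, Ytest = X[-Ntest:], Y[-Ntest:]

# Defaults: 100 trees, max_depth=None, so each tree grows until its leaves are pure.
model = RandomForestClassifier(random_state=0)
model.fit(Xtrain, Ytrain)
rf_train_acc = model.score(Xtrain, Ytrain)  # near 1.0: the forest memorizes
rf_test_acc = model.score(Xtest, Ytest)     # near 0.5: nothing generalizes
print(f"train {rf_train_acc:.3f}  test {rf_test_acc:.3f}")
```

Limiting `max_depth` or raising `min_samples_leaf` reduces this memorization, though on unpredictable data it cannot push test accuracy above chance.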
39
00:03:27,830 --> 00:03:33,770
OK, so what we can conclude from this is that the linear model, logistic regression, was the only model that did not overfit.

40
00:03:35,700 --> 00:03:40,830
Now, again, I want to mention that the exercises and things you can try in this course are nearly

41
00:03:40,830 --> 00:03:41,430
endless.

42
00:03:42,150 --> 00:03:46,580
You might be wondering: what if I included other columns, such as open, high, and volume?

43
00:03:47,160 --> 00:03:49,540
What if I included the returns of other stocks?

44
00:03:50,040 --> 00:03:52,230
So these are all interesting things to try.

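Adding inputs mostly means adding columns to `X`. A sketch, assuming each extra series (say, another stock's log returns) is already aligned by date with IBM's; both series here are synthetic and `lag_matrix` is a hypothetical helper:

```python
import numpy as np

def lag_matrix(series, T):
    # Row t holds series[t], ..., series[t + T - 1].
    return np.array([series[t:t + T] for t in range(len(series) - T)])

# Hypothetical inputs: IBM log returns plus one extra, date-aligned series.
rng = np.random.default_rng(1)
ibm_returns = rng.normal(0, 0.01, size=300)
other_returns = rng.normal(0, 0.01, size=300)

T = 21
# Place the lag windows side by side: 21 columns per input series.
X = np.hstack([lag_matrix(ibm_returns, T), lag_matrix(other_returns, T)])
Y = (ibm_returns[T:] > 0) * 1  # the target is still IBM's next-step direction
print(X.shape, Y.shape)  # (279, 42) (279,)
```

The train/test split and model-fitting code stay the same; only the number of input columns changes.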
45
00:03:52,620 --> 00:03:56,610
You should be able to test out these ideas with pretty much no new code.

46
00:03:57,600 --> 00:04:01,220
So, as always, keep trying new things and I'll see you in the next lecture.
