1
00:00:11,920 --> 00:00:14,180
OK so that's lesson number one.

2
00:00:14,180 --> 00:00:19,310
The lesson is that it's actually very unconventional to try and predict the stock price in the first

3
00:00:19,310 --> 00:00:20,450
place.

4
00:00:20,510 --> 00:00:25,220
Instead what we really want to try and predict is the stock return.

5
00:00:25,400 --> 00:00:30,020
The return is defined as follows for a given period of time.

6
00:00:30,020 --> 00:00:35,210
The return is the final price minus the initial price, all divided by the initial price.

7
00:00:40,280 --> 00:00:42,090
Now, just for some intuition:

8
00:00:42,090 --> 00:00:45,750
This is the same formula you would use when you want to calculate a sale.

9
00:00:45,750 --> 00:00:51,780
So for example, what does it mean when we say something is 20 percent off? Intuitively, you know that if

10
00:00:51,780 --> 00:00:57,030
something costs 100 dollars and it's 20 percent off then you'll pay 80 dollars.

11
00:00:57,030 --> 00:00:59,760
But we can also calculate this another way.

12
00:00:59,760 --> 00:01:03,460
Take 80, subtract 100, and divide by 100.

13
00:01:03,480 --> 00:01:05,700
That's minus 20 percent.

14
00:01:05,730 --> 00:01:11,030
In other words the price you pay is 20 percent less than the original price.
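
Just as a quick sketch of that sale example in Python (the function name `simple_return` is made up here for illustration, it's not from the lesson's code):

```python
def simple_return(initial_price, final_price):
    """Return over one period: (final - initial) / initial."""
    return (final_price - initial_price) / initial_price


# The "20 percent off" example: original price 100, you pay 80.
sale = simple_return(100.0, 80.0)
print(f"{sale:.0%}")  # prints -20%
```

The same function applies to a stock: the initial price is the earlier close and the final price is the later close.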

15
00:01:11,040 --> 00:01:18,030
So to summarize, when financial engineers do quote-unquote "stock prediction", they are usually looking

16
00:01:18,120 --> 00:01:25,010
at some form of return not actual prices.

17
00:01:25,060 --> 00:01:30,950
OK, so back to the code. Since we're working with a pandas DataFrame, this is somewhat convenient, at

18
00:01:30,950 --> 00:01:33,320
least if you know the right functions to call.

19
00:01:36,260 --> 00:01:42,260
As usual, we want to process our data in a vectorized form, meaning we do some operation on all the columns

20
00:01:42,260 --> 00:01:43,570
at once.

21
00:01:43,670 --> 00:01:49,820
So first we need to shift the closing price up by 1 so that yesterday's closing price is aligned

22
00:01:49,820 --> 00:01:51,600
with today's closing price.

23
00:01:51,770 --> 00:01:56,530
So x of 2 is beside x of 1, x of 3 is beside x of 2, and so on.

24
00:01:56,900 --> 00:02:04,840
Then we can subtract them all in one step. So let's do df.head() to look at the result. Now, as you can

25
00:02:04,840 --> 00:02:10,930
see, the first row is NaN because we don't know the previous price for that row, since it's not in

26
00:02:10,930 --> 00:02:12,030
our dataset.

27
00:02:12,400 --> 00:02:17,470
This is not a problem for us because we are building an RNN. It's going to take the first 10 days

28
00:02:17,470 --> 00:02:20,110
of data to predict the eleventh day.

29
00:02:20,110 --> 00:02:24,020
So that would be from here up to the tenth row.

30
00:02:24,100 --> 00:02:27,170
So we never even use this value.

31
00:02:27,170 --> 00:02:30,480
All right so the 11th row is much further away than the first row.

32
00:02:30,500 --> 00:02:37,660
So if the first row has any NaNs, as long as they're not in the target, that's fine.

33
00:02:37,690 --> 00:02:39,850
Next we do our return calculation.

34
00:02:40,060 --> 00:02:48,160
So we say the df return is equal to the df close minus the df previous close, divided by the df previous close.
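
Here's a minimal sketch of the shift and the return calculation on a toy DataFrame. The column names `close`, `PrevClose` and `Return` and the prices are assumptions for illustration, since the transcript doesn't spell them out:

```python
import numpy as np
import pandas as pd

# Toy closing prices (made-up values, not the lesson's dataset).
df = pd.DataFrame({"close": [100.0, 80.0, 88.0, 88.0]})

# shift(1) moves every value one row later, so each row now also
# holds the previous row's close. The first row has no previous
# close, so it becomes NaN.
df["PrevClose"] = df["close"].shift(1)

# Vectorized return: one operation over the whole column.
df["Return"] = (df["close"] - df["PrevClose"]) / df["PrevClose"]

print(df.head())
```

The first `Return` is NaN for the same reason the first `PrevClose` is: there is no earlier price in the data.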

35
00:02:48,160 --> 00:02:48,600
Let's do

36
00:02:48,610 --> 00:02:54,760
df.head() again to look at these returns.

37
00:02:54,920 --> 00:02:57,630
So again, the first value is NaN.

38
00:02:57,710 --> 00:02:58,340
But we don't care.

39
00:03:07,180 --> 00:03:09,850
Next we do a plot of this time series.

40
00:03:09,970 --> 00:03:13,860
Conceptually this is no different from the stock price or a sine wave.

41
00:03:13,870 --> 00:03:19,750
It's still just a time series, but as you're beginning to see, some time series are harder to learn from

42
00:03:19,750 --> 00:03:20,490
than others.

43
00:03:29,670 --> 00:03:35,760
Next we draw a histogram so we can see the distribution of returns. We can see that these are pretty

44
00:03:35,760 --> 00:03:36,590
small values.

45
00:03:36,600 --> 00:03:43,990
So we might want to normalize them. We're going to go through the same thing again, where we assign

46
00:03:43,990 --> 00:03:49,570
the returns to a variable called series as an N by 1 matrix.

47
00:03:49,570 --> 00:03:56,530
Note that here I'm taking .values at index 1 and up because the zeroth index is a NaN. As before,

48
00:03:56,590 --> 00:04:02,800
we create a StandardScaler, call fit on the first half, and call transform on the whole thing, and

49
00:04:02,800 --> 00:04:14,360
then flatten it back to a length-N vector.
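
A rough sketch of that normalization step, assuming scikit-learn's `StandardScaler`. The `returns` array here is random stand-in data, not the lesson's actual returns:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Stand-in for the return column's values: first entry is NaN,
# the rest are small made-up numbers.
rng = np.random.default_rng(0)
returns = np.concatenate([[np.nan], rng.normal(0.0, 0.01, size=200)])

# Skip the NaN at index 0 and reshape to N x 1, the 2-D shape the
# scaler expects.
series = returns[1:].reshape(-1, 1)

# Fit on the first half only, then transform the whole thing.
scaler = StandardScaler()
scaler.fit(series[: len(series) // 2])
series = scaler.transform(series).flatten()  # back to a length-N vector

print(series.shape)
```

Fitting on only the first half presumably keeps statistics from the later (test) portion out of the preprocessing.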

50
00:04:14,390 --> 00:04:19,820
Next we're going to go through all the same steps to build our supervised learning dataset and train

51
00:04:19,820 --> 00:04:25,430
our autoregressive RNN model. This code, again, isn't any different from the previous code.

52
00:04:25,550 --> 00:04:33,950
It's only the data itself that's different, in that now we are working with the series as the return rather

53
00:04:33,950 --> 00:04:38,520
than the actual stock price.
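
For reference, building the supervised dataset could look something like this: each input is 10 consecutive values and the target is the value right after them. The helper name `make_windows` and the N x T x 1 input shape are assumptions for illustration, not necessarily what the lesson's code does:

```python
import numpy as np


def make_windows(series, T=10):
    """Turn a 1-D series into (inputs, targets) for next-step prediction."""
    X, Y = [], []
    for t in range(len(series) - T):
        X.append(series[t : t + T])  # T past values
        Y.append(series[t + T])      # the value that follows them
    # Assumed layout: samples x time steps x features.
    X = np.array(X).reshape(-1, T, 1)
    Y = np.array(Y).reshape(-1, 1)
    return X, Y


# Tiny made-up series so the shapes are easy to check.
series = np.arange(15, dtype=float)
X, Y = make_windows(series, T=10)
print(X.shape, Y.shape)  # (5, 10, 1) (5, 1)
```

With the returns, `series` would be the normalized return vector instead of the price.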

54
00:04:38,560 --> 00:04:48,920
So here's the data, here's our RNN model, and here's the result of training. So the loss starts out at

55
00:04:48,920 --> 00:04:49,760
about 1

56
00:04:54,010 --> 00:05:02,230
and then by the end of training the loss is about 0.87 and the test loss is about 1.5.

57
00:05:02,290 --> 00:05:04,780
So it's pretty clearly overfitting.

58
00:05:09,830 --> 00:05:15,140
What we can see from this is that the model has a much harder time learning anything.

59
00:05:15,230 --> 00:05:21,230
If we look at the loss per iteration, it looks like the loss goes down a tiny bit, but only at the cost

60
00:05:21,230 --> 00:05:23,220
of the validation loss going up.

61
00:05:23,240 --> 00:05:28,690
In other words it's just fitting to the noise.

62
00:05:28,780 --> 00:05:31,320
Next we're going to do a one-step forecast.

63
00:05:35,890 --> 00:05:38,350
And so here are the results.

64
00:05:38,360 --> 00:05:43,190
Now it's hard to tell whether this is good or not, but judging by the previous loss values, it's probably

65
00:05:43,190 --> 00:05:45,400
not good. As an exercise,

66
00:05:45,400 --> 00:05:53,930
you might want to try to run this locally so that you can zoom into the plot and check for yourself.

67
00:05:53,940 --> 00:06:00,140
Next we're going to do a multi-step forecast.

68
00:06:00,350 --> 00:06:02,510
So here are the results for the multi-step forecast.

69
00:06:05,110 --> 00:06:10,870
So again, we have the situation where the model doesn't really match up that well with the actual time

70
00:06:10,870 --> 00:06:11,530
series.
