1
00:00:11,110 --> 00:00:16,810
In this video, we're going to look at how to use ANNs to predict stock prices and stock returns.

2
00:00:17,650 --> 00:00:20,380
Now, normally, I would just say all data is the same.

3
00:00:20,380 --> 00:00:23,610
But there are a few additional details I want to make a note of in this script.

4
00:00:24,100 --> 00:00:26,200
So let's scroll down to where we load in our data.

5
00:00:28,700 --> 00:00:33,950
OK, so for this example, we'll be looking at Starbucks again. There is no special reason for this,

6
00:00:34,130 --> 00:00:35,540
so pick whatever you like.

7
00:00:41,770 --> 00:00:46,960
The next interesting thing we're going to do is standardize the time series. As you can see, we're

8
00:00:46,960 --> 00:00:51,130
doing this on the column diff log close, which is really just the log return.

9
00:00:52,060 --> 00:00:54,580
So why do we want to standardize this data set?

10
00:00:55,270 --> 00:00:58,390
Well, I want to avoid any in-depth mathematical details.

11
00:00:58,660 --> 00:01:04,830
But usually in deep learning, standardizing or normalizing your data is a good idea. For intuition,

12
00:01:04,840 --> 00:01:07,330
you can simply test this on various data sets.

13
00:01:07,750 --> 00:01:11,860
What you should observe is that standardization seems to help more often than not.

14
00:01:12,640 --> 00:01:17,280
And again, if you are interested in those details, please see extra_reading.txt.

15
00:01:18,730 --> 00:01:24,490
Now, you might also be wondering why we do this on the original time series and not the supervised tabular

16
00:01:24,490 --> 00:01:25,140
data set?

17
00:01:25,750 --> 00:01:31,480
And the reason for this is if you use the tabular data set, it will standardize each column independently,

18
00:01:31,750 --> 00:01:33,110
which won't make much sense.

19
00:01:34,060 --> 00:01:38,660
That would mean the same return showing up in different columns will have different values.

20
00:01:39,100 --> 00:01:43,110
So instead we would like to standardize the whole time series all at once.

21
00:01:46,570 --> 00:01:51,580
OK, so after scaling the log returns, we're going to put them back into the data frame using the column

22
00:01:51,580 --> 00:01:59,110
name scaled log return. Notice again how we need to flatten the data, since scikit-learn returns 2-D arrays.

23
00:02:05,080 --> 00:02:08,770
The next step is to create a supervised data set, which you know how to do.

24
00:02:12,800 --> 00:02:15,710
The next step is to split our data into train and test.

25
00:02:19,070 --> 00:02:20,900
The next step is to create our ANN.

26
00:02:24,470 --> 00:02:27,100
So the next few steps are the same as we've seen before.

27
00:02:38,340 --> 00:02:40,240
OK, so here's the loss per epoch.

28
00:02:40,770 --> 00:02:44,700
Notice how there is overfitting, which we can observe by looking at the test loss.

29
00:02:47,210 --> 00:02:52,970
The next step is to set the first T + 1 values in the train index to False, since they are

30
00:02:52,970 --> 00:02:53,800
unpredictable.

31
00:02:56,830 --> 00:03:02,650
The next step is to obtain the model predictions. Normally we would just call model.predict, but

32
00:03:02,650 --> 00:03:04,380
recall that our data has been scaled.

33
00:03:04,840 --> 00:03:09,580
Therefore, we need to invert the scaling by calling scaler.inverse_transform.

34
00:03:13,710 --> 00:03:15,690
The next step is to store our predictions.

35
00:03:19,690 --> 00:03:21,730
The next step is to plot our predictions.

36
00:03:26,390 --> 00:03:30,290
OK, so here's a plot of the one-step stock return predictions.

37
00:03:31,400 --> 00:03:36,400
Note that it's pretty difficult to see since there are so many values, so you might want to try just

38
00:03:36,410 --> 00:03:37,930
plotting the last few steps.

39
00:03:40,790 --> 00:03:45,410
The next step is to plot the prices. We'll begin by shifting up the log close.

40
00:03:49,490 --> 00:03:53,470
The next step is to save the last known train value, although we won't need this just yet.

41
00:03:57,350 --> 00:04:02,990
The next step is to add the previous close price to the predicted returns. Since this is just the one-step

42
00:04:02,990 --> 00:04:06,530
forecast, we use the same method for both train and test.

43
00:04:10,570 --> 00:04:16,210
The next step is to plot the one-step price predictions. In order to see things more clearly, notice

44
00:04:16,210 --> 00:04:18,820
that we are only plotting the last 100 steps.

45
00:04:23,360 --> 00:04:27,240
OK, so here's a plot of the one-step stock price predictions.

46
00:04:27,830 --> 00:04:32,180
Notice how the model basically just lags the previous value, which is not unexpected.

47
00:04:35,760 --> 00:04:40,690
The next step is to compute the multi-step forecast. Since you've seen this code before,

48
00:04:40,740 --> 00:04:42,030
I won't explain it again.

49
00:04:46,380 --> 00:04:51,630
The next step is to invert the scaling on our predictions. To do this, we're going to convert our list

50
00:04:51,630 --> 00:04:54,060
of predictions into a NumPy array.

51
00:04:54,570 --> 00:04:58,670
We then reshape this to be two-dimensional, which is required for scikit-learn.

52
00:04:59,400 --> 00:05:03,890
And then after obtaining the unscaled predictions, we flatten them back to a 1-D array.

53
00:05:07,600 --> 00:05:12,970
The next step is to compute the incremental multi-step forecast by adding the cumulative sum of the

54
00:05:12,970 --> 00:05:15,520
return predictions to the last known price.

55
00:05:19,150 --> 00:05:21,790
The next step is to plot our multi-step forecast.

56
00:05:25,920 --> 00:05:31,680
So as you can see, it comes up to be almost a straight line with a trend, which makes sense, although

57
00:05:31,680 --> 00:05:32,990
it is not a good prediction.

58
00:05:35,670 --> 00:05:41,200
The next step is to create a multi-output supervised data set. Since you already know how to do this,

59
00:05:41,370 --> 00:05:42,830
I won't explain it again.

60
00:05:47,800 --> 00:05:53,680
The next step is to split our new data into train and test. Recall that the test set is only the last

61
00:05:53,680 --> 00:05:57,000
row since that contains our entire forecast horizon.

62
00:06:00,760 --> 00:06:05,410
The next step is to create our ANN and to follow the same steps we've already seen before.

63
00:06:11,960 --> 00:06:14,150
The next step is to plot the loss per epoch.

64
00:06:18,840 --> 00:06:22,050
OK, so again, notice that the ANN has overfit.

65
00:06:25,110 --> 00:06:27,510
The next step is to obtain our model predictions.

66
00:06:32,040 --> 00:06:37,680
OK, so, again, the main difference here is that we need to call inverse_transform to put our predictions

67
00:06:37,680 --> 00:06:39,210
back on the original scale.

68
00:06:42,340 --> 00:06:44,410
The next step is to plot our predictions.

69
00:06:50,110 --> 00:06:54,070
OK, so again, it's pretty much just a straight line with a small trend.

70
00:06:57,680 --> 00:06:59,380
The next step is to check the MAPE.

71
00:07:02,460 --> 00:07:05,280
OK, so notice how the MAPE is actually very small.

72
00:07:05,610 --> 00:07:11,430
This is pretty misleading since, as you recall, airline passengers also has a similar MAPE, and yet

73
00:07:11,430 --> 00:07:13,890
those predictions are actually good predictions.

74
00:07:14,220 --> 00:07:18,000
So you may want to consider a different metric, such as the R-squared.

75
00:07:21,830 --> 00:07:26,600
The next step is to make our problem simpler by doing one-step binary classification.

76
00:07:27,650 --> 00:07:31,470
So in order to build this data set, we can simply use what we had before.

77
00:07:31,700 --> 00:07:37,190
We just need to make new targets by assigning anything above zero to one and anything below zero to

78
00:07:37,190 --> 00:07:37,790
zero.

79
00:07:41,800 --> 00:07:44,840
Notice that our ANN has exactly the same parameters.

80
00:07:45,550 --> 00:07:50,560
The main difference is that since we're doing binary classification, we use the binary cross entropy

81
00:07:50,560 --> 00:07:51,160
loss.

82
00:07:57,590 --> 00:08:01,400
Also, notice that when we call model.fit, we use our new targets.

83
00:08:10,680 --> 00:08:13,110
The next step is to check out our loss per epoch.

84
00:08:17,750 --> 00:08:23,840
OK, so surprisingly, the loss per epoch looks pretty good. The test loss no longer increases.

85
00:08:26,530 --> 00:08:29,230
The next step is to look at the accuracy per epoch.

86
00:08:34,650 --> 00:08:40,170
OK, so notice how even for the test set, which is quite small, we seem to do better than just predicting

87
00:08:40,170 --> 00:08:41,310
a random coin toss.

88
00:08:41,850 --> 00:08:45,330
Of course, the real test is whether or not you can do this consistently.