1
00:00:11,140 --> 00:00:16,630
So in this lecture, we'll be taking a little detour from time series analysis. This lecture is all

2
00:00:16,630 --> 00:00:18,030
about extrapolation.

3
00:00:18,970 --> 00:00:24,460
The goal of this lecture is to figure out how different machine learning models perform on data outside

4
00:00:24,460 --> 00:00:26,210
the range of values they were trained on.

5
00:00:27,070 --> 00:00:30,600
This is a question we have to answer for time series analysis.

6
00:00:31,210 --> 00:00:34,630
So picture a time series with a clear trend like a stock price.

7
00:00:35,140 --> 00:00:40,600
Now, clearly, if we split this data into train and test, the test set will take on different values

8
00:00:40,600 --> 00:00:41,440
than the train set.

9
00:00:41,980 --> 00:00:44,800
This is because the time series is increasing.

10
00:00:45,580 --> 00:00:51,070
The question to consider in this lecture is: can machine learning predict values outside of the range

11
00:00:51,070 --> 00:00:52,000
of the train set?

12
00:00:52,840 --> 00:00:58,500
If the answer is yes, then it would make sense to plug stock prices into machine learning models.

13
00:00:58,810 --> 00:01:01,440
But if the answer is no, then it would not make sense.

14
00:01:01,900 --> 00:01:05,350
In that case, it's probably better to use something like stock returns.

15
00:01:06,100 --> 00:01:07,360
OK, so let's begin.

16
00:01:09,220 --> 00:01:14,230
So we'll begin by importing NumPy, Matplotlib, and a bunch of different models from scikit-learn.

17
00:01:20,390 --> 00:01:22,230
The next step is to make our data set.

18
00:01:22,880 --> 00:01:27,870
Basically, it's a sine wave in two dimensions, so the shape of our data is N by 2.

19
00:01:28,730 --> 00:01:32,700
The target is just a linear function of the cosine of the inputs.

20
00:01:34,070 --> 00:01:39,650
One thing to notice about our data set is that the inputs are uniformly distributed in the range of

21
00:01:39,650 --> 00:01:42,320
minus three to plus three in both directions.

22
00:01:43,310 --> 00:01:45,860
OK, so from minus three to plus three.
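
As a rough sketch of the data set just described, something like the following would work. The number of samples, the random seed, and the cosine frequencies are assumptions, since the lecture does not state them; `true_function` is a hypothetical helper name.

```python
import numpy as np

# Assumed values: the lecture does not state N, the seed, or the frequencies.
N = 1000
rng = np.random.default_rng(0)

# Inputs: N points drawn uniformly from [-3, 3] in both dimensions.
X = rng.uniform(-3.0, 3.0, size=(N, 2))


def true_function(X):
    """Hypothetical target: a sum of cosines of the two inputs."""
    return np.cos(2 * X[:, 0]) + np.cos(3 * X[:, 1])


Y = true_function(X)
print(X.shape, Y.shape)
```

Because the target is defined explicitly, it can be evaluated anywhere, including outside the range the inputs were drawn from.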

23
00:01:50,730 --> 00:01:53,910
The next step is to plot our data set to see what it looks like.

24
00:02:00,150 --> 00:02:05,040
So basically, it's a repeating sine wave that goes on forever in both the x1 and the x2

25
00:02:05,040 --> 00:02:05,880
directions.

26
00:02:09,520 --> 00:02:12,040
The next step is to train a support vector machine.

27
00:02:17,480 --> 00:02:22,250
The next step is to plot the function that our model has learned with the original data points.

28
00:02:22,850 --> 00:02:26,840
Now, this code isn't that important, so don't pay too much attention to the details.

29
00:02:27,320 --> 00:02:32,600
The main thing to focus on is that the range of values for this plot is minus three up to plus three

30
00:02:32,750 --> 00:02:34,100
in both directions.

31
00:02:39,930 --> 00:02:45,990
OK, so we can see that our model seems to do a pretty good job at learning this function. The support

32
00:02:45,990 --> 00:02:47,550
vector machine is pretty powerful.

33
00:02:47,580 --> 00:02:49,140
So this is not unexpected.

34
00:02:54,240 --> 00:02:59,970
The next step is to run the exact same code, except for one small difference. This time we'll let

35
00:02:59,970 --> 00:03:02,760
the range of values go from minus five to plus five.

36
00:03:08,050 --> 00:03:12,670
So we see that when we extend the range of values, our model does not extrapolate well.
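
One way to see this without a plot is to ask the model for predictions far away from the train data. This is only a sketch: the data set, the seed, and `C=100` are assumptions, and the kernel is scikit-learn's default RBF. With an RBF kernel, every kernel value shrinks toward zero far from the training points, so the prediction falls back to the model's constant intercept.

```python
import numpy as np
from sklearn.svm import SVR

# Assumed data set: uniform inputs in [-3, 3] and a sum-of-cosines target.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(1000, 2))
Y = np.cos(2 * X[:, 0]) + np.cos(3 * X[:, 1])

# C=100 is an assumed setting; the kernel is the default RBF.
model = SVR(C=100.0)
model.fit(X, Y)

# Points far outside the train range: the RBF kernel values are effectively
# zero there, so the prediction collapses to the constant intercept.
far_points = np.array([[50.0, 50.0], [-80.0, 20.0]])
far_predictions = model.predict(far_points)
print(far_predictions, model.intercept_)
```

The true function keeps oscillating out there, but the model just returns a constant.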

37
00:03:15,830 --> 00:03:20,660
Now, sometimes people see this kind of thing and ask, well, how do you know that's not the true function?

38
00:03:21,220 --> 00:03:24,370
The answer is because we've already defined the true function above.

39
00:03:24,680 --> 00:03:27,950
So we know what the true function is and we know that this is not it.

40
00:03:31,660 --> 00:03:36,610
So just in case you forgot, we're going to make the same plot again, but this time we're going to

41
00:03:36,610 --> 00:03:39,220
use the true function instead of a model prediction.

42
00:03:46,070 --> 00:03:51,830
OK, so this plot is a plot of the true function. Remember, it's a sine wave that repeats in every

43
00:03:51,830 --> 00:03:52,520
direction.

44
00:03:56,360 --> 00:04:01,850
The next step is to train a random forest. Again, we're going to plot the function our model has learned

45
00:04:02,090 --> 00:04:03,920
within the range of train values.

46
00:04:12,430 --> 00:04:17,590
So we see that for the random forest, the predictions are a lot more bumpy and jagged. This makes

47
00:04:17,590 --> 00:04:22,420
sense considering that the random forest is just the average of a bunch of decision trees.

48
00:04:26,020 --> 00:04:32,200
The next step is to check whether or not the random forest can extrapolate. Again, we've simply expanded

49
00:04:32,200 --> 00:04:35,050
the range of values to minus five and plus five.

50
00:04:41,280 --> 00:04:46,080
So we see that the random forest simply projects outward the last known prediction.

51
00:04:47,210 --> 00:04:52,070
Again, this makes sense, considering what we know about decision trees. There are no more decision

52
00:04:52,070 --> 00:04:56,420
splits beyond the training values, and we know that decision

53
00:04:56,420 --> 00:04:58,430
trees learn horizontal lines.
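
Here is a small sketch of that behavior, assuming the same kind of data set as before (the seed, target, and forest settings are assumptions). We hold x2 fixed and move x1 further and further past the train range. Every split threshold lies inside the train range, so all of these points land in the same leaf of every tree.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Assumed data set: uniform inputs in [-3, 3] and a sum-of-cosines target.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(1000, 2))
Y = np.cos(2 * X[:, 0]) + np.cos(3 * X[:, 1])

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(X, Y)

# Walk away from the train range along x1 while holding x2 fixed at zero.
x1_values = np.array([3.5, 4.0, 5.0, 50.0])
points = np.column_stack([x1_values, np.zeros_like(x1_values)])
predictions = forest.predict(points)

# Beyond the last split, every tree returns the same leaf value.
print(predictions)
```

So the forest predicts the same value at x1 = 3.5 as it does at x1 = 50, no matter what the true function does in between.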

54
00:05:01,910 --> 00:05:06,500
Now, although we'll be studying deep learning in depth elsewhere, we're going to take this opportunity

55
00:05:06,500 --> 00:05:09,370
to try a neural network in scikit-learn.

56
00:05:09,440 --> 00:05:12,470
This is called the Multilayer Perceptron or MLP.

57
00:05:17,600 --> 00:05:22,400
Again, we'll start by plotting the learned function only inside the range of train values.

58
00:05:29,350 --> 00:05:33,700
OK, so the neural network seems to fit pretty well, maybe even better than the others.

59
00:05:38,330 --> 00:05:41,570
But now let's see how well the neural network extrapolates.

60
00:05:47,330 --> 00:05:52,190
So, again, we see that the neural network simply extends the prediction out in a straight line.

61
00:05:52,940 --> 00:05:56,650
This will make perfect sense once you know how neural networks actually work.
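
As a quick check of the straight-line behavior, here is a sketch that assumes the same data set as before, one hidden layer of 64 units, and scikit-learn's default ReLU activation (the layer size, iteration count, and seed are assumptions). A ReLU network is piecewise linear, so far enough from the data the prediction along a ray changes by the same amount at every step.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

# Assumed data set: uniform inputs in [-3, 3] and a sum-of-cosines target.
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(1000, 2))
Y = np.cos(2 * X[:, 0]) + np.cos(3 * X[:, 1])

# Layer size and iteration count are assumed; the activation is the default ReLU.
mlp = MLPRegressor(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
mlp.fit(X, Y)

# Three equally spaced points very far out along the diagonal.
t = np.array([1e6, 2e6, 3e6])
points = np.column_stack([t, t])
p = mlp.predict(points)

# On a straight line, consecutive steps are equal.
steps = np.diff(p)
print(steps)
```

Equal steps mean the network has stopped bending and is simply continuing in a straight line, which is not what a repeating wave does.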

62
00:06:00,020 --> 00:06:05,270
OK, so in the next portion of this notebook, we're going to see how this concept of extrapolation

63
00:06:05,450 --> 00:06:07,080
applies to stock prices.

64
00:06:07,610 --> 00:06:09,170
Remember what we have just learned.

65
00:06:09,620 --> 00:06:12,450
We've learned that machine learning models cannot extrapolate.

66
00:06:13,130 --> 00:06:18,200
We've just tried a few of the most powerful go-to machine learning methods, and we've seen that they

67
00:06:18,200 --> 00:06:19,820
completely fail at this task.

68
00:06:21,200 --> 00:06:25,130
OK, so I'm going to skip most of this code where we download the data and load it in.

69
00:06:33,640 --> 00:06:39,370
So the relevant part starts here, where we split the data into train and test. I've just arbitrarily

70
00:06:39,370 --> 00:06:41,500
chosen two thousand as the split point.

71
00:06:45,770 --> 00:06:52,040
OK, so the next step is to create the inputs. We'll pretend that what we'd like to do is build

72
00:06:52,040 --> 00:06:59,360
an autoregressive model with two lags. That is, x of t depends on x of t minus one and x of t minus

73
00:06:59,360 --> 00:06:59,780
two.

74
00:07:00,740 --> 00:07:03,380
This is only so that we'll be able to plot this later on.

75
00:07:04,290 --> 00:07:07,280
OK, so it's basically the same loop for both training and test.
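
The loop might look roughly like this. It's a sketch under a few assumptions: `make_lagged` is a hypothetical helper name, the price series is a made-up random walk standing in for the downloaded data, and the helper returns NumPy arrays directly. Only the split point of two thousand and the two lags come from the lecture.

```python
import numpy as np


def make_lagged(series, lags=2):
    """Return inputs of `lags` consecutive values and the value that follows."""
    X, y = [], []
    for t in range(lags, len(series)):
        X.append(series[t - lags:t])  # x(t-2), x(t-1)
        y.append(series[t])           # x(t)
    return np.array(X), np.array(y)


# Hypothetical stand-in for the downloaded prices: a random walk with drift.
rng = np.random.default_rng(0)
prices = 100.0 + np.cumsum(rng.normal(0.05, 1.0, size=2500))

split = 2000  # the split point mentioned in the lecture
X_train, y_train = make_lagged(prices[:split])
X_test, y_test = make_lagged(prices[split:])
print(X_train.shape, X_test.shape)
```

The same helper is applied to both halves, so the train and test inputs are built in exactly the same way.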

76
00:07:11,240 --> 00:07:17,150
The next step is to convert X train and X test into NumPy arrays so that they are easier to index.

77
00:07:20,680 --> 00:07:23,330
The next step is to make a scatterplot of our data.

78
00:07:24,100 --> 00:07:30,040
So in this plot, x of t goes along the horizontal axis and x of t plus one goes along the vertical

79
00:07:30,040 --> 00:07:33,910
axis. We'll make the train set red and the test set green.

80
00:07:38,910 --> 00:07:40,500
OK, so what can we see?

81
00:07:41,160 --> 00:07:47,130
Well, we can see that the train inputs and the test inputs occupy a totally different area of the input

82
00:07:47,130 --> 00:07:47,750
space.

83
00:07:48,750 --> 00:07:53,280
Note that I've squashed the plot down in the vertical direction just so that we can see it fit on the

84
00:07:53,280 --> 00:07:53,850
screen.

85
00:07:54,390 --> 00:07:57,600
But in actuality, this would look more like a 45-degree line.

86
00:07:58,950 --> 00:08:05,010
Basically, what this is saying is that if you use stock prices as inputs into a machine learning model,

87
00:08:05,400 --> 00:08:08,210
the test data would require extrapolation.

88
00:08:08,910 --> 00:08:12,180
The model only gets to train on the red range of values.

89
00:08:12,630 --> 00:08:15,090
The green range of values is never observed.

90
00:08:16,410 --> 00:08:22,170
Now, sometimes people ask, well, what about scaling the data? For example, by using standardization

91
00:08:22,170 --> 00:08:23,340
or min-max scaling?

92
00:08:24,060 --> 00:08:28,970
Well, remember that scaling the data has no effect on the relationships between the points.

93
00:08:29,370 --> 00:08:34,710
If you divide a time series by one thousand, the values would just be 1000 times smaller, but the

94
00:08:34,710 --> 00:08:37,000
plot of those values would look exactly the same.

95
00:08:37,680 --> 00:08:42,060
And of course, if you're not convinced by this, you're encouraged to try it as an exercise.
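
If you'd like a starting point for that exercise, here is one possible sketch. The series is a hypothetical straight-line trend standing in for a stock price, and fitting the scaler on the train set only is an assumption about how you would set this up.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical trending series standing in for a stock price.
series = np.linspace(100.0, 200.0, 3000).reshape(-1, 1)
train, test = series[:2000], series[2000:]

# Fit the scaler on the train set only, then apply it to both sets.
scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)
test_scaled = scaler.transform(test)

# Standardization is a shift and a positive rescale, so the ordering of the
# values is preserved: the test set still lies above the train set.
print(train_scaled.max(), test_scaled.min())
```

The numbers get smaller, but every test value still sits outside the range the model was trained on.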