1
00:00:11,110 --> 00:00:15,220
OK, so in this lecture, we will continue looking at our GARCH notebook.

2
00:00:15,880 --> 00:00:19,720
The first step is to create the train-test split of our log returns.

3
00:00:20,170 --> 00:00:23,650
We'll use Ntest equals 500 for no particular reason.

4
00:00:29,530 --> 00:00:35,080
In the next block of code, we'll investigate why it's important to scale our returns when using ARCH.

5
00:00:35,710 --> 00:00:38,260
So we'll begin by calling the arch_model function.

6
00:00:39,100 --> 00:00:42,550
As mentioned, the first argument is the training time series.

7
00:00:43,060 --> 00:00:46,240
We'll specify that we want our model to be a GARCH(1,1).

8
00:00:51,340 --> 00:00:53,290
The next step is to call model.fit.

9
00:01:00,570 --> 00:01:05,730
OK, so you can see how this prints out a bunch of stuff, such as the loss function at each iteration.

10
00:01:06,510 --> 00:01:10,800
Notice that we end up getting a warning telling us that the scale of y is too small.

11
00:01:11,490 --> 00:01:16,800
It says parameter estimation works better when this value is between 1 and 1000.

12
00:01:17,220 --> 00:01:20,470
The recommended rescaling is 100 times y.

13
00:01:21,960 --> 00:01:26,040
OK, so the lesson is that for GARCH, we need to scale our time series.

14
00:01:31,200 --> 00:01:36,420
OK, so the next step is to scale our time series. We'll assign the mean of the training series to be

15
00:01:36,420 --> 00:01:40,080
M and the standard deviation of our training series to be S.

16
00:01:40,800 --> 00:01:46,140
The next step is to standardize both the train series and the test series using M and S.

17
00:01:46,590 --> 00:01:49,710
As you recall, we subtract M and divide by S.

18
00:01:55,220 --> 00:01:56,960
The next step is to create an ARCH(1).

19
00:01:57,950 --> 00:02:03,680
So notice that for this, we have to specify that vol is 'ARCH', although p equals 1 is the default.

20
00:02:04,220 --> 00:02:08,150
This is because the default model for this function is actually GARCH.

21
00:02:12,900 --> 00:02:15,180
The next step is to call arch1.fit.

22
00:02:15,930 --> 00:02:21,180
Notice that I've passed in an argument called update_freq, which shows you how to modify how often

23
00:02:21,180 --> 00:02:23,070
this function prints out the logs.

24
00:02:27,460 --> 00:02:32,980
OK, so notice how the training process has completed successfully, and we no longer get any warnings.

25
00:02:36,550 --> 00:02:39,730
The next step is to call the summary function on the results.

26
00:02:43,450 --> 00:02:45,760
So we can see a few important things from this.

27
00:02:46,570 --> 00:02:52,870
Firstly, notice that the mean model is Constant Mean. As you recall, this is the default setting.

28
00:02:53,920 --> 00:02:56,830
Secondly, notice that the distribution is normal.

29
00:02:57,280 --> 00:02:59,830
This is also one of the default settings.

30
00:03:01,000 --> 00:03:04,480
Third, notice that the method is maximum likelihood.

31
00:03:05,110 --> 00:03:08,710
Now, if you don't know what that is, we'll be going over that in a later lecture.

32
00:03:08,740 --> 00:03:13,330
But if you do, then this should give you some idea of how this model is trained.

33
00:03:14,830 --> 00:03:17,350
The next thing to notice is the log likelihood.

34
00:03:18,010 --> 00:03:23,620
If you recall, the log likelihood for our previous model was much worse when we did not scale the time

35
00:03:23,620 --> 00:03:24,310
series.

36
00:03:28,560 --> 00:03:31,030
OK, so the next thing to notice is the mean model.

37
00:03:31,800 --> 00:03:38,280
The mean model says that the mean is about 0.01 something, but also note that the p-value for this

38
00:03:38,280 --> 00:03:39,390
is quite large.

39
00:03:40,650 --> 00:03:46,140
Furthermore, notice that the confidence interval also contains both positive and negative values.

40
00:03:48,180 --> 00:03:50,820
The next thing to look at is the volatility model.

41
00:03:51,600 --> 00:03:54,930
Notice that the arch library uses the same notation we do.

42
00:03:55,560 --> 00:04:00,300
So the bias term is called omega and the ARCH coefficient is called alpha[1].

43
00:04:01,170 --> 00:04:04,950
Also notice that both of these have highly significant p-values.

44
00:04:09,430 --> 00:04:16,150
OK, so the next step is to plot our model's predictions of the in-sample conditional volatility. Since

45
00:04:16,150 --> 00:04:17,440
you've seen this code before,

46
00:04:17,650 --> 00:04:19,120
there's nothing more to explain.

47
00:04:24,720 --> 00:04:27,880
So as you can see, the fit seems to be just OK.

48
00:04:28,380 --> 00:04:33,480
It seems to underestimate when the values are large and overestimate when the values are small.

49
00:04:38,490 --> 00:04:44,320
The next step is to explore how to forecast. Most of this stuff we discussed in the code preparation,

50
00:04:44,340 --> 00:04:46,000
so this should just be a review.

51
00:04:47,160 --> 00:04:51,990
The first thing we're going to do is call the forecast function with only the horizon.

52
00:04:57,040 --> 00:04:58,100
So when we do this,

53
00:04:58,120 --> 00:05:03,400
you see that we get a warning telling us about this argument called reindex, which we discussed

54
00:05:03,400 --> 00:05:04,720
in the code preparation.

55
00:05:09,080 --> 00:05:12,620
The next step is to try setting this reindex argument to True.

56
00:05:18,140 --> 00:05:23,000
The next step is to simply print out the return value to see what kind of object it is.

57
00:05:26,840 --> 00:05:30,890
So we can see that it's an object of type ARCHModelForecast.

58
00:05:34,460 --> 00:05:39,530
Now, as you recall, this object has attributes including mean, variance, and so forth.

59
00:05:40,460 --> 00:05:42,980
So the next step is to try printing out the mean.

60
00:05:49,160 --> 00:05:52,280
So as you can see, what we get back is a DataFrame.

61
00:05:52,790 --> 00:05:56,510
It contains all the dates of our original training time series.

62
00:05:57,140 --> 00:06:02,660
So it starts at 2010-01-05. Even though these dates don't have a forecast,

63
00:06:03,020 --> 00:06:06,560
they are simply set to not a number. By the way,

64
00:06:06,560 --> 00:06:11,180
a really great exercise while you're watching this lecture is to take what you learned in the previous

65
00:06:11,180 --> 00:06:13,580
lecture and try to guess what we will see.

66
00:06:14,870 --> 00:06:20,420
Also, notice that the column names are simply h.01, h.02, and so forth.

67
00:06:22,630 --> 00:06:29,720
So if we scroll down to the bottom, we see that only the final row has actual numbers. As you recall,

68
00:06:29,740 --> 00:06:33,640
this is because we didn't specify a value for the start argument.

69
00:06:34,060 --> 00:06:38,770
So it simply assumes that we only want to forecast from the end of the training series.

70
00:06:40,430 --> 00:06:43,190
Now, let's think about why all these values are the same.

71
00:06:44,030 --> 00:06:48,950
Well, as you recall, we're just looking at the mean and the default mean model is just constant.

72
00:06:53,170 --> 00:06:55,240
The next step is to check the variance.

73
00:07:01,880 --> 00:07:07,160
So again, we get a full-size DataFrame, even though most of these values are just not a number.

74
00:07:10,520 --> 00:07:15,620
When we scroll down to the end, we can see that the variance forecast is stored in the final row.

75
00:07:19,540 --> 00:07:25,510
The next step is to check the residual variance. As mentioned, when we use a constant mean or zero

76
00:07:25,510 --> 00:07:28,270
mean, this will be the same as the variance.

77
00:07:34,140 --> 00:07:37,920
So if we scroll down to the bottom, we can see that these values are the same.

78
00:07:42,620 --> 00:07:47,810
The next step is to call the forecast function again, but this time set reindex to False.

79
00:07:52,630 --> 00:07:54,970
So let's look at the mean attribute once again.

80
00:07:58,550 --> 00:08:02,450
So we can see that now, because we only have a forecast for the final date,

81
00:08:02,780 --> 00:08:05,540
this is the only row that shows up in our DataFrame.

82
00:08:10,960 --> 00:08:12,940
The next step is to look at the variance.

83
00:08:18,280 --> 00:08:24,010
So again, it's the same forecast as before, except that we do not have any rows with not-a-number

84
00:08:24,010 --> 00:08:24,880
values.