1
00:00:11,030 --> 00:00:11,450
Okay.

2
00:00:11,450 --> 00:00:15,950
So in this lecture we are going to continue our econometrics example with VARMA.

3
00:00:18,700 --> 00:00:23,860
The next step is to define which columns we care about, which are GDP growth and term spread.

4
00:00:24,490 --> 00:00:28,840
If you decide to difference the term spread, you could change this to use that column instead.

5
00:00:32,690 --> 00:00:35,660
The next step is to get rid of the first row of the data.

6
00:00:36,410 --> 00:00:40,690
Since we used differencing on GDP, the first row will have missing values.
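
Here is a minimal sketch of why differencing leaves the first row empty. It is plain Python with made-up GDP levels, for illustration only, not the lecture's data. In pandas the equivalent would be `df['GDP'].diff()`, which puts NaN in the first row, followed by `df.iloc[1:]`. The column name `'GDP'` is an assumption.

```python
# Hypothetical GDP levels, for illustration only.
gdp = [100.0, 101.0, 102.5, 102.0, 103.8]

# First difference: each value minus the previous one.
# The first observation has no predecessor, so its difference is missing
# (None here; NaN in pandas).
gdp_diff = [None] + [curr - prev for prev, curr in zip(gdp, gdp[1:])]

# Drop the first row so every remaining row is complete (like df.iloc[1:]).
rows = list(zip(gdp, gdp_diff))[1:]
```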

7
00:00:43,910 --> 00:00:46,730
The next step is to split our data into train and test.

8
00:00:47,120 --> 00:00:51,590
I've chosen Ntest = 12, but feel free to try other values if you like.

9
00:00:55,190 --> 00:00:58,060
The next step is to declare train_idx and test_idx,

10
00:00:58,280 --> 00:00:59,450
as we've done before.
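
One way to build these masks, sketched in plain Python. It assumes the rows are time-ordered and that the test set is simply the last `Ntest` rows. `split_masks` is a hypothetical helper, not code from the lecture. In pandas, a comparable pattern compares `df.index` against the last training timestamp.

```python
Ntest = 12

def split_masks(n, ntest):
    """Return boolean masks selecting the first n - ntest rows (train)
    and the last ntest rows (test) of a time-ordered dataset."""
    cutoff = n - ntest
    train_idx = [i < cutoff for i in range(n)]
    test_idx = [i >= cutoff for i in range(n)]
    return train_idx, test_idx

values = list(range(40))  # stand-in for 40 time-ordered rows
train_idx, test_idx = split_masks(len(values), Ntest)
train = [v for v, keep in zip(values, train_idx) if keep]
test = [v for v, keep in zip(values, test_idx) if keep]
```

Masks, rather than two separate copies of the data, make it easy to write train and test predictions back into the same table later.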

11
00:01:03,330 --> 00:01:06,360
The next step is to scale our data using the StandardScaler.

12
00:01:06,930 --> 00:01:10,850
Note that we are overwriting the old values, which we won't be using anyway.
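
The sketch below mimics what scikit-learn's `StandardScaler` computes: subtract the mean and divide by the population standard deviation. It assumes the statistics are fit on the training rows only and then applied to both train and test, which avoids leaking test information. The lecture's exact call may differ. `fit_scaler` and `transform` are hypothetical names.

```python
from statistics import fmean, pstdev

def fit_scaler(train_values):
    # StandardScaler uses the population standard deviation (ddof=0).
    return fmean(train_values), pstdev(train_values)

def transform(values, mean, std):
    # Apply the training-set statistics to any slice of the data.
    return [(v - mean) / std for v in values]

train = [1.0, 2.0, 3.0, 4.0]
test = [5.0, 6.0]
mean, std = fit_scaler(train)
train_scaled = transform(train, mean, std)
test_scaled = transform(test, mean, std)
```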

13
00:01:15,420 --> 00:01:20,400
The next step is to also overwrite the old values in the DataFrame, since we don't need them anymore.

14
00:01:24,370 --> 00:01:27,250
The next step is to plot the ACF of GDP growth.

15
00:01:32,050 --> 00:01:32,440
Okay.

16
00:01:32,440 --> 00:01:37,180
So based on GDP growth alone, the ACF seems to suggest a small q value.

17
00:01:37,930 --> 00:01:43,120
By the way, note that these plots are not exactly what you need in order to determine p and q,

18
00:01:43,420 --> 00:01:45,520
since they do not consider cross terms.

19
00:01:46,150 --> 00:01:48,580
So we're only using these as a very rough guide,

20
00:01:48,820 --> 00:01:50,230
just to give you some idea.
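
The ACF and PACF plots look at each series on its own. A rough way to peek at the cross terms is the sample cross-correlation between one series and lagged values of the other. The sketch below is plain Python, not the lecture's code. `ccf` and `acf` are hypothetical helpers using the usual full-sample normalization, so `acf(x, 0)` is always 1.

```python
from statistics import fmean

def ccf(x, y, lag):
    """Sample correlation between x[t] and y[t - lag] (lag >= 0), using the
    full-sample means and variances, as in the usual sample ACF."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    vx = sum((v - mx) ** 2 for v in x) / n
    vy = sum((v - my) ** 2 for v in y) / n
    cov = sum((x[t] - mx) * (y[t - lag] - my) for t in range(lag, n)) / n
    return cov / (vx * vy) ** 0.5

def acf(x, lag):
    # The ACF is the cross-correlation of a series with itself.
    return ccf(x, x, lag)
```

A large `ccf(gdp_growth, term_spread, k)` at some lag `k` would hint that past term spread carries information about GDP growth, which is exactly what a per-series ACF cannot show. Both variable names here are assumptions.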

21
00:01:51,220 --> 00:01:56,710
In practice, the easiest way to do model selection is just to do a grid search and pick the best value

22
00:01:56,920 --> 00:01:58,780
based on the criteria you care about.
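
A minimal grid-search sketch, assuming the criterion is "lower is better" (for example an information criterion or a validation error). `grid_search` and `toy_score` are hypothetical. In practice `score` would fit a model with order `(p, q)`, such as a statsmodels `VARMAX`, and return something like its `.aic`.

```python
from itertools import product

def grid_search(score, p_values, q_values):
    """Evaluate score(p, q) for every combination and return the (p, q)
    pair with the lowest score."""
    best = None
    for p, q in product(p_values, q_values):
        s = score(p, q)
        if best is None or s < best[0]:
            best = (s, p, q)
    return best[1], best[2]

# Toy stand-in for a real criterion, minimized at p=2, q=1.
def toy_score(p, q):
    return (p - 2) ** 2 + (q - 1) ** 2

best_p, best_q = grid_search(toy_score, range(4), range(3))
```

With VARMA models each `score` call is a full fit, so the grid is usually kept small.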

23
00:02:02,300 --> 00:02:04,280
The next step is to check the PACF.

24
00:02:08,510 --> 00:02:12,410
So again, the PACF seems to suggest a small value of p as well.
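
The PACF at lag k is the last coefficient of an AR(k) fit, so for an AR(p) process it cuts off after lag p. That is why it is used as a guide for p. Below is a plain-Python sketch of the Durbin-Levinson recursion that turns autocorrelations into partial autocorrelations. statsmodels' `plot_pacf` may use a different estimation method by default, and `pacf_from_acf` is a hypothetical name.

```python
def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations phi_kk for k = 1..max_lag via Durbin-Levinson,
    given autocorrelations rho[0..max_lag] with rho[0] == 1."""
    pacf = []
    phi_prev = []  # phi_{k-1, j} for j = 1..k-1 (zero-indexed)
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1 - sum(phi_prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # Update the AR(k) coefficients from the AR(k-1) ones.
        phi = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)]
        phi.append(phi_kk)
        pacf.append(phi_kk)
        phi_prev = phi
    return pacf

# Theoretical ACF of an AR(1) process with coefficient 0.5: rho_k = 0.5 ** k.
rho = [1.0, 0.5, 0.25, 0.125]
pacf = pacf_from_acf(rho, 3)
```

For this AR(1) example the PACF is 0.5 at lag 1 and zero afterwards, matching the "cut off after p" rule.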

25
00:02:15,490 --> 00:02:18,340
The next step is to plot the ACF for the term spread.

26
00:02:22,580 --> 00:02:27,500
Note that the ACF suggests that the signal has a cyclical pattern, which is what we observed.

27
00:02:31,310 --> 00:02:33,350
The next step is to check the PACF.

28
00:02:37,740 --> 00:02:42,180
So this time, note that quite a few lags are significant, up to a pretty high order.

29
00:02:45,090 --> 00:02:45,480
Okay.

30
00:02:45,480 --> 00:02:47,700
So the next step is to check for stationarity.

31
00:02:48,120 --> 00:02:49,830
So we'll start with GDP growth.

32
00:02:52,770 --> 00:02:53,160
Okay.

33
00:02:53,160 --> 00:02:56,640
So recall that the p value is the second element in this tuple.

34
00:02:57,180 --> 00:03:00,900
Since it's very small, we can conclude that this time series is stationary.

35
00:03:04,160 --> 00:03:07,340
The next step is to perform the ADF test on the term spread.

36
00:03:10,830 --> 00:03:14,310
So again, we get a small p value, but not as small as before.

37
00:03:15,030 --> 00:03:19,800
However, even with the 1% significance threshold, we would consider this stationary.

38
00:03:20,250 --> 00:03:21,810
So it's probably not necessary

39
00:03:21,810 --> 00:03:22,710
to difference it.
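
A small decision helper, assuming the result tuple comes from `statsmodels.tsa.stattools.adfuller`, whose second element (index 1) is the p-value. The ADF null hypothesis is that the series has a unit root, so a p-value below the chosen threshold lets us treat the series as stationary. `is_stationary` is a hypothetical name.

```python
def is_stationary(adf_result, alpha=0.05):
    """Decide from an adfuller-style result tuple.

    Index 1 holds the p-value. The null hypothesis is a unit root
    (non-stationary), so p < alpha rejects it."""
    p_value = adf_result[1]
    return p_value < alpha
```

For example, `is_stationary(adfuller(df['term spread']), alpha=0.01)` would apply the stricter 1% threshold mentioned above. The column name is an assumption.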

40
00:03:25,640 --> 00:03:27,500
The next step is to set p and q.

41
00:03:28,100 --> 00:03:30,800
Note that I've chosen these values somewhat arbitrarily.

42
00:03:31,190 --> 00:03:36,440
I've chosen a high value for p, since the PACF for the term spread suggests a large p.

43
00:03:39,730 --> 00:03:43,300
The next step is to fit our VARMAX model, which is the same as before.

44
00:03:49,740 --> 00:03:55,050
So notice that this model trains a bit faster than before, but still not what we would consider fast.

45
00:03:58,610 --> 00:04:03,590
The next step is to call get_forecast to get the forecasts for the Ntest test time steps.

46
00:04:07,040 --> 00:04:11,720
The next step is to store the train and test predictions in our DataFrame and to plot the results.

47
00:04:12,290 --> 00:04:15,590
Since this code is the same as before, I won't explain it again.

48
00:04:20,210 --> 00:04:20,600
Okay.

49
00:04:20,600 --> 00:04:25,880
So notice that both the train and test fit are not particularly good, at least compared to the previous

50
00:04:25,880 --> 00:04:26,630
example.

51
00:04:27,260 --> 00:04:31,700
It seems to have the most trouble predicting the extreme values even in the train set.

52
00:04:32,480 --> 00:04:36,170
This makes sense since we've seen the same kind of behavior in stock prices.

53
00:04:36,950 --> 00:04:41,870
However, note that the prediction does seem to go in the same direction as the true value.

54
00:04:43,040 --> 00:04:48,860
One way to check whether or not this forecast is useful is to compare it with the naive forecast, that

55
00:04:48,860 --> 00:04:51,890
is, predicting the last known train value.

56
00:04:52,460 --> 00:04:56,840
In this case, without actually plugging in numbers, it would seem that our model is better.
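
Plugging in numbers is straightforward. The sketch below compares the test mean squared error of a model's forecast against the naive forecast that repeats the last training value. `mse`, `naive_forecast`, and `beats_naive` are hypothetical helper names, and MSE is just one reasonable choice of metric.

```python
def mse(y_true, y_pred):
    # Mean squared error over paired values.
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)

def naive_forecast(train, n_steps):
    # Repeat the last known training value for every test step.
    return [train[-1]] * n_steps

def beats_naive(train, test, model_pred):
    """True if the model's test MSE is lower than the naive forecast's."""
    return mse(test, model_pred) < mse(test, naive_forecast(train, len(test)))
```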

57
00:04:59,570 --> 00:05:02,450
The next step is to plot the predictions for the term spread.

58
00:05:03,020 --> 00:05:06,320
Since this is the same code as before, I won't explain it again.

59
00:05:10,810 --> 00:05:15,940
So in this case, notice that the train predictions are pretty good, but maybe lagging by one step.

60
00:05:16,480 --> 00:05:22,660
This kind of behavior is typical when you have a trend with noise, but note that for the test predictions

61
00:05:22,870 --> 00:05:24,430
we still see a similar thing.

62
00:05:25,270 --> 00:05:30,850
However, this is not a case of lagging behind by one step because remember that the model doesn't know

63
00:05:30,850 --> 00:05:32,470
anything about the values in the test set.

64
00:05:33,520 --> 00:05:38,260
In this case, the model really is forecasting this curve, only having seen the train set.

65
00:05:38,920 --> 00:05:43,600
So it's interesting that the model is able to capture the pattern despite it being off by a step or

66
00:05:43,600 --> 00:05:44,090
two.

67
00:05:47,150 --> 00:05:52,280
In the next block, we're going to compute the R squared for both the train and test set for each column.

68
00:05:56,500 --> 00:05:56,880
Okay.

69
00:05:56,890 --> 00:06:02,290
So as expected, the train R squared for GDP growth is pretty low, and the test R squared is even

70
00:06:02,290 --> 00:06:02,800
lower.

71
00:06:03,790 --> 00:06:08,650
However, note that it is above zero, meaning that it's still better than predicting the average test

72
00:06:08,650 --> 00:06:11,320
value. For the term spread,

73
00:06:11,410 --> 00:06:16,270
the train R squared is high, which makes sense because of the strong correlation with lags we saw in

74
00:06:16,270 --> 00:06:17,140
the ACF.

75
00:06:17,740 --> 00:06:19,930
However, the test R-squared is low.
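
The interpretation of R-squared used here follows from its definition. The denominator is the squared error of always predicting the mean of the true values, so R-squared is 0 for that constant prediction, positive when the model does better, and negative when it does worse. A plain-Python sketch, intended to match scikit-learn's `r2_score` for non-constant targets:

```python
from statistics import fmean

def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot, where SS_tot is the squared error of
    # always predicting the mean of y_true.
    mean = fmean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot
```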

76
00:06:23,090 --> 00:06:26,570
The next step is to try some other models, such as VAR and ARIMA.

77
00:06:27,080 --> 00:06:29,210
So we'll start with creating a VAR object.

78
00:06:33,430 --> 00:06:36,010
The next step is to call the select_order function.

79
00:06:38,930 --> 00:06:45,620
We can see that if we were to use the AIC, we would select a lag order of ten, which is 2.5 years.

80
00:06:47,810 --> 00:06:49,520
The next step is to fit our model.

81
00:06:52,980 --> 00:06:56,550
The next step is to get the lag order of our model, which we know is ten.

82
00:06:57,600 --> 00:07:03,060
Recall that we need this value in order to select the right number of rows for the prior time series.

83
00:07:06,390 --> 00:07:12,300
The next step is to grab the prior time series from the train set and then we call forecast to predict

84
00:07:12,300 --> 00:07:14,010
Ntest time steps ahead.
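
To see why the lag order matters for forecasting, here is the standard VAR recursion in plain Python. Each new vector is an intercept plus each coefficient matrix times the corresponding lagged vector, so only the last `p` observed rows are needed to start. This is a sketch of the math, not statsmodels' implementation. In statsmodels the corresponding call would be along the lines of `results.forecast(train_values[-results.k_ar:], steps=Ntest)`, where `train_values` is an assumed name for the training array. `var_forecast` is a hypothetical helper.

```python
def var_forecast(intercept, coefs, prior, steps):
    """Iterate y_t = c + sum_i A_i y_{t-i} forward `steps` times.

    intercept: list of length k (the vector c)
    coefs: list of p matrices A_1..A_p, each k x k (lists of rows)
    prior: at least the last p observations, oldest first
    """
    p = len(coefs)
    history = [list(row) for row in prior[-p:]]
    forecasts = []
    for _ in range(steps):
        y = list(intercept)
        for i, A in enumerate(coefs, start=1):
            lagged = history[-i]  # y_{t-i}
            for r in range(len(y)):
                y[r] += sum(A[r][j] * lagged[j] for j in range(len(lagged)))
        forecasts.append(y)
        history.append(y)  # feed the forecast back in as the newest lag
    return forecasts

# Two series, one lag: each series decays halfway toward its intercept.
c = [1.0, 0.0]
A1 = [[0.5, 0.0], [0.0, 0.5]]
fc = var_forecast(c, [A1], prior=[[2.0, 4.0]], steps=2)
```

Because forecasts are fed back in as lags, errors compound over the horizon, which is one reason multi-step test forecasts tend to be weaker than in-sample fits.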

85
00:07:17,820 --> 00:07:20,850
The next step is to store and plot our predictions for GDP.

86
00:07:25,350 --> 00:07:30,720
So from my point of view, it's difficult to tell whether this is better or worse than what we had before.

87
00:07:31,080 --> 00:07:32,990
But it doesn't look particularly bad.

88
00:07:35,580 --> 00:07:38,730
The next step is to store and plot our predictions for term spread.

89
00:07:42,990 --> 00:07:46,710
So in this case, we can see that our forecast is much worse than before.

90
00:07:47,280 --> 00:07:51,900
When we look at the in-sample predictions, we can tell that they definitely just lag the previous

91
00:07:51,900 --> 00:07:52,440
value.

92
00:07:55,260 --> 00:07:59,730
The next step is to compute the R squared for the GDP for both train and test.

93
00:08:02,790 --> 00:08:07,260
So for the VAR model, we see that for both train and test the R squared is worse.

94
00:08:07,710 --> 00:08:11,290
This means that VAR does not beat VARMA, at least for GDP,

95
00:08:11,310 --> 00:08:13,290
with the hyperparameters we used.

96
00:08:16,040 --> 00:08:20,420
The next step is to compute the R squared for the term spread for both train and test.

97
00:08:23,720 --> 00:08:27,620
So again, we see that for both train and test, the R squared is worse.

98
00:08:28,280 --> 00:08:32,770
Again, we can conclude that VARMA is better than VAR, at least for the orders we selected.

99
00:08:36,460 --> 00:08:39,880
The final step in this lecture is to check our ARIMA baseline.

100
00:08:40,450 --> 00:08:44,230
Since this code is essentially the same as before, I won't explain it again.

101
00:08:44,980 --> 00:08:48,010
Note that we are using the same p and q as our VARMA model,

102
00:08:48,160 --> 00:08:49,570
for a fairer comparison.

103
00:08:55,780 --> 00:08:59,140
So this time we see that ARIMA does not outperform VARMA.

104
00:08:59,950 --> 00:09:04,960
The R squared is worse for both train and test for both GDP and term spread.

105
00:09:06,040 --> 00:09:11,050
In this case, we've seen that VARMA is useful and that there could be some predictive value in looking

106
00:09:11,050 --> 00:09:12,730
across the two time series.
