1
00:00:11,170 --> 00:00:15,820
So in this lecture, we will look at another example of how to use VARMA in Python.

2
00:00:16,570 --> 00:00:20,650
This time we're going to use a data set that is a bit more complicated to work with.

3
00:00:21,250 --> 00:00:26,650
This data set falls into the field of econometrics, which is probably the most common application of

4
00:00:26,650 --> 00:00:29,290
vector autoregression. In this case,

5
00:00:29,290 --> 00:00:34,210
the prevailing wisdom tells us that there should be some temporal correlation between different time

6
00:00:34,210 --> 00:00:34,840
series.

7
00:00:36,400 --> 00:00:41,500
So again, we'll start by updating statsmodels and then importing the same libraries we used before.

8
00:00:47,640 --> 00:00:53,760
The next step is to download the data. Now, one interesting thing about this data set is that

9
00:00:53,760 --> 00:00:59,670
it's pretty common to see in classes on econometrics, but for some reason, statisticians rarely split

10
00:00:59,670 --> 00:01:04,340
their data into train and test sets or evaluate their models on out-of-sample data.

11
00:01:04,920 --> 00:01:09,750
Instead, they tend to generate forecasts of the future, which you cannot even check against the given time

12
00:01:09,750 --> 00:01:10,370
series.

13
00:01:10,800 --> 00:01:15,720
I'm not sure why this is, but that tends to be how statisticians do things. For us,

14
00:01:15,720 --> 00:01:20,570
we will check our forecasts and get a true sense of how accurate our predictions really are.
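
A minimal sketch of the kind of out-of-sample check described here, assuming the data ends up in a pandas DataFrame indexed by quarter. The synthetic series and the hold-out length `n_test = 12` (three years of quarters) are illustrative choices, not the lecture's actual values. The point is that the split is made by time, holding out the most recent observations without shuffling.

```python
import numpy as np
import pandas as pd

# Synthetic quarterly data standing in for the real spreadsheet (illustrative only).
idx = pd.date_range("1960-01-01", periods=40, freq="QS")
df = pd.DataFrame({"y": np.arange(40, dtype=float)}, index=idx)

# Hold out the most recent quarters; a time series should not be shuffled before splitting.
n_test = 12  # hypothetical hold-out length
train = df.iloc[:-n_test]
test = df.iloc[-n_test:]

# Every training timestamp precedes every test timestamp.
print(train.index.max(), test.index.min())
```

A model fit on `train` can then be scored against `test`, which is exactly the check that forecasts into the unknown future cannot provide.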

15
00:01:25,770 --> 00:01:30,120
OK, so the next step is to read in our data file using pd.read_excel.

16
00:01:30,720 --> 00:01:33,780
Note that this is an Excel spreadsheet, not a CSV.

17
00:01:37,830 --> 00:01:41,520
The next step is to call df.head() to see what our data looks like.

18
00:01:46,240 --> 00:01:52,120
OK, so we've got a bunch of time series, but it's not clear what anything is. Since this is not a class

19
00:01:52,120 --> 00:01:53,140
about economics,

20
00:01:53,290 --> 00:01:56,980
we're not going to worry too much about the meaning behind this data.

21
00:01:56,980 --> 00:02:01,480
But you're encouraged to study economics on your own if that's what interests you.

22
00:02:02,200 --> 00:02:03,790
It really is a different topic,

23
00:02:03,790 --> 00:02:05,620
so it won't be of much use here.

24
00:02:06,370 --> 00:02:11,560
At the same time, if you've taken my finance course, you should have some intuition for the data we'll

25
00:02:11,560 --> 00:02:12,190
be using.

26
00:02:13,210 --> 00:02:15,310
So that's helpful, but not necessary.

27
00:02:16,600 --> 00:02:20,200
Also, notice that the date is not in a format we typically use.

28
00:02:20,710 --> 00:02:21,430
In this case,

29
00:02:21,430 --> 00:02:24,990
we have the year followed by the quarter, with a colon in between.

30
00:02:25,810 --> 00:02:28,030
Note that the second number is not in months.

31
00:02:28,180 --> 00:02:30,940
It goes one, two, three, four and then back to one.

32
00:02:31,090 --> 00:02:32,500
So these are quarters.

33
00:02:36,870 --> 00:02:42,930
The next step is to write a function to parse the date column. As before, the goal is to use the apply

34
00:02:42,930 --> 00:02:45,920
function, which will map this function onto each row.

35
00:02:46,890 --> 00:02:50,850
So inside the function, we're going to start by splitting the string on the colon.

36
00:02:51,300 --> 00:02:54,210
So this gives us the year and the quarter as strings.

37
00:02:54,840 --> 00:02:58,500
Now, what we would like to do is convert the quarter into a month.

38
00:02:59,040 --> 00:03:01,710
I encourage you to go through this equation on your own.

39
00:03:01,950 --> 00:03:08,250
But basically, we want a function that will map one to January, two to April, three to July, and four to

40
00:03:08,250 --> 00:03:08,900
October.

41
00:03:09,480 --> 00:03:14,200
So one goes to one, two goes to four, three goes to seven and four goes to ten.

42
00:03:14,910 --> 00:03:16,820
In other words, it's a linear equation.

43
00:03:17,790 --> 00:03:23,550
The intuition here is that the output should go up by three for each quarter, and the offset is one since January is encoded

44
00:03:23,550 --> 00:03:24,130
as one.
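
Written as an equation, with q in {1, 2, 3, 4} the quarter and m the month in which that quarter begins:

```latex
m = 3(q - 1) + 1 = 3q - 2
```

Plugging in q = 1, 2, 3, 4 gives m = 1, 4, 7, 10, which matches the mapping to January, April, July, and October.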

45
00:03:25,170 --> 00:03:27,480
OK, so this equation will give us the month.

46
00:03:28,260 --> 00:03:31,550
The next step is to put this into a string along with the year.

47
00:03:32,250 --> 00:03:37,140
The final step is to convert this into a datetime object using the strptime function.
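
The parsing steps above might look like the following sketch. It assumes the raw values are strings such as "1957:1" (year, colon, quarter); the function name `parse_date` and the sample inputs are illustrative, not necessarily what the lecture's notebook uses.

```python
from datetime import datetime

def parse_date(raw):
    """Convert a 'YYYY:Q' string (e.g. '1957:1') to the first day of that quarter."""
    year, quarter = raw.split(":")            # both pieces are still strings
    month = 3 * (int(quarter) - 1) + 1        # 1 -> 1, 2 -> 4, 3 -> 7, 4 -> 10
    s = f"{year}-{month:02d}"                 # e.g. "1957-04"
    return datetime.strptime(s, "%Y-%m")      # the day defaults to 1

print(parse_date("1957:1"))  # 1957-01-01 00:00:00
print(parse_date("1957:4"))  # 1957-10-01 00:00:00
```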

48
00:03:42,250 --> 00:03:45,700
The next step is to call apply, passing in the function we just made.
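
Applying the parser with pandas' `Series.apply` could look like this. The column name `raw_date`, the three sample rows, and the placeholder values (not real GDP figures) are hypothetical stand-ins for the spreadsheet's actual contents.

```python
from datetime import datetime
import pandas as pd

def parse_date(raw):
    # 'YYYY:Q' -> first day of the quarter (same logic as the parsing step)
    year, quarter = raw.split(":")
    month = 3 * (int(quarter) - 1) + 1
    return datetime.strptime(f"{year}-{month:02d}", "%Y-%m")

# Hypothetical raw data in the spreadsheet's 'YYYY:Q' format.
df = pd.DataFrame({
    "raw_date": ["1957:1", "1957:2", "1957:3"],
    "GDPC96": [1.0, 2.0, 3.0],  # placeholder values
})

# apply() calls parse_date once per element of the column.
df["Date"] = df["raw_date"].apply(parse_date)
print(df)
```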

49
00:03:49,370 --> 00:03:54,350
The next step is to call df.head() again to see that our new column has been created successfully.

50
00:03:59,880 --> 00:04:04,890
OK, so you can see that we now have a date column where the months go up by three as expected.

51
00:04:08,090 --> 00:04:13,820
The next step is to set the date column as the index for the data frame. We'll also drop these unnecessary

52
00:04:13,820 --> 00:04:14,340
columns

53
00:04:14,360 --> 00:04:15,620
that we don't need anymore.
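
A minimal sketch of setting the index and dropping the raw column, assuming the frame already has a parsed `Date` column. The column names `raw_date` and `Date` and the placeholder values are hypothetical.

```python
import pandas as pd

# Hypothetical frame after the apply() step: a parsed Date column plus the raw string column.
df = pd.DataFrame({
    "raw_date": ["1957:1", "1957:2", "1957:3"],
    "Date": pd.to_datetime(["1957-01-01", "1957-04-01", "1957-07-01"]),
    "GDPC96": [1.0, 2.0, 3.0],  # placeholder values, not real GDP
})

df = df.set_index("Date")             # the parsed dates become the index
df = df.drop(columns=["raw_date"])    # the raw string column is no longer needed
print(df)
```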

54
00:04:19,190 --> 00:04:23,210
The next step is to set the frequency of our index, which is quarterly.
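
One way to attach the quarterly frequency, assuming the index holds the first day of each quarter. The pandas alias "QS" means quarter start (anchored on January by default), which matches the months 1, 4, 7, 10 produced by the parser; statsmodels may warn about missing frequency information if this step is skipped. The sample dates and values are placeholders.

```python
import pandas as pd

idx = pd.DatetimeIndex(["1957-01-01", "1957-04-01", "1957-07-01", "1957-10-01"], name="Date")
df = pd.DataFrame({"GDPC96": [1.0, 2.0, 3.0, 4.0]}, index=idx)  # placeholder values

print(df.index.freq)   # None: the frequency is not set automatically here
df.index.freq = "QS"   # quarter start; pandas raises ValueError if the dates don't conform
print(df.index.freq)
```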

55
00:04:27,050 --> 00:04:31,310
The next step is to call df.head() again to check the format of our data frame.

56
00:04:35,780 --> 00:04:40,340
So you can see now that the old columns are gone and we have a new date index.

57
00:04:44,210 --> 00:04:50,480
OK, so the next step is to look at our data. For this example, we'll be looking at GDP and something

58
00:04:50,480 --> 00:04:51,600
called the term spread.

59
00:04:52,040 --> 00:04:54,340
I'll explain what term spread means very shortly.

60
00:04:55,010 --> 00:04:57,820
But first, GDP stands for gross domestic product.

61
00:04:58,310 --> 00:05:04,070
The technical definition is that it's the total market value of the goods and services produced within

62
00:05:04,070 --> 00:05:06,630
a country during a specific time period.

63
00:05:06,920 --> 00:05:08,450
In this case, three months.

64
00:05:09,020 --> 00:05:11,900
Basically, it's a measure of a country's economic health.

65
00:05:12,380 --> 00:05:15,920
This column is stored as GDPC96.

66
00:05:20,460 --> 00:05:25,760
OK, so notice that this column has a very strong trend. In other words, it's non-stationary.

67
00:05:26,310 --> 00:05:31,200
So using what we learned about ARIMA, we know that we should difference this column before applying

68
00:05:31,200 --> 00:05:32,220
any kind of model.

69
00:05:36,460 --> 00:05:41,370
In practice, we're going to take the log difference, which is the same thing we do with stock prices.

70
00:05:41,830 --> 00:05:43,460
So this should be unsurprising.

71
00:05:43,960 --> 00:05:45,730
We'll call this GDP growth.
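
A sketch of the log difference, log(x_t) - log(x_{t-1}), which approximates the quarter-over-quarter growth rate for small changes. The column name `GDPGrowth` is a hypothetical label for "GDP growth", and the values are placeholders chosen to grow by exactly 1% per quarter.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("1957-01-01", periods=4, freq="QS")
df = pd.DataFrame({"GDPC96": [100.0, 101.0, 102.01, 103.0301]}, index=idx)  # placeholder values

# Log difference: take logs, then subtract the previous quarter's value.
df["GDPGrowth"] = np.log(df["GDPC96"]).diff()
print(df)
```

The first growth value is NaN because there is no previous quarter, so it would typically be dropped before fitting a model.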

72
00:05:50,430 --> 00:05:52,650
The next step is to compute the term spread.

73
00:05:53,160 --> 00:05:55,170
Now, this one is a bit more complex.

74
00:05:55,710 --> 00:06:01,140
To understand this, you should be familiar with fixed-rate investments like Treasury bills, or whatever

75
00:06:01,140 --> 00:06:02,400
the equivalent is in your country.

76
00:06:03,330 --> 00:06:09,120
So if you go to your bank's website and look up fixed-rate investments you can buy, that would be the

77
00:06:09,120 --> 00:06:10,320
equivalent of this.

78
00:06:10,950 --> 00:06:12,420
OK, so what is the term spread?

79
00:06:12,990 --> 00:06:17,230
The spread is the difference between the interest rates of a long-term investment and a short-term investment.

80
00:06:17,610 --> 00:06:20,950
In this case, a 10-year investment and a three-month investment.
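
A sketch of the term spread as the 10-year rate minus the three-month rate. The column names `GS10` (10-year Treasury yield) and `TB3MS` (three-month Treasury bill rate) follow FRED naming conventions but are an assumption about this spreadsheet; `TSpread` is a hypothetical name, and the numbers are placeholders in percent.

```python
import pandas as pd

idx = pd.date_range("1957-01-01", periods=3, freq="QS")
df = pd.DataFrame(
    {"GS10": [3.5, 3.6, 3.9], "TB3MS": [3.1, 3.2, 3.4]},  # placeholder rates, in percent
    index=idx,
)

# Term spread = long-term rate minus short-term rate, in percentage points.
df["TSpread"] = df["GS10"] - df["TB3MS"]
print(df["TSpread"])
```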

81
00:06:21,870 --> 00:06:27,270
So, again, if you go to your bank's website and look up the fixed-rate investments, you should notice

82
00:06:27,270 --> 00:06:28,070
this pattern.

83
00:06:29,010 --> 00:06:34,890
The pattern is that short-term investments usually have lower interest rates than long-term investments.

84
00:06:35,490 --> 00:06:40,710
This makes sense because you should be rewarded for giving up your money for longer periods of time.

85
00:06:41,310 --> 00:06:44,760
Now, the difference between these two rates fluctuates over time.

86
00:06:45,180 --> 00:06:50,820
So sometimes you'll see that the long-term investment offers a much higher rate, but sometimes you'll

87
00:06:50,820 --> 00:06:52,260
see that they are nearly the same.

88
00:06:53,280 --> 00:06:58,440
Now, again, I don't want to get into economics, but this is important for people who care about things

89
00:06:58,440 --> 00:07:00,750
like interest rates. For us,

90
00:07:00,780 --> 00:07:04,500
the interesting part is that these two time series may be related.

91
00:07:08,410 --> 00:07:10,630
The next step will be to plot the term spread.

92
00:07:15,680 --> 00:07:21,650
OK, so notice that it's not quite stationary, but there isn't any upward or downward trend, so we

93
00:07:21,650 --> 00:07:27,950
might be justified in leaving this as is. Notice that there seems to be some cyclical pattern, although

94
00:07:27,950 --> 00:07:30,230
it's not as strong as something like a sine wave.

95
00:07:32,980 --> 00:07:37,570
So at this point, since we now have a good understanding of this data and we've seen what it looks

96
00:07:37,570 --> 00:07:41,360
like, this is a good time to stop. In the next lecture,

97
00:07:41,380 --> 00:07:46,260
we'll continue with the usual steps of fitting the model, making forecasts, and so forth.
