1
00:00:11,110 --> 00:00:16,300
So in this lecture, we are going to go over a topic that is typically taught in a module on VARMA,

2
00:00:16,720 --> 00:00:22,870
which is Granger causality. Earlier in this course, we were exposed to various hypothesis tests.

3
00:00:23,350 --> 00:00:30,100
This is another. The Granger causality test is a statistical test that checks whether one time series

4
00:00:30,100 --> 00:00:33,550
is useful in forecasting another time series.

5
00:00:34,510 --> 00:00:40,900
Now, one important thing to note about the Granger causality test is it's actually not a test for causality.

6
00:00:41,590 --> 00:00:48,310
When we use the term causality, we typically mean that one thing causes another thing, and this concept

7
00:00:48,310 --> 00:00:52,480
can get pretty philosophical. In this case, despite its name,

8
00:00:52,690 --> 00:00:55,450
we are actually not talking about true causality.

9
00:00:56,140 --> 00:00:58,750
Instead, we take a more mechanical perspective.

10
00:00:59,410 --> 00:01:05,500
Granger causal simply means that one time series can be used in forecasting a different time series.

11
00:01:06,310 --> 00:01:11,080
A predictive relationship does not necessarily imply a causal relationship.

12
00:01:15,880 --> 00:01:17,470
Let's now look at some details.

13
00:01:18,340 --> 00:01:25,630
Suppose that we have two stationary time series, X of T and Y of T. Suppose that we build an autoregressive

14
00:01:25,630 --> 00:01:27,040
model for Y of T.

15
00:01:27,940 --> 00:01:33,880
Now suppose that we build another autoregressive model to predict Y of T, but instead of only using

16
00:01:33,880 --> 00:01:38,170
past values of Y of T, we also use arbitrary lags from X of T.

17
00:01:39,020 --> 00:01:44,680
As you can see, you can pick very specific lags like X of T minus three and X of T minus seven and

18
00:01:44,680 --> 00:01:45,370
so forth.

19
00:01:45,940 --> 00:01:49,570
So this method allows you to test only the lags you choose.

20
00:01:50,530 --> 00:01:56,770
Now what this test essentially does is it checks whether or not the coefficients on the lags of X are

21
00:01:56,770 --> 00:01:58,360
statistically significant.

22
00:01:59,410 --> 00:02:04,090
If they are, then we conclude that X of T Granger causes Y of T.
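
The comparison just described can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not the exact procedure any library uses: the data are simulated, the choice of one lag of Y and the single lag X of T minus three is arbitrary, and the names rss, X_restricted and X_unrestricted are made up for the example. It fits the model with and without the lag of X and forms the usual F statistic from the two residual sums of squares.

```python
import numpy as np

# Simulated data (an assumption for illustration): y depends on its own
# first lag and on x lagged by three steps.
rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
y = np.zeros(n)
for t in range(3, n):
    y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 3] + rng.normal(scale=0.5)

# Align the target with its lagged regressors (the first 3 rows are dropped).
target = y[3:]      # y(t)
y_lag1 = y[2:-1]    # y(t-1)
x_lag3 = x[:-3]     # x(t-3)
const = np.ones_like(target)


def rss(design, response):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    resid = response - design @ coef
    return float(resid @ resid)


# Restricted model: only the past of y. Unrestricted model: adds the lag of x.
X_restricted = np.column_stack([const, y_lag1])
X_unrestricted = np.column_stack([const, y_lag1, x_lag3])

rss_r = rss(X_restricted, target)
rss_u = rss(X_unrestricted, target)

q = 1                                        # number of x lags being tested
dof = len(target) - X_unrestricted.shape[1]  # residual degrees of freedom
f_stat = ((rss_r - rss_u) / q) / (rss_u / dof)
print(f"F statistic for the lag of x: {f_stat:.1f}")
```

A large F statistic, judged against an F distribution with q and dof degrees of freedom, says that the coefficient on the lag of X is significantly different from zero, which is the sense in which X of T Granger causes Y of T.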

23
00:02:08,890 --> 00:02:14,770
As a reminder, recall that in regression analysis, it's always possible to test whether the coefficients

24
00:02:14,770 --> 00:02:17,680
of your model are statistically significant.

25
00:02:18,700 --> 00:02:24,280
Intuitively, what we mean by significant is that the coefficient is far away from zero.

26
00:02:25,300 --> 00:02:31,120
If it is far enough away from zero, then we have more confidence that it truly has an effect on the

27
00:02:31,120 --> 00:02:33,250
dependent variable or the target.

28
00:02:34,330 --> 00:02:39,850
In this lecture, we're essentially doing this, but with VAR models instead of generic linear regression.

29
00:02:44,540 --> 00:02:50,180
Note that there is also a multivariate version of the Granger causality test, but this is not available

30
00:02:50,180 --> 00:02:51,710
in the statsmodels package.

31
00:02:52,670 --> 00:02:55,370
This looks at a VAR model, as we've seen in this section.

32
00:02:56,330 --> 00:03:03,560
Essentially, if we find that A sub tau at row J, column I, for any tau, has an absolute value

33
00:03:03,560 --> 00:03:10,340
which is significantly larger than zero, then we can say that the time series Y sub I Granger causes

34
00:03:10,340 --> 00:03:11,840
the time series Y sub J.

35
00:03:12,620 --> 00:03:14,030
This should be pretty intuitive.

36
00:03:14,720 --> 00:03:21,140
Basically, it means that the coefficient on a lag of Y sub I has an influence on the current value of Y sub J.
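
Written out, the statement above looks roughly like this. The intercept c, the noise term epsilon, the order p and the number of series K are notation assumed here for illustration; only the coefficient matrices A sub tau come from the lecture.

```latex
% VAR(p) model for a K-dimensional series y_t
y_t = c + \sum_{\tau=1}^{p} A_\tau \, y_{t-\tau} + \varepsilon_t

% Row j of the same model: the equation for the single series y_j
y_{j,t} = c_j + \sum_{\tau=1}^{p} \sum_{i=1}^{K} [A_\tau]_{j,i} \, y_{i,t-\tau} + \varepsilon_{j,t}

% y_i Granger-causes y_j if [A_\tau]_{j,i} is significantly
% different from zero for at least one lag \tau.
```

Row J picks out the equation for Y sub J, and column I picks out the coefficient multiplying the lagged Y sub I, which is why that particular entry is the one to check.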

37
00:03:22,640 --> 00:03:26,000
So hopefully this helps to connect this topic to the rest of this section.

38
00:03:26,420 --> 00:03:31,730
But again, note that the multivariate version is not included in the statsmodels package.

39
00:03:36,380 --> 00:03:40,610
Now, let's get to the practical question, which is how do you do this in Python?

40
00:03:41,510 --> 00:03:46,130
Suppose that we have a bivariate time series, which occupies two columns of a DataFrame.

41
00:03:46,850 --> 00:03:52,490
We can then pass this to the function grangercausalitytests, and this will produce a p-value for

42
00:03:52,490 --> 00:03:56,120
every lag you want to test, using several different tests.

43
00:03:56,870 --> 00:04:01,730
Now, in order to understand what these specific tests are and how they work, you would need to take

44
00:04:01,730 --> 00:04:04,100
a graduate course on mathematical statistics.

45
00:04:04,490 --> 00:04:06,980
So that's definitely outside the scope of this course.

46
00:04:08,150 --> 00:04:13,880
These tests will typically produce similar results anyway, so the main thing in your work is to remain

47
00:04:13,880 --> 00:04:14,690
consistent.

48
00:04:15,680 --> 00:04:20,780
As always, you will choose a significance threshold beforehand, and any p-value smaller than this

49
00:04:20,780 --> 00:04:23,000
threshold will be deemed significant.