1
00:00:11,030 --> 00:00:17,420
In this lecture, we are going to be looking at stationarity in code. Specifically, we'll be applying the

2
00:00:17,420 --> 00:00:23,210
augmented Dickey-Fuller test to a variety of data sets, and we'll learn whether or not it actually fits

3
00:00:23,210 --> 00:00:26,410
our intuition about what is stationary and what is not.

4
00:00:27,450 --> 00:00:32,670
This lecture is going to walk you through a prepared Colab notebook, although a very good exercise,

5
00:00:32,670 --> 00:00:38,100
which I always recommend, is once you know how this is done, to try and recreate it yourself with

6
00:00:38,100 --> 00:00:39,910
as few references as possible.

7
00:00:40,440 --> 00:00:46,110
As always, you can check the lectures "How to Code by Yourself" and "How to Practice" for a more in-depth

8
00:00:46,110 --> 00:00:46,890
discussion.

9
00:00:47,370 --> 00:00:52,710
If there's anything in this lecture you didn't understand or you think I missed a step or didn't explain

10
00:00:52,710 --> 00:00:55,940
why we were doing something, please use the Q&A to inquire.

11
00:00:56,550 --> 00:01:01,560
As usual, you can look at the title of the notebook to determine what notebook we are currently looking

12
00:01:01,560 --> 00:01:01,870
at.

13
00:01:03,380 --> 00:01:09,920
OK, so let's start by importing pandas, NumPy, Matplotlib and the function adfuller from

14
00:01:09,920 --> 00:01:10,580
statsmodels.

15
00:01:14,130 --> 00:01:17,310
Next, we're going to download our airline passengers dataset.

16
00:01:21,840 --> 00:01:25,980
Next, we're going to load in our dataset using pd.read_csv.

17
00:01:28,930 --> 00:01:30,820
Next, we're going to plot our dataset.

18
00:01:32,700 --> 00:01:38,190
So this will help remind you that this data set is not stationary: there's both a trend and a seasonal

19
00:01:38,190 --> 00:01:39,860
component to this data.

20
00:01:43,860 --> 00:01:48,360
Next, we're going to test out the adfuller function on the passengers column of df.

21
00:01:51,480 --> 00:01:54,790
OK, so as you can see, we get a big table of numbers.

22
00:01:55,020 --> 00:01:56,280
What do these numbers mean?

23
00:01:56,850 --> 00:01:59,690
Well, we can check the documentation for statsmodels.

24
00:02:00,270 --> 00:02:05,600
The two key pieces of information we are interested in are the test statistic and the p-value.

25
00:02:06,060 --> 00:02:11,800
Mostly the p-value. As you know, the test statistic is used to compute the p-value.

26
00:02:12,390 --> 00:02:15,480
Note that these are the first two return values in the tuple.

27
00:02:21,460 --> 00:02:27,040
So next, we're going to write a little helper function called adf. This will help us more easily check

28
00:02:27,040 --> 00:02:31,860
the results of the ADF test rather than trying to interpret a tuple of numbers.

29
00:02:32,380 --> 00:02:34,840
This function takes in a series called x.

30
00:02:35,440 --> 00:02:41,620
Inside the function, we run the adfuller function on x. Next, we print the test statistic and the

31
00:02:41,620 --> 00:02:42,100
p-value.

32
00:02:42,790 --> 00:02:48,800
As is typical, we'll use a significance threshold of five percent. If the p-value is significant,

33
00:02:49,000 --> 00:02:50,410
we'll print out "stationary".

34
00:02:50,620 --> 00:02:53,080
Otherwise, we'll print out "non-stationary".

35
00:02:58,440 --> 00:03:02,610
OK, so let's test our function on the df passengers column once again.

36
00:03:05,600 --> 00:03:10,800
As you can see, the p-value is quite large and we do not reject the null hypothesis.

37
00:03:11,210 --> 00:03:15,230
In other words, we conclude that this time series is non-stationary.

38
00:03:15,730 --> 00:03:18,170
Hopefully this is the result you anticipated.

39
00:03:21,430 --> 00:03:26,080
Next, let's test whether or not this function works for an actual stationary signal.

40
00:03:26,650 --> 00:03:33,730
So this will be IID noise sampled from the standard normal. Note that the signal is strong-sense stationary

41
00:03:33,730 --> 00:03:37,260
because the entire distribution remains constant over time.

42
00:03:40,850 --> 00:03:47,330
All right, and we get a very small p-value on the order of 10 to the minus 18. Therefore, we conclude

43
00:03:47,330 --> 00:03:50,360
that the signal is stationary as expected.

44
00:03:53,750 --> 00:03:59,120
Now, everyone knows about the normal distribution, but what if we try a more exotic distribution like

45
00:03:59,120 --> 00:03:59,860
the gamma?

46
00:04:00,410 --> 00:04:06,080
Again, theoretically, we know that the signal is strong-sense stationary because the distribution

47
00:04:06,140 --> 00:04:09,530
does not change over time and each sample is IID.

48
00:04:13,650 --> 00:04:18,850
OK, and we get another very small p-value on the order of 10 to the minus 16.

49
00:04:19,350 --> 00:04:22,290
Again, we conclude that this signal is stationary.

50
00:04:23,640 --> 00:04:28,230
And by the way, if you don't know what the gamma distribution is, I'd recommend plotting this time

51
00:04:28,230 --> 00:04:31,380
series and also plotting a histogram of these samples.

52
00:04:31,920 --> 00:04:36,420
The gamma is a very important distribution, especially in Bayesian machine learning.

53
00:04:40,330 --> 00:04:45,490
Next, we're going to take the log of the passengers column. We'll be making use of this over the next

54
00:04:45,490 --> 00:04:46,590
few experiments.

55
00:04:49,150 --> 00:04:52,450
So next, let's run a test on the log passengers column.

56
00:04:55,030 --> 00:04:58,720
Not surprisingly, the result is that it's still non-stationary.

57
00:05:02,220 --> 00:05:04,740
Next, let's difference the passengers column.

58
00:05:06,410 --> 00:05:13,220
Remember that this is what we would like to fit an ARIMA to. So if we plot this column, we

59
00:05:13,220 --> 00:05:16,820
see that the variance of the signal appears to increase over time.

60
00:05:17,270 --> 00:05:20,720
We can therefore guess that the signal is non-stationary.

61
00:05:24,550 --> 00:05:30,160
All right, so let's run the ADF test on the first difference of the passengers column. Note that

62
00:05:30,160 --> 00:05:35,770
we have to drop NaNs because whenever we take the first difference, we always have one NaN value in

63
00:05:35,770 --> 00:05:36,690
the first row.
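
To see why the NaN has to be dropped, here is a small sketch on a made-up series. The data frame below is a hypothetical stand-in for the airline data (an exponential trend), and the column names `Passengers` and `Diff` are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the airline passengers data.
t = np.arange(48)
df = pd.DataFrame({"Passengers": 100.0 * np.exp(0.02 * t)})

# First difference: x[t] - x[t-1]. The first row has no predecessor,
# so pandas fills it with NaN.
df["Diff"] = df["Passengers"].diff()
print(df["Diff"].head())

# Drop the NaN before passing the series to the ADF test.
diff_clean = df["Diff"].dropna()
print(len(df), len(diff_clean))
```

The cleaned series is exactly one row shorter than the original.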

64
00:05:40,080 --> 00:05:44,320
OK, so, again, we end up saying that the signal is non-stationary.

65
00:05:44,760 --> 00:05:47,670
But notice how close the p-value is to five percent.

66
00:05:48,300 --> 00:05:54,270
If you had set your significance threshold to a higher number, like 10 percent, you would have concluded

67
00:05:54,270 --> 00:05:56,520
that the above signal is stationary.

68
00:05:59,320 --> 00:06:05,290
OK, so next, we're going to take the first difference of the log passengers. We'll store this in a column

69
00:06:05,290 --> 00:06:06,640
called diff log.

70
00:06:10,490 --> 00:06:16,280
If we plot the diff log, we notice that unlike the first difference of the raw passengers, the variance

71
00:06:16,280 --> 00:06:18,130
here does not increase over time.

72
00:06:18,800 --> 00:06:24,620
Perhaps then this signal is more stationary than just the raw first difference. Of course,

73
00:06:24,650 --> 00:06:26,090
we still have to run our test.

74
00:06:27,960 --> 00:06:34,800
So next, we run the ADF test on the diff log column. Again, recall that we have to drop the NaN values.

75
00:06:39,140 --> 00:06:44,660
All right, so surprisingly, we get a higher p-value than we did when the variance was increasing.

76
00:06:49,230 --> 00:06:55,380
OK, so this is a good opportunity to remember that we care about stock prices in this course, so let's

77
00:06:55,380 --> 00:06:57,750
download the S&P 500 CSV.

78
00:07:03,630 --> 00:07:11,100
Next, we're going to load in the S&P 500 CSV using pd.read_csv. We'll assign this data

79
00:07:11,100 --> 00:07:13,230
frame to a variable called stocks.

80
00:07:15,340 --> 00:07:19,740
Next, we do a stocks.head() just to remind you of what's inside this data frame.

81
00:07:20,530 --> 00:07:26,380
As you can see, the final column is the name column, which can be used to filter out any particular

82
00:07:26,380 --> 00:07:27,130
stock ticker.

83
00:07:30,410 --> 00:07:35,450
Next, we're going to grab all the rows where the name is equal to GOOG and we're going to grab the

84
00:07:35,450 --> 00:07:36,380
close column.

85
00:07:38,820 --> 00:07:42,750
Next, we're going to take the log of the close column to get the log price.

86
00:07:45,120 --> 00:07:49,710
Next, we're going to take the first difference of the log price column to get the log return.
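
The filtering and log-return steps above can be sketched on a tiny made-up data frame. The prices are invented, and the column names `Name`, `close`, `LogPrice` and `LogRet` are assumptions standing in for whatever the real CSV and notebook use.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame standing in for the S&P 500 CSV.
stocks = pd.DataFrame({
    "close": [100.0, 110.0, 50.0, 121.0, 55.0],
    "Name": ["GOOG", "GOOG", "SBUX", "GOOG", "SBUX"],
})

# Keep only the rows for one ticker, and only the close column.
goog = stocks[stocks["Name"] == "GOOG"][["close"]].copy()

# Log price, then its first difference, which is the log return.
goog["LogPrice"] = np.log(goog["close"])
goog["LogRet"] = goog["LogPrice"].diff()
print(goog)
```

Each log return equals log(close[t] / close[t-1]), and the first row is NaN for the same reason as with any first difference.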

87
00:07:52,100 --> 00:07:55,280
Next, we're going to plot the log price as a time series.

88
00:07:57,580 --> 00:08:00,040
As you can see, there is a clear trend.

89
00:08:02,770 --> 00:08:04,930
Next, we're going to plot the log return.

90
00:08:06,890 --> 00:08:12,040
So this looks pretty stationary, although the variance does seem to increase in some places.

91
00:08:15,770 --> 00:08:18,950
OK, so let's try the ADF test on the log price.

92
00:08:21,950 --> 00:08:26,300
As expected, we conclude that the log price is not stationary.

93
00:08:28,920 --> 00:08:32,190
Next, let's run the ADF test on the log returns.

94
00:08:34,960 --> 00:08:39,960
So the log returns are so stationary that the p-value just gets rounded down to zero.

95
00:08:43,720 --> 00:08:49,660
Now, just for good measure, let's test a different stock. Let's use our other favorite stock, Starbucks.

96
00:08:50,200 --> 00:08:54,010
Again, we use the same code to calculate the log price and the log return.

97
00:08:57,150 --> 00:08:58,860
Next, we plot the log price.

98
00:09:01,020 --> 00:09:03,120
Again, we see a pretty clear trend.

99
00:09:07,330 --> 00:09:13,570
Next, we plot the log return. Again, it looks pretty stationary, except that the variance seems to

100
00:09:13,570 --> 00:09:14,590
not be constant.

101
00:09:17,870 --> 00:09:21,500
So let's run the ADF test on the Starbucks log price.

102
00:09:23,090 --> 00:09:25,490
As expected, it's non-stationary.

103
00:09:28,770 --> 00:09:32,400
Next, let's run the ADF test on the Starbucks log return.

104
00:09:35,410 --> 00:09:40,750
All right, so just like the Google log returns, the p-value is so small that it gets rounded down

105
00:09:40,750 --> 00:09:41,420
to zero.

106
00:09:42,040 --> 00:09:45,610
Therefore, we conclude that these log returns are stationary.
