1
00:00:11,070 --> 00:00:15,450
In this lecture, we are going to look at how to work with the simple moving average in code.

2
00:00:16,080 --> 00:00:19,350
This lecture is going to walk you through a pre-prepared Colab notebook.

3
00:00:19,590 --> 00:00:25,260
Although a very good exercise, which I always recommend, is once you know how this is done, to try

4
00:00:25,260 --> 00:00:28,650
and recreate it yourself with as few references as possible.

5
00:00:29,220 --> 00:00:34,890
As always, you can check the lectures "How to Code by Yourself" and "How to Practice" for a more in-depth

6
00:00:34,890 --> 00:00:35,610
discussion.

7
00:00:36,210 --> 00:00:41,580
If there's anything in this lecture you didn't understand, or you think I missed a step or didn't explain

8
00:00:41,580 --> 00:00:44,880
why we were doing something, please use the Q&A to inquire.

9
00:00:45,510 --> 00:00:50,580
As always, you can look at the title of the notebook to determine what notebook we are currently looking

10
00:00:50,580 --> 00:00:50,940
at.

11
00:00:52,440 --> 00:00:56,760
So let's start by downloading the file sp500_close.csv.

12
00:00:57,270 --> 00:01:00,930
As you recall, this is our CSV of close prices only.

13
00:01:01,770 --> 00:01:06,360
Next, we're going to import our usual libraries: pandas, matplotlib, and NumPy.

14
00:01:08,950 --> 00:01:13,390
Next, we read in the CSV as a data frame using pd.read_csv.

15
00:01:16,500 --> 00:01:21,000
Next, we grab the close prices for Google. Since we're going to drop the NaN values,

16
00:01:21,150 --> 00:01:22,800
we want to make a copy first.

17
00:01:25,080 --> 00:01:28,350
We'll assign this to the variable goog.

18
00:01:28,350 --> 00:01:33,480
Next, we'll do a goog.head() as a sanity check to see what's in our data frame.

19
00:01:37,950 --> 00:01:42,660
Next, we'll call goog.plot() to see Google's stock prices as a time series.

20
00:01:46,560 --> 00:01:52,200
Next, we'll calculate the log returns for Google by calling the pct_change function, adding one,

21
00:01:52,200 --> 00:01:53,580
and then taking the log.
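A minimal sketch of this step, assuming a made-up price series named prices (the notebook uses the real close prices instead):
```python
import numpy as np
import pandas as pd
# Hypothetical close prices; the notebook reads real ones from a CSV.
prices = pd.Series([100.0, 102.0, 101.0, 105.0], name="close")
# pct_change gives simple returns; adding 1 and taking the log gives log returns.
log_ret = np.log(prices.pct_change() + 1)
# Equivalent form: the difference of consecutive log prices.
alt = np.log(prices).diff()
print(log_ret)
```
The first value is NaN because there is no previous price to compare against.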

22
00:01:57,580 --> 00:02:00,340
Next, we plot the log returns as a time series.

23
00:02:05,640 --> 00:02:08,460
Next, we're going to test our moving average in code.

24
00:02:09,330 --> 00:02:13,170
So on the right side, you'll see that I'm going to grab the GOOG column.

25
00:02:13,590 --> 00:02:17,790
Then I call the rolling function, passing in 10 for the window size.

26
00:02:18,240 --> 00:02:21,720
Then I call the mean function to say that I want the rolling mean.

27
00:02:22,530 --> 00:02:26,340
Finally, I assign this to a new column called SMA-10.

28
00:02:27,390 --> 00:02:32,850
Next we call the head function, with the argument 20 to say that we want to see the first 20 rows of

29
00:02:32,850 --> 00:02:33,900
our new data frame.

30
00:02:39,310 --> 00:02:43,460
As you can see, the first nine rows of SMA-10 are NaN.

31
00:02:43,930 --> 00:02:48,490
This is because no rolling window of size ten exists for these elements.

32
00:02:49,000 --> 00:02:54,700
We can see that the first value in SMA-10 shows up in the 10th row, as expected.

33
00:02:55,270 --> 00:03:00,580
You might want to do a manual test calculation to ensure that these values are what you expect.

34
00:03:01,120 --> 00:03:03,460
That is, calculate these values by hand.
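One way to sketch such a check, assuming made-up prices 1.0 through 20.0 in place of the real GOOG column:
```python
import numpy as np
import pandas as pd
# Made-up prices 1.0 .. 20.0 stand in for the real close prices.
df = pd.DataFrame({"GOOG": np.arange(1.0, 21.0)})
df["SMA-10"] = df["GOOG"].rolling(10).mean()
# The first nine rows have no full window of ten values, so they are NaN.
n_missing = int(df["SMA-10"].isna().sum())
# Manual check: the 10th row should equal the plain mean of rows 1 to 10.
manual = df["GOOG"].iloc[:10].mean()
print(n_missing, df["SMA-10"].iloc[9], manual)
```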

35
00:03:06,710 --> 00:03:12,370
Next I do another little sanity check to see what type of object is returned by the rolling function.

36
00:03:14,830 --> 00:03:17,800
As you can see, it's an object of type Rolling.

37
00:03:18,490 --> 00:03:23,650
This is always useful: if you need to look up the documentation, you know exactly which page to

38
00:03:23,650 --> 00:03:24,280
go to.

39
00:03:26,140 --> 00:03:32,590
Next, I call goog.plot() again so that we can see our new column plotted against the original time series.

41
00:03:36,910 --> 00:03:41,650
As you can see, it's basically a smoother version of our original time series.

42
00:03:42,100 --> 00:03:47,620
You might want to zoom in on your own computer or just select a few rows to see this more clearly.

43
00:03:52,580 --> 00:03:57,710
Next, we're going to calculate another simple moving average, but this time with a window size of

44
00:03:57,710 --> 00:03:58,310
50.

45
00:04:00,320 --> 00:04:05,750
As you can see, this is the exact same code as before, except with 50 instead of ten.

46
00:04:06,650 --> 00:04:09,650
When we plot this, we see something very interesting.

47
00:04:10,850 --> 00:04:12,500
First, SMA-50,

48
00:04:12,710 --> 00:04:17,750
by using more values in its sample mean, is even smoother than SMA-10.

49
00:04:18,380 --> 00:04:24,200
So the lesson is that the more values you use, the smoother the resulting time series will be.

50
00:04:24,950 --> 00:04:29,030
Second, you will see this lagging characteristic in the time series.

51
00:04:29,480 --> 00:04:34,970
That is, it appears as if the simple moving average lags behind the original time series.

52
00:04:35,930 --> 00:04:41,090
If you try different window sizes, you'll see that this effect becomes more and more pronounced as

53
00:04:41,090 --> 00:04:43,040
the window size gets larger and larger.
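To make the lag concrete, here is a small sketch on a made-up, steadily rising series (not the stock data), where the gap between the series and its moving average can be read off directly:
```python
import numpy as np
import pandas as pd
# A made-up straight-line series: the value at step t is simply t.
x = pd.Series(np.arange(200, dtype=float))
# Gap between the latest value and its SMA, for two window sizes.
lags = {w: (x - x.rolling(w).mean()).iloc[-1] for w in (10, 50)}
# On a straight line the SMA trails the series by (window - 1) / 2 steps.
print(lags)
```
On this toy series the gap grows with the window size, which matches what the plot suggests for the real prices.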

54
00:04:49,520 --> 00:04:52,940
Next, we're going to work with a multidimensional time series.

55
00:04:53,360 --> 00:04:57,590
Let's create a new data frame by using both Google and Apple stocks.

56
00:04:58,040 --> 00:05:01,280
Again, we make a copy and drop any NaN values.

57
00:05:01,640 --> 00:05:04,340
We'll call this data frame goog_aapl.

58
00:05:06,270 --> 00:05:07,230
On the next line,

59
00:05:07,230 --> 00:05:12,180
you can see that after calling the rolling function, I have other options than just the mean.

60
00:05:12,990 --> 00:05:17,430
Since we now have two columns of data, I can calculate the covariance.

61
00:05:23,380 --> 00:05:26,350
As you can see, the result of this is pretty strange.

62
00:05:26,950 --> 00:05:31,660
On the left-hand side of this data frame, we can see a sort of multilevel index.

63
00:05:32,140 --> 00:05:37,600
At the top level is just the date, but there's a sub level where we get to choose either Apple or Google.

64
00:05:38,230 --> 00:05:44,020
So if you were to index this data frame by date, you would get a two by two covariance matrix.

65
00:05:44,620 --> 00:05:49,900
This is kind of a weird way of storing what is really a three-dimensional tensor in a two-dimensional

66
00:05:49,900 --> 00:05:50,620
data frame.

67
00:05:55,920 --> 00:06:01,830
Next, I demonstrate how you can select a single row by date and convert it to a NumPy array to get

68
00:06:01,830 --> 00:06:03,780
a single covariance matrix.
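A rough sketch of that selection, assuming synthetic prices and a hypothetical frame named data in place of the notebook's Google and Apple data:
```python
import numpy as np
import pandas as pd
# Synthetic prices for illustration; the notebook uses real GOOG and AAPL data.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=60, freq="D")
walk = 100 + rng.normal(size=(60, 2)).cumsum(axis=0)
data = pd.DataFrame(walk, index=dates, columns=["GOOG", "AAPL"])
# Rolling covariance returns a frame indexed by (date, column name).
cov = data.rolling(50).cov()
# Selecting one date yields a 2x2 block, which converts to a NumPy array.
cov_matrix = cov.loc[dates[-1]].to_numpy()
print(cov_matrix)
```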

69
00:06:08,550 --> 00:06:13,020
Now, as you know, in finance, we like to work with returns rather than prices.

70
00:06:13,380 --> 00:06:18,240
So in the next block, we're going to calculate the log returns for both Apple and Google.

71
00:06:19,020 --> 00:06:21,690
Notice that this is exactly the same code as before,

72
00:06:21,840 --> 00:06:24,780
since it works for one or multiple columns.

73
00:06:31,420 --> 00:06:37,120
In the next block, we're going to calculate the simple moving average for both Apple and Google returns.

74
00:06:41,060 --> 00:06:43,910
Notice that, again, it's just the same code as before,

75
00:06:43,910 --> 00:06:45,470
except repeated twice.

76
00:06:48,300 --> 00:06:52,740
Next, we plot both Google's and Apple's returns along with their moving averages.

77
00:06:58,940 --> 00:07:03,890
It's interesting that there seems to be a pretty strong correlation between their returns, although

78
00:07:03,890 --> 00:07:05,690
there are definitely some outliers.

79
00:07:06,200 --> 00:07:09,050
Recall that we discovered this in an earlier lecture as well.

80
00:07:14,180 --> 00:07:17,540
Next we calculate the rolling covariance of the returns.

81
00:07:17,960 --> 00:07:22,010
To do this, we call rolling(50) and then we call the cov function.

82
00:07:22,550 --> 00:07:27,710
If we do a cov.tail(), we can see the last few rows of the covariance data frame.

83
00:07:31,670 --> 00:07:37,010
Notice that if I don't pass in any arguments into the tail function, it just prints out the last five

84
00:07:37,010 --> 00:07:40,670
rows, which cuts off one of the covariance matrices.

85
00:07:41,030 --> 00:07:42,980
So this is a full covariance matrix.

86
00:07:43,340 --> 00:07:44,960
This is a full covariance matrix.

87
00:07:44,960 --> 00:07:46,640
And this is half of one.

88
00:07:51,370 --> 00:07:55,060
Next we calculate the rolling correlation of the log returns.

89
00:07:55,450 --> 00:07:59,830
To do this, we call rolling(50) again and then we call the corr function.

90
00:08:00,310 --> 00:08:07,060
If we do a corr.tail(), we can see the last few rows of the data frame. I've passed in 16 so that we

91
00:08:07,060 --> 00:08:09,790
see the last eight correlation matrices in full.
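As a sketch of why 16 rows give eight matrices, assuming synthetic returns and a hypothetical frame named returns rather than the real log returns:
```python
import numpy as np
import pandas as pd
# Synthetic log returns for illustration only (not real market data).
rng = np.random.default_rng(1)
dates = pd.date_range("2020-01-01", periods=100, freq="D")
noise = rng.normal(scale=0.01, size=(100, 2))
returns = pd.DataFrame(noise, index=dates, columns=["GOOG", "AAPL"])
corr = returns.rolling(50).corr()
# Each date contributes two rows, so tail(16) shows eight full 2x2 matrices.
last = corr.tail(16)
n_dates = last.index.get_level_values(0).nunique()
print(last)
```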

92
00:08:15,490 --> 00:08:18,580
Notice how the correlation matrix changes over time.

93
00:08:19,360 --> 00:08:24,850
For most of the correlation matrices, we can see that there is a negative correlation between Apple

94
00:08:24,850 --> 00:08:27,970
and Google returns, at least for these time periods.

95
00:08:30,670 --> 00:08:36,270
The final correlation is positive, demonstrating that the correlation can even change direction.
