1
00:00:11,110 --> 00:00:17,650
In this lecture, we're going to discuss a contemporary application of the simple moving average. Since

2
00:00:17,650 --> 00:00:22,930
this is still fresh in our minds, I think it's a great opportunity to see how basic concepts like these

3
00:00:23,140 --> 00:00:24,690
get applied in the real world.

4
00:00:25,420 --> 00:00:32,290
Take COVID-19. As you know, many data sources use a seven-day rolling average to report the trend of

5
00:00:32,290 --> 00:00:33,340
new cases.

6
00:00:33,970 --> 00:00:38,620
Using what you've learned in this section, you have a kind of insider view of the pros and cons of

7
00:00:38,620 --> 00:00:39,490
this approach.

8
00:00:40,630 --> 00:00:42,650
On one hand, this is very useful.

9
00:00:43,120 --> 00:00:46,360
We've now seen real-world sources of statistical noise.

10
00:00:46,720 --> 00:00:51,300
For example, sometimes there are data entry errors that need to be corrected later.

11
00:00:52,180 --> 00:00:53,820
This will skew the case counts.

12
00:00:54,370 --> 00:01:00,100
Maybe they'll be undercounted one day and then overcounted another day to correct for the previous undercounting.

13
00:01:00,910 --> 00:01:06,490
Another example is when specific locations don't always report their counts promptly to whatever

14
00:01:06,490 --> 00:01:09,530
central authority is providing the counts to the public.

15
00:01:10,300 --> 00:01:13,210
In this case, again, this will skew the case counts.

16
00:01:13,630 --> 00:01:18,910
The counts might be undercounted one day because some locations forgot to send their counts and then

17
00:01:18,910 --> 00:01:22,430
overcounted the next day when the backlogged counts are included.

18
00:01:23,020 --> 00:01:26,580
We've already seen how counts from two different sources can disagree.

19
00:01:27,280 --> 00:01:30,190
You'd think a simple task like counting would be error free.

20
00:01:30,400 --> 00:01:32,770
But these are the complexities of the real world.

21
00:01:33,460 --> 00:01:38,930
We've also seen situations where incompetent politicians direct people to simply lie.

22
00:01:39,490 --> 00:01:41,710
In this case, again, the counts are inaccurate.

23
00:01:42,580 --> 00:01:47,500
Now, simple moving averages will not help you in this situation, but they do help you when the counts

24
00:01:47,500 --> 00:01:49,270
are simply delayed or backlogged.
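To see why backlogs wash out, here is a minimal Python sketch using hypothetical numbers (not real case data): a steady 100 cases per day, except that 40 cases reported late slip from one day into the next. The helper `rolling_mean` is an illustrative name, not a library function; it computes a trailing seven-day simple moving average.

```python
def rolling_mean(values, window=7):
    """Trailing simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


# Hypothetical data: a true, steady 100 cases per day for two weeks.
true_counts = [100] * 14

# One location reports late: 40 cases from index 7 show up at index 8 instead.
reported = true_counts.copy()
reported[7] -= 40  # undercounted day (60)
reported[8] += 40  # catch-up day (140)

raw_swing = max(reported) - min(reported)
smoothed = [a for a in rolling_mean(reported) if a is not None]
avg_swing = max(smoothed) - min(smoothed)

print(raw_swing)           # 80
print(f"{avg_swing:.2f}")  # 5.71
```

Once the under-reported day and the catch-up day both fall inside the same seven-day window, the backlog cancels out exactly. The average is only off (by 40/7, about 5.7) on the single day whose window sees the shortfall but not yet the catch-up.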

25
00:01:50,930 --> 00:01:56,060
So using what you've learned in this course, what can we say about these seven-day rolling averages?

26
00:01:56,630 --> 00:02:01,350
Well, we know that while a rolling average can smooth out statistical noise, it also lags behind the data.

27
00:02:01,790 --> 00:02:06,680
So if you see a very high seven-day rolling average, this means that, yes, things are bad.

28
00:02:07,040 --> 00:02:09,220
But they were already bad a few days ago.

29
00:02:09,680 --> 00:02:11,480
In other words, it's old news.
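The lag can be quantified with another hypothetical series in which cases jump from 100 to 200 per day overnight. This minimal sketch (again using an illustrative `rolling_mean` helper, not a library function) finds how many days a trailing seven-day average takes to get halfway to the new level, and then to reach it fully.

```python
def rolling_mean(values, window=7):
    """Trailing simple moving average; None until a full window is available."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            out.append(sum(values[i + 1 - window : i + 1]) / window)
    return out


# Hypothetical data: 100 cases/day, then 200 cases/day starting at index 7.
counts = [100] * 7 + [200] * 14
jump_day = 7
avg = rolling_mean(counts)

# First day the average is at least halfway (150) and fully (200) caught up.
first_half = next(i for i, a in enumerate(avg) if a is not None and a >= 150)
first_full = next(i for i, a in enumerate(avg) if a is not None and a >= 200)

print(first_half - jump_day, first_full - jump_day)  # 3 6
```

A trailing window of N days is effectively centered (N - 1)/2 days in the past, so a seven-day average reflects conditions from about three days earlier and needs six days to fully register a sudden change.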

30
00:02:11,990 --> 00:02:14,960
You might think an exponential moving average would help.

31
00:02:15,200 --> 00:02:18,950
But remember, this data needs to be communicated to the public.

32
00:02:19,310 --> 00:02:24,650
Most members of the public do not have the capacity to understand what an exponential moving average

33
00:02:24,800 --> 00:02:25,680
even is.

34
00:02:26,450 --> 00:02:32,030
Furthermore, because it involves a tunable parameter, namely the decay rate, it's not clear

35
00:02:32,030 --> 00:02:37,520
how everyone in the world would agree on a single value so that the numbers remain comparable.
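The tuning problem is easy to demonstrate. This minimal sketch applies an exponential moving average, s_t = alpha * x_t + (1 - alpha) * s_{t-1}, to the same hypothetical jump from 100 to 200 cases per day, using two different decay settings. The function name `ema` and the choice to seed it with the first observation are illustrative assumptions, not a standard definition.

```python
def ema(values, alpha):
    """Exponential moving average: s_t = alpha * x_t + (1 - alpha) * s_{t-1}."""
    smoothed = [float(values[0])]  # assumption: seed with the first observation
    for x in values[1:]:
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


# Hypothetical data: 100 cases/day, then 200 cases/day starting at index 7.
counts = [100] * 7 + [200] * 14
jump_day = 7

# Same data, same day, two "reasonable" decay settings.
for alpha in (0.25, 0.5):
    print(alpha, ema(counts, alpha)[jump_day])
# 0.25 125.0
# 0.5 150.0

# For comparison, the trailing seven-day simple moving average on that day.
print(round(sum(counts[jump_day - 6 : jump_day + 1]) / 7, 2))  # 114.29
```

Both settings react faster than the simple average, but two agencies choosing different values of alpha would publish different numbers for the same day. One common convention, used for example by pandas' `ewm(span=...)`, sets alpha = 2 / (N + 1), which gives 0.25 for N = 7, but nothing forces every reporter to adopt it.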

36
00:02:39,050 --> 00:02:44,420
So I hope this lecture helped you understand one real-world application of the simple moving average

37
00:02:44,420 --> 00:02:50,540
as it pertains to science reporting. You've seen why such a technique is useful, but you've also seen

38
00:02:50,540 --> 00:02:52,210
its limitations.

39
00:02:52,760 --> 00:02:57,770
You now understand some of the real-world issues that data scientists face when trying to transform

40
00:02:57,770 --> 00:03:01,300
data in a way that can be easily communicated to others.
