1
00:00:11,110 --> 00:00:17,860
In this lecture, we are going to discuss a method of forecasting called the naive forecast. Before we

2
00:00:17,860 --> 00:00:20,030
discuss what the naive forecast is,

3
00:00:20,320 --> 00:00:24,280
let's talk about the importance of establishing baselines in machine learning.

4
00:00:25,120 --> 00:00:28,750
The purpose of a baseline is to have a relevant point of comparison.

5
00:00:29,410 --> 00:00:34,660
To give you an example, suppose that you went to your professor tomorrow and said, hey, I just made

6
00:00:34,660 --> 00:00:35,650
a great discovery.

7
00:00:35,950 --> 00:00:39,880
I built a model using deep learning and I got 99 percent accuracy.

8
00:00:40,510 --> 00:00:45,670
Unfortunately, this statement is meaningless because you don't have a baseline to give it context.

9
00:00:46,420 --> 00:00:52,810
Your professor might say, but student, we already know that a simple linear model achieves 100 percent

10
00:00:52,810 --> 00:00:53,590
accuracy.

11
00:00:54,370 --> 00:00:57,670
In this case, your model is worse than what currently exists.

12
00:00:57,880 --> 00:00:59,960
And furthermore, it's also slower.

13
00:01:00,580 --> 00:01:03,210
So how can we make sure that we don't make this mistake?

14
00:01:07,990 --> 00:01:10,100
The answer is using a baseline.

15
00:01:10,720 --> 00:01:15,310
You'll notice that when you're reading machine learning papers, they often compare their results against

16
00:01:15,310 --> 00:01:16,810
the existing state of the art.

17
00:01:17,500 --> 00:01:22,060
Now, it's important to realize that you don't have to constantly try to beat the state of the art.

18
00:01:22,600 --> 00:01:27,400
In fact, that's kind of antithetical to science because all you're doing is chasing numbers.

19
00:01:27,910 --> 00:01:33,640
A good real-life example of this is when students study very hard to memorize their notes so that

20
00:01:33,640 --> 00:01:39,580
they can ace their exam without truly understanding the material or how it's applied. When you're chasing

21
00:01:39,580 --> 00:01:40,430
only numbers,

22
00:01:40,720 --> 00:01:44,560
it's sometimes easy to forget why those numbers matter in the first place.

23
00:01:45,190 --> 00:01:46,790
Anyway, to get back to the point.

24
00:01:47,020 --> 00:01:50,090
It's not always true that you want to compare to the state of the art.

25
00:01:50,710 --> 00:01:55,600
Sometimes, if you just want to test whether something is working, for example as a proof of

26
00:01:55,600 --> 00:01:59,110
concept, then the simplest possible model should suffice as a baseline.

27
00:02:03,880 --> 00:02:10,450
In time series analysis, the simplest possible model happens to be the naive forecast. What is the

28
00:02:10,450 --> 00:02:11,560
naive forecast?

29
00:02:11,980 --> 00:02:17,530
Well, it's just another name for something we've already discussed: copying the previous known

30
00:02:17,530 --> 00:02:19,120
value forward in time.
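To make the definition concrete, here is a minimal Python sketch of the naive forecast. The function names naive_forecast and naive_multi_step are hypothetical helpers chosen for illustration, and the price values are made-up toy numbers, not real data.

```python
def naive_forecast(series):
    """One-step-ahead naive forecast (hypothetical helper).

    The forecast for series[t] is series[t - 1], so the returned list
    lines up with series[1:]; the first point has no previous value.
    """
    return list(series[:-1])


def naive_multi_step(series, horizon):
    """Forecast `horizon` future steps by repeating the last known value."""
    return [series[-1]] * horizon


prices = [10.0, 10.5, 10.2, 10.8]   # toy numbers
print(naive_forecast(prices))       # [10.0, 10.5, 10.2]  (forecasts for 10.5, 10.2, 10.8)
print(naive_multi_step(prices, 3))  # [10.8, 10.8, 10.8]
```

Note that for several steps ahead, the naive forecast is a flat line at the last observed value.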

31
00:02:19,960 --> 00:02:26,230
One interesting phenomenon in time series analysis is that a lot of bad models end up

32
00:02:26,230 --> 00:02:27,970
looking like naive forecasts.

33
00:02:28,570 --> 00:02:33,600
When you look at the model predictions from afar, they seem to be pretty close to the true values.

34
00:02:34,030 --> 00:02:39,340
But when you zoom in, you can see that they seem pretty close only because they're just copying the

35
00:02:39,340 --> 00:02:41,290
previous value or something close to it.
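One rough way to check for this, sketched below under our own assumptions, is to measure how far a model's predictions are from the true value at the same time step versus from the true value one step earlier. If the predictions sit much closer to the previous value, the model is probably just trailing the series like a naive forecast. The helper lag_check and the numbers are hypothetical, and this is a heuristic rather than a standard test.

```python
def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def lag_check(y_true, y_pred):
    """Hypothetical heuristic: compare predictions to the current and previous true values.

    y_pred[i] is the model's forecast of y_true[i]. Index 0 is skipped because
    it has no previous value.
    """
    err_vs_actual = mean_abs_error(y_pred[1:], y_true[1:])
    err_vs_previous = mean_abs_error(y_pred[1:], y_true[:-1])
    return err_vs_actual, err_vs_previous


# Made-up example: predictions that trail the true series by one step.
y_true = [1.0, 3.0, 2.0, 5.0, 4.0]
y_pred = [1.0, 1.1, 2.9, 2.1, 4.9]
err_actual, err_previous = lag_check(y_true, y_pred)
print(f"error vs actual: {err_actual:.2f}, error vs previous value: {err_previous:.2f}")
```

From afar, these predictions follow the shape of the series, but the error against the previous value is far smaller than the error against the actual value.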

36
00:02:46,190 --> 00:02:53,030
A really bad situation is this: suppose that you fit a model and you say, "Aha, I beat the naive forecast

37
00:02:53,030 --> 00:02:56,300
because my accuracy is better than the naive forecast's."

38
00:02:56,870 --> 00:03:02,330
But of course, you have to ask: are you talking about the accuracy on the in-sample data or on the

39
00:03:02,330 --> 00:03:03,060
out-of-sample data?

40
00:03:03,740 --> 00:03:08,640
Note that in machine learning we often refer to these as the training data and the test data.

41
00:03:09,200 --> 00:03:11,240
So I'll use these terms interchangeably.

42
00:03:11,250 --> 00:03:12,140
Just be aware.

43
00:03:13,430 --> 00:03:19,280
Well, one common mistake is believing that because you got good accuracy on your in-sample data,

44
00:03:19,460 --> 00:03:21,770
the same will be true for your out-of-sample data.

45
00:03:22,190 --> 00:03:25,570
Quite commonly, it turns out that the opposite is true.

46
00:03:26,210 --> 00:03:31,790
You might beat the naive forecast on the in-sample data, but on the out-of-sample data, the naive forecast

47
00:03:31,790 --> 00:03:32,480
beats you.

48
00:03:33,080 --> 00:03:34,100
Why does this happen?

49
00:03:36,080 --> 00:03:41,450
It happens when your model overfits to the noise in the training data but doesn't actually generalize

50
00:03:41,450 --> 00:03:45,510
to the true underlying pattern in the time series, if one exists.

51
00:03:46,070 --> 00:03:51,360
So that's why it's really important to compare your model to a baseline such as the naive forecast.

52
00:03:51,890 --> 00:03:58,440
It's not good enough to say, "I got an 80 percent classification rate on my train set and 75 percent on my test

53
00:03:58,450 --> 00:03:58,750
set."

54
00:03:59,150 --> 00:04:04,260
If some other method achieves more than 75 percent on the test set, then your model isn't as good.
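A minimal sketch of such a comparison, assuming mean squared error as the metric: divide your model's error by the naive forecast's error on the same series, and do this separately for the training and test sets. The helper relative_to_naive and the toy numbers below are hypothetical; they illustrate the in-sample versus out-of-sample trap described above, where a model looks perfect on the training data but loses badly to the naive forecast on the test data.

```python
def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def relative_to_naive(y_true, y_pred):
    """Model MSE divided by naive-forecast MSE on the same series (hypothetical helper).

    y_pred[i] forecasts y_true[i]; the naive forecast of y_true[i] is y_true[i - 1].
    Index 0 is skipped for both. A ratio below 1.0 means the model beats the baseline.
    """
    naive_err = mse(y_true[1:], y_true[:-1])
    if naive_err == 0:
        raise ValueError("naive forecast is perfect on this series; ratio undefined")
    return mse(y_true[1:], y_pred[1:]) / naive_err


# Toy numbers: a model that fits the training data exactly but generalizes poorly.
train_true = [1.0, 2.0, 3.0, 4.0, 5.0]
train_pred = [0.0, 2.0, 3.0, 4.0, 5.0]
test_true = [5.0, 5.0, 6.0, 5.0, 6.0]
test_pred = [0.0, 6.0, 7.0, 8.0, 9.0]

train_ratio = relative_to_naive(train_true, train_pred)
test_ratio = relative_to_naive(test_true, test_pred)
print("train ratio:", train_ratio)
print("test ratio:", test_ratio)
```

Here the train ratio is 0.0, so the model appears to crush the naive forecast, while the test ratio is about 6.7, meaning the naive forecast wins by a wide margin on data the model has not seen.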

55
00:04:09,370 --> 00:04:14,080
There's one application I've seen a lot, which I think is a very interesting consequence of the times

56
00:04:14,080 --> 00:04:16,140
that we live in today.

57
00:04:16,180 --> 00:04:18,210
There are marketers everywhere on the Internet.

58
00:04:18,640 --> 00:04:22,170
The name of the game is SEO: search engine optimization.

59
00:04:23,110 --> 00:04:27,470
Everyone is trying to get clicks for their blog or to get people to sign up for their course or whatever.

60
00:04:28,000 --> 00:04:33,640
And of course, one of the obvious topics that many beginners care about is stock price prediction.

61
00:04:34,300 --> 00:04:36,340
Now, if you're taking this course, then you know better.

62
00:04:36,340 --> 00:04:37,960
But let's continue the story.

63
00:04:38,680 --> 00:04:40,080
So imagine what happens.

64
00:04:40,480 --> 00:04:45,490
You take one of the most popular topics that would appeal to someone who doesn't know about finance,

65
00:04:45,490 --> 00:04:46,920
such as stock prediction.

66
00:04:47,320 --> 00:04:52,090
You take one of the most popular machine learning algorithms that would appeal to someone who may not

67
00:04:52,090 --> 00:04:56,290
know a lot about machine learning, but has heard many buzzwords, like LSTM.

68
00:04:56,980 --> 00:05:00,310
The LSTM has been a very popular choice of sequence model.

69
00:05:00,320 --> 00:05:02,390
And then what do you do?

70
00:05:02,800 --> 00:05:07,890
Well, you combine the two: you write a blog post on stock prediction with LSTMs.

71
00:05:08,350 --> 00:05:10,420
In fact, many people have done so.

72
00:05:12,800 --> 00:05:15,060
It's obviously a very appealing idea.

73
00:05:15,590 --> 00:05:20,430
No one would blame you for clicking on an article or buying a course that claims to be able to do this.

74
00:05:20,930 --> 00:05:26,060
I won't name any names, but there are some courses that make the very mistake I'm talking about in

75
00:05:26,060 --> 00:05:26,820
this lecture.

76
00:05:27,410 --> 00:05:30,710
Maybe you're watching this and you've seen something like this yourself.

77
00:05:31,580 --> 00:05:34,520
There is another even worse mistake that these marketers make.

78
00:05:34,700 --> 00:05:39,050
But we'll discuss that later, when we talk about forecasting in general with time series

79
00:05:39,050 --> 00:05:44,780
models like ARIMA. If you want, you're encouraged to skip ahead to the forecasting lecture to

80
00:05:44,780 --> 00:05:46,040
continue this discussion.

81
00:05:51,160 --> 00:05:55,540
To get back to the naive forecast, let's recall what we learned about random walks.

82
00:05:56,170 --> 00:06:02,320
As you recall, in a random walk, on every step of the time series I flip a coin or pick a number

83
00:06:02,320 --> 00:06:03,370
from a distribution,

84
00:06:03,700 --> 00:06:08,440
and that number is added to my current position to get the next position.

85
00:06:09,160 --> 00:06:15,550
If my noise distribution is a zero-centered Gaussian with variance sigma squared, which is not unreasonable,

86
00:06:16,000 --> 00:06:19,390
then the best forecast, in terms of expected squared error, is the naive forecast.

87
00:06:19,960 --> 00:06:23,030
Since each step has an expected value of zero, I can do no better than predicting the last known value.
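We can check this claim with a quick simulation. The sketch below generates a Gaussian random walk and compares the one-step-ahead mean squared error of the naive forecast against a 5-point moving average, which stands in here for any competing model; the moving average, the helper names, and the seed are our own assumptions. In expectation, the naive forecast's error is sigma squared, while the 5-point average's error works out to 2.2 sigma squared, because it mixes in older positions that the most recent steps have already moved away from.

```python
import random


def simulate_random_walk(n_steps, sigma=1.0, seed=0):
    """Random walk: each position is the previous one plus zero-mean Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n_steps):
        x.append(x[-1] + rng.gauss(0.0, sigma))
    return x


def mean_squared(errors):
    return sum(e * e for e in errors) / len(errors)


def compare_forecasts(x, window=5):
    """One-step-ahead MSE of the naive forecast vs. a moving average of the last `window` values."""
    naive_errors = []
    ma_errors = []
    for t in range(window - 1, len(x) - 1):
        target = x[t + 1]
        naive_errors.append(target - x[t])  # naive: copy the last known value
        moving_avg = sum(x[t - window + 1 : t + 1]) / window
        ma_errors.append(target - moving_avg)
    return mean_squared(naive_errors), mean_squared(ma_errors)


walk = simulate_random_walk(10_000, sigma=1.0, seed=42)
naive_mse, ma_mse = compare_forecasts(walk)
print(f"naive MSE: {naive_mse:.3f}, moving-average MSE: {ma_mse:.3f}")
```

With sigma set to 1, the naive MSE should land close to 1, and the moving average should do noticeably worse; any other model you try on pure random-walk data should, at best, tie the naive forecast.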

88
00:06:23,830 --> 00:06:29,410
Another way to think about this: if you build a model that you think is good but it cannot beat the

89
00:06:29,410 --> 00:06:35,480
naive forecast, then it might suggest that your model is actually worse than a random walk model.

90
00:06:36,100 --> 00:06:40,570
In other words, a random walk model describes the data better than your model.