1
00:00:11,040 --> 00:00:16,730
OK, so in this lecture, we will be discussing a surprising observation we have made in this section.

2
00:00:17,520 --> 00:00:23,310
This observation was that RNNs don't seem to perform particularly well when it comes to time series

3
00:00:23,310 --> 00:00:24,160
analysis.

4
00:00:24,870 --> 00:00:29,810
So before we even begin discussing the main point of this lecture, I have a question for you.

5
00:00:30,120 --> 00:00:36,300
The students of this course, I know that many of you have assumed that LSTMs must be excellent for

6
00:00:36,300 --> 00:00:37,170
time series.

7
00:00:37,740 --> 00:00:41,520
My question is, why do you think this? What gave you this idea?

8
00:00:42,330 --> 00:00:47,970
So as an exercise for this lecture, if this was one of your assumptions coming into this course, please

9
00:00:47,970 --> 00:00:51,060
let me know on the Q&A why you believe this to be true.

10
00:00:55,950 --> 00:01:00,420
So your next question might be, why do RNNs seem to perform so poorly?

11
00:01:01,350 --> 00:01:05,190
Unfortunately, answers to questions like these are harder than you think.

12
00:01:05,640 --> 00:01:10,670
A question like this might take up a whole Ph.D. thesis, that is, years of work.

13
00:01:11,190 --> 00:01:16,710
My personal belief is that it's usually not worth your time to invest in the theory of deep neural networks

14
00:01:16,980 --> 00:01:19,830
unless you have a very strong background in mathematics.

15
00:01:20,970 --> 00:01:25,680
Furthermore, it's often the case that when people come up with theories about why things work or don't

16
00:01:25,680 --> 00:01:29,440
work, they are likely to be superseded by new theories in the future.

17
00:01:30,150 --> 00:01:35,340
So unless you want to devote years of your life studying a niche topic only for it to be obsolete a

18
00:01:35,340 --> 00:01:39,150
few months later, don't do it unless it's truly what you love to do.

19
00:01:40,050 --> 00:01:43,710
Now, my students know that I love inventing mottos about machine learning.

20
00:01:44,340 --> 00:01:49,790
The relevant motto for this situation is: machine learning is experimentation and not philosophy.

21
00:01:50,550 --> 00:01:54,660
What this means is that you often don't care about why something works or does not work.

22
00:01:54,900 --> 00:01:56,470
You just care about the results.

23
00:01:56,970 --> 00:01:58,560
The result is easy to check.

24
00:01:59,130 --> 00:02:02,100
Simply do the experiment and look at what happens.

25
00:02:06,530 --> 00:02:10,260
The nice thing about this is someone has already done these experiments.

26
00:02:10,730 --> 00:02:17,090
There is a paper included in extra reading called "Statistical and Machine Learning Forecasting Methods:

27
00:02:17,450 --> 00:02:19,010
Concerns and Ways Forward."

28
00:02:20,000 --> 00:02:26,000
So in this paper, the authors compare both classical time series methods like ETS and ARIMA, along

29
00:02:26,000 --> 00:02:31,610
with machine learning and deep neural networks, including trees, support vector machines and LSTMs.

30
00:02:32,360 --> 00:02:34,070
So let's look at the results.

31
00:02:35,000 --> 00:02:41,110
So this is a chart showing sMAPE on the y-axis and computational complexity on the x-axis.

32
00:02:41,600 --> 00:02:44,550
Basically, the higher you are, the worse your error is.

33
00:02:45,170 --> 00:02:47,540
Well, which model is the highest on this chart?

34
00:02:47,960 --> 00:02:49,300
That would be the LSTM.

35
00:02:50,270 --> 00:02:55,730
Sadly, many of the machine learning methods perform poorly, showing that although these methods may

36
00:02:55,730 --> 00:03:00,650
be powerful for other types of data, this is not the case for time series forecasting.

37
00:03:01,580 --> 00:03:07,160
Note that for this paper, the authors looked at the M3 competition, which is a data set with over

38
00:03:07,160 --> 00:03:09,010
one thousand monthly time series.

39
00:03:09,320 --> 00:03:13,600
So they certainly had a lot of data to work with to justify these results.