1
00:00:11,100 --> 00:00:15,810
So now that you know the basics of classification and regression, we are going to learn how to

2
00:00:15,810 --> 00:00:21,810
convert these problems into time series problems. In actuality, the main focus of this lecture is

3
00:00:21,810 --> 00:00:24,540
not machine learning itself, but rather the data.

4
00:00:24,960 --> 00:00:27,440
So I want to repeat that because it's very important.

5
00:00:27,960 --> 00:00:30,570
The focus of this lecture is not machine learning.

6
00:00:30,600 --> 00:00:32,630
The focus of this lecture is the data.

7
00:00:33,330 --> 00:00:34,860
So what do I mean by this?

8
00:00:39,510 --> 00:00:44,730
OK, so let's start with a typical regression problem. You would like to predict your exam grade

9
00:00:44,730 --> 00:00:49,690
from the number of hours you studied and the number of hours you slept the night before your exam.

10
00:00:49,920 --> 00:00:51,310
Pretty standard example.

11
00:00:51,810 --> 00:00:58,080
Now, as you can see, this dataset is essentially just two tables of numbers: one table we call X

12
00:00:58,170 --> 00:00:59,850
and the other table we call Y.

13
00:01:00,420 --> 00:01:03,360
So how can this structure be used for a time series?

14
00:01:04,410 --> 00:01:07,540
Well, let's state what we are trying to do in a slightly different way.

15
00:01:07,980 --> 00:01:12,270
We are trying to predict the Y column, given the values in the X column.

16
00:01:17,020 --> 00:01:18,940
So why does that sound familiar?

17
00:01:19,450 --> 00:01:24,460
Well, in time series forecasting, we're trying to predict the next value in the time series from the

18
00:01:24,460 --> 00:01:26,290
previous values in the time series.

19
00:01:26,800 --> 00:01:28,210
OK, so what does that mean?

20
00:01:28,750 --> 00:01:33,850
Well, it means that I can simply put the thing I want to predict in the Y column and put the things

21
00:01:33,850 --> 00:01:36,630
I want to predict from in the X columns.

22
00:01:37,150 --> 00:01:39,790
And you can see here that I've done exactly this.

23
00:01:40,850 --> 00:01:47,920
I want to predict Y4 from Y1, Y2, and Y3. I want to predict Y5 from Y2, Y3, and

24
00:01:47,920 --> 00:01:48,590
Y4.

25
00:01:49,270 --> 00:01:54,820
OK, so I bet you're probably a bit surprised at how simple that was. In order to use machine learning

26
00:01:54,820 --> 00:01:56,270
for time series forecasting,

27
00:01:56,530 --> 00:02:01,900
all I had to do was put my data into a format that can be accepted by machine learning models.

28
00:02:02,740 --> 00:02:07,000
At this point you can plug and chug into any machine learning model you choose.
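
To make the table construction above concrete, here is a minimal sketch in Python with NumPy. The function name make_supervised, the parameter n_lags, and the toy series are assumptions made for illustration; they do not come from the lecture.

```python
import numpy as np

def make_supervised(series, n_lags):
    """Turn a 1-D series into an X table of lagged values and a Y column of next values."""
    series = np.asarray(series, dtype=float)
    # Each row of X holds n_lags consecutive values from the series.
    X = np.array([series[t:t + n_lags] for t in range(len(series) - n_lags)])
    # Each entry of Y is the value that immediately follows the matching row of X.
    Y = series[n_lags:]
    return X, Y

# Toy series standing in for Y1..Y6 (illustrative values only).
series = [10, 20, 30, 40, 50, 60]
X, Y = make_supervised(series, n_lags=3)
print(X)  # first row is (Y1, Y2, Y3)
print(Y)  # first entry is Y4
```

The resulting X and Y can then be passed to any model that accepts a table of inputs and a column of targets.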

29
00:02:11,590 --> 00:02:16,510
Now, there's one small note I want to make about this concept which relates to contemporary machine

30
00:02:16,510 --> 00:02:16,960
learning.

31
00:02:17,890 --> 00:02:22,450
So I'm sure you've all heard of the work done on transformers, which have led to state-of-the-art results

32
00:02:22,450 --> 00:02:23,140
in NLP.

33
00:02:23,950 --> 00:02:29,080
One of the new categories of learning that has come out in recent years is self-supervised learning.

34
00:02:30,460 --> 00:02:36,790
As one example, imagine you have a bunch of text downloaded from Wikipedia. In order to train

35
00:02:36,790 --> 00:02:42,370
a language model, say, using a transformer, you simply take a part of your sentence and use the next

36
00:02:42,370 --> 00:02:47,980
word as the label. That is, your model learns to predict the next word in a sentence, given the previous

37
00:02:47,980 --> 00:02:49,210
words in the sentence.

38
00:02:50,320 --> 00:02:54,820
An alternative setup for this problem would be to predict any middle word in a sentence.

39
00:02:55,390 --> 00:02:57,330
Another example is with images.

40
00:02:57,700 --> 00:03:05,140
So given most of an image, try to figure out the missing part of the image, or given a video, try to

41
00:03:05,140 --> 00:03:07,080
predict the next frame of the video.

42
00:03:07,840 --> 00:03:09,750
So hopefully this is sounding familiar.

43
00:03:10,330 --> 00:03:14,580
In fact, what we've done in this course is essentially self-supervised learning.

44
00:03:15,460 --> 00:03:21,300
We technically do not have any targets or labels, but we've converted our data into a supervised problem.

45
00:03:22,540 --> 00:03:28,410
One major advantage of this approach is that it allows you to learn from data without requiring labels.

46
00:03:29,170 --> 00:03:34,270
It turns out that labels are pretty expensive to create because you actually have to pay someone to

47
00:03:34,270 --> 00:03:36,100
sit down and make these labels.

48
00:03:37,120 --> 00:03:42,090
It also turns out that, unsurprisingly, human error is a big problem with this process.

49
00:03:42,400 --> 00:03:44,880
So you end up having labels which are incorrect.

50
00:03:45,880 --> 00:03:51,190
Self-supervised learning allows us to learn from essentially all existing data on the Internet, which

51
00:03:51,220 --> 00:03:52,780
would be infeasible to label.

52
00:03:53,830 --> 00:03:58,960
So I thought this was just an interesting note to make about how old ideas have been given new names

53
00:03:59,050 --> 00:04:01,090
in the modern era of machine learning.

54
00:04:05,910 --> 00:04:11,730
Note that this also opens the door for time series classification. In this case, suppose that we have

55
00:04:11,730 --> 00:04:17,640
an input time series like your brain activity, and we'd like to predict what you are thinking about, either

56
00:04:17,640 --> 00:04:18,960
food or mathematics.

57
00:04:19,500 --> 00:04:26,340
Well, again, all we have to do is figure out how to put our data into X and Y tables. In the X table,

58
00:04:26,340 --> 00:04:27,570
we have the time series.

59
00:04:27,810 --> 00:04:31,560
Call that X1 up to X100. In the Y table,

60
00:04:31,560 --> 00:04:34,260
we have the label, either food or mathematics.

61
00:04:34,600 --> 00:04:35,880
OK, so pretty simple.
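
As a rough sketch of what these two tables could look like in Python with NumPy: each row of X is one recording of 100 time steps and each entry of Y is its label. The random numbers below are synthetic placeholders, not real brain activity, and the variable names are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

n_recordings, n_timesteps = 4, 100

# X table: one row per recording, one column per time step (X1 up to X100).
# Synthetic placeholder values stand in for measured signals.
X = rng.normal(size=(n_recordings, n_timesteps))

# Y table: one label per recording (placeholder labels).
Y = np.array(["food", "mathematics", "food", "mathematics"])

print(X.shape, Y.shape)
```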

62
00:04:40,810 --> 00:04:46,900
At this point, we can also turn our attention to multistep forecasts. As you recall, the ARIMA method

63
00:04:46,900 --> 00:04:51,070
of making multistep forecasts is to do so in an incremental fashion.

64
00:04:52,630 --> 00:04:57,010
With machine learning, you can obviously take the same approach because once we get a prediction,

65
00:04:57,220 --> 00:04:59,180
we can just plug it back into the model.

66
00:05:00,190 --> 00:05:04,390
One downside to this method is that you get a problem called error propagation.

67
00:05:05,080 --> 00:05:08,750
What this means is that, well, your first prediction will have some amount of error.

68
00:05:09,130 --> 00:05:12,310
So let's say you just found Y-hat 4 from Y1, Y2,

69
00:05:12,310 --> 00:05:13,240
and Y3.

70
00:05:14,410 --> 00:05:19,810
When you put this back into your model, you do not know the true Y4. You use Y-hat 4, which

71
00:05:19,810 --> 00:05:20,620
has an error.

72
00:05:21,160 --> 00:05:26,110
This gives you Y-hat 5, which will probably be even more wrong than if you had used the true

73
00:05:26,110 --> 00:05:26,680
Y4.

74
00:05:27,760 --> 00:05:32,230
In other words, the error propagates because the errors are compounding over each step.
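
The incremental approach described above can be sketched in Python as follows. The helper forecast_incremental and the stand-in toy_model are hypothetical names chosen for illustration; any one-step predictor could take the place of toy_model.

```python
import numpy as np

def forecast_incremental(predict_one, last_window, horizon):
    """Forecast `horizon` steps by feeding each prediction back in as an input."""
    window = list(last_window)
    forecasts = []
    for _ in range(horizon):
        y_hat = float(predict_one(np.array(window)))
        forecasts.append(y_hat)
        # Drop the oldest value and append the prediction, not the true value.
        # Any error in y_hat is carried into every later step.
        window = window[1:] + [y_hat]
    return forecasts

# Stand-in one-step model (an assumption): extrapolate the last difference.
def toy_model(window):
    return window[-1] + (window[-1] - window[-2])

forecasts = forecast_incremental(toy_model, [1.0, 2.0, 3.0], horizon=3)
print(forecasts)
```

Because the window is refilled with predictions rather than true values, this loop is exactly where the compounding of errors comes from.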

75
00:05:37,000 --> 00:05:43,690
So one possible solution to this is called the multi-output multistep forecast. The thing to realize

76
00:05:43,690 --> 00:05:49,420
is that your targets, which we've been calling Y, are not limited to just being a single-column table.

77
00:05:50,440 --> 00:05:55,330
So imagine that we have a forecast horizon of three and the number of lags is also three.

78
00:05:55,960 --> 00:05:59,470
In this case, the first row of X will be Y1, Y2, and Y3.

79
00:05:59,950 --> 00:06:03,130
The first row of Y will be Y4, Y5, and Y6.

80
00:06:03,610 --> 00:06:07,360
There's no reason that the first row of Y has to be only Y4.

81
00:06:07,900 --> 00:06:13,660
In this way our model will have three outputs and we can make a forecast instantly without having to

82
00:06:13,660 --> 00:06:15,570
plug our predictions back into the input.

83
00:06:16,360 --> 00:06:21,290
And because we haven't used predictions to make more predictions, there is no error propagation.
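
Here is a minimal sketch of the multi-output setup in Python with NumPy, using three lags and a forecast horizon of three as in the example above. The function name make_multi_output and the toy series are assumptions for illustration, and the least-squares linear model is only a stand-in for whichever multi-output model you choose.

```python
import numpy as np

def make_multi_output(series, n_lags, horizon):
    """Build X rows of n_lags past values and Y rows of the next `horizon` values."""
    series = np.asarray(series, dtype=float)
    n_rows = len(series) - n_lags - horizon + 1
    X = np.array([series[t:t + n_lags] for t in range(n_rows)])
    Y = np.array([series[t + n_lags:t + n_lags + horizon] for t in range(n_rows)])
    return X, Y

# Toy series 1, 2, ..., 12 (illustrative values only).
series = np.arange(1.0, 13.0)
X, Y = make_multi_output(series, n_lags=3, horizon=3)
# First row of X is (Y1, Y2, Y3); first row of Y is (Y4, Y5, Y6).

# Stand-in model: linear least squares with a bias column.
# np.linalg.lstsq accepts a 2-D target, so all three outputs are fit at once.
X_b = np.hstack([X, np.ones((len(X), 1))])
W, *_ = np.linalg.lstsq(X_b, Y, rcond=None)

# Forecast three steps in one shot from the last three observed values.
last_window = np.append(series[-3:], 1.0)
forecast = last_window @ W
print(forecast)
```

No prediction is fed back into the input here; all three forecast values come from observed data only.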

84
00:06:22,450 --> 00:06:23,610
Now, one limitation of

85
00:06:23,680 --> 00:06:27,020
this is that not all machine learning models have this capability.

86
00:06:27,610 --> 00:06:30,460
So if you're curious about which ones do and which ones don't,

87
00:06:30,730 --> 00:06:33,520
my advice would be to simply try it and see if it works.

88
00:06:33,910 --> 00:06:38,950
Of course, at some point you probably want to know how your model actually works, but this is a quick

89
00:06:38,950 --> 00:06:39,790
way to find out.

90
00:06:44,600 --> 00:06:48,920
The final question I want to address in this lecture is, why is this approach powerful?

91
00:06:49,970 --> 00:06:54,650
The answer is that most of the models we will study in this section are nonlinear.

92
00:06:55,160 --> 00:07:00,800
As you know, from the geometrical perspective, nonlinear models are more flexible and can learn more

93
00:07:00,800 --> 00:07:03,640
complex patterns than simple linear models.

94
00:07:04,640 --> 00:07:08,420
The autoregressive model we studied before is a linear model.

95
00:07:09,050 --> 00:07:11,350
So what we've done here is extremely potent.

96
00:07:11,990 --> 00:07:17,690
We've generalized our problem statement such that it applies to both linear and nonlinear models.

97
00:07:18,590 --> 00:07:24,320
We've allowed ourselves to build time series models for more complex time series with more complex patterns.

98
00:07:24,740 --> 00:07:29,450
Now it remains to be seen if this actually helps, but at least the possibility exists.
