1
00:00:11,700 --> 00:00:15,410
In this lecture we are going to do another little demo on a time series.

2
00:00:15,420 --> 00:00:21,960
But this time with a much more complicated signal. It's still going to be a synthetic signal, meaning

3
00:00:21,990 --> 00:00:27,230
we can understand its behavior because we know the mathematical formula that represents it.

4
00:00:27,360 --> 00:00:33,240
But as you'll see, just one small change will make this task much harder for both the autoregressive

5
00:00:33,240 --> 00:00:39,100
linear model and the RNN. This lecture is going to walk you through a prepared Colab notebook.

6
00:00:39,120 --> 00:00:45,060
Although a very good exercise, which I always recommend, is, once you know how this is done, to try and

7
00:00:45,060 --> 00:00:51,510
recreate it yourself with as few references as possible. As usual, you can look at the title of the notebook

8
00:00:51,780 --> 00:00:54,270
to determine what notebook we are currently looking at.

9
00:01:05,040 --> 00:01:05,510
OK.

10
00:01:05,550 --> 00:01:08,790
So after the imports, we create the time series.

11
00:01:08,790 --> 00:01:14,430
As you can see, it looks very similar to what we had previously, but with one crucial difference. This

12
00:01:14,430 --> 00:01:20,490
is that we are squaring the input argument into the sine function. Because we square the input argument,

13
00:01:20,490 --> 00:01:25,660
this means that the frequency and hence also the period of the wave change over time.

14
00:01:26,010 --> 00:01:27,480
So let's plot the time series

15
00:01:37,120 --> 00:01:41,530
and we see exactly that. In some parts the wave goes up and down very fast,

16
00:01:41,530 --> 00:01:43,260
and in other parts it slows down.

17
00:01:49,480 --> 00:01:50,280
In the next block,

18
00:01:50,290 --> 00:01:53,190
we build our dataset. Now, because of my rule,

19
00:01:53,200 --> 00:01:54,780
"all data is the same",

20
00:01:54,790 --> 00:01:58,060
nothing about this changes, so we don't need to explain this again.

21
00:02:04,320 --> 00:02:07,850
Next we're going to build an autoregressive linear model.

22
00:02:08,160 --> 00:02:15,500
You've all seen this before so I'm not going to explain it again.

23
00:02:15,750 --> 00:02:23,110
Here's our loss and optimizer. Here's the train-test split. Here's a training function,

24
00:02:26,180 --> 00:02:35,790
and here is running the training function. So let's see how this performs. So it looks like the loss is

25
00:02:35,790 --> 00:02:39,190
pretty high, about 0.5 or 0.6.

26
00:02:39,210 --> 00:02:44,960
This is much higher than what we had before with the normal sine wave.

27
00:02:45,150 --> 00:02:47,370
Next we're going to plot the loss per iteration

28
00:02:51,510 --> 00:02:51,840
All right.

29
00:02:51,850 --> 00:02:53,730
So this does not look very good.

30
00:02:53,830 --> 00:02:57,670
In fact, it looks like the test loss does not decrease at all.

31
00:02:57,670 --> 00:03:02,770
The train loss appears to decrease, which is a sign that all the model is doing is overfitting without

32
00:03:02,920 --> 00:03:04,690
actually matching the underlying pattern

33
00:03:11,340 --> 00:03:12,620
So for our predictions,

34
00:03:12,630 --> 00:03:15,450
let's start by doing a one-step forecast.

35
00:03:15,450 --> 00:03:18,940
As a side note you'll see that I've taken a little shortcut here.

36
00:03:19,200 --> 00:03:24,220
If all you want to do is a one-step forecast, we've already built our input data for that.

37
00:03:24,390 --> 00:03:31,340
All we need to do is call the model and pass in X test. Remember that the first half of X is our train

38
00:03:31,340 --> 00:03:33,830
set and the second half of X is our test set

39
00:03:40,450 --> 00:03:40,680
Right.

40
00:03:40,860 --> 00:03:42,290
So let's plot the results.

41
00:03:45,650 --> 00:03:48,470
So as you can see it does not do very well.

42
00:03:48,470 --> 00:03:51,650
It doesn't even get the values in the right range.

43
00:03:51,650 --> 00:03:56,150
Knowing this, we can deduce that the multi-step forecast is going to be even worse.

44
00:03:58,880 --> 00:04:00,800
Again, since we've already gone through this code,

45
00:04:00,830 --> 00:04:02,940
I'm not going to explain it again.

46
00:04:03,170 --> 00:04:04,790
So let's run this and see what we get.

47
00:04:15,880 --> 00:04:18,370
As you can see, our suspicions are correct.

48
00:04:18,400 --> 00:04:20,740
The linear model does a terrible forecast

49
00:04:25,420 --> 00:04:27,910
So we're going to switch notebooks now.

50
00:04:27,910 --> 00:04:31,660
Now we're looking at PyTorch Nonlinear Sequence SimpleRNN.

51
00:04:34,810 --> 00:04:38,350
So in this new notebook, we're going to try an RNN model.

52
00:04:38,440 --> 00:04:42,120
We're going to start with a simple RNN and then move on to an LSTM later.

53
00:04:50,030 --> 00:04:55,160
The only different thing we need to do here is reshape our data so that it's three-dimensional, N by

54
00:04:55,160 --> 00:04:59,870
T by D. Then we create our RNN, fit it, and look at the results.

55
00:05:03,140 --> 00:05:11,670
So we can set the device, define our simple RNN, which you've seen before, instantiate the model,

56
00:05:12,570 --> 00:05:13,950
create the loss and optimizer

57
00:05:19,420 --> 00:05:23,170
make the inputs and targets, do our training,

58
00:05:26,510 --> 00:05:37,090
move to the GPU, do full gradient descent.

59
00:05:37,110 --> 00:05:40,850
All right, so you can see that the loss is now better than the linear model.

60
00:05:40,890 --> 00:05:46,740
Now this is a good lesson. For a simple time series like a sine wave with no noise, a linear model is

61
00:05:46,740 --> 00:05:53,860
perfect and an RNN does worse. At least with default parameters, it has too much flexibility.

62
00:05:54,270 --> 00:05:59,980
But now we have a signal that is much more difficult, a signal that a linear model can't solve at all.

63
00:06:00,180 --> 00:06:05,010
In this case, the RNN does better because it actually has the flexibility to match the signal.

64
00:06:08,320 --> 00:06:08,590
Next,

65
00:06:08,590 --> 00:06:13,720
let's plot the loss per iteration.

66
00:06:13,850 --> 00:06:18,390
All right, so that seems reasonable.

67
00:06:18,390 --> 00:06:20,580
Next we're going to do a one-step forecast.

68
00:06:28,670 --> 00:06:31,770
So as you can see, this does a much better job.

69
00:06:31,880 --> 00:06:37,490
It looks like it can match the frequency in most places even when the signal slows down significantly

70
00:06:37,490 --> 00:06:39,190
in the second half.

71
00:06:39,200 --> 00:06:45,080
This is despite the fact that the model never saw such a low frequency in the training set. The values

72
00:06:45,080 --> 00:06:49,970
are also approximately in the right range.

73
00:06:49,980 --> 00:07:01,150
Now let's go to the multi-step forecast.

74
00:07:01,210 --> 00:07:06,670
So as you can see, this doesn't look that good, but at least at the beginning the model appears to capture

75
00:07:06,670 --> 00:07:08,610
the frequency quite well.

76
00:07:08,800 --> 00:07:11,970
It doesn't know that in the second half the signal slows down.

77
00:07:12,130 --> 00:07:16,120
So it kind of makes sense that the model isn't able to predict that.

78
00:07:16,180 --> 00:07:20,740
Also, it does a much better job than the linear model at getting the right range of values.

79
00:07:26,910 --> 00:07:27,500
Finally,

80
00:07:27,510 --> 00:07:32,700
let's go to our next script, where we use an LSTM instead of a simple RNN.

81
00:07:37,380 --> 00:07:46,590
So I'm gonna run the imports, make the series, plot the series, build the dataset.

82
00:07:49,390 --> 00:07:51,670
Set the device.

83
00:07:51,990 --> 00:07:54,880
Define our RNN. So this uses an LSTM.

84
00:07:57,780 --> 00:08:04,090
Instantiate the model, create the loss and optimizer, make the train set and test set,

85
00:08:10,410 --> 00:08:18,460
create the training function, move data to the GPU, train the model.

86
00:08:22,920 --> 00:08:25,660
All right, so the loss values are still small.

87
00:08:25,680 --> 00:08:26,270
That's good.

88
00:08:30,760 --> 00:08:37,090
All right, so the loss goes down pretty smoothly, and we get a slightly better loss than the simple

89
00:08:37,090 --> 00:08:42,320
RNN. Let's look at the one-step forecast.

90
00:08:48,440 --> 00:08:49,150
All right.

91
00:08:49,150 --> 00:08:52,620
So I feel like the one-step forecast looks a tiny bit better,

92
00:08:55,540 --> 00:08:57,860
and if we go to the multi-step forecast,

93
00:09:04,720 --> 00:09:10,420
I don't think there is any appreciable difference in the multi-step forecast.

94
00:09:10,520 --> 00:09:14,080
So essentially this is pretty much the same as the simple RNN.

95
00:09:14,270 --> 00:09:16,700
Now, why might this be?

96
00:09:16,700 --> 00:09:21,590
Well, don't make the mistake of thinking that LSTMs are more popular, and because they are more

97
00:09:21,590 --> 00:09:27,230
popular, it's probably because they are more powerful, and therefore LSTMs are better and they can

98
00:09:27,230 --> 00:09:33,120
do anything. LSTMs are better than RNNs at finding long-term dependencies.

99
00:09:33,200 --> 00:09:39,240
But this doesn't mean that LSTMs are simply better at everything. It doesn't mean that LSTMs

100
00:09:39,240 --> 00:09:41,200
are magic.

101
00:09:41,280 --> 00:09:46,800
Now, although LSTMs are good at finding long-term dependencies, it's important to realize that this

102
00:09:46,800 --> 00:09:51,190
fact doesn't hold true for arbitrarily long-term dependencies.

103
00:09:51,270 --> 00:09:57,230
There is a point where even LSTMs forget. Of course, that point is further than the simple

104
00:09:57,230 --> 00:09:59,970
RNN, but it exists nonetheless.

105
00:09:59,970 --> 00:10:03,570
So avoid absolute generalizations, first of all.

106
00:10:03,570 --> 00:10:04,240
And second,

107
00:10:04,290 --> 00:10:10,380
note that this dataset doesn't really have any long-term dependencies, at least for the time span of

108
00:10:10,380 --> 00:10:12,960
the dataset as it exists right now.

109
00:10:12,960 --> 00:10:18,150
Data from very far in the past isn't really going to help predict future values.

110
00:10:18,150 --> 00:10:23,330
So in fact, it makes sense that there's no real advantage to using LSTMs for this problem.
