1
00:00:11,120 --> 00:00:16,420
In this video, we'll be implementing an ANN to be used on the airline passengers time series.

2
00:00:17,000 --> 00:00:19,690
So the basic outline for the script is as follows.

3
00:00:20,240 --> 00:00:24,640
We're going to go through the same usual steps as we did for regular machine learning.

4
00:00:25,220 --> 00:00:28,010
The first step will be to create the one-step forecast.

5
00:00:28,460 --> 00:00:32,630
The second step will be to create the incremental multi-step forecast.

6
00:00:33,110 --> 00:00:36,030
The third step will be to create the multi-output forecast.

7
00:00:36,710 --> 00:00:38,120
So these are the three steps.

8
00:00:38,300 --> 00:00:39,800
Same as you've seen before.

9
00:00:41,320 --> 00:00:48,280
OK, so we'll begin by importing pandas, NumPy and matplotlib as usual. The next step is to import

10
00:00:48,280 --> 00:00:49,910
the relevant parts of TensorFlow.

11
00:00:50,380 --> 00:00:53,170
You'll see how each of these is used throughout this script.

12
00:00:54,070 --> 00:00:58,180
I'm also going to set some random seeds so that we can get consistent results.

13
00:00:58,600 --> 00:01:03,430
Of course, in the real world, you should turn this off and make sure that your code works with any

14
00:01:03,430 --> 00:01:04,270
random seed.

15
00:01:09,440 --> 00:01:11,510
The next step is to update scikit-learn.

16
00:01:16,150 --> 00:01:18,550
The next step is to import the MAPE metric.

17
00:01:22,340 --> 00:01:25,660
The next step is to download our airline passengers CSV.

18
00:01:30,380 --> 00:01:34,460
The next step is to load in our CSV using read_csv.

19
00:01:38,430 --> 00:01:40,770
The next step is to compute the log transform.

20
00:01:44,220 --> 00:01:47,280
The next step is to split our data into train and test.

21
00:01:51,220 --> 00:01:54,790
The next step is to create index arrays for both training and test.

22
00:01:58,900 --> 00:02:01,760
The next step is to compute the first difference of our data.

23
00:02:02,260 --> 00:02:07,510
As mentioned often in this course, there are always going to be many options to try, many more than

24
00:02:07,510 --> 00:02:09,150
we can go through in these videos.

25
00:02:09,550 --> 00:02:15,010
So you already know why differencing is useful, but perhaps you may believe that it is still not necessary.

26
00:02:15,400 --> 00:02:20,560
So if you think deep learning is powerful enough such that you don't have to difference, please give

27
00:02:20,560 --> 00:02:21,250
it a try.
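
As a rough illustration of the log transform and first difference mentioned here, a minimal NumPy sketch follows. The `passengers` values are a hypothetical stand-in for the column loaded from the CSV, and the variable names are assumptions rather than the ones used in the script.

```python
import numpy as np

# Hypothetical stand-in for the passenger counts loaded from the CSV.
passengers = np.array([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])

# Log transform, then first difference: d[t] = log_y[t] - log_y[t-1].
log_passengers = np.log(passengers)
diff_log = np.diff(log_passengers)

# Differencing loses one sample; it can be undone from the first value
# plus the cumulative sum of the differences.
recovered = np.concatenate(
    [[log_passengers[0]], log_passengers[0] + np.cumsum(diff_log)]
)
```

To try the model without differencing, you would feed `log_passengers` to the later steps instead of `diff_log`.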

28
00:02:25,620 --> 00:02:30,780
The next step is to create a supervised dataset out of our time series. Since you've seen this

29
00:02:30,780 --> 00:02:31,620
code before,

30
00:02:31,680 --> 00:02:33,050
I won't explain it again.
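
For readers who have not seen that code, here is one possible way to build the supervised dataset: each row of X holds T consecutive values and the target is the value that follows. The function name `make_supervised` and the toy series are assumptions for illustration, not the script's actual code.

```python
import numpy as np

def make_supervised(series, T):
    """Turn a 1-D series into (X, Y) with T past values per row (hypothetical helper)."""
    X, Y = [], []
    for t in range(len(series) - T):
        X.append(series[t:t + T])   # the T most recent values
        Y.append(series[t + T])     # the next value, used as the target
    return np.array(X), np.array(Y)

# Toy series so the shapes are easy to check by eye.
series = np.arange(10, dtype=float)
X, Y = make_supervised(series, T=3)
```

With a series of length 10 and T = 3, this gives 7 samples, so X is 7 by 3 and Y has length 7.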

31
00:02:38,150 --> 00:02:42,050
The next step is to split our supervised dataset into training and test.

32
00:02:46,530 --> 00:02:48,660
The next step is to create our neural network.

33
00:02:49,380 --> 00:02:52,590
OK, so basically this is review from the previous lecture.

34
00:02:53,580 --> 00:03:00,090
We start by creating an input layer which has input dimensionality T. Normally we call this D, but since

35
00:03:00,090 --> 00:03:04,460
this is a time series, our features are actually just the time series values.

36
00:03:04,770 --> 00:03:05,940
So we call it T.

37
00:03:07,440 --> 00:03:09,880
The next step is to create our first hidden layer.

38
00:03:10,200 --> 00:03:12,910
I've arbitrarily chosen a hidden size of 32.

39
00:03:13,650 --> 00:03:17,990
I've also chosen a ReLU activation, which is pretty standard these days.

40
00:03:18,840 --> 00:03:23,550
Note that I've not added any additional hidden layers, although there is nothing stopping you from

41
00:03:23,550 --> 00:03:24,300
doing so.

42
00:03:24,840 --> 00:03:29,330
Recall that in the real world, choosing these values is just a matter of trial and error.

43
00:03:29,970 --> 00:03:34,890
The next step is to create our final output layer, which is just a Dense layer of output size one.

44
00:03:35,400 --> 00:03:38,670
This is because our initial model is a one-step predictor.

45
00:03:39,900 --> 00:03:45,780
OK, and finally we instantiate a model object passing in the input and output of our layers.

46
00:03:50,480 --> 00:03:56,060
The next step is to call the compile function. As mentioned, we'll use the mean squared error loss and

47
00:03:56,060 --> 00:03:58,460
by default we'll choose the Adam optimizer.

48
00:04:03,370 --> 00:04:08,770
The next step is to call the fit function, so we start by passing in X train and Y train, which

49
00:04:08,770 --> 00:04:10,070
are the first two arguments.

50
00:04:10,780 --> 00:04:16,330
The next step is to pass in the number of epochs, which I've set to one hundred. Again, choosing this

51
00:04:16,330 --> 00:04:18,170
value is a matter of trial and error.

52
00:04:18,490 --> 00:04:23,010
You want to make sure that your loss per iteration looks reasonable after training is complete.

53
00:04:24,250 --> 00:04:28,690
The next step is to pass in the validation data, which is X test and Y test.
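
The build, compile and fit steps described above might look roughly like the sketch below, using the Keras functional API. The function name `build_ann` and its argument names are assumptions; the hidden size of 32, ReLU activation, mean squared error loss and Adam optimizer are the values mentioned in the narration. TensorFlow is imported inside the function only so that the snippet can be loaded without it installed.

```python
def build_ann(T, hidden=32, n_outputs=1):
    """Build and compile a one-hidden-layer ANN (hypothetical helper)."""
    # Imported lazily; this sketch assumes TensorFlow with Keras is installed.
    from tensorflow.keras.layers import Dense, Input
    from tensorflow.keras.models import Model

    i = Input(shape=(T,))                     # T past values as features
    x = Dense(hidden, activation="relu")(i)   # single hidden layer
    x = Dense(n_outputs)(x)                   # 1 for one-step, Ty for multi-output
    model = Model(i, x)
    model.compile(loss="mse", optimizer="adam")
    return model
```

Training would then be something like `model.fit(Xtrain, Ytrain, epochs=100, validation_data=(Xtest, Ytest))`, where the array names are again assumed. The returned history object holds the training and validation loss per epoch for plotting.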

54
00:04:34,660 --> 00:04:39,070
OK, so the next step is to plot the loss per iteration for both training and test.

55
00:04:45,890 --> 00:04:51,950
As you can see, they both decrease and converge as expected. You may want to try more or fewer epochs

56
00:04:51,950 --> 00:04:53,230
depending on what you see.

57
00:04:56,200 --> 00:05:02,020
The next step is to set the first T plus one values in train_idx to False since, as you recall,

58
00:05:02,020 --> 00:05:03,610
these items are not predictable.

59
00:05:07,770 --> 00:05:12,870
The next step is to compute our model predictions. Recall that these are for the differenced time

60
00:05:12,870 --> 00:05:13,440
series.

61
00:05:14,010 --> 00:05:19,590
Also note that for TensorFlow, the output of the model will be N by one, since we specified that

62
00:05:19,590 --> 00:05:21,030
the output has size one.

63
00:05:21,660 --> 00:05:24,430
However, this can't be used in the subsequent code.

64
00:05:24,870 --> 00:05:29,280
Therefore, we'll call the flatten function, which will turn these into 1-D arrays.

65
00:05:33,460 --> 00:05:37,020
The next step is to store our differenced predictions in the data frame.

66
00:05:40,520 --> 00:05:43,190
The next step is to plot the differenced predictions.

67
00:05:48,480 --> 00:05:52,560
So these look pretty accurate, but remember, this is just a one-step prediction.

68
00:05:55,190 --> 00:06:00,710
The next step is to undifference our predictions, so we'll start by shifting the log passengers

69
00:06:00,800 --> 00:06:01,640
up by one.

70
00:06:06,840 --> 00:06:09,480
The next step is to grab the last known train value.

71
00:06:14,040 --> 00:06:19,220
The next step is to compute and store the one-step forecast using the same method you've seen before.
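
As a reminder of what undifferencing a one-step forecast involves, the sketch below adds each predicted difference to the previous known log value. The toy data and the made-up `pred_diff` values are assumptions for illustration; in the script the predicted differences come from the model.

```python
import numpy as np

# Toy log series; the real one is the log of the passenger counts.
log_y = np.log(np.array([100.0, 110.0, 121.0, 133.1]))

# Hypothetical model output: predicted first differences for steps 1..3.
# Here they are the true differences plus a small constant error.
pred_diff = np.diff(log_y) + 0.01

# One-step forecast: previous true log value plus the predicted difference.
prev_log = log_y[:-1]
one_step_forecast = prev_log + pred_diff
```

Because each forecast uses the true previous value, the errors do not accumulate from one step to the next.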

72
00:06:23,050 --> 00:06:25,540
The next step is to plot the one-step forecast.

73
00:06:30,390 --> 00:06:35,100
So as you can see, it looks pretty good, but again, it's just the one-step forecast.

74
00:06:38,810 --> 00:06:41,880
The next step is to compute the multi-step forecast.

75
00:06:42,350 --> 00:06:46,000
Note that this code is the same as before, so I won't explain it again.

76
00:06:51,280 --> 00:06:56,410
The next step is to undifference the multi-step predictions to get our multi-step forecast.
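
The incremental multi-step forecast and its undifferencing can be sketched as follows. `predict_one` is a hypothetical stand-in for a call to the trained model on a single window, and the numbers are made up; only the loop structure and the cumulative sum are the point here.

```python
import numpy as np

def predict_one(window):
    # Hypothetical stand-in for the model: predicts the mean of the window.
    return float(np.mean(window))

def incremental_forecast(last_window, n_steps, predict_fn):
    """Feed each prediction back in as the newest input value."""
    window = np.array(last_window, dtype=float)
    preds = []
    for _ in range(n_steps):
        p = predict_fn(window)
        preds.append(p)
        window = np.roll(window, -1)  # drop the oldest value
        window[-1] = p                # append the new prediction
    return np.array(preds)

# Last T differenced values from the train set (toy numbers).
last_window = [0.02, 0.04, 0.06]
diff_preds = incremental_forecast(last_window, n_steps=4, predict_fn=predict_one)

# Undifference: last known train value plus the cumulative sum of differences.
last_train = np.log(100.0)
multi_step_forecast = last_train + np.cumsum(diff_preds)
```

Unlike the one-step case, only the last known train value is used, so every later step builds on earlier predictions.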

77
00:07:00,830 --> 00:07:03,470
The next step is to plot the multi-step forecast.

78
00:07:07,980 --> 00:07:13,980
So as you can see, the multi-step forecast looks pretty good. In fact, just as good as the one-step

79
00:07:13,980 --> 00:07:14,790
forecast.

80
00:07:18,660 --> 00:07:24,130
The next step is to create our multi-output supervised dataset. Since you've seen this before,

81
00:07:24,240 --> 00:07:25,580
I won't explain it again.

82
00:07:29,790 --> 00:07:33,360
The next step is to split our new dataset into train and test.

83
00:07:38,360 --> 00:07:44,510
The next step is to create our ANN. Note that this is the same as our previous ANN, except that the

84
00:07:44,510 --> 00:07:46,730
output now has size Ty.

85
00:07:51,110 --> 00:07:54,170
The next step is to compile our model, same as before.

86
00:07:58,750 --> 00:08:02,330
The next step is to fit our model with our new multi-output dataset.

87
00:08:07,890 --> 00:08:10,110
The next step is to plot the loss per epoch.

88
00:08:13,430 --> 00:08:14,860
And it seems to be OK.

89
00:08:17,910 --> 00:08:21,660
The next step is to call model.predict to get our predictions.

90
00:08:22,200 --> 00:08:25,260
Note that this time we do not flatten the predictions.

91
00:08:29,940 --> 00:08:36,750
So as you can see, the predictions are now of size N by 12. Each row contains a 12-step forecast.

92
00:08:39,730 --> 00:08:46,390
The next step is to index our predictions. We'll do this differently for train and test. For

93
00:08:46,390 --> 00:08:51,220
the train set, we'll treat them like one-step predictions and simply grab the zeroth column.

94
00:08:52,030 --> 00:08:54,480
As you recall, the time steps are redundant.

95
00:08:54,970 --> 00:09:00,250
For example, if the first row has a prediction for y1, y2 and y3, the second row

96
00:09:00,250 --> 00:09:02,950
will have a prediction for y2, y3 and y4.

97
00:09:03,190 --> 00:09:06,520
So y2 and y3 would be redundant predictions.

98
00:09:07,930 --> 00:09:13,080
Now it's reasonable to believe that the predictions closer to the current time would be more accurate.

99
00:09:13,210 --> 00:09:20,680
So we'll only use the zeroth column, which represents the prediction for one time step ahead. For the

100
00:09:20,680 --> 00:09:21,280
test set,

101
00:09:21,280 --> 00:09:25,960
we have only one sample, but each of the 12 predictions is stored along the row.

102
00:09:26,350 --> 00:09:30,040
So we grab the zeroth row to get the 12-step forecast.
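
The indexing just described can be sketched with plain NumPy arrays. The random arrays below are hypothetical stand-ins for the output of the multi-output model on the train and test inputs, and the variable names are assumptions.

```python
import numpy as np

N, Ty = 5, 12
rng = np.random.default_rng(0)

# Stand-ins for the model output: N train rows and a single test row,
# each containing a Ty-step forecast.
P_train = rng.normal(size=(N, Ty))
P_test = rng.normal(size=(1, Ty))

# Train: keep only the one-step-ahead prediction from every row.
train_pred = P_train[:, 0]

# Test: the single row already holds the whole 12-step forecast.
test_pred = P_test[0]
```

Taking the zeroth column for train avoids the redundant overlapping predictions, while the zeroth row for test gives all 12 steps at once.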

103
00:09:35,300 --> 00:09:40,130
The next step is to compute the undifferenced forecast using the same method as before.

104
00:09:44,310 --> 00:09:46,500
The next step is to plot our new forecast.

105
00:09:51,450 --> 00:09:55,470
So interestingly, this time, it looks like the incremental forecast was better.

106
00:09:59,110 --> 00:10:01,960
The next step is to compute the MAPE of our predictions.

107
00:10:05,370 --> 00:10:10,670
So this confirms what we saw above: the incremental forecast seems to be a bit better.
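
For reference, MAPE is the mean of the absolute errors divided by the absolute true values. The sketch below computes it directly with NumPy on made-up numbers; the function name `mape` is an assumption, and the script itself may rely on a library implementation instead.

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (hypothetical helper)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)))

# Made-up values: both predictions are off by 10 percent.
score = mape([100.0, 200.0], [110.0, 180.0])
```

A lower value is better, so comparing this score for the incremental forecast and the multi-output forecast on the same test period shows which one did better.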
