1
00:00:11,100 --> 00:00:16,470
In this video, we will be doing code preparation for RNNs in order to preview the code that

2
00:00:16,470 --> 00:00:19,080
we will write next. As before,

3
00:00:19,110 --> 00:00:24,270
the main purpose of this lecture is to show you the syntax so that you are not surprised when we use

4
00:00:24,270 --> 00:00:24,870
it later.

5
00:00:25,440 --> 00:00:30,360
Now, whether or not you understand what this code is doing will depend on whether you watched the

6
00:00:30,360 --> 00:00:31,410
previous lectures.

7
00:00:31,830 --> 00:00:35,100
If you did, then you will know what's going on under the hood.

8
00:00:35,430 --> 00:00:37,110
If not, that's OK, too.

9
00:00:37,290 --> 00:00:39,480
You can simply treat this like an API.

10
00:00:44,230 --> 00:00:49,030
OK, so the first thing we will discuss is that the structure of this code is no different than what

11
00:00:49,030 --> 00:00:50,890
we had for ANNs and CNNs.

12
00:00:51,820 --> 00:00:57,520
You'll find that the setup for this video, meaning the previous conceptual lectures, is way more meaty

13
00:00:57,520 --> 00:00:59,230
than the actual implementation.

14
00:00:59,800 --> 00:01:06,130
Thanks to our hard work in the previous section, we already know most of what to do. The new parts

15
00:01:06,130 --> 00:01:09,370
are really just learning about the syntax for the new layers.

16
00:01:09,820 --> 00:01:13,420
So without further ado, let's review what we've already learned.

17
00:01:13,960 --> 00:01:16,810
We know that the first step is to build our model.

18
00:01:17,200 --> 00:01:20,950
The syntax for that is precisely what we will learn very shortly.

19
00:01:21,970 --> 00:01:24,220
The next step is to call the compile function.

20
00:01:24,760 --> 00:01:30,310
The arguments to this, such as the loss, will be dependent on what task you are doing, but it works

21
00:01:30,310 --> 00:01:31,570
the same way as before.

22
00:01:32,620 --> 00:01:34,510
The next step is to call the fit function.

23
00:01:35,170 --> 00:01:40,540
Notice that, like CNNs, X_train and X_test should have the shape N by T by D.

24
00:01:41,710 --> 00:01:45,280
After training the model, we would like to use it to make predictions.

25
00:01:45,700 --> 00:01:49,420
Thus, we use the predict function, passing in the input data

26
00:01:49,600 --> 00:01:55,810
we would like to make predictions for. OK, so this is exactly the same as before, as promised.

27
00:02:00,660 --> 00:02:05,460
Now, at this point, since we've only discussed the simple RNN, that's what we will look at how

28
00:02:05,460 --> 00:02:06,660
to implement first.

29
00:02:07,170 --> 00:02:11,920
But what you will learn is that this is no different than implementing LSTMs or GRUs.

30
00:02:12,390 --> 00:02:16,410
You'll see that these other RNN units can be switched in and out with ease.

31
00:02:17,040 --> 00:02:22,620
By the way, remember that I'm not saying simple RNN to mean simple as an adjective to describe

32
00:02:22,620 --> 00:02:25,080
the complexity of this kind of RNN.

33
00:02:25,920 --> 00:02:29,550
Rather, this is the actual name we use for this kind of RNN.

34
00:02:30,960 --> 00:02:34,470
In other words, we call this a simple RNN because that is its name.

35
00:02:34,680 --> 00:02:37,470
And I'm not saying it's simple as opposed to complex.

36
00:02:42,060 --> 00:02:47,550
So as before, the main thing we need to learn is how to build a model, since all the other steps remain the same.

37
00:02:48,270 --> 00:02:53,880
In this case, it's helpful to think about how ANNs are built. ANNs are just a series of dense

38
00:02:53,880 --> 00:02:54,390
layers.

39
00:02:54,720 --> 00:03:01,020
So in the simple case, we have input, dense, dense. With an RNN, there is a self-loop in the hidden

40
00:03:01,020 --> 00:03:01,500
layer.

41
00:03:02,010 --> 00:03:06,030
Thus, we simply switch the Dense with the SimpleRNN object.

42
00:03:06,480 --> 00:03:08,280
And yes, it's just that simple.

43
00:03:08,670 --> 00:03:12,360
Self-loop means SimpleRNN, and no loop means Dense.

44
00:03:13,500 --> 00:03:19,140
One small note to make is that for RNNs, the default activation is the hyperbolic tangent, although

45
00:03:19,140 --> 00:03:21,270
you can use other activations if you like.

46
00:03:22,170 --> 00:03:25,680
This is unlike the Dense layer, where the default activation is the identity.
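As a rough sketch of the swap just described, here is an ANN and a simple RNN built side by side with the Keras functional API. The sizes T, D, M and K are illustrative values chosen for this example, not values from the lecture, and the ReLU in the ANN is just one possible choice.

```python
from tensorflow.keras.layers import Input, Dense, SimpleRNN
from tensorflow.keras.models import Model

# Illustrative sizes (assumed): sequence length, input features,
# hidden units, and number of outputs.
T, D, M, K = 10, 3, 5, 2

# ANN: input -> Dense -> Dense (no loop in the hidden layer)
i = Input(shape=(D,))
x = Dense(M, activation="relu")(i)  # Dense defaults to identity, so we pick one
x = Dense(K)(x)
ann = Model(i, x)

# RNN: same structure, but the hidden Dense becomes a SimpleRNN (self-loop)
i = Input(shape=(T, D))
x = SimpleRNN(M)(i)  # default activation is tanh
# x = SimpleRNN(M, activation="relu")(i)  # another activation, if you prefer
x = Dense(K)(x)
rnn = Model(i, x)

print(ann.output_shape, rnn.output_shape)
```

Both models end in a vector of size K per sample; only the hidden layer and the input shape differ.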

47
00:03:30,560 --> 00:03:35,720
Now again, it's always good to think about the shape of the data as it passes through the neural network.

48
00:03:36,650 --> 00:03:40,940
So in this case, we start with a multivariate time series of shape T by D.

49
00:03:41,810 --> 00:03:47,480
This then goes through a simple RNN layer. As you recall, the simple RNN takes in a time

50
00:03:47,480 --> 00:03:51,170
series of vectors x_1, x_2, all the way up to x_T.

51
00:03:51,770 --> 00:03:57,230
It converts them into a time series of hidden vectors, h_1, h_2, all the way up to h_T.

52
00:03:58,490 --> 00:04:03,590
Recall that each hidden vector depends on the current input x and the past hidden vector.
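Written out, that dependence is usually expressed as the recurrence below. The weight names W_xh, W_hh and b_h are notation assumed for this sketch, and starting from a zero vector h_0 is a common convention rather than something stated in this lecture.

```latex
h_t = \tanh\left( W_{xh}^{\top} x_t + W_{hh}^{\top} h_{t-1} + b_h \right),
\qquad t = 1, \dots, T, \qquad h_0 = 0
```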

53
00:04:04,310 --> 00:04:09,560
By default, we only get to keep the final h_T, which includes all the information from

54
00:04:09,560 --> 00:04:10,280
the past.

55
00:04:10,940 --> 00:04:16,790
In other words, the output of the simple RNN by default is an M-sized vector, h_T.

56
00:04:17,870 --> 00:04:22,730
From here, we pass this through one or more final dense layers, which is like a regular ANN.

57
00:04:23,510 --> 00:04:27,810
So supposing we have K outputs, our output will be a vector of size K.
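To make the shapes concrete, here is a minimal sketch of the full build, compile, fit, predict cycle on random data. The sizes, the MSE loss and the Adam optimizer are assumptions picked for illustration; as noted earlier, the loss depends on your task.

```python
import numpy as np
from tensorflow.keras.layers import Input, SimpleRNN, Dense
from tensorflow.keras.models import Model

# Illustrative sizes (assumed): samples, time steps, features, hidden units, outputs
N, T, D, M, K = 64, 10, 3, 5, 1

# Random stand-in data, only to exercise the shapes
X = np.random.randn(N, T, D).astype("float32")
Y = np.random.randn(N, K).astype("float32")

# Step 1: build the model
i = Input(shape=(T, D))  # each sample is T x D
x = SimpleRNN(M)(i)      # keeps only h_T: a vector of size M per sample
x = Dense(K)(x)          # vector of size K per sample
model = Model(i, x)

# Step 2: compile (the loss is task dependent; MSE is just an example)
model.compile(loss="mse", optimizer="adam")

# Step 3: fit
model.fit(X, Y, epochs=2, verbose=0)

# Step 4: predict
Y_hat = model.predict(X, verbose=0)
print(Y_hat.shape)  # (64, 1), that is N x K
```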

58
00:04:32,470 --> 00:04:38,110
Now, as you recall, it's also possible to use an RNN such that it not only returns the final hidden

59
00:04:38,110 --> 00:04:44,140
vector h_T, but all the hidden vectors of the time series. In order to do this in TensorFlow,

60
00:04:44,290 --> 00:04:46,210
we just need to pass in a single argument:

61
00:04:46,390 --> 00:04:48,150
return_sequences=True.

62
00:04:48,820 --> 00:04:54,010
By doing this, we get back all the hidden vectors in a single array of size T by M.

63
00:04:55,270 --> 00:04:58,930
So you can imagine this holding h_1, h_2, all the way up to h_T.

64
00:05:00,580 --> 00:05:02,590
From here, we have several options.

65
00:05:03,160 --> 00:05:08,560
So suppose this is a many-to-many task where we want to have one prediction for every time step.

66
00:05:08,950 --> 00:05:12,040
So we want y_1, y_2, all the way up to y_T.

67
00:05:12,910 --> 00:05:16,630
In this case, we can just pass the output of the RNN through a dense layer.

68
00:05:16,930 --> 00:05:18,640
And this is what we automatically get.

69
00:05:19,900 --> 00:05:23,230
So in this case, the output of the neural network is T by K.

70
00:05:24,490 --> 00:05:27,250
Note that there is no special syntax you need to use.

71
00:05:27,700 --> 00:05:32,350
TensorFlow automatically knows that the dense layer should be dealing with a time series.

72
00:05:32,800 --> 00:05:37,930
In other words, the dense layer works for both single vectors and the time series of vectors.

73
00:05:38,530 --> 00:05:44,710
If you pass in a single vector, you get back a single vector of size K. If you pass in a series of vectors,

74
00:05:44,860 --> 00:05:47,890
then you get back a series of vectors, each of size K.
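A minimal sketch of this many-to-many setup, with sizes chosen only for illustration: the same Dense layer is applied to the hidden vector at every time step, so the output keeps its time dimension.

```python
from tensorflow.keras.layers import Input, SimpleRNN, Dense
from tensorflow.keras.models import Model

# Illustrative sizes (assumed)
T, D, M, K = 10, 3, 5, 2

i = Input(shape=(T, D))
x = SimpleRNN(M, return_sequences=True)(i)  # all hidden vectors: T x M
x = Dense(K)(x)                             # applied per time step: T x K
many_to_many = Model(i, x)

print(many_to_many.output_shape)  # (None, 10, 2), that is N x T x K
```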

75
00:05:52,500 --> 00:05:57,900
The next scenario to consider is when you're dealing with a many-to-one task. In this case, we want

76
00:05:57,900 --> 00:06:00,180
an output only for the final time step.

77
00:06:01,260 --> 00:06:06,470
Now, of course, the easy thing to do would be to not use return_sequences at all and just keep

78
00:06:06,490 --> 00:06:07,160
h_T.

79
00:06:07,800 --> 00:06:12,600
But let's suppose you want to keep all the hidden vectors and take the maximum value over time.

80
00:06:13,470 --> 00:06:19,450
In this case, you can apply global max pooling just as you would with CNNs. As you recall,

81
00:06:19,470 --> 00:06:25,650
this converts a time series of size T by M into a single vector of size M, completely eliminating the

82
00:06:25,650 --> 00:06:26,580
time dimension.

83
00:06:27,450 --> 00:06:32,520
Once we have a single vector of size M, we're back to the usual situation and we can pass this through

84
00:06:32,520 --> 00:06:36,030
a final dense layer to get back a single vector of size K.
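Here is one way this pooling variant might look, assuming the standard Keras GlobalMaxPooling1D layer and illustrative sizes. The pooling layer takes the maximum over the time axis for each of the M hidden features.

```python
from tensorflow.keras.layers import Input, SimpleRNN, GlobalMaxPooling1D, Dense
from tensorflow.keras.models import Model

# Illustrative sizes (assumed)
T, D, M, K = 10, 3, 5, 2

i = Input(shape=(T, D))
x = SimpleRNN(M, return_sequences=True)(i)  # keep every hidden vector: T x M
x = GlobalMaxPooling1D()(x)                 # max over time: vector of size M
x = Dense(K)(x)                             # vector of size K
pooled_model = Model(i, x)

print(pooled_model.output_shape)  # (None, 2), that is N x K
```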

85
00:06:40,780 --> 00:06:46,960
The last scenario I want to cover is when you want to stack multiple RNN layers together. As you

86
00:06:46,960 --> 00:06:50,650
recall, the input of an RNN layer must be a time series.

87
00:06:51,100 --> 00:06:55,090
And therefore the previous RNN layer must output a time series.

88
00:06:55,660 --> 00:07:00,310
Of course, you already know that this can be done by setting return_sequences equal to True.

89
00:07:01,000 --> 00:07:06,190
So here's an example of a many-to-one RNN where we've stacked multiple RNN layers.

90
00:07:07,710 --> 00:07:11,310
In this case, we start with an input time series of size T by D.

91
00:07:11,970 --> 00:07:15,750
Then we pass this through one RNN layer with 32 hidden units.

92
00:07:16,140 --> 00:07:21,390
We set return_sequences equal to True so that the output has the shape T by 32.

93
00:07:22,170 --> 00:07:27,240
Of course, this is still a multivariate time series, which can be passed through more RNN layers.

94
00:07:27,870 --> 00:07:30,900
So the next RNN layer also has 32 hidden units.

95
00:07:31,110 --> 00:07:34,800
But return_sequences is now set to False, which is the default.

96
00:07:35,340 --> 00:07:38,700
So the output of this layer will just be a vector of size 32.

97
00:07:39,600 --> 00:07:44,730
The next step is to pass this through one final dense layer where we get an output of size K.
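The stacked example just described can be sketched as follows. The 32 hidden units come from the lecture; T, D and K are illustrative values assumed for this example.

```python
from tensorflow.keras.layers import Input, SimpleRNN, Dense
from tensorflow.keras.models import Model

# Illustrative sizes (assumed); the 32 hidden units are from the lecture
T, D, K = 10, 3, 2

i = Input(shape=(T, D))                      # T x D
x = SimpleRNN(32, return_sequences=True)(i)  # T x 32, still a time series
x = SimpleRNN(32)(x)                         # return_sequences=False by default: size 32
x = Dense(K)(x)                              # size K
stacked = Model(i, x)

print(stacked.output_shape)  # (None, 2), that is N x K
```

Every recurrent layer except the last needs return_sequences=True, because the layer after it expects a time series as input.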

98
00:07:46,190 --> 00:07:48,020
OK, so I hope this was pretty simple.

99
00:07:48,620 --> 00:07:53,390
You've now seen how to stack multiple RNN layers together by using return_sequences.

100
00:07:58,080 --> 00:08:04,080
Now, just as a little preview, I want to show you how easy it is to use LSTMs and GRUs, even if

101
00:08:04,080 --> 00:08:09,390
you don't know anything about how they work. Observe that it's simply a matter of changing the type

102
00:08:09,390 --> 00:08:11,340
of object that's being used.

103
00:08:12,090 --> 00:08:17,220
So if I want to use an LSTM, then I simply type LSTM instead of SimpleRNN.

104
00:08:18,120 --> 00:08:22,260
If I want to use a GRU, then I use GRU instead of LSTM.

105
00:08:23,160 --> 00:08:28,380
So in fact, you don't even have to understand how these RNN units work in order to use them.

106
00:08:29,190 --> 00:08:33,090
So if that's your preferred approach, then by all means use that approach.
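To illustrate how little changes, here is a small sketch that builds the same model three times, swapping only the recurrent layer class. The helper name build and the sizes are hypothetical, introduced just for this example.

```python
from tensorflow.keras.layers import Input, SimpleRNN, LSTM, GRU, Dense
from tensorflow.keras.models import Model

# Illustrative sizes (assumed)
T, D, M, K = 10, 3, 5, 2

def build(recurrent_layer):
    """Hypothetical helper: input -> recurrent layer -> Dense.
    Only the recurrent layer class differs between models."""
    i = Input(shape=(T, D))
    x = recurrent_layer(M)(i)
    x = Dense(K)(x)
    return Model(i, x)

# Same code path for all three unit types
models = {cls.__name__: build(cls) for cls in (SimpleRNN, LSTM, GRU)}

for name, m in models.items():
    print(name, m.output_shape)
```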

107
00:08:37,700 --> 00:08:44,000
So as with our previous study on CNNs, it is helpful to think about RNNs in terms of a more natural

108
00:08:44,000 --> 00:08:49,010
kind of input, such as a time series with length T and D input features.

109
00:08:49,730 --> 00:08:55,820
This is the same as when we studied CNNs, where we considered the input to be 2D images, even though

110
00:08:55,820 --> 00:09:01,040
we eventually used them for 1D signals, which we eventually thought of as word embeddings.

111
00:09:01,850 --> 00:09:04,520
Note that the same situation applies in this case.

112
00:09:05,090 --> 00:09:09,250
Initially, it's easier to think of inputs which have the shape T by D.

113
00:09:09,890 --> 00:09:14,300
This is just a time series of length T and D different measurement sensors.

114
00:09:14,960 --> 00:09:20,660
Of course, in practice, this won't be a time series we recorded using sensors, but word embeddings.

115
00:09:21,710 --> 00:09:27,830
As you recall, this is what we get when we have an input with sequence length T, followed by an embedding

116
00:09:27,830 --> 00:09:29,780
layer with embedding dimension D.

117
00:09:30,410 --> 00:09:32,840
The output of this is also T by D.

118
00:09:33,650 --> 00:09:39,350
Therefore, without loss of generality, we can always think of RNNs as having an input of shape T

119
00:09:39,350 --> 00:09:39,950
by D.
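As a sketch of the word-embedding case, the model below takes a sequence of T integer word indices, passes it through an Embedding layer to get a T by D array, and feeds that to a simple RNN. The vocabulary size V and all other sizes are assumptions made for illustration.

```python
from tensorflow.keras.layers import Input, Embedding, SimpleRNN, Dense
from tensorflow.keras.models import Model

# Illustrative sizes (assumed): vocabulary size, sequence length,
# embedding dimension, hidden units, outputs
V, T, D, M, K = 1000, 10, 8, 5, 2

i = Input(shape=(T,))       # sequence of T integer word indices
e = Embedding(V, D)(i)      # each index becomes a D-vector: T x D
x = SimpleRNN(M)(e)         # the RNN sees an ordinary T x D time series
x = Dense(K)(x)
text_model = Model(i, x)

# A second model that stops at the embedding, just to inspect its shape
embedding_only = Model(i, e)
print(embedding_only.output_shape)  # (None, 10, 8), that is N x T x D
```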

