1
00:00:11,070 --> 00:00:16,740
In this video, we will be doing code preparation for RNNs in order to preview the code that we will

2
00:00:16,740 --> 00:00:17,580
write next.

3
00:00:18,330 --> 00:00:23,760
As before, the main purpose of this lecture is to show you the syntax so that you are not surprised

4
00:00:23,760 --> 00:00:24,930
when we use it later.

5
00:00:25,440 --> 00:00:30,390
Now, whether or not you understand what this code is doing will depend on whether you watched the

6
00:00:30,390 --> 00:00:31,440
previous lectures.

7
00:00:31,770 --> 00:00:35,160
If you did, then you will know what's going on under the hood.

8
00:00:35,400 --> 00:00:37,110
If not, that's OK too.

9
00:00:37,300 --> 00:00:39,510
You can simply treat this like an API.

10
00:00:44,200 --> 00:00:49,030
OK, so the first thing we will discuss is that the structure of this code is no different than what

11
00:00:49,030 --> 00:00:55,660
we had for ANNs and CNNs. You'll find that the setup for this video, meaning the previous conceptual

12
00:00:55,660 --> 00:00:59,210
lectures, is way more meaty than the actual implementation.

13
00:00:59,830 --> 00:01:04,210
Thanks to our hard work in the previous section, we already know most of what to do.

14
00:01:05,290 --> 00:01:09,400
The new parts are really just learning about the syntax for the new layers.

15
00:01:09,820 --> 00:01:13,420
So without further ado, let's review what we've already learned.

16
00:01:13,960 --> 00:01:16,800
We know that the first step is to build our model.

17
00:01:17,170 --> 00:01:20,960
The syntax for that is precisely what we will learn very shortly.

18
00:01:22,000 --> 00:01:24,220
The next step is to call the compile function.

19
00:01:24,760 --> 00:01:30,310
The arguments to this, such as the loss, will be dependent on what task you are doing, but it works

20
00:01:30,310 --> 00:01:31,590
the same way as before.

21
00:01:32,650 --> 00:01:34,520
The next step is to call the fit function.

22
00:01:35,170 --> 00:01:39,400
Notice that, like CNNs, X_train and X_test should have the shape

23
00:01:39,400 --> 00:01:45,310
N by T by D. After training the model, we would like to use it to make predictions.

24
00:01:45,730 --> 00:01:49,460
Thus we use the predict function, passing in the input data

25
00:01:49,600 --> 00:01:55,840
we would like to make predictions for. OK, so this is exactly the same as before, as promised.
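
To make those four steps concrete, here is a rough sketch of build, compile, fit, and predict. The sizes N, T, D, M, K, the random data, and the mse loss are all placeholder assumptions, and the model uses the SimpleRNN layer that this lecture goes on to introduce.

```python
import numpy as np
from tensorflow.keras.layers import Input, SimpleRNN, Dense
from tensorflow.keras.models import Model

# Hypothetical sizes: N samples, T time steps, D features, M hidden units, K outputs
N, T, D, M, K = 32, 10, 3, 5, 1

# Random stand-in data; a real task would supply its own arrays
X_train = np.random.randn(N, T, D).astype("float32")
Y_train = np.random.randn(N, K).astype("float32")

# Step 1: build the model
i = Input(shape=(T, D))
x = SimpleRNN(M)(i)
x = Dense(K)(x)
model = Model(i, x)

# Step 2: compile (the loss depends on the task; mse assumes regression)
model.compile(loss="mse", optimizer="adam")

# Step 3: fit
model.fit(X_train, Y_train, epochs=1, verbose=0)

# Step 4: predict, passing in data of shape N by T by D
predictions = model.predict(X_train, verbose=0)
print(predictions.shape)
```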

26
00:02:00,660 --> 00:02:05,460
Now, at this point, since we've only discussed the simple RNN, that's what we will look at how

27
00:02:05,460 --> 00:02:06,690
to implement first.

28
00:02:07,170 --> 00:02:11,970
But what you will learn is that this is no different than implementing LSTMs or GRUs.

29
00:02:12,390 --> 00:02:16,460
You'll see that these other RNN units can be switched in and out with ease.

30
00:02:17,070 --> 00:02:22,620
By the way, remember that I'm not saying simple RNN to mean simple as an adjective to describe

31
00:02:22,620 --> 00:02:25,080
the complexity of this kind of RNN.

32
00:02:25,950 --> 00:02:29,600
Rather, this is the actual name we use for this kind of RNN.

33
00:02:30,900 --> 00:02:34,500
In other words, we call this a simple RNN because that is its name.

34
00:02:34,650 --> 00:02:37,500
And I'm not saying it's simple as opposed to complex.

35
00:02:42,060 --> 00:02:47,550
So as before, the main thing we need to learn is how to build a model, since all the other steps remain

36
00:02:48,240 --> 00:02:53,880
the same. In this case, it's helpful to think about how ANNs are built. ANNs are just a series of dense

37
00:02:53,880 --> 00:02:54,430
layers.

38
00:02:54,720 --> 00:03:01,050
So in the simple case, we have input, dense, dense. With an RNN, there is a self-loop in the hidden

39
00:03:01,050 --> 00:03:01,520
layer.

40
00:03:02,010 --> 00:03:06,050
Thus we simply switch the Dense with the SimpleRNN object.

41
00:03:06,450 --> 00:03:12,390
And yes, it's just that simple: a self-loop means SimpleRNN, and no loop means Dense.

42
00:03:13,500 --> 00:03:19,140
One small note to make is that for RNNs, the default activation is the hyperbolic tangent, although

43
00:03:19,140 --> 00:03:21,340
you can use other activations if you like.

44
00:03:22,170 --> 00:03:25,740
This is unlike the Dense layer, where the default activation is the identity.
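
As a small illustration of the swap and of the two default activations, here is a sketch with made-up sizes T, D, M. The all-ones kernel and the constant input of 10 are only there to make the Dense output predictable; they are not something you would normally use.

```python
import numpy as np
from tensorflow.keras.layers import Input, Dense, SimpleRNN
from tensorflow.keras.models import Model

T, D, M = 4, 3, 5  # hypothetical sizes

# ANN: a Dense hidden layer acting on a single D-sized vector
i = Input(shape=(D,))
ann = Model(i, Dense(M, kernel_initializer="ones")(i))

# RNN: same position in the model, Dense swapped for SimpleRNN; input is now T by D
j = Input(shape=(T, D))
rnn = Model(j, SimpleRNN(M)(j))

# Large inputs make the default activations visible
dense_out = ann.predict(np.full((1, D), 10.0, dtype="float32"), verbose=0)
rnn_out = rnn.predict(np.full((1, T, D), 10.0, dtype="float32"), verbose=0)

print(dense_out)  # identity activation: the weighted sum passes through unchanged
print(rnn_out)    # tanh activation: every value is squashed into [-1, 1]
```

If you want something other than tanh, SimpleRNN accepts an activation argument, for example activation="relu".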

45
00:03:30,560 --> 00:03:35,090
Now, again, it's always good to think about the shape of the data as it passes through the neural

46
00:03:35,090 --> 00:03:35,790
network.

47
00:03:36,620 --> 00:03:40,710
So in this case, we start with a multivariate time series of shape T by D.

48
00:03:41,810 --> 00:03:47,480
This then goes through a SimpleRNN layer. As you recall, the simple RNN takes in a time

49
00:03:47,480 --> 00:03:53,570
series of vectors x1, x2, all the way up to x of big T. It converts them into a time series of

50
00:03:53,570 --> 00:04:00,740
hidden vectors h1, h2, all the way up to h of big T. Recall that each hidden vector depends

51
00:04:00,740 --> 00:04:03,620
on the current input X and the past hidden vector.

52
00:04:04,310 --> 00:04:09,650
By default, we only get to keep the final h of big T, which includes all the information from the

53
00:04:09,650 --> 00:04:10,300
past.

54
00:04:10,910 --> 00:04:16,790
In other words, the output of the SimpleRNN by default is an M-sized vector, h of big T.
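
For anyone who wants to see the recurrence spelled out, here is a plain NumPy sketch of what a simple RNN computes for one time series. The weight names Wx, Wh, b and the sizes are illustrative assumptions, and the weights are random rather than trained.

```python
import numpy as np

T, D, M = 10, 3, 5  # hypothetical sizes
rng = np.random.default_rng(0)

X = rng.standard_normal((T, D))   # one time series x_1 ... x_T
Wx = rng.standard_normal((D, M))  # input-to-hidden weights
Wh = rng.standard_normal((M, M))  # hidden-to-hidden weights (the self-loop)
b = np.zeros(M)                   # bias

h = np.zeros(M)                   # initial hidden state
hidden_states = []
for t in range(T):
    # each hidden vector depends on the current input and the previous hidden vector
    h = np.tanh(X[t] @ Wx + h @ Wh + b)
    hidden_states.append(h)

hidden_states = np.stack(hidden_states)  # shape (T, M): h_1 ... h_T
h_final = hidden_states[-1]              # shape (M,): the only part kept by default
```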

55
00:04:17,840 --> 00:04:22,160
From this, we pass this through one of our final dense layers, which is like a regular

56
00:04:22,160 --> 00:04:27,830
ANN. So supposing we have K outputs, our output will be a vector of size K.

57
00:04:32,470 --> 00:04:38,020
Now, as you recall, it's also possible to use an RNN such that it not only returns the final hidden

58
00:04:38,020 --> 00:04:44,130
vector h of big T, but all the hidden vectors of the time series. In order to do this in TensorFlow,

59
00:04:44,320 --> 00:04:46,220
we just need to pass in a single argument.

60
00:04:46,390 --> 00:04:48,300
return_sequences equals True.

61
00:04:48,850 --> 00:04:54,010
By doing this, we get back all the hidden vectors in a single array of size T by M.

62
00:04:55,240 --> 00:04:59,400
So you can imagine this holding h of 1, h of 2, all the way up to h of big T.
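
Here is a quick sketch comparing the two settings side by side, again with made-up sizes T, D, M. The model names last_only and all_steps are just labels chosen for this example.

```python
from tensorflow.keras.layers import Input, SimpleRNN
from tensorflow.keras.models import Model

T, D, M = 10, 3, 5  # hypothetical sizes

i = Input(shape=(T, D))

# Default: only the final hidden vector is returned (size M)
last_only = Model(i, SimpleRNN(M)(i))

# return_sequences=True: every hidden vector is returned (size T by M)
all_steps = Model(i, SimpleRNN(M, return_sequences=True)(i))

# The leading None in each shape is the batch dimension
print(last_only.output_shape)
print(all_steps.output_shape)
```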

63
00:05:00,550 --> 00:05:02,600
From here we have several options.

64
00:05:03,160 --> 00:05:08,590
So suppose this is a many-to-many task where we want to have one prediction for every time step.

65
00:05:08,950 --> 00:05:15,730
So we want y1, y2, all the way up to y of big T. In this case, we can just pass the output of the RNN

66
00:05:15,730 --> 00:05:18,640
through a dense layer and this is what we automatically get.

67
00:05:19,870 --> 00:05:26,470
So in this case, the output of the neural network is T by K. Note that there is no special syntax you

68
00:05:26,470 --> 00:05:27,280
need to use.

69
00:05:27,730 --> 00:05:32,370
TensorFlow automatically knows that the dense layer should be dealing with a time series.

70
00:05:32,770 --> 00:05:37,960
In other words, the dense layer works for both single vectors and the time series of vectors.

71
00:05:38,500 --> 00:05:42,580
If you pass in a single vector, you get back a single vector of size K.

72
00:05:42,970 --> 00:05:46,570
If you pass in a series of vectors, then you get back a series of vectors,

73
00:05:46,720 --> 00:05:47,890
each of size K.
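
A minimal many-to-many sketch along these lines, assuming placeholder sizes T, D, M, K. The second model is only there to show the same kind of Dense layer acting on a single vector.

```python
from tensorflow.keras.layers import Input, SimpleRNN, Dense
from tensorflow.keras.models import Model

T, D, M, K = 10, 3, 5, 2  # hypothetical sizes

# Many-to-many: Dense is applied to every time step of the T by M output
i = Input(shape=(T, D))
x = SimpleRNN(M, return_sequences=True)(i)
x = Dense(K)(x)
many_to_many = Model(i, x)

# The same kind of Dense layer on a single M-sized vector returns one K-sized vector
j = Input(shape=(M,))
single_vector = Model(j, Dense(K)(j))

print(many_to_many.output_shape)   # batch, then T by K
print(single_vector.output_shape)  # batch, then K
```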

74
00:05:52,530 --> 00:05:57,930
The next scenario to consider is when you're dealing with a many-to-one task. In this case, we want

75
00:05:57,930 --> 00:06:00,220
an output only for the final timestep.

76
00:06:01,260 --> 00:06:06,510
Now, of course, the easy thing to do would be to not use return_sequences at all and just keep h of

77
00:06:06,520 --> 00:06:07,210
big T.

78
00:06:07,830 --> 00:06:12,620
But let's suppose you want to keep all the hidden vectors and take the maximum value over time.

79
00:06:13,440 --> 00:06:18,020
In this case, you can apply global max pooling, just as you would with CNNs.

80
00:06:18,600 --> 00:06:25,590
As you recall, this converts a time series of size T by M into a single vector of size M, completely eliminating

81
00:06:25,590 --> 00:06:26,580
the time dimension.

82
00:06:27,430 --> 00:06:32,550
Once we have a single vector of size M, we're back to the usual situation and we can pass this through

83
00:06:32,550 --> 00:06:35,910
a final dense layer to get back a single vector of size K.
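
Sketched in code, with placeholder sizes T, D, M, K, that many-to-one variant might look like the following. The intermediate model named pooled exists only so the shape after pooling can be inspected.

```python
from tensorflow.keras.layers import Input, SimpleRNN, GlobalMaxPooling1D, Dense
from tensorflow.keras.models import Model

T, D, M, K = 10, 3, 5, 2  # hypothetical sizes

i = Input(shape=(T, D))
x = SimpleRNN(M, return_sequences=True)(i)  # T by M: all hidden vectors
p = GlobalMaxPooling1D()(x)                 # M: max over the time dimension
y = Dense(K)(p)                             # K: final output

pooled = Model(i, p)  # intermediate model, just to look at the pooled shape
model = Model(i, y)

print(pooled.output_shape)
print(model.output_shape)
```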

84
00:06:40,780 --> 00:06:46,960
The last scenario I want to cover is when you want to stack multiple RNN layers together. As you

85
00:06:46,960 --> 00:06:53,290
recall, the input of an RNN layer must be a time series, and therefore the previous RNN layer

86
00:06:53,500 --> 00:06:55,120
must output a time series.

87
00:06:55,630 --> 00:07:00,280
Of course, you already know that this can be done by setting return_sequences equal to True.

88
00:07:01,000 --> 00:07:06,250
So here's an example of a many-to-one RNN where we've stacked multiple RNN layers.

89
00:07:07,710 --> 00:07:13,800
In this case, we start with an input time series of size T by D. Then we pass through one RNN

90
00:07:13,800 --> 00:07:19,950
layer with 32 hidden units. We set return_sequences equal to True so that the output has the

91
00:07:19,950 --> 00:07:21,380
shape T by 32.

92
00:07:22,140 --> 00:07:27,270
Of course, this is still a multivariate time series, which can be passed through more RNN layers.

93
00:07:27,870 --> 00:07:30,930
So the next RNN layer also has 32 hidden units.

94
00:07:31,110 --> 00:07:34,820
But return_sequences is now set to False, which is the default.

95
00:07:35,340 --> 00:07:41,490
So the output of this layer will just be a vector of size 32. The next step is to pass this through

96
00:07:41,550 --> 00:07:44,790
one final dense layer where we get an output of size K.
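
Written out, the stacked example just described might look something like this. The 32 hidden units come from the example itself, while T, D, K and the intermediate model names are placeholders.

```python
from tensorflow.keras.layers import Input, SimpleRNN, Dense
from tensorflow.keras.models import Model

T, D, K = 10, 3, 2  # hypothetical sizes

i = Input(shape=(T, D))
h1 = SimpleRNN(32, return_sequences=True)(i)  # T by 32: still a time series
h2 = SimpleRNN(32)(h1)                        # 32: return_sequences defaults to False
y = Dense(K)(h2)                              # K: final output

model = Model(i, y)

# Intermediate models, used only to inspect the shape after each RNN layer
after_first = Model(i, h1)
after_second = Model(i, h2)

print(after_first.output_shape)
print(after_second.output_shape)
print(model.output_shape)
```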

97
00:07:46,130 --> 00:07:48,030
OK, so I hope this was pretty simple.

98
00:07:48,590 --> 00:07:53,450
You've now seen how to stack multiple RNN layers together by using return_sequences.

99
00:07:58,080 --> 00:08:03,990
Now, just as a little preview, I want to show you how easy it is to use LSTMs and GRUs, even

100
00:08:03,990 --> 00:08:06,150
if you don't know anything about how they work.

101
00:08:06,780 --> 00:08:11,370
Observe that it's simply a matter of changing the type of object that's being used.

102
00:08:12,060 --> 00:08:17,220
So if I want to use an LSTM, then I simply type LSTM instead of SimpleRNN.

103
00:08:18,150 --> 00:08:22,210
If I want to use a GRU, then I use GRU instead of LSTM.

104
00:08:23,130 --> 00:08:28,390
So in fact, you don't even have to understand how these RNN units work in order to use them.

105
00:08:29,190 --> 00:08:33,120
So if that's your preferred approach, then by all means use that approach.
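
To show how little changes when switching units, here is a sketch that builds the same many-to-one model three times. The helper build_model and the sizes T, D, M, K are hypothetical names introduced only for this illustration.

```python
from tensorflow.keras.layers import Input, SimpleRNN, LSTM, GRU, Dense
from tensorflow.keras.models import Model

T, D, M, K = 10, 3, 5, 2  # hypothetical sizes

def build_model(rnn_layer):
    """Build a many-to-one model around whichever recurrent layer class is given."""
    i = Input(shape=(T, D))
    x = rnn_layer(M)(i)  # the only line that depends on the type of unit
    x = Dense(K)(x)
    return Model(i, x)

# Same structure each time; only the layer class changes
models = {cls.__name__: build_model(cls) for cls in (SimpleRNN, LSTM, GRU)}

for name, m in models.items():
    print(name, m.output_shape)
```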
