1
00:00:11,600 --> 00:00:17,780
In this lecture, we are going to go through a Colab notebook that emphasizes the importance of shapes

2
00:00:17,780 --> 00:00:18,650
in our RNNs.

3
00:00:19,370 --> 00:00:24,920
Remember that whenever you hear me say something like N by T by D, you should be automatically thinking

4
00:00:24,920 --> 00:00:28,400
about a box without me explicitly showing you a box.

5
00:00:28,880 --> 00:00:34,940
If you don't have this automatic visualization reflex, you'll be at a disadvantage in trying to learn

6
00:00:34,940 --> 00:00:35,480
this stuff.

7
00:00:36,350 --> 00:00:39,140
This lecture is all about tracking the shapes in an RNN.

8
00:00:39,380 --> 00:00:45,020
And also, we are going to go through the RNN calculation manually to reinforce our understanding

9
00:00:45,260 --> 00:00:46,640
of how an RNN works.

10
00:00:47,390 --> 00:00:52,760
As usual, you can look at the title of the notebook to determine what notebook we are currently looking

11
00:00:52,760 --> 00:00:53,030
at.

12
00:00:56,370 --> 00:00:58,920
So the first thing we have here are just some comments.

13
00:00:59,400 --> 00:01:03,360
I list out all the important size variables we have to pay attention to.

14
00:01:04,170 --> 00:01:07,050
These things should be permanently stored in your memory.

15
00:01:07,410 --> 00:01:10,280
You should never be asking, "What does M mean again?"

16
00:01:10,830 --> 00:01:14,160
If you are, that will significantly slow down your learning.

17
00:01:14,400 --> 00:01:17,550
So take notes and write things down if you have to.

18
00:01:18,300 --> 00:01:24,060
So just to recap, N is the number of samples in your dataset. This has been the case since the beginning

19
00:01:24,060 --> 00:01:24,900
of this course.

20
00:01:25,590 --> 00:01:33,300
T is the sequence length. Remember that in TensorFlow, we assume constant-size sequences. D is the input

21
00:01:33,300 --> 00:01:34,500
feature dimensionality.

22
00:01:35,040 --> 00:01:40,740
We've gone through many examples of this where you might have a D bigger than one. M is the number of

23
00:01:40,740 --> 00:01:41,580
hidden units.

24
00:01:42,030 --> 00:01:47,280
This is the same as we have in a regular feedforward ANN, so it's a hyperparameter which you can

25
00:01:47,280 --> 00:01:47,880
choose.

26
00:01:48,780 --> 00:01:51,030
Finally, K is the number of output nodes.

27
00:01:51,780 --> 00:01:57,510
As a side note, K being bigger than one does not automatically imply you are doing classification

28
00:01:57,510 --> 00:01:58,590
with a softmax.

29
00:01:59,160 --> 00:02:05,040
You can do multi-dimensional regression too. Imagine, for instance, you are trying to predict lat-long

30
00:02:05,040 --> 00:02:05,820
coordinates.

31
00:02:06,270 --> 00:02:10,410
In that scenario, K would be two, but it would still be a regression problem.

32
00:02:14,320 --> 00:02:16,600
So next, we're going to make some dummy data.

33
00:02:18,000 --> 00:02:20,010
Also going to set our size variables.

34
00:02:21,050 --> 00:02:28,340
So we're setting N, T, D, and K, and then we make X to be just a random array of size N by T by D.

35
00:02:32,080 --> 00:02:37,090
OK, so N is one, so for this example, we're only going to be working with one sample.

36
00:02:38,080 --> 00:02:42,310
T is 10, so our sequence length is 10. D equals three.

37
00:02:42,550 --> 00:02:44,530
So our feature dimensionality is three.

38
00:02:45,280 --> 00:02:46,720
Finally, K equals two.

39
00:02:47,080 --> 00:02:48,580
So we have two output nodes.

40
00:02:49,180 --> 00:02:52,470
And as you know, our input X will be of shape N by T by D.
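
[Editor's note] The dummy data described above can be sketched in NumPy as follows. The size values are the ones stated in the lecture; the use of np.random.randn is an assumption, since the transcript only says "a random array".

```python
import numpy as np

# Size variables, with the values stated in the lecture
N = 1   # number of samples
T = 10  # sequence length
D = 3   # input feature dimensionality
K = 2   # number of output nodes

# Dummy input of shape N x T x D (standard normal draws are an assumed choice)
X = np.random.randn(N, T, D)
print(X.shape)  # (1, 10, 3)
```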

41
00:02:54,590 --> 00:02:56,600
Next, we're going to create our model.

42
00:02:57,750 --> 00:03:01,290
Here we also set M, the number of hidden units, to be five.

43
00:03:02,160 --> 00:03:06,120
As usual, we start with an input layer whose shape is T by D.

44
00:03:07,900 --> 00:03:12,000
Then we create a SimpleRNN layer, which has the number of hidden units M.

45
00:03:13,140 --> 00:03:19,920
Let's assume the default activation, which is a tanh. Finally, we create a dense layer with a number

46
00:03:19,920 --> 00:03:20,760
of output units,

47
00:03:20,820 --> 00:03:21,300
K.

48
00:03:22,170 --> 00:03:24,190
For this, I'll assume we're doing regression.

49
00:03:24,270 --> 00:03:26,160
So there is no activation function.

50
00:03:33,080 --> 00:03:36,050
Next, we are going to use our model to make a prediction.

51
00:03:36,710 --> 00:03:40,280
Now, obviously, both our data and weights are random.

52
00:03:40,850 --> 00:03:42,740
So this prediction is not meaningful.

53
00:03:43,460 --> 00:03:45,740
These numbers are really just for sanity checking.

54
00:03:46,520 --> 00:03:50,780
And as you can see, the output shape is as expected, one by two.

55
00:03:51,170 --> 00:03:57,620
So we have one sample and two output nodes. Take note of these numbers, as this is what we want to compare

56
00:03:57,620 --> 00:03:58,490
with later on.

57
00:04:02,600 --> 00:04:07,100
Next, we can do a model summary so that we can see all the layers of our RNN.

58
00:04:09,640 --> 00:04:15,100
As expected, we have three layers: the input layer, the SimpleRNN layer, and the dense layer.

59
00:04:17,200 --> 00:04:23,200
Now, we don't exactly know what parameters are stored in a SimpleRNN layer, although we do know the

60
00:04:23,200 --> 00:04:25,420
mathematical equation to get the output.

61
00:04:26,380 --> 00:04:29,050
So let's just check the weights and see what they are.

62
00:04:32,060 --> 00:04:34,250
So we have some idea of what's stored in there.

63
00:04:36,120 --> 00:04:38,340
OK, so it looks like three big arrays.

64
00:04:39,990 --> 00:04:43,680
And it's actually more helpful to print out the shape of these arrays.

65
00:04:44,730 --> 00:04:50,010
And from there, we can deduce which array corresponds to which weight in the SimpleRNN layer.

66
00:04:52,060 --> 00:04:57,100
So as you can see, we get a three by five, a five by five and a five length vector.

67
00:04:57,760 --> 00:05:01,000
If you recall, D equals three and M equals five.

68
00:05:01,510 --> 00:05:02,560
So that makes sense.

69
00:05:03,100 --> 00:05:07,210
The first weight is D by M, which means it's the input-to-hidden weight.

70
00:05:08,260 --> 00:05:11,950
The second weight is M by M, which means it's the hidden-to-hidden weight.

71
00:05:12,490 --> 00:05:16,150
And the third weight is a vector of length M, which means it's the bias term.
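
[Editor's note] One way to see why these three shapes fit together is to push zero-filled placeholders through the hidden-layer pre-activation. This is only a shape check, not the notebook's code; the names Wx, Wh, bh, x_t, and h_last are illustrative stand-ins.

```python
import numpy as np

D, M = 3, 5  # feature dimensionality and number of hidden units from the lecture

# Zero-filled placeholders with the shapes printed in the lecture
Wx = np.zeros((D, M))  # input-to-hidden weight, D x M
Wh = np.zeros((M, M))  # hidden-to-hidden weight, M x M
bh = np.zeros(M)       # hidden bias, length M

x_t = np.zeros(D)      # one time step of one sample, length D
h_last = np.zeros(M)   # previous hidden state, length M

# (D,) dot (D, M) gives (M,); (M,) dot (M, M) gives (M,); adding (M,) keeps (M,)
a = x_t.dot(Wx) + h_last.dot(Wh) + bh
print(a.shape)  # (5,)
```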

72
00:05:17,350 --> 00:05:20,920
Now we can assign our weight variables with confidence.

73
00:05:25,170 --> 00:05:31,050
So for the layer at index one, we assign these weights Wx, Wh, and bh.

74
00:05:31,680 --> 00:05:38,760
Notice I'm using shorthand here, so I don't use Wxh and Whh, since it's not that useful. For the

75
00:05:38,760 --> 00:05:40,000
layer at index two,

76
00:05:40,020 --> 00:05:45,030
This corresponds to the output layer, so we assign these to the weights Wo and bo.

77
00:05:48,450 --> 00:05:52,020
The last step is to do our manual RNN calculation.

78
00:05:52,710 --> 00:05:57,120
This just follows the pseudocode we discussed earlier, so hopefully you were taking notes.

79
00:05:57,870 --> 00:06:03,300
To start, we are going to initialize the initial hidden state to a vector of zeros.

80
00:06:04,080 --> 00:06:07,150
By the way, if we get this wrong, then the output will be different.

81
00:06:07,170 --> 00:06:12,180
So here's another way we can confirm that the initial state really is zero.

82
00:06:13,290 --> 00:06:19,080
Next, we get X at index zero, which is our one and only sample, so we call this little x.

83
00:06:20,190 --> 00:06:23,820
Next, we initialize an empty list for all of our Y hats.

84
00:06:24,450 --> 00:06:28,560
As you know, in this example, we only care about the final y-hat.

85
00:06:29,930 --> 00:06:32,510
But we are going to calculate them all for completeness.

86
00:06:34,500 --> 00:06:40,980
Next, we enter a loop where little t counts up from zero up to big T. Inside the loop, we calculate

87
00:06:40,980 --> 00:06:42,030
the first H.

88
00:06:42,660 --> 00:06:44,850
That's the hidden value at the hidden layer.

89
00:06:45,540 --> 00:06:54,300
It's equal to the tanh of x at t dotted with Wx, plus h_last dotted with Wh, plus bh.

90
00:06:54,990 --> 00:06:58,140
So you should recognize this formula from the slides we just discussed.

91
00:06:58,350 --> 00:07:03,150
And of course, I know you took notes, so you should be cross-referencing with those.

92
00:07:04,200 --> 00:07:09,120
Once we have h, we can calculate y hat, which is just the usual neuron equation.

93
00:07:10,660 --> 00:07:16,870
Finally, we assign h to h_last so that h_last has the correct value for the next iteration

94
00:07:16,870 --> 00:07:17,410
of the loop.

95
00:07:18,130 --> 00:07:24,310
And once we're outside the loop, we can print out the final value of the y-hats list, and hopefully this

96
00:07:24,310 --> 00:07:28,210
is equal to what we calculated before when we called model.predict.
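
[Editor's note] The loop just described can be sketched in NumPy as below. This is a minimal sketch, not the notebook itself: the weights are random stand-ins drawn with a hypothetical seed, whereas in the lecture they are read from the Keras model, so these numbers will not match the prediction shown on screen. All variable names are assumptions.

```python
import numpy as np

N, T, D, M, K = 1, 10, 3, 5, 2  # sizes from the lecture

rng = np.random.default_rng(0)  # assumed seed, for repeatability only
X = rng.standard_normal((N, T, D))

# Stand-in weights; in the lecture these come from the model's layers instead
Wx = rng.standard_normal((D, M))  # input-to-hidden
Wh = rng.standard_normal((M, M))  # hidden-to-hidden
bh = np.zeros(M)                  # hidden bias
Wo = rng.standard_normal((M, K))  # hidden-to-output
bo = np.zeros(K)                  # output bias

h_last = np.zeros(M)  # initial hidden state is a vector of zeros
x = X[0]              # the one and only sample, shape T x D
yhats = []            # one output per time step

for t in range(T):
    # h_t = tanh(x_t Wx + h_{t-1} Wh + bh)
    h = np.tanh(x[t].dot(Wx) + h_last.dot(Wh) + bh)
    # No output activation, since the lecture assumes regression
    y = h.dot(Wo) + bo
    yhats.append(y)
    # Carry the hidden state into the next iteration
    h_last = h

# Only the final output corresponds to what the model returns
print(yhats[-1])
```

With the real weights substituted for the stand-ins, the last entry of yhats is the value to compare against the model's prediction.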

97
00:07:37,320 --> 00:07:39,600
All right, so you probably forgot these numbers by now.

98
00:07:40,630 --> 00:07:45,210
Let's go back up and check. So it's minus 0.7 and 0.45.

99
00:07:48,190 --> 00:07:48,490
All right.

100
00:07:48,520 --> 00:07:50,770
Minus 0.7, 0.45.

101
00:07:52,030 --> 00:07:53,340
So that's pretty awesome.

102
00:07:53,350 --> 00:07:58,750
We've confirmed that these are indeed the calculations that are done in the SimpleRNN.

103
00:08:05,500 --> 00:08:12,040
OK, so one thing that made this exercise simpler was that we only had one sample, so as a bonus exercise,

104
00:08:12,040 --> 00:08:13,000
here's what you can do.

105
00:08:13,480 --> 00:08:15,190
Use an N bigger than one.

106
00:08:16,030 --> 00:08:21,880
Modify this code so that it still produces the same result even when you have multiple samples.