1
00:00:11,570 --> 00:00:17,930
In this lecture, we're going to go through a Colab notebook that emphasizes the importance of shapes in

2
00:00:17,930 --> 00:00:18,670
RNNs.

3
00:00:19,340 --> 00:00:24,950
Remember that whenever you hear me say something like N by T by D, you should automatically be thinking

4
00:00:24,950 --> 00:00:28,430
about a box without me explicitly showing you a box.

5
00:00:28,850 --> 00:00:34,970
If you don't have this automatic visualization reflex, you'll be at a disadvantage in trying to learn

6
00:00:34,970 --> 00:00:35,390
this stuff.

7
00:00:36,350 --> 00:00:39,140
This lecture is all about tracking the shapes in an RNN.

8
00:00:39,380 --> 00:00:45,050
We are also going to go through the RNN calculation manually to reinforce our understanding

9
00:00:45,200 --> 00:00:46,660
of how an RNN works.

10
00:00:47,360 --> 00:00:52,790
As usual, you can look at the title of the notebook to determine which notebook we are currently looking

11
00:00:52,790 --> 00:00:53,040
at.

12
00:00:56,370 --> 00:01:02,010
So the first thing we have here are just some comments where I list out all the important size variables

13
00:01:02,010 --> 00:01:07,040
we have to pay attention to. These things should be permanently stored in your memory.

14
00:01:07,410 --> 00:01:10,290
You should never have to ask, "What does this mean again?"

15
00:01:10,800 --> 00:01:14,160
If you are, that will significantly slow down your learning.

16
00:01:14,370 --> 00:01:17,550
So take notes and write things down if you have to.

17
00:01:18,270 --> 00:01:22,060
So just to recap, N is the number of samples in your dataset.

18
00:01:22,200 --> 00:01:24,930
This has been the case since the beginning of this course.

19
00:01:25,620 --> 00:01:26,640
T is the sequence length.

20
00:01:27,540 --> 00:01:28,980
Remember that in TensorFlow,

21
00:01:29,190 --> 00:01:31,440
we assume constant-size sequences.

22
00:01:32,190 --> 00:01:34,520
D is the input feature dimensionality.

23
00:01:35,070 --> 00:01:40,770
We've gone through many examples of this where you might have a D bigger than one. M is the number of

24
00:01:40,770 --> 00:01:41,590
hidden units.

25
00:01:42,030 --> 00:01:44,550
This is the same as what we have in a regular feedforward network,

26
00:01:44,550 --> 00:01:47,910
so it's a hyperparameter that you can choose.

27
00:01:48,810 --> 00:01:51,090
Finally, K is the number of output nodes.
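If it helps to have these in one place, here is a small reference sketch of the size variables and the shape of every array that shows up in this lecture. The values match the example used later (N=1, T=10, D=3, M=5, K=2); the `shapes` dictionary and the short names like `Wx` and `bo` are just an illustrative summary, not something the notebook itself defines.

```python
# Size variables (values match the example used later in this lecture)
N = 1   # number of samples
T = 10  # sequence length (constant across samples)
D = 3   # input feature dimensionality
M = 5   # number of hidden units (a hyperparameter)
K = 2   # number of output nodes

# Illustrative shape table for every array involved in a simple RNN
shapes = {
    "X": (N, T, D),     # full input batch
    "x_t": (D,),        # one time step of one sample
    "Wx": (D, M),       # input-to-hidden weight
    "Wh": (M, M),       # hidden-to-hidden weight
    "bh": (M,),         # hidden bias
    "h_t": (M,),        # hidden state at one time step
    "Wo": (M, K),       # hidden-to-output weight
    "bo": (K,),         # output bias
    "y_hat": (N, K),    # final prediction for every sample
}

for name, shape in shapes.items():
    print(f"{name:6s} {shape}")
```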

28
00:01:51,750 --> 00:01:57,660
As a side note, K being bigger than one does not automatically imply you are doing classification with

29
00:01:57,660 --> 00:01:58,620
a softmax.

30
00:01:59,160 --> 00:02:01,140
You can do multidimensional regression.

31
00:02:01,140 --> 00:02:07,680
Imagine, for instance, you were trying to predict lat/long coordinates. In that scenario, K would be

32
00:02:07,680 --> 00:02:10,460
two, but it would still be a regression problem.
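To make the side note concrete, here's a NumPy sketch contrasting K = 2 used for regression (no activation, as in the lat/long example) with K = 2 used for softmax classification. The hidden state and weights are random placeholders; the only point is that softmax outputs are constrained to be positive and sum to one, while regression outputs are unconstrained real numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
M, K = 5, 2

h = rng.standard_normal(M)         # stand-in for a final hidden state
Wo = rng.standard_normal((M, K))   # hidden-to-output weights
bo = rng.standard_normal(K)        # output bias

scores = h @ Wo + bo               # same K raw outputs in both cases

# Regression: no activation, e.g. a predicted latitude and longitude.
y_regression = scores

# Classification: softmax turns the K scores into class probabilities.
exp_scores = np.exp(scores - scores.max())   # subtract max for numerical stability
y_softmax = exp_scores / exp_scores.sum()

print("regression output:", y_regression)
print("softmax output:   ", y_softmax, "sum =", y_softmax.sum())
```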

33
00:02:14,290 --> 00:02:16,630
So next, we're going to make some dummy data.

34
00:02:18,000 --> 00:02:20,040
We're also going to set our size variables.

35
00:02:21,050 --> 00:02:28,340
So we're setting N, T, D, and K, and then we make X a random array of size N by T by D.

36
00:02:32,050 --> 00:02:38,710
OK, so N is one, so for this example we're only going to be working with one sample. T is ten,

37
00:02:38,710 --> 00:02:42,310
so our sequence length is ten. D equals three.

38
00:02:42,550 --> 00:02:44,560
So our feature dimensionality is three.

39
00:02:45,250 --> 00:02:46,720
Finally, K equals two.

40
00:02:47,050 --> 00:02:48,640
So we have two output nodes.

41
00:02:49,150 --> 00:02:52,480
And as you know, our input X will be of shape N by T by D.
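A minimal NumPy sketch of this step, assuming the dummy data is drawn with `np.random.randn` (the notebook's exact call isn't shown here). It also previews how indexing peels off one dimension at a time, which is what the manual calculation later relies on.

```python
import numpy as np

N, T, D = 1, 10, 3   # one sample, sequence length 10, 3 input features

# Dummy input of shape N x T x D; the values are random, only the shape matters.
X = np.random.randn(N, T, D)
print(X.shape)       # (1, 10, 3)

# Indexing removes one dimension at a time:
x = X[0]             # one sample: T x D
print(x.shape)       # (10, 3)
print(x[0].shape)    # one time step of that sample: (3,)
```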

42
00:02:54,590 --> 00:02:56,600
Next, we are going to create our model.

43
00:02:57,720 --> 00:03:01,330
Here we also set M, the number of hidden units, to be five.

44
00:03:02,130 --> 00:03:06,150
As usual, we start with an input layer whose shape is T by D.

45
00:03:07,900 --> 00:03:12,100
Then we create a SimpleRNN layer with M hidden units.

46
00:03:13,140 --> 00:03:19,920
Let's assume the default activation, which is tanh. Finally, we create a Dense layer with the number

47
00:03:19,920 --> 00:03:26,190
of output units K. For this, I'll assume we're doing regression, so there is no activation function.

48
00:03:32,810 --> 00:03:36,050
And next, we are going to use our model to make a prediction.

49
00:03:36,680 --> 00:03:42,720
Now, obviously, both our data and weights are random, so this prediction is not meaningful.

50
00:03:43,460 --> 00:03:45,780
These numbers are really just for sanity checking.

51
00:03:46,490 --> 00:03:50,770
And as you can see, the output shape is, as expected, one by two.

52
00:03:51,170 --> 00:03:57,650
So we have one sample and two output nodes. Take note of these numbers, as this is what we want to compare

53
00:03:57,650 --> 00:03:58,520
with later on.

54
00:04:02,570 --> 00:04:07,020
Next, we can call model.summary() so that we can see all the layers of our RNN.

55
00:04:09,580 --> 00:04:15,160
As expected, we have three layers: the input layer, the SimpleRNN layer, and the Dense layer.
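The "Param #" column in the summary can be checked by hand. A SimpleRNN layer stores a D by M input-to-hidden matrix, an M by M hidden-to-hidden matrix, and a length-M bias; a Dense layer stores an M by K matrix and a length-K bias; the input layer has no parameters. Here's a small sketch of that arithmetic with the sizes from this example (the helper functions are hypothetical, written only for this check).

```python
def simple_rnn_params(D, M):
    # input-to-hidden (D x M) + hidden-to-hidden (M x M) + bias (M)
    return D * M + M * M + M

def dense_params(M, K):
    # weights (M x K) + bias (K)
    return M * K + K

D, M, K = 3, 5, 2
rnn = simple_rnn_params(D, M)    # 15 + 25 + 5 = 45
dense = dense_params(M, K)       # 10 + 2 = 12
print(rnn, dense, rnn + dense)   # 45 12 57
```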

56
00:04:17,200 --> 00:04:23,890
Now, we don't know exactly what parameters are stored in a SimpleRNN, although we do know the mathematical

57
00:04:23,890 --> 00:04:25,420
equation to get the output.

58
00:04:26,380 --> 00:04:29,120
So let's just check the weights and see what they are.

59
00:04:32,030 --> 00:04:34,280
So that we have some idea of what's stored in there.

60
00:04:36,120 --> 00:04:38,400
OK, so it looks like three big arrays.

61
00:04:39,990 --> 00:04:43,710
It's actually more helpful to print out the shapes of these arrays.

62
00:04:44,700 --> 00:04:50,040
And from there, we can deduce which array corresponds to which weight in the SimpleRNN.

63
00:04:52,060 --> 00:04:58,660
So as you can see, we get a three by five, a five by five, and a length-five vector. If you recall,

64
00:04:58,660 --> 00:05:01,060
D equals three and M equals five.

65
00:05:01,480 --> 00:05:02,610
So that makes sense.

66
00:05:03,130 --> 00:05:07,230
The first weight is D by M, which means it's the input-to-hidden weight.

67
00:05:08,260 --> 00:05:13,930
The second weight is M by M, which means it's the hidden-to-hidden weight, and the third weight is a vector

68
00:05:13,930 --> 00:05:16,180
of length M, which means it's the bias term.
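The shape-based reasoning above can be written as a small check. In this sketch, random NumPy arrays stand in for the three arrays the SimpleRNN layer returns, and `identify_rnn_weights` is a hypothetical helper that labels each array purely by its shape.

```python
import numpy as np

D, M = 3, 5

# Stand-ins for the three arrays stored in the SimpleRNN layer.
rnn_weights = [np.random.randn(D, M), np.random.randn(M, M), np.zeros(M)]

def identify_rnn_weights(weights, D, M):
    """Hypothetical helper: label each weight array by its shape."""
    labels = {}
    for w in weights:
        if w.shape == (D, M):
            labels["Wx"] = w      # input-to-hidden
        elif w.shape == (M, M):
            labels["Wh"] = w      # hidden-to-hidden
        elif w.shape == (M,):
            labels["bh"] = w      # hidden bias
        else:
            raise ValueError(f"unexpected shape {w.shape}")
    return labels

labels = identify_rnn_weights(rnn_weights, D, M)
print({name: w.shape for name, w in labels.items()})
```

Note that this only works because D and M differ here. If D equaled M, the first two matrices would have the same shape, and you'd have to rely on their order instead.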

69
00:05:17,350 --> 00:05:20,950
Now we can assign our weight variables with confidence.

70
00:05:25,170 --> 00:05:32,910
So for the layer at index one, we assign these weights Wx, Wh, and bh. Notice I'm using shorthand

71
00:05:32,910 --> 00:05:41,040
here, so I don't write Whh, since it's not that useful. The layer at index two corresponds to the

72
00:05:41,040 --> 00:05:41,780
output layer.

73
00:05:42,000 --> 00:05:45,030
So we assign these to the weights Wo and bo.

74
00:05:48,480 --> 00:05:55,130
The last step is to do our manual RNN calculation. This just follows the pseudocode we discussed earlier.

75
00:05:55,260 --> 00:05:58,460
So hopefully you were taking notes. To start,

76
00:05:58,470 --> 00:06:03,310
we are going to initialize the initial hidden state to a vector of zeros.

77
00:06:04,110 --> 00:06:07,150
By the way, if we get this wrong, then the output will be different.

78
00:06:07,170 --> 00:06:12,180
So here's another way we can confirm that the initial state really is zero.
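To see why a wrong initial state would show up in the output, here's a NumPy sketch that runs the same recurrence twice, once from a zero state (the Keras default when no initial state is passed) and once from a vector of ones. The weights are random stand-ins, and `final_output` is a hypothetical helper; the hidden-layer weights are scaled down only to keep tanh away from saturation so the effect of the initial state doesn't fade too quickly.

```python
import numpy as np

rng = np.random.default_rng(42)
T, D, M, K = 10, 3, 5, 2

# Random stand-ins for the data and weights
x = rng.standard_normal((T, D))
Wx = 0.5 * rng.standard_normal((D, M))
Wh = 0.5 * rng.standard_normal((M, M))
bh = 0.5 * rng.standard_normal(M)
Wo = rng.standard_normal((M, K))
bo = rng.standard_normal(K)

def final_output(h0):
    """Hypothetical helper: run the recurrence from h0 and return the last y-hat."""
    h = h0
    for t in range(T):
        h = np.tanh(x[t] @ Wx + h @ Wh + bh)
    return h @ Wo + bo

y_zero = final_output(np.zeros(M))   # zero initial state
y_ones = final_output(np.ones(M))    # a deliberately different initial state
print(y_zero, y_ones)
```

The gap tends to shrink as T grows, since the influence of the initial state gets squashed through repeated tanh steps, but for a short sequence like this one the two outputs differ.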

79
00:06:13,290 --> 00:06:17,660
Next, we get X at index zero, which is our one and only sample.

80
00:06:17,670 --> 00:06:23,850
So we call this little x. Next, we initialize an empty list for all of our y-hats.

81
00:06:24,420 --> 00:06:28,590
As you know, in this example, we only care about the final y-hat.

82
00:06:29,900 --> 00:06:32,570
But we are going to calculate them all for completeness.

83
00:06:34,530 --> 00:06:41,100
Next, we enter a loop where little t counts up from zero to big T. Inside the loop, we calculate the

84
00:06:41,100 --> 00:06:44,840
first h, that's the hidden value at the hidden layer.

85
00:06:45,540 --> 00:06:55,320
It's equal to the tanh of x[t] dotted with Wx, plus h_last dotted with Wh, plus bh. So you

86
00:06:55,320 --> 00:06:58,170
should recognize this formula from the slides we just discussed.

87
00:06:58,320 --> 00:07:03,210
And of course I know you took notes, so you should be cross referencing with those.

88
00:07:04,200 --> 00:07:09,180
Once we have h, we can calculate y-hat, which is just the usual neuron equation.
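For cross-referencing with your notes, here is the recurrence this loop implements, written with row vectors so that the products line up with the D by M and M by M weight shapes. The symbol names follow the shorthand used in this notebook; the math counts t from 1, whereas the Python loop counts from zero.

```latex
\begin{aligned}
h_t &= \tanh\left(x_t W_x + h_{t-1} W_h + b_h\right), \qquad h_0 = 0, \\
\hat{y}_t &= h_t W_o + b_o, \qquad t = 1, \dots, T, \\
x_t &\in \mathbb{R}^{1 \times D},\; W_x \in \mathbb{R}^{D \times M},\; W_h \in \mathbb{R}^{M \times M},\; b_h \in \mathbb{R}^{1 \times M},\; W_o \in \mathbb{R}^{M \times K},\; b_o \in \mathbb{R}^{1 \times K}.
\end{aligned}
```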

89
00:07:10,660 --> 00:07:17,050
Finally, we assign h to h_last so that h_last has the correct value for the next iteration of the

90
00:07:17,050 --> 00:07:22,630
loop. Once we're outside the loop, we can print out the final value of the y-hats list.

91
00:07:22,900 --> 00:07:28,240
And hopefully this is equal to what we calculated before when we called model.predict.

92
00:07:37,260 --> 00:07:39,660
All right, so you probably forgot these numbers by now.

93
00:07:40,500 --> 00:07:41,770
Let's go back up and check.

94
00:07:41,790 --> 00:07:45,270
So it's minus zero point seven and zero point four five.

95
00:07:48,160 --> 00:07:48,520
All right.

96
00:07:48,550 --> 00:07:53,350
Minus zero point seven, zero point four five, so that's pretty awesome.

97
00:07:53,380 --> 00:07:58,780
We've confirmed that these are indeed the calculations that are done in the SimpleRNN.

98
00:08:05,470 --> 00:08:12,070
OK, so one thing that made this exercise simpler was that we only had one sample. So as a bonus exercise,

99
00:08:12,070 --> 00:08:18,820
here's what you can do: using an N bigger than one, modify this code so that it still produces the same

100
00:08:18,820 --> 00:08:21,940
result even when you have multiple samples.
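If you want to check your answer afterward, here's one possible approach, sketched in NumPy with random stand-in weights. The idea is to keep one hidden state per sample, an N by M matrix, and slice out all samples at time t with `X[:, t]`; broadcasting then adds the bias to every row. The helper names `rnn_batched` and `rnn_single` are hypothetical, and the final line checks that the batched version agrees with running the single-sample loop on each sample separately.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T, D, M, K = 4, 10, 3, 5, 2

X = rng.standard_normal((N, T, D))
Wx = rng.standard_normal((D, M))
Wh = rng.standard_normal((M, M))
bh = rng.standard_normal(M)
Wo = rng.standard_normal((M, K))
bo = rng.standard_normal(K)

def rnn_batched(X):
    """Process all N samples at once; returns the final outputs, shape (N, K)."""
    h = np.zeros((X.shape[0], M))       # one hidden state per sample: (N, M)
    for t in range(X.shape[1]):
        # X[:, t] is (N, D); bh (M,) broadcasts across the N rows
        h = np.tanh(X[:, t] @ Wx + h @ Wh + bh)
    return h @ Wo + bo                  # (N, K)

def rnn_single(x):
    """Single-sample version for x of shape (T, D); returns shape (K,)."""
    h = np.zeros(M)
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ Wx + h @ Wh + bh)
    return h @ Wo + bo

batched = rnn_batched(X)
looped = np.stack([rnn_single(X[n]) for n in range(N)])
print(batched.shape, np.allclose(batched, looped))  # (4, 2) True
```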