1
00:00:11,690 --> 00:00:17,030
In this lecture I'm going to show you the code for how to create an RNN with an embedding layer

2
00:00:17,270 --> 00:00:23,510
so that it can take text as input rather than feature vectors. To be more specific,

3
00:00:23,540 --> 00:00:29,390
we'll assume that this text is already encoded as a sequence of integers that can be used to index an

4
00:00:29,390 --> 00:00:30,440
embedding.

5
00:00:30,620 --> 00:00:35,810
We haven't discussed how to create such sequences of integers yet but I assume you have some idea in

6
00:00:35,810 --> 00:00:39,840
your mind. We'll discuss how exactly that will work in a later lecture.

7
00:00:40,310 --> 00:00:42,740
But for now we'll just focus on the model itself.

8
00:00:47,850 --> 00:00:50,870
So let's start with the constructor. As you might expect,

9
00:00:50,880 --> 00:00:56,730
it's pretty similar to a regular RNN, but just with one additional embedding layer. Because we have

10
00:00:56,730 --> 00:01:02,700
this additional embedding layer, we'll also need additional hyperparameters, specifically the size of

11
00:01:02,700 --> 00:01:07,170
the embedding dimension, which we'll denote as D. D is a hyperparameter.

12
00:01:07,170 --> 00:01:12,390
Do we want the word vectors to be of length 20, or length 50, or maybe length 300?

13
00:01:12,390 --> 00:01:15,420
This is a hyperparameter choice to be made by you.

14
00:01:15,750 --> 00:01:19,700
We'll also need to know the size of the vocabulary, which we'll denote as V.

15
00:01:21,520 --> 00:01:27,340
So why do we need to know V and D? Well, this specifies the size of the embedding matrix, which will be

16
00:01:27,340 --> 00:01:29,080
of shape V by D.

17
00:01:29,170 --> 00:01:31,490
It has V rows and D columns.

18
00:01:31,510 --> 00:01:38,650
Why is that? It's because each row gives us back a D-sized vector corresponding to a word.

19
00:01:38,710 --> 00:01:41,630
This is what we discussed in the previous lecture.
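
As a minimal sketch of the lookup just described, the embedding matrix can be pictured as a plain table with V rows and D columns, where each word index selects one row. The sizes below and the helper name `embed` are assumptions made for illustration; they are not taken from the lecture, and plain Python lists stand in for a real embedding layer.

```python
import random

V = 10  # vocabulary size (assumed for illustration)
D = 4   # embedding dimension (assumed for illustration)

random.seed(0)

# Embedding matrix of shape V x D: one D-sized row per word in the vocabulary.
embedding_matrix = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(V)]


def embed(word_indexes):
    """Hypothetical helper: map a sequence of word indexes to their D-sized rows."""
    return [embedding_matrix[i] for i in word_indexes]


sentence = [3, 1, 7]       # a sentence already encoded as integers (length T = 3)
vectors = embed(sentence)  # T x D, here 3 x 4
print(len(vectors), len(vectors[0]))
```

Each index simply picks out a row, which is why the matrix needs exactly V rows (one per word) and D columns (one per embedding dimension).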

20
00:01:41,680 --> 00:01:47,620
So in total we have five different numbers that we pass into the constructor: V, which is the vocabulary

21
00:01:47,620 --> 00:01:53,920
size; D, which is the embedding dimension; M, which is the number of hidden units; L, which is the number

22
00:01:53,920 --> 00:01:57,800
of hidden layers; and K, which is the number of outputs.

23
00:01:57,940 --> 00:02:05,590
Then we instantiate three modules: the embedding module, the LSTM module, and the final linear module.

24
00:02:05,590 --> 00:02:08,560
We already know how the LSTM and linear layers work.

25
00:02:08,560 --> 00:02:16,220
So I assume this requires no further elaboration.

26
00:02:16,450 --> 00:02:21,610
Next we have the forward function. In the forward function, we take in some kind of X and then we

27
00:02:21,610 --> 00:02:28,750
pass it through each of the layers that we defined earlier. Since our model includes an LSTM,

28
00:02:28,830 --> 00:02:34,150
we know that we need to instantiate an initial hidden state h0 and an initial cell state c0,

29
00:02:34,200 --> 00:02:37,240
as usual.

30
00:02:37,260 --> 00:02:40,710
The important thing to pay attention to is shapes.

31
00:02:40,710 --> 00:02:48,240
First, what is the shape of X that we expect as input? Since each sentence is a sequence of word indexes,

32
00:02:48,300 --> 00:02:49,770
which are integers,

33
00:02:49,770 --> 00:02:54,870
we expect X to be an array of shape N by T containing integers.

34
00:02:54,930 --> 00:03:03,580
Each row is a sequence of length T, and each of the T columns contains a word index. When we pass this

35
00:03:03,580 --> 00:03:04,650
through an embedding layer,

36
00:03:04,960 --> 00:03:08,780
it's going to map each word index to a D-sized vector.

37
00:03:08,890 --> 00:03:12,320
Therefore, we get back an N by T by D tensor.

38
00:03:12,550 --> 00:03:19,750
We should recognize this as the usual shape that we pass through an RNN. Therefore, after this,

39
00:03:19,750 --> 00:03:25,810
nothing else really changes. When we pass an N by T by D tensor through an LSTM, we get back an array

40
00:03:25,810 --> 00:03:32,380
of size N by T by M. If we do a global max pool, or we simply take the final hidden state, we get back

41
00:03:32,380 --> 00:03:35,330
a 2D array of size N by M.

42
00:03:35,800 --> 00:03:40,800
Finally, after passing this through the final dense layer, we get back an array of size N by K.
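
The constructor and forward pass described in this lecture could look roughly like the following PyTorch sketch. The class name `TextRNN`, the argument names, the attribute names, and the example sizes are all assumptions for illustration, not the lecture's actual code. The sketch uses a global max pool over time; taking the final hidden state would be the other option mentioned.

```python
import torch
import torch.nn as nn


class TextRNN(nn.Module):
    # Hypothetical name; the five arguments correspond to V, D, M, L, and K.
    def __init__(self, n_vocab, embed_dim, n_hidden, n_rnnlayers, n_outputs):
        super().__init__()
        self.V = n_vocab      # vocabulary size
        self.D = embed_dim    # embedding dimension
        self.M = n_hidden     # number of hidden units
        self.L = n_rnnlayers  # number of hidden layers
        self.K = n_outputs    # number of outputs

        # Embedding matrix of shape V x D
        self.embed = nn.Embedding(self.V, self.D)
        self.rnn = nn.LSTM(
            input_size=self.D,
            hidden_size=self.M,
            num_layers=self.L,
            batch_first=True,  # inputs are N x T x D
        )
        self.fc = nn.Linear(self.M, self.K)

    def forward(self, X):
        # X: N x T tensor of integer word indexes
        N = X.size(0)
        h0 = torch.zeros(self.L, N, self.M, device=X.device)  # initial hidden state
        c0 = torch.zeros(self.L, N, self.M, device=X.device)  # initial cell state

        out = self.embed(X)               # N x T x D
        out, _ = self.rnn(out, (h0, c0))  # N x T x M
        out, _ = torch.max(out, dim=1)    # global max pool over time: N x M
        return self.fc(out)               # N x K


# Usage with assumed sizes: N=8 sentences of length T=12, vocabulary of 100 words
model = TextRNN(n_vocab=100, embed_dim=20, n_hidden=15, n_rnnlayers=1, n_outputs=3)
X = torch.randint(0, 100, (8, 12))  # integer word indexes in [0, V)
y = model(X)
print(y.shape)
```

The comments trace the shapes in the same order as above: N by T integers in, N by T by D after the embedding, N by T by M after the LSTM, N by M after pooling, and N by K out.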
