1
00:00:11,090 --> 00:00:17,270
So in this lecture, you're going to be assigned another exercise, which involves returning to CNNs.

2
00:00:17,810 --> 00:00:21,380
You've just seen that we can use RNNs for many-to-many tasks.

3
00:00:21,920 --> 00:00:28,190
We already know that RNNs are pretty well suited for NLP and dealing with sequences, but we've also

4
00:00:28,190 --> 00:00:30,860
seen that CNNs can be used for text as well.

5
00:00:31,580 --> 00:00:33,170
So here's a question to consider.

6
00:00:33,740 --> 00:00:40,070
Can we also use CNNs to do many-to-many tasks, such as part-of-speech tagging or recognizing named

7
00:00:40,070 --> 00:00:40,700
entities?

8
00:00:41,420 --> 00:00:47,330
The short answer is yes, but your exercise will be to demonstrate that this is true by building such

9
00:00:47,330 --> 00:00:48,050
a CNN.

10
00:00:48,980 --> 00:00:54,890
Now you might think that this will be pretty easy since, as you've seen, our CNN and RNN for classifying

11
00:00:54,920 --> 00:00:56,810
text are nearly identical.

12
00:00:57,110 --> 00:00:59,910
We simply switch the Conv1D for an LSTM.

13
00:01:04,640 --> 00:01:07,850
However, this is, in fact, quite a challenging exercise.

14
00:01:08,330 --> 00:01:12,050
I won't give away all the details, since that would make the exercise too easy.

15
00:01:12,680 --> 00:01:18,830
The goal is for you to encounter and recognize problems on your own and, when you do, figure out a way

16
00:01:18,830 --> 00:01:19,490
to solve them.

17
00:01:20,390 --> 00:01:21,650
But here are a few hints.

18
00:01:22,580 --> 00:01:27,650
Firstly, we know that the size of the data as it goes through the layers of a CNN can change.

19
00:01:28,220 --> 00:01:33,230
But we don't want this since in our case, the output must have the same length as the input.

20
00:01:33,890 --> 00:01:40,250
So consider what the CNN architecture has to be, such that the sequence length does not change as it

21
00:01:40,250 --> 00:01:41,420
goes through the network.

22
00:01:43,060 --> 00:01:47,050
Secondly, we know that we need to have some way of ignoring the padded tokens.

23
00:01:47,620 --> 00:01:53,680
Otherwise these will be included in the loss, which is not correct and will lead to suboptimal performance.

24
00:01:54,520 --> 00:01:59,080
We saw that setting an argument in the embedding layer called mask_zero seemed to work.

25
00:01:59,650 --> 00:02:01,870
The other option was to create a custom loss.

26
00:02:02,590 --> 00:02:06,520
You'll need to figure out which of these options works for the convolutional case.

27
00:02:07,150 --> 00:02:10,419
As a small hint, this might be more difficult than you assume.

28
00:02:11,320 --> 00:02:15,970
Understanding how things work will be important, and using arguments just because I showed them to

29
00:02:15,970 --> 00:02:18,940
you previously is not necessarily a good idea.

30
00:02:19,720 --> 00:02:24,880
In fact, we already learned how this can lead to misleading results in the previous example.

31
00:02:25,810 --> 00:02:30,670
Remember that just because your notebook doesn't throw any errors does not imply it's correct.

32
00:02:31,360 --> 00:02:36,190
As you saw, you can get 99 percent accuracy even when it's not correct.

33
00:02:36,760 --> 00:02:42,520
So you should devise methods to check whether or not your results truly reflect your model's performance.

34
00:02:47,220 --> 00:02:50,700
So there are some final questions you want to consider once you get this working.

35
00:02:51,570 --> 00:02:55,170
Firstly, how does the CNN perform relative to the RNN?

36
00:02:55,800 --> 00:02:59,610
Is it better or worse in terms of the various metrics we care about?

37
00:03:00,540 --> 00:03:05,640
Secondly, you should compare the training and inference times of both the CNN and RNN.

38
00:03:06,810 --> 00:03:12,330
Consider, if you had to do this task for a real-world project, which model you would choose,

39
00:03:12,360 --> 00:03:13,980
given the above two results.

