1
00:00:11,100 --> 00:00:16,200
So in this lecture, we'll be looking at the notebook for text classification using a CNN.

2
00:00:17,010 --> 00:00:21,630
We'll begin by downloading our dataset, which will be the BBC News data once again.

3
00:00:22,260 --> 00:00:27,870
But note that at this point in the course, you've learned the skills to plug in any dataset you like, so

4
00:00:27,900 --> 00:00:30,600
please feel free to do that as an exercise.

5
00:00:37,480 --> 00:00:40,600
The next step is to import everything we need for this notebook.

6
00:00:47,630 --> 00:00:51,800
The next step is to load in our CSV using pd.read_csv.

7
00:00:56,900 --> 00:01:01,550
The next step is to call df.head() to remind ourselves what our DataFrame looks like.

8
00:01:05,620 --> 00:01:09,010
As you can see, we have two columns, text and labels.

9
00:01:12,840 --> 00:01:16,590
The next step is to assign numerical targets to each of our labels.

10
00:01:20,990 --> 00:01:24,860
The next step is to determine the number of classes which we'll call K.
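A plain-Python sketch of these two steps (label to integer mapping, then K). The toy labels are made up, and the notebook itself works on a pandas DataFrame, so its exact code may differ; sorting the unique labels is just one way to make the mapping deterministic.

```python
# Hypothetical toy labels standing in for the "labels" column.
labels = ["sport", "tech", "business", "sport", "politics", "tech"]

# Map each unique label to an integer id, in sorted order so the mapping is repeatable.
label_to_idx = {label: i for i, label in enumerate(sorted(set(labels)))}
targets = [label_to_idx[label] for label in labels]

# K = number of distinct classes; the final Dense layer needs K outputs.
K = len(label_to_idx)

print(label_to_idx)  # {'business': 0, 'politics': 1, 'sport': 2, 'tech': 3}
print(targets)       # [2, 3, 0, 2, 1, 3]
print(K)             # 4
```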

11
00:01:29,620 --> 00:01:32,650
The next step is to split our data into train and test sets.
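The lecture doesn't show which function does the split (scikit-learn's `train_test_split` is a common choice). As a sketch of what such a split does, here is a hypothetical stdlib helper that shuffles indices with a fixed seed and holds out a fraction as the test set, keeping each text paired with its target.

```python
import random

def split_train_test(texts, targets, test_size=0.33, seed=0):
    """Hypothetical helper: shuffle reproducibly, then hold out a test fraction."""
    if len(texts) != len(targets):
        raise ValueError("texts and targets must have the same length")
    indices = list(range(len(texts)))
    random.Random(seed).shuffle(indices)  # local RNG, so the split is repeatable
    n_test = int(round(len(texts) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [texts[i] for i in train_idx]
    y_train = [targets[i] for i in train_idx]
    x_test = [texts[i] for i in test_idx]
    y_test = [targets[i] for i in test_idx]
    return x_train, x_test, y_train, y_test

docs = [f"doc {i}" for i in range(10)]
ys = [i % 3 for i in range(10)]
x_train, x_test, y_train, y_test = split_train_test(docs, ys, test_size=0.3)
print(len(x_train), len(x_test))  # 7 3
```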

12
00:01:37,790 --> 00:01:41,660
The next step is to convert our text sentences into sequences.

13
00:01:42,440 --> 00:01:48,470
Note that we are no longer using TF-IDF, but instead will be representing each document as a list of

14
00:01:48,470 --> 00:01:49,130
integers.

15
00:01:49,820 --> 00:01:52,700
Note that we've set the max vocab size to two thousand.

16
00:01:59,430 --> 00:02:04,890
The next step is to assign our word-to-index mapping to a variable and also to check the true vocab

17
00:02:04,890 --> 00:02:05,520
size.

18
00:02:10,000 --> 00:02:13,390
As you can see, there are over 27000 tokens.
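The notebook presumably uses Keras' `Tokenizer` for this. The pure-Python sketch below is a hypothetical stand-in (its lowercasing and regex split are my simplifications) that shows the two facts the lecture relies on: the word-to-index mapping ranks every word seen by frequency, starting at 1, which is why the true vocab size V can exceed the cap; the sequences, however, keep only ids below the max vocab size.

```python
import re
from collections import Counter

def fit_word_index(texts):
    """Rank every word by frequency (most frequent gets id 1); id 0 is left for padding."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z0-9']+", text.lower()))
    # most_common() sorts by count; equal counts keep first-seen order.
    return {word: i + 1 for i, (word, _) in enumerate(counts.most_common())}

def texts_to_sequences(texts, word_index, max_vocab_size):
    """Replace words with integer ids, dropping any id >= max_vocab_size."""
    sequences = []
    for text in texts:
        seq = []
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            idx = word_index.get(word)
            if idx is not None and idx < max_vocab_size:
                seq.append(idx)
        sequences.append(seq)
    return sequences

train_texts = ["The market rose", "The team won the match", "The market fell"]
word2idx = fit_word_index(train_texts)
V = len(word2idx)  # true vocab size: every distinct word, ignoring the cap
seqs = texts_to_sequences(train_texts, word2idx, max_vocab_size=4)
print(V, seqs)  # 7 [[1, 2, 3], [1, 1], [1, 2]]
```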

19
00:02:18,080 --> 00:02:20,540
The next step is to pad our training sequences.

20
00:02:24,880 --> 00:02:30,490
As you can see, our documents are quite long, with a maximum length of a few thousand tokens.

21
00:02:31,300 --> 00:02:33,520
This makes sense since they are news articles.

22
00:02:37,200 --> 00:02:39,870
The next step is to pad the test sequences as well.

23
00:02:40,230 --> 00:02:46,800
But this time, we set the max length to T. This will emulate how we would use this model in the real world,

24
00:02:47,220 --> 00:02:51,570
since we wouldn't know the length of any future data that we want to use this model on.
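Both padding steps can be sketched in plain Python. The hypothetical `pad` helper below pads and truncates at the front, which to my knowledge matches the defaults of Keras' `pad_sequences` (`padding='pre'`, `truncating='pre'`). T comes from the training data and is then reused for the test data, so a longer test document is cut down to its last T tokens.

```python
def pad(sequences, maxlen=None, value=0):
    """Left-pad (and left-truncate) every sequence to the same length."""
    if maxlen is None:
        maxlen = max(len(s) for s in sequences)  # longest training document
    padded = []
    for s in sequences:
        s = list(s)[-maxlen:] if maxlen > 0 else []  # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(s)) + s)
    return padded

train_seqs = [[5, 2, 9], [7, 1], [3, 3, 3, 8]]
data_train = pad(train_seqs)
T = len(data_train[0])  # 4, taken from the training set only

test_seqs = [[4], [6, 2, 9, 1, 5, 7]]
data_test = pad(test_seqs, maxlen=T)  # the long test document is truncated
print(data_train)  # [[0, 5, 2, 9], [0, 0, 7, 1], [3, 3, 3, 8]]
print(data_test)   # [[0, 0, 0, 4], [9, 1, 5, 7]]
```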

25
00:02:58,200 --> 00:02:59,880
The next step is to create a model.

26
00:03:01,140 --> 00:03:03,630
You can see that I've chosen an embedding size of 50.

27
00:03:03,960 --> 00:03:06,810
But you should feel free to change this as an exercise.

28
00:03:11,950 --> 00:03:18,640
The next step is to create our CNN layers, so we have the input followed by embedding followed by Conv1D,

29
00:03:18,640 --> 00:03:22,120
followed by global max pooling, followed by dense.

30
00:03:22,840 --> 00:03:26,260
Note that I've commented out the extra convolution and pooling layers.

31
00:03:26,590 --> 00:03:31,540
But as always, you should feel free to test different combinations as an exercise.

32
00:03:32,320 --> 00:03:34,930
Remember that there is no formula for hyperparameters.

33
00:03:35,200 --> 00:03:37,840
You simply choose them based on experiments.
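To see what each layer in the stack computes, here is a toy pure-Python forward pass through Embedding, Conv1D, global max pooling and Dense. It is only an illustration: the ReLU activation, stride 1 with no edge padding, the filter count F, the kernel size k, all sizes, and the random weights are assumptions; in the notebook these layers come from Keras and the weights are learned during fit.

```python
import random

def embed(seq, table):
    """Embedding lookup: one D-dimensional row per token id (id 0 is padding)."""
    return [table[i] for i in seq]

def conv1d_relu(x, kernels, biases):
    """No-padding 1-D convolution + ReLU. x: T x D, kernels: F x k x D -> (T-k+1) x F."""
    k = len(kernels[0])
    out = []
    for t in range(len(x) - k + 1):
        row = []
        for kernel, b in zip(kernels, biases):
            s = b
            for j in range(k):
                s += sum(a * w for a, w in zip(x[t + j], kernel[j]))
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out

def global_max_pool(h):
    """Max over the time axis: one number per feature map, whatever the length T."""
    return [max(column) for column in zip(*h)]

def dense(v, weights, biases):
    """Fully connected layer producing K logits (softmax left to the loss)."""
    return [sum(a * w for a, w in zip(v, row)) + b for row, b in zip(weights, biases)]

rng = random.Random(0)
V, D, F, k, K = 10, 4, 3, 3, 5  # toy sizes; the lecture uses D = 50
table = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(V)]
kernels = [[[rng.uniform(-1, 1) for _ in range(D)] for _ in range(k)] for _ in range(F)]
conv_b = [0.0] * F
dense_w = [[rng.uniform(-1, 1) for _ in range(F)] for _ in range(K)]
dense_b = [0.0] * K

seq = [0, 0, 3, 7, 1, 2]  # one padded document, T = 6
h = conv1d_relu(embed(seq, table), kernels, conv_b)  # (T - k + 1) x F = 4 x 3
pooled = global_max_pool(h)                          # length F = 3
logits = dense(pooled, dense_w, dense_b)             # length K = 5
```

Global max pooling is what lets the same Dense layer work for any padded length T, since it always outputs F numbers.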

34
00:03:44,760 --> 00:03:48,630
The next step is to call compile and fit, all of which you've seen before.
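The lecture doesn't say which loss is passed to compile. Since the targets are integers 0 to K-1, a common pairing is sparse categorical cross-entropy with an accuracy metric (an assumption here). This sketch computes both by hand for one batch of logits, which is roughly what the per-epoch loss and accuracy curves summarize.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_categorical_crossentropy(targets, batch_logits):
    """Mean of -log p(true class) over the batch; targets are integer class ids."""
    losses = [-math.log(softmax(z)[y]) for y, z in zip(targets, batch_logits)]
    return sum(losses) / len(losses)

def accuracy(targets, batch_logits):
    """Fraction of rows whose highest logit is at the true class id."""
    preds = [max(range(len(z)), key=z.__getitem__) for z in batch_logits]
    return sum(p == y for p, y in zip(preds, targets)) / len(targets)

batch_logits = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
y = [0, 2]
print(sparse_categorical_crossentropy(y, batch_logits), accuracy(y, batch_logits))
```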

35
00:04:02,900 --> 00:04:05,090
The next step is to plot the loss per epoch.

36
00:04:11,050 --> 00:04:12,730
So the loss per epoch looks good.

37
00:04:15,990 --> 00:04:18,600
The next step is to plot the accuracy per epoch.

38
00:04:24,920 --> 00:04:30,680
So the accuracy per epoch looks good. As expected, performance on the train set is better.

39
00:04:33,670 --> 00:04:39,290
So as a final exercise for this lecture, please try computing other metrics, like the F1 score and the

40
00:04:39,290 --> 00:04:40,000
AUC.
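For this exercise, scikit-learn's `f1_score(..., average="macro")` and `roc_auc_score(..., multi_class="ovr")` are the usual tools. To show what those numbers mean, here are pure-Python versions of macro F1 and one-vs-rest macro AUC; `y_pred` would be the argmax of the model's test-set outputs and `probs` the softmax outputs themselves (the toy data below is made up).

```python
def macro_f1(y_true, y_pred, num_classes):
    """Unweighted mean of per-class F1 scores."""
    f1s = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
    return sum(f1s) / num_classes

def binary_auc(labels, scores):
    """P(random positive outscores random negative); ties count as 1/2."""
    pos = [s for l, s in zip(labels, scores) if l]
    neg = [s for l, s in zip(labels, scores) if not l]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(y_true, probs, num_classes):
    """Average of one-vs-rest AUCs, one per class."""
    return sum(
        binary_auc([t == c for t in y_true], [p[c] for p in probs])
        for c in range(num_classes)
    ) / num_classes

y_true = [0, 1, 2, 0]
y_pred = [0, 1, 1, 0]
probs = [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.3, 0.6], [0.6, 0.3, 0.1]]
print(macro_f1(y_true, y_pred, 3), macro_ovr_auc(y_true, probs, 3))
```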

41
00:04:41,770 --> 00:04:47,260
In addition, compare the performance of this model to previous models we used on this dataset.

42
00:04:47,860 --> 00:04:49,750
You may be surprised at the result.

43
00:04:50,440 --> 00:04:56,230
Consider why this is the case and think about how this might help guide your decisions about which models

44
00:04:56,230 --> 00:04:58,240
to use in the real world.

45
00:05:00,430 --> 00:05:06,160
Finally, you'll notice that something I mentioned about hyperparameters earlier has now come to fruition,

46
00:05:06,790 --> 00:05:09,790
in particular, as our models become more complex,

47
00:05:10,120 --> 00:05:12,400
we have many more hyperparameters to choose.

48
00:05:12,910 --> 00:05:15,580
You can choose the filter size and number of feature maps.

49
00:05:15,850 --> 00:05:20,770
The number of convolution layers, whether to use Flatten or global max pooling, and so forth.

50
00:05:21,280 --> 00:05:26,500
This is in addition to the ANN part and simpler things we saw before, like learning rates.
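To see how these choices interact, a small parameter-count calculation. The formulas are the standard ones for a 1-D convolution with bias and no edge padding and for a dense layer with bias; every number except the embedding size of 50 is hypothetical (T = 1000, 32 feature maps, filter size 3, K = 5).

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Each filter has kernel_size * in_channels weights plus one bias."""
    return kernel_size * in_channels * filters + filters

def dense_params(in_features, units):
    return in_features * units + units

def conv1d_output_length(T, kernel_size, stride=1):
    """Output length with no padding at the edges."""
    return (T - kernel_size) // stride + 1

T, D, F, k, K = 1000, 50, 32, 3, 5  # hypothetical, except D = 50 from the lecture
L = conv1d_output_length(T, k)       # 998 time steps after the convolution
conv = conv1d_params(k, D, F)        # 3 * 50 * 32 + 32 = 4832

# Flatten keeps every time step; global max pooling keeps one value per feature map.
dense_after_flatten = dense_params(L * F, K)  # 998 * 32 * 5 + 5 = 159685
dense_after_pool = dense_params(F, K)         # 32 * 5 + 5 = 165
print(L, conv, dense_after_flatten, dense_after_pool)
```

With Flatten, the dense head's size grows with T; with global max pooling it depends only on the number of feature maps.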