1
00:00:11,070 --> 00:00:16,200
In this lecture, we are going to look at a notebook that implements convolutional neural networks for

2
00:00:16,200 --> 00:00:17,490
text classification.

3
00:00:18,210 --> 00:00:23,250
This lecture is going to walk you through a prepared Colab notebook, although a very good exercise,

4
00:00:23,250 --> 00:00:28,740
which I always recommend, is once you know how this is done, to try and recreate it yourself with as

5
00:00:28,740 --> 00:00:30,240
few references as possible.

6
00:00:30,960 --> 00:00:35,850
As usual, you can look at the title of the notebook to determine what notebook we are currently looking

7
00:00:35,850 --> 00:00:36,120
at.

8
00:00:38,130 --> 00:00:41,090
So at a high level, really, this script contains nothing new.

9
00:00:41,670 --> 00:00:44,600
You know how to do text preprocessing because we just did that.

10
00:00:45,090 --> 00:00:47,460
You know how to use embeddings because we just did that.

11
00:00:47,940 --> 00:00:51,160
And you know how to build and train a CNN because we did that earlier.

12
00:00:51,960 --> 00:00:57,600
The only difference in this script is we're replacing all the 2D operations with 1D operations,

13
00:00:57,630 --> 00:00:58,700
not a huge change.

14
00:00:59,250 --> 00:01:03,780
So I hope you had the opportunity to try this yourself first before watching this lecture.

15
00:01:05,670 --> 00:01:10,650
OK, so since the script is very similar to the previous one, we're going to go through it very fast

16
00:01:10,650 --> 00:01:15,810
and only talk about the new thing, which is the CNN with one-dimensional convolutions.

17
00:01:20,610 --> 00:01:23,100
So first we download spam.csv.

18
00:01:28,520 --> 00:01:31,370
The next step is to read in the CSV using pandas.

19
00:01:35,210 --> 00:01:38,570
The next step is to look at the CSV using head.

20
00:01:43,030 --> 00:01:45,460
The next step is to drop the junk columns.

21
00:01:48,460 --> 00:01:52,270
The next step is to call the head again to make sure they were removed.

22
00:01:56,450 --> 00:01:59,420
The next step is to rename the columns to labels and data.

23
00:02:02,440 --> 00:02:05,950
The next step is to call head again to make sure that worked.

24
00:02:10,510 --> 00:02:12,910
The next step is to create the zero-one labels.

25
00:02:17,020 --> 00:02:20,050
The next step is to split the data into train and test.

26
00:02:23,650 --> 00:02:27,790
The next step is to check the shape of our train and test set just to see what they were.

27
00:02:31,530 --> 00:02:34,630
The next step is to perform all the text preprocessing steps.

28
00:02:35,250 --> 00:02:38,640
We'll start by initializing idx and word2idx.

29
00:02:42,690 --> 00:02:46,710
The next step is to populate word2idx using the training corpus.

30
00:02:50,670 --> 00:02:54,300
The next step is to print out word2idx as a sanity check.

31
00:03:00,510 --> 00:03:04,080
The next step is to check how many values are in word2idx.

32
00:03:08,250 --> 00:03:12,270
The next step is to convert our training corpus into lists of integers.

33
00:03:16,970 --> 00:03:19,730
The next step is to do the same thing for the test corpus.
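
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the toy sentences, the `<PAD>` entry at index 0, the `to_ints` helper, and the choice to skip unknown test words are all assumptions made for the example.

```python
# Hypothetical toy corpora standing in for the notebook's train/test text.
train_text = ["free prize call now", "see you at lunch"]
test_text = ["call me at lunch", "win a free prize"]

# Assumption: index 0 is reserved for padding, so real tokens start at 1.
idx = 1
word2idx = {"<PAD>": 0}

# Populate word2idx using the training corpus only.
for line in train_text:
    for token in line.lower().split():
        if token not in word2idx:
            word2idx[token] = idx
            idx += 1

def to_ints(lines, mapping):
    """Convert each line to a list of integer ids, skipping unknown words."""
    return [[mapping[t] for t in line.lower().split() if t in mapping]
            for line in lines]

# Convert both corpora into lists of integers.
train_ints = to_ints(train_text, word2idx)
test_ints = to_ints(test_text, word2idx)

print(len(word2idx))
print(train_ints)
print(test_ints)
```

Words that appear only in the test set ("me", "win", "a" here) have no index, so this sketch simply drops them.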

34
00:03:23,610 --> 00:03:26,970
The next step is to check the length of our new model inputs.

35
00:03:31,190 --> 00:03:34,550
The next step is to create our data generator, which you've seen before.

36
00:03:39,680 --> 00:03:41,870
The next step is to grab our device variable.

37
00:03:46,490 --> 00:03:50,600
Now, I've told you why the output of the embedding layer will be of size N by T by D.

38
00:03:51,020 --> 00:03:53,880
But this kind of thing is no good if you've just memorized it.

39
00:03:54,230 --> 00:03:55,820
You have to see it for yourself.

40
00:03:56,360 --> 00:03:59,380
So here we create an embedding with dimension 20.

41
00:04:00,200 --> 00:04:05,840
Next, we loop through our data generator and print out the shape of the data before and after the embedding.

42
00:04:10,090 --> 00:04:16,330
As you can see, the shape of the data before the embedding is N by T, and the shape of the data after the

43
00:04:16,330 --> 00:04:20,080
embedding is N by T by 20, which is what we would expect.
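
A quick way to see this shape change for yourself is sketched below. The vocabulary size, batch size, and sequence length are hypothetical values chosen for the example, and a random integer tensor stands in for a batch from the data generator; only the embedding dimension of 20 comes from the lecture.

```python
import torch
import torch.nn as nn

V = 100      # vocabulary size (hypothetical)
D = 20       # embedding dimension, as in the lecture
N, T = 4, 7  # batch size and sequence length (hypothetical)

embed = nn.Embedding(V, D)

# Stand-in for one batch of integer-encoded, padded sequences.
x = torch.randint(0, V, (N, T))
out = embed(x)

print(x.shape)    # shape before the embedding: N x T
print(out.shape)  # shape after the embedding: N x T x D
```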

44
00:04:23,340 --> 00:04:29,940
This brings us to our custom CNN model. In the constructor, we create our embedding layer, a series

45
00:04:29,940 --> 00:04:33,340
of convolutions and max pools, and then a final dense layer.

46
00:04:34,140 --> 00:04:40,350
Notice that we now use Conv1d instead of Conv2d and MaxPool1d instead of MaxPool2d.

47
00:04:44,110 --> 00:04:46,670
But the forward function is where things get strange.

48
00:04:47,410 --> 00:04:54,640
First, we pass the data through the embedding, which gives us an N by T by D, as you know. But convolutions

49
00:04:54,640 --> 00:05:01,720
in PyTorch use features first, meaning that it expects an N by D by T, not an N by T by D.

50
00:05:02,380 --> 00:05:05,250
So we have to permute the data to make it that way.

51
00:05:08,290 --> 00:05:11,750
Note that in other libraries, this might be called transpose.

52
00:05:12,130 --> 00:05:16,750
It's just a generalisation of the usual two dimensional matrix transpose.
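
As a small illustration of that point, the sketch below swaps the last two axes of a 3D tensor with `permute` and checks that `transpose` gives the same result. The tensor sizes are arbitrary values picked for the example.

```python
import torch

# Arbitrary N x T x D tensor for illustration.
a = torch.randn(4, 7, 20)

# permute reorders all axes at once: (N, T, D) -> (N, D, T).
b = a.permute(0, 2, 1)

# transpose swaps exactly two axes, which is the same operation here.
c = a.transpose(1, 2)

print(b.shape)
print(torch.equal(b, c))
```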

53
00:05:18,010 --> 00:05:21,970
Next, we pass our data through the convolutional and pooling layers.

54
00:05:24,230 --> 00:05:30,410
The next step is to permute the data back, to put the features last. Note that this isn't absolutely

55
00:05:30,410 --> 00:05:35,700
necessary since we get to choose which dimension to take the max over in the next step.

56
00:05:36,230 --> 00:05:40,070
So maybe as an exercise, you can try to remove that second permute.

57
00:05:40,910 --> 00:05:45,290
Lastly, we pass our data through the final dense layer and then return our output.
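
The constructor and forward pass described above might look something like the following. Treat it as a sketch under assumptions rather than the notebook's exact model: the number of convolution layers, the channel counts (32 and 64), the kernel size, the padding, and the ReLU activations are all choices made for this example.

```python
import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self, n_vocab, embed_dim, n_outputs):
        super().__init__()
        self.embed = nn.Embedding(n_vocab, embed_dim)
        # 1D convolutions slide over the time axis only.
        self.conv1 = nn.Conv1d(embed_dim, 32, kernel_size=3, padding=1)
        self.pool1 = nn.MaxPool1d(2)
        self.conv2 = nn.Conv1d(32, 64, kernel_size=3, padding=1)
        self.fc = nn.Linear(64, n_outputs)

    def forward(self, x):
        out = self.embed(x)             # N x T x D
        out = out.permute(0, 2, 1)      # N x D x T (features first)
        out = torch.relu(self.conv1(out))
        out = self.pool1(out)
        out = torch.relu(self.conv2(out))
        out = out.permute(0, 2, 1)      # N x T' x M (features last)
        out, _ = torch.max(out, dim=1)  # global max over time: N x M
        return self.fc(out)             # N x n_outputs

model = CNN(n_vocab=100, embed_dim=20, n_outputs=1)
x = torch.randint(0, 100, (4, 16))      # hypothetical batch: N=4, T=16
logits = model(x)
print(logits.shape)
```

To try the exercise of removing the second permute, drop that line and take the max over `dim=2` instead, since time is then the last axis.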

58
00:05:51,520 --> 00:05:58,120
The next step is to instantiate our CNN and move it to the GPU. And after this, it's just stuff you've already

59
00:05:58,120 --> 00:05:58,600
seen.

60
00:06:05,290 --> 00:06:07,630
The next step is to create our loss and optimizer.

61
00:06:11,000 --> 00:06:13,670
The next step is to create train_gen and test_gen.

62
00:06:17,380 --> 00:06:19,600
The next step is to create the training function.

63
00:06:25,560 --> 00:06:27,870
The next step is to call the training function.

64
00:06:34,070 --> 00:06:36,590
The next step is to plot the loss per iteration.

65
00:06:40,990 --> 00:06:43,300
OK, so the loss per iteration looks good.

66
00:06:46,120 --> 00:06:48,370
The next step is to calculate the accuracy.

67
00:06:53,720 --> 00:06:57,230
OK, so the accuracy looks good and we're done.

68
00:06:58,220 --> 00:07:04,820
OK, so we get a very high accuracy with a CNN, in the high 90s. You might have thought that because RNNs

69
00:07:04,820 --> 00:07:08,020
are for sequences, they might do a little better than CNNs.

70
00:07:08,360 --> 00:07:10,690
But in fact, CNNs are pretty great as well.
