1
00:00:11,010 --> 00:00:17,160
So in this lecture, we will be looking at the notebook to implement an ANN for multiclass classification

2
00:00:17,160 --> 00:00:23,430
in code. We'll begin by downloading our dataset, which will be the BBC News data once again.

3
00:00:31,810 --> 00:00:34,780
The next step is to import everything we need for this script.

4
00:00:35,320 --> 00:00:36,760
Notice that there's nothing new.

5
00:00:43,150 --> 00:00:47,380
The next step is to read in our CSV using pd.read_csv.

6
00:00:52,800 --> 00:00:57,120
The next step is to call df.head() to remind ourselves what our data looks like.

7
00:01:01,280 --> 00:01:05,480
As you recall, there are two columns containing the text and the labels.

8
00:01:10,250 --> 00:01:16,160
Now, as you recall, for TensorFlow, we need to specify our targets as integers from zero up to K

9
00:01:16,220 --> 00:01:16,970
minus one.

10
00:01:17,810 --> 00:01:18,950
To do this easily,

11
00:01:18,980 --> 00:01:24,860
note that we can simply take the labels in our DataFrame, convert them to the category type, and then access

12
00:01:24,860 --> 00:01:26,120
the .cat.codes attribute.

13
00:01:26,660 --> 00:01:28,350
This basically does what we need,

14
00:01:28,400 --> 00:01:32,090
converting every label to a unique integer starting from zero.

15
00:01:36,280 --> 00:01:40,720
So as you can see, this returns a series of integers from zero up to four.

16
00:01:45,190 --> 00:01:50,070
The next step is to take what we got above and to assign it to a new column called Targets.
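
The label-encoding steps above can be sketched in pandas as follows. This assumes a tiny hypothetical CSV with `text` and `labels` columns standing in for the BBC News file; the real file's name and contents are not reproduced here.

```python
import io

import pandas as pd

# Hypothetical stand-in for the BBC News CSV: one text column, one label column.
csv_data = io.StringIO(
    "text,labels\n"
    "shares rose sharply,business\n"
    "the striker scored twice,sport\n"
    "new phone released,tech\n"
    "profits fell,business\n"
)
df = pd.read_csv(csv_data)

# Converting to the category dtype assigns each distinct label an integer code,
# following the sorted order of the category names and starting from 0.
df["targets"] = df["labels"].astype("category").cat.codes

print(df[["labels", "targets"]])
```

Because the codes follow the sorted order of the category names, `business`, `sport`, and `tech` map to 0, 1, and 2 in this toy example.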

17
00:01:55,240 --> 00:01:58,090
The next step is to split our data into train and test.

18
00:02:03,290 --> 00:02:07,850
The next step is to compute the TF-IDF representation of our training and test sets.

19
00:02:14,000 --> 00:02:17,390
The next step is to assign the targets to y_train and y_test.

20
00:02:23,460 --> 00:02:27,780
The next step is to get the number of classes, which is the maximum target plus one.

21
00:02:34,670 --> 00:02:39,860
The next step is to get the number of dimensions in our data, which is the number of columns in X.
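
The split, TF-IDF, and shape steps can be sketched with scikit-learn. The use of `train_test_split` and `TfidfVectorizer`, the toy corpus, and the 33% test fraction are assumptions for illustration; the notebook's actual split settings aren't stated in the lecture.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus and integer targets (0..K-1), standing in for df["text"] and df["targets"].
texts = [
    "shares rose sharply", "the striker scored twice", "new phone released",
    "profits fell", "the keeper saved a penalty", "chip sales grew",
]
targets = np.array([0, 1, 2, 0, 1, 2])

# Hold out part of the data as a test set (assumed fraction); fixed seed for reproducibility.
texts_train, texts_test, y_train, y_test = train_test_split(
    texts, targets, test_size=0.33, random_state=0
)

# Fit the vocabulary and IDF weights on the training texts only, then reuse them on the test texts.
vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(texts_train)
X_test = vectorizer.transform(texts_test)

# Number of classes K: targets run from 0 to K-1, so K = max target + 1.
# Computed over all targets so a class missing from a small split can't shrink K.
K = int(targets.max()) + 1
# Number of input dimensions D: one column per vocabulary term.
D = X_train.shape[1]
print(K, D)
```

Fitting the vectorizer on the training texts only keeps test-set vocabulary out of the features; words that appear only in the test set are simply ignored by `transform`.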

22
00:02:45,560 --> 00:02:47,450
The next step is to build our model.

23
00:02:48,020 --> 00:02:52,880
Note that this is exactly like our logistic regression model, but with one more Dense layer in the middle.

24
00:02:53,900 --> 00:02:56,630
You'll see that I've chosen 300 hidden units.

25
00:02:57,110 --> 00:02:58,070
This is arbitrary.

26
00:02:58,730 --> 00:03:02,180
So please play around with this value on your own as an exercise.

27
00:03:02,780 --> 00:03:05,930
You'll also notice that I've specified the activation function.

28
00:03:05,930 --> 00:03:09,110
Again, please try others on your own.

29
00:03:09,920 --> 00:03:13,790
See if you can improve the result and post your solution on the Q&A.
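
To make "logistic regression with one more Dense layer in the middle" concrete, here is a NumPy sketch of the forward pass rather than the Keras code itself: D inputs feed M = 300 hidden units, which feed K output logits. ReLU as the hidden activation, the small random initialization, and D = 1000 are all assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

D, M, K = 1000, 300, 5  # input dims (hypothetical), hidden units, classes

# Hidden Dense layer: D -> M, followed by a ReLU nonlinearity (assumed activation).
W1 = rng.normal(scale=0.01, size=(D, M))
b1 = np.zeros(M)
# Output Dense layer: M -> K with no activation, so it produces raw logits.
W2 = rng.normal(scale=0.01, size=(M, K))
b2 = np.zeros(K)


def forward(X):
    """Return an (N, K) array of logits for an (N, D) input batch."""
    Z = np.maximum(0.0, X @ W1 + b1)  # hidden layer with ReLU
    return Z @ W2 + b2  # linear output layer (logits)


X = rng.random((4, D))
logits = forward(X)
print(logits.shape)  # (4, 5)
```

Deleting the hidden layer, i.e. computing `X @ W + b` with a single (D, K) weight matrix, gives back the multiclass logistic regression model.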

30
00:03:19,760 --> 00:03:23,690
The next step is to call model.summary() to look at the structure of our model.

31
00:03:28,920 --> 00:03:33,300
So notice that the first dense layer has many more parameters than what we've seen before.

32
00:03:33,840 --> 00:03:36,180
This is about seven to eight million weights.
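
The parameter count follows from simple arithmetic: a Dense layer with D inputs and M units has D·M weights plus M biases. The vocabulary size D = 25,000 below is hypothetical; with M = 300, any D between about 23,500 and 26,500 lands in the seven to eight million range mentioned above.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Trainable parameters in a fully connected layer: one weight per input-unit pair plus one bias per unit."""
    return n_inputs * n_units + n_units


# Hypothetical vocabulary size; the real D is whatever the TF-IDF vectorizer produced.
D, M, K = 25_000, 300, 5
print(dense_params(D, M))  # 7500300
print(dense_params(M, K))  # 1505
```

The output layer contributes only 1,505 parameters here, so nearly all of the model's size comes from the input-to-hidden weights.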

33
00:03:40,930 --> 00:03:46,840
The next step is to call model.compile(). Notice our use of the sparse categorical cross-entropy

34
00:03:47,260 --> 00:03:48,940
loss, where we set from_logits to True.
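
Keras's `SparseCategoricalCrossentropy` expects integer class indices rather than one-hot vectors, and `from_logits=True` tells it the final layer outputs raw scores, so softmax is applied inside the loss. A NumPy sketch of that computation, written from the standard definition rather than taken from TensorFlow's source, averaged over the batch:

```python
import numpy as np


def sparse_categorical_crossentropy_from_logits(y_true, logits):
    """Mean cross-entropy for integer labels y_true (N,) and raw scores logits (N, K)."""
    # Numerically stable log-softmax: subtract the row max before exponentiating.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # Pick out the log-probability of the true class for each sample.
    picked = log_probs[np.arange(len(y_true)), y_true]
    return -picked.mean()


y_true = np.array([0, 2])
logits = np.zeros((2, 5))  # all-equal scores -> uniform softmax over 5 classes
loss = sparse_categorical_crossentropy_from_logits(y_true, logits)
print(loss)  # log(5), about 1.609
```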

35
00:03:54,960 --> 00:03:59,130
The next step is to convert our TF-IDF matrices into NumPy arrays.
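
`TfidfVectorizer` returns a SciPy sparse matrix, and `.toarray()` turns it into a dense NumPy array. A small sketch with a hypothetical 2×3 matrix, which also shows the memory cost: a dense float64 array takes N·D·8 bytes whether or not its entries are zero.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical small sparse matrix standing in for TF-IDF output.
X_sparse = csr_matrix(np.array([[0.0, 0.7, 0.0], [0.5, 0.0, 0.5]]))

# .toarray() materializes every entry, zeros included, as a dense NumPy array.
X_dense = X_sparse.toarray()
print(type(X_dense), X_dense.shape)

# Dense storage grows as N * D * itemsize, regardless of how many entries are zero.
print(X_dense.nbytes)  # 2 * 3 * 8 = 48 bytes for float64
```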

36
00:04:05,340 --> 00:04:08,670
The next step is to call the fit method to find our model parameters.

37
00:04:17,050 --> 00:04:19,209
The next step is to plot the loss per epoch.

38
00:04:24,690 --> 00:04:26,760
OK, so the loss plot looks good.

39
00:04:27,540 --> 00:04:32,150
Notice that the validation loss is not as good as the train loss, which is expected.

40
00:04:35,900 --> 00:04:38,480
The next step is to plot the accuracy per epoch.

41
00:04:43,490 --> 00:04:49,640
Again, notice how the validation accuracy is not as good as the train accuracy, which again is expected.
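
A small helper for the loss and accuracy plots, assuming matplotlib and a Keras-style history dict with `loss`, `val_loss`, `accuracy`, and `val_accuracy` keys (the accuracy keys only exist if the model was compiled with `metrics=['accuracy']`). With Keras you would pass the `.history` attribute of the object returned by `model.fit(...)`; the numbers below are made up purely to exercise the function.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt


def plot_metric(history, name):
    """Plot train vs. validation curves for one metric from a Keras-style history dict."""
    fig, ax = plt.subplots()
    ax.plot(history[name], label=f"train {name}")
    ax.plot(history[f"val_{name}"], label=f"val {name}")
    ax.set_xlabel("epoch")
    ax.set_ylabel(name)
    ax.legend()
    return fig


# Made-up values, not results from the lecture.
history = {
    "loss": [1.2, 0.6, 0.3],
    "val_loss": [1.1, 0.7, 0.5],
    "accuracy": [0.6, 0.85, 0.95],
    "val_accuracy": [0.62, 0.8, 0.88],
}
loss_fig = plot_metric(history, "loss")
acc_fig = plot_metric(history, "accuracy")
```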

42
00:04:53,350 --> 00:04:56,140
The next step is to check a histogram of our labels.

43
00:05:01,510 --> 00:05:07,150
So since our labels are relatively balanced, there's no real need to check the AUC or the F1.
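
The histogram check boils down to counting samples per class. A NumPy sketch with hypothetical targets; with pandas, `df['targets'].value_counts()` gives the same counts. What counts as "relatively balanced" remains a judgment call.

```python
import numpy as np

# Hypothetical integer targets standing in for df["targets"].
targets = np.array([0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1])

counts = np.bincount(targets)  # number of samples per class, indexed by class id
proportions = counts / counts.sum()
print(counts)  # [3 3 2 2 2]
print(proportions.round(3))

# A rough balance check: ratio of the largest class to the smallest.
print(counts.max() / counts.min())  # 1.5
```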

44
00:05:12,260 --> 00:05:17,840
So as a final exercise for this lecture, please implement a neural network for our sentiment analysis

45
00:05:17,840 --> 00:05:20,090
dataset. As you recall,

46
00:05:20,120 --> 00:05:25,070
one thing we did previously was to interpret that model, and the results made a lot of sense.

47
00:05:25,580 --> 00:05:30,290
We saw that the most positive weights were assigned to positive words, while the most negative weights

48
00:05:30,290 --> 00:05:31,910
were assigned to negative words.

49
00:05:32,540 --> 00:05:37,940
Consider for yourself whether or not a neural network would offer such nice interpretability.
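
For comparison, the linear-model interpretation recalled above comes down to sorting one weight per word. A sketch with a hypothetical vocabulary and hypothetical weights (not results from the earlier lecture):

```python
import numpy as np

# Hypothetical vocabulary (word -> column index) and weights of a trained binary linear model.
word_index = {"great": 0, "awful": 1, "the": 2, "love": 3, "boring": 4}
weights = np.array([2.1, -2.5, 0.02, 1.7, -1.3])

index_word = {i: w for w, i in word_index.items()}
order = np.argsort(weights)  # ascending: most negative weight first

top_negative = [index_word[i] for i in order[:2]]
top_positive = [index_word[i] for i in order[::-1][:2]]
print(top_positive)  # ['great', 'love']
print(top_negative)  # ['awful', 'boring']
```

In the network above, each word connects to all 300 hidden units instead of having a single weight, which is a useful starting point for the question posed.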