1
00:00:11,070 --> 00:00:16,350
OK, so as mentioned, the next step is to populate the A's and the pi's with the appropriate counts

2
00:00:16,350 --> 00:00:17,370
from the training set.

3
00:00:18,120 --> 00:00:23,130
Now, because we're going to do the same procedure for both sets of A's and pi's, it's useful to put

4
00:00:23,130 --> 00:00:26,760
this into a function and then use the same function for both classes.

5
00:00:27,390 --> 00:00:31,330
So we'll create a function called compute_counts. As input,

6
00:00:31,350 --> 00:00:37,170
it's going to take in three arguments: the text for the specific class, plus the pointers to A and pi.

7
00:00:38,040 --> 00:00:44,430
Note that because A and pi are objects, any modifications we make to them inside this function will be remembered

8
00:00:44,430 --> 00:00:45,870
outside the function as well.

9
00:00:47,100 --> 00:00:52,710
The important part of this is to note that the data we pass in will be only for a single class at a

10
00:00:52,710 --> 00:00:53,310
time.

11
00:00:55,430 --> 00:01:02,000
So inside the function, we'll begin by looping through a variable called text_as_int. Each element of this

12
00:01:02,030 --> 00:01:05,390
is a line of a poem represented by a list of ints.

13
00:01:07,040 --> 00:01:10,520
The next step is to set a variable called last_idx to None.

14
00:01:11,150 --> 00:01:14,460
This will help us keep track of whether we are populating A or pi.

15
00:01:15,920 --> 00:01:22,760
The next step is to loop through each index in our list of tokens, represented as ints. Inside the loop,

16
00:01:22,790 --> 00:01:25,370
we'll check whether or not last_idx is None.

17
00:01:26,120 --> 00:01:29,540
If it is, then we know that we're at the beginning of a sentence.

18
00:01:29,990 --> 00:01:35,060
If that is the case, we would like to populate pi, which is for the initial state distribution.

19
00:01:38,220 --> 00:01:44,040
So at this point, we index pi at the index idx, and we increment this value by one.

20
00:01:47,180 --> 00:01:48,970
The next step is the else block.

21
00:01:49,580 --> 00:01:54,890
This is when last_idx is not None, and so we're dealing with the transition from one word to the

22
00:01:54,890 --> 00:01:55,520
next.

23
00:01:56,300 --> 00:02:02,930
What we would like to do here is index A at the row last_idx and at the column idx, and then

24
00:02:02,930 --> 00:02:04,520
increment the value by one.

25
00:02:05,510 --> 00:02:08,960
As you recall, the first index into A is the starting word.

26
00:02:09,020 --> 00:02:11,540
And the second index into A is the next word.

27
00:02:14,210 --> 00:02:20,540
The final step in the inner loop is to update last_idx to idx so that it has the correct value on

28
00:02:20,540 --> 00:02:22,130
the next iteration of the loop.
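
As a rough sketch of the function just described: the names compute_counts, text_as_int, last_idx and idx follow the narration, while the toy vocabulary size V and the two sample lines are made up for illustration.

```python
import numpy as np

def compute_counts(text_as_int, A, pi):
    """Accumulate first-token and transition counts into A and pi, in place."""
    for tokens in text_as_int:
        last_idx = None
        for idx in tokens:
            if last_idx is None:
                # first token of the line: counts toward the initial state distribution
                pi[idx] += 1
            else:
                # transition from the previous token to the current one
                A[last_idx, idx] += 1
            # remember the current token for the next iteration
            last_idx = idx

# toy example with a hypothetical vocabulary of size 3
V = 3
A = np.zeros((V, V))
pi = np.zeros(V)
compute_counts([[0, 1, 2], [0, 2]], A, pi)
```

Because A and pi are NumPy arrays, the counts are visible to the caller after the function returns, with no return value needed.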

29
00:02:26,300 --> 00:02:32,420
OK, so once we've defined our compute_counts function, the next step is to call the function. The

30
00:02:32,420 --> 00:02:36,710
first time we call the function, we're going to filter the train_text_int list

31
00:02:37,100 --> 00:02:41,720
so as to include only the entries where the corresponding value in y_train is zero.

32
00:02:42,350 --> 00:02:46,340
That is, we only want the samples for one class, which is class zero.

33
00:02:47,030 --> 00:02:49,970
This will be used to populate A0 and pi0.

34
00:02:51,990 --> 00:02:57,360
The second time we call the function, we'll do the same thing, except the label is one and we want to

35
00:02:57,360 --> 00:02:59,130
populate A1 and pi1.

36
00:03:00,000 --> 00:03:02,610
You may want to look at the list comprehension slowly

37
00:03:02,700 --> 00:03:04,950
if you don't understand what it does right away.

38
00:03:05,580 --> 00:03:09,750
Alternatively, you may want to write your own version if this does not make sense.
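
For anyone who wants to see the filtering step spelled out, here is a minimal sketch. The variable names and the toy data are assumptions for illustration; the second half is the same logic written as an explicit loop.

```python
# hypothetical toy data: three tokenized lines and their class labels
train_text_int = [[0, 1], [2, 3, 4], [1, 0]]
y_train = [0, 1, 0]

# keep only the lines whose label is 0
class0_lines = [t for t, y in zip(train_text_int, y_train) if y == 0]

# the same thing written as an explicit loop
class0_loop = []
for t, y in zip(train_text_int, y_train):
    if y == 0:
        class0_loop.append(t)
```

The filtered list is what would be passed to compute_counts along with A0 and pi0; swapping the 0 for a 1 gives the input for A1 and pi1.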

39
00:03:15,830 --> 00:03:19,100
So at this point, the A's and the pi's contain only the counts.

40
00:03:19,490 --> 00:03:21,860
We, of course, need these to be probabilities.

41
00:03:22,430 --> 00:03:27,860
So the next step is to divide the A's and the pi's by the appropriate numbers so that they are valid

42
00:03:27,860 --> 00:03:28,910
probabilities.

43
00:03:29,510 --> 00:03:32,570
As you recall, each row of A must sum to one.

44
00:03:33,260 --> 00:03:35,810
Therefore, we need to divide A by its row

45
00:03:35,810 --> 00:03:41,210
sums. We can do so by setting axis=1 and keepdims=True.

46
00:03:42,140 --> 00:03:47,390
Note that keepdims=True ensures that the sum is still two-dimensional, which is required

47
00:03:47,390 --> 00:03:52,070
for the division to broadcast correctly in NumPy. For pi,

48
00:03:52,100 --> 00:03:55,160
there is no need to do this since it's just a 1D array.

49
00:04:01,890 --> 00:04:07,710
The next step is to take the log of our A's and our pi's. As you recall, we normally work with log

50
00:04:07,710 --> 00:04:08,580
probabilities.

51
00:04:08,820 --> 00:04:11,510
And thus we have no need for the originals.
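
A small sketch of the normalization and log steps, on made-up counts. Starting the counts from ones (add-one smoothing) is an assumption made here so that no entry is zero before taking the log; the array names are illustrative.

```python
import numpy as np

# toy counts for a hypothetical vocabulary of size 3, starting from ones
A = np.ones((3, 3))
pi = np.ones(3)
A[0, 1] += 2   # pretend the transition 0 -> 1 was seen twice
pi[0] += 3     # pretend token 0 started three lines

# divide each row of A by its row sum; keepdims=True keeps the sum at shape (3, 1)
A /= A.sum(axis=1, keepdims=True)
# pi is 1-D, so a plain sum is enough
pi /= pi.sum()

# work with log probabilities from here on
logA = np.log(A)
logpi = np.log(pi)
```

Without keepdims=True the row sums would have shape (3,), and the division would broadcast across columns instead of rows.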

52
00:04:18,329 --> 00:04:24,540
The next step is to compute the priors. We'll begin by counting how many samples belong to class zero

53
00:04:24,720 --> 00:04:28,260
and how many samples belong to class one in the training set.

54
00:04:29,610 --> 00:04:33,270
The next step is to compute the total, which is just the length of y_train.

55
00:04:34,530 --> 00:04:39,450
The next step is to compute the prior probabilities, which we'll call P0 and P1.

56
00:04:40,350 --> 00:04:46,320
As usual, a reasonable estimate for these is simply the proportion of samples belonging to each class.

57
00:04:47,010 --> 00:04:52,740
So P0 is just the number of data points belonging to class zero, divided by the total number of

58
00:04:52,740 --> 00:04:53,580
data points.

59
00:04:54,840 --> 00:04:58,440
The next step is again to take the log of P0 and P1.

60
00:04:58,770 --> 00:05:01,890
since these are what we'll be using to do our computations.

61
00:05:02,730 --> 00:05:06,930
Note that we'll also print out P0 and P1 just so we know what they are.
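
The prior computation might look something like this. The labels below are made up, and the variable names (count0, total, p0, logp0 and so on) are assumptions for illustration.

```python
import numpy as np

# hypothetical labels: two samples of class 0, four of class 1
y_train = np.array([0, 1, 1, 0, 1, 1])

count0 = np.sum(y_train == 0)
count1 = np.sum(y_train == 1)
total = len(y_train)

# proportion of samples in each class
p0 = count0 / total
p1 = count1 / total

# log priors are what the classifier will actually use
logp0 = np.log(p0)
logp1 = np.log(p1)
print(p0, p1)
```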

62
00:05:11,480 --> 00:05:19,010
OK, so we can see that P0 is about 33 percent and P1 is about 66 percent. Because of this,

63
00:05:19,010 --> 00:05:22,010
it wouldn't make sense to use the maximum likelihood method.

64
00:05:22,490 --> 00:05:26,780
Instead, we should look at the posterior, which corresponds to the MAP method.

65
00:05:27,620 --> 00:05:32,840
Note also that because the classes are slightly imbalanced, we may want to use a metric other than

66
00:05:32,840 --> 00:05:35,480
the accuracy to evaluate a classifier.

67
00:05:40,090 --> 00:05:42,790
OK, so the next step is to build our classifier.

68
00:05:43,600 --> 00:05:49,270
Now normally we build classifiers to have a scikit-learn kind of interface with methods called fit

69
00:05:49,270 --> 00:05:50,080
and predict.

70
00:05:50,770 --> 00:05:55,090
Note that we could have done that, but we've already done the fitting part by finding the A's and the

71
00:05:55,090 --> 00:05:55,450
pi's.

72
00:05:56,320 --> 00:06:01,270
If you'd like to try it as an exercise, you could restructure this code so that the model does have

73
00:06:01,270 --> 00:06:02,020
a fit method.

74
00:06:02,320 --> 00:06:04,360
And we do the training within that method.

75
00:06:06,810 --> 00:06:09,540
OK, so the first step will be to look at the constructor.

76
00:06:10,170 --> 00:06:16,710
The constructor will take in three arguments: a list of log A's, a list of log pi's, and a list of log

77
00:06:16,710 --> 00:06:18,840
priors. Inside

78
00:06:18,840 --> 00:06:22,200
the constructor, we'll simply save these as attributes of the object.

79
00:06:23,430 --> 00:06:28,690
We'll also assign the number of classes to an attribute called K, which is just equal to the size of

80
00:06:28,690 --> 00:06:30,030
the list that we passed in.

81
00:06:34,990 --> 00:06:40,540
The next step is to define a function called compute_log_likelihood. This will take as input a line

82
00:06:40,540 --> 00:06:44,440
of text represented by a list of integers and a class.

83
00:06:45,070 --> 00:06:52,300
As you recall, the class tells us which Markov model to use, since we have one for every class. Inside

84
00:06:52,300 --> 00:06:57,850
the function, we'll begin by retrieving log A and log pi by indexing our lists by the class.

85
00:07:02,310 --> 00:07:05,340
The next step is to loop through each index in the input.

86
00:07:06,180 --> 00:07:11,220
As usual, we're going to follow the pattern of creating a variable called last_idx so that

87
00:07:11,220 --> 00:07:13,770
we know whether or not we're at the beginning of a sentence.

88
00:07:14,910 --> 00:07:18,830
We'll also initialize logprob to zero, which will hold the final answer.

89
00:07:21,710 --> 00:07:25,050
The next step is to enter our loop. Inside the loop,

90
00:07:25,070 --> 00:07:28,010
we're going to check whether or not last_idx is None.

91
00:07:28,790 --> 00:07:31,700
If it is, then we know it's the first token in the sentence.

92
00:07:31,970 --> 00:07:35,360
So we index log pi and add the result to logprob.

93
00:07:38,280 --> 00:07:44,310
Otherwise, last_idx is not None, and we are not at the beginning of a sentence. In this case, we

94
00:07:44,310 --> 00:07:47,810
need to use our state transition matrix. Again,

95
00:07:47,820 --> 00:07:53,250
we index log A at last_idx and idx, and we add the result to logprob.

96
00:07:57,590 --> 00:08:01,880
The next step is to update last_idx for the next iteration of the loop.

97
00:08:04,700 --> 00:08:07,640
When we're outside the loop, we simply return logprob.

98
00:08:12,040 --> 00:08:18,130
The next step is to define the predict function. The input into this function is the list of input

99
00:08:18,130 --> 00:08:24,130
sequences. Inside the function, we'll begin by initializing an array to store the predictions.

100
00:08:24,910 --> 00:08:31,150
The next step is to loop through each input. Note that we use enumerate so that we get the index as well.

101
00:08:33,370 --> 00:08:38,320
Inside the loop, we compute the posterior for each class using a list comprehension.

102
00:08:39,250 --> 00:08:45,430
So for the loop, we use the index c, and we loop through each integer from zero up to K minus one.

103
00:08:46,420 --> 00:08:52,180
Inside the loop, we compute the log likelihood under the class c and add it to the log prior for the

104
00:08:52,180 --> 00:08:52,960
class c.

105
00:08:53,830 --> 00:08:57,310
This will give us a list of posteriors, one for each class.

106
00:08:58,330 --> 00:09:03,940
The next step is to obtain a prediction, which is simply the argmax of the list of posteriors.

107
00:09:06,170 --> 00:09:13,430
The next step is to store the prediction in our array of predictions at the index i. The final step,

108
00:09:13,460 --> 00:09:16,880
once we are outside the loop, is to return the predictions.
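
Putting the constructor, the log likelihood function and predict together, a sketch of the class could look like the following. The class name, the attribute names, the underscore on the helper method and the toy two-token models are all assumptions for illustration; the "posteriors" here are log posteriors up to a constant, which is all the argmax needs.

```python
import numpy as np

class Classifier:
    def __init__(self, logAs, logpis, logpriors):
        # one log A, one log pi and one log prior per class, in label order
        self.logAs = logAs
        self.logpis = logpis
        self.logpriors = logpriors
        self.K = len(logpriors)  # number of classes

    def _compute_log_likelihood(self, input_, class_):
        logA = self.logAs[class_]
        logpi = self.logpis[class_]

        last_idx = None
        logprob = 0.0
        for idx in input_:
            if last_idx is None:
                # first token in the sequence
                logprob += logpi[idx]
            else:
                # transition from the previous token
                logprob += logA[last_idx, idx]
            last_idx = idx
        return logprob

    def predict(self, inputs):
        predictions = np.zeros(len(inputs), dtype=int)
        for i, input_ in enumerate(inputs):
            # log likelihood plus log prior for each class
            posteriors = [
                self._compute_log_likelihood(input_, c) + self.logpriors[c]
                for c in range(self.K)
            ]
            predictions[i] = np.argmax(posteriors)
        return predictions

# toy models over a 2-token vocabulary (made-up numbers)
logA0 = np.log(np.array([[0.9, 0.1], [0.9, 0.1]]))
logA1 = np.log(np.array([[0.1, 0.9], [0.1, 0.9]]))
logpi0 = np.log(np.array([0.9, 0.1]))
logpi1 = np.log(np.array([0.1, 0.9]))
logpriors = np.log(np.array([0.5, 0.5]))

clf = Classifier([logA0, logA1], [logpi0, logpi1], logpriors)
preds = clf.predict([[0, 0, 0], [1, 1, 1]])
```

In the toy example, class 0 favors token 0 and class 1 favors token 1, so the two sequences are assigned to classes 0 and 1 respectively.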

109
00:09:24,900 --> 00:09:31,110
OK, so at this point, the hard work is basically over. The next step is to instantiate a classifier

110
00:09:31,110 --> 00:09:31,830
object.

111
00:09:32,520 --> 00:09:38,700
As you recall, we'd like to pass in a list of log A's, a list of log pi's, and a list of log priors.

112
00:09:39,480 --> 00:09:45,810
Note that these must be passed in in the correct order, since each position in the list must correspond

113
00:09:45,810 --> 00:09:47,310
to the correct class label.

114
00:09:48,090 --> 00:09:53,280
That is, the log prior at index zero in this list should correspond with the label zero.

115
00:09:53,940 --> 00:09:57,900
Since we've done everything in corresponding order, this doesn't really matter too much.

116
00:10:03,900 --> 00:10:08,580
OK, so the next step is to use our classifier to obtain a prediction for the train set.

117
00:10:09,090 --> 00:10:10,620
We'll call the result p_train.

118
00:10:11,520 --> 00:10:16,410
We'll also print out the accuracy, which is just the mean of p_train == y_train.
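
The accuracy computation can be sketched on made-up arrays; p_train and y_train here are illustrative stand-ins, not the lecture's actual data.

```python
import numpy as np

# hypothetical labels and predictions
y_train = np.array([0, 1, 1, 0])
p_train = np.array([0, 1, 0, 0])

# elementwise comparison gives booleans; their mean is the fraction correct
acc = np.mean(p_train == y_train)
print(f"Train acc: {acc}")
```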

119
00:10:21,350 --> 00:10:24,550
OK, so as you can see, we get nearly a perfect score.

120
00:10:28,520 --> 00:10:30,950
The next step is to do the same thing for the test set.

121
00:10:36,160 --> 00:10:40,360
OK, so as you can see, we do pretty well, but not as well as for the train set.

122
00:10:44,970 --> 00:10:50,550
Now, as you recall, our data set has imbalanced classes, so it's helpful to look at some metrics

123
00:10:50,550 --> 00:10:52,170
that take this into account.

124
00:10:52,890 --> 00:10:58,200
We'll begin by importing the confusion_matrix function and the f1_score function from

125
00:10:58,200 --> 00:10:58,560
scikit-learn.

126
00:11:03,830 --> 00:11:07,370
The next step is to compute the confusion matrix for the train set.

127
00:11:11,880 --> 00:11:17,340
OK, so as expected, most of the samples fall along the diagonal, leading to a high accuracy.

128
00:11:21,140 --> 00:11:24,260
The next step is to compute the confusion matrix for the test set.

129
00:11:27,970 --> 00:11:33,010
OK, so this time we have more samples off the diagonal, which leads to a lower accuracy.

130
00:11:36,520 --> 00:11:39,670
The next step is to compute the F1 score for the train set.

131
00:11:44,370 --> 00:11:47,130
So, as expected, the train score is nearly one.

132
00:11:50,890 --> 00:11:53,830
The next step is to compute the F1 score for the test set.

133
00:11:57,590 --> 00:12:01,400
So again, we perform decently well, but not as well as for the train set.
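
For reference, the two scikit-learn metrics can be tried on a tiny made-up example. The labels and predictions below are invented for illustration and are not the lecture's results.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score

# hypothetical true labels and predictions
y_true = np.array([0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 1, 1, 1, 1, 0])

# rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred)

# by default, F1 is computed for the positive class (label 1)
f1 = f1_score(y_true, y_pred)
print(cm)
print(f1)
```

Correct predictions land on the diagonal of the confusion matrix, and everything off the diagonal is a mistake.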

