1
00:00:11,070 --> 00:00:16,350
The next step is to create a function that will sample a word, given a dictionary of probabilities.

2
00:00:17,010 --> 00:00:21,600
As you recall, from what we just did, this dictionary will be structured as follows.

3
00:00:22,170 --> 00:00:27,000
The key will be each possible word and the value will be its corresponding probability.

4
00:00:27,780 --> 00:00:32,369
As mentioned earlier, understanding this is one of the main goals of this exercise.

5
00:00:32,790 --> 00:00:35,550
So do your best to implement this function yourself.

6
00:00:37,020 --> 00:00:42,240
Furthermore, as you recall, there are various ways to implement this, but using the method I'm about

7
00:00:42,240 --> 00:00:44,880
to show you makes use of first principles.

8
00:00:45,480 --> 00:00:50,550
All you really have to understand in order to understand this is the picture we drew earlier.

9
00:00:53,490 --> 00:00:58,560
OK, so as mentioned, the first step is to draw a sample from the uniform distribution.

10
00:00:59,070 --> 00:01:00,450
We'll call this P0.

11
00:01:01,110 --> 00:01:03,360
This will be a number between zero and one.

12
00:01:05,120 --> 00:01:10,340
The next step is to create a variable called cumulative, which will store the cumulative sum of the

13
00:01:10,340 --> 00:01:12,800
probabilities we encounter in the dictionary.

14
00:01:14,740 --> 00:01:17,920
The next step is to live through each item in the dictionary.

15
00:01:18,640 --> 00:01:23,410
So again, he is the token we might sample and pose its corresponding probability.

16
00:01:25,570 --> 00:01:31,810
Inside the loop, we will first increment the cumulative sum by PEA, the next step is to check whether

17
00:01:31,830 --> 00:01:34,870
P0 is less than the current cumulative sum.

18
00:01:35,470 --> 00:01:41,530
If it is, we should return the current token tea again and draw a picture to convince yourself that

19
00:01:41,530 --> 00:01:44,650
this works and follows the process we described earlier.

20
00:01:46,810 --> 00:01:53,170
As a final sanity check note that we have an a circle outside the loop since our function should always

21
00:01:53,170 --> 00:01:55,810
return a sample, we should never reach this line.

22
00:01:56,530 --> 00:02:01,480
If we do reach this line, then our program will raise an exception and we'll know that we've done something

23
00:02:01,480 --> 00:02:02,230
incorrect.

24
00:02:08,030 --> 00:02:13,490
Now that we know how to sample a single word from a probability dictionary, the next step is to implement

25
00:02:13,490 --> 00:02:15,470
a function to generate poems.

26
00:02:16,160 --> 00:02:19,310
This function will generate four lines of a poem at a time.

27
00:02:21,390 --> 00:02:26,070
So we start by answering a loop that runs four times inside the loop.

28
00:02:26,100 --> 00:02:31,080
Well, initialize an empty list called sentence, which will store the tokens we generate.

29
00:02:34,930 --> 00:02:41,140
The first step will be to generate the initial word since our dictionary called initial is already a

30
00:02:41,140 --> 00:02:47,920
word probability dictionary, we can pass this into the sample word function we'll call the result zero.

31
00:02:49,150 --> 00:02:52,870
The next step is to append W0 to our sentence list.

32
00:02:56,850 --> 00:03:00,180
The next step is to sample the second word in our sentence.

33
00:03:00,990 --> 00:03:04,740
As you recall, this should make use of our dictionary called First Order.

34
00:03:05,730 --> 00:03:12,360
Now we want this to depend on the first word we generated, so we'll index First Order using zero.

35
00:03:13,140 --> 00:03:16,290
This will again give us back a probability dictionary.

36
00:03:16,890 --> 00:03:20,280
We can then pass this into our simple word function once again.

37
00:03:21,150 --> 00:03:25,330
This will give us the second word in the sentence, which we will call W1.

38
00:03:26,460 --> 00:03:30,270
The next step is to then appended W1 to our sentence list.

39
00:03:34,630 --> 00:03:40,600
In the next block of code, we will continue to generate new words until we reach the special end token.

40
00:03:41,380 --> 00:03:47,440
So we'll begin by entering an infinite loop inside a loop, we will use our second order dictionary

41
00:03:47,440 --> 00:03:49,930
indexed by zero and one.

42
00:03:51,160 --> 00:03:55,960
As you recall, the keys to this dictionary are two pools of two previous words.

43
00:03:57,130 --> 00:04:02,710
Again, this gives us back a probability dictionary, which we can then pass into the sample word function.

44
00:04:03,430 --> 00:04:04,990
We'll call the result W2.

45
00:04:06,340 --> 00:04:11,620
The next step is to check whether or not W2 is equal to our fake and the token.

46
00:04:12,250 --> 00:04:16,810
If it is, we don't want to print this out, but simply ends the loop by calling break.

47
00:04:17,560 --> 00:04:23,290
Otherwise, if W2 is a real word, we will append W2 to our sentence list.

48
00:04:25,140 --> 00:04:31,530
The final step in this loop is to update the variables zero and W1 one, which represent the most recent

49
00:04:31,530 --> 00:04:33,750
previous two words in our sentence.

50
00:04:35,070 --> 00:04:38,970
Once we are outside the loop, we have finished generating one line of a poem.

51
00:04:39,720 --> 00:04:44,130
At this point, we print the line by joining the tokens into a single string.

52
00:04:50,720 --> 00:04:53,450
OK, so the next step is to actually try our function.

53
00:04:53,720 --> 00:04:55,010
So let's run this.

54
00:05:00,440 --> 00:05:07,970
OK, so here's the first verse of our poem I went to bed alone and left me might just as empty, but

55
00:05:07,970 --> 00:05:13,400
it isn't as if and that's not all the money goes so fast you couldn't call it living for.

56
00:05:13,400 --> 00:05:15,200
It ain't OK.

57
00:05:15,230 --> 00:05:17,360
So I think that's a pretty convincing poem.

58
00:05:17,870 --> 00:05:18,890
Let's try this again.

59
00:05:23,580 --> 00:05:25,350
One level higher than the Earth.

60
00:05:25,770 --> 00:05:30,120
She has her spiel on every single lizard with whose vast wheels?

61
00:05:30,330 --> 00:05:31,200
It's to say.

62
00:05:32,790 --> 00:05:34,710
OK, so let's try this one more time.

63
00:05:39,000 --> 00:05:47,010
I know up to pass a winter eve to make them out, and then someone actually, let's try this one more

64
00:05:47,010 --> 00:05:47,520
time.

65
00:05:51,610 --> 00:05:54,530
It's a, you know, a person so related to herself.

66
00:05:54,950 --> 00:05:56,960
You take the polish off the ground.

67
00:05:57,410 --> 00:06:00,140
No, I don't follow you in leaves.

68
00:06:00,140 --> 00:06:01,910
No step had trodden black.

69
00:06:03,140 --> 00:06:08,090
OK, so I think in most of these cases, the poems that were generated have been pretty convincing.

70
00:06:15,110 --> 00:06:19,190
So the last thing I want to mention in this lecture is an optional exercise.

71
00:06:19,850 --> 00:06:25,700
Recall that one advantage of storing our probability arrays as dictionaries is that they take advantage

72
00:06:25,700 --> 00:06:26,600
of sparsity.

73
00:06:27,140 --> 00:06:29,150
We know that there will be lots of zeros.

74
00:06:29,750 --> 00:06:35,690
Recall that zero probability means something which is impossible, but for dictionaries, we don't have

75
00:06:35,690 --> 00:06:38,270
to store any impossible transitions.

76
00:06:38,690 --> 00:06:44,840
And thus, there are no zeros since we already know that any value which does not exist in the dictionary

77
00:06:44,990 --> 00:06:46,370
has zero probability.

78
00:06:47,270 --> 00:06:51,170
Thus, storing things this way is possibly more space efficient.

79
00:06:52,010 --> 00:06:54,980
So let's suppose that we call our vocabulary size V.

80
00:06:55,520 --> 00:06:58,490
We know that PI should be a 1D array of size v.

81
00:06:58,970 --> 00:07:05,780
We know that A1 should be a 2D array of size V by V and A2 should be a 3D array of size, V by V by

82
00:07:05,780 --> 00:07:06,290
V.

83
00:07:07,580 --> 00:07:10,680
And this is the case whether they contain mostly zeros or not.

84
00:07:11,090 --> 00:07:12,980
We call this a dense representation.

85
00:07:13,910 --> 00:07:21,860
So your exercises this firstly determine the vocabulary size, which is V, then add up all the probability

86
00:07:21,860 --> 00:07:25,130
values we actually stored in our three dictionaries above.

87
00:07:26,030 --> 00:07:29,750
Then compare what we actually stored to what we would have stored.

88
00:07:29,780 --> 00:07:31,760
How do we use full dense arrays?

89
00:07:33,020 --> 00:07:35,600
It would be best to express these as a percentage.

90
00:07:35,990 --> 00:07:42,140
So, for example, you could say for a two, only two percent of the possible 1000000 values were stored.

91
00:07:44,820 --> 00:07:50,100
Also, as a bonus note for this lecture, notice another advantage of using dictionaries.

92
00:07:50,670 --> 00:07:56,070
One of the more tedious parts of doing an LP is that we have to keep converting back and forth between

93
00:07:56,070 --> 00:07:58,080
integer and text representation.

94
00:07:58,830 --> 00:08:04,590
But since the key to a dictionary can be any kind of object, there's no need to convert our text into

95
00:08:04,590 --> 00:08:05,700
integers first.

96
00:08:06,240 --> 00:08:11,010
Therefore, there's no need to maintain a word to index dictionary, and we can simply use the words

97
00:08:11,010 --> 00:08:13,230
themselves for the entire script.

98
00:08:19,130 --> 00:08:23,300
All right, so there's just one more exercise I want to mention before we end this lecture.

99
00:08:24,920 --> 00:08:30,890
As you've seen, we use two different strategies to build our probability dictionaries, in particular

100
00:08:30,890 --> 00:08:32,630
for the initial state distribution.

101
00:08:32,659 --> 00:08:35,000
We created a dictionary of counts directly.

102
00:08:35,690 --> 00:08:39,140
We accumulated the counts as we traverse the dataset.

103
00:08:39,890 --> 00:08:44,620
But for the First Order and Second Order models, we did something else for these.

104
00:08:44,630 --> 00:08:50,570
We simply collected all the possible next words in a list only after we finished looping through the

105
00:08:50,570 --> 00:08:51,380
whole dataset.

106
00:08:51,650 --> 00:08:54,260
Did we convert them into dictionaries of counts.

107
00:08:55,040 --> 00:08:57,350
However, note that this was not necessary.

108
00:08:57,740 --> 00:09:01,610
We could have just created the dictionaries of accounts directly here as well.

109
00:09:02,120 --> 00:09:09,530
So your exercises this modify this code so that we no longer need to collect each possible next word

110
00:09:09,530 --> 00:09:10,400
in a list.

111
00:09:10,970 --> 00:09:15,800
What you should end up with is that there is no longer any need for the list to predict function.

112
00:09:16,850 --> 00:09:20,090
So please try these exercises and I'll see you in the next lecture.

