1
00:00:02,170 --> 00:00:06,610
Hey, everyone, and welcome back to this class natural language processing in Python.

2
00:00:11,590 --> 00:00:16,810
In the previous lecture, we looked at the model that we will use to decrypt and encoded message.

3
00:00:17,350 --> 00:00:23,020
We know that if our decrypted message is correct, it should have a higher probability than an incorrectly

4
00:00:23,020 --> 00:00:24,160
decrypted message.

5
00:00:24,730 --> 00:00:27,340
So here's some simple code to approach this problem.

6
00:00:28,090 --> 00:00:31,930
First, we call a function to get all possible substitution ciphers.

7
00:00:32,500 --> 00:00:34,360
Then we initialize some variables.

8
00:00:35,080 --> 00:00:38,110
The best log likelihood is initialized to minus infinity.

9
00:00:38,500 --> 00:00:40,900
And the best message is initialized to none.

10
00:00:41,590 --> 00:00:44,610
Then we loop through each map inside the loop.

11
00:00:44,620 --> 00:00:47,800
We use the current map to decode the encrypted message.

12
00:00:48,310 --> 00:00:51,940
Then we calculate the log likelihood of the decrypted message.

13
00:00:53,220 --> 00:00:57,870
Then we check if that log likelihood is better than our current best log likelihood.

14
00:00:58,470 --> 00:01:04,319
If it is, then we make this message our new best message and we make this log likelihood, our current

15
00:01:04,319 --> 00:01:06,870
best log likelihood in this way.

16
00:01:07,020 --> 00:01:11,010
We will have found the maximum log likelihood message by the end of this loop.

17
00:01:11,760 --> 00:01:12,840
But here's one problem.

18
00:01:13,320 --> 00:01:15,660
How do we find all possible descriptions?

19
00:01:20,790 --> 00:01:23,250
In fact, this is an infeasible task.

20
00:01:23,910 --> 00:01:26,040
Consider that we have 26 letters.

21
00:01:26,580 --> 00:01:30,450
We have 26 possible positions that they can appear in our mapping.

22
00:01:31,140 --> 00:01:36,270
So what is the total number of combinations we would have to try if we took a brute force approach?

23
00:01:36,930 --> 00:01:40,890
This is equal to 26 times, 25 times 24 and so forth.

24
00:01:41,400 --> 00:01:46,740
In other words, that's 26 factorial, which is about four times 10 to the power 26.

25
00:01:47,580 --> 00:01:53,430
This is not a practical amount of combinations to try, even if each possibility could be checked in

26
00:01:53,430 --> 00:01:54,690
just one nanosecond.

27
00:01:55,050 --> 00:01:59,220
This would still take about twelve point seven billion years to complete.

28
00:02:04,290 --> 00:02:10,410
Luckily, there is a better solution genetic algorithms in order to have better intuition about genetic

29
00:02:10,410 --> 00:02:11,009
algorithms.

30
00:02:11,280 --> 00:02:17,050
It's helpful to understand a little bit about how genetics and evolution works at a high level.

31
00:02:17,070 --> 00:02:22,500
The basic idea is this each generation of individuals will produce offspring.

32
00:02:23,220 --> 00:02:26,130
You can think of this as like a parent and child relationship.

33
00:02:26,820 --> 00:02:29,820
The parent passes on their DNA to their children.

34
00:02:31,590 --> 00:02:36,300
That's why, for example, if you have black hair, then your children will also have black hair.

35
00:02:36,660 --> 00:02:40,620
If you have curly hair, then your children will also have curly hair and so on.

36
00:02:41,610 --> 00:02:46,050
However, in each generation, the child isn't an exact copy of the parent.

37
00:02:46,500 --> 00:02:47,910
Things always change a tiny bit.

38
00:02:49,080 --> 00:02:51,450
Now, it's important to be clear about what we're talking about.

39
00:02:51,930 --> 00:02:56,850
If we think about humans, this doesn't really apply because human beings are made from two parents.

40
00:02:57,180 --> 00:03:01,260
So the child's DNA is a mixture of the DNA from both parents.

41
00:03:01,830 --> 00:03:07,530
So we have to think on a lower level the level of individual cells or the kinds of organisms where the

42
00:03:07,530 --> 00:03:09,990
offspring can come from just a single parent.

43
00:03:15,170 --> 00:03:18,170
So what happens when one of these organisms creates offspring?

44
00:03:18,890 --> 00:03:23,930
Well, since there is only a single parent, the offspring takes DNA only from that parent.

45
00:03:24,710 --> 00:03:27,470
The offspring is therefore a copy of the parent.

46
00:03:28,190 --> 00:03:29,780
However, mistakes can happen.

47
00:03:31,130 --> 00:03:37,100
You can think of DNA as a string, just like we have in computer programming, but this string is only

48
00:03:37,100 --> 00:03:41,240
allowed to contain a four letters, whereas in English we have 26 letters.

49
00:03:42,900 --> 00:03:47,130
The language of DNA has only a four letter, Alphabet, ATC and G.

50
00:03:47,730 --> 00:03:53,370
Well, what happens when DNA gets copied is that mistakes can happen, and the DNA string from the parent

51
00:03:53,640 --> 00:03:55,830
may not be an exact copy of the child.

52
00:04:00,930 --> 00:04:03,150
There are several kinds of mistakes that can occur.

53
00:04:03,840 --> 00:04:06,810
Sometimes one letter might be replaced by some other letter.

54
00:04:07,260 --> 00:04:12,990
So Atci becomes 8g, for example, this is called a substitution.

55
00:04:13,740 --> 00:04:15,870
Another kind of mistake is called an insertion.

56
00:04:16,290 --> 00:04:20,010
So atci might become 8c tg.

57
00:04:20,880 --> 00:04:25,860
Another kind of mistake is called a deletion, so atci might become 8g.

58
00:04:26,700 --> 00:04:32,760
The idea is that every time DNA gets copied, either from a parent cell to a child cell or from a parent

59
00:04:32,760 --> 00:04:38,700
organism to a child organism, there is a chance that one or more of these errors can occur in a multiple

60
00:04:38,700 --> 00:04:40,380
places along the DNA string.

61
00:04:41,130 --> 00:04:42,780
These are called mutations.

62
00:04:47,910 --> 00:04:50,760
Of course, the probability of such errors is very small.

63
00:04:50,850 --> 00:04:53,670
Otherwise, offspring would not resemble their parents at all.

64
00:04:53,940 --> 00:04:56,310
In fact, they would probably have trouble surviving.

65
00:04:57,000 --> 00:05:03,360
That's why the process of evolution is so slow, depending on the context evolutionary changes happen

66
00:05:03,360 --> 00:05:05,880
on the scale of about one million years.

67
00:05:06,720 --> 00:05:11,880
To put that into context, the average human lifespan is less than 100 years, which is just a tiny

68
00:05:11,880 --> 00:05:12,660
fraction of that.

69
00:05:13,380 --> 00:05:19,170
But over time, these mutations in DNA will build up, and the changes that have the highest fitness

70
00:05:19,380 --> 00:05:20,850
will be the ones that persevere.

71
00:05:21,660 --> 00:05:27,800
So one small change leads to the next until the new organism is now very different from its ancestor

72
00:05:27,810 --> 00:05:29,010
one million years ago.

73
00:05:34,000 --> 00:05:39,700
The idea of fitness and natural selection is the guiding principle behind genetic algorithms.

74
00:05:40,240 --> 00:05:43,540
Mutations that can happen can be either bad or good.

75
00:05:44,110 --> 00:05:47,980
We have many examples of bad mutations we call these genetic diseases.

76
00:05:48,520 --> 00:05:54,130
Luckily, our technology has progressed enough so that we have now just started treating genetic diseases

77
00:05:54,130 --> 00:05:55,090
with DNA editing.

78
00:05:55,750 --> 00:05:59,830
In general, however, what happens when a bad genetic mutation occurs?

79
00:06:00,340 --> 00:06:05,590
If we don't have the technology to treat that individual, then they will have a smaller chance of survival

80
00:06:05,770 --> 00:06:07,240
and to pass on their genes.

81
00:06:07,870 --> 00:06:13,810
Therefore, from an objective standpoint, unfit genes have a smaller chance of propagating to the next

82
00:06:13,810 --> 00:06:14,530
generation.

83
00:06:15,400 --> 00:06:20,800
On the other end of the spectrum, genes that propagate more will have more copies of themselves in

84
00:06:20,800 --> 00:06:21,790
the next generation.

85
00:06:22,480 --> 00:06:24,040
This is almost tautological.

86
00:06:24,280 --> 00:06:28,540
If you have more children, then more of your DNA will exist in the next generation.

87
00:06:29,230 --> 00:06:32,350
So this is biology perspective on what it means to be fit.

88
00:06:33,860 --> 00:06:37,490
In the long term, what's happening can be put in rather simplistic terms.

89
00:06:38,000 --> 00:06:41,000
Genes that copy themselves more well, just keep doing that.

90
00:06:41,630 --> 00:06:47,300
Genes, which mutated but happened to be unlucky and get a bad mutation will have less opportunity to

91
00:06:47,300 --> 00:06:51,140
copy themselves, and they won't make it to subsequent generations.

92
00:06:51,860 --> 00:06:57,740
In the end, theoretically, we should be left with only the best mutations and the most fit DNA.

93
00:07:02,690 --> 00:07:05,310
Now, that can be pretty abstract if you're not a biologist.

94
00:07:05,330 --> 00:07:09,170
So let's think about this in terms of a numerical optimization problem.

95
00:07:09,800 --> 00:07:12,140
Let's phrase this in terms of maximization.

96
00:07:12,590 --> 00:07:16,070
So on the horizontal axis, you have some number representing DNA.

97
00:07:16,580 --> 00:07:20,960
On the vertical axis, you have some number representing the fitness of that DNA.

98
00:07:21,560 --> 00:07:26,930
In other words, the fitness is a function of the DNA, which makes sense in the context of biological

99
00:07:26,930 --> 00:07:27,710
organisms.

100
00:07:29,260 --> 00:07:34,960
In order to evolve our DNA, we can mutate the existing DNA, which means in this case, moving a little

101
00:07:34,960 --> 00:07:36,340
bit left or a little bit right.

102
00:07:37,060 --> 00:07:40,660
We can actually create multiple offspring from the parent DNA at once.

103
00:07:41,410 --> 00:07:44,140
Then we can check the fitness of each of those offspring.

104
00:07:44,920 --> 00:07:50,200
As with biological and natural selection, the less fit offspring will die off and not make it to the

105
00:07:50,200 --> 00:07:55,840
next generation, whereas the Morphett offspring will persevere and become the new parents of the next

106
00:07:55,840 --> 00:07:56,500
generation.

107
00:07:58,330 --> 00:08:04,030
So you can imagine that as we keep mutating and only keeping the most fit mutations that if we keep

108
00:08:04,030 --> 00:08:08,860
iterating over this process, we will eventually end up at the point of maximum fitness.

109
00:08:10,880 --> 00:08:15,230
Now, this is assuming that we don't have to deal with complications such as the environment changing,

110
00:08:15,230 --> 00:08:16,790
which is the case in our example.

111
00:08:17,300 --> 00:08:22,820
So just to give you a simple example of that, consider animals that evolved to be great swimmers.

112
00:08:23,360 --> 00:08:29,060
Then suppose that the environment changes so that water disappears and the world is covered in dry land,

113
00:08:29,510 --> 00:08:35,000
then great swimming will no longer entail great fitness, and some other trait will lead to better fitness.

114
00:08:35,570 --> 00:08:37,190
In our example, which is simpler.

115
00:08:37,580 --> 00:08:39,799
No such environmental changes are possible.

116
00:08:44,820 --> 00:08:50,470
Abstractly, you can think of the process we just described in the context of optimizing any function.

117
00:08:50,940 --> 00:08:54,030
The concept of fitness and DNA are just analogies.

118
00:08:54,630 --> 00:08:58,980
This part of the lecture is optional, so you can feel free to skip it if you're not familiar with these

119
00:08:58,980 --> 00:08:59,640
concepts.

120
00:09:00,360 --> 00:09:03,600
If you have a machine learning background, then you should be fairly comfortable.

121
00:09:04,320 --> 00:09:09,780
So you might recognize the numerical optimization problem as something we often have to do in machine

122
00:09:09,780 --> 00:09:10,200
learning.

123
00:09:10,830 --> 00:09:15,870
You might be familiar with tools such as gradient descent, which tell us the exact direction to go

124
00:09:16,110 --> 00:09:18,960
to improve the value of the function we're trying to optimize.

125
00:09:19,530 --> 00:09:23,910
So you might ask why can't we use gradient descent for a decryption algorithm?

126
00:09:24,540 --> 00:09:29,550
And then the problem with that is in order to do gradient descent, we would have to be able to find

127
00:09:29,550 --> 00:09:30,120
the gradient.

128
00:09:30,840 --> 00:09:35,640
The gradient is the derivative of the objective with respect to the model parameters.

129
00:09:36,240 --> 00:09:42,750
In our case, the objective is the log likelihood of the decrypted sentence and the model parameters

130
00:09:42,750 --> 00:09:46,950
are the dictionary we use to map the coded letter to the unencrypted letter.

131
00:09:47,640 --> 00:09:53,760
Unfortunately, as you can see, the objective is not differential with respect to the model parameters.

132
00:09:54,180 --> 00:09:57,810
Thus, there is no gradient to be found and we can't do gradient descent.

133
00:10:02,910 --> 00:10:08,370
The next issue we have to contend with, and this part is not optional is how do we represent our model

134
00:10:08,370 --> 00:10:12,240
parameters as a DNA string that we can mutate on each round?

135
00:10:13,590 --> 00:10:18,710
Our DNA, unlike real DNA, cannot be comprised of just the four letters ATCI.

136
00:10:19,830 --> 00:10:25,170
But remember that our model is just a mapping from one letter to some other, possibly different letter

137
00:10:25,170 --> 00:10:26,010
in the alphabet.

138
00:10:26,700 --> 00:10:28,800
We can also think of this as a dictionary.

139
00:10:30,170 --> 00:10:34,700
In this way, we don't need to consider the keys of the dictionary, since these can just be the alphabet

140
00:10:34,700 --> 00:10:35,990
in alphabetical order.

141
00:10:36,500 --> 00:10:38,810
We only need to vary the values of the dictionary.

142
00:10:38,870 --> 00:10:42,980
In other words, the letters we want to map to now this can be a string.

143
00:10:43,700 --> 00:10:49,490
In fact, it's just the 26 letters of the alphabet organized in some order without any repetition.

144
00:10:50,180 --> 00:10:56,450
So our DNA is just the 26 letters of the alphabet, which can be in any order constrained by the fact

145
00:10:56,660 --> 00:10:58,400
that every letter must appear once.

146
00:10:58,610 --> 00:11:00,140
And no letter can be repeated.

147
00:11:00,860 --> 00:11:05,900
This is just the assumption of our model assuming that the code really does use the substitution cipher.

148
00:11:06,530 --> 00:11:10,280
Of course, if we had a different cipher, we would have to modify our DNA.

149
00:11:15,360 --> 00:11:21,480
From this, it's possible to write a function F that takes as input a DNA string and outputs a fitness

150
00:11:21,480 --> 00:11:21,960
value.

151
00:11:22,560 --> 00:11:26,670
This fitness value is just the log likelihood of the translated message.

152
00:11:27,180 --> 00:11:30,300
So at a high level, this is what our Function F looks like.

153
00:11:30,810 --> 00:11:36,330
Step number one is to take the DNA string and use this to decode the encrypted message.

154
00:11:37,020 --> 00:11:41,520
Step number two is to calculate the log likelihood of the decrypted message.

155
00:11:42,060 --> 00:11:43,920
Then the function returns this value.

156
00:11:44,700 --> 00:11:48,810
At this point, you should have a pretty good idea of how you would write this function in code.

157
00:11:53,980 --> 00:11:59,500
Actually, here's a good accounting exercise that you might want to do given an input DNA string.

158
00:11:59,590 --> 00:12:05,500
You should write a function that returns a dictionary which maps letters in alphabetical order, two

159
00:12:05,500 --> 00:12:07,870
letters in the order given by the DNA string.

160
00:12:08,380 --> 00:12:13,180
So, for example, in this DNA string, we have the letters FlexJobs and so on.

161
00:12:13,810 --> 00:12:16,600
You might recognize that this is the same mapping we had before.

162
00:12:17,410 --> 00:12:18,970
In this case, it should be mapped.

163
00:12:18,970 --> 00:12:23,720
L b should be mapped to X, C should be mapped, j and so on.

164
00:12:24,310 --> 00:12:31,300
Therefore, our dictionary should have a key a mapped to a value l a key be mapped to a value X. and

165
00:12:31,300 --> 00:12:31,900
so forth.

166
00:12:33,560 --> 00:12:39,530
So please make sure you can write such a function in Python code by yourself without any resources.

167
00:12:44,450 --> 00:12:48,380
At this point, we understand how to represent our DNA code in code.

168
00:12:49,040 --> 00:12:53,240
We also understand the high level picture of how a genetic algorithm should work.

169
00:12:53,720 --> 00:12:56,720
But what does it actually look like in code or pseudocode?

170
00:12:57,290 --> 00:12:59,300
That's what we'll discuss for the rest of this lecture.

171
00:13:00,380 --> 00:13:04,670
First, we have to consider what it even means to mutate our DNA in the first place.

172
00:13:05,570 --> 00:13:11,510
Earlier, we discussed this in the biological picture, where the A's, T's, C's and G's can be changed

173
00:13:11,510 --> 00:13:14,090
by insertion, deletion or substitution.

174
00:13:14,750 --> 00:13:18,980
But in our case, we have to ask the question Do these operations make sense?

175
00:13:19,910 --> 00:13:26,450
Recall that for us, we must have 26 input letters and 26 output letters are input letters are just

176
00:13:26,450 --> 00:13:28,250
the alphabet in alphabetical order.

177
00:13:28,880 --> 00:13:31,940
So it's the 26 output letters that form the DNA code.

178
00:13:33,470 --> 00:13:39,890
What happens if we do an insertion or a deletion, then the result will no longer be 26 letters.

179
00:13:40,370 --> 00:13:42,740
Therefore, these operations should not be allowed.

180
00:13:43,520 --> 00:13:47,810
What about substitution in this case after doing a substitution?

181
00:13:47,990 --> 00:13:50,510
We will have the same number of letters, so that's good.

182
00:13:51,080 --> 00:13:53,360
But remember that we want the letters to be unique.

183
00:13:53,870 --> 00:13:55,910
Therefore, this also cannot work.

184
00:14:01,120 --> 00:14:02,230
So what's the solution?

185
00:14:03,160 --> 00:14:07,660
In order to make sure we meet the constraints of our problem while still following the principle of

186
00:14:07,660 --> 00:14:09,790
mutation, we'll use swapping.

187
00:14:10,720 --> 00:14:14,530
Remember that for each generation, the mutation rate should be very small.

188
00:14:15,130 --> 00:14:20,080
Therefore, whenever we generate an offspring, there will be only one swap from the parents of the

189
00:14:20,080 --> 00:14:20,620
child.

190
00:14:21,310 --> 00:14:25,900
You can verify for yourself that swapping does not violate any of our constraints.

191
00:14:26,530 --> 00:14:30,850
If we swap two characters, the total number of characters is still 26.

192
00:14:31,480 --> 00:14:36,580
Furthermore, all the characters in the sequence are still unique, and we always have the same set

193
00:14:36,580 --> 00:14:38,050
of characters that we started with.

194
00:14:38,530 --> 00:14:40,780
The only thing that has changed is their position.

195
00:14:45,930 --> 00:14:48,360
Now that we understand how to mutate our DNA.

196
00:14:48,750 --> 00:14:50,940
Let's look at a very basic genetic algorithm.

197
00:14:52,290 --> 00:14:57,300
This isn't the final form of what we're using, but it will help us get started and lay the foundation.

198
00:14:58,320 --> 00:15:04,050
As with most optimization algorithms, our code will take the form of a loop to start well, first pick

199
00:15:04,050 --> 00:15:08,460
a random DNA strain and evaluate the log likelihood using our function F.

200
00:15:09,890 --> 00:15:15,680
Next ones are a loop that will iterate over a predetermined number of epochs inside the loop, we will

201
00:15:15,680 --> 00:15:16,890
mutate our DNA.

202
00:15:17,480 --> 00:15:23,090
More specifically, will choose a new DNA string such that two characters from the existing string get

203
00:15:23,100 --> 00:15:23,690
switched.

204
00:15:23,780 --> 00:15:25,340
We'll call this DNA new.

205
00:15:26,120 --> 00:15:33,110
Next will evaluate the new DNA using the same function f if the new DNA leads to a better log likelihood

206
00:15:33,110 --> 00:15:34,160
than the old DNA.

207
00:15:34,430 --> 00:15:36,170
Then we will keep the new DNA.

208
00:15:36,680 --> 00:15:38,450
Otherwise, we'll keep the old DNA.

209
00:15:39,790 --> 00:15:44,890
In the sense, if you want to keep with the evolution analogy, this means that the parent always has

210
00:15:44,890 --> 00:15:45,610
two children.

211
00:15:46,240 --> 00:15:50,890
One child is an exact copy of itself and another child is a mutated copy.

212
00:15:51,490 --> 00:15:53,530
Only one child survives to procreate.

213
00:15:53,560 --> 00:15:54,670
The next generation.

214
00:15:54,940 --> 00:15:57,700
And that child will also have two children the same way.

215
00:15:59,590 --> 00:16:05,020
This is just a fancy way of saying we'll keep whichever of the two days give us the better log likelihood,

216
00:16:05,050 --> 00:16:06,370
the old one or the new one.

217
00:16:07,090 --> 00:16:13,270
As you can imagine, as we keep doing this at each iteration, the next log likelihood we get must be

218
00:16:13,270 --> 00:16:14,800
at least as good as the old one.

219
00:16:15,490 --> 00:16:21,550
The fitness of the DNA should never decrease because we always keep the old DNA if the new DNA is worse.

220
00:16:22,150 --> 00:16:25,240
And this way, we should at least find a local maximum.

221
00:16:25,810 --> 00:16:28,210
You might recognize this as a form of hill climbing.

222
00:16:33,570 --> 00:16:38,970
There is a problem with this method, as is, unfortunately, it's too easy to get stuck at a local

223
00:16:38,970 --> 00:16:40,530
maximum that is suboptimal.

224
00:16:41,140 --> 00:16:45,030
In fact, if you think about real world biology, it does not work like this.

225
00:16:45,600 --> 00:16:50,730
We won't just have one organism that has only one mutated child who then goes on to have one mutated

226
00:16:50,730 --> 00:16:51,750
child so forth.

227
00:16:52,440 --> 00:16:55,440
Usually, we have an entire population of individuals.

228
00:16:57,210 --> 00:17:00,540
For example, we have seven billion humans on the planet Earth.

229
00:17:01,410 --> 00:17:08,069
Humans are constantly making more humans, creating new mutations and evolving very slowly in a mathematical

230
00:17:08,069 --> 00:17:08,550
sense.

231
00:17:08,849 --> 00:17:14,520
This will help us search many different directions and locations at once, which will improve our chances

232
00:17:14,520 --> 00:17:15,869
at finding the correct answer.

233
00:17:16,560 --> 00:17:22,349
So it's a phrase that in a simpler way, in each generation, we're always going to hold a population

234
00:17:22,349 --> 00:17:23,550
of DNA strings.

235
00:17:24,210 --> 00:17:30,390
Each of the DNA strings, which represent a unique individual, will have multiple offspring creating

236
00:17:30,390 --> 00:17:33,450
the next generation at each generation.

237
00:17:33,780 --> 00:17:36,300
Not every individual will survive to procreate.

238
00:17:36,810 --> 00:17:42,750
We'll keep only the fittest individuals that will create the next generation to phrase it even more

239
00:17:42,750 --> 00:17:43,320
simply.

240
00:17:43,350 --> 00:17:46,410
This is just the common saying survival of the fittest.

241
00:17:51,590 --> 00:17:57,920
OK, so how does this work in pseudocode to begin, we're going to create an entire pool of DNA strings.

242
00:17:58,520 --> 00:18:04,130
These will all be completely random so we can search multiple areas of the parameter space more efficiently.

243
00:18:04,970 --> 00:18:10,340
Let's say we create a pool of 20 individuals, and each generation will always be a size 20.

244
00:18:11,710 --> 00:18:15,640
Next, we're going to enter a loop that iterates for a predetermined number of steps.

245
00:18:17,210 --> 00:18:20,180
Inside the loop, we're going to check if it's the first iteration.

246
00:18:20,840 --> 00:18:25,670
If it is, then we'll just do nothing because we already have the 20 individuals in our pool.

247
00:18:26,420 --> 00:18:28,670
If it's not, then we'll create some offspring.

248
00:18:29,330 --> 00:18:34,160
Specifically, we'll create three offspring per parent, which you'll understand why as we go on.

249
00:18:35,220 --> 00:18:41,220
So next, we need to live through the industry in our pool and evaluate its fitness, once we've done

250
00:18:41,220 --> 00:18:44,310
that, we can sort each day string by its fitness.

251
00:18:44,850 --> 00:18:47,970
We'll keep only the top five fittest individuals.

252
00:18:48,510 --> 00:18:52,020
So now you understand why we created three mutated children per parent.

253
00:18:53,910 --> 00:18:57,510
Say we have five parents, we create three mutations per parent.

254
00:18:58,080 --> 00:18:59,790
Then we have 15 children.

255
00:19:00,420 --> 00:19:03,270
But remember that we would also like to copy the parent directly.

256
00:19:03,840 --> 00:19:08,040
So in total, that means we have 15 plus five individuals, which is 20.

257
00:19:08,700 --> 00:19:13,260
Therefore, on each iteration of the loop, we will always have 20 individuals in total.

258
00:19:14,460 --> 00:19:16,050
Now, of course, this is not necessary.

259
00:19:16,050 --> 00:19:18,000
This is just the way I decided to build it.

260
00:19:19,020 --> 00:19:23,280
The hope is that by the end of this loop, we will have converged to the solution.

261
00:19:23,640 --> 00:19:26,640
Finding the best DNA to decrypt are encoded message.

262
00:19:27,150 --> 00:19:32,910
This is the DNA that leads to the maximum likelihood of the decrypted message under our language model.

263
00:19:37,940 --> 00:19:40,280
Let's summarize this lecture since it's been quite long.

264
00:19:41,300 --> 00:19:46,760
First, we discuss the biological side of evolution and why that leads to a better fitness over time

265
00:19:47,660 --> 00:19:49,100
in the evolutionary sense.

266
00:19:49,130 --> 00:19:51,710
Fitness is just who survives and procreate.

267
00:19:52,190 --> 00:19:56,720
This has nothing to do with how successful someone is in their life or anything like that.

268
00:19:57,320 --> 00:20:02,630
For example, you could be a genius billionaire who doesn't want to have children so you choose not

269
00:20:02,630 --> 00:20:07,910
to procreate, which means you're not successful in the evolutionary sense, but you are successful

270
00:20:07,910 --> 00:20:11,000
by most people's standards, which is a completely different concept.

271
00:20:12,410 --> 00:20:14,510
Well, we are doing is just cold, hard math.

272
00:20:15,140 --> 00:20:21,080
So in this lecture, the challenge was to relate this idea of genetic evolution to our decryption problem.

273
00:20:21,800 --> 00:20:28,340
We recognize that our substitution cipher can be represented by a simple DNA string with some constraints.

274
00:20:30,070 --> 00:20:34,600
Unlike real DNA, we can't do operations like insertion, deletion and substitution.

275
00:20:35,170 --> 00:20:39,100
What we can do is swapping since the number of letters must remain the same.

276
00:20:39,340 --> 00:20:41,530
And we must include all letters of the alphabet.

277
00:20:42,430 --> 00:20:47,950
In our case, the fitness of each DNA string is simply the log likelihood of the decrypted message.

278
00:20:49,420 --> 00:20:55,270
As we discussed previously, messages in real English should have a higher log likelihood than messages

279
00:20:55,270 --> 00:20:56,410
with random letters.

280
00:20:57,640 --> 00:21:01,750
Next, we developed the pseudo code that will implement our version of evolution.

281
00:21:02,530 --> 00:21:08,890
We started with a basic hill climbing strategy, but we extended that idea to include an entire population

282
00:21:08,890 --> 00:21:10,000
of individuals.

283
00:21:10,570 --> 00:21:16,180
Each individual would create more than one offspring at each generation, which allows for more variation

284
00:21:16,180 --> 00:21:18,460
and a higher chance of reaching the optimal value.