1
00:00:02,240 --> 00:00:06,890
Hello everyone, and welcome back to this class, Natural Language Processing in Python.

2
00:00:11,640 --> 00:00:17,400
In this lecture, we're going to talk about ciphers. Specifically, we're interested in one of the simplest

3
00:00:17,400 --> 00:00:19,350
ciphers, the substitution cipher.

4
00:00:20,040 --> 00:00:22,980
Basically, this should be exactly what you think it is.

5
00:00:23,220 --> 00:00:27,960
We create our encoded message by substituting each letter with a different letter.

6
00:00:33,100 --> 00:00:35,680
Let's start with some terminology and a high-level picture.

7
00:00:36,460 --> 00:00:42,040
First, we're going to assume that there is some sender and some receiver, and the sender is sending

8
00:00:42,040 --> 00:00:43,720
a message to the receiver.

9
00:00:44,500 --> 00:00:47,590
You are a spy who has intercepted this message.

10
00:00:47,920 --> 00:00:51,190
But of course, it's encrypted, so you can't just read it plainly.

11
00:00:52,120 --> 00:00:58,510
The message was encrypted by the sender before sending the message, and the receiver will decrypt the

12
00:00:58,510 --> 00:01:00,160
message upon receiving it.

13
00:01:02,060 --> 00:01:07,640
Both the sender and the receiver have a dictionary or a mapping such that they can look up which

14
00:01:07,640 --> 00:01:12,830
letters in the unencrypted message get mapped to which letters in the encrypted message.

15
00:01:13,430 --> 00:01:19,040
This is a one-to-one mapping, so in order to decrypt the message, we just have to reverse the mapping.

16
00:01:19,790 --> 00:01:25,730
Of course, you as the spy don't have this mapping, so therefore you can't read the message without

17
00:01:25,730 --> 00:01:29,450
having to do some extra work, which is what this section is all about.
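The one-to-one mapping and its reversal can be sketched in Python. This is a minimal illustration, not the course's own code: the names `make_key` and `invert_key` are hypothetical, and the key is simply a random shuffle of the lowercase alphabet (so a letter may occasionally map to itself).

```python
import random
import string

def make_key(seed=None):
    """Return a random one-to-one mapping: plaintext letter -> ciphertext letter."""
    rng = random.Random(seed)
    letters = list(string.ascii_lowercase)
    shuffled = letters[:]
    rng.shuffle(shuffled)  # a permutation, so no two letters share a target
    return dict(zip(letters, shuffled))

def invert_key(key):
    """Reverse the mapping: ciphertext letter -> plaintext letter."""
    return {cipher: plain for plain, cipher in key.items()}

key = make_key(seed=0)      # what the sender uses to encrypt
inverse = invert_key(key)   # what the receiver uses to decrypt
```

Because the mapping is a permutation of the alphabet, inverting the dictionary loses nothing: looking up `inverse[key[c]]` gives back `c` for every letter.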

18
00:01:34,580 --> 00:01:36,230
Let's do a very simple example,

19
00:01:36,410 --> 00:01:37,760
so you get the basic idea.

20
00:01:38,630 --> 00:01:41,630
Suppose the message we want to send is "I like cats."

21
00:01:42,410 --> 00:01:44,330
Now we need to encrypt this message.

22
00:01:44,870 --> 00:01:49,670
We're given this mapping, which tells us how to translate each of the letters here into its encrypted

23
00:01:49,670 --> 00:01:50,090
form.

24
00:01:50,780 --> 00:01:55,070
Since there are 26 letters in the alphabet, let's just focus on the letters we need.

25
00:01:55,760 --> 00:02:01,890
Specifically, I should be replaced with Y, L should be replaced with W, I, again,

26
00:02:01,910 --> 00:02:08,810
should be replaced with Y, K should be replaced with R, E should be replaced with N, C should be replaced

27
00:02:08,810 --> 00:02:09,440
with J,

28
00:02:09,919 --> 00:02:15,470
A should be replaced with L, T should be replaced with O, and S should be replaced with B.

29
00:02:17,870 --> 00:02:26,890
So our encrypted message becomes "Y WYRN JLOB". Therefore, our sender will send the message

30
00:02:27,100 --> 00:02:33,950
"Y WYRN JLOB" across the internet without transmitting the key. It's assumed that both the

31
00:02:33,950 --> 00:02:37,670
sender and receiver already know how to decrypt the message.
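The encryption step from this example can be sketched as follows. The dictionary holds only the letter pairs given in the lecture; the function name `encrypt`, the lowercasing of the message, and the choice to pass spaces through unchanged are assumptions made for illustration.

```python
# Partial mapping from the lecture: only the letters needed for "I like cats".
MAPPING = {
    "i": "y", "l": "w", "k": "r", "e": "n",
    "c": "j", "a": "l", "t": "o", "s": "b",
}

def encrypt(message, mapping):
    """Substitute each letter via the mapping; characters not in it (e.g. spaces) pass through."""
    return "".join(mapping.get(ch, ch) for ch in message.lower())

ciphertext = encrypt("I like cats", MAPPING)
print(ciphertext)  # y wyrn jlob
```

A real key would cover all 26 letters; with this partial mapping, any letter outside the eight listed would be left unencrypted.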

32
00:02:42,810 --> 00:02:47,790
Next, let's pretend that we are the receiver and we just received this encrypted message.

33
00:02:48,240 --> 00:02:50,640
Our job is now to decrypt this message.

34
00:02:51,300 --> 00:02:57,510
In order to do this, we must apply the same mapping as before, but in reverse order. Specifically,

35
00:02:57,900 --> 00:03:02,700
Y should be replaced with I, W should be replaced with L, Y, again,

36
00:03:02,700 --> 00:03:03,990
should be replaced with I,

37
00:03:04,440 --> 00:03:10,830
R should be replaced with K, N should be replaced with E, J should be replaced with C, L should be

38
00:03:10,830 --> 00:03:15,780
replaced with A, O should be replaced with T, and B should be replaced with S.

39
00:03:16,290 --> 00:03:21,000
Therefore, "Y WYRN JLOB" becomes "I like cats."
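Decryption, as the receiver would do it, might look like the sketch below. It rebuilds the lecture's partial mapping, flips it, and applies the flipped version; the names `REVERSE` and `decrypt` are hypothetical, and the output is lowercase because the sketch ignores capitalization.

```python
# Same partial mapping as in the lecture's example (plaintext -> ciphertext).
MAPPING = {
    "i": "y", "l": "w", "k": "r", "e": "n",
    "c": "j", "a": "l", "t": "o", "s": "b",
}

# Swap keys and values to get ciphertext -> plaintext.
REVERSE = {cipher: plain for plain, cipher in MAPPING.items()}

def decrypt(ciphertext, reverse_mapping):
    """Map each ciphertext letter back to its plaintext letter; other characters pass through."""
    return "".join(reverse_mapping.get(ch, ch) for ch in ciphertext.lower())

plaintext = decrypt("y wyrn jlob", REVERSE)
print(plaintext)  # i like cats
```

The flip is only safe because no two plaintext letters share a ciphertext letter; if they did, the reversed dictionary would silently drop one of them.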

40
00:03:26,180 --> 00:03:31,700
So hopefully, this lecture helps you to understand the basic mechanism behind the substitution cipher

41
00:03:31,940 --> 00:03:38,240
and to understand the basic terminology. You know how to encrypt and decrypt a message and why we want

42
00:03:38,240 --> 00:03:40,040
to solve this problem in the first place.

43
00:03:40,700 --> 00:03:47,270
You now understand your role in the process, which is to crack the code and decrypt the messages without

44
00:03:47,270 --> 00:03:48,230
being given the key.

45
00:03:49,100 --> 00:03:54,140
In the next few lectures, we'll talk about how our understanding of probability will help us in that

46
00:03:54,140 --> 00:03:54,620
pursuit.

