1
00:00:02,220 --> 00:00:06,600
Hello everyone, and welcome back to this class, Natural Language Processing in Python.

2
00:00:11,720 --> 00:00:16,640
In this lecture, we will continue looking at our code to implement a cipher decryption algorithm.

3
00:00:17,480 --> 00:00:22,670
This lecture will focus on creating functionality to encode and decode a message.

4
00:00:23,390 --> 00:00:26,420
The first thing we need, obviously, is an actual message.

5
00:00:27,020 --> 00:00:33,020
This message is a paragraph I took from another classic book, The Adventures of Sherlock Holmes.

6
00:00:35,690 --> 00:00:41,270
So below this, I've commented out a few more lines, which are other paragraphs from the same book.

7
00:00:42,110 --> 00:00:45,860
The results we get at the end will depend on what the original message was.

8
00:00:46,130 --> 00:00:51,350
So you can try adding these to the message if you like to see what kind of results you get and what

9
00:00:51,350 --> 00:00:54,500
kind of tuning you have to do, if any, to get it to work.

10
00:00:55,130 --> 00:00:58,040
For example, you might change the size of the DNA pool,

11
00:00:58,280 --> 00:01:01,520
the number of children per parent, or the number of epochs.

12
00:01:08,090 --> 00:01:15,020
Next, we have a function to encode a message. It takes as input a raw message, such as the above, and

13
00:01:15,020 --> 00:01:20,000
returns ciphertext representing that message using the substitution cipher

14
00:01:20,210 --> 00:01:23,870
we defined at the start of the script. Inside the function,

15
00:01:23,930 --> 00:01:26,810
the first thing we do is lowercase all the letters.

16
00:01:27,290 --> 00:01:30,650
This is because our mapping only contains lowercase letters.

17
00:01:32,460 --> 00:01:37,740
Next, we use our regex from earlier to replace all non-alpha characters with a space.

18
00:01:38,340 --> 00:01:42,000
This is because we do not have any mappings for non-alpha characters.

19
00:01:42,750 --> 00:01:47,580
Another option would be to simply leave those characters as is, although that would take some additional

20
00:01:47,580 --> 00:01:48,090
work.
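
The two preprocessing steps just described can be sketched as follows. This is only an illustration: the pattern `[^a-z]` and the helper name `normalize` are assumptions, since the regex the lecture refers to is defined elsewhere in the script and is not shown here.

```python
import re

# Assumed pattern: matches anything that is not a lowercase letter.
non_alpha = re.compile(r"[^a-z]")

def normalize(msg):
    """Lowercase the message, then replace every non-letter with a space."""
    msg = msg.lower()
    return non_alpha.sub(" ", msg)

print(normalize("Hello, World!"))
```

Note that each non-letter becomes its own space, so the comma and the space after it turn into two consecutive spaces.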

21
00:01:49,110 --> 00:01:52,080
Next, we enter a loop to build up the coded message.

22
00:01:52,710 --> 00:01:58,710
We start by initializing the coded message as an empty list. We'll be converting and storing each character

23
00:01:58,710 --> 00:01:59,520
one at a time.

24
00:02:03,510 --> 00:02:07,530
Inside the loop, we'll initialize coded_ch to be just ch.

25
00:02:08,039 --> 00:02:13,170
This is because if the character is a non-alphabetical character, then there is no mapping for it.

26
00:02:14,070 --> 00:02:16,440
Next, we check if ch is in our dictionary,

27
00:02:16,470 --> 00:02:22,350
true_mapping. If it is, we index true_mapping by ch to get the cipher character.

28
00:02:22,980 --> 00:02:24,960
We assign this to coded_ch.

29
00:02:25,620 --> 00:02:28,590
Next, we append coded_ch to the coded message.

30
00:02:29,370 --> 00:02:34,500
Finally, when we've completed the loop, we join each character in coded message to create a final

31
00:02:34,500 --> 00:02:34,890
string.
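
Putting the steps above together, the encode function might look roughly like this. It is a minimal sketch, not necessarily the lecture's exact code: the names `true_mapping`, `non_alpha`, `coded_msg`, and `coded_ch` are assumed from the narration, and the mapping here is built by a simple random shuffle purely so the example runs on its own.

```python
import random
import re
import string

# Assumed setup: a random one-to-one substitution over lowercase letters.
letters = list(string.ascii_lowercase)
shuffled = letters.copy()
random.shuffle(shuffled)
true_mapping = dict(zip(letters, shuffled))

# Assumed pattern: anything that is not a lowercase letter.
non_alpha = re.compile(r"[^a-z]")

def encode_message(msg):
    # The mapping only contains lowercase letters.
    msg = msg.lower()
    # There is no mapping for non-alpha characters, so turn them into spaces.
    msg = non_alpha.sub(" ", msg)

    coded_msg = []
    for ch in msg:
        coded_ch = ch  # default: keep the character (it may be a space)
        if ch in true_mapping:
            coded_ch = true_mapping[ch]
        coded_msg.append(coded_ch)

    # Join the individual characters into the final string.
    return "".join(coded_msg)

print(encode_message("I then lounged down the street."))
```

Building a list and joining once at the end avoids repeatedly concatenating strings inside the loop.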

32
00:02:41,300 --> 00:02:47,930
Next, outside the function, we use our encode_message function to encode the original message, and

33
00:02:47,930 --> 00:02:50,210
we'll call this variable encoded_message.

34
00:02:55,370 --> 00:03:01,400
Next, we have a function to decode the message. This takes in two arguments: a ciphertext message and

35
00:03:01,400 --> 00:03:02,120
a word map.

36
00:03:03,820 --> 00:03:06,900
This function largely works the same as the encode function.

37
00:03:07,150 --> 00:03:12,010
It just maps each character in the input to a different character, according to the word map.

38
00:03:12,550 --> 00:03:18,160
It's just that this word map is an argument to the function, whereas for encoding the word map was

39
00:03:18,160 --> 00:03:19,870
the true mapping we defined earlier.

40
00:03:20,710 --> 00:03:24,160
So to start, we create an empty list called decoded_message.

41
00:03:26,030 --> 00:03:29,960
Then we loop through each character in the message. Inside the loop,

42
00:03:29,990 --> 00:03:32,720
we assign ch to decoded_ch.

43
00:03:33,230 --> 00:03:37,580
Remember, this might be a non-alpha character, in which case we would not do anything to it.

44
00:03:38,510 --> 00:03:40,770
Next, we check if ch is in our word map.

45
00:03:41,360 --> 00:03:46,850
If it is, we index the word map using ch and assign the result to decoded_ch.

46
00:03:47,780 --> 00:03:52,100
Then we append that decoded_ch to the variable decoded_message.

47
00:03:54,230 --> 00:03:58,100
Finally, we return the decoded message joined into a single string.
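
The decode steps above can be sketched in the same way. Again, this is an illustration under assumptions: the identifier names follow the narration, and the word map used in the usage example is a fixed reversed-alphabet substitution chosen only so the result is predictable, not the mapping from the lecture.

```python
import string

def decode_message(msg, word_map):
    decoded_message = []
    for ch in msg:
        decoded_ch = ch  # default: non-alpha characters pass through unchanged
        if ch in word_map:
            decoded_ch = word_map[ch]
        decoded_message.append(decoded_ch)
    # Join the individual characters into a single string.
    return "".join(decoded_message)

# Hypothetical word map for illustration: a -> z, b -> y, ..., z -> a.
letters = string.ascii_lowercase
word_map = dict(zip(letters, reversed(letters)))

print(decode_message("svool dliow", word_map))  # hello world
```

Because the word map is passed in as an argument, the same function can be called with any candidate mapping, not only the true one.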

