WEBVTT

0
00:01.230 --> 00:08.640
All right, so we've taken a look at level one authentication, which is basically just storing the password

1
00:08.640 --> 00:11.220
as plain text in our database.

2
00:11.370 --> 00:18.770
So maybe it'll be a little bit difficult for people to get access to our server and access our database.

3
00:18.810 --> 00:20.160
At least you can't just simply

4
00:20.160 --> 00:20.490
right-

5
00:20.490 --> 00:25.690
click on a website to view page source and be able to see it in the HTML.

6
00:25.710 --> 00:30.140
At least it's stored at server level. But that's not really good enough.

7
00:30.150 --> 00:37.150
So let's go ahead and see what we can do to improve the security for our users on our website.

8
00:37.470 --> 00:47.710
So let's increase to level two authentication. And level two authentication involves the use of encryption.

9
00:48.480 --> 00:50.520
So what exactly is encryption?

10
00:50.550 --> 00:57.900
Well, basically all it is is just scrambling something so that people can't tell what the original

11
00:57.900 --> 01:02.890
was unless they were in on the secret and they knew how to unscramble it.

12
01:03.240 --> 01:08.610
This is exactly the same as if you and your friend were sending each other secret messages and you had

13
01:08.610 --> 01:14.010
a key that you both knew, so that you could encode and decode the messages.

14
01:14.970 --> 01:21.480
Now, on a bigger scale, if you've ever watched The Imitation Game or read about the Enigma machine,

15
01:21.690 --> 01:24.750
well, that is basically a form of encryption.

16
01:25.230 --> 01:31.320
And the Enigma machine, if you don't know, is just simply a machine that was used during World War

17
01:31.320 --> 01:34.890
2 when the Germans would send each other messages,

18
01:35.100 --> 01:41.280
they would use the machine to encrypt those messages so that when the messages are intercepted, say,

19
01:41.280 --> 01:50.220
over the radio, unless you had the same Enigma machine and you knew what the decoding key was or what

20
01:50.220 --> 01:54.660
the settings were for the machine, then you wouldn't be able to tell what it is that they were trying

21
01:54.660 --> 01:56.000
to communicate with each other.

22
01:56.310 --> 02:03.660
If you're interested, I really recommend watching two videos that were done by Numberphile on YouTube and

23
02:03.660 --> 02:06.210
I've linked to them in the course resources list.

24
02:06.540 --> 02:14.310
They explain the Enigma machine and talk about the flaw in it that led Alan Turing

25
02:14.310 --> 02:20.760
and other people at Bletchley Park to be able to crack the code and create what was very much a specialized

26
02:20.760 --> 02:25.920
computer to be able to decode those messages and helped the allies win the war.

27
02:26.310 --> 02:32.010
And if you ever visit England, be sure to go and check out Bletchley Park. They have a computer museum

28
02:32.010 --> 02:34.230
next to it as well, which is super fascinating.

29
02:34.740 --> 02:36.020
Anyways, I digress.

30
02:36.030 --> 02:36.810
So back to

31
02:36.810 --> 02:43.590
ciphers and encryption, one of the earliest ways of encrypting messages that we know about is the Caesar

32
02:43.590 --> 02:44.120
cipher.

33
02:44.610 --> 02:50.850
And this comes from Julius Caesar, who was one of the great generals of ancient Rome.

34
02:51.000 --> 02:57.770
And what he did is he would send messages to his generals and he would encrypt it

35
02:58.020 --> 03:03.330
so if his messenger got murdered along the way, then his messages would be kept secret.

36
03:03.990 --> 03:09.090
And this is one of the simplest forms of encryption we know about

37
03:09.360 --> 03:10.550
and it's very simple.

38
03:10.560 --> 03:16.080
Let's say we have the alphabet, right? ABCDEFG. All that the Caesar Cipher does

39
03:16.080 --> 03:18.540
is substitute each letter with another letter further along the alphabet.

40
03:18.660 --> 03:24.820
And the key for the cipher is the number of letters that you would shift by.

41
03:24.870 --> 03:31.900
So if you knew what the shift pattern was, then you could really quickly decipher the message.

42
03:32.340 --> 03:38.210
So if we were to encrypt the word hello, there's a really neat tool online that can help us do that.

43
03:38.220 --> 03:41.850
It's called cryptii.com and it's got two 'i's at the end.

44
03:42.300 --> 03:47.760
And you can basically choose the kind of cipher or encryption that you want to use

45
03:48.120 --> 03:52.550
and then you can specify the shift and we're going to say a shift of three, let's say.

46
03:52.800 --> 04:00.930
So if my text was hello, then it becomes shifted into khoor. And to an unknowing person and a non-cryptographer,

47
04:01.200 --> 04:07.950
it can be quite difficult to see at a glance what exactly this is trying to say. Now in modern days and

48
04:07.950 --> 04:14.010
with modern cryptography, this is overly simplistic and it's very, very easy to crack.
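
NOTE
Editor's sketch of the Caesar cipher described above, written in Python.
The function name caesar_encrypt is made up for this example, and it only
handles lowercase a-z. It reproduces the hello to khoor shift of three.
```python
import string
def caesar_encrypt(text: str, shift: int) -> str:
    """Shift each lowercase letter forward by `shift`, wrapping around after z."""
    alphabet = string.ascii_lowercase
    # Rotate the alphabet so that position 0 holds the letter `shift` places along.
    shifted = alphabet[shift % 26:] + alphabet[:shift % 26]
    return text.translate(str.maketrans(alphabet, shifted))
print(caesar_encrypt("hello", 3))  # prints khoor
```
The shift is the whole key here, so anyone who knows it is three can read the message.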

49
04:14.460 --> 04:20.550
But there are other forms of encryption which are a little bit more complicated and it involves a lot

50
04:20.550 --> 04:25.380
more math to make it more time consuming for somebody to crack.

51
04:25.770 --> 04:29.760
But essentially all of these encryption methods work the same way.

52
04:30.150 --> 04:38.730
You have a way of scrambling your message and it requires a key to be able to unscramble that message.

53
04:39.450 --> 04:39.930
All right.

54
04:39.930 --> 04:44.520
So now it's time to level up to the next level of security.

55
04:44.760 --> 04:48.630
And in this lesson, we're going to cover something called hashing.

56
04:49.620 --> 04:56.910
Now, previously, we've already looked at encryption, so taking the user's password and securing it

57
04:56.910 --> 05:00.120
using an encryption key, and then 

58
05:00.140 --> 05:06.560
using a particular cipher method, be it a Caesar cipher or the Enigma cipher, no matter which way

59
05:06.560 --> 05:12.440
we chose, we always had a password, a key, and we ended up with some ciphertext which will make it

60
05:12.440 --> 05:16.940
hard for people to be able to immediately guess what our user's password is.

61
05:17.120 --> 05:23.390
So, for example, if we took a password like qwerty and we use the Caesar cipher method and we decided to

62
05:23.390 --> 05:27.440
shift it by one, then our encryption key is the number one.

63
05:27.860 --> 05:32.880
And that creates the ciphertext where every single letter is shifted up by one.

64
05:33.350 --> 05:39.110
Now, in order to decrypt this, all you have to do, as long as you know what the key is, then you

65
05:39.110 --> 05:45.980
can simply shift all of the ciphertext down by one and you end up with the original password.
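
NOTE
Editor's sketch of the qwerty example in Python: encrypt with a key of one,
then decrypt by shifting back by the same key. The helper name caesar_shift
is an assumption for illustration and it only touches the letters a-z.
```python
def caesar_shift(text: str, key: int) -> str:
    """Shift each letter a-z by `key` places; a negative key shifts back."""
    result = []
    for ch in text:
        if "a" <= ch <= "z":
            result.append(chr((ord(ch) - ord("a") + key) % 26 + ord("a")))
        else:
            result.append(ch)  # leave anything that is not a-z untouched
    return "".join(result)
key = 1
ciphertext = caesar_shift("qwerty", key)    # shift every letter up by one
plaintext = caesar_shift(ciphertext, -key)  # shift back down with the same key
print(ciphertext)  # prints rxfsuz
print(plaintext)   # prints qwerty
```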

66
05:46.340 --> 05:51.650
Now, the Caesar cipher is a very, very weak encryption method.

67
05:51.650 --> 05:58.310
It's incredibly easy to figure out what the original text was, even if you didn't have a key.
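
NOTE
Editor's sketch of why the Caesar cipher is so weak: there are only 25
possible shifts, so you can simply try all of them. The names caesar_shift
and brute_force are made up for this example.
```python
def caesar_shift(text: str, key: int) -> str:
    """Shift each letter a-z by `key` places; a negative key shifts back."""
    return "".join(
        chr((ord(ch) - ord("a") + key) % 26 + ord("a")) if "a" <= ch <= "z" else ch
        for ch in text
    )
def brute_force(ciphertext: str) -> dict[int, str]:
    """Try every possible key and return the candidate plaintext for each one."""
    return {key: caesar_shift(ciphertext, -key) for key in range(1, 26)}
candidates = brute_force("khoor")
for key, guess in candidates.items():
    print(key, guess)  # one of the 25 lines reads: 3 hello
```
A person (or a dictionary check) only has to scan 25 lines to spot the readable one.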

68
05:58.880 --> 06:04.790
And just to illustrate what bad things can happen when you have a weak encryption system, I'm going

69
06:04.790 --> 06:11.110
to tell you a story from history that tells us why we should not be using a weak encryption system.

70
06:11.690 --> 06:18.770
So back in the 1500s, on the island that we now call Great Britain, there used to be two large

71
06:18.770 --> 06:19.400
areas.

72
06:19.670 --> 06:23.810
One was Scotland and the other was England.

73
06:24.320 --> 06:27.080
And they were ruled over by two Queens.

74
06:27.380 --> 06:33.860
Scotland was ruled by Mary Queen of Scots, who was a Catholic, and England was ruled over by Queen

75
06:33.860 --> 06:35.210
Elizabeth the first.

76
06:35.630 --> 06:41.330
Now, these two ladies between them controlled the land that we now call Great Britain, but they each wanted

77
06:41.330 --> 06:43.790
to have more power and more land.

78
06:44.300 --> 06:45.990
So what did they do?

79
06:46.010 --> 06:53.870
Well, Mary Queen of Scots who ruled over Scotland decided to plot with her friend, Anthony Babington,

80
06:54.200 --> 06:56.800
to try and assassinate Queen Elizabeth.

81
06:57.260 --> 07:02.720
That way, she would be the legitimate heir to both the English and Scottish thrones,

82
07:03.050 --> 07:06.920
and it was kind of a Game of Thrones kind of situation going on back then.

83
07:07.460 --> 07:13.910
But in order to mobilize their forces or try to come up with some sort of secret plan, they decided

84
07:13.910 --> 07:17.800
to send letters to each other using ciphertext.

85
07:18.020 --> 07:25.850
So they came up with a system to encrypt their letters to each other such that if it fell into the wrong

86
07:25.850 --> 07:32.800
hands, the subject of the letter wouldn't be revealed and they wouldn't end up being tried for treason.

87
07:32.990 --> 07:39.260
But the problem was that the encryption method that they used, which was a letter substitution method

88
07:39.410 --> 07:44.510
similar to the Caesar cipher, was a very weak form of encryption.

89
07:45.110 --> 07:54.230
And Queen Elizabeth had a chief decoder who ended up deciphering their letters and figuring out what

90
07:54.230 --> 07:56.220
their encryption key was.

91
07:56.420 --> 08:04.400
So he decided to take this encryption key and write a letter back to Anthony Babington to try and get him

92
08:04.400 --> 08:07.490
to reveal all of the co-conspirators.

93
08:07.880 --> 08:11.330
And what was the end result of having their weak encryption system?

94
08:11.720 --> 08:19.250
Well, Queen Elizabeth decided to accuse Mary Queen of Scots of treason, and hence Mary ended up having

95
08:19.250 --> 08:20.360
her head chopped off.

96
08:20.750 --> 08:25.420
So this is not what you want to happen to you or your website.

97
08:25.940 --> 08:33.320
So weak encryption systems can end up putting user passwords at risk and your company might end up metaphorically

98
08:33.320 --> 08:39.230
decapitated, such as in the case of companies like TalkTalk or Equifax, where they ended up getting

99
08:39.230 --> 08:42.200
hacked and lost a lot of the trust of their users.

100
08:42.440 --> 08:48.410
Now, if you're interested in more stories like this and to learn more about cryptography and encryption,

101
08:48.590 --> 08:52.820
there's a really great book recommendation I would make called The Code Book by Simon Singh.

102
08:53.000 --> 08:56.480
It contains stories like the one that I just told you and more.

103
08:56.540 --> 08:59.380
So if you're interested in this, go ahead and read more about it.

104
08:59.720 --> 09:03.500
Now, how can we make our passwords more secure? Now,

105
09:03.530 --> 09:09.920
at the moment, the biggest flaw in our authentication method is the fact that we need an encryption

106
09:09.920 --> 09:14.450
key to encrypt our passwords and decrypt our passwords.

107
09:14.870 --> 09:22.040
And chances are that if somebody is motivated enough to spend time and hack into your database, then

108
09:22.040 --> 09:29.150
it's probably not that difficult for them to also be able to get your encryption key, even if you've

109
09:29.150 --> 09:33.480
saved it in an environment variable or somewhere secure on your server.

110
09:33.950 --> 09:38.880
So how can we address this weakest link, the need for that encryption key?

111
09:39.140 --> 09:45.380
Well, here is where hashing comes into play. Whereas previously with encryption we needed that encryption

112
09:45.380 --> 09:52.560
key, hashing removes the need for an encryption key altogether.

113
09:53.180 --> 09:59.450
Well, then you might ask, well, if we don't have an encryption key, how can we decrypt our password

114
09:59.450 --> 09:59.990
back into

115
10:00.060 --> 10:08.520
plain text? Well, the secret is you don't. Let's say a user registers on our website and they enter

116
10:08.520 --> 10:16.560
a password to register with, we use something called a hash function to turn that password into a hash

117
10:16.560 --> 10:19.500
and we store that hash in our database.

118
10:20.160 --> 10:29.700
Now, the key thing is that hash functions are mathematical functions that are designed to make it almost

119
10:29.700 --> 10:32.280
impossible to go backwards.

120
10:32.560 --> 10:38.910
So it's almost impossible to turn a hash back into a password.

121
10:39.150 --> 10:41.130
How is this possible, you might ask?

122
10:41.160 --> 10:48.690
How is it possible that you can turn a password into a hash very quickly and easily, but make it almost

123
10:48.690 --> 10:52.080
impossible to turn that hash back into a password?

124
10:52.800 --> 10:53.880
Well, here's a question.

125
10:54.600 --> 11:01.710
Let me ask you, what are the factors of 377 other than one and 377?

126
11:02.040 --> 11:05.650
So basically, I'm saying 377 is not a prime number.

127
11:06.090 --> 11:14.490
Not only can you divide 377 by 1 and 377, but there are also two other numbers that you

128
11:14.490 --> 11:15.450
can divide it by.

129
11:15.870 --> 11:19.260
Now it's your job to figure out what those numbers are.

130
11:20.130 --> 11:21.420
So, what might you do?

131
11:21.450 --> 11:23.130
Well, you might divide it by two.

132
11:23.490 --> 11:26.090
OK, so that becomes 188.5.

133
11:26.130 --> 11:27.300
That's not a whole number

134
11:27.300 --> 11:28.880
so 2 is not a factor.

135
11:29.280 --> 11:30.610
What if you divide it by three?

136
11:30.630 --> 11:36.290
Well, that becomes 125.6 recurring, which is also not a whole number.

137
11:36.450 --> 11:38.870
So three is not a factor either.

138
11:39.180 --> 11:45.940
And you might go through this process for a long time, tediously going through number by number.

139
11:45.990 --> 11:53.010
Well, then you might arrive at the point where you divide 377 by 13 and you end up with

140
11:53.010 --> 11:53.680
29.

141
11:54.060 --> 12:04.140
So 13 and 29 are the answers to this question. They are the only factors of 377 other than 1 and

142
12:04.140 --> 12:05.000
377.
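
NOTE
Editor's sketch of the 377 question in Python. Going backward means trying
divisor after divisor, while going forward is a single multiplication. The
function name nontrivial_factors is made up for this example.
```python
def nontrivial_factors(n: int) -> list[int]:
    """Find the factors of n other than 1 and n by trying every divisor in turn."""
    return [d for d in range(2, n) if n % d == 0]
print(nontrivial_factors(377))  # prints [13, 29], after hundreds of trial divisions
print(13 * 29)                  # prints 377, after one multiplication
```
For a number this small both directions finish instantly on a computer; the gap only becomes painful as the numbers get very large.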

143
12:05.640 --> 12:12.750
And as you can see, that process of getting to this point of finding those two factors took us a while,

144
12:12.750 --> 12:13.130
right?

145
12:13.140 --> 12:14.400
It wasn't that easy.

146
12:14.910 --> 12:17.400
But consider if I asked you a different question.

147
12:17.400 --> 12:20.990
If I said to you, can you multiply 13 by 29?

148
12:21.330 --> 12:24.900
Well, you would be able to do that really quickly and easily.

149
12:24.900 --> 12:30.050
It would take you almost no time at all to figure out that the answer is 377.

150
12:30.870 --> 12:36.880
So here is a very, very simplified analogy for a hash function.

151
12:37.350 --> 12:44.790
So going forward, multiplying 13 by 29 is really quick and easy, but going backward, trying

152
12:44.790 --> 12:46.440
to get back those numbers

153
12:46.440 --> 12:52.360
13 and 29 starting from 377 is very, very time consuming.

154
12:52.770 --> 12:56.120
So this is essentially how a hash function works.

155
12:56.520 --> 13:01.340
Just add a little bit more complexity and you end up with a real hash function.

156
13:01.590 --> 13:09.360
So they're designed to be calculated very quickly going forwards, but almost impossible to go backward.

157
13:09.360 --> 13:10.860
And by almost impossible

158
13:11.100 --> 13:19.110
I simply mean that using current levels of computing power, it would take far too long to make it worthwhile

159
13:19.110 --> 13:19.810
for the hacker.

160
13:20.130 --> 13:26.880
So let's say that to calculate the hash going forward, it takes a millisecond, but to go backward

161
13:26.880 --> 13:31.800
it takes two years, then that hacker probably has better things to do with their time.

162
13:31.980 --> 13:38.700
So when a user tries to register on our website, then we ask them for the registration password, which

163
13:38.700 --> 13:44.720
we turn into a hash using our hash function, and then we store that hash on our database.

164
13:45.210 --> 13:52.020
Now, at a later point when the user tries to log in and they type in their password, then we again

165
13:52.110 --> 14:00.330
hash that password that they typed in to produce a hash and then we compare it against the hash that

166
14:00.330 --> 14:02.570
we have stored in our database.

167
14:03.060 --> 14:10.170
And if those two hashes match, then that must mean that the login password is the same as the registration

168
14:10.170 --> 14:11.100
password as well.

169
14:11.400 --> 14:18.810
And at no point in this process do we have to store their password in plain text or are we able to reverse

170
14:18.810 --> 14:22.350
the process to figure out their original password.

171
14:22.710 --> 14:26.480
The only person who knows their password is the user themselves.
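
NOTE
Editor's sketch of the register and login flow just described, in Python.
Everything named here is an assumption for illustration: the users dict
stands in for the database, and SHA-256 from hashlib stands in for the hash
function, since the lesson has not named one at this point.
```python
import hashlib
users: dict[str, str] = {}  # hypothetical in-memory stand-in for the database
def hash_password(password: str) -> str:
    """Turn a password into a hex digest (SHA-256 is an assumption for this sketch)."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()
def register(username: str, password: str) -> None:
    """Store only the hash of the registration password, never the password itself."""
    users[username] = hash_password(password)
def login(username: str, password: str) -> bool:
    """Hash the login attempt and compare it against the stored hash."""
    stored = users.get(username)
    return stored is not None and stored == hash_password(password)
register("alice", "qwerty")
print(login("alice", "qwerty"))  # prints True
print(login("alice", "qwertz"))  # prints False
```
Notice that nothing in the users dict can be turned back into qwerty; login only ever compares hashes.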

172
14:26.760 --> 14:35.400
Now, previously we saw that by using the Enigma machine, as long as we knew what the settings were

173
14:35.400 --> 14:39.030
for the Enigma machine, which is basically the encryption key,

174
14:39.030 --> 14:39.360
right?

175
14:39.690 --> 14:46.800
As long as we knew what that was, then I can decode it by setting it to the same encryption key.

176
14:48.270 --> 14:55.920
And we end up being able to retrieve the original text. Now, however, if I was to go and change this

177
14:56.130 --> 15:04.860
to a hash function instead, then you can see that when we try to decode this using the same hash function,

178
15:04.860 --> 15:06.900
MD5, we get the error

179
15:06.900 --> 15:12.210
that decoding step is not defined for hash function because you can't really go back.

180
15:12.570 --> 15:19.310
That's the whole point of the hash function and this is what will make our authentication more secure.
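
NOTE
Editor's sketch of the same idea using Python's hashlib instead of the
online tool. You can compute an MD5 hash of some text, but the hash object
has no decode step to call. MD5 is used only because the demo above uses it.
```python
import hashlib
digest = hashlib.md5("hello".encode("utf-8")).hexdigest()
print(digest)  # prints 5d41402abc4b2a76b9719d911017c592
# The hash object can produce a digest, but offers nothing to reverse one.
print(hasattr(hashlib.md5(), "decode"))  # prints False
```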