1
00:00:01,230 --> 00:00:08,640
All right, so we've taken a look at level one authentication, which is basically just storing the password

2
00:00:08,640 --> 00:00:11,220
as plain text in our database.

3
00:00:11,370 --> 00:00:18,770
So maybe it'll be a little bit difficult for people to get access to our server and access our database.

4
00:00:18,810 --> 00:00:20,160
At least you can't just simply

5
00:00:20,160 --> 00:00:20,490
right-

6
00:00:20,490 --> 00:00:25,690
click on a website to view page source and be able to see it in the HTML.

7
00:00:25,710 --> 00:00:30,140
At least it's stored at server level. But that's not really good enough.

8
00:00:30,150 --> 00:00:37,150
So let's go ahead and see what we can do to improve the security for our users on our website.

9
00:00:37,470 --> 00:00:47,710
So let's increase to level two authentication. And level two authentication involves the use of encryption.

10
00:00:48,480 --> 00:00:50,520
So what exactly is encryption?

11
00:00:50,550 --> 00:00:57,900
Well, basically all it is is just scrambling something so that people can't tell what the original

12
00:00:57,900 --> 00:01:02,890
was unless they were in on the secret and they knew how to unscramble it.

13
00:01:03,240 --> 00:01:08,610
This is exactly the same as if you and your friend were sending each other secret messages and you had

14
00:01:08,610 --> 00:01:14,010
a key that you both knew, so that you could encode and decode the message.

15
00:01:14,970 --> 00:01:21,480
Now, on a bigger scale, if you've ever watched The Imitation Game or read about the Enigma machine,

16
00:01:21,690 --> 00:01:24,750
well, that is basically a form of encryption.

17
00:01:25,230 --> 00:01:31,320
And the Enigma machine, if you don't know, is just simply a machine that was used during World War

18
00:01:31,320 --> 00:01:34,890
2 when the Germans would send each other messages,

19
00:01:35,100 --> 00:01:41,280
they would use the machine to encrypt those messages so that when the messages are intercepted, say,

20
00:01:41,280 --> 00:01:50,220
over the radio, unless you had the same Enigma machine and you knew what the decoding key was or what

21
00:01:50,220 --> 00:01:54,660
the settings were for the machine, then you wouldn't be able to tell what it is that they were trying

22
00:01:54,660 --> 00:01:56,000
to communicate with each other.

23
00:01:56,310 --> 00:02:03,660
If you're interested, I really recommend watching two videos that were done by Numberphile on YouTube and

24
00:02:03,660 --> 00:02:06,210
I've linked to them in the course resources list.

25
00:02:06,540 --> 00:02:14,310
They explain the Enigma machine and talk about the flaw in the Enigma machine that led Alan Turing

26
00:02:14,310 --> 00:02:20,760
and other people at Bletchley Park to be able to crack the code and create what was very much a specialized

27
00:02:20,760 --> 00:02:25,920
computer to be able to decode those messages and help the Allies win the war.

28
00:02:26,310 --> 00:02:32,010
And if you're ever near London, be sure to go and check out Bletchley Park and they have a computer museum

29
00:02:32,010 --> 00:02:34,230
next to it as well, which is super fascinating.

30
00:02:34,740 --> 00:02:36,020
Anyways, I digress.

31
00:02:36,030 --> 00:02:36,810
So back to

32
00:02:36,810 --> 00:02:43,590
ciphers and encryption, one of the earliest ways of encrypting messages that we know about is the Caesar

33
00:02:43,590 --> 00:02:44,120
cipher.

34
00:02:44,610 --> 00:02:50,850
And this comes from Julius Caesar, who was a Roman general and statesman.

35
00:02:51,000 --> 00:02:57,770
And what he did is he would send messages to his generals and he would encrypt them

36
00:02:58,020 --> 00:03:03,330
so if his messenger got murdered along the way, then his messages would be kept secret.

37
00:03:03,990 --> 00:03:09,090
And this is one of the simplest forms of encryption we know about

38
00:03:09,360 --> 00:03:10,550
and it's very simple.

39
00:03:10,560 --> 00:03:16,080
Let's say we have the alphabet, right? ABCDEFG. The Caesar cipher

40
00:03:16,080 --> 00:03:18,540
is simply a letter substitution cipher.

41
00:03:18,660 --> 00:03:24,820
And the key for the cipher is the number of letters that you would shift by.

42
00:03:24,870 --> 00:03:31,900
So if you knew what the shift pattern was, then you could really quickly decipher the message.

43
00:03:32,340 --> 00:03:38,210
So if we were to encrypt the word hello, there's a really neat tool online that can help us do that.

44
00:03:38,220 --> 00:03:41,850
It's called cryptii.com and it's got two 'i's at the end.

45
00:03:42,300 --> 00:03:47,760
And you can basically choose the kind of cipher or encryption that you want to use

46
00:03:48,120 --> 00:03:52,550
and then you can specify the shift and we're going to say a shift of three, let's say.

47
00:03:52,800 --> 00:04:00,930
So if my text was hello, then it becomes shifted into khoor. And to an unknowing person and a non-cryptographer,

48
00:04:01,200 --> 00:04:07,950
it can be quite difficult to see at a glance what exactly this is trying to say. Now in modern days and

49
00:04:07,950 --> 00:04:14,010
with modern cryptography, this is overly simplistic and it's very, very easy to crack.

50
00:04:14,460 --> 00:04:20,550
But there are other forms of encryption which are a little bit more complicated and they involve a lot

51
00:04:20,550 --> 00:04:25,380
more math to make it more time consuming for somebody to crack.

52
00:04:25,770 --> 00:04:29,760
But essentially all encryption works exactly the same way.

53
00:04:30,150 --> 00:04:38,730
You have a way of scrambling your message and it requires a key to be able to unscramble that message.
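
As a rough illustration of that scramble-and-unscramble idea, here is a minimal Python sketch of the Caesar cipher. The names `caesar_encrypt` and `caesar_decrypt` are made up for this example and are not from the lesson, and the sketch only shifts the lowercase letters a to z, leaving everything else alone.

```python
import string

ALPHABET = string.ascii_lowercase  # "abcdefghijklmnopqrstuvwxyz"


def caesar_encrypt(text: str, shift: int) -> str:
    """Shift each lowercase letter forward by `shift`, wrapping around at z."""
    shifted = []
    for ch in text:
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            shifted.append(ch)  # spaces, digits, punctuation stay as they are
    return "".join(shifted)


def caesar_decrypt(ciphertext: str, shift: int) -> str:
    """Undo the shift; this only works if you know the key (the shift)."""
    return caesar_encrypt(ciphertext, -shift)


print(caesar_encrypt("hello", 3))  # khoor
print(caesar_decrypt("khoor", 3))  # hello
```

The key here is just the number 3. Anyone who knows it can run the same steps in reverse.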

54
00:04:39,450 --> 00:04:39,930
All right.

55
00:04:39,930 --> 00:04:44,520
So now it's time to level up to the next level of security.

56
00:04:44,760 --> 00:04:48,630
And in this lesson, we're going to cover something called hashing.

57
00:04:49,620 --> 00:04:56,910
Now, previously, we've already looked at encryption, so taking the user's password and securing it

58
00:04:56,910 --> 00:05:00,120
using an encryption key, and then 

59
00:05:00,140 --> 00:05:06,560
using a particular cipher method, be it a Caesar cipher or the Enigma cipher, no matter which way

60
00:05:06,560 --> 00:05:12,440
we chose, we always had a password, a key, and we ended up with some ciphertext which will make it

61
00:05:12,440 --> 00:05:16,940
hard for people to be able to immediately guess what our user's password is.

62
00:05:17,120 --> 00:05:23,390
So, for example, if we took a password like qwerty and we use the Caesar cipher method and we decided to

63
00:05:23,390 --> 00:05:27,440
shift it by one, then our encryption key is the number one.

64
00:05:27,860 --> 00:05:32,880
And that creates the ciphertext where every single letter is shifted up by one.

65
00:05:33,350 --> 00:05:39,110
Now, in order to decrypt this, all you have to do, as long as you know what the key is, then you

66
00:05:39,110 --> 00:05:45,980
can simply shift all of the ciphertext down by one and you end up with the original password.

67
00:05:46,340 --> 00:05:51,650
Now, the Caesar cipher is a very, very weak encryption method.

68
00:05:51,650 --> 00:05:58,310
It's incredibly easy to figure out what the original text was, even if you didn't have a key.
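
To see why the missing key is not much of an obstacle, here is a small Python sketch that simply tries every possible shift. The helper names `shift_back` and `all_candidates` are assumptions made for this example; the point is only that there are just 26 keys to try.

```python
import string

ALPHABET = string.ascii_lowercase


def shift_back(ciphertext: str, shift: int) -> str:
    """Shift each lowercase letter back by `shift`, wrapping around at a."""
    return "".join(
        ALPHABET[(ALPHABET.index(ch) - shift) % 26] if ch in ALPHABET else ch
        for ch in ciphertext
    )


def all_candidates(ciphertext: str) -> list[str]:
    """Return the 26 possible plaintexts, one for each shift from 0 to 25."""
    return [shift_back(ciphertext, shift) for shift in range(26)]


# "rxfsuz" is "qwerty" shifted up by one, as in the example from the lesson.
for shift, candidate in enumerate(all_candidates("rxfsuz")):
    print(shift, candidate)
```

Reading down the 26 printed lines, the readable one stands out straight away, so the key does not need to be stolen at all.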

69
00:05:58,880 --> 00:06:04,790
And just to illustrate what bad things can happen when you have a weak encryption system, I'm going

70
00:06:04,790 --> 00:06:11,110
to tell you a story from history that tells us why we should not be using a weak encryption system.

71
00:06:11,690 --> 00:06:18,770
So back in the 1500s on this island that we now call Great Britain, there used to be two large

72
00:06:18,770 --> 00:06:19,400
areas.

73
00:06:19,670 --> 00:06:23,810
One was Scotland and the other was England.

74
00:06:24,320 --> 00:06:27,080
And they were ruled over by two Queens.

75
00:06:27,380 --> 00:06:33,860
Scotland was ruled by Mary Queen of Scots, who was a Catholic, and England was ruled over by Queen

76
00:06:33,860 --> 00:06:35,210
Elizabeth the first.

77
00:06:35,630 --> 00:06:41,330
Now, these two ladies between them controlled the land that we now call the UK, but they each wanted

78
00:06:41,330 --> 00:06:43,790
to have more power and more land.

79
00:06:44,300 --> 00:06:45,990
So what did they do?

80
00:06:46,010 --> 00:06:53,870
Well, Mary Queen of Scots who ruled over Scotland decided to plot with her friend, Anthony Babington,

81
00:06:54,200 --> 00:06:56,800
to try and assassinate Queen Elizabeth.

82
00:06:57,260 --> 00:07:02,720
That way, she would be the legitimate heir to both the English and Scottish thrones,

83
00:07:03,050 --> 00:07:06,920
and it was kind of a Game of Thrones kind of situation going on back then.

84
00:07:07,460 --> 00:07:13,910
But in order to mobilize their forces or try to come up with some sort of secret plan, they decided

85
00:07:13,910 --> 00:07:17,800
to send letters to each other using ciphertext.

86
00:07:18,020 --> 00:07:25,850
So they came up with a system to encrypt their letters to each other such that if it fell into the wrong

87
00:07:25,850 --> 00:07:32,800
hands, the subject of the letter wouldn't be revealed and they wouldn't end up being tried for treason.

88
00:07:32,990 --> 00:07:39,260
But the problem was that the encryption method that they used, which was a letter substitution method

89
00:07:39,410 --> 00:07:44,510
similar to the Caesar cipher, was a very weak form of encryption.

90
00:07:45,110 --> 00:07:54,230
And Queen Elizabeth had a chief decoder who ended up deciphering their letters and figuring out what

91
00:07:54,230 --> 00:07:56,220
their encryption key was.

92
00:07:56,420 --> 00:08:04,400
So he decided to take this encryption key and write a letter back to Anthony Babington to try and get him

93
00:08:04,400 --> 00:08:07,490
to reveal all of the co-conspirators.

94
00:08:07,880 --> 00:08:11,330
And what was the end result of having their weak encryption system?

95
00:08:11,720 --> 00:08:19,250
Well, Queen Elizabeth decided to accuse Mary Queen of Scots of treason, and hence she ended up having

96
00:08:19,250 --> 00:08:20,360
her head chopped off.

97
00:08:20,750 --> 00:08:25,420
So this is not what you want to happen to you or your website.

98
00:08:25,940 --> 00:08:33,320
So weak encryption systems can end up putting user passwords at risk and your company might end up metaphorically

99
00:08:33,320 --> 00:08:39,230
decapitated, such as in the case of companies like TalkTalk or Equifax, where they ended up getting

100
00:08:39,230 --> 00:08:42,200
hacked and lost a lot of the trust of their users.

101
00:08:42,440 --> 00:08:48,410
Now, if you're interested in more stories like this and to learn more about cryptography and encryption,

102
00:08:48,590 --> 00:08:52,820
there's a really great book recommendation I would make called The Code Book by Simon Singh.

103
00:08:53,000 --> 00:08:56,480
It contains stories like the one that I just told you and more.

104
00:08:56,540 --> 00:08:59,380
So if you're interested in this, go ahead and read more about it.

105
00:08:59,720 --> 00:09:03,500
Now, how can we make our passwords more secure? Now,

106
00:09:03,530 --> 00:09:09,920
at the moment, the biggest flaw in our authentication method is the fact that we need an encryption

107
00:09:09,920 --> 00:09:14,450
key to encrypt our passwords and decrypt our passwords.

108
00:09:14,870 --> 00:09:22,040
And chances are that if somebody is motivated enough to spend time and hack into your database, then

109
00:09:22,040 --> 00:09:29,150
it's probably not that difficult for them to also be able to get your encryption key, even if you've

110
00:09:29,150 --> 00:09:33,480
saved it in an environment variable or somewhere secure on your server.

111
00:09:33,950 --> 00:09:38,880
So how can we address this weakest link, the need for that encryption key?

112
00:09:39,140 --> 00:09:45,380
Well, here is where hashing comes into play. Whereas previously with encryption we needed that encryption

113
00:09:45,380 --> 00:09:52,560
key, hashing removes the need for an encryption key altogether.

114
00:09:53,180 --> 00:09:59,450
Well, then you might ask, well, if we don't have an encryption key, how can we decrypt our password

115
00:09:59,450 --> 00:09:59,990
back into

116
00:10:00,060 --> 00:10:08,520
plain text? Well, the secret is you don't. Let's say a user registers on our website and they enter

117
00:10:08,520 --> 00:10:16,560
a password to register with. We use something called a hash function to turn that password into a hash

118
00:10:16,560 --> 00:10:19,500
and we store that hash in our database.

119
00:10:20,160 --> 00:10:29,700
Now, the key point is that hash functions are mathematical functions that are designed to make it almost

120
00:10:29,700 --> 00:10:32,280
impossible to go backwards.

121
00:10:32,560 --> 00:10:38,910
So it's almost impossible to turn a hash back into a password.

122
00:10:39,150 --> 00:10:41,130
How is this possible, you might ask?

123
00:10:41,160 --> 00:10:48,690
How is it possible that you can turn a password into a hash very quickly and easily, but make it almost

124
00:10:48,690 --> 00:10:52,080
impossible to turn that hash back into a password?

125
00:10:52,800 --> 00:10:53,880
Well, here's a question.

126
00:10:54,600 --> 00:11:01,710
Let me ask you, what are the factors of 377 other than one and 377?

127
00:11:02,040 --> 00:11:05,650
So basically, I'm saying 377 is not a prime number.

128
00:11:06,090 --> 00:11:14,490
Not only can you divide 377 by 1 and 377, but there are also two other numbers that you

129
00:11:14,490 --> 00:11:15,450
can divide it by.

130
00:11:15,870 --> 00:11:19,260
Now it's your job to figure out what those numbers are.

131
00:11:20,130 --> 00:11:21,420
So, what might you do?

132
00:11:21,450 --> 00:11:23,130
Well, you might divide it by two.

133
00:11:23,490 --> 00:11:26,090
OK, so that becomes 188.5.

134
00:11:26,130 --> 00:11:27,300
That's not a whole number

135
00:11:27,300 --> 00:11:28,880
so 2 is not a factor.

136
00:11:29,280 --> 00:11:30,610
What if you divide it by three?

137
00:11:30,630 --> 00:11:36,290
Well, that becomes 125.6 recurring, which is also not a whole number.

138
00:11:36,450 --> 00:11:38,870
So three is not a factor either.

139
00:11:39,180 --> 00:11:45,940
And you might go through this process for a long time, tediously going through number by number.

140
00:11:45,990 --> 00:11:53,010
Well, then you might arrive at the point where you divide 377 by 13 and you end up with

141
00:11:53,010 --> 00:11:53,680
29.

142
00:11:54,060 --> 00:12:04,140
So 13 and 29 are the answers to this question. They are the only factors of 377 other than 1 and

143
00:12:04,140 --> 00:12:05,000
377.

144
00:12:05,640 --> 00:12:12,750
And as you can see, that process of getting to this point of finding those two factors took us a while,

145
00:12:12,750 --> 00:12:13,130
right?

146
00:12:13,140 --> 00:12:14,400
It wasn't that easy.

147
00:12:14,910 --> 00:12:17,400
But consider if I asked you a different question.

148
00:12:17,400 --> 00:12:20,990
If I said to you, can you multiply 13 by 29?

149
00:12:21,330 --> 00:12:24,900
Well, you would be able to do that really quickly and easily.

150
00:12:24,900 --> 00:12:30,050
It would take you almost no time at all to figure out that the answer is 377.

151
00:12:30,870 --> 00:12:36,880
So here is a very, very simplified version of a hash function.

152
00:12:37,350 --> 00:12:44,790
So going forward, multiplying 13 by 29 is really quick and easy, but going backward, trying

153
00:12:44,790 --> 00:12:46,440
to get back those numbers

154
00:12:46,440 --> 00:12:52,360
13 and 29 starting from 377 is very, very time consuming.
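
The difference between the two directions can be sketched in a few lines of Python. The function name `smallest_factor` is made up for this example, and trial division is just the number-by-number approach described above, not how real hash functions work.

```python
def smallest_factor(n: int) -> int:
    """Return the smallest factor of n greater than 1 (n itself if n is prime)."""
    candidate = 2
    # Going backward: try 2, 3, 4, ... one at a time until something divides n.
    while candidate * candidate <= n:
        if n % candidate == 0:
            return candidate
        candidate += 1
    return n


# Going forward is a single multiplication.
print(13 * 29)  # 377

# Going backward means a loop of trial divisions.
p = smallest_factor(377)
q = 377 // p
print(p, q)  # 13 29
```

With a number as small as 377 the loop finishes instantly, but the work grows as the numbers get bigger, while the multiplication stays a single step.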

155
00:12:52,770 --> 00:12:56,120
So this is essentially how a hash function works.

156
00:12:56,520 --> 00:13:01,340
Real hash functions use different math, but they're built around this same one-way idea.

157
00:13:01,590 --> 00:13:09,360
So they're designed to be calculated very quickly going forwards, but almost impossible to go backward.

158
00:13:09,360 --> 00:13:10,860
And by almost impossible

159
00:13:11,100 --> 00:13:19,110
I simply mean that using current levels of computing power, it would take far too long to make it worthwhile

160
00:13:19,110 --> 00:13:19,810
for the hacker.

161
00:13:20,130 --> 00:13:26,880
So let's say that to calculate the hash going forward, it takes a millisecond, but to go backward

162
00:13:26,880 --> 00:13:31,800
it takes two years, then that hacker probably has better things to do with his time.

163
00:13:31,980 --> 00:13:38,700
So when a user tries to register on our website, then we ask them for the registration password, which

164
00:13:38,700 --> 00:13:44,720
we turn into a hash using our hash function, and then we store that hash in our database.

165
00:13:45,210 --> 00:13:52,020
Now, at a later point when the user tries to log in and they type in their password, then we again

166
00:13:52,110 --> 00:14:00,330
hash that password that they typed in to produce a hash and then we compare it against the hash that

167
00:14:00,330 --> 00:14:02,570
we have stored in our database.

168
00:14:03,060 --> 00:14:10,170
And if those two hashes match, then that must mean that the login password is the same as the registration

169
00:14:10,170 --> 00:14:11,100
password as well.

170
00:14:11,400 --> 00:14:18,810
And at no point in this process do we have to store their password in plain text, nor are we able to reverse

171
00:14:18,810 --> 00:14:22,350
the process to figure out their original password.

172
00:14:22,710 --> 00:14:26,480
The only person who knows their password is the user themselves.
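
The register and login flow described here can be sketched roughly as follows. Everything named in this example is an assumption made for illustration: the `users` dictionary stands in for the database, the function names are invented, and SHA-256 from Python's standard `hashlib` module is used only because the lesson does not say which hash function is stored.

```python
import hashlib
import hmac

# Hypothetical in-memory stand-in for the users table in the database.
users: dict[str, str] = {}


def hash_password(password: str) -> str:
    """Turn a password into a hex digest; there is no key and no way back."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def register(username: str, password: str) -> None:
    """Store only the hash of the registration password."""
    users[username] = hash_password(password)


def login(username: str, password: str) -> bool:
    """Hash the login attempt and compare it with the stored hash."""
    stored = users.get(username)
    if stored is None:
        return False
    # compare_digest compares the two hex strings in constant time.
    return hmac.compare_digest(stored, hash_password(password))


register("user@example.com", "qwerty")
print(login("user@example.com", "qwerty"))    # True
print(login("user@example.com", "password"))  # False
```

Notice that `users` never holds the plain text password, only the hash, and that `login` never needs to turn that hash back into a password.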

173
00:14:26,760 --> 00:14:35,400
Now, previously we saw that by using the Enigma machine, as long as we knew what the settings were

174
00:14:35,400 --> 00:14:39,030
for the Enigma machine, which is basically the encryption key,

175
00:14:39,030 --> 00:14:39,360
right?

176
00:14:39,690 --> 00:14:46,800
As long as we knew what that was, then we could decode it by setting it to the same encryption key.

177
00:14:48,270 --> 00:14:55,920
And we end up being able to retrieve the original text. Now, however, if I was to go and change this

178
00:14:56,130 --> 00:15:04,860
to a hash function instead, then you can see that when we try to decode this using the same hash function,

179
00:15:04,860 --> 00:15:06,900
MD5, we get the error

180
00:15:06,900 --> 00:15:12,210
that the decoding step is not defined for a hash function, because you can't really go back.

181
00:15:12,570 --> 00:15:19,310
That's the whole point of the hash function and this is what will make our authentication more secure.
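
The same thing can be seen from code. This is a minimal sketch using Python's standard `hashlib` module rather than the online tool from the demo, and MD5 is used only because it is the function named above.

```python
import hashlib

# Going forward: the text goes in and a fixed-length hex digest comes out.
digest = hashlib.md5(b"hello").hexdigest()
print(digest)       # 5d41402abc4b2a76b9719d911017c592
print(len(digest))  # 32 hex characters, however long the input was

# Going backward: hash objects offer update(), digest(), hexdigest() and
# copy(), but nothing that turns a digest back into the original text.
print(hasattr(hashlib.md5(), "decode"))  # False
```

There is no key to pass in and no decoding step to call, which matches the error shown in the demo.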

