1
00:00:00,650 --> 00:00:01,550
Hi everyone.

2
00:00:01,550 --> 00:00:05,900
In this video we are going to discuss the hash table data structure.

3
00:00:05,900 --> 00:00:08,390
Now these are the four parts of this video.

4
00:00:08,390 --> 00:00:10,880
We will discuss what hash tables are.

5
00:00:10,880 --> 00:00:15,830
We will discuss about a hashing function and how it works, and how it's used in the case of a hash

6
00:00:15,830 --> 00:00:17,030
table or a hash map.

7
00:00:17,030 --> 00:00:19,460
And then we will discuss about collision handling.

8
00:00:19,460 --> 00:00:25,160
And finally we will discuss the space and time complexity of common operations on a hash table.

9
00:00:25,160 --> 00:00:26,480
So let's get started.

10
00:00:26,480 --> 00:00:31,550
So hash tables are built in almost every programming language.

11
00:00:31,550 --> 00:00:33,380
So that's the good thing about hash tables.

12
00:00:33,380 --> 00:00:33,770
All right.

13
00:00:33,770 --> 00:00:36,860
And they are very useful in JavaScript we have objects.

14
00:00:36,860 --> 00:00:39,590
And in Python we have dictionaries etc..

15
00:00:39,590 --> 00:00:39,980
All right.

16
00:00:39,980 --> 00:00:42,800
So you can find them in almost every programming language.

17
00:00:42,800 --> 00:00:46,190
And hash tables are just key value pairs.

18
00:00:46,930 --> 00:00:47,320
All right.

19
00:00:47,320 --> 00:00:53,170
So for example if this is your hash table and then you have key one which is pointing to value one.

20
00:00:53,170 --> 00:00:55,930
And then you have key two which is pointing to value two.

21
00:00:55,930 --> 00:00:57,640
And then we can just call this anything.

22
00:00:57,640 --> 00:00:59,560
Let's say this is h t which is hash table.

23
00:00:59,560 --> 00:01:02,170
So this over here is an example of a hash table.

24
00:01:02,170 --> 00:01:05,110
Now if you notice that the arrow is in this direction.

25
00:01:05,110 --> 00:01:07,600
Again the syntax in JavaScript is not like this.

26
00:01:07,600 --> 00:01:08,620
It would be a colon.

27
00:01:08,620 --> 00:01:09,070
All right.

28
00:01:09,070 --> 00:01:15,880
Now this arrow over here is just to symbolize that we can only get the value from the key right.

29
00:01:15,880 --> 00:01:17,560
We can get the value from the key.

30
00:01:17,560 --> 00:01:19,870
But from the value we cannot get the key.

31
00:01:19,870 --> 00:01:22,120
So we cannot go in the reverse direction.

32
00:01:22,270 --> 00:01:24,820
That's an interesting thing to remember about hash tables.

33
00:01:24,850 --> 00:01:26,560
Now let's proceed.

34
00:01:26,560 --> 00:01:28,810
And let's try to see how they work.

35
00:01:28,810 --> 00:01:30,880
And what's the benefit of hash tables.

36
00:01:30,880 --> 00:01:32,770
Let's say we have this data.

37
00:01:32,770 --> 00:01:35,500
We have three people John, Abby and Sarah.

38
00:01:35,500 --> 00:01:39,160
And we have these numbers which is related to these three people.

39
00:01:39,160 --> 00:01:40,810
And we need to store these numbers.

40
00:01:40,810 --> 00:01:43,810
Now one way of storing it is like we could just use an array.

41
00:01:43,810 --> 00:01:45,850
So that would be storing it in this manner.

42
00:01:45,850 --> 00:01:46,210
Right.

43
00:01:46,210 --> 00:01:49,960
So we have these three numbers at zero one and two indices.

44
00:01:49,960 --> 00:01:54,160
Now if we want to retrieve this number we could just say array at one.

45
00:01:54,160 --> 00:01:56,800
Again we are calling this array over here as r.

46
00:01:56,800 --> 00:01:59,230
So in that case we could just say r at one.

47
00:01:59,230 --> 00:02:00,550
So this is index one right.

48
00:02:00,550 --> 00:02:02,500
So this is zero one and two.

49
00:02:02,530 --> 00:02:03,550
So this is index one.

50
00:02:03,550 --> 00:02:05,320
So we would get back this number.

51
00:02:05,320 --> 00:02:07,030
But then it's difficult right.

52
00:02:07,030 --> 00:02:09,460
How do we know that we have to call R at one.

53
00:02:09,460 --> 00:02:10,930
So it's difficult to remember.

54
00:02:10,930 --> 00:02:13,420
And it's it's not um friendly.

55
00:02:13,420 --> 00:02:17,200
So it's much better to use an object or a hash table.

56
00:02:17,200 --> 00:02:20,020
And in that case we could store it like this.

57
00:02:20,020 --> 00:02:22,360
So we have John, Abby and Sarah.

58
00:02:22,390 --> 00:02:27,430
Now you can see that the difference over here is in the case of array, the indices over here have to

59
00:02:27,430 --> 00:02:29,680
be the integer zero, one and two.

60
00:02:29,680 --> 00:02:33,100
But then in the case of an object we can have anything as key.

61
00:02:33,100 --> 00:02:34,600
It can be a string.

62
00:02:34,600 --> 00:02:35,380
It can be a number.

63
00:02:35,380 --> 00:02:36,070
It can be anything.

64
00:02:36,070 --> 00:02:36,370
Right.

65
00:02:36,370 --> 00:02:41,500
So we can just say let's say this, this, uh, hash table over here is called h t.

66
00:02:42,430 --> 00:02:48,520
Now if we want to access the number of Abby, which is this number, we can just say h t at Abby.

67
00:02:49,280 --> 00:02:52,190
And it would give back this key value.

68
00:02:52,190 --> 00:02:52,490
All right.

69
00:02:52,490 --> 00:02:53,810
So over here this is the key.

70
00:02:53,810 --> 00:02:56,450
So TX at the key gives back the value.

71
00:02:56,450 --> 00:02:58,100
So you can see it's very useful.

72
00:02:58,100 --> 00:02:58,730
All right.

73
00:02:59,300 --> 00:03:03,590
Another interesting thing to observe over here is that an array is ordered.

74
00:03:03,590 --> 00:03:05,060
You have zero one and two.

75
00:03:05,090 --> 00:03:09,620
Right now when you're inserting another element or pushing an element to the array you would insert

76
00:03:09,620 --> 00:03:10,070
it over here.

77
00:03:10,070 --> 00:03:10,250
Right.

78
00:03:10,250 --> 00:03:11,540
If you're pushing it at the end.

79
00:03:11,540 --> 00:03:13,370
So that would be again at index three.

80
00:03:13,370 --> 00:03:16,190
But when it comes to a hash table it's not ordered.

81
00:03:16,190 --> 00:03:18,290
So you just have John and Abby.

82
00:03:18,290 --> 00:03:20,390
Both of these are there in the hash table.

83
00:03:20,390 --> 00:03:22,430
It's not that John is ahead of Abby.

84
00:03:22,430 --> 00:03:24,650
So a hash table is not ordered.

85
00:03:24,650 --> 00:03:25,070
All right.

86
00:03:25,070 --> 00:03:26,780
So we've understood what a hash table is.

87
00:03:26,780 --> 00:03:31,340
Now let's proceed and take a look at hashing functions and how they are used.

88
00:03:31,340 --> 00:03:32,960
In the case of a hash table.

89
00:03:32,960 --> 00:03:36,590
Now a hash table is built on top of an array.

90
00:03:36,770 --> 00:03:42,320
Now let's again take this example where we have John, Abby, Sarah and these three numbers which are

91
00:03:42,320 --> 00:03:43,070
related to them.

92
00:03:43,070 --> 00:03:43,490
All right.

93
00:03:43,490 --> 00:03:47,720
And we are storing them in a hash table, let's say where these are keys.

94
00:03:47,720 --> 00:03:49,790
And these over here are values.

95
00:03:50,300 --> 00:03:53,030
Now over here again let's try to see how this works.

96
00:03:53,030 --> 00:03:53,390
All right.

97
00:03:53,390 --> 00:03:55,970
So hash tables are built on top of arrays.

98
00:03:55,970 --> 00:03:57,950
So let's say we have this array over here.

99
00:03:57,950 --> 00:04:03,620
And remember the size of this array will actually actually depend on the size of the information that

100
00:04:03,620 --> 00:04:04,460
we have to store.

101
00:04:04,460 --> 00:04:09,680
So let's just keep that aside for a moment and let's see how the hashing function works over here.

102
00:04:09,680 --> 00:04:15,890
So let's say this hash table where these keys and values are stored is built on top of this array over

103
00:04:15,890 --> 00:04:16,220
here.

104
00:04:16,220 --> 00:04:21,740
Now what is going to happen is we are we will take a key at a time and we will pass it to a hashing

105
00:04:21,740 --> 00:04:22,370
function.

106
00:04:22,370 --> 00:04:29,360
And this hashing function will convert this key to an integer in the range between 0 and 5.

107
00:04:29,360 --> 00:04:33,410
Because we are using this array over here to build the hash table.

108
00:04:33,410 --> 00:04:38,930
So again let me repeat for example, it will take this John over here which is the first key it will

109
00:04:38,930 --> 00:04:41,570
pass in, pass it in to the hashing function.

110
00:04:41,570 --> 00:04:46,700
And then this hashing function will spit out an integer between 0 and 5.

111
00:04:46,700 --> 00:04:48,230
So let's say we are getting two.

112
00:04:48,230 --> 00:04:48,590
All right.

113
00:04:48,590 --> 00:04:54,890
So that means that this value which is 994836 will be stored at index two.

114
00:04:55,220 --> 00:04:59,540
So next time we want to access the value at key John.

115
00:04:59,540 --> 00:05:02,540
All we do is again we will pass the key which is John.

116
00:05:02,540 --> 00:05:04,880
And the hashing function will give us two.

117
00:05:04,880 --> 00:05:08,870
And then we directly come over to index two in this array.

118
00:05:08,870 --> 00:05:10,010
And we can access this.

119
00:05:10,010 --> 00:05:15,470
And we know that if we have the index we can access from an array in constant time.

120
00:05:15,470 --> 00:05:15,860
All right.

121
00:05:15,860 --> 00:05:18,020
So this is how a hash table works.

122
00:05:18,020 --> 00:05:18,950
Now let's proceed.

123
00:05:18,950 --> 00:05:21,530
Let's say we take the next key which is a b.

124
00:05:21,530 --> 00:05:23,660
And that gives us the index zero.

125
00:05:23,660 --> 00:05:27,950
So that means 762511 will be stored at index zero.

126
00:05:27,950 --> 00:05:32,480
And let's say when we pass to the hashing function we get the integer five.

127
00:05:32,480 --> 00:05:37,460
Now in that case 357921 will be stored at index five.

128
00:05:37,460 --> 00:05:39,860
So this is how a hash table works.

129
00:05:39,860 --> 00:05:45,650
It takes in the key, passes it to a hashing function, and gets an integer between the indices of the

130
00:05:45,650 --> 00:05:51,290
array where the hashing function is stored, and then that particular value is stored at that particular

131
00:05:51,290 --> 00:05:53,780
index, which is given by the hashing function.

132
00:05:53,780 --> 00:05:58,970
Now let's take a look at what are the necessities for a good hashing function.

133
00:05:59,390 --> 00:06:00,650
There are three necessities.

134
00:06:00,650 --> 00:06:06,620
The first is that when we pass the same key, the next time, we should get back the same integer,

135
00:06:06,620 --> 00:06:06,860
right?

136
00:06:06,860 --> 00:06:08,000
So that is very important.

137
00:06:08,000 --> 00:06:10,790
Otherwise we won't be able to read the information back.

138
00:06:10,790 --> 00:06:13,880
So the same input should always give the same output.

139
00:06:13,910 --> 00:06:18,560
Now the second thing is that the hashing should happen in constant time.

140
00:06:18,560 --> 00:06:24,260
Now there are powerful hashing functions which already computer scientists have come up with and they

141
00:06:24,260 --> 00:06:25,640
work in constant time.

142
00:06:25,640 --> 00:06:29,960
So that's another important thing because we want efficient operation over here.

143
00:06:29,960 --> 00:06:33,650
And then finally we want to minimize collisions.

144
00:06:33,650 --> 00:06:36,200
Now let's try to understand what collisions are.

145
00:06:36,200 --> 00:06:41,810
Let's say when we pass a B over here first we assume that the hashing function gave us zero.

146
00:06:41,810 --> 00:06:43,220
And that's how we stored it over here.

147
00:06:43,220 --> 00:06:48,740
But what would have happened if when we passed a b to the hashing function and we got back the integer

148
00:06:48,740 --> 00:06:49,070
two.

149
00:06:49,100 --> 00:06:53,960
In that case you can see John and Abby are both getting the integer two.

150
00:06:54,530 --> 00:06:58,760
So we would have to store both these values at this index.

151
00:06:58,760 --> 00:07:00,890
So this is what we call a collision.

152
00:07:00,890 --> 00:07:04,340
And a good hashing function should minimize collisions.

153
00:07:04,340 --> 00:07:10,130
So yes we have these computer scientists have come up with powerful hashing functions which do minimize

154
00:07:10,130 --> 00:07:10,760
collisions.

155
00:07:10,760 --> 00:07:16,280
But let's now go ahead and see how collisions are handled in hash tables.

156
00:07:16,280 --> 00:07:18,680
So again let's take this example.

157
00:07:18,680 --> 00:07:20,780
So we have the array from 0 to 5.

158
00:07:20,780 --> 00:07:22,520
And we have this data over here.

159
00:07:22,520 --> 00:07:22,880
All right.

160
00:07:22,910 --> 00:07:25,910
Now to handle collisions in hash tables.

161
00:07:25,910 --> 00:07:27,410
There are different techniques.

162
00:07:27,410 --> 00:07:29,780
Now over here let's just discuss one technique.

163
00:07:29,780 --> 00:07:34,190
Let's say when we pass in John and when we pass in Abby we get two.

164
00:07:34,190 --> 00:07:34,460
Right.

165
00:07:34,460 --> 00:07:36,470
So that's the integer two which is this over here.

166
00:07:36,470 --> 00:07:42,590
So the way we would store this is this value 994836 would be over here.

167
00:07:42,590 --> 00:07:42,890
Right.

168
00:07:42,890 --> 00:07:46,100
We'll have a pointer which points to this value.

169
00:07:46,100 --> 00:07:48,800
And then this value which is 76251.

170
00:07:48,930 --> 00:07:51,870
One also will be stored at the same index.

171
00:07:51,870 --> 00:07:55,980
Now this value will point to its key which is John.

172
00:07:55,980 --> 00:07:59,580
And similarly this value will point to its key which is Abby.

173
00:07:59,730 --> 00:08:04,230
Now for this we will use another data structure which is called linked List.

174
00:08:04,230 --> 00:08:08,610
And this is something that we will learn soon in the next coming up in the coming up days.

175
00:08:08,610 --> 00:08:09,000
All right.

176
00:08:09,030 --> 00:08:10,320
Now we will learn this soon.

177
00:08:10,320 --> 00:08:13,710
But again a linked list is just data which is connected to each other.

178
00:08:13,710 --> 00:08:18,060
So you have some data over here that's pointing somewhere and that is pointing to somewhere.

179
00:08:18,180 --> 00:08:19,290
And it goes on like this.

180
00:08:19,290 --> 00:08:21,600
So this type of a data structure is a linked list.

181
00:08:21,600 --> 00:08:25,980
So you can see over here at index two we have 994836.

182
00:08:25,980 --> 00:08:30,120
And that is pointing to the next value which is 762511.

183
00:08:30,120 --> 00:08:33,390
And then each of these is pointing back to its key.

184
00:08:33,420 --> 00:08:35,370
Now how would we access this.

185
00:08:35,400 --> 00:08:38,160
We want to get back the number of a b.

186
00:08:38,160 --> 00:08:39,990
So how does retrieving work.

187
00:08:39,990 --> 00:08:41,460
So let's take a look at that.

188
00:08:41,460 --> 00:08:46,140
Now while retrieving we will again pass a b to the hashing function.

189
00:08:46,140 --> 00:08:47,880
And it will give us two.

190
00:08:47,880 --> 00:08:48,210
Right.

191
00:08:48,210 --> 00:08:50,400
So we have this index two over here.

192
00:08:50,400 --> 00:08:56,130
Now what we will do is we will take a look at the linked list at index two, which is this over here.

193
00:08:56,130 --> 00:09:01,080
And then we will traverse this linked list and check is the key over here.

194
00:09:01,080 --> 00:09:02,400
This is pointing to a value right.

195
00:09:02,400 --> 00:09:03,330
So is this a b.

196
00:09:03,330 --> 00:09:04,050
No it's not.

197
00:09:04,050 --> 00:09:05,490
So we continue and we check.

198
00:09:05,490 --> 00:09:06,090
Is this a b.

199
00:09:06,090 --> 00:09:07,260
Yes this is a b.

200
00:09:07,260 --> 00:09:10,620
Therefore it will identify this as the number and return it.

201
00:09:10,620 --> 00:09:15,330
So it will check by traversing this linked list for matching with a B over here.

202
00:09:15,330 --> 00:09:17,730
And that's how we will retrieve the value.

203
00:09:17,730 --> 00:09:22,230
Again we don't need to worry about this because all of this is happening under the hood.

204
00:09:22,230 --> 00:09:25,890
Now this is just to understand how a hash table works.

205
00:09:25,890 --> 00:09:32,310
Now let's proceed to the final part of this video, which is taking a look at the space and time complexity

206
00:09:32,310 --> 00:09:35,070
of common operations using a hash table.

207
00:09:35,070 --> 00:09:41,220
Now you will understand why a hash table is very powerful when we take a look at these, uh, operations

208
00:09:41,220 --> 00:09:42,960
and their space and time complexity.

209
00:09:42,990 --> 00:09:46,650
Now let's start with accessing a value from a hash table.

210
00:09:46,650 --> 00:09:51,660
Now with accessing, we mean, you know, the key, you are given the key and then you just want to

211
00:09:51,660 --> 00:09:52,560
find the value.

212
00:09:52,560 --> 00:09:52,860
Right.

213
00:09:52,860 --> 00:09:55,740
We have seen that a hash table is nothing but key value pairs.

214
00:09:55,740 --> 00:09:58,830
So you have key one over here which is pointing to value one.

215
00:09:58,830 --> 00:10:01,560
Then key two which is pointing to value two, etc..

216
00:10:01,560 --> 00:10:05,370
So let's say you know key two and you want to access value two.

217
00:10:05,400 --> 00:10:07,290
So this is what we mean with accessing.

218
00:10:07,290 --> 00:10:11,610
And inserting means you want to insert a new key value pair into the hash table.

219
00:10:11,610 --> 00:10:14,970
And deleting means let's say you want to remove this key value pair.

220
00:10:14,970 --> 00:10:21,120
Now the good thing about hash tables is that for all these operations, the time and space complexity

221
00:10:21,120 --> 00:10:24,600
is O of one or constant time and constant space.

222
00:10:24,600 --> 00:10:27,390
Now over here we say this on average, right?

223
00:10:27,390 --> 00:10:31,770
In the case of the worst case this is going to take O of n time complexity.

224
00:10:31,770 --> 00:10:32,640
Now why is that.

225
00:10:32,640 --> 00:10:34,530
So let's take an example to understand this.

226
00:10:34,530 --> 00:10:36,000
So let me just clear this up.

227
00:10:36,000 --> 00:10:39,030
Now again let's take a look at the same data which we have over here.

228
00:10:39,030 --> 00:10:40,770
We have John A, B and Sarah.

229
00:10:40,800 --> 00:10:46,230
Now if the hashing function is not good and let's say John gives three when hashed.

230
00:10:46,230 --> 00:10:52,050
And when we pass a b to the hashing function that also gives three and we pass when we pass Sarah to

231
00:10:52,050 --> 00:10:53,790
the hashing function that also gives three.

232
00:10:53,790 --> 00:10:56,370
So you can see that all the data is coming over here.

233
00:10:56,910 --> 00:11:02,220
Now in this case, if we want to access the number of Sarah, we would have to traverse through the

234
00:11:02,220 --> 00:11:03,540
entire linked list over here.

235
00:11:03,540 --> 00:11:03,840
Right.

236
00:11:03,840 --> 00:11:04,740
So we have to go.

237
00:11:04,770 --> 00:11:05,610
We have to check this.

238
00:11:05,610 --> 00:11:06,600
Then we have to check this.

239
00:11:06,600 --> 00:11:07,740
Then we have to check this.

240
00:11:07,740 --> 00:11:11,730
So that's why in the worst case it's going to be O of n time complexity.

241
00:11:11,730 --> 00:11:17,610
But then with respect to coding interviews we don't need to consider this because there are very good

242
00:11:17,610 --> 00:11:22,080
hashing functions which which are used in, uh, programming languages.

243
00:11:22,080 --> 00:11:29,550
And these give O of one average time complexity when it comes to accessing or inserting or deleting

244
00:11:29,550 --> 00:11:30,870
from a hash table.

245
00:11:30,870 --> 00:11:32,670
Now, what are the other things?

246
00:11:32,670 --> 00:11:38,310
If you want to search for a key in a given hash table, that's a O of one time complexity operation.

247
00:11:38,310 --> 00:11:43,620
But if you want to search for a value in a hash table, that's a O of n time complexity operation.

248
00:11:43,620 --> 00:11:44,370
So what is this?

249
00:11:44,370 --> 00:11:45,810
What's the difference between these two?

250
00:11:45,810 --> 00:11:48,780
Let's say this is our hash table and you have key one.

251
00:11:48,780 --> 00:11:49,890
You have value one.

252
00:11:49,890 --> 00:11:51,060
Then you have key two.

253
00:11:51,090 --> 00:11:52,140
You have value two.

254
00:11:52,140 --> 00:11:55,140
And let's say you have key three and you have value three.

255
00:11:55,140 --> 00:12:00,810
Now if you want to check if key three is there in this hash table, we can do that in constant time.

256
00:12:00,810 --> 00:12:02,880
So that's the beauty of hashing hash tables.

257
00:12:02,880 --> 00:12:03,270
All right.

258
00:12:03,270 --> 00:12:07,230
But if you want to know if value three is there then we have to come to key one.

259
00:12:07,230 --> 00:12:11,700
Then we have to go to its value one and check whether value one is equal to value three.

260
00:12:11,700 --> 00:12:12,180
It's not.

261
00:12:12,180 --> 00:12:13,590
So we proceed to key two.

262
00:12:13,620 --> 00:12:18,030
We access its value and check whether value two is equal to value three.

263
00:12:18,030 --> 00:12:18,600
It's not.

264
00:12:18,600 --> 00:12:21,570
So we proceed to key three and check its value.

265
00:12:21,570 --> 00:12:24,630
And we find that value three over here is equal to value three.

266
00:12:24,630 --> 00:12:27,450
So for this we would have to traverse through the hash table.

267
00:12:27,450 --> 00:12:33,660
So that's why searching for a value in a hash table is going to be an O of n time complexity operation.

268
00:12:33,660 --> 00:12:40,200
So again just to revise hash tables are very powerful because we can access from a hash table in constant

269
00:12:40,200 --> 00:12:40,650
time.

270
00:12:40,650 --> 00:12:47,370
We can insert values a key value pair into a hash table in constant time, and we can delete from a

271
00:12:47,370 --> 00:12:48,540
key value pair in a hash table.

272
00:12:48,890 --> 00:12:50,060
In constant time.

273
00:12:50,060 --> 00:12:52,730
Again, these operations don't take up space, right?

274
00:12:52,940 --> 00:12:54,890
Uh, so that's why it's constant space as well.

275
00:12:54,920 --> 00:12:55,310
All right.

276
00:12:55,340 --> 00:12:56,600
Now let's proceed.

277
00:12:57,050 --> 00:12:58,880
This is a summary of what we have discussed.

278
00:12:58,880 --> 00:13:02,630
Accessing, inserting, deleting, searching for a key, searching for a value.

279
00:13:02,630 --> 00:13:08,270
And then if you want to create a new hash table with n key value pairs, this is going to take up space

280
00:13:08,270 --> 00:13:09,350
of the order of n.

281
00:13:09,350 --> 00:13:16,160
Now you'll see that in many coding interview questions, to reduce the time complexity, we may store

282
00:13:16,160 --> 00:13:18,890
some values key value pairs in a hash table.

283
00:13:18,890 --> 00:13:24,110
And if you're going to store n key value pairs, then this is going to take up space of the order of

284
00:13:24,110 --> 00:13:24,560
n.
