1

00:00:01,230  -->  00:00:07,860
Not let's look at hostel which is a very interesting and a very widely used data structure it is very

2

00:00:07,860  -->  00:00:14,240
widely used as it supports some really fast operations like searching of elements but hash map on how

3

00:00:14,240  -->  00:00:16,110
set are based on how stable.

4

00:00:16,370  -->  00:00:22,020
And so before we look at them it is important to understand how a hash table looks.

5

00:00:22,430  -->  00:00:29,460
It has to as a fundamental data structure that is used to implement an associate to it which associates

6

00:00:29,460  -->  00:00:29,580
.

7

00:00:29,610  -->  00:00:31,550
He's with values.

8

00:00:31,560  -->  00:00:38,360
So it is an area where each element is basically an association between a key and a value.

9

00:00:38,400  -->  00:00:41,260
So we're talking about a collection of key value pairs.

10

00:00:41,640  -->  00:00:47,650
An example key value pair can be NAME ON PHONE worki is name and value is fun.

11

00:00:47,760  -->  00:00:52,130
This would help us to extract a person's phone number given a person's name.

12

00:00:52,710  -->  00:00:59,550
So it is very useful for searching on hash table helps us to do it in a very efficient we know each

13

00:00:59,550  -->  00:01:06,000
key value pair is also referred to as a mapping on a hash table is also referred to as a dictionary

14

00:01:06,840  -->  00:01:13,380
as a dictionary with no help in looking up of words on their corresponding meanings.

15

00:01:13,500  -->  00:01:15,640
And here are the key operations of the hash table.

16

00:01:15,960  -->  00:01:22,080
We can insert a key value mapping into a hash table and we can also search a mapping using a given key

17

00:01:22,920  -->  00:01:29,610
and we cannot remove a particular mapping by using a key and all of these operations happened in a constant

18

00:01:29,610  -->  00:01:30,570
time.

19

00:01:30,570  -->  00:01:34,240
So how stable is it really fast.

20

00:01:34,240  -->  00:01:36,530
And here are some characteristics of a hash table.

21

00:01:36,750  -->  00:01:40,800
A hash table can not contain duplicate keys.

22

00:01:40,800  -->  00:01:47,490
However it can contain duplicate values which means that are more than one key can have the same value

23

00:01:47,490  -->  00:01:48,470
.

24

00:01:48,690  -->  00:01:52,680
Also each key maps to at most one value.

25

00:01:52,830  -->  00:02:00,570
Now as for notes go certain implementations allow null values but they also allow only one Nalby but

26

00:02:00,570  -->  00:02:08,970
some implementations do not allow nulls for Book use as well as what I use on here is an illustration

27

00:02:09,060  -->  00:02:15,240
of how a hash table is implemented music that hash table is an associate at it.

28

00:02:15,240  -->  00:02:22,170
So as you can see here we have an array and each element of it is referencing a linked list which stores

29

00:02:22,170  -->  00:02:24,810
the actual Kubler mappings.

30

00:02:24,840  -->  00:02:30,280
So the idea is not storing the key value mappings but it is a linked list which is doing that.

31

00:02:30,300  -->  00:02:36,750
So here key is name and value as phone number or not that each slot in the array is commonly referred

32

00:02:36,750  -->  00:02:39,180
to as a bucket.

33

00:02:39,180  -->  00:02:44,310
Note that each of the linked list can have multiple mappings too as we can see here in the illustration

34

00:02:44,970  -->  00:02:48,990
No front of any operation whether it is search insert or delete.

35

00:02:49,230  -->  00:02:55,130
The most important goal is to quickly look at the target bucket that is next.

36

00:02:55,350  -->  00:03:00,990
And then we need to search the linked list at that particular bucket not could we look at the bucket

37

00:03:01,080  -->  00:03:06,990
for a given key value mapping function called hash function is applied on the key.

38

00:03:06,990  -->  00:03:11,460
For example here in this diagram Epling the hash function on the key.

39

00:03:11,460  -->  00:03:20,330
Lisa Smet build it on the next number one on hash function is basically a function of the key and the

40

00:03:20,470  -->  00:03:29,130
arrests and a simple example of hash function will be key more precise and more helps in transforming

41

00:03:29,250  -->  00:03:35,000
a lot space into smaller one the smallest space would be the size of the.

42

00:03:35,220  -->  00:03:43,410
For instance if you're a resizes want to fight on your key is 315 then 315 Mark 4:59 would give us 15

43

00:03:43,430  -->  00:03:43,820
.

44

00:03:44,190  -->  00:03:52,860
So 15 would be the target bucket number for the mapping with the key 350 Jindalee hashing in using two

45

00:03:52,860  -->  00:03:53,670
functions.

46

00:03:53,880  -->  00:04:00,300
Angela also does that as we see here here in the first step function called hash is invoked.

47

00:04:00,490  -->  00:04:04,150
And this function would be the hash code of Dekhi.

48

00:04:04,590  -->  00:04:11,760
Recall the hash code method in the Object class Rutan's the hash code which would typically be the memory

49

00:04:11,760  -->  00:04:15,830
address of the object that is converted into an angry value.

50

00:04:16,320  -->  00:04:20,160
So the hash code muttered returns on end value.

51

00:04:20,250  -->  00:04:26,590
Also the hash code method is provided in order to support hash table based implementations such as hash

52

00:04:26,600  -->  00:04:27,390
map.

53

00:04:27,620  -->  00:04:31,310
Later we will see a use case then this method is overridden.

54

00:04:32,010  -->  00:04:38,310
So once you get the hash it becomes and is applied on it in order to offset into the hash table that

55

00:04:38,310  -->  00:04:40,020
is defined the target bucket.

56

00:04:40,410  -->  00:04:47,190
So here because and is used in sort of a modulus operation on here is the source of the hash method

57

00:04:47,380  -->  00:04:49,630
in one of the jolliest expressions.

58

00:04:50,010  -->  00:04:55,920
But let's not get into the details of this quarter the main goal for has to happen is to operate in

59

00:04:55,950  -->  00:05:01,320
constant time and for this the hash function should have these properties.

60

00:05:01,800  -->  00:05:04,780
First is it took quickly look at the target bucket.

61

00:05:04,930  -->  00:05:09,470
And for this the hash function logic should be highly efficient.

62

00:05:09,570  -->  00:05:15,420
For instance a function that uses a sober and moderate application some divisions can be slow but using

63

00:05:15,420  -->  00:05:21,930
bitwise and better for operations such as this hash function will give us the performance benefit.

64

00:05:21,930  -->  00:05:28,110
Second is the hash function should be able to dispose the elements as uniformly as possible into the

65

00:05:28,110  -->  00:05:29,430
buckets.

66

00:05:29,580  -->  00:05:36,310
For example if we use only hash code of the key instead of this additional hash function then there

67

00:05:36,310  -->  00:05:40,500
is a chance that several keys may get passed into the same bucket.

68

00:05:40,620  -->  00:05:43,410
And this is called us hash collision.

69

00:05:43,410  -->  00:05:49,530
So even though the keys are different DeMeo it has to the same bucket on all the mappings then be stored

70

00:05:49,530  -->  00:05:51,550
in the same linked list.

71

00:05:51,800  -->  00:05:57,690
And if a large number of mappings get to the same bucket then operations like search can be slow.

72

00:05:57,890  -->  00:06:04,070
That's because once we find the bucket we have to perform a linear search on the corresponding length

73

00:06:04,070  -->  00:06:07,260
list in order to find the target mapping.

74

00:06:07,530  -->  00:06:13,500
Ideal situation would be to have one mapping per bucket but that is very difficult to get you practically

75

00:06:13,500  -->  00:06:14,340
.

76

00:06:14,340  -->  00:06:19,440
So we should at least try to minimize the hash map collisions as much as possible which is what this

77

00:06:19,440  -->  00:06:23,190
hash function Angela is also trying to do.

78

00:06:23,540  -->  00:06:27,040
Let's see how a typical insertion operation looks like.

79

00:06:27,080  -->  00:06:32,780
We just look at the bucket for a given key and we check if the bucket is empty.

80

00:06:33,240  -->  00:06:39,420
If the bucket is empty then we create a new long list at that index on the mapping into that linked

81

00:06:39,420  -->  00:06:41,050
list.

82

00:06:41,220  -->  00:06:47,220
But if bucket is not empty which means that there is a collision then we check if the linked list has

83

00:06:47,310  -->  00:06:49,860
and then that mapping with the scene key.

84

00:06:50,250  -->  00:06:55,470
Now if there is no such mapping then the mapping gets added at the front of the list.

85

00:06:55,770  -->  00:07:01,860
But if there is a mapping with the same key then the value in the old mapping is done using the new

86

00:07:01,860  -->  00:07:03,390
value.

87

00:07:03,420  -->  00:07:08,820
So here there is a collision and we're using a collision resolution strategy called Separate chinning

88

00:07:08,820  -->  00:07:09,690
.

89

00:07:09,800  -->  00:07:15,910
There is also another technique called Open interesting but we are not going to discuss it here.

90

00:07:15,930  -->  00:07:21,810
There are also other factors besides a good hash function that contribute to performance.

91

00:07:21,810  -->  00:07:27,810
One is capacity and the other is load factor capacity is the number of buckets in the hash table load

92

00:07:27,810  -->  00:07:35,820
factor is about how from the hostel can get before it's capacity is automatically increased for Hasek

93

00:07:35,910  -->  00:07:43,740
default capacity 16 points or in flight that is that people will not be resized until it is three quarters

94

00:07:43,940  -->  00:07:44,710
.

95

00:07:45,780  -->  00:07:52,410
Now in case of data just look at a list resizing is not an issue as it is simply about creating a new

96

00:07:52,410  -->  00:07:55,990
larger array on copying the requirements.

97

00:07:56,010  -->  00:08:02,430
But with Hastur bill it's not just about copying but the hash function has to be to be applied on all

98

00:08:02,430  -->  00:08:07,130
the keys once again as a hash function is dependent on the precise.

99

00:08:07,160  -->  00:08:09,270
Now we have a new Orosius.

100

00:08:09,450  -->  00:08:13,290
So the keys might get hashed and no buckets.

101

00:08:13,320  -->  00:08:17,790
So there is the additional expense of rehashing step with hash tables.

102

00:08:17,820  -->  00:08:22,530
So if you're dealing with large number of entries then it is better to create a hash table that a very

103

00:08:22,530  -->  00:08:25,880
large initial capacity.

104

00:08:26,030  -->  00:08:32,190
Note that if you use a very high load factor then you may have less frequent rehashing but you may have

105

00:08:32,190  -->  00:08:36,530
lots of collisions too and that can slow things down normally.

106

00:08:36,660  -->  00:08:41,490
Defaults are good unless you are dealing with a large number of entries in which case you might have

107

00:08:41,490  -->  00:08:46,350
to do some experiments to choose the right factors.

108

00:08:46,370  -->  00:08:52,150
Finally know that hash tables are very widely used if you're familiar with relational databases.

109

00:08:52,270  -->  00:09:00,390
You mean other indexes help faster to be full of data and an index can be based on hash tables nice

110

00:09:00,390  -->  00:09:07,560
global databases are also based on hostages as are essentially key value stores Djama switch statement

111

00:09:07,850  -->  00:09:13,330
also in dentally uses hashing to quickly locate matching case blocks.

112

00:09:13,440  -->  00:09:15,160
So that's about hash tables.

113

00:09:15,210  -->  00:09:22,080
They're primarily used for fast lookups which is possible by using a good hash function just know that

114

00:09:22,080  -->  00:09:28,500
since elements are stored based on the hash function a hash table does not preserve the order of insertion

115

00:09:28,500  -->  00:09:29,100
.

116

00:09:29,100  -->  00:09:32,250
So elements can get inserted anywhere.

117

00:09:32,250  -->  00:09:33,150
So that's about it.

118

00:09:33,150  -->  00:09:35,680
Hope you like this lesson on hash tables.

119

00:09:35,700  -->  00:09:36,090
Thank you