1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:04,000
So we are going to continue the discussion with respect to natural language processing.

3
00:00:04,000 --> 00:00:08,000
And in this video we are going to discuss named entity recognition.

4
00:00:08,000 --> 00:00:10,000
This is an amazing topic.

5
00:00:10,000 --> 00:00:12,000
So let me open a file over here.

6
00:00:12,000 --> 00:00:16,000
So here, let's say that we have a lot of sentences, right?

7
00:00:16,000 --> 00:00:18,000
One example sentence is over here.

8
00:00:18,000 --> 00:00:26,000
The Eiffel Tower was built from 1887 to 1889 by French engineer Gustave Eiffel, whose company

9
00:00:26,000 --> 00:00:29,000
specialized in building metal frameworks and structures.

10
00:00:29,000 --> 00:00:31,000
Right. Now, this is a sentence.

11
00:00:31,000 --> 00:00:35,000
Now, for this particular sentence, we already know what parts of speech tagging is, right?

12
00:00:35,000 --> 00:00:37,000
That is noun, pronoun, and so on.

13
00:00:37,000 --> 00:00:43,000
But along with that, with the help of NLTK, we will also be able to get something called named entity

14
00:00:43,000 --> 00:00:44,000
tags, right?

15
00:00:44,000 --> 00:00:47,000
Now, what are some examples of named entity tags?

16
00:00:47,000 --> 00:00:50,000
So here you can see one of the tags is called person.

17
00:00:50,000 --> 00:00:52,000
The second tag can be location or place.

18
00:00:52,000 --> 00:00:55,000
The third tag can be date or time.

19
00:00:55,000 --> 00:00:55,000
Right?

20
00:00:55,000 --> 00:00:57,000
And here also I have given some examples.

21
00:00:57,000 --> 00:01:01,000
So if I say Eiffel Tower, it may come as a place.

22
00:01:01,000 --> 00:01:03,000
It can come as a location.

23
00:01:03,000 --> 00:01:06,000
Right. Over here you can see Gustave Eiffel.

24
00:01:06,000 --> 00:01:11,000
This will be tagged as a name itself, right?

25
00:01:11,000 --> 00:01:15,000
Suppose these numbers are there; they can be tagged as something else, right?

26
00:01:15,000 --> 00:01:18,000
Suppose you give some money value, like $1 million.

27
00:01:18,000 --> 00:01:21,000
So let's say this $1 million is present somewhere here, right?

28
00:01:21,000 --> 00:01:26,000
And that will be tagged as something called money.

29
00:01:26,000 --> 00:01:26,000
Right.

30
00:01:26,000 --> 00:01:29,000
So it will be given some kind of named entity tag.

31
00:01:29,000 --> 00:01:29,000
Right.

32
00:01:29,000 --> 00:01:35,000
So let us see some examples, and let us see how named entity recognition is done with

33
00:01:35,000 --> 00:01:36,000
the help of the NLTK library.

34
00:01:36,000 --> 00:01:40,000
To begin with, I'm going to take this particular sentence.

35
00:01:40,000 --> 00:01:42,000
I'm going to just execute it over here okay.

36
00:01:42,000 --> 00:01:47,000
And again we'll proceed the way we did in the past videos.

37
00:01:47,000 --> 00:01:49,000
So let me do one thing over here.

38
00:01:49,000 --> 00:01:51,000
Let me create some more cells okay.

39
00:01:51,000 --> 00:01:55,000
Now, the first thing: I will go ahead and import NLTK.

40
00:01:55,000 --> 00:02:01,000
And as you know, with NLTK I can also use something called word_tokenize.

41
00:02:02,000 --> 00:02:04,000
Word, word tokenize.

42
00:02:06,000 --> 00:02:08,000
Word underscore tokenize.

43
00:02:08,000 --> 00:02:12,000
And here I'm just going to give the complete sentence itself.

44
00:02:12,000 --> 00:02:15,000
So let's go ahead and give the sentence over here.

45
00:02:15,000 --> 00:02:20,000
And once I execute it, you'll be able to see that "The", "Eiffel", "Tower", "was", everything is coming over

46
00:02:20,000 --> 00:02:20,000
here.

47
00:02:21,000 --> 00:02:24,000
So these are all my words, the list of words.

48
00:02:24,000 --> 00:02:24,000
Right.

49
00:02:24,000 --> 00:02:26,000
So this is my entire list of words now.

50
00:02:26,000 --> 00:02:27,000
Perfect.

51
00:02:27,000 --> 00:02:31,000
Now see this. Usually, for doing the parts of speech tagging,

52
00:02:31,000 --> 00:02:32,000
what do we do?

53
00:02:33,000 --> 00:02:39,000
We write nltk.pos_tag, and I pass this entire list of words to it.

54
00:02:39,000 --> 00:02:43,000
And based on this, each and every word will be assigned a tag.

55
00:02:43,000 --> 00:02:43,000
Okay.

56
00:02:43,000 --> 00:02:49,000
So let me store this as my tagged elements.

57
00:02:49,000 --> 00:02:52,000
I'm just giving it a variable name, something like tagged elements.

58
00:02:52,000 --> 00:02:53,000
And this will get stored over here.

59
00:02:53,000 --> 00:02:53,000
Okay.
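
[Editor's note] The tokenizing and tagging steps described so far can be sketched roughly as below. This is a minimal sketch, not the exact notebook from the video: the function name `tokenize_and_tag` and the constant `SENTENCE` are illustrative names that do not appear on screen, and the calls assume the tokenizer and tagger data for NLTK have already been downloaded.

```python
import nltk

# The example sentence read out in the video.
SENTENCE = (
    "The Eiffel Tower was built from 1887 to 1889 by French engineer "
    "Gustave Eiffel, whose company specialized in building metal "
    "frameworks and structures."
)


def tokenize_and_tag(sentence):
    """Split a sentence into word tokens and attach a part-of-speech tag to each.

    Hypothetical helper; it needs NLTK's tokenizer and tagger data at call time.
    """
    words = nltk.word_tokenize(sentence)  # list of token strings
    return nltk.pos_tag(words)            # list of (word, tag) tuples
```

Calling `tagged_elements = tokenize_and_tag(SENTENCE)` would give the list of (word, tag) pairs that the next step passes to the named entity chunker.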

60
00:02:53,000 --> 00:02:59,000
Now see this. If I want to perform named entity recognition, all I have to do

61
00:02:59,000 --> 00:03:04,000
is use this nltk dot named entity.

62
00:03:04,000 --> 00:03:06,000
So there is something called ne.

63
00:03:06,000 --> 00:03:12,000
Okay, so let me show you ne_chunk.

64
00:03:12,000 --> 00:03:14,000
So ne_chunk is a function over here.

65
00:03:14,000 --> 00:03:19,000
And if I look at the definition of this function, you'll see, "Use NLTK's

66
00:03:19,000 --> 00:03:25,000
currently recommended named entity chunker to chunk the given list of tagged tokens."

67
00:03:25,000 --> 00:03:29,000
Okay, and inside this I will pass all my tagged elements.

68
00:03:29,000 --> 00:03:32,000
Okay, so this will be my tagged elements.

69
00:03:32,000 --> 00:03:33,000
I'm just going to give this.

70
00:03:33,000 --> 00:03:34,000
Probably we may get an error.

71
00:03:34,000 --> 00:03:38,000
The reason is that we need to download this one, right?

72
00:03:38,000 --> 00:03:42,000
So here you can see nltk.download('maxent_ne_chunker').

73
00:03:42,000 --> 00:03:48,000
Because here we are specifically using a chunker technique to get the named entities.

74
00:03:48,000 --> 00:03:50,000
And for that we really need to download it.

75
00:03:50,000 --> 00:03:51,000
And that is the first requirement.

76
00:03:51,000 --> 00:03:53,000
So here you can see that I'm downloading it.

77
00:03:53,000 --> 00:04:00,000
And once this entire maxent_ne_chunker gets downloaded,

78
00:04:00,000 --> 00:04:02,000
we are good to run this specific code.

79
00:04:02,000 --> 00:04:08,000
So this may take some time, because there can be a huge library

80
00:04:08,000 --> 00:04:10,000
inside it which needs to be downloaded.

81
00:04:10,000 --> 00:04:13,000
So now I will just go ahead and execute it.

82
00:04:13,000 --> 00:04:14,000
I'm still getting an error.

83
00:04:14,000 --> 00:04:18,000
It says that you also need to run nltk.download('words').

84
00:04:18,000 --> 00:04:19,000
Perfect.
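
[Editor's note] The two downloads mentioned above could be gathered into one small helper, sketched here under some assumptions: the resource names are the ones read out in the video ('maxent_ne_chunker' and 'words'), other NLTK versions may ask for differently named resources in their error messages, and the helper name `download_ner_resources` is made up for illustration.

```python
import nltk

# Resource names as they appear in the video's error messages.
# Assumption: your NLTK version may request different names; follow
# whatever the LookupError message tells you to download.
RESOURCES = ["maxent_ne_chunker", "words"]


def download_ner_resources(names=RESOURCES):
    """Download each resource and return the names whose download failed.

    Hypothetical helper; nltk.download returns True on success.
    """
    return [name for name in names if not nltk.download(name, quiet=True)]
```

Once the downloads succeed, `nltk.ne_chunk(tagged_elements)` returns a tree. Calling `.draw()` on it opens a separate window, so it needs a graphical display; printing the tree gives a text version instead.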

85
00:04:19,000 --> 00:04:22,000
So what I'm going to do is make one more cell.

86
00:04:22,000 --> 00:04:24,000
And please don't worry whenever you get some errors.

87
00:04:24,000 --> 00:04:26,000
I've seen some people who get worried.

88
00:04:26,000 --> 00:04:28,000
First of all just go ahead and see the error.

89
00:04:28,000 --> 00:04:29,000
See what exactly it is.

90
00:04:29,000 --> 00:04:32,000
It is very simple, and you just need to execute it.

91
00:04:32,000 --> 00:04:39,000
So guys, once this nltk.download of words is finished, all I have to do is, over

92
00:04:39,000 --> 00:04:41,000
here, you will see nltk.ne_chunk.

93
00:04:41,000 --> 00:04:43,000
I have to give the tagged elements.

94
00:04:43,000 --> 00:04:45,000
And then I just need to write .draw().

95
00:04:45,000 --> 00:04:48,000
Once I execute this here you'll be able to see this.

96
00:04:48,000 --> 00:04:51,000
What an amazing graph I'm able to get.

97
00:04:51,000 --> 00:04:52,000
Now see this.

98
00:04:52,000 --> 00:04:53,000
Everybody observe this.

99
00:04:53,000 --> 00:04:58,000
I don't know whether you are able to see this properly or not, but just clearly if you will be able

100
00:04:58,000 --> 00:05:02,000
to see it, it has given most of the information very clearly.

101
00:05:02,000 --> 00:05:06,000
So here it is very clear, for this entire sentence.

102
00:05:06,000 --> 00:05:07,000
Right.

103
00:05:07,000 --> 00:05:11,000
The organization is recognized for Eiffel and Tower, right?

104
00:05:11,000 --> 00:05:16,000
Which is a noun, right? Over here you can see this word "was" is a verb. "Built",

105
00:05:16,000 --> 00:05:17,000
it is over here.

106
00:05:17,000 --> 00:05:20,000
1887 is something like CD.

107
00:05:20,000 --> 00:05:22,000
And here you can see GPE, right?

108
00:05:22,000 --> 00:05:23,000
French, JJ, right?

109
00:05:23,000 --> 00:05:29,000
It is determined as JJ. Here you can see person information is getting captured,

110
00:05:29,000 --> 00:05:30,000
which is like Gustave, NNP.

111
00:05:30,000 --> 00:05:34,000
So here you can clearly see all this information very nicely.

112
00:05:34,000 --> 00:05:34,000
Right.

113
00:05:34,000 --> 00:05:40,000
So whichever entity has been recognized, you will be able to find it here.

114
00:05:40,000 --> 00:05:44,000
So PERSON is there, GPE is there, and ORGANIZATION is there.

115
00:05:44,000 --> 00:05:44,000
What.

116
00:05:44,000 --> 00:05:46,000
That is what we will be able to find out.

117
00:05:46,000 --> 00:05:48,000
This S is the entire sentence over here.

118
00:05:48,000 --> 00:05:52,000
But with this graph, you will be able to understand that you are able to get the entire information.
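
[Editor's note] The same information shown in the drawn tree can also be read out in code. The sketch below is an illustration under stated assumptions: `extract_entities` is a made-up helper name, and `sample` is a hand-built tree that only mimics the shape of a chunker result, with labels copied from what the video reads off the diagram. It is not real output from nltk.ne_chunk.

```python
from nltk.tree import Tree


def extract_entities(chunked):
    """Return (entity_text, label) pairs from a tree shaped like ne_chunk output."""
    entities = []
    for node in chunked:
        # Named entities are subtrees; ordinary tokens are plain (word, tag) tuples.
        if isinstance(node, Tree):
            text = " ".join(word for word, _tag in node.leaves())
            entities.append((text, node.label()))
    return entities


# Hand-built stand-in for a chunker result (illustrative labels only).
sample = Tree("S", [
    ("The", "DT"),
    Tree("ORGANIZATION", [("Eiffel", "NNP"), ("Tower", "NNP")]),
    ("was", "VBD"),
    ("built", "VBN"),
    ("by", "IN"),
    Tree("GPE", [("French", "JJ")]),
    ("engineer", "NN"),
    Tree("PERSON", [("Gustave", "NNP"), ("Eiffel", "NNP")]),
])

print(extract_entities(sample))
```

The root label S stands for the whole sentence, and each labelled subtree under it is one recognized entity, which is why the helper only looks at the direct children of the root.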

119
00:05:52,000 --> 00:05:54,000
So like this.

120
00:05:54,000 --> 00:05:59,000
If you go and see what other entities there are, you can find person, place, location,

121
00:05:59,000 --> 00:06:01,000
date, time, money, and so on.

122
00:06:01,000 --> 00:06:06,000
So I hope you were able to understand named entity recognition, how you will be able to see

123
00:06:06,000 --> 00:06:08,000
this and how you will be able to see the diagrams.

124
00:06:08,000 --> 00:06:14,000
So yes, this was it with respect to this particular video, but understand that NLTK provides this

125
00:06:14,000 --> 00:06:15,000
amazing feature.

126
00:06:15,000 --> 00:06:18,000
You should definitely use it wherever it is required, right?

127
00:06:18,000 --> 00:06:20,000
So this was it from my side for this particular video.

128
00:06:20,000 --> 00:06:21,000
I'll see you all in the next video.

129
00:06:21,000 --> 00:06:21,000
Thank you.

