1
00:00:00,000 --> 00:00:01,000
Hello guys.

2
00:00:01,000 --> 00:00:04,000
So we are going to continue the discussion with respect to NLP.

3
00:00:04,000 --> 00:00:10,000
And in this video we are going to discuss something called parts of speech tagging.

4
00:00:10,000 --> 00:00:17,000
Now, I hope you have understood that in lemmatization, parts of speech tagging plays a very important

5
00:00:17,000 --> 00:00:22,000
role, because depending on whether we give a word as a verb or a noun,

6
00:00:22,000 --> 00:00:28,000
we are able to get the root form of that specific word, right?

7
00:00:28,000 --> 00:00:30,000
And we have seen a lot of examples with respect to this.

8
00:00:30,000 --> 00:00:35,000
In this video, we will try to understand how many different types of parts of speech tags

9
00:00:35,000 --> 00:00:36,000
there are.

10
00:00:36,000 --> 00:00:39,000
And with respect to that, we'll also see a practical example.

11
00:00:39,000 --> 00:00:44,000
Let's say that I am given a sentence like this: Taj Mahal is a beautiful monument.

12
00:00:44,000 --> 00:00:51,000
How, with the help of NLTK, are we able to understand that? You'll be seeing the output of this;

13
00:00:51,000 --> 00:00:53,000
let's say I'm just taking this particular example.

14
00:00:53,000 --> 00:00:57,000
The output for this will be that Taj Mahal will be considered as a noun.

15
00:00:57,000 --> 00:00:59,000
Beautiful may be considered as an adjective.

16
00:00:59,000 --> 00:01:02,000
Monument can be considered as a noun, right?

17
00:01:02,000 --> 00:01:05,000
I'm just giving as an example, but we'll definitely see this particular example.

18
00:01:05,000 --> 00:01:08,000
And we'll also see some more extensive examples altogether.

19
00:01:08,000 --> 00:01:12,000
Now, with respect to parts of speech, how many different types of tags are there?

20
00:01:12,000 --> 00:01:12,000
Right.

21
00:01:12,000 --> 00:01:20,000
So here you have CC, coordinating conjunction; CD, cardinal digit; DT, determiner; EX,

22
00:01:21,000 --> 00:01:22,000
existential

23
00:01:22,000 --> 00:01:28,000
there; FW, foreign word; IN, preposition; JJ, adjective; JJR, comparative adjective.

24
00:01:28,000 --> 00:01:33,000
So that basically means that if I give it a sentence, it will automatically be able to categorize

25
00:01:33,000 --> 00:01:38,000
the words into all these different parts of speech with the help of NLTK.

26
00:01:38,000 --> 00:01:40,000
And similarly, there are a lot more tags over here.

27
00:01:40,000 --> 00:01:46,000
You can see personal pronoun: words like I, he, she will

28
00:01:46,000 --> 00:01:48,000
be tagged as PRP, right?

29
00:01:48,000 --> 00:01:54,000
They'll be getting a tag called PRP. If you see over here, there are some more examples, like RB, adverb, right?

30
00:01:54,000 --> 00:01:58,000
Words like very, silently will be put up over here. Then RBR:

31
00:01:58,000 --> 00:02:02,000
it is also an adverb, but it will be a comparative one, right?

32
00:02:02,000 --> 00:02:03,000
So an example is better.

33
00:02:03,000 --> 00:02:06,000
Then you have RBS, superlative adverb, like this.

34
00:02:06,000 --> 00:02:06,000
You have a lot of them.

35
00:02:06,000 --> 00:02:07,000
Right.
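The tags listed above come from the Penn Treebank tag set, which NLTK's default English tagger uses. As a small sketch (the names PENN_TAGS and describe_tag are made up for illustration, and only the tags mentioned so far are included), a lookup table can make tagger output easier to read:
```python
# Hypothetical lookup table covering only the tags mentioned so far in this video.
PENN_TAGS = {
    "CC": "coordinating conjunction",
    "CD": "cardinal digit",
    "DT": "determiner",
    "EX": "existential there",
    "FW": "foreign word",
    "IN": "preposition",
    "JJ": "adjective",
    "JJR": "adjective, comparative",
    "PRP": "personal pronoun",
    "RB": "adverb",
    "RBR": "adverb, comparative",
    "RBS": "adverb, superlative",
}
def describe_tag(tag):
    """Return a readable description for a tag, or a fallback for tags not in the table."""
    return PENN_TAGS.get(tag, "unknown tag")
print(describe_tag("PRP"))  # personal pronoun
```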

36
00:02:07,000 --> 00:02:10,000
So what I'm actually going to do is that I'm going to take a very good example.

37
00:02:10,000 --> 00:02:14,000
And one assignment you have to do is to take this simple example, try to find out the tags,

38
00:02:14,000 --> 00:02:17,000
and write down your answer in the comment section.

39
00:02:17,000 --> 00:02:18,000
Okay.

40
00:02:18,000 --> 00:02:24,000
Now over here, you'll be able to see that I have this particular speech of Dr. A.P.J. Abdul

41
00:02:24,000 --> 00:02:24,000
Kalam.

42
00:02:24,000 --> 00:02:29,000
And obviously the same example I have shown you in the stop words example itself.

43
00:02:29,000 --> 00:02:29,000
Right.

44
00:02:29,000 --> 00:02:32,000
So I'm going to keep this very simple, okay?

45
00:02:32,000 --> 00:02:38,000
And then, let's say that in this parts of speech tagging I could have performed

46
00:02:38,000 --> 00:02:43,000
stemming, but I don't want to perform any stemming, because I want to see, for each

47
00:02:43,000 --> 00:02:43,000
and every word,

48
00:02:43,000 --> 00:02:44,000
right,

49
00:02:44,000 --> 00:02:47,000
what kind of POS tag it will get.

50
00:02:47,000 --> 00:02:49,000
So no need to import all these things.

51
00:02:49,000 --> 00:02:50,000
Okay.

52
00:02:50,000 --> 00:02:52,000
So I'm just going to remove all these things. Instead,

53
00:02:52,000 --> 00:02:55,000
I'll just be focusing on importing NLTK.

54
00:02:55,000 --> 00:02:57,000
So I'll write import nltk.

55
00:02:57,000 --> 00:03:01,000
And then here you can see that I'm just writing nltk.sent_tokenize.

56
00:03:01,000 --> 00:03:05,000
So that basically means I'm actually converting the paragraph into sentences.

57
00:03:05,000 --> 00:03:07,000
So let me just execute this.

58
00:03:07,000 --> 00:03:10,000
And once I get it here you'll be able to see all the sentences.

59
00:03:10,000 --> 00:03:15,000
So in this example I'm trying to show you how, for each and every word, we'll be able to find out

60
00:03:15,000 --> 00:03:16,000
the POS tag.

61
00:03:16,000 --> 00:03:17,000
Okay.

62
00:03:17,000 --> 00:03:22,000
So the next thing I was going to do is apply the stop words.

63
00:03:22,000 --> 00:03:26,000
Okay, but there is no need to apply the stop words either, because we don't want to remove anything.

64
00:03:26,000 --> 00:03:31,000
So I'm just going to uncomment all these things and I'm going to say that we will find out the POS tag

65
00:03:31,000 --> 00:03:32,000
okay.

66
00:03:32,000 --> 00:03:35,000
Find out the POS tag.

67
00:03:35,000 --> 00:03:35,000
Right.

68
00:03:35,000 --> 00:03:36,000
So this is perfect.

69
00:03:36,000 --> 00:03:38,000
Till here, we are going

70
00:03:38,000 --> 00:03:38,000
good.

71
00:03:38,000 --> 00:03:43,000
Now, what I'm actually doing over here is very clear and very easy to understand.

72
00:03:43,000 --> 00:03:48,000
Let me walk through what I'm doing over here.

73
00:03:48,000 --> 00:03:53,000
I'm simply putting a for loop over here over all the sentences, and for each sentence here,

74
00:03:53,000 --> 00:03:56,000
what I'm doing is converting it into words.

75
00:03:56,000 --> 00:03:57,000
I'll be getting a list of words.

76
00:03:57,000 --> 00:04:00,000
And now no need to do the stemming.

77
00:04:00,000 --> 00:04:01,000
So I'll just remove the stemming.

78
00:04:02,000 --> 00:04:02,000
Okay.

79
00:04:02,000 --> 00:04:06,000
And over here I'll just add the word itself.

80
00:04:06,000 --> 00:04:07,000
I'll be getting the word.

81
00:04:07,000 --> 00:04:13,000
And I'm just checking that the word is not in set(stopwords.words('english')).

82
00:04:13,000 --> 00:04:15,000
So, uh, let's see that.

83
00:04:15,000 --> 00:04:15,000
Okay?

84
00:04:15,000 --> 00:04:20,000
I told you that I'll not apply stop words, but let us just remove the stop words, because some

85
00:04:20,000 --> 00:04:22,000
of the stop words will not be playing an important role.

86
00:04:22,000 --> 00:04:26,000
So in order to do that, I'll just import the library for stop words.

87
00:04:26,000 --> 00:04:28,000
And obviously over here, I had already written that.

88
00:04:28,000 --> 00:04:30,000
But let me do Ctrl Z.

89
00:04:30,000 --> 00:04:31,000
So here it is.

90
00:04:31,000 --> 00:04:34,000
Stop words with respect to this and I will also remove this.

91
00:04:34,000 --> 00:04:37,000
Okay, so two things and I will just import the stop words.

92
00:04:37,000 --> 00:04:40,000
So I'm writing: from nltk.corpus import stopwords.

93
00:04:40,000 --> 00:04:42,000
And now let's apply the same thing,

94
00:04:42,000 --> 00:04:45,000
whatever we did the previous time.

95
00:04:45,000 --> 00:04:47,000
We are actually repeating the same part.

96
00:04:47,000 --> 00:04:53,000
So here you can see: for word in words, if the word is not in the stop words for

97
00:04:53,000 --> 00:04:58,000
English, I'm going to take the word. And then I'm just going to comment out this code, because

98
00:04:58,000 --> 00:05:05,000
I'm not going to join them back into the sentences, because I need to understand what POS tag each

99
00:05:05,000 --> 00:05:06,000
word will be getting.

100
00:05:06,000 --> 00:05:08,000
So I will basically write print.

101
00:05:08,000 --> 00:05:11,000
Okay, so here I'm just going to write a simple print statement.

102
00:05:11,000 --> 00:05:16,000
And before printing this, what I can do is write

103
00:05:16,000 --> 00:05:20,000
this in the next line: here I'll be using something called nltk.pos_tag.

104
00:05:21,000 --> 00:05:23,000
Okay, so here you'll be able to see pos_tag.

105
00:05:23,000 --> 00:05:26,000
I can also apply it for words and I can also apply it for sentences.

106
00:05:26,000 --> 00:05:29,000
Right now I'm just going to apply this for the words itself.

107
00:05:29,000 --> 00:05:31,000
So this is my entire thing.

108
00:05:32,000 --> 00:05:37,000
And here I'll just create a pos_tag variable.

109
00:05:37,000 --> 00:05:41,000
So this will basically indicate that everything is getting stored over here.

110
00:05:41,000 --> 00:05:44,000
And with respect to that I'm just going to print this.

111
00:05:44,000 --> 00:05:45,000
Now.

112
00:05:45,000 --> 00:05:46,000
Let me revise it again,

113
00:05:46,000 --> 00:05:47,000
all the things we did.

114
00:05:47,000 --> 00:05:52,000
I'm iterating through each and every sentence, and then I'm converting this particular sentence into

115
00:05:52,000 --> 00:05:53,000
words.

116
00:05:53,000 --> 00:05:55,000
And for each word I'm applying the stop words check.

117
00:05:55,000 --> 00:05:59,000
Initially I thought that I'll not apply it, but let's apply the stop words because, see, at the end of

118
00:05:59,000 --> 00:06:01,000
the day, they are such small words.

119
00:06:01,000 --> 00:06:04,000
Small words like is, the, he, she.

120
00:06:04,000 --> 00:06:05,000
I don't want these particular words, right?

121
00:06:05,000 --> 00:06:09,000
So I'll remove the stop words and take the list of remaining words over here.

122
00:06:09,000 --> 00:06:14,000
I do not apply stemming because I need to keep whatever words are present over here as they are.

123
00:06:14,000 --> 00:06:16,000
They need to stay like this.

124
00:06:16,000 --> 00:06:21,000
NLTK will be able to find out all the different types of parts of speech tags

125
00:06:21,000 --> 00:06:22,000
with respect to that.

126
00:06:22,000 --> 00:06:27,000
And then I just apply this simple code, that is nltk.pos_tag(words).

127
00:06:27,000 --> 00:06:28,000
And then finally I'm printing it.
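The steps revised above can be sketched roughly as follows. This is a minimal sketch, not the exact notebook code: the function names are assumptions, and the NLTK resources (tokenizer, stop words, tagger) must already be downloaded, with exact resource names depending on the NLTK version.
```python
def remove_stop_words(words, stop_words):
    """Keep the words that are not in the stop word collection (case-sensitive check)."""
    return [word for word in words if word not in stop_words]
def tag_paragraph(paragraph):
    """Split a paragraph into sentences, drop stop words, and POS-tag what remains."""
    # Imported here so remove_stop_words can be tried without NLTK installed.
    import nltk
    from nltk.corpus import stopwords
    english_stop_words = set(stopwords.words("english"))
    tagged_sentences = []
    for sentence in nltk.sent_tokenize(paragraph):
        words = nltk.word_tokenize(sentence)
        words = remove_stop_words(words, english_stop_words)
        # pos_tag expects a list of words and returns (word, tag) pairs.
        tagged_sentences.append(nltk.pos_tag(words))
    return tagged_sentences
# Example call (needs NLTK and its data): tag_paragraph("I have three visions for India.")
```
No stemming is applied in this sketch, so the tagger sees each word in its original form.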

128
00:06:28,000 --> 00:06:32,000
So here you can see, I'll just execute it, and I'm getting some error.

129
00:06:32,000 --> 00:06:36,000
I'm not going to hide anything: if there is an error, I'm not

130
00:06:36,000 --> 00:06:37,000
going to edit it out of the video.

131
00:06:37,000 --> 00:06:42,000
So here it says nltk.download('averaged_perceptron_tagger').

132
00:06:42,000 --> 00:06:46,000
So I definitely require this particular tagger to apply pos_tag.

133
00:06:46,000 --> 00:06:50,000
So what I will do is just go and copy and paste this.

134
00:06:50,000 --> 00:06:53,000
You also have to do this because you will also face this particular error.

135
00:06:53,000 --> 00:06:55,000
Now here you can see that it has been downloaded.

136
00:06:55,000 --> 00:07:00,000
Now if I go ahead and run it, here you'll see that for the first sentence, I is

137
00:07:00,000 --> 00:07:01,000
a PRP.

138
00:07:01,000 --> 00:07:03,000
What does PRP mean?

139
00:07:03,000 --> 00:07:08,000
So if you go over here and see, it is personal pronoun, and I have given examples like I, he, she,

140
00:07:08,000 --> 00:07:08,000
right?

141
00:07:08,000 --> 00:07:15,000
Similarly, you'll be able to see other words, like three as CD, visions as NNS, India as NNP.

142
00:07:15,000 --> 00:07:17,000
Let's go ahead and see what NNP is.

143
00:07:17,000 --> 00:07:19,000
Because India has been categorized as NNP.

144
00:07:19,000 --> 00:07:22,000
So NNP is nothing but proper noun, singular.

145
00:07:22,000 --> 00:07:23,000
It's like a name or a place.

146
00:07:23,000 --> 00:07:24,000
It can be monuments.

147
00:07:24,000 --> 00:07:27,000
It can be different things. Then NNPS

148
00:07:27,000 --> 00:07:29,000
is proper noun, plural:

149
00:07:29,000 --> 00:07:31,000
Americans, Indians, like that.

150
00:07:31,000 --> 00:07:31,000
Right?

151
00:07:31,000 --> 00:07:32,000
So all those things are there.

152
00:07:32,000 --> 00:07:39,000
And see how easily we are able to get the POS tags, right, with the help

153
00:07:39,000 --> 00:07:39,000
of NLTK.

154
00:07:39,000 --> 00:07:41,000
This is an amazing thing.

155
00:07:41,000 --> 00:07:44,000
And here, with respect to all the sentences, you will be able to see this okay?

156
00:07:44,000 --> 00:07:47,000
And that is how powerful NLTK is.

157
00:07:47,000 --> 00:07:50,000
Let me just show you the assignment that I had given you.

158
00:07:50,000 --> 00:07:50,000
Right.

159
00:07:50,000 --> 00:07:51,000
So this was the sentence over here.

160
00:07:51,000 --> 00:07:56,000
I'm just going to copy this, and I'm just going to take this entirely, okay.

161
00:07:56,000 --> 00:07:58,000
And let's see whether we'll be able to do it.

162
00:07:58,000 --> 00:08:05,000
So I'm just going to use nltk.pos_tag_sents.

163
00:08:05,000 --> 00:08:06,000
Now let's see, with respect to sents

164
00:08:06,000 --> 00:08:08,000
also, whether we are able to do this or not.

165
00:08:08,000 --> 00:08:13,000
So I'm just going to put this Taj Mahal is a beautiful monument and I'm going to execute this.

166
00:08:13,000 --> 00:08:15,000
Now here you can see 'T', okay?

167
00:08:15,000 --> 00:08:20,000
Now in this particular case, what has happened is that each and every character has been considered over here.

168
00:08:20,000 --> 00:08:21,000
Right.

169
00:08:21,000 --> 00:08:22,000
This should not basically happen.

170
00:08:22,000 --> 00:08:25,000
So what I'm actually going to do is use pos_tag.

171
00:08:25,000 --> 00:08:27,000
Let's see what is the output with respect to this.

172
00:08:27,000 --> 00:08:29,000
So here also the same thing is coming.

173
00:08:29,000 --> 00:08:36,000
So the simple way in this particular scenario is that again I'll put a for loop: for i in.

174
00:08:37,000 --> 00:08:43,000
Okay, I'll say for i in, and inside these quotes I will just put the sentence.

175
00:08:44,000 --> 00:08:47,000
I'll go through each and every word over here, okay?

176
00:08:49,000 --> 00:08:53,000
And I can write .split() on this.

177
00:08:53,000 --> 00:08:58,000
Because this is my string, okay?

178
00:08:58,000 --> 00:09:03,000
And if I use .split(), you'll be able to see what I'm getting:

179
00:09:03,000 --> 00:09:05,000
Taj, Mahal, is, a, beautiful, monument.

180
00:09:05,000 --> 00:09:05,000
Right.

181
00:09:05,000 --> 00:09:11,000
So for i in this, and here you can see that I'm iterating through each and every thing, right,

182
00:09:11,000 --> 00:09:12,000
each and every word.

183
00:09:12,000 --> 00:09:17,000
So I'm just going to print nltk.pos_tag.

184
00:09:17,000 --> 00:09:19,000
And I am actually going to use this.

185
00:09:19,000 --> 00:09:25,000
So here you can just go ahead and write this specific i, or instead of writing i,

186
00:09:25,000 --> 00:09:27,000
I'll just write word.

187
00:09:27,000 --> 00:09:28,000
Okay.

188
00:09:28,000 --> 00:09:31,000
For word in this, and I'll take nltk.pos_tag.

189
00:09:31,000 --> 00:09:34,000
I'm just going to use this specific word here.

190
00:09:34,000 --> 00:09:38,000
Still, there is an issue, because I am iterating through each and every thing.

191
00:09:38,000 --> 00:09:44,000
So if I probably get this entire thing, uh, with respect to the list, you'll be able to see.

192
00:09:44,000 --> 00:09:46,000
Okay, uh, let's see, let's see, let's see.

193
00:09:46,000 --> 00:09:48,000
There's something again.

194
00:09:48,000 --> 00:09:49,000
I'm not going to delete it.

195
00:09:49,000 --> 00:09:52,000
I'm just going to see what is the word over here.

196
00:09:52,000 --> 00:09:53,000
So I'm getting this okay.

197
00:09:53,000 --> 00:10:04,000
And if I take this nltk.pos_tag with respect to this specific word.

198
00:10:05,000 --> 00:10:06,000
Uh, okay.

199
00:10:06,000 --> 00:10:07,000
This is the problem.

200
00:10:07,000 --> 00:10:07,000
Let's see.

201
00:10:07,000 --> 00:10:09,000
What did I do over here?

202
00:10:09,000 --> 00:10:12,000
Uh, I need to provide a list of words.

203
00:10:12,000 --> 00:10:12,000
Okay.

204
00:10:12,000 --> 00:10:14,000
And then I'll be applying this.

205
00:10:14,000 --> 00:10:14,000
Okay.

206
00:10:14,000 --> 00:10:15,000
This is the problem.

207
00:10:15,000 --> 00:10:15,000
Okay?

208
00:10:15,000 --> 00:10:20,000
Now, all I have to do is not give it word by word.

209
00:10:20,000 --> 00:10:24,000
If I give it word by word, here you will be able to see that it gives a tag for every single character,

210
00:10:24,000 --> 00:10:25,000
right?

211
00:10:25,000 --> 00:10:27,000
I really need to provide the list of words.

212
00:10:27,000 --> 00:10:33,000
So if I probably just copy this part and put it over here and remove all these things.

213
00:10:33,000 --> 00:10:38,000
Now what I have actually done after this error, see, I'm not editing this particular video with respect

214
00:10:38,000 --> 00:10:40,000
to this error because you really need to see all these errors.

215
00:10:40,000 --> 00:10:43,000
In short, when I give it like this, I'll be passing a list of words.

216
00:10:43,000 --> 00:10:46,000
So if I execute this now, you can see that for Taj

217
00:10:46,000 --> 00:10:54,000
it is showing NNP, Mahal NNP, is VBZ, a DT, beautiful JJ, monument NN. Now see, what did I do over here?

218
00:10:54,000 --> 00:10:55,000
For pos_tag,

219
00:10:55,000 --> 00:11:00,000
basically, whatever parameter we give, we should give it in the form of a list of words.

220
00:11:00,000 --> 00:11:03,000
And if we do that, we will be getting the expected answer.

221
00:11:03,000 --> 00:11:03,000
Right.
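The mistake above comes down to what Python iterates over: a tagger that loops over its input sees characters when given a string, and words when given a list. A sketch in plain Python, where items_seen is a hypothetical stand-in for that loop and not an NLTK function:
```python
sentence = "Taj Mahal is a beautiful monument"
def items_seen(tokens):
    """Hypothetical stand-in: list the items a tagger would loop over."""
    return [token for token in tokens]
print(items_seen(sentence)[:3])      # ['T', 'a', 'j'] because a string is iterated character by character
print(items_seen(sentence.split()))  # ['Taj', 'Mahal', 'is', 'a', 'beautiful', 'monument']
# With NLTK installed, the working call is therefore nltk.pos_tag(sentence.split()).
```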

222
00:11:04,000 --> 00:11:08,000
So guys, I hope you are able to understand parts of speech tagging.

223
00:11:08,000 --> 00:11:13,000
I definitely got some errors, but I really wanted to show you all the errors that we are getting and

224
00:11:13,000 --> 00:11:15,000
based on that, how you can solve this, right?

225
00:11:15,000 --> 00:11:17,000
That is the main purpose.

226
00:11:17,000 --> 00:11:20,000
So the error part, I'm not going to edit it, I'm going to keep it like that.

227
00:11:20,000 --> 00:11:22,000
So please make sure that you practice.

228
00:11:22,000 --> 00:11:23,000
And yes, this was it.

229
00:11:23,000 --> 00:11:25,000
I will see you all in the next video.

230
00:11:25,000 --> 00:11:25,000
Thank you.