WEBVTT

00:01.170 --> 00:02.670
Hello, everyone, and welcome back.

00:03.360 --> 00:11.310
So the next steps in text preprocessing, which we are going to learn in this video, are stemming and lemmatization.

00:12.180 --> 00:18.570
So lemmatization, you can say, is like a slightly upgraded version of stemming for further analysis of your

00:18.670 --> 00:20.370
text.

00:20.850 --> 00:26.460
Basically, stemming means finding the root word of your original word.

00:26.940 --> 00:36.170
So let's say we have words like playing, or maybe play, or played.

00:36.690 --> 00:43.260
So all those words you can simply convert into the root word, like play.

00:43.980 --> 00:51.680
So that is called stemming. Now, lemmatization is also much like stemming.

00:52.560 --> 00:59.820
But the basic difference between stemming and lemmatization is in how you convert all your

00:59.820 --> 01:03.660
words into their root or stem word.

01:04.110 --> 01:07.260
In the case of stemming, the output

01:07.750 --> 01:13.030
word will not necessarily be a part of your original dictionary, that is, a part of the vocabulary.

01:13.590 --> 01:17.160
So we will see some examples and you will get a better idea of

01:17.940 --> 01:21.000
what the basic difference between stemming and lemmatization is.

01:21.440 --> 01:28.320
Whereas in the case of lemmatization, whenever you convert your original word and find its lemma, whatever

01:28.320 --> 01:32.640
root word we get will always be a part of the vocabulary.
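
NOTE
The contrast described above can be sketched with a toy example. This is only an
illustration under stated assumptions: the suffix list, the lookup table, and the
function names naive_stem and toy_lemma are all hypothetical and are not how NLTK
or spaCy work internally. It shows that rule-based suffix chopping can produce a
non-word, while a vocabulary lookup always returns a vocabulary entry.
```python
# Toy illustration only: hypothetical rules and data, not a real stemmer or lemmatizer.
SUFFIXES = ("ing", "ed", "ly", "s")  # hypothetical suffix list
TOY_LEMMAS = {"playing": "play", "played": "play", "plays": "play", "easily": "easily"}  # hypothetical lookup table
def naive_stem(word):
    """Chop the first matching suffix, with no check that the result is a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word
def toy_lemma(word):
    """Look the word up, so the answer is always an entry from the toy vocabulary."""
    return TOY_LEMMAS.get(word, word)
for word in ["playing", "played", "plays", "easily"]:
    # Columns: original word, chopped stem, looked-up lemma.
    print(word, naive_stem(word), toy_lemma(word))
```
For "easily" the chopped stem is "easi", which is not a dictionary word, while the
lookup keeps "easily".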

01:33.300 --> 01:38.970
Now, the spaCy library doesn't have any kind of stemming algorithm implemented.

01:39.690 --> 01:43.040
Instead, they have only a lemmatization implementation.

01:43.500 --> 01:46.660
So what we will do in this video is use...

01:46.770 --> 01:51.390
Let me just make it to the last line.

01:53.610 --> 01:55.470
So playing play played.

01:58.160 --> 01:59.410
Everything will become play.

02:00.170 --> 02:00.560
All right.

02:00.980 --> 02:05.090
So for stemming, two algorithms are available that are a part of the

02:05.170 --> 02:08.210
NLTK library, not a part of the spaCy library.

02:08.960 --> 02:10.370
One is the Porter stemmer,

02:10.730 --> 02:12.910
and the other one is the Snowball stemmer.

02:14.360 --> 02:21.710
So let's say we have these words available, and for these words, with the help of the NLTK library's

02:21.830 --> 02:24.410
Porter stemmer and Snowball stemmer,

02:24.560 --> 02:28.590
we are going to find the root word, like run, right,

02:28.640 --> 02:29.270
not running.

02:29.770 --> 02:30.060
Right.

02:30.560 --> 02:34.390
So all of these should convert to one base word.

02:35.390 --> 02:40.340
So let's see what result we get after applying the Porter stemmer

02:40.740 --> 02:42.310
and the Snowball stemmer.

02:43.310 --> 02:46.810
So first, let me import nltk.

02:48.350 --> 02:52.670
Next, let's import from nltk.stem

02:56.480 --> 02:57.410
.porter.

02:59.000 --> 03:00.050
We are going to import

03:03.690 --> 03:04.370
PorterStemmer.

03:04.970 --> 03:08.180
And there is another stemmer that's also available.

03:08.570 --> 03:10.130
That is the Snowball stemmer.

03:10.190 --> 03:14.960
So it will be from nltk

03:16.070 --> 03:17.500
.stem

03:18.760 --> 03:24.620
.snowball, let's import SnowballStemmer.

03:25.600 --> 03:28.170
Now let's create an object of both classes.

03:28.660 --> 03:31.270
So the first will be a PorterStemmer.

03:33.730 --> 03:35.590
Let me name it p_

03:36.700 --> 03:39.390
stemmer, and the other one is

03:41.480 --> 03:43.130
s_stemmer.

03:45.700 --> 03:48.050
So that will be SnowballStemmer.

03:53.770 --> 03:59.150
Now, here we need to supply the language for which you are going to apply this Snowball stemmer.

03:59.590 --> 04:01.510
So let's add the language, like English.

04:06.070 --> 04:12.200
And let me execute it, and you will be able to see we created two objects: p_stemmer

04:12.470 --> 04:14.170
and s_stemmer.

04:14.550 --> 04:19.010
Now, let's apply this stemming process on all these words.

04:19.470 --> 04:21.600
So let me loop over every single word.

04:21.650 --> 04:26.580
So let's have for word in words.

04:27.250 --> 04:30.380
Let's print word

04:32.020 --> 04:32.500
plus

04:34.270 --> 04:43.540
the stem of this individual word, so we are going to use p_stemmer.stem

04:44.500 --> 04:46.530
and pass in this word.

04:47.320 --> 04:48.530
And let me execute it.

04:48.540 --> 04:53.100
And before that, let me add some differentiating factor between them.

04:54.170 --> 04:54.660
All right.

04:55.680 --> 04:58.110
So if you execute it, this is the output of the

04:59.060 --> 05:01.490
Porter stemmer: run becomes run.

05:02.390 --> 05:03.440
Then we have runner.

05:03.500 --> 05:08.250
So that becomes runner itself, running becomes run, and ran becomes ran.

05:08.580 --> 05:11.690
Runs becomes run, easily becomes easili.

05:11.900 --> 05:13.550
So this kind of thing.

05:13.610 --> 05:19.850
Now, I don't think that this easili and this fairli, or whatever it has reduced them to, is the best form, or

05:19.910 --> 05:23.800
I would say a good form; it might not be a real word,

05:23.810 --> 05:26.090
so it may not be valid in a dictionary.
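
NOTE
The Porter stemmer steps narrated so far can be sketched as follows. This is a
minimal sketch, assuming NLTK is installed; the word list is inferred from the
results read out in the video, and the " => " separator is an arbitrary choice
rather than the one typed on screen.
```python
from nltk.stem.porter import PorterStemmer
# Word list assumed from the results read out in the video.
words = ["run", "runner", "running", "ran", "runs", "easily", "fairly"]
# Create the Porter stemmer object, named p_stemmer as in the narration.
p_stemmer = PorterStemmer()
for word in words:
    # " => " is just a visual separator between the word and its stem.
    print(word + " => " + p_stemmer.stem(word))
```
The last two lines of output, easili and fairli, are the stems that are not
dictionary words.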

05:26.370 --> 05:27.980
Well, let's apply the same thing

05:28.130 --> 05:30.110
with the Snowball stemmer.

05:30.200 --> 05:32.420
So another stemming algorithm is available.

05:33.140 --> 05:35.180
So instead of p_stemmer,

05:37.440 --> 05:43.210
let me make it s_stemmer, and let's see whether we get any difference

05:43.260 --> 05:43.650
or not.

05:44.560 --> 05:44.940
All right.

05:44.970 --> 05:46.080
So run becomes

05:46.080 --> 05:46.350
run.

05:46.450 --> 05:48.690
Runner, run, ran, run.

05:48.720 --> 05:50.760
So the first few are the same.

05:50.790 --> 05:52.190
Everything remains the same.

05:52.680 --> 05:53.280
Even for easily.

05:53.450 --> 05:54.980
So we got the same result.

05:55.350 --> 05:59.670
But fairly has been reduced not to fairli; instead it is reduced to fair.

05:59.720 --> 06:01.080
So that looks better.

06:01.590 --> 06:06.070
So the Snowball stemmer works a little better compared to the Porter

06:06.300 --> 06:06.720
stemmer.
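
NOTE
A side-by-side comparison of the two stemmers might look like this. It is a
minimal sketch, assuming NLTK is installed; the word list is again inferred from
the narration, and the column layout is an illustrative choice.
```python
from nltk.stem.porter import PorterStemmer
from nltk.stem.snowball import SnowballStemmer
p_stemmer = PorterStemmer()
# The Snowball stemmer needs the language it should be applied to.
s_stemmer = SnowballStemmer(language="english")
# Word list assumed from the results read out in the video.
words = ["run", "runner", "running", "ran", "runs", "easily", "fairly"]
for word in words:
    # Columns: original word, Porter stem, Snowball stem.
    print(word, p_stemmer.stem(word), s_stemmer.stem(word))
```
The two stemmers agree on every word in this list except fairly, where Porter
gives fairli and Snowball gives fair.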

06:07.260 --> 06:09.590
So that is the whole story behind

06:09.760 --> 06:13.140
stemming. Next is lemmatization, and for lemmatization

06:13.350 --> 06:15.090
we are going to use the

06:16.230 --> 06:18.650
spaCy library, and lemmatization we will see

06:18.930 --> 06:19.440
in the next video.
