WEBVTT

00:01.700 --> 00:02.600
All right, everyone.

00:02.660 --> 00:09.560
So the next topic we're going to learn in this NLP section is sentence segmentation.

00:10.130 --> 00:11.180
So for illustration

00:11.210 --> 00:12.260
purposes, I have kept here

00:12.320 --> 00:19.760
two strings, and each of these strings contains multiple sentences, like "This is a sentence."

00:20.000 --> 00:20.990
"This is second sentence."

00:21.050 --> 00:22.430
And "This is last sentence."

00:22.910 --> 00:25.920
The only difference between these two strings

00:26.060 --> 00:32.320
is that the sentences in the one above and the one below are separated by two different separators.

00:32.810 --> 00:36.490
And let's see how we can handle that with the help of spaCy.

00:37.070 --> 00:38.540
So let me execute this one.

00:39.590 --> 00:42.740
And let's import our model.

00:43.880 --> 00:51.890
Now we are going to apply s1 and s2 one by one to our model; first, let's say, s1.

00:52.840 --> 00:55.370
And it will create a doc1 object.

00:56.950 --> 00:57.310
Now,

00:59.670 --> 01:05.670
to get the individual sentences, and to see whether spaCy is able to identify the sentences or not,

01:05.820 --> 01:09.840
we can use doc1.sents.

01:10.840 --> 01:13.600
And if you execute it, you'll be able to see

01:13.650 --> 01:15.050
that it is a kind of generator.

01:16.740 --> 01:18.350
So let's just iterate through it:

01:21.760 --> 01:22.650
for sent in doc1.sents,

01:25.360 --> 01:26.410
let me print

01:28.850 --> 01:29.170
sent

01:30.890 --> 01:32.340
dot text.

01:34.490 --> 01:38.710
And you'll be able to see the first sentence is "This is a sentence."

01:39.170 --> 01:40.490
This is second sentence.

01:40.700 --> 01:42.650
And this is last sentence.
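
NOTE
A minimal sketch of the steps narrated above, assuming spaCy v3 is installed.
So that it runs without downloading a model, it swaps in a blank English
pipeline with the rule-based "sentencizer" in place of the trained model loaded
in the video. The text of s1 is reconstructed from the narration.
```python
import spacy
# Assumption: a blank pipeline plus the rule-based sentencizer stands in for the trained model.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")
s1 = "This is a sentence. This is second sentence. This is last sentence."
doc1 = nlp(s1)
# doc1.sents is a generator, so iterate over it to reach each sentence span.
sentences = [sent.text for sent in doc1.sents]
for text in sentences:
    print(text)
```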

01:43.610 --> 01:46.150
Now, let's make a little twist here.

01:47.280 --> 01:55.150
From this s1, I'm just going to create a new s3, and let's just keep something here:

01:55.670 --> 01:57.900
here, "U.K."

01:58.340 --> 02:05.450
So this "U.K." is kind of a catch here, just because here there is a dot.

02:05.630 --> 02:11.120
So let's see whether spaCy is able to identify, with that dot, whether or not the sentence is ending

02:11.210 --> 02:12.910
and the next sentence starts.

02:13.550 --> 02:18.500
So let me run it for doc3,

02:18.980 --> 02:20.150
on s3.

02:20.770 --> 02:22.960
Just execute it.

02:23.910 --> 02:25.320
And let's see how it goes.

02:27.230 --> 02:29.490
All right, let me run it.

02:31.500 --> 02:33.120
Oops, s3 is not defined.

02:35.600 --> 02:36.000
All right.

02:36.050 --> 02:36.770
So you can see

02:37.990 --> 02:43.130
spaCy is able to understand this "U.K."

02:43.210 --> 02:47.500
That means, for this "U.K.", even though there is a dot after it, the sentence

02:47.530 --> 02:48.640
doesn't end here.

02:49.030 --> 02:52.210
Instead, the sentence is ending at this particular location.

02:52.630 --> 02:57.820
So that is the kind of intelligence spaCy has, based on this particular model

02:57.910 --> 02:58.480
that we loaded.

02:59.020 --> 03:02.470
Now, let's apply it to s2, because for s2 we don't have a dot.

03:02.680 --> 03:04.750
Instead of that, we have a semicolon.

03:05.320 --> 03:08.400
So I am going to use it for s2.

03:11.700 --> 03:14.340
So the same model, we're going to apply to this s2.

03:17.200 --> 03:18.590
And let's run it.

03:19.360 --> 03:23.830
And you'll be able to see it has identified the whole string as just one sentence,

03:23.890 --> 03:27.870
whereas we had multiple sentences available.

03:29.650 --> 03:37.690
So how can we add such a custom rule and explain to spaCy that this is the end of this particular

03:37.750 --> 03:40.240
sentence, not only a dot?

03:40.870 --> 03:42.970
Here we have the end of the sentence,

03:43.000 --> 03:44.500
and a new sentence starts here.

03:45.040 --> 03:50.260
And for that, we can do it with the help of one custom function.

03:50.290 --> 03:54.850
So I have already created this function, set_custom_boundaries.

03:55.360 --> 03:55.840
So what

03:55.960 --> 03:57.130
are we going to do here?

03:57.250 --> 03:59.630
We are going to pass the document object.

04:00.520 --> 04:05.200
And that will be our doc2 in our case. We are going to iterate over every single token.

04:05.290 --> 04:09.060
Let me print this s2 once more.

04:10.800 --> 04:15.660
So you can see the first two sentences are separated by a semicolon, apart from the last sentence.

04:16.410 --> 04:20.050
So I'm just going to leave out the last token;

04:20.190 --> 04:21.150
that will be the dot.

04:21.780 --> 04:24.990
And we are going to iterate over every other token.

04:25.440 --> 04:29.400
And whenever I encounter a semicolon,

04:30.000 --> 04:33.580
just set is_sent_start to True.

04:33.990 --> 04:37.170
That means, for the token at i plus one,

04:37.530 --> 04:40.050
we will just set it to True,

04:40.440 --> 04:45.990
so whatever new token comes after this semicolon,

04:46.850 --> 04:48.230
let's set it to True:

04:48.430 --> 04:50.880
that is the start of a new sentence.

04:51.240 --> 04:53.160
And eventually we will return the doc.

04:53.610 --> 05:01.410
So it is just giving spaCy the information that, whenever you encounter this semicolon,

05:01.740 --> 05:03.990
whatever token you get after that,

05:04.440 --> 05:10.380
you consider that particular token as the start of a new sentence.
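
NOTE
A minimal sketch of the rule described above, assuming spaCy v3, where a custom
component is registered with the @Language.component decorator. The function
name follows the narration; the body and the text of s2 are reconstructed from
it, and a blank English pipeline is an assumption that avoids a model download.
```python
import spacy
from spacy.language import Language
# Assumption: spaCy v3 registration style; the name follows the narration.
@Language.component("set_custom_boundaries")
def set_custom_boundaries(doc):
    # Skip the last token, because no token follows it.
    for token in doc[:-1]:
        if token.text == ";":
            # Mark the token after the semicolon as the start of a new sentence.
            doc[token.i + 1].is_sent_start = True
    return doc
# Assumption: a tokenizer-only pipeline keeps the sketch self-contained.
nlp = spacy.blank("en")
nlp.add_pipe("set_custom_boundaries")
s2 = "This is a sentence; This is second sentence; This is last sentence."
doc2 = nlp(s2)
sentences = [sent.text for sent in doc2.sents]
print(sentences)
```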

05:10.770 --> 05:11.190
All right.

05:11.490 --> 05:15.270
Now, how will we apply this to a document object? Before that,

05:15.360 --> 05:21.990
let me display this first: nlp.pipe_names.

05:22.950 --> 05:27.880
So when we apply this nlp model, whatever we have loaded,

05:28.020 --> 05:33.000
on our document object, or I would say on a string,

05:33.330 --> 05:35.190
eventually it will create a document object.

05:35.850 --> 05:44.160
It will go through all those steps: it will first do the tagging, then parsing, and then named entity

05:44.280 --> 05:44.930
recognition.

05:45.420 --> 05:53.910
So at which particular phase of this nlp pipeline do you want to apply this custom boundary setting?

05:54.330 --> 06:02.180
So we can call nlp.add_pipe: first, what action you want to take.

06:02.580 --> 06:04.760
So that will be set_custom_boundaries.

06:05.930 --> 06:10.880
That means which component you want to apply and where you want to apply it.

06:10.970 --> 06:15.760
So let's say we want to apply it before, let's say, the parser.

06:17.090 --> 06:18.650
So before parsing,

06:18.650 --> 06:23.180
and after tagging, it will be applied. If you just execute this,

06:23.500 --> 06:25.580
and, oops,

06:26.680 --> 06:28.010
I have to create it.

06:28.390 --> 06:34.030
So it will be in a sequential manner, and you will be able to see that now, in nlp.pipe_names,

06:34.090 --> 06:36.280
we have multiple components:

06:36.310 --> 06:40.420
one more component, apart from the tagger: set_custom_boundaries.
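
NOTE
A hedged sketch of placing the component before the parser, assuming spaCy v3,
where add_pipe takes the registered component name. Older spaCy v2 code passed
the function object itself to nlp.add_pipe. The untrained "parser" on a blank
pipeline is only a placeholder to show ordering; a loaded model already has one.
```python
import spacy
from spacy.language import Language
@Language.component("set_custom_boundaries")
def set_custom_boundaries(doc):
    for token in doc[:-1]:
        if token.text == ";":
            doc[token.i + 1].is_sent_start = True
    return doc
# Assumption: placeholder pipeline; the parser here is untrained and is never run on text.
nlp = spacy.blank("en")
nlp.add_pipe("parser")
names_before = list(nlp.pipe_names)
# Insert the custom component so that it runs before the parser.
nlp.add_pipe("set_custom_boundaries", before="parser")
names_after = list(nlp.pipe_names)
print(names_before)
print(names_after)
```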

06:40.750 --> 06:43.750
Now, let's apply this nlp once again.

06:44.530 --> 06:47.130
So nlp on s2.

06:48.070 --> 06:49.390
Let me name it

06:49.610 --> 06:53.360
doc_2, and let's iterate right away.

06:56.680 --> 06:58.120
So this one will be doc_2

06:58.480 --> 07:01.840
instead of doc1. Let's go ahead and execute it.

07:04.160 --> 07:04.430
All right.

07:04.470 --> 07:11.820
So 4 and 9 are displayed just because I am printing token.i; that means at that

07:11.820 --> 07:16.740
particular location, token 4 and token 9, there is a

07:17.810 --> 07:18.730
semicolon token.

07:19.160 --> 07:21.890
So you can see zero, one, two, three.

07:21.950 --> 07:22.510
And four.

07:22.540 --> 07:24.560
The fourth one is a semicolon token.

07:24.980 --> 07:26.900
Five, six, seven, eight and nine.

07:26.930 --> 07:28.590
Nine is also a semicolon token.

07:28.970 --> 07:30.990
So that's why 4 and 9 are displayed.
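
NOTE
A small sketch for checking the token positions counted above, assuming spaCy v3
and its default English tokenizer on a blank pipeline. The text of s2 is
reconstructed from the narration.
```python
import spacy
# Assumption: only the default English tokenizer is needed to inspect positions.
nlp = spacy.blank("en")
s2 = "This is a sentence; This is second sentence; This is last sentence."
doc = nlp(s2)
# Print each token index next to its text.
for token in doc:
    print(token.i, token.text)
semicolon_positions = [token.i for token in doc if token.text == ";"]
print(semicolon_positions)
```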

07:31.490 --> 07:35.830
Apart from that, it is able to identify that this is sentence one,

07:36.500 --> 07:43.640
then a new sentence starts, just because we made this custom rule: whenever you encounter this

07:43.850 --> 07:47.750
semicolon, the next token after this semicolon

07:48.020 --> 07:51.590
can be considered the start of a new sentence.

07:52.340 --> 07:57.610
And it has perfectly identified all three sentences and segmented them all right.

07:57.710 --> 08:03.020
So that is the beauty of sentence segmentation in spaCy:

08:03.350 --> 08:04.880
first we gave it very simple sentences.

08:05.330 --> 08:08.450
Then we gave it one little twist here,

08:08.840 --> 08:11.000
but spaCy is smart enough to identify that.

08:11.480 --> 08:16.160
Then we gave it a different kind of punctuation mark at the end of the sentence,

08:16.190 --> 08:21.230
and spaCy didn't understand it, so we created our custom rule and added it to the

08:21.730 --> 08:22.610
NLP pipeline.

08:23.120 --> 08:23.510
All right.

08:23.540 --> 08:26.090
So that is all about sentence segmentation.
