WEBVTT

00:01.040 --> 00:01.600
Hey, everyone.

00:01.670 --> 00:06.200
So the next thing we are going to do is sentence tokenization.

00:06.860 --> 00:14.570
So let me segment all the sentences, and then we'll try to find the score of the individual sentences

00:14.630 --> 00:16.660
based on the word frequency counter

00:16.880 --> 00:17.870
we have created.

00:19.630 --> 00:21.950
So let's say, for sent.

00:22.450 --> 00:30.490
We have already done this in the sentence segmentation video, so that will be sent, let's say,

00:30.890 --> 00:32.950
so there will be a list comprehension.

00:33.010 --> 00:33.950
I hope that's clear.

00:35.030 --> 00:41.300
And every single sentence I will keep as a list of sentences.

00:42.390 --> 00:45.240
So I can say, like, sent

00:46.570 --> 00:48.810
underscore, let's say, tokens.

00:49.720 --> 00:51.100
Okay, let me print

00:53.680 --> 00:55.000
sent_tokens.

00:56.540 --> 00:58.760
All right, so the individual sentences,

00:59.450 --> 01:03.740
you can see, are each one single value in the list.
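
NOTE
A minimal sketch of the list-comprehension step described above. The segmenter used in the video is not shown in this section, so a naive regex splitter (an assumption) stands in for it, and the sample text is invented for illustration. The name sent_tokens follows the spoken variable name.
```python
import re
# Invented sample text, not the article used in the video.
text = "Tennis is hard. Practice helps a lot! Does rest matter? Yes."
# Hypothetical stand-in segmenter: split after ., ! or ? followed by whitespace.
segments = re.split(r"(?<=[.!?])\s+", text.strip())
# List comprehension keeping every non-empty sentence as one element of the list.
sent_tokens = [sent for sent in segments if sent]
print(sent_tokens)
```
Each element of the printed list is one whole sentence, which is what the scoring step later uses as a dictionary key.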

01:06.440 --> 01:11.340
Now we will iterate over every single sentence.

01:11.840 --> 01:17.090
And we will try to provide a score for every single sentence.

01:17.750 --> 01:21.020
And for that, let's create a dictionary, the same way as before.

01:21.350 --> 01:26.750
So it will be sentence_score, let's say, and one empty dictionary.

01:27.230 --> 01:35.030
And this dictionary will hold the individual sentences as keys, with the values corresponding to those

01:35.200 --> 01:36.060
sentence scores.

01:36.650 --> 01:43.160
And in the next video, from the next processing onwards, we will try to grab the 30 percent of sentences with the

01:43.250 --> 01:45.010
maximum sentence scores.

01:45.580 --> 01:47.420
All right, let me execute it.

01:49.020 --> 01:57.690
And first, let's iterate over every single sentence, so: for sent in sent_tokens.

01:59.620 --> 02:02.230
We are going to iterate over every individual word.

02:03.520 --> 02:05.740
For word in

02:07.410 --> 02:07.670
sent.

02:09.410 --> 02:11.240
And let us print all those words.

02:11.520 --> 02:12.170
Print the word.

02:13.870 --> 02:17.800
So these are all the words now, from the individual sentences.

02:18.760 --> 02:19.150
All right.

02:19.780 --> 02:25.050
The next thing is, based on the individual word frequency counter,

02:25.240 --> 02:28.780
we are going to give a score to the individual sentences.

02:30.370 --> 02:31.660
So now for each word.

02:31.840 --> 02:37.570
First of all, we will check whether that word appears in the dictionary or not.

02:38.380 --> 02:44.140
There is a 100 percent chance that it will appear, because we grabbed these words from the same article.

02:44.170 --> 02:51.520
Only, it may happen that this automated text summarization will be built upon, let's say, a million

02:51.520 --> 02:53.350
different articles, like for the training.

02:53.920 --> 02:58.210
And when you do that kind of testing, you will encounter some different articles.

02:58.650 --> 03:03.300
And that's why we are going to do these checks. For our case,

03:03.340 --> 03:09.840
it is not required, because we are doing the training and, you can say, the testing, or the text summarization,

03:09.880 --> 03:10.430
the automated one,

03:10.580 --> 03:12.430
on the same article.

03:12.730 --> 03:17.020
So obviously all those words will definitely appear in this word frequency

03:17.020 --> 03:28.330
counter. So: if word.text.lower() in word_frequency

03:29.050 --> 03:34.990
dot keys. If it appears inside the keys, only then will we proceed.

03:35.350 --> 03:35.470
Okay.

03:36.730 --> 03:41.520
Because only in that case will we have the score associated with the individual word.

03:43.250 --> 03:48.140
Now, let's first check whether that sentence appears in sentence_score or not.

03:48.230 --> 03:54.020
So: if sent in sentence_score

03:57.650 --> 03:58.630
dot keys.

03:59.640 --> 04:06.530
So the first time, when a sentence has not been assigned any score, that particular key definitely

04:06.800 --> 04:07.790
doesn't exist.

04:08.150 --> 04:15.170
And if that particular sentence already appears in sentence_score, it has already been assigned some score.

04:15.740 --> 04:20.030
And in that case, we are just going to update that score based on the different words.

04:21.120 --> 04:24.220
So instead of "sent in", we are going to first

04:24.490 --> 04:27.990
look at whether the sentence does not appear, so that is the first condition.

04:29.250 --> 04:37.930
We will do it like this, and we are going to use sentence_score[sent].

04:38.040 --> 04:41.280
So, the individual sentence inside this sentence_score.

04:42.730 --> 04:48.500
We will give it some weightage, and that weightage will be associated with the word.

04:49.120 --> 04:50.900
So that will be the word.

04:51.730 --> 04:54.980
And that particular word's score,

04:55.390 --> 04:57.580
we will get it from word_frequency.

04:57.910 --> 05:01.420
So from word_frequency, we will get that score.

05:02.970 --> 05:07.430
Let's just make it word.text.lower().

05:08.470 --> 05:08.850
All right.

05:09.410 --> 05:17.030
And suppose the key already appears inside; then we are just going to update it, so: else.

05:18.960 --> 05:20.560
Let me select it,

05:23.820 --> 05:25.320
and let me put a plus here.

05:26.250 --> 05:29.490
So if the sentence already appears in sentence_score,

05:30.560 --> 05:36.010
in that case, it is just going to update that sentence's score based on whatever word appears,

05:36.110 --> 05:39.930
and on what score is associated with that word.

05:40.970 --> 05:45.620
Let me run it, and it will fill up this sentence_score dictionary for us.

05:47.190 --> 05:51.810
All right, so then let's try to display sentence_score.

05:53.760 --> 05:54.810
Let me print it.
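
NOTE
The scoring loop described above can be sketched roughly as follows. This is an illustration under assumptions, not the exact notebook code: the names word_frequency, sent_tokens and sentence_score follow the spoken names, the sample data is invented, and a naive regex word split stands in for the tokenizer used in the video (so word.lower() replaces the spoken word.text.lower()).
```python
import re
# Hypothetical inputs; in the video these come from the earlier steps.
word_frequency = {"tennis": 1.0, "player": 0.5, "wins": 0.5}
sent_tokens = ["The tennis player wins.", "Tennis is fun.", "Nothing here."]
sentence_score = {}
for sent in sent_tokens:
    # Naive word split standing in for the tokenizer used in the video.
    for word in re.findall(r"[A-Za-z']+", sent):
        key = word.lower()
        # Only words that have a frequency entry contribute to the score.
        if key in word_frequency.keys():
            if sent not in sentence_score.keys():
                # First scored word for this sentence: create the key.
                sentence_score[sent] = word_frequency[key]
            else:
                # Sentence already has a score: add this word's weight.
                sentence_score[sent] += word_frequency[key]
print(sentence_score)
```
A sentence with no word found in word_frequency never gets a key, which is why the membership check matters once the text differs from the one the frequencies were built on.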

05:59.430 --> 05:59.900
All right.

05:59.950 --> 06:04.800
So you can see the very first sentence, "Maria Sharapova has basically..."

06:05.560 --> 06:09.180
It has its score, like zero point eight three three eight.

06:09.900 --> 06:10.320
All right.

06:10.660 --> 06:12.720
In the same way, for the second sentence,

06:13.620 --> 06:15.430
The score is one point lately.

06:15.910 --> 06:22.720
Now, what we are going to do is try to grab the sentences having the maximum scores.

06:22.730 --> 06:26.740
So, the top 30 percent of all the sentences, those having the maximum values.

06:27.220 --> 06:29.850
And those particular sentences we will just combine,

06:30.160 --> 06:33.550
and we will assign the result as the summary of our

06:33.570 --> 06:34.850
existing text.

06:35.650 --> 06:39.460
So getting those 30 percent of maximum-scoring sentences,

06:39.490 --> 06:40.960
we will see in the next video.
