WEBVTT

00:02.220 --> 00:03.160
All right, everyone.

00:03.760 --> 00:06.790
So let's continue our discussion on text summarization.

00:07.390 --> 00:13.390
And in this video, first, we are going to create the word frequency counts from this particular article.

00:13.720 --> 00:16.840
How many times each individual word appears.

00:17.230 --> 00:18.970
So let's say, how many times one word will appear,

00:19.000 --> 00:21.230
how many times "Maria" will appear.

00:21.280 --> 00:21.940
How many times.

00:22.180 --> 00:29.020
So basically we are going to create one dictionary, which will hold all those keys as the individual

00:29.020 --> 00:31.200
words, and the values as those

00:31.330 --> 00:32.020
counters, that is,

00:32.020 --> 00:34.620
how many times those particular words appear.

00:35.380 --> 00:38.170
So let me create one dictionary object.

00:38.220 --> 00:39.490
A blank dictionary object.

00:39.520 --> 00:40.360
That will be word frequency,

00:42.100 --> 00:42.540
Let's say.

00:44.670 --> 00:51.560
And let me try to parse all those words in our doc object.

00:53.560 --> 00:53.920
All right.

00:54.860 --> 00:57.350
Let me print all those words.

00:59.660 --> 01:00.110
All right.

01:00.980 --> 01:04.570
So these are the words we're going to work upon.

01:08.570 --> 01:10.800
Yes, that is starting from Maria Sharapova.

01:11.450 --> 01:18.460
Now, first thing, what we are going to do: we just try to find whether those words appear in the stop word list

01:19.040 --> 01:21.530
or are a punctuation mark or not.

01:21.980 --> 01:28.280
If a word doesn't appear there, then only we'll go ahead for further processing, because if it appears as a

01:28.280 --> 01:33.800
part of the punctuation marks or the stop word list, we just need to remove it, and we are not going to

01:33.800 --> 01:38.390
proceed ahead with further processing of that particular token.

01:39.550 --> 01:40.180
So if.

01:41.810 --> 01:46.460
let's say, word dot text dot lower

01:49.400 --> 01:56.390
not in, let's say, stopwords. So we can create here the stop word list from whatever we have imported

01:56.600 --> 02:00.510
earlier, so that will be the stop words.

02:01.250 --> 02:06.080
Let me assign it to stopwords as a list.

02:06.860 --> 02:12.380
So if word dot text dot lower is not in stopwords.

02:17.570 --> 02:24.810
Same like, if word dot text dot lower

02:26.800 --> 02:29.840
is not in the punctuation marks,

02:31.820 --> 02:35.040
then only we'll go ahead for further processing.

02:37.290 --> 02:44.150
So if that particular word is not a part of the stop words or the punctuation marks, we are going to put those

02:44.150 --> 02:46.780
words in the word frequency count.

02:47.460 --> 02:49.410
So, again, we'll just try to find.

02:50.130 --> 02:51.840
Let me put in a print statement

02:53.550 --> 02:58.830
as of now, and let's see whether it is able to remove all those stop words and punctuation marks or not.
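
NOTE
Editor's sketch of the filtering step described above, under stated assumptions.
The stop word set and the token list are made up for illustration; they stand in
for the imported stop word list and for the tokens of the doc object.
```python
import string
# Assumption: a tiny hand-written stop word set, not the imported list.
stopwords = {"the", "a", "is", "in", "of", "and"}
# Assumption: pre-split tokens stand in for the tokens of the doc object.
tokens = ["Maria", "Sharapova", "is", "a", "tennis", "player", "."]
# Keep a token only if its lowercase form is neither a stop word nor punctuation.
kept = [t for t in tokens
        if t.lower() not in stopwords and t.lower() not in string.punctuation]
print(kept)
```
Note that string.punctuation only covers ASCII punctuation, so characters such as
typographic quotes would pass through this check unless they are added by hand.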

03:00.370 --> 03:03.340
All right, so you can see Maria Sharapova.

03:03.370 --> 03:05.380
There is no dot, there is no comma.

03:07.800 --> 03:11.780
And you can see everything like limo loading coal, my.

03:12.240 --> 03:14.490
I mean, that single quote is not getting removed.

03:15.030 --> 03:20.910
So you can see that is not a part of, I think, the punctuation, maybe, this kind of character.

03:21.210 --> 03:21.670
All right.

03:22.410 --> 03:26.610
Let's not put much effort into updating our punctuation

03:26.610 --> 03:28.530
marks as of now. As an exercise,

03:28.530 --> 03:31.830
you can always go ahead and do that. Next,

03:32.490 --> 03:37.680
again, we'll try to find whether that particular word already exists in the word frequency or not.

03:38.040 --> 03:41.740
So first time, obviously, it will be a blank word frequency count.

03:42.720 --> 03:49.800
But as and when you fill up the word frequency, it may happen that you encounter some word

03:49.860 --> 03:52.170
that is already a part of this word frequency.

03:52.560 --> 04:04.800
So if this word dot text is not in word frequency dot keys, that is, it is not a part of the keys,

04:05.250 --> 04:11.220
then in that case, you can just add this key to the word frequency.

04:12.010 --> 04:12.260
Whoops.

04:17.700 --> 04:19.520
Word dot

04:20.210 --> 04:23.220
text is equal to one.

04:25.830 --> 04:26.330
Else.

04:31.670 --> 04:37.580
What we're going to do, if it already appears: we can just simply update the counter.

04:38.330 --> 04:39.290
So what happens here?

04:40.070 --> 04:44.660
Suppose this Maria appears two times in this whole article.

04:45.240 --> 04:47.110
So the first time it encounters that,

04:47.380 --> 04:50.360
it is not in the stop word list, it is not a punctuation mark.

04:50.390 --> 04:50.840
Let's go.

04:51.770 --> 04:54.590
It is not even a part of the word frequency keys.

04:55.220 --> 04:59.360
So in that case, we'll just add it to the word frequency list.

04:59.750 --> 05:07.130
having the key Maria, and its value will be one. Now, the next time this Maria appears again,

05:07.610 --> 05:09.050
again, it is not a stop word,

05:09.140 --> 05:10.370
it is not a punctuation mark.

05:10.730 --> 05:13.880
But now, in this case, it is a part of these keys.

05:14.210 --> 05:15.800
So what happened is that earlier

05:15.950 --> 05:17.900
we had only assigned this

05:17.900 --> 05:19.610
Maria as one.

05:20.000 --> 05:22.940
Now we are going to update this counter for the key

05:23.060 --> 05:25.670
Maria; we are going to add one more.

05:25.940 --> 05:27.620
So now Maria becomes two.
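
NOTE
Editor's sketch of the counting loop walked through above, under stated assumptions.
The stop word set and the token list are invented for illustration and stand in
for the imported stop word list and the tokens of the doc object.
```python
import string
# Assumption: a tiny hand-written stop word set, not the imported list.
stopwords = {"the", "a", "is", "in", "of", "and"}
# Assumption: made-up tokens in which "Maria" appears twice.
tokens = ["Maria", "plays", "tennis", ".", "Maria", "is", "a", "champion", "."]
word_frequency = {}
for token in tokens:
    # Skip stop words and punctuation marks before counting.
    if token.lower() in stopwords or token.lower() in string.punctuation:
        continue
    if token not in word_frequency.keys():
        # First occurrence: add the key with a count of one.
        word_frequency[token] = 1
    else:
        # Later occurrences: update the existing counter.
        word_frequency[token] += 1
print(word_frequency)
```
With these made-up tokens, "Maria" is added with a count of one on its first
occurrence and goes to two on its second, as in the walkthrough.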

05:28.160 --> 05:34.190
And if I just execute it, and let us print this word

05:36.280 --> 05:37.000
frequency,

05:38.110 --> 05:43.930
you can see this word appears one time, Maria appears one time, Sharapova one time, basically

05:43.930 --> 05:46.260
one time, tennis six times.

05:46.370 --> 05:47.350
Little six time.

05:48.310 --> 05:48.790
All right.

05:48.820 --> 05:56.580
So we got every single unique word that exists in this article, and each has been assigned some sort of counter,

05:56.580 --> 05:56.750
that is,

05:56.830 --> 06:00.490
how many times in that particular article those words appear.

06:01.730 --> 06:04.340
Let's try to find what the maximum value is.

06:06.060 --> 06:10.760
So we can just simply grab it with the max function, which is a built-in function of Python.

06:11.460 --> 06:13.310
So word frequency.

06:16.010 --> 06:17.420
dot values.

06:19.840 --> 06:21.490
And you can see it's six.

06:24.320 --> 06:28.220
Let me assign it to max underscore frequency.
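
NOTE
Editor's sketch of the maximum lookup described above. The counts are made-up
example values, and the names word_frequency and max_frequency are inferred from
the narration, so treat them as assumptions.
```python
# Assumption: made-up counts standing in for the real article's word frequencies.
word_frequency = {"Maria": 2, "tennis": 6, "champion": 1}
# The built-in max function returns the largest of the dictionary's values.
max_frequency = max(word_frequency.values())
print(max_frequency)
```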

06:29.130 --> 06:34.900
So as the maximum frequency is six, let's try to normalize all those scores.

06:35.300 --> 06:36.230
So what can we do?

06:36.260 --> 06:40.150
We can just simply divide every single value by six.

06:40.520 --> 06:46.340
So let us iterate over all those words in word frequency dot keys,

06:51.740 --> 06:53.390
and word frequency

06:56.370 --> 06:59.410
of word, we are just going to divide it by

07:06.230 --> 07:08.120
max underscore frequency.

07:10.370 --> 07:11.950
And now, if you try to print

07:15.360 --> 07:17.370
this word frequency right now,

07:19.200 --> 07:24.340
all scores will be normalized, so one becomes zero point one six six.
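
NOTE
Editor's sketch of the normalization step described above. The counts are made-up
example values, and the variable names are inferred from the narration.
```python
# Assumption: made-up counts standing in for the real article's word frequencies.
word_frequency = {"Maria": 2, "tennis": 6, "champion": 1}
max_frequency = max(word_frequency.values())
# Divide every count by the maximum so that all scores fall between 0 and 1.
for word in word_frequency.keys():
    word_frequency[word] = word_frequency[word] / max_frequency
print(word_frequency)
```
A count of one divided by a maximum of six gives roughly 0.1666, which matches the
value mentioned in the video, and the most frequent word gets a score of 1.0.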

07:25.900 --> 07:29.530
So now we got a score for every single word,

07:30.160 --> 07:32.650
I would say, every unique word appearing in the article.

07:33.130 --> 07:38.430
The next thing is, this particular score we are going to apply to the individual sentences,

07:38.740 --> 07:42.350
and we will try to find the score of each sentence.

07:42.670 --> 07:44.910
So those things we will see in the next video.
