WEBVTT

00:01.370 --> 00:01.980
Hey, everyone.

00:02.010 --> 00:09.860
So the next NLP project, which we're going to work on, is text summarization, and for illustration

00:09.950 --> 00:12.510
I have kept one big text here.

00:13.080 --> 00:15.350
So what is text summarization?

00:15.780 --> 00:20.250
So we all know that we are bombarded with lots and lots of data.

00:20.640 --> 00:25.920
There are lots of articles we keep reading on a daily basis.

00:26.310 --> 00:28.490
So there is a lot of data around us.

00:28.980 --> 00:30.460
Mostly in text form.

00:30.990 --> 00:33.260
I won't say all of it is in text form.

00:33.750 --> 00:37.110
But many times we keep reading it in text form.

00:37.560 --> 00:40.830
And it consumes a lot of time.

00:41.280 --> 00:45.790
It may happen that there is, let's say, a 5,000-word article.

00:45.960 --> 00:50.090
And maybe there will be just five or ten important points in it.

00:50.520 --> 00:56.940
So if you can somehow grasp just those five important points, that will be sufficient for you

00:57.600 --> 00:59.910
to grasp the whole article.

01:00.450 --> 01:07.800
So the idea behind automatic text summarization is the same: whether you can pick out all those

01:07.800 --> 01:16.380
ten important statements or sentences, or some kind of summary, out of it. Can you grab it

01:16.950 --> 01:21.330
automatically, through natural language processing techniques?

01:21.900 --> 01:25.080
And that is the idea behind text summarization.

01:27.110 --> 01:33.270
Something like finding all the useful information in a huge amount of text.

01:33.860 --> 01:37.020
And it will also reduce your reading time.

01:37.420 --> 01:40.430
So you can even process more and more documents.

01:40.490 --> 01:46.660
So you can be much choosier in selecting which articles to read.

01:46.910 --> 01:51.230
Because now you can process more and more articles while reading.

01:51.530 --> 01:56.870
So mainly these are some of the big advantages of a text summarization application.

01:58.290 --> 02:06.200
And as I told you, I have already created this particular text, so you can see a Maria Sharapova related

02:06.200 --> 02:07.710
article is there. So

02:07.710 --> 02:14.500
I would highly suggest that you pause this video and read this whole article, which I kept

02:14.500 --> 02:16.770
inside a text string.

02:17.400 --> 02:22.200
Let me execute it, and let's see how many characters there are inside it.

02:28.010 --> 02:34.540
So you can see we have 1,563 characters inside this particular string.

02:35.000 --> 02:43.100
So instead of reading such a huge amount of text, can we just come up with some limited set of sentences

02:43.160 --> 02:46.930
or words, maybe a hundred or maybe 200?

02:47.540 --> 02:49.580
And that will just easily summarize it.

02:50.090 --> 02:55.520
So, as I told you, just pause this video and read this full article.

02:55.550 --> 03:00.820
So when we find the summarization of this particular article, you'll get to know

03:01.040 --> 03:06.620
how good our summarization is, because as a human you can create such a summary easily, as long as

03:06.620 --> 03:12.470
you can understand the particular text, or any large amount of text you are reading.

03:13.100 --> 03:13.450
All right.

03:13.520 --> 03:18.600
So let's proceed with importing the minimum libraries.

03:19.130 --> 03:26.750
And then we'll see what techniques and mechanisms we are going to apply to pick out all the important

03:27.320 --> 03:29.290
information from this article.

03:32.150 --> 03:33.630
So let me create.

03:35.130 --> 03:37.130
Let's import the spaCy library

03:39.830 --> 03:41.630
and from spacy

03:43.360 --> 03:47.420
dot lang dot en

03:50.220 --> 03:54.850
dot stop_words, we are going to import STOP_WORDS.

03:56.230 --> 04:01.780
And one more thing: from the string module, we are going to import punctuation.

04:05.150 --> 04:05.600
All right.

04:06.040 --> 04:06.340
Let me

04:06.530 --> 04:06.880
execute it.

04:09.220 --> 04:11.640
Let's load our small-size model.

04:12.490 --> 04:17.950
So it will be nlp equals spacy dot load,

04:19.920 --> 04:20.800
en underscore core,

04:20.850 --> 04:21.170
core,

04:22.390 --> 04:24.760
and then underscore web underscore sm.

04:25.480 --> 04:26.380
Let me run.

04:30.730 --> 04:39.940
Let's apply our NLP model to this whole text; for that, we are going to use nlp of text.

04:41.640 --> 04:44.180
Let me assign it to a doc object.

04:47.300 --> 04:52.280
And let's iterate over every single token, so it will be for token in doc.

04:53.230 --> 04:56.780
Now, let's take token dot text.

04:57.620 --> 05:00.680
And let us keep it as a list comprehension.

05:02.030 --> 05:04.360
So that will be nothing but tokens.

05:08.360 --> 05:12.230
Let me print it: print tokens.
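
NOTE
The steps dictated so far can be sketched as follows, assuming spaCy is installed. The sample sentence is a hypothetical stand-in for the article string, and the fallback to spacy.blank("en") is an assumption of this sketch (not something shown in the video) so that it still tokenizes when the en_core_web_sm model has not been downloaded.
```python
import spacy
from spacy.lang.en.stop_words import STOP_WORDS
from string import punctuation
# Load the small English model; fall back to a blank English pipeline
# (tokenizer only) if the model is missing. The fallback is an assumption
# of this sketch, not part of the video.
try:
    nlp = spacy.load("en_core_web_sm")
except OSError:
    nlp = spacy.blank("en")
# Hypothetical stand-in for the article kept in the text string.
text = "Text summarization saves time."
# Applying the pipeline to the text returns a Doc object.
doc = nlp(text)
# Collect the text of every token with a list comprehension.
tokens = [token.text for token in doc]
print(tokens)
```
STOP_WORDS and punctuation are only imported here; they come into play later, when unimportant tokens are filtered out.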

05:16.460 --> 05:16.820
All right.

05:16.850 --> 05:20.810
So these are the tokens we are going to work with.

05:21.590 --> 05:26.180
All right, so one more thing is if you just display this punctuation.

05:29.130 --> 05:30.910
So these are all the punctuation marks.

05:31.770 --> 05:36.400
Now, what we are going to do is add one more punctuation mark, which will be backslash

05:36.590 --> 05:40.140
n, so newline, and let me

05:40.140 --> 05:42.540
assign it to the same punctuation.

05:47.730 --> 05:49.260
And if you display it,

05:51.570 --> 05:55.200
you can see that backslash n now also exists as a punctuation mark.
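
NOTE
A minimal sketch of the punctuation step just described, using only the standard library. Since string.punctuation is an ordinary immutable str, adding the newline means building a new string and rebinding the same name.
```python
from string import punctuation
# Build a new string with the newline appended and rebind the same name,
# so that "\n" is treated like any other punctuation mark later on.
punctuation = punctuation + "\n"
# repr() makes the trailing newline visible as \n in the output.
print(repr(punctuation))
```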

05:55.920 --> 06:01.980
All right, so these are some of the minimum things needed for reading the actual article, which we are going

06:01.980 --> 06:03.960
to use for the summary purpose.

06:04.650 --> 06:05.010
All right.

06:05.970 --> 06:06.990
So that is the first step.

06:08.360 --> 06:08.780
Oops.

06:11.890 --> 06:14.690
I kept here a first-level heading.

06:14.950 --> 06:17.030
Let me make it here.

06:17.200 --> 06:18.540
So first is importing.

06:21.960 --> 06:26.480
Next is text cleaning, which we will do, and then sentence tokenization.

06:27.180 --> 06:27.870
And before that,

06:29.020 --> 06:34.030
let me discuss the idea on which we are going to build this summarization.

06:34.780 --> 06:41.860
So there is one basic technique we can apply: we can try to score each individual

06:42.000 --> 06:45.580
sentence. So we can do the sentence tokenization.

06:46.180 --> 06:53.140
And to each particular sentence we will give some score, and we can try to find

06:54.390 --> 07:00.510
the highest-scored, or first, let's say, 20 or 30 percent of those sentences, and that will be nothing

07:00.540 --> 07:01.940
but a text summarization.

07:02.910 --> 07:09.560
But now the question is how to give those scores to individual sentences. For that,

07:09.570 --> 07:15.690
there is one basic idea you can apply: we can first create word frequencies.

07:16.170 --> 07:22.800
And those frequencies will tell us how many times each individual vocabulary word

07:22.860 --> 07:24.330
here appears.

07:24.900 --> 07:34.370
And based on that, we can give some score to the individual words in a sentence, and the sentence score

07:34.470 --> 07:39.810
we can just add up from the individual words that appear in that sentence.
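
NOTE
The scoring idea described above can be sketched roughly as follows. This is a simplified illustration, not the code from the video: the sample text and stop-word set are hypothetical, regular expressions stand in for spaCy's tokenization, and raw counts are used as word scores.
```python
import re
from collections import Counter
from heapq import nlargest
# Hypothetical sample text and stop-word set, for illustration only.
text = ("Summarization saves time. Good summarization keeps the key points. "
        "The weather was nice.")
stop_words = {"the", "was", "a", "of"}
# Naive sentence splitting on end punctuation; the video uses spaCy instead.
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
def words(sentence):
    """Return the lowercased non-stop words of a sentence."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in stop_words]
# Word frequency: how many times each non-stop word appears in the whole text.
word_freq = Counter(w for s in sentences for w in words(s))
# Sentence score: the sum of the frequencies of the words it contains.
scores = {s: sum(word_freq[w] for w in words(s)) for s in sentences}
# Keep roughly the top 30 percent of sentences, but at least one.
n = max(1, int(len(sentences) * 0.3))
summary = nlargest(n, scores, key=scores.get)
print(summary)
```
With raw counts, longer sentences tend to score higher simply because they contain more words; that is a known limitation of this basic approach.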

07:40.170 --> 07:46.380
So as we proceed along, you will get the idea soon, in the text cleaning part and in sentence tokenization.

07:47.010 --> 07:50.540
In the next video, we will see how to achieve those things.

07:50.640 --> 07:54.080
First, we'll try to create those word frequency counts.

07:54.610 --> 07:54.930
All right.

07:55.000 --> 07:56.420
So see you in the next video.