WEBVTT

00:01.340 --> 00:02.540
Hey, everyone, welcome back.

00:02.990 --> 00:09.890
So the new project we are going to work on is restaurant review classification, and mainly

00:09.920 --> 00:11.330
throughout this whole project,

00:11.420 --> 00:13.910
we are going to use the NLTK library.

00:14.540 --> 00:14.870
All right.

00:14.900 --> 00:21.060
So let me first import the dataset, and we'll see our business problem, what we are going to achieve,

00:21.710 --> 00:23.330
and what classification

00:23.420 --> 00:25.970
we are going to do. So from the files,

00:26.420 --> 00:27.820
Just click on upload.

00:29.460 --> 00:30.570
And let me go to the.

00:30.860 --> 00:31.900
And before that stop.

00:32.290 --> 00:32.570
Right.

00:32.810 --> 00:37.010
I kept this restaurant reviews TSV file.

00:38.260 --> 00:38.590
OK.

00:40.370 --> 00:42.990
So you can see the file is uploaded.

00:44.000 --> 00:48.840
And I have divided this whole project into four different videos.

00:48.920 --> 00:53.910
So in this video, we'll be focusing on our business problem and importing all the data.

00:54.800 --> 01:00.640
The next lecture is very important, because that is where we are going to do all the cleaning steps, whatever

01:00.660 --> 01:07.040
we learned in the NLP basics section, and then we are going to apply one very simple feature engineering

01:07.040 --> 01:10.100
model, which is nothing but the bag-of-words model.

01:10.550 --> 01:16.070
And then we'll be applying one classification algorithm, which is based on Bayes' theorem, the Naive

01:16.070 --> 01:17.210
Bayes algorithm.

01:17.870 --> 01:18.260
All right.

01:18.350 --> 01:26.420
So let me import all the necessary and required libraries, just the basic ones like NumPy, Matplotlib,

01:26.560 --> 01:27.830
and the pandas library.

01:28.340 --> 01:30.020
And let me just run this cell.

01:31.360 --> 01:31.730
All right.

01:32.210 --> 01:34.450
So let's just import our restaurant reviews.

01:35.680 --> 01:36.920
So pd dot

01:39.120 --> 01:47.760
read_csv, and let's just import the restaurant reviews TSV file.

01:48.160 --> 01:52.850
So here the separator will be a tab, backslash t.

01:54.550 --> 01:56.340
And let me just run it.

01:58.140 --> 01:58.580
All right.

01:58.630 --> 02:03.580
So we are able to successfully read this file, but there is one more argument

02:03.610 --> 02:04.420
I want to pass here,

02:05.160 --> 02:05.590
which is

02:08.110 --> 02:10.750
quoting, so that quotes are just skipped.

02:11.500 --> 02:17.620
So why is that? Because sometimes you may observe that, in the case of this kind of textual data,

02:17.740 --> 02:19.950
you may encounter some double quotes.

02:20.200 --> 02:26.830
So those double quotes need to be ignored while reading this CSV or TSV kind of

02:26.900 --> 02:27.150
file.

02:27.880 --> 02:32.470
Let me put all the data into the DataFrame object.
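
NOTE
A minimal sketch of the loading step described above, assuming pandas is installed.
The two sample rows and the column names "Review" and "Liked" are made up for illustration;
the lecture loads the uploaded restaurant reviews TSV file instead of an in-memory string.
Passing csv.QUOTE_NONE (the integer 3) is one way to make the parser ignore double quotes;
the exact value used in the video is an assumption.
```python
import csv
import io
import pandas as pd
# Two made-up rows standing in for the real file; column names are assumptions.
sample_tsv = 'Review\tLiked\nThe "crust" is not good.\t0\n"Loved" this place.\t1\n'
# sep="\t" reads tab-separated values; quoting=csv.QUOTE_NONE (the integer 3)
# tells the parser to treat double quotes as ordinary characters.
data = pd.read_csv(io.StringIO(sample_tsv), sep="\t", quoting=csv.QUOTE_NONE)
# Show the first few records.
print(data.head())
```
With the real file, the io.StringIO object would be replaced by the path to the uploaded TSV.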

02:35.250 --> 02:37.170
Let's observe the first few records.

02:40.610 --> 02:47.360
And you will see we have the reviews available, and each and every review has been assigned a label,

02:47.360 --> 02:48.680
either zero or one.

02:49.650 --> 02:54.370
So one here indicates that it is a positive review.

02:54.390 --> 02:55.640
It is liked by the user.

02:57.510 --> 02:59.710
And zero indicates it is a negative review.

03:00.250 --> 03:01.660
The user does not like it.

03:02.500 --> 03:07.230
If we look at the DataFrame's tail, we will get an idea of how many

03:07.250 --> 03:08.710
total records we have available.

03:10.460 --> 03:16.210
All right, so you can see the index runs from zero to 999, which means a thousand records are there.

03:17.890 --> 03:22.190
Let's try to find out how many of them are one and how many of them are zero.

03:22.990 --> 03:28.600
So data, Liked, and we are going to apply the

03:28.730 --> 03:31.750
value_counts function.

03:34.680 --> 03:41.160
So we have a total of five hundred records that belong to class zero and five hundred records that belong to class

03:41.160 --> 03:41.350
one.

03:41.370 --> 03:44.050
That means it's a quite balanced dataset, let's say.
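
NOTE
A small sketch of the record count and class balance check described above, assuming pandas.
The four-row frame and the column names "Review" and "Liked" are hypothetical stand-ins
for the thousand-row restaurant reviews data used in the lecture.
```python
import pandas as pd
# A tiny made-up frame; the column names "Review" and "Liked" are assumptions.
data = pd.DataFrame({
    "Review": ["Great food", "Too salty", "Loved it", "Slow service"],
    "Liked": [1, 0, 1, 0],
})
# tail() shows the last rows; the last index label is len(data) - 1.
print(data.tail())
# value_counts() returns the number of rows for each class label.
counts = data["Liked"].value_counts()
print(counts)
```
Equal counts for label 0 and label 1 are what is meant by a balanced dataset here.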

03:44.670 --> 03:48.810
And our objective, first, is to clean all this text.

03:49.110 --> 03:55.410
And in the next video, we will see how to clean all this text and build the model, which is eventually

03:55.410 --> 03:59.970
going to predict, for any review like this one,

04:00.170 --> 04:06.700
whether the review is positive or negative, based on the review text.

04:07.140 --> 04:07.590
All right.

04:07.650 --> 04:08.840
See you in the next video.
