WEBVTT

00:00.780 --> 00:01.810
All right, everyone.

00:01.830 --> 00:05.650
So in the last video, we successfully installed the fastText library.

00:06.190 --> 00:10.800
Now we are going to use fastText to do text classification.

00:10.800 --> 00:16.830
And for that, we are going to use the same restaurant review file, in which I have made some

00:16.830 --> 00:18.070
modifications.

00:18.090 --> 00:25.260
Let me just open it and show it to you, because fastText requires the data in a particular kind of

00:25.290 --> 00:25.970
format.

00:25.980 --> 00:28.770
So I have formatted all the data in this way.

00:29.220 --> 00:33.470
So you can see that, corresponding to label one, we have the good data,

00:34.140 --> 00:39.810
or, I would say, the positive reviews, and corresponding to zero,

00:39.810 --> 00:41.430
we have the negative reviews.

00:41.760 --> 00:45.450
There are a thousand records in this format.

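NOTE
A minimal Python sketch of the training-file format described above. It assumes the file uses fastText's default "__label__" prefix, for example "__label__1" for a positive review and "__label__0" for a negative one. The helper parse_fasttext_line and the sample sentences are hypothetical and are not taken from the course file.
```python
from typing import List, Tuple
LABEL_PREFIX = "__label__"  # fastText's default label prefix (changeable with -label)
def parse_fasttext_line(line: str) -> Tuple[List[str], str]:
    """Split a training line such as '__label__1 Great food' into labels and text."""
    words = line.split()
    labels = []
    # This sketch only treats leading prefixed tokens as labels.
    while words and words[0].startswith(LABEL_PREFIX):
        labels.append(words.pop(0)[len(LABEL_PREFIX):])
    return labels, " ".join(words)
# Hypothetical example lines, not taken from the course file.
print(parse_fasttext_line("__label__1 The food was great"))   # (['1'], 'The food was great')
print(parse_fasttext_line("__label__0 The service was slow"))  # (['0'], 'The service was slow')
```
Each line holds one review, with its label first and the review text after it.
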
00:46.070 --> 00:46.410
All right.

00:46.890 --> 00:56.970
So the next thing is, let me go to the terminal, and here we will create a new folder and

00:57.150 --> 00:57.480
name it.

01:01.540 --> 01:07.150
I will call it text classification.

01:08.320 --> 01:16.780
All right, and let me first go to the fastText folder, where we installed fastText.

01:16.780 --> 01:20.020
So it will be cd fastText.

01:20.620 --> 01:21.290
Let's go.

01:21.580 --> 01:22.180
What is there?

01:22.180 --> 01:22.730
Inside there?

01:23.530 --> 01:25.480
So inside fastText, there will be one more fastText folder.

01:26.270 --> 01:32.770
Let me go inside it, and you can see there are a number of files available.

01:33.160 --> 01:34.660
We can simply type

01:37.430 --> 01:47.750
fasttext. Oops, in this case it didn't work, so you can simply type ./fasttext, and you can see now there

01:47.750 --> 01:49.370
are a number of options provided.

01:49.370 --> 01:54.590
That means these are the things you can do with this library. So fastText works like a command,

01:54.920 --> 01:56.600
and you can provide different arguments.

01:56.990 --> 01:59.570
So the commands will be something like fasttext plus an argument.

01:59.570 --> 02:04.370
You can do supervised learning, and you can test your different models.

02:04.370 --> 02:05.720
You can have a prediction.

02:05.720 --> 02:06.950
You can apply CBOW.

02:07.310 --> 02:11.500
We will try to see the majority of them in this video and upcoming videos.

02:12.020 --> 02:14.000
So here is the first thing we are going to do.

02:14.480 --> 02:18.050
Let me go to the text classification folder, and remember this:

02:18.050 --> 02:25.970
every time, we are going to use this path to run fastText. Before that, let me go back

02:26.480 --> 02:31.550
and go to the text classification folder.

02:31.910 --> 02:36.690
And I'm just going to move this reviews.txt file inside the text classification folder.

02:37.310 --> 02:37.790
All right.

02:38.030 --> 02:43.070
Let me type ls here, and I have the reviews.txt file available with me.

02:44.030 --> 02:50.240
All right, let me type cat, and it will display this reviews.txt file.

02:50.480 --> 02:51.050
All right.

02:51.060 --> 02:52.050
So it's all available.

02:52.850 --> 02:54.470
Next, here is what we are going to do.

02:54.830 --> 02:58.880
Let me clear the screen. We will put this data into two different buckets.

02:59.090 --> 03:06.080
About 70 percent of the data will go into the training bucket, and the remaining

03:06.080 --> 03:07.100
into our testing bucket.

03:07.460 --> 03:11.010
To do that, we are going to use this head

03:11.180 --> 03:11.660
command.

03:12.480 --> 03:14.750
Let me make the screen a little bigger.

03:16.640 --> 03:22.850
With head, I'll take the first, let's say, 700 records and keep them inside the

03:23.870 --> 03:24.570
training bucket.

03:24.890 --> 03:31.970
So I'm going to take the data from reviews.txt and put this arrow, so it will take the first 700 records

03:31.970 --> 03:39.280
because of this head command, and it will put them inside the reviews.train file.

03:39.770 --> 03:40.880
Don't worry about the extension;

03:41.000 --> 03:41.870
it's just a name.

03:41.870 --> 03:50.540
There is nothing special about the extension. And from the end, with tail, we will take another 299, because

03:50.540 --> 03:53.870
I think we have 999 records available.

03:54.320 --> 03:56.360
So we will take from reviews.txt

03:56.360 --> 04:00.550
the last 299 data points.

04:01.130 --> 04:02.960
We will put them into the testing set.

04:03.590 --> 04:08.270
So that will be inside reviews.test.

04:09.100 --> 04:09.770
All right.

04:10.070 --> 04:15.580
Now if we go to text classification, we have three files.

04:15.890 --> 04:21.540
So we are going to work with reviews.train first to build a model, and then we'll work with

04:21.540 --> 04:23.000
reviews.test.

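NOTE
The head/tail split above can also be sketched in Python. The function head_tail_split is hypothetical. The file names and the 700/299 counts are the ones used in the video. With 999 input lines the two files partition the data exactly. With 1,000 lines, line 701 would land in neither file.
```python
# Python version of the split from the video:
#   head -n 700 reviews.txt > reviews.train
#   tail -n 299 reviews.txt > reviews.test
from pathlib import Path
from typing import Tuple
def head_tail_split(src: Path, train: Path, test: Path,
                    n_train: int = 700, n_test: int = 299) -> Tuple[int, int]:
    """Write the first n_train lines to train and the last n_test lines to test."""
    lines = src.read_text(encoding="utf-8").splitlines(keepends=True)
    train_lines = lines[:n_train]                       # like head -n
    test_lines = lines[-n_test:] if n_test > 0 else []  # like tail -n
    train.write_text("".join(train_lines), encoding="utf-8")
    test.write_text("".join(test_lines), encoding="utf-8")
    return len(train_lines), len(test_lines)
```
Usage: run head_tail_split(Path("reviews.txt"), Path("reviews.train"), Path("reviews.test")) inside the text classification folder. It returns how many lines went into each file.
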
04:24.950 --> 04:32.960
So our task is to create a classification model with reviews.train. Before that, I want to

04:33.260 --> 04:41.420
show you that if we want to run this fastText library directly from this spot, we can give the full path, starting from home, and

04:41.420 --> 04:45.560
we have fastText, and inside fastText, one more fastText.

04:45.560 --> 04:49.790
And you can see, every time we want to fire this command, we'll fire it like this.

04:50.010 --> 04:57.110
Otherwise, you can also add fastText to your PATH variable. So: supervised, supervised

04:57.350 --> 04:57.920
learning.

04:58.190 --> 05:00.260
We are going to train our classifier.

05:01.450 --> 05:06.770
So the command we are going to use for training our classifier will be supervised.

05:06.970 --> 05:10.510
So it will be supervised.

05:12.760 --> 05:14.860
Now, this requires two things.

05:15.340 --> 05:16.800
One will be your input file.

05:17.080 --> 05:20.380
So obviously our input file will be reviews.train.

05:20.860 --> 05:26.510
And the next one will be the output: where you want to save your model.

05:27.130 --> 05:29.710
So let me provide the options.

05:29.710 --> 05:33.100
Or, if you don't know what the options are, you can check them with the supervised

05:33.100 --> 05:38.800
command: simply type supervised, press Enter, and you will see a whole lot of options

05:38.800 --> 05:39.280
available.

05:39.280 --> 05:39.970
With supervised,

05:40.210 --> 05:46.870
these are the mandatory options, and all the remaining ones are optional, because, as

05:46.870 --> 05:50.200
you are going to train on some dataset, you must provide it.

05:50.200 --> 05:55.300
So that's why input and output are the mandatory arguments while

05:56.230 --> 06:04.750
training a classifier. So one argument will be -input, which will be nothing but reviews.train;

06:04.750 --> 06:12.700
the next one will be the output, -output, and

06:13.640 --> 06:18.680
for the output, we can give any filename. So I'm going to give it something like model, let's

06:18.680 --> 06:22.340
say model underscore review.

06:22.670 --> 06:27.760
Actually, let me just give model_1; that's perfectly fine. Let me hit Enter.

06:27.800 --> 06:34.520
And it will start building a model for us, a supervised learning model based on this fastText library

06:34.760 --> 06:40.110
and the data we provided for training, our 700 data points.

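NOTE
A sketch of driving the same fastText command line from Python with the subprocess module. The options -input, -output, -epoch and -wordNgrams are real fastText flags; by default fastText trains for 5 epochs with word n-grams of length 1. The helper names and the fallback binary path "./fasttext" are assumptions, so adjust them to your install.
```python
import shutil
import subprocess
from typing import List, Optional
# Assumed binary location: on the PATH, else ./fasttext in the current folder.
FASTTEXT = shutil.which("fasttext") or "./fasttext"
def supervised_cmd(train_file: str, output: str, epoch: Optional[int] = None,
                   word_ngrams: Optional[int] = None) -> List[str]:
    """Build a 'fasttext supervised' call; options left as None keep fastText's defaults."""
    cmd = [FASTTEXT, "supervised", "-input", train_file, "-output", output]
    if epoch is not None:
        cmd += ["-epoch", str(epoch)]
    if word_ngrams is not None:
        cmd += ["-wordNgrams", str(word_ngrams)]
    return cmd
def run(cmd: List[str]) -> None:
    """Run the command, raising CalledProcessError if fastText fails."""
    subprocess.run(cmd, check=True)
# First model from the video, default parameters.
first = supervised_cmd("reviews.train", "model_1")
# Later model from the video: 1000 epochs plus word bigrams.
second = supervised_cmd("reviews.train", "model_2", epoch=1000, word_ngrams=2)
print(" ".join(first[1:]))  # supervised -input reviews.train -output model_1
```
Run run(first) from the text classification folder. It should produce model_1.bin and model_1.vec.
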
06:41.540 --> 06:42.080
All right.

06:42.090 --> 06:43.670
So let's just look at it.

06:43.700 --> 06:48.650
It did all that training almost immediately; it is very fast.

06:48.650 --> 06:55.520
And if you just do ls, you will see model_1.bin and model_1.vec.

06:56.030 --> 07:02.360
So it has created two model files, model_1.bin and model_1.vec;

07:02.840 --> 07:04.040
those are the files it has created.

07:04.250 --> 07:06.470
That means the training process is over.

07:06.680 --> 07:08.480
Now we can use this model

07:09.930 --> 07:16.980
to make predictions. So let me just clear the screen. How are we going to predict? For that, the same

07:17.760 --> 07:26.350
path: home, then fastText and fastText. We can even create an alias for it.

07:26.370 --> 07:28.120
In Linux, that's possible.

07:28.400 --> 07:31.410
So let me give it some alias name.

07:31.440 --> 07:34.400
Let's say, something short for fastText.

07:34.420 --> 07:35.610
So it will be ft.

07:35.640 --> 07:40.620
I'm just providing it here, and putting the whole command path in the alias.

07:43.410 --> 07:50.420
So the alias got created. We can just check with alias: did it get created?

07:51.010 --> 07:54.180
Yes, it got created, for that whole long path.

07:54.840 --> 08:00.030
Now I can simply write ft. But I am not sure whether it will work in another terminal

08:00.030 --> 08:00.540
or not.

08:00.990 --> 08:01.590
So let's try.

08:02.280 --> 08:07.530
Yeah, it won't work here, because the alias is specific to that one shell only.

08:07.830 --> 08:09.370
So let me just clear it.

08:09.720 --> 08:17.400
And from now on, I'm going to write ft. So, a few options are available for prediction.

08:17.400 --> 08:25.350
We have one command available, predict; one more command, predict-prob; and we have one more,

08:25.350 --> 08:29.480
test, so we can run our evaluation on our testing dataset.

08:29.490 --> 08:31.950
But before that, let us test on some

08:33.410 --> 08:44.100
random data. Let's give some hard-coded input. So ft, and I'm going to provide here

08:45.510 --> 08:51.880
predict. The next one is which model we are going to use for it.

08:51.900 --> 08:56.960
So that will be nothing but model_1, the model_1.bin file.

08:57.510 --> 09:05.250
And as we are not going to provide any text file, I'll just provide a hyphen, and it will read from standard input

09:05.430 --> 09:08.180
whichever data point I want to classify.

09:08.460 --> 09:16.290
So I will just type something like "I love food", and let's see how it classifies it.

09:17.070 --> 09:18.090
It is label one.

09:18.090 --> 09:19.080
That means positive.

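NOTE
A sketch of the same predict call driven from Python. It assumes a fasttext binary on the PATH (or at the hypothetical fallback ./fasttext) and the model_1.bin from above. The hyphen makes fastText read one text per line from standard input, and predict prints one label per line. The helper names are hypothetical.
```python
import shutil
import subprocess
from typing import List
LABEL_PREFIX = "__label__"  # fastText's default label prefix
def predict_cmd(model: str = "model_1.bin", k: int = 1) -> List[str]:
    """Arguments for 'fasttext predict <model> - <k>'; '-' means read stdin."""
    binary = shutil.which("fasttext") or "./fasttext"  # assumed location
    return [binary, "predict", model, "-", str(k)]
def parse_predict(output: str) -> List[List[str]]:
    """Turn predict output (one line per input text) into lists of bare labels."""
    return [[tok[len(LABEL_PREFIX):] for tok in line.split()]
            for line in output.splitlines() if line.strip()]
def predict(texts: List[str], model: str = "model_1.bin") -> List[List[str]]:
    """Pipe texts to fastText, one per line, and return the predicted labels."""
    result = subprocess.run(predict_cmd(model), input="\n".join(texts) + "\n",
                            capture_output=True, text=True, check=True)
    return parse_predict(result.stdout)
```
If the model labels "I love food" as 1, as in the video, then predict(["I love food"]) returns [["1"]].
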
09:19.590 --> 09:21.240
Now, "I do not..."

09:22.760 --> 09:28.250
Oops, let me retype: "I do not love food in the restaurant."

09:29.030 --> 09:33.880
Still, it is classified as label one. We can try a few more.

09:34.580 --> 09:35.850
We don't have enough data,

09:36.040 --> 09:36.440
maybe.

09:36.440 --> 09:38.870
So: "I don't like food."

09:40.890 --> 09:47.220
Yes, so it has not taken the "do not" part into consideration, and that's very important.

09:47.410 --> 09:50.220
What if I say something like "I dislike it"?

09:50.430 --> 09:50.970
Let's see.

09:52.080 --> 09:58.170
Still label one. I'm not sure about that; maybe we don't have a sufficient dataset. But

09:58.170 --> 10:00.420
that is how you can train your classifier.

10:00.900 --> 10:03.080
Let me just come out of here.

10:03.660 --> 10:11.430
Now, apart from that, you can use predict-prob, which will also give the probability that a particular

10:11.430 --> 10:15.290
data point belongs to the predicted label, label one or label zero.

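NOTE
predict-prob prints each predicted label followed by its probability, for example from "ft predict-prob model_1.bin - 2". Below is a minimal parsing sketch. The helper names are hypothetical, and the probabilities in the example are invented for illustration. With k = 2 on this binary task both labels are printed, so the probability of label one can be read off directly.
```python
from typing import Dict, List, Tuple
LABEL_PREFIX = "__label__"
def parse_predict_prob(line: str) -> List[Tuple[str, float]]:
    """'__label__1 0.75 __label__0 0.25' -> [('1', 0.75), ('0', 0.25)]."""
    tokens = line.split()
    # Tokens alternate: label, probability, label, probability, ...
    return [(tokens[i][len(LABEL_PREFIX):], float(tokens[i + 1]))
            for i in range(0, len(tokens) - 1, 2)]
def label_probability(line: str, label: str = "1") -> float:
    """Probability printed for one label, or 0.0 if that label was not printed."""
    probs: Dict[str, float] = dict(parse_predict_prob(line))
    return probs.get(label, 0.0)
# Hypothetical output line, with made-up probabilities.
example = "__label__0 0.6 __label__1 0.4"
print(label_probability(example))  # 0.4
```
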
10:16.360 --> 10:24.030
Next, we can run a complete evaluation based on the testing data, since we have the whole testing dataset.

10:24.040 --> 10:26.920
You can see this is reviews.test.

10:27.280 --> 10:32.040
Let me open it and show you some of the data in reviews.test.

10:33.010 --> 10:33.600
All right.

10:33.610 --> 10:36.010
So there are some zeros, and there are some ones.

10:36.790 --> 10:39.570
And let's just try it.

10:40.540 --> 10:47.830
So for all 299 records, it will do the calculation, that is, the prediction, and it will finally give us the

10:47.830 --> 10:48.400
accuracy.

10:48.760 --> 10:50.070
So let me try it.

10:50.120 --> 10:58.840
ft test will be our command. If we just execute it, it will tell us a few more things

10:58.840 --> 10:59.700
we need to provide.

10:59.710 --> 11:02.080
So that will be the model,

11:02.080 --> 11:02.950
the model filename.

11:05.430 --> 11:14.730
So ft test, then the model filename, so that will be nothing but model_1.bin, and let me

11:14.730 --> 11:18.150
provide the test filename, so that will be reviews.test.

11:18.930 --> 11:19.580
All right.

11:19.890 --> 11:26.220
So you can see it reports a total of 299 data points, then the precision,

11:26.250 --> 11:29.500
and then this value, which is nothing but the recall.

11:29.580 --> 11:34.740
Here the precision and the recall come out the same, about 0.36.

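NOTE
The N, P@1 and R@1 lines follow the usual definitions. Precision is correct predictions divided by predictions made. Recall is correct predictions divided by true labels. Below is a minimal sketch for k = 1. The function name and the example labels are hypothetical, and this is not fastText's own code. When every review carries exactly one label, as here, both numbers equal the plain accuracy.
```python
from typing import List, Set, Tuple
def precision_recall_at_1(gold: List[Set[str]],
                          predicted: List[str]) -> Tuple[int, float, float]:
    """Return (N, P@1, R@1).
    gold[i] holds the true labels of example i; predicted[i] is its top label.
    """
    n = len(gold)
    if n == 0:
        return 0, 0.0, 0.0
    correct = sum(1 for g, p in zip(gold, predicted) if p in g)
    total_gold = sum(len(g) for g in gold)
    precision = correct / len(predicted)                   # one prediction per example
    recall = correct / total_gold if total_gold else 0.0   # all true labels
    return n, precision, recall
# Hypothetical labels for four test reviews.
gold = [{"1"}, {"0"}, {"1"}, {"0"}]
predicted = ["1", "1", "1", "0"]
print(precision_recall_at_1(gold, predicted))  # (4, 0.75, 0.75)
```
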
11:35.460 --> 11:36.060
All right.

11:36.270 --> 11:44.940
Now, we have just tried this model without modifying any default parameters, but there are

11:44.940 --> 11:47.280
a lot of things you can do.

11:47.670 --> 11:52.170
So let's go back to our original supervised learning command.

11:52.890 --> 11:53.390
Yes.

11:53.420 --> 11:55.660
So that's our supervised command.

11:55.950 --> 11:59.250
Now, along with that, there are a number of options you can provide.

11:59.460 --> 12:06.180
Let's say you want to run your model for more epochs. There is some default number of epochs

12:06.180 --> 12:07.500
this model runs for,

12:07.680 --> 12:13.200
but you can run it for, let's say, a thousand epochs. For that, let me provide

12:13.200 --> 12:21.630
another model name, like model_2, and I will add -epoch, let's say, 1000.

12:21.930 --> 12:22.590
All right.

12:23.850 --> 12:30.570
Apart from that, let's say that while converting the text into numerical features, in the encoding, or I would

12:30.570 --> 12:38.280
say the feature engineering, I want to use n-grams. So I have one more option, like

12:38.280 --> 12:38.700
-word

12:40.520 --> 12:41.000
Ngrams.

12:43.910 --> 12:49.590
Let's say I want to use a bigram model, or a trigram model; I can do that.

12:49.610 --> 12:50.880
Let's give it a 2.

12:51.290 --> 12:53.060
So there are a number of options like this

12:53.060 --> 12:56.450
you can set, and it will create a brand new model for you.

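NOTE
Word bigrams can matter for the "do not like" examples. With the default -wordNgrams 1, a sentence is a bag of single words, so "not" is just one more word. With -wordNgrams 2, pairs such as "do not" and "not like" become features too. The sketch below shows which features that produces. The function is hypothetical, and fastText itself hashes n-grams into buckets rather than keeping strings like this.
```python
from typing import List
def word_ngrams(text: str, n: int) -> List[str]:
    """All word n-grams of length 1..n, each joined with spaces."""
    words = text.split()
    features = []
    for size in range(1, n + 1):
        # Slide a window of `size` words across the sentence.
        for i in range(len(words) - size + 1):
            features.append(" ".join(words[i:i + size]))
    return features
print(word_ngrams("I do not like food", 2))
# ['I', 'do', 'not', 'like', 'food', 'I do', 'do not', 'not like', 'like food']
```
Whether these extra features actually fix the negation errors depends on the training data; the sketch only shows what the option adds.
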
12:57.420 --> 13:04.360
All right, so now you can see this many epochs take a little more time, and we have

13:04.360 --> 13:08.960
a bigram model created. Let's test it again.

13:08.970 --> 13:11.820
So we are going to use predict.

13:12.270 --> 13:13.080
Yes, predict.

13:13.680 --> 13:17.580
But instead of model_1, we will use the one we just created, model_2.

13:18.030 --> 13:22.980
And you can see the model_2.bin file got created.

13:24.330 --> 13:33.720
Let me give it some review, like "I like your food here."

13:33.840 --> 13:34.740
So it has

13:35.700 --> 13:41.230
predicted a label, and for the second one, I'm not sure about it, but that's OK; maybe some

13:41.230 --> 13:42.920
more training is required to address it.

13:43.260 --> 13:48.350
And you can see we did all of this just by writing a few commands.

13:48.360 --> 13:55.380
If you have a dataset, you can train your model with this fastText library, and it is one of the

13:55.380 --> 13:56.790
very high-performance libraries.

13:57.890 --> 14:03.560
So this is mainly used for text classification tasks. Here, we used it to classify the reviews as

14:03.770 --> 14:04.920
positive or negative.

14:05.210 --> 14:10.550
But for any kind of text classification task, whether it's a binary classification problem or a multiclass

14:10.550 --> 14:16.490
classification problem, you can always use it whenever you have a huge amount of text data available

14:16.500 --> 14:16.850
to you.

14:17.420 --> 14:17.960
All right.

14:17.990 --> 14:19.310
See you in the next video.