WEBVTT

00:00.960 --> 00:02.400
Hello, everyone, and welcome back.

00:03.240 --> 00:09.180
So the new topic, which we are going to learn in this section, is word embedding, and it is one of the very

00:09.180 --> 00:14.640
important and crucial steps in any kind of natural language processing related task.

00:15.000 --> 00:19.870
So word embedding, you can say, has completely revolutionized

00:20.310 --> 00:21.830
the NLP field.

00:22.260 --> 00:24.630
So what exactly is word embedding?

00:26.480 --> 00:29.000
Let's say we give two strings to the computer,

00:29.360 --> 00:35.810
and if you tell the computer to match them character by character, I mean, the computer can easily do it.

00:35.930 --> 00:39.520
And the computer can do it at a much faster rate.

00:40.820 --> 00:47.360
But suppose you're searching for, let's say, Messi on Google web search, and you'll get the results

00:47.360 --> 00:51.760
related to football; also, you will get results related to Ronaldo as well.

00:52.130 --> 00:54.800
So why do such things happen? As a human,

00:54.890 --> 00:59.390
we know that Messi is related to football, and likewise Ronaldo is also related to football.

00:59.870 --> 01:02.340
And these three terms are interconnected.

01:02.840 --> 01:09.470
But that doesn't show up with plain string matching, because these strings are completely

01:09.530 --> 01:09.920
different.

01:10.220 --> 01:16.820
So how can a computer understand that these three things are related to each other, even if there is

01:16.820 --> 01:17.840
no match between them?

01:18.220 --> 01:23.960
Football itself, in terms of computer representation, is a completely different thing compared

01:23.960 --> 01:29.330
to how Messi and Ronaldo are represented in a computer.

01:29.810 --> 01:33.740
So that string representation is not sufficient

01:34.280 --> 01:40.220
for comparing this type of thing. But as we see, Google web search is quite smart.

01:40.580 --> 01:46.190
And they're using this word embedding technique internally, so that when you search

01:46.190 --> 01:49.280
for Messi, you will get the results related to football.

01:49.340 --> 01:52.070
And so let's take one more example.

01:52.490 --> 01:54.740
Let's say, "Apple is a tasty fruit."

01:55.190 --> 02:03.240
Now, in this particular sentence, how can a computer understand that apple is a fruit which can be eaten,

02:03.650 --> 02:05.430
and that it is not an organization?

02:05.900 --> 02:12.050
So that is the kind of intelligence this word embedding will bring into our natural language processing

02:12.240 --> 02:12.620
models.

02:13.780 --> 02:20.150
Word embedding is all about understanding your text, the semantic relationship that exists between

02:20.360 --> 02:24.020
individual words, and let's try to formally define it.

02:24.950 --> 02:31.220
So word embedding is all about a representation of your word which captures the meaning of

02:31.220 --> 02:31.930
each individual word,

02:32.270 --> 02:35.480
the semantic relationship that exists between the different types

02:35.510 --> 02:36.230
of words,

02:36.650 --> 02:40.540
and the same word being used in different kinds of contexts.

02:41.030 --> 02:47.120
And all of this we are going to get implemented by this word embedding technique, which is nothing

02:47.150 --> 02:50.150
but a numerical representation of your text.

02:50.330 --> 02:57.350
So once you convert this text into a kind of numerical representation, you can have a comparison between

02:57.350 --> 02:57.520
them.

02:57.830 --> 03:01.070
You can define some sort of distance measure:

03:01.670 --> 03:08.150
how closely two words are related to each other, or how far apart they are situated in the full gamut of

03:08.540 --> 03:09.770
the English dictionary.
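
NOTE
Editor's sketch of the "distance measure" idea above, not taken from the lecture.
It uses cosine similarity, one common choice. The three-dimensional vectors and the
names `vectors` and `cosine_similarity` are hypothetical, invented for illustration.
```python
import math
# Hypothetical word vectors (assumption): real embeddings are learned, not hand-written.
vectors = {
    "messi": [0.9, 0.8, 0.1],
    "football": [0.8, 0.9, 0.2],
    "apple": [0.1, 0.2, 0.9],
}
def cosine_similarity(a, b):
    """Cosine of the angle between a and b; closer to 1.0 means more closely related."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
# With these made-up numbers, "messi" sits closer to "football" than to "apple".
print(cosine_similarity(vectors["messi"], vectors["football"]))
print(cosine_similarity(vectors["messi"], vectors["apple"]))
```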

03:10.610 --> 03:12.830
But why is word embedding required?

03:13.250 --> 03:18.320
Now, if you observe, many of the machine learning algorithms, or almost all machine learning and

03:18.380 --> 03:24.590
deep learning architectures, I mean, they just cannot process text directly in its raw form.

03:24.830 --> 03:33.210
So the idea here is to convert this plain text into a kind of raw numbers, and once we convert it to raw

03:33.210 --> 03:39.240
numbers, we can apply different machine learning algorithms, like classification, regression, and

03:39.300 --> 03:42.270
clustering, because they require such kinds of numbers as input.

03:42.500 --> 03:46.280
They just cannot understand text directly as the input, and word

03:46.310 --> 03:51.800
embedding plays a very important, vital role in converting these words into a kind of numbers.

03:53.010 --> 03:55.900
Now, what are the different types of word embedding techniques?

03:55.940 --> 04:00.690
So at a broad level, you can divide these techniques into two major categories:

04:01.020 --> 04:04.660
one is frequency-based embedding, and another one is prediction-based embedding.

04:05.250 --> 04:09.780
Now, our main focus here will be to learn about prediction-based embedding.

04:10.380 --> 04:17.130
And we won't go into much detail about the theoretical part, but it mainly works based on a neural network

04:17.550 --> 04:18.180
architecture.

04:18.540 --> 04:25.370
So one is CBOW, that will be the continuous bag-of-words model, and another one is the

04:25.410 --> 04:26.280
skip-gram model.

04:26.850 --> 04:33.930
So here, what we are trying to do is, based on the context of a word, or the words which surround some

04:33.930 --> 04:39.720
particular word, we are trying to build a neural network model, and that neural network model will

04:39.720 --> 04:47.880
come up with a vector representation of any of your words, and that will be a much better representation,

04:47.880 --> 04:54.380
you can say, compared to frequency-based embeddings like count vectors, TF-IDF, or co-occurrence vectors.

04:54.600 --> 05:00.100
So mostly both of these, I mean count vectors and TF-IDF,

05:00.550 --> 05:06.930
both of them we have already seen in our text classification projects, where we are trying to find

05:07.230 --> 05:12.600
some numerical representation based on frequency: how many times some particular word occurs.

05:12.790 --> 05:14.460
So that will be a count vector.

05:14.760 --> 05:19.260
How many times the combination of two or three or multiple words occurs together,

05:19.560 --> 05:24.480
that will be the idea behind the co-occurrence vector. And a little bit better

05:24.480 --> 05:26.550
and more advanced technique is TF-IDF.

05:26.640 --> 05:32.190
So the TF part will count how many times some particular word occurs in a document,

05:32.760 --> 05:37.720
and another component is the IDF, inverse document frequency, which tracks

05:37.800 --> 05:41.520
how many of the documents some particular word occurs in, across all the documents.

05:41.970 --> 05:49.470
So these are all basic frequency-based embedding techniques, and the other side is prediction-based modeling

05:49.470 --> 05:49.860
techniques.
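
NOTE
Editor's sketch of the count and TF-IDF ideas above, not taken from the lecture.
It uses the plain textbook form tf * log(N / df); libraries often use smoothed variants.
The two documents and all function names are hypothetical, and every queried word is
assumed to occur in at least one document.
```python
import math
from collections import Counter
# Two tiny made-up documents (assumption), already split into words.
docs = [["the", "cat", "sat", "on", "the", "mat"], ["the", "dog", "sat"]]
def term_frequency(word, doc):
    """How many times `word` occurs in one document (one entry of a count vector)."""
    return Counter(doc)[word]
def inverse_document_frequency(word, documents):
    """log(N / number of documents containing `word`); words found everywhere score 0."""
    containing = sum(1 for doc in documents if word in doc)
    return math.log(len(documents) / containing)
def tf_idf(word, doc, documents):
    """Term frequency weighted down for words that appear in many documents."""
    return term_frequency(word, doc) * inverse_document_frequency(word, documents)
print(tf_idf("the", docs[0], docs))  # "the" occurs in every document
print(tf_idf("cat", docs[0], docs))  # "cat" occurs in only one document
```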

05:50.190 --> 05:52.560
Let me give you a little bit more detail.

05:52.620 --> 05:58.980
So let me go to Colab, and then in the next video, we will see how to implement, with one of

05:58.980 --> 06:03.770
the NLP libraries, Gensim, a prediction-based embedding technique.

06:04.890 --> 06:08.730
So for better understanding purposes, I created this Colab file.

06:09.300 --> 06:15.750
And there are three examples I have given of how you can represent text as a kind of numbers.

06:16.290 --> 06:17.990
So first we have one-hot encoding.

06:18.270 --> 06:22.040
So that will be a bag-of-words kind of model.

06:22.560 --> 06:28.150
So let's say we have just one sentence, like "The cat sat on the mat."

06:28.740 --> 06:36.290
So in total we'll have these unique words available: the, cat, sat, on, mat.

06:37.110 --> 06:44.070
So wherever that particular word occurs, the index position of that particular word will be given the

06:44.070 --> 06:47.190
value one; the remaining values will all be zero.

06:47.760 --> 06:49.760
So that is called one-hot encoding.

06:50.100 --> 06:57.900
Now for "the", the vector will be 0 0 0 0 1, because only this particular index holds a

06:57.960 --> 06:58.920
one; the remaining

06:58.920 --> 06:59.970
values will all be zero.
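
NOTE
Editor's sketch of one-hot encoding for the example sentence, not taken from the lecture.
The alphabetical vocabulary order is an assumption, chosen so that "the" lands on the
last index as in the spoken example. The names `vocabulary` and `one_hot` are hypothetical.
```python
# Vocabulary order is an assumption: alphabetical, so "the" gets the last index.
vocabulary = ["cat", "mat", "on", "sat", "the"]
def one_hot(word, vocab):
    """Vector of zeros with a single 1 at the position of `word` in the vocabulary."""
    return [1 if entry == word else 0 for entry in vocab]
# Each word of the sentence becomes a vector as long as the whole vocabulary.
for word in "the cat sat on the mat".split():
    print(word, one_hot(word, vocabulary))
```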

07:00.360 --> 07:06.580
But this approach is quite inefficient, because whatever one-hot encoded vector you get,

07:06.600 --> 07:08.080
it is very sparse.

07:08.430 --> 07:14.520
So let's say you imagine the total words in a dictionary, like a 10,000-word dictionary.

07:14.820 --> 07:20.370
So in every single vector, 99.99 percent of the elements will be

07:20.370 --> 07:20.640
zero.

07:20.680 --> 07:23.700
Only in one place will there be a value of one.

07:24.180 --> 07:26.570
So this approach will be quite inefficient.

07:26.970 --> 07:27.820
Let's go

07:27.820 --> 07:32.460
with another approach: encode each word with some unique number.

07:32.850 --> 07:34.530
So let's take the same example,

07:34.530 --> 07:36.560
let's say, "The cat sat on the mat."

07:37.040 --> 07:40.770
Now we will give one single number to every single word.

07:41.340 --> 07:46.800
So "the" will be assigned one, and two will be assigned to "sat"; simply, some number will be assigned

07:46.800 --> 07:49.620
to every single unique word.

07:50.010 --> 07:57.210
Now, this particular approach is more efficient, and it is better compared to the sparse vector

07:57.300 --> 07:57.710
approach.

07:58.530 --> 08:04.520
Now here the main problem is that when you convert things this way, it is just a

08:04.530 --> 08:04.980
label

08:04.990 --> 08:05.810
based indexing

08:05.940 --> 08:06.360
as well.

08:07.410 --> 08:12.270
Let's say "on" has been represented by 1, and this other word,

08:12.870 --> 08:15.410
I mean "sat", has been represented by 10 or 12.

08:15.900 --> 08:21.120
Now, when we supply this to the machine learning algorithm, the algorithm will understand that

08:21.460 --> 08:22.730
"sat" is better than "on".

08:23.280 --> 08:24.660
But that is not the case, actually.

08:24.810 --> 08:30.390
That is not the case, because once we convert text into numbers, the numbers should also make sense:

08:30.390 --> 08:33.170
one word is not better than another in terms of text.

08:33.390 --> 08:37.410
But when we convert to numbers, automatically it will become something like that,

08:37.470 --> 08:39.360
and that can cause mistakes.
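
NOTE
Editor's sketch of the "unique number per word" approach and its ordering problem,
not taken from the lecture. Numbering words from 1 in order of first appearance is an
assumption; any other assignment would be just as arbitrary. `word_to_id` is a hypothetical name.
```python
sentence = "the cat sat on the mat".split()
# Assign ids in order of first appearance, starting from 1 (an arbitrary choice).
word_to_id = {}
for word in sentence:
    if word not in word_to_id:
        word_to_id[word] = len(word_to_id) + 1
# The sentence becomes a short, dense list of integers instead of sparse vectors.
encoded = [word_to_id[word] for word in sentence]
print(word_to_id)
print(encoded)
# The ids are only labels: this comparison is True but says nothing about meaning.
print(word_to_id["mat"] > word_to_id["cat"])
```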

08:40.480 --> 08:46.410
All right, so the last and most important one is word embedding; you can read about these things here.

08:46.830 --> 08:54.330
And I just want to show you that, compared to the earlier two approaches, this word embedding approach will try

08:54.330 --> 08:57.730
to give us a fixed-length vector.

08:58.710 --> 09:03.090
So let's say in our case the vector length will be, let's say, four dimensions.

09:03.360 --> 09:09.360
So "cat" has been represented by this kind of four-dimensional vector, "mat" has been represented by these

09:09.360 --> 09:10.020
four numbers,

09:10.080 --> 09:12.900
and "on" has been represented by these four numbers.

09:13.140 --> 09:20.350
So this way, a four-dimensional embedding will be created for every single word, or I would say every unique word

09:20.470 --> 09:20.700
available

09:20.700 --> 09:22.110
in your word dictionary.

09:22.110 --> 09:27.390
And that is what word embedding is, and how we are going to create it is based on a neural network.
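
NOTE
Editor's sketch of a four-dimensional embedding table, not taken from the lecture.
The random values are placeholders (assumption) for what a trained neural network would
learn, so they carry no meaning. `embedding_table`, `word_to_index` and `embed` are hypothetical names.
```python
import random
random.seed(0)  # fixed seed so the placeholder numbers are repeatable
vocabulary = ["cat", "mat", "on", "sat", "the"]
EMBEDDING_DIM = 4
# One dense row of EMBEDDING_DIM floats per unique word in the vocabulary.
embedding_table = [[random.uniform(-1.0, 1.0) for _ in range(EMBEDDING_DIM)] for _ in vocabulary]
word_to_index = {word: index for index, word in enumerate(vocabulary)}
def embed(word):
    """Look up the fixed-length vector for a word by its row index."""
    return embedding_table[word_to_index[word]]
# Every word maps to a vector of the same length, whatever the vocabulary size.
print(embed("cat"))
print(embed("mat"))
```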

09:27.840 --> 09:28.170
All right.

09:28.440 --> 09:29.940
So see you in the next video.

09:29.940 --> 09:37.380
We'll get hands-on with a practical task: how we can create our own word embedding based on some simple

09:37.650 --> 09:38.080
dataset.

09:38.670 --> 09:38.980
All right.

09:39.120 --> 09:40.440
See you in the next video.