WEBVTT

00:00.420 --> 00:01.380
All right, everyone.

00:02.390 --> 00:07.820
So the next topic we are going to learn is named entity recognition.

00:08.720 --> 00:12.660
So for illustration purposes, I have kept three sentences here.

00:13.220 --> 00:19.040
Let's say, Apple is looking at buying U.K. startup for one billion dollars.

00:19.850 --> 00:26.690
So named entity recognition is something like this: whenever you encounter, let's say, some organization or some
 
00:26.840 --> 00:35.030
places. If you go to the spaCy documentation, inside that there is a Linguistic Features section,
 
00:35.780 --> 00:37.970
and in it, named entity recognition.

00:38.540 --> 00:46.580
So this spaCy named entity recognition can recognize things like companies, locations, organizations, and products
 
00:46.730 --> 00:47.690
very easily.

00:48.050 --> 00:55.340
And out of all that text, if you already have an idea that you are looking for Apple as an organization,
 
00:55.370 --> 00:59.710
you can even take that as a basis for further analysis of your text.

01:00.080 --> 01:05.570
So named entity recognition is one of the most crucial and important steps for understanding your
 
01:05.810 --> 01:06.230
text.

01:06.770 --> 01:13.670
So I have kept here three documents, or I would say three sentences, and we are going to find what are
 
01:13.670 --> 01:15.830
the named entities that are associated with them.

01:16.340 --> 01:17.900
So let me execute this one.

01:19.060 --> 01:26.350
And I'm going to load a small-size model in the English language, and let's apply these sentences one
 
01:26.350 --> 01:29.480
by one and create a doc object out of each,

01:32.220 --> 01:32.900
by using nlp.

01:34.340 --> 01:39.530
Let's say s1, and I'm going to assign it to, let's say, doc1.

01:41.970 --> 01:45.330
And just like earlier, when we grabbed the,
 
01:47.010 --> 01:51.660
what I would say, lemmatization and tokenization, it is the same way.

01:51.840 --> 01:59.130
This nlp is not just a tool for doing some particular text analytics process.
 
01:59.160 --> 02:06.760
Instead of that, it's a very generic function which will parse your text or any kind of full
 
02:07.170 --> 02:13.380
document, and it will return us a very high-level document object.

02:14.060 --> 02:20.790
A lot of things, tagging, parsing, part-of-speech tagging, named entity recognition, tokenization,
 
02:20.790 --> 02:23.940
lemmatization, everything can be extracted from it.

02:24.300 --> 02:27.660
So this is a very common pattern in spaCy.

02:29.600 --> 02:34.040
All right, now, to get the entities we can just simply use doc1.ents.

02:34.740 --> 02:36.120
And let me run it.

02:36.390 --> 02:37.610
Let's see what it has returned.

02:38.130 --> 02:45.150
So you can see it has identified that there are three entities available in this particular statement.
 
02:45.750 --> 02:52.140
That is Apple, and we have U.K., and one billion dollars.
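
NOTE
A minimal sketch of the steps so far, assuming spaCy and its en_core_web_sm model are installed. The helper name entity_texts, the function run_example, and the exact sentence string are illustrative assumptions, not taken from the recording.
```python
def entity_texts(doc):
    """Return the text of each named entity found on a processed Doc."""
    return [ent.text for ent in doc.ents]
# Usage sketch: needs spaCy plus the en_core_web_sm model installed.
def run_example():
    import spacy  # imported lazily so entity_texts works without spaCy
    nlp = spacy.load("en_core_web_sm")  # small English model (assumed installed)
    doc1 = nlp("Apple is looking at buying U.K. startup for $1 billion")
    return entity_texts(doc1)
```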

02:52.680 --> 02:55.830
Now, the others are all like a verb or a noun.

02:56.840 --> 03:03.560
So what we can do, if we want to get a detailed explanation about these entities, we can iterate over that
 
03:03.960 --> 03:04.260
part.

03:06.320 --> 03:14.290
For ent in doc1.ents, let me print this ent.text.

03:14.800 --> 03:23.200
Let's say I also want to grab, from ent, the label name, whether Apple is like an organization or it will
 
03:23.200 --> 03:24.190
be a kind of person.

03:24.400 --> 03:25.510
So what is the entity type
 
03:25.540 --> 03:28.490
it belongs to? So, label underscore, that is label_.

03:29.260 --> 03:33.730
And if I just run it, we will get a little bit more detailed explanation.

03:34.030 --> 03:36.280
So Apple is ORG, which means
 
03:36.290 --> 03:37.440
it's an organization.

03:37.940 --> 03:40.540
U.K. is GPE, which means it's a geographic location,
 
03:40.570 --> 03:42.940
GPE. And one billion dollars is MONEY,
 
03:42.970 --> 03:44.410
that means it's a kind of money.

03:44.710 --> 03:50.140
If you want to get a little more detailed explanation about this entity type,
 
03:50.510 --> 03:52.570
we can just simply use str.

03:53.440 --> 04:00.190
So just wrapping it in that string function, the str function, and inside it we can use spacy dot
 
04:02.260 --> 04:02.760
explain.
 
04:03.140 --> 04:09.970
And here I can pass ent.label_, so we get a little bit more descriptive stuff.

04:10.580 --> 04:10.980
Here we go.

04:12.840 --> 04:19.610
Now you can see Apple is an organization, and its label description will be something like companies,
 
04:19.730 --> 04:21.260
agencies, institutions.

04:22.100 --> 04:23.510
This is GPE, which means
 
04:23.630 --> 04:27.080
it's like countries, cities, states.
 
04:27.290 --> 04:32.800
And one billion dollars is associated with MONEY: monetary values, including the unit also.
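
NOTE
A hedged sketch of the loop described above, pairing each entity's text with its label_ and the description from spacy.explain, assuming spaCy and en_core_web_sm are installed. The names describe_entities and run_example are hypothetical helpers, and the explain function is passed in as a parameter only so the helper stays independent of the library.
```python
def describe_entities(doc, explain):
    """Return (text, label, description) for each entity on a processed Doc."""
    return [(ent.text, ent.label_, str(explain(ent.label_))) for ent in doc.ents]
# Usage sketch: needs spaCy plus the en_core_web_sm model installed.
def run_example():
    import spacy
    nlp = spacy.load("en_core_web_sm")
    doc1 = nlp("Apple is looking at buying U.K. startup for $1 billion")
    for text, label, description in describe_entities(doc1, spacy.explain):
        print(text, "-", label, "-", description)
```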

04:33.680 --> 04:34.070
All right.

04:34.490 --> 04:36.280
So that is about s1.

04:36.740 --> 04:41.150
Now, it has not left out any particular word here,
 
04:41.330 --> 04:48.680
I mean, there is no entity that it has not recognized in this case. Let's try to apply the same thing for s2

04:49.220 --> 04:51.470
and let's see how it goes.

04:52.310 --> 04:56.650
So let's say s2, and let me make it doc2.
 
04:59.580 --> 05:05.750
Now s2 is like, San Francisco considers banning sidewalk delivery robots.

05:06.230 --> 05:13.460
So from this particular document, or sentence, it has just identified San Francisco, as a geographic location. Apart
 
05:13.460 --> 05:14.200
from that, all the other words,
 
05:14.300 --> 05:17.490
it is not considering as entities.

05:17.870 --> 05:19.900
OK, let's go to the third one.

05:21.370 --> 05:29.140
So let me print s3 first, and let me make it doc3.

05:32.050 --> 05:33.050
And let's run it.

05:33.400 --> 05:33.790
All right.

05:33.820 --> 05:40.480
So you can see it has displayed s3, that is, Facebook is hiring a new vice president in
 
05:40.490 --> 05:40.900
U.S.

05:41.290 --> 05:48.250
But from this sentence, it has just identified U.S. as a named entity, and that is nothing but
 
05:48.430 --> 05:48.910
countries.
 
05:48.940 --> 05:51.940
And that's perfectly fine. But earlier,
 
05:51.940 --> 05:54.200
it had identified Apple as an organization.
 
05:54.250 --> 05:57.460
But here, in this case, it has not identified
 
05:57.640 --> 06:04.630
Facebook as an organization or anything of that kind. It has not even identified it as a named entity.

06:05.050 --> 06:11.850
So there is a provision in spaCy to add some extra tokens as named entities also.
 
06:11.860 --> 06:16.240
And we can even assign those things a specific entity type.
 
06:16.480 --> 06:18.150
So how can we do it?

06:18.440 --> 06:23.640
So what we can do, we have to create a new object, let's say.
 
06:25.420 --> 06:28.640
doc3 dot ents.
 
06:30.190 --> 06:33.910
If we display it here, there will be only one entity, that is the U.S.

06:34.180 --> 06:41.740
So here we need to add one more thing, so we can just simply use it like this, and let's create some new

06:42.370 --> 06:42.880
entity.

06:43.990 --> 06:46.060
Let me assign it like a list.

06:48.510 --> 06:49.360
So we are adding
 
06:49.620 --> 06:56.100
one more entity, and this entity we are going to define, like, Facebook is like Apple.
 
06:56.220 --> 06:58.560
So that will be an organization.
 
06:58.800 --> 07:01.920
Companies, agencies, or institutions kind of naming.
 
07:03.950 --> 07:09.340
And this thing, I'm just going to assign to doc3.ents.

07:09.800 --> 07:10.130
All right.

07:10.190 --> 07:13.850
So now our objective is to define this new entity.

07:14.330 --> 07:16.490
So let me create one more cell.

07:18.580 --> 07:26.100
And to define this new_entity, we are going to use Span. So, Span.
 
07:26.740 --> 07:29.050
And this particular Span requires
 
07:30.570 --> 07:31.650
a few arguments.

07:32.110 --> 07:37.090
So the first argument is from which document you are looking for some entity.
 
07:37.120 --> 07:38.580
So that will be doc3.
 
07:38.830 --> 07:45.340
And then, that will be the very first token only, or from which token to which token you are looking
 
07:45.340 --> 07:45.520
for.
 
07:45.550 --> 07:46.510
So only the first token.
 
07:46.930 --> 07:48.490
So zero to one.

07:49.120 --> 07:53.110
And what is the type of this new entity you are referring to?
 
07:53.620 --> 07:57.300
So that will be, let's say, label, and it will be equal to
 
07:57.670 --> 07:59.380
ORG, that is, an organization.
 
07:59.830 --> 08:02.350
Now this ORG also, you need to define it.
 
08:02.720 --> 08:02.930
So,
 
08:04.690 --> 08:05.620
how will we define it?
 
08:05.770 --> 08:11.020
So we are going to use doc3.vocab.strings.

08:13.140 --> 08:19.820
And let me add here, oh, you know, this particular Span, we have to import it.
 
08:20.380 --> 08:23.680
So before this new entity, I'm just going to import:
 
08:25.330 --> 08:29.210
from spacy dot tokens
 
08:30.940 --> 08:31.460
import
 
08:33.400 --> 08:33.870
Span.

08:34.180 --> 08:40.450
OK, so we have defined a new ORG, an organization kind of object.
 
08:41.170 --> 08:44.050
If you just execute it, it will be defined.

08:44.620 --> 08:49.630
Now, in spacy.tokens, the Span class is there.
 
08:49.690 --> 08:53.350
So this Span will create a new entity for us
 
08:54.580 --> 09:00.670
from this doc3 object, and now we are going to add this new entity to our existing entity
 
09:00.670 --> 09:03.080
list in this doc3 object.
 
09:03.910 --> 09:09.520
So inside this doc3, the first token, Facebook, will now be identified
 
09:09.520 --> 09:10.760
as a named entity.
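
NOTE
A minimal sketch of adding the entity by hand, assuming spaCy and en_core_web_sm are installed. The names with_extra_entity and run_example are hypothetical, and this sketch passes label="ORG" as a string (an assumption about recent spaCy versions) rather than the doc3.vocab.strings lookup used in the recording. The Span covers only tokens 0 up to 1 of this one document, and the guard skips the assignment when the model has already tagged that token, since spaCy rejects overlapping entity spans.
```python
def with_extra_entity(existing_ents, new_entity):
    """Return the entity list to assign back to doc.ents, with new_entity appended."""
    return list(existing_ents) + [new_entity]
# Usage sketch: needs spaCy plus the en_core_web_sm model installed.
def run_example():
    import spacy
    from spacy.tokens import Span
    nlp = spacy.load("en_core_web_sm")
    doc3 = nlp("Facebook is hiring a new vice president in U.S.")
    new_entity = Span(doc3, 0, 1, label="ORG")  # token 0 only, i.e. "Facebook"
    if not doc3[0].ent_type_:  # add it only if the model left this token untagged
        doc3.ents = with_extra_entity(doc3.ents, new_entity)
    return [(ent.text, ent.label_) for ent in doc3.ents]
```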

09:12.010 --> 09:15.980
All right, so now let us display doc3.ents.
 
09:18.640 --> 09:25.250
All right, so now you can see Facebook, comma, U.S. That means it has recognized two named entities.
 
09:25.610 --> 09:26.270
If you can,
 
09:27.080 --> 09:29.570
let's say, run this one again.

09:29.930 --> 09:30.300
All right.

09:30.430 --> 09:30.980
You can see.

09:33.110 --> 09:33.400
Oops.
 
09:33.500 --> 09:33.970
Oops.
 
09:34.270 --> 09:36.160
I mean, I executed this part again.
 
09:36.220 --> 09:37.800
That's why it is showing that.

09:38.380 --> 09:39.370
So what we can do,
 
09:39.390 --> 09:41.020
we can just execute this one again,
 
09:41.890 --> 09:42.460
all of them.

09:43.960 --> 09:52.010
So it will add one more entity, like Facebook, and let me remove this one, and let's iterate over all
 
09:52.010 --> 09:53.950
the entities in this doc3.
 
09:54.490 --> 09:58.300
And you'll be able to see there are two entities it has recognized:
 
09:58.350 --> 10:01.480
Facebook and the U.S. Facebook is an organization.
 
10:01.930 --> 10:04.400
U.S. is like countries, a geographic location.

10:06.400 --> 10:11.650
Now, let's say, the same thing, if you want to do it in some kind of visual way, you want to display it.
 
10:13.220 --> 10:18.830
And for that, we can use: from spacy import display.
 
10:20.850 --> 10:22.000
Actually, it's displacy.

10:22.060 --> 10:22.450
See?

10:22.780 --> 10:23.320
All right.

10:23.740 --> 10:28.720
So we can use it like displacy dot, let's say, render.

10:31.330 --> 10:34.030
Now, here we can pass multiple arguments.

10:34.230 --> 10:38.110
The first argument will be docs.
 
10:38.950 --> 10:44.230
So which docs we are referring to, let's say we want to display for doc1.
 
10:45.040 --> 10:45.890
We have already
 
10:45.890 --> 10:54.180
defined this document, which corresponds to this Apple string, Apple is looking at, and then style.

10:54.830 --> 10:55.240
So.

10:55.690 --> 10:56.620
Let's just check
 
10:57.400 --> 10:58.620
what are the possible styles.
 
10:59.380 --> 11:04.210
And you can see there is dep here, or otherwise you have to go to their documentation.
 
11:04.630 --> 11:06.510
But I already know that it's
 
11:07.590 --> 11:08.250
ent.
 
11:10.150 --> 11:16.440
And let's make jupyter equal to True, otherwise it won't display.
 
11:16.670 --> 11:16.920
Yes.

11:18.570 --> 11:18.920
All right.

11:18.990 --> 11:21.830
So you can see, in a very beautiful, nice manner,
 
11:21.990 --> 11:23.210
it has identified
 
11:23.230 --> 11:28.730
Apple as an organization, U.K. as a GPE, and one billion dollars, that is like MONEY.

11:29.860 --> 11:35.590
Now, suppose out of these entities, you just want to find some specific entity only.
 
11:35.940 --> 11:37.510
So you can use the same function.

11:38.000 --> 11:40.580
But one more extra argument you can add.

11:40.900 --> 11:43.360
Let's say options, and options,
 
11:43.360 --> 11:45.130
you can give it like a dictionary.
 
11:46.210 --> 11:47.790
Let's say ents.
 
11:50.130 --> 11:59.910
And let's say I just want to display organization only, and let me put a comma like this, and display.

12:00.790 --> 12:01.140
All right.

12:01.170 --> 12:05.050
So you can see only the organization, it has highlighted.
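
NOTE
A hedged sketch of the two displacy calls described above, for use in a Jupyter notebook with spaCy and en_core_web_sm installed. The names entity_filter_options and run_example are hypothetical; the "ents" key in options is what limits highlighting to the listed labels.
```python
def entity_filter_options(labels):
    """Build the displacy options dict that limits highlighting to the given labels."""
    return {"ents": list(labels)}
# Usage sketch for a notebook: needs spaCy plus the en_core_web_sm model installed.
def run_example():
    import spacy
    from spacy import displacy
    nlp = spacy.load("en_core_web_sm")
    doc1 = nlp("Apple is looking at buying U.K. startup for $1 billion")
    displacy.render(doc1, style="ent", jupyter=True)  # highlight every entity
    org_only = entity_filter_options(["ORG"])
    displacy.render(doc1, style="ent", jupyter=True, options=org_only)  # ORG only
```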

12:05.160 --> 12:12.780
So that is the whole story about named entity recognition and how to do it very easily. These functions
 
12:12.780 --> 12:14.470
are readily available inside spaCy.

12:15.100 --> 12:15.510
All right.

12:15.540 --> 12:16.870
See you in the next video.
