1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:05,000
So we are going to continue our discussion of natural language processing for machine learning.

3
00:00:05,000 --> 00:00:08,000
In this video we are going to discuss word embeddings.

4
00:00:08,000 --> 00:00:14,000
And this was probably the topic that I should have covered long ago, but I am deliberately keeping

5
00:00:14,000 --> 00:00:16,000
this particular topic at this point of time.

6
00:00:16,000 --> 00:00:16,000
Why?

7
00:00:16,000 --> 00:00:22,000
Because we have discussed so many topics wherein we focused on converting words into vectors.

8
00:00:22,000 --> 00:00:28,000
So now you'll get a very clear idea of what exactly word embeddings are.

9
00:00:28,000 --> 00:00:30,000
And here I have given you a Wikipedia definition.

10
00:00:30,000 --> 00:00:33,000
So this is a very simple Wikipedia definition.

11
00:00:33,000 --> 00:00:34,000
I have taken it from Wikipedia.

12
00:00:34,000 --> 00:00:37,000
So the entire credit goes to Wikipedia over here.

13
00:00:37,000 --> 00:00:42,000
So over here you can see the definition: in natural language processing, word embedding is a term

14
00:00:42,000 --> 00:00:50,000
used for the representation of words for text analysis, typically in the form of real-valued

15
00:00:50,000 --> 00:00:56,000
vectors that encode the meaning of the word, such that the words that are closer in the vector space are

16
00:00:56,000 --> 00:00:58,000
expected to be similar in meaning.

17
00:00:59,000 --> 00:01:03,000
So let's say that I have two words king and queen.

18
00:01:04,000 --> 00:01:06,000
Okay, or forget about king and queen.

19
00:01:06,000 --> 00:01:15,000
Let's say that I have two words, and the two words are happy and excited, right?

20
00:01:15,000 --> 00:01:17,000
Let's say I have these two specific words.

21
00:01:17,000 --> 00:01:23,000
Now when I have these two specific words, with the help of word embedding techniques, what we do is that

22
00:01:23,000 --> 00:01:25,000
we convert these particular words into vectors.

23
00:01:26,000 --> 00:01:33,000
And let's say I try to plot these vectors in a two-dimensional graph, okay.

24
00:01:33,000 --> 00:01:38,000
And if I really want to convert this into a two-dimensional graph, we have techniques like PCA or

25
00:01:38,000 --> 00:01:42,000
other techniques, which are unsupervised techniques for dimensionality reduction.

26
00:01:42,000 --> 00:01:49,000
So once I plot this, let's say happy and excited come out near each other based on these

27
00:01:49,000 --> 00:01:49,000
particular vectors.

28
00:01:49,000 --> 00:01:52,000
It basically indicates both are similar words.

29
00:01:52,000 --> 00:01:55,000
Okay, let's say that I have one more word like angry.

30
00:01:56,000 --> 00:01:59,000
Now in this case with the help of word embeddings.

31
00:01:59,000 --> 00:02:05,000
If I'm trying to convert this into vectors, the expectation is that, obviously, happy is the opposite

32
00:02:05,000 --> 00:02:05,000
of angry.

33
00:02:05,000 --> 00:02:11,000
So angry will be somewhere over here if I try to plot these particular vectors.

34
00:02:11,000 --> 00:02:12,000
Why?

35
00:02:12,000 --> 00:02:13,000
Why is it coming out so far away?

36
00:02:13,000 --> 00:02:14,000
Because it is an opposite word.

37
00:02:14,000 --> 00:02:19,000
So the distance between these words will be quite high, whereas the distance between these particular

38
00:02:19,000 --> 00:02:20,000
words will be quite small.

39
00:02:20,000 --> 00:02:25,000
So this indicates that these two words are similar, whereas these two words are opposite to each

40
00:02:25,000 --> 00:02:26,000
other.

41
00:02:26,000 --> 00:02:26,000
Right.

42
00:02:26,000 --> 00:02:32,000
And this is all possible because of the efficient conversion of words into vectors.

43
00:02:32,000 --> 00:02:33,000
Right.

44
00:02:33,000 --> 00:02:36,000
And how are we doing this? Again, with the help of word embeddings.

45
00:02:36,000 --> 00:02:41,000
But the techniques we have learned till now are things like one-hot encoding.

46
00:02:41,000 --> 00:02:46,000
We have learned about bag of words, we have learned TF-IDF, and all these techniques

47
00:02:46,000 --> 00:02:48,000
are a part of word embeddings.

48
00:02:48,000 --> 00:02:55,000
So let me clearly show you the division. In the first step, I have the word embedding

49
00:02:55,000 --> 00:02:55,000
techniques.

50
00:02:55,000 --> 00:02:58,000
So let me just again go ahead and write it.

51
00:02:58,000 --> 00:03:03,000
Word embedding techniques are specifically of two types.

52
00:03:03,000 --> 00:03:07,000
So this is my first type and this is my second type.

53
00:03:07,000 --> 00:03:12,000
The first type is based on count or frequency.

54
00:03:13,000 --> 00:03:15,000
Count or frequency.

55
00:03:16,000 --> 00:03:21,000
And the second type is based on deep learning trained models.

56
00:03:21,000 --> 00:03:27,000
Please listen to this carefully, because these deep learning trained models will give you very good

57
00:03:27,000 --> 00:03:28,000
accuracy.

58
00:03:28,000 --> 00:03:32,000
Okay, now under count or frequency we have learned about three different types.

59
00:03:32,000 --> 00:03:34,000
One is one-hot encoding.

60
00:03:35,000 --> 00:03:37,000
The second one is called bag of words.

61
00:03:37,000 --> 00:03:41,000
And the third one is called TF-IDF.

62
00:03:42,000 --> 00:03:42,000
Right.

63
00:03:42,000 --> 00:03:44,000
So we have learned all these techniques, right?

64
00:03:45,000 --> 00:03:47,000
And we know the advantages and disadvantages.

65
00:03:47,000 --> 00:03:50,000
But here we are focusing more on count or frequency.

66
00:03:50,000 --> 00:03:51,000
Right.

67
00:03:51,000 --> 00:03:58,000
But the major one is the one that has better accuracy. And all these techniques, at

68
00:03:58,000 --> 00:04:05,000
the end of the day, are also converting words into vectors, or converting sentences into vectors,

69
00:04:05,000 --> 00:04:07,000
right, sentences into vectors.

70
00:04:07,000 --> 00:04:10,000
But we have seen a lot of advantages and disadvantages.

71
00:04:10,000 --> 00:04:14,000
There are many disadvantages with all these techniques.

72
00:04:14,000 --> 00:04:19,000
And all these disadvantages are solved by the deep learning trained model.

73
00:04:19,000 --> 00:04:24,000
And that trained model is what is called Word2Vec.

74
00:04:25,000 --> 00:04:27,000
It's not like we cannot create it from scratch.

75
00:04:27,000 --> 00:04:29,000
We can definitely create it from scratch.

76
00:04:29,000 --> 00:04:34,000
But again, you require a huge amount of data. So what we are going to do is, first of all, we

77
00:04:34,000 --> 00:04:40,000
are going to understand in the upcoming videos what exactly Word2Vec is and how it is basically

78
00:04:40,000 --> 00:04:45,000
converting a word into a vector and how it is solving all the disadvantages that were there in

79
00:04:45,000 --> 00:04:46,000
these particular techniques.

80
00:04:46,000 --> 00:04:50,000
We are going to discuss everything, but for now just understand what Word2Vec is.

81
00:04:50,000 --> 00:04:55,000
It is a word embedding technique which will efficiently convert a word into a vector, and which will be

82
00:04:55,000 --> 00:04:57,000
making sure that this property holds:

83
00:04:57,000 --> 00:05:01,000
words expected to be similar in meaning stay close together when they are converted into the vector space.

84
00:05:01,000 --> 00:05:06,000
Along with that, it will also give you a very good representation of the words, right?

85
00:05:06,000 --> 00:05:08,000
Sparsity will not be there, and so on.

86
00:05:08,000 --> 00:05:12,000
There are many points which I'm going to discuss in the upcoming videos with respect to Word2Vec.

87
00:05:12,000 --> 00:05:15,000
Now Word2Vec is of two types.

88
00:05:15,000 --> 00:05:20,000
That is because the entire deep learning architecture is built in two different ways.

89
00:05:20,000 --> 00:05:30,000
One is what we basically call CBOW. CBOW is nothing but continuous bag of words.

90
00:05:30,000 --> 00:05:34,000
Super important: continuous bag of words.

91
00:05:35,000 --> 00:05:38,000
And we are also going to see how the model gets trained.

92
00:05:38,000 --> 00:05:43,000
But for this you really need to have prerequisite knowledge of how an ANN works.

93
00:05:43,000 --> 00:05:46,000
What a loss function is, what optimizers are,

94
00:05:46,000 --> 00:05:46,000
and so on, right?

95
00:05:46,000 --> 00:05:50,000
The second technique is called skip-gram.

96
00:05:50,000 --> 00:05:52,000
Skip-gram.

97
00:05:52,000 --> 00:05:54,000
Again this is a different technique.

98
00:05:54,000 --> 00:05:57,000
And again, it is a part of Word2Vec itself.

99
00:05:57,000 --> 00:05:58,000
It is a different type of Word2Vec.

100
00:05:58,000 --> 00:06:04,000
At the end of the day, we can use either CBOW or skip-gram to get an efficient conversion of words to

101
00:06:04,000 --> 00:06:05,000
vectors.

102
00:06:05,000 --> 00:06:05,000
Right.

103
00:06:05,000 --> 00:06:07,000
And this is what we are going to see.

104
00:06:07,000 --> 00:06:12,000
We are also going to see some pre-trained models of Word2Vec, you know, probably created by Google

105
00:06:12,000 --> 00:06:14,000
and it is somewhere around 1.5 GB.

106
00:06:14,000 --> 00:06:15,000
Big model size.

107
00:06:15,000 --> 00:06:19,000
We will try to download it, we will try to look at it, and we will try to execute it.

108
00:06:19,000 --> 00:06:25,000
But in the upcoming videos, what we are going to see is how Word2Vec word embedding works,

109
00:06:25,000 --> 00:06:26,000
right?

110
00:06:26,000 --> 00:06:30,000
And how it makes sure that all the disadvantages present in these techniques get

111
00:06:30,000 --> 00:06:30,000
removed.

112
00:06:30,000 --> 00:06:31,000
Right?

113
00:06:31,000 --> 00:06:34,000
So this is what we are going to discuss, but I hope you got an idea.

114
00:06:34,000 --> 00:06:40,000
At the end of the day, whatever techniques we have discussed till now for converting words into vectors,

115
00:06:40,000 --> 00:06:42,000
they fall under word embeddings, right?

116
00:06:42,000 --> 00:06:44,000
So yes, this was it from my side.

117
00:06:44,000 --> 00:06:46,000
I will see you all in the next video.

118
00:06:46,000 --> 00:06:49,000
And in the next video I'm going to discuss Word2Vec.

119
00:06:49,000 --> 00:06:49,000
Thank you.

