1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:05,000
So let us go ahead and continue the discussion with respect to word embedding. I already gave you an

3
00:00:05,000 --> 00:00:08,000
idea right where word embedding or embedding layer is specifically used.

4
00:00:08,000 --> 00:00:14,000
See, word embedding is a technique which just converts a word into a vector.

5
00:00:14,000 --> 00:00:20,000
And specifically in a neural network we have to use this as a layer, another layer like how we have

6
00:00:20,000 --> 00:00:21,000
a dense layer, right.

7
00:00:21,000 --> 00:00:24,000
Similarly, embedding layer will also be there.

8
00:00:24,000 --> 00:00:28,000
And the main work of embedding layer is that it will be using some kind of word embedding technique,

9
00:00:28,000 --> 00:00:31,000
where it will take an input and it will convert that into vectors.

10
00:00:31,000 --> 00:00:31,000
Okay.

11
00:00:32,000 --> 00:00:39,000
Now, when I'm discussing this word embedding, it is also called feature representation.

12
00:00:39,000 --> 00:00:40,000
Okay.

13
00:00:40,000 --> 00:00:41,000
What exactly it is.

14
00:00:41,000 --> 00:00:42,000
Okay.

15
00:00:42,000 --> 00:00:43,000
We will discuss about it.

16
00:00:43,000 --> 00:00:43,000
Right.

17
00:00:43,000 --> 00:00:48,000
Let's say I have a data set okay.

18
00:00:48,000 --> 00:00:53,000
And in this data set, let's say I have text, and this is my output.

19
00:00:55,000 --> 00:01:03,000
Now, with respect to this particular text, let's consider that I have some sentence x11, x12, x13, x14.

20
00:01:04,000 --> 00:01:05,000
This output is zero.

21
00:01:05,000 --> 00:01:10,000
Let's say again you have x21, x22, x23, x24.

22
00:01:11,000 --> 00:01:13,000
Then you have this output, one.

23
00:01:13,000 --> 00:01:16,000
Similarly, you have a lot of data like this

24
00:01:16,000 --> 00:01:17,000
available, okay.

25
00:01:18,000 --> 00:01:27,000
Now as I told you, whenever I need to give any input to my neural network, let's say specifically a

26
00:01:27,000 --> 00:01:28,000
simple RNN.

27
00:01:28,000 --> 00:01:33,000
This is how the representation is right with respect to the simple RNN right from here also.

28
00:01:34,000 --> 00:01:37,000
So let's consider that this is my simple RNN.

29
00:01:38,000 --> 00:01:44,000
If I want to give any inputs over here, one thing that you really need to understand is that

30
00:01:45,000 --> 00:01:54,000
whatever word x of t I give, whatever words I give with respect to the time step, I need to convert this into

31
00:01:54,000 --> 00:01:56,000
vectors by using some embedding layer.

32
00:01:57,000 --> 00:01:57,000
Okay.

33
00:01:57,000 --> 00:02:00,000
So I'll try to convert this into some vectors.

34
00:02:02,000 --> 00:02:09,000
Now let me just go ahead and talk about one of the processes which initially was most commonly

35
00:02:09,000 --> 00:02:11,000
used, which is called one-hot encoding.

36
00:02:11,000 --> 00:02:11,000
Right.

37
00:02:12,000 --> 00:02:18,000
Now, let's say first of all, if I really want to use some word embedding okay.

38
00:02:18,000 --> 00:02:20,000
It is also called as feature representation.

39
00:02:20,000 --> 00:02:21,000
Understand this word okay.

40
00:02:21,000 --> 00:02:24,000
This I will discuss about what exactly is feature representation.

41
00:02:24,000 --> 00:02:26,000
Very important thing okay.

42
00:02:26,000 --> 00:02:32,000
So initially, let's say I will not use word embedding; I'll use some other technique like one-hot

43
00:02:32,000 --> 00:02:32,000
encoding.

44
00:02:32,000 --> 00:02:37,000
Let's say we will use some other technique, and we will discuss the disadvantage of this

45
00:02:37,000 --> 00:02:38,000
particular technique.

46
00:02:38,000 --> 00:02:39,000
And then we will go to word embedding.

47
00:02:39,000 --> 00:02:44,000
So let's say the first technique that we are going to use is one-hot encoding.

48
00:02:44,000 --> 00:02:49,000
Now in one hot encoding what happens is that we define a vocabulary size.

49
00:02:49,000 --> 00:02:54,000
Vocabulary size is basically the total number of words that you have in this vocabulary.

50
00:02:54,000 --> 00:02:57,000
So let's say if I go ahead and say it is 10,000.

51
00:02:57,000 --> 00:03:00,000
Okay, so 10,000 words I have.
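
Note: a vocabulary is just a fixed mapping from each distinct word to an index. A minimal sketch, assuming a tiny made-up corpus of two sentences (the sentences and the names vocabulary and word_to_index are hypothetical; a real vocabulary here would have 10,000 entries):

```python
# Toy corpus; these sentences are made up for illustration.
sentences = [
    "the man ate an apple",
    "the boy ate a mango",
]

# Collect every distinct word and give each one a fixed index.
vocabulary = sorted({word for sentence in sentences for word in sentence.split()})
word_to_index = {word: index for index, word in enumerate(vocabulary)}

# The vocabulary size is the number of distinct words.
vocab_size = len(vocabulary)

print(vocab_size)
print(word_to_index["man"], word_to_index["boy"])
```

Sorting is only one way to fix the order; any consistent assignment of indices would work equally well.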

52
00:03:00,000 --> 00:03:00,000
Okay.

53
00:03:01,000 --> 00:03:12,000
Now if I consider one-hot representation, what this basically means is that

54
00:03:13,000 --> 00:03:19,000
if I have a word something like this, let's say man, in this sentence I have a word which is called

55
00:03:19,000 --> 00:03:22,000
as man or over here I'll just define man.

56
00:03:23,000 --> 00:03:28,000
So every word will be converted based on this one-hot

57
00:03:28,000 --> 00:03:29,000
representation.

58
00:03:29,000 --> 00:03:31,000
What does this one hot representation basically say?

59
00:03:31,000 --> 00:03:38,000
This man will basically get converted to a vector of 10,000 dimensions.

60
00:03:38,000 --> 00:03:43,000
So over here you'll be able to see that this vector will have 10,000

61
00:03:43,000 --> 00:03:44,000
entries, one for each word in the vocabulary.

62
00:03:44,000 --> 00:03:48,000
And let's say these are like represented somewhere like this and somewhere like this.

63
00:03:48,000 --> 00:03:50,000
I have dots, and here I will have a one.

64
00:03:50,000 --> 00:03:53,000
Then I will be having zero and continuously all zero.

65
00:03:53,000 --> 00:03:53,000
Okay.

66
00:03:54,000 --> 00:04:02,000
So one hot representation basically says that in this vocabulary in whichever index man word is present,

67
00:04:02,000 --> 00:04:05,000
that will be one and remaining all will be zero.

68
00:04:06,000 --> 00:04:12,000
So how are we representing man by one-hot representation? We are basically using this vector: as many

69
00:04:12,000 --> 00:04:15,000
entries as the vocabulary size will be there.

70
00:04:15,000 --> 00:04:20,000
Every value will be zero except at the index where man is present.

71
00:04:20,000 --> 00:04:26,000
Let's say this is index 5000; there we will specifically set the value to one.

72
00:04:27,000 --> 00:04:29,000
So man will be

73
00:04:29,000 --> 00:04:35,000
represented using this vector according to one-hot representation. Okay.

74
00:04:35,000 --> 00:04:39,000
Similarly, let's say there is also another word which is called as boy.

75
00:04:39,000 --> 00:04:42,000
And let's say boy is at this index.

76
00:04:42,000 --> 00:04:45,000
Boy is at index 2000.

77
00:04:45,000 --> 00:04:52,000
Let's say I will go and write 0, 0, 0 like this up to the one, then 0, 0, 0 like this up to the last zero.

78
00:04:52,000 --> 00:04:52,000
Right.

79
00:04:52,000 --> 00:04:54,000
So this size will be 10,000.

80
00:04:54,000 --> 00:04:59,000
In the 2000 index I'm just imagining in the 2000 index boy is present.

81
00:04:59,000 --> 00:05:00,000
So that is where

82
00:05:00,000 --> 00:05:02,000
this word boy is present in this vocabulary.

83
00:05:02,000 --> 00:05:04,000
So this value is one.

84
00:05:04,000 --> 00:05:04,000
Okay.

85
00:05:04,000 --> 00:05:11,000
So this way the entire one hot representation is basically done for a specific word.

86
00:05:11,000 --> 00:05:13,000
And it is converted into a vector.
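
Note: the one-hot representation described above can be sketched like this. It assumes the vocabulary size of 10,000 and the index positions from the example (man at 5000, boy at 2000); the function name one_hot is just for illustration.

```python
def one_hot(index, vocab_size):
    """Return a list of vocab_size zeros with a single 1 at position index."""
    if not 0 <= index < vocab_size:
        raise ValueError("index must lie inside the vocabulary")
    vector = [0] * vocab_size
    vector[index] = 1
    return vector


VOCAB_SIZE = 10_000

# Index positions taken from the example: man at 5000, boy at 2000.
man = one_hot(5000, VOCAB_SIZE)
boy = one_hot(2000, VOCAB_SIZE)

# Each vector has 10,000 entries and exactly one of them is 1.
print(len(man), sum(man), man.index(1))
print(len(boy), sum(boy), boy.index(1))
```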

87
00:05:13,000 --> 00:05:14,000
Right.

88
00:05:14,000 --> 00:05:17,000
Now just imagine you are having a word.

89
00:05:17,000 --> 00:05:21,000
You're converting this every word into 10,000 dimensions.

90
00:05:21,000 --> 00:05:25,000
And you are sending it in the form of vectors, which is having zeros and ones and all, and only at

91
00:05:25,000 --> 00:05:27,000
one place there is one and remaining all are zero.

92
00:05:27,000 --> 00:05:31,000
So this is really a sparse matrix.

93
00:05:32,000 --> 00:05:34,000
This is basically called as a sparse matrix.

94
00:05:34,000 --> 00:05:37,000
And what is the problem with this sparse matrix.

95
00:05:38,000 --> 00:05:43,000
The problem with this sparse matrix is that whenever we try to use this, it can lead to something called

96
00:05:43,000 --> 00:05:50,000
overfitting, because the input is huge and you have just zeros and ones; there is not much information in it as such,

97
00:05:51,000 --> 00:05:51,000
right?

98
00:05:51,000 --> 00:05:57,000
So this is really important for you all to understand how these things are basically taking place,

99
00:05:57,000 --> 00:05:58,000
right.

100
00:05:58,000 --> 00:06:02,000
So obviously we cannot use one hot representation.

101
00:06:02,000 --> 00:06:03,000
It is not an efficient technique.

102
00:06:03,000 --> 00:06:04,000
Right.

103
00:06:04,000 --> 00:06:09,000
Because the number of zeros and ones depends on the vocabulary size, we are making it so

104
00:06:09,000 --> 00:06:10,000
big, right.

105
00:06:10,000 --> 00:06:12,000
Similarly, you can see that however many words

106
00:06:12,000 --> 00:06:14,000
are there,

107
00:06:14,000 --> 00:06:18,000
in this one-hot representation you just have zeros, and in one place

108
00:06:18,000 --> 00:06:19,000
you just have a one.

109
00:06:19,000 --> 00:06:21,000
So not an efficient way to use this.
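
Note: a small sketch of why this is wasteful, again assuming a vocabulary of 10,000 with man at index 5000 and boy at index 2000. It counts the zeros in one vector, counts the values needed for a four-word sentence, and checks how much two different words overlap.

```python
VOCAB_SIZE = 10_000


def one_hot(index, vocab_size=VOCAB_SIZE):
    """Return a list of zeros with a single 1 at the given index."""
    vector = [0] * vocab_size
    vector[index] = 1
    return vector


man = one_hot(5000)
boy = one_hot(2000)

# Almost every entry carries no information.
zeros_in_man = man.count(0)

# A four-word sentence already needs 4 * 10,000 values, only 4 of them ones.
sentence_length = 4
values_per_sentence = sentence_length * VOCAB_SIZE

# The dot product of two different one-hot vectors is always zero,
# so the representation says nothing about how related two words are.
overlap = sum(a * b for a, b in zip(man, boy))

print(zeros_in_man, values_per_sentence, overlap)
```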

110
00:06:21,000 --> 00:06:22,000
Right.

111
00:06:22,000 --> 00:06:29,000
And in order to overcome this disadvantage, we specifically use something called

112
00:06:29,000 --> 00:06:31,000
word embedding.

113
00:06:32,000 --> 00:06:38,000
Now if you don't know about word2vec, I again suggest you please go ahead and revise word2vec.

114
00:06:39,000 --> 00:06:47,000
Word2vec is one type of word embedding technique, and this can also be used in the embedding

115
00:06:47,000 --> 00:06:47,000
layer.

116
00:06:47,000 --> 00:06:49,000
Now what does this basically say.

117
00:06:49,000 --> 00:06:50,000
Word embedding.

118
00:06:50,000 --> 00:06:51,000
What does it say.

119
00:06:51,000 --> 00:06:52,000
Okay.

120
00:06:52,000 --> 00:06:57,000
And please remember this word called as feature representation okay.

121
00:06:57,000 --> 00:07:03,000
It creates a feature representation for every word that is available over here in this data set.

122
00:07:03,000 --> 00:07:03,000
Okay.

123
00:07:03,000 --> 00:07:05,000
What does that basically mean?

124
00:07:05,000 --> 00:07:08,000
Let's say in my data set I have some words.

125
00:07:08,000 --> 00:07:09,000
Okay.

126
00:07:09,000 --> 00:07:12,000
Let's say I'll just go ahead and make this like boy.

127
00:07:12,000 --> 00:07:18,000
I have something like girl I have something like King, I have something like Queen.

128
00:07:21,000 --> 00:07:24,000
Then I have something like Apple.

129
00:07:25,000 --> 00:07:27,000
Then I have something like mango.

130
00:07:28,000 --> 00:07:29,000
Okay, let's consider this.

131
00:07:29,000 --> 00:07:30,000
All words are there.

132
00:07:31,000 --> 00:07:34,000
And again over here also, my vocabulary size,

133
00:07:34,000 --> 00:07:35,000
let's consider, is 10,000.

134
00:07:36,000 --> 00:07:36,000
Okay.

135
00:07:37,000 --> 00:07:39,000
Let's say boy is present at index 2000.

136
00:07:39,000 --> 00:07:43,000
Girl is present at index 5000.

137
00:07:43,000 --> 00:07:46,000
King is present at index 6000,

138
00:07:46,000 --> 00:07:51,000
that is, the word is present at that index in this vocabulary. Queen is present at index 9000.

139
00:07:51,000 --> 00:07:56,000
Apple you have at something like 1000, and mango you may have at 7000.

140
00:07:57,000 --> 00:07:57,000
Okay.

141
00:07:58,000 --> 00:08:01,000
Now let's consider these words.

142
00:08:01,000 --> 00:08:04,000
Now I need to convert these words right.

143
00:08:04,000 --> 00:08:06,000
Let's say these are some of the words that I have picked from the first sentence.

144
00:08:06,000 --> 00:08:08,000
I'm just considering it okay.

145
00:08:08,000 --> 00:08:13,000
Now, how every word is basically converted into vectors, that we really need to understand with the help

146
00:08:13,000 --> 00:08:15,000
of word embedding.

147
00:08:15,000 --> 00:08:20,000
If you remember word2vec, I've already shown you how word2vec is basically trained, right?

148
00:08:21,000 --> 00:08:23,000
There, there are two types.

149
00:08:23,000 --> 00:08:25,000
Skip-gram is there, CBOW is there.

150
00:08:25,000 --> 00:08:28,000
We have already seen how forward propagation backward propagation will happen.

151
00:08:28,000 --> 00:08:32,000
Now, in the case of word embedding, or in the case of embedding layer, which is using some kind of

152
00:08:32,000 --> 00:08:40,000
word embedding for all these words, first of all we will go ahead and select some feature representation.

153
00:08:41,000 --> 00:08:45,000
Let's say for this we will go ahead and select some feature representation.

154
00:08:46,000 --> 00:08:54,000
And let's say I'll say, hey, in this feature representation I will take each and every word and I will

155
00:08:54,000 --> 00:09:01,000
change it into so many numbers, or I'll represent every word in some dimensions according to this

156
00:09:01,000 --> 00:09:02,000
feature representation.

157
00:09:02,000 --> 00:09:05,000
So here I will say let's go ahead and do one thing.

158
00:09:05,000 --> 00:09:09,000
Let's take 300 dimensions, okay.

159
00:09:09,000 --> 00:09:15,000
Let's take a 300-dimensional vector and represent each word in these 300 dimensions.

160
00:09:16,000 --> 00:09:17,000
Okay.

161
00:09:17,000 --> 00:09:21,000
Let's represent each word with 300 dimensions, okay.

162
00:09:21,000 --> 00:09:27,000
It is very difficult to understand what the inner working of word2vec, or of this word embedding,

163
00:09:27,000 --> 00:09:30,000
is, because we will not be able to see all the features.

164
00:09:30,000 --> 00:09:35,000
But here, let's go ahead and consider that I am going to represent some features over here.

165
00:09:36,000 --> 00:09:40,000
I'll probably use some feature representation and I'll write some words.

166
00:09:40,000 --> 00:09:40,000
Okay.

167
00:09:40,000 --> 00:09:42,000
That will actually help you to understand.

168
00:09:42,000 --> 00:09:46,000
Let's say that I'm going to consider gender over here as one of the features.

169
00:09:46,000 --> 00:09:48,000
Then I'm going to consider Royal.

170
00:09:48,000 --> 00:09:50,000
Then I am going to consider age.

171
00:09:50,000 --> 00:09:53,000
And then I'm going to consider food.

172
00:09:53,000 --> 00:09:53,000
Okay.

173
00:09:53,000 --> 00:09:55,000
These are some of the features that I've used.

174
00:09:55,000 --> 00:09:56,000
And let's say like this.

175
00:09:56,000 --> 00:09:58,000
There are 300 dimensions.

176
00:09:59,000 --> 00:10:00,000
More and more.

177
00:10:00,000 --> 00:10:01,000
So many features are there till here.

178
00:10:01,000 --> 00:10:02,000
Okay.

179
00:10:02,000 --> 00:10:07,000
Now what word embedding does is based on these features that we have considered. How these features

180
00:10:07,000 --> 00:10:09,000
are basically coming up,

181
00:10:09,000 --> 00:10:12,000
I have already shown you in the word2vec training.

182
00:10:12,000 --> 00:10:13,000
Please go ahead and see that.

183
00:10:14,000 --> 00:10:14,000
Okay.

184
00:10:14,000 --> 00:10:17,000
We are going to train this separately.

185
00:10:17,000 --> 00:10:18,000
And this is what is used in the embedding layer.

186
00:10:19,000 --> 00:10:22,000
Right now we are just considering this feature representation.

187
00:10:22,000 --> 00:10:29,000
And for each word, it will try to find out the relation between this word and this word, this

188
00:10:29,000 --> 00:10:33,000
word and this word, this word and this word, and will assign some vectors.

189
00:10:33,000 --> 00:10:33,000
Okay.

190
00:10:34,000 --> 00:10:41,000
So this word boy will be given by these 300-dimensional features, or a 300-dimensional vector,

191
00:10:41,000 --> 00:10:44,000
by seeing the relationship with this particular word.

192
00:10:44,000 --> 00:10:46,000
Again, let me repeat it.

193
00:10:46,000 --> 00:10:46,000
Okay.

194
00:10:46,000 --> 00:10:53,000
So here what we are doing is that if we consider this word boy, this boy will be converted into vectors

195
00:10:53,000 --> 00:11:00,000
in such a way that we'll try to find out the relationship between all this, all this, all this, all

196
00:11:00,000 --> 00:11:01,000
this features.

197
00:11:01,000 --> 00:11:02,000
Right.

198
00:11:02,000 --> 00:11:03,000
And then we'll get one vector value.

199
00:11:03,000 --> 00:11:05,000
Let's say one example.

200
00:11:05,000 --> 00:11:07,000
I'll say boy to gender.

201
00:11:07,000 --> 00:11:10,000
If I write minus one, okay, I'll just go ahead and write minus one.

202
00:11:10,000 --> 00:11:11,000
I'm just assigning some value.

203
00:11:11,000 --> 00:11:12,000
Let's say minus one.

204
00:11:12,000 --> 00:11:18,000
Then girl to gender will be plus one, because these are opposites, right; the opposite of boy is girl.

205
00:11:18,000 --> 00:11:20,000
So that is the reason I have written minus one and plus one.

206
00:11:20,000 --> 00:11:25,000
Now if I see the relationship between Royal and Boy, we don't say hey, it is a royal boy or something,

207
00:11:25,000 --> 00:11:27,000
so there will be hardly some relationship.

208
00:11:27,000 --> 00:11:31,000
So for this, let's say I'll go ahead and write one value 0.01.

209
00:11:31,000 --> 00:11:36,000
Similarly, royal girl: a girl can be called royal, so I'll just go ahead and give some value more than

210
00:11:36,000 --> 00:11:37,000
the boy's, okay.

211
00:11:38,000 --> 00:11:43,000
Now similarly, if I take King to Royal they'll say hey royal King.

212
00:11:43,000 --> 00:11:45,000
So obviously this value will be higher.

213
00:11:45,000 --> 00:11:48,000
Let's say 0.95 okay.

214
00:11:48,000 --> 00:11:53,000
If I say hey Queen to Royal, yes we obviously say Queen Royal.

215
00:11:53,000 --> 00:11:53,000
Okay.

216
00:11:53,000 --> 00:11:55,000
But if I say king to gender, okay,

217
00:11:55,000 --> 00:12:00,000
if I see some relation, let's say it is 0.92 in magnitude. Queen to gender:

218
00:12:00,000 --> 00:12:05,000
again, I'll go ahead and write 0.92, and again the signs will be opposite, okay.

219
00:12:05,000 --> 00:12:07,000
Apple to gender.

220
00:12:07,000 --> 00:12:09,000
Is there any relationship with respect to gender.

221
00:12:09,000 --> 00:12:16,000
So it will be 0.0. Mango to gender, okay, I'll just keep one value, 0.1, because there is hardly any relationship.

222
00:12:16,000 --> 00:12:18,000
But do we use royal with apple?

223
00:12:18,000 --> 00:12:20,000
Do we say royal apple?

224
00:12:20,000 --> 00:12:20,000
No.

225
00:12:20,000 --> 00:12:21,000
Okay.

226
00:12:21,000 --> 00:12:23,000
So it will be 0.02 let's say.

227
00:12:23,000 --> 00:12:24,000
And for this it will be 0.01.

228
00:12:24,000 --> 00:12:27,000
I'm putting some values over here based on the relationship.
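
Note: the feature table described above can be sketched in a few lines. This is only an illustration, not how word2vec actually produces its values. The gender and royal numbers follow the ones spoken where a value was given; the negative sign for king, the queen and girl royal values, and the whole food column are assumptions added for this sketch.

```python
import math

# Columns: gender, royal, food.
# Values not stated in the walkthrough (king's sign, queen/girl royal,
# and every food value) are assumptions made up for illustration.
features = {
    "boy":   [-1.00, 0.01, 0.01],
    "girl":  [1.00, 0.02, 0.01],
    "king":  [-0.92, 0.95, 0.01],
    "queen": [0.92, 0.96, 0.01],
    "apple": [0.00, 0.02, 0.95],
    "mango": [0.10, 0.01, 0.96],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


apple_mango = cosine_similarity(features["apple"], features["mango"])
apple_king = cosine_similarity(features["apple"], features["king"])

# Related words end up with similar vectors, unrelated words do not.
print(round(apple_mango, 3), round(apple_king, 3))
```

With dense feature values like these, apple and mango come out far more similar to each other than apple and king, which is information a one-hot vector cannot carry.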

229
00:12:27,000 --> 00:12:28,000
Right.

230
00:12:28,000 --> 00:12:29,000
So like this.

231
00:12:29,000 --> 00:12:36,000
With this representation and these relations, all these values will be trained, and each word will be

232
00:12:36,000 --> 00:12:37,000
provided a vector.

233
00:12:37,000 --> 00:12:38,000
How do we train it.

234
00:12:38,000 --> 00:12:41,000
Please go ahead and see my word2vec tutorial.

235
00:12:42,000 --> 00:12:45,000
There you will be able to understand it, okay.

236
00:12:46,000 --> 00:12:48,000
Word2vec, word2vec.

237
00:12:48,000 --> 00:12:48,000
Okay.

238
00:12:48,000 --> 00:12:50,000
It is necessary that you understand word2vec, okay.

239
00:12:51,000 --> 00:12:55,000
Now this boy is given by this vector.

240
00:12:56,000 --> 00:13:00,000
This is now important for you to understand right.

241
00:13:00,000 --> 00:13:07,000
So whenever I pass the word boy over here, first of all I will give this word to my embedding layer, and my

242
00:13:07,000 --> 00:13:08,000
embedding layer,

243
00:13:08,000 --> 00:13:15,000
if it is using these 300 dimensions, is just going to convert this into a vector with respect to 300

244
00:13:15,000 --> 00:13:21,000
dimensions with all these relationship vector values, okay, like minus one, 0.01 and all.

245
00:13:21,000 --> 00:13:26,000
And once we pass this, then only our simple RNN will start getting trained.

246
00:13:26,000 --> 00:13:27,000
Okay.

247
00:13:27,000 --> 00:13:32,000
Now, uh, this is what exactly is all about.

248
00:13:32,000 --> 00:13:35,000
So here there are some important parameters you should remember.

249
00:13:35,000 --> 00:13:37,000
One is obviously the vocabulary size.

250
00:13:37,000 --> 00:13:40,000
So in this case I have taken 10,000.

251
00:13:40,000 --> 00:13:45,000
The second is the feature dimension that I really need to use.

252
00:13:45,000 --> 00:13:49,000
So feature dimension let's say I'm using over here as 300.
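
Note: these two parameters fix the size of the table the embedding layer holds. A quick sketch of the arithmetic, using the 10,000 and 300 from this example (the variable names are only illustrative):

```python
vocab_size = 10_000     # number of words in the vocabulary
embedding_dim = 300     # number of features per word

# The embedding table has one row per word and one column per feature.
table_shape = (vocab_size, embedding_dim)
num_values = vocab_size * embedding_dim

# Values needed to describe a single word under each scheme.
one_hot_values_per_word = vocab_size
embedding_values_per_word = embedding_dim

print(table_shape, num_values)
print(one_hot_values_per_word, embedding_values_per_word)
```

So each word goes into the RNN as 300 numbers instead of 10,000, at the cost of a table of 3,000,000 values that has to be trained or loaded.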

253
00:13:49,000 --> 00:13:55,000
So there is the word2vec technique from Google, and there is also a technique called GloVe.

254
00:13:55,000 --> 00:13:55,000
Right.

255
00:13:55,000 --> 00:13:58,000
Google's pretrained word2vec specifically has 300 dimensions.

256
00:13:58,000 --> 00:13:58,000
Okay.

257
00:13:58,000 --> 00:14:01,000
So these two are the important parameters.

258
00:14:01,000 --> 00:14:06,000
And you have to remember this, because in the next video, what I'm actually going to do is that

259
00:14:06,000 --> 00:14:14,000
I'm going to show you a practical implementation of how word embedding will work,

260
00:14:15,000 --> 00:14:17,000
how the word embedding layer will work.

261
00:14:19,000 --> 00:14:22,000
And this is what we are going to discuss in the next video.

262
00:14:22,000 --> 00:14:31,000
And there I will just take some examples and then convert them

263
00:14:31,000 --> 00:14:33,000
into vectors with the help of the word embedding layer.

264
00:14:33,000 --> 00:14:34,000
Okay.

265
00:14:34,000 --> 00:14:37,000
Then after that we will start our end to end project.

266
00:14:37,000 --> 00:14:37,000
Okay.

267
00:14:37,000 --> 00:14:39,000
So yes, this was it from my side.

268
00:14:39,000 --> 00:14:44,000
I hope you liked this particular video, and I hope you got an idea with respect to word embedding, or

269
00:14:44,000 --> 00:14:46,000
what exactly an embedding layer or feature representation is.

270
00:14:47,000 --> 00:14:50,000
I will see you all in the next video.

271
00:14:50,000 --> 00:14:51,000
Thank you.