1
00:00:00,000 --> 00:00:03,000
So guys, now we are going to continue the discussion with respect to word embedding.

2
00:00:03,000 --> 00:00:08,000
And now in this video we are going to see a practical implementation of how we can use word embedding

3
00:00:08,000 --> 00:00:11,000
and convert your words into vectors.

4
00:00:11,000 --> 00:00:16,000
Again before I go ahead please remember this vocabulary size and feature dimensions because we are also

5
00:00:16,000 --> 00:00:19,000
going to use the same thing in our code okay.

6
00:00:19,000 --> 00:00:25,000
So let's quickly go ahead and open my file.

7
00:00:25,000 --> 00:00:31,000
So here, you can see, in ANN classification I've created a folder over here.

8
00:00:31,000 --> 00:00:36,000
Similarly, in simple RNN, I will go ahead and create one file, which is called

9
00:00:36,000 --> 00:00:39,000
embedding.ipynb.

10
00:00:39,000 --> 00:00:39,000
Okay.

11
00:00:40,000 --> 00:00:47,000
Now in this embedding notebook I will first of all go ahead and select our kernel quickly and start our

12
00:00:47,000 --> 00:00:48,000
code over here.

13
00:00:48,000 --> 00:00:49,000
Right.

14
00:00:49,000 --> 00:00:52,000
So here you will be able to see that.

15
00:00:52,000 --> 00:00:55,000
First let me make some code cells.

16
00:00:55,000 --> 00:00:57,000
And this is where I'm going to write my code.

17
00:00:57,000 --> 00:00:58,000
Okay.

18
00:00:58,000 --> 00:01:01,000
Now here also we are going to use Keras with TensorFlow.

19
00:01:01,000 --> 00:01:05,000
So that is the reason we are going to work in the same venv environment, okay.

20
00:01:05,000 --> 00:01:15,000
So here I will go ahead and write from tensorflow.keras.preprocessing.text.

21
00:01:15,000 --> 00:01:21,000
First of all what I'm actually going to do is that I'm going to perform one hot representation for a

22
00:01:21,000 --> 00:01:21,000
specific word.

23
00:01:21,000 --> 00:01:22,000
Okay.

24
00:01:22,000 --> 00:01:25,000
So that is the reason I will go ahead and import

25
00:01:25,000 --> 00:01:26,000
one_hot, okay.

26
00:01:26,000 --> 00:01:30,000
I've already shown you this one hot representation in my previous video,

27
00:01:30,000 --> 00:01:32,000
how we specifically do it.

28
00:01:32,000 --> 00:01:38,000
Now to try this on some sentences, here is my list of sentences that I'm actually going to use.

29
00:01:38,000 --> 00:01:38,000
Okay.

30
00:01:38,000 --> 00:01:41,000
So I'll go ahead and execute this particular code with respect to one hot.

31
00:01:41,000 --> 00:01:43,000
And these are my sentences.

32
00:01:43,000 --> 00:01:48,000
The glass of milk, the glass of juice, the cup of tea, I am a good boy,

33
00:01:48,000 --> 00:01:49,000
I am a good developer,

34
00:01:49,000 --> 00:01:50,000
understand the meaning of words,

35
00:01:50,000 --> 00:01:51,000
your videos are good.

36
00:01:51,000 --> 00:01:56,000
Okay, so this is my set of sentences that I am going to specifically use.

37
00:01:56,000 --> 00:02:03,000
So if you go ahead and execute this, now if I go ahead and see my sentences, here are

38
00:02:03,000 --> 00:02:04,000
all my sentences.

39
00:02:04,000 --> 00:02:12,000
As I told you, we need to define the vocabulary size, and we are going to take

40
00:02:12,000 --> 00:02:17,000
each and every word and convert it into vectors.

41
00:02:17,000 --> 00:02:19,000
You'll be able to see that I'll be able to convert them into vectors.

42
00:02:19,000 --> 00:02:20,000
Okay.

43
00:02:20,000 --> 00:02:23,000
So here, the vocabulary size that I'm actually going to consider

44
00:02:23,000 --> 00:02:24,000
is 10,000, okay.

45
00:02:24,000 --> 00:02:27,000
Let's consider it as 10,000.

46
00:02:27,000 --> 00:02:35,000
Now the first thing that we are going to discuss about is one hot representation for every word.

47
00:02:35,000 --> 00:02:36,000
Okay.

48
00:02:36,000 --> 00:02:44,000
So I will just go ahead and write for word in sentence okay.

49
00:02:45,000 --> 00:02:48,000
So first of all I will just go ahead and write.

50
00:02:48,000 --> 00:02:53,000
I'll say for words in sentences because when I'm running through each and every sentence, I'll get

51
00:02:53,000 --> 00:02:54,000
each and every sentence itself.

52
00:02:54,000 --> 00:02:55,000
So I'm writing words.

53
00:02:55,000 --> 00:02:56,000
Okay.

54
00:02:56,000 --> 00:03:00,000
And for every words, that is, every sentence that I get,

55
00:03:00,000 --> 00:03:01,000
Okay.

56
00:03:01,000 --> 00:03:07,000
I will pass this specific sentence itself to one_hot.

57
00:03:07,000 --> 00:03:10,000
And here I'm just going to give two parameters.

58
00:03:10,000 --> 00:03:10,000
Okay.

59
00:03:10,000 --> 00:03:13,000
Please focus on this: I'm going to give two parameters.

60
00:03:13,000 --> 00:03:19,000
The first parameter will be my words, that is, the sentence I'm specifically giving,

61
00:03:19,000 --> 00:03:20,000
and for the second parameter,

62
00:03:20,000 --> 00:03:23,000
here we are going to give the vocabulary size.

63
00:03:23,000 --> 00:03:29,000
And this I will try to create as a list representation, or rather, a list comprehension.

64
00:03:29,000 --> 00:03:30,000
Sorry.

65
00:03:30,000 --> 00:03:30,000
Right.

66
00:03:30,000 --> 00:03:36,000
And this will basically give my one hot representation.
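
A minimal pure-Python sketch of the idea behind this step, not the Keras implementation itself: each word is hashed into an integer index between 1 and the vocabulary size minus 1. The helper `word_indices`, the use of MD5 as a stable hash, and the variable names `vocab_size`, `sent`, and `one_hot_repr` are assumptions made for illustration, so the indices it prints will not match the ones shown in the video.

```python
import hashlib


def word_indices(sentence, vocab_size):
    """Map each word of a sentence to an integer in [1, vocab_size - 1]."""
    indices = []
    for word in sentence.lower().split():
        # A stable hash, so the same word gets the same index on every run.
        digest = hashlib.md5(word.encode("utf-8")).hexdigest()
        indices.append(int(digest, 16) % (vocab_size - 1) + 1)
    return indices


vocab_size = 10000
sent = ["the glass of milk", "the glass of juice", "the cup of tea"]

# The same list comprehension shape described above: one index list per sentence.
one_hot_repr = [word_indices(words, vocab_size) for words in sent]
print(one_hot_repr)
```

Because the mapping depends only on the word, the first three indices of the first two sentences come out identical, which is the behaviour discussed later for "the glass of milk" and "the glass of juice".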

67
00:03:36,000 --> 00:03:43,000
Now once I write this code, let's go ahead and execute it. Okay, I will just go ahead and execute it.

68
00:03:43,000 --> 00:03:48,000
So here you can see the vocabulary size is not defined, because, okay, I did not execute that cell.

69
00:03:48,000 --> 00:03:50,000
Now it should get executed.

70
00:03:50,000 --> 00:03:54,000
Now if I go ahead and execute it, now see, this is very amazing, okay.

71
00:03:54,000 --> 00:04:02,000
If you remember what were my sentences, if I go ahead and write my sentence here, you can see my first

72
00:04:02,000 --> 00:04:04,000
sentence is the glass of milk.

73
00:04:04,000 --> 00:04:07,000
The second sentence is the glass of juice.

74
00:04:07,000 --> 00:04:09,000
Third sentence is the cup of tea.

75
00:04:10,000 --> 00:04:11,000
Here you have.

76
00:04:11,000 --> 00:04:12,000
I am a good boy.

77
00:04:12,000 --> 00:04:22,000
Now this "the glass of milk" has got converted into these vectors in my 10,000 vocabulary size.

78
00:04:22,000 --> 00:04:24,000
Okay, 10,000 vocabulary size.

79
00:04:24,000 --> 00:04:25,000
That basically means this:

80
00:04:25,000 --> 00:04:28,000
"the" is given by the index 6186.

81
00:04:28,000 --> 00:04:36,000
Right, "glass" is given by the index 6775, and "of" is basically given by the index 637.

82
00:04:36,000 --> 00:04:36,000
Right.

83
00:04:36,000 --> 00:04:40,000
And "milk" is basically given by 4895.

84
00:04:40,000 --> 00:04:41,000
What does this basically mean?

85
00:04:41,000 --> 00:04:49,000
If I expand this "of" into a vector of dimension 10,000, at that index I will get one; remaining

86
00:04:49,000 --> 00:04:50,000
Everything will be zero.
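
As a small illustration of what that expansion means, here is a sketch in plain Python. It assumes the vocabulary size of 10,000 and uses the index 637 quoted earlier for "of"; the helper name `expand_one_hot` is hypothetical.

```python
def expand_one_hot(index, vocab_size):
    """Return a list of vocab_size zeros with a single one at position index."""
    vector = [0] * vocab_size
    vector[index] = 1
    return vector


vec = expand_one_hot(637, 10000)
print(len(vec), sum(vec), vec[637])  # 10000 1 1
```

Storing only the index carries the same information as storing all 10,000 entries, which is why the index form is what gets passed on to the embedding layer.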

87
00:04:50,000 --> 00:04:52,000
Similarly for this particular word.

88
00:04:52,000 --> 00:04:54,000
Similarly for this particular word, this is what it is basically saying.

89
00:04:55,000 --> 00:05:00,000
It is just giving an idea, right, for every word, at which index it is going to be one, okay.

90
00:05:01,000 --> 00:05:06,000
Now here also you can actually see this is one hot representation I am specifically using.

91
00:05:06,000 --> 00:05:08,000
But we need to use this in our embedding layer.

92
00:05:08,000 --> 00:05:13,000
So we are not going to use that entire vector where only one entry is one and all the remaining are zeros.

93
00:05:13,000 --> 00:05:16,000
We will try to convert all our words based on this index.

94
00:05:16,000 --> 00:05:21,000
So every word that you will be seeing in the sentence is represented by one index, okay.

95
00:05:21,000 --> 00:05:23,000
Now once we do this okay.

96
00:05:23,000 --> 00:05:24,000
Now see, observe one thing.

97
00:05:24,000 --> 00:05:26,000
I have written the glass of milk.

98
00:05:26,000 --> 00:05:30,000
And also I have written the glass of juice okay.

99
00:05:30,000 --> 00:05:37,000
Now when we write the glass of milk and the glass of juice, okay, it is very much important to see:

100
00:05:37,000 --> 00:05:39,000
you will be able to see all these words are the same.

101
00:05:39,000 --> 00:05:41,000
All these vectors are the same.

102
00:05:41,000 --> 00:05:45,000
Only this particular vector has basically changed, right?

103
00:05:45,000 --> 00:05:47,000
Because over here I have milk and then I have juice.

104
00:05:47,000 --> 00:05:49,000
So that is the reason, and let's not say vector,

105
00:05:49,000 --> 00:05:52,000
but the index has got changed, right.

106
00:05:52,000 --> 00:05:56,000
This is a very important thing that I really want to explain to you all, okay.

107
00:05:56,000 --> 00:06:04,000
So please focus over here. Now in word embedding, based on these vectors, okay,

108
00:06:04,000 --> 00:06:07,000
based on these vectors, if I take all these vectors, okay,

109
00:06:07,000 --> 00:06:11,000
And if I start plotting this, let's say I apply a PCA.

110
00:06:12,000 --> 00:06:21,000
PCA basically means principal component analysis, and I convert these 300 dimensions into two dimensions.

111
00:06:21,000 --> 00:06:26,000
Let's say that I go ahead and do a dimensionality reduction okay.

112
00:06:26,000 --> 00:06:33,000
And if I then plot all these vectors, you'll be seeing, let's say, I have man over here, okay.

113
00:06:34,000 --> 00:06:36,000
Woman will be somewhere here.

114
00:06:37,000 --> 00:06:37,000
Okay.

115
00:06:37,000 --> 00:06:39,000
King will be somewhere here.

116
00:06:40,000 --> 00:06:42,000
Queen will be somewhere here.

117
00:06:42,000 --> 00:06:47,000
So here you can actually see, vectors like man and king, because they are almost similar kind of

118
00:06:47,000 --> 00:06:48,000
vectors,

119
00:06:48,000 --> 00:06:49,000
they will be near to each other.

120
00:06:49,000 --> 00:06:51,000
Woman and queen, they are similar kind of vectors.

121
00:06:51,000 --> 00:06:54,000
They will be very close to each other.

122
00:06:54,000 --> 00:06:58,000
Similarly, if I go ahead and see with respect to fruits like apple and mango, these all are fruits,

123
00:06:58,000 --> 00:06:59,000
right?

124
00:06:59,000 --> 00:07:02,000
If you go ahead and see both these vectors will be very much nearby.

125
00:07:02,000 --> 00:07:02,000
Right.

126
00:07:02,000 --> 00:07:04,000
So this is very much important.

127
00:07:04,000 --> 00:07:06,000
That is what it actually mentions.

128
00:07:06,000 --> 00:07:11,000
Whenever you have this particular vector, you will be able to find out which vector is near to which

129
00:07:11,000 --> 00:07:11,000
one.

130
00:07:11,000 --> 00:07:11,000
Right.

131
00:07:11,000 --> 00:07:17,000
And this is where a very important measure gets applied, which is called cosine similarity.
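
A short sketch of cosine similarity in plain Python. The two-dimensional vectors in `toy` are made up purely for illustration and are not real embeddings; the function and variable names are likewise assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up 2-D vectors, for illustration only.
toy = {"king": [0.9, 0.8], "man": [0.8, 0.9], "mango": [-0.7, 0.2]}

sim_king_man = cosine_similarity(toy["king"], toy["man"])
sim_king_mango = cosine_similarity(toy["king"], toy["mango"])
print(sim_king_man, sim_king_mango)
```

A value close to 1 means the two vectors point in nearly the same direction, so with these toy numbers "king" scores much higher against "man" than against "mango".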

132
00:07:18,000 --> 00:07:22,000
One of the amazing use cases that we will be seeing is something called as a recommendation system,

133
00:07:23,000 --> 00:07:23,000
right?

134
00:07:23,000 --> 00:07:27,000
A recommendation system uses this, based on these particular vectors.

135
00:07:27,000 --> 00:07:30,000
If I'm watching an action movie, I should be recommended another action movie.

136
00:07:30,000 --> 00:07:31,000
Okay.

137
00:07:31,000 --> 00:07:33,000
And that is what we are specifically doing now.

138
00:07:33,000 --> 00:07:37,000
I told you there are some problems with respect to one hot representation, so I will not directly use

139
00:07:37,000 --> 00:07:38,000
that.

140
00:07:38,000 --> 00:07:42,000
Instead, I will convert all my words into these particular indexes.

141
00:07:42,000 --> 00:07:42,000
Okay.

142
00:07:43,000 --> 00:07:47,000
And now I will take all these words sentence by sentence and I'll pass it to my embedding layer.

143
00:07:47,000 --> 00:07:51,000
And then we will try to convert this into a word embedding representation.

144
00:07:51,000 --> 00:07:55,000
So let's go ahead and write this word embedding representation.

145
00:07:55,000 --> 00:07:57,000
Very much important okay.

146
00:07:57,000 --> 00:08:02,000
So quickly let's go ahead and do this. Now in word embedding representation,

147
00:08:02,000 --> 00:08:05,000
as I said, just like how we use a dense layer,

148
00:08:05,000 --> 00:08:09,000
similarly there will be one more additional layer, which is called embedding.

149
00:08:09,000 --> 00:08:16,000
So for that we will go ahead and write from tensorflow.keras.layers,

150
00:08:16,000 --> 00:08:21,000
I'm just going to import Embedding, okay.

151
00:08:21,000 --> 00:08:29,000
Then we have from tensorflow.keras.preprocessing dot,

152
00:08:30,000 --> 00:08:38,000
Oops, okay, let me just go ahead and write from tensorflow.keras.

153
00:08:38,000 --> 00:08:44,000
I'm also going to import from one very important module, which is called .sequence.

154
00:08:44,000 --> 00:08:51,000
And we are going to import pad_sequences. I'll talk about why this is really important.

155
00:08:51,000 --> 00:08:52,000
Okay.

156
00:08:52,000 --> 00:08:53,000
pad_sequences.

157
00:08:54,000 --> 00:08:55,000
Okay I will talk about it.

158
00:08:55,000 --> 00:08:55,000
Just wait.

159
00:08:55,000 --> 00:08:58,000
But I have imported the embedding layer over here.

160
00:08:58,000 --> 00:09:01,000
Along with this, we know that we are going to create a sequential model.

161
00:09:01,000 --> 00:09:08,000
So for this I will go ahead and write from tensorflow.keras.models import Sequential.

162
00:09:10,000 --> 00:09:14,000
Okay, now I'm going to use these three important libraries.

163
00:09:14,000 --> 00:09:20,000
So I'm actually getting one error, saying that, hey, from tensorflow.keras.preprocessing.

164
00:09:20,000 --> 00:09:24,000
So now here you can see, guys, we are getting an error which says no module named tensorflow.keras

165
00:09:24,000 --> 00:09:25,000
.preprocessing.

166
00:09:26,000 --> 00:09:32,000
Now previously this .sequence module was present inside tensorflow.keras.preprocessing.

167
00:09:32,000 --> 00:09:35,000
But now, after seeing the documentation, it has changed.

168
00:09:35,000 --> 00:09:35,000
Okay.

169
00:09:35,000 --> 00:09:41,000
So I will go ahead and write from tensorflow.keras, okay,

170
00:09:42,000 --> 00:09:48,000
from tensorflow.keras.utils, okay.

171
00:09:48,000 --> 00:09:54,000
And here we are just going to go ahead and import pad_sequences, okay.

172
00:09:54,000 --> 00:09:57,000
So we'll just comment this out.

173
00:09:57,000 --> 00:09:58,000
Before, it was in this module.

174
00:09:58,000 --> 00:10:03,000
But again whenever you see this kind of error always refer to the documentation page okay.

175
00:10:03,000 --> 00:10:05,000
So now I will go ahead and execute it.

176
00:10:05,000 --> 00:10:07,000
This has got executed perfectly.

177
00:10:08,000 --> 00:10:13,000
Now the next step that I am actually going to do is that we will also go ahead and import

178
00:10:13,000 --> 00:10:17,000
one important library called numpy as np.

179
00:10:17,000 --> 00:10:17,000
Okay.

180
00:10:18,000 --> 00:10:24,000
Now, first of all, we'll understand what pad_sequences basically means.

181
00:10:24,000 --> 00:10:27,000
Now see observe each and every sentence.

182
00:10:27,000 --> 00:10:31,000
Every sentence has a different number of words.

183
00:10:31,000 --> 00:10:33,000
Let's say okay first sentence is having four words.

184
00:10:33,000 --> 00:10:35,000
Second sentence is having four words.

185
00:10:35,000 --> 00:10:36,000
This is also four words.

186
00:10:36,000 --> 00:10:38,000
But here you have five words.

187
00:10:38,000 --> 00:10:40,000
Here you have five words.

188
00:10:40,000 --> 00:10:41,000
And here also you have five words.

189
00:10:41,000 --> 00:10:43,000
And some of the sentences may also have six words.

190
00:10:43,000 --> 00:10:44,000
Seven words.

191
00:10:44,000 --> 00:10:44,000
Right.

192
00:10:45,000 --> 00:10:52,000
One very important thing is that we need to make all these sentences of equal size okay.

193
00:10:52,000 --> 00:10:59,000
Otherwise we will not be able to train it in our RNN, because at the end of the day, all the words that

194
00:10:59,000 --> 00:11:03,000
will be going in will be going for a fixed number of time steps based on the sentence size.

195
00:11:03,000 --> 00:11:03,000
Okay.

196
00:11:03,000 --> 00:11:05,000
So that is the reason.

197
00:11:05,000 --> 00:11:09,000
What we do is that we import this pad_sequences, and what are we going to do with this pad_sequences?

198
00:11:09,000 --> 00:11:12,000
We will go ahead and set up one maximum sentence length.

199
00:11:12,000 --> 00:11:18,000
So here let me just go ahead and write. Right now, based on our data set, I will just go ahead and

200
00:11:18,000 --> 00:11:20,000
set this sentence length to eight okay.

201
00:11:20,000 --> 00:11:23,000
Because I know the maximum number of words is five.

202
00:11:23,000 --> 00:11:29,000
So considering this, all the sentences in this should have a maximum of eight

203
00:11:29,000 --> 00:11:29,000
words.

204
00:11:29,000 --> 00:11:29,000
Okay.

205
00:11:29,000 --> 00:11:30,000
So I'm just keeping it.

206
00:11:30,000 --> 00:11:31,000
You can also keep it to ten.

207
00:11:31,000 --> 00:11:32,000
Okay.

208
00:11:32,000 --> 00:11:41,000
Now I will go ahead and create embedded_docs, my new docs where I apply this

209
00:11:41,000 --> 00:11:41,000
pad_sequences.

210
00:11:41,000 --> 00:11:49,000
Because if I want to make this sentence of eight words, I need to add three zeros more, right?

211
00:11:49,000 --> 00:11:50,000
So pad_sequences does just that.

212
00:11:50,000 --> 00:11:55,000
You can add three zeros before or you can add three zeros after.

213
00:11:55,000 --> 00:11:57,000
So this is in order to make all the sentences equal.

214
00:11:57,000 --> 00:12:00,000
So for this sentence also, I'll say four words are missing,

215
00:12:00,000 --> 00:12:05,000
so I'll go ahead and add either four zeros at the front or four zeros at the back.

216
00:12:05,000 --> 00:12:06,000
So that will actually become eight sentence.

217
00:12:06,000 --> 00:12:11,000
Similarly here also I'll try to make it as eight, sorry, not eight sentences, eight words.

218
00:12:11,000 --> 00:12:13,000
Similarly, I'll also try to make this as eight words.

219
00:12:13,000 --> 00:12:16,000
Similarly, I'll try to see over here what is the length?

220
00:12:16,000 --> 00:12:16,000
Five words.

221
00:12:16,000 --> 00:12:17,000
So three more zeros.

222
00:12:17,000 --> 00:12:20,000
If I add it you'll be able to see that okay.

223
00:12:20,000 --> 00:12:21,000
It will become eight words.

224
00:12:21,000 --> 00:12:24,000
So that is what pad_sequences basically does, okay.

225
00:12:24,000 --> 00:12:25,000
So I will go ahead and use this.

226
00:12:25,000 --> 00:12:28,000
And I'll use the one hot representation that I've used.

227
00:12:28,000 --> 00:12:30,000
I'll say hey go ahead and apply padding.

228
00:12:30,000 --> 00:12:33,000
And inside this padding I will say hey go ahead and do pre padding.

229
00:12:33,000 --> 00:12:38,000
Pre padding basically means, if I want to make the sentences of the same length,

230
00:12:38,000 --> 00:12:41,000
however many words are remaining out there,

231
00:12:41,000 --> 00:12:44,000
that many zeros will get added at the front, right?

232
00:12:44,000 --> 00:12:48,000
And the third parameter that we really need to give is something called maxlen.

233
00:12:48,000 --> 00:12:52,000
So inside this maxlen I will just go ahead and give my sentence length, okay.

234
00:12:53,000 --> 00:12:57,000
And now let me just go ahead and print my embedded docs.

235
00:12:57,000 --> 00:13:03,000
Now this will be lovely, because now you'll be able to see that all my sentences are of equal length.

236
00:13:03,000 --> 00:13:04,000
There are two ways.

237
00:13:04,000 --> 00:13:06,000
One is post, one is pre.

238
00:13:06,000 --> 00:13:09,000
So if I do post, I will be getting the zeros at the end.

239
00:13:09,000 --> 00:13:12,000
If I go ahead and write pre, I will be getting zeros at the start.

240
00:13:12,000 --> 00:13:13,000
Okay.
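
The pre and post options can be sketched in plain Python as below. This is an illustrative stand-in for the padding behaviour described above, not the library function; the helper name `pad_sequence` is hypothetical, `sentence_length` is an assumed variable name, and the indices are the ones quoted earlier for "the glass of milk".

```python
def pad_sequence(seq, maxlen, padding="pre"):
    """Pad a list of word indices with zeros up to maxlen (assumes maxlen >= 1)."""
    seq = list(seq)[-maxlen:]  # if too long, keep only the last maxlen entries
    zeros = [0] * (maxlen - len(seq))
    if padding == "pre":
        return zeros + seq
    if padding == "post":
        return seq + zeros
    raise ValueError("padding must be 'pre' or 'post'")


sentence_length = 8
encoded = [6186, 6775, 637, 4895]  # "the glass of milk"

pre_padded = pad_sequence(encoded, sentence_length, padding="pre")
post_padded = pad_sequence(encoded, sentence_length, padding="post")
print(pre_padded)   # [0, 0, 0, 0, 6186, 6775, 637, 4895]
print(post_padded)  # [6186, 6775, 637, 4895, 0, 0, 0, 0]
```

A four-word sentence gets four zeros and a five-word sentence gets three, so every row ends up with exactly eight entries.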

241
00:13:13,000 --> 00:13:16,000
So right now I will just go ahead and use this pre okay.

242
00:13:16,000 --> 00:13:18,000
So I've used my vocabulary size.

243
00:13:18,000 --> 00:13:26,000
I have converted all my words into a representation wherein I'm using pad_sequences along with that pre,

244
00:13:26,000 --> 00:13:31,000
right, and I've converted all the things based on my maximum sentence length.

245
00:13:31,000 --> 00:13:31,000
Okay.

246
00:13:31,000 --> 00:13:33,000
Now this is very much important.

247
00:13:33,000 --> 00:13:37,000
Now we will talk about feature representation.

248
00:13:38,000 --> 00:13:38,000
Okay.

249
00:13:38,000 --> 00:13:40,000
Feature representation.

250
00:13:40,000 --> 00:13:43,000
Now in this feature representation I'll say hey use the dimension ten.

251
00:13:43,000 --> 00:13:50,000
That basically means I want ten features, that is, ten dimensions, in the word embedding.

252
00:13:50,000 --> 00:13:50,000
Okay.

253
00:13:50,000 --> 00:13:56,000
Over there, in my theoretical intuition, I showed you an example with 300 dimensions.

254
00:13:56,000 --> 00:13:58,000
But here we are going to use ten dimensions, okay.

255
00:13:59,000 --> 00:14:00,000
Now I'll go ahead and execute this.

256
00:14:00,000 --> 00:14:04,000
Now let's go ahead and create this model, that is, the embedding layer.

257
00:14:04,000 --> 00:14:07,000
So I will go ahead and write model is equal to Sequential() first of all.

258
00:14:07,000 --> 00:14:11,000
And then we will go ahead and write model.add to add one embedding layer.

259
00:14:11,000 --> 00:14:17,000
How do I add my embedding layer? I'll just say Embedding. Now inside this embedding layer there are some

260
00:14:17,000 --> 00:14:19,000
parameters that we need to give.

261
00:14:19,000 --> 00:14:21,000
One is my vocabulary size.

262
00:14:21,000 --> 00:14:25,000
Second is my dimension, how many dimensions I want.

263
00:14:25,000 --> 00:14:30,000
And third is nothing but my input_length,

264
00:14:32,000 --> 00:14:36,000
which is equal to your sentence_length.

265
00:14:36,000 --> 00:14:39,000
This is just to give you the maximum length okay.

266
00:14:40,000 --> 00:14:45,000
And this is my model with respect to the embedding layer.

267
00:14:45,000 --> 00:14:50,000
Now along with this I will go ahead and say, hey, I want to probably train this entire thing. See,

268
00:14:50,000 --> 00:14:53,000
I'm showing how you can also train your entire embedding layer.

269
00:14:53,000 --> 00:14:54,000
Right.

270
00:14:54,000 --> 00:14:58,000
So I'll say, hey, model.compile, I'll use the optimizer which is called Adam.

271
00:14:58,000 --> 00:15:02,000
Along with this I will go ahead and use my loss function, which is nothing but MSE.

272
00:15:02,000 --> 00:15:04,000
Okay, done.

273
00:15:04,000 --> 00:15:09,000
Now if I go ahead and execute this, okay, so this has got executed.

274
00:15:09,000 --> 00:15:11,000
Now I'll go ahead and see my model summary.

275
00:15:11,000 --> 00:15:14,000
So this is my embedding model right.

276
00:15:14,000 --> 00:15:16,000
I have this many parameters.

277
00:15:16,000 --> 00:15:20,000
This many params are there because my vocabulary size is 10,000, okay.

278
00:15:20,000 --> 00:15:23,000
And then you have this output shape of eight comma ten, okay.
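
The parameter count follows from simple arithmetic: an embedding layer stores one vector of ten numbers for each of the 10,000 vocabulary indices. A quick check in Python, with variable names that are assumptions rather than the notebook's own:

```python
vocab_size = 10000      # number of rows in the embedding table
dim = 10                # features per word
sentence_length = 8     # words per padded sentence

embedding_params = vocab_size * dim
output_shape_per_sentence = (sentence_length, dim)

print(embedding_params)            # 100000
print(output_shape_per_sentence)   # (8, 10)
```

So each padded sentence of eight indices comes out as eight vectors of ten numbers each.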

279
00:15:23,000 --> 00:15:24,000
So this is perfect.

280
00:15:24,000 --> 00:15:25,000
Now this is my model.

281
00:15:25,000 --> 00:15:28,000
Now I will just go ahead and use this model.

282
00:15:28,000 --> 00:15:30,000
I've not trained it.

283
00:15:30,000 --> 00:15:31,000
See, I've used this embedding layer.

284
00:15:31,000 --> 00:15:36,000
So Embedding is already a class over here.

285
00:15:36,000 --> 00:15:41,000
Now if I pass any words, right, you'll be able to see how they get converted into vectors.

286
00:15:41,000 --> 00:15:45,000
So now what I will do I will take this embedding layer.

287
00:15:45,000 --> 00:15:47,000
I will go ahead and write model.predict.

288
00:15:48,000 --> 00:15:52,000
And let's go ahead and give my embedded_docs, all the embedded docs.

289
00:15:52,000 --> 00:15:54,000
And this is what is very amazing.

290
00:15:54,000 --> 00:15:58,000
Every word will be represented by how many dimensions? Ten.

291
00:15:58,000 --> 00:16:05,000
1, 2, 3, 4, 5, 6, 7, 8, 9,

292
00:16:05,000 --> 00:16:08,000
ten. See, right, ten dimensions.

293
00:16:08,000 --> 00:16:09,000
Right.

294
00:16:09,000 --> 00:16:12,000
And if I just get the first sentence, okay.

295
00:16:12,000 --> 00:16:13,000
So my first sentence.

296
00:16:13,000 --> 00:16:17,000
What was my first sentence if I probably see over here.

297
00:16:17,000 --> 00:16:17,000
Right.

298
00:16:17,000 --> 00:16:21,000
So let's go ahead and write it in this way, then you'll be able to understand.

299
00:16:23,000 --> 00:16:26,000
I'll go ahead and write embedded_docs of zero.

300
00:16:26,000 --> 00:16:28,000
So this is my first sentence okay.

301
00:16:29,000 --> 00:16:34,000
Now I will just go ahead and predict for the first sentence what my embedding layer will basically be

302
00:16:34,000 --> 00:16:36,000
giving me as a vector representation.

303
00:16:36,000 --> 00:16:42,000
Obviously it needs to be ten dimensions, so embedded_docs of zero, okay.

304
00:16:42,000 --> 00:16:44,000
Zero of zero.

305
00:16:44,000 --> 00:16:45,000
Perfect.

306
00:16:47,000 --> 00:16:47,000
Okay.

307
00:16:47,000 --> 00:16:52,000
Zero of zero is not working, because what I'm getting is an array.

308
00:16:53,000 --> 00:16:58,000
Then this is my first sentence.

309
00:16:58,000 --> 00:17:03,000
So what you can see over here, this entire sentence has got converted to this.

310
00:17:04,000 --> 00:17:10,000
Okay, then my second sentence, where only the last word is changing, has got converted to this.

311
00:17:10,000 --> 00:17:14,000
So that basically means this zero has got converted to this.

312
00:17:14,000 --> 00:17:18,000
This zero has got converted to this, and this zero is converted to this particular vector.

313
00:17:18,000 --> 00:17:22,000
This zero is converted into this vector, and this word index is converted into this vector.

314
00:17:22,000 --> 00:17:25,000
So here you can actually see all these vectors right.

315
00:17:25,000 --> 00:17:29,000
And that is how your entire embedding layer works, okay.
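
The lookup itself can be sketched without any framework: an embedding layer behaves like a table with one row per vocabulary index, and each index in a padded sentence selects its row. In the sketch below the table is filled with small random numbers to stand in for untrained weights. The helper names, the seed, and the index 1234 used for "juice" are assumptions for illustration; the other indices are the ones quoted earlier.

```python
import random


def make_embedding_table(vocab_size, dim, seed=0):
    """Build a vocab_size x dim table of small random numbers (untrained weights)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(dim)]
            for _ in range(vocab_size)]


def embed(padded_indices, table):
    """Replace every index in a padded sentence by its row in the table."""
    return [table[i] for i in padded_indices]


table = make_embedding_table(vocab_size=10000, dim=10)

milk = embed([0, 0, 0, 0, 6186, 6775, 637, 4895], table)   # "the glass of milk"
juice = embed([0, 0, 0, 0, 6186, 6775, 637, 1234], table)  # 1234 is a made-up index

print(len(milk), len(milk[0]))  # 8 10
```

Every padding zero maps to the same row, and the two sentences differ only in their last vector, which matches what the output above shows for the first two sentences.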

316
00:17:29,000 --> 00:17:30,000
Please see this.

317
00:17:30,000 --> 00:17:32,000
We have not trained anything yet.

318
00:17:32,000 --> 00:17:35,000
I've just used the embedding layer, and internally

319
00:17:35,000 --> 00:17:37,000
it starts from randomly initialized weights, not a pretrained word2vec model.

320
00:17:37,000 --> 00:17:39,000
It takes the vocabulary size as input.

321
00:17:39,000 --> 00:17:40,000
Then it takes the dimension.

322
00:17:40,000 --> 00:17:42,000
Then it takes the input length.

323
00:17:42,000 --> 00:17:47,000
Now this is something really important because in my next video I'm going to start our end to end project.

324
00:17:47,000 --> 00:17:50,000
And I'm going to take a much bigger data set.

325
00:17:50,000 --> 00:17:53,000
And this embedding layer will be a part of that simple RNN.

326
00:17:53,000 --> 00:17:54,000
right?

327
00:17:54,000 --> 00:17:58,000
But I hope you got an idea with respect to working with the embedding layer.

328
00:17:58,000 --> 00:18:01,000
So yes, this was it.

329
00:18:01,000 --> 00:18:07,000
You can proceed further, and you can go ahead and practice with different texts.

330
00:18:07,000 --> 00:18:09,000
You can go ahead and write more text over here.

331
00:18:10,000 --> 00:18:14,000
And because all these steps will be followed over there, you can add any number of sentences and you can

332
00:18:14,000 --> 00:18:15,000
verify.

333
00:18:15,000 --> 00:18:15,000
Right.

334
00:18:15,000 --> 00:18:17,000
So yes, this was it.

335
00:18:17,000 --> 00:18:18,000
I will see you all in the next video.

336
00:18:18,000 --> 00:18:18,000
Thank you.