1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:03,000
So we are going to continue our discussion of Word2Vec.

3
00:00:03,000 --> 00:00:06,000
We already know that Word2Vec is basically of two types.

4
00:00:06,000 --> 00:00:13,000
One is CBOW (continuous bag of words), which we have already seen previously, and the other is skip-gram. In this video,

5
00:00:13,000 --> 00:00:17,000
What we are going to do is understand how a Word2Vec model is basically

6
00:00:17,000 --> 00:00:18,000
created.

7
00:00:18,000 --> 00:00:22,000
What is the deep learning model that we are specifically talking about?

8
00:00:22,000 --> 00:00:26,000
What do the inputs and outputs look like, and how is the model basically trained?

9
00:00:26,000 --> 00:00:32,000
One important thing is that you really need prerequisite knowledge of ANNs, loss functions

10
00:00:32,000 --> 00:00:34,000
and optimizers.

11
00:00:34,000 --> 00:00:37,000
So if you do not have this, I would suggest.

12
00:00:37,000 --> 00:00:42,000
First of all, please make sure that you have some knowledge of these topics before you try to understand

13
00:00:42,000 --> 00:00:42,000
this.

14
00:00:42,000 --> 00:00:48,000
Now, one more important thing about Word2Vec is that we also have pre-trained

15
00:00:48,000 --> 00:00:49,000
models.

16
00:00:49,000 --> 00:00:53,000
Now, if I talk about a pre-trained model, take the one from Google, right?

17
00:00:53,000 --> 00:00:59,000
Google has a pre-trained Word2Vec model, which is trained on about 100 billion words of Google News text and covers about 3 million words and phrases.

18
00:00:59,000 --> 00:01:02,000
And we can also train a model from scratch.

19
00:01:02,000 --> 00:01:05,000
Train a model from scratch okay.

20
00:01:05,000 --> 00:01:10,000
And again, the reason I'm covering this is that you really need to understand how that

21
00:01:10,000 --> 00:01:12,000
feature representation is basically getting created.

22
00:01:12,000 --> 00:01:13,000
Okay.

23
00:01:13,000 --> 00:01:20,000
Now let me go ahead and take a simple corpus, and let's say that

24
00:01:20,000 --> 00:01:22,000
first of all, we'll start with CBOW.

25
00:01:22,000 --> 00:01:22,000
Okay.

26
00:01:22,000 --> 00:01:28,000
So we are going to discuss CBOW, which is nothing but continuous bag of words, and how this model

27
00:01:28,000 --> 00:01:29,000
is basically created.

28
00:01:29,000 --> 00:01:32,000
It is a type of Word2Vec, right?

29
00:01:32,000 --> 00:01:34,000
Continuous bag of words.

30
00:01:34,000 --> 00:01:38,000
Now to solve any problem I will definitely have a data set.

31
00:01:38,000 --> 00:01:40,000
So let's say that this is my corpus.

32
00:01:40,000 --> 00:01:46,000
And remember the corpus, because all these models, like Word2Vec, are trained on a huge data

33
00:01:46,000 --> 00:01:51,000
set, a huge data set. For example, this pre-trained model from Google is trained on about 100 billion

34
00:01:51,000 --> 00:01:52,000
words.

35
00:01:52,000 --> 00:01:57,000
So let's say I have a corpus or a statement or a data set or a paragraph.

36
00:01:57,000 --> 00:01:58,000
It can be anything.

37
00:01:58,000 --> 00:02:03,000
And for just making you understand, I'm just going to take a simple paragraph.

38
00:02:03,000 --> 00:02:05,000
I'm going to say, okay: iNeuron.

39
00:02:07,000 --> 00:02:15,000
iNeuron company, iNeuron company

40
00:02:17,000 --> 00:02:19,000
is related to

41
00:02:22,000 --> 00:02:23,000
data science.

42
00:02:23,000 --> 00:02:25,000
Let's say I have this particular corpus.

43
00:02:26,000 --> 00:02:28,000
Now remember like this.

44
00:02:28,000 --> 00:02:31,000
In a real use case you will be having a much bigger corpus.

45
00:02:31,000 --> 00:02:33,000
It can have millions of words, right?

46
00:02:33,000 --> 00:02:36,000
But let's say that I'm taking just a simple corpus over here.

47
00:02:36,000 --> 00:02:38,000
Just a single liner.

48
00:02:38,000 --> 00:02:43,000
Now you're going to understand how a CBOW Word2Vec model is basically created, and how the model is

49
00:02:43,000 --> 00:02:45,000
basically trained with the help of deep learning.

50
00:02:46,000 --> 00:02:51,000
Now, first thing is that whenever we have a corpus, we really need to know what is our input data

51
00:02:51,000 --> 00:02:52,000
and what is our output data.

52
00:02:52,000 --> 00:02:55,000
Because Word2Vec is trained like a supervised machine learning problem, with the labels taken from the text itself, right?

53
00:02:56,000 --> 00:03:00,000
So first of all, what we do is that we select a window size.

54
00:03:01,000 --> 00:03:04,000
And I'll talk about this window size and why it is super important.

55
00:03:04,000 --> 00:03:06,000
Let's say that I'm going to select a window size of five.

56
00:03:07,000 --> 00:03:14,000
Now this window size is super important to basically create your input data and output data okay.

57
00:03:14,000 --> 00:03:15,000
Super super important.

58
00:03:15,000 --> 00:03:19,000
Like what should be your input data and what should be your output data so that you can train your model.

59
00:03:19,000 --> 00:03:24,000
Now, this window size of five indicates how many words I need to select at a time.

60
00:03:24,000 --> 00:03:26,000
So let's say I am selecting five words.

61
00:03:26,000 --> 00:03:28,000
So here are my five words.

62
00:03:29,000 --> 00:03:33,000
Now from these five words I will take the center word.

63
00:03:34,000 --> 00:03:35,000
Center word.

64
00:03:35,000 --> 00:03:41,000
Now understand how I will take this window of five words and convert it into input and output data.

65
00:03:41,000 --> 00:03:46,000
So let's say this is my input data and this is my output data okay.

66
00:03:46,000 --> 00:03:53,000
Now the center word that I've actually taken over here is "is". So in the input I will be having

67
00:03:53,000 --> 00:03:56,000
"iNeuron", "company".

68
00:03:58,000 --> 00:04:03,000
And then on the right-hand side I have "related" and I have "to", okay?

69
00:04:03,000 --> 00:04:08,000
Now you may be wondering why I'm taking both the forward and the backward words. Understand:

70
00:04:09,000 --> 00:04:14,000
I'm taking this as the center word, and this will basically be my output word, "is", okay?

71
00:04:14,000 --> 00:04:16,000
This "is" is basically my output word.

72
00:04:16,000 --> 00:04:22,000
Now the model should know which words are in the forward context and which words are in the

73
00:04:22,000 --> 00:04:22,000
backward context.

74
00:04:22,000 --> 00:04:25,000
So that is the reason we are creating in this particular way.

75
00:04:25,000 --> 00:04:30,000
So that this output, "is", knows about its forward words and its backward words.

76
00:04:30,000 --> 00:04:34,000
Just to get some idea about the context of that specific sentence.

77
00:04:34,000 --> 00:04:35,000
Now this is the first step.

78
00:04:35,000 --> 00:04:40,000
Now when I took a window size of five, the initial five words, I divided my data set into input

79
00:04:40,000 --> 00:04:41,000
and output.

80
00:04:41,000 --> 00:04:41,000
Perfect.

81
00:04:41,000 --> 00:04:47,000
Now the next step is that I will go ahead and I'll move this window by one step and take the next five

82
00:04:47,000 --> 00:04:47,000
words.

83
00:04:47,000 --> 00:04:50,000
So the next five words over here is nothing but.

84
00:04:50,000 --> 00:04:52,000
So here this is my sentence one.

85
00:04:53,000 --> 00:04:53,000
So sorry.

86
00:04:53,000 --> 00:04:54,000
Input one.

87
00:04:54,000 --> 00:04:55,000
And now this will become my input two.

88
00:04:55,000 --> 00:04:57,000
And here I will be having company.

89
00:04:58,000 --> 00:04:59,000
Okay.

90
00:04:59,000 --> 00:05:02,000
And again, from all these five words, which is my center word?

91
00:05:02,000 --> 00:05:04,000
So this is basically my center word.

92
00:05:04,000 --> 00:05:04,000
Related.

93
00:05:05,000 --> 00:05:05,000
Right.

94
00:05:05,000 --> 00:05:06,000
So related.

95
00:05:06,000 --> 00:05:11,000
So here I'll write "company", "is", "to" and "data".

96
00:05:11,000 --> 00:05:12,000
So this is my second input.

97
00:05:12,000 --> 00:05:15,000
And over here the output will basically be related.

98
00:05:16,000 --> 00:05:16,000
Right.

99
00:05:16,000 --> 00:05:17,000
Then.

100
00:05:17,000 --> 00:05:21,000
Similarly, I will go to the next step and push this window one step further.

101
00:05:22,000 --> 00:05:22,000
Okay.

102
00:05:22,000 --> 00:05:28,000
So here now I'll be having my third sentence with respect to my input and output.

103
00:05:28,000 --> 00:05:30,000
And in the third sentence, again, which will be the center word?

104
00:05:30,000 --> 00:05:32,000
So this will basically be my central word.

105
00:05:32,000 --> 00:05:34,000
So here you have "to".

106
00:05:35,000 --> 00:05:35,000
Okay.

107
00:05:35,000 --> 00:05:41,000
And here I'm going to have "is", "related".

108
00:05:42,000 --> 00:05:43,000
Okay.

109
00:05:43,000 --> 00:05:45,000
And then the right hand side.

110
00:05:45,000 --> 00:05:47,000
"data" and "science".

111
00:05:47,000 --> 00:05:50,000
Now you may be thinking, Krish, should we only take the window size as five?

112
00:05:51,000 --> 00:05:51,000
No.

113
00:05:51,000 --> 00:05:52,000
You can take any value.

114
00:05:52,000 --> 00:05:53,000
You can take any value.

115
00:05:53,000 --> 00:05:53,000
And why?

116
00:05:53,000 --> 00:05:55,000
This window size is playing an important role.

117
00:05:55,000 --> 00:05:58,000
I'll explain in some time. You can take up any value.

118
00:05:58,000 --> 00:05:59,000
But don't take an even number.

119
00:05:59,000 --> 00:06:04,000
Take an odd number so that the center word, which I'm taking as the output, will

120
00:06:04,000 --> 00:06:08,000
have the same number of words in the forward context and in the backward context.

121
00:06:08,000 --> 00:06:13,000
Okay, so "is", "related", "data", "science" are there. Now which is the center word over here?

122
00:06:13,000 --> 00:06:14,000
"to".

123
00:06:14,000 --> 00:06:14,000
Right.

124
00:06:14,000 --> 00:06:15,000
So I'm just going to write it as "to".

125
00:06:15,000 --> 00:06:18,000
Now this became my input and output.
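The sliding-window step described so far can be sketched in a few lines of Python. This is only an illustration under the video's own setup (window size five, center word as the output); the function name cbow_pairs is made up for this example and does not come from any library.

```python
# Illustrative sketch: build CBOW (context, target) pairs with a sliding window.
corpus = "iNeuron company is related to data science"
tokens = corpus.split()

def cbow_pairs(tokens, window_size=5):
    """Slide a window over the tokens; the center word is the output."""
    half = window_size // 2
    pairs = []
    for start in range(len(tokens) - window_size + 1):
        window = tokens[start:start + window_size]
        target = window[half]                        # center word
        context = window[:half] + window[half + 1:]  # words before and after it
        pairs.append((context, target))
    return pairs

pairs = cbow_pairs(tokens)
for context, target in pairs:
    print(context, "->", target)
```

With the seven-word corpus this yields three training examples, with "is", "related" and "to" as the outputs. An odd window size keeps the same number of context words on each side of the center word.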

126
00:06:18,000 --> 00:06:21,000
Now what I'm actually going to do is that I'm going to train my model with this.

127
00:06:21,000 --> 00:06:22,000
Very simple right.

128
00:06:22,000 --> 00:06:24,000
I'm going to train my model with this.

129
00:06:24,000 --> 00:06:26,000
Now, how will the training basically happen?

130
00:06:26,000 --> 00:06:31,000
Now, one very important thing that you need to understand over here: you'll be seeing "iNeuron",

131
00:06:31,000 --> 00:06:33,000
"company", "related", "to" in all these inputs and outputs.

132
00:06:33,000 --> 00:06:36,000
So I cannot send this text directly.

133
00:06:36,000 --> 00:06:42,000
I need to convert this into some vectors initially to send it in as an input to the neural network also.

134
00:06:42,000 --> 00:06:46,000
So for this, what am I going to do? First of all, you know how many words

135
00:06:46,000 --> 00:06:51,000
I have in the vocabulary: I have "iNeuron", I have "company", I have "is",

136
00:06:51,000 --> 00:06:52,000
"related", "to", "data", "science".

137
00:06:52,000 --> 00:06:53,000
Right.

138
00:06:53,000 --> 00:06:56,000
So there are 1, 2, 3, 4, 5, 6, 7.

139
00:06:56,000 --> 00:06:57,000
Right.

140
00:06:57,000 --> 00:06:58,000
Seven words are there.

141
00:06:58,000 --> 00:07:02,000
Now suppose I use the one-hot encoding technique.

142
00:07:03,000 --> 00:07:04,000
Now see this okay.

143
00:07:04,000 --> 00:07:06,000
This is super important. In the one-hot encoding technique,

144
00:07:06,000 --> 00:07:13,000
let's consider the first sentence over here. I have "iNeuron", and then

145
00:07:13,000 --> 00:07:15,000
I have company.

146
00:07:15,000 --> 00:07:18,000
Then I have related.

147
00:07:18,000 --> 00:07:20,000
And then I have "to".

148
00:07:20,000 --> 00:07:21,000
Right?

149
00:07:21,000 --> 00:07:25,000
So for all these words, how will I be giving the one-hot encoding?

150
00:07:25,000 --> 00:07:28,000
In the one-hot encoding representation, wherever "iNeuron" is,

151
00:07:28,000 --> 00:07:31,000
I'm just going to make it one, and the remaining

152
00:07:31,000 --> 00:07:31,000
will all be zeros.

153
00:07:31,000 --> 00:07:36,000
So there will be 1, 2, 3, 4, 5, 6 zeros.

154
00:07:36,000 --> 00:07:41,000
Similarly, when "company" is there, I'll make its position one and the remaining will all be zeros, right?

155
00:07:41,000 --> 00:07:43,000
Similarly "related": "related" is the fourth word.

156
00:07:43,000 --> 00:07:47,000
So I'm going to make it 0001000.

157
00:07:47,000 --> 00:07:48,000
And then "to" is present after this.

158
00:07:48,000 --> 00:07:52,000
So I'm basically going to write it as 0000100.

159
00:07:52,000 --> 00:07:53,000
Right.

160
00:07:53,000 --> 00:07:55,000
So this is this is pretty much clear till here.

161
00:07:55,000 --> 00:07:55,000
Right.

162
00:07:55,000 --> 00:08:00,000
So here you can see that I have basically put this in a simple one-hot encoded format.

163
00:08:00,000 --> 00:08:08,000
That basically means if I really want to pass "iNeuron", I need to give this as my vector.

164
00:08:08,000 --> 00:08:09,000
This is what is the understanding.

165
00:08:09,000 --> 00:08:12,000
And this vector is basically given by seven dimensions.

166
00:08:12,000 --> 00:08:16,000
So seven values I am giving over here, right: 1000000.

167
00:08:16,000 --> 00:08:19,000
If I'm sending "company" as my next word, then this should be the vector

168
00:08:19,000 --> 00:08:21,000
that goes in, right? Then,

169
00:08:21,000 --> 00:08:23,000
if I'm sending "related", this should be the vector that goes in.

170
00:08:23,000 --> 00:08:28,000
So similarly, all these words will be converted into vectors like this using one-hot

171
00:08:28,000 --> 00:08:28,000
encoding.
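As a small illustration of the one-hot step, here is a sketch in plain Python. The vocabulary order follows the sentence itself, as in the video; the names word_to_index and one_hot are chosen for this example only.

```python
# Illustrative sketch: one-hot vectors over the seven-word vocabulary.
vocab = ["iNeuron", "company", "is", "related", "to", "data", "science"]
word_to_index = {word: i for i, word in enumerate(vocab)}

def one_hot(word):
    """Return a list of len(vocab) zeros with a 1 at the word's position."""
    vec = [0] * len(vocab)
    vec[word_to_index[word]] = 1
    return vec

for word in ["iNeuron", "company", "related", "to"]:
    print(word, one_hot(word))
```

Every word becomes a vector with as many entries as the vocabulary has words, which is why each input here has seven dimensions.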

172
00:08:28,000 --> 00:08:31,000
Now let's go to the next step which is super super important okay.

173
00:08:31,000 --> 00:08:33,000
Super super important.

174
00:08:33,000 --> 00:08:35,000
What does CBOW basically mean?

175
00:08:35,000 --> 00:08:41,000
Continuous bag of words, okay? This is nothing but a fully connected neural network.

176
00:08:42,000 --> 00:08:46,000
Now you will be able to understand how these models are created.

177
00:08:46,000 --> 00:08:48,000
Fully connected neural network okay.

178
00:08:48,000 --> 00:08:53,000
Now in this fully connected neural network you will be able to see one very, very important thing.

179
00:08:53,000 --> 00:09:01,000
First of all, just understand how many inputs I should be giving, right?

180
00:09:01,000 --> 00:09:04,000
Like how many words I will be giving as my input, right?

181
00:09:04,000 --> 00:09:07,000
Since my window size is always the same, the window size is five.

182
00:09:07,000 --> 00:09:09,000
All my inputs are fixed.

183
00:09:09,000 --> 00:09:11,000
I hope that is very much clear right now.

184
00:09:11,000 --> 00:09:15,000
In this particular problem statement you'll be seeing, I'm giving four words in every sentence, so

185
00:09:15,000 --> 00:09:17,000
my input is basically fixed.

186
00:09:17,000 --> 00:09:23,000
Now, let's say in the first case I give my sentence one, and this

187
00:09:23,000 --> 00:09:24,000
is my sentence one.

188
00:09:24,000 --> 00:09:28,000
When I give "iNeuron", "iNeuron" is represented by a vector of seven values.

189
00:09:28,000 --> 00:09:30,000
Over here it is represented by this vector.

190
00:09:30,000 --> 00:09:33,000
Then company is basically represented by this vector.

191
00:09:33,000 --> 00:09:39,000
So if I look at the fully connected network, my first input layer will basically be nothing

192
00:09:39,000 --> 00:09:40,000
but this.

193
00:09:40,000 --> 00:09:43,000
So here you will be able to see this will be my input okay.

194
00:09:43,000 --> 00:09:44,000
And this is super important guys.

195
00:09:44,000 --> 00:09:45,000
See this.

196
00:09:45,000 --> 00:09:46,000
So this will be my input.

197
00:09:47,000 --> 00:09:49,000
My first input word okay.

198
00:09:49,000 --> 00:09:55,000
And understand, I'm drawing these circles as my inputs, okay?

199
00:09:55,000 --> 00:09:58,000
I'm just creating this circle as my input.

200
00:09:58,000 --> 00:10:05,000
So if you see, 1, 2, 3, 4, 5, 6, 7: seven inputs I'm giving right now.

201
00:10:05,000 --> 00:10:09,000
When I give these seven inputs, then similarly how many words will be going in?

202
00:10:09,000 --> 00:10:10,000
Four words will be going.

203
00:10:10,000 --> 00:10:10,000
Right.

204
00:10:10,000 --> 00:10:12,000
So one word two word.

205
00:10:13,000 --> 00:10:15,000
So this is basically my input layer.

206
00:10:15,000 --> 00:10:21,000
So this layer is nothing but my input layer in a fully connected network, a simple ANN, if I consider

207
00:10:21,000 --> 00:10:22,000
an example of an ANN.

208
00:10:22,000 --> 00:10:25,000
So here also, how many inputs will I be having?

209
00:10:25,000 --> 00:10:28,000
1, 2, 3, 4, 5, 6, 7.

210
00:10:28,000 --> 00:10:28,000
Right.

211
00:10:28,000 --> 00:10:32,000
And this is my first word, second word, third word, fourth word.

212
00:10:32,000 --> 00:10:36,000
So I will be having four different words over here that will be going.

213
00:10:36,000 --> 00:10:43,000
And each word will be having a vector of seven dimensions: 1, 2, 3, 4, 5, 6, 7.

214
00:10:43,000 --> 00:10:44,000
Right.

215
00:10:44,000 --> 00:10:45,000
I'm not giving this value.

216
00:10:45,000 --> 00:10:48,000
Don't consider that these all are zeros okay?

217
00:10:48,000 --> 00:10:49,000
I'm just saying that these are my input layer.

218
00:10:49,000 --> 00:10:51,000
Input layer input circles okay.

219
00:10:51,000 --> 00:10:53,000
That is how we draw it in an ANN, right?

220
00:10:53,000 --> 00:10:55,000
And then probably I have my last one.

221
00:10:56,000 --> 00:10:58,000
So this is my input, right?

222
00:10:58,000 --> 00:11:02,000
I'm basically designing the neural network, how it will look when we are training a Word2Vec model.

223
00:11:02,000 --> 00:11:04,000
So these are my four words.

224
00:11:04,000 --> 00:11:05,000
Understand.

225
00:11:05,000 --> 00:11:08,000
In the first case I'm going to pass "iNeuron" over here.

226
00:11:08,000 --> 00:11:10,000
So let's say I'm going to pass "iNeuron" over here.

227
00:11:10,000 --> 00:11:17,000
This will be my input over here. "iNeuron" will be represented by 1000000.

228
00:11:17,000 --> 00:11:17,000
Right.

229
00:11:17,000 --> 00:11:22,000
And similarly if I go to the second word that is like company then it will be represented by another

230
00:11:22,000 --> 00:11:23,000
vector like this.

231
00:11:23,000 --> 00:11:30,000
Like it will be represented by a different vector like 0100000.

232
00:11:30,000 --> 00:11:30,000
Right.

233
00:11:30,000 --> 00:11:34,000
Seven values it will be: 1, 2, 3, 4, 5, 6, 7, right? Now,

234
00:11:34,000 --> 00:11:37,000
Similarly other words will be represented like this okay.

235
00:11:37,000 --> 00:11:39,000
So this becomes my input layer okay.

236
00:11:39,000 --> 00:11:44,000
And every input is basically given by a vector of seven dimensions.

237
00:11:44,000 --> 00:11:48,000
That is because I am representing every word based on the vocabulary size using one-hot encoding.

238
00:11:48,000 --> 00:11:50,000
Now this becomes my input layer.

239
00:11:50,000 --> 00:11:53,000
Now let's go to the middle layer that is called as the hidden layer.

240
00:11:53,000 --> 00:11:56,000
Now in this hidden layer this is super important.

241
00:11:56,000 --> 00:11:59,000
Just pause the video and guess what will be the size?

242
00:11:59,000 --> 00:12:04,000
You know how much our window size is: our window size is basically five, right?

243
00:12:05,000 --> 00:12:08,000
So I'm just going to give the hidden layer that same size.

244
00:12:08,000 --> 00:12:08,000
Okay.

245
00:12:08,000 --> 00:12:10,000
So this is my hidden layer size.

246
00:12:10,000 --> 00:12:15,000
Now, if you remember, how many words are we having in our window?

247
00:12:15,000 --> 00:12:17,000
Our window size is nothing but five.

248
00:12:17,000 --> 00:12:21,000
So I will be having 1, 2, 3, 4, 5.

249
00:12:21,000 --> 00:12:21,000
Right.

250
00:12:21,000 --> 00:12:23,000
Window size is basically five.

251
00:12:23,000 --> 00:12:26,000
So in my hidden layer I'll be having these five neurons, okay?

252
00:12:26,000 --> 00:12:31,000
Just understand that this five is the hidden layer size, a separate choice from the window size; here I have simply set both to five, okay?

253
00:12:31,000 --> 00:12:37,000
Now, with respect to the output, how many values do I have? I just have one value;

254
00:12:37,000 --> 00:12:39,000
I just have one word in the output.

255
00:12:39,000 --> 00:12:39,000
Right.

256
00:12:39,000 --> 00:12:42,000
And each word is represented by a vector of seven.

257
00:12:42,000 --> 00:12:47,000
Because if I also represent this using one-hot encoding, I'm going to get a vector of this

258
00:12:47,000 --> 00:12:48,000
dimension that is seven.

259
00:12:48,000 --> 00:12:49,000
Right.

260
00:12:49,000 --> 00:12:58,000
So what I will be doing in output, I will basically be having another output layer like this which

261
00:12:58,000 --> 00:13:05,000
will again be having seven different outputs: 1, 2, 3, 4, 5, 6, 7.

262
00:13:05,000 --> 00:13:05,000
Okay.

263
00:13:05,000 --> 00:13:09,000
Now this is how my fully connected neural net will look like.

264
00:13:09,000 --> 00:13:10,000
Neural network will look like.

265
00:13:10,000 --> 00:13:13,000
Now you need to understand one thing over here.

266
00:13:13,000 --> 00:13:19,000
Each and every node will be connected to every node in the next layer.

267
00:13:20,000 --> 00:13:20,000
Right.

268
00:13:20,000 --> 00:13:21,000
Like this.

269
00:13:24,000 --> 00:13:24,000
Like this.

270
00:13:24,000 --> 00:13:25,000
It will be connected.

271
00:13:25,000 --> 00:13:25,000
Like.

272
00:13:25,000 --> 00:13:27,000
Like how an ANN works.

273
00:13:27,000 --> 00:13:30,000
It will be connected like this only, right?

274
00:13:30,000 --> 00:13:33,000
Similarly, all of these will be connected to this also.

275
00:13:33,000 --> 00:13:39,000
So in short, I can basically make a very simple connection like this which will look like this itself.

276
00:13:40,000 --> 00:13:42,000
And this will be entirely connected to this.

277
00:13:42,000 --> 00:13:43,000
Right.

278
00:13:43,000 --> 00:13:47,000
Understand all these lines will have some initialized weight.

279
00:13:47,000 --> 00:13:48,000
initialized weights.

280
00:13:48,000 --> 00:13:50,000
And we need to train these weights.

281
00:13:50,000 --> 00:13:52,000
And this is what happens in an ANN, right?

282
00:13:52,000 --> 00:13:54,000
Similarly this will be connected to this.

283
00:13:55,000 --> 00:13:57,000
This will be connected to this.

284
00:13:58,000 --> 00:14:01,000
This will be connected to this right.

285
00:14:01,000 --> 00:14:03,000
And finally this will also be connected to this.

286
00:14:06,000 --> 00:14:06,000
Right.

287
00:14:06,000 --> 00:14:09,000
So everything is basically getting connected.

288
00:14:09,000 --> 00:14:16,000
And this is my hidden layer one, HL1, and this is my output layer, right? Now from

289
00:14:16,000 --> 00:14:18,000
this hidden layer it will basically get connected over here.

290
00:14:18,000 --> 00:14:20,000
And this will get connected over here.

291
00:14:20,000 --> 00:14:20,000
Okay.

292
00:14:21,000 --> 00:14:24,000
Now understand one very very important thing okay.

293
00:14:25,000 --> 00:14:27,000
This is super, super important.

294
00:14:27,000 --> 00:14:27,000
Fine.

295
00:14:27,000 --> 00:14:30,000
We are connecting it, and with the help of a loss function

296
00:14:30,000 --> 00:14:33,000
we will also do forward and backward propagation.

297
00:14:33,000 --> 00:14:39,000
Now let's say we pass these particular words: "iNeuron", "company", "related", "to".

298
00:14:39,000 --> 00:14:40,000
So I have passed "iNeuron", "company".

299
00:14:40,000 --> 00:14:44,000
And here also you'll be able to see I'm passing "related", "to".

300
00:14:44,000 --> 00:14:45,000
Very simple okay.

301
00:14:45,000 --> 00:14:47,000
Let me just zoom out a bit okay.

302
00:14:47,000 --> 00:14:54,000
Now once I pass all these things, what happens over here with respect to these seven outputs? I already

303
00:14:54,000 --> 00:14:57,000
know what my real output is, right?

304
00:14:57,000 --> 00:14:58,000
I'll be getting some values over here.

305
00:14:59,000 --> 00:14:59,000
Okay.

306
00:14:59,000 --> 00:15:00,000
I'll be getting some values.

307
00:15:00,000 --> 00:15:01,000
Okay.

308
00:15:01,000 --> 00:15:06,000
But what is the real output? If you consider, "is" is my third word.

309
00:15:06,000 --> 00:15:07,000
So "is" is my real output.

310
00:15:07,000 --> 00:15:15,000
So this will basically be represented in this vector format that is 0010000.

311
00:15:15,000 --> 00:15:15,000
Right.

312
00:15:15,000 --> 00:15:21,000
But after training while we are training the model with different different weights, this is my true

313
00:15:21,000 --> 00:15:21,000
output.

314
00:15:21,000 --> 00:15:22,000
This is my Y.

315
00:15:22,000 --> 00:15:25,000
I may get a different y hat, right?

316
00:15:25,000 --> 00:15:28,000
I may get some values like 0.25.

317
00:15:28,000 --> 00:15:30,000
I may get some values like 0.33.

318
00:15:30,000 --> 00:15:34,000
Then like this 01000 something like this.

319
00:15:34,000 --> 00:15:37,000
Then what do we do? We basically calculate the loss using the loss function.

320
00:15:37,000 --> 00:15:39,000
And based on this loss we need to reduce this.

321
00:15:39,000 --> 00:15:44,000
We do the backward propagation right backward propagation.

322
00:15:44,000 --> 00:15:49,000
And we do it until the difference between y and y hat is minimal.

323
00:15:49,000 --> 00:15:49,000
Okay.

324
00:15:49,000 --> 00:15:51,000
And this process is continuous.
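The forward pass, the loss, and the backward propagation can be sketched in plain Python as below. This is a minimal sketch and not the exact network drawn in the video: it assumes a single shared 7 x 5 input matrix whose context rows are averaged, which is how CBOW is usually described (the diagram draws a separate 7 x 5 block for each context word), and it assumes a softmax output with cross-entropy loss. The names W_in, W_out, forward and train_step, the learning rate of 0.1 and the 200 passes are all choices made for this example.

```python
import math
import random

vocab = ["iNeuron", "company", "is", "related", "to", "data", "science"]
index = {word: i for i, word in enumerate(vocab)}
V, D = len(vocab), 5  # vocabulary size 7, hidden layer size 5

random.seed(0)
W_in = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]   # 7 x 5
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(D)]  # 5 x 7

pairs = [
    (["iNeuron", "company", "related", "to"], "is"),
    (["company", "is", "to", "data"], "related"),
    (["is", "related", "data", "science"], "to"),
]

def forward(context):
    # Hidden layer: average of the W_in rows picked out by the one-hot inputs.
    h = [sum(W_in[index[w]][d] for w in context) / len(context) for d in range(D)]
    scores = [sum(h[d] * W_out[d][j] for d in range(D)) for j in range(V)]
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return h, [e / total for e in exps]  # y hat: one probability per vocabulary word

def train_step(context, target, lr=0.1):
    h, y_hat = forward(context)
    t = index[target]
    loss = -math.log(y_hat[t])  # cross-entropy against the one-hot y
    err = [y_hat[j] - (1.0 if j == t else 0.0) for j in range(V)]  # y hat minus y
    # Gradient with respect to h, computed before W_out is changed.
    grad_h = [sum(W_out[d][j] * err[j] for j in range(V)) for d in range(D)]
    for d in range(D):
        for j in range(V):
            W_out[d][j] -= lr * h[d] * err[j]
    for w in context:
        for d in range(D):
            W_in[index[w]][d] -= lr * grad_h[d] / len(context)
    return loss

# Each entry is the summed loss of one pass over the three training pairs.
losses = [sum(train_step(c, t) for c, t in pairs) for _ in range(200)]
print(f"total loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

The printed loss should go down as the passes repeat, which is the "keep doing forward and backward propagation until y and y hat are close" loop described above.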

325
00:15:51,000 --> 00:15:52,000
Very simple.

326
00:15:52,000 --> 00:15:56,000
But now you really need to understand one very important thing.

327
00:15:57,000 --> 00:15:58,000
Very very important thing.

328
00:15:59,000 --> 00:16:04,000
Now since this is giving me a specific output okay.

329
00:16:04,000 --> 00:16:06,000
This is basically giving me a specific output.

330
00:16:06,000 --> 00:16:14,000
When I say my middle layer has a size of five, the same number as my window size of five,

331
00:16:14,000 --> 00:16:17,000
that basically means the following in Word2Vec.

332
00:16:17,000 --> 00:16:23,000
When I said I will be getting 300 dimensions when I'm using Google's Word2Vec,

333
00:16:24,000 --> 00:16:28,000
that is all because of this hidden layer size, okay?

334
00:16:28,000 --> 00:16:35,000
That basically means if my hidden layer size is five, I'm going to get a vector of five values for every word.

335
00:16:35,000 --> 00:16:40,000
That basically means when a word is getting converted into a vector.

336
00:16:41,000 --> 00:16:47,000
I am going to get a vector of size five, and this will basically be my final output.

337
00:16:48,000 --> 00:16:50,000
Now I hope you are able to understand again.

338
00:16:50,000 --> 00:16:51,000
Let me repeat it.

339
00:16:51,000 --> 00:16:58,000
The reason I have selected a hidden layer size of five is that I want to provide

340
00:16:58,000 --> 00:17:03,000
a feature representation with a vector size of five.

341
00:17:03,000 --> 00:17:04,000
Okay.

342
00:17:04,000 --> 00:17:08,000
That basically means every word will be converted into a vector of five values.

343
00:17:08,000 --> 00:17:15,000
Now, when I took the example of Google, where each word was getting converted into 300 dimensions, that basically

344
00:17:15,000 --> 00:17:18,000
means its hidden layer size is 300; the window size is a separate setting.

345
00:17:18,000 --> 00:17:22,000
A bigger vector size lets the model capture more detail about each word.

346
00:17:22,000 --> 00:17:27,000
Okay, so in this case you'll be able to see that over here.

347
00:17:27,000 --> 00:17:28,000
My hidden layer size is five.

348
00:17:28,000 --> 00:17:34,000
That basically means, if I look from the start, my weight matrix for every word will be seven cross five.

349
00:17:34,000 --> 00:17:35,000
That many number of weights will be there.

350
00:17:35,000 --> 00:17:38,000
Then here also I'll be having seven cross five weights.

351
00:17:38,000 --> 00:17:43,000
Here also I'll be having seven cross five weights, because I'm giving seven different values.

352
00:17:43,000 --> 00:17:44,000
Here also I'm having seven cross five.

353
00:17:44,000 --> 00:17:48,000
But in this case I will basically be having five cross seven.

354
00:17:48,000 --> 00:17:50,000
Now what does five cross seven basically mean?

355
00:17:50,000 --> 00:17:53,000
When this loss gets reduced,

356
00:17:53,000 --> 00:17:57,000
then my final vector will look something like this.

357
00:17:57,000 --> 00:17:59,000
This all will get connected to this.

358
00:17:59,000 --> 00:18:00,000
This all will get connected to this.

359
00:18:00,000 --> 00:18:03,000
Let's say it is getting connected to this one.

360
00:18:03,000 --> 00:18:04,000
It is getting connected to this one.

361
00:18:04,000 --> 00:18:07,000
It is getting connected to this one.

362
00:18:07,000 --> 00:18:14,000
So once we have these connections, let's ask: what is our first

363
00:18:14,000 --> 00:18:14,000
word?

364
00:18:14,000 --> 00:18:21,000
If you look at the first word in the vocabulary that these vectors are built from, you'll

365
00:18:21,000 --> 00:18:22,000
be seeing "iNeuron", right?

366
00:18:22,000 --> 00:18:24,000
This is my vocabulary.

367
00:18:24,000 --> 00:18:29,000
The first word that is basically getting represented over here is "iNeuron", right?

368
00:18:29,000 --> 00:18:38,000
So "iNeuron" will have an output vector of five dimensions, because I'm getting these five weights over

369
00:18:38,000 --> 00:18:40,000
here joined to this.

370
00:18:40,000 --> 00:18:43,000
So these five values will be like 0.92.

371
00:18:43,000 --> 00:18:44,000
It can be 0.94.

372
00:18:44,000 --> 00:18:49,000
Based on the training it can be 0.25, it can be 0.36 and it can be 0.45.

373
00:18:50,000 --> 00:18:53,000
And this is based on some feature representation.

374
00:18:53,000 --> 00:18:55,000
So I hope you are able to understand.

375
00:18:55,000 --> 00:18:58,000
And this is how "iNeuron" will be represented.
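To see why the learned weights themselves are the word vector, here is a small sketch: multiplying a one-hot vector by a 7 x 5 weight matrix simply picks out one row. The row for "iNeuron" uses the example numbers quoted above; the row for "company" and the zero rows are made-up placeholders, not trained values, and the names W_in and embed are chosen for this example.

```python
# Illustrative sketch: a one-hot input selects one row of the 7 x 5 weight matrix.
vocab = ["iNeuron", "company", "is", "related", "to", "data", "science"]
D = 5  # hidden layer size, which is also the length of each word vector

W_in = [[0.0] * D for _ in vocab]          # placeholder rows for the other words
W_in[0] = [0.92, 0.94, 0.25, 0.36, 0.45]   # "iNeuron": example values from the video
W_in[1] = [0.11, 0.52, 0.73, 0.08, 0.64]   # "company": made-up values

def one_hot(word):
    return [1 if w == word else 0 for w in vocab]

def embed(word):
    """Multiply the one-hot row vector by W_in; only one row contributes."""
    x = one_hot(word)
    return [sum(x[i] * W_in[i][d] for i in range(len(vocab))) for d in range(D)]

print(embed("iNeuron"))
print(embed("company"))
```

In practice nobody performs this multiplication: the vector for a word is read directly as the matching row of the trained matrix.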

376
00:18:58,000 --> 00:19:02,000
The second word that we have in the vocabulary that is company, right.

377
00:19:02,000 --> 00:19:05,000
This will again get connected to this company.

378
00:19:05,000 --> 00:19:06,000
This will also get connected.

379
00:19:06,000 --> 00:19:08,000
This will also get connected.

380
00:19:08,000 --> 00:19:09,000
This will also get connected.

381
00:19:09,000 --> 00:19:14,000
This will also get connected, and this entire set of weights will be the vector for "company" itself. For "company"

382
00:19:14,000 --> 00:19:19,000
we may have a different vector, but again the size will be five dimensions, because our hidden layer size is

383
00:19:19,000 --> 00:19:20,000
five.

384
00:19:21,000 --> 00:19:26,000
And with this training by forward and backward propagation, only when the loss is minimal will we

385
00:19:26,000 --> 00:19:32,000
be able to get the vectors. Each vector is then taken and used as the

386
00:19:32,000 --> 00:19:35,000
feature representation for each and every word.

387
00:19:35,000 --> 00:19:43,000
So I hope you are able to understand how Word2Vec actually works with respect to CBOW.

388
00:19:43,000 --> 00:19:43,000
Right?

389
00:19:43,000 --> 00:19:45,000
That is continuous bag of words.

390
00:19:45,000 --> 00:19:48,000
If you don't know about ANNs, then it will be quite difficult.

391
00:19:48,000 --> 00:19:51,000
But again understand the loss function, it will keep on reducing.

392
00:19:51,000 --> 00:19:55,000
And finally, whatever vectors you are getting at the last that will be combined and that will be taken

393
00:19:55,000 --> 00:19:58,000
as the first word, then second word and third word like that.

394
00:19:58,000 --> 00:20:01,000
Okay, so yes, this was it about CBOW.

395
00:20:02,000 --> 00:20:04,000
In the next video, I'll be talking about skip-gram.

396
00:20:04,000 --> 00:20:05,000
Right.

397
00:20:06,000 --> 00:20:10,000
And then we'll talk about the advantages and disadvantages of Word2Vec.

398
00:20:10,000 --> 00:20:12,000
So yes, I will see you all in the next video.

399
00:20:13,000 --> 00:20:14,000
This was it.

400
00:20:14,000 --> 00:20:14,000
Thank you.

401
00:20:14,000 --> 00:20:15,000
Bye bye.

