1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:05,000
So we are going to continue the discussion with respect to attention mechanism.

3
00:00:05,000 --> 00:00:11,000
Already in our previous video, we have seen the problems with the encoder-decoder, or sequence

4
00:00:11,000 --> 00:00:12,000
to sequence architecture.

5
00:00:12,000 --> 00:00:19,000
And we also understood if we have a context vector, like if we have a longer sentence right or longer

6
00:00:19,000 --> 00:00:20,000
paragraph.

7
00:00:20,000 --> 00:00:25,000
Whenever we try to give this specific paragraph over here, word by word, the context vector that is

8
00:00:25,000 --> 00:00:31,000
created by the encoder, sometimes it will not be sufficient for the decoder to get the

9
00:00:31,000 --> 00:00:31,000
output.

10
00:00:31,000 --> 00:00:37,000
So based on that, we also saw that okay, uh, the performance metric which was used by researchers,

11
00:00:37,000 --> 00:00:38,000
which is called the BLEU score.

12
00:00:39,000 --> 00:00:43,000
As the sentence length was increasing, you could see the BLEU score was also going down, right?

13
00:00:43,000 --> 00:00:49,000
So in order to overcome this disadvantage, we are specifically using something

14
00:00:49,000 --> 00:00:57,000
called as attention mechanism and the main idea behind attention mechanism was that for longer or shorter

15
00:00:57,000 --> 00:01:02,000
paragraph, along with the context vector, we will go ahead and provide more context, right?

16
00:01:02,000 --> 00:01:04,000
So this was the entire idea over it.

17
00:01:04,000 --> 00:01:09,000
Now for this, we will specifically be using this particular research paper to understand this is an

18
00:01:09,000 --> 00:01:16,000
amazing research paper, "Neural Machine Translation by Jointly Learning to Align and Translate."

19
00:01:16,000 --> 00:01:18,000
Anyhow, I will be giving you the specific link.

20
00:01:18,000 --> 00:01:23,000
You can also go ahead and explore it, but if you just go and search, you will be able to get the link

21
00:01:23,000 --> 00:01:23,000
right.

22
00:01:23,000 --> 00:01:27,000
And over here one of the researchers was Yoshua Bengio.

23
00:01:27,000 --> 00:01:32,000
I hope you know him; he is a famous researcher and scientist in this specific field, and specifically

24
00:01:32,000 --> 00:01:33,000
in the field of NLP.

25
00:01:33,000 --> 00:01:39,000
Along with that, uh, you have uh, it's very hard to pronounce, so I don't want to really pronounce

26
00:01:39,000 --> 00:01:40,000
it, but I'm extremely sorry.

27
00:01:40,000 --> 00:01:42,000
Uh, you know, I may pronounce it wrong.

28
00:01:42,000 --> 00:01:42,000
Right?

29
00:01:42,000 --> 00:01:44,000
And they are respected personalities.

30
00:01:44,000 --> 00:01:45,000
I don't want to do that.

31
00:01:45,000 --> 00:01:51,000
Okay, so, um, we will discuss about it, like how this, uh, attention mechanism is solving the

32
00:01:51,000 --> 00:01:52,000
specific problem.

33
00:01:52,000 --> 00:01:57,000
And if you go down, you know, this is the main section, "Decoder: General Description."

34
00:01:57,000 --> 00:02:01,000
Uh, we will try to change the architecture of encoder and decoder.

35
00:02:01,000 --> 00:02:01,000
What exactly

36
00:02:01,000 --> 00:02:02,000
will be changing?

37
00:02:02,000 --> 00:02:03,000
We will talk about it okay.

38
00:02:03,000 --> 00:02:08,000
So first of all, uh, let's go ahead with a simple example over here.

39
00:02:08,000 --> 00:02:08,000
Right.

40
00:02:08,000 --> 00:02:14,000
So before we go ahead, let's go ahead and probably discuss the encoder decoder architecture once again.

41
00:02:14,000 --> 00:02:17,000
And then uh we will try to understand the problem.

42
00:02:17,000 --> 00:02:21,000
And quickly we will see what kind of changes is basically there with respect to this architecture.

43
00:02:21,000 --> 00:02:25,000
The reason why I'm again, going back to encoder and decoder architecture, because here in the research

44
00:02:25,000 --> 00:02:28,000
paper there is some different kind of notation that is used.

45
00:02:28,000 --> 00:02:28,000
Okay.

46
00:02:28,000 --> 00:02:34,000
So here uh, what I'm actually going to do, let's go ahead and probably create the LSTM in the encoder.

47
00:02:34,000 --> 00:02:36,000
So this will be my H1 tag.

48
00:02:37,000 --> 00:02:38,000
Uh sorry not H1.

49
00:02:38,000 --> 00:02:41,000
So this is my LSTM uh neural network.

50
00:02:41,000 --> 00:02:43,000
So this will be my H2 with respect to time.

51
00:02:43,000 --> 00:02:45,000
Then I have my H3 okay.

52
00:02:46,000 --> 00:02:50,000
So let's say that I have these three words that are going in.

53
00:02:50,000 --> 00:02:52,000
So this is my x one.

54
00:02:52,000 --> 00:02:53,000
This is my x two.

55
00:02:53,000 --> 00:02:55,000
This is my x three okay.

56
00:02:55,000 --> 00:02:57,000
Let's say over here I will just go ahead and write.

57
00:02:57,000 --> 00:02:58,000
Hello.

58
00:02:58,000 --> 00:03:00,000
This is the text that I'm actually sending.

59
00:03:00,000 --> 00:03:01,000
Hello.

60
00:03:01,000 --> 00:03:04,000
What's up. Okay, I'm giving this particular text.

61
00:03:04,000 --> 00:03:09,000
Now if I consider this as my encoder.

62
00:03:09,000 --> 00:03:09,000
Right.

63
00:03:09,000 --> 00:03:16,000
So the outcome of this particular encoder will be that obviously we'll take the sent text and we'll

64
00:03:16,000 --> 00:03:20,000
convert uh, we'll use a word embedding layer and convert that into vectors and pass it to the hidden

65
00:03:20,000 --> 00:03:22,000
layer right hidden layer, which is LSTM.

66
00:03:22,000 --> 00:03:27,000
Then uh, once I probably go till the end of the sentence, that is what's up?

67
00:03:27,000 --> 00:03:31,000
So at the end of the day, from here we will be able to get two important information.

68
00:03:31,000 --> 00:03:35,000
One is s zero, S zero is nothing but your hidden state.

69
00:03:35,000 --> 00:03:40,000
And along with this we will also be getting one more uh, very important parameter that is called as

70
00:03:40,000 --> 00:03:40,000
c.

71
00:03:40,000 --> 00:03:42,000
C is nothing but context vector.

72
00:03:42,000 --> 00:03:44,000
Now see why I'm using this notation?

73
00:03:44,000 --> 00:03:46,000
Why I did not use the previous notation like this.

74
00:03:46,000 --> 00:03:50,000
Because in the research paper also, I have this s zero and all.

75
00:03:50,000 --> 00:03:52,000
So it will be very easy for you all to understand.

76
00:03:52,000 --> 00:03:57,000
So from this I will be getting two important uh, information.

77
00:03:57,000 --> 00:04:00,000
One is s zero.

78
00:04:00,000 --> 00:04:02,000
That is the hidden state.

79
00:04:02,000 --> 00:04:02,000
Okay.

80
00:04:02,000 --> 00:04:06,000
And then I have something called as c okay.

81
00:04:06,000 --> 00:04:10,000
C is nothing, but it is my context vector.

82
00:04:10,000 --> 00:04:11,000
Okay.

83
00:04:12,000 --> 00:04:16,000
Now the next step will be that from this S0.

84
00:04:16,000 --> 00:04:19,000
Now this is my encoder stage right encoder.

85
00:04:19,000 --> 00:04:22,000
And first of all we will wait for all the inputs to come in the encoder.

86
00:04:22,000 --> 00:04:25,000
All the internal processing will basically happen you know.

87
00:04:25,000 --> 00:04:29,000
Because the LSTM has three amazing gates over here: the forget, input, and output gates.

88
00:04:29,000 --> 00:04:35,000
Once we get this context vector this context vector will further get passed to the decoders.

89
00:04:35,000 --> 00:04:41,000
Now with respect to the decoder, first of all, I will just go ahead and create one decoder over here.

90
00:04:41,000 --> 00:04:44,000
In this I will just go ahead and create three LSTM neurons.

91
00:04:44,000 --> 00:04:45,000
Okay.

92
00:04:46,000 --> 00:04:50,000
With respect to time, you know I need to probably create this also with respect to time, right.

93
00:04:50,000 --> 00:04:54,000
So here I will use some another color let's say.

94
00:04:54,000 --> 00:04:59,000
So this is my first, second and third.

95
00:04:59,000 --> 00:05:00,000
I need to convert these three words.

96
00:05:00,000 --> 00:05:01,000
Right.

97
00:05:01,000 --> 00:05:04,000
So what used to happen in the decoder over here.

98
00:05:04,000 --> 00:05:07,000
So this is my entire decoder right.

99
00:05:08,000 --> 00:05:10,000
Let's go ahead and divide like this okay.

100
00:05:10,000 --> 00:05:13,000
Inside this decoder, first of all, we used to pass this s zero.

101
00:05:13,000 --> 00:05:20,000
So s zero will get passed over here right then the C used to get passed over here okay.

102
00:05:20,000 --> 00:05:20,000
Okay.

103
00:05:20,000 --> 00:05:27,000
And along with this context vector, what used to happen is that, uh, our first, um, first input

104
00:05:27,000 --> 00:05:29,000
that we were passing.

105
00:05:29,000 --> 00:05:29,000
Right.

106
00:05:30,000 --> 00:05:32,000
Our first input used to be y zero.

107
00:05:32,000 --> 00:05:33,000
Y zero is our truth value.

108
00:05:33,000 --> 00:05:35,000
Let's say this is the SOS.

109
00:05:35,000 --> 00:05:38,000
Start of sentence, okay.

110
00:05:38,000 --> 00:05:40,000
So this used to get passed over here.

111
00:05:40,000 --> 00:05:45,000
If I'm not passing this SOS, then whatever is the truth value with respect to my first sentence.

112
00:05:45,000 --> 00:05:50,000
Right, in my training data, you know, let's say in my training data, I have these values, right?

113
00:05:50,000 --> 00:05:52,000
I have something like this.

114
00:05:52,000 --> 00:05:55,000
And inside this I have this English text.

115
00:05:55,000 --> 00:05:55,000
Hello?

116
00:05:56,000 --> 00:05:57,000
What's up?

117
00:05:57,000 --> 00:06:02,000
And inside my French I'll say gracias.

118
00:06:02,000 --> 00:06:03,000
Something.

119
00:06:03,000 --> 00:06:03,000
Okay.

120
00:06:03,000 --> 00:06:04,000
Some text.

121
00:06:04,000 --> 00:06:05,000
Okay.

122
00:06:05,000 --> 00:06:08,000
Now in the first y zero, I should pass.

123
00:06:08,000 --> 00:06:09,000
gracias.

124
00:06:09,000 --> 00:06:09,000
Okay.

125
00:06:09,000 --> 00:06:12,000
So this was gracias that was passed over here.

126
00:06:12,000 --> 00:06:16,000
Now as soon as this gracias was passed, along with this context vector,

127
00:06:16,000 --> 00:06:17,000
it will get passed.

128
00:06:17,000 --> 00:06:20,000
It will get passed into the LSTM.

129
00:06:20,000 --> 00:06:21,000
Okay.

130
00:06:21,000 --> 00:06:27,000
And this LSTM will be responsible for finding your y1, right? And y1,

131
00:06:27,000 --> 00:06:28,000
How do you calculate it.

132
00:06:28,000 --> 00:06:30,000
Because here another softmax will be there.

133
00:06:30,000 --> 00:06:30,000
Right.

134
00:06:30,000 --> 00:06:35,000
So if I just go ahead and see, there will be another softmax over here.

135
00:06:35,000 --> 00:06:35,000
Right.

136
00:06:35,000 --> 00:06:38,000
So this will basically be my softmax.

137
00:06:38,000 --> 00:06:38,000
Okay.

138
00:06:39,000 --> 00:06:42,000
Now once I calculate the y hat or y one okay.

139
00:06:42,000 --> 00:06:43,000
This is y one hat.

140
00:06:43,000 --> 00:06:43,000
Okay.

141
00:06:43,000 --> 00:06:51,000
Then we used to pass this y one hat to my next LSTM.

142
00:06:51,000 --> 00:06:51,000
Right.

143
00:06:51,000 --> 00:06:54,000
To calculate the next word right along with this.

144
00:06:54,000 --> 00:07:00,000
What we used to also do is that uh we used to take this entire context vector and again pass it over

145
00:07:00,000 --> 00:07:00,000
here.

146
00:07:01,000 --> 00:07:01,000
Right.

147
00:07:01,000 --> 00:07:06,000
And by using both of them, we used to go ahead and calculate our y two hat.

148
00:07:06,000 --> 00:07:06,000
Right.

149
00:07:06,000 --> 00:07:09,000
Similarly to calculate the y three hat.

150
00:07:09,000 --> 00:07:11,000
Uh, I will go ahead and pass just a second.

151
00:07:11,000 --> 00:07:16,000
I will go ahead and pass this to my next neuron.

152
00:07:16,000 --> 00:07:19,000
Along with this, I used to also pass this my context vector.

153
00:07:19,000 --> 00:07:25,000
And then finally I used to get my y three hat, which can actually be my end of sentence, right?

154
00:07:25,000 --> 00:07:30,000
So this way was the entire process of the encoder decoder.

155
00:07:30,000 --> 00:07:33,000
I missed this line previously, so I'm again writing it down.

156
00:07:33,000 --> 00:07:36,000
So this was my encoder decoder architecture.

157
00:07:37,000 --> 00:07:37,000
Right.

158
00:07:37,000 --> 00:07:39,000
And what was the problem with this architecture.

159
00:07:39,000 --> 00:07:41,000
It is very simple.

160
00:07:41,000 --> 00:07:48,000
If my sentence is very long, this context vector will not be able to capture the essence of the

161
00:07:48,000 --> 00:07:49,000
entire sentence, right?

162
00:07:49,000 --> 00:07:57,000
Because obviously, if my sentence is of a length of 100 time steps, right, 100 time steps.

163
00:07:57,000 --> 00:07:59,000
So I'm probably going with till t is equal to 100.

164
00:07:59,000 --> 00:08:05,000
During that particular case, whatever context vector will be created at that point of time, only this

165
00:08:05,000 --> 00:08:08,000
will be able to capture with respect to the nearest time steps.

166
00:08:08,000 --> 00:08:12,000
The words at t is equal to one and so on, it will not be able to capture their essence.

167
00:08:12,000 --> 00:08:12,000
Right?

168
00:08:12,000 --> 00:08:15,000
And because of this, you could see that from the BLEU score.

169
00:08:15,000 --> 00:08:19,000
When the context, when the sentence length was increasing, it was not able to give us a good accuracy.

170
00:08:20,000 --> 00:08:27,000
So what exactly were the changes that were made in this, uh, you know, in this attention mechanism?

171
00:08:27,000 --> 00:08:27,000
Okay.

172
00:08:27,000 --> 00:08:30,000
Now I will just follow this particular architecture.

173
00:08:30,000 --> 00:08:32,000
So here you will be able to see this amazing diagram.

174
00:08:32,000 --> 00:08:33,000
Okay.

175
00:08:33,000 --> 00:08:37,000
And with this amazing diagram, we will try to understand what exactly it is, okay.

176
00:08:37,000 --> 00:08:42,000
So first of all, uh, over here you'll be able to see that this is my encoder.

177
00:08:42,000 --> 00:08:42,000
Okay.

178
00:08:42,000 --> 00:08:46,000
This is my encoder and this is my decoder.

179
00:08:46,000 --> 00:08:46,000
Okay.

180
00:08:46,000 --> 00:08:51,000
Just to give you a clear idea, this exactly is my encoder.

181
00:08:51,000 --> 00:08:52,000
Okay.

182
00:08:52,000 --> 00:08:55,000
And this exactly is my decoder.

183
00:08:55,000 --> 00:09:00,000
Now, what exactly is the difference between this encoder and decoder and why there is so much architecture

184
00:09:00,000 --> 00:09:00,000
difference?

185
00:09:00,000 --> 00:09:06,000
Okay, so to make you understand again what is the architecture difference that we will discuss in attention

186
00:09:06,000 --> 00:09:07,000
mechanism.

187
00:09:07,000 --> 00:09:07,000
Okay.

188
00:09:07,000 --> 00:09:14,000
So in the attention mechanism, the main idea is to provide more context to the decoder

189
00:09:14,000 --> 00:09:16,000
to probably do the prediction okay.

190
00:09:16,000 --> 00:09:25,000
So here what we will do is that uh in the encoder first of all I will go ahead and use, let's say over

191
00:09:25,000 --> 00:09:26,000
here.

192
00:09:27,000 --> 00:09:33,000
Instead of just using a simple LSTM, I will be using bidirectional LSTM.

193
00:09:33,000 --> 00:09:38,000
So let's say this is my one LSTM.

194
00:09:38,000 --> 00:09:39,000
This is my second LSTM.

195
00:09:39,000 --> 00:09:42,000
I hope everybody knows what is bidirectional LSTM right?

196
00:09:42,000 --> 00:09:44,000
So here I will just use this.

197
00:09:47,000 --> 00:09:47,000
Okay.

198
00:09:47,000 --> 00:09:49,000
Please focus on the diagram.

199
00:09:49,000 --> 00:09:52,000
And this architecture will completely make sense.

200
00:09:52,000 --> 00:09:53,000
What we are exactly doing.

201
00:10:00,000 --> 00:10:01,000
Okay.

202
00:10:01,000 --> 00:10:02,000
Perfect.

203
00:10:02,000 --> 00:10:06,000
Now over here, you'll be able to see that I will be giving my input over here.

204
00:10:06,000 --> 00:10:07,000
Right.

205
00:10:07,000 --> 00:10:10,000
And this input will also go over here.

206
00:10:10,000 --> 00:10:11,000
Right?

207
00:10:11,000 --> 00:10:14,000
This is my H1 okay.

208
00:10:14,000 --> 00:10:18,000
And I'll say this is my H1 with information going in this direction.

209
00:10:18,000 --> 00:10:21,000
This is my H2 going in this direction.

210
00:10:21,000 --> 00:10:21,000
This is my H3.

211
00:10:21,000 --> 00:10:25,000
Then the information going in this direction and in bidirectional what happens.

212
00:10:25,000 --> 00:10:31,000
We will also be having our H3 same way over here right with respect to time.

213
00:10:31,000 --> 00:10:31,000
Right.

214
00:10:31,000 --> 00:10:36,000
So here you can see h1, h2, h3 going forward, and h3, h2, h1 in reverse.

215
00:10:36,000 --> 00:10:38,000
So here what I will do I will go ahead and write h3.

216
00:10:38,000 --> 00:10:46,000
But my information will go in reverse: h3 reverse, h2 reverse, h1 reverse, right?

217
00:10:46,000 --> 00:10:52,000
So what exactly is this architecture? This architecture is a bidirectional RNN, or LSTM RNN.

218
00:10:52,000 --> 00:10:53,000
Right.

219
00:10:53,000 --> 00:10:55,000
Wherein I will take up any text.

220
00:10:55,000 --> 00:10:56,000
Right.

221
00:10:56,000 --> 00:10:57,000
If I say hello.

222
00:10:58,000 --> 00:10:59,000
What's up?

223
00:10:59,000 --> 00:11:00,000
Okay.

224
00:11:00,000 --> 00:11:01,000
What's up.

225
00:11:01,000 --> 00:11:05,000
So in x1 here I'm going to pass

226
00:11:05,000 --> 00:11:06,000
Hello.

227
00:11:06,000 --> 00:11:06,000
Right.

228
00:11:06,000 --> 00:11:10,000
This information will get passed to both the LSTM RNNs, okay.

229
00:11:10,000 --> 00:11:16,000
But before this is getting passed you know you'll be able to see from the reverse direction your text

230
00:11:16,000 --> 00:11:17,000
will get passed also.

231
00:11:17,000 --> 00:11:17,000
Right.

232
00:11:17,000 --> 00:11:19,000
So here is my second input.

233
00:11:19,000 --> 00:11:21,000
Here is my third input.

234
00:11:21,000 --> 00:11:23,000
So with respect to t is equal to one I'm going to pass.

235
00:11:23,000 --> 00:11:23,000
Hello.

236
00:11:23,000 --> 00:11:28,000
With respect to t is equal to two I'm going to pass "what's", okay, t is equal to two.

237
00:11:28,000 --> 00:11:31,000
And "up" at t is equal to three, right?

238
00:11:31,000 --> 00:11:34,000
Similarly at t is equal to one.

239
00:11:35,000 --> 00:11:39,000
We are just going to pass this information first of all in the reverse direction.

240
00:11:39,000 --> 00:11:42,000
Similarly at t is equal to two we are going to pass this information.

241
00:11:42,000 --> 00:11:48,000
And at t is equal to uh t is equal to two and t is equal to three.

242
00:11:48,000 --> 00:11:50,000
I'm going to pass this in the reverse direction.

243
00:11:50,000 --> 00:11:51,000
Right.

244
00:11:51,000 --> 00:11:55,000
So this is what happens in the bidirectional uh LSTM, RNN.

245
00:11:55,000 --> 00:11:59,000
And the major aim is that we will also be able to provide the context of the later words.

246
00:11:59,000 --> 00:12:00,000
Right.

247
00:12:00,000 --> 00:12:03,000
Just imagine this in terms of when you have a longer sentence.

248
00:12:03,000 --> 00:12:03,000
Right.

249
00:12:03,000 --> 00:12:08,000
So once we do all this things right and you know that this will be interconnected, right.

250
00:12:08,000 --> 00:12:10,000
So here I will be able to get the output.

251
00:12:10,000 --> 00:12:13,000
So let me just go ahead and define this.

252
00:12:13,000 --> 00:12:15,000
So this and this will get combined.

253
00:12:16,000 --> 00:12:17,000
Okay.

254
00:12:17,000 --> 00:12:19,000
This and this will get combined.

255
00:12:19,000 --> 00:12:23,000
Uh over here this and this will get combined whatever output we are specifically getting and this and

256
00:12:23,000 --> 00:12:25,000
this will get combined okay.

257
00:12:26,000 --> 00:12:29,000
Now if we go ahead, right.

258
00:12:29,000 --> 00:12:35,000
As you all know with respect to this, uh, the further information that we specifically have from this

259
00:12:35,000 --> 00:12:41,000
hidden layer, right when it is getting combined here, we will be able to get S zero right, which

260
00:12:41,000 --> 00:12:42,000
is my hidden state.

261
00:12:42,000 --> 00:12:43,000
This is perfectly fine.

262
00:12:43,000 --> 00:12:44,000
Okay, let me do one thing.

263
00:12:44,000 --> 00:12:49,000
Let me just take this diagram little bit below.

264
00:12:49,000 --> 00:12:49,000
Okay.

265
00:12:49,000 --> 00:12:56,000
So I will copy this and let me take it over here completely okay.

266
00:12:57,000 --> 00:13:04,000
Because I need to draw a lot of things on the top. Okay, so I will just copy this over here.

267
00:13:04,000 --> 00:13:05,000
I'll paste it over here.

268
00:13:05,000 --> 00:13:05,000
Okay.

269
00:13:06,000 --> 00:13:10,000
Now what will happen is that as I get this particular value as s zero okay.

270
00:13:10,000 --> 00:13:18,000
You know that from here I will be getting my output, let's say my output over here is h one, here my output

271
00:13:18,000 --> 00:13:19,000
is H two.

272
00:13:19,000 --> 00:13:21,000
Here my output is h three.

273
00:13:21,000 --> 00:13:26,000
Now previously what we were doing is that in the encoder decoder we were not taking this particular

274
00:13:26,000 --> 00:13:26,000
output.

275
00:13:26,000 --> 00:13:27,000
Right.

276
00:13:27,000 --> 00:13:28,000
We were not taking this.

277
00:13:28,000 --> 00:13:30,000
We just created one context vector.

278
00:13:30,000 --> 00:13:30,000
Right.

279
00:13:30,000 --> 00:13:37,000
But this output also we can specifically use, and we can provide this particular output or

280
00:13:37,000 --> 00:13:42,000
context, or we will convert this output into some kind of context and give it to our decoder.

281
00:13:42,000 --> 00:13:44,000
That is the main idea over here okay.

282
00:13:44,000 --> 00:13:49,000
Now what we will do from this, you know, see further thing what we are going to do over here.

283
00:13:49,000 --> 00:13:50,000
And this will be very amazing.

284
00:13:50,000 --> 00:13:52,000
So first of all, in the decoder, this is the.

285
00:13:52,000 --> 00:13:55,000
So first of all, in the encoder, this is the change, okay.

286
00:13:55,000 --> 00:13:57,000
So this is my entire encoder here.

287
00:13:57,000 --> 00:14:04,000
I'm actually using my bidirectional LSTM RNN so that I will be able to also provide the later words'

288
00:14:04,000 --> 00:14:04,000
context.

289
00:14:04,000 --> 00:14:10,000
Right. Now in the next step what we are going to do is that we will create something.

290
00:14:10,000 --> 00:14:11,000
Okay.

291
00:14:11,000 --> 00:14:15,000
So let's say uh, I will just go ahead and combine some information.

292
00:14:15,000 --> 00:14:20,000
Let's say I'm going to go ahead and use some nodes.

293
00:14:20,000 --> 00:14:21,000
This is my first node.

294
00:14:21,000 --> 00:14:24,000
This is my second and this is my third okay.

295
00:14:25,000 --> 00:14:28,000
Now in these nodes, right.

296
00:14:28,000 --> 00:14:32,000
We are going to pass this h one. Along with h one,

297
00:14:32,000 --> 00:14:38,000
we are also going to pass this s zero.

298
00:14:40,000 --> 00:14:41,000
Okay.

299
00:14:41,000 --> 00:14:44,000
So here we are specifically going to pass s zero.

300
00:14:44,000 --> 00:14:47,000
So this will basically go over here along with this.

301
00:14:47,000 --> 00:14:49,000
It will also go over here.

302
00:14:49,000 --> 00:14:51,000
This will also go over here right.

303
00:14:51,000 --> 00:14:53,000
And this H2 information.

304
00:14:53,000 --> 00:14:58,000
The hidden state from the output of the H2 will be going over here.

305
00:14:58,000 --> 00:15:00,000
And S3 will also be going over here.

306
00:15:00,000 --> 00:15:00,000
Right.

307
00:15:00,000 --> 00:15:05,000
So this is the first information that is basically getting passed so that we'll be able to convert this

308
00:15:05,000 --> 00:15:06,000
into entire context.

309
00:15:06,000 --> 00:15:07,000
Now what is happening.

310
00:15:07,000 --> 00:15:09,000
The output of this is going over here.

311
00:15:09,000 --> 00:15:16,000
Along with that my hidden state right is also going over here on this particular node right now, the

312
00:15:16,000 --> 00:15:20,000
next important thing that actually happens and let me just denote this okay.

313
00:15:20,000 --> 00:15:23,000
So let's say with respect to words this is my E11.

314
00:15:23,000 --> 00:15:27,000
This is E12 and this is E13.

315
00:15:27,000 --> 00:15:30,000
I will explain to you what exactly is the importance of this, okay.

316
00:15:30,000 --> 00:15:36,000
Now what we do is that once we have this entire information that is the combination of h one, S zero,

317
00:15:36,000 --> 00:15:38,000
h two and s zero, h three and s zero.

318
00:15:38,000 --> 00:15:43,000
Okay, we pass this entirely to a softmax.

319
00:15:44,000 --> 00:15:47,000
When we are saying that, we pass it to a softmax.

320
00:15:47,000 --> 00:15:53,000
That basically means here we are going to train an ANN with a softmax, okay.

321
00:15:54,000 --> 00:15:55,000
Softmax.

322
00:15:55,000 --> 00:16:00,000
So in short, what is basically happening is that here we will create a feed forward neural network.

323
00:16:00,000 --> 00:16:06,000
And the output will be using the softmax.

324
00:16:06,000 --> 00:16:06,000
Right.

325
00:16:06,000 --> 00:16:12,000
So here inside this feed forward neural network we will be giving the combination of h one and s zero which

326
00:16:12,000 --> 00:16:13,000
is my hidden state.

327
00:16:13,000 --> 00:16:15,000
And then this will be getting trained.

328
00:16:15,000 --> 00:16:21,000
And over here in our output, whenever we try to pass it to the softmax, it is basically going to have

329
00:16:21,000 --> 00:16:24,000
a multi-class classification problem statement.

330
00:16:24,000 --> 00:16:28,000
Now with respect to this you will be able to see that here.

331
00:16:28,000 --> 00:16:31,000
We will be getting some important nodes okay.

332
00:16:31,000 --> 00:16:34,000
And what are the nodes or what are the information that will be getting.

333
00:16:34,000 --> 00:16:37,000
It is nothing but A11.

334
00:16:37,000 --> 00:16:40,000
And let me just go ahead and write this as A12.

335
00:16:41,000 --> 00:16:43,000
And this will be finally A13.

336
00:16:43,000 --> 00:16:46,000
I'm giving some notation over here right now.

337
00:16:46,000 --> 00:16:48,000
See, see this entire architecture.

338
00:16:48,000 --> 00:16:50,000
And imagine over here, right.

339
00:16:50,000 --> 00:16:51,000
Same architecture over here.

340
00:16:51,000 --> 00:16:55,000
We had our bidirectional right LSTM, RNN.

341
00:16:55,000 --> 00:17:01,000
And from this we generated this h1, h2, h3.

342
00:17:01,000 --> 00:17:02,000
Right.

343
00:17:02,000 --> 00:17:04,000
And this is what we have actually generated from it.

344
00:17:04,000 --> 00:17:07,000
And this is how the entire generation is basically happening, right?

345
00:17:08,000 --> 00:17:09,000
Uh, we are taking this hidden.

346
00:17:09,000 --> 00:17:16,000
We are basically taking the hidden output of each and every LSTM, considering both directions,

347
00:17:16,000 --> 00:17:21,000
and, uh, over here, you'll be able to see with respect to this LSTM, we are combining this with

348
00:17:21,000 --> 00:17:22,000
our S0.

349
00:17:22,000 --> 00:17:25,000
Combining basically means together we are giving this to our ANN.

350
00:17:25,000 --> 00:17:29,000
And then finally you will be able to see that we are passing it to the softmax and we are getting this

351
00:17:29,000 --> 00:17:30,000
particular output.

352
00:17:30,000 --> 00:17:30,000
Okay.

353
00:17:30,000 --> 00:17:35,000
Now this this step, right.

354
00:17:35,000 --> 00:17:38,000
This step that you will be able to see over here where I'm combining all these things.

355
00:17:38,000 --> 00:17:42,000
These are basically called as alignment scores, right.

356
00:17:42,000 --> 00:17:43,000
Yes.

357
00:17:43,000 --> 00:17:45,000
We are aligning all the scores okay.

358
00:17:45,000 --> 00:17:52,000
And over here you'll be able to see that this is entirely an ANN, that is, an artificial

359
00:17:52,000 --> 00:17:53,000
neural network.

360
00:17:53,000 --> 00:17:53,000
Okay.

361
00:17:53,000 --> 00:17:56,000
I'll talk about what is the importance of this.

362
00:17:56,000 --> 00:18:01,000
Then in the next step we basically create our attention weights.

363
00:18:01,000 --> 00:18:04,000
And I'll talk about the importance of this attention weights.

364
00:18:04,000 --> 00:18:06,000
This attention weights are very much important.

365
00:18:06,000 --> 00:18:11,000
See, attention weights are all about how much context we really need to give to the decoder in order

366
00:18:11,000 --> 00:18:13,000
to do that specific prediction.

367
00:18:13,000 --> 00:18:18,000
So here what we do is that whatever output we get from H1 right.

368
00:18:18,000 --> 00:18:26,000
This output, this output, we will take this output and we will combine it with A11.

369
00:18:26,000 --> 00:18:30,000
And here we will be doing a point wise multiplication operation okay.

370
00:18:30,000 --> 00:18:34,000
Similarly over here you will be able to see with respect to A12.

371
00:18:34,000 --> 00:18:37,000
We will take this H2 output.

372
00:18:38,000 --> 00:18:40,000
We will take this H2 output.

373
00:18:40,000 --> 00:18:43,000
And we will do a point wise operation with respect to this.

374
00:18:43,000 --> 00:18:47,000
And similarly with H3 output, we will do a point wise operation with respect to this.

375
00:18:47,000 --> 00:18:52,000
And this is what this diagram basically says when we say this point wise operation.

376
00:18:52,000 --> 00:18:52,000
Right.

377
00:18:52,000 --> 00:18:55,000
So here the same thing is basically happening.

378
00:18:55,000 --> 00:19:00,000
Now let me talk about the importance of these A11, A12, A13.

379
00:19:00,000 --> 00:19:01,000
What is the importance of this.

380
00:19:01,000 --> 00:19:07,000
They are basically used in deciding

381
00:19:09,000 --> 00:19:11,000
how much context of H1

382
00:19:11,000 --> 00:19:15,000
I should consider, how much context of H2 I should consider, right?

383
00:19:15,000 --> 00:19:23,000
And how much context of H3 I should consider. A11, A12, A13 are created entirely from this particular

384
00:19:23,000 --> 00:19:29,000
ANN, wherein we combine each of h1, h2, h3 with the decoder hidden state, score each pair, and then pass

385
00:19:29,000 --> 00:19:36,000
the scores to the softmax activation function. The scoring is done by a feed-forward neural network, right, a feed-forward neural

386
00:19:36,000 --> 00:19:37,000
network.

387
00:19:37,000 --> 00:19:43,000
So here, according to the research paper, we are just deciding how much context you really

388
00:19:43,000 --> 00:19:45,000
need to pass from one stage to the other stage.

389
00:19:45,000 --> 00:19:46,000
Okay.

390
00:19:46,000 --> 00:19:53,000
Now, if you are clear with this, what we will finally do is combine all these things.

391
00:19:53,000 --> 00:20:01,000
We will take all these values, and then we will again do a pointwise operation over here, this time

392
00:20:01,000 --> 00:20:01,000
an addition, the plus.

393
00:20:01,000 --> 00:20:02,000
Okay.

394
00:20:02,000 --> 00:20:08,000
And here we are going to compute our context vector.

395
00:20:09,000 --> 00:20:11,000
We are going to compute our context vector.

396
00:20:11,000 --> 00:20:13,000
And the context vector formula is very simple.

397
00:20:13,000 --> 00:20:19,000
It is nothing but $c_t = \sum_{i=1}^{T} a_{t,i} h_i$, where i runs from 1 to T, okay.

398
00:20:19,000 --> 00:20:27,000
Here $a_{t,i}$ basically means A11, A12, A13 and so on,

399
00:20:27,000 --> 00:20:29,000
and $h_i$ is nothing

400
00:20:29,000 --> 00:20:32,000
but the hidden states; we are multiplying the weights with all the hidden states.

401
00:20:32,000 --> 00:20:33,000
Okay.
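As a minimal sketch of this weighted sum in plain Python: the vectors and weights below are made-up illustrative numbers, not values from the paper, and `context_vector` is a hypothetical helper name.
```python
def context_vector(weights, hidden_states):
    """Return c_t = sum_i a_{t,i} * h_i for one decoder step."""
    dim = len(hidden_states[0])
    c = [0.0] * dim
    for a, h in zip(weights, hidden_states):
        for k in range(dim):
            c[k] += a * h[k]  # scale h_i by its attention weight, then add
    return c
# Three encoder hidden states h1, h2, h3 (2-dimensional, illustrative only).
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
# Attention weights A11, A12, A13; they sum to 1 because they come from a softmax.
A = [0.5, 0.25, 0.25]
c1 = context_vector(A, H)
print(c1)  # prints [0.75, 0.5]
```
Each hidden state contributes in proportion to its weight, which is the "how much context" idea described above.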

402
00:20:33,000 --> 00:20:38,000
So once we get this, this is finally the operation that you will be able to see over here with respect

403
00:20:38,000 --> 00:20:39,000
to the plus, right.

404
00:20:39,000 --> 00:20:40,000
This is what you get.

405
00:20:41,000 --> 00:20:43,000
I know, see, there are a lot of hidden details that you will be able to see.

406
00:20:43,000 --> 00:20:46,000
And this is where, see, your context vector is basically calculated.

407
00:20:46,000 --> 00:20:51,000
And this context vector comes from $a_{ij}$; $a_{ij}$ is nothing but a softmax over the scores, $a_{ij} = \exp(e_{ij}) / \sum_k \exp(e_{ik})$.

408
00:20:51,000 --> 00:20:52,000
Right.

409
00:20:52,000 --> 00:20:54,000
And then you also have this $e_{ij}$.

410
00:20:54,000 --> 00:20:56,000
And that is what I have actually written over here.

411
00:20:56,000 --> 00:21:01,000
What is $e_{ij}$? $e_{ij}$ is nothing but a, sorry, not a multiplication:

412
00:21:01,000 --> 00:21:07,000
it is the alignment function a applied to a pair, $e_{ij} = a(s_{i-1}, h_j)$, right? $s_{i-1}$ is nothing but your previous decoder hidden state, right?

413
00:21:07,000 --> 00:21:14,000
So $s_{i-1}$ is your decoder hidden state, along with $h_j$; $h_j$ is nothing

414
00:21:14,000 --> 00:21:16,000
but each of your encoder hidden states.

415
00:21:16,000 --> 00:21:16,000
Right?
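The scoring and softmax steps can be sketched as follows. This assumes the additive form used in the paper, e_ij = v . tanh(W s_{i-1} + U h_j); the parameter values W, U, v and the states are made-up numbers for illustration (in a real model they are learned), and the helper names are hypothetical.
```python
import math
def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]
def score(s_prev, h_j, W, U, v):
    """e_ij = v . tanh(W s_prev + U h_j): a one-hidden-layer feed-forward scorer."""
    pre = [p + q for p, q in zip(matvec(W, s_prev), matvec(U, h_j))]
    return sum(vk * math.tanh(p) for vk, p in zip(v, pre))
def softmax(scores):
    """Turn alignment scores into attention weights that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(e - m) for e in scores]
    total = sum(exps)
    return [x / total for x in exps]
# Illustrative parameters and states (assumptions, not from the paper).
W = [[0.5, -0.2], [0.1, 0.3]]
U = [[0.4, 0.0], [-0.3, 0.2]]
v = [1.0, -1.0]
s0 = [0.1, 0.2]                            # previous decoder state
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # encoder states h1, h2, h3
e = [score(s0, h, W, U, v) for h in H]     # alignment scores e11, e12, e13
a = softmax(e)                             # attention weights A11, A12, A13
print(a, sum(a))
```
The weights are all positive and add up to 1, so they can be read as the proportion of context taken from each encoder state.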

416
00:21:16,000 --> 00:21:19,000
And this is how, step by step, we have calculated it.

417
00:21:19,000 --> 00:21:26,000
But now it is time to understand what happens in the decoder.

418
00:21:27,000 --> 00:21:36,000
So once you get this plus, once you add everything here, you are going to get your context vector

419
00:21:36,000 --> 00:21:37,000
okay.

420
00:21:37,000 --> 00:21:41,000
Or let me just go ahead and draw it downwards okay.

421
00:21:41,000 --> 00:21:43,000
So I will just go ahead and take this.

422
00:21:43,000 --> 00:21:47,000
And this will be your context vector C right.

423
00:21:47,000 --> 00:21:52,000
Now we need this S0 and C for our first decoder step.

424
00:21:52,000 --> 00:21:52,000
Right.

425
00:21:52,000 --> 00:21:55,000
So in the decoder my first node will be S1.

426
00:21:55,000 --> 00:21:59,000
Let's say this is S1; this S0 is getting passed over here, okay.

427
00:22:00,000 --> 00:22:05,000
Now when it is getting passed we will also have to give our C okay.

428
00:22:05,000 --> 00:22:12,000
Along with C we will also give our y0, right, the y0 value, which will be our ground-truth value.

429
00:22:12,000 --> 00:22:16,000
And then again over here our softmax layer will be used.

430
00:22:17,000 --> 00:22:23,000
And with respect to this softmax I will be passing it over here and getting my Y1 hat.

431
00:22:23,000 --> 00:22:24,000
Okay.

432
00:22:25,000 --> 00:22:28,000
Now here you'll be able to see that how much context we are getting.

433
00:22:28,000 --> 00:22:33,000
See, all this context is basically getting combined in proportion: A11, A12 and

434
00:22:33,000 --> 00:22:33,000
A13

435
00:22:33,000 --> 00:22:36,000
are basically saying how much context I have to take from H1,

436
00:22:36,000 --> 00:22:38,000
how much context I have to take from H2,

437
00:22:38,000 --> 00:22:42,000
and how much context I need to take from H3, right?

438
00:22:42,000 --> 00:22:45,000
And here you will be able to see, with respect to X1, I'm able to get this, right.

439
00:22:45,000 --> 00:22:49,000
Then you will also be able to see Y1 over here.

440
00:22:49,000 --> 00:22:51,000
Now what will happen in the next step?

441
00:22:51,000 --> 00:22:55,000
See in the next step I need to pass this information to S2.

442
00:22:55,000 --> 00:22:57,000
Right, which will be my second decoder step.

443
00:22:57,000 --> 00:23:00,000
But now here what will be the information that will be going.

444
00:23:00,000 --> 00:23:01,000
Think over it okay.

445
00:23:02,000 --> 00:23:04,000
What will be the information that it will be going right.

446
00:23:04,000 --> 00:23:09,000
So first of all I need to see over here.

447
00:23:09,000 --> 00:23:09,000
Very simple.

448
00:23:09,000 --> 00:23:15,000
You'll be able to see I'm getting the information from S0 to S1, right.

449
00:23:15,000 --> 00:23:17,000
Then if I really want to do it for S2.

450
00:23:17,000 --> 00:23:18,000
Right.

451
00:23:18,000 --> 00:23:21,000
First of all again I need to generate this particular context vector.

452
00:23:21,000 --> 00:23:23,000
Now for generating this particular context vector.

453
00:23:23,000 --> 00:23:30,000
What I will do, very simply, is that instead of passing from S0, I will remove the line

454
00:23:30,000 --> 00:23:31,000
from S0.

455
00:23:31,000 --> 00:23:34,000
And I will take this and pass it from S1.

456
00:23:34,000 --> 00:23:38,000
Now this will go and generate a completely new context, right?

457
00:23:38,000 --> 00:23:39,000
New context completely.

458
00:23:39,000 --> 00:23:43,000
And this new context will then pass to my S2.

459
00:23:43,000 --> 00:23:44,000
Right.

460
00:23:44,000 --> 00:23:48,000
And this context that I'm actually going to generate.

461
00:23:48,000 --> 00:23:50,000
It will be my C2 right.

462
00:23:50,000 --> 00:23:51,000
So this was C1.

463
00:23:51,000 --> 00:23:54,000
Now this will be C2 which I'm actually passing.

464
00:23:54,000 --> 00:23:58,000
C1 was getting passed over here to S1, right? Along with this,

465
00:23:58,000 --> 00:24:00,000
I will take this y1.

466
00:24:00,000 --> 00:24:05,000
I will also pass this information to my S2.

467
00:24:05,000 --> 00:24:05,000
Right.

468
00:24:05,000 --> 00:24:07,000
So this will basically be my Y1.

469
00:24:07,000 --> 00:24:14,000
Again I will go over here and calculate my Y2 hat, right? Y1 hat and Y2 hat.

470
00:24:14,000 --> 00:24:19,000
Then again, when it goes to the last time step, which is S3, this will go over here.

471
00:24:19,000 --> 00:24:25,000
But again, what we are basically going to do is that, instead of S1 passing the state information,

472
00:24:25,000 --> 00:24:27,000
we will go ahead and pass it from S2.

473
00:24:28,000 --> 00:24:28,000
Right.

474
00:24:28,000 --> 00:24:31,000
So here we will go ahead and pass from here.

475
00:24:31,000 --> 00:24:33,000
And this will probably go ahead and generate our new context.

476
00:24:33,000 --> 00:24:34,000
C3.

477
00:24:34,000 --> 00:24:37,000
Now this C3 is going to get passed over here.

478
00:24:37,000 --> 00:24:40,000
Along with this I'm also going to pass Y2.

479
00:24:40,000 --> 00:24:45,000
And finally I will be able to get my output Y3 hat right.

480
00:24:45,000 --> 00:24:49,000
So this is the entire process of this entire research paper.
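The decoder walk-through above can be sketched as a loop in which a fresh context is computed at every step from the previous decoder state. This is a toy under stated assumptions: a dot product stands in for the paper's feed-forward scorer, `decoder_step` is a hypothetical stand-in for the real RNN cell, and all numbers are illustrative.
```python
import math
def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [x / total for x in exps]
def attend(s_prev, H):
    """Attention weights and context for one step, scored from the previous state."""
    # Toy scorer (assumption): dot product instead of a feed-forward network.
    e = [sum(sk * hk for sk, hk in zip(s_prev, h)) for h in H]
    a = softmax(e)
    c = [sum(a_j * h[k] for a_j, h in zip(a, H)) for k in range(len(H[0]))]
    return a, c
def decoder_step(s_prev, y_prev, c):
    """Toy state update standing in for the decoder RNN cell (not a real GRU/LSTM)."""
    return [math.tanh(sp + y_prev + ck) for sp, ck in zip(s_prev, c)]
H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # encoder states h1, h2, h3
s = [0.0, 0.0]                             # S0
y_prev = 0.0                               # y0, a stand-in for the start token
contexts = []
for step in range(1, 4):
    a, c = attend(s, H)             # C1 from S0, C2 from S1, C3 from S2
    s = decoder_step(s, y_prev, c)  # S_step from S_{step-1}, y_{step-1}, C_step
    y_prev = sum(s)                 # toy "prediction" fed into the next step
    contexts.append(c)
    print(step, [round(x, 3) for x in c])
```
Because the state changes after every step, the attention weights and therefore the context differ from step to step, unlike the single fixed context vector of the plain encoder-decoder.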

481
00:24:50,000 --> 00:24:51,000
Isn't it amazing?

482
00:24:51,000 --> 00:24:51,000
Right.

483
00:24:51,000 --> 00:24:58,000
Now this is a beautiful diagram that I put together after exploring multiple research papers.

484
00:24:58,000 --> 00:25:02,000
From them I got the complete idea of how things basically happen.

485
00:25:02,000 --> 00:25:02,000
Right.

486
00:25:02,000 --> 00:25:07,000
And this is really important for every one of you out there who is specifically doing this.

487
00:25:07,000 --> 00:25:12,000
But with respect to this, what is the nice thing that is happening every time?

488
00:25:12,000 --> 00:25:18,000
Whenever the decoder is predicting something, it again goes and calculates new context information,

489
00:25:18,000 --> 00:25:20,000
right, for every time step.

490
00:25:20,000 --> 00:25:25,000
Because of this, we are also able to work well with longer sentences.

491
00:25:25,000 --> 00:25:25,000
Right?

492
00:25:25,000 --> 00:25:29,000
And this is the most important thing with respect to this particular architecture, which is called the

493
00:25:29,000 --> 00:25:30,000
attention mechanism.

494
00:25:30,000 --> 00:25:31,000
Okay.

495
00:25:31,000 --> 00:25:37,000
Now, uh, one final thing, uh, that I really want to show it to you is that in this research paper,

496
00:25:37,000 --> 00:25:38,000
when we are discussing.

497
00:25:38,000 --> 00:25:42,000
Right, there is also this particular score, that is, the BLEU score.

498
00:25:42,000 --> 00:25:47,000
And here you can see, plotted against sentence length, for RNNsearch-50, the

499
00:25:47,000 --> 00:25:51,000
BLEU scores of the generated translations on the test set with respect to the lengths of the sentences; the

500
00:25:51,000 --> 00:25:54,000
results are on the full test set, which includes sentences having unknown words to the models.

501
00:25:54,000 --> 00:25:57,000
Its BLEU score remained almost constant.

502
00:25:57,000 --> 00:25:58,000
Right.

503
00:25:58,000 --> 00:25:59,000
It did not come down.

504
00:25:59,000 --> 00:26:00,000
Right.

505
00:26:00,000 --> 00:26:02,000
And there is also one more thing over here.

506
00:26:03,000 --> 00:26:07,000
Uh, if you probably see the example with respect to A11, A12, A13.

507
00:26:07,000 --> 00:26:07,000
Right.

508
00:26:07,000 --> 00:26:11,000
So here, for a specific word, which word in the translation it lines up with,

509
00:26:11,000 --> 00:26:13,000
that is what this specific color shows, right.

510
00:26:14,000 --> 00:26:18,000
You can go ahead and look at this particular research paper, but I hope you got an

511
00:26:18,000 --> 00:26:20,000
idea of how this entire thing works.

512
00:26:20,000 --> 00:26:21,000
Right?

513
00:26:21,000 --> 00:26:22,000
I know there are multiple steps.

514
00:26:22,000 --> 00:26:25,000
It is a fairly complicated neural attention mechanism.

515
00:26:25,000 --> 00:26:31,000
But this is the basis of what we will specifically use in transformers, right?

516
00:26:31,000 --> 00:26:36,000
Which is very important, and which is the basis of most generative AI models that we use currently.

517
00:26:36,000 --> 00:26:36,000
Right.

518
00:26:37,000 --> 00:26:39,000
So I hope you got this entire idea again.

519
00:26:39,000 --> 00:26:41,000
Just go and see this particular research paper.

520
00:26:41,000 --> 00:26:43,000
This is, first of all, my bidirectional encoder.

521
00:26:43,000 --> 00:26:50,000
Then I went ahead, and here you can see $e_{ij}$; we can see from here that $e_{ij}$ is nothing but

522
00:26:50,000 --> 00:26:51,000
this.

523
00:26:51,000 --> 00:26:55,000
What we are basically doing is using this $s_0$ and $h_i$: we are passing them through

524
00:26:55,000 --> 00:27:00,000
an ANN, and from the softmax activation function I'm getting A11, A12, A13.

525
00:27:00,000 --> 00:27:04,000
These basically say how much context of H1, H2 and H3 I have to take;

526
00:27:04,000 --> 00:27:06,000
they are my attention weights.

527
00:27:06,000 --> 00:27:06,000
Right?

528
00:27:06,000 --> 00:27:09,000
So, uh, I hope you like this particular video.

529
00:27:09,000 --> 00:27:17,000
Again, in order to show you some good examples, I also have one more website.

530
00:27:17,000 --> 00:27:17,000
Okay.

531
00:27:17,000 --> 00:27:20,000
And I will also give you the link to this particular website.

532
00:27:20,000 --> 00:27:20,000
Okay.

533
00:27:20,000 --> 00:27:24,000
I will just reload this: Introduction to Attention Mechanism.

534
00:27:24,000 --> 00:27:27,000
Let's say this is with respect to our encoder decoder.

535
00:27:27,000 --> 00:27:29,000
So I'm passing this h1 h2 h3.

536
00:27:29,000 --> 00:27:30,000
We go over here.

537
00:27:30,000 --> 00:27:34,000
Then over here you will be able to see that I'm creating one S0 which is my hidden state and context

538
00:27:34,000 --> 00:27:34,000
vector.

539
00:27:34,000 --> 00:27:36,000
And this is basically getting passed.

540
00:27:36,000 --> 00:27:40,000
And this is actually going and generating all the sentence text.

541
00:27:40,000 --> 00:27:45,000
This was basically happening in the sequence-to-sequence encoder-decoder architecture, right? Now,

542
00:27:45,000 --> 00:27:48,000
in the next step, see how we are doing it with attention.

543
00:27:48,000 --> 00:27:51,000
So here is your entire diagram how we are calculating this E three.

544
00:27:51,000 --> 00:27:53,000
This s of zero is going over here.

545
00:27:53,000 --> 00:27:55,000
Then you have the softmax then A11.

546
00:27:55,000 --> 00:27:58,000
Then finally we multiply by the attention weights.

547
00:27:58,000 --> 00:28:00,000
Then we create this weighted sum using the attention weights.

548
00:28:00,000 --> 00:28:02,000
Then we go ahead and create our $c_t$.

549
00:28:02,000 --> 00:28:05,000
That is the context vector, right; for the first time step it is C1.

550
00:28:05,000 --> 00:28:09,000
And after passing this information, we'll go ahead and create C2.

551
00:28:09,000 --> 00:28:14,000
Here you can see S1 is going in when we calculate S2, right over here.

552
00:28:14,000 --> 00:28:17,000
Then here you can see Y2 is coming out.

553
00:28:17,000 --> 00:28:17,000
Right.

554
00:28:17,000 --> 00:28:21,000
So I will give you all this specific information and you can go ahead and check it out.

555
00:28:21,000 --> 00:28:27,000
But again, super thanks to this particular blog, which was published on 12th May 2021:

556
00:28:27,000 --> 00:28:28,000
Introduction to Attention Mechanism.

557
00:28:28,000 --> 00:28:29,000
Right.

558
00:28:29,000 --> 00:28:30,000
So yes, this was it from my side.

559
00:28:30,000 --> 00:28:32,000
I hope you liked this particular video.

560
00:28:32,000 --> 00:28:36,000
I will just go ahead and give you this link quickly somewhere.

561
00:28:36,000 --> 00:28:37,000
I'll paste it.

562
00:28:37,000 --> 00:28:38,000
Right.

563
00:28:38,000 --> 00:28:41,000
And this is the link that you really need to refer to.

564
00:28:41,000 --> 00:28:42,000
Right.

565
00:28:42,000 --> 00:28:46,000
So yes, this was it from my side with respect to attention mechanism.

566
00:28:46,000 --> 00:28:47,000
I hope you liked this particular video.

567
00:28:47,000 --> 00:28:48,000
I will see you in the next video.

568
00:28:48,000 --> 00:28:49,000
Thank you.

569
00:28:49,000 --> 00:28:49,000
Take care.

570
00:28:49,000 --> 00:28:49,000
Bye bye.

