1
00:00:00,000 --> 00:00:01,000
Hello guys.

2
00:00:01,000 --> 00:00:06,000
So in this video we are going to discuss the gated recurrent unit, which is again a variant of the LSTM

3
00:00:06,000 --> 00:00:07,000
RNN.

4
00:00:07,000 --> 00:00:15,000
This, uh, GRU RNN was introduced by Cho et al. in 2014.

5
00:00:15,000 --> 00:00:20,000
Okay, now just imagine, in the 1990s the LSTM RNN was there.

6
00:00:20,000 --> 00:00:23,000
Then we saw some variants in 2000, right?

7
00:00:24,000 --> 00:00:30,000
Uh, now in 2014 you will be able to see that, hey, we have this GRU variant.

8
00:00:30,000 --> 00:00:37,000
Now, since this variant, this GRU, has just come recently in 2014, hardly ten years back.

9
00:00:37,000 --> 00:00:40,000
Right now it is 2024, right?

10
00:00:40,000 --> 00:00:42,000
It has some very important meaning.

11
00:00:42,000 --> 00:00:44,000
And we'll be discussing why we use this GRU.

12
00:00:44,000 --> 00:00:45,000
Okay.

13
00:00:45,000 --> 00:00:48,000
So let me just go ahead and explain some of the things.

14
00:00:48,000 --> 00:00:57,000
So see in LSTM RNN you know that we have a separate cell or memory cell and we have a separate.

15
00:00:57,000 --> 00:00:59,000
So this is for my long term memory.

16
00:01:00,000 --> 00:01:07,000
Long term memory which we or let me just go ahead and draw it below so that you will have sufficient

17
00:01:07,000 --> 00:01:10,000
space to understand this right.

18
00:01:10,000 --> 00:01:12,000
So let us say this is my long term memory.

19
00:01:12,000 --> 00:01:14,000
This is my short term memory, okay.

20
00:01:14,000 --> 00:01:19,000
And I have already shown you we have this forget gate.

21
00:01:19,000 --> 00:01:20,000
We have this input gate.

22
00:01:20,000 --> 00:01:22,000
We have this output gate.

23
00:01:22,000 --> 00:01:22,000
Right.

24
00:01:23,000 --> 00:01:24,000
So I'll just give some notation over here.

25
00:01:24,000 --> 00:01:26,000
This is CT minus one.

26
00:01:26,000 --> 00:01:27,000
This is C of t.

27
00:01:27,000 --> 00:01:28,000
This is HT minus one.

28
00:01:28,000 --> 00:01:29,000
This is h of t.

29
00:01:30,000 --> 00:01:32,000
Then we do some point wise operation.

30
00:01:32,000 --> 00:01:35,000
This will be plus this will be plus over here.

31
00:01:35,000 --> 00:01:42,000
And uh you know let me just go ahead and check it because, uh, it's been see, this was this.

32
00:01:42,000 --> 00:01:42,000
Right?

33
00:01:42,000 --> 00:01:46,000
Then we use the tanh, and then we specifically do this.

34
00:01:46,000 --> 00:01:47,000
I could use this particular image.

35
00:01:47,000 --> 00:01:47,000
Okay.

36
00:01:47,000 --> 00:01:56,000
So here, uh, what I will do, I'll just go ahead and use this tanh and then a point wise operation.

37
00:01:57,000 --> 00:01:59,000
So here you have all the information going right.

38
00:01:59,000 --> 00:02:00,000
Right.

39
00:02:00,000 --> 00:02:02,000
Now understand one thing.

40
00:02:02,000 --> 00:02:10,000
This line is for my long term memory, long term memory.

41
00:02:10,000 --> 00:02:17,000
And this is for my short term memory, because this is my hidden state that is basically mentioned over

42
00:02:17,000 --> 00:02:17,000
here.

43
00:02:17,000 --> 00:02:19,000
So this is my short term memory.

44
00:02:20,000 --> 00:02:23,000
Now, you know, for every gate.

45
00:02:23,000 --> 00:02:25,000
So as I said this is my forget gate.

46
00:02:25,000 --> 00:02:26,000
Right.

47
00:02:26,000 --> 00:02:27,000
So this is my forget gate.

48
00:02:27,000 --> 00:02:30,000
This is my input gate and this is my output gate.

49
00:02:30,000 --> 00:02:33,000
Now, for each of these gates you know that we have some weights.

50
00:02:33,000 --> 00:02:37,000
So here I will go ahead and write w of f, here w of i.

51
00:02:37,000 --> 00:02:40,000
Then here you have w of o.

52
00:02:40,000 --> 00:02:41,000
Okay.

53
00:02:41,000 --> 00:02:47,000
So at the end of the day with respect to this particular gates, this LSTM RNN variant, you will be

54
00:02:47,000 --> 00:02:57,000
seeing that one of the major problems is that this LSTM RNN architecture is quite complex, right?

55
00:02:57,000 --> 00:02:58,000
Why?

56
00:02:58,000 --> 00:03:01,000
It is quite complex because here we are using three gates.

57
00:03:01,000 --> 00:03:03,000
First of all right.

58
00:03:03,000 --> 00:03:07,000
Here you'll be able to see that we'll be using this forget gate.

59
00:03:07,000 --> 00:03:09,000
We'll be using this input gate.

60
00:03:09,000 --> 00:03:14,000
And obviously along with this we also have this candidate memory.

61
00:03:17,000 --> 00:03:19,000
And we have this output gate.

62
00:03:20,000 --> 00:03:25,000
Now, when we have this many number of gates, you know that we also have three important weights.

63
00:03:25,000 --> 00:03:31,000
One is w of f, one is w of H and one is w of o. This many number of weights.

64
00:03:31,000 --> 00:03:31,000
Right.

65
00:03:31,000 --> 00:03:35,000
Sorry I'll just go ahead and write this weights again.

66
00:03:35,000 --> 00:03:40,000
So I have w of f, w of i, and I have w of

67
00:03:40,000 --> 00:03:41,000
o, right.

68
00:03:41,000 --> 00:03:48,000
And even if I consider that in my input I have 100 dimensions, and in the hidden layer I have, uh,

69
00:03:48,000 --> 00:03:49,000
100 hidden nodes.

70
00:03:49,000 --> 00:03:50,000
Right.

71
00:03:50,000 --> 00:03:52,000
So these weights will keep on increasing.

72
00:03:52,000 --> 00:03:55,000
So whenever this weights and obviously bias will also be there.

73
00:03:56,000 --> 00:03:59,000
Now what do we consider this weights and bias.

74
00:03:59,000 --> 00:04:00,000
This weights and bias are nothing.

75
00:04:00,000 --> 00:04:03,000
But these are called as trainable parameters.

76
00:04:05,000 --> 00:04:06,000
Trainable parameters.

77
00:04:07,000 --> 00:04:12,000
Right now, because of this complex architecture, you will be seeing that this trainable parameters

78
00:04:12,000 --> 00:04:14,000
will keep on increasing.

79
00:04:14,000 --> 00:04:15,000
Right.

80
00:04:15,000 --> 00:04:18,000
Because that many number of parameters is specifically required.

81
00:04:18,000 --> 00:04:24,000
And whenever trainable parameter increases, uh, obviously the training time will also be increasing.

82
00:04:25,000 --> 00:04:31,000
And because and this is basically happening just because of the complex architecture, we have so many

83
00:04:31,000 --> 00:04:35,000
number of gates and we are we are having so many number of weights.

84
00:04:35,000 --> 00:04:39,000
So we are having so many number of bias and forward and backward propagation when we keep on doing with

85
00:04:39,000 --> 00:04:40,000
respect to time.

86
00:04:41,000 --> 00:04:41,000
Right.

87
00:04:42,000 --> 00:04:45,000
So see LSTM, RNN does a good job in solving long term dependency.

88
00:04:45,000 --> 00:04:46,000
That is very much clear.

89
00:04:46,000 --> 00:04:49,000
But still the architecture is very complex.

90
00:04:49,000 --> 00:04:53,000
And because of this you will be having more trainable parameters when you have more trainable parameters,

91
00:04:53,000 --> 00:04:55,000
you have more training time.
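
To put rough numbers on this, here is a minimal sketch that counts trainable parameters for the 100-dimensional input and 100 hidden nodes mentioned above. It assumes the textbook formulation where each gate (and the candidate memory) is one weight block over the concatenated hidden state and input plus one bias vector; the function name `gated_rnn_params` is made up for illustration, and real libraries may add extra bias terms, so their counts can differ slightly.

```python
def gated_rnn_params(input_dim, hidden_dim, num_blocks):
    """Count weights and biases, assuming each block maps
    [h_{t-1}, x_t] (hidden_dim + input_dim values) to hidden_dim outputs."""
    weights_per_block = hidden_dim * (hidden_dim + input_dim)
    biases_per_block = hidden_dim
    return num_blocks * (weights_per_block + biases_per_block)

# LSTM: forget gate, input gate, candidate memory, output gate -> 4 blocks
lstm_params = gated_rnn_params(100, 100, num_blocks=4)

# GRU (discussed next): update gate, reset gate, candidate hidden state -> 3 blocks
gru_params = gated_rnn_params(100, 100, num_blocks=3)

print(lstm_params, gru_params)  # 80400 60300
```

Under these assumptions the GRU layer has three quarters of the parameters of the LSTM layer of the same size.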

92
00:04:55,000 --> 00:04:58,000
Right now, what researchers do.

93
00:04:58,000 --> 00:05:01,000
Uh, what the researchers Cho et al. did in 2014.

94
00:05:01,000 --> 00:05:03,000
I hope I'm pronouncing it right, guys.

95
00:05:03,000 --> 00:05:06,000
Uh, if I'm not pronouncing it right, I apologize.

96
00:05:06,000 --> 00:05:06,000
Right.

97
00:05:06,000 --> 00:05:10,000
I, uh, sorry for that.

98
00:05:10,000 --> 00:05:10,000
Okay.

99
00:05:11,000 --> 00:05:19,000
So what is specifically done in GRU is that instead of using two memories, like long term memory and short

100
00:05:19,000 --> 00:05:27,000
term memory here, what it has been specifically done is that, hey, we will see again for handling

101
00:05:27,000 --> 00:05:29,000
this long term and short term memory.

102
00:05:29,000 --> 00:05:31,000
There will be different different parameters that will be assigned.

103
00:05:31,000 --> 00:05:34,000
More operations will be specifically done.

104
00:05:34,000 --> 00:05:37,000
You know what they have actually done in GRU.

105
00:05:37,000 --> 00:05:40,000
They have combined these two memory cells.

106
00:05:40,000 --> 00:05:44,000
See this is just one memory cell which is representing hidden state.

107
00:05:44,000 --> 00:05:47,000
And here there is no long term memory cell.

108
00:05:47,000 --> 00:05:48,000
Right.

109
00:05:48,000 --> 00:05:58,000
This cell, this cell that you will be seeing, it acts as both long term and short term.

110
00:06:00,000 --> 00:06:03,000
It acts as both okay.

111
00:06:03,000 --> 00:06:05,000
And from this only we pass this information.

112
00:06:05,000 --> 00:06:07,000
And this is my input gate over here.

113
00:06:07,000 --> 00:06:09,000
This is my.

114
00:06:09,000 --> 00:06:11,000
You can you can see over here, right.

115
00:06:11,000 --> 00:06:13,000
Uh, this I'll not say input gate.

116
00:06:13,000 --> 00:06:15,000
So see this is basically called as.

117
00:06:15,000 --> 00:06:18,000
And here you also need to understand with respect to the gates okay.

118
00:06:18,000 --> 00:06:24,000
So Z of T over here I will denote it over here.

119
00:06:25,000 --> 00:06:29,000
Like how we had this forget gate input gate and output gate here.

120
00:06:29,000 --> 00:06:34,000
Uh, when I consider this architecture here, you'll be seeing that we first of all find out Z of t.

121
00:06:34,000 --> 00:06:35,000
So z of t is nothing.

122
00:06:35,000 --> 00:06:37,000
But we take this h t minus one.

123
00:06:37,000 --> 00:06:40,000
We take this x of t, and then we pass it to the sigmoid activation function.

124
00:06:40,000 --> 00:06:44,000
And here we go ahead and apply this w of z weights.

125
00:06:44,000 --> 00:06:44,000
Right.

126
00:06:44,000 --> 00:06:49,000
So when z of t is basically calculated this z of t is nothing.

127
00:06:49,000 --> 00:06:53,000
But it is basically called as update gate.

128
00:06:54,000 --> 00:06:58,000
Okay, let me just go ahead and write this information.

129
00:06:58,000 --> 00:07:00,000
This is called as update gate.

130
00:07:03,000 --> 00:07:05,000
Update gate.

131
00:07:06,000 --> 00:07:07,000
Okay.

132
00:07:08,000 --> 00:07:12,000
Now when we go to the next one here you can see that it is r of t.

133
00:07:12,000 --> 00:07:14,000
How do we calculate r of t.

134
00:07:14,000 --> 00:07:16,000
We take x of t and h t minus one.

135
00:07:16,000 --> 00:07:22,000
And then here we are assigning another weight, another weight over here.

136
00:07:22,000 --> 00:07:24,000
And this weight will be nothing.

137
00:07:24,000 --> 00:07:32,000
But it will be, uh, w of r, right? Now this value that we are computing, r of t, this is basically called

138
00:07:32,000 --> 00:07:35,000
as reset gate.

139
00:07:36,000 --> 00:07:37,000
Very much important.

140
00:07:37,000 --> 00:07:41,000
This is basically called as reset gate okay.

141
00:07:41,000 --> 00:07:42,000
Now reset gate.

142
00:07:42,000 --> 00:07:43,000
What is the functionality?

143
00:07:43,000 --> 00:07:45,000
I'll give you a brief functionality and information.

144
00:07:45,000 --> 00:07:49,000
What this specifically does we will be getting to know all about that okay.

145
00:07:49,000 --> 00:07:57,000
And then, uh, the final one that you have is nothing but h hat of t, a temporary hidden state.

146
00:07:57,000 --> 00:08:00,000
I can basically say this is a temporary hidden state.

147
00:08:00,000 --> 00:08:02,000
I'll just name it.

148
00:08:02,000 --> 00:08:02,000
Okay?

149
00:08:02,000 --> 00:08:03,000
Temporary.

150
00:08:06,000 --> 00:08:08,000
Hidden state.

151
00:08:08,000 --> 00:08:10,000
So let's do one thing.

152
00:08:10,000 --> 00:08:16,000
First of all, let's understand how this operation of R of T is there, Z of T is there and h of T is

153
00:08:16,000 --> 00:08:18,000
there, h hat of t

154
00:08:18,000 --> 00:08:19,000
is there, h hat.

155
00:08:19,000 --> 00:08:20,000
Right.

156
00:08:20,000 --> 00:08:22,000
I'll say h this upper symbol.

157
00:08:22,000 --> 00:08:26,000
Okay I'll give this kind of notation over here which I'm writing it as temporary hidden state.

158
00:08:26,000 --> 00:08:31,000
So first of all, uh, everybody is clear with Z of T, because here it is very much simple.

159
00:08:31,000 --> 00:08:32,000
We are combining h t minus one.

160
00:08:32,000 --> 00:08:34,000
We are combining x of T.

161
00:08:34,000 --> 00:08:39,000
We are passing it to this, uh, sigmoid activation function with uh sorry.

162
00:08:39,000 --> 00:08:39,000
Over here.

163
00:08:39,000 --> 00:08:40,000
This one.

164
00:08:40,000 --> 00:08:40,000
Right.

165
00:08:40,000 --> 00:08:48,000
We are passing it to the sigmoid activation function with z okay z over here z uh uh z of t.

166
00:08:48,000 --> 00:08:50,000
We are basically getting the output here.

167
00:08:50,000 --> 00:08:53,000
We are going to apply a weight which is called as w of z.

168
00:08:54,000 --> 00:08:57,000
So let me just go ahead and write w of z, okay.

169
00:08:57,000 --> 00:08:59,000
So this is one.

170
00:08:59,000 --> 00:09:02,000
Now let's say r of t r of t which is nothing but reset gate.

171
00:09:02,000 --> 00:09:05,000
This is nothing but update gate right now.

172
00:09:05,000 --> 00:09:10,000
When we go to the reset gate, what are we doing? We are taking this w of r, whatever

173
00:09:10,000 --> 00:09:14,000
is the weights over here and we are combining HT minus one and x of t.

174
00:09:14,000 --> 00:09:15,000
So this is also very much simple.

175
00:09:15,000 --> 00:09:17,000
And then we are passing it to the sigmoid activation function.

176
00:09:17,000 --> 00:09:19,000
And finally I get my r of t.

177
00:09:19,000 --> 00:09:27,000
Now when I get my r of t you will be seeing that I am doing a point wise operation over here.

178
00:09:27,000 --> 00:09:29,000
So point wise operation with what?

179
00:09:29,000 --> 00:09:30,000
With h t minus one.

180
00:09:31,000 --> 00:09:31,000
Right.

181
00:09:32,000 --> 00:09:39,000
Once I do this point wise operation with h t minus one, along with x of t, I am passing this entirely

182
00:09:39,000 --> 00:09:45,000
to my tanh activation function, and with the help of an activation function, this is where I'm getting

183
00:09:45,000 --> 00:09:47,000
my temporary.

184
00:09:49,000 --> 00:09:50,000
Hidden state.

185
00:09:51,000 --> 00:09:51,000
Okay.

186
00:09:52,000 --> 00:09:58,000
And then what we do is another point wise operation with z of t, right.

187
00:09:58,000 --> 00:09:59,000
So z of t is also there.

188
00:09:59,000 --> 00:10:07,000
And by doing another point wise operation with z of t I'm updating my final h of t, which is my memory cell

189
00:10:07,000 --> 00:10:10,000
for both long term and short term memory.

190
00:10:10,000 --> 00:10:13,000
So this is the entire operations that is specifically done right.

191
00:10:13,000 --> 00:10:22,000
Understand the dependency right one or the other dependency x of t z of t is used over here along with

192
00:10:22,000 --> 00:10:23,000
h hat of t, right.

193
00:10:23,000 --> 00:10:25,000
So here we are doing dot operation.

194
00:10:25,000 --> 00:10:27,000
So this is my dot operation that we are doing.

195
00:10:27,000 --> 00:10:37,000
And then in order to calculate a h of t I'm also subtracting some information one minus z of t one minus

196
00:10:37,000 --> 00:10:37,000
z of t.

197
00:10:37,000 --> 00:10:40,000
And I'm doing dot operation with h t minus one.

198
00:10:40,000 --> 00:10:42,000
So that is what you are able to see over here.

199
00:10:42,000 --> 00:10:42,000
Okay.
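
Written out as equations, the steps just described can be sketched as follows. This is a reconstruction from the spoken description, not taken from the slide: biases are left out because the walkthrough does not mention them, $\sigma$ is the sigmoid, $\odot$ is the point wise multiplication, $[\cdot,\cdot]$ is the combining (concatenation) of the two inputs, and the symbol $W$ for the candidate weights is an assumed name.

```latex
\begin{aligned}
z_t &= \sigma\!\left(W_z\,[h_{t-1},\, x_t]\right) && \text{(update gate)}\\
r_t &= \sigma\!\left(W_r\,[h_{t-1},\, x_t]\right) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\!\left(W\,[r_t \odot h_{t-1},\, x_t]\right) && \text{(temporary / candidate hidden state)}\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t && \text{(new hidden state)}
\end{aligned}
```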

200
00:10:42,000 --> 00:10:46,000
Step by step, let me repeat it again, and we'll try to make it much clearer.

201
00:10:46,000 --> 00:10:49,000
Once I make you understand why this entire steps is done.

202
00:10:49,000 --> 00:10:53,000
So first of all you need to understand Z of T is nothing, but it is called as update gate.

203
00:10:54,000 --> 00:10:56,000
R of T is nothing, but it is called as reset gate.

204
00:10:56,000 --> 00:10:59,000
H mod of T is nothing but temporary hidden state.

205
00:10:59,000 --> 00:11:01,000
So first of all, we go ahead and compute z of t.

206
00:11:01,000 --> 00:11:03,000
How to calculate z of t.

207
00:11:03,000 --> 00:11:06,000
It is nothing, but we take this h t minus one.

208
00:11:06,000 --> 00:11:09,000
We combine x of t, then we pass it to a sigmoid activation function.

209
00:11:10,000 --> 00:11:15,000
Before passing it to the sigmoid activation function, we multiply with weights w of z and finally I

210
00:11:15,000 --> 00:11:15,000
get z of t.

211
00:11:16,000 --> 00:11:22,000
Now this value so z of t will be used where I'll talk about it okay.

212
00:11:22,000 --> 00:11:25,000
So first of all we have computed z of t okay.

213
00:11:25,000 --> 00:11:30,000
Now the next one we are going to compute is r of t. r of t is very simple.

214
00:11:30,000 --> 00:11:34,000
In case of r of t we take h t minus one and x of t, we apply weights.

215
00:11:34,000 --> 00:11:38,000
We multiply by the weights w of r, because this is our neural network, right.

216
00:11:38,000 --> 00:11:39,000
Which we have already discussed.

217
00:11:39,000 --> 00:11:41,000
Then we pass it to the sigmoid activation.

218
00:11:42,000 --> 00:11:42,000
We get r of T.

219
00:11:43,000 --> 00:11:43,000
Okay.

220
00:11:43,000 --> 00:11:49,000
Then what do we do in order to compute this h dash of t, right?

221
00:11:49,000 --> 00:11:53,000
We take this h t minus one, we take this r of t we do point wise operation.

222
00:11:53,000 --> 00:11:55,000
So that is where this point wise operation is done.

223
00:11:55,000 --> 00:11:57,000
Then we take x of t.

224
00:11:57,000 --> 00:11:59,000
We combine both of this input.

225
00:12:00,000 --> 00:12:05,000
Then we multiply by what we multiply by weights which weights.

226
00:12:05,000 --> 00:12:07,000
So that will be w weights over here.

227
00:12:07,000 --> 00:12:09,000
Then we pass it to the tanh.

228
00:12:10,000 --> 00:12:13,000
Then I get h dash of t which is my temporary hidden state.

229
00:12:14,000 --> 00:12:18,000
Now what I do, I take the output from my update gate.

230
00:12:18,000 --> 00:12:21,000
I take the output from this temporary hidden state.

231
00:12:21,000 --> 00:12:26,000
I do a point wise operation and then I send it over here.

232
00:12:26,000 --> 00:12:32,000
And then we do a point wise plus operation with HT minus one and get the memory back before that.

233
00:12:32,000 --> 00:12:38,000
Also, from the update gate, I compute one minus z of t.

234
00:12:38,000 --> 00:12:43,000
So from this particular information, and then do a point wise operation and send it to this particular

235
00:12:43,000 --> 00:12:44,000
information.

236
00:12:44,000 --> 00:12:44,000
Right.

237
00:12:44,000 --> 00:12:52,000
So finally in order to get h of t I do this two important operation right now let's go ahead and understand

238
00:12:52,000 --> 00:12:56,000
what does update gate, reset gate and temporary hidden state actually do.

239
00:12:56,000 --> 00:12:56,000
Okay.
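
The step-by-step description above can be sketched as one GRU time step in plain Python. This is an illustrative sketch under stated assumptions, not the lecturer's code: biases are omitted because the walkthrough does not mention them, the helper names (`matvec`, `gru_step`, `rand_matrix`), the weight name `W_h`, the random weights, and the example input `x_t` are all made up for demonstration.

```python
import math
import random

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def tanh(v):
    return [math.tanh(a) for a in v]

def matvec(W, v):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * a for w, a in zip(row, v)) for row in W]

def gru_step(h_prev, x_t, W_z, W_r, W_h):
    concat = h_prev + x_t                            # combine h_{t-1} and x_t
    z_t = sigmoid(matvec(W_z, concat))               # update gate
    r_t = sigmoid(matvec(W_r, concat))               # reset gate
    reset_h = [r * h for r, h in zip(r_t, h_prev)]   # point wise r_t * h_{t-1}
    h_cand = tanh(matvec(W_h, reset_h + x_t))        # temporary hidden state
    # New hidden state: (1 - z_t) * h_{t-1} + z_t * h_cand, element by element
    return [(1 - z) * h + z * c for z, h, c in zip(z_t, h_prev, h_cand)]

def rand_matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

random.seed(0)
hidden_dim, input_dim = 4, 3
W_z, W_r, W_h = (rand_matrix(hidden_dim, hidden_dim + input_dim) for _ in range(3))

h_prev = [0.6, 0.5, 0.3, 0.9]
x_t = [1.0, 0.0, -1.0]
h_t = gru_step(h_prev, x_t, W_z, W_r, W_h)
print(h_t)
```

Note that only one state vector, `h_t`, is carried to the next time step; there is no separate cell state as in the LSTM.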

240
00:12:56,000 --> 00:12:58,000
So let's go ahead.

241
00:12:58,000 --> 00:13:05,000
So guys now let's go ahead and discuss more information about this update gate reset gate and temporary

242
00:13:05,000 --> 00:13:09,000
hidden state, which we also call the candidate hidden state.

243
00:13:09,000 --> 00:13:12,000
All the notation is specifically given over here okay.

244
00:13:13,000 --> 00:13:20,000
Now what is the importance of this reset gate?

245
00:13:21,000 --> 00:13:21,000
Okay.

246
00:13:22,000 --> 00:13:27,000
Here I'm going to basically denote by r of T which is basically given over here r of T.

247
00:13:27,000 --> 00:13:32,000
And you know r of t how it is calculated sigmoid of w r.

248
00:13:32,000 --> 00:13:35,000
And we are combining ht minus one and x of t.

249
00:13:35,000 --> 00:13:38,000
And we are passing it to the sigmoid activation function.

250
00:13:38,000 --> 00:13:42,000
And then we specifically get r of t. Now, r of t.

251
00:13:42,000 --> 00:13:44,000
We basically say it as a reset gate.

252
00:13:44,000 --> 00:13:51,000
And its main responsibility is let's say for my HT minus one line.

253
00:13:51,000 --> 00:13:51,000
Right.

254
00:13:51,000 --> 00:13:54,000
Which is my current memory cell okay.

255
00:13:55,000 --> 00:14:00,000
I want to see this R of T as I said.

256
00:14:00,000 --> 00:14:00,000
Right.

257
00:14:00,000 --> 00:14:04,000
Let's say I will just go ahead and write over here.

258
00:14:04,000 --> 00:14:07,000
Let's say I'm getting the R of T after this sigmoid.

259
00:14:07,000 --> 00:14:08,000
Right.

260
00:14:08,000 --> 00:14:13,000
And for this I'm passing my h t minus one and x of t.

261
00:14:14,000 --> 00:14:15,000
Right.

262
00:14:15,000 --> 00:14:21,000
Now once I give this specific output, once I get this particular output from this sigmoid right, you

263
00:14:21,000 --> 00:14:31,000
will be seeing that this reset gate will be responsible for resetting some information.

264
00:14:33,000 --> 00:14:47,000
Resetting some information from h t minus one, which is nothing but our memory cell.

265
00:14:47,000 --> 00:14:53,000
And when I say memory cell, this is the combination of long term memory and short term memory.

266
00:14:53,000 --> 00:14:59,000
Okay, so this is what R of T will specifically do.

267
00:14:59,000 --> 00:15:05,000
So if you go ahead and see over here how we are calculating R of T, it is nothing but sigmoid of w

268
00:15:05,000 --> 00:15:12,000
of r with uh w of r multiplied by this h t minus one comma x of t.

269
00:15:12,000 --> 00:15:12,000
Right.

270
00:15:12,000 --> 00:15:14,000
We are combining both these things.

271
00:15:14,000 --> 00:15:21,000
Now, the main aim of this reset gate, why I'm telling you: once we do this calculation here, you can see

272
00:15:21,000 --> 00:15:28,000
that I am taking this h t minus one and we are combining, or we are doing a, um, point wise operation, point

273
00:15:28,000 --> 00:15:29,000
wise multiplication operation.

274
00:15:29,000 --> 00:15:37,000
See, whenever you see point wise multiplication operation, always consider this as resetting some

275
00:15:37,000 --> 00:15:39,000
information or forgetting some information.

276
00:15:39,000 --> 00:15:42,000
Now what does this resetting some information basically mean?

277
00:15:42,000 --> 00:15:43,000
Let's say from h t minus one.

278
00:15:43,000 --> 00:15:48,000
If I just go ahead and consider a four-dimensional vector.

279
00:15:48,000 --> 00:15:49,000
Right.

280
00:15:49,000 --> 00:15:51,000
So let's consider that this is my h t minus one.

281
00:15:51,000 --> 00:15:58,000
And here I have values like 0.6, 0.5, 0.3 and 0.9, okay.

282
00:15:58,000 --> 00:16:00,000
Now I will go ahead and calculate my r of t.

283
00:16:00,000 --> 00:16:04,000
Now r of t may be some values like this.

284
00:16:04,000 --> 00:16:06,000
It can be 0.2.

285
00:16:06,000 --> 00:16:13,000
Let's say that we are just going to calculate it: 0.4, 0.8 and, uh, 0.2, okay.

286
00:16:13,000 --> 00:16:21,000
Now this indicates when we do this when we do this point wise operation.

287
00:16:21,000 --> 00:16:23,000
In short we are just going to multiply this.

288
00:16:23,000 --> 00:16:23,000
Right.

289
00:16:23,000 --> 00:16:26,000
We are going to multiply this when we are going to multiply it.

290
00:16:26,000 --> 00:16:32,000
Understand one very important thing over here. When we multiply these two values, what we are specifically

291
00:16:32,000 --> 00:16:38,000
saying that, hey, there is some 0.6 value and we are just multiplying with the 20% of this value.

292
00:16:38,000 --> 00:16:39,000
Yes.

293
00:16:39,000 --> 00:16:40,000
Right.

294
00:16:40,000 --> 00:16:45,000
So once we probably multiply it I will be getting something like this 0.12 okay.

295
00:16:45,000 --> 00:16:50,000
So here you can see that from this point six I'm just taking 20% I'm resetting this particular value.

296
00:16:50,000 --> 00:16:54,000
I'm, I'm forgetting some information through this reset gate.

297
00:16:54,000 --> 00:16:54,000
Right.

298
00:16:54,000 --> 00:16:59,000
Similarly over here, if I say 0.5 multiplied by 0.4.

299
00:16:59,000 --> 00:17:01,000
So here you will be getting 0.20.

300
00:17:01,000 --> 00:17:04,000
What does this basically indicate from this 50%?

301
00:17:04,000 --> 00:17:05,000
I'm taking only 40%.

302
00:17:05,000 --> 00:17:07,000
I'm resetting by 40%.

303
00:17:07,000 --> 00:17:08,000
Right.

304
00:17:08,000 --> 00:17:12,000
So when I'm resetting by 40% I'm only getting 0.20 over here.

305
00:17:12,000 --> 00:17:15,000
I'm saying, hey, go ahead and reset this 0.3 by 80%.

306
00:17:15,000 --> 00:17:20,000
So this will be nothing, but it will be, uh, 0.24.

307
00:17:20,000 --> 00:17:20,000
Right.

308
00:17:20,000 --> 00:17:22,000
So something like this.

309
00:17:22,000 --> 00:17:26,000
So you can just consider that I am trying to reset some value out of it.

310
00:17:26,000 --> 00:17:27,000
Okay.

311
00:17:27,000 --> 00:17:29,000
And this is how I'm actually able to do it.

312
00:17:29,000 --> 00:17:34,000
You know, now when I'm resetting by 80%, I'm getting this specific value.

313
00:17:34,000 --> 00:17:36,000
If I'm resetting by 40%.

314
00:17:36,000 --> 00:17:43,000
So 40% of 0.5 is nothing but 0.20. When I'm resetting this by 0.2, 20% of 0.6 is nothing but

315
00:17:43,000 --> 00:17:44,000
0.12.

316
00:17:44,000 --> 00:17:46,000
Similarly, what is 20% of 0.9?

317
00:17:46,000 --> 00:17:48,000
You know, so you can go ahead and do the multiplication.

318
00:17:48,000 --> 00:17:49,000
It is 0.18.

319
00:17:49,000 --> 00:17:53,000
So here what I'm saying I'm basically resetting the value.

320
00:17:53,000 --> 00:17:55,000
I'm removing some of the context.

321
00:17:55,000 --> 00:17:58,000
And this resetting is happening based on the context.

322
00:17:58,000 --> 00:18:01,000
I'm saying hey I got my new x of T, let's say.

323
00:18:01,000 --> 00:18:05,000
And based on this x of T I need to reset all this particular value.

324
00:18:05,000 --> 00:18:08,000
And this is what this point wise operation will specifically do.

325
00:18:08,000 --> 00:18:08,000
Okay.
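
The arithmetic in this example can be checked with a few lines of Python. The variable names are illustrative only; the numbers are the ones used above, and the rounding to two decimals is only there to hide floating point noise.

```python
h_prev = [0.6, 0.5, 0.3, 0.9]  # h_{t-1}, the four-dimensional memory vector
r_t = [0.2, 0.4, 0.8, 0.2]     # reset gate output, each value between 0 and 1

# Point wise multiplication: each entry of h_{t-1} is scaled by its gate value.
reset_h = [round(r * h, 2) for r, h in zip(r_t, h_prev)]
print(reset_h)  # [0.12, 0.2, 0.24, 0.18]
```

A gate value close to 0 wipes out most of that entry of the memory, while a value close to 1 keeps most of it.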

326
00:18:09,000 --> 00:18:13,000
Now once we reset it, you'll be seeing that we are sending this information to our tanh.

327
00:18:13,000 --> 00:18:13,000
Right.

328
00:18:13,000 --> 00:18:20,000
So tanh over here, it is sending and we are calculating h hat of t, which is nothing but our temporary

329
00:18:20,000 --> 00:18:20,000
hidden state.

330
00:18:20,000 --> 00:18:23,000
Or we can also say it as candidate hidden state.

331
00:18:23,000 --> 00:18:23,000
Right.

332
00:18:23,000 --> 00:18:26,000
So that is the reason you will be able to see that when we pass this information.

333
00:18:26,000 --> 00:18:34,000
In short, we are just passing the reset value, right, after we perform the reset.

334
00:18:35,000 --> 00:18:37,000
We are passing this to this particular tanh.

335
00:18:37,000 --> 00:18:38,000
Okay.

336
00:18:38,000 --> 00:18:40,000
And we will be getting this h bar of t, okay.

337
00:18:40,000 --> 00:18:42,000
Once I pass this okay.

338
00:18:42,000 --> 00:18:44,000
Now let's go ahead and see.

339
00:18:44,000 --> 00:18:47,000
I think this is very much clear for everyone I guess.

340
00:18:47,000 --> 00:18:50,000
Uh, so the first operation where we calculated R of t.

341
00:18:51,000 --> 00:18:54,000
Now let's go and see, with respect to z of t, how

342
00:18:54,000 --> 00:18:55,000
Z of T is calculated.

343
00:18:55,000 --> 00:18:56,000
So we'll take this h t minus one.

344
00:18:56,000 --> 00:18:57,000
We'll take this x of t.

345
00:18:57,000 --> 00:18:59,000
We'll pass it to sigmoid activation function.

346
00:18:59,000 --> 00:19:02,000
And here another weight is applied w of z.

347
00:19:02,000 --> 00:19:02,000
Okay.

348
00:19:03,000 --> 00:19:05,000
So here we have calculated this.

349
00:19:05,000 --> 00:19:06,000
Now this z of t.

350
00:19:07,000 --> 00:19:11,000
And whatever h bar of t will be coming up.

351
00:19:11,000 --> 00:19:11,000
Right.

352
00:19:11,000 --> 00:19:16,000
Both of these are combined together to get this value.

353
00:19:16,000 --> 00:19:21,000
So guys now let's go ahead and talk about this gate which is called as update gate.

354
00:19:21,000 --> 00:19:21,000
Right.

355
00:19:21,000 --> 00:19:23,000
So update gate is nothing but this gate.

356
00:19:23,000 --> 00:19:27,000
And here you have basically the temporary hidden state, okay.

357
00:19:27,000 --> 00:19:29,000
Now see guys, uh, it is very much simple.

358
00:19:29,000 --> 00:19:35,000
First of all you need to understand from h t minus one I am able to get h t hat.

359
00:19:35,000 --> 00:19:38,000
And then from this we are getting h t okay.

360
00:19:38,000 --> 00:19:45,000
If I combine combine x of T, right?

361
00:19:45,000 --> 00:19:47,000
If I combine both of them right.

362
00:19:47,000 --> 00:19:48,000
Let's let's go ahead and combine this.

363
00:19:48,000 --> 00:19:49,000
Both of them.

364
00:19:49,000 --> 00:19:59,000
And uh, if I, if I combine this and if I pass it to a sigmoid activation function with some weights,

365
00:19:59,000 --> 00:20:04,000
I will be getting, first of all, which gate over here I'll be getting r of T right.

366
00:20:04,000 --> 00:20:05,000
R of T is my reset gate.

367
00:20:06,000 --> 00:20:12,000
And with respect to this reset gate, uh, further, you'll be seeing that once I get this r of T,

368
00:20:13,000 --> 00:20:13,000
right.

369
00:20:13,000 --> 00:20:18,000
Once I get this r of t, I'm doing a point wise operation.

370
00:20:18,000 --> 00:20:25,000
So here you can see I'm doing a point wise operation again with this.

371
00:20:25,000 --> 00:20:27,000
I'm passing this entire information.

372
00:20:28,000 --> 00:20:35,000
Passing this entire information to my tanh to get this value.

373
00:20:35,000 --> 00:20:36,000
Okay.

374
00:20:36,000 --> 00:20:41,000
But here you should also remember here I will further pass this x of t also.

375
00:20:42,000 --> 00:20:44,000
Okay x of t over here.

376
00:20:44,000 --> 00:20:49,000
So once I do this operation, that is what is happening, right: h t minus one with this operation.

377
00:20:49,000 --> 00:20:53,000
And here you can see x of t is passing and we are passing to the tanh neural network.

378
00:20:53,000 --> 00:20:58,000
So a hidden neural network layer where the activation function is tanh, okay.

379
00:20:58,000 --> 00:21:00,000
And here I'm actually able to get h hat of t.
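A minimal Python sketch of the two steps just described, the reset gate r of t and the candidate h hat of t. The names W_r, b_r, W_h, b_h, the helper functions and all the numbers are made up for illustration; they are not from the video.
```python
import math
def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))
def dense(W, b, v, act):
    # One hidden layer on plain lists: act(W @ v + b).
    return [act(sum(w * x for w, x in zip(row, v)) + bi) for row, bi in zip(W, b)]
def reset_and_candidate(h_prev, x_t, W_r, b_r, W_h, b_h):
    r_t = dense(W_r, b_r, h_prev + x_t, sigmoid)      # reset gate from [h_{t-1}, x_t]
    gated = [r * h for r, h in zip(r_t, h_prev)]      # point wise r_t * h_{t-1}
    h_hat = dense(W_h, b_h, gated + x_t, math.tanh)   # candidate from [r_t * h_{t-1}, x_t]
    return r_t, h_hat
# Made-up toy values: hidden size 2, input size 1.
h_prev, x_t = [0.4, -0.2], [1.0]
W_r, b_r = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]], [0.0, 0.0]   # zero weights give r_t = 0.5
W_h, b_h = [[0.5, -0.5, 1.0], [1.0, 0.5, -1.0]], [0.0, 0.0]
r_t, h_hat = reset_and_candidate(h_prev, x_t, W_r, b_r, W_h, b_h)
print(r_t, h_hat)
```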

380
00:21:00,000 --> 00:21:02,000
So we basically have to do all these particular operations.

381
00:21:02,000 --> 00:21:06,000
Now this is very important from h t minus one to h t.

382
00:21:06,000 --> 00:21:08,000
We are making a transition.

383
00:21:08,000 --> 00:21:13,000
And this gate uh that we will specifically say uh I'll talk about that also.

384
00:21:13,000 --> 00:21:19,000
Now, from h hat of t to come to h t, this is the most important step.

385
00:21:19,000 --> 00:21:20,000
Okay.

386
00:21:21,000 --> 00:21:26,000
And for this step, what we do is that we combine both of these inputs.

387
00:21:26,000 --> 00:21:30,000
Then we pass it to the sigmoid activation function.

388
00:21:30,000 --> 00:21:37,000
And a sigmoid activation function basically means here I have a hidden layer, okay, which will have a

389
00:21:37,000 --> 00:21:38,000
sigmoid activation function.

390
00:21:39,000 --> 00:21:44,000
Then I get the output over here. This output is nothing but z of t, right.

391
00:21:44,000 --> 00:21:48,000
So here I'm just going to go ahead and write z of t okay.
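A small Python sketch of this update gate step, assuming one sigmoid layer over the combined h t minus one and x of t. W_z, b_z and the toy numbers are hypothetical, only there to make it runnable.
```python
import math
def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))
def update_gate(h_prev, x_t, W_z, b_z):
    combined = h_prev + x_t   # combine h_{t-1} and x_t into one vector
    # Sigmoid hidden layer: every entry of z_t ends up between 0 and 1.
    return [sigmoid(sum(w * v for w, v in zip(row, combined)) + b) for row, b in zip(W_z, b_z)]
# Made-up toy values: hidden size 2, input size 1.
h_prev, x_t = [0.5, -0.5], [1.0]
W_z, b_z = [[0.1, 0.2, 0.3], [-0.3, 0.2, 0.1]], [0.0, 0.0]
z_t = update_gate(h_prev, x_t, W_z, b_z)
print(z_t)
```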

392
00:21:48,000 --> 00:21:54,000
Now this z of t, along with the output h hat of t.

393
00:21:54,000 --> 00:21:56,000
We will do this point wise operation.

394
00:21:57,000 --> 00:22:01,000
Now this point wise operation that we are specifically doing over here.

395
00:22:01,000 --> 00:22:01,000
Right.

396
00:22:01,000 --> 00:22:06,000
It is between an update gate and a temporary hidden state.

397
00:22:06,000 --> 00:22:08,000
My final state is H of T, right?

398
00:22:09,000 --> 00:22:27,000
Update gate basically says what context information needs to be added, but this

399
00:22:27,000 --> 00:22:32,000
will be completely dependent on my candidate

400
00:22:32,000 --> 00:22:33,000
hidden state.

401
00:22:36,000 --> 00:22:37,000
Okay.

402
00:22:37,000 --> 00:22:43,000
And candidate hidden state basically talks about the current context.

403
00:22:43,000 --> 00:22:54,000
I will say, hey, if the current context is important, then I will add this information, with less information

404
00:22:55,000 --> 00:22:58,000
from h t minus one, as decided by z of t.

405
00:22:59,000 --> 00:23:06,000
So I will take less info from this and I'll add more info from the current candidate

406
00:23:06,000 --> 00:23:07,000
hidden state.

407
00:23:07,000 --> 00:23:07,000
Okay.

408
00:23:08,000 --> 00:23:13,000
And that is the reason you will be able to see over here my H of T, which is my final h of T, which

409
00:23:13,000 --> 00:23:15,000
I'm adding it up.

410
00:23:15,000 --> 00:23:21,000
It is nothing but one minus z of t multiplied by h t minus one. See, one minus z of t basically means what?

411
00:23:21,000 --> 00:23:25,000
First of all, I'm giving this input on the top memory cell.

412
00:23:25,000 --> 00:23:25,000
Right.

413
00:23:25,000 --> 00:23:32,000
So here you will be able to see that I am coupling, just like with the forget gate, this gate with the temporary hidden state.

414
00:23:32,000 --> 00:23:34,000
That is what we saw in this variant, right?

415
00:23:34,000 --> 00:23:37,000
If you remember coupling forget and input gates here.

416
00:23:37,000 --> 00:23:41,000
How we coupled our two gates.

417
00:23:41,000 --> 00:23:48,000
Similarly, here we are coupling this particular update gate with the temporary hidden

418
00:23:48,000 --> 00:23:49,000
state.

419
00:23:49,000 --> 00:23:50,000
And the operation is the same thing.

420
00:23:50,000 --> 00:23:51,000
Right.

421
00:23:51,000 --> 00:23:56,000
So what I'm actually doing over here: one minus f of t. See, one minus f of t is further getting multiplied

422
00:23:56,000 --> 00:23:58,000
by c of t, okay.

423
00:23:58,000 --> 00:24:00,000
And sorry, c hat of t over here.

424
00:24:00,000 --> 00:24:05,000
And uh this f of t is basically getting, uh, multiplied like a pointwise operation with c t minus

425
00:24:05,000 --> 00:24:05,000
one.

426
00:24:05,000 --> 00:24:12,000
Similarly, over here, one minus z of t will basically get applied, uh, as a point wise operation with

427
00:24:12,000 --> 00:24:15,000
h t minus one, plus the remaining information.

428
00:24:15,000 --> 00:24:24,000
Based on this context, whatever context needs to be added right is getting multiplied by h of t, right?

429
00:24:24,000 --> 00:24:25,000
So here sorry h bar of t.

430
00:24:26,000 --> 00:24:31,000
And finally both these values will get added over here. When both the values get added,

431
00:24:31,000 --> 00:24:32,000
I'm getting h of t. Now,

432
00:24:32,000 --> 00:24:34,000
What does this basically mean?

433
00:24:34,000 --> 00:24:36,000
This basically means very much simple right?

434
00:24:36,000 --> 00:24:44,000
So if I'll just take this equation over here one minus z of t multiplied by h t minus one.

435
00:24:44,000 --> 00:24:50,000
That basically means, from this particular information, from this

436
00:24:50,000 --> 00:24:59,000
particular update gate, what information we should include that is completely decided based on this

437
00:24:59,000 --> 00:25:01,000
temporary hidden state.

438
00:25:01,000 --> 00:25:08,000
So in order to understand just this equation here, this equation basically indicates one minus z of

439
00:25:08,000 --> 00:25:10,000
t multiplied by h t minus one, where multiplied

440
00:25:10,000 --> 00:25:16,000
basically means point wise operation, plus z of t multiplied by h dash of t.
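Written out as it is stated in this video, next to the coupled forget and input gate LSTM update it is being compared with. The hat symbols stand for the candidate states, and the circled dot is the point wise operation. As a hedge: some references swap the roles of z of t and one minus z of t, which is the same idea with the gate relabeled.
```latex
\begin{aligned}
\text{Coupled LSTM:}\quad & C_t = f_t \odot C_{t-1} + (1 - f_t) \odot \hat{C}_t \\
\text{GRU:}\quad & h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \hat{h}_t
\end{aligned}
```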

441
00:25:16,000 --> 00:25:22,000
So this basically includes the information of the current context.

442
00:25:22,000 --> 00:25:25,000
Let's say the current sentence that we are getting is very important.

443
00:25:25,000 --> 00:25:26,000
I want to include this.

444
00:25:26,000 --> 00:25:28,000
How much I want to include it.

445
00:25:28,000 --> 00:25:29,000
It will be decided by this value.

446
00:25:30,000 --> 00:25:36,000
And the more context that I include from here, the lesser context I should be including from there. The lesser

447
00:25:36,000 --> 00:25:36,000
context I include from here,

448
00:25:36,000 --> 00:25:39,000
the more context I will be including from there.

449
00:25:39,000 --> 00:25:43,000
So this works in a much more coupled way.

450
00:25:43,000 --> 00:25:45,000
This is what I really want to talk about.

451
00:25:45,000 --> 00:25:46,000
Coupled way.
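A small Python sketch of this coupled behaviour, using the equation as stated in the video: one minus z of t on h t minus one, plus z of t on the candidate. The function name gru_blend and the numbers are made up for illustration.
```python
def gru_blend(z_t, h_prev, h_hat):
    # h_t = (1 - z_t) * h_{t-1} + z_t * h_hat_t, element by element.
    return [(1.0 - z) * hp + z * hh for z, hp, hh in zip(z_t, h_prev, h_hat)]
h_prev = [0.8, -0.4]   # made-up previous hidden state
h_hat = [-0.2, 0.6]    # made-up candidate hidden state
keep_old = gru_blend([0.0, 0.0], h_prev, h_hat)   # z = 0: previous state is carried over
take_new = gru_blend([1.0, 1.0], h_prev, h_hat)   # z = 1: candidate replaces it
half = gru_blend([0.5, 0.5], h_prev, h_hat)       # z = 0.5: equal mix of both
print(keep_old, take_new, half)
```
Because the two weights always add up to one, taking more from the candidate automatically means taking less from the previous state.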

452
00:25:46,000 --> 00:26:02,000
So in short, to update from h t minus one to h t, this memory cell, whenever we need to update, this is exactly

453
00:26:02,000 --> 00:26:10,000
the, uh, you know, this is exactly decided with the combination of update gate.

454
00:26:12,000 --> 00:26:17,000
And your candidate memory gate.

455
00:26:19,000 --> 00:26:23,000
Candidate memory gate.

456
00:26:23,000 --> 00:26:24,000
Okay.

457
00:26:25,000 --> 00:26:26,000
It is decided from both of them.

458
00:26:26,000 --> 00:26:29,000
This indicates h bar of T.

459
00:26:29,000 --> 00:26:31,000
This indicates.

460
00:26:31,000 --> 00:26:38,000
Let me talk about the symbol, that is nothing but z of t, and one more gate that we specifically

461
00:26:38,000 --> 00:26:42,000
had was nothing but our reset gate.

462
00:26:42,000 --> 00:26:48,000
This basically indicates how much information we need to reset from this memory cell.

463
00:26:49,000 --> 00:26:49,000
Right.

464
00:26:49,000 --> 00:26:53,000
So this is the entire context behind GRU.

465
00:26:53,000 --> 00:26:56,000
And already I've spoken about how the training usually occurs in LSTM.

466
00:26:56,000 --> 00:27:00,000
And similarly the training will also occur based on all these equations.

467
00:27:00,000 --> 00:27:04,000
And here there are fewer weights when compared to LSTM RNN.
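A rough Python count of that difference, assuming one weight matrix over the combined h t minus one and x of t plus one bias for each sigmoid or tanh layer: three such layers in a GRU (reset, update, candidate) and four in an LSTM (forget, input, output, candidate). The sizes 128 and 64 are made up, and real libraries may count extra bias terms.
```python
def layer_params(hidden, inputs):
    # Weights over [h_{t-1}, x_t] plus one bias per hidden unit.
    return hidden * (hidden + inputs) + hidden
def gru_params(hidden, inputs):
    return 3 * layer_params(hidden, inputs)   # reset, update, candidate
def lstm_params(hidden, inputs):
    return 4 * layer_params(hidden, inputs)   # forget, input, output, candidate
print(gru_params(128, 64), lstm_params(128, 64))   # 74112 98816
```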

468
00:27:04,000 --> 00:27:12,000
So with this architecture over here, the training also, in terms of optimization, is better when compared

469
00:27:12,000 --> 00:27:13,000
to the LSTM RNN.

470
00:27:13,000 --> 00:27:14,000
So yes, this was it for my side.

471
00:27:14,000 --> 00:27:16,000
I hope you liked this particular video.

472
00:27:16,000 --> 00:27:18,000
I will see you all in the next video.

473
00:27:18,000 --> 00:27:18,000
Thank you.

