1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:03,000
So we are going to continue the discussion with respect to RNN.

3
00:00:03,000 --> 00:00:08,000
And in this video we are going to see some variants of the LSTM RNN.

4
00:00:08,000 --> 00:00:12,000
Now uh we have already seen the entire LSTM architecture.

5
00:00:12,000 --> 00:00:18,000
But in some research papers you will also find this kind of architecture that I have actually

6
00:00:18,000 --> 00:00:19,000
mentioned over here.

7
00:00:19,000 --> 00:00:19,000
Okay.

8
00:00:19,000 --> 00:00:22,000
Now please focus on the diagram.

9
00:00:22,000 --> 00:00:27,000
I would always suggest, uh, whenever you want to check different variants.

10
00:00:27,000 --> 00:00:27,000
Right.

11
00:00:27,000 --> 00:00:28,000
You know, what is RNN?

12
00:00:28,000 --> 00:00:31,000
You know, the generic representation of RNN.

13
00:00:31,000 --> 00:00:33,000
You know about LSTM, RNN.

14
00:00:33,000 --> 00:00:40,000
Now just pause for hardly 10 to 15 seconds and just focus on this diagram okay.

15
00:00:40,000 --> 00:00:46,000
And again, I suggest that if you really want to learn all these things properly, you really need

16
00:00:46,000 --> 00:00:51,000
to focus on all these representations of the different variants.

17
00:00:51,000 --> 00:00:51,000
Right.

18
00:00:52,000 --> 00:00:59,000
And just think within yourself like how it is different from LSTM RNN okay.

19
00:00:59,000 --> 00:01:04,000
Because here also you have this forget gate, you have this input gate, you have this output gate.

20
00:01:04,000 --> 00:01:07,000
But along with this you will be seeing some lines.

21
00:01:07,000 --> 00:01:09,000
So what will happen if you get this kind of line.

22
00:01:09,000 --> 00:01:13,000
Just take 10 to 15 seconds and think about it.

23
00:01:13,000 --> 00:01:18,000
You can pause the video and you can think, but I will go ahead and continue the explanation over here.

24
00:01:18,000 --> 00:01:21,000
Now, if I talk about LSTM, RNN.

25
00:01:21,000 --> 00:01:24,000
So it's not like the LSTM RNN is a very new concept, right.

26
00:01:24,000 --> 00:01:31,000
So the LSTM RNN was introduced back in 1997, okay.

27
00:01:31,000 --> 00:01:35,000
Since then, different variants,

28
00:01:35,000 --> 00:01:36,000
LSTM variants.

29
00:01:36,000 --> 00:01:40,000
came along, and this is one of the variants we will discuss.

30
00:01:40,000 --> 00:01:45,000
And this variant was introduced by Gers and

31
00:01:45,000 --> 00:01:45,000
Schmidhuber.

32
00:01:45,000 --> 00:01:48,000
I guess I'm pronouncing it right.

33
00:01:48,000 --> 00:01:50,000
If I'm not, I'm extremely sorry.

34
00:01:50,000 --> 00:01:56,000
Okay, so this type of variant came out in 2000, and I have seen a lot of research papers where

35
00:01:56,000 --> 00:01:59,000
they have specifically used this LSTM RNN variant.

36
00:01:59,000 --> 00:02:02,000
So in some of the research papers I have seen this diagram.

37
00:02:02,000 --> 00:02:02,000
Okay.

38
00:02:02,000 --> 00:02:06,000
That is the reason I'm teaching you these different kinds of variants also.

39
00:02:06,000 --> 00:02:07,000
Okay.

40
00:02:07,000 --> 00:02:11,000
Um, so obviously, uh, this is another kind of variant.

41
00:02:11,000 --> 00:02:13,000
And, uh, you can just consider that.

42
00:02:13,000 --> 00:02:14,000
Okay.

43
00:02:14,000 --> 00:02:15,000
In some of the research papers you will find this.

44
00:02:15,000 --> 00:02:20,000
And in some of the research papers you will find the normal LSTM RNN that we had actually seen.

45
00:02:20,000 --> 00:02:20,000
Okay.

46
00:02:21,000 --> 00:02:26,000
Now let's go ahead and focus on this entire diagram okay.

47
00:02:26,000 --> 00:02:33,000
Now here you can see that the forget gate,

48
00:02:33,000 --> 00:02:39,000
the input gate, the candidate memory, and the output gate are almost the same.

49
00:02:39,000 --> 00:02:43,000
You also have C_{t-1} and C_t, you have h_{t-1} and h_t, and you have x_t.

50
00:02:43,000 --> 00:02:44,000
Everything is same.

51
00:02:44,000 --> 00:02:50,000
But the additional thing that you specifically have is this line that you are sending from the memory

52
00:02:50,000 --> 00:02:59,000
cell to both the forget gate and the input gate,

53
00:02:59,000 --> 00:03:01,000
whose output is later combined with the candidate memory.

54
00:03:01,000 --> 00:03:04,000
And you are also sending it to the output gate.

55
00:03:04,000 --> 00:03:05,000
Okay.

56
00:03:05,000 --> 00:03:11,000
So what is basically happening in this particular variant is that you can actually

57
00:03:11,000 --> 00:03:17,000
see a line that is coming from the memory cell to the gates.

58
00:03:17,000 --> 00:03:23,000
Again, let me repeat it: to the forget gate, to the input gate (whose output meets the candidate memory),

59
00:03:23,000 --> 00:03:25,000
and also to the output gate.

60
00:03:25,000 --> 00:03:26,000
Okay.

61
00:03:26,000 --> 00:03:29,000
So all this information is coming through these connections.

62
00:03:29,000 --> 00:03:39,000
These connections go from the memory cell

63
00:03:42,000 --> 00:03:42,000
to the

64
00:03:44,000 --> 00:03:46,000
Forget gate.

65
00:03:49,000 --> 00:03:49,000
to the

66
00:03:49,000 --> 00:03:50,000
Input gate.

67
00:03:52,000 --> 00:03:52,000
Input gate.

68
00:03:52,000 --> 00:03:56,000
And you are also combining it with the candidate memory later on.

69
00:03:56,000 --> 00:03:58,000
This is just going to the input gate right now okay.

70
00:03:58,000 --> 00:04:00,000
So I'll not go ahead and say that.

71
00:04:00,000 --> 00:04:03,000
Hey, we have to include candidate memory.

72
00:04:03,000 --> 00:04:07,000
The candidate memory is combined later with a pointwise operation. Along with this, you are also sending it to the

73
00:04:07,000 --> 00:04:09,000
output gate, right?

74
00:04:09,000 --> 00:04:16,000
These connections have a name, a very important term that I'm going to use.

75
00:04:16,000 --> 00:04:20,000
And you'll also see it in the research papers. They are called peephole

76
00:04:22,000 --> 00:04:23,000
Connections.

77
00:04:23,000 --> 00:04:23,000
Okay.

78
00:04:23,000 --> 00:04:26,000
So these are basically called peephole connections.

79
00:04:26,000 --> 00:04:27,000
Very important.

80
00:04:28,000 --> 00:04:28,000
Okay.

81
00:04:29,000 --> 00:04:30,000
Peephole.

82
00:04:33,000 --> 00:04:33,000
Connections.

83
00:04:33,000 --> 00:04:34,000
Right.

84
00:04:34,000 --> 00:04:37,000
So this is specifically called a peephole connection.

85
00:04:37,000 --> 00:04:39,000
And as the term says.

86
00:04:39,000 --> 00:04:39,000
Right.

87
00:04:40,000 --> 00:04:41,000
What is this?

88
00:04:41,000 --> 00:04:42,000
Peephole connections.

89
00:04:45,000 --> 00:04:48,000
Let me just give a simple definition over here.

90
00:04:48,000 --> 00:04:50,000
Peephole connection is nothing.

91
00:04:50,000 --> 00:04:52,000
But, uh.

92
00:04:53,000 --> 00:04:54,000
And you see this, okay?

93
00:04:54,000 --> 00:05:00,000
Some research papers will provide this entire information, and in some research papers

94
00:05:00,000 --> 00:05:01,000
you will not get it.

95
00:05:01,000 --> 00:05:01,000
Okay.

96
00:05:02,000 --> 00:05:19,000
So here, a peephole connection means that through this architecture we let the gate layers look at

97
00:05:19,000 --> 00:05:23,000
the cell state.

98
00:05:23,000 --> 00:05:24,000
Okay.

99
00:05:24,000 --> 00:05:29,000
So this is one additional thing, right, that we are specifically doing.

100
00:05:29,000 --> 00:05:35,000
We are saying that, with this architecture, we are just giving some extra functionality

101
00:05:35,000 --> 00:05:42,000
to the forget gate, saying that, hey, if you want to make this memory cell, forget some of the information.

102
00:05:42,000 --> 00:05:49,000
You can also take the previous context or previous information that is available in the memory cell

103
00:05:49,000 --> 00:05:52,000
to decide what information you need to make it forget.

104
00:05:52,000 --> 00:05:52,000
Right.

105
00:05:52,000 --> 00:05:53,000
We are.

106
00:05:53,000 --> 00:05:54,000
We are just saying that.

107
00:05:54,000 --> 00:05:57,000
Hey, go ahead and look at the cell state and then probably take a decision.

108
00:05:57,000 --> 00:06:04,000
By this, the gates will be able to take much better decisions with respect to forgetting, with respect

109
00:06:04,000 --> 00:06:08,000
to adding some information to the memory cell and with respect to getting the output.

110
00:06:08,000 --> 00:06:08,000
Okay.

111
00:06:08,000 --> 00:06:12,000
So this is the entire idea behind this particular variant.

112
00:06:13,000 --> 00:06:16,000
And here you can probably see the equations right.

113
00:06:16,000 --> 00:06:17,000
We need to calculate this f_t.

114
00:06:17,000 --> 00:06:18,000
This is there.

115
00:06:18,000 --> 00:06:21,000
We need to calculate i_t and we need to calculate o_t.
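For reference, the gate equations being pointed at here are usually written as below. This is a sketch in the common concatenation form; the weight and bias symbols W_f, W_i, W_o, b_f, b_i, b_o are assumed names, not read from the slide.
```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [C_{t-1},\, h_{t-1},\, x_t] + b_f\right) \\
i_t &= \sigma\left(W_i \cdot [C_{t-1},\, h_{t-1},\, x_t] + b_i\right) \\
o_t &= \sigma\left(W_o \cdot [C_t,\, h_{t-1},\, x_t] + b_o\right)
\end{aligned}
```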

116
00:06:21,000 --> 00:06:23,000
Now look at f_t only.

117
00:06:23,000 --> 00:06:25,000
This term is basically getting added right.

118
00:06:26,000 --> 00:06:32,000
These three important terms are now being combined, right, whereas before there were only these two.

119
00:06:33,000 --> 00:06:36,000
Then we also are combining this right.

120
00:06:36,000 --> 00:06:37,000
We are also combining C_{t-1}.

121
00:06:37,000 --> 00:06:42,000
Let's say C_{t-1} has three dimensions.

122
00:06:42,000 --> 00:06:44,000
Let's say h_{t-1} also has three dimensions.

123
00:06:45,000 --> 00:06:47,000
Let's let's consider in this way.

124
00:06:47,000 --> 00:06:48,000
So it will be three dimension.

125
00:06:49,000 --> 00:06:52,000
And you also have x of t.

126
00:06:52,000 --> 00:06:54,000
Let's say this has four dimensions.

127
00:06:54,000 --> 00:06:55,000
So we are going to combine everything.

128
00:06:55,000 --> 00:06:57,000
And then we are going to give it to the neural network.

129
00:06:59,000 --> 00:07:01,000
Let's say there are three neurons.

130
00:07:01,000 --> 00:07:06,000
And after this we are going to pass it through the sigmoid for the forget gate.

131
00:07:06,000 --> 00:07:08,000
And then finally I'm going to get the output.

132
00:07:08,000 --> 00:07:11,000
So this will get connected to each and every thing right over here.

133
00:07:11,000 --> 00:07:14,000
And you can further do the connection based on this.

134
00:07:14,000 --> 00:07:14,000
Right.

135
00:07:15,000 --> 00:07:19,000
So, if I'm just considering three dimensions:

136
00:07:19,000 --> 00:07:20,000
This is my C_{t-1}.

137
00:07:20,000 --> 00:07:27,000
This is my h_{t-1} for the forget gate, for the first operation to calculate f_t.

138
00:07:27,000 --> 00:07:28,000
So this is my f_t.

139
00:07:28,000 --> 00:07:32,000
And then this is my x_t, okay.

140
00:07:32,000 --> 00:07:33,000
So this is how we are combining.

141
00:07:33,000 --> 00:07:37,000
And similarly, for i_t we are combining C_{t-1} over here. Now for o_t:

142
00:07:37,000 --> 00:07:41,000
We are combining C_t, because after both these operations we will be able to get C_t over here.

143
00:07:42,000 --> 00:07:45,000
We will be able to get CT over here, right?
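To make the combining step concrete, here is a minimal sketch of one peephole LSTM step in plain Python, using the same sizes as the example (a 3-dimensional cell state, a 3-dimensional hidden state, a 4-dimensional input, and 3 neurons per gate). The helper names, the random weights, and the sample values are assumptions for illustration only, not something taken from the slide.
```python
import math
import random
def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))
def dense(weights, bias, inputs, activation):
    # One fully connected layer: activation(W @ inputs + b), one output per row of W.
    return [activation(sum(w * v for w, v in zip(row, inputs)) + b)
            for row, b in zip(weights, bias)]
def random_layer(rng, n_out, n_in):
    # Hypothetical small random weights and zero biases, for illustration only.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out
def peephole_lstm_step(params, c_prev, h_prev, x_t):
    # Forget and input gates peek at the previous cell state C_{t-1}.
    f_t = dense(*params["f"], c_prev + h_prev + x_t, sigmoid)
    i_t = dense(*params["i"], c_prev + h_prev + x_t, sigmoid)
    # The candidate memory has no peephole: it sees only h_{t-1} and x_t.
    c_tilde = dense(*params["c"], h_prev + x_t, math.tanh)
    # Pointwise update of the cell state.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    # The output gate peeks at the freshly updated cell state C_t.
    o_t = dense(*params["o"], c_t + h_prev + x_t, sigmoid)
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return c_t, h_t, f_t, i_t, o_t
rng = random.Random(0)
n_cell, n_in = 3, 4
params = {
    "f": random_layer(rng, n_cell, n_cell + n_cell + n_in),
    "i": random_layer(rng, n_cell, n_cell + n_cell + n_in),
    "c": random_layer(rng, n_cell, n_cell + n_in),
    "o": random_layer(rng, n_cell, n_cell + n_cell + n_in),
}
c_prev, h_prev = [0.1, -0.2, 0.3], [0.0, 0.5, -0.5]
x_t = [1.0, 0.0, -1.0, 0.5]
c_t, h_t, f_t, i_t, o_t = peephole_lstm_step(params, c_prev, h_prev, x_t)
# 3 + 3 + 4 = 10 values go into each gate, and 3 neurons give 3 gate values.
print(len(c_prev + h_prev + x_t), len(f_t))  # 10 3
```
Here list concatenation with + plays the role of combining C_{t-1}, h_{t-1}, and x_t before the gate's neurons. Note that the output gate is computed last, because it needs C_t.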

144
00:07:46,000 --> 00:07:48,000
So this was about one of the variants.

145
00:07:48,000 --> 00:07:50,000
Uh, let's talk about one more variant okay.

146
00:07:50,000 --> 00:07:52,000
So this specific variant okay.

147
00:07:52,000 --> 00:07:56,000
Now this variant is also amazing.

148
00:07:56,000 --> 00:08:01,000
Um, over here, this is, uh, I can just go ahead and write.

149
00:08:01,000 --> 00:08:06,000
This is another variation.

150
00:08:08,000 --> 00:08:09,000
Okay.

151
00:08:10,000 --> 00:08:18,000
This is very important: here we are coupling, or combining, the forget

152
00:08:19,000 --> 00:08:21,000
and input gates.

153
00:08:23,000 --> 00:08:24,000
Okay.

154
00:08:26,000 --> 00:08:28,000
Where we are combining forget and input gates.

155
00:08:29,000 --> 00:08:35,000
So here you can see that, um, I'll just go ahead and provide you the definition, what we are specifically

156
00:08:35,000 --> 00:08:36,000
doing.

157
00:08:36,000 --> 00:08:47,000
Here we are saying: instead of separately deciding.

158
00:08:49,000 --> 00:08:54,000
See, in my previous LSTM we had a separate forget gate.

159
00:08:54,000 --> 00:08:55,000
We had a separate input gate.

160
00:08:55,000 --> 00:09:00,000
With the help of forget gate we were separately deciding what to forget, right?

161
00:09:00,000 --> 00:09:04,000
And with the help of input gate, we were separately deciding what to add to the memory cell.

162
00:09:04,000 --> 00:09:05,000
Right?

163
00:09:05,000 --> 00:09:20,000
So instead of separately deciding what to forget and what we should add,

164
00:09:20,000 --> 00:09:22,000
Add new information.

165
00:09:23,000 --> 00:09:24,000
New info.

166
00:09:25,000 --> 00:09:26,000
Okay.

167
00:09:27,000 --> 00:09:37,000
We make this decision together through this particular variant.

168
00:09:37,000 --> 00:09:38,000
Right?

169
00:09:40,000 --> 00:09:43,000
We make this decision together, right?

170
00:09:43,000 --> 00:09:45,000
And this is what we are doing.

171
00:09:45,000 --> 00:09:49,000
The forget gate's output is also going over here, and an operation is applied to it, okay.

172
00:09:49,000 --> 00:09:53,000
So here the goal is okay.

173
00:09:53,000 --> 00:09:54,000
The goal is very simple.

174
00:09:55,000 --> 00:09:56,000
I will just go ahead and write my goal.

175
00:09:56,000 --> 00:09:58,000
The goal is very simple.

176
00:09:58,000 --> 00:10:03,000
The goal is just to make sure that we only forget.

177
00:10:05,000 --> 00:10:10,000
We only forget when we are.

178
00:10:11,000 --> 00:10:20,000
When we are going to input something.

179
00:10:22,000 --> 00:10:24,000
In its place.

180
00:10:26,000 --> 00:10:28,000
So what does this sentence mean?

181
00:10:28,000 --> 00:10:32,000
It means that only when we are inputting something are we going to forget something.

182
00:10:32,000 --> 00:10:39,000
Okay, so if I want to probably give a conclusion over here, I can also write.

183
00:10:39,000 --> 00:10:45,000
We only input new values.

184
00:10:48,000 --> 00:10:49,000
To the state.

185
00:10:52,000 --> 00:10:55,000
When we forget.

186
00:10:56,000 --> 00:10:57,000
something older.

187
00:10:59,000 --> 00:11:01,000
So this is the entire idea.

188
00:11:01,000 --> 00:11:09,000
Okay, I have to say this is one kind of operation which is basically doing

189
00:11:09,000 --> 00:11:10,000
this entire functionality.

190
00:11:10,000 --> 00:11:11,000
That's it.

191
00:11:11,000 --> 00:11:11,000
Okay.

192
00:11:12,000 --> 00:11:15,000
And by this, you probably get another variation.

193
00:11:15,000 --> 00:11:16,000
And this is nothing.

194
00:11:16,000 --> 00:11:23,000
But we are trying to combine the forget and input gates in such a way that whenever we don't have any information

195
00:11:23,000 --> 00:11:28,000
to remove from the memory cell, then we don't add new information.

196
00:11:28,000 --> 00:11:32,000
Only when we remove some information do we add new information from the input gate.

197
00:11:32,000 --> 00:11:35,000
So I hope you liked both of these variants.

198
00:11:35,000 --> 00:11:42,000
Now there is also one more variant, which is called GRU, the gated recurrent unit.

199
00:11:42,000 --> 00:11:42,000
Okay.

200
00:11:43,000 --> 00:11:46,000
And this is also again a different variant of RNN.

201
00:11:46,000 --> 00:11:48,000
Um, why we use this.

202
00:11:48,000 --> 00:11:51,000
We will talk about the advantages and disadvantages.

203
00:11:51,000 --> 00:11:56,000
And here also you can actually see what kind of operation we did.

204
00:11:56,000 --> 00:12:04,000
So when we are calculating C_t, that is this, we take f_t multiplied by,

205
00:12:04,000 --> 00:12:06,000
That is point wise operation.

206
00:12:06,000 --> 00:12:11,000
So this is my pointwise operation over here with C_{t-1}.

207
00:12:11,000 --> 00:12:12,000
So this is fine.

208
00:12:12,000 --> 00:12:21,000
Plus, what we do is take (1 - f_t) and multiply it with the candidate C̃_t, whatever output we

209
00:12:21,000 --> 00:12:23,000
are getting from this particular tanh layer.

210
00:12:23,000 --> 00:12:24,000
Right.

211
00:12:24,000 --> 00:12:29,000
And here you can see that (1 - f_t) is nothing but this.

212
00:12:29,000 --> 00:12:30,000
We are just subtracting one.

213
00:12:30,000 --> 00:12:32,000
Over here we are multiplying with the candidate C̃_t.

214
00:12:32,000 --> 00:12:33,000
Uh, we are doing a point.

215
00:12:33,000 --> 00:12:37,000
We are basically taking the candidate C̃_t and multiplying it over here.

216
00:12:37,000 --> 00:12:39,000
And then we get this entire information.

217
00:12:39,000 --> 00:12:40,000
Right.

218
00:12:40,000 --> 00:12:40,000
So this is nothing.

219
00:12:40,000 --> 00:12:42,000
But this is my point wise operation.

220
00:12:43,000 --> 00:12:50,000
So this basically indicates that through this operation we add new values.

221
00:12:51,000 --> 00:12:56,000
We add new values only when we forget.

222
00:12:58,000 --> 00:13:02,000
Only when we forget through the forget gate.

223
00:13:02,000 --> 00:13:02,000
Right.
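The coupled update just described can be sketched in a few lines of plain Python. This is only an illustration: the function name and the sample numbers are assumptions, and c_tilde stands for the candidate memory coming out of the tanh layer.
```python
def coupled_cell_update(f_t, c_prev, c_tilde):
    # C_t = f_t * C_{t-1} + (1 - f_t) * candidate, applied elementwise.
    # The input gate is not separate any more: it is simply (1 - f_t).
    return [f * c + (1.0 - f) * g for f, c, g in zip(f_t, c_prev, c_tilde)]
c_prev = [2.0, 2.0, 2.0]
c_tilde = [-1.0, -1.0, -1.0]
# f = 1.0 keeps everything, f = 0.0 forgets everything, f = 0.5 blends.
f_t = [1.0, 0.0, 0.5]
c_t = coupled_cell_update(f_t, c_prev, c_tilde)
print(c_t)  # [2.0, -1.0, 0.5]
```
Where f_t is 1 nothing is forgotten and nothing new is written; where f_t is 0 the old value is fully replaced by the candidate.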

224
00:13:04,000 --> 00:13:06,000
That's it, you know.

225
00:13:06,000 --> 00:13:07,000
So I hope.

226
00:13:07,000 --> 00:13:09,000
I hope you are able to understand this video with respect to the variants.

227
00:13:09,000 --> 00:13:15,000
In our next video we will be talking about the gated recurrent unit, or GRU.

228
00:13:15,000 --> 00:13:18,000
And we'll try to understand what are the advantages and disadvantages.

229
00:13:18,000 --> 00:13:24,000
But I would suggest you please just check the diagram, see the variation from the other diagram that

230
00:13:24,000 --> 00:13:26,000
is the LSTM RNN.

231
00:13:26,000 --> 00:13:28,000
And I will definitely see you guys.

232
00:13:28,000 --> 00:13:32,000
Understanding the LSTM RNN is a must before anything else.

233
00:13:32,000 --> 00:13:38,000
The 5 to 6 videos where I have explained about input output, how the training basically happens, everything

234
00:13:38,000 --> 00:13:39,000
you really need to know.

235
00:13:39,000 --> 00:13:42,000
Okay, so yes, this was it from my side.

236
00:13:42,000 --> 00:13:44,000
I will see you all in the next video.

237
00:13:44,000 --> 00:13:44,000
Thank you.

