1
00:00:00,000 --> 00:00:01,000
Hello guys.

2
00:00:01,000 --> 00:00:05,000
So we are going to continue the discussion with respect to NLP in deep learning.

3
00:00:05,000 --> 00:00:11,000
And in this video we are going to look at a new variant of RNN, which is called LSTM.

4
00:00:11,000 --> 00:00:15,000
LSTM is nothing but long short term memory RNN.

5
00:00:15,000 --> 00:00:25,000
Okay, now already in our previous video we have discussed about RNN and we also understood that RNN

6
00:00:25,000 --> 00:00:31,000
is not able to solve long term dependencies, right?

7
00:00:32,000 --> 00:00:44,000
And that is where we got to know about something called the vanishing gradient problem.

8
00:00:45,000 --> 00:00:45,000
Right.

9
00:00:46,000 --> 00:00:47,000
We got to know about this.

10
00:00:48,000 --> 00:00:53,000
Now in vanishing gradient problem, whenever we had a long chain rule at that point of time, what was

11
00:00:53,000 --> 00:00:53,000
happening?

12
00:00:54,000 --> 00:00:59,000
Whenever we are trying to find out the derivative of one output with respect to the other output, at

13
00:00:59,000 --> 00:01:02,000
that point of time I was getting a smaller value.

14
00:01:02,000 --> 00:01:07,000
So whenever we had a much longer chain rule with respect to time, the value was approximately equal

15
00:01:07,000 --> 00:01:08,000
to zero.

16
00:01:08,000 --> 00:01:08,000
Okay.

17
00:01:08,000 --> 00:01:11,000
So all of those things we discussed in our previous video.

18
00:01:11,000 --> 00:01:15,000
Now again I will go ahead and revise some important questions.

19
00:01:15,000 --> 00:01:20,000
So first of all, I will say again I will talk about the problems of RNN in this video.

20
00:01:21,000 --> 00:01:22,000
Okay.

21
00:01:23,000 --> 00:01:29,000
Um, then we will discuss about why LSTM RNN instead of directly using RNN.

22
00:01:29,000 --> 00:01:32,000
So this is the second topic that I will discuss about.

23
00:01:32,000 --> 00:01:40,000
Third thing, we will see how LSTM RNN is solving the problem, right.

24
00:01:40,000 --> 00:01:43,000
So here two important topics we will be talking about.

25
00:01:43,000 --> 00:01:48,000
One is what is this long term memory.

26
00:01:49,000 --> 00:01:54,000
And then we will also be talking about the short term memory right.

27
00:01:54,000 --> 00:01:59,000
Short term memory along with this.

28
00:01:59,000 --> 00:02:05,000
Then in the upcoming videos we will be seeing the LSTM architecture.

29
00:02:07,000 --> 00:02:09,000
Architecture.

30
00:02:09,000 --> 00:02:12,000
And finally we'll see the working of LSTM okay.

31
00:02:12,000 --> 00:02:17,000
Fifth will be working of LSTM.

32
00:02:20,000 --> 00:02:22,000
With examples.

33
00:02:23,000 --> 00:02:24,000
Now everything we will be discussing.

34
00:02:24,000 --> 00:02:30,000
But before I go ahead I really want to quickly give an amazing shout out to this amazing blog which

35
00:02:30,000 --> 00:02:32,000
is called colah's blog.

36
00:02:32,000 --> 00:02:39,000
Again, this blog if you probably go ahead and see who's the author over here, is an amazing author,

37
00:02:39,000 --> 00:02:39,000
right?

38
00:02:39,000 --> 00:02:40,000
Uh, his name is Jay.

39
00:02:41,000 --> 00:02:41,000
Okay.

40
00:02:41,000 --> 00:02:45,000
And definitely do check out this particular blog again.

41
00:02:45,000 --> 00:02:52,000
Uh, because from here I will be taking most of the diagrams, because here all the diagrams are given

42
00:02:52,000 --> 00:02:58,000
in a much more amazing way to make you understand, right, about LSTM. Again, a quick shout out to this

43
00:02:58,000 --> 00:02:59,000
amazing blog.

44
00:02:59,000 --> 00:03:00,000
The credit goes to this.

45
00:03:01,000 --> 00:03:02,000
I'm giving the proper credit.

46
00:03:02,000 --> 00:03:05,000
Uh, and I will be referring to this particular blog.

47
00:03:05,000 --> 00:03:12,000
So, uh, let's go ahead and let's start our session with respect to LSTM RNN.

48
00:03:12,000 --> 00:03:17,000
So guys now let's go ahead and discuss about the basic problems with RNN.

49
00:03:17,000 --> 00:03:20,000
And already I have discussed this in our previous video.

50
00:03:20,000 --> 00:03:23,000
But here I will try to give you some more examples.

51
00:03:23,000 --> 00:03:24,000
Okay.

52
00:03:24,000 --> 00:03:30,000
So let's say that I am planning to predict, uh, the next word in a sentence.

53
00:03:30,000 --> 00:03:31,000
Okay.

54
00:03:31,000 --> 00:03:35,000
Next word in a sentence.

55
00:03:36,000 --> 00:03:40,000
So this is my task that I really want to do.

56
00:03:40,000 --> 00:03:41,000
Okay.

57
00:03:42,000 --> 00:03:47,000
Now with respect to this particular task, let's say one of the examples that I have actually taken,

58
00:03:47,000 --> 00:03:59,000
let's say my sentence is something like "the color of the sky is ___".

59
00:03:59,000 --> 00:04:00,000
Okay.

60
00:04:00,000 --> 00:04:03,000
So I have to predict this particular word.

61
00:04:04,000 --> 00:04:08,000
Now you know that in order to predict this word, let's say the answer is blue.

62
00:04:09,000 --> 00:04:10,000
Okay, I will just go ahead and write.

63
00:04:10,000 --> 00:04:12,000
The answer is blue over here.

64
00:04:12,000 --> 00:04:15,000
So the color of the sky is blue.

65
00:04:16,000 --> 00:04:23,000
Now when we need to predict this particular word, you know that we do not need further context, you

66
00:04:23,000 --> 00:04:26,000
know, based on the previous word.

67
00:04:26,000 --> 00:04:26,000
Right.

68
00:04:26,000 --> 00:04:30,000
So based on this particular word, since we are as we know that.

69
00:04:30,000 --> 00:04:31,000
Right?

70
00:04:31,000 --> 00:04:33,000
Obviously the color of the sky is blue.

71
00:04:33,000 --> 00:04:33,000
Right.

72
00:04:33,000 --> 00:04:33,000
right?

73
00:04:33,000 --> 00:04:39,000
So this word is mostly proportionate to this has a dependency to this particular word.

74
00:04:39,000 --> 00:04:47,000
And obviously to get the output we don't require any further context, any further context.

75
00:04:47,000 --> 00:04:50,000
That basically means there is no dependency on other word over here.

76
00:04:50,000 --> 00:04:51,000
Right.

77
00:04:52,000 --> 00:04:57,000
Uh, since we are talking about the color of the sky is blue, uh, you can see that the sentence is

78
00:04:57,000 --> 00:04:58,000
also very small.

79
00:04:59,000 --> 00:05:04,000
And in terms of dependency, let's say there are two words there that are needed to probably

80
00:05:04,000 --> 00:05:04,000
find out the output.

81
00:05:04,000 --> 00:05:06,000
One is sky and one is color.

82
00:05:06,000 --> 00:05:10,000
The gap between these particular words is also not that long, right?

83
00:05:10,000 --> 00:05:17,000
So obviously with the help of RNN, we will be able to solve this problem and it will not even face

84
00:05:17,000 --> 00:05:20,000
anything like vanishing gradient problem.

85
00:05:20,000 --> 00:05:25,000
So vanishing gradient problem will not be there, right?

86
00:05:25,000 --> 00:05:32,000
But if I take one more example, let's say uh, I will go ahead and write something like this.

87
00:05:32,000 --> 00:05:41,000
I grew up in India and let's say there is some more text over here.

88
00:05:42,000 --> 00:05:49,000
And then finally I say I speak fluent Hindi.

89
00:05:50,000 --> 00:05:53,000
Okay, now here you can actually see.

90
00:05:55,000 --> 00:05:57,000
Obviously just by seeing this.

91
00:05:57,000 --> 00:05:57,000
Right.

92
00:05:57,000 --> 00:06:00,000
And let's say this is this is what I really want to find out.

93
00:06:00,000 --> 00:06:03,000
This is what I really want to find out.

94
00:06:03,000 --> 00:06:05,000
I want to find out this specific word.

95
00:06:05,000 --> 00:06:08,000
So this is the word blank that I really want to find out the next word.

96
00:06:09,000 --> 00:06:12,000
Now just by seeing this you can see, okay, fine.

97
00:06:12,000 --> 00:06:17,000
If I probably go ahead with this entire sentence, obviously a word that should come over here should

98
00:06:17,000 --> 00:06:19,000
be a name of a language.

99
00:06:20,000 --> 00:06:23,000
Let's consider it should be a name of a language.

100
00:06:25,000 --> 00:06:25,000
Right.

101
00:06:26,000 --> 00:06:28,000
But which language?

102
00:06:28,000 --> 00:06:29,000
Right.

103
00:06:29,000 --> 00:06:31,000
There are different,

104
00:06:31,000 --> 00:06:34,000
different languages that actually exist throughout the world, right.

105
00:06:34,000 --> 00:06:39,000
So obviously it requires some more further context.

106
00:06:40,000 --> 00:06:45,000
Now, in this case, the context that it requires may be the name of the country.

107
00:06:46,000 --> 00:06:46,000
Right.

108
00:06:46,000 --> 00:06:53,000
And as I said that this sentence I grew up in India, this probably came at the first.

109
00:06:53,000 --> 00:06:54,000
And then we had a lot of sentences.

110
00:06:54,000 --> 00:07:00,000
And then finally we ended with, I speak fluent whatever language over here it is.

111
00:07:00,000 --> 00:07:00,000
Right.

112
00:07:00,000 --> 00:07:06,000
So obviously this word is dependent on this particular context, right?

113
00:07:06,000 --> 00:07:12,000
And obviously you can also see that there is a long term dependency because there is a huge gap.

114
00:07:12,000 --> 00:07:21,000
There is a huge gap between the output that I want to find and the word which has its further context.

115
00:07:21,000 --> 00:07:22,000
So there is a huge gap over here.

116
00:07:23,000 --> 00:07:25,000
So in this particular scenario.

117
00:07:25,000 --> 00:07:31,000
So here you could see that uh uh, if I, if I just take this as an example okay.

118
00:07:31,000 --> 00:07:36,000
So I will just go ahead and probably take this as an example over here.

119
00:07:36,000 --> 00:07:37,000
And I will write it over here.

120
00:07:37,000 --> 00:07:41,000
The reason is very simple because this is the architecture that we are looking at.

121
00:07:41,000 --> 00:07:43,000
And similarly over here okay.

122
00:07:43,000 --> 00:07:46,000
Let's go ahead and probably do this right.

123
00:07:47,000 --> 00:07:54,000
So here uh, the problem was very simple that whenever I was trying to perform a task, okay, whenever

124
00:07:54,000 --> 00:07:58,000
I was trying to perform a task over here with respect to this.

125
00:07:59,000 --> 00:08:02,000
So just imagine that I want to find out this particular output.

126
00:08:02,000 --> 00:08:08,000
And there is there is a dependency on this particular word to probably find out the output.

127
00:08:08,000 --> 00:08:08,000
Right.

128
00:08:08,000 --> 00:08:17,000
So here the gap is less. So when the gap is less, so is the dependency.

129
00:08:17,000 --> 00:08:20,000
And it may also not be probably requiring any further context.

130
00:08:20,000 --> 00:08:22,000
It is just directly dependent on this word or this word.

131
00:08:22,000 --> 00:08:22,000
Right.

132
00:08:22,000 --> 00:08:25,000
And it is able to find out the output.

133
00:08:25,000 --> 00:08:30,000
But if I have something like this kind of scenario where there is a huge gap between the output and

134
00:08:30,000 --> 00:08:32,000
the further context word that is required, right?

135
00:08:32,000 --> 00:08:37,000
Like in this scenario, let's say I want to predict this, I want to predict this.

136
00:08:37,000 --> 00:08:38,000
Right.

137
00:08:38,000 --> 00:08:42,000
But this is completely dependent on the first initial word.

138
00:08:42,000 --> 00:08:49,000
And here you can basically see that there is a huge gap. When there is a huge gap, I can basically say,

139
00:08:49,000 --> 00:08:52,000
hey, there is a long term dependency.

140
00:08:54,000 --> 00:08:58,000
And obviously RNN cannot solve this, right?

141
00:08:58,000 --> 00:09:05,000
RNN cannot solve this long term dependency.

142
00:09:06,000 --> 00:09:07,000
And it's very simple.

143
00:09:07,000 --> 00:09:08,000
Why?

144
00:09:08,000 --> 00:09:14,000
Because it faces this problem, which is called the vanishing gradient problem. And the vanishing gradient

145
00:09:14,000 --> 00:09:14,000
problem.

146
00:09:14,000 --> 00:09:22,000
What it does, it basically says, hey, whichever is the nearest word to this.

147
00:09:22,000 --> 00:09:23,000
Like let's say this word is nearest.

148
00:09:23,000 --> 00:09:31,000
This can really change the weights and it will be much more responsible in finding the next word.

149
00:09:31,000 --> 00:09:31,000
Right?

150
00:09:31,000 --> 00:09:37,000
But if we see the words that came earlier at t is equal to one, at t is equal to two.

151
00:09:37,000 --> 00:09:41,000
This word, because of the huge gap that is present over here.

152
00:09:41,000 --> 00:09:41,000
Right?

153
00:09:42,000 --> 00:09:45,000
The huge gap is between the relevant information.

154
00:09:45,000 --> 00:09:46,000
Right.

155
00:09:46,000 --> 00:09:50,000
The output that we really need to find out, and with respect to the further context. Further context

156
00:09:50,000 --> 00:09:53,000
basically means it is having some kind of dependency on this particular word.

157
00:09:53,000 --> 00:09:56,000
So there is a huge gap, right?

158
00:09:56,000 --> 00:10:02,000
So in this kind of scenario, what happens? The vanishing gradient problem is basically faced.

159
00:10:02,000 --> 00:10:05,000
And there the update of the weights will not happen.

160
00:10:05,000 --> 00:10:10,000
Because whenever we try to find out the derivative of the loss with respect to the weights at this

161
00:10:10,000 --> 00:10:14,000
particular timestamp, right, there will be some value.

162
00:10:14,000 --> 00:10:18,000
But at this particular timestamp it will be approximately equal to zero.

163
00:10:18,000 --> 00:10:19,000
Right?

164
00:10:19,000 --> 00:10:20,000
Why?

165
00:10:20,000 --> 00:10:25,000
Because the derivative of the activation function that we are specifically using is between 0 and 0.25,

166
00:10:25,000 --> 00:10:30,000
let's say, in the case of sigmoid. In the case of, uh, tanh it is 0 to 1.

167
00:10:30,000 --> 00:10:31,000
Right.

168
00:10:31,000 --> 00:10:39,000
And whenever we use this in the chain rule of derivatives, right, it will be a big rule.

169
00:10:39,000 --> 00:10:39,000
Right?

170
00:10:39,000 --> 00:10:40,000
It will be a big chain right?

171
00:10:40,000 --> 00:10:45,000
At that point of time, the value will be approximately equal to zero that we discussed in our previous

172
00:10:45,000 --> 00:10:46,000
video.
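
To make the shrinking chain rule concrete, here is a minimal sketch that is not from the video. It assumes the best case for sigmoid, where every factor in the chain equals the maximum derivative 0.25, and it ignores the recurrent weight terms that also appear in the real product. The helper name `gradient_upper_bound` is hypothetical.

```python
import math


def sigmoid(z):
    """Standard logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of sigmoid; its largest value is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def gradient_upper_bound(gap, max_derivative=0.25):
    """Best-case size of a product of `gap` activation derivatives."""
    return max_derivative ** gap


# The sigmoid derivative peaks at z = 0.
peak = sigmoid_derivative(0.0)

# Short gap (like "the color of the sky is ...") versus a long gap
# (like "I grew up in India ... I speak fluent ...").
short_gap = gradient_upper_bound(2)
long_gap = gradient_upper_bound(50)

print(peak, short_gap, long_gap)
```

With a gap of 2 the product is still a usable number, while with a gap of 50 it is so close to zero that the early words barely influence the weight update.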

173
00:10:46,000 --> 00:10:46,000
Right.

174
00:10:46,000 --> 00:10:48,000
So I hope you understood this.

175
00:10:48,000 --> 00:10:58,000
And this problem, I would definitely like to say, is something called the long short term, sorry, long

176
00:10:58,000 --> 00:11:00,000
term dependency.

177
00:11:01,000 --> 00:11:04,000
This is the problem that is basically faced with RNN.

178
00:11:06,000 --> 00:11:07,000
And I've also taken one example.

179
00:11:08,000 --> 00:11:15,000
If this is a smaller sentence, and let's say this particular word is just dependent on, you know,

180
00:11:15,000 --> 00:11:17,000
the nearest word, the gap is not that much.

181
00:11:17,000 --> 00:11:19,000
It will not require any further context.

182
00:11:19,000 --> 00:11:20,000
Right?

183
00:11:20,000 --> 00:11:24,000
So, uh, I hope you got an idea what was the problem with RNN?

184
00:11:24,000 --> 00:11:29,000
And obviously in the previous video we have discussed the vanishing gradient problem's mathematical

185
00:11:29,000 --> 00:11:29,000
intuition.

186
00:11:29,000 --> 00:11:34,000
Now let's go ahead and see how our LSTM RNN solves this problem okay.

187
00:11:34,000 --> 00:11:41,000
So before I go ahead, this on the left hand side is a basic representation of RNN okay.

188
00:11:42,000 --> 00:11:44,000
Now here you'll be able to see that okay.

189
00:11:44,000 --> 00:11:48,000
This is with respect to t is equal to t minus one.

190
00:11:48,000 --> 00:11:50,000
This is t is equal to t.

191
00:11:50,000 --> 00:11:53,000
Let's consider if I give some numbers right.

192
00:11:53,000 --> 00:11:59,000
So let's say this is with respect to t is equal to one, t is equal to two and t is equal to three.

193
00:11:59,000 --> 00:11:59,000
Right.

194
00:11:59,000 --> 00:12:01,000
We are passing some information.

195
00:12:01,000 --> 00:12:07,000
And over here you can see that, uh, the output from the previous timestamp is combined along with the

196
00:12:07,000 --> 00:12:08,000
input right here.

197
00:12:08,000 --> 00:12:10,000
There will be some weights that will be assigned.

198
00:12:10,000 --> 00:12:14,000
So if you really want to just see this it will be looking something like this.

199
00:12:14,000 --> 00:12:14,000
Right.

200
00:12:14,000 --> 00:12:16,000
This is the same thing that we discussed.

201
00:12:17,000 --> 00:12:17,000
Right.

202
00:12:17,000 --> 00:12:23,000
So I have I have a neural network or a neural network which looks like this.

203
00:12:23,000 --> 00:12:24,000
Right.

204
00:12:24,000 --> 00:12:28,000
So I'll just take this as an example.

205
00:12:28,000 --> 00:12:29,000
So what exactly this is.

206
00:12:29,000 --> 00:12:30,000
Right.

207
00:12:30,000 --> 00:12:36,000
So at timestamp t is equal to one, I'm passing this as an information over here.

208
00:12:36,000 --> 00:12:38,000
Also will get some kind of information here.

209
00:12:38,000 --> 00:12:40,000
There will be some weights that will be assigned.

210
00:12:40,000 --> 00:12:41,000
We have spoken about this.

211
00:12:41,000 --> 00:12:42,000
This will be my input weights.

212
00:12:42,000 --> 00:12:43,000
Right.

213
00:12:43,000 --> 00:12:47,000
Uh, here we are going to pass uh XI1.

214
00:12:47,000 --> 00:12:50,000
This will be XI2 this will be XI3.

215
00:12:50,000 --> 00:12:52,000
Again I'll be having w of I w of I.

216
00:12:52,000 --> 00:12:54,000
And here it will be t is equal to two.

217
00:12:54,000 --> 00:12:56,000
Then here it will be t is equal to three.

218
00:12:56,000 --> 00:12:56,000
Okay.

219
00:12:56,000 --> 00:13:01,000
Now just imagine when we are passing this input along with the weights I will be able to calculate my

220
00:13:01,000 --> 00:13:02,000
O one over here.

221
00:13:02,000 --> 00:13:03,000
And this is nothing.

222
00:13:03,000 --> 00:13:05,000
But this is my O one right?

223
00:13:05,000 --> 00:13:11,000
I'm passing this information to my, uh, next, next time stamp right over here.

224
00:13:11,000 --> 00:13:12,000
Now, what will happen?

225
00:13:12,000 --> 00:13:16,000
This will be assigned with another hidden neuron weights.

226
00:13:16,000 --> 00:13:17,000
Sorry are done.

227
00:13:17,000 --> 00:13:19,000
Other hidden weights okay.

228
00:13:19,000 --> 00:13:23,000
And both of these are basically getting combined over here along with the weights.

229
00:13:23,000 --> 00:13:26,000
And then on top of it you apply a tanh activation function.

230
00:13:26,000 --> 00:13:27,000
And finally we get an output.

231
00:13:27,000 --> 00:13:29,000
So this is my h of t right.

232
00:13:29,000 --> 00:13:33,000
And I'm also getting an output over here which is passed to the next state.

233
00:13:33,000 --> 00:13:37,000
And same representation is basically used okay.
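
The step just described (combine the input and the previous output with their weights, then apply tanh) can be sketched as below. This is a minimal single-unit illustration, not the code from the video. The scalar weights `w_i` and `w_h`, the optional `bias`, and the input values are assumptions chosen only to make it runnable.

```python
import math


def rnn_step(x_t, h_prev, w_i, w_h, bias=0.0):
    """One time step of a single-unit RNN: tanh(w_i*x_t + w_h*h_prev + bias)."""
    return math.tanh(w_i * x_t + w_h * h_prev + bias)


def run_rnn(inputs, w_i, w_h, h0=0.0):
    """Feed the inputs one time step at a time, reusing the same weights."""
    h = h0
    states = []
    for x_t in inputs:
        # The previous output h is fed back in: this is the feedback loop.
        h = rnn_step(x_t, h, w_i, w_h)
        states.append(h)
    return states


# Three time steps (t = 1, 2, 3) with made-up scalar inputs and weights.
states = run_rnn([1.0, 0.5, -1.0], w_i=0.8, w_h=0.5)
print(states)
```

The same two weights are reused at every time step, and only the hidden value `h` carries information forward.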

234
00:13:37,000 --> 00:13:43,000
Now if I go ahead towards the LSTM RNN representation see guys nothing is different.

235
00:13:43,000 --> 00:13:45,000
So this is the same thing as what we discussed.

236
00:13:45,000 --> 00:13:51,000
Because internally over here inside this hidden layer within each and every neuron we basically

237
00:13:51,000 --> 00:13:54,000
apply tanh activation function or sigmoid activation function.

238
00:13:54,000 --> 00:13:54,000
Right.

239
00:13:54,000 --> 00:13:58,000
So here it is basically given as tanh activation function, okay.

240
00:13:58,000 --> 00:14:00,000
Now if I talk about LSTM RNN.

241
00:14:00,000 --> 00:14:03,000
So this is what is the generic representation okay.

242
00:14:03,000 --> 00:14:05,000
So don't get worried.

243
00:14:05,000 --> 00:14:10,000
Like, it looks really complex when compared to this. I will break this down, each and every

244
00:14:10,000 --> 00:14:11,000
thing as we go ahead.

245
00:14:12,000 --> 00:14:18,000
Uh, but uh, just to give you a smaller idea, you know what exactly is done by LSTM RNN?

246
00:14:18,000 --> 00:14:21,000
Okay, so let's say this is my RNN.

247
00:14:22,000 --> 00:14:24,000
This is my RNN over here.

248
00:14:27,000 --> 00:14:30,000
Now with respect to this RNN okay.

249
00:14:30,000 --> 00:14:35,000
When I'm passing this information okay.

250
00:14:35,000 --> 00:14:40,000
See RNN usually has this short term memory okay.

251
00:14:42,000 --> 00:14:44,000
It has this short term memory.

252
00:14:47,000 --> 00:14:53,000
So since we know that whenever we work with RNN, how is the generic representation?

253
00:14:53,000 --> 00:14:55,000
It looks something like this.

254
00:14:55,000 --> 00:14:57,000
And there is a feedback loop okay.

255
00:14:57,000 --> 00:15:06,000
This feedback loop is specifically for short term memory okay.

256
00:15:06,000 --> 00:15:16,000
Short term memory. Now with the help of LSTM RNN, what we are doing is that we are also adding a LSTM

257
00:15:16,000 --> 00:15:16,000
memory.

258
00:15:16,000 --> 00:15:21,000
Sorry, we are also adding a long term memory.

259
00:15:23,000 --> 00:15:28,000
Long term memory along with the short term memory.

260
00:15:31,000 --> 00:15:35,000
Now we know that in RNN the short term memory is already there.

261
00:15:35,000 --> 00:15:38,000
That is this particular feedback loop right.

262
00:15:38,000 --> 00:15:39,000
In RNN we know that right.

263
00:15:39,000 --> 00:15:44,000
So this is this line that is specifically going to all the other hidden neurons.

264
00:15:44,000 --> 00:15:46,000
This is nothing but your short term memory.

265
00:15:46,000 --> 00:15:48,000
But what about long term memory.

266
00:15:48,000 --> 00:15:51,000
So over here the long term memory is added something like this.

267
00:15:51,000 --> 00:15:58,000
There will be a big line and how this line will be like, you know, uh, this line will be like, I

268
00:15:58,000 --> 00:16:00,000
hope everybody has visited airport, right?

269
00:16:00,000 --> 00:16:04,000
So in an airport you will be seeing something like a conveyor belt.

270
00:16:06,000 --> 00:16:07,000
Right.

271
00:16:07,000 --> 00:16:13,000
Conveyor belt basically means, uh, where you specifically put luggage, right?

272
00:16:13,000 --> 00:16:14,000
Luggages.

273
00:16:14,000 --> 00:16:18,000
So there will be a continuously moving line.

274
00:16:18,000 --> 00:16:19,000
Right?

275
00:16:19,000 --> 00:16:24,000
Whenever an airplane arrives, people will start, uh, the workers will start putting the luggage

276
00:16:24,000 --> 00:16:25,000
over here.

277
00:16:25,000 --> 00:16:29,000
And when it reaches here, let's say there are some human beings.

278
00:16:29,000 --> 00:16:31,000
They'll take out the luggage from here, right?

279
00:16:31,000 --> 00:16:33,000
They'll take out the luggage from here.

280
00:16:33,000 --> 00:16:37,000
So this is with respect to all the human beings who are specifically using it.

281
00:16:37,000 --> 00:16:49,000
Similarly, in LSTM RNN, there will be this long term memory. Long term memory.

282
00:16:49,000 --> 00:17:00,000
And the main work of this particular memory is to just make sure, add what context is required and

283
00:17:00,000 --> 00:17:03,000
remove what context is not required okay.

284
00:17:03,000 --> 00:17:10,000
So in this particular case, let's say if I take this particular example, uh, I grew up in India.

285
00:17:10,000 --> 00:17:12,000
I speak fluent this.

286
00:17:12,000 --> 00:17:13,000
Right.

287
00:17:13,000 --> 00:17:20,000
So here you'll be able to see when we pass with respect to time XI1XI2XI3.

288
00:17:20,000 --> 00:17:25,000
Like this when we go ahead and pass as soon as we talk about India.

289
00:17:25,000 --> 00:17:26,000
Right.

290
00:17:26,000 --> 00:17:28,000
Uh, let's say the word over there is India.

291
00:17:28,000 --> 00:17:30,000
And we need to predict what is the language that I speak.

292
00:17:30,000 --> 00:17:31,000
Okay.

293
00:17:31,000 --> 00:17:37,000
So whenever till the context is, see this?

294
00:17:37,000 --> 00:17:42,000
This, this memory cell will make sure that it will maintain the context till whenever it is required.

295
00:17:42,000 --> 00:17:45,000
Whatever information is specifically required, it will store over here.

296
00:17:45,000 --> 00:17:48,000
If it is not required, it will be removed from there.
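
The "store what is required, remove what is not" idea can be sketched with the standard LSTM cell-state update, new memory = forget * old memory + write * candidate. The video has not introduced the gates yet, so the names `forget`, `write`, and `candidate` and the hand-picked values below are assumptions for illustration only. In a real LSTM these values are computed by learned gates.

```python
def update_cell_state(c_prev, forget, write, candidate):
    """Standard LSTM cell-state update for one scalar memory slot.

    forget: value in [0, 1]; 1 keeps the old memory, 0 erases it.
    write:  value in [0, 1]; how much of the new candidate to store.
    """
    return forget * c_prev + write * candidate


c = 0.0

# "India" arrives: store the context (hypothetical encoding 1.0).
c = update_cell_state(c, forget=1.0, write=1.0, candidate=1.0)
after_india = c

# Many unrelated words follow: keep the memory and write nothing new.
for _ in range(50):
    c = update_cell_state(c, forget=1.0, write=0.0, candidate=0.3)
still_remembered = c

# The context is no longer needed: erase it.
c = update_cell_state(c, forget=0.0, write=0.0, candidate=0.0)

print(after_india, still_remembered, c)
```

Because the old memory is carried forward by a multiplication with a value near 1 instead of being squashed through tanh at every step, the "India" context can survive a long gap, much like luggage staying on the belt until someone picks it up.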

297
00:17:48,000 --> 00:17:48,000
Right.

298
00:17:48,000 --> 00:17:51,000
That is what is the main importance of memory cell.

299
00:17:51,000 --> 00:17:59,000
So that if I really want to predict this particular word, this India context will be saved in this

300
00:17:59,000 --> 00:18:00,000
long term memory, right?

301
00:18:00,000 --> 00:18:00,000
Right.

302
00:18:00,000 --> 00:18:03,000
Even though the information is not there in short term memory.

303
00:18:03,000 --> 00:18:05,000
See in short term memory.

304
00:18:05,000 --> 00:18:09,000
If I probably consider the same sentence, first of all, I will go ahead and give my word.

305
00:18:09,000 --> 00:18:13,000
"I", then "grew", then "up", right, "in India".

306
00:18:13,000 --> 00:18:16,000
Now once we went till India over here some other sentence came.

307
00:18:16,000 --> 00:18:18,000
So the context switch has actually happened.

308
00:18:18,000 --> 00:18:19,000
So what will happen?

309
00:18:19,000 --> 00:18:21,000
We'll forget all this information.

310
00:18:21,000 --> 00:18:21,000
Right.

311
00:18:21,000 --> 00:18:27,000
But I know that we are going to go ahead and probably this sentence is going to come.

312
00:18:27,000 --> 00:18:29,000
I speak fluent in so and so.

313
00:18:29,000 --> 00:18:29,000
Right.

314
00:18:29,000 --> 00:18:31,000
So what memory cell is going to do?

315
00:18:31,000 --> 00:18:40,000
It is going to save this India word, or it is going to save this context in the memory cell.

316
00:18:40,000 --> 00:18:40,000
Right.

317
00:18:40,000 --> 00:18:49,000
So finally when it comes like I talk in whatever language it is, it will go in and check in the memory

318
00:18:49,000 --> 00:18:51,000
cell what all information is specifically there, right.

319
00:18:52,000 --> 00:18:55,000
And if that context is available based on that we will do the prediction.

320
00:18:55,000 --> 00:18:56,000
Right.

321
00:18:56,000 --> 00:18:59,000
So this is just like a library of books, okay.

322
00:18:59,000 --> 00:19:01,000
And obviously we as a human being right.

323
00:19:01,000 --> 00:19:06,000
If I tell you hey um, let's say I am learning, I'm studying something.

324
00:19:06,000 --> 00:19:08,000
You know, after some time I forget something.

325
00:19:08,000 --> 00:19:12,000
So what I do, if I want to probably refer that I will remove the book from this particular shelf,

326
00:19:12,000 --> 00:19:14,000
and I will try to go ahead and revise it.

327
00:19:14,000 --> 00:19:15,000
Okay.

328
00:19:15,000 --> 00:19:18,000
And then whenever I do not require, I may forget it.

329
00:19:18,000 --> 00:19:18,000
Right?

330
00:19:18,000 --> 00:19:21,000
I may just say remove from my memory because it is not important.

331
00:19:21,000 --> 00:19:24,000
But when exam day comes, I really need to remember all these things.

332
00:19:24,000 --> 00:19:29,000
So this is also like my brain cells which we also call as our memory cells.

333
00:19:29,000 --> 00:19:32,000
Till what extent we can basically remember.

334
00:19:32,000 --> 00:19:32,000
Right.

335
00:19:32,000 --> 00:19:39,000
And sometimes, uh, based on the further context, based on the exams, based on the test.

336
00:19:39,000 --> 00:19:41,000
I need to know like how much things I really need to remember, right?

337
00:19:41,000 --> 00:19:45,000
So that is the entire funda about LSTM RNN.

338
00:19:45,000 --> 00:19:46,000
I've just given you an idea.

339
00:19:46,000 --> 00:19:49,000
So this is basically called as a memory cell.

340
00:19:50,000 --> 00:19:55,000
I will be discussing more about it because in the next upcoming videos we will break down this entire

341
00:19:55,000 --> 00:20:00,000
architecture and we'll see how it is probably solving this long term dependency.

342
00:20:00,000 --> 00:20:01,000
Right.

343
00:20:01,000 --> 00:20:06,000
So I hope, uh, you understood this specific video.

344
00:20:06,000 --> 00:20:08,000
Uh, this was it from my side.

345
00:20:08,000 --> 00:20:12,000
So in this video we have discussed about, uh, LSTM.

346
00:20:12,000 --> 00:20:13,000
What is the problem over here?

347
00:20:13,000 --> 00:20:15,000
Why LSTM RNN?

348
00:20:15,000 --> 00:20:16,000
Okay.

349
00:20:16,000 --> 00:20:22,000
And uh, we have also seen the basic representation of both LSTM and RNN.

350
00:20:23,000 --> 00:20:27,000
Now in my next video I am going to talk about how LSTM RNN works.

351
00:20:27,000 --> 00:20:30,000
I gave you an idea that, okay, we will be having a long term memory.

352
00:20:30,000 --> 00:20:32,000
Separately, we'll be having a short term memory.

353
00:20:32,000 --> 00:20:37,000
Whichever information I need to store for a longer period of time, I'll put that information in the

354
00:20:37,000 --> 00:20:38,000
long term memory.

355
00:20:38,000 --> 00:20:40,000
Whichever I do not require, I'll remove from the long term memory.

356
00:20:40,000 --> 00:20:41,000
Okay.

357
00:20:41,000 --> 00:20:44,000
And whenever I need to reuse it, I will go ahead and reuse it.

358
00:20:44,000 --> 00:20:44,000
Okay.

359
00:20:44,000 --> 00:20:50,000
So, uh, yes, let's go ahead and discuss about this in the next video where we'll be talking more

360
00:20:50,000 --> 00:20:52,000
about how our LSTM RNN works.

361
00:20:52,000 --> 00:20:52,000
Thank you.

