1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:04,000
So we are going to continue the discussion with respect to LSTM RNN.

3
00:00:04,000 --> 00:00:09,000
Now let's go ahead and first understand the entire architecture.

4
00:00:09,000 --> 00:00:15,000
Then we will break this down topic by topic and try to understand how an LSTM RNN works.

5
00:00:15,000 --> 00:00:21,000
Okay, so the basic architecture representation which we saw in the previous video was this one.

6
00:00:21,000 --> 00:00:21,000
Okay.

7
00:00:21,000 --> 00:00:28,000
And here you can see x of t. x of t is nothing but the input which

8
00:00:28,000 --> 00:00:29,000
is passed with respect to time.

9
00:00:29,000 --> 00:00:30,000
Like t is equal to one.

10
00:00:30,000 --> 00:00:32,000
One word is passed; at t is equal to two, another word is passed.

11
00:00:33,000 --> 00:00:37,000
Uh, if I probably expand this architecture.

12
00:00:37,000 --> 00:00:39,000
So we go to this particular next stage.

13
00:00:39,000 --> 00:00:47,000
The entire LSTM RNN has these three important gates, okay.

14
00:00:47,000 --> 00:00:49,000
One is the forget gate.

15
00:00:49,000 --> 00:00:50,000
Okay.

16
00:00:50,000 --> 00:00:55,000
Now if you tell me which part is basically the forget gate, I will probably say from here to here.

17
00:00:55,000 --> 00:00:59,000
So if I divide this into three layers, let's say this is my first layer.

18
00:01:00,000 --> 00:01:00,000
Okay.

19
00:01:01,000 --> 00:01:03,000
I'll not say layer but first part.

20
00:01:03,000 --> 00:01:03,000
Okay.

21
00:01:03,000 --> 00:01:06,000
So this entire part is all about forget gate.

22
00:01:06,000 --> 00:01:13,000
Then the next important part that I have is something called the input gate and candidate memory.

23
00:01:13,000 --> 00:01:13,000
Okay.

24
00:01:13,000 --> 00:01:14,000
So this is my second part.

25
00:01:14,000 --> 00:01:17,000
And the third part is basically output gate.

26
00:01:17,000 --> 00:01:24,000
So a single hidden layer neuron, with respect to our LSTM RNN, has these

27
00:01:24,000 --> 00:01:26,000
three important gates.

28
00:01:26,000 --> 00:01:27,000
One is the forget gate.

29
00:01:27,000 --> 00:01:30,000
Then you have the input gate and candidate memory.

30
00:01:30,000 --> 00:01:32,000
And then you have this output gate okay.

31
00:01:33,000 --> 00:01:36,000
Now let's go ahead and discuss more about this.

32
00:01:36,000 --> 00:01:39,000
So here let's discuss all of these parameters, okay.

33
00:01:39,000 --> 00:01:47,000
So here, in the first instance, I am giving my input x of t over here, again with respect to the time step,

34
00:01:47,000 --> 00:01:53,000
I will also be giving h of t minus one, which is the hidden state of the previous hidden neuron, right?

35
00:01:53,000 --> 00:01:57,000
Uh, when I say previous hidden neuron, let's consider this as my neuron.

36
00:01:57,000 --> 00:01:57,000
Right.

37
00:01:58,000 --> 00:02:03,000
So, you know, with respect to neuron, uh, in our case, right.

38
00:02:03,000 --> 00:02:06,000
A simple RNN provides one feedback loop, right?

39
00:02:07,000 --> 00:02:13,000
But in the case of LSTM RNN, we will also have another feedback loop which will act as a long term

40
00:02:13,000 --> 00:02:14,000
memory.

41
00:02:14,000 --> 00:02:20,000
This is basically a short term memory, short term memory.

42
00:02:21,000 --> 00:02:23,000
And this is nothing.

43
00:02:23,000 --> 00:02:25,000
But this is a long term memory.

44
00:02:26,000 --> 00:02:28,000
Long term memory.

45
00:02:29,000 --> 00:02:34,000
So when I say h of t minus one, this is nothing but the hidden state with respect to the short-term

46
00:02:34,000 --> 00:02:37,000
memory of the previous time step.

47
00:02:37,000 --> 00:02:38,000
Okay.

48
00:02:38,000 --> 00:02:41,000
So here you can see I'm giving input x.

49
00:02:41,000 --> 00:02:43,000
Then I'm giving h of t minus one.

50
00:02:43,000 --> 00:02:46,000
That is nothing but the previous hidden state.

51
00:02:46,000 --> 00:02:47,000
Then these are both combined.

52
00:02:47,000 --> 00:02:50,000
Then we pass it to the sigmoid activation function.

53
00:02:50,000 --> 00:02:52,000
And all this kind of operations will happen.

54
00:02:52,000 --> 00:02:54,000
Okay, we'll discuss one by one and what exactly it is.

55
00:02:54,000 --> 00:02:54,000
Okay.

56
00:02:55,000 --> 00:03:00,000
Now, before this, with respect to any diagram that we see, right, there is some notation

57
00:03:00,000 --> 00:03:02,000
that is specifically used over here.

58
00:03:02,000 --> 00:03:04,000
So let's consider this okay.

59
00:03:04,000 --> 00:03:07,000
This is basically a neural network I will talk about.

60
00:03:07,000 --> 00:03:11,000
See, whenever you find something like this: here sigmoid is there, here tanh is there,

61
00:03:11,000 --> 00:03:12,000
Here sigmoid is there.

62
00:03:12,000 --> 00:03:16,000
This is basically called a neural network layer.

63
00:03:16,000 --> 00:03:16,000
Okay.

64
00:03:16,000 --> 00:03:19,000
So this is specifically called a neural network layer.

65
00:03:19,000 --> 00:03:21,000
What does the neural network layer basically mean?

66
00:03:21,000 --> 00:03:22,000
Nothing.

67
00:03:22,000 --> 00:03:29,000
Here you can just consider that it is one of my hidden layers, which has

68
00:03:29,000 --> 00:03:30,000
some neurons over here.

69
00:03:30,000 --> 00:03:35,000
And in all these neurons we have applied a sigmoid activation function.

70
00:03:35,000 --> 00:03:38,000
So I have just applied a sigmoid activation function.

71
00:03:38,000 --> 00:03:40,000
And here we are giving the input, okay.

72
00:03:40,000 --> 00:03:43,000
So if I probably consider this as an example.

73
00:03:43,000 --> 00:03:46,000
So here I'm taking the input x of t.

74
00:03:46,000 --> 00:03:48,000
And there is also one more input, h of t minus one.

75
00:03:48,000 --> 00:03:49,000
We are giving both of them.

76
00:03:49,000 --> 00:03:53,000
First of all we are combining this and then we are giving it okay.

77
00:03:53,000 --> 00:04:00,000
So if I probably want to write this particular diagram in a better way so I can say, hey, uh, this

78
00:04:00,000 --> 00:04:05,000
is my h of t minus one, this is my x of t. I will take these inputs.

79
00:04:05,000 --> 00:04:09,000
I'll combine them together and then I will send them to my...

80
00:04:11,000 --> 00:04:14,000
I'll send it to my hidden neuron.

81
00:04:15,000 --> 00:04:23,000
I'll send it to my hidden layer, which will specifically have these hidden nodes.

82
00:04:23,000 --> 00:04:25,000
And on top of that I'm applying an activation function.

83
00:04:25,000 --> 00:04:28,000
This is what it looks like if I probably expand this.

84
00:04:28,000 --> 00:04:35,000
So over here, this entire thing, when we say we have this kind of notation, right, this is nothing,

85
00:04:35,000 --> 00:04:37,000
but this is this notation.

86
00:04:37,000 --> 00:04:43,000
It is basically a neural network layer.

87
00:04:44,000 --> 00:04:47,000
We will discuss more about it as I keep on expanding.
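
To make the neural network layer notation concrete, here is a minimal sketch in plain Python. It assumes the layer simply concatenates h of t minus one and x of t, takes a weighted sum for each hidden node, and applies sigmoid. The function name `sigmoid_layer` and all weights, biases, and input values are made up for illustration; they are not from the diagram.

```python
import math


def sigmoid(z):
    """Squash a number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_layer(h_prev, x_t, weights, biases):
    """One 'neural network layer' box: concatenate, weighted sum, sigmoid.

    weights holds one row per hidden node; each row has one weight per
    entry of the concatenated input. All values here are hypothetical.
    """
    combined = h_prev + x_t  # list concatenation, not addition
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * v for w, v in zip(row, combined)) + bias
        outputs.append(sigmoid(z))
    return outputs


# Illustrative values only: a 2-dimensional hidden state and input.
h_prev = [0.5, -0.5]
x_t = [1.0, 2.0]
weights = [
    [0.0, 0.0, 0.0, 0.0],  # node 1 ignores its input, so sigmoid(0) = 0.5
    [1.0, 1.0, 1.0, 1.0],  # node 2 sums the four inputs: sigmoid(3.0)
]
biases = [0.0, 0.0]

layer_out = sigmoid_layer(h_prev, x_t, weights, biases)
print(layer_out)
```

Every output of a sigmoid layer lies between 0 and 1, which is why the diagram uses these boxes wherever something has to act like a gate.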

88
00:04:47,000 --> 00:04:52,000
And once we go ahead and understand each and every gate, we will be able to understand.

89
00:04:52,000 --> 00:04:57,000
Then the second one is that you'll be able to see these pink-colored operations, the pink-colored operations

90
00:04:57,000 --> 00:04:58,000
that you have over here.

91
00:04:58,000 --> 00:05:00,000
Let's take that particular color.

92
00:05:00,000 --> 00:05:04,000
I will try to find the pink color over here okay.

93
00:05:05,000 --> 00:05:07,000
So light pink okay.

94
00:05:07,000 --> 00:05:08,000
Dark purple.

95
00:05:09,000 --> 00:05:10,000
Pink.

96
00:05:10,000 --> 00:05:10,000
Pink.

97
00:05:10,000 --> 00:05:11,000
Pink.

98
00:05:11,000 --> 00:05:11,000
Pink.

99
00:05:11,000 --> 00:05:11,000
Pink.

100
00:05:11,000 --> 00:05:12,000
Pink.

101
00:05:12,000 --> 00:05:13,000
Pink pink pink okay.

102
00:05:13,000 --> 00:05:15,000
Let's go ahead and do this.

103
00:05:15,000 --> 00:05:18,000
So let's say there is one operation which is like this.

104
00:05:19,000 --> 00:05:20,000
This is nothing.

105
00:05:20,000 --> 00:05:23,000
But this is called a pointwise multiplication operation.

106
00:05:23,000 --> 00:05:28,000
And if I have this kind of operation, this is nothing but the pointwise addition operation.

107
00:05:28,000 --> 00:05:28,000
Okay.

108
00:05:28,000 --> 00:05:34,000
And one more that you will be able to see is something called tanh.

109
00:05:35,000 --> 00:05:37,000
So this is nothing but the pointwise tanh operation.

110
00:05:37,000 --> 00:05:39,000
Now what does this basically mean?

111
00:05:39,000 --> 00:05:46,000
Let's say I have two vectors: one vector is one, two, three and the other vector is four, five,

112
00:05:46,000 --> 00:05:47,000
six.

113
00:05:48,000 --> 00:05:53,000
Let's consider that this is my x of t, or this is my h of t minus one.

114
00:05:53,000 --> 00:05:53,000
Or let.

115
00:05:53,000 --> 00:05:56,000
Let's just consider these as vectors v one and v two.

116
00:05:57,000 --> 00:06:03,000
If I say pointwise multiplication, that basically means we are just going to take each and every number

117
00:06:04,000 --> 00:06:06,000
position wise, and we are just going to multiply it.

118
00:06:07,000 --> 00:06:09,000
This is basically called pointwise multiplication.

119
00:06:10,000 --> 00:06:17,000
So if I perform pointwise multiplication here, I am specifically going to

120
00:06:17,000 --> 00:06:26,000
get one vector, which will be the multiplication

121
00:06:26,000 --> 00:06:31,000
of one multiplied by four, two multiplied by five, and three multiplied by six.

122
00:06:31,000 --> 00:06:34,000
Okay, so this is nothing but four, ten, and eighteen.

123
00:06:34,000 --> 00:06:42,000
If I talk about pointwise addition, that basically means my output will be nothing but four plus one,

124
00:06:42,000 --> 00:06:45,000
which is five; five plus two, which is seven; and six plus three, which is nine.

125
00:06:45,000 --> 00:06:51,000
Now suppose I apply tanh as my pointwise operation.

126
00:06:51,000 --> 00:06:59,000
So here you'll see that we will just take one vector; let's say

127
00:06:59,000 --> 00:07:01,000
for this particular vector I'm applying tanh.

128
00:07:02,000 --> 00:07:05,000
So in short we are doing tanh of one.

129
00:07:05,000 --> 00:07:08,000
Then we have tanh of two.

130
00:07:08,000 --> 00:07:10,000
And then we will go ahead and calculate tanh of three.

131
00:07:10,000 --> 00:07:12,000
So this is what we are specifically doing.

132
00:07:12,000 --> 00:07:13,000
And this is nothing.

133
00:07:13,000 --> 00:07:16,000
But it is basically called a pointwise operation.

134
00:07:16,000 --> 00:07:16,000
Okay.
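
As a quick check of the three pointwise operations just described, here is a small sketch in plain Python using the same vectors v one and v two from the example. The variable names are only for illustration.

```python
import math

v1 = [1, 2, 3]
v2 = [4, 5, 6]

# Pointwise multiplication: multiply the numbers position by position.
product = [a * b for a, b in zip(v1, v2)]

# Pointwise addition: add the numbers position by position.
total = [a + b for a, b in zip(v1, v2)]

# Pointwise tanh: apply tanh to each number of one vector separately.
squashed = [math.tanh(a) for a in v1]

print(product)   # [4, 10, 18]
print(total)     # [5, 7, 9]
print(squashed)  # every value lies strictly between -1 and 1
```

Notice that every pointwise operation keeps the vector length the same: three numbers in, three numbers out.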

135
00:07:16,000 --> 00:07:21,000
So I hope you got an idea with respect to pointwise operations. Any arrows that you will be seeing

136
00:07:21,000 --> 00:07:22,000
over here, this is nothing.

137
00:07:22,000 --> 00:07:24,000
But this is called vector transfer.

138
00:07:24,000 --> 00:07:25,000
We are just transferring the vector.

139
00:07:25,000 --> 00:07:30,000
So let's say in this line we are just transferring the vector from one state to the other state okay.

140
00:07:30,000 --> 00:07:34,000
Then similarly you'll be able to see concatenate.

141
00:07:34,000 --> 00:07:36,000
What does concatenate mean?

142
00:07:36,000 --> 00:07:38,000
So let's take this as an example, okay.

143
00:07:38,000 --> 00:07:46,000
So here you can see I have this scenario where h of t minus one and x of t are there.

144
00:07:46,000 --> 00:07:48,000
And these are basically getting concatenated.

145
00:07:48,000 --> 00:07:56,000
Concatenating is like combining two vectors.

146
00:07:56,000 --> 00:08:02,000
It's not like adding two vectors; I'm trying to combine them. Take my h of t minus one.

147
00:08:02,000 --> 00:08:11,000
It is represented by a three-dimensional vector like this: one, two, three, and my x of t is a three-dimensional vector:

148
00:08:11,000 --> 00:08:12,000
two, three, four.

149
00:08:13,000 --> 00:08:18,000
Now if I say I'm combining, that basically means see, this is what, in short, is basically happening

150
00:08:18,000 --> 00:08:25,000
when I'm combining: in short, I will just place these two vectors together like this.

151
00:08:25,000 --> 00:08:27,000
So this will be my h of t minus one.

152
00:08:27,000 --> 00:08:28,000
Okay.

153
00:08:29,000 --> 00:08:35,000
Along with this I will go ahead and use my other vector, x of t, right.

154
00:08:35,000 --> 00:08:40,000
And then I will send this as an input to any other neurons.

155
00:08:40,000 --> 00:08:40,000
Right.

156
00:08:40,000 --> 00:08:42,000
So this is what we are specifically doing.

157
00:08:42,000 --> 00:08:46,000
So when we say combining, we are combining them together.

158
00:08:46,000 --> 00:08:48,000
And then we are sending it to the hidden layer.

159
00:08:48,000 --> 00:08:51,000
Let's say this is my hidden layer okay.

160
00:08:51,000 --> 00:08:52,000
This is what we are trying to do okay.

161
00:08:52,000 --> 00:08:55,000
When we say combining two vectors right.

162
00:08:55,000 --> 00:08:57,000
And that is nothing but concatenate okay.
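
Here is a small sketch in plain Python of the difference between concatenating and adding, using the same two three-dimensional vectors from the example. The variable names are only for illustration.

```python
h_prev = [1, 2, 3]  # h of t minus one from the example
x_t = [2, 3, 4]     # x of t from the example

# Concatenate: place the two vectors one after the other.
combined = h_prev + x_t

# Pointwise addition, shown only for contrast: the length stays at three.
added = [a + b for a, b in zip(h_prev, x_t)]

print(combined)  # [1, 2, 3, 2, 3, 4]
print(added)     # [3, 5, 7]
```

So concatenating two three-dimensional vectors gives one six-dimensional vector, and that longer vector is what goes into the hidden layer.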

163
00:08:57,000 --> 00:09:01,000
Now similarly, copy basically means we can duplicate a vector.

164
00:09:01,000 --> 00:09:03,000
Like, let's say there is one vector.

165
00:09:03,000 --> 00:09:09,000
Then we can also make a duplicate copy of it, okay.

166
00:09:09,000 --> 00:09:12,000
So these are some of the operations that we will be seeing.

167
00:09:12,000 --> 00:09:15,000
This is with respect to the LSTM architecture, okay.

168
00:09:15,000 --> 00:09:20,000
Now let's go ahead and first of all discuss some of the important points, okay.

169
00:09:20,000 --> 00:09:20,000
Okay.

170
00:09:20,000 --> 00:09:23,000
Now the first line that you will be seeing on top of it.

171
00:09:23,000 --> 00:09:23,000
Right.

172
00:09:23,000 --> 00:09:29,000
This line is basically called the memory cell.

173
00:09:29,000 --> 00:09:35,000
So here I will just go ahead and write this as memory cell.

174
00:09:35,000 --> 00:09:37,000
Now what does memory cell basically mean?

175
00:09:37,000 --> 00:09:47,000
This is for my long-term memory. Whatever information I want, I will be saving inside this.

176
00:09:47,000 --> 00:09:50,000
Whatever information I don't want, I'll be removing from this.

177
00:09:50,000 --> 00:09:52,000
That is the main task of this memory cell.

178
00:09:52,000 --> 00:09:55,000
Okay, till now, just get to know about this.

179
00:09:55,000 --> 00:10:05,000
C of t minus one is nothing but the memory cell state of the previous time step, and C of t

180
00:10:05,000 --> 00:10:06,000
is nothing.

181
00:10:06,000 --> 00:10:13,000
But after we are removing some context and adding some context,

182
00:10:14,000 --> 00:10:20,000
whatever the state of this memory cell is after that, it will be C of t

183
00:10:20,000 --> 00:10:22,000
for that particular time step.

184
00:10:22,000 --> 00:10:22,000
Okay.
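
As a rough sketch of "removing some context and adding some context", the standard LSTM cell update multiplies the old cell state pointwise by a forget vector and then adds, pointwise, an input vector times a candidate vector. The gate values below are made up just to show the arithmetic; in a real LSTM they come out of the sigmoid and tanh layers that the later videos cover, and the function name `update_cell_state` is hypothetical.

```python
def update_cell_state(c_prev, forget, inp, candidate):
    """Standard LSTM cell update: c_t = forget * c_prev + inp * candidate.

    Every operation is pointwise, so all four vectors have the same length.
    """
    return [
        f * c + i * g
        for f, c, i, g in zip(forget, c_prev, inp, candidate)
    ]


# Illustrative values only, not computed by any trained network.
c_prev = [2.0, -1.0, 0.5]      # C of t minus one, the previous cell state
forget = [1.0, 0.0, 0.5]       # 1 keeps a value, 0 removes it
inp = [0.0, 1.0, 0.5]          # how much new context to let in
candidate = [0.9, 0.3, -0.4]   # the new context on offer

c_t = update_cell_state(c_prev, forget, inp, candidate)
print(c_t)  # approximately [2.0, 0.3, 0.05]
```

In the first position everything old is kept and nothing new is added, in the second the old value is removed completely and replaced, and the third is a mix of both.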

185
00:10:23,000 --> 00:10:26,000
So I hope you are able to understand these basic things.

186
00:10:26,000 --> 00:10:32,000
Now it's time to understand how this forget gate works, how the input gate and candidate memory work,

187
00:10:32,000 --> 00:10:36,000
how this output gate works, and how the entire LSTM RNN works.

188
00:10:36,000 --> 00:10:38,000
We'll be discussing that.

189
00:10:38,000 --> 00:10:46,000
Okay, so the first gate that we are going to discuss is something called the forget gate.

190
00:10:48,000 --> 00:10:48,000
Okay.

191
00:10:50,000 --> 00:10:53,000
And this is what we are going to discuss in our next video.

192
00:10:53,000 --> 00:10:55,000
So we will take this as an example.

193
00:10:55,000 --> 00:10:57,000
And here I will break down each and every thing.

194
00:10:57,000 --> 00:10:59,000
How does the forget gate work?

195
00:10:59,000 --> 00:11:03,000
So in short, in our next video we will be discussing this stage.

196
00:11:04,000 --> 00:11:04,000
Right?

197
00:11:04,000 --> 00:11:10,000
This stage which is highlighted over here and here also, you should be able to see that I am giving

198
00:11:10,000 --> 00:11:11,000
this information.

199
00:11:11,000 --> 00:11:13,000
This should also be considered okay.

200
00:11:13,000 --> 00:11:16,000
This entire operation should be considered, and this is my C of t minus one.

201
00:11:17,000 --> 00:11:17,000
Okay.

202
00:11:17,000 --> 00:11:20,000
As I said, h of t minus one is the hidden state.

203
00:11:20,000 --> 00:11:23,000
So let me just go ahead and write some of the notation.

204
00:11:23,000 --> 00:11:32,000
h of t minus one is the hidden state of the previous time step.

205
00:11:33,000 --> 00:11:34,000
Let's say t is equal to one.

206
00:11:34,000 --> 00:11:37,000
And right now we are working in t is equal to two.

207
00:11:37,000 --> 00:11:47,000
x of t is nothing but the word passed as input in the current time step.

208
00:11:50,000 --> 00:11:52,000
Current time step, okay.

209
00:11:52,000 --> 00:11:54,000
And here we are concatenating it okay.

210
00:11:54,000 --> 00:11:58,000
Now let's go ahead and discuss this forget gate in our next video.

211
00:11:59,000 --> 00:12:05,000
But till here I think you got a basic idea about the entire LSTM architecture and some of the important

212
00:12:05,000 --> 00:12:07,000
points which are mentioned in this diagram.

213
00:12:07,000 --> 00:12:09,000
So yes, this was it from my side.

214
00:12:09,000 --> 00:12:10,000
I'll see you in the next video.

215
00:12:10,000 --> 00:12:10,000
Thank you.

