1
00:00:00,000 --> 00:00:01,000
Hello guys.

2
00:00:01,000 --> 00:00:04,000
So we are going to continue the discussion of LSTM RNNs.

3
00:00:04,000 --> 00:00:08,000
And in this video we are going to discuss the input gate and candidate memory.

4
00:00:08,000 --> 00:00:14,000
Now, one thing that I missed in the previous video is talking about this entire operation.

5
00:00:14,000 --> 00:00:14,000
Right?

6
00:00:14,000 --> 00:00:18,000
When we are calculating f of t, what does this equation basically mean?

7
00:00:18,000 --> 00:00:18,000
Right?

8
00:00:19,000 --> 00:00:23,000
I have explained everything with respect to how it works, but let's go ahead and write

9
00:00:23,000 --> 00:00:24,000
this equation.

10
00:00:24,000 --> 00:00:31,000
So first of all, you'll be able to see that I'm combining, that is, concatenating,

11
00:00:31,000 --> 00:00:34,000
x of t and h of t minus one.

12
00:00:34,000 --> 00:00:34,000
Right.

13
00:00:34,000 --> 00:00:41,000
So this way we can go ahead and write like how it is written over here h t minus one comma x of t.

14
00:00:41,000 --> 00:00:43,000
That basically means we are just combining all the inputs.

15
00:00:43,000 --> 00:00:44,000
Right.

16
00:00:44,000 --> 00:00:45,000
Like how we combined over here.

17
00:00:46,000 --> 00:00:52,000
Then once we combine this, then you know that we are just going to have some kind of weights.

18
00:00:52,000 --> 00:00:56,000
So the weights included over here are W of f, which is given over here.

19
00:00:56,000 --> 00:01:01,000
So we are just going to multiply by W of f as my weights, okay.

20
00:01:01,000 --> 00:01:05,000
And along with this you'll also be seeing that in my neurons right.

21
00:01:05,000 --> 00:01:08,000
I will also be adding a bias in the neural network.

22
00:01:08,000 --> 00:01:11,000
So here you will have plus b of f, right.

23
00:01:11,000 --> 00:01:15,000
And then finally you will be able to see that I'm applying a sigmoid activation function.

24
00:01:15,000 --> 00:01:16,000
Right.

25
00:01:16,000 --> 00:01:19,000
So this is how the entire operation has basically taken place.

26
00:01:19,000 --> 00:01:21,000
And this is how the notation looks like.

27
00:01:21,000 --> 00:01:21,000
Okay.
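
For reference, the forget gate operation described above can be written as a single equation. This is the standard LSTM notation; the symbol names follow what is said in the video (W of f, b of f), and the square brackets denote concatenation of the previous hidden state with the current input.

```latex
f_t = \sigma\left( W_f \cdot [h_{t-1}, x_t] + b_f \right)
```

Here \sigma is the sigmoid activation, so every element of f_t lies between 0 and 1.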

28
00:01:21,000 --> 00:01:26,000
I hope this is pretty much clear with respect to the explanation I have given. I think it should be very

29
00:01:26,000 --> 00:01:27,000
easy to understand.

30
00:01:27,000 --> 00:01:32,000
Now let's go ahead and understand the input gate and candidate memory.

31
00:01:32,000 --> 00:01:37,000
Right. Now, the input gate in this particular diagram is nothing but this specific gate where we are

32
00:01:37,000 --> 00:01:41,000
combining x of t and the hidden state h of t minus one, and passing it to this particular neural network.

33
00:01:41,000 --> 00:01:42,000
Okay.

34
00:01:42,000 --> 00:01:47,000
Now here, when we pass it to this particular neural network, it is the same thing as what we have actually

35
00:01:47,000 --> 00:01:49,000
done over here, right.

36
00:01:49,000 --> 00:01:50,000
How we have actually done over here.

37
00:01:50,000 --> 00:01:57,000
And instead of calculating f of t, we are specifically getting, let's consider it over here as

38
00:01:57,000 --> 00:01:57,000
I of t.

39
00:01:57,000 --> 00:01:58,000
Okay.

40
00:01:58,000 --> 00:02:00,000
So this is nothing but I of t okay.

41
00:02:00,000 --> 00:02:03,000
So we know how I of t is basically getting calculated.

42
00:02:03,000 --> 00:02:05,000
But let's talk about this tanh, right.

43
00:02:05,000 --> 00:02:08,000
Now in this case of tanh, what will specifically happen?

44
00:02:09,000 --> 00:02:13,000
For this i of t, let's say that I am actually getting some vector.

45
00:02:14,000 --> 00:02:20,000
Okay, for i of t, let's say I'm getting a three-dimensional vector like two, four, six.

46
00:02:21,000 --> 00:02:21,000
Okay.

47
00:02:22,000 --> 00:02:27,000
Now with respect to this i of t over here, you can see the operation is almost the same as how we did it

48
00:02:27,000 --> 00:02:28,000
for the forget gate.

49
00:02:28,000 --> 00:02:32,000
The similar operation will be over here where I will be having all my inputs.

50
00:02:32,000 --> 00:02:35,000
Let's say three, four, five, six, seven.

51
00:02:35,000 --> 00:02:39,000
So this is my h of t minus one.

52
00:02:39,000 --> 00:02:43,000
And this is specifically my x of T okay.

53
00:02:43,000 --> 00:02:50,000
Then in the next layer, I will just go ahead and pass it to my hidden layer, which will have these three neurons

54
00:02:50,000 --> 00:02:51,000
okay.

55
00:02:51,000 --> 00:02:54,000
And here we are also going to add.

56
00:02:54,000 --> 00:02:57,000
So there will be one bias that will be added over here.

57
00:02:57,000 --> 00:03:01,000
Now apart from that we apply a sigmoid activation function on top of it.

58
00:03:01,000 --> 00:03:03,000
If I want to just go ahead and apply it.

59
00:03:03,000 --> 00:03:06,000
So here I'm going to apply a sigmoid activation function.

60
00:03:06,000 --> 00:03:11,000
And finally I get my output which is nothing but I of t okay I of t.
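
The input gate steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `input_gate`, the weight values, and the bias name `b_i` are hypothetical and are not from the video; the sizes (3 hidden values, 4 input values, 3 neurons) follow the example being drawn.

```python
import math

def sigmoid(z):
    """Squash a real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def input_gate(h_prev, x_t, W_i, b_i):
    """Compute i_t = sigmoid([h_prev, x_t] . W_i + b_i), one value per neuron."""
    concat = h_prev + x_t  # list concatenation: 3 + 4 = 7 inputs
    i_t = []
    for j in range(len(b_i)):
        z = sum(concat[k] * W_i[k][j] for k in range(len(concat))) + b_i[j]
        i_t.append(sigmoid(z))
    return i_t

# Illustrative (made-up) values.
h_prev = [0.1, -0.2, 0.3]            # h_{t-1}, 3-dimensional
x_t = [1.0, 0.5, -1.0, 0.0]          # x_t, 4-dimensional
W_i = [[0.1 * (k - j) for j in range(3)] for k in range(7)]  # 7 x 3 weights
b_i = [0.0, 0.0, 0.0]                # one bias per neuron

i_t = input_gate(h_prev, x_t, W_i, b_i)
print(len(i_t))  # 3
```

Because of the sigmoid, each element of `i_t` lies strictly between 0 and 1, which is what lets it act as a gate.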

61
00:03:12,000 --> 00:03:16,000
Now let's go ahead and do the same thing for this candidate memory also.

62
00:03:16,000 --> 00:03:20,000
Now in candidate memory what we do is that again the same way.

63
00:03:20,000 --> 00:03:23,000
First of all, we will take our h of t minus one.

64
00:03:23,000 --> 00:03:25,000
Let's say it is three-dimensional.

65
00:03:25,000 --> 00:03:27,000
Then I have x of t minus one.

66
00:03:27,000 --> 00:03:30,000
Sorry, let's say this is x of t.

67
00:03:30,000 --> 00:03:32,000
It is four-dimensional.

68
00:03:32,000 --> 00:03:39,000
We take this entire thing, and here, this time, we are again using a hidden layer.

69
00:03:39,000 --> 00:03:41,000
Inside this hidden layer, let's consider that I'm using three hidden neurons.

70
00:03:41,000 --> 00:03:47,000
After this, I pass it to an activation function called tanh, okay.

71
00:03:47,000 --> 00:03:50,000
And here we add a bias okay.

72
00:03:50,000 --> 00:03:53,000
In short we are just combining each and everything.

73
00:03:53,000 --> 00:03:57,000
So this will basically get combined to this this this.

74
00:03:57,000 --> 00:04:02,000
And you can further draw this diagram I'll leave it up to you okay.

75
00:04:02,000 --> 00:04:09,000
Let's say I am just drawing part of the diagram so that you can see this thing. Right, now here also, and

76
00:04:09,000 --> 00:04:11,000
just connect all the other lines, okay.

77
00:04:11,000 --> 00:04:14,000
Here also, you'll be able to see that I'm passing one cross seven.

78
00:04:14,000 --> 00:04:18,000
This will be one cross seven.

79
00:04:18,000 --> 00:04:22,000
And this will be something like seven cross three.

80
00:04:22,000 --> 00:04:27,000
So the final one that I'm actually getting is one cross three after I apply tanh.

81
00:04:27,000 --> 00:04:34,000
So here also I am basically going to get my candidate, C tilde of t, which will again be a three-dimensional vector

82
00:04:34,000 --> 00:04:37,000
because this is what I'm actually getting one cross three.

83
00:04:37,000 --> 00:04:40,000
And let's say that I am getting some values like zero, one, zero.

84
00:04:40,000 --> 00:04:41,000
So again, a three-dimensional vector.
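
The shape reasoning above (a one cross seven input times a seven cross three weight matrix gives a one cross three output, followed by tanh) can be checked with a short Python sketch. The function name `candidate_memory`, the bias name `b_c`, and all numeric values are assumptions made for illustration.

```python
import math

def candidate_memory(h_prev, x_t, W_c, b_c):
    """Compute tanh([h_prev, x_t] . W_c + b_c), one value per neuron."""
    concat = h_prev + x_t             # 1 x 7 row vector (3 + 4 values)
    assert len(W_c) == len(concat)    # W_c must have 7 rows
    out = []
    for j in range(len(b_c)):         # 3 columns, one per hidden neuron
        z = sum(concat[k] * W_c[k][j] for k in range(len(concat))) + b_c[j]
        out.append(math.tanh(z))
    return out

# Illustrative (made-up) values.
h_prev = [0.1, -0.2, 0.3]                      # h_{t-1}, 3-dimensional
x_t = [1.0, 0.5, -1.0, 0.0]                    # x_t, 4-dimensional
W_c = [[0.2, -0.1, 0.05] for _ in range(7)]    # 7 x 3 weights
b_c = [0.0, 0.1, -0.1]                         # one bias per neuron

c_tilde = candidate_memory(h_prev, x_t, W_c, b_c)
print(len(c_tilde))  # 3
```

Since tanh is used instead of sigmoid, each element of `c_tilde` lies strictly between -1 and 1, so values such as zero, one, zero in the video are best read as rounded illustrations.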

85
00:04:42,000 --> 00:04:48,000
Now you need to understand that whenever we take this input gate along with this candidate memory, we

86
00:04:48,000 --> 00:04:50,000
have to combine them together.

87
00:04:50,000 --> 00:04:50,000
Right.

88
00:04:50,000 --> 00:04:57,000
So on whatever output I am getting from i of t and the candidate C tilde of t, we are performing another operation, which

89
00:04:57,000 --> 00:04:59,000
is called the pointwise multiplication operation.

90
00:05:00,000 --> 00:05:00,000
Okay.

91
00:05:00,000 --> 00:05:04,000
We are performing this pointwise multiplication operation which is basically mentioned over here.

92
00:05:05,000 --> 00:05:05,000
Right.

93
00:05:05,000 --> 00:05:08,000
So this is my pointwise multiplication operation.

94
00:05:09,000 --> 00:05:11,000
Now with respect to the pointwise multiplication operation.

95
00:05:11,000 --> 00:05:14,000
Why do we call this the input gate?

96
00:05:14,000 --> 00:05:19,000
This is called an input gate because from here, again based on the context.

97
00:05:19,000 --> 00:05:29,000
So here I'm just going to write it down: based on the context, if any new information needs to be added

98
00:05:31,000 --> 00:05:39,000
to the memory cell,

99
00:05:39,000 --> 00:05:43,000
the memory cell C of t minus one,

100
00:05:44,000 --> 00:05:49,000
then after this pointwise operation, the information will get added.

101
00:05:50,000 --> 00:05:54,000
The information will be added.

102
00:05:56,000 --> 00:05:58,000
So in short, forget gate.

103
00:05:58,000 --> 00:06:04,000
If you remember, with the forget gate, the first gate, when we do the multiplication operation over

104
00:06:04,000 --> 00:06:04,000
here.

105
00:06:04,000 --> 00:06:05,000
Pointwise multiplication.

106
00:06:05,000 --> 00:06:07,000
We are forgetting some information.

107
00:06:07,000 --> 00:06:10,000
And from this specific gate we are.

108
00:06:11,000 --> 00:06:13,000
We are adding information.

109
00:06:14,000 --> 00:06:19,000
And that is where I get my final cell state, which is called C of t.

110
00:06:19,000 --> 00:06:22,000
This is what we get as the final cell state, C of t.

111
00:06:22,000 --> 00:06:23,000
Okay.

112
00:06:23,000 --> 00:06:30,000
So this is the basic information about the input gate and candidate memory.

113
00:06:30,000 --> 00:06:35,000
Again, to explain it to you, it's very simple.

114
00:06:35,000 --> 00:06:40,000
When I do the pointwise operation, here I'm saying, hey, there is some new information

115
00:06:40,000 --> 00:06:42,000
that I really want to add.

116
00:06:42,000 --> 00:06:46,000
I can add it over here along with this particular information. With my forget gate,

117
00:06:46,000 --> 00:06:51,000
I know that just by using this forget gate, I can remove some of the information, and that is the entire

118
00:06:51,000 --> 00:06:54,000
context of using the input gate and candidate memory.

119
00:06:54,000 --> 00:06:55,000
Right?

120
00:06:55,000 --> 00:07:00,000
So I hope you are able to understand this and that it makes sense.

121
00:07:00,000 --> 00:07:01,000
Okay.

122
00:07:01,000 --> 00:07:03,000
So let's talk about these two operations.

123
00:07:03,000 --> 00:07:10,000
So first of all, you can see that with respect to x of t, h of t minus one, and the sigmoid,

124
00:07:10,000 --> 00:07:12,000
So we are using this one right.

125
00:07:13,000 --> 00:07:15,000
This is basically calculating I of t.

126
00:07:15,000 --> 00:07:21,000
Then, see, for this we are using a weight W of i.

127
00:07:21,000 --> 00:07:26,000
And for this I'm using another weight, W of C, okay.

128
00:07:26,000 --> 00:07:30,000
So for the tanh operation I'm specifically using W of C.

129
00:07:30,000 --> 00:07:32,000
So here you can see W of C with h of t minus one.

130
00:07:32,000 --> 00:07:33,000
We are combining it with x of t.

131
00:07:33,000 --> 00:07:36,000
And then for that we are using another bias.

132
00:07:36,000 --> 00:07:37,000
So this is also another neural network.

133
00:07:37,000 --> 00:07:38,000
This is also another set of neurons.

134
00:07:38,000 --> 00:07:40,000
So that is the reason I created this.

135
00:07:40,000 --> 00:07:41,000
Right, this one here.

136
00:07:41,000 --> 00:07:47,000
Here, the weights that I have will be W of i.

137
00:07:47,000 --> 00:07:51,000
And here, the weights that I have are nothing but W of C, and I'm combining them together.

138
00:07:51,000 --> 00:07:54,000
And finally I'm getting my candidate, C tilde of t, okay.
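
Written out in standard LSTM notation, the two operations just described look like this. The weight names W of i and W of C come from the video; the bias names b_i and b_C are assumed here, since the video only says that a separate bias is used for each network.

```latex
i_t = \sigma\left( W_i \cdot [h_{t-1}, x_t] + b_i \right)
\tilde{C}_t = \tanh\left( W_C \cdot [h_{t-1}, x_t] + b_C \right)
```

The tilde on \tilde{C}_t marks the candidate memory, to keep it distinct from the final cell state C_t.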

139
00:07:54,000 --> 00:08:00,000
But once we do this pointwise operation on both of them, that is, the input gate and the

140
00:08:00,000 --> 00:08:03,000
candidate memory that I have, then with this operation we can add

141
00:08:03,000 --> 00:08:06,000
any new information that I want to add.

142
00:08:06,000 --> 00:08:07,000
Right.

143
00:08:07,000 --> 00:08:11,000
You need to understand with respect to the candidate memory, what important thing it is doing.

144
00:08:11,000 --> 00:08:14,000
Any new information, based on the context, that needs to be added.

145
00:08:14,000 --> 00:08:17,000
We will just do a pointwise operation over here.

146
00:08:17,000 --> 00:08:17,000
Right?

147
00:08:17,000 --> 00:08:22,000
So let's say the candidate C tilde of t has this important context over here.

148
00:08:22,000 --> 00:08:22,000
Right.

149
00:08:22,000 --> 00:08:24,000
So I may probably put some new information.

150
00:08:24,000 --> 00:08:26,000
I may go ahead and write zero, two, zero.

151
00:08:26,000 --> 00:08:31,000
And when we do this pointwise operation here, you can actually see that this will become zero, this will

152
00:08:31,000 --> 00:08:33,000
become eight and this will become zero.

153
00:08:33,000 --> 00:08:39,000
So new information is basically getting added to my C of t minus one cell state, along with the previous

154
00:08:39,000 --> 00:08:41,000
state, and we are adding it up.

155
00:08:41,000 --> 00:08:42,000
This is also very much important.

156
00:08:42,000 --> 00:08:43,000
We are adding it up.

157
00:08:43,000 --> 00:08:44,000
Okay.
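
The pointwise multiplication from this example can be reproduced directly. The numbers are the ones used in the video (two, four, six and zero, two, zero); they are illustrative only, since real sigmoid outputs lie between 0 and 1 and real tanh outputs lie between -1 and 1. The helper name `pointwise_multiply` is hypothetical.

```python
def pointwise_multiply(a, b):
    """Multiply two equal-length vectors element by element."""
    assert len(a) == len(b)
    return [x * y for x, y in zip(a, b)]

i_t = [2, 4, 6]        # illustrative input gate output from the video
c_tilde = [0, 2, 0]    # illustrative candidate memory from the video

new_info = pointwise_multiply(i_t, c_tilde)
print(new_info)  # [0, 8, 0]
```

Only the middle position carries anything forward, because the candidate memory is zero in the other two positions.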

158
00:08:45,000 --> 00:08:46,000
So let's do one thing.

159
00:08:46,000 --> 00:08:48,000
Let's discuss both of these.

160
00:08:48,000 --> 00:08:52,000
The combination of the forget gate with the input gate and candidate memory.

161
00:08:52,000 --> 00:08:59,000
So here we saw that, with respect to this particular information that I had right over here, I

162
00:08:59,000 --> 00:09:00,000
had.

163
00:09:00,000 --> 00:09:06,000
So first of all, if I want to get to this final memory cell state C of T, first of all, we will multiply

164
00:09:06,000 --> 00:09:08,000
f of t by C of t minus one.

165
00:09:09,000 --> 00:09:12,000
So here we are removing.

166
00:09:14,000 --> 00:09:22,000
Or forgetting some information if the context is switching.

167
00:09:22,000 --> 00:09:26,000
Otherwise, we'll just let the cell state pass through as it is.

168
00:09:26,000 --> 00:09:31,000
Along with that, we multiply i of t and the candidate C tilde of t; here we are doing a pointwise operation.

169
00:09:31,000 --> 00:09:36,000
So this C tilde of t, what kind of memory is it?

170
00:09:36,000 --> 00:09:38,000
Basically it is a candidate memory.

171
00:09:41,000 --> 00:09:43,000
It is nothing but the candidate memory.

172
00:09:43,000 --> 00:09:47,000
And the task of the candidate memory is to add new information.

173
00:09:47,000 --> 00:09:47,000
Right?

174
00:09:48,000 --> 00:09:51,000
Add new information to this particular cell state.

175
00:09:51,000 --> 00:09:52,000
And here you can see the pointwise addition operation.

176
00:09:52,000 --> 00:09:54,000
We are just adding it up.

177
00:09:54,000 --> 00:09:57,000
And finally we get a new cell state that is C of T.

178
00:09:57,000 --> 00:09:58,000
Okay.
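
Putting the forget step and the add step together, the cell state update described here is C_t = f_t * C_{t-1} + i_t * C tilde_t, with both the multiplications and the addition done element by element. A minimal sketch follows; the function name `update_cell_state` and all numeric values are assumptions chosen so the effect of each gate is easy to see.

```python
def update_cell_state(f_t, c_prev, i_t, c_tilde):
    """Return C_t = f_t * C_{t-1} + i_t * C~_t, computed element by element."""
    assert len(f_t) == len(c_prev) == len(i_t) == len(c_tilde)
    return [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]

# Illustrative (made-up) gate values, using hard 0/1 gates for readability.
f_t = [1.0, 0.0, 1.0]        # keep positions 0 and 2, forget position 1
c_prev = [0.5, -0.3, 0.8]    # previous cell state C_{t-1}
i_t = [0.0, 1.0, 0.0]        # write new information only into position 1
c_tilde = [0.9, 0.7, -0.4]   # candidate memory

c_t = update_cell_state(f_t, c_prev, i_t, c_tilde)
print(c_t)  # [0.5, 0.7, 0.8]
```

Position 1 drops its old value and takes the candidate value, while positions 0 and 2 pass through unchanged, which mirrors the "forget some information, add new information" description.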

179
00:09:58,000 --> 00:10:05,000
So I hope you got an idea of the input gate and candidate memory.

180
00:10:05,000 --> 00:10:12,000
So if I take a good example, let's say I go ahead and write: I stay in India.

181
00:10:13,000 --> 00:10:14,000
Okay.

182
00:10:14,000 --> 00:10:19,000
Then some more sentences, and then: I speak, blank.

183
00:10:19,000 --> 00:10:20,000
Okay.

184
00:10:20,000 --> 00:10:25,000
So for this kind of sentence, I want to find out what the next word is.

185
00:10:25,000 --> 00:10:27,000
So all the words will be going in one by one.

186
00:10:27,000 --> 00:10:28,000
Right.

187
00:10:28,000 --> 00:10:33,000
But here you can see we will be saving this India context in the memory cell, right?

188
00:10:33,000 --> 00:10:36,000
When the context switches, we may be forgetting it.

189
00:10:36,000 --> 00:10:41,000
We may forget some other information, but for a longer term, we will be keeping this particular information

190
00:10:41,000 --> 00:10:44,000
over there to predict my next word, the language.

191
00:10:44,000 --> 00:10:44,000
And obviously.

192
00:10:44,000 --> 00:10:50,000
Just by seeing this context, my neural network will be able to predict, okay, I speak Hindi

193
00:10:50,000 --> 00:10:56,000
or English or anything as such, but right now this particular prediction is based on this context.

194
00:10:56,000 --> 00:10:58,000
That is, which country I'm actually staying in.

195
00:10:58,000 --> 00:10:58,000
Okay.

196
00:10:59,000 --> 00:11:02,000
So I hope you are able to understand it.

197
00:11:02,000 --> 00:11:04,000
I hope you are able to understand this entire equation.

198
00:11:04,000 --> 00:11:09,000
I hope you are able to understand the entire working, right? To simply summarize:

199
00:11:09,000 --> 00:11:16,000
Till now we have discussed the forget gate, and we have discussed the input gate, the input gate plus

200
00:11:16,000 --> 00:11:17,000
candidate memory.

201
00:11:19,000 --> 00:11:23,000
Candidate memory, when we do a pointwise operation over here.

202
00:11:24,000 --> 00:11:25,000
Candidate memory.

203
00:11:25,000 --> 00:11:27,000
Once we do the pointwise operation.

204
00:11:27,000 --> 00:11:34,000
And then we do the addition operation with the forget-gated C of t minus one to finally get C of t, which is my final cell

205
00:11:34,000 --> 00:11:35,000
state.

206
00:11:35,000 --> 00:11:35,000
Memory cell state.

207
00:11:36,000 --> 00:11:38,000
So yes, this was it from my side.

208
00:11:38,000 --> 00:11:42,000
In this video we have discussed the input gate and candidate memory, along with the explanation and the

209
00:11:42,000 --> 00:11:43,000
mathematical intuition.

210
00:11:43,000 --> 00:11:44,000
This was it from my side.

211
00:11:44,000 --> 00:11:46,000
I will see you in the next video.

212
00:11:46,000 --> 00:11:46,000
Thank you.

