1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:03,000
So we are going to continue the discussion with respect to LSTM RNN.

3
00:00:03,000 --> 00:00:06,000
Now in this video we are going to discuss the output gate.

4
00:00:06,000 --> 00:00:12,000
Now, the output gate is the third important module in the LSTM RNN architecture.

5
00:00:12,000 --> 00:00:18,000
And if I look at this diagram, this is the part that is related to the output gate

6
00:00:18,000 --> 00:00:19,000
over here.

7
00:00:19,000 --> 00:00:21,000
So we are going to discuss the working of this, okay.

8
00:00:22,000 --> 00:00:26,000
We already discussed the forget gate and the input gate.

9
00:00:26,000 --> 00:00:31,000
With the forget gate, our main intention was to forget some information.

10
00:00:32,000 --> 00:00:37,000
Information based on context from the memory cell.

11
00:00:37,000 --> 00:00:42,000
So all the operation that we are actually doing, it is actually leading to that specific thing.

12
00:00:43,000 --> 00:00:50,000
Now, in the second one, which was the input gate, our main aim was to

13
00:00:50,000 --> 00:00:53,000
add information based on the context.

14
00:00:54,000 --> 00:00:54,000
Okay.

15
00:00:54,000 --> 00:00:57,000
Add information based on the context.

16
00:00:57,000 --> 00:01:01,000
Now let's go ahead and discuss the third one, the output gate.

17
00:01:01,000 --> 00:01:03,000
That is the third component over here.

18
00:01:03,000 --> 00:01:05,000
Now there are two things.

19
00:01:05,000 --> 00:01:09,000
One is again the same operation which we have with respect to sigmoid.

20
00:01:09,000 --> 00:01:13,000
Here we are passing x of t and h of t minus one, and we are adding a bias.

21
00:01:13,000 --> 00:01:17,000
So here we will specifically be using b of o as our bias.

22
00:01:17,000 --> 00:01:19,000
And we finally do the sigmoid activation function.

23
00:01:19,000 --> 00:01:25,000
So if I consider this, this is nothing but my neural network, right?

24
00:01:25,000 --> 00:01:27,000
It is the hidden neural network.

25
00:01:28,000 --> 00:01:29,000
Perfect.

26
00:01:30,000 --> 00:01:38,000
Now after doing this specific operation, whatever information that I have, you know, after I forget

27
00:01:38,000 --> 00:01:43,000
some of the information and then add some information from the input gate through this candidate memory.

28
00:01:43,000 --> 00:01:47,000
From that we will be passing one piece of information that is called C of t.

29
00:01:48,000 --> 00:01:50,000
Now, this C of t that you'll be able to see,

30
00:01:50,000 --> 00:01:52,000
This is my memory cell, right?

31
00:01:52,000 --> 00:01:54,000
This is the information in my memory cell.

32
00:01:55,000 --> 00:01:57,000
Now this will continue on right.

33
00:01:57,000 --> 00:02:00,000
And along with this you should understand one thing right.

34
00:02:00,000 --> 00:02:02,000
So see let me just go ahead and discuss this.

35
00:02:02,000 --> 00:02:06,000
So this is my C of t and this is my h of t, okay.

36
00:02:06,000 --> 00:02:13,000
Whenever I talk about C of t, this is my long term memory.

37
00:02:14,000 --> 00:02:16,000
Long term memory.

38
00:02:16,000 --> 00:02:23,000
Now in the output gate, what we do is we again do this tanh operation with respect

39
00:02:23,000 --> 00:02:25,000
to the output that we get from here.

40
00:02:25,000 --> 00:02:32,000
And we apply it on each and every element; a point wise operation is specifically done to

41
00:02:32,000 --> 00:02:34,000
this sigmoid output, right?

42
00:02:34,000 --> 00:02:40,000
That is the output that I have with respect to o of t. By doing this, what will happen is that whatever output

43
00:02:40,000 --> 00:02:46,000
is probably going to come from here, this will be retained in my h of T.

44
00:02:46,000 --> 00:02:49,000
H of t is nothing but my hidden state.

45
00:02:49,000 --> 00:02:54,000
And whenever I talk about hidden state, I'm actually talking about my short term memory.

46
00:02:54,000 --> 00:02:55,000
Right.

47
00:02:55,000 --> 00:02:57,000
Short term memory.

48
00:02:57,000 --> 00:03:04,000
And this short term memory will be related to just the current context or the one context before it.

49
00:03:04,000 --> 00:03:05,000
Right.

50
00:03:05,000 --> 00:03:10,000
So this operation is specifically done to get the output.

51
00:03:10,000 --> 00:03:14,000
Once we get this particular output, we will be sending this h of

52
00:03:14,000 --> 00:03:19,000
t, which denotes our short term memory, to my next time step.

53
00:03:19,000 --> 00:03:20,000
Right.

54
00:03:20,000 --> 00:03:25,000
And similarly, with respect to the operation that we are doing here, the point

55
00:03:25,000 --> 00:03:29,000
wise operation, from C of t minus one I will be getting C of t, right.

56
00:03:29,000 --> 00:03:31,000
So this will basically be my long term memory.

57
00:03:31,000 --> 00:03:37,000
And this will be my short term memory which will be passed to my next layer right.

58
00:03:37,000 --> 00:03:38,000
Next time step.

59
00:03:38,000 --> 00:03:45,000
And if I see this line also, I can probably just go ahead and connect this over here.

60
00:03:45,000 --> 00:03:45,000
Right.

61
00:03:45,000 --> 00:03:47,000
This is the h of t itself, right.

62
00:03:47,000 --> 00:03:48,000
I can use this also.

63
00:03:48,000 --> 00:03:49,000
I can use this also.

64
00:03:50,000 --> 00:03:53,000
So that is the entire operation with respect to the output gate.

65
00:03:53,000 --> 00:03:59,000
Now if I go ahead and see over here, you can see I'm using sigmoid of W of o. W of o is the weight

66
00:03:59,000 --> 00:04:00,000
that is assigned over here.

67
00:04:00,000 --> 00:04:05,000
Then I have h of t minus one and x of t, plus b of o.

68
00:04:05,000 --> 00:04:07,000
And then I'm performing the same sigmoid function.

69
00:04:07,000 --> 00:04:09,000
I am getting the o of t value.

70
00:04:09,000 --> 00:04:16,000
Then when I do the point wise operation with tanh of C of t over here, where C of t is this

71
00:04:16,000 --> 00:04:18,000
specific value that I have.

72
00:04:18,000 --> 00:04:23,000
Then through this I will be able to get the context that is available in the short term memory.
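The output gate computation described above can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the function name `output_gate_step`, the toy weights, and the vector sizes (hidden size 2, input size 1) are all assumptions made for the example. It computes o of t as the sigmoid of W of o applied to h of t minus one joined with x of t, plus b of o, and then h of t as o of t multiplied point wise with tanh of C of t.

```python
import math


def sigmoid(z):
    """Logistic sigmoid, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def output_gate_step(W_o, b_o, h_prev, x_t, c_t):
    """Return (o_t, h_t) for one time step.

    o_t = sigmoid(W_o . [h_prev, x_t] + b_o)
    h_t = o_t * tanh(c_t)   (point wise)
    """
    concat = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    o_t = [
        sigmoid(sum(w * v for w, v in zip(row, concat)) + b)
        for row, b in zip(W_o, b_o)
    ]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return o_t, h_t


# Toy values, chosen only for illustration (assumed, not from the lecture).
W_o = [[0.5, -0.3, 0.8],
       [0.1, 0.4, -0.6]]   # one row per hidden unit
b_o = [0.0, 0.1]
h_prev = [0.1, -0.2]       # short term memory from the previous step
x_t = [1.0]                # current input
c_t = [0.5, -1.0]          # long term memory after forget/input updates

o_t, h_t = output_gate_step(W_o, b_o, h_prev, x_t, c_t)
print("o_t:", o_t)
print("h_t:", h_t)
```

Because o of t lies between 0 and 1, each element of h of t is a scaled-down copy of tanh of the matching memory cell element, which is how the gate controls how much long term memory shows up in the short term memory.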

73
00:04:24,000 --> 00:04:26,000
And then we continue this process.

74
00:04:26,000 --> 00:04:27,000
Okay.

75
00:04:27,000 --> 00:04:31,000
Now you can just imagine like this, right.

76
00:04:31,000 --> 00:04:35,000
Uh, let's say this is my LSTM.

77
00:04:36,000 --> 00:04:40,000
This is my LSTM at t minus one.

78
00:04:40,000 --> 00:04:44,000
Then at t is equal to two.

79
00:04:44,000 --> 00:04:47,000
Sorry, I'll say t is equal to one, and at t is equal to two

80
00:04:47,000 --> 00:04:49,000
I will be having another LSTM, and at t is equal to three

81
00:04:49,000 --> 00:04:53,000
I will be basically providing this particular loop right.

82
00:04:53,000 --> 00:04:54,000
So this will keep on going on.

83
00:04:54,000 --> 00:04:55,000
Right.

84
00:04:55,000 --> 00:05:01,000
And finally you'll be able to see that once I get the output, I may pass this to my sigmoid function

85
00:05:01,000 --> 00:05:04,000
based on the kind of output that I really want, y hat.
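The unrolling over time steps can be sketched as a loop that carries both memories forward. This is only a sketch under stated assumptions: the forget gate weights `W_f` and all the biases other than b of o are not named in this video, the helper names (`affine`, `lstm_step`, `make_params`) are hypothetical, the constant toy weights are arbitrary, and the final sigmoid over the summed hidden state is just one possible way to get y hat.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W . v + b with plain lists (one row of W per hidden unit)."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(params, h_prev, c_prev, x_t):
    """One LSTM time step; returns the new (h_t, c_t)."""
    z = h_prev + x_t  # [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]      # forget gate
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]      # input gate
    c_hat = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], z)]  # candidate memory
    o_t = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]      # output gate
    # Long term memory: remove some information, then add some information.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    # Short term memory: the output gate applied point wise to tanh(c_t).
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


def make_params(hidden, inputs, value=0.1):
    """Build constant toy weights (an assumption; real weights are learned)."""
    params = {}
    for gate in ("f", "i", "c", "o"):
        params["W_" + gate] = [[value] * (hidden + inputs) for _ in range(hidden)]
        params["b_" + gate] = [0.0] * hidden
    return params


params = make_params(hidden=2, inputs=1)
h, c = [0.0, 0.0], [0.0, 0.0]          # initial short and long term memory
for x_t in [[1.0], [0.5], [-1.0]]:     # t = 1, 2, 3
    h, c = lstm_step(params, h, c, x_t)

y_hat = sigmoid(sum(h))                # hypothetical readout for a binary output
print("h:", h, "c:", c, "y_hat:", y_hat)
```

The same `params` are reused at every time step, so the loop is one cell applied repeatedly, with only h and c changing from step to step.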

86
00:05:04,000 --> 00:05:05,000
Right.

87
00:05:05,000 --> 00:05:10,000
And this operation will keep on going with two cells.

88
00:05:10,000 --> 00:05:15,000
As I said, one is the long term memory cell and one is the short term memory cell.

89
00:05:15,000 --> 00:05:18,000
So here this is my long term cell.

90
00:05:19,000 --> 00:05:20,000
And this is my short term.

91
00:05:22,000 --> 00:05:23,000
Right.

92
00:05:23,000 --> 00:05:24,000
Memory cell.

93
00:05:24,000 --> 00:05:30,000
And with respect to the information that I really need to forget, I will be removing some of the information

94
00:05:30,000 --> 00:05:30,000
from here.

95
00:05:30,000 --> 00:05:32,000
I will be adding some of the information from here.

96
00:05:33,000 --> 00:05:38,000
Again, I may say both these things may happen over here, right, from one neural network.

97
00:05:38,000 --> 00:05:40,000
Sorry, from one hidden layer.

98
00:05:40,000 --> 00:05:42,000
I may add some information.

99
00:05:42,000 --> 00:05:44,000
I may remove some of the information.

100
00:05:45,000 --> 00:05:45,000
Right.

101
00:05:45,000 --> 00:05:49,000
Adding the information is basically done by the input gate and candidate memory.

102
00:05:49,000 --> 00:05:53,000
And removing some of the information will be done by forget cell right.

103
00:05:53,000 --> 00:05:58,000
Forget gate I'll not say cell but forget gate.

104
00:05:58,000 --> 00:06:05,000
And adding some information will be specifically done by our input gate along with candidate memory.

105
00:06:06,000 --> 00:06:07,000
Right?

106
00:06:07,000 --> 00:06:10,000
So that is how the entire process will be going on. The forward

107
00:06:10,000 --> 00:06:16,000
and the backward propagation will keep on happening for as long as our loss keeps on getting reduced.

108
00:06:16,000 --> 00:06:17,000
Right.

109
00:06:17,000 --> 00:06:19,000
And that is how we make sure that we update all the weights.

110
00:06:19,000 --> 00:06:22,000
Now in this particular case, what are all the weights that we saw.

111
00:06:22,000 --> 00:06:23,000
One was W of o.

112
00:06:23,000 --> 00:06:31,000
Then, here also, we saw some more weights: W of i, and then you had W of c, right.

113
00:06:31,000 --> 00:06:33,000
So these three weights were specifically there.

114
00:06:33,000 --> 00:06:34,000
Right.

115
00:06:34,000 --> 00:06:40,000
So all these three weights: W of i, W of c and W of

116
00:06:40,000 --> 00:06:40,000
o.

117
00:06:40,000 --> 00:06:46,000
So let me just go ahead and do this and write down all the weights that we are updating: W of i, W of

118
00:06:46,000 --> 00:06:54,000
c, W of o. i is with respect to the input, c is with respect to the memory cell, and o, that you will be

119
00:06:54,000 --> 00:07:00,000
able to see, is with respect to the output, from which I'm actually getting back my hidden state over here.

120
00:07:00,000 --> 00:07:00,000
Right.

121
00:07:00,000 --> 00:07:04,000
Sorry, not the hidden state, but here, where I'm putting it last.

122
00:07:04,000 --> 00:07:07,000
So this neural network that we are using over here.

123
00:07:07,000 --> 00:07:07,000
Right.

124
00:07:07,000 --> 00:07:09,000
This will specifically be my W of o.

125
00:07:09,000 --> 00:07:17,000
So we need to keep on updating this weight with the help of back propagation.

126
00:07:18,000 --> 00:07:22,000
But at the end of the day, you will be able to see the memory cell is playing a very important role

127
00:07:22,000 --> 00:07:26,000
because it is keeping some of the information for a longer context.

128
00:07:26,000 --> 00:07:29,000
Whatever needs to be remembered, it will be remembering.

129
00:07:29,000 --> 00:07:30,000
Right?

130
00:07:30,000 --> 00:07:35,000
And let's say the context is switching and we really want to remove that; the entire training

131
00:07:35,000 --> 00:07:38,000
will be happening in this specific way, right?

132
00:07:38,000 --> 00:07:42,000
So I hope you are able to understand the output gate of the LSTM RNN.

133
00:07:42,000 --> 00:07:50,000
Now, in my next video, I will be talking about one more variant of RNN, which is called GRU RNN.

134
00:07:50,000 --> 00:07:57,000
Okay, so this is just another variant of LSTM.

135
00:07:57,000 --> 00:08:01,000
And here we will be understanding some important things with respect to this.

136
00:08:01,000 --> 00:08:04,000
What all things will be changing right.

137
00:08:04,000 --> 00:08:07,000
These operations will be changing with respect to the GRU RNN.

138
00:08:07,000 --> 00:08:09,000
So yes, this was it from my side.

139
00:08:09,000 --> 00:08:10,000
I hope you liked this particular video.

140
00:08:10,000 --> 00:08:11,000
I will see you all in the next video.

141
00:08:11,000 --> 00:08:18,000
And I hope you're able to understand, but just to give a summary, you have this forget

142
00:08:18,000 --> 00:08:18,000
gate.

143
00:08:18,000 --> 00:08:24,000
The forget gate is responsible for removing some of the information from this particular memory cell.

144
00:08:24,000 --> 00:08:29,000
The input gate and candidate memory are responsible for adding some of the information.

145
00:08:29,000 --> 00:08:37,000
And finally, the output gate decides how much of the memory cell is exposed as the hidden state.

146
00:08:38,000 --> 00:08:43,000
The hidden state represents the short term cell, okay, short term memory.

147
00:08:43,000 --> 00:08:44,000
Or I can also say short term memory.

148
00:08:44,000 --> 00:08:50,000
I will not write short term cell; instead I can basically go ahead and write short term memory.
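The summary above can be written compactly as the usual LSTM equations. The symbols W of i, W of c, W of o and b of o come from this video; the forget gate weight W_f and the biases b_f, b_i and b_c are assumed names following the same pattern, since they are not spelled out here. The circled dot denotes the point wise (element by element) multiplication.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) && \text{forget gate: what to remove} \\
i_t &= \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) && \text{input gate: what to add} \\
\tilde{C}_t &= \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) && \text{candidate memory} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{long term memory} \\
o_t &= \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh(C_t) && \text{short term memory}
\end{aligned}
```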

149
00:08:51,000 --> 00:08:53,000
So I hope you are able to understand it.

150
00:08:53,000 --> 00:08:54,000
I will see you in the next video.

151
00:08:54,000 --> 00:08:55,000
Thank you.

