1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:03,000
So we are going to continue our discussion with respect to RNN.

3
00:00:03,000 --> 00:00:08,000
And in this video we are going to talk about RNN back propagation through time. Already in our previous

4
00:00:08,000 --> 00:00:12,000
video, we have seen how the forward propagation with time happens in RNN.

5
00:00:12,000 --> 00:00:16,000
We have also seen the basic architecture of a simple RNN.

6
00:00:16,000 --> 00:00:21,000
Now with respect to this particular forward propagation, once we go at the end, once we calculate

7
00:00:21,000 --> 00:00:24,000
the loss function, our main aim is to reduce this particular loss function.

8
00:00:24,000 --> 00:00:29,000
Now, in order to reduce this particular loss function, we have to do a back propagation and we have

9
00:00:29,000 --> 00:00:31,000
to update all these weights that are available over here.

10
00:00:31,000 --> 00:00:32,000
Right.

11
00:00:32,000 --> 00:00:37,000
So in this video we'll be understanding how the back propagation will happen and how the weights

12
00:00:37,000 --> 00:00:39,000
will get updated in a simple RNN.

13
00:00:39,000 --> 00:00:45,000
Now, before I go ahead, what I will do is that I will again quickly revise our forward propagation.

14
00:00:45,000 --> 00:00:48,000
So let's say this is the basic architecture of a simple RNN.

15
00:00:48,000 --> 00:00:54,000
And here you know that we use a self feedback loop.

16
00:00:54,000 --> 00:00:57,000
And here also we pass some kind of inputs.

17
00:00:57,000 --> 00:00:59,000
So here also some weights will be involved okay.

18
00:00:59,000 --> 00:01:03,000
But when we unfold this with respect to time okay.

19
00:01:03,000 --> 00:01:06,000
And let's consider I'm going to take my sentence one.

20
00:01:06,000 --> 00:01:11,000
It has these x11, x12, x13.

21
00:01:11,000 --> 00:01:15,000
Let's say that my sentence one has these three words and this is my output.

22
00:01:15,000 --> 00:01:17,000
It can be one or zero, okay.

23
00:01:17,000 --> 00:01:17,000
Okay.

24
00:01:17,000 --> 00:01:24,000
Similarly, my sentence two may have another one like x21, x22, x23.

25
00:01:24,000 --> 00:01:25,000
Right.

26
00:01:25,000 --> 00:01:27,000
And let's say this is the output zero.

27
00:01:27,000 --> 00:01:34,000
And similarly my sentence three may have something like x31, x32, x33.

28
00:01:34,000 --> 00:01:37,000
Okay, I'm just considering three words at a time okay.

29
00:01:37,000 --> 00:01:39,000
You can have any number of words.

30
00:01:39,000 --> 00:01:40,000
That is up to you.

31
00:01:40,000 --> 00:01:44,000
And this word may also vary because at the end of the day, we will be converting these words into some

32
00:01:44,000 --> 00:01:46,000
kind of vectors by using one hot encoding.

33
00:01:46,000 --> 00:01:48,000
Or by using word2vec; it is up to us.

34
00:01:49,000 --> 00:01:49,000
Okay?

35
00:01:49,000 --> 00:01:53,000
And we will be passing every input one at a time based on the time stamp.

36
00:01:53,000 --> 00:01:56,000
But if I really want to give this generic format.

37
00:01:56,000 --> 00:01:59,000
So here I can go ahead and write something like this.

38
00:01:59,000 --> 00:02:02,000
I will make this sentence one as generic.

39
00:02:02,000 --> 00:02:08,000
I'll say, hey, this will be x of i1, then this will be x of i2, and this will be x of i3.

40
00:02:09,000 --> 00:02:15,000
Okay, so here i is nothing but the sentence number, okay.

41
00:02:15,000 --> 00:02:16,000
Sentence number.

42
00:02:16,000 --> 00:02:17,000
Perfect.

43
00:02:17,000 --> 00:02:19,000
So I'm just giving some basic information over here.

44
00:02:19,000 --> 00:02:21,000
Now let's do one thing.

45
00:02:22,000 --> 00:02:26,000
Let's unfold this entire simple RNN with respect to time.

46
00:02:26,000 --> 00:02:33,000
Now, you know, with respect to time when my T is equal to one, I will be passing my first input,

47
00:02:33,000 --> 00:02:33,000
right?

48
00:02:33,000 --> 00:02:36,000
So I will be going and passing my first word.

49
00:02:36,000 --> 00:02:39,000
So let me just go ahead and define this.

50
00:02:39,000 --> 00:02:44,000
And this will basically be my first word okay.

51
00:02:44,000 --> 00:02:48,000
So at t is equal to one I am going to pass x of i1, okay.

52
00:02:49,000 --> 00:02:53,000
Now initially I will also pass some kind of weights over here.

53
00:02:53,000 --> 00:02:55,000
Let's say this is my hidden weights.

54
00:02:55,000 --> 00:02:58,000
By default, some weights will be getting assigned.

55
00:02:58,000 --> 00:03:01,000
And uh, initially I will not put this.

56
00:03:01,000 --> 00:03:03,000
Okay, let this be empty for now.

57
00:03:03,000 --> 00:03:04,000
Okay.

58
00:03:04,000 --> 00:03:07,000
I will just go unfold this entire network over here.

59
00:03:07,000 --> 00:03:14,000
We are just going to unfold it. Now, once we give an input at time step t is equal to one,

60
00:03:14,000 --> 00:03:16,000
Here some weights will get assigned.

61
00:03:16,000 --> 00:03:19,000
So let me just go ahead and write this as w of I.

62
00:03:19,000 --> 00:03:19,000
Okay.

63
00:03:19,000 --> 00:03:23,000
Now you know that, um, what will happen in the forward propagation.

64
00:03:23,000 --> 00:03:25,000
We multiply with these specific weights.

65
00:03:25,000 --> 00:03:27,000
Then we add a bias.

66
00:03:27,000 --> 00:03:30,000
And after this we get an output.

67
00:03:30,000 --> 00:03:32,000
So this will be my output right.

68
00:03:32,000 --> 00:03:34,000
And this output will be nothing but O1.

69
00:03:34,000 --> 00:03:35,000
Let's consider O1.

70
00:03:36,000 --> 00:03:38,000
So this will be my O1 output.

71
00:03:39,000 --> 00:03:44,000
Now you know that my output I need to send it back to this particular RNN, right?

72
00:03:44,000 --> 00:03:48,000
To this particular hidden layer and all the other hidden neurons in that layer.

73
00:03:48,000 --> 00:03:53,000
So this will be sent to my neuron.

74
00:03:53,000 --> 00:03:54,000
Hidden neuron okay.

75
00:03:54,000 --> 00:03:56,000
The same output will also be sent to my hidden neuron.

76
00:03:56,000 --> 00:04:01,000
But at time step t is equal to two I will be passing my next input.

77
00:04:01,000 --> 00:04:04,000
So here I will be writing x of i2.

78
00:04:04,000 --> 00:04:08,000
Okay, that basically means we are just passing the second input with t is equal to two.

79
00:04:08,000 --> 00:04:12,000
Again the same weights will get initialized over here.

80
00:04:12,000 --> 00:04:18,000
Along with this we will go ahead and initialize one more weight over here w of h.

81
00:04:18,000 --> 00:04:18,000
Okay.

82
00:04:18,000 --> 00:04:22,000
So this is nothing but the hidden neuron's weight, right.

83
00:04:22,000 --> 00:04:26,000
If you remember over here in this hidden neurons I told you right.

84
00:04:26,000 --> 00:04:28,000
We are also going to assign all these weights.

85
00:04:28,000 --> 00:04:28,000
Right.

86
00:04:28,000 --> 00:04:30,000
All these weights will also be getting assigned.

87
00:04:30,000 --> 00:04:30,000
Right.

88
00:04:31,000 --> 00:04:38,000
And the next step will be that we will multiply this with this, and we'll add O1 multiplied by w of

89
00:04:38,000 --> 00:04:38,000
h.

90
00:04:38,000 --> 00:04:44,000
And finally, when we combine both of them, you'll be able to see that we will be getting our next

91
00:04:44,000 --> 00:04:45,000
output.

92
00:04:45,000 --> 00:04:46,000
That is nothing but O2.

93
00:04:46,000 --> 00:04:53,000
Now with respect to this particular O2, again the weights will

94
00:04:53,000 --> 00:04:54,000
get assigned in this.

95
00:04:54,000 --> 00:05:03,000
And here I'm actually going to get my next input. Again, at t is equal to three, I'm going to pass x of

96
00:05:03,000 --> 00:05:04,000
i3, okay.

97
00:05:04,000 --> 00:05:09,000
And finally I will get my output, which is called O4, okay.

98
00:05:09,000 --> 00:05:13,000
Now with respect to this particular output of O4 okay.

99
00:05:13,000 --> 00:05:18,000
Again I will go ahead and pass this to my.

100
00:05:20,000 --> 00:05:25,000
Sigmoid if it is a binary classification.

101
00:05:25,000 --> 00:05:26,000
Right.

102
00:05:26,000 --> 00:05:29,000
And finally I get my output over here, which is my y hat.

103
00:05:29,000 --> 00:05:35,000
Okay, so this is how I had also written it over here earlier when you saw this entire steps.

104
00:05:35,000 --> 00:05:36,000
Let me just show you once again.

105
00:05:36,000 --> 00:05:38,000
So here, right.

106
00:05:38,000 --> 00:05:39,000
We were getting this O4.

107
00:05:39,000 --> 00:05:40,000
We send it to the sigmoid.

108
00:05:40,000 --> 00:05:42,000
And finally, if it is a binary output we use sigmoid.

109
00:05:42,000 --> 00:05:48,000
Otherwise we use softmax. If it is a multi-class classification, we specifically use something called

110
00:05:48,000 --> 00:05:49,000
as softmax.

111
00:05:49,000 --> 00:05:55,000
And here also, you know that with respect to this O4, I will also assign some weights

112
00:05:55,000 --> 00:05:56,000
with respect to this.

113
00:05:56,000 --> 00:06:00,000
So let's say here also we go ahead and use w of H.

114
00:06:01,000 --> 00:06:02,000
Okay.

115
00:06:02,000 --> 00:06:09,000
And here there will be a weight w of i that will get initialized over here.

116
00:06:09,000 --> 00:06:13,000
Now, with respect to the forward propagation so quickly, let me just go ahead and write.

117
00:06:14,000 --> 00:06:17,000
So here is my forward propagation.

118
00:06:18,000 --> 00:06:20,000
Now in forward propagation what happens.

119
00:06:20,000 --> 00:06:21,000
Okay.

120
00:06:21,000 --> 00:06:26,000
One more important thing is that, guys, initially I did not pass any input, right?

121
00:06:26,000 --> 00:06:32,000
Because when we are starting this, we will also initialize some kind of input weights.

122
00:06:32,000 --> 00:06:37,000
And here, when I am initializing some kind of input weights, obviously I will also initialize some

123
00:06:37,000 --> 00:06:38,000
w h over here.

124
00:06:39,000 --> 00:06:39,000
Okay.

125
00:06:39,000 --> 00:06:40,000
This will be my w h.

126
00:06:40,000 --> 00:06:45,000
Along with that, you'll be seeing that since every layer has some kind of output here.

127
00:06:45,000 --> 00:06:47,000
Also, we need to make some kind of output right.

128
00:06:47,000 --> 00:06:50,000
We need to probably initialize some kind of output.

129
00:06:50,000 --> 00:06:52,000
So here I will be getting my O0.

130
00:06:52,000 --> 00:06:55,000
This O0 can be initialized as zeros.

131
00:06:55,000 --> 00:06:58,000
It can be initialized with any values as such, right.

132
00:06:58,000 --> 00:07:00,000
But randomly it can also be initialized.

133
00:07:00,000 --> 00:07:06,000
So here we can initialize it with zeros, or random values can be initialized based on

134
00:07:06,000 --> 00:07:07,000
some initialization factor right.

135
00:07:07,000 --> 00:07:09,000
The initial output.

136
00:07:09,000 --> 00:07:12,000
And again here also you'll be having this hidden weight okay.

137
00:07:12,000 --> 00:07:17,000
Now in forward propagation what happens is that whenever we go with the forward propagation the first

138
00:07:17,000 --> 00:07:19,000
thing that we need to compute is O1.

139
00:07:19,000 --> 00:07:19,000
Right.

140
00:07:19,000 --> 00:07:23,000
So O1 needs to be computed over here. Now, in order to compute O1,

141
00:07:23,000 --> 00:07:29,000
What I will do is that in every hidden neuron I apply some kind of activation

142
00:07:29,000 --> 00:07:30,000
function.

143
00:07:30,000 --> 00:07:33,000
So let's go ahead and apply this activation function f.

144
00:07:33,000 --> 00:07:36,000
Let's say the activation function in this particular case is tanh.

145
00:07:36,000 --> 00:07:36,000
Right.

146
00:07:36,000 --> 00:07:46,000
So I will go ahead and write x of i1 multiplied by w of i, plus

147
00:07:46,000 --> 00:07:48,000
Then you can see O0 is also there.

148
00:07:48,000 --> 00:07:52,000
This O0 is nothing but zeros only.

149
00:07:52,000 --> 00:07:52,000
Right.

150
00:07:52,000 --> 00:07:56,000
I'll just consider that we have initialized all the values to zero, so zero multiplied by this hidden

151
00:07:56,000 --> 00:07:58,000
weight will be zero only.

152
00:07:58,000 --> 00:08:00,000
So that entire thing.

153
00:08:00,000 --> 00:08:00,000
Right?

154
00:08:00,000 --> 00:08:01,000
It will be zero.

155
00:08:01,000 --> 00:08:02,000
Because we need to sum this up.

156
00:08:02,000 --> 00:08:04,000
Then we have a bias.

157
00:08:04,000 --> 00:08:06,000
Let's consider a bias over here.

158
00:08:06,000 --> 00:08:11,000
In every hidden neuron we will be having a bias right?

159
00:08:11,000 --> 00:08:13,000
So let's consider this as a bias okay.

160
00:08:14,000 --> 00:08:17,000
Now this is my first operation with respect to this thing.

161
00:08:17,000 --> 00:08:18,000
Right.

162
00:08:18,000 --> 00:08:23,000
And one more important thing over here is that I will just go ahead and write this also so that you

163
00:08:23,000 --> 00:08:24,000
will not get confused.

164
00:08:24,000 --> 00:08:30,000
So I'll say, hey, plus my O0 multiplied by w of h, plus the bias, okay, b1.

165
00:08:30,000 --> 00:08:34,000
And then we basically go ahead and apply an activation function okay.

166
00:08:34,000 --> 00:08:39,000
Now coming to my next output O2 okay.

167
00:08:39,000 --> 00:08:41,000
Now this is how the forward propagation will happen.

168
00:08:41,000 --> 00:08:44,000
Now in forward propagation of O2 my next word is basically sent.

169
00:08:44,000 --> 00:08:54,000
So this will be nothing but a function of x of i2 multiplied by w of i, plus

170
00:08:54,000 --> 00:08:56,000
Now O1 and w of h are also there.

171
00:08:56,000 --> 00:08:59,000
So I will go ahead and add O1 multiplied by w of h.

172
00:08:59,000 --> 00:09:02,000
Plus there will again be a bias over here.

173
00:09:03,000 --> 00:09:05,000
Let's consider this bias as b.

174
00:09:05,000 --> 00:09:08,000
And here I'm going to apply the activation function once again.

175
00:09:08,000 --> 00:09:12,000
This is how we can go ahead and calculate O2, right.

176
00:09:12,000 --> 00:09:19,000
Similarly, when I go ahead and send my third word, that is, to calculate O3, it will be nothing but x of i3

177
00:09:19,000 --> 00:09:30,000
multiplied by w of i, plus O2 multiplied by w of h, plus the bias b, right? O2 multiplied by w of h;

178
00:09:30,000 --> 00:09:32,000
w of h is nothing but my hidden weights.
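Written out, the three hidden outputs described above look roughly like the block below. This is only a sketch of what is said in the video: it assumes one shared bias b (the video names it b1 at the first step and b afterwards), f = tanh as chosen above, and O0 initialized to zeros.

```latex
\begin{aligned}
O_1 &= f\left(x_{i1} W_i + O_0 W_h + b\right) \\
O_2 &= f\left(x_{i2} W_i + O_1 W_h + b\right) \\
O_3 &= f\left(x_{i3} W_i + O_2 W_h + b\right)
\end{aligned}
```

The same W_i and W_h appear in every line, which is why they are shared across all timestamps.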

179
00:09:32,000 --> 00:09:35,000
Okay, so everything looks perfectly fine over here.

180
00:09:35,000 --> 00:09:41,000
Now finally, when we get our output right, we send this output to a sigmoid activation function and

181
00:09:41,000 --> 00:09:43,000
then only we'll be able to get my y hat.

182
00:09:43,000 --> 00:09:48,000
So in order to get the y hat, let's say the sigmoid activation function, I'm going to apply over there.

183
00:09:48,000 --> 00:09:51,000
It will be nothing but O3 multiplied by

184
00:09:51,000 --> 00:09:58,000
So here you can see O3, O3. Sorry, I did not write O3; I wrote O4 over here.

185
00:09:58,000 --> 00:10:00,000
So it should be O3.

186
00:10:01,000 --> 00:10:03,000
This is my output O3 multiplied by w of h.

187
00:10:03,000 --> 00:10:04,000
Right.

188
00:10:04,000 --> 00:10:08,000
And then we send it to the sigmoid to get my output, right.

189
00:10:08,000 --> 00:10:09,000
So here what we are doing.

190
00:10:09,000 --> 00:10:14,000
We are just going to multiply O3 with w of h.

191
00:10:14,000 --> 00:10:19,000
So once I do all these things over here, see, O3, w of h.

192
00:10:19,000 --> 00:10:24,000
Instead of writing w of h here I'll not use hidden weights because this will be my output weights.

193
00:10:24,000 --> 00:10:27,000
So let me just go ahead and write w of o, okay.

194
00:10:27,000 --> 00:10:31,000
So this will basically be my w of o.

195
00:10:32,000 --> 00:10:33,000
Perfect right.

196
00:10:33,000 --> 00:10:36,000
So this is how the forward propagation usually happens.

197
00:10:36,000 --> 00:10:43,000
And finally when we pass it to sigmoid or softmax I'm going to get my y hat okay O3 multiplied by w

198
00:10:43,000 --> 00:10:43,000
of o.

199
00:10:43,000 --> 00:10:47,000
So here we are specifically going to apply w of o because this is my output weights, right.

200
00:10:47,000 --> 00:10:50,000
And finally this is my output node where I'm getting my output okay.
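The forward propagation recapped above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the video: it treats every quantity as a single number instead of a vector, assumes tanh in the hidden neuron and sigmoid at the output, and the function name `forward` and the sample values are made up for the example.

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1); used for binary classification.
    return 1.0 / (1.0 + math.exp(-z))

def forward(xs, w_i, w_h, w_o, b, o_prev=0.0):
    """Run a scalar simple RNN over the inputs xs and return (outputs, y_hat)."""
    outputs = []
    for x_t in xs:
        # o_t = tanh(x_t * w_i + o_{t-1} * w_h + b); the same weights are reused at every step.
        o_prev = math.tanh(x_t * w_i + o_prev * w_h + b)
        outputs.append(o_prev)
    # Only the last hidden output (O3 for three words) goes through the output weight.
    y_hat = sigmoid(outputs[-1] * w_o)
    return outputs, y_hat

# Hypothetical three-word sentence, already converted to numbers.
outputs, y_hat = forward([0.5, -1.0, 2.0], w_i=0.1, w_h=0.2, w_o=0.3, b=0.0)
print(outputs, y_hat)
```

With O0 set to zero, the first step reduces to tanh(x of i1 times w of i plus the bias), which matches the remark that the O0 term drops out.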

201
00:10:51,000 --> 00:10:52,000
Now this is fine.

202
00:10:52,000 --> 00:10:57,000
Till here everything is perfectly fine; I think we have discussed this in our previous video also.

203
00:10:57,000 --> 00:11:03,000
Now, once we calculate y hat, what will happen?

204
00:11:03,000 --> 00:11:07,000
We will take y minus y hat and we will calculate the loss.

205
00:11:07,000 --> 00:11:08,000
Right?

206
00:11:08,000 --> 00:11:09,000
There will be some loss value.

207
00:11:09,000 --> 00:11:17,000
Now our main aim is to reduce this loss. And how will we be able to reduce this loss? Only when

208
00:11:17,000 --> 00:11:23,000
we update all the weights.

209
00:11:23,000 --> 00:11:26,000
Now what are the weights that are available over here?

210
00:11:26,000 --> 00:11:33,000
As I see it, there are three weights that will be involved over here.

211
00:11:33,000 --> 00:11:35,000
One is w of I.

212
00:11:36,000 --> 00:11:40,000
The next one is w of h.

213
00:11:40,000 --> 00:11:42,000
And one more weight is w of o.

214
00:11:43,000 --> 00:11:52,000
Okay, so these three weights need to keep on getting updated during the back propagation.

215
00:11:54,000 --> 00:11:54,000
Right.

216
00:11:54,000 --> 00:12:02,000
This is the most important thing: if we really need to update w of i, w of h and w of o, right,

217
00:12:02,000 --> 00:12:07,000
I need to update these weights in the backward propagation, and then continuously

218
00:12:07,000 --> 00:12:10,000
do the forward and the backward propagation in similar way.

219
00:12:10,000 --> 00:12:12,000
And we'll keep doing this.

220
00:12:13,000 --> 00:12:15,000
Till when?

221
00:12:15,000 --> 00:12:19,000
Till convergence happens and gradient

222
00:12:19,000 --> 00:12:25,000
descent reaches the global minimum, or when my loss is really small.

223
00:12:25,000 --> 00:12:30,000
This is equivalent to when my loss will be really, really small, right?

224
00:12:31,000 --> 00:12:36,000
Now let me just go ahead and quickly show you with respect to the back propagation, how does back propagation

225
00:12:36,000 --> 00:12:37,000
actually happen.

226
00:12:37,000 --> 00:12:40,000
So I will just go ahead and create one more cell over here.

227
00:12:41,000 --> 00:12:41,000
Okay.

228
00:12:41,000 --> 00:12:43,000
So this is my forward propagation.

229
00:12:43,000 --> 00:12:46,000
Now in the backward propagation with respect to time.

230
00:12:49,000 --> 00:12:54,000
Backward propagation with time.

231
00:12:55,000 --> 00:13:03,000
Now see at the end you know over here when we do the forward propagation we finally go ahead and calculate

232
00:13:03,000 --> 00:13:04,000
our loss.

233
00:13:04,000 --> 00:13:05,000
Right?

234
00:13:05,000 --> 00:13:06,000
So this is our loss.

235
00:13:06,000 --> 00:13:10,000
So we go ahead and calculate our loss okay.

236
00:13:10,000 --> 00:13:13,000
Now once we calculate this loss okay.

237
00:13:13,000 --> 00:13:14,000
Please focus on this okay.

238
00:13:14,000 --> 00:13:19,000
Once we calculate this loss the first output.

239
00:13:19,000 --> 00:13:22,000
So our first weight that we need to update is w of o.

240
00:13:23,000 --> 00:13:26,000
Now in order to update w.

241
00:13:26,000 --> 00:13:36,000
And as I said, which weights do we have to update? We have to update w of i, w of h and w of o.

242
00:13:36,000 --> 00:13:39,000
Okay, we need to update these three weights.

243
00:13:39,000 --> 00:13:44,000
Now in the backward propagation the first weight will get updated based on your timestamp.

244
00:13:44,000 --> 00:13:46,000
So let's say timestamp is equal to three.

245
00:13:46,000 --> 00:13:47,000
The first weight that is coming is w of o.

246
00:13:47,000 --> 00:13:50,000
So in order to update w of o, how do I do it?

247
00:13:50,000 --> 00:13:57,000
So here I will be using a weight updation formula.

248
00:13:58,000 --> 00:14:00,000
And I hope everybody remembers this.

249
00:14:00,000 --> 00:14:02,000
What is the weight updation formula.

250
00:14:03,000 --> 00:14:14,000
It is nothing but this: weight new is equal to weight old minus some

251
00:14:14,000 --> 00:14:15,000
learning rate.

252
00:14:15,000 --> 00:14:19,000
Multiplied by derivative of loss with respect to derivative of w old.

253
00:14:19,000 --> 00:14:20,000
Okay.
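As a compact reference, the weight updation formula just stated can be written as the equation below. The symbol \eta for the learning rate is my own notation; the video only says "learning rate".

```latex
W_{\text{new}} = W_{\text{old}} - \eta \, \frac{\partial L}{\partial W_{\text{old}}}
```

The same rule is applied separately to each of the three weights (w of i, w of h and w of o); only the derivative term differs.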

254
00:14:20,000 --> 00:14:26,000
Now this formula I hope we are very much familiar with, because we have also discussed this in ANN. But out

255
00:14:26,000 --> 00:14:29,000
of it this is the most important one okay.

256
00:14:29,000 --> 00:14:35,000
Because of this here when we are doing this, this is basically my derivative.

257
00:14:36,000 --> 00:14:37,000
And why we are doing it.

258
00:14:37,000 --> 00:14:41,000
Because we need to calculate the slope of.

259
00:14:41,000 --> 00:14:42,000
Slope of.

260
00:14:44,000 --> 00:14:46,000
Gradient descent.

261
00:14:46,000 --> 00:14:51,000
We need to go ahead and calculate this slope, the gradient used in gradient descent.

262
00:14:51,000 --> 00:14:52,000
Okay.

263
00:14:52,000 --> 00:14:57,000
Now our question is, how do we solve this?

264
00:14:57,000 --> 00:14:57,000
Now see.

265
00:14:57,000 --> 00:15:01,000
First is derivative of loss with respect to derivative of w old.

266
00:15:01,000 --> 00:15:03,000
So first weight that is.

267
00:15:03,000 --> 00:15:04,000
W o.

268
00:15:04,000 --> 00:15:09,000
So first thing that we really need to calculate over here is that derivative of loss with respect to

269
00:15:09,000 --> 00:15:11,000
derivative of w old.

270
00:15:11,000 --> 00:15:14,000
This is what I really need to calculate right.

271
00:15:15,000 --> 00:15:18,000
If I calculate this then only see my.

272
00:15:18,000 --> 00:15:22,000
This weight is nothing but derivative of loss with respect to derivative of w o.

273
00:15:22,000 --> 00:15:23,000
Right.

274
00:15:23,000 --> 00:15:29,000
And if I talk with respect to the weight updation formula, this will be nothing but derivative of O

275
00:15:29,000 --> 00:15:35,000
is equal to derivative of O old or I'll write like this.

276
00:15:35,000 --> 00:15:38,000
Derivative of o new.

277
00:15:39,000 --> 00:15:49,000
Sorry, w of o new is equal to w of o old minus learning rate multiplied by derivative of loss with respect to derivative

278
00:15:49,000 --> 00:15:51,000
of w o old.

279
00:15:51,000 --> 00:15:51,000
Okay.

280
00:15:51,000 --> 00:15:53,000
So this will be old.

281
00:15:53,000 --> 00:15:59,000
So this is my entire weight updation formula for w o right.

282
00:16:00,000 --> 00:16:02,000
But here I need to calculate this.

283
00:16:02,000 --> 00:16:03,000
This is fine.

284
00:16:03,000 --> 00:16:07,000
I'll be getting the old value. Now, in order to get the new value, I have the learning rate.

285
00:16:07,000 --> 00:16:08,000
Learning rate will be a small value.

286
00:16:08,000 --> 00:16:12,000
So let's consider learning rate will be something like 0.001 okay.

287
00:16:12,000 --> 00:16:13,000
But I need to compute this.

288
00:16:13,000 --> 00:16:20,000
So in order to compute this derivative of loss with respect to derivative of w old, how do I probably

289
00:16:20,000 --> 00:16:23,000
write this entire or how do I find this?

290
00:16:23,000 --> 00:16:24,000
Okay.

291
00:16:24,000 --> 00:16:34,000
Now it is important for you all to understand that here y hat is directly dependent on w of o,

292
00:16:34,000 --> 00:16:40,000
and loss is also dependent on y hat, because at the end of the day, we are calculating the loss.

293
00:16:40,000 --> 00:16:43,000
So here you can see there is a direct relationship.

294
00:16:43,000 --> 00:16:47,000
The loss is dependent on y hat and y hat is dependent on w of o.

295
00:16:47,000 --> 00:16:48,000
Right.

296
00:16:48,000 --> 00:16:53,000
So if I want to probably expand this equation based on chain rule.

297
00:16:54,000 --> 00:16:58,000
So this based on chain rule of derivative.

298
00:16:58,000 --> 00:17:01,000
If you remember this we have already discussed.

299
00:17:02,000 --> 00:17:07,000
So based on chain rule of derivative you will be able to see that I'll be having derivative of loss

300
00:17:07,000 --> 00:17:11,000
with respect to derivative of w old.

301
00:17:11,000 --> 00:17:12,000
Instead of writing w old.

302
00:17:12,000 --> 00:17:18,000
I know loss is dependent on y hat, so I'll go ahead and write derivative of loss with respect to derivative

303
00:17:18,000 --> 00:17:19,000
of y hat.

304
00:17:19,000 --> 00:17:25,000
And similarly I will go ahead and say derivative of y hat is dependent on.

305
00:17:25,000 --> 00:17:26,000
Now I can go ahead and write.

306
00:17:26,000 --> 00:17:32,000
Multiplied by derivative of y hat with respect to derivative of w of o old, right?

307
00:17:33,000 --> 00:17:36,000
Now once we do this, we will be able to get this value.

308
00:17:36,000 --> 00:17:36,000
And this is nothing.

309
00:17:36,000 --> 00:17:39,000
But this is a chain rule of derivative, right?

310
00:17:40,000 --> 00:17:45,000
And through this you will be able to see that we will be able to calculate or will be able to update

311
00:17:45,000 --> 00:17:47,000
the weights of w o new.

312
00:17:47,000 --> 00:17:47,000
Okay.

313
00:17:47,000 --> 00:17:49,000
This is perfect.
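To make the chain rule for w of o concrete, here is a small numeric sketch. It is an illustration under assumptions the video does not state: the loss is taken to be half the squared error, 0.5 * (y - y_hat)^2, everything is a single number, and the values of O3, w of o and y are invented. The learning rate 0.001 is the one mentioned above.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss_fn(y, y_hat):
    # Assumed loss for this sketch: half squared error.
    return 0.5 * (y - y_hat) ** 2

def grad_w_o(o3, w_o, y):
    """dL/dw_o = dL/dy_hat * dy_hat/dw_o, following the chain loss -> y_hat -> w_o."""
    y_hat = sigmoid(o3 * w_o)
    dL_dyhat = y_hat - y                      # derivative of 0.5 * (y - y_hat)^2 w.r.t. y_hat
    dyhat_dwo = y_hat * (1.0 - y_hat) * o3    # derivative of sigmoid(o3 * w_o) w.r.t. w_o
    return dL_dyhat * dyhat_dwo

# Hypothetical values: last hidden output, current output weight, true label, learning rate.
o3, w_o, y, lr = 0.8, 0.3, 1.0, 0.001
g = grad_w_o(o3, w_o, y)
w_o_new = w_o - lr * g   # weight updation formula for w of o
print(g, w_o_new)
```

Because the true label here is 1 and y hat is below 1, the gradient is negative and the update nudges w of o upward, which pushes y hat toward the label.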

314
00:17:49,000 --> 00:17:54,000
So our first main thing is that how do we update

315
00:17:55,000 --> 00:17:56,000
w of o?

316
00:17:56,000 --> 00:17:58,000
That is now very much clear.

317
00:17:58,000 --> 00:17:58,000
Okay.

318
00:17:58,000 --> 00:18:00,000
Because there are three types of weight.

319
00:18:00,000 --> 00:18:03,000
One is w of o, one is w of h and one is w of i.

320
00:18:03,000 --> 00:18:07,000
So here with respect to timestamp, you can see that I have updated my w of o.

321
00:18:07,000 --> 00:18:15,000
Now coming to the second important case we need to update w of H.

322
00:18:16,000 --> 00:18:19,000
And this is nothing but this is my hidden weights.

323
00:18:19,000 --> 00:18:19,000
Right?

324
00:18:19,000 --> 00:18:20,000
Hidden layer weights.

325
00:18:23,000 --> 00:18:34,000
Now these weights are important because this weight is used in all timestamps, right?

326
00:18:34,000 --> 00:18:37,000
T is equal to one, t is equal to two, t is equal to three, t is equal to four.

327
00:18:37,000 --> 00:18:41,000
So we have used this in each and every timestamp.

328
00:18:41,000 --> 00:18:43,000
Now how do we do that okay.

329
00:18:43,000 --> 00:18:49,000
So here, what we are basically going to do while updating this is very much

330
00:18:49,000 --> 00:18:50,000
simple okay.

331
00:18:50,000 --> 00:18:54,000
And first of all let me just go ahead and write the weight updation formula here.

332
00:18:54,000 --> 00:19:05,000
So I'll say, hey, w of h new is equal to w of h old minus learning rate multiplied by derivative of loss with respect

333
00:19:05,000 --> 00:19:07,000
to derivative of w h old.

334
00:19:07,000 --> 00:19:10,000
So this is the formula the weight updation formula.

335
00:19:10,000 --> 00:19:19,000
And out of this formula the main thing that I have to compute is nothing but derivative of loss with

336
00:19:19,000 --> 00:19:21,000
respect to derivative of w h old.

337
00:19:22,000 --> 00:19:24,000
Okay, this we need to compute.

338
00:19:24,000 --> 00:19:26,000
So in order to compute this how do I write.

339
00:19:26,000 --> 00:19:32,000
Now derivative of loss with respect to derivative of w h Old.

340
00:19:32,000 --> 00:19:33,000
How do I compute it?

341
00:19:33,000 --> 00:19:38,000
I have to check how many timestamps there are in my entire RNN: t is equal to one, two,

342
00:19:38,000 --> 00:19:43,000
three, so I have to make sure that I have to find this based on my timestamp.

343
00:19:43,000 --> 00:19:45,000
T is equal to one, two, and three.

344
00:19:46,000 --> 00:19:47,000
Now how do I do it?

345
00:19:47,000 --> 00:19:50,000
Okay, first of all, in the back propagation,

346
00:19:50,000 --> 00:19:51,000
My first timestamp is three.

347
00:19:51,000 --> 00:19:58,000
Now I need to update which weight over here I need to update this w of H right?

348
00:19:58,000 --> 00:20:02,000
I need to specifically update my w of h weights.

349
00:20:02,000 --> 00:20:04,000
Okay okay.

350
00:20:04,000 --> 00:20:05,000
Perfect.

351
00:20:05,000 --> 00:20:07,000
So let's go ahead and do that, okay.

352
00:20:07,000 --> 00:20:10,000
And here you can see there is w of h and there is w of I.

353
00:20:10,000 --> 00:20:11,000
Okay.

354
00:20:11,000 --> 00:20:14,000
So I will just try to show you how we will update this w of h.

355
00:20:14,000 --> 00:20:17,000
So w of o, we saw how we can update it.

356
00:20:17,000 --> 00:20:23,000
Now let's go ahead and see with respect to w of h. Now this hidden weight, you know that this is dependent

357
00:20:23,000 --> 00:20:24,000
on O3.

358
00:20:24,000 --> 00:20:28,000
See just try to always make a rule like this is dependent on O3.

359
00:20:29,000 --> 00:20:35,000
O3 is dependent on y hat and y hat is further dependent on loss.

360
00:20:35,000 --> 00:20:35,000
Okay.

361
00:20:36,000 --> 00:20:38,000
So here you can clearly see this.

362
00:20:38,000 --> 00:20:45,000
First of all, the loss. I'm searching with respect to time is

363
00:20:45,000 --> 00:20:46,000
equal to three, right?

364
00:20:46,000 --> 00:20:47,000
At time is equal to three,

365
00:20:47,000 --> 00:20:48,000
I have w h over here.

366
00:20:48,000 --> 00:20:51,000
Now first of all this is dependent on O3.

367
00:20:51,000 --> 00:20:55,000
O3 is further dependent on y hat right.

368
00:20:56,000 --> 00:20:59,000
And y hat is further dependent on loss.

369
00:20:59,000 --> 00:20:59,000
Right.

370
00:20:59,000 --> 00:21:01,000
So there is a kind of dependency right.

371
00:21:01,000 --> 00:21:02,000
And vice versa.

372
00:21:02,000 --> 00:21:08,000
Also I can say, right, I'll not say W H is dependent on O3; I'll say O3 is dependent on W H.

373
00:21:08,000 --> 00:21:09,000
Or you can also think in this way.

374
00:21:09,000 --> 00:21:10,000
Let me reframe it.

375
00:21:10,000 --> 00:21:10,000
Okay.

376
00:21:10,000 --> 00:21:12,000
So I have my loss.

377
00:21:12,000 --> 00:21:17,000
Loss is dependent on y hat, y hat is dependent on O3, and O3 is dependent on W H.

378
00:21:17,000 --> 00:21:23,000
Now in order to update this, I have to follow this chain, okay?

379
00:21:23,000 --> 00:21:26,000
So let me just go ahead and write how it will look.

380
00:21:26,000 --> 00:21:29,000
So first of all I'll go ahead and write derivative of Loss.

381
00:21:30,000 --> 00:21:31,000
Let me go ahead and write this.

382
00:21:31,000 --> 00:21:35,000
So here I'll go ahead and write derivative of loss with respect to derivative of y hat.

383
00:21:35,000 --> 00:21:37,000
Because this is the first dependency.

384
00:21:37,000 --> 00:21:40,000
Then derivative of y hat will be dependent on what.

385
00:21:40,000 --> 00:21:44,000
Derivative of O.

386
00:21:45,000 --> 00:21:48,000
Or let me just see these weights, okay, W of O, okay.

387
00:21:48,000 --> 00:21:51,000
So derivative of W of O, or instead of writing derivative of W of O,

388
00:21:51,000 --> 00:21:57,000
I know this will be dependent on derivative of O3 because O3 is the output over here, right?

389
00:21:57,000 --> 00:21:59,000
O3 is the output.

390
00:21:59,000 --> 00:22:07,000
Then similarly this O3 will be further dependent on derivative of O2, right?

391
00:22:08,000 --> 00:22:13,000
Now I know that at timestamp t is equal to three I have to first of all update this weight.

392
00:22:13,000 --> 00:22:21,000
So what I will do, instead of writing O2, I will go ahead and write derivative of O, sorry, derivative

393
00:22:21,000 --> 00:22:21,000
of.

394
00:22:22,000 --> 00:22:26,000
Derivative of W H.

395
00:22:27,000 --> 00:22:32,000
Okay, now this is very much important because when t is equal to three I'm doing the back propagation.

396
00:22:32,000 --> 00:22:37,000
And I'm finding out this particular derivative of this value in this way okay.

397
00:22:38,000 --> 00:22:41,000
Now I will take this as one okay.

398
00:22:41,000 --> 00:22:45,000
One entire thing, and we will calculate it, okay.

399
00:22:45,000 --> 00:22:52,000
Now we will go ahead and write plus. Why? Because at timestamp t is equal to two I still have W of H over

400
00:22:52,000 --> 00:22:54,000
here okay.

401
00:22:54,000 --> 00:22:58,000
Now what kind of rule will we follow, or what kind of chain will we follow?

402
00:22:58,000 --> 00:23:02,000
So here you can see loss is dependent on y hat. Y hat is dependent on O3.

403
00:23:02,000 --> 00:23:10,000
O3 is now dependent on O2 and O2 is now dependent on w h.

404
00:23:10,000 --> 00:23:14,000
Now in order to update this w of H, please see this.

405
00:23:14,000 --> 00:23:15,000
Okay.

406
00:23:15,000 --> 00:23:18,000
In order to update this w of H, I have to follow this chain rule.

407
00:23:18,000 --> 00:23:22,000
Okay so here what I will go ahead and write again derivative.

408
00:23:22,000 --> 00:23:27,000
So I will start from here okay I will go ahead and write at timestamp t is equal to two.

409
00:23:27,000 --> 00:23:28,000
How I will update this.

410
00:23:28,000 --> 00:23:33,000
First of all I will go ahead and write derivative of l with respect to derivative of y hat.

411
00:23:33,000 --> 00:23:42,000
Then I will multiply derivative of y hat with respect to derivative of O3.

412
00:23:43,000 --> 00:23:49,000
Then I have derivative of O3 with respect to derivative of O2.

413
00:23:49,000 --> 00:23:54,000
Because there is also a dependency over there, and then I will be

414
00:23:54,000 --> 00:23:56,000
calculating derivative of O2 with respect to derivative of W H.

415
00:23:56,000 --> 00:24:01,000
So this is, for timestamp t is equal to two,

416
00:24:01,000 --> 00:24:03,000
How I will go ahead and calculate my w of h.

417
00:24:03,000 --> 00:24:04,000
Right?

418
00:24:04,000 --> 00:24:06,000
This is how, based on the chain rule, it will happen. Again,

419
00:24:06,000 --> 00:24:07,000
Let me repeat it.

420
00:24:07,000 --> 00:24:10,000
Loss is dependent on y hat. Y hat is dependent on O3.

421
00:24:10,000 --> 00:24:11,000
O3 is dependent on O2.

422
00:24:11,000 --> 00:24:13,000
O2 is dependent on w h.

423
00:24:13,000 --> 00:24:16,000
And that is how I have actually calculated my derivative.
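Written out, the two chains described so far look roughly like this. This is a sketch in standard notation, assuming L is the loss, \hat{y} the prediction, O_t the hidden output at timestamp t, and W_h the hidden weight. In each term, the last factor is taken as the direct effect of W_h at that timestamp (the earlier output held fixed), so the dependency through O_2 is counted only once, in the second term.

```latex
% Contribution at timestamp t = 3
\left.\frac{\partial L}{\partial W_h}\right|_{t=3}
  = \frac{\partial L}{\partial \hat{y}}
    \cdot \frac{\partial \hat{y}}{\partial O_3}
    \cdot \frac{\partial O_3}{\partial W_h}

% Contribution at timestamp t = 2
\left.\frac{\partial L}{\partial W_h}\right|_{t=2}
  = \frac{\partial L}{\partial \hat{y}}
    \cdot \frac{\partial \hat{y}}{\partial O_3}
    \cdot \frac{\partial O_3}{\partial O_2}
    \cdot \frac{\partial O_2}{\partial W_h}
```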

424
00:24:16,000 --> 00:24:18,000
Now we come to the next point.

425
00:24:18,000 --> 00:24:19,000
And the next point is nothing but

426
00:24:19,000 --> 00:24:22,000
timestamp t is equal to one.

427
00:24:22,000 --> 00:24:24,000
Now how do I calculate it?

428
00:24:24,000 --> 00:24:27,000
So here what I'm actually going to do I'm going to probably create this chain.

429
00:24:28,000 --> 00:24:34,000
Now I need to update this particular value that is w of h right now.

430
00:24:34,000 --> 00:24:35,000
How do I calculate it.

431
00:24:35,000 --> 00:24:37,000
Now you can see there will be a dependency.

432
00:24:37,000 --> 00:24:43,000
Now from here I'll say O2 is dependent on O1. O1 is dependent on W H.

433
00:24:43,000 --> 00:24:43,000
Right.

434
00:24:43,000 --> 00:24:45,000
So first we'll say loss.

435
00:24:45,000 --> 00:24:46,000
Loss is dependent on y hat.

436
00:24:46,000 --> 00:24:47,000
Y hat is dependent on O3.

437
00:24:47,000 --> 00:24:49,000
O3 is dependent on O2.

438
00:24:49,000 --> 00:24:50,000
O2 is dependent on O1.

439
00:24:50,000 --> 00:24:52,000
And finally O1 is dependent on w h.

440
00:24:52,000 --> 00:24:55,000
And that is how this chain rule will be working.

441
00:24:55,000 --> 00:24:59,000
Okay, now if I really want to write this again so it will be nothing.

442
00:24:59,000 --> 00:25:06,000
But here I'm going to write derivative of loss with respect to derivative of y hat multiplied by derivative

443
00:25:06,000 --> 00:25:12,000
of y hat divided by derivative of O3. Based on the chain rule, derivative of O3 divided by derivative

444
00:25:12,000 --> 00:25:15,000
of O2.

445
00:25:15,000 --> 00:25:20,000
Then again, I will go ahead and calculate derivative of O2 divided by derivative of O1.

446
00:25:21,000 --> 00:25:23,000
I think it is O1, right? O1.

447
00:25:23,000 --> 00:25:25,000
Then O1 is derivative of.

448
00:25:25,000 --> 00:25:32,000
Then O1 is dependent on, uh, W of H.

449
00:25:33,000 --> 00:25:35,000
So here I have my next one.

450
00:25:36,000 --> 00:25:41,000
And once we add up all these values that we are going to get,

451
00:25:41,000 --> 00:25:45,000
That is when we are going to get this entire value.

452
00:25:45,000 --> 00:25:47,000
And we'll go ahead and update it.

453
00:25:47,000 --> 00:25:54,000
And our W H new will be W H old minus learning rate multiplied by whatever

454
00:25:54,000 --> 00:25:55,000
value we specifically get.

455
00:25:55,000 --> 00:26:01,000
And that is how, in one back propagation, all the hidden weights get updated.
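To make the sum over timestamps concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the exact network from the video: every weight is a single number, the activation is tanh, the prediction is y_hat = w_o * O3, and the loss is squared error. The names forward, loss_fn, grad_w_h and the initial output o0 are hypothetical.

```python
import math

def forward(xs, w_i, w_h, w_o, o0):
    """Unrolled forward pass. outs[t] is O_t, outs[0] is the initial output."""
    outs = [o0]
    for x in xs:
        outs.append(math.tanh(w_i * x + w_h * outs[-1]))
    y_hat = w_o * outs[-1]  # assumed output layer: y_hat = w_o * O_last
    return outs, y_hat

def loss_fn(y_hat, y):
    """Assumed loss: half squared error."""
    return 0.5 * (y_hat - y) ** 2

def grad_w_h(xs, y, w_i, w_h, w_o, o0):
    """dL/dW_h as a sum of one chain-rule term per timestamp, last timestamp first."""
    outs, y_hat = forward(xs, w_i, w_h, w_o, o0)
    upstream = (y_hat - y) * w_o  # dL/dy_hat * dy_hat/dO_last
    total = 0.0
    for t in range(len(xs), 0, -1):
        local = 1.0 - outs[t] ** 2               # derivative of tanh at step t
        total += upstream * local * outs[t - 1]  # direct dO_t/dW_h term
        upstream = upstream * local * w_h        # extend chain by dO_t/dO_(t-1)
    return total

# Example values (arbitrary, for illustration only)
xs, y = [0.5, -0.3, 0.8], 0.7
w_i, w_h, w_o, o0, lr = 0.4, 0.6, 0.9, 0.1, 0.1
g = grad_w_h(xs, y, w_i, w_h, w_o, o0)
w_h_new = w_h - lr * g  # W_h new = W_h old - learning rate * dL/dW_h
```

The loop visits t = 3, 2, 1 in that order, the same order as in the explanation, and each pass adds one more link to the chain before moving to the earlier timestamp.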

456
00:26:01,000 --> 00:26:03,000
Okay, this is perfect, right?

457
00:26:04,000 --> 00:26:10,000
So I hope you are able to get how to update w h, which is my hidden layer weight based on timestamp.

458
00:26:10,000 --> 00:26:14,000
Now coming to the third and most important thing, that is W of I.

459
00:26:14,000 --> 00:26:21,000
Okay, so third is nothing but updating weights

460
00:26:23,000 --> 00:26:24,000
W of I.

461
00:26:25,000 --> 00:26:28,000
Now let's go ahead and discuss about this.

462
00:26:29,000 --> 00:26:36,000
So guys, now we are going to see how we can update the weights W of I, again based on timestamp, okay.

463
00:26:37,000 --> 00:26:44,000
Because at t is equal to three I'm sending one word, at t is equal to two I'm sending one word, and

464
00:26:44,000 --> 00:26:45,000
at t is equal to one I'm sending one word.

465
00:26:45,000 --> 00:26:48,000
So we are just talking about these weights right.

466
00:26:48,000 --> 00:26:51,000
So these weights we also have to update in the back propagation.

467
00:26:52,000 --> 00:26:54,000
So let's go ahead and see step by step okay.

468
00:26:54,000 --> 00:26:57,000
Now first of all for this what kind of chain rule I'll go ahead and apply.

469
00:26:57,000 --> 00:27:02,000
So here you can see that uh first I have loss.

470
00:27:02,000 --> 00:27:05,000
The loss is dependent on y hat. Y hat is dependent on O3.

471
00:27:05,000 --> 00:27:07,000
O3 is dependent on w of I.

472
00:27:07,000 --> 00:27:08,000
Right.

473
00:27:08,000 --> 00:27:13,000
So again for this the weight updation formula will look something like this.

474
00:27:13,000 --> 00:27:22,000
So here I will go ahead and write W of I new is equal to W of I old minus learning rate multiplied by derivative

475
00:27:22,000 --> 00:27:27,000
of loss with respect to derivative of w of I old.

476
00:27:27,000 --> 00:27:27,000
Okay.

477
00:27:27,000 --> 00:27:31,000
Now as you know we have to go ahead and compute this okay.

478
00:27:32,000 --> 00:27:34,000
And this will be with respect to time.

479
00:27:35,000 --> 00:27:44,000
Now quickly derivative of loss with respect to derivative of w I old is equal to I know first of all

480
00:27:44,000 --> 00:27:45,000
loss.

481
00:27:45,000 --> 00:27:47,000
This uh derivative of loss.

482
00:27:47,000 --> 00:27:50,000
So this will be with respect to time stamp t is equal to three okay.

483
00:27:50,000 --> 00:27:52,000
We have to do the back propagation.

484
00:27:52,000 --> 00:27:56,000
So here you can see derivative of loss with respect to derivative of w.

485
00:27:57,000 --> 00:28:05,000
Um, here you know that loss is dependent on y hat.

486
00:28:05,000 --> 00:28:08,000
Okay, then I have y hat.

487
00:28:08,000 --> 00:28:10,000
It is in turn dependent on.

488
00:28:10,000 --> 00:28:14,000
So y hat is dependent on what? Y hat is dependent on

489
00:28:14,000 --> 00:28:15,000
O3.

490
00:28:15,000 --> 00:28:18,000
So I'll just go ahead and compute O3.

491
00:28:18,000 --> 00:28:24,000
Then O3 is dependent on W of I.

492
00:28:25,000 --> 00:28:25,000
Right.

493
00:28:25,000 --> 00:28:28,000
And when I say W of I, this is nothing but W of I old.

494
00:28:28,000 --> 00:28:32,000
Okay so this is with respect to my timestamp t is equal to three okay.

495
00:28:33,000 --> 00:28:35,000
Similarly we'll go ahead and write plus.

496
00:28:35,000 --> 00:28:38,000
Now let's go with respect to timestamp two.

497
00:28:38,000 --> 00:28:42,000
Now with respect to timestamp two, you know how we are going to go: loss is dependent on y hat.

498
00:28:42,000 --> 00:28:44,000
Y hat is dependent on O3.

499
00:28:44,000 --> 00:28:45,000
O3 is dependent on O2.

500
00:28:45,000 --> 00:28:48,000
O2 is dependent on w of I.

501
00:28:48,000 --> 00:28:48,000
Right.

502
00:28:48,000 --> 00:28:50,000
So we will go in this rule.

503
00:28:50,000 --> 00:28:53,000
So first of all I will say hey let's go this.

504
00:28:53,000 --> 00:28:57,000
So here you have one dependency: loss with this.

505
00:28:57,000 --> 00:29:03,000
Then O3, then from O3 to O2, and O2 to this W of I.

506
00:29:03,000 --> 00:29:03,000
Okay.

507
00:29:03,000 --> 00:29:06,000
So we will go ahead and write this. So quickly,

508
00:29:06,000 --> 00:29:07,000
Let's go ahead and do this.

509
00:29:07,000 --> 00:29:13,000
So here I'll be writing derivative of loss with respect to derivative of y hat multiplied by derivative

510
00:29:13,000 --> 00:29:17,000
of y hat divided by derivative of O3.

511
00:29:17,000 --> 00:29:22,000
Then you have derivative of O3 divided by derivative of O2.

512
00:29:22,000 --> 00:29:28,000
And finally you have derivative of O2 divided by derivative of w of I old.

513
00:29:32,000 --> 00:29:34,000
So here is my total thing right.

514
00:29:34,000 --> 00:29:36,000
So O3 to O2.

515
00:29:36,000 --> 00:29:39,000
See, I'll say O3 to O2, O2 to W of I.

516
00:29:39,000 --> 00:29:41,000
So O2 to W of I.

517
00:29:41,000 --> 00:29:41,000
Right.

518
00:29:41,000 --> 00:29:48,000
And finally you'll also be seeing that we will go ahead and write my final term.

519
00:29:48,000 --> 00:29:50,000
And this is for t is equal to two.

520
00:29:50,000 --> 00:29:51,000
Now let's go ahead and write it.

521
00:29:51,000 --> 00:29:58,000
For t is equal to one, it is very simple: derivative of loss with respect to y hat multiplied by derivative

522
00:29:58,000 --> 00:30:01,000
of y hat with respect to oh three.

523
00:30:01,000 --> 00:30:06,000
Derivative of O3 with respect to derivative of O2.

524
00:30:06,000 --> 00:30:13,000
Then O2 to O1, and finally O1 to W of I

525
00:30:13,000 --> 00:30:13,000
Old.

526
00:30:14,000 --> 00:30:21,000
Once we calculate all these values, you will be able to see that I'll be getting this entire value over

527
00:30:21,000 --> 00:30:24,000
here and this will be replaced over here.

528
00:30:24,000 --> 00:30:25,000
Right.

529
00:30:25,000 --> 00:30:34,000
And finally we update the W of I weights in the back propagation.

530
00:30:34,000 --> 00:30:38,000
And this usually happens in one back propagation with respect to time.
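Putting the three timestamps together, the W of I update can be written as below. This is a sketch in standard notation: the symbol \eta for the learning rate is an assumption, and as before the last factor of each term is the direct effect of W_i at that timestamp.

```latex
\frac{\partial L}{\partial W_i}
  = \underbrace{\frac{\partial L}{\partial \hat{y}}
      \cdot \frac{\partial \hat{y}}{\partial O_3}
      \cdot \frac{\partial O_3}{\partial W_i}}_{t=3}
  + \underbrace{\frac{\partial L}{\partial \hat{y}}
      \cdot \frac{\partial \hat{y}}{\partial O_3}
      \cdot \frac{\partial O_3}{\partial O_2}
      \cdot \frac{\partial O_2}{\partial W_i}}_{t=2}
  + \underbrace{\frac{\partial L}{\partial \hat{y}}
      \cdot \frac{\partial \hat{y}}{\partial O_3}
      \cdot \frac{\partial O_3}{\partial O_2}
      \cdot \frac{\partial O_2}{\partial O_1}
      \cdot \frac{\partial O_1}{\partial W_i}}_{t=1}

W_i^{\text{new}} = W_i^{\text{old}} - \eta \, \frac{\partial L}{\partial W_i^{\text{old}}}
```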

531
00:30:38,000 --> 00:30:43,000
So finally when we are using this weight updation formula what exactly is happening.

532
00:30:43,000 --> 00:30:43,000
Right.

533
00:30:43,000 --> 00:30:47,000
And I hope everybody knows about gradient descent okay.

534
00:30:47,000 --> 00:30:51,000
Our main aim is to probably come to this global minima.

535
00:30:51,000 --> 00:30:51,000
Right.

536
00:30:51,000 --> 00:30:52,000
And there are different, different weights.

537
00:30:52,000 --> 00:30:56,000
One is W of H, there is W of I,

538
00:30:56,000 --> 00:31:00,000
And there is something called W of O, right?

539
00:31:01,000 --> 00:31:08,000
My main aim is that whenever this convergence happens, we have to come towards this

540
00:31:08,000 --> 00:31:10,000
global minima, right?

541
00:31:10,000 --> 00:31:16,000
If we are coming towards this global minima, that basically means our loss will be very much near to

542
00:31:16,000 --> 00:31:20,000
zero or our loss will keep on decreasing.

543
00:31:20,000 --> 00:31:20,000
Right.

544
00:31:20,000 --> 00:31:24,000
And this is what we had already discussed in ANN regarding gradient descent.
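As a rough illustration of the loss going down when all three weights are updated with the weight updation formula, here is a small Python sketch. It rests on assumptions, not on the exact setup in the video: scalar weights, tanh activation, y_hat = w_o * O3, squared-error loss, an initial output of zero, and arbitrary example numbers. The names forward, gradients and train are hypothetical.

```python
import math

def forward(xs, w_i, w_h, w_o, o0=0.0):
    """Unrolled forward pass. outs[t] is O_t, outs[0] is the initial output."""
    outs = [o0]
    for x in xs:
        outs.append(math.tanh(w_i * x + w_h * outs[-1]))
    return outs, w_o * outs[-1]

def gradients(xs, y, w_i, w_h, w_o):
    """Loss plus dL/dW_i, dL/dW_h, dL/dW_o from one backward pass through time."""
    outs, y_hat = forward(xs, w_i, w_h, w_o)
    d_yhat = y_hat - y           # dL/dy_hat for half squared error
    g_o = d_yhat * outs[-1]      # W_o only appears at the last timestamp
    upstream = d_yhat * w_o      # dL/dO_last
    g_i = g_h = 0.0
    for t in range(len(xs), 0, -1):
        local = upstream * (1.0 - outs[t] ** 2)  # back through tanh at step t
        g_i += local * xs[t - 1]                 # direct dO_t/dW_i term
        g_h += local * outs[t - 1]               # direct dO_t/dW_h term
        upstream = local * w_h                   # move one timestamp back
    return 0.5 * d_yhat ** 2, g_i, g_h, g_o

def train(xs, y, w_i, w_h, w_o, lr=0.1, steps=50):
    """Plain gradient descent: w_new = w_old - lr * dL/dw_old for each weight."""
    losses = []
    for _ in range(steps):
        loss, g_i, g_h, g_o = gradients(xs, y, w_i, w_h, w_o)
        losses.append(loss)
        w_i -= lr * g_i
        w_h -= lr * g_h
        w_o -= lr * g_o
    return losses, (w_i, w_h, w_o)

losses, weights = train([0.5, -0.3, 0.8], y=0.7, w_i=0.4, w_h=0.6, w_o=0.9)
print(losses[0], losses[-1])  # first and last recorded loss
```

With a small learning rate like this, the recorded loss should shrink from step to step, which is the "coming towards the minima" behaviour described above. On a real network nothing guarantees the minimum reached is the global one.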

545
00:31:24,000 --> 00:31:26,000
So I hope you like this particular video.

546
00:31:26,000 --> 00:31:28,000
This was it from my side.

547
00:31:28,000 --> 00:31:33,000
Uh, here in this video we have discussed about RNN, simple RNN, we have discussed about forward propagation.

548
00:31:33,000 --> 00:31:36,000
We have discussed about backward propagation with all the equations and all.

549
00:31:36,000 --> 00:31:37,000
Right.

550
00:31:37,000 --> 00:31:38,000
So yes, this was it from my side.

551
00:31:38,000 --> 00:31:39,000
I will see you in the next video.

552
00:31:39,000 --> 00:31:40,000
Thank you.

