1
00:00:00,000 --> 00:00:01,000
Hello guys.

2
00:00:01,000 --> 00:00:05,000
So we are going to continue the discussion with respect to simple RNN architecture.

3
00:00:05,000 --> 00:00:10,000
And in this video we are going to discuss about forward propagation in simple RNN.

4
00:00:10,000 --> 00:00:10,000
Right.

5
00:00:10,000 --> 00:00:14,000
Already in our previous video we just saw about the entire architecture.

6
00:00:14,000 --> 00:00:16,000
What a basic RNN looks like.

7
00:00:16,000 --> 00:00:18,000
I hope you have understood this.

8
00:00:18,000 --> 00:00:20,000
And what is this unfolding technique?

9
00:00:20,000 --> 00:00:23,000
Okay, now let's take this example.

10
00:00:23,000 --> 00:00:23,000
The food is good.

11
00:00:23,000 --> 00:00:27,000
So we will try to solve this problem with the help of simple RNN.

12
00:00:27,000 --> 00:00:34,000
And we'll understand how forward propagation actually happens in an RNN.

13
00:00:34,000 --> 00:00:37,000
So let's consider that here I have all these words.

14
00:00:37,000 --> 00:00:41,000
And with respect to these words, first of all, I'm going to go ahead and find

15
00:00:41,000 --> 00:00:43,000
out how many unique words there are.

16
00:00:43,000 --> 00:00:43,000
Okay.

17
00:00:43,000 --> 00:00:45,000
So let's say the unique words are

18
00:00:45,000 --> 00:00:47,000
the, food, okay,

19
00:00:47,000 --> 00:00:48,000
good,

20
00:00:48,000 --> 00:00:50,000
and I'll say bad,

21
00:00:50,000 --> 00:00:53,000
and I can also say not, okay.

22
00:00:53,000 --> 00:00:57,000
So the total number of unique words is one, two, three, four, five, okay.

23
00:00:57,000 --> 00:01:02,000
Now, initially, I need to convert these words into vectors, okay.

24
00:01:03,000 --> 00:01:05,000
Now, there are multiple ways to convert them.

25
00:01:05,000 --> 00:01:07,000
We can also use Word2Vec.

26
00:01:07,000 --> 00:01:08,000
We can use multiple techniques, okay.

27
00:01:08,000 --> 00:01:13,000
But I'll just use the most basic one, which is called one-hot encoding, okay.

28
00:01:13,000 --> 00:01:17,000
So we will go ahead and use this one-hot encoding.

29
00:01:17,000 --> 00:01:18,000
Now, one-hot encoding:

30
00:01:18,000 --> 00:01:22,000
what it does is convert each word into a vector, okay.

31
00:01:22,000 --> 00:01:26,000
So let's say I have the word "the", okay. Since I have five words,

32
00:01:26,000 --> 00:01:30,000
wherever "the" is available, that position will become one and all the remaining ones will become zero.

33
00:01:31,000 --> 00:01:31,000
Right.

34
00:01:31,000 --> 00:01:34,000
So this basically becomes the vector for my first word, okay.

35
00:01:34,000 --> 00:01:35,000
Wherever "the" is available,

36
00:01:35,000 --> 00:01:39,000
that position becomes one; there are five positions because the number of unique words is five: one, two, three, four, five.

37
00:01:39,000 --> 00:01:39,000
Right.

38
00:01:39,000 --> 00:01:44,000
So wherever "the" is present, that becomes one, and all the remaining ones become zero.

39
00:01:44,000 --> 00:01:45,000
Okay.

40
00:01:45,000 --> 00:01:49,000
Similarly, if I consider the word "food", wherever food is present, that will

41
00:01:49,000 --> 00:01:52,000
become one and the remaining ones will become zero.

42
00:01:52,000 --> 00:01:58,000
Okay, so this is my next vector, for the word "food". My next word

43
00:01:58,000 --> 00:02:03,000
that I have over here is "good". I'm not going to consider "is" because it is

44
00:02:03,000 --> 00:02:03,000
a stop word.

45
00:02:03,000 --> 00:02:08,000
So for the word "good", the third position will become one.

46
00:02:08,000 --> 00:02:13,000
And all the remaining positions will be zeros: the third entry becomes one in the vector and the remaining

47
00:02:13,000 --> 00:02:14,000
all will be zeros.

48
00:02:14,000 --> 00:02:15,000
Right?

49
00:02:15,000 --> 00:02:17,000
And then you have "bad".

50
00:02:17,000 --> 00:02:19,000
"Bad" is, let's say, my fourth word.

51
00:02:19,000 --> 00:02:22,000
So the fourth position will become one and all the remaining ones will become zero.

52
00:02:22,000 --> 00:02:26,000
And finally, for "not", the fifth position will be one and the rest will be zero.

53
00:02:26,000 --> 00:02:31,000
So these are exactly the vectors I will use for sentence one.

54
00:02:31,000 --> 00:02:35,000
Okay, so for sentence one, this is how my vectors look.
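
A minimal NumPy sketch of the one-hot encoding just described. It assumes the vocabulary order the, food, good, bad, not (the order used above) and drops "is" as a stop word; the helper name `one_hot` is hypothetical.

```python
import numpy as np

# The five unique words from the example, in the order used above.
vocab = ["the", "food", "good", "bad", "not"]
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word: str) -> np.ndarray:
    """Return a length-5 vector with a 1 at the word's index and 0 elsewhere."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec


# "is" is skipped as a stop word, as in the walkthrough.
sentence = ["the", "food", "good"]
X = np.stack([one_hot(w) for w in sentence])
print(X)
# [[1. 0. 0. 0. 0.]
#  [0. 1. 0. 0. 0.]
#  [0. 0. 1. 0. 0.]]
```

Each row has a single 1 at the word's index; a larger vocabulary would simply make every row longer.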

55
00:02:36,000 --> 00:02:43,000
Now I need to train on this entire sequential data with my neural network, that is, the RNN, and

56
00:02:43,000 --> 00:02:48,000
see how to do the prediction, that is, how to train this

57
00:02:48,000 --> 00:02:51,000
particular neural network for this particular classification problem.

58
00:02:51,000 --> 00:02:53,000
That is nothing but text classification okay.

59
00:02:53,000 --> 00:02:59,000
So now let's go ahead and consider our RNN, right.

60
00:02:59,000 --> 00:03:04,000
Now, as you have seen, the generic architecture of an RNN will

61
00:03:04,000 --> 00:03:05,000
be something like this.

62
00:03:05,000 --> 00:03:08,000
So I will be having my input.

63
00:03:08,000 --> 00:03:09,000
This will be my hidden layer.

64
00:03:10,000 --> 00:03:11,000
This will be my output.

65
00:03:11,000 --> 00:03:11,000
Okay.

66
00:03:11,000 --> 00:03:14,000
So input output.

67
00:03:14,000 --> 00:03:17,000
Along with this I will be having a feedback loop okay.

68
00:03:17,000 --> 00:03:19,000
And this feedback loop is nothing

69
00:03:19,000 --> 00:03:23,000
but the output being sent to each and every hidden neuron in that particular hidden layer.

70
00:03:23,000 --> 00:03:24,000
All the information, all the output.

71
00:03:24,000 --> 00:03:30,000
Okay, so here first of all you need to understand how many inputs I need to give, right?

72
00:03:30,000 --> 00:03:35,000
Because I will be giving one word at a time to my neural network for the training purpose.

73
00:03:35,000 --> 00:03:38,000
And every word is represented by a vector of five values.

74
00:03:38,000 --> 00:03:44,000
So here I'm going to consider these five: one, two, three, four, five.

75
00:03:44,000 --> 00:03:51,000
So these will be the five inputs that I will be giving to my RNN.

76
00:03:51,000 --> 00:03:51,000
Okay.

77
00:03:51,000 --> 00:03:55,000
Now in my hidden layer, let's consider that I'm just going to use three neurons.

78
00:03:55,000 --> 00:03:55,000
Okay.

79
00:03:55,000 --> 00:04:02,000
So here I will be considering neuron one, neuron two and neuron three, okay.

80
00:04:02,000 --> 00:04:04,000
And at timestamp t is equal to one,

81
00:04:04,000 --> 00:04:12,000
I will be passing the first word. The words are nothing but X11, X12, X13.

82
00:04:12,000 --> 00:04:15,000
Okay, so I'll be passing all these words one at a time: 1, 2, 3,

83
00:04:15,000 --> 00:04:15,000
like that.

84
00:04:15,000 --> 00:04:16,000
Right.

85
00:04:16,000 --> 00:04:17,000
All the words will be passed.

86
00:04:17,000 --> 00:04:22,000
So here you can see the first word will be passed over here as X11.

87
00:04:22,000 --> 00:04:23,000
And I know what the vector is.

88
00:04:23,000 --> 00:04:28,000
It is nothing but 1 0 0 0 0.

89
00:04:28,000 --> 00:04:28,000
All right.

90
00:04:28,000 --> 00:04:31,000
These are the values that I'm going to pass through these inputs.

91
00:04:31,000 --> 00:04:35,000
So this basically becomes my input layer, okay.

92
00:04:36,000 --> 00:04:44,000
Now each input will get connected to all the hidden neurons, right: over here, over here,

93
00:04:44,000 --> 00:04:47,000
then here, then here.

94
00:04:47,000 --> 00:04:50,000
So each input will get connected to all the hidden neurons.

95
00:04:53,000 --> 00:04:55,000
Okay perfect.

96
00:04:56,000 --> 00:05:00,000
Right, now this is my hidden layer.

97
00:05:00,000 --> 00:05:02,000
As you all know, this is my hidden layer one.

98
00:05:02,000 --> 00:05:06,000
But with respect to an RNN, we need to have a self loop, right?

99
00:05:06,000 --> 00:05:08,000
What does this self loop basically mean?

100
00:05:08,000 --> 00:05:15,000
It means that whatever the output is, I will be passing it to itself and to all the other hidden neurons.
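
To make the self loop concrete: if each of the three hidden outputs is sent to itself and to every other hidden neuron, the feedback connections form a 3 × 3 weight matrix (called w dash, W', later in the video). This small NumPy sketch uses hypothetical random weights to show that adding up every neuron's contribution by hand gives the same result as one vector-matrix product.

```python
import numpy as np

rng = np.random.default_rng(0)
n_hidden = 3

# Hypothetical feedback weights W' (3 x 3): entry [i, j] connects the previous
# output of hidden neuron i to hidden neuron j at the next time step.
W_prime = rng.normal(size=(n_hidden, n_hidden))
h_prev = rng.normal(size=n_hidden)  # outputs of the 3 hidden neurons at t = 1

# Explicit loops: every neuron j receives the previous output of every
# neuron i, including its own (the self loop).
feedback_loop = np.zeros(n_hidden)
for j in range(n_hidden):
    for i in range(n_hidden):
        feedback_loop[j] += h_prev[i] * W_prime[i, j]

# The same computation as a single vector-matrix product.
feedback_matrix = h_prev @ W_prime
print(np.allclose(feedback_loop, feedback_matrix))  # True
```

Writing the feedback as `h_prev @ W_prime` is why the recurrent weights are counted as one 3 × 3 matrix rather than as separate per-neuron loops.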

101
00:05:15,000 --> 00:05:20,000
So now let's go to timestamp t is equal to two.

102
00:05:20,000 --> 00:05:25,000
And here I'm going to pass my next word, that is nothing but X12.

103
00:05:26,000 --> 00:05:26,000
Okay.

104
00:05:26,000 --> 00:05:35,000
Now in this case, my X12 is passed through the same five inputs; X12 is the vector formed for the second word.

105
00:05:35,000 --> 00:05:41,000
And here I have my hidden neurons, one, two and three.

106
00:05:41,000 --> 00:05:42,000
Right.

107
00:05:43,000 --> 00:05:43,000
Sorry.

108
00:05:44,000 --> 00:05:45,000
One.

109
00:05:45,000 --> 00:05:45,000
Two.

110
00:05:45,000 --> 00:05:45,000
Three.

111
00:05:47,000 --> 00:05:51,000
I will just go ahead and use the same hidden neurons over here.

112
00:05:51,000 --> 00:05:54,000
One, two and three.

113
00:05:54,000 --> 00:05:54,000
Right.

114
00:05:54,000 --> 00:05:56,000
This needs to pass over here.

115
00:05:56,000 --> 00:05:56,000
Right?

116
00:05:57,000 --> 00:05:59,000
And when I'm passing this information at t is equal to two,

117
00:05:59,000 --> 00:06:04,000
as you all know, we also have the information that we computed at t is equal to one.

118
00:06:04,000 --> 00:06:05,000
So here we'll take this input,

119
00:06:05,000 --> 00:06:12,000
multiply it by all the weights, and whatever output I get will be returned back with the help of the self

120
00:06:12,000 --> 00:06:14,000
feedback loop itself.

121
00:06:14,000 --> 00:06:14,000
Okay.

122
00:06:14,000 --> 00:06:20,000
In short, what is basically happening is that this input will go over here.

123
00:06:20,000 --> 00:06:21,000
Sorry.

124
00:06:21,000 --> 00:06:24,000
The output from this node will go to the nodes at the next timestamp.

125
00:06:25,000 --> 00:06:26,000
Then this will also go over here.

126
00:06:26,000 --> 00:06:28,000
And this will also go over here.

127
00:06:28,000 --> 00:06:29,000
Right.

128
00:06:29,000 --> 00:06:30,000
So this is one right.

129
00:06:30,000 --> 00:06:33,000
And this is what is basically happening right at t is equal to one.

130
00:06:33,000 --> 00:06:38,000
When we use this self loop, or feedback loop, what we are basically doing is taking whatever output

131
00:06:38,000 --> 00:06:42,000
is coming out and sending it to all the hidden neurons.

132
00:06:42,000 --> 00:06:45,000
And that is what is basically happening at timestamp t is equal to two, right.

133
00:06:45,000 --> 00:06:48,000
And similarly you will be able to see that again.

134
00:06:48,000 --> 00:06:51,000
We will go ahead and pass this information to this.

135
00:06:51,000 --> 00:06:55,000
We will go ahead and pass this information to this.

136
00:06:55,000 --> 00:06:59,000
And we'll also go ahead and pass this information to this, whatever output I'm actually getting.

137
00:06:59,000 --> 00:07:05,000
And finally, you'll also see that this will get passed to my third neuron.

138
00:07:05,000 --> 00:07:07,000
And this will also get passed.

139
00:07:07,000 --> 00:07:10,000
The output will also get passed over here.

140
00:07:10,000 --> 00:07:12,000
The output will also get passed over here.

141
00:07:12,000 --> 00:07:15,000
And here also you'll be able to see it will get passed over here.

142
00:07:15,000 --> 00:07:18,000
So that basically means that at timestamp t is equal to two,

143
00:07:18,000 --> 00:07:23,000
these hidden neurons will also have the context of the first word,

144
00:07:23,000 --> 00:07:25,000
that is, "the" over here.

145
00:07:25,000 --> 00:07:27,000
The word at this step is nothing but "food", right?

146
00:07:27,000 --> 00:07:34,000
So when "food" is passed into the neural network at timestamp t is equal to two,

147
00:07:34,000 --> 00:07:40,000
this hidden layer will also have the context of the previous word.

148
00:07:41,000 --> 00:07:43,000
And similarly at t is equal to three.

149
00:07:43,000 --> 00:07:45,000
Same thing is going to happen, right?

150
00:07:45,000 --> 00:07:49,000
This output will be passed to the hidden layer at the next timestamp.

151
00:07:49,000 --> 00:07:49,000
Right?

152
00:07:49,000 --> 00:07:53,000
And here I'm going to give my next word as input.

153
00:07:54,000 --> 00:08:02,000
And similarly, when the sentence ends, this entire RNN is basically going to have the context

154
00:08:02,000 --> 00:08:05,000
of all the previous words, starting from timestamp t is equal to one.
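
A minimal NumPy sketch of the unrolled loop, assuming randomly initialised weights, zero biases and tanh as the activation f (none of these are fixed by the video). It runs two sentences that share their last two words and differ only in the first; the final hidden states differ, which is the "context of all the previous words" described above. The function name `final_hidden_state` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)
vocab = ["the", "food", "good", "bad", "not"]
eye = np.eye(len(vocab))  # row k is the one-hot vector for vocab[k]

# Hypothetical randomly initialised parameters: 5 inputs, 3 hidden neurons.
W = rng.normal(scale=0.5, size=(5, 3))        # input -> hidden
W_prime = rng.normal(scale=0.5, size=(3, 3))  # hidden -> hidden (feedback loop)
b = np.zeros(3)                                # one bias per hidden neuron


def final_hidden_state(words):
    """Unroll the RNN over the words and return the last hidden output."""
    h = np.zeros(3)  # no context exists before the first word
    for word in words:
        x = eye[vocab.index(word)]
        # The same W, W_prime and b are reused at every time step;
        # tanh stands in for the activation f (an assumption).
        h = np.tanh(x @ W + h @ W_prime + b)
    return h


h_a = final_hidden_state(["the", "food", "good"])
h_b = final_hidden_state(["not", "food", "good"])
# Same last two words, different first word: the final states differ
# because the first word's information was carried through the loop.
print(np.allclose(h_a, h_b))  # False
```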

155
00:08:05,000 --> 00:08:09,000
Right, so this is the basic idea.

156
00:08:09,000 --> 00:08:15,000
I know the diagram looks quite messy over here, but you can understand it this way: after

157
00:08:15,000 --> 00:08:20,000
calculating everything over here, what operation is specifically happening with respect to

158
00:08:20,000 --> 00:08:23,000
forward propagation? In forward propagation, what do we do?

159
00:08:23,000 --> 00:08:28,000
We take these inputs, multiply them by the weights, and then add a bias in this hidden layer.

160
00:08:28,000 --> 00:08:32,000
And whatever output comes out, we pass it to itself

161
00:08:32,000 --> 00:08:37,000
and to the other hidden neurons at timestamp t is equal to two, so that they get the context information,

162
00:08:37,000 --> 00:08:38,000
right.

163
00:08:38,000 --> 00:08:41,000
And that is how we go ahead and do our forward propagation.

164
00:08:42,000 --> 00:08:42,000
Right.

165
00:08:42,000 --> 00:08:46,000
Now this is how you can easily do each and every step.

166
00:08:46,000 --> 00:08:51,000
So if you want to write it in a generic way, it is very simple.

167
00:08:51,000 --> 00:08:54,000
So I can go ahead and say, hey, let's see.

168
00:08:54,000 --> 00:08:58,000
These are the five inputs that I am passing.

169
00:08:58,000 --> 00:09:00,000
This is my hidden layer okay.

170
00:09:01,000 --> 00:09:05,000
This is my hidden layer with any number of neurons.

171
00:09:05,000 --> 00:09:08,000
Okay, so right now in this case I've used how many?

172
00:09:08,000 --> 00:09:09,000
Three neurons.

173
00:09:09,000 --> 00:09:09,000
Right.

174
00:09:09,000 --> 00:09:11,000
And finally this is my output.

175
00:09:12,000 --> 00:09:14,000
So this will get passed.

176
00:09:14,000 --> 00:09:16,000
This will get passed.

177
00:09:17,000 --> 00:09:20,000
This will get passed to everyone right.

178
00:09:20,000 --> 00:09:22,000
Similarly this will get passed to everyone.

179
00:09:22,000 --> 00:09:25,000
This will get passed to everyone okay.

180
00:09:25,000 --> 00:09:25,000
Okay.

181
00:09:27,000 --> 00:09:28,000
All the hidden neurons.

182
00:09:32,000 --> 00:09:35,000
And finally, we get the output over here.

183
00:09:35,000 --> 00:09:37,000
It will get connected to this.

184
00:09:37,000 --> 00:09:38,000
It will get connected to this.

185
00:09:38,000 --> 00:09:39,000
It will get connected to this.

186
00:09:39,000 --> 00:09:43,000
But there will be an additional feedback loop that we will be using.

187
00:09:43,000 --> 00:09:46,000
And finally we get our output right.

188
00:09:46,000 --> 00:09:47,000
So input one.

189
00:09:47,000 --> 00:09:47,000
Input two.

190
00:09:47,000 --> 00:09:48,000
Input three.

191
00:09:48,000 --> 00:09:48,000
Input four.

192
00:09:48,000 --> 00:09:49,000
Input five.

193
00:09:49,000 --> 00:09:51,000
This is based on the one hot encoding.

194
00:09:51,000 --> 00:09:58,000
Whatever vectors I get with respect to the words that I have over here,

195
00:09:58,000 --> 00:10:01,000
I will be passing them in over here.

196
00:10:01,000 --> 00:10:02,000
Right?

197
00:10:02,000 --> 00:10:06,000
So if I want to see how many weight matrices, how many weights and how many biases

198
00:10:06,000 --> 00:10:07,000
there will be:

199
00:10:07,000 --> 00:10:09,000
Here you can see that I have five inputs.

200
00:10:09,000 --> 00:10:12,000
And here I have three hidden neurons in the hidden layer.

201
00:10:12,000 --> 00:10:12,000
Right.

202
00:10:12,000 --> 00:10:15,000
So this basically becomes five cross three.

203
00:10:16,000 --> 00:10:18,000
So let's go ahead and calculate how many weights will be there.

204
00:10:18,000 --> 00:10:20,000
So here it will become five cross three.

205
00:10:20,000 --> 00:10:23,000
That is nothing but 15 weights right.

206
00:10:24,000 --> 00:10:32,000
Then over here you will be able to see that I'm passing one cross five, and the weights will be

207
00:10:32,000 --> 00:10:33,000
five cross three, right.

208
00:10:33,000 --> 00:10:37,000
One cross five, dot, five cross three. One cross five,

209
00:10:37,000 --> 00:10:38,000
why?

210
00:10:38,000 --> 00:10:38,000
Because

211
00:10:38,000 --> 00:10:39,000
it is one input with five values,

212
00:10:39,000 --> 00:10:41,000
one row with five inputs.

213
00:10:41,000 --> 00:10:44,000
When we do the dot operation between them, right,

214
00:10:44,000 --> 00:10:46,000
the input multiplied by the weights,

215
00:10:46,000 --> 00:10:49,000
then in short, we are going to get one cross three.

216
00:10:49,000 --> 00:10:52,000
Now, each hidden neuron sends its output to all three hidden neurons, so for this neuron I'll get one cross three weights; for this one

217
00:10:52,000 --> 00:10:53,000
also I'll get one cross three; for this one

218
00:10:53,000 --> 00:10:54,000
also I'll get one cross three.

219
00:10:54,000 --> 00:10:58,000
So in total, the feedback weights for this entire hidden layer will be three cross three.

220
00:10:59,000 --> 00:11:02,000
So three cross three is nothing but nine weights okay.
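
The shape arithmetic above can be checked directly. This sketch only uses zero-filled placeholder arrays, since the values do not matter for the shapes.

```python
import numpy as np

x = np.zeros((1, 5))        # one word: 1 row with 5 one-hot inputs
W = np.zeros((5, 3))        # input -> hidden weights
W_prime = np.zeros((3, 3))  # hidden -> hidden (feedback) weights
h_prev = np.zeros((1, 3))   # previous hidden output

print((x @ W).shape)             # (1, 3)
print((h_prev @ W_prime).shape)  # (1, 3)
print(W.size, W_prime.size)      # 15 9
```

Both products land on the same 1 × 3 shape, which is what allows them to be added together inside the hidden layer.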

221
00:11:02,000 --> 00:11:02,000
Okay.

222
00:11:02,000 --> 00:11:08,000
And then finally you will be also able to see that this will be my three cross one, which will be nothing

223
00:11:08,000 --> 00:11:09,000
but three weights.

224
00:11:09,000 --> 00:11:11,000
Because here also we initialize weights right.

225
00:11:11,000 --> 00:11:18,000
So the total number of weights will be 15 plus nine, that is 24, and then 24 plus three, right:

226
00:11:18,000 --> 00:11:19,000
three more weights gives 27.

227
00:11:19,000 --> 00:11:21,000
But here we are also going to use bias.

228
00:11:21,000 --> 00:11:23,000
So here I will be getting three biases.

229
00:11:23,000 --> 00:11:25,000
And here I'll be getting one bias.

230
00:11:26,000 --> 00:11:29,000
So 27 plus four will be 31.

231
00:11:29,000 --> 00:11:33,000
So there will be 31 trainable parameters in total.

232
00:11:35,000 --> 00:11:37,000
31 trainable parameters, okay.
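
The parameter count can be written as a small helper. The function name `simple_rnn_param_count` is hypothetical, and it assumes a single bias vector for the hidden layer, as in the count above.

```python
def simple_rnn_param_count(n_inputs: int, n_hidden: int, n_outputs: int) -> int:
    """Count weights and biases for one recurrent hidden layer plus an output layer."""
    input_weights = n_inputs * n_hidden     # W:  5 x 3 = 15
    feedback_weights = n_hidden * n_hidden  # W': 3 x 3 = 9
    output_weights = n_hidden * n_outputs   # 3 x 1 = 3
    hidden_biases = n_hidden                # one bias per hidden neuron = 3
    output_biases = n_outputs               # one bias at the output = 1
    return (input_weights + feedback_weights + output_weights
            + hidden_biases + output_biases)


print(simple_rnn_param_count(5, 3, 1))  # 31
```

Some libraries keep two bias vectors in the recurrent layer (PyTorch's `nn.RNN` has both `bias_ih_l0` and `bias_hh_l0`), so their reported count for the same layout would be three higher.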

233
00:11:37,000 --> 00:11:41,000
So this is how it specifically happens with respect to the forward propagation okay.

234
00:11:42,000 --> 00:11:47,000
But we still need to understand what operations specifically happen in forward propagation.

235
00:11:47,000 --> 00:11:48,000
Okay.

236
00:11:48,000 --> 00:11:51,000
So I'm going to take this entire example for the forward propagation.

237
00:11:52,000 --> 00:11:54,000
I'll copy this and paste it over here.

238
00:11:55,000 --> 00:11:55,000
Okay.

239
00:11:56,000 --> 00:11:59,000
Now in order to understand forward propagation with time.

240
00:12:00,000 --> 00:12:03,000
So here I'm going to go ahead and write forward propagation.

241
00:12:07,000 --> 00:12:09,000
Forward propagation with time.

242
00:12:15,000 --> 00:12:18,000
So now let's go ahead and discuss about the forward propagation okay.

243
00:12:19,000 --> 00:12:22,000
Let's say that I have my input over here,

244
00:12:22,000 --> 00:12:23,000
that is nothing

245
00:12:23,000 --> 00:12:27,000
but my data set; let's say my data set is present over here.

246
00:12:27,000 --> 00:12:29,000
The data set is nothing

247
00:12:29,000 --> 00:12:30,000
but this, which I will go ahead and write:

248
00:12:30,000 --> 00:12:35,000
The food is good.

249
00:12:36,000 --> 00:12:36,000
Okay.

250
00:12:37,000 --> 00:12:41,000
And, as I said, my output y over here is one.

251
00:12:41,000 --> 00:12:42,000
Let's consider it.

252
00:12:42,000 --> 00:12:42,000
Okay.

253
00:12:42,000 --> 00:12:47,000
So let's say these are my X11, X12, X13 and X14.

254
00:12:47,000 --> 00:12:48,000
Okay.

255
00:12:48,000 --> 00:12:52,000
At timestamp t is equal to one I will be giving X11.

256
00:12:52,000 --> 00:12:55,000
This X11 will be in the form of a vector, right?

257
00:12:55,000 --> 00:12:58,000
Let's say that I have converted it using one-hot encoding,

258
00:12:58,000 --> 00:13:00,000
and I got five values over here.

259
00:13:00,000 --> 00:13:02,000
And I'm sending this as X11, okay.

260
00:13:03,000 --> 00:13:07,000
So, first of all, we need to convert each word into a vector, okay.

261
00:13:07,000 --> 00:13:09,000
Now this passes to the hidden layer.

262
00:13:09,000 --> 00:13:13,000
This is my hidden layer; here some weights will be initialized, okay.

263
00:13:13,000 --> 00:13:16,000
So here my first weights will be initialized.

264
00:13:16,000 --> 00:13:21,000
Then this input will be multiplied by the weights, okay.

265
00:13:21,000 --> 00:13:23,000
Here a bias will be added.

266
00:13:23,000 --> 00:13:26,000
So at every node there will be a bias: here

267
00:13:26,000 --> 00:13:30,000
there will be a bias, here a bias, and here a bias.

268
00:13:30,000 --> 00:13:31,000
Okay.

269
00:13:31,000 --> 00:13:32,000
Based on the number of hidden neurons.

270
00:13:33,000 --> 00:13:41,000
So once we use this X11 along with these weights, I will get my output one

271
00:13:41,000 --> 00:13:41,000
over here.

272
00:13:41,000 --> 00:13:43,000
O1, okay.

273
00:13:43,000 --> 00:13:50,000
Now, when this O1 is transferred back to its hidden nodes,

274
00:13:50,000 --> 00:13:51,000
right,

275
00:13:51,000 --> 00:13:54,000
then weights will be assigned here as well.

276
00:13:54,000 --> 00:13:58,000
So let's call these weights w dash, W', okay.

277
00:13:58,000 --> 00:14:00,000
Then again at timestamp is equal to two.

278
00:14:00,000 --> 00:14:02,000
So first of all we will discuss this particular operation:

279
00:14:02,000 --> 00:14:04,000
What exactly happens.

280
00:14:04,000 --> 00:14:11,000
So if I go ahead and calculate this with respect to forward propagation,

281
00:14:11,000 --> 00:14:12,000
please focus on this, okay.

282
00:14:13,000 --> 00:14:14,000
Forward.

283
00:14:16,000 --> 00:14:17,000
Propagation.

284
00:14:19,000 --> 00:14:25,000
So with respect to forward propagation here, you'll be able to see that my O1 will be nothing but

285
00:14:25,000 --> 00:14:28,000
a function f of something.

286
00:14:28,000 --> 00:14:33,000
Here f basically means an activation function, because in every hidden neuron we also apply an

287
00:14:33,000 --> 00:14:35,000
activation function, okay.

288
00:14:35,000 --> 00:14:43,000
Inside this function, we multiply X11 by W.

289
00:14:43,000 --> 00:14:44,000
Okay this w.

290
00:14:44,000 --> 00:14:49,000
And then we will go ahead and add a bias okay.

291
00:14:49,000 --> 00:14:50,000
It can be b1.

292
00:14:50,000 --> 00:14:51,000
It can be anything.

293
00:14:51,000 --> 00:14:54,000
I'll just write it as b1, okay.

294
00:14:54,000 --> 00:14:58,000
This f is nothing but an activation function.

295
00:14:58,000 --> 00:14:59,000
So f over here is nothing

296
00:14:59,000 --> 00:15:01,000
but an activation function.

297
00:15:02,000 --> 00:15:06,000
So this is with respect to timestamp t is equal to one, okay.

298
00:15:06,000 --> 00:15:08,000
Now, t is equal to two.

299
00:15:08,000 --> 00:15:10,000
What will happen at t is equal to two?

300
00:15:10,000 --> 00:15:13,000
I'll be sending my next word that is X12.

301
00:15:13,000 --> 00:15:17,000
And here I'll be having my weights W,

302
00:15:18,000 --> 00:15:22,000
and here we will go ahead and compute our O2.

303
00:15:22,000 --> 00:15:29,000
Now, while we are computing O2, this O1 will also get multiplied by the weights W', because

304
00:15:29,000 --> 00:15:31,000
here we have a different set of weights.

305
00:15:31,000 --> 00:15:34,000
So if I want to go ahead and calculate my O2,

306
00:15:35,000 --> 00:15:37,000
again we'll be using the function f.

307
00:15:38,000 --> 00:15:45,000
And here I'll write X12 multiplied by W, okay, X12 multiplied by W.

308
00:15:45,000 --> 00:15:46,000
Or let me write it in white color.

309
00:15:46,000 --> 00:15:48,000
So here I'll be using a function.

310
00:15:49,000 --> 00:15:50,000
Okay.

311
00:15:50,000 --> 00:15:58,000
So, first things first, over here you'll be able to see X12, which is my vector, multiplied by W.

312
00:15:58,000 --> 00:16:04,000
Plus, I also have to use O1 multiplied by W', right.

313
00:16:04,000 --> 00:16:07,000
So O1 multiplied by W'.

314
00:16:07,000 --> 00:16:11,000
Now, along with this, I also need to add the bias.

315
00:16:12,000 --> 00:16:15,000
So here I will go ahead and add the bias,

316
00:16:15,000 --> 00:16:16,000
that is, b1, right.

317
00:16:16,000 --> 00:16:20,000
And then we go ahead and apply the activation function, right.
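
The first two time steps can be sketched in NumPy. The weights are hypothetical random values, the bias b1 is assumed to be zero, and tanh stands in for f; the point is that O2 reuses the same W and b1 and adds the fed-back O1 through W'.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))        # hypothetical input weights
W_prime = rng.normal(size=(3, 3))  # hypothetical feedback weights
b1 = np.zeros(3)                   # hidden bias, shared by every time step


def f(z):
    return np.tanh(z)  # assumed activation function


x11 = np.array([1.0, 0, 0, 0, 0])  # "the"
x12 = np.array([0, 1.0, 0, 0, 0])  # "food"

O1 = f(x11 @ W + b1)                 # t = 1: no previous output yet
O2 = f(x12 @ W + O1 @ W_prime + b1)  # t = 2: adds the fed-back O1
print(O1.shape, O2.shape)  # (3,) (3,)
```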

318
00:16:20,000 --> 00:16:22,000
This is very simple.

319
00:16:22,000 --> 00:16:27,000
Then, if I want to compute my O3, O3 will be over here.

320
00:16:27,000 --> 00:16:28,000
Right.

321
00:16:28,000 --> 00:16:29,000
This will be my O3.

322
00:16:29,000 --> 00:16:30,000
Right.

323
00:16:30,000 --> 00:16:34,000
Now, to calculate O3, again I have W over here, and this will be my W'.

324
00:16:34,000 --> 00:16:37,000
This will also be my W', right.

325
00:16:37,000 --> 00:16:40,000
These weights will not get updated during forward propagation.

326
00:16:40,000 --> 00:16:44,000
All these weights get updated with respect to time during backward propagation.

327
00:16:44,000 --> 00:16:48,000
I'm just showing you what operations happen in forward propagation.

328
00:16:48,000 --> 00:16:49,000
Right.

329
00:16:49,000 --> 00:16:54,000
So here, let me show you this with respect to O3.

330
00:16:54,000 --> 00:16:57,000
Now O3 is nothing but a function f of some terms.

331
00:16:57,000 --> 00:16:59,000
So, to compute O3,

332
00:16:59,000 --> 00:17:01,000
which is what I want to calculate here,

333
00:17:01,000 --> 00:17:04,000
I have to use X13 multiplied by W.

334
00:17:04,000 --> 00:17:09,000
Let me quickly go ahead and write X13 multiplied by W, plus,

335
00:17:09,000 --> 00:17:17,000
coming from the left-hand side, O2 multiplied by W', plus the same bias b1.

336
00:17:18,000 --> 00:17:18,000
Right.

337
00:17:18,000 --> 00:17:20,000
And that is how you get O3.
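
Written out, every step follows one pattern, with W, W' and b1 shared across time steps. The names W_out and b_out are labels introduced here for the 3 × 1 output weights and the single output bias, and σ is the sigmoid used for the binary output; taking O_0 = 0 makes the general line reproduce O_1.

```latex
\begin{aligned}
O_1 &= f\left(X_{11} W + b_1\right) \\
O_2 &= f\left(X_{12} W + O_1 W' + b_1\right) \\
O_3 &= f\left(X_{13} W + O_2 W' + b_1\right) \\
O_4 &= f\left(X_{14} W + O_3 W' + b_1\right) \\
O_t &= f\left(X_{1t} W + O_{t-1} W' + b_1\right), \qquad O_0 = 0 \\
\hat{y} &= \sigma\left(O_4 W_{\text{out}} + b_{\text{out}}\right)
\end{aligned}
```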

338
00:17:21,000 --> 00:17:21,000
Right?

339
00:17:21,000 --> 00:17:27,000
And finally, we get O4 by doing the same operation again.

340
00:17:27,000 --> 00:17:30,000
Since this is a classification problem, right,

341
00:17:31,000 --> 00:17:33,000
a binary classification problem,

342
00:17:33,000 --> 00:17:36,000
then here we are going to use an output layer.

343
00:17:36,000 --> 00:17:42,000
So let's say that here I calculate my O4; this will be forwarded to the output.

344
00:17:43,000 --> 00:17:45,000
We will use this output.

345
00:17:46,000 --> 00:17:52,000
If it is a binary classification problem, we will go ahead and apply

346
00:17:52,000 --> 00:17:55,000
an activation function at the output,

347
00:17:55,000 --> 00:17:55,000
and it is not softmax:

348
00:17:56,000 --> 00:17:59,000
we will go ahead and apply sigmoid, okay.

349
00:17:59,000 --> 00:18:04,000
So we use the sigmoid activation function for binary classification over here.

350
00:18:04,000 --> 00:18:09,000
This gives a value between zero and one, which we can threshold to get the binary output, zero or one.
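
A sketch of the binary output layer: O4 (three values) times 3 × 1 output weights plus one bias, passed through sigmoid and thresholded at 0.5. The names `W_out`, `b_out` and all numeric values are hypothetical, and 0.5 is a common default threshold rather than something fixed by the video.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


O4 = np.array([0.2, -0.5, 0.9])           # hypothetical final hidden output
W_out = np.array([[0.7], [-0.3], [0.4]])  # hypothetical 3 x 1 output weights
b_out = np.array([0.1])                   # the single output bias

y_hat = sigmoid(O4 @ W_out + b_out)  # probability of the positive class
label = (y_hat >= 0.5).astype(int)   # threshold to get a 0/1 prediction
print(label)  # [1]
```

Here the pre-activation is 0.14 + 0.15 + 0.36 + 0.1 = 0.75, so y_hat is about 0.68 and the thresholded label is 1.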

351
00:18:09,000 --> 00:18:19,000
If I have a multi-class problem, then instead of sigmoid I may go ahead and use the softmax activation

352
00:18:19,000 --> 00:18:19,000
function.
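
For the multi-class case, a minimal softmax sketch, assuming a hypothetical three-class output layer (3 × 3 weights, three biases) on top of the same O4. Subtracting the maximum before exponentiating is a standard numerical-stability step and does not change the result.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()


rng = np.random.default_rng(0)
O4 = np.array([0.2, -0.5, 0.9])   # hypothetical final hidden output
W_out = rng.normal(size=(3, 3))   # hypothetical weights for 3 classes
b_out = np.zeros(3)               # one bias per class

probs = softmax(O4 @ W_out + b_out)
predicted_class = int(np.argmax(probs))
print(round(float(probs.sum()), 6))  # 1.0
```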

353
00:18:19,000 --> 00:18:21,000
And finally I get my y hat here.

354
00:18:21,000 --> 00:18:24,000
I may get my y hat right.

355
00:18:24,000 --> 00:18:26,000
Once I get my y hat, then what will happen?

356
00:18:26,000 --> 00:18:33,000
Using y and y hat, we will go ahead and compute the loss, and we will try to reduce this

357
00:18:33,000 --> 00:18:36,000
loss by doing backpropagation.
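
The video does not name the loss; binary cross-entropy is assumed here because it is the usual pairing with a sigmoid output. A minimal sketch, with a hypothetical prediction of 0.8 for the label 1:

```python
import numpy as np


def binary_cross_entropy(y, y_hat, eps=1e-12):
    y_hat = np.clip(y_hat, eps, 1 - eps)  # avoid log(0)
    return float(-(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))


y = 1        # label for "The food is good"
y_hat = 0.8  # hypothetical sigmoid output from the forward pass
print(round(binary_cross_entropy(y, y_hat), 4))  # 0.2231
```

The loss shrinks as y_hat moves toward the true label, which is what backpropagation will try to achieve by adjusting W, W', the output weights and the biases.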

358
00:18:36,000 --> 00:18:40,000
I'll be discussing how backpropagation happens in my next video.

359
00:18:40,000 --> 00:18:42,000
But forward propagation, you saw here.

360
00:18:42,000 --> 00:18:47,000
This is how the entire operation will happen with respect to each and every word

361
00:18:47,000 --> 00:18:49,000
in a specific sentence.

362
00:18:50,000 --> 00:18:53,000
And that is how the forward propagation with time actually happens.

363
00:18:53,000 --> 00:18:59,000
And this function f is an activation function that we apply for every word.

364
00:18:59,000 --> 00:19:03,000
We go ahead and do these multiplications at every step, till the end of the sentence.

365
00:19:03,000 --> 00:19:03,000
Okay.

366
00:19:04,000 --> 00:19:09,000
So I hope you got an idea about forward propagation and how it actually happens.

367
00:19:09,000 --> 00:19:14,000
You also got an idea of how many weights and how many

368
00:19:14,000 --> 00:19:15,000
biases there will be.

369
00:19:16,000 --> 00:19:21,000
Initially these inputs will be one cross five, then five cross three weights get

370
00:19:21,000 --> 00:19:21,000
applied.

371
00:19:21,000 --> 00:19:24,000
When we do dot operation I'm going to get one cross three.

372
00:19:24,000 --> 00:19:27,000
That one cross three output is fed back to every node, and there are three nodes,

373
00:19:27,000 --> 00:19:30,000
so it will be three cross three, nine weights in total.

374
00:19:30,000 --> 00:19:35,000
Since each output is passed to each and every hidden node,

375
00:19:35,000 --> 00:19:38,000
that is the reason each node uses three weights,

376
00:19:38,000 --> 00:19:41,000
and altogether it becomes

377
00:19:41,000 --> 00:19:41,000
three cross three.

378
00:19:41,000 --> 00:19:42,000
Right.

379
00:19:42,000 --> 00:19:46,000
And then finally, here you can see another three cross one.

380
00:19:46,000 --> 00:19:51,000
For this three cross one, I will be getting three weights.

381
00:19:51,000 --> 00:19:53,000
And here will be one bias.

382
00:19:53,000 --> 00:19:56,000
And this is how the forward propagation basically happens.

383
00:19:56,000 --> 00:19:59,000
I took so much time here

384
00:19:59,000 --> 00:20:05,000
because if you understand this architecture, trust me, it will be very easy to understand

385
00:20:05,000 --> 00:20:08,000
all the upcoming architectures, like LSTM or GRU.

386
00:20:08,000 --> 00:20:08,000
All right.

387
00:20:08,000 --> 00:20:11,000
But here, I hope you got an idea.

388
00:20:11,000 --> 00:20:14,000
I broke it down completely to help you understand RNNs, right?

389
00:20:14,000 --> 00:20:17,000
So yes, this was it from my side.

390
00:20:17,000 --> 00:20:20,000
I hope you liked this particular video.

391
00:20:20,000 --> 00:20:24,000
I will see you all in the next video, where we'll be discussing backward propagation.

392
00:20:24,000 --> 00:20:24,000
Thank you.

