1
00:00:00,000 --> 00:00:01,000
Hello guys.

2
00:00:01,000 --> 00:00:05,000
So we are going to continue our discussion with respect to our end to end deep learning project using

3
00:00:05,000 --> 00:00:07,000
simple RNN.

4
00:00:07,000 --> 00:00:13,000
So in my previous video, I have already shown you how to use the embedding layer.

5
00:00:14,000 --> 00:00:19,000
Now let us go ahead step by step and see like what kind of deep learning project we are implementing

6
00:00:19,000 --> 00:00:21,000
with the help of simple RNN.

7
00:00:21,000 --> 00:00:27,000
And as I have already told you, I'm going to use the IMDb data set and we are just going to do the

8
00:00:27,000 --> 00:00:29,000
text classification using simple RNN.

9
00:00:30,000 --> 00:00:31,000
Um, so let's go ahead.

10
00:00:31,000 --> 00:00:35,000
And first of all, I will quickly use the same environment.

11
00:00:35,000 --> 00:00:36,000
You will be able to see that over here.

12
00:00:36,000 --> 00:00:40,000
I have my simple RNN folder along with the ANN classification folder.

13
00:00:40,000 --> 00:00:42,000
Whatever requirements dot txt was there,

14
00:00:42,000 --> 00:00:44,000
whatever virtual environment was there,

15
00:00:44,000 --> 00:00:45,000
We are going to use that same thing.

16
00:00:45,000 --> 00:00:48,000
Okay, so here is my ANN classification folder.

17
00:00:48,000 --> 00:00:49,000
Here is my simple RNN.

18
00:00:49,000 --> 00:00:55,000
And similarly I will keep on creating all the other projects as we go ahead, like LSTM RNN or, uh,

19
00:00:55,000 --> 00:01:00,000
GRU or any other further, uh, variants of RNN.

20
00:01:00,000 --> 00:01:02,000
We are going to see everything, and we will be creating it folder wise.

21
00:01:02,000 --> 00:01:02,000
Okay.

22
00:01:03,000 --> 00:01:07,000
So first of all I will go ahead and select my kernel okay.

23
00:01:07,000 --> 00:01:11,000
Here I'm going to import numpy as np.

24
00:01:11,000 --> 00:01:11,000
Okay.

25
00:01:12,000 --> 00:01:16,000
Uh, np. Let's execute this and see that it's working fine.

26
00:01:16,000 --> 00:01:23,000
Then along with this we are also going to import tensorflow as tf here.

27
00:01:23,000 --> 00:01:28,000
Also we are going to use uh TensorFlow and Keras.

28
00:01:28,000 --> 00:01:34,000
Then uh, as you all know this IMDb data set is available inside the TensorFlow.

29
00:01:35,000 --> 00:01:40,000
Uh, for that we can actually load that particular data set directly from there.

30
00:01:40,000 --> 00:01:42,000
So let's go ahead and import that also.

31
00:01:42,000 --> 00:01:52,000
So if I go ahead and write from tensorflow dot keras dot datasets, I'm going to import imdb, okay.

32
00:01:53,000 --> 00:02:02,000
Then I'm going to basically write from tensorflow dot keras dot preprocessing, as I said that, uh, I can

33
00:02:02,000 --> 00:02:05,000
also use this pre-processing okay.

34
00:02:05,000 --> 00:02:07,000
Pre-processing.

35
00:02:10,000 --> 00:02:11,000
Pre-processing.

36
00:02:11,000 --> 00:02:16,000
Import sequence okay.

37
00:02:16,000 --> 00:02:19,000
So I'm going to import the sequence I'll talk about it.

38
00:02:19,000 --> 00:02:21,000
Why we will be using the sequence and all.

39
00:02:21,000 --> 00:02:25,000
But you can just imagine that this is a kind of a pre-processing technique that we really need to use.

40
00:02:26,000 --> 00:02:36,000
Then from keras, sorry, from tensorflow dot keras dot models, I'm going to specifically import Sequential.

41
00:02:36,000 --> 00:02:40,000
Sequential is the model container we need for the neural networks that we are going to build.

42
00:02:40,000 --> 00:02:48,000
Then from tensorflow dot keras dot layers I'm going to import Embedding.

43
00:02:49,000 --> 00:02:52,000
Along with Embedding I'm going to use SimpleRNN.

44
00:02:52,000 --> 00:02:53,000
Okay.

45
00:02:53,000 --> 00:02:57,000
Uh, and then we'll also use another layer, which is called Dense.

46
00:02:57,000 --> 00:03:03,000
Now, usually with an ANN, we use this Dense layer, right, where we create the hidden

47
00:03:03,000 --> 00:03:08,000
layers, where we create the hidden nodes. With the help of SimpleRNN we will be creating the, uh, RNN

48
00:03:08,000 --> 00:03:08,000
nodes.

49
00:03:08,000 --> 00:03:09,000
Right.

50
00:03:09,000 --> 00:03:11,000
And this embedding is basically for your embedding layer.

51
00:03:11,000 --> 00:03:13,000
So let's go ahead and import this.

52
00:03:14,000 --> 00:03:16,000
So this has got successfully imported.
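As a rough sketch, the import cell dictated above could look like this. It assumes TensorFlow 2.x with its bundled Keras is installed in the same virtual environment; nothing beyond the names mentioned in the video is used.
```python
# Sketch of the imports described in the video (assumes TensorFlow 2.x is installed).
import numpy as np
import tensorflow as tf
from tensorflow.keras.datasets import imdb            # IMDb review data set
from tensorflow.keras.preprocessing import sequence   # provides pad_sequences
from tensorflow.keras.models import Sequential        # linear stack of layers
from tensorflow.keras.layers import Embedding, SimpleRNN, Dense
```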

53
00:03:16,000 --> 00:03:22,000
Uh now I'm just going to go ahead and import my IMDb, which I've actually done it over here.

54
00:03:22,000 --> 00:03:26,000
Now let's load the IMDb data set.

55
00:03:26,000 --> 00:03:30,000
So here you'll be able to see I will be loading the IMDb data set.

56
00:03:31,000 --> 00:03:35,000
Um, so first of all, I will just go ahead and use my max features.

57
00:03:35,000 --> 00:03:39,000
This Max feature is just like your vocabulary size, right.

58
00:03:39,000 --> 00:03:40,000
So here I'll go ahead.

59
00:03:40,000 --> 00:03:43,000
And first of all write my vocabulary.

60
00:03:43,000 --> 00:03:47,000
So I'm basically initializing my vocabulary size.

61
00:03:47,000 --> 00:03:47,000
Perfect.

62
00:03:48,000 --> 00:03:57,000
Then, uh, I will just go ahead and create my x underscore train comma y underscore train values okay.

63
00:03:57,000 --> 00:04:00,000
I will talk about this particular data set in depth okay.

64
00:04:00,000 --> 00:04:01,000
What kind of data set is given.

65
00:04:01,000 --> 00:04:07,000
Everything I'll be talking about. Then along with this I will go ahead and write x underscore test comma

66
00:04:07,000 --> 00:04:08,000
y underscore test.

67
00:04:09,000 --> 00:04:15,000
And then we go ahead and write imdb dot load underscore data, okay.

68
00:04:15,000 --> 00:04:20,000
And I'll pass, uh, the number of words, which is my vocabulary size.

69
00:04:20,000 --> 00:04:20,000
Right.

70
00:04:20,000 --> 00:04:21,000
So that is nothing.

71
00:04:21,000 --> 00:04:23,000
My max features.

72
00:04:23,000 --> 00:04:29,000
So once I give this I will be able to print the shape of my training and test data.

73
00:04:29,000 --> 00:04:30,000
Okay.

74
00:04:30,000 --> 00:04:32,000
So here I will just go ahead and execute it.

75
00:04:33,000 --> 00:04:40,000
So I'll say, hey, uh, go ahead and print the X train and X test shape, just to make you understand how

76
00:04:40,000 --> 00:04:43,000
many records we are actually working with, right?

77
00:04:43,000 --> 00:04:48,000
So here you'll be able to see that, uh, it will be downloading this entire data set.

78
00:04:48,000 --> 00:04:51,000
And the training data shape is 25,000 records.

79
00:04:51,000 --> 00:04:52,000
Okay.

80
00:04:52,000 --> 00:04:57,000
So this will get downloaded to your local machine and it will stay there, uh, in the cache itself.

81
00:04:57,000 --> 00:05:01,000
Then here you have your testing data, which is again 25,000 records.
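A minimal sketch of the loading step just described, assuming TensorFlow 2.x is installed. The constant MAX_FEATURES and the helper load_imdb are names chosen here for illustration; the value 10000 is the vocabulary size mentioned later in the video.
```python
from tensorflow.keras.datasets import imdb
MAX_FEATURES = 10000  # vocabulary size: keep only the most frequent words
def load_imdb(num_words=MAX_FEATURES):
    """Download the IMDb reviews (or read them from the local cache)."""
    (x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=num_words)
    return (x_train, y_train), (x_test, y_test)
# Usage (the first call triggers the download):
# (x_train, y_train), (x_test, y_test) = load_imdb()
# print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```
Each element of x_train is one review stored as a list of integers, and each element of y_train is its label.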

82
00:05:01,000 --> 00:05:01,000
Okay.

83
00:05:02,000 --> 00:05:05,000
Similarly, for these sentences you have the labels, right.

84
00:05:05,000 --> 00:05:06,000
It can be positive or negative.

85
00:05:07,000 --> 00:05:08,000
Now let's do one thing.

86
00:05:08,000 --> 00:05:12,000
Let's go ahead and explore this kind of sample reviews.

87
00:05:12,000 --> 00:05:15,000
What is there inside it and what kind of labels is specifically present okay.

88
00:05:15,000 --> 00:05:24,000
So we will just go ahead and inspect our sample review and its label.

89
00:05:24,000 --> 00:05:24,000
Okay.

90
00:05:24,000 --> 00:05:33,000
So here I'm going to write sample underscore review is equal to x underscore train of zero, okay.

91
00:05:33,000 --> 00:05:40,000
So if I'm taking x underscore train of zero let's see first of all what is there in my X train and what

92
00:05:40,000 --> 00:05:41,000
is there in my X test.

93
00:05:41,000 --> 00:05:42,000
Okay.

94
00:05:42,000 --> 00:05:45,000
So in X train you'll be able to see x train of zero.

95
00:05:45,000 --> 00:05:48,000
In my first sentence I have this kind of words.

96
00:05:48,000 --> 00:05:49,000
And you remember this kind of words.

97
00:05:49,000 --> 00:05:50,000
When do we get it?

98
00:05:50,000 --> 00:05:53,000
I have already shown you with respect to the embedding layer, right?

99
00:05:53,000 --> 00:05:58,000
So if you go ahead and open this here also, you could see in every sentence I was getting this kind

100
00:05:58,000 --> 00:05:59,000
of words right.

101
00:05:59,000 --> 00:06:01,000
So in short this is nothing.

102
00:06:01,000 --> 00:06:03,000
But it is the index of every word, the one we would use for its one hot representation.

103
00:06:03,000 --> 00:06:03,000
Right?

104
00:06:03,000 --> 00:06:07,000
When I say one hot representation here you are able to see the index, right.

105
00:06:08,000 --> 00:06:11,000
Uh, so this is one sentence, okay.

106
00:06:11,000 --> 00:06:13,000
And as you know, my vocabulary size is 10,000.

107
00:06:13,000 --> 00:06:15,000
So based on that you will be seeing different indexes.

108
00:06:15,000 --> 00:06:18,000
And each index represents one word okay.

109
00:06:19,000 --> 00:06:22,000
So this is what you can actually get it okay.

110
00:06:22,000 --> 00:06:24,000
So I'm going to take this as a sample review okay.

111
00:06:24,000 --> 00:06:30,000
And let's see y underscore train of zero.

112
00:06:30,000 --> 00:06:32,000
This will basically be 0 or 1 okay.

113
00:06:32,000 --> 00:06:35,000
So here you can see it is one okay.

114
00:06:35,000 --> 00:06:36,000
It can be zero or it can be one.

115
00:06:36,000 --> 00:06:39,000
One basically means it says that it is a positive sentiment.

116
00:06:39,000 --> 00:06:41,000
Zero basically means it says a negative sentiment.

117
00:06:41,000 --> 00:06:41,000
Okay.

118
00:06:41,000 --> 00:06:46,000
So I will also go ahead and take my sample underscore label okay.

119
00:06:46,000 --> 00:06:49,000
And then I will go ahead and write y underscore train of zero.

120
00:06:50,000 --> 00:06:53,000
Now you may be thinking, Krish, how did we get these vectors directly, right?

121
00:06:53,000 --> 00:06:56,000
How do we get these indexes directly?

122
00:06:56,000 --> 00:07:00,000
Uh, this data set is set up in that way, but what I will do is decode this, and I'll try

123
00:07:00,000 --> 00:07:04,000
to show you what exactly is the real sentence also.

124
00:07:04,000 --> 00:07:04,000
Okay.

125
00:07:04,000 --> 00:07:07,000
Now let's go ahead and print this.

126
00:07:07,000 --> 00:07:09,000
So I will go ahead and print.

127
00:07:09,000 --> 00:07:10,000
Okay.

128
00:07:10,000 --> 00:07:18,000
I'll say hey, uh, this is my sample review, uh, as integers.

129
00:07:18,000 --> 00:07:19,000
Okay.

130
00:07:20,000 --> 00:07:26,000
And here I'm going to specifically use this as my sample review.

131
00:07:26,000 --> 00:07:27,000
Okay.

132
00:07:27,000 --> 00:07:28,000
I'm just printing it.

133
00:07:28,000 --> 00:07:32,000
Similarly, I will go ahead and print my sample label okay.

134
00:07:32,000 --> 00:07:36,000
So once I print both of them you can see this is my sample review as integers.

135
00:07:36,000 --> 00:07:38,000
This is my entire sentence.

136
00:07:38,000 --> 00:07:41,000
And it is like very very big right.

137
00:07:41,000 --> 00:07:44,000
So the, uh, sentence length is quite big, I guess.

138
00:07:44,000 --> 00:07:45,000
Okay.

139
00:07:45,000 --> 00:07:46,000
And this sample label is one.

140
00:07:47,000 --> 00:07:48,000
Now let's do one thing okay.

141
00:07:48,000 --> 00:07:57,000
Let's do the mapping of word index back to words.

142
00:07:57,000 --> 00:07:58,000
We need to do this okay.

143
00:07:58,000 --> 00:08:03,000
We will try to convert this just for our understanding okay.

144
00:08:03,000 --> 00:08:04,000
How do I do that.

145
00:08:04,000 --> 00:08:06,000
Okay that we will try to see.

146
00:08:06,000 --> 00:08:13,000
So here I'm going to basically use word underscore index is equal to I'll say hey go and use this IMDb

147
00:08:13,000 --> 00:08:16,000
and I'll say, hey, get all the word index.

148
00:08:16,000 --> 00:08:18,000
How many word index are there?

149
00:08:18,000 --> 00:08:19,000
You just get it.

150
00:08:19,000 --> 00:08:19,000
Okay.

151
00:08:19,000 --> 00:08:22,000
So once I execute this I will be able to see my word index.

152
00:08:22,000 --> 00:08:25,000
So this will basically be my word index for this particular word you'll be seeing.

153
00:08:25,000 --> 00:08:27,000
This is the index for this word.

154
00:08:27,000 --> 00:08:29,000
It is this index for nunnery.

155
00:08:29,000 --> 00:08:31,000
This is this index on Giovani.

156
00:08:31,000 --> 00:08:34,000
All these words I'm actually seeing the indexes okay.

157
00:08:34,000 --> 00:08:39,000
Now, as you know, this word index is a, uh, dictionary, right?

158
00:08:39,000 --> 00:08:41,000
It is obviously a dictionary.

159
00:08:41,000 --> 00:08:45,000
What I will do, I will try to reverse.

160
00:08:45,000 --> 00:08:47,000
See, I can also reverse this word index.

161
00:08:47,000 --> 00:08:47,000
Okay.

162
00:08:47,000 --> 00:08:49,000
I can basically write first of all index information.

163
00:08:49,000 --> 00:08:52,000
And then I can go ahead and write this particular value okay.

164
00:08:52,000 --> 00:08:55,000
So for this I will just go ahead and write this code.

165
00:08:55,000 --> 00:08:56,000
Please focus on this.

166
00:08:56,000 --> 00:08:59,000
And this is where dictionary comprehension will also come into picture.

167
00:08:59,000 --> 00:09:04,000
So here what I'm saying is: for key comma value in word underscore index dot items, okay.

168
00:09:04,000 --> 00:09:09,000
And then I'm saying, hey, write it like this: value colon key, right.

169
00:09:09,000 --> 00:09:13,000
So this is basically just getting reversed, okay.

170
00:09:13,000 --> 00:09:19,000
So now in order to see the reverse word index I can also go ahead and print it if you want.

171
00:09:19,000 --> 00:09:20,000
So I will go ahead and print it.

172
00:09:20,000 --> 00:09:23,000
And this also we will be able to see.

173
00:09:23,000 --> 00:09:28,000
See, uh, let's see, where did it go?

174
00:09:28,000 --> 00:09:31,000
Okay, uh, I will just try to print in this way.

175
00:09:31,000 --> 00:09:32,000
Just a second.

176
00:09:32,000 --> 00:09:34,000
Instead of using print, I will just use this.

177
00:09:34,000 --> 00:09:36,000
Let me comment this out.

178
00:09:37,000 --> 00:09:40,000
So this is how it is reversed okay.
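The reversal can be sketched on a tiny dictionary. The word_index below is a toy stand-in for what imdb dot get underscore word underscore index returns, and its entries are illustrative only.
```python
# Toy stand-in for imdb.get_word_index(); entries are illustrative only.
word_index = {"this": 11, "film": 19, "brilliant": 527}
# Dictionary comprehension that swaps each word -> index pair into index -> word.
reverse_word_index = {value: key for key, value in word_index.items()}
print(reverse_word_index)
```
After the swap, an integer from a review can be looked up directly to recover its word.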

179
00:09:40,000 --> 00:09:48,000
Now what we are going to do is that, from these particular, uh, keys, you

180
00:09:48,000 --> 00:09:50,000
know, from these particular keys,

181
00:09:50,000 --> 00:09:53,000
I'm trying to get all my reviews.

182
00:09:53,000 --> 00:09:55,000
Like, let's say I'll consider this particular sample review.

183
00:09:55,000 --> 00:09:57,000
I will get all my words.

184
00:09:57,000 --> 00:09:58,000
So one is over here.

185
00:09:58,000 --> 00:10:02,000
So for one, what will be the word we'll try to see then I have 14 over here.

186
00:10:02,000 --> 00:10:04,000
Then for 14 what is the word I'll try to see.

187
00:10:04,000 --> 00:10:05,000
Okay.

188
00:10:05,000 --> 00:10:11,000
So uh, when I check the documentation with respect to the TensorFlow this was the code that is used.

189
00:10:11,000 --> 00:10:12,000
Okay.

190
00:10:12,000 --> 00:10:14,000
So here I'm saying reverse underscore word underscore

191
00:10:14,000 --> 00:10:16,000
index, okay.

192
00:10:16,000 --> 00:10:16,000
Over here.

193
00:10:17,000 --> 00:10:23,000
Uh, it is saying dot get of i minus three comma question mark, for i in sample underscore review.

194
00:10:23,000 --> 00:10:24,000
See what it is doing.

195
00:10:24,000 --> 00:10:27,000
First of all I'm going through each and every words in that sample review.

196
00:10:27,000 --> 00:10:30,000
Then it is taking this particular I right.

197
00:10:30,000 --> 00:10:33,000
So here you can see for I in sample underscore review.

198
00:10:33,000 --> 00:10:35,000
So first index I got whatever index.

199
00:10:35,000 --> 00:10:39,000
Then over here I'm writing I minus three okay I minus three.

200
00:10:39,000 --> 00:10:42,000
We are trying to get that specific index value.

201
00:10:42,000 --> 00:10:45,000
If we are not getting that index value, it will be just replaced by question mark.

202
00:10:45,000 --> 00:10:45,000
Okay.

203
00:10:45,000 --> 00:10:50,000
Now if I go ahead and see this decoded review you'll be able to see this.

204
00:10:50,000 --> 00:10:51,000
See initially I got question mark.

205
00:10:51,000 --> 00:10:54,000
It was one, and one minus three is minus two.

206
00:10:54,000 --> 00:10:57,000
So when we are trying to search for minus two index I'm not getting any value.

207
00:10:57,000 --> 00:10:59,000
So it is replaced by question mark.
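A small sketch of this decoding step, on toy data. The reverse index below is an illustrative stand-in, not the full IMDb index. The subtraction of three is there because, by default, Keras reserves 0 for padding, 1 for the start marker and 2 for unknown words, so every real word index is shifted up by three.
```python
# Toy reverse index (index -> word); an illustrative stand-in for the real one.
reverse_word_index = {11: "this", 19: "film", 13: "was", 40: "just", 527: "brilliant"}
# Start of an encoded review: 1 is the start marker, the rest are word indices shifted by 3.
sample_review = [1, 14, 22, 16, 43, 530]
# Undo the offset of 3; anything with no matching word (like the start marker) becomes "?".
decoded_review = " ".join(reverse_word_index.get(i - 3, "?") for i in sample_review)
print(decoded_review)  # ? this film was just brilliant
```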

208
00:10:59,000 --> 00:11:01,000
Then it shows this film was just brilliant.

209
00:11:01,000 --> 00:11:03,000
So and so so all the information is over here.

210
00:11:03,000 --> 00:11:04,000
This film was just brilliant.

211
00:11:04,000 --> 00:11:07,000
Casting, locations, scenery, story direction.

212
00:11:07,000 --> 00:11:12,000
Everyone's really suited the part they played, and you could just imagine being there, Robert, and all this

213
00:11:12,000 --> 00:11:13,000
information is specifically there.

214
00:11:13,000 --> 00:11:17,000
So this is nothing, but this is my decoded review.

215
00:11:17,000 --> 00:11:17,000
Right.

216
00:11:17,000 --> 00:11:23,000
So initially this was my entire sample review and we had completely decoded it over here.

217
00:11:23,000 --> 00:11:26,000
And this is just one sentence along with the text.

218
00:11:26,000 --> 00:11:26,000
Okay.

219
00:11:26,000 --> 00:11:32,000
So this is just to give you an idea of where exactly this sentence is coming from, but for our sake, uh,

220
00:11:32,000 --> 00:11:33,000
this step is already done.

221
00:11:33,000 --> 00:11:37,000
We are able to get this particular one hot representation index, so we'll directly go ahead and use

222
00:11:37,000 --> 00:11:38,000
it.

223
00:11:38,000 --> 00:11:41,000
Just for your understanding, I've actually given over here.

224
00:11:41,000 --> 00:11:41,000
Okay.

225
00:11:42,000 --> 00:11:43,000
Perfect.

226
00:11:43,000 --> 00:11:46,000
Uh, so this was one of the basic information.

227
00:11:46,000 --> 00:11:50,000
What we did is that we loaded the data, we did all the things and all.

228
00:11:50,000 --> 00:11:54,000
Okay, now let me just go ahead and use this from text.

229
00:11:54,000 --> 00:11:56,000
Uh, from TensorFlow.

230
00:11:56,000 --> 00:12:01,000
If you remember, we had also imported something called a sequence, right in pre-processing.

231
00:12:01,000 --> 00:12:07,000
Now, why will this sequence be very much necessary? Because here we are going to do the padding of sequences.

232
00:12:07,000 --> 00:12:09,000
Padding of sequences was done in the embedding video.

233
00:12:09,000 --> 00:12:12,000
Also I showed you right how to do the padding sequence.

234
00:12:12,000 --> 00:12:14,000
We can use this pad underscore sequences right.

235
00:12:14,000 --> 00:12:16,000
We can use pre and post.

236
00:12:16,000 --> 00:12:18,000
I will also show you one example with respect to the sequences.

237
00:12:18,000 --> 00:12:21,000
Let's say the maximum length of my sentence.

238
00:12:21,000 --> 00:12:23,000
I'm just going to take it as 500 okay.

239
00:12:23,000 --> 00:12:29,000
That can be the maximum text probably that I can have in my sentences.

240
00:12:29,000 --> 00:12:37,000
Now let's go ahead and write x underscore train is equal to sequence dot pad underscore sequences.

241
00:12:37,000 --> 00:12:46,000
And here I'm going to basically write x underscore train comma max length is nothing but it is equal

242
00:12:46,000 --> 00:12:47,000
to this max length.

243
00:12:47,000 --> 00:12:51,000
So it's just like we are applying pre padding, right, by using pad underscore sequences.

244
00:12:51,000 --> 00:12:57,000
Similarly for x test I can go ahead and write something like this okay sequence dot this.

245
00:12:57,000 --> 00:13:01,000
This will basically be my x test, okay.

246
00:13:01,000 --> 00:13:04,000
And let's say this is my X test okay.

247
00:13:04,000 --> 00:13:07,000
And here also I can go ahead and write my max underscore length.

248
00:13:08,000 --> 00:13:08,000
Okay.

249
00:13:08,000 --> 00:13:11,000
Let's go ahead and display my X train now.

250
00:13:11,000 --> 00:13:14,000
So x train uh max underscore length.

251
00:13:14,000 --> 00:13:18,000
Uh max okay here my feature is max underscore length.

252
00:13:18,000 --> 00:13:19,000
Okay.

253
00:13:19,000 --> 00:13:23,000
Uh, again I'm getting an error.

254
00:13:24,000 --> 00:13:26,000
Uh max underscore length.

255
00:13:26,000 --> 00:13:27,000
Okay.

256
00:13:27,000 --> 00:13:29,000
This should be okay.

257
00:13:29,000 --> 00:13:30,000
I made one mistake.

258
00:13:30,000 --> 00:13:31,000
This parameter is maxlen.

259
00:13:31,000 --> 00:13:37,000
Okay, so once it has got executed, here you can see that by default it has done the pre padding, okay.

260
00:13:37,000 --> 00:13:40,000
With pre padding, the zeros are basically coming up before the values.

261
00:13:40,000 --> 00:13:47,000
So if I go ahead and see x train of zero x underscore train of zero, here is what I'm actually able

262
00:13:47,000 --> 00:13:47,000
to get.

263
00:13:47,000 --> 00:13:50,000
See initially all I'm getting are zeros.

264
00:13:50,000 --> 00:13:53,000
And then I am specifically getting all these values at the end.

265
00:13:53,000 --> 00:13:55,000
So pre padding technique is basically applied.
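A minimal sketch of the padding call on toy data, assuming TensorFlow 2.x is installed. The two short reviews and the small max_len of 5 are made up to keep the output readable; the video uses 500.
```python
from tensorflow.keras.preprocessing import sequence
# Two toy reviews of different lengths (made-up indices).
toy_reviews = [[1, 14, 22], [1, 14, 22, 16, 43, 530]]
max_len = 5  # the video uses 500; 5 keeps the result easy to read
# The keyword is maxlen; padding and truncating both default to "pre".
padded = sequence.pad_sequences(toy_reviews, maxlen=max_len)
print(padded)
```
The short review gets zeros added at the front, and the review that is longer than max_len loses its earliest values, because truncation also happens at the front by default.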

266
00:13:55,000 --> 00:13:57,000
So this step is also done okay.

267
00:13:57,000 --> 00:14:03,000
Now in the next step we will go ahead and design our simple RNN.

268
00:14:03,000 --> 00:14:06,000
We will train our simple RNN.

269
00:14:06,000 --> 00:14:10,000
And this is what we are basically going to see in the next video.

270
00:14:10,000 --> 00:14:15,000
Till here we have applied all the transformation techniques that are required, and we are ready to

271
00:14:15,000 --> 00:14:18,000
use these inputs for training our simple RNN.

272
00:14:18,000 --> 00:14:20,000
So yes, I will see you all in the next video.

273
00:14:20,000 --> 00:14:21,000
Thank you.