1
00:00:00,000 --> 00:00:05,000
So guys, here you can see the training has stopped at 50 epochs, because I only ran it for 50 epochs.

2
00:00:05,000 --> 00:00:08,000
And here the accuracy reaches 40%.

3
00:00:08,000 --> 00:00:12,000
It started from 0.03, right, 3%, and went up to 40%.

4
00:00:12,000 --> 00:00:17,000
So if you train for another 50 epochs, I think the accuracy will keep increasing.

5
00:00:17,000 --> 00:00:17,000
Okay.

6
00:00:17,000 --> 00:00:22,000
Now what we'll do is create a function, okay, that predicts the next

7
00:00:22,000 --> 00:00:23,000
word.

8
00:00:23,000 --> 00:00:23,000
Now see this.

9
00:00:23,000 --> 00:00:25,000
So this is simply the prediction function.

10
00:00:25,000 --> 00:00:30,000
To predict the next word, we need to give it the model, the tokenizer, the text and the max sequence length.

11
00:00:30,000 --> 00:00:30,000
Okay.

12
00:00:30,000 --> 00:00:32,000
So first of all we will use the tokenizer.

13
00:00:32,000 --> 00:00:34,000
We'll convert the text into a sequence of tokens.

14
00:00:34,000 --> 00:00:35,000
Here's how we do it.

15
00:00:35,000 --> 00:00:39,000
We'll say, hey, if the length of this token list is greater than or equal to the max sequence length,

16
00:00:39,000 --> 00:00:45,000
right, then we make sure the sequence length matches the max sequence

17
00:00:45,000 --> 00:00:46,000
length minus one.

18
00:00:46,000 --> 00:00:47,000
Right.

19
00:00:47,000 --> 00:00:49,000
So this is the operation we're doing here.

20
00:00:49,000 --> 00:00:50,000
We'll take the token list.

21
00:00:50,000 --> 00:00:54,000
We'll slice it starting from this value.

22
00:00:54,000 --> 00:00:54,000
Right.

23
00:00:54,000 --> 00:00:56,000
That is, minus (max sequence length minus one), counting from the end.

24
00:00:56,000 --> 00:01:01,000
So when we slice from this value till

25
00:01:01,000 --> 00:01:04,000
the end, we keep only the last max sequence length minus one tokens of the list, right.

26
00:01:04,000 --> 00:01:09,000
Then we apply pad_sequences, where we specifically use pre-padding, so the zeros are added at the beginning.
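
A minimal pure-Python sketch of the truncate-then-pre-pad step, assuming the token IDs are plain ints and that 0 is the padding value (the default for Keras `pad_sequences`). `prepare_input` is a hypothetical helper name; in the lecture this step is done inline inside the prediction function, with `pad_sequences` and `padding='pre'` doing the padding.

```python
def prepare_input(token_list, max_sequence_len):
    """Hypothetical helper: shape a token list the way the model's input expects."""
    target_len = max_sequence_len - 1  # the model was trained on inputs of this length
    # Truncate: keep only the last target_len tokens (the most recent context)
    if len(token_list) >= max_sequence_len:
        token_list = token_list[-target_len:]
    # Pre-padding: zeros go on the left, so the real tokens sit right before the prediction point
    return [0] * (target_len - len(token_list)) + token_list


print(prepare_input([5, 6], 5))              # → [0, 0, 5, 6]
print(prepare_input([1, 2, 3, 4, 5, 6], 5))  # → [3, 4, 5, 6]
```

Pre-padding is the usual choice here because an LSTM reads left to right: with the zeros on the left, the last timesteps it sees are the actual words, immediately before the word it has to predict.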

27
00:01:09,000 --> 00:01:13,000
Then we call model.predict on this token list.

28
00:01:13,000 --> 00:01:18,000
And we get the predicted word index, where we say, hey, whichever output has the highest probability

29
00:01:18,000 --> 00:01:20,000
across all the outputs, take that one.

30
00:01:21,000 --> 00:01:26,000
That is, take the position of that highest value, give me that index, and then I convert

31
00:01:26,000 --> 00:01:28,000
that index into a word.

32
00:01:28,000 --> 00:01:32,000
So I write: for word, index in tokenizer.word_index.items().

33
00:01:32,000 --> 00:01:35,000
If index equals the predicted word index, return that word.
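
The argmax and the reverse lookup can be sketched in plain Python, assuming `probabilities` is one row of the model's softmax output and `word_index` is the tokenizer's word-to-index dict. `index_to_word` is a hypothetical helper name. Instead of looping over `word_index.items()` on every call, it inverts the dict, which gives the same answer because the tokenizer assigns each index to exactly one word.

```python
def index_to_word(probabilities, word_index):
    """Hypothetical helper: map one softmax row to the most likely word."""
    # argmax: position of the highest probability (first one on ties, like np.argmax)
    predicted_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    # Invert word -> index so the lookup is a single dict access
    reverse_index = {index: word for word, index in word_index.items()}
    # Index 0 is reserved for padding and has no word, so this can return None
    return reverse_index.get(predicted_index)


word_index = {"to": 1, "be": 2, "or": 3, "not": 4}
print(index_to_word([0.01, 0.09, 0.6, 0.2, 0.1], word_index))  # → be
```

If you predict many words in a row, it is cheaper to build `reverse_index` once outside the function and reuse it.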

34
00:01:35,000 --> 00:01:38,000
You'll understand it better once I execute it.

35
00:01:38,000 --> 00:01:42,000
Let's say this is my input text. Okay.

36
00:01:42,000 --> 00:01:47,000
Now, my input text will be "To be or not to be".

37
00:01:47,000 --> 00:01:48,000
Okay.

38
00:01:49,000 --> 00:01:51,000
So let's say this is my input text.

39
00:01:51,000 --> 00:01:52,000
I'll go ahead and write

40
00:01:52,000 --> 00:01:53,000
my print statement.

41
00:01:54,000 --> 00:01:57,000
It's an f-string, where I write my "Input

42
00:01:57,000 --> 00:02:00,000
text:" label, and after that,

43
00:02:00,000 --> 00:02:05,000
here I'm going to pass in my input text.

44
00:02:05,000 --> 00:02:08,000
So this will print my input text over here.

45
00:02:08,000 --> 00:02:09,000
Okay.

46
00:02:09,000 --> 00:02:12,000
Then I'll write the max sequence length.

47
00:02:12,000 --> 00:02:24,000
max_sequence_len = model.input_shape[1] + 1, okay.

48
00:02:24,000 --> 00:02:29,000
So with the plus one, this gives me my max sequence length, because

49
00:02:29,000 --> 00:02:34,000
the model's input length is max sequence length minus one, and whatever max sequence length we initially took needs to

50
00:02:34,000 --> 00:02:35,000
be the same here, right?
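
The plus one can be checked with a small worked example, assuming (as the "minus one" above implies) that each padded training sequence of length max_sequence_len was split into all tokens but the last as the input and the last token as the label. The token IDs below are made up for illustration.

```python
# Hypothetical padded training sequence, with max_sequence_len = 5
sequence = [0, 0, 4, 9, 2]

x, y = sequence[:-1], sequence[-1]  # the model sees 4 tokens and learns to predict the 5th
model_input_len = len(x)            # the value model.input_shape[1] would report for such a model
max_sequence_len = model_input_len + 1

print(model_input_len, max_sequence_len)  # → 4 5
```

So reading the model's input length and adding one recovers the original max sequence length without storing it separately.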

51
00:02:35,000 --> 00:02:39,000
Then we compute our next_word.

52
00:02:39,000 --> 00:02:43,000
Here I call my predict_next_word function.

53
00:02:43,000 --> 00:02:49,000
I pass my model, then my tokenizer.

54
00:02:49,000 --> 00:02:53,000
Along with these, I also pass my input text, okay.

55
00:02:53,000 --> 00:02:56,000
And finally, my max sequence length.

56
00:02:57,000 --> 00:02:58,000
Right.

57
00:02:58,000 --> 00:03:00,000
These are the parameters the function needs, right.

58
00:03:00,000 --> 00:03:02,000
That's why I'm passing them, okay.

59
00:03:02,000 --> 00:03:05,000
And finally let's go ahead and print.

60
00:03:06,000 --> 00:03:11,000
And here I'll write my "Next word prediction" label,

61
00:03:13,000 --> 00:03:14,000
followed by

62
00:03:16,000 --> 00:03:23,000
my next_word. Let's go ahead and execute it.

63
00:03:23,000 --> 00:03:27,000
So for "to be or not to be", the predicted next word is "considered", right.

64
00:03:27,000 --> 00:03:29,000
"To be or not to be considered": that's good, right.

65
00:03:29,000 --> 00:03:31,000
So this looks like a very good word.

66
00:03:31,000 --> 00:03:31,000
Okay.

67
00:03:32,000 --> 00:03:38,000
Finally, after doing this, I'll quickly save my entire model.

68
00:03:38,000 --> 00:03:39,000
Go.

69
00:03:39,000 --> 00:03:42,000
Let's save the model that I've created.

70
00:03:42,000 --> 00:03:42,000
Okay.

71
00:03:42,000 --> 00:03:44,000
So I'll write model.save.

72
00:03:44,000 --> 00:03:50,000
And here I'm going to give the file name,

73
00:03:51,000 --> 00:03:55,000
next_word_lstm.h5, okay.

74
00:03:55,000 --> 00:03:57,000
So this will be my H5 file.

75
00:03:57,000 --> 00:04:01,000
Besides that, I'll also save the tokenizer, right.

76
00:04:01,000 --> 00:04:06,000
The tokenizer also needs to be saved, because we need to use the same tokenizer that we created.

77
00:04:06,000 --> 00:04:13,000
So here I'll write: with open, and the file name tokenizer.pickle.

78
00:04:13,000 --> 00:04:17,000
I open this pickle file in write-binary mode, "wb".

79
00:04:17,000 --> 00:04:20,000
And I say as handle, okay.

80
00:04:20,000 --> 00:04:21,000
This will be my context manager.

81
00:04:22,000 --> 00:04:24,000
Then let me write pickle.dump.

82
00:04:24,000 --> 00:04:32,000
And let me import pickle first, because I'll need it. Okay.

83
00:04:32,000 --> 00:04:33,000
pickle.dump.

84
00:04:33,000 --> 00:04:44,000
And here we dump the entire tokenizer, then handle, and I'll also pass a protocol here:

85
00:04:44,000 --> 00:04:50,000
protocol equal to pickle.HIGHEST_PROTOCOL, okay.
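
Here is a minimal, self-contained sketch of the save-and-reload round trip with `pickle`. To keep it runnable without TensorFlow, it pickles a hypothetical `TokenizerStandIn` class that only holds a `word_index`; with the real fitted Keras tokenizer, the `pickle.dump` and `pickle.load` calls stay the same. The temporary folder is also just for the demo.

```python
import os
import pickle
import tempfile


class TokenizerStandIn:
    """Hypothetical stand-in for a fitted tokenizer; only keeps word_index."""

    def __init__(self, word_index):
        self.word_index = word_index


def save_tokenizer(tokenizer, path):
    # "wb": pickle produces bytes, so the file is opened in binary write mode
    with open(path, "wb") as handle:
        pickle.dump(tokenizer, handle, protocol=pickle.HIGHEST_PROTOCOL)


def load_tokenizer(path):
    # "rb": read the same bytes back; the pickled class must be importable here
    with open(path, "rb") as handle:
        return pickle.load(handle)


path = os.path.join(tempfile.mkdtemp(), "tokenizer.pickle")
save_tokenizer(TokenizerStandIn({"to": 1, "be": 2}), path)
restored = load_tokenizer(path)
print(restored.word_index)  # → {'to': 1, 'be': 2}
```

The loading half is what a later app needs: it must turn text into exactly the same indices the model was trained on.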

86
00:04:51,000 --> 00:04:51,000
Okay.

87
00:04:51,000 --> 00:04:52,000
Here's one assignment.

88
00:04:52,000 --> 00:04:55,000
Try to find out why we use this specific protocol.

89
00:04:55,000 --> 00:04:56,000
Okay.

90
00:04:56,000 --> 00:04:58,000
So this will be an assignment for you.

91
00:04:58,000 --> 00:04:58,000
Come on.

92
00:04:58,000 --> 00:05:01,000
So it says: has no attribute 'dumpy'.

93
00:05:01,000 --> 00:05:03,000
I wrote "dumpy" by mistake.

94
00:05:03,000 --> 00:05:04,000
I should have written dump.

95
00:05:04,000 --> 00:05:05,000
Okay.

96
00:05:05,000 --> 00:05:09,000
Once that's executed, in the same folder you'll be able to see both files:

97
00:05:09,000 --> 00:05:11,000
next_word_lstm.h5 and tokenizer.pickle.

98
00:05:11,000 --> 00:05:14,000
Okay, so here are all my files.

99
00:05:14,000 --> 00:05:19,000
And I can try more text if you want.

100
00:05:19,000 --> 00:05:19,000
Okay.

101
00:05:20,000 --> 00:05:27,000
Say, "To be bad is better than".

102
00:05:27,000 --> 00:05:30,000
Okay, let's predict it.

103
00:05:30,000 --> 00:05:32,000
It gives: "To be bad is better than"

104
00:05:32,000 --> 00:05:32,000
"and".

105
00:05:32,000 --> 00:05:37,000
Okay, since the model's accuracy is still fairly low, you're not always going to get the right

106
00:05:37,000 --> 00:05:38,000
output, okay.

107
00:05:38,000 --> 00:05:39,000
So you can go ahead and try it out.

108
00:05:39,000 --> 00:05:42,000
But let me use some more examples from the text here.

109
00:05:42,000 --> 00:05:43,000
Okay.

110
00:05:43,000 --> 00:05:48,000
Let's say I use this one.

111
00:05:48,000 --> 00:05:48,000
Okay.

112
00:05:49,000 --> 00:05:54,000
I don't know exactly what these words mean, but I'll just try this and see whether it works

113
00:05:54,000 --> 00:05:54,000
or not.

114
00:05:54,000 --> 00:05:55,000
Let's see.

115
00:05:55,000 --> 00:05:55,000
Okay.

116
00:05:56,000 --> 00:05:57,000
I'll use this sentence.

117
00:05:57,000 --> 00:06:00,000
And then we do the prediction: "fantasy".

118
00:06:00,000 --> 00:06:03,000
See, it's giving "fantasy", right.

119
00:06:03,000 --> 00:06:05,000
The prediction is absolutely right.

120
00:06:05,000 --> 00:06:05,000
Right.

121
00:06:06,000 --> 00:06:07,000
"Mar. Horatio", okay.

122
00:06:07,000 --> 00:06:10,000
"Says 'tis but our fantasy", right?

123
00:06:10,000 --> 00:06:11,000
This is good.

124
00:06:12,000 --> 00:06:12,000
Um.

125
00:06:13,000 --> 00:06:15,000
Let's try a few more.

126
00:06:15,000 --> 00:06:17,000
I'll say:

127
00:06:17,000 --> 00:06:19,000
Well, born last night of all.

128
00:06:19,000 --> 00:06:22,000
When you and same starry, starry.

129
00:06:22,000 --> 00:06:25,000
I don't know whether this is English or something else.

130
00:06:25,000 --> 00:06:25,000
Okay.

131
00:06:28,000 --> 00:06:29,000
Okay, same.

132
00:06:29,000 --> 00:06:30,000
Let's execute it.

133
00:06:30,000 --> 00:06:31,000
Cosine.

134
00:06:31,000 --> 00:06:33,000
It comes out as cosine, okay.

135
00:06:33,000 --> 00:06:34,000
For this exact input,

136
00:06:34,000 --> 00:06:36,000
we should have got "started", but it gave "cosine".

137
00:06:36,000 --> 00:06:37,000
That's okay.

138
00:06:37,000 --> 00:06:42,000
So I hope you're able to understand this. In our next video, I'm going to create a Streamlit app.

139
00:06:42,000 --> 00:06:44,000
And we'll use all of these things there,

140
00:06:44,000 --> 00:06:45,000
all these functions.

141
00:06:45,000 --> 00:06:48,000
First of all, we'll load our H5 file and the tokenizer.

142
00:06:48,000 --> 00:06:53,000
And we'll turn all of that into a complete end-to-end project using Streamlit.

143
00:06:53,000 --> 00:06:54,000
So yes, that's it from my side.

144
00:06:54,000 --> 00:06:56,000
I will see you all in the next video.

145
00:06:56,000 --> 00:06:57,000
Thank you.

146
00:06:57,000 --> 00:06:57,000
Take care.