1
00:00:00,000 --> 00:00:01,000
Hello all.

2
00:00:01,000 --> 00:00:05,000
So we are going to continue our discussion of LSTM and GRU.

3
00:00:05,000 --> 00:00:11,000
And in this video and the upcoming series of videos, we are going to develop some amazing end to

4
00:00:11,000 --> 00:00:12,000
end projects.

5
00:00:12,000 --> 00:00:17,000
In this particular project, we are going to talk about next word prediction

6
00:00:17,000 --> 00:00:19,000
using LSTM.

7
00:00:19,000 --> 00:00:24,000
So whatever concepts we have learned about LSTM, by using those concepts, we are going

8
00:00:24,000 --> 00:00:30,000
to see how, in a practical way, we can implement this specific project, which is

9
00:00:30,000 --> 00:00:33,000
called next word prediction using LSTM.

10
00:00:33,000 --> 00:00:38,000
And after completing LSTM, we will see the step-by-step mechanism for solving this particular

11
00:00:38,000 --> 00:00:39,000
problem

12
00:00:39,000 --> 00:00:40,000
statement.

13
00:00:40,000 --> 00:00:43,000
And then we'll also solve the same problem statement with GRU.

14
00:00:43,000 --> 00:00:43,000
Okay.

15
00:00:44,000 --> 00:00:47,000
So let's go ahead and see the project overview.

16
00:00:47,000 --> 00:00:53,000
So here you can see this project aims to develop a deep learning model for predicting the next word

17
00:00:53,000 --> 00:00:54,000
in a sequence of words.

18
00:00:54,000 --> 00:00:58,000
The model is built using long short-term memory

19
00:00:58,000 --> 00:01:03,000
(LSTM) networks, which are well suited for sequence prediction tasks.

20
00:01:03,000 --> 00:01:05,000
The project includes the following steps.

21
00:01:05,000 --> 00:01:06,000
Data collection.

22
00:01:06,000 --> 00:01:10,000
We use the text of Shakespeare's Hamlet

23
00:01:10,000 --> 00:01:13,000
as our data set. We will specifically use this particular data set.

24
00:01:13,000 --> 00:01:19,000
This rich, complex text provides a good challenge for our model because,

25
00:01:19,000 --> 00:01:21,000
if you go ahead and look at this particular text.

26
00:01:21,000 --> 00:01:22,000
Right.

27
00:01:22,000 --> 00:01:24,000
It's not just normal English.

28
00:01:24,000 --> 00:01:25,000
Right.

29
00:01:25,000 --> 00:01:27,000
You know, it can be quite difficult to understand.

30
00:01:27,000 --> 00:01:31,000
So we'll try to train our entire model with this particular data set.

31
00:01:31,000 --> 00:01:34,000
Then we have data preprocessing.

32
00:01:34,000 --> 00:01:39,000
The text data is tokenized, converted into sequences, and padded to ensure uniform length.

33
00:01:39,000 --> 00:01:43,000
So we will see how to perform this data preprocessing technique.

34
00:01:43,000 --> 00:01:46,000
And we will also generate a pickle file out of it.

35
00:01:46,000 --> 00:01:49,000
The sequences are then split into training and test sets.

36
00:01:49,000 --> 00:01:54,000
Then you have model building, where an LSTM model is constructed with

37
00:01:54,000 --> 00:02:01,000
an embedding layer, two LSTM layers, and a dense output layer with a softmax activation function to predict

38
00:02:01,000 --> 00:02:03,000
the probability of the next word.

39
00:02:03,000 --> 00:02:05,000
Then we will go ahead and do the model training.

40
00:02:05,000 --> 00:02:10,000
The model is trained using the prepared sequences, with early stopping implemented to prevent overfitting.

41
00:02:10,000 --> 00:02:15,000
Early stopping monitors the validation loss and stops training when the loss stops improving.

42
00:02:15,000 --> 00:02:18,000
Okay, then you have model evaluation.

43
00:02:18,000 --> 00:02:24,000
The model is evaluated using a set of example sentences to test its ability to predict the next word

44
00:02:24,000 --> 00:02:25,000
accurately.

45
00:02:25,000 --> 00:02:30,000
So finally, once we do all these important steps, we are going to deploy it using a Streamlit

46
00:02:30,000 --> 00:02:31,000
web application.

47
00:02:31,000 --> 00:02:32,000
Okay.

48
00:02:32,000 --> 00:02:37,000
And in this Streamlit web application, we will allow users to input

49
00:02:37,000 --> 00:02:42,000
a sequence of words and then get the predicted next word in real time.

50
00:02:42,000 --> 00:02:48,000
Okay, so we will be doing all these things. Again, this will be a really amazing

51
00:02:48,000 --> 00:02:49,000
project altogether.

52
00:02:49,000 --> 00:02:54,000
We will be developing this in such a way that we will be able to understand it completely

53
00:02:54,000 --> 00:02:55,000
step by step.

54
00:02:55,000 --> 00:02:57,000
First of all, we will go ahead and experiment with it.

55
00:02:57,000 --> 00:03:02,000
So here you can see that I've already gone ahead and created an LSTM RNN folder.

56
00:03:02,000 --> 00:03:06,000
In our requirements.txt, we have whatever libraries we wanted.

57
00:03:06,000 --> 00:03:08,000
We will use this.

58
00:03:08,000 --> 00:03:08,000
Okay.

59
00:03:08,000 --> 00:03:14,000
So first of all, since I'm working in the same environment, I

60
00:03:14,000 --> 00:03:19,000
will just go ahead and run pip install -r requirements.txt.

61
00:03:19,000 --> 00:03:26,000
But before that, I would also like to add one more very important library.

62
00:03:26,000 --> 00:03:32,000
If I go ahead and look over here in the requirements.txt, I have to add

63
00:03:32,000 --> 00:03:35,000
one more library, which is called NLTK.

64
00:03:35,000 --> 00:03:40,000
Okay, now this NLTK library will be important

65
00:03:40,000 --> 00:03:45,000
if I want to download this specific data set, because the data set is available there.

66
00:03:45,000 --> 00:03:49,000
Now quickly let me just go ahead and do the installation: pip install -r requirements.txt.

67
00:03:50,000 --> 00:03:53,000
Please don't forget to create the virtual environment,

68
00:03:53,000 --> 00:03:55,000
just as we have created this venv.

69
00:03:55,000 --> 00:03:56,000
Right.

70
00:03:56,000 --> 00:03:58,000
So this is the first step.

71
00:03:58,000 --> 00:04:01,000
We have understood the problem statement and what we are going to do.

72
00:04:01,000 --> 00:04:06,000
And now we have also gone ahead and installed the NLTK library.

73
00:04:06,000 --> 00:04:06,000
Okay.

74
00:04:06,000 --> 00:04:10,000
Now in our next video we will start step by step.

75
00:04:10,000 --> 00:04:13,000
First of all, we will try to complete data collection and data preprocessing.

76
00:04:14,000 --> 00:04:17,000
We'll save a pickle file and then we will move towards model building.

77
00:04:17,000 --> 00:04:20,000
So I hope you understood the problem statement.

78
00:04:20,000 --> 00:04:26,000
I hope I have also given you an idea of how to create this particular folder, and we will work in the

79
00:04:26,000 --> 00:04:31,000
same venv environment where we worked on the simple RNN classification.

80
00:04:31,000 --> 00:04:36,000
So please make sure that you also use the requirements.txt that we provided in the RNN classification.

81
00:04:36,000 --> 00:04:39,000
You need to install only one additional library, which is NLTK.

82
00:04:39,000 --> 00:04:41,000
Okay, so yes, this was it.

83
00:04:41,000 --> 00:04:47,000
I will see you all in the next video where we will be starting our data collection phase.

84
00:04:47,000 --> 00:04:49,000
So I will see you all in the next video.

85
00:04:49,000 --> 00:04:50,000
Thank you.