1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:03,000
So we are going to continue our discussion of LangChain.

3
00:00:03,000 --> 00:00:07,000
In this video and in the upcoming series of videos, we are going to discuss this module, which

4
00:00:07,000 --> 00:00:09,000
is called Summarize Text.

5
00:00:09,000 --> 00:00:14,000
Let's say you have any kind of data source: it can be an unstructured file, it can be a structured

6
00:00:14,000 --> 00:00:14,000
file.

7
00:00:14,000 --> 00:00:16,000
It can be many different types of file.

8
00:00:16,000 --> 00:00:23,000
What we will be doing is that we will be reading the specific files, and then we will summarize the

9
00:00:23,000 --> 00:00:25,000
entire information that is present in that.

10
00:00:25,000 --> 00:00:31,000
Whenever we talk about summarization, you will see that there

11
00:00:31,000 --> 00:00:34,000
are three ways specifically in LangChain.

12
00:00:34,000 --> 00:00:34,000
Okay.

13
00:00:34,000 --> 00:00:40,000
And we will be using all three of those ways: stuff is one way, MapReduce is another, and

14
00:00:40,000 --> 00:00:42,000
refine is the third.

15
00:00:42,000 --> 00:00:42,000
Right?

16
00:00:42,000 --> 00:00:47,000
So first of all, we'll get to know the basic implementation: how to properly use stuff,

17
00:00:47,000 --> 00:00:49,000
MapReduce and refine.

18
00:00:49,000 --> 00:00:52,000
And as we go ahead, we will be creating an end-to-end project.

19
00:00:52,000 --> 00:00:59,000
And in that end-to-end project, we will try to summarize text from both a structured URL, or let's say

20
00:00:59,000 --> 00:01:03,000
from website content, or from unstructured content.

21
00:01:03,000 --> 00:01:04,000
Also.

22
00:01:04,000 --> 00:01:04,000
Right?

23
00:01:04,000 --> 00:01:07,000
Unstructured content can be YouTube videos.

24
00:01:07,000 --> 00:01:13,000
It can be tweets, it can be a Wikipedia page, and structured content basically means

25
00:01:13,000 --> 00:01:16,000
data that you have in a tabular format.

26
00:01:16,000 --> 00:01:16,000
Right.

27
00:01:16,000 --> 00:01:18,000
So we will be using different data sets.

28
00:01:18,000 --> 00:01:21,000
And we will try to build this end-to-end project.

29
00:01:21,000 --> 00:01:24,000
So that will be coming in the upcoming videos.

30
00:01:24,000 --> 00:01:27,000
But in this video let's go ahead and start with this.

31
00:01:27,000 --> 00:01:32,000
I will show you all three types of techniques, specifically in

32
00:01:32,000 --> 00:01:34,000
text summarization that we use.

33
00:01:34,000 --> 00:01:34,000
Okay.

34
00:01:34,000 --> 00:01:39,000
So first of all, over here, you can see that I have imported os.

35
00:01:39,000 --> 00:01:42,000
Then from dotenv we have imported load_dotenv.

36
00:01:42,000 --> 00:01:45,000
This is just to load the environment variables.

37
00:01:45,000 --> 00:01:49,000
And I'm also going to use the Groq API, because Groq helps you connect to open-source

38
00:01:49,000 --> 00:01:50,000
LLM models.

39
00:01:50,000 --> 00:01:52,000
I don't even have to use OpenAI over there.

40
00:01:52,000 --> 00:01:53,000
Okay.

41
00:01:53,000 --> 00:01:58,000
Now, quickly, I will make some cells.

42
00:01:58,000 --> 00:01:59,000
Okay.

43
00:01:59,000 --> 00:02:04,000
Now, first of all, I will go ahead and write from langchain.

44
00:02:05,000 --> 00:02:05,000
Okay.

45
00:02:05,000 --> 00:02:13,000
That is, langchain.schema. I'm going to import three important things, okay.

46
00:02:13,000 --> 00:02:19,000
So here I'm going to import AIMessage, okay. AIMessage is specifically the response that

47
00:02:19,000 --> 00:02:20,000
we get from the LLM.

48
00:02:20,000 --> 00:02:25,000
Then you have HumanMessage, and then you have SystemMessage.

49
00:02:25,000 --> 00:02:27,000
So till now I have not even discussed this.

50
00:02:27,000 --> 00:02:33,000
So that is why, in every module, you will see me adding some new things, right.

51
00:02:33,000 --> 00:02:35,000
And LangChain is completely vast.

52
00:02:35,000 --> 00:02:39,000
I don't know what developments will happen in the future.

53
00:02:39,000 --> 00:02:40,000
Right.

54
00:02:40,000 --> 00:02:44,000
So here we are using this schema, which has AIMessage.

55
00:02:44,000 --> 00:02:48,000
AIMessage specifically refers to every response that comes from the LLM model.

56
00:02:48,000 --> 00:02:51,000
HumanMessage is the kind of query that we are asking.

57
00:02:51,000 --> 00:02:56,000
And SystemMessage is the instruction given to the system on how it needs to behave.

58
00:02:56,000 --> 00:02:57,000
Okay.

59
00:02:57,000 --> 00:02:59,000
Here, for text summarization,

60
00:02:59,000 --> 00:03:03,000
I will be showing you with the help of a prompt template, and with the different types of techniques also.

61
00:03:03,000 --> 00:03:04,000
Okay.

62
00:03:04,000 --> 00:03:07,000
So first of all let's go ahead and import all these things.

63
00:03:07,000 --> 00:03:09,000
Now I'm going to take a specific speech okay.

64
00:03:09,000 --> 00:03:15,000
It is a long speech from our honorable Prime Minister Narendra Modi.

65
00:03:15,000 --> 00:03:16,000
Okay.

66
00:03:16,000 --> 00:03:21,000
So this particular speech is a famous speech in India, and I'm

67
00:03:21,000 --> 00:03:27,000
taking this. Mostly, the things that are spoken about over here are related

68
00:03:27,000 --> 00:03:31,000
to the Indian government and what developments are happening.

69
00:03:31,000 --> 00:03:33,000
So I will just go ahead and execute this.

70
00:03:33,000 --> 00:03:35,000
So what I'm actually going to do, I'll take the speech.

71
00:03:36,000 --> 00:03:36,000
Okay.

72
00:03:37,000 --> 00:03:42,000
And we will probably go ahead and summarize this entire speech.

73
00:03:42,000 --> 00:03:42,000
Right.

74
00:03:42,000 --> 00:03:44,000
That is our task.

75
00:03:44,000 --> 00:03:49,000
So what I will do over here, I will just go ahead and give some basic prompts.

76
00:03:49,000 --> 00:03:51,000
I'll say, hey, let's go ahead and create this chat message.

77
00:03:51,000 --> 00:03:53,000
It will be a kind of list.

78
00:03:53,000 --> 00:03:54,000
Okay.

79
00:03:54,000 --> 00:04:01,000
The first default instruction that I'm giving is in the SystemMessage:

80
00:04:01,000 --> 00:04:05,000
I'll say, here is my content and you need to behave in this way.

81
00:04:05,000 --> 00:04:17,000
So I'll say, hey, you are an expert with expertise in summarizing speeches.

82
00:04:17,000 --> 00:04:21,000
Okay, so there is your first one.

83
00:04:21,000 --> 00:04:23,000
Then let's go to the next one.

84
00:04:23,000 --> 00:04:23,000
Here.

85
00:04:23,000 --> 00:04:25,000
I'm going to go ahead and write HumanMessage.

86
00:04:26,000 --> 00:04:31,000
And again, for the content over here, I will go ahead and use an f-string, and I'll say, hey,

87
00:04:31,000 --> 00:04:48,000
please provide a short and concise summary of the following speech.

88
00:04:48,000 --> 00:04:48,000
Okay?

89
00:04:48,000 --> 00:04:53,000
And here I will just go ahead and give my text placeholder okay.

90
00:04:53,000 --> 00:05:00,000
And this text placeholder will be filled by a variable called speech, okay.

91
00:05:00,000 --> 00:05:04,000
And this speech is given over here.

92
00:05:04,000 --> 00:05:04,000
Right.

93
00:05:04,000 --> 00:05:09,000
So here we are just setting up this chat_message over here with the system

94
00:05:09,000 --> 00:05:10,000
message and the human message.

95
00:05:10,000 --> 00:05:15,000
In the human message, we are stating, from the human side, what information is going in and

96
00:05:15,000 --> 00:05:18,000
what the LLM model needs to do.

97
00:05:18,000 --> 00:05:24,000
And the system message is a kind of instruction about how the LLM model should behave with respect to its

98
00:05:24,000 --> 00:05:25,000
functionality.

99
00:05:25,000 --> 00:05:25,000
Okay.

100
00:05:25,000 --> 00:05:28,000
So all of these things are mentioned over here.
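
As a rough sketch of the chat message list described here, the snippet below uses a plain Python dataclass as a stand-in for the message classes. The names `Message` and `build_chat_messages` are hypothetical and are not part of LangChain; in the actual notebook the equivalent objects would presumably be `SystemMessage` and `HumanMessage` from `langchain.schema`, and the sample speech text is a placeholder.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Message:
    """Hypothetical stand-in for a chat message (not a LangChain class)."""
    role: str      # "system", "human", or "ai"
    content: str


def build_chat_messages(speech: str) -> List[Message]:
    """Build the two-element message list: an instruction plus the request."""
    return [
        # System message: tells the model how it should behave.
        Message("system", "You are an expert with expertise in summarizing speeches."),
        # Human message: the actual query, with the speech filled in by an f-string.
        Message(
            "human",
            f"Please provide a short and concise summary of the following speech:\nText: {speech}",
        ),
    ]


chat_messages = build_chat_messages("Sample speech text.")
for message in chat_messages:
    print(message.role, "->", message.content)
```

The list order matters in this sketch: the system instruction comes first, and the human request that carries the speech comes second.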

101
00:05:28,000 --> 00:05:32,000
Now, I've already initialized my LLM model.

102
00:05:32,000 --> 00:05:35,000
So I will just go ahead and execute this.

103
00:05:35,000 --> 00:05:35,000
Okay.

104
00:05:35,000 --> 00:05:41,000
First of all, now I will go ahead and write llm.get_tokens, okay.

105
00:05:41,000 --> 00:05:45,000
So I think it should be get_num_tokens.

106
00:05:45,000 --> 00:05:50,000
So this will first of all determine how many tokens are present in this speech.

107
00:05:50,000 --> 00:05:51,000
Right.

108
00:05:51,000 --> 00:05:57,000
So if I execute this it will take some amount of time based on the kind of LLM models that you are specifically

109
00:05:57,000 --> 00:05:58,000
using.

110
00:05:58,000 --> 00:06:01,000
And then we should be able to see it over here.

111
00:06:01,000 --> 00:06:02,000
Right.

112
00:06:02,000 --> 00:06:05,000
Uh, let's see why it is basically okay.

113
00:06:05,000 --> 00:06:11,000
It is showing 909 as the number of tokens over here.

114
00:06:11,000 --> 00:06:11,000
Right.

115
00:06:11,000 --> 00:06:15,000
So 909 is specifically the number of tokens in your speech.

116
00:06:15,000 --> 00:06:15,000
Okay.

117
00:06:15,000 --> 00:06:22,000
Now, if you want to see the content from this particular thing, or let's say I'm

118
00:06:22,000 --> 00:06:23,000
just going to go ahead.

119
00:06:23,000 --> 00:06:28,000
And here you can see in this particular model I have created this chat message, okay.

120
00:06:28,000 --> 00:06:30,000
This chat message is also acting like a basic prompt.

121
00:06:30,000 --> 00:06:36,000
That is, I'm saying what the LLM model needs to do, and what the human message is that is

122
00:06:36,000 --> 00:06:38,000
going in over here, based on the chat message.

123
00:06:38,000 --> 00:06:41,000
I can also go ahead and pass this particular chat message over here.

124
00:06:41,000 --> 00:06:46,000
And if I look at the .content, this will be my summarized text.

125
00:06:46,000 --> 00:06:47,000
So here you can see.

126
00:06:47,000 --> 00:06:47,000
Summary.

127
00:06:47,000 --> 00:06:53,000
The member of the Parliament emphasizes the importance of ensuring the government schemes reach the

128
00:06:53,000 --> 00:06:54,000
standard beneficiary.

129
00:06:54,000 --> 00:07:00,000
He highlights the Viksit Bharat Sankalp Yatra as an initiative to assess the effectiveness of government.

130
00:07:00,000 --> 00:07:06,000
So this is the entire summary and this is one way of creating the summary, where I probably go ahead

131
00:07:06,000 --> 00:07:11,000
and create a list of chat messages, where I mention my system message, and also the

132
00:07:11,000 --> 00:07:14,000
human message that I'm going to mention over here.

133
00:07:14,000 --> 00:07:18,000
I provide the entire speech as a parameter, and with respect to

134
00:07:18,000 --> 00:07:23,000
the chat message, if I call the LLM, I will be able to get the entire summary.

135
00:07:23,000 --> 00:07:25,000
Okay, if I just remove this .content.

136
00:07:25,000 --> 00:07:29,000
Also, you will be able to see that I get the entire response as an AIMessage.

137
00:07:29,000 --> 00:07:31,000
Now this is very much important.

138
00:07:31,000 --> 00:07:35,000
As I said, AIMessage is the response from the LLM model.

139
00:07:35,000 --> 00:07:40,000
And this is one way you can go ahead and summarize the entire content.

140
00:07:40,000 --> 00:07:41,000
Okay.

141
00:07:41,000 --> 00:07:43,000
Here you can see the output tokens here.

142
00:07:43,000 --> 00:07:44,000
Output token.

143
00:07:44,000 --> 00:07:47,000
See, the input tokens were 895, right.

144
00:07:47,000 --> 00:07:50,000
And now you have the output tokens as 108.

145
00:07:50,000 --> 00:07:54,000
So these 895 tokens have been summarized to 108 tokens, okay.

146
00:07:54,000 --> 00:07:59,000
So this is one way of getting the summary okay.

147
00:08:00,000 --> 00:08:03,000
Now similarly I will show you multiple ways right.

148
00:08:03,000 --> 00:08:04,000
This is one way right.

149
00:08:05,000 --> 00:08:12,000
But do you think this will work if your entire speech is very large,

150
00:08:12,000 --> 00:08:17,000
if it is 100 or 200 pages? What will you do at that point?

151
00:08:17,000 --> 00:08:22,000
Now, the second technique that I am going to use is this:

152
00:08:22,000 --> 00:08:27,000
I will create a prompt template first of all, and I'll say with this prompt template I'll try to perform

153
00:08:27,000 --> 00:08:28,000
my text summarization.

154
00:08:29,000 --> 00:08:33,000
So this is the second way that I am going to show you.

155
00:08:33,000 --> 00:08:34,000
okay?

156
00:08:34,000 --> 00:08:38,000
And remember, all these ways will be very important when we are actually creating our end-to-end project.

157
00:08:38,000 --> 00:08:39,000
Okay.

158
00:08:39,000 --> 00:08:43,000
So here, first of all, I will go ahead and write from langchain, okay.

159
00:08:43,000 --> 00:08:46,000
Or rather, from langchain.chains.

160
00:08:46,000 --> 00:08:49,000
I'm going to import LLMChain.

161
00:08:49,000 --> 00:08:52,000
Now let me just go ahead and talk about this LLMChain.

162
00:08:52,000 --> 00:08:54,000
What exactly is LLMChain?

163
00:08:54,000 --> 00:08:57,000
Whenever we combine right.

164
00:08:57,000 --> 00:09:04,000
Whenever we combine a prompt template with an LLM, we call that an LLM chain.

165
00:09:04,000 --> 00:09:04,000
Okay.

166
00:09:05,000 --> 00:09:12,000
You have seen this in the initial stages of our course. There, we had

167
00:09:12,000 --> 00:09:16,000
used this approach: we used to create an LLM.

168
00:09:16,000 --> 00:09:18,000
Along with the LLM, we also used to create a prompt.

169
00:09:18,000 --> 00:09:21,000
Along with the prompt, we also used to create our string output parser.

170
00:09:21,000 --> 00:09:22,000
So all these things we used to create.

171
00:09:22,000 --> 00:09:29,000
Now, in order to create a chain, we used to combine something like this: prompt, LLM, and your string output

172
00:09:29,000 --> 00:09:29,000
parser.

173
00:09:29,000 --> 00:09:29,000
Right.

174
00:09:29,000 --> 00:09:30,000
Something like this.

175
00:09:31,000 --> 00:09:35,000
But in the case of LLMChain, we use just these two.

176
00:09:35,000 --> 00:09:40,000
We don't add a string output parser ourselves, because the default string output parser will

177
00:09:40,000 --> 00:09:41,000
give us the response, okay.

178
00:09:41,000 --> 00:09:46,000
Now in this case, I'm going to use this LLMChain.

179
00:09:46,000 --> 00:09:50,000
Along with this I will go ahead and define my prompt template.

180
00:09:50,000 --> 00:09:55,000
So from langchain, I'm going to import PromptTemplate.

181
00:09:55,000 --> 00:09:59,000
Now, PromptTemplate defines the kind of prompt that you want to create.

182
00:09:59,000 --> 00:09:59,000
Okay.

183
00:09:59,000 --> 00:10:03,000
So here let me just go ahead and create my generic prompt template.

184
00:10:03,000 --> 00:10:06,000
So here I'll go ahead and write generic_template.

185
00:10:06,000 --> 00:10:08,000
And let's go ahead and write my prompt okay.

186
00:10:08,000 --> 00:10:12,000
So now with respect to this particular prompt you need to understand one very important thing.

187
00:10:12,000 --> 00:10:14,000
How do we write a prompt over here?

188
00:10:14,000 --> 00:10:15,000
Right.

189
00:10:15,000 --> 00:10:23,000
So I will say hey write a summary of the following speech okay.

190
00:10:23,000 --> 00:10:27,000
And I'll say, hey, this is what I'm actually giving as a prompt okay.

191
00:10:27,000 --> 00:10:29,000
Now I need to give my speech.

192
00:10:29,000 --> 00:10:31,000
The speech will be given in the form of an input.

193
00:10:31,000 --> 00:10:33,000
So this will basically be my input over here.

194
00:10:33,000 --> 00:10:34,000
Right.

195
00:10:34,000 --> 00:10:35,000
Speech is equal to speech.

196
00:10:35,000 --> 00:10:38,000
Now I will also add one more task.

197
00:10:38,000 --> 00:10:50,000
Let's say I will go ahead and say, hey, translate the precise summary to this specific language,

198
00:10:50,000 --> 00:10:51,000
whatever language I have okay.

199
00:10:52,000 --> 00:10:54,000
Now I'm doing one more additional task.

200
00:10:54,000 --> 00:10:55,000
I'm not just getting the summary.

201
00:10:55,000 --> 00:10:58,000
I'm also converting that entire summary to a specific language okay.

202
00:10:58,000 --> 00:11:01,000
Okay, so this actually becomes my generic template.

203
00:11:01,000 --> 00:11:06,000
And this is what prompt engineering is all about. Here I'm actually giving two inputs, okay.

204
00:11:06,000 --> 00:11:12,000
Now, in order to convert this into a prompt, I will just go ahead and use the same PromptTemplate.

205
00:11:12,000 --> 00:11:14,000
Uh, inside this prompt template.

206
00:11:14,000 --> 00:11:18,000
First of all, I need to go ahead and mention my input variables.

207
00:11:19,000 --> 00:11:20,000
Okay.

208
00:11:20,000 --> 00:11:24,000
So when I mention my input variables there are two input variables and this needs to be provided in

209
00:11:24,000 --> 00:11:25,000
the form of a list.

210
00:11:25,000 --> 00:11:30,000
One is speech and the other one is language.

211
00:11:30,000 --> 00:11:36,000
Once I give the speech and language then I also need to give my template over here.

212
00:11:36,000 --> 00:11:40,000
So the second parameter that is used is called template.

213
00:11:40,000 --> 00:11:43,000
And this template will be set to my generic_template, okay.

214
00:11:43,000 --> 00:11:49,000
Now if I just go ahead and execute this prompt you will be able to see that this is my prompt template

215
00:11:49,000 --> 00:11:51,000
with input variables, language and speech.

216
00:11:51,000 --> 00:11:52,000
This is my template, right?

217
00:11:52,000 --> 00:11:55,000
Write a summary of the speech.

218
00:11:55,000 --> 00:11:57,000
And here I've assigned this particular placeholder.

219
00:11:57,000 --> 00:12:03,000
Now if you just pass this input variable you can also see the entire prompt.

220
00:12:03,000 --> 00:12:06,000
So for that I will just go ahead and write prompt.format.

221
00:12:06,000 --> 00:12:08,000
And inside this I will give my two input variables.

222
00:12:08,000 --> 00:12:10,000
One is speech okay.

223
00:12:10,000 --> 00:12:18,000
One is speech over here, which will be assigned the speech value that we have given as a parameter.

224
00:12:18,000 --> 00:12:21,000
And then I will also go ahead and pass our language.

225
00:12:21,000 --> 00:12:21,000
Language.

226
00:12:21,000 --> 00:12:25,000
Let's say I want to convert this into French, this entire summary into French.

227
00:12:25,000 --> 00:12:29,000
So once I execute it, now you can see: write a summary of the following speech.

228
00:12:29,000 --> 00:12:29,000
Okay.

229
00:12:29,000 --> 00:12:31,000
The entire speech is over here.

230
00:12:31,000 --> 00:12:33,000
It is placed completely over here.

231
00:12:33,000 --> 00:12:37,000
And in the end you can see translate the precise summary to French.

232
00:12:37,000 --> 00:12:37,000
Okay.

233
00:12:37,000 --> 00:12:40,000
So this actually becomes my entire prompt.
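
To make the two-placeholder template concrete, here is a minimal sketch that uses Python's built-in `str.format` as a stand-in for the prompt template object. The exact template wording and the helper name `format_prompt` are assumptions for illustration; the notebook itself builds the prompt with LangChain's `PromptTemplate` and calls its `format` method with `speech` and `language`.

```python
# Assumed wording of the generic template, with two placeholders.
GENERIC_TEMPLATE = (
    "Write a summary of the following speech:\n"
    "Speech: {speech}\n"
    "Translate the precise summary to {language}."
)


def format_prompt(speech: str, language: str) -> str:
    """Fill both placeholders and return the complete prompt text."""
    return GENERIC_TEMPLATE.format(speech=speech, language=language)


# Placeholder speech text; the real notebook passes the full speech here.
complete_prompt = format_prompt(speech="Sample speech text.", language="French")
print(complete_prompt)
```

The point of the sketch is only that the instruction text stays fixed while the speech and the target language are supplied at format time.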

234
00:12:40,000 --> 00:12:41,000
Okay.

235
00:12:41,000 --> 00:12:48,000
Now what I'm actually going to do over here is that, uh, once I get this specific prompt okay, I

236
00:12:48,000 --> 00:12:51,000
will just save this in our prompt.

237
00:12:51,000 --> 00:12:52,000
Uh, I'll write.

238
00:12:52,000 --> 00:12:52,000
Okay.

239
00:12:52,000 --> 00:12:55,000
This is my complete_prompt.

240
00:12:55,000 --> 00:12:56,000
Okay.

241
00:12:56,000 --> 00:12:59,000
And this is what I will go ahead and execute, okay.

242
00:12:59,000 --> 00:13:01,000
I'm just saving it in a variable over here.

243
00:13:01,000 --> 00:13:04,000
Now let's execute this specific prompt okay.

244
00:13:04,000 --> 00:13:08,000
Now in order to execute this specific prompt you could see right.

245
00:13:08,000 --> 00:13:11,000
Initially my number of tokens were 909.

246
00:13:12,000 --> 00:13:19,000
And now, if I go ahead and check my token count, it is nothing but llm.get_num_tokens.

247
00:13:19,000 --> 00:13:19,000
Okay.

248
00:13:19,000 --> 00:13:23,000
And I'll pass this complete_prompt over here.

249
00:13:24,000 --> 00:13:27,000
And if I execute it here, you can see now it has increased to 931.

250
00:13:27,000 --> 00:13:29,000
Why has it increased to 931?

251
00:13:29,000 --> 00:13:33,000
Because along with this speech, I have also put some more additional information over here.

252
00:13:33,000 --> 00:13:37,000
Right? All of this will also be added in the form of tokens.

253
00:13:38,000 --> 00:13:43,000
Understand that every LLM model has a restriction on the number of tokens, right?

254
00:13:43,000 --> 00:13:45,000
Right now it is fine: 931 tokens.

255
00:13:45,000 --> 00:13:50,000
Most of the advanced LLM models will handle this many tokens very easily,

256
00:13:50,000 --> 00:13:53,000
and it will be able to execute and give you the response.

257
00:13:53,000 --> 00:13:57,000
But there are also some limitations with respect to other LLM models with respect to maximum number

258
00:13:57,000 --> 00:13:57,000
of tokens.

259
00:13:57,000 --> 00:14:03,000
Let's say, in our case, our speech was somewhere around 25 pages. At that point,

260
00:14:03,000 --> 00:14:06,000
my token count would have increased to more than 10,000.

261
00:14:06,000 --> 00:14:11,000
If I look at OpenAI GPT-3.5, there was a limit of 4096 tokens with respect to the

262
00:14:11,000 --> 00:14:12,000
input that we give.

263
00:14:12,000 --> 00:14:13,000
right?

264
00:14:13,000 --> 00:14:18,000
So it is necessary that whenever we give this many tokens, we also split the text.
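
The token-limit reasoning above can be sketched as a simple budget check followed by a split. This is only an illustration under loud assumptions: `count_tokens` approximates tokens by whitespace-separated words, which is not how a real model tokenizer counts (the notebook uses the model's own `get_num_tokens`), and `split_to_fit` is a hypothetical helper rather than a LangChain text splitter.

```python
from typing import List


def count_tokens(text: str) -> int:
    """Crude approximation: one token per whitespace-separated word."""
    return len(text.split())


def split_to_fit(text: str, max_tokens: int) -> List[str]:
    """Split text into consecutive chunks of at most max_tokens words each."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + max_tokens])
        for start in range(0, len(words), max_tokens)
    ]


# A made-up long input of 10,000 words, checked against a 4096-token limit.
long_text = "word " * 10_000
limit = 4096

if count_tokens(long_text) > limit:
    chunks = split_to_fit(long_text, limit)
else:
    chunks = [long_text]

print(len(chunks), [count_tokens(chunk) for chunk in chunks])
```

Each chunk now fits the assumed limit on its own, which is the precondition for the multi-document summarization techniques discussed later.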

265
00:14:18,000 --> 00:14:19,000
But till here it is.

266
00:14:19,000 --> 00:14:20,000
Fine.

267
00:14:20,000 --> 00:14:22,000
Let's run this entire thing now okay.

268
00:14:22,000 --> 00:14:24,000
Based on this complete prompt template.

269
00:14:24,000 --> 00:14:26,000
So I'll go ahead and write llm_chain.

270
00:14:26,000 --> 00:14:28,000
Now I'll go ahead and create my LLMChain.

271
00:14:28,000 --> 00:14:36,000
As I said, I have an LLM model, and along with this I have the prompt: prompt is equal to prompt.

272
00:14:36,000 --> 00:14:37,000
Okay.

273
00:14:37,000 --> 00:14:40,000
Only with these two will I be using an LLMChain.

274
00:14:40,000 --> 00:14:45,000
And this chain is responsible, based on the prompt, for how the LLM executes and provides the

275
00:14:45,000 --> 00:14:45,000
response.

276
00:14:45,000 --> 00:14:45,000
Okay.

277
00:14:45,000 --> 00:14:48,000
That is the information that you will be able to see over here.

278
00:14:48,000 --> 00:14:51,000
Then I will just go ahead and write my summary.

279
00:14:51,000 --> 00:14:55,000
And here I will say llm_chain.run.

280
00:14:55,000 --> 00:14:56,000
Okay.

281
00:14:56,000 --> 00:15:00,000
Now in order to run it what I will do is that quickly I will go ahead and write speech.

282
00:15:00,000 --> 00:15:07,000
The input that I need to give is my input variable speech, which, after the colon, will be equal

283
00:15:07,000 --> 00:15:08,000
to speech.

284
00:15:08,000 --> 00:15:14,000
And then I will also go ahead and write my language.

285
00:15:17,000 --> 00:15:20,000
Language and this language will be nothing.

286
00:15:20,000 --> 00:15:23,000
But let's say I will go ahead and write French okay.

287
00:15:23,000 --> 00:15:26,000
And then finally I'll go ahead and see my summary.
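
The idea of a chain as "prompt template plus model" can be sketched in a few lines of plain Python. Everything here is a hypothetical stand-in: `SimpleChain` is not LangChain's `LLMChain`, and `fake_llm` merely reports how many words it received instead of calling a real model such as one served through Groq.

```python
from typing import Callable, Dict


class SimpleChain:
    """Hypothetical stand-in for a chain: format the prompt, then call the model."""

    def __init__(self, llm: Callable[[str], str], template: str) -> None:
        self.llm = llm
        self.template = template

    def run(self, inputs: Dict[str, str]) -> str:
        # Step 1: fill the template placeholders from the input dictionary.
        prompt = self.template.format(**inputs)
        # Step 2: hand the finished prompt to the model and return its text.
        return self.llm(prompt)


def fake_llm(prompt: str) -> str:
    """Placeholder model: reports the prompt size instead of summarizing."""
    return f"[model saw {len(prompt.split())} words]"


template = (
    "Write a summary of the following speech:\n"
    "Speech: {speech}\n"
    "Translate the precise summary to {language}."
)

llm_chain = SimpleChain(llm=fake_llm, template=template)
summary = llm_chain.run({"speech": "Sample speech text.", "language": "French"})
print(summary)
```

Swapping `fake_llm` for a function that calls a real model would leave the chain itself unchanged, which is the design benefit of keeping the prompt and the model as separate pieces.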

288
00:15:26,000 --> 00:15:29,000
So this is going to take some amount of time I guess.

289
00:15:29,000 --> 00:15:29,000
Okay.

290
00:15:29,000 --> 00:15:30,000
Perfect.

291
00:15:30,000 --> 00:15:36,000
See, you can basically see the power of Groq, right.

292
00:15:36,000 --> 00:15:38,000
How fast it was.

293
00:15:38,000 --> 00:15:42,000
And here is your entire information in French.

294
00:15:42,000 --> 00:15:43,000
Obviously, I cannot understand it.

295
00:15:43,000 --> 00:15:46,000
So let me just go ahead and convert this into Hindi.

296
00:15:46,000 --> 00:15:48,000
I think Hindi I will be able to understand.

297
00:15:48,000 --> 00:15:52,000
So summary is that Vishay may uh, it is written.

298
00:15:52,000 --> 00:15:53,000
I'll just read it in front of you.

299
00:15:53,000 --> 00:15:59,000
I don't know how many of you know Hindi, but Vishay may look sarkar Rajnaitik or Samajik gati video

300
00:15:59,000 --> 00:16:00,000
may occur.

301
00:16:00,000 --> 00:16:06,000
So this is another way of doing a summary, where you can use LLMChain and you

302
00:16:06,000 --> 00:16:09,000
can design your entire prompt by using a generic prompt template.

303
00:16:09,000 --> 00:16:09,000
Okay.

304
00:16:09,000 --> 00:16:12,000
So all this information is specifically there.

305
00:16:12,000 --> 00:16:14,000
Now this is fine.

306
00:16:14,000 --> 00:16:21,000
See, here, what is happening is that you have just a limited amount of text, right?

307
00:16:21,000 --> 00:16:29,000
When you have huge PDFs, bigger PDFs, at that point we definitely have

308
00:16:29,000 --> 00:16:31,000
to use different summarization techniques.

309
00:16:32,000 --> 00:16:37,000
Here what we are doing, the entire document, whatever we are getting, entire speech, whatever we

310
00:16:37,000 --> 00:16:39,000
are getting, we are just giving it to the LLM model.

311
00:16:39,000 --> 00:16:46,000
And the LLM model is able to handle it because the token count is below the context size.

312
00:16:46,000 --> 00:16:50,000
That is the limitation with respect to any LLM model.

313
00:16:50,000 --> 00:16:53,000
Now come the main types that I am going to discuss.

314
00:16:54,000 --> 00:16:58,000
The first type is called the stuff document chain.

315
00:16:58,000 --> 00:16:59,000
Okay.

316
00:17:00,000 --> 00:17:04,000
And here we are just going to go ahead and write text summarization technique okay.

317
00:17:06,000 --> 00:17:09,000
Now let's uh do one thing.

318
00:17:09,000 --> 00:17:12,000
Let's discuss this entire thing in the next video.

319
00:17:12,000 --> 00:17:13,000
Okay.

320
00:17:13,000 --> 00:17:17,000
And uh, here, uh, one by one we will go ahead with this.

321
00:17:17,000 --> 00:17:18,000
There are three types.

322
00:17:18,000 --> 00:17:22,000
One is the stuff document chain that we use.

323
00:17:22,000 --> 00:17:24,000
And for this I will be using a bigger PDF.

324
00:17:24,000 --> 00:17:28,000
Compared to the previous one, I'll be using a PDF of at least 5 to 6 pages.

325
00:17:28,000 --> 00:17:29,000
Okay.

326
00:17:29,000 --> 00:17:34,000
And after completing this, that is, using the stuff document chain,

327
00:17:34,000 --> 00:17:37,000
We will also see other types like refine.

328
00:17:37,000 --> 00:17:40,000
There is something called a refine document chain.

329
00:17:40,000 --> 00:17:44,000
And let me also show you the documentation.

330
00:17:44,000 --> 00:17:44,000
Right.

331
00:17:44,000 --> 00:17:48,000
Let me show you the documentation over here.

332
00:17:48,000 --> 00:17:51,000
So here you will be able to see map reduce.

333
00:17:51,000 --> 00:17:51,000
MapReduce.

334
00:17:51,000 --> 00:17:55,000
So the stuff document chain simply concatenates the documents into a prompt.

335
00:17:55,000 --> 00:18:00,000
MapReduce splits the documents into batches, summarizes those, and then summarizes the summaries.

336
00:18:00,000 --> 00:18:05,000
Refine updates a rolling summary by iterating over the documents in sequence.
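
The three strategies just described can be compared side by side in a small plain-Python sketch. It is a simplified illustration, not LangChain's implementation: `toy_summarize` is a placeholder that keeps only the first five words where a real setup would call an LLM, the functions `stuff`, `map_reduce`, and `refine` are hypothetical names, and each document is treated as its own batch.

```python
from typing import Callable, List

Summarizer = Callable[[str], str]


def stuff(docs: List[str], summarize: Summarizer) -> str:
    """Stuff: concatenate every document into one prompt, summarize once."""
    return summarize("\n\n".join(docs))


def map_reduce(docs: List[str], summarize: Summarizer) -> str:
    """Map-reduce: summarize each document, then summarize the summaries."""
    partial_summaries = [summarize(doc) for doc in docs]
    return summarize("\n\n".join(partial_summaries))


def refine(docs: List[str], summarize: Summarizer) -> str:
    """Refine: keep a rolling summary and update it with each document in order."""
    summary = ""
    for doc in docs:
        text = doc if not summary else f"{summary}\n\n{doc}"
        summary = summarize(text)
    return summary


def toy_summarize(text: str) -> str:
    """Placeholder for an LLM call: keep only the first five words."""
    return " ".join(text.split()[:5])


docs = ["alpha beta gamma delta epsilon zeta", "eta theta iota kappa lambda mu"]
for strategy in (stuff, map_reduce, refine):
    print(strategy.__name__, "->", strategy(docs, toy_summarize))
```

In this sketch, stuff makes a single model call, map-reduce makes one call per document plus one final call, and refine makes one call per document, which is why the latter two scale to inputs that would not fit into one prompt.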

337
00:18:05,000 --> 00:18:11,000
So I will show you this, and I will also try to draw a diagram to help you understand.

338
00:18:11,000 --> 00:18:14,000
This is what I'm going to discuss in the next video.

339
00:18:14,000 --> 00:18:15,000
Thank you.

340
00:18:15,000 --> 00:18:16,000
I will see you all in the next video.