1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:05,000
So we are going to continue the discussion with respect to our LangChain series.

3
00:00:05,000 --> 00:00:11,000
Now, in this video, we are going to create an end-to-end Q&A chatbot using open source models.

4
00:00:11,000 --> 00:00:17,000
And specifically, we will be using Ollama. Now, already in our previous video, we have seen this end to

5
00:00:17,000 --> 00:00:20,000
end chatbot with the help of, uh, you know, OpenAI.

6
00:00:20,000 --> 00:00:26,000
Uh, and here, uh, the best thing was that we made it so loosely coupled that we could use any kind

7
00:00:26,000 --> 00:00:27,000
of models that we want.

8
00:00:27,000 --> 00:00:28,000
And.

9
00:00:28,000 --> 00:00:33,000
All right, so similarly, uh, what we are basically going to do is that we are going to go ahead and

10
00:00:33,000 --> 00:00:35,000
do it with Ollama itself.

11
00:00:35,000 --> 00:00:41,000
Now, in the case of Ollama, you definitely require a different set of libraries.

12
00:00:41,000 --> 00:00:43,000
Again, I'll be talking about that.

13
00:00:43,000 --> 00:00:50,000
And instead of using ChatOpenAI, now you will be having another library, which is called Ollama

14
00:00:50,000 --> 00:00:50,000
itself.

15
00:00:50,000 --> 00:00:51,000
Right.

16
00:00:51,000 --> 00:00:53,000
So let us go ahead and let us start this.

17
00:00:53,000 --> 00:00:55,000
So first of all, I will go ahead and create my app.py.

18
00:00:57,000 --> 00:01:04,000
Now quickly I will go ahead and import from langchain_core.prompts.

19
00:01:04,000 --> 00:01:09,000
I'm going to import ChatPromptTemplate, okay.

20
00:01:09,000 --> 00:01:12,000
So let me quickly go ahead and write ChatPromptTemplate.

21
00:01:12,000 --> 00:01:20,000
Now after this I'm going to go ahead and write from langchain_core.output_parsers.

22
00:01:20,000 --> 00:01:21,000
Okay.

23
00:01:21,000 --> 00:01:23,000
Output parser.

24
00:01:23,000 --> 00:01:27,000
We are just going to go ahead and import our StrOutputParser.

25
00:01:27,000 --> 00:01:27,000
Right.

26
00:01:27,000 --> 00:01:30,000
This is also required over here.

27
00:01:30,000 --> 00:01:33,000
Then, since we really want to go ahead and use Ollama,

28
00:01:33,000 --> 00:01:40,000
Uh, I'm just going to use this: from langchain_community.llms import Ollama.

29
00:01:40,000 --> 00:01:41,000
Okay.

30
00:01:41,000 --> 00:01:46,000
Along with this, as you all know, uh, since I will be specifically using.

31
00:01:46,000 --> 00:01:48,000
So it should be Ollama over here.

32
00:01:48,000 --> 00:01:53,000
As you all know, I'll be using Streamlit, so let me just go ahead and write import streamlit as st.

33
00:01:54,000 --> 00:01:58,000
And here, uh, along with this, I'll also go ahead and import os.

34
00:01:58,000 --> 00:01:58,000
Okay.

35
00:01:59,000 --> 00:02:04,000
Now quickly, uh, what we are basically going to do is that again, I will just go ahead and open main.py,

36
00:02:04,000 --> 00:02:08,000
the initial setup that we really want, right.

37
00:02:08,000 --> 00:02:13,000
This is all specific information so that we'll be able to trace it in our LangSmith. Here,

38
00:02:13,000 --> 00:02:19,000
I'll just say, instead of writing "with OpenAI", I will go ahead and write "with Ollama".

39
00:02:19,000 --> 00:02:19,000
Okay.

40
00:02:19,000 --> 00:02:23,000
So these are the initial things that we will be specifically requiring.

41
00:02:23,000 --> 00:02:24,000
Okay.

42
00:02:24,000 --> 00:02:29,000
Now quickly we are going to go ahead and create our prompt template.

43
00:02:29,000 --> 00:02:33,000
Now, in order to create a prompt template, I have already imported ChatPromptTemplate.

44
00:02:33,000 --> 00:02:39,000
So let's go ahead and define our prompt template in a similar way to how we have created

45
00:02:39,000 --> 00:02:41,000
the specific chatbot.

46
00:02:41,000 --> 00:02:41,000
Right.

47
00:02:41,000 --> 00:02:47,000
So here I'm just going to go ahead and write prompt = ChatPromptTemplate

48
00:02:47,000 --> 00:02:48,000
.from_messages.

49
00:02:49,000 --> 00:02:52,000
And here we are going to use again two things.

50
00:02:52,000 --> 00:02:58,000
As I said, uh, one is with respect to system and one is with respect to user.

51
00:02:58,000 --> 00:02:58,000
Okay.

52
00:02:58,000 --> 00:03:01,000
I'm writing, hey, you are a helpful assistant.

53
00:03:01,000 --> 00:03:02,000
Please respond to the user queries.

54
00:03:02,000 --> 00:03:05,000
Then you have this question, whatever question we are specifically giving.

55
00:03:05,000 --> 00:03:06,000
Okay.

56
00:03:06,000 --> 00:03:12,000
Now, uh, other than this, you will be able to see that, uh, we have also defined this generate

57
00:03:12,000 --> 00:03:12,000
response.

58
00:03:12,000 --> 00:03:13,000
Right.

59
00:03:13,000 --> 00:03:18,000
So let me copy this and let me paste it over here and I'll show you what all changes will specifically

60
00:03:18,000 --> 00:03:18,000
happen.

61
00:03:18,000 --> 00:03:24,000
Instead of writing OpenAI, you know, what exactly will we be requiring?

62
00:03:24,000 --> 00:03:27,000
I don't require any API key because this is completely open source.

63
00:03:27,000 --> 00:03:29,000
Instead of writing ChatOpenAI here.

64
00:03:29,000 --> 00:03:32,000
Now I'm going to define my Ollama model.

65
00:03:32,000 --> 00:03:32,000
Right.

66
00:03:32,000 --> 00:03:38,000
And with respect to this Ollama model, you will be able to see which models I have.

67
00:03:38,000 --> 00:03:38,000
Right.

68
00:03:38,000 --> 00:03:43,000
So in order to check it out, I will just go ahead and open the command prompt.

69
00:03:43,000 --> 00:03:43,000
Okay.

70
00:03:43,000 --> 00:03:48,000
Now, with respect to this command prompt, I will just say, hey, go ahead and write ollama run.

71
00:03:49,000 --> 00:03:51,000
Um, let's say gemma2.

72
00:03:51,000 --> 00:03:53,000
So this is one of the models that I have.

73
00:03:54,000 --> 00:03:56,000
Uh, I think it is also getting installed.

74
00:03:56,000 --> 00:03:57,000
Okay.

75
00:03:57,000 --> 00:03:58,000
So this is getting installed.

76
00:03:58,000 --> 00:03:58,000
It's okay.

77
00:03:58,000 --> 00:04:01,000
Till then I will just open my command prompt again.

78
00:04:01,000 --> 00:04:02,000
Okay.

79
00:04:02,000 --> 00:04:04,000
And I will see.

80
00:04:04,000 --> 00:04:06,000
ollama, uh, let's say the Gemma model.

81
00:04:06,000 --> 00:04:08,000
So the Gemma model is also not there.

82
00:04:08,000 --> 00:04:13,000
Let's see, uh, which models I specifically have, you know, and then we'll try to run it.

83
00:04:13,000 --> 00:04:16,000
So for that I will go to ollama.com.

84
00:04:16,000 --> 00:04:16,000
Okay.

85
00:04:16,000 --> 00:04:18,000
Which models Ollama actually provides.

86
00:04:18,000 --> 00:04:20,000
We also need to have a look at that.

87
00:04:20,000 --> 00:04:25,000
And, uh, we will be seeing which models are provided by Ollama itself.

88
00:04:25,000 --> 00:04:28,000
So here I will just go ahead and click on models.

89
00:04:28,000 --> 00:04:28,000
Quickly.

90
00:04:28,000 --> 00:04:30,000
Go ahead and see the library.

91
00:04:30,000 --> 00:04:33,000
So many different models are there, like Gemma 2 and all.

92
00:04:33,000 --> 00:04:37,000
Gemma 2 is the recent open source model that has been brought out by, uh, Google itself.

93
00:04:37,000 --> 00:04:37,000
Right.

94
00:04:37,000 --> 00:04:39,000
So you can also go ahead and use that.

95
00:04:39,000 --> 00:04:40,000
Right.

96
00:04:40,000 --> 00:04:45,000
Uh, along with that you have Llama 3, you have Qwen2, DeepSeek Coder, Phi-3.

97
00:04:45,000 --> 00:04:46,000
So many different models are there.

98
00:04:46,000 --> 00:04:46,000
Okay.

99
00:04:46,000 --> 00:04:49,000
If I go ahead and check with respect to the command prompt too.

100
00:04:49,000 --> 00:04:50,000
Right.

101
00:04:50,000 --> 00:04:55,000
So this is over here I will just go ahead and hide this okay.

102
00:04:55,000 --> 00:04:59,000
What I'm actually going to do is that I'll just go ahead and write ollama phi3.

103
00:04:59,000 --> 00:05:01,000
I think this should be there.

104
00:05:01,000 --> 00:05:02,000
Uh, sorry.

105
00:05:02,000 --> 00:05:03,000
ollama run

106
00:05:03,000 --> 00:05:03,000
Phi

107
00:05:03,000 --> 00:05:03,000
Three.

108
00:05:04,000 --> 00:05:04,000
Okay.

109
00:05:04,000 --> 00:05:05,000
It should be.

110
00:05:05,000 --> 00:05:06,000
The command should be run Phi

111
00:05:06,000 --> 00:05:07,000
Three.

112
00:05:07,000 --> 00:05:07,000
Again.

113
00:05:07,000 --> 00:05:09,000
This is also getting installed.

114
00:05:09,000 --> 00:05:17,000
Okay, so here, whichever models you really want to go ahead and, you know, use with

115
00:05:17,000 --> 00:05:18,000
respect to Ollama.

116
00:05:18,000 --> 00:05:21,000
First of all, you need to download all these models.

117
00:05:21,000 --> 00:05:21,000
Right.

118
00:05:21,000 --> 00:05:23,000
And that is the reason here.

119
00:05:23,000 --> 00:05:26,000
You'll be able to see that as soon as I go ahead and write phi3.

120
00:05:26,000 --> 00:05:27,000
It is getting downloaded.

121
00:05:27,000 --> 00:05:30,000
If I go ahead and write gemma2, it is getting downloaded.

122
00:05:30,000 --> 00:05:30,000
Okay.

123
00:05:30,000 --> 00:05:37,000
So, uh, this is the first important thing that you really need to see with respect to the models.

124
00:05:37,000 --> 00:05:37,000
Okay.

125
00:05:37,000 --> 00:05:40,000
So let me just reload it quickly.

126
00:05:40,000 --> 00:05:42,000
So here you have this.

127
00:05:42,000 --> 00:05:49,000
Let's say I want to go ahead and run anything, like, uh, ollama gemma2 is here, then

128
00:05:49,000 --> 00:05:50,000
ollama run

129
00:05:50,000 --> 00:05:50,000
llava is here.

130
00:05:51,000 --> 00:05:51,000
ollama run

131
00:05:51,000 --> 00:05:52,000
solar is there.

132
00:05:52,000 --> 00:05:54,000
Let me just go ahead and take some Mistral.

133
00:05:54,000 --> 00:05:59,000
Also, I don't know whether it is installed on my local machine or not, but I will just go ahead and check

134
00:05:59,000 --> 00:05:59,000
it out.

135
00:06:01,000 --> 00:06:03,000
So yes, Mistral has been installed.

136
00:06:03,000 --> 00:06:09,000
Uh, so quickly we'll go ahead and try to access this model and we'll try to see whether it is

137
00:06:09,000 --> 00:06:10,000
working fine or not.

138
00:06:10,000 --> 00:06:15,000
So if I go ahead and write "Hi" here, you can see I'm getting all the messages over here.

139
00:06:15,000 --> 00:06:15,000
Perfect.

140
00:06:15,000 --> 00:06:17,000
This is great.

141
00:06:17,000 --> 00:06:21,000
So let's go ahead and use one of the models, which is called Llama, over here.

142
00:06:21,000 --> 00:06:25,000
And along with that you also have the Gemma 2 model.

143
00:06:25,000 --> 00:06:27,000
This is the recent model that has been released.

144
00:06:27,000 --> 00:06:30,000
Uh, the previous version of the model will get automatically removed.

145
00:06:30,000 --> 00:06:33,000
Like, you also don't have Llama 2 right now.

146
00:06:33,000 --> 00:06:34,000
Llama 3 is also there, right?

147
00:06:34,000 --> 00:06:37,000
All of these are open source models, which is super amazing.

148
00:06:37,000 --> 00:06:38,000
And you can also use this.

149
00:06:38,000 --> 00:06:39,000
Okay.

150
00:06:39,000 --> 00:06:41,000
Now you know that I have this Mistral.

151
00:06:41,000 --> 00:06:43,000
So let me quickly.

152
00:06:43,000 --> 00:06:47,000
And for installing any models that you want to go ahead and install,

153
00:06:47,000 --> 00:06:48,000
Just go ahead.

154
00:06:48,000 --> 00:06:52,000
And any LLM model you want to install on your local machine, you just need to go ahead and write

155
00:06:52,000 --> 00:06:53,000
this specific command, okay.

156
00:06:53,000 --> 00:06:57,000
And always make sure that you keep on looking at this, because this will keep on getting updated,

157
00:06:57,000 --> 00:06:58,000
okay.

158
00:06:58,000 --> 00:07:05,000
So here I'm writing model is equal to whatever model name I'm specifically giving with respect

159
00:07:05,000 --> 00:07:06,000
to this engine.

160
00:07:06,000 --> 00:07:07,000
It can be Llama 2.

161
00:07:07,000 --> 00:07:07,000
Llama 3.

162
00:07:07,000 --> 00:07:08,000
It is up to you.

163
00:07:08,000 --> 00:07:12,000
Okay, then we are creating this entire chain: prompt, output parser.

164
00:07:12,000 --> 00:07:14,000
We are invoking it based on the input text.

165
00:07:14,000 --> 00:07:15,000
Here we are going to get this.

166
00:07:15,000 --> 00:07:18,000
So this is only the change that we have actually done over here.

167
00:07:18,000 --> 00:07:24,000
And one more thing, guys, uh, that you really need to remember is that whenever we are using

168
00:07:24,000 --> 00:07:27,000
Ollama, we are specifically only using open source models.

169
00:07:27,000 --> 00:07:27,000
Okay.

170
00:07:28,000 --> 00:07:33,000
Now once this is done, it is time that we can go ahead and use the same functionalities, right?

171
00:07:33,000 --> 00:07:36,000
Whatever functionalities we specifically require over here.

172
00:07:36,000 --> 00:07:43,000
But as you all know, uh, in Llama 3, in all the open source models, we don't have to even,

173
00:07:43,000 --> 00:07:45,000
uh, you know, play with all the parameters out there.

174
00:07:45,000 --> 00:07:46,000
Right?

175
00:07:46,000 --> 00:07:51,000
So what I can actually do is that I will just go ahead and copy this entire thing, okay?

176
00:07:52,000 --> 00:07:54,000
I don't require these settings.

177
00:07:54,000 --> 00:07:56,000
Also, right, anything as such.

178
00:07:56,000 --> 00:07:57,000
I'll copy this over here.

179
00:07:57,000 --> 00:07:58,000
I'll paste it over here.

180
00:08:01,000 --> 00:08:01,000
Okay.

181
00:08:02,000 --> 00:08:05,000
Uh, I'll say, hey, if user input is given, okay.

182
00:08:05,000 --> 00:08:06,000
Just see.

183
00:08:06,000 --> 00:08:06,000
Okay.

184
00:08:06,000 --> 00:08:14,000
What I will be doing here: I just need to pass my user input and I need to pass my engine.

185
00:08:14,000 --> 00:08:14,000
Right.

186
00:08:14,000 --> 00:08:16,000
Engine, meaning which model we are selecting.

187
00:08:16,000 --> 00:08:20,000
And over here also we'll remove this API key because I don't require any API key.

188
00:08:20,000 --> 00:08:21,000
Right.

189
00:08:21,000 --> 00:08:23,000
So temperature is there, max tokens is there.

190
00:08:23,000 --> 00:08:26,000
Everything is going on with respect to the OpenAI model.

191
00:08:26,000 --> 00:08:28,000
Let's say that right now I have Mistral.

192
00:08:28,000 --> 00:08:30,000
So I will go ahead and write Mistral over here.

193
00:08:31,000 --> 00:08:33,000
Other models are basically getting downloaded.

194
00:08:33,000 --> 00:08:35,000
So I'll remove all these things later on.

195
00:08:35,000 --> 00:08:37,000
You can put any number of models that you want okay.

196
00:08:37,000 --> 00:08:40,000
This is the basic change.

197
00:08:40,000 --> 00:08:43,000
You know, that is what is specifically done, right?

198
00:08:44,000 --> 00:08:49,000
Uh, let's see whether this open source model also supports this temperature and max tokens or not.

199
00:08:49,000 --> 00:08:49,000
Okay.

200
00:08:49,000 --> 00:08:52,000
I'll be giving it over here, but I don't know whether it will be there or not.

201
00:08:52,000 --> 00:08:52,000
Okay.

202
00:08:53,000 --> 00:08:59,000
So, uh, this else if condition can be removed because I don't require it.

203
00:08:59,000 --> 00:09:00,000
That's it.

204
00:09:00,000 --> 00:09:07,000
See how simple it was, just replicating what we did with OpenAI with Ollama itself, right?

205
00:09:07,000 --> 00:09:10,000
So let's quickly go ahead and run this.

206
00:09:10,000 --> 00:09:15,000
So here I will just go ahead and write streamlit run app.py.

207
00:09:15,000 --> 00:09:15,000
Okay.

208
00:09:18,000 --> 00:09:23,000
So here you can see, uh, let me just go ahead and change this particular message.

209
00:09:23,000 --> 00:09:24,000
Okay.

210
00:09:24,000 --> 00:09:27,000
I had to run app.py.

211
00:09:27,000 --> 00:09:28,000
Let's see.

212
00:09:28,000 --> 00:09:30,000
So I will say cd ..

213
00:09:30,000 --> 00:09:34,000
I have to go to my third folder, the Ollama chatbot.

214
00:09:34,000 --> 00:09:37,000
Now I'll go ahead and write streamlit run app.py.

215
00:09:39,000 --> 00:09:41,000
So this has got executed now.

216
00:09:41,000 --> 00:09:42,000
Perfect.

217
00:09:42,000 --> 00:09:43,000
Mistral is there.

218
00:09:43,000 --> 00:09:44,000
Let me just go ahead and select.

219
00:09:44,000 --> 00:09:46,000
"Hi." I think this is an input.

220
00:09:46,000 --> 00:09:46,000
Also.

221
00:09:46,000 --> 00:09:48,000
It will be taking temperature and all.

222
00:09:48,000 --> 00:09:49,000
"How can I assist you today?"

223
00:09:49,000 --> 00:09:50,000
See how fast it is.

224
00:09:51,000 --> 00:09:53,000
And then I will say hello.

225
00:09:53,000 --> 00:09:59,000
Please talk about generative AI, right?

226
00:10:00,000 --> 00:10:05,000
Once I execute it, you'll be able to see that it will also give you the response.

227
00:10:05,000 --> 00:10:09,000
And similarly, uh, if you go towards LangSmith, okay.

228
00:10:09,000 --> 00:10:12,000
So let me just see here is my output.

229
00:10:12,000 --> 00:10:16,000
And now if I go to my LangSmith, okay.

230
00:10:16,000 --> 00:10:21,000
So if I just go to my LangSmith, let's say over here, I'm going to do the sign-up.

231
00:10:21,000 --> 00:10:24,000
Or you can, once you do the sign in, I think every time it will show you the sign up.

232
00:10:24,000 --> 00:10:27,000
But at the end of the day, you'll also be able to go inside this.

233
00:10:27,000 --> 00:10:28,000
Okay.

234
00:10:28,000 --> 00:10:31,000
Now here you'll be able to see that this entire platform will get loaded.

235
00:10:31,000 --> 00:10:31,000
Okay?

236
00:10:32,000 --> 00:10:38,000
So whatever conversation this time you are having, you're having it with respect to your open source

237
00:10:38,000 --> 00:10:41,000
LLM models, which are available on your local machine.

238
00:10:41,000 --> 00:10:42,000
Right.

239
00:10:42,000 --> 00:10:44,000
So if I just go ahead and click on projects.

240
00:10:44,000 --> 00:10:47,000
So here you can see Simple Q&A Chatbot with Ollama.

241
00:10:47,000 --> 00:10:49,000
Uh 2.64 seconds.

242
00:10:49,000 --> 00:10:49,000
Hi.

243
00:10:49,000 --> 00:10:51,000
You're basically getting this input.

244
00:10:52,000 --> 00:10:54,000
What was the token size, how much time the first token took,

245
00:10:54,000 --> 00:10:56,000
how many tokens; the cost is zero.

246
00:10:56,000 --> 00:10:59,000
You can go ahead and just keep on exploring this, right.

247
00:10:59,000 --> 00:11:03,000
But altogether, we specifically designed a chat prompt template.

248
00:11:03,000 --> 00:11:05,000
Then we used the LLM model.

249
00:11:05,000 --> 00:11:11,000
And finally we, uh, display the string output parser over here.

250
00:11:11,000 --> 00:11:11,000
Right.

251
00:11:11,000 --> 00:11:15,000
So everything is probably visible with respect to this.

252
00:11:15,000 --> 00:11:15,000
Right.

253
00:11:15,000 --> 00:11:19,000
So yeah, uh, I guess you are able to understand this.

254
00:11:19,000 --> 00:11:25,000
See, the entire output is also basically getting displayed, which is good enough right now.

255
00:11:25,000 --> 00:11:31,000
Uh, let's see how much time this chat is probably taking. Now see, once Gemma 2 and

256
00:11:31,000 --> 00:11:31,000
all will get added.

257
00:11:31,000 --> 00:11:39,000
What I will do is that I will also give an option over here so that you can include Gemma 2 or 3, any number

258
00:11:39,000 --> 00:11:39,000
of models over here.

259
00:11:39,000 --> 00:11:46,000
You can select those models and you can start writing your code or start using these LLM models for any

260
00:11:46,000 --> 00:11:47,000
kind of response.

261
00:11:47,000 --> 00:11:47,000
Okay.

262
00:11:48,000 --> 00:11:53,000
So I hope you are able to understand this, how to create an end-to-end Q&A project with the help of

263
00:11:53,000 --> 00:11:54,000
Ollama.

264
00:11:54,000 --> 00:11:55,000
So yes, this was it from my side.

265
00:11:55,000 --> 00:11:57,000
I will see you all in the next video.

266
00:11:57,000 --> 00:11:57,000
Thank you.

