1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:03,000
So we are going to continue our discussion of LangChain.

3
00:00:03,000 --> 00:00:09,000
In this video we are going to create a very simple application, an LLM application or generative

4
00:00:09,000 --> 00:00:11,000
AI application, using LangChain.

5
00:00:11,000 --> 00:00:16,000
We will also see which tools we specifically need.

6
00:00:16,000 --> 00:00:21,000
So the application we are going to build works like this: let's

6
00:00:21,000 --> 00:00:25,000
say I have a specific website, and that website has some content.

8
00:00:25,000 --> 00:00:29,000
Let's assume that content is text, some information on the page.

9
00:00:29,000 --> 00:00:33,000
What we are going to do is extract all of that information.

10
00:00:33,000 --> 00:00:37,000
For that, we need a data ingestion technique.

11
00:00:37,000 --> 00:00:39,000
As we will see, LangChain

12
00:00:39,000 --> 00:00:43,000
provides multiple data ingestion techniques, such as WebBaseLoader.

13
00:00:43,000 --> 00:00:48,000
You have PyPDFDirectoryLoader, and you can also load a single PDF directly.

14
00:00:48,000 --> 00:00:52,000
There are a lot of loaders available in LangChain, and we will look at them one by

15
00:00:52,000 --> 00:00:52,000
one.

16
00:00:52,000 --> 00:00:57,000
But in this video, I will take a website, and from that particular

17
00:00:57,000 --> 00:01:02,000
website we will scrape the entire content. After scraping that

18
00:01:02,000 --> 00:01:02,000
content,

19
00:01:02,000 --> 00:01:08,000
we are going to divide the entire content, or rather the entire text,

20
00:01:08,000 --> 00:01:14,000
into chunks, and then convert those chunks into vectors using a vector embedding technique.

21
00:01:14,000 --> 00:01:21,000
After that, we will use an LLM along with prompt engineering to get answers

22
00:01:21,000 --> 00:01:22,000
from that particular page.

23
00:01:22,000 --> 00:01:26,000
So there are a lot of things we are going to discuss.

24
00:01:26,000 --> 00:01:29,000
So let me go ahead and share my screen.

25
00:01:29,000 --> 00:01:32,000
Here you can see I have this LangSmith page open.

26
00:01:32,000 --> 00:01:36,000
I'll scroll down here, and there is some documentation.

27
00:01:36,000 --> 00:01:36,000
Okay.

28
00:01:36,000 --> 00:01:41,000
We will try to read one of these documents and see whether we are

29
00:01:41,000 --> 00:01:43,000
able to scrape this entire thing.

30
00:01:43,000 --> 00:01:45,000
So let's say this is my tutorial over here.

31
00:01:45,000 --> 00:01:49,000
Let's pick a page that has some information on it.

32
00:01:49,000 --> 00:01:52,000
So here let's consider this is my page.

33
00:01:52,000 --> 00:01:54,000
So I'm going to take this entire page.

34
00:01:54,000 --> 00:01:57,000
And I'm going to read the content from this particular page.

35
00:01:57,000 --> 00:02:03,000
Then, with the help of an LLM and prompt engineering, if I ask my GenAI app any question, it

36
00:02:03,000 --> 00:02:04,000
should be able to give the answers.

37
00:02:04,000 --> 00:02:06,000
Okay, so that is what I am planning to do.

38
00:02:06,000 --> 00:02:08,000
So let's go ahead.

39
00:02:08,000 --> 00:02:11,000
Here you can see that in getting started we have already learned about all these things.

40
00:02:11,000 --> 00:02:14,000
Now I'll go ahead and create my next file.

41
00:02:14,000 --> 00:02:17,000
And we'll be doing the coding over here.

42
00:02:17,000 --> 00:02:19,000
So let's go ahead and select the kernel.

43
00:02:19,000 --> 00:02:30,000
As I said, we are going to build a simple GenAI app using LangChain, okay.

44
00:02:30,000 --> 00:02:33,000
Here, again, we are going to use OpenAI initially.

45
00:02:33,000 --> 00:02:37,000
Later we'll also try some open-source libraries, okay.

46
00:02:37,000 --> 00:02:38,000
We will see those as well, okay.

47
00:02:38,000 --> 00:02:40,000
So let's go ahead and execute this.

48
00:02:40,000 --> 00:02:43,000
Here I'm going to create some cells.

49
00:02:43,000 --> 00:02:49,000
Now, the first thing, which we already did in getting started, and which is the most important

50
00:02:49,000 --> 00:02:52,000
thing, is loading all the keys.

51
00:02:52,000 --> 00:02:52,000
Right.

52
00:02:52,000 --> 00:02:55,000
So I will go ahead and load all the required keys.

53
00:02:55,000 --> 00:02:58,000
So here you can see: from dotenv import load_dotenv.

54
00:02:58,000 --> 00:03:03,000
Then I need my OpenAI API key, along with the LangChain API key and the LangChain project.

55
00:03:03,000 --> 00:03:05,000
This setup will be common to every example.
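
In the notebook this step is `from dotenv import load_dotenv` followed by `load_dotenv()`, which reads a `.env` file into environment variables. To show roughly what that does, here is a minimal stdlib sketch that parses `KEY=VALUE` lines into `os.environ`. The helper `load_env_text` and the placeholder values are hypothetical; the real python-dotenv also handles `export` prefixes, escapes, variable expansion, and more.

```python
import os


def load_env_text(text: str, override: bool = False) -> dict:
    """Parse simple KEY=VALUE lines into os.environ (hypothetical helper, not python-dotenv)."""
    loaded = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an '='.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        key = key.strip()
        value = value.strip().strip('"').strip("'")
        # Like load_dotenv's default, do not clobber variables that are already set.
        if override or key not in os.environ:
            os.environ[key] = value
        loaded[key] = value
    return loaded


# Placeholder values only; never commit real keys.
example = """
OPENAI_API_KEY="sk-placeholder"
LANGCHAIN_API_KEY=ls-placeholder
LANGCHAIN_PROJECT=simple-genai-app
"""
loaded = load_env_text(example)
```

The variable names mirror the three keys mentioned above; in practice they live in a `.env` file that stays out of version control.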

56
00:03:05,000 --> 00:03:12,000
Now, as I already said, if I want to retrieve

57
00:03:12,000 --> 00:03:17,000
the information from that particular website, I first have to read the entire content

58
00:03:17,000 --> 00:03:18,000
from the website.

59
00:03:18,000 --> 00:03:18,000
Right?

60
00:03:18,000 --> 00:03:20,000
That is what I really have to do.

61
00:03:20,000 --> 00:03:26,000
So here, the first thing we will do is bring in

62
00:03:26,000 --> 00:03:30,000
one important library, called Beautiful Soup.

63
00:03:30,000 --> 00:03:30,000
Okay.

64
00:03:30,000 --> 00:03:33,000
Beautiful Soup 4 (the beautifulsoup4 package), okay.

65
00:03:33,000 --> 00:03:41,000
Why this library? Because it will help us scrape the entire website's data.

66
00:03:41,000 --> 00:03:41,000
Okay.

67
00:03:41,000 --> 00:03:44,000
And as you know, right now we are going to use this particular website's data.

68
00:03:44,000 --> 00:03:50,000
So we'll take this URL, and in order to scrape this entire text and retrieve it on our side,

69
00:03:50,000 --> 00:03:54,000
we will use the BeautifulSoup4 library.

70
00:03:54,000 --> 00:03:54,000
Okay.

71
00:03:54,000 --> 00:03:58,000
Now, first of all, let's go ahead and install it.

72
00:03:58,000 --> 00:04:01,000
I've already written it in requirements.txt.

73
00:04:02,000 --> 00:04:03,000
I will go and open my terminal.

74
00:04:03,000 --> 00:04:09,000
And here I'm going to run pip install -r requirements.txt.

75
00:04:09,000 --> 00:04:14,000
Once I run the installation here, you'll see that it takes place in

76
00:04:14,000 --> 00:04:16,000
the same venv environment, okay.

77
00:04:17,000 --> 00:04:19,000
Now once this is done I will close this.

78
00:04:19,000 --> 00:04:23,000
Let's quickly use one data ingestion technique.

79
00:04:23,000 --> 00:04:26,000
Now, first of all, we'll focus on data ingestion.

80
00:04:26,000 --> 00:04:30,000
Data ingestion here basically means that from the website, right from the website,

81
00:04:30,000 --> 00:04:33,000
we need to scrape the data.

82
00:04:33,000 --> 00:04:34,000
Right.

83
00:04:34,000 --> 00:04:36,000
Scrape the data, right?

84
00:04:36,000 --> 00:04:38,000
Scrape, not "scrap": we extract the page's content.

85
00:04:38,000 --> 00:04:38,000
Okay.

86
00:04:38,000 --> 00:04:40,000
So that is what we really want to do.

87
00:04:40,000 --> 00:04:41,000
Okay.

88
00:04:41,000 --> 00:04:47,000
Now, LangChain provides an amazing module for this, which is available

89
00:04:47,000 --> 00:04:48,000
in langchain_community.

90
00:04:48,000 --> 00:04:50,000
So I will go ahead and import from

91
00:04:51,000 --> 00:04:56,000
langchain_community.

92
00:04:56,000 --> 00:04:56,000
Okay.

93
00:04:56,000 --> 00:05:02,000
For this, we need the langchain_community

94
00:05:02,000 --> 00:05:03,000
library.

95
00:05:03,000 --> 00:05:06,000
So let me add it over here as well.

96
00:05:06,000 --> 00:05:09,000
So here you have langchain_community.

97
00:05:09,000 --> 00:05:12,000
Let me open my terminal and let me quickly install this.

98
00:05:12,000 --> 00:05:14,000
I will be needing it, okay.

99
00:05:16,000 --> 00:05:20,000
The langchain_community package has a lot of document loaders, right?

100
00:05:20,000 --> 00:05:25,000
Document loaders let you load data from many different sources.

101
00:05:25,000 --> 00:05:30,000
That is why I am installing this particular library, which is called

102
00:05:30,000 --> 00:05:31,000
langchain_community.

103
00:05:31,000 --> 00:05:33,000
We'll be discussing more about it as we go ahead.

104
00:05:33,000 --> 00:05:36,000
Okay, so now the installation has taken place.

105
00:05:36,000 --> 00:05:39,000
Now I'm going to go back to my code.

106
00:05:39,000 --> 00:05:45,000
So here, let's look at langchain_community, okay.

107
00:05:45,000 --> 00:05:51,000
Here we are going to use document_loaders.

108
00:05:51,000 --> 00:05:52,000
Okay.

109
00:05:52,000 --> 00:05:57,000
And from there I'm going to import WebBaseLoader.

110
00:05:57,000 --> 00:05:58,000
WebBaseLoader is simply this:

111
00:05:58,000 --> 00:06:03,000
you give it a website link, and it automatically loads the page.

112
00:06:03,000 --> 00:06:04,000
I guess I'll get an error.

113
00:06:04,000 --> 00:06:07,000
So here you can see, okay.

114
00:06:07,000 --> 00:06:09,000
Let's see whether I'll get an error.

115
00:06:09,000 --> 00:06:11,000
So: USER_AGENT environment variable not set.

116
00:06:11,000 --> 00:06:15,000
Consider setting it to identify your requests. Okay.

117
00:06:15,000 --> 00:06:18,000
So right now this is a warning rather than a real error.

118
00:06:18,000 --> 00:06:23,000
USER_AGENT environment variable not set.

119
00:06:23,000 --> 00:06:26,000
So I have selected the virtual environment over here.

120
00:06:27,000 --> 00:06:29,000
Let's see what the issue is here, okay.

121
00:06:29,000 --> 00:06:31,000
So this has got executed now.
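
The `USER_AGENT environment variable not set` message is only a warning: the loader still works, but its requests go out with a generic identity. A common fix is to set `USER_AGENT` before importing the loader, since (in the versions I'm aware of) the loader module reads it when it is imported. The stdlib sketch below sets the variable and, to show what the value is used for, builds but does not send a `urllib` request carrying it. The agent string and the helper name `build_request` are made-up examples.

```python
import os
import urllib.request

# Set this before importing WebBaseLoader; setdefault keeps any value already present.
os.environ.setdefault("USER_AGENT", "simple-genai-app/0.1 (learning demo)")


def build_request(url: str) -> urllib.request.Request:
    """Build (without sending) a GET request that identifies the client via User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": os.environ["USER_AGENT"]})


req = build_request("https://docs.smith.langchain.com/")
```

Note that `urllib` normalises the header name, so it is stored and looked up as `User-agent`.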

122
00:06:31,000 --> 00:06:36,000
Now I'll go ahead and create a loader and say, hey,

123
00:06:36,000 --> 00:06:39,000
use this WebBaseLoader, okay.

124
00:06:40,000 --> 00:06:41,000
WebBaseLoader.

125
00:06:41,000 --> 00:06:43,000
And inside it I will pass my link.

126
00:06:43,000 --> 00:06:44,000
Right.

127
00:06:44,000 --> 00:06:45,000
So what is the link?

128
00:06:45,000 --> 00:06:47,000
Let's go ahead and see which link it is.

129
00:06:47,000 --> 00:06:50,000
So I will quickly go to the page.

130
00:06:50,000 --> 00:06:55,000
I will give this entire website link, because this is where I need to scrape my

131
00:06:55,000 --> 00:06:56,000
data from, right.

132
00:06:56,000 --> 00:06:58,000
So I will just go ahead and give my link over here.

133
00:06:58,000 --> 00:06:59,000
Okay.

134
00:06:59,000 --> 00:07:03,000
So my link will be the docs.smith.langchain.com tutorial page

135
00:07:04,000 --> 00:07:05,000
on managing spend.

136
00:07:05,000 --> 00:07:11,000
Okay, now after I create this loader, you can go ahead and just print it.

137
00:07:11,000 --> 00:07:14,000
Here you can see it is a document loader: a WebBaseLoader.

138
00:07:14,000 --> 00:07:17,000
Now, WebBaseLoader is a document loader class.

139
00:07:17,000 --> 00:07:23,000
Whatever website link you give it, it will be able to

140
00:07:23,000 --> 00:07:27,000
scrape the entire content from that particular website.

141
00:07:28,000 --> 00:07:34,000
Now, the prerequisite is that you need the Beautiful Soup library installed, because internally

142
00:07:34,000 --> 00:07:37,000
WebBaseLoader uses BeautifulSoup4.
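
WebBaseLoader fetches the page, uses BeautifulSoup to strip the markup, and returns a list of documents that each carry `page_content` and `metadata`. To make that step concrete without network access or extra packages, here is a stdlib stand-in that extracts visible text from an HTML string with `html.parser` and wraps it in a small Document-like dataclass. `SimpleDocument` and `html_to_document` are hypothetical names, not LangChain classes, and BeautifulSoup's `get_text()` handles far more edge cases.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


class _TextExtractor(HTMLParser):
    """Collects non-empty text nodes, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_document(html: str, source: str) -> SimpleDocument:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return SimpleDocument("\n".join(parser.parts), {"source": source})


page = (
    "<html><head><title>Tutorial</title><style>p{color:red}</style></head>"
    "<body><h1>Manage spend</h1><p>Set usage limits.</p><script>track()</script></body></html>"
)
docs = [html_to_document(page, "https://example.com/tutorial")]
print(docs[0].page_content)
```

Like `loader.load()`, this returns a list; for a single page, it is a list with one document.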

143
00:07:37,000 --> 00:07:40,000
Okay, now, once I have the loader,

144
00:07:40,000 --> 00:07:41,000
right,

145
00:07:41,000 --> 00:07:44,000
what I need to do is load the documents.

146
00:07:44,000 --> 00:07:46,000
So here I write loader.load().

147
00:07:46,000 --> 00:07:47,000
Right.

148
00:07:47,000 --> 00:07:49,000
So in short, I'm going to get all my documents.

149
00:07:49,000 --> 00:07:51,000
Let me execute this.

150
00:07:51,000 --> 00:07:53,000
And let's go ahead and print my docs.

151
00:07:53,000 --> 00:07:57,000
You'll be able to see all the documents inside it, and all the page content:

152
00:07:57,000 --> 00:08:04,000
see, page_content, the tutorial text, the entire scraped data from the website,

153
00:08:04,000 --> 00:08:06,000
which we have extracted.

154
00:08:06,000 --> 00:08:07,000
We have got it all here.

155
00:08:07,000 --> 00:08:08,000
Okay.

156
00:08:08,000 --> 00:08:11,000
Now, a very important thing.

157
00:08:11,000 --> 00:08:13,000
There is one more step after this.

158
00:08:13,000 --> 00:08:17,000
Usually, in every RAG application, we perform these steps, right?

159
00:08:17,000 --> 00:08:21,000
First of all, we read our data

160
00:08:21,000 --> 00:08:24,000
from a specific data source.

161
00:08:24,000 --> 00:08:28,000
Then we load it and convert it into documents.

162
00:08:28,000 --> 00:08:30,000
Now this is a huge document.

163
00:08:30,000 --> 00:08:37,000
In this particular case, since it is a huge document, okay,

164
00:08:37,000 --> 00:08:40,000
we need to divide this entire document into chunks.

165
00:08:40,000 --> 00:08:44,000
We cannot give this entire document directly to our LLM.

166
00:08:44,000 --> 00:08:46,000
The reason is very simple.

167
00:08:46,000 --> 00:08:49,000
Every LLM has a context size:

168
00:08:49,000 --> 00:08:51,000
a limit on how much text it can take in at once.

169
00:08:51,000 --> 00:08:56,000
And as newer LLM models come out, the context size

170
00:08:56,000 --> 00:08:57,000
will keep on increasing.

171
00:08:57,000 --> 00:09:02,000
But it is still a good idea to divide the documents into chunks of text, right?

172
00:09:02,000 --> 00:09:06,000
Whatever text is present inside the document, we divide it into chunks.

173
00:09:06,000 --> 00:09:07,000
Okay.

174
00:09:07,000 --> 00:09:10,000
So that is the next step we need to do.

175
00:09:10,000 --> 00:09:14,000
Let me just write the steps down here.

176
00:09:14,000 --> 00:09:21,000
First we load our data, then we get all the docs, okay,

177
00:09:21,000 --> 00:09:23,000
just by executing loader.load().

178
00:09:23,000 --> 00:09:28,000
After that, we divide our text into chunks.

179
00:09:28,000 --> 00:09:29,000
Okay.

180
00:09:29,000 --> 00:09:30,000
Into chunks.

181
00:09:30,000 --> 00:09:32,000
Chunks are just smaller pieces of the data.

182
00:09:32,000 --> 00:09:33,000
Right.

183
00:09:33,000 --> 00:09:34,000
And why do we do that?

184
00:09:34,000 --> 00:09:35,000
It is very simple.

185
00:09:35,000 --> 00:09:39,000
Because every LLM has a limited context size.

186
00:09:39,000 --> 00:09:39,000
Right?

187
00:09:39,000 --> 00:09:44,000
After we divide the text into chunks, the next step we are going

188
00:09:44,000 --> 00:09:50,000
to do is convert these chunks into vectors using some kind of vector embedding.

189
00:09:50,000 --> 00:09:51,000
Okay.

190
00:09:51,000 --> 00:09:52,000
Now what is vector embedding?

191
00:09:52,000 --> 00:09:58,000
Vector embeddings are techniques with which we can convert all

192
00:09:58,000 --> 00:10:02,000
this text into numerical vectors.

193
00:10:02,000 --> 00:10:02,000
Okay.

194
00:10:02,000 --> 00:10:05,000
Again, we need to choose an embedding technique.

195
00:10:05,000 --> 00:10:07,000
There are a lot of different embedding techniques.

196
00:10:07,000 --> 00:10:10,000
One that I will be using is OpenAI's.

197
00:10:10,000 --> 00:10:13,000
And we'll also see some open-source embedding techniques.

198
00:10:13,000 --> 00:10:18,000
And after we get the vector embeddings, that is, all the vectors, we store them in

199
00:10:18,000 --> 00:10:24,000
a vector store DB. Okay, so these are the steps we are going to follow in this entire tutorial.

200
00:10:24,000 --> 00:10:28,000
Okay, so I hope you got an idea of the flow.

201
00:10:28,000 --> 00:10:32,000
We are going to load the data, divide our text into chunks, convert the chunks into vectors with a vector embedding,

202
00:10:32,000 --> 00:10:34,000
and store them in a vector store DB.

203
00:10:34,000 --> 00:10:34,000
Okay.

204
00:10:35,000 --> 00:10:37,000
Now let's go ahead and do this.

205
00:10:37,000 --> 00:10:44,000
So first of all, I will go ahead and import from

206
00:10:44,000 --> 00:10:45,000
langchain_text_splitters.

207
00:10:45,000 --> 00:10:47,000
It provides what are called text splitters,

208
00:10:47,000 --> 00:10:52,000
since we need to split our entire documents into chunks of text.

209
00:10:52,000 --> 00:10:53,000
Right.

210
00:10:53,000 --> 00:10:58,000
So from there I'm going to import something called

211
00:10:58,000 --> 00:10:59,000
RecursiveCharacterTextSplitter.

212
00:10:59,000 --> 00:10:59,000
Okay.

213
00:10:59,000 --> 00:11:06,000
Now, RecursiveCharacterTextSplitter is a class that will help us split our entire

214
00:11:06,000 --> 00:11:06,000
documents.

215
00:11:06,000 --> 00:11:06,000
Right?
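
The idea behind RecursiveCharacterTextSplitter is to try coarse separators first (paragraph breaks, then line breaks, then spaces) and fall back to finer ones only when a piece is still too long. The function below is a simplified, hypothetical re-implementation of that strategy: it ignores overlap, counts length in characters, and drops the separators it splits on, so it will not reproduce LangChain's exact output.

```python
def recursive_split(text, chunk_size, separators=("\n\n", "\n", " ")):
    """Split text into pieces of at most chunk_size characters (simplified sketch)."""
    if len(text) <= chunk_size:
        return [text] if text else []
    # Pick the coarsest separator that actually occurs in the text.
    sep = next((s for s in separators if s and s in text), "")
    if not sep:
        # No usable separator left: fall back to hard character cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    finer = separators[separators.index(sep) + 1:]
    chunks, current = [], ""
    for piece in text.split(sep):
        candidate = current + sep + piece if current else piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too big: recurse with the finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


chunks = recursive_split("aaa bbb\n\nccc ddd eee", chunk_size=10)
print(chunks)  # ['aaa bbb', 'ccc ddd', 'eee']
```

The paragraph break is respected first; only the second paragraph, which is longer than 10 characters, gets split further at spaces.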

216
00:11:06,000 --> 00:11:10,000
So let's go ahead and initialize RecursiveCharacterTextSplitter here.

217
00:11:10,000 --> 00:11:12,000
There are some parameters.

218
00:11:12,000 --> 00:11:14,000
First, let's see.

219
00:11:14,000 --> 00:11:18,000
Okay, I'll go ahead and initialize the RecursiveCharacterTextSplitter.

220
00:11:18,000 --> 00:11:21,000
So let me just go ahead and define this.

221
00:11:21,000 --> 00:11:26,000
This will be my text_splitter, which is an instance of

222
00:11:26,000 --> 00:11:27,000
RecursiveCharacterTextSplitter,

223
00:11:27,000 --> 00:11:29,000
which we are going to use.

224
00:11:29,000 --> 00:11:30,000
Okay.

225
00:11:30,000 --> 00:11:36,000
Now, why are we using this? Because with the help of RecursiveCharacterTextSplitter,

226
00:11:36,000 --> 00:11:44,000
we will easily be able to convert our entire documents into

227
00:11:44,000 --> 00:11:46,000
smaller chunks of documents.

228
00:11:46,000 --> 00:11:46,000
Right.

229
00:11:46,000 --> 00:11:48,000
So we will be able to do that, right?

230
00:11:48,000 --> 00:11:54,000
Chunks of text, I should say. Now, inside this, you also have an option to specify

231
00:11:54,000 --> 00:11:54,000
your chunk_size.

232
00:11:54,000 --> 00:11:58,000
Let's say I set chunk_size to 1000.

233
00:11:58,000 --> 00:12:05,000
I can also make the text overlap between chunks while we are doing this splitting.

234
00:12:05,000 --> 00:12:06,000
Right.

235
00:12:06,000 --> 00:12:07,000
Consecutive chunks can share some text.

236
00:12:07,000 --> 00:12:10,000
And I set the number of overlapping characters, chunk_overlap, to 200.

237
00:12:10,000 --> 00:12:10,000
Okay.

238
00:12:10,000 --> 00:12:15,000
So these are the settings I'm going to use for RecursiveCharacterTextSplitter, okay.
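
To see what chunk_size=1000 and chunk_overlap=200 mean numerically, here is a plain sliding-window splitter. It is deliberately simpler than the recursive splitter and is used here only for the arithmetic: each window starts chunk_size - chunk_overlap = 800 characters after the previous one, so neighbouring chunks share 200 characters. The function name is hypothetical.

```python
def sliding_window_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Fixed-size character windows; consecutive windows share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - chunk_overlap
    # A new window starts only if the previous one did not already reach the end.
    starts = range(0, max(len(text) - chunk_overlap, 1), step)
    return [text[i:i + chunk_size] for i in starts]


text = "".join(chr(ord("a") + i % 26) for i in range(2500))
chunks = sliding_window_chunks(text)
print([len(c) for c in chunks])  # [1000, 1000, 900]
```

A 2,500-character text gives windows starting at 0, 800, and 1600, so three chunks of 1000, 1000, and 900 characters.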

239
00:12:15,000 --> 00:12:19,000
Now, coming to the next step after setting up the text splitter.

240
00:12:19,000 --> 00:12:25,000
What we need to do is use this text_splitter to split all my

241
00:12:25,000 --> 00:12:25,000
documents.

242
00:12:25,000 --> 00:12:28,000
So I'll go ahead and call split_documents on all my documents.

243
00:12:28,000 --> 00:12:33,000
And the docs I got earlier go inside as the parameter.

244
00:12:34,000 --> 00:12:40,000
Once I do this, I get all my chunked documents right here.

245
00:12:41,000 --> 00:12:41,000
Perfect.

246
00:12:41,000 --> 00:12:44,000
Up to here, we have done everything, right?

247
00:12:44,000 --> 00:12:47,000
So let me execute it and show you.

248
00:12:47,000 --> 00:12:49,000
Here are my documents.

249
00:12:49,000 --> 00:12:51,000
And this is how it has been divided.

250
00:12:51,000 --> 00:12:52,000
Right.

251
00:12:52,000 --> 00:12:54,000
Based on the chunk size and the overlap.

252
00:12:54,000 --> 00:12:55,000
Initially I just had one document.

253
00:12:55,000 --> 00:12:57,000
Now I have multiple documents.

254
00:12:57,000 --> 00:13:02,000
And we need to do this because every LLM has a limited context size.

255
00:13:02,000 --> 00:13:07,000
Once you have done this, the next step is converting all this text into vectors.

256
00:13:07,000 --> 00:13:08,000
Okay.

257
00:13:08,000 --> 00:13:10,000
Why do we convert it into vectors?

258
00:13:10,000 --> 00:13:18,000
Because whenever we work with a Q&A chatbot, a document Q&A chatbot, or, let's say, a

259
00:13:18,000 --> 00:13:19,000
RAG application,

260
00:13:19,000 --> 00:13:25,000
a very simple similarity measure is typically used, called cosine similarity.

261
00:13:26,000 --> 00:13:30,000
And cosine similarity is computed on the vectors themselves.
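
Cosine similarity measures the angle between two vectors: cos(theta) = (a . b) / (|a| |b|). It is 1 when the vectors point the same way, 0 when they are orthogonal, and it ignores their lengths. A minimal stdlib sketch (the function name is just illustrative):

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(theta) = (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


print(cosine_similarity([3.0, 4.0], [6.0, 8.0]))  # 1.0 (same direction, different length)
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # 0.0 (orthogonal)
```

In a RAG app, the query and every chunk are embedded into vectors, and the chunks whose vectors score highest against the query vector are the ones treated as relevant.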

262
00:13:30,000 --> 00:13:31,000
Right.

263
00:13:31,000 --> 00:13:36,000
So, in order to convert the text into vectors,

264
00:13:36,000 --> 00:13:40,000
I will first go ahead and use an embedding technique.

265
00:13:40,000 --> 00:13:45,000
In this case, since I'm using OpenAI, I will go ahead and write: from langchain_openai

266
00:13:45,000 --> 00:13:51,000
import OpenAIEmbeddings.

267
00:13:51,000 --> 00:13:51,000
Okay.

268
00:13:51,000 --> 00:13:52,000
So this is the embedding technique.

269
00:13:52,000 --> 00:13:57,000
It is an efficient embedding technique: it takes the text and converts it

270
00:13:57,000 --> 00:13:57,000
into a vector.

271
00:13:58,000 --> 00:14:00,000
So here I will go ahead and initialize it:

272
00:14:00,000 --> 00:14:03,000
embeddings = OpenAIEmbeddings().

273
00:14:03,000 --> 00:14:03,000
Okay.

274
00:14:03,000 --> 00:14:04,000
Once we have done this,

275
00:14:04,000 --> 00:14:08,000
in short, this is my embedding technique, which will convert our text into vectors.
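
OpenAIEmbeddings sends text to OpenAI's embedding model and gets back one fixed-length list of floats per input. Without an API key you can still see the shape of that interface with a toy stand-in: the hypothetical `HashingEmbeddings` below hashes each word into one of a fixed number of buckets and L2-normalises the counts. It mirrors only the `embed_documents` / `embed_query` method names that LangChain embedding classes expose; it captures word overlap, not meaning, so it is no substitute for a learned model.

```python
import hashlib
import math


class HashingEmbeddings:
    """Toy, deterministic text -> vector mapping (hypothetical, not OpenAI's model)."""

    def __init__(self, dim: int = 64):
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            # Stable bucket per word (Python's built-in hash() is salted per process).
            digest = hashlib.md5(word.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        norm = math.sqrt(sum(v * v for v in vec))
        return [v / norm for v in vec] if norm else vec

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)


embeddings = HashingEmbeddings(dim=64)
vectors = embeddings.embed_documents(["set usage limits", "track token usage"])
```

Each text becomes a 64-dimensional unit vector, and the same text always maps to the same vector, which is the property a vector store relies on.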

276
00:14:08,000 --> 00:14:10,000
I have executed this, okay.

277
00:14:11,000 --> 00:14:18,000
Now, the next thing I'm going to do is install one more library, because

278
00:14:18,000 --> 00:14:20,000
with the help of embeddings we will be converting the text into vectors.

279
00:14:20,000 --> 00:14:21,000
Right.

280
00:14:21,000 --> 00:14:26,000
But where will we store those vectors? We have to store them in some kind of

281
00:14:26,000 --> 00:14:27,000
vector database.

282
00:14:27,000 --> 00:14:27,000
Right.

283
00:14:27,000 --> 00:14:35,000
In this example I'm going to use a simple vector database called FAISS. This vector database

284
00:14:35,000 --> 00:14:41,000
was created by Facebook, and it is very efficient to use.

285
00:14:41,000 --> 00:14:44,000
It also performs similarity search in the back end.
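
FAISS (Facebook AI Similarity Search) builds an index over the vectors and answers nearest-neighbour queries. Its simplest index type performs an exact brute-force scan, and as far as I know LangChain's FAISS wrapper uses a flat L2 index by default. The stdlib sketch below does that exact scan by Euclidean distance (the function name is hypothetical); real FAISS runs the same kind of computation in optimised C++ and also offers approximate indexes for large collections. For unit-length vectors, |a - b|^2 = 2 - 2 cos(a, b), so ranking by L2 distance gives the same order as ranking by cosine similarity; OpenAI embeddings are normalised to length 1.

```python
import heapq
import math


def knn_l2(query: list[float], vectors: list[list[float]], k: int = 2) -> list[tuple[int, float]]:
    """Exact k-nearest-neighbour search: (index, L2 distance) pairs, closest first."""
    scored = ((i, math.dist(query, v)) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
result = knn_l2([0.9, 0.1], vectors, k=2)
print([i for i, _ in result])  # [1, 0]
```

The query [0.9, 0.1] is closest to [1.0, 0.0], then to the origin; the other two points are much farther away.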

286
00:14:44,000 --> 00:14:50,000
So first of all I will go ahead and add faiss-cpu, okay.

287
00:14:50,000 --> 00:14:52,000
That's okay.

288
00:14:52,000 --> 00:14:54,000
You can also have the GPU version.

289
00:14:54,000 --> 00:14:57,000
But right now I think the CPU version will be more than sufficient.

290
00:14:57,000 --> 00:15:03,000
So let me go ahead, open my terminal, and quickly install

291
00:15:03,000 --> 00:15:05,000
everything in requirements.txt, okay.

292
00:15:05,000 --> 00:15:09,000
So now you can see that faiss-cpu has been installed.

293
00:15:09,000 --> 00:15:09,000
Okay.

294
00:15:09,000 --> 00:15:17,000
Now, once I do this, the next step is to quickly import from

295
00:15:17,000 --> 00:15:23,000
langchain_community.vectorstores, which has many different vector stores, like FAISS,

296
00:15:23,000 --> 00:15:25,000
Chroma DB, and many more.

297
00:15:25,000 --> 00:15:27,000
We will discuss them.

298
00:15:27,000 --> 00:15:31,000
Later on, when we do projects, we'll get to know different vector stores.

299
00:15:31,000 --> 00:15:36,000
There is also a vector store database called ObjectBox; we will also see some

300
00:15:36,000 --> 00:15:37,000
examples with it.

301
00:15:37,000 --> 00:15:39,000
Now I'm going to import FAISS, okay.

302
00:15:40,000 --> 00:15:45,000
After importing FAISS, with the help of the embedding that I have,

303
00:15:45,000 --> 00:15:50,000
the OpenAI embedding, I will convert the text of all my documents into vectors.

304
00:15:50,000 --> 00:15:56,000
And then all those vectors will be stored in this particular vector database,

305
00:15:56,000 --> 00:15:57,000
which is FAISS.

306
00:15:57,000 --> 00:16:02,000
Here you can see that I have already split everything into documents.

307
00:16:02,000 --> 00:16:05,000
Now, let me go ahead and create my vector store

308
00:16:05,000 --> 00:16:06,000
DB, okay.

309
00:16:06,000 --> 00:16:10,000
For this vector store DB, I'm going to use FAISS.

310
00:16:10,000 --> 00:16:14,000
And I will use a method called from_documents.

311
00:16:14,000 --> 00:16:19,000
And here I pass documents, embeddings, okay.

312
00:16:19,000 --> 00:16:23,000
Documents, embeddings means: here are all the documents I want to store,

313
00:16:23,000 --> 00:16:27,000
and here is the embedding I want to apply to them,

314
00:16:27,000 --> 00:16:28,000
which is the OpenAI embedding.

315
00:16:28,000 --> 00:16:29,000
So I have given it over here.

316
00:16:29,000 --> 00:16:30,000
Right.

317
00:16:30,000 --> 00:16:33,000
And finally, I will have my vector store database.
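
`FAISS.from_documents(documents, embeddings)` embeds every chunk with the embedding object you pass and keeps each vector alongside its document; a later `similarity_search(query)` embeds the query the same way and returns the closest documents. The sketch below mimics only that shape, using a hypothetical in-memory `TinyVectorStore`, a hypothetical `KeywordEmbeddings` stand-in over a four-word toy vocabulary (so it runs offline), and exact L2 ranking. It is not the FAISS implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Doc:
    page_content: str
    metadata: dict = field(default_factory=dict)


VOCAB = ["spend", "limit", "trace", "project"]  # hypothetical toy vocabulary


class KeywordEmbeddings:
    """Offline stand-in: normalised counts of VOCAB words (illustration only)."""

    def embed_query(self, text):
        words = [w.strip(".,?!").lower() for w in text.split()]
        counts = [float(words.count(term)) for term in VOCAB]
        norm = math.sqrt(sum(c * c for c in counts)) or 1.0
        return [c / norm for c in counts]

    def embed_documents(self, texts):
        return [self.embed_query(t) for t in texts]


class TinyVectorStore:
    """Hypothetical in-memory store with a from_documents / similarity_search shape."""

    def __init__(self, docs, vectors, embedding):
        self.docs, self.vectors, self.embedding = docs, vectors, embedding

    @classmethod
    def from_documents(cls, documents, embedding):
        # Embed every chunk once, at build time.
        vectors = embedding.embed_documents([d.page_content for d in documents])
        return cls(list(documents), vectors, embedding)

    def similarity_search(self, query, k=4):
        # Embed the query the same way, then rank stored vectors by L2 distance.
        q = self.embedding.embed_query(query)
        order = sorted(range(len(self.docs)), key=lambda i: math.dist(q, self.vectors[i]))
        return [self.docs[i] for i in order[:k]]


documents = [Doc("Set a spend limit for the workspace"), Doc("Trace every run in a project")]
db = TinyVectorStore.from_documents(documents, KeywordEmbeddings())
top = db.similarity_search("How do I limit spend?", k=1)
print(top[0].page_content)  # Set a spend limit for the workspace
```

The key design point carried over from the real API is that the same embedding object is used for the documents and for the query, so both live in the same vector space.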

318
00:16:33,000 --> 00:16:34,000
Okay.

319
00:16:34,000 --> 00:16:39,000
So once I execute it, you can see that it runs successfully.

320
00:16:39,000 --> 00:16:43,000
And now I have my entire vector store DB, okay.

321
00:16:43,000 --> 00:16:46,000
So here is my vector store DB, right?

322
00:16:46,000 --> 00:16:50,000
And this particular vector store DB

323
00:16:50,000 --> 00:16:55,000
can also be saved in your local environment, or on any hard disk, wherever you want.
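
With LangChain's FAISS wrapper this is `save_local(folder)` and later `FAISS.load_local(folder, embeddings, ...)`; recent versions also require `allow_dangerous_deserialization=True` when loading, because part of the saved index is pickled. The stdlib sketch below only illustrates the idea of persisting texts and their vectors together and reloading them. The JSON layout and the function names `save_index_json` / `load_index_json` are hypothetical and are not FAISS's on-disk format.

```python
import json
import os
import tempfile


def save_index_json(folder: str, texts: list[str], vectors: list[list[float]]) -> str:
    """Write texts and their vectors side by side as JSON (hypothetical format)."""
    os.makedirs(folder, exist_ok=True)
    path = os.path.join(folder, "index.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"texts": texts, "vectors": vectors}, f)
    return path


def load_index_json(folder: str) -> tuple[list[str], list[list[float]]]:
    """Read back what save_index_json wrote."""
    with open(os.path.join(folder, "index.json"), encoding="utf-8") as f:
        data = json.load(f)
    return data["texts"], data["vectors"]


with tempfile.TemporaryDirectory() as tmp:
    save_index_json(tmp, ["chunk one", "chunk two"], [[0.6, 0.8], [1.0, 0.0]])
    texts, vectors = load_index_json(tmp)
```

Saving the vectors means the (paid) embedding step does not have to be repeated every time the notebook restarts; only the query needs to be embedded again.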

324
00:16:55,000 --> 00:16:55,000
Right.

325
00:16:55,000 --> 00:17:01,000
And now I'm going to query this vector store DB, and I can ask it anything.

326
00:17:01,000 --> 00:17:03,000
It will return the most relevant content as the response.

327
00:17:03,000 --> 00:17:03,000
Okay.

328
00:17:03,000 --> 00:17:06,000
So in this video I have discussed FAISS.

329
00:17:06,000 --> 00:17:09,000
I've discussed vector store DBs and many more things.

330
00:17:09,000 --> 00:17:10,000
Right.

331
00:17:10,000 --> 00:17:16,000
So, yes, in this video I will stop here, after creating this vector store

332
00:17:16,000 --> 00:17:17,000
DB.

333
00:17:17,000 --> 00:17:21,000
In the upcoming videos I will talk about more amazing things.

334
00:17:21,000 --> 00:17:26,000
We will discuss how we can query this particular vector store

335
00:17:26,000 --> 00:17:30,000
DB and retrieve content based on similarity search.

336
00:17:30,000 --> 00:17:30,000
Right.

337
00:17:30,000 --> 00:17:33,000
So yes, that was it from my side.

338
00:17:33,000 --> 00:17:35,000
I hope you like this particular video.

339
00:17:35,000 --> 00:17:37,000
I will see you all in the next video.

340
00:17:37,000 --> 00:17:37,000
Thank you.

341
00:17:37,000 --> 00:17:37,000
Take care.

