1
00:00:00,000 --> 00:00:01,000
Hello guys.

2
00:00:01,000 --> 00:00:03,000
So we are going to continue our discussion with respect to LangChain.

3
00:00:03,000 --> 00:00:08,000
Till now, we have discussed step one, that is, loading from various data sources with

4
00:00:08,000 --> 00:00:10,000
the help of document loader.

5
00:00:10,000 --> 00:00:12,000
This was loading different kind of data itself.

6
00:00:12,000 --> 00:00:18,000
Then we went to the second step, wherein we converted all these documents that we loaded into text chunks

7
00:00:18,000 --> 00:00:19,000
or document chunks.

8
00:00:20,000 --> 00:00:22,000
And we saw again various different methods over here.

9
00:00:23,000 --> 00:00:27,000
Now coming to the third step, what we are going to do here is take

10
00:00:27,000 --> 00:00:33,000
these text chunks or document chunks and convert them into vectors.

11
00:00:34,000 --> 00:00:39,000
Now, for converting a text chunk or a document chunk into vectors,

12
00:00:39,000 --> 00:00:40,000
There are various ways.

13
00:00:40,000 --> 00:00:40,000
Okay.

14
00:00:41,000 --> 00:00:46,000
In our session, we are going to focus on three important techniques.

15
00:00:46,000 --> 00:00:49,000
One is with the help of the OpenAI library.

16
00:00:49,000 --> 00:00:53,000
Second, we will be seeing something called Ollama.

17
00:00:53,000 --> 00:00:53,000
Okay.

18
00:00:53,000 --> 00:00:55,000
We'll discuss what exactly Ollama is.

19
00:00:55,000 --> 00:00:59,000
And the third technique that we are going to use is with the help of Hugging Face.

20
00:00:59,000 --> 00:01:04,000
Okay, now, it's not like only these three embedding techniques are there.

21
00:01:04,000 --> 00:01:09,000
There are a lot of different techniques that are available, but if you know all these three, that

22
00:01:09,000 --> 00:01:14,000
will be more than sufficient because in terms of accuracy, with respect to the embedding techniques,

23
00:01:14,000 --> 00:01:16,000
I think these three will be more than sufficient.

24
00:01:16,000 --> 00:01:20,000
Yes, there are different LLMs; say, you have Google Gemini.

25
00:01:20,000 --> 00:01:22,000
For Google Gemini you have different embedding techniques.

26
00:01:22,000 --> 00:01:28,000
If you are using other models like Claude, right, Claude 3 from Anthropic, there they have different

27
00:01:28,000 --> 00:01:29,000
embedding techniques.

28
00:01:29,000 --> 00:01:34,000
But I feel that for our course these three embedding techniques will be more than sufficient.

29
00:01:34,000 --> 00:01:38,000
If I consider OpenAI, this is paid, okay.

30
00:01:39,000 --> 00:01:42,000
With Ollama you can run these embedding techniques locally.

31
00:01:42,000 --> 00:01:45,000
So this will be using open source okay.

32
00:01:45,000 --> 00:01:47,000
And Hugging Face.

33
00:01:47,000 --> 00:01:51,000
Also we will try to use an open source embedding technique okay.

34
00:01:51,000 --> 00:01:53,000
So we are going to come up with three videos.

35
00:01:53,000 --> 00:01:59,000
First of all, we'll discuss OpenAI: how you can go ahead and use the OpenAI API

36
00:01:59,000 --> 00:02:03,000
and how you can use OpenAI embedding techniques to convert text into vectors.

37
00:02:03,000 --> 00:02:05,000
Then we'll go with Ollama and then finally Hugging Face.

38
00:02:05,000 --> 00:02:08,000
So this is the plan of upcoming videos.

39
00:02:08,000 --> 00:02:10,000
In this video, let's focus on OpenAI.

40
00:02:10,000 --> 00:02:13,000
Now quickly I'm going to go to my browser over here okay.

41
00:02:13,000 --> 00:02:19,000
If you go ahead and search for OpenAI API key.

42
00:02:19,000 --> 00:02:19,000
Okay.

43
00:02:19,000 --> 00:02:22,000
Now I hope everybody knows about OpenAI, right.

44
00:02:22,000 --> 00:02:25,000
It has brought out ChatGPT, right.

45
00:02:25,000 --> 00:02:27,000
And I hope everybody is probably using ChatGPT.

46
00:02:27,000 --> 00:02:31,000
Now inside that you will be having a lot of LLM models, right.

47
00:02:31,000 --> 00:02:34,000
Now the recent model is GPT-4.

48
00:02:34,000 --> 00:02:34,000
Right.

49
00:02:34,000 --> 00:02:39,000
And again, with OpenAI there are OpenAI embedding techniques also available.

50
00:02:39,000 --> 00:02:39,000
Right.

51
00:02:39,000 --> 00:02:45,000
So for this I will be requiring my OpenAI API key since I said that this is paid.

52
00:02:45,000 --> 00:02:52,000
So if you go ahead and go to this particular URL that is platform.openai.com/playground, please go

53
00:02:52,000 --> 00:02:55,000
ahead and create your account if you have not created it.

54
00:02:55,000 --> 00:02:57,000
And here you just need to invest $5.

55
00:02:57,000 --> 00:02:58,000
Okay.

56
00:02:58,000 --> 00:03:03,000
The reason, go ahead and check it out, the reason why I'm showing you OpenAI embedding is

57
00:03:04,000 --> 00:03:10,000
Because the accuracy is amazing in this, the accuracy with respect to implementing with the help of

58
00:03:10,000 --> 00:03:11,000
APIs, any chatbot solution.

59
00:03:11,000 --> 00:03:12,000
It's amazing right?

60
00:03:12,000 --> 00:03:14,000
Many, many companies are using it.

61
00:03:14,000 --> 00:03:20,000
So that is the reason, um, you need to probably invest $5 to check out how the API key actually works.

62
00:03:20,000 --> 00:03:27,000
So in the API you will be able to see if I go to my um, settings option over here, there'll be something

63
00:03:27,000 --> 00:03:28,000
called as billing.

64
00:03:28,000 --> 00:03:28,000
Right.

65
00:03:28,000 --> 00:03:31,000
So right now you can see I have $2.82.

66
00:03:31,000 --> 00:03:36,000
And let's say if I want more, I will keep on topping up, because I have added some payment methods.

67
00:03:36,000 --> 00:03:37,000
Okay.

68
00:03:37,000 --> 00:03:38,000
Now what I will do.

69
00:03:38,000 --> 00:03:42,000
I'll go to my dashboard and I'll go to my API keys.

70
00:03:42,000 --> 00:03:44,000
So here you can see I've created two API keys.

71
00:03:44,000 --> 00:03:47,000
You can go ahead and create your own API key okay.

72
00:03:47,000 --> 00:03:50,000
Give any name over here and just click on Create Secret Key.

73
00:03:50,000 --> 00:03:53,000
So once it is created it will give you the entire API key.

74
00:03:53,000 --> 00:04:00,000
Just go ahead and copy and paste it, okay. I'm going to keep this hidden, because if I share it here,

75
00:04:00,000 --> 00:04:00,000
you'll be using it.

76
00:04:00,000 --> 00:04:04,000
And again I may finish up all my balance right.

77
00:04:04,000 --> 00:04:07,000
So your API key will start with sk-, okay.

78
00:04:07,000 --> 00:04:10,000
And the remaining key will be available over there.

79
00:04:10,000 --> 00:04:13,000
So once you create it please make sure that you copy it.

80
00:04:13,000 --> 00:04:17,000
And then we will go ahead and create an environment variable for this.

81
00:04:17,000 --> 00:04:19,000
Now I will go to my code okay.

82
00:04:20,000 --> 00:04:23,000
Now in my environment variable.

83
00:04:23,000 --> 00:04:24,000
See over here.

84
00:04:24,000 --> 00:04:27,000
If you see, in this folder I have this .env file.

85
00:04:27,000 --> 00:04:32,000
Now in this environment file I will be storing my API key.

86
00:04:33,000 --> 00:04:37,000
I'll not show you the environment variable because I've already stored my API key over there.

87
00:04:37,000 --> 00:04:45,000
So for that you just need to write OPENAI_API_KEY and just go ahead and use equal

88
00:04:45,000 --> 00:04:45,000
to.

89
00:04:45,000 --> 00:04:48,000
And your key should be pasted in this format right.

90
00:04:48,000 --> 00:04:53,000
So whatever key will be present you just go ahead and copy and paste it over there.

91
00:04:53,000 --> 00:04:58,000
Right now this will be the environment key that will be available in your environment variable.

92
00:04:58,000 --> 00:05:02,000
So if I go ahead and open this probably you'll be seeing this key value pair over there okay.

93
00:05:02,000 --> 00:05:05,000
You just need to write OPENAI_API_KEY and this particular information.

94
00:05:06,000 --> 00:05:08,000
So please do that particular step.

95
00:05:08,000 --> 00:05:13,000
So I'm not going to show you, because it is crucial not to reveal my OpenAI API key.

96
00:05:13,000 --> 00:05:20,000
Now if I really want to call that API key of OpenAI,

97
00:05:20,000 --> 00:05:25,000
I have to load those environment variables in my coding environment.

98
00:05:25,000 --> 00:05:25,000
Okay.

99
00:05:25,000 --> 00:05:27,000
How do I do that.

100
00:05:27,000 --> 00:05:30,000
So for this you will be requiring one library.

101
00:05:30,000 --> 00:05:34,000
So the first library is python-dotenv, okay.

102
00:05:34,000 --> 00:05:36,000
We will be using this specific library.

103
00:05:36,000 --> 00:05:37,000
Why do we use it.

104
00:05:37,000 --> 00:05:42,000
Because with the help of this particular library you'll be able to see that I will be importing something

105
00:05:42,000 --> 00:05:45,000
like this: from dotenv import load_dotenv.

106
00:05:45,000 --> 00:05:52,000
And when I initialize this, it will basically load all your environment variables into your coding environment.
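A minimal sketch of the idea behind loading a .env file, using only the standard library. The helper load_env_file and the variable DEMO_API_KEY are hypothetical names for illustration; the tutorial itself uses load_dotenv from python-dotenv, which also handles quoting rules and other cases this sketch ignores.
```python
import os
import tempfile
def load_env_file(path):
    """Read KEY=VALUE lines from a file into os.environ (simplified stand-in for load_dotenv)."""
    loaded = {}
    with open(path, encoding="utf-8") as handle:
        for raw_line in handle:
            line = raw_line.strip()
            # Skip blank lines, comments, and lines without an equals sign.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key, value = key.strip(), value.strip().strip("'\"")
            # Like load_dotenv's default behaviour, keep variables that are already set.
            os.environ.setdefault(key, value)
            loaded[key] = value
    return loaded
# Demo with a throwaway file and a placeholder value, so no real key is involved.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as handle:
        handle.write("# comment line\nDEMO_API_KEY=sk-placeholder\n")
    result = load_env_file(env_path)
```
In the actual notebook the equivalent step is the single call load_dotenv(), which returns True when a .env file was found and loaded.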

107
00:05:52,000 --> 00:05:52,000
Okay.

108
00:05:52,000 --> 00:05:58,000
So first of all we will go ahead and install this, okay, python-dotenv.

109
00:05:58,000 --> 00:06:02,000
Now since we are also working with LangChain with OpenAI,

110
00:06:02,000 --> 00:06:03,000
Right.

111
00:06:03,000 --> 00:06:06,000
So there is also one more library that we will be requiring.

112
00:06:06,000 --> 00:06:10,000
That is nothing but langchain-openai.

113
00:06:10,000 --> 00:06:15,000
Okay, so these two libraries I will be requiring. The first is to load the environment variables.

114
00:06:15,000 --> 00:06:17,000
And the second is nothing but langchain-openai.

115
00:06:17,000 --> 00:06:22,000
So I will go ahead and open my terminal over here. I will clear my screen.

116
00:06:22,000 --> 00:06:24,000
So let me clear all my screen.

117
00:06:24,000 --> 00:06:27,000
And here I'll say pip install -r requirements.txt.

118
00:06:27,000 --> 00:06:31,000
And please make sure that it is there in the environment variable okay.

119
00:06:31,000 --> 00:06:35,000
So I'll go ahead and execute it okay.

120
00:06:35,000 --> 00:06:39,000
So finally here you can see that the requirement is already satisfied.

121
00:06:39,000 --> 00:06:41,000
And here I've already done the installation of this.

122
00:06:41,000 --> 00:06:45,000
So both the libraries have been installed.

123
00:06:45,000 --> 00:06:47,000
Now let's go back to our embedding technique.

124
00:06:47,000 --> 00:06:51,000
And let's see how we can specifically work with langchain-openai,

125
00:06:51,000 --> 00:06:53,000
okay.

126
00:06:53,000 --> 00:06:57,000
And we'll go step by step in order to understand this entire thing.

127
00:06:57,000 --> 00:07:01,000
Okay, so first of all, I am importing OS and then I'm importing this.

128
00:07:01,000 --> 00:07:08,000
This is for my environment variables, to load all the environment variables,

129
00:07:09,000 --> 00:07:09,000
okay.

130
00:07:10,000 --> 00:07:11,000
Perfect.

131
00:07:12,000 --> 00:07:15,000
Now you'll be able to see that I'm getting true.

132
00:07:15,000 --> 00:07:18,000
So obviously it is saying that hey it has loaded.

133
00:07:18,000 --> 00:07:22,000
Now I'll go ahead and write os.environ, okay.

134
00:07:22,000 --> 00:07:25,000
And you know that I've created my OpenAI API key.

135
00:07:25,000 --> 00:07:28,000
So I'll write OPENAI_API_KEY.

136
00:07:29,000 --> 00:07:37,000
And I'll set this environment variable by reading the value that I've saved in my .env

137
00:07:37,000 --> 00:07:37,000
file.

138
00:07:37,000 --> 00:07:37,000
Right.

139
00:07:37,000 --> 00:07:45,000
So this will be os.getenv, that is, get environment variable, with

140
00:07:45,000 --> 00:07:46,000
OPENAI_API_KEY.

141
00:07:46,000 --> 00:07:47,000
Okay.

142
00:07:47,000 --> 00:07:49,000
So this is what I'm actually able to get.

143
00:07:49,000 --> 00:07:54,000
So once I execute this that basically means my environment variable has been loaded okay.
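One caveat with the pattern described here: os.getenv returns None when the variable is missing, and assigning None into os.environ raises a TypeError. A small sketch of a guard follows; require_env and DEMO_OPENAI_KEY are hypothetical names used only for illustration.
```python
import os
def require_env(name):
    """Return the value of an environment variable, or fail with a clear message."""
    value = os.getenv(name)
    if not value:
        raise RuntimeError(f"{name} is not set; add it to your .env file or shell environment")
    return value
# Demo with a placeholder variable so the sketch runs without a real key.
os.environ["DEMO_OPENAI_KEY"] = "sk-placeholder"
demo_key = require_env("DEMO_OPENAI_KEY")
# In the tutorial's setting the call would be require_env("OPENAI_API_KEY").
```
Failing early like this tends to give a clearer error than a rejected API request later on.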

144
00:07:54,000 --> 00:07:58,000
Now in order to use the embedding technique.

145
00:07:58,000 --> 00:08:00,000
Now what exactly is embedding techniques.

146
00:08:00,000 --> 00:08:09,000
Here we are converting text into vectors, okay.

147
00:08:09,000 --> 00:08:12,000
That is what we are specifically doing over here okay.

148
00:08:12,000 --> 00:08:14,000
We are converting the text into vectors.

149
00:08:14,000 --> 00:08:23,000
So for using the OpenAI embedding I will go to LangChain: from langchain_openai I'm going

150
00:08:23,000 --> 00:08:27,000
to import OpenAIEmbeddings.

151
00:08:27,000 --> 00:08:28,000
Okay.

152
00:08:28,000 --> 00:08:30,000
So this, sorry, OpenAIEmbeddings.

153
00:08:31,000 --> 00:08:33,000
Now this is what I'm actually going to use okay.

154
00:08:34,000 --> 00:08:39,000
Now in OpenAI embeddings you have different models. In order to check them out,

155
00:08:39,000 --> 00:08:42,000
What I will do, I will just go back to my documentation.

156
00:08:42,000 --> 00:08:48,000
And here you will be able to see all the models, what all models are specifically there and all.

157
00:08:48,000 --> 00:08:51,000
So here you can see documentation okay.

158
00:08:51,000 --> 00:08:58,000
So in the documentation there are a lot of different functionalities available, right.

159
00:08:58,000 --> 00:09:04,000
And we will also be using OpenAI later on to create many gen AI projects.

160
00:09:04,000 --> 00:09:06,000
So here you can see you have models right.

161
00:09:06,000 --> 00:09:11,000
With respect to model overview, you have all the specific models like GPT-4o, GPT-4 Turbo,

162
00:09:11,000 --> 00:09:13,000
GPT-4, or GPT-3.5.

163
00:09:13,000 --> 00:09:17,000
If you want to work with images, that is DALL-E, then you have Whisper for audio and all.

164
00:09:17,000 --> 00:09:21,000
But right now I'm focusing more on embeddings, so I'll click on this.

165
00:09:21,000 --> 00:09:25,000
So see a set of models that converts text into a numerical form.

166
00:09:25,000 --> 00:09:27,000
Or I can also say it as vectors okay.

167
00:09:27,000 --> 00:09:29,000
And that is what I have written over here.

168
00:09:29,000 --> 00:09:32,000
If I want to convert the text into vectors okay.

169
00:09:32,000 --> 00:09:37,000
So I'll go ahead and click on this right later on we'll be seeing all this specific models once we develop

170
00:09:37,000 --> 00:09:38,000
an application.

171
00:09:38,000 --> 00:09:42,000
So here you can see in embeddings you have these models.

172
00:09:42,000 --> 00:09:46,000
One is text-embedding-3-large, right.

173
00:09:47,000 --> 00:09:51,000
Then text-embedding-3-small and text-embedding-ada-002.

174
00:09:51,000 --> 00:09:55,000
So here you can see this is the most capable embedding model for both English and non-English tasks.

175
00:09:55,000 --> 00:09:58,000
The output dimension is 3072.

176
00:09:58,000 --> 00:10:00,000
Increased performance over second generation.

177
00:10:00,000 --> 00:10:01,000
Other embedding model.

178
00:10:01,000 --> 00:10:04,000
Most capable second generation embedding model.

179
00:10:04,000 --> 00:10:09,000
So what we will do is that we'll try to just use this, uh, and we'll call this particular model for

180
00:10:09,000 --> 00:10:10,000
our embedding purpose.

181
00:10:10,000 --> 00:10:11,000
Okay.

182
00:10:11,000 --> 00:10:13,000
So look quickly, let's go over here.

183
00:10:13,000 --> 00:10:16,000
And here I have initialized OpenAI embedding.

184
00:10:16,000 --> 00:10:19,000
Now I'll call OpenAI embeddings.

185
00:10:19,000 --> 00:10:21,000
And let's go ahead and call my model.

186
00:10:21,000 --> 00:10:29,000
The model is nothing but text-embedding-3-large, okay.

187
00:10:29,000 --> 00:10:33,000
So this is the embedding model that I'm actually going to use.

188
00:10:33,000 --> 00:10:34,000
Okay.

189
00:10:34,000 --> 00:10:36,000
So I will just go ahead and execute this.

190
00:10:36,000 --> 00:10:42,000
So if you go ahead and execute this embedding, it will show that it is nothing but an OpenAIEmbeddings object,

191
00:10:42,000 --> 00:10:43,000
okay.

192
00:10:43,000 --> 00:10:47,000
And here you can see all the other information at this particular memory location and some of the default

193
00:10:47,000 --> 00:10:49,000
parameters that have been assigned over here.

194
00:10:50,000 --> 00:10:52,000
Now let's test this OpenAI embedding.

195
00:10:52,000 --> 00:10:55,000
So let's say that I am going to consider a text.

196
00:10:55,000 --> 00:11:05,000
And this text will be like: This is a tutorial on OpenAI embedding.

197
00:11:05,000 --> 00:11:05,000
All right.

198
00:11:05,000 --> 00:11:07,000
OpenAI embedding okay.

199
00:11:08,000 --> 00:11:14,000
Now remember this OpenAI embedding is not completely for free because, uh, there will be some charges

200
00:11:14,000 --> 00:11:16,000
that will be applied for this particular embedding.

201
00:11:16,000 --> 00:11:17,000
Okay.

202
00:11:17,000 --> 00:11:21,000
But anyhow it will be a minimal charge to test our application and all.

203
00:11:21,000 --> 00:11:26,000
Okay, so here what I'm actually going to do, I'll take this particular text and I'm going

204
00:11:26,000 --> 00:11:28,000
to say hey embeddings dot.

205
00:11:28,000 --> 00:11:31,000
And I'm going to probably convert this text into vectors.

206
00:11:31,000 --> 00:11:34,000
So here I'm actually going to go ahead and write text okay.

207
00:11:34,000 --> 00:11:37,000
So this basically becomes my text okay.

208
00:11:38,000 --> 00:11:39,000
Uh, my uh vectors.

209
00:11:39,000 --> 00:11:44,000
So in order to see my vectors I will just go ahead and see my query_result, okay.

210
00:11:44,000 --> 00:11:46,000
And I'll go ahead and execute it.

211
00:11:46,000 --> 00:11:49,000
So let's go ahead and see this query_result over here.

212
00:11:49,000 --> 00:11:51,000
So it is going to take this particular text.

213
00:11:51,000 --> 00:11:53,000
And it has converted this okay.

214
00:11:53,000 --> 00:11:54,000
Converted this into a vector.

215
00:11:55,000 --> 00:11:57,000
Now let's see the dimension for this okay.

216
00:11:57,000 --> 00:12:02,000
So if I go ahead and write query_result[0].

217
00:12:03,000 --> 00:12:05,000
So this is the first value of my vector that is visible.

218
00:12:05,000 --> 00:12:06,000
Let me do one thing.

219
00:12:06,000 --> 00:12:10,000
Let me just go ahead and write len(query_result).

220
00:12:10,000 --> 00:12:13,000
So here you can see 3072 dimensions are there.

221
00:12:13,000 --> 00:12:17,000
And that is what we got from the documentation.

222
00:12:17,000 --> 00:12:18,000
Right.

223
00:12:18,000 --> 00:12:25,000
So what it does is that it takes this entire sentence and converts it into a vector of 3072 dimensions.

224
00:12:25,000 --> 00:12:28,000
With all these values, you'll be able to see all these values over here.

225
00:12:28,000 --> 00:12:29,000
Okay.

226
00:12:29,000 --> 00:12:31,000
And that is what is all about.

227
00:12:31,000 --> 00:12:35,000
You know how it converts a text into vectors, right?

228
00:12:36,000 --> 00:12:40,000
You see, you can definitely do this in multiple different ways also.

229
00:12:40,000 --> 00:12:41,000
Okay.

230
00:12:41,000 --> 00:12:45,000
And what I can also do is set my own dimensions if I want.

231
00:12:45,000 --> 00:12:49,000
Let's say that I don't want to go with 3072, which is the default.

232
00:12:49,000 --> 00:12:50,000
I can just copy the same thing.

233
00:12:50,000 --> 00:12:51,000
Okay.

234
00:12:51,000 --> 00:12:56,000
Let's say I will go ahead and call this embeddings again.

235
00:12:56,000 --> 00:12:57,000
Okay.

236
00:12:57,000 --> 00:12:59,000
I'll paste it over here.

237
00:12:59,000 --> 00:13:03,000
Now inside this embeddings what I will write I will go ahead and say, hey, this is my dimension,

238
00:13:03,000 --> 00:13:04,000
okay?

239
00:13:04,000 --> 00:13:06,000
This is my dimension.

240
00:13:06,000 --> 00:13:10,000
And this time I will say, hey, just convert this into 1024 dimensions.

241
00:13:10,000 --> 00:13:10,000
Right.

242
00:13:10,000 --> 00:13:17,000
And this one I will just go ahead and call embeddings_1024, okay.

243
00:13:17,000 --> 00:13:21,000
So once I execute it you will be able to see this over here.

244
00:13:21,000 --> 00:13:26,000
And embeddings_1024.

245
00:13:26,000 --> 00:13:29,000
So this is my, uh, embedding.

246
00:13:29,000 --> 00:13:29,000
Okay.

247
00:13:29,000 --> 00:13:32,000
Now I'll do or execute the same code over here.

248
00:13:32,000 --> 00:13:35,000
So let me just go ahead and paste it over here.

249
00:13:35,000 --> 00:13:36,000
Right.

250
00:13:36,000 --> 00:13:38,000
So this is the sentence that I'm actually going to use.

251
00:13:39,000 --> 00:13:39,000
Right.

252
00:13:39,000 --> 00:13:40,000
1024.

253
00:13:40,000 --> 00:13:43,000
And here let me just go ahead and write query result.

254
00:13:43,000 --> 00:13:50,000
So here I'll just go ahead and write len(query_result), because it needs to show me 1024 dimensions.

255
00:13:50,000 --> 00:13:54,000
So here you can see I'm able to get 1024 dimensions.

256
00:13:54,000 --> 00:13:54,000
Right.

257
00:13:54,000 --> 00:13:56,000
And if you want to see the result.

258
00:13:56,000 --> 00:14:00,000
So these will be your vectors for that same text.
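A sketch of the two embedding calls walked through above, assuming the method called after "embeddings dot" is embed_query and that langchain-openai is installed with OPENAI_API_KEY set. The names embed_text, make_openai_embedder, and FakeEmbedder are hypothetical; FakeEmbedder is an offline stand-in so the sketch runs without network access or cost.
```python
def embed_text(embedder, text):
    """Embed one string with any object exposing embed_query; return (vector, dimension)."""
    vector = embedder.embed_query(text)
    return vector, len(vector)
def make_openai_embedder(dimensions=None):
    """Build the real embedder. Imported lazily: needs langchain-openai and OPENAI_API_KEY."""
    from langchain_openai import OpenAIEmbeddings
    if dimensions is None:
        return OpenAIEmbeddings(model="text-embedding-3-large")
    return OpenAIEmbeddings(model="text-embedding-3-large", dimensions=dimensions)
class FakeEmbedder:
    """Offline stand-in that returns dummy vectors of a fixed size."""
    def __init__(self, size):
        self.size = size
    def embed_query(self, text):
        # Deterministic dummy values; real embeddings carry semantic meaning.
        return [float((len(text) + i) % 7) for i in range(self.size)]
text = "This is a tutorial on OpenAI embedding"
_, default_dim = embed_text(FakeEmbedder(3072), text)
_, reduced_dim = embed_text(FakeEmbedder(1024), text)
# With a key configured, swap in the real embedder (this call is billed):
# vector, dim = embed_text(make_openai_embedder(dimensions=1024), text)
```
Passing the embedder in as an argument keeps the paid API call in one place and lets the rest of the code be exercised offline.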

259
00:14:00,000 --> 00:14:00,000
Okay.

260
00:14:01,000 --> 00:14:08,000
So this was a simple way of converting any text into vectors using embeddings.

261
00:14:08,000 --> 00:14:09,000
Now let me do one thing.

262
00:14:09,000 --> 00:14:13,000
Let me just go to one of my examples over here.

263
00:14:13,000 --> 00:14:15,000
Recursive character text splitter.

264
00:14:15,000 --> 00:14:15,000
Okay.

265
00:14:15,000 --> 00:14:20,000
So uh, let's say I will be taking one example okay.

266
00:14:20,000 --> 00:14:23,000
And I'll try to convert that entire thing into.

267
00:14:23,000 --> 00:14:27,000
So let's take this speech.txt, okay. I will copy this.

268
00:14:27,000 --> 00:14:30,000
I'll put it in my embedding okay.

269
00:14:30,000 --> 00:14:35,000
And here I'm going to just copy this entire thing paste it okay.

270
00:14:36,000 --> 00:14:40,000
So I'll just show you for an entire document how you can actually do so.

271
00:14:40,000 --> 00:14:43,000
Once this basically comes, this will basically be my document.

272
00:14:43,000 --> 00:14:44,000
Okay.

273
00:14:44,000 --> 00:14:45,000
Then what do we do?

274
00:14:45,000 --> 00:14:51,000
After we get the document, here I can say loader.load().

275
00:14:51,000 --> 00:14:58,000
Then, with respect to this loader.load(), I will just go ahead and split that recursively.

276
00:14:58,000 --> 00:15:01,000
I'll split that with my recursive character text splitter.

277
00:15:01,000 --> 00:15:01,000
Okay.

278
00:15:01,000 --> 00:15:04,000
So I'm just going to copy and paste it over here.

279
00:15:04,000 --> 00:15:04,000
Okay.

280
00:15:04,000 --> 00:15:06,000
I'm going to take this particular documents.

281
00:15:06,000 --> 00:15:08,000
And this basically becomes my 500 documents.

282
00:15:08,000 --> 00:15:11,000
So here is all the text that I have.

283
00:15:11,000 --> 00:15:18,000
Now once I use this code over here, what I am actually doing, I'm taking this entire documents, and

284
00:15:18,000 --> 00:15:22,000
I've applied the recursive character text splitter and converted them into chunks of documents.

285
00:15:22,000 --> 00:15:23,000
This is perfect.

286
00:15:23,000 --> 00:15:23,000
Okay.

287
00:15:24,000 --> 00:15:29,000
Now, what I really need to do is take all these documents and convert them into

288
00:15:29,000 --> 00:15:32,000
vectors, and finally store them in a vector DB.

289
00:15:32,000 --> 00:15:32,000
Right.

290
00:15:32,000 --> 00:15:33,000
Vector store db.

291
00:15:34,000 --> 00:15:37,000
Now, uh, this is really, really important.

292
00:15:37,000 --> 00:15:38,000
Please focus on this.

293
00:15:38,000 --> 00:15:42,000
So what I'm actually doing is that I will combine both of these steps.

294
00:15:42,000 --> 00:15:48,000
First, I will take this entire chunk of documents, convert it into a vector and then store it into

295
00:15:48,000 --> 00:15:48,000
the vector DB.

296
00:15:48,000 --> 00:15:50,000
That is what I'm planning to do.

297
00:15:50,000 --> 00:15:51,000
Okay.

298
00:15:51,000 --> 00:15:53,000
now how do I actually do it?

299
00:15:53,000 --> 00:15:55,000
I'll just give you one example.

300
00:15:55,000 --> 00:15:55,000
Okay.

301
00:15:55,000 --> 00:15:58,000
And I'll pick one vector database that I'm actually going to use.

302
00:15:58,000 --> 00:16:03,000
So here what I'm actually going to do I'm going to combine vector embedding since we have a list of

303
00:16:03,000 --> 00:16:05,000
documents okay.

304
00:16:05,000 --> 00:16:10,000
Since I have a list of documents I will combine vector embedding and vector store.

305
00:16:11,000 --> 00:16:11,000
Okay.

306
00:16:12,000 --> 00:16:18,000
So let's say for this particular example I'm going to use a vector store DB which is called

307
00:16:18,000 --> 00:16:19,000
Chroma.

308
00:16:19,000 --> 00:16:19,000
Okay.

309
00:16:19,000 --> 00:16:24,000
We'll discuss about vector store DB uh, more in depth in the upcoming videos.

310
00:16:24,000 --> 00:16:27,000
First of all, I just want to give you a simple, basic example.

311
00:16:27,000 --> 00:16:34,000
Let's consider that I'm going to use a vector store DB, from vector stores, which is called

312
00:16:34,000 --> 00:16:36,000
Chroma.

313
00:16:36,000 --> 00:16:39,000
Okay, so this will be my vector store DB okay.

314
00:16:39,000 --> 00:16:41,000
Again, it is open source.

315
00:16:41,000 --> 00:16:42,000
You can use this okay.

316
00:16:42,000 --> 00:16:49,000
Now my main aim is that with the help of vector embedding, first of all I need to convert all these

317
00:16:49,000 --> 00:16:51,000
documents text into vectors.

318
00:16:51,000 --> 00:16:53,000
Okay that is the first step.

319
00:16:53,000 --> 00:16:55,000
And then finally store it in the chroma db.

320
00:16:55,000 --> 00:16:57,000
So here I will go ahead and write db is equal to.

321
00:16:58,000 --> 00:17:05,000
Let's call this particular Chroma, okay, dot, I will just go ahead and call from_documents since we are

322
00:17:05,000 --> 00:17:06,000
working with documents.

323
00:17:06,000 --> 00:17:06,000
Okay.

324
00:17:07,000 --> 00:17:12,000
Inside this vector store DB, the first parameter that goes in is nothing but

325
00:17:12,000 --> 00:17:13,000
your final documents, okay.

326
00:17:14,000 --> 00:17:18,000
And the second parameter is nothing but your embedding technique that you are going to use.

327
00:17:18,000 --> 00:17:22,000
Let's say I want to go ahead and use this embedding technique of 1024.

328
00:17:22,000 --> 00:17:22,000
Okay.

329
00:17:22,000 --> 00:17:23,000
Where did it go.

330
00:17:23,000 --> 00:17:24,000
Here.

331
00:17:24,000 --> 00:17:26,000
Let's say I'm going to use this embedding technique.

332
00:17:26,000 --> 00:17:30,000
So I'm just going to copy it and paste it over here.

333
00:17:30,000 --> 00:17:35,000
So this basically becomes my vector store DB okay.

334
00:17:35,000 --> 00:17:39,000
So if I go ahead and see this DB now it will get executed.

335
00:17:39,000 --> 00:17:40,000
And I'm getting an error.

336
00:17:40,000 --> 00:17:45,000
Could not import chromadb python package, because I need to go ahead and install chromadb, because this

337
00:17:45,000 --> 00:17:46,000
is a vector store DB.

338
00:17:47,000 --> 00:17:49,000
So I'll go ahead and open my terminal.

339
00:17:49,000 --> 00:17:55,000
And first of all I'll just go ahead and update my requirements.txt. And let's go ahead and quickly

340
00:17:55,000 --> 00:17:56,000
install this.

341
00:17:56,000 --> 00:17:59,000
Chroma DB is another vector store, and it comes in the form of a library.

342
00:18:00,000 --> 00:18:03,000
We will first of all go ahead and install this, okay.

343
00:18:03,000 --> 00:18:05,000
And then only we'll be able to use this.

344
00:18:05,000 --> 00:18:11,000
But here what we are doing is that we are combining vector store DB along with vector embedding techniques.

345
00:18:11,000 --> 00:18:12,000
And that is the code.

346
00:18:12,000 --> 00:18:13,000
We have written it over here.

347
00:18:13,000 --> 00:18:14,000
Right.

348
00:18:14,000 --> 00:18:17,000
So it will take some time to do the installation.

349
00:18:17,000 --> 00:18:19,000
So here again, let me repeat it.

350
00:18:19,000 --> 00:18:21,000
First of all we are importing chroma.

351
00:18:21,000 --> 00:18:24,000
We have already created our embeddings_1024.

352
00:18:24,000 --> 00:18:27,000
This is nothing but OpenAIEmbeddings.

353
00:18:27,000 --> 00:18:32,000
And in Chroma DB what I'm actually going to do, I'm going to use this function:

354
00:18:32,000 --> 00:18:33,000
from_documents.

355
00:18:33,000 --> 00:18:35,000
I'll give all my final documents.

356
00:18:35,000 --> 00:18:39,000
And on all these final documents I need to apply this particular embedding technique.

357
00:18:39,000 --> 00:18:42,000
And finally we get this particular vector store DB okay.

358
00:18:43,000 --> 00:18:45,000
Again based on your internet connection it will take some time.

359
00:18:45,000 --> 00:18:50,000
And again, this is an open-source vector store DB which you can also use.

360
00:18:50,000 --> 00:18:50,000
Yes.

361
00:18:50,000 --> 00:18:56,000
For deployment, I will also be showing you some vector databases that I will be

362
00:18:56,000 --> 00:19:00,000
deploying in the cloud, and you will be able to call those as well, okay.

363
00:19:00,000 --> 00:19:02,000
But just get an idea here.

364
00:19:02,000 --> 00:19:08,000
Instead of just using a vector embedding to convert the document text into vectors, it is also

365
00:19:08,000 --> 00:19:12,000
good that you get to know how to store them in a vector database as we go ahead.

366
00:19:13,000 --> 00:19:16,000
Right now, I'm just going to discuss one vector database called Chroma.

367
00:19:16,000 --> 00:19:20,000
But as we go ahead, we'll be discussing more about different vector databases.

368
00:19:20,000 --> 00:19:22,000
So yes this installation has taken place.

369
00:19:22,000 --> 00:19:25,000
Now I think it should get executed.

370
00:19:25,000 --> 00:19:30,000
So now if I'm executing it you'll be able to see that I'll get the DB and it says that, hey, it is

371
00:19:30,000 --> 00:19:34,000
a vector store, and this is my Chroma object at this particular memory location.

372
00:19:34,000 --> 00:19:40,000
Now, how do I query this particular vector database? Okay.

373
00:19:40,000 --> 00:19:44,000
So first of all I'll just go ahead and open this particular speech.txt, okay.

374
00:19:44,000 --> 00:19:50,000
Let's say I will be copying this entire content and I'll be searching okay.

375
00:19:50,000 --> 00:19:51,000
Over here.

376
00:19:51,000 --> 00:19:53,000
So here I will go ahead and write.

377
00:19:53,000 --> 00:19:55,000
This will basically be my query.

378
00:19:55,000 --> 00:19:55,000
Okay.

379
00:19:55,000 --> 00:20:03,000
I've just used some text over there and I'll say hey use this DB and do some similarity search.

380
00:20:04,000 --> 00:20:05,000
Okay.

381
00:20:05,000 --> 00:20:07,000
similarity_search.

382
00:20:07,000 --> 00:20:10,000
And here I'm going to basically use this particular query okay.

383
00:20:10,000 --> 00:20:17,000
And here I'm going to save this entire result in one variable, which is called

384
00:20:17,000 --> 00:20:24,000
retrieved_results, equal to this one.

385
00:20:24,000 --> 00:20:24,000
Okay.

386
00:20:25,000 --> 00:20:32,000
And then I will go ahead and print it.

387
00:20:33,000 --> 00:20:38,000
So I'll just go and write retrieved_results, okay.

388
00:20:39,000 --> 00:20:41,000
And I'll just go ahead and execute it.

389
00:20:41,000 --> 00:20:43,000
So here you can see I'm able to get the response.

390
00:20:43,000 --> 00:20:47,000
I'm doing a similarity search right on a specific query.

391
00:20:47,000 --> 00:20:52,000
I want to retrieve some information from this particular vector store DB. I've taken some text from that

392
00:20:52,000 --> 00:20:56,000
and I'm searching by using the similarity_search method.

393
00:20:56,000 --> 00:21:00,000
And finally you'll be able to see I'm able to get that entire information right.

394
00:21:00,000 --> 00:21:02,000
All of this is quite easy to carry out.

395
00:21:02,000 --> 00:21:05,000
And finally you'll be seeing that entire text.

396
00:21:05,000 --> 00:21:11,000
The content has been searched, and the matching text is displayed over here.

397
00:21:11,000 --> 00:21:11,000
Right.

398
00:21:11,000 --> 00:21:15,000
And that is the power of a vector store, right? Now,

399
00:21:15,000 --> 00:21:22,000
I hope you are able to understand how step by step we have taken a text, we have converted that into

400
00:21:22,000 --> 00:21:23,000
vectors.

401
00:21:23,000 --> 00:21:30,000
And finally, with a document over here, like this .txt file, we used all the methods.

402
00:21:30,000 --> 00:21:32,000
First of all, we did the data ingestion.

403
00:21:32,000 --> 00:21:34,000
Second, we applied the text splitter.

404
00:21:34,000 --> 00:21:38,000
Then third, we applied vector embedding along with the vector store DB.

405
00:21:38,000 --> 00:21:41,000
And fourth we have also retrieved the results right.

406
00:21:41,000 --> 00:21:45,000
So here you'll be able to see what did we do.

407
00:21:45,000 --> 00:21:45,000
Over here.

408
00:21:45,000 --> 00:21:53,000
We retrieved the results by querying the vector store DB.

409
00:21:53,000 --> 00:21:55,000
And here, what technique have we applied?

410
00:21:55,000 --> 00:21:58,000
It is nothing but similarity search.
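As a rough sketch of what a similarity search does, the snippet below ranks stored vectors against a query vector and returns the closest texts. The function names, the three-dimensional vectors, and the chunk labels are all hypothetical, and cosine similarity is used only as one common choice; a real vector store may rank by a different distance metric.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_search(query_vector, stored, k=2):
    """Return the k stored texts whose vectors are most similar to the query vector."""
    ranked = sorted(
        stored,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:k]]


# Hypothetical (text, vector) pairs, as a vector store might hold them.
stored = [
    ("chunk about vector stores", [0.9, 0.1, 0.0]),
    ("chunk about text splitting", [0.1, 0.9, 0.0]),
    ("chunk about document loaders", [0.0, 0.2, 0.9]),
]

# Hypothetical embedding of the query text.
query_vector = [0.8, 0.2, 0.0]

retrieved_results = similarity_search(query_vector, stored, k=2)
print(retrieved_results)
```

In the session itself, the query text is embedded with the same embedding model that was used for the documents, which is what makes the comparison between query and chunks meaningful.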

411
00:21:58,000 --> 00:22:02,000
Okay, so I hope you are able to understand all these things.

412
00:22:02,000 --> 00:22:09,000
Now, in my next video, I am going to show you, with some open-source libraries, how you

413
00:22:09,000 --> 00:22:09,000
can actually do it.

414
00:22:09,000 --> 00:22:12,000
And then we'll also be discussing Hugging Face.

415
00:22:12,000 --> 00:22:13,000
Okay.

416
00:22:13,000 --> 00:22:15,000
So yeah, this was it.

417
00:22:15,000 --> 00:22:18,000
I hope you are quite excited to understand this entire session.

418
00:22:18,000 --> 00:22:20,000
I will see you all in the next video.

419
00:22:20,000 --> 00:22:20,000
Thank you.

