1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:04,000
So we are going to continue the discussion with respect to vector stores.

3
00:00:04,000 --> 00:00:07,000
In this video we are going to discuss Chroma DB.

4
00:00:07,000 --> 00:00:10,000
So what exactly is Chroma DB?

5
00:00:10,000 --> 00:00:15,000
It is an AI-native open-source vector database focused on developer productivity and happiness.

6
00:00:15,000 --> 00:00:17,000
Chroma is licensed under Apache 2.0.

7
00:00:17,000 --> 00:00:19,000
So let's go ahead and see.

8
00:00:19,000 --> 00:00:22,000
Again, I will be showing you with the help of the code itself.

9
00:00:22,000 --> 00:00:28,000
So first of all, you need to go ahead and install Chroma.

10
00:00:28,000 --> 00:00:33,000
So in order to install Chroma, you just need to make sure that you update your

11
00:00:34,000 --> 00:00:36,000
requirements.txt.

12
00:00:36,000 --> 00:00:41,000
So for this, what I'm actually going to do is quickly go ahead and write chroma

13
00:00:41,000 --> 00:00:41,000
over here.

14
00:00:42,000 --> 00:00:42,000
Okay.

15
00:00:42,000 --> 00:00:44,000
Or sorry, chromadb.

16
00:00:44,000 --> 00:00:50,000
And then I'll just go ahead and install from my requirements.txt: pip install -r requirements.txt.

17
00:00:50,000 --> 00:00:51,000
Let's see.

18
00:00:52,000 --> 00:00:56,000
You'll be able to see that, I think, I may have already done this installation, so it has been installed.

19
00:00:56,000 --> 00:00:57,000
Okay.

20
00:00:57,000 --> 00:01:02,000
So this is the first thing. Please make sure that, one by one, you are able to install all

21
00:01:02,000 --> 00:01:03,000
these packages till now.

22
00:01:03,000 --> 00:01:04,000
Okay.

23
00:01:04,000 --> 00:01:12,000
Now the first step: again, we will be building a sample vector DB.

24
00:01:13,000 --> 00:01:13,000
Okay.

25
00:01:13,000 --> 00:01:19,000
And then first of all I will go ahead and write from langchain underscore

26
00:01:20,000 --> 00:01:21,000
chroma.

27
00:01:21,000 --> 00:01:22,000
Okay.

28
00:01:22,000 --> 00:01:25,000
import Chroma.

29
00:01:25,000 --> 00:01:25,000
Okay.

30
00:01:25,000 --> 00:01:30,000
Now if you go ahead and see the previous version of Chroma DB, right?

31
00:01:30,000 --> 00:01:35,000
That time we used to directly install chromadb and use it.

32
00:01:35,000 --> 00:01:35,000
Right.

33
00:01:35,000 --> 00:01:41,000
But now in LangChain you have this new library that has come, langchain_chroma.

34
00:01:41,000 --> 00:01:44,000
So you need to also install this right.

35
00:01:44,000 --> 00:01:46,000
No need to install chromadb separately.

36
00:01:46,000 --> 00:01:46,000
Now.

37
00:01:46,000 --> 00:01:51,000
Now you have this entire library, where you can actually use langchain_chroma, okay?

38
00:01:51,000 --> 00:01:58,000
So what I will do is again go ahead and just clear the screen, okay?

39
00:01:58,000 --> 00:02:02,000
And then let's go ahead and install this requirements.txt.

40
00:02:02,000 --> 00:02:07,000
So here you can see, now you need to install this langchain_chroma.

41
00:02:07,000 --> 00:02:10,000
This will be the library that I'll be using.

42
00:02:10,000 --> 00:02:14,000
And right now you can see the recent version 0.11 1.1.

43
00:02:14,000 --> 00:02:19,000
Please make sure that you keep this version noted also, so that you will be able to understand

44
00:02:19,000 --> 00:02:21,000
which version you have actually created.

45
00:02:21,000 --> 00:02:21,000
Okay.

46
00:02:21,000 --> 00:02:24,000
Now, from langchain_chroma import Chroma.

47
00:02:24,000 --> 00:02:26,000
So I'll just go ahead and import this.

48
00:02:26,000 --> 00:02:27,000
Let's see whether it is working fine.

49
00:02:27,000 --> 00:02:30,000
So yes, it has been imported successfully, okay?

50
00:02:30,000 --> 00:02:34,000
Everything remaining will be almost the same as how we used FAISS.

51
00:02:34,000 --> 00:02:39,000
So here I'm going to import from langchain_community.document_loaders; I'm going to use TextLoader,

52
00:02:39,000 --> 00:02:40,000
then use OllamaEmbeddings.

53
00:02:40,000 --> 00:02:42,000
You can also use OpenAI embeddings if you want.

54
00:02:42,000 --> 00:02:47,000
Then along with this I'm also going to use langchain_text_splitters with

55
00:02:47,000 --> 00:02:49,000
RecursiveCharacterTextSplitter, okay?

56
00:02:49,000 --> 00:02:54,000
Now quickly let's go ahead and read this particular speech.txt.

57
00:02:54,000 --> 00:02:59,000
And again I'm going to go ahead and write data is equal to loader.load(), right, loader.load().

58
00:02:59,000 --> 00:03:01,000
So here you can see TextLoader.

59
00:03:01,000 --> 00:03:04,000
It is there. Why did it not get executed?

60
00:03:04,000 --> 00:03:06,000
Let's see I have to execute this.

61
00:03:06,000 --> 00:03:11,000
Then let's go ahead and execute this. I think I did not execute this part, okay?

62
00:03:11,000 --> 00:03:13,000
So here you have this entire speech.txt.

63
00:03:13,000 --> 00:03:15,000
And here is my entire data.

64
00:03:15,000 --> 00:03:18,000
Now what we'll do we'll basically do the split.

65
00:03:18,000 --> 00:03:24,000
And for splitting we have to use RecursiveCharacterTextSplitter. I've used chunk size as 500,

66
00:03:24,000 --> 00:03:26,000
chunk overlap as zero.

67
00:03:26,000 --> 00:03:28,000
Okay, it is up to you how much overlap you want to use.

68
00:03:28,000 --> 00:03:28,000
Use it.

69
00:03:28,000 --> 00:03:30,000
Then we will go ahead and write

70
00:03:30,000 --> 00:03:35,000
text_splitter.split_documents(data), which is equal to splits, okay?

71
00:03:35,000 --> 00:03:39,000
So here you can see, this basically is my splits, okay?
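
[Editor's sketch] To make the two parameters mentioned here concrete, the following is a minimal, stdlib-only illustration of what chunk size and chunk overlap control. It is a plain fixed-window splitter, not the recursive, separator-aware logic of RecursiveCharacterTextSplitter; the function name split_text and the sample string are hypothetical.

```python
def split_text(text: str, chunk_size: int = 500, chunk_overlap: int = 0) -> list[str]:
    """Cut text into windows of at most chunk_size characters.

    Consecutive windows share chunk_overlap characters.
    Simplified stand-in: a real recursive splitter also respects separators.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "x" * 1200  # hypothetical stand-in for the contents of speech.txt
no_overlap = split_text(sample, chunk_size=500, chunk_overlap=0)
with_overlap = split_text(sample, chunk_size=500, chunk_overlap=100)
print([len(c) for c in no_overlap])
print([len(c) for c in with_overlap])
```

With an overlap of zero, as used in the video, no characters are repeated between neighbouring chunks; a non-zero overlap repeats the tail of one chunk at the head of the next.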

72
00:03:39,000 --> 00:03:43,000
Now quickly let's go ahead and create my vector store along with the embedding.

73
00:03:43,000 --> 00:03:47,000
So in order to do that, first of all I will go ahead and write:

74
00:03:47,000 --> 00:03:51,000
embedding is equal to OllamaEmbeddings(), okay?

75
00:03:51,000 --> 00:04:00,000
And then I'm just going to write vectordb is equal to Chroma.db, or sorry, Chroma.from_documents, okay?

76
00:04:00,000 --> 00:04:04,000
So again it will be the same thing as how we discussed for FAISS.

77
00:04:04,000 --> 00:04:11,000
And inside this I will go ahead and give my data, that is, splits, okay?

78
00:04:11,000 --> 00:04:16,000
Along with this, I'm going to just go ahead and apply my embeddings, which will be equal to this specific

79
00:04:16,000 --> 00:04:17,000
embedding that I have initialized.

80
00:04:17,000 --> 00:04:21,000
Remember, the first parameter is nothing but documents, okay?

81
00:04:22,000 --> 00:04:28,000
So this basically becomes my vector store, vectordb over here.

82
00:04:28,000 --> 00:04:30,000
Or you can also say vector store db.

83
00:04:30,000 --> 00:04:35,000
And once I execute it, this is basically going to be applied to all of these documents, okay,

84
00:04:35,000 --> 00:04:38,000
which we have already split.

85
00:04:38,000 --> 00:04:42,000
It is going to take some amount of time since this is getting executed locally.

86
00:04:42,000 --> 00:04:42,000
Okay.

87
00:04:42,000 --> 00:04:50,000
Once I do this, let this get executed, then I'm just going to query this particular DB.

88
00:04:50,000 --> 00:04:50,000
Right.

89
00:04:50,000 --> 00:04:53,000
So for that again how do we query it.

90
00:04:53,000 --> 00:04:56,000
By using the same technique that we used in FAISS.

91
00:04:56,000 --> 00:05:00,000
So let's say here I say: what does the speaker believe is the main reason the

92
00:05:01,000 --> 00:05:02,000
United States should enter the war?

93
00:05:02,000 --> 00:05:06,000
So vectordb.similarity_search(query).

94
00:05:06,000 --> 00:05:09,000
So this basically becomes my docs, and docs[0]

95
00:05:09,000 --> 00:05:10,000
I will go ahead and print.

96
00:05:10,000 --> 00:05:15,000
Along with that I'll also go ahead and print my page_content, okay?

97
00:05:15,000 --> 00:05:15,000
Content.

98
00:05:15,000 --> 00:05:22,000
So if I go ahead and execute this, you'll be able to see that I'm getting the entire response, okay?

99
00:05:23,000 --> 00:05:31,000
Now, like how we made sure that we can also save this locally, we will also be able to save the Chroma vector

100
00:05:31,000 --> 00:05:33,000
store DB locally itself.

101
00:05:33,000 --> 00:05:33,000
Right.

102
00:05:33,000 --> 00:05:38,000
So now let's go ahead and save to the disk okay.

103
00:05:38,000 --> 00:05:39,000
Saving to the disk.

104
00:05:39,000 --> 00:05:44,000
And here again I'll be using the same command like this okay.

105
00:05:44,000 --> 00:05:48,000
This will be my vector store DB okay.

106
00:05:48,000 --> 00:05:54,000
Here what I can do is that, along with these entire documents that I'm actually providing,

107
00:05:54,000 --> 00:06:00,000
with the embeddings and all, what I will do is go ahead and use one persist_directory.

108
00:06:00,000 --> 00:06:03,000
So here I have to go ahead and write my persist_directory.

109
00:06:03,000 --> 00:06:05,000
And I'll give my directory location.

110
00:06:05,000 --> 00:06:09,000
That is nothing but chroma_db, okay?

111
00:06:09,000 --> 00:06:12,000
And finally I can use this.

112
00:06:13,000 --> 00:06:18,000
So if I go ahead and execute this, it's going to take some amount of time.

113
00:06:18,000 --> 00:06:22,000
And again here you'll be able to see a folder is basically getting created.

114
00:06:22,000 --> 00:06:26,000
And this internally creates a SQLite db.

115
00:06:26,000 --> 00:06:27,000
Okay, SQLite3.

116
00:06:27,000 --> 00:06:30,000
So this db is specifically used inside.

117
00:06:30,000 --> 00:06:33,000
That basically means every vector is stored inside this particular db.
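
[Editor's sketch] The idea of saving to a directory and reading it back later can be illustrated with Python's built-in sqlite3 module. This is only a sketch of file-backed persistence in general: the table layout, the file name vectors.sqlite3, and the helper names are hypothetical and are not Chroma's actual schema or API.

```python
import json
import os
import sqlite3
import tempfile


def save_vectors(db_path, rows):
    """Write (text, vector) pairs into a SQLite file, replacing earlier contents."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS chunks (text TEXT, vector TEXT)")
            conn.execute("DELETE FROM chunks")
            conn.executemany(
                "INSERT INTO chunks (text, vector) VALUES (?, ?)",
                [(text, json.dumps(vec)) for text, vec in rows],
            )
    finally:
        conn.close()


def load_vectors(db_path):
    """Reopen the file and read the stored pairs back in insertion order."""
    conn = sqlite3.connect(db_path)
    try:
        cur = conn.execute("SELECT text, vector FROM chunks ORDER BY rowid")
        return [(text, json.loads(vec)) for text, vec in cur.fetchall()]
    finally:
        conn.close()


persist_directory = tempfile.mkdtemp()  # stand-in for a folder such as ./chroma_db
db_path = os.path.join(persist_directory, "vectors.sqlite3")  # hypothetical file name
rows = [("first chunk", [0.1, 0.2]), ("second chunk", [0.3, 0.4])]
save_vectors(db_path, rows)
reloaded = load_vectors(db_path)
print(reloaded == rows)
```

Because everything lives in a file on disk, a later process can reopen the same path without recomputing any embeddings, which is the benefit of persisting that the video describes.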

118
00:06:33,000 --> 00:06:35,000
Now see, you are able to see one DB, right?

119
00:06:35,000 --> 00:06:37,000
This can be hosted anywhere you like right.

120
00:06:37,000 --> 00:06:42,000
So this is the main fundamental importance behind saving it to the disk.

121
00:06:42,000 --> 00:06:42,000
Okay.

122
00:06:42,000 --> 00:06:45,000
Or saving it in a file.

123
00:06:45,000 --> 00:06:47,000
That can be reused again.

124
00:06:47,000 --> 00:06:52,000
And that can be called. Now, in order to call this, what you can do is that you can just go ahead and

125
00:06:52,000 --> 00:06:54,000
write this particular code, see, Chroma.

126
00:06:55,000 --> 00:07:00,000
I'm just saying, hey, give the persist_directory, that is ./chroma_db; the embedding technique

127
00:07:00,000 --> 00:07:02,000
is nothing but my embedding over here.

128
00:07:02,000 --> 00:07:02,000
Right.

129
00:07:02,000 --> 00:07:03,000
Whatever embedding function is there.

130
00:07:04,000 --> 00:07:08,000
And I can again do the same similarity search.

131
00:07:08,000 --> 00:07:10,000
So let's say if I get my docs.

132
00:07:10,000 --> 00:07:14,000
So this will be my db2.similarity_search.

133
00:07:14,000 --> 00:07:16,000
You can do the similarity search by vectors.

134
00:07:16,000 --> 00:07:18,000
You can do it by anything as such.

135
00:07:18,000 --> 00:07:23,000
And then we will go ahead and print docs[0], okay?

136
00:07:23,000 --> 00:07:27,000
.page_content.

137
00:07:27,000 --> 00:07:31,000
So once I execute this here you will be able to get the response okay.

138
00:07:32,000 --> 00:07:37,000
If you want to get the similarity score, you can also go ahead and write this particular

139
00:07:37,000 --> 00:07:38,000
code like this.

140
00:07:38,000 --> 00:07:40,000
Like how we did it for FAISS.

141
00:07:40,000 --> 00:07:42,000
There is a function which is called similarity search,

142
00:07:42,000 --> 00:07:43,000
similarity_search_with_score.

143
00:07:43,000 --> 00:07:45,000
And here you will be able to see the docs.

144
00:07:45,000 --> 00:07:46,000
Okay.

145
00:07:47,000 --> 00:07:52,000
So here again, based on the Manhattan distance, you are able to get this information.
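
[Editor's sketch] The speaker describes the returned score as a Manhattan distance. The stdlib-only sketch below shows what ranking by that distance looks like on toy integer vectors, where a lower score means a closer match. It is not the internals of Chroma or LangChain, and which metric a given store actually uses depends on how it is configured; the function names and vectors here are hypothetical.

```python
def manhattan(a, b):
    """Sum of absolute coordinate differences (L1 distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def search_with_score(query_vec, store, k=4):
    """Return the k (text, distance) pairs closest to query_vec, nearest first."""
    scored = [(text, manhattan(query_vec, vec)) for text, vec in store]
    scored.sort(key=lambda pair: pair[1])  # smaller distance = more similar
    return scored[:k]


# Toy "embeddings": made-up integer vectors, not real model output.
store = [
    ("chunk about entering the war", [9, 1, 0]),
    ("chunk about the economy", [1, 8, 1]),
    ("chunk about peace", [5, 0, 5]),
]
query_vec = [10, 0, 0]
results = search_with_score(query_vec, store, k=2)
for text, score in results:
    print(score, text)
```

Since the score is a distance rather than a similarity, the best match is the one with the smallest number.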

146
00:07:52,000 --> 00:07:52,000
Okay.

147
00:07:52,000 --> 00:07:53,000
Perfect.

148
00:07:53,000 --> 00:07:56,000
Now, finally, again, we should not skip this option.

149
00:07:56,000 --> 00:07:59,000
Also, that is nothing but retriever option.

150
00:07:59,000 --> 00:08:06,000
Okay, now, in the case of the retriever option, I will take this vectordb.as_retriever(), okay?

151
00:08:06,000 --> 00:08:12,000
And then I'm just going to create this particular variable: retriever is equal to that.

152
00:08:12,000 --> 00:08:21,000
And let's go ahead and invoke this, okay, retriever.invoke, okay?

153
00:08:21,000 --> 00:08:27,000
And here I'm just going to give my query and get the first result, whichever it will be.

154
00:08:28,000 --> 00:08:28,000
Right.

155
00:08:28,000 --> 00:08:30,000
So this will basically be my first result.

156
00:08:30,000 --> 00:08:34,000
I can go ahead and also display my page_content, okay?

157
00:08:35,000 --> 00:08:39,000
page_content, okay?
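
[Editor's sketch] The retriever pattern described here is a thin wrapper: the store exposes a search method, and the retriever exposes a single invoke(query) call on top of it. The stdlib-only sketch below illustrates that shape. ToyStore and ToyRetriever are hypothetical classes, and the word-overlap ranking is a stand-in for embedding similarity, not what LangChain or Chroma does internally.

```python
class ToyStore:
    """Hypothetical in-memory store; ranks chunks by words shared with the query."""

    def __init__(self, chunks):
        self.chunks = list(chunks)

    def similarity_search(self, query, k=4):
        query_words = set(query.lower().split())

        def overlap(chunk):
            # Count of distinct words the chunk shares with the query.
            return len(query_words & set(chunk.lower().split()))

        ranked = sorted(self.chunks, key=overlap, reverse=True)
        return ranked[:k]

    def as_retriever(self, k=4):
        return ToyRetriever(self, k)


class ToyRetriever:
    """Wraps a store so callers only need invoke(query)."""

    def __init__(self, store, k):
        self.store = store
        self.k = k

    def invoke(self, query):
        return self.store.similarity_search(query, k=self.k)


store = ToyStore([
    "the economy grew last year",
    "the united states should enter the war",
    "peace talks were held in spring",
])
retriever = store.as_retriever(k=2)
results = retriever.invoke("why should the united states enter the war")
print(results[0])
```

The value of the wrapper is that downstream code only depends on invoke, so the underlying store can be swapped without changing the caller.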

158
00:08:40,000 --> 00:08:40,000
Perfect.

159
00:08:40,000 --> 00:08:43,000
This was mostly about it.

160
00:08:43,000 --> 00:08:48,000
You know, where we were discussing vector store DBs and all.

161
00:08:48,000 --> 00:08:52,000
And this is one example with respect to Chroma DB.

162
00:08:52,000 --> 00:08:59,000
We will be using this kind of vector store DB, or database, in our end-to-end

163
00:08:59,000 --> 00:08:59,000
projects.

164
00:08:59,000 --> 00:09:04,000
Just to give you an example, two of the most common ones that we use, which are completely

165
00:09:04,000 --> 00:09:05,000
open source, are FAISS and Chroma.

166
00:09:06,000 --> 00:09:11,000
As we go ahead, when we develop end-to-end projects, we'll be seeing new DBs like

167
00:09:11,000 --> 00:09:12,000
Cassandra DB.

168
00:09:12,000 --> 00:09:17,000
Astra DB. We will be talking about different DBs like the Pinecone vector store DB.

169
00:09:17,000 --> 00:09:23,000
So we will also be talking about those, where we will be creating some kind of vector store that will be hosted in some

170
00:09:23,000 --> 00:09:23,000
kind of cloud.

171
00:09:23,000 --> 00:09:25,000
So that is the reason I'm saying that.

172
00:09:25,000 --> 00:09:28,000
We'll discuss that when we develop our end-to-end projects.

173
00:09:28,000 --> 00:09:28,000
Right.

174
00:09:29,000 --> 00:09:31,000
So yes, this was it.

175
00:09:31,000 --> 00:09:32,000
I'll see you all in the next video.

176
00:09:32,000 --> 00:09:32,000
Thank you.

