1
00:00:00,000 --> 00:00:01,000
Hello guys.

2
00:00:01,000 --> 00:00:04,000
So we are going to continue our discussion of embedding techniques.

3
00:00:04,000 --> 00:00:08,000
In this video we are going to look at embedding techniques using Hugging Face.

4
00:00:08,000 --> 00:00:14,000
Hugging Face also has a lot of open-source models, specifically for embeddings, and also

5
00:00:14,000 --> 00:00:19,000
many different types of LLMs that you can use to create your generative

6
00:00:19,000 --> 00:00:20,000
AI application.

7
00:00:20,000 --> 00:00:21,000
So let us go ahead.

8
00:00:21,000 --> 00:00:27,000
First of all, go to the huggingface.co website.

9
00:00:27,000 --> 00:00:31,000
On this website you'll be able to see the models.

10
00:00:31,000 --> 00:00:36,000
There are many resources available here for all kinds of tasks.

11
00:00:36,000 --> 00:00:39,000
So first of all, go ahead and create your account.

12
00:00:39,000 --> 00:00:44,000
You will need it, because later on we are going to build a lot of end-to-end projects using Hugging Face.

13
00:00:44,000 --> 00:00:47,000
And the best thing about Hugging Face and LangChain is that they have integrated.

14
00:00:47,000 --> 00:00:52,000
They have created a library that integrates Hugging Face with LangChain itself.

15
00:00:52,000 --> 00:00:54,000
So you can call the LLMs hosted on Hugging Face from here.

16
00:00:54,000 --> 00:00:55,000
Okay.

17
00:00:55,000 --> 00:01:04,000
So first of all, in order to make this work, you need to create your own API key,

18
00:01:04,000 --> 00:01:04,000
right?

19
00:01:04,000 --> 00:01:09,000
Once you click on Settings here, you will be

20
00:01:09,000 --> 00:01:13,000
able to see something called Access Tokens.

21
00:01:13,000 --> 00:01:16,000
So go ahead and create your access token.

22
00:01:16,000 --> 00:01:19,000
Here you can see I have already created many different access tokens.

23
00:01:19,000 --> 00:01:21,000
You can just click on New token.

24
00:01:21,000 --> 00:01:24,000
Give the token a name and choose the type you want.

25
00:01:24,000 --> 00:01:30,000
You can select Read, because you only need read access to use the models

26
00:01:30,000 --> 00:01:30,000
okay.

27
00:01:30,000 --> 00:01:36,000
So here I'll name it langchain.

28
00:01:36,000 --> 00:01:40,000
Then I select Read and generate the token.

29
00:01:41,000 --> 00:01:45,000
This is the token I'm going to use to work with Hugging Face.

30
00:01:45,000 --> 00:01:46,000
Right.

31
00:01:46,000 --> 00:01:48,000
That includes all the embedding techniques available on Hugging Face.

32
00:01:48,000 --> 00:01:51,000
So now I'm just going to go back to my code.

33
00:01:52,000 --> 00:01:56,000
First things first, remember what we did earlier.

34
00:01:56,000 --> 00:01:57,000
Right.

35
00:01:57,000 --> 00:02:00,000
Here we used load_dotenv from the dotenv package.

36
00:02:00,000 --> 00:02:06,000
I'll copy and paste the same code here, because we need to load that token the same way.

37
00:02:06,000 --> 00:02:06,000
Right.

38
00:02:06,000 --> 00:02:13,000
What I'll do is store this token in my environment variables,

39
00:02:13,000 --> 00:02:16,000
and the format will be something like this.

40
00:02:16,000 --> 00:02:16,000
Okay.

41
00:02:16,000 --> 00:02:18,000
So I will show you the format.

42
00:02:18,000 --> 00:02:23,000
Here you can see the format it should have in your .env file.

43
00:02:23,000 --> 00:02:27,000
So just open your .env file.

44
00:02:27,000 --> 00:02:31,000
There, copy and paste this token on a new line.

45
00:02:31,000 --> 00:02:32,000
Right.

46
00:02:32,000 --> 00:02:34,000
Just like we did for OpenAI.

47
00:02:34,000 --> 00:02:36,000
Again, I'm not going to show you my .env file.

48
00:02:36,000 --> 00:02:41,000
The reason is simple: I need to keep this token secret.
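The .env file is just a list of KEY=VALUE lines, for example HF_TOKEN=hf_your_token_here (a placeholder, not a real token). To show roughly what load_dotenv() does with such a file, here is a simplified, hypothetical parser using only the standard library. It is not python-dotenv's implementation and skips features such as variable interpolation and multi-line values; in the notebook, keep using load_dotenv().

```python
# Simplified, hypothetical stand-in for python-dotenv's load_dotenv(), for illustration.
# Handles KEY=VALUE lines, optional surrounding quotes, comments and blank lines only.
from __future__ import annotations

import os


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse .env-style text into a dict mapping variable names to values."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a KEY=VALUE line, ignore it
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_text(text: str, override: bool = False) -> None:
    """Copy parsed values into os.environ; existing variables win unless override=True."""
    for key, value in parse_dotenv(text).items():
        if override or key not in os.environ:
            os.environ[key] = value
```

Like load_dotenv(), this sketch does not overwrite variables that are already set unless override=True is passed.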

49
00:02:41,000 --> 00:02:41,000
Right.

50
00:02:41,000 --> 00:02:43,000
So I'll just remove this.

51
00:02:44,000 --> 00:02:48,000
Now I'm going to write os.environ.

52
00:02:48,000 --> 00:02:48,000
Okay.

53
00:02:48,000 --> 00:02:54,000
And I'll use the key for my Hugging Face token, HF_TOKEN.

54
00:02:54,000 --> 00:02:54,000
Okay.

55
00:02:55,000 --> 00:02:57,000
So I'm setting that token here.

56
00:02:57,000 --> 00:03:03,000
And on the right-hand side I'll write os.getenv("HF_TOKEN") for the same token.

57
00:03:03,000 --> 00:03:04,000
Right.

58
00:03:04,000 --> 00:03:09,000
If I execute it, you can see that it runs successfully.

59
00:03:09,000 --> 00:03:13,000
That means the token I created and stored in my .env file has been assigned

60
00:03:13,000 --> 00:03:15,000
to the HF_TOKEN environment variable.

61
00:03:15,000 --> 00:03:15,000
Okay.

62
00:03:15,000 --> 00:03:18,000
So this is the first step that we need to do.
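This first step can fail in a confusing way: if HF_TOKEN is missing from .env, os.getenv returns None and assigning None to os.environ raises a TypeError. Below is a small sketch with a clearer error. The function name require_hf_token is hypothetical, and the "hf_" prefix check is only a heuristic based on how Hugging Face user access tokens are usually formatted. Note that after load_dotenv() the value is already in os.environ, which is where huggingface_hub looks for HF_TOKEN.

```python
import os


def require_hf_token(var_name: str = "HF_TOKEN") -> str:
    """Return the Hugging Face token from the environment, or fail with a clear message."""
    # os.getenv returns None when the variable is missing; os.environ[...] = None
    # would then raise a TypeError, so check first.
    token = os.getenv(var_name)
    if not token:
        raise RuntimeError(
            f"{var_name} is not set: add it to your .env file and call load_dotenv() first."
        )
    # Heuristic only: Hugging Face user access tokens usually start with "hf_".
    if not token.startswith("hf_"):
        print(f"Warning: {var_name} does not start with 'hf_'; double-check the value.")
    return token


# Hypothetical usage in the notebook, after load_dotenv():
# os.environ["HF_TOKEN"] = require_hf_token()
```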

63
00:03:18,000 --> 00:03:18,000
Okay.

64
00:03:19,000 --> 00:03:22,000
The second step is to go to my requirements.txt.

65
00:03:22,000 --> 00:03:26,000
Here I'm going to add one library.

66
00:03:26,000 --> 00:03:31,000
The library is called sentence_transformers.

67
00:03:31,000 --> 00:03:32,000
Okay.

68
00:03:32,000 --> 00:03:38,000
I need this library because the embedding models we will use are

69
00:03:38,000 --> 00:03:38,000
available through it.

70
00:03:39,000 --> 00:03:47,000
Now I'll open my terminal and run

71
00:03:47,000 --> 00:03:48,000
pip install -r requirements.txt.

72
00:03:49,000 --> 00:03:53,000
Here you can see the installation running.

73
00:03:53,000 --> 00:03:56,000
Everything in my requirements.txt will get installed.

74
00:03:56,000 --> 00:03:58,000
For sentence-transformers,

75
00:03:58,000 --> 00:04:02,000
the main library it requires is torch.

76
00:04:02,000 --> 00:04:06,000
Here you can see it will finish downloading in about 30 seconds.

77
00:04:06,000 --> 00:04:08,000
It is a fairly large download.

78
00:04:09,000 --> 00:04:14,000
Once this is downloaded, we can use sentence transformer models to create embeddings,

79
00:04:14,000 --> 00:04:15,000
right?

80
00:04:15,000 --> 00:04:19,000
Meanwhile, the installation is still running.

81
00:04:19,000 --> 00:04:24,000
We also need to install one more library.

82
00:04:24,000 --> 00:04:31,000
As I mentioned, LangChain and Hugging Face recently joined hands, and

83
00:04:31,000 --> 00:04:37,000
they created another library called langchain_huggingface.

84
00:04:37,000 --> 00:04:43,000
We need to install this library too, because it contains the various

85
00:04:43,000 --> 00:04:47,000
ways of calling LLMs and more, which I will be discussing.

86
00:04:47,000 --> 00:04:50,000
But in this video I'll be focusing more on the embedding technique.
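At this point requirements.txt contains lines such as sentence_transformers and langchain_huggingface (pip treats underscores and hyphens in project names as equivalent). To confirm that the installs finished, a hypothetical helper built only on the standard library's importlib.metadata can report installed versions. Depending on your Python version, you may need to pass names exactly as they appear in pip list.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages: list[str]) -> dict[str, str | None]:
    """Map each distribution name to its installed version, or None if it is missing."""
    result: dict[str, str | None] = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # Not installed (or installed under a different distribution name).
            result[name] = None
    return result
```

For example, installed_versions(["sentence-transformers", "langchain-huggingface", "torch"]) shows at a glance which of the three is still missing.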

87
00:04:50,000 --> 00:04:51,000
Okay.

88
00:04:51,000 --> 00:04:54,000
So here you can see I have added langchain_huggingface.

89
00:04:55,000 --> 00:04:59,000
As soon as the current installation finishes,

90
00:04:59,000 --> 00:05:03,000
I will install this one as well.

91
00:05:03,000 --> 00:05:05,000
Let's go back over here.

92
00:05:05,000 --> 00:05:08,000
So I'm just going to go ahead and write my code.

93
00:05:08,000 --> 00:05:15,000
I'll write from langchain_huggingface import

94
00:05:17,000 --> 00:05:20,000
HuggingFaceEmbeddings.

95
00:05:20,000 --> 00:05:24,000
We are going to use HuggingFaceEmbeddings here.

96
00:05:24,000 --> 00:05:24,000
Okay.

97
00:05:24,000 --> 00:05:31,000
I can copy and paste this same HuggingFaceEmbeddings class name.

98
00:05:31,000 --> 00:05:37,000
The first embedding technique we are going to use is called

99
00:05:37,000 --> 00:05:38,000
Sentence Transformers.

100
00:05:38,000 --> 00:05:41,000
Let me add a cell here.

101
00:05:42,000 --> 00:05:43,000
I'll make it a Markdown cell.

102
00:05:43,000 --> 00:05:47,000
And I will add some information to it.

103
00:05:47,000 --> 00:05:47,000
Okay.

104
00:05:50,000 --> 00:05:51,000
Over here.

105
00:05:51,000 --> 00:05:56,000
I'll paste in a description so that you also get some background information.

106
00:05:56,000 --> 00:05:57,000
Okay.

107
00:05:57,000 --> 00:06:01,000
So here you can see the heading: Sentence Transformers on Hugging Face.

108
00:06:01,000 --> 00:06:05,000
Hugging Face sentence-transformers is a Python framework for state-of-the-art sentence, text, and image embeddings.

109
00:06:05,000 --> 00:06:09,000
One of these embedding models is used in the HuggingFaceEmbeddings class.

110
00:06:09,000 --> 00:06:14,000
We have also added an alias called SentenceTransformerEmbeddings for users who are more familiar with

111
00:06:14,000 --> 00:06:16,000
directly using that package.

112
00:06:16,000 --> 00:06:16,000
Okay.

113
00:06:16,000 --> 00:06:19,000
Under the hood, this uses Sentence-BERT (SBERT) models.

114
00:06:19,000 --> 00:06:20,000
Okay.

115
00:06:20,000 --> 00:06:24,000
With it, you can call many different embedding models.

116
00:06:24,000 --> 00:06:28,000
So here I'll use HuggingFaceEmbeddings.

117
00:06:28,000 --> 00:06:30,000
You can also see that the installation has completed.

118
00:06:30,000 --> 00:06:35,000
Let me quickly run pip install -r requirements.txt again, because I also need to install

119
00:06:35,000 --> 00:06:37,000
langchain_huggingface.

120
00:06:37,000 --> 00:06:41,000
Installing packages is something you will need to do again and again.

121
00:06:41,000 --> 00:06:43,000
So I'll just go ahead and execute this.

122
00:06:43,000 --> 00:06:46,000
Now, inside this, I will pass my model name.

123
00:06:46,000 --> 00:06:49,000
I'll use the model_name parameter for that.

124
00:06:50,000 --> 00:06:52,000
And let's give the model name here.

125
00:06:52,000 --> 00:06:58,000
If you search for it on Hugging Face, this is

126
00:06:58,000 --> 00:07:02,000
the model we are going to use: sentence-transformers/all-MiniLM-L6-v2.

127
00:07:02,000 --> 00:07:05,000
This model will be used to generate the embeddings.

128
00:07:05,000 --> 00:07:08,000
Now I'll create my embeddings object by passing this model name to HuggingFaceEmbeddings.

129
00:07:09,000 --> 00:07:15,000
It shows a syntax error; I need to fix the from keyword in the import.

130
00:07:15,000 --> 00:07:21,000
Now if I execute it, it should work without any error.
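The embeddings setup can be sketched as a function. It assumes langchain_huggingface and sentence-transformers are installed. In HuggingFaceEmbeddings, model_kwargs is forwarded to the SentenceTransformer constructor and encode_kwargs to its encode() call; the device and normalize options are illustrative additions, not something used in the video. The import sits inside the function so the cell can be defined before the installation finishes.

```python
# Assumes: pip install langchain_huggingface sentence_transformers
DEFAULT_MODEL = "sentence-transformers/all-MiniLM-L6-v2"


def make_embeddings(model_name: str = DEFAULT_MODEL, device: str = "cpu", normalize: bool = False):
    """Build a LangChain HuggingFaceEmbeddings object backed by a local sentence-transformers model."""
    # Imported here so the function can be defined before the package is installed.
    from langchain_huggingface import HuggingFaceEmbeddings

    return HuggingFaceEmbeddings(
        model_name=model_name,
        # Passed through to the SentenceTransformer constructor (which device to run on).
        model_kwargs={"device": device},
        # Passed through to SentenceTransformer.encode(); True returns unit-length vectors.
        encode_kwargs={"normalize_embeddings": normalize},
    )
```

Calling embeddings = make_embeddings() behaves like HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2"), except that it pins the model to the CPU instead of letting sentence-transformers pick a device.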

131
00:07:21,000 --> 00:07:26,000
You can also install tqdm so that you can see how long it is

132
00:07:26,000 --> 00:07:28,000
taking the model to download.

133
00:07:28,000 --> 00:07:31,000
Okay, so there are some warnings.

134
00:07:31,000 --> 00:07:32,000
You can ignore these warnings.

135
00:07:32,000 --> 00:07:33,000
Okay.

136
00:07:33,000 --> 00:07:37,000
It will take a few seconds to load the model on our system.

137
00:07:38,000 --> 00:07:38,000
Okay.

138
00:07:38,000 --> 00:07:42,000
Once that's done, I'll create my text.

139
00:07:42,000 --> 00:07:44,000
So this will be my text.

140
00:07:44,000 --> 00:07:45,000
Let's say I write:

141
00:07:45,000 --> 00:07:47,000
This is a test document.

142
00:07:47,000 --> 00:07:51,000
And we will try to convert this into a vector.

143
00:07:52,000 --> 00:07:54,000
Remember why this works?

144
00:07:54,000 --> 00:07:58,000
Because we already set HF_TOKEN above; public models like this one can be downloaded without it, but gated models need it.

145
00:07:58,000 --> 00:07:59,000
Right.

146
00:08:00,000 --> 00:08:02,000
So here is our text: "This is a test document."

147
00:08:03,000 --> 00:08:05,000
Now I'll write query_result

148
00:08:05,000 --> 00:08:12,000
= embeddings.

149
00:08:13,000 --> 00:08:17,000
And here we call the embed_query method.

150
00:08:17,000 --> 00:08:18,000
Okay.

151
00:08:18,000 --> 00:08:21,000
Once I pass my text here and execute this,

152
00:08:21,000 --> 00:08:23,000
I will be able to see my

153
00:08:26,000 --> 00:08:28,000
query_result.

154
00:08:29,000 --> 00:08:32,000
If I execute this, you can see the resulting vector.
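embed_query returns a plain Python list of floats, and embed_documents returns one such list per input text. For experimenting with that interface offline (for example in unit tests), a hypothetical stand-in with the same two method names can be built with the hashing trick. Its vectors only reflect word overlap, not meaning, so it is no substitute for all-MiniLM-L6-v2; the class name HashingEmbeddings is made up for this sketch.

```python
from __future__ import annotations

import hashlib
import math


class HashingEmbeddings:
    """Hypothetical offline stand-in exposing embed_query/embed_documents.

    Hashes lowercase words into a fixed-size vector, so similarity reflects
    shared words only. Useful for wiring and tests, not for real retrieval.
    """

    def __init__(self, dim: int = 384):
        self.dim = dim

    def _embed(self, text: str) -> list[float]:
        vec = [0.0] * self.dim
        for word in text.lower().split():
            # md5 gives a hash that is stable across runs, unlike Python's built-in hash().
            digest = hashlib.md5(word.encode("utf-8")).digest()
            vec[int.from_bytes(digest[:4], "big") % self.dim] += 1.0
        # L2-normalize so every non-empty text maps to a unit-length vector.
        norm = math.sqrt(sum(x * x for x in vec))
        return [x / norm for x in vec] if norm else vec

    def embed_query(self, text: str) -> list[float]:
        return self._embed(text)

    def embed_documents(self, texts: list[str]) -> list[list[float]]:
        return [self._embed(t) for t in texts]
```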

155
00:08:32,000 --> 00:08:36,000
And if you want to see the dimension of this vector, that is, how many

156
00:08:36,000 --> 00:08:40,000
values the embedding for this sentence has,

157
00:08:40,000 --> 00:08:44,000
I can write len(query_result).

158
00:08:45,000 --> 00:08:45,000
Okay.

159
00:08:46,000 --> 00:08:47,000
It's query_result, not query_results.

160
00:08:47,000 --> 00:08:51,000
Now you can see the text has been converted into a 384-dimensional vector.
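The 384 is the output size of all-MiniLM-L6-v2; other models produce vectors of other sizes. A small, hypothetical check that works with any object exposing embed_query (HuggingFaceEmbeddings included) can catch a mismatch early, for example before writing vectors into a vector store whose dimension is fixed.

```python
from __future__ import annotations


def check_query_embedding(embeddings, text: str, expected_dim: int = 384) -> list[float]:
    """Embed one text with embed_query() and confirm the vector has the expected size.

    The default of 384 matches all-MiniLM-L6-v2; pass the right expected_dim for
    whichever model you actually load.
    """
    query_result = embeddings.embed_query(text)
    if len(query_result) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(query_result)}")
    return query_result
```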

161
00:08:52,000 --> 00:08:58,000
Similarly, you can use embed_documents, where you can pass a list

162
00:08:58,000 --> 00:08:59,000
of multiple texts.

163
00:08:59,000 --> 00:09:05,000
Here is one example where I call embeddings.embed_documents on a list containing our

164
00:09:05,000 --> 00:09:05,000
text

165
00:09:05,000 --> 00:09:07,000
and the sentence "This is not a test document."

166
00:09:07,000 --> 00:09:11,000
If I look at doc_result[0],

167
00:09:11,000 --> 00:09:13,000
you can see the first embedding.

168
00:09:13,000 --> 00:09:15,000
And in doc_result[1] you can see the second embedding.
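A natural next step with these vectors is comparing them. The sketch below uses cosine similarity, a common choice for sentence embeddings, to rank the documents returned by embed_documents against a query embedded with embed_query. compare_documents is a hypothetical helper; it works with any object that has those two methods.

```python
from __future__ import annotations

import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 same direction, 0.0 orthogonal, -1.0 opposite."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def compare_documents(embeddings, query: str, docs: list[str]) -> list[tuple[str, float]]:
    """Embed docs and query, then return (doc, score) pairs sorted from most to least similar."""
    doc_result = embeddings.embed_documents(docs)
    query_result = embeddings.embed_query(query)
    scores = [(doc, cosine_similarity(query_result, vec)) for doc, vec in zip(docs, doc_result)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)
```

With the real model you could call compare_documents(embeddings, text, [text, "This is not a test document."]) to see how close the two sentences land in embedding space.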

169
00:09:15,000 --> 00:09:21,000
So, in short, this shows how to create embeddings with Hugging Face Sentence

170
00:09:21,000 --> 00:09:22,000
Transformers.

171
00:09:22,000 --> 00:09:26,000
There are also other ways to do this with Hugging Face.

172
00:09:26,000 --> 00:09:26,000
Okay.

173
00:09:26,000 --> 00:09:31,000
One is through Sentence Transformers, which we have already seen.

174
00:09:31,000 --> 00:09:32,000
Okay.

175
00:09:32,000 --> 00:09:37,000
Another option is HuggingFaceInferenceAPIEmbeddings,

176
00:09:37,000 --> 00:09:39,000
but we will look at that at a later stage.

177
00:09:39,000 --> 00:09:43,000
This is just to give you an idea of the techniques we can use here.
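As a preview of the Inference API option, which is covered properly later: it computes embeddings on Hugging Face's servers instead of downloading the model. This is a minimal sketch, assuming langchain_community is installed and HF_TOKEN is set. HuggingFaceInferenceAPIEmbeddings lives in langchain_community.embeddings, and newer LangChain releases may steer you toward HuggingFaceEndpointEmbeddings in langchain_huggingface instead; the wrapper function name is hypothetical.

```python
import os


def make_inference_api_embeddings(model_name: str = "sentence-transformers/all-MiniLM-L6-v2"):
    """Embeddings computed remotely via the Hugging Face Inference API (no local model download)."""
    token = os.getenv("HF_TOKEN")
    if not token:
        # Unlike the local model, the hosted API expects an access token.
        raise RuntimeError("HF_TOKEN must be set to call the Hugging Face Inference API.")
    # Assumes langchain_community is installed; imported lazily so the token check runs first.
    from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings

    return HuggingFaceInferenceAPIEmbeddings(api_key=token, model_name=model_name)
```

The returned object exposes the same embed_query and embed_documents methods, so the rest of the code in this video would not need to change.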

178
00:09:43,000 --> 00:09:47,000
I hope you now understand these embedding techniques.

179
00:09:47,000 --> 00:09:53,000
Step by step, we are covering everything, from data ingestion

180
00:09:53,000 --> 00:09:58,000
to text splitting to converting text into vectors.

181
00:09:58,000 --> 00:10:01,000
You also saw one example of a vector store DB.

182
00:10:01,000 --> 00:10:04,000
There are two more important topics after this.

183
00:10:04,000 --> 00:10:07,000
One is vector store DBs: we will try out several different ones.

184
00:10:07,000 --> 00:10:13,000
After that, we will explore retrievers and chains, which are super important.

185
00:10:13,000 --> 00:10:17,000
Then we can finally implement a lot of end-to-end projects.

186
00:10:17,000 --> 00:10:19,000
So yes, that's it from my side.

187
00:10:19,000 --> 00:10:20,000
I'll see you all in the next video.

188
00:10:20,000 --> 00:10:20,000
Thank you.

