1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:03,000
So we are going to continue our discussion of natural language processing.

3
00:00:03,000 --> 00:00:07,000
In this video we are going to discuss the second architecture, that is, skip-gram.

4
00:00:07,000 --> 00:00:14,000
I have already shown you how CBOW, continuous bag of words, works.

5
00:00:14,000 --> 00:00:18,000
And I also showed you how the neural network gets trained.

6
00:00:18,000 --> 00:00:20,000
So what is the difference between CBOW and skip-gram?

7
00:00:20,000 --> 00:00:21,000
What is the difference in the architecture?

8
00:00:21,000 --> 00:00:23,000
It is very simple guys.

9
00:00:23,000 --> 00:00:24,000
Right now, just focus.

10
00:00:24,000 --> 00:00:26,000
I'm going to take the same dataset here.

11
00:00:26,000 --> 00:00:29,000
Let's say that Enron company is related to data science.

12
00:00:29,000 --> 00:00:30,000
I've written it over here.

13
00:00:30,000 --> 00:00:31,000
Right.

14
00:00:31,000 --> 00:00:35,000
And from this, you can see that I've created my input and output.

15
00:00:35,000 --> 00:00:41,000
Now if I am using skip-gram, almost everything stays the same.

16
00:00:41,000 --> 00:00:45,000
Let's say that I have taken the window size as this.

17
00:00:45,000 --> 00:00:54,000
So if I go ahead and show you, what will happen is that before, the input was this specific

18
00:00:54,000 --> 00:00:57,000
text and the output was this specific text.

19
00:00:57,000 --> 00:01:05,000
Now with skip-gram, let's say this was initially the input; now this

20
00:01:05,000 --> 00:01:06,000
is going to become the input.

21
00:01:07,000 --> 00:01:10,000
And this will basically be the output.

22
00:01:10,000 --> 00:01:10,000
Right.

23
00:01:10,000 --> 00:01:16,000
And this is with a window size equal to five, right.

24
00:01:16,000 --> 00:01:18,000
The window size is five.

25
00:01:18,000 --> 00:01:19,000
All the steps will be the same.

26
00:01:19,000 --> 00:01:22,000
The only thing we are doing is swapping the input and the output.

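A minimal Python sketch of how skip-gram turns the example sentence into training pairs. It assumes, as in the walkthrough, that a window size of five counts the center word, so two words on each side are used; the function name `skipgram_pairs` is hypothetical.

```python
def skipgram_pairs(tokens, window_size):
    """Return (center, context) pairs for skip-gram.

    window_size counts the center word (assumption matching this video),
    so a window of 5 covers two words on each side.
    """
    half = window_size // 2
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - half), min(len(tokens), i + half + 1)):
            if j != i:  # the center word is never its own context
                pairs.append((center, tokens[j]))
    return pairs


sentence = "Enron company is related to data science".split()
pairs = skipgram_pairs(sentence, window_size=5)
for center, context in pairs:
    if center == "is":
        print(center, "->", context)
# is -> Enron
# is -> company
# is -> related
# is -> to
```

Each center word produces one (input, output) pair per neighbor: the center word is the input and its context words are the outputs, which is the swap described above.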
27
00:01:22,000 --> 00:01:26,000
Before, the input was all this text and the output was this.

28
00:01:26,000 --> 00:01:32,000
Now, when we create this neural network, in the input, let's

29
00:01:32,000 --> 00:01:35,000
say that I have the word "is", and the words "related", "to" and so on, right.

30
00:01:35,000 --> 00:01:44,000
So I'm going to have an input layer that takes a seven-dimensional vector.

31
00:01:44,000 --> 00:01:45,000
Why seven?

32
00:01:45,000 --> 00:01:51,000
Because if we look at our vocabulary,

33
00:01:51,000 --> 00:01:52,000
What is the vocabulary size?

34
00:01:52,000 --> 00:01:55,000
1, 2, 3, 4, 5, 6, 7.

35
00:01:55,000 --> 00:01:55,000
Right.

36
00:01:55,000 --> 00:02:03,000
So initially, in the input layer, I will be giving an input which will be, uh, a seven

37
00:02:03,000 --> 00:02:04,000
dimensional vector.

38
00:02:04,000 --> 00:02:10,000
And then in the middle, in the hidden layer, I will have as many nodes as the window size.

39
00:02:10,000 --> 00:02:14,000
So the window size is five: 1, 2, 3, 4, 5.

40
00:02:14,000 --> 00:02:15,000
Just understand this.

41
00:02:15,000 --> 00:02:17,000
These are just nodes, okay.

42
00:02:17,000 --> 00:02:21,000
And in the output you can see that I have four words, right.

43
00:02:21,000 --> 00:02:24,000
So for every word present here, right,

44
00:02:24,000 --> 00:02:27,000
the same kind of output will get constructed over here.

45
00:02:27,000 --> 00:02:34,000
So in the output layer you will be able to see that I will have the first word, the second word,

46
00:02:35,000 --> 00:02:39,000
the third word and the fourth word.

47
00:02:39,000 --> 00:02:42,000
So this is the kind of output we will be getting.

48
00:02:42,000 --> 00:02:48,000
And again, this is very simple, because each one will again be seven-dimensional.

49
00:02:48,000 --> 00:02:48,000
Right?

50
00:02:48,000 --> 00:02:52,000
So I hope you are able to understand: we have just reversed the direction compared to CBOW.

51
00:02:52,000 --> 00:02:52,000
Right.

52
00:02:52,000 --> 00:02:54,000
So this will be my input layer.

53
00:02:55,000 --> 00:03:01,000
So here you can see that we will have a seven cross five weight matrix.

54
00:03:01,000 --> 00:03:04,000
Because the weights are initially randomly initialized.

55
00:03:04,000 --> 00:03:05,000
Right.

56
00:03:05,000 --> 00:03:07,000
And then we need to train this, right.

57
00:03:07,000 --> 00:03:13,000
And then I'll be connecting all of these, from here to here and here to here.

58
00:03:13,000 --> 00:03:19,000
And this will be a five cross seven, uh, weight matrix.

59
00:03:19,000 --> 00:03:21,000
And then this will be five cross seven.

60
00:03:21,000 --> 00:03:25,000
Similarly over here you will be seeing that this will be five cross seven.

61
00:03:25,000 --> 00:03:27,000
And the one below will also be a five cross seven.

62
00:03:27,000 --> 00:03:34,000
But you can see that initially, when I give the input "is", uh,

63
00:03:34,000 --> 00:03:36,000
the vector for "is" will be going over here.

64
00:03:36,000 --> 00:03:39,000
And as you know, "is" is the third, uh, third word.

65
00:03:39,000 --> 00:03:41,000
So this will be going like this.

66
00:03:41,000 --> 00:03:41,000
Right.

67
00:03:41,000 --> 00:03:45,000
So 0, 0, 1, 0, 0, 0, 0.

68
00:03:45,000 --> 00:03:47,000
So this is the vector that will be going over here.

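The seven-dimensional input vector can be sketched as a one-hot encoding over the vocabulary. This assumes the vocabulary is ordered by first appearance in the sentence, which puts "is" in third position as in the walkthrough; `one_hot` and `word_to_index` are illustrative names.

```python
sentence = "Enron company is related to data science".split()

# Vocabulary in order of first appearance (7 unique words here).
vocab = list(dict.fromkeys(sentence))
word_to_index = {word: i for i, word in enumerate(vocab)}


def one_hot(word):
    """Return a len(vocab)-dimensional one-hot vector for `word`."""
    vector = [0] * len(vocab)
    vector[word_to_index[word]] = 1
    return vector


print(len(vocab))        # 7
print(one_hot("is"))     # [0, 0, 1, 0, 0, 0, 0]
print(one_hot("Enron"))  # [1, 0, 0, 0, 0, 0, 0]
```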
69
00:03:47,000 --> 00:03:53,000
And again since my window size is five over here you'll be able to see other vectors that will get initialized

70
00:03:53,000 --> 00:03:54,000
uh randomly.

71
00:03:54,000 --> 00:03:56,000
All of this will be connected with weights.

72
00:03:56,000 --> 00:03:59,000
So a seven cross five, uh, weight matrix will be created.

73
00:03:59,000 --> 00:04:02,000
And with this, our forward propagation will happen.

74
00:04:02,000 --> 00:04:07,000
And obviously, if you know what generally happens in the hidden

75
00:04:07,000 --> 00:04:13,000
layer of an ANN, the inputs get multiplied by the weights, then a bias is added and

76
00:04:13,000 --> 00:04:15,000
an activation function is applied on top of it.

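A sketch of the input-to-hidden step with a randomly initialized 7 x 5 weight matrix. One detail worth flagging: in the original Word2Vec model the hidden layer is linear, with no bias and no activation function, so multiplying a one-hot vector by the weight matrix simply selects one row. The seed and the initialization range are arbitrary assumptions.

```python
import random

random.seed(0)
VOCAB_SIZE, EMBED_DIM = 7, 5

# Input-to-hidden weights: a 7 x 5 matrix, randomly initialized.
W_in = [[random.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]
        for _ in range(VOCAB_SIZE)]


def hidden_layer(x, W):
    """Compute h = x^T W (linear: no bias, no activation)."""
    return [sum(x[i] * W[i][k] for i in range(len(x)))
            for k in range(len(W[0]))]


x_is = [0, 0, 1, 0, 0, 0, 0]  # one-hot vector for "is"
h = hidden_layer(x_is, W_in)
print(len(h))         # 5
print(h == W_in[2])   # True: the product just picks row 2
```

This row-selection view is why, after training, the rows of the input weight matrix are read off as the word vectors.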
77
00:04:15,000 --> 00:04:23,000
In the output layer, we apply a softmax function so that we can compare

78
00:04:23,000 --> 00:04:28,000
y and y hat: y hat is the predicted output, y is the real data.

79
00:04:28,000 --> 00:04:30,000
In this particular case, this is my y.

80
00:04:31,000 --> 00:04:34,000
Sorry, this is my y, right?

81
00:04:34,000 --> 00:04:36,000
In the first case, y is "Enron".

82
00:04:36,000 --> 00:04:37,000
So for "Enron",

83
00:04:37,000 --> 00:04:42,000
um, whatever its vector is, that is what you will use here.

84
00:04:42,000 --> 00:04:42,000
Right.

85
00:04:42,000 --> 00:04:48,000
So here I apply a softmax function. Let's say, with respect to "Enron", my y is nothing but

86
00:04:48,000 --> 00:04:52,000
1, 0, 0, 0, 0, 0, 0: seven values.

87
00:04:52,000 --> 00:04:56,000
And then y hat will be computed, since we apply a softmax over here.

88
00:04:57,000 --> 00:04:57,000
Right.

89
00:04:57,000 --> 00:04:58,000
So y hat will be something.

90
00:04:58,000 --> 00:04:59,000
Right.

91
00:04:59,000 --> 00:05:00,000
Some values over here.

92
00:05:00,000 --> 00:05:03,000
And then we calculate our loss function.

93
00:05:03,000 --> 00:05:08,000
And we keep on doing forward and backward propagation until

94
00:05:08,000 --> 00:05:10,000
the loss function decreases, right.

95
00:05:10,000 --> 00:05:12,000
The loss value decreases.

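The output side can be sketched as a 5 x 7 weight matrix followed by a softmax, compared against the one-hot target for "Enron". The transcript does not name the loss; cross-entropy is assumed here because it is the standard pairing with softmax. The hidden vector is random stand-in data, not a trained value.

```python
import math
import random

random.seed(1)
VOCAB_SIZE, EMBED_DIM = 7, 5

# Hidden-to-output weights: a 5 x 7 matrix.
W_out = [[random.uniform(-0.5, 0.5) for _ in range(VOCAB_SIZE)]
         for _ in range(EMBED_DIM)]
h = [random.uniform(-0.5, 0.5) for _ in range(EMBED_DIM)]  # stand-in hidden vector


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


scores = [sum(h[k] * W_out[k][j] for k in range(EMBED_DIM))
          for j in range(VOCAB_SIZE)]
y_hat = softmax(scores)       # predicted probabilities over the 7 words
y = [1, 0, 0, 0, 0, 0, 0]     # true context word "Enron"

# Cross-entropy loss between y and y_hat (reduces to -log y_hat[0] here).
loss = -sum(yi * math.log(pi) for yi, pi in zip(y, y_hat))
print(round(sum(y_hat), 6))   # 1.0
print(loss)
```

Because y is one-hot, only the probability assigned to the true word contributes to the loss; training pushes that probability up.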
96
00:05:12,000 --> 00:05:15,000
And finally you will see that for whatever is connected, right,

97
00:05:15,000 --> 00:05:18,000
this particular word will be represented as a five-dimensional vector.

98
00:05:18,000 --> 00:05:22,000
This word will be represented as a five-dimensional vector once the loss is minimized.

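Putting the pieces together, here is a minimal, pure-Python training loop for the toy sentence: full softmax, cross-entropy loss and plain stochastic gradient descent. The learning rate, epoch count and seed are arbitrary assumptions, and real Word2Vec implementations (including Gensim) use faster approximations such as negative sampling or hierarchical softmax rather than a full softmax. After training, each row of `W_in` is that word's five-dimensional vector.

```python
import math
import random

random.seed(42)
sentence = "Enron company is related to data science".split()
vocab = list(dict.fromkeys(sentence))
idx = {w: i for i, w in enumerate(vocab)}
V, D, HALF, LR = len(vocab), 5, 2, 0.1  # window 5 = 2 words on each side

W_in = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(V)]
W_out = [[random.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(D)]

# (center, context) index pairs from the toy sentence.
pairs = [(idx[sentence[i]], idx[sentence[j]])
         for i in range(len(sentence))
         for j in range(max(0, i - HALF), min(len(sentence), i + HALF + 1))
         if j != i]


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def forward(center):
    h = W_in[center]  # a one-hot input just selects a row
    scores = [sum(h[k] * W_out[k][j] for k in range(D)) for j in range(V)]
    return h, softmax(scores)


def total_loss():
    """Summed cross-entropy over all training pairs."""
    return sum(-math.log(forward(c)[1][o]) for c, o in pairs)


initial = total_loss()
for epoch in range(200):
    for center, context in pairs:
        h, y_hat = forward(center)
        # Gradient of cross-entropy w.r.t. the scores: y_hat - y.
        err = [p - (1 if j == context else 0) for j, p in enumerate(y_hat)]
        # Gradient w.r.t. the hidden vector, computed before W_out changes.
        grad_h = [sum(err[j] * W_out[k][j] for j in range(V)) for k in range(D)]
        for k in range(D):
            for j in range(V):
                W_out[k][j] -= LR * err[j] * h[k]
        for k in range(D):
            W_in[center][k] -= LR * grad_h[k]
final = total_loss()

print(initial > final)                          # True: the loss went down
print([round(v, 3) for v in W_in[idx["is"]]])   # 5-dim vector for "is"
```

The same output matrix `W_out` is used for every context word of a given center word, which matches the usual skip-gram formulation where the output weights are shared across context positions.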
99
00:05:22,000 --> 00:05:24,000
So, the same process.

100
00:05:24,000 --> 00:05:29,000
You should definitely know how an ANN works and how the optimizer works.

101
00:05:29,000 --> 00:05:33,000
And this was just a brief idea about skip-gram.

102
00:05:33,000 --> 00:05:35,000
Now, how can we improve this?

103
00:05:35,000 --> 00:05:40,000
Or, the basic question is: when should we apply CBOW?

104
00:05:40,000 --> 00:05:51,000
So the question is: when should we apply CBOW or skip-gram?

105
00:05:51,000 --> 00:05:52,000
Right.

106
00:05:52,000 --> 00:05:55,000
The simple answer, according to the research, is this.

107
00:05:55,000 --> 00:06:03,000
Whenever you have a small dataset, a small corpus, we can go with CBOW,

108
00:06:03,000 --> 00:06:05,000
that is, continuous bag of words.

109
00:06:05,000 --> 00:06:11,000
If you have a huge dataset, you should go with skip-gram.

110
00:06:11,000 --> 00:06:14,000
And that is reported, uh, in many research papers.

111
00:06:14,000 --> 00:06:19,000
So I'm just giving you the direct, um, observation from them so that you will be able to apply it.

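To make the contrast concrete, this sketch builds training examples both ways from the same sentence, with two words of context on each side as before. CBOW produces one example per position, with all context words as the input; skip-gram produces one example per (center, context) pair. The larger number of examples is one reason skip-gram is commonly described as slower to train. The function names are hypothetical.

```python
sentence = "Enron company is related to data science".split()


def context_window(tokens, i, half):
    """Words within `half` positions of index i, excluding i itself."""
    return [tokens[j]
            for j in range(max(0, i - half), min(len(tokens), i + half + 1))
            if j != i]


def cbow_examples(tokens, half):
    """CBOW: all context words together are the input, the center word is the output."""
    return [(context_window(tokens, i, half), center)
            for i, center in enumerate(tokens)]


def skipgram_examples(tokens, half):
    """Skip-gram: the center word is the input, each context word is a separate output."""
    return [(center, context)
            for i, center in enumerate(tokens)
            for context in context_window(tokens, i, half)]


cbow = cbow_examples(sentence, half=2)
skip = skipgram_examples(sentence, half=2)
print(cbow[2])               # (['Enron', 'company', 'related', 'to'], 'is')
print(len(cbow), len(skip))  # 7 22
```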
112
00:06:19,000 --> 00:06:26,000
Now, let's say you want to improve CBOW or skip-gram. How can you do it?

113
00:06:26,000 --> 00:06:31,000
One thing is that you should increase your training dataset.

114
00:06:31,000 --> 00:06:33,000
Increase the training data.

115
00:06:33,000 --> 00:06:37,000
That basically means: generally, the more training data, the better the accuracy, right?

116
00:06:37,000 --> 00:06:40,000
Increase the training data.

117
00:06:40,000 --> 00:06:47,000
The second thing is that you can also increase the window size.

118
00:06:50,000 --> 00:07:01,000
Window size, which in this setup in turn leads to an increase in dimensions,

119
00:07:02,000 --> 00:07:05,000
an increase in the vector dimension.

120
00:07:05,000 --> 00:07:06,000
This is super important.

121
00:07:08,000 --> 00:07:15,000
Okay, so here I'm saying how to improve CBOW or skip-gram.

122
00:07:15,000 --> 00:07:16,000
So this is what you can do.

123
00:07:16,000 --> 00:07:18,000
Increase the window size, okay.

124
00:07:18,000 --> 00:07:19,000
This is super important.

125
00:07:19,000 --> 00:07:24,000
Increase the window size. I'm going to write this out separately again.

126
00:07:24,000 --> 00:07:27,000
So over here again, uh, let me write this.

127
00:07:27,000 --> 00:07:28,000
We have to increase.

128
00:07:28,000 --> 00:07:31,000
We can also increase the window size: instead of having five,

129
00:07:31,000 --> 00:07:38,000
I can make it, uh, 100, you know. So obviously, as we increase

130
00:07:38,000 --> 00:07:39,000
the window size,

131
00:07:40,000 --> 00:07:45,000
if we are increasing the window size, then in this setup the vector dimension is also increasing,

132
00:07:45,000 --> 00:07:45,000
right?

133
00:07:45,000 --> 00:07:49,000
The vector dimension is also increasing.

134
00:07:49,000 --> 00:07:54,000
So when we keep increasing it and try it, you'll be able to see that we get better

135
00:07:54,000 --> 00:07:55,000
performance.

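A quick way to see what each knob changes: the sketch below counts skip-gram training pairs for the seven-word sentence at different window sizes (window counted including the center word, as in this video), and counts weights for different vector dimensions. Note that in this lecture's setup the hidden-layer size is tied to the window size, but in Gensim's `Word2Vec` they are separate parameters: `window` is the maximum distance between the center word and a context word, `vector_size` is the embedding dimension, and `sg=1` selects skip-gram (the default `sg=0` is CBOW). The parameter count assumes one vocabulary x dimension input matrix plus one dimension x vocabulary output matrix; the helper names are hypothetical.

```python
def count_skipgram_pairs(n_tokens, window_size):
    """Count (center, context) pairs; window_size includes the center word."""
    half = window_size // 2
    return sum(min(n_tokens - 1, i + half) - max(0, i - half)
               for i in range(n_tokens))


def count_parameters(vocab_size, vector_dim):
    """Weights in a vocab_size x vector_dim input matrix plus its transpose-shaped output matrix."""
    return 2 * vocab_size * vector_dim


for window_size in (3, 5, 7):
    print(window_size, count_skipgram_pairs(7, window_size))
# 3 12
# 5 22
# 7 30

for vector_dim in (5, 100, 300):
    print(vector_dim, count_parameters(7, vector_dim))
# 5 70
# 100 1400
# 300 4200
```

A wider window gives each word more context pairs to learn from, while a larger vector dimension gives the model more capacity; in libraries such as Gensim they are tuned independently.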
136
00:07:55,000 --> 00:07:59,000
So this is also something you can use: you can increase it.

137
00:07:59,000 --> 00:08:04,000
Now, in the next example we'll be using a pre-trained model from

138
00:08:04,000 --> 00:08:04,000
Google.

139
00:08:04,000 --> 00:08:11,000
So Google's Word2Vec model: this is trained on about 100 billion words of Google News and has vectors for

140
00:08:11,000 --> 00:08:13,000
about 3 million words and phrases.

141
00:08:13,000 --> 00:08:23,000
And it is going to give me a feature representation of 300 vectors, sorry,

142
00:08:23,000 --> 00:08:25,000
300 dimensions.

143
00:08:25,000 --> 00:08:28,000
That basically means: suppose I have the word "cricket".

144
00:08:29,000 --> 00:08:35,000
Okay, cricket always appears in the news, and this training data is from Google News, right?

145
00:08:35,000 --> 00:08:38,000
And Google is a very big company, guys; for them,

146
00:08:38,000 --> 00:08:40,000
collecting this amount of data is very easy.

147
00:08:40,000 --> 00:08:45,000
So if we give it a word like "cricket", it is going to

148
00:08:45,000 --> 00:08:48,000
convert it into a 300-dimensional vector.

149
00:08:48,000 --> 00:08:50,000
A 300-dimensional vector.

150
00:08:50,000 --> 00:08:51,000
Okay.

151
00:08:51,000 --> 00:08:54,000
Vectors: this is super important.

152
00:08:55,000 --> 00:08:58,000
300-dimensional vectors, okay.

153
00:08:59,000 --> 00:09:01,000
I'm just going to write it as 300-dimensional vectors.

154
00:09:01,000 --> 00:09:08,000
And I will try to show you all of these examples, uh. What we'll do is that in the upcoming

155
00:09:08,000 --> 00:09:14,000
session we'll try to use a pre-trained model, and we'll also make sure that

156
00:09:14,000 --> 00:09:18,000
we train on a new dataset from scratch with the help of Word2Vec.

157
00:09:18,000 --> 00:09:21,000
And we are going to do that with the Gensim library.

158
00:09:21,000 --> 00:09:22,000
Okay.

159
00:09:22,000 --> 00:09:25,000
So yes, uh, we will be doing this in the next video.

160
00:09:25,000 --> 00:09:27,000
So I hope you have understood both architectures.

161
00:09:27,000 --> 00:09:30,000
One is CBOW and one is skip-gram.

162
00:09:30,000 --> 00:09:30,000
The only

163
00:09:30,000 --> 00:09:36,000
thing you really need is knowledge of an ANN, or a fully connected

164
00:09:36,000 --> 00:09:38,000
neural network.

165
00:09:38,000 --> 00:09:40,000
If you know that, it will be very easy to understand.

166
00:09:40,000 --> 00:09:42,000
So yes, this was it from my side.

167
00:09:42,000 --> 00:09:47,000
I will see you all in the next video, where we discuss how to implement Word2Vec.

168
00:09:47,000 --> 00:09:50,000
Okay, so yes, I will see you all in the next video.

