1
00:00:00,000 --> 00:00:01,000
Hello guys.

2
00:00:01,000 --> 00:00:04,000
So we are going to continue our discussion of NLP with deep learning.

3
00:00:04,000 --> 00:00:11,000
In this video and the upcoming series of videos, we are going to discuss Transformers.

4
00:00:11,000 --> 00:00:15,000
Now, Transformers are a very important topic.

5
00:00:15,000 --> 00:00:17,000
Please keep that in mind.

6
00:00:17,000 --> 00:00:24,000
If you really want to excel in deep learning, specifically in NLP tasks,

7
00:00:24,000 --> 00:00:26,000
then Transformers are the topic to master.

8
00:00:26,000 --> 00:00:31,000
In this video, I'll just show you a plan of action:

9
00:00:31,000 --> 00:00:32,000
which topics

10
00:00:32,000 --> 00:00:32,000
we will cover,

11
00:00:32,000 --> 00:00:34,000
and how we are going to cover this entire topic.

12
00:00:35,000 --> 00:00:39,000
So far, we have already discussed RNN, LSTM, and GRU.

13
00:00:39,000 --> 00:00:42,000
We understood what problems they had.

14
00:00:42,000 --> 00:00:42,000
Right?

15
00:00:42,000 --> 00:00:47,000
Then we moved on to the encoder-decoder architecture used in sequence-to-sequence learning.

16
00:00:47,000 --> 00:00:53,000
Here too we faced some problems, and we tried to solve them

17
00:00:53,000 --> 00:00:54,000
with the attention mechanism.

18
00:00:54,000 --> 00:00:55,000
Right.

19
00:00:55,000 --> 00:01:00,000
Again, each of these architectures had some limitation or other.

20
00:01:00,000 --> 00:01:01,000
Right.

21
00:01:01,000 --> 00:01:06,000
And now, we will first understand what the problem with the attention mechanism is and how

22
00:01:06,000 --> 00:01:10,000
Transformers help us solve it.

23
00:01:10,000 --> 00:01:15,000
The plan of action is this: first, we will understand why we need

24
00:01:15,000 --> 00:01:16,000
Transformers.

25
00:01:16,000 --> 00:01:19,000
Then we will look at the architecture of the Transformer.

26
00:01:19,000 --> 00:01:23,000
This is the detailed architecture, shown on the right-hand side.

27
00:01:23,000 --> 00:01:26,000
This architecture looks really complicated, right?

28
00:01:26,000 --> 00:01:31,000
If you are seeing it for the first time, there are so many components here, and it does not

29
00:01:31,000 --> 00:01:35,000
even resemble the encoder-decoder or attention architectures we have learned.

30
00:01:35,000 --> 00:01:38,000
Don't worry, we will break down

31
00:01:38,000 --> 00:01:39,000
the entire architecture.

32
00:01:39,000 --> 00:01:41,000
This architecture also has an encoder

33
00:01:41,000 --> 00:01:42,000
and a decoder.

34
00:01:42,000 --> 00:01:47,000
But the encoder and decoder contain many more components, which we will

35
00:01:47,000 --> 00:01:48,000
be discussing.

36
00:01:48,000 --> 00:01:48,000
Right.

37
00:01:48,000 --> 00:01:53,000
So the plan of action is: first, we will try to understand why we need Transformers.

38
00:01:53,000 --> 00:01:58,000
Then we will cover the Transformer architecture, where the first module we are going to

39
00:01:58,000 --> 00:02:00,000
cover is self-attention.

40
00:02:00,000 --> 00:02:03,000
In self-attention, you will see these

41
00:02:03,000 --> 00:02:06,000
Q, K, V parameters.

42
00:02:06,000 --> 00:02:09,000
We will also understand what exactly these are.

43
00:02:09,000 --> 00:02:09,000
Okay.

44
00:02:09,000 --> 00:02:13,000
They stand for query, key, and value.

45
00:02:13,000 --> 00:02:13,000
Okay.

46
00:02:13,000 --> 00:02:14,000
What exactly they are,

47
00:02:14,000 --> 00:02:16,000
we'll discuss later.

48
00:02:16,000 --> 00:02:19,000
Then we will be talking about positional encoding.

49
00:02:19,000 --> 00:02:19,000
Right.

50
00:02:19,000 --> 00:02:21,000
This is also a very important topic

51
00:02:21,000 --> 00:02:22,000
in this architecture.

52
00:02:22,000 --> 00:02:25,000
Then we will cover multi-head attention.

53
00:02:25,000 --> 00:02:31,000
And finally, we will combine all these topics to understand how the Transformer works.

54
00:02:31,000 --> 00:02:34,000
So this is the plan of action for covering this topic.

55
00:02:34,000 --> 00:02:40,000
But to give you an idea of why Transformers are so important: if you

56
00:02:40,000 --> 00:02:47,000
have heard about generative AI, and the kinds of LLMs or multimodal models that are available

57
00:02:47,000 --> 00:02:48,000
in generative AI today,

58
00:02:50,000 --> 00:02:55,000
most of these models are built on the Transformer architecture.

59
00:02:55,000 --> 00:02:56,000
Right.

60
00:02:56,000 --> 00:02:56,000
right?

61
00:02:56,000 --> 00:03:02,000
Specifically, take models like BERT or GPT.

62
00:03:03,000 --> 00:03:09,000
As you know, OpenAI has come up with some amazing

63
00:03:09,000 --> 00:03:10,000
models, right?

64
00:03:10,000 --> 00:03:14,000
Specifically, OpenAI's ChatGPT application.

65
00:03:15,000 --> 00:03:19,000
And right now, the model it uses is GPT-4,

66
00:03:19,000 --> 00:03:20,000
or rather,

67
00:03:20,000 --> 00:03:21,000
GPT-4o.

68
00:03:21,000 --> 00:03:26,000
This specific model is based on the Transformer architecture, but it is trained

69
00:03:26,000 --> 00:03:28,000
on a huge amount of data.

70
00:03:28,000 --> 00:03:35,000
In other words, GPT-4 uses the GPT architecture along

71
00:03:35,000 --> 00:03:39,000
with transfer learning on that architecture, pretrained on a huge

72
00:03:39,000 --> 00:03:40,000
amount of data.

73
00:03:40,000 --> 00:03:43,000
So we will be covering all these topics.

74
00:03:43,000 --> 00:03:47,000
But in this video, I just wanted to give you the plan of action for how we are going

75
00:03:47,000 --> 00:03:49,000
to cover each of these topics.

76
00:03:50,000 --> 00:03:52,000
So yeah, this was it.

77
00:03:52,000 --> 00:03:56,000
In my next video, I will talk about why we need Transformers.

78
00:03:56,000 --> 00:04:02,000
But before that, we need to revise the encoder-decoder architecture and

79
00:04:02,000 --> 00:04:03,000
the attention mechanism,

80
00:04:03,000 --> 00:04:05,000
and the problems we face with them.

81
00:04:05,000 --> 00:04:06,000
We will be talking about it.

82
00:04:06,000 --> 00:04:06,000
Okay.

83
00:04:07,000 --> 00:04:08,000
So yeah,

84
00:04:08,000 --> 00:04:08,000
that's it.

85
00:04:08,000 --> 00:04:10,000
I will see you all in the next video.

86
00:04:10,000 --> 00:04:10,000
Thank you.