1
00:00:00,000 --> 00:00:00,000
Hello guys.

2
00:00:00,000 --> 00:00:04,000
So we are going to continue our discussion of text splitters.

3
00:00:04,000 --> 00:00:08,000
And in this video we are going to discuss how to split JSON data.

4
00:00:08,000 --> 00:00:16,000
Let's say we are getting a chunk of JSON data from an API. After that, if

5
00:00:16,000 --> 00:00:21,000
we read that entire JSON data, how do we split that into chunks so that we'll be able to pass it to

6
00:00:21,000 --> 00:00:22,000
our LLM?

7
00:00:22,000 --> 00:00:29,000
So here, the JSON splitter that we are going to use splits JSON data

8
00:00:29,000 --> 00:00:31,000
while allowing control over chunk sizes.

9
00:00:31,000 --> 00:00:35,000
It traverses JSON data depth first and builds smaller JSON chunks.

10
00:00:35,000 --> 00:00:41,000
It attempts to keep nested JSON objects whole, but will split them if needed to keep chunks

11
00:00:41,000 --> 00:00:44,000
between a minimum chunk size and the maximum chunk size.

12
00:00:44,000 --> 00:00:51,000
If the value is not a nested JSON, but rather a very large string, the string will not be split.

13
00:00:51,000 --> 00:00:56,000
If you need a hard cap on the chunk size, consider composing this with a recursive character

14
00:00:56,000 --> 00:00:58,000
text splitter.

15
00:00:58,000 --> 00:01:02,000
So basically, you can also use it along with the recursive character text splitter.
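
The hard cap mentioned above can be illustrated with a minimal standard-library sketch. It is not LangChain's implementation: the helper name `hard_cap` is hypothetical, and plain fixed-width slicing stands in for the recursive character text splitter. Note that the resulting pieces are raw text and are no longer guaranteed to be valid JSON on their own.

```python
import json


def hard_cap(chunks: list[dict], max_chars: int) -> list[str]:
    """Serialize each chunk and cut any oversized text into fixed-width pieces."""
    pieces: list[str] = []
    for chunk in chunks:
        text = json.dumps(chunk)
        # Slicing guarantees that no piece is longer than max_chars.
        for start in range(0, len(text), max_chars):
            pieces.append(text[start:start + max_chars])
    return pieces


# The second chunk holds one long string value that a JSON splitter would keep whole.
chunks = [{"id": 1}, {"description": "x" * 50}]
pieces = hard_cap(chunks, max_chars=20)
print([len(piece) for piece in pieces])
```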

16
00:01:02,000 --> 00:01:08,000
There is an optional pre-processing step to split lists by first converting them into JSON dictionaries

17
00:01:08,000 --> 00:01:10,000
and then splitting them as such.
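
The list pre-processing step can be sketched as follows. This is an illustration under an assumption, not the library's code: the helper name `lists_to_dicts` is hypothetical, and it assumes each list becomes a dictionary keyed by the item's index as a string. To my understanding, RecursiveJsonSplitter exposes this step through a `convert_lists` argument on its split methods.

```python
from typing import Any


def lists_to_dicts(data: Any) -> Any:
    """Recursively replace every list with a dict keyed by the item index."""
    if isinstance(data, dict):
        return {key: lists_to_dicts(value) for key, value in data.items()}
    if isinstance(data, list):
        return {str(index): lists_to_dicts(item) for index, item in enumerate(data)}
    # Strings, numbers, booleans and None are returned unchanged.
    return data


sample = {"tags": ["users", "admin"], "servers": [{"url": "https://example.com"}]}
converted = lists_to_dicts(sample)
print(converted)
```

Once lists are dictionaries, a splitter that only descends into dictionaries can also break up long lists.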

18
00:01:10,000 --> 00:01:14,000
Okay. How is the text split? By JSON value. How is the chunk size measured? By the number of

19
00:01:14,000 --> 00:01:15,000
characters.
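
To make the description above concrete, here is a simplified standard-library sketch of a depth-first JSON splitter. It is an approximation for illustration only, not LangChain's actual code: the names `json_size`, `set_nested`, and `split_json_sketch` are hypothetical, there is no minimum chunk size, and lists are treated as leaves.

```python
import json
from typing import Any


def json_size(data: Any) -> int:
    """Chunk size is measured as the character count of the serialized JSON."""
    return len(json.dumps(data))


def set_nested(target: dict, path: list[str], value: Any) -> None:
    """Store value under the key path, creating parent dicts as needed."""
    for key in path[:-1]:
        target = target.setdefault(key, {})
    target[path[-1]] = value


def split_json_sketch(data: dict, max_chunk_size: int = 300) -> list[dict]:
    """Walk the dict depth first and pack values into size-limited chunks."""
    chunks: list[dict] = [{}]

    def walk(node: Any, path: list[str]) -> None:
        if not isinstance(node, dict):
            # A leaf (string, number, list) is never cut, even if it is too large.
            set_nested(chunks[-1], path, node)
            return
        for key, value in node.items():
            new_path = path + [key]
            remaining = max_chunk_size - json_size(chunks[-1])
            if json_size({key: value}) < remaining:
                # The whole value fits, so a nested object stays intact.
                set_nested(chunks[-1], new_path, value)
            else:
                # It does not fit: start a fresh chunk and descend into the value.
                if chunks[-1]:
                    chunks.append({})
                walk(value, new_path)

    walk(data, [])
    return [chunk for chunk in chunks if chunk]


sample = {
    "openapi": "3.1.0",
    "info": {"title": "Demo API", "version": "0.1.0"},
    "paths": {
        "/users": {"get": {"summary": "List users", "tags": ["users"]}},
        "/orders": {"get": {"summary": "List orders", "tags": ["orders"]}},
    },
}
chunks = split_json_sketch(sample, max_chunk_size=80)
for chunk in chunks:
    print(json_size(chunk), chunk)
```

Each chunk keeps the full key path down to its values, so every chunk is still valid JSON that can be read on its own.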

20
00:01:15,000 --> 00:01:17,000
The same thing is basically happening over here.

21
00:01:17,000 --> 00:01:21,000
Now let me just go ahead and show you how we can implement this.

22
00:01:22,000 --> 00:01:25,000
For this, I'm going to take this specific example as my API.

23
00:01:25,000 --> 00:01:25,000
Okay.

24
00:01:25,000 --> 00:01:28,000
So this is a very large API, right?

25
00:01:28,000 --> 00:01:34,000
A big API with all this information. I'm not saying the API itself is huge, but if I hit

26
00:01:34,000 --> 00:01:38,000
this API, I get a lot of information right over here.

27
00:01:38,000 --> 00:01:38,000
Right.

28
00:01:38,000 --> 00:01:43,000
So here, obviously, we cannot read this entire response and just use it as is, right?

29
00:01:43,000 --> 00:01:50,000
Instead, I'll convert this entire API response into chunks, and then

30
00:01:50,000 --> 00:01:51,000
we'll see how things work.

31
00:01:51,000 --> 00:01:51,000
Okay.

32
00:01:52,000 --> 00:01:54,000
So let's quickly go ahead and do this.

33
00:01:54,000 --> 00:01:59,000
So first of all, I will import json along with this.

34
00:01:59,000 --> 00:02:02,000
First of all let me go ahead and select the kernel.

35
00:02:02,000 --> 00:02:06,000
And here I'm going to import the requests library.

36
00:02:07,000 --> 00:02:10,000
Let's go ahead and create a variable called json_data.

37
00:02:10,000 --> 00:02:15,000
Now for json_data, we'll use the requests library and call requests.get.

38
00:02:15,000 --> 00:02:15,000
okay.

39
00:02:16,000 --> 00:02:17,000
requests.get.

40
00:02:17,000 --> 00:02:22,000
So once I write requests.get, all I have to do is give my URL over here.

41
00:02:22,000 --> 00:02:22,000
Right.

42
00:02:22,000 --> 00:02:26,000
So this is a GET request, since I'm reading this entire API.

43
00:02:26,000 --> 00:02:26,000
Right.

44
00:02:26,000 --> 00:02:30,000
So I'll go ahead and hit this particular API over here.

45
00:02:30,000 --> 00:02:34,000
And here, I'll also convert the response by calling .json() on it.

46
00:02:34,000 --> 00:02:39,000
So once I get this, this will basically be my JSON data, right?

47
00:02:39,000 --> 00:02:42,000
So if you want to see your JSON data, it will be a huge file.

48
00:02:42,000 --> 00:02:46,000
So this is the entire information with respect to the JSON over here.

49
00:02:46,000 --> 00:02:47,000
Right.

50
00:02:47,000 --> 00:02:52,000
And right now, most of the information is not visible, but if you click on the scrollable

51
00:02:52,000 --> 00:02:54,000
element then you'll be able to see it.

52
00:02:54,000 --> 00:03:01,000
Now what we will do is use the LangChain text splitter package.

53
00:03:01,000 --> 00:03:07,000
We will import something called RecursiveJsonSplitter.

54
00:03:07,000 --> 00:03:07,000
Right.

55
00:03:07,000 --> 00:03:12,000
Because here we know that there are also nested elements inside this particular JSON.

56
00:03:12,000 --> 00:03:12,000
Okay.

57
00:03:13,000 --> 00:03:16,000
Like, the JSON has a value, and inside that value there is another JSON.

58
00:03:16,000 --> 00:03:16,000
Okay.

59
00:03:17,000 --> 00:03:21,000
So what we will do over here, we will quickly go ahead and write our JSON splitter.

60
00:03:21,000 --> 00:03:25,000
So I will just go ahead and create my json_splitter.

61
00:03:25,000 --> 00:03:30,000
And here we will initialize this RecursiveJsonSplitter.

62
00:03:30,000 --> 00:03:33,000
Here we can give a maximum chunk size.

63
00:03:33,000 --> 00:03:36,000
Let's say the maximum chunk size that I want to give is 300.

64
00:03:36,000 --> 00:03:39,000
And after this I will go ahead and write

65
00:03:39,000 --> 00:03:47,000
json_chunks = json_splitter.split_json.

66
00:03:48,000 --> 00:03:52,000
And here we are going to pass our entire json_data.

67
00:03:52,000 --> 00:03:53,000
Okay.

68
00:03:54,000 --> 00:03:58,000
And this is my json_data, which I got over here.

69
00:03:58,000 --> 00:03:59,000
Okay.

70
00:03:59,000 --> 00:04:04,000
Once I pass this, you will see that I get these JSON chunks.

71
00:04:04,000 --> 00:04:10,000
So now if I go ahead and display json_chunks, you'll see that we are getting

72
00:04:10,000 --> 00:04:13,000
something, but right now, again, you will not be able to see all of it.

73
00:04:13,000 --> 00:04:16,000
So what we'll do, we'll just go ahead and print the top three chunks.

74
00:04:16,000 --> 00:04:21,000
So I'll write for chunk in json_chunks.

75
00:04:21,000 --> 00:04:24,000
Let's take the top three elements.

76
00:04:24,000 --> 00:04:25,000
Top three documents.

77
00:04:25,000 --> 00:04:27,000
And then we'll go ahead and print the chunk.

78
00:04:28,000 --> 00:04:30,000
So here I will just go ahead and print it.

79
00:04:30,000 --> 00:04:33,000
So this is my first chunk, with openapi among the key-value pairs.

80
00:04:33,000 --> 00:04:36,000
Then the second one with respect to paths.

81
00:04:36,000 --> 00:04:37,000
Then the third one, again with respect to paths.

82
00:04:37,000 --> 00:04:38,000
Right.

83
00:04:38,000 --> 00:04:44,000
And this is how easily you can get this entire information.

84
00:04:45,000 --> 00:04:47,000
You may also have another scenario.

85
00:04:47,000 --> 00:04:49,000
Right now, if you see this chunk, it is a plain key-value pair.

86
00:04:49,000 --> 00:04:52,000
But let's say I want to get this into documents.

87
00:04:52,000 --> 00:05:03,000
So the splitter can also output documents.

88
00:05:03,000 --> 00:05:04,000
Okay.

89
00:05:04,000 --> 00:05:07,000
So here I'm going to create my documents.

90
00:05:07,000 --> 00:05:09,000
And let's go ahead and use the splitter again.

91
00:05:09,000 --> 00:05:12,000
Sorry, json_splitter. Here,

92
00:05:12,000 --> 00:05:15,000
instead of split_json, I will use create_documents.

93
00:05:15,000 --> 00:05:19,000
In create_documents we will pass a parameter, which will be texts.

94
00:05:20,000 --> 00:05:23,000
And here I'm just going to pass my json_data.

95
00:05:24,000 --> 00:05:29,000
Now once I get my docs, again I can go ahead and display them: for doc in docs.

96
00:05:29,000 --> 00:05:31,000
Let's see the top three docs.

97
00:05:31,000 --> 00:05:34,000
I will just go ahead and print the doc.

98
00:05:35,000 --> 00:05:38,000
So here you can see these are my documents, right?

99
00:05:38,000 --> 00:05:41,000
I am able to get the same information.

100
00:05:41,000 --> 00:05:47,000
Now let's say that I also want to directly get the string content.

101
00:05:47,000 --> 00:05:50,000
I don't want it in this particular format, like page_content or something like that.

102
00:05:50,000 --> 00:05:51,000
Right.

103
00:05:51,000 --> 00:05:56,000
So for that, you can directly get the text just by writing

104
00:05:56,000 --> 00:06:00,000
json_splitter.split_text.

105
00:06:00,000 --> 00:06:00,000
Right.

106
00:06:00,000 --> 00:06:04,000
So there is split_text as well as split_json.

107
00:06:04,000 --> 00:06:06,000
And here I'm just going to pass my json_data.

108
00:06:06,000 --> 00:06:07,000
Okay.

109
00:06:07,000 --> 00:06:13,000
And if I go ahead and print, let's say, text[0],

110
00:06:13,000 --> 00:06:18,000
and then go ahead and print text[1],

111
00:06:19,000 --> 00:06:25,000
you'll be able to display both of them.

112
00:06:25,000 --> 00:06:25,000
Right.

113
00:06:25,000 --> 00:06:28,000
So both of these chunks are getting displayed.

114
00:06:28,000 --> 00:06:35,000
So I hope you got an idea of how you can split JSON with the help of

115
00:06:35,000 --> 00:06:36,000
recursive JSON split data.

116
00:06:36,000 --> 00:06:38,000
Uh, recursive.

117
00:06:38,000 --> 00:06:39,000
Uh, let me just see the library again.

118
00:06:39,000 --> 00:06:40,000
Recursive JSON splitter.

119
00:06:40,000 --> 00:06:41,000
Okay.

120
00:06:41,000 --> 00:06:44,000
And again, these are just the splitting techniques.

121
00:06:44,000 --> 00:06:48,000
Now, in the upcoming videos, we are going to look at different embedding techniques.

