1

00:00:00,950  -->  00:00:07,260
What and let's go ahead and continue extending our project I know we are going to implement something

2

00:00:07,260  -->  00:00:09,980
more fancy and interesting.

3

00:00:10,020  -->  00:00:13,620
We're going to download web pages and write them to the desk.

4

00:00:14,250  -->  00:00:17,360
So in our system we have different kinds of bookmarks.

5

00:00:17,600  -->  00:00:19,610
A bookmark can be a weblink.

6

00:00:19,650  -->  00:00:23,100
It can be a movie or it can be a book.

7

00:00:23,210  -->  00:00:29,850
Now when it comes to babblings when a user bookmarks a web link we were saving the title of the eyeblink

8

00:00:30,120  -->  00:00:32,170
along with your old information.

9

00:00:32,610  -->  00:00:38,560
But now we want to extend the project to also download the corresponding web page content.

10

00:00:38,990  -->  00:00:41,400
Now that would give us a couple of benefits.

11

00:00:41,460  -->  00:00:46,070
One of them is we can maintain a local cache of the web page.

12

00:00:46,080  -->  00:00:52,920
So if tomorrow the post is not available if it is taken on our of the web page is not available then

13

00:00:52,920  -->  00:00:58,260
we still have the local cache of the web page and the user can still browse the bitch.

14

00:00:58,590  -->  00:01:03,640
But the more important benefit is that they can index all those web pages.

15

00:01:03,660  -->  00:01:10,810
Now if we index those web pages it means that the user can actually search this business.

16

00:01:10,830  -->  00:01:18,390
So indexing basically allows us to provide search capability an index is basically a data structure

17

00:01:18,450  -->  00:01:22,500
which allows us to search text in a very efficient way.

18

00:01:22,800  -->  00:01:28,800
And there are third party libraries like so lock on Lucine which can help us in indexing their documents

19

00:01:28,800  -->  00:01:29,600
.

20

00:01:29,880  -->  00:01:36,210
So we're not going to index the web pages but we're going to just download the web page and we're going

21

00:01:36,210  -->  00:01:37,680
to write it to their desk.

22

00:01:37,970  -->  00:01:38,840
OK.

23

00:01:38,870  -->  00:01:41,170
And so there are a couple of pasts involved here.

24

00:01:41,400  -->  00:01:47,790
So when we say we are going to download a web page what we are doing is we are actually doing it normally

25

00:01:47,820  -->  00:01:48,880
read operation.

26

00:01:49,230  -->  00:01:54,180
But in this case the source is in a remote host earlier in the last lesson.

27

00:01:54,210  -->  00:01:57,110
We saw that when we read we actually read from a file.

28

00:01:57,150  -->  00:01:59,470
So do they are the source of just fine.

29

00:01:59,820  -->  00:02:04,110
But now with this extension the source is basically remote host.

30

00:02:04,140  -->  00:02:05,720
So it's the network.

31

00:02:06,090  -->  00:02:12,720
So once we read the document from the remote host then we need to write it with a disk on product we

32

00:02:12,720  -->  00:02:15,160
use the our writer causus.

33

00:02:15,300  -->  00:02:17,630
OK so we are going to do that.

34

00:02:17,670  -->  00:02:21,540
So let's just go ahead and make these extensions.

35

00:02:21,540  -->  00:02:27,330
So I have created this new class called hysterically picnic and it has been added to the Udal package

36

00:02:27,420  -->  00:02:30,730
which was created in the last class on the history class.

37

00:02:30,780  -->  00:02:36,590
Has this method us don't work and all it is going to do as as the name suggests it's going to download

38

00:02:36,630  -->  00:02:37,440
a webpage.

39

00:02:37,530  -->  00:02:41,140
So the input to this method is a you are off of a page.

40

00:02:41,190  -->  00:02:47,100
You are a lawful eyeblink which is a bookmark and it's going to honor the page and written the contents

41

00:02:47,100  -->  00:02:50,460
of the web page as a string.

42

00:02:50,520  -->  00:02:54,350
And here we are using some classes from the Jawa dot net package.

43

00:02:54,390  -->  00:02:59,310
Since we did not discuss it you don't have to worry about it and it's not even production quality code

44

00:02:59,310  -->  00:02:59,920
.

45

00:03:00,000  -->  00:03:07,410
So basically here there is this class called hitcher to be your own connection and it represents a connection

46

00:03:07,410  -->  00:03:09,240
to the remote host.

47

00:03:09,240  -->  00:03:13,380
And then we are invoking this method called Let's get response called on this variable.

48

00:03:13,400  -->  00:03:21,690
Can and when we do that if the response code is in between to those 200 and 300 It means that we are

49

00:03:21,690  -->  00:03:22,740
able to connect with that.

50

00:03:22,740  -->  00:03:28,820
So we're successfully on so we can read the contents of that particular page.

51

00:03:28,920  -->  00:03:29,200
OK.

52

00:03:29,220  -->  00:03:30,950
And that's what we're going to do here.

53

00:03:31,290  -->  00:03:39,510
And so we are in working this matter called Get input stream on the connection object and get input

54

00:03:39,510  -->  00:03:42,920
stream simply returns a regular input stream.

55

00:03:42,930  -->  00:03:46,010
Ok so now we're going to just read from here.

56

00:03:46,530  -->  00:03:52,220
And we know that input stream is the base abstract class for all byte oriented and streams.

57

00:03:52,230  -->  00:03:56,030
So let's go ahead and fill this method and it could be part of the I or you do.

58

00:03:56,370  -->  00:04:04,270
So let's just say create method and let's just go ahead and fill this matter.

59

00:04:04,530  -->  00:04:08,090
So the previous method was reading from a file.

60

00:04:08,580  -->  00:04:15,110
And instead of building a string it was writing it into the string every night.

61

00:04:15,270  -->  00:04:16,950
So that was the last lesson.

62

00:04:16,950  -->  00:04:25,350
So let's just go ahead and copy this because the code is pretty much identical.

63

00:04:26,850  -->  00:04:39,000
So the one difference is let's make this as in to keep it simple and file in good stream object is replaced

64

00:04:39,000  -->  00:04:41,150
with this input stream.

65

00:04:41,250  -->  00:04:43,220
So it's more generic now.

66

00:04:43,890  -->  00:04:50,610
And since we want to accumulate the content of the page we want to build a web page content that uses

67

00:04:50,630  -->  00:04:54,800
string builder.

68

00:04:55,500  -->  00:04:58,300
So we should always use a string builder.

69

00:04:58,440  -->  00:05:07,560
I have seen code where people who are concatenating using the plus operation for such an important operation

70

00:05:07,570  -->  00:05:09,270
and that's very bad.

71

00:05:09,510  -->  00:05:17,020
Now string line is fine let's remove this.

72

00:05:17,180  -->  00:05:18,330
To append

73

00:05:23,990  -->  00:05:32,530
and we're writing a new line that's because real method is strips new lines lets add some catch glossier

74

00:05:32,830  -->  00:05:33,670
to catch clause

75

00:05:35,040  -->  00:05:39,930
.

76

00:05:43,540  -->  00:05:47,600
So we are dimming that string method we are invoking the true string.

77

00:05:47,830  -->  00:05:55,190
It's going to read on the web page content which has been accumulated and this string builder object

78

00:05:55,620  -->  00:05:56,110
.

79

00:05:56,650  -->  00:05:57,440
So that's about it.

80

00:05:57,460  -->  00:06:02,920
And then coding is still UTF 8 and we will be using only you be of 8 in our system because that's like

81

00:06:02,920  -->  00:06:06,640
a universal kind of cursor.

82

00:06:06,650  -->  00:06:10,570
Now let's go back into history to be correct.

83

00:06:10,900  -->  00:06:12,300
And this is fine.

84

00:06:12,310  -->  00:06:15,790
Now let's look at the code which is going to invoke this method.

85

00:06:15,940  -->  00:06:18,250
So if you're going to view our Java.

86

00:06:19,150  -->  00:06:22,550
So we have this browse here and we are

87

00:06:25,390  -->  00:06:31,420
basically for a given user we are going to we are doing the bookmarking here and we are going we are

88

00:06:31,420  -->  00:06:33,670
invoking the same user bookmark.

89

00:06:33,670  -->  00:06:36,130
So that's just going here.

90

00:06:36,160  -->  00:06:39,110
It's into the going into book my controller.

91

00:06:39,380  -->  00:06:43,760
From here we are going to bookmark manager and have already written some code here.

92

00:06:44,120  -->  00:06:51,910
Now we want to download the web page only if the bookmark is an instance of eyeblink.

93

00:06:52,030  -->  00:06:58,360
So here we are passing a bookmark which can be a weblink which can be a book or a movie but we want

94

00:06:58,370  -->  00:07:00,760
to do this only if it is a weblink.

95

00:07:00,790  -->  00:07:01,710
So here we are.

96

00:07:01,770  -->  00:07:03,340
Downcasting it took a blink.

97

00:07:03,560  -->  00:07:09,520
If it an instance off of a blink and we are working to get you out of the matter because wabbling has

98

00:07:09,740  -->  00:07:15,920
a get your delimiter and then we are assigning it to this variable you are in.

99

00:07:15,940  -->  00:07:18,440
So it basically has that you are a lot of eyeblink.

100

00:07:18,760  -->  00:07:25,030
And we want to download only effect for now if it doesn't end with Dot B-B of which means that if it

101

00:07:25,030  -->  00:07:28,700
ends with dockage DML or something like that then we want to download.

102

00:07:28,720  -->  00:07:32,830
So then we are connecting in working this download method here.

103

00:07:33,450  -->  00:07:33,860
OK.

104

00:07:33,880  -->  00:07:36,250
Which is what we haven't seen.

105

00:07:36,760  -->  00:07:41,080
And we are passing the Yoran of the page.

106

00:07:41,560  -->  00:07:44,520
OK so once we get back the response.

107

00:07:44,620  -->  00:07:47,040
So it's assigned to this wearable web page.

108

00:07:47,240  -->  00:07:52,780
And if the web page is not equal to none only then we are going to write it to their desk.

109

00:07:52,780  -->  00:08:01,720
Sometimes it can be not also for instance if the response code is not within this limit then it means

110

00:08:01,730  -->  00:08:06,180
that something is wrong in which case we are going to read on another.

111

00:08:06,430  -->  00:08:11,840
OK so let's go back and let's also for the right matter now.

112

00:08:11,990  -->  00:08:13,440
OK.

113

00:08:14,530  -->  00:08:20,270
So that's also going to get created within the higher your class.

114

00:08:20,290  -->  00:08:29,690
So let's go ahead and the input barometer here is the ID because the when we write to the desk we have

115

00:08:29,680  -->  00:08:34,400
to create a file and the filename will be the name of the idea.

116

00:08:34,400  -->  00:08:39,410
So because the idea is unique to the bookmark we go we can just use that ID.

117

00:08:39,640  -->  00:08:44,950
And we are passing the web page content also because that needs to be written to the file.

118

00:08:45,080  -->  00:08:48,360
So here it is.

119

00:08:48,380  -->  00:08:57,290
So let's just pick this chord here and our are.

120

00:09:00,590  -->  00:09:01,960
Now it's very simple.

121

00:09:02,050  -->  00:09:05,700
Just keep making it right.

122

00:09:07,000  -->  00:09:14,550
So on it as a writer a little bit more explicit.

123

00:09:21,910  -->  00:09:24,200
So it's just the exact opposite stuff

124

00:09:24,190  -->  00:09:30,330
.

125

00:09:30,350  -->  00:09:41,990
OK now let's leave this as UTF 8 and yes we need to give a final name so I'd say Dot hedge DML

126

00:09:45,680  -->  00:09:51,220
and so we are again creating a string here from the ID.

127

00:09:51,250  -->  00:09:53,930
So we are invoking the value of.

128

00:09:54,640  -->  00:09:55,130
And

129

00:09:58,180  -->  00:10:01,190
we need to specify the location.

130

00:10:01,210  -->  00:10:10,140
So here JRD is my workspace eclipse workspace and truly is our project.

131

00:10:10,820  -->  00:10:16,560
And let's create this page a directory called pages and it's right here.

132

00:10:16,610  -->  00:10:19,160
So I have already created this directory called pages.

133

00:10:19,160  -->  00:10:25,490
So you should also create this page just a directory within that trivial project and all the pages that

134

00:10:25,490  -->  00:10:28,120
get downloaded will be read on into that directory.

135

00:10:28,120  -->  00:10:32,560
So that's also put the finals last year.

136

00:10:32,570  -->  00:10:34,690
So that's what we have here.

137

00:10:35,120  -->  00:10:43,420
And we can read more all of this make a writer dark ride and we just

138

00:10:46,490  -->  00:10:47,080
do this

139

00:10:52,100  -->  00:10:59,530
like Eclipse cleared all the exceptions for us exception catch blocks for us and upset.

140

00:10:59,570  -->  00:11:00,300
So

141

00:11:07,030  -->  00:11:08,090
OK that's about it.

142

00:11:08,090  -->  00:11:10,870
So let's just go ahead and run our application now

143

00:11:15,120  -->  00:11:17,450
running something else.

144

00:11:17,440  -->  00:11:26,030
And one other small change I'm mean is that in the weblink there was a different you are here for the

145

00:11:26,020  -->  00:11:29,470
first two eyeblink but that was taking a lot of time to download.

146

00:11:29,470  -->  00:11:37,000
So I just replaced it with something else but later we will change it back but or later when we are

147

00:11:37,000  -->  00:11:44,110
dealing with multithreading we will actually see how we can abandon a request if it is taking too long

148

00:11:44,120  -->  00:11:44,330
.

149

00:11:44,530  -->  00:11:44,890
OK.

150

00:11:44,890  -->  00:11:48,810
So right now it's taking like 10 or 15 seconds you don't know that page.

151

00:11:48,830  -->  00:11:50,970
So if it is it shouldn't be taking so long.

152

00:11:51,040  -->  00:11:57,220
So if something like that happens we should abandon downloading that Biche because that's just not good

153

00:11:57,220  -->  00:12:00,640
to download such kind of Lapidus so that we will see later.

154

00:12:00,700  -->  00:12:02,950
But this start to just let you know.

155

00:12:02,950  -->  00:12:12,800
So I just replace that with this non-IT me quickly launch.

156

00:12:12,790  -->  00:12:18,120
So it is downloading downloading what it is downloading it means it's not loading some link.

157

00:12:18,130  -->  00:12:19,280
It's done.

158

00:12:19,270  -->  00:12:20,080
So let me

159

00:12:22,660  -->  00:12:26,340
refresh this on so there are three pages that God created.

160

00:12:27,040  -->  00:12:27,580
OK.

161

00:12:27,740  -->  00:12:31,450
So you are there are there are three hits the whole pages that go down in order.

162

00:12:31,510  -->  00:12:33,250
So that's it.

163

00:12:33,280  -->  00:12:38,540
So you just go ahead and I'm going to upload some code into the resources section.

164

00:12:38,810  -->  00:12:44,410
So what we have done is I think it's more interesting that we have actually downloaded web pages and

165

00:12:44,410  -->  00:12:45,680
we have written them to their desk.

166

00:12:45,760  -->  00:12:53,360
So we also know how to write data out to their desk and how to read for them from a network.

167

00:12:53,950  -->  00:12:55,090
And that's about it.

168

00:12:55,250  -->  00:12:55,990
Thank you.

169

00:12:56,000  -->  00:12:57,420
I'm happy coding.

170

00:12:57,760  -->  00:12:58,420
And here it is.

171

00:12:58,420  -->  00:13:02,270
Here we are saying that we are downloading some songs so that's the next year.
