1
00:00:00,360 --> 00:00:01,290
Angela: In this lesson,

2
00:00:01,290 --> 00:00:03,390
I have another super quick challenge for you

3
00:00:03,390 --> 00:00:06,660
so you can practice setting up the Selenium web driver

4
00:00:06,660 --> 00:00:09,720
in a blank project and scraping a different piece

5
00:00:09,720 --> 00:00:11,730
of data from a website.

6
00:00:11,730 --> 00:00:15,000
This time we're gonna work with the Wikipedia main page.

7
00:00:15,000 --> 00:00:17,400
So you can head over to the course resources

8
00:00:17,400 --> 00:00:19,650
and find the link to this page,

9
00:00:19,650 --> 00:00:21,033
or you can just type it in.

10
00:00:21,960 --> 00:00:25,950
Back in our project, I'm gonna create a new file

11
00:00:25,950 --> 00:00:28,883
and I'm gonna call this interaction.py.

12
00:00:32,520 --> 00:00:36,510
Now, in this new Python file, we're going to interact

13
00:00:36,510 --> 00:00:38,850
with this Wikipedia webpage.

14
00:00:38,850 --> 00:00:41,850
And as a challenge to you, the first thing I want you

15
00:00:41,850 --> 00:00:45,000
to do is to figure out how you can get a hold

16
00:00:45,000 --> 00:00:47,190
of this particular number, the article count,

17
00:00:47,190 --> 00:00:51,120
and print it out inside our interaction.py.

18
00:00:51,120 --> 00:00:54,421
Remember that you'll need to import Selenium

19
00:00:54,421 --> 00:00:59,010
and also use the web driver to get hold of this page

20
00:00:59,010 --> 00:01:01,410
and then find this particular number

21
00:01:01,410 --> 00:01:03,090
and finally print it out.

22
00:01:03,090 --> 00:01:04,890
And then when you're ready to run it, all you have

23
00:01:04,890 --> 00:01:06,360
to do is right click

24
00:01:06,360 --> 00:01:09,870
and then run this interaction.py and it'll work

25
00:01:09,870 --> 00:01:13,140
and you should see the outcome being printed

26
00:01:13,140 --> 00:01:14,280
in your console.

27
00:01:14,280 --> 00:01:16,833
So pause the video now and give that a go.

28
00:01:19,920 --> 00:01:22,110
Alright, so here's the solution.

29
00:01:22,110 --> 00:01:25,680
First, we're going to go into the Selenium package,

30
00:01:25,680 --> 00:01:28,050
which we've already installed into this project

31
00:01:28,050 --> 00:01:30,300
so we don't have to install it again.

32
00:01:30,300 --> 00:01:33,450
And then we're going to import the web driver.

33
00:01:33,450 --> 00:01:37,770
Now using the web driver, we're going to create a new driver

34
00:01:37,770 --> 00:01:40,023
from the Chrome browser.

35
00:01:43,620 --> 00:01:48,120
This is what we write to initialize a new Chrome driver.

36
00:01:48,120 --> 00:01:52,770
Now once we've created our driver,

37
00:01:52,770 --> 00:01:57,270
we can use it to navigate to our webpage,

38
00:01:57,270 --> 00:01:59,760
which is done using the get method,

39
00:01:59,760 --> 00:02:03,663
and this is the URL, which we'll copy and paste into here.
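For reference, here is a minimal sketch of the setup described so far. It assumes Selenium 4.6 or later (which can locate chromedriver on its own through Selenium Manager) and a local Chrome install. The URL is an assumption (the English Wikipedia main page), and the helper name `open_page` is hypothetical.

```python
# A minimal sketch of the driver setup, assuming Selenium 4.6+ and Chrome.
# Assumption: the target page is the English Wikipedia main page.
WIKIPEDIA_URL = "https://en.wikipedia.org/wiki/Main_Page"


def open_page(url: str = WIKIPEDIA_URL):
    """Create a new Chrome driver and navigate it to `url` (hypothetical helper).

    Selenium is imported inside the function only so this file can still be
    imported on a machine where Selenium is not installed.
    """
    from selenium import webdriver

    # Initialize a new Chrome driver (this launches a Chrome window).
    driver = webdriver.Chrome()
    # Navigate to the page; with the default page-load strategy, get()
    # returns once the page has finished loading.
    driver.get(url)
    return driver
```

Calling `driver = open_page()` opens the page; call `driver.quit()` when you're finished so the browser and the driver process shut down.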

40
00:02:04,620 --> 00:02:07,860
And once we've gotten hold of this page, then we're going

41
00:02:07,860 --> 00:02:11,310
to narrow in on this particular element.

42
00:02:11,310 --> 00:02:14,190
So let's go ahead and inspect it.

43
00:02:14,190 --> 00:02:17,520
And you can see that it's inside an anchor tag

44
00:02:17,520 --> 00:02:20,040
with no particular identifiers.

45
00:02:20,040 --> 00:02:23,880
There's no id, there's no name, there's no class.

46
00:02:23,880 --> 00:02:27,780
But this anchor tag lives in a div that has an id.

47
00:02:27,780 --> 00:02:31,980
So this articlecount id is going to be a unique identifier

48
00:02:31,980 --> 00:02:35,940
for the div that holds this particular anchor tag.

49
00:02:35,940 --> 00:02:38,100
So we can narrow in on this anchor tag

50
00:02:38,100 --> 00:02:40,650
using our CSS selectors.

51
00:02:40,650 --> 00:02:44,970
So we can say driver.find_element(By.CSS_SELECTOR).

52
00:02:44,970 --> 00:02:48,030
Make sure that it's find_element, not find_elements.

53
00:02:48,030 --> 00:02:50,820
And then inside here we're going to put our selector,

54
00:02:50,820 --> 00:02:55,170
which is first the id of articlecount,

55
00:02:55,170 --> 00:02:58,950
and that is going to be preceded by a pound sign.

56
00:02:58,950 --> 00:03:01,860
And then inside that div with that id, we're looking

57
00:03:01,860 --> 00:03:04,203
for the first anchor tag.

58
00:03:05,520 --> 00:03:07,230
Now notice that inside that div,

59
00:03:07,230 --> 00:03:09,540
there are actually two anchor tags.

60
00:03:09,540 --> 00:03:13,530
But by using this find_element(By.CSS_SELECTOR),

61
00:03:13,530 --> 00:03:15,510
it's only gonna give us the first one

62
00:03:15,510 --> 00:03:17,940
that matches this selector.

63
00:03:17,940 --> 00:03:18,897
So this is going

64
00:03:18,897 --> 00:03:23,343
to be our article_count.
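To see why `find_element` returns only the first anchor, here is an offline sketch that swaps Selenium for Python's standard-library `html.parser`. It collects every `<a>` nested inside the element with `id="articlecount"`, which is what the CSS selector `#articlecount a` matches. The markup is a simplified, hypothetical stand-in for the real page, the number in it is a made-up placeholder, and the parser assumes well-formed markup without unclosed void elements (such as `<br>` or `<img>`) inside the div.

```python
from html.parser import HTMLParser

# Simplified, hypothetical markup: a div with id="articlecount" holding two
# anchor tags. The number is a made-up placeholder, not a real article count.
SAMPLE_HTML = """
<div id="articlecount">
  <a href="/wiki/Special:Statistics">1,234,567</a> articles in
  <a href="/wiki/English_language">English</a>
</div>
"""


class ArticleCountAnchors(HTMLParser):
    """Collect the text of each <a> inside the element with id="articlecount".

    Roughly what the CSS selector "#articlecount a" matches. Assumes
    well-formed markup with no unclosed void elements inside the div.
    """

    def __init__(self):
        super().__init__()
        self.depth = 0          # > 0 while inside #articlecount
        self.in_anchor = False  # True while inside a matching <a>
        self.anchors = []       # anchor texts, in document order

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
            if tag == "a":
                self.in_anchor = True
                self.anchors.append("")
        elif dict(attrs).get("id") == "articlecount":
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if tag == "a":
                self.in_anchor = False

    def handle_data(self, data):
        if self.in_anchor:
            self.anchors[-1] += data


parser = ArticleCountAnchors()
parser.feed(SAMPLE_HTML)

# Like driver.find_element(By.CSS_SELECTOR, "#articlecount a").text:
print(parser.anchors[0])  # 1,234,567
# Like driver.find_elements(...), which returns every match:
print(parser.anchors)     # ['1,234,567', 'English']
```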

65
00:03:25,050 --> 00:03:28,230
And now what we wanna do is print

66
00:03:28,230 --> 00:03:31,830
the article_count.text.
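Putting the lookup and the print together, this step might be sketched as follows. It assumes Selenium 4, where `find_element` takes a `By` locator, and that `By` comes from `selenium.webdriver.common.by` (the lesson uses `By` without showing that import). The helper names `read_article_count` and `parse_count` are hypothetical, and `parse_count` is an optional extra that assumes the count uses commas as thousands separators.

```python
# Assumes Selenium 4; By is imported inside the helper only so this file
# loads on machines without Selenium installed.
ARTICLE_COUNT_SELECTOR = "#articlecount a"  # <a> tags inside the div with id="articlecount"


def read_article_count(driver) -> str:
    """Return the text of the first element matching ARTICLE_COUNT_SELECTOR.

    `driver` is an already-open Selenium WebDriver pointed at the page.
    """
    from selenium.webdriver.common.by import By

    # find_element (singular) returns only the first match and raises
    # NoSuchElementException if nothing matches.
    article_count = driver.find_element(By.CSS_SELECTOR, ARTICLE_COUNT_SELECTOR)
    return article_count.text


def parse_count(text: str) -> int:
    """Optional: convert text such as "1,234,567" into the integer 1234567."""
    return int(text.strip().replace(",", ""))


# With a live driver: print(read_article_count(driver))
print(parse_count("1,234,567"))  # 1234567
```

Keeping the raw `.text` as a string matches what the lesson prints; converting it to an integer is only useful if you want to compare or store the number.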

67
00:03:31,830 --> 00:03:33,960
So now let's go ahead and right click

68
00:03:33,960 --> 00:03:36,630
and run our interaction.py.

69
00:03:36,630 --> 00:03:39,840
It should open up our browser to this page.

70
00:03:39,840 --> 00:03:43,293
And now it should have found and printed out that number.

71
00:03:44,130 --> 00:03:46,410
So this is what we've been doing so far,

72
00:03:46,410 --> 00:03:49,182
creating our driver, opening webpages,

73
00:03:49,182 --> 00:03:51,666
and then finding specific elements

74
00:03:51,666 --> 00:03:54,540
and printing some sort of property.

75
00:03:54,540 --> 00:03:58,020
But the next step is to actually perform some sort

76
00:03:58,020 --> 00:04:00,510
of interaction with the webpage.

77
00:04:00,510 --> 00:04:02,280
For example, clicking on a link

78
00:04:02,280 --> 00:04:05,400
or typing something into the search bar.

79
00:04:05,400 --> 00:04:08,070
Because after all, when we're working with websites,

80
00:04:08,070 --> 00:04:11,040
we'll often need to interact with them in order

81
00:04:11,040 --> 00:04:12,600
to navigate to new pages

82
00:04:12,600 --> 00:04:16,589
and get hold of specific pieces of information

83
00:04:16,589 --> 00:04:18,300
that we're interested in.

84
00:04:18,300 --> 00:04:21,089
And that's what I'm gonna show you in the next lesson.

85
00:04:21,089 --> 00:04:22,190
So I'll see you there.

