1
00:00:00,960 --> 00:00:01,740
In this lesson,

2
00:00:01,740 --> 00:00:06,120
I want to show you how you can use loops with pandas dataframes and how to

3
00:00:06,120 --> 00:00:11,010
iterate over a pandas data frame. So here, I've got a simple dictionary,

4
00:00:11,280 --> 00:00:15,720
I've got two keys, student and score, and under student

5
00:00:15,750 --> 00:00:18,870
I've got a list of student names, and under score

6
00:00:18,870 --> 00:00:21,480
I've got a list of their corresponding scores.

7
00:00:21,990 --> 00:00:26,990
Now we know that we can loop through a dictionary very simply by creating a for

8
00:00:28,500 --> 00:00:30,540
loop and then we say, well,

9
00:00:30,600 --> 00:00:35,600
we're going to go through each of the keys and values inside this student

10
00:00:36,120 --> 00:00:36,953
dictionary.

11
00:00:37,230 --> 00:00:41,190
And then we're going to get all of the items in order to be able to loop through

12
00:00:41,190 --> 00:00:45,360
it. So now when I print each of the keys,

13
00:00:45,690 --> 00:00:49,770
you can see that it goes through the dictionary and prints both of the keys.

14
00:00:50,910 --> 00:00:54,240
And similarly, I can get it to loop through both of the values.

15
00:00:54,660 --> 00:00:57,540
So this is how we've been looping through dictionaries

16
00:00:57,870 --> 00:01:01,020
and we've been using it in our dictionary comprehension.
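As a rough sketch of the dictionary loop described so far: the variable name student_dict, the two extra names, and their scores are placeholders for illustration; only Angela's score of 56 is mentioned in the lesson.

```python
# Placeholder data: only "Angela" and her score of 56 come from the lesson.
student_dict = {
    "student": ["Angela", "James", "Lily"],
    "score": [56, 76, 98],
}

keys_seen = []
# .items() yields one (key, value) pair per entry in the dictionary.
for key, value in student_dict.items():
    print(key)    # the key: "student", then "score"
    print(value)  # the list stored under that key
    keys_seen.append(key)
```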

17
00:01:01,890 --> 00:01:05,820
Now you can loop through a data frame in the same way that you loop through a

18
00:01:05,820 --> 00:01:07,800
dictionary. In a lot of ways,

19
00:01:07,830 --> 00:01:12,360
you can consider a data frame pretty much as if you're working with a Python

20
00:01:12,360 --> 00:01:16,170
dictionary. So I'm going to go ahead and import pandas

21
00:01:16,710 --> 00:01:20,730
and I'm going to use pandas to create a new data frame,

22
00:01:21,390 --> 00:01:24,840
and it's going to be created from our student dictionary.

23
00:01:25,230 --> 00:01:26,760
So you've seen all of this before,

24
00:01:26,820 --> 00:01:31,820
and I'll just call this the student_data_frame and I can print it for you to see

25
00:01:34,080 --> 00:01:34,910
what it looks like.

26
00:01:34,910 --> 00:01:35,743
Okay.

27
00:01:38,630 --> 00:01:40,280
This is our data frame.

28
00:01:40,280 --> 00:01:45,280
It looks like a pretty standard table with the first column being all of the

29
00:01:45,290 --> 00:01:49,280
indices. So at zero index is this first row,

30
00:01:49,940 --> 00:01:53,480
and that basically denotes the index of each row.

31
00:01:54,230 --> 00:01:56,660
Now working with this data frame,

32
00:01:56,750 --> 00:02:01,750
we can actually loop through a data frame using the same method as before.

33
00:02:02,990 --> 00:02:07,990
So we can say for key, value in our student_data_frame.items().

34
00:02:14,390 --> 00:02:17,690
So if I print each of the keys,

35
00:02:18,980 --> 00:02:23,180
you can see it's just going to give me the titles of each column.

36
00:02:23,810 --> 00:02:26,720
But if I print each of the values,

37
00:02:28,520 --> 00:02:32,030
then it's going to give me the data in each of the columns.

38
00:02:32,660 --> 00:02:37,280
Now this is not particularly useful because it's basically just looping through

39
00:02:37,610 --> 00:02:42,110
the names of our columns and then the data inside each column.
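Here is a minimal sketch of that column-wise loop, assuming a dictionary called student_dict with placeholder names and scores (only Angela's 56 is from the lesson). It shows that .items() on a data frame hands back each column name together with that column's data.

```python
import pandas

# Placeholder data: only "Angela" and her score of 56 come from the lesson.
student_dict = {
    "student": ["Angela", "James", "Lily"],
    "score": [56, 76, 98],
}

student_data_frame = pandas.DataFrame(student_dict)
print(student_data_frame)

column_names = []
# On a data frame, .items() iterates column by column, not row by row.
for key, value in student_data_frame.items():
    print(key)    # the column title
    print(value)  # a Series holding the data in that column
    column_names.append(key)
```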

40
00:02:42,710 --> 00:02:46,430
This is why pandas has an inbuilt loop

41
00:02:47,180 --> 00:02:50,690
and it's a method called iterrows.

42
00:02:51,140 --> 00:02:56,140
And it allows us to loop through each of the rows of the data frame rather than

43
00:02:56,540 --> 00:02:57,680
each of the columns.

44
00:02:58,490 --> 00:03:03,490
And the way that we do that is we again use a for loop and then we can get hold

45
00:03:03,910 --> 00:03:07,030
of the index of each row,

46
00:03:07,030 --> 00:03:10,570
so that corresponds to the number in that first column.

47
00:03:11,050 --> 00:03:14,320
And then we can get hold of the data in the row.

48
00:03:15,010 --> 00:03:19,660
And then we can say for index, row in the data frame,

49
00:03:19,690 --> 00:03:23,530
which is student_data_frame, and then it's that method,

50
00:03:23,530 --> 00:03:24,850
.iterrows().

51
00:03:26,290 --> 00:03:31,290
And now I can loop through each of those rows and print out the index for

52
00:03:34,150 --> 00:03:35,260
each of those rows.

53
00:03:36,250 --> 00:03:39,820
So you can see that this is going to print out our data frame here,

54
00:03:40,150 --> 00:03:43,360
and then, in order, each of the indices: 0, 1, 2.

55
00:03:43,750 --> 00:03:46,900
But I can also print out each of the rows.

56
00:03:47,530 --> 00:03:52,530
So now I get the first row has a student and a score,

57
00:03:53,380 --> 00:03:57,310
the second row has a student and a score, and the third row has a student and

58
00:03:57,310 --> 00:03:58,143
score.
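A small sketch of the row-wise loop, again assuming placeholder data apart from Angela's score. It collects the index and the type of each row so you can see what iterrows() hands back on every pass.

```python
import pandas

# Placeholder data: only "Angela" and her score of 56 come from the lesson.
student_dict = {
    "student": ["Angela", "James", "Lily"],
    "score": [56, 76, 98],
}
student_data_frame = pandas.DataFrame(student_dict)

indices = []
row_types = []
# iterrows() yields an (index, row) pair for every row of the data frame.
for index, row in student_data_frame.iterrows():
    print(index)  # the row label from the first column of the printout
    print(row)    # the student and score for this row
    indices.append(index)
    row_types.append(type(row).__name__)
```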

59
00:03:58,540 --> 00:04:03,540
So each of these rows is a pandas Series object. So that means we can tap into the

60
00:04:04,480 --> 00:04:09,480
row and then get hold of the value under a particular column by using the dot

61
00:04:10,690 --> 00:04:13,930
notation. So we can say row.student

62
00:04:14,470 --> 00:04:16,360
and now when it goes through the loop,

63
00:04:16,690 --> 00:04:19,540
you can see first, it's going to print out our entire data frame,

64
00:04:19,870 --> 00:04:24,490
and then it's going to print out each of the students inside that data frame.

65
00:04:25,090 --> 00:04:28,240
Now I can also say row.score,

66
00:04:28,900 --> 00:04:32,320
and now it's going to give me each of the scores inside the data frame.

67
00:04:32,740 --> 00:04:35,320
And I can even do something like this where I say

68
00:04:35,410 --> 00:04:40,410
if the row.student is equal to Angela,

69
00:04:41,440 --> 00:04:46,440
well then we can print that particular row that we're currently looping on, 

70
00:04:47,020 --> 00:04:51,850
.score. And this way we would get the student Angela's score,

71
00:04:51,880 --> 00:04:56,050
which happens to be 56, as you can verify here.
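Putting the last step together as a hedged sketch: the other names and scores are placeholders, and the list angela_scores is only there to make the result easy to check.

```python
import pandas

# Placeholder data: only "Angela" and her score of 56 come from the lesson.
student_dict = {
    "student": ["Angela", "James", "Lily"],
    "score": [56, 76, 98],
}
student_data_frame = pandas.DataFrame(student_dict)

angela_scores = []
for index, row in student_data_frame.iterrows():
    # Dot notation reads the value under that column for the current row.
    if row.student == "Angela":
        print(row.score)
        angela_scores.append(row.score)
```

Dot notation works here because student and score are valid Python names that don't clash with existing Series attributes; row["student"] is the equivalent bracket form.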

