WEBVTT

0
00:00.960 --> 00:01.740
In this lesson,

1
00:01.740 --> 00:06.120
I want to show you how you can use loops with pandas data frames and how to

2
00:06.120 --> 00:11.010
iterate over a pandas data frame. So here, I've got a simple dictionary,

3
00:11.280 --> 00:15.720
I've got two keys, student and score, and under student

4
00:15.750 --> 00:18.870
I've got a list of student names, and under score

5
00:18.870 --> 00:21.480
I've got a list of their corresponding scores.

6
00:21.990 --> 00:26.990
Now we know that we can loop through a dictionary very simply by creating a for

7
00:28.500 --> 00:30.540
loop and then we say, well,

8
00:30.600 --> 00:35.600
we're going to go through each of the keys and values inside this student

9
00:36.120 --> 00:36.953
dictionary.

10
00:37.230 --> 00:41.190
And then we're going to get all of the items in order to be able to loop through

11
00:41.190 --> 00:45.360
it. So now when I print each of the keys,

12
00:45.690 --> 00:49.770
you can see that it goes through the dictionary and prints both of the keys.

13
00:50.910 --> 00:54.240
And similarly, I can get it to loop through both of the values.
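
NOTE
The dictionary loop described above can be sketched roughly as follows. This is a minimal illustration, not the exact code on screen: the name student_dict and every name and score other than Angela's 56 are assumed placeholders.
```python
# Assumed sample data: only "Angela" and her score of 56 come from the lesson;
# the other names and scores are placeholders for illustration.
student_dict = {
    "student": ["Angela", "James", "Lily"],
    "score": [56, 76, 98],
}
keys_seen = []
values_seen = []
# .items() yields one (key, value) pair per dictionary entry.
for key, value in student_dict.items():
    print(key)    # the key: "student", then "score"
    print(value)  # the list stored under that key
    keys_seen.append(key)
    values_seen.append(value)
```
Each pass through the loop handles one whole key and its full list, not one student at a time.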

14
00:54.660 --> 00:57.540
So this is how we've been looping through dictionaries

15
00:57.870 --> 01:01.020
and we've been using it in our dictionary comprehension.

16
01:01.890 --> 01:05.820
Now you can loop through a data frame in the same way that you loop through a

17
01:05.820 --> 01:07.800
dictionary. In a lot of ways,

18
01:07.830 --> 01:12.360
you can consider a data frame pretty much as if you're working with a Python

19
01:12.360 --> 01:16.170
dictionary. So I'm going to go ahead and import pandas

20
01:16.710 --> 01:20.730
and I'm going to use pandas to create a new data frame,

21
01:21.390 --> 01:24.840
and it's going to be created from our student dictionary.

22
01:25.230 --> 01:26.760
So you've seen all of this before,

23
01:26.820 --> 01:31.820
and I'll just call this the student_data_frame and I can print it for you to see

24
01:34.080 --> 01:34.910
what it looks like.

25
01:34.910 --> 01:35.743
<v 1>Okay.</v>

26
01:38.630 --> 01:40.280
<v 0>This is our data frame.</v>

27
01:40.280 --> 01:45.280
It looks like a pretty standard table with the first column being all of the

28
01:45.290 --> 01:49.280
indices. So index zero is this first row,

29
01:49.940 --> 01:53.480
and that basically denotes the index of each row.
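
NOTE
A sketch of building the data frame from the dictionary, assuming pandas is installed. The name student_dict and the placeholder rows are assumptions; student_data_frame is the name used in the lesson.
```python
import pandas
# Assumed sample data: only "Angela" and 56 come from the lesson.
student_dict = {
    "student": ["Angela", "James", "Lily"],
    "score": [56, 76, 98],
}
# Each dictionary key becomes a column; rows get a default index of 0, 1, 2.
student_data_frame = pandas.DataFrame(student_dict)
print(student_data_frame)
```
The leftmost column in the printed table is the row index, not part of the data.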

30
01:54.230 --> 01:56.660
Now working with this data frame,

31
01:56.750 --> 02:01.750
we can actually loop through a data frame using the same method as before.

32
02:02.990 --> 02:07.990
So we can say for key, value in student_data_frame.items().

33
02:14.390 --> 02:17.690
So if I print each of the keys,

34
02:18.980 --> 02:23.180
you can see it's just going to give me the titles of each column.

35
02:23.810 --> 02:26.720
But if I print each of the values,

36
02:28.520 --> 02:32.030
then it's going to give me the data in each of the columns.

37
02:32.660 --> 02:37.280
Now this is not particularly useful because it's basically just looping through

38
02:37.610 --> 02:42.110
the names of our columns and then the data inside each column.
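
NOTE
Looping over the data frame with .items() might look like the sketch below. The sample rows other than Angela's are assumed placeholders.
```python
import pandas
# Assumed sample data: only "Angela" and 56 come from the lesson.
student_data_frame = pandas.DataFrame({
    "student": ["Angela", "James", "Lily"],
    "score": [56, 76, 98],
})
column_names = []
# DataFrame.items() yields (column label, column data) pairs, one per column.
for key, value in student_data_frame.items():
    print(key)    # the column title
    print(value)  # a pandas Series holding that column's data
    column_names.append(key)
```
The loop runs once per column, which is why it is not a convenient way to work through the table one student at a time.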

39
02:42.710 --> 02:46.430
This is why pandas has an inbuilt loop

40
02:47.180 --> 02:50.690
and it's a method called iterrows.

41
02:51.140 --> 02:56.140
And it allows us to loop through each of the rows of the data frame rather than

42
02:56.540 --> 02:57.680
each of the columns.

43
02:58.490 --> 03:03.490
And the way that we do that is we again use a for loop and then we can get hold

44
03:03.910 --> 03:07.030
of the index of each row,

45
03:07.030 --> 03:10.570
so that corresponds to the number in that first column.

46
03:11.050 --> 03:14.320
And then we can get hold of the data in the row.

47
03:15.010 --> 03:19.660
And then we can say for index, row in the data frame,

48
03:19.690 --> 03:23.530
which is student_data_frame, and then it's that method,

49
03:23.530 --> 03:24.850
.iterrows().

50
03:26.290 --> 03:31.290
And now I can loop through each of those rows and print out the index for

51
03:34.150 --> 03:35.260
each of those rows.

52
03:36.250 --> 03:39.820
So you can see that this is going to print out our data frame here,

53
03:40.150 --> 03:43.360
and then it prints out each index in order: 0, 1, 2.

54
03:43.750 --> 03:46.900
But I can also print out each of the rows.

55
03:47.530 --> 03:52.530
So now I get the first row has a student and a score,

56
03:53.380 --> 03:57.310
the second row has a student and a score, and the third row has a student and

57
03:57.310 --> 03:58.143
a score.
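
NOTE
The row-by-row loop with iterrows() can be sketched like this. The sample rows other than Angela's are assumed placeholders.
```python
import pandas
# Assumed sample data: only "Angela" and 56 come from the lesson.
student_data_frame = pandas.DataFrame({
    "student": ["Angela", "James", "Lily"],
    "score": [56, 76, 98],
})
indices = []
rows = []
# iterrows() yields (index, row) pairs, one per row of the data frame.
for index, row in student_data_frame.iterrows():
    print(index)  # the row label from the leftmost column
    print(row)    # a pandas Series with one entry per column
    indices.append(index)
    rows.append(row)
```
Here the loop runs once per student, with the column names acting as labels inside each row.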

58
03:58.540 --> 04:03.540
So each of these rows is a pandas Series object. So that means we can tap into the

59
04:04.480 --> 04:09.480
row and then get hold of the value under a particular column by using the dot

60
04:10.690 --> 04:13.930
notation. So we can say row.student

61
04:14.470 --> 04:16.360
and now when it goes through the loop,

62
04:16.690 --> 04:19.540
you can see first, it's going to print out our entire data frame,

63
04:19.870 --> 04:24.490
and then it's going to print out each of the students inside that data frame.

64
04:25.090 --> 04:28.240
Now I can also say row.score,

65
04:28.900 --> 04:32.320
and now it's going to give me each of the scores inside the data frame.

66
04:32.740 --> 04:35.320
And I can even do something like this where I say

67
04:35.410 --> 04:40.410
if row.student is equal to "Angela",

68
04:41.440 --> 04:46.440
well then we can print the score of the row that we're currently looping on,

69
04:47.020 --> 04:51.850
row.score. And this way we would get the student Angela's score,

70
04:51.880 --> 04:56.050
which happens to be 56, as you can verify here.
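
NOTE
Putting the last step together, a sketch of looking up one student's score inside the loop. The sample rows other than Angela's are assumed placeholders, and angela_score is a hypothetical variable added only to hold the result.
```python
import pandas
# Assumed sample data: only "Angela" and 56 come from the lesson.
student_data_frame = pandas.DataFrame({
    "student": ["Angela", "James", "Lily"],
    "score": [56, 76, 98],
})
angela_score = None  # hypothetical helper variable, not from the lesson
for index, row in student_data_frame.iterrows():
    # Dot notation reads the value stored under that column name in this row.
    if row.student == "Angela":
        print(row.score)
        angela_score = row.score
```
Dot notation only works when the column name is a valid Python identifier that does not clash with an existing Series attribute. Bracket notation such as row["student"] works for any column name.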