1
00:00:00,290 --> 00:00:01,470
Instructor: In the last lesson,

2
00:00:01,470 --> 00:00:03,570
we got started using pandas.

3
00:00:03,570 --> 00:00:04,770
We installed the library,

4
00:00:04,770 --> 00:00:08,370
and we imported it to read a CSV file.

5
00:00:08,370 --> 00:00:11,940
And then, we used that file to get hold of a column,

6
00:00:11,940 --> 00:00:15,450
which was automatically identified as soon as pandas read

7
00:00:15,450 --> 00:00:17,160
this data CSV.

8
00:00:17,160 --> 00:00:20,250
So, what exactly are we dealing with here?

9
00:00:20,250 --> 00:00:22,830
Well, one of the most useful things I find

10
00:00:22,830 --> 00:00:25,950
is to do type checks on any of the objects

11
00:00:25,950 --> 00:00:29,520
that I'm working with from a new library.

12
00:00:29,520 --> 00:00:32,490
For example, we can use the type function to check

13
00:00:32,490 --> 00:00:36,000
what exactly is the data type of this data

14
00:00:36,000 --> 00:00:38,070
that we're getting back from pandas.

15
00:00:38,070 --> 00:00:42,033
So if I go ahead and print this one, and comment out

16
00:00:42,033 --> 00:00:45,390
this second line, then you can see what we're getting

17
00:00:45,390 --> 00:00:49,410
is what's called a pandas DataFrame object.

18
00:00:49,410 --> 00:00:51,630
In the package overview, they talk about

19
00:00:51,630 --> 00:00:55,380
the two primary data structures of pandas.

20
00:00:55,380 --> 00:00:57,870
Series and a DataFrame.

21
00:00:57,870 --> 00:00:59,490
A DataFrame is kind of

22
00:00:59,490 --> 00:01:02,490
the equivalent of your whole table here.

23
00:01:02,490 --> 00:01:05,430
So, every single sheet inside an Excel file

24
00:01:05,430 --> 00:01:07,080
or inside a Google Sheet file

25
00:01:07,080 --> 00:01:10,203
would be considered a DataFrame in pandas.

26
00:01:11,040 --> 00:01:13,350
Now, what about the second part here,

27
00:01:13,350 --> 00:01:16,140
where we've gotten hold of one of the columns

28
00:01:16,140 --> 00:01:17,550
in our DataFrame?

29
00:01:17,550 --> 00:01:22,170
If I do a type check on this object, and I hit run,

30
00:01:22,170 --> 00:01:26,580
then you can see, this is a pandas Series object.
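Here's a rough sketch of those two type checks. It assumes a small stand-in table built in code rather than the lesson's CSV file. The column names day, temp, and condition come from the lesson, but the values are illustrative placeholders.

```python
import pandas as pd

# Stand-in for the lesson's weather CSV: illustrative values, not the real file.
data = pd.DataFrame({
    "day": ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"],
    "temp": [12, 14, 15, 14, 21, 22, 24],
    "condition": ["Sunny", "Rain", "Rain", "Cloudy",
                  "Sunny", "Sunny", "Sunny"],
})

# The whole table is a DataFrame; a single column pulled out of it is a Series.
print(type(data))
print(type(data["temp"]))
```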

31
00:01:26,580 --> 00:01:31,580
The Series is the other super important concept in pandas.

32
00:01:31,830 --> 00:01:35,940
And the Series is basically equivalent to a list.

33
00:01:35,940 --> 00:01:40,020
It's kind of like a single column in your table.

34
00:01:40,020 --> 00:01:43,170
So, the temperature column would be a Series,

35
00:01:43,170 --> 00:01:45,450
the condition column would be a Series,

36
00:01:45,450 --> 00:01:48,240
and the day column would also be a Series.

37
00:01:48,240 --> 00:01:50,100
So, once you've grokked this idea

38
00:01:50,100 --> 00:01:54,150
that the whole table is basically a DataFrame in pandas,

39
00:01:54,150 --> 00:01:57,270
and every single column is a Series

40
00:01:57,270 --> 00:01:59,880
kind of like a list in pandas,

41
00:01:59,880 --> 00:02:01,920
then you're pretty much half of the way there

42
00:02:01,920 --> 00:02:05,190
to understanding how this library works.

43
00:02:05,190 --> 00:02:07,920
Now if we go over to the API reference,

44
00:02:07,920 --> 00:02:08,752
you can see that

45
00:02:08,752 --> 00:02:12,060
this is basically a list of all of the things

46
00:02:12,060 --> 00:02:13,980
that you can do with pandas.

47
00:02:13,980 --> 00:02:17,250
And it is a long list of things.

48
00:02:17,250 --> 00:02:20,520
But let's take a look at those two core classes,

49
00:02:20,520 --> 00:02:22,740
the DataFrame and the Series.

50
00:02:22,740 --> 00:02:25,110
So if we go to the DataFrame, you can see

51
00:02:25,110 --> 00:02:28,680
that it has things on how to construct a new DataFrame,

52
00:02:28,680 --> 00:02:30,150
how to get hold of the index,

53
00:02:30,150 --> 00:02:32,550
how to get hold of the column labels,

54
00:02:32,550 --> 00:02:35,070
and there's a whole bunch of attributes

55
00:02:35,070 --> 00:02:39,030
that you can tap into as well as many methods.

56
00:02:39,030 --> 00:02:41,700
For example, if we take a look at this section

57
00:02:41,700 --> 00:02:44,340
on Serialization / IO / conversion,

58
00:02:44,340 --> 00:02:47,070
you can see that you can actually convert a DataFrame

59
00:02:47,070 --> 00:02:49,800
to various different file types.

60
00:02:49,800 --> 00:02:51,810
You can convert it to an Excel file,

61
00:02:51,810 --> 00:02:54,060
you can convert it to HTML.

62
00:02:54,060 --> 00:02:57,060
You can also convert it to a dictionary.

63
00:02:57,060 --> 00:02:57,990
So if we click on this,

64
00:02:57,990 --> 00:03:00,810
this takes us to the actual documentation

65
00:03:00,810 --> 00:03:03,720
on how you would use this method.

66
00:03:03,720 --> 00:03:06,930
And if you look at the basic Python documentation

67
00:03:06,930 --> 00:03:08,640
versus the pandas documentation,

68
00:03:08,640 --> 00:03:10,950
you'll see why this is so much better.

69
00:03:10,950 --> 00:03:14,730
It lists out all of the possible parameters.

70
00:03:14,730 --> 00:03:16,950
It gives you examples on how you can use

71
00:03:16,950 --> 00:03:20,790
each of the methods, and it's also got some related methods

72
00:03:20,790 --> 00:03:23,440
that it thinks that you might want to take a look at.

73
00:03:24,360 --> 00:03:26,850
So, let's use this method.

74
00:03:26,850 --> 00:03:30,060
And what we're gonna do is, I'm gonna get my data,

75
00:03:30,060 --> 00:03:32,880
and I'm going to call to_dict,

76
00:03:32,880 --> 00:03:35,370
which is gonna convert it into a dictionary.

77
00:03:35,370 --> 00:03:37,720
So, let's call that data_dict = data.to_dict().

78
00:03:39,150 --> 00:03:42,810
And then if we print out our new data dictionary,

79
00:03:42,810 --> 00:03:46,440
you can see that pandas has taken our table,

80
00:03:46,440 --> 00:03:48,990
and taken each column of the table

81
00:03:48,990 --> 00:03:53,640
to create a separate dictionary for each of the columns.

82
00:03:53,640 --> 00:03:56,850
So, we've got day, temperature, and condition,

83
00:03:56,850 --> 00:03:58,890
and we can now work with this

84
00:03:58,890 --> 00:04:00,963
as if it were a real dictionary.
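A minimal sketch of that conversion, assuming the same kind of stand-in table (the values are illustrative, not the lesson's real file). With no arguments, to_dict gives one inner dictionary per column, keyed by the row index.

```python
import pandas as pd

# Illustrative stand-in for the weather table (shortened to three rows).
data = pd.DataFrame({
    "day": ["Monday", "Tuesday", "Wednesday"],
    "temp": [12, 14, 15],
    "condition": ["Sunny", "Rain", "Rain"],
})

# Default orientation: {column name: {row index: value}}.
data_dict = data.to_dict()
print(data_dict)

# From here on it behaves like any nested Python dictionary.
print(data_dict["temp"][0])
```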

85
00:04:01,920 --> 00:04:06,480
Now, if we take a look at the Series data type,

86
00:04:06,480 --> 00:04:09,300
and you take a look at the conversion section

87
00:04:09,300 --> 00:04:11,760
for this type of data,

88
00:04:11,760 --> 00:04:14,010
then you can see that you can actually convert

89
00:04:14,010 --> 00:04:17,130
each of the Series to a list, if you wanna be able

90
00:04:17,130 --> 00:04:20,343
to work with it just as you would with any other list.

91
00:04:21,360 --> 00:04:24,150
So, we saw that we can get our data,

92
00:04:24,150 --> 00:04:27,180
and then get the temperature column.

93
00:04:27,180 --> 00:04:30,540
And this, when we printed out the type, was a Series.

94
00:04:30,540 --> 00:04:33,687
So, we can get the Series, and then call .to_list().

95
00:04:34,980 --> 00:04:39,210
And this will turn this data Series into a Python list.

96
00:04:39,210 --> 00:04:40,710
So, let's call that temp_list.

97
00:04:42,000 --> 00:04:44,013
And let's go ahead and print it out.

98
00:04:46,890 --> 00:04:49,710
This would be a list of all the temperatures,

99
00:04:49,710 --> 00:04:54,540
and this is now converted into the raw Python data type.

100
00:04:54,540 --> 00:04:55,950
So, we can do all of the things

101
00:04:55,950 --> 00:04:57,960
that we can do to a Python list,

102
00:04:57,960 --> 00:05:00,423
like for example, we could check its length.
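As a quick sketch of that step, again assuming illustrative stand-in temperatures rather than the real CSV:

```python
import pandas as pd

# Illustrative stand-in for the temperature column of the weather table.
data = pd.DataFrame({"temp": [12, 14, 15, 14, 21, 22, 24]})

# Series.to_list() hands back a plain Python list.
temp_list = data["temp"].to_list()
print(temp_list)

# Anything that works on a list now works here, for example len().
print(len(temp_list))
```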

103
00:05:01,260 --> 00:05:03,870
So, here comes another challenge for you.

104
00:05:03,870 --> 00:05:06,450
Given what we've spoken about so far,

105
00:05:06,450 --> 00:05:08,550
can you figure out if you can work out

106
00:05:08,550 --> 00:05:12,720
the average temperature in our column of temperatures?

107
00:05:12,720 --> 00:05:15,450
So remember, you can always Google if you don't remember

108
00:05:15,450 --> 00:05:18,090
how to calculate the average of something.

109
00:05:18,090 --> 00:05:20,193
So, pause the video and give that a go.

110
00:05:22,410 --> 00:05:23,940
All right, so we know that we can get

111
00:05:23,940 --> 00:05:28,740
a list of all of the temperatures in that column,

112
00:05:28,740 --> 00:05:32,640
and we know that Python has a built-in function called sum.

113
00:05:32,640 --> 00:05:35,790
So, then we can get the sum of all of the temperatures

114
00:05:35,790 --> 00:05:37,500
in our list of temperatures,

115
00:05:37,500 --> 00:05:42,213
and then we can divide it by the length of the temp list.

116
00:05:43,110 --> 00:05:44,970
The sum of all the values divided

117
00:05:44,970 --> 00:05:48,240
by the number of values gives us the average,

118
00:05:48,240 --> 00:05:50,070
which is also known as the mean.

119
00:05:50,070 --> 00:05:53,760
And so if we print this out, then you can see,

120
00:05:53,760 --> 00:05:58,760
the average temperature of the week was 17.4.

121
00:05:58,890 --> 00:06:02,130
Now, an alternative way of solving this challenge is,

122
00:06:02,130 --> 00:06:06,660
maybe you took a look through this list of methods,

123
00:06:06,660 --> 00:06:09,450
and you might have come across some of the computations

124
00:06:09,450 --> 00:06:14,340
and statistics that you can do with your Series in pandas.

125
00:06:14,340 --> 00:06:16,740
Now, one of those methods is the mean.

126
00:06:16,740 --> 00:06:19,740
So, you can actually get rid of all of this excess work,

127
00:06:19,740 --> 00:06:21,900
and take our data Series,

128
00:06:21,900 --> 00:06:24,210
which is basically the column of data

129
00:06:24,210 --> 00:06:29,210
under the heading temp, and simply call .mean() on it.

130
00:06:29,520 --> 00:06:33,060
And now if I print this out, you'll see

131
00:06:33,060 --> 00:06:35,640
it's the same result as before

132
00:06:35,640 --> 00:06:37,773
without a lot of the extra work.
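Both routes to the average can be sketched side by side. The temperatures below are illustrative stand-ins, chosen only so that the result lands near the 17.4 mentioned in the lesson.

```python
import pandas as pd

# Illustrative stand-in temperatures, not the lesson's real file.
data = pd.DataFrame({"temp": [12, 14, 15, 14, 21, 22, 24]})

# Route 1: convert to a list, then sum the values and divide by the count.
temp_list = data["temp"].to_list()
manual_average = sum(temp_list) / len(temp_list)

# Route 2: let the Series compute it.
series_average = data["temp"].mean()

print(manual_average)
print(series_average)
```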

133
00:06:38,640 --> 00:06:40,830
In addition to the mean, you can get other things

134
00:06:40,830 --> 00:06:43,170
like the median or the mode,

135
00:06:43,170 --> 00:06:46,050
and a whole bunch of other things just by calling

136
00:06:46,050 --> 00:06:49,200
the right method on the data Series.

137
00:06:49,200 --> 00:06:51,120
So, here's the challenge for you.

138
00:06:51,120 --> 00:06:54,060
I want you to get hold of the maximum value

139
00:06:54,060 --> 00:06:56,190
from this column of temperatures

140
00:06:56,190 --> 00:06:59,130
by using one of the data Series methods.

141
00:06:59,130 --> 00:07:01,860
Pause the video, have a look at the documentation,

142
00:07:01,860 --> 00:07:04,010
and see if you can complete this challenge.

143
00:07:06,120 --> 00:07:08,220
All right, to get the maximum value,

144
00:07:08,220 --> 00:07:11,160
we're probably gonna need this max method.

145
00:07:11,160 --> 00:07:14,670
So, we call this method in the same way as we did before,

146
00:07:14,670 --> 00:07:17,610
which is get hold of the data Series.

147
00:07:17,610 --> 00:07:20,610
So, our entire table is stored in data,

148
00:07:20,610 --> 00:07:22,650
and then we can get the column

149
00:07:22,650 --> 00:07:24,810
under the heading temp.

150
00:07:24,810 --> 00:07:27,480
So, this is now a data Series,

151
00:07:27,480 --> 00:07:30,330
and then we can call that method max on it.

152
00:07:30,330 --> 00:07:31,740
And if we print it out,

153
00:07:31,740 --> 00:07:35,580
then you can see what we're getting is 24.

154
00:07:35,580 --> 00:07:38,193
So, the highest temperature was 24.
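A minimal sketch of that call, assuming illustrative stand-in temperatures whose highest value is 24, as in the lesson:

```python
import pandas as pd

# Illustrative stand-in temperatures, not the lesson's real file.
data = pd.DataFrame({"temp": [12, 14, 15, 14, 21, 22, 24]})

# Select the column (a Series), then call max() on it.
max_temp = data["temp"].max()
print(max_temp)
```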

155
00:07:39,780 --> 00:07:42,120
So, you've seen that when we're working with pandas,

156
00:07:42,120 --> 00:07:46,620
it's really easy to get hold of data in a particular column.

157
00:07:46,620 --> 00:07:48,930
All we have to do is just take the DataFrame,

158
00:07:48,930 --> 00:07:50,700
use a set of square brackets,

159
00:07:50,700 --> 00:07:53,880
and then specify the name of the column,

160
00:07:53,880 --> 00:07:58,680
which it takes by default from the first row of the data.

161
00:07:58,680 --> 00:08:02,490
So, the day column, the temp, the condition.

162
00:08:02,490 --> 00:08:05,760
So if I want to get hold of all of the conditions,

163
00:08:05,760 --> 00:08:08,520
then I would say data["condition"].

164
00:08:08,520 --> 00:08:11,073
And if I go ahead and print this out,

165
00:08:12,090 --> 00:08:15,330
you can see it gets hold of all of the weather conditions,

166
00:08:15,330 --> 00:08:18,570
and selects that column to print out.

167
00:08:18,570 --> 00:08:21,360
Now, an alternative way to using

168
00:08:21,360 --> 00:08:24,750
the square bracket notation, where you have to be careful

169
00:08:24,750 --> 00:08:26,310
about the string you use here,

170
00:08:26,310 --> 00:08:30,480
it has to match the name of the column exactly.

171
00:08:30,480 --> 00:08:32,760
Another way that you can work with the columns

172
00:08:32,760 --> 00:08:37,760
is simply by calling data.condition.

173
00:08:37,980 --> 00:08:41,010
And the fact that this code is valid at all means that

174
00:08:41,010 --> 00:08:45,930
pandas, behind the scenes, has taken each of these columns

175
00:08:45,930 --> 00:08:47,100
and each of the headings,

176
00:08:47,100 --> 00:08:50,370
and converted those headings into attributes.

177
00:08:50,370 --> 00:08:53,700
So, we can say data.condition or data.day.

178
00:08:53,700 --> 00:08:55,710
And if I print that out, you can see

179
00:08:55,710 --> 00:08:57,210
it's actually gonna be

180
00:08:57,210 --> 00:09:01,293
exactly the same as doing it like this.

181
00:09:02,220 --> 00:09:05,040
So, it's up to you which method you wanna use

182
00:09:05,040 --> 00:09:07,230
to select the columns.

183
00:09:07,230 --> 00:09:11,580
But be aware that if your column name has a capital C,

184
00:09:11,580 --> 00:09:16,580
for example here, then your key has to be a capital C.

185
00:09:16,650 --> 00:09:19,593
And also, your attribute has to be a capital C.

186
00:09:20,460 --> 00:09:23,580
So effectively, when you're using a DataFrame like this,

187
00:09:23,580 --> 00:09:26,550
it's almost like you're treating it as a dictionary.

188
00:09:26,550 --> 00:09:29,580
And you're pulling out each column by the key.

189
00:09:29,580 --> 00:09:31,890
Now, when you are using the DataFrame like this,

190
00:09:31,890 --> 00:09:34,020
then you're kind of treating it more like an object.

191
00:09:34,020 --> 00:09:36,480
You're saying data.attribute,

192
00:09:36,480 --> 00:09:39,660
and you get hold of the data in that column.
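The two ways of selecting a column can be compared directly. This sketch assumes a shortened stand-in table with illustrative values. Note that attribute access only works when the column name is a valid Python identifier and does not clash with an existing DataFrame attribute.

```python
import pandas as pd

# Illustrative stand-in for the weather table (shortened to three rows).
data = pd.DataFrame({
    "day": ["Monday", "Tuesday", "Wednesday"],
    "temp": [12, 14, 15],
    "condition": ["Sunny", "Rain", "Rain"],
})

# Dictionary-style: the key must match the column name exactly, including case.
by_key = data["condition"]

# Object-style: the column name is used as an attribute.
by_attribute = data.condition

# Both give back the same Series.
print(by_key.equals(by_attribute))
```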

193
00:09:39,660 --> 00:09:42,690
So, I'm gonna restore everything to lowercase

194
00:09:42,690 --> 00:09:45,600
because I find it easier to read the code.

195
00:09:45,600 --> 00:09:48,330
But the next thing I wanna show you is a little bit harder,

196
00:09:48,330 --> 00:09:50,010
which is how you get data

197
00:09:50,010 --> 00:09:53,190
that is in the rows of our DataFrame.

198
00:09:53,190 --> 00:09:57,060
If I wanted to get hold of the entire row of data

199
00:09:57,060 --> 00:10:00,990
for where the day is equal to Monday,

200
00:10:00,990 --> 00:10:03,870
then the way that I would do that in pandas, is firstly,

201
00:10:03,870 --> 00:10:06,360
get a hold of my entire data table,

202
00:10:06,360 --> 00:10:10,140
and then, inside that data table, get hold of the column

203
00:10:10,140 --> 00:10:12,090
that I want to search through.

204
00:10:12,090 --> 00:10:14,820
So, I'm going to search through the day column,

205
00:10:14,820 --> 00:10:19,530
so I can use data.day or data["day"],

206
00:10:19,530 --> 00:10:21,630
both will work the same.

207
00:10:21,630 --> 00:10:25,170
But once I've got the column, then I can say, well,

208
00:10:25,170 --> 00:10:28,140
where inside that column, I wanna check

209
00:10:28,140 --> 00:10:33,140
for the row where the value is equal to Monday.

210
00:10:33,630 --> 00:10:37,200
This is basically gonna return my row that I want.

211
00:10:37,200 --> 00:10:39,960
So, I'm gonna print this out, and I'm going to

212
00:10:39,960 --> 00:10:43,830
comment out all of the previous code other than the place

213
00:10:43,830 --> 00:10:45,870
where we created our DataFrame.

214
00:10:45,870 --> 00:10:48,000
And then, I'm gonna run my code.

215
00:10:48,000 --> 00:10:51,900
And you can see it's pulled out that correct row

216
00:10:51,900 --> 00:10:54,630
where the day is equal to Monday.

217
00:10:54,630 --> 00:10:57,843
And it's given me all of the rest of the data for that row.

218
00:10:59,040 --> 00:11:00,720
So, here's a challenge for you.

219
00:11:00,720 --> 00:11:03,990
Can you figure out how to pull out the row of data

220
00:11:03,990 --> 00:11:07,590
from our weather data where the temperature

221
00:11:07,590 --> 00:11:09,240
was at the maximum?

222
00:11:09,240 --> 00:11:11,940
So, which row of data had the highest temperature

223
00:11:11,940 --> 00:11:12,960
in the week?

224
00:11:12,960 --> 00:11:14,763
Pause the video and give that a go.

225
00:11:16,020 --> 00:11:19,020
So, we know that we can get the maximum temperature

226
00:11:19,020 --> 00:11:22,950
in the temperature column just by using this code.

227
00:11:22,950 --> 00:11:25,080
Now, you can either use this method

228
00:11:25,080 --> 00:11:27,450
where you say data["temp"].

229
00:11:27,450 --> 00:11:29,760
Or, you can use the attribute .temp,

230
00:11:29,760 --> 00:11:31,260
which is the code that I prefer.

231
00:11:31,260 --> 00:11:34,440
I don't like writing a lot of strings if I can avoid it.

232
00:11:34,440 --> 00:11:38,940
In this case, we're checking to see which row inside

233
00:11:38,940 --> 00:11:42,060
our column of temperatures

234
00:11:42,060 --> 00:11:45,450
is equal to the maximum temperature.

235
00:11:45,450 --> 00:11:50,450
We would say data.temp == data.temp.max().

236
00:11:51,930 --> 00:11:54,420
And then, we're going to get our data,

237
00:11:54,420 --> 00:11:58,350
and access the row that fits that criteria.

238
00:11:58,350 --> 00:12:01,470
Now if I print out this row,

239
00:12:01,470 --> 00:12:04,140
then you can see it was the row for Sunday,

240
00:12:04,140 --> 00:12:08,100
where the temperature was 24 and the condition was sunny.

241
00:12:08,100 --> 00:12:10,860
Essentially, when we get our DataFrame

242
00:12:10,860 --> 00:12:12,960
and then we use some square brackets,

243
00:12:12,960 --> 00:12:15,420
and inside those square brackets, if we only put

244
00:12:15,420 --> 00:12:18,420
the name of our column, day, temp, or condition,

245
00:12:18,420 --> 00:12:21,000
then we would get the entire column.

246
00:12:21,000 --> 00:12:24,900
But if we filter that column by a condition,

247
00:12:24,900 --> 00:12:29,130
say when a particular column is equal to a particular value,

248
00:12:29,130 --> 00:12:32,610
then we actually get hold of the row instead.
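Here's a sketch of both row lookups, assuming illustrative stand-in values (only Monday at 12 and Sunday at 24 and sunny are taken from the lesson). The comparison inside the square brackets produces a Series of True and False values, and the DataFrame keeps only the rows marked True.

```python
import pandas as pd

# Illustrative stand-in for the weather table, not the lesson's real file.
data = pd.DataFrame({
    "day": ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"],
    "temp": [12, 14, 15, 14, 21, 22, 24],
    "condition": ["Sunny", "Rain", "Rain", "Cloudy",
                  "Sunny", "Sunny", "Sunny"],
})

# Row where the day column equals "Monday".
monday = data[data.day == "Monday"]
print(monday)

# Row where the temperature equals the column's maximum.
# max() needs its parentheses; without them you compare against the method itself.
hottest = data[data.temp == data.temp.max()]
print(hottest)
```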

249
00:12:32,610 --> 00:12:35,220
Now, once you've gotten hold of the data in the row,

250
00:12:35,220 --> 00:12:37,860
you can actually go one step further.

251
00:12:37,860 --> 00:12:41,280
Because we know that the row contains lots of data, right?

252
00:12:41,280 --> 00:12:45,300
What if we wanted that particular row's temperature

253
00:12:45,300 --> 00:12:47,880
or that particular row's condition?

254
00:12:47,880 --> 00:12:49,770
Well, let's say that we create a variable

255
00:12:49,770 --> 00:12:53,130
called monday, which is equal to our DataFrame,

256
00:12:53,130 --> 00:12:55,170
and then searching through that DataFrame,

257
00:12:55,170 --> 00:13:00,170
where the [data.day == "Monday"].

258
00:13:00,480 --> 00:13:03,330
So, now with this row, monday,

259
00:13:03,330 --> 00:13:07,860
we can then tap into the values under different columns

260
00:13:07,860 --> 00:13:10,680
by using the same kind of way that we got data

261
00:13:10,680 --> 00:13:13,140
in the entire column over here.

262
00:13:13,140 --> 00:13:16,350
So, we can say monday.condition.

263
00:13:16,350 --> 00:13:21,350
And if I print this out and just comment these other bits,

264
00:13:23,910 --> 00:13:26,970
then you can see that I get the actual condition

265
00:13:26,970 --> 00:13:30,510
for that particular day, which happens to be sunny.

266
00:13:30,510 --> 00:13:31,860
Now, here's a challenge.

267
00:13:31,860 --> 00:13:35,220
I want you to get Monday's temperature,

268
00:13:35,220 --> 00:13:38,010
but because my temperatures are in Celsius,

269
00:13:38,010 --> 00:13:40,650
I want you to convert it into Fahrenheit.

270
00:13:40,650 --> 00:13:42,060
So, pause the video,

271
00:13:42,060 --> 00:13:44,210
and see if you can complete that challenge.

272
00:13:46,560 --> 00:13:50,130
All right, we know we can get hold of Monday's temperature

273
00:13:50,130 --> 00:13:55,130
by saying monday., and then the name of that column

274
00:13:55,320 --> 00:14:00,320
in the row, which happens to be temp, T-E-M-P.

275
00:14:00,690 --> 00:14:02,670
And then, we get the first value in the Series

276
00:14:02,670 --> 00:14:03,783
at index zero.

277
00:14:05,160 --> 00:14:08,250
And then if we wanna convert Celsius to Fahrenheit,

278
00:14:08,250 --> 00:14:12,060
all we have to do is to multiply the Celsius

279
00:14:12,060 --> 00:14:15,990
by 9 over 5, and then add 32.

280
00:14:15,990 --> 00:14:18,360
So, it'll be Monday temperature

281
00:14:18,360 --> 00:14:21,930
multiplied by 9 divided by 5,

282
00:14:21,930 --> 00:14:24,363
and then add 32.

283
00:14:31,200 --> 00:14:35,103
So, monday_temp_F, and then we can print this value out.

284
00:14:37,470 --> 00:14:41,130
So now when I run it, we get 53.6.

285
00:14:41,130 --> 00:14:43,650
And if I put Monday's temperature, 12,

286
00:14:43,650 --> 00:14:48,320
into this Google converter, I get the same value, 53.6.
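The conversion can be sketched like this, assuming a shortened stand-in table with Monday at 12 degrees as in the lesson. This sketch uses .iloc[0] to take the first value by position, which is a substitution for the plain index zero mentioned above. The two agree here because Monday is the first row.

```python
import pandas as pd

# Illustrative stand-in for the weather table (shortened to two rows).
data = pd.DataFrame({
    "day": ["Monday", "Tuesday"],
    "temp": [12, 14],
    "condition": ["Sunny", "Rain"],
})

# Filter down to Monday's row, then take the single temperature value.
monday = data[data.day == "Monday"]
monday_temp = monday.temp.iloc[0]

# Celsius to Fahrenheit: multiply by 9/5, then add 32.
monday_temp_f = monday_temp * 9 / 5 + 32
print(monday_temp_f)
```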

287
00:14:49,440 --> 00:14:53,100
Now, the final thing I wanna show you is how you create

288
00:14:53,100 --> 00:14:55,893
a DataFrame from scratch.

289
00:14:56,820 --> 00:14:59,400
So, in our case, we created our DataFrame

290
00:14:59,400 --> 00:15:02,040
by reading from our CSV file.

291
00:15:02,040 --> 00:15:04,620
But what if you wanted to create a DataFrame

292
00:15:04,620 --> 00:15:08,310
just from some data that you're generating in Python?

293
00:15:08,310 --> 00:15:11,010
Let's say that I have this dictionary of values,

294
00:15:11,010 --> 00:15:13,800
I've got some students, and these are their names

295
00:15:13,800 --> 00:15:17,490
held in a list, and then each of them has a score.

296
00:15:17,490 --> 00:15:19,590
And the scores correspond to the students,

297
00:15:19,590 --> 00:15:23,250
so 76 is Amy's score, James scored 56,

298
00:15:23,250 --> 00:15:25,260
and Angela scored 65.

299
00:15:25,260 --> 00:15:29,760
Now, how would we create a DataFrame from this dictionary?

300
00:15:29,760 --> 00:15:31,080
It's really simple.

301
00:15:31,080 --> 00:15:33,360
We call our pandas library,

302
00:15:33,360 --> 00:15:36,300
and we get hold of the DataFrame class.

303
00:15:36,300 --> 00:15:39,840
And then, we initialize that class with some data.

304
00:15:39,840 --> 00:15:42,450
And the data, in our case, is just going to be

305
00:15:42,450 --> 00:15:44,580
our data dictionary.

306
00:15:44,580 --> 00:15:48,630
And now, if I go ahead and save this as our data,

307
00:15:48,630 --> 00:15:51,090
so I'm gonna comment out what we had previously,

308
00:15:51,090 --> 00:15:53,403
and then print out this data,

309
00:15:54,390 --> 00:15:56,910
then you can see, I've now created a table

310
00:15:56,910 --> 00:15:59,460
using the values from that dictionary.

311
00:15:59,460 --> 00:16:02,220
Now, we can go even further than this.

312
00:16:02,220 --> 00:16:05,310
When we've created a DataFrame, we can actually get

313
00:16:05,310 --> 00:16:10,050
that DataFrame to be converted to a CSV file.

314
00:16:10,050 --> 00:16:15,050
And this to_csv method takes only one required input,

315
00:16:15,330 --> 00:16:18,480
which is the path where you wanna save this file.

316
00:16:18,480 --> 00:16:21,420
So, let's just create a new file,

317
00:16:21,420 --> 00:16:23,120
which we'll call ("new_data.csv").

318
00:16:25,500 --> 00:16:28,920
And when I run this code, then watch over here,

319
00:16:28,920 --> 00:16:32,820
you see a new CSV file being created from thin air,

320
00:16:32,820 --> 00:16:36,573
and all of our data has been added to that CSV file.
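A minimal sketch of building the DataFrame and writing it out. The dictionary keys "students" and "scores" are assumed names for illustration. The sketch also writes into a temporary folder instead of the project folder, so that running it doesn't leave a stray file behind.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Keys are assumed names; the values are the students and scores from the lesson.
data_dict = {
    "students": ["Amy", "James", "Angela"],
    "scores": [76, 56, 65],
}

# Each key becomes a column heading, each list becomes that column's values.
data = pd.DataFrame(data_dict)
print(data)

# Write the table to a CSV file (temporary folder used here as a stand-in).
out_path = Path(tempfile.mkdtemp()) / "new_data.csv"
data.to_csv(out_path)

# Read it back to confirm the round trip; the first CSV column is the row index.
round_trip = pd.read_csv(out_path, index_col=0)
print(round_trip)
```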

321
00:16:37,890 --> 00:16:39,810
So, we're just getting a glimpse

322
00:16:39,810 --> 00:16:43,770
into how powerful this pandas library can be.

323
00:16:43,770 --> 00:16:47,880
And we're only really using it to read CSV data

324
00:16:47,880 --> 00:16:49,920
and write CSV data.

325
00:16:49,920 --> 00:16:51,690
Which is a common format

326
00:16:51,690 --> 00:16:55,110
that you'll see being manipulated using Python.

327
00:16:55,110 --> 00:16:56,190
In later lessons,

328
00:16:56,190 --> 00:16:58,620
we're gonna dive deeper into data analysis,

329
00:16:58,620 --> 00:17:00,630
and we're gonna be looking not only at pandas,

330
00:17:00,630 --> 00:17:03,230
but we're also gonna be looking at NumPy,

331
00:17:03,230 --> 00:17:06,839
Matplotlib, and other libraries that make it easier for us

332
00:17:06,839 --> 00:17:09,839
to work with large chunks of data.

333
00:17:09,839 --> 00:17:10,829
In the next lesson,

334
00:17:10,829 --> 00:17:12,960
we're gonna be putting what we've learned to use

335
00:17:12,960 --> 00:17:17,099
by analyzing some squirrel data from Central Park.

336
00:17:17,099 --> 00:17:20,163
So for all of that and more, I'll see you in the next lesson.

