You (aka Dr Semmelweis) are working at Vienna General Hospital. Let's take a closer look at the data you've been collecting on the number of births and maternal deaths throughout the 1840s.


Challenge 1: Preliminary Data Exploration


Solution to Challenge 1

Using .shape, .head() and .tail(), we see that the dataset covers the years 1841 to 1849. The two tables report the total number of births and the total number of deaths. Interestingly, the yearly data breaks the number of births and deaths down by clinic.

We see that there are no NaN values in either of the DataFrames. We can verify this with either .info() or .isna().values.any().

There are also no duplicate entries. In other words, the dataset appears to be clean.
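The checks above can be bundled into one small helper. This is a minimal sketch, not the notebook's own code: the function name quick_health_check is hypothetical, and the toy DataFrame (including its year and clinic column names) is a made-up stand-in for df_yearly with illustrative numbers only. The pandas calls themselves (.shape, .isna().values.any(), .duplicated()) are the ones the text relies on.

```python
import pandas as pd


def quick_health_check(df: pd.DataFrame) -> dict:
    """Hypothetical helper: run the basic shape / NaN / duplicate checks at once."""
    return {
        "shape": df.shape,                            # (rows, columns)
        "has_nan": bool(df.isna().values.any()),      # True if any cell is missing
        "n_duplicates": int(df.duplicated().sum()),   # rows identical to an earlier row
    }


# Toy stand-in for df_yearly: column names beyond births/deaths and all values are assumptions
toy_yearly = pd.DataFrame({
    "year": [1841, 1841, 1842],
    "births": [3000, 2500, 3200],
    "deaths": [200, 80, 500],
    "clinic": ["clinic 1", "clinic 2", "clinic 1"],
})

report = quick_health_check(toy_yearly)
print(report)
```

On the real data you would call quick_health_check(df_yearly) and quick_health_check(df_monthly) instead.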

Using .describe() allows us to view some interesting statistics at a glance. We see that on average there were about 267 births and 22.47 deaths per month.
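To see where averages like these come from, here is a sketch that runs .describe() on a small, made-up stand-in for df_monthly (the values are illustrative, not the real records) and reads off the "mean" row. The numeric columns are selected explicitly so the date column is left out of the summary.

```python
import pandas as pd

# Toy stand-in for df_monthly (made-up values, not the real dataset)
toy_monthly = pd.DataFrame({
    "date": pd.to_datetime(["1841-01-01", "1841-02-01", "1841-03-01", "1841-04-01"]),
    "births": [250, 270, 260, 280],
    "deaths": [20, 25, 18, 27],
})

# Summary statistics (count, mean, std, min, quartiles, max) for the numeric columns
stats = toy_monthly[["births", "deaths"]].describe()
print(stats)

mean_births = stats.loc["mean", "births"]
mean_deaths = stats.loc["mean", "deaths"]
print(f"Average per month: {mean_births:.0f} births, {mean_deaths:.2f} deaths")
```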

Challenge 2: Percentage of Women Dying in Childbirth

How dangerous was childbirth in the 1840s in Vienna?

In comparison, the United States recorded 18.5 maternal deaths per 100,000 or 0.018% in 2013 (source).
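Rates "per 100,000" and percentages (rates "per 100") differ only by a factor of 1,000. A minimal sketch of the conversion, using the US figure quoted above; the function name is hypothetical. It gives 0.0185%, which the text rounds to 0.018%.

```python
def per_100k_to_percent(rate_per_100k: float) -> float:
    """Hypothetical helper: convert a rate per 100,000 into a percentage (a rate per 100)."""
    return rate_per_100k / 100_000 * 100


us_2013_pct = per_100k_to_percent(18.5)  # US maternal deaths per 100,000 in 2013
print(f"US 2013: {us_2013_pct:.4f}%")
```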



Solution to Challenge 2

Childbirth was very risky! About 7.08% of women giving birth died in the 1840s (compared to 0.018% in the US in 2013).

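# Total deaths divided by total births, across all years and both clinics, as a percentage
prob = df_yearly.deaths.sum() / df_yearly.births.sum() * 100
# ':.3' formats the number to 3 significant figures
print(f'Chances of dying in the 1840s in Vienna: {prob:.3}%')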
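Two details of the calculation above are worth spelling out. First, dividing the summed deaths by the summed births gives a pooled rate, which weights each year by its number of births; averaging the per-row percentages instead can give a different answer. Second, the format spec :.3 means three significant figures, not three decimal places. A sketch with a tiny made-up DataFrame (illustrative numbers, not Semmelweis's records); the helper name is hypothetical.

```python
import pandas as pd


def pooled_death_rate_pct(df: pd.DataFrame) -> float:
    """Hypothetical helper: total deaths divided by total births, as a percentage."""
    return df["deaths"].sum() / df["births"].sum() * 100


# Toy yearly data (made-up numbers)
toy = pd.DataFrame({"births": [1000, 4000], "deaths": [100, 100]})

pooled = pooled_death_rate_pct(toy)                       # 200 / 5000 * 100
averaged = (toy["deaths"] / toy["births"] * 100).mean()   # mean of 10.0 and 2.5
print(f"pooled: {pooled:.3}%, averaged: {averaged:.3}%")

# ':.3' keeps 3 significant figures; ':.3f' keeps 3 decimal places
sig3 = f"{7.0838:.3}"
dec3 = f"{7.0838:.3f}"
print(sig3, dec3)
```

The pooled version matches the question "what fraction of all births ended in the mother's death?", which is why it is the natural choice here.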

If someone gave me a bag of 100 M&Ms and told me that 7 of them would kill me, I'd (probably) pass on those M&Ms. Just saying.


Challenge 3: Visualise the Total Number of Births and Deaths over Time

Create a Matplotlib chart with twin y-axes. It should look something like this:


Solution to Challenge 3

Just as in previous notebooks, we can use .twinx() to create two y-axes. Then it's just a matter of adding a grid with .grid() and configuring the look of our plots with the color, linewidth and linestyle parameters.

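import matplotlib.pyplot as plt

plt.figure(figsize=(14,8), dpi=200)
plt.title('Total Number of Monthly Births and Deaths', fontsize=18)

ax1 = plt.gca()    # left y-axis: births
ax2 = ax1.twinx()  # right y-axis: deaths, sharing the same x-axis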

ax1.grid(color='grey', linestyle='--')

ax1.plot(df_monthly.date, 
         df_monthly.births, 
         color='skyblue', 
         linewidth=3)

ax2.plot(df_monthly.date, 
         df_monthly.deaths, 
         color='crimson', 
         linewidth=2, 
         linestyle='--')

plt.show()
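The same twin-axis chart can also be built with Matplotlib's object-oriented interface (plt.subplots()) instead of plt.gca(). The sketch below assumes a small, made-up stand-in for df_monthly and uses the non-interactive Agg backend so it runs without a display. One design point: each Axes keeps its own legend entries, so the two lines are collected into a single legend by hand.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Toy monthly data (made-up values, for illustration only)
toy = pd.DataFrame({
    "date": pd.date_range("1841-01-01", periods=6, freq="MS"),
    "births": [250, 270, 260, 280, 290, 300],
    "deaths": [20, 25, 18, 27, 10, 8],
})

fig, ax1 = plt.subplots(figsize=(8, 4))
ax2 = ax1.twinx()  # second y-axis sharing ax1's x-axis

ax1.grid(color="grey", linestyle="--")
line1, = ax1.plot(toy["date"], toy["births"], color="skyblue", linewidth=3, label="Births")
line2, = ax2.plot(toy["date"], toy["deaths"], color="crimson", linewidth=2,
                  linestyle="--", label="Deaths")

ax1.set_ylabel("Births", color="skyblue")
ax2.set_ylabel("Deaths", color="crimson")

# Combine the lines from both Axes into one legend
ax1.legend(handles=[line1, line2], loc="upper left")

buf = io.BytesIO()
fig.savefig(buf, format="png")  # render to memory instead of showing a window
```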

To get tick marks showing up on the x-axis, we need Matplotlib's date locators and formatters from the matplotlib.dates module (imported as mdates).

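import matplotlib.dates as mdates

# Create locators for ticks on the time axis
years = mdates.YearLocator()            # major tick at the start of each year
months = mdates.MonthLocator()          # minor tick at the start of each month
years_fmt = mdates.DateFormatter('%Y')  # label major ticks with the 4-digit year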
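To see what these three objects do in isolation, here is a minimal sketch on an empty axis: it sets date limits (an assumed range roughly matching the 1840s data), attaches the locators and formatter, draws the figure, and reads back the major tick labels. It also shows that a DateFormatter can be called directly on a Matplotlib date number.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

years = mdates.YearLocator()            # major tick on 1 January of each year
months = mdates.MonthLocator()          # minor tick on the 1st of each month
years_fmt = mdates.DateFormatter("%Y")  # major tick labels show only the year

fig, ax = plt.subplots()
ax.set_xlim(datetime(1841, 1, 1), datetime(1849, 3, 1))  # datetimes make this a date axis
ax.xaxis.set_major_locator(years)
ax.xaxis.set_major_formatter(years_fmt)
ax.xaxis.set_minor_locator(months)

fig.canvas.draw()  # tick labels are filled in when the figure is drawn
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)

# A formatter turns a Matplotlib date number into a label string
label_1845 = years_fmt(mdates.date2num(datetime(1845, 6, 1)))
print(label_1845)
```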

We can then use the locators in our chart:

plt.figure(figsize=(14,8), dpi=200)
plt.title('Total Number of Monthly Births and Deaths', fontsize=18)
plt.yticks(fontsize=14)
plt.xticks(fontsize=14, rotation=45)

ax1 = plt.gca()
ax2 = ax1.twinx()

ax1.set_ylabel('Births', color='skyblue', fontsize=18)
ax2.set_ylabel('Deaths', color='crimson', fontsize=18)

# Use Locators
ax1.set_xlim([df_monthly.date.min(), df_monthly.date.max()])
ax1.xaxis.set_major_locator(years)
ax1.xaxis.set_major_formatter(years_fmt)
ax1.xaxis.set_minor_locator(months)

ax1.grid(color='grey', linestyle='--')

ax1.plot(df_monthly.date, 
         df_monthly.births, 
         color='skyblue', 
         linewidth=3)

ax2.plot(df_monthly.date, 
         df_monthly.deaths, 
         color='crimson', 
         linewidth=2, 
         linestyle='--')

plt.show()

The chart shows that something happened after 1847: the total number of deaths seems to have dropped, despite an increasing number of births!
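One way to put a number on this drop is to split the monthly data at a cutoff date and compare averages on each side. This is a minimal sketch, not the notebook's method: the helper name compare_before_after, the cutoff value and the toy DataFrame (made-up numbers standing in for df_monthly) are all assumptions. It compares the death rate as well as raw deaths, since births are changing at the same time.

```python
import pandas as pd


def compare_before_after(df: pd.DataFrame, cutoff: str) -> pd.DataFrame:
    """Hypothetical helper: mean births, deaths and death rate (%) before vs. from `cutoff`."""
    # Monthly death rate as a percentage of births
    df = df.assign(pct_deaths=df["deaths"] / df["births"] * 100)
    # Label each month by which side of the cutoff it falls on
    period = (df["date"] >= pd.Timestamp(cutoff)).map({False: "before", True: "after"})
    return df.groupby(period.rename("period"))[["births", "deaths", "pct_deaths"]].mean()


# Toy stand-in for df_monthly (made-up values); the cutoff is an assumption too
toy_monthly = pd.DataFrame({
    "date": pd.date_range("1847-01-01", periods=8, freq="MS"),
    "births": [250, 260, 270, 280, 290, 300, 310, 320],
    "deaths": [25, 26, 27, 28, 6, 6, 6, 6],
})

summary = compare_before_after(toy_monthly, cutoff="1847-05-01")
print(summary)
```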