Now let's look at how things have changed over time. This will give us a chance to review what we learnt about creating charts with two y-axes in Matplotlib and generating arrays with NumPy.
Are more prizes awarded recently than when the prize was first created? Show the trend in awards visually.
Count the number of prizes awarded every year.
Create a 5-year rolling average of the number of prizes (Hint: see previous lessons analysing Google Trends).
Using Matplotlib, superimpose the rolling average on a scatter plot.
Show a tick mark on the x-axis for every 5 years from 1900 to 2020. (Hint: you'll need to use NumPy).

Use the named colours to draw the data points in dodgerblue while the rolling average is coloured in crimson.

Looking at the chart, did the first and second world wars have an impact on the number of prizes being given out?
What could be the reason for the trend in the chart?
Investigate if more prizes are shared than before.
Calculate the average prize share of the winners on a year-by-year basis.
Calculate the 5-year rolling average of the percentage share.
Copy-paste the cell from the chart you created above.
Modify the code to add a secondary axis to your Matplotlib chart.
Plot the rolling average of the prize share on this chart.
See if you can invert the secondary y-axis to make the relationship even more clear.
Solution 1: Number of Prizes Awarded over Time
First, we have to count the number of Nobel prizes that are awarded each year.
prize_per_year = df_data.groupby(by='year').count().prize
This just involves grouping the data so that we can count the number of entries per year. To calculate the 5-year moving average we use .rolling() and .mean(), like we did with the Google Trends data.
moving_average = prize_per_year.rolling(window=5).mean()
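To see what these two lines produce, here is a minimal sketch on a made-up miniature dataset. The `toy` DataFrame and its values are invented for illustration and are not the real Nobel data; only the `year` and `prize` column names follow the lesson.

```python
import pandas as pd

# Hypothetical stand-in for df_data: one row per laureate (values are made up).
toy = pd.DataFrame({
    "year":  [1901, 1901, 1902, 1903, 1903, 1903, 1904, 1905, 1905, 1906],
    "prize": ["Physics", "Peace", "Physics", "Physics", "Physics", "Peace",
              "Peace", "Physics", "Peace", "Physics"],
})

# Count the rows in each year, exactly as in the lesson.
toy_per_year = toy.groupby(by="year").count().prize

# Average over the current row and the four rows before it.
# The first four results are NaN because the window is not yet full.
toy_moving_average = toy_per_year.rolling(window=5).mean()

print(toy_per_year)
print(toy_moving_average)
```

Note that .rolling(window=5) counts rows, not calendar years. If a year has no entry at all, the window silently stretches over more than five calendar years.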
Now we can create a Matplotlib chart that superimposes the two:
plt.scatter(x=prize_per_year.index,
            y=prize_per_year.values,
            c='dodgerblue',
            alpha=0.7,
            s=100,)
plt.plot(prize_per_year.index,
         moving_average.values,
         c='crimson',
         linewidth=3,)
plt.show()
With the help of a little styling, this chart could look better. To create 5-year tick marks on the x-axis, we generate an array using NumPy:
np.arange(1900, 2021, step=5)
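The stop value is 2021 rather than 2020 because np.arange() leaves out the stop value itself. A quick check, using the illustrative names `ticks` and `ticks_short`:

```python
import numpy as np

# The stop value is exclusive, so 2021 is needed to include 2020.
ticks = np.arange(1900, 2021, step=5)

# With 2020 as the stop value, the last tick would be 2015 instead.
ticks_short = np.arange(1900, 2020, step=5)

print(ticks[0], ticks[-1], len(ticks))
print(ticks_short[-1])
```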
Then we use functions like .figure(), .title(), .xticks(), and .yticks() to fine-tune the chart.
In addition, we will shortly be adding a second y-axis, so we can use an Axes object to draw our scatter and line plots.
plt.figure(figsize=(16,8), dpi=200)
plt.title('Number of Nobel Prizes Awarded per Year', fontsize=18)
plt.yticks(fontsize=14)
plt.xticks(ticks=np.arange(1900, 2021, step=5),
           fontsize=14,
           rotation=45)
ax = plt.gca() # get the current Axes object
ax.set_xlim(1900, 2020)
ax.scatter(x=prize_per_year.index,
           y=prize_per_year.values,
           c='dodgerblue',
           alpha=0.7,
           s=100,)
ax.plot(prize_per_year.index,
        moving_average.values,
        c='crimson',
        linewidth=3,)
plt.show()
Solution 2: The Prize Share of Laureates over Time
Now we can work out the rolling average of the percentage share of the prize. If more prizes are given out, perhaps it is because the prize is split between more people.
yearly_avg_share = df_data.groupby(by='year').agg({'share_pct': pd.Series.mean})
share_moving_average = yearly_avg_share.rolling(window=5).mean()
If more people get the prize, then the average share should go down, right?
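A small sketch of that reasoning, assuming share_pct holds each laureate's fraction of the prize (1.0 for a sole winner, 0.5 for a half share, and so on). The `toy_share` data is made up for illustration.

```python
import pandas as pd

# Hypothetical data: 1901 has a sole winner, 1902 a two-way split,
# and 1903 a three-way split (one half and two quarters).
toy_share = pd.DataFrame({
    "year":      [1901, 1902, 1902, 1903, 1903, 1903],
    "share_pct": [1.0,  0.5,  0.5,  0.5,  0.25, 0.25],
})

# Same pattern as the lesson; the string "mean" is used here in place of
# pd.Series.mean and should give the same result.
toy_avg_share = toy_share.groupby(by="year").agg({"share_pct": "mean"})

print(toy_avg_share)
```

The more laureates share a prize in a given year, the lower that year's average becomes. Note also that .agg() with a dictionary returns a DataFrame with a single share_pct column rather than a Series.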
plt.figure(figsize=(16,8), dpi=200)
plt.title('Number of Nobel Prizes Awarded per Year', fontsize=18)
plt.yticks(fontsize=14)
plt.xticks(ticks=np.arange(1900, 2021, step=5),
           fontsize=14,
           rotation=45)
ax1 = plt.gca()
ax2 = ax1.twinx() # create second y-axis
ax1.set_xlim(1900, 2020)
ax1.scatter(x=prize_per_year.index,
            y=prize_per_year.values,
            c='dodgerblue',
            alpha=0.7,
            s=100,)
ax1.plot(prize_per_year.index,
         moving_average.values,
         c='crimson',
         linewidth=3,)
# Adding prize share plot on second axis
ax2.plot(prize_per_year.index,
         share_moving_average.values,
         c='grey',
         linewidth=3,)
plt.show()
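The twin-axis pattern can be tried in isolation on a few made-up numbers. In this sketch the arrays `years`, `counts` and `shares` are invented, and the Agg backend is selected only so that the code runs without a display. It also flips the secondary axis with .invert_yaxis() to show what that call changes.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up values for illustration only.
years = np.arange(1900, 1910)
counts = np.array([5, 6, 5, 7, 6, 8, 7, 9, 8, 10])
shares = np.array([0.9, 0.85, 0.9, 0.8, 0.8, 0.7, 0.75, 0.6, 0.65, 0.5])

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()  # second y-axis that shares the same x-axis

ax1.plot(years, counts, c="crimson")
ax2.plot(years, shares, c="grey")

# Flip only the secondary axis: low shares now appear at the top.
ax2.invert_yaxis()

print(ax1.yaxis_inverted(), ax2.yaxis_inverted())
print(ax1.get_xlim() == ax2.get_xlim())
```

Each Axes keeps its own y-scale, so the counts and the shares can both fill the chart even though their ranges are very different.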
To see the relationship between the number of prizes and the laureate share even more clearly we can invert the second y-axis.
plt.figure(figsize=(16,8), dpi=200)
plt.title('Number of Nobel Prizes Awarded per Year', fontsize=18)
plt.yticks(fontsize=14)
plt.xticks(ticks=np.arange(1900, 2021, step=5),
           fontsize=14,
           rotation=45)
ax1 = plt.gca()
ax2 = ax1.twinx()
ax1.set_xlim(1900, 2020)
# Invert the secondary y-axis
ax2.invert_yaxis()
ax1.scatter(x=prize_per_year.index,
            y=prize_per_year.values,
            c='dodgerblue',
            alpha=0.7,
            s=100,)
ax1.plot(prize_per_year.index,
         moving_average.values,
         c='crimson',
         linewidth=3,)
ax2.plot(prize_per_year.index,
         share_moving_average.values,
         c='grey',
         linewidth=3,)
plt.show()
What do we see on the chart? There is clearly an upward trend in the number of prizes being given out as more and more prizes are shared. More prizes are also awarded from 1969 onwards because of the addition of the economics category. We also see that very few prizes were awarded during the first and second world wars. Note that there is no zero entry for the years without prizes: grouping by year only produces rows for years that appear in the data, so the effect of the wars shows up as missing blue dots.
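If explicit zeros were wanted for the years without any prizes, one option is to reindex the yearly counts over the full range of years. This is only a sketch: `toy_counts` and its values are made up, and the lesson itself does not take this step.

```python
import numpy as np
import pandas as pd

# Hypothetical yearly counts with a gap from 1940 to 1942 (values are made up).
toy_counts = pd.Series([3, 2, 4],
                       index=pd.Index([1938, 1939, 1943], name="year"))

# Reindex over every year in the range, filling the gaps with 0.
full_range = np.arange(1938, 1944)
with_zeros = toy_counts.reindex(full_range, fill_value=0)

# Years that had no entry in the original counts.
missing_years = with_zeros[with_zeros == 0].index.tolist()

print(with_zeros)
print(missing_years)
```

Filling the gaps this way would also change the rolling average, since each 5-row window would then cover exactly five calendar years.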
