How old are the Nobel laureates at the time when they win the prize? Does this vary by category? Also, how has the age of the laureates changed over time?


Challenge 1

Calculate the age of the laureate in the year of the ceremony and add this as a column called winning_age to the df_data DataFrame. Hint: you can use this to help you.


Challenge 2

Who were the oldest and the youngest winners?


Challenge 3


Challenge 4

Are Nobel laureates being nominated later in life than before? Have the ages of laureates at the time of the award increased or decreased over time?


Challenge 5

How does the age of laureates vary by category?


Challenge 6






Solution 1: Calculate the Age at the Time of Award

First, we need to extract the year as a number from the birth_date column:

birth_years = df_data.birth_date.dt.year

Now we can work out the age at the time of the award:

df_data['winning_age'] = df_data.year - birth_years
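As a quick sanity check, the two steps above can be tried on a tiny hand-made table. This is only a sketch: the three rows are typed in by hand rather than read from the course dataset, the full_name column name is an assumption, and the dates start out as text, so they are converted with pd.to_datetime first. Note that subtracting years ignores whether the birthday falls before or after the ceremony, so the result can be off by one.

```python
import pandas as pd

# Hand-typed sample rows (not loaded from the course dataset).
sample = pd.DataFrame({
    'full_name': ['Marie Curie', 'Malala Yousafzai', 'John B. Goodenough'],
    'year': [1903, 2014, 2019],
    'birth_date': ['1867-11-07', '1997-07-12', '1922-07-25'],
})

# The .dt accessor only works on datetime columns, so convert text dates first.
sample['birth_date'] = pd.to_datetime(sample['birth_date'])

# Age in the award year = award year minus birth year.
sample['winning_age'] = sample['year'] - sample['birth_date'].dt.year

print(sample[['full_name', 'year', 'winning_age']])
```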



Solution 2: Oldest and Youngest Winners

display(df_data.nlargest(n=1, columns='winning_age'))
display(df_data.nsmallest(n=1, columns='winning_age'))

John Goodenough was 97 years old when he got the Nobel prize! Holy moly. Interestingly, John was born to American parents while they were in Germany. This is one example where our analysis of countries counts an extra "German" prize even though he is an American citizen. Too bad we don't have a nationality column in our dataset! Nonetheless, this goes to show it is never too late to win a Nobel prize. I'm keeping my fingers crossed for you!
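If you only want the name and age rather than the whole row, .idxmax() and .idxmin() are an alternative to nlargest and nsmallest. The sketch below uses made-up rows and assumes a full_name column; the row with a missing age stands in for any entry that has no birth date, which both methods skip by default.

```python
import numpy as np
import pandas as pd

# Made-up rows for illustration; 'full_name' is an assumed column name.
sample = pd.DataFrame({
    'full_name': ['Laureate A', 'Laureate B', 'Entry without birth date', 'Laureate D'],
    'winning_age': [97, 17, np.nan, 55],
})

# idxmax/idxmin return the index label of the largest/smallest non-missing value.
oldest = sample.loc[sample['winning_age'].idxmax()]
youngest = sample.loc[sample['winning_age'].idxmin()]

print(f"Oldest: {oldest['full_name']} ({oldest['winning_age']:.0f})")
print(f"Youngest: {youngest['full_name']} ({youngest['winning_age']:.0f})")
```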


Solution 3: Descriptive Statistics and Histogram

Using .describe() is a fantastic way to get a feeling for how the numbers are distributed in a particular column. However, actually visualising them on a histogram to see their distribution is highly recommended too since it allows us to see if we have a bell-shaped curve or something else.
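For reference, here is roughly what calling .describe() on an age column looks like. The eight ages are invented for the example; on the real data you would call it on df_data.winning_age instead.

```python
import pandas as pd

# Invented ages, standing in for df_data.winning_age.
ages = pd.Series([36, 17, 97, 55, 61, 48, 70, 66], name='winning_age')

# describe() returns count, mean, std, min, the quartiles (25%, 50%, 75%) and max.
summary = ages.describe()
print(summary)

# A mean well away from the median (the 50% row) hints at a skewed distribution.
print(summary['mean'] - summary['50%'])
```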

Here's what the histogram looks like:

plt.figure(figsize=(8, 4), dpi=200)
sns.histplot(data=df_data,
             x='winning_age',
             bins=30)
plt.xlabel('Age')
plt.title('Distribution of Age on Receipt of Prize')
plt.show()


Solution 4: Winning Age Over Time (All Categories)

The histogram above shows us the distribution across the entire dataset, over the entire time period. But perhaps the age has changed over time.

plt.figure(figsize=(8,4), dpi=200)
with sns.axes_style("whitegrid"):
    sns.regplot(data=df_data,
                x='year',
                y='winning_age',
                lowess=True,
                scatter_kws={'alpha': 0.4},
                line_kws={'color': 'black'})

plt.show()

Setting lowess=True plots a locally weighted regression (seaborn needs the statsmodels package installed for this). Each point on the curve still comes from a straight-line fit, but only to the nearby data, so the result behaves more like a moving average and can take a non-linear shape across the entire series. This is super neat because it clearly shows how the Nobel laureates are getting their award later and later in life. From 1900 to around 1950, the laureates were around 55 years old, but these days they are closer to 70 years old when they get their award!

The other thing that we see in the chart is that in the last 10 years the spread has increased: we've had more very young and very old winners. In the 1950s and 60s, winners were between 30 and 80 years old. Lately, that range has widened.
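To make the "local straight-line fit" idea concrete, here is a minimal NumPy sketch. It is not the code seaborn runs: it uses Gaussian weights and a fixed bandwidth in years (both my own simplifications), whereas LOWESS proper uses a nearest-neighbour window with tricube weights and extra robustness steps. The function name and the made-up series are hypothetical.

```python
import numpy as np

def local_linear_fit(x, y, bandwidth=15.0):
    """For each x, fit a weighted straight line to nearby points and keep its value there."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    fitted = np.empty_like(y)
    for i, x0 in enumerate(x):
        # Gaussian weights: points near x0 count a lot, distant points barely at all.
        kernel = np.exp(-0.5 * ((x - x0) / bandwidth) ** 2)
        # np.polyfit applies w to the unsquared residuals, hence the square root.
        slope, intercept = np.polyfit(x - x0, y, deg=1, w=np.sqrt(kernel))
        # x was centred on x0, so the local line's value at x0 is its intercept.
        fitted[i] = intercept
    return fitted

# Made-up series: flat at 55 until 1950, then rising by 0.2 years of age per year.
years = np.arange(1900, 2021, 5)
ages = np.where(years <= 1950, 55.0, 55.0 + 0.2 * (years - 1950))

smooth = local_linear_fit(years, ages)
print(np.round(smooth, 1))
```

Every individual fit is a straight line, yet the smoothed values follow the bend around 1950, which is why the curve in the chart is not straight.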


Solution 5: Age Differences between Categories

Seaborn allows us to create the above chart by category. But first, let's look at a box plot by category.

plt.figure(figsize=(8,4), dpi=200)
with sns.axes_style("whitegrid"):
    sns.boxplot(data=df_data,
                x='category',
                y='winning_age')

plt.show()

The box plot shows us the median, the quartiles, and the range of the values, with unusually high or low ages drawn as individual points. It raises an interesting question: "Are peace prize winners really older than physics laureates?"
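The numbers behind a box plot can also be computed directly. In the sketch below the ages are invented; the line inside each box corresponds to the 50% column (the median) and the box edges to the 25% and 75% columns. With seaborn's default settings the whiskers reach the furthest points within 1.5 times the interquartile range, and anything beyond is drawn as a separate point.

```python
import pandas as pd

# Invented ages for two categories, standing in for df_data.
sample = pd.DataFrame({
    'category': ['Peace'] * 5 + ['Physics'] * 5,
    'winning_age': [17, 60, 62, 65, 87, 25, 45, 50, 55, 88],
})

# describe() per group gives the quartiles a box plot draws, plus the mean for comparison.
stats = sample.groupby('category')['winning_age'].describe()
print(stats[['25%', '50%', '75%', 'mean']])
```

Comparing the 50% and mean columns shows how a single very young or very old winner pulls the mean while leaving the median alone.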


Solution 6: Laureate Age over Time by Category

To get a more complete picture, we should look at how the age of winners has changed over time. The box plot above looked at the dataset as a whole.

with sns.axes_style('whitegrid'):
    sns.lmplot(data=df_data,
               x='year',
               y='winning_age',
               row='category',
               lowess=True,
               aspect=2,
               scatter_kws={'alpha': 0.6},
               line_kws={'color': 'black'})

plt.show()

We see that winners in physics, chemistry, and medicine have gotten older over time. The ageing trend is strongest for physics. The average age used to be below 50, but now it's over 70. Economics, the newest category, is much more stable in comparison. The peace prize shows the opposite trend where winners are getting younger! As such, our scatter plots showing the best fit lines over time and our box plot of the entire dataset can tell very different stories!
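One way to put a rough number on these trends is to fit an ordinary straight line per category and read off its slope. This is a plain linear fit, not the lowess curve from the charts, and the rows below are invented, so treat it as a sketch of the approach rather than a result.

```python
import numpy as np
import pandas as pd

# Invented rows: one category getting older over time, one getting younger.
sample = pd.DataFrame({
    'category': ['Physics'] * 4 + ['Peace'] * 4,
    'year': [1920, 1950, 1980, 2010] * 2,
    'winning_age': [45, 52, 60, 69, 68, 64, 61, 57],
})

def years_per_decade(group):
    """Slope of a straight-line fit of age on year, scaled to a decade."""
    slope, _ = np.polyfit(group['year'], group['winning_age'], deg=1)
    return slope * 10

# A positive value means winners are getting older; a negative one, younger.
trend = {name: years_per_decade(group)
         for name, group in sample.groupby('category')}

for name, change in trend.items():
    print(f"{name}: {change:+.1f} years of age per decade")
```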

To combine all these charts into the same chart, we can use the hue parameter:

with sns.axes_style("whitegrid"):
    sns.lmplot(data=df_data,
               x='year',
               y='winning_age',
               hue='category',
               lowess=True, 
               aspect=2,
               scatter_kws={'alpha': 0.5},
               line_kws={'linewidth': 5})

plt.show()


Source: smbc-comics.com