Let's turn our attention to the Genres column. It is quite similar to the Category column, but more granular.


Challenge

How many different genres are there? Can an app belong to more than one genre? Check what happens when you use .value_counts() on a column with nested values. See if you can work around this problem by using the string's .split() method and the DataFrame's .stack() method.








Solution: Working with Nested Column Data

If we count the unique values in the Genres column, we get 114. But this number is not accurate, because the column holds nested data: some entries list more than one genre. We can see this by calling .value_counts() and looking at the values that appear only once. There we see that a semicolon (;) separates the genre names.
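To make the problem concrete, here is a minimal sketch on a small, made-up Series (the values below are hypothetical and only stand in for df_apps_clean.Genres). It shows how combined entries are counted as separate values, which inflates the number of unique "genres".

```python
import pandas as pd

# Hypothetical toy data standing in for df_apps_clean.Genres
genres = pd.Series([
    "Action",
    "Action;Action & Adventure",
    "Casual;Action & Adventure",
    "Puzzle",
    "Puzzle;Brain Games",
    "Casual;Brain Games",
    "Casual",
])

# Each distinct string is treated as its own value, nested or not
counts = genres.value_counts()
print(counts)
print(f"Unique raw values: {genres.nunique()}")

# Entries containing a semicolon hold more than one genre
nested = counts[counts.index.str.contains(";")]
print(f"Values with nested genres: {len(nested)}")
```

The toy data contains only five genres (Action, Action & Adventure, Casual, Puzzle, Brain Games), yet seven unique raw values are reported, because every combination counts separately.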

We need to separate the genre names to get a clear picture. This is where the string's .split() method comes in handy. After we've split the genre names on the semicolon, we can combine them all into a single column with .stack() and then use .value_counts().


# Split the strings on the semi-colon and then .stack them.
stack = df_apps_clean.Genres.str.split(';', expand=True).stack()
print(f'We now have a single column with shape: {stack.shape}')
num_genres = stack.value_counts()
print(f'Number of genres: {len(num_genres)}')

This shows us we actually have 53 different genres.
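If the intermediate steps are hard to picture, the following sketch walks through the same split-then-stack idea on a tiny, made-up Series (hypothetical values, not the real dataset). The explicit .dropna() is a defensive assumption: depending on the installed pandas version, .stack() may or may not discard the empty cells on its own.

```python
import pandas as pd

# Hypothetical toy data standing in for df_apps_clean.Genres
genres = pd.Series([
    "Action",
    "Action;Action & Adventure",
    "Casual;Brain Games",
    "Casual",
])

# Step 1: split on the semicolon; expand=True gives one column per part,
# with missing values where an app has only one genre
split = genres.str.split(";", expand=True)
print(split)

# Step 2: stack the columns into one long Series and drop the empty cells
stacked = split.stack().dropna()
print(stacked)

# Step 3: count how often each individual genre occurs
genre_counts = stacked.value_counts()
print(f"Number of genres: {len(genre_counts)}")
```

The stacked Series has one row per app-genre pair, so an app with two genres contributes two rows.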

Challenge

Can you create this chart with the Series containing the genre data?

Try experimenting with the built-in colour scales in Plotly. You can find a full list here.










Solution: Working with Colour Scales in Plotly


bar = px.bar(x=num_genres.index[:15],  # index = genre name
             y=num_genres.values[:15],  # count
             title='Top Genres',
             hover_name=num_genres.index[:15],
             color=num_genres.values[:15],
             color_continuous_scale='Agsunset')

bar.update_layout(xaxis_title='Genre',
                  yaxis_title='Number of Apps',
                  coloraxis_showscale=False)

bar.show()