Skip to content

Concerts Problem


Given a dataset of concerts, count the number of concerts per (artist, venue), per yearmonth. Make the resulting table be a wide table - one row per yearmonth with a column for each unique (artist, venue) pair. Use the cross product of the artists and venues Series to determine which (artist, venue) pairs to include in the result.

import numpy as np
import pandas as pd

generator = np.random.default_rng(1357)
artists = ['Mouse Rat', 'Binary Brothers', 'Lady Gigabyte', 'The Rolling Sums']
venues = ['Super Dome', 'STAPLES Center', 'Madison Square Garden']

concerts = pd.DataFrame({
    'artist':generator.choice(['Mouse Rat', 'Binary Brothers', 'Lady Gigabyte', 'Just-in-Timeberlake'], size=200, replace=True),
    'venue':generator.choice(venues + ['Red Rocks'], size=200, replace=True),
    'date':pd.to_datetime('2020-01-01') + pd.to_timedelta(generator.choice(365, size=200, replace=True), unit='D'),
}).drop_duplicates().convert_dtypes()

print(concerts)
#                   artist                  venue       date
# 0        Binary Brothers         STAPLES Center 2020-04-10
# 1        Binary Brothers         STAPLES Center 2020-07-31
# 2              Mouse Rat  Madison Square Garden 2020-06-21
# 3              Mouse Rat              Red Rocks 2020-04-17
# 4        Binary Brothers              Red Rocks 2020-10-05
# ..                   ...                    ...        ...
# 195        Lady Gigabyte              Red Rocks 2020-08-10
# 196            Mouse Rat  Madison Square Garden 2020-06-30
# 197      Binary Brothers             Super Dome 2020-05-02
# 198  Just-in-Timeberlake  Madison Square Garden 2020-06-21
# 199      Binary Brothers  Madison Square Garden 2020-04-14
# 
# [200 rows x 3 columns]

Try with Google Colab