Potholes Problem


Fed up with your city’s roads, you go around collecting data on potholes in your area. Due to an unfortunate ☕ coffee spill, you lost bits and pieces of your data.

import numpy as np
import pandas as pd

potholes = pd.DataFrame({
    'length':[5.1, np.nan, 6.2, 4.3, 6.0, 5.1, 6.5, 4.3, np.nan, np.nan],
    'width':[2.8, 5.8, 6.5, 6.1, 5.8, np.nan, 6.3, 6.1, 5.4, 5.0],
    'depth':[2.6, np.nan, 4.2, 0.8, 2.6, np.nan, 3.9, 4.8, 4.0, np.nan],
    'location':pd.Series(['center', 'north edge', np.nan, 'center', 'north edge', 'center', 'west edge',
                          'west edge', np.nan, np.nan], dtype='string')
})

print(potholes)
#    length  width  depth    location
# 0     5.1    2.8    2.6      center
# 1     NaN    5.8    NaN  north edge
# 2     6.2    6.5    4.2        <NA>
# 3     4.3    6.1    0.8      center
# 4     6.0    5.8    2.6  north edge
# 5     5.1    NaN    NaN      center
# 6     6.5    6.3    3.9   west edge
# 7     4.3    6.1    4.8   west edge
# 8     NaN    5.4    4.0        <NA>
# 9     NaN    5.0    NaN        <NA>

Given your DataFrame of pothole measurements, discard every row in which more than half the values are NaN. In the remaining rows, impute each NaN with the mean of its column; if the column is non-numeric, use the column's mode instead.
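One possible approach is sketched below; it is not necessarily the intended solution. It rests on two assumptions the problem statement leaves open: the means and modes are computed after the sparse rows are discarded, and ties in the mode are broken by taking the first value that `Series.mode()` returns. The names `kept`, `fill_values`, and `filled` are illustrative only.

```python
import numpy as np
import pandas as pd

# Same data as in the problem statement.
potholes = pd.DataFrame({
    'length': [5.1, np.nan, 6.2, 4.3, 6.0, 5.1, 6.5, 4.3, np.nan, np.nan],
    'width': [2.8, 5.8, 6.5, 6.1, 5.8, np.nan, 6.3, 6.1, 5.4, 5.0],
    'depth': [2.6, np.nan, 4.2, 0.8, 2.6, np.nan, 3.9, 4.8, 4.0, np.nan],
    'location': pd.Series(['center', 'north edge', np.nan, 'center', 'north edge',
                           'center', 'west edge', 'west edge', np.nan, np.nan],
                          dtype='string')
})

# Step 1: keep rows whose fraction of missing values is at most one half.
# isna().mean(axis=1) gives the share of missing entries in each row.
kept = potholes.loc[potholes.isna().mean(axis=1) <= 0.5]

# Step 2: choose one fill value per column (assumption: statistics are
# taken over the kept rows only, not the original frame).
fill_values = {}
for col in kept.columns:
    if pd.api.types.is_numeric_dtype(kept[col]):
        fill_values[col] = kept[col].mean()          # mean() skips NaN by default
    else:
        fill_values[col] = kept[col].mode().iloc[0]  # assumption: first mode wins ties

# Step 3: fillna accepts a dict mapping column name -> fill value.
filled = kept.fillna(fill_values)
print(filled)
```

With this data only row 9 is discarded, since three of its four values are missing. Rows 1, 5, and 8 are each missing exactly half their values, so they stay. The missing locations are filled with 'center', the most frequent value among the kept rows.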

