Similar Names


Here's a CSV file with 1,000 distinct U.S. baby 👶 names (all lowercase).

babynames_1000.csv
   1:   aaden
   2: aaliyah
   3:    abby
   4:    abel
   5: abigail
  ---        
 996:  zander
 997:    zane
 998:    zara
 999:    zion
1000:     zoe

How many distinct (A, B) pairs of names have Levenshtein distance ≤ 3?
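
For reference, the Levenshtein distance between two strings is the minimum number of single-character insertions, deletions, and substitutions needed to turn one into the other. The block below is a minimal sketch of the standard dynamic-programming computation, not necessarily the approach the problem intends; the function name `levenshtein` is a hypothetical helper introduced here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insert, delete, substitute; each costs 1)."""
    # Keep the shorter string as b so each stored row is as small as possible.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] = distance between the first i-1 characters of a
    # and the first j characters of b (initially i-1 = 0).
    previous = list(range(len(b) + 1))

    for i, ca in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca -> cb (or match)
            ))
        previous = current

    return previous[-1]


print(levenshtein("aaden", "allen"))  # 2 (substitute a -> l and d -> l)
```

Names in the list are short, so this quadratic-time routine is cheap per pair; a dedicated string-distance library could be swapped in if speed matters.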

Distinct entries

If your result includes (aaden, allen), make sure it doesn't also include (allen, aaden).
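
One way to guarantee that each pair appears only once is to enumerate unordered pairs with `itertools.combinations`. The sketch below uses a tiny hypothetical list, `sample`, rather than the real data; with 1,000 names the same approach yields 1000 × 999 / 2 = 499,500 candidate pairs.

```python
from itertools import combinations

# Hypothetical three-name sample, used only to illustrate the pairing.
sample = ["aaden", "allen", "abby"]

# combinations(..., 2) yields each unordered pair exactly once,
# in the order the names appear in the input.
pairs = list(combinations(sample, 2))

print(len(pairs))                    # 3
print(("aaden", "allen") in pairs)   # True
print(("allen", "aaden") in pairs)   # False
```

Filtering these pairs by a distance threshold then counts every qualifying pair once, with no need to deduplicate afterwards.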

Loading the data

You can load the data directly from GitHub.

Python

import pandas as pd
names = pd.read_csv("https://raw.githubusercontent.com/practiceprobs/datasets/main/babynames/babynames_1000.csv")

R

library(data.table)
names <- fread("https://raw.githubusercontent.com/practiceprobs/datasets/main/babynames/babynames_1000.csv")