
Movie Distance Problem


You're building a Netflix clone. You have a dataset of movie reviews, where each review is a (user_id, movie_id, rating) triplet.

  • movie_ids are integers in the range [0, Nmovies)
  • user_ids are integers in the range [0, Nusers)
  • ratings are integers in the range [1, 5]
import random

Nmovies = 10
Nusers = 10
Nreviews = 30

movie_ids = random.choices(range(Nmovies), k=Nreviews)
user_ids = random.choices(range(Nusers), k=Nreviews)
ratings = random.choices(range(1,6), k=Nreviews)

print(movie_ids)
# [4, 9, 8, 7, 3, ... ]

print(user_ids)
# [1, 7, 4, 1, 2, ... ]

print(ratings)
# [1, 3, 2, 1, 1, ... ]
  1. Build a compressed sparse matrix where element (i, j) gives the ith user's rating of movie j.
  2. Normalize the movie vectors (column vectors) so that each of them has unit length.
  3. Calculate the Euclidean distance between normalized movie 2 and normalized movie 4.
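
One possible way to approach the three steps above, sketched with scipy.sparse and NumPy. This is not necessarily the intended solution. It assumes a few things the problem does not state: a fixed random seed (only so the run is repeatable), that repeated (user, movie) pairs may simply be summed (which is what SciPy does when converting from COO format), and that a movie with no reviews keeps an all-zero column.

```python
import random

import numpy as np
from scipy import sparse

random.seed(0)  # assumption: arbitrary seed, only to make the sketch repeatable

Nmovies, Nusers, Nreviews = 10, 10, 30
movie_ids = random.choices(range(Nmovies), k=Nreviews)
user_ids = random.choices(range(Nusers), k=Nreviews)
ratings = random.choices(range(1, 6), k=Nreviews)

# Step 1: rows are users, columns are movies.
# Converting COO -> CSC sums any repeated (user, movie) pairs.
reviews = sparse.coo_matrix(
    (ratings, (user_ids, movie_ids)), shape=(Nusers, Nmovies), dtype=float
).tocsc()

# Step 2: Euclidean norm of every column (movie vector).
col_norms = np.sqrt(np.asarray(reviews.multiply(reviews).sum(axis=0)).ravel())

# Divide each stored value by the norm of its own column.
# np.diff(indptr) is the number of stored values per column, so columns
# with no reviews are never touched (no division by zero).
normalized = reviews.copy()
normalized.data /= np.repeat(col_norms, np.diff(normalized.indptr))

# Step 3: Euclidean distance between normalized movie 2 and movie 4.
diff = normalized[:, 2] - normalized[:, 4]
distance = np.sqrt(diff.multiply(diff).sum())
print(distance)
```

CSC format is used here because the work is column-oriented: each movie's stored ratings sit contiguously in `normalized.data`, which is what makes the in-place scaling a one-liner.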

For example, if our Netflix clone had three users and two movies with a review matrix like this

[[1 0]
 [0 1]
 [3 0]]

The normalized movie vectors would be

[[0.32  0. ]
 [0.    1. ]
 [0.95  0. ]]

The Euclidean distance between these two normalized movie vectors is 1.41.
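
The numbers in this small example can be checked with a few lines of dense NumPy. This is only a sanity check on the 3x2 matrix above, not a sparse solution; the variable names are illustrative.

```python
import numpy as np

# The 3-user, 2-movie review matrix from the example.
reviews = np.array([[1, 0],
                    [0, 1],
                    [3, 0]], dtype=float)

# Divide each column by its Euclidean length.
normalized = reviews / np.linalg.norm(reviews, axis=0)
print(normalized.round(2))

# Distance between the two normalized movie vectors.
distance = np.linalg.norm(normalized[:, 0] - normalized[:, 1])
print(round(distance, 2))  # 1.41
```

The two movies share no reviewers, so their vectors are orthogonal, and the distance between any two orthogonal unit vectors is sqrt(2), which is about 1.41.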

