
Puny Computer Problem

Build a compressed sparse row (CSR) matrix from this file of text:

hey hey my name is ryan
hey ryan my name is marissa
it is nice to meet you
it is nice to meet you too
did you know that seals get seasick if you put them on a boat
i am surprised
i have lots of other animal facts do you want to hear other animal facts
no thanks i have to be somewhere
okay can i get your number

such that:

  • row i of the matrix represents line i of the file.
  • column j of the matrix represents the jth unique word observed in the file.
  • element ij represents the number of times word j was observed in line i.
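To make the target concrete, here is a small hand-written sketch of the three CSR arrays for just the first two lines of the file. It assumes zero-based rows and columns, and that columns are numbered in order of first appearance (hey=0, my=1, name=2, is=3, ryan=4, marissa=5). The helper name `element` is made up for illustration.

```python
# CSR arrays for the first two lines of the file, written out by hand.
# Assumed column numbering (order of first appearance):
#   hey=0, my=1, name=2, is=3, ryan=4, marissa=5
data = [2, 1, 1, 1, 1,  1, 1, 1, 1, 1, 1]      # nonzero counts, row by row
indices = [0, 1, 2, 3, 4,  0, 1, 2, 3, 4, 5]   # column of each count
indptr = [0, 5, 11]                            # row i lives in data[indptr[i]:indptr[i + 1]]


def element(i, j):
    """Return element ij by scanning row i's slice of the CSR arrays."""
    for k in range(indptr[i], indptr[i + 1]):
        if indices[k] == j:
            return data[k]
    return 0  # anything not stored is an implicit zero


print(element(0, 0))  # "hey" in line 0
print(element(0, 5))  # "marissa" in line 0
```

Only the nonzero counts are stored, which is why line 0 takes five entries even though the matrix already has six columns.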

There’s a catch.

Your computer is so puny that it can’t fit the entire file into memory at one time. It can fit each line into memory, just not the entire file.
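One possible approach, sketched below, is to stream the file one line at a time and append to the three CSR arrays as you go. This is only a sketch under some assumptions: words are split on whitespace, columns are numbered in order of first appearance, and the vocabulary plus the output arrays fit in memory even though the raw text does not. The function name `build_csr` is hypothetical.

```python
def build_csr(lines):
    """Build CSR arrays from an iterable of text lines, one line at a time.

    `lines` can be an open file object, so only the current line is held
    in memory (plus the vocabulary and the growing output arrays).
    """
    vocab = {}                          # word -> column index
    data, indices, indptr = [], [], [0]
    for line in lines:
        counts = {}                     # column index -> count for this line only
        for word in line.split():
            j = vocab.setdefault(word, len(vocab))
            counts[j] = counts.get(j, 0) + 1
        for j in sorted(counts):        # keep column indices sorted within each row
            indices.append(j)
            data.append(counts[j])
        indptr.append(len(data))        # marks where the next row starts
    return data, indices, indptr, vocab


# Example with two lines held in a list; a real run would pass an open file instead.
data, indices, indptr, vocab = build_csr([
    "hey hey my name is ryan",
    "hey ryan my name is marissa",
])
print(indptr)
```

Iterating over a file object in Python reads it lazily, so something like `with open(path) as f: build_csr(f)` never loads the whole file at once. If SciPy is available, the resulting arrays can be handed to `scipy.sparse.csr_matrix((data, indices, indptr), shape=(len(indptr) - 1, len(vocab)))`.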
