
Groundhog Day Problem


Here's a quote from Groundhog Day.

quote = """Once a year, the eyes of the nation turn here, to this tiny
hamlet in Pennsylvania, to watch a master at work. The master?
Punxsutawney Phil, the world's most famous weatherman, the
groundhog, who, as legend has it, can predict the coming of an
early spring. And here's the big moment we've all been waiting for. Let's just
see what Mr. Groundhog has to say. Hey! Over here, you little
weasel! Well, that's it. Sorry you couldn't be here in person to
share the electric moment. This is one event where television
really fails to capture the excitement of thousands of people
gathered to watch a large squirrel predict the weather, and
I for one am deeply grateful to have been a part of it.
Reporting for Channel 9, this is Phil Connors.
"""

Find all substrings that, ignoring case,

  • begin with one of these words: ['the', 'this', 'to', 'in']
  • end with one of these words: ['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']
  • have 30 or fewer characters between the begin word and the end word (counting spaces and newline characters).

Keep the earliest identified, non-overlapping, non-nested substrings when scanning from left to right.

starters = ['the', 'this', 'to', 'in']
enders = ['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']
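Because the list entries are whole words, the begin and end words need a word boundary (\b) on each side. Without one, 'in' also matches inside words such as 'Reporting'. Here is a minimal sketch using Python's re module. word_alternation is a hypothetical helper name introduced here for illustration, not part of the problem:

```python
import re

starters = ['the', 'this', 'to', 'in']
enders = ['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']


def word_alternation(words):
    # Hypothetical helper: a non-capturing group that matches any of `words`
    # as a whole word, with a word boundary on each side.
    return r"\b(?:" + "|".join(re.escape(w) for w in words) + r")\b"


line = "Reporting for Channel 9, this is Phil Connors."

# With word boundaries, only the standalone words are found.
print(re.findall(word_alternation(starters), line, flags=re.IGNORECASE))  # ['this']
print(re.findall(word_alternation(enders), line, flags=re.IGNORECASE))    # ['Phil']

# Without them, the 'in' inside 'Reporting' is picked up as well.
print(re.findall("|".join(starters), line, flags=re.IGNORECASE))  # ['in', 'this']
```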
Expected result
expected = [
    'to this tiny\nhamlet in Pennsylvania', 
    'to watch a master', 
    'The master', 
    "the world's most famous weatherman", 
    'the\ngroundhog', 
    'this is Phil'
]
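One possible solution follows. It is a sketch with Python's re module and not necessarily the intended one. The begin and end words become whole-word alternations in non-capturing groups (?:...), so findall returns the whole matched text rather than group contents. The lazy .{0,30}? allows at most 30 characters in between and makes each match stop at the first qualifying end word. re.IGNORECASE handles case, and re.DOTALL lets '.' cross the newlines in quote. re.findall's leftmost, non-overlapping scan is one reading of "earliest identified". For this quote it reproduces the expected list, but other inputs could expose differences.

```python
import re

quote = """Once a year, the eyes of the nation turn here, to this tiny
hamlet in Pennsylvania, to watch a master at work. The master?
Punxsutawney Phil, the world's most famous weatherman, the
groundhog, who, as legend has it, can predict the coming of an
early spring. And here's the big moment we've all been waiting for. Let's just
see what Mr. Groundhog has to say. Hey! Over here, you little
weasel! Well, that's it. Sorry you couldn't be here in person to
share the electric moment. This is one event where television
really fails to capture the excitement of thousands of people
gathered to watch a large squirrel predict the weather, and
I for one am deeply grateful to have been a part of it.
Reporting for Channel 9, this is Phil Connors.
"""

starters = ['the', 'this', 'to', 'in']
enders = ['phil', 'weatherman', 'groundhog', 'pennsylvania', 'master']

# Whole-word alternations for the begin and end words.
start = r"\b(?:" + "|".join(map(re.escape, starters)) + r")\b"
end = r"\b(?:" + "|".join(map(re.escape, enders)) + r")\b"

# .{0,30}? allows at most 30 characters in between, as few as possible (lazy),
# so each match ends at the first end word that qualifies.
pattern = start + r".{0,30}?" + end

# findall scans left to right and returns non-overlapping matches.
result = re.findall(pattern, quote, flags=re.IGNORECASE | re.DOTALL)
print(result)
```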

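Note that the result includes 'to watch a master' and 'The master' but not 'to watch a master at work. The master'. That longer substring also has no more than 30 characters between its begin and end words. However, it overlaps both shorter matches, and 'to watch a master' ends first when scanning from left to right, so only the two shorter substrings are kept.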
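The difference comes down to a greedy versus a lazy quantifier. Here is a small sketch with Python's re module on that fragment of the quote. The pattern pieces are assumptions that mirror the problem's word lists:

```python
import re

start = r"\b(?:the|this|to|in)\b"
end = r"\b(?:phil|weatherman|groundhog|pennsylvania|master)\b"
text = "to watch a master at work. The master?"
flags = re.IGNORECASE | re.DOTALL

# Greedy: .{0,30} grabs as much as it can, then backs off to the LAST end word in reach.
greedy = re.findall(start + r".{0,30}" + end, text, flags=flags)

# Lazy: .{0,30}? grows one character at a time and stops at the FIRST end word.
lazy = re.findall(start + r".{0,30}?" + end, text, flags=flags)

print(greedy)  # ['to watch a master at work. The master']
print(lazy)    # ['to watch a master', 'The master']
```

The greedy version swallows both matches because only 29 characters separate 'to' from the second 'master'.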

Regex Functions
Function                                         Description                                  Return Value
re.findall(pattern, string, flags=0)             Find all non-overlapping matches of pattern  list of strings, or list of tuples if > 1 capture group
re.finditer(pattern, string, flags=0)            Find all non-overlapping matches of pattern  iterator yielding match objects
re.search(pattern, string, flags=0)              Find the first match of pattern in string    match object or None
re.split(pattern, string, maxsplit=0, flags=0)   Split string at each match of pattern        list of strings
re.sub(pattern, repl, string, count=0, flags=0)  Replace each match of pattern with repl      new string with the replacement(s)
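Here is a quick sketch of the five functions, run with Python's re module on one line of the quote. Note how findall's return type changes with the number of capture groups:

```python
import re

text = "to watch a master at work. The master?"

# findall: a list of strings; with capture groups it returns the groups' text instead
print(re.findall(r"\bmaster\b", text))      # ['master', 'master']
print(re.findall(r"(\w+) master", text))    # ['a', 'The']
print(re.findall(r"(\w+) (master)", text))  # [('a', 'master'), ('The', 'master')]

# finditer: match objects, which also carry positions
print([m.span() for m in re.finditer(r"\bmaster\b", text)])  # [(11, 17), (31, 37)]

# search: only the first match (or None)
m = re.search(r"\bthe\b", text, flags=re.IGNORECASE)
print(m.group() if m else None)  # The

# split and sub
print(re.split(r"\.\s*", text))            # ['to watch a master at work', 'The master?']
print(re.sub(r"\bmaster\b", "pro", text))  # to watch a pro at work. The pro?
```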
Regex Patterns
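Pattern     Description
[abc]       a or b or c
[^abc]      not (a or b or c)
[a-z]       a or b ... or y or z
[1-9]       1 or 2 ... or 8 or 9
\d          digits [0-9]
\D          non-digits [^0-9]
\s          whitespace [ \t\n\r\f\v]
\S          non-whitespace [^ \t\n\r\f\v]
\w          word characters [a-zA-Z0-9_]
\W          non-word characters [^a-zA-Z0-9_]
.           any character except a newline (including newlines with re.DOTALL)
x*          zero or more repetitions of x
x+          one or more repetitions of x
x?          zero or one repetition of x
{m}         exactly m repetitions of the preceding item
{m,n}       m to n repetitions of the preceding item
\\, \., \*  literal backslash, period, asterisk
\b          word boundary
^hello      starts with hello
bye$        ends with bye
(...)       capture group
(po|go)     po or go

The bracketed sets for \d, \D, \s, \S, \w, and \W are the ASCII equivalents. For str patterns, Python's re also matches non-ASCII Unicode digits, whitespace, and word characters unless the re.ASCII flag is set.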
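A few of the patterns above in action follow. This is a sketch using Python's re module, and the sample strings are made up for illustration. It also shows two behaviors the Groundhog Day problem depends on: '.' skips newlines unless re.DOTALL is set, and \b keeps 'in' from matching inside a longer word:

```python
import re

# [^...]: any character NOT in the set (here: not a vowel and not whitespace)
print(re.findall(r"[^aeiou\s]", "a squirrel"))  # ['s', 'q', 'r', 'r', 'l']

# \d and \w; on str patterns they also match non-ASCII characters unless re.ASCII is set
print(re.findall(r"\d+", "Channel 9, 2 Feb"))  # ['9', '2']
word = "caf\u00e9"  # 'café' with a precomposed é
print(re.findall(r"\w+", word))                  # ['café']
print(re.findall(r"\w+", word, flags=re.ASCII))  # ['caf']

# '.' does not match a newline unless re.DOTALL is set
print(re.search(r"the.groundhog", "the\ngroundhog") is not None)                   # False
print(re.search(r"the.groundhog", "the\ngroundhog", flags=re.DOTALL) is not None)  # True

# {m,n}: between m and n repetitions of the preceding item (greedy)
print(re.findall(r"o{2,3}", "go goo gooo goooo"))  # ['oo', 'ooo', 'ooo']

# \b: word boundary, so the 'in' inside 'waiting' is skipped
print(re.findall(r"\bin\b", "waiting in Pennsylvania"))  # ['in']
print(re.findall(r"in", "waiting in Pennsylvania"))      # ['in', 'in']

# (po|go): alternation inside a capture group
print(re.findall(r"(po|go)", "potato go"))  # ['po', 'go']

# ^ and $: start and end of the string
print(bool(re.search(r"^hello", "hello world")), bool(re.search(r"bye$", "goodbye")))  # True True
```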
