Beginner | Regular Expressions In Python

What's a regular expression?

A regular expression (aka regex) is a compact pattern syntax that describes which strings, or parts of strings, should match. For example, the regular expression \d+\s[a-z]+ matches strings that have

  • one or more digits (\d+)
  • followed by a single whitespace character, such as a space (\s)
  • followed by one or more lowercase letters between a and z ([a-z]+)

20 quick brown foxes jumped over 2 lazy dogs, 8 sleepy cats, and 4 loud crickets.

In this sentence, the pattern matches 20 quick, 2 lazy, 8 sleepy, and 4 loud.
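The match can be checked directly in Python. This is a minimal sketch using re.findall() on the sentence above; the variable names text and matches are only illustrative.

```python
import re

text = "20 quick brown foxes jumped over 2 lazy dogs, 8 sleepy cats, and 4 loud crickets."

# A raw string (r"...") passes the backslashes through to the regex engine unchanged
matches = re.findall(r"\d+\s[a-z]+", text)
print(matches)
# ['20 quick', '2 lazy', '8 sleepy', '4 loud']
```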

Table of regular expression patterns

Pattern Description
[abc] a or b or c
[^abc] not (a or b or c)
[a-z] a or b ... or y or z
[1-9] 1 or 2 ... or 8 or 9
\d digits [0-9]
\D non-digits [^0-9]
\s whitespace [ \t\n\r\f\v]
\S non-whitespace [^ \t\n\r\f\v]
\w word characters (alphanumeric or underscore) [a-zA-Z0-9_]
\W non-word characters [^a-zA-Z0-9_]
. any character except a newline (any character at all with the re.DOTALL flag)
x* zero or more repetitions of x
x+ one or more repetitions of x
x? zero or one repetitions of x
x{m} exactly m repetitions of x
x{m,n} m to n repetitions of x
\\, \., \* backslash, period, asterisk
\b word boundary
^hello starts with hello
bye$ ends with bye
(...) capture group
(po|go) po or go
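A few rows of the table can be tried out as follows. This is a small sketch, not an exhaustive tour; the sample strings and variable names are made up for illustration.

```python
import re

# Character sets: [abc] and its negation [^abc]
in_set = re.findall(r"[abc]", "cabbage")               # ['c', 'a', 'b', 'b', 'a']
not_in_set = re.findall(r"[^abc]", "cabbage")          # ['g', 'e']

# Quantifiers: *, +, ?, {m}, {m,n}
star = re.findall(r"go*", "g go goo")                  # ['g', 'go', 'goo']
plus = re.findall(r"go+", "g go goo")                  # ['go', 'goo']
optional = re.findall(r"colou?r", "color colour")      # ['color', 'colour']
exactly_3 = re.findall(r"\d{3}", "1 22 333 4444")      # ['333', '444']
two_to_3 = re.findall(r"\d{2,3}", "1 22 333 4444")     # ['22', '333', '444']

# Escaping a special character: \. matches a literal period
dots = re.findall(r"\.", "a.b.c")                      # ['.', '.']

# Word boundary: only "cat" as a whole word
whole_word = re.findall(r"\bcat\b", "cat concat cats cat.")  # ['cat', 'cat']

# Anchors: ^ (start of string) and $ (end of string)
starts = re.search(r"^hello", "hello world") is not None     # True
ends = re.search(r"bye$", "goodbye") is not None             # True
not_start = re.search(r"^hello", "oh hello") is not None     # False

# Capture group with alternation: findall returns the group's contents
either = re.findall(r"(po|go)", "pogo stick")          # ['po', 'go']
```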

How do regular expressions work in Python?

In Python, regular expressions are managed by the re module.

Table of regular expression functions in Python

Function | Description | Return Value
re.findall(pattern, string, flags=0) | Find all non-overlapping occurrences of pattern in string | list of strings, or list of tuples if > 1 capture group
re.finditer(pattern, string, flags=0) | Find all non-overlapping occurrences of pattern in string | iterator yielding match objects
re.search(pattern, string, flags=0) | Find first occurrence of pattern in string | match object or None
re.split(pattern, string, maxsplit=0, flags=0) | Split string by occurrences of pattern | list of strings
re.sub(pattern, repl, string, count=0, flags=0) | Replace pattern with repl | new string with the replacement(s)
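The five functions in the table can be sketched on one sample string. The string and the variable names here are hypothetical, chosen only to show the different return values.

```python
import re

text = "apples: 3, pears: 12, plums: 7"

# findall with no capture group returns a list of matched strings
numbers = re.findall(r"\d+", text)
# ['3', '12', '7']

# findall with two capture groups returns a list of tuples
pairs = re.findall(r"([a-z]+): (\d+)", text)
# [('apples', '3'), ('pears', '12'), ('plums', '7')]

# finditer yields match objects, which carry the matched text and its position
spans = [(m.group(), m.start()) for m in re.finditer(r"\d+", text)]
# [('3', 8), ('12', 18), ('7', 29)]

# search returns the first match, or None when nothing matches
first = re.search(r"\d+", text)
missing = re.search(r"kiwi", text)
# first.group() is '3', missing is None

# split cuts the string wherever the pattern occurs
parts = re.split(r",\s*", text)
# ['apples: 3', 'pears: 12', 'plums: 7']

# sub replaces every match; count limits how many replacements are made
masked = re.sub(r"\d+", "N", text)
masked_once = re.sub(r"\d+", "N", text, count=1)
# 'apples: N, pears: N, plums: N' and 'apples: N, pears: 12, plums: 7'
```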

What about re.compile()?

The following regular expression searches have equivalent logic...

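# Version 1: compile the pattern first, then call a method on it
import re
pat = re.compile("[A-Z][a-z]+")  # one uppercase letter followed by one or more lowercase letters
pat.findall("Hi, I'm Bob.")
# ['Hi', 'Bob']

# Version 2: pass the pattern string directly to the module-level function
import re
re.findall(pattern="[A-Z][a-z]+", string="Hi, I'm Bob.")
# ['Hi', 'Bob']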

but the first version compiles the regular expression into a re.Pattern object.

type(pat)  # <class 're.Pattern'>

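This can boost performance when you use the same regular expression repeatedly, because the pattern is parsed only once. The gain is often modest, since the re module also keeps an internal cache of recently compiled patterns.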
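A typical way to reuse a compiled pattern is to build it once outside a loop and call its methods inside. This is a minimal sketch; the names capitalized, lines, and found are illustrative only.

```python
import re

# Compile once, outside the loop
capitalized = re.compile(r"[A-Z][a-z]+")

lines = ["Hi, I'm Bob.", "no names here", "Alice met Carol."]

# The compiled object offers the same operations as the module-level functions
found = [capitalized.findall(line) for line in lines]
print(found)
# [['Hi', 'Bob'], [], ['Alice', 'Carol']]

# The original pattern string stays available on the compiled object
print(capitalized.pattern)
# [A-Z][a-z]+
```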