Regular Expression

A regex operates on characters, not words. Concatenation is implied: the pattern ab matches an a immediately followed by a b.

In general, a regex is built from three kinds of parts: anchors, character sets, and modifiers.

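Note: Python string literals use \ as the escape character, just as regex does. Put the r prefix on the pattern (a raw string) so that Python leaves the backslashes alone and passes them to the regex engine unchanged.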
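
To illustrate the note on raw strings, here is a minimal sketch using Python's standard re module. The sample text and variable names are made up for illustration.

```python
import re

# Without the r prefix, Python turns "\b" into a backspace character
# before the regex engine ever sees the pattern.
plain = "\bcat\b"

# With the r prefix, the backslashes survive, and the regex engine
# reads \b as "word boundary".
raw = r"\bcat\b"

plain_match = re.search(plain, "a cat sat")
raw_match = re.search(raw, "a cat sat")

print(plain_match)         # None
print(raw_match.group(0))  # cat
```

The two patterns look alike in source code but differ in length (5 versus 7 characters), which is why the raw form is the safer habit for regex patterns.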

Anchors

Anchors specify the position of the pattern with respect to the line.

regex    match pattern
^        beginning of the line
$        end of the line
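
A short sketch of both anchors, assuming Python's re module and a made-up one-line input:

```python
import re

line = "hello world"

# ^ pins the pattern to the beginning of the line.
at_start = re.search(r"^hello", line)
not_at_start = re.search(r"^world", line)

# $ pins the pattern to the end of the line.
at_end = re.search(r"world$", line)

# Using both anchors requires the pattern to cover the whole line.
whole = re.search(r"^hello$", line)

print(at_start is not None)      # True
print(not_at_start is not None)  # False
print(at_end is not None)        # True
print(whole is not None)         # False
```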

Character sets

The bread and butter of regex.

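regex    match pattern
\d       a single digit
\w       a single letter, digit, or underscore
.        any single character except a newline
\s       a single whitespace character (space, tab, newline, ...)
[]       any single character listed in the set, e.g. [abc]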
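
The character sets above can be tried out with re.findall, which returns every non-overlapping match. This is only a sketch; the input strings are invented for the example.

```python
import re

digits = re.findall(r"\d", "Room 42, floor_3")
word_chars = re.findall(r"\w", "a_1 -")    # underscore counts as a word character
whitespace = re.findall(r"\s", "a b\tc")   # space and tab both match
vowels = re.findall(r"[aeiou]", "regex")
# By default the dot does not match a newline, so "a\nc" is skipped.
dotted = re.findall(r"a.c", "abc a-c a\nc")

print(digits)      # ['4', '2', '3']
print(word_chars)  # ['a', '_', '1']
print(whitespace)  # [' ', '\t']
print(vowels)      # ['e', 'e']
print(dotted)      # ['abc', 'a-c']
```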

Modifiers

A modifier (also called a quantifier) specifies how many times the character preceding it may repeat.

regex    match pattern
*        0 or more of the preceding character
+        1 or more of the preceding character
?        0 or 1 of the preceding character
{n}      exactly n of the preceding character
{n,m}    between n and m (inclusive) of the preceding character; do not put a space after the comma
{n,}     at least n of the preceding character
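
As a rough check of each modifier, the sketch below uses re.fullmatch so that the pattern must cover the entire string. The helper name full is hypothetical, introduced only for this example.

```python
import re

def full(pattern, s):
    """Return True if the pattern matches the whole string s."""
    return re.fullmatch(pattern, s) is not None

print(full(r"ab*", "a"), full(r"ab*", "abbb"))             # True True
print(full(r"ab+", "a"), full(r"ab+", "ab"))               # False True
print(full(r"colou?r", "color"), full(r"colou?r", "colour"))  # True True
print(full(r"\d{3}", "123"), full(r"\d{3}", "12"))         # True False
print(full(r"\d{2,4}", "1234"), full(r"\d{2,4}", "12345")) # True False
print(full(r"\d{2,}", "12345"), full(r"\d{2,}", "1"))      # True False
```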

Advanced

  • A|B matches either A or B.
  • () groups part of the pattern so that the substring it matches can be extracted.
    With groups, we can extract substrings with the group() method on the Match object. group(0) is the entire match, group(1) is the substring matched by the 1st group, and so on.
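
A small sketch of alternation and group extraction with Python's re module. The phone-number-like input is made up for the example.

```python
import re

# Two groups: three digits, a hyphen, then four digits.
m = re.search(r"(\d{3})-(\d{4})", "call 555-1234 now")

print(m.group(0))  # 555-1234  (the entire match)
print(m.group(1))  # 555       (1st group)
print(m.group(2))  # 1234      (2nd group)
print(m.groups())  # ('555', '1234')

# Alternation: match either word.
pets = re.findall(r"cat|dog", "cat, dog, bird")
print(pets)        # ['cat', 'dog']
```

Note that re.search returns None when nothing matches, so real code would usually check the result before calling group().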
