How to use regex match objects in Python | note.nkmk.me (2024)

In Python's re module, match() and search() return match objects when a string matches a regular expression pattern. You can extract the matched string and its position using methods provided by the match object.

Contents

  • Get the matched position: start(), end(), span()
  • Extract the matched string: group()
  • Grouping in regex patterns
    • Extract each group's string: groups()
    • Get the string and position of any group
    • Nested groups
    • Set names for groups
    • Extract each group's string as a dictionary: groupdict()
  • Match objects in if statements

The sample code in this article uses the following string as an example.

import re

s = 'aaa@xxx.com'

For more information on how to use the functions and other features of the re module, see the following article:

  • Regular expressions with the re module in Python

Get the matched position: start(), end(), span()

When a string matches a regex pattern using match() or search(), a match object is returned.

m = re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(type(m))
# <class 're.Match'>

You can get the position (index) of the matched substring using the match object's methods start(), end(), and span().

print(m.start())
# 0
print(m.end())
# 11
print(m.span())
# (0, 11)

start() returns the index of the first character of the matched substring, end() returns the index immediately after its last character, and span() returns both as a tuple (start, end).
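Because end() is exclusive, the two indices can be used directly as a slice. The following is a small sketch of that relationship; the sample text and variable names are illustrative assumptions, and search() is used so that the match does not begin at index 0.

```python
import re

text = 'contact: aaa@xxx.com'  # hypothetical sample text
m = re.search(r'[a-z]+@[a-z]+\.[a-z]+', text)

start, end = m.span()  # same values as m.start() and m.end()
print(start, end)
# 9 20

# end is exclusive, so slicing with the span reproduces the matched substring
print(text[start:end])
# aaa@xxx.com
print(text[start:end] == m.group())
# True
```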

Extract the matched string: group()

You can extract the matched part as a string using the match object's group() method.

m = re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.group())
# aaa@xxx.com
print(type(m.group()))
# <class 'str'>

Grouping in regex patterns

Parentheses () are used to group part of a regex pattern string.

Extract each group's string: groups()

You can extract a tuple containing the strings that matched each group using the match object's groups() method.

m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.groups())
# ('aaa', 'xxx', 'com')
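Since groups() returns an ordinary tuple, it can be unpacked into variables. The sketch below also shows what happens when a group does not take part in the match; the second pattern with an optional group and the variable names are assumptions made for illustration.

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)

# Unpack the tuple returned by groups() into separate variables
local, sld, tld = m.groups()
print(local, sld, tld)
# aaa xxx com

# A group that did not participate in the match is None by default;
# the optional argument of groups() substitutes another value
m2 = re.match(r'([a-z]+)@([a-z]+)(\.[a-z]+)?', 'aaa@xxx')
print(m2.groups())
# ('aaa', 'xxx', None)
print(m2.groups(''))
# ('aaa', 'xxx', '')
```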

Get the string and position of any group

When a pattern contains groups, the group() method returns the string of any group if you specify its number as an argument. If the argument is omitted or set to 0, it returns the entire match. Specifying 1 or higher returns the string of the corresponding group, numbered from left to right, and a number larger than the number of groups raises an IndexError.

m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.group())
# aaa@xxx.com
print(m.group(0))
# aaa@xxx.com
print(m.group(1))
# aaa
print(m.group(2))
# xxx
print(m.group(3))
# com
# print(m.group(4))
# IndexError: no such group

Supplying multiple numbers as arguments to group() returns a tuple with the corresponding strings. This way, you can select only the desired groups.

print(m.group(0, 1, 3))
# ('aaa@xxx.com', 'aaa', 'com')

start(), end(), and span() also accept a group number in the same way as group(), but only one at a time.

print(m.span())
# (0, 11)
print(m.span(3))
# (8, 11)
# print(m.span(4))
# IndexError: no such group
# print(m.span(0, 1))
# TypeError: span expected at most 1 arguments, got 2
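One possible use of per-group positions is editing only the part of the string that a group matched. The following is a minimal sketch under that assumption; the masking step and the variable name masked are illustrative, not part of the re module.

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)

# Position of group 2 within the original string
print(m.start(2), m.end(2))
# 4 7
print(s[m.start(2):m.end(2)] == m.group(2))
# True

# Replace only the text matched by group 2, splicing with its span
start, end = m.span(2)
masked = s[:start] + '*' * (end - start) + s[end:]
print(masked)
# aaa@***.com
```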

Nested groups

Grouping parentheses () can be nested. To include the entire matched string in the result of groups(), enclose the entire pattern in (). Groups are numbered by the position of their opening parenthesis (, counted from left to right.

m = re.match(r'(([a-z]+)@([a-z]+)\.([a-z]+))', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.groups())
# ('aaa@xxx.com', 'aaa', 'xxx', 'com')
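To make the numbering rule concrete, here is a sketch with one more level of nesting. The extra group around the domain part is an assumption added for illustration.

```python
import re

s = 'aaa@xxx.com'
# Opening parentheses, left to right:
# 1 = whole address, 2 = local part, 3 = whole domain,
# 4 = second-level domain, 5 = top-level domain
m = re.match(r'(([a-z]+)@(([a-z]+)\.([a-z]+)))', s)

print(m.groups())
# ('aaa@xxx.com', 'aaa', 'xxx.com', 'xxx', 'com')
print(m.group(3))
# xxx.com
```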

Set names for groups

By adding ?P<xxx> immediately after the opening parenthesis of a group, you can assign the name xxx to that group. You can then pass the name instead of a number to group(), start(), end(), or span() to get the corresponding string or its position.

m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.group('local'))
# aaa
print(m.group('SLD'))
# xxx
print(m.group('TLD'))
# com

You can also use numbers, even if custom names are assigned.

print(m.group(0))
# aaa@xxx.com
print(m.group(3))
# com
print(m.group(0, 2, 'TLD'))
# ('aaa@xxx.com', 'xxx', 'com')

Naming does not affect the result of groups().

print(m.groups())
# ('aaa', 'xxx', 'com')
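Group names work with the position methods as well as with group(). The sketch below shows this, together with the subscript form m[...], which is available on match objects since Python 3.6 and is equivalent to calling group() with one argument.

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)

# Positions looked up by group name
print(m.span('SLD'))
# (4, 7)
print(m.start('TLD'), m.end('TLD'))
# 8 11

# m[g] is equivalent to m.group(g) for a name or a number
print(m['local'], m[2], m[0])
# aaa xxx aaa@xxx.com
```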

Extract each group's string as a dictionary: groupdict()

You can get a dictionary (dict) where the group names are keys and the matched strings are values using the groupdict() method.

m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(m.groupdict())
# {'local': 'aaa', 'SLD': 'xxx', 'TLD': 'com'}
print(type(m.groupdict()))
# <class 'dict'>
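groupdict() contains only named groups, and like groups() it takes an optional default for groups that did not participate in the match. The following sketch illustrates both points; the pattern, which mixes an unnamed group with an optional named group, is a hypothetical one chosen for the demonstration.

```python
import re

# Hypothetical pattern: one unnamed group (@) and an optional named group TLD
pattern = r'(?P<local>[a-z]+)(@)(?P<SLD>[a-z]+)(?:\.(?P<TLD>[a-z]+))?'
m = re.match(pattern, 'aaa@xxx')

# groups() includes every capturing group, named or not
print(m.groups())
# ('aaa', '@', 'xxx', None)

# groupdict() includes only the named groups
print(m.groupdict())
# {'local': 'aaa', 'SLD': 'xxx', 'TLD': None}

# The optional argument replaces None for unmatched groups
print(m.groupdict(''))
# {'local': 'aaa', 'SLD': 'xxx', 'TLD': ''}
```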

Match objects in if statements

When evaluated as Boolean values, match objects are always considered True.

print(re.match(r'[a-z]+@[a-z]+\.[a-z]+', s))
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>
print(bool(re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)))
# True

match() and search() return None when there is no match, which is evaluated as False.

  • Convert bool (True, False) and other types to each other in Python

print(re.match('[0-9]+', s))
# None
print(bool(re.match('[0-9]+', s)))
# False

Therefore, to simply determine whether a match has occurred, you can use a call to match() or search(), or a variable holding its return value, directly as the condition of an if statement.

if re.match(r'[a-z]+@[a-z]+\.[a-z]+', s):
    print('match')
else:
    print('no match')
# match

if re.match('[0-9]+', s):
    print('match')
else:
    print('no match')
# no match
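When the matched text is needed inside the if block, it helps to keep the return value in a variable first. The following is a minimal sketch of that pattern; the function second_level_domain() and its regex are hypothetical names introduced only for this example.

```python
import re

def second_level_domain(text):
    """Hypothetical helper: return the part between '@' and '.', or None."""
    m = re.search(r'@([a-z]+)\.', text)
    if m:  # a match object is truthy, None (no match) is falsy
        return m.group(1)
    return None

print(second_level_domain('aaa@xxx.com'))
# xxx
print(second_level_domain('no address here'))
# None
```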

However, be aware that some regex patterns may match a zero-length string (empty string ''), which is still evaluated as True.

m = re.match('[0-9]*', s)
print(m)
# <re.Match object; span=(0, 0), match=''>
print(m.group() == '')
# True
print(bool(m))
# True

if re.match('[0-9]*', s):
    print('match')
else:
    print('no match')
# match

Be careful with *, which denotes zero or more repetitions: as the example demonstrates, zero repetitions still count as a successful match.

If you wish to treat a match with an empty string as a non-match, you can first evaluate the match object and then further evaluate the string obtained using the group() method.
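One way to write this two-step check is sketched below. The helper name has_nonempty_match() is an assumption for illustration; the key point is that `m and m.group()` is falsy both when m is None and when the matched string is empty.

```python
import re

def has_nonempty_match(pattern, text):
    """Hypothetical helper: True only if the pattern matches at least one character."""
    m = re.match(pattern, text)
    # m is None when there is no match; m.group() is '' for a zero-length match
    return bool(m and m.group())

s = 'aaa@xxx.com'
print(has_nonempty_match('[0-9]*', s))
# False
print(has_nonempty_match('[a-z]*', s))
# True
print(has_nonempty_match('[0-9]+', s))
# False
```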
