In Python's re module, match() and search() return match objects when a string matches a regular expression pattern. You can extract the matched string and its position using methods provided by the match object.
Contents
- Get the matched position: start(), end(), span()
- Extract the matched string: group()
- Grouping in regex patterns
- Extract each group's string: groups()
- Get the string and position of any group
- Nested groups
- Set names for groups
- Extract each group's string as a dictionary: groupdict()
- Match objects in if statements
The sample code in this article uses the following string as an example.
```python
import re

s = 'aaa@xxx.com'
```
For more information on how to use the functions and other features of the re module, see the following article:
- Regular expressions with the re module in Python
Get the matched position: start(), end(), span()
When a string matches a regex pattern using match() or search(), a match object is returned.
```python
m = re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(type(m))
# <class 're.Match'>
```
You can get the position (index) of the matched substring using the match object's methods start(), end(), and span().
```python
print(m.start())
# 0

print(m.end())
# 11

print(m.span())
# (0, 11)
```
start() returns the index where the matched substring begins, end() returns the index just past its last character, and span() returns both as a tuple (start, end).
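Because the end index is exclusive, the values from start() and end() can be passed straight to slice notation to recover the matched text. The following is a small sketch using search() on an illustrative string ('abc123def', not from the original examples) where the match does not begin at index 0.

```python
import re

text = 'abc123def'
m = re.search(r'[0-9]+', text)

print(m.span())
# (3, 6)

# end() is exclusive, so slicing with start() and end() yields exactly the match
print(text[m.start():m.end()])
# 123

# The length of the match is end() - start()
print(m.end() - m.start())
# 3
```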
Extract the matched string: group()
You can extract the matched part as a string using the match object's group() method.
```python
m = re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.group())
# aaa@xxx.com

print(type(m.group()))
# <class 'str'>
```
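group() works the same way on a match object returned by search(). In this sketch, the input 'Contact: aaa@xxx.com' is an illustrative string: match() fails because it only tries the beginning of the string, while search() finds the address later in the string, and span() reflects that offset.

```python
import re

text = 'Contact: aaa@xxx.com'
pattern = r'[a-z]+@[a-z]+\.[a-z]+'

# match() only tries at the start of the string ('C' is not in [a-z])
print(re.match(pattern, text))
# None

# search() scans the string and returns the first match it finds
m = re.search(pattern, text)
print(m.group())
# aaa@xxx.com

print(m.span())
# (9, 20)
```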
Grouping in regex patterns
Parentheses () are used to group part of a regex pattern string.
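Besides capturing substrings, grouping changes what a quantifier applies to. As a quick illustration (the input 'ababab' is just an example), + after b repeats only b, whereas + after (ab) repeats the whole group.

```python
import re

# Without parentheses, + applies only to 'b'
without_group = re.match(r'ab+', 'ababab')
print(without_group.group())
# ab

# With parentheses, + applies to the whole group 'ab'
with_group = re.match(r'(ab)+', 'ababab')
print(with_group.group())
# ababab
```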
Extract each group's string: groups()
You can extract a tuple containing the strings that matched each group using the match object's groups() method.
```python
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.groups())
# ('aaa', 'xxx', 'com')
```
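If a group is optional and does not take part in the match, its entry in groups() is None. groups() accepts a default argument that replaces those None values. The number-like pattern below is only an illustration; (?:...) is a non-capturing group, so it is not counted in groups().

```python
import re

pattern = r'([0-9]+)(?:\.([0-9]+))?'

# The fractional part is optional, so group 2 may not participate
m = re.match(pattern, '42')
print(m.groups())
# ('42', None)

# The default argument is used for groups that did not participate
print(m.groups('0'))
# ('42', '0')

m2 = re.match(pattern, '3.14')
print(m2.groups())
# ('3', '14')
```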
Get the string and position of any group
When a pattern contains groups, you can pass a number to the group() method to get the string of any group. If the argument is omitted or set to 0, it returns the entire match. Numbers 1 and higher return the strings of the corresponding groups in order, and a number larger than the number of groups raises an IndexError.
```python
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.group())
# aaa@xxx.com

print(m.group(0))
# aaa@xxx.com

print(m.group(1))
# aaa

print(m.group(2))
# xxx

print(m.group(3))
# com

# print(m.group(4))
# IndexError: no such group
```
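Since Python 3.6, match objects also support subscripting, so m[n] is a shorter way to write m.group(n). A brief sketch with the same pattern and string:

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)

# m[n] is equivalent to m.group(n)
print(m[0])
# aaa@xxx.com

print(m[1])
# aaa

print(m[3])
# com
```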
Supplying multiple numbers as arguments to group() returns a tuple with the corresponding strings. This way, you can select only the desired groups.
```python
print(m.group(0, 1, 3))
# ('aaa@xxx.com', 'aaa', 'com')
```
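Because the result is an ordinary tuple, it can be unpacked directly into variables. The variable names local and tld here are only illustrative.

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)

# Unpack the tuple returned by group() with several arguments
local, tld = m.group(1, 3)
print(local)
# aaa

print(tld)
# com
```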
start(), end(), and span() also accept a group number in the same way as group(), but they take only a single argument.
```python
print(m.span())
# (0, 11)

print(m.span(3))
# (8, 11)

# print(m.span(4))
# IndexError: no such group

# print(m.span(0, 1))
# TypeError: span expected at most 1 arguments, got 2
```
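The per-group positions can be used for slicing just like the whole-match positions. A group that did not participate in the match reports (-1, -1) from span() and None from group(); the optional-decimal pattern below is an illustrative example.

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)

# start(n) and end(n) give the bounds of group n
print(s[m.start(2):m.end(2)])
# xxx

# A group that did not take part in the match reports -1 positions
m2 = re.match(r'([0-9]+)(?:\.([0-9]+))?', '42')
print(m2.span(2))
# (-1, -1)

print(m2.group(2))
# None
```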
Nested groups
Grouping parentheses () can be nested. To include the entire match string in the result of groups(), enclose the entire pattern in (). Groups are numbered in the order of their opening parentheses (.
```python
m = re.match(r'(([a-z]+)@([a-z]+)\.([a-z]+))', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.groups())
# ('aaa@xxx.com', 'aaa', 'xxx', 'com')
```
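The numbering rule is easier to see with deeper nesting. In this toy example (pattern and input chosen only for illustration), each group gets its number from where its opening parenthesis appears, not from where it closes.

```python
import re

# Group numbers follow the opening parentheses from left to right:
# 1: ((a)(b(c)))   2: (a)   3: (b(c))   4: (c)
m = re.match(r'((a)(b(c)))', 'abc')
print(m.groups())
# ('abc', 'a', 'bc', 'c')
```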
Set names for groups
By adding ?P<name> at the start of the parentheses, as in (?P<name>...), you can assign a custom name to the group. You can then pass the name instead of a number to group(), start(), end(), or span() to access the corresponding part of the string or its position.
```python
m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.group('local'))
# aaa

print(m.group('SLD'))
# xxx

print(m.group('TLD'))
# com
```
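The same names work for the position methods, and subscripting (m['name']) accepts a name as well. A short sketch with the pattern above:

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)

print(m.span('SLD'))
# (4, 7)

print(m.start('TLD'))
# 8

print(m.end('local'))
# 3

# Subscripting also accepts a group name
print(m['TLD'])
# com
```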
Numbers still work even when custom names are assigned, and group() accepts a mix of numbers and names.
```python
print(m.group(0))
# aaa@xxx.com

print(m.group(3))
# com

print(m.group(0, 2, 'TLD'))
# ('aaa@xxx.com', 'xxx', 'com')
```
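If you need to know which number corresponds to a name, the compiled pattern is available as m.re, and its groupindex attribute maps each group name to its number. A minimal sketch:

```python
import re

s = 'aaa@xxx.com'
m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)

# groupindex maps group names to group numbers
print(dict(m.re.groupindex))
# {'local': 1, 'SLD': 2, 'TLD': 3}

# Looking up the number and using it gives the same result as the name
print(m.group(m.re.groupindex['TLD']) == m.group('TLD'))
# True
```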
Naming does not affect the result of groups().
```python
print(m.groups())
# ('aaa', 'xxx', 'com')
```
Extract each group's string as a dictionary: groupdict()
You can get a dictionary (dict) where the group names are keys and the matched strings are values using the groupdict() method.
```python
m = re.match(r'(?P<local>[a-z]+)@(?P<SLD>[a-z]+)\.(?P<TLD>[a-z]+)', s)
print(m)
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(m.groupdict())
# {'local': 'aaa', 'SLD': 'xxx', 'TLD': 'com'}

print(type(m.groupdict()))
# <class 'dict'>
```
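Two details are worth knowing: only named groups appear in the dictionary, and, like groups(), groupdict() takes a default for named groups that did not participate. The patterns and group names below (domain, integer, fraction) are illustrative only.

```python
import re

# The unnamed group ([a-z]+) is not included in groupdict()
m = re.match(r'([a-z]+)@(?P<domain>[a-z]+\.[a-z]+)', 'aaa@xxx.com')
print(m.groupdict())
# {'domain': 'xxx.com'}

# Named groups that did not participate map to None, or to the default
m2 = re.match(r'(?P<integer>[0-9]+)(?:\.(?P<fraction>[0-9]+))?', '42')
print(m2.groupdict())
# {'integer': '42', 'fraction': None}

print(m2.groupdict('0'))
# {'integer': '42', 'fraction': '0'}
```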
Match objects in if statements
When evaluated as a Boolean value, a match object is always considered True.
```python
print(re.match(r'[a-z]+@[a-z]+\.[a-z]+', s))
# <re.Match object; span=(0, 11), match='aaa@xxx.com'>

print(bool(re.match(r'[a-z]+@[a-z]+\.[a-z]+', s)))
# True
```
match() and search() return None when there is no match, and None is evaluated as False.
- Convert bool (True, False) and other types to each other in Python
```python
print(re.match('[0-9]+', s))
# None

print(bool(re.match('[0-9]+', s)))
# False
```
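Because a failed match returns None rather than a match object, calling a method such as group() on the result without checking it first raises an AttributeError. A short demonstration:

```python
import re

s = 'aaa@xxx.com'
m = re.match('[0-9]+', s)

caught = False
try:
    m.group()  # m is None here, so this fails
except AttributeError as e:
    caught = True
    print(e)
# 'NoneType' object has no attribute 'group'
```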
Therefore, to simply check whether a match occurred, you can use the return value of match() or search() directly as the condition of an if statement.
```python
if re.match(r'[a-z]+@[a-z]+\.[a-z]+', s):
    print('match')
else:
    print('no match')
# match
```
```python
if re.match('[0-9]+', s):
    print('match')
else:
    print('no match')
# no match
```
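When you also need the match object inside the if block, the assignment expression (the := operator, available in Python 3.8 and later) lets you store the result and test it in one step. A minimal sketch, assuming Python 3.8+:

```python
import re

s = 'aaa@xxx.com'

# Bind the match object to m and test its truthiness in the same expression
if (m := re.match(r'([a-z]+)@([a-z]+)\.([a-z]+)', s)):
    print(m.group(1))
else:
    print('no match')
# aaa
```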
However, be aware that some regex patterns can match a zero-length string (the empty string ''), and such a match object is still evaluated as True.
```python
m = re.match('[0-9]*', s)
print(m)
# <re.Match object; span=(0, 0), match=''>

print(m.group() == '')
# True

print(bool(m))
# True

if re.match('[0-9]*', s):
    print('match')
else:
    print('no match')
# match
```
Be careful when using *, which means zero or more repetitions, as demonstrated in the example.
If you wish to treat a match with an empty string as a non-match, first check the match object and then check the string obtained with the group() method.
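That two-step check can be wrapped in a small function. The name has_nonempty_match is hypothetical, introduced only for this sketch; it relies on short-circuit evaluation so that group() is never called when the result is None.

```python
import re

def has_nonempty_match(pattern, string):
    """Hypothetical helper: True only if match() succeeds with a non-empty string."""
    m = re.match(pattern, string)
    # Check the match object first, then the matched text itself
    return bool(m) and m.group() != ''

s = 'aaa@xxx.com'
print(has_nonempty_match('[0-9]*', s))
# False

print(has_nonempty_match('[a-z]*', s))
# True

print(has_nonempty_match('[0-9]+', s))
# False
```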