Python RegEx Archives (2024)

This tutorial describes the usage of regular expressions in Python. In this lesson, we will explain how to use Python's re module for pattern matching with regular expressions.

Python regex is short for Python regular expression. This regex tutorial starts with the basics and gradually covers more advanced regex techniques and methods.

This tutorial covers the following topics:

  • Python re module
  • Regular expressions and their syntax
  • Regex methods and objects
  • Regex Metacharacters, special sequences, and character classes
  • Regex option flags
  • Capturing groups
  • Extension notations and assertions
  • A real-world example of regular expression

RegEx Series

This Python regex series contains the following in-depth tutorials. You can read any of them directly.

  • Python regex compile: Compile a regular expression pattern provided as a string into a re.Pattern object.
  • Python regex match: A Comprehensive guide for pattern matching.
  • Python regex search: Search for the first occurrences of the regex pattern inside the target string.
  • Python regex find all matches: Scans the regex pattern through the entire string and returns all matches.
  • Python regex split: Split a string into a list of substrings at each match of the given regular expression pattern.
  • Python regex replace: Replace one or more occurrences of a pattern in the string with a replacement.
  • Python regex capturing groups: Match several distinct patterns inside the same target string.
  • Python regex metacharacters and operators: Metacharacters are special characters that affect how the regular expressions around them are interpreted.
  • Python regex special sequences and character classes: A special sequence represents one of the basic predefined character classes.
  • Python regex flags: The re module methods accept an optional flags argument used to enable various unique features and syntax variations.
  • Python regex span(), start(), and end(): To find match positions.

What are regular expressions?

A regex, or regular expression, is a way to define a pattern for searching or manipulating strings. We can use a regular expression to match, search, replace, and manipulate textual data.

In simple words, the regex pattern Jessa will match the name Jessa.

Also, you can write a regex pattern to validate a password against some predefined constraints, for example that the password must contain at least one special character, one digit, and one uppercase letter. If the pattern matches the password, we can say that the password is correctly constructed.
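As a rough sketch of such a password check, the pattern below uses lookahead assertions, one per constraint. The minimum length of 8 and the names PASSWORD_PATTERN and is_valid_password are assumptions made for this illustration, not part of any standard rule.

```python
import re

# Hypothetical rule set: at least one uppercase letter, one digit,
# one special (non-alphanumeric) character, and 8 or more characters overall.
PASSWORD_PATTERN = re.compile(r"(?=.*[A-Z])(?=.*\d)(?=.*[^A-Za-z0-9]).{8,}")


def is_valid_password(password):
    """Return True if the whole password satisfies every constraint."""
    return PASSWORD_PATTERN.fullmatch(password) is not None


print(is_valid_password("Jessa@2024"))  # True
print(is_valid_password("jessa2024"))   # False (no uppercase letter, no special character)
```

Each (?=...) group checks one constraint without consuming any characters, so the order of the characters in the password does not matter.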

Regular expressions are also instrumental in extracting information from text such as log files, spreadsheets, or other textual documents.

For example, below are some of the cases where regular expressions can save you a lot of time.

  • Searching and replacing text in files
  • Validating text input, such as password and email address
  • Renaming a hundred files at a time. For example, you can change the extension of all files using a regex pattern
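The extension change from the last item could look something like this. The file names are made up for the example, and the code only builds the new names; it does not touch any files on disk.

```python
import re

# Hypothetical file names; no files on disk are touched.
file_names = ["report.txt", "notes.txt", "archive.tar.gz", "photo.jpeg"]

# \.txt$ matches a literal ".txt" only at the end of the name.
renamed = [re.sub(r"\.txt$", ".md", name) for name in file_names]
print(renamed)
# ['report.md', 'notes.md', 'archive.tar.gz', 'photo.jpeg']
```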

The re module

We will start this tutorial with the re module, a built-in Python module that provides all the functionality needed for handling patterns and regular expressions.

Type import re at the start of your Python file, and you are ready to use the re module's methods and special characters. To get to know the re module's functionality, methods, and attributes, use the help function.

Just pass the module as an argument to the help function like this: help(re). It will show hundreds of lines simply because this module is vast and comprehensive.

Now let's see how to use the re module to perform regex pattern matching in Python.

Example 1: Write a regular expression to search for digits inside a string

Now, let's see how to use the Python re module to write a regular expression. Let's take a simple example of a regular expression to check if a string contains a number.

For this example, we will use the \d special sequence. We will discuss regex metacharacters and special sequences in detail in a later section of this article.

For now, keep in mind that \d is a special sequence that matches any digit from 0 to 9.

    # import the re module
    import re

    target_str = "My roll number is 25"
    res = re.findall(r"\d", target_str)
    # extract matching values
    print(res)
    # Output ['2', '5']

Understand this example

  1. We imported the re module into our program.
  2. Next, we created a regex pattern \d to match any digit from 0 to 9.
  3. After that, we used the re.findall() method to match our pattern.
  4. In the end, we got the two digits as strings, '2' and '5'.

Use raw string to define a regex

Note: I have used a raw string to define a pattern like this r"\d". Always write your regex as a raw string.

As you may already know, the backslash has a special meaning in some cases because it may indicate an escape character or escape sequence. To avoid that, always use a raw string.

For example, let's say that in Python we are defining a string that is actually a path to an exercise folder like this path = "c:\example\task\new".

Now, let's assume you want to search for this path inside a target string using a regular expression. Let's write the code for it.

    import re

    print("without raw string:")
    # path_to_search = "c:\example\task\new"
    target_string = r"c:\example\task\new\exercises\session1"

    # regex pattern written as a normal (non-raw) string
    # Python turns each \\ into a single backslash, so the regex
    # engine receives \e, \t, and \n as escape sequences
    pattern = "^c:\\example\\task\\new"
    # the next line raises re.error: bad escape \e
    res = re.search(pattern, target_string)
    print(res.group())

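Notice that the pattern is written as a normal string. Python first converts each \\ into a single backslash, so the regex engine receives ^c:\example\task\new and reads \e, \t, and \n as escape sequences. If you execute the above code, you will get the re.error: bad escape error because \e is not a valid regex escape. Even without it, \t and \n would stand for a tab and a newline rather than the literal text of the path.

To avoid such issues, always write a regex pattern using a raw string. The prefix r denotes a raw string.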

Now replace the existing pattern with pattern = r"^c:\\example\\task\\new" and execute the code again. In the raw string, each \\ reaches the regex engine unchanged and matches one literal backslash, so the search succeeds and res.group() returns the matched path:

    c:\example\task\new
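To see the difference between a normal and a raw string for yourself, here is a small sketch. It also uses re.escape(), a re module function that was not used above, as one possible way to build the pattern without writing the double backslashes by hand.

```python
import re

# A normal string turns \t into a single tab character;
# a raw string keeps the backslash and the letter t.
print(len("\t"))   # 1
print(len(r"\t"))  # 2

target_string = r"c:\example\task\new\exercises\session1"
path_to_search = r"c:\example\task\new"

# re.escape() adds the backslashes the regex engine needs
# to treat the path as literal text.
pattern = "^" + re.escape(path_to_search)
res = re.search(pattern, target_string)
print(res.group())  # c:\example\task\new
```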

Python regex methods

The Python re module provides multiple methods. Below is the list of regex methods and their meaning.

Method                              Description
re.compile('pattern')               Compile a regular expression pattern provided as a string into a re.Pattern object.
re.search(pattern, str)             Search for occurrences of the regex pattern inside the target string and return only the first match.
re.match(pattern, str)              Try to match the regex pattern at the start of the string. It returns a match only if the pattern is located at the beginning of the string.
re.fullmatch(pattern, str)          Match the regular expression pattern to the entire string from the first to the last character.
re.findall(pattern, str)            Scan the entire string for the regex pattern and return all matches as a list.
re.finditer(pattern, str)           Scan the entire string for the regex pattern and return an iterator yielding match objects.
re.split(pattern, str)              Break a string into a list of substrings at each match of the given regular expression pattern.
re.sub(pattern, replacement, str)   Replace one or more occurrences of a pattern in the string with a replacement.
re.subn(pattern, replacement, str)  Same as re.sub(), except that it returns a tuple of two elements: first, the new string after all replacements, and second, the number of replacements made.

Example 2: How to use regular expression in Python

Let's see how to use several of these regex methods.

    # import the re module
    import re

    target_string = "Jessa salary is 8000$"

    # compile regex pattern
    # pattern to match any word character
    str_pattern = r"\w"
    pattern = re.compile(str_pattern)

    # match regex pattern at start of the string
    res = pattern.match(target_string)
    # matched character
    print(res.group())
    # Output J

    # search regex pattern anywhere inside string
    # pattern to search any digit
    res = re.search(r"\d", target_string)
    print(res.group())
    # Output 8

    # pattern to find all digits
    res = re.findall(r"\d", target_string)
    print(res)
    # Output ['8', '0', '0', '0']

    # regex to split string on whitespaces
    res = re.split(r"\s", target_string)
    print("All tokens:", res)
    # Output All tokens: ['Jessa', 'salary', 'is', '8000$']

    # regex for replacement
    # replace space with hyphen
    res = re.sub(r"\s", "-", target_string)
    # string after replacement:
    print(res)
    # Output Jessa-salary-is-8000$
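The example above leaves out re.fullmatch(), re.finditer(), and re.subn(). Here is a short sketch of those three on the same target string. The patterns are chosen only for illustration.

```python
import re

target_string = "Jessa salary is 8000$"

# fullmatch() succeeds only if the pattern covers the entire string.
print(re.fullmatch(r"\d+", "8000") is not None)   # True
print(re.fullmatch(r"\d+", "8000$") is not None)  # False

# finditer() yields one match object per word.
words = [(m.group(), m.start()) for m in re.finditer(r"[A-Za-z]+", target_string)]
print(words)  # [('Jessa', 0), ('salary', 6), ('is', 13)]

# subn() returns the new string and the number of replacements.
new_string, count = re.subn(r"\s", "-", target_string)
print(new_string, count)  # Jessa-salary-is-8000$ 3
```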

The Match object methods

Whenever a match to the regex pattern is found, Python returns a Match object. We can then use the following methods of the re.Match object to extract the matched values and positions.

Method    Meaning
group()   Return the string matched by the regex pattern. See capturing groups.
groups()  Return a tuple containing the strings for all matched subgroups.
start()   Return the start position of the match.
end()     Return the end position of the match.
span()    Return a tuple containing the (start, end) positions of the match.
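As a quick illustration of these Match object methods, the sketch below searches with two capturing groups. The pattern itself is just an example made up for this string.

```python
import re

target_string = "Jessa salary is 8000$"

# Two capturing groups: a word, then the digits later in the string.
res = re.search(r"(\w+) salary is (\d+)", target_string)

print(res.group())   # Jessa salary is 8000
print(res.group(1))  # Jessa
print(res.groups())  # ('Jessa', '8000')
print(res.start())   # 0
print(res.end())     # 20
print(res.span(2))   # (16, 20)
```

Passing a group number to group() or span() returns the value or position of that capturing group instead of the whole match.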

Regex metacharacters

We can use both special and ordinary characters inside a regular expression. Most ordinary characters, like 'A' or 'p', are the simplest regular expressions; they match themselves. You can concatenate ordinary characters, so the PYnative pattern matches the string 'PYnative'.

Apart from these, we also have special characters. For example, characters like '|', '+', or '*' are special. Special metacharacters don't match themselves. Instead, they indicate that some rule applies. Special characters affect how the regular expressions around them are interpreted.

Read more: Guide on Regex Metacharacters

Metacharacter         Description
. (Dot)               Matches any character except a newline.
^ (Caret)             Matches pattern only at the start of the string.
$ (Dollar)            Matches pattern at the end of the string.
* (Asterisk)          Matches 0 or more repetitions of the regex.
+ (Plus)              Matches 1 or more repetitions of the regex.
? (Question mark)     Matches 0 or 1 repetition of the regex.
[] (Square brackets)  Used to indicate a set of characters. Matches any single character in brackets. For example, [abc] will match a, b, or c.
| (Pipe)              Used to specify multiple patterns. For example, P1|P2, where P1 and P2 are two different regexes.
\ (Backslash)         Used to escape special characters or to signal a special sequence. For example, if you are searching for one of the special characters, you can use a \ to escape it.
[^...]                Matches any single character not in brackets.
(...)                 Matches whatever regular expression is inside the parentheses. For example, (abc) will match the substring 'abc'.
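Let's try most of these metacharacters on a small sample. The text and patterns below are made up for this sketch. The repetition metacharacters *, + and ? are left out here.

```python
import re

text = "cat bat rat\ncar"

print(re.findall(r"c.t", text))      # ['cat']
print(re.findall(r"^c..", text))     # ['cat']
print(re.findall(r"c..$", text))     # ['car']
print(re.findall(r"[bc]at", text))   # ['cat', 'bat']
print(re.findall(r"[^bc]at", text))  # ['rat']
print(re.findall(r"cat|car", text))  # ['cat', 'car']
print(re.findall(r"(c)(a)", text))   # [('c', 'a'), ('c', 'a')]
print(re.findall(r"\$", "8000$"))    # ['$']
```

Note that when a pattern contains more than one group, re.findall() returns a list of tuples, one tuple per match.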

Regex special sequences (a.k.a. Character Classes)

The special sequences consist of '\' and a character from the list below. Each special sequence has a unique meaning.

The following special sequences have a pre-defined meaning and make specific common patterns more comfortable to use. For example, you can use \d as a simplified definition for [0-9] or \w as a simpler version of [a-zA-Z0-9_].

Read more: Guide on Regex special sequences

Special Sequence  Meaning
\A                Matches pattern only at the start of the string.
\Z                Matches pattern only at the end of the string.
\d                Matches any digit. Short for the character class [0-9].
\D                Matches any non-digit. Short for [^0-9].
\s                Matches any whitespace character. Short for the character class [ \t\n\x0b\r\f].
\S                Matches any non-whitespace character. Short for [^ \t\n\x0b\r\f].
\w                Matches any alphanumeric character or underscore. Short for the character class [a-zA-Z_0-9].
\W                Matches any non-alphanumeric character. Short for [^a-zA-Z_0-9].
\b                Matches the empty string, but only at the beginning or end of a word. Matches a word boundary where a word character is [a-zA-Z0-9_]. For example, r'\bJessa\b' matches 'Jessa', 'Jessa.', '(Jessa)', 'Jessa Emma Kelly' but not 'JessaKelly' or 'Jessa5'.
\B                Opposite of \b. Matches the empty string, but only when it is not at the beginning or end of a word.
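Here is a short sketch of a few of these special sequences in action. The sample sentence and the phone number are invented for the example.

```python
import re

text = "Jessa scored 95 marks; JessaKelly scored 88."

print(re.findall(r"\d+", text))        # ['95', '88']
print(re.findall(r"\bJessa\b", text))  # ['Jessa']
print(re.findall(r"\BKelly", text))    # ['Kelly']
print(re.findall(r"\AJessa", text))    # ['Jessa']
print(re.findall(r"\d+\W\Z", text))    # ['88.']

# \s+ splits on any run of whitespace: spaces, tabs, or newlines.
print(re.split(r"\s+", "a  b\tc\nd"))  # ['a', 'b', 'c', 'd']

# \D removes everything that is not a digit.
print(re.sub(r"\D", "", "+1 (555) 010-9999"))  # 15550109999
```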

Regex Quantifiers

We use quantifiers to define quantities. A quantifier is a metacharacter that determines how often the preceding regex can occur. You can use it to specify how many times a regex can repeat.

For example, we use the metacharacters *, +, ?, and {} to define quantifiers.

Let's see the list of quantifiers and their meaning.

Quantifier  Meaning
*           Matches 0 or more repetitions of the preceding regex. For example, a* matches zero or more consecutive 'a' characters, such as '', 'a', or 'aaa'.
+           Matches 1 or more repetitions of the preceding regex. For example, a+ matches at least one 'a', i.e., a, aa, aaa, or any number of a's.
?           Matches 0 or 1 repetition of the preceding regex. For example, a? matches zero or one occurrence of 'a'.
{3}         Matches exactly 3 copies of the preceding regex. For example, p{3} matches exactly three 'p' characters.
{2,4}       Matches 2 to 4 repetitions of the preceding regex. For example, a{2,4} matches 2 to 4 consecutive 'a' characters. Do not put a space after the comma.
{3,}        Matches a minimum of 3 copies of the preceding regex. It will try to match as many repetitions as possible. For example, p{3,} matches a minimum of three 'p' characters.
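To check these quantifiers quickly, the sketch below uses a small helper built on re.fullmatch(). The helper name matches is an assumption made for this example, not part of the re module.

```python
import re


def matches(pattern, text):
    """Return True if the pattern matches the whole text."""
    return re.fullmatch(pattern, text) is not None


print(matches(r"a*", ""))            # True: zero repetitions are allowed
print(matches(r"a+", ""))            # False: at least one 'a' is required
print(matches(r"colou?r", "color"))  # True: the 'u' is optional
print(matches(r"p{3}", "ppp"))       # True
print(matches(r"p{3}", "pppp"))      # False
print(matches(r"a{2,4}", "aaaaa"))   # False: five is more than four

# Quantifiers are greedy: they take as many repetitions as they can.
print(re.findall(r"p{3,}", "pp ppp ppppp"))  # ['ppp', 'ppppp']
print(re.findall(r"a{2,4}", "a aa aaaaa"))   # ['aa', 'aaaa']
```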

Regex flags

All the re module methods listed above accept an optional flags argument used to enable various unique features and syntax variations.

For example, say you want to search for a word inside a string using regex. You can enhance this regex's capability by passing the re.I flag as an argument to the search method to enable case-insensitive searching.

Read more: Guide on Python Regex Flags

Flag  Long syntax    Meaning
re.A  re.ASCII       Perform ASCII-only matching instead of full Unicode matching.
re.I  re.IGNORECASE  Perform case-insensitive matching.
re.M  re.MULTILINE   This flag is used with the metacharacters ^ (caret) and $ (dollar). When this flag is specified, ^ matches at the beginning of the string and at the beginning of each line (right after each \n), and $ matches at the end of the string and at the end of each line (right before each \n).
re.S  re.DOTALL      Make the DOT (.) special character match any character at all, including a newline. Without this flag, DOT (.) will match anything except a newline.
re.X  re.VERBOSE     Allow comments in the regex. This flag makes a regex more readable by allowing comments and extra whitespace inside the pattern.
re.L  re.LOCALE      Make \w, \W, \b, \B, and case-insensitive matching dependent on the current locale. Use only with bytes patterns.

To specify more than one flag, use the | operator to connect them.

For example:

re.findall(pattern, string, flags=re.I|re.M|re.X)
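Here is a small sketch showing what some of these flags change. The sample text is invented, and each pattern is run with and without its flag so you can compare the results.

```python
import re

text = "Jessa\njessa and EMMA"

# re.I: ignore case
print(re.findall(r"jessa", text))              # ['jessa']
print(re.findall(r"jessa", text, flags=re.I))  # ['Jessa', 'jessa']

# re.M: ^ also matches at the start of every line
print(re.findall(r"^\w+", text))               # ['Jessa']
print(re.findall(r"^\w+", text, flags=re.M))   # ['Jessa', 'jessa']

# re.S: the dot also matches a newline
print(re.findall(r"Jessa.jessa", text))              # []
print(re.findall(r"Jessa.jessa", text, flags=re.S))  # ['Jessa\njessa']

# re.X lets the pattern span lines and carry comments.
pattern = re.compile(r"""
    ^[a-z]+   # a word at the start of a line
    """, flags=re.I | re.M | re.X)
print(pattern.findall(text))                   # ['Jessa', 'jessa']
```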

Article information

Author: Gregorio Kreiger

Last Updated:

Views: 6026

Rating: 4.7 / 5 (57 voted)

Reviews: 88% of readers found this page helpful

Author information

Name: Gregorio Kreiger

Birthday: 1994-12-18

Address: 89212 Tracey Ramp, Sunside, MT 08453-0951

Phone: +9014805370218

Job: Customer Designer

Hobby: Mountain biking, Orienteering, Hiking, Sewing, Backpacking, Mushroom hunting, Backpacking

Introduction: My name is Gregorio Kreiger, I am a tender, brainy, enthusiastic, combative, agreeable, gentle, gentle person who loves writing and wants to share my knowledge and understanding with you.