Regex(regular expression) in python - CodeSpeedy (2024)

A regular expression (regex) is a special sequence of characters that helps us match or find a string or a set of strings, using a specialized syntax held in a pattern. Python's built-in re module lets us search and manipulate strings with such patterns. It is commonly used for tasks such as web scraping. Import the module in your program:

import re

Quick guide to regex: Python

  • + = 1 or more of the preceding expression {For ex – “[0-9]+” will match 2, 14, 543 etc.}.
  • * = 0 or more of the preceding expression {the match succeeds even if the preceding expression is absent}.
  • . = matches any single character except the newline {For ex – “.en” will match hen, ten, men}.
  • ^ = matches the expression if at the start of the string {For ex – “^.en” would match hen, ten if located at the start of the string }.
  • [] = matches the single character within the bracket {For ex – “[th]en” will match ten, hen}.
  • [^] = matches a single character NOT contained within the bracket {For ex – “[^w]hen” will match then, but not when }.
  • \w = matches a word character: [A-Za-z0-9_] (for str patterns Python also counts Unicode letters and digits unless re.ASCII is set).
  • \W = matches a non-word character: [^A-Za-z0-9_].
  • \d = matches a digit: [0-9].
  • \D = matches a non-digit: [^0-9].
  • \s = matches a whitespace character: [ \t\n\r\f\v].
  • \S = matches a non-whitespace character: [^ \t\n\r\f\v].
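  • \A = matches the beginning of the string.
  • \z = matches the end of the string in many regex engines. Python's re accepts it only from version 3.14 (as a synonym for \Z); earlier versions raise an error, so prefer \Z.
  • re{n,} = n or more occurrences.
  • \Z = matches only at the very end of the string. Unlike in Perl-style engines, Python's \Z does not match before a trailing newline.
  • \G = matches the point where the last match finished in some other regex engines; Python's re module does not support it.
  • $ = matches at the end of the string, or just before a trailing newline {For ex – “.at$” would match cat, hat, sat if located at the end of the string}.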
  • a|b = matches either a or b.
  • re? = matches 0 or 1 occurrence of the preceding expression {For ex – “sleepy?” would match sleep or sleepy. Here y is optional}.
  • re{n} = matches exactly n occurrences.
  • re{m,n} = at least m and at most n occurrences (no space after the comma; Python treats “{m, n}” as literal text).
  • ( = marks where a capturing group starts, i.e. where the text to extract begins.
  • ) = marks where the capturing group ends.
  • re.I = perform case insensitive matching.
  • re.M = makes $ match the end of a line and makes ^ match the start of any line.
  • re.S = makes a .(dot) match any character, including a newline.
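
A few of the entries above can be checked directly in the interpreter. The following is a minimal sketch; the sample strings and variable names are made up for illustration and are not part of the original guide.

```python
import re

# + requires at least one digit per match.
numbers = re.findall(r"[0-9]+", "2, 14 and 543")

# [th]en accepts "ten" or "hen"; [^w]hen accepts "then" but rejects "when".
bracket = re.findall(r"[th]en", "ten hen men")
negated = re.findall(r"[^w]hen", "then when")

# ? makes the preceding character optional.
optional = re.findall(r"sleepy?", "sleep sleepy")

# {m,n} bounds the number of repetitions; \b keeps longer numbers out.
bounded = re.findall(r"\b\d{2,3}\b", "7 42 123 4567")

# re.I ignores case; re.M lets ^ match at the start of every line.
ignore_case = re.findall(r"python", "Python PYTHON python", re.I)
line_starts = re.findall(r"^\w+", "first line\nsecond line", re.M)

# Without re.S the dot stops at a newline; with re.S it crosses it.
dot_default = re.search(r"a.b", "a\nb")
dot_all = re.search(r"a.b", "a\nb", re.S)

print(numbers, bracket, negated, optional, bounded)
print(ignore_case, line_starts, dot_default is None, dot_all is not None)
```

Raw strings (r"...") are used throughout so that backslashes reach the regex engine unchanged.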

Basic methods that are used in regex : Python

match() method:

The match() method looks for the pattern only at the start of the string.
In mat1, the pattern (‘i understand’) is at the start of the string (text), so the output is match found.
In mat2, the pattern (‘understand’) is not at the start of the string (text), so the output is no match found.

Note: The group() method returns the matched text (the whole match when called without arguments). The groups() method returns a tuple of all captured subgroups (empty if the pattern has none).

import re

text = "i understand the concept of regular expression"

# match() succeeds only if the pattern matches at position 0.
mat1 = re.match(r'i understand', text)
mat2 = re.match(r'understand', text)

if mat1:
    print("match found: " + mat1.group())
else:
    print("No match found for mat1")

if mat2:
    print("match found: " + mat2.group())
else:
    print("No match found for mat2")

Output:

match found: i understand
No match found for mat2
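
To see what a match object holds once capturing groups are involved, here is a small sketch. The two-group pattern and the variable names are assumptions added for illustration; only the sample sentence comes from the example above.

```python
import re

text = "i understand the concept of regular expression"

# Two capturing groups: the first word and the second word.
mat = re.match(r"(\w+) (\w+)", text)

whole = mat.group()    # same as group(0): the entire matched text
first = mat.group(1)   # text captured by the first pair of parentheses
both = mat.groups()    # tuple with every captured group

# match() only tries position 0, so a word found later in the string fails.
later = re.match(r"understand", text)

print(whole, first, both, later)
```

Because match() returns None on failure, calling .group() without first checking the result raises an AttributeError.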

search() method:

The search() method scans the whole string (text) for the pattern (‘Apoorva’). It returns a match object for the first place where the pattern is found, or None if the pattern is not found anywhere, so the program prints either search found or search not found.

import re

text = " My name is Apoorva Gupta "

# search() scans the whole string and stops at the first match.
searchresult = re.search(r'Apoorva', text)

if searchresult:
    print(f"search found: {searchresult.group()}")
else:
    print("search not found")

Output:

search found: Apoorva
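
The contrast between search() and match() on the same string can be sketched as follows. The variable names and the extra patterns (‘think’, the lowercase ‘apoorva’) are illustrative assumptions.

```python
import re

text = " My name is Apoorva Gupta "

# search() finds the name wherever it sits in the string.
found = re.search(r"Apoorva", text)
span = found.span()          # (start, end) indexes of the match

# match() fails here because the string does not start with the name.
at_start = re.match(r"Apoorva", text)

# A pattern that never occurs yields None rather than an error.
missing = re.search(r"think", text)

# re.I makes the search case insensitive.
loose = re.search(r"apoorva", text, re.I)

print(span, at_start, missing, loose.group())
```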

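sub() method:

The sub() method replaces every non-overlapping occurrence of the pattern in the string with the repl string and returns the result.
Syntax: re.sub(pattern, repl, string).

import re

house_addr = "48A- Aptitude Apartment,Civil Lines, Delhi"

# Replace everything from the hyphen onward with a description.
pat1 = re.sub(r'-.\D+', " # 48 number house in block A", house_addr)
print(f"Apartment number : {pat1}")

# Drop everything after the first comma, then strip the "48A- " prefix.
pat2 = re.sub(r',.+', "", house_addr)
pat = re.sub(r'\d+\D-\s*', "", pat2)
print(f"Apartment name : {pat}")

# Remove the house number and apartment name, leaving the locality.
pat = re.sub(r'\d+\D- [A-Z][a-z]+ [A-Z][a-z]+,', "", house_addr)
print(f"New house_addr : {pat}")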

Output:

Apartment number : 48A # 48 number house in block A
Apartment name : Aptitude Apartment
New house_addr : Civil Lines, Delhi
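
re.sub() also accepts an optional count argument and lets the repl string refer back to captured groups. The sketch below reuses the address from the example; the replacement texts and variable names are made up for illustration.

```python
import re

house_addr = "48A- Aptitude Apartment,Civil Lines, Delhi"

# count=1 limits the replacement to the first comma only.
first_only = re.sub(r",\s*", " | ", house_addr, count=1)

# \1 and \2 in repl insert the text captured by the first and second group.
relabeled = re.sub(r"(\d+)([A-Z])-", r"Flat \1, Block \2:", house_addr)

print(first_only)
print(relabeled)
```

The original string is never modified; sub() always returns a new string.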

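findall() method:

The findall() method returns a list of every non-overlapping match of the pattern in the string, so the length of the list tells how many times the pattern occurred.

import re

string = "Each one is different one."

# findall() returns a list with one entry per match.
word = re.findall('one', string)
print(word[0])
print(word[1])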

Output:

one
one
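
Counting the matches, and finding where each one sits, can be sketched like this. The use of re.finditer() and the variable names are additions for illustration, not part of the original example.

```python
import re

string = "Each one is different one."

# The list length is the number of occurrences.
matches = re.findall(r"one", string)
count = len(matches)

# finditer() yields match objects, which also carry the position of each hit.
positions = [m.start() for m in re.finditer(r"one", string)]

print(count, positions)
```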

A program to understand regular expression: Python

  1. Import the re module and define the string in which you want to find the patterns.
  2. [0-9]+ finds every run of one or more digits in the string.
  3. \S+@\S+ matches an @ with one or more non-whitespace characters on both sides.
  4. [A-Z][a-z]+[^0-9] matches a capital letter, one or more lowercase letters, and then one character that is not a digit. In the sample string each month is followed directly by a digit, so the engine backtracks and the final lowercase letter fills the [^0-9] slot, leaving just the month name.
  5. [A-Z][a-z]+\d+ matches a capital letter, one or more lowercase letters, and then one or more digits.

import re

string = 'my favourite 3 numbers are 7 , 8 and 6; my email id is guptaapoorva02@gmail.com and apoorva0698@gmail.com .; today is June14, June04 nice day, Dec12.'

# Extract the numbers from the string.
li = re.findall(r'[0-9]+', string)
print(li)

# Extract the email addresses from the string.
lst = re.findall(r'\S+@\S+', string)
print(lst)

# To get the month of each date we can use the following pattern.
regex = r"[A-Z][a-z]+[^0-9]"
matches = re.findall(regex, string)
for match in matches:
    print("Match month:", match)

# To get the months with the dates.
regex = r"[A-Z][a-z]+\d+"
matches = re.findall(regex, string)
for match in matches:
    print("Full match:", match)

Output:

['3', '7', '8', '6', '02', '0698', '14', '04', '12']
['guptaapoorva02@gmail.com', 'apoorva0698@gmail.com']
Match month: June
Match month: June
Match month: Dec
Full match: June14
Full match: June04
Full match: Dec12
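
As an alternative to running two separate patterns, the month and the day can be captured in one pass with groups. This is a sketch under assumptions: the shortened sample string, the compiled pattern and the names date_pattern, pairs and months are illustrative and do not appear in the program above.

```python
import re

string = "today is June14, June04 nice day, Dec12."

# Compile once and reuse; group 1 is the month name, group 2 the day digits.
date_pattern = re.compile(r"([A-Z][a-z]+)(\d+)")

# With two groups in the pattern, findall() returns a list of tuples.
pairs = date_pattern.findall(string)
months = [month for month, day in pairs]

print(pairs)
print(months)
```

Capturing the month explicitly avoids relying on backtracking to drop the last character.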

Go and check other tutorials on python:

Function argument in Python

Python File Handling


Author: Geoffrey Lueilwitz
