Python Regular Expressions (RegEx)

Regular expressions (RegEx) are a powerful tool for matching patterns in text. Python's built-in re module provides functions for searching, splitting, and replacing text based on those patterns.

Importing the re Module

To work with regular expressions in Python, you need to import the re module.

import re

Basic RegEx Functions

  • re.search() - Scans the whole string and returns a match object for the first match, or None if there is none.
  • re.match() - Checks for a match only at the beginning of the string.
  • re.findall() - Returns a list of all non-overlapping matches.
  • re.sub() - Replaces every match (or the first few, if a count is given) with a replacement string.
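
The difference between these four functions is easiest to see side by side. The following is a minimal sketch that runs each of them on one sample string; the string and the variable names are illustrative assumptions, not part of the examples below.

```python
import re

sample = "cats and dogs and cats"  # illustrative text

# re.search() scans the whole string and stops at the first match.
first = re.search("dogs", sample)
# re.match() only tries at position 0, so "dogs" is not found here.
at_start = re.match("dogs", sample)
# re.findall() collects every non-overlapping match as a list of strings.
all_cats = re.findall("cats", sample)
# re.sub() returns a new string; the original is left unchanged.
replaced = re.sub("cats", "birds", sample)

print(first.span())   # (9, 13)
print(at_start)       # None
print(all_cats)       # ['cats', 'cats']
print(replaced)       # birds and dogs and birds
```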

Example 1: Using re.search()

Search for the word "Tamil" in a string.

# Using re.search()
import re
text = "Karthick AG is learning Python in Tamil."
match = re.search("Tamil", text)
if match:
    print("Match found:", match.group())
else:
    print("No match found.")

Match found: Tamil

Example 2: Using re.match()

Check if a string starts with "Karthick".

# Using re.match()
import re
text = "Karthick AG is from Trichy."
match = re.match("Karthick", text)
if match:
    print("String starts with 'Karthick'.")
else:
    print("String does not start with 'Karthick'.")

String starts with 'Karthick'.

Example 3: Using re.findall()

Count all occurrences of "is" in a string. The pattern matches the substring wherever it appears, including inside longer words.

# Using re.findall()
import re
text = "Rohini is a teacher. She is from Chennai."
matches = re.findall("is", text)
print("Occurrences of 'is':", len(matches))

Occurrences of 'is': 2
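
A plain pattern such as "is" also matches inside longer words. The sketch below, using a made-up sentence, shows one way to restrict the match to the standalone word with the \b word-boundary anchor.

```python
import re

text = "This island is his."  # illustrative sentence

# A bare "is" also matches inside "This", "island" and "his".
substrings = re.findall("is", text)
# \b anchors the pattern at word boundaries, so only the standalone word matches.
whole_words = re.findall(r"\bis\b", text)

print(len(substrings))  # 4
print(whole_words)      # ['is']
```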

Example 4: Using re.sub()

Replace all digits in a string with "#".

# Using re.sub()
import re
text = "Vijay's contact number is 1234567890."
result = re.sub(r"\d", "#", text)
print("Modified Text:", result)

Modified Text: Vijay's contact number is ##########.
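
re.sub() can do more than replace every match with a fixed string. As a small sketch on the same text, the count argument limits the number of replacements, and backreferences such as \1 reuse captured groups in the replacement. The hyphenated grouping is only an illustrative choice.

```python
import re

text = "Vijay's contact number is 1234567890."

# count=3 replaces only the first three matches.
first_three = re.sub(r"\d", "#", text, count=3)
# \1, \2 and \3 refer to the three captured digit groups.
hyphenated = re.sub(r"(\d{3})(\d{3})(\d{4})", r"\1-\2-\3", text)

print(first_three)  # Vijay's contact number is ###4567890.
print(hyphenated)   # Vijay's contact number is 123-456-7890.
```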

Metacharacters and Special Sequences

Regular expressions use metacharacters and special sequences to define patterns.

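  • . - Matches any character except a newline.
  • ^ - Matches the beginning of the string (or of each line with re.MULTILINE).
  • $ - Matches the end of the string, or just before a trailing newline.
  • * - Matches 0 or more repetitions of the preceding element.
  • + - Matches 1 or more repetitions of the preceding element.
  • ? - Matches 0 or 1 repetition of the preceding element.
  • \d - Matches any digit character.
  • \w - Matches any word character: a letter, a digit, or an underscore.
  • \s - Matches any whitespace character (space, tab, newline, and so on).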
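
To make the list above concrete, here is a short sketch that applies several of these symbols to a sample string. The sample text is an assumption chosen so that each pattern has something to match.

```python
import re

sample = "user_1 paid 250 rupees\n"  # illustrative text

words = re.findall(r"\w+", sample)    # \w includes the underscore
digits = re.findall(r"\d+", sample)   # runs of one or more digits
spaces = re.findall(r"\s", sample)    # spaces and the trailing newline
optional_u = re.findall(r"colou?r", "color colour")  # ? makes "u" optional
first_char = re.findall(r"^.", sample)  # ^ anchors at the start, . is any character

print(words)       # ['user_1', 'paid', '250', 'rupees']
print(digits)      # ['1', '250']
print(spaces)      # [' ', ' ', ' ', '\n']
print(optional_u)  # ['color', 'colour']
print(first_char)  # ['u']
```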

Example 5: Using Metacharacters

Check if a string ends with a period.

# Using metacharacters
import re
text = "John lives in Karur."
if re.search(r"\.$", text):
    print("String ends with a period.")
else:
    print("String does not end with a period.")

String ends with a period.

Example 6: Matching Digits and Words

Find all digits and words in a string.

# Matching digits and words
import re
text = "Durai is 28 years old and lives in Madurai."
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
print("Digits:", digits)
print("Words:", words)

Digits: ['28']
Words: ['Durai', 'is', '28', 'years', 'old', 'and', 'lives', 'in', 'Madurai']

Example 7: Using Character Sets

Find all vowels in a string.

# Using character sets
import re
text = "Akila loves programming."
vowels = re.findall(r"[aeiouAEIOU]", text)
print("Vowels:", vowels)

Vowels: ['A', 'i', 'a', 'o', 'e', 'o', 'a', 'i']
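
Character sets also accept ranges and negation. The sketch below uses a made-up string to show both: [0-9] is a range, and a leading ^ inside the brackets inverts the set.

```python
import re

text = "Room 42B, Floor 7"  # illustrative text

# A range inside a set: any run of characters from 0 to 9.
numbers = re.findall(r"[0-9]+", text)
# A leading ^ negates the set: runs with no digits, whitespace, or commas.
non_digits = re.findall(r"[^0-9\s,]+", text)

print(numbers)     # ['42', '7']
print(non_digits)  # ['Room', 'B', 'Floor']
```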

Example 8: Using Quantifiers

Match all words that start with "T" followed by 2 to 5 more word characters.

# Using quantifiers
import re
text = "Tamil is a language spoken in Trichy and Tamilnadu."
matches = re.findall(r"\bT\w{2,5}\b", text)
print("Matched Words:", matches)

Matched Words: ['Tamil', 'Trichy']
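
The {m,n} form has two common variants: {m} for an exact count and {m,} for an open-ended minimum. A minimal sketch on an artificial string:

```python
import re

text = "a aa aaa aaaa"  # illustrative text

# {2} means exactly two repetitions.
exactly_two = re.findall(r"\ba{2}\b", text)
# {3,} means three or more repetitions.
at_least_three = re.findall(r"\ba{3,}\b", text)

print(exactly_two)     # ['aa']
print(at_least_three)  # ['aaa', 'aaaa']
```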

Example 9: Using Groups

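Extract the domain and the first part of the extension from email addresses. Because \w does not match a period, a multi-part extension such as "co.in" is captured only as "co".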

# Using groups
import re
text = "Contact us at info@techsolutions.com or support@techsolutions.co.in"
pattern = r"@(\w+)\.(\w+)"
matches = re.findall(pattern, text)
for domain, extension in matches:
    print(f"Domain: {domain}, Extension: {extension}")

Domain: techsolutions, Extension: com
Domain: techsolutions, Extension: co
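
The second result stops at "co" because the second group uses \w+, which cannot cross the period in "co.in". One possible adjustment, offered as a sketch rather than a complete email parser, is to let the extension group repeat dot-separated parts:

```python
import re

text = "Contact us at info@techsolutions.com or support@techsolutions.co.in"

# (?:\.\w+)* is a non-capturing group that allows extra ".part" segments,
# so the second capturing group can hold "co.in" as a whole.
pattern = r"@([\w-]+)\.(\w+(?:\.\w+)*)"
matches = re.findall(pattern, text)

for domain, extension in matches:
    print(f"Domain: {domain}, Extension: {extension}")
# Domain: techsolutions, Extension: com
# Domain: techsolutions, Extension: co.in
```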

Example 10: Using re.compile()

Compile a regular expression pattern for reuse.

# Using re.compile()
import re
pattern = re.compile(r"\b[A-Za-z]+\b")
text = "Banumathi is learning Python."
words = pattern.findall(text)
print("Words:", words)

Words: ['Banumathi', 'is', 'learning', 'Python']

Example 11: Using Flags

Perform a case-insensitive search using re.IGNORECASE.

# Using flags
import re
text = "Riya lives in Chennai. She loves chennai's culture."
matches = re.findall("chennai", text, re.IGNORECASE)
print("Occurrences of 'chennai':", len(matches))

Occurrences of 'chennai': 2
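
Flags can be combined with the | operator, or written inside the pattern as an inline flag such as (?i). The two-line text below is an illustrative assumption.

```python
import re

text = "Riya lives in Chennai.\nchennai has a rich culture."

# Combine flags with |: ignore case and let ^ match at each line start.
line_starts = re.findall(r"^chennai", text, re.IGNORECASE | re.MULTILINE)
# (?i) at the start of the pattern is the inline form of re.IGNORECASE.
inline = re.findall(r"(?i)chennai", text)

print(line_starts)  # ['chennai']
print(inline)       # ['Chennai', 'chennai']
```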

Example 12: Positive Lookahead

Find words that are immediately followed by a comma. The lookahead (?=,) checks for the comma without including it in the match.

# Positive lookahead
import re
text = "Please bring apples, bananas, and cherries."
matches = re.findall(r"\b\w+(?=,)", text)
print("Words followed by a comma:", matches)

Words followed by a comma: ['apples', 'bananas']
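
The negative form, (?!...), inverts the check. As a sketch on the same sentence, the pattern below keeps only the words that are not followed by a comma.

```python
import re

text = "Please bring apples, bananas, and cherries."

# (?!,) succeeds only when the next character is not a comma.
# The trailing \b stops the engine from ending the match mid-word
# just to satisfy the lookahead.
not_before_comma = re.findall(r"\b\w+\b(?!,)", text)

print(not_before_comma)  # ['Please', 'bring', 'and', 'cherries']
```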

Example 13: Negative Lookbehind

Find each capital 'A' and the word characters after it, where the 'A' is not preceded by whitespace.

# Negative lookbehind: 'A' runs that are not preceded by whitespace
import re
text = "This is anApple, anotherApple, and Aunique example."
# No \b before A: "anApple" has no word boundary between "n" and "A",
# so \bA would never match there.
matches = re.findall(r"(?<!\s)A\w+", text)
print("'A' runs not preceded by whitespace:", matches)

'A' runs not preceded by whitespace: ['Apple', 'Apple']

Example 14: Matching Multiline Strings

Use re.MULTILINE so that ^ matches at the start of every line, not only at the start of the string.

# Matching multiline strings
import re
text = '''
First line
Second line
Third line
'''
matches = re.findall(r"^\w+", text, re.MULTILINE)
print("First word of each line:", matches)

First word of each line: ['First', 'Second', 'Third']
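
Running the same pattern with and without the flag shows what re.MULTILINE changes. This sketch uses a one-line string literal with \n separators in place of the triple-quoted text.

```python
import re

text = "First line\nSecond line\nThird line"

# Without the flag, ^ matches only at the very start of the string.
without_flag = re.findall(r"^\w+", text)
# With the flag, ^ also matches right after each newline.
with_flag = re.findall(r"^\w+", text, re.MULTILINE)
# The flag affects $ in the same way: it matches before each newline.
last_words = re.findall(r"\w+$", text, re.MULTILINE)

print(without_flag)  # ['First']
print(with_flag)     # ['First', 'Second', 'Third']
print(last_words)    # ['line', 'line', 'line']
```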

Example 15: Non-Greedy Matching

Match as little text as possible by appending ? to a quantifier (for example, .*? instead of .*).

# Non-greedy matching
import re
text = "<div>Content</div> and <div>More</div>"
match = re.search(r"<div>(.*?)</div>", text)
print("Matched Content:", match.group(1))

Matched Content: Content
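
For comparison, here is a brief sketch of what the greedy version of the same pattern captures, along with re.findall() collecting every non-greedy match.

```python
import re

text = "<div>Content</div> and <div>More</div>"

# Greedy .* runs to the last </div> in the string.
greedy = re.search(r"<div>(.*)</div>", text).group(1)
# Non-greedy .*? stops at the nearest </div>, so each element matches separately.
lazy_all = re.findall(r"<div>(.*?)</div>", text)

print(greedy)    # Content</div> and <div>More
print(lazy_all)  # ['Content', 'More']
```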

Example 16: Splitting Strings with re.split()

Split a string by multiple delimiters.

# Using re.split()
import re
text = "apple;banana,orange|grape"
fruits = re.split(r"[;,|]", text)
print("Fruits:", fruits)

Fruits: ['apple', 'banana', 'orange', 'grape']
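
Two variations are often useful in practice: absorbing whitespace around the delimiters, and limiting the number of splits with maxsplit. The spaced-out input string below is an illustrative assumption.

```python
import re

spaced = "apple ; banana,  orange|grape"  # illustrative input with stray spaces

# \s* on both sides of the delimiter removes the surrounding whitespace.
fruits = re.split(r"\s*[;,|]\s*", spaced)
# maxsplit=1 splits at the first delimiter only and keeps the rest intact.
first_and_rest = re.split(r"[;,|]", "apple;banana,orange|grape", maxsplit=1)

print(fruits)          # ['apple', 'banana', 'orange', 'grape']
print(first_and_rest)  # ['apple', 'banana,orange|grape']
```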

Example 17: Compiling Regular Expressions for Performance

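Compile a pattern once and reuse it inside a loop. The re module already caches recently used patterns, so the speed gain is usually modest; the main benefit is a reusable pattern object with its own methods.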

# Compiling for performance
import re
pattern = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
texts = [
    "SSN is 123-45-6789.",
    "Invalid SSN 12-345-6789.",
    "Another SSN 987-65-4321."
]
for text in texts:
    if pattern.search(text):
        print("Valid SSN found in:", text)

Valid SSN found in: SSN is 123-45-6789.
Valid SSN found in: Another SSN 987-65-4321.
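
A compiled pattern also offers finditer(), which yields match objects instead of plain strings. The sketch below uses it to report where each match starts; the combined log string is an illustrative assumption.

```python
import re

pattern = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
log = "SSN is 123-45-6789. Another SSN 987-65-4321."  # illustrative text

# Each match object carries the matched text and its position in the string.
found = [(m.group(), m.start()) for m in pattern.finditer(log)]

print(found)  # [('123-45-6789', 7), ('987-65-4321', 32)]
```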

Example 18: Escaping Special Characters

Use re.escape() to escape all special characters in a string.

# Escaping special characters
import re
special_string = "[This] (is) {a} *special* string?"
escaped_string = re.escape(special_string)
print("Escaped String:", escaped_string)

Escaped String: \[This\]\ \(is\)\ \{a\}\ \*special\*\ string\?
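
The typical use of re.escape() is searching for text that comes from outside the program and has to be treated literally. In this sketch the search term and the surrounding text are made up for illustration.

```python
import re

needle = "1+1=2?"                   # illustrative literal text to look for
haystack = "Is 1+1=2? Yes, 1+1=2."  # illustrative text to search

# Unescaped, + and ? act as quantifiers, so the literal text is not found.
unescaped = re.findall(needle, haystack)
# Escaped, every character is matched literally.
literal = re.findall(re.escape(needle), haystack)

print(unescaped)  # []
print(literal)    # ['1+1=2?']
```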

Example 19: Using Named Groups

Assign names to groups for easier access.

# Using named groups
import re
text = "Order number: 12345, Date: 2023-10-05"
pattern = r"Order number: (?P<order_number>\d+), Date: (?P<date>\d{4}-\d{2}-\d{2})"
match = re.search(pattern, text)
if match:
    print("Order Number:", match.group('order_number'))
    print("Date:", match.group('date'))

Order Number: 12345
Date: 2023-10-05
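
Named groups can also be read all at once. As a short sketch on the same pattern, groupdict() returns every named group as a dictionary, and a match object can be indexed by group name.

```python
import re

text = "Order number: 12345, Date: 2023-10-05"
pattern = re.compile(
    r"Order number: (?P<order_number>\d+), Date: (?P<date>\d{4}-\d{2}-\d{2})"
)
match = pattern.search(text)

# groupdict() maps each group name to the text it captured.
fields = match.groupdict()
# Subscript access is equivalent to match.group('date').
date = match["date"]

print(fields)  # {'order_number': '12345', 'date': '2023-10-05'}
print(date)    # 2023-10-05
```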

Example 20: Verifying Phone Numbers

Validate 10-digit Indian mobile numbers that start with 6, 7, 8, or 9, with an optional +91 prefix.

# Validating phone numbers
import re
phone_numbers = [
    "9876543210",
    "09876543210",
    "+919876543210",
    "98765-43210"
]
pattern = re.compile(r"^(\+91)?[6-9]\d{9}$")
for number in phone_numbers:
    if pattern.match(number):
        print("Valid phone number:", number)
    else:
        print("Invalid phone number:", number)

Valid phone number: 9876543210
Invalid phone number: 09876543210
Valid phone number: +919876543210
Invalid phone number: 98765-43210
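
One detail to watch when validating input: $ also matches just before a trailing newline, so the pattern above accepts "9876543210\n". A possible alternative, sketched here with a hypothetical helper named is_valid_mobile, is to drop the anchors and use fullmatch(), which requires the entire string to match.

```python
import re

anchored = re.compile(r"^(\+91)?[6-9]\d{9}$")
unanchored = re.compile(r"(\+91)?[6-9]\d{9}")

def is_valid_mobile(number):
    """Hypothetical helper: True only if the whole string is a valid number."""
    return unanchored.fullmatch(number) is not None

with_newline = "9876543210\n"
print(bool(anchored.match(with_newline)))  # True: $ matches before the final newline
print(is_valid_mobile(with_newline))       # False
print(is_valid_mobile("+919876543210"))    # True
print(is_valid_mobile("98765-43210"))      # False
```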

Summary: Regular expressions are a versatile tool for text processing in Python. With the re module and a working knowledge of pattern syntax, you can validate input, extract data, and transform strings in a few lines of code.