Python Regular Expressions (RegEx)
Regular expressions (RegEx) are a powerful tool for matching patterns in text. Python provides the re module for working with regular expressions. It offers a set of functions for searching, splitting, and replacing text based on a pattern.
Importing the re Module
To work with regular expressions in Python, you need to import the re module.
import re
Basic RegEx Functions
re.search() - Searches a string for a match and returns a match object if found.
re.match() - Checks for a match only at the beginning of the string.
re.findall() - Returns a list containing all matches.
re.sub() - Replaces one or many matches with a string.
Example 1: Using re.search()
Search for the word "Tamil" in a string.
# Using re.search()
import re
text = "Karthick AG is learning Python in Tamil."
match = re.search("Tamil", text)
if match:
    print("Match found:", match.group())
else:
    print("No match found.")
Match found: Tamil
Example 2: Using re.match()
Check if a string starts with "Karthick".
# Using re.match()
import re
text = "Karthick AG is from Trichy."
match = re.match("Karthick", text)
if match:
    print("String starts with 'Karthick'.")
else:
    print("String does not start with 'Karthick'.")
String starts with 'Karthick'.
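The difference between re.match() and re.search() is easiest to see when the word is not at the start of the string. The snippet below is a minimal sketch; the sample sentence is made up for illustration.

```python
import re

text = "My name is Karthick AG."

# re.match() only looks at the start of the string, so this is None.
at_start = re.match("Karthick", text)

# re.search() scans the whole string and finds the word further in.
anywhere = re.search("Karthick", text)

print("re.match result:", at_start)
print("re.search span:", anywhere.span())
```

If a pattern must match at the start, re.match() is the natural choice; otherwise re.search() is usually what you want.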
Example 3: Using re.findall()
Find all occurrences of the word "is" in a string.
# Using re.findall()
import re
text = "Rohini is a teacher. She is from Chennai."
matches = re.findall("is", text)
print("Occurrences of 'is':", len(matches))
Occurrences of 'is': 2
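Note that a plain pattern such as "is" matches anywhere, including inside longer words. The sketch below, using a made-up sentence, shows how the \b word-boundary sequence can restrict the match to the standalone word.

```python
import re

text = "This is his list."

# A bare "is" also matches inside "This", "his", and "list".
substring_hits = re.findall("is", text)

# \b marks a word boundary, so only the standalone word matches.
whole_word_hits = re.findall(r"\bis\b", text)

print("Substring matches:", len(substring_hits))
print("Whole-word matches:", len(whole_word_hits))
```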
Example 4: Using re.sub()
Replace all digits in a string with "#".
# Using re.sub()
import re
text = "Vijay's contact number is 1234567890."
result = re.sub(r"\d", "#", text)
print("Modified Text:", result)
Modified Text: Vijay's contact number is ##########.
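re.sub() also accepts a count argument and supports backreferences in the replacement string. The following is a small sketch of both; the "user@example" string is a hypothetical input chosen only to show the group swap.

```python
import re

text = "Vijay's contact number is 1234567890."

# count limits how many replacements are made (here, the first six digits).
partly_masked = re.sub(r"\d", "#", text, count=6)

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r"(\w+)@(\w+)", r"\2@\1", "user@example")

print("Partly masked:", partly_masked)
print("Swapped:", swapped)
```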
Metacharacters and Special Sequences
Regular expressions use metacharacters and special sequences to define patterns.
. - Matches any character except a newline.
^ - Matches the beginning of the string.
$ - Matches the end of the string.
* - Matches 0 or more repetitions of the preceding element.
+ - Matches 1 or more repetitions of the preceding element.
? - Matches 0 or 1 repetition of the preceding element.
\d - Matches any digit character.
\w - Matches any word character (letters, digits, and the underscore).
\s - Matches any whitespace character.
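To see how the metacharacters listed above behave side by side, here is a rough sketch that runs several patterns against one made-up sample string.

```python
import re

sample = "cat cot coot ct 42"

results = {
    "c.t": re.findall(r"c.t", sample),    # "." stands for exactly one character
    "co*t": re.findall(r"co*t", sample),  # "*" allows zero or more "o"
    "co+t": re.findall(r"co+t", sample),  # "+" requires at least one "o"
    "co?t": re.findall(r"co?t", sample),  # "?" allows zero or one "o"
    "^\\w+": re.findall(r"^\w+", sample), # "^" anchors to the start of the string
    "\\d+$": re.findall(r"\d+$", sample), # "$" anchors to the end of the string
}

for pattern, found in results.items():
    print(pattern, "->", found)
```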
Example 5: Using Metacharacters
Check if a string ends with a period.
# Using metacharacters
import re
text = "John lives in Karur."
if re.search(r"\.$", text):
    print("String ends with a period.")
else:
    print("String does not end with a period.")
String ends with a period.
Example 6: Matching Digits and Words
Find all digits and words in a string.
# Matching digits and words
import re
text = "Durai is 28 years old and lives in Madurai."
digits = re.findall(r"\d+", text)
words = re.findall(r"\w+", text)
print("Digits:", digits)
print("Words:", words)
Digits: ['28']
Words: ['Durai', 'is', '28', 'years', 'old', 'and', 'lives', 'in', 'Madurai']
Example 7: Using Character Sets
Find all vowels in a string.
# Using character sets
import re
text = "Akila loves programming."
vowels = re.findall(r"[aeiouAEIOU]", text)
print("Vowels:", vowels)
Vowels: ['A', 'i', 'a', 'o', 'e', 'o', 'a', 'i']
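Character sets can also be negated with ^ and can contain ranges such as a-z. As a short sketch on the same sentence:

```python
import re

text = "Akila loves programming."

# [^...] matches any character NOT in the set: here, not a vowel,
# not whitespace, and not a period, which leaves the consonants.
consonants = re.findall(r"[^aeiouAEIOU\s.]", text)

# A range such as a-z matches any lowercase ASCII letter.
lowercase_runs = re.findall(r"[a-z]+", text)

print("Consonants:", consonants)
print("Lowercase runs:", lowercase_runs)
```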
Example 8: Using Quantifiers
Match all words that start with "T" and are followed by 2 to 5 more word characters (3 to 6 characters in total).
# Using quantifiers
import re
text = "Tamil is a language spoken in Trichy and Tamilnadu."
matches = re.findall(r"\bT\w{2,5}\b", text)
print("Matched Words:", matches)
Matched Words: ['Tamil', 'Trichy']
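The other quantifier forms work the same way: {n} means exactly n repetitions and {n,} means n or more. The sketch below applies them to digits; the sentence is invented for the example.

```python
import re

text = "Founded in 1998, relaunched in 2015 with 12 staff and 150000 users."

years = re.findall(r"\b\d{4}\b", text)          # exactly four digits
large_numbers = re.findall(r"\b\d{5,}\b", text)  # five or more digits
small_numbers = re.findall(r"\b\d{1,2}\b", text) # one or two digits

print("Years:", years)
print("Large numbers:", large_numbers)
print("Small numbers:", small_numbers)
```

The \b anchors matter here: without them, \d{4} would also match the first four digits of 150000.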
Example 9: Using Groups
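Extract the domain and extension from email addresses. Note that this pattern captures only the first label after the dot, so for techsolutions.co.in the extension comes out as "co" rather than "co.in".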
# Using groups
import re
text = "Contact us at info@techsolutions.com or support@techsolutions.co.in"
pattern = r"@(\w+)\.(\w+)"
matches = re.findall(pattern, text)
for domain, extension in matches:
    print(f"Domain: {domain}, Extension: {extension}")
Domain: techsolutions, Extension: com
Domain: techsolutions, Extension: co
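If multi-part extensions such as "co.in" should be captured whole, one possible adjustment is to let the second group absorb further dot-separated labels through a non-capturing group. This is a sketch of one approach, not a complete email parser.

```python
import re

text = "Contact us at info@techsolutions.com or support@techsolutions.co.in"

# (?:\.\w+)* is a non-capturing group that repeats ".label" zero or more
# times, so it extends group 2 without creating a third group.
pattern = r"@(\w+)\.(\w+(?:\.\w+)*)"
matches = re.findall(pattern, text)

for domain, extension in matches:
    print(f"Domain: {domain}, Extension: {extension}")
```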
Example 10: Using re.compile()
Compile a regular expression pattern for reuse.
# Using re.compile()
import re
pattern = re.compile(r"\b[A-Za-z]+\b")
text = "Banumathi is learning Python."
words = pattern.findall(text)
print("Words:", words)
Words: ['Banumathi', 'is', 'learning', 'Python']
Example 11: Using Flags
Perform a case-insensitive search using re.IGNORECASE.
# Using flags
import re
text = "Riya lives in Chennai. She loves chennai's culture."
matches = re.findall("chennai", text, re.IGNORECASE)
print("Occurrences of 'chennai':", len(matches))
Occurrences of 'chennai': 2
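Flags can also be written inline at the start of the pattern, and several flags can be combined with the | operator. A brief sketch, using a two-line sample string invented for the example:

```python
import re

text = "Riya lives in Chennai.\nchennai has a rich culture."

# (?i) at the start of the pattern is the inline form of re.IGNORECASE.
inline = re.findall(r"(?i)chennai", text)

# Combine flags with |: case-insensitive, and ^ matches at each line start.
line_starts = re.findall(r"^chennai", text, re.IGNORECASE | re.MULTILINE)

print("Inline flag:", inline)
print("Combined flags:", line_starts)
```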
Example 12: Positive Lookahead
Find words that are immediately followed by a comma, without including the comma in the match.
# Positive lookahead
import re
text = "Please bring apples, bananas, and cherries."
matches = re.findall(r"\b\w+(?=,)", text)
print("Words followed by a comma:", matches)
Words followed by a comma: ['apples', 'bananas']
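The negative counterpart, (?!...), succeeds only when the following text does not match. As a sketch on the same sentence, this finds the words that are not followed by a comma:

```python
import re

text = "Please bring apples, bananas, and cherries."

# \b\w+\b matches a whole word; (?!,) rejects it if a comma comes next.
words_without_comma = re.findall(r"\b\w+\b(?!,)", text)

print("Words not followed by a comma:", words_without_comma)
```

The second \b is what stops the engine from backtracking to a shorter match such as "apple".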
Example 13: Negative Lookbehind
Find runs of word characters that start with 'A' and are not preceded by whitespace. 'Aunique' is skipped because a space comes right before it.
# Negative lookbehind: match 'A...' only when the previous character is not whitespace
import re
text = "This is anApple, anotherApple, and Aunique example."
matches = re.findall(r"(?<!\s)A\w+", text)
print("Matches starting with 'A' not preceded by whitespace:", matches)
Matches starting with 'A' not preceded by whitespace: ['Apple', 'Apple']
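Lookbehind also has a positive form, (?<=...), which requires the preceding text to match. The following sketch uses a made-up price string to contrast the two forms.

```python
import re

text = "Price: $45, discount: $5, code 99"

# Positive lookbehind: digits that come right after a dollar sign.
amounts = re.findall(r"(?<=\$)\d+", text)

# Negative lookbehind: whole numbers that do NOT come right after a dollar sign.
other_numbers = re.findall(r"(?<!\$)\b\d+", text)

print("Amounts:", amounts)
print("Other numbers:", other_numbers)
```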
Example 14: Matching Multiline Strings
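Match at the start of every line using re.MULTILINE, which makes ^ and $ match at the beginning and end of each line instead of only at the beginning and end of the whole string.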
# Matching multiline strings
import re
text = '''
First line
Second line
Third line
'''
matches = re.findall(r"^\w+", text, re.MULTILINE)
print("First word of each line:", matches)
First word of each line: ['First', 'Second', 'Third']
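To make the effect of the flag concrete, the sketch below runs the same pattern with and without re.MULTILINE on a three-line string.

```python
import re

text = "First line\nSecond line\nThird line"

# Without the flag, ^ matches only at the very start of the string.
without_flag = re.findall(r"^\w+", text)

# With re.MULTILINE, ^ also matches right after each newline.
with_flag = re.findall(r"^\w+", text, re.MULTILINE)

# The flag affects $ in the same way: it matches before each newline.
line_endings = re.findall(r"\w+$", text, re.MULTILINE)

print("Without flag:", without_flag)
print("With flag:", with_flag)
print("Last word of each line:", line_endings)
```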
Example 15: Non-Greedy Matching
Match the smallest possible string by adding ? after a quantifier (here .*? instead of .*).
# Non-greedy matching
import re
text = "<div>Content</div> and <div>More</div>"
match = re.search(r"<div>(.*?)</div>", text)
print("Matched Content:", match.group(1))
Matched Content: Content
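For comparison, here is a sketch of what the greedy version of the same pattern captures, along with re.findall() collecting every non-greedy match.

```python
import re

text = "<div>Content</div> and <div>More</div>"

# Greedy .* runs to the LAST </div>, swallowing everything in between.
greedy = re.search(r"<div>(.*)</div>", text).group(1)

# Non-greedy .*? stops at the first </div> each time.
all_contents = re.findall(r"<div>(.*?)</div>", text)

print("Greedy:", greedy)
print("Non-greedy, all matches:", all_contents)
```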
Example 16: Splitting Strings with re.split()
Split a string by multiple delimiters.
# Using re.split()
import re
text = "apple;banana,orange|grape"
fruits = re.split(r"[;,|]", text)
print("Fruits:", fruits)
Fruits: ['apple', 'banana', 'orange', 'grape']
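re.split() has a few useful variations: the pattern can absorb surrounding whitespace, maxsplit limits the number of splits, and a capturing group keeps the delimiters in the result. A minimal sketch, with input strings made up for the example:

```python
import re

# \s* on both sides removes whitespace around each delimiter.
trimmed = re.split(r"\s*[;,|]\s*", "apple ; banana, orange |grape")

# maxsplit=1 splits only at the first delimiter.
first_only = re.split(r"[;,|]", "apple;banana,orange|grape", maxsplit=1)

# A capturing group makes re.split() return the delimiters as well.
with_delimiters = re.split(r"([;,|])", "a;b,c")

print(trimmed)
print(first_only)
print(with_delimiters)
```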
Example 17: Compiling Regular Expressions for Performance
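Compile a regular expression pattern once and reuse it inside a loop. The re module also caches recently used patterns internally, so the speedup is often modest, but a compiled pattern keeps the loop tidy. The pattern below checks only the NNN-NN-NNNN format, not whether the number is a real SSN.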
# Compiling for performance
import re
pattern = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
texts = [
    "SSN is 123-45-6789.",
    "Invalid SSN 12-345-6789.",
    "Another SSN 987-65-4321."
]
for text in texts:
    if pattern.search(text):
        print("Valid SSN found in:", text)
Valid SSN found in: SSN is 123-45-6789.
Valid SSN found in: Another SSN 987-65-4321.
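A compiled pattern exposes the same methods as the module-level functions, so the loop can extract the matching text instead of only detecting it. The sketch below uses finditer() on the same sample data.

```python
import re

pattern = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
texts = [
    "SSN is 123-45-6789.",
    "Invalid SSN 12-345-6789.",
    "Another SSN 987-65-4321.",
]

# finditer() yields one match object per hit; group() returns the matched text.
found = [m.group() for text in texts for m in pattern.finditer(text)]

print("Extracted:", found)
```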
Example 18: Escaping Special Characters
Use re.escape() to escape all special characters in a string.
# Escaping special characters
import re
special_string = "[This] (is) {a} *special* string?"
escaped_string = re.escape(special_string)
print("Escaped String:", escaped_string)
Escaped String: \[This\]\ \(is\)\ \{a\}\ \*special\*\ string\?
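A common reason to use re.escape() is searching for literal text that happens to contain metacharacters, such as user input. The sketch below assumes a hypothetical search term "1+1=2".

```python
import re

term = "1+1=2"  # hypothetical user input containing the "+" metacharacter
text = "We know that 1+1=2."

# Unescaped, "1+" means "one or more 1s", so the literal text is not found.
unescaped = re.search(term, text)

# Escaped, every character is treated literally.
escaped = re.search(re.escape(term), text)

print("Unescaped:", unescaped)
print("Escaped:", escaped.group())
```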
Example 19: Using Named Groups
Assign names to groups for easier access.
# Using named groups
import re
text = "Order number: 12345, Date: 2023-10-05"
pattern = r"Order number: (?P<order_number>\d+), Date: (?P<date>\d{4}-\d{2}-\d{2})"
match = re.search(pattern, text)
if match:
    print("Order Number:", match.group('order_number'))
    print("Date:", match.group('date'))
Order Number: 12345
Date: 2023-10-05
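When a pattern has several named groups, match.groupdict() returns all of them at once as a dictionary. A short sketch using the same pattern:

```python
import re

text = "Order number: 12345, Date: 2023-10-05"
pattern = r"Order number: (?P<order_number>\d+), Date: (?P<date>\d{4}-\d{2}-\d{2})"

match = re.search(pattern, text)

# groupdict() maps each group name to the text it captured.
fields = match.groupdict() if match else {}

print(fields)
```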
Example 20: Verifying Phone Numbers
Validate Indian mobile phone numbers: an optional +91 prefix followed by 10 digits, the first of which is 6 to 9.
# Validating phone numbers
import re
phone_numbers = [
    "9876543210",
    "09876543210",
    "+919876543210",
    "98765-43210"
]
pattern = re.compile(r"^(\+91)?[6-9]\d{9}$")
for number in phone_numbers:
    if pattern.match(number):
        print("Valid phone number:", number)
    else:
        print("Invalid phone number:", number)
Valid phone number: 9876543210
Invalid phone number: 09876543210
Valid phone number: +919876543210
Invalid phone number: 98765-43210
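One subtlety when validating input: $ also matches just before a trailing newline, so a string such as "9876543210\n" passes the anchored pattern. If that matters, fullmatch() is one possible alternative, sketched below with the anchors removed.

```python
import re

anchored = re.compile(r"^(\+91)?[6-9]\d{9}$")
strict = re.compile(r"(\+91)?[6-9]\d{9}")

with_newline = "9876543210\n"

# $ matches before a final newline, so the anchored pattern accepts this.
accepted_by_anchored = anchored.match(with_newline) is not None

# fullmatch() requires the pattern to cover the entire string.
accepted_by_fullmatch = strict.fullmatch(with_newline) is not None

print("Anchored match accepts trailing newline:", accepted_by_anchored)
print("fullmatch accepts trailing newline:", accepted_by_fullmatch)
```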
Summary: Regular expressions are a versatile tool for text processing in Python. By mastering the re module and understanding pattern syntax, you can perform complex string manipulations, validations, and data extraction.