Extracting and Substituting Text with Regular Expressions in PostgreSQL
PostgreSQL provides a powerful built-in regex engine that enables sophisticated text manipulation through pattern matching, extraction, and substitution operations. Regular expressions are essential for data cleaning, validation, and transformation tasks in database applications.
Key Regex Operators for Pattern Matching
PostgreSQL offers several operators for regex pattern matching:
- ~ operator: Tests if a value matches a regex pattern (case-sensitive)
- ~* operator: Performs case-insensitive pattern matching
These operators return true if the regex matches any part of the value, not necessarily the entire string.
Extracting Text with Regular Expressions
Using substring() Function
The substring(value from pattern) function extracts text based on a regex pattern. When the pattern includes capturing groups (), only the matched group content is returned.
-- Extract area code from phone number SELECT substring('(416) 555-1212' from '\((\d{3})\)'); -- Returns: 416
Using regexp_match() Function
The regexp_match(value, pattern) function is more powerful for extracting multiple substrings simultaneously. It returns an array of captured groups, making it ideal for complex extractions.
-- Extract all parts of a phone number SELECT regexp_match('(416) 555-1212', '^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$'); -- Returns: {416,555,1212} -- Access individual elements using array indexing SELECT (regexp_match('(416) 555-1212', '^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$'))[1] AS area_code; -- Returns: 416
Substituting Text with Regular Expressions
Using regexp_replace() Function
The regexp_replace(value, regex, replacement, flags) function performs pattern-based text substitution. This function replaces occurrences of the regex pattern within the value with the specified replacement string.
-- Remove all non-digit characters from phone number SELECT regexp_replace('(416) 555-1212', '\D', '', 'g'); -- Returns: 4165551212
Function Parameters
- value: The source text to process
- regex: The regular expression pattern to match
- replacement: The text to substitute for matched patterns
- flags: Optional modifiers that control matching behavior
Important Regex Flags
Understanding regex flags is crucial for effective text manipulation:
- “g” (Global): Replaces all matches, not just the first occurrence
- “i” (Case-insensitive): Ignores case when matching patterns
- “n” (Newline): Allows the . character to match newline characters
-- Without 'g' flag - replaces only first match SELECT regexp_replace('hello world hello', 'hello', 'hi'); -- Returns: hi world hello -- With 'g' flag - replaces all matches SELECT regexp_replace('hello world hello', 'hello', 'hi', 'g'); -- Returns: hi world hi
Common Regex Patterns
Here are essential regex patterns for PostgreSQL text processing:
- .: Matches any single character
- \s: Matches whitespace characters, \S: matches non-whitespace
- \d: Matches digits, \D: matches non-digits
- \w: Matches word characters, \W: matches non-word characters
- ^: Matches start of string, $: matches end of string
- (): Creates capturing groups for extraction
- *: Matches 0 or more repetitions, +: matches 1 or more
- {N}: Matches exactly N repetitions
Practical Examples
-- Extract email domain SELECT regexp_match('user@example.com', '@(.+)$'); -- Returns: {example.com} -- Clean and format phone numbers SELECT regexp_replace( regexp_replace('1-416-555-1212', '\D', '', 'g'), '(\d{1})(\d{3})(\d{3})(\d{4})', '+\1 (\2) \3-\4' ); -- Returns: +1 (416) 555-1212 -- Case-insensitive substitution SELECT regexp_replace('Hello WORLD', 'hello', 'hi', 'gi'); -- Returns: hi WORLD
These regex functions provide PostgreSQL users with comprehensive text processing capabilities, enabling efficient data cleaning, validation, and transformation directly within the database layer.
Further Reading:
- PostgreSQL 18: Accelerating Disk Reads with Asynchronous I/O
- Strategic Shift from Oracle to PostgreSQL
- Cassandra Consistency Level Guide: Mastering Data Consistency in Distributed Systems
- Mastering MySQL EXPLAIN Format: Optimizing Query Performance in MySQL 8.0.32
- MySQL “Got an Error Reading Communication Packet”: Complete Troubleshooting Guide
- Regular Expression