Extracting and Substituting Text with Regular Expressions in PostgreSQL

Extracting and Substituting Text with Regular Expressions in PostgreSQL


PostgreSQL provides a powerful built-in regex engine that enables sophisticated text manipulation through pattern matching, extraction, and substitution operations. Regular expressions are essential for data cleaning, validation, and transformation tasks in database applications.

Key Regex Operators for Pattern Matching

PostgreSQL offers several operators for regex pattern matching:

  • ~ operator: Tests if a value matches a regex pattern (case-sensitive)
  • ~* operator: Performs case-insensitive pattern matching

These operators return true if the regex matches any part of the value, not necessarily the entire string.

Extracting Text with Regular Expressions

Using substring() Function

The substring(value from pattern) function extracts text based on a regex pattern. When the pattern includes capturing groups (), only the matched group content is returned.

-- Extract area code from phone number
SELECT substring('(416) 555-1212' from '\((\d{3})\)');
-- Returns: 416

Using regexp_match() Function

The regexp_match(value, pattern) function is more powerful for extracting multiple substrings simultaneously. It returns an array of captured groups, making it ideal for complex extractions.

-- Extract all parts of a phone number
SELECT regexp_match('(416) 555-1212', '^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$');
-- Returns: {416,555,1212}

-- Access individual elements using array indexing
SELECT (regexp_match('(416) 555-1212', '^\D*(\d{3})\D*(\d{3})\D*(\d{4})\D*$'))[1] AS area_code;
-- Returns: 416

Substituting Text with Regular Expressions

Using regexp_replace() Function

The regexp_replace(value, regex, replacement, flags) function performs pattern-based text substitution. This function replaces occurrences of the regex pattern within the value with the specified replacement string.

-- Remove all non-digit characters from phone number
SELECT regexp_replace('(416) 555-1212', '\D', '', 'g');
-- Returns: 4165551212

Function Parameters

  • value: The source text to process
  • regex: The regular expression pattern to match
  • replacement: The text to substitute for matched patterns
  • flags: Optional modifiers that control matching behavior

Important Regex Flags

Understanding regex flags is crucial for effective text manipulation:

  • “g” (Global): Replaces all matches, not just the first occurrence
  • “i” (Case-insensitive): Ignores case when matching patterns
  • “n” (Newline): Allows the . character to match newline characters
-- Without 'g' flag - replaces only first match
SELECT regexp_replace('hello world hello', 'hello', 'hi');
-- Returns: hi world hello

-- With 'g' flag - replaces all matches
SELECT regexp_replace('hello world hello', 'hello', 'hi', 'g');
-- Returns: hi world hi

Common Regex Patterns

Here are essential regex patterns for PostgreSQL text processing:

  • .: Matches any single character
  • \s: Matches whitespace characters, \S: matches non-whitespace
  • \d: Matches digits, \D: matches non-digits
  • \w: Matches word characters, \W: matches non-word characters
  • ^: Matches start of string, $: matches end of string
  • (): Creates capturing groups for extraction
  • *: Matches 0 or more repetitions, +: matches 1 or more
  • {N}: Matches exactly N repetitions

Practical Examples

-- Extract email domain
SELECT regexp_match('user@example.com', '@(.+)$');
-- Returns: {example.com}

-- Clean and format phone numbers
SELECT regexp_replace(
    regexp_replace('1-416-555-1212', '\D', '', 'g'),
    '(\d{1})(\d{3})(\d{3})(\d{4})',
    '+\1 (\2) \3-\4'
);
-- Returns: +1 (416) 555-1212

-- Case-insensitive substitution
SELECT regexp_replace('Hello WORLD', 'hello', 'hi', 'gi');
-- Returns: hi WORLD

These regex functions provide PostgreSQL users with comprehensive text processing capabilities, enabling efficient data cleaning, validation, and transformation directly within the database layer.

Further Reading:

About MinervaDB Corporation 171 Articles
Full-stack Database Infrastructure Architecture, Engineering and Operations Consultative Support(24*7) Provider for PostgreSQL, MySQL, MariaDB, MongoDB, ClickHouse, Trino, SQL Server, Cassandra, CockroachDB, Yugabyte, Couchbase, Redis, Valkey, NoSQL, NewSQL, Databricks, Amazon Resdhift, Amazon Aurora, CloudSQL, Snowflake and AzureSQL with core expertize in Performance, Scalability, High Availability, Database Reliability Engineering, Database Upgrades/Migration, and Data Security.