Detailed Maintenance Plan for PostgreSQL



PostgreSQL database maintenance is a critical aspect of database administration that ensures optimal performance, data integrity, and system reliability. A well-structured maintenance plan minimizes downtime, prevents data corruption, and supports business continuity by implementing proactive measures rather than reactive responses. This comprehensive plan provides detailed procedures, schedules, and best practices for maintaining PostgreSQL databases in production environments.

Executive Summary

PostgreSQL requires systematic maintenance activities to achieve optimal performance and reliability. The maintenance tasks fall into several key categories: backup and recovery operations, performance optimization, security management, monitoring and alerting, and configuration tuning. Regular maintenance activities such as vacuuming, analyzing, and monitoring help optimize query performance, reclaim disk space, and prevent data corruption. Without proper maintenance, databases can experience performance degradation, increased bloat, and potential data loss scenarios.

[Figure: PostgreSQL maintenance task analysis - distribution by category, frequency, priority, and duration]

Core Maintenance Components

Database Vacuum Operations

The VACUUM command serves as PostgreSQL's primary cleanup mechanism, removing dead tuples and reclaiming disk space occupied by deleted or updated rows. PostgreSQL's Multi-Version Concurrency Control (MVCC) architecture creates dead tuples during normal operations, making regular vacuuming essential for maintaining performance. Standard VACUUM operations can run concurrently with other database activities, while VACUUM FULL provides more thorough cleanup but requires exclusive table locks.
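The difference in locking behavior is the key operational concern. A minimal sketch (the table name orders is illustrative):

```sql
-- Routine vacuum of a single table; runs alongside normal reads and writes.
VACUUM (VERBOSE) orders;

-- Full rewrite that returns space to the operating system, but takes an
-- ACCESS EXCLUSIVE lock on the table; reserve for maintenance windows.
VACUUM FULL orders;
```

In practice, plain VACUUM (usually via autovacuum) is the default, and VACUUM FULL is an exceptional measure for tables with severe bloat.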

Autovacuum provides automated maintenance by running VACUUM and ANALYZE operations based on configurable thresholds. The autovacuum daemon monitors table activity and triggers maintenance when the number of dead tuples exceeds specified limits. Key autovacuum parameters include autovacuum_vacuum_threshold, autovacuum_vacuum_scale_factor, and autovacuum_max_workers, which should be tuned based on workload characteristics.
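These parameters can be tuned globally or per table. The values below are purely illustrative, not recommendations, and the table name orders is hypothetical:

```sql
-- Instance-wide defaults (written to postgresql.auto.conf):
ALTER SYSTEM SET autovacuum_vacuum_scale_factor = 0.05;
ALTER SYSTEM SET autovacuum_max_workers = 6;  -- requires a server restart to take effect
SELECT pg_reload_conf();  -- reload picks up the scale factor change

-- Per-table override for a hot, frequently updated table:
ALTER TABLE orders SET (
    autovacuum_vacuum_threshold = 1000,
    autovacuum_vacuum_scale_factor = 0.01
);
```

Per-table storage parameters are usually the safer lever, since they avoid changing behavior for every table in the cluster.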

Statistics Collection and Analysis

The ANALYZE command collects statistical information about table contents, enabling the query planner to generate optimal execution plans. These statistics include data distribution patterns, column correlations, and cardinality estimates that directly impact query performance. Regular ANALYZE operations are crucial for maintaining accurate statistics, particularly for tables with frequent data modifications.

The PostgreSQL query planner relies heavily on current statistics to make informed decisions about index usage, join ordering, and scan methods. Outdated statistics can lead to suboptimal query plans and performance degradation. Manual ANALYZE operations may be necessary for tables with significant data changes that don't trigger autovacuum thresholds.
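A quick way to find such tables is the n_mod_since_analyze counter in pg_stat_user_tables, which tracks rows modified since the last statistics collection. A sketch (the table and column names are illustrative):

```sql
-- Refresh planner statistics for one table, optionally limited to columns:
ANALYZE orders (customer_id, created_at);

-- Candidates for a manual ANALYZE: most-modified tables since last analyze.
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
ORDER BY n_mod_since_analyze DESC
LIMIT 10;
```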

Index Maintenance and Optimization

Index maintenance involves monitoring index usage patterns, identifying unused indexes, and rebuilding fragmented indexes through REINDEX operations. Unused indexes consume storage space and slow down write operations without providing query benefits. Regular analysis of index statistics helps identify candidates for removal or optimization.

REINDEX operations rebuild indexes to eliminate bloat and fragmentation, particularly beneficial for indexes with high update frequencies. However, REINDEX requires exclusive locks and should be scheduled during maintenance windows. Alternative approaches include creating new indexes concurrently and dropping old ones to minimize downtime.
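Both approaches can be sketched as follows (index and table names are illustrative):

```sql
-- PostgreSQL 12 and later: rebuild in place without blocking writes.
REINDEX INDEX CONCURRENTLY orders_customer_id_idx;

-- Older versions: build a replacement concurrently, then swap it in.
CREATE INDEX CONCURRENTLY orders_customer_id_idx_new
    ON orders (customer_id);
DROP INDEX CONCURRENTLY orders_customer_id_idx;
ALTER INDEX orders_customer_id_idx_new
    RENAME TO orders_customer_id_idx;
```

Note that CONCURRENTLY builds take longer and can leave an invalid index behind if interrupted, so the result should be verified (e.g. via pg_index.indisvalid) before dropping the original.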

Maintenance Schedules and Frequencies

Daily Maintenance Tasks

Daily maintenance activities focus on critical monitoring and basic housekeeping operations. Essential daily tasks include verifying backup completion, monitoring disk space usage, checking database connectivity, and reviewing error logs for critical issues. These activities typically require 15-30 minutes and can be automated through scripts.

Database connectivity testing ensures system availability and identifies potential connection issues before they impact users. Disk space monitoring prevents storage-related outages by providing early warnings when space utilization approaches critical thresholds. Error log review helps identify recurring issues, security concerns, and performance problems that require immediate attention.

#!/bin/bash
# PostgreSQL Daily Maintenance Script
# Usage: ./daily_maintenance.sh [database_name]
# Version: 2.0
# Author: System Administrator
# Description: Performs daily maintenance tasks for PostgreSQL databases

set -euo pipefail  # Exit on error, undefined variables, and pipe failures
IFS=$'\n\t'       # Secure Internal Field Separator

# Script configuration
readonly SCRIPT_NAME="$(basename "$0")"
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly PID_FILE="/var/run/${SCRIPT_NAME%.*}.pid"

# PostgreSQL configuration with defaults
readonly PGUSER="${PGUSER:-postgres}"
readonly PGHOST="${PGHOST:-localhost}"
readonly PGPORT="${PGPORT:-5432}"
readonly PGDATABASE="${PGDATABASE:-postgres}"

# Directory configuration
readonly LOG_DIR="${LOG_DIR:-/var/log/postgresql/maintenance}"
readonly BACKUP_DIR="${BACKUP_DIR:-/var/backups/postgresql}"
readonly DATE="$(date +%Y%m%d_%H%M%S)"
readonly LOG_FILE="${LOG_DIR}/daily_maintenance_${DATE}.log"

# Maintenance configuration
readonly LONG_QUERY_THRESHOLD="${LONG_QUERY_THRESHOLD:-5 minutes}"
readonly MAX_LOG_LINES="${MAX_LOG_LINES:-50}"
readonly DISK_USAGE_THRESHOLD="${DISK_USAGE_THRESHOLD:-85}"

# Color codes for output
readonly RED='\033[0;31m'
readonly GREEN='\033[0;32m'
readonly YELLOW='\033[1;33m'
readonly BLUE='\033[0;34m'
readonly NC='\033[0m' # No Color

# Cleanup function
cleanup() {
    local exit_code=$?
    if [[ -f "$PID_FILE" ]]; then
        rm -f "$PID_FILE"
    fi
    exit $exit_code
}

# Set up signal handlers
trap cleanup EXIT INT TERM

# Function to display usage
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS] [database_name]

OPTIONS:
    -h, --help          Show this help message
    -v, --verbose       Enable verbose output
    -d, --dry-run       Show what would be done without executing
    -q, --quiet         Suppress non-error output

ARGUMENTS:
    database_name       Specific database to maintain (optional)

ENVIRONMENT VARIABLES:
    PGUSER             PostgreSQL username (default: postgres)
    PGHOST             PostgreSQL host (default: localhost)
    PGPORT             PostgreSQL port (default: 5432)
    PGDATABASE         Default database (default: postgres)
    LOG_DIR            Log directory (default: /var/log/postgresql/maintenance)
    BACKUP_DIR         Backup directory (default: /var/backups/postgresql)

EOF
}

# Function to log messages with different levels
log_message() {
    local level="$1"
    local message="$2"
    local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
    local color=""

    case "$level" in
        "ERROR")   color="$RED" ;;
        "WARN")    color="$YELLOW" ;;
        "INFO")    color="$GREEN" ;;
        "DEBUG")   color="$BLUE" ;;
        *)         color="$NC" ;;
    esac

    if [[ "$level" != "DEBUG" || "${VERBOSE:-false}" == "true" ]]; then
        printf "${color}[%s] %s: %s${NC}\n" "$timestamp" "$level" "$message" | tee -a "$LOG_FILE"
    else
        printf "[%s] %s: %s\n" "$timestamp" "$level" "$message" >> "$LOG_FILE"
    fi
}

# Function to check if script is already running
check_running() {
    if [[ -f "$PID_FILE" ]]; then
        local old_pid
        old_pid="$(cat "$PID_FILE" 2>/dev/null || echo "")"
        if [[ -n "$old_pid" ]] && kill -0 "$old_pid" 2>/dev/null; then
            log_message "ERROR" "Script is already running with PID $old_pid"
            exit 1
        else
            rm -f "$PID_FILE"
        fi
    fi
    echo $$ > "$PID_FILE"
}

# Function to create required directories
setup_directories() {
    local dirs=("$LOG_DIR" "$BACKUP_DIR")

    for dir in "${dirs[@]}"; do
        if [[ ! -d "$dir" ]]; then
            if ! mkdir -p "$dir" 2>/dev/null; then
                log_message "ERROR" "Failed to create directory: $dir"
                exit 1
            fi
            log_message "INFO" "Created directory: $dir"
        fi
    done
}

# Function to validate PostgreSQL connection
validate_pg_connection() {
    log_message "INFO" "Testing database connectivity..."

    if ! command -v psql >/dev/null 2>&1; then
        log_message "ERROR" "psql command not found. Please install PostgreSQL client."
        return 1
    fi

    local connection_test
    if connection_test=$(psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$PGDATABASE" -t -c "SELECT version();" 2>&1); then
        log_message "INFO" "Database connectivity: OK"
        log_message "DEBUG" "PostgreSQL version: $(echo "$connection_test" | head -1 | xargs)"
        return 0
    else
        log_message "ERROR" "Cannot connect to database: $connection_test"
        return 1
    fi
}

# Function to check disk space
check_disk_space() {
    log_message "INFO" "Checking disk space..."

    local disk_info
    disk_info=$(df -h 2>/dev/null | grep -E "(Filesystem|/var|/data|/tmp)" || true)

    if [[ -n "$disk_info" ]]; then
        echo "$disk_info" | tee -a "$LOG_FILE"

        # Check for high disk usage
        while IFS= read -r line; do
            if [[ "$line" =~ ([0-9]+)% ]]; then
                local usage="${BASH_REMATCH[1]}"
                if [[ "$usage" -gt "$DISK_USAGE_THRESHOLD" ]]; then
                    log_message "WARN" "High disk usage detected: ${usage}% on $(echo "$line" | awk '{print $6}')"
                fi
            fi
        done <<< "$disk_info"
    else
        log_message "WARN" "Could not retrieve disk space information"
    fi
}

# Function to check for long-running queries
check_long_queries() {
    log_message "INFO" "Checking for long-running queries..."

    local query="
    SELECT 
        pid,
        now() - pg_stat_activity.query_start AS duration,
        usename,
        application_name,
        client_addr,
        state,
        LEFT(query, 100) || CASE WHEN LENGTH(query) > 100 THEN '...' ELSE '' END AS query_preview
    FROM pg_stat_activity 
    WHERE (now() - pg_stat_activity.query_start) > interval '$LONG_QUERY_THRESHOLD'
    AND state = 'active'
    AND pid <> pg_backend_pid()
    ORDER BY duration DESC;"

    local result
    if result=$(psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$PGDATABASE" -c "$query" 2>&1); then
        if [[ $(echo "$result" | wc -l) -gt 3 ]]; then
            log_message "WARN" "Found long-running queries:"
            echo "$result" | tee -a "$LOG_FILE"
        else
            log_message "INFO" "No long-running queries found"
        fi
    else
        log_message "ERROR" "Failed to check long-running queries: $result"
    fi
}

# Function to update table statistics
update_statistics() {
    local target_db="$1"

    if [[ -n "$target_db" ]]; then
        log_message "INFO" "Updating statistics for database: $target_db"
        if [[ "${DRY_RUN:-false}" == "true" ]]; then
            log_message "INFO" "[DRY RUN] Would run ANALYZE on database: $target_db"
        else
            if psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$target_db" -c "ANALYZE;" >> "$LOG_FILE" 2>&1; then
                log_message "INFO" "ANALYZE completed successfully for database: $target_db"
            else
                log_message "ERROR" "ANALYZE failed for database: $target_db"
                return 1
            fi
        fi
    else
        log_message "INFO" "Updating statistics for all databases..."

        local databases
        if databases=$(psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$PGDATABASE" -t -c "SELECT datname FROM pg_database WHERE datistemplate = false AND datallowconn = true;" 2>/dev/null); then
            while IFS= read -r dbname; do
                dbname=$(echo "$dbname" | xargs)  # Trim whitespace
                if [[ -n "$dbname" ]]; then
                    log_message "INFO" "Running ANALYZE on database: $dbname"
                    if [[ "${DRY_RUN:-false}" == "true" ]]; then
                        log_message "INFO" "[DRY RUN] Would run ANALYZE on database: $dbname"
                    else
                        if ! psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$dbname" -c "ANALYZE;" >> "$LOG_FILE" 2>&1; then
                            log_message "WARN" "ANALYZE failed for database: $dbname"
                        fi
                    fi
                fi
            done <<< "$databases"
        else
            log_message "ERROR" "Failed to retrieve database list"
            return 1
        fi
    fi
}

# Function to check PostgreSQL error logs
check_error_logs() {
    log_message "INFO" "Checking recent PostgreSQL error logs..."

    local log_files=(
        "/var/log/postgresql/postgresql.log"
        "/var/log/postgresql/postgresql-*.log"
        "/var/lib/postgresql/*/main/log/postgresql-*.log"
    )

    local found_logs=false
    for log_pattern in "${log_files[@]}"; do
        for log_file in $log_pattern; do
            if [[ -f "$log_file" && -r "$log_file" ]]; then
                found_logs=true
                log_message "INFO" "Checking log file: $log_file"

                local recent_errors
                if recent_errors=$(grep -i "error\|fatal\|panic" "$log_file" 2>/dev/null | tail -"$MAX_LOG_LINES"); then
                    if [[ -n "$recent_errors" ]]; then
                        log_message "WARN" "Recent errors found in $log_file:"
                        echo "$recent_errors" | tee -a "$LOG_FILE"
                    else
                        log_message "INFO" "No recent errors found in $log_file"
                    fi
                fi
                break
            fi
        done
    done

    if [[ "$found_logs" == "false" ]]; then
        log_message "WARN" "No PostgreSQL log files found or accessible"
    fi
}

# Function to monitor connections
monitor_connections() {
    log_message "INFO" "Monitoring database connections..."

    local query="
    SELECT 
        count(*) as total_connections,
        count(*) FILTER (WHERE state = 'active') as active_connections,
        count(*) FILTER (WHERE state = 'idle') as idle_connections,
        count(*) FILTER (WHERE state = 'idle in transaction') as idle_in_transaction,
        max_conn.setting::int as max_connections,
        ROUND((count(*)::float / max_conn.setting::int) * 100, 2) as connection_usage_percent
    FROM pg_stat_activity, 
         (SELECT setting FROM pg_settings WHERE name = 'max_connections') max_conn;"

    local result
    if result=$(psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$PGDATABASE" -c "$query" 2>&1); then
        echo "$result" | tee -a "$LOG_FILE"

        # Extract connection usage percentage for alerting
        local usage_percent
        if usage_percent=$(echo "$result" | grep -oE '[0-9]+\.[0-9]+' | tail -1); then
            if (( $(echo "$usage_percent > 80" | bc -l) )); then
                log_message "WARN" "High connection usage: ${usage_percent}%"
            fi
        fi
    else
        log_message "ERROR" "Failed to retrieve connection information: $result"
    fi
}

# Function to perform database health checks
health_checks() {
    log_message "INFO" "Performing database health checks..."

    # Check for database locks
    local lock_query="
    SELECT 
        blocked_locks.pid AS blocked_pid,
        blocked_activity.usename AS blocked_user,
        blocking_locks.pid AS blocking_pid,
        blocking_activity.usename AS blocking_user,
        blocked_activity.query AS blocked_statement,
        blocking_activity.query AS current_statement_in_blocking_process
    FROM pg_catalog.pg_locks blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
        AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
        AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
        AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
        AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
        AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
        AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
        AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
        AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
        AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
        AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
    WHERE NOT blocked_locks.GRANTED;"

    local locks_result
    if locks_result=$(psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$PGDATABASE" -c "$lock_query" 2>&1); then
        if [[ $(echo "$locks_result" | wc -l) -gt 3 ]]; then
            log_message "WARN" "Database locks detected:"
            echo "$locks_result" | tee -a "$LOG_FILE"
        else
            log_message "INFO" "No blocking locks found"
        fi
    else
        log_message "WARN" "Could not check for database locks: $locks_result"
    fi
}

# Main execution function
main() {
    local target_database=""
    local verbose=false
    local dry_run=false
    local quiet=false

    # Parse command line arguments
    while [[ $# -gt 0 ]]; do
        case $1 in
            -h|--help)
                usage
                exit 0
                ;;
            -v|--verbose)
                verbose=true
                export VERBOSE=true
                shift
                ;;
            -d|--dry-run)
                dry_run=true
                export DRY_RUN=true
                shift
                ;;
            -q|--quiet)
                quiet=true
                export QUIET=true
                shift
                ;;
            -*)
                log_message "ERROR" "Unknown option: $1"
                usage
                exit 1
                ;;
            *)
                target_database="$1"
                shift
                ;;
        esac
    done

    # Initialize
    setup_directories   # create LOG_DIR first so check_running can log errors
    check_running

    log_message "INFO" "Starting PostgreSQL daily maintenance (PID: $$)"
    log_message "INFO" "Target database: ${target_database:-'all databases'}"
    log_message "INFO" "Log file: $LOG_FILE"

    if [[ "$dry_run" == "true" ]]; then
        log_message "INFO" "Running in DRY RUN mode - no changes will be made"
    fi

    # Execute maintenance tasks
    local exit_code=0

    # Plain assignments are used instead of ((exit_code++)): the arithmetic
    # post-increment returns a non-zero status when the counter is 0, which
    # would terminate the script under 'set -e'.
    check_disk_space || exit_code=$((exit_code + 1))
    validate_pg_connection || { log_message "ERROR" "Database connection failed, aborting"; exit 1; }
    check_long_queries || exit_code=$((exit_code + 1))
    update_statistics "$target_database" || exit_code=$((exit_code + 1))
    check_error_logs || exit_code=$((exit_code + 1))
    monitor_connections || exit_code=$((exit_code + 1))
    health_checks || exit_code=$((exit_code + 1))

    # Final status
    if [[ $exit_code -eq 0 ]]; then
        log_message "INFO" "Daily maintenance completed successfully"
    else
        log_message "WARN" "Daily maintenance completed with $exit_code warnings/errors"
    fi

    log_message "INFO" "Maintenance log saved to: $LOG_FILE"

    return $exit_code
}

# Execute main function if script is run directly
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    main "$@"
fi


Weekly Maintenance Operations

Weekly maintenance includes more comprehensive tasks such as VACUUM ANALYZE operations, performance analysis, and security reviews. The traditional approach schedules database-wide VACUUM operations during low-usage periods, typically once per week. Weekly tasks also include connection pool analysis, deadlock investigation, and index usage statistics review.

Table bloat analysis identifies tables requiring attention due to excessive dead tuple accumulation. Connection pool monitoring helps optimize resource utilization and identify potential bottlenecks. Deadlock analysis reveals application-level issues that may require query optimization or transaction redesign.

#!/bin/bash

#===============================================================================
# PostgreSQL Weekly Maintenance Script
# Description: Performs routine maintenance tasks on PostgreSQL databases
# Usage: ./weekly_maintenance.sh [OPTIONS] [database_name]
# Author: System Administrator
# Version: 2.0
#===============================================================================

set -euo pipefail  # Exit on error, undefined vars, pipe failures
IFS=$'\n\t'       # Secure Internal Field Separator

#===============================================================================
# CONFIGURATION
#===============================================================================

# Default PostgreSQL connection parameters
readonly DEFAULT_PGUSER="${PGUSER:-postgres}"
readonly DEFAULT_PGHOST="${PGHOST:-localhost}"
readonly DEFAULT_PGPORT="${PGPORT:-5432}"
readonly DEFAULT_LOG_DIR="/var/log/postgresql/maintenance"

# Script configuration
readonly SCRIPT_NAME="$(basename "$0")"
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly DATE="$(date +%Y%m%d_%H%M%S)"
readonly PID="$$"

# Initialize variables
PGUSER="$DEFAULT_PGUSER"
PGHOST="$DEFAULT_PGHOST"
PGPORT="$DEFAULT_PGPORT"
LOG_DIR="$DEFAULT_LOG_DIR"
DATABASE=""
VERBOSE=false
DRY_RUN=false
FORCE=false

#===============================================================================
# FUNCTIONS
#===============================================================================

# Display usage information
usage() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS] [database_name]

PostgreSQL Weekly Maintenance Script

OPTIONS:
    -h, --help              Show this help message
    -v, --verbose           Enable verbose output
    -n, --dry-run          Show what would be done without executing
    -f, --force            Force execution without prompts
    -u, --user USER        PostgreSQL username (default: $DEFAULT_PGUSER)
    -H, --host HOST        PostgreSQL host (default: $DEFAULT_PGHOST)
    -p, --port PORT        PostgreSQL port (default: $DEFAULT_PGPORT)
    -l, --log-dir DIR      Log directory (default: $DEFAULT_LOG_DIR)

ARGUMENTS:
    database_name          Specific database to maintain (optional)
                          If not specified, all non-template databases will be processed

EXAMPLES:
    $SCRIPT_NAME                    # Maintain all databases
    $SCRIPT_NAME mydb               # Maintain specific database
    $SCRIPT_NAME -v -u admin mydb   # Verbose mode with custom user

EOF
}

# Logging function with different levels
log_message() {
    local level="$1"
    local message="$2"
    local timestamp="$(date '+%Y-%m-%d %H:%M:%S')"
    local log_entry="[$timestamp] [$level] [PID:$PID] $message"

    echo "$log_entry" | tee -a "$LOG_DIR/weekly_maintenance_$DATE.log"

    if [[ "$level" == "ERROR" ]]; then
        echo "$log_entry" >&2
    fi
}

# Wrapper functions for different log levels
log_info() { log_message "INFO" "$1"; }
log_warn() { log_message "WARN" "$1"; }
log_error() { log_message "ERROR" "$1"; }
log_debug() { [[ "$VERBOSE" == true ]] && log_message "DEBUG" "$1" || true; }

# Error handling function
error_exit() {
    log_error "$1"
    cleanup
    exit "${2:-1}"
}

# Cleanup function
cleanup() {
    log_debug "Performing cleanup operations"
    # Add any cleanup operations here
}

# Trap for cleanup on exit
trap cleanup EXIT
trap 'error_exit "Script interrupted by user" 130' INT TERM

# Validate PostgreSQL connection
validate_connection() {
    log_debug "Validating PostgreSQL connection"

    if ! command -v psql >/dev/null 2>&1; then
        error_exit "psql command not found. Please install PostgreSQL client tools." 2
    fi

    if ! psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -c '\q' >/dev/null 2>&1; then
        error_exit "Cannot connect to PostgreSQL server at $PGHOST:$PGPORT as user $PGUSER" 3
    fi

    log_info "PostgreSQL connection validated successfully"
}

# Create log directory with proper permissions
setup_logging() {
    if [[ "$DRY_RUN" == true ]]; then
        log_info "[DRY RUN] Would create log directory: $LOG_DIR"
        return 0
    fi

    if ! mkdir -p "$LOG_DIR" 2>/dev/null; then
        error_exit "Cannot create log directory: $LOG_DIR" 4
    fi

    # Set appropriate permissions
    chmod 755 "$LOG_DIR" 2>/dev/null || log_warn "Could not set permissions on log directory"

    log_info "Log directory initialized: $LOG_DIR"
}

# Execute SQL command with error handling
execute_sql() {
    local database="$1"
    local sql_command="$2"
    local description="$3"

    log_debug "Executing SQL on database '$database': $description"

    if [[ "$DRY_RUN" == true ]]; then
        log_info "[DRY RUN] Would execute on '$database': $description"
        return 0
    fi

    if ! psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "$database" \
             -c "$sql_command" >> "$LOG_DIR/weekly_maintenance_$DATE.log" 2>&1; then
        log_error "Failed to execute SQL on database '$database': $description"
        return 1
    fi

    log_info "Successfully executed on '$database': $description"
    return 0
}

# Get list of databases to process
get_databases() {
    if [[ -n "$DATABASE" ]]; then
        echo "$DATABASE"
        return 0
    fi

    log_debug "Retrieving list of non-template databases"

    local databases
    databases=$(psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -t -c \
        "SELECT datname FROM pg_database WHERE datistemplate = false AND datallowconn = true;" 2>/dev/null)

    if [[ -z "$databases" ]]; then
        error_exit "No databases found or unable to retrieve database list" 5
    fi

    echo "$databases" | sed '/^[[:space:]]*$/d' | sort
}

# Perform VACUUM ANALYZE
perform_vacuum_analyze() {
    log_info "Starting VACUUM ANALYZE operations"

    local databases
    databases=$(get_databases)
    local db_count=0
    local success_count=0

    while IFS= read -r dbname; do
        [[ -z "$dbname" ]] && continue
        db_count=$((db_count + 1))  # plain assignment; ((db_count++)) exits under 'set -e' when the counter is 0

        log_info "Running VACUUM ANALYZE on database: $dbname"

        if execute_sql "$dbname" "VACUUM ANALYZE;" "VACUUM ANALYZE"; then
            success_count=$((success_count + 1))
        fi
    done <<< "$databases"

    log_info "VACUUM ANALYZE completed: $success_count/$db_count databases processed successfully"
}

# Check table bloat
check_table_bloat() {
    log_info "Checking table bloat"

    local sql_query="
        SELECT 
            schemaname,
            tablename,
            pg_size_pretty(pg_total_relation_size(relid)) as size, -- relid avoids quoting issues with mixed-case names
            n_dead_tup,
            n_live_tup,
            ROUND(n_dead_tup::numeric / NULLIF(n_dead_tup + n_live_tup, 0) * 100, 2) as dead_tuple_percent
        FROM pg_stat_user_tables 
        WHERE n_dead_tup > 1000 
        ORDER BY dead_tuple_percent DESC NULLS LAST;
    "

    if [[ "$DRY_RUN" == true ]]; then
        log_info "[DRY RUN] Would check table bloat"
        return 0
    fi

    psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "${DATABASE:-postgres}" -c "$sql_query" \
        >> "$LOG_DIR/weekly_maintenance_$DATE.log" 2>&1 || \
        log_error "Failed to check table bloat"
}

# Check unused indexes
check_unused_indexes() {
    log_info "Checking for unused indexes"

    local sql_query="
        SELECT 
            schemaname,
            tablename,
            indexname,
            idx_scan,
            idx_tup_read,
            idx_tup_fetch,
            pg_size_pretty(pg_relation_size(indexrelid)) as index_size
        FROM pg_stat_user_indexes 
        WHERE idx_scan = 0 
            AND pg_relation_size(indexrelid) > 1024*1024  -- > 1MB
        ORDER BY pg_relation_size(indexrelid) DESC;
    "

    if [[ "$DRY_RUN" == true ]]; then
        log_info "[DRY RUN] Would check unused indexes"
        return 0
    fi

    psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "${DATABASE:-postgres}" -c "$sql_query" \
        >> "$LOG_DIR/weekly_maintenance_$DATE.log" 2>&1 || \
        log_error "Failed to check unused indexes"
}

# Check deadlocks
check_deadlocks() {
    log_info "Checking for recent deadlocks"

    local sql_query="
        SELECT 
            datname,
            deadlocks,
            CASE 
                WHEN deadlocks > 100 THEN 'HIGH'
                WHEN deadlocks > 10 THEN 'MEDIUM'
                WHEN deadlocks > 0 THEN 'LOW'
                ELSE 'NONE'
            END as severity
        FROM pg_stat_database 
        WHERE deadlocks > 0 
        ORDER BY deadlocks DESC;
    "

    if [[ "$DRY_RUN" == true ]]; then
        log_info "[DRY RUN] Would check deadlocks"
        return 0
    fi

    psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "${DATABASE:-postgres}" -c "$sql_query" \
        >> "$LOG_DIR/weekly_maintenance_$DATE.log" 2>&1 || \
        log_error "Failed to check deadlocks"
}

# Analyze connection pool
analyze_connections() {
    log_info "Performing connection pool analysis"

    local sql_query="
        SELECT 
            datname,
            usename,
            client_addr,
            state,
            count(*) as connection_count,
            now() - min(backend_start) as oldest_connection
        FROM pg_stat_activity 
        WHERE datname IS NOT NULL 
        GROUP BY datname, usename, client_addr, state 
        ORDER BY connection_count DESC;
    "

    if [[ "$DRY_RUN" == true ]]; then
        log_info "[DRY RUN] Would analyze connections"
        return 0
    fi

    psql -h "$PGHOST" -p "$PGPORT" -U "$PGUSER" -d "${DATABASE:-postgres}" -c "$sql_query" \
        >> "$LOG_DIR/weekly_maintenance_$DATE.log" 2>&1 || \
        log_error "Failed to analyze connections"
}

# Parse command line arguments
parse_arguments() {
    while [[ $# -gt 0 ]]; do
        case $1 in
            -h|--help)
                usage
                exit 0
                ;;
            -v|--verbose)
                VERBOSE=true
                shift
                ;;
            -n|--dry-run)
                DRY_RUN=true
                shift
                ;;
            -f|--force)
                FORCE=true
                shift
                ;;
            -u|--user)
                PGUSER="$2"
                shift 2
                ;;
            -H|--host)
                PGHOST="$2"
                shift 2
                ;;
            -p|--port)
                PGPORT="$2"
                shift 2
                ;;
            -l|--log-dir)
                LOG_DIR="$2"
                shift 2
                ;;
            -*)
                error_exit "Unknown option: $1" 1
                ;;
            *)
                if [[ -z "$DATABASE" ]]; then
                    DATABASE="$1"
                else
                    error_exit "Multiple database names specified: '$DATABASE' and '$1'" 1
                fi
                shift
                ;;
        esac
    done
}

# Main execution function
main() {
    local start_time
    start_time=$(date +%s)

    log_info "Starting PostgreSQL weekly maintenance script"
    log_info "Configuration: Host=$PGHOST, Port=$PGPORT, User=$PGUSER"

    if [[ "$DRY_RUN" == true ]]; then
        log_info "DRY RUN MODE - No changes will be made"
    fi

    if [[ -n "$DATABASE" ]]; then
        log_info "Target database: $DATABASE"
    else
        log_info "Processing all non-template databases"
    fi

    # Perform maintenance tasks
    perform_vacuum_analyze
    check_table_bloat
    check_unused_indexes
    check_deadlocks
    analyze_connections

    local end_time
    end_time=$(date +%s)
    local duration=$((end_time - start_time))

    log_info "Weekly maintenance completed successfully in ${duration} seconds"

    # Display log file location
    echo
    echo "Detailed logs available at: $LOG_DIR/weekly_maintenance_$DATE.log"

    return 0
}

#===============================================================================
# SCRIPT EXECUTION
#===============================================================================

# Parse command line arguments
parse_arguments "$@"

# Setup logging
setup_logging

# Validate PostgreSQL connection
validate_connection

# Execute main function
main

exit 0

 

Monthly Maintenance Activities

Monthly maintenance encompasses strategic activities including selective REINDEX operations, security audits, and configuration reviews. These tasks require more time and planning but provide significant long-term benefits for database health and performance. Monthly activities also include hardware health checks, autovacuum tuning reviews, and comprehensive performance analysis.

Security audits involve reviewing user privileges, access patterns, and authentication logs to ensure compliance with security policies. Configuration reviews examine parameter settings, identifying opportunities for optimization based on workload changes and performance metrics. Hardware health checks monitor system resources and identify potential infrastructure issues before they cause outages.

Quarterly Strategic Reviews

Quarterly maintenance focuses on major strategic activities including disaster recovery testing, capacity planning, and comprehensive security assessments. These activities require significant time investment but are crucial for long-term system reliability and business continuity. Quarterly reviews also include configuration optimization, technology assessments, and documentation updates.

Disaster recovery testing validates backup and recovery procedures, ensuring systems can be restored within defined recovery time objectives. Capacity planning analyzes growth trends and resource utilization patterns to anticipate future infrastructure needs. Security assessments evaluate the overall security posture and identify areas for improvement.

Configuration Optimization

Memory Configuration Parameters

PostgreSQL memory configuration significantly impacts performance, with key parameters including shared_buffers, effective_cache_size, and work_mem. The shared_buffers parameter should typically be set to 25-40% of available RAM for dedicated database servers. The effective_cache_size parameter informs the query planner about available system cache and should be set to approximately 75% of total system memory.

The work_mem parameter controls memory allocation for sort and hash operations, with optimal values depending on workload characteristics and concurrent user counts. Higher values benefit analytical workloads with complex queries, while lower values suit OLTP environments with many concurrent connections. The maintenance_work_mem parameter affects maintenance operations like VACUUM and CREATE INDEX and should be set higher than work_mem.
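As a sketch, the sizing guidance above might translate into the following postgresql.conf fragment for a hypothetical dedicated server with 16 GB of RAM; the exact values are assumptions and should be validated against the actual workload:

```ini
# Hypothetical settings for a dedicated 16 GB server -- validate before use
shared_buffers = 4GB            # ~25% of RAM
effective_cache_size = 12GB     # ~75% of RAM; planner hint only, allocates nothing
work_mem = 16MB                 # per sort/hash operation, per query node
maintenance_work_mem = 512MB    # VACUUM, CREATE INDEX, and similar operations
```

Note that work_mem applies per operation, not per connection, so many concurrent complex queries can multiply actual memory use well beyond this figure.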

Autovacuum Configuration Tuning

Autovacuum parameters require careful tuning to balance maintenance efficiency with system performance impact. The autovacuum_vacuum_scale_factor parameter determines the fraction of table size that triggers vacuuming, with lower values providing more frequent maintenance. The autovacuum_max_workers parameter controls the number of concurrent autovacuum processes and should be adjusted based on system resources and workload patterns.

Table-specific autovacuum settings can override global parameters for high-activity tables. This approach allows fine-tuned maintenance schedules for different workload patterns within the same database. Monitoring autovacuum activity through the pg_stat_user_tables view helps identify tables requiring parameter adjustments.
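For illustration, a per-table override and a dead-tuple check might look like the following; the table name and thresholds are hypothetical and need tuning against real churn rates:

```sql
-- Hypothetical high-churn table: vacuum once ~1% of rows are dead
ALTER TABLE orders SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold    = 1000
);

-- Review dead-tuple counts and last autovacuum times per table
SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```

Tables that consistently top this list despite recent autovacuum runs are candidates for more aggressive per-table settings.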

Write-Ahead Logging Configuration

WAL configuration affects both performance and recovery capabilities through parameters like wal_buffers, checkpoint_timeout, and max_wal_size. The wal_buffers parameter should typically be set to 16MB for most workloads, providing adequate buffering without excessive memory usage. Checkpoint configuration balances write performance with recovery time objectives.

The checkpoint_completion_target parameter spreads checkpoint I/O over time, reducing performance spikes during checkpoint operations. Setting this parameter to 0.9 allows checkpoints to complete over 90% of the checkpoint interval. The max_wal_size parameter controls when checkpoints are triggered based on WAL volume, affecting both performance and disk space usage.
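A minimal postgresql.conf sketch combining these settings might look like this; the checkpoint interval and WAL ceiling are example values, not recommendations for any specific system:

```ini
# Checkpoint/WAL sketch -- values are assumptions, tune per workload
wal_buffers = 16MB
checkpoint_timeout = 15min           # time-based checkpoint trigger
checkpoint_completion_target = 0.9   # spread I/O over 90% of the interval
max_wal_size = 4GB                   # WAL volume that forces a checkpoint
```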

Monitoring and Performance Analysis

Real-Time Monitoring Systems

Comprehensive monitoring requires tracking multiple metrics including query performance, resource utilization, and system health indicators. Monitoring tools like pgwatch2, pganalyze, and pgDash provide specialized PostgreSQL monitoring capabilities with customizable dashboards and alerting. These tools collect metrics from PostgreSQL's statistics views and system resources to provide holistic performance visibility.

Key performance indicators include query execution times, connection counts, lock wait times, and resource utilization patterns. Monitoring should track both current performance and historical trends to identify degradation patterns and capacity planning needs. Alert thresholds should be configured to provide early warning of potential issues while minimizing false positives.

Query Performance Analysis

Query performance monitoring involves tracking slow queries, analyzing execution plans, and identifying optimization opportunities. The pg_stat_statements extension provides detailed query statistics including execution counts, total time, and resource usage patterns. This extension helps identify the most resource-intensive queries and track performance changes over time.

EXPLAIN and EXPLAIN ANALYZE commands provide detailed information about query execution plans and actual performance characteristics. Regular analysis of execution plans helps identify suboptimal query strategies and opportunities for index optimization. Query optimization may involve index creation, query rewriting, or configuration parameter adjustments.
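A typical workflow with pg_stat_statements might look like the following sketch; the orders table in the EXPLAIN example is a placeholder, and the timing column names shown are those used in PostgreSQL 13 and later:

```sql
-- One-time setup (also requires pg_stat_statements in shared_preload_libraries)
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Top queries by cumulative execution time (PostgreSQL 13+ column names)
SELECT query, calls, total_exec_time, mean_exec_time, rows
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;

-- Inspect a suspect query's actual plan, timing, and buffer usage
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM orders WHERE customer_id = 42;
```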

Lock and Concurrency Monitoring

Lock monitoring identifies blocking queries and potential deadlock situations that can impact application performance. PostgreSQL's pg_locks and pg_stat_activity views provide detailed information about current lock states and waiting queries. Monitoring lock wait times helps identify queries requiring optimization or application-level changes.

Deadlock analysis involves examining deadlock logs and identifying query patterns that cause conflicts. Common deadlock scenarios include concurrent updates to the same rows in different orders and conflicts between DDL and DML operations. Resolving deadlocks may require application changes, transaction restructuring, or query optimization.
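A common way to surface blocked/blocking session pairs is the pg_blocking_pids() function; a query along these lines (one sketch among several possible formulations) shows who is waiting on whom:

```sql
-- Sessions waiting on locks, joined to the sessions blocking them
SELECT blocked.pid    AS blocked_pid,
       blocked.query  AS blocked_query,
       blocking.pid   AS blocking_pid,
       blocking.query AS blocking_query
FROM pg_stat_activity blocked
JOIN pg_stat_activity blocking
  ON blocking.pid = ANY (pg_blocking_pids(blocked.pid))
WHERE blocked.wait_event_type = 'Lock';
```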

Security Maintenance

Authentication and Access Control

PostgreSQL security maintenance involves regular review of authentication methods, user privileges, and access patterns. The pg_hba.conf file controls client authentication and should be reviewed regularly to ensure appropriate access restrictions. Strong authentication methods like SCRAM-SHA-256 should be preferred over legacy methods like MD5.

User privilege management follows the principle of least privilege, granting only necessary permissions for specific roles. Regular access reviews help identify unused accounts, excessive privileges, and potential security violations. Role-based access control (RBAC) provides structured privilege management and simplifies security administration.
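A restrictive pg_hba.conf might be structured like the sketch below; the database name, role, and subnet are placeholders, and rules are evaluated top to bottom so the final reject line catches anything not explicitly allowed:

```
# pg_hba.conf sketch -- database, user, and address values are placeholders
# TYPE   DATABASE  USER      ADDRESS        METHOD
local    all       postgres                 peer
hostssl  appdb     appuser   10.0.0.0/24    scram-sha-256
host     all       all       0.0.0.0/0      reject
```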

Security Monitoring and Auditing

Security monitoring involves tracking authentication attempts, privilege escalations, and data access patterns. The pgAudit extension provides comprehensive auditing capabilities for tracking database activities and maintaining compliance with security standards. Log analysis helps identify suspicious activities and potential security incidents.

Security patch management ensures timely application of security updates and vulnerability fixes. Regular security assessments evaluate the overall security posture and identify areas for improvement. Security incident response procedures should be documented and tested regularly.

Encryption and Data Protection

Data encryption involves implementing SSL/TLS for network communications and encryption at rest for sensitive data. SSL configuration requires proper certificate management and cipher suite selection to ensure secure communications. Regular review of encryption settings helps maintain security effectiveness and compliance requirements.

Backup encryption ensures data protection for backup files stored on external systems or cloud storage. Encryption key management involves secure key storage, rotation procedures, and access controls. Data classification helps identify sensitive information requiring additional protection measures.

Backup and Disaster Recovery

Backup Strategy Implementation

Comprehensive backup strategies combine logical and physical backup methods to provide flexible recovery options. Logical backups using pg_dump and pg_dumpall provide database portability and selective restore capabilities. Physical backups using pg_basebackup offer faster backup and restore operations for large databases.

Point-in-time recovery (PITR) capabilities require continuous WAL archiving and provide precise recovery to specific timestamps. WAL archiving should be configured to remote storage locations to protect against local disasters. Backup retention policies should balance recovery requirements with storage costs and compliance needs.
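Retention policies like those described above can be enforced with a small helper in the same style as the weekly maintenance script; this is a minimal sketch, assuming backups land in a single directory with a `.dump` suffix (both assumptions to adapt locally):

```shell
#!/usr/bin/env bash
# Sketch of a backup-retention helper. The flat directory layout and the
# *.dump naming convention are assumptions; adjust to your pg_dump output.
prune_backups() {
    local backup_dir="$1" retention_days="$2"
    # Basic safety checks before deleting anything
    [[ -d "$backup_dir" && "$backup_dir" != "/" ]] || return 1
    find "$backup_dir" -maxdepth 1 -type f -name '*.dump' \
        -mtime +"$retention_days" -delete
}
```

A cron entry could then call, e.g., `prune_backups /var/backups/pgsql 30` after each successful nightly dump, so retention is only applied once a fresh backup exists.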

Disaster Recovery Testing

Regular disaster recovery testing validates backup procedures and ensures recovery time objectives can be met. Testing should include full database restores, point-in-time recovery scenarios, and failover procedures for high-availability configurations. Documentation should be updated based on testing results and lessons learned.

Recovery procedures should be documented with step-by-step instructions and tested regularly to ensure effectiveness. Staff training ensures personnel can execute recovery procedures under pressure. Recovery testing should include scenarios such as hardware failures, data corruption, and site disasters.

High Availability and Replication

PostgreSQL streaming replication provides high availability and load distribution capabilities. Replication monitoring involves tracking lag times, connection status, and failover readiness. Delayed replicas provide protection against logical errors and malicious changes.

Failover procedures should be automated where possible and tested regularly to ensure reliability. Monitoring replication lag helps identify performance issues and potential data loss scenarios. Network connectivity and security configurations are critical for reliable replication operation.
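Replication lag can be checked on the primary with a query along these lines (a sketch for PostgreSQL 10 and later, where the WAL-based function and column names shown apply):

```sql
-- On the primary: per-standby state and approximate replay lag in bytes
SELECT application_name, state, sync_state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;
```

A monitoring system can alert when replay_lag_bytes exceeds a threshold appropriate to the recovery point objective.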

Performance Optimization Strategies

Workload-Specific Tuning

Performance optimization requires understanding workload characteristics and adjusting configurations accordingly. OLTP workloads benefit from lower work_mem settings, higher shared_buffers, and emphasis on connection pooling. OLAP workloads typically require higher work_mem, increased maintenance_work_mem, and parallel query configuration.

Mixed workloads require balanced configurations and may benefit from read replica implementations for analytical queries. Query routing can direct different workload types to appropriate database instances. Monitoring actual usage patterns helps validate tuning decisions and identify optimization opportunities.

Index Strategy Optimization

Index strategy development involves analyzing query patterns, identifying performance bottlenecks, and implementing appropriate index types. B-tree indexes suit most OLTP queries, while specialized index types like GIN and GiST benefit specific data types and query patterns. Composite indexes can improve performance for multi-column queries but increase maintenance overhead.

Index maintenance involves monitoring usage statistics and removing unused indexes that consume storage and slow write operations. Partial indexes can improve performance for queries with selective WHERE clauses. Index-only scans require proper visibility map maintenance through regular VACUUM operations.
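The index patterns above can be sketched as follows; the table and column names are hypothetical, and the unused-index query reports scans only since statistics were last reset:

```sql
-- Hypothetical examples; table and column names are placeholders
-- Composite index for a frequent two-column lookup
CREATE INDEX idx_orders_customer_date ON orders (customer_id, order_date);

-- Partial index covering only the rows a selective query touches
CREATE INDEX idx_orders_open ON orders (order_date) WHERE status = 'open';

-- Candidate unused indexes (zero scans since the last stats reset)
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0;
```

Indexes flagged by the last query should be confirmed against replicas and reporting jobs before being dropped, since statistics on one node do not capture usage elsewhere.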

Connection Pool Management

Connection pooling reduces connection overhead and improves resource utilization through tools like PgBouncer and Pgpool-II. Pool sizing should balance resource utilization with connection availability for peak loads. Connection pool monitoring helps identify bottlenecks and optimization opportunities.

Pool configuration involves setting appropriate pool sizes, timeout values, and connection limits based on application requirements. Monitoring connection patterns helps optimize pool settings and identify application issues. Pool maintenance includes regular restarts and configuration updates.
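As one concrete illustration, a minimal pgbouncer.ini might look like the fragment below; the database name, pool sizes, and limits are assumptions to size against actual connection demand:

```ini
; pgbouncer.ini sketch -- pool sizes and limits are assumptions
[databases]
appdb = host=127.0.0.1 port=5432 dbname=appdb

[pgbouncer]
pool_mode = transaction        ; server connection released at transaction end
default_pool_size = 20         ; server connections per database/user pair
max_client_conn = 500          ; client connections PgBouncer will accept
server_idle_timeout = 600      ; close idle server connections after 10 min
```

Transaction pooling gives the highest connection reuse but is incompatible with session-level features such as prepared statements held across transactions, so the mode must match application behavior.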

Automation and Tools

Maintenance Script Implementation

Automated maintenance scripts reduce manual effort and ensure consistent execution of routine tasks. Shell scripts can orchestrate complex maintenance procedures and provide comprehensive logging. Script scheduling through cron or similar tools enables unattended execution during maintenance windows.

Error handling in maintenance scripts should include appropriate logging, notification procedures, and rollback capabilities where applicable. Script testing in development environments ensures reliability before production deployment. Version control for maintenance scripts enables change tracking and rollback capabilities.

Monitoring and Alerting Systems

Comprehensive monitoring systems provide early warning of potential issues and enable proactive maintenance. Alert configuration should balance sensitivity with practical response capabilities. Escalation procedures ensure critical issues receive appropriate attention.

Monitoring tools should integrate with existing infrastructure management systems where possible. Dashboard design should provide clear visibility into system health and performance trends. Historical data retention enables trend analysis and capacity planning.

Maintenance Documentation

Comprehensive documentation ensures maintenance procedures can be executed consistently and effectively. Procedure documentation should include step-by-step instructions, expected results, and troubleshooting guidance. Regular documentation updates ensure accuracy and relevance.

Knowledge management systems help organize maintenance information and ensure accessibility during emergencies. Training materials ensure staff competency in maintenance procedures. Documentation reviews should be conducted regularly to identify gaps and improvement opportunities.

Emergency Response Procedures

Database Corruption Response

Database corruption requires immediate response to minimize data loss and restore service availability. Initial response involves stopping PostgreSQL immediately to prevent further damage. Corruption assessment using tools like pg_amcheck helps determine the extent of damage and appropriate recovery strategies.

Recovery procedures depend on corruption severity and available backup options. Point-in-time recovery may allow restoration to a point just before corruption occurred. In severe cases, data salvage operations may be necessary to recover partial information from damaged tables.

Performance Crisis Management

Performance emergencies require rapid identification and resolution of bottlenecks. Initial assessment involves identifying long-running queries, lock conflicts, and resource utilization patterns. Immediate actions may include terminating problematic queries, adjusting configuration parameters, or implementing temporary fixes.

Root cause analysis helps identify underlying issues requiring permanent solutions. Performance monitoring during crisis resolution ensures actions are effective. Post-incident reviews help improve response procedures and prevent recurrence.
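During triage, long-running statements can be found and, if necessary, stopped with queries along these lines; the five-minute threshold and the pid are placeholders:

```sql
-- Statements running longer than an example threshold of 5 minutes
SELECT pid, now() - query_start AS runtime, state, query
FROM pg_stat_activity
WHERE state <> 'idle'
  AND now() - query_start > interval '5 minutes';

-- Cancel just the current statement, or terminate the whole backend
SELECT pg_cancel_backend(12345);     -- pid is a placeholder
SELECT pg_terminate_backend(12345);  -- harsher: drops the connection
```

pg_cancel_backend is the safer first step, since pg_terminate_backend also severs the client connection.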

Security Incident Response

Security incidents require immediate containment and assessment to prevent further damage. Initial response involves isolating affected systems and assessing the scope of potential compromise. Security log analysis helps identify attack vectors and affected resources.

Incident documentation ensures proper forensic analysis and compliance reporting. Communication procedures ensure appropriate stakeholders are informed of security incidents. Recovery procedures should restore service while addressing security vulnerabilities.

Best Practices and Recommendations

Proactive Maintenance Philosophy

Successful PostgreSQL maintenance requires a proactive approach that prevents issues rather than reacting to problems. Regular maintenance scheduling ensures critical tasks are completed consistently. Monitoring and alerting provide early warning of potential issues.

Capacity planning helps anticipate future needs and prevent resource constraints. Staff training ensures competency in maintenance procedures and emergency response. Continuous improvement processes help optimize maintenance effectiveness.

Change Management and Testing

All maintenance procedures should be tested in development environments before production implementation. Change documentation ensures traceability and enables rollback if necessary. Staged rollouts help minimize risk for significant changes.

Backup verification ensures recovery capabilities before implementing changes. Monitoring during and after changes helps identify unexpected impacts. Post-change reviews help improve future maintenance procedures.

Compliance and Standards

Maintenance procedures should align with organizational policies and regulatory requirements. Documentation standards ensure consistency and completeness. Audit trails provide evidence of maintenance compliance.

Regular compliance reviews ensure ongoing adherence to standards. Staff training includes compliance requirements and procedures. Incident reporting ensures appropriate handling of compliance violations.

This comprehensive PostgreSQL maintenance plan provides the framework for maintaining database health, performance, and reliability through systematic approaches to routine maintenance, monitoring, security, and emergency response procedures.

Further Reading