Automated Duplicate Record Cleanup and DBA Notification in MariaDB with Python

“Efficiency meets automation: Keeping your MariaDB clean and your DBA informed with Python.”

MinervaDB Automation and Performance Engineering

Duplicate Record Cleanup

To Duplicate Record Cleanup in MariaDB, you can use SQL queries to identify and remove them. Below are the steps for doing this and a Python script for scheduling this task weekly and notifying the DBA via email. In this example, we’ll use the smtplib library to send email notifications.

Step 1: Delete Duplicate Records

You can identify and delete duplicate records using a SQL query that utilizes the DELETE statement with a self-join. Here’s a sample SQL query to delete duplicates from a table named your_table based on a specific criteria (e.g., duplicate email values):

DELETE t1
FROM your_table t1
JOIN your_table t2
ON t1.email = t2.email
WHERE t1.id > t2.id;

This query will delete all but one of the duplicate records, keeping the one with the lowest id value. Adjust the table name, columns, and criteria to match your database schema.

Step 2: Python Script for Weekly Scheduling and Email Notification

You can use Python’s schedule library to schedule the task and smtplib to send email notifications. Before running the script, make sure you have configured your SMTP email settings.

import schedule
import time
import smtplib
from email.mime.text import MIMEText

# Database Connection
import mysql.connector

db_connection = mysql.connector.connect(
    host="your_db_host",
    user="your_db_user",
    password="your_db_password",
    database="your_db_name"
)

db_cursor = db_connection.cursor()

# Function to Delete Duplicate Records
def delete_duplicates():
    # Execute the SQL query to delete duplicates
    delete_query = """
    DELETE t1
    FROM your_table t1
    JOIN your_table t2
    ON t1.email = t2.email
    WHERE t1.id > t2.id;
    """
    db_cursor.execute(delete_query)
    db_connection.commit()

# Function to Send Email Notification
def send_email_notification():
    # Email Configuration
    sender_email = "your_sender_email@gmail.com"
    sender_password = "your_sender_password"
    recipient_email = "dba_email@example.com"
    
    # Create the email message
    msg = MIMEText("Duplicate records have been deleted in the MariaDB.")
    msg["Subject"] = "MariaDB Duplicate Records Cleanup"
    msg["From"] = sender_email
    msg["To"] = recipient_email
    
    # Send the email
    try:
        server = smtplib.SMTP("smtp.gmail.com", 587)
        server.starttls()
        server.login(sender_email, sender_password)
        server.sendmail(sender_email, recipient_email, msg.as_string())
        server.quit()
        print("Email notification sent successfully.")
    except Exception as e:
        print(f"Email notification failed: {str(e)}")

# Schedule the task to run weekly
schedule.every().week.at("00:00:00").do(delete_duplicates)
schedule.every().week.at("00:15:00").do(send_email_notification)

# Run the scheduled tasks
while True:
    schedule.run_pending()
    time.sleep(1)

Make sure to replace the placeholders (e.g., your_db_host, your_db_user, your_db_password, your_db_name, your_table, and email details) with your actual database and email configuration. This script will delete duplicate records and send an email notification to the DBA every week on the specified schedule.

Note: Ensure that you have installed the required Python libraries (schedule, smtplib) using pip if they are not already installed. Additionally, consider running this script on a server or machine where it can be scheduled to run automatically.

 

About Shiv Iyer 497 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.