
“Efficiency meets automation: Keeping your MariaDB clean and your DBA informed with Python.”
MinervaDB Automation and Performance Engineering
Duplicate Record Cleanup
To Duplicate Record Cleanup in MariaDB, you can use SQL queries to identify and remove them. Below are the steps for doing this and a Python script for scheduling this task weekly and notifying the DBA via email. In this example, we’ll use the smtplib library to send email notifications.
Step 1: Delete Duplicate Records
You can identify and delete duplicate records using a SQL query that utilizes the DELETE statement with a self-join. Here’s a sample SQL query to delete duplicates from a table named your_table based on a specific criteria (e.g., duplicate email values):
DELETE t1
FROM your_table t1
JOIN your_table t2
ON t1.email = t2.email
WHERE t1.id > t2.id;
This query will delete all but one of the duplicate records, keeping the one with the lowest id value. Adjust the table name, columns, and criteria to match your database schema.
Step 2: Python Script for Weekly Scheduling and Email Notification
You can use Python’s schedule library to schedule the task and smtplib to send email notifications. Before running the script, make sure you have configured your SMTP email settings.
import schedule
import time
import smtplib
from email.mime.text import MIMEText
# Database Connection
import mysql.connector
db_connection = mysql.connector.connect(
host="your_db_host",
user="your_db_user",
password="your_db_password",
database="your_db_name"
)
db_cursor = db_connection.cursor()
# Function to Delete Duplicate Records
def delete_duplicates():
# Execute the SQL query to delete duplicates
delete_query = """
DELETE t1
FROM your_table t1
JOIN your_table t2
ON t1.email = t2.email
WHERE t1.id > t2.id;
"""
db_cursor.execute(delete_query)
db_connection.commit()
# Function to Send Email Notification
def send_email_notification():
# Email Configuration
sender_email = "your_sender_email@gmail.com"
sender_password = "your_sender_password"
recipient_email = "dba_email@example.com"
# Create the email message
msg = MIMEText("Duplicate records have been deleted in the MariaDB.")
msg["Subject"] = "MariaDB Duplicate Records Cleanup"
msg["From"] = sender_email
msg["To"] = recipient_email
# Send the email
try:
server = smtplib.SMTP("smtp.gmail.com", 587)
server.starttls()
server.login(sender_email, sender_password)
server.sendmail(sender_email, recipient_email, msg.as_string())
server.quit()
print("Email notification sent successfully.")
except Exception as e:
print(f"Email notification failed: {str(e)}")
# Schedule the task to run weekly
schedule.every().week.at("00:00:00").do(delete_duplicates)
schedule.every().week.at("00:15:00").do(send_email_notification)
# Run the scheduled tasks
while True:
schedule.run_pending()
time.sleep(1)
Make sure to replace the placeholders (e.g., your_db_host, your_db_user, your_db_password, your_db_name, your_table, and email details) with your actual database and email configuration. This script will delete duplicate records and send an email notification to the DBA every week on the specified schedule.
Note: Ensure that you have installed the required Python libraries (schedule, smtplib) using pip if they are not already installed. Additionally, consider running this script on a server or machine where it can be scheduled to run automatically.