# How to find outliers using the Median Absolute Deviation in PostgreSQL?

In PostgreSQL, you can use the Median Absolute Deviation (MAD) to identify outliers in a dataset. MAD is a robust statistic that measures the variability of a dataset, and it is less sensitive to outliers than traditional measures such as standard deviation. The MAD is defined as the median of the absolute deviations from the median of the dataset.

Here’s how you can use the MAD to identify outliers in PostgreSQL:

1. Calculate the median of the dataset:

SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY my_column) AS median
FROM my_table;

In this example, my_table is the name of the table containing the dataset, and my_column is the name of the column containing the data to be analyzed. The percentile_cont function calculates the median of the dataset.

2. Calculate the Median Absolute Deviation (MAD):

WITH median_value AS (
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY my_column) AS median
FROM my_table
)
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY abs(my_column – median_value.median)) AS mad
FROM my_table, median_value;

In this example, the WITH clause calculates the median of the dataset and stores it in a temporary table called median_value. The main query then calculates the MAD by calculating the absolute deviation of each data point from the median, and then calculating the median of these absolute deviations.

1. Identify outliers using the MAD:

WITH median_value AS (
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY my_column) AS median
FROM my_table