PostgreSQL is equipped with a robust suite of statistical functions that are essential for performing detailed data analysis directly within the database. These functions allow users to calculate various statistical measures, such as averages, variances, standard deviations, and correlations, directly from stored data. This capability not only streamlines the analytical process but also enhances performance by leveraging PostgreSQL’s powerful query optimization. Whether you’re analyzing financial data, customer metrics, or scientific measurements, PostgreSQL’s statistical functions provide the tools necessary to derive meaningful insights without the need for additional statistical software. This integration of statistical analysis into the database engine supports a wide range of applications, from business intelligence and market research to scientific research and predictive analytics, making PostgreSQL an invaluable tool for data scientists and analysts alike, Here is a comprehensive list of the different types of aggregate functions available, categorized by their use cases:
1. General-Purpose Aggregates
COUNT(expression)
: Counts non-null values in a set.SUM(expression)
: Sums up the values of the expression.AVG(expression)
: Calculates the average of the expression values.MIN(expression)
: Finds the minimum value of the expression.MAX(expression)
: Finds the maximum value of the expression.
2. Statistical Aggregates
STDDEV(expression)
: Computes the standard deviation of the input values.STDDEV_POP(expression)
: Computes the population standard deviation.STDDEV_SAMP(expression)
: Computes the sample standard deviation.VAR_POP(expression)
: Computes the population variance.VAR_SAMP(expression)
: Computes the sample variance.VARIANCE(expression)
: Alias forVAR_SAMP
.COVAR_POP(expression1, expression2)
: Calculates the population covariance between two expressions.COVAR_SAMP(expression1, expression2)
: Calculates the sample covariance.CORR(expression1, expression2)
: Calculates the correlation coefficient.
3. Ordered-Set Aggregates
MODE()
: Returns the mode of a set of values (the value that appears most frequently).PERCENTILE_CONT(fraction)
: Computes a percentile based on a continuous distribution of the column values.PERCENTILE_DISC(fraction)
: Computes a percentile based on a discrete distribution of the column values.
4. Hypothetical-Set Aggregates
- Similar to ordered-set aggregates but include a hypothetical element:
RANK(expression)
DENSE_RANK(expression)
PERCENT_RANK(expression)
CUME_DIST(expression)
5. Array Aggregates
ARRAY_AGG(expression)
: Aggregates values, including nulls, into an array.STRING_AGG(expression, delimiter)
: Concatenates input values into a string, separated by a delimiter.
6. JSON Aggregates
JSON_AGG(expression)
: Aggregates values as a JSON array.JSONB_AGG(expression)
: Aggregates values as a JSONB array.JSON_OBJECT_AGG(key, value)
: Aggregates input as a JSON object.JSONB_OBJECT_AGG(key, value)
: Aggregates input as a JSONB object.
7. XML Aggregates
XMLAGG(expression)
: Concatenates XML values.
8. Grouping Operations
GROUPING(expression)
: Identifies the level of aggregation, particularly in GROUP BY GROUPING SETS.
Specialized Aggregates
BIT_AND(expression)
: Performs a bitwise AND operation on all input values.BIT_OR(expression)
: Performs a bitwise OR operation on all input values.BOOL_AND(expression)
: Returns TRUE if all input values are TRUE.BOOL_OR(expression)
: Returns TRUE if at least one input value is TRUE.
These aggregate functions are versatile tools for data analysis, allowing complex statistical, mathematical, and group-based operations to be performed directly within SQL queries, enhancing the analytical capabilities of PostgreSQL.
More PostgreSQL Blogs
How to replace subqueries containing aggregate functions in PostgreSQL with Windows functions?
How to find outliers using the Median Absolute Deviation in PostgreSQL?