Comprehensive Guide to Aggregate Functions in PostgreSQL

PostgreSQL is equipped with a robust suite of statistical functions that are essential for performing detailed data analysis directly within the database. These functions allow users to calculate various statistical measures, such as averages, variances, standard deviations, and correlations, directly from stored data. This capability not only streamlines the analytical process but also enhances performance by leveraging PostgreSQL’s powerful query optimization. Whether you’re analyzing financial data, customer metrics, or scientific measurements, PostgreSQL’s statistical functions provide the tools necessary to derive meaningful insights without the need for additional statistical software. This integration of statistical analysis into the database engine supports a wide range of applications, from business intelligence and market research to scientific research and predictive analytics, making PostgreSQL an invaluable tool for data scientists and analysts alike, Here is a comprehensive list of the different types of aggregate functions available, categorized by their use cases:

1. General-Purpose Aggregates

  • COUNT(expression): Counts non-null values in a set.
  • SUM(expression): Sums up the values of the expression.
  • AVG(expression): Calculates the average of the expression values.
  • MIN(expression): Finds the minimum value of the expression.
  • MAX(expression): Finds the maximum value of the expression.

2. Statistical Aggregates

  • STDDEV(expression): Computes the standard deviation of the input values.
  • STDDEV_POP(expression): Computes the population standard deviation.
  • STDDEV_SAMP(expression): Computes the sample standard deviation.
  • VAR_POP(expression): Computes the population variance.
  • VAR_SAMP(expression): Computes the sample variance.
  • VARIANCE(expression): Alias for VAR_SAMP.
  • COVAR_POP(expression1, expression2): Calculates the population covariance between two expressions.
  • COVAR_SAMP(expression1, expression2): Calculates the sample covariance.
  • CORR(expression1, expression2): Calculates the correlation coefficient.

3. Ordered-Set Aggregates

  • MODE(): Returns the mode of a set of values (the value that appears most frequently).
  • PERCENTILE_CONT(fraction): Computes a percentile based on a continuous distribution of the column values.
  • PERCENTILE_DISC(fraction): Computes a percentile based on a discrete distribution of the column values.

4. Hypothetical-Set Aggregates

  • Similar to ordered-set aggregates but include a hypothetical element:
    • RANK(expression)
    • DENSE_RANK(expression)
    • PERCENT_RANK(expression)
    • CUME_DIST(expression)

5. Array Aggregates

  • ARRAY_AGG(expression): Aggregates values, including nulls, into an array.
  • STRING_AGG(expression, delimiter): Concatenates input values into a string, separated by a delimiter.

6. JSON Aggregates

  • JSON_AGG(expression): Aggregates values as a JSON array.
  • JSONB_AGG(expression): Aggregates values as a JSONB array.
  • JSON_OBJECT_AGG(key, value): Aggregates input as a JSON object.
  • JSONB_OBJECT_AGG(key, value): Aggregates input as a JSONB object.

7. XML Aggregates

  • XMLAGG(expression): Concatenates XML values.

8. Grouping Operations

  • GROUPING(expression): Identifies the level of aggregation, particularly in GROUP BY GROUPING SETS.

Specialized Aggregates

  • BIT_AND(expression): Performs a bitwise AND operation on all input values.
  • BIT_OR(expression): Performs a bitwise OR operation on all input values.
  • BOOL_AND(expression): Returns TRUE if all input values are TRUE.
  • BOOL_OR(expression): Returns TRUE if at least one input value is TRUE.

These aggregate functions are versatile tools for data analysis, allowing complex statistical, mathematical, and group-based operations to be performed directly within SQL queries, enhancing the analytical capabilities of PostgreSQL.

More PostgreSQL Blogs 

How to replace subqueries containing aggregate functions in PostgreSQL with Windows functions?

How to compute cost of slow query in PostgreSQL?

How to find outliers using the Median Absolute Deviation in PostgreSQL?

About Shiv Iyer 452 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.