Why SELECT * from is bad for SQL performance?

Using SELECT * can be bad for SQL performance for several reasons, including the following:

  1. Unnecessary columns: SELECT * returns all columns in a table, even those that aren’t needed for the query. This means that more data is returned than is necessary, which can result in slower query performance. For example, consider a table with 100 columns, but the query only needs 5. By using SELECT *, the query returns all 100 columns, even though only 5 are needed.
  2. Increased network traffic: By returning more data than is needed, SELECT * can increase network traffic between the database server and the application server. This can slow down query performance, especially in high-traffic environments.
  3. Larger query result size: SELECT * can return a larger result set than is needed, which can be a performance issue for large tables. For example, consider a table with millions of rows, but the query only needs a few hundred. By using SELECT *, the query returns millions of rows, which can be slow to process and transfer.
  4. Increased memory usage: Returning more data than is needed can also increase memory usage on the database server, especially if the query result set is large.
  5. Slower indexing: When a query uses SELECT *, the database may need to scan more data than is necessary, which can slow down the indexing process. This is because the database needs to check each column to determine which columns to include in the index.
  6. Joins and subqueries: SELECT * can cause issues when used with joins and subqueries. For example, when using a join, the result set may contain duplicate column names, which can cause conflicts and errors.
  7. Query optimization: SELECT * can make query optimization more difficult, as the database engine needs to consider all columns in the table when optimizing the query. This can lead to less efficient query plans, which can result in slower query execution.
  8. Maintenance: Using SELECT * can make it more difficult to maintain the query over time. For example, if the table schema changes, the query may need to be updated to exclude new columns. If the query is a part of a larger application, this can be a significant maintenance issue.

Overall, it is best practice to only select the columns that are needed for a query, rather than using SELECT *. This can result in faster query performance and reduce the risk of performance issues in high-traffic environments.

Conclusion

Using SELECT * in SQL queries can lead to inefficient performance due to unnecessary data retrieval, increased network traffic, larger result sets, higher memory usage, slower indexing, and challenges with joins and subqueries. It can also complicate query optimization and maintenance. To optimize SQL performance and reduce potential issues, it’s recommended to specify only the required columns in SELECT statements.

About Shiv Iyer 444 Articles
Open Source Database Systems Engineer with a deep understanding of Optimizer Internals, Performance Engineering, Scalability and Data SRE. Shiv currently is the Founder, Investor, Board Member and CEO of multiple Database Systems Infrastructure Operations companies in the Transaction Processing Computing and ColumnStores ecosystem. He is also a frequent speaker in open source software conferences globally.