Is the GROUP BY clause in SQL redundant?
NickName:Mike Chamberlain Ask DateTime:2010-12-22T09:12:00

Whenever we use an aggregate function in SQL (MIN, MAX, AVG etc), we must always GROUP BY all non-aggregated columns, for instance:

SELECT storeid, storename, SUM(revenue), COUNT(*)
FROM Sales 
GROUP BY storeid, storename

It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause.

SELECT (2 * (x + y)) / z + 1, MyFunction(x, y), SUM(z)
FROM AnotherTable
GROUP BY (2 * (x + y)) / z + 1, MyFunction(x, y)

If we ever change the SELECT statement, we must remember to make the same change to our GROUP BY clause.

So is the GROUP BY clause redundant?

  • If this is indeed the case, then why is there a GROUP BY clause in SQL at all?
  • If this is not the case, then what extra functionality does GROUP BY give us?

Copyright Notice:Content Author:「Mike Chamberlain」,Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article:https://stackoverflow.com/questions/4505406/is-the-group-by-clause-in-sql-redundant

Answers
Mark Byers 2010-12-22T01:19:56

"Whenever we use an aggregate function in SQL (MIN, MAX, AVG etc), we must always GROUP BY all non-aggregated columns"

This is not true in general. MySQL for example doesn't require this, and the SQL standard doesn't say this either.

See: Debunking GROUP BY myths

"It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause."

Also not true in general. MySQL (and perhaps other databases too) allow column aliases to be used in the GROUP BY clause:

SELECT (2 * (x + y)) / z + 1 AS a, MyFunction(x, y) AS b, SUM(z)
FROM AnotherTable
GROUP BY a, b

"If this is not the case, then what extra functionality does GROUP BY give us?"

The only way of specifying what to group by is to use a GROUP BY clause. You cannot necessarily deduce it from the columns mentioned in the SELECT. In fact you don't even have to select all the columns mentioned in the GROUP BY:

SELECT MAX(col2)
FROM foo
GROUP BY col1
HAVING COUNT(*) = 2
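
The last query in this answer can be tried out with a minimal sketch like the one below. It assumes Python's built-in sqlite3 module and a made-up table foo with a handful of invented rows; neither comes from the original answer. It only illustrates that the grouping column never has to appear in the SELECT list.

```python
import sqlite3

# Hypothetical table and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO foo VALUES (?, ?)",
    [("a", 1), ("a", 5), ("b", 7), ("c", 2), ("c", 3), ("c", 9)],
)

# col1 defines the groups but is not selected, so the grouping
# could not be inferred from the SELECT list alone.
rows = conn.execute(
    "SELECT MAX(col2) FROM foo GROUP BY col1 HAVING COUNT(*) = 2"
).fetchall()
print(rows)
```

Only the group for col1 = 'a' has exactly two rows, so a single value comes back even though col1 is not part of the output.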


BeemerGuy 2010-12-22T01:20:29

I may agree with what you're saying, but it is not redundant in all cases.

Consider this:

SELECT FirstName
    + ' (' + REPLACE(Address1, ',', ' ') + ' '
    + REPLACE(Address2, ',', ' ') + ', '
    + UPPER(State) + ' '
    + 'USA)',
    COUNT(*)
FROM Profiles
GROUP BY FirstName, Address1, Address2, State

In this case I just want the number of same-first-name, same-address profiles.
As you can see, I didn't have to repeat the "complex" operations of the SELECT in the GROUP BY statement.

I think to allow this "sometimes like this, sometimes like that", you are taxed with having to do repetitions most of the time.
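
One way to see why the two forms are not interchangeable is the sketch below. It assumes Python's built-in sqlite3 module, a cut-down hypothetical Profiles table with only FirstName and Address1, and SQLite's || operator in place of + for string concatenation. Grouping by the raw columns and grouping by the displayed expression can split the rows differently, so the database cannot safely guess one from the other.

```python
import sqlite3

# Hypothetical, simplified Profiles table; the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Profiles (FirstName TEXT, Address1 TEXT)")
conn.executemany(
    "INSERT INTO Profiles VALUES (?, ?)",
    [("Ann", "1,Main St"), ("Ann", "1 Main St"), ("Ann", "1 Main St")],
)

# Display expression: commas in the address are replaced by spaces.
label = "FirstName || ' (' || REPLACE(Address1, ',', ' ') || ')'"

# Grouping by the raw columns keeps '1,Main St' and '1 Main St' apart.
by_columns = conn.execute(
    f"SELECT {label}, COUNT(*) FROM Profiles "
    "GROUP BY FirstName, Address1 ORDER BY COUNT(*)"
).fetchall()

# Grouping by the expression merges them, since both render identically.
by_expression = conn.execute(
    f"SELECT {label}, COUNT(*) FROM Profiles GROUP BY {label}"
).fetchall()

print(by_columns)
print(by_expression)
```

The first query returns two rows with the same label but separate counts, while the second returns one row covering all three profiles.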


OMG Ponies 2010-12-22T04:23:06

The GROUP BY clause is not redundant -- its function is to define the scope that the aggregate functions work on. It's your belief that the optimizer should read from the SELECT clause to know what the scope of the grouping is, but access to column aliases is available in the ORDER BY clause at the earliest (with the exception of MySQL, where the GROUP BY and HAVING clauses support column aliases). There's currently no means to support your expectation. ANSI standards are nice, but the reality is that vendors don't implement them in their entirety. It's hunt & peck support, like how PostgreSQL 8.4+ supports more analytic functions than Oracle (certainly more than SQL Server).

MySQL and SQLite support omitting columns from the GROUP BY, but those column values are, per the documentation, arbitrary -- the value cannot be guaranteed to be returned consistently. And the scope of the grouping is also different, which has the potential to drastically affect the result set returned. Then there's the problem of relying on vendor-specific syntax while needing to port to other databases, because DB2, Oracle, SQL Server and PostgreSQL do not support the functionality.

But with the advent of analytic/windowing/ranking functionality, you can get aggregate functionality without the GROUP BY. For example:

SELECT t.id,
       COUNT(t.column) OVER(PARTITION BY t.id) AS num,
       SUM(t.column) OVER(PARTITION BY t.id) AS sum
  FROM YOUR_TABLE t

It's more verbose, and prone to error though, because you can't define a PARTITION BY/ORDER BY that applies to all the analytic functions in a query. Currently... But analytics won't supplant aggregates any time soon -- support started in Oracle 9i, SQL Server 2005+, and PostgreSQL 8.4+. I'm aware that DB2 supports analytics, but I don't know details beyond that.
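
The difference between the windowed form and a plain GROUP BY can be sketched as follows. This assumes Python's built-in sqlite3 module linked against SQLite 3.25 or later (the first release with window functions), plus a hypothetical your_table with an amount column standing in for t.column. Neither is part of the original answer.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (id INTEGER, amount INTEGER)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?)",
    [(1, 10), (1, 20), (2, 5)],
)

# GROUP BY collapses each id into a single output row.
grouped = conn.execute(
    "SELECT id, COUNT(amount), SUM(amount) "
    "FROM your_table GROUP BY id ORDER BY id"
).fetchall()

# Window functions compute the same aggregates per partition,
# but keep one output row for every input row.
windowed = conn.execute(
    "SELECT id, "
    "COUNT(amount) OVER (PARTITION BY id) AS num, "
    "SUM(amount) OVER (PARTITION BY id) AS total "
    "FROM your_table ORDER BY id"
).fetchall()

print(grouped)
print(windowed)
```

The aggregate values match, but the windowed query repeats them on every row of a partition, so it complements GROUP BY and does not replace it directly.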

