NickName:Mike Chamberlain Ask DateTime:2010-12-22T09:12:00

Is the GROUP BY clause in SQL redundant?

Whenever we use an aggregate function in SQL (MIN, MAX, AVG etc), we must always GROUP BY all non-aggregated columns, for instance:

SELECT storeid, storename, SUM(revenue), COUNT(*)
FROM Sales 
GROUP BY storeid, storename

It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause.

SELECT (2 * (x + y)) / z + 1, MyFunction(x, y), SUM(z)
FROM AnotherTable
GROUP BY (2 * (x + y)) / z + 1, MyFunction(x, y)

If we ever change the SELECT statement, we must remember to make the same change to our GROUP BY clause.

So is the GROUP BY clause redundant?

If this is indeed the case, then why is there a GROUP BY clause in SQL at all?
If this is not the case, then what extra functionality does GROUP BY give us?

Copyright Notice：Content Author:「Mike Chamberlain」，Reproduced under the CC 4.0 BY-SA copyright license with a link to the original source and this disclaimer.
Link to original article：https://stackoverflow.com/questions/4505406/is-the-group-by-clause-in-sql-redundant

Answers

Mark Byers 2010-12-22T01:19:56

"Whenever we use an aggregate function in SQL (MIN, MAX, AVG etc), we must always GROUP BY all non-aggregated columns"

This is not true in general. MySQL, for example, doesn't require this, and the SQL standard doesn't say this either.

Debunking GROUP BY myths

"It becomes even more intrusive when we use a function or other calculation in our SELECT statement, as this must also be copied to the GROUP BY clause."

Also not true in general. MySQL (and perhaps other databases too) allows column aliases to be used in the GROUP BY clause:

SELECT (2 * (x + y)) / z + 1 AS a, MyFunction(x, y) AS b, SUM(z)
FROM AnotherTable
GROUP BY a, b

"If this is not the case, then what extra functionality does GROUP BY give us?"

The only way of specifying what to group by is to use a GROUP BY clause. You cannot necessarily deduce it from the columns mentioned in the SELECT. In fact, you don't even have to select all the columns mentioned in the GROUP BY:

SELECT MAX(col2)
FROM foo
GROUP BY col1
HAVING COUNT(*) = 2
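
As a rough illustration of the two points above, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names mirror the answer, but the rows are made up for the example, and SQLite is only a stand-in: other engines may behave differently, particularly on whether aliases are accepted in GROUP BY.

```python
import sqlite3

# In-memory database; the rows below are an assumption for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foo (col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO foo (col1, col2) VALUES (?, ?)",
    [("a", 1), ("a", 5), ("b", 7), ("c", 2), ("c", 3), ("c", 9)],
)

# col1 drives the grouping but is never selected, so the grouping could not
# be inferred from the SELECT list alone.
max_of_pairs = conn.execute(
    "SELECT MAX(col2) FROM foo GROUP BY col1 HAVING COUNT(*) = 2"
).fetchall()
print(max_of_pairs)  # [(5,)]  only group "a" has exactly two rows

# SQLite, like MySQL, accepts a SELECT alias in GROUP BY.
by_parity = conn.execute(
    "SELECT col2 % 2 AS parity, COUNT(*) FROM foo "
    "GROUP BY parity ORDER BY parity"
).fetchall()
print(by_parity)  # [(0, 1), (1, 5)]
```

The first query returns a single row even though three distinct col1 values exist, because HAVING filters on a per-group count that the SELECT list never mentions.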

BeemerGuy 2010-12-22T01:20:29

I may agree with what you're saying, but it is not redundant in all cases.

Consider this:

SELECT FirstName
    + ' (' + REPLACE(Address1, ',', ' ') + ' '
    + REPLACE(Address2, ',', ' ') + ', '
    + UPPER(State) + ' '
    + 'USA)',
    COUNT(*)
FROM Profiles
GROUP BY FirstName, Address1, Address2, State

In this case I just want the number of same-first-name, same-address profiles.
As you can see, I didn't have to repeat the "complex" operations of the SELECT in the GROUP BY clause.

I think the price of allowing this flexibility ("sometimes like this, sometimes like that") is that you have to repeat yourself most of the time.
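
To make the difference concrete, the sketch below groups the same rows two ways: by the raw columns, and by the formatted expression. It assumes SQLite via Python's sqlite3 module, so string concatenation is written `||` rather than `+`. The Profiles table is cut down to three columns and the rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Profiles (FirstName TEXT, Address1 TEXT, State TEXT)")
conn.executemany(
    "INSERT INTO Profiles VALUES (?, ?, ?)",
    [
        ("Ann", "1 Main,St", "ny"),
        ("Ann", "1 Main,St", "ny"),
        ("Ann", "1 Main St", "ny"),  # same address once the comma is replaced
        ("Bob", "2 Oak Ave", "ca"),
    ],
)

# Display expression, simplified from the answer (hypothetical shortening).
label = "FirstName || ' (' || REPLACE(Address1, ',', ' ') || ', ' || UPPER(State) || ')'"

# Group by the raw columns: "1 Main,St" and "1 Main St" stay separate groups.
by_columns = conn.execute(
    f"SELECT {label} AS label, COUNT(*) FROM Profiles "
    "GROUP BY FirstName, Address1, State ORDER BY FirstName, Address1"
).fetchall()

# Group by the formatted expression: the two spellings collapse into one group.
by_label = conn.execute(
    f"SELECT {label} AS label, COUNT(*) FROM Profiles "
    f"GROUP BY {label} ORDER BY label"
).fetchall()

print(by_columns)
# [('Ann (1 Main St, NY)', 1), ('Ann (1 Main St, NY)', 2), ('Bob (2 Oak Ave, CA)', 1)]
print(by_label)
# [('Ann (1 Main St, NY)', 3), ('Bob (2 Oak Ave, CA)', 1)]
```

Both queries have the same SELECT list yet return different counts, which is why the grouping has to be stated explicitly.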

OMG Ponies 2010-12-22T04:23:06

The GROUP BY clause is not redundant -- its function is to define the scope that the aggregate functions work on. You expect the optimizer to read the SELECT clause to work out the scope of the grouping, but column aliases only become available in the ORDER BY clause at the earliest (with the exception of MySQL, where the GROUP BY and HAVING clauses support column aliases). There's currently no means to support your expectation. ANSI standards are nice, but the reality is that vendors don't implement them in their entirety. Support is hunt & peck, like how PostgreSQL 8.4+ supports more analytic functions than Oracle (certainly more than SQL Server).

MySQL and SQLite support omitting columns from the GROUP BY, but those column values are, per the documentation, arbitrary -- the value cannot be guaranteed to be returned consistently. The scope of the grouping is also different, which has the potential to drastically affect the result set returned. Then there's the problem of relying on vendor-specific syntax while needing to port to other databases, because DB2, Oracle, SQL Server and PostgreSQL do not support the functionality.

But with the advent of analytic/windowing/ranking functionality, you can get aggregate functionality without the GROUP BY. For example:

SELECT t.id,
       COUNT(t.column) OVER(PARTITION BY t.id) AS num,
       SUM(t.column) OVER(PARTITION BY t.id) AS sum
  FROM YOUR_TABLE t

It's more verbose, and prone to error though, because you can't define a PARTITION BY/ORDER BY that applies to all the analytic functions in a query. Currently... But analytics won't supplant aggregates any time soon -- support started in Oracle 9i, SQL Server 2005+, and PostgreSQL 8.4+. I'm aware that DB2 supports analytics, but I don't know details beyond that.
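
The windowed query above can be compared with its GROUP BY counterpart in a short sketch. It assumes Python's sqlite3 module linked against SQLite 3.25 or later (the release that added window functions). The table `t`, its `val` column and the rows are hypothetical stand-ins for YOUR_TABLE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER, val INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 10), (1, 20), (2, 5)])

# Window functions: every input row is kept, each carrying its partition's totals.
windowed = conn.execute(
    "SELECT id, COUNT(val) OVER (PARTITION BY id) AS num, "
    "SUM(val) OVER (PARTITION BY id) AS total "
    "FROM t ORDER BY id, val"
).fetchall()
print(windowed)  # [(1, 2, 30), (1, 2, 30), (2, 1, 5)]

# GROUP BY: one output row per group.
grouped = conn.execute(
    "SELECT id, COUNT(val), SUM(val) FROM t GROUP BY id ORDER BY id"
).fetchall()
print(grouped)  # [(1, 2, 30), (2, 1, 5)]
```

The windowed form repeats the totals on every row of a partition, so matching the GROUP BY output takes an extra de-duplication step such as SELECT DISTINCT. Where a named WINDOW clause is available (SQLite and PostgreSQL accept one), the partition can be declared once and reused, which eases some of the verbosity.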