The SQL clause GROUP BY takes the results of a SELECT command, divides it into groups, and produces a result set with one row per group. Aggregate functions can then be used to make summary info on each of the groups. Without GROUP BY, aggregate functions would be used on the whole result set made by the SELECT statement. The WITH ROLLUP and WITH CUBE clause provide additional detail or summary information to the result set produced by the SELECT and GROUP BY combination.
The syntax is as follows. The parts in bold were not included in my section on the SELECT Command.
SELECT [ALL | DISTINCT] [TOP n [PERCENT]] SelectList [INTO NewTable] [FROM TableSources] [WHERE WhereRowSearchCondition] [ GROUP BY [ALL] GroupList [WITH [ROLLUP | CUBE]] [HAVING HavingRowSearchCondition] ] [ORDER BY OrderList [ASC | DESC]]
If a SELECT command has any combination of WHERE, GROUP BY, HAVING, and ORDER BY clauses, then they are applied in that order.
The GROUP BY clause takes the result set of the SELECT command so far and reduces it into aggregate rows according to the columns listed in the GroupList. The ALL keyword returns all records instead of eliminating duplicates, the default behavior. Here are a few particulars about the GROUP BY clause:
GroupList has columns or non-aggregate expressions that reference a column. EG:
select Yr=Year(DateStarted), Charges=sum(ChargeAmount) from Service group by Year(DateStarted) order by Year(DateStarted)
SelectList must exist in the GroupList.SelectList and/or the HavingRowSearchCondition. Some aggregate functions include the following:The HAVING clause is sort of like a WHERE clause for the results of a SELECT commands that make use of the GROUP BY clause. The difference is that the WHERE clause applies before aggregation occurs, where as the HAVING clause applies on the results after aggregation has occurred.
The ROLLUP clause specifies that in addition to the aggregate rows produced by the SELECT and GROUP BY combination, there will also be aggregates for the hierarchy of values in the columns (aka dimensions in data warehouse terminology) in the GroupList. This is sort of like grand totals. The order of the columns in the GroupList can affect the number of rows produced.
The WITH CUBE clause specifies that in addition to the aggregate rows produced by the SELECT and GROUP BY combination, there will also be aggregates for all combinations of values in the columns (aka dimensions in data warehouse terminology) in the GroupList. The order of the columns in the GroupList is irrelevant.
Assume that the titles table has these values:
| title_id | title | type | pub_id | price |
|---|---|---|---|---|
| BU1032 | The Busy Executive's Database Guide | business | 1389 | 19.99 |
| BU1111 | Cooking with Computers: Surreptitious Balance Sheets | business | 1389 | 11.95 |
| BU2075 | You Can Combat Computer Stress! | business | 736 | 2.99 |
| BU7832 | Straight Talk About Computers | business | 1389 | 19.99 |
| MC2222 | Silicon Valley Gastronomic Treats | mod_cook | 877 | 19.99 |
| MC3021 | The Gourmet Microwave | mod_cook | 877 | 2.99 |
| MC3026 | The Psychology of Computer Cooking | UNDECIDED | 877 | |
| PC1035 | But Is It User Friendly? | popular_comp | 1389 | 22.95 |
| PC8888 | Secrets of Silicon Valley | popular_comp | 1389 | 20 |
| PC9999 | Net Etiquette | popular_comp | 1389 | |
| PS1372 | Computer Phobic AND Non-Phobic Individuals: Behavior Variations | psychology | 877 | 21.59 |
| PS2091 | Is Anger the Enemy? | psychology | 736 | 10.95 |
| PS2106 | Life Without Fear | psychology | 736 | 7 |
| PS3333 | Prolonged Data Deprivation: Four Case Studies | psychology | 736 | 19.99 |
| PS7777 | Emotional Security: A New Algorithm | psychology | 736 | 7.99 |
| TC3218 | Onions, Leeks, and Garlic: Cooking Secrets of the Mediterranean | trad_cook | 877 | 20.95 |
| TC4203 | Fifty Years in Buckingham Palace Kitchens | trad_cook | 877 | 11.95 |
| TC7777 | Sushi, Anyone? | trad_cook | 877 | 14.99 |
This SELECT command and GROUP BY clause might aggregate 18 rows into 3 aggregate rows.
SELECT pub_id FROM titles GROUP BY pub_id
Results:
pub_id ------ 0736 0877 1389
This SELECT command and GROUP BY clause might distinguish 18 rows via 2 columns for 8 aggregate rows. This will show two other things: (1) How many rows were aggregated into each aggregate row. (2) The average price from all the rows that made each aggregate row.
SELECT pub_id, type, cnt = count(*), avg(price) as avg FROM titles GROUP BY pub_id, type
Results:
pub_id type cnt avg --------------------------------------- 0736 business 1 2.9900 1389 business 3 17.3100 0877 mod_cook 2 11.4900 1389 popular_comp 3 21.4750 0736 psychology 4 11.4825 0877 psychology 1 21.5900 0877 trad_cook 3 15.9633 0877 UNDECIDED 1 NULL
This HAVING clause will show the same except but only for those whose average price is above a certain level. It will then order the result set by the average.
SELECT pub_id, type, count(*), avg(price) FROM titles GROUP BY pub_id, type HAVING avg(price) > 10 ORDER BY avg(price)
Results:
pub_id type --------------------------------------- 0736 psychology 4 11.4825 0877 mod_cook 2 11.4900 0877 trad_cook 3 15.9633 1389 business 3 17.3100 1389 popular_comp 3 21.4750 0877 psychology 1 21.5900
This example has a table (t1), which has more than one row per fkID, and returns a list of fkIDs that have more than one row and how many rows that fkID has.
SELECT fkID, count(fkID) as rowCount FROM t1 GROUP BY fkID HAVING count(fkID) > 1 ORDER BY rowCount desc
This example has a table (t1) that has more than one row per fkID, and returns a list of rows for fkIDs that have only one row.
SELECT * FROM t1 WHERE fkID NOT IN(
SELECT fkID
FROM t1
GROUP BY fkID
HAVING count(fkID) > 1)
This example has a table (t1) that has more than one row per fkID, and returns one row for each distinct fkID.
SELECT * FROM t1 WHERE ID IN(
SELECT MAX(ID)
FROM t1
GROUP BY fkID)
Let us say that you have the following table Games and that it has other rows with other UserIDs.
ID UserID Score Time ============================ 1 1 7 9:01 2 1 8 9:02 3 1 6 9:03
Finding the max/min scores and times per user is easy:
SELECT MIN(Score) as MaxScore, MAX(Time) as MaxTime FROM Games GROUP BY UserID
If we want all the columns for a given UserID for the latest time?
--In PostgreSQL this is trivial! SELECT DISTINCT ON (UserID), * FROM Games ORDER BY UserID
Unfortunately, DISTINCT ON() is not standard SQL. Standard SQL has to use a much more limited and harder workaround.
SELECT *
FROM Games G JOIN (
SELECT MAX(ID) as G2ID
FROM Games
GROUP BY UserID
) G2 ON G2.G2ID = G.ID
ORDER BY UserID
This cheesy solution assumes that Games.ID is in chronological order. If you wanted to do something like get the time of the max score, then you have to be much more creative.
How to do a running total. Assume a table with two fields (strDate and intPatient).
First we'll show how many patients there are per date.
select sum(intPatient) as Patients, strDate as [Date] from fact_Patient group by strDate having strDate is not null order by strDate /* That makes something like this: Date Patients ================== 200501 1 200502 2 200503 1 */
Second we'll do the running total.
select P.strDate as [Date], (
select sum(intPatient) from fact_Patient
where strDate <= P.strDate
) as 'Cumulative Patients'
from fact_Patient as P
group by strDate
having strDate is not null
order by strDate
/* That makes something like this:
Date Cumulative Patients
=============================
200501 1
200502 3
200503 4
*/
For more, see "SQL Server: Calculating Running Totals, Subtotals and Grand Total Without a Cursor" [databasejournal.com/features/mssql/article.php/3112381] and "Calculating Running Totals" [http://www.sqlteam.com/article/calculating-running-totals].
If an Inventory contains this list:
Item Color Quantity -------------------- -------------------- -------------------------- Table Blue 124 Table Red 223 Chair Blue 101 Chair Red 210
Then ROLLUP can generate aggregates for the hierarchy of values in Item and Color:
SELECT CASE WHEN (GROUPING(Item) = 1) THEN 'ALL'
ELSE ISNULL(Item, 'UNKNOWN')
END AS Item,
CASE WHEN (GROUPING(Color) = 1) THEN 'ALL'
ELSE ISNULL(Color, 'UNKNOWN')
END AS Color,
SUM(Quantity) AS QtySum
FROM Inventory
GROUP BY Item, Color WITH ROLLUP
Item Color QtySum
-------------------- -------------------- --------------------------
Chair Blue 101.00
Chair Red 210.00
Chair ALL 311.00
Table Blue 124.00
Table Red 223.00
Table ALL 347.00
ALL ALL 658.00
(7 row(s) affected)
Note that ROLLUP returns NULL for the column/dimension being aggregated but the GROUPING function was used to convert the NULL to ALL.
If the same table is used as in the ROLLUP example, then CUBE can generate aggregates for all combinations of values in Item and Color:
SELECT CASE WHEN (GROUPING(Item) = 1) THEN 'ALL'
ELSE ISNULL(Item, 'UNKNOWN')
END AS Item,
CASE WHEN (GROUPING(Color) = 1) THEN 'ALL'
ELSE ISNULL(Color, 'UNKNOWN')
END AS Color,
SUM(Quantity) AS QtySum
FROM Inventory
GROUP BY Item, Color WITH ROLLUP
Item Color QtySum
-------------------- -------------------- --------------------------
Chair Blue 101.00
Chair Red 210.00
Chair ALL 311.00
Table Blue 124.00
Table Red 223.00
Table ALL 347.00
ALL ALL 658.00
ALL Blue 225.00
ALL Red 433.00
(9 row(s) affected)
Note that that CUBE produced the two extra rows marked in red, i.e. summaries based on values in the second column Color.
The OVER() clause actually does NOT have to use the GROUP BY clause, and yet it it allows you to use aggregate functions.
SELECT RunnerID, RaceID, MilesRun, DateOfRun
,SUM(MilesRun) OVER(PARTITION BY RunnerID) AS 'Total'
,AVG(MilesRun) OVER(PARTITION BY RunnerID) AS 'Avg'
,COUNT(MilesRun) OVER(PARTITION BY RunnerID) AS 'Count'
,MIN(MilesRun) OVER(PARTITION BY RunnerID) AS 'Min'
,MAX(MilesRun) OVER(PARTITION BY RunnerID) AS 'Max'
FROM RaceResults
WHERE RunnerID in (7,8);
The OVER() clause can also be used with various ranking functions.
SELECT RunnerID, Age
,ROW_NUMBER() OVER (ORDER BY a.Age) AS 'Row Number'
,RANK() OVER (ORDER BY a.Age) AS 'Rank'
,DENSE_RANK() OVER (ORDER BY a.Age) AS 'Dense Rank'
,NTILE(4) OVER (ORDER BY a.Age) AS 'Quartile'
FROM Runners;
Page Modified: (Hand noted: 2007-10-12 19:50:30Z) (Auto noted: 2008-05-29 18:44:59Z)