GROUP BY ROLLUP / CUBE
Lets have a look at my fictional salary progression compared to that of a fictional friend, who has chosen a different career path (observe the salary boost in 2011):
select 'Lukas' as employee, 'SoftSkills' as company, 80000 as salary, 2007 as year from dual union all select 'Lukas', 'SoftSkills', 80000, 2008 from dual union all select 'Lukas', 'SmartSoft', 90000, 2009 from dual union all select 'Lukas', 'SmartSoft', 95000, 2010 from dual union all select 'Lukas', 'jOOQ', 200000, 2011 from dual union all select 'Lukas', 'jOOQ', 250000, 2012 from dual union all select 'Tom', 'SoftSkills', 89000, 2007 from dual union all select 'Tom', 'SoftSkills', 90000, 2008 from dual union all select 'Tom', 'SoftSkills', 91000, 2009 from dual union all select 'Tom', 'SmartSoft', 92000, 2010 from dual union all select 'Tom', 'SmartSoft', 93000, 2011 from dual union all select 'Tom', 'SmartSoft', 94000, 2012 from dual
Now we’re used to gathering statistics using simple grouping and simple aggregate functions. For instance, let’s calculate how much Lukas and Tom earned on average over the past few years:
with data as ([above select]) select employee, avg(salary) from data group by employee
This will show that Lukas has earned more:
+--------+-----------+ |EMPLOYEE|AVG(SALARY)| +--------+-----------+ |Lukas | 132500| |Tom | 91500| +--------+-----------+
So it’s probably interesting to find out what they have earned on average in which company:
with data as (...) select company, employee, avg(salary) from data group by company, employee order by company, employee
And immediately, it becomes clear where the big bucks are and that Tom has made a bad decision
+----------+--------+-----------+ |COMPANY |EMPLOYEE|AVG(SALARY)| +----------+--------+-----------+ |jOOQ |Lukas | 225000| |SmartSoft |Lukas | 92500| |SmartSoft |Tom | 93000| |SoftSkills|Lukas | 80000| |SoftSkills|Tom | 90000| +----------+--------+-----------+
ROLLUP
By adding grouping fields, we “lose” some aggregation information. In the above examples, the overall average salary per employee is no longer available directly from the result. That’s obvious, considering the grouping algorithm. But in nice-looking reports, we often want to display those grouping headers as well. This is where ROLLUP, CUBE (and GROUPING SETS) come into play. Consider the following query:
with data as (...) select company, employee, avg(salary) from data group by rollup(company), employee
The above rollup function will now add additional rows to the grouping result set, holding useful aggregated values. In this case, when we “roll up the salaries of the company”, we will get the average of the remaining grouping fields, i.e. the average per employee:
+----------+--------+-----------+ |COMPANY |EMPLOYEE|AVG(SALARY)| +----------+--------+-----------+ |SmartSoft |Tom | 93000| |SoftSkills|Tom | 90000| |{null} |Tom | 91500| |jOOQ |Lukas | 225000| |SmartSoft |Lukas | 92500| |SoftSkills|Lukas | 80000| |{null} |Lukas | 132500| +----------+--------+-----------+
Note how these rows hold the same information as the ones from the first query, where we were only grouping by employee… This becomes even more interesting, when we put more grouping fields into the rollup function:
with data as (...) select company, employee, avg(salary) from data group by rollup(employee, company)
As you can see, the order of grouping fields is important in the rollup function. The result from this query now also adds the overall average salary paid to all employees in all companies
+----------+--------+-----------+ |COMPANY |EMPLOYEE|AVG(SALARY)| +----------+--------+-----------+ |SmartSoft |Tom | 93000| |SoftSkills|Tom | 90000| |{null} |Tom | 91500| |jOOQ |Lukas | 225000| |SmartSoft |Lukas | 92500| |SoftSkills|Lukas | 80000| |{null} |Lukas | 132500| |{null} |{null} | 112000| +----------+--------+-----------+
In order to identify the totals rows for reporting, you can use the GROUPING() function in DB2, Oracle, SQL Server and Sybase SQL Anywhere. In Oracle and SQL Server, there’s the even more useful GROUPING_ID() function:
with data as (...) select grouping_id(employee, company) id, company, employee, avg(salary) from data group by rollup(employee, company)
It documents on what “grouping level” of the rollup function the current row was produced:
+----+----------+--------+-----------+ | ID|COMPANY |EMPLOYEE|AVG(SALARY)| +----+----------+--------+-----------+ | 0|SmartSoft |Tom | 93000| | 0|SoftSkills|Tom | 90000| | 1|{null} |Tom | 91500| | 0|jOOQ |Lukas | 225000| | 0|SmartSoft |Lukas | 92500| | 0|SoftSkills|Lukas | 80000| | 1|{null} |Lukas | 132500| | 3|{null} |{null} | 112000| +----+----------+--------+-----------+
CUBE
The cube function works similar, except that the order of cube grouping fields becomes irrelevant, as all combinations of grouping are combined. This is a bit tricky to put in words, so lets put it in action:
with data as (...) select grouping_id(employee, company) id, company, employee, avg(salary) from data group by cube(employee, company)
In the following result, you will get:
- GROUPING_ID() = 0: Average per company and employee. This is the normal grouping result
- GROUPING_ID() = 1: Average per employee
- GROUPING_ID() = 2: Average per company
- GROUPING_ID() = 3: Overall average
+----+----------+--------+-----------+ | ID|COMPANY |EMPLOYEE|AVG(SALARY)| +----+----------+--------+-----------+ | 3|{null} |{null} | 112000| | 2|jOOQ |{null} | 225000| | 2|SmartSoft |{null} | 92800| | 2|SoftSkills|{null} | 86000| | 1|{null} |Tom | 91500| | 0|SmartSoft |Tom | 93000| | 0|SoftSkills|Tom | 90000| | 1|{null} |Lukas | 132500| | 0|jOOQ |Lukas | 225000| | 0|SmartSoft |Lukas | 92500| | 0|SoftSkills|Lukas | 80000| +----+----------+--------+-----------+
In other words, using the CUBE() function, you will get grouping results for every possible combination of the grouping fields supplied to the CUBE() function, which results in 2^n GROUPING_ID()’s for n “cubed” grouping fields
Support in jOOQ
jOOQ 2.0 introduces support for these functions. If you want to translate the last select into jOOQ, you’d roughly get this Java code:
// assuming that DATA is an actual table... create.select( groupingId(DATA.EMPLOYEE, DATA.COMPANY).as("id"), DATA.COMPANY, DATA.EMPLOYEE, avg(SALARY)) .from(DATA) .groupBy(cube(DATA.EMPLOYEE, DATA.COMPANY));
With this powerful tool, you’re ready for all of those fancy reports and data overviews. For more details, read on about ROLLUP(), CUBE(), and GROUPING SETS() functions on the SQL Server documentation page, which explains it quite nicely:
http://msdn.microsoft.com/en-us/library/bb522495.aspx
Reference: GROUP BY ROLLUP / CUBE from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog.
Related Articles :