GROUP BY ROLLUP / CUBE

Lukas EderDecember 22nd, 2011Last Updated: January 7th, 2015

0 155 4 minutes read

Every now and then, you come across a requirement that will bring you to your SQL limits. Many of us probably give up early and calculate stuff in Java / [or your language]. Instead, it might’ve been so easy and fast to do with SQL. If you’re working with an advanced database, such as DB2, Oracle, SQL Server, Sybase SQL Anywhere, (and MySQL in this case, which supports the WITH ROLLUP clause), you can take advantage of the ROLLUP / CUBE / GROUPING SETS grouping functions.

Lets have a look at my fictional salary progression compared to that of a fictional friend, who has chosen a different career path (observe the salary boost in 2011):

select 'Lukas'      as employee, 

       'SoftSkills' as company, 

    80000        as salary, 

    2007         as year 

from dual

union all select 'Lukas', 'SoftSkills', 80000,  2008 from dual

union all select 'Lukas', 'SmartSoft',  90000,  2009 from dual

union all select 'Lukas', 'SmartSoft',  95000,  2010 from dual

union all select 'Lukas', 'jOOQ',       200000, 2011 from dual

union all select 'Lukas', 'jOOQ',       250000, 2012 from dual

union all select 'Tom',   'SoftSkills', 89000,  2007 from dual

union all select 'Tom',   'SoftSkills', 90000,  2008 from dual

union all select 'Tom',   'SoftSkills', 91000,  2009 from dual

union all select 'Tom',   'SmartSoft',  92000,  2010 from dual

union all select 'Tom',   'SmartSoft',  93000,  2011 from dual

union all select 'Tom',   'SmartSoft',  94000,  2012 from dual

Now we’re used to gathering statistics using simple grouping and simple aggregate functions. For instance, let’s calculate how much Lukas and Tom earned on average over the past few years:

with data as ([above select])

select employee, avg(salary)

from data

group by employee

This will show that Lukas has earned more:

+--------+-----------+
|EMPLOYEE|AVG(SALARY)|
+--------+-----------+
|Lukas   |     132500|
|Tom     |      91500|
+--------+-----------+

So it’s probably interesting to find out what they have earned on average in which company:

with data as (...)

select company, employee, avg(salary)

from data

group by company, employee

order by company, employee

And immediately, it becomes clear where the big bucks are and that Tom has made a bad decision

+----------+--------+-----------+
|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----------+--------+-----------+
|jOOQ      |Lukas   |     225000|
|SmartSoft |Lukas   |      92500|
|SmartSoft |Tom     |      93000|
|SoftSkills|Lukas   |      80000|
|SoftSkills|Tom     |      90000|
+----------+--------+-----------+

ROLLUP

By adding grouping fields, we “lose” some aggregation information. In the above examples, the overall average salary per employee is no longer available directly from the result. That’s obvious, considering the grouping algorithm. But in nice-looking reports, we often want to display those grouping headers as well. This is where ROLLUP, CUBE (and GROUPING SETS) come into play. Consider the following query:

with data as (...)

select company, employee, avg(salary)

from data

group by rollup(company), employee

The above rollup function will now add additional rows to the grouping result set, holding useful aggregated values. In this case, when we “roll up the salaries of the company”, we will get the average of the remaining grouping fields, i.e. the average per employee:

+----------+--------+-----------+
|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----------+--------+-----------+
|SmartSoft |Tom     |      93000|
|SoftSkills|Tom     |      90000|
|{null}    |Tom     |      91500|
|jOOQ      |Lukas   |     225000|
|SmartSoft |Lukas   |      92500|
|SoftSkills|Lukas   |      80000|
|{null}    |Lukas   |     132500|
+----------+--------+-----------+

Note how these rows hold the same information as the ones from the first query, where we were only grouping by employee… This becomes even more interesting, when we put more grouping fields into the rollup function:

with data as (...)

select company, employee, avg(salary)

from data

group by rollup(employee, company)

As you can see, the order of grouping fields is important in the rollup function. The result from this query now also adds the overall average salary paid to all employees in all companies

+----------+--------+-----------+
|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----------+--------+-----------+
|SmartSoft |Tom     |      93000|
|SoftSkills|Tom     |      90000|
|{null}    |Tom     |      91500|
|jOOQ      |Lukas   |     225000|
|SmartSoft |Lukas   |      92500|
|SoftSkills|Lukas   |      80000|
|{null}    |Lukas   |     132500|
|{null}    |{null}  |     112000|
+----------+--------+-----------+

In order to identify the totals rows for reporting, you can use the GROUPING() function in DB2, Oracle, SQL Server and Sybase SQL Anywhere. In Oracle and SQL Server, there’s the even more useful GROUPING_ID() function:

with data as (...)

select grouping_id(employee, company) id, company, employee, avg(salary)

from data

group by rollup(employee, company)

It documents on what “grouping level” of the rollup function the current row was produced:

+----+----------+--------+-----------+
|  ID|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----+----------+--------+-----------+
|   0|SmartSoft |Tom     |      93000|
|   0|SoftSkills|Tom     |      90000|
|   1|{null}    |Tom     |      91500|
|   0|jOOQ      |Lukas   |     225000|
|   0|SmartSoft |Lukas   |      92500|
|   0|SoftSkills|Lukas   |      80000|
|   1|{null}    |Lukas   |     132500|
|   3|{null}    |{null}  |     112000|
+----+----------+--------+-----------+

CUBE

The cube function works similar, except that the order of cube grouping fields becomes irrelevant, as all combinations of grouping are combined. This is a bit tricky to put in words, so lets put it in action:

with data as (...)

select grouping_id(employee, company) id, company, employee, avg(salary)

from data

group by cube(employee, company)

In the following result, you will get:

GROUPING_ID() = 0: Average per company and employee. This is the normal grouping result
GROUPING_ID() = 1: Average per employee
GROUPING_ID() = 2: Average per company
GROUPING_ID() = 3: Overall average

+----+----------+--------+-----------+
|  ID|COMPANY   |EMPLOYEE|AVG(SALARY)|
+----+----------+--------+-----------+
|   3|{null}    |{null}  |     112000|
|   2|jOOQ      |{null}  |     225000|
|   2|SmartSoft |{null}  |      92800|
|   2|SoftSkills|{null}  |      86000|
|   1|{null}    |Tom     |      91500|
|   0|SmartSoft |Tom     |      93000|
|   0|SoftSkills|Tom     |      90000|
|   1|{null}    |Lukas   |     132500|
|   0|jOOQ      |Lukas   |     225000|
|   0|SmartSoft |Lukas   |      92500|
|   0|SoftSkills|Lukas   |      80000|
+----+----------+--------+-----------+

In other words, using the CUBE() function, you will get grouping results for every possible combination of the grouping fields supplied to the CUBE() function, which results in 2^n GROUPING_ID()’s for n “cubed” grouping fields

Support in jOOQ

jOOQ 2.0 introduces support for these functions. If you want to translate the last select into jOOQ, you’d roughly get this Java code:

// assuming that DATA is an actual table...

create.select(

         groupingId(DATA.EMPLOYEE, DATA.COMPANY).as("id"),

         DATA.COMPANY, DATA.EMPLOYEE, avg(SALARY))

      .from(DATA)

      .groupBy(cube(DATA.EMPLOYEE, DATA.COMPANY));

With this powerful tool, you’re ready for all of those fancy reports and data overviews. For more details, read on about ROLLUP(), CUBE(), and GROUPING SETS() functions on the SQL Server documentation page, which explains it quite nicely:
http://msdn.microsoft.com/en-us/library/bb522495.aspx

Reference: GROUP BY ROLLUP / CUBE from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog.

Related Articles :