SQL GROUP BY and Functional Dependencies: A Very Useful Feature
Relational databases define the term “Functional Dependency” as such (from Wikipedia):
In relational database theory, a functional dependency is a constraint between two sets of attributes in a relation from a database. In other words, functional dependency is a constraint that describes the relationship between attributes in a relation.
In SQL, functional dependencies appear whenever there is a unique constraint (e.g. a primary key constraint). Let’s assume the following:
CREATE TABLE actor ( actor_id BIGINT NOT NULL PRIMARY KEY, first_name VARCHAR(50) NOT NULL, last_name VARCHAR(50) NOT NULL );
It can be said that both FIRST_NAME
and LAST_NAME
each have a functional dependency on the ACTOR_ID
column.
Nice. So what?
This isn’t just some mathematical statement that can be applied to unique constraints. It’s extremely useful for SQL. It means that for every ACTOR_ID
value, there can be only one (functionally dependent) FIRST_NAME
and LAST_NAME
value. The other way round, this isn’t true. For any given FIRST_NAME
and/or LAST_NAME
value, we can have multiple ACTOR_ID
values, as we can have multiple actors by the same names.
Because there can be only one corresponding FIRST_NAME
and LAST_NAME
value for any given ACTOR_ID
value, we can omit those columns in the GROUP BY
clause. Let’s assume also:
CREATE TABLE film_actor ( actor_id BIGINT NOT NULL, film_id BIGINT NOT NULL, PRIMARY KEY (actor_id, film_id), FOREIGN KEY (actor_id) REFERENCS actor (actor_id), FOREIGN KEY (film_id) REFERENCS film (film_id) );
Now, if we want to count the number of films per actor, we can write:
SELECT actor_id, first_name, last_name, COUNT(*) FROM actor JOIN film_actor USING (actor_id) GROUP BY actor_id ORDER BY COUNT(*) DESC
This is extremely useful as it saves us from a lot of typing. In fact, the way GROUP BY
semantics is defined, we can put all sorts of column references in the SELECT
clause, which are any of:
- Column expressions that appear in the
GROUP BY
clause - Column expressions that are functionally dependent on the set of column expressions in the
GROUP BY
clause - Aggregate functions
Unfortunately, not everyone supports this
If you’re using Oracle, for instance, you can’t make use of the above. You’ll need to write the classic, equivalent version where all the non-aggregate column expressions appearing in the SELECT
clause must also appear in the GROUP BY
clause
SELECT actor_id, first_name, last_name, COUNT(*) FROM actor JOIN film_actor USING (actor_id) GROUP BY actor_id, first_name, last_name -- ^^^^^^^^^^ ^^^^^^^^^ unnecessary ORDER BY COUNT(*) DESC
Further reading:
- Wikipedia article on functional dependency
- How SQL GROUP BY Should Have Been Designed – Like Neo4j’s Implicit GROUP BY
- Do You Really Understand SQL’s GROUP BY and HAVING clauses?
- How to Translate SQL GROUP BY and Aggregations to Java 8
- GROUP BY ROLLUP / CUBE
Reference: | SQL GROUP BY and Functional Dependencies: A Very Useful Feature from our JCG partner Lukas Eder at the JAVA, SQL, AND JOOQ blog. |