Window functions in Database
Window functions in PostgreSQL allow us to perform calculations across a set of table rows that are related to the current row. This is similar to aggregate functions, but window functions do not collapse rows into a single result like aggregate functions do. Instead, they provide result values across rows. Let us delve into exploring the important window functions in detail.
1. Pre-requisite
To understand this article, we need a database, but setting one up can often be tedious. However, with Docker, it becomes much simpler. You can watch this video to learn how to install Docker on Windows. Once installed, open the terminal and run the command below to set up and launch PostgreSQL.
-- Remember to change the password – docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=your_password --name postgres postgres -- command to stop the Postgres docker container -- docker stop postgres -- command to remove the Postgres docker container -- docker rm postgres
Remember to set a password of your choice. If everything goes well, the PostgreSQL database server will be running on port 5432
, and you can connect to it using the DBeaver GUI tool.
2. Common Window Functions in PostgreSQL
Some of the most commonly used window functions are:
ROW_NUMBER()
– Assigns a unique sequential integer to rows within a partition.RANK()
– Assigns a rank number to rows, with the same rank for equal values.DENSE_RANK()
– Similar toRANK()
, but without gaps in rank values.SUM()
– Calculates the cumulative sum of a column within a partition.AVG()
– Computes the average value within a partition.LEAD()
– Returns the value of a column from the next row in the result set.LAG()
– Returns the value of a column from the previous row in the result set.
2.1 Code Examples
We will use the below sql script to create the sales
table and add some mock data to the table.
-- Create the sales table CREATE TABLE sales ( id SERIAL PRIMARY KEY, employee_name VARCHAR(100), sale_amount DECIMAL ); -- Insert sample data INSERT INTO sales (employee_name, sale_amount) VALUES ('John', 500), ('Jane', 1200), ('John', 800), ('Jane', 100), ('Alice', 700), ('Bob', 300), ('John', 450), ('Alice', 1000), ('Jane', 950), ('Bob', 200);
2.1.1. ROW_NUMBER() Example
We can use the ROW_NUMBER()
method to assign a sequential number to each row. Below is a simple sql:
SELECT employee_name, sale_amount, ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS row_num FROM sales;
The SQL query retrieves the employee_name
and sale_amount
from the sales
table, and adds column called row_num
using the ROW_NUMBER()
window function. The ROW_NUMBER()
function assigns a unique sequential number to each row in the result set. In this query, the rows are ordered by sale_amount
in descending order (DESC
). This means that the highest sales amount will have a row_num
of 1, the second highest sales amount will have a row_num
of 2, and so on. This query is useful when you need to rank or enumerate rows based on a specific column (in this case, sale_amount
), without collapsing the rows as with aggregate functions.
2.1.2 RANK() Example
We can use the RANK()
method to rank employees by sales, giving the same rank for equal values but leaving gaps in the ranking for subsequent rows. Below is a simple sql:
SELECT employee_name, sale_amount, RANK() OVER ( ORDER BY sale_amount DESC ) AS rank_num FROM sales;
The SQL query retrieves the employee_name
and sale_amount
from the sales
table, and adds a column called rank_num
using the RANK()
window function. The RANK()
function assigns a rank to each row based on the sale_amount
, ordered in descending order (DESC
). If two or more rows have the same sale_amount
, they will receive the same rank. However, the next row will skip ranks to account for ties. For example, if two employees are tied for rank 1, the next employee will rank 3. This query is useful for ranking rows based on a specific column (in this case, sale_amount
) and handling ties by maintaining rank consistency.
2.1.3 DENSE_RANK() Example
We can use the DENSE_RANK()
method to rank rows, but without leaving gaps in the ranking for rows with the same value. Below is a simple sql:
SELECT employee_name, sale_amount, DENSE_RANK() OVER ( ORDER BY sale_amount DESC ) AS dense_rank_num FROM sales;
The SQL query retrieves the employee_name
and sale_amount
from the sales
table, and adds a new column called dense_rank_num
using the DENSE_RANK()
window function. The DENSE_RANK()
function assigns a rank to each row based on the sale_amount
, ordered in descending order (DESC
). If two or more rows have the same sale_amount
, they receive the same rank, similar to the RANK()
function. However, unlike RANK()
, the DENSE_RANK()
function does not skip ranks after ties. For example, if two employees are tied for rank 1, the next employee will be ranked as 2, rather than 3. This query is useful when ranking rows without creating gaps in the ranking sequence for ties.
2.1.4 SUM() Example
We can use the SUM()
method as a window function to calculate the cumulative sales for each employee. Below is a simple sql:
SELECT employee_name, sale_amount, SUM(sale_amount) OVER ( PARTITION BY employee_name ORDER BY sale_amount DESC ) AS cumulative_sales FROM sales;
The SQL query retrieves the employee_name
and sale_amount
from the sales
table, and adds a new column called cumulative_sales
using the SUM()
window function. The SUM()
function calculates the running total (cumulative sum) of the sale_amount
for each employee_name
. The PARTITION BY
clause divides the result set into partitions based on employee_name
, meaning that the sum is calculated separately for each employee. Within each partition, the rows are ordered by sale_amount
in descending order (DESC
), so the highest sale appears first, and the sum is calculated progressively down the list. This query is useful for tracking the cumulative sales of each employee in the order of their highest sales first.
2.1.5 AVG() Example
We can use the AVG()
method to calculate the average sale amount for each employee. Below is a simple sql:
SELECT employee_name, sale_amount, AVG(sale_amount) OVER (PARTITION BY employee_name) AS avg_sales FROM sales;
The SQL query retrieves the employee_name
and sale_amount
from the sales
table, and adds a new column called avg_sales
using the AVG()
window function. The AVG()
function calculates the average of the sale_amount
for each employee_name
. The PARTITION BY
clause groups the rows based on the employee_name
, meaning the average is computed separately for each employee across all their sales. This query is useful for determining the average sales amount of each employee while preserving the details of individual sales in the result set.
2.1.6 LEAD() Example
We can use the LEAD()
method to return the value of the next row in the result set, making it useful for comparing rows with the subsequent row. Below is a simple sql:
SELECT employee_name, sale_amount, LEAD(sale_amount, 1) OVER (ORDER BY sale_amount DESC) AS next_sale FROM sales;
The SQL query retrieves the employee_name
and sale_amount
from the sales
table, and adds a new column called next_sale
using the LEAD()
window function. The LEAD()
function returns the value of the sale_amount
from the next row in the result set, based on the current ordering. The ORDER BY
clause sorts the rows by sale_amount
in descending order (DESC
), meaning the highest sale appears first. The LEAD()
function with an offset of 1 looks at the sale amount of the next row and shows it in the next_sale
column. If there is no next row (i.e., for the last row), the value will be NULL
. This query is useful for comparing an employee’s current sale with their next highest sale.
2.1.7 LAG() Example
We can use the LAG()
method to return the value of the previous row, which can help in comparing the current values with the previous ones. Below is a simple sql:
SELECT employee_name, sale_amount, LAG(sale_amount, 1) OVER (ORDER BY sale_amount DESC) AS previous_sale FROM sales;
The SQL query retrieves the employee_name
and sale_amount
from the sales
table, and adds a new column called previous_sale
using the LAG()
window function. The LAG()
function returns the value of the sale_amount
from the previous row in the result set, based on the current ordering. The ORDER BY
clause sorts the rows by sale_amount
in descending order (DESC
), so the highest sale amount appears first. The LAG()
function with an offset of 1 retrieves the sale amount of the previous row and displays it in the previous_sale
column. If there is no previous row (i.e., for the first row), the value will be NULL
. This query is useful for comparing an employee’s current sale with their previous highest sale.
3. Conclusion
In conclusion, window functions are powerful tools for performing advanced calculations over a set of rows while preserving individual row details. They enable insights that aren’t possible with traditional aggregate functions.
4. Download the code
You can download the simple sql script from here: Window functions in Database