Software Development

Window functions in Database

Window functions in PostgreSQL allow us to perform calculations across a set of table rows that are related to the current row. This is similar to aggregate functions, but window functions do not collapse rows into a single result like aggregate functions do. Instead, they provide result values across rows. Let us delve into exploring the important window functions in detail.

1. Pre-requisite

To understand this article, we need a database, but setting one up can often be tedious. However, with Docker, it becomes much simpler. You can watch this video to learn how to install Docker on Windows. Once installed, open the terminal and run the command below to set up and launch PostgreSQL.

-- Remember to change the password –
docker run -d -p 5432:5432 -e POSTGRES_PASSWORD=your_password --name postgres postgres

-- command to stop the Postgres docker container --
docker stop postgres

-- command to remove the Postgres docker container --
docker rm postgres

Remember to set a password of your choice. If everything goes well, the PostgreSQL database server will be running on port 5432, and you can connect to it using the DBeaver GUI tool.

postgresql-on-docker
Fig. 1. Postgres on Docker

2. Common Window Functions in PostgreSQL

Some of the most commonly used window functions are:

  • ROW_NUMBER() – Assigns a unique sequential integer to rows within a partition.
  • RANK() – Assigns a rank number to rows, with the same rank for equal values.
  • DENSE_RANK() – Similar to RANK(), but without gaps in rank values.
  • SUM() – Calculates the cumulative sum of a column within a partition.
  • AVG() – Computes the average value within a partition.
  • LEAD() – Returns the value of a column from the next row in the result set.
  • LAG() – Returns the value of a column from the previous row in the result set.

2.1 Code Examples

We will use the below sql script to create the sales table and add some mock data to the table.

-- Create the sales table
CREATE TABLE sales (
  id SERIAL PRIMARY KEY, 
  employee_name VARCHAR(100), 
  sale_amount DECIMAL
);

-- Insert sample data
INSERT INTO sales (employee_name, sale_amount) 
VALUES 
  ('John', 500), 
  ('Jane', 1200), 
  ('John', 800), 
  ('Jane', 100), 
  ('Alice', 700), 
  ('Bob', 300), 
  ('John', 450), 
  ('Alice', 1000), 
  ('Jane', 950), 
  ('Bob', 200);

2.1.1. ROW_NUMBER() Example

We can use the ROW_NUMBER() method to assign a sequential number to each row. Below is a simple sql:

SELECT 
    employee_name,
    sale_amount,
    ROW_NUMBER() OVER (ORDER BY sale_amount DESC) AS row_num
FROM sales;

The SQL query retrieves the employee_name and sale_amount from the sales table, and adds column called row_num using the ROW_NUMBER() window function. The ROW_NUMBER() function assigns a unique sequential number to each row in the result set. In this query, the rows are ordered by sale_amount in descending order (DESC). This means that the highest sales amount will have a row_num of 1, the second highest sales amount will have a row_num of 2, and so on. This query is useful when you need to rank or enumerate rows based on a specific column (in this case, sale_amount), without collapsing the rows as with aggregate functions.

Database window function-img1
Fig. 1: Row number output

2.1.2 RANK() Example

We can use the RANK() method to rank employees by sales, giving the same rank for equal values but leaving gaps in the ranking for subsequent rows. Below is a simple sql:

SELECT 
  employee_name, 
  sale_amount, 
  RANK() OVER (
    ORDER BY 
      sale_amount DESC
  ) AS rank_num 
FROM 
  sales;

The SQL query retrieves the employee_name and sale_amount from the sales table, and adds a column called rank_num using the RANK() window function. The RANK() function assigns a rank to each row based on the sale_amount, ordered in descending order (DESC). If two or more rows have the same sale_amount, they will receive the same rank. However, the next row will skip ranks to account for ties. For example, if two employees are tied for rank 1, the next employee will rank 3. This query is useful for ranking rows based on a specific column (in this case, sale_amount) and handling ties by maintaining rank consistency.

Database window function-img2
Fig. 2: Rank output

2.1.3 DENSE_RANK() Example

We can use the DENSE_RANK() method to rank rows, but without leaving gaps in the ranking for rows with the same value. Below is a simple sql:

SELECT 
  employee_name, 
  sale_amount, 
  DENSE_RANK() OVER (
    ORDER BY 
      sale_amount DESC
  ) AS dense_rank_num 
FROM 
  sales;

The SQL query retrieves the employee_name and sale_amount from the sales table, and adds a new column called dense_rank_num using the DENSE_RANK() window function. The DENSE_RANK() function assigns a rank to each row based on the sale_amount, ordered in descending order (DESC). If two or more rows have the same sale_amount, they receive the same rank, similar to the RANK() function. However, unlike RANK(), the DENSE_RANK() function does not skip ranks after ties. For example, if two employees are tied for rank 1, the next employee will be ranked as 2, rather than 3. This query is useful when ranking rows without creating gaps in the ranking sequence for ties.

Database window function-img3
Fig. 3: Dense rank output

2.1.4 SUM() Example

We can use the SUM() method as a window function to calculate the cumulative sales for each employee. Below is a simple sql:

SELECT 
  employee_name, 
  sale_amount, 
  SUM(sale_amount) OVER (
    PARTITION BY employee_name 
    ORDER BY 
      sale_amount DESC
  ) AS cumulative_sales 
FROM 
  sales;

The SQL query retrieves the employee_name and sale_amount from the sales table, and adds a new column called cumulative_sales using the SUM() window function. The SUM() function calculates the running total (cumulative sum) of the sale_amount for each employee_name. The PARTITION BY clause divides the result set into partitions based on employee_name, meaning that the sum is calculated separately for each employee. Within each partition, the rows are ordered by sale_amount in descending order (DESC), so the highest sale appears first, and the sum is calculated progressively down the list. This query is useful for tracking the cumulative sales of each employee in the order of their highest sales first.

Database window function-img4
Fig. 4: Sum with partition output

2.1.5 AVG() Example

We can use the AVG() method to calculate the average sale amount for each employee. Below is a simple sql:

SELECT 
    employee_name,
    sale_amount,
    AVG(sale_amount) OVER (PARTITION BY employee_name) AS avg_sales
FROM sales;

The SQL query retrieves the employee_name and sale_amount from the sales table, and adds a new column called avg_sales using the AVG() window function. The AVG() function calculates the average of the sale_amount for each employee_name. The PARTITION BY clause groups the rows based on the employee_name, meaning the average is computed separately for each employee across all their sales. This query is useful for determining the average sales amount of each employee while preserving the details of individual sales in the result set.

Database window function-img5
Fig. 5: Average with partition output

2.1.6 LEAD() Example

We can use the LEAD() method to return the value of the next row in the result set, making it useful for comparing rows with the subsequent row. Below is a simple sql:

SELECT 
    employee_name,
    sale_amount,
    LEAD(sale_amount, 1) OVER (ORDER BY sale_amount DESC) AS next_sale
FROM sales;

The SQL query retrieves the employee_name and sale_amount from the sales table, and adds a new column called next_sale using the LEAD() window function. The LEAD() function returns the value of the sale_amount from the next row in the result set, based on the current ordering. The ORDER BY clause sorts the rows by sale_amount in descending order (DESC), meaning the highest sale appears first. The LEAD() function with an offset of 1 looks at the sale amount of the next row and shows it in the next_sale column. If there is no next row (i.e., for the last row), the value will be NULL. This query is useful for comparing an employee’s current sale with their next highest sale.

lead
Fig. 6: Lead output

2.1.7 LAG() Example

We can use the LAG() method to return the value of the previous row, which can help in comparing the current values with the previous ones. Below is a simple sql:

SELECT 
    employee_name,
    sale_amount,
    LAG(sale_amount, 1) OVER (ORDER BY sale_amount DESC) AS previous_sale
FROM sales;

The SQL query retrieves the employee_name and sale_amount from the sales table, and adds a new column called previous_sale using the LAG() window function. The LAG() function returns the value of the sale_amount from the previous row in the result set, based on the current ordering. The ORDER BY clause sorts the rows by sale_amount in descending order (DESC), so the highest sale amount appears first. The LAG() function with an offset of 1 retrieves the sale amount of the previous row and displays it in the previous_sale column. If there is no previous row (i.e., for the first row), the value will be NULL. This query is useful for comparing an employee’s current sale with their previous highest sale.

lag
Fig. 7: Lag output

3. Conclusion

In conclusion, window functions are powerful tools for performing advanced calculations over a set of rows while preserving individual row details. They enable insights that aren’t possible with traditional aggregate functions.

4. Download the code

Download
You can download the simple sql script from here: Window functions in Database

Yatin Batra

An experience full-stack engineer well versed with Core Java, Spring/Springboot, MVC, Security, AOP, Frontend (Angular & React), and cloud technologies (such as AWS, GCP, Jenkins, Docker, K8).
Subscribe
Notify of
guest

This site uses Akismet to reduce spam. Learn how your comment data is processed.

0 Comments
Oldest
Newest Most Voted
Inline Feedbacks
View all comments
Back to top button