Understanding SQL Window Functions: A Comprehensive Guide

SQL Window Functions are powerful tools that allow users to perform calculations across a set of table rows related to the current row.

What is a Window Function?

A window function performs a calculation across a set of table rows that are somehow related to the current row. This set of rows is referred to as the “window” and is defined using the OVER clause. Window functions are versatile and can be used for a wide range of analytical tasks.

Unlike aggregate functions, which return a single result for a group of rows, window functions can return multiple rows for each partition. This makes them incredibly useful for running totals, moving averages, and rank calculations.

window functions

Syntax

The general syntax for a window function is:

<function_name>(<expression>) OVER (
    [PARTITION BY <expression>]
    [ORDER BY <expression>]
    [ROWS/RANGE BETWEEN <start> AND <end>]
)
  • <function_name>: The window function to be used (e.g., ROW_NUMBER, RANK, SUM, AVG).
  • <expression>: The column or expression to be used in the function.
  • PARTITION BY: Divides the result set into partitions to which the window function is applied.
  • ORDER BY: Defines the order of the rows within each partition.
  • ROWS/RANGE BETWEEN: Specifies the window frame for the function.

Common Window Functions

window functions

ROW_NUMBER()

The ROW_NUMBER function assigns a unique sequential integer to rows within a partition of a result set, starting at 1 for the first row in each partition.

SELECT employee_id,first_name, last_name, department_id, salary, ROW_NUMBER() OVER (PARTITION BY department_id ORDER BY salary DESC) AS row_num FROM employees;

Result:

employee_idfirst_namelast_namedepartment_idsalaryrow_num
2JaneSmith160001
3AliceJohnson155002
1JohnDoe150003
5CharlieDavis270001
6DavidWilson260002
4BobBrown245003

RANK() and DENSE_RANK()

The RANK function assigns a rank to each row within a partition, with gaps in the ranking values when there are ties. DENSE_RANK assigns ranks without gaps.

SELECT employee_id, first_name, last_name, department_id, salary, RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS rank, DENSE_RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS dense_rank FROM employees;

Result:

employee_idfirst_namelast_namedepartment_idsalaryrankdense_rank
2JaneSmith1600011
3AliceJohnson1550022
1JohnDoe1500033
5CharlieDavis2700011
6DavidWilson2600022
4BobBrown2450033

SQL RANK() versus ROW_NUMBER()

  • RANK and DENSE_RANK are deterministic, all rows with the same value for both the ordering and partitioning columns will end up with an equal result, whereas ROW_NUMBER will arbitrarily (non-deterministically) assign an incrementing result to the tied
  • ROW_NUMBER: Returns a unique number for each row starting with 1. For rows that have duplicate values, numbers are arbitrarily assigned.
  • Rank: Assigns a unique number for each row starting with 1, except for rows that have duplicate values, in which case the same ranking is assigned and a gap appears in the sequence for each duplicate ranking.

LAG() and LEAD()

The LAG function provides access to a row at a specified physical offset before the current row within the result set. LEAD does the same but for the row after the current row.

SELECT employee_id, first_name, last_name, department_id, salary, LAG(salary, 1) OVER (PARTITION BY department_id ORDER BY salary DESC) AS previous_salary, LEAD(salary, 1) OVER (PARTITION BY department_id ORDER BY salary DESC) AS next_salary FROM employees;

Result:

employee_idfirst_namelast_namedepartment_idsalaryprevious_salarynext_salary
2JaneSmith16000NULL5500
3AliceJohnson1550060005000
1JohnDoe150005500NULL
5CharlieDavis27000NULL6000
6DavidWilson2600070004500
4BobBrown245006000NULL

Moving Average

Calculating a 2-salary moving average within each department.

SELECT employee_id, first_name, last_name, department_id, salary, AVG(salary) OVER (PARTITION BY department_id ORDER BY salary DESC ROWS BETWEEN 1 PRECEDING AND CURRENT ROW) AS moving_avg FROM employees;

Result:

employee_idfirst_namelast_namedepartment_idsalarymoving_avg
2JaneSmith160006000
3AliceJohnson155005750
1JohnDoe150005250
5CharlieDavis270007000
6DavidWilson260006500
4BobBrown245005250

Cumulative Distribution

Calculating the cumulative distribution of salaries within each department.

SELECT employee_id, first_name, last_name, department_id, salary, CUME_DIST() OVER (PARTITION BY department_id ORDER BY salary) AS cum_dist FROM employees;

Result:

employee_idfirst_namelast_namedepartment_idsalarycum_dist
1JohnDoe150000.3333
3AliceJohnson155000.6667
2JaneSmith160001.0000
4BobBrown245000.3333
6DavidWilson260000.6667
5CharlieDavis270001.0000

Conclusion

SQL Window Functions are incredibly powerful tools for performing complex calculations across a set of table rows related to the current row. By using the OVER clause, these functions can provide insights and analytics that are difficult to achieve with standard SQL queries. Understanding and leveraging these functions can greatly enhance your ability to perform data analysis and gain meaningful insights from your datasets.

Use the examples provided in this article to experiment with window functions and see how they can be applied to your specific use cases. With practice, you’ll be able to harness the full power of SQL Window Functions to perform advanced data analysis with ease.

Resource

Leave a Comment