
Note: “standard” name is “Window” functions

When? – Starting with Oracle 8i

Why? – Simple Solution of Complex Problems

Why Exactly? – advanced ranking, aggregation, row comparison, statistics, “what if” scenarios

Order of Evaluation in SQL: Prior to “ORDER BY” clause
Some of the things that are hard to do in SQL are:

• Calculate a running total
• Find percentages within a group
• Top-N queries
• Compute a moving average
• Perform ranking queries


Analytic functions compute an aggregate value based on a group of rows.

The group of rows is called a window and is defined by the analytic_clause.


For each row, a sliding window of rows is defined.

The window determines the range of rows used to perform the calculations for the current row.
Analytic-Function(<Argument>, <Argument>, ...)
    OVER (<Query-Partition-Clause> <Order-By-Clause> <Windowing-Clause>)

PARTITION BY – divides the result set into groups

ORDER BY – orders data within a partition

WINDOWING – limits the rows used, by physical row count (ROWS) or by logical offset (RANGE)


How are analytic functions different from group or aggregate functions?

SELECT empno,
       ename,
       deptno,
       sal
FROM emp

EMPNO ENAME  DEPTNO  SAL
----- ------ ------ ----
 7369 SMITH      20  800
 7499 ALLEN      30 1600
 7521 WARD       30 1250
 7566 JONES      20 2975
 7654 MARTIN     30 1250

SELECT deptno,
       sum(sal) sal
FROM emp
GROUP BY deptno

DEPTNO  SAL
------ ----
    30 4100
    20 3775

SELECT deptno,
       sum(sal) over() sal
FROM emp

DEPTNO  SAL
------ ----
    20 7875
    30 7875
    30 7875
    20 7875
    30 7875
How are analytic functions different from group or aggregate functions?

SELECT deptno,
       COUNT(*) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30)
GROUP BY deptno;

DEPTNO DEPT_COUNT
------ ----------
    20          2
    30          3

2 rows selected.

SELECT empno,
       deptno,
       COUNT(*) OVER (PARTITION BY deptno) DEPT_COUNT
FROM emp
WHERE deptno IN (20, 30);

EMPNO DEPTNO DEPT_COUNT
----- ------ ----------
 7369     20          2
 7566     20          2
 7499     30          3
 7900     30          3
 7844     30          3

5 rows selected.
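The contrast between GROUP BY and an analytic function can be tried out with a small sketch. It assumes SQLite (3.25 or later, via Python's sqlite3 module) as a stand-in for Oracle, and loads only the five emp rows listed earlier; the column alias dept_sal is illustrative.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; only five emp rows are loaded.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, ename TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?, ?)",
    [
        (7369, "SMITH", 20, 800),
        (7499, "ALLEN", 30, 1600),
        (7521, "WARD", 30, 1250),
        (7566, "JONES", 20, 2975),
        (7654, "MARTIN", 30, 1250),
    ],
)

# Aggregate function: the rows collapse to one row per department.
grouped = conn.execute(
    "SELECT deptno, SUM(sal) FROM emp GROUP BY deptno ORDER BY deptno"
).fetchall()

# Analytic function: every row is kept and carries its department total.
windowed = conn.execute(
    "SELECT empno, deptno, SUM(sal) OVER (PARTITION BY deptno) AS dept_sal "
    "FROM emp ORDER BY empno"
).fetchall()

print(grouped)
for row in windowed:
    print(row)
```

The grouped query returns two rows, the analytic one returns all five.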
SELECT ename,
       deptno,
       sal,
       sum(sal) over () Tot,
       sum(sal) over (order by deptno, ename) Run_Tot,
       sum(sal) over (partition by deptno order by ename) Dept_Tot,
       row_number() over (partition by deptno order by ename) Seq
FROM emp
ORDER BY deptno, ename;

ENAME  DEPTNO  SAL   TOT RUN_TOT DEPT_TOT SEQ
------ ------ ---- ----- ------- -------- ---
CLARK      10 2450 29025    2450     2450   1
KING       10 5000 29025    7450     7450   2
MILLER     10 1300 29025    8750     8750   3
ADAMS      20 1100 29025    9850     1100   1
FORD       20 3000 29025   12850     4100   2
JONES      20 2975 29025   15825     7075   3
SCOTT      20 3000 29025   18825    10075   4
SMITH      20  800 29025   19625    10875   5
ALLEN      30 1600 29025   21225     1600   1
BLAKE      30 2850 29025   24075     4450   2
JAMES      30  950 29025   25025     5400   3
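The running-total query can be reproduced on a reduced data set. This sketch assumes SQLite (3.25 or later) as a stand-in for Oracle and loads only five of the employees, so the grand total is 12850 here rather than 29025.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; five rows only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("CLARK", 10, 2450),
        ("KING", 10, 5000),
        ("MILLER", 10, 1300),
        ("ADAMS", 20, 1100),
        ("FORD", 20, 3000),
    ],
)

totals = conn.execute(
    """
    SELECT ename,
           SUM(sal) OVER ()                                      AS tot,
           SUM(sal) OVER (ORDER BY deptno, ename)                AS run_tot,
           SUM(sal) OVER (PARTITION BY deptno ORDER BY ename)    AS dept_tot,
           ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY ename) AS seq
    FROM emp
    ORDER BY deptno, ename
    """
).fetchall()

for row in totals:
    print(row)
```

An empty OVER () gives the grand total on every row, adding ORDER BY turns the sum into a running total, and adding PARTITION BY restarts the running total for each department.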
How Analytic Functions Work and When to Use Them

• Analytic functions are computed after all joins and after the WHERE, GROUP BY and HAVING clauses of the query.

• The main ORDER BY clause of the query operates after the analytic functions.

• So analytic functions can only appear in the select list and in the main ORDER BY clause of the query.
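One consequence of this evaluation order is that a Top-N query cannot filter on an analytic function directly in WHERE; a common workaround is to compute it in an inline view and filter outside. The sketch below assumes SQLite (3.25 or later) as a stand-in for Oracle, with five sample emp rows; the alias rn and the variable names are illustrative.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; five rows only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        ("CLARK", 10, 2450),
        ("KING", 10, 5000),
        ("MILLER", 10, 1300),
        ("ADAMS", 20, 1100),
        ("FORD", 20, 3000),
    ],
)

# An analytic function in WHERE is rejected, because WHERE runs first.
rejected = False
try:
    conn.execute("SELECT ename FROM emp WHERE ROW_NUMBER() OVER (ORDER BY sal DESC) = 1")
except sqlite3.Error:
    rejected = True

# Top-1 per department: rank inside an inline view, filter in the outer query.
top_paid = conn.execute(
    """
    SELECT ename, deptno, sal
    FROM (SELECT ename, deptno, sal,
                 ROW_NUMBER() OVER (PARTITION BY deptno ORDER BY sal DESC) AS rn
          FROM emp)
    WHERE rn = 1
    ORDER BY deptno
    """
).fetchall()

print(rejected, top_paid)
```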
ROW_NUMBER()
LAG()
LEAD()
MIN()
MAX()
RANK()
DENSE_RANK()
SUM()
AVG()
FIRST_VALUE()
LAST_VALUE()
FIRST()
LAST()
ROW_NUMBER FUNCTION
ROW_NUMBER( ) assigns a running serial number to the rows of each
partition, in the order given by ORDER BY. It is very useful in reporting,
especially where different partitions need their own serial numbers.

SELECT empno,
       deptno,
       hiredate,
       ROW_NUMBER() OVER (PARTITION BY deptno
                          ORDER BY hiredate NULLS LAST) SRLNO
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, SRLNO

EMPNO DEPTNO HIREDATE  SRLNO
----- ------ --------- -----
 7782     10 09-JUN-81     1
 7839     10 17-NOV-81     2
 7934     10 23-JAN-82     3
 7369     20 17-DEC-80     1
 7566     20 02-APR-81     2
 7902     20 03-DEC-81     3
 7788     20 09-DEC-82     4
 7876     20 12-JAN-83     5
RANK and DENSE_RANK FUNCTIONS

SELECT empno,
       deptno,
       sal,
       RANK() OVER (PARTITION BY deptno
                    ORDER BY sal DESC NULLS LAST) RANK,
       DENSE_RANK() OVER (PARTITION BY deptno
                          ORDER BY sal DESC NULLS LAST) DENSE_RANK
FROM emp
WHERE deptno IN (10, 20)
ORDER BY 2, RANK;

EMPNO DEPTNO  SAL RANK DENSE_RANK
----- ------ ---- ---- ----------
 7839     10 5000    1          1
 7782     10 2450    2          2
 7934     10 1300    3          3
 7788     20 3000    1          1
 7902     20 3000    1          1
 7566     20 2975    3          2
 7876     20 1100    4          3
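The difference between the two functions shows up only on ties: RANK leaves a gap after tied rows, DENSE_RANK does not. A minimal sketch, assuming SQLite (3.25 or later) as a stand-in for Oracle and using the four department 20 rows from the output above (NULLS LAST is left out, since no salary is null here):

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; department 20 rows only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(7788, 20, 3000), (7902, 20, 3000), (7566, 20, 2975), (7876, 20, 1100)],
)

ranked = conn.execute(
    """
    SELECT empno,
           sal,
           RANK()       OVER (PARTITION BY deptno ORDER BY sal DESC) AS rnk,
           DENSE_RANK() OVER (PARTITION BY deptno ORDER BY sal DESC) AS dense_rnk
    FROM emp
    ORDER BY sal DESC, empno
    """
).fetchall()

for row in ranked:
    print(row)
```

The two employees earning 3000 share rank 1; the next salary gets rank 3 from RANK but 2 from DENSE_RANK.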
LEAD and LAG FUNCTIONS
LEAD computes an expression on a following row (a row that comes after the
current row in the window order) and returns the value to the current row.
<offset> is the number of rows to look ahead, and <default> is returned when
no such row exists.

LEAD (<sql_expr>, <offset>, <default>) OVER (<analytic_clause>)

The syntax of LAG is the same, except that the offset for LAG counts back into
the previous rows.
SELECT deptno,
       empno,
       sal,
       LEAD(sal, 1, 0) OVER (PARTITION BY deptno
                             ORDER BY sal DESC NULLS LAST) NEXT_LOWER_SAL,
       LAG(sal, 1, 0) OVER (PARTITION BY deptno
                            ORDER BY sal DESC NULLS LAST) PREV_HIGHER_SAL
FROM emp
WHERE deptno IN (10, 20)
ORDER BY deptno, sal DESC;

DEPTNO EMPNO  SAL NEXT_LOWER_SAL PREV_HIGHER_SAL
------ ----- ---- -------------- ---------------
    10  7839 5000           2450               0
    10  7782 2450           1300            5000
    10  7934 1300              0            2450
    20  7788 3000           3000               0
    20  7902 3000           2975            3000
    20  7566 2975           1100            3000
    20  7876 1100            800            2975
    20  7369  800              0            1100
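The behavior of the offset and default arguments can be checked on the three department 10 rows. This is a sketch assuming SQLite (3.25 or later) as a stand-in for Oracle; NULLS LAST is omitted because no salary is null.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; department 10 rows only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(7839, 10, 5000), (7782, 10, 2450), (7934, 10, 1300)],
)

neighbours = conn.execute(
    """
    SELECT empno,
           sal,
           LEAD(sal, 1, 0) OVER (PARTITION BY deptno ORDER BY sal DESC) AS next_lower_sal,
           LAG(sal, 1, 0)  OVER (PARTITION BY deptno ORDER BY sal DESC) AS prev_higher_sal
    FROM emp
    ORDER BY sal DESC
    """
).fetchall()

for row in neighbours:
    print(row)
```

The default value 0 appears where no next row (for the lowest salary) or no previous row (for the highest salary) exists in the partition.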
FIRST_VALUE and LAST_VALUE FUNCTIONS
The FIRST_VALUE analytic function picks the first record from the partition after doing
the ORDER BY. The <sql_expr> is computed on the columns of this first record and
results are returned. The LAST_VALUE function is used in similar context except that it
acts on the last record of the partition.

FIRST_VALUE(<sql_expr>) OVER (<analytic_clause>)


SELECT empno,
       deptno,
       hiredate - FIRST_VALUE(hiredate)
                  OVER (PARTITION BY deptno ORDER BY hiredate) DAY_GAP
FROM emp
WHERE deptno IN (20, 30)
ORDER BY deptno, DAY_GAP;

EMPNO DEPTNO DAY_GAP
----- ------ -------
 7369     20       0
 7566     20     106
 7902     20     351
 7788     20     722
 7876     20     756
 7499     30       0
 7521     30       2
 7698     30      70
 7844     30     200
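The day-gap calculation can be sketched with SQLite (3.25 or later) as an assumed stand-in for Oracle. SQLite has no date subtraction, so julianday() is used instead. The example also shows a point worth knowing about LAST_VALUE: with an ORDER BY and no explicit window, the default window ends at the current row, so an explicit ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING is needed to reach the last record of the partition. The aliases last_default and last_full are illustrative.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; three department 20 rows only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (empno INTEGER, deptno INTEGER, hiredate TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [(7369, 20, "1980-12-17"), (7566, 20, "1981-04-02"), (7902, 20, "1981-12-03")],
)

gaps = conn.execute(
    """
    SELECT empno,
           CAST(julianday(hiredate)
                - julianday(FIRST_VALUE(hiredate)
                            OVER (PARTITION BY deptno ORDER BY hiredate))
                AS INTEGER) AS day_gap,
           LAST_VALUE(empno) OVER (PARTITION BY deptno ORDER BY hiredate) AS last_default,
           LAST_VALUE(empno) OVER (PARTITION BY deptno ORDER BY hiredate
                                   ROWS BETWEEN UNBOUNDED PRECEDING
                                            AND UNBOUNDED FOLLOWING) AS last_full
    FROM emp
    ORDER BY hiredate
    """
).fetchall()

for row in gaps:
    print(row)
```

With the default window, last_default simply echoes the current row; last_full is the most recently hired employee on every row.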
MIN and MAX FUNCTIONS

MAX returns the maximum value of expr.

MIN returns the minimum value of expr.

SELECT manager_id,
       last_name,
       salary,
       MAX(salary) OVER (PARTITION BY manager_id) AS mgr_max
FROM employees;

MGRID LNAME     SAL   MGR_MAX
----- --------- ----- -------
  100 Kochhar   17000   17000
  100 De Haan   17000   17000
  100 Raphaely  11000   17000
  100 Kaufling   7900   17000
  100 Fripp      8200   17000
  100 Weiss      8000   17000
. . .
WINDOW CLAUSE
The windowing clause further sub-partitions the result; the analytic function is applied to the rows of that window.

[ROWS or RANGE] BETWEEN <start_expr> AND <end_expr>

<start_expr> can be any one of the following

•UNBOUNDED PRECEDING
•CURRENT ROW
•<sql_expr> PRECEDING or FOLLOWING

<end_expr> can be any one of the following

•UNBOUNDED FOLLOWING
•CURRENT ROW
•<sql_expr> PRECEDING or FOLLOWING

UNBOUNDED PRECEDING is valid only for <start_expr>; UNBOUNDED FOLLOWING is valid only for <end_expr>.
There are two types of window clauses

1. ROW Type Windows

Syntax:
Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2,..>
    ROWS BETWEEN <start_expr> AND <end_expr>)

(or)

Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2,..>
    ROWS [<start_expr> PRECEDING or UNBOUNDED PRECEDING])

2. RANGE Type Windows

Syntax:
Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2>
    RANGE BETWEEN <start_expr> AND <end_expr>)

(or)

Function( ) OVER (PARTITION BY <expr1> ORDER BY <expr2>
    RANGE [<start_expr> PRECEDING or UNBOUNDED PRECEDING])
ROW Type Example

SELECT id,
       sal
FROM emp

ID  SAL
-- ----
01 1000
02 2000
03 3000
04 1000
05 2000

SELECT id,
       sal,
       AVG(sal) OVER (ORDER BY id
                      ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) tot_avg
FROM emp

ID  SAL TOT_AVG
-- ---- -------
01 1000    1500
02 2000    2000
03 3000    2000
04 1000    2000
05 2000    1500
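The moving average over a three-row window can be verified with a short sketch, assuming SQLite (3.25 or later) as a stand-in for Oracle and the same five id/sal rows. At the edges the window is simply shorter: the first and last rows average only two values (1000 and 2000), which gives 1500 for both.

```python
import sqlite3

# Assumption: SQLite >= 3.25 stands in for Oracle; same five id/sal rows as the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER, sal INTEGER)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [(1, 1000), (2, 2000), (3, 3000), (4, 1000), (5, 2000)],
)

moving_avg = conn.execute(
    """
    SELECT id,
           sal,
           AVG(sal) OVER (ORDER BY id
                          ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS tot_avg
    FROM emp
    ORDER BY id
    """
).fetchall()

for row in moving_avg:
    print(row)
```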
RANGE Type Example

SELECT ename,
       sal,
       hiredate,
       hiredate-50 "50_days_prior",
       first_value(ename) over
           (order by hiredate asc range 50 preceding) first_ename,
       first_value(hiredate) over
           (order by hiredate asc range 50 preceding) first_hdate
FROM emp
ORDER BY hiredate ASC
ENAME SAL HIREDATE 50_days_p FIRST_ENAM FIRST_HDA
----------------------------------------------------------------------------------------------------
SMITH 800 17-DEC-80 28-OCT-80 SMITH 17-DEC-80
ALLEN 1600 20-FEB-81 01-JAN-81 ALLEN 20-FEB-81
WARD 1250 22-FEB-81 03-JAN-81 ALLEN 20-FEB-81
JONES 2975 02-APR-81 11-FEB-81 ALLEN 20-FEB-81
BLAKE 2850 01-MAY-81 12-MAR-81 JONES 02-APR-81
CLARK 2450 09-JUN-81 20-APR-81 BLAKE 01-MAY-81
TURNER 1500 08-SEP-81 20-JUL-81 TURNER 08-SEP-81
MARTIN 1250 28-SEP-81 09-AUG-81 TURNER 08-SEP-81
KING 5000 17-NOV-81 28-SEP-81 MARTIN 28-SEP-81
FORD 3000 03-DEC-81 14-OCT-81 KING 17-NOV-81
JAMES 950 03-DEC-81 14-OCT-81 KING 17-NOV-81
MILLER 1300 23-JAN-82 04-DEC-81 MILLER 23-JAN-82
SCOTT 3000 19-APR-87 28-FEB-87 SCOTT 19-APR-87
ADAMS 1100 23-MAY-87 03-APR-87 SCOTT 19-APR-87
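A RANGE window is defined by the value of the ORDER BY expression, not by a row count. The sketch below assumes SQLite as a stand-in for Oracle (RANGE with a numeric offset needs SQLite 3.28 or later) and orders by julianday(hiredate) so that "50 PRECEDING" means 50 days; only the first five employees are loaded.

```python
import sqlite3

# Assumption: SQLite >= 3.28 stands in for Oracle; first five hires only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (ename TEXT, hiredate TEXT)")
conn.executemany(
    "INSERT INTO emp VALUES (?, ?)",
    [
        ("SMITH", "1980-12-17"),
        ("ALLEN", "1981-02-20"),
        ("WARD", "1981-02-22"),
        ("JONES", "1981-04-02"),
        ("BLAKE", "1981-05-01"),
    ],
)

# For each row: the earliest hire within the 50 days up to and including this hire date.
first_in_range = conn.execute(
    """
    SELECT ename,
           FIRST_VALUE(ename) OVER (ORDER BY julianday(hiredate)
                                    RANGE BETWEEN 50 PRECEDING AND CURRENT ROW)
               AS first_ename
    FROM emp
    ORDER BY hiredate
    """
).fetchall()

for row in first_in_range:
    print(row)
```

JONES (hired 02-APR-81) still sees ALLEN, hired 41 days earlier, while BLAKE (hired 01-MAY-81) no longer does and gets JONES.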
Process the Result Set Using Minimal Resources

• Fewer logical I/Os
• Shorter run time
• Easier to code
Oracle calls the concept of filling in missing data with partitioned outer joins Data Densification.

LEFT OUTER JOIN, PARTITION BY and KEEP are some new features in 10g

LISTAGG and NTH_VALUE are new features available in 11g

LEAD and LAG have been improved with the addition of the IGNORE NULLS option in 11g