You are on page 1of 12

Explanation:

 To calculate average order price, window function used AVG() on price columns and
partition by on order_id
 Consider order id: 1112 — consist 3 products (i.e. 1,2,5). The average value of those 3
products are (866 + 163 +173) / 3 = 400.667

b. COUNT():

 Calculates the number of rows with NULL values too if available in column or
expression.
 This window function is helpful while creating a new feature in the dataset. Like count
number of entries belong to each customer.

Example:

Query 1: Calculate the number of products purchased by a customer in the order.

SELECT order_id, name, product_id, COUNT(*) OVER (Partition BY order_id) AS


Number_of_Products
FROM retails

Query 2: Calculate the number of product sales purchased (running total) by the customer.

SELECT order_id, name, product_id, COUNT(*) OVER (Order BY order_id) AS


Number_of_Products
FROM retails

Output — Query 1 & 2 — COUNT()

Explanation:

Query 1:
 Partition by order_id counts number of records belongs to particular order_id.
 In the output, we can see several products for each order displayed.

Query 2:

 Order by order_id counts a number of records and particular order_id and then add
number records of consecutive order.
 Output: We can see the count is increased by the number of records related to particular
order_id.

c. Min() or Max():

 Min() or Max() return Minimum or Maximum value of the expression across the input
values respectively.
 Both window function works with Numeric values and ignores NULL values.

Example:

Below query add a new feature into the result set, Minumum and Maximum price of the product
purchased in respective order.

Query:
SELECT order_id, name, product_id, price,
MIN(price) OVER (Partition BY order_id) AS Minimum_Price_Product,
MAX(price) OVER (Partition BY order_id) AS Maximum_Price_Product
FROM retails

Output- Query- MIN() or MAX()

Explanation:
 For each order_id record respective minimum and maximum price of the product has
been added.
 We can use each function separately too.

d. Sum():

 Returns sum/total expression across all input value.


 The function works with Numeric Values and ignores NULL values.

Example:

Below query return Total Price of each order_id.

Query:
SELECT order_id, name, product_id, price,
SUM(price) OVER (PARTITION BY order_id) AS Average_Order_Price
FROM retails

Output — Query — SUM()

Explanation:

 New column added with total_order_price for each order_id.


 Helpful to analyse data, where we have a number of records, belongs to each order_id.
2. Window Ranking Aggregated Function:
Consist one of the supporting ranking function i.e. RANK(), DENSE_RANK(),
ROW_NUMBER().

a. RANK():

 The Rank of a value in a group of values based on the ORDER BY expression in the
OVER clause (refer Query 1).
 Each value is ranked within its PARTITION BY expression (refer Query 2).
 Rows with equal values for the ranking criteria receive the same rank.
 Tie or same rank skip the consecutive rank eg. Rank (): 1,1,3,4,5.

Example:

Query 1: Rank the product based on their prices.

SELECT order_id, name, product_id, price,


RANK() OVER (ORDER BY price) AS Rank_Product_Price
FROM retails

Query 2: Rank the product based on their prices in each order (i.e. partition by order_id).

SELECT order_id, name, product_id, price,


RANK() OVER (PARTITION BY order_id ORDER BY price) AS Rank_Product_Price
FROM retails

Output — Query 1 & 2 — RANK()

Explanation:

As we can see in both query, ORDER BY states the expression used to rank the values.

Query 1:
 The ranking is done based on product_price.
 Also note, 9 rows with the same value has tie rank 1.
 So next Rank value starts with 10.

Query 2:

 Rank has been done by ORDER BY expression i.e. price column.


 Check order_id 114, we can see the rank for first 2 product prices are same. Hence rank
assign to it is 1.
 Next product price rank with 3 within that order_id.

b. DENSE_RANK():

 Similarly to Rank() function, Rank of a value in a group of values based on the ORDER
BY expression and the OVER clause and each value is ranked within its PARTITION
BY expression.
 The difference is, Rows with equal values receive the same rank and Tie or the same rank
not skip the consecutive rank.
 Example: Dense_Rank(): 1,1,2,3,4

EXAMPLE:

Query: Dense_Rank the product based on their prices in each order (i.e. partition by order_id).

SELECT order_id, name, product_id, price,


DENSE_RANK() OVER (PARTITION BY order_id ORDER BY price) AS
Dense_Rank_Product_Price
FROM retails
Output — Query — Dense_Rank()

Explanation:

 As we can see ranking to each row done based on ORDER BY expression i.e. price
values also within each order_id i.e. (PARTITION BY order_id).
 order_id 1114, have 5 products out of which 2 products having the same price hence rank
tie i.e. 1.
 The next rank starts with 2 (this is the major difference between Rank() and
Desne_Rank() function).
 Dense_Rank() not skip the consecutive rank number.

c. CUME_DIST():

 Calculates Relative Rank of the current row within a window partition based on below
Formula:

EXAMPLE:
Query: CUME_DIST i.e. Relative rank the product based on their prices in each order (i.e.
partition by order_id).

SELECT order_id, name, product_id, price,


CUME_DIST() OVER (PARTITION BY order_id ORDER BY price) AS
Dense_Rank_Product_Price
FROM retails

Output — Query — CUME_DIST() i.e. Relative Rank

Explanation:

Let's consider order_id 1112 having 3 products, relative rank calculated as discussed below
formula used:

 Row no. 3— First product: 1/3 = 0.3333


 Row no. 4— Second product: 2/3 = 0.666
 Row no. 5— Third product: 3/3 = 1

Similarly, if product having same value or price then relative rank also same check order_id
1114 in output screenshot.
d. ROW_NUMBER():

 An ordinal number of the current row within its partition based on ORDER BY
expression in the OVER clause.
 Each value is ordered within its PARTITION BY expression.
 Rows with equal values for the ORDER BY expressions receive different row numbers
non deterministically.

EXAMPLE:

Query: Assign Row_Number to the product based on their prices in each order (i.e. partition by
order_id).

SELECT order_id, name, product_id, price,


ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY price) AS
Row_Number_Product_Price
FROM retails

Output — Query — Row_Number()

Explanation:

 As we can see in output screenshot, row number assign based on price (ORDER BY
expression) within each order (PARTITION BY order_id).
 Not consider value same or not, just assign row_number to each row in the expression.

e. NTILE():
 Divides the rows for each window partition, as equally as possible, into a specified
number of ranked groups.
 Requires ORDER BY clause in the OVER clause.
 The column or expression specified in ORDER BY clause, first all values has been sorted
in ascending order and then equally assign group number.

Example:

Query: Assign Group/cluster/bucket number to all row into 10 different groups based on the
product price.

SELECT order_id, name, product_id, price,


NTILE(10) OVER (ORDER BY price) AS NTile_Product_Price
FROM retails

Output — Query — NTILE()

Explanation:

 In this dataset, we have a total of 50 records.


 Hence, each clustered consist of 5 rows as shown in the output screenshot.
 First, all rows have been sorted with the price and then assign a group number to each
row.
f. PERCENT_RANK()

 Percentage rank of the current row using the following formula:

Example:

Query: Calculate or assign percentage rank to all row based on the product price.

SELECT order_id, name, product_id, price,


PERCENT_RANK() OVER (PARTITION BY order_id ORDER BY price) AS
Row_Number_Product_Price
FROM retails

Output — Query — Percent_Rank()

Explanation:

Let’s consider order_id 1114 having 5 products, relative rank calculated as discussed below
formula used:
 Row no. 9— First product: (1–1)/(5–1) = 0
 Row no. 10— Second product: (1–1)/(5–1) = 0
 Row no. 11— Third product: (3–1)/(5–1) = 0.5
 Row no. 12 — Forth product: (4–1)/(5–1) = 0.75
 Row no. 13 — Fifth product: (5–1)/(5–1) = 1

3. Window Analytic Functions:


Consist one of the supporting ranking function i.e. LAG(), LEAD(), FIRST_VALUE(),
LAST_VALUE().

a. LAG() or LEAD():

Syntax:

LAG | LEAD (expression)


OVER ([ PARTITION BY expression_list] [ORDER BY order_list] )

 LAG or LEAD returns, value for the row value before or after the current row in a
partition respectively.
 If no row exists, null is returned.

Example:

Query: Add new feature 1 step LAG or LEAD product price within each order (i.e.
PARTITION BY order_id)

SELECT order_id, name, product_id, price,


LAG(price,1) OVER (PARTITION BY order_id ORDER BY price) AS LAG_Product_Price,
LEAD(price,1) OVER (PARTITION BY order_id ORDER BY price) AS
LEAD_Product_Price
FROM retails
Output — Query — LAG() or LEAD()

Explanation:

You might also like