Window Function

Explanation:
• To calculate the average order price, the window function AVG() is applied to the price
column with PARTITION BY order_id.
• Consider order_id 1112, which consists of 3 products (product_id 1, 2 and 5). The average
price of those 3 products is (866 + 163 + 173) / 3 = 400.667.
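The arithmetic above can be checked with a small runnable sketch. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later, which supports window functions) and a trimmed-down, in-memory retails table; the rows for order 1112 come from the text, while order 1113 is a hypothetical extra row.

```python
import sqlite3

# Hypothetical in-memory stand-in for the retails table (only the columns needed here).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER, price REAL)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?, ?)",
    [(1112, 1, 866), (1112, 2, 163), (1112, 5, 173),  # order 1112 from the text
     (1113, 3, 250)],                                  # hypothetical extra order
)

# AVG() as a window function: each row keeps its detail and gains the order average.
rows = conn.execute(
    """
    SELECT order_id, product_id, price,
           ROUND(AVG(price) OVER (PARTITION BY order_id), 3) AS average_order_price
    FROM retails
    ORDER BY order_id, product_id
    """
).fetchall()

for row in rows:
    print(row)
```

Every row of order 1112 carries the same average, which is what distinguishes a window function from a GROUP BY that would collapse the order into one row.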
b. COUNT():
• COUNT(*) counts every row in the window, including rows that contain NULL values;
COUNT(expression) counts only the rows where the expression is not NULL.
• This window function is helpful for creating a new feature in the dataset, such as the
number of entries that belong to each customer.
Example:
Query 1: Calculate the number of products purchased by a customer in each order.
SELECT order_id, name, product_id,
COUNT(*) OVER (PARTITION BY order_id) AS Number_of_Products
FROM retails
Query 2: Calculate the running total of products purchased across orders.
SELECT order_id, name, product_id,
COUNT(*) OVER (ORDER BY order_id) AS Number_of_Products
FROM retails
Output — Query 1 & 2 — COUNT()
Explanation:
Query 1:
• PARTITION BY order_id counts the number of records that belong to each order_id.
• In the output, the number of products in the order is displayed on every row of that order.
Query 2:
• ORDER BY order_id counts the records of the current order_id and adds the records of all
preceding orders, giving a running count.
• Output: the count increases by the number of records in each order_id; rows that share an
order_id show the same running count.
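The difference between the two COUNT() queries can be seen side by side in the sketch below. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and a hypothetical, trimmed-down retails table with three small orders.

```python
import sqlite3

# Hypothetical in-memory stand-in for the retails table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?)",
    [(1112, 1), (1112, 2), (1112, 5), (1113, 3), (1114, 4), (1114, 6)],
)

rows = conn.execute(
    """
    SELECT order_id, product_id,
           COUNT(*) OVER (PARTITION BY order_id) AS products_in_order,  -- Query 1 style
           COUNT(*) OVER (ORDER BY order_id)     AS running_count       -- Query 2 style
    FROM retails
    ORDER BY order_id, product_id
    """
).fetchall()

for row in rows:
    print(row)
```

With ORDER BY and no explicit frame, rows that share an order_id are treated as peers, so the running count jumps by the size of each order (3, then 4, then 6 here) rather than by one per row.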
c. MIN() or MAX():
• MIN() and MAX() return the minimum and maximum value of the expression across the input
values, respectively.
• Both window functions work with numeric values and ignore NULL values.
Example:
The query below adds two new features to the result set: the minimum and maximum price of the
products purchased in the respective order.
Query:
SELECT order_id, name, product_id, price,
MIN(price) OVER (Partition BY order_id) AS Minimum_Price_Product,
MAX(price) OVER (Partition BY order_id) AS Maximum_Price_Product
FROM retails
Output- Query- MIN() or MAX()
Explanation:
• For each record, the minimum and maximum product price of its order_id has been added.
• Each function can also be used separately.
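A minimal runnable sketch of the MIN()/MAX() query follows, assuming Python's built-in sqlite3 module (SQLite 3.25 or later). The retails table here is a trimmed-down, in-memory stand-in: order 1112 uses the prices quoted earlier in the text, and order 1113 is hypothetical.

```python
import sqlite3

# Hypothetical in-memory stand-in for the retails table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?, ?)",
    [(1112, 1, 866), (1112, 2, 163), (1112, 5, 173),  # order 1112 from the text
     (1113, 3, 250), (1113, 4, 90)],                   # hypothetical order
)

# Each row keeps its own price and gains the cheapest and dearest price of its order.
rows = conn.execute(
    """
    SELECT order_id, product_id, price,
           MIN(price) OVER (PARTITION BY order_id) AS minimum_price_product,
           MAX(price) OVER (PARTITION BY order_id) AS maximum_price_product
    FROM retails
    ORDER BY order_id, product_id
    """
).fetchall()

for row in rows:
    print(row)
```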
d. SUM():
• Returns the sum (total) of the expression across all input values.
• The function works with numeric values and ignores NULL values.
Example:
The query below returns the total price of each order_id.
Query:
SELECT order_id, name, product_id, price,
SUM(price) OVER (PARTITION BY order_id) AS Total_Order_Price
FROM retails
Output — Query — SUM()
Explanation:
• A new column with the total order price is added for each order_id.
• This is helpful for analysing data in which several records belong to each order_id.
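The sketch below runs the SUM() window query and compares it with a plain GROUP BY total. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and a hypothetical, trimmed-down retails table; only the prices of order 1112 come from the text.

```python
import sqlite3

# Hypothetical in-memory stand-in for the retails table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?, ?)",
    [(1112, 1, 866), (1112, 2, 163), (1112, 5, 173), (1113, 3, 250), (1113, 4, 90)],
)

# Window version: one output row per product, each carrying its order total.
rows = conn.execute(
    """
    SELECT order_id, product_id, price,
           SUM(price) OVER (PARTITION BY order_id) AS total_order_price
    FROM retails
    ORDER BY order_id, product_id
    """
).fetchall()

# GROUP BY version: one output row per order, product detail is lost.
totals = dict(conn.execute("SELECT order_id, SUM(price) FROM retails GROUP BY order_id"))

for row in rows:
    print(row, "matches GROUP BY total:", row[3] == totals[row[0]])
```

Both forms compute the same totals; the window form is the one to reach for when the per-product rows must stay in the result.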
2. Window Ranking Functions:
These use one of the supported ranking functions, e.g. RANK(), DENSE_RANK(), ROW_NUMBER().
a. RANK():
• Returns the rank of a value in a group of values, based on the ORDER BY expression in the
OVER clause (refer to Query 1).
• With PARTITION BY, each value is ranked within its partition (refer to Query 2).
• Rows with equal values for the ranking criteria receive the same rank.
• Ties cause the following ranks to be skipped, e.g. RANK(): 1, 1, 3, 4, 5.
Example:
Query 1: Rank the products based on their prices.
SELECT order_id, name, product_id, price,
RANK() OVER (ORDER BY price) AS Rank_Product_Price
FROM retails
Query 2: Rank the products based on their prices within each order (i.e. partition by order_id).
SELECT order_id, name, product_id, price,
RANK() OVER (PARTITION BY order_id ORDER BY price) AS Rank_Product_Price
FROM retails
Output — Query 1 & 2 — RANK()
Explanation:
In both queries, ORDER BY states the expression used to rank the values.
Query 1:
• The ranking is based on the product price.
• Note that the 9 rows with the same price tie at rank 1.
• So the next rank value is 10.
Query 2:
• The ranking is again driven by the ORDER BY expression, i.e. the price column, but it
restarts within each order_id.
• Check order_id 1114: the prices of the first 2 products are the same, so both are assigned
rank 1.
• The next product price is ranked 3 within that order_id.
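The two RANK() variants can be compared in one result set, as in the sketch below. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and a trimmed-down retails table. The prices for order 1114 are hypothetical, chosen only so that the two cheapest products tie as described above.

```python
import sqlite3

# Hypothetical in-memory stand-in for the retails table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?, ?)",
    [(1112, 1, 866), (1112, 2, 163), (1112, 5, 173),
     (1114, 3, 100), (1114, 4, 100), (1114, 6, 250), (1114, 7, 400), (1114, 8, 650)],
)

rows = conn.execute(
    """
    SELECT order_id, product_id, price,
           RANK() OVER (ORDER BY price)                       AS rank_overall,   -- Query 1 style
           RANK() OVER (PARTITION BY order_id ORDER BY price) AS rank_in_order   -- Query 2 style
    FROM retails
    ORDER BY order_id, price, product_id
    """
).fetchall()

for row in rows:
    print(row)

# Ranks inside order 1114: the tie at rank 1 makes rank 2 disappear.
ranks_1114 = [r[4] for r in rows if r[0] == 1114]
print(ranks_1114)  # [1, 1, 3, 4, 5]
```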
b. DENSE_RANK():
• Like RANK(), it returns the rank of a value in a group of values based on the ORDER BY
expression in the OVER clause, and each value is ranked within its PARTITION BY expression.
• The difference is that rows with equal values receive the same rank, but ties do not cause
the following rank to be skipped.
• Example: DENSE_RANK(): 1, 1, 2, 3, 4
Example:
Query: Dense-rank the products based on their prices within each order (i.e. partition by order_id).
SELECT order_id, name, product_id, price,
DENSE_RANK() OVER (PARTITION BY order_id ORDER BY price) AS Dense_Rank_Product_Price
FROM retails
Output — Query — Dense_Rank()
Explanation:
• Each row is ranked on the ORDER BY expression (price) within each order_id
(PARTITION BY order_id).
• order_id 1114 has 5 products, 2 of which have the same price, so they tie at rank 1.
• The next rank is 2 (this is the major difference between RANK() and DENSE_RANK()).
• DENSE_RANK() does not skip the consecutive rank number.
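To make the contrast with RANK() concrete, the sketch below computes both functions over the same rows. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later); the five prices for order 1114 are hypothetical, chosen so that the two cheapest products tie.

```python
import sqlite3

# Hypothetical in-memory stand-in for the retails table (order 1114 only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?, ?)",
    [(1114, 3, 100), (1114, 4, 100), (1114, 6, 250), (1114, 7, 400), (1114, 8, 650)],
)

rows = conn.execute(
    """
    SELECT product_id, price,
           RANK()       OVER (PARTITION BY order_id ORDER BY price) AS rank_price,
           DENSE_RANK() OVER (PARTITION BY order_id ORDER BY price) AS dense_rank_price
    FROM retails
    ORDER BY price, product_id
    """
).fetchall()

rank_values = [r[2] for r in rows]
dense_values = [r[3] for r in rows]
print(rank_values)   # [1, 1, 3, 4, 5]  -> rank 2 is skipped after the tie
print(dense_values)  # [1, 1, 2, 3, 4]  -> no gap after the tie
```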
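c. CUME_DIST():
• Calculates the relative rank (cumulative distribution) of the current row within a window
partition using the formula below:
CUME_DIST = (number of rows with a value less than or equal to the current row's value) /
(total number of rows in the partition)
Example:
Query: CUME_DIST, i.e. the relative rank of the products based on their prices within each order
(i.e. partition by order_id).
SELECT order_id, name, product_id, price,
CUME_DIST() OVER (PARTITION BY order_id ORDER BY price) AS Cume_Dist_Product_Price
FROM retails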
Output — Query — CUME_DIST() i.e. Relative Rank
Explanation:
Consider order_id 1112, which has 3 products. The relative rank is the number of rows with a
price less than or equal to the current row's price, divided by the number of rows in the order:
• Row no. 3, first product: 1/3 = 0.3333
• Row no. 4, second product: 2/3 = 0.6667
• Row no. 5, third product: 3/3 = 1
Similarly, products with the same price receive the same relative rank; check order_id 1114 in
the output screenshot.
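The worked values above can be reproduced with the sketch below. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and a trimmed-down retails table: order 1112 uses the prices quoted earlier in the text, while the prices for order 1114 are hypothetical and include one tie.

```python
import sqlite3

# Hypothetical in-memory stand-in for the retails table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?, ?)",
    [(1112, 1, 866), (1112, 2, 163), (1112, 5, 173),
     (1114, 3, 100), (1114, 4, 100), (1114, 6, 250), (1114, 7, 400), (1114, 8, 650)],
)

# CUME_DIST = rows with price <= current price / rows in the order.
rows = conn.execute(
    """
    SELECT order_id, price,
           ROUND(CUME_DIST() OVER (PARTITION BY order_id ORDER BY price), 4) AS relative_rank
    FROM retails
    ORDER BY order_id, price, product_id
    """
).fetchall()

for row in rows:
    print(row)
```

In order 1114 the two tied products both get 2/5 = 0.4, because each of them has two rows at or below its price.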
d. ROW_NUMBER():
• Returns the ordinal number of the current row within its partition, based on the ORDER BY
expression in the OVER clause.
• Each value is ordered within its PARTITION BY expression.
• Rows with equal values for the ORDER BY expression receive different row numbers, assigned
non-deterministically.
Example:
Query: Assign a row number to the products based on their prices within each order (i.e.
partition by order_id).
SELECT order_id, name, product_id, price,
ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY price) AS Row_Number_Product_Price
FROM retails
Output — Query — Row_Number()
Explanation:
• As the output screenshot shows, the row number is assigned based on price (ORDER BY
expression) within each order (PARTITION BY order_id).
• It does not matter whether values are equal; every row in the partition gets its own row
number.
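The sketch below contrasts ROW_NUMBER() with RANK() on tied prices. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later) and hypothetical prices for order 1114. Adding product_id as a second ORDER BY key is an illustrative choice, not part of the original query; it makes the numbering of tied rows deterministic.

```python
import sqlite3

# Hypothetical in-memory stand-in for the retails table (order 1114 only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?, ?)",
    [(1114, 3, 100), (1114, 4, 100), (1114, 6, 250), (1114, 7, 400), (1114, 8, 650)],
)

rows = conn.execute(
    """
    SELECT product_id, price,
           -- product_id breaks the tie between the two rows priced at 100
           ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY price, product_id) AS row_num,
           RANK()       OVER (PARTITION BY order_id ORDER BY price)             AS rank_price
    FROM retails
    ORDER BY row_num
    """
).fetchall()

for row in rows:
    print(row)
```

The tied products share rank 1 but still receive distinct row numbers 1 and 2.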
e. NTILE():
• Divides the rows of each window partition, as equally as possible, into a specified
number of ranked groups.
• Requires an ORDER BY clause in the OVER clause.
• All values of the column or expression specified in the ORDER BY clause are first sorted in
ascending order, and group numbers are then assigned as evenly as possible.
Example:
Query: Assign a group (cluster/bucket) number to every row, splitting the rows into 10 groups
based on the product price.
SELECT order_id, name, product_id, price,
NTILE(10) OVER (ORDER BY price) AS NTile_Product_Price
FROM retails
Output — Query — NTILE()
Explanation:
• This dataset has a total of 50 records.
• Hence each of the 10 groups consists of 50 / 10 = 5 rows, as shown in the output screenshot.
• All rows are first sorted by price, and a group number is then assigned to each row.
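The 50-rows-into-10-groups split can be verified with the sketch below. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later); the 50 rows are generated with hypothetical, distinct prices purely to match the row count mentioned above.

```python
import sqlite3
from collections import Counter

# Hypothetical in-memory stand-in for the retails table with 50 generated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?, ?)",
    [(1000 + i // 5, i, 10 * i) for i in range(1, 51)],  # distinct prices 10, 20, ..., 500
)

rows = conn.execute(
    """
    SELECT price, NTILE(10) OVER (ORDER BY price) AS ntile_product_price
    FROM retails
    ORDER BY price
    """
).fetchall()

# Count how many rows landed in each of the 10 groups.
group_sizes = Counter(bucket for _, bucket in rows)
print(sorted(group_sizes.items()))
```

The five cheapest rows land in group 1, the next five in group 2, and so on up to group 10.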
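f. PERCENT_RANK():
• Calculates the percentage rank of the current row using the following formula:
PERCENT_RANK = (rank of the current row - 1) / (total number of rows in the partition - 1)
Example:
Query: Calculate the percentage rank of every row based on the product price within each order
(i.e. partition by order_id).
SELECT order_id, name, product_id, price,
PERCENT_RANK() OVER (PARTITION BY order_id ORDER BY price) AS Percent_Rank_Product_Price
FROM retails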
Output — Query — Percent_Rank()
Explanation:
Consider order_id 1114, which has 5 products. The percentage rank is calculated as
(rank - 1) / (rows in the order - 1):
• Row no. 9, first product: (1 - 1) / (5 - 1) = 0
• Row no. 10, second product: (1 - 1) / (5 - 1) = 0
• Row no. 11, third product: (3 - 1) / (5 - 1) = 0.5
• Row no. 12, fourth product: (4 - 1) / (5 - 1) = 0.75
• Row no. 13, fifth product: (5 - 1) / (5 - 1) = 1
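The five values above can be reproduced with the sketch below. It assumes Python's built-in sqlite3 module (SQLite 3.25 or later); the prices for order 1114 are hypothetical, chosen so that the two cheapest products tie and the ranks come out as 1, 1, 3, 4, 5.

```python
import sqlite3

# Hypothetical in-memory stand-in for the retails table (order 1114 only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?, ?)",
    [(1114, 3, 100), (1114, 4, 100), (1114, 6, 250), (1114, 7, 400), (1114, 8, 650)],
)

# PERCENT_RANK = (rank - 1) / (rows in partition - 1)
rows = conn.execute(
    """
    SELECT product_id, price,
           RANK()         OVER (PARTITION BY order_id ORDER BY price) AS rank_price,
           PERCENT_RANK() OVER (PARTITION BY order_id ORDER BY price) AS percent_rank_price
    FROM retails
    ORDER BY price, product_id
    """
).fetchall()

percent_ranks = [r[3] for r in rows]
print(percent_ranks)  # [0.0, 0.0, 0.5, 0.75, 1.0]
```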
3. Window Analytic Functions:
These use one of the supported analytic functions, e.g. LAG(), LEAD(), FIRST_VALUE(),
LAST_VALUE().
a. LAG() or LEAD():
Syntax:
LAG | LEAD (expression [, offset])
OVER ([PARTITION BY expression_list] [ORDER BY order_list])
• LAG returns the value from the row before the current row in the partition, and LEAD
returns the value from the row after it.
• If no such row exists, NULL is returned.
Example:
Query: Add new features holding the 1-step LAG and LEAD product price within each order (i.e.
PARTITION BY order_id).
SELECT order_id, name, product_id, price,
LAG(price, 1) OVER (PARTITION BY order_id ORDER BY price) AS LAG_Product_Price,
LEAD(price, 1) OVER (PARTITION BY order_id ORDER BY price) AS LEAD_Product_Price
FROM retails
Output — Query — LAG() or LEAD()
Explanation:
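A minimal runnable sketch of the LAG()/LEAD() query is given here, assuming Python's built-in sqlite3 module (SQLite 3.25 or later) and a trimmed-down, in-memory retails table that holds only order 1112 with the prices quoted earlier in the text.

```python
import sqlite3

# Hypothetical in-memory stand-in for the retails table (order 1112 only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE retails (order_id INTEGER, product_id INTEGER, price INTEGER)")
conn.executemany(
    "INSERT INTO retails VALUES (?, ?, ?)",
    [(1112, 1, 866), (1112, 2, 163), (1112, 5, 173)],
)

rows = conn.execute(
    """
    SELECT product_id, price,
           LAG(price, 1)  OVER (PARTITION BY order_id ORDER BY price) AS lag_product_price,
           LEAD(price, 1) OVER (PARTITION BY order_id ORDER BY price) AS lead_product_price
    FROM retails
    ORDER BY price
    """
).fetchall()

# SQL NULL arrives in Python as None: the cheapest row has no previous price,
# and the most expensive row has no next price.
for row in rows:
    print(row)
```

Within the order, each row sees the next-cheaper price in the LAG column and the next-dearer price in the LEAD column.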
