1/4/24, 5:57 AM What’s the difference? — RANK() vs.DENSE_RANK() vs.

ROW_NUMBER() | by Lori Lu | Medium

What’s the difference? RANK() vs. DENSE_RANK() vs. ROW_NUMBER()
SQL 101: How to Rank Rows in Byzer

Lori Lu
4 min read · Mar 30, 2022

https://medium.com/@LoriLu/whats-the-difference-rank-vs-dense-rank-vs-row-number-3aca5ecfb928 1/18

Photo by Joshua Golde on Unsplash

What’s the difference between RANK(), DENSE_RANK(), and ROW_NUMBER()?

If you struggle to answer that question, this post is for you.

Let’s go back to the real world and learn the difference from a common retail
analytics use case. We’ll run some queries in Byzer Notebook so you can
easily compare them to each other.


Retail Analytics: Finding Top X Best-Selling Products of This Season
Data is the backbone of the retail industry. What products should be sold in
which store at what price is determined by data insights extracted from
analyzing customer purchase history. One of the must-have metrics is Top X
best-selling products within a timeframe.

In this blog, let’s run some SQL queries in Byzer Notebook to find the Top 3
Best-Selling Products of this season for a retailer.

We’ll use the sales table. It has the following columns:

product - The name of the product.

product_price - The price of the product.

items_sold - The number of items sold.


Dummy Sales Data

Let’s create a dummy table in Byzer by running the following code:

-- store dummy sales data as a CSV string
set sales='''
product,product_price,items_sold
a, 44.12, 6547
b, 100, 547
c, 12.47, 48
d, 12, 3254
e, 100, 547
f, 12, 3254
g, 12, 3254
h, 7.77, 147
''';
-- convert the CSV string to a table
load csvStr.`sales` options header="true" and inferSchema="true"
as sales;

Which One to Pick: RANK(), DENSE_RANK(), or ROW_NUMBER()?


To find the most popular items, the intuitive approach is to compute the
revenue of each product (product_price * items_sold), sort the products in
descending order of revenue, and then pick the top 3. Alternatively, we can
use the SQL window function RANK() to compute the ranking and then filter the
data to show the top 3 items.
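The intuitive sort-and-slice approach can be sketched in plain Python. This is only an illustration, not Byzer code: the `sales` list mirrors the dummy data above, and the names `by_revenue` and `top3` are made up for this example.

```python
# Dummy sales rows from the article: (product, product_price, items_sold).
sales = [
    ("a", 44.12, 6547), ("b", 100, 547), ("c", 12.47, 48), ("d", 12, 3254),
    ("e", 100, 547), ("f", 12, 3254), ("g", 12, 3254), ("h", 7.77, 147),
]

# Revenue per product, sorted from highest to lowest.
# sorted() is stable, so tied products keep their original relative order.
by_revenue = sorted(
    ((product, price * sold) for product, price, sold in sales),
    key=lambda row: row[1],
    reverse=True,
)

# Naive "top 3": keep the first three rows, regardless of any ties.
top3 = by_revenue[:3]
for product, revenue in top3:
    print(product, round(revenue, 2))
```

Slicing always returns exactly three rows, so it says nothing about ties: products d, f, and g share the next-highest revenue and are simply cut off. A ranking function makes that tie handling explicit.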

However, the results might surprise you when you run the query with RANK():


Where is the No.3 item?
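To reproduce the surprise without Byzer, here is a minimal sketch that uses Python’s built-in sqlite3 module as a stand-in SQL engine. It assumes the bundled SQLite is version 3.25 or newer (the first release with window functions); the alias `rnk` and the variable names are chosen for this example only.

```python
import sqlite3

# In-memory SQLite database holding the article's dummy sales data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, product_price REAL, items_sold INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("a", 44.12, 6547), ("b", 100, 547), ("c", 12.47, 48), ("d", 12, 3254),
        ("e", 100, 547), ("f", 12, 3254), ("g", 12, 3254), ("h", 7.77, 147),
    ],
)

# Rank products by revenue with RANK(), then keep ranks 1 to 3.
top3_by_rank = conn.execute(
    """
    SELECT product, rnk
    FROM (
        SELECT product,
               RANK() OVER (ORDER BY product_price * items_sold DESC) AS rnk
        FROM sales
    ) AS ranked
    WHERE rnk <= 3
    ORDER BY rnk, product
    """
).fetchall()

for product, rnk in top3_by_rank:
    print(rnk, product)
```

Products b and e tie for rank 2, so RANK() never hands out rank 3, and the filter returns only ranks 1 and 2.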

Let’s run the following query, which ranks the rows by revenue using all three
ranking functions:


SQL — RANK(), DENSE_RANK(), ROW_NUMBER()

See the difference?


The ROW_NUMBER() function is self-explanatory once you’ve seen the data. It
simply assigns consecutive numbers to the rows ordered by revenue. Rows with
the same value still get different numbers, and which of the tied rows comes
first is arbitrary unless the ORDER BY includes a tie-breaker.

SQL — ROW_NUMBER() function
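As a rough illustration of this numbering outside SQL, here is a minimal Python sketch. The `row_number` helper is hypothetical, and the `revenues` list holds the product_price * items_sold values worked out by hand from the dummy data.

```python
def row_number(values):
    """Number values 1..n in descending order; ties still get distinct numbers."""
    ordered = sorted(values, reverse=True)
    return list(enumerate(ordered, start=1))


# Revenue per product (a to h), computed from the dummy sales data.
revenues = [288853.64, 54700, 598.56, 39048, 54700, 39048, 39048, 1142.19]

numbered = row_number(revenues)
for number, revenue in numbered:
    print(number, revenue)
```

The two rows with revenue 54700 receive different numbers even though their values are equal.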

The RANK() function ranks the rows based on the provided columns. It assigns
1 to the first row in the order and higher numbers to rows lower in the
order. Rows with the same value get the same rank, and the ranks that the
tied rows would otherwise have occupied are skipped.

For example, if two rows are 2nd (have the same rank), the next row will be
4th (i.e., 3rd doesn’t exist).

SQL — RANK() function
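The gap-leaving behavior can be sketched in a few lines of Python. The `rank` helper below is a hypothetical re-implementation for illustration, not how any SQL engine computes it, and `revenues` again holds the hand-computed revenue values from the dummy data.

```python
def rank(values):
    """Ties share a rank; the ranks the tied rows would have used are skipped."""
    ordered = sorted(values, reverse=True)
    ranks = []
    for position, value in enumerate(ordered, start=1):
        if ranks and value == ordered[position - 2]:
            ranks.append(ranks[-1])  # tie: reuse the previous row's rank
        else:
            ranks.append(position)   # no tie: rank equals the row's position
    return list(zip(ranks, ordered))


# Revenue per product (a to h), computed from the dummy sales data.
revenues = [288853.64, 54700, 598.56, 39048, 54700, 39048, 39048, 1142.19]

ranked = rank(revenues)
for r, revenue in ranked:
    print(r, revenue)
```

Because a non-tied row takes its position in the sorted order as its rank, every tie pushes the next rank forward by the number of tied rows.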

The DENSE_RANK() function is similar to RANK(). The only difference is that it
doesn’t leave gaps in the ranking values. Even though more than one row can
have the same rank, the rank of the next row is always one more than the
previous rank. For example, if two rows are 2nd, the next row will be 3rd.

SQL — DENSE_RANK() function
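For comparison, here is the same kind of sketch for the gap-free variant. The `dense_rank` helper is hypothetical and only meant to make the rule concrete; `revenues` holds the hand-computed revenue values from the dummy data.

```python
def dense_rank(values):
    """Ties share a rank; the next distinct value gets the previous rank plus one."""
    ordered = sorted(values, reverse=True)
    ranks = []
    current_rank = 0
    previous = None
    for value in ordered:
        if value != previous:
            current_rank += 1  # a new distinct value starts the next rank
            previous = value
        ranks.append(current_rank)
    return list(zip(ranks, ordered))


# Revenue per product (a to h), computed from the dummy sales data.
revenues = [288853.64, 54700, 598.56, 39048, 54700, 39048, 39048, 1142.19]

dense = dense_rank(revenues)
for r, revenue in dense:
    print(r, revenue)
```

The rank only advances when the value changes, so the highest rank equals the number of distinct revenue values.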


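So DENSE_RANK() is the right ranking function to show top results in this use
case: tied products share a rank and no rank is skipped, so filtering on the
top 3 ranks returns every product in the three highest revenue tiers.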

Here is the code:

SELECT
  RANK() OVER(ORDER BY product_price * items_sold DESC) AS rank,
  DENSE_RANK() OVER(ORDER BY product_price * items_sold DESC) AS dense_rank,
  ROW_NUMBER() OVER(ORDER BY product_price * items_sold DESC) AS row_number,
  product,
  product_price * items_sold AS revenue
FROM sales
AS sales_rank;
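The query above ranks every row but does not yet filter for the top 3. One way to add that filter is sketched below, again with Python’s sqlite3 module standing in for Byzer (SQLite 3.25 or newer assumed). The alias `drnk` and the subquery layout are choices made for this example, not taken from the original notebook.

```python
import sqlite3

# In-memory SQLite database holding the article's dummy sales data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, product_price REAL, items_sold INTEGER)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("a", 44.12, 6547), ("b", 100, 547), ("c", 12.47, 48), ("d", 12, 3254),
        ("e", 100, 547), ("f", 12, 3254), ("g", 12, 3254), ("h", 7.77, 147),
    ],
)

# Rank products by revenue with DENSE_RANK(), then keep ranks 1 to 3.
top3_by_dense_rank = conn.execute(
    """
    SELECT product, drnk
    FROM (
        SELECT product,
               DENSE_RANK() OVER (ORDER BY product_price * items_sold DESC) AS drnk
        FROM sales
    ) AS ranked
    WHERE drnk <= 3
    ORDER BY drnk, product
    """
).fetchall()

for product, drnk in top3_by_dense_rank:
    print(drnk, product)
```

With the dummy data this returns six products across ranks 1, 2, and 3, because ties are kept and rank 3 is no longer skipped.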

What’s Byzer?

A simple analogy of Byzer

Go to this blog — Byzer 101, if you want to install Byzer on your laptop.


Want to see how seamlessly Byzer Notebook integrates with SQL and Python? Try
this tutorial:

The EASIEST Way to Build and Visualise a Conversion Funnel

You can Do More with Less in Byzer!

Leaving feedback:

Please leave a comment here or join Slack to ask questions, get help, or
discuss all things Byzer!

Last but not least, please share Byzer with data enthusiasts around you if you
like this open-source project!

Thanks for reading!

Please share, subscribe to my email list, or follow me on Medium for upcoming
blogs.
