
SQL Subquery

In this tutorial, you will learn about subqueries in SQL with the help of examples.
In SQL, it's possible to place a SQL query inside another query. This inner query is known as a
subquery.
Example
-- use a subquery to select the first name of customer
-- with the maximum value of customer id
SELECT first_name
FROM Customers
WHERE customer_id = (
    SELECT MAX(customer_id)
    FROM Customers
);
Here, the query is divided into two parts:
• the subquery selects the maximum id from the Customers table
• the outer query selects the first_name of the customer with the maximum id (returned by the subquery)

SQL Subquery Syntax


The syntax of SQL subqueries is:

SELECT column FROM table
WHERE column OPERATOR (
    SELECT column FROM table
);

Here,
• column is the name of the column(s) to filter
• OPERATOR is any SQL operator to connect the two queries
• table is the name of the table to fetch the column from

Example 1: SQL Subquery


-- select all the rows from the Customers table with the minimum age
SELECT *
FROM Customers
WHERE age = (
SELECT MIN(age)
FROM Customers
);
In a subquery, the outer query's result depends on the result set of the inner subquery. That's
why subqueries are also called nested queries.
Here, the SQL command
1. executes the subquery first; selects the minimum age from the Customers table.
2. executes the outer query; selects the rows where age is equal to the result of subquery.

Example 2: SQL Subquery With IN Operator


Suppose we want the details of customers who have placed an order. Here's how we can do
that using a subquery:
-- select the customers who have made orders
SELECT customer_id, first_name
FROM Customers
WHERE customer_id IN (
SELECT customer_id
FROM Orders
);
Here, the SQL command
1. selects customer_id from the Orders table
2. selects those rows from the Customers table where customer_id is in the result set of the subquery

SQL Subquery and JOIN


In some scenarios, we can get the same result set using a subquery and the JOIN clause. For
example,
-- SELECT DISTINCT only selects the unique combination of customer_id and first_name
-- join the Customers and Orders tables and select the rows where their customer_id values match
-- result set contains customer_id and first_name of customers who made an order
SELECT DISTINCT Customers.customer_id, Customers.first_name
FROM Customers
INNER JOIN Orders
ON Customers.customer_id = Orders.customer_id
ORDER BY Customers.customer_id;
The result set of the above query will be the same as the one below:
-- display the distinct customer ids and first names
-- of customers who made an order using a subquery
SELECT customer_id, first_name
FROM Customers
WHERE customer_id IN (
SELECT customer_id
FROM Orders
);

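Note: A JOIN is often preferred over an equivalent subquery because many database engines optimize joins well. The difference depends on the database and the query, though: modern optimizers frequently turn an IN subquery into the same execution plan as the join, so compare the plans when performance matters.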
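
To check that the JOIN query and the subquery above really return the same rows, here is a minimal sketch. It assumes Python's built-in sqlite3 module with an in-memory database, and the sample rows are hypothetical; only the table and column names come from the examples above.

```python
import sqlite3

# In-memory database with hypothetical sample rows (not from the tutorial).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (customer_id INTEGER PRIMARY KEY, first_name TEXT);
CREATE TABLE Orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER);
INSERT INTO Customers VALUES (1, 'John'), (2, 'Robert'), (3, 'David');
INSERT INTO Orders VALUES (10, 1), (11, 1), (12, 3);
""")

# Version 1: JOIN plus DISTINCT (customer 1 has two orders, so DISTINCT matters).
join_rows = conn.execute("""
    SELECT DISTINCT Customers.customer_id, Customers.first_name
    FROM Customers
    INNER JOIN Orders ON Customers.customer_id = Orders.customer_id
    ORDER BY Customers.customer_id
""").fetchall()

# Version 2: subquery with IN; no DISTINCT needed because each customer row
# is tested once against the list of ids returned by the subquery.
subquery_rows = conn.execute("""
    SELECT customer_id, first_name
    FROM Customers
    WHERE customer_id IN (SELECT customer_id FROM Orders)
    ORDER BY customer_id
""").fetchall()

print(join_rows)
print(subquery_rows)
```

Both queries list John and David once each; Robert, who has no orders, appears in neither result.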

SQL subqueries are basic tools if you want to communicate effectively with relational databases. In
this article, I provide five subquery examples demonstrating how to use scalar, multirow, and
correlated subqueries in the WHERE, FROM/JOIN, and SELECT clauses.
A subquery, or nested query, is a query placed within another SQL query. When requesting
information from a database, you may find it necessary to include a subquery in
the SELECT, FROM, JOIN, or WHERE clause. However, you can also use subqueries when
updating the database (i.e. in INSERT, UPDATE, and DELETE statements).
There are several types of SQL subqueries:
• Scalar subqueries return a single value, or exactly one row and exactly one column.
• Multirow subqueries return either:
  ◦ One column with multiple rows (i.e. a list of values), or
  ◦ Multiple columns with multiple rows (i.e. tables).
• Correlated subqueries, where the inner query relies on information obtained from the outer query.
You can read more about the different types of SQL subqueries elsewhere; here, I want to focus
on examples. As we all know, it’s always easier to grasp new concepts with real-world use cases.
So let’s get started.

5 Subquery Examples in SQL


Let’s say we run an art gallery. We have a database with four tables: paintings, artists, collectors,
and sales. You can see the data stored in each table below.
paintings

id  name                artist_id  listed_price
11  Miracle             1          300.00
12  Sunshine            1          700.00
13  Pretty woman        2          2800.00
14  Handsome man        2          2300.00
15  Barbie              3          250.00
16  Cool painting       3          5000.00
17  Black square #1000  3          50.00
18  Mountains           4          1300.00

artists

id  first_name  last_name
1   Thomas      Black
2   Kate        Smith
3   Natali      Wein
4   Francesco   Benelli

collectors

id   first_name  last_name
101  Brandon     Cooper
102  Laura       Fisher
103  Christina   Buffet
104  Steve       Stevenson

sales

id    date        painting_id  artist_id  collector_id  sales_price
1001  2021-11-01  13           2          104           2500.00
1002  2021-11-10  14           2          102           2300.00
1003  2021-11-10  11           1          102           300.00
1004  2021-11-15  16           3          103           4000.00
1005  2021-11-22  15           3          103           200.00
1006  2021-11-22  17           3          103           50.00
Now let’s explore this data using SQL queries with different types of subqueries.
Example 1 - Scalar Subquery
We’ll start with a simple example: We want to list paintings that are priced higher than the
average. Basically, we want to get painting names along with the listed prices, but only for the
ones that cost more than average. That means that we first need to find this average price;
here’s where the scalar subquery comes into play:
SELECT name, listed_price
FROM paintings
WHERE listed_price > (
    SELECT AVG(listed_price)
    FROM paintings
);
Our subquery is in the WHERE clause, where it filters the result set based on the listed price. This
subquery returns a single value: the average price per painting for our gallery. Each listed price is
compared to this value, and only the paintings that are priced above average make it to the final
output:
name           listed_price
Pretty woman   2800.00
Handsome man   2300.00
Cool painting  5000.00
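
To see the two steps separately, here is a minimal sketch that first runs the scalar subquery on its own and then the full query. It assumes Python's built-in sqlite3 module with an in-memory database; the rows are the paintings data shown earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE paintings (id INTEGER, name TEXT, artist_id INTEGER, listed_price REAL);
INSERT INTO paintings VALUES
  (11, 'Miracle', 1, 300.00), (12, 'Sunshine', 1, 700.00),
  (13, 'Pretty woman', 2, 2800.00), (14, 'Handsome man', 2, 2300.00),
  (15, 'Barbie', 3, 250.00), (16, 'Cool painting', 3, 5000.00),
  (17, 'Black square #1000', 3, 50.00), (18, 'Mountains', 4, 1300.00);
""")

# Step 1: the scalar subquery alone returns exactly one row with one column.
average_price = conn.execute(
    "SELECT AVG(listed_price) FROM paintings"
).fetchone()[0]

# Step 2: the outer query compares every listed price with that single value.
above_average = conn.execute("""
    SELECT name, listed_price
    FROM paintings
    WHERE listed_price > (SELECT AVG(listed_price) FROM paintings)
    ORDER BY id
""").fetchall()

print(average_price)  # 1587.5
print(above_average)
```

The average works out to 1587.50, which is why only the three paintings priced above that value are returned.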


If this seems a bit complicated, you may want to check out our interactive SQL Basics course and
brush up on your essential SQL skills.
Example 2 – Multirow Subquery
Now let’s look into subqueries that return one column with multiple rows. These subqueries are
often included in the WHERE clause to filter the results of the main query.
Suppose we want to list all collectors who purchased paintings from our gallery. We can get the
necessary output using a multirow subquery. Specifically, we can use an inner query to list all
collectors’ IDs present in the sales table – these would be IDs corresponding to collectors who
made at least one purchase with our gallery. Then, in the outer query, we request the first name
and last name of all collectors whose ID is in the output of the inner query. Here’s the code:
SELECT first_name, last_name
FROM collectors
WHERE id IN (
    SELECT collector_id
    FROM sales
);
And here’s the output:
first_name  last_name
Laura       Fisher
Christina   Buffet
Steve       Stevenson
Interestingly, we could get the same result without a subquery by using an INNER JOIN (or
just JOIN). This join type returns only records that can be found in both tables. So, if we join
the collectors and the sales tables, we’ll get a list of collectors with corresponding records in
the sales table. Note: I have also used the DISTINCT keyword here to remove duplicates from
the output.
Here’s the query:
SELECT DISTINCT collectors.first_name, collectors.last_name
FROM collectors
JOIN sales
  ON collectors.id = sales.collector_id;
You can read more about choosing subquery vs. JOIN elsewhere in our blog.
Example 3 – Multirow Subquery with Multiple Columns
When a subquery returns a table with multiple rows and multiple columns, that subquery is
usually found in the FROM or JOIN clause. This allows you to get a table with data that was not
readily available in the database (e.g. grouped data) and then join this table with another one
from your database, if necessary.
Let’s say that we want to see the total amount of sales for each artist who has sold at least one
painting in our gallery. We may start with a subquery that draws on the sales table and
calculates the total amount of sales for each artist ID. Then, in the outer query, we combine this
information with the artists’ first names and last names to get the required output:
SELECT
  artists.first_name,
  artists.last_name,
  artist_sales.sales
FROM artists
JOIN (
    SELECT artist_id, SUM(sales_price) AS sales
    FROM sales
    GROUP BY artist_id
  ) AS artist_sales
  ON artists.id = artist_sales.artist_id;
We assign a meaningful alias to the output of our subquery (artist_sales). This way, we can
easily refer to it in the outer query, when selecting the column from this table, and when
defining the join condition in the ON clause. Note: Many databases (MySQL and SQL Server, for
example) will throw an error if you don't provide an alias for a subquery in the FROM or JOIN
clause, so it is safest to always include one.
Here’s the result of the query:
first_name  last_name  sales
Thomas      Black      300
Kate        Smith      4800
Natali      Wein       4250


So, within one short SQL query, we were able to calculate the total sales for each artist based on
the raw data from one table (sales), and then join this output with the data from another table
(artists).
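
The sketch below runs the inner query on its own and then as a derived table, so you can see the intermediate result that gets joined. It assumes Python's built-in sqlite3 module with an in-memory database; the sales rows follow the sales table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (id INTEGER, first_name TEXT, last_name TEXT);
CREATE TABLE sales (id INTEGER, date TEXT, painting_id INTEGER,
                    artist_id INTEGER, collector_id INTEGER, sales_price REAL);
INSERT INTO artists VALUES (1, 'Thomas', 'Black'), (2, 'Kate', 'Smith'),
                           (3, 'Natali', 'Wein'), (4, 'Francesco', 'Benelli');
INSERT INTO sales VALUES
  (1001, '2021-11-01', 13, 2, 104, 2500.00), (1002, '2021-11-10', 14, 2, 102, 2300.00),
  (1003, '2021-11-10', 11, 1, 102, 300.00),  (1004, '2021-11-15', 16, 3, 103, 4000.00),
  (1005, '2021-11-22', 15, 3, 103, 200.00),  (1006, '2021-11-22', 17, 3, 103, 50.00);
""")

# The inner query alone: one row per artist_id with the summed sales.
artist_sales = conn.execute("""
    SELECT artist_id, SUM(sales_price) AS sales
    FROM sales
    GROUP BY artist_id
    ORDER BY artist_id
""").fetchall()

# The same inner query used as a derived table (with an alias) in the JOIN clause.
totals = conn.execute("""
    SELECT artists.first_name, artists.last_name, artist_sales.sales
    FROM artists
    JOIN (
        SELECT artist_id, SUM(sales_price) AS sales
        FROM sales
        GROUP BY artist_id
    ) AS artist_sales
      ON artists.id = artist_sales.artist_id
    ORDER BY artists.id
""").fetchall()

print(artist_sales)
print(totals)
```

Francesco Benelli has no rows in sales, so the inner query produces nothing for artist 4 and the inner join leaves him out.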
Subqueries can be quite powerful when we need to combine information from multiple tables.
Let’s see what else we can do with subqueries.
Example 4 – Correlated Subquery
The following example will demonstrate how subqueries:
• Can be used in the SELECT clause, and
• Can be correlated (i.e. the inner query relies on information obtained from the main or outer query).
For each collector, we want to calculate the number of paintings purchased through our gallery.
To answer this question, we can use a subquery that counts the number of paintings purchased
by each collector. Here’s the entire query:
SELECT
  first_name,
  last_name,
  (
    SELECT COUNT(*)
    FROM sales
    WHERE collectors.id = sales.collector_id
  ) AS paintings
FROM collectors;
Notice how the inner query in this example actually runs for each row of the collectors table:
• The subquery is placed in the SELECT clause because we want to have an additional column with the number of paintings purchased by the corresponding collector.
• For each record of the collectors table, the inner subquery calculates the total number of paintings purchased by a collector with the corresponding ID.
Here’s the output:
first_name  last_name  paintings
Brandon     Cooper     0
Laura       Fisher     2
Christina   Buffet     3
Steve       Stevenson  1
As you see, the output of the subquery (i.e. the number of paintings) is different for each record
and depends on the output of the outer query (i.e. the corresponding collector). Thus, we are
dealing with a correlated subquery here.
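
As a quick check, the sketch below runs the correlated subquery next to a LEFT JOIN with GROUP BY, which is one common alternative way to get the same counts (my addition, not part of the original example). It assumes Python's built-in sqlite3 module, and the sales table is reduced to the two columns this query needs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE collectors (id INTEGER, first_name TEXT, last_name TEXT);
-- Reduced sales table: only the columns needed for counting purchases.
CREATE TABLE sales (id INTEGER, collector_id INTEGER);
INSERT INTO collectors VALUES (101, 'Brandon', 'Cooper'), (102, 'Laura', 'Fisher'),
                              (103, 'Christina', 'Buffet'), (104, 'Steve', 'Stevenson');
INSERT INTO sales VALUES (1001, 104), (1002, 102), (1003, 102),
                         (1004, 103), (1005, 103), (1006, 103);
""")

# Correlated subquery: the inner COUNT refers to collectors.id from the outer row.
with_subquery = conn.execute("""
    SELECT first_name, last_name,
           (SELECT COUNT(*)
            FROM sales
            WHERE sales.collector_id = collectors.id) AS paintings
    FROM collectors
    ORDER BY collectors.id
""").fetchall()

# Alternative: LEFT JOIN keeps collectors without sales; COUNT(sales.id)
# ignores the NULLs produced for them, so they get 0.
with_join = conn.execute("""
    SELECT collectors.first_name, collectors.last_name, COUNT(sales.id) AS paintings
    FROM collectors
    LEFT JOIN sales ON sales.collector_id = collectors.id
    GROUP BY collectors.id, collectors.first_name, collectors.last_name
    ORDER BY collectors.id
""").fetchall()

print(with_subquery)
print(with_join)
```

Note that an INNER JOIN would drop Brandon Cooper entirely, whereas the correlated subquery and the LEFT JOIN both report him with 0 paintings.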
Check out this guide if you want to learn how to write correlated subqueries in SQL. For now, 
let’s have one more correlated subquery example.
Example 5 – Correlated Subquery
This time, we want to show the first names and the last names of the artists who had zero sales
with our gallery. Let’s try to accomplish this task using a correlated subquery in
the WHERE clause:
SELECT first_name, last_name
FROM artists
WHERE NOT EXISTS (
  SELECT *
  FROM sales
  WHERE sales.artist_id = artists.id
);
Here is what's going on in this query:
• The outer query lists basic information on the artists, first checking whether there are corresponding records in the sales table.
• The inner query looks for records that correspond to the artist ID that is currently being checked by the outer query.
• If there are no corresponding records, the first name and the last name of the corresponding artist are added to the output:
first_name  last_name
Francesco   Benelli
In our example, we have only one artist without any sales yet. Hopefully, he’ll land one soon.
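
For comparison, here is a minimal sketch that runs the NOT EXISTS query alongside a LEFT JOIN that keeps only unmatched rows (an "anti-join", which is my addition rather than part of the original example). It assumes Python's built-in sqlite3 module, with the sales table reduced to the columns needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE artists (id INTEGER, first_name TEXT, last_name TEXT);
-- Reduced sales table: only the columns needed to detect a sale.
CREATE TABLE sales (id INTEGER, artist_id INTEGER);
INSERT INTO artists VALUES (1, 'Thomas', 'Black'), (2, 'Kate', 'Smith'),
                           (3, 'Natali', 'Wein'), (4, 'Francesco', 'Benelli');
INSERT INTO sales VALUES (1001, 2), (1002, 2), (1003, 1),
                         (1004, 3), (1005, 3), (1006, 3);
""")

# Correlated subquery: keep an artist only if no matching sales row exists.
not_exists_rows = conn.execute("""
    SELECT first_name, last_name
    FROM artists
    WHERE NOT EXISTS (
        SELECT * FROM sales WHERE sales.artist_id = artists.id
    )
""").fetchall()

# Anti-join: LEFT JOIN fills sales columns with NULL when there is no match,
# so filtering on IS NULL keeps exactly the artists without sales.
anti_join_rows = conn.execute("""
    SELECT artists.first_name, artists.last_name
    FROM artists
    LEFT JOIN sales ON sales.artist_id = artists.id
    WHERE sales.artist_id IS NULL
""").fetchall()

print(not_exists_rows)
print(anti_join_rows)
```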
What is a subquery in SQL?
A subquery is a SQL query nested inside a larger query.
• A subquery may occur in:
  ◦ A SELECT clause
  ◦ A FROM clause
  ◦ A WHERE clause
• The subquery can be nested inside a SELECT, INSERT, UPDATE, or DELETE statement or inside another subquery.
• A subquery is usually added within the WHERE clause of another SQL SELECT statement.
• You can use comparison operators such as >, <, or =. The comparison operator can also be a multiple-row operator, such as IN, ANY, or ALL.
• A subquery is also called an inner query or inner select, while the statement containing a subquery is also called an outer query or outer select.
• A non-correlated inner query executes before its parent query so that its result can be passed to the outer query. A correlated subquery is instead evaluated, conceptually, once for each row that the outer query processes.
You can use a subquery in a SELECT, INSERT, DELETE, or UPDATE statement to perform the following tasks:
• Compare an expression to the result of the query.
• Determine if an expression is included in the results of the query.
• Check whether the query selects any rows.
Syntax:
• The subquery (inner query) executes once before the main query (outer query) executes.
• The main query (outer query) uses the subquery result.
SQL Subqueries Example :
In this section, you will learn why subqueries are needed. We have two tables, 'student' and
'marks', that share the common field 'StudentID'.
Now we want to write a query to identify all students who got better marks than the
student whose StudentID is 'V002', but we do not know the marks of 'V002'.
- To solve the problem, we need two queries. One query returns the marks (stored in the
Total_marks field) of 'V002', and a second query identifies the students who got better marks
than the result of the first query.
First query:
SELECT *
FROM `marks`
WHERE studentid = 'V002';
Query result:

The result of the query is 80.


- Using the result of this query, here we have written another query to identify the students who
get better marks than 80. Here is the query :
Second query:
SELECT a.studentid, a.name, b.total_marks
FROM student a, marks b
WHERE a.studentid = b.studentid
AND b.total_marks > 80;

The two queries above identify the students who got better marks than the student whose
StudentID is 'V002' (Abhay).
You can combine the above two queries by placing one query inside the other. The subquery
(also called the 'inner query') is the query inside the parentheses. See the following code and
query result :
SQL Code:
SELECT a.studentid, a.name, b.total_marks
FROM student a, marks b
WHERE a.studentid = b.studentid AND b.total_marks >
(SELECT total_marks
FROM marks
WHERE studentid = 'V002');
Subqueries: General Rules
A subquery is written much like an ordinary SELECT statement, enclosed in parentheses and
placed inside a regular (outer) query. Here is the syntax of a subquery:
Syntax:
(SELECT [DISTINCT] subquery_select_argument
FROM {table_name | view_name}
[, {table_name | view_name}] ...
[WHERE search_conditions]
[GROUP BY aggregate_expression [, aggregate_expression] ...]
[HAVING search_conditions])
Subqueries: Guidelines
There are some guidelines to consider when using subqueries:
• A subquery must be enclosed in parentheses.
• By convention, a subquery is placed on the right side of the comparison operator, which keeps the condition readable.
• An ORDER BY clause inside a subquery generally has no effect on the final result, and some databases reject it unless it is combined with a row-limiting clause such as TOP or LIMIT. To sort the output, use an ORDER BY clause as the last clause of the main SELECT statement (outer query).
• Use single-row operators with single-row subqueries.
• If a subquery (inner query) returns a null value to the outer query, the outer query will not return any rows when using certain comparison operators in a WHERE clause.
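
The last guideline is easy to trip over, so here is a minimal sketch of it. It assumes Python's built-in sqlite3 module with an in-memory database; the marks rows are hypothetical apart from 'V002' scoring 80 as stated earlier, and 'V999' is a made-up ID that does not exist in the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE marks (studentid TEXT, total_marks INTEGER);
-- Hypothetical rows, apart from V002 scoring 80 as in the text.
INSERT INTO marks VALUES ('V001', 95), ('V002', 80), ('V003', 74), ('V004', 81);
""")

query = """
    SELECT studentid, total_marks
    FROM marks
    WHERE total_marks > (SELECT total_marks FROM marks WHERE studentid = ?)
    ORDER BY studentid
"""

# V002 exists, so the subquery yields 80 and the comparison works as expected.
better_than_v002 = conn.execute(query, ("V002",)).fetchall()

# V999 does not exist: the scalar subquery yields NULL, "total_marks > NULL"
# is never true, and the outer query returns no rows at all.
better_than_missing = conn.execute(query, ("V999",)).fetchall()

print(better_than_v002)
print(better_than_missing)
```

An empty result here does not mean nobody scored higher; it means the comparison value was NULL, so it is worth checking the subquery on its own when a query unexpectedly returns nothing.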
Types of Subqueries
• Single-row subquery: returns zero or one row.
• Multiple-row subquery: returns one or more rows.
• Multiple-column subquery: returns one or more columns.
• Correlated subquery: references one or more columns in the outer SQL statement. It is called a correlated subquery because the subquery is related to the outer SQL statement.
• Nested subquery: a subquery placed within another subquery.
These topics are discussed thoroughly in the next section. Apart from the above types
of subqueries, you can use a subquery inside INSERT, UPDATE, and DELETE statements. Here is a
brief discussion:
Subqueries with INSERT statement
The INSERT statement can be used with subqueries. Here are the syntax and an example of a
subquery in an INSERT statement.
Syntax:
INSERT INTO table_name [ (column1 [, column2 ]) ]
SELECT [ * | column1 [, column2 ] ]
FROM table1 [, table2 ]
[ WHERE VALUE OPERATOR ];
To insert into the 'neworder' table those orders from the 'orders' table that have an
advance_amount of 2000 or 5000, the following SQL can be used:
SQL Code:
INSERT INTO neworder
SELECT * FROM orders
WHERE advance_amount IN (2000, 5000);


Subqueries with UPDATE statement
In an UPDATE statement, you can set a new column value equal to the result returned by a
single-row subquery, or use a subquery in the WHERE clause to choose which rows to update.
Here are the syntax and an example of a subquery in an UPDATE statement.
Syntax:
UPDATE table_name SET column_name = new_value
[ WHERE column_name OPERATOR
(SELECT column_name
FROM table_name
[ WHERE condition ]) ]
To set ord_date to '15-JAN-10' for the rows of the 'neworder' table in which the difference
between ord_amount and advance_amount is less than the minimum ord_amount of the
'orders' table, the following SQL can be used:
SQL Code:
UPDATE neworder
SET ord_date = '15-JAN-10'
WHERE ord_amount - advance_amount <
(SELECT MIN(ord_amount) FROM orders);
Subqueries with DELETE statement
The DELETE statement can be used with subqueries. Here are the syntax and an example of a
subquery in a DELETE statement.
Syntax:
DELETE FROM table_name
[ WHERE column_name OPERATOR
(SELECT column_name
FROM table_name
[ WHERE condition ]) ]
To delete from the 'neworder' table those orders whose advance_amount is less than
the maximum advance_amount of the 'orders' table, the following SQL can be used:
SQL Code:
DELETE FROM neworder
WHERE advance_amount <
(SELECT MAX(advance_amount) FROM orders);


Difference between RANK and DENSE_RANK
Introduction
Have you ever needed to rank the rows of a SQL table? This article explains what ranking
means in SQL and how it is achieved.
We will look at the RANK and DENSE_RANK functions in detail. These are the functions used for
ranking in SQL. We will also see the differences between the two, how to use them in different
ways, and some of their use cases.

Need of Ranking
Suppose a student is given a dataset with the exam results of 50 students. It would take them
some time to determine the top performers, because sorting through all that information by
hand is tedious. You can solve this problem with SQL ranking functions such as RANK and
DENSE_RANK.
Operations that used to take a long time can now be finished in a few seconds.
These functions order the rows and assign a numeric rank to each one.
Both functions are always used with the OVER() clause.
For example, a company has five employees with five different salaries: 50 thousand,
45 thousand, 55 thousand, 35 thousand, and 42 thousand. Ranking the rows by salary shows
immediately who receives the highest salary.
We will learn how to use these functions with examples and then look at the critical
difference between them.
What is the RANK Function in SQL?
The RANK function is a SQL window function that calculates the rank of each row according to
the ordering (and optional partitioning) defined in its OVER clause.
Some essential points to consider while using the RANK function:
• The ORDER BY clause is needed to define the ranking order.
• The PARTITION BY clause is optional.
• If two records share identical values in the ORDER BY columns, they also share the same rank.
After a tie, the following ranks are skipped, so the rank values are not always consecutive.
Syntax:
SELECT col_name,
RANK() OVER(
[PARTITION BY exp]  -- optional: rank within each group
ORDER BY exp [ASC | DESC] [, exp1 ...]
) AS r
FROM table_name;

Examples of RANK Function


Let’s see an example to understand the RANK function better. Consider the Score table, which
has the attributes name, subject, and marks.
Id  name     subject    marks
1   Ninja_1  Maths      85
2   Ninja_2  Maths      50
3   Ninja_3  Science    70
4   Ninja_4  Economics  85
5   Ninja_5  Maths      20
6   Ninja_6  English    92
7   Ninja_7  English    92
8   Ninja_8  Science    89
9   Ninja_9  Maths      63
The above table (Score) contains nine rows and four columns. First, we will use the RANK
function to rank the students without the PARTITION BY clause.
Estimating Rank without the PARTITION BY Clause
SELECT
 name, subject, marks,
 RANK() OVER (
   ORDER BY marks ASC ) AS rank
FROM Score;
Output:
name     subject    marks  rank
Ninja_5  Maths      20     1
Ninja_2  Maths      50     2
Ninja_9  Maths      63     3
Ninja_3  Science    70     4
Ninja_1  Maths      85     5
Ninja_4  Economics  85     5
Ninja_8  Science    89     7
Ninja_6  English    92     8
Ninja_7  English    92     8

We can observe that Ninja_1 and Ninja_4 are both assigned rank 5, and Ninja_8, who has the
next higher marks, is assigned rank 7 instead of 6.
Next, we will use the RANK function with the PARTITION BY clause, again ordering the marks in
ascending order.
Estimating Rank with the PARTITION BY Clause
SELECT
 name, subject, marks,
 RANK() OVER (
   PARTITION BY subject
   ORDER BY marks ASC ) AS rank
FROM Score;
Output:
name     subject    marks  rank
Ninja_4  Economics  85     1
Ninja_6  English    92     1
Ninja_7  English    92     1
Ninja_5  Maths      20     1
Ninja_2  Maths      50     2
Ninja_9  Maths      63     3
Ninja_1  Maths      85     4
Ninja_3  Science    70     1
Ninja_8  Science    89     2
We observe that with the PARTITION BY clause, ranks are assigned within groups (partitions)
of the data: the clause divides the result set into smaller sections, and the ranking restarts
at 1 in each one. In the above case, there are four students with the subject 'Maths'. Because
the marks are ordered in ascending order, the student with the lowest marks is assigned rank 1,
and the others receive ranks 2, 3, and 4.
What is DENSE_RANK Function in SQL?

The DENSE_RANK function is similar to the RANK function, with one difference: it produces
consecutive rank values without any gaps.
Some essential points to keep in mind while using the DENSE_RANK function:
• Rows with identical values receive the same rank.
• The rank of each subsequent distinct value increases by one.
Syntax:
SELECT col_name,
DENSE_RANK() OVER(
[PARTITION BY exp]  -- optional: rank within each group
ORDER BY exp [ASC | DESC] [, exp1 ...]
) AS r
FROM table_name;
 
Examples of DENSE_RANK Function
Let’s see an example to understand the DENSE_RANK function better. We reuse the Score table
from the RANK examples above, which has nine rows and the attributes Id, name, subject, and
marks. First, let’s use DENSE_RANK without the PARTITION BY clause.
Estimating DENSE_RANK without the PARTITION BY Clause
SELECT
 name, subject, marks,
 DENSE_RANK() OVER (
   ORDER BY marks ASC
 ) AS rank
FROM Score;
Output:
name     subject    marks  rank
Ninja_5  Maths      20     1
Ninja_2  Maths      50     2
Ninja_9  Maths      63     3
Ninja_3  Science    70     4
Ninja_1  Maths      85     5
Ninja_4  Economics  85     5
Ninja_8  Science    89     6
Ninja_6  English    92     7
Ninja_7  English    92     7
Since Ninja_1 and Ninja_4 have the same marks, they are given the same Rank. And Ninja_8 is
given Rank 6.
Another example is DENSE_RANK with the PARTITION BY clause.
Estimating DENSE_RANK with the PARTITION BY Clause
SELECT
 name, subject, marks,
 DENSE_RANK() OVER (
   PARTITION BY subject
   ORDER BY marks ASC
 ) AS rank
FROM Score;
Output:
name     subject    marks  rank
Ninja_4  Economics  85     1
Ninja_6  English    92     1
Ninja_7  English    92     1
Ninja_5  Maths      20     1
Ninja_2  Maths      50     2
Ninja_9  Maths      63     3
Ninja_1  Maths      85     4
Ninja_3  Science    70     1
Ninja_8  Science    89     2
Comparison Table between RANK and DENSE_RANK
Handling of ties:
  RANK: The next rank is skipped if two or more rows have identical values in the ORDER BY columns.
  DENSE_RANK: The next rank is not skipped if two or more rows have the same values in the ORDER BY columns and obtain the same dense rank.
Sequence of rank values:
  RANK: The ranks are not always consecutive because of the skipped ranks.
  DENSE_RANK: The ranks are always consecutive.
Example:
  RANK: If two workers have the same value, they will both obtain rank 1, and the following employee will receive rank 3.
  DENSE_RANK: If two workers have the same value, they will both be assigned rank 1, and the subsequent employee will be given a rank of 2.
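
To see both functions side by side on the Score data, here is a minimal sketch. It assumes Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer); the column aliases rank_value and dense_rank_value are my own names, chosen to avoid clashing with the RANK keyword.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
CREATE TABLE Score (id INTEGER, name TEXT, subject TEXT, marks INTEGER);
INSERT INTO Score VALUES
  (1, 'Ninja_1', 'Maths', 85), (2, 'Ninja_2', 'Maths', 50),
  (3, 'Ninja_3', 'Science', 70), (4, 'Ninja_4', 'Economics', 85),
  (5, 'Ninja_5', 'Maths', 20), (6, 'Ninja_6', 'English', 92),
  (7, 'Ninja_7', 'English', 92), (8, 'Ninja_8', 'Science', 89),
  (9, 'Ninja_9', 'Maths', 63);
""")

# Same ordering for both functions, so the only difference is how ties
# affect the ranks that follow them.
rows = conn.execute("""
    SELECT name, marks,
           RANK()       OVER (ORDER BY marks ASC) AS rank_value,
           DENSE_RANK() OVER (ORDER BY marks ASC) AS dense_rank_value
    FROM Score
    ORDER BY marks, id
""").fetchall()

for row in rows:
    print(row)
```

The two columns agree up to the first tie (Ninja_1 and Ninja_4 at rank 5). After that, Ninja_8 gets 7 from RANK but 6 from DENSE_RANK, and the final pair gets 8 versus 7.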
A quick summary of SQL ranking functions
RANK: a SQL function that calculates each row's rank according to the ordering and
partitioning defined in its OVER clause. Rows with equal values share a rank, and the ranks
that follow a tie are skipped.
DENSE_RANK: a SQL function that assigns a rank number to each row. Rows with equal values
share a rank, and no ranks are skipped after a tie.
Frequently Asked Questions
How can you use RANK and DENSE_RANK to identify outliers?
RANK and DENSE_RANK can help you spot outliers by ordering the rows so that extreme values
end up at the top or bottom of the ranking. For example, if most values are in the range of
20 to 50 but one record has 100, that record stands out at the end of the ranking and may
indicate an abnormality.

What are some limitations of using RANK and DENSE_RANK that we need to keep in mind
before using it?
A limitation of RANK and DENSE_RANK is how they handle ties: tied rows share a rank, so
neither function gives every row a unique position, and heavily skewed data can produce many
ties. For such datasets, you may need to add tie-breaking columns to the ORDER BY clause or
choose an alternative such as ROW_NUMBER.
 
What are some use cases for RANK() and DENSE_RANK() in SQL?
Some common use cases for rank and dense_rank functions in SQL include identifying top
performers, ranking products or identifying trends from data. They are used to answer
questions like “what are top 10 performers in a company over last month” or “what are the top
5 highest rated movies over last year.”
 
How do the RANK and DENSE_RANK functions work with NULL values?
For ranking purposes, NULL values in the ORDER BY column are treated as equal to one another,
so rows with NULL tie and receive the same RANK and DENSE_RANK value. Where they appear
depends on the database and the sort direction: NULL values are ranked either as the lowest or
as the highest values, and some databases let you choose with NULLS FIRST or NULLS LAST.
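
Here is a minimal sketch of how NULL values behave in one particular database. It assumes Python's built-in sqlite3 module (SQLite 3.25 or newer), where NULL sorts before all other values in ascending order; the table and its rows are hypothetical, and other databases may place NULL last instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # window functions need SQLite 3.25+
conn.executescript("""
-- Hypothetical table: two students have no marks recorded (NULL).
CREATE TABLE Score (name TEXT, marks INTEGER);
INSERT INTO Score VALUES ('A', 70), ('B', NULL), ('C', 50), ('D', NULL);
""")

rows = conn.execute("""
    SELECT name, marks,
           RANK()       OVER (ORDER BY marks ASC) AS rank_value,
           DENSE_RANK() OVER (ORDER BY marks ASC) AS dense_rank_value
    FROM Score
    ORDER BY rank_value, name
""").fetchall()

for row in rows:
    print(row)
```

In SQLite, the two NULL rows tie at rank 1. RANK then continues at 3 for student C, while DENSE_RANK continues at 2. If NULL marks should not take part in the ranking at all, filtering them out with WHERE marks IS NOT NULL is one option.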

We perform calculations on data using various aggregate functions such as MAX, MIN, and AVG.
These functions return a single output row. SQL Server provides SQL RANK functions to
assign a rank to individual rows as per the categorizations. They return a value for
each participating row. SQL RANK functions are also known as window functions.
• Note: The term "window" here does not relate to the Microsoft Windows operating system.
We have the following rank functions.
• ROW_NUMBER()
• RANK()
• DENSE_RANK()
• NTILE()
In the SQL RANK functions, we use the OVER() clause to define a set of rows in the result set.
We can also use the SQL PARTITION BY clause to define a subset of data in a partition. You can
also use the ORDER BY clause to sort the results in descending or ascending order.
Before we explore these SQL RANK functions, let’s prepare sample data. In this sample data, we
have exam results for three students in Maths, Science and English subjects.
CREATE TABLE ExamResult
(StudentName VARCHAR(70),
 Subject     VARCHAR(20),
 Marks       INT
);
INSERT INTO ExamResult VALUES ('Lily', 'Maths', 65);
INSERT INTO ExamResult VALUES ('Lily', 'Science', 80);
INSERT INTO ExamResult VALUES ('Lily', 'english', 70);
INSERT INTO ExamResult VALUES ('Isabella', 'Maths', 50);
INSERT INTO ExamResult VALUES ('Isabella', 'Science', 70);
INSERT INTO ExamResult VALUES ('Isabella', 'english', 90);
INSERT INTO ExamResult VALUES ('Olivia', 'Maths', 55);
INSERT INTO ExamResult VALUES ('Olivia', 'Science', 60);
INSERT INTO ExamResult VALUES ('Olivia', 'english', 89);
We have the following sample data in the ExamResult table.

Let’s use each SQL Rank Functions in upcoming examples.


ROW_NUMBER() SQL RANK function
We use the ROW_NUMBER() SQL RANK function to get a unique sequential number for each row in
the specified data. It gives rank 1 to the first row and then increments the value by one
for each row. Rows with equal values also get different numbers.
Execute the following query to get a rank for students as per their marks.
SELECT Studentname,
       Subject,
       Marks,
       ROW_NUMBER() OVER(ORDER BY Marks) RowNumber
FROM ExamResult;

By default, it sorts the data in ascending order and starts assigning ranks for each row. In the
result, we get row number 1 for marks 50.
We can specify descending order with the ORDER BY clause, and it changes the rank accordingly.
SELECT Studentname,
       Subject,
       Marks,
       ROW_NUMBER() OVER(ORDER BY Marks DESC) RowNumber
FROM ExamResult;
RANK() SQL RANK Function
We use the RANK() SQL RANK function to specify the rank for each row in the result set. We have
student results for three subjects, and we want to rank each student's results as per their
marks in the subjects. For example, student Isabella got the highest marks in English and the
lowest marks in Maths. As per the marks, Isabella gets the first rank in English and third
place in Maths.

Execute the following query to get this result set. In this query, note the following
things:
• We use the PARTITION BY Studentname clause to perform calculations on each student group
• Each subset should get ranks as per their Marks in descending order
• The result set uses the ORDER BY clause to sort results on Studentname and their rank
SELECT Studentname,
       Subject,
       Marks,
       RANK() OVER(PARTITION BY Studentname ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY Studentname,
         Rank;
Let’s execute the following query of the SQL RANK function and look at the result set. In this
query, we did not specify the SQL PARTITION BY clause to divide the data into smaller subsets.
We use the SQL RANK function with the OVER clause on the Marks column (in descending order) to
get ranks for the respective rows.
SELECT Studentname,
       Subject,
       Marks,
       RANK() OVER(ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY Rank;
In the output, each student gets a rank as per their marks irrespective of the specific
subject. For example, the highest and lowest marks in the complete result set are 90 and 50
respectively. In the result set, the highest mark gets rank 1, and the lowest mark gets rank 9.
If two students get the same marks (in our example, rows 4 and 5), their ranks are also
the same.

DENSE_RANK() SQL RANK function


We use DENSE_RANK() function to specify a unique rank number within the partition as per the
specified column value. It is similar to the Rank function with a small difference.
In the SQL RANK function DENSE_RANK(), if we have duplicate values, SQL assigns different
ranks to those rows as well. Ideally, we should get the same rank for duplicate or similar values.
Let’s execute the following query with the DENSE_RANK() function.
SELECT Studentname,
       Subject,
       Marks,
       DENSE_RANK() OVER(ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY Rank;
In the output, you can see we have the same rank for both Lily and Isabella who scored 70
marks.

Let’s use DENSE_RANK function in combination with the SQL PARTITION BY clause.
SELECT Studentname,
       Subject,
       Marks,
       DENSE_RANK() OVER(PARTITION BY Subject ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY Studentname,
         Rank;
We do not have two students with the same marks; therefore, the result set is similar to that of
the RANK function in this case.
Let’s update a student mark with the following query and rerun the query.
UPDATE ExamResult SET Marks = 70 WHERE Studentname = 'Isabella' AND Subject = 'Maths';
We can see that in the student group, Isabella got similar marks in Maths and Science subjects.
Rank is also the same for both subjects in this case.

Let’s see the difference between the RANK() and DENSE_RANK() SQL Rank functions with the
following queries.
• Query 1
SELECT Studentname,
       Subject,
       Marks,
       RANK() OVER(PARTITION BY StudentName ORDER BY Marks ) Rank
FROM ExamResult
ORDER BY Studentname,
         Rank;
• Query 2
SELECT Studentname,
       Subject,
       Marks,
       DENSE_RANK() OVER(PARTITION BY StudentName ORDER BY Marks ) Rank
FROM ExamResult
ORDER BY Studentname,
         Rank;
In the output, you can see a gap in the RANK function output within a partition. We do not have
any gap in the DENSE_RANK function output.
In the following screenshot, you can see that Isabella has the same marks in two subjects. The
RANK function assigns rank 1 to both rows, skips rank 2, and gives the next row rank 3.
The DENSE_RANK function does not skip any rank: the next row gets rank 2.
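
The gap behaviour is easy to reproduce on a tiny table. The following is a minimal sketch only: the RankDemo table and its four rows are made up for illustration and are not the ExamResult data shown in the screenshots.

```sql
-- Hypothetical demo table (not the article's ExamResult data).
CREATE TABLE RankDemo (
    Studentname VARCHAR(20),
    Marks       INT
);

INSERT INTO RankDemo (Studentname, Marks) VALUES
    ('A', 90),
    ('B', 70),
    ('C', 70),
    ('D', 50);

-- Both functions give the tied rows (B and C) the same rank;
-- only RANK() leaves a gap after the tie.
SELECT Studentname,
       Marks,
       RANK()       OVER (ORDER BY Marks DESC) AS rank_with_gaps,
       DENSE_RANK() OVER (ORDER BY Marks DESC) AS rank_without_gaps
FROM RankDemo
ORDER BY Marks DESC, Studentname;

-- Result:
-- A | 90 | 1 | 1
-- B | 70 | 2 | 2
-- C | 70 | 2 | 2
-- D | 50 | 4 | 3
```

The aliases here avoid the word Rank on purpose, since some database systems treat it as a reserved word.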

NTILE(N) SQL RANK function


We use the NTILE(N) function to distribute the rows into the specified number (N) of groups. Each
group gets its number as per the specified ordering. We need to specify the value for the desired
number of groups.
In my example, we have nine records in the ExamResult table. NTILE(2) specifies that we want the
rows divided into two groups.
SELECT *,
       NTILE(2) OVER(
       ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY rank;
In the output, we can see two groups. Group 1 contains five rows, and Group 2 contains four
rows.

Similarly, NTILE(3) divides the rows into three groups with three records in each group.
SELECT *,
       NTILE(3) OVER(
       ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY rank;
We can use SQL PARTITION BY clause to have more than one partition. In the following query,
each partition on subjects is divided into two groups.
SELECT *,
       NTILE(2) OVER(PARTITION BY subject ORDER BY Marks DESC) Rank
FROM ExamResult
ORDER BY subject, rank;
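
When the row count does not divide evenly, NTILE gives the extra rows to the earlier groups, which is why nine rows split into groups of five and four. A small sketch of the same effect, assuming a made-up NtileDemo table with five rows:

```sql
-- Hypothetical demo table with five rows.
CREATE TABLE NtileDemo (
    Studentname VARCHAR(20),
    Marks       INT
);

INSERT INTO NtileDemo (Studentname, Marks) VALUES
    ('A', 90),
    ('B', 80),
    ('C', 70),
    ('D', 60),
    ('E', 50);

-- Five rows into two groups: the first group gets three rows, the second two.
SELECT Studentname,
       Marks,
       NTILE(2) OVER (ORDER BY Marks DESC) AS bucket
FROM NtileDemo
ORDER BY Marks DESC;

-- Result:
-- A | 90 | 1
-- B | 80 | 1
-- C | 70 | 1
-- D | 60 | 2
-- E | 50 | 2
```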

Practical usage of SQL RANK functions


We can use SQL RANK functions to fetch specific rows from the data. Suppose we want to get
the data of the students from ranks 1 to 3. In the following query, we use a common table
expression (CTE) to number the rows with the ROW_NUMBER() function and then filter the CTE
result to satisfy our condition. Note that the window orders by Marks ascending, so rank 1 is the
lowest mark; use ORDER BY Marks DESC to rank the highest mark first.
WITH StudentRanks AS
(
  SELECT *, ROW_NUMBER() OVER( ORDER BY Marks) AS Ranks
  FROM ExamResult
)

SELECT StudentName , Marks
FROM StudentRanks
WHERE Ranks >= 1 and Ranks <=3
ORDER BY Ranks

We can use the OFFSET FETCH clause, available starting from SQL Server 2012, to fetch a specific
number of records. The following query skips the first row and returns the next three.
WITH StudentRanks AS
(
  SELECT *, ROW_NUMBER() OVER( ORDER BY Marks) AS Ranks
  FROM ExamResult
)

SELECT StudentName , Marks
FROM StudentRanks
ORDER BY Ranks OFFSET 1 ROWS FETCH NEXT 3 ROWS ONLY;
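
The same CTE pattern extends to "top N per group" once PARTITION BY is added to the window. The sketch below is illustrative only; the TopNDemo table, its rows, and the rn alias are assumptions made for this example.

```sql
-- Hypothetical demo table: three students, two subjects.
CREATE TABLE TopNDemo (
    Studentname VARCHAR(20),
    Subject     VARCHAR(20),
    Marks       INT
);

INSERT INTO TopNDemo (Studentname, Subject, Marks) VALUES
    ('A', 'Maths',   90),
    ('B', 'Maths',   80),
    ('C', 'Maths',   70),
    ('A', 'English', 60),
    ('B', 'English', 85),
    ('C', 'English', 75);

-- Number the rows inside each subject, highest mark first,
-- then keep the top two rows of every subject.
WITH SubjectRanks AS
(
  SELECT Studentname,
         Subject,
         Marks,
         ROW_NUMBER() OVER (PARTITION BY Subject ORDER BY Marks DESC) AS rn
  FROM TopNDemo
)
SELECT Subject, Studentname, Marks
FROM SubjectRanks
WHERE rn <= 2
ORDER BY Subject, rn;

-- Result:
-- English | B | 85
-- English | C | 75
-- Maths   | A | 90
-- Maths   | B | 80
```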

A quick summary of SQL RANK Functions


Function      Description
ROW_NUMBER    It assigns the sequential rank number to each unique record.
RANK          It assigns the rank number to each row in a partition. It skips the number for similar values.
DENSE_RANK    It assigns the rank number to each row in a partition. It does not skip the number for similar values.
NTILE(N)      It divides the number of rows as per specified partition and assigns unique value in the partition.
Writing Subqueries in SQL
In this lesson we'll cover:
• Subquery basics
• Using subqueries to aggregate in multiple stages
• Subqueries in conditional logic
• Joining subqueries
• Subqueries and UNIONs
In this lesson, you will continue to work with the same San Francisco Crime data used in
a previous lesson.
Subquery basics
Subqueries (also known as inner queries or nested queries) are a tool for performing operations
in multiple steps. For example, if you wanted to take the sums of several columns, then average
all of those values, you'd need to do each aggregation in a distinct step.
Subqueries can be used in several places within a query, but it's easiest to start with
the FROM statement. Here's an example of a basic subquery:
SELECT sub.*
  FROM (
        SELECT *
          FROM tutorial.sf_crime_incidents_2014_01
         WHERE day_of_week = 'Friday'
       ) sub
 WHERE sub.resolution = 'NONE'

Let's break down what happens when you run the above query:
First, the database runs the "inner query"—the part between the parentheses:
SELECT *
FROM tutorial.sf_crime_incidents_2014_01
WHERE day_of_week = 'Friday'

If you were to run this on its own, it would produce a result set like any other query. It might
sound like a no-brainer, but it's important: your inner query must actually run on its own, as the
database will treat it as an independent query. Once the inner query runs, the outer query will
run using the results from the inner query as its underlying table:
SELECT sub.*
  FROM (
        <<results from inner query go here>>
       ) sub
 WHERE sub.resolution = 'NONE'

Subqueries are required to have names, which are added after parentheses the same way you
would add an alias to a normal table. In this case, we've used the name "sub."
A quick note on formatting: The important thing to remember when using subqueries is to
provide some way for the reader to easily determine which parts of the query will be
executed together. Most people do this by indenting the subquery in some way. The examples
in this tutorial are indented quite far, all the way to the parentheses. This isn't practical if you
nest many subqueries, so it's fairly common to only indent two spaces or so.
Practice Problem
Write a query that selects all Warrant Arrests from
the tutorial.sf_crime_incidents_2014_01 dataset, then wrap it in an outer query that only
displays unresolved incidents.


The above examples, as well as the practice problem don't really require subqueries—they solve
problems that could also be solved by adding multiple conditions to the WHERE clause. These
next sections provide examples for which subqueries are the best or only way to solve their
respective problems.
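
To make that equivalence concrete, here is a minimal sketch on a made-up incidents_demo table (the table and its four rows are assumptions for illustration, not the tutorial dataset). The subquery form and the single WHERE clause return the same rows.

```sql
-- Hypothetical miniature of the incidents table.
CREATE TABLE incidents_demo (
    incidnt_num INT,
    day_of_week VARCHAR(10),
    resolution  VARCHAR(20)
);

INSERT INTO incidents_demo (incidnt_num, day_of_week, resolution) VALUES
    (1, 'Friday', 'NONE'),
    (2, 'Friday', 'ARREST, BOOKED'),
    (3, 'Monday', 'NONE'),
    (4, 'Friday', 'NONE');

-- Subquery form: filter on the day first, then on the resolution.
SELECT sub.incidnt_num
  FROM (
        SELECT *
          FROM incidents_demo
         WHERE day_of_week = 'Friday'
       ) sub
 WHERE sub.resolution = 'NONE'
 ORDER BY sub.incidnt_num;

-- Equivalent form: both conditions in one WHERE clause.
SELECT incidnt_num
  FROM incidents_demo
 WHERE day_of_week = 'Friday'
   AND resolution = 'NONE'
 ORDER BY incidnt_num;

-- Both queries return incidents 1 and 4.
```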
Using subqueries to aggregate in multiple stages
What if you wanted to figure out how many incidents get reported on each day of the week?
Better yet, what if you wanted to know how many incidents happen, on average, on a Friday in
December? In January? There are two steps to this process: counting the number of incidents
each day (inner query), then determining the monthly average (outer query):
SELECT LEFT(sub.date, 2) AS cleaned_month,
       sub.day_of_week,
       AVG(sub.incidents) AS average_incidents
  FROM (
        SELECT day_of_week,
               date,
               COUNT(incidnt_num) AS incidents
          FROM tutorial.sf_crime_incidents_2014_01
         GROUP BY 1,2
       ) sub
 GROUP BY 1,2
 ORDER BY 1,2

If you're having trouble figuring out what's happening, try running the inner query individually
to get a sense of what its results look like. In general, it's easiest to write inner queries first and
revise them until the results make sense to you, then to move on to the outer query.
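
The two stages are easier to follow on a handful of rows. The following sketch assumes a made-up daily_demo table with an incident_date column; it leaves out the month and groups by explicit column names rather than positions.

```sql
-- Hypothetical miniature dataset: two Fridays and one Monday.
CREATE TABLE daily_demo (
    incidnt_num   INT,
    day_of_week   VARCHAR(10),
    incident_date VARCHAR(10)
);

INSERT INTO daily_demo (incidnt_num, day_of_week, incident_date) VALUES
    (1, 'Friday', '2014-01-03'),
    (2, 'Friday', '2014-01-03'),
    (3, 'Friday', '2014-01-10'),
    (4, 'Friday', '2014-01-10'),
    (5, 'Friday', '2014-01-10'),
    (6, 'Friday', '2014-01-10'),
    (7, 'Monday', '2014-01-06');

-- Stage 1 (inner query): count incidents per calendar day.
-- Stage 2 (outer query): average those daily counts per day of the week.
-- Multiplying by 1.0 avoids integer averaging on systems that would truncate.
SELECT sub.day_of_week,
       AVG(sub.incidents * 1.0) AS average_incidents
  FROM (
        SELECT day_of_week,
               incident_date,
               COUNT(incidnt_num) AS incidents
          FROM daily_demo
         GROUP BY day_of_week, incident_date
       ) sub
 GROUP BY sub.day_of_week
 ORDER BY sub.day_of_week;

-- Friday: two days with 2 and 4 incidents, so the average is 3.
-- Monday: one day with 1 incident, so the average is 1.
```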
Practice Problem
Write a query that displays the average number of monthly incidents for each category. Hint:
use tutorial.sf_crime_incidents_cleandate to make your life a little easier.


Subqueries in conditional logic


You can use subqueries in conditional logic (in conjunction with WHERE, JOIN/ON, or CASE). The
following query returns all of the entries from the earliest date in the dataset (theoretically—the
poor formatting of the date column actually makes it return the value that sorts first
alphabetically):
SELECT *
  FROM tutorial.sf_crime_incidents_2014_01
 WHERE Date = (SELECT MIN(date)
                 FROM tutorial.sf_crime_incidents_2014_01
              )

The above query works because the result of the subquery is only one cell. Most conditional
logic will work with subqueries containing one-cell results. When the inner query returns
multiple rows, a plain comparison such as = no longer works; use IN instead (other set operators
such as EXISTS, ANY, and ALL also accept multi-row subqueries):
SELECT *
  FROM tutorial.sf_crime_incidents_2014_01
 WHERE Date IN (SELECT date
                  FROM tutorial.sf_crime_incidents_2014_01
                 ORDER BY date
                 LIMIT 5
               )

Note that you should not include an alias when you write a subquery in a conditional statement.
This is because the subquery is treated as an individual value (or set of values in the IN case)
rather than as a table.
Joining subqueries
You may remember that you can filter queries in joins. It's fairly common to join a subquery that
hits the same table as the outer query rather than filtering in the WHERE clause. The following
query produces the same results as the previous example:
SELECT *
  FROM tutorial.sf_crime_incidents_2014_01 incidents
  JOIN ( SELECT date
           FROM tutorial.sf_crime_incidents_2014_01
          ORDER BY date
          LIMIT 5
       ) sub
    ON incidents.date = sub.date

This can be particularly useful when combined with aggregations. When you join, the
requirements for your subquery output aren't as stringent as when you use the WHERE clause.
For example, your inner query can output multiple results. The following query ranks all of the
results according to how many incidents were reported in a given day. It does this by
aggregating the total number of incidents each day in the inner query, then using those values
to sort the outer query:
SELECT incidents.*,
       sub.incidents AS incidents_that_day
  FROM tutorial.sf_crime_incidents_2014_01 incidents
  JOIN ( SELECT date,
                COUNT(incidnt_num) AS incidents
           FROM tutorial.sf_crime_incidents_2014_01
          GROUP BY 1
       ) sub
    ON incidents.date = sub.date
 ORDER BY sub.incidents DESC, time

Practice Problem
Write a query that displays all rows from the three categories with the fewest incidents
reported.


Subqueries can be very helpful in improving the performance of your queries. Let's revisit
the Crunchbase Data briefly. Imagine you'd like to aggregate all of the companies receiving
investment and companies acquired each month. You could do that without subqueries if you
wanted to, but don't actually run this as it will take minutes to return:
SELECT COALESCE(acquisitions.acquired_month, investments.funded_month) AS month,
COUNT(DISTINCT acquisitions.company_permalink) AS companies_acquired,
COUNT(DISTINCT investments.company_permalink) AS investments
FROM tutorial.crunchbase_acquisitions acquisitions
FULL JOIN tutorial.crunchbase_investments investments
ON acquisitions.acquired_month = investments.funded_month
GROUP BY 1

Note that in order to do this properly, you must join on date fields, which causes a massive "data
explosion." Basically, what happens is that you're joining every row in a given month from one
table onto every row in that same month on the other table, so the number of rows returned is
very large. Because of this multiplicative effect, you must use COUNT(DISTINCT) instead
of COUNT to get accurate counts. You can see this below:
The following query shows 7,414 rows:
SELECT COUNT(*) FROM tutorial.crunchbase_acquisitions

The following query shows 83,893 rows:


SELECT COUNT(*) FROM tutorial.crunchbase_investments

The following query shows 6,237,396 rows:


SELECT COUNT(*)
FROM tutorial.crunchbase_acquisitions acquisitions
FULL JOIN tutorial.crunchbase_investments investments
ON acquisitions.acquired_month = investments.funded_month

If you'd like to understand this a little better, you can do some extra research on cartesian
products. It's also worth noting that the FULL JOIN and COUNT above actually runs pretty fast—
it's the COUNT(DISTINCT) that takes forever. More on that in the lesson on optimizing queries.
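
The multiplicative effect shows up even with a few rows. The sketch below uses two made-up tables (acq_demo and inv_demo are assumptions, not the Crunchbase data) and a plain inner join instead of FULL JOIN so that it runs on more database systems; every row here has a matching month, so the row count is the same for both join types.

```sql
-- Hypothetical miniature tables: 2 acquisitions and 3 investments in the same month.
CREATE TABLE acq_demo (
    company_permalink VARCHAR(20),
    acquired_month    VARCHAR(7)
);

CREATE TABLE inv_demo (
    company_permalink VARCHAR(20),
    funded_month      VARCHAR(7)
);

INSERT INTO acq_demo (company_permalink, acquired_month) VALUES
    ('a1', '2013-01'),
    ('a2', '2013-01');

INSERT INTO inv_demo (company_permalink, funded_month) VALUES
    ('i1', '2013-01'),
    ('i2', '2013-01'),
    ('i3', '2013-01');

-- Joining on the month pairs every acquisition with every investment.
SELECT COUNT(*)                            AS joined_rows,
       COUNT(a.company_permalink)          AS naive_count,
       COUNT(DISTINCT a.company_permalink) AS distinct_count
  FROM acq_demo a
  JOIN inv_demo i
    ON a.acquired_month = i.funded_month;

-- joined_rows    = 6 (2 x 3)
-- naive_count    = 6 (each acquisition is counted three times)
-- distinct_count = 2 (the real number of acquired companies)
```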
Of course, you could solve this much more efficiently by aggregating the two tables separately,
then joining them together so that the counts are performed across far smaller datasets:
SELECT COALESCE(acquisitions.month, investments.month) AS month,
       acquisitions.companies_acquired,
       investments.companies_rec_investment
  FROM (
        SELECT acquired_month AS month,
               COUNT(DISTINCT company_permalink) AS companies_acquired
          FROM tutorial.crunchbase_acquisitions
         GROUP BY 1
       ) acquisitions

  FULL JOIN (
        SELECT funded_month AS month,
               COUNT(DISTINCT company_permalink) AS companies_rec_investment
          FROM tutorial.crunchbase_investments
         GROUP BY 1
       ) investments

    ON acquisitions.month = investments.month
 ORDER BY 1 DESC

Note: We used a FULL JOIN above just in case one table had observations in a month that the
other table didn't. We also used COALESCE to display months when the acquisitions subquery
didn't have month entries (presumably no acquisitions occurred in those months). We strongly
encourage you to re-run the query without some of these elements to better understand how
they work. You can also run each of the subqueries independently to get a better understanding
of them as well.
Practice Problem
Write a query that counts the number of companies founded and acquired by quarter starting in
Q1 2012. Create the aggregations in two separate queries, then join them.


Subqueries and UNIONs


For this next section, we will borrow directly from the lesson on UNIONs—again using the
Crunchbase data:
SELECT *
FROM tutorial.crunchbase_investments_part1

UNION ALL

SELECT *
FROM tutorial.crunchbase_investments_part2

It's certainly not uncommon for a dataset to come split into several parts, especially if the data
passed through Excel at any point (Excel can only handle ~1M rows per spreadsheet). The two
tables used above can be thought of as different parts of the same dataset—what you'd almost
certainly like to do is perform operations on the entire combined dataset rather than on the
individual parts. You can do this by using a subquery:
SELECT COUNT(*) AS total_rows
  FROM (
        SELECT *
          FROM tutorial.crunchbase_investments_part1

         UNION ALL

        SELECT *
          FROM tutorial.crunchbase_investments_part2
       ) sub
SQL Subquery
Summary: in this tutorial, you will learn about the SQL subquery and how to use the subqueries
to form flexible SQL statements.
SQL subquery basic
Consider the following employees and departments tables from the sample database:

Suppose you have to find all employees who are located in the location with the id 1700. You might
come up with the following solution.
First, find all departments located at the location whose id is 1700:
SELECT
*
FROM
departments
WHERE
location_id = 1700;

Second, find all employees that belong to the location 1700 by using the department id list of
the previous query:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
department_id IN (1 , 3, 8, 10, 11)
ORDER BY first_name , last_name;
This solution has two problems. To start with, you have looked at the departments table to
check which departments belong to the location 1700. However, the original question was not
referring to any specific departments; it referred to the location 1700.
Because of the small data volume, you can get a list of departments easily. However, in a real
system with a high volume of data, it might be problematic.
Another problem is that you have to revise the queries whenever you want to find employees
who are located in a different location.
A much better solution to this problem is to use a subquery. By definition, a subquery is a query
nested inside another query such as SELECT, INSERT, UPDATE, or DELETE statement. In this
tutorial, we are focusing on the subquery used with the SELECT statement.
In this example, you can combine the two queries above as follows:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
department_id IN (SELECT
department_id
FROM
departments
WHERE
location_id = 1700)
ORDER BY first_name , last_name;
The query placed within the parentheses is called a subquery. It is also known as an inner query
or inner select. The query that contains the subquery is called an outer query or an outer select.
To execute the query, the database system first executes the subquery and substitutes the
subquery between the parentheses with its result (the ids of the departments located at the
location 1700), and then executes the outer query.
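
A self-contained sketch of the same idea follows. The departments_demo and employees_demo tables and their rows are made up for illustration and are much smaller than the sample database.

```sql
-- Hypothetical miniature versions of the departments and employees tables.
CREATE TABLE departments_demo (
    department_id   INT,
    department_name VARCHAR(30),
    location_id     INT
);

CREATE TABLE employees_demo (
    employee_id   INT,
    first_name    VARCHAR(20),
    department_id INT
);

INSERT INTO departments_demo (department_id, department_name, location_id) VALUES
    (1, 'Administration', 1700),
    (2, 'Marketing',      1800),
    (3, 'Purchasing',     1700);

INSERT INTO employees_demo (employee_id, first_name, department_id) VALUES
    (100, 'Ana',  1),
    (101, 'Ben',  2),
    (102, 'Cara', 3);

-- The subquery produces the department ids at location 1700 (1 and 3);
-- the outer query keeps the employees in those departments.
SELECT employee_id, first_name
FROM employees_demo
WHERE department_id IN (SELECT department_id
                        FROM departments_demo
                        WHERE location_id = 1700)
ORDER BY first_name;

-- Result:
-- 100 | Ana
-- 102 | Cara
```

To look at a different location, only the literal 1700 has to change; no department id list needs to be maintained by hand.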
You can use a subquery in many places such as:
• With the IN or NOT IN operator
• With comparison operators
• With the EXISTS or NOT EXISTS operator
• With the ANY or ALL operator
• In the FROM clause
• In the SELECT clause
SQL subquery examples
Let’s take some examples of using the subqueries to understand how they work.
SQL subquery with the IN or NOT IN operator
In the previous example, you have seen how the subquery was used with the IN operator. The
following example uses a subquery with the NOT IN operator to find all employees who are not
located at the location 1700:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
department_id NOT IN (SELECT
department_id
FROM
departments
WHERE
location_id = 1700)
ORDER BY first_name , last_name;
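
One caveat with NOT IN is worth a sketch: if the subquery returns a NULL, the NOT IN condition can never evaluate to true, so the outer query returns no rows. The tables below are made up for illustration, and department_id is left nullable purely to show the effect; NOT EXISTS is shown as one common alternative.

```sql
-- Hypothetical tables; one department row has a NULL id on purpose.
CREATE TABLE dept_null_demo (
    department_id INT,
    location_id   INT
);

CREATE TABLE emp_null_demo (
    employee_id   INT,
    department_id INT
);

INSERT INTO dept_null_demo (department_id, location_id) VALUES
    (1,    1700),
    (2,    1800),
    (NULL, 1700);

INSERT INTO emp_null_demo (employee_id, department_id) VALUES
    (100, 1),
    (101, 2);

-- The subquery returns (1, NULL). For employee 101 the test
-- "2 NOT IN (1, NULL)" is unknown rather than true, so no rows come back.
SELECT employee_id
FROM emp_null_demo
WHERE department_id NOT IN (SELECT department_id
                            FROM dept_null_demo
                            WHERE location_id = 1700);

-- NOT EXISTS compares row by row and is not affected by the NULL:
-- it returns employee 101.
SELECT e.employee_id
FROM emp_null_demo e
WHERE NOT EXISTS (SELECT 1
                  FROM dept_null_demo d
                  WHERE d.location_id = 1700
                    AND d.department_id = e.department_id);
```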

SQL subquery with the comparison operator


The following syntax illustrates how a subquery is used with a comparison operator:
comparison_operator (subquery)
where the comparison operator is one of these operators:
• Equal (=)
• Greater than (>)
• Less than (<)
• Greater than or equal (>=)
• Less than or equal (<=)
• Not equal (!=) or (<>)
The following example finds the employees who have the highest salary:
SELECT
employee_id, first_name, last_name, salary
FROM
employees
WHERE
salary = (SELECT
MAX(salary)
FROM
employees)
ORDER BY first_name , last_name;

In this example, the subquery returns the highest salary of all employees and the outer query
finds the employees whose salary is equal to the highest one.
The following statement finds all employees whose salaries are greater than the average salary of
all employees:
SELECT
employee_id, first_name, last_name, salary
FROM
employees
WHERE
salary > (SELECT
AVG(salary)
FROM
employees);

In this example, first, the subquery returns the average salary of all employees. Then, the outer
query uses the greater than operator to find all employees whose salaries are greater than the
average.
SQL subquery with the EXISTS or NOT EXISTS operator
The EXISTS operator checks for the existence of rows returned from the subquery. It returns
true if the subquery contains any rows. Otherwise, it returns false.
The syntax of the EXISTS operator is as follows:
EXISTS (subquery )
The NOT EXISTS operator is opposite to the EXISTS operator.
NOT EXISTS (subquery)
The following example finds all departments which have at least one employee with a salary
greater than 10,000:
SELECT
department_name
FROM
departments d
WHERE
EXISTS( SELECT
1
FROM
employees e
WHERE
salary > 10000
AND e.department_id = d.department_id)
ORDER BY department_name;

Similarly, the following statement finds all departments that do not have any employee with the
salary greater than 10,000:
SELECT
department_name
FROM
departments d
WHERE
NOT EXISTS( SELECT
1
FROM
employees e
WHERE
salary > 10000
AND e.department_id = d.department_id)
ORDER BY department_name;

SQL subquery with the ALL operator


The syntax of the subquery when it is used with the ALL operator is as follows:
comparison_operator ALL (subquery)
The following condition evaluates to true if x is greater than every value returned by the
subquery.
x > ALL (subquery)
For example, suppose the subquery returns three values: 1, 2, and 3. The following
condition evaluates to true if x is greater than 3.
x > ALL (1,2,3)
The following query uses the GROUP BY clause and MIN() function to find the lowest salary by
department:
SELECT
MIN(salary)
FROM
employees
GROUP BY department_id
ORDER BY MIN(salary) DESC;
The following example finds all employees whose salaries are greater than or equal to the lowest
salary of every department:
SELECT
employee_id, first_name, last_name, salary
FROM
employees
WHERE
salary >= ALL (SELECT
MIN(salary)
FROM
employees
GROUP BY department_id)
ORDER BY first_name , last_name;

SQL subquery with the ANY operator


The following shows the syntax of a subquery with the ANY operator:
comparison_operator ANY (subquery)
For example, the following condition evaluates to true if x is greater than at least one value
returned by the subquery. So the condition x > ANY (1,2,3) evaluates to true if x is greater than 1.
x > ANY (subquery)
Note that the SOME operator is a synonym for the ANY operator so you can use them
interchangeably.
The following query finds all employees whose salaries are greater than or equal to the highest
salary of at least one department.
SELECT
employee_id, first_name, last_name, salary
FROM
employees
WHERE
salary >= SOME (SELECT
MAX(salary)
FROM
employees
GROUP BY department_id);
In this example, the subquery finds the highest salary of employees in each department. The
outer query looks at these values and determines which employee’s salaries are greater than or
equal to any highest salary by department.
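
Put differently, >= ANY over a set is the same as >= the smallest value in that set. The sketch below checks this on a made-up any_demo table (an assumption for illustration). ANY is not implemented by every database system; SQLite, for example, does not support it.

```sql
-- Hypothetical table: two departments with two employees each.
CREATE TABLE any_demo (
    employee_id   INT,
    department_id INT,
    salary        INT
);

INSERT INTO any_demo (employee_id, department_id, salary) VALUES
    (1, 1, 4000),
    (2, 1, 9000),
    (3, 2, 12000),
    (4, 2, 7000);

-- The highest salaries per department are 9000 and 12000.
-- ">= ANY" is true when the salary reaches at least one of them.
SELECT employee_id
FROM any_demo
WHERE salary >= ANY (SELECT MAX(salary)
                     FROM any_demo
                     GROUP BY department_id)
ORDER BY employee_id;

-- Equivalent: compare with the smallest of the per-department maxima (9000).
SELECT employee_id
FROM any_demo
WHERE salary >= (SELECT MIN(max_salary)
                 FROM (SELECT MAX(salary) AS max_salary
                       FROM any_demo
                       GROUP BY department_id) m)
ORDER BY employee_id;

-- Both queries return employees 2 and 3.
```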
SQL subquery in the FROM clause
You can use a subquery in the FROM clause of the SELECT statement as follows:
SELECT
*
FROM
(subquery) AS table_name
In this syntax, the table alias is mandatory because all tables in the FROM clause must have a
name.
Note that the subquery specified in the FROM clause is called a derived table in MySQL or inline
view in Oracle.
The following statement returns the average salary of every department:
SELECT
AVG(salary) average_salary
FROM
employees
GROUP BY department_id;

You can use this query as a subquery in the FROM clause to calculate the average of average
salary of departments as follows:
SELECT
ROUND(AVG(average_salary), 0)
FROM
(SELECT
AVG(salary) average_salary
FROM
employees
GROUP BY department_id) department_salary;
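
Note that the average of the department averages is generally not the same number as the overall average salary, because departments of different sizes get equal weight. A small sketch, assuming a made-up avg_demo table:

```sql
-- Hypothetical table: department 1 has two employees, department 2 has one.
CREATE TABLE avg_demo (
    department_id INT,
    salary        INT
);

INSERT INTO avg_demo (department_id, salary) VALUES
    (1, 1000),
    (1, 3000),
    (2, 8000);

-- Average of the department averages: (2000 + 8000) / 2 = 5000.
SELECT AVG(average_salary) AS avg_of_department_averages
FROM (SELECT AVG(salary) AS average_salary
      FROM avg_demo
      GROUP BY department_id) department_salary;

-- Overall average across all employees: 12000 / 3 = 4000.
SELECT AVG(salary) AS overall_average
FROM avg_demo;
```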

SQL Subquery in the SELECT clause


A subquery can be used anywhere an expression can be used in the SELECT clause. The
following example finds the salaries of all employees, their average salary, and the difference
between the salary of each employee and the average salary.
SELECT
employee_id,
first_name,
last_name,
salary,
(SELECT
ROUND(AVG(salary), 0)
FROM
employees) average_salary,
salary - (SELECT
ROUND(AVG(salary), 0)
FROM
employees) difference
FROM
employees
ORDER BY first_name , last_name;
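
On systems that support window functions, the same columns can also be produced without repeating the subquery. This is an alternative technique, not the approach used in the example above; the salary_demo table is made up for illustration.

```sql
-- Hypothetical table with three employees.
CREATE TABLE salary_demo (
    employee_id INT,
    salary      INT
);

INSERT INTO salary_demo (employee_id, salary) VALUES
    (1, 4000),
    (2, 6000),
    (3, 8000);

-- AVG(...) OVER () computes the average across all rows
-- and repeats it on every row, like the scalar subquery does.
SELECT employee_id,
       salary,
       AVG(salary) OVER ()          AS average_salary,
       salary - AVG(salary) OVER () AS difference
FROM salary_demo
ORDER BY employee_id;

-- The average is 6000, so the differences are -2000, 0 and 2000.
```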

Now you should understand what an SQL subquery is and how to use subqueries to form
flexible SQL statements.
SQL Correlated Subquery
Summary: in this tutorial, you will learn about the SQL correlated subquery which is
a subquery that uses values from the outer query.
Introduction to SQL correlated subquery
Let’s start with an example.
See the following employees table in the sample database:
The following query finds employees whose salary is greater than the average salary of all
employees:
SELECT
employee_id,
first_name,
last_name,
salary
FROM
employees
WHERE
salary > (SELECT
AVG(salary)
FROM
employees);

In this example, the subquery is used in the WHERE clause. There are some points that you can
see from this query:
First, you can execute the subquery that returns the average salary of all employees
independently.
SELECT
AVG(salary)
FROM
employees;
Second, the database system needs to evaluate the subquery only once.
Third, the outer query makes use of the result returned from the subquery. The outer query
depends on the subquery for its value. However, the subquery does not depend on the outer
query. Sometimes, we call this kind of subquery a plain subquery.
Unlike a plain subquery, a correlated subquery is a subquery that uses the values from the outer
query. Also, a correlated subquery may be evaluated once for each row selected by the outer
query. Because of this, a query that uses a correlated subquery may be slow.
A correlated subquery is also known as a repeating subquery or a synchronized subquery.
SQL correlated subquery examples
Let’s see a few more examples of correlated subqueries to understand them better.
SQL correlated subquery in the WHERE clause example
The following query finds all employees whose salary is higher than the average salary of the
employees in their departments:
SELECT
employee_id,
first_name,
last_name,
salary,
department_id
FROM
employees e
WHERE
salary > (SELECT
AVG(salary)
FROM
employees
WHERE
department_id = e.department_id)
ORDER BY
department_id ,
first_name ,
last_name;
Here is the output:

In this example, the outer query is:


SELECT
employee_id,
first_name,
last_name,
salary,
department_id
FROM
employees e
WHERE
salary >
...
and the correlated subquery is:
SELECT
AVG(salary)
FROM
employees
WHERE
department_id = e.department_id
For each employee, the database system has to execute the correlated subquery once to
calculate the average salary of the employees in the department of the current employee.
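
Because the correlated subquery may be evaluated once per row, a common alternative is to compute the department averages once in a derived table and join to it. The sketch below shows both forms side by side on a made-up corr_demo table; whether the join is actually faster depends on the database system and its optimizer.

```sql
-- Hypothetical table: department 1 averages 5000, department 2 averages 7000.
CREATE TABLE corr_demo (
    employee_id   INT,
    department_id INT,
    salary        INT
);

INSERT INTO corr_demo (employee_id, department_id, salary) VALUES
    (1, 1, 4000),
    (2, 1, 6000),
    (3, 2, 9000),
    (4, 2, 9000),
    (5, 2, 3000);

-- Correlated form: the inner query refers to e.department_id.
SELECT e.employee_id
FROM corr_demo e
WHERE e.salary > (SELECT AVG(salary)
                  FROM corr_demo
                  WHERE department_id = e.department_id)
ORDER BY e.employee_id;

-- Join form: the averages are computed once per department.
SELECT e.employee_id
FROM corr_demo e
JOIN (SELECT department_id,
             AVG(salary) AS avg_salary
      FROM corr_demo
      GROUP BY department_id) d
  ON d.department_id = e.department_id
WHERE e.salary > d.avg_salary
ORDER BY e.employee_id;

-- Both queries return employees 2, 3 and 4.
```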
SQL correlated subquery in the SELECT clause example
The following query returns the employees and the average salary of all employees in their
departments:
SELECT
employee_id,
first_name,
last_name,
department_name,
salary,
(SELECT
ROUND(AVG(salary),0)
FROM
employees
WHERE
department_id = e.department_id) avg_salary_in_department
FROM
employees e
INNER JOIN
departments d ON d.department_id = e.department_id
ORDER BY
department_name,
first_name,
last_name;
The output is:
For each employee, the database system has to execute the correlated subquery once to
calculate the average salary by the employee’s department.
SQL correlated subquery with EXISTS operator example
We often use a correlated subquery with the EXISTS operator. For example, the following query
returns all employees who have no dependents:
SELECT
employee_id,
first_name,
last_name
FROM
employees e
WHERE
NOT EXISTS( SELECT
*
FROM
dependents d
WHERE
d.employee_id = e.employee_id)
ORDER BY first_name ,
last_name;
The following picture shows the output:

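The NOT EXISTS pattern can also be written as a LEFT JOIN that keeps only the unmatched rows. The sketch below compares the two on made-up tables (emp_dep_demo and dependents_demo are assumptions for illustration).

```sql
-- Hypothetical tables: Ana has two dependents, Cara has one, Ben has none.
CREATE TABLE emp_dep_demo (
    employee_id INT,
    first_name  VARCHAR(20)
);

CREATE TABLE dependents_demo (
    dependent_id INT,
    employee_id  INT
);

INSERT INTO emp_dep_demo (employee_id, first_name) VALUES
    (1, 'Ana'),
    (2, 'Ben'),
    (3, 'Cara');

INSERT INTO dependents_demo (dependent_id, employee_id) VALUES
    (10, 1),
    (11, 1),
    (12, 3);

-- Correlated NOT EXISTS: keep employees with no matching dependent row.
SELECT e.employee_id, e.first_name
FROM emp_dep_demo e
WHERE NOT EXISTS (SELECT 1
                  FROM dependents_demo d
                  WHERE d.employee_id = e.employee_id);

-- LEFT JOIN form: unmatched employees have NULL in the dependents columns.
SELECT e.employee_id, e.first_name
FROM emp_dep_demo e
LEFT JOIN dependents_demo d
       ON d.employee_id = e.employee_id
WHERE d.employee_id IS NULL;

-- Both queries return employee 2 (Ben).
```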
In this tutorial, you have learned about the SQL correlated subquery and how to apply it to form
a complex query.
SQL ALL
Summary: in this tutorial, you will learn about the SQL ALL operator and how to use it to
compare a value with a set of values.
Introduction to the SQL ALL operator
The SQL ALL operator is a logical operator that compares a single value with a single-column set
of values returned by a subquery.
The following illustrates the syntax of the SQL ALL operator:
WHERE column_name comparison_operator ALL (subquery)
The SQL ALL operator must be preceded by a comparison operator such as >, >=, <, <=, <>, = and
followed by a subquery. Some database systems such as Oracle allow a list of literal values
instead of a subquery.
Note that if the subquery returns no row, the condition in the WHERE clause is always true.
Assuming that the subquery returns one or more rows, the following table illustrates the
meaning of the SQL ALL operator:
Condition       Meaning
c > ALL(…)      The values in column c must be greater than the biggest value in the set to evaluate to true.
c >= ALL(…)     The values in column c must be greater than or equal to the biggest value in the set to evaluate to true.
c < ALL(…)      The values in column c must be less than the lowest value in the set to evaluate to true.
c <= ALL(…)     The values in column c must be less than or equal to the lowest value in the set to evaluate to true.
c <> ALL(…)     The values in column c must not be equal to any value in the set to evaluate to true.
c = ALL(…)      The values in column c must be equal to every value in the set to evaluate to true.
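
The first row of the table and the empty-subquery rule can both be checked with a short sketch. The all_demo table is made up for illustration, and department 99 is assumed not to exist in it. ALL is not implemented by every database system; SQLite, for example, does not support it.

```sql
-- Hypothetical table: department 2 has salaries 6000 and 13000.
CREATE TABLE all_demo (
    employee_id   INT,
    department_id INT,
    salary        INT
);

INSERT INTO all_demo (employee_id, department_id, salary) VALUES
    (1, 2, 6000),
    (2, 2, 13000),
    (3, 9, 14000),
    (4, 9, 5000);

-- "> ALL" means greater than the biggest value: only employee 3 (14000) qualifies.
SELECT employee_id
FROM all_demo
WHERE salary > ALL (SELECT salary
                    FROM all_demo
                    WHERE department_id = 2);

-- Same result when the subquery is non-empty: compare with MAX directly.
SELECT employee_id
FROM all_demo
WHERE salary > (SELECT MAX(salary)
                FROM all_demo
                WHERE department_id = 2);

-- Empty subquery (no department 99): "> ALL" is true for every row,
-- so all four employees are returned.
SELECT employee_id
FROM all_demo
WHERE salary > ALL (SELECT salary
                    FROM all_demo
                    WHERE department_id = 99);

-- The MAX form differs here: MAX over no rows is NULL,
-- the comparison is unknown, and no rows are returned.
SELECT employee_id
FROM all_demo
WHERE salary > (SELECT MAX(salary)
                FROM all_demo
                WHERE department_id = 99);
```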
SQL ALL examples
We will use the employees table from the sample database for the demonstration:

SQL ALL with the greater than operator


The following query finds rows whose values in the column_name are greater than the biggest
values returned by the subquery:
SELECT
*
FROM
table_name
WHERE
column_name > ALL (subquery);
For example, the following statement finds all employees whose salaries are greater than the
highest salary of employees in the Marketing department whose id is 2:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary > ALL (SELECT
salary
FROM
employees
WHERE
department_id = 2)
ORDER BY salary;

Let’s verify it by querying the highest salary of employees in department 2:


SELECT
MAX(salary)
FROM
employees
WHERE
department_id = 2;

This query returned 13,000, which is lower than every salary returned by the query that used
the ALL operator above.
SQL ALL with the greater than or equal to operator
The following shows the syntax of the SQL ALL operator with the greater than or equal to
operator:
SELECT
*
FROM
table_name
WHERE
column_name >= ALL (subquery);
The query returns all rows whose values in the column_name are greater than or equal to all the
values returned by the subquery.
For example, the following query finds all employees whose salaries are greater than or equal to
the highest salary of employees in the Marketing department:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary >= ALL (SELECT
salary
FROM
employees
WHERE
department_id = 2)
ORDER BY salary;
As the screenshot shows, Michael, whose salary of 13,000 equals the highest salary of employees
in the Marketing department, is included in the result set.
SQL ALL with the less than operator
The following illustrates the ALL operator used with the less than operator:
SELECT
*
FROM
table_name
WHERE
column_name < ALL (subquery);
This query returns all rows whose values in the column_name are smaller than the smallest
values returned by the subquery.
The following statement finds the lowest salary of employees in the Marketing department:
SELECT
MIN(salary)
FROM
employees
WHERE
department_id = 2;

To find all employees whose salaries are less than the lowest salary of employees in
the Marketing department, you use the ALL operator with the less than operator as follows:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary < ALL (SELECT
salary
FROM
employees
WHERE
department_id = 2)
ORDER BY salary DESC;
SQL ALL with the less than or equal to operator
The following shows the syntax of the ALL operator used with the less than or equal to
operator:
SELECT
*
FROM
table_name
WHERE
column_name <= ALL (subquery);
For example, the following statement finds all employees whose salaries are less than or equal
to the lowest salary of employees in the Marketing department:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary <= ALL (SELECT
salary
FROM
employees
WHERE
department_id = 2)
ORDER BY salary DESC;
SQL ALL with the not equal to operator
The following query returns all rows whose values in the column_name are not equal to any
values returned by the subquery:
SELECT
*
FROM
table_name
WHERE
column_name <> ALL (subquery);
For example, to find employees whose salaries are not equal to the average salary of any
department, you use the query below:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary <> ALL (SELECT
AVG(salary)
FROM
employees
GROUP BY department_id)
ORDER BY salary DESC;

Notice that the subquery finds the average salary of employees in each department by using
the AVG() function and the GROUP BY clause.
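A condition of the form <> ALL (subquery) behaves like NOT IN (subquery) when the subquery returns no NULLs. The following minimal sketch, in Python with the standard-library sqlite3 module, uses NOT IN because SQLite does not provide the ALL operator. The table contents are hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, salary REAL, department_id INTEGER)")
# Hypothetical rows; Eve is alone in department 3, so her salary equals its average.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", 4000, 1), ("Bob", 4300, 1), ("Cy", 6000, 2), ("Dee", 13000, 2), ("Eve", 17000, 3)],
)

# NOT IN plays the role of <> ALL, which SQLite does not provide.
not_an_average = [
    name for (name,) in conn.execute(
        """
        SELECT first_name
        FROM employees
        WHERE salary NOT IN (SELECT AVG(salary)
                             FROM employees
                             GROUP BY department_id)
        ORDER BY salary DESC
        """
    )
]

print(not_an_average)
```

Eve is left out because her salary matches one of the department averages; the other employees match none of them.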
SQL ALL with the equal to operator
When you use the ALL operator with the equal to operator, the query finds all rows whose
values in the column_name are equal to every value returned by the subquery:
SELECT
*
FROM
table_name
WHERE
column_name = ALL (subquery);
The following example finds all employees whose salaries are equal to the highest salary of
employees in the Marketing department:
SELECT
first_name, last_name, salary
FROM
employees
WHERE
salary = ALL (SELECT
MAX(salary)
FROM
employees
WHERE
department_id = 2);

In this tutorial, you have learned how to use the SQL ALL operator to test whether a value
matches a set of values returned by a subquery.
SQL ANY
Summary: in this tutorial, you will learn about the SQL ANY operator and how to use it to
compare a value with a set of values.
Introduction to the SQL ANY operator
The ANY operator is a logical operator that compares a value with a set of values returned by a
subquery. The ANY operator must be preceded by a comparison operator >, >=, <, <=, =, <> and
followed by a subquery.
The following illustrates the syntax of the ANY operator:
WHERE column_name comparison_operator ANY (subquery)
If the subquery returns no rows, the condition evaluates to false. Assuming the subquery
returns at least one row, the following table illustrates the meaning of the ANY operator
when it is used with each comparison operator:
Condition       Meaning
x = ANY (…)     x must match one or more values in the set to evaluate to true.
x != ANY (…)    x must differ from one or more values in the set to evaluate to true.
x > ANY (…)     x must be greater than the smallest value in the set to evaluate to true.
x < ANY (…)     x must be smaller than the biggest value in the set to evaluate to true.
x >= ANY (…)    x must be greater than or equal to the smallest value in the set to evaluate to true.
x <= ANY (…)    x must be smaller than or equal to the biggest value in the set to evaluate to true.
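Each row of the table can be read as "the comparison holds for at least one value in the set". The following is a minimal Python sketch of that reading using the built-in any() function. The helper name any_compare and the sample values are hypothetical, and NULL handling is deliberately ignored.

```python
import operator

# Map each SQL comparison operator to its Python counterpart.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}

def any_compare(x, op, values):
    """Return True if `x op v` holds for at least one v in values (no NULLs)."""
    return any(OPS[op](x, v) for v in values)

values = [10, 20, 30]  # hypothetical subquery result

print(any_compare(15, ">", values))   # 15 > 10, so greater than the smallest value
print(any_compare(35, "<", values))   # 35 is not below the biggest value, 30
print(any_compare(20, "=", values))   # 20 matches one value in the set
print(any_compare(20, "!=", values))  # 20 differs from 10 and 30
print(any_compare(5, ">", []))        # empty set: no value can satisfy the condition
```

The last call mirrors the rule that the condition is false when the subquery returns no rows.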
SQL ANY examples
For the demonstration, we will use the employees table from the sample database:

SQL ANY with equal to operator example


The following statement uses the AVG() function and GROUP BY clause to find the average
salary of each department:
SELECT
ROUND(AVG(salary), 2)
FROM
employees
GROUP BY
department_id
ORDER BY
AVG(salary) DESC;

To find all employees whose salaries are equal to the average salary of any department, you
use the following query:
SELECT
first_name,
last_name,
salary
FROM
employees
WHERE
salary = ANY (
SELECT
AVG(salary)
FROM
employees
GROUP BY
department_id)
ORDER BY
first_name,
last_name,
salary;

Using SQL ANY with the not equal to operator example


Similarly, the following query finds all employees whose salaries differ from the average
salary of at least one department:
SELECT
first_name,
last_name,
salary
FROM
employees
WHERE
salary <> ANY (SELECT
AVG(salary)
FROM
employees
GROUP BY department_id)
ORDER BY
first_name,
last_name,
salary;

Using SQL ANY with the greater than operator example


The following query finds all employees whose salaries are greater than the average salary of
at least one department:
SELECT
first_name,
last_name,
salary
FROM
employees
WHERE
salary > ANY (SELECT
AVG(salary)
FROM
employees
GROUP BY department_id)
ORDER BY
salary;

Note that the lowest average salary is 4,150. The query above returns all employees whose
salaries are greater than the lowest average salary.
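To see why > ANY reduces to a comparison with the lowest average, the following minimal sketch in Python (standard-library sqlite3) compares each salary with the minimum of the per-department averages. SQLite does not support the ANY operator, so the MIN() form stands in for it; the two agree when the subquery returns at least one row and no NULLs. The rows are hypothetical, chosen so that the lowest department average is also 4,150.

```python
import sqlite3

# Hypothetical sample rows (first_name, salary, department_id).
rows = [
    ("Ann", 4000, 1),
    ("Bob", 4300, 1),
    ("Cy", 6000, 2),
    ("Dee", 13000, 2),
    ("Eve", 17000, 3),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first_name TEXT, salary INTEGER, department_id INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)

# Average salary per department: 4150 (dept 1), 9500 (dept 2), 17000 (dept 3).
averages = [
    avg for (avg,) in conn.execute(
        "SELECT AVG(salary) FROM employees GROUP BY department_id"
    )
]

# SQLite lacks ANY, so compare against the smallest average instead.
above_any = conn.execute(
    """
    SELECT first_name, salary
    FROM employees
    WHERE salary > (
        SELECT MIN(avg_salary)
        FROM (SELECT AVG(salary) AS avg_salary
              FROM employees
              GROUP BY department_id)
    )
    ORDER BY salary
    """
).fetchall()

print(min(averages))
print(above_any)
```

Only Ann, whose salary of 4,000 is below every department average, is excluded from the result.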
Using SQL ANY with the greater than or equal to operator example
The following statement returns all employees whose salaries are greater than or equal to the
average salary of at least one department:
SELECT
first_name,
last_name,
salary
FROM
employees
WHERE
salary >= ANY (SELECT
AVG(salary)
FROM
employees
GROUP BY department_id)
ORDER BY first_name , last_name , salary;
Using SQL ANY with the less than operator example
The following query finds all employees whose salaries are less than the average salary of at
least one department:
SELECT
first_name,
last_name,
salary
FROM
employees
WHERE
salary < ANY (SELECT
AVG(salary)
FROM
employees
GROUP BY department_id)
ORDER BY salary DESC;

This query returns the employees whose salaries are smaller than the highest average salary
of all departments.
Using SQL ANY with the less than or equal to operator example
To find employees whose salaries are less than or equal to the average salary of at least one
department, you use the following query:
SELECT
first_name,
last_name,
salary
FROM
employees
WHERE
salary <= ANY (SELECT
AVG(salary)
FROM
employees
GROUP BY department_id)
ORDER BY salary DESC;

The result set includes the employees whose salaries are lower than or equal to the highest
average salary of all departments.
Now you should know how to use the SQL ANY operator to form a condition by comparing a
value with a set of values.
Introduction to the SQL EXISTS operator
The EXISTS operator allows you to specify a subquery to test for the existence of rows. The
following illustrates the syntax of the EXISTS operator:
EXISTS (subquery)
The EXISTS operator returns true if the subquery returns at least one row. Otherwise, it
returns false.
Because the database can stop evaluating the subquery as soon as it finds a matching row,
the EXISTS operator can help improve query performance.
SQL EXISTS operator example
We will use the employees and dependents tables in the sample database for the
demonstration.

The following statement finds all employees who have at least one dependent:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
EXISTS( SELECT
1
FROM
dependents
WHERE
dependents.employee_id = employees.employee_id);

The subquery is correlated. For each row in the employees table, the subquery checks whether
there is a corresponding row in the dependents table. If there is, the subquery returns a row,
which makes the outer query include the current row of the employees table in the result set.
If there is no corresponding row, the subquery returns no rows, so the outer query excludes
the current row from the result set.
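The row-by-row check described above can be tried on a tiny data set. The following is a minimal sketch in Python with the standard-library sqlite3 module (SQLite supports EXISTS). The table layouts are reduced to the columns the query needs, and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, first_name TEXT);
    CREATE TABLE dependents (dependent_id INTEGER PRIMARY KEY, employee_id INTEGER);

    -- Hypothetical rows: employee 1 has two dependents, 2 has none, 3 has one.
    INSERT INTO employees VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO dependents VALUES (10, 1), (11, 1), (12, 3);
    """
)

# For each employees row, the correlated subquery looks for a matching dependent.
with_dependents = conn.execute(
    """
    SELECT employee_id, first_name
    FROM employees
    WHERE EXISTS (SELECT 1
                  FROM dependents
                  WHERE dependents.employee_id = employees.employee_id)
    ORDER BY employee_id
    """
).fetchall()

print(with_dependents)
```

Ann appears only once even though she has two dependents, because EXISTS only tests whether a matching row exists and does not multiply the outer rows the way a join would.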
SQL NOT EXISTS
To negate the EXISTS operator, you use the NOT operator as follows:
NOT EXISTS (subquery)
For example, the following query finds employees who do not have any dependents:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
NOT EXISTS( SELECT
1
FROM
dependents
WHERE
dependents.employee_id = employees.employee_id);

SQL EXISTS and NULL
If the subquery returns NULL, the EXISTS operator still evaluates to true. This is because
the EXISTS operator only checks for the existence of a row returned by the subquery. It does
not matter whether the row contains NULL or not.
In the following example, the subquery returns NULL but the EXISTS operator still evaluates to
true:
SELECT
employee_id, first_name, last_name
FROM
employees
WHERE
EXISTS( SELECT NULL)
ORDER BY first_name , last_name;
The query returns all rows in the employees table.
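The difference between a row that contains NULL and no row at all can be checked directly. The following is a minimal sketch in Python with the standard-library sqlite3 module; it needs no tables, and SQLite reports the boolean result of EXISTS as 1 or 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The subquery yields one row whose only column is NULL; EXISTS sees a row.
exists_null = conn.execute("SELECT EXISTS (SELECT NULL)").fetchone()[0]

# A subquery that yields no rows at all makes EXISTS false.
exists_empty = conn.execute("SELECT EXISTS (SELECT NULL WHERE 1 = 0)").fetchone()[0]

print(exists_null, exists_empty)
```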
In this tutorial, you have learned how to use the SQL EXISTS operator to test for the existence of
rows returned by a subquery.
