Professional Documents
Culture Documents
All rights reserved. This book or any portion thereof may not be reproduced
or used in any manner whatsoever without the express written permission
of the publisher except for the use of brief quotations in a book review.
Table of Content
Overview .......................................................................................................................... 1
Why Prepare for SQL and Database for Interviews? ....................................................... 2
How to prepare SQL for Interviews? .................................................................................. 3
Which SQL and Database topics to Prepare for Interviews? .......................................... 5
A s a Java developer who has cleared SQL interviews, I can attest that
SQL interviews are not easy, given the vast nature of SQL and the
variety of concepts and topics it covers. When I was preparing for Java
interviews, I was looking for a resource where I learn and prepare SQL
questions from a programmer’s perspective rather than a DBA, but I
couldn’t find any. Grokking the SQL Interview is my effort to fill that gap
and become an excellent resource for anyone looking to master SQL and
prepare for SQL interviews.
Whenever I prepare for Java Interview, I always prepare about SQL and
Linux; these are everywhere. That’s why I decided to cover essential topics
from an interview perspective when I thought to write about this book.
The book covers a wide range of topics, starting with the basics of SQL and
database design, and progressing to more advanced topics like SQL queries,
joins, indexes, group-by and aggregate functions, stored procedures,
triggers and views, database normalization, transactions, window functions,
and common interview questions.
One of the best things about this book is that it includes many real-world
SQL questions commonly asked in job interviews. This means that by
studying the questions and answers in this book, you will be well-prepared
for any SQL interview.
—1—
GROKKING THE SQL INTERVIEW
Why? I suggest preparing because when you work on a vast technology like
Database and SQL, you work on a particular area. It’s almost impossible to
learn all the areas of a specific framework.
For example, if you work in a Java application that uses MySQL on the
backend, you are most likely exposed to writing queries and stored
procedures. Still, you may or may not be involved in the database design
process.
And, believe it or not, interviews are almost always very different from real
jobs. Interviewers expect you to know everything about the technology, even
if certain areas are never used in any project, like normalization and
database design in this case.
If your aim is to clear the interviews because you need that job then you
don’t have any choice; you cannot argue what is right or what is wrong, the
best approach is to prepare hard and prepare well and that’s where this book
can help you.
—2—
OVERVIEW
While there is no guarantee that the Interviewer will ask any question from
this book, by going through the book, you already know most of the SQL
and Database concepts which you are expected to know.
Here are some tips to help you prepare for SQL interviews:
—3—
GROKKING THE SQL INTERVIEW
Remember that SQL interviews are not just about memorizing syntax and
queries but also about being able to think critically and solve complex
problems. With practice and preparation, you can develop the skills needed
to ace your SQL interview. If you are short of time, then there is no choice
but to go through the question as soon as possible so that you can revise the
essential concepts and at least have some idea about it.
Reviewing this book will not only help you gain both confidence and
knowledge to answer the questions, but more importantly, it will allow you
to drive the Java interview in your favor. This is the single most important
tip I can give you as a Java developer.
Always remember your answers drive interviews, and these questions will
empower you to drive the Interviewer to your stronger areas.
—4—
OVERVIEW
All the best for the SQL interview, and if you have any questions or feedback,
you can always contact me on Twitter at javinpaul or comment on my blogs
Javarevisited and Java67.
These are also the topics which you need to prepare Data Scientist
interviews, which means you can kill two birds in one stone. I mean, by
preparing for Software development interview you can also get yourself
ready for potential Data Scientist jobs.
—5—
CHAPTER 1:
D atabase and SQL is a very important skill, not just for DBA or Database
admins but also for any application developer like Java, .NET, or Web
developers. This is why you would often see questions from SQL and
Database in Programming interviews.
For DBAs, SQL is more important than a programmer because it being their
primary skill, they are also expected to know more than a common Java or
.NET developer. Since no Java interview is just about Java questions, I often
receive requests from my reader about SQL questions like how to solve a
particular query or some tricky questions based upon database indexes.
We will kick start our journey with common SQL questions. So, these are
50 standard SQL and Database interview questions often asked in the
telephonic round of DBA and Programmer interviews. Phone interviews
differ slightly from face-to-face interviews and tend to be more specific.
—6—
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
At the same time, It's also hard for any SQL developer to go through five-
page articles to revise some 30 most frequently asked SQL questions, keep
answers to the point helps them a lot.
So, let's start our journey with these beautiful databases and SQL interview
questions, I am sure you will also learn new things.
Question 1
Difference between UNION and UNION ALL in SQL
Short answer - UNION doesn't include duplicate records, UNION ALL
does. Both can be used to combine results from multiple queries.
Long answer - In SQL, both UNION and UNION ALL are used to combine
the results of two or more SELECT queries into a single result set. However,
there is a significant difference between the two:
1. UNION:
• UNION combines the results of multiple SELECT queries and
removes duplicate rows from the final result set.
• It performs an implicit DISTINCT operation on the result set,
ensuring that only distinct rows are included.
• It is useful when you want to combine and eliminate duplicate data
from multiple tables or queries.
—7—
GROKKING THE SQL INTERVIEW
2. UNION ALL:
• UNION ALL also combines the results of multiple SELECT queries,
but it includes all rows, even if there are duplicates.
• It does not perform any duplicate elimination, resulting in faster
performance compared to UNION, especially when dealing with
large datasets.
• It is useful when you want to combine and preserve all rows,
including duplicates, from multiple tables or queries.
Example:
Consider two tables, employees and contractors, with similar structures:
1. employees table:
1 John 50000
2 Jane 55000
3 Mary 60000
4 Peter 52000
2. contractors table:
—8—
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
1. Using UNION:
Result:
1 John 50000
2 Jane 55000
3 Mary 60000
4 Peter 52000
As you can see, UNION combined the data from both tables and removed
duplicates (e.g., contractor with salary 48000 is only shown once).
—9—
GROKKING THE SQL INTERVIEW
Result:
1 John 50000
2 Jane 55000
3 Mary 60000
4 Peter 52000
In this case, UNION ALL also combined the data from both tables but
retained duplicates in the result set. As a result, contractor with salary 48000
is shown twice, as there are two contractors with that salary.
Question 2
Difference between WHERE and HAVING clause in SQL?
Short answer - In case of WHERE filtering applies before aggregation while
in case of HAVING, filtering applies after aggregation
Long answer - The WHERE and HAVING clauses are both used in SQL to
filter data, but they serve different purposes:
1. WHERE Clause:
• The WHERE clause is used to filter rows before they are grouped or
aggregated in the query.
— 10 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
2. HAVING Clause:
• The HAVING clause is used to filter the result of a grouped query,
specifically after the GROUP BY clause has been applied.
• It operates on groups of rows and filters groups based on specified
conditions.
• It is used in combination with aggregate functions (e.g., COUNT,
SUM, AVG, etc.) and allows you to filter the grouped data based on
aggregate results.
Example:
Consider a table named employees with the following data:
1 John HR 50000
2 Jane IT 55000
3 Mary HR 60000
4 Peter IT 52000
— 11 —
GROKKING THE SQL INTERVIEW
Result:
3 Mary HR 60000
In this example, the WHERE clause filters individual rows of the employees
table based on the conditions specified (department = 'HR' and salary >
55000).
Result:
department avg_salary
HR 55000
IT 53500
In this example, the HAVING clause operates on the result of the grouped
query (after applying GROUP BY department) and filters the groups based
on the condition specified (AVG(salary) > 51000).
— 12 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 3
Describe the difference between clustered and non-clustered indexes
in a database?
Short answer - Clustered index defined the order in which data is physically
stored in the table. Since data can be sorted in only one way physically, there
is only one clustered index per table and that's usually the primary key. You
can have multiple non-clustered indexes to speed up your queries.
Long answer - In SQL, both clustered and non-clustered indexes are used
to improve the performance of database queries by providing quick access
to data. However, there are significant differences between the two:
1. Clustered Index:
• A clustered index determines the physical order of data rows in a
table. Each table can have only one clustered index.
• When a table has a clustered index, the rows are stored in the order
of the clustered index key. This means that the data is physically
organized on disk based on the values in the indexed column.
• Due to the physical ordering, the retrieval of rows using a clustered
index is faster when querying on the indexed column.
• Creating or rebuilding a clustered index can be more time-
consuming, as it affects the physical order of data in the table.
Example:
1 John HR 50000
2 Jane IT 55000
— 13 —
GROKKING THE SQL INTERVIEW
3 Mary HR 60000
4 Peter IT 52000
The data in the employees table will be physically sorted based on the values
in the emp_id column. Any query that uses the emp_id column as a search
criteria will benefit from faster data retrieval.
2. Non-Clustered Index:
• A non-clustered index is a separate data structure that contains a
copy of the indexed columns along with a pointer to the actual data
rows in the table.
• A table can have multiple non-clustered indexes, allowing for
different indexing strategies to optimize various queries.
• Non-clustered indexes do not affect the physical order of data in the
table; instead, they provide a quick lookup path to the actual data
rows.
• Non-clustered indexes are generally faster to create or rebuild
compared to clustered indexes.
Example:
If we create a non-clustered index on the salary column:
— 14 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
The index will contain a copy of the salary column along with pointers to
the actual rows in the employees table. Any query that uses the salary
column in a search or join operation will benefit from faster data retrieval.
Question 4
Write an SQL query to find the second highest salary of an employee
without using TOP or LIMIT?
You can use correlated queries to solve this problem. You can find the
second highest salary of an employee without using TOP or LIMIT by using
the following SQL query:
In this query, we are using a subquery to find the maximum salary in the
employees table. Then, we are selecting all distinct salary values that are less
than the maximum salary. Finally, we order the results in descending order
and use LIMIT 1 to get the second highest salary.
— 15 —
GROKKING THE SQL INTERVIEW
Question 5
How to find duplicate rows in the database?
To find duplicate rows in a database table, you can use SQL queries with the
GROUP BY and HAVING clauses.
The GROUP BY clause groups the rows with identical values, and the
HAVING clause filters the groups to show only those with more than one
occurrence.Here's an example of how to find duplicate rows in a table:
1 John HR 50000
2 Jane IT 55000
3 Mary HR 60000
4 Peter IT 52000
5 Alice HR 50000
6 Tom IT 52000
7 Emma HR 60000
To find duplicate rows based on the salary and department columns, you
can use the following SQL query:
— 16 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
50000 HR 2
60000 HR 2
52000 IT 2
In this example, the query groups the rows based on the salary and
department columns and counts the occurrences of each group. The
HAVING COUNT(*) > 1 condition filters the groups to show only those
with more than one occurrence, indicating that these are the duplicate rows
in the table.
Please note that the example assumes you are using SQL, and the syntax may
vary slightly depending on the database management system you are using
(e.g., MySQL, SQL Server, PostgreSQL, etc.). Additionally, the duplicate
criteria can be based on different combinations of columns as needed for
your specific use case.
Question 6
Difference between correlated and non-correlated subquery in SQL?
In SQL, subqueries are queries that are nested within another query.
Subqueries can be classified into two types: correlated subqueries and non-
correlated subqueries. The main difference between them lies in how they
interact with the outer query:
1. Non-Correlated Subquery:
• A non-correlated subquery is an independent query that can be
executed on its own without reference to the outer query.
— 17 —
GROKKING THE SQL INTERVIEW
• The subquery is evaluated first, and its result is then used in the
outer query to filter or perform other operations.
• Non-correlated subqueries are executed only once, regardless of the
number of rows in the outer query, making them generally more
efficient.
1 John HR 50000
2 Jane IT 55000
3 Mary HR 60000
4 Peter IT 52000
Suppose we want to find all employees whose salary is greater than the
average salary of all employees. We can use a non-correlated subquery for
this:
— 18 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
2. Correlated Subquery:
• A correlated subquery is a subquery that depends on the values from
the outer query to execute.
• For each row processed by the outer query, the subquery is re-
evaluated with the specific values from the current row of the outer
query.
• Correlated subqueries can lead to decreased performance, especially
when dealing with large datasets, as they may be executed multiple
times.
Suppose we want to find all employees whose salary is greater than the
average salary of their respective departments. We can use a correlated
subquery for this:
— 19 —
GROKKING THE SQL INTERVIEW
Question 7
How many clustered indexes you can have in a table?
Short answer - This is a tricky question, you can only have one clustered
index per table.
However, it's essential to note that a table can have multiple non-clustered
indexes. Non-clustered indexes are separate data structures that provide
quick access paths to the data rows in the table. While there can be multiple
non-clustered indexes, there can be only one clustered index per table.
The limitation of one clustered index per table is mainly due to the fact that
the physical order of data rows can only be maintained in one specific
way.
Having multiple clustered indexes would require storing the same data in
multiple physical orders, which would be inefficient and impractical. Hence,
the single clustered index restriction is enforced to maintain data
consistency and storage efficiency.
In summary, a table in most RDBMS can have only one clustered index, but
it can have multiple non-clustered indexes to optimize query performance
for various columns and criteria.
— 20 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 8
Difference between the PRIMARY key and the UNIQUE key
constraint in SQL?
In SQL, both the PRIMARY key and the UNIQUE key constraint are used
to enforce uniqueness in a table's column(s). However, there are some key
differences between them:
Example:
— 21 —
GROKKING THE SQL INTERVIEW
Example:
In this example, the email column has a UNIQUE key constraint, ensuring
that each email address is unique in the table, but it can allow NULL values.
In summary, both the PRIMARY key and the UNIQUE key constraint
ensure uniqueness in column(s) of a table. The PRIMARY key uniquely
identifies each row and creates a clustered index, while the UNIQUE key
constraint enforces uniqueness but does not necessarily identify each row
uniquely and creates a non-clustered index.
— 22 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 9
Difference between view and materialized view in SQL?
In SQL, both views and materialized views are database objects used to
provide a logical representation of data from one or more underlying tables.
However, there are significant differences between the two:
1. View:
• A view is a virtual table that does not store data on its own. It is
defined by a query that retrieves data from one or more base tables.
• Views are used to simplify complex queries, encapsulate data access
logic, and provide a security mechanism by restricting access to
specific columns or rows of a table.
• Whenever you query a view, the underlying query is executed, and
the data is generated on-the-fly based on the latest data in the base
tables. The view's data is always up-to-date with the underlying
tables.
• Views are suitable for scenarios where real-time data is needed, and
you want to hide the complexity of underlying tables.
— 23 —
GROKKING THE SQL INTERVIEW
2. Materialized View:
• A materialized view is a physical copy of the result set of a query. It
stores the data in a separate table and keeps the data updated
periodically or on-demand based on a defined refresh schedule.
• Materialized views are used to improve query performance by
precomputing and storing the results of complex and resource-
intensive queries. They are beneficial for scenarios where you need
to access the same query result multiple times or when the
underlying data changes infrequently.
• The data in a materialized view is not always up-to-date with the
underlying tables; it depends on the last refresh or update time.
Therefore, materialized views are suitable for scenarios where near-
real-time data is sufficient.
Example of creating a materialized view:
— 24 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 10
Difference between TRUNCATE, DELETE and DROP in SQL?
Delete removes record but also put an entry into log segment so you can roll
back, truncate removes all rows of a table without logging each record on
log segment, DROP removes table itself.
1. TRUNCATE:
• TRUNCATE is used to delete all the rows from a table while keeping
the table structure intact.
• It is a data manipulation language (DML) command that is typically
faster than the DELETE command because it does not log
individual row deletions and does not generate as much undo/redo
data.
• The TRUNCATE command cannot be used on tables with foreign
key constraints, and it also cannot be rolled back.
• When you use TRUNCATE, the table's identity counter (if any) is
reset, and the table becomes empty.
This command will remove all rows from the employees table, but the table
structure will remain unchanged.
2. DELETE:
• DELETE is used to remove specific rows from a table based on
specified conditions.
— 25 —
GROKKING THE SQL INTERVIEW
This command will delete all rows from the employees table where the
department is 'HR'.
3. DROP:
• DROP is used to remove database objects such as tables, views,
indexes, or stored procedures.
• It is a data definition language (DDL) command that completely
removes the object from the database.
• DROP cannot be rolled back, and once an object is dropped, it is
permanently deleted from the database.
• Be cautious when using DROP as it can lead to data loss if not used
carefully.
Example of dropping a table:
This command will remove the employees table and all its data and
structure from the database.
— 26 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 11
What is Referential Integrity in a relational database?
Short answer - It's a rule which ensures that when a record is deleted from
the primary table, all associated records are deleted from the related table. It
ensures data integrity.
— 27 —
GROKKING THE SQL INTERVIEW
3. Enforcing Constraints:
• Foreign Key constraints are used to enforce Referential Integrity
rules in the database.
• These constraints can be declarative (defined explicitly) or implicit
(created automatically by the database based on the schema).
4. Cascading Actions:
• Referential Integrity constraints can include cascading actions that
define how updates or deletions in the referenced table affect the
referencing table.
• Common cascading actions include CASCADE (update/delete the
referencing rows), SET NULL (set referencing rows to NULL), and
RESTRICT (prevent update/delete if referenced rows exist).
— 28 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 12
What is Normalization?
Short answer - A way to avoid duplication of data in tables.
— 29 —
GROKKING THE SQL INTERVIEW
Normalization Process:
Students Table:
Student_ID Student_Name
101 John
102 Alice
103 Bob
Courses Table:
— 30 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Courses Table:
CS101 Intro to CS 1
Instructors Table:
— 31 —
GROKKING THE SQL INTERVIEW
Students Table:
Student_ID Student_Name
101 John
102 Alice
103 Bob
Courses Table:
CS101 Intro to CS 1
— 32 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Instructors Table:
By following the normalization process and dividing the data into separate
related tables, we have eliminated data redundancy, reduced update
anomalies, and ensured data integrity. Each table serves a single purpose,
and the relationships between the tables are defined by appropriate foreign
key constraints.
Question 13
When a table is said to be in 1NF, 2nd NF, and 3rd NF?
A table is said to be in First Normal Form (1NF), Second Normal Form
(2NF), and Third Normal Form (3NF) based on specific criteria related to
data organization and dependencies. Each normal form builds upon the
previous one, and a higher normal form indicates a higher level of data
integrity and a reduction in data redundancy and anomalies. Here's an
overview of each normal form:
— 33 —
GROKKING THE SQL INTERVIEW
— 34 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 14
Describe the difference between ISNULL() and COALESCE() in SQL
Server?
Both ISNULL() and COALESCE() are used in SQL Server to handle NULL
values, but they have some differences in functionality:
1. ISNULL() Function:
• ISNULL() is a built-in SQL Server function that returns the
specified replacement value if the expression evaluates to NULL.
Otherwise, it returns the original expression.
• It takes two arguments: the expression to be evaluated and the
replacement value.
• ISNULL() is specific to SQL Server and can only handle two
arguments.
2. COALESCE() Function:
• COALESCE() is a standard SQL function supported by various
database management systems, including SQL Server.
• It takes multiple arguments and returns the first non-NULL expression
from the list. If all arguments are NULL, it returns NULL.
• Unlike ISNULL(), COALESCE() can handle multiple arguments
and will evaluate and return the first non-NULL value encountered.
— 35 —
GROKKING THE SQL INTERVIEW
In this example, COALESCE() will return the first non-NULL value among
column1, column2, and column3. If all three columns are NULL, it will
return 'Not available'.
Key Differences:
In SQL Server, both functions can be useful for handling NULL values, but
COALESCE() offers more flexibility when dealing with multiple
expressions and provides a standard approach that can be easily ported to
other SQL-compliant databases.
Question 15
How do you ensure that only values between 1 to 5 are allowed in an
integer column?
To ensure that only values between 1 to 5 are allowed in an integer column,
you can use a combination of CHECK constraint and a BETWEEN clause.
The CHECK constraint allows you to define a condition that must be true
for any data to be inserted or updated in the column. In this case, you can
use the CHECK constraint to enforce the range of values.
— 36 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Let's demonstrate how you can insert valid and invalid values:
Valid Insert:
Invalid Insert:
By using the CHECK constraint with the BETWEEN clause, you can ensure
that only valid values are allowed in the integer column, maintaining data
integrity and preventing incorrect data from being entered into the table.
— 37 —
GROKKING THE SQL INTERVIEW
Question 16
Difference between CHAR and VARCHAR data types in SQL?
Short answer - CHAR is fixed length, VARCHAR is variable length
Long answer - In SQL, CHAR and VARCHAR are both data types used to
store character strings, but they have some key differences in terms of
storage and behavior:
Example of CHAR:
— 38 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Example of VARCHAR:
The choice between the two data types depends on the nature of the data
being stored and the expected lengths of the strings. If you have data with
consistent lengths, CHAR might be more suitable, whereas for variable-
length data, VARCHAR is generally a better choice due to its efficient use
of storage.
— 39 —
GROKKING THE SQL INTERVIEW
Question 17
Difference between VARCHAR and NVARCHAR in SQL Server?
Short answer - NVARCHAR supports Unicode while VARCHAR doesn’t
Long answer - In SQL Server, VARCHAR and NVARCHAR are both used
to store character data, but they differ in how they handle character
encoding. VARCHAR stores data using the database's default encoding
(e.g., ASCII or UTF-8), while NVARCHAR uses a variable-length Unicode
encoding (UTF-16), allowing storage of multi-byte characters.
NVARCHAR is ideal for internationalization and multilingual data, but it
consumes more storage space than VARCHAR. Use VARCHAR for single-
byte character sets and NVARCHAR for multi-byte character sets to ensure
proper handling of character data in the database.
Question 18
How do you get Day, Month, and Year from a date in SQL Server?
Short answer - By using the DATEPART() function
Long answer - In SQL Server, you can extract the day, month, and year
components from a date using various built-in functions. Here's how you
can do it:
1. Get Day: To extract the day from a date, you can use the DAY()
function. It returns the day of the month as an integer value ranging
from 1 to 31.
— 40 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
2. Get Month: To extract the month from a date, you can use the
MONTH() function. It returns the month as an integer value
ranging from 1 to 12.
3. Get Year: To extract the year from a date, you can use the YEAR()
function. It returns the year as an integer value.
Question 19
How to check if a date is valid in SQL?
In SQL, you can check if a date is valid using the TRY_CAST() or
TRY_CONVERT() function.
These functions attempt to convert the input expression to the specified data
type and return NULL if the conversion fails.
To check if a date is valid, you can try to cast or convert the date string to a
date data type (e.g., DATE, DATETIME, SMALLDATETIME, etc.).
— 41 —
GROKKING THE SQL INTERVIEW
— 42 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 20
Difference between LEFT OUTER JOIN and INNER JOIN in SQL?
In SQL, both LEFT OUTER JOIN and INNER JOIN are used to combine
data from two or more tables, but they produce different results:
1. INNER JOIN: An INNER JOIN returns only the rows that have
matching values in both tables based on the specified join condition.
It effectively filters out rows where there is no match in the joined
table. In other words, an INNER JOIN produces a result set that
contains only the rows with matching keys in both tables.
Example:
In this example, only the rows where there is a match between the
employees table and the departments table on the department_id column
will be returned.
2. LEFT OUTER JOIN: A LEFT OUTER JOIN returns all the rows
from the left (first) table and the matching rows from the right
(second) table. If there is no match in the right table, NULL values
will be returned for columns from the right table. It keeps all the
rows from the left table regardless of whether there is a matching
row in the right table.
Example:
— 43 —
GROKKING THE SQL INTERVIEW
Question 21
What is SELF JOIN in SQL?
When we join two instances of tables it's called self-join. For example, if the
table contains employee name and role then you can use self-joincopy to
find all employees who are managers.
Question 22
In a classical Employee and Department relationship, write SQL
query to print all departments and number of employees on each
department.
To print all departments along with the number of employees in each
department, you can use a LEFT OUTER JOIN and the COUNT()
function. Assuming you have two tables named departments and
employees with a foreign key department_id in the employees table that
references the department_id in the departments table, you can use the
following SQL query:
Explanation:
1. We start with the departments table, aliasing it as d, and then
perform a LEFT OUTER JOIN with the employees table, aliasing it
as e, using the common column department_id.
2. The LEFT OUTER JOIN ensures that all departments from the
departments table are included in the result, even if there are no
matching employees in the employees table.
— 44 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
The query will provide a result set with two columns: department_name
and num_of_employees, showing all departments and the corresponding
number of employees in each department.
Question 23
Difference between COUNT(*), COUNT(1), and COUNT(column_
name) in SQL?
Short answer - COUNT(*) includes null values while counting but
COUNT(1), and COUNT(column_name) doesn't take null values into
consideration during counting.
Long answer:
In SQL, COUNT(*), COUNT(1), and COUNT(column_name) are all used
with the COUNT() aggregate function to count the number of rows in a
result set or the number of non-null values in a specific column. However,
they have slight differences in their behavior:
1. COUNT(*):
• COUNT(*) counts the total number of rows in the result set,
including rows with NULL values in all columns.
• It does not consider individual column values but simply counts the
rows that satisfy the conditions in the query.
— 45 —
GROKKING THE SQL INTERVIEW
• It is usually the fastest option for counting rows because it does not
involve evaluating any column values.
Example:
2. COUNT(1):
• COUNT(1) is similar to COUNT(*), as it also counts the total
number of rows in the result set, including rows with NULL values
in all columns.
• Instead of considering individual column values, it uses the constant
value "1" for each row in the result set, effectively counting the rows
without evaluating any column values.
• Like COUNT(*), it is a fast option for counting rows.
Example:
3. COUNT(column_name):
• COUNT(column_name) counts the number of non-null values in
the specified column. It excludes rows with NULL values in the
specified column from the count.
• It is used when you want to count the occurrences of non-null values
in a particular column.
Example:
SELECT COUNT(employee_id) AS non_null_employee_ids
FROM employees;
In summary, COUNT(*) and COUNT(1) are used to count all rows in the
result set, while COUNT(column_name) is used to count the non-null values
— 46 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
in a specific column. All three options have their specific use cases, and the
choice depends on what you want to count in your SQL query. For counting
all rows, COUNT(*) is the most commonly used and efficient option.
Question 24
What is Database statistics? How it affects the performance of your
Query?
One of the first thing I do when my app become slow is to update the statistics
on DB. It immediately boost the performance for one reason or other.
Database statistics refer to the information and metrics about the data
distribution and structure within a database. These statistics are typically
collected and maintained by the database management system (DBMS) and
are used by the query optimizer to determine the most efficient execution
plan for a given SQL query.
— 47 —
GROKKING THE SQL INTERVIEW
Pro tip
if you run import jobs make sure you include a command to update the
statistics after data is imported like
— 48 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 25
Suppose you have to create a compound index in a table, involving two
columns like book_id and active. Now you can create them as either
(book_id, active) or (active, book_id), do you think the order of
columns in an index matter? How it will affect your SQL queries?
Yes, the order of columns in a compound index can matter, and it can
significantly affect the performance of SQL queries. The order of columns
in an index impacts how the index is utilized by the query optimizer for
different types of queries.
When creating a compound index involving two columns like book_id and
active, the choice of the order of these columns can lead to different index
structures and query optimizations:
2. (active, book_id) Index: On the other hand, this index order is more
appropriate when queries often involve filtering by active and then
by book_id. For example, if you have queries like:
— 49 —
GROKKING THE SQL INTERVIEW
Keep in mind that the order of columns in the index should be based on the
query patterns and the filtering conditions you expect to see in your
application. In some cases, it might be beneficial to create multiple indexes
with different column orders to optimize different query patterns.
However, creating too many indexes can also have downsides, such as
increased storage and maintenance overhead, so it's essential to strike a
balance and consider the most common and performance-critical query
scenarios when designing compound indexes.
Question 26
What do _ and % are used in the SQL query?
Short answer - They are used in the LIKE operators while writing SQL
queries. The underscore is used to match one character, while % is used for
any number of characters.
For example, suppose you want to find all names that start with "J" and have
exactly four characters in total:
This query will retrieve all records where the name starts with "J" and has
exactly four characters after the initial "J."
— 50 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
2. Percent Sign (%) Wildcard: The percent sign (%) is used as a wildcard
that represents any sequence of characters (including zero characters).
When used in a LIKE comparison, it matches any number of
characters, including none, at the position where it appears.
For example, suppose you want to find all names that start with "A":
This query will retrieve all records where the name starts with "A," followed
by any sequence of characters.
You can also use the % wildcard in combination with other characters to
create more complex patterns. For instance, %n% will match any name that
contains the letter "n" at any position.
Keep in mind that the LIKE operator is not case-sensitive by default in most
database systems. If you need case-sensitive matching, you can use
appropriate collations or convert the strings to a specific case using
functions like LOWER() or UPPER() before performing the comparison.
Question 27
How do you ensure that a particular SQL query will use a particular
Index?
Short answer - You can use SQL hints for that purpose.
Long answer - In SQL, you cannot directly force a query to use a specific
index. The query optimizer in the database management system (DBMS) is
responsible for choosing the most efficient execution plan, including which
indexes to use, based on the available statistics and the complexity of the
query.
— 51 —
GROKKING THE SQL INTERVIEW
However, you can influence the query optimizer's decision and encourage
it to use a particular index by following these best practices:
• Create the Right Index: Ensure that you have appropriate indexes
on the columns used in the query's WHERE, JOIN, and ORDER
BY clauses. The presence of a suitable index increases the chances
of the query optimizer selecting it.
• Statistics: Keep the statistics up-to-date for the tables and indexes.
The query optimizer relies on statistics to estimate the cost of
different execution plans and make informed decisions.
• Index Hints: Some database systems provide index hints that allow
you to suggest which index to use in a query. However, using hints
should be a last resort, as the query optimizer is generally good at
choosing efficient execution plans.
• Rewrite the Query: Sometimes, rewriting the query can influence
the query optimizer's choices. Small changes in the query logic
might lead to different execution plans, which may make better use
of certain indexes.
• Covering Indexes: Create covering indexes that include all the
columns required by the query. This allows the query to retrieve all
the necessary data directly from the index without accessing the
actual table, improving performance.
• Avoid Indexing Overkill: Having too many indexes on a table can
increase the overhead of maintaining them and might confuse the
query optimizer. Keep indexes selective and remove unused or
redundant indexes.
• Remember that the query optimizer's decision to use an index is
based on the overall cost estimation for the execution plan. Even
if you encourage the use of a specific index, the optimizer may still
— 52 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 28
In SQL Server, which one is fastest and slowest between an index seek,
an index scan, and table scan?
Short answer - In a general, the table scan is slower than the index scan and
index seek. Your goal should be to write queries that can take advantage of
the index by using index scan and index seek for faster retrieval. You can
use the SQL EXPLAIN command to retrieve the query plans and find out
whether indexes are used for your query or not.
1. Index Seek:
• An Index Seek is the fastest operation because it directly navigates
to the specific rows in the table using the index key. It is efficient for
queries that have selective criteria, as it minimizes the number of
rows to be retrieved.
2. Index Scan:
• An Index Scan is the second fastest operation. It reads all the rows
in an index (or a range of index entries) that match the query's filter
conditions. While not as efficient as an Index Seek, it is still faster
— 53 —
GROKKING THE SQL INTERVIEW
3. Table Scan:
• A Table Scan is the slowest operation. It reads all the rows from a
table, whether or not they match the query's filter conditions. It is
the least efficient operation, as it involves reading the entire table,
which can be very time-consuming for large tables.
The performance of these operations can be influenced by factors such as the
size of the table, the distribution of data, the presence of suitable indexes, and
the complexity of the query. It's essential to create appropriate indexes and
write efficient queries to ensure that SQL Server can use Index Seeks and
Index Scans whenever possible, as they are more efficient than Table Scans.
The SQL Server query optimizer plays a crucial role in determining which
operation to use based on statistics, available indexes, and query complexity.
By ensuring proper indexing and writing well-optimized queries, you can
improve query performance and reduce the likelihood of Table Scans,
which tend to be the slowest operation.
Question 29
What does NULL = NULL will return in SQL?
In SQL, the expression NULL = NULL will not return TRUE, as you might
expect. Instead, it will return NULL.
When you compare two NULL values using the equals (=) operator, SQL
cannot determine whether they are equal or not because they are both
— 54 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
To check for NULL values in SQL, you should use the IS NULL or IS NOT
NULL operators:
For example:
In this query, we are selecting all records from the employees table where
the department_id column has a NULL value. This is the correct way to
handle NULL comparisons in SQL.
Question 30
Write SQL query to find all rows where EMP_NAME, a VARCHAR
column is NULL?
This is a simple question often asked beginners to check whether they know
how to check for NULL in SQL or not. The trick here is to use "IS NULL"
instead of the "=" operator to find all rows where EMP_NAME is NULL.
— 55 —
GROKKING THE SQL INTERVIEW
Question 31
What is the temp table?
Short answer - A temp table or a temporary table is a base table that is not
stored in the database and only exists while the current database session is
active. Once the database connection is closed, all temp tables are lost. They
may look similar to view but they are not. A view exists only for a single query
but you can use a temporary table as a regular table until your session is active.
— 56 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 32
What is the fastest way to empty or clear a table?
Short answer - You can use the truncate command to empty or clear the
table. It's faster than delete because it doesn't log each deleted entry on a
log, that's why you cannot roll back it. So be careful while using truncate to
clear or empty a table.
Long answer - The fastest way to empty or clear a table in most relational
database systems, including SQL Server, is to use the TRUNCATE TABLE
statement. The TRUNCATE TABLE statement removes all rows from a
table quickly and efficiently, and it is typically faster than using the DELETE
statement, especially for large tables.
— 57 —
GROKKING THE SQL INTERVIEW
Replace table_name with the name of the table you want to empty.
— 58 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
In summary, if you need to quickly empty or clear a table and you don't have
foreign key constraints referencing it, the TRUNCATE TABLE statement
is the fastest and most efficient option. However, be mindful of the
considerations mentioned above and verify that it aligns with your specific
use case.
Question 33
What is an identity column in SQL Server? How do you return an
identity value from a table?
In SQL Server, an identity column is a special type of column that
automatically generates a unique value for each new row added to a table. It
is commonly used to provide a unique and sequential identifier for the rows
in a table. The values in an identity column are typically used as primary
keys for the table.
— 59 —
GROKKING THE SQL INTERVIEW
PRIMARY KEY indicates that the identity column will be the primary key
for the table. The IDENTITY(1,1) specifies the seed value and the
increment value for the identity column. In this case, the identity starts at 1
and increments by 1 for each new row.
To return the identity value from a table after inserting a new row, you can
use one of the following methods:
— 60 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
You should choose the appropriate method based on your specific use case
and the behaviour you need for returning the identity value after an insert
operation.
Question 34
How do you return an identity value from a table with a trigger?
Short answer - In SQL Server, you can use a function like @@IDENTITY to
generate identity values.
Question 35
How do you return a value from a stored procedure?
Short answer - In SQL Server, you can either use the OUTPUT parameter
or use the return statement to return a value from a stored procedure.
Long answer - In SQL Server, you can return a value from a stored procedure
using the RETURN statement or by using an OUTPUT parameter.
— 61 —
GROKKING THE SQL INTERVIEW
To execute the stored procedure and retrieve the returned value, you use the
following:
— 62 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
To execute the stored procedure and retrieve the values returned through
OUTPUT parameters, you use the following:
Remember that RETURN statements can only return a single integer value,
while OUTPUT parameters can return multiple values of different data
types. Choose the appropriate method based on the number and type of
values you need to return from the stored procedure.
Question 36
How do you return a VARCHAR value from a stored procedure?
Hint - using OUTPUT parameter return clause as shown above
Question 37
If you have a column that will only have values between 1 and 250 what
data type will you use?
If you are using SQL Server database then you can use the TINYINT
datatype which can accommodate numbers between 0 and 255 and it needs
1 byte for storage.
— 63 —
GROKKING THE SQL INTERVIEW
This question is asked to test your knowledge of SQL data type and whether
you can choose the right data type for a given requirement or not. Your goal
should be to use a data type that can accommodate a given range. You can
also ask questions that whether it's fixed or it can change in the future.
Question 38
Difference between LEFT and RIGHT OUTER JOIN in SQL?
Both are outer joins, in LEFT outer join, all rows from the left side table
will be included, and only matching rows from the other side of the table
are included. In case of RIGHT outer join, all rows of the right side of the
table on join condition are included.
Question 39
Can you write an SQL query to select all last names that start with 'T'?
Here's an SQL query to select all last names that start with 'T':
In this query:
This query will return all last names that start with the letter 'T' from the
employees table.
— 64 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 40
How would you select all rows where the date is 20231002?
To select all rows where the date is '2023-10-02', you would use the following
SQL query:
Replace your_table with the actual name of your table and date_column
with the name of the column that holds the date values.
This query will retrieve all rows from the specified table where the date in
the date_column is '2023-10-02'.
Question 41
What is the difference between a local and global temporary table in
SQL Server?
I did explained this before on previous question but here it is again:
In SQL Server, local and global temporary tables are two types of temporary
tables that serve different purposes:
— 65 —
GROKKING THE SQL INTERVIEW
• They are useful for temporary data storage within a specific session,
such as in stored procedures or user-defined functions.
In summary, the main differences between local and global temporary tables
lie in their scope and lifespan. Local temporary tables are limited to a single
session and are dropped when the session ends, while global temporary
tables can be accessed by multiple sessions and are dropped when the last
session that references them is closed.
Question 42
How do you create a copy of a table in SQL Server?
To create a copy of a table in SQL Server, you can use various methods,
depending on your requirements and the complexity of the table structure.
Here are a few approaches:
— 66 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
This creates a new table new_table with the same structure as old_table and
copies all data into it.
Question 43
How do you change the data type of a column in a table in SQL?
To change the data type of a column in a table in SQL, you can use the
ALTER TABLE statement along with the ALTER COLUMN clause. Here's
the general syntax:
Replace table_name with the name of the table containing the column,
column_name with the name of the column you want to modify, and
new_data_type with the desired new data type.
For example, if you want to change the data type of a column named age in
a table named employees to INT, the SQL statement would be:
— 67 —
GROKKING THE SQL INTERVIEW
Make sure to review the documentation for your specific database system
to understand any nuances or limitations related to altering column data
types.
Question 44
What data type should you use to store monetary values in a table?
When storing monetary values in a table, it is recommended to use the
DECIMAL (also known as NUMERIC) data type. This data type is
specifically designed for precise decimal arithmetic and is suitable for
representing fixed-point numbers, such as monetary values.
1. Precision (p): The total number of digits that can be stored, both to
the left and right of the decimal point.
2. Scale (s): The number of decimal places that are stored to the right
of the decimal point.
— 68 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 45
What does SELECT 3/2 will return? 1 or 1.5?
In most programming languages including Java and database systems,
including SQL Server, when you perform an arithmetic operation like
division on integers, the result is treated as an integer division, and any
fractional part is truncated (rounded down).
So, in the case of SELECT 3/2, the result will be 1, not 1.5. The fractional
part 0.5 is truncated, and only the integer part of the division result is
returned.
— 69 —
GROKKING THE SQL INTERVIEW
Question 46
What is the maximum value that Decimal (6, 5) can hold in SQL
Server?
In SQL Server, the Decimal (p, s) data type represents a fixed precision and
scale decimal number. The p parameter specifies the total number of digits
that the number can hold (precision), and the s parameter specifies the
number of decimal places (scale).
For Decimal (6, 5), this means it can hold a total of 6 digits, with 5 of those
digits being reserved for decimal places. The remaining digit is used for the
non-decimal part.
The maximum value that a Decimal (6, 5) can hold is determined by its
precision and scale. In this case, with a precision of 6 and a scale of 5, the
maximum value can be represented as:
0.99999
This value is derived from the fact that there are 6 total digits, with 5 of them
after the decimal point. The largest value that can be represented is where
all the available digits are set to 9 in the decimal places.
Please note that the actual range of values might be constrained by other
factors such as the available memory and storage.
Question 47
If you have a table with one integer column ID, and it has three values
101, 201, and NULL? What will the following SQL query SELECT *
FROM TestTable where ID !=101 will return?
The SQL query SELECT * FROM TestTable WHERE ID != 101 will return
the rows where the ID column is not equal to 101. However, it's important
to note that when comparing with NULL values, the result might not be as
— 70 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Question 48
What is your favorite SQL book?
This one is an easy question and the interviewer just wants to know whether
you have read any book or not. You can name the SQL book you have read,
if you haven't read any SQL book so far then I highly recommend you to
read Head First SQL if you are learning SQL from scratch and Joe Celko's
SQL Puzzles if you already know SQL and looking for some SQL puzzles to
test your SQL query skills.
Question 49
Tell me two SQL best practices you follow?
Short answer - creating indexes and using them on SQL queries,
normalization, and updating statistics regularly.
— 71 —
GROKKING THE SQL INTERVIEW
Long answer – Here are two SQL best practices that are widely recommended:
Question 50
What is the different ISOLATION level in the Microsoft SQL Server
database?
In Microsoft SQL Server, isolation levels define how transactions interact
with each other in a multi-user environment. Different isolation levels
— 72 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
1. READ UNCOMMITTED:
• Least restrictive isolation level.
• Allows transactions to read uncommitted changes made by other
transactions, leading to potential dirty reads, non-repeatable reads,
and phantom reads.
2. READ COMMITTED:
• Default isolation level.
• Allows transactions to read only committed data, avoiding dirty
reads. However, it may still experience non-repeatable reads and
phantom reads.
3. REPEATABLE READ:
• Prevents other transactions from updating or inserting new rows
that would affect the result set of the current transaction.
• Guarantees consistent results for read operations, but can still lead
to phantom reads (newly inserted rows).
4. SERIALIZABLE:
• Provides the highest level of isolation.
• Prevents other transactions from updating, inserting, or deleting
rows that would affect the result set of the current transaction.
• Offers the highest data integrity but can lead to decreased
concurrency due to locking.
— 73 —
GROKKING THE SQL INTERVIEW
5. SNAPSHOT:
• Uses row versioning to allow each transaction to see a snapshot of
the data as it existed at the start of the transaction.
• Helps to prevent blocking and provides a higher level of
concurrency than SERIALIZABLE.
Isolation levels affect how locks are acquired and released, impacting the
trade-off between data consistency and concurrency. Choosing the
appropriate isolation level depends on the specific requirements of your
application and the potential impact on performance and concurrency. It's
important to carefully consider the implications of each isolation level and
test their behaviour in your specific use case.
Question 51
If you create a local temp table and then call a proc is the temp table
available inside the proc?
Yes, if you create a local temporary table (#table_name) and then call a
stored procedure in the same session, the temporary table will be available
inside the stored procedure.
Local temporary tables are session-specific, meaning they are accessible only
within the context of the session that created them, including any stored
procedures called from that session.
— 74 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Inside the stored procedure, the temporary table is still accessible, and you can
perform operations on it.
Remember that local temporary tables are automatically dropped when the
session that created them ends.
This means that after the session ends (when the connection is closed), the
temporary table will no longer exist.
— 75 —
GROKKING THE SQL INTERVIEW
Question 52
Which date format is the only safe one to use when passing dates as
strings?
When passing dates as strings in SQL queries, the safest date format to use
is the ISO 8601 format, specifically the "YYYY-MM-DD" format for dates
and "YYYY-MM-DDTHH:MM:SS" format for timestamps with time zone
information. For example:
• Date: "2023-08-05"
• Timestamp: "2023-08-05T15:30:00"
The ISO 8601 format is widely supported across different database systems
and programming languages, making it less prone to ambiguity or
misinterpretation. It's also less affected by regional date format settings.
Using this format helps ensure consistent and correct date parsing
regardless of the environment.
It's important to note that using the ISO 8601 format may still require
proper handling and conversion in your code to ensure that the passed
string is correctly interpreted as a date or timestamp by the database system
you are using.
Question 53
How do you suppress rows affected messages when executing an insert
statement in SQL Server?
In SQL Server, you can suppress the "rows affected" messages when
executing an INSERT statement by using the SET NOCOUNT ON
statement before your INSERT statement. This prevents SQL Server from
sending the informational message indicating the number of rows affected
by the statement.
— 76 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
By setting NOCOUNT to ON, you will suppress the "X rows affected"
message for the current session, making the output cleaner, especially when
you are performing bulk inserts or other operations where you don't need
the row count information. Remember to set NOCOUNT back to OFF
afterward to restore the default behavior.
Note that this technique is specific to SQL Server. Other database systems
may have different methods or settings to achieve similar results.
Question 54
Difference between ANSI-89 and ANSI-92 syntax of writing SQL
Join?
The ANSI-89 syntax uses the WHERE clause to specify join conditions,
often resulting in less readable and harder-to-maintain code. It lacks explicit
JOIN keywords, leading to confusion between filtering and joining. In
contrast, the ANSI-92 (or SQL-92) syntax employs explicit JOIN keywords
(INNER, LEFT, RIGHT, FULL), separating join conditions from filtering
conditions, enhancing code clarity and maintainability. ANSI-92 supports
more complex joins, reducing ambiguity and improving query
optimization. It's the recommended choice for modern SQL queries due to
its structured and versatile approach to joining tables.
— 77 —
GROKKING THE SQL INTERVIEW
Question 55
Differences between IN and EXISTS (and NOT IN, and NOT EXISTS)
in SQL?
IN and EXISTS are SQL operators that are used to filter and compare values
in different ways. Similarly, NOT IN and NOT EXISTS are used for
negating those comparisons. Let's explore the differences between them:
This query retrieves all employees from the departments with IDs
101, 102, or 103.
— 78 —
SQL AND DATABASE TELEPHONIC INTERVIEW QUESTIONS
Example:
This query retrieves all students whose age is not 18, 19, or 20.
This query retrieves all products that have not been ordered.
Key Differences:
• IN and NOT IN compare values directly, while EXISTS and NOT
EXISTS are used for existence checks based on subqueries.
• IN and NOT IN are typically used for comparing values from a list,
while EXISTS and NOT EXISTS are often used for correlated
subqueries (subqueries that reference columns from the outer
query).
• EXISTS and NOT EXISTS are usually more efficient for checking
existence compared to IN and NOT IN because they stop processing
once a match is found.
— 79 —
GROKKING THE SQL INTERVIEW
That's all on this list of frequently asked database and SQL interview
questions. This is like a warm up for the topic-wise questions we are going
to cover but if you are in hurry, you can complete this chapter and get ready
for your interview in quick time.
We have also covered a lot of SQL Server questions in this list and some
questions from Oracle Database but most of the questions are applicable to
all the databases including MySQL. As I said, you can use this list to quickly
revise the essential SQL and Database concepts before you go for the
interview.
— 80 —
CHAPTER 2
SQL JOIN
In this chapter, we will explore the concept of SQL Joins and delve into the
various types of joins, preparing you to confidently tackle SQL Join-related
interview questions. By the end of this chapter, you will have a
comprehensive understanding of the intricacies involved in connecting
tables, optimizing joins, and combining data seamlessly.
1. Inner Join: Understanding the most common type of join, the Inner
Join, and how it returns only the rows that have matching values in
both tables. Learn how to write inner join queries using the JOIN
keyword or the explicit INNER JOIN syntax.
— 81 —
GROKKING THE SQL INTERVIEW
2. Left Join (Left Outer Join): Exploring the Left Join, which returns
all the rows from the left table and the matching rows from the right
table. If there are no matches in the right table, NULL values are
returned. Learn how to perform left joins using the LEFT JOIN
keyword.
4. Full Join (Full Outer Join): Exploring the Full Join, which returns
all rows when there is a match in either the left or right table. If there
are no matches in both tables, NULL values are returned. Learn how
to perform Full Joins using FULL JOIN or combining Left and Right
Joins.
6. Self Join: Delving into the Self Join, a special case where a table is
joined with itself, typically to compare records within the same
table. Understand the syntax and scenarios where self joins are
useful.
7. Joining Multiple Tables: Learning how to join more than two tables
in a single query to build complex relationships and retrieve data
from interconnected datasets. Understand the order of joins and the
importance of parentheses in multiple join queries.
— 82 —
SQL JOIN
SQL Joins are a vital tool in the arsenal of any SQL professional. Whether
you're seeking to consolidate data, perform data analysis, or generate
meaningful reports, understanding the various types of joins and their
appropriate use-cases is indispensable. By mastering SQL Joins, you will
— 83 —
GROKKING THE SQL INTERVIEW
Let's embark on this journey into the world of SQL Joins, empowering you
to build seamless connections between tables and unlock the true potential
of your data!
Below are 30 frequently asked SQL questions based upon Joins along with
their answers:
Question 1
What is the difference between INNER JOIN and LEFT JOIN in SQL?
INNER JOIN returns only the rows with matching values in both tables,
while LEFT JOIN returns all rows from the left table and the matching rows
from the right table. If there are no matches in the right table for a left join,
NULL values are returned.
Question 2
How do you perform a simple INNER JOIN between two tables in
SQL?
You can use the JOIN keyword or the INNER JOIN keyword followed by
the ON clause specifying the join condition.
For example:
— 84 —
SQL JOIN
Question 3
Explain the usage of RIGHT JOIN with an example.
RIGHT JOIN returns all rows from the right table and the matching rows
from the left table. If there are no matches in the left table, NULL values are
returned.
For example:
Question 4
How can you perform a FULL JOIN to retrieve all rows from two
tables, even when there are no matches?
You can use the FULL JOIN or combine the LEFT JOIN and RIGHT JOIN
using the UNION operator.
For example:
Question 5
What is a SELF JOIN, and in what scenarios is it useful?
A SELF JOIN is a join where a table is joined with itself. It is useful in
scenarios where you need to compare records within the same table, such as
finding employees who share the same manager.
— 85 —
GROKKING THE SQL INTERVIEW
Question 6
How do you perform a SELF JOIN in SQL?
You can use table aliases to distinguish between the two instances of the
same table.
For example:
Question 7
What is the result of a CROSS JOIN?
A CROSS JOIN (also known as a Cartesian Join) returns the Cartesian
product of both tables, combining every row from the first table with every
row from the second table.
Question 8
How do you perform a CROSS JOIN in SQL?
You can use the CROSS JOIN keyword to perform a cross join.
For example:
Question 9
Explain the concept of Non-Equi Join.
A Non-Equi Join uses join conditions other than equality. For example,
using greater than or less than operators to compare values between tables.
— 86 —
SQL JOIN
Question 10
How do you handle NULL values during joins?
To handle NULL values during joins, you can use the IS NULL or IS NOT
NULL condition in the WHERE clause or coalesce NULL values using the
COALESCE function.
Question 11
What is the difference between an INNER JOIN and an equi-join?
An INNER JOIN is a type of equi-join, which means the join condition uses
the equality operator (=) to match values between tables.
Question 12
How can you retrieve records from the left table that do not have
corresponding matches in the right table?
You can use a LEFT JOIN and specify a condition in the WHERE clause to
filter out rows where the right table's columns are NULL.
For example:
Question 13
What is the difference between a LEFT JOIN and a RIGHT JOIN?
The difference lies in the order of tables in the join. LEFT JOIN returns all rows
from the left table and matching rows from the right table, while RIGHT JOIN
returns all rows from the right table and matching rows from the left table.
— 87 —
GROKKING THE SQL INTERVIEW
Question 14
How do you perform a multi-table join in SQL?
You can perform a multi-table join by chaining multiple JOIN clauses
together or using parentheses to group joins.
For example:
Question 15
How can you find records with no matching entries in either of the
two joined tables?
You can use a FULL JOIN and specify a condition in the WHERE clause to
filter out rows where either the left or right table's columns are NULL.
For example:
— 88 —
SQL JOIN
Question 16
When should you use an INNER JOIN over a LEFT JOIN or RIGHT
JOIN?
Use an INNER JOIN when you want to retrieve only the rows that have
matching values in both tables. Use LEFT JOIN when you want all rows
from the left table and matching rows from the right table. Use RIGHT JOIN
when you want all rows from the right table and matching rows from the
left table.
Question 17
How can you find records with matching values in one table but not
in another?
You can use a LEFT JOIN and specify a condition in the WHERE clause to
filter out rows where the right table's columns are NULL.
For example:
Question 18
What are the key differences between a FULL JOIN and a UNION?
FULL JOIN combines data from both tables, retaining non-matching rows
with NULLs, while UNION combines the result sets of two or more SELECT
queries, removing duplicate rows.
— 89 —
GROKKING THE SQL INTERVIEW
Question 19
How do you combine results from multiple queries without using a
join?
You can use the UNION operator to combine results from multiple SELECT
queries.
For example:
Question 20
What is the difference between a self join and a regular join?
A self join is a join where a table is joined with itself, while a regular join is
a join between two different tables. Self joins are useful when comparing
records within the same table.
Question 21
How can you find the top N employees based on their salary using
joins?
You can use the ORDER BY clause in combination with the LIMIT or TOP
clause (depending on the database system) to find the top N employees
based on their salary.
— 90 —
SQL JOIN
In SQL Server:
Question 22
How can you find employees who have the same manager using a self
join?
You can use a self join on the manager_id column to find employees who
share the same manager.
For example:
Question 23
What is the purpose of using table aliases in SQL joins?
Table aliases provide a shorthand notation for table names, improving
query readability and reducing the amount of typing. They are especially
useful when dealing with self joins or joining multiple tables.
— 91 —
GROKKING THE SQL INTERVIEW
Question 24
How can you join three or more tables in a single query?
You can use multiple JOIN clauses to join three or more tables together.
For example:
Question 25
Explain how you can perform a LEFT JOIN between two tables and
keep only the rows that do not match.
To perform a LEFT JOIN and keep only the rows with no match, you can
specify a condition in the WHERE clause that checks for NULL values in
the right table's columns.
For example:
Question 26
Can you use a JOIN without specifying a join condition? If yes, what
will be the result?
Yes, you can use a CROSS JOIN (Cartesian Join) without specifying a join
condition. The result will be a combination of every row from the first table
with every row from the second table, resulting in a large result set.
— 92 —
SQL JOIN
Question 27
What is the benefit of using an equi-join over a non-equi join?
Equi-joins are more straightforward and easier to read than non-equi joins.
Additionally, equi-joins are often more efficient and can take advantage of
indexes.
Question 28
How do you perform a LEFT JOIN between two tables and include all
rows from the left table, even if there are multiple matches in the right
table?
To include all rows from the left table, even with multiple matches in the
right table, you can use the GROUP BY clause along with aggregate
functions like GROUP_CONCAT() (MySQL) or STRING_AGG()
(PostgreSQL).
For example:
— 93 —
GROKKING THE SQL INTERVIEW
Question 29
How do you perform an anti-join (retrieve records that exist in one
table but not in another)?
You can perform an anti-join by using a LEFT JOIN and checking for NULL
values in the right table's columns in the WHERE clause.
For example:
Question 30
How can you find the average salary of employees in each department
using SQL joins?
You can use a combination of the AVG() aggregate function and a JOIN to
calculate the average salary of employees in each department.
For example:
These 30 SQL Join questions and their corresponding answers cover a wide
range of scenarios and concepts related to SQL Joins. Practicing these
questions will enhance your understanding of SQL joins, allowing you to
confidently handle interview questions and real-world data manipulation
challenges.
— 94 —
CHAPTER 3
SQL QUERIES
E very programming job interview has at least one or two questions that
require you to write SQL queries for a given requirement and many
developers struggle there. It's easy to answer theoretical questions like what
is the difference between clustered and non-clustered index (see) or what is
the difference between correlated and non-correlated subqueries (see), but
when it comes time to actually write SQL queries to solve problems, it's not
that easy, especially if you haven't done your homework and practice.
In order to learn fast, start with a small table with a few columns which
include data types like number, date, and String, which have fewer number
data so that you can quickly understand and expect what should be output.
Includes some NULL, empty, and out of bound values to really test your
queries.
Considering all these together today I am going to share SQL script to create
a sample table to practice writing SQL queries for interviews. In this article,
you will find an SQL script to create a table and populate it with sample data
and then write SQL queries to solve some common problems from
Interviews.
— 95 —
GROKKING THE SQL INTERVIEW
This is also the example many of you have used in your academics so it's
quite easy to understand and correlate. Remember, understanding schema
and data is very important not only to write correct SQL queries but also to
verify that your SQL query is correct by looking at the output.
The SQL queries are written for Microsoft SQL Server database and tested
on the same, but you can easily run on Oracle, MySQL, or any other
database of your choice by removing T-SQL code e.g. the one which checks
if a table already exists, and then drop and re-create it.
Most of the code is standard ANSI SQL, hence it will run as it is on any other
database. If you still face any problems then you can also check this guide to
migrate SQL Server queries to Oracle.
— 96 —
SQL QUERIES
— 97 —
GROKKING THE SQL INTERVIEW
( 9, 3, 1, 'Middleware', 60),
( 10, 2, 3, 'Legal 1', 110),
( 11, 3, 3, 'Network', 80),
( 12, 3, 1, 'UNIX', 200);
This query runs on the Test database, if you don't have the Test database
in your SQL Server instance then either create it or remove the "USE
Test" to run on any database of your choice, you can also change the name
of the database and keep the "USE".
When you run this script, it will create and populate the data the first time.
When you run it again, it will drop and recreate the tables again, as shown
in the following output:
In this script, I have followed the naming convention and tricks which I
discussed earlier in my article, a better way to write SQL queries. All the
keyword is on the capital case while table names and column names are in
small and camel case.
This example shows that just following some simple SQL best practices can
seriously improve the queries you write. If you are interested in learning
more SQL best practices, I suggest reading SQL Anti patterns, an interesting
book for both beginners and experienced programmers.
— 98 —
SQL QUERIES
It's time to write SQL queries now. This section contains 6 SQL query
Interview questions that will test many of your SQL skills like joins,
grouping, and aggregating data, how you handle nulls in SQL etc. It doesn't
test all skills e.g. correlated subqueries, but you can take a look at questions
like how to find Nth highest salary of employees to learn that.
This section contains 6 problems for which you need to write SQL queries,
the solution is provided in the next section but I suggest you try to solve
these problems first before looking at the solution.
Question 1
Can you write an SQL query to show Employee (names) who have a
bigger salary than their manager?
In this problem, you need to compare employees' salaries to their manager's
salary. To achieve this, you need two instances of the same table. Also in
order to find a Manager you need to compare employee id with manager id,
this is achieved by using the self-join in SQL, where two instances of the
same table are compared.
Question 2
Write an SQL query to find Employees who have the biggest salary in
their Department?
This is a little bit complex problem to solve, you first need to find the
maximum salary of each department, but the department doesn't have the
— 99 —
GROKKING THE SQL INTERVIEW
salary, it is the employee who has the salary. So we need to create a virtual
table where we should have both department and salary.
Since we need to print the name of the employee who has the highest salary,
we need to compare each employee's salary with the department's highest
salary which we have just calculated. This can be done by keeping the result
of the previous query in a temp table and then joining it again with the
Employee table.
Question 3
Write an SQL query to list Departments that have less than 3 people
in it?
This is a rather simple SQL query interview question to solve.
You just need to know how to use the COUNT() function and GROUP BY
clause.
— 100 —
SQL QUERIES
Question 4
Write an SQL query to show all Departments along with the number
of people there?
This is a tricky problem, candidates often use inner join to solve the
problem, leaving out empty departments.
Question 5
Can you write an SQL query to show all Employees that don't have a
manager in the same department?
This is similar to the first SQL query interview question, where we have used
self-join to solve the problem. There we compared the salary of employee
and here we have compared their department.
— 101 —
GROKKING THE SQL INTERVIEW
Question 6
Can you write SQL query to list all Departments along with the total
salary there?
This problem is similar to the 4th question in this list. Here also you need to
use OUTER JOIN instead of INNER join to include empty departments
which should have no salaries.
Question 7
How do you find the second highest salary in the employee table?
There are many ways to find the second highest salary in the employee table,
e.g. you can either use the co-related subquery, or you can use a ranking
function like row_number() and rank().
The question can be even trickier by adding duplicates, e.g. two employees
with the same salary. In that case, you can choose between the ranking
function to satisfy the requirement.
I have already written a detailed blog post about this topic, please see here
for a couple of examples of finding the second highest salary in the employee
table.
— 102 —
SQL QUERIES
Question 8
How do you find the duplicate rows in a table?
Again, like the previous question, there are several ways to find the duplicate
rows in a table, e.g. you can use a ranking function like row_number
which assign row number depending upon the value you ask, and for the
same values as in the case of duplicate rows, the row number will be same.
This one of the classic ways to find the duplicate rows in a table, and it will
work in almost all databases like Oracle, SQL Server, or MySQL.
Question 9
How do you copy all rows of a table using SQL query?(solution)
Question 10
How do you remove the duplicate rows from the table?
This is the follow-up previous question and much tougher than the previous
one, especially if you are not familiar with the ranking functions like
row_number.
If you only know about the GROUP BY clause, then you will struggle to
remove duplicate rows because if you remove by emp_id, both rows will be
removed. Here you must use the row_number() function, the duplicate
rows will have row number > 1 like
— 103 —
GROKKING THE SQL INTERVIEW
John 1
Heroku 1
David 1
David 2
Here second David is duplicate, and you can easily remove it by giving
conditions like delete from table where row_number> 1.
10. How do you join more than two tables in SQL query? (solution)
Question 11
How to find 2nd highest salary without using a co-related subquery?
(solution)
Question 12
There exists an Order table and a Customer table, find all Customers
who have never ordered (solution)
Don't scroll down to look at the solution until you try solving all the
problems by yourself. Some of the questions are tricky, so please pay special
attention to them. It's not a real interview you can take your time because
all the hard work your mind will put now to find answers by its own will
always remain there and that's the real learning you will get by doing this
exercise.
— 104 —
CHAPTER 4
Indexes
You should know different types of index and their pros and cons, how your
database or Query engine chooses indexes and the impact of table scan,
index scan, and index seek. You should also be able to build upon your
knowledge to optimize your SQL queries as well as troubleshoot a
production issue involving long running queries or poor performing
queries. This is also a great skill for experience developers with 5 to 6 years
of experience under his belt.
— 105 —
GROKKING THE SQL INTERVIEW
These are very common question and if you have work in SQL or database,
you might have faced those situations already.
In this article, I have compiled such question to bring them together and
give them a nice overview. To keep the article short and simple, I haven't
gone into too much detail but I have pointed resource to learn more if you
are curious about it.
These questions will help you to refresh your knowledge of how index work
in general, barring the database differences. Btw, this list is by no means
complete and if you have any good index based question then feel free to
share with us.
Question 1
What is index in database?
An index is a object in database which help you to retrieve data faster.
Similar to index pages of book, index are stored in separate place than data
and they point to the data. Though it’s not mandatory to have an index in a
table, you often need index for faster retrieval of data using SELECT queries.
You can create index on a column or a set of columns in a given table. The
SQL clause CREATE INDEX is used to create index in a table.
Question 2
What are different types of index in Oracle or SQL Server?
There are mainly two types of indices in any database, clustered and non-
clustered, but if you want to divide on number of columns then you can also
say that you have a index which is based uopn just one column and then you
have a composite index which is based upon a set of columns.
— 106 —
INDEXES
Question 3
How does index work?
When you create an index, there are arranged in a tree structure, so that you
can navigate them in logN time. A data structure like B-Tree is used to store
index but that may vary depending upon the database. Each node in index
tree reference to other node and nodes at the leaf contains pointer to actual
data. When you fire a SELECT query to retrieve some data, SQL query
engine uses this tree to retrieve selective data. Whenever you add or remove
data, this index tree is re-arranged.
Question 4
What is index scan?
When you try to retrieve data from a table which has index but you didn't
provide any condition using WHERE clause then it uses index scan to
search for rows in index pages. For example, if you want all employees from
an employee table e.g.
Then it can use index-scan if you have a clustered index on Employee e.g.
on EmployeeId. It's generally faster than table scan but slower than index
seek.
Question 5
What is index seek?
The index seek is faster than index-scan because instead of scanning
through all index, you directly retrieve data using pointer stored in index.
This is actually the fastest way to retrieve data in a large table. This works
when you have to retrieve 10 to 20% of data e.g. by specifying conditions in
— 107 —
GROKKING THE SQL INTERVIEW
WHERE clause. For example, following query will use index seek, if you
have an index on EmployeeId
You can even see the actual Execution Plan in Microsoft SQL Server
Management Studio by pressing Ctrl + A and then running the queries as
shown below:
Question 6
What is difference between index scan and index seek in database?
The key difference between index scan and index seek is that seek is faster
than index scan. Former is generally used when you retrieve 90% to 100%
data e.g. queries without WHERE clause, while index seek is used when you
selectively retrieve data e.g. by specifying conditions using WHERE or
HAVING clause.
— 108 —
INDEXES
Question 7
What is difference between table scan and index scan?
There are two main ways to retrieve data from a table e.g. by using table scan
or by using index. If your table is small then table scan is probably the fastest
way but its in-efficient for a large table because you have to perform a lot of
IO.
In a table scan, SQL query engine, searches all records to find the matching
rows, while in index-scan it searches through index pages. If you don't have
index in your table then table scan is used, but if you have clustered index
then it is used even when you don't specify any condition in WHERE clause.
For example, select * from Books will use table scan if there is
clustered index in table but will use index-scan if there is a clustered index
in the table.
Question 8
What is difference between Clustered and Non-Clustered index in a
table?
There are two types of indexes in a table, clustered and non-clustered. The
Clustered index specifies the actual physical ordering of record in disk e.g.
if you have a Book table then you can arrange them using title or ISBN
number, if you create clustered index on title then they will be arranged and
stored in disk in that order.
— 109 —
GROKKING THE SQL INTERVIEW
Here is a nice diagram which clearly shows the difference between clustered
and non-clustered or secondary indices in SQL:
Question 9
I have a table which has clustered index on column 1, can I create
another clustered index on column 2?
No, you can only have one clustered index per table, so you cannot create
second one, it's not allowed.
— 110 —
INDEXES
Question 10
What is composite index in SQL?
An index may contain one or multiple columns. If a index contains multiple
columns then it is known as composite index. For example,
This index is based upon three column col1, col2, and col3.
Question 11
Does the order of columns matter in a composite index?
Yes, the order of columns matters in a composite index. For example, if you
have both firstname and lastname column in table and you create index on
(firstname, lastname) then it can also be used in queries when you specify
just one column, for example:
This query will use your composite index but following query will not use it
because the mandatory column firstname is not available:
— 111 —
GROKKING THE SQL INTERVIEW
Question 12
What is the disadvantage of having many indices in a table?
Even though, index make data retrieval fast, especially in large table, you
should be careful with so many index. Since index are re-arrange on every
insert, update, and delete operation, there is a cost associated with them. If
you execute more insert, update, and delete query than select then they will
be slower because the time it will take to re-arrange all those index tree.
Index data structure also take space in your database, so if you have multiple
index tree then obviously more space will be needed.
That's all about some of the frequently asked SQL and database interview
questions based upon Index. As I said, it's very important for every
programmer working with database and SQL to have a good understanding
of index because it directly affects the performance of your application. A
person with good knowledge of index not only write faster SQL queries but
also quickly to diagnose any performance issue with their SQL queries.
— 112 —
CHAPTER 5
GROUP BY
Many queries that require the use of aggregate function like sum(), avg(), or
count() requires the grouping of data using the GROUP BY clause. SQL
queries which involve GROUP BY and HAVING clauses are also a bit
confusing for many programmers who don't have hands-on experience in
SQL and are often used as SQL interview questions to filter.
In this article, we will see some examples of the GROUP BY clause in SQL
which help you to understand where to use group by clause and how to use
GROUP BY along with the SELECT clause. You will also learn some SQL
rules related to the GROUP BY clause which is available in some databases
particularly in MySQL as a group by extensions.
Now it's time to see the GROUP BY clause in action. The following are some
examples of how you can use GROUP BY to aggregate data and then apply
filtering on aggregated or grouped data by using the HAVING clause.
— 113 —
GROKKING THE SQL INTERVIEW
In order to write an SQL query to find the total salary per department, we
need to group by the department and use sum(salary) in the select list
as shown in the following SQL query :
— 114 —
GROUP BY
Then you can further filter records by using having clauses to perform
queries like finding all departments whose total salary expenditure is more
than 3000. Here is an SQL query for that :
This is your most expensive department in terms of salary. It's stood to know
the useful aggregate function like count and sum.
— 115 —
GROKKING THE SQL INTERVIEW
Similarly, you can use having clause to further filter down this result set like
finding a department whose average salary is below 1000. Here is an SQL
query for that :
This is your most poorly paid department. There is no point going into that
department and exploring further.
— 116 —
GROUP BY
One example of this kind of group by the query is how to find the number of
employees per department?
In this case, we need to group by the department and need to use the
count() aggregate function for counting employees as shown in the below
SQL query :
While using the count() function in SQL, it's worth noting the difference
between count() and count(field) which could give different
counts based upon which column on which you are counting and whether
that column contains NULL or not.
Just remember count() also counts NULL values in a column. NULLs are
very special in SQL and you should have a good knowledge of how to use
Null and how to compare them.
— 117 —
GROKKING THE SQL INTERVIEW
— 118 —
GROUP BY
1. You can not use a non-aggregated column name in the select list if it is
not used in Group By clause. For example following SQL query is illegal and
will not run because it has used a non-aggregate column emp_id in the select
list which is not named in Group By clause, in this SQL query only
dept_id is used in group by clause.
— 119 —
GROKKING THE SQL INTERVIEW
3. Another rule of using group by clause in standard SQL is that you cannot
use an alias in the HAVING clause, the previous SQL query is also an
example of this group by rule. This is also allowed in the MySQL database.
NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY';
Query OK, 0 rows affected (0.00 sec)
You can also set MySQL SQL mode for a single client session by using the
word SESSION instead of GLOBAL as shown below :
NO_ENGINE_SUBSTITUTION,ONLY_FULL_GROUP_BY';
Query OK, 0 rows affected (0.00 sec)
— 120 —
GROUP BY
Also just changing MySQL SQL mode to Global will not take effect in the
current session until you restart a new session. In order to enable
ONLY_FULL_GROUP_BY in the current MySQL session using the above
query.
That's all on the GROUP BY clause example in SQL queries. We have seen
where to use the GROUP BY command and How to use the GROUP BY
clause to group data in SQL. As I said GROUP BY and HAVING clause is
one of the must-know for any programmers as it's quite common in the SQL
world and particularly important for reporting purposes.
— 121 —
CHAPTER 6
As the saying goes, "time is money," and this holds true especially in the
realm of database management and querying. In today's competitive job
market, employers often seek candidates with a strong grasp of SQL's date
and time functions, as these skills play a crucial role in handling time-
sensitive data effectively.
In this chapter, we will delve into the essential aspects of SQL date and time
handling, preparing you to confidently tackle interview questions related to
this critical topic.
— 122 —
SQL DATE AND TIME INTERVIEW QUESTIONS
Date and Time Functions: Familiarizing yourself with the wide array of
SQL functions designed to extract, manipulate, and format date and time
values. Some crucial functions include DATEPART, EXTRACT,
DATEADD, DATEDIFF, TO_DATE, TO_CHAR, and many more.
Date and Time Constraints: Exploring the concept of date and time
constraints, including defining constraints on columns to ensure the
validity and integrity of date and time data stored in a database.
Date and Time Formatting: Gaining insights into formatting date and time
values for display purposes, including custom date formats and converting
between different date formats.
— 123 —
GROKKING THE SQL INTERVIEW
Date and Time in Joins and Conditions: Understanding how to use date
and time values effectively in JOIN clauses, WHERE clauses, and other
conditional statements to filter and combine data accurately.
Best Practices: Learning industry best practices for working with dates and
times, optimizing queries, and ensuring efficient date-related operations in
SQL databases.
Now, let’s see a frequently asked SQL questions related to Date and Time:
Question 1
How do you find all records modified between two dates in SQL?
This is one of the tricky SQL questions. It looks simple, but when you go
and write the SQL query, you will find that it's not working as expected. For
example, many programmers will use the "between" SQL clause to find
all the rows modified between two dates e.g.
This will work well in most cases, but if there is an order where order_date
is midnight of 1st October, then it will also be picked. You can be more
precise by using the logical operator, as shown below:
— 124 —
SQL DATE AND TIME INTERVIEW QUESTIONS
Question 2
How do you extract the year from a date in SQL?
You can extract the year from a date using the YEAR() function.
For example:
Question 3
How can you add 3 months to a given date in SQL?
You can add 3 months to a date using the DATEADD() function.
For example:
Question 4
How do you find the number of days between two dates in SQL?
You can find the number of days between two dates using the DATEDIFF()
function.
For example:
Question 5
Explain the difference between DATE and DATETIME data types.
DATE stores only the date without any time information, while
DATETIME stores both the date and the time.
— 125 —
GROKKING THE SQL INTERVIEW
Question 6.
How do you convert a date to a different date format in SQL?
You can use the CONVERT() or FORMAT() function to convert a date to
a different format. For example:
Question 7
How can you find records that fall within a specific month and year?
You can use the MONTH() and YEAR() functions in combination with the
WHERE clause. For example:
Question 8
How do you handle time zone conversions in SQL?
To handle time zone conversions, you can use the AT TIME ZONE clause
(for SQL Server) or the AT TIME ZONE function (for PostgreSQL).
For example:
— 126 —
SQL DATE AND TIME INTERVIEW QUESTIONS
Question 9
Can you enforce a constraint to ensure a date column always contains
future dates?
Yes, you can use a constraint with the CHECK keyword to enforce this.
For example:
Question 10
How can you find the first and last day of the current month?
You can use the EOMONTH() function to find the last day and then
subtract the number of days in the month minus one to find the first day.
For example:
Question 11
How do you find records with overlapping date ranges in SQL?
You can use the BETWEEN operator to check for overlapping date ranges.
For example:
— 127 —
GROKKING THE SQL INTERVIEW
Question 12
How can you calculate the age of a person from their birthdate?
You can use the DATEDIFF() function to calculate the difference in years
between the birthdate and the current date.
For example:
Question 13
Explain the significance of the UNIX timestamp and how to convert
it to a readable date format.
The UNIX timestamp represents the number of seconds that have elapsed
since January 1, 1970 (UTC). To convert it to a readable date format, you
can use the FROM_UNIXTIME() function (for MySQL) or the
TO_TIMESTAMP() function (for PostgreSQL). For example:
Question 14
How do you get the current date and time in SQL?
You can use the GETDATE() or CURRENT_TIMESTAMP function to get
the current date and time. For example:
— 128 —
SQL DATE AND TIME INTERVIEW QUESTIONS
Question 15
How can you convert a string to a date in SQL?
You can use the CAST() or CONVERT() functions to convert a string to a
date.
For example:
Question 16
How do you find the day of the week for a given date in SQL?
You can use the DATEPART() function with the dw parameter to find the
day of the week (Sunday = 1, Monday = 2, etc.). For example:
— 129 —
GROKKING THE SQL INTERVIEW
Question 17
How do you find the number of weekdays (excluding weekends)
between two dates?
You can use a combination of DATEDIFF() and DATEPART() functions
to find the number of weekdays between two dates. For example:
SELECT (DATEDIFF(dd, start_date, end_date) + 1)
- (DATEDIFF(wk, start_date, end_date) * 2)
- CASE
WHEN DATEPART(dw, start_date) = 1
THEN 1
ELSE 0
END
- CASE WHEN DATEPART(dw, end_date) = 7
THEN 1
ELSE 0
END AS weekdays_count
FROM table_name;
Question 18
How do you extract the time part from a DATETIME column in SQL?
You can use the CONVERT() function with a style parameter of 108 to
extract the time part. For example:
SELECT CONVERT(TIME, datetime_column, 108) AS
extracted_time
FROM table_name;
Question 19
How can you find the records with the latest date in a table?
You can use the MAX() function to find the latest date in a table. For example:
— 130 —
SQL DATE AND TIME INTERVIEW QUESTIONS
Question 20
Explain the concept of leap years and how you would identify if a year
is a leap year in SQL.
Leap years are years with an extra day, February 29, to keep the calendar
year synchronized with the astronomical year. In SQL, you can identify a
leap year by checking if the year is divisible by 4, except for years divisible
by 100, which are not leap years unless they are also divisible by 400. For
example:
SELECT
CASE
WHEN (year_column % 4 = 0
AND year_column % 100 != 0) OR (year_column
% 400 = 0)
THEN 'Leap Year'
ELSE 'Not a Leap Year'
END
AS leap_year_status FROM table_name;
These 20 SQL Date and Time Interview Questions cover a wide range of
topics, from basic date and time manipulations to more advanced functions
and constraints. Familiarizing yourself with these concepts and practicing
their implementation will undoubtedly boost your confidence when facing
date and time-related interview questions in SQL. Remember that hands-on
practice with real-world scenarios will reinforce your understanding and
help you excel in SQL interviews. Good luck!
— 131 —
CHAPTER 7
Aggregate Functions
In this chapter, we will explore the world of aggregate functions and delve
into the key topics that form the foundation of effective data summarization
and analysis. Whether you are a database professional, a data analyst, or a
developer, understanding aggregate functions is crucial for gaining valuable
information from databases.
— 132 —
AGGREGATE FUNCTIONS
— 133 —
GROKKING THE SQL INTERVIEW
In the following chapters, we will delve into each of these topics, providing
you with in-depth knowledge and practical insights into aggregate functions
and their significance in modern database management and data analysis.
Let's embark on this journey into the realm of aggregate functions and
uncover their true potential in data-driven decision making!
— 134 —
AGGREGATE FUNCTIONS
Below are 20+ common questions on aggregate functions, along with their
answers:
Question 1
What are aggregate functions in SQL, and what is their purpose?
Aggregate functions are SQL functions used to perform calculations on sets
of data and return a single value as a result. Their purpose is to summarize,
group, or analyze data in queries.
Question 2
List some common aggregate functions in SQL.
Common aggregate functions include SUM, COUNT, AVG, MIN, MAX,
and others.
Question 3
How do you use the COUNT function to count the number of rows in
a table?
To count the number of rows in a table, you can use the COUNT function
with the SELECT statement and specify the column or use COUNT(*) to
count all rows.
Question 4
What is the difference between the COUNT function with DISTINCT
and without DISTINCT?
COUNT function with DISTINCT counts only the distinct (unique) values,
while without DISTINCT, it counts all occurrences, including duplicates.
— 135 —
GROKKING THE SQL INTERVIEW
Question 5
How can you use the SUM function to calculate the total value of a
specific column?
To calculate the total value of a specific column, use the SUM function with
the SELECT statement and specify the column.
Question 6
How do you calculate the average (mean) of a column using the AVG
function?
To calculate the average of a column, use the AVG function with the
SELECT statement and specify the column.
Question 7
Explain the usage of the GROUP BY clause with aggregate functions.
The GROUP BY clause is used to group rows based on specified columns.
When used with aggregate functions, it calculates aggregates for each group.
Question 8
How does the HAVING clause differ from the WHERE clause when
using aggregate functions?
The WHERE clause filters individual rows before grouping, while the
HAVING clause filters the result of the GROUP BY operation.
Question 9
What is the purpose of the ROLLUP operator in aggregate functions?
The ROLLUP operator is used to generate multiple levels of subtotal and
grand total summaries in a single query.
— 136 —
AGGREGATE FUNCTIONS
Question 10
Can you use multiple aggregate functions in a single SQL query?
Yes, you can use multiple aggregate functions in a single query to generate
comprehensive data summaries.
Question 11
How do you handle NULL values when using aggregate functions?
Aggregate functions generally ignore NULL values. You can use the
COALESCE or NULLIF functions to manage NULLs before using aggregate
functions.
Question 12
What is string aggregation, and how can you achieve it in SQL?
String aggregation involves concatenating strings from multiple rows into a
single string. In SQL, you can use functions like GROUP_CONCAT
(MySQL), STRING_AGG (PostgreSQL), or LISTAGG (Oracle) for this
purpose.
Question 13
How do aggregate functions interact with different types of joins?
Aggregate functions can be used in combination with various types of joins
(INNER JOIN, LEFT JOIN, etc.) to analyze and summarize related data.
Question 14
Can you use aggregate functions with window functions?
Yes, aggregate functions can be used in combination with window functions
for advanced data analysis and ranking.
— 137 —
GROKKING THE SQL INTERVIEW
Question 15
How do you optimize the performance of queries using aggregate
functions on large datasets?
To optimize performance, you can use appropriate indexes, minimize data
retrieval, and ensure the use of efficient query plans.
Question 16
What is the purpose of the GROUPING SETS clause in aggregate
functions?
The GROUPING SETS clause allows you to specify multiple grouping sets
within a single query, enabling you to generate different levels of aggregates.
Question 17
How do you apply aggregate functions in real-world scenarios like
sales analysis or financial reporting?
In sales analysis, aggregate functions can be used to calculate total revenue,
average sales, and other metrics. In financial reporting, they can summarize
financial data like expenses, revenue, and profits.
Question 18
How are aggregate functions used in data visualization and reporting
tools?
Aggregate functions play a crucial role in data visualization tools, where they
summarize data for charts, graphs, and dashboards.
Question 19
Can you create custom aggregate functions in SQL?
Some database systems allow you to create custom user-defined aggregate
functions, extending the functionality beyond built-in aggregates.
— 138 —
AGGREGATE FUNCTIONS
Question 20
Explain how you use aggregate functions with NoSQL databases.
NoSQL databases may not support traditional aggregate functions, but they
provide similar capabilities through specialized aggregation mechanisms or
MapReduce paradigms.
— 139 —
CHAPTER 8
Stored Procedure
In this chapter, we will explore the realm of stored procedures and delve
into the key topics that form the foundation of efficient database
management. Whether you are a database professional or a developer,
understanding stored procedures is essential for effective data manipulation
and application development.
— 140 —
STORED PROCEDURE
— 141 —
GROKKING THE SQL INTERVIEW
— 142 —
STORED PROCEDURE
In the following chapters, we will delve into each of these topics, providing
you with in-depth knowledge and practical insights into stored procedures
and their significance in modern database management and application
development. Let's embark on this journey into the realm of stored
procedures and uncover their true potential in the world of databases!
Below are some important questions on stored procedures along with their
answers for interviews:
Question 1
What is a stored procedure?
A stored procedure is a pre-compiled and reusable database object that
contains a group of SQL statements. It is stored in the database and can be
executed by calling its name.
Question 2
What are the advantages of using stored procedures?
Some advantages of using stored procedures include:
— 143 —
GROKKING THE SQL INTERVIEW
Question 3
How do you create a stored procedure in SQL?
To create a stored procedure in SQL, you use the CREATE PROCEDURE
statement, followed by the procedure name, input parameters (if any), and
the SQL statements that define the procedure.
Question 4
How do you call a stored procedure from SQL or an application?
You can call a stored procedure using the EXECUTE or EXEC statement in
SQL, or by invoking it from application code using the appropriate database
library or framework.
Question 5
What are input and output parameters in stored procedures?
Input parameters allow you to pass values to the stored procedure when it
is called. Output parameters allow the procedure to return values back to
the caller.
Question 6
How do you handle errors in a stored procedure?
Errors in stored procedures can be handled using TRY...CATCH blocks (in
SQL Server) or EXCEPTION blocks (in PostgreSQL and other databases) to
catch and handle exceptions gracefully.
Question 7
Can a stored procedure call another stored procedure?
Yes, stored procedures can call other stored procedures, either within the
same database or in different databases.
— 144 —
STORED PROCEDURE
Question 8
What are the different types of parameters that a stored procedure can
have?
A stored procedure can have input parameters, output parameters, and
parameters that serve both as input and output (INOUT).
Question 9
How do you pass multiple values to a stored procedure as a single
parameter?
To pass multiple values as a single parameter, you can use techniques like
passing comma-separated values, XML, or JSON data.
Question 10
What is dynamic SQL, and when is it useful in stored procedures?
Dynamic SQL involves generating and executing SQL statements at runtime
within a stored procedure. It is useful when the structure of the query needs
to change dynamically based on user inputs or conditions.
Question 11
How can you improve the performance of a stored procedure?
Performance of a stored procedure can be improved by optimizing the
underlying SQL queries, using appropriate indexes, and parameterizing
queries to avoid SQL injection.
Question 12
How do you grant permissions to execute a stored procedure?
You can grant execution permissions to a stored procedure using the
GRANT EXECUTE statement to specific users or roles.
— 145 —
GROKKING THE SQL INTERVIEW
Question 13
Can a stored procedure return multiple result sets?
Yes, a stored procedure can return multiple result sets, and they can be
accessed one by one from the application code.
Question 14
What is the difference between a stored procedure and a user-defined
function?
Stored procedures are primarily used for data manipulation and complex
logic, while user-defined functions are used to return scalar values or table-
valued results.
Question 15
How do you debug a stored procedure?
You can use debugging tools provided by the database management system
or print debugging information to identify issues in the stored procedure
code.
Question 16
What are nested stored procedures?
Nested stored procedures are procedures that call other stored procedures
within their logic.
Question 17
How do you view the source code of a stored procedure?
The source code of a stored procedure can usually be viewed using system
catalogs or metadata views specific to the database system.
— 146 —
STORED PROCEDURE
Question 18
Can you use transactions within a stored procedure?
Yes, stored procedures can include transactions to ensure data integrity and
consistency during complex operations.
Question 19
How do you drop a stored procedure?
You can drop a stored procedure using the DROP PROCEDURE
statement.
Question 20
What are the best practices for writing efficient and maintainable
stored procedures?
Some best practices include using meaningful and consistent naming
conventions, modularizing procedures, handling errors, and commenting
the code for clarity.
These questions cover various aspects of stored procedures and their usage
in database management and application development. Preparing for these
questions will help you demonstrate your understanding of stored
procedures and their practical implementation in different scenarios.
— 147 —
CHAPTER 9
In this chapter, we will explore the world of database triggers and views,
uncovering their significance in data management and query optimization.
Whether you are a database administrator, a data analyst, or a developer,
understanding triggers and views is essential for streamlining database
operations and enhancing data accessibility.
— 148 —
TRIGGERS AND VIEWS
— 149 —
GROKKING THE SQL INTERVIEW
Database triggers and views are valuable tools that facilitate data
management, simplify query execution, and automate tasks within a
database system. By mastering the concepts and practices of triggers and
views, you can optimize database performance, enhance data accessibility,
and ensure data integrity in various database applications.
In the following sections, we will delve into each of these topics, providing
you with comprehensive knowledge and practical insights into database
triggers and views. Let's embark on this journey into the realm of
triggers and views and understand their significance in modern database
management!
— 150 —
TRIGGERS AND VIEWS
Below are 20+ common questions on database triggers and views, along
with their answers:
Question 1
What is a database trigger, and what events can trigger its execution?
A database trigger is a special type of stored procedure that automatically
executes in response to specific events, such as INSERT, UPDATE, or
DELETE operations on a table.
Question 2
How do you create a trigger in SQL?
To create a trigger in SQL, you use the CREATE TRIGGER statement,
specifying the trigger name, the trigger timing (BEFORE or AFTER), the
triggering event (INSERT, UPDATE, or DELETE), and the action to be
performed in response to the event.
Question 3
What is the difference between a BEFORE trigger and an AFTER trigger?
A BEFORE trigger executes before the triggering event, allowing you to
modify data before it is actually inserted, updated, or deleted. An AFTER
trigger executes after the triggering event, acting on the data after the change
has taken place.
Question 4
In what scenarios are triggers commonly used?
Triggers are commonly used to enforce data integrity constraints, maintain
audit trails, implement complex business logic, and propagate changes to
related tables.
— 151 —
GROKKING THE SQL INTERVIEW
Question 5
How can you enable or disable a trigger?
You can use the ALTER TRIGGER statement with the ENABLE or
DISABLE option to enable or disable a trigger.
Question 6
What are some potential drawbacks of using triggers?
Triggers can introduce hidden logic that may be harder to track and debug.
Overuse of triggers can lead to performance issues and make the codebase
harder to maintain.
Question 7
How do you create a view in SQL?
To create a view in SQL, you use the CREATE VIEW statement, providing
a name for the view and the SELECT query that defines the view's data.
Question 8
Can you update data through a view? If yes, what are the restrictions?
Yes, you can update data through a view under certain conditions. The view
must be updatable, meaning it must meet specific criteria, such as having a
single table in the FROM clause, not containing aggregate functions, and
not having certain constructs like GROUP BY or HAVING.
Question 9
What is the difference between a regular view and a materialized view?
A regular view is a virtual table that displays the results of a SELECT query.
A materialized view, on the other hand, physically stores the results of a
query, providing faster data retrieval at the cost of potentially stale data.
— 152 —
TRIGGERS AND VIEWS
Question 10
How can you update a materialized view?
Materialized views are updated automatically based on their refresh settings
(e.g., on demand, at specified intervals, or when underlying data changes).
Question 11
What is a nested view, and how does it differ from a regular view?
A nested view is a view that is based on another view. It creates a layered
abstraction over the data, allowing for more complex queries.
Question 12
In what situations would you use an indexed view?
Indexed views are beneficial when you have complex queries that involve
aggregations or joins on large datasets, as they can significantly improve
query performance.
Question 13
How do views contribute to data security and access control?
Views allow you to grant users access to specific columns or rows, hiding
sensitive data and enforcing security at the database level.
Question 14
Can you combine views with joins in a query?
Yes, you can use views in combination with joins to simplify complex
queries and avoid redundancy in the code.
— 153 —
GROKKING THE SQL INTERVIEW
Question 15
How does recursion work in views, and what is its use case?
Recursive views enable hierarchical querying, where a view refers to itself to
traverse hierarchical data structures, such as organizational charts or bill of
materials.
Question 16
What is the impact of updating the underlying tables on views?
Updating the underlying tables may change the data presented by views.
However, if the views are read-only or restrict certain modifications,
changes to the underlying tables will not be allowed.
Question 17
How can you handle updates on non-updatable views?
You can use INSTEAD OF triggers to handle updates on non-updatable
views, specifying the logic to perform the update on the underlying tables.
Question 18
How do views contribute to data abstraction and query simplification?
Views provide a level of data abstraction by allowing users to interact with
the data using a simplified virtual table, without exposing the complexity of
underlying tables.
Question 19
Can you provide an example of a situation where a trigger would be
useful?
One example is maintaining an audit trail. A trigger could automatically
insert a record into an audit table whenever a specific table is updated.
— 154 —
TRIGGERS AND VIEWS
Question 20
How would you create an indexed view to improve query performance?
To create an indexed view, you would define the view and then use the
CREATE INDEX statement to create an index on the view's columns,
optimizing query execution.
These questions cover a wide range of topics related to database triggers and
views. Preparing for these questions will help you gain a comprehensive
understanding of triggers and views and their practical application in
database design and query optimization.
— 155 —
CHAPTER 10
Normalization
— 156 —
NORMALIZATION
— 157 —
GROKKING THE SQL INTERVIEW
— 158 —
NORMALIZATION
In the subsequent sections, we will delve into each of these topics, providing
you with in-depth knowledge and practical insights into database
normalization and data integrity. Let's embark on this journey into the
realm of database normalization and its significance in building reliable and
efficient databases!
Question 1
What is database normalization, and why is it essential in database
design?
Database normalization is the process of organizing data in a database to
reduce redundancy and data anomalies. It ensures data integrity, minimizes
data duplication, and makes the database more efficient and maintainable.
Question 2
Explain the First Normal Form (1NF) and its requirements.
1NF requires that each table cell contains a single value (atomicity) and that
there are no repeating groups or duplicate rows in the table.
Question 3
How does the Second Normal Form (2NF) differ from 1NF, and what
problem does it address?
2NF addresses the issue of partial dependencies by requiring that non-key
attributes be fully dependent on the entire primary key.
— 159 —
GROKKING THE SQL INTERVIEW
Question 4
What is the Third Normal Form (3NF), and what types of dependencies
does it eliminate?
3NF eliminates transitive dependencies, ensuring that non-key attributes
are dependent only on the primary key and not on other non-key attributes.
Question 5
When should you consider moving to Boyce-Codd Normal Form
(BCNF) instead of 3NF?
BCNF should be considered when a table has overlapping candidate keys
and is in 3NF but still contains anomalies due to functional dependencies.
Question 6
What is Fourth Normal Form (4NF), and what problem does it address?
4NF addresses multi-valued dependencies and ensures that each non-key
attribute is dependent only on the entire primary key, not on subsets of the
primary key.
Question 7
How does Fifth Normal Form (5NF) handle join dependencies?
5NF addresses join dependencies, ensuring that there are no redundant
combinations of attributes in separate tables.
Question 8
What is denormalization, and in what situations is it appropriate to
use?
Denormalization involves deliberately introducing redundancy into a
database to improve query performance. It is suitable for read-heavy
applications or situations where query optimization is crucial.
— 160 —
NORMALIZATION
Question 9
How does normalization help in avoiding insertion, update, and
deletion anomalies?
Normalization reduces or eliminates data anomalies by ensuring that data
is stored logically and without redundancy, preventing data inconsistencies
during insert, update, and delete operations.
Question 10
What is the purpose of functional dependencies in normalization?
Functional dependencies define the relationships between attributes in a
table and form the basis for normalization rules.
Question 11
Can you have multiple candidate keys in a table, and how do they
relate to normalization?
Yes, a table can have multiple candidate keys. They are used to determine if
the table is in the desired normal form and are essential for normalization.
Question 12
How does normalization impact database query performance?
Normalization can improve data integrity but may lead to more complex
joins and potentially slower query performance. Denormalization is
sometimes used to enhance query performance.
Question 13
What are the potential drawbacks of denormalization?
Denormalization can lead to data redundancy, making updates and
maintenance more challenging. It can also increase the risk of data
inconsistencies if not managed properly.
— 161 —
GROKKING THE SQL INTERVIEW
Question 14
In what real-world scenarios is normalization crucial for data
integrity?
Normalization is crucial in scenarios such as e-commerce websites (managing
product catalogs), inventory management systems, and customer relationship
management (CRM) databases (tracking customer interactions).
Question 15
How can you decide on the level of normalization required for a
specific database design?
The level of normalization required depends on the specific requirements of
the application. Striking a balance between normalization and
denormalization is crucial to meet performance and data integrity needs.
Question 16
Can you provide an example of a table that violates the First Normal
Form (1NF)?
A table with repeating groups or multiple values in a single cell violates 1NF.
For example:
Question 17
How would you convert the above table into 1NF?
To convert the table into 1NF, we would split the subjects into separate
rows, each with its own student_id and name:
— 162 —
NORMALIZATION
Question 18
Explain the process of achieving Second Normal Form (2NF) in a
table.
To achieve 2NF, you need to identify and remove partial dependencies by
breaking down the table into separate tables for each subset of dependent
attributes.
Question 19
What is a composite key, and when is it used in normalization?
A composite key is a primary key that consists of more than one attribute.
It is used when no single attribute can uniquely identify a row, but a
combination of attributes can.
Question 20
How can normalization and denormalization be balanced in a
database design?
Balancing normalization and denormalization involves understanding the
specific requirements of the application and deciding which tables need to
be normalized for data integrity and which can be denormalized for
performance optimization.
— 163 —
Chapter 11
Transaction
In this chapter, we will delve into the intricacies of database transactions and
explore how they maintain data integrity while supporting concurrent access
to the database. Understanding transaction concepts and management is vital
for database administrators, developers, and anyone dealing with critical data
operations.
— 164 —
TRANSACTION
— 165 —
GROKKING THE SQL INTERVIEW
Below are some common SQL transaction-related questions that are often
asked in interviews:
— 166 —
TRANSACTION
Question 1
What is a transaction in the context of a database?
A transaction in a database is a logical unit of work that consists of one or
more database operations. It is executed as a single, indivisible unit,
ensuring data integrity and consistency.
Question 2
Explain the ACID properties of transactions.
ACID stands for Atomicity, Consistency, Isolation, and Durability:
Question 3
What are the different states that a transaction can go through during
its lifecycle?
A transaction can go through the following states: Active, Partially
Committed, Committed, Failed, and Aborted.
— 167 —
GROKKING THE SQL INTERVIEW
Question 4
How do you begin a transaction in SQL, and how do you end it?
In SQL, you can begin a transaction using the BEGIN TRANSACTION or
START TRANSACTION statement. To end a transaction, you can use the
COMMIT statement to save the changes or the ROLLBACK statement to
discard the changes.
Question 5
What is the difference between an implicit transaction and an explicit
transaction?
An implicit transaction (auto-commit) is automatically started and
committed for each individual SQL statement, while an explicit transaction
(manually controlled) requires the use of BEGIN TRANSACTION,
COMMIT, and ROLLBACK statements.
Question 6
Explain the concept of savepoints in transactions.
Savepoints allow you to create named markers within a transaction,
enabling you to roll back to a specific point within the transaction if needed.
Question 7
What is a deadlock in transactions, and how can it be resolved?
A deadlock occurs when two or more transactions are blocked, each waiting
for a resource held by the other. Deadlocks can be resolved by setting
transaction priorities, using timeouts, or by automatically rolling back one
of the conflicting transactions.
— 168 —
TRANSACTION
Question 8
What are the different isolation levels in transactions, and how do
they affect data consistency and concurrency?
The isolation levels are Read Uncommitted, Read Committed, Repeatable
Read, and Serializable. Each level provides different levels of data consistency
and concurrency control, with higher levels offering stronger isolation but
potentially impacting performance.
Question 9
How can you handle errors within transactions to ensure proper
rollback and recovery?
Error handling within transactions involves using TRY...CATCH blocks (in
SQL Server) or EXCEPTION blocks (in PostgreSQL) to catch and handle
errors gracefully. If an error occurs, the transaction can be rolled back to
avoid leaving the database in an inconsistent state.
Question 10
What is the purpose of a transaction log in a database, and how does
it aid in recovery?
The transaction log records all changes made to the database during a
transaction. It helps in database recovery by allowing the system to roll back
uncommitted changes or roll forward committed changes in case of a
system failure or crash.
Question 11
How can you implement a two-phase commit (2PC) protocol for
coordinating distributed transactions?
The two-phase commit protocol involves a coordinator and multiple
participants (distributed systems). In the first phase, the coordinator asks all
— 169 —
GROKKING THE SQL INTERVIEW
participants if they can commit the transaction. In the second phase, the
coordinator instructs all participants to either commit or abort the
transaction based on the responses received.
Question 12
What are the best practices for designing and managing transactions
effectively?
Some best practices include keeping transactions short and focused, using
appropriate isolation levels, avoiding long-running transactions, and
ensuring proper error handling and rollback procedures.
These questions cover a wide range of topics related to SQL transactions and
transaction management. Preparing for these questions will help you gain a
thorough understanding of how transactions work, their importance in
maintaining data integrity, and how to handle various transaction-related
scenarios.
— 170 —
CHAPTER 12
Window Function
and CTE
— 171 —
GROKKING THE SQL INTERVIEW
— 172 —
WINDOW FUNCTION AND CTE
11. Nested CTEs: Learning how to nest CTEs inside each other to build
modular and reusable query components.
SQL Window Functions and CTEs are indispensable tools for data analysts,
database administrators, and anyone involved in querying and analysing
data. With these advanced SQL features in your toolkit, you can elevate your
data analysis capabilities, gain deeper insights from your data, and craft
sophisticated queries with ease.
Let's embark on this journey into the world of SQL Window Functions and
CTEs, unlocking the full potential of your SQL querying skills!
— 173 —
GROKKING THE SQL INTERVIEW
Below are 20+ SQL questions covering the topics of Window Functions and
Common Table Expressions (CTEs), along with their answers:
Question 1
What are Window Functions in SQL, and how are they different from
regular aggregate functions?
Window Functions perform calculations across rows within a defined
window or partition of the result set, without collapsing the result into a
single row. Unlike regular aggregate functions, Window Functions preserve
individual rows and return results alongside the original data.
Question 2
Explain the PARTITION BY clause in Window Functions with an
example.
The PARTITION BY clause divides the result set into partitions or groups,
and the Window Function operates on each partition independently.
For example:
In this query, we use the RANK() function with the OVER clause to assign
a rank to each employee within their respective department based on their
salary in descending order. The PARTITION BY department clause
divides the result set into partitions based on the department column, and
— 174 —
WINDOW FUNCTION AND CTE
then the ORDER BY salary DESC specifies the order in which salaries are
ranked within each department. The result will display the employee_id,
department, salary, and salary_rank for each employee.
Question 3
How do you calculate the cumulative salary for each department using
Window Functions?
You can use the SUM() function with the PARTITION BY clause to
calculate the cumulative salary within each department. For example:
In this query, the SUM() function is used with the OVER clause to calculate
the cumulative salary for each employee within their respective department.
The PARTITION BY department clause divides the result set into
partitions based on the department column, and the ORDER BY
employee_id specifies the order in which the cumulative sum is calculated.
The result will display the employee_id, department, salary, and
cumulative_salary for each employee, showing the running total of salaries
within each department.
Question 4
What is the purpose of the ORDER BY clause in Window Functions?
The ORDER BY clause within Window Functions defines the sorting order
of rows in the window frame. It is crucial to ensure consistent and
meaningful results for Window Functions.
— 175 —
GROKKING THE SQL INTERVIEW
Question 5
How can you find the percentage of total sales for each product using
Window Functions?
You can use the SUM() function with the OVER() clause to calculate the
total sales and then divide individual sales by the total to get the percentage.
For example:
In this query, the SUM() function with the OVER clause is used to calculate
the total sum of sales across the entire sales_table. Then, the
percentage_of_total_sales is calculated for each product by dividing its
individual sales by the total sum of sales and multiplying by 100. The result
will display the product_id, sales, and percentage_of_total_sales for each
product, showing its contribution as a percentage of the total sales.
Question 6
What is the difference between ROWS BETWEEN and RANGE
BETWEEN clauses in Window Functions?
The ROWS BETWEEN clause defines the frame based on the number of
rows, whereas the RANGE BETWEEN clauses defines the frame based on
the actual values within the window.
— 176 —
WINDOW FUNCTION AND CTE
Question 7
How do you find the lead and lag values for a column using Window
Functions?
You can use the LEAD() and LAG() functions to access the subsequent and
previous values, respectively, for a given column.
For example:
In this query, the LEAD() and LAG() functions with the OVER clause are
used to retrieve the next and previous salary values for each employee,
respectively. The ORDER BY salary clause specifies the order in which the
values are retrieved. The result will display the employee_id, name, salary,
next_salary, and previous_salary for each employee, showing the adjacent
salaries in ascending order.
Question 8
Explain the concept of Recursive CTEs in SQL.
Recursive CTEs allow you to perform recursive queries, where a query refers
to its own result set in a subsequent iteration. It is commonly used for
hierarchical data, such as organizational structures or nested categories.
— 177 —
GROKKING THE SQL INTERVIEW
Question 9
How do you use a Recursive CTE to traverse a hierarchical table?
You need to define two parts in the CTE: the anchor member that selects
the starting point of recursion, and the recursive member that joins the
result of the previous iteration.
For example:
— 178 —
WINDOW FUNCTION AND CTE
Question 10
How can you use a CTE to simplify a complex query?
CTEs can modularize complex queries by breaking them down into more
manageable components. They make the code more readable, maintainable,
and reusable.
Question 11
When would you use a CTE instead of a subquery?
CTEs are preferable over subqueries when you need to use the same
subquery multiple times within a larger query, as CTEs eliminate the need
to repeat the subquery.
Question 12
How can you use multiple CTEs in a single query?
You can define multiple CTEs separated by commas before the main query.
Each CTE should have a unique name. For example:
— 179 —
GROKKING THE SQL INTERVIEW
Question 13
Can you use a CTE inside another CTE?
Yes, you can nest CTEs inside each other to build more modular and
reusable query components.
Question 14
How do you perform aggregation on CTEs?
You can perform aggregation on the result of a CTE just like any other table.
For example:
The final result will display the product_id and the corresponding
total_sales for each product, providing a summary of the sales data by
product.
— 180 —
WINDOW FUNCTION AND CTE
Question 15
How do you calculate the running total for a column using Window
Functions and CTEs?
You can use the SUM() function with the OVER() clause inside a CTE to
calculate the running total. For example:
Question 16
How can you use a Window Function with a PARTITION BY clause
to find the highest salary in each department?
You can use the PARTITION BY clause along with the MAX() function to
find the highest salary within each department.
— 181 —
GROKKING THE SQL INTERVIEW
For example:
In this query, we use the MAX() function with the OVER clause to calculate
the highest salary within each department using the PARTITION BY
clause. The OVER (PARTITION BY department) part divides the result
set into partitions based on the department column, and then the
MAX(salary) function is applied within each partition to get the highest
salary for that specific department. The result will show each employee's
employee_id, department, salary, and the highest salary within their
respective department as highest_salary.
Question 17
How do you calculate the percentage change in sales compared to the
previous month using Window Functions?
You can use the LAG() function to access the previous month's sales and
then calculate the percentage change.
For example:
— 182 —
WINDOW FUNCTION AND CTE
In this query, we use the LAG() function with the OVER clause to calculate
the percentage change in sales from one month to the next. The LAG(sales)
OVER (ORDER BY month) part fetches the sales value of the previous row
based on the month order, and then the percentage change is calculated as
(sales - previous_sales) / previous_sales * 100.
The result will show each month, the corresponding sales, and the
percentage_change, which represents the percentage change in sales
compared to the previous month.
Question 18
How can you use the RANK() function to find the top N employees
with the highest salary?
You can use the RANK() function along with a Common Table Expression
(CTE) to find the top N employees with the highest salary. Here's how you
can do it:
— 183 —
GROKKING THE SQL INTERVIEW
In this query:
The result will display the details of the top N employees with the highest
salary, based on the specified value of N.
Question 19
Explain the concept of a "rolling sum" using Window Functions.
A "rolling sum" calculates the sum of a column for a specific range of rows,
usually with an ORDER BY clause to define the order of rows. It is useful
for time series analysis or calculating moving averages.
Question 20
How do you use the ROWS BETWEEN clause to calculate a "rolling
sum"?
You can use the ROWS BETWEEN clause with ORDER BY to define the
range of rows for the "rolling sum." For example, to calculate a sum of the
last three rows:
— 184 —
WINDOW FUNCTION AND CTE
In this query, the SUM() function with the OVER clause is used to calculate
the rolling sum of sales over a window of three rows: the current row and
the two preceding rows, ordered by the date column. The result will display
the date, sales, and rolling_sum for each row in the sales_data table,
showing the cumulative sum of sales over the specified window.
Question 21
What are the performance considerations when using Window
Functions and CTEs in SQL?
Using Window Functions and CTEs can impact query performance,
especially when dealing with large datasets. It's essential to consider
appropriate indexing and analyze query execution plans to optimize
performance.
Question 22
How can you use Window Functions and CTEs together in a single
query to perform advanced analytical calculations?
You can use CTEs to prepare the data and then apply Window Functions
on the CTE result set to perform complex calculations. For example:
— 185 —
GROKKING THE SQL INTERVIEW
The SUM() function with the OVER clause is used to calculate the total sum
of sales across the entire cte_sales result set.
Question 23
How do you use Window Functions to find the top N products with
the highest sales, considering ties?
You can use the DENSE_RANK() function to handle ties and rank the
products accordingly. For example:
— 186 —
WINDOW FUNCTION AND CTE
In this query:
Question 24
How can you calculate the average salary for each department while
excluding the highest and lowest salaries using Window Functions?
You can use the RANK() function with PARTITION BY to rank salaries
within each department and then filter out the highest and lowest ranks in
a subquery. For example:
— 187 —
GROKKING THE SQL INTERVIEW
In this query:
The result will display the department and the average salary for
departments where employees' salaries are not the highest or the lowest
within their respective departments.
Question 25
How do you find the top N products with the highest sales for each
month using Window Functions?
You can use the PARTITION BY clause with the RANK() function to rank
products within each month and then filter the results using a subquery.
For example:
— 188 —
WINDOW FUNCTION AND CTE
In this query:
Replace N with the desired number of top products you want to retrieve for
each month.
Question 26
What are the potential use cases for Recursive CTEs, and how do they
work?
Recursive CTEs are useful for working with hierarchical data, such as
organizational charts, bill of materials, or nested categories. They consist of
two parts: the anchor member and the recursive member, and they
repeatedly execute the recursive member until it returns no rows.
— 189 —
GROKKING THE SQL INTERVIEW
Question 27
How can you use a Recursive CTE to traverse a hierarchical table with
unlimited nesting levels?
You can use the UNION ALL operator in the recursive member to
repeatedly join the CTE with itself.
For example:
In this query:
— 190 —
WINDOW FUNCTION AND CTE
Question 28
How can you use the SUM() function with PARTITION BY and
ORDER BY to calculate the moving average of sales for each product?
You can use the SUM() function with PARTITION BY to calculate the sum
of sales within each product and then use the ORDER BY clause to specify
the order of sales.
For example:
— 191 —
GROKKING THE SQL INTERVIEW
In this query:
Question 29
How do you use a CTE to simplify a query that involves multiple
subqueries?
You can create CTEs for each subquery and then reference them in the main
query, making the code more concise and easier to understand.
— 192 —
WINDOW FUNCTION AND CTE
Question 30
How can you use the ROWS UNBOUNDED PRECEDING and ROWS
UNBOUNDED FOLLOWING clauses in Window Functions?
The ROWS UNBOUNDED PRECEDING and ROWS UNBOUNDED
FOLLOWING clauses in Window Functions allow you to specify an
unbounded window frame that includes all rows from the beginning
(preceding) or end (following) of the partition, respectively.
For example, let's say you want to calculate the running total of sales for each
product, considering all previous sales.
You can use the SUM() function with the ORDER BY clause and the
unbounded window frame to achieve this:
— 193 —
GROKKING THE SQL INTERVIEW
Question 31
How do you use a CTE to simplify a complex hierarchical query for an
organizational chart?
You can create a Recursive CTE to traverse the organizational chart and
generate a result set that shows the hierarchical relationships in a more
readable and structured format.
Question 32
What is the difference between the DENSE_RANK() and RANK()
functions in Window Functions?
The DENSE_RANK() function assigns a unique rank to each distinct row,
leaving no gaps in the ranking sequence. On the other hand, the RANK()
function may leave gaps in the ranking sequence when there are ties.
Question 33
How can you use a Window Function to find the percentage of sales
contributed by each product compared to the total sales?
You can use the SUM() function with the OVER() clause to calculate the
total sales and then divide the individual sales by the total to get the
percentage.
For example:
— 194 —
WINDOW FUNCTION AND CTE
In this query:
Question 34
Can you use Window Functions without the PARTITION BY clause?
If yes, what will be the result?
Yes, you can use Window Functions without the PARTITION BY clause.
In this case, the function will treat the entire result set as a single partition,
and the calculation will be performed on the entire set of rows.
Question 35
How can you use the RANK() function with the PARTITION BY
clause to find the top N employees with the highest salary within each
department?
You can use the RANK() function with the PARTITION BY clause to rank
employees based on salary within each department and then filter the results
using a subquery.
— 195 —
GROKKING THE SQL INTERVIEW
For example:
In this query:
Replace N with the desired number to retrieve the top N employees within
each department based on their salary ranks.
— 196 —
WINDOW FUNCTION AND CTE
Question 36
How do you use a Recursive CTE to calculate the total cost of a bill of
materials for a product and all its components?
You can create a Recursive CTE that repeatedly joins the product table with
the component table until there are no more components, and then
calculate the total cost using the SUM() function.
Question 37
What are some common performance optimization techniques for
queries involving Window Functions and CTEs?
Some common performance optimization techniques include proper
indexing, limiting the result set using WHERE clauses before applying
Window Functions, and avoiding unnecessary ordering when not required.
Question 38
How can you use the FIRST_VALUE() and LAST_VALUE() functions
in Window Functions to retrieve the first and last values in a
partition?
The FIRST_VALUE() function returns the first value in the window frame,
while the LAST_VALUE() function returns the last value in the window
frame.
For example:
— 197 —
GROKKING THE SQL INTERVIEW
The query fetches sales data from the sales_table. It calculates two things
for each row:
1. first_sale: The first sale value for each product, ordered by sale date.
2. last_sale: The latest sale value for each product, ordered by sale date.
The result shows product_id, sale_date, sales, first_sale, and last_sale
columns. This gives insight into the first and last sales values for each
product over time.
Question 39
How do you use Window Functions to calculate the difference
between the sales of each product and the average sales of all products?
You can use the AVG() function with the OVER() clause to calculate the
average sales of all products and then subtract the individual sales to find
the difference.
For example:
— 198 —
WINDOW FUNCTION AND CTE
In this query:
Question 40
How can you use Window Functions to calculate the difference in
sales between the current and previous months for each product?
You can use the LAG() function to access the previous month's sales and
then calculate the difference.
For example:
— 199 —
GROKKING THE SQL INTERVIEW
In this query:
1. The LAG() function with the OVER clause is used to calculate the
difference between the sales of the current row and the previous row
within each partition of product_id, ordered by sale_date.
2. The result will display the product_id, sale_date, sales, and
sales_difference columns for each row in the sales_table, showing
the difference in sales between the current row and the previous row
for each product.
These 40+ SQL questions covering Window Functions and Common Table
Expressions (CTEs) will help you build a strong understanding of these
advanced SQL concepts. Practicing these questions will equip you to tackle
interview challenges, analyze data with precision, and optimize your data
manipulation skills. Remember to combine theoretical knowledge with
hands-on practice to become proficient in using these powerful SQL
features effectively.
— 200 —
CHAPTER 13
Deep Dive
I n this section will do a deep dive on popular SQL and Database related
interview questions to understand the underlying concept in detail? This
list include all kind of SQL questions like SQL query, questions based upon
JOIN, Date and Time, and other important SQL Concepts as well questions
form MySQL, Oracle, SQL Server, and PostgreSQL
For example, if you are ranking employees by their salaries then what would
be the rank of two employees of the same salaries? It depends on which
ranking function you are using like row_number, rank, or dense_rank.
— 201 —
GROKKING THE SQL INTERVIEW
employees Shane and Rick have the same salary and has row number 4 and
5, this is random, if you run again, Shane might come 5th.
The rank() and dense_rank() will give the same ranking to rows
that cannot be distinguished by the order by clause, but dense_rank will
always generate a contiguous sequence of ranks like (1,2,3,...), whereas
rank() will leave gaps after two or more rows with the same rank (think
"Olympic Games": if two athletes win the gold medal, there is no second
place, only third).
You can see that we have included two employees with the same salaries i.e.
Shane and Rick, just to demonstrate the difference between row_number,
rank, and dense_rank window function in the SQL server, which is
obvious when there are ties in the ranking.
— 202 —
DEEP DIVE
ROW_NUMBER() Example
It always generates a unique value for each row, even if they are the same
and the ORDER BY clause cannot distinguish between them. That's why it
is used to solve problems like the second-highest salary or nth highest salary,
we have seen earlier.
In the following example, we have two employees with the same salary and
even though we have generated row numbers over the salary column it
produces different row numbers for those two employees with the same salary.
You can see in this example that we have ranked employees based upon their
salaries and each of them has a unique rank even if their salaries are the same
e.g. Shane and Rick have the same salary of 3000 but they got the unique
rank 4th and 5th. It's worth knowing that in the case of a tie, ranks are
assigned on a random basis.
RANK() Example
The rank() function will assign the same rank to the same values i.e. which
are not distinguishable by ORDER BY. Also, the next different rank will not
start from immediately next number but there will be a gap i.e. if 4th and 5th
employees have the same salary then they will have the same rank 4, and 6th
employee which has a different salary will have a new rank 6.
— 203 —
GROKKING THE SQL INTERVIEW
You can see that both Shane and Rick have got the same rank 4th, but the
Sid got the rank 6th, instead of 5 because it keeps the original ordering.
DENSE_RANK() Example
The dense_rank function is similar to the rank() window function i.e.
same values will be assigned the same rank, but the next different value will
have a rank which is just one more than the previous rank, i.e. if 4th and 5th
employee has the same salary then they will have the same rank but 6th
employee, which has different salary will have rank 5, unlike rank 6 as is the
case with rank() function. There will be no gap in ranking in the case of
dense_rank() as shown in the following example:
— 204 —
DEEP DIVE
You can see that both Shane and Rick have the same ranking 4th, but Sid
now has 5th rank which is different than 6th in the earlier example when we
used the rank() function.
select e.*,
row_number() over (order by salary desc) row_number,
rank() over (order by salary desc) rank,
dense_rank() over (order by salary desc) as
dense_rank
from #Employee e
And here is the output which clearly shows the difference in the ranking
generated by rank() and dense_rank() function. This will clear your
doubt about rank, desnse_rank, and row_nubmer function.
— 205 —
GROKKING THE SQL INTERVIEW
You can see the employees Shane and Rick have the same salary 3000 hence
their ranking is the same when you use the rank() and dense_rank()
but the next ranking is 6 which is as per continuous ranking using rank()
and 5 when you use dense_rank(). The row_number() doesn't break
ties and always gives a unique number to each record.
Btw, I ran all three SQL queries on Oracle 11g R2 and, Oracle 12c and it gave
me the same result. So, it seems both Oracle and SQL Server support these
functions and they behave identically.
In the case of dense_rank, both rows in the tie will have the same rank and
there will be no gap. The next different rank will be in sequence.
For example, in the case of VARCHAR, each character takes 1 byte but in
the case of NVARCHAR, each character takes 2 bytes of storage, which
means NVARACHAR is twice as expensive as VARCHAR type.
While we will look into the difference between these two, it's also worth
noting the similarities between them. For example, both VARCHAR and
— 206 —
DEEP DIVE
NVARCHAR are character data types and used to store text or String values.
Both are also variable-length data types, so storage size depending upon
actual data stored.
— 207 —
GROKKING THE SQL INTERVIEW
That's all about the difference between VARCHAR and NVARCHAR data
types in SQL. You should always use the data type that will take less space.
In SQL Server NVARCHAR takes more space than VARCHAR data type,
almost 2x as much space as VARCHAR.
You should use VARCHAR if you know that all your data would be in ASCII
encoding, but if you are going to store Unicode string, like storing data from
different languages. You need to use NVARCHAR to support Unicode data.
NVARCHAR is a must if you intend to support internationalization (i18n)
Well, let's first understand the difference between SQL, T-SQL, and PL/SQL,
and then we will understand the need for these dialects. SQL is standard for
querying, inserting, and modifying data in a relational database. It is
categorized into DDL and DML and is powerful enough to create database
objects e.g. table, view, stored procedure, and can perform CRUD operation
(SELECT, INSERT, UPDATE, and DELETE) query.
— 208 —
DEEP DIVE
In this article, we will see a couple of more differences between SQL, T-SQL,
and PL/SQL to understand them better.
In order to make SQL more powerful and to expand its usage from simple
querying to create complex stored procedures for report generation, XSLT
transformation, and many other functionalities, various database vendors
started adding proprietary features on SQL supported by their platform. These
efforts created different SQL dialects e.g. T-SQL, which is a SQL dialect for
Microsoft SQL Server, and Sybase, PL/SQL which is a SQL dialect for Oracle.
In fact, every database has its own SQL dialect, which comprises features
and keywords only supported in their database e.g. MySQL has the LIMIT
keyword which can be used for pagination or solving problems like second
highest salary, but it will not work on Oracle or Microsoft SQL Server
database. Similarly, PostgreSQL has some features which are not available
to other databases.
It's always recommended to use standard ANSI SQL if it serves your purpose
because query written in ANSI SQL is portable across different database
vendors but if you use a proprietary keyword e.g. TOP in Microsoft SQL
— 209 —
GROKKING THE SQL INTERVIEW
Server, LIMIT in MySQL then you need to change your query when your
application migrate from one database to another.
1. Full form
SQL stands for Structured Query language, T-SQL stands for Transact-SQL
and PL/SQL stands for Procedural Language/SQL.
2. Supported Database
SQL is supported across all database vendors like Oracle, SQL Server,
MySQL, PostgreSQL, IBM DB2, and even lightweight databases like
SQLLite, but T-SQL is only supported in Microsoft SQL Server and Sybase,
and PL/SQL is supported only in Oracle.
3. Performance
Another key difference between SQL and PL/SQL, T-SQL is the
performance improvement by saving database roundtrip. Both PL/SQL and
T-SQL allow grouping of SQL statements which means if your code has 4
SELECT SQL queries then instead of making four round trips to the
database, they can be sent as one single unit to the database and their result
will also come back as one unit.
— 210 —
DEEP DIVE
The output of the query is a single row with attributes resulting from the
expression with names assigned using the aliases e.g.
Language Rank
Java 1
That's all on the difference between SQL, T-SQL, and PL/SQL. Just
remember that both T-SQL and PL/SQL are dialects of SQL, which is the
standard specified by ANSI for managing data in relational databases. T-
SQL is only supported in Sybase and SQL SERVER, while PL/SQL is only
supported in the Oracle database. Though both T-SQL and PL/SQL are
more powerful than SQL and provide several languages construct to do
more with database e.g. conditionals, loops, branching, etc.
— 211 —
GROKKING THE SQL INTERVIEW
Using = or != is perfectly fine if your column has NOT NULL constraint and
you know for sure that there are no NULL values in that column, but it does
contain NULLs then your SQL query will return the incorrect result at times.
This is one of the most common mistakes but at the same time hard to find
SQL bugs if it managed to get into the real environment. In this article, you
will learn the right way to check NULL values in SQL queries using IS NULL
and IS NOT NULL predicates.
Now, the question is, how many records the following query will return?
— 212 —
DEEP DIVE
Many SQL programmers will answer that it will return 2 records, which is
wrong. This query will only return one record, the row with Id=2 as shown
below:
Why? Why the row with NULL Id was was not returned? Because when you
compare something with NULL the result is Unknown, not true or false.
SQL uses three value logic, true, false, and unknown.
In order to check for NULL values, you must use IS NULL or IS NOT NULL
clause. For example, to include the row with Id as NULL, you can modify
your SQL query like
Output
id
2
NULL
You can see that it returned both rows. Remember even comparing NULL
to NULL doesn't work, it also returns unknown e.g. if you try to write the
above query using = operator it will not work as shown below:
You can see it just return 1 row, the row with Id=NULL was not included
again. Always remember null != null in SQL whether it's Oracle, MySQL,
PostgreSQL, or Microsoft SQL Server.
— 213 —
GROKKING THE SQL INTERVIEW
How to test for not null values in SQL? IS NOT NULL Example
Similarly to test for values that are not null, instead of using the!= operator
use IS NOT NULL operator. For example, the following query which we
have written to return all the rows where Id is NOT NULL will not work
because we are using != operator
That's all about the right way to check for NULL values in the WHERE clause
in the SQL query. Don't use = or != operator to compare values if your column
allows NULLs, it may not return what you expect because comparing NULL
with anything else returns Unknown. Instead, you should always use IS NULL
and IS NOT NULL to check for null values in SQL queries.
— 214 —
DEEP DIVE
By using the PARSE function, you can also indicate the culture by using any
culture supported by Microsoft's dot NET framework. For example,
PARSE('7/8/2015' AS DATE USING 'en-US') parses the input
literal as a DATE by using the United States English Culture, similar to 101
formatting style.
— 215 —
GROKKING THE SQL INTERVIEW
If culture is not specified then it will use the current session's effective
language.
4. Syntax
Using CAST:
Using CONVERT:
Using PARSE
5. Examples
Let's see some examples to convert DATE to VARCHAR in Microsoft SQL
Server using the cast(), convert(), and parse function.
— 216 —
DEEP DIVE
Today
Apr 25 2017 6:32AM
— 217 —
GROKKING THE SQL INTERVIEW
Here is the screenshot of executing SQL queries with CAST and Convert in
SQL Server Management Studio:
CurrentDate
2017-12-25 00:00:00.0000000
Price
345.98
— 218 —
DEEP DIVE
You can see that the amount of currency value Euro is parsed correctly
because of German culture, but if you try to change the currency symbol to
$ it will not parse and give you an error as shown below:
But as soon as you change the culture to en-US it will be able to parse the
currency value correctly, but do note down the actual value which is very
different from the german one, that's where culture can make a big difference.
Price
34598.00
Here is one more example of using the PARSE function in SQL Server to
parse String using implicit language setting
Output
2017-04-16 00:00:00.0000000
— 219 —
GROKKING THE SQL INTERVIEW
Here is the screenshot of executing PARSE related SQL queries on the SSMS
tool:
That's all about the difference between CAST, CONVERT, and PARSE in
SQL SERVER. Prefer CAST over CONVERT and PARSE because it's ANSI
standard and your query will be more portable across different database
vendors. I generally prefer CAST for casting between VARCHAR and
NUMERIC type, but I prefer to use CONVERT for converting String literals
into DATE, TIME, and DATETIME types. I don't use PARSE, but it is
something good to know about it.
— 220 —
DEEP DIVE
Another difference between them is that UNION ALL is faster than UNION
but may look slow because it returns more data which takes more time to
travel via the network. The difference between UNION and UNION ALL
can be a tricky SQL question, especially for developers, who have not used
this useful keyword ever.
Since the UNION clause is not as common as a SELECT clause in SQL, it's
usually asked in a telephonic round of programming interviews to check
whether the candidate is comfortable with SQL or not. It's in the same league
of questions like clustered vs non-clustered index or primary vs unique key.
UNION is very different than other SQL commands because it operates on
data rather than columns.
Anyway, the answer to this question is simple, though both UNION and
UNION ALL are used to combine the result of two separate SQL queries on
the same or different table, UNION does not keep a duplicate record (a row
is considered duplicate if the value of all columns is same), while UNION
ALL does.
Since you mostly don't want duplicate rows, UNION is preferred over
UNION ALL in reporting and application development. By the way, you
should keep in mind that UNION ALL performance better than UNION
because it doesn't have to remove duplicates, so no extra work.
This keyword is very well supported by all major databases like Oracle,
Microsoft SQL Server, MySQL, and PostgreSQL. Another thing to keep in
mind is the amount of data returned by UNION ALL; if your database server
is quite far away and you have limited bandwidth, UNION ALL may appear
slower than UNION because of the number of duplicates it returned.
The cost of transferring duplicate rows can exceed the query execution
benefits in many cases. We will see a couple of examples UNION and
— 221 —
GROKKING THE SQL INTERVIEW
UNION ALL in SQL, but before that few things to keep in mind. In order to
combine the results of two queries, they must contain the same number of
columns.
For example, if one query contains 3 columns and the other contains 4
columns then you cannot use UNION or UNION ALL. This is because a row
will only be considered duplicated when all columns will have the same
value, irrespective of the name of the columns themselves.
If you look at the result of the first two select queries, you will see that the
first query returns two rows and the second query returns three rows, where
two rows have exactly the same data as the first query.
Key things to note is that column names are different in both result sets, first
one has emp_id and emp_name, while second data set has customer_id
and customer_name, but most important both dataset has only two
columns. This is a must in order to combine them using UNION and UNION
ALL keywords.
The third query is an example of how to use the UNION clause in SQL, you
can see that the combined result has just three columns, all are unique.
— 222 —
DEEP DIVE
Duplicate columns from the second result set were not included. This is
more like how you do UNION in Set theory, where the final result contains
data from both sets.
The fourth query is how you should use UNION ALL, it contains five rows,
two from the first query and three from the second query. It has not
removed duplicate rows from the second query, that's why you see Ken and
Bob repeating twice.
This example teaches us the core concept that the UNION doesn't depend
upon the column name but the data. You can combine the result of as many
queries as possible until the number of columns in all of them is the same
and the data is from the same set.
Regarding performance, you need to run UNION and UNION ALL with a
large database, containing millions of rows. There you can monitor how
much time both takes and compare them.
Theoretically, UNION ALL should take less time to execute but more time
to transfer data to the client.
— 223 —
GROKKING THE SQL INTERVIEW
— 224 —
DEEP DIVE
1. Combining Results
Both UNION and UNION ALL are used to combine the results of two separate
SQL queries, it could be on the same table or a different table but the data
should be the same.
2. Duplicates
The key difference between UNION and UNION ALL is that the former will
remove duplicates but later will keep them. In other words, UNION is equal
to running distinct on the output of UNION ALL.
3. Execution time
Due to the above difference query execution time of UNION ALL is smaller
than UNION, which means the former runs faster than the latter. So if you
want faster output and don't care about duplicates use UNION ALL.
This is something you can also guess from your existing SQL knowledge and
that's where working on fundamentals pays off.
— 225 —
GROKKING THE SQL INTERVIEW
That's why in practice some time UNION ALL appears slower than UNION
because it returns a lot of data with duplicates which require more time to
travel from database server to client machine. To evaluate the performance
of UNION and UNION ALL case by case.
For example, if the result of query 1 has three columns and the result of
query 2 has two columns then you cannot combine them using the UNION
command.
That's all on the difference between the UNION and UNION ALL
command in SQL. It's one of the useful commands to combine the result of
two SELECT queries when they contain the same data. There are many
practical scenarios where UNION is very useful, for example when you need
to create a list out of different tables containing data from the same set.
The main difference between UNION and UNION ALL is about duplicates,
the former removes it while later keeps it, other differences between them
on performance and networking bandwidth usage can be easily derived by
knowing this difference. Also keep in mind that it is well supported big three
databases like MySQL, Oracle, and SQL Server. Let us know if you have been
asked this question in your SQL interview.
— 226 —
DEEP DIVE
How the database does chooses which scan or seek to use? And how you can
optimize the performance of your SQL SELECT queries by using this
knowledge. In general, there are only two ways in which your query engine
retrieves the data, using a table scan or by using an index.
Which method is used for your particular query depends upon what indexes
are available in that table, what columns you are requesting in your query,
the kind of joins you are doing, and the size of your tables.
If you have a clear understanding of how the index works and how the SQL
engine retrieves data from the disk then you can quickly identify performance
problems and solve them. That's where most of the SQL developers, especially
the Java application developer who write queries and design databases lack.
Btw, if you are not familiar with what is an index and how to create and drop
an index, then I suggest you first go through these free SQL and Database
courses to learn those basics.
— 227 —
GROKKING THE SQL INTERVIEW
You might have heard nasty things about table scans but in truth, it's the fastest
way to retrieve data especially if your table is quite small. It starts being bad
when your table starts growing. You can imagine doing a full table scan in a
table with 4 million rows and a full table scan in a table with just 100 rows.
In a small table, a query engine can load all data in just one shot but in a
large table, it's not possible, which means more IO and more time to process
those data.
Normally, a full table scan is used when your query doesn't have a WHERE
clause, I mean, you want more or less every record from a table like the
following query will use a full table scan:
Btw, if your query is taking too long in a large table then most likely it using
either table scan or index scan. You can see that by enabling an execution
plan like by doing Ctrl + A in Microsoft SQL Server Management Studio
— 228 —
DEEP DIVE
Once the right index is chosen, the SQL Query processor or engine navigates
the tree structure to the point of data that matches your criteria and again
extracts only the records it needs
The main difference between a full table scan and an index scan is that
because data is sorted in the index tree, the query engine knows when it has
reached the end of the current it is looking for. It can then send the query,
or move on to the next range of data as necessary.
For example, the following query, same as above will use Index scan if you
have a clustered index in your table:
This is slightly faster than the table scan but considerably slower than an
index seek which we'll see in the next section.
— 229 —
GROKKING THE SQL INTERVIEW
seek. It is the fastest way to retrieve data in a database. The index seeks are
also a great sign that your indexes are being properly used.
This happens when you specify a condition in WHERE clause like searching
an employee by id or name if you have a respective index.
For example, the following query will use an index seek, you can also
confirm that by checking the execution plan of this query when you run this
on SQL server:
In this case, the Query Optimizer can use an index to directly go to the third
employee and retrieve the data. If you look at the execution plan shown
below, you can see that it uses an index seek using the index created on
EmployeeId.
1. A table scan and an index scan are used when you need to retrieve
all data like 90% to 100% while index seek is used when you need to
retrieve data based upon some conditions like 10% of data.
2. If your query doesn't have a WHERE clause and your table doesn't
have a clustered index then a full table scan is used, if it does have a
clustered index then an index scan is used.
3. Index scan is faster than a table scan because they look at sorted data
and query optimizers know when to stop and look for another
range.
— 230 —
DEEP DIVE
4. Index seek is the fastest way to retrieve data and it comes into the
picture when your search criterion is very specific. Normally, when
you have a WHERE clause in your query and you are using a column
that also has an index, then index seek is used to retrieve data as
shown in the following query:
select * from Employee where Id= 3;
You can confirm that by actually looking at the execution plan for your
query.
In MSSQL management studio, you can see the execution plan by clicking
Ctrl + A and then running your query.
That's all about the difference between table scan, index scan and index
seek in a database. As I told you, there are only two ways to retrieve data in
a database either by using a table scan or by using an index. The latter is
faster in the case of large tables. The choice of the index depends upon
multiple things like the WHERE clause and joins in your table, the columns
you are requesting, the size of tables, etc.
If you feel that your query is slow, you must check the execution plan to
confirm whether it's using index seeks or index scan, or table scan. Then you
can optimize your query by introducing the right index or tuning your query.
— 231 —
GROKKING THE SQL INTERVIEW
With ISNULL(), you can only provide one alternate value but with
COALESCE you can provide more than one e.g. if col1 IS NULL then take
value from column2, if that is NULL then take the default value.
Btw, that's not the only difference, there are three key differences between
ISNULL() and COALESCE() which we will explore in this article.
— 232 —
DEEP DIVE
4. When you use them in a SELECT INTO query then both will
produce a NON-NULL value in the result table if the attribute has
NON NULL constraint but if it doesn't then COALESCE will create
an attribute that allows NULL and ISNULL will create which doesn't
allow NULLs
In the first SQL query, we have used ISNULL and the first data type is INT
but because of its NULL, it also converts 3.00 to INT and performed integer
arithmetic, but COALESCE correctly promotes the 19 to FLOAT and
performed floating-point arithmetic.
2. COALESCE allows multiple values but ISNULL allows only one value
You can provide COALESCE multiple values to use in case the target is
NULL.
— 233 —
GROKKING THE SQL INTERVIEW
DECLARE @x VARCHAR(10)
DECLARE @y VARCHAR(10)
DECLARE @z VARCHAR(10)
DECLARE @a VARCHAR(10)
SELECT @a = 'SQL'
Output
SQL
This flexibility allows you to replace complex case statements with simple
coalesce functions called on SQL Server stored procedures and functions.
See these free SQL and database courses to learn more about it.
3. Length of Result
In the case of COALESCE data type of the result, value determines the type
of COALESCE expression but in the case of ISNULL, it's the type of the first
argument. For example, see the following T-SQL Query:
DECLARE
@a AS VARCHAR(4) = NULL,
@b AS VARCHAR(10) = '1234567890';
SELECT COALESCE(@a, @b) AS [COALESCE], ISNULL(@a,
@b) AS [ISNULL];
Output
COALESCE ISNULL
1234567890 1234
— 234 —
DEEP DIVE
You can see that in the case of COALESCE() the result has type and length
VARCHAR(10) but in the case of ISNULL() is the length of the first value
i.e. length is 4 characters. Another worth noting thing is the use of square
bracket e.g. [ISNULL], we do this when we use any keyword or function as
literal i.e. variable name or column name.
COALESCE(column1, 0) as new_column
vs
ISNULL(column1, 0) as new_column.
Then, both expressions will produce a NOT NULL attribute in result table
if the source is defined as NOT NULL, but in case source attribute allows
NULLs then COALESCE will create an attribute that allows NULL and
ISNULL will create which doesn't allow NULLs.
That's all about the difference between ISNULL and COALESCE in SQL
Server. Generally, it's recommended to stick to standard features unless
there is some flexibility or major performance you get by using a non-
standard feature.
— 235 —
GROKKING THE SQL INTERVIEW
Btw, you must remember these key differences between ISNULL and
COALESCE if you are refactoring code and replacing ISNULL with
COALESCE in your SQL Script.
+----+--------+
| Id | Salary |
+----+--------+
| 1 | 100 |
| 2 | 200 |
| 3 | 300 |
+----+--------+
For example, given the above Employee table, the nth highest salary where
n = 2 is 200. If there is no nth highest salary, then the query should return
null.
1. Accepted Solution
This was my accepted solution for this LeetCode problem:
RETURN (
— 236 —
DEEP DIVE
2. Alternate Solution
This is another solution to find Nth highest salary problem, this was not
accepted by LeetCode compiler but they work fine on Database
BEGIN
RETURN (
);
END
— 237 —
GROKKING THE SQL INTERVIEW
);
For example, given the above Employee table, the nth highest salary where
n = 2 is 200. If there is no Nth highest salary, then the query should return
null. You can see that we have used the above query to find the highest,
second-highest, and third-highest salaries from the employee table.
— 238 —
DEEP DIVE
difference between VARCHAR and CHAR data type. I am sure, you all
have used both of them numerous times but because it's so common many
of us ignore the difference between them and when asked to choose between
VARCHAR and CHAR on interviews, they fail to give a convincing reason.
The difference is not just significant from an interview point of view but also
from a robust database design because an incorrect choice of data type not
only limit what kind of data you can put on but also waste precious space
and makes your query slower, particularly when you have to deal with the
massive amount of data.
The key difference between CHAR and VARCHAR is that the former is a
fixed-length data type while later is a variable-length data type. Yes, the VAR
stands for variable length in VARCHAR. To give you an example, CHAR(10)
is a fixed-length non-Unicode string of length 10, while VARCHAR(10) is a
variable-length non-Unicode string with a maximum length of 10.
This means the actual length will depend upon the data. For example, if you
are going to sore a single character string like "Y" then VARCHAR will take
less space than CHAR because it will adjust depending upon the length of
data. Typically, you would use the char if all data values are 10 characters
and varchar if the lengths vary.
It's better to use the data type that will take less space. In SQL Server 2005,
2008, 2012, and 2014, NVARCHAR takes more space than VARCHAR data
type, almost 2x as much space as VARCHAR.
— 239 —
GROKKING THE SQL INTERVIEW
So, use VARCHAR if you know that all your data would be in ASCII
encoding, but if you are going to store Unicode strings like storing data from
different languages, then you need to use NVARCHAR to support Unicode
data.
The size (9000) given to the type 'varchar' exceeds the maximum allowed for
any data type (8000).
Don't confuse length and size here, the length here represents how many
characters a CHAR or VARCHAR variable can take, and size represents the
storage bytes they take to store those characters.
— 240 —
DEEP DIVE
What this means is that the storage size of the CHAR column or variable is
always fixed, as specified initially but the storage size of the VARCHAR
column or variable depends upon actually stored data.
For example, if you create a variable of type CHAR(6) then it will always
take 6 bytes, whether or not you store six characters ( 1 byte per character)
but VARCHAR(6) column can take anything between 2 to 8 bytes. 2 bytes
are additional overhead, and 1 to 6 bytes are actual storage depending upon
how many characters you store.
2.Usage
You should use CHAR when your data is of fixed length, like telephone
numbers, zip code, cc number, ba num, ss numbers, etc.
Another use of CHAR data type is storing boolean columns like 'Y' and 'N'.
You can use the VARCHAR type column to store things that are not of fixed
length like name, comment, etc.
3. Storage
CHAR variables always take the same storage space irrespective of the
number of characters actually stored, while the VARCHAR variable's
storage size depends upon the actual number of characters stored.
— 241 —
GROKKING THE SQL INTERVIEW
4. Space Overhead
VARCHAR data type has an overhead of 2 bytes as compared to CHAR
variables. This means if your data is always fixed length, then storing them
into VARCHAR will take more space than CHAR.
5. Padding
In the case of CHAR, data are padded to make specific characters long, no
padding is done on VARCHAR columns
6. Null
A CHAR column cannot hold a NULL, so behind the scene, SQL will
actually use a VARCHAR field like CHAR(x) NULL column is actually a
VARCHAR(x) column.
7. Reservation
CHAR reserves storage space, VARCHAR doesn't
8. Index
Use of index can fail if you provide wrong data type like in SQL Server when
you have an index over a VARCHAR column and present it a Unicode
String, MSSQL Server will not use the index.
That's all about the difference between CHAR and VARCHAR data types
in SQL. In short, CHAR is a fixed-size data type, while VARCHAR is a
variable-size data type, where actual storage space depends upon an actual
number of characters stored in the column.
You should always use the right data type to minimize storage requirements.
Using incorrect data types not only results in wasted space but also affects
the performance of the SQL query.
— 242 —
DEEP DIVE
You can even use the WHERE clause without HAVING or GROUP BY, as
you have seen many times. On the other hand, HAVING can only be used if
grouping has been performed using the GROUP BY clause in the SQL query.
Another worth noting thing about the WHERE and HAVING clause is that
the WHERE clause cannot contain aggregate functions like COUNT(),
SUM(), MAX(), MIN(), etc but the HAVING clause may contain aggregate
functions.
Another worth noting the difference between WHERE and HAVING clause
is that WHERE is used to impose filtering criterion on a SELECT, UPDATE,
DELETE statement as well as single row function and used before group by
clause but HAVING is always used after group by clause.
If you are starting with SQL, then these are some of the fundamentals you
need to learn, and a good course can help you a lot. If you need a
recommendation, I suggest you join any of these best SQL and database
courses online. This list contains the best courses from Udmey, Coursera,
Pluralsight, and other websites.
— 243 —
GROKKING THE SQL INTERVIEW
4. You can use an aggregate function to filter rows with the HAVING
clause. Because the HAVING clause is processed after the rows have
been grouped, you can refer to an aggregate function in the logical
expression. For example, the following query will display only
courses which have more than 10 students :
— 244 —
DEEP DIVE
— 245 —
GROKKING THE SQL INTERVIEW
That's all about the difference between WHERE and HAVING clause in
SQL. These differences are valid for almost all major databases like MySQL,
Oracle, SQL Server, and PostgreSQL. Just remember that WHERE is used
to filter rows before grouping while HAVING is used to filter rows after
grouping. You can also use the AGGREGATE function along with the
HAVING clause for filtering.
By the way, this is my second post about primary keys, In the last one, we
have seen a comparison of primary key vs unique key, which also happens
to be one of the frequently asked database questions.
— 246 —
DEEP DIVE
Btw, If you are new to SQL and don't understand fundamentals like primary
key, normalization, and basic SQL queries then you can also join these free
SQL courses to learn those SQL fundamentals. It's one of the best resources
to learn SQL online.
— 247 —
GROKKING THE SQL INTERVIEW
Btw, even though you can solve this problem by using a regular query it
becomes tricky when the Interviewer extends this problem to find the Nth
highest salary then you just can't go with regular subquery because of
unlimited nesting.
In this article, I am going to compare correlated subquery with the regular one
of different parameters e.g. their working, speed, performance, and
dependency. I'll also give you a good example of a correlated subquery e.g. the
Nth highest salary problem and explain how exactly it solves the problem.
So, if the interviewer asks you to find the 4th highest salary then there can
only be at most 4 salaries which is equal to or greater than the 4th highest
salary. This is just an example, you can use a correlated subquery to solve
many such problems in the world of data and SQL. In short, here are the
main difference between correlated and non-correlated subqueries in SQL
— 248 —
DEEP DIVE
1. Working
A non-correlated subquery is executed only once and its result can be swapped
back for a query, on the other hand, a correlated subquery is executed
multiple times, precisely once for each row returned by the outer query.
Let's say the first row of employees has a salary of 5000, in this case, e.salary
will be 500 and subquery will be
and subquery will find how many salaries are higher than 5000 if count
return 2 then it's the second-highest salary. This logic needs to be executed
for each row the outer query will process.
— 249 —
GROKKING THE SQL INTERVIEW
2. Dependency
A correlated subquery depends upon the outer query and cannot execute in
isolation, but a regular or non-correlated subquery doesn't depend on the
outer query and can execute in isolation.
From the above example, you can see that a correlated subquery like
SELECT COUNT(Salary) FROM Employee p WHERE p.salary
>= e.salary depends upon outer query because it needs the value of
e.salary, which comes from the table listed on the outer query.
— 250 —
DEEP DIVE
This means if your table has n rows then whole processing will take the n *
n = n^2 time, as compared to 2n times taken by a non-correlated subquery.
This is the reason you should be very careful using a correlated subquery
with large tables e.g. tables with millions of rows because that can take a long
time and could potentially block other jobs and queries from accessing the
table.
In many cases, you can replace correlated subquery with inner join which
would result in better performance. For example, to find all employees
whose salary is greater than the average salary of the department you can
write the following correlated subquery:
Now, you can convert this correlated subquery to a JOIN based query for
better performance as shown below:
— 251 —
GROKKING THE SQL INTERVIEW
By the way, If you have written INNER join using where clause then using
a comparison operator as = will be known as an equijoin. Equi joins or Self-
join is not a formal JOIN or part of the syntax, instead, they are a just
popular way to refer to certain join examples.
One of the best examples of Self Join, I have seen in an SQL query Interview
questions is "How do you find all Employees who are Managers in Employee
table", which is commonly asked along with another popular question how
to find the second highest salary of employee or questions related to joining
three tables in one SQL query.
— 252 —
DEEP DIVE
In this SQL tutorial, we will learn to self-join by example while solving this
SQL query. Btw, If you are new to SQL and don't understand fundamentals
like JOINs or co-related sub-queries then I highly recommend you go
through a comprehensive SQL course like The Complete SQL Bootcamp
by Jose Portilla on Udemy. It's one of the best and also most affordable
courses to learn SQL online.
No doubt that SQL queries are an integral part of any programming job
interview which requires database and SQL knowledge. The queries are also
very interesting to check the candidate's logical reasoning ability.
Earlier, I have shared a list of frequently asked SQL queries from interviews
and this article is an extension of that. I have shared a lot of good SQL-based
— 253 —
GROKKING THE SQL INTERVIEW
problems on that article and users have also shared some excellent problems
in the comments, which you should look at.
Some candidate says that they will find duplicate by using group by and
printing name which has counted more than 1, but when it comes to
deleting this approach doesn't work, because if you delete using this logic
both duplicate and unique row will get deleted.
This little bit of extra detail like row_number makes this problem
challenging for many programmers who don't use SQL on a daily basis. Now,
let's see our solution to delete duplicate rows from a table in SQL Server.
In our table, I have just one column for simplicity, if you have multiple
columns then the definition of duplicate depends on whether all columns
should be equal or some key columns e.g. name and city can be the same for
two unique persons. In such cases, you need to extend the solution by using
those columns on key places e.g. on a distinct clause in the first solution and
on the partition by in the second solution.
— 254 —
DEEP DIVE
Anyway, here is our temp table with test data, it is carefully constructed to
have duplicates, you can see that C++ is repeated thrice while Java is
repeated twice in the table.
-- cleanup
drop table #programming
Once unique rows are copied, delete everything from the original table and
then copy unique rows again. This way, all the duplicate rows have been
removed as shown below.
— 255 —
GROKKING THE SQL INTERVIEW
-- check after
select * from #programming
name
Java
C++
JavaScript
Python
You can see the duplicate occurrences of Java and C++ have been removed
from the #programming temp table.
This way row number will restart as soon as a different name comes up but
for the same name, all rows will get sequential numbers e.g. 1, 2, 3, etc. Now,
it's easy to spot the duplicates in the derived table as shown in the following
example:
— 256 —
DEEP DIVE
Now, you can remove all the duplicates which are nothing but rows with rn
> 1, as done by following SQL query:
delete dups
from (select *, row_number()
over ( partition by name order by name) as rn
from #programming)
dups
WHERE rn > 1
(3 row(s) affected)
Now, if you check the #programming table again there won't be any
duplicates.
This is by far the simplest solution and also quite easy to understand but it
doesn't come to your mind without practicing. I suggest solving some SQL
puzzles from Joe Celko's classic book, SQL Puzzles, and Answers, Second
Edition to develop your SQL sense. It's a great practice book to learn and
master SQL logic.
Example: The CTE stands for common table expression, which is similar to
a derived table and used to the temporary result set that is defined within
the execution scope of a single SELECT, INSERT, UPDATE, DELETE, or
CREATE VIEW statement. Similar to a derived table, CTE is also not stored
as an object and lasts only for the duration of the query.
— 257 —
GROKKING THE SQL INTERVIEW
You can rewrite the previous solution using CTE as shown below:
;with cte
as (select row_number()
over (partition by name order by(select 0)) rn
from #programming)
delete from cte where rn > 1
The logic is exactly similar to the previous example and I am using select 0
because it's arbitrary which rows to preserve in the event of a tie as both
contents the same data. If you are new to CTE then I suggest reading T-SQL
Fundamentals, one of the best books to learn SQL Server fundamentals.
Here is a nice summary of all three ways to remove duplicates from a table
using SQL:
That's all about how to remove duplicate rows from a table in SQL. As I
said, this is one of the frequently asked SQL queries, so be prepared for that
when you go for your programming job interview.
I have tested the query in SQL Server 2008 and they work fine and you might
need to tweak them a little bit depending upon the database you are going
to use like MySQL, Oracle, or PostgreSQL. Feel free to post, if you face any
issue while removing duplicates in Oracle, MySQL, or any other database.
— 258 —
DEEP DIVE
How to find all the customers who have never ordered anything?
Table: Customers.
+----+-------+
| Id | Name |
+----+-------+
| 1 | Joe |
| 2 | Henry |
| 3 | Sam |
| 4 | Max |
+----+-------+
Table: Orders.
+----+------------+
| Id | CustomerId |
+----+------------+
| 1 | 3 |
| 2 | 1 |
+----+------------+
+-----------+
| Customers |
+-----------+
| Henry |
| Max |
+-----------+
— 259 —
GROKKING THE SQL INTERVIEW
When you join two tables in SQL using a LEFT OUTER JOIN, then a big
table will be created with NULL values in the column which don't exist in
another table.
For example, the big table will have four columns C.Id, C.Name, O.Id,
and O.CustomerId, for Customers who have never ordered anything, the
O.CustomerId will be NULL.
This is a correlated subquery, where the inner query will execute for each
row of the outer query, and only those customers will be returned who have
not ordered anything.
— 260 —
DEEP DIVE
That's all about how to use the EXISTS clause in SQL to find all customers
who have never ordered. If you like to improve your SQL query skills, then
you can also problems given in Joe Celko's classical book SQL Puzzles and
Answers, the 2nd Edition. One of the best books with lots of challenging
questions to test your SQL skill, and if you need some online courses to learn
SQL in-depth or fill the gaps in your SQL knowledge, the following courses
are a good place to start with.
— 261 —
GROKKING THE SQL INTERVIEW
Given below table, write an SQL query to find all duplicate values (Emails)
+----+---------+
| Id | Email |
+----+---------+
| 1 | a@b.com |
| 2 | c@d.com |
| 3 | a@b.com |
+----+---------+
For example, your query should return the following for the above table:
+---------+
| Email |
+---------+
| a@b.com |
+---------+
Here are three ways to solve this problem in SQL query, first by using group
by clause, second by using self-join, and then third by using subquery with
exists clause.
While I agree that this problem can be solved in a different way, but it is also
a perfect example of how you can use the SQL GROUP BY and HAVING
clause.
But, if you are new to the SQL world, it's better to start with a comprehensive
SQL course like The Complete SQL Bootcamp course by Jose Portilla on
Udemy.
That will help you to learn SQL better and quicker, and these kinds of
articles will also make more sense once you have some SQL knowledge
under your belt.
— 262 —
DEEP DIVE
This is also my accepted answer on LeetCode. You can see by using the count
function you can count a number of elements in the group and if your group
contains more than 1 row then it's a duplicate value that you want to print.
— 263 —
GROKKING THE SQL INTERVIEW
Remember to use the keyword distinct here because it will print the
duplicate email as many times it appears in the table. This is also an accepted
solution in Leetcode.
— 264 —
CHAPTER 14
Conclusion
S oftware Engineering and Technical Interview is not easy as you can see
there are a lot of topics to master and a lot of concepts to cover. Given
the importance of SQL in Software Development and Data Science, it
becomes extremely important for any developer or Data Scientist to have a
strong knowledge of Database concepts and SQL.
You can use this book in many ways, for example, you can use it to revise
frequently asked SQL questions and concepts before you for any telephonic
or face-to-face Software Engineering and Data Science interview as well as
to get an idea of essential concepts.
The book covers a lot of ground, including the basics of SQL queries and
database design, as well as more advanced topics like indexes, stored
procedures, triggers, and normalization. Additionally, the book provides
numerous real-world SQL interview questions, giving readers ample
opportunities to practice and refine their skills.
— 265 —
GROKKING THE SQL INTERVIEW
So, if you are preparing for an SQL interview or looking to brush up on your
SQL skills, this book is definitely worth a read.
Once again, thanks for reading this book and I wish it fulfills your
expectations. Don’t forget to leave ratings and give feedback as they help us
to create more such books and make them even better.
All the best for your Software Engineering and Data Science interview.
— 266 —