You are on page 1of 10

DS 5110: Introduction to Data Management and Processing

Sample Midterm Exam (Solutions)

• You have 2.5 hours.

• The exam is closed book.

• You are allowed two letter-sized pages of notes (both sides).

• No electronic devices are allowed.

• Please read all questions carefully before answering them.

• Good luck!

Name:

NEU ID (optional):

1
1 Multiple-Choice Questions (40 points)
Circle ALL the correct choices: there may be more than one correct choice, but there is always
at least one correct choice. NO partial credit: the set of all the correct answers must be checked.
There are 10 multiple choice questions worth 4 points each.

1. Which of the following describes the role of a Data Definition Language (DDL) in a DBMS?

(a) Manages database transactions and ensures data integrity.


(b) Facilitates the querying and retrieval of data from a database.
(c) Defines and manages the database structure.
(d) Manages the user access rights and security roles within a database.

(c). DDL is used for defining and managing the structure of the database objects, like creating,
altering, and deleting tables, indexes, and schemas.

2. The following ERD describes a relation between employees, managers, and departments:

Which of the following statements is true?

(a) An employee can work in more than one department.


(b) Every employee has a manager.
(c) Two employees in the same department can have different managers.
(d) There may be departments with no managers.

(c) and (d).


(a): The employee entity is on the many side of the ternary relation, and the manager and
department entities are on the one side of the relation. This means that each employee has
at most one manager and belongs to at most one department.
(b): The participation of manager in the relationship is partial (there is no double line),
therefore there may be employees with no managers.
(c): There is nothing in the diagram that restricts two employees from the same department
from having different managers.
(d): The participation of managers in the relationship is partial (there is no double line),
therefore there may be departments with no managers.

3. Suppose that S(A, ...) is a relation containing m rows and T (B, ...) is a relation containing
n rows. Assume that A is the primary key of S and B is a foreign key reference to S(A).
Suppose that we perform an inner join between S and T on S.A = T.B. The maximum
number of rows that can be in the output is:

2
(a) m
(b) n
(c) m + n
(d) mn
(e) max(m, n)

(b). When we perform an inner join S.A = T.B, each row in T can match at most one row
of S (since the values of the primary key A are unique), therefore the maximum number of
rows we could have in the result is n.

4. Assume that we have executed the following SQL statements:


CREATE TABLE s ( a INT , b INT );
CREATE TABLE r ( a INT , b INT , c INT );
INSERT INTO s (a , b ) VALUES (1 ,2);
INSERT INTO s (a , b ) VALUES (3 ,3);
INSERT INTO s (a , b ) VALUES (1 ,2);
INSERT INTO s (a , b ) VALUES (1 ,4);
INSERT INTO s (a , b ) VALUES (3 ,4);
INSERT INTO r (a ,b , c ) VALUES (1 ,2 ,10);
INSERT INTO r (a ,b , c ) VALUES (3 ,3 ,3);
INSERT INTO r (a ,b , c ) VALUES (1 ,7 ,3);
INSERT INTO r (a ,b , c ) VALUES (1 ,2 ,8);
INSERT INTO r (a ,b , c ) VALUES (3 ,3 ,5);
INSERT INTO r (a ,b , c ) VALUES (1 ,1 ,10);
INSERT INTO r (a ,b , c ) VALUES (2 ,2 ,10);

What will be the result of the following query:


SELECT c
FROM r
GROUP BY c
HAVING COUNT ( c ) = ( SELECT ( COUNT ( DISTINCT a ))
FROM s );

(a) 1
(b) 3
(c) 5
(d) 10

(b)

5. Which of the following statements about foreign keys is true?

(a) A foreign key always references a primary key in another table.


(b) A table can have multiple foreign keys.
(c) Foreign keys prevent actions that would destroy links between tables.
(d) A foreign key cannot have NULL values.

3
(b) and (c).
(a): While it is common for a foreign key to reference a primary key in another table, it can
also reference a unique key in another table.
(b): A table can have only one primary key, but it can have multiple foreign keys referencing
different tables.
(c): Foreign keys establish referential integrity constraints, ensuring that relationships be-
tween tables are maintained. They prevent actions that would create inconsistencies in linked
data, such as deleting a record that is referenced by another table.
(d): A foreign key can have NULL values unless it is explicitly defined as NOT NULL.

6. Consider the following table:

Employees(EmployeeID, FirstName, LastName, ManagerID)

You are asked to find the entire lineage (all superiors) of employee with id 100 up to the
highest level in the organization. Which SQL statement will retrieve this information?

(a) WITH RecursiveCTE AS (


SELECT EmployeeID , FirstName , LastName , ManagerID
FROM Employees
WHERE EmployeeID = 100
UNION ALL
SELECT e . EmployeeID , e . FirstName , e . LastName , e . ManagerID
FROM Employees e
JOIN RecursiveCTE r ON e . EmployeeID = r . ManagerID
)
SELECT FirstName , LastName FROM RecursiveCTE ;

(b) WITH RecursiveCTE AS (


SELECT EmployeeID , FirstName , LastName , ManagerID
FROM Employees
WHERE EmployeeID = 100
UNION ALL
SELECT e . EmployeeID , e . FirstName , e . LastName , e . ManagerID
FROM Employees e
JOIN RecursiveCTE r ON e . ManagerID = r . EmployeeID
)
SELECT FirstName , LastName FROM RecursiveCTE ;

(c) WITH RecursiveCTE AS (


SELECT EmployeeID , FirstName , LastName , ManagerID
FROM Employees
WHERE EmployeeID = 100
UNION ALL
SELECT e . EmployeeID , e . FirstName , e . LastName , e . ManagerID
FROM Employees e
JOIN RecursiveCTE r ON e . EmployeeID = r . EmployeeID
)
SELECT FirstName , LastName FROM RecursiveCTE ;

4
(d) WITH RecursiveCTE AS (
SELECT EmployeeID , FirstName , LastName , ManagerID
FROM Employees
WHERE EmployeeID = 100
UNION ALL
SELECT e . EmployeeID , e . FirstName , e . LastName , e . ManagerID
FROM Employees e
JOIN RecursiveCTE r ON e . ManagerID = r . ManagerID
)
SELECT FirstName , LastName FROM RecursiveCTE ;

(a) In this option, the recursive CTE starts with the specific employee (where EmployeeID =
100) and then, in each subsequent iteration, it fetches the direct manager of the last retrieved
employee by joining on e.EmployeeID = r.ManagerID. This process continues until there are
no more managers for the given employee lineage.

7. Given the schema R = (A, B, C, D, E, G) with the following functional dependencies:


F = {AB → C, AD → E, B → E, BC → A, AD → B, E → G}.
Which of the following decompositions of R preserves the dependencies?

(a) R1 = (A, B, C), R2 = (A, C, D, E), R3 = (A, D, G)


(b) R1 = (A, B, C), R2 = (A, C, D, E), R3 = (B, D, E, G)
(c) R1 = (A, B, C, D), R2 = (A, D, E, G)
(d) R1 = (A, B, C, D), R2 = (B, D, E, G)

(d).
(a): The dependencies AD → B and E → G are not preserved.
(b): The dependency AD → B is not preserved.
(c): The dependency B → E is not preserved.

8. What normal form the following table is in?

OrderId CustomerId CustomerName ProductId ProductName


1 1001 Alice P01 Apple
2 1001 Alice P02 Banana
3 1002 Bob P03 Cherry

(a) 1NF
(b) 2NF
(c) 3NF
(d) BCNF

(a) and (b). The table is in 1NF since every attribute in the table is atomic, and it is also in
2NF since it does not have any non-prime attribute that is functionally dependent on a subset
of the primary key (the primary key consists only of one column OrderId). However, it is not
in BCNF or 3NF, since the functional dependency ProductId → ProductName depends on
ProductId, which is not a candidate key of the table.

5
9. Which of the following is true about foreign keys?

(a) A foreign key cannot be null.


(b) A foreign key must be unique.
(c) A foreign key value must match an existing value in the parent table.
(d) A foreign key can be used to automatically update related records in the child table
when the referenced key in the parent table is updated.

(c) and (d)

10. What is the primary use of a View in SQL?

(a) To speed up the performance of database queries.


(b) To provide a virtual table on the result-set of an SQL statement.
(c) To impose additional security on database transactions.
(d) To abstract the complexity of underlying database schemas.

(b) and (d)

11. What is the purpose of a cursor object when working with databases in Python?

(a) Create and manage database connections.


(b) Store tables from the database in memory for fast access.
(c) Execute SQL queries and fetch the results from the database.
(d) Define the database schema.

(c)

6
2 ERD (15 points)
Design a system for movie rentals that manages movie collections, customer rentals, payments, and
employee interactions. Draw an ERD using the provided entities and their relationships. Mark
primary keys, and indicate the cardinality and participation (total/partial) of relationships.
System description:

• A movie has id, title, release year, and a director.

• A director has an id, first name, last name, and birth date.

• Each movie has one director, but a director can direct multiple movies.

• A customer has id, first name, last name, address, and email.

• A movie can be rented multiple times by different customers at different times. For each
rental, the system stores the rental date, due date, and return date.

• Each rental is associated with a payment. A payment has amount and date.

• An employee has id, first name, last name, and hire date.

• A rental can be processed by only one employee, but an employee can process multiple rentals.

Figure 1: ERD for the movie rental system

7
3 Data Normalization (15 points)
Consider the relational schema R(A, B, C, D, E, G) with the functional dependencies:
F = {AB → C, C → A, BC → D, ACD → B, D → EG, BE → C, CG → BD, CE → AG}.

1. Is the dependency BD → ACG in F + ?

2. Find all the candidate keys in R.

3. Indicate all BCNF violations in R.

4. Decompose R into a set of relations that are in BCNF.

1. BD → ACG is in F + , since from D we can get EG (using D → EG), from B and E we


can get C (using BE → C), and from C we can get A (using C → A). Therefore, BD is a
candidate key of R.

2. The candidate keys are AB, BC, BD, BE, CD, CE, CG.

3. The BCNF violations are: C → A, D → EG.

4. We first split on C → A, and get the relations R1 = (A, C) and R2 = (B, C, D, E, G). The
relation R2 is not in BCNF, because the dependency D → EG violates the BCNF (D is not
a candidate key in R2 ). Therefore, we split it into R21 = (D, E, G) and R22 = (B, C, D).
Both R21 and R22 are in BCNF. The final relations are: R1 = (A, C), R21 = (B, C, D), R22 =
(D, E, G).

8
4 SQL (30 points)
You are given the following schemas of a database for a coaching club.

coach(coach_id, name, e_mail, from_date, hourly_rate)


types(type_name, description)
coaches(coach_id, type_name)
clients(client_id, address, mobile)
training_program(client_id, start_date, coach_id, type_name, hours)

Description of the relations:

• The relation coach contains data on coaches that work in the club. Each coach has an id,
name, e-mail address, date of starting his/her job as a coach (from_date) and hourly rate.

• The relation types contains data on the coaching types offered in the club, including the type
name (e.g., ”life coaching”, ”career coaching”, etc.) and description.

• The relation coaches describes which coaching types are offered by each coach. Each coaching
type has at least one coach who offers it.

• The relation clients contains data on clients of the club. Each client has an id, name,
address and mobile phone. Each client has at least one training program.

• The relation training_program contains data on the training programs of the clients. For
each client, it stores the starting date of the program (start_date), id of the coach, the
coaching type (type_name), and the number of total hours planned for the program. The
cost of the program is calculated based on the number of hours and the hourly rate of the
selected coach.

Write the following queries in SQL (5 points for each query):

1. Find all the clients that have been trained in a coaching type whose description contains the
word ”life”.
SELECT tp . client_id
FROM training_program AS tp
JOIN types AS t
ON tp . type_name = t . type_name
WHERE t . description LIKE '% life % ';

2. Find pairs of different clients who started their training at the same day with the same coach.
The result columns should include the ids of the customers and the starting date of the
training. Each such pair should appear only once in the result.
SELECT t1 . client_id , t2 . client_id , t1 . start_date
FROM training_program AS t1
JOIN training_program AS t2
ON t1 . client_id < t2 . client_id
WHERE t1 . start_date = t2 . start_date AND t1 . coach - id = t2 . coach - id ;

3. Find all the clients who have never been trained by a coach named Levi.

9
SELECT client_id
FROM clients
WHERE client_id NOT IN (
SELECT tp . client_id
FROM training_program AS tp
JOIN coach AS c
ON tp . coach_id = c . coach_id
WHERE c . name = ' Levi '
);

4. Find for each client the total amount he/she has to pay for all his/her training programs.
SELECT tp . client_id , SUM ( tp . hours * c . hourly_rate ) AS total_pay
FROM training_program AS tp
JOIN coach AS c
ON tp . coach_id = c . coach_id
GROUP BY tp . client_id ;

5. Find customers who have been trained in all the coaching types offered by the club.
SELECT c . client_id
FROM clients AS c
WHERE NOT EXISTS (
SELECT *
FROM types
WHERE type_name NOT IN (
SELECT tp . type_name
FROM training_program AS tp
WHERE tp . client_id = c . client_id
)
);

6. Find customers who have had at least 3 different training programs, and all their training
programs have been carried out by the same coach.
SELECT client_id
FROM training_program
WHERE client_id NOT IN (
SELECT t1 . client_id
FROM training_program AS t1
JOIN training_program AS t2
ON t1 . client_id = t2 . client_id
WHERE t1 . coach_id <> t2 . coach_id
)
GROUP BY client_id
HAVING COUNT (*) >= 3;

10

You might also like