PART IV
AND SOLUTIONS
Database Systems: Instructor’s Guide – IV
Part 1 Background
Chapter 1 Introduction to Databases
Chapter 2 Database Environment
Chapter 5 Relational Algebra and Relational Calculus
Chapters 6 – 8 SQL
Chapter 9 Object-Relational DBMSs
Chapters 12 – 13 (Enhanced) Entity–Relationship Modeling
Reliable Rentals Case Study
Chapters 14 – 15 Normalization
Part 4 Methodology
Chapters 16 – 19 Methodology – Conceptual, Logical, and Physical Database Design
Case Study 1 – Adult Education Department
Chapter 22 Transaction Management
Assignment – Transaction Management Design and Implementation
Case Study 1 – Real Estate Agency
Assignment – Distributed Database System Implementation
Case Study – Library System
Case Study 1 – Cornucopia Ltd
Chapter 30 Semistructured Data and XML
Chapters 33 – 34 OLAP and Data Mining
Part 1 Background
Chapter 1 Introduction to Databases
1.1 A database management system provides a number of facilities that will vary from system to
system. Describe the type of facilities you might expect, especially those that aid the initial
implementation of a database and its subsequent administration.
The answer should first describe the general facilities expected. These include data storage and
retrieval, a concurrency control mechanism, authorization services, integrity mechanisms, and
transaction support. It should then focus on facilities that aid implementation, such as data
definition (stored in the system catalog) and authorization control to manage users. View
definition, transaction support, and integrity controls also aid subsequent administration.
1.2 Discuss the problems that arise when organizations rely upon multiple computerized file systems
to store data.
Discuss the advantages and disadvantages in using a database management system to carry out
the same functions.
For the first part, the answer should broadly cover the problems associated with file-based
systems, such as data duplication, data inconsistency, problems with validation and security, and
fragmentation of data.
For the second part, the focus is on using a DBMS to support the same operations. Advantages include
providing all the data structuring facilities, security mechanisms, integrity mechanisms, backup
and recovery facilities, minimal redundancy, data sharing, and data independence. Disadvantages
include cost, size, and complexity.
1.3 Applications built around file management systems have often been used to satisfy user
requirements. Discuss the problems that arise with such systems, and what advantages a database
management system could offer instead.
This answer contains the points made in part one of the previous question and also includes the
advantages contained in part two of the previous question.
1.4 Explain what is meant by a database management system, and contrast it with a File
Management System.
Give a full account of the type of system structure you might expect for a Database Management
System, and outline the type of facilities such a system should provide.
If you were in the position of appraising an application for possible implementation using a
Database Management System, what aspects of the application would you consider with respect
to the advice you might give?
First part: For a DBMS, the emphasis is on the management of a collection of data as a resource
that is accessible to different users for different purposes. In contrast, a File Management System
(FMS) manages a collection of files for a specific purpose. Different FMSs have their own
collection of files and cannot share data. Additional details should focus on aspects such as data
redundancy, data sharing, data independence.
Second part: A diagram illustrating typical system structure would be useful. It should show the
DBMS as a suite of programs/modules, each with specific functions. Following on from this, the
typical facilities provided can be described.
Third part: Perhaps consider: how many users there are requiring different data, the type of
application it is (such as, interactive with many screens), is it part of a larger set of applications, is
it likely to be extended in the future, are the data relationships complex, are various integrity
controls needed.
2.1 Discuss the reasons for the three-level architecture for a Database Management System.
A diagram would be useful to illustrate the relationships between the external, conceptual, and
internal levels. The different levels should be explained, with reference to the need for
logical and physical data independence, and illustrated with examples.
2.2 Security and recovery are two important functions of a Database Management System, which are
also of interest to Database Administration. Various facilities can be provided to aid both these
functions. Explain what these facilities are and with examples, show how Database
Administration could effectively use them.
Examples should be drawn from the DBMS currently in use. The focus is on security and
recovery. Security may include authorization mechanisms and associated granting of access
privileges. Views and encryption procedures may also be described. These mechanisms should be
elaborated. Recovery functions include backup procedures and safe storage of the media,
journaling and checkpointing, and the recovery manager. The database administration role should
be linked to their use.
2.3 Explain the purpose of the ANSI/SPARC three-level architecture for a Database Management
System, and describe its function giving examples to illustrate your points. Detail the components
of a Database Management System, explaining in particular the specific functions of the software
modules.
The answer to the first part of this question is similar to that for 2.1. This question specifically
asks for examples to be used, and these should be present in a full answer.
To answer the second part of the question, a diagram would be useful. It may show the whole
environment – users, DBMS software (expanded), data, and implicit hardware. The emphasis is
on the DBMS modules, which need to be explained. These include DML precompiler, Query
processor, DDL compiler, Database manager, and Data dictionary. The explanation should include
how the modules interact with one another.
2.4 A database management system provides a number of facilities, which will vary from system to
system. Describe the type of facilities you might expect, especially those that aid the initial
implementation of a database and its subsequent administration.
The answer should first describe the general facilities expected. These include data storage and
retrieval, a concurrency control mechanism, authorization services, integrity mechanisms, and
transaction support. It should then focus on facilities that aid implementation, such as data
definition (stored in the system catalog) and authorization control to manage users. View
definition, transaction support, and integrity controls also aid subsequent administration.
4.1 Describe the main characteristics of the Relational Data Model, including the properties of
relations and the rules for relational integrity.
Relational
– set of tables (as perceived by users)
– rows/columns
– variety of possible storage structures
– no repeating groups
– no links/pointers
– use of join columns.
– no duplicate tuples
– tuples are unordered
– attributes are unordered
– all attribute values are atomic.
Entity Integrity: No attribute participating in the PK of a base relation is allowed to accept null
values.
Referential Integrity: If a base relation R2 includes a FK targeting the PK of R1, then every
value of FK must either be:
– equal to the value of PK in some tuple of R1 or
– wholly null.
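Both integrity rules can be observed in a working DBMS. The sketch below uses Python's standard sqlite3 module with two hypothetical relations, Branch (playing R1) and Staff (playing R2). It assumes SQLite specifics: foreign keys are enforced only after PRAGMA foreign_keys = ON, and NOT NULL is declared on the primary keys because SQLite would otherwise accept NULL in a non-INTEGER primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when this is on

# R1 (referenced relation). NOT NULL is spelled out because SQLite, unlike the
# SQL standard, would otherwise accept NULL in a non-INTEGER primary key.
conn.execute("CREATE TABLE Branch (branchNo TEXT NOT NULL PRIMARY KEY, city TEXT)")
# R2 (referencing relation) with a foreign key targeting Branch's primary key.
conn.execute("""CREATE TABLE Staff (
    staffNo  TEXT NOT NULL PRIMARY KEY,
    branchNo TEXT REFERENCES Branch(branchNo))""")
conn.execute("INSERT INTO Branch VALUES ('B1', 'Glasgow')")

def try_insert(sql):
    """Return True if the statement is accepted, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

outcomes = {
    # Entity integrity: a null primary key is rejected.
    "null_pk": try_insert("INSERT INTO Branch VALUES (NULL, 'Paisley')"),
    # Referential integrity: the FK value equals a PK value in R1, so accepted.
    "matching_fk": try_insert("INSERT INTO Staff VALUES ('S1', 'B1')"),
    # An FK value with no matching PK in R1 is rejected.
    "dangling_fk": try_insert("INSERT INTO Staff VALUES ('S2', 'B9')"),
    # A wholly null FK is allowed.
    "null_fk": try_insert("INSERT INTO Staff VALUES ('S3', NULL)"),
}
print(outcomes)
```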
4.2 The Relational and CODASYL models are examples of two different approaches by which a
Database Management System may be classified. Make a detailed comparison of these two
models, indicating clearly the relative advantages and disadvantages of each.
CODASYL
– records in chains/rings
– parent-child links
– many parents for a child
– ordering of all records
– fixed access paths
– potential for automatic referential integrity.
Relational
– set of tables (as perceived by users)
– rows/columns
– variety of possible storage structures
– no repeating groups
– no links/pointers
– use of join columns.
4.3 Describe the difference between a base relation and a view and discuss the main benefits of
using views in a relational database.
A base relation is a named relation, corresponding to an entity in the conceptual schema, whose
tuples are physically stored in the database. A view can be constructed by performing operations
such as the relational algebra selection, projection, join or other calculations on the values of
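existing base relations. A view is the dynamic result of one or more relational operations on the
base relations, producing another relation. It is a virtual relation that does not actually exist in
the database but is produced on request by a particular user, at the time of the request.
The main benefits of views are security (a view can hide data that a user should not see),
customization (different users can see the same data in different ways), simplification of complex
operations (for example, a multi-table join can be queried as a single relation), and support for
logical data independence.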
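The "virtual relation" point can be seen directly: a view stores only its defining query, so a later change to the base relation shows up the next time the view is queried. This minimal sketch uses Python's sqlite3 module; the Staff table, its columns, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT, salary REAL, branchNo TEXT)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?, ?)",
                 [("S1", "Ann", 30000, "B1"), ("S2", "Bob", 25000, "B2")])

# Only the defining query is stored: a selection (branch B1) plus a projection
# that hides the salary column from users of the view.
conn.execute("CREATE VIEW StaffB1 AS "
             "SELECT staffNo, name FROM Staff WHERE branchNo = 'B1'")

before = conn.execute("SELECT * FROM StaffB1 ORDER BY staffNo").fetchall()
# A change to the base relation is reflected in the view at the next request,
# because the view's rows are computed when it is queried.
conn.execute("INSERT INTO Staff VALUES ('S3', 'Cat', 28000, 'B1')")
after = conn.execute("SELECT * FROM StaffB1 ORDER BY staffNo").fetchall()
print(before)  # [('S1', 'Ann')]
print(after)   # [('S1', 'Ann'), ('S3', 'Cat')]
```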
4.4 From an SQL user’s perspective, does the relational model provide logical and physical data
independence?
Since a user can define views, logical data independence can be achieved by using view definitions
to hide changes in the conceptual schema. Since the SQL user has no knowledge of how the data is
physically represented, relying solely on the relation abstraction for querying, physical data
independence is also achieved.
5.1 Choose any four relational algebra operators and explain how each functions.
For example: Selection produces a horizontal subset of a relation, containing only the tuples that
satisfy a given predicate. Projection produces a vertical subset of a relation by picking out
particular attributes and eliminating duplicate tuples. Product (Cartesian product) produces a
result relation by concatenating every tuple of one relation with every tuple of the other. If the
first relation contained 20 rows and the second 50, the result relation would contain 20 × 50 = 1000
rows. Join produces a result relation by (commonly) combining those tuples of two relations that
have equal values in an attribute common to both relations.
Examples should be given for all operators explained.
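To make the four operators concrete, here is a minimal Python sketch (an illustration, not a DBMS implementation) that models a relation as a list of dicts. The Staff/Branch attribute names and data are hypothetical. The last line checks the 20 × 50 product example.

```python
from itertools import product as pairs

# A relation is modelled as a list of dicts mapping attribute name -> value.

def select(rel, pred):
    """Selection: horizontal subset, the tuples satisfying pred."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """Projection: vertical subset on attrs, duplicate tuples removed."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def cartesian_product(r, s):
    """Product: each tuple of r concatenated with each tuple of s.
    Assumes r and s have no attribute names in common."""
    return [{**tr, **ts} for tr, ts in pairs(r, s)]

def equijoin(r, s, attr_r, attr_s):
    """Equijoin: the product restricted to tuples with equal join values."""
    return select(cartesian_product(r, s), lambda t: t[attr_r] == t[attr_s])

# Hypothetical example relations.
staff = [{"staffNo": "S1", "sBranch": "B1"},
         {"staffNo": "S2", "sBranch": "B2"},
         {"staffNo": "S3", "sBranch": "B1"}]
branch = [{"branchNo": "B1", "city": "Glasgow"},
          {"branchNo": "B2", "city": "London"}]

print(select(staff, lambda t: t["sBranch"] == "B1"))        # S1 and S3
print(project(staff, ["sBranch"]))                          # B1, B2 (duplicate B1 removed)
print(len(equijoin(staff, branch, "sBranch", "branchNo")))  # 3

r20 = [{"a": i} for i in range(20)]
s50 = [{"b": j} for j in range(50)]
print(len(cartesian_product(r20, s50)))                     # 20 * 50 = 1000
```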
5.2 Given two relations R and S, where R contains $N_1$ tuples and S contains $N_2$ tuples ($N_2 > N_1 > 0$),
give the minimum and maximum cardinality for the result relation for each of the following
relational algebra expressions and in each case state any assumptions about the schemas that are
required to make the expression meaningful:
(a) $R \cup S$
(b) $R \cap S$
(c) $R - S$
(d) $R \times S$
(e) $\sigma_{a = 1}(R)$
(f) $\Pi_{a}(R)$
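Under set semantics, and assuming R and S are union-compatible for (a) to (c) and that a is an attribute of R for (e) and (f), the usual bounds are: (a) $N_2$ to $N_1 + N_2$; (b) 0 to $N_1$; (c) 0 to $N_1$; (d) exactly $N_1 \times N_2$; (e) 0 to $N_1$; (f) 1 to $N_1$. The Python sketch below, with small made-up relations, reaches each extreme; it is an illustration, not a proof.

```python
# Relations as Python sets of tuples (set semantics). R and S share the schema
# (a, b), so they are union-compatible; all data is made up.

def select_a_equals(rel, value):
    """sigma_{a = value}(rel): tuples whose attribute a equals value."""
    return {t for t in rel if t[0] == value}

def project_a(rel):
    """Pi_a(rel): attribute a only; duplicates collapse."""
    return {(t[0],) for t in rel}

def product(r, s):
    """R x S: every tuple of r concatenated with every tuple of s."""
    return {tr + ts for tr in r for ts in s}

# R is a subset of S: smallest union, largest intersection, empty difference.
R1 = {(1, "x"), (2, "y")}
S1 = {(1, "x"), (2, "y"), (3, "z")}
# R and S are disjoint: largest union, empty intersection, largest difference.
R2 = {(1, "x"), (1, "y")}
S2 = {(4, "p"), (5, "q"), (6, "r")}

print(len(R1 | S1), len(R1 & S1), len(R1 - S1))   # 3 2 0  (N2, N1, 0)
print(len(R2 | S2), len(R2 & S2), len(R2 - S2))   # 5 0 2  (N1 + N2, 0, N1)
print(len(product(R2, S2)))                       # 6      (N1 * N2)
print(len(select_a_equals(R2, 1)), len(select_a_equals(R2, 9)))  # 2 0  (N1, 0)
print(len(project_a(R2)), len(project_a(R1)))     # 1 2    (1, N1)
```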
5.3 A relational database contains details about journeys from Paisley to a variety of destinations
and contains the following relations:
Each operator is assigned a unique code (opCode), and the relation Operator records the
association between this code and the operator’s name (opName). Each destination has a
unique code (destinationCode), and the relation Destination records the association between this
code, the destination name (destinationName), and the distance of the destination from
Paisley. The relation Journey records the price of an adult fare from Paisley to the given
destination by a specified operator; several operators may operate over the same route.
Formulate the following queries using relational algebra, tuple relational calculus, and domain
relational calculus (the answers to these queries in SQL are given in the next section):
RA: $\Pi_{destinationName}(Destination)$
(d) List the names of all operators with at least one journey priced at under £5.
(e) List the names of all operators and prices of journeys to ‘Ayr’.
Operator))
D.destinationName = ‘Ayr’)}
(f) List the names of all destinations that do not have any operators.
Destination)
5.4 The following tables form part of a database held in a Relational Database Management System:
Formulate the following queries in relational algebra, tuple relational calculus, and domain relational
calculus.
RA: Employee
TRC: {E | Employee(E) }
(3) List the names and addresses of all employees who are Managers.
(4) Produce a list of the names and addresses of all employees who work for the ‘IT’ department.
DRC: $\{fName, lName, address \mid (\exists empID, DOB, sex, position, deptNo, deptNo1, deptName, mgrEmpID)(Employee(empID, fName, lName, address, DOB, sex, position, deptNo) \land Department(deptNo1, deptName, mgrEmpID) \land (deptNo = deptNo1) \land deptName = \text{'IT'})\}$
(5) Produce a list of the names of all employees who work on the ‘SCCS’ project.
DRC: $\{fName, lName \mid (\exists empID, address, DOB, sex, position, deptNo, projNo, projName, deptNo1, empID1, projNo1, hoursWorked)(Employee(empID, fName, lName, address, DOB, sex, position, deptNo) \land Project(projNo, projName, deptNo1) \land WorksOn(empID1, projNo1, hoursWorked) \land (empID = empID1) \land (projNo = projNo1) \land projName = \text{'SCCS'})\}$
5.5 The following tables form part of a database held in a Relational Database Management System:
where Employee contains details of all employees (pilots and non-pilots) and empNo
is the key.
AirCraft contains details of aircraft and aircraftNo is the key.
Flight contains details of the flights and flightNo is the key.
and Certified contains details of the staff who are certified to fly an aircraft, and
empNo/aircraftNo form the key.
Formulate the following queries in relational algebra, tuple relational calculus, and domain
relational calculus (the answers to these queries in SQL are given in the next section).
(3) List the employee numbers of pilots certified for Boeing aircraft.
(5) List the aircraft that can fly nonstop from Glasgow to New York (flyingRange >
flightDistance).
(6) List the employee numbers of employees who have the highest salary.
RA: To answer this query, we first find all employees who do not have the
highest salary and then subtract these from the original list of employees to
give the list of highest-paid employees.
$\rho_{E1}(Employee)$
$\rho_{E2}(Employee)$
$\rho_{E3}(\Pi_{E2.empNo}(E1 \bowtie_{E1.salary > E2.salary} E2))$
$\Pi_{empNo}(E1) - E3$
(7) List the employee numbers of employees who have the second highest salary.
RA: To answer this query, we proceed as above and first find all employees who
do not have the highest salary (E3). We then restrict attention to these employees
(E4 and E5) and, applying the same technique again, find those among them who do
not have the highest salary within that group (E6). Removing E6 from E3 leaves the
second-highest-paid employees.
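$\rho_{E1}(Employee)$
$\rho_{E2}(Employee)$
$\rho_{E3}(\Pi_{E2.empNo}(E1 \bowtie_{E1.salary > E2.salary} E2))$
$\rho_{E4}(E2 \bowtie E3)$
$\rho_{E5}(E2 \bowtie E3)$
$\rho_{E6}(\Pi_{E5.empNo}(E4 \bowtie_{E4.salary > E5.salary} E5))$
$\Pi_{empNo}(E3) - E6$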
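The set-difference idea behind the two RA answers can be traced in plain Python. In this hypothetical sketch, Employee is reduced to (empNo, salary) pairs with invented data (including a tie), and each step is commented with the RA expression it mirrors; it is an illustration, not a query engine.

```python
# Employee reduced to (empNo, salary) pairs; the data is invented.
employee = {("E1", 50000), ("E2", 42000), ("E3", 42000), ("E4", 30000)}

def not_highest(rel):
    """Pi_{E2.empNo}(E1 JOIN_{E1.salary > E2.salary} E2):
    employees for whom someone in rel earns more."""
    return {e2[0] for e1 in rel for e2 in rel if e1[1] > e2[1]}

all_emps = {e[0] for e in employee}
e3 = not_highest(employee)                 # E3
highest = all_emps - e3                    # Pi_empNo(E1) - E3

# Second highest: repeat the step within the non-highest employees only.
e4 = {e for e in employee if e[0] in e3}   # E4 <- E2 JOIN E3
e6 = not_highest(e4)                       # E6
second_highest = e3 - e6                   # Pi_empNo(E3) - E6

print(sorted(highest))         # ['E1']
print(sorted(second_highest))  # ['E2', 'E3']
```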
(8) List the employee numbers of employees who are certified for exactly three aircraft.
RA: To answer this query, we first find the employees who are certified for at
least three aircraft, then find the employees who are certified for at least four
aircraft. The difference provides the employees who are certified for exactly
three aircraft.
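$\rho_{C1}(Certified)$
$\rho_{C2}(Certified)$
$\rho_{C3}(Certified)$
$\rho_{C4}(Certified)$
$\rho_{C5}(\Pi_{C1.empNo}(\sigma_{(C1.empNo = C2.empNo) \land (C2.empNo = C3.empNo) \land (C1.aircraftNo \neq C2.aircraftNo) \land (C1.aircraftNo \neq C3.aircraftNo) \land (C2.aircraftNo \neq C3.aircraftNo)}(C1 \times C2 \times C3)))$
$\rho_{C6}(\Pi_{C1.empNo}(\sigma_{(C1.empNo = C2.empNo) \land (C2.empNo = C3.empNo) \land (C3.empNo = C4.empNo) \land (C1.aircraftNo \neq C2.aircraftNo) \land (C1.aircraftNo \neq C3.aircraftNo) \land (C1.aircraftNo \neq C4.aircraftNo) \land (C2.aircraftNo \neq C3.aircraftNo) \land (C2.aircraftNo \neq C4.aircraftNo) \land (C3.aircraftNo \neq C4.aircraftNo)}(C1 \times C2 \times C3 \times C4)))$
$C5 - C6$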
Chapters 6 – 8 SQL
Advantages
Satisfies ideals for database language
(Relatively) Easy to learn
Portability
SQL standard exists
Both interactive and embedded access
Can be used by both specialists and non-specialists.
Disadvantages
Impedance mismatch – mixing programming paradigms with embedded access
Lack of orthogonality – many different ways to express some queries
Language is becoming enormous (SQL2 is 6 times larger than predecessor; SQL3 is even larger again)
Handling of nulls in aggregate functions
Result tables are not strictly relational – they can contain duplicate tuples and impose an ordering
on both columns and rows.
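Two of the disadvantages listed, duplicate rows in result tables and the treatment of nulls by aggregate functions, are easy to observe. This sketch uses Python's sqlite3 module and a hypothetical Staff table with a nullable bonus column; it relies only on standard DISTINCT, COUNT, and AVG behaviour, which SQLite follows here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT, branchNo TEXT, bonus REAL)")
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)", [
    ("S1", "B1", 100.0),
    ("S2", "B1", None),     # unknown bonus stored as NULL
    ("S3", "B2", 200.0),
])

# Result tables are multisets: without DISTINCT, B1 appears twice.
branches = conn.execute("SELECT branchNo FROM Staff ORDER BY branchNo").fetchall()
distinct = conn.execute("SELECT DISTINCT branchNo FROM Staff ORDER BY branchNo").fetchall()

# Aggregates skip NULLs silently: COUNT(*) and COUNT(bonus) disagree, and
# AVG(bonus) divides by 2 rather than 3.
counts = conn.execute("SELECT COUNT(*), COUNT(bonus), AVG(bonus) FROM Staff").fetchone()
print(branches)  # [('B1',), ('B1',), ('B2',)]
print(distinct)  # [('B1',), ('B2',)]
print(counts)    # (3, 2, 150.0)
```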
(a) (1) List all skills with a charge out rate greater than 60 per hour, in alphabetical
order of description.
SELECT *
FROM Skill
WHERE chargeOutRate > 60
ORDER BY description;
(2) List all staff with the skill description ‘Programmer’ who work in the ‘Special
Projects’ department.
SELECT *
(3) For all projects that were active in July 1995, list the staff name, project
number and the date and number of hours worked on the project, ordered by
staff name, within staff name by the project number and within project number
by date.
SELECT COUNT(*)
FROM Staff s, Skill k
WHERE s.skillCode = k.skillCode AND
description = 'Programmer';
(5) List all projects that have at least two staff booked to them.
SELECT AVG(chargeOutRate)
FROM Skill;
(7) List all staff with a charge out rate greater than the average charge out rate.
(b) Create a view of staff details giving the staff number, staff name, skill description, and
department, but excluding the skill number and charge out rate.
6.3 The following tables form part of a database held in a Relational Database Management System:
(a) (1) List all employees in alphabetical order of surname and within surname, first name.
SELECT *
FROM Employee
ORDER BY lName, fName;
SELECT *
FROM Employee
WHERE sex = 'F';
(3) List the names and addresses of all employees who are Managers.
Or
SELECT fName, lName, address
FROM Employee e, Department d
WHERE e.empID = d.mgrEmpID;
(4) Produce a list of the names and addresses of all employees who work for the ‘IT’
department.
(5) Produce a complete list of all managers who are due to retire this year, in alphabetical
order of surname.
SELECT lName
FROM Employee e, Department d
WHERE e.empID = d.mgrEmpID AND
date_part('year', DOB) = date_part('year', DATE '2001-10-01') - 65
ORDER BY lName;
(The student does not need to know the exact date functions, just the general idea.)
(6) Find out how many employees are managed by ‘James Adams’.
SELECT COUNT(*)
FROM Employee e1, Employee e2, Department d
WHERE e1.lName = 'Adams' AND e1.fName = 'James' AND
e1.empID = d.mgrEmpID AND d.deptNo = e2.deptNo;
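A quick run of query (6) shows why Employee must appear twice under two aliases: e1 locates the manager and e2 ranges over the staff of the manager's department. The sketch uses Python's sqlite3 module with reduced, hypothetical versions of the Employee and Department tables and invented data. It also shows a design point: as written, the count includes the manager himself, and adding e2.empID <> e1.empID excludes him.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced versions of the exercise tables: only the columns the query uses.
conn.execute("CREATE TABLE Employee (empID TEXT PRIMARY KEY, fName TEXT, lName TEXT, deptNo TEXT)")
conn.execute("CREATE TABLE Department (deptNo TEXT PRIMARY KEY, deptName TEXT, mgrEmpID TEXT)")
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?, ?)", [
    ("E1", "James", "Adams", "D1"), ("E2", "Mary", "Hope", "D1"),
    ("E3", "Tom", "Kerr", "D1"), ("E4", "Ann", "Bell", "D2"),
])
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)",
                 [("D1", "IT", "E1"), ("D2", "Sales", "E4")])

# e1 finds the manager, e2 ranges over everyone in the managed department.
query = """
SELECT COUNT(*)
FROM Employee e1, Employee e2, Department d
WHERE e1.lName = 'Adams' AND e1.fName = 'James' AND
      e1.empID = d.mgrEmpID AND d.deptNo = e2.deptNo"""
(total,) = conn.execute(query).fetchone()
# Optional refinement: do not count the manager as managing himself.
(total_excl,) = conn.execute(query + " AND e2.empID <> e1.empID").fetchone()
print(total, total_excl)  # 3 2
```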
(7) Produce a report of the total hours worked by each employee, arranged in order of
department number and within department, alphabetically by employee surname.
(8) For each project on which more than two employees worked, list the project number,
project name and the number of employees who work on that project.
results table.
(b) Create a view of employee details for all employees who work on project ‘MIS Development’,
excluding department number.
6.4 The following tables form part of a database held in a Relational Database Management System
for a printing company that handles printing jobs for book publishers:
SELECT pubName
FROM Publisher
ORDER BY pubName;
(2) List all printing jobs for the publisher ‘Gold Press’.
SELECT jobID
FROM BookJob b, Publisher p
WHERE b.pubID = p.pubID AND pubName = 'Gold Press';
(3) List the names and phone numbers of all publishers who have a rush job (jobType = ‘R’).
(4) List the dates of all the purchase orders for the publisher ‘Gold Press’.
(5) How many publishers fall into each credit code category?
(6) List all job types with at least three printing jobs.
SELECT AVG(price)
FROM Item;
(8) List all items with a price below the average price of an item.
SELECT *
FROM Item
WHERE price < (SELECT AVG(price) FROM Item);
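Because an aggregate function cannot appear directly in a WHERE clause, the average has to come from a scalar subquery, as in the answer to (8). This minimal SQLite run through Python's sqlite3 module uses an invented Item table; here the subquery evaluates to 6.0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Item (itemID INTEGER PRIMARY KEY, description TEXT, price REAL)")
conn.executemany("INSERT INTO Item (description, price) VALUES (?, ?)",
                 [("Paper", 2.0), ("Ink", 10.0), ("Binding", 6.0)])

# The scalar subquery yields one value, AVG(price) = 6.0, which each row is
# compared against.
rows = conn.execute(
    "SELECT description FROM Item WHERE price < (SELECT AVG(price) FROM Item)"
).fetchall()
print(rows)  # [('Paper',)]
```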
(b) Create a view of publisher details for all publishers who have a rush printing job,
excluding their credit code.
6.5 The relational schema shown below is part of a hospital database. The primary keys are
highlighted in bold.
SELECT *
FROM Patient
ORDER BY patName;
(4) Find the names of all the patients being prescribed ‘Morphine’.
SELECT p.patName
(5) What is the total cost of Morphine supplied to a patient called ‘John Smith’?
(6) What is the maximum, minimum and average number of beds in a ward? Create
appropriate column headings for the results table.
(7) For each ward that admitted more than 10 patients today, list the ward number, ward
type and number of beds in each ward.
(8) List the numbers and names of all patients and the drugNo and number of units of
their medication. The list should also include the details of patients that are not
prescribed medication.
SELECT *
FROM Patient p LEFT JOIN Prescribed pr ON pr.patientNo = p.patientNo;
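The left outer join keeps every patient, even those with no matching Prescribed row, and fills the Prescribed columns with NULL for them. This minimal SQLite sketch through Python's sqlite3 module uses a cut-down, hypothetical schema: the units column name is an assumption, since the exercise schema is not reproduced here. It selects the columns the question asks for.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down schema: only the columns the outer join needs.
conn.execute("CREATE TABLE Patient (patientNo TEXT PRIMARY KEY, patName TEXT)")
conn.execute("CREATE TABLE Prescribed (patientNo TEXT, drugNo TEXT, units INTEGER)")
conn.executemany("INSERT INTO Patient VALUES (?, ?)",
                 [("P1", "John Smith"), ("P2", "Ann Ross")])
conn.execute("INSERT INTO Prescribed VALUES ('P1', 'D10', 2)")

rows = conn.execute("""
    SELECT p.patientNo, p.patName, pr.drugNo, pr.units
    FROM Patient p LEFT JOIN Prescribed pr ON pr.patientNo = p.patientNo
    ORDER BY p.patientNo""").fetchall()
print(rows)  # [('P1', 'John Smith', 'D10', 2), ('P2', 'Ann Ross', None, None)]
```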
6.6 A relational database contains details about journeys from Paisley to a variety of destinations
and contains the following relations:
Each operator is assigned a unique code (opCode), and the relation Operator records the
association between this code and the operator’s name (opName). Each destination has a
unique code (destinationCode), and the relation Destination records the association between this
code, the destination name (destinationName), and the distance of the destination from
Paisley. The relation Journey records the price of an adult fare from Paisley to the given
destination by a specified operator; several operators may operate over the same route.
Formulate the following queries using SQL (the answers to these queries in relational algebra,
tuple relational calculus, and domain relational calculus were given in the previous section):
SELECT *
FROM Journey
WHERE price < 100;
SELECT destinationName
FROM Destination;
SELECT destinationName
FROM Destination
WHERE distance < 20;
(d) List the names of all operators with at least one journey priced at under £5.
SELECT opName
FROM Operator o, Journey j
WHERE (o.opCode = j.opCode) AND (j.price < 5);
(e) List the names of all operators and prices of journeys to ‘Ayr’.
(f) List the names of all destinations that do not have any operators.
SELECT destinationName
FROM Destination d
WHERE destinationCode NOT IN (SELECT destinationCode FROM Journey);
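Queries (d) and (f) can be checked against a small SQLite database via Python's sqlite3 module. The table layouts follow the relation descriptions in the question; the operator names, distances, and fares are invented. DISTINCT is added to (d) so that an operator with several cheap journeys is listed only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Operator    (opCode TEXT PRIMARY KEY, opName TEXT);
CREATE TABLE Destination (destinationCode TEXT PRIMARY KEY,
                          destinationName TEXT, distance INTEGER);
CREATE TABLE Journey     (opCode TEXT, destinationCode TEXT, price REAL);
INSERT INTO Operator VALUES ('O1', 'Western'), ('O2', 'Citylink');
INSERT INTO Destination VALUES ('D1', 'Ayr', 33), ('D2', 'Largs', 25), ('D3', 'Oban', 95);
INSERT INTO Journey VALUES ('O1', 'D1', 4.50), ('O2', 'D1', 6.00), ('O1', 'D2', 5.20);
""")

# (d) operators with at least one journey priced under 5.
cheap = conn.execute("""
    SELECT DISTINCT opName FROM Operator o, Journey j
    WHERE o.opCode = j.opCode AND j.price < 5""").fetchall()

# (f) destinations that no operator serves.
unserved = conn.execute("""
    SELECT destinationName FROM Destination
    WHERE destinationCode NOT IN (SELECT destinationCode FROM Journey)""").fetchall()
print(cheap)     # [('Western',)]
print(unserved)  # [('Oban',)]
```

One caution on (f): NOT IN returns no rows at all if the subquery produces a NULL destinationCode, so NOT EXISTS is the safer formulation if that column is nullable.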
6.7 The following tables form part of a database held in a Relational Database Management System:
where Employee contains details of all employees (pilots and non-pilots) and empNo
is the key.
AirCraft contains details of aircraft and aircraftNo is the key.
Flight contains details of the flights and flightNo is the key.
and Certified contains details of the staff who are certified to fly an aircraft, and
empNo/aircraftNo form the key.
Formulate the following queries in SQL (the answers to these queries in relational algebra, tuple relational
calculus, and domain relational calculus were given in the previous section).
SELECT *
FROM Aircraft
WHERE aName = 'Boeing';
SELECT *
FROM Aircraft
WHERE (aName = 'Boeing') AND (aModel = '737');
(3) List the employee numbers of pilots certified for Boeing aircraft.
SELECT empNo
FROM Aircraft a, Certified c
WHERE (a.aircraftNo = c.aircraftNo) AND (a.aName = 'Boeing');
SELECT eName
FROM Aircraft a, Certified c, Employee e
WHERE (a.aircraftNo = c.aircraftNo) AND (e.empNo = c.empNo) AND
(a.aName = 'Boeing');
(5) List the aircraft that can fly nonstop from Glasgow to New York (flyingRange >
flightDistance).
SELECT DISTINCT a.aircraftNo
FROM Aircraft a, Flight f
WHERE (f."from" = 'Glasgow') AND (f."to" = 'New York') AND
(a.flyingRange > f.flightDistance);
(6) List the employee numbers of employees who have the highest salary.
SELECT empNo
FROM Employee e1
WHERE e1.salary = (SELECT MAX(e2.salary)
FROM Employee e2);
(7) List the employee numbers of employees who have the second highest salary.
SELECT empNo
FROM Employee e1
WHERE e1.salary = (SELECT MAX(e2.salary)
FROM Employee e2
WHERE e2.salary <> (SELECT MAX(e3.salary)
FROM Employee e3));
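As a check that the nested second-highest query copes with ties at both the top and second salary levels, here it is run (with its parentheses balanced) against SQLite through Python's sqlite3 module. The reduced Employee table and its salaries are invented for the test.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (empNo TEXT PRIMARY KEY, salary REAL)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)", [
    ("E1", 50000), ("E2", 42000), ("E3", 42000), ("E4", 30000), ("E5", 50000),
])

# Innermost query: the top salary. Middle query: the largest salary that is
# not the top one. Outer query: everyone earning exactly that amount.
rows = conn.execute("""
    SELECT empNo FROM Employee e1
    WHERE e1.salary = (SELECT MAX(e2.salary) FROM Employee e2
                       WHERE e2.salary <> (SELECT MAX(e3.salary)
                                           FROM Employee e3))
    ORDER BY empNo""").fetchall()
print(rows)  # [('E2',), ('E3',)]
```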
(8) List the employee numbers of employees who are certified for exactly three aircraft.
SELECT c1.empNo
FROM Certified c1, Certified c2, Certified c3
WHERE (c1.empNo = c2.empNo) AND (c2.empNo = c3.empNo) AND
(c1.aircraftNo <> c2.aircraftNo) AND (c2.aircraftNo <> c3.aircraftNo) AND
(c3.aircraftNo <> c1.aircraftNo)
EXCEPT
SELECT c4.empNo
FROM Certified c4, Certified c5, Certified c6, Certified c7
WHERE (c4.empNo = c5.empNo) AND (c5.empNo = c6.empNo) AND
(c6.empNo = c7.empNo) AND (c4.aircraftNo <> c5.aircraftNo) AND
(c4.aircraftNo <> c6.aircraftNo) AND (c4.aircraftNo <> c7.aircraftNo) AND
(c5.aircraftNo <> c6.aircraftNo) AND (c5.aircraftNo <> c7.aircraftNo) AND
(c6.aircraftNo <> c7.aircraftNo)
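The "at least three EXCEPT at least four" strategy can be compared with a shorter GROUP BY ... HAVING formulation. The sketch runs both against SQLite through Python's sqlite3 module using invented Certified rows. Because (empNo, aircraftNo) is the key, COUNT(*) per employee equals the number of distinct aircraft, so the two queries should agree.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Certified (empNo TEXT, aircraftNo TEXT, "
             "PRIMARY KEY (empNo, aircraftNo))")
conn.executemany("INSERT INTO Certified VALUES (?, ?)", [
    ("P1", "A1"), ("P1", "A2"), ("P1", "A3"),                # exactly three
    ("P2", "A1"), ("P2", "A2"),                              # two
    ("P3", "A1"), ("P3", "A2"), ("P3", "A3"), ("P3", "A4"),  # four
])

# Self-join formulation: certified for at least three, minus at least four.
self_join = conn.execute("""
    SELECT c1.empNo FROM Certified c1, Certified c2, Certified c3
    WHERE c1.empNo = c2.empNo AND c2.empNo = c3.empNo
      AND c1.aircraftNo <> c2.aircraftNo AND c2.aircraftNo <> c3.aircraftNo
      AND c3.aircraftNo <> c1.aircraftNo
    EXCEPT
    SELECT c4.empNo FROM Certified c4, Certified c5, Certified c6, Certified c7
    WHERE c4.empNo = c5.empNo AND c5.empNo = c6.empNo AND c6.empNo = c7.empNo
      AND c4.aircraftNo <> c5.aircraftNo AND c4.aircraftNo <> c6.aircraftNo
      AND c4.aircraftNo <> c7.aircraftNo AND c5.aircraftNo <> c6.aircraftNo
      AND c5.aircraftNo <> c7.aircraftNo AND c6.aircraftNo <> c7.aircraftNo
""").fetchall()

# Grouped formulation of the same query.
grouped = conn.execute(
    "SELECT empNo FROM Certified GROUP BY empNo HAVING COUNT(*) = 3").fetchall()
print(self_join, grouped)  # [('P1',)] [('P1',)]
```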
Connect to the database ‘OracleDB’ with the user name ‘SalesTracking’ and password
‘sales’;
CONNECT SalesTracking/sales@OracleDB
SHOW user;
DESC Branch;
8.2 Assume that the following table is created and stored in your database:
UPDATE Staff
SET salary = salary * 1.05
WHERE post = 'Manager';
(b) Remove the records of all 'salesmen' from the Staff table;
(c) List the Staff table tablespace name, pctfree, and pctused.
8.3 What is a sequence? Write an SQL statement to create a sequence that starts from 10 and is
incremented by 10 up to a maximum value of 10000. The sequence should continue to
generate values after reaching its maximum value.
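A sequence is a database object that generates a series of unique integers, typically used to
supply primary key values for tables. With CYCLE, an ascending sequence that reaches MAXVALUE
restarts at MINVALUE (1 by default), so MINVALUE 10 is specified here to make it restart at 10.
CREATE SEQUENCE StaffSeq
START WITH 10
INCREMENT BY 10
MINVALUE 10
MAXVALUE 10000
CYCLE;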
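According to Oracle's CREATE SEQUENCE documentation, an ascending sequence created with CYCLE generates its minimum value after reaching its maximum, and the default minimum for an ascending sequence is 1. Without an explicit MINVALUE, the values after 10000 would therefore be 1, 11, 21, and so on, rather than 10, 20, 30. The following is a hypothetical Python model of that rule, not Oracle code, comparing the default minimum with MINVALUE 10.

```python
from itertools import islice

def ascending_sequence(start, increment, maxvalue, minvalue=1, cycle=True):
    """Hypothetical model of an ascending Oracle-style sequence: once the next
    value would pass maxvalue, it wraps to minvalue (CYCLE) or stops (NOCYCLE)."""
    value = start
    while True:
        yield value
        nxt = value + increment
        if nxt > maxvalue:
            if not cycle:
                return
            nxt = minvalue
        value = nxt

# 10, 20, ..., 10000 are the first 1000 values; look at what comes next.
default_min = list(islice(ascending_sequence(10, 10, 10000), 1002))
min_ten = list(islice(ascending_sequence(10, 10, 10000, minvalue=10), 1002))
print(default_min[999:])  # [10000, 1, 11]
print(min_ten[999:])      # [10000, 10, 20]
```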
8.4 (a) What is a ‘table’ and what is a ‘tablespace’ in Oracle9i? Write an SQL statement to
create a tablespace, say MyTableSpace, with default storage. You can use a datafile
called ‘MyDataFile.dbf’ for this tablespace.
(b) Explain what a DUAL table is, where it is stored, and what it is useful for. Give an
example of its use.
DUAL is a table with one column, called DUMMY, containing one row with
the value 'X'. It is automatically created by Oracle along with the data dictionary.
DUAL is stored in the schema of the user SYS, but is accessible by the name
DUAL to all users.
DUAL is useful for computing a constant expression with the SELECT
statement. Because DUAL has only one row, the constant is returned only once.
For example, SELECT SYSDATE FROM DUAL; returns the current date as a single row.
8.5 (a) What is a named block in PL/SQL and how many types does PL/SQL support?
Named blocks are PL/SQL subprograms. PL/SQL has two types of subprograms:
procedures and functions.
Named block general structure
[DECLARE
Declaration statements]
BEGIN
Executable statements
[EXCEPTION
Exception handler statements]
END;
(c) Assume that the following tables are part of a Company database schema
Create a PL/SQL procedure object in your schema. Using a cursor, the procedure code
should retrieve the records of all customers and check their credit limits. If a
customer's credit limit is greater than 100000, the procedure should insert a record in the
table HighCredits.
BEGIN
Check_Credit;
END;
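The PL/SQL body of Check_Credit and the table definitions are not reproduced here. As an illustration of the same cursor pattern (open, fetch row by row, test each row, insert, close), the sketch below uses Python's sqlite3 module instead of PL/SQL. The column names custNo and creditLimit, the function name check_credit, and the data are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (custNo TEXT PRIMARY KEY, custName TEXT, creditLimit REAL);
CREATE TABLE HighCredits (custNo TEXT, creditLimit REAL);
INSERT INTO Customer VALUES ('C1', 'Acme', 250000), ('C2', 'Bolt', 50000),
                            ('C3', 'Crane', 120000);
""")

def check_credit(conn, threshold=100000):
    """Cursor-loop analogue of the Check_Credit procedure: fetch each customer
    row in turn and copy those above the threshold into HighCredits."""
    cur = conn.execute("SELECT custNo, creditLimit FROM Customer")  # like OPEN
    for cust_no, limit in cur:                                      # like FETCH in a loop
        if limit > threshold:
            conn.execute("INSERT INTO HighCredits VALUES (?, ?)", (cust_no, limit))
    cur.close()                                                     # like CLOSE
    conn.commit()

check_credit(conn)
high = conn.execute("SELECT custNo FROM HighCredits ORDER BY custNo").fetchall()
print(high)  # [('C1',), ('C3',)]
```

A single set-based INSERT INTO HighCredits SELECT ... WHERE creditLimit > 100000 would do the same work without a loop; the cursor version is shown only because the exercise asks for one.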
8.6 Assuming that a single-row form is created on the following Country table:
State the steps you would take to execute the following query using the form: List all the
African countries whose currency name is the pound and population greater than 20 million.
Order the result by country name.
Run the form. Press the Enter Query icon or select Enter from the Query menu;
Type a colon (:) in any of the fields and press the Execute Query icon or select Execute from
the Query menu;
In the Query Where dialog box, type the following expression:
Press the OK button. Scroll through the result using the arrow keys.
8.7 An Oracle database consists of logical and physical database structures. Describe each of the
following concepts and state to which structure they belong:
(a) Schema;
A data block is the smallest unit of storage that Oracle can use or allocate. The data
block size should be a multiple of the operating system's block size. A data block is a
logical data structure.
A redo log file is a physical data structure that is used to record all changes made to
the data for recovery purposes. Every Oracle database has a set of two or more redo
log files.
8.9 (a) SQL*Plus environment variables are set to default values when SQL*Plus is started.
State three ways by which users can change the default setting.
Users can change the default setting in three ways (using the SET command):
Interactively by resetting each variable;
Write all commands in a script file and run the file each time the user wishes to
change the setting;
Write all commands in the file login.sql. The file must be stored in the current
directory (folder) of SQL*Plus.
Connect to the database 'OracleDB' with the user name 'CustomerOrders' and
password 'customer';
CONNECT CustomerOrders/customer@OracleDB;
Use the COLUMN command to set the output format of the field 'StaffName' to
30 characters.
(1) List the name and granted roles of the current user;
(2) List name and type of all objects owned by the current user;
(3) List table name and the tablespace name to which the table is assigned of all
tables owned by the current user;
SELECT Emp_Seq.NEXTVAL
FROM SYS.DUAL;
where: custNo is the number of the customer who placed the order.
Write a script with the necessary formatting commands (e.g. COLUMN, BREAK,
TTITLE, etc.) and an SQL statement to create a report that lists each customer
number, each order number he/she placed, the total value of each order, the sum of
the total values of all the orders for each customer, and the grand total of all the
orders placed by all customers under the following heading:
8.10 (a) In PL/SQL what is a Cursor? When do we use an explicit Cursor? What do you do
when you declare a Cursor?
A cursor is a PL/SQL construct that lets you name a work area and access its
stored information.
We declare an explicit cursor for queries that return more than one row so that
we can process the rows individually.
When we declare an explicit cursor we name it and associate it with a specific
query.
(c) Assume that the following tables are part of a Library database system
Create a PL/SQL procedure object in your schema. Using a Cursor the procedure
code should be able to retrieve records of all employees and check in which city they
live. If an employee lives in Glasgow the procedure should insert a record in the
table GlasgowEmployees.
END;
(e) Assume that a single-row form is created on the Member table and you are using it to
enter data in the table. State the steps you would take to create a trigger that will fire
and insert the next card number in the member record when the record is saved.
Specify the type of trigger you will use and write the trigger code. Assume that a
sequence generator already exists.
SELECT Member_Seq.NEXTVAL
INTO :Member.cardNo
FROM SYS.dual;
8.11 (a) What is a SQL*Plus script? Why is it a good practice to create a log file while a
SQL*Plus script is executed, and how can the log file be created?
(b) What can a ‘database trigger’ be used for? Explain concisely what the code below
does.
table. Triggers enforce business rules, prevent incorrect values from being stored, and
reduce the need to perform checking and cleanup operations in each application.
The above trigger fires before an insert or update on st_customer. If the field
customer_insert_user of the row in question is empty, USER, SYSDATE, etc. are
inserted; otherwise, the original data is kept.
which capture, among others, the items that are ordered by customers. Explain what
the following code is and explain how it works by writing comments against each
line of the code:
RETURN NUMBER
IS
statusCode_int order.statusCode%TYPE:= UPPER(statusCode_in);
returnValue NUMBER;
BEGIN
OPEN salesCur (statusCode_int);
FETCH salesCur INTO returnValue;
IF salesCur%notfound
THEN
CLOSE salesCur;
RETURN NULL;
ELSE
CLOSE salesCur;
RETURN returnValue;
END IF;
END total_sales;
The student should be able to explain the main points of the function: the structure, the
meaning and type of the parameters, the cursor, variables, the embedded SQL statement
and the main executable body.
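To make the control flow of total_sales concrete, here is a Python/sqlite3 analogue of the same steps: upper-case the parameter, open a parameterised query, fetch one row, close the cursor on every path, and return None when no row is found. The declaration of salesCur is not shown in this guide, so the orders table, its amount column, and the query are hypothetical; the table is called orders because ORDER is a reserved word in SQL.

```python
import sqlite3
from typing import Optional

# Hypothetical stand-in for the salesCur declaration, which is not shown.
# GROUP BY makes the query return no row at all when no order has the given
# status, which is what lets the "not found" branch below actually occur
# (an aggregate without GROUP BY always returns exactly one row).
SALES_QUERY = (
    "SELECT SUM(amount) FROM orders "
    "WHERE statusCode = ? GROUP BY statusCode"
)


def total_sales(conn: sqlite3.Connection, status_code_in: str) -> Optional[float]:
    status_code = status_code_in.upper()             # UPPER(statusCode_in)
    cur = conn.execute(SALES_QUERY, (status_code,))  # OPEN salesCur(statusCode_int)
    try:
        row = cur.fetchone()                         # FETCH salesCur INTO returnValue
    finally:
        cur.close()                                  # CLOSE salesCur on every path
    if row is None:                                  # salesCur%NOTFOUND
        return None
    return row[0]


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(
        "CREATE TABLE orders (orderNo INTEGER PRIMARY KEY, statusCode TEXT, amount INTEGER);"
        "INSERT INTO orders VALUES (1, 'SHIPPED', 100), (2, 'SHIPPED', 50), (3, 'OPEN', 20);"
    )
    print(total_sales(db, "shipped"), total_sales(db, "cancelled"))
```

The try/finally plays the role of the two CLOSE statements in the PL/SQL version: whichever branch is taken, the cursor is released before the function returns.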
9.1 Compare and contrast the two manifestos: Object-Oriented Database System Manifesto based on
the object-oriented paradigm (Atkinson et al., 1989a) and the Third Generation Database System
Manifesto published by the Committee for Advanced DBMS Function (CADF).
See Appendix N.
9.2 Discuss how the new version of the SQL standard addresses object-oriented data management.
Give examples to illustrate your answers.
9.3 The on-going debate between proponents of the relational data model and proponents of the
object-oriented data model (if one truly exists) resembles that between the proponents of
network/hierarchic systems and relational systems a couple of decades ago. However, another
system is evolving that may have a significant impact on what the database management system
of the future may be and that is the Object-Relational Database Management System (ORDBMS).
Give your definition of an ORDBMS. Compare and contrast the ORDBMS and the Object-
Oriented Database Management System (OODBMS).
There is no single definition of an ORDBMS. The key feature is support for object-oriented features
within the confines of a relational system.
The comparison should be based around the main differences. An ORDBMS will probably use SQL3,
giving object-oriented features, object identity, triggers, etc., although changes will be necessary
to storage mechanisms (e.g. Quad-tree, K-D-B-tree), with resulting changes in the query optimizer.
An OODBMS will probably evolve with the ODMG standard and remain closely linked to a programming language.
9.4 Discuss how the proposed SQL:2011 standard will handle object identity and give an example of
its intended use.
Object identity in SQL:2011 is achieved through a referenceable base table. For example:
To limit the references to a specific table, a SCOPE clause can be added to the Staff table:
References can be used in path expressions that permit traversal of object references to
navigate from one row to another. To traverse a reference, the dereference operator (->) is
used.
where:
Pet contains details of pets and the pet number (petNo) is the key. The surgery where the
pet is registered is given by the surgery number (surgeryNo). A pet can only be registered
with one surgery at a time. The doctor who treats the pet is given by the doctorStaffNo.
Picture contains an image of the pet.
Staff contains details of staff and staff number (staffNo) is the key.
Surgery contains details of each surgery and the surgery number (surgeryNo) is the key.
Discuss how you would want a query optimiser within an ORDBMS to handle this type of query.
Use a relational algebra tree to illustrate your answer.
(i) The first expectation is that the query processor (QP) could flatten the SQL-defined function
StaffDoctors().
(ii) Secondly, the normal heuristic of pushing selection operations down past the joins
would need to be modified in this case: it would be better to run the external function,
which will be CPU-intensive, after the join operations, when it is applied to a smaller set of records.
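The cost argument in (ii) can be illustrated with a small simulation. The sketch below is hypothetical: ExpensivePredicate stands in for the CPU-intensive external function (it simply counts its invocations), and the join is modelled as a lookup of each pet's surgeryNo in a set of surgeries that satisfy the other, cheap conditions. Both plans return the same rows, but the function runs far fewer times when it is applied after the selective join.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pet:
    pet_no: int
    surgery_no: int
    picture: bytes


class ExpensivePredicate:
    """Stand-in for a CPU-intensive external function over Pet.picture.

    It only counts how often it is invoked; its test is arbitrary.
    """

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self, pet: Pet) -> bool:
        self.calls += 1
        return len(pet.picture) % 2 == 0


def predicate_first(pets, wanted_surgeries, pred):
    """Classic heuristic: apply the selection before the join."""
    selected = [p for p in pets if pred(p)]
    return [p for p in selected if p.surgery_no in wanted_surgeries]


def join_first(pets, wanted_surgeries, pred):
    """Modified plan: do the cheap, selective join first, then the costly test."""
    joined = [p for p in pets if p.surgery_no in wanted_surgeries]
    return [p for p in joined if pred(p)]


if __name__ == "__main__":
    pets = [Pet(i, i % 50, bytes(i % 7)) for i in range(1000)]
    wanted = {0, 1}  # surgeries that satisfy the other (cheap) conditions
    early, late = ExpensivePredicate(), ExpensivePredicate()
    assert predicate_first(pets, wanted, early) == join_first(pets, wanted, late)
    print(early.calls, late.calls)  # prints: 1000 40
```

A real optimiser would make this choice from cost estimates (the per-call cost of the function and the selectivity of the join) rather than from a fixed rule, which is exactly why the usual push-selections-down heuristic needs modifying here.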
(c) to retrieve records of all Ford cars from the ‘vehicles’ table.
9.7 In the Object Relational Model (ORM) an object-type has a name, attributes, and methods.
What is a method? And what is the principal use of methods?
Oracle's ORM supports three kinds of methods: member methods (including the special MAP and
ORDER methods), static methods, and constructor methods.
10.1 Explain the procedures and techniques needed to achieve a conceptual data model.
The answer should discuss the different stages that might be identified, such as requirements
analysis through identification of user views etc. The use of Entity–Relationship modeling and
normalization techniques should be explained, and the merging of views and validating the model
against transactions.
10.2 The application prototyping approach to software development gives users what they want by
employing principles used in other engineering disciplines, i.e. build a working model and use it.
Critically discuss the arguments for and against this approach showing how the software
development life cycle is consequently affected.
What are the necessary conditions for it to be successful and what are the dangers/problems that
could arise?
The arguments against prototyping tend to support the conventional approach to software
development, which maintains, for example, that all requirements can be specified, the static
model is correct, people can communicate effectively etc. Many of the arguments supporting
prototyping are the opposite, such as, people do not communicate effectively, they do not know
what they want necessarily etc. therefore it is better to build a model and show it to them, thus
obtaining feedback. The prototype effectively captures the requirements specification; therefore it
fits into the software development lifecycle after initial requirements analysis and before any
system development. It relies on appropriate software tools and requires commitment to
the approach from both developers and clients, who must be prepared to be involved. The
application must be suitable, and the prototype will go through several iterations. Problems include
the danger of sloppy work lacking any rigorous methodology, team boredom, and the prototype being seen as
the end product.
10.3 Explain what is meant by application prototyping and why the development of Fourth Generation
software should support it.
Many of the same points made in 10.2 also apply to this question. The answer should include that
application prototyping involves building models, is an iterative process that should be carried out
quickly, relies on a lot of user involvement and intends to capture the requirements definition.
Consequently, fourth generation tools support it because they permit components such as screens
and reports to be quickly designed, generated, and altered if necessary. In considering how
software development is helped, some reference should be made to the more traditional approach
to software development.
10.4 Explain why planning is important in the database lifecycle, highlighting particular points that
are important during the process.
Discuss the major functions of the Database Administration role, in particular indicating the
skills most relevant to the role.
The planning process should be briefly discussed, such as looking at the overall objectives, what
the current situation is like especially with regard to any particular problems that might be
prevalent. It is possible that a generalized data model for the whole enterprise may be produced,
which may then be subdivided into different functional areas giving priorities for development.
10.5 Discuss the importance of planning in the database lifecycle, detailing the significant stages in
the process.
Once the design phase of the database lifecycle has been initiated, the objective is to achieve a
conceptual database schema design.
The points made in the first part of 10.4 apply here also. In planning, current and future needs are
determined across the organization. Management commitment is crucial. Specific goals and
objectives are defined, business functions are defined, and an information model and an
implementation plan may be developed. The plan may be revised as the current situation changes. Overall costs
and benefits are assessed.
In commencing design, we are starting data modeling. The process of data modeling should be
explained including the use of various techniques to aid this process. The answer may include
reference to top-down and bottom-up approaches.
(a) A company called Perfect Pets runs a number of clinics. A clinic has many staff and a
member of staff manages at most one clinic (not all staff manage clinics). Each clinic
has a unique clinic number (clinicNo) and each member of staff has a unique staff
number (staffNo).
[ER diagram: Clinic (clinicNo) Has Staff (staffNo), multiplicities 1..1 and 1..*; Staff Manages Clinic, multiplicity 0..1]
(b) When a pet owner contacts a clinic, the owner’s pet is registered with the clinic. An
owner can own one or more pets, but a pet can only register with one clinic. Each owner
has a unique owner number (ownerNo) and each pet has a unique pet number
(petNo).
[ER diagram: PetOwner Owns Pet (petNo) and Clinic Registers Pet, each with multiplicities 1..1 and 1..*]
(c) When the pet comes along to the clinic, it undergoes an examination by a member of the
consulting staff. The examination may result in the pet being prescribed with one or
more treatments. Each examination has a unique examination number (examNo) and
each type of treatment has a unique treatment number (treatNo).
[ER diagram for (c): Pet (petNo) Undergoes Examination, multiplicities 1..1 and 1..*]
[ER diagram (combined): PetOwner (ownerNo) IsContactedBy Clinic (clinicNo); Clinic Has Staff (staffNo); Staff Manages Clinic; PetOwner Owns Pet (petNo); Clinic Registers Pet; Pet Undergoes Examination (examNo); Staff Performs Examination]
A regional council requires the design of a database system that can provide information on
all schools in the region. The requirements collection and analysis phase of the database
design process has provided the following data requirements for the schools database system.
(a) Every school has many pupils and many teachers. Each pupil is assigned to one school
and each teacher works for one school only.
(b) Each teacher teaches more than one subject, and a subject may be taught by more than
one teacher. The database should store the number of hours a teacher spends teaching a
subject. Data held on each teacher includes his/her National Insurance Number (NIN),
name (first and last), sex, and qualifications. The data held on each subject includes
subject title and type.
(c) Each pupil can study more than one subject and a subject may be studied by more than
one pupil. Data held on each pupil includes the pupil's code, name (first and last), sex,
and date of birth.
(d) Each school is managed by one of its teachers. The database should keep track of the
date he/she started managing the school. Data stored on each school includes the
school's code, name, address (town, street, and post code) and phone.
[ER diagram: Pupil (pCode) Study Subject (title); School (sCode) Employs Teacher (nin), multiplicities 1..1 and 1..*; Teacher Manages School, multiplicities 0..1 and 1..1, with attribute date]
The requirements collection and analysis phase of the database design process has provided
the following data requirements for a company called Reliable Rentals, which rents out
vehicles (cars and vans). The Company has various outlets (garage/offices) throughout
Glasgow. Each outlet has a number, address, phone number, fax number, and a manager who
supervises the operation of the garage and offices at each site.
Each site is allocated a stock of vehicles for hire; however, individual vehicles may be moved
between outlets, as required. Only the current location for each vehicle is stored. The
registration number uniquely identifies each vehicle for hire and is used when hiring a vehicle
to a client.
Clients may hire vehicles for various periods of time (minimum 1 day to maximum 1 year).
Each individual hire agreement between a client and the Company is uniquely identified using
a hire number. Information stored on the vehicles for hire includes: the vehicle registration
number, model, make, engine size, capacity, current mileage, date MOT due, daily hire rate,
and the current location (outlet) of each vehicle.
The data stored on a hire agreement includes the hire number, the client’s number, name,
address and phone number, date the client started the hire period, date the client wishes to
terminate the hire period, the vehicle registration number, model and make, the mileage before
and after the hire period. After each hire a member of staff checks the vehicle and notes any
fault(s). Fault report information on each vehicle is stored, which records the name of the
member of staff responsible for the check, date checked, whether fault(s) were found (yes or
no), the vehicle registration number, model, make and the current mileage.
The Company has two types of clients: personal and business. The data stored on personal
clients includes the client number, name (first and last name), home address, phone number,
date of birth and driving licence number. The data stored on business clients includes the
client number, name of business, type of business, address, telephone and fax numbers. The
client number uniquely identifies each client and the information stored relates to all clients
who have hired in the past and those currently hiring a vehicle.
Information is stored on the staff based at various outlets including: staff number, name (first
and last name), home address, home phone number, date of birth (DOB), sex, National
Insurance Number (NIN), date joined the Company, job title and salary. Each staff member is
associated with a single outlet but may be moved to an alternative outlet as required, although
only the current location for each member of staff is stored.
12.3 Create a conceptual schema for Reliable Rentals using the concepts of the Enhanced Entity–
Relationship (EER) model. To simplify the diagram, only show entities, relationships and the
primary key attributes. Specify the cardinality ratio and participation constraint of each
relationship type. State any assumptions you make when creating the EER model (if
necessary).
[EER diagram: Generates and Has relationships (multiplicities 1..1 and 1..*); Client specialized into PersonalClient and BusinessClient]
Chapters 14 - 15 Normalization
14.1 Explain the purpose of data normalization and describe the main steps in the normalization
process.
Normalization is the process of organizing data so as to avoid redundancy and inconsistency, thereby
preventing update anomalies. The main steps are: 1NF to remove repeating groups; 2NF to remove partial
dependencies; 3NF to remove transitive dependencies; and BCNF to remove the remaining anomalies arising
from functional dependencies whose determinant is not a candidate key.
filmNo  fTitle      dirNo  director    actorNo  aName            role            timeOnScreen
F1100   Happy Days  D101   Jim Alan    A1020    Sheila Toner     Jean Simson     15.45
                    D101   Jim Alan    A1222    Peter Watt       Tom Kinder      25.38
                    D101   Jim Alan    A1020    Sheila Toner     Silvia Simpson  22.56
F1109   Snake Bite  D076   Sue Ramsay  A1567    Steven McDonald  Tim Rosey       19.56
The table contains a repeating group; the details of actors/actresses are repeated for each
film. As a consequence, there are multiple values at the intersection of certain rows and
columns.
(b) The table shown above is susceptible to update anomalies. Provide examples of how
insertion, deletion, and modification anomalies could occur on this table.
Using the data in the table, the student should provide examples of how insertion,
deletion, and update anomalies could occur.
(d) Using the functional dependencies identified in part (c), describe and illustrate the
process of normalization by converting the table shown in Figure 1 to Boyce–Codd
Normal Form (BCNF). Identify the primary and foreign keys in your BCNF relations.
The process of normalization is shown in the figure below.
(e) Sketch an Entity–Relationship model for the data shown in table above.
[ER diagram: Director (dirNo {PK}, director) participates in the Makes relationship with multiplicity 1..1]
14.3 Briefly describe how the techniques of normalization and Entity–Relationship modeling can
be used to produce a set of relations with desirable properties.
14.4 Describe the purpose of normalizing data and identify the four most commonly used normal
forms.
The normalization process, as first proposed by Codd (1972a), takes a relational schema through a
series of tests to assess whether or not it belongs to a certain normal form. Normalizing data is the
process during which unsatisfactory relational schemas are decomposed by breaking up their
attributes into smaller relational schemas that possess desirable properties.
Codd proposed three normal forms, called first, second, and third normal forms. A stronger
definition of third normal form, called Boyce–Codd Normal Form (BCNF), was proposed later by Boyce and Codd.
All of these normal forms are based on the functional dependencies among the attributes of a relation.
Normalization therefore provides a formal framework for analyzing relation schemas based on their keys and
the functional dependencies among their attributes, together with a series of tests that can be carried out
on individual relation schemas so that the relational database can be normalized to any degree.
14.5 The table below lists customer/car hire data. Each customer may hire cars from various
outlets throughout Glasgow. A car is registered at a particular outlet and can be hired out to a
customer on a given date.
(a) The data in the table is susceptible to update anomalies. Provide examples of how
insertion, deletion, and modification anomalies could occur on this table.
Using the data shown in the table above, the student should provide examples of how
insertion, deletion, and update anomalies could occur.
(b) Identify the functional dependencies represented by the data shown in the table. State any
assumptions you make about the data.
(c) Using the functional dependencies identified in part (b), describe and illustrate the process of
normalization by converting Table 1 to Third Normal Form (3NF) relations. Identify the
primary and foreign keys in your 3NF relations.
[Figure: normalization of Table 1 from 1NF (functional dependencies fd1 to fd5) through 2NF to 3NF/BCNF, with primary and foreign keys marked]
(d) Sketch an Entity–Relationship model for the data shown in Table 1. Show all the entities,
relationships, and attributes.
[ER diagram: entities include Customer and Manufacturer (model {PK}, make); relationships Hires (attribute hireDate) and Makes, with multiplicities 1..1 and 1..*]
14.6 Examine the table shown below. This table represents the hours worked per week for
temporary staff at each branch of a company.
S4555 B002 City Center Plaza, Seattle, WA 98122 Ellen Layman Assistant 16
S4555 B004 16 – 14th Avenue, Seattle, WA 98128 Ellen Layman Assistant 9
(a) The table shown above is susceptible to update anomalies. Provide examples of how
insertion, deletion, and modification anomalies could occur on this table.
Using the data shown in this table, the student should provide examples of how insertion,
deletion, and update anomalies could occur.
(b) Identify the functional dependencies represented by the data shown in the table. State
any assumptions you make about the data (if necessary).
[Figure: the table above, with its composite primary key marked]
(c) Using the functional dependencies identified in part (b), describe and illustrate the
process of normalization by converting Table 1 to Third Normal Form (3NF)
relations. Identify the primary and foreign keys in your 3NF relations.
First Normal Form (1NF) is a relation in which the intersection of each row and column
contains one and only one value.
The student should indicate why the table is not in 2NF. Second Normal Form (2NF) is a relation that
is in first normal form and in which every non-primary-key attribute is fully functionally dependent on
the primary key.
Converting to 2NF
[Figure: the original table with its composite primary key marked]
Branch table (the branch number becomes the primary key):
B002   City Center Plaza, Seattle, WA 98122
B004   16 – 14th Avenue, Seattle, WA 98128
Staff table (the staff number becomes the primary key):
S4555  Ellen Layman   Assistant
S4612  Dave Sinclair  Assistant
Hours table (composite primary key; the staff number and the branch number each become a foreign key):
S4555  B002  16
S4555  B004   9
S4612  B002  14
S4612  B004  10
The student should then describe the process of normalizing the table to 2NF by removing the
partial dependencies.
The student should provide a diagrammatic illustration of the process of normalizing the table
from 1NF to 3NF.
The student should present the 3NF tables displaying the primary key, foreign key(s) and
alternate key(s) for each table.
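A quick way to confirm that this decomposition loses no information is to project the original rows onto the three tables and join them back. The sketch below does that in Python for the rows of question 14.6, rebuilt from the Staff, Branch, and hours data shown in the 2NF figure. The column order and names (staffNo, branchNo, branchAddress, name, position, hoursPerWeek) are assumptions, since the table header is not shown; functional_map refuses to build a table if the assumed functional dependency is violated.

```python
# Rows rebuilt from the Staff, Branch and hours data shown for question 14.6.
# Column order (names assumed): staffNo, branchNo, branchAddress, name, position, hoursPerWeek
ORIGINAL = {
    ("S4555", "B002", "City Center Plaza, Seattle, WA 98122", "Ellen Layman", "Assistant", 16),
    ("S4555", "B004", "16 – 14th Avenue, Seattle, WA 98128", "Ellen Layman", "Assistant", 9),
    ("S4612", "B002", "City Center Plaza, Seattle, WA 98122", "Dave Sinclair", "Assistant", 14),
    ("S4612", "B004", "16 – 14th Avenue, Seattle, WA 98128", "Dave Sinclair", "Assistant", 10),
}


def functional_map(pairs):
    """Map determinant -> dependent, raising ValueError if the FD is violated."""
    result = {}
    for key, value in pairs:
        if result.setdefault(key, value) != value:
            raise ValueError(f"functional dependency violated for {key!r}")
    return result


def decompose(rows):
    """Project onto the Staff, Branch and hours tables."""
    staff = functional_map((s, (name, pos)) for s, _, _, name, pos, _ in rows)
    branch = functional_map((b, addr) for _, b, addr, _, _, _ in rows)
    hours = functional_map(((s, b), h) for s, b, _, _, _, h in rows)
    return staff, branch, hours


def rejoin(staff, branch, hours):
    """Natural join of the three tables back into rows of the original table."""
    return {(s, b, branch[b], *staff[s], h) for (s, b), h in hours.items()}


if __name__ == "__main__":
    staff, branch, hours = decompose(ORIGINAL)
    print(len(staff), len(branch), len(hours))
    print(rejoin(staff, branch, hours) == ORIGINAL)
```

Getting the original rows back from the join is what makes the decomposition lossless; the address and the staff details are now each stored once, which removes the update anomalies discussed in part (a).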
(d) Create an Entity–Relationship (ER) model using the Unified Modeling Language
(UML) to represent the data shown in Figure 1. Your ER model should show all
entities, relationships, and attributes.
[ER diagram: Staff Has Branch, with hoursPerWeek as an attribute of the relationship]
14.7 The table below lists customer/car hire data. Each customer may hire cars from various
outlets throughout Glasgow. A car is registered at a particular outlet and can be hired out to a
customer on a given date.
date/time  instructorID  iFName  iLName  clientID  cFName  cLName  cAddress
(a) The data in the table is susceptible to update anomalies. Provide examples of how
insertion, deletion, and modification anomalies could occur on this table.
Using the data shown in the table above, the student should provide examples of how
insertion, deletion, and update anomalies could occur.
(b) Identify the functional dependencies represented by the data shown in the table. State
any assumptions you make about the data.
(c) Using the functional dependencies identified in part (b), describe and illustrate the
process of normalization by converting Table 1 to Third Normal Form (3NF) relations.
Identify the primary and foreign keys in your 3NF relations.
[Figure: functional dependencies fd1 to fd5 on the appointment table]
The primary key of the original relation is appointment number (appNo), and the relation is
already in 2NF. The original relation also has two alternate keys, (instructorID, dateTime) and
(clientID, dateTime). The functional dependencies fd2 to fd5 represent transitive dependencies
in the original relation and must be removed to create the following relations in 3NF.
14.8 The table below lists customer/car hire data. Each customer may hire cars from various
outlets throughout Glasgow. A car is registered at a particular outlet and can be hired out to a
customer on a given date.
jobID  pickupDateTime  driverID  dFName  dLName  clientID  cFName  cLName  cAddress
(a) The data in the table is susceptible to update anomalies. Provide examples of how
insertion, deletion, and modification anomalies could occur on this table.
Using the data shown in the table above, the student should provide examples of how
insertion, deletion, and update anomalies could occur.
(b) Identify the functional dependencies represented by the data shown in the table. State
any assumptions you make about the data.
(c) Using the functional dependencies identified in part (b), describe and illustrate the
process of normalization by converting Table 1 to Third Normal Form (3NF) relations.
Identify the primary and foreign keys in your 3NF relations.
[Figure: functional dependencies fd1 to fd3 on (jobID, pickupDateTime, driverID, dFName, dLName, clientID, cFName, cLName, cAddress)]
Students should identify that jobID is the PK for the table and then show the three functional
dependencies. Students may split the pickupDateTime column into separate Date and
Time columns. This is acceptable but not necessary.
The original table is in 2NF and is transformed into 3NF by removing functional dependencies
fd2 and fd3. The structure of the three tables created is shown below.
(jobID, pickupDateTime, driverID, clientID): PK jobID; FKs driverID and clientID (fd1)
(driverID, dFName, dLName): PK driverID (fd2)
(clientID, cFName, cLName, cAddress): PK clientID (fd3)
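As a sketch of how these three 3NF relations might be declared, the Python example below creates them in SQLite and reconstructs the original table with a join. The table names Job, Driver, and Client and the sample row values are hypothetical (the guide gives only the column names); SQLite enforces foreign keys only after PRAGMA foreign_keys = ON, so the sketch enables it.

```python
import sqlite3

SCHEMA_3NF = """
CREATE TABLE Driver (
    driverID TEXT PRIMARY KEY,                       -- fd2: driverID -> dFName, dLName
    dFName   TEXT,
    dLName   TEXT
);
CREATE TABLE Client (
    clientID TEXT PRIMARY KEY,                       -- fd3: clientID -> cFName, cLName, cAddress
    cFName   TEXT,
    cLName   TEXT,
    cAddress TEXT
);
CREATE TABLE Job (
    jobID          TEXT PRIMARY KEY,                 -- fd1: jobID -> remaining columns
    pickupDateTime TEXT,
    driverID       TEXT REFERENCES Driver(driverID),
    clientID       TEXT REFERENCES Client(clientID)
);
"""

# Joining the three tables gives back the columns of the original table.
ORIGINAL_VIEW = """
SELECT j.jobID, j.pickupDateTime, d.driverID, d.dFName, d.dLName,
       c.clientID, c.cFName, c.cLName, c.cAddress
FROM Job j
JOIN Driver d ON d.driverID = j.driverID
JOIN Client c ON c.clientID = j.clientID
"""


def build(conn: sqlite3.Connection) -> None:
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA_3NF)


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    build(db)
    db.execute("INSERT INTO Driver VALUES ('D1', 'Ann', 'Smith')")
    db.execute("INSERT INTO Client VALUES ('C1', 'Tom', 'Brown', '1 High St')")
    db.execute("INSERT INTO Job VALUES ('J1', '2024-01-01 09:00', 'D1', 'C1')")
    print(db.execute(ORIGINAL_VIEW).fetchall())
```

Each driver's and client's details are now stored once, and the foreign keys stop a job from referring to a driver or client that does not exist.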
14.9 The table below provides some sample data for an agency called Hotel Services, which supplies
part-time/temporary staff to hotels within Strathclyde region. The relation in Figure 2 lists the
number of hours worked by each member of staff at various hotels. The relation is in first normal form
(1NF). Assuming that a contract is for one hotel only but a member of staff may work in more than one
hotel on different contracts, identify the functional dependencies represented by the data in
this relation.
Contracts
NIN   contractNo  hours  eName      hNo  hLoc
1135  C1024       16     Smith, J.  H25  East Kilbride
1057  C1025       16     Green, D.  H4   Glasgow
1068  C1024       28     Green, D.  H25  East Kilbride
1135  C1025       16     Smith, J.  H4   Glasgow
1057  C1026       25     Green, D.  H15  Glasgow
1088  C1027       25     Crowe, M.  H25  East Kilbride
Functional dependencies
(NIN, contractNo) → {hours, eName, hNo, hLoc}
(NIN, hNo) → {contractNo, hours, eName, hLoc}
contractNo → {hNo, hLoc}
hNo → hLoc
NIN → eName
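The dependencies above can be checked against the sample rows mechanically: an FD X → Y is contradicted if two rows agree on X but differ on Y. The Python sketch below applies that test to the Contracts data exactly as given. Note that sample data can only refute a dependency, never prove it; the FDs listed above rest on the stated assumptions about contracts, hotels, and staff.

```python
# Sample rows of the Contracts relation exactly as given in the question.
COLUMNS = ("NIN", "contractNo", "hours", "eName", "hNo", "hLoc")
CONTRACTS = [
    (1135, "C1024", 16, "Smith, J.", "H25", "East Kilbride"),
    (1057, "C1025", 16, "Green, D.", "H4", "Glasgow"),
    (1068, "C1024", 28, "Green, D.", "H25", "East Kilbride"),
    (1135, "C1025", 16, "Smith, J.", "H4", "Glasgow"),
    (1057, "C1026", 25, "Green, D.", "H15", "Glasgow"),
    (1088, "C1027", 25, "Crowe, M.", "H25", "East Kilbride"),
]


def fd_holds(rows, columns, lhs, rhs):
    """True if no two rows agree on the lhs columns but differ on the rhs columns."""
    pos = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        key = tuple(row[pos[c]] for c in lhs)
        value = tuple(row[pos[c]] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


if __name__ == "__main__":
    claimed = [
        (("NIN", "contractNo"), ("hours", "eName", "hNo", "hLoc")),
        (("NIN", "hNo"), ("contractNo", "hours", "eName", "hLoc")),
        (("contractNo",), ("hNo", "hLoc")),
        (("hNo",), ("hLoc",)),
        (("NIN",), ("eName",)),
    ]
    for lhs, rhs in claimed:
        print(lhs, "->", rhs, fd_holds(CONTRACTS, COLUMNS, lhs, rhs))
```

The same function also shows which plausible-looking dependencies fail: contractNo does not determine hours (C1024 appears with 16 and 28), and eName does not determine NIN (two different staff are called Green, D.).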
14.10 Given the following relation schema and its functional dependencies:
RR (A, B, C, D, E, F, G, H)
[Figure: functional dependencies fd1 to fd5 on RR]
(b) Assuming that the relation is in first normal form (1NF), describe and illustrate the
process of normalising the relation schema to second (2NF) and third (3NF) normal
forms. Identify the primary and foreign keys in your schemas.
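[Figure: 2NF relations RR1 (A, B, C, G, H) and a second relation (A, F, D, E); 3NF relations RR1 (A, B, C, G) and RR11 (G, H) carrying fd2 and fd5, and RR2 (A, F, D) and a further relation (D, E) carrying fd1 and fd4; primary and foreign keys are marked]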
14.11 The table below represents data about employees of a company and the projects they work on.
An employee may work on one or more projects for a certain number of hours.
EmpProjects
Assuming that the functional dependencies in the relation in Figure 2 will hold for any
additional data, which of the following functional dependencies are true and which are false?
Justify your answer.
vi) projManager → projName: false; more than one projName for one projManager.
vii) empNo → hours: false; more than one value of hours for one empNo.
viii) (projName, empNo) → hours: true; one value of hours for each compound value of
(projName, empNo).
ix) managerDOB → projManager: false; more than one projManager for one
managerDOB.
14.12 Given the following relational schema and its functional dependencies:
RentalInfo (custNo, propertyNo, custName, pAddress, rentStart, ownerNo, OName)
[Figure: functional dependencies fd1 to fd5 on RentalInfo]
(b) Assuming that the relation is in first normal form (1NF), describe and illustrate the
process of normalising the relational schema to second (2NF) and third (3NF) normal
forms. Identify the primary and foreign keys in your third normal forms.
2NF: fd3 and fd4 violate the definition of 2NF, so the relation is decomposed.
[Figure: 2NF and 3NF relations. Recoverable groupings:
(custNo, propertyNo, rentStart), carrying fd1 and fd2, with custNo and propertyNo as foreign keys;
(custNo, custName), primary key custNo (fd4);
(propertyNo, pAddress, ownerNo), primary key propertyNo, foreign key ownerNo (fd3);
(ownerNo, OName), primary key ownerNo (fd5)]
14.13(a) Give a set of functional dependencies for the relational schema R(A, B, C, D) with
primary key AB under which R is in 1NF but not in 2NF.
(b) Give a set of functional dependencies for the relational schema R(A, B, C, D) with
primary key AB under which R is in 2NF but not in 3NF.
Any (non-trivial) dependency X → ρ, where X does not contain both A and B (so X is not a superkey),
X is not a proper subset of {A, B}, and ρ is neither A nor B, will violate 3NF but not 2NF; for example, C → D.
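The two cases can be checked mechanically with attribute closures. The Python sketch below assumes, as in the question, that AB is the only candidate key of R(A, B, C, D), so A and B are the only prime attributes. The FD sets used are illustrative choices, one of many possible answers: A → C creates a partial dependency (1NF but not 2NF), while C → D creates a transitive dependency (2NF but not 3NF).

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency from strings of attribute letters."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """All attributes determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_2nf(relation, key, fds):
    """No non-key attribute may depend on a proper subset of the key."""
    non_key = set(relation) - set(key)
    for size in range(1, len(key)):
        for part in combinations(sorted(key), size):
            if closure(part, fds) & non_key:
                return False
    return True


def is_3nf(relation, key, fds):
    """2NF, and every FD that determines a non-key attribute has a superkey on the left.

    Assumes `key` is the only candidate key, so its attributes are the only prime ones.
    """
    non_key = set(relation) - set(key)
    for lhs, rhs in fds:
        if (rhs - lhs) & non_key and not closure(lhs, fds) >= set(relation):
            return False
    return is_2nf(relation, key, fds)


if __name__ == "__main__":
    R, KEY = "ABCD", "AB"
    part_a = [fd("AB", "CD"), fd("A", "C")]  # A -> C is a partial dependency
    part_b = [fd("AB", "CD"), fd("C", "D")]  # C -> D is a transitive dependency
    print(is_2nf(R, KEY, part_a), is_2nf(R, KEY, part_b), is_3nf(R, KEY, part_b))
```

A dependency such as AC → D (with AB → C) behaves like C → D: its left-hand side is neither a superkey nor a proper subset of the key, so it breaks 3NF while leaving 2NF intact.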
14.15 Consider the following relational schemas and functional dependencies. Assume that the
relations have been produced from a relation ABCDEFGHI and that all known
dependencies for this relation are listed. State the strongest normal form for each one
and, if appropriate, decompose to BCNF.
14.16 Consider the relational schema R(A, B, C, D). For each of the following functional
dependencies identify the candidate key(s) for R and state its strongest normal form. If
appropriate, decompose to BCNF.
(a) B → C, C → A, C → D
(b) B → C, D → A
(c) ABC → D, D → A
(d) A → B, A → C, BC → D
(e) AB → C, AB → D, C → A, D → B
(a) Candidate key is B. Relation is in 2NF but not 3NF (C → A, C → D violate BCNF).
Decompose to: AC, BC, and CD.
(b) Candidate key is BD. Relation is in 1NF but not 2NF (both FDs violate BCNF).
Decompose to: AD, BC, and BD.
(c) Candidate keys are ABC and BCD. Relation is in 3NF but not BCNF (D → A and D
is not a key). Splitting R into AD and BCD gives BCNF relations, but this decomposition cannot
preserve the dependency ABC → D, so there is no dependency-preserving BCNF decomposition.
(d) Candidate key is A. Relation is in 2NF but not 3NF (because of FD BC → D). This
FD also violates BCNF since BC does not contain a key, so decompose to: BCD and
ABC.
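(e) Candidate keys are AB, BC, CD, and AD. Relation is in 3NF but not BCNF. First
decompose to AC and BCD; this does not preserve AB → C or AB → D, and BCD is still
not in BCNF because of FD D → B. Decompose BCD further to BD and CD, giving AC, BD, and CD;
relations ABC and ABD can then be added to preserve AB → C and AB → D, although they are not
themselves in BCNF.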
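The candidate keys and normal forms in (a) to (e) can be cross-checked with attribute closures. The Python sketch below enumerates candidate keys as minimal attribute sets whose closure is the whole relation, then classifies the relation. It relies on the standard result that, for the relation itself, testing only the given FDs is enough to decide BCNF and 3NF; the FD strings are transcribed from the question.

```python
from itertools import combinations


def parse_fds(text):
    """Parse 'B->C, C->A' into (lhs, rhs) pairs of frozensets of attribute letters."""
    fds = []
    for part in text.split(","):
        lhs, rhs = part.split("->")
        fds.append((frozenset(lhs.strip()), frozenset(rhs.strip())))
    return fds


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(relation, fds):
    """Minimal attribute sets whose closure is the whole relation."""
    attrs = sorted(relation)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) >= set(attrs):
                keys.append(candidate)
    return keys


def strongest_normal_form(relation, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF' for a relation and its FDs."""
    rel = set(relation)
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    # Non-trivial parts of FDs whose left-hand side is not a superkey.
    violations = [
        (lhs, rhs - lhs)
        for lhs, rhs in fds
        if rhs - lhs and not closure(lhs, fds) >= rel
    ]
    if not violations:
        return "BCNF"
    if all(rhs <= prime for _, rhs in violations):
        return "3NF"  # every offending FD determines only prime attributes
    non_prime = rel - prime
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & non_prime:
                    return "1NF"  # partial dependency on a candidate key
    return "2NF"


if __name__ == "__main__":
    cases = {
        "a": "B->C, C->A, C->D",
        "b": "B->C, D->A",
        "c": "ABC->D, D->A",
        "d": "A->B, A->C, BC->D",
        "e": "AB->C, AB->D, C->A, D->B",
    }
    for label, text in cases.items():
        fds = parse_fds(text)
        keys = ["".join(sorted(k)) for k in candidate_keys("ABCD", fds)]
        print(label, keys, strongest_normal_form("ABCD", fds))
```

Brute-force key enumeration is exponential in the number of attributes, which is fine for a four-attribute exercise but not for large schemas.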
Part 4 Methodology
Chapters 16 - 19 Methodology – Conceptual, Logical, and Physical Database Design
An Adult Education Department runs various courses during the daytime and evenings, and at
different times of the year. For example, ‘Spanish level 1’ is offered on Monday mornings,
Monday evenings or Wednesday evenings, and runs over 25 weeks from October to March. On
the other hand, ‘Introduction to Digging Up Your Ancestors’ only runs for 8 weeks, but is offered
on Tuesday or Wednesday evenings from October to December, January to March, and April to
June, with an optional field week in August.
There is always a maximum number of places for each course offering, which is dependent on the
individual tutor. For example, ‘Spanish level 1’ on Monday evenings may be limited to 20 places,
but on Wednesday evenings the limit may be 25. Each course offering is only taken by one tutor,
however, a tutor may take different courses, for example, ‘French level 1’ and ‘Spanish level 2’.
To guarantee enrolment, prospective students must pay the fee before the start of the first class.
There is a special reduction for those unemployed. All applicants are kept on a register for
subsequent mailshots.
For the diagram, four entities can be determined: Tutor, Course, Student, and Offering.
Course to Offering and Tutor to Offering are both 1:*, but Student to Offering is *:*.
[ER diagram: Student (matricNo {PK}) Enrols on Offering, multiplicities 1..* and 1..*, with fee as an attribute of Enrols]
The tables should be derived relatively easily from the model, the trickiest being
Offering. There also needs to be a table to represent the relationship between Student
and Offering, which will also contain the courseFee attribute.
(d) Show that your data model supports the following transactions:
(i) Add a new course to the database, prior to it being offered on any particular day or
from any particular date.
(ii) Enrol a new student on the ‘German level 2’ course that runs on Monday evenings
commencing October 10 1994.
Bearing in mind the above transactions, explain how the physical database design might
be influenced describing what changes you might make etc, and how the application
(and transactions) would be affected. Your comments can apply to both computerized
procedures and manual procedures.
The transactions can be shown in various ways. For example, a check could first be made to
ensure the course details have not already been entered. If separate tables exist for both Course and
Offering, simply insert the new details into Course (identifier, name, and level). If the tables are
combined, the table is simply checked for the existence of any records, and then the new details are
inserted, leaving the rest of the attributes null.
For the second transaction, check whether the Student details exist, and insert them if necessary. Find the
identifier for the Course, then search Offering to find the correct Offering identifier
for the day/time. Read the Registration records for that offering and, if there is space, insert a new
Registration record.
Alterations may be made to the physical design, such as adding a derived field to Offering
indicating how many students are enrolled, which alters the procedures. To aid searching, set up secondary
indexes on the required fields.
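The second transaction can be sketched as a single database transaction. The Python/sqlite3 example below follows the steps described above: insert the student if necessary, find the offering for the course, day, time, and start date, count the existing Registration rows against the offering's place limit, and insert a new Registration only if there is space. Apart from matricNo, courseFee, and Registration, every table and column name is hypothetical. In a multi-user system the capacity check and the insert must run inside one transaction with suitable locking, and a derived enrolment count on Offering (as suggested above) would replace the COUNT query.

```python
import sqlite3

# Hypothetical schema: only matricNo, courseFee and Registration come from the text.
SCHEMA = """
CREATE TABLE Course (
    courseNo INTEGER PRIMARY KEY,
    title    TEXT NOT NULL,
    level    INTEGER
);
CREATE TABLE Offering (
    offeringNo INTEGER PRIMARY KEY,
    courseNo   INTEGER NOT NULL REFERENCES Course(courseNo),
    day        TEXT,
    timeSlot   TEXT,
    startDate  TEXT,
    maxPlaces  INTEGER NOT NULL
);
CREATE TABLE Student (
    matricNo TEXT PRIMARY KEY,
    name     TEXT
);
CREATE TABLE Registration (
    matricNo   TEXT    REFERENCES Student(matricNo),
    offeringNo INTEGER REFERENCES Offering(offeringNo),
    courseFee  REAL,
    PRIMARY KEY (matricNo, offeringNo)
);
"""


def enrol(conn, matric_no, name, title, day, time_slot, start_date, fee):
    """Enrol a student on one offering; return False if the offering is full."""
    with conn:  # one transaction: commit on normal exit, roll back on exception
        # 1. Insert the student only if not already on the register.
        conn.execute(
            "INSERT OR IGNORE INTO Student (matricNo, name) VALUES (?, ?)",
            (matric_no, name),
        )
        # 2. Find the offering for this course, day, time and start date.
        row = conn.execute(
            "SELECT o.offeringNo, o.maxPlaces FROM Offering o "
            "JOIN Course c ON c.courseNo = o.courseNo "
            "WHERE c.title = ? AND o.day = ? AND o.timeSlot = ? AND o.startDate = ?",
            (title, day, time_slot, start_date),
        ).fetchone()
        if row is None:
            raise LookupError("no such offering")
        offering_no, max_places = row
        # 3. Count existing registrations and check for space.
        (taken,) = conn.execute(
            "SELECT COUNT(*) FROM Registration WHERE offeringNo = ?", (offering_no,)
        ).fetchone()
        if taken >= max_places:
            return False
        # 4. Record the enrolment.
        conn.execute(
            "INSERT INTO Registration (matricNo, offeringNo, courseFee) VALUES (?, ?, ?)",
            (matric_no, offering_no, fee),
        )
    return True
```

For example, enrolling on the Monday evening 'German level 2' offering starting on 1994-10-10 would call enrol(conn, matric_no, name, "German level 2", "Monday", "evening", "1994-10-10", fee), with the day and time values stored in whatever form the real Offering table uses.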
The BusyBee Cleaning Company specializes in providing cleaning services for both domestic and
commercial clients. Each type of client has a set of requirements. For example, The Cardboard
Box Company requires cleaning services from Monday to Friday 7am until 9am and 5pm until
7pm each day, but P. Nuttall only requires cleaning services on a Wednesday from 10am until
1pm.
Whenever a new client is taken on, a BusyBee administrator assesses how many cleaning staff are
required for the premises prior to assigning any staff to the job. Note that this is the ideal number;
it may differ in practice. In addition, the administrator assesses whether any specialist
equipment is required and when. For example, three industrial floor cleaners may be needed on
two out of five occasions for one commercial client.
The cleaning staff work in groups of six, with a supervisor to oversee the work done. The other
staff are administrative staff who manage the day-to-day office work including visiting new
clients and ensuring the specialist equipment is properly maintained.
Four main entities can be identified: Client, Requirement, Equipment, and Staff. Staff can
form a superclass with Cleaner and Admin forming subclasses. There is a recursive relationship
(1:*) on Cleaner. Cleaner to Requirement and Equipment to Requirement are both *:*
relationships (Assigned and Booked), whereas Client to Requirement is 1:*. This represents
the core of the problem.
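[Figure: EER diagram for the BusyBee Cleaning Company, showing Staff and Equipment with the Supervises (multiplicities 0..6 and 1..1) and UsedFor relationships.]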
In deriving the tables, the primary keys should be chosen judiciously, without separate attributes
being devised for all of them. So, for example, Client, Cleaner, Admin, Staff, and Equipment
will have a reference number each. Requirement will have the Client reference number with day
and time from. Assigned will have Client reference number with day, time from and Cleaner
reference number. Booked will have Client reference number with day, time from and equipment
reference number.
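A minimal sketch of these keys in SQLite, via Python's sqlite3 module. All column names (clientNo, day, timeFrom, staffNo, equipNo, and so on) are hypothetical stand-ins for the reference numbers and times described above. The composite primary key rejects a second requirement for the same client, day, and start time, and the composite foreign keys reject an assignment to a requirement that does not exist.

```python
import sqlite3

# Hypothetical column names; SQLite checks foreign keys only when this pragma is on.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Client (clientNo TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Cleaner (staffNo TEXT PRIMARY KEY REFERENCES Staff(staffNo),
                      supervisorNo TEXT REFERENCES Cleaner(staffNo));  -- recursive 1:*
CREATE TABLE Equipment (equipNo TEXT PRIMARY KEY, description TEXT);
-- A requirement is identified by the client plus the day and start time.
CREATE TABLE Requirement (
    clientNo TEXT REFERENCES Client(clientNo), day TEXT, timeFrom TEXT,
    timeTo TEXT, staffRequired INTEGER,
    PRIMARY KEY (clientNo, day, timeFrom));
-- Assigned and Booked resolve the two *:* relationships.
CREATE TABLE Assigned (
    clientNo TEXT, day TEXT, timeFrom TEXT, staffNo TEXT REFERENCES Cleaner(staffNo),
    PRIMARY KEY (clientNo, day, timeFrom, staffNo),
    FOREIGN KEY (clientNo, day, timeFrom) REFERENCES Requirement(clientNo, day, timeFrom));
CREATE TABLE Booked (
    clientNo TEXT, day TEXT, timeFrom TEXT, equipNo TEXT REFERENCES Equipment(equipNo),
    quantity INTEGER,
    PRIMARY KEY (clientNo, day, timeFrom, equipNo),
    FOREIGN KEY (clientNo, day, timeFrom) REFERENCES Requirement(clientNo, day, timeFrom));
""")

with conn:
    conn.execute("INSERT INTO Client VALUES ('C1', 'P. Nuttall')")
    conn.execute("INSERT INTO Requirement VALUES ('C1', 'Wed', '10:00', '13:00', 1)")
    conn.execute("INSERT INTO Staff VALUES ('S1', 'J. Smith')")
    conn.execute("INSERT INTO Cleaner VALUES ('S1', NULL)")


def rejected(sql):
    """Return True if the statement violates a key or foreign key constraint."""
    try:
        with conn:
            conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Same client, day, and start time: duplicate primary key.
duplicate_rejected = rejected("INSERT INTO Requirement VALUES ('C1', 'Wed', '10:00', '12:00', 2)")
# No Thursday requirement exists for this client: foreign key violation.
orphan_rejected = rejected("INSERT INTO Assigned VALUES ('C1', 'Thu', '10:00', 'S1')")
```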
(c) Demonstrate that your model supports the following transactions and explain how they
might influence physical database design:
(i) For a specific client, produce a schedule of the cleaning times together with the
number of staff assigned, and details of any specialist equipment required.
(ii) For a specific supervisor, produce a list of staff on their team together with their
assignment details.
Working through the transactions often shows whether the data model is reasonable or flawed. For the first
transaction you could assume a reference number is used to access the client table, then each
requirement can be accessed in turn. For each requirement, the assigned table will be accessed
repeatedly, to count the number of staff assigned, and the booked table will be accessed repeatedly
and linked to the equipment table using the equipment reference number.
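Transaction (i) can be sketched as a single query with one correlated subquery per repeated access: one counts the Assigned rows and one collects the Booked equipment. The column names and sample rows are hypothetical, and constraints are omitted to keep the example short.

```python
import sqlite3

# Minimal copies of the BusyBee tables (hypothetical columns, constraints omitted).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Requirement (clientNo TEXT, day TEXT, timeFrom TEXT, timeTo TEXT, staffRequired INTEGER);
CREATE TABLE Assigned (clientNo TEXT, day TEXT, timeFrom TEXT, staffNo TEXT);
CREATE TABLE Equipment (equipNo TEXT, description TEXT);
CREATE TABLE Booked (clientNo TEXT, day TEXT, timeFrom TEXT, equipNo TEXT, quantity INTEGER);
INSERT INTO Requirement VALUES ('C2', 'Mon', '07:00', '09:00', 2), ('C2', 'Mon', '17:00', '19:00', 2);
INSERT INTO Assigned VALUES ('C2', 'Mon', '07:00', 'S1'), ('C2', 'Mon', '07:00', 'S2'),
                            ('C2', 'Mon', '17:00', 'S1');
INSERT INTO Equipment VALUES ('E1', 'Industrial floor cleaner');
INSERT INTO Booked VALUES ('C2', 'Mon', '17:00', 'E1', 3);
""")

SCHEDULE = """
SELECT r.day, r.timeFrom, r.timeTo, r.staffRequired,
       -- actual staff: one Assigned row per cleaner for this requirement
       (SELECT COUNT(*) FROM Assigned a
         WHERE a.clientNo = r.clientNo AND a.day = r.day AND a.timeFrom = r.timeFrom) AS staffAssigned,
       -- specialist equipment booked for this requirement (NULL if none)
       (SELECT group_concat(e.description || ' x' || b.quantity, ', ')
          FROM Booked b JOIN Equipment e ON e.equipNo = b.equipNo
         WHERE b.clientNo = r.clientNo AND b.day = r.day AND b.timeFrom = r.timeFrom) AS equipment
FROM Requirement r
WHERE r.clientNo = ?
ORDER BY r.day, r.timeFrom
"""

schedule = conn.execute(SCHEDULE, ("C2",)).fetchall()
for row in schedule:
    print(row)
```

Showing staffRequired beside the actual count makes any gap between the ideal and the assigned number of staff visible, as in the second row. Both subqueries look rows up by (clientNo, day, timeFrom), so an index on those columns, such as the one a composite primary key provides, would serve them.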
For the second transaction, a similar procedure is carried out. The difference is that there is a
recursive relationship involved, but if the table structures are correct, this should not pose a
problem. In examining the transactions, refinements may become obvious, such as using derived
attributes, posting some non-key attributes from one table to another, or creating secondary
indexes.
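For transaction (ii), the recursive relationship becomes a self-referencing column on Cleaner, so the team is simply the cleaners whose supervisor column matches the given supervisor. The supervisorNo column and the sample rows are hypothetical; a LEFT JOIN keeps team members who have no assignments yet.

```python
import sqlite3

# Hypothetical columns; supervisorNo on Cleaner implements the recursive Supervises relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Cleaner (staffNo TEXT PRIMARY KEY, supervisorNo TEXT REFERENCES Cleaner(staffNo));
CREATE TABLE Assigned (clientNo TEXT, day TEXT, timeFrom TEXT, staffNo TEXT);
INSERT INTO Staff VALUES ('S0', 'Sam'), ('S1', 'Ann'), ('S2', 'Bob');
INSERT INTO Cleaner VALUES ('S0', NULL), ('S1', 'S0'), ('S2', 'S0');
INSERT INTO Assigned VALUES ('C1', 'Wed', '10:00', 'S1'), ('C2', 'Mon', '07:00', 'S1');
""")

TEAM = """
SELECT s.staffNo, s.name, a.clientNo, a.day, a.timeFrom
FROM Cleaner c
JOIN Staff s ON s.staffNo = c.staffNo           -- team member's details
LEFT JOIN Assigned a ON a.staffNo = c.staffNo   -- keep members with no assignment
WHERE c.supervisorNo = ?                        -- follow the recursive relationship
ORDER BY s.name, a.day, a.timeFrom
"""

team = conn.execute(TEAM, ("S0",)).fetchall()
for row in team:
    print(row)
```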
The requirements collection and analysis phase of the database design process has provided
the following data requirements for a company called Reliable Rentals, which rents out
vehicles (cars and vans). The Company has various outlets (garage/offices) throughout
Glasgow. Each outlet has a number, address, phone number, fax number, and a manager who
supervises the operation of the garage and offices at each site.
Each site is allocated a stock of vehicles for hire; however, individual vehicles may be moved
between outlets, as required. Only the current location for each vehicle is stored. The
registration number uniquely identifies each vehicle for hire and is used when hiring a vehicle
to a client.
Clients may hire vehicles for various periods of time (minimum 1 day to maximum 1 year).
Each individual hire agreement between a client and the Company is uniquely identified using
a hire number. Information stored on the vehicles for hire includes the vehicle registration
number, model, make, engine size, capacity, current mileage, date MOT due, daily hire rate,
and the current location (outlet) of each vehicle.
The data stored on a hire agreement includes the hire number, the client’s number, name,
address, and phone number, date the client started the hire period, date the client wishes to
terminate the hire period, the vehicle registration number, model and make, the mileage before
and after the hire period. After each hire a member of staff checks the vehicle and notes any
fault(s). Fault report information on each vehicle is stored, which records the name of the
member of staff responsible for the check, date checked, whether fault(s) were found (yes or
no), the vehicle registration number, model, make and the current mileage.
The Company has two types of clients: personal and business. The data stored on personal
clients includes the client number, name (first and last name), home address, phone number,
date of birth, and driving licence number. The data stored on business clients includes the
client number, name of business, type of business, address, telephone, and fax numbers. The
client number uniquely identifies each client and the information stored relates to all clients
who have hired in the past and those currently hiring a vehicle.
Information is stored on the staff based at various outlets including: staff number, name (first
and last name), home address, home phone number, date of birth (DOB), sex, National
Insurance Number (NIN), date joined the Company, job title, and salary. Each staff member is
associated with a single outlet but may be moved to an alternative outlet as required, although
only the current location for each member of staff is stored.
16.3 (a) Create a conceptual schema for Reliable Rentals using the concepts of the Enhanced
Entity–Relationship (EER) model. To simplify the diagram, only show entities,
relationships and the primary key attributes. Specify the cardinality ratio and participation
constraint of each relationship type. State any assumptions you make when creating the
EER model (if necessary).
[Figure: EER diagram for Reliable Rentals, including the Generates and Has relationships and the PersonalClient and BusinessClient subclasses.]
(b) Map your high-level data model to a set of relational tables that represent the
entity and relationship types. Identify primary, alternate, and foreign keys.
Staff (staffNo, fName, lName, address, telNo, DOB, sex, NIN, dateJoined, jobTitle,
salary, outletNo)
Primary Key staffNo
Alternate Key NIN
Foreign Key outletNo references Outlet(outletNo)
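One way to declare the Staff keys in SQL, sketched here with SQLite through Python's sqlite3 module. The data types, the cut-down Outlet table, and the NOT NULL on outletNo (reflecting that each member of staff is associated with a single outlet) are assumptions. The alternate key becomes a UNIQUE constraint; note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

# Data types and the cut-down Outlet table are assumptions for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript("""
CREATE TABLE Outlet (outletNo TEXT PRIMARY KEY);  -- other Outlet columns omitted
CREATE TABLE Staff (
    staffNo    TEXT PRIMARY KEY,                           -- primary key
    fName TEXT, lName TEXT, address TEXT, telNo TEXT, DOB TEXT, sex TEXT,
    NIN        TEXT NOT NULL UNIQUE,                       -- alternate key
    dateJoined TEXT, jobTitle TEXT, salary REAL,
    outletNo   TEXT NOT NULL REFERENCES Outlet(outletNo)   -- foreign key
);
INSERT INTO Outlet VALUES ('O1');
""")


def try_insert(staff_no, nin, outlet_no):
    """Insert a minimal Staff row; return None on success or the constraint message."""
    try:
        with conn:
            conn.execute("INSERT INTO Staff (staffNo, NIN, outletNo) VALUES (?, ?, ?)",
                         (staff_no, nin, outlet_no))
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


ok = try_insert("S1", "NIN-001", "O1")           # accepted
dup_nin = try_insert("S2", "NIN-001", "O1")      # same NIN: alternate key violated
bad_outlet = try_insert("S3", "NIN-003", "O9")   # outlet O9 does not exist
```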
Note: The assumption is that a vehicle is checked for faults only once on a given date.
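Under this assumption, the vehicle registration number and the date checked together can identify a fault report. The FaultReport table and its column names below are a hypothetical sketch, since the full mapping is not reproduced here.

```python
import sqlite3

# Hypothetical FaultReport table keyed on (vehicleRegNo, dateChecked).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE FaultReport (
    vehicleRegNo TEXT NOT NULL,
    dateChecked  TEXT NOT NULL,
    staffNo      TEXT,
    faultsFound  TEXT CHECK (faultsFound IN ('Y', 'N')),
    PRIMARY KEY (vehicleRegNo, dateChecked))""")

with conn:
    conn.execute("INSERT INTO FaultReport VALUES ('AB12 CDE', '2024-03-01', 'S1', 'N')")
try:
    with conn:  # a second check of the same vehicle on the same date
        conn.execute("INSERT INTO FaultReport VALUES ('AB12 CDE', '2024-03-01', 'S2', 'Y')")
    second_check_allowed = True
except sqlite3.IntegrityError:
    second_check_allowed = False
with conn:  # the next day's check is a new key value
    conn.execute("INSERT INTO FaultReport VALUES ('AB12 CDE', '2024-03-02', 'S2', 'Y')")
```

If a vehicle could be checked more than once a day, the key would need a time component or a separate report number.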
16.4 Map the high-level data model shown below to a set of relational tables. Identify
primary, alternate, and foreign keys.
[Figure: EER diagram with entities Employee, Department, Project, and Dependent, and relationships Supervises, WorksFor, Controls, RelatedTo, and WorksOn (with attribute hours).]
Employee (SSN, fName, MI, lName, address, birthDate, sex, salary, number)
Primary Key SSN
Foreign Key number references Department (number)
(Alternatively, do not create a SupervisoryTeam relation; simply add supSSN to the Employee relation.)
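The supSSN alternative can be sketched as follows: the Supervises relationship becomes a nullable foreign key from Employee to itself. Only the columns needed for the example are included, and the sample rows are hypothetical.

```python
import sqlite3

# Only the columns needed here; sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Department (number INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Employee (
    SSN    TEXT PRIMARY KEY,
    fName  TEXT, lName TEXT,
    number INTEGER NOT NULL REFERENCES Department(number),  -- WorksFor
    supSSN TEXT REFERENCES Employee(SSN)                     -- Supervises; NULL if none
);
INSERT INTO Department VALUES (1, 'Research');
INSERT INTO Employee VALUES ('111', 'Ann', 'Lee', 1, NULL), ('222', 'Raj', 'Patel', 1, '111');
""")

# Self-join: each employee alongside their supervisor's first name.
pairs = conn.execute("""
    SELECT e.fName, s.fName
    FROM Employee e LEFT JOIN Employee s ON s.SSN = e.supSSN
    ORDER BY e.SSN""").fetchall()

try:
    with conn:
        conn.execute("INSERT INTO Employee VALUES ('333', 'Zoe', 'Ng', 1, '999')")  # no such supervisor
    unknown_supervisor_rejected = False
except sqlite3.IntegrityError:
    unknown_supervisor_rejected = True
```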
16.5 Map the high-level data model shown below to a set of relational tables. Identify
primary, alternate, and foreign keys.
[Figure: EER diagram showing Company (name {PK}, phone {AK}), Equipment (equipID {PK}, price), and the fault subclasses ElectFault (ePart) and MechFault (mPart).]
Company(name, phone)
Equipment(equipID, price, CName)
Engineer(engNo, fName, lName, CName, dateJoined)
Fault(faultID, description, equipID)
ElectFault(faultID, ePart)
MechFault(faultID, mPart)
Repairs(engNo, faultID, date, hours)
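The superclass/subclass part of this mapping can be sketched in SQLite: each subclass table shares faultID with Fault, so a fault's full details are recovered by outer-joining the subclass tables. The key and foreign key choices here are assumptions (the tables above do not mark them), Company, Engineer, and Repairs are omitted, and the data is made up.

```python
import sqlite3

# Assumed keys: Fault(faultID); each subclass reuses faultID as its key and as a foreign key.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Equipment (equipID TEXT PRIMARY KEY, price REAL, CName TEXT);
CREATE TABLE Fault (faultID TEXT PRIMARY KEY, description TEXT,
                    equipID TEXT NOT NULL REFERENCES Equipment(equipID));
CREATE TABLE ElectFault (faultID TEXT PRIMARY KEY REFERENCES Fault(faultID), ePart TEXT);
CREATE TABLE MechFault (faultID TEXT PRIMARY KEY REFERENCES Fault(faultID), mPart TEXT);
INSERT INTO Equipment VALUES ('EQ1', 500.0, 'Acme');
INSERT INTO Fault VALUES ('F1', 'No power', 'EQ1'), ('F2', 'Jammed gear', 'EQ1');
INSERT INTO ElectFault VALUES ('F1', 'Fuse');
INSERT INTO MechFault VALUES ('F2', 'Gearbox');
""")

# Rebuild each fault with its subclass and subclass-specific part.
faults = conn.execute("""
    SELECT f.faultID,
           CASE WHEN e.faultID IS NOT NULL THEN 'electrical'
                WHEN m.faultID IS NOT NULL THEN 'mechanical' END AS kind,
           COALESCE(e.ePart, m.mPart) AS part
    FROM Fault f
    LEFT JOIN ElectFault e ON e.faultID = f.faultID
    LEFT JOIN MechFault m ON m.faultID = f.faultID
    ORDER BY f.faultID""").fetchall()
```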
A practice called Perfect Pets provides private health care for domestic pets throughout America. This
service is provided through various clinics located in the main cities of America. The Director of
Perfect Pets is concerned that there is a lack of communication within the practice and particularly in
the sharing of information and resources across the various clinics. To resolve this problem the Director
has requested the creation of a centralized database system to assist in the more effective and efficient
running of the practice. The Director has provided the following description of the current system.
Data Requirements
Veterinary Clinics
Perfect Pets has many veterinary clinics located in the main cities of America. The details of each
clinic include the clinic number, clinic address (consisting of the street, city, state, and zipcode), and the
telephone and fax numbers. Each clinic has a Manager and a number of staff (for example, vets, nurses,
secretaries, cleaners). The clinic number is unique throughout the practice.
Staff
The details stored on each member of staff include the staff number, name (first and last), address
(street, city, state, and zipcode), telephone number, date of birth, sex, social security number (SSN),
position, and current annual salary. The staff number is unique throughout the practice.
Pet Owners
When a pet owner first contacts a clinic of Perfect Pets the details of the pet owner are recorded, which
include an owner number, owner name (first name and last name), address (street, city, state, and
zipcode), and home telephone number. The owner number is unique to a particular clinic.
Pets
The details of the pet requiring treatment are noted, which include a pet number, pet name, type of pet,
description, date of birth (if unknown, an approximate date is recorded), date registered at clinic,
current status (alive/deceased), and the details of the pet owner. The pet number is unique to a
particular clinic.
Examinations
When a sick pet is brought to a clinic, the vet on duty examines the pet. The details of each
examination are recorded and include an examination number, the date and time of the examination, the
name of the vet, the pet number, pet name, and type of pet, and a full description of the examination
results. The examination number is unique to a particular clinic. As a result of the examination, the vet
may propose treatment(s) for the pet.
Treatments
Perfect Pets provides various treatments for all types of pets. These treatments are provided at a
standard rate across all clinics. The details of each treatment include a treatment number, a full
description of the treatment, and the cost to the pet owner. For example, treatments include:
A standard rate of $20.00 is charged for each examination, which is recorded as a type of treatment.
The treatment number uniquely identifies each type of treatment and is used by all Perfect Pets clinics.
Pet Treatments
Based on the results of the examination of a sick pet, the vet may propose one or more types of
treatment. For each type of treatment, the information recorded includes the examination number and
date, the pet number, name and type, treatment number, description, quantity of each type of treatment,
and date the treatment is to begin and end. Any additional comments on the provision of each type of
treatment are also recorded.
Pens
In some cases, it’s necessary for a sick pet to be admitted to the clinic. Each clinic has 20 – 30 animal
pens, each capable of holding between one and four pets. Each pen has a unique pen number, capacity,
and status (an indication of availability). The sick pet is allocated to a pen and the details of the pet, any
treatment(s) required by the pet, and any additional comments about the care of the pet are recorded.
The details of the pet’s stay in the pen are also noted, which include a pen number, and the date the pet
was put into and taken out of the pen. Depending on the pet’s illness, there may be more than one pet in
a pen at the same time. The pen number is unique to a particular clinic.
Invoices
The pet owner is responsible for the cost of the treatment given to a pet. The owner is invoiced for the
treatment arising from each examination, and the details recorded on the invoice include the invoice
number, invoice date, owner number, owner name and full address, pet number, pet name, and the
details of the treatment given. The invoice provides the cost for each type of treatment and the total cost
of all treatments given to the pet. Additional data is also recorded on the payment of the invoice,
including the date the invoice was paid and the method of payment (for example, check, cash, visa).
The invoice number is unique throughout the practice.
Each clinic also maintains a stock of pharmaceutical supplies (for example, antibiotics, pain killers). The
details of pharmaceutical supplies include a drug number and name, description, dosage, method of
administration, quantity in stock (this is ascertained on the last day of each month), reorder level, reorder
quantity, and cost. The drug number uniquely identifies each type of pharmaceutical supply and is
used throughout the practice.
Appointments
If the pet needs to be seen by the vet at a later date, the owner and pet are given an appointment. The
details of an appointment are recorded and include an appointment number, owner number, owner
name (first name and last name), home telephone number, the pet number, pet name, type of pet, and
the appointment date and time. The appointment number is unique to a particular clinic.
Transaction Requirements
Listed below are the transactions that should be supported by the Perfect Pets database application.
2. The database should be capable of supporting the following example query transactions:
a) Present a report listing the Manager’s name, clinic address, and telephone number for
each clinic, ordered by clinic number.
b) Present a report listing the names and owner numbers of pet owners with the details of
their pets.
c) List the historic details of examinations for a given pet.
d) List the details of the treatments provided to a pet based on the results of a given
examination.
e) List the details of an unpaid invoice for a given pet owner.
f) Present a report on invoices that have not been paid by a given date, ordered by invoice
number.
g) List the details of pens available on a given date for clinics in the New York area, ordered
by clinic number.
h) Present a report that provides the total monthly salary for staff at each clinic, ordered by
clinic number.
i) List the maximum, minimum and average cost for treatments.
j) List the total number of pets in each pet type, ordered by pet type.
k) Present a report of the names and staff numbers for all vets and nurses over 50 years old,
ordered by staff name.
l) List the appointments for a given date and for a particular clinic.
m) List the total number of pens in each clinic, ordered by clinic number.
n) Present a report of the details of invoices for pet owners from 1997 to 1999, ordered
by invoice number.
o) List the pet number, name, and description of pets owned by a particular owner.
p) Present a report listing the pharmaceutical supplies that need to be reordered at each
clinic, ordered by clinic number.
q) List the total cost of the non-surgical and surgical supplies currently in stock at each
clinic, ordered by clinic number.
16.6 (a) Create a conceptual schema for Perfect Pets using the concepts of the Enhanced Entity–
Relationship (EER) model. To simplify the diagram, only show entities, relationships and the
primary key attributes. Specify the cardinality ratio and participation constraint of each
relationship type. State any assumptions you make when creating the EER model (if
necessary).
The student should show that each of the transactions identified above is supported by his/her
conceptual data model.
(c) Map your high-level data model to a set of relational tables that represent the entity
and relationship types. Identify primary, alternate, and foreign keys.
Clinic (clinicNo, street, city, state, zipCode, telNo, faxNo, mgrStaffNo)
Primary Key clinicNo
Alternate Key zipCode
Alternate Key telNo
Alternate Key faxNo
Foreign Key mgrStaffNo references Staff(staffNo)

Staff (staffNo, sFName, sLName, sStreet, sCity, sState, sZipCode, sTelNo, DOB, sex, SSN, position, salary, clinicNo)
Primary Key staffNo
Alternate Key SSN
Foreign Key clinicNo references Clinic(clinicNo)
PetOwner (ownerNo, oFName, oLName, oStreet, oCity, oState, oZipCode, oTelNo, clinicNo)
Primary Key ownerNo
Foreign Key clinicNo references Clinic(clinicNo)

Pet (petNo, petName, petType, petDescription, pDOB, dateRegistered, petStatus, ownerNo, clinicNo)
Primary Key petNo
Foreign Key ownerNo references PetOwner(ownerNo)
Foreign Key clinicNo references Clinic(clinicNo)
Examination (examNo, examDate, examTime, examResults, petNo, staffNo)
Primary Key examNo
Alternate Key staffNo, examDate, examTime
Foreign Key petNo references Pet(petNo)
Foreign Key staffNo references Staff(staffNo)

Treatment (treatNo, description, cost)
Primary Key treatNo
Pen (penNo, penCapacity, penStatus, clinicNo)
Primary Key penNo
Foreign Key clinicNo references Clinic(clinicNo)

PetPen (penNo, petNo, dateIn, dateOut, comments)
Primary Key penNo, petNo, dateIn
Alternate Key penNo, petNo, dateOut
Foreign Key penNo references Pen(penNo)
Foreign Key petNo references Pet(petNo)
PetTreatment (examNo, treatNo, startDate, endDate, quantity, ptComments)
Primary Key examNo, treatNo
Foreign Key examNo references Examination(examNo)
Foreign Key treatNo references Treatment(treatNo)

Item (itemNo, itemName, itemDescription, itemCost)
Primary Key itemNo
Pharmacy (drugNo, drugName, drugDescription, dosage, methodAdmin, drugCost)
Primary Key drugNo

ItemClinicStock (itemNo, clinicNo, inStock, reorderLevel, reorderQty)
Primary Key itemNo, clinicNo
Foreign Key itemNo references Item(itemNo)
Foreign Key clinicNo references Clinic(clinicNo)
PharmClinicStock (drugNo, clinicNo, inStock, reorderLevel, reorderQty)
Primary Key drugNo, clinicNo
Foreign Key drugNo references Pharmacy(drugNo)
Foreign Key clinicNo references Clinic(clinicNo)

Invoice (invoiceNo, invoiceDate, datePaid, paymentMethod, ownerNo, examNo)
Primary Key invoiceNo
Foreign Key ownerNo references PetOwner(ownerNo)
Foreign Key examNo references Examination(examNo)
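As one example of checking the mapping against the transactions, query 2(e) can be sketched over cut-down Invoice, PetTreatment, and Treatment tables. The sample values are hypothetical, apart from the $20.00 examination charge stated in the case study; an invoice is treated as unpaid while datePaid is NULL.

```python
import sqlite3

# Cut-down Perfect Pets tables; sample values are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Treatment (treatNo TEXT PRIMARY KEY, description TEXT, cost REAL);
CREATE TABLE PetTreatment (examNo TEXT, treatNo TEXT REFERENCES Treatment(treatNo),
                           quantity INTEGER, PRIMARY KEY (examNo, treatNo));
CREATE TABLE Invoice (invoiceNo INTEGER PRIMARY KEY, invoiceDate TEXT, datePaid TEXT,
                      paymentMethod TEXT, ownerNo TEXT, examNo TEXT);
INSERT INTO Treatment VALUES ('T1', 'Examination', 20.00), ('T2', 'Antibiotic course', 15.50);
INSERT INTO PetTreatment VALUES ('E1', 'T1', 1), ('E1', 'T2', 2), ('E2', 'T1', 1);
INSERT INTO Invoice VALUES (1, '2024-01-05', NULL, NULL, 'O1', 'E1'),
                           (2, '2023-12-01', '2023-12-10', 'cash', 'O1', 'E2');
""")

# Transaction 2(e): treatment lines of any unpaid invoice (datePaid IS NULL) for one owner.
LINES = """
SELECT i.invoiceNo, i.invoiceDate, t.description, pt.quantity, pt.quantity * t.cost AS lineCost
FROM Invoice i
JOIN PetTreatment pt ON pt.examNo = i.examNo
JOIN Treatment t ON t.treatNo = pt.treatNo
WHERE i.ownerNo = ? AND i.datePaid IS NULL
ORDER BY i.invoiceNo, t.treatNo
"""
lines = conn.execute(LINES, ("O1",)).fetchall()
total = sum(line[-1] for line in lines)  # total cost across the unpaid lines
```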
(d) Produce a physical database design for a relational DBMS you have access to.
Implement this physical database design.
Interactions between tables and query transactions (with suggested frequencies per day, unless stated otherwise).
Appointment, 2(l): join on petNo; search condition on aDate. Frequency: 250.
Examination, 2(c) and 2(d): join Pet on petNo; 2(d) also joins Staff on staffNo. Frequency: 100.
Invoice, 2(e) and 2(f): join PetOwner on ownerNo; search condition datePaid IS NULL. Frequency: 10.
Invoice, 2(n): join PetOwner on ownerNo; search condition on invoiceDate. Frequency: 1 per month.
ItemClinicStock, 2(q): search condition inStock < reorderLevel. Frequency: 50 per month.
Pet, 2(b): join PetOwner on ownerNo. Frequency: 50.
Pet, 2(j): group by petType; order by petType; aggregate count on petType. Frequency: 1.
Pet, 2(l): join Clinic on clinicNo. Frequency: 250.
Pet, 2(o): join PetOwner on ownerNo. Frequency: 1500.
PharmClinicStock, 2(p): search condition inStock < reorderLevel. Frequency: 50 per month.
Based on the guidelines provided for Oracle in Chapter 16, there may be performance benefits
in adding the indexes shown in the following table.
Additional Oracle indexes for the Perfect Pets database.
Pet: index on clinicNo; index on ownerNo
Appointment: index on aDate; index on petNo
Invoice: index on ownerNo; index on invoiceDate
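The same indexes can be tried out in SQLite, which stands in for Oracle here purely for illustration; the column lists are hypothetical and cut down. EXPLAIN QUERY PLAN shows how SQLite would run transaction 2(l); the exact plan text varies between SQLite versions, and Oracle's optimizer makes its own choices.

```python
import sqlite3

# SQLite stands in for Oracle; the column lists are hypothetical and cut down.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pet (petNo INTEGER PRIMARY KEY, petName TEXT, ownerNo INTEGER, clinicNo INTEGER);
CREATE TABLE Appointment (appointNo INTEGER PRIMARY KEY, aDate TEXT, aTime TEXT, petNo INTEGER);
CREATE TABLE Invoice (invoiceNo INTEGER PRIMARY KEY, invoiceDate TEXT, datePaid TEXT, ownerNo INTEGER);
-- One secondary index per column in the list of additional indexes.
CREATE INDEX pet_clinicNo ON Pet(clinicNo);
CREATE INDEX pet_ownerNo ON Pet(ownerNo);
CREATE INDEX appointment_aDate ON Appointment(aDate);
CREATE INDEX appointment_petNo ON Appointment(petNo);
CREATE INDEX invoice_ownerNo ON Invoice(ownerNo);
CREATE INDEX invoice_invoiceDate ON Invoice(invoiceDate);
""")

# Ask the optimizer how it would run transaction 2(l): appointments on a date at one clinic.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT a.appointNo, a.aTime, p.petName
    FROM Appointment a JOIN Pet p ON p.petNo = a.petNo
    WHERE a.aDate = ? AND p.clinicNo = ?""", ("2024-05-01", 3)).fetchall()
steps = [row[-1] for row in plan]  # the last column holds the readable plan step
for step in steps:
    print(step)

indexes = sorted(name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'"))
```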
[Figure: EER diagram for Perfect Pets, showing entities including Stock, Item, Pharmacy, PetOwner, Clinic, and Staff (primary keys include itemNo, drugNo, ownerNo, clinicNo, staffNo, and examNo), relationships including Holds, Has, IsContactedBy, Manages, POAttends, Schedules, Undergoes, Performs, and ResultsFrom, and a {Mandatory, OR} constraint on Stock.]
This case study describes a company called StayHome, which rents out videos to its members. The first
branch of StayHome was established in 1982 in Seattle but the company has now grown and has many
branches throughout the United States. The company’s success is due to the first class service it
provides to its members and the wide and varied stock of videos available for rent.
As StayHome has grown, so have the difficulties in managing the increasing amount of data used and
generated by the company. To ensure the continued success of the company, the Director of StayHome
has urgently requested that a database application be built to help solve the increasing problems of data
management.
Below is a description of two views of the company: a Branch view and a Business view.
the ‘data requirements’ section describes the data used by the Branch view;
the ‘data transactions’ section provides examples of how the data is used by the Branch view (that
is, the transactions that staff have to perform on the data).
Data Requirements
The data held on a branch of StayHome is the branch address made up of street, city, state, and zip
code, and the telephone numbers (maximum of 3 lines). Each branch is given a branch number, which
is unique throughout the company.
Each branch of StayHome has staff, which includes a Manager, one or more Supervisors, and a number
of other staff. The Manager is responsible for the day-to-day running of a given branch. Each branch
has several Supervisors and each Supervisor is responsible for supervising a group of staff. The data
held on a member of staff is his or her name, position, and salary. Each member of staff is given a staff
number, which is unique throughout the company.
Each branch of StayHome is allocated a stock of videos. The data held on a video is the catalog
number, video number, title, category, daily rental rate, purchase price, status, and the names of the
main actors (and the characters played), and the director. The catalog number uniquely identifies each
video. In most cases, there are several copies of each video at a branch, and the individual copies are
identified using the video number. A video is given a category such as Action, Adult, Children, Thriller,
Horror, or Sci-Fi. The status indicates whether a specific copy of a video is available for rent.
Before renting a video from the company, a customer must first register as a member of a local branch
of StayHome. The data held on a member is the first and last name, address, and the date that the
member registered at the branch. Each member is given a member number, which is unique across all
branches and is used even when a member chooses to register at more than one branch. The name of
the member of staff responsible for processing the registration of a member at a branch is also noted.
Once registered, a member is free to rent videos, up to a maximum of 10 at any one time. The data held
on each video rented is the rental number, the member’s full name and member number, the video
number, title, and daily rental cost, and the dates the video is rented out and returned. The rental
number is unique throughout the company.
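The limit of 10 videos per member is a business rule rather than a key constraint. A minimal sketch of enforcing it at data entry, assuming a RentalAgreement table in which dateReturn stays NULL while a video is out; the table layout and the rent helper are hypothetical.

```python
import sqlite3

MAX_RENTALS = 10  # limit stated in the case study

# Hypothetical table layout; dateReturn is NULL while the video is still out.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE RentalAgreement (
    rentalNo INTEGER PRIMARY KEY, memberNo INTEGER NOT NULL, videoNo INTEGER NOT NULL,
    dateOut TEXT NOT NULL, dateReturn TEXT)""")


def rent(conn, member_no, video_no, date_out):
    """Record a rental unless the member already has MAX_RENTALS videos out."""
    with conn:
        (out,) = conn.execute(
            "SELECT COUNT(*) FROM RentalAgreement WHERE memberNo = ? AND dateReturn IS NULL",
            (member_no,)).fetchone()
        if out >= MAX_RENTALS:
            return False
        conn.execute("INSERT INTO RentalAgreement (memberNo, videoNo, dateOut) VALUES (?, ?, ?)",
                     (member_no, video_no, date_out))
        return True


results = [rent(conn, 1, v, "2000-02-04") for v in range(11)]
```

Under concurrent use, the count and the insert should run at an isolation level that stops two simultaneous rentals from both passing the check; a trigger inside the DBMS is another option.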
Transaction Requirements
Data Entry
(a) Enter the details of a new branch.
(b) Enter the details of a new member of staff at a branch (such as an employee Tom Daniels at
branch B001).
(c) Enter the details for a newly released video (such as details of a video called Independence
Day).
(d) Enter the details of copies of a new video at a given branch (such as three copies of
Independence Day at branch B001).
(e) Enter the details of a new member registering at a given branch (such as a member Bob
Adams registering at branch B002).
(f) Enter the details of a rental agreement for a member renting a video (such as member Don
Nelson renting Tomorrow Never Dies on 4-Feb-2000).
Data Queries
The database should be capable of supporting the following sample queries:
the ‘data requirements’ section describes the data used by the Business view;
the ‘data transactions’ section provides examples of how the data is used by the Business view
(that is, the transactions that staff have to perform on the data).
Data Requirements
The details held on a branch of StayHome are the branch address and the telephone number. Each
branch is given a branch number, which is unique throughout the company.
Each branch of StayHome has staff, which includes a Manager. The details held on a member of staff
are his or her name, position, and salary. Each member of staff is given a staff number, which is unique
throughout the company.
Each branch of StayHome is allocated a stock of videos. The details held on a video are the catalog
number, video number, title, category, daily rental rate, and purchase price. The catalog number
uniquely identifies each video. However, in most cases there are several copies of each video at a
branch, and the individual copies are identified using the video number.
Each branch of StayHome receives videos from video suppliers. The details held on video suppliers are
the supplier number, name, address, telephone number, and status. Orders for videos are placed with
these suppliers and the details held on a video order are the order number, supplier number, supplier
address, video catalog number, video title, video purchase price, quantity, date order placed, date order
received, and the address of the branch receiving the order.
A customer of StayHome must first register as a member of a local branch of StayHome. The details
held on a member are name, address, and the date that the member registered at a branch. Each member
is given a member number, which is unique throughout all branches of the company and is used even
when a member chooses to register at more than one branch.
The details held on each video rented are the rental number, full name and member number, the video
number, title, and daily rental rate, and the dates the video is rented out and returned. The rental number
is unique throughout the company.
Transaction Requirements
Data Entry
(a) Enter the details for a newly released video (such as details of a video called Independence
Day).
(b) Enter the details of a video supplier (such as a supplier called WorldView Videos).
(c) Enter the details of a video order (such as ordering 10 copies of Saving Private Ryan for
branch B002).
Data Queries
(a) List the name, position, and salary of staff at all branches, ordered by branch number.
(b) List the name and telephone number of the Manager at a given branch.
(c) List the catalog number and title of all videos at a given branch, ordered by title.
(d) List the number of copies of a given video at a given branch.
(e) List the number of members at each branch, ordered by branch number.
(f) List the number of members who joined this year at each branch, ordered by branch number.
(g) List the number of video rentals at each branch between certain dates, ordered by branch
number.
(h) List the number of videos in each category at a given branch, ordered by category.
(i) List the name, address, and telephone number of all video suppliers, ordered by supplier
number.
(j) List the name and telephone number of a video supplier.
(k) List the details of all video orders placed with a given supplier, ordered by the date of order.
(l) List the details of all video orders placed on a certain date.
(m) List the total daily rentals for videos at each branch between certain dates, ordered by branch
number.
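Queries (d) and (h) can be sketched against the Video and VideoForRent tables. The catalog numbers, categories, and copies below are hypothetical sample data. Query (h) counts copies; the wording of the requirement also allows counting titles, which COUNT(DISTINCT v.catalogNo) would do.

```python
import sqlite3

# Hypothetical sample data for the Video and VideoForRent tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Video (catalogNo TEXT PRIMARY KEY, title TEXT, category TEXT);
CREATE TABLE VideoForRent (videoNo INTEGER PRIMARY KEY, catalogNo TEXT, branchNo TEXT);
INSERT INTO Video VALUES ('V1', 'Independence Day', 'Sci-Fi'),
                         ('V2', 'Saving Private Ryan', 'Action'),
                         ('V3', 'Tomorrow Never Dies', 'Action');
INSERT INTO VideoForRent (catalogNo, branchNo) VALUES
    ('V1', 'B001'), ('V1', 'B001'), ('V1', 'B001'), ('V2', 'B001'), ('V3', 'B001'), ('V1', 'B002');
""")

# Query (d): number of copies of a given video at a given branch.
(copies,) = conn.execute(
    "SELECT COUNT(*) FROM VideoForRent WHERE catalogNo = ? AND branchNo = ?",
    ("V1", "B001")).fetchone()

# Query (h): videos (copies) in each category at a given branch, ordered by category.
by_category = conn.execute("""
    SELECT v.category, COUNT(*)
    FROM VideoForRent vr JOIN Video v ON v.catalogNo = vr.catalogNo
    WHERE vr.branchNo = ?
    GROUP BY v.category
    ORDER BY v.category""", ("B001",)).fetchall()
```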
15.7 (a) Create a conceptual schema for each view of StayHome using the concepts of the Enhanced
Entity–Relationship (EER) model. To simplify each diagram, only show entities, relationships
and the primary key attributes. Specify the cardinality ratio and participation constraint of
each relationship type. State any assumptions you make when creating the EER model (if
necessary).
The student should show that each of the transactions identified above is supported by his/her
conceptual data models.
(c) Map your high-level local conceptual data models to local logical data models.
Identify primary, alternate, and foreign keys.
Branch View
[Figure: EER diagram for the Branch view, including Director (directorNo) and Telephone (telNo, 1..3 per branch), and the Directs, Is, Supervises (Supervisee), and Provides relationships.]
Tables for local logical data model for the Branch view
Actor (actorNo, actorName)
Primary Key actorNo

Branch (branchNo, street, city, state, zipCode, mgrStaffNo)
Primary Key branchNo
Alternate Key zipCode
Foreign Key mgrStaffNo references Staff(staffNo)
Director (directorNo, directorName)
Primary Key directorNo

Member (memberNo, fName, lName, address)
Primary Key memberNo
Registration (branchNo, memberNo, staffNo, dateJoined)
Primary Key branchNo, memberNo
Foreign Key branchNo references Branch(branchNo)
Foreign Key memberNo references Member(memberNo)
Foreign Key staffNo references Staff(staffNo)

RentalAgreement (rentalNo, dateOut, dateReturn, memberNo, videoNo)
Primary Key rentalNo
Alternate Key memberNo, videoNo, dateOut
Foreign Key memberNo references Member(memberNo)
Foreign Key videoNo references VideoForRent(videoNo)
Role (catalogNo, actorNo, character)
Primary Key catalogNo, actorNo
Foreign Key catalogNo references Video(catalogNo)
Foreign Key actorNo references Actor(actorNo)

Staff (staffNo, name, position, salary, branchNo, supervisorStaffNo)
Primary Key staffNo
Foreign Key branchNo references Branch(branchNo)
Foreign Key supervisorStaffNo references Staff(staffNo)
Telephone (telNo, branchNo)
Primary Key telNo
Foreign Key branchNo references Branch(branchNo)

Video (catalogNo, title, category, dailyRental, price, directorNo)
Primary Key catalogNo
Foreign Key directorNo references Director(directorNo)
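The Telephone relation's primary key does not capture the limit of three lines per branch from the data requirements. A minimal SQLite sketch of one way to enforce the upper bound with a trigger; the trigger name and sample numbers are hypothetical, and the lower bound of one line would need a separate check.

```python
import sqlite3

# Hypothetical trigger name and sample numbers; Branch columns follow the mapping above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, street TEXT, city TEXT, state TEXT, zipCode TEXT UNIQUE);
CREATE TABLE Telephone (telNo TEXT PRIMARY KEY,
                        branchNo TEXT NOT NULL REFERENCES Branch(branchNo));
-- Keys cannot express "at most 3 lines per branch", so a trigger rejects a fourth.
CREATE TRIGGER telephone_max_three BEFORE INSERT ON Telephone
WHEN (SELECT COUNT(*) FROM Telephone WHERE branchNo = NEW.branchNo) >= 3
BEGIN
    SELECT RAISE(ABORT, 'a branch has at most 3 telephone lines');
END;
INSERT INTO Branch (branchNo, city) VALUES ('B001', 'Seattle');
""")

with conn:
    for n in range(1, 4):
        conn.execute("INSERT INTO Telephone VALUES (?, 'B001')", (f"206-555-010{n}",))

try:
    with conn:
        conn.execute("INSERT INTO Telephone VALUES ('206-555-0104', 'B001')")
    error = None
except sqlite3.DatabaseError as exc:  # RAISE(ABORT, ...) surfaces as a DatabaseError subclass
    error = str(exc)
```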
Business View
[Figure: EER diagram for the Business view, including Supplier (supplierNo) and the Supplies, Is, and Places relationships.]
Video (catalogNo, title, category, dailyRental, price, supplierNo)
Primary Key catalogNo
Foreign Key supplierNo references Supplier(supplierNo)

VideoForRent (videoNo, available, catalogNo, branchNo)
Primary Key videoNo
Foreign Key catalogNo references Video(catalogNo)
Foreign Key branchNo references Branch(branchNo)
VideoOrder (orderNo, dateOrdered, dateReceived, branchNo)
Primary Key orderNo
Foreign Key branchNo references Branch(branchNo)

VideoOrderLine (orderNo, catalogNo, quantity)
Primary Key orderNo, catalogNo
Foreign Key orderNo references VideoOrder(orderNo)
Foreign Key catalogNo references Video(catalogNo)
(d) Merge the two logical data models together to produce a global logical data model.
[Figure: Global EER diagram for StayHome, including Supplier (supplierNo), Director (directorNo), and Telephone (telNo, 1..3 per branch), and the Directs, Is, Places, Supervises (Supervisee), and Provides relationships.]
Actor (actorNo, actorName)
Primary Key actorNo

Branch (branchNo, street, city, state, zipCode, mgrStaffNo)
Primary Key branchNo
Alternate Key zipCode
Foreign Key mgrStaffNo references Staff(staffNo)
Director (directorNo, directorName)
Primary Key directorNo

Member (memberNo, fName, lName, address)
Primary Key memberNo
Registration (branchNo, memberNo, staffNo, dateJoined)
Primary Key branchNo, memberNo
Foreign Key branchNo references Branch(branchNo)
Foreign Key memberNo references Member(memberNo)
Foreign Key staffNo references Staff(staffNo)

RentalAgreement (rentalNo, dateOut, dateReturn, memberNo, videoNo)
Primary Key rentalNo
Alternate Key memberNo, videoNo, dateOut
Foreign Key memberNo references Member(memberNo)
Foreign Key videoNo references VideoForRent(videoNo)
Role (catalogNo, actorNo, character)
Primary Key catalogNo, actorNo
Foreign Key catalogNo references Video(catalogNo)
Foreign Key actorNo references Actor(actorNo)

Staff (staffNo, name, position, salary, branchNo, supervisorStaffNo)
Primary Key staffNo
Foreign Key branchNo references Branch(branchNo)
Foreign Key supervisorStaffNo references Staff(staffNo)
Supplier (supplierNo, name, address, telNo, status)
Primary Key supplierNo
Alternate Key telNo

Telephone (telNo, branchNo)
Primary Key telNo
Foreign Key branchNo references Branch(branchNo)
Video (catalogNo, title, category, dailyRental, price, directorNo, supplierNo)
Primary Key catalogNo
Foreign Key directorNo references Director(directorNo)
Foreign Key supplierNo references Supplier(supplierNo)

VideoForRent (videoNo, available, catalogNo, branchNo)
Primary Key videoNo
Foreign Key catalogNo references Video(catalogNo)
Foreign Key branchNo references Branch(branchNo)
VideoOrder (orderNo, dateOrdered, dateReceived, branchNo)
Primary Key orderNo
Foreign Key branchNo references Branch(branchNo)

VideoOrderLine (orderNo, catalogNo, quantity)
Primary Key orderNo, catalogNo
Foreign Key orderNo references VideoOrder(orderNo)
Foreign Key catalogNo references Video(catalogNo)
18.1 Which of the three basic file organizations (heap, ordered, hash) would you choose for a file
where the most frequent operations were as follows:
(a) Inserts and scans where the order of records does not matter.
(b) Record searches based on a range of field values.
(c) Record searches based on a particular field value.
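As a rough guide, a heap file suits (a), an ordered file suits (b) and a hash file suits (c). The sketch below is a minimal in-memory model, not real disk pages, that shows why: the heap appends without maintaining any order, the ordered file lets a range search binary-search its starting point, and the hash file jumps straight to a single bucket for an equality search. The function names and `NUM_BUCKETS` are illustrative assumptions.

```python
import bisect

# Heap file: records are simply appended; inserts are cheap and no order is kept.
heap = []

def heap_insert(rec):
    heap.append(rec)

# Ordered file: keys kept sorted so a range search can binary-search its start point.
ordered_keys, ordered_recs = [], []

def ordered_insert(key, rec):
    i = bisect.bisect_left(ordered_keys, key)
    ordered_keys.insert(i, key)
    ordered_recs.insert(i, rec)

def range_search(lo, hi):
    """Return the records whose key lies in [lo, hi]."""
    start = bisect.bisect_left(ordered_keys, lo)
    end = bisect.bisect_right(ordered_keys, hi)
    return ordered_recs[start:end]

# Hash file: the key is hashed to a bucket, so an equality search reads one bucket only.
NUM_BUCKETS = 8  # illustrative assumption

buckets = [[] for _ in range(NUM_BUCKETS)]

def hash_insert(key, rec):
    buckets[hash(key) % NUM_BUCKETS].append((key, rec))

def equality_search(key):
    return [rec for k, rec in buckets[hash(key) % NUM_BUCKETS] if k == key]

for key, rec in [(30, "c"), (10, "a"), (20, "b"), (40, "d")]:
    heap_insert((key, rec))
    ordered_insert(key, rec)
    hash_insert(key, rec)

print(range_search(15, 35))  # ['b', 'c']
print(equality_search(40))   # ['d']
```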
18.2 Discuss the difference between each of the following types of indexes:
(a) A dense index has at least one data entry for every search key value that appears in a
record in the indexed file. A sparse index contains an entry for each page of records in a
data file. It must be clustered (so we can only have one sparse index on a data file).
(b) A primary index is an index on a set of fields that includes the primary key and is
guaranteed not to contain duplicates. A secondary index is not a primary index (and may
have duplicates).
(c) A clustered index is one in which the ordering of data entries is the same as the ordering
of data records. There can be at most one clustered index on a data file. An unclustered
index is an index that is not clustered. There can be more than one unclustered index on a
data file.
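The dense/sparse distinction in (a) can be sketched as follows, assuming a data file that is already sorted on the search key (as a sparse index requires) and an illustrative page size of three records. The dense index holds one entry per key value; the sparse index holds one entry per page, namely the first key stored on that page, and a lookup then scans only the page it points to.

```python
import bisect

PAGE_SIZE = 3  # illustrative assumption

# Data file sorted on the search key, split into pages.
records = [(k, f"rec{k}") for k in [2, 5, 8, 11, 14, 17, 20]]
pages = [records[i:i + PAGE_SIZE] for i in range(0, len(records), PAGE_SIZE)]

# Dense index: an entry for every search-key value, pointing at its page.
dense = {key: p for p, page in enumerate(pages) for key, _ in page}

# Sparse index: one entry per page, holding the first key stored on that page.
sparse_keys = [page[0][0] for page in pages]

def dense_lookup(key):
    p = dense.get(key)
    if p is None:
        return None
    return next(rec for k, rec in pages[p] if k == key)

def sparse_lookup(key):
    # Last page whose first key is <= key; the record can only be on that page.
    p = bisect.bisect_right(sparse_keys, key) - 1
    if p < 0:
        return None
    return next((rec for k, rec in pages[p] if k == key), None)

print(len(dense), len(sparse_keys))  # 7 3
```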
DATABASE SYSTEMS
COURSEWORK
Introduction to Coursework
You and your group members are part of a consultancy company that specialises in the provision of
database applications. The Director of FastCabs has recently approached your company to undertake a
project to design and partially implement a database management system for the company.
Notes
1. You are in the initial stages of user requirements collection and analysis and are required to read
the FastCabs case study.
2. The information presented in the case study is an overview of how business is conducted at
FastCabs. As the information described in the case study is only an overview and is ambiguous in places,
it will be necessary for you to make your own assumptions about certain aspects of the case study.
Your assumptions should be clearly stated in your coursework submission.
Coursework Requirements
(Submit hardcopy)
2. Derive relational schema from your ER model that represents the entities and relationships.
Identify primary, alternate and foreign keys. Note: use the following notation to describe your
relational schema, as shown in the example of a Staff relation given below.
(Submit hardcopy)
3. Use the technique of normalization to validate the structure of your relational schema.
Demonstrate that each of your relations is in third normal form (3NF) by displaying the functional
dependencies between attributes in each relation. Note, if any of your relations are not in 3NF, this
may indicate that your ER model is structurally incorrect or that you have introduced errors in the
process of deriving relations from your model.
(Submit hardcopy)
4. To further demonstrate your knowledge of normalization, assume that a proposed (badly structured)
relation for the FastCabs database has the following structure.
(jobID, pickupDateTime, driverID, dFName, dLName, clientID, cFName, cLName, cAddress)
Identify the functional dependencies represented in this relation and demonstrate the process of
normalizing this relation into 3NF relations.
(Submit hardcopy)
The prototype should facilitate the creation, maintenance and querying of records and where
appropriate automate various tasks for the user.
Note: The user manual should be a maximum of 15 pages in length (including screen dumps).
(Submit hardcopy)
Part 7 – Individual Critical Evaluation
Each student should submit his or her own critical assessment of the coursework. The evaluation
should include a discussion on how the coursework has reinforced (or otherwise) his or her
appreciation of the techniques and processes employed in undertaking a database project. In addition
the evaluation may include a wider discussion on topics such as:
How the Database Systems module relates to the other modules on your course.
How the knowledge and skills taught on the module and/or course relate to your previous
experience as a student and/or employee.
The appropriateness of the knowledge and skills taught on the module for future employment.
This component of the coursework can be submitted as a separate document from the main part of the
coursework.
(Submit hardcopy)
Marking Scheme
The assessment of this coursework will be carried out on the following components of the work. Please
note that each student should submit his or her own critical evaluation of the coursework and will
receive an individual mark for this component (out of 10%). This individual student mark will be
combined with the mark for the groupwork component (out of 90%) for the coursework.
Part 6 – Demonstrate Database Application (5)
Part 7 – Individual Critical Evaluation (10)
A private taxi company called FastCabs was established in Glasgow in 1992. Since then, the company
has grown steadily and now has offices in most of the main cities of Scotland. However, the company
is now so large that more and more administrative staff are being employed to cope with the ever-
increasing amount of paperwork. Furthermore, the communication and sharing of information within
the company is poor. The Director of the company, Paddy MacKay feels that too many mistakes are
being made and that the success of his company will be short-lived if he does not do something to
remedy the situation. He knows that a database could help in part to solve the problem and has
approached you and your team to help in creating a database application to support the running of
FastCabs.
The Director has provided the following brief description of how FastCabs operates.
Each office has a Manager, several taxi owners, drivers and administrative staff. The Manager is
responsible for the day-to-day running of the office. An owner provides one or more taxis to FastCabs
and each taxi is allocated for use to a number of drivers. The majority of owners are also drivers.
FastCabs taxis are not available for hire by members of the public hailing a taxi in the street; they must be
requested by first phoning the company to attend a given address.
There are two kinds of clients, namely private and business. The business provided by private clients is
on an ad hoc basis. The details of private clients are collected on the first booking of a taxi. However,
the business provided by business clients is more formal and involves agreeing a contract of work with
the business. A contract stipulates the number of jobs that FastCabs will undertake for a fixed fee.
When a job comes into FastCabs the name, phone number and contract number (when appropriate) of
the client is taken and then the pick-up date/time and pick-up/drop-off addresses are noted. Each job is
allocated a unique jobID. The nearest driver to the pick-up address is called by radio and is informed of
the details of the job.
When a job is completed the driver should note the mileage used and the charge made (for private
clients only). If a job is not complete, the reason for the failed job should be noted.
The Director has provided some examples of typical queries that the database application for FastCabs
must support.
[Figure: partial ER diagram for FastCabs, including entities Staff (staffNo), TaxiDriver, Contract (contractNo) and Client, relationships WorksAt, Takes, Owns, IsFor, Has and Requests, and the constraints {Optional, And} and {Mandatory, Or}]
Staff (staffNo, fName, lName, sAddress, jobDescription, salary, NIN, sex, DOB, officeNo)
Primary Key staffNo
Alternate Key NIN
Foreign Key officeNo references Office(officeNo)
TaxiDriver(vehRegNo, driverNo)
Primary Key vehRegNo, driverNo
Foreign Key vehRegNo references Taxi(vehRegNo)
Foreign Key driverNo references Staff(staffNo)
DATABASE SYSTEMS
COURSEWORK
Introduction to Coursework
You have been approached by a University for the design and implementation of a relational database
system that will provide information on the courses it offers, the academic departments that run the
courses, the academic staff and the enrolled students. The system will be used mainly by the students
and the academic staff.
The requirement collection and analysis phase of the database design process provided the following
data requirements for the University Database system.
Coursework Requirements
Each department runs a number of courses. The university provides a set of modules used in different
courses. Each course uses a number of modules but not every module is used. A course is assigned a
unique course code and a module is identified by a unique module code. A module can be used in one
course only, but can be studied by many students. In addition to the module code, each module's unique
title, start date, end date, texts (books) and assessment scheme (i.e. the coursework and exam mark
percentages) are also stored.
Each course is managed by a member of academic staff, and each module is coordinated by a member
of academic staff also. The database should also store each course's unique title and duration (in years).
A student can enrol in one course at a time. Once enrolled a student is assigned a unique matriculation
number. To complete a course, each student must undertake and pass all the required modules in his/her
course. This requires that the database store the performance (pass or fail) of each student in every
module.
Additional data stored on each student includes student name (first and last), address (town, street, and
post code), date-of-birth, sex, and financial loan. For emergency purposes the database stores the name
(not composite), address (not composite), phone, and relationship of each student's next-of-kin. None of
the next-of-kin's attributes is unique. Assume that every next-of-kin is a next-of-kin of one student only.
Each department is managed by a member of academic staff. The database should record the date
he/she started managing the department. Each department has a name, phone number, fax number, and
location (e.g. E Block). Each department employs many members of academic staff.
A member of academic staff can be the leader (i.e. manager) of at most one course, but can be the
coordinator of more than one module. A member of academic staff may not be assigned any of the
above mentioned roles (coordinator, course leader, department manager). All members of academic
staff teach modules. Every member of academic staff teaches one or more modules, and a module may
be taught by more than one member of academic staff. The database should record the number of hours
per week a member of academic staff spends teaching each module. Each member of academic staff is
identified by a unique staff number. All members of staff and students have unique computer network
user ID numbers. Additional data stored on each member of academic staff includes name (first and
last), phone extension number, office number, sex, salary, post (lecturer, or senior lecturer, or Professor,
etc.), qualifications, and address (not composite). A member of academic staff works for only one
department.
(Submit hardcopy)
2. Derive relational schema from your ER model that represents the entities and relationships.
Identify primary, alternate and foreign keys. Note: use the following notation to describe your
relational schema, as shown in the example of a Staff relation given below.
(Submit hardcopy)
3. Use the technique of normalization to validate the structure of your relational schema.
Demonstrate that each of your relations is in third normal form (3NF) by displaying the functional
dependencies between attributes in each relation. Note, if any of your relations are not in 3NF, this
may indicate that your ER model is structurally incorrect or that you have introduced errors in the
process of deriving relations from your model.
(Submit hardcopy)
Identify the functional dependencies represented in this relation and demonstrate the process of
normalizing this relation into 3NF relations.
(Submit hardcopy)
The prototype should facilitate the creation, maintenance and querying of records and where
appropriate automate various tasks for the user.
Note: The user manual should be a maximum of 15 pages in length (including screen dumps).
(Submit hardcopy)
Part 7 – Individual Critical Evaluation
Each student should submit his or her own critical assessment of the coursework. The evaluation
should include a discussion on how the coursework has reinforced (or otherwise) his or her
appreciation of the techniques and processes employed in undertaking a database project. In addition
the evaluation may include a wider discussion on topics such as:
How the Database Systems module relates to the other modules on your course.
How the knowledge and skills taught on the module and/or course relate to your previous
experience as a student and/or employee.
The appropriateness of the knowledge and skills taught on the module for future employment.
This component of the coursework can be submitted as a separate document from the main part of the
coursework.
(Submit hardcopy)
Marking Scheme
The assessment of this coursework will be carried out on the following components of the work. Please
note that each student should submit his or her own critical evaluation of the coursework and will
receive an individual mark for this component (out of 10%). This individual student mark will be
combined with the mark for the groupwork component (out of 90%) for the coursework.
Part 4 – Implement Prototype Database Application (15)
Part 5 – Document Database Application (10)
Part 6 – Demonstrate Database Application (5)
Part 7 – Individual Critical Evaluation (10)
Partial ER diagrams
[Figure: partial ER diagrams for the University database. Attributes shown include Course (cCode {PK}, title {AK}, duration), Department (name {PK}, phone, faxNo, location) and Next-Of-Kin (name {PPK}, phone, relationship). The main diagram relates Course, Staff (staffNo), Department (deptName), Student (matricNo), Module (mCode) and Next-Of-Kin (name) through Runs, Manages (startDate), Leads, Employs, Coordinates, Teaches (hours), Enrol, Uses, Undertake (performance) and Has.]
Staff {staffNo, fName, lName, address, phone, officeNo, sex, salary, post, computerId}
Student {matricNo, fName, lName, town, street, postCode, dob, sex, loan}
Staff(staffNo, fName, lName, address, phone, officeNo, sex, salary, post, computerId,
deptName)
Primary key: staffNo
Foreign key: deptName references Department(deptName)
Student(matricNo, fName, lName, town, street, postCode, dob, sex, loan, courseCode)
Primary key: matricNo
Foreign key: courseCode references Course(cCode)
Texts(moduleCode, text)
Primary key: moduleCode, text
Foreign key: moduleCode references Module(mCode)
Qualifications(qualStaffNo, qualification)
Primary key: qualStaffNo, qualification
Foreign key: qualStaffNo references Staff(staffNo)
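Texts and Qualifications show the usual mapping of a multi-valued attribute: each value becomes its own row, keyed on the owning entity's key plus the value itself. A minimal SQLite sketch of the Texts case follows; the module code, title and book names are made-up sample data, and only the columns needed for the illustration are declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE Module (
    mCode TEXT PRIMARY KEY,
    title TEXT NOT NULL UNIQUE
);
-- Multi-valued attribute 'texts' mapped to its own relation.
CREATE TABLE Texts (
    moduleCode TEXT NOT NULL REFERENCES Module(mCode),
    text       TEXT NOT NULL,
    PRIMARY KEY (moduleCode, text)
);
""")
conn.execute("INSERT INTO Module VALUES ('M100', 'Database Systems')")
conn.executemany("INSERT INTO Texts VALUES (?, ?)",
                 [("M100", "Intro to SQL"), ("M100", "Database Design")])

# All texts for one module come back as separate rows.
texts = [row[0] for row in conn.execute(
    "SELECT text FROM Texts WHERE moduleCode = ? ORDER BY text", ("M100",))]
print(texts)  # ['Database Design', 'Intro to SQL']
```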
4. Normalisation
Functional dependencies
fd1: (matricNo, moduleTitle) → {name, sex, moduleStartDate, performance, flatNo, address}
fd2: matricNo → {name, sex, flatNo, address}
fd3: moduleTitle → moduleStartDate
fd5: flatNo → address
2NF: fd2 and fd3 violate the definition of 2NF because each is a partial dependency on only part of the primary key (matricNo, moduleTitle). Decompose the relation.
[Figure: dependency diagrams for the decomposition, marking fd1 to fd4 and the primary and foreign keys over matricNo, name, sex, flatNo, moduleTitle, moduleStartDate, performance and address]
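The partial-key test behind the 2NF step can be checked mechanically with attribute closures. The sketch below encodes fd1, fd2, fd3 and fd5 as listed (fd4 is not stated in the list, so it is left out) and reports which proper subsets of the key (matricNo, moduleTitle) determine non-key attributes. It is a teaching aid under those assumptions, not a complete normalization algorithm.

```python
from itertools import combinations

def fs(*attrs):
    return frozenset(attrs)

# fd1, fd2, fd3 and fd5 as (left-hand side, right-hand side) pairs.
FDS = [
    (fs("matricNo", "moduleTitle"),
     fs("name", "sex", "moduleStartDate", "performance", "flatNo", "address")),
    (fs("matricNo"), fs("name", "sex", "flatNo", "address")),
    (fs("moduleTitle"), fs("moduleStartDate")),
    (fs("flatNo"), fs("address")),
]
KEY = fs("matricNo", "moduleTitle")

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds):
    """Map each proper subset of a composite key to the non-key attributes it determines."""
    found = {}
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            determined = closure(subset, fds) - key
            if determined:
                found[subset] = determined
    return found

print(partial_dependencies(KEY, FDS))
# flatNo is not a key yet determines address: the transitive dependency removed for 3NF.
print(closure({"flatNo"}, FDS))
```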
(a) SELECT *
FROM Department
WHERE location = 'E Block';
20.1 Describe a general plan of action for initiating a security policy, elaborating on each stage that
might be undertaken.
First of all, the need for one must be appreciated, and there must be commitment on the part of
senior managers. Depending on course coverage, an IT security team may be formed to oversee
the development of the policy. They may decide on an information classification exercise for the
area under consideration, then carry out a risk analysis. Following on from this, the policy will be
prepared, specific responsibilities identified, and then standards and procedures formulated for
implementation. The whole process is iterative: the policy should be continually refined. Certain
aspects of the plan should be elaborated, such as how information might be classified, how risk
analysis might be carried out, and what the policy should cover.
20.2 Detail the types of problems associated with microcomputer security and the types of
countermeasures that could be installed against loss.
The data a PC holds may be considerably more valuable than the machine itself. We are
concerned here with both data security and physical security. Some obvious precautions concern the
data: for example, careful storage of the media, and regular backups that are labeled and classified
if appropriate. Working procedures should be appropriately defined. Obvious physical security
measures include fixing the machine to a surface, using locks and/or fitting an alarm. Other
measures include using security programs, careful disposal of old media and equipment, and staff
training.
20.3 Discuss the problems associated with microcomputer security, and contrast the measures required
to provide a secure environment with those of a mainframe computing environment.
The problems concern data security and physical security. Problems and precautions in dealing
with these are generally as for part two of the previous question. It is important that a contrast is
made with a mainframe environment. For example, in dealing with microcomputers, you are
dealing with individual machines and staff, possibly spread over a wide area. Much of the
responsibility rests with members of staff; consequently, training all staff is important. In the
mainframe environment it is possible to set up centralized controls (physical and logical).
Consequently, it should be an easier environment to control, with responsibility residing with the
IT manager.
20.4 Discuss the types of threat that might occur within the general database environment, and
indicate the measures that could be taken to safeguard against them.
The answer should cover the broad categories of theft and fraud, loss of confidentiality, loss of
privacy, loss of integrity and loss of availability. These threats may be either accidental or
intentional. Examples should be given together with the type of measure that could be in place.
One example is a programmer altering programs or data. Programmers should not normally have
access to live programs or data, and any access should be properly authorized with details that
enable any audit to be made properly. Adequate audit controls should also uncover any odd
functioning of the system.
20.5 Explain the integrity features that a database management system may provide making reference
to the system used, and indicate the disadvantages that arise where they are not available.
This answer also requires knowledge of a particular DBMS that is being used. The basic integrity
features can be explained, such as domains, primary key specification, nulls, referential integrity,
with reference to how the particular DBMS provides them. The advantages of having the DBMS
handle these should be appreciated in order to identify the disadvantages of their absence. For example,
validation routines would need to be included in all relevant programs, constraints buried in program
code are difficult to understand, and effort is duplicated.
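The following minimal SQLite sketch illustrates the DBMS enforcing these rules itself, so that no program has to re-implement them. The Staff and Branch tables, the salary range and the sample rows are assumptions made for the example. Note that SQLite checks foreign keys only when `PRAGMA foreign_keys` is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Branch (branchNo TEXT PRIMARY KEY);
CREATE TABLE Staff (
    staffNo  TEXT PRIMARY KEY,                            -- primary key specification
    name     TEXT NOT NULL,                               -- nulls not allowed
    salary   REAL CHECK (salary BETWEEN 6000 AND 40000),  -- domain constraint (assumed range)
    branchNo TEXT REFERENCES Branch(branchNo)             -- referential integrity
);
INSERT INTO Branch VALUES ('B003');
""")

def try_insert(row):
    """Attempt an insert and report whether the DBMS accepted it."""
    try:
        conn.execute("INSERT INTO Staff VALUES (?, ?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

results = [
    try_insert(("S1", "A. Smith", 12000, "B003")),  # valid
    try_insert(("S1", "B. Jones", 15000, "B003")),  # duplicate primary key
    try_insert(("S2", None, 18000, "B003")),        # null name
    try_insert(("S3", "C. Brown", 99000, "B003")),  # salary outside the domain
    try_insert(("S4", "D. White", 20000, "B999")),  # no such branch
]
for r in results:
    print(r)
```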
20.6 The increasing accessibility of databases on the Internet and intranets requires a reanalysis and
extension of the normal approaches to security. Discuss some of the issues associated with the
database system security in these environments.
22.1 (a) Locking-based algorithms for concurrency control can be employed to synchronize the
execution of transactions. Explain what is meant by a serializable schedule and show
that the following locking-based schedule is not serializable:
S= [wl1(y), wl2(y), R1(y), W1(y), R2(y), W2(y), rl1(y), rl2(y), wl2(z), R2(z), W2(z),
rl2(z), C2,
wl1(z), R1(z), W1(z), rl1(z), C1]
A schedule S is said to be serializable if all the reads and writes of each transaction can be
reordered in such a way that when they are grouped together as in a serial schedule the net effect
of executing this reorganized schedule is the same as that of the original schedule S. This
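reorganized schedule is called the equivalent serial schedule. A serializable schedule will therefore
be equivalent to and have the same effect on the database as some serial schedule. In S, T1 writes y
before T2 reads it, while T2 writes z before T1 reads it, so S is equivalent neither to T1 followed by
T2 nor to T2 followed by T1, and it is therefore not serializable.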
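The argument can be checked with a precedence (serialization) graph: draw an edge Ti → Tj whenever an operation of Ti precedes a conflicting operation of Tj (same item, at least one of them a write). A cycle means that no equivalent serial schedule exists. The sketch keeps only the read and write steps of S, since lock, unlock and commit steps do not themselves conflict.

```python
# Read/write steps of S in order: (transaction, operation, item).
# Lock, unlock and commit steps are omitted because they do not conflict.
S = [
    (1, "R", "y"), (1, "W", "y"), (2, "R", "y"), (2, "W", "y"),
    (2, "R", "z"), (2, "W", "z"), (1, "R", "z"), (1, "W", "z"),
]

def precedence_graph(schedule):
    """Edge (Ti, Tj) when an operation of Ti precedes a conflicting one of Tj."""
    edges = set()
    for i, (ti, op_i, x) in enumerate(schedule):
        for tj, op_j, y in schedule[i + 1:]:
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as an edge set."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
    on_path, finished = set(), set()

    def visit(node):
        on_path.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_path or (nxt not in finished and visit(nxt)):
                return True
        on_path.discard(node)
        finished.add(node)
        return False

    return any(visit(n) for n in list(graph) if n not in finished)

edges = precedence_graph(S)
print(sorted(edges), has_cycle(edges))  # [(1, 2), (2, 1)] True
```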
(b) Identify the problem with the above schedule, and produce a correct locking-based
serializable schedule.
The problem is that the locking algorithm releases the locks held by a transaction as soon as the
associated database command has executed and that data item no longer needs to be accessed. However,
each transaction then goes on to lock another item (z) after releasing its lock on y. This permits the
transactions to interfere with one another, resulting in the loss of total isolation and atomicity.
Two-phase locking is needed. Correct schedule (for example):
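Two-phase locking requires every transaction to acquire all of its locks (the growing phase) before it releases any (the shrinking phase). The sketch lists the lock and release steps of S in order, reading `wl` as acquiring a write lock and `rl` as releasing a lock (an interpretation consistent with the explanation above), and reports every transaction that acquires a lock after having released one.

```python
# Lock steps of S in order: (transaction, action, item); "L" = wl (acquire), "U" = rl (release).
S_LOCKS = [
    (1, "L", "y"), (2, "L", "y"), (1, "U", "y"), (2, "U", "y"),
    (2, "L", "z"), (2, "U", "z"), (1, "L", "z"), (1, "U", "z"),
]

def violates_2pl(lock_steps):
    """Return the transactions that acquire a lock after releasing one."""
    shrinking, offenders = set(), set()
    for txn, action, _item in lock_steps:
        if action == "U":
            shrinking.add(txn)   # txn has entered its shrinking phase
        elif txn in shrinking:
            offenders.add(txn)   # growing again after shrinking breaks 2PL
    return offenders

print(violates_2pl(S_LOCKS))  # {1, 2}

# A 2PL-compliant ordering for T1: take the lock on z before releasing y.
print(violates_2pl([(1, "L", "y"), (1, "L", "z"), (1, "U", "y"), (1, "U", "z")]))  # set()
```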
22.2 ‘One of the potential advantages of Distributed Database Management Systems is improved
reliability and availability.’
(a) The consistency and reliability aspects of transactions are due to the ‘ACIDity’
properties of transactions. Discuss each of these properties and how they relate to the
concurrency control and recovery mechanisms. Give examples to illustrate your answer.
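Atomicity, and the consistency it protects, can be illustrated with a transfer between two rows: either both updates are committed, or the recovery mechanism rolls both back. The Account table, the non-negative balance rule and the amounts are assumptions made for this SQLite sketch; the connection is opened in autocommit mode so that the transaction boundaries are issued explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # we issue BEGIN/COMMIT ourselves
conn.execute("CREATE TABLE Account (accNo TEXT PRIMARY KEY, balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO Account VALUES (?, ?)", [("A", 100.0), ("B", 50.0)])

def transfer(src, dst, amount):
    """Both updates take effect together, or neither does."""
    conn.execute("BEGIN")
    try:
        conn.execute("UPDATE Account SET balance = balance + ? WHERE accNo = ?", (amount, dst))
        conn.execute("UPDATE Account SET balance = balance - ? WHERE accNo = ?", (amount, src))
        conn.execute("COMMIT")
        return True
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # undo the credit already applied to dst
        return False

def balances():
    return dict(conn.execute("SELECT accNo, balance FROM Account"))

ok1 = transfer("A", "B", 30)   # succeeds
ok2 = transfer("A", "B", 500)  # would make A negative, so the whole transaction is undone
print(ok1, ok2, balances())    # True False {'A': 70.0, 'B': 80.0}
```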
22.3 (a) Produce a wait-for graph for the transactions with locking information shown in Table
1. What can you conclude from this graph?
Table 1
(b) Compare and contrast the approaches to deadlock management in database systems.
22.4 The locking information for several transactions is shown in Table 2. Produce a wait-for-graph
(WFG) for the transactions and determine whether deadlock exists.
Transaction   Data items locked   Data items waiting for
T1            X2                  X1
T3            X8                  X4
T4            X7                  X1
T5            X1, X5              X3
T6            X4, X9              X6
T7            X6                  X5
Table 2
[Figure: wait-for graph (WFG) for transactions T1 to T7]
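A wait-for graph has an edge Ti → Tj when Ti waits for an item that Tj holds, and deadlock exists exactly when the graph contains a cycle. The sketch builds the graph from the rows listed in Table 2 and tests for a cycle by repeatedly discarding edges that point at transactions which wait for nothing. Among the listed rows, X3 (which T5 waits for) has no holder, so T5 gets no outgoing edge in this sketch.

```python
# Rows of Table 2 as listed: transaction -> (items it holds locks on, items it waits for).
TABLE2 = {
    "T1": ({"X2"}, {"X1"}),
    "T3": ({"X8"}, {"X4"}),
    "T4": ({"X7"}, {"X1"}),
    "T5": ({"X1", "X5"}, {"X3"}),
    "T6": ({"X4", "X9"}, {"X6"}),
    "T7": ({"X6"}, {"X5"}),
}

def wait_for_graph(table):
    """Edges (Ti, Tj): Ti waits for an item locked by Tj."""
    holder = {item: t for t, (held, _) in table.items() for item in held}
    return {(t, holder[item])
            for t, (_, waits) in table.items()
            for item in waits
            if item in holder and holder[item] != t}

def has_cycle(edges):
    """Repeatedly drop edges pointing at transactions that wait for nothing;
    any edges that survive must lie on, or lead into, a cycle."""
    edges = set(edges)
    while True:
        waiting = {a for a, _ in edges}
        pruned = {(a, b) for a, b in edges if b in waiting}
        if pruned == edges:
            return bool(edges)
        edges = pruned

wfg = wait_for_graph(TABLE2)
print(sorted(wfg))
print(has_cycle(wfg))
```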
22.5 Locking-based algorithms for concurrency control can be employed to synchronize the execution
of transactions. Explain the rules for two-phase locking in a centralized Database Management
System and why each of these is necessary to avoid the database becoming inconsistent.
Optimistic: assume not too many transactions will conflict with one another. Delay
synchronization of transactions until their termination.
Pessimistic: assume many transactions will conflict with each other. Synchronize the execution of
transactions early in their lifecycle.
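For the optimistic approach, the delayed synchronization happens in a validation step at commit time. A simplified backward-validation sketch follows: a transaction fails validation if any transaction that committed after it started wrote an item that it read. The `Txn` fields and logical timestamps are assumptions, and the extra checks needed when write phases overlap are omitted.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Txn:
    name: str
    start: int                       # logical time at which the read phase began
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)
    finish: Optional[int] = None     # logical commit time, once validated

def validate(txn, committed):
    """Backward validation against transactions that have already committed."""
    for other in committed:
        if other.finish > txn.start and other.write_set & txn.read_set:
            return False             # txn may have read a stale value: restart it
    return True

t1 = Txn("T1", start=0, read_set={"x"}, write_set={"x"}, finish=2)
t2 = Txn("T2", start=1, read_set={"x", "y"}, write_set={"y"})
t3 = Txn("T3", start=3, read_set={"x"})
print(validate(t2, [t1]), validate(t3, [t1]))  # False True
```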
Student Project
Assignment – Transaction Management Design and Implementation
Introduction
The objective of this project is to design and implement part of a Relational Database Management System.
The work is to be undertaken in groups of two or three, which you may form yourselves.
Specification of Requirements
Design and implement an interface to the proprietary Relational Database Management System, Oracle, on
the SUN workstations, which will provide layers for:
(a) authorization
(b) concurrency control
(c) deadlock management
(d) recovery control.
The interface will be implemented using embedded SQL based on the architecture of a client–server model.
There will be a regular project meeting scheduled for each group, at which time the group will present an
outline of the work carried out since the last meeting with a list of current problems/suggested solutions.
Marking Scheme
For a more complete description, see Case Study 3 under Chapter 22.
23.1 Using the above relational schema, determine whether the following query is both type-correct and
semantically correct:
The query graph shows that OrderDetail is not joined to any of the other relations, although it is
connected to the Result, so the join graph is not connected. We conclude that the query is not semantically correct.
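The connectivity test can be sketched as a graph search over the relations, with one edge per join predicate: if some relation cannot be reached from the others, the query is flagged as semantically incorrect. The relation names and joins below are hypothetical stand-ins chosen for the illustration.

```python
def join_graph_connected(relations, joins):
    """True if every relation is reachable from every other through join predicates."""
    adjacency = {r: set() for r in relations}
    for a, b in joins:
        adjacency[a].add(b)
        adjacency[b].add(a)
    start = relations[0]
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == set(relations)

rels = ["Client", "ClientOrder", "OrderDetail"]
print(join_graph_connected(rels, [("Client", "ClientOrder")]))  # False
print(join_graph_connected(rels, [("Client", "ClientOrder"), ("ClientOrder", "OrderDetail")]))  # True
```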
23.2 Consider the above relational schema. Map the following query onto a relational algebra tree,
and then transform it into a reduced query:
Give a full explanation of the reasoning behind each step and state any transformation rules
used during the reduction process.
[Figure: initial and reduced relational algebra trees, involving relations C, CO, O and A, joins on clientNo, officeNo and areaNo, the selections dateOrder > '1-Jan-96' AND dateDeliver < '1-Jun-96', cName = 'J. Smith' and areaDescription = 'NE', and projections on attributes such as orderNo, cAddress, oAddress, areaNo and clientNo pushed towards the leaves]
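One transformation rule used in reductions like this one, pushing a selection below a join when its predicate mentions only one operand, can be checked on toy data. The sketch assumes tiny hypothetical Client and ClientOrder relations joined on clientNo and shows that applying the selection after the join gives the same result as applying it to Client first.

```python
def natural_join(r, s):
    """Natural join of two relations represented as lists of dicts."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]

def select(rel, pred):
    return [t for t in rel if pred(t)]

client = [
    {"clientNo": 1, "cName": "J. Smith"},
    {"clientNo": 2, "cName": "A. Brown"},
]
client_order = [
    {"orderNo": 10, "clientNo": 1},
    {"orderNo": 11, "clientNo": 2},
    {"orderNo": 12, "clientNo": 1},
]

def is_smith(t):
    return t["cName"] == "J. Smith"

late = select(natural_join(client, client_order), is_smith)   # select after joining
early = natural_join(select(client, is_smith), client_order)  # selection pushed down
print(late == early, [t["orderNo"] for t in early])  # True [10, 12]
```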
For a more complete description, see Case Study 4 under Chapter 22.
23.3 Using the above relational schema, determine whether the following query is both type-correct and
semantically correct:
23.4 Consider the above relational schema. Map the following query onto a relational algebra tree,
and then transform it into a reduced query:
Give a full explanation of the reasoning behind each step and state any transformation rules
used during the reduction process.
[Figure: initial and reduced relational algebra trees, involving relations P, O, PR and M, joins on petNo, ownerNo and medNo, the selections medName = 'Provac', petDescription = 'Setter' and unitsPerDay > 200, and projections on medNo and on petNo, medNo]
23.5 Now assume that the relation Medication given in Question 23.4 is horizontally fragmented as follows:
$M_1 = \sigma_{\mathit{medName} = \text{'Provac'}}(\mathit{Medication})$
$M_2 = \sigma_{\mathit{medName} \neq \text{'Provac'}}(\mathit{Medication})$
Transform the relational algebra tree from Question 23.4 into a reduced query on fragments.
Give a full explanation of the reasoning behind each step and state any transformation rules
used during the reduction process.
[Figure: relational algebra tree on fragments, with joins on medNo and the fragment PR1]
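The reduction step for horizontal fragments, which discards any fragment whose defining predicate contradicts the query's selection, can be sketched for simple equality and inequality predicates. Here the selection medName = 'Provac' contradicts the predicate defining M2, so only M1 needs to be accessed. The predicate encoding and the sample rows are assumptions made for the illustration.

```python
def contradicts(p, q):
    """p and q are simple predicates (attribute, op, value) with op '=' or '!='."""
    (a1, op1, v1), (a2, op2, v2) = p, q
    if a1 != a2:
        return False          # different attributes: cannot rule the fragment out
    if op1 == op2 == "=":
        return v1 != v2       # x = v1 AND x = v2 with v1 != v2
    if {op1, op2} == {"=", "!="}:
        return v1 == v2       # x = v AND x != v
    return False

def fragments_needed(fragment_preds, query_pred):
    return [name for name, p in fragment_preds.items() if not contradicts(p, query_pred)]

FRAGMENTS = {"M1": ("medName", "=", "Provac"), "M2": ("medName", "!=", "Provac")}
print(fragments_needed(FRAGMENTS, ("medName", "=", "Provac")))  # ['M1']

# Check on sample data: selecting from M1 alone equals selecting from M1 union M2.
medication = [{"medNo": 1, "medName": "Provac"}, {"medNo": 2, "medName": "Medix"}]
m1 = [t for t in medication if t["medName"] == "Provac"]
m2 = [t for t in medication if t["medName"] != "Provac"]
full = [t for t in m1 + m2 if t["medName"] == "Provac"]
print(full == m1)  # True
```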
For a more complete description, see Case Study 5 under Chapter 22.
23.6 Using the above relational schema, determine whether the following query is both type-correct and
semantically correct:
The query graph shows that the join graph is connected and is connected to the Result. We conclude that
the query is semantically correct.
23.7 Consider the above relational schema. Map the following query onto a relational algebra tree,
and then transform it into a reduced query:
Give a full explanation of the reasoning behind each step and state any transformation rules
used during the reduction process.
[Figure: initial and reduced relational algebra trees, involving relations T, TR, U and O, joins on trailerNo, unitRegNo and officeNo, the selection officeNo = 2, and projections on trailerDescription, unitDescription and on trailerDescription, maxCarryingWt, officeNo, unitRegNo]
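The reduced tree pushes projections down so that only the attributes needed higher up (the result attributes plus the join attributes) travel up the tree. A toy check of that rule is sketched below with hypothetical Trailer and Unit relations joined on unitRegNo. As in the relational model, projection removes duplicate tuples.

```python
def natural_join(r, s):
    """Natural join of two relations represented as lists of dicts."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s if all(a[c] == b[c] for c in common)]

def project(rel, attrs):
    """Relational projection: keep attrs and drop duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

trailer = [
    {"trailerNo": "T1", "trailerDescription": "Flatbed", "maxCarryingWt": 20, "unitRegNo": "U1"},
    {"trailerNo": "T2", "trailerDescription": "Tanker", "maxCarryingWt": 30, "unitRegNo": "U2"},
]
unit = [
    {"unitRegNo": "U1", "unitDescription": "Tractor A", "officeNo": 2},
    {"unitRegNo": "U2", "unitDescription": "Tractor B", "officeNo": 3},
]
RESULT = ["trailerDescription", "unitDescription"]

late = project(natural_join(trailer, unit), RESULT)
early = project(natural_join(project(trailer, ["trailerDescription", "unitRegNo"]),
                             project(unit, ["unitRegNo", "unitDescription"])), RESULT)
print(late == early)  # True
```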
23.8 Now assume that the relation Trailer given in Question 23.7 is horizontally fragmented as
follows:
Transform the relational algebra tree from Question 23.7 into a reduced query on fragments.
Give a full explanation of the reasoning behind each step and state any transformation rules
used during the reduction process.
[Figure: relational algebra tree on fragments, involving the Trailer fragments T1, T2 and T3, joins on trailerNo, unitRegNo and officeNo, the selection officeNo = 2, and projections on trailerDescription, unitDescription and on trailerDescription, maxCarryingWt, officeNo, unitRegNo]
Because the Selection operation on (officeNo = 2) produces no tuples from T1 or T3, the tree
simplifies to the following:
[Figure: simplified relational algebra tree, involving the fragments T2 and TR2 and relations O and U, joins on trailerNo, unitRegNo and officeNo, the selection officeNo = 2, and projections on trailerDescription, unitDescription and on trailerDescription, maxCarryingWt, officeNo, unitRegNo]
A large real estate agency has decided to distribute its project management information at the regional
level. A part of the current centralized relational schema is as follows:
Employee (NIN, fName, lName, address, DOB, sex, salary, taxCode, agencyNo)
Agency (agencyNo, agencyAddress, managerNIN, propertyTypeNo,
regionNo)
Property (propertyNo, propertyTypeNo, propertyAddress, ownerNo,
askingPrice, agencyNo, contactNIN)
Owner (ownerNo, fName, lName, ownerAddress, ownerTelNo)
PropertyType (propertyTypeNo, propertyTypeName)
Region (regionNo, regionName)
where Employee contains employee details and the national insurance number NIN is the key.
Agency contains agency details and agencyNo is the key. managerNIN identifies
the employee who is the manager of the agency. There is only one manager
for each agency; an agency only handles one type of property.
Property contains details of the properties the company is dealing with and the key is
propertyNo. The agency that deals with the property is given by
agencyNo, and the contact in the estate agents by contactNIN; owner is
given by ownerNo.
Owner contains details of the owners of the properties and the key is ownerNo.
PropertyType contains the names of the property types and the key is propertyTypeNo.
and Region contains names of the regions and the key is regionNo.
Information is required by property type, which covers: Domestic, Industrial, and Letting. There are no
Industrial properties in the South and all Letting properties are in the West of Scotland. Properties are
handled by the local estate agents office. As well as distributing the data on a regional basis, there is an
additional requirement to access the employee data either by personal information (by Personnel) or by
salary-related information (by Payroll).
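The Personnel/Payroll requirement points to vertical fragmentation of Employee, with the key NIN repeated in each fragment so that the relation can be rebuilt by a join. The minimal sketch below uses a hypothetical split of attributes and made-up sample tuples. It checks completeness, checks that the fragments overlap only on the key, and checks that joining on NIN reconstructs Employee.

```python
EMPLOYEE = [
    {"NIN": "AB123456C", "fName": "Ann", "lName": "Reid", "address": "1 High St",
     "DOB": "1975-03-01", "sex": "F", "salary": 21000, "taxCode": "K1", "agencyNo": "A1"},
    {"NIN": "CD654321E", "fName": "Tom", "lName": "Kerr", "address": "9 Low Rd",
     "DOB": "1980-07-12", "sex": "M", "salary": 18000, "taxCode": "K2", "agencyNo": "A2"},
]
# Hypothetical split: personal details for Personnel, salary-related details for Payroll.
PERSONNEL = ["NIN", "fName", "lName", "address", "DOB", "sex", "agencyNo"]
PAYROLL = ["NIN", "salary", "taxCode"]

def vertical_fragment(rel, attrs):
    return [{a: t[a] for a in attrs} for t in rel]

def join_on(key, left, right):
    index = {t[key]: t for t in right}
    return [{**t, **index[t[key]]} for t in left if t[key] in index]

e1 = vertical_fragment(EMPLOYEE, PERSONNEL)
e2 = vertical_fragment(EMPLOYEE, PAYROLL)
complete = (set(PERSONNEL) | set(PAYROLL)) == set(EMPLOYEE[0])
key_only_overlap = (set(PERSONNEL) & set(PAYROLL)) == {"NIN"}
reconstructs = join_on("NIN", e1, e2) == EMPLOYEE
print(complete, key_only_overlap, reconstructs)  # True True True
```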
24.1 (a) Draw an Entity–Relationship Diagram for the above case study.
[Figure: ER diagram for the real estate agency, including entities Employee (NIN) and PropertyType (propertyTypeNo) and relationships Handles, Manages, Has and TypeFor]
(b) Using this diagram from (a) above, produce a distributed database design for the system
that satisfies the correctness rules for fragmentation and include:
Give a full explanation of the reasoning behind each step and state
any assumptions necessary to support your design.
Possible solution
Do not fragment PropertyType or Region; replicate these relations at all sites, since they contain only a
small number of records.
Agency
Use primary horizontal fragmentation for Agency with minterm predicates:
Reconstruction: A1 ∪ A2 ∪ A3 ∪ A4 ∪ A5 ∪ A6 ∪ A7 ∪ A8
Employee
Use vertical fragmentation for Employee:
Reconstruction: P1 ∪ P2 ∪ P3 ∪ P4 ∪ P5 ∪ P6 ∪ P7 ∪ P8.
Reconstruction: O1 ∪ O2 ∪ O3 ∪ O4 ∪ O5 ∪ O6 ∪ O7 ∪ O8.
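The three correctness rules for horizontal fragmentation (completeness, reconstruction by union, and disjointness) can be tested on sample data. The sketch uses hypothetical Agency tuples and a simplified fragmentation on regionNo alone, rather than the full set of minterm predicates.

```python
# Hypothetical Agency tuples: (agencyNo, propertyTypeNo, regionNo).
AGENCY = [("A1", "D", 1), ("A2", "I", 1), ("A3", "L", 2), ("A4", "D", 3)]

def check_fragmentation(relation, predicates):
    """Return (complete, disjoint, reconstructs) for a horizontal fragmentation."""
    fragments = {name: [t for t in relation if p(t)] for name, p in predicates.items()}
    matches = [sum(1 for p in predicates.values() if p(t)) for t in relation]
    complete = all(m >= 1 for m in matches)   # every tuple lands in some fragment
    disjoint = all(m <= 1 for m in matches)   # no tuple lands in two fragments
    union = {t for frag in fragments.values() for t in frag}
    reconstructs = union == set(relation)     # union of fragments gives back the relation
    return complete, disjoint, reconstructs

by_region = {f"A_r{r}": (lambda t, r=r: t[2] == r) for r in (1, 2, 3)}
print(check_fragmentation(AGENCY, by_region))       # (True, True, True)

missing_region = {k: v for k, v in by_region.items() if k != "A_r3"}
print(check_fragmentation(AGENCY, missing_region))  # (False, True, False)
```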
Quack Consulting is a computer consulting firm that specializes in developing and installing PC-based
hardware/software systems. Quack Consulting has decided to distribute its project management
information at the regional level. A part of the current centralized relational schema is as follows:
where Client contains client details and the client number clientNo is the key.
Project contains project details and the project number projectNo is the key.
Speciality contains details of all consultants’ skills (such as Project Management, C
Programming, SSADM) and the skill identifier skillID is the key.
Consultant contains consultant details and the national insurance number NIN is the key.
Task Projects are often quite complex and consist of separate tasks, each of which
consists of a description, a start and end date and the national insurance number
of the consultant assigned to the task. Only one consultant is ever assigned to a
task. The combination of projectNo/consultantNIN is the key.
Time contains details of the time a consultant has spent on a task and the key is the
combination of projectNo/consultantNIN/date.
and Region contains names of the regions and the key is regionNo.
In addition, clients are grouped into the same regions; projects are managed by the office closest to the
client. As well as distributing the data on a regional basis, there is an additional requirement to access the
employee data either by personal information (by Personnel) or by salary-related information (by Payroll).
24.2 (a) Draw an Entity–Relationship Diagram for the above case study.
[Figure: Entity–Relationship diagram for Quack Consulting, including the entity Time (key NIN, projectNo, date) and the relationships Has, Manages, Runs and Contains]
(b) Using this diagram from (a) above, produce a distributed database design for the system
that satisfies the correctness rules for fragmentation and include:
Give a full explanation of the reasoning behind each step and state any assumptions
necessary to support your design.
Possible solution
Do not fragment Speciality or Region: these relations contain only a small number of records, so
replicate them at all sites.
Consultant
Use vertical fragmentation for Consultant first:
Then use primary horizontal fragmentation for Consultant with minterm predicates:
Client
Use primary horizontal fragmentation for Client with minterm predicates:
Reconstruction: $T_1 \cup T_2 \cup T_3 \cup T_4$
$Tm_1 \cup Tm_2 \cup Tm_3 \cup Tm_4$
Project
Use derived fragmentation for Project:
Reconstruction: $P_1 \cup P_2 \cup P_3 \cup P_4$
A home shopping catalog company called InstantBuy specializes in the provision of clothing and
household items for customers. InstantBuy has many offices throughout the UK and Eire to process
customer orders and has decided to distribute its operations according to areas of the UK and Eire. There is
also an additional requirement to distribute staff information according to area and furthermore to allow
access to staff data either by personal information (by Personnel) or by salary-related information (by
Payroll).
where:
Client contains the details of clients and the client number (clientNo) is the key. Clients are
registered with the office nearest to their home.
Item contains the details of items and the item number (itemNo) is the key.
ItemType contains the description of types of items and item type (itemType) is the key.
ClientOrder contains the details of client orders for clothing and/or household items and order
number (orderNo) is the key.
OrderDetail contains the details of items ordered and order number / item number (orderNo,
itemNo) is the key.
Staff contains the details of staff and staff number (staffNo) is the key.
Office contains the details of offices and office number (officeNo) is the key. Each office has a
Manager.
Area contains the description of areas of the UK and Eire and the key is area number
(areaNo).
InstantBuy offices are grouped into areas of the UK and Eire as follows:
InstantBuy provides various types of items that fall into the following categories:
W: Womenswear
C: Childrenswear
M: Menswear
H: Household
InstantBuy provides household items only to customers in Scotland, Wales and North England, and
does not provide childrenswear items in Eire or Northern Ireland.
24.3 (a) Draw an Entity–Relationship Diagram for the above case study.
[Figure: Entity–Relationship diagram for InstantBuy, including the entities Staff (staffNo) and ItemType (itemTypeNo) and the relationships Handles, PartOf, Has, Manages and TypeFor]
(b) Using this diagram from (a) above, produce a distributed database design for the system
that satisfies the correctness rules for fragmentation and include:
Give a full explanation of the reasoning behind each step and state any assumptions
necessary to support your design.
Possible solution
Assumption: the Area, ItemType and Item relations will be used for reference purposes and will not be
subject to frequent updates. Although some areas require access to only part of the Item relation,
there are future plans to offer all item types in all areas.
Office/Client/ClientOrder/OrderDetail Relations
Predicates
{areaNo = 1, areaNo = 2, areaNo = 3, areaNo = 4}
Staff – Mixed
$S_1 = \Pi_{staffNo, sName, sAddress, sTelNo, sex, officeNo}(Staff)$
$S_2 = \Pi_{staffNo, DOB, NIN, taxCode, salary}(Staff)$
then:
$S_{1i} = S_1 \ltimes_{officeNo} O_i$, i = 1..4
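Reconstruction
$Office = O_1 \cup O_2 \cup O_3 \cup O_4$
$Client = C_1 \cup C_2 \cup C_3 \cup C_4$
$ClientOrder = CO_1 \cup CO_2 \cup CO_3 \cup CO_4$
$OrderDetail = OD_1 \cup OD_2 \cup OD_3 \cup OD_4$
$Staff = (S_{11} \cup S_{12} \cup S_{13} \cup S_{14}) \bowtie_{staffNo} S_2$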
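The mixed fragmentation of Staff (a vertical split into S1 and S2, followed by derived horizontal fragmentation of S1 by office) and its reconstruction can be sketched in Python. This is a minimal illustration, assuming relations are lists of dicts; the sample offices, staff and the reduced attribute lists are hypothetical. Both vertical fragments keep the key staffNo, which is what allows the final join to rebuild Staff.

```python
# Sketch of mixed fragmentation: vertical split, then derived horizontal
# fragmentation of S1 via a semijoin with the office fragments Oi.

def project(rel, attrs):
    """Pi_attrs(rel); no duplicates arise because the key is kept."""
    return [{a: t[a] for a in attrs} for t in rel]

def semijoin(rel, other, attr):
    """rel semijoin other on attr: tuples of rel with a match in other."""
    keys = {t[attr] for t in other}
    return [t for t in rel if t[attr] in keys]

def join(r, s, attr):
    """Equi-join of r and s on attr, merging matching tuples."""
    return [{**a, **b} for a in r for b in s if a[attr] == b[attr]]

# Hypothetical sample data (attribute subset only).
office = [{"officeNo": "B1", "areaNo": 1}, {"officeNo": "B2", "areaNo": 2}]
staff = [
    {"staffNo": "SG37", "sName": "Ann", "officeNo": "B1", "NIN": "N1", "salary": 20000},
    {"staffNo": "SL21", "sName": "Bob", "officeNo": "B2", "NIN": "N2", "salary": 25000},
]

S1 = project(staff, ["staffNo", "sName", "officeNo"])   # personnel data
S2 = project(staff, ["staffNo", "NIN", "salary"])       # payroll data
O = {i: [o for o in office if o["areaNo"] == i] for i in (1, 2)}
S1_frags = {i: semijoin(S1, O[i], "officeNo") for i in O}

# Reconstruction: union of the S1i fragments, then join with S2 on staffNo.
union_S1 = [t for i in sorted(S1_frags) for t in S1_frags[i]]
rebuilt = join(union_S1, S2, "staffNo")
```

Joining on staffNo works because it is the only attribute common to both vertical fragments; NIN appears only in S2.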
A company called Complete Pet Care provides private health-care for domestic pets throughout the
UK. Complete Pet Care, which currently has over one hundred surgeries and opens a new surgery
almost every month, has decided to distribute its operations according to areas of the country (i.e.
Scotland, North England, South East England, South West England, Wales and Northern Ireland). The
company proposes to distribute staff details to the appropriate areas; however, staff payroll details will
be processed by the Head Office of Complete Pet Care, which is located in Scotland.
where
Owner contains details of owners and the owner number (ownerNo) is the key. Owners are
registered with the nearest surgery in their area. Owners may own several pets.
Pet contains details of pets and the pet number (petNo) is the key. The owner is given by
the owner number (ownerNo) and the surgery by the surgery number (surgeryNo).
A pet can only be registered with one surgery at a time.
Staff contains details of staff and staff number (staffNo) is the key.
Surgery contains details of each surgery and the surgery number (surgeryNo) is the key. Each
surgery has a Manager represented as manager staff number (mgrStaffNo).
Prescription contains details of the prescriptions for the pets and the prescription number
(prescNo) is the key. The pet receiving the treatment is represented by the pet
number (petNo), the medication prescribed is represented by the medication number
(medNo) and the vet who prescribed the medication is given by the vet’s staff
number (prescStaffNo).
Medication contains details of medication prescribed to pets and the medication number
(medNo) is the key.
Area contains the names of each area and the area number (areaNo) is the key.
24.4 (a) Draw an Entity–Relationship Diagram for the above case study.
[Figure: Entity–Relationship diagram for Complete Pet Care, including the entities Surgery (surgeryNo) and Area (areaNo) and the relationships Receives, CaresFor and Contains]
(b) Using this diagram from (a) above, produce a distributed database design for the system
that satisfies the correctness rules for fragmentation and include:
Give a full explanation of the reasoning behind each step and state any assumptions
necessary to support your design.
Possible solution
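Reconstructions:
Surgery: $S_1 \cup S_2 \cup S_3 \cup S_4 \cup S_5 \cup S_6$
Pet: $P_1 \cup P_2 \cup P_3 \cup P_4 \cup P_5 \cup P_6$
Prescription: $PR_1 \cup PR_2 \cup PR_3 \cup PR_4 \cup PR_5 \cup PR_6$
Medication: $M_1 \cup M_2 \cup M_3 \cup M_4 \cup M_5 \cup M_6$
Owner: $O_1 \cup O_2 \cup O_3 \cup O_4 \cup O_5 \cup O_6$
Staff: $(ST_{11} \cup ST_{12} \cup ST_{13} \cup ST_{14} \cup ST_{15} \cup ST_{16}) \bowtie_{NIN} ST_2$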
A haulage company called Rapid Roads specializes in the transportation of loads throughout the UK
and Europe. Rapid Roads has many offices throughout the UK and Europe to process customer orders
and has decided to distribute its operations according to these countries. The company also proposes to
distribute staff details to the appropriate countries; however, staff payroll details will be processed by
the Head Office of Rapid Roads, which is located in the UK.
where
Client contains the details of clients and the client number (clientNo) is the key. Clients are
registered with an office in their country and nearest to the location of their company.
Unit contains the details of the unit that pulls one or two trailers and the registration
number (unitRegNo) is the key.
Trailer contains the description of the trailer that is pulled by the unit and the trailer number
(trailerNo) is the key.
ClientOrder contains the details of client orders for the transportation of a load from the collection
address (collectAddress) to the delivery address (deliveryAddress) and the order
number (orderNo) is the key.
TransportReq contains the details of the transportation (units plus trailers) required to fulfill a
client’s order. The client’s order number (orderNo) and the registration number of a
unit (unitRegNo) together form the key.
Staff contains the details of staff and staff number (staffNo) is the key.
Office contains the details of offices and office number (officeNo) is the key. Each office
has a Manager.
Country contains the name of the country and country number (countryNo) is the key.
Rapid Roads operates in the following countries:
Country 1 (C1): UK
Country 2 (C2): France
Country 3 (C3): Germany
Country 4 (C4): Switzerland
Country 5 (C5): Spain
Country 6 (C6): Italy
24.5 (a) Draw an Entity–Relationship Diagram for the above case study.
[Figure: Entity–Relationship diagram for Rapid Roads, with entities Client (clientNo), ClientOrder (orderNo), TransportReq (orderNo, unitNo, trailerNo), Unit (unitNo), Trailer (trailerNo), Office (officeNo) and Country (countryNo); relationships shown include Orders, Contains, Handles, UPartOf, HasUnit, TPartOf, HasTrailer and Runs]
(b) Using this diagram from (a) above, produce a distributed database design for the system
that satisfies the correctness rules for fragmentation and include:
Give a full explanation of the reasoning behind each step and state any assumptions
necessary to support your design.
Possible solution
Client/Unit/Trailer
$C_i = Client \ltimes_{officeNo} O_i$, i = 1..6
$U_i = Unit \ltimes_{officeNo} O_i$, i = 1..6
$T_i = Trailer \ltimes_{officeNo} O_i$, i = 1..6
(ii) Reconstruction:
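Office: $O_1 \cup O_2 \cup O_3 \cup O_4 \cup O_5 \cup O_6$
Client: $C_1 \cup C_2 \cup C_3 \cup C_4 \cup C_5 \cup C_6$
Unit: $U_1 \cup U_2 \cup U_3 \cup U_4 \cup U_5 \cup U_6$
Trailer: $T_1 \cup T_2 \cup T_3 \cup T_4 \cup T_5 \cup T_6$
ClientOrder: $CO_1 \cup CO_2 \cup CO_3 \cup CO_4 \cup CO_5 \cup CO_6$
TransportReq: $TR_1 \cup TR_2 \cup TR_3 \cup TR_4 \cup TR_5 \cup TR_6$
Staff: $(S_{11} \cup S_{12} \cup S_{13} \cup S_{14} \cup S_{15} \cup S_{16}) \bowtie_{NIN} S_2$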
Perilous Printing is a large printing company that does work for book publishers throughout Europe.
The company currently has over 50 offices, most of which operate autonomously, apart from salaries,
which are paid by the head office in each country. To improve the sharing and communication of data,
the company has decided to implement a Distributed DBMS. Perilous Printing jobs consist of printing
books or part of books. A printing job requires the use of materials, such as paper and ink, which are
assigned to a job via purchase orders. Each printing job may have several purchase orders assigned to
it. Likewise, each purchase order may contain several purchase order items.
Office contains details of each office and the office number (officeNo) is the key. Each
office has a Manager represented by the manager’s national insurance number
(mgrNIN).
Staff contains details of staff and the national insurance number (NIN) is the key. The
office that the member of staff works from is given by officeNo.
Publisher contains details of publishers and the publisher number (pubNo) is the key. Publishers
are registered with the nearest office in their country, given by officeNo.
Bookjob contains details of publishing jobs and the job number (jobNo) is the key. The
publisher is given by the publisher number (pubNo) and the supervisor for the job by
supervisorNIN.
PurchaseOrder contains details of the purchase orders for each job and the combination of job number
and a purchase order number (jobNo, poNo) form the key.
Item contains details of all materials that can be used in printing jobs and the item number
(itemNo) is the key.
POItem contains details of the items on the purchase order and (jobNo, poNo, itemNo)
forms the key.
Country contains the names of each country that Perilous Printing operates in and the country
number (countryNo) is the key.
As well as accessing printing jobs based on the publisher, jobs can also be accessed on the job type
(jobType), which can be: 1 – Normal; 2 – Rush.
24.6 (a) Draw an Entity–Relationship Diagram for the above case study.
[Figure: Entity–Relationship diagram for Perilous Printing, including the entities Country (countryNo) and ItemType (itemTypeNo) and the relationships Handles, Supervises, PartOf, Has, Runs and TypeFor]
(b) Using this diagram from (a) above, produce a distributed database design for the system
that satisfies the correctness rules for fragmentation and include:
Give a full explanation of the reasoning behind each step and state any assumptions
necessary to support your design.
Possible solution
Publisher
$P_i = \sigma_{officeNo = i}(Publisher)$, i = 1…50
Staff
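$S_1 = \Pi_{NIN, sName, sAddress, sTelNo, officeNo}(Staff)$
$S_2 = \Pi_{NIN, sex, DOB, position, taxCode, salary, officeNo}(Staff)$ (note that duplicating officeNo in S2 is necessary here, so that S2 can itself be fragmented)
$S_{1i} = \sigma_{officeNo = i}(S_1)$, i = 1…50
$S_{2j} = S_2 \ltimes_{officeNo} \sigma_{countryNo = j}(Office)$, j = 1…5, with each S2j allocated to the head office of country j (assuming Office records the countryNo of each office)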
Bookjob
$B_i = Bookjob \ltimes_{pubNo} P_i$, i = 1…50
PurchaseOrder
$PO_i = PurchaseOrder \ltimes_{jobNo} B_i$, i = 1…50
POItem
$POI_i = POItem \ltimes_{jobNo, poNo} PO_i$, i = 1…50
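Derived horizontal fragmentation follows the owner–member chain Publisher, Bookjob, PurchaseOrder, POItem, each level being a semijoin with the fragments of the level above. The Python sketch below illustrates the idea with two hypothetical offices and made-up tuples. It relies on referential integrity (every foreign key value matches an existing parent tuple), which is what makes the derived fragments complete.

```python
def semijoin(rel, other, attrs):
    """rel semijoin other on attrs: keep tuples of rel matching a tuple of other."""
    keys = {tuple(t[a] for a in attrs) for t in other}
    return [t for t in rel if tuple(t[a] for a in attrs) in keys]

# Hypothetical sample data for two offices.
publisher = [{"pubNo": "PB01", "officeNo": 1}, {"pubNo": "PB02", "officeNo": 2}]
bookjob = [{"jobNo": "J1", "pubNo": "PB01"}, {"jobNo": "J2", "pubNo": "PB02"}]
purchase_order = [{"jobNo": "J1", "poNo": 1}, {"jobNo": "J2", "poNo": 1},
                  {"jobNo": "J2", "poNo": 2}]
po_item = [{"jobNo": "J1", "poNo": 1, "itemNo": "I3"},
           {"jobNo": "J2", "poNo": 2, "itemNo": "I7"}]

offices = sorted({p["officeNo"] for p in publisher})
# Primary horizontal fragmentation of Publisher by office.
P = {i: [p for p in publisher if p["officeNo"] == i] for i in offices}
# Derived fragmentation down the chain.
B = {i: semijoin(bookjob, P[i], ["pubNo"]) for i in offices}
PO = {i: semijoin(purchase_order, B[i], ["jobNo"]) for i in offices}
POI = {i: semijoin(po_item, PO[i], ["jobNo", "poNo"]) for i in offices}
```

Each POItem tuple lands in the fragment of the office that handles its publisher, so all the data for one job can be held at the same site.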
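Reconstructions
Publisher: $P_1 \cup P_2 \cup \dots \cup P_{50}$
Bookjob: $B_1 \cup B_2 \cup \dots \cup B_{50}$
PurchaseOrder: $PO_1 \cup PO_2 \cup \dots \cup PO_{50}$
POItem: $POI_1 \cup POI_2 \cup \dots \cup POI_{50}$
Staff: $(S_{11} \cup S_{12} \cup \dots \cup S_{1,50}) \bowtie_{NIN} (S_{21} \cup S_{22} \cup \dots \cup S_{25})$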
Advantages
Locality of Reference. Data stored close to where it is used, if possible. If a fragment is used at a
number of sites, it may be advantageous to store copies of the fragment at these sites.
Improved Reliability and Availability. Reliability and availability are improved by replication;
there is another copy of the fragment available at another site in the event of one site failing.
Performance. Inherent parallelism and distribution of resources.
Storage capacities and costs.
Reduced communication costs.
Security. Data not required by local applications is not stored and so not available for
unauthorized users.
Disadvantages
Design. May be difficult to design and allocate fragments efficiently.
Query optimization is more difficult.
Integrity is more difficult.
24.8 (a) A DDBMS may be classified as homogeneous or heterogeneous. Compare and contrast
these two types of distributed systems.
In a homogeneous system, all sites use the same DBMS package. In a heterogeneous system, sites
may run different DBMS packages. Not only may the packages be different, but the packages
may not use the same underlying data model, so that there may be a mixture of relational, network
and hierarchic DBMSs.
Homogeneous systems are much easier to design and manage. This approach provides
incremental growth, making the addition of a new site to the distributed system easy, and
increased performance, by exploiting the parallel processing capability of multiple sites.
Heterogeneous systems usually result when individual sites have implemented their own database
solutions, and integration is considered at a later date. In a heterogeneous system, translations are
required to allow the different DBMSs to communicate. To provide DBMS transparency, users
must be able to make requests in the language of the DBMS at the local site. The system then has
the task of locating the data and performing any necessary translation. Data may be required from
another site which may have:
different hardware
different DBMS packages
different hardware and different DBMS packages.
(b) Discuss the extended capabilities or services that a DDBMS must provide over a
centralized DBMS.
extended communication services to provide access to remote sites and allow the transfer of
queries and data among the sites, using a network;
extended Data Dictionary to store data distribution details;
distributed query processing including query optimization and remote data access;
extended concurrency control to maintain consistency of replicated data;
extended recovery services to take account of failures of individual sites and the failures of
communication links.
To list the clients in Edinburgh who have ordered items of type ‘TV3190’, we can use the SQL query:
SELECT DISTINCT C.clientNo
FROM Client C, OrderDetail OD, ClientOrder CO
WHERE C.clientNo = CO.clientNo AND CO.orderNo = OD.orderNo AND
cCity = 'Edinburgh' AND itemType = 'TV3190';
For simplicity, assume that each tuple in each relation is 10 characters long, there are 100
clients who have ordered item ‘TV3190’, there are 10 clients in Edinburgh and computation
time is negligible compared to communication time. The communication system has a data
transmission rate of 10,000 characters per second and a 1-second access delay to send a
message from one site to another. For the following five possible strategies for this query,
calculate the communication times, using the following formula:
Communication Time = C0 + (no_of_characters_in_message / transmission_rate)
where C0 is the access delay and the transmission rate is measured in characters per second.
Strategy Description
1 Move the Client relation to London and process query there.
2 Move the ClientOrder and OrderDetail relations to Glasgow and process query there.
3 Join the ClientOrder and OrderDetail relations at London, select tuples for items ‘TV3190’, and then
for each of these tuples in turn, check at Glasgow to determine if the associated Client city is Edinburgh.
4 Select Clients in Edinburgh at Glasgow, and for each one found, check at London to see if there is a
ClientOrder involving that Client and a ‘TV3190’ OrderDetail.
5 Join ClientOrder and OrderDetail relations at London, select ‘TV3190’ items and project result over
clientNo and orderNo and move this result to Glasgow for matching with Edinburgh Clients. For
simplicity, assume that the projected result is still 10 characters long.
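The communication-time formula can be applied directly to the strategies whose message sizes follow from the figures given. The Python sketch below does this for strategies 3, 4 and 5 only; strategies 1 and 2 also need the sizes of the Client, ClientOrder and OrderDetail relations, which are not given in this extract. As simplifying assumptions, each per-tuple check in strategies 3 and 4 is taken to cost one query message and one reply message, each one tuple (10 characters) long, and the projected result in strategy 5 is taken to hold one tuple per client.

```python
C0 = 1.0          # access delay per message, in seconds (given)
RATE = 10_000     # transmission rate, in characters per second (given)
TUPLE_LEN = 10    # characters per tuple (given)

def comm_time(chars_per_message, messages=1):
    """Total time for `messages` messages of `chars_per_message` characters."""
    return messages * (C0 + chars_per_message / RATE)

# Strategy 3: for each of the 100 'TV3190' tuples, one query to Glasgow and
# one reply (assumed), each carrying a single tuple.
s3 = comm_time(TUPLE_LEN, messages=2 * 100)
# Strategy 4: for each of the 10 Edinburgh clients, one query to London and
# one reply (assumed), each carrying a single tuple.
s4 = comm_time(TUPLE_LEN, messages=2 * 10)
# Strategy 5: one message carrying all 100 projected tuples to Glasgow.
s5 = comm_time(100 * TUPLE_LEN)

for name, t in (("3", s3), ("4", s4), ("5", s5)):
    print(f"Strategy {name}: {t:.2f} s")
```

Under these assumptions strategy 5 (about 1.1 s) is far cheaper than strategy 3 (about 200 s), because it pays the 1-second access delay once rather than once per message.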
which are horizontally fragmented on the department number, deptNo. Assume there is an
integrity constraint that requires that every member of staff earns less than every manager in
the same department. Further assume that we wish to insert the tuple (‘S9100’, ‘John Smith’,
‘1-May-1960’, 30000, ‘D1’) into the Staff relation. Under what conditions can this constraint
be checked locally?
If a Staff tuple for the same department has previously been inserted and satisfies the constraint, it
can be used as the basis for a local integrity check. For example, suppose that department D1 already
has an employee S1000 whose salary is £40,000. Since the existing tuple does not violate the
constraint, the salary of every manager in department D1 must be more than £40,000.
Therefore, a tuple with salary £30,000 can safely be inserted without consulting other sites.
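The local check just described reduces to a single comparison against the existing tuples in the site's own fragment. The Python sketch below is a hypothetical illustration (the function name and the (staffNo, salary, deptNo) tuple layout are assumptions); a False result does not mean the insert is invalid, only that the managers' salaries must be consulted, possibly at another site.

```python
def safe_to_insert_locally(new_salary, dept_no, local_staff):
    """Return True if the local Staff fragment alone proves that a new tuple
    satisfies 'every member of staff earns less than every manager in the
    same department'.

    local_staff: existing, already consistent (staffNo, salary, deptNo)
    tuples held at this site. Every manager in dept_no earns more than each
    of these salaries, so any existing salary >= new_salary is sufficient.
    """
    return any(d == dept_no and s >= new_salary for _, s, d in local_staff)

local_d1 = [("S1000", 40000, "D1")]
print(safe_to_insert_locally(30000, "D1", local_d1))
print(safe_to_insert_locally(45000, "D1", local_d1))
```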
Student Project
Assignment – Distributed Database Analysis and Design
Objective
To undertake analysis of a commercial company’s database requirements, and to produce a distributed
database design based on those requirements.
Approach
It is essential to appreciate the importance of the analysis and design phase of a project and to complete this
before any implementation is carried out. The lectures are intended to familiarize you with the theory of
distributed database analysis and design, including fragmentation and allocation of fragments to sites,
although they cannot cover all aspects. The intention of this project is to make the distributed database
design process more realistic.
To this end, you are expected, firstly, to find a suitable business or organization with a requirement for
distributed sharing of data. The company does not necessarily need to have a distributed database at
present, or even have a current requirement for such a system. The objective is to demonstrate how the
company’s requirements could best be represented in a distributed database. Suitable companies may
include travel agents, estate agents, banks, insurance companies, retail chains, supermarket chains,
bookshops, video shops, etc.
You can then put theory into practice by investigating the needs of an existing company. You are asked to
interview relevant staff to discover the way in which the present system operates (which may be manual or
a centralized database system) and to determine the data and application requirements. You are free to
identify possible improvements and solutions to problems with the current system.
Finally, you must produce a comprehensive report for the analysis and design of your proposed distributed
system and include:
(i) analysis of the company’s data and application requirements;
(ii) a suitable fragmentation schema for the system;
(iii) in the case of primary horizontal fragmentation, give a minimal set of
predicates;
(iv) the reconstruction of global relations from fragments.
Assessment
The assessment will be based on a written report and an oral justification of the design. Assessment will be
carried out on the following components of the work:
25.1 The centralized two-phase commit protocol uses a series of timeouts to prevent unnecessary
blocking. Discuss the actions for both coordinator and participants when a timeout occurs.
Consideration should be given to the various stages of the commit protocol.
25.2 Under the three-phase commit protocol, discuss how the coordinator and participants would
recover following a failure. Consideration should be given to the various stages of the commit
protocol.
Coordinator Failure
1. Failure in INITIAL state. The coordinator has not yet started the commit procedure.
Recovery in this case starts the commit procedure.
2. Failure in WAITING state. The coordinator has sent the prepare message and, although it
has not received all responses, it has not received an abort response. In this
case, recovery restarts the commit procedure.
3. Failure in DECIDED state. The coordinator has instructed the participants to globally
abort or commit the transaction. On restart, if the coordinator has received all
acknowledgements, it can complete successfully. Otherwise, it has to initiate the
termination protocol discussed above.
4. Failure in PRE-COMMITTED state. The coordinator has instructed the participants to
pre-commit the transaction. On restart, if the coordinator has received all
acknowledgements, it can send the global commit message. Otherwise, it can send the
pre-commit message again.
Participant Failure
The objective of the recovery protocol for a participant is to ensure that a participant process on
restart performs the same action as all other participants and that this restarting can be done
independently (i.e. without the need to consult either the coordinator or the other participants).
1. Failure in INITIAL state. The participant has not yet voted on the transaction. Therefore,
on recovery, it can unilaterally abort the transaction, as it would have been impossible for
the coordinator to have reached a global commit decision without this participant’s vote.
2. Failure in PREPARED state. The participant has sent its vote to the coordinator. In this
case, recovery is via the termination protocol discussed above.
3. Failure in ABORTED/COMMITTED states. The participant has completed the transaction.
Therefore, on restart, no further action is necessary.
4. Failure in PRE-COMMITTED state. The participant informs the coordinator and waits for
the global commit message.
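The recovery rules above amount to a lookup from the state recorded in the log at the time of failure to a recovery action. A minimal Python sketch follows; the action strings are hypothetical labels, and the all_acks_received flag stands for whether the coordinator's log shows that every participant has acknowledged.

```python
def coordinator_recovery(state, all_acks_received=False):
    """Action taken by a 3PC coordinator restarting in the given state."""
    if state == "INITIAL":
        return "start commit procedure"
    if state == "WAITING":
        return "restart commit procedure"
    if state == "PRE-COMMITTED":
        return "send global commit" if all_acks_received else "resend pre-commit"
    if state == "DECIDED":
        return "complete" if all_acks_received else "run termination protocol"
    raise ValueError(f"unknown coordinator state: {state}")

def participant_recovery(state):
    """Action taken by a 3PC participant restarting in the given state."""
    if state == "INITIAL":
        return "abort unilaterally"  # it never voted, so no global commit
    if state == "PREPARED":
        return "run termination protocol"
    if state in ("ABORTED", "COMMITTED"):
        return "no action"
    if state == "PRE-COMMITTED":
        return "inform coordinator and wait for global commit"
    raise ValueError(f"unknown participant state: {state}")
```

A participant never needs the all_acks_received flag: apart from the PREPARED case, its own log is enough to decide what to do on restart.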
25.3 Consider five transactions T1, T2, T3, T4 and T5 with:
The locking information for these transactions is shown in Table 1. Produce the local
wait-for-graphs (WFGs) for each of the sites. What can you conclude from the local
WFGs?
Table 1
[Figure: local wait-for graphs involving transactions T1 to T5]
25.4 One of the most well-known methods for distributed deadlock detection was developed by
Obermarck. Explain how Obermarck’s method works and how deadlock is detected and resolved.
25.5 Using the above transactions, demonstrate how Obermarck’s method for distributed deadlock
detection works.
[Figure: local wait-for graphs extended with the external node Text, involving transactions T1 to T5]
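Site 1 contains a potential deadlock cycle (one passing through Text), so its WFG is transmitted from
Site 1 to Site 3. The resulting WFG shows a cycle:
[Figure: combined WFG for Sites 1 and 3, involving Text and transactions T1 to T5]
This implies that the system is in global deadlock, and one of the transactions must be selected to be
aborted and restarted.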
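A simplified Python sketch of the step just described: each site's local WFG is a set of edges (Ti, Tj), meaning Ti waits for Tj, with the external node Text standing for waits that involve other sites. A site whose graph contains a cycle through Text sends its WFG to another site (subject to ordering conditions that this sketch omits), which merges it into its own graph; a cycle that no longer needs Text indicates a global deadlock. The edges used here are hypothetical rather than those in the figures, and ignoring Text edges after the merge is a simplification of Obermarck's method.

```python
EXT = "Text"  # external node standing for waits involving other sites

def find_cycle(edges):
    """Return the nodes of one cycle in the directed graph, or None."""
    graph = {}
    for waiter, holder in edges:
        graph.setdefault(waiter, []).append(holder)
    on_path, finished, path = set(), set(), []

    def dfs(node):
        on_path.add(node)
        path.append(node)
        for nxt in graph.get(node, []):
            if nxt in on_path:                 # back edge closes a cycle
                return path[path.index(nxt):]
            if nxt not in finished:
                cycle = dfs(nxt)
                if cycle:
                    return cycle
        on_path.discard(node)
        path.pop()
        finished.add(node)
        return None

    for start in list(graph):
        if start not in finished:
            cycle = dfs(start)
            if cycle:
                return cycle
    return None

def potential_deadlock(edges):
    # The cycle found passes through Text: ship this WFG to another site.
    cycle = find_cycle(edges)
    return cycle is not None and EXT in cycle

def real_cycle(edges):
    # A cycle among transactions only (ignoring Text) is an actual deadlock.
    return find_cycle({(a, b) for a, b in edges if EXT not in (a, b)})

# Hypothetical local WFGs; (Ti, Tj) means Ti waits for Tj.
site1 = {(EXT, "T1"), ("T1", "T2"), ("T2", EXT)}
site3 = {(EXT, "T2"), ("T2", "T3"), ("T3", "T1"), ("T1", EXT)}
merged = site1 | site3   # Site 1 sends its WFG to Site 3, which merges it
```

Neither local graph has a cycle among transactions alone; only after the merge does real_cycle find T1, T2 and T3 waiting for each other, at which point one of them would be chosen as the victim.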
Student Project
Assignment – Distributed Database System Implementation
Introduction
The objective of the project for this module is to design and implement part of a Distributed Database
Management System (DDBMS).
The work is to be undertaken in groups of two or three, which you should form yourselves. However, to
ensure an even distribution, please divide yourselves so that each group has a member from each course stream.
Each group member has to supply his/her own critical appraisal with the final report.
Specification of Requirements
Design and implement a distributed database, which contains the following base functionality:
A possible implementation for the DDBMS would be based on Visual Basic for the front-end, ADO
(ActiveX Data Objects) for accessing the individual databases, and Access for the individual DBMSs.
However, this is only a suggestion and clearly different implementation environments (for example, Visual
C++, Java/JDBC) are possible.
There will be a regular project meeting scheduled for each group, at which time the group will present an
outline of the work carried out since the last meeting with a list of current problems/suggested solutions.
Marking Scheme
27.1 Discuss why traditional transaction management protocols are too restrictive for advanced
database applications.
The librarian will access the system to issue books, record reservations, generate return requests for
reserved books and record books being returned. Returning a book may involve generating an
availability notification if the returned book has been reserved by another member. If an availability
notification is generated then the librarian sends it out to the member. When a book is requested to be
issued or requested to be reserved then the system validates details on the membership card as well as
validating that the requested book is one stocked by the library. If the membership card is invalid or the
library does not stock the book then the request is rejected. If a book is requested for issue but is on loan
then that book will be reserved. Members of the library can display details of books stocked. The
subscriptions section deals with membership issues e.g. renewing membership cards on an annual basis
etc. The purchasing section deals with adding new books to the library.
There may be multiple copies of books held in the library. When a book is borrowed the borrowing date
is noted and when the book is returned the return date is noted. The details held about each reservation
include the current borrower(s) and the member who has reserved the book.
When a copy of a book arrives from the suppliers, it is held in storage until its details are registered on
the system. These details include the number, author and title of the book. Once registered the copy is
put in the lending shelf which means that it is available to be borrowed. When a copy is returned, if the
book has been reserved then the copy is held in a reservation area otherwise it is returned to the lending
shelf. If a copy is reported lost by the borrower then the copy details are deleted from the system by the
librarian.
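The copy lifecycle just described behaves like a small state machine. The Python sketch below encodes it as a transition table; the state and event names are hypothetical, and the transition from the reservation area to on loan (when the reserving member collects the copy) is an assumption that the text implies but does not state.

```python
from enum import Enum

class CopyState(Enum):
    IN_STORAGE = "held in storage until registered"
    ON_SHELF = "on the lending shelf"
    ON_LOAN = "on loan"
    RESERVATION_AREA = "held in the reservation area"
    DELETED = "deleted after being reported lost"

# (current state, event) -> next state; 'return' is handled separately
# because its outcome depends on whether the book is reserved.
TRANSITIONS = {
    (CopyState.IN_STORAGE, "register"): CopyState.ON_SHELF,
    (CopyState.ON_SHELF, "borrow"): CopyState.ON_LOAN,
    (CopyState.RESERVATION_AREA, "borrow"): CopyState.ON_LOAN,  # assumption
    (CopyState.ON_LOAN, "report_lost"): CopyState.DELETED,
}

def next_state(state, event, book_reserved=False):
    """Return the copy's next state, or raise ValueError for an illegal event."""
    if state is CopyState.ON_LOAN and event == "return":
        return CopyState.RESERVATION_AREA if book_reserved else CopyState.ON_SHELF
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"{event!r} is not allowed when the copy is {state.value}") from None
```

Keeping the transitions in a table makes illegal events (such as borrowing a copy that has not yet been registered) easy to reject in one place.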
A reservation form records the following details:
Reservation: Number, Date
Book: Number, Title, Author
Reserver: Number, Name, Address
Current Borrower: Number, Name, Address
27.2 Produce use case diagrams and a set of associated sequence diagrams for the above case
study.
The following are some example diagrams for the case study (produced in Rational Rose):
[Figure: use case diagram; elements shown include Member, Librarian, Purchasing and Supplier, and the use cases Return Book, Generate Availability Note, Maintain Membership, Add New Books, Reserve Book and Request Return, linked by <<include>> and <<extend>> relationships]
Issue Book
    VALIDATE REQUEST
    if request valid then
        while copies to be checked and no copy dispensed
            if copy available then
                dispense book
            end if
        end while
        if no copies available then
            RESERVE BOOK
        end if
    end if
Validate Request
    Librarian inputs member details
    if membership invalid then
        reject request
    else
        Librarian inputs book details
        if book is not stocked then
            reject request
        else
            accept request
        end if
    end if
Reserve Book
    if details not already validated then
        VALIDATE REQUEST
    end if
    if request valid then
        record reservation details
    end if
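The structured English above can be turned into a small runnable Python sketch. The Library class, its attributes and the status strings are hypothetical, chosen only to mirror the Issue Book, Validate Request and Reserve Book procedures. The loop stops at the first available copy, so only one copy is issued per request.

```python
from dataclasses import dataclass, field

@dataclass
class Library:
    """Hypothetical in-memory model used only to illustrate the logic."""
    valid_members: set                      # member numbers with valid cards
    copies: dict                            # book number -> list of copy statuses
    reservations: list = field(default_factory=list)

    def validate_request(self, member_no, book_no):
        # Reject if the membership card is invalid or the book is not stocked.
        return member_no in self.valid_members and book_no in self.copies

    def reserve_book(self, member_no, book_no, validated=False):
        if not validated and not self.validate_request(member_no, book_no):
            return False
        self.reservations.append((member_no, book_no))  # record reservation
        return True

    def issue_book(self, member_no, book_no):
        if not self.validate_request(member_no, book_no):
            return "rejected"
        statuses = self.copies[book_no]
        for i, status in enumerate(statuses):
            if status == "available":       # dispense the first available copy
                statuses[i] = "on loan"
                return f"issued copy {i}"
        # No copy available: reserve the book (details already validated).
        self.reserve_book(member_no, book_no, validated=True)
        return "reserved"

lib = Library({"M1"}, {"B1": ["on loan", "available"], "B2": ["on loan"]})
```

For example, lib.issue_book("M1", "B2") finds no available copy and records a reservation instead, matching the "if no copies available then RESERVE BOOK" step.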
[Figure: class diagram, Version 1, with classes Reservation (date, number), Book (number, author, title, reservation status), Member (number, name, address), Copy (loan status) and Loan (loan date, return date); associations include is reserver and is borrower]
[Figure: sequence and collaboration diagrams for Issue Book; messages include 1: issue book, 2: member details, 3: validate membership (member details), 6: book details, 7: check stocked, 8: validation results, get status and loan status, with guards [invalid book] request rejected and [copy available] copy; objects include :Librarian, :Member, :Copy and :Loan]
[Figure: refined class diagram adding a Library Controller class with issue book(), and the operations validate membership() on Member, check stocked() on Book, get status() and set status() on Copy, and create() on Loan]
[Figure: state diagram for a copy: delivered/create leads to Waiting Registration, and registered leads to On Shelf]
[Figure: further refined class diagram adding registration status, create(), delete(), get loan status(), set loan status() and set registration status() to Copy, get reservation status() to Book, and record loan date() and record return date() to Loan]
27.3 Give your definition of an Object-Oriented Database Management System (OODBMS). Discuss
what you consider to be the three most important advantages of an OODBMS over a relational
DBMS. Justify your selection of the three advantages.
See Sections 27.2 and 27.5. Expect some justification for the top three selection. Also expect
detailed discussion of the advantages, not just bullet points.
27.4 Despite the superior expressive power of the Object-Oriented Database Management System
(OODBMS) in comparison to the established relational systems, the acceptance of the
OODBMS will ultimately depend on its performance. The key to this may well lie with how
persistent objects are accessed. Discuss the design goals for the incorporation of persistence
in a programming language.
Design goals:
27.5 Object databases have roots in both programming languages and database management.
However, not all aspects of these two technologies blend together easily. One area of potential
conflict arises when we try to completely separate persistence and type.
Two axes on a graph are said to be orthogonal if they are perpendicular, that is, if they
represent different dimensions. Similarly, persistence and type are considered to be
orthogonal if they do not influence each other in any way: any type can have persistent
or transient instances, and any behavior or characteristics that apply to a type are totally
independent of whether its instances are persistent or transient.
(b) Discuss this issue for any three of the following subsystems:
(1) Queries
(2) Schemas
(3) Transactions
(4) Existence Semantics.
Queries
Which objects should a query consider? The traditional database point of view is that
declarative queries should range over persistent objects. In contrast, the programming
language point of view is that persistence should be orthogonal to type and that the
programmer should treat both transient and persistent objects in exactly the same way.
This view results in the conclusion that the range of a query should include both
transient and persistent objects. Therefore, there should be no distinction that makes
query capability applicable only to persistent objects.
If queries should apply to both transient and persistent objects, then does a given user’s
query need to filter the transient objects of the desired type that have been created in
other users’ applications but that have not yet been committed to persistent storage? Or
does the user’s query only include in its scope the transient objects of the desired type
created by that run unit, along with appropriate persistent objects from the object
database?
Deciding that transient objects are included in query scopes complicates the design of
the query processor. For example, the DBMS may need to construct and maintain
indexes on transient as well as persistent objects. If queries include filtering of local
transient objects, then the query processor must execute on the client application side
and may even need to be distributed across the client and server processes.
Schema
The traditional database point of view is that there should be explicit declaration of
which types may have persistent instances. This viewpoint stems from a philosophy that
persistent data and transient data are inherently different. The programmer should
therefore be aware of when he is dealing with each. Database functionality, for example,
queries, indexes, transaction commits, versions, and so forth, apply only to persistent
objects. A DBMS should enforce semantic constraints, such as uniqueness and
referential integrity, only for persistent objects. Since an object DBMS uses class
specifications for its schema information, persistence capability should be explicitly
declared in those class specifications.
By contrast, the programming language point of view is that all classes should be
persistence-capable. There should be no distinction in either available functionality or
mode of interface between transient and persistent objects; therefore, there is no need to
declare which classes are persistence-capable.
Transactions
Programming language and database people alike generally agree that the commit of a
transaction must guarantee that updates to persistent objects are durably written to
persistent storage. Similarly, when a transaction aborts, any persistent-object updates that
occurred during the transaction are guaranteed not to be written to the database. The
question is, what should happen to transient objects that were updated during an aborted
transaction? Should these updates also be undone? Should object DBMSs support
transaction-consistent transient objects?
The traditional database point of view ignores updates to transient objects. Because a
DBMS does not manage transient objects, it does not log their updates and cannot undo
them. Whatever changes were made in program memory variables remain, even if the
transaction aborts. It is the application’s responsibility to manage whatever clean-up may
be required for program variables.
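The sketch below illustrates this behavior. It uses Python's built-in sqlite3 module as a stand-in for the DBMS. The same update is applied to a persistent row and to an ordinary program variable inside one transaction, and then the transaction is aborted. The table, column and variable names are hypothetical.

```python
import sqlite3


def abort_demo() -> tuple[int, int]:
    """Apply the same update to a persistent row and to a transient program
    variable inside one transaction, then abort the transaction."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
    conn.execute("INSERT INTO account VALUES (1, 100)")
    conn.commit()

    transient_balance = 100  # program-memory copy; the DBMS neither logs nor manages it

    # The sqlite3 module implicitly begins a transaction before this UPDATE.
    conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
    transient_balance -= 30

    conn.rollback()  # abort: the DBMS undoes only its own (logged) update

    (persistent_balance,) = conn.execute(
        "SELECT balance FROM account WHERE id = 1"
    ).fetchone()
    conn.close()
    return persistent_balance, transient_balance


if __name__ == "__main__":
    print(abort_demo())  # (100, 70)
```

After the abort, the persistent value is restored but the program variable keeps its update. Cleaning up the program variable is left to the application.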
Existence Semantics
How long does an object exist and when is it actually deleted? This issue shows up when
programmers use an object database with Smalltalk. In Smalltalk, an object exists if it is
reachable, that is, if some other object references it. A Smalltalk object is
garbage-collected, that is, removed from the Smalltalk Image, when it is no longer
reachable. The traditional database point of view, however, is that an object exists until it
is explicitly deleted. Therefore, Smalltalk and traditional DBMSs have different
existence semantics – different understandings of what is necessary for an object to exist.
In Smalltalk, an object exists if it is reachable. To a DBMS, an object exists once it has been created,
and it continues to exist until it is explicitly deleted.
One particular issue for the object DBMS vendor who wants to support Smalltalk is
determining what the object DBMS for Smalltalk should include in an extent, which is
the set of all instances of a class. Extents are commonly used as the scopes for queries. If
the only reference to an object is because it is part of a class extent, does the object exist?
One approach is for the object DBMS to treat the extent as a special kind of collection,
the semantics of which include making an object eligible for garbage collection if its
only reference is from the extent collection.
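Python's weakref.WeakSet happens to provide these semantics, so the sketch below uses it as a hypothetical class extent. An instance that is referenced only from the extent is garbage-collected and silently drops out of the extent. This is only an analogy in a garbage-collected language. It is not how any particular object DBMS implements extents, and the class name Part is illustrative.

```python
import gc
import weakref


class Part:
    """Hypothetical class whose extent is a weak collection."""

    extent: "weakref.WeakSet[Part]" = weakref.WeakSet()

    def __init__(self, part_no: str) -> None:
        self.part_no = part_no
        Part.extent.add(self)  # every new instance joins the class extent


def demo() -> tuple[int, list[str]]:
    kept = Part("P1")      # still referenced by a program variable
    dropped = Part("P2")
    before = len(Part.extent)
    del dropped            # P2 is now referenced only by the extent
    gc.collect()           # force collection on runtimes without reference counting
    survivors = sorted(p.part_no for p in Part.extent)
    return before, survivors  # kept remains alive until the function returns


if __name__ == "__main__":
    print(demo())  # (2, ['P1'])
```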
A related issue arises with the notion of keys, which are uniqueness constraints. A key is
a (possibly compound) attribute, the value of which is unique across an extent. If a
class has a defined key, the object DBMS should check that a new instance of that class
does not violate the uniqueness constraint. Should it check both transient and persistent
instances of the class? Should it check both transients local to this process and other
transient instances of the class? Should it check again against only persistent instances at
the end of the transaction? What if there is conflict with an about-to-be-garbage-
collected instance’s key value?
27.6 Discuss the concept of object identifiers (OIDs) in an object DBMS and discuss four different
approaches for the representation of OIDs.
OIDs are used both by application programs for referencing objects and for representing relations
between objects. The choice of the type of representation of OIDs can also influence the
performance of an ODBMS. OIDs can be represented in different ways (for a review of the
different approaches proposed, see Khoshafian and Copeland (1986)).
An OID can be physical or logical. The former contains the actual address of the object, whereas
the latter is an index from which the address of the object is obtained. Different approaches have
been proposed for the representation of both physical and logical OIDs, thereby producing at least
four types of OID.
Physical address
The OID is the physical address of the object. This representation, normally used by
programming languages, has the advantage of being very efficient, but it is rarely used in
an ODBMS since, if a given object is moved or deleted, all the objects containing its
OID must be modified.
Structured address
The OID consists of two parts – the first contains the segment number and the page
number, thus making it possible to obtain quickly the address on the disk to be read,
whereas the second part contains a logical slot number, which is used to determine the
position of the object within the page. With this representation, the object can be
relocated within the page simply by changing the slot array, or it can be moved to
another page by inserting its forward address in the slot array.
Surrogate
The OID is generated by using an algorithm which guarantees its uniqueness (for
example, the time and date, or a monotonically increasing counter). Surrogate OIDs are
then transformed into the object’s physical addresses, normally by using an index.
Typed surrogates
A variant of the surrogate for representing OIDs involves having both a type identifier
(type ID) and a portion of the object identifier. A different counter generates the object
identifier portion for each type. Thus, the address space is segmented. Moreover, the type
identifier in the object’s OID allows us to determine the object type without retrieving
the object from the disk.
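The following is a minimal sketch of the typed-surrogate scheme, assuming a hypothetical in-memory store. Each type has its own counter, and the OID carries the type identifier. An index maps each OID to a (page, slot) location, so an object can be relocated without changing its OID. A real ODBMS would keep the counters and the index on disk. This toy keeps them in dictionaries, and the names ObjectStore, new_oid, locate, move and type_of are illustrative only.

```python
from itertools import count

OID = tuple[str, int]  # (type identifier, per-type serial number)


class ObjectStore:
    """Toy store issuing typed surrogate OIDs and indexing their locations."""

    def __init__(self) -> None:
        self._counters: dict[str, count] = {}       # one counter per type
        self._index: dict[OID, tuple[int, int]] = {}  # OID -> (page, slot)

    def new_oid(self, type_id: str, page: int, slot: int) -> OID:
        counter = self._counters.setdefault(type_id, count(1))
        oid = (type_id, next(counter))
        self._index[oid] = (page, slot)
        return oid

    def locate(self, oid: OID) -> tuple[int, int]:
        return self._index[oid]

    def move(self, oid: OID, page: int, slot: int) -> None:
        # Relocation only updates the index; the OID itself never changes,
        # so references held by other objects stay valid.
        self._index[oid] = (page, slot)

    @staticmethod
    def type_of(oid: OID) -> str:
        # The type is read from the OID without fetching the object.
        return oid[0]
```

Example usage: with store = ObjectStore(), the first call store.new_oid("Task", page=3, slot=1) yields ("Task", 1). After store.move(("Task", 1), page=7, slot=2), a lookup through the index finds the object at (7, 2).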
To C++, an object identifier is an address in process memory space. This space is too small for
most database purposes. To an object DBMS, an object identifier cannot be just a memory
address. Scalability requires that object identifiers be valid across storage volumes. Distributed
object databases require that object identifiers be valid across machine boundaries. From the
database management perspective, an object identifier must be a unique identifier that persists
with an object for its entire lifetime, regardless of where it may be stored or moved or how it is
being used. The object DBMS can then use object identifiers as the basis of references used to
implement relationships.
However, from the programming language perspective, there should be no need to introduce
reference syntax to supplement pointers. Pointers should be used instead of object identifiers or
object identifiers should simply behave like pointers, even if the object DBMS eventually
converts the addresses to the equivalent of object identifiers with a larger scope.
Traversal paths are not the same as C++ pointers. First, an object DBMS creates and deletes
traversal paths in pairs. In contrast, just because one C++ object points to another does not
mean there is a reverse pointer. Second, it is the responsibility of an object DBMS to maintain
An object DBMS cannot even use pointers directly to represent relationships, because by
definition pointers are not location-independent. Because relationships are logical, they must
remain valid even when the associated objects move. When an object is moved in memory, its
address changes. A pointer to that object now points to something else. However, when an
object is moved in memory, all of its relationships must retain their validity. This must be true,
even if the object is moved from main memory to disk, or to a different disk volume, or to a
different site on the network. Because object identifiers are location-independent, an object DBMS can
use them to implement relationships.
27.8 (a) ‘Pointer swizzling’ is a technique that can be used to optimize access to main memory
resident persistent objects. Discuss in general terms how pointer swizzling works.
The basic objective of pointer swizzling is to convert main memory pointers representing
interobject references to disk pointers when an object is being written to disk and subsequently to
convert disk pointers to main memory pointers when the object is being read back in again. Thus,
the programming language can access the interobject references as though they were normal pointers,
thereby improving performance.
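The following is a simplified sketch of eager swizzling and unswizzling. It assumes a hypothetical on-disk format in which each object is a record whose references are stored as OIDs. On load, every OID is replaced by a direct in-memory reference. On write, the references are converted back to OIDs. Python object references stand in for main-memory pointers. For simplicity, the whole small store is loaded at once.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Obj:
    oid: str
    name: str
    refs: list["Obj"] = field(default_factory=list)  # direct in-memory references


def load_eager(disk: dict[str, dict]) -> dict[str, Obj]:
    """Fault in every object, then swizzle all of its OID references at once."""
    resident = {oid: Obj(oid, rec["name"]) for oid, rec in disk.items()}
    for oid, rec in disk.items():
        # Swizzle: replace each stored OID with a reference to the resident object.
        resident[oid].refs = [resident[r] for r in rec["refs"]]
    return resident


def write_back(resident: dict[str, Obj]) -> dict[str, dict]:
    """Unswizzle: convert direct references back to OIDs for storage."""
    return {o.oid: {"name": o.name, "refs": [r.oid for r in o.refs]}
            for o in resident.values()}
```

After loading, following mem["o1"].refs[0] reaches the resident object directly, with no OID lookup. Writing back reproduces the original OID-based records.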
Swizzling techniques can be classified along three dimensions:
(i) eager v. lazy
(ii) direct v. indirect
(iii) copy v. in-place
1. Eager: Guarantees that all the pointers in the main memory are swizzled. When an object is
loaded from disk, the object is scanned through and all the pointers in the object are
immediately swizzled.
2. Lazy: Only swizzles pointers on demand; i.e., a pointer is not swizzled until the object it refers
to is accessed via this particular pointer.
The advantage is that no pointers are swizzled unnecessarily. On the negative side, lazy swizzling
must handle two different kinds of pointers at run-time: swizzled and non-swizzled.
3. Direct: Requires that the referenced object be resident in memory. A directly swizzled pointer
contains the main memory address of the object it references.
The problem is that if an object is displaced from the system buffer (i.e., is no longer resident), all the
directly swizzled pointers that reference the displaced object need to be unswizzled.
In the case of eager direct swizzling, one cannot simply unswizzle pointers because eager
swizzling guarantees that all pointers in the buffer are swizzled. Instead, these pointers (i.e., their
home objects) must be displaced too, which can cause a snowball effect.
4. Indirect: A swizzled pointer refers to a descriptor that stores the main memory address of the
object, rather than to the object itself. This induces additional overhead for simple object lookups:
an additional level of indirection and a residency check (a check of whether the descriptor is valid).
For direct swizzling, by contrast, the information that an object is resident is coded in the swizzled pointer.
5. Copy: When faulting objects in, data is copied into the application’s local object cache.
6. In-place: When faulting objects in, data is accessed within the data manager’s cache.
Copy swizzling may be more efficient because, in the worst case, only modified objects have to be
swizzled back to their OIDs, whereas with in-place swizzling an entire page of objects may have to be
unswizzled if one object on the page is modified. On the other hand, with the copy approach, every object
must be explicitly copied into the local object cache.
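For contrast, the following is a minimal sketch of lazy swizzling under the same kind of hypothetical OID-keyed storage. A reference remains an OID until it is first dereferenced, so every dereference must check which kind of pointer it holds. Records that are faulted in are copied into a local cache, in the spirit of copy swizzling. The names Ref and ObjectCache are illustrative only.

```python
class ObjectCache:
    """Local object cache; records are copied in from 'disk' on first use."""

    def __init__(self, disk: dict[str, dict]) -> None:
        self.disk = disk
        self.resident: dict[str, dict] = {}
        self.faults = 0  # number of records read from storage

    def fault_in(self, oid: str) -> dict:
        if oid not in self.resident:
            self.faults += 1
            self.resident[oid] = dict(self.disk[oid])  # copy into the local cache
        return self.resident[oid]


class Ref:
    """A reference that starts as an OID and is swizzled on first dereference."""

    def __init__(self, oid: str) -> None:
        self.target: object = oid  # unswizzled: holds an OID string
        self.swizzled = False

    def deref(self, cache: ObjectCache) -> object:
        if not self.swizzled:  # run-time check for the two kinds of pointer
            self.target = cache.fault_in(self.target)
            self.swizzled = True
        return self.target
```

A reference that is never followed is never swizzled and costs no I/O. This is the main advantage of the lazy approach, paid for by the check in deref.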
(c) Briefly discuss any alternative approach that could be used to handle persistence in an
ODBMS.
27.8 (a) Discuss why traditional transaction management protocols are too restrictive for
advanced database applications.
There should be no reason why conventional systems cannot handle such protocols. The main reason
why ODBMSs handle them is that they were necessary for the types of applications
the systems were originally designed for.
28.1 Discuss the Object Model proposed by the Object Data Management Group.
Cornucopia Ltd is a large, multinational oil company that uses contractors for systems analysis
whenever possible. The following data is held about the contracts:
(a) Each contract consists of a contract name and contract number, the name of the main
contracting company, the names of any other companies involved in the contract, the
start and scheduled end date for the contract, the budget for the contract and the name of
the project manager for the contract.
(b) Each contract consists of a number of tasks, each with an activity name, a start and
scheduled end date, a task leader, a work group and a set of deliverables.
(c) A work group consists of a group code and a list of staff. A group leader is identified for
each group.
(d) Deliverables take the form of documents, consisting of a document code, a document
title, the names of the authors, a list of people the document is to be distributed to, an
issue date and the name of the person who has approved the document. Documents
come in two forms: requirements specifications and functional specifications.
(e) Company data consists of the name of the company, the technical head of staff, the
administrative head of staff, the company’s main address (street, city, town, and
postcode), and a telephone and fax number.
(f) Each company is divided into a number of divisions, consisting of an address (street,
city, town, and postcode), and a divisional head of staff.
(g) Staff data consists of a name, address, and a job title, which determines the hourly cost
for the member of staff.
28.2 Using the Object Definition Language (ODL) of the ODMG object model, define the interface of
the object schema’s types for this case study. For each type, define at least one method that you
consider appropriate. State any assumptions necessary to support your design.
See the schema diagram on the following page for a possible solution. The specification for the Project class is
shown below.
class Project
(extent projects key projectName)
{
// attributes, relationships, and operations not shown
};
28.3 Write the following query in the Object Query Language (OQL) of the ODMG object model:
‘List the names of all documents delivered under the project named “SCCS”.’
SELECT title:x.title
FROM z IN projects,
y IN z.ConsistsOf,
x IN y.HasDeliverables
WHERE z.projectName = "SCCS"
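To show what the three range variables do, the Python sketch below mirrors the query as nested iteration over hypothetical in-memory classes. The attribute names consists_of and has_deliverables are stand-ins for the ConsistsOf and HasDeliverables relationships. An OQL SELECT without DISTINCT returns a bag, so the sketch returns a list and keeps any duplicate titles.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    title: str


@dataclass
class Task:
    has_deliverables: list[Document] = field(default_factory=list)


@dataclass
class Project:
    project_name: str
    consists_of: list[Task] = field(default_factory=list)


def documents_for_project(projects: list[Project], name: str) -> list[str]:
    # One nested loop per range variable (z, y, x); the WHERE filter applies to z.
    return [x.title
            for z in projects if z.project_name == name
            for y in z.consists_of
            for x in y.has_deliverables]
```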
[Figure: possible object schema for Cornucopia Ltd (only partially recoverable):
Task: nameActivity: STRING; contractingCompany; taskLeader; workGroup; startDate: DATE; schedEndDate: DATE; deliverables
Person: name: struct STRING; company; division; title; address: struct STRING
Title: code: STRING; hourlyCost: FLOAT
Group: groupCode: STRING; members*; head
Document: docCode: STRING; title: STRING; authors*; listDistrib*; issueDate: DATE; approvedBy
FuncSpec and RequirementsAnalysis: the two forms of Document]
Perilous Printing Ltd is a small printing company that does work for book publishers. Its jobs
consist of printing books or parts of books. The following data is held about the work:
(a) The company does work for many different publishing houses. The data on each publisher
consists of a name, an address (street, town, city, and postcode), and a telephone and fax number.
(b) Each printing job consists of a due date, a description and a job type (rush job, normal job, or
fill-in). Each printing job is associated with one or more members of staff.
(c) A printing job requires the use of several materials, such as paper and ink. The company holds
details of the amount of stock of each type of material that they have on hand together with the
price and a reorder level.
(d) Materials are assigned to a job via purchase orders. Each printing job may have several
purchase orders assigned to it. Purchase orders identify a vendor and a date for the purchase.
(e) Likewise, each purchase order may contain several purchase order items, containing the
quantity of material required.
(f) Company data consists of the name of the company, the technical head of staff, the
administrative head of staff, the company’s main address (street, town, city, and postcode), a
telephone and fax number.
(g) Staff data consists of a name (first and last name), address, and a job title.
28.4 Using the Object Definition Language (ODL) of the ODMG object model, define the interface
of the object schema’s types for this case study. For each type, define at least one method that
you consider appropriate. State any assumptions necessary to support your design.
See the figure below for a possible schema. The class specification for Publisher is shown below.
class Publisher
(extent publishers key pubName)
{
// attributes, relationships, and operations not shown
};
28.5 Write the following query in the Object Query Language (OQL) of the ODMG object model:
‘List the surnames of all people associated with printing jobs for Addison Wesley Longman.’
SELECT surname:s.lName
FROM p IN publishers,
j IN p.HasJob,
s IN j.HasStaff
WHERE p.pubName = "Addison Wesley Longman"
[Figure: possible object schema for Perilous Printing Ltd (only partially recoverable):
Publisher: jobs*
Job: PurchaseOrder*
Company; Staff
Material: type: STRING; stockLevel: FLOAT; price: FLOAT; reorderLevel: FLOAT
PurchaseOrder: vendor: STRING; dateOfPurchase: DATE; items*
POItem: quantityRequired: FLOAT; material]
A practice called Perfect Pets provides private health care for domestic pets throughout
America. This service is provided through various clinics located in the main cities of
America. The Director has provided the following description of the current system.
Perfect Pets has many veterinary clinics located in the main cities of America. The details of
each clinic include the clinic number, clinic address (consisting of the street, city, state, and
zipcode), and the telephone and fax numbers. Each clinic has a Manager and a number of staff
(for example, vets, nurses, secretaries, cleaners). The clinic number is unique throughout the
practice.
The details stored on each member of staff include the staff number, name (first and last),
address (street, city, state, and zipcode), telephone number, date of birth, sex, social security
number (SSN), position, and current annual salary. The staff number is unique throughout the
practice.
When a pet owner first contacts a clinic of Perfect Pets the details of the pet owner are
recorded, which include an owner number, owner name (first name and last name), address
(street, city, state, and zipcode), and home telephone number. The owner number is unique to a
particular clinic.
The details of the pet requiring treatment are noted, which include a pet number, pet name,
type of pet, description, date of birth (if unknown, an approximate date is recorded), date
registered at clinic, current status (alive/deceased), and the details of the pet owner. The pet
number is unique to a particular clinic.
When a sick pet is brought to a clinic, the vet on duty examines the pet. The details of each
examination are recorded and include an examination number, the date and time of the
examination, the name of the vet, the pet number, pet name, and type of pet, and a full
description of the examination results. The examination number is unique to a particular
clinic. As a result of the examination, the vet may propose treatment(s) for the pet.
Perfect Pets provides various treatments for all types of pets. These treatments are provided at
a standard rate across all clinics. The details of each treatment include a treatment number, a
full description of the treatment, and the cost to the pet owner. For example, treatments
include:
A standard rate of $20.00 is charged for each examination, which is recorded as a type of
treatment. The treatment number uniquely identifies each type of treatment and is used by all
Perfect Pets clinics.
Based on the results of the examination of a sick pet, the vet may propose one or more types
of treatment. For each type of treatment, the information recorded includes the examination
number and date, the pet number, name and type, treatment number, description, quantity of
each type of treatment, and date the treatment is to begin and end. Any additional comments
on the provision of each type of treatment are also recorded.
28.6 Using the Object Definition Language (ODL) of the ODMG object model, define the interface
of the object schema’s types for this case study. For each type, define at least one method that
you consider appropriate. State any assumptions necessary to support your design.
See the figure below for a possible schema. The class specification for Pet is shown below.
class Pet
(extent pets key petNo)
{
// attributes, relationships, and operations not shown
};
[Figure: possible object schema for Perfect Pets (partial), showing the classes PetOwner (ownerNo), Clinic (clinicNo), Staff (staffNo), Pet (petNo), and Examination (examNo), linked by the relationships Has, IsContactedBy, Manages, Owns, Registers, Performs, and Undergoes.]
28.7 Write the following query in the Object Query Language (OQL) of the ODMG object model:
List the names of all pets that received the T123 treatment.
SELECT petName:p.petName
FROM t IN Treatment,
pt IN t.UsedIn,
e IN pt.ResultsFrom,
p IN e.Underwent
WHERE t.treatNo = "T123"
Student Project
Assignment – Persistence in a Programming Language
Introduction
The objective of the ODBMS project is to design and implement persistence in a programming language.
The work is to be undertaken in groups of two or three; you are free to form your own groups.
Specification of Requirements
The aim of this project is to develop a method that will allow persistent objects to be automatically stored
and read as required. The solution should cope with standard object concepts, such as inheritance, and
also provide for interobject relationships, thereby providing basic database functionality.
It is acceptable to use an existing DBMS to store objects, rather than getting involved in writing disk I/O
software for paging, indexing and clustering. For example, Microsoft Access could be used to store
objects from Visual C++ using ADO (ActiveX Data Objects) or ODBC (Open Database Connectivity). It
is not acceptable to use MFC serialization classes (such as CArchive).
To demonstrate your particular implementation, a small application should be developed which handles
objects and relationships. For example, you may produce an application corresponding to a data model
for course administration, consisting of entities Course, Module, Student, Lecturer, Department, etc.
where Student and Lecturer are derived from a superclass Person, to exhibit inheritance.
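One possible starting point is sketched below in Python. It uses the standard sqlite3 module in place of Access with ADO/ODBC, so it is a substitution and not the required toolset. The sketch maps the hypothetical Person/Student/Lecturer hierarchy to a single relational table with a discriminator column, so that each object is reloaded as an instance of its concrete class. It covers inheritance only. Interobject relationships would need further work, for example foreign-key columns holding object identifiers.

```python
import sqlite3
from dataclasses import asdict, dataclass, fields


@dataclass
class Person:
    pid: int
    name: str


@dataclass
class Student(Person):
    course: str = ""


@dataclass
class Lecturer(Person):
    department: str = ""


CLASSES = {cls.__name__: cls for cls in (Person, Student, Lecturer)}


class Store:
    """Single-table mapping: one table for the whole Person hierarchy,
    with a 'kind' column recording each object's concrete class."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE person (pid INTEGER PRIMARY KEY, kind TEXT NOT NULL,"
            " name TEXT, course TEXT, department TEXT)")

    def save(self, obj: Person) -> None:
        row = asdict(obj)
        self.conn.execute(
            "INSERT OR REPLACE INTO person (pid, kind, name, course, department)"
            " VALUES (?, ?, ?, ?, ?)",
            (row["pid"], type(obj).__name__, row["name"],
             row.get("course"), row.get("department")))
        self.conn.commit()

    def load(self, pid: int) -> Person:
        kind, name, course, department = self.conn.execute(
            "SELECT kind, name, course, department FROM person WHERE pid = ?",
            (pid,)).fetchone()
        cls = CLASSES[kind]
        values = {"pid": pid, "name": name, "course": course, "department": department}
        # Pass only the columns that are fields of the concrete class.
        return cls(**{f.name: values[f.name] for f in fields(cls)})
```

A single table is the simplest of several possible mappings. One table per class, joined on the identifier, is an alternative that avoids the unused columns.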
There will be a regular project meeting scheduled for each group, at which time the group will present an
outline of the work carried out since the last meeting with a list of current problems/suggested solutions.
Marking Scheme
29.1 The World Wide Web is a distributed information system based on hypertext. Discuss how the two-
tier client–server architecture may not be entirely suitable for this environment and propose an
alternative architecture.
29.2 Discuss what you consider to be the four most important advantages and the four most important
disadvantages of the World Wide Web as a distributed information system. Justify your selection
in each case.
See Section 29.2.7, but look for students to justify the advantages and disadvantages they select, not
just repeat the material.
30.1 Despite the excitement surrounding XML, it is important to note that most operational
business data, even for new Web-based applications, continues to be stored in relational
DBMSs. This is unlikely to change in the foreseeable future because of their reliability,
scalability, tools, and performance. Consequently, if XML is to fulfil its potential, some
mechanism is required to publish relational data in the form of XML documents. The
SQL:2003 standard has defined extensions to SQL to enable the publication of XML,
commonly referred to as SQL/XML. Discuss in detail these extensions.
SQL/XML contains:
a new native XML data type, XML, which allows XML documents to be treated as
relational values in columns of tables, attributes in user-defined types, variables, and
parameters to functions;
an implicit set of mappings from relational data to XML. The mapping may take as its
source an individual table, all the tables in a particular schema, or all the tables in a given
catalog. The standard does not specify a syntax for the mapping; instead it is provided for
use by applications and as a reference for other standards. The mapping produces two
XML documents: one that contains the mapped table data and the other that contains an
XML Schema describing the first document.
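The Python sketch below only imitates the general shape of a table-to-XML mapping. A root element is named after the table, each row becomes a row element, and each non-null column becomes a child element. It is an approximation under stated assumptions. The standard also defines name escaping, options for NULL handling, and the companion XML Schema document, and none of these are reproduced here. The function name table_to_xml is hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str) -> str:
    """Publish a table as XML: root element named after the table,
    one <row> element per row, one child element per non-null column."""
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    columns = [d[0] for d in cur.description]
    root = ET.Element(table.upper())
    for values in cur:
        row = ET.SubElement(root, "row")
        for col, val in zip(columns, values):
            if val is not None:  # omit NULLs (one possible choice)
                ET.SubElement(row, col.upper()).text = str(val)
    return ET.tostring(root, encoding="unicode")
```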
30.2 Provide XQuery expressions for the following queries based on the sample XML file (books.xml)
below.
<author><last>Connolly</last><first>Thomas</first></author>
<author><last>Begg</last><first>Carolyn</first></author>
<publisher>Addison Wesley</publisher>
<price>99.00</price>
</book>
<book year="2003">
<title>Database Solutions</title>
<author><last>Connolly</last><first>Thomas</first></author>
<author><last>Begg</last><first>Carolyn</first></author>
<publisher>Addison Wesley</publisher>
<price>52.99</price>
</book>
<book year="2000">
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann</publisher>
<price>45.95</price>
</book>
<book year="1995">
<title>Modern Database Systems</title>
<editor><last>Kim</last><first>Won</first></editor>
<publisher>Addison Wesley</publisher>
<price>69.95</price>
</book>
</bib>
doc("books.xml")/bib/book[1]/title
(b) List the titles of all the books along with a count of the number of books.
This produces:
<title>Database Solutions</title>
</titles>
(c) List the title and price of each book published in the year 2003.
for $b in doc("books.xml")//book
where $b/@year = "2003"
return <book> { $b/title, $b/price } </book>
for $b in doc("books.xml")//book
let $c := $b/author
return <book> { $b/title, <count> { count($c) } </count> } </book>
This produces:
<book>
<title>Database Systems</title>
<count>2</count>
</book>
<book>
<title>Database Solutions</title>
<count>2</count>
</book>
<book>
<title>Data on the Web</title>
<count>3</count>
</book>
<book>
<title>Modern Database Systems</title>
<count>0</count>
</book>
(e) List the titles of books whose price is less than $60.
for $b in doc("books.xml")//book
where $b/price < 60
return $b/title
for $t in doc("books.xml")//title
order by $t
return $t
This produces:
(g) List the authors, sorted in reverse order of surname, then first name.
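let $all := doc("books.xml")//author
for $a in $all
where $a is ($all[last = $a/last and first = $a/first])[1]
order by $a/last descending, $a/first descending
return $a
distinct-values() returns atomic values rather than author elements. Duplicates are therefore removed by keeping only the first author element for each (last, first) pair.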
This produces:
<author>
<last>Suciu</last>
<first>Dan</first>
</author>
<author>
<last>Connolly</last>
<first>Thomas</first>
</author>
<author>
<last>Buneman</last>
<first>Peter</first>
</author>
<author>
<last>Begg</last>
<first>Carolyn</first>
</author>
<author>
<last>Abiteboul</last>
<first>Serge</first>
</author>
(h) List the authors as a single string, sorted in reverse order of surname, then first name.
let $all := doc("books.xml")//author
for $a in $all
where $a is ($all[last = $a/last and first = $a/first])[1]
order by $a/last descending, $a/first descending
return <author> { concat($a/first, " ", $a/last) } </author>
Note that the order by clause specifies conditions based on data that is not used in the return
clause. This produces:
<author>Dan Suciu</author>
<author>Thomas Connolly</author>
<author>Peter Buneman</author>
<author>Carolyn Begg</author>
<author>Serge Abiteboul</author>
(i) List the titles of books that have at least one author called Thomas Connolly.
for $b in doc("books.xml")//book
where some $a in $b/author
satisfies ($a/first = "Thomas" and $a/last = "Connolly")
return $b/title
This query uses the XQuery existential quantifier, which tests whether at least one item
satisfies a condition. This produces:
<title>Database Systems</title>
<title>Database Solutions</title>
(j) List the titles of books that have every author called Thomas Connolly.
for $b in doc("books.xml")//book
where every $a in $b/author
satisfies ($a/first = "Thomas" and $a/last = "Connolly")
return $b/title
This query uses the XQuery universal quantifier, which tests whether every node in a sequence
satisfies a condition. This produces:
<title>Modern Database Systems</title>
Note that in this case the title of the last book is returned. This book had no authors specified
(only an editor). In this case, the expression $b/author evaluates to an empty sequence. If a
universal quantifier is applied to an empty sequence, it always returns true, because every
item in that (empty) sequence satisfies the condition (even though there are no items).
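The same semantics can be checked with Python's built-in any and all functions, which behave like the existential and universal quantifiers. In particular, all of an empty iterable is True. The sketch below reproduces queries (i) and (j) over a hand-copied version of the sample data, with authors as (first, last) pairs. The helper names are hypothetical.

```python
from collections.abc import Callable, Iterable

Author = tuple[str, str]  # (first name, last name)

# Hand-copied from the sample books.xml; the last book has only an editor.
BOOKS: dict[str, list[Author]] = {
    "Database Systems": [("Thomas", "Connolly"), ("Carolyn", "Begg")],
    "Database Solutions": [("Thomas", "Connolly"), ("Carolyn", "Begg")],
    "Data on the Web": [("Serge", "Abiteboul"), ("Peter", "Buneman"), ("Dan", "Suciu")],
    "Modern Database Systems": [],
}


def some(items: Iterable[Author], pred: Callable[[Author], bool]) -> bool:
    # Like XQuery 'some ... satisfies': true if at least one item passes.
    return any(pred(i) for i in items)


def every(items: Iterable[Author], pred: Callable[[Author], bool]) -> bool:
    # Like XQuery 'every ... satisfies': true unless some item fails,
    # so it is vacuously true for an empty sequence.
    return all(pred(i) for i in items)


def is_connolly(author: Author) -> bool:
    first, last = author
    return first == "Thomas" and last == "Connolly"


def titles_where(quantifier: Callable[..., bool]) -> list[str]:
    return [title for title, authors in BOOKS.items() if quantifier(authors, is_connolly)]


if __name__ == "__main__":
    print(titles_where(some))   # ['Database Systems', 'Database Solutions']
    print(titles_where(every))  # ['Modern Database Systems']
```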
<authors>
{
let $a := doc("books.xml")//author
for $l in distinct-values($a/last), $f in distinct-values($a[last=$l]/first)
order by $l, $f
return
<author>
<name> { concat($l, ", ", $f) } </name>
{
for $b in doc("books.xml")/bib/book
where some $ba in $b/author
satisfies ($ba/last = $l and $ba/first = $f)
order by $b/title
return $b/title
}
</author>
}
</authors>
This produces:
<authors>
<author>
<name>Abiteboul, Serge</name>
<title>Data on the Web</title>
</author>
<author>
<name>Begg, Carolyn</name>
<title>Database Solutions</title>
<title>Database Systems</title>
</author>
<author>
<name>Buneman, Peter</name>
<title>Data on the Web</title>
</author>
<author>
<name>Connolly, Thomas</name>
<title>Database Solutions</title>
<title>Database Systems</title>
</author>
<author>
<name>Suciu, Dan</name>
<title>Data on the Web</title>
</author>
</authors>
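As a cross-check of this output, the hypothetical Python sketch below performs the same grouping with the standard xml.etree.ElementTree module. It runs on a trimmed copy of the sample document, with the year attributes and prices omitted because the query does not use them. The function name titles_by_author is illustrative.

```python
import xml.etree.ElementTree as ET

BIB = """<bib>
  <book><title>Database Systems</title>
    <author><last>Connolly</last><first>Thomas</first></author>
    <author><last>Begg</last><first>Carolyn</first></author></book>
  <book><title>Database Solutions</title>
    <author><last>Connolly</last><first>Thomas</first></author>
    <author><last>Begg</last><first>Carolyn</first></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
  <book><title>Modern Database Systems</title>
    <editor><last>Kim</last><first>Won</first></editor></book>
</bib>"""


def titles_by_author(xml_text: str) -> dict[str, list[str]]:
    """Group book titles under each distinct author, ordered by surname then
    first name, with each author's titles sorted alphabetically."""
    root = ET.fromstring(xml_text)
    groups: dict[tuple[str, str], list[str]] = {}
    for book in root.iter("book"):
        title = book.findtext("title")
        for author in book.findall("author"):  # editors are not authors
            key = (author.findtext("last"), author.findtext("first"))
            groups.setdefault(key, []).append(title)
    return {f"{last}, {first}": sorted(titles)
            for (last, first), titles in sorted(groups.items())}
```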
(l) Test whether the most expensive book is also the book with the largest number of
authors/editors.
Note that this expression performs a node comparison using the is operator; the last() function returns
the size of the sequence being filtered, so a predicate such as [last()] selects the last node in that sequence.
(m) List the books where Begg is an author but is not listed as the first author.
for $b in doc("books.xml")//book
let $a := ($b/author)[1], $aBegg := ($b/author)[last = "Begg"]
Note, the operator << returns true if the first operand precedes the second operand
in document order.
(n) List the titles of books that are more expensive than the average book price.
let $b := doc("books.xml")//book
let $avgPrice := avg($b//price)
return $b[price > $avgPrice]/title
For our XML document, the average price is 66.9725, so this produces the titles of the first and last books,
Database Systems and Modern Database Systems.
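The arithmetic behind this query can be checked with a few lines of Python. The prices below are copied from the sample books.xml, and the variable names are illustrative. The four prices sum to 267.89, which gives an average of 66.9725, so only books priced above that value qualify.

```python
# Titles and prices copied from the sample books.xml.
PRICES = {
    "Database Systems": 99.00,
    "Database Solutions": 52.99,
    "Data on the Web": 45.95,
    "Modern Database Systems": 69.95,
}

avg_price = sum(PRICES.values()) / len(PRICES)  # 267.89 / 4
above_average = [title for title, price in PRICES.items() if price > avg_price]

if __name__ == "__main__":
    print(avg_price, above_average)
```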
(a) xs:string
(b) xs:integer
(c) xs:decimal
(d) xs:double
(e) xs:integer
(f) xs:decimal
(g) xs:double
(h) xs:decimal
(i) xs:double
(j) xs:double
(k) this is a static type error
(l) this is a static type error
(m) xs:boolean
(n) xs:boolean
(o) xs:boolean
(p) this is a static type error
(d) xs:integer (1 + 2 has type xs:integer, so variable $a has type xs:integer and the return expression is also
xs:integer)
(e) xs:double
(f) xs:integer
(g) xs:decimal (this is well typed because xs:integer is a subtype of xs:decimal)
(h) this is a static type error (xs:integer is not a subtype of xs:double)
(i) xs:double
(j) xs:double
(k) this is a static type error
(l) xs:integer
(m) (xs:string | xs:decimal)
(n) (xs:double | xs:integer)
(o) (xs:double | xs:decimal)
Database Systems: Instructor’s Guide - Part V
31.1 ‘Data warehouse and the underlying support for informational processing will emerge as a growing
trend in the 1990s. With the advent of the data warehouse, some basic ideas about data management
will change’ (Inmon, W.H., 1993). Briefly discuss why the emergence of the data warehouse
phenomenon is causing such interest in the business world.
See the introductory paragraphs of Chapter 31 for a general introduction to data warehousing, Section
31.1.1 for the evolution of data warehousing, and Section 31.1.3 for the potential benefits to an organization.
31.2 ‘A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in
support of management’s decision-making process’ (W.H. Inmon, 1993). Describe the major
characteristics of the data held in a data warehouse.
31.3 Discuss the relationship between online transaction processing (OLTP) and data warehousing and identify
the major differences between these systems.
31.5 Discuss the current issues associated with the development and management of a data warehouse.
31.6 Discuss the major problems associated with the design, development, and management of a data
warehouse.
See Section 31.1.5; for a discussion of the management of data warehouse metadata, see Section
31.4.3.
31.7 Describe how data marts differ from a data warehouse and identify the major issues associated with the
development and management of data marts.
31.8 Explain why businesses have shown a growing interest in data warehousing in recent years.
Since the 1970s, organizations have mostly focused their investment in new computer systems that
automate business processes. In this way, the businesses gained competitive advantage through systems
that offered more efficient and cost-effective services to the customer. Throughout this period,
businesses accumulated growing amounts of data stored in their operational databases. However, in
recent times, where such systems are commonplace, businesses are focusing on ways to use
operational data to support decision-making, as a means of gaining competitive advantage.
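The successful implementation of a data warehouse can bring major benefits to an organization,
including potential high returns on investment, competitive advantage, and increased productivity of
corporate decision-makers (see Section 31.1.3).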
31.9 Discuss the reasons why an organisation may have one or more Online Transaction Processing
(OLTP) systems but only a single data warehouse.
An organisation may have several OLTP systems because of the range and complexity of the business
processes that support the business. Multiple OLTP systems may also be the result of mergers and
acquisitions. It is also possible for an organisation with a single focus to be well supported by a single
OLTP system. However, the goal for an organisation is to have a single repository of data to support
analysis, i.e. a data warehouse that offers a single version of the truth.
Subject-oriented Data
The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, and
sales) rather than the major application areas (e.g. customer invoicing, stock control, and product sales).
This is reflected in the need to store decision-support data rather than application-oriented data.
Integrated Data
The data warehouse integrates corporate application-oriented data from different source systems, which
often includes data that is inconsistent. The integrated data source must be made consistent to present a
unified view of the data to the users.
Time-variant Data
Data in the warehouse is only accurate and valid at some point in time or over some time interval.
Time-variance is also shown in the extended time that the data is held, the implicit or explicit
association of time with all data, and the fact that the data represents a series of snapshots.
Non-volatile Data
Data in the warehouse is not normally updated in real time (RT) but is refreshed from operational
systems on a regular basis. (However, an emerging trend is towards RT or near-real-time (NRT) data
warehouses.) New data is always added as a supplement to the database, rather than a replacement.
A DBMS built for Online Transaction Processing (OLTP) is generally regarded as unsuitable for data
warehousing because each system is designed with a differing set of requirements in mind. For
example, OLTP systems are designed to maximize the transaction processing capacity, while data
warehouses are designed to support ad hoc query processing. An organization will normally have a
number of different OLTP systems for business processes such as inventory control, customer
invoicing, and point-of-sale. These systems generate operational data that is detailed, current, and
subject to change. The OLTP systems are optimized for a high number of transactions that are
predictable, repetitive, and update intensive. The OLTP data is organized according to the requirements
of the transactions associated with the business applications and supports the day-to-day decisions of a
large number of concurrent operational users. In contrast, an organization will normally have a single
data warehouse, which holds data that is historic, detailed, and summarized to various levels and rarely
subject to change (other than being supplemented with new data). The data warehouse is designed to
support a relatively low number of transactions that are unpredictable in nature and require answers to
queries that are ad hoc, unstructured, and heuristic. The warehouse data is organized according to the
requirements of potential queries and supports the long-term strategic decisions of a relatively low
number of managerial users.
31.11 Explain why businesses have shown a growing interest in technologies that support their decision
makers.
Since the 1970s, organizations have mostly focused their investment in new computer systems that
automate business processes. In this way, the businesses gained competitive advantage through systems
that offered more efficient and cost-effective services to the customer. Throughout this period,
businesses accumulated growing amounts of data stored in their operational databases. However, in
recent times, where such systems are commonplace, businesses are focusing on ways to use
operational data to support decision-making, as a means of gaining competitive advantage.
The successful implementation of a decision support technology can bring major benefits to an
organisation, including:
- A growing number of vendors provide a range of business intelligence (BI) tools to support
decision-makers with differing technical skills.
- Decision-makers can access tools that enable interactive querying, reporting, and model building
using a range of internal and external data sources.
- Potential high returns on investment.
- A move towards more fact-based decision-making means that decisions can be faster and more accurate.
- Increased productivity of corporate decision-makers.
- Opportunities for launching new products into new markets can be explored with less risk.
- Competitive advantage.
31.12 The main characteristics for describing Online Transaction Processing (OLTP) systems and data
warehousing systems are listed below.
- Main purpose
- Data age
- Data latency
- Data granularity
- Data processing
- Reporting
- Users
Using each characteristic, compare and contrast OLTP systems with data warehousing systems and
describe the relationship that can exist between these systems. Include in your answer any emerging
trends that are influencing the characteristics of data warehousing systems.
The student’s answer should cover the information contained in the second and third columns of the
table below.
OLTP systems are the major source of data for the data warehouse. However, OLTP systems were never
designed to support such business activities, and so tapping into these systems for decision-making is
unlikely ever to be an easy solution. The legacy is that a typical business may have numerous
operational systems with overlapping and sometimes contradictory definitions, such as data types. The
challenge for organizations is to turn their archives of data into a source of knowledge, so that a single
integrated/consolidated view of the organization’s data is presented to the user. The concept of a data
warehouse was deemed the solution to meet the requirements of a system capable of supporting
decision-making, receiving data from multiple operational data sources.
A DBMS built for Online Transaction Processing (OLTP) is generally regarded as unsuitable for data
warehousing because each system is designed with a differing set of requirements in mind. For
example, OLTP systems are designed to maximize the transaction processing capacity, while data
warehouses are designed to support ad hoc query processing. An organization will normally have a
number of different OLTP systems for business processes such as inventory control, customer
invoicing, and point-of-sale. These systems generate operational data that is detailed, current, and
subject to change. The OLTP systems are optimized for a high number of transactions that are
predictable, repetitive, and update intensive. The OLTP data is organized according to the requirements
of the transactions associated with the business applications and supports the day-to-day decisions of a
large number of concurrent operational users. In contrast, an organization will normally have a single
data warehouse, which holds data that is historic, detailed, and summarized to various levels and rarely
subject to change (other than being supplemented with new data). The data warehouse is designed to
support a relatively low number of transactions that are unpredictable in nature and require answers to
queries that are ad hoc, unstructured, and heuristic. The warehouse data is organized according to the
requirements of potential queries and supports the long-term strategic decisions of a relatively low
number of managerial users.
Although OLTP systems and data warehouses have different characteristics and are built with different
purposes in mind, these systems are closely related, in that the OLTP systems provide the source data
for the warehouse. A major problem of this relationship is that the data held by the OLTP systems can
be inconsistent, fragmented, and subject to change, containing duplicate or missing entries. As such, the
operational data must be ‘cleaned up’ before it can be used in the data warehouse. OLTP systems are
not built to quickly answer ad hoc queries. They also tend not to store historical data, which is
necessary to analyze trends. Essentially, OLTP systems offer large amounts of raw data, which is not
easily analyzed. The data warehouse allows more complex queries to be answered, as opposed to simple
aggregations.
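The ‘clean-up’ of operational data can be sketched in Python. This is an illustrative sketch only: the two source extracts, their field names, the code mapping, and the `clean` function are all hypothetical. It standardizes inconsistent codes from two OLTP systems, drops rows with a missing key, and removes duplicates before loading.

```python
# Hypothetical extracts from two OLTP systems with inconsistent conventions.
system_a = [
    {"cust_id": "C001", "gender": "M", "city": "glasgow "},
    {"cust_id": "C002", "gender": "F", "city": "London"},
]
system_b = [
    {"cust_id": "C001", "gender": "Male", "city": "Glasgow"},   # duplicate of a system_a row
    {"cust_id": None, "gender": "Female", "city": "Aberdeen"},  # missing key
    {"cust_id": "C003", "gender": "Female", "city": "ABERDEEN"},
]

# Map each source system's gender codes onto one warehouse convention.
GENDER_CODES = {"M": "M", "Male": "M", "F": "F", "Female": "F"}

def clean(rows):
    """Standardize values, drop rows without a key, and de-duplicate on cust_id."""
    seen = {}
    for row in rows:
        if not row["cust_id"]:
            continue  # in practice such rows are often logged for manual review
        cleaned = {
            "cust_id": row["cust_id"],
            "gender": GENDER_CODES[row["gender"]],
            "city": row["city"].strip().title(),  # 'glasgow ' and 'ABERDEEN' become consistent
        }
        seen.setdefault(cleaned["cust_id"], cleaned)  # keep the first occurrence only
    return list(seen.values())

loaded = clean(system_a + system_b)
for row in loaded:
    print(row)
```

Keeping the first occurrence is only one possible de-duplication rule; a real ETL process might instead prefer the most recently updated source.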
32.1 Describe the main principles and key features of Kimball’s Business Dimensional Lifecycle. Support
your answer with a diagram showing the stages of the lifecycle.
32.2 Describe the proposed approach to designing a star schema in the Dimensional Modelling stage of
Kimball’s Business Dimensional Lifecycle. Include the creation of a simple (maximum of four dimension
tables and one fact table with maximum of six attributes per table) star schema to illustrate each step.
32.3 Identify three differences between Kimball’s approach to data warehouse development and Inmon’s
approach.
There are a number of differences between the two approaches, and the student is free to identify any
three, such as: Kimball recommends first breaking down the project into smaller parts (called data marts),
whereas Inmon does not; Kimball uses new techniques such as dimensional modelling, whereas Inmon uses
traditional database techniques; and Kimball recommends a denormalised database structure, whereas
Inmon’s approach conforms to 3NF.
32.4. The star schema shown in Figure 32.1 describes part of the database that will provide decision-support
for a property sales company. Describe the main characteristics of fact and dimension tables and discuss
the purpose of the tables shown in the star schema of Figure 32.1.
Figure 32.1
The fact table has relatively few attributes but many records and constitutes the largest part of the
decision-support database. The primary key for the fact table is composed of the foreign keys, which
relate to the dimension tables.
Each dimension table has relatively more attributes but few records compared with the fact table. The
more attributes a dimension contains, the more types of analysis it supports. The primary key for each
dimension table is a single simple surrogate key, which is copied to the fact table.
The schema is referred to as a ‘star schema’ because of the star shape that results from surrounding the
fact table in the centre with the dimension tables. It is a star schema (rather than a snowflake schema)
because some of the dimension tables are denormalised and contain repeating data, such as the Location
and Date dimensions. The purpose of the star schema is to reduce the number of joins between tables and
hence speed up queries.
The purpose of the fact table is to contain the attributes that describe the important metrics associated
with property sales, such as offerPrice and sellingPrice, and foreign key attributes that allow queries
about these metrics to be set in context, such as a query concerning the average property selling price
to buyers living in different cities.
The purpose of the dimension tables is to contain attributes to allow property sales to be queried from
different perspectives.
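A cut-down version of such a star schema can be sketched with Python’s built-in sqlite3 module. Figure 32.1 is not reproduced here, so apart from offerPrice and sellingPrice the table names, column names, and rows below are hypothetical. The fact table’s primary key is composed of the dimension foreign keys, and the query answers the example above: the average selling price by the city in which the buyer lives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables (hypothetical names): each has a single surrogate key.
CREATE TABLE Buyer (buyerID INTEGER PRIMARY KEY, buyerName TEXT, buyerCity TEXT);
CREATE TABLE PropertyForSale (propertyID INTEGER PRIMARY KEY, propertyNo TEXT, propertyType TEXT);
CREATE TABLE DateDim (dateID INTEGER PRIMARY KEY, saleDate TEXT);

-- Fact table: metrics plus foreign keys; together the foreign keys form the primary key.
CREATE TABLE PropertySale (
    dateID INTEGER REFERENCES DateDim(dateID),
    propertyID INTEGER REFERENCES PropertyForSale(propertyID),
    buyerID INTEGER REFERENCES Buyer(buyerID),
    offerPrice REAL,
    sellingPrice REAL,
    PRIMARY KEY (dateID, propertyID, buyerID)
);
""")
conn.executemany("INSERT INTO Buyer VALUES (?, ?, ?)",
                 [(1, "Ann", "Glasgow"), (2, "Bob", "London"), (3, "Cal", "Glasgow")])
conn.executemany("INSERT INTO PropertyForSale VALUES (?, ?, ?)",
                 [(1, "PG1", "Flat"), (2, "PG2", "House"), (3, "PL1", "Flat")])
conn.executemany("INSERT INTO DateDim VALUES (?, ?)", [(1, "2024-01-15"), (2, "2024-02-20")])
conn.executemany("INSERT INTO PropertySale VALUES (?, ?, ?, ?, ?)",
                 [(1, 1, 1, 150000, 155000), (1, 3, 2, 400000, 390000), (2, 2, 3, 210000, 215000)])

# Only one join is needed: the fact table to the Buyer dimension.
rows = conn.execute("""
    SELECT b.buyerCity, AVG(f.sellingPrice) AS avgSellingPrice
    FROM PropertySale f JOIN Buyer b ON f.buyerID = b.buyerID
    GROUP BY b.buyerCity
    ORDER BY b.buyerCity
""").fetchall()
print(rows)  # [('Glasgow', 185000.0), ('London', 390000.0)]
```

Because each dimension is a single denormalised table, analysing sales from another perspective (for example by property type or sale date) adds one join per dimension rather than a chain of joins.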
32.5 Identify three types of analysis that the star schema shown in Figure 32.1 can support about property
sales.
For example, the star schema can support analysis of property sales according to –
32.6 What do slowly changing dimensions (SCDs) represent to a database designed for decision-support?
A database designed for decision-support will normally store data that can be several years old. During
that time the business is likely to change and this may impact on the corporate data. The star schema
has to continually evolve to support the business and this means that it is important to identify and
accommodate these changes where necessary.
Many dimension attribute values are not fixed but change over time e.g. an employee may change
department. Dimensions that have changeable attribute values are called slowly changing dimensions
(SCDs).
This requires identifying the dimensions and attributes to be tracked and deciding how they are to be
tracked, as not all changes are significant.
There is a third tracking technique called Type 3 SCD, which creates separate attributes for both the old
and new attribute values. However, Type 3 is less common because it involves changing physical
tables and is not very scalable.
32.9 Identify two possible examples of SCDs in the property sales star schema shown in Figure 32.1 and
discuss the types of change each represents.
An example of an SCD is the position attribute in Staff. The position held by a member of staff is likely
to change over time. If it is important to know the position held by a member of staff associated with a
property sale, then this is a Type 2 SCD. This means that a new Staff dimension record has to be created
to represent the member of staff with the new position, and this record takes effect from this point
onwards until the next change.
A second example of a possible SCD is the propertyNo attribute in PropertyForSale. It is possible that
the business may change the coding used to uniquely identify each property from e.g. PG1 to
PGW00001. If it is not important to know the old unique identifier for a property when analysing sales
then this is a Type 1 SCD. This means that the new value for each property overwrites the old value.
Evidence of the change associated with propertyNo is lost and the old values will not be available when
querying property sales.
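The two techniques described above can be sketched in Python over in-memory versions of the Staff and PropertyForSale dimensions. This is a hypothetical sketch: the surrogate keys, the effectiveFrom/effectiveTo/isCurrent columns, the staff number, and the function names are assumptions and are not taken from Figure 32.1.

```python
from datetime import date

# Hypothetical Type 2 Staff dimension: surrogate key, validity dates, and a current-row flag.
staff_dim = [
    {"staffKey": 1, "staffNo": "SG37", "position": "Assistant",
     "effectiveFrom": date(2020, 1, 1), "effectiveTo": None, "isCurrent": True},
]

def scd_type2_update(dim, staff_no, attr, new_value, change_date):
    """Type 2: close the current row and append a new row with a new surrogate key."""
    current = next(r for r in dim if r["staffNo"] == staff_no and r["isCurrent"])
    new_row = dict(current)  # copy before closing the old row
    current["effectiveTo"] = change_date
    current["isCurrent"] = False
    new_row.update({
        attr: new_value,
        "staffKey": max(r["staffKey"] for r in dim) + 1,
        "effectiveFrom": change_date,
        "effectiveTo": None,
        "isCurrent": True,
    })
    dim.append(new_row)
    return new_row["staffKey"]  # facts recorded after the change reference this key

# Hypothetical Type 1 PropertyForSale dimension: no history columns.
property_dim = [{"propertyKey": 1, "propertyNo": "PG1"}]

def scd_type1_update(dim, property_key, attr, new_value):
    """Type 1: overwrite the attribute in place; the old value is lost."""
    for row in dim:
        if row["propertyKey"] == property_key:
            row[attr] = new_value

new_key = scd_type2_update(staff_dim, "SG37", "position", "Supervisor", date(2023, 6, 1))
scd_type1_update(property_dim, 1, "propertyNo", "PGW00001")

print([(r["staffKey"], r["position"], r["isCurrent"]) for r in staff_dim])
# [(1, 'Assistant', False), (2, 'Supervisor', True)]
print(property_dim)
# [{'propertyKey': 1, 'propertyNo': 'PGW00001'}]
```

Sales recorded before the promotion keep pointing at staffKey 1, so they are still reported against the old position, whereas the old property number PG1 is no longer available anywhere.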
32.10 Discuss what the bus matrix (shown in Figure 32.2) for an online retailer represents and how it can be
used to facilitate the creation of a data warehouse.
Business processes (rows): Customer Registration, Customer Purchase, Product Delivery, Product
Promotion, Product Return, and Customer Product Review (marked against 4, 7, 7, 4, 7, and 6 dimension
columns respectively)
Figure 32.2
Following the establishment of analytical themes, the business processes that are associated with them
are identified. The student should describe the layout and features of the matrix using the example of
the bus matrix shown in Figure 32.2. For example, the student should indicate that the company’s
measurement-driven business processes are listed as the rows of the matrix, while the columns
represent the objects (dimensions) that participate in the business processes.
The bus matrix represents the enterprise dimensional data architecture. Each analytical theme,
supported by one or more business processes shown in the bus matrix, is subjected to a prioritization
process. The most important/most valuable business processes are scheduled to be built earliest.
However, the conformed dimensions that are shared across more than one business process form the
basis for the eventual formation of an enterprise-wide data warehouse.
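A bus matrix can also be held as a simple mapping from business processes to the dimensions marked against them, from which the conformed (shared) dimensions can be read off. The sketch below is hypothetical: Figure 32.2’s dimension columns are not available here, so the dimension names are illustrative assumptions, and only three of the six business processes are included.

```python
from collections import Counter

# Hypothetical bus matrix: business process (row) -> dimensions marked with an X (columns).
bus_matrix = {
    "Customer Registration": {"Date", "Customer", "Website"},
    "Customer Purchase": {"Date", "Customer", "Product", "Website", "Payment"},
    "Product Delivery": {"Date", "Customer", "Product", "Courier"},
}

# A dimension used by more than one business process must be conformed (shared).
usage = Counter(dim for dims in bus_matrix.values() for dim in dims)
conformed = sorted(dim for dim, count in usage.items() if count > 1)

for dim in conformed:
    processes = [p for p, dims in bus_matrix.items() if dim in dims]
    print(f"{dim}: {', '.join(processes)}")
```

Whichever business process is built first, its conformed dimensions (here Customer, Date, Product, and Website) should be designed once and reused by later star schemas, which is what allows the data marts to combine into an enterprise-wide warehouse.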
32.11 Produce a star schema for the Product Delivery business process using the information shown in
Figure 32.2. Based on your understanding of this business process, add a maximum of 5 (possible)
attributes to each dimension table in your schema. Complete your star schema by adding a maximum
of 8 (possible) attributes to your fact table. Your choice of attributes should demonstrate that you have
a realistic idea of how this star schema is likely to be queried.
The student should present a star schema for any single business process shown in Figure 32.2. The
attributes added should be reasonable and demonstrate that the student has a good idea of how the
schema is likely to be queried.
___________________________________________________________________________________
The data mart shown in Figure 32.3 supports the analysis of a media (TV programmes and films)
streaming service.
dimMember (memberID, memberName, memberGender, memberDOB, memberHomeCityTown,
memberHomePostCode, memberJoinDate)
dimFilm (filmID, filmNo, filmTitle, filmCertification, filmGenre, filmYearRelease, filmDirector,
filmProductionCompany, mainActor1, mainActor2, mainActor3)
factStreaming (streamID, TVProgrammeID, filmID, memberID, startDate)
dimTVProgramme (TVProgrammeID, TVProgrammeNo, TVProgEpisodeNo, TVProgEpisodeTitle,
TVProgEpisodeDuration)
Figure 32.3
32.12 Describe the characteristics and purpose of fact and dimension tables and explain how you recognise
that this data mart is based on a star schema design. Illustrate your answer using the data mart tables
in Figure 32.3.
The fact table has relatively few attributes but many records and constitutes the largest part of the
decision-support database. The fact table is made up of foreign keys and (usually) one or more metrics.
Each dimension table has relatively more attributes but few records compared with the fact table. The
more attributes a dimension contains, the wider the range of analysis it supports. The primary key for
each dimension table is a single simple surrogate key, which is copied to the fact table as a foreign key.
The purpose of the fact table is to contain any important metrics (streamDuration) and any additional
descriptors (customerRating) about the TV programmes/films streamed. The attributes of the dimension
tables allow a range of queries about the events being analysed. For example, the dimension tables
allow analysis of the timing of streaming.
The purpose of the dimension tables is to contain attributes to allow streaming to be queried from
different perspectives.
The schema that describes this data mart is referred to as a ‘star schema’ because of the star shape that
results from surrounding the fact table in the centre with the dimension tables.
The schema is a star schema because the dimension tables are denormalised and contain repeating data,
such as that found in the dimFilm and dimTVProgramme dimensions.
The purpose of the star schema is to reduce the number of joins between dimension tables and hence
speed up queries.
32.13 Identify three types of analysis that the data mart shown in Figure 32.3 can support about media
streaming.
For example, the data mart can support analysis of media streaming according to the
following.
a. The most popular actors in streamed films.
b. The most popular genre for streamed TV programmes.
c. The most popular dates and times for streaming.
32.14 The data mart shown in Figure 32.3 cannot support the analysis of media streaming according to the
age of the member at the time of the streaming. Describe the changes necessary to the data mart to
support this type of analysis.
32.15 What do slowly changing dimensions (SCDs) represent to a database designed for decision-support?
A database designed for decision-support will normally store data that can be several years old. During
that time the business is likely to change and this may impact on the corporate data. The star schema
has to continually evolve to support the business and this means that it is important to identify and
accommodate these changes where necessary.
Many dimension attribute values are not fixed but change over time e.g. an employee may change
department. Dimensions that have changeable attribute values are called slowly changing dimensions
(SCDs).
This requires identifying the dimensions and attributes to be tracked and deciding how they are to be
tracked, as not all changes are significant.
32.16 Describe three types of slowly changing dimension (SCDs) and discuss the best SCD Type to track
changes over time to a member’s home address.
The most common technique for dealing with SCD changes that are not significant is called Type 1, and
those for changes that are significant are called Types 2 and 3.
A Type 1 SCD overwrites the existing attribute value with the new value. This gives no tracking record
of historical values.
A Type 2 SCD captures the attribute values that were in effect at a point in time and relates them to the
business events in which they participated.
When a change occurs to a Type 2 SCD, a new record is created in the dimension table to capture the
new values that are to take effect from that point onwards until the next change.
Type 3 SCD creates separate attributes for both the old and new attribute values. However, Type 3 is
less common because it involves changing physical tables and is not very scalable.
Only Type 2 or Type 3 should be discussed for the member’s home address SCD, as Type 1 does not
track changes.
Chapter 33 OLAP
The OLAP cube shown in Figure 33.1 has been prepared for a car rental company that rents out cars to
customers throughout the UK. The company wishes to explore which manufacturer, model, engine size and
trim (interior finish) generates the most rental income in each location of the UK. For example, the
manufacturer Ford has various models, including Mondeo, Fiesta, and Ka; each model comes in various
engine sizes, such as 1.8 or 1.6, and each is available in one of three trim levels: high, medium, and low.
OLAP cube for the car rental company (highest levels of aggregation shown include Country and Year)
Figure 33.1
33.1 Present typical dimensional hierarchies (with 4 levels of aggregation) for each dimension shown in
Figure 33.1. The highest level of aggregation for each dimension is shown in Figure 33.1.
Location: Country > City > Area > Postcode
Car: Manufacturer > Model > Engine size > Trim level
Time: Year > Quarter > Month > Day (or, alternatively, Year > Season > Week > Day)
33.2 Describe the four common OLAP operations for querying data. Provide an example of each operation
using the OLAP cube in Figure 33.1 and your answer to question 33.1.
Roll-up performs aggregations on the data by moving up the dimensional hierarchy or by dimensional
reduction e.g. 3-D to 2-D.
For example, {location, manufacturer and time} analysis of rental income to {location, manufacturer}
analysis of rental income.
For example, analysis of weekly rental income to analysis of monthly rental income.
Drill-down is the reverse of roll-up and involves revealing the detailed data that forms the aggregated
data. Drill-down can be performed by moving down the dimensional hierarchy or by dimensional
introduction e.g. 2-D to 3-D.
For example, {location, manufacturer} analysis of rental income to {location, manufacturer and time}
analysis of rental income.
For example, analysis of monthly rental income to analysis of weekly rental income.
Slice and dice - ability to look at data from different viewpoints. The slice operation performs a
selection on one dimension of the data whereas dice uses two or more dimensions.
For example, a slice is analysis of rental income according to {time = ‘2012’} OR {manufacturer =
‘Ford’} OR {location = ‘London’}
For example, a dice is analysis of rental income according to {time = ‘2012’} and {manufacturer =
‘Ford’} OR {time = ‘2012’} and {manufacturer = ‘Ford’} and {location = ‘London’}
Pivot - ability to rotate the data to provide an alternative view of the same data.
For example, analysis of rental income using location (city) as the x-axis against time (year) as the
y-axis can be rotated so that time (year) is the x-axis and location (city) is the y-axis.
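The four operations can be sketched in plain Python over a tiny, made-up rental income cube keyed by (city, manufacturer, month). The data, the MONTH_TO_QUARTER mapping, and the helper names are hypothetical; an OLAP server would normally work from precomputed aggregates rather than scanning a dictionary.

```python
from collections import defaultdict

# Hypothetical base cells: (city, manufacturer, month) -> rental income.
cube = {
    ("London", "Ford", "2012-01"): 500,
    ("London", "Ford", "2012-02"): 700,
    ("London", "Audi", "2012-04"): 900,
    ("Glasgow", "Ford", "2012-01"): 300,
}
CITY, MAKE, MONTH = 0, 1, 2  # positions of the dimensions in each cell key
MONTH_TO_QUARTER = {"2012-01": "2012-Q1", "2012-02": "2012-Q1", "2012-04": "2012-Q2"}

def roll_up(cells, keep):
    """Roll-up by dimensional reduction: sum over every dimension not listed in keep."""
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in keep)] += value
    return dict(out)

def roll_up_hierarchy(cells, dim, parent_of):
    """Roll-up along a hierarchy: replace one dimension's member by its parent (e.g. month -> quarter)."""
    out = defaultdict(int)
    for key, value in cells.items():
        out[key[:dim] + (parent_of[key[dim]],) + key[dim + 1:]] += value
    return dict(out)

def slice_cube(cells, dim, member):
    """Slice: select a single member of one dimension."""
    return {k: v for k, v in cells.items() if k[dim] == member}

def dice_cube(cells, criteria):
    """Dice: select on two or more dimensions; criteria maps dimension -> allowed members."""
    return {k: v for k, v in cells.items()
            if all(k[d] in allowed for d, allowed in criteria.items())}

by_city_make = roll_up(cube, keep=(CITY, MAKE))                   # 3-D -> 2-D
by_quarter = roll_up_hierarchy(cube, MONTH, MONTH_TO_QUARTER)     # drill-down returns to `cube`
ford_only = slice_cube(cube, MAKE, "Ford")
london_ford = dice_cube(cube, {CITY: {"London"}, MAKE: {"Ford"}})
# Pivot: swap which dimension labels the rows and which labels the columns of a 2-D view.
make_by_city = {(make, city): v for (city, make), v in by_city_make.items()}

print(by_city_make)  # {('London', 'Ford'): 1200, ('London', 'Audi'): 900, ('Glasgow', 'Ford'): 300}
```

Drill-down is not a separate function here because it simply means returning to the finer-grained cells (the monthly figures in `cube`) from which an aggregate was computed.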
33.3 Describe how SQL has been extended to include OLAP-type analysis of relational data.
The extensions to SQL are collectively referred to as the ‘OLAP package’ and include:
Aggregation is a fundamental part of OLAP. To improve aggregation capabilities the SQL standard
provides extensions to the GROUP BY clause such as the ROLLUP and CUBE functions. ROLLUP
supports calculations using aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing
levels of aggregation, from the most detailed up to a grand total. CUBE is similar to ROLLUP, enabling
a single statement to calculate all possible combinations of aggregations. CUBE can generate the
information needed in cross-tabulation reports with a single query.
The OLAP package also supports a variety of operations, such as rankings and window calculations.
Ranking functions include cumulative distributions, percent rank, and N-tiles. Windowing allows the
calculation of cumulative and moving aggregations using functions such as SUM, AVG, MIN, and COUNT.
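What a window calculation returns can be illustrated without a database. The Python sketch below mimics, over hypothetical monthly rental income figures, the result of a cumulative SUM and a three-month moving AVG. In SQL these would be written with an OVER clause, for example SUM(income) OVER (ORDER BY month ROWS UNBOUNDED PRECEDING) and AVG(income) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW); the sketch does not parse or run SQL.

```python
from itertools import accumulate

# Hypothetical monthly rental income, already ordered by month.
months = ["2012-01", "2012-02", "2012-03", "2012-04"]
income = [100, 200, 300, 600]

# Mirrors SUM(income) OVER (ORDER BY month ROWS UNBOUNDED PRECEDING).
cumulative = list(accumulate(income))

def moving_avg(values, width):
    """Mirrors AVG(...) OVER (ORDER BY ... ROWS BETWEEN width-1 PRECEDING AND CURRENT ROW)."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]  # the first rows have shorter frames
        out.append(sum(window) / len(window))
    return out

moving = moving_avg(income, 3)
for row in zip(months, income, cumulative, moving):
    print(row)
```

Unlike GROUP BY, a window calculation keeps one output row per input row and adds the aggregate alongside it, which is why the cumulative and moving values can be printed next to each month.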
_________________________________________________________________________________
The OLAP cube shown in the figure below has been prepared for a media (TV programmes and films) streaming
service.
Media streaming OLAP cube (highest levels of aggregation shown: Country, Media, and Year)
Figure 33.2
33.4 Describe the four common OLAP operations for querying data. Provide an example of each operation
using the OLAP cube in Figure 33.2.
Roll-up performs aggregations on the data by moving up the dimensional hierarchy or by dimensional
reduction e.g. 3-D to 2-D.
For example, {country, media and time} analysis of media streaming to {country, media} analysis of
streaming.
Drill-down is the reverse of roll-up and involves revealing the detailed data that forms the aggregated
data. Drill-down can be performed by moving down the dimensional hierarchy or by dimensional
introduction e.g. 2-D to 3-D.
For example, {country, media} analysis of media streaming to {country, media and time} analysis of
streaming.
For example, analysis of monthly media streaming to analysis of weekly media streaming.
Slice and dice - ability to look at data from different viewpoints. The slice operation performs a
selection on one dimension of the data whereas dice uses two or more dimensions.
For example, a slice is analysis of media streaming according to {time = ‘2014’} OR {media = ‘film’}
OR {location = ‘London’}
For example, a dice is analysis of media streaming according to {time = ‘2014’} and {media = ‘film’}
OR {time = ‘2014’} and {media = ‘film’} and {location = ‘London’}
Pivot - ability to rotate the data to provide an alternative view of the same data.
For example, analysis of media streaming using location (city) as the x-axis against time (year) as the
y-axis can be rotated so that time (year) is the x-axis and location (city) is the y-axis.
33.5 What are the three key features that all OLAP applications share?
OLAP applications all require (1) multi-dimensional views of data, (2) support for complex
calculations (such as forecasting), and (3) time intelligence. Time intelligence is a key feature of almost
any analytical application as performance is almost always judged over time.
Multi-dimensional data can be characterized through many different views (e.g. DVD sales can be
viewed according to the characteristics of the DVD, such as genre, and/or the characteristics of buyers,
such as home address).
The types of analysis available from OLAP range from basic navigation and browsing (referred to as
‘slicing and dicing’) to calculations, to more complex analyses such as time series and complex
modeling.
33.6 Explain the difference between the output produced by SQL ROLLUP and CUBE queries.
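ROLLUP generates subtotals at increasing levels of aggregation, from the most detailed level up to a
grand total, following the order of the columns listed in the ROLLUP clause. CUBE extends ROLLUP’s
capabilities by generating subtotal rows for every combination of GROUP BY columns. Both ROLLUP
and CUBE also generate a grand total row.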
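The difference can be made concrete by listing the grouping sets each operator produces: for n columns, ROLLUP yields the n + 1 leading prefixes of the column list (ending with the empty set, i.e. the grand total), while CUBE yields all 2^n subsets. The Python sketch below enumerates these and computes SUM for each set over hypothetical rows; the helper names are hypothetical, and rolled-up columns are shown as None, as SQL shows them as NULL.

```python
from collections import defaultdict
from itertools import combinations

def rollup_sets(cols):
    """GROUP BY ROLLUP(cols): each leading prefix of cols, ending with the grand total ()."""
    return [tuple(cols[:i]) for i in range(len(cols), -1, -1)]

def cube_sets(cols):
    """GROUP BY CUBE(cols): every subset of cols, including the grand total ()."""
    return [c for r in range(len(cols), -1, -1) for c in combinations(cols, r)]

def aggregate(rows, cols, grouping_sets, measure):
    """SUM(measure) per grouping set; columns not in the set are shown as None (NULL in SQL)."""
    result = {}
    for gs in grouping_sets:
        totals = defaultdict(int)
        for row in rows:
            totals[tuple(row[c] if c in gs else None for c in cols)] += row[measure]
        result.update(totals)
    return result

rows = [
    {"city": "London", "make": "Ford", "income": 500},
    {"city": "London", "make": "Audi", "income": 900},
    {"city": "Glasgow", "make": "Ford", "income": 300},
]
cols = ("city", "make")
rollup_result = aggregate(rows, cols, rollup_sets(cols), "income")
cube_result = aggregate(rows, cols, cube_sets(cols), "income")

print(len(rollup_result), len(cube_result))              # 6 8
print(sorted(set(cube_result) - set(rollup_result)))     # [(None, 'Audi'), (None, 'Ford')]
```

The two extra CUBE rows are the per-manufacturer subtotals across all cities, which ROLLUP(city, make) cannot produce because make never appears without city in its prefixes.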
34.1 Data mining can provide huge paybacks for companies who have made a significant investment in data
warehousing. Describe the advantages that data mining tools offer the business analyst.
34.2 Explain why a data warehouse is well equipped for providing the data for data mining.
A data warehouse is well equipped for providing data for mining for the following reasons:
Data quality and consistency are a prerequisite for mining to ensure the accuracy of the predictive
models. Data warehouses are populated with clean, consistent data.
It is advantageous to mine data from multiple sources to discover as many interrelationships as
possible. Data warehouses contain data from a number of sources.
Selecting the relevant subsets of records and fields for data mining requires the query capabilities of
the data warehouse.
The results of a data mining study are useful if there is some way to further investigate the
uncovered patterns. Data warehouses provide the capability to go back to the data source.
34.3 Identify the factors that have led to the growing popularity of data mining.
34.4 Discuss the relationship between data mining, OLAP and data warehousing.
Data mining and OLAP are complementary BI tools. While OLAP functions typically include
aggregations, allocations, ratios, etc., which are descriptive in nature, data mining uses regression, neural
nets, decision trees and clustering, which are associated with pattern discovery or explanatory
modeling.
A data warehouse is well equipped for providing the data source for data mining and OLAP.
Data quality and consistency are a prerequisite for mining and/or browsing to ensure the accuracy of the
predictive models/descriptive models. Data warehouses are populated with clean, consistent data.
It is advantageous to mine and/or browse data from multiple sources to discover as many
interrelationships as possible. Data warehouses contain data from a number of sources.
Selecting the relevant subsets of records and fields for data mining/browsing requires the query
capabilities of the data warehouse.
The results of a data mining/OLAP study are useful if there is some way to further investigate the
uncovered patterns. Data warehouses provide the capability to go back to the data source.
34.5 There are a growing number of data mining tools. Describe four key features that these tools offer the
analyst.
The student should discuss key features of these tools such as:
data preparation facilities;
selection of data mining operations (algorithms);
product scalability and performance;
facilities for visualization of results.