Database Systems: Instructor’s Guide – IV
PART IV
EXAM QUESTIONS & COURSEWORKS
AND SOLUTIONS
Exam Questions and Solutions
Part 1 Background
Chapter 1 Introduction to Databases
Chapter 2 Database Environment
Part 2 The Relational Model and Languages
Chapter 4 The Relational Model
Chapter 5 Relational Algebra and Relational Calculus
Chapters 6 – 8 SQL
Chapter 9 Object-Relational DBMSs
Part 3 Database Analysis and Design Techniques
Chapter 10 Database Planning, Design, and Administration
Chapters 12–13 (Enhanced) Entity–Relationship Modeling
Reliable Rentals Case Study
Chapters 14 - 15 Normalization
Part 4 Methodology
Chapters 16 - 19 Methodology – Conceptual, Logical, and Physical Database Design
Case Study 1 - Adult Education Department
Case Study 2 - BusyBee Cleaning Company
Case Study 3 - Reliable Rentals
Case Study 4 - Perfect Pets
Case Study 5 - StayHome Video Rentals
Database Systems Coursework 1 (Case Study 6 – Fastcabs Cab Company)
Database Systems Coursework 2 (Case Study 7 – University Database)
Part 5 Selected Database Issues
Chapter 20 Security and Administration
Chapter 22 Transaction Management
Assignment – Transaction Management Design and Implementation
Chapter 23 Query Processing
Part 6 Distributed DBMSs and Replication
Chapter 24 Distributed DBMSs – Concepts and Design
Case Study 1 – Real Estate Agency
Case Study 2 – Quack Consulting
Case Study 3 – InstantBuy
Case Study 4 – Complete Pet Care
Case Study 5 – Rapid Roads
Case Study 6 – Perilous Printing
Assignment – Distributed Database Analysis and Design
Chapter 25 Distributed DBMSs – Advanced Concepts
Assignment – Distributed Database System Implementation
Part 7 Object DBMSs
Chapter 27 Object-Oriented DBMSs – Concepts and Design
Case Study – Library System
Chapter 28 Object-Oriented DBMSs – Standards and Systems
Case Study 1 – Cornucopia Ltd
Case Study 3 – Perfect Pets
Assignment – Persistence in a Programming Language
Part 8 Web and DBMSs
Chapter 29 Web Technology and DBMSs
Chapter 30 Semistructured Data and XML
Part 9 Business Intelligence
Chapter 31 Data Warehousing Concepts
Chapters 33 - 34 OLAP and Data Mining
Part 1 Background
Chapter 1 Introduction to Databases
1.1 A database management system provides a number of facilities that will vary from system to
system. Describe the type of facilities you might expect, especially those that aid the initial
implementation of a database and its subsequent administration.
Initially, the type of facilities expected should be described. These include data storage and
retrieval, a concurrency control mechanism, authorization services, integrity mechanisms, and
transaction support. The focus is then on facilities that aid initial implementation, such as data
definition (which is stored in the catalog) and authorization control to manage the users. View
definition, transaction support, and integrity controls also aid subsequent administration.
1.2 Discuss the problems that arise when organizations rely upon multiple computerized file systems
to store data.
Discuss the advantages and disadvantages in using a database management system to carry out
the same functions.
For the first part the answer should cover broadly all the problems associated with file-based
systems, such as: data duplication, data inconsistency, problems with validation and security,
fragmentation of data.
For the second part, the focus is on a DBMS to support the same operations. Advantages include:
providing all the data structuring facilities, security mechanisms, integrity mechanisms, backup
and recovery facilities, minimal redundancy, data sharing, data independence. Disadvantages
include: cost, size, complexity.
1.3 Applications built around file management systems have often been used to satisfy user
requirements. Discuss the problems that arise with such systems, and what advantages a database
management system could offer instead.
This answer contains the points made in part one of the previous question and also includes the
advantages contained in part two of the previous question.
1.4 Explain what is meant by a database management system, and contrast it with a File
Management System.
Give a full account of the type of system structure you might expect for a Database Management
System, and outline the type of facilities such a system should provide.
If you were in the position of appraising an application for possible implementation using a
Database Management System, what aspects of the application would you consider with respect
to the advice you might give?
First part: For a DBMS, the emphasis is on the management of a collection of data as a resource
that is accessible to different users for different purposes. In contrast, a File Management System
(FMS) manages a collection of files for a specific purpose. Different FMSs have their own
collection of files and cannot share data. Additional details should focus on aspects such as data
redundancy, data sharing, data independence.
Second part: A diagram illustrating typical system structure would be useful. It should show the
DBMS as a suite of programs/modules, each with specific functions. Following on from this, the
typical facilities provided can be described.
Third part: Points to consider include: how many users require different data; the type of
application (for example, interactive with many screens); whether it is part of a larger set of
applications; whether it is likely to be extended in the future; whether the data relationships
are complex; and whether various integrity controls are needed.
Chapter 2 Database Environment
2.1 Discuss the reasons for the three-level architecture for a Database Management System.
A diagram would be useful to illustrate the relationships between the external, conceptual, and
internal levels. The different levels should be explained, and reference should be made to the need
for logical and physical data independence. This should be explained also using examples.
2.2 Security and recovery are two important functions of a Database Management System, which are
also of interest to Database Administration. Various facilities can be provided to aid both these
functions. Explain what these facilities are and with examples, show how Database
Administration could effectively use them.
The examples should be given using the DBMS that is currently in use. The focus is on security and
recovery. Security may include authorization mechanisms and associated granting of access
privileges. Views and encryption procedures may also be described. These mechanisms should be
elaborated. Recovery functions include backup procedures and safe storage of the media,
journaling and checkpointing, and the recovery manager. The database administration role should
be linked to their use.
2.3 Explain the purpose of the ANSI/SPARC three-level architecture for a Database Management
System, and describe its function giving examples to illustrate your points. Detail the components
of a Database Management System, explaining in particular the specific functions of the software
modules.
The answer to the first part of this question is similar to that for 2.1. This question specifically
asks for examples to be used, and these should be present in a full answer.
To answer the second part of the question, a diagram would be useful. It may show the whole
environment – users, DBMS software (expanded), data, and implicit hardware. The emphasis is
on the DBMS modules, which need to be explained. These include DML precompiler, Query
processor, DDL compiler, Database manager, and Data dictionary. The explanation should include
how modules interact with others.
2.4 A database management system provides a number of facilities, which will vary from system to
system. Describe the type of facilities you might expect, especially those that aid the initial
implementation of a database and its subsequent administration.
Initially, the type of facilities expected should be described. These include data storage and
retrieval, a concurrency control mechanism, authorization services, integrity mechanisms, and
transaction support. The focus is then on facilities that aid initial implementation, such as data
definition (which is stored in the catalog) and authorization control to manage the users. View
definition, transaction support, and integrity controls also aid subsequent administration.
Part 2 The Relational Model and Languages

Chapter 4 The Relational Model
4.1 Describe the main characteristics of the Relational Data Model, including the properties of
relations and the rules for relational integrity.
Relational
– set of tables (as perceived by users)
– rows/columns
– variety of possible storage structures
– no repeating groups
– no links/pointers
– use of join columns.
– no duplicate tuples
– tuples are unordered
– attributes are unordered
– all attribute values are atomic.
Entity Integrity: No attribute participating in the PK of a base relation is allowed to accept null
values.
Referential Integrity: If a base relation R2 includes a FK targeting the PK of R1, then every
value of FK must either be:
– equal to the value of PK in some tuple of R1 or
– wholly null.
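The two integrity rules can be seen in action with a small sketch. It uses Python's standard sqlite3 module and two hypothetical tables named R1 and R2 after the relations in the rule above; the column names and sample values are assumptions made for illustration. SQLite only enforces foreign keys once the foreign_keys pragma is switched on, and the key columns are declared as TEXT NOT NULL so that a null key is rejected rather than auto-generated.

```python
import sqlite3

# Hypothetical two-table schema, used only to illustrate the two integrity rules.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE R1 (pk TEXT NOT NULL PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE R2 (id TEXT NOT NULL PRIMARY KEY, fk TEXT REFERENCES R1(pk))"
)
conn.execute("INSERT INTO R1 VALUES ('k1', 'parent')")


def try_insert(sql):
    """Run one INSERT and report whether the DBMS accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


results = {
    # Referential integrity: the FK value equals an existing PK value.
    "fk matches a PK value": try_insert("INSERT INTO R2 VALUES ('a', 'k1')"),
    # Referential integrity: a wholly null FK is also allowed.
    "fk wholly null": try_insert("INSERT INTO R2 VALUES ('b', NULL)"),
    # Referential integrity violated: no tuple of R1 has pk = 'zz'.
    "fk with no matching PK": try_insert("INSERT INTO R2 VALUES ('c', 'zz')"),
    # Entity integrity violated: a PK attribute may not be null.
    "null primary key": try_insert("INSERT INTO R1 VALUES (NULL, 'x')"),
}

for case, outcome in results.items():
    print(f"{case}: {outcome}")
```

The first two inserts succeed and the last two are refused, which matches the wording of the two rules.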
4.2 The Relational and CODASYL models are examples of two different approaches by which a
Database Management System will be classified. Make a detailed comparison of each of these
models, indicating clearly the relative advantages and disadvantages of each.
CODASYL
– records in chains/rings
– parent-child links
– many parents for a child
– ordering of all records
– fixed access paths
– potential for automatic referential integrity.
Relational
– set of tables (as perceived by users)
– rows/columns
– variety of possible storage structures
– no repeating groups
– no links/pointers
– use of join columns.
CODASYL Data Model Advantages
– longevity and availability of DBMSs for this model
– all data relationships can be modeled
– standards exist
– good performance
– data independence.
CODASYL Data Model Disadvantages
– complex navigation
– complexity.
Relational Data Model Advantages
– simplicity
– high level of data independence
– strong theoretical foundation.
Relational Data Model Disadvantages
– performance needs improvement
– referential integrity not supported.
4.3 Describe the difference between a base relation and a view and discuss the main benefits of
using views in a relational database.
A base relation is a named relation, corresponding to an entity in the conceptual schema, whose
tuples are physically stored in the database. A view can be constructed by performing operations
such as the relational algebra selection, projection, join or other calculations on the values of
existing base relations. A view is the dynamic result of one or more relational operations operating
on the base relations to produce another relation. A view is a virtual relation that does not actually
exist in the database but is produced upon request by a particular user, at the time of request.
The main benefits of views include:
– They provide a powerful and flexible security mechanism by hiding parts of the database from
certain users. The user is not aware of the existence of any attributes or tuples that are
missing from the view.
– They permit users to access data in a way that is customized to their needs, so that the same
data can be seen by different users in different ways, at the same time.
– They can simplify complex operations on the base relations. For example, if a view is defined as
a join of two relations, the user may now perform the simpler unary operations of
selection and projection on the view, which will be translated by the DBMS into equivalent
operations on the join.
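As a rough illustration of these points, the sketch below defines a view over a join and queries it with a simple selection. It uses Python's standard sqlite3 module; the Staff and Branch tables, the StaffCity view, and all sample rows are hypothetical and do not come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT, salary REAL, branchNo TEXT);
    INSERT INTO Branch VALUES ('B1', 'Glasgow'), ('B2', 'London');
    INSERT INTO Staff VALUES ('S1', 'Ann', 30000, 'B1'),
                             ('S2', 'Bob', 24000, 'B2'),
                             ('S3', 'Cat', 18000, 'B1');
    -- The view hides the salary column and packages the join.
    CREATE VIEW StaffCity AS
        SELECT s.staffNo, s.name, b.city
        FROM Staff s JOIN Branch b ON s.branchNo = b.branchNo;
""")


def glasgow_names():
    """A plain selection and projection on the view; the DBMS maps it onto the join."""
    rows = conn.execute(
        "SELECT name FROM StaffCity WHERE city = 'Glasgow' ORDER BY name"
    )
    return [name for (name,) in rows]


# Security: salary is not among the columns a user of the view can see.
view_columns = [row[1] for row in conn.execute("PRAGMA table_info(StaffCity)")]

before = glasgow_names()
# The view is a dynamic result: a change to a base relation shows up on the next request.
conn.execute("INSERT INTO Staff VALUES ('S4', 'Dan', 20000, 'B1')")
after = glasgow_names()

print(view_columns, before, after)
```

The user of StaffCity never writes the join and never sees salary, and the new row appears in the view without the view being redefined.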
4.4 From an SQL user’s perspective, does the relational model provide logical and physical data
independence?
Since a user can define views, logical data independence can be achieved by using view definitions
to hide changes in the conceptual schema. Since the SQL user has no knowledge of how the data is
physically represented, relying solely on the relation abstraction for querying, physical data
independence is also achieved.
Chapter 5 Relational Algebra and Relational Calculus
5.1 Choose any four relational algebra operators and explain how each functions.
For example: Select – produces a horizontal subset of a relation. Project produces a vertical subset
of a relation by picking out particular attributes. Product (Cartesian product) produces a result
relation by multiplying one relation by another. If the first relation contained 20 rows and the
second 50, the result relation would contain 20 × 50 rows. Join produces a result relation by
(commonly) joining two relations over the equal value of an attribute common to both relations.
Examples should be given for all operators explained.
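One possible way to give such examples is to model a relation as a list of dictionaries and write each operator as a short function. This is only a sketch: the function names, the Operator/Journey sample rows, and the attribute names a and b are assumptions chosen for illustration.

```python
def select(relation, predicate):
    """Selection: keep the tuples that satisfy the predicate (horizontal subset)."""
    return [t for t in relation if predicate(t)]


def project(relation, attributes):
    """Projection: keep the named attributes (vertical subset) and drop duplicates."""
    seen, result = set(), []
    for t in relation:
        key = tuple(t[a] for a in attributes)
        if key not in seen:
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result


def product(r, s):
    """Cartesian product: pair every tuple of r with every tuple of s.
    Assumes r and s have no attribute names in common."""
    return [{**t, **u} for t in r for u in s]


def equijoin(r, s, attribute):
    """Join over the equal value of an attribute common to both relations."""
    return [{**t, **u} for t in r for u in s if t[attribute] == u[attribute]]


# Hypothetical sample data.
operator = [{"opCode": "O1", "opName": "FastBus"}, {"opCode": "O2", "opName": "RailCo"}]
journey = [
    {"opCode": "O1", "destinationCode": "D1", "price": 4.5},
    {"opCode": "O1", "destinationCode": "D2", "price": 9.0},
    {"opCode": "O2", "destinationCode": "D1", "price": 12.0},
]

cheap = select(journey, lambda t: t["price"] < 10)
codes = project(journey, ["opCode"])
joined = equijoin(operator, journey, "opCode")
big = product([{"a": i} for i in range(20)], [{"b": j} for j in range(50)])

print(len(cheap), codes, len(joined), len(big))
```

The last call reproduces the 20 × 50 case from the answer: the product of a 20-row relation and a 50-row relation has 1000 rows.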
5.2 Given two relations R and S, where R contains N1 tuples and S contains N2 tuples (N2 > N1 > 0),
give the minimum and maximum cardinality for the result relation for each of the following
relational algebra expressions and in each case state any assumptions about the schemas that are
required to make the expression meaningful:
(a) R ∪ S
(b) R ∩ S
(c) R – S
(d) R × S
(e) σ_{a = 1}(R)
(f) Π_{a}(R)
The answer is shown in the table below.
Expression      Min        Max        Assumptions
R ∪ S           N2         N1 + N2    R and S must be union-compatible
R ∩ S           0          N1         R and S must be union-compatible
R – S           0          N1         R and S must be union-compatible
R × S           N1 * N2    N1 * N2    No restrictions
σ_{a = 1}(R)    0          N1         R must have an attribute called a
Π_{a}(R)        1          N1         R must have an attribute called a
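The bounds in the table can be checked on a small instance. The sketch below uses Python sets of tuples as relations, with N1 = 3 and N2 = 5 chosen arbitrarily; the two extreme cases (S sharing no tuples with R, and S containing all of R) are assumptions picked to hit the minimum and maximum of each set operation.

```python
from itertools import product

N1, N2 = 3, 5  # arbitrary sizes with N2 > N1 > 0

R = {("r", i) for i in range(N1)}
S_disjoint = {("s", i) for i in range(N2)}            # shares no tuple with R
S_superset = R | {("s", i) for i in range(N2 - N1)}   # contains every tuple of R

# (min, max) cardinality observed for each expression
bounds = {
    "union": (len(R | S_superset), len(R | S_disjoint)),
    "intersection": (len(R & S_disjoint), len(R & S_superset)),
    "difference": (len(R - S_superset), len(R - S_disjoint)),
    "product": (len(list(product(R, S_disjoint))), len(list(product(R, S_superset)))),
    # Selection: no tuple matches, or every tuple matches.
    "selection": (
        len({t for t in R if t[1] == 99}),
        len({t for t in R if t[0] == "r"}),
    ),
    # Projection: all tuples agree on the attribute, or all differ on it.
    "projection": (len({t[0] for t in R}), len({t[1] for t in R})),
}

for name, (low, high) in bounds.items():
    print(f"{name}: min {low}, max {high}")
```

With these sizes the union ranges from 5 (N2) to 8 (N1 + N2), intersection and difference from 0 to 3 (N1), the product is always 15, selection ranges from 0 to 3, and projection from 1 to 3.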
5.3 A relational database contains details about journeys from Paisley to a variety of destinations
and contains the following relations:
Operator (opCode, opName)

Journey (opCode, destinationCode, price)
Destination (destinationCode, destinationName, distance)
Each operator is assigned a unique code (opCode) and the relation Operator records the
association between this code and the operator’s name (opName). Each destination has a
unique code (destinationCode) and the relation Destination records the association between this
code and the destination name (destinationName), and the distance of the destination from
Paisley. The relation Journey records the price of an adult fare from Paisley to the given
destination by a specified operator; several operators may operate over the same route.
Formulate the following queries using relational algebra, tuple relational calculus, and domain
relational calculus (the answers to these queries in SQL are given in the next section):
(a) List the details of journeys less than £100.
RA: σ_{price < 100}(Journey)
TRC: {J | Journey(J) ∧ J.price < 100}
DRC: {opCode, destinationCode, price | Journey(opCode, destinationCode, price) ∧
price < 100}
(b) List the names of all destinations.
RA: Π_{destinationName}(Destination)
TRC: {D.destinationName | Destination(D)}
DRC: {destinationName | (∃destinationCode, distance)
(Destination(destinationCode, destinationName, distance))}
(c) Find the names of all destinations within 20 miles.
RA: Π_{destinationName}(σ_{distance < 20}(Destination))
TRC: {D.destinationName | Destination(D) ∧ D.distance < 20}
DRC: {destinationName | (∃destinationCode, distance)
(Destination(destinationCode, destinationName, distance) ∧ distance < 20)}
(d) List the names of all operators with at least one journey priced at under £5.
RA: Π_{opName}(σ_{price < 5}(Journey) ⋈_{opCode} Operator)
TRC: {O.opName | Operator(O) ∧ (∃J)(Journey(J) ∧ (O.opCode = J.opCode) ∧
J.price < 5)}
DRC: {opName | (∃opCode, opCode1, destinationCode, price)
(Operator(opCode, opName) ∧ Journey(opCode1, destinationCode, price) ∧
(opCode = opCode1) ∧ price < 5)}
(e) List the names of all operators and prices of journeys to ‘Ayr’.
RA: Π_{opName, price}(σ_{destinationName = 'Ayr'}(Destination) ⋈_{destinationCode}
(Journey ⋈_{opCode} Operator))
TRC: {O.opName, J.price | Operator(O) ∧ Journey(J) ∧ (O.opCode = J.opCode) ∧
(∃D)(Destination(D) ∧ (J.destinationCode = D.destinationCode) ∧
D.destinationName = 'Ayr')}
DRC: {opName, price | (∃opCode, opCode1, destinationCode, destinationCode1,
destinationName, distance) (Operator(opCode, opName) ∧
Journey(opCode1, destinationCode, price) ∧
Destination(destinationCode1, destinationName, distance) ∧
(opCode = opCode1) ∧ (destinationCode = destinationCode1) ∧
destinationName = 'Ayr')}
(f) List the names of all destinations that do not have any operators.
RA: Π_{destinationName}((Π_{destinationCode}(Destination) – Π_{destinationCode}(Journey))
⋈_{destinationCode} Destination)
TRC: {D.destinationName | Destination(D) ∧ (~(∃J)(Journey(J) ∧
(J.destinationCode = D.destinationCode)))}
DRC: {destinationName | (∃destinationCode, distance)
(Destination(destinationCode, destinationName, distance) ∧
(~(∃opC, destinationCode1, price)(Journey(opC, destinationCode1, price) ∧
(destinationCode = destinationCode1))))}
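Tuple relational calculus expressions read almost directly as set comprehensions, which can help when checking an answer by hand. The sketch below is one such reading for queries (d) and (f), with the existential quantifier written as any() and its negation as not any(). The sample rows are hypothetical and invented purely for illustration.

```python
from collections import namedtuple

Operator = namedtuple("Operator", "opCode opName")
Journey = namedtuple("Journey", "opCode destinationCode price")
Destination = namedtuple("Destination", "destinationCode destinationName distance")

# Hypothetical sample rows, invented for illustration only.
operators = {Operator("O1", "FastBus"), Operator("O2", "RailCo")}
journeys = {Journey("O1", "D1", 4.50), Journey("O2", "D1", 12.00)}
destinations = {Destination("D1", "Ayr", 35), Destination("D2", "Largs", 18)}

# (d) Operators with at least one journey priced under 5:
#     the operator tuple ranges over Operator, and any() plays the role of "there exists J".
cheap_operators = {
    o.opName
    for o in operators
    if any(j.opCode == o.opCode and j.price < 5 for j in journeys)
}

# (f) Destinations with no operators: "not any" plays the role of "there does not exist J".
unserved = {
    d.destinationName
    for d in destinations
    if not any(j.destinationCode == d.destinationCode for j in journeys)
}

print(cheap_operators, unserved)
```

On this sample data only FastBus has a journey under 5, and Largs is the only destination without a journey.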
5.4 The following tables form part of a database held in a Relational Database Management System:
Employee (empID, fName, lName, address, DOB, sex, position, deptNo)

Department (deptNo, deptName, mgrEmpID)
Project (projNo, projName, deptNo)
WorksOn (empID, projNo, hoursWorked)
where Employee contains employee details and empID is the key.

Department contains department details and deptNo is the key. mgrEmpID
identifies the employee who is the manager of the department. There
is only one manager for each department.
Project contains details of the projects in each department and the key is
projNo (no two departments can run the same project).
and WorksOn contains details of the hours worked by employees on each project,
and empID/projNo form the key.
Formulate the following queries in relational algebra, tuple relational calculus, and domain relational
calculus.
(1) List all employees.
RA: Employee
TRC: {E | Employee(E)}
DRC: {empID, fName, lName, address, DOB, sex, position, deptNo |
Employee(empID, fName, lName, address, DOB, sex, position, deptNo)}
(2) List all the details of employees who are female.
RA: σ_{sex = 'F'}(Employee)
TRC: {E | Employee(E) ∧ E.sex = 'F'}
DRC: {empID, fName, lName, address, DOB, sex, position, deptNo |
Employee(empID, fName, lName, address, DOB, sex, position, deptNo) ∧
sex = 'F'}
(3) List the names and addresses of all employees who are Managers.
RA: Π_{fName, lName, address}(σ_{position = 'Manager'}(Employee))
TRC: {E.fName, E.lName, E.address | Employee(E) ∧ E.position = 'Manager'}
DRC: {fName, lName, address | (∃empID, DOB, sex, position, deptNo)
(Employee(empID, fName, lName, address, DOB, sex, position, deptNo) ∧
position = 'Manager')}
(4) Produce a list of the names and addresses of all employees who work for the ‘IT’ department.
RA: Π_{lName, address}(σ_{deptName = 'IT'}(Department) ⋈_{deptNo} Employee)
TRC: {E.lName, E.address | Employee(E) ∧ (∃D)(Department(D) ∧
(D.deptNo = E.deptNo) ∧ D.deptName = 'IT')}
DRC: {lName, address | (∃empID, fName, DOB, sex, position, deptNo, deptNo1,
deptName, mgrEmpID) (Employee(empID, fName, lName, address, DOB, sex,
position, deptNo) ∧
Department(deptNo1, deptName, mgrEmpID) ∧ (deptNo = deptNo1) ∧
deptName = 'IT')}
(5) Produce a list of the names of all employees who work on the ‘SCCS’ project.
RA: Π_{fName, lName}(σ_{projName = 'SCCS'}(Project) ⋈_{projNo} (WorksOn ⋈_{empID} Employee))
TRC: {E.fName, E.lName | Employee(E) ∧ (∃P)(∃W)(Project(P) ∧ WorksOn(W) ∧
(E.empID = W.empID) ∧ (W.projNo = P.projNo) ∧ P.projName = 'SCCS')}
DRC: {fName, lName | (∃empID, address, DOB, sex, position, deptNo, projNo,
projName, deptNo1, empID1, projNo1, hoursWorked)
(Employee(empID, fName, lName, address, DOB, sex, position, deptNo) ∧
Project(projNo, projName, deptNo1) ∧ WorksOn(empID1, projNo1,
hoursWorked) ∧ (empID = empID1) ∧ (projNo = projNo1) ∧
projName = 'SCCS')}
5.5 The following tables form part of a database held in a Relational Database Management System:
Employee (empNo, eName, salary, position)
Aircraft (aircraftNo, aName, aModel, flyingRange)
Flight (flightNo, from, to, flightDistance, departTime, arriveTime)
Certified (empNo, aircraftNo)
where Employee contains details of all employees (pilots and non-pilots) and empNo
is the key.
Aircraft contains details of aircraft and aircraftNo is the key.
Flight contains details of the flights and flightNo is the key.
and Certified contains details of the staff who are certified to fly an aircraft, and
empNo/aircraftNo form the key.
Formulate the following queries in relational algebra, tuple relational calculus, and domain
relational calculus (the answers to these queries in SQL are given in the next section).
(1) List all Boeing aircraft.
RA: σ_{aName = 'Boeing'}(Aircraft)
TRC: {A | Aircraft(A) ∧ (A.aName = 'Boeing')}
DRC: {aircraftNo, aName, aModel, flyingRange |
Aircraft(aircraftNo, aName, aModel, flyingRange) ∧ (aName = 'Boeing')}
(2) List all Boeing 737 aircraft.
RA: σ_{aName = 'Boeing' ∧ aModel = '737'}(Aircraft)
TRC: {A | Aircraft(A) ∧ (A.aName = 'Boeing') ∧ (A.aModel = '737')}
DRC: {aircraftNo, aName, aModel, flyingRange |
Aircraft(aircraftNo, aName, aModel, flyingRange) ∧
(aName = 'Boeing') ∧ (aModel = '737')}
(3) List the employee numbers of pilots certified for Boeing aircraft.
RA: Π_{empNo}(σ_{aName = 'Boeing'}(Aircraft) ⋈_{aircraftNo} Certified)
TRC: {C.empNo | Certified(C) ∧ (∃A)(Aircraft(A) ∧ (A.aircraftNo = C.aircraftNo) ∧
A.aName = 'Boeing')}
DRC: {empNo | (∃aNo)(Certified(empNo, aNo) ∧
(∃aNo1, aName)(Aircraft(aNo1, aName, aT, fR) ∧ (aNo = aNo1) ∧
(aName = 'Boeing')))}
(4) List the names of pilots certified for Boeing aircraft.
RA: Π_{eName}((σ_{aName = 'Boeing'}(Aircraft) ⋈_{aircraftNo} Certified) ⋈_{empNo} Employee)
TRC: {E.eName | Employee(E) ∧ (∃C)(Certified(C) ∧ (E.empNo = C.empNo) ∧
(∃A)(Aircraft(A) ∧ (A.aircraftNo = C.aircraftNo) ∧
A.aName = 'Boeing'))}
DRC: {eName | (∃empNo)(Employee(empNo, eName, sal, posn) ∧
(∃empNo1, aNo)(Certified(empNo1, aNo) ∧ (empNo = empNo1) ∧
(∃aNo1, aName)(Aircraft(aNo1, aName, aT, fR) ∧ (aNo = aNo1) ∧
(aName = 'Boeing'))))}
(5) List the aircraft that can fly nonstop from Glasgow to New York (flyingRange >
flightDistance).
RA: Π_{aircraftNo}(σ_{flyingRange > flightDistance}(σ_{from = 'Glasgow' ∧ to = 'New York'}(Flight) × Aircraft))
TRC: {A.aircraftNo | Aircraft(A) ∧ (∃F)(Flight(F) ∧ (F.from = 'Glasgow' ∧
F.to = 'New York') ∧ (A.flyingRange > F.flightDistance))}
DRC: {aircraftNo | (∃flyingRange)(Aircraft(aircraftNo, aN, aM, flyingRange) ∧
(∃from, to, flightDistance)(Flight(fNo, from, to, flightDistance, dT, aT) ∧
(from = 'Glasgow' ∧ to = 'New York') ∧ (flyingRange > flightDistance)))}
(6) List the employee numbers of employees who have the highest salary.
RA: To answer this query, we first find all employees who do not have the
highest salary and then subtract these from the original list of employees to
give the list of highest paid employees.
ρ_{E1}(Employee)
ρ_{E2}(Employee)
ρ_{E3}(Π_{E2.empNo}(E1 ⋈_{E1.salary > E2.salary} E2))
Π_{empNo}(E1) – E3
TRC: {E1.empNo | Employee(E1) ∧ (~(∃E2)(Employee(E2) ∧
(E2.salary > E1.salary)))}
DRC: {empNo | (∃sal)(Employee(empNo, eN, sal, posn) ∧
(~(∃sal1)(Employee(empNo1, eN1, sal1, posn1) ∧
(sal1 > sal))))}
(7) List the employee numbers of employees who have the second highest salary.
RA: To answer this query, we proceed as above and first find all employees who
do not have the highest salary (E3). Removing the highest paid employees
from the original list in this way leaves the second highest paid employees
together with the rest of the employees. Within this group we next find the
rest of the employees, that is, those who are paid less than some other
member of the group (E6). We can then simply remove these to give the
second highest paid employees.
ρ_{E1}(Employee)
ρ_{E2}(Employee)
ρ_{E3}(Π_{E2.empNo}(E1 ⋈_{E1.salary > E2.salary} E2))
ρ_{E4}(E2 ⋈ E3)
ρ_{E5}(E2 ⋈ E3)
ρ_{E6}(Π_{E5.empNo}(E4 ⋈_{E4.salary > E5.salary} E5))
Π_{empNo}(E3) – E6
TRC: {E1.empNo | Employee(E1) ∧ (∃E2)(Employee(E2) ∧ (E2.salary > E1.salary) ∧
(~(∃E3)(Employee(E3) ∧ (E3.salary > E2.salary))))}
DRC: {empNo | (∃sal)(Employee(empNo, eN, sal, posn) ∧
(∃sal1)(Employee(empNo1, eN1, sal1, posn1) ∧ (sal1 > sal) ∧
(~(∃sal2)(Employee(empNo2, eN2, sal2, posn2) ∧ (sal2 > sal1)))))}
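The set-difference technique used in the relational algebra answers to queries (6) and (7) can be traced on a small example. The sketch below models Employee as a Python set of (empNo, salary) pairs; the sample rows and the helper name not_highest are assumptions made for illustration.

```python
# Hypothetical (empNo, salary) pairs; two employees share the top salary.
employees = {
    ("E1", 50000),
    ("E2", 70000),
    ("E3", 70000),
    ("E4", 60000),
    ("E5", 40000),
}


def not_highest(emps):
    """Theta-join of emps with itself on e1.salary > e2.salary, projected onto e2:
    every employee for whom someone else in emps earns strictly more."""
    return {e2 for e1 in emps for e2 in emps if e1[1] > e2[1]}


# Query (6): subtract the employees who are not highest paid from the full list.
highest = employees - not_highest(employees)

# Query (7): repeat the same step inside the group that is left.
rest = not_highest(employees)
second_highest = rest - not_highest(rest)

print(sorted(e for e, _ in highest), sorted(e for e, _ in second_highest))
```

Here E2 and E3 come out as highest paid and E4 as second highest, using nothing beyond a join on an inequality and set difference.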
(8) List the employee numbers of employees who are certified for exactly three aircraft.
RA: To answer this query, we first find the employees who are certified for at
least three aircraft, then find the employees who are certified for at least four
aircraft. The difference provides the employees who are certified for exactly
three aircraft.
ρ_{C1}(Certified)
ρ_{C2}(Certified)
ρ_{C3}(Certified)
ρ_{C4}(Certified)
ρ_{C5}(Π_{C1.empNo}(σ_{(C1.empNo = C2.empNo = C3.empNo) ∧ (C1.aircraftNo <> C2.aircraftNo <> C3.aircraftNo)}(C1 × C2 × C3)))
ρ_{C6}(Π_{C1.empNo}(σ_{(C1.empNo = C2.empNo = C3.empNo = C4.empNo) ∧ (C1.aircraftNo <> C2.aircraftNo <> C3.aircraftNo <> C4.aircraftNo)}(C1 × C2 × C3 × C4)))
C5 – C6
TRC: {C1.empNo | Certified(C1) ∧ (∃C2)(Certified(C2) ∧ (∃C3)(Certified(C3) ∧
(C1.empNo = C2.empNo) ∧ (C2.empNo = C3.empNo) ∧
(C1.aircraftNo <> C2.aircraftNo) ∧ (C2.aircraftNo <> C3.aircraftNo) ∧
(C3.aircraftNo <> C1.aircraftNo) ∧ (~(∃C4)(Certified(C4) ∧
(C3.empNo = C4.empNo) ∧ (C1.aircraftNo <> C4.aircraftNo) ∧
(C2.aircraftNo <> C4.aircraftNo) ∧ (C3.aircraftNo <> C4.aircraftNo)))))}
DRC: {C1empNo | (∃C1aircraftNo)(Certified(C1empNo, C1aircraftNo) ∧
(∃C2empNo, C2aircraftNo)(Certified(C2empNo, C2aircraftNo) ∧
(∃C3empNo, C3aircraftNo)(Certified(C3empNo, C3aircraftNo) ∧
(C1empNo = C2empNo) ∧ (C2empNo = C3empNo) ∧
(C1aircraftNo <> C2aircraftNo) ∧ (C2aircraftNo <> C3aircraftNo) ∧
(C3aircraftNo <> C1aircraftNo) ∧
(~(∃C4empNo, C4aircraftNo)(Certified(C4empNo, C4aircraftNo) ∧
(C3empNo = C4empNo) ∧ (C1aircraftNo <> C4aircraftNo) ∧
(C2aircraftNo <> C4aircraftNo) ∧ (C3aircraftNo <> C4aircraftNo))))))}
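The "at least three minus at least four" idea behind query (8) can be tried out on a small instance. In this sketch Certified is a Python set of (empNo, aircraftNo) pairs, and the helper certified_for_at_least mirrors the k-way Cartesian product followed by a selection and projection; the helper name and the sample rows are assumptions made for illustration.

```python
from itertools import product

# Hypothetical Certified tuples: E1 has 3 aircraft, E2 has 4, E3 has 2.
certified = {
    ("E1", "A1"), ("E1", "A2"), ("E1", "A3"),
    ("E2", "A1"), ("E2", "A2"), ("E2", "A3"), ("E2", "A4"),
    ("E3", "A1"), ("E3", "A2"),
}


def certified_for_at_least(k, relation):
    """Employees appearing in some combination of k Certified tuples that all
    share one empNo and have pairwise different aircraftNo values."""
    result = set()
    for combo in product(relation, repeat=k):  # k-way Cartesian product
        same_employee = len({emp for emp, _ in combo}) == 1
        distinct_aircraft = len({aircraft for _, aircraft in combo}) == k
        if same_employee and distinct_aircraft:
            result.add(combo[0][0])  # project onto empNo
    return result


at_least_three = certified_for_at_least(3, certified)
at_least_four = certified_for_at_least(4, certified)
exactly_three = at_least_three - at_least_four

print(sorted(at_least_three), sorted(at_least_four), sorted(exactly_three))
```

The helper requires the aircraft numbers to be pairwise different, which is the condition spelled out in full in the calculus answers.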

Chapters 6 – 8 SQL
6.1 What are the main advantages and disadvantages of SQL?
Advantages
Satisfies ideals for database language
(Relatively) Easy to learn
Portability
SQL standard exists
Both interactive and embedded access
Can be used by specialists and non-specialists.
Disadvantages
Impedance mismatch – mixing programming paradigms with embedded access
Lack of orthogonality – many different ways to express some queries
Language is becoming enormous (SQL2 is 6 times larger than predecessor; SQL3 is even larger again)
Handling of nulls in aggregate functions
Result tables are not strictly relational – they can contain duplicate tuples and impose an
ordering on both columns and rows.
6.2 Consider the following relational schema:
Staff (staffNo, name, dept, skillCode)

Skill (skillCode, description, chargeOutRate)
Project (projectNo, startDate, endDate, budget, projectManagerStaffNo)
Booking (staffNo, projectNo, dateWorkedOn, timeWorkedOn)
where: Staff contains staff details and staffNo is the key.

Skill contains descriptions of skill codes (e.g. Programmer, Analyst,
Manager, etc.) and the charge out rate per hour for that skill; the
key is skillCode.
Project contains project details and projectNo is the key.
Booking contains details of the date and the number of hours that a member of
staff worked on a project and the key is staffNo/projectNo.
Formulate the following queries using SQL:
(a) (1) List all skills with a charge out rate greater than 60 per hour, in alphabetical
order of description.
SELECT *
FROM Skill
WHERE chargeOutRate > 60
ORDER BY description;
(2) List all staff with the skill description ‘Programmer’ who work in the ‘Special
Projects’ department.
SELECT *
FROM Staff s, Skill k
WHERE s.skillCode = k.skillCode AND
description = 'Programmer' AND dept = 'Special Projects';
(3) For all projects that were active in July 1995, list the staff name, project
number and the date and number of hours worked on the project, ordered by
staff name, within staff name by the project number and within project number
by date.
SELECT name, p.projectNo, dateWorkedOn, timeWorkedOn
FROM Staff s, Project p, Booking b
WHERE s.staffNo = b.staffNo AND
b.projectNo = p.projectNo AND
startDate <= DATE '1995-07-31' AND
endDate >= DATE '1995-07-01'
ORDER BY name, p.projectNo, dateWorkedOn;
(4) How many staff have the skill ‘Programmer’?
SELECT COUNT(*)
FROM Staff s, Skill k
WHERE s.skillCode = k.skillCode AND
description = 'Programmer';
(5) List all projects that have at least two staff booked to them.
SELECT projectNo, COUNT(*)
FROM Booking
GROUP BY projectNo
HAVING COUNT(*) >= 2;
(6) List the average charge out rate.
SELECT AVG(chargeOutRate)
FROM Skill;
(7) List all staff with a charge out rate greater than the average charge out rate.
SELECT s.*
FROM Staff s, Skill k
WHERE s.skillCode = k.skillCode AND
chargeOutRate > (SELECT AVG(chargeOutRate)
FROM Skill);
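To see how a subquery answer of this kind behaves, the query for (7) can be run against a few rows. The sketch below uses Python's standard sqlite3 module with the Staff and Skill schema from the question, including the join condition between the two tables; the sample rows, names, and rates are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Skill (skillCode TEXT PRIMARY KEY, description TEXT, chargeOutRate REAL);
    CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, name TEXT, dept TEXT,
                        skillCode TEXT REFERENCES Skill(skillCode));
    -- Hypothetical sample data: the average charge out rate is 70.
    INSERT INTO Skill VALUES ('PR', 'Programmer', 50), ('AN', 'Analyst', 70),
                             ('MG', 'Manager', 90);
    INSERT INTO Staff VALUES ('S1', 'Ann', 'Special Projects', 'PR'),
                             ('S2', 'Bob', 'Special Projects', 'AN'),
                             ('S3', 'Cat', 'Finance', 'MG');
""")

# Query (6): the average is computed over Skill, one row per skill.
average_rate = conn.execute("SELECT AVG(chargeOutRate) FROM Skill").fetchone()[0]

# Query (7): the subquery is evaluated once and each joined row is compared with it.
above_average = conn.execute("""
    SELECT s.staffNo, s.name
    FROM Staff s, Skill k
    WHERE s.skillCode = k.skillCode AND
          k.chargeOutRate > (SELECT AVG(chargeOutRate) FROM Skill)
    ORDER BY s.staffNo
""").fetchall()

print(average_rate, above_average)
```

Only the member of staff whose skill is charged at 90 lies strictly above the average of 70, so a rate equal to the average does not qualify.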
(b) Create a view of staff details giving the staff number, staff name, skill description, and
department, but excluding the skill number and charge out rate.
CREATE VIEW SD (staffNo, name, dept, description)
AS SELECT staffNo, name, dept, description
FROM Staff s, Skill k
WHERE s.skillCode = k.skillCode;
6.3 The following tables form part of a database held in a Relational Database Management System:
Employee (empID, fName, lName, address, DOB, sex, position, deptNo)
Department (deptNo, deptName, mgrEmpID)
Project (projNo, projName, deptNo)
WorksOn (empID, projNo, hoursWorked)
where Employee contains employee details and empID is the key.
Department contains department details and deptNo is the key. mgrEmpID
identifies the employee who is the manager of the department. There
is only one manager for each department.
Project contains details of the projects in each department and the key is
projNo (no two departments can run the same project).
and WorksOn contains details of the hours worked by employees on each project,
and empID/projNo form the key.
Formulate the following queries using SQL:
(a) (1) List all employees in alphabetical order of surname and within surname, first name.
SELECT *
FROM Employee
ORDER BY lName, fName;
(2) List all the details of employees who are female.
SELECT *
FROM Employee
WHERE sex = 'F';
(3) List the names and addresses of all employees who are Managers.
SELECT fName, lName, address
FROM Employee
WHERE position = 'Manager';
Or
SELECT fName, lName, address
FROM Employee e, Department d
WHERE e.empID = d.mgrEmpID;
(4) Produce a list of the names and addresses of all employees who work for the ‘IT’
department.
SELECT e.lName, e.address
FROM Employee e, Department d
WHERE e.deptNo = d.deptNo AND d.deptName = 'IT';
(5) Produce a complete list of all managers who are due to retire this year, in alphabetical
order of surname.
SELECT lName
FROM Employee e, Department d
WHERE e.empID = d.mgrEmpID AND
date_part('year', DOB) < date_part('year', DATE('2001-10-01')) - 65
ORDER BY lName;
(The student does not need to know the exact date functions, just the general idea.)
(6) Find out how many employees are managed by ‘James Adams’.
SELECT COUNT(*)
FROM Employee e1, Employee e2, Department d
WHERE e1.lName = 'Adams' AND e1.fName = 'James' AND
e1.empID = d.mgrEmpID AND d.deptNo = e2.deptNo;
(7) Produce a report of the total hours worked by each employee, arranged in order of
department number and within department, alphabetically by employee surname.
SELECT e.empID, e.lName, e.fName, e.deptNo,
SUM(w.hoursWorked)
FROM Employee e, WorksOn w
WHERE e.empID = w.empID
GROUP BY e.empID, e.lName, e.fName, e.deptNo
ORDER BY e.deptNo, e.lName;
(8) For each project on which more than two employees worked, list the project number,
project name and the number of employees who work on that project.
SELECT p.projNo, p.projName, COUNT(*)
FROM Project p, WorksOn w
WHERE p.projNo = w.projNo
GROUP BY p.projNo, p.projName
HAVING COUNT(*) > 2;
(9) List the total number of employees in each department for those departments with
more than 10 employees. Create an appropriate heading for the columns of the
results table.
SELECT deptNo AS departmentNumber, COUNT(empID) AS totalEmployees
FROM Employee
GROUP BY deptNo
HAVING COUNT(empID) > 10;
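The choice of grouping columns matters for this query, and a small experiment makes the point. The sketch below uses Python's standard sqlite3 module with a cut-down, hypothetical Employee table; the sample rows are invented, and the threshold of 2 stands in for the 10 in the question so that the data can stay small.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (empID TEXT PRIMARY KEY, lName TEXT, deptNo TEXT)")
rows = [
    ("E1", "Adams", "D1"),
    ("E2", "Brown", "D1"),
    ("E3", "Clark", "D1"),
    ("E4", "Dunn", "D2"),
]
conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", rows)

THRESHOLD = 2  # stands in for the 10 in the question (assumption for a small sample)

# Grouping on deptNo alone gives one group per department.
per_department = conn.execute(
    """
    SELECT deptNo AS departmentNumber, COUNT(empID) AS totalEmployees
    FROM Employee
    GROUP BY deptNo
    HAVING COUNT(empID) > ?
    """,
    (THRESHOLD,),
).fetchall()

# Grouping on empID as well gives one-row groups, so no group can pass the HAVING test.
per_employee_groups = conn.execute(
    """
    SELECT deptNo, COUNT(empID)
    FROM Employee
    GROUP BY deptNo, empID
    HAVING COUNT(empID) > ?
    """,
    (THRESHOLD,),
).fetchall()

print(per_department, per_employee_groups)
```

Department D1 is reported with three employees in the first query, while the second query returns nothing because empID is unique within each group.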
(b) Create a view of employee details for all employees who work on project ‘MIS Development’,
excluding department number.
CREATE VIEW ED (eid, fName, lName, address, DOB, sex, position)
AS SELECT e.empID, fName, lName, address, DOB, sex, position
FROM Employee e, Project p, WorksOn w
WHERE e.empID = w.empID AND w.projNo = p.projNo AND
p.projName = 'MIS Development';
6.4 The following tables form part of a database held in a Relational Database Management System
for a printing company that handles printing jobs for book publishers:
Publisher (pubID, pubName, street, city, postcode, telNo, creditCode)

BookJob (jobID, pubID, jobDate, description, jobType)
PurchaseOrder (jobID, poID, poDate)
POItem (jobID, poID, itemID, quantity)
Item (itemID, description, onHand, price)
where Publisher contains publisher details and pubID is the key.

BookJob contains details of the printing jobs (books or part books) and jobID
is the key.
PurchaseOrder A printing job requires the use of materials, such as paper and ink,
which are assigned to a job via purchase orders. This table contains
details of the purchase orders for each job and the key is
jobID/poID. Each printing job may have several purchase orders
assigned to it.
POItem Each purchase order (PO) may contain several PO items. This table
contains details of the PO items and jobID/poID/itemID form the
key.
and Item contains details of the materials which appear in POItem, and the
key is itemID.
Formulate the following queries using SQL:

(a) (1) List all publishers in alphabetical order of name.
SELECT pubName
FROM Publisher
ORDER BY pubName;
(2) List all printing jobs for the publisher ‘Gold Press’.
SELECT jobID
FROM BookJob b, Publisher p
WHERE b.pubID = p.pubID AND pubName = ‘Gold Press’;
(3) List the names and phone numbers of all publishers who have a rush job (jobType = 'R').
SELECT pubName, telNo
FROM Publisher p, BookJob b
WHERE b.pubID = p.pubID AND jobType = 'R';
(4) List the dates of all the purchase orders for the publisher ‘Gold Press’.
SELECT poID, poDate

FROM PurchaseOrder po, BookJob b, Publisher p
WHERE po.jobID = b.jobID AND b.pubID = p.pubID AND
pubName = ‘Gold Press’;
(5) How many publishers fall into each credit code category?
SELECT creditCode, COUNT(*)

FROM Publisher
GROUP BY creditCode;
(6) List all job types with at least three printing jobs.
SELECT jobType, COUNT(*)

FROM BookJob
GROUP BY jobType
HAVING COUNT(*) >= 3;
(7) List the average price of all items.
SELECT AVG(price)
FROM Item;
(8) List all items with a price below the average price of an item.
SELECT *
FROM Item
WHERE price < (SELECT AVG(price) FROM Item);
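The subquery in (8) is evaluated once to produce a single value, which the outer query then compares against each row. A minimal sketch using Python's sqlite3 module as a stand-in DBMS follows; the three Item rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Item (itemID TEXT PRIMARY KEY, description TEXT, "
    "onHand INTEGER, price REAL)"
)
# Hypothetical sample rows; the average price works out to 5.0.
conn.executemany(
    "INSERT INTO Item VALUES (?, ?, ?, ?)",
    [("I1", "Paper", 500, 4.0), ("I2", "Ink", 40, 10.0), ("I3", "Glue", 25, 1.0)],
)

# Query (7): the scalar that the subquery in (8) produces.
avg_price = conn.execute("SELECT AVG(price) FROM Item").fetchone()[0]

# Query (8): items priced below that average.
below_average = conn.execute(
    "SELECT itemID FROM Item "
    "WHERE price < (SELECT AVG(price) FROM Item) ORDER BY itemID"
).fetchall()

print(avg_price, below_average)
```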
(b) Create a view of publisher details for all publishers who have a rush printing job,
excluding their credit code.
CREATE VIEW PB (pubID, pubName, street, city, postcode, telNo)
AS SELECT p.pubID, pubName, street, city, postcode, telNo
FROM Publisher p, BookJob b
WHERE b.pubID = p.pubID AND jobType = 'R';
6.5 The relational schema shown below is part of a hospital database. The primary keys are
highlighted in bold.
Patient (patientNo, patName, patAddr, DOB)

Ward (wardNo, wardName, wardType, noOfBeds)
Contains (patientNo, wardNo, admissionDate)
Drug (drugNo, drugName, costPerUnit)
Prescribed (patientNo, drugNo, unitsPerDay, startDate, finishDate)
Formulate the following SQL statements:
(1) List all the patients’ details, alphabetically by name.
SELECT *
FROM Patient
ORDER BY patName
(2) List all the patients contained in the ‘Surgical’ ward.
SELECT p.patientNo, p.patName

FROM Patient p, Ward w, Contains c
WHERE w.wardNo = c.wardNo AND c.patientNo = p.patientNo AND
wardName= ‘Surgical’
(3) List all the patients admitted today.
SELECT p.patientNo, p.patName

FROM Patient p, Contains c
WHERE c.patientNo = p.patientNo AND admissionDate = CURRENT_DATE;
(4) Find the names of all the patients being prescribed ‘Morphine’.
SELECT p.patName
FROM Patient p, Prescribed pr, Drug d

WHERE pr.patientNo = p.patientNo AND pr.drugNo = d.drugNo AND
drugName = 'Morphine';
(5) What is the total cost of Morphine supplied to a patient called ‘John Smith’ ?
SELECT SUM(((finishDate - startDate) * unitsPerDay) * costPerUnit) AS totalCost
FROM Patient p, Prescribed pr, Drug d
WHERE pr.patientNo = p.patientNo AND pr.drugNo = d.drugNo AND
drugName = 'Morphine' AND patName = 'John Smith';
(6) What is the maximum, minimum and average number of beds in a ward? Create
appropriate column headings for the results table.
SELECT MAX(noOfBeds) AS Maximum, MIN(noOfBeds) AS Minimum,

AVG(noOfBeds) AS Average
FROM Ward
(7) For each ward that admitted more than 10 patients today, list the ward number, ward
type and number of beds in each ward.
SELECT w.wardNo, wardType, noOfBeds
FROM Patient p, Ward w, Contains c
WHERE w.wardNo = c.wardNo AND c.patientNo = p.patientNo AND
admissionDate = CURRENT_DATE
GROUP BY w.wardNo, wardType, noOfBeds
HAVING COUNT(*) > 10;
(8) List the numbers and names of all patients and the drugNo and number of units of
their medication. The list should also include the details of patients that are not
prescribed medication.
SELECT p.patientNo, p.patName, pr.drugNo, pr.unitsPerDay
FROM Patient p LEFT JOIN Prescribed pr ON pr.patientNo = p.patientNo;
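The effect of the outer join in (8) is easiest to see on a tiny data set. The sketch below uses Python's sqlite3 module as a stand-in DBMS; the two patients and single prescription are hypothetical. A patient with no Prescribed row still appears, with NULLs (None in Python) in the drug columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Patient (patientNo TEXT PRIMARY KEY, patName TEXT)")
conn.execute(
    "CREATE TABLE Prescribed (patientNo TEXT, drugNo TEXT, unitsPerDay INTEGER)"
)
# Hypothetical rows: P2 has no prescription.
conn.executemany(
    "INSERT INTO Patient VALUES (?, ?)", [("P1", "Ann Lee"), ("P2", "Bob Ray")]
)
conn.execute("INSERT INTO Prescribed VALUES ('P1', 'D1', 2)")

rows = conn.execute(
    "SELECT p.patientNo, p.patName, pr.drugNo, pr.unitsPerDay "
    "FROM Patient p LEFT JOIN Prescribed pr ON pr.patientNo = p.patientNo "
    "ORDER BY p.patientNo"
).fetchall()

for row in rows:
    print(row)
```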
6.6 A relational database contains details about journeys from Paisley to a variety of destinations
and contains the following relations:
Operator (opCode, opName)

Journey (opCode, destinationCode, price)
Destination (destinationCode, destinationName, distance)
Each operator is assigned a unique code (opCode) and the relation Operator records the
association between this code and the operator’s name (opName). Each destination has a
unique code (destinationCode) and the relation Destination records the association between this
code and the destination name (destinationName), and the distance of the destination from
Paisley. The relation Journey records the price of an adult fare from Paisley to the given
destination by a specified operator; several operators may operate over the same route.
Formulate the following queries using SQL (the answers to these queries in relational algebra,
tuple relational calculus, and domain relational calculus were given in the previous section):
(a) List the details of journeys less than £100.
SELECT *
FROM Journey
WHERE price < 100;
(b) List the names of all destinations.
SELECT destinationName
FROM Destination;
(c) Find the names of all destinations within 20 miles.
SELECT destinationName
FROM Destination
WHERE distance < 20;
(d) List the names of all operators with at least one journey priced at under £5.
SELECT opName
FROM Operator o, Journey j
WHERE (o.opCode = j.opCode) AND (j.price < 5);
(e) List the names of all operators and prices of journeys to ‘Ayr’.
SELECT opName, price

FROM Destination d, Operator o, Journey j
WHERE (j.destinationCode = d.destinationCode) AND
(o.opCode = j.opCode) AND (destinationName = ‘Ayr’);
(f) List the names of all destinations that do not have any operators.
SELECT destinationName
FROM Destination d
WHERE destinationCode NOT IN (SELECT destinationCode FROM Journey);
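Query (f) is an anti-join: it keeps the destinations whose code never appears in Journey. The sketch below, using Python's sqlite3 module as a stand-in DBMS with made-up rows, runs the NOT IN form alongside an equivalent NOT EXISTS form. The two agree here because destinationCode in Journey is never NULL; that is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Destination (destinationCode TEXT PRIMARY KEY, "
    "destinationName TEXT, distance REAL)"
)
conn.execute(
    "CREATE TABLE Journey (opCode TEXT, destinationCode TEXT NOT NULL, price REAL)"
)
# Hypothetical rows: no operator runs to Oban.
conn.executemany(
    "INSERT INTO Destination VALUES (?, ?, ?)",
    [("D1", "Ayr", 35), ("D2", "Largs", 30), ("D3", "Oban", 95)],
)
conn.executemany(
    "INSERT INTO Journey VALUES (?, ?, ?)",
    [("OP1", "D1", 4.5), ("OP2", "D1", 6.0), ("OP1", "D2", 5.5)],
)

not_in = conn.execute(
    "SELECT destinationName FROM Destination d "
    "WHERE destinationCode NOT IN (SELECT destinationCode FROM Journey)"
).fetchall()

not_exists = conn.execute(
    "SELECT destinationName FROM Destination d "
    "WHERE NOT EXISTS (SELECT 1 FROM Journey j "
    "WHERE j.destinationCode = d.destinationCode)"
).fetchall()

print(not_in, not_exists)
```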
Employee (empNo, eName, salary, position)

Aircraft (aircraftNo, aName, aModel, flyingRange)
Flight (flightNo, from, to, flightDistance, departTime, arriveTime)
Certified (empNo, aircraftNo)
where Employee contains details of all employees (pilots and non-pilots) and empNo
is the key.
Aircraft contains details of aircraft and aircraftNo is the key.
Flight contains details of the flights and flightNo is the key.
and Certified contains details of the staff who are certified to fly an aircraft, and
empNo/aircraftNo form the key.
Formulate the following queries in SQL (the answers to these queries in relational algebra, tuple relational
calculus, and domain relational calculus were given in the previous section).
(1) List all Boeing aircraft.
SELECT *
FROM Aircraft
WHERE aName = ‘Boeing’;
(2) List all Boeing 737 aircraft.
SELECT *
FROM Aircraft
WHERE (aName = ‘Boeing’) AND (aModel = ‘737’);
(3) List the employee numbers of pilots certified for Boeing aircraft.
SELECT empNo
FROM Aircraft a, Certified c
WHERE (a.aircraftNo = c.aircraftNo) AND (a.aName = ‘Boeing’);
(4) List the names of pilots certified for Boeing aircraft.
SELECT eName
FROM Aircraft a, Certified c, Employee e
WHERE (a.aircraftNo = c.aircraftNo) AND (e.empNo = c.empNo) AND
(a.aName = ‘Boeing’);
(5) List the aircraft that can fly nonstop from Glasgow to New York (flyingRange >
flightDistance).
SELECT a.aircraftNo
FROM Aircraft a, Flight f
WHERE (f.from = 'Glasgow' AND f.to = 'New York') AND (a.flyingRange > f.flightDistance);
(6) List the employee numbers of employees who have the highest salary.
SELECT empNo
FROM Employee e1
WHERE e1.salary = (SELECT MAX(e2.salary)
FROM Employee e2);
(7) List the employee numbers of employees who have the second highest salary.
SELECT empNo
FROM Employee e1
WHERE e1.salary = (SELECT MAX(e2.salary)
FROM Employee e2
WHERE e2.salary <> (SELECT MAX(e3.salary)
FROM Employee e3));
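The nested-MAX approach in (7) finds the largest salary that differs from the overall maximum, so it behaves sensibly when several employees tie for the top salary. A minimal check with Python's sqlite3 module as a stand-in DBMS; the four employees and their salaries are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (empNo TEXT PRIMARY KEY, eName TEXT, salary INTEGER)")
# Hypothetical rows: E1 and E2 tie for the highest salary.
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("E1", "Ann", 50000), ("E2", "Bob", 50000),
     ("E3", "Cal", 40000), ("E4", "Dee", 30000)],
)

second_highest = conn.execute(
    "SELECT empNo FROM Employee e1 "
    "WHERE e1.salary = (SELECT MAX(e2.salary) FROM Employee e2 "
    "                   WHERE e2.salary <> (SELECT MAX(e3.salary) FROM Employee e3))"
).fetchall()

print(second_highest)
```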
(8) List the employee numbers of employees who are certified for exactly three aircraft.
SELECT c1.empNo
FROM Certified c1, Certified c2, Certified c3
WHERE (c1.empNo = c2.empNo) AND (c2.empNo = c3.empNo) AND
(c1.aircraftNo <> c2.aircraftNo) AND (c2.aircraftNo <> c3.aircraftNo) AND
(c3.aircraftNo <> c1.aircraftNo)
EXCEPT
SELECT c4.empNo
FROM Certified c4, Certified c5, Certified c6, Certified c7
WHERE (c4.empNo = c5.empNo) AND (c5.empNo = c6.empNo) AND
(c6.empNo = c7.empNo) AND (c4.aircraftNo <> c5.aircraftNo) AND
(c4.aircraftNo <> c6.aircraftNo) AND (c4.aircraftNo <> c7.aircraftNo) AND
(c5.aircraftNo <> c6.aircraftNo) AND (c5.aircraftNo <> c7.aircraftNo) AND
(c6.aircraftNo <> c7.aircraftNo);
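The self-join answer to (8) mirrors the relational algebra solution: "at least three" minus "at least four". Students may also offer a shorter GROUP BY form. The sketch below, using Python's sqlite3 module as a stand-in DBMS with hypothetical certification data, runs both and compares them; the GROUP BY variant is an alternative suggested here, not the model answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Certified (empNo TEXT, aircraftNo TEXT, "
    "PRIMARY KEY (empNo, aircraftNo))"
)
# Hypothetical rows: E1 holds 3 certificates, E2 holds 2, E3 holds 4.
conn.executemany(
    "INSERT INTO Certified VALUES (?, ?)",
    [("E1", "A1"), ("E1", "A2"), ("E1", "A3"),
     ("E2", "A1"), ("E2", "A2"),
     ("E3", "A1"), ("E3", "A2"), ("E3", "A3"), ("E3", "A4")],
)

self_join_sql = """
SELECT c1.empNo
FROM Certified c1, Certified c2, Certified c3
WHERE c1.empNo = c2.empNo AND c2.empNo = c3.empNo
  AND c1.aircraftNo <> c2.aircraftNo AND c2.aircraftNo <> c3.aircraftNo
  AND c3.aircraftNo <> c1.aircraftNo
EXCEPT
SELECT c4.empNo
FROM Certified c4, Certified c5, Certified c6, Certified c7
WHERE c4.empNo = c5.empNo AND c5.empNo = c6.empNo AND c6.empNo = c7.empNo
  AND c4.aircraftNo <> c5.aircraftNo AND c4.aircraftNo <> c6.aircraftNo
  AND c4.aircraftNo <> c7.aircraftNo AND c5.aircraftNo <> c6.aircraftNo
  AND c5.aircraftNo <> c7.aircraftNo AND c6.aircraftNo <> c7.aircraftNo
"""

group_by_sql = """
SELECT empNo
FROM Certified
GROUP BY empNo
HAVING COUNT(*) = 3
"""

via_self_join = conn.execute(self_join_sql).fetchall()
via_group_by = conn.execute(group_by_sql).fetchall()
print(via_self_join, via_group_by)
```

COUNT(*) is safe in the GROUP BY form only because empNo/aircraftNo is the key of Certified, so each aircraft is counted once per employee.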
8.1 Write SQL*Plus commands to do the following:
• Connect to the database ‘OracleDB’ with the user name ‘SalesTracking’ and password
‘sales’;
CONNECT SalesTracking/sales@OracleDB
• Make sure that Autocommit is off;
SET Autocommit OFF;
• Display the name of the current user;
SHOW user;
• List the names of all tables in this schema;
SELECT table_name FROM USER_TABLES; (or just SELECT * FROM TAB;)
• Display the structure of the table Branch;
DESC Branch;
• List the names of all sequences in this schema.
SELECT sequence_name FROM user_sequences;
8.2 Assume that the following table is created and stored in your database:
Staff (staffNo, name, post, salary, sex, DOB)
Write SQL statements to do the following:

(a) Increment the salary of managers by 5%;
UPDATE Staff
SET salary = salary * 1.05
WHERE post = ‘Manager’;
(b) Remove the records of all 'salesmen' from the Staff table;
DELETE FROM Staff

WHERE post = ‘salesman’;
(c) List the Staff table tablespace name, pctfree, and pctused.
SELECT tablespace_name, pct_free, pct_used
FROM USER_TABLES
WHERE table_name = 'STAFF';
8.3 What is a sequence? Write an SQL statement to create a sequence that starts from 10 and is
incremented by 10 up to a maximum value of 10000. The sequence should continue to
generate values after reaching its maximum value.
A sequence is a database object that is used to create and maintain a sequence of numbers for
tables.
CREATE SEQUENCE StaffSeq
START WITH 10
INCREMENT BY 10
MAXVALUE 10000
CYCLE;
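The values such a sequence produces can be traced with a small simulation. The Python sketch below is not Oracle code; it assumes Oracle's documented behaviour that an ascending sequence declared with CYCLE restarts at its MINVALUE after reaching MAXVALUE, and that MINVALUE defaults to 1 when not specified. The function name and parameters are hypothetical.

```python
from itertools import islice


def cycling_sequence(start, increment, maxvalue, minvalue=1):
    """Simulate an ascending sequence with CYCLE (assumed to restart at minvalue)."""
    value = start
    while True:
        yield value
        value += increment
        if value > maxvalue:
            # Assumption: the restart point is MINVALUE, not START WITH.
            value = minvalue


# StaffSeq as declared above: START WITH 10, INCREMENT BY 10, MAXVALUE 10000.
values = list(islice(cycling_sequence(10, 10, 10000), 1002))
print(values[:3], values[998:1002])
```

Under that assumption the values after the wrap are 1, 11, 21, and so on; adding MINVALUE 10 to the CREATE SEQUENCE statement would make the second cycle repeat the first.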
8.4 (a) What is a ‘table’ and what is a ‘tablespace’ in Oracle9i? Write an SQL statement to
create a tablespace, say MyTableSpace, with default storage. You can use a datafile
called ‘MyDataFile.dbf ’ for this tablespace.
• A table is a basic conceptual structure that stores data. A table represents an
entity or a relationship.
• A tablespace is a logical name for a database storage unit that stores related
database objects.
CREATE TABLESPACE MyTableSpace

DATAFILE '\\ict-oracle-s\o$\oracle\oradata\MyDataFile.dbf' SIZE 5M REUSE
DEFAULT STORAGE (INITIAL 5K
NEXT 5K
MINEXTENTS 5
MAXEXTENTS 100
)
ONLINE;
(b) Explain what is a DUAL table, where is it stored, and what is it useful for? Give an
example of its use.
• DUAL is a table with one column called DUMMY that contains one row with
the value 'X'. It is automatically created by Oracle along with the data dictionary.
• DUAL is stored in the schema of the user SYS, but is accessible by the name
DUAL to all users.
• DUAL is useful for computing a constant expression with the SELECT
statement. Because DUAL has only one row, the constant is returned only once.
e.g. SELECT LENGTH ('My String') FROM DUAL;
8.5 (a) What is a named block in PL/SQL and how many types does PL/SQL support?
Named blocks are PL/SQL subprograms. PL/SQL has two types of subprograms:
procedures and functions.
(b) Specify the general structure of an anonymous block.
Anonymous block general structure
[DECLARE
Declaration statements]
BEGIN
Executable statements
[EXCEPTION
Exception handler statements]
END;
(c) Assume that the following tables are part of a Company database schema
Customer(custNo, custName, address, sex, DOB, creditLimit)

HighCredit(hcCustNo, hcCustName, hcCreditLimit)
Create a PL/SQL procedure object in your schema. Using a Cursor the procedure code
should be able to retrieve records of all customers and check their credit limit. If a
customer credit limit is greater than 100000 the procedure should insert a record in the
table HighCredit.
-- Using Cursor FOR Loop

CREATE OR REPLACE PROCEDURE Check_Credit
IS
-- Declare a Cursor
CURSOR C1 IS
SELECT custNo, custName, creditLimit FROM Customer;
BEGIN
FOR Customer_rec IN C1 LOOP
IF (Customer_rec.creditLimit > 100000) THEN
INSERT INTO HighCredit VALUES (Customer_rec.custNo,
Customer_rec.custName, Customer_rec.creditLimit);
END IF;
END LOOP;
END;
-- Students can also use OPEN, FETCH, CLOSE Cursor.

CREATE OR REPLACE PROCEDURE Check_Credit
IS
-- Declare a Cursor
CURSOR C1 IS
SELECT custNo, custName, creditLimit FROM Customer;
Customer_rec C1%ROWTYPE ; -- declare a record of cursor type
BEGIN
OPEN C1;
LOOP
FETCH C1 INTO Customer_rec;
-- exit loop when no more records
EXIT WHEN C1%NOTFOUND;

IF (Customer_rec.creditLimit > 100000) THEN
INSERT INTO HighCredit values (Customer_rec.custNo,
Customer_rec.custName, Customer_rec.creditLimit);
END IF;
END LOOP;
CLOSE C1;
END;
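For instructors who want to check the expected contents of HighCredit without an Oracle instance, the row-by-row logic of Check_Credit can be mimicked in Python. This is only an analogue of the cursor FOR loop, using sqlite3 as a stand-in DBMS; the function name check_credit and the three customer rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customer (custNo INTEGER PRIMARY KEY, custName TEXT, creditLimit REAL)"
)
conn.execute(
    "CREATE TABLE HighCredit (hcCustNo INTEGER, hcCustName TEXT, hcCreditLimit REAL)"
)
# Hypothetical rows; only the first exceeds the 100000 limit.
conn.executemany(
    "INSERT INTO Customer VALUES (?, ?, ?)",
    [(1, "Ann Beech", 150000), (2, "David Ford", 50000), (3, "Mary Howe", 100000)],
)


def check_credit(connection, limit=100000):
    """Visit every customer and copy those above the limit into HighCredit."""
    cursor = connection.execute("SELECT custNo, custName, creditLimit FROM Customer")
    for cust_no, cust_name, credit_limit in cursor:  # plays the role of FOR ... IN C1 LOOP
        if credit_limit > limit:
            connection.execute(
                "INSERT INTO HighCredit VALUES (?, ?, ?)",
                (cust_no, cust_name, credit_limit),
            )


check_credit(conn)
high_credit = conn.execute("SELECT hcCustNo, hcCustName FROM HighCredit").fetchall()
print(high_credit)
```

Note that the comparison is strict, so a customer whose limit is exactly 100000 is not copied.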
(d) Invoke the procedure from SQL*Plus.
BEGIN
Check_Credit;
END;
8.6 Assuming that a single-row form is created on the following Country table:
Country(countryId, countryName, currencyName, continent, population)
State the steps you would take to execute the following query using the form: List all the
African countries whose currency name is the pound and population greater than 20 million.
Order the result by country name.
• Run the form. Press the Enter Query icon or select Enter from the Query menu;
• Type a colon ( : ) in any of the fields and press the Execute Query icon or select Execute from
the Query menu;
• In the Query Where dialog box type the following expression:
currencyName = 'pound' AND continent = 'Africa' AND population > 20
ORDER BY countryName
• Press the OK button. Scroll through the results using the arrow keys.
8.7 Oracle database consists of logical and physical database structures. Describe each of the
following concepts and state to which structure they belong:
(a) Schema;
A schema is a collection of database objects (e.g. tables, views, sequences,
etc.) that is owned by a database user and has the same name as its owner.
A schema is a logical database structure.
(b) Data block;
A data block is the smallest unit of storage that Oracle can use or allocate. A data
block size should be a multiple of the operating system's block size. A data block is a
logical data structure.
(c) Redo log file.

A Redo log file is a physical data structure that is used to record all changes made to
the data for recovery purposes. Every Oracle database has a set of two or more Redo
log files.
8.8 State the three primary uses of Oracle Data Dictionary.
The primary uses of the Oracle Data Dictionary are:
• Oracle accesses the data dictionary to find information about users, schema objects, and
storage structures.
• Oracle modifies the data dictionary every time that a data definition language (DDL)
statement is issued.
• Any Oracle user can use the data dictionary as a read-only reference for information about
the database.
8.9 (a) SQL*Plus environment variables are set to default values when SQL*Plus is started.
State three ways by which users can change the default setting.
Users can change the default setting in three ways (use SET command):
• Interactively, by resetting each variable;
• Write all commands in a script file and run the file each time the user wishes to
change the setting;
• Write all commands in the file login.sql. The file must be stored in the current
directory (folder) of SQL*Plus.
(b) Write SQL*Plus commands to do the following:
• List all SQL*Plus commands;
HELP index;
• Connect to the database 'OracleDB' with the user name 'CustomerOrders' and
password 'customer';
CONNECT CustomerOrders/customer@OracleDB
• Use the COLUMN command to set the output format of the field 'StaffName' to
30 characters.
COLUMN StaffName FORMAT A30;
(c) Write SQL statements to do the following:
(1) List the name and granted roles of the current user;
SELECT USERNAME, GRANTED_ROLE

FROM USER_ROLE_PRIVS;
(2) List name and type of all objects owned by the current user;
SELECT OBJECT_NAME, OBJECT_TYPE

FROM USER_OBJECTS;
(3) List table name and the tablespace name to which the table is assigned of all
tables owned by the current user;
SELECT TABLE_NAME, TABLESPACE_NAME

FROM USER_TABLES;
(4) List the next value of the sequence Emp_Seq.
SELECT Emp_Seq.NEXTVAL
FROM SYS.DUAL;
(d) Given the following schema of a database table:
Orders(orderNo, dateDue, totalValue, status, custNo)
where: custNo is the number of the customer who placed the order.
Write a script with the necessary formatting commands (e.g. COLUMN, BREAK,
TTITLE, etc.) and an SQL statement to create a report that lists each customer
number, each order number he/she placed, the total value of each order, the sum of
the total values of all the orders for each customer, and the grand total of all the
orders placed by all customers under the following heading:
                 Orders Total Values
                                        Page: xx
Customer      Order       Total
No            No          Value
COLUMN custNo FORMAT A10 HEADING 'Customer | No'

COLUMN orderNo FORMAT A8 HEADING 'Order | No'
COLUMN totalValue FORMAT L99999.99 HEADING 'Total | Value'
SET UNDERLINE =
BREAK ON custNo SKIP 1 ON Report skip 2

COMPUTE SUM OF totalValue on custNo
COMPUTE SUM OF totalValue on Report
TTITLE CENTER 'Orders Total Values' SKIP 1 -
RIGHT 'Page: ' FORMAT 999 SQL.PNO SKIP 1
SELECT custNo, orderNo, totalValue

FROM Orders
ORDER BY custNo, orderNo;
8.10 (a) In PL/SQL what is a Cursor? When do we use an explicit Cursor? What do you do
when you declare a Cursor?
• A cursor is a PL/SQL construct that lets you name a work area and access its
stored information.
• We declare an explicit cursor for queries that return more than one row so that
we can process the rows individually.
• When we declare an explicit cursor we name it and associate it with a specific
query.
(b) Specify the general structure of a named function block.
FUNCTION name [(parameter list)] RETURN datatype IS

Declaration section
BEGIN
Executable statements
EXCEPTION
Exception handler statements
END;
(c) Assume that the following tables are part of a Library database system
Employee(empNo, fName, lName, street, city, sex, salary, libName)

GlasgowEmployees(gEmpNo, fName, lName, sex, salary)
Create a PL/SQL procedure object in your schema. Using a Cursor the procedure
code should be able to retrieve records of all employees and check in which city they
live. If an employee lives in Glasgow the procedure should insert a record in the
table GlasgowEmployees.
CREATE OR REPLACE PROCEDURE Glasgow_Members

IS
-- Declare a Cursor
CURSOR C1 IS
SELECT * FROM Employee;
BEGIN
FOR emp_rec IN C1 LOOP
IF (emp_rec.city = 'Glasgow') THEN
INSERT INTO GlasgowEmployees values (emp_rec.empNo,
emp_rec.fName, emp_rec.lName, emp_rec.sex, emp_rec.salary);
END IF;
END LOOP;
END;
(d) Invoke the procedure from SQL*Plus.
-- Invoke procedure from SQL*Plus

BEGIN
Glasgow_Members;
END;
(e) Assume that a single-row form is created on the Member table and you are using it to
enter data in the table. State the steps you would take to create a trigger that will fire
and insert the next card number in the member record when the record is saved.
Specify the type of trigger you will use and write the trigger code. Assume that a
sequence generator already exists.
• Double-click the ‘cardNo’ field in the Object Navigator. In the Property Palette window
that opens, change the property ‘Required’ under the Data section to ‘No’;
• Drag the field icon, in the Object Navigator, to the end of the attributes list;
• Right-click the table block icon in the Object Navigator;
• Select ‘PL/SQL Editor’, choose the PRE-INSERT trigger from the window that
opens, and click OK.
• In the window that opens (PL/SQL Editor window) type the following code:
SELECT Member_Seq.NEXTVAL
INTO :Member.cardNo
FROM SYS.dual;
8.11 (a) What is a SQL*Plus script? Why is it a good practice to create a log file while a
SQL*Plus script is executed, and how can the log file be created?
A SQL*Plus script is a collection of SQL*Plus statements stored in a file. A log file
records the actions of SQL*Plus while a script is executed, so that errors and output
can be checked afterwards. You can use the ‘spool’ command to create a log file.
(b) What can a ‘database trigger’ be used for? Explain concisely what the code below
does.
CREATE TRIGGER st_customer_trg BEFORE INSERT OR UPDATE ON st_customer

FOR EACH ROW
BEGIN
IF :old.customer_insert_user IS NULL THEN
:new.customer_insert_user := USER;
:new.customer_insert_date := SYSDATE;
:new.customer_update_user := NULL;
:new.customer_update_date := NULL;
ELSE
:new.customer_insert_user := :old.customer_insert_user;
:new.customer_insert_date := :old.customer_insert_date;
:new.customer_update_user := USER;
:new.customer_update_date := SYSDATE;
END IF;
END;
A database trigger is a stored procedure that Oracle invokes ("fires") automatically
when certain events occur, for example, when a DML operation modifies a certain
table. Triggers can be used to enforce business rules, prevent incorrect values from
being stored, and reduce the need to perform checking and cleanup operations in each
application.
The above trigger fires before each row is inserted into or updated in st_customer. If the
old value of customer_insert_user is NULL (as it always is on insert), the trigger records
the current USER and SYSDATE as the insert user and date and sets the update user and
date to NULL. Otherwise, it preserves the original insert user and date and records the
current USER and SYSDATE as the update user and date.
(c) Assume the following tables:
Order(orderNo, statusCode, customerNo,…)

Item(itemNo, orderNo, price, amount,…)
which capture, among others, the items that are ordered by customers. Explain what
the following code is and explain how it works by writing comments against each
line of the code:
FUNCTION total_sales (customerNo_in IN order.customerNo%TYPE,
statusCode_in IN order.statusCode%TYPE := NULL)
RETURN NUMBER
IS
statusCode_int order.statusCode%TYPE := UPPER(statusCode_in);
CURSOR salesCur (statusCode_in IN order.statusCode%TYPE) IS

SELECT SUM (price*amount)
FROM item
WHERE EXISTS (select ‘x’ FROM Order
WHERE order.orderNo = item.orderNo AND customerNo = customerNo_in
AND statusCode = statusCode_in);
returnValue NUMBER;
BEGIN
OPEN salesCur (statusCode_int);
FETCH salesCur INTO returnValue;
IF salesCur%notfound
THEN
CLOSE salesCur;
RETURN NULL;
ELSE
CLOSE salesCur;
RETURN returnValue;
END IF;
END total_sales;
The student should be able to explain the main points of the function: the structure, the
meaning and type of the parameters, the cursor, variables, the embedded SQL statement
and the main executable body.
Chapter 9 Object-Relational DBMSs
9.1 Compare and contrast the two manifestos: Object-Oriented Database System Manifesto based on
the object-oriented paradigm (Atkinson et al., 1989a) and the Third Generation Database System
Manifesto published by the Committee for Advanced DBMS Function (CADF).
See Appendix N.
9.2 Discuss how the new version of the SQL standard addresses object-oriented data management.
Give examples to illustrate your answers.
See Section 9.5.
9.3 The on-going debate between proponents of the relational data model and proponents of the
object-oriented data model (if one truly exists), resembles that between the proponents of
network/hierarchic systems and relational systems a couple of decades ago. However, another
system is evolving that may have a significant impact on what the database management system
of the future may be and that is the Object-Relational Database Management System (ORDBMS).
Give your definition of an ORDBMS. Compare and contrast the ORDBMS and the Object-
Oriented Database Management System (OODBMS).
There is no one definition of an ORDBMS. The key feature is to support object-oriented features
within the confines of a relational system.
Comparison should be based around main differences – ORDBMS will probably use SQL3,
giving object-oriented features, object identity, triggers, etc., although changes will be necessary
to storage mechanisms (e.g. Quad-tree, K-D-B-tree) with resulting changes in the query optimizer.
OODBMS will probably evolve with ODMG standard, closely linked to programming language.
9.4 Discuss how the proposed SQL:2011 standard will handle object identity and give an example of
its intended use.
Object identity in SQL:2011 is achieved through a referenceable base table. For example:
CREATE TABLE Person OF PERSON_TYPE (

REF IS oid SYSTEM GENERATED);
CREATE TYPE STAFF_TYPE UNDER PERSON_TYPE AS (

staffNo VARCHAR(5) NOT NULL UNIQUE,
nextOfKin REF(PERSON_TYPE)
REFERENCES ARE CHECKED ON DELETE
CASCADE,
branchNo VARCHAR(3) NOT NULL)
NOT FINAL;
CREATE TABLE Staff OF STAFF_TYPE (
PRIMARY KEY (staffNo));
Unfortunately, there is an added complication that the reference type REF(PERSON_TYPE)
could refer to a row in any table having rows of type PERSON_TYPE. For example, we may
create another table based on PERSON_TYPE as follows:
CREATE TABLE AnotherPersonTable OF PERSON_TYPE (

REF IS oid SYSTEM GENERATED);
To limit the references to a specific table, a SCOPE clause can be added to the Staff table:
CREATE TABLE Staff OF STAFF_TYPE (
PRIMARY KEY (staffNo),
nextOfKin SCOPE Person);
References can be used in path expressions that permit traversal of object references to
navigate from one row to another. To traverse a reference, the dereference operator (->) is
used.
9.5 Consider the following schema:
Pet (petNo, petName, petDescription, dateRegistered, picture, surgeryNo,

doctorStaffNo)
Staff (staffNo, sName, sAddress, position, surgeryNo)
Surgery (surgeryNo, surgeryAddress, surgeryTelNo)
where:
Pet contains details of pets and the pet number (petNo) is the key. The surgery where the
pet is registered is given by the surgery number (surgeryNo). A pet can only be registered
with one surgery at a time. The doctor who treats the pet is given by the doctorStaffNo.
Picture contains an image of the pet.
Staff contains details of staff and staff number (staffNo) is the key.
Surgery contains details of each surgery and the surgery number (surgeryNo) is the key.
Now consider the following SQL:2003 query:
SELECT petNo, staffNo

FROM Pet p, TABLE StaffDoctors s
WHERE p.doctorStaffNo = s.staffNo AND
p.brownShortHairedTerrier(picture) AND s.surgeryNo = ‘S3’ AND
p.dateRegistered < ‘1-Jan-90’;
The routine brownShortHairedTerrier() is an externally defined routine that searches the

specified image for certain characteristics. The StaffDoctors() function is fully defined within
SQL3 as:
CREATE FUNCTION StaffDoctors() RETURNS SET(Staff)

SELECT * FROM Staff WHERE position = ‘Doctor’;
Discuss how you would want a query optimiser within an ORDBMS to handle this type of query.
Use a relational algebra tree to illustrate your answer.
(i) The first expectation is that the query processor could flatten the SQL-defined function
StaffDoctors() into the main query.
(ii) Secondly, the normal heuristic of pushing the selection operations down past the join
would need to be modified in this case, as it would be better to run the external function,
which will be CPU-intensive, after the join operations on a smaller set of records.
(iii) Thirdly, the optimizer should be able to make use of indexes built on the results of
functions.
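The cost argument in point (ii) can be illustrated by counting calls to the expensive routine under two evaluation orders. The Python sketch below is a toy model, not an optimizer: the data, the plan function names, and the rule that stands in for the image search are all made up for illustration. Both plans return the same answer, but the plan that defers the external function calls it far fewer times.

```python
calls = 0


def brown_short_haired_terrier(picture):
    """Stand-in for the CPU-intensive external routine; counts its invocations."""
    global calls
    calls += 1
    return picture % 2 == 0  # hypothetical matching rule


# Hypothetical data: 100 pets and 5 staff, of whom two are doctors at surgery S3.
pets = [
    {"petNo": i, "doctorStaffNo": i % 5, "yearRegistered": 1985 + i % 10, "picture": i}
    for i in range(100)
]
staff = [
    {"staffNo": n,
     "position": "Doctor" if n < 4 else "Nurse",
     "surgeryNo": "S3" if n < 2 else "S1"}
    for n in range(5)
]


def staff_doctors():
    """Flattened form of StaffDoctors(): a plain selection on Staff."""
    return [s for s in staff if s["position"] == "Doctor"]


def plan_selection_first():
    """Usual heuristic: apply every selection on Pet before the join."""
    matched = [p for p in pets if brown_short_haired_terrier(p["picture"])]
    return sorted(
        (p["petNo"], s["staffNo"])
        for p in matched
        for s in staff_doctors()
        if p["doctorStaffNo"] == s["staffNo"]
        and s["surgeryNo"] == "S3"
        and p["yearRegistered"] < 1990
    )


def plan_function_last():
    """Modified heuristic: cheap selections and the join first, external function last."""
    doctors_s3 = [s for s in staff_doctors() if s["surgeryNo"] == "S3"]
    joined = [
        (p, s)
        for p in pets
        if p["yearRegistered"] < 1990
        for s in doctors_s3
        if p["doctorStaffNo"] == s["staffNo"]
    ]
    return sorted(
        (p["petNo"], s["staffNo"])
        for p, s in joined
        if brown_short_haired_terrier(p["picture"])
    )


calls = 0
result_a = plan_selection_first()
calls_a = calls

calls = 0
result_b = plan_function_last()
calls_b = calls

print(calls_a, calls_b, result_a == result_b)
```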
9.6 Given the following example of an object type in Oracle:
CREATE TYPE carType AS OBJECT (

regNo VARCHAR2 (8),
make VARCHAR2 (20),
model VARCHAR2 (20));
CREATE TABLE vehicles OF carType;
State the two ways:

(a) in which the ‘vehicles’ table can be viewed;
The table can be viewed as:

• a single-column table in which each row is a carType object, allowing users to
perform object-oriented operations;
• a multi-column table in which each attribute of the object type carType, namely
regNo, make and model, occupies a column, allowing users to perform relational
operations.
(b) to insert a record in the ‘vehicles’ table;
INSERT INTO vehicles VALUES (carType (‘T567SDF’, ‘Nissan’, ‘Primera’));

INSERT INTO vehicles VALUES (‘S123NPM’, ‘Ford’, ‘Escort’);
(c) to retrieve records of all Ford cars from the ‘vehicles’ table.
SELECT VALUE(v) FROM vehicles v WHERE v.make = ‘Ford’;

SELECT * FROM vehicles WHERE make = ‘Ford’;
9.7 In the Object Relational Model (ORM) an object-type has a name, attributes, and methods.
• What is a method? What is the principal use of methods?
A method is a procedure or function that is declared in the object-type definition to
implement behaviour that we want objects of that type to perform. The principal use
of methods is to provide access to an object's data.
• What kinds of methods are supported by Oracle ORM?
Three kinds of methods are supported by Oracle ORM: member methods (including map and
order methods), static methods, and constructor methods.
Part 3 Database Analysis and Design Techniques

Chapter 10 Database Planning, Design, and Administration
10.1 Explain the procedures and techniques needed to achieve a conceptual data model.
The answer should discuss the different stages that might be identified, such as requirements
analysis through identification of user views etc. The use of Entity–Relationship modeling and
normalization techniques should be explained, and the merging of views and validating the model
against transactions.
10.2 The application prototyping approach to software development gives users what they want by
employing principles used in other engineering disciplines, i.e. build a working model and use it.
Critically discuss the arguments for and against this approach showing how the software
development life cycle is consequently affected.
What are the necessary conditions for it to be successful and what are the dangers/problems that
could arise?
The arguments against prototyping tend to support the conventional approach to software
development, which maintains, for example, that all requirements can be specified, the static
model is correct, and people can communicate effectively. Many of the arguments supporting
prototyping are the opposite: people do not communicate effectively and do not necessarily
know what they want, so it is better to build a model and show it to them, thus
obtaining feedback. The prototype effectively captures the requirements specification, so it
fits into the software development lifecycle after initial requirements analysis and prior to any
system development. It relies on the use of appropriate software tools, requires a commitment to
the approach on the part of both developers and clients who must be prepared to be involved. The
application must be suitable and the prototype will go through several iterations. Problems include
the danger of sloppy work lacking any rigorous methodology, team boredom, prototype seen as
the end product.
10.3 Explain what is meant by application prototyping and why the development of Fourth Generation
software should support it.
In what way(s) does it help software development?
Many of the points made in 10.2 also apply to this question. The answer should include that
application prototyping involves building models, is an iterative process that should be carried out
quickly, relies on a lot of user involvement and intends to capture the requirements definition.
Consequently, fourth generation tools support it because they permit components such as screens
and reports to be quickly designed, generated, and altered if necessary. In considering how
software development is helped, some reference should be made to the more traditional approach
to software development.
10.4 Explain why planning is important in the database lifecycle, highlighting particular points that
are important during the process.
Discuss the major functions of the Database Administration role, in particular indicating the
skills most relevant to the role.
The planning process should be briefly discussed, such as looking at the overall objectives, what
the current situation is like especially with regard to any particular problems that might be
prevalent. It is possible that a generalized data model for the whole enterprise may be produced,
which may then be subdivided into different functional areas giving priorities for development.
The major functions of database administration concern physical database design,

implementation, testing, data conversion and loading, and operational maintenance. Database
administration is also responsible for DBMS evaluation, although this would not be carried out on
a regular basis. These functions should be described in more detail, and include the skills required
– obviously technically oriented, although other skills will be needed for the minor functions.
10.5 Discuss the importance of planning in the database lifecycle, detailing the significant stages in
the process.
Once the design phase of the database lifecycle has been initiated, the objective is to achieve a
conceptual database schema design.
Explain the procedures necessary in order to achieve this objective.
The points made in the first part of 10.4 apply here also. In planning, current and future needs are
determined across the organization. Management commitment is crucial. Specific goals and
objectives are defined, business functions are defined, and an information model and an
implementation plan may be developed. The plan may be revised as the current situation changes.
Overall costs and benefits are assessed.
In commencing design, we are starting data modeling. The process of data modeling should be
explained including the use of various techniques to aid this process. The answer may include
reference to top-down and bottom-up approaches.
Chapters 12–13 (Enhanced) Entity–Relationship Modeling
12.1 Represent each of the following requirements with an ER diagram:
(a) A company called Perfect Pets runs a number of clinics. A clinic has many staff and a
member of staff manages at most one clinic (not all staff manage clinics). Each clinic
has a unique clinic number (clinicNo) and each member of staff has a unique staff
number (staffNo).
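ER diagram (multiplicity shown beside each entity):
Clinic (clinicNo) Has Staff (staffNo): Clinic 1..1, Staff 1..*
Staff (staffNo) Manages Clinic (clinicNo): Staff 1..1, Clinic 0..1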
(b) When a pet owner contacts a clinic, the owner’s pet is registered with the clinic. An
owner can own one or more pets, but a pet can only register with one clinic. Each owner
has a unique owner number (ownerNo) and each pet has a unique pet number
(petNo).
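ER diagram (multiplicity shown beside each entity):
PetOwner (ownerNo) IsContactedBy Clinic (clinicNo): PetOwner 1..*, Clinic 1..1
PetOwner (ownerNo) Owns Pet (petNo): PetOwner 1..1, Pet 1..*
Clinic (clinicNo) Registers Pet (petNo): Clinic 1..1, Pet 1..*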
(c) When the pet comes along to the clinic, it undergoes an examination by a member of the
consulting staff. The examination may result in the pet being prescribed with one or
more treatments. Each examination has a unique examination number (examNo) and
each type of treatment has a unique treatment number (treatNo).
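ER diagram (multiplicity shown beside each entity):
Pet (petNo) Undergoes Examination (examNo): Pet 1..1, Examination 1..*
Staff (staffNo) Performs Examination (examNo): Staff 1..1, Examination 0..*
Examination (examNo) ResultsIn PetTreatment: Examination 1..1, PetTreatment 1..*
Treatment (treatNo) UsedIn PetTreatment: Treatment 1..1, PetTreatment 1..*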

(d) Represent the complete set of requirements in one ER diagram.
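ER diagram combining parts (a) to (c) (multiplicity shown beside each entity):
Clinic (clinicNo) Has Staff (staffNo): Clinic 1..1, Staff 1..*
Staff (staffNo) Manages Clinic (clinicNo): Staff 1..1, Clinic 0..1
PetOwner (ownerNo) IsContactedBy Clinic (clinicNo): PetOwner 1..*, Clinic 1..1
PetOwner (ownerNo) Owns Pet (petNo): PetOwner 1..1, Pet 1..*
Clinic (clinicNo) Registers Pet (petNo): Clinic 1..1, Pet 1..*
Pet (petNo) Undergoes Examination (examNo): Pet 1..1, Examination 1..*
Staff (staffNo) Performs Examination (examNo): Staff 1..1, Examination 0..*
Examination (examNo) ResultsIn PetTreatment: Examination 1..1, PetTreatment 1..*
Treatment (treatNo) UsedIn PetTreatment: Treatment 1..1, PetTreatment 1..*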
12.2 Represent each of the following requirements with an ER diagram:
A regional council requires the design of a database system that can provide information on
all schools in the region. The requirements collection and analysis phase of the database
design process has provided the following data requirements for the schools database system.
(a) Every school has many pupils and many teachers. Each pupil is assigned to one school
and each teacher works for one school only.
(b) Each teacher teaches more than one subject but a subject may be taught by more than
one teacher. The database should store the number of hours a teacher spent teaching a
subject. Data held on each teacher includes his/her National Insurance Number (NIN),
name (first and last), sex, and qualifications. The data held on each subject includes
subject title and type.
(c) Each pupil can study more than one subject and a subject may be studied by more than
one pupil. Data held on each pupil includes the pupil's code, name (first and last), sex,
and date of birth.
(d) Each school is managed by one of its teachers. The database should keep track of the
date he/she started managing the school. Data stored on each school includes the
school's code, name, address (town, street, and post code) and phone.
The complete diagram is shown below.

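ER diagram (multiplicity shown beside each entity):
Pupil (pCode) Study Subject (title): Pupil 0..*, Subject 1..*
Pupil (pCode) Joins School (sCode): Pupil 1..*, School 1..1
Teacher (nin) Teaches Subject (title), with attribute hours: Teacher 0..*, Subject 1..*
School (sCode) Employs Teacher (nin): School 1..1, Teacher 1..*
Teacher (nin) Manages School (sCode), with attribute date: Teacher 1..1, School 0..1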
Reliable Rentals Case Study
The requirements collection and analysis phase of the database design process has provided
the following data requirements for a company called Reliable Rentals, which rents out
vehicles (cars and vans). The Company has various outlets (garage/offices) throughout
Glasgow. Each outlet has a number, address, phone number, fax number, and a manager who
supervises the operation of the garage and offices at each site.
Each site is allocated a stock of vehicles for hire; however, individual vehicles may be moved
between outlets, as required. Only the current location for each vehicle is stored. The
registration number uniquely identifies each vehicle for hire and is used when hiring a vehicle
to a client.
Clients may hire vehicles for various periods of time (minimum 1 day to maximum 1 year).
Each individual hire agreement between a client and the Company is uniquely identified using
a hire number. Information stored on the vehicles for hire includes: the vehicle registration
number, model, make, engine size, capacity, current mileage, date MOT due, daily hire rate,
and the current location (outlet) of each vehicle.
The data stored on a hire agreement includes the hire number, the client’s number, name,
address and phone number, date the client started the hire period, date the client wishes to
terminate the hire period, the vehicle registration number, model and make, the mileage before
and after the hire period. After each hire a member of staff checks the vehicle and notes any
fault(s). Fault report information on each vehicle is stored, which records the name of the
member of staff responsible for the check, date checked, whether fault(s) were found (yes or
no), the vehicle registration number, model, make and the current mileage.
The Company has two types of clients: personal and business. The data stored on personal
clients includes the client number, name (first and last name), home address, phone number,
date of birth and driving licence number. The data stored on business clients includes the
client number, name of business, type of business, address, telephone and fax numbers. The
client number uniquely identifies each client and the information stored relates to all clients
who have hired in the past and those currently hiring a vehicle.
Information is stored on the staff based at various outlets including: staff number, name (first
and last name), home address, home phone number, date of birth (DOB), sex, National
Insurance Number (NIN), date joined the Company, job title and salary. Each staff member is
associated with a single outlet but may be moved to an alternative outlet as required, although
only the current location for each member of staff is stored.
12.3 Create a conceptual schema for Reliable Rentals using the concepts of the Enhanced Entity–
Relationship (EER) model. To simplify the diagram, only show entities, relationships and the
primary key attributes. Specify the cardinality ratio and participation constraint of each
relationship type. State any assumptions you make when creating the EER model (if
necessary).
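EER diagram (multiplicity shown beside each entity):
Staff (staffNo) Raises FaultReport: Staff 1..1, FaultReport 0..*
Vehicle (regNo) Generates FaultReport: Vehicle 1..1, FaultReport 1..*
Outlet (outletNo) Has Staff (staffNo): Outlet 1..1, Staff 1..*
Outlet (outletNo) Rents Vehicle (regNo): Outlet 1..1, Vehicle 1..*
Vehicle (regNo) HiredBy HireAgreement (hireNo): Vehicle 1..1, HireAgreement 1..*
Client (clientNo) Hires HireAgreement (hireNo): Client 1..1, HireAgreement 1..*
Client (clientNo) is a superclass with subclasses PersonalClient and BusinessClient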
Chapters 14 - 15 Normalization
14.1 Explain the purpose of data normalization and describe the main steps in the normalization
process.
Normalization is the process of organizing data so that data inconsistency is avoided and update
anomalies are prevented. The main steps include: 1NF to remove repeating groups; 2NF to remove
partial dependencies; 3NF to remove transitive dependencies; and BCNF to remove remaining
anomalies from dependencies.
14.2 The table shown below displays the details of the roles played by actors/actresses in films.
filmNo  fTitle      dirNo  director    actorNo  aName            role            timeOnScreen
F1100   Happy Days  D101   Jim Alan    A1020    Sheila Toner     Jean Simson     15.45
                    D101   Jim Alan    A1222    Peter Watt       Tom Kinder      25.38
                    D101   Jim Alan    A1020    Sheila Toner     Silvia Simpson  22.56
F1109   Snake Bite  D076   Sue Ramsay  A1567    Steven McDonald  Tim Rosey       19.56
                    D076   Sue Ramsay  A1222    Peter Watt       Archie Bold     10.44
(a) Describe why the table shown above is not in first normal form (1NF).
The table contains a repeating group; the details of actors/actresses are repeated for each
film. As a consequence, there are multiple values at the intersection of certain rows and
columns.
(b) The table shown above is susceptible to update anomalies. Provide examples of how
insertion, deletion, and modification anomalies could occur on this table.
Using the data in the table, the student should provide examples of how insertion,
deletion, and update anomalies could occur.
(c) Identify the functional dependencies represented by the table shown above. State any
assumptions you make about the data shown in this table (if necessary).
The functional dependencies (fd1 to fd4) are shown in the figure below. The student
should state any assumptions made about the data shown in the table. For example, we
may assume that a film has only one director.
1NF
(filmNo, actorNo, role, timeOnScreen, fTitle, dirNo, director, aName), annotated with fd1 to fd4
(d) Using the functional dependencies identified in part (c), describe and illustrate the
process of normalization by converting the table shown in Figure 1 to Boyce–Codd
Normal Form (BCNF). Identify the primary and foreign keys in your BCNF relations.
The process of normalization is shown in the figure below.
fd2 and fd3 violate 2NF
2NF
(filmNo, actorNo, role, timeOnScreen)
(filmNo, fTitle, dirNo, director)
(actorNo, aName)
fd4 violates 3NF
3NF / BCNF
(filmNo, actorNo, role, timeOnScreen)   fd1; Foreign Keys filmNo, actorNo
(filmNo, fTitle, dirNo)   fd2; Primary Key filmNo; Foreign Key dirNo
(actorNo, aName)   fd3; Primary Key actorNo
(dirNo, director)   fd4; Primary Key dirNo
(e) Sketch an Entity–Relationship model for the data shown in table above.
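ER diagram (multiplicity shown beside each entity):
Film (filmNo {PK}, fTitle) Contains FilmRole (role, timeOnScreen): Film 1..1, FilmRole 1..*
FilmRole (role, timeOnScreen) Has Actor (actorNo {PK}, aName): FilmRole 1..*, Actor 0..1
Film (filmNo {PK}, fTitle) Makes Director (dirNo {PK}, director): Film 1..*, Director 1..1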
14.3 Briefly describe how the techniques of normalization and Entity–Relationship modeling can
be used to produce a set of relations with desirable properties.
Normalization and Entity–Relationship modeling are complementary techniques that take a
bottom-up and top-down approach to database design, respectively. Both techniques enhance
the designer's understanding of the data and facilitate the design of appropriately structured
relations.
14.4 Describe the purpose of normalizing data and identify the four most commonly used normal
forms.
The normalization process, as first proposed by Codd (1972a), takes a relational schema through a
series of tests to assess whether or not it belongs to a certain normal form. Normalizing data is the
process during which unsatisfactory relational schemas are decomposed by breaking up their
attributes into smaller relational schemas that possess desirable properties.
Codd proposed three normal forms called first, second, and third normal forms. A stronger
definition of the third normal form was proposed later by Boyce and Codd and is known as
Boyce–Codd normal form (BCNF). All of these normal forms are based on the functional
dependencies among the attributes of a relation.
Discuss how normal forms support a database designer.
Normal forms provide a formal framework for analyzing relation schemas based on their keys and
the functional dependencies among their attributes. They also provide a series of tests that can be
carried out on individual relation schemas so that the relational database can be normalized to
any degree.
14.5 The table below lists customer/car hire data. Each customer may hire cars from various
outlets throughout Glasgow. A car is registered at a particular outlet and can be hired out to a
customer on a given date.
carReg    make    model   custNo  custName  hireDate  outletNo  outletLoc
M565 0GD  Ford    Escort  C100    Smith, J  14/5/98   01        Bearsden
M565 0GD  Ford    Escort  C201    Hen, P    15/5/98   01        Bearsden
N734 TPR  Nissan  Sunny   C100    Smith, J  16/5/98   01        Bearsden
M134 BRP  Ford    Escort  C313    Blatt, O  14/5/98   02        Kelvinbridge
M134 BRP  Ford    Escort  C100    Smith, J  20/5/98   02        Kelvinbridge
M611 0PQ  Nissan  Sunny   C295    Pen, T    20/5/98   02        Kelvinbridge
(a) The data in the table is susceptible to update anomalies. Provide examples of how
insertion, deletion, and modification anomalies could occur on this table.
Using the data shown in the table above, the student should provide examples of how
insertion, deletion, and update anomalies could occur.
(b) Identify the functional dependencies represented by the data shown in the table. State any
assumptions you make about the data.
For answer see figure below.
(c) Using the functional dependencies identified in part (b), describe and illustrate the process of
normalization by converting Table 1 to Third Normal Form (3NF) relations. Identify the
primary and foreign keys in your 3NF relations.

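1NF
(carReg, hireDate, make, model, custNo, custName, outletNo, outletLoc), annotated with fd1 to fd5
Primary Key carReg, hireDate
fd2 violates 2NF
2NF
(carReg, make, model, outletNo, outletLoc)
(carReg, hireDate, custNo, custName)
fd3, fd4, and fd5 violate 3NF
3NF / BCNF
(carReg, hireDate, custNo)   fd1'; Primary Key carReg, hireDate; Foreign Keys carReg, custNo
(model, make)   fd3; Primary Key model
(carReg, model, outletNo)   fd2'; Primary Key carReg; Foreign Keys model, outletNo
(custNo, custName)   fd4; Primary Key custNo
(outletNo, outletLoc)   fd5; Primary Key outletNo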
(d) Sketch an Entity–Relationship model for the data shown in Table 1. Show all the entities,
relationships, and attributes.
Either of the following ER diagrams would be correct.

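First ER diagram (multiplicity shown beside each entity):
Customer (custNo {PK}, custName) Hires Car (carRegNo {PK}, model), with attribute hireDate:
Customer 1..*, Car 1..*
Car (carRegNo {PK}, model) Has Outlet (outletNo {PK}, outletLoc): Car 1..*, Outlet 1..1
Car (carRegNo {PK}, model) Makes Manufacturer (model {PK}, make): Car 1..*, Manufacturer 1..1
Second ER diagram (multiplicity shown beside each entity):
Hire (hireDate) UsedIn Car (carRegNo {PK}, model): Hire 1..*, Car 1..1
Hire (hireDate) Hires Customer (custNo {PK}, custName): Hire 1..*, Customer 1..1
Car (carRegNo {PK}, model) Has Outlet (outletNo {PK}, outletLoc): Car 1..*, Outlet 1..1
Car (carRegNo {PK}, model) Makes Manufacturer (model {PK}, make): Car 1..*, Manufacturer 1..1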
14.6 Examine the table shown below. This table represents the hours worked per week for
temporary staff at each branch of a company.
staffNo  branchNo  branchAddress                         name           position   hoursPerWeek
S4555    B002      City Center Plaza, Seattle, WA 98122  Ellen Layman   Assistant  16
S4555    B004      16 – 14th Avenue, Seattle, WA 98128   Ellen Layman   Assistant  9
S4612    B002      City Center Plaza, Seattle, WA 98122  Dave Sinclair  Assistant  14
S4612    B004      16 – 14th Avenue, Seattle, WA 98128   Dave Sinclair  Assistant  10
(a) The table shown above is susceptible to update anomalies. Provide examples of how
insertion, deletion, and modification anomalies could occur on this table.
Using the data shown in this table, the student should provide examples of how insertion,
deletion, and update anomalies could occur.
(b) Identify the functional dependencies represented by the data shown in the table. State
any assumptions you make about the data (if necessary).
(staffNo, branchNo, branchAddress, name, position, hoursPerWeek), annotated with the functional
dependencies; the composite primary key is (staffNo, branchNo)
(c) Using the functional dependencies identified in part (b), describe and illustrate the
process of normalization by converting Table 1 to Third Normal Form (3NF)
relations. Identify the primary and foreign keys in your 3NF relations.
The primary key for the table is (staffNo, branchNo).
The student should identify the primary key for the table and describe why the table is in 1NF.
First Normal Form (1NF) is a relation in which the intersection of each row and column
contains one and only one value.
The student should indicate why the table is not in 2NF. Second Normal Form (2NF) is a relation
that is in first normal form and in which every non-primary-key attribute is fully functionally
dependent on the primary key.
Converting to 2NF
The original table has the composite primary key (staffNo, branchNo):
staffNo  branchNo  branchAddress  name  position  hoursPerWeek
Remove the branchAddress column to a new table, along with a copy of branchNo, which becomes
the primary key of the new table:
branchNo  branchAddress
B002      City Center Plaza, Seattle, WA 98122
B004      16 – 14th Avenue, Seattle, WA 98128
Remove the name and position columns to a new table, along with a copy of staffNo, which
becomes the primary key of the new table:
staffNo  name           position
S4555    Ellen Layman   Assistant
S4612    Dave Sinclair  Assistant
The remaining table keeps the composite primary key (staffNo, branchNo); staffNo and branchNo
each become a foreign key:
staffNo  branchNo  hoursPerWeek
S4555    B002      16
S4555    B004      9
S4612    B002      14
S4612    B004      10
The student should then describe the process of normalizing the table to 2NF by removing the
partial dependencies.
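The removal of partial dependencies can be illustrated with a short Python sketch. It projects the sample rows onto three smaller tables and then joins them back to confirm that no information is lost. The helper `project` and the table names `branch`, `staff`, and `works` are illustrative assumptions, not names used in the guide.

```python
# Sample rows from the temporary-staff table in question 14.6.
COLUMNS = ("staffNo", "branchNo", "branchAddress", "name", "position", "hoursPerWeek")
ROWS = [
    ("S4555", "B002", "City Center Plaza, Seattle, WA 98122", "Ellen Layman", "Assistant", 16),
    ("S4555", "B004", "16 – 14th Avenue, Seattle, WA 98128", "Ellen Layman", "Assistant", 9),
    ("S4612", "B002", "City Center Plaza, Seattle, WA 98122", "Dave Sinclair", "Assistant", 14),
    ("S4612", "B004", "16 – 14th Avenue, Seattle, WA 98128", "Dave Sinclair", "Assistant", 10),
]


def project(rows, columns, wanted):
    """Relational projection: keep the wanted columns and drop duplicate rows."""
    positions = [columns.index(name) for name in wanted]
    return sorted({tuple(row[i] for i in positions) for row in rows})


# Each partial dependency moves to its own table (hypothetical table names).
branch = project(ROWS, COLUMNS, ("branchNo", "branchAddress"))
staff = project(ROWS, COLUMNS, ("staffNo", "name", "position"))
works = project(ROWS, COLUMNS, ("staffNo", "branchNo", "hoursPerWeek"))

# Join the three tables back together on the foreign keys.
address_of = dict(branch)
details_of = {staff_no: (name, position) for staff_no, name, position in staff}
rejoined = sorted(
    (staff_no, branch_no, address_of[branch_no], *details_of[staff_no], hours)
    for staff_no, branch_no, hours in works
)

print(len(branch), len(staff), len(works))
print(rejoined == sorted(ROWS))
```

Each branch address and each staff name is now stored once, and the join reproduces the original four rows, which is what a lossless decomposition should do.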
The student should say why the table is not in 3NF.
Third Normal Form (3NF) is a relation that is in first and second normal form in which no
non-primary-key attribute is transitively dependent on the primary key.
The student should provide a diagrammatic illustration of the process of normalizing the table
from 1NF to 3NF.
The student should present the 3NF tables displaying the primary key, foreign key(s) and
alternate key(s) for each table.
(d) Create an Entity–Relationship (ER) model using the Unified Modeling Language
(UML) to represent the data shown in Figure 1. Your ER model should show all
entities, relationships, and attributes.
ER diagram (multiplicity shown beside each entity):
Staff (staffNo {PK}, name, position) Has Branch (branchNo {PK}, branchAddress), with attribute
hoursPerWeek: Staff 1..*, Branch 1..*
date/time instructorID iFName iLName clientID cFName cLName cAddress
25/07/00.10.00 I456 Jane Anne Way 111 Storrie Road, Paisley
29/07/00.10.00 I456 Jane Anne Way 111 Storrie Road, Paisley
30/07/00.11.00 I344 Tom Anne Way 111 Storrie Road, Paisley
2/08/00.13.00 I666 Karen Mark Fields 120 Lady Lane, Paisley
2/08/00.13.00 I957 Steven John Brown 13 Renfrew Road, Paisley
25/08/00.10.00 I344 Tom Karen Worth 34 High Street, Paisley
Identify the functional dependencies represented by the data shown in the table. State
any assumptions you make about the data.
Using the functional dependencies identified, describe and illustrate the
process of normalization by converting Table 1 to Third Normal Form (3NF) relations.
Identify the primary and foreign keys in your 3NF relations.
[Figure: the table above annotated with functional dependencies fd1 to fd5.]
The primary key of the original relation is appointment number (appNo) and the relation is
already in 2NF. The original relation also has two alternate keys: (instructorID, dateTime) and
(clientID, dateTime). The functional dependencies fd2 to fd5 represent transitive dependencies
in the original relation and must be removed to create the following relations in 3NF.
Appointment (appNo, dateTime, instructorID, clientID)
Primary Key appNo
Alternate Key dateTime, instructorID
Alternate Key dateTime, clientID
Foreign Key instructorID references Instructor (instructorID)
Foreign Key clientID references Client (clientID)
Instructor (instructorID, iFName, iLName)
Primary Key instructorID
Client (clientID, cFName, cLName, cAddress)
Primary Key clientID
jobID pickupDateTime driverID dFName dLName clientID cFName cLName cAddress
25/07/00.10.00 Jane Anne 111 Storrie Road, Paisley
30/07/00.11.00 Tom Anne 111 Storrie Road, Paisley
2/08/00.13.00 Karen Mark 120 Lady Lane, Paisley
2/08/00.13.00 Steven John 13 Renfrew Road, Paisley
25/08/00.10.00 Tom Karen Worth 34 High Street, Paisley
Identify the functional dependencies represented by the data shown in the table. State
any assumptions you make about the data.
Using the functional dependencies identified, describe and illustrate the
process of normalization by converting Table 1 to Third Normal Form (3NF) relations.
Identify the primary and foreign keys in your 3NF relations.
[Figure: the table above annotated with functional dependencies fd1 to fd3.]
Students should identify that jobID is the PK for the table and then show the three functional
dependencies. Students may split the pickupDateTime column into separate Date and Time
columns. This is acceptable but not necessary.
The original table is in 2NF and is transformed into 3NF by removing functional dependencies
fd2 and fd3. The structure of the three tables created is shown below.
(jobID, pickupDateTime, driverID, clientID)   fd1; Primary Key jobID; Foreign Keys driverID, clientID
(driverID, dFName, dLName)   fd2; Primary Key driverID
(clientID, cFName, cLName, cAddress)   fd3; Primary Key clientID
14.9 The table below provides some sample data for an agency called Hotel Services, which
supplies part-time/temporary staff to hotels within Strathclyde region. The relation in Figure 2
lists the number of hours worked by each staff member at various hotels. The relation is in first
normal form (1NF). Assuming that a contract is for one hotel only but a staff member may work
in more than one hotel on different contracts, identify the functional dependencies represented
by the data in this relation.
Contracts
NIN   contractNo  hours  eName      hNo  hLoc
1135  C1024       16     Smith, J.  H25  East Kilbride
1057  C1025       16     Green, D.  H4   Glasgow
1068  C1024       28     Green, D.  H25  East Kilbride
1135  C1025       16     Smith, J.  H4   Glasgow
1057  C1026       25     Green, D.  H15  Glasgow
1088  C1027       25     Crowe, M.  H25  East Kilbride
Functional dependencies
(NIN, contractNo) → {hours, eName, hNo, hLoc}
(NIN, hNo) → {contractNo, hours, eName, hLoc}
contractNo → {hNo, hLoc}
hNo → hLoc
NIN → eName
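A candidate functional dependency can be tested mechanically against the sample rows. The following Python sketch assumes a hypothetical helper named `holds`; it reports whether any two rows agree on the left-hand side but differ on the right-hand side. Sample data can only refute a dependency, so a result of True means "not contradicted by these rows" rather than "guaranteed by the business rules".

```python
COLUMNS = ("NIN", "contractNo", "hours", "eName", "hNo", "hLoc")
ROWS = [
    ("1135", "C1024", 16, "Smith, J.", "H25", "East Kilbride"),
    ("1057", "C1025", 16, "Green, D.", "H4", "Glasgow"),
    ("1068", "C1024", 28, "Green, D.", "H25", "East Kilbride"),
    ("1135", "C1025", 16, "Smith, J.", "H4", "Glasgow"),
    ("1057", "C1026", 25, "Green, D.", "H15", "Glasgow"),
    ("1088", "C1027", 25, "Crowe, M.", "H25", "East Kilbride"),
]


def holds(rows, columns, lhs, rhs):
    """Return True if no pair of rows contradicts the dependency lhs -> rhs."""
    position = {name: i for i, name in enumerate(columns)}
    seen = {}
    for row in rows:
        determinant = tuple(row[position[a]] for a in lhs)
        dependent = tuple(row[position[a]] for a in rhs)
        # The first row seen for a determinant fixes the value every later row must match.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


print(holds(ROWS, COLUMNS, ["contractNo"], ["hNo", "hLoc"]))  # listed above
print(holds(ROWS, COLUMNS, ["hNo"], ["hLoc"]))                # listed above
print(holds(ROWS, COLUMNS, ["NIN"], ["eName"]))               # listed above
print(holds(ROWS, COLUMNS, ["NIN"], ["hours"]))               # not listed: 1057 has 16 and 25
print(holds(ROWS, COLUMNS, ["hNo"], ["contractNo"]))          # not listed: H25 has two contracts
```

The first three calls report that the listed dependencies are consistent with the data, while the last two are contradicted by it.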
14.10 Given the following relation schema and its functional dependencies:
[Figure: relation RR (A, B, C, D, E, F, G, H) annotated with functional dependencies fd1 to fd5.]
(a) Specify candidate keys and state the primary key.
Candidate keys: (A, F), (A, D)

Primary key: (A, F)
(b) Assuming that the relation is in first normal form (1NF), describe and illustrate the
process of normalising the relation schema to second (2NF) and third (3NF) normal
forms. Identify the primary and foreign keys in your schemas.
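fd2 violates the 2NF definition. Decompose.
RR1 (A, B, C, G, H)   fd2, fd5; Primary Key A
RR2 (A, F, D, E)   fd1, fd3, fd4; Primary Key A, F; Foreign Key A
fd4 and fd5 violate the 3NF definition. Decompose.
RR1 (A, B, C, G)   fd2; Primary Key A; Foreign Key G
RR11 (G, H)   fd5; Primary Key G
RR2 (A, F, D)   fd1, fd3; Primary Key A, F; Foreign Key D
A new relation (D, E)   fd4; Primary Key D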
14.11 The table below represents data about employees of a company and the projects they work on.
An employee may work on one or more projects for a certain number of hours.
EmpProjects
projName   empNo  projBudget  hours  projManager  managerDOB
Hardware   E1     20000       20     Smith        Jan 63
AI         E2     17000       42     Adams        March 52
Satellite  E2     84000       42     Smith        Jan 63
AI         E3     17000       17     Adams        March 52
Software   E4     84000       41     Adams        March 52
Cancer     E5     66000       42     Jones        Jan 63
Assuming that the functional dependencies in the relation in Figure 2 will hold for any
additional data, which of the following functional dependencies are true and which are false?
Justify your answer.
i) projName → empNo: false; more than one empNo for a projName
ii) projName → projBudget: true; one projBudget for every projName
iii) projBudget → projName: false; more than one projName for one projBudget
iv) projName → hours: false; more than one value of hours for one projName
v) projName → projManager: true; one projManager for every projName
vi) projManager → projName: false; more than one projName for one projManager
vii) empNo → hours: false; more than one value of hours for one empNo
viii) (projName, empNo) → hours: true; one value of hours for the compound value of
(projName, empNo)
ix) managerDOB → projManager: false; more than one projManager for one
managerDOB
14.12 Given the following relational schema and its functional dependencies:
[Figure: relation RentalInfo (custNo, propertyNo, custName, pAddress, rentStart, ownerNo, OName)
annotated with functional dependencies fd1 to fd5.]
(a) Specify candidate keys and state the primary key.
Candidate keys: (custNo, propertyNo), (custNo, rentStart)

Primary key: any of the above set of attributes; e.g. (custNo, propertyNo)
(b) Assuming that the relation is in first normal form (1NF), describe and illustrate the
process of normalising the relational schema to second (2NF) and third (3NF) normal
forms. Identify the primary and foreign keys in your third normal forms.
2NF: fd3 and fd4 violate the definition of 2NF. Decompose relation.
(custNo, custName)   fd4
(propertyNo, pAddress, ownerNo, OName)   fd3, fd5
(custNo, propertyNo, rentStart)   fd1, fd2
3NF: fd5 violates the 3NF definition. Decompose relation.
(custNo, custName)   fd4; Primary Key custNo
(propertyNo, pAddress, ownerNo)   fd3; Primary Key propertyNo; Foreign Key ownerNo
(custNo, propertyNo, rentStart)   fd1, fd2; Primary Key custNo, propertyNo;
Foreign Keys custNo, propertyNo
(ownerNo, OName)   fd5; Primary Key ownerNo
14.13(a) Give a set of functional dependencies for the relational schema R(A, B, C, D) with
primary key AB under which R is in 1NF but not in 2NF.
Any (non-trivial) dependency X → ρ, with X = A or X = B and ρ not equal to A or B will
violate 2NF.
(b) Give a set of functional dependencies for the relational schema R(A, B, C, D) with
primary key AB under which R is in 2NF but not in 3NF.
Any (non-trivial) dependency X → ρ, with X not a proper subset of {A, B} and ρ not
equal to A or B will violate 3NF but not 2NF.
(c) Consider a relational schema R(A, B, C) with a functional dependency B → C. If A is a
candidate key of R, could R be in BCNF and, if so, under what conditions?
The only way R could be in BCNF is if B is a key of R.
14.14 Consider a relational schema R(A, B, C, D, E) with the following functional
dependencies: A → B, BC → E, and ED → A.
(a) List all keys of R.
(b) Is R in 3NF?
(c) Is R in BCNF?
(a) The keys are: ACD, BCD, and CDE.
(b) R is in 3NF because B, E, and A are all parts of keys.
(c) R is not in BCNF because none of A, BC, and ED contain a key.
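The answers to 14.14 can be checked with the standard attribute-closure algorithm. The Python sketch below is one possible way to do this; the function names `closure`, `candidate_keys`, and `is_superkey` are illustrative choices, and attributes are assumed to be single letters so that a schema can be written as a string.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return ["".join(key) for key in keys]


SCHEMA = "ABCDE"
FDS = [("A", "B"), ("BC", "E"), ("ED", "A")]

keys = candidate_keys(SCHEMA, FDS)
prime = set("".join(keys))  # attributes that appear in at least one key


def is_superkey(attrs):
    return closure(attrs, FDS) == set(SCHEMA)


# BCNF: every determinant is a superkey.
in_bcnf = all(is_superkey(lhs) for lhs, rhs in FDS)
# 3NF: every determinant is a superkey, or the dependent attributes are all prime.
in_3nf = all(is_superkey(lhs) or set(rhs) <= prime for lhs, rhs in FDS)

print(keys)     # ['ACD', 'BCD', 'CDE']
print(in_3nf)   # True
print(in_bcnf)  # False
```

Because the search tries every subset of the schema, it is only practical for small exam-sized relations.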
14.15 Consider the following relational schemas and functional dependencies. Assume that the
relations have been produced from a relation ABCDEFGHI and that all known
dependencies for this relation are listed. State the strongest normal form for each one
and, if appropriate, decompose to BCNF.
(a) R(A, B, C, D, E); A → B, C → D
(b) S(A, B, F); AC → E, B → F
(c) T(A, D); D → G, G → H
(d) U(C, D, H, G); A → I, I → A
(e) V(A, C, E, I)
(a) 1NF only (not in 2NF, so not in 3NF). Decompose to: AB, CD, and ACE.
(b) 1NF only (not in 2NF, so not in 3NF). Decompose to: AB and BF.
(c) – (e) The last three are in BCNF.
14.16 Consider the relational schema R(A, B, C, D). For each of the following functional
dependencies identify the candidate key(s) for R and state its strongest normal form. If
appropriate, decompose to BCNF.
(a) B → C, C → A, C → D
(b) B → C, D → A
(c) ABC → D, D → A
(d) A → B, A → C, BC → D
(e) AB → C, AB → D, C → A, D → B
(a) Candidate key is B. Relation is in 2NF but not 3NF (C → A, C → D violate BCNF).
Decompose to: AC, BC, and CD.
(b) Candidate key is BD. Relation is in 1NF but not 2NF (both FDs violate BCNF).
Decompose to: AD, BC, and BD.
(c) Candidate keys are ABC and BCD. Relation is in 3NF but not BCNF (D → A and D
is not a key). However, if we split R into AD and BCD we cannot preserve the first
dependency, so there is no dependency-preserving BCNF decomposition.
(d) Candidate key is A. Relation is in 2NF but not 3NF (because of FD BC → D). This
FD also violates BCNF since BC does not contain a key, so decompose to: BCD and
ABC.
(e) Candidate keys are AB, BC, CD, and AD. Relation is in 3NF but not BCNF. First
decompose to: AC, BCD; this does not preserve AB → C, AB → D, and BCD is still
not in BCNF because of FD D → B. Decompose further to: AC, BD, CD, ABC, and
ABD.
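The statements in parts (c) and (e) about dependencies that are not preserved can be checked with the usual dependency-preservation test, which repeatedly takes closures restricted to each fragment of the decomposition. The Python sketch below is a minimal illustration; the names `preserved` and `lost` are assumptions made for this example, and attributes are again single letters.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserved(fd, fragments, fds):
    """True if fd can still be enforced using only the given fragments."""
    lhs, rhs = fd
    reached = set(lhs)
    changed = True
    while changed:
        changed = False
        for fragment in fragments:
            attrs = set(fragment)
            # Only attributes inside the fragment may be used and gained here.
            gained = closure(reached & attrs, fds) & attrs
            if not gained <= reached:
                reached |= gained
                changed = True
    return set(rhs) <= reached


def lost(fragments, fds):
    """The dependencies in fds that the decomposition does not preserve."""
    return [f"{lhs} -> {rhs}" for lhs, rhs in fds
            if not preserved((lhs, rhs), fragments, fds)]


PART_C = [("ABC", "D"), ("D", "A")]
PART_D = [("A", "B"), ("A", "C"), ("BC", "D")]
PART_E = [("AB", "C"), ("AB", "D"), ("C", "A"), ("D", "B")]

print(lost(["AD", "BCD"], PART_C))                     # ['ABC -> D']
print(lost(["BCD", "ABC"], PART_D))                    # []
print(lost(["AC", "BCD"], PART_E))                     # ['AB -> C', 'AB -> D']
print(lost(["AC", "BD", "CD", "ABC", "ABD"], PART_E))  # []
```

The output agrees with the answers above: the split in part (c) loses ABC → D, the decomposition in part (d) loses nothing, and the first split in part (e) loses both AB → C and AB → D until ABC and ABD are added.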
Part 4 Methodology
Chapters 16 - 19 Methodology – Conceptual, Logical, and Physical Database Design
Case Study 1 - Adult Education Department
An Adult Education Department runs various courses during the daytime and evenings, and at
different times of the year. For example, ‘Spanish level 1’ is offered on Monday mornings,
Monday evenings or Wednesday evenings, and runs over 25 weeks from October to March. On
the other hand, ‘Introduction to Digging Up Your Ancestors’ only runs for 8 weeks, but is offered
on Tuesday or Wednesday evenings from October to December, January to March, and April to
June, with an optional field week in August.
There is always a maximum number of places for each course offering, which is dependent on the
individual tutor. For example, ‘Spanish level 1’ on Monday evenings may be limited to 20 places,
but on Wednesday evenings the limit may be 25. Each course offering is only taken by one tutor;
however, a tutor may take different courses, for example, ‘French level 1’ and ‘Spanish level 2’.
To guarantee enrolment, prospective students must pay the fee before the start of the first class.
There is a special reduction for those unemployed. All applicants are kept on a register for
subsequent mailshots.
16.1 Given the above information:

(a) Develop an Entity–Relationship model to illustrate the logical database design.
(b) Produce a set of tables from your Entity–Relationship model, clearly identifying the
primary keys.
State any justifications or assumptions you make.
For the diagram, four entities can be determined: Tutor, Course, Student, and Offering.
Course to Offering and Tutor to Offering are both 1:*, but Student to Offering is *:*.
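ER diagram (multiplicity shown beside each entity):
Course (courseNo {PK}) Has Offering: Course 1..1, Offering 1..*
Tutor (tutorNo {PK}) Takes Offering: Tutor 1..1, Offering 1..*
Student (matricNo {PK}) Enrols Offering, with attribute fee: Student 1..*, Offering 1..*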
The tables should be derived relatively easily from the model, the trickiest one being
Offering. There also needs to be a table to represent the relationship between Student
and Offering, which will also contain the courseFee attribute.
Student (matricNo, studentFName, studentLName, street, city, postcode, telNo)
Primary Key matricNo
Course (courseNo, courseName, courseDescription)
Primary Key courseNo
Offering (courseNo, tutorNo, startDate, startTime, endDate, endTime, maxStudents)
Primary Key courseNo, tutorNo
Foreign Key courseNo references Course(courseNo)
Foreign Key tutorNo references Tutor (tutorNo)
Registration (courseNo, tutorNo, matricNo, registrationDate, courseFee)
Primary Key courseNo, tutorNo, matricNo
Foreign Key (courseNo, tutorNo) references Offering(courseNo, tutorNo)
Foreign Key matricNo references Student(matricNo)
(d) Show that your data model supports the following transactions:
(i) Add a new course to the database, prior to it being offered on any particular day or
from any particular date.
(ii) Enrol a new student on the ‘German level 2’ course that runs on Monday evenings
commencing October 10 1994.
Bearing in mind the above transactions, explain how the physical database design might
be influenced describing what changes you might make etc, and how the application
(and transactions) would be affected. Your comments can apply to both computerized
procedures and manual procedures.
The transactions can be shown in various ways. For example, a check could be made first to
ensure no details have been entered. If separate tables exist for both Course and Offering, simply
insert the new details into Course (identifier, name, and level). If the tables are combined, this
table is simply checked for the existence of any records, and then the new details are inserted,
leaving the rest of the attributes null.
For the second transaction, check whether the Student details exist, and insert them if necessary.
Find the identifier for the Course, then search Offering to find the correct Offering identifier
for the day/time. Read the Registration records for that Offering and, if there is space, insert a
new Registration record.
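The capacity check in the second transaction could be sketched as follows, using Python's built-in sqlite3 module with an in-memory database. This is only an illustration under simplifying assumptions: the tables are cut down to the columns needed for the check, the Student and Course lookups are omitted, the sample values ('G2', 'T1', 'S1' and so on) are invented, and the function name `enrol` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Offering (
    courseNo TEXT, tutorNo TEXT, maxStudents INTEGER,
    PRIMARY KEY (courseNo, tutorNo)
);
CREATE TABLE Registration (
    courseNo TEXT, tutorNo TEXT, matricNo TEXT, courseFee REAL,
    PRIMARY KEY (courseNo, tutorNo, matricNo),
    FOREIGN KEY (courseNo, tutorNo) REFERENCES Offering (courseNo, tutorNo)
);
-- Invented sample offering with room for two students.
INSERT INTO Offering VALUES ('G2', 'T1', 2);
""")


def enrol(conn, course_no, tutor_no, matric_no, fee):
    """Insert a Registration row only if the offering still has space."""
    with conn:  # commits on success, rolls back if an exception is raised
        (limit,) = conn.execute(
            "SELECT maxStudents FROM Offering WHERE courseNo = ? AND tutorNo = ?",
            (course_no, tutor_no),
        ).fetchone()
        (taken,) = conn.execute(
            "SELECT COUNT(*) FROM Registration WHERE courseNo = ? AND tutorNo = ?",
            (course_no, tutor_no),
        ).fetchone()
        if taken >= limit:
            return False
        conn.execute(
            "INSERT INTO Registration VALUES (?, ?, ?, ?)",
            (course_no, tutor_no, matric_no, fee),
        )
        return True


results = [enrol(conn, "G2", "T1", matric, 50.0) for matric in ("S1", "S2", "S3")]
print(results)
```

The third enrolment is refused because the offering is full. Counting the Registration rows on every enrolment is the repeated read that a derived "number enrolled" field on Offering would avoid.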
Alterations may be made to the physical design, such as adding a derived field to Offering
indicating how many students are enrolled, which would alter the procedures. To aid searching,
set up secondary indexes on the required fields.
Case Study 2 - BusyBee Cleaning Company
The BusyBee Cleaning Company specializes in providing cleaning services for both domestic and
commercial clients. Each type of client has a set of requirements. For example, The Cardboard
Box Company requires cleaning services from Monday to Friday 7am until 9am and 5pm until
7pm each day, but P. Nuttall only requires cleaning services on a Wednesday from 10am until
1pm.
Whenever a new client is taken on, a BusyBee administrator assesses how many cleaning staff are
required for the premises prior to assigning any staff to the job. Note that this is the ideal number;
it may differ in practice. In addition, the administrator also assesses whether any specialist
equipment is required and when. For example, three industrial floor cleaners may be needed on
two out of five occasions for one commercial client.
The cleaning staff work in groups of six, with a supervisor to oversee the work done. The other
staff are administrative staff who manage the day-to-day office work including visiting new
clients and ensuring the specialist equipment is properly maintained.
16.2 (a) Develop an Entity–Relationship model from the above information.

(b) Produce a set of tables from your Entity–Relationship model clearly identifying each
primary key.
State any justifications or assumptions you make.
Four main entities can be identified: Client, Requirement, Equipment, and Staff. Staff can
form a superclass with Cleaner and Admin forming subclasses. There is a recursive relationship
(1:*) on Cleaner. Cleaner to Requirement and Equipment to Requirement are both *:*
relationships (Assigned and Booked), whereas Client to Requirement is 1:*. This represents
the core of the problem.
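[ER diagram for BusyBee. Staff (staffNo {PK}) is a superclass with subclasses Admin and Cleaner.
Relationships, with multiplicities as labeled in the diagram:
Cleaner Supervises Cleaner (0..6, 1..1)
Cleaner Works Requirement (1..*, 1..*)
Requirement (reqtNo {PK}) Has Client (clientNo {PK}) (1..*, 1..1)
Equipment (eqptNo {PK}) UsedFor Requirement (1..*, 0..*)]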
In deriving the tables, the primary keys should be chosen judiciously, without separate attributes
being devised for all of them. So, for example, Client, Cleaner, Admin, Staff, and Equipment
will have a reference number each. Requirement could be identified by the Client reference number
with day and time from; Assigned by the Client reference number with day, time from, and Cleaner
reference number; and Booked by the Client reference number with day, time from, and equipment
reference number. The tables below take the simpler alternative of giving Requirement its own
reference number (reqtNo), which Assigned and Booked then use in place of the composite.
Cleaner (staffNo, fName, lName, address, salary, taxCode, homeTelNo, supervisorStaffNo)
Primary Key staffNo
Foreign Key supervisorStaffNo references Cleaner(staffNo)

Admin (staffNo, fName, lName, address, salary, taxCode, homeTelNo)
Primary Key staffNo

Client (clientNo, name, address, telNo, faxNo)
Primary Key clientNo

Equipment (eqptNo, description, usage, cost)
Primary Key eqptNo

Requirement (reqtNo, clientNo, startDate, startTime, duration, comments)
Primary Key reqtNo
Foreign Key clientNo references Client(clientNo)

Assigned (staffNo, reqtNo)
Primary Key staffNo, reqtNo
Foreign Key staffNo references Cleaner(staffNo)
Foreign Key reqtNo references Requirement(reqtNo)

Booked (reqtNo, eqptNo)
Primary Key reqtNo, eqptNo
Foreign Key reqtNo references Requirement(reqtNo)
Foreign Key eqptNo references Equipment(eqptNo)
(c) Demonstrate that your model supports the following transactions and explain how they
might influence physical database design:
(i) For a specific client, produce a schedule of the cleaning times together with the
number of staff assigned, and details of any specialist equipment required.
(ii) For a specific supervisor, produce a list of staff on their team together with their
assignment details.
The transactions often demonstrate whether the data model is reasonable or incorrect. For the first
transaction you could assume a reference number is used to access the client table, then each
requirement can be accessed in turn. For each requirement, the assigned table will be accessed
repeatedly, to count the number of staff assigned, and the booked table will be accessed repeatedly
and linked to the equipment table using the equipment reference number.
For the second transaction, a similar procedure is carried out. The difference is that there is a
recursive relationship involved, but if the table structures are correct, this should not pose a
problem. In looking at the transactions, refinements may become obvious, such as using derived
attributes, posting some non-key attributes from one table to another, or creating secondary
indexes.
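The two transactions can be checked against the tables with a small prototype. The following is a minimal sketch using Python's built-in sqlite3 module, which is an assumption made here for illustration (the guide does not prescribe a DBMS). It uses cut-down versions of the tables, assumes that Requirement carries clientNo as a foreign key to Client (as the 1:* Has relationship implies), and uses hypothetical sample rows and helper names.

```python
import sqlite3

# In-memory database with cut-down versions of the BusyBee tables.
# Requirement.clientNo is an assumed foreign key implementing the 1:* Has relationship.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Client (clientNo TEXT PRIMARY KEY, name TEXT);
CREATE TABLE Cleaner (staffNo TEXT PRIMARY KEY, lName TEXT,
                      supervisorStaffNo TEXT REFERENCES Cleaner(staffNo));
CREATE TABLE Equipment (eqptNo TEXT PRIMARY KEY, description TEXT);
CREATE TABLE Requirement (reqtNo TEXT PRIMARY KEY,
                          clientNo TEXT REFERENCES Client(clientNo),
                          startDate TEXT, startTime TEXT, duration INTEGER);
CREATE TABLE Assigned (staffNo TEXT REFERENCES Cleaner(staffNo),
                       reqtNo TEXT REFERENCES Requirement(reqtNo),
                       PRIMARY KEY (staffNo, reqtNo));
CREATE TABLE Booked (reqtNo TEXT REFERENCES Requirement(reqtNo),
                     eqptNo TEXT REFERENCES Equipment(eqptNo),
                     PRIMARY KEY (reqtNo, eqptNo));

-- Hypothetical sample rows.
INSERT INTO Client VALUES ('C1', 'The Cardboard Box Company');
INSERT INTO Cleaner VALUES ('S1', 'Stone', NULL), ('S2', 'Able', 'S1'), ('S3', 'Baker', 'S1');
INSERT INTO Equipment VALUES ('E1', 'Industrial floor cleaner');
INSERT INTO Requirement VALUES ('R1', 'C1', '2024-01-01', '07:00', 2),
                               ('R2', 'C1', '2024-01-01', '17:00', 2);
INSERT INTO Assigned VALUES ('S2', 'R1'), ('S3', 'R1'), ('S2', 'R2');
INSERT INTO Booked VALUES ('R1', 'E1');
""")


def client_schedule(conn, client_no):
    """Transaction (i): cleaning times, staff count, and equipment for one client."""
    return conn.execute("""
        SELECT r.reqtNo, r.startDate, r.startTime,
               (SELECT COUNT(*) FROM Assigned a WHERE a.reqtNo = r.reqtNo),
               (SELECT GROUP_CONCAT(e.description, ', ')
                  FROM Booked b JOIN Equipment e ON e.eqptNo = b.eqptNo
                 WHERE b.reqtNo = r.reqtNo)
          FROM Requirement r
         WHERE r.clientNo = ?
         ORDER BY r.startDate, r.startTime
    """, (client_no,)).fetchall()


def supervisor_team(conn, supervisor_no):
    """Transaction (ii): the supervisor's team with each cleaner's assignments."""
    return conn.execute("""
        SELECT c.staffNo, c.lName, a.reqtNo, r.startDate, r.startTime
          FROM Cleaner c
          LEFT JOIN Assigned a ON a.staffNo = c.staffNo
          LEFT JOIN Requirement r ON r.reqtNo = a.reqtNo
         WHERE c.supervisorStaffNo = ?
         ORDER BY c.staffNo, r.startTime
    """, (supervisor_no,)).fetchall()


schedule = client_schedule(conn, "C1")
team = supervisor_team(conn, "S1")
```

Transaction (i) counts Assigned rows per requirement on every run, which is the repeated access described above; a derived staff-count attribute on Requirement would trade that work for extra maintenance on insert and delete. Transaction (ii) resolves the recursive relationship with a simple filter on supervisorStaffNo, so a secondary index on that column is a natural candidate.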
Case Study 3 - Reliable Rentals
The requirements collection and analysis phase of the database design process has provided
the following data requirements for a company called Reliable Rentals, which rents out
vehicles (cars and vans). The Company has various outlets (garage/offices) throughout
Glasgow. Each outlet has a number, address, phone number, fax number, and a manager who
supervises the operation of the garage and offices at each site.
Each site is allocated a stock of vehicles for hire; however, individual vehicles may be moved
between outlets, as required. Only the current location for each vehicle is stored. The
registration number uniquely identifies each vehicle for hire and is used when hiring a vehicle
to a client.
Clients may hire vehicles for various periods of time (minimum 1 day to maximum 1 year).
Each individual hire agreement between a client and the Company is uniquely identified using
a hire number. Information stored on the vehicles for hire includes the vehicle registration
number, model, make, engine size, capacity, current mileage, date MOT due, daily hire rate,
and the current location (outlet) of each vehicle.
The data stored on a hire agreement includes the hire number, the client’s number, name,
address, and phone number, date the client started the hire period, date the client wishes to
terminate the hire period, the vehicle registration number, model and make, the mileage before
and after the hire period. After each hire, a member of staff checks the vehicle and notes any
fault(s). Fault report information on each vehicle is stored, which records the name of the
member of staff responsible for the check, date checked, whether fault(s) were found (yes or
no), the vehicle registration number, model, make, and the current mileage.
The Company has two types of clients: personal and business. The data stored on personal
clients includes the client number, name (first and last name), home address, phone number,
date of birth, and driving licence number. The data stored on business clients includes the
client number, name of business, type of business, address, telephone, and fax numbers. The
client number uniquely identifies each client and the information stored relates to all clients
who have hired in the past and those currently hiring a vehicle.
Information is stored on the staff based at various outlets including: staff number, name (first
and last name), home address, home phone number, date of birth (DOB), sex, National
Insurance Number (NIN), date joined the Company, job title, and salary. Each staff member is
associated with a single outlet but may be moved to an alternative outlet as required, although
only the current location for each member of staff is stored.
16.3 (a) Create a conceptual schema for Reliable Rentals using the concepts of the Enhanced
Entity–Relationship (EER) model. To simplify the diagram, only show entities,
relationships and the primary key attributes. Specify the cardinality ratio and participation
constraint of each relationship type. State any assumptions you make when creating the
EER model (if necessary).
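[EER diagram for Reliable Rentals, showing entities, primary keys, and relationships, with
multiplicities as labeled in the diagram:
Staff (staffNo) Raises FaultReport (1..1, 0..*)
Vehicle (regNo) Generates FaultReport (1..1, 1..*)
Outlet (outletNo) Has Staff (1..1, 1..*)
Outlet Rents Vehicle (1..1, 1..*)
Vehicle HiredBy HireAgreement (hireNo) (1..1, 1..*)
Client (clientNo) Hires HireAgreement (1..1, 1..*)
Client is a superclass with subclasses PersonalClient and BusinessClient.]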
(b) Map your high-level data model to a set of relational tables that represent the
entity and relationship types. Identify primary, alternate, and foreign keys.
Outlet (outletNo, address, telNo, faxNo, mgrStaffNo)
Primary Key outletNo
Foreign Key mgrStaffNo references Staff(staffNo)

Staff (staffNo, fName, lName, address, telNo, DOB, sex, NIN, dateJoined, jobTitle,
salary, outletNo)
Primary Key staffNo
Alternate Key NIN
Foreign Key outletNo references Outlet(outletNo)

Vehicle (regNo, model, make, engineSize, capacity, currentMileage, motDate, dRate,
outletNo)
Primary Key regNo
Foreign Key outletNo references Outlet(outletNo)

FaultReport (staffNo, regNo, dateChecked, faults)
Primary Key regNo, dateChecked
Foreign Key staffNo references Staff(staffNo)
Foreign Key regNo references Vehicle(regNo)
Note: The assumption is that a vehicle is checked for faults only once on a given date.

PersonalClient (pclientNo, fName, lName, address, telNo, DOB, licenceNo)
Primary Key pclientNo
Alternate Key licenceNo

BusinessClient (bclientNo, bName, bType, address, telNo, faxNo)
Primary Key bclientNo

HireAgreement (hireNo, clientNo, regNo, dateStart, dateFinish, mileageBefore,
mileageAfter)
Primary Key hireNo
Foreign Key clientNo references PersonalClient(pclientNo) or
BusinessClient(bclientNo)
Foreign Key regNo references Vehicle(regNo)
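The assumption behind the FaultReport key (a vehicle is checked at most once on a given date) is enforced by the composite primary key itself. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS; the helper name record_check, the column types, and the sample staff numbers, registration number, and dates are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, lName TEXT);
CREATE TABLE Vehicle (regNo TEXT PRIMARY KEY, make TEXT, model TEXT);
CREATE TABLE FaultReport (
    staffNo     TEXT NOT NULL REFERENCES Staff(staffNo),
    regNo       TEXT NOT NULL REFERENCES Vehicle(regNo),
    dateChecked TEXT NOT NULL,
    faults      TEXT CHECK (faults IN ('yes', 'no')),
    PRIMARY KEY (regNo, dateChecked)   -- one check per vehicle per date
);
-- Hypothetical sample rows.
INSERT INTO Staff VALUES ('S1', 'Adams'), ('S2', 'Brown');
INSERT INTO Vehicle VALUES ('ABC123', 'Ford', 'Transit');
""")


def record_check(conn, staff_no, reg_no, date_checked, faults):
    """Insert a fault report; return False if the key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO FaultReport VALUES (?, ?, ?, ?)",
                (staff_no, reg_no, date_checked, faults),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = record_check(conn, "S1", "ABC123", "2024-03-01", "no")
same_day = record_check(conn, "S2", "ABC123", "2024-03-01", "yes")  # rejected
next_day = record_check(conn, "S2", "ABC123", "2024-03-02", "yes")
```

If the business later needs several checks of one vehicle on the same day, the key would have to be widened (for example with a time of check), which is why the assumption is worth stating alongside the table.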
16.4 Map the high-level data model shown below to a set of relational tables. Identify
primary, alternate, and foreign keys.
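[ER diagram for question 16.4.
Entities and attributes:
Employee: SSN {PK}, name (fName, MI, lName), address, sex, salary, birthDate
Department: name {PK}, number, locations [1..5], /noOfEmployees
Project: number {PK}, name {AK}, location
Dependent: name {PPK}, relationship, sex, birthDate
Relationships, with multiplicities as labeled in the diagram:
Employee Supervises Employee (0..1, 0..*)
Employee WorksFor Department (1..*, 1..1)
Employee Manages Department (1..1, 0..1), with attribute startDate
Department Controls Project (1..1, 1..*)
Employee WorksOn Project (0..*, 0..*), with attribute hours
Employee RelatedTo Dependent (1..1, 0..*)]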
Employee (SSN, fName, MI, lName, address, birthDate, sex, salary, number)
Primary Key SSN
Foreign Key number references Department (number)

Department (number, name, mgrSSN, startDate)
Primary Key number
Alternate Key name
Foreign Key mgrSSN references Employee (SSN)

DepartmentLocations (number, location)
Primary Key number, location
Foreign Key number references Department (number)

WorksOn (SSN, number, hours)
Primary Key SSN, number
Foreign Key SSN references Employee (SSN)
Foreign Key number references Project (number)

Project (number, name, location, dNumber)
Primary Key number
Foreign Key dNumber references Department (number)

SupervisoryTeam (supSSN, SSN)
Primary Key SSN
Foreign Key supSSN references Employee (SSN)
(Or do not create the SupervisoryTeam relation and simply add supSSN to the Employee relation)

Dependent (SSN, name, sex, birthDate, relationship)
Primary Key name, SSN
Foreign Key SSN references Employee (SSN)
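Two features of this mapping are easy to overlook: the multi-valued attribute locations becomes its own relation (DepartmentLocations), and the derived attribute /noOfEmployees is not stored but computed when needed. The following is a minimal sketch of both, assuming Python's built-in sqlite3 module as a stand-in DBMS, cut-down versions of the tables, and hypothetical sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (number INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE Employee (SSN TEXT PRIMARY KEY, lName TEXT,
                       number INTEGER REFERENCES Department(number));
-- Multi-valued attribute locations [1..5] mapped to a separate relation.
CREATE TABLE DepartmentLocations (number INTEGER REFERENCES Department(number),
                                  location TEXT,
                                  PRIMARY KEY (number, location));
-- Hypothetical sample rows.
INSERT INTO Department VALUES (1, 'Research'), (2, 'Sales');
INSERT INTO Employee VALUES ('111', 'Adams', 1), ('222', 'Brown', 1), ('333', 'Clark', 2);
INSERT INTO DepartmentLocations VALUES (1, 'Glasgow'), (1, 'Paisley'), (2, 'Glasgow');
""")

# Derived attribute /noOfEmployees computed on demand rather than stored.
summary = conn.execute("""
    SELECT d.number, d.name,
           (SELECT COUNT(*) FROM Employee e WHERE e.number = d.number),
           (SELECT COUNT(*) FROM DepartmentLocations l WHERE l.number = d.number)
      FROM Department d
     ORDER BY d.number
""").fetchall()
```

The composite key on DepartmentLocations prevents duplicate locations for a department, but it does not enforce the upper bound of five locations; that limit would need an application check or a trigger.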
16.5 Map the high-level data model shown below to a set of relational tables. Identify
primary, alternate, and foreign keys.
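[ER diagram for question 16.5.
Entities and attributes:
Company: name {PK}, phone {AK}
Equipment: equipID {PK}, price
Engineer: engNo {PK}, name (fName, lName)
Fault: faultID {PK}, description; subclasses ElectFault (ePart) and MechFault (mPart)
Relationships, with multiplicities as labeled in the diagram:
Company Owns Equipment (1..1, 1..*)
Company Employs Engineer (1..1, 1..*), with attribute dateJoined
Equipment Has Fault (1..1, 1..*)
Engineer Repairs Fault (1..*, 1..*), with attributes date and hours]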
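Company (name, phone)
Primary Key name
Alternate Key phone

Equipment (equipID, price, CName)
Primary Key equipID
Foreign Key CName references Company(name)

Engineer (engNo, fName, lName, CName, dateJoined)
Primary Key engNo
Foreign Key CName references Company(name)

Fault (faultID, description, equipID)
Primary Key faultID
Foreign Key equipID references Equipment(equipID)

ElectFault (faultID, ePart)
Primary Key faultID
Foreign Key faultID references Fault(faultID)

MechFault (faultID, mPart)
Primary Key faultID
Foreign Key faultID references Fault(faultID)

Repairs (engNo, faultID, date, hours)
Primary Key engNo, faultID
Foreign Key engNo references Engineer(engNo)
Foreign Key faultID references Fault(faultID)
Note: The assumption is that an engineer is recorded against a given fault only once; otherwise
date would also form part of the primary key.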
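In this mapping the Fault superclass and its ElectFault and MechFault subclasses each become a relation sharing the key faultID, so a complete fault record is reassembled by joining on that key. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS; the sample rows and the 'electrical'/'mechanical' labels are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fault (faultID TEXT PRIMARY KEY, description TEXT, equipID TEXT);
-- Each subclass relation reuses the superclass key as both primary and foreign key.
CREATE TABLE ElectFault (faultID TEXT PRIMARY KEY REFERENCES Fault(faultID), ePart TEXT);
CREATE TABLE MechFault (faultID TEXT PRIMARY KEY REFERENCES Fault(faultID), mPart TEXT);
-- Hypothetical sample rows.
INSERT INTO Fault VALUES ('F1', 'No power', 'E1'), ('F2', 'Seized bearing', 'E1');
INSERT INTO ElectFault VALUES ('F1', 'fuse');
INSERT INTO MechFault VALUES ('F2', 'bearing');
""")

# Outer joins keep every fault and pick up whichever subclass row exists.
faults = conn.execute("""
    SELECT f.faultID,
           CASE WHEN e.faultID IS NOT NULL THEN 'electrical'
                WHEN m.faultID IS NOT NULL THEN 'mechanical' END,
           COALESCE(e.ePart, m.mPart)
      FROM Fault f
      LEFT JOIN ElectFault e ON e.faultID = f.faultID
      LEFT JOIN MechFault m ON m.faultID = f.faultID
     ORDER BY f.faultID
""").fetchall()
```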
Case Study 4 - Perfect Pets
A practice called Perfect Pets provides private health care for domestic pets throughout America. This
service is provided through various clinics located in the main cities of America. The Director of
Perfect Pets is concerned that there is a lack of communication within the practice and particularly in
the sharing of information and resources across the various clinics. To resolve this problem the Director
has requested the creation of a centralized database system to assist in the more effective and efficient
running of the practice. The Director has provided the following description of the current system.
Data Requirements
Veterinary Clinics
Perfect Pets has many veterinary clinics located in the main cities of America. The details of each
clinic include the clinic number, clinic address (consisting of the street, city, state, and zipcode), and the
telephone and fax numbers. Each clinic has a Manager and a number of staff (for example, vets, nurses,
secretaries, cleaners). The clinic number is unique throughout the practice.
Staff
The details stored on each member of staff include the staff number, name (first and last), address
(street, city, state, and zipcode), telephone number, date of birth, sex, social security number (SSN),
position, and current annual salary. The staff number is unique throughout the practice.
Pet Owners
When a pet owner first contacts a clinic of Perfect Pets the details of the pet owner are recorded, which
include an owner number, owner name (first name and last name), address (street, city, state, and
zipcode), and home telephone number. The owner number is unique to a particular clinic.
Pets
The details of the pet requiring treatment are noted, which include a pet number, pet name, type of pet,
description, date of birth (if unknown, an approximate date is recorded), date registered at clinic,
current status (alive/deceased), and the details of the pet owner. The pet number is unique to a
particular clinic.
Examinations
When a sick pet is brought to a clinic, the vet on duty examines the pet. The details of each
examination are recorded and include an examination number, the date and time of the examination, the
name of the vet, the pet number, pet name, and type of pet, and a full description of the examination
results. The examination number is unique to a particular clinic. As a result of the examination, the vet
may propose treatment(s) for the pet.
Treatments
Perfect Pets provides various treatments for all types of pets. These treatments are provided at a
standard rate across all clinics. The details of each treatment include a treatment number, a full
description of the treatment, and the cost to the pet owner. For example, treatments include:
T123   Penicillin antibiotic course                            $50.00
T155   Feline hysterectomy                                     $200.00
T112   Vaccination course against feline flu                   $70.00
T56    Small dog – stay in pen per day (includes feeding)      $20.00
A standard rate of $20.00 is charged for each examination, which is recorded as a type of treatment.
The treatment number uniquely identifies each type of treatment and is used by all Perfect Pets clinics.
Pet Treatments
Based on the results of the examination of a sick pet, the vet may propose one or more types of
treatment. For each type of treatment, the information recorded includes the examination number and
date, the pet number, name and type, treatment number, description, quantity of each type of treatment,
and date the treatment is to begin and end. Any additional comments on the provision of each type of
treatment are also recorded.
Pens
In some cases, it’s necessary for a sick pet to be admitted to the clinic. Each clinic has 20 – 30 animal
pens, each capable of holding between one and four pets. Each pen has a unique pen number, capacity,
and status (an indication of availability). The sick pet is allocated to a pen and the details of the pet, any
treatment(s) required by the pet, and any additional comments about the care of the pet are recorded.
The details of the pet’s stay in the pen are also noted, which include a pen number, and the date the pet
was put into and taken out of the pen. Depending on the pet’s illness, there may be more than one pet in
a pen at the same time. The pen number is unique to a particular clinic.
Invoices
The pet owner is responsible for the cost of the treatment given to a pet. The owner is invoiced for the
treatment arising from each examination, and the details recorded on the invoice include the invoice
number, invoice date, owner number, owner name and full address, pet number, pet name, and the
details of the treatment given. The invoice provides the cost for each type of treatment and the total cost
of all treatments given to the pet. Additional data is also recorded on the payment of the invoice,
including the date the invoice was paid and the method of payment (for example, check, cash, visa).
The invoice number is unique throughout the practice.
Surgical, Non-surgical, and Pharmaceutical Supplies
Each clinic maintains a stock of surgical supplies (for example, syringes, sterile dressings, bandages)
and non-surgical supplies (for example, plastic bags, aprons, litter trays, pet name tags, pet food). The
details of surgical and non-surgical supplies include the item number and name, item description,
quantity in stock (this is ascertained on the last day of each month), reorder level, reorder quantity, and
cost. The item number uniquely identifies each type of surgical or non-surgical supply and is used
throughout the practice.
Each clinic also maintains a stock of pharmaceutical supplies (for example, antibiotics, pain killers). The
details of pharmaceutical supplies include a drug number and name, description, dosage, method of
administration, quantity in stock (this is ascertained on the last day of each month), reorder level, reorder
quantity, and cost. The drug number uniquely identifies each type of pharmaceutical supply and is used
throughout the practice.
Appointments
If the pet needs to be seen by the vet at a later date, the owner and pet are given an appointment. The
details of an appointment are recorded and include an appointment number, owner number, owner
name (first name and last name), home telephone number, the pet number, pet name, type of pet, and
the appointment date and time. The appointment number is unique to a particular clinic.
Transaction Requirements
Listed below are the transactions that should be supported by the Perfect Pets database application.
1. The database should be capable of supporting the following maintenance transactions:

a) Create and maintain records recording the details of Perfect Pets clinics and the members
of staff at each clinic.
b) Create and maintain records recording the details of pet owners.
c) Create and maintain the details of pets.
d) Create and maintain records recording the details of the types of treatments available for
pets.
e) Create and maintain records recording the details of examinations and treatments given to
pets.
f) Create and maintain records recording the details of invoices to pet owners for treatment
to their pets.
g) Create and maintain records recording the details of surgical, non-surgical, and
pharmaceutical supplies at each clinic.
h) Create and maintain records recording the details of pens available at each clinic and the
allocation of pets to pens.
i) Create and maintain pet owner/pet appointments at each clinic.
2. The database should be capable of supporting the following example query transactions:
a) Present a report listing the Manager’s name, clinic address, and telephone number for
each clinic, ordered by clinic number.
b) Present a report listing the names and owner numbers of pet owners with the details of
their pets.
c) List the historic details of examinations for a given pet.
d) List the details of the treatments provided to a pet based on the results of a given
examination.
e) List the details of an unpaid invoice for a given pet owner.
f) Present a report on invoices that have not been paid by a given date, ordered by invoice
number.
g) List the details of pens available on a given date for clinics in the New York area, ordered
by clinic number.
h) Present a report that provides the total monthly salary for staff at each clinic, ordered by
clinic number.
i) List the maximum, minimum and average cost for treatments.
j) List the total number of pets in each pet type, ordered by pet type.
k) Present a report of the names and staff numbers for all vets and nurses over 50 years old,
ordered by staff name.
l) List the appointments for a given date and for a particular clinic.
m) List the total number of pens in each clinic, ordered by clinic number.
n) Present a report of the details of invoices for pet owners between 1997 and 1999, ordered
by invoice number.
o) List the pet number, name, and description of pets owned by a particular owner.
p) Present a report listing the pharmaceutical supplies that need to be reordered at each
clinic, ordered by clinic number.
q) List the total cost of the non-surgical and surgical supplies currently in stock at each
clinic, ordered by clinic number.
16.6 (a) Create a conceptual schema for Perfect Pets using the concepts of the Enhanced Entity–
Relationship (EER) model. To simplify the diagram, only show entities, relationships and the
primary key attributes. Specify the cardinality ratio and participation constraint of each
relationship type. State any assumptions you make when creating the EER model (if
necessary).
See ER diagram below.
(b) Validate the conceptual data model.
The student should show that each of the transactions identified above is supported by his/her
conceptual data model.
(c) Map your high-level data model to a set of relational tables that represent the entity
and relationship types. Identify primary, alternate, and foreign keys.
Clinic (clinicNo, street, city, state, zipCode, telNo, faxNo, mgrStaffNo)
Primary Key clinicNo
Alternate Key zipCode
Alternate Key telNo
Alternate Key faxNo

Staff (staffNo, sFName, sLName, sStreet, sCity, sState, sZipCode, sTelNo, DOB, sex, SSN,
position, salary, clinicNo)
Primary Key staffNo
Alternate Key SSN
Foreign Key clinicNo references Clinic(clinicNo)

PetOwner (ownerNo, oFName, oLName, oStreet, oCity, oState, oZipCode, oTelNo, clinicNo)
Primary Key ownerNo
Foreign Key clinicNo references Clinic(clinicNo)

Pet (petNo, petName, petType, petDescription, pDOB, dateRegistered, petStatus, ownerNo, clinicNo)
Primary Key petNo
Foreign Key ownerNo references PetOwner(ownerNo)
Foreign Key clinicNo references Clinic(clinicNo)

Examination (examNo, examDate, examTime, examResults, petNo, staffNo)
Primary Key examNo
Alternate Key staffNo, examDate, examTime
Foreign Key petNo references Pet(petNo)
Foreign Key staffNo references Staff(staffNo)

Treatment (treatNo, description, cost)
Primary Key treatNo

Pen (penNo, penCapacity, penStatus, clinicNo)
Primary Key penNo
Foreign Key clinicNo references Clinic(clinicNo)

PetPen (penNo, petNo, dateIn, dateOut, comments)
Primary Key penNo, petNo, dateIn
Alternate Key penNo, petNo, dateOut
Foreign Key penNo references Pen(penNo)

PetTreatment (examNo, treatNo, startDate, endDate, quantity, ptComments)
Primary Key examNo, treatNo
Foreign Key examNo references Examination(examNo)
Foreign Key treatNo references Treatment(treatNo)

Item (itemNo, itemName, itemDescription, itemCost)
Primary Key itemNo

Pharmacy (drugNo, drugName, drugDescription, dosage, methodAdmin, drugCost)
Primary Key drugNo

ItemClinicStock (itemNo, clinicNo, inStock, reorderLevel, reorderQty)
Primary Key itemNo, clinicNo
Foreign Key itemNo references Item(itemNo)
Foreign Key clinicNo references Clinic(clinicNo)

PharmClinicStock (drugNo, clinicNo, inStock, reorderLevel, reorderQty)
Primary Key drugNo, clinicNo
Foreign Key drugNo references Pharmacy(drugNo)
Foreign Key clinicNo references Clinic(clinicNo)

Invoice (invoiceNo, invoiceDate, datePaid, paymentMethod, ownerNo, examNo)
Primary Key invoiceNo
Foreign Key ownerNo references PetOwner(ownerNo)
Foreign Key examNo references Examination(examNo)

Appointment (appNo, aDate, aTime, petNo)
Primary Key appNo
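One way for a student to validate the tables is to prototype a few of the query transactions against them. The following is a minimal sketch of transactions 2(f) and 2(j), assuming Python's built-in sqlite3 module as a stand-in DBMS, cut-down versions of the Pet and Invoice tables, dates stored as ISO strings, and hypothetical sample rows. Reading "not paid by a given date" as "invoiced on or before that date and either unpaid or paid later" is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Pet (petNo TEXT PRIMARY KEY, petName TEXT, petType TEXT);
CREATE TABLE Invoice (invoiceNo TEXT PRIMARY KEY, invoiceDate TEXT,
                      datePaid TEXT, ownerNo TEXT);
-- Hypothetical sample rows; a NULL datePaid means the invoice is unpaid.
INSERT INTO Pet VALUES ('P1', 'Tom', 'cat'), ('P2', 'Rex', 'dog'), ('P3', 'Kit', 'cat');
INSERT INTO Invoice VALUES ('I1', '1999-01-05', '1999-01-20', 'O1'),
                           ('I2', '1999-01-10', NULL,         'O2'),
                           ('I3', '1999-02-01', '1999-03-15', 'O1');
""")


def unpaid_by(conn, date):
    """Transaction 2(f): invoices not paid by the given date, ordered by invoice number."""
    return conn.execute("""
        SELECT invoiceNo
          FROM Invoice
         WHERE invoiceDate <= ?
           AND (datePaid IS NULL OR datePaid > ?)
         ORDER BY invoiceNo
    """, (date, date)).fetchall()


def pets_by_type(conn):
    """Transaction 2(j): total number of pets in each pet type, ordered by pet type."""
    return conn.execute("""
        SELECT petType, COUNT(*)
          FROM Pet
         GROUP BY petType
         ORDER BY petType
    """).fetchall()


unpaid = unpaid_by(conn, "1999-02-28")
counts = pets_by_type(conn)
```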
(d) Produce a physical database design for a relational DBMS you have access to.
Implement this physical database design.
Interactions between tables and query transactions (with suggested frequencies).
Table              Transaction   Access                                       Frequency (per day)
Appointment        2(l)          join: petNo                                  250
                                 search condition: aDate
Examination        2(c), 2(d)    join: Pet on petNo                           100
                   2(d)          join: Staff on staffNo
Invoice            2(e), 2(f)    join: PetOwner on ownerNo                    10
                                 search condition: datePaid IS NULL
                   2(n)          join: PetOwner on ownerNo                    1 per month
                                 search condition: invoiceDate
ItemClinicStock    2(q)          search condition: inStock < reorderLevel     50 per month
Pet                2(b)          join: PetOwner on ownerNo                    50
                   2(j)          group: petType                               1
                                 order by: petType
                                 aggregate: count on petType
                   2(l)          join: Clinic on clinicNo                     250
                   2(o)          join: PetOwner on ownerNo                    1500
PharmClinicStock   2(p)          search condition: inStock < reorderLevel     50 per month
Based on the guidelines provided for Oracle in Chapter 16 there may be performance benefits
in adding the indexes shown in the following table.
Additional Oracle indexes for the Perfect Pets database.
Table         Index
Pet           clinicNo
              ownerNo
Appointment   aDate
              petNo
Invoice       ownerNo
              invoiceDate
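The effect of such secondary indexes can be checked by inspecting the query plan. The following is a minimal sketch for the Appointment indexes, using Python's built-in sqlite3 module as a stand-in for Oracle (an assumption made so the example is self-contained; Oracle's optimizer and syntax for viewing plans differ). The index names and the sample date are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointment (appNo TEXT PRIMARY KEY, aDate TEXT, aTime TEXT, petNo TEXT);
-- Secondary indexes on the search and join columns from the table above.
CREATE INDEX idx_appointment_adate ON Appointment(aDate);
CREATE INDEX idx_appointment_petno ON Appointment(petNo);
""")

# Ask SQLite how it would run the search used by transaction 2(l).
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Appointment WHERE aDate = ?",
    ("2000-02-04",),
).fetchall()

# The last column of each plan row is a human-readable description of the step.
plan_text = " ".join(row[-1] for row in plan)
uses_date_index = "idx_appointment_adate" in plan_text
```

An index speeds up the 250 daily lookups by date but adds work to every insert and update of Appointment, so the frequencies in the interaction table are what justify it.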
[Figure: ER model for Perfect Pets with primary keys shown and specialization/generalization of Stock.
Entities: Clinic, Staff, PetOwner, Pet (petNo), Appointment (appNo), Pen (penNo), Invoice (invoiceNo),
Examination (examNo), Treatment (treatNo), and Stock {Mandatory, OR} with subclasses Item (itemNo)
and Pharmacy (drugNo).
Relationships: Holds, Has, IsContactedBy, POAttends, PAttends, Schedules, Pays, Provides (20..30 pens
per clinic), Owns, Registers, IsAllocatedTo, Undergoes, Performs, ResultsFrom.]
Case Study 5 - StayHome Video Rentals
This case study describes a company called StayHome, which rents out videos to its members. The first
branch of StayHome was established in 1982 in Seattle but the company has now grown and has many
branches throughout the United States. The company’s success is due to the first class service it
provides to its members and the wide and varied stock of videos available for rent.
As StayHome has grown, so have the difficulties in managing the increasing amount of data used and
generated by the company. To ensure the continued success of the company, the Director of StayHome
has urgently requested that a database application be built to help solve the increasing problems of data
management.
Below is a description of two views of the company: a Branch view and a Business view.
Branch View of StayHome
The users’ requirements specification for the Branch view is listed in two sections:
• the ‘data requirements’ section describes the data used by the Branch view;
• the ‘data transactions’ section provides examples of how the data is used by the Branch view (that
is, the transactions that staff have to perform on the data).
Data Requirements
The data held on a branch of StayHome is the branch address made up of street, city, state, and zip
code, and the telephone numbers (maximum of 3 lines). Each branch is given a branch number, which
is unique throughout the company.
Each branch of StayHome has staff, which includes a Manager, one or more Supervisors, and a number
of other staff. The Manager is responsible for the day-to-day running of a given branch. Each branch
has several Supervisors and each Supervisor is responsible for supervising a group of staff. The data
held on a member of staff is his or her name, position, and salary. Each member of staff is given a staff
number, which is unique throughout the company.
Each branch of StayHome is allocated a stock of videos. The data held on a video is the catalog
number, video number, title, category, daily rental rate, purchase price, status, and the names of the
main actors (and the characters played), and the director. The catalog number uniquely identifies each
video. In most cases, there are several copies of each video at a branch, and the individual copies are
identified using the video number. A video is given a category such as Action, Adult, Children, Thriller,
Horror, or Sci-Fi. The status indicates whether a specific copy of a video is available for rent.
Before renting a video from the company, a customer must first register as a member of a local branch
of StayHome. The data held on a member is the first and last name, address, and the date that the
member registered at the branch. Each member is given a member number, which is unique across all
branches and is used even when a member chooses to register at more than one branch. The name of
the member of staff responsible for processing the registration of a member at a branch is also noted.
Once registered, a member is free to rent videos, up to a maximum of 10 at any one time. The data held
on each video rented is the rental number, the member’s full name and member number, the video
number, title, and daily rental cost, and the dates the video is rented out and returned. The rental
number is unique throughout the company.
Data Entry
(a) Enter the details of a new branch.
(b) Enter the details of a new member of staff at a branch (such as an employee Tom Daniels at
branch B001).
(c) Enter the details for a newly released video (such as details of a video called Independence
Day).
(d) Enter the details of copies of a new video at a given branch (such as three copies of
Independence Day at branch B001).
(e) Enter the details of a new member registering at a given branch (such as a member Bob
Adams registering at branch B002).
(f) Enter the details of a rental agreement for a member renting a video (such as member Don
Nelson renting Tomorrow Never Dies on 4-Feb-2000).
Data Update / Deletion

(a) Update / delete the details of a branch.
(b) Update / delete the details of a member of staff at a branch.
(c) Update / delete the details of a given video.
(d) Update / delete the details of a copy of a video.
(e) Update / delete the details of a given member.
(f) Update / delete the details of a given rental agreement for a member renting a video.
Data Queries
The database should be capable of supporting the following sample queries:
(a) List the details of branches in a given city.

(b) List the name, position, and salary of staff at a given branch, ordered by staff name.
(c) List the name of each Manager at each branch, ordered by branch number.
(d) List the title, category, and availability of all videos at a specified branch, ordered by category.
(e) List the title, category, and availability of all videos for a given actor at a specified branch,
ordered by title.
(f) List the title, category, and availability of all videos for a given director at a specified branch,
ordered by title.
(g) List the details of all videos a specified member currently has on rent.
(h) List the details of copies of a given video at a specified branch.
(i) List the titles of all videos in a specified category, ordered by title.
(j) List the total number of videos in each video category at each branch, ordered by branch
number.
(k) List the total cost of the videos at all branches.
(l) List the total number of videos featuring each actor, ordered by actor name.
(m) List the total number of members at each branch who joined in 1999, ordered by branch
number.
(n) List the total possible daily rental for videos at each branch, ordered by branch number.
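A sample query such as (j) shows why the catalog entry and the individual copies are worth modeling separately. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS. The two-table split and the names Video, VideoForRent, and branchNo are hypothetical choices made for illustration, as are the sample rows; the case study itself does not fix a schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Hypothetical split: one row per title, one row per physical copy.
CREATE TABLE Video (catalogNo TEXT PRIMARY KEY, title TEXT, category TEXT);
CREATE TABLE VideoForRent (videoNo TEXT PRIMARY KEY,
                           catalogNo TEXT REFERENCES Video(catalogNo),
                           branchNo TEXT);
INSERT INTO Video VALUES ('C1', 'Independence Day', 'Sci-Fi'),
                         ('C2', 'Tomorrow Never Dies', 'Action');
INSERT INTO VideoForRent VALUES ('V1', 'C1', 'B001'), ('V2', 'C1', 'B001'),
                                ('V3', 'C2', 'B001'), ('V4', 'C2', 'B002');
""")

# Query (j): total number of videos in each category at each branch,
# ordered by branch number.
per_category = conn.execute("""
    SELECT c.branchNo, v.category, COUNT(*)
      FROM VideoForRent c
      JOIN Video v ON v.catalogNo = c.catalogNo
     GROUP BY c.branchNo, v.category
     ORDER BY c.branchNo, v.category
""").fetchall()
```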
The Business View of StayHome

The users’ requirements specification for the Business view is listed in two sections:
• the ‘data requirements’ section describes the data used by the Business view;
• the ‘data transactions’ section provides examples of how the data is used by the Business view
(that is, the transactions that staff have to perform on the data).
Data Requirements
The details held on a branch of StayHome are the branch address and the telephone number. Each
branch is given a branch number, which is unique throughout the company.
Each branch of StayHome has staff, which includes a Manager. The details held on a member of staff
are his or her name, position, and salary. Each member of staff is given a staff number, which is unique
throughout the company.
Each branch of StayHome is allocated a stock of videos. The details held on a video are the catalog
number, video number, title, category, daily rental rate, and purchase price. The catalog number
uniquely identifies each video. However, in most cases there are several copies of each video at a
branch, and the individual copies are identified using the video number.
Each branch of StayHome receives videos from video suppliers. The details held on video suppliers are
the supplier number, name, address, telephone number, and status. Orders for videos are placed with
these suppliers and the details held on a video order are the order number, supplier number, supplier
address, video catalog number, video title, video purchase price, quantity, date order placed, date order
received, and the address of the branch receiving the order.
A customer of StayHome must first register as a member of a local branch of StayHome. The details
held on a member are name, address, and the date that the member registered at a branch. Each member
is given a member number, which is unique throughout all branches of the company and is used even
when a member chooses to register at more than one branch.
The details held on each video rented are the rental number, full name and member number, the video
number, title, and daily rental rate, and the dates the video is rented out and returned. The rental number
is unique throughout the company.
Data Entry
(a) Enter the details for a newly released video (such as details of a video called Independence
Day).
(b) Enter the details of a video supplier (such as a supplier called WorldView Videos).
(c) Enter the details of a video order (such as ordering 10 copies of Saving Private Ryan for
branch B002).
Data Update / Deletion

(a) Update / delete the details of a given video.
(b) Update / delete the details of a given video supplier.
(c) Update / delete the details of a given video order.
Data Queries
(a) List the name, position, and salary of staff at all branches, ordered by branch number.
(b) List the name and telephone number of the Manager at a given branch.
(c) List the catalog number and title of all videos at a given branch, ordered by title.
(d) List the number of copies of a given video at a given branch.
(e) List the number of members at each branch, ordered by branch number.
(f) List the number of members who joined this year at each branch, ordered by branch number.
(g) List the number of video rentals at each branch between certain dates, ordered by branch
number.
(h) List the number of videos in each category at a given branch, ordered by category.
(i) List the name, address, and telephone number of all video suppliers, ordered by supplier
number.
(j) List the name and telephone number of a video supplier.
(k) List the details of all video orders placed with a given supplier, ordered by the date of order.
(l) List the details of all video orders placed on a certain date.
(m) List the total daily rentals for videos at each branch between certain dates, ordered by branch
number.
Initial database size

(a) There are approximately 20000 video titles and 400000 videos for rent distributed over 100
branches. There are an average of 4000 and a maximum of 10000 videos for rent at each
branch.
(b) There are approximately 2000 staff working across all branches. There are an average of 15
and a maximum of 25 members of staff working at each branch.
(c) There are approximately 100000 members registered across all branches. There are an average
of 1000 and a maximum of 1500 members registered at each branch.
(d) There are approximately 400000 video rentals across all branches. There are an average of
4000 and a maximum of 10000 video rentals at each branch.
(e) There are approximately 1000 directors and 30000 main actors in 60000 starring roles.
(f) There are approximately 50 video suppliers and 1000 video orders.
Database rate of growth

(a) Approximately 100 new video titles and 20 copies of each video are added to the database
each month.
(b) Once a copy of a video is no longer suitable for renting out (this includes copies of poor
visual quality and those lost or stolen), the corresponding record is deleted from the database.
Approximately 100 records of videos for rent are deleted each month.
(c) Approximately 20 members of staff join and leave the company each month. The records of
staff who have left the company are deleted after one year. Approximately 20 staff records are
deleted each month.
(d) Approximately 1000 new members register at branches each month. If a member does not rent
out a video at any time within a period of two years, his or her record is deleted.
Approximately 100 member records are deleted each month.
(e) Approximately 5000 new video rentals are recorded across 100 branches each day. The details
of video rentals are deleted two years after the creation of the record.
(f) Approximately 50 new video orders are placed each week. The details of video orders are
destroyed two years after the creation of the record.
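The growth and retention figures above can be turned into rough steady-state estimates for sizing. The following is a back-of-the-envelope sketch only. It assumes 365-day and 52-week years and constant insert rates, and it reads item (a) as 20 copies for each of the 100 new titles; none of these assumptions are stated in the case study, and the function names are illustrative.

```python
# Rough steady-state row counts implied by the stated growth and retention rules.
# Assumptions (not from the case study): 365-day years, 52-week years, constant rates.

DAYS_PER_YEAR = 365
WEEKS_PER_YEAR = 52


def steady_state_rows(inserts_per_period: int, periods_per_year: int, retention_years: int) -> int:
    """Rows held once deletions balance insertions: insert rate x retention window."""
    return inserts_per_period * periods_per_year * retention_years


def net_monthly_growth(added_per_month: int, deleted_per_month: int) -> int:
    """Net change in row count per month."""
    return added_per_month - deleted_per_month


# (e) 5000 rentals per day, deleted two years after creation.
rental_rows = steady_state_rows(5000, DAYS_PER_YEAR, 2)
# (f) 50 orders per week, destroyed two years after creation.
order_rows = steady_state_rows(50, WEEKS_PER_YEAR, 2)
# (a)/(b) 100 titles x 20 copies added, about 100 copies deleted each month.
video_copy_growth = net_monthly_growth(100 * 20, 100)
# (d) 1000 members join, about 100 member records deleted each month.
member_growth = net_monthly_growth(1000, 100)

if __name__ == "__main__":
    print("rental rows at steady state:", rental_rows)
    print("order rows at steady state:", order_rows)
    print("net video copies per month:", video_copy_growth)
    print("net members per month:", member_growth)
```

Under these assumptions the rental table dominates storage, which suggests that it deserves the most attention when choosing file organizations and indexes.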
The types and average number of record searches

(a) Searching for the details of a branch – approximately 10 per day.

(b) Searching for the details of a member of staff at a branch – approximately 20 per day.
(c) Searching for the details of a given video – approximately 5000 per day (Sunday to Thursday),
approximately 10000 per day (Friday and Saturday). Peak workload 6–9pm daily.
(d) Searching for the details of a copy of a video – approximately 10000 per day (Sunday to
Thursday), approximately 20000 per day (Friday and Saturday). Peak workload 6–9pm daily.
(e) Searching for the details of a specified member – approximately 100 per day.
(f) Searching for the details of a rental agreement for a member renting a video – approximately
10000 per day (Sunday to Thursday), approximately 20000 per day (Friday and Saturday).
Peak workload 6–9pm daily.
15.7 (a) Create a conceptual schema for each view of StayHome using the concepts of the Enhanced
Entity–Relationship (EER) model. To simplify each diagram, only show entities, relationships
and the primary key attributes. Specify the cardinality ratio and participation constraint of
each relationship type. State any assumptions you make when creating the EER model (if
necessary).
See diagrams below.
(b) Validate the conceptual data model.
The student should show that each of the transactions identified above is supported by his/her
conceptual data models.
(c) Map your high-level local conceptual data models to local logical data models.
Identify primary, alternate, and foreign keys.
See details below.

Branch View
[EER diagram for the Branch view. Entities (primary keys): Director (directorNo), Video (catalogNo), Role, Actor (actorNo), Telephone (telNo), VideoForRent (videoNo), Branch (branchNo), Staff (staffNo, with Supervisor and Supervisee roles), RentalAgreement (rentalNo), Member (memberNo), Registration. Relationships: Directs, Features, Plays, Is, Provides, IsAllocated, Has, Manages, Supervises, IsPartOf, Registers, Processes, Requests, Agrees.]
Tables for local logical data model for the Branch view
Actor (actorNo, actorName)
Primary Key actorNo

Branch (branchNo, street, city, state, zipCode, mgrStaffNo)
Primary Key branchNo
Alternate Key zipCode

Director (directorNo, directorName)
Primary Key directorNo

Member (memberNo, fName, lName, address)
Primary Key memberNo

Registration (branchNo, memberNo, staffNo, dateJoined)
Primary Key branchNo, memberNo
Foreign Key branchNo references Branch(branchNo)
Foreign Key memberNo references Member(memberNo)
Foreign Key staffNo references Staff(staffNo)

RentalAgreement (rentalNo, dateOut, dateReturn, memberNo, videoNo)
Primary Key rentalNo
Alternate Key memberNo, videoNo, dateOut
Foreign Key memberNo references Member(memberNo)
Foreign Key videoNo references VideoForRent(videoNo)

Role (catalogNo, actorNo, character)
Primary Key catalogNo, actorNo
Foreign Key catalogNo references Video(catalogNo)
Foreign Key actorNo references Actor(actorNo)

Staff (staffNo, name, position, salary, branchNo, supervisorStaffNo)
Primary Key staffNo
Foreign Key branchNo references Branch(branchNo)
Foreign Key supervisorStaffNo references Staff(staffNo)

Telephone (telNo, branchNo)
Primary Key telNo
Foreign Key branchNo references Branch(branchNo)

Video (catalogNo, title, category, dailyRental, price, directorNo)
Primary Key catalogNo
Foreign Key directorNo references Director(directorNo)

VideoForRent (videoNo, available, catalogNo, branchNo)
Primary Key videoNo
Foreign Key catalogNo references Video(catalogNo)
Foreign Key branchNo references Branch(branchNo)
Business View
[EER diagram for the Business view. Entities (primary keys): Supplier (supplierNo), Video (catalogNo), VideoOrderLine, VideoOrder (orderNo), VideoForRent (videoNo), Branch (branchNo), Staff (staffNo), RentalAgreement, Member, Registration. Relationships: Supplies, PartOf, Comprises, Is, Places, IsAllocated, Manages, IsPartOf, Registers, Requests, Agrees.]
Tables for local logical data model for Business view
Branch (branchNo, address, telNo, mgrStaffNo)
Primary Key branchNo
Alternate Key telNo

Member (memberNo, name, address)
Primary Key memberNo

Registration (branchNo, memberNo, dateJoined)
Primary Key branchNo, memberNo
Foreign Key branchNo references Branch(branchNo)
Foreign Key memberNo references Member(memberNo)

RentalAgreement (rentalNo, dateOut, dateReturn, memberNo, videoNo)
Primary Key rentalNo
Alternate Key memberNo, videoNo, dateOut
Foreign Key memberNo references Member(memberNo)
Foreign Key videoNo references VideoForRent(videoNo)

Staff (staffNo, name, position, salary, branchNo)
Primary Key staffNo
Foreign Key branchNo references Branch(branchNo)

Supplier (supplierNo, sName, sAddress, sTelNo, status)
Primary Key supplierNo
Alternate Key sTelNo

Video (catalogNo, title, category, dailyRental, price, supplierNo)
Primary Key catalogNo
Foreign Key supplierNo references Supplier(supplierNo)

VideoForRent (videoNo, available, catalogNo, branchNo)
Primary Key videoNo
Foreign Key catalogNo references Video(catalogNo)
Foreign Key branchNo references Branch(branchNo)

VideoOrder (orderNo, dateOrdered, dateReceived, branchNo)
Primary Key orderNo
Foreign Key branchNo references Branch(branchNo)

VideoOrderLine (orderNo, catalogNo, quantity)
Primary Key orderNo, catalogNo
Foreign Key orderNo references VideoOrder(orderNo)
Foreign Key catalogNo references Video(catalogNo)
(d) Merge the two logical data models together to produce a global logical data model.
Global logical data model
[EER diagram for the global logical data model. Entities (primary keys): Supplier (supplierNo), Director (directorNo), Role, Actor (actorNo), Video (catalogNo), VideoOrderLine, VideoOrder (orderNo), Telephone (telNo), VideoForRent (videoNo), Branch (branchNo), Staff (staffNo, with Supervisor and Supervisee roles), RentalAgreement, Member, Registration. Relationships: Supplies, Directs, Features, Plays, PartOf, Comprises, Is, Places, Provides, IsAllocated, Manages, Supervises, Processes, IsPartOf, Registers, Requests, Agrees.]
Table structures for global logical data model
Actor (actorNo, actorName)
Primary Key actorNo

Branch (branchNo, street, city, state, zipCode, mgrStaffNo)
Primary Key branchNo
Alternate Key zipCode

Director (directorNo, directorName)
Primary Key directorNo

Member (memberNo, fName, lName, address)
Primary Key memberNo

Registration (branchNo, memberNo, staffNo, dateJoined)
Primary Key branchNo, memberNo
Foreign Key branchNo references Branch(branchNo)
Foreign Key memberNo references Member(memberNo)
Foreign Key staffNo references Staff(staffNo)

RentalAgreement (rentalNo, dateOut, dateReturn, memberNo, videoNo)
Primary Key rentalNo
Alternate Key memberNo, videoNo, dateOut
Foreign Key memberNo references Member(memberNo)
Foreign Key videoNo references VideoForRent(videoNo)

Role (catalogNo, actorNo, character)
Primary Key catalogNo, actorNo
Foreign Key catalogNo references Video(catalogNo)
Foreign Key actorNo references Actor(actorNo)

Staff (staffNo, name, position, salary, branchNo, supervisorStaffNo)
Primary Key staffNo
Foreign Key branchNo references Branch(branchNo)
Foreign Key supervisorStaffNo references Staff(staffNo)

Supplier (supplierNo, name, address, telNo, status)
Primary Key supplierNo
Alternate Key telNo

Telephone (telNo, branchNo)
Primary Key telNo
Foreign Key branchNo references Branch(branchNo)

Video (catalogNo, title, category, dailyRental, price, directorNo, supplierNo)
Primary Key catalogNo
Foreign Key directorNo references Director(directorNo)
Foreign Key supplierNo references Supplier(supplierNo)

VideoForRent (videoNo, available, catalogNo, branchNo)
Primary Key videoNo
Foreign Key catalogNo references Video(catalogNo)
Foreign Key branchNo references Branch(branchNo)

VideoOrder (orderNo, dateOrdered, dateReceived, branchNo)
Primary Key orderNo
Foreign Key branchNo references Branch(branchNo)

VideoOrderLine (orderNo, catalogNo, quantity)
Primary Key orderNo, catalogNo
Foreign Key orderNo references VideoOrder(orderNo)
Foreign Key catalogNo references Video(catalogNo)
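As a check that the global model supports the Branch view queries, the sketch below builds a cut-down version of three of the tables in SQLite and runs queries (c), (d), and (h). It is an illustration under assumptions, not part of the model solution: the column lists are abbreviated, the sample rows and the `C…`/`V…` identifiers are invented, and query (h) is read as counting distinct titles rather than copies.

```python
import sqlite3


def build_stayhome_demo() -> sqlite3.Connection:
    """Create an in-memory database with abbreviated StayHome tables and invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Branch (branchNo TEXT PRIMARY KEY, city TEXT);
        CREATE TABLE Video (catalogNo TEXT PRIMARY KEY, title TEXT, category TEXT);
        CREATE TABLE VideoForRent (
            videoNo   TEXT PRIMARY KEY,
            available INTEGER,
            catalogNo TEXT NOT NULL REFERENCES Video(catalogNo),
            branchNo  TEXT NOT NULL REFERENCES Branch(branchNo)
        );
        INSERT INTO Branch VALUES ('B001', 'Seattle'), ('B002', 'Portland');
        INSERT INTO Video VALUES
            ('C1', 'Independence Day', 'Action'),
            ('C2', 'Saving Private Ryan', 'War'),
            ('C3', 'Sample Title', 'Action');
        INSERT INTO VideoForRent VALUES
            ('V1', 1, 'C1', 'B001'), ('V2', 1, 'C1', 'B001'),
            ('V3', 1, 'C2', 'B001'), ('V4', 1, 'C2', 'B002'),
            ('V5', 1, 'C3', 'B001');
    """)
    return conn


def titles_at_branch(conn, branch_no):
    """Query (c): catalog number and title of all videos at a branch, ordered by title."""
    return conn.execute(
        """SELECT DISTINCT v.catalogNo, v.title
           FROM Video v JOIN VideoForRent r ON r.catalogNo = v.catalogNo
           WHERE r.branchNo = ?
           ORDER BY v.title""",
        (branch_no,),
    ).fetchall()


def copies_at_branch(conn, catalog_no, branch_no):
    """Query (d): number of copies of a given video at a given branch."""
    row = conn.execute(
        "SELECT COUNT(*) FROM VideoForRent WHERE catalogNo = ? AND branchNo = ?",
        (catalog_no, branch_no),
    ).fetchone()
    return row[0]


def titles_per_category(conn, branch_no):
    """Query (h): number of distinct titles in each category at a branch, by category."""
    return conn.execute(
        """SELECT v.category, COUNT(DISTINCT v.catalogNo)
           FROM Video v JOIN VideoForRent r ON r.catalogNo = v.catalogNo
           WHERE r.branchNo = ?
           GROUP BY v.category
           ORDER BY v.category""",
        (branch_no,),
    ).fetchall()


if __name__ == "__main__":
    db = build_stayhome_demo()
    print(titles_at_branch(db, "B001"))
    print(copies_at_branch(db, "C1", "B001"))
    print(titles_per_category(db, "B001"))
```

Because copies are held in VideoForRent and titles in Video, queries (c) and (h) need DISTINCT to avoid counting one title once per copy.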
18.1 Which of the three basic file organizations (heap, ordered, hash) would you choose for a file
where the most frequent operations were as follows:
(a) Inserts and scans where the order of records does not matter.
(b) Record searches based on a range of field values.
(c) Record searches based on a particular field value.
(a) Heap file

(b) Ordered file using the fields as the search key.
(c) Hash file using the particular field as the search key.
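The reasoning behind these three answers can be illustrated with an in-memory analogy. This is a sketch only: Python lists and dictionaries stand in for disk files, the keys are invented, and no page or I/O behaviour is modelled.

```python
from bisect import bisect_left, bisect_right

# (a) Heap file: records are appended in arrival order, so inserts are cheap
#     and reading everything is a simple scan in no particular order.
heap_file = []
for key in (42, 7, 19, 88, 3):
    heap_file.append(key)

# (b) Ordered file: records are kept sorted on the search key, so a range
#     query is two binary searches plus one contiguous slice.
ordered_file = sorted(heap_file)


def range_search(lo, hi):
    """Return all keys k with lo <= k <= hi from the ordered file."""
    return ordered_file[bisect_left(ordered_file, lo):bisect_right(ordered_file, hi)]


# (c) Hash file: the key is hashed straight to its bucket, which suits
#     exact-match lookups but gives no help with ranges.
hash_file = {key: f"record-{key}" for key in heap_file}


def exact_search(key):
    """Return the record stored under key, or None if it is absent."""
    return hash_file.get(key)


if __name__ == "__main__":
    print(heap_file)
    print(range_search(5, 42))
    print(exact_search(19))
```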
18.2 Discuss the difference between each of the following types of indexes:
(a) Dense versus sparse indexes.

(b) Primary versus secondary indexes.
(c) Clustered versus unclustered indexes.
(a) A dense index has at least one data entry for every search key value that appears in a
record in the indexed file. A sparse index contains an entry for each page of records in a
data file. It must be clustered (so we can only have one sparse index on a data file).
(b) A primary index is an index on a set of fields that includes the primary key and is
guaranteed not to contain duplicates. A secondary index is not a primary index (and may
have duplicates).
(c) A clustered index is one in which the ordering of data entries is the same as the ordering
of data records. There can be at most one clustered index on a data file. An unclustered
index is an index that is not clustered. There can be more than one unclustered index on a
data file.
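The dense versus sparse distinction in (a) can be sketched as follows. The page contents and names are invented for illustration, and the data file is assumed to be sorted on the search key, which is what allows the sparse index to work.

```python
from bisect import bisect_right

# Data file: pages of records sorted on the search key (a clustered layout).
pages = [[3, 7, 11], [19, 23, 42], [57, 61, 88]]

# Dense index: one entry per search-key value, mapping to (page number, slot).
dense_index = {key: (p, s) for p, page in enumerate(pages) for s, key in enumerate(page)}

# Sparse index: one entry per page, holding the first key on that page.
sparse_index = [page[0] for page in pages]


def dense_lookup(key):
    """Go straight to the record's position, or None if the key is absent."""
    return dense_index.get(key)


def sparse_lookup(key):
    """Find the candidate page via the sparse index, then search within that page."""
    p = bisect_right(sparse_index, key) - 1  # last page whose first key <= key
    if p < 0:
        return None
    page = pages[p]
    return (p, page.index(key)) if key in page else None


if __name__ == "__main__":
    print(len(dense_index), len(sparse_index))
    print(dense_lookup(23), sparse_lookup(23))
```

The sparse index holds one entry per page instead of one per key, at the cost of a short search inside the page it points to.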
Database Systems Coursework 1 (Case Study 6 – Fastcabs Cab Company)
DATABASE SYSTEMS
COURSEWORK
GROUP WORK: 2 – 3 Students
CONTRIBUTION: 50% of Final Mark
SUBMISSION DATE: Week 11
DEMONSTRATION DATE: Lab Time Week 11
Introduction to Coursework
You and your group members are part of a consultancy company that specialises in the provision of
database applications. The Director of FastCabs has recently approached your company to undertake a
project to design and partially implement a database management system for the company.
Notes
1. You are in the initial stages of user requirements collection and analysis and are required to read
the FastCabs case study.
2. The information presented in the case study is an overview of how business is conducted at
FastCabs. As the information described in the case study is an overview and ambiguous in places,
it will be necessary for you to make your own assumptions about certain aspects of the case study.
Your assumptions should be clearly stated in your coursework submission.
Coursework Requirements
Part 1 – Design the Database

1. Create an Entity–Relationship (ER) model of the data requirements for the FastCabs case study using
the UML notation. Note: if necessary, use the additional concepts of the Enhanced Entity–
Relationship (EER) model. State any assumptions necessary to support your design.
(Submit hardcopy)
2. Derive relational schema from your ER model that represents the entities and relationships.
Identify primary, alternate and foreign keys. Note: use the following notation to describe your
relational schema, as shown in the example of a Staff relation given below.
Staff (staffNo, fName, lName, address, NIN, sex, DOB, deptNo)

Primary Key staffNo
Alternate Key lName, DOB
Alternate Key NIN
Foreign Key deptNo references Department(deptNo) On Delete No Action On Update Cascade
(Submit hardcopy)
3. Use the technique of normalization to validate the structure of your relational schema.
Demonstrate that each of your relations is in third normal form (3NF) by displaying the functional
dependencies between attributes in each relation. Note, if any of your relations are not in 3NF, this
may indicate that your ER model is structurally incorrect or that you have introduced errors in the
process of deriving relations from your model.
(Submit hardcopy)
4. To further demonstrate your knowledge of normalization, assume that a proposed (badly structured)
relation for the FastCabs database has the following structure.
jobID pickupDateTime driverID dFName dLName clientID cFName cLName cAddress

Identify the functional dependencies represented in this relation and demonstrate the process of
normalizing this relation into 3NF relations.
(Submit hardcopy)
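One way to check a proposed set of functional dependencies for this relation is to compute attribute closures. The sketch below is illustrative only. The three dependencies in `FDS` are assumptions about what the attribute names imply (a job determines its pick-up time, driver, and client; a driver ID determines the driver's names; a client ID determines the client's names and address) and are not stated in the coursework, so a student's own assumptions may differ.

```python
# Attributes as read from the relation heading.
ATTRS = frozenset({
    "jobID", "pickupDateTime", "driverID", "dFName", "dLName",
    "clientID", "cFName", "cLName", "cAddress",
})

# ASSUMED functional dependencies (lhs -> rhs); not given in the coursework text.
FDS = [
    (frozenset({"jobID"}), frozenset({"pickupDateTime", "driverID", "clientID"})),
    (frozenset({"driverID"}), frozenset({"dFName", "dLName"})),
    (frozenset({"clientID"}), frozenset({"cFName", "cLName", "cAddress"})),
]


def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, fds=FDS, all_attrs=ATTRS):
    """A set of attributes is a superkey if its closure covers every attribute."""
    return closure(attrs, fds) >= all_attrs


if __name__ == "__main__":
    print(sorted(closure({"jobID"}, FDS)))
    print(is_superkey({"jobID"}), is_superkey({"driverID"}))
```

Under these assumed dependencies jobID is a key, while driverID and clientID are non-key determinants. The driver and client attributes therefore depend on jobID only transitively, which points to separate Job, Driver, and Client relations in 3NF.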
Part 2 – Implement the Database

1. Create the tables for the FastCabs database. Where appropriate set field and table properties,
including any required indexes. Ensure that referential integrity is established between related
tables.
2. Create customised forms for data entry.
3. Enter some test data (approximately 5 – 10 rows) into each table.
Part 3 – Query the Database

Before starting this section, please ensure that your tables contain sufficient data to enable you to test
the query transactions described in the FastCabs case study.
1. Create and save the query transactions.

2. Create a customised form or a report for each saved query.
3. Provide 10 additional examples of queries, which retrieve useful data from the FastCabs
database. State the purpose of each query and attempt to use each example to demonstrate the
breadth of your knowledge of Access QBE/SQL.
(Only submit hardcopy for Part 3, point 3)

Part 4 – Implement Database Application

Implement a prototype database application for FastCabs. The purpose of this prototype is to allow
the Director to provide feedback on your proposed design.
The prototype should facilitate the creation, maintenance and querying of records and where
appropriate automate various tasks for the user.
Part 5 – Document Database Application

1. Create a user manual that describes your prototype for the FastCabs database application.
2. The user manual should introduce the database application to the users, describe the functionality
of the database application and clearly demonstrate how to use the application.
Note: The user manual should be a maximum of 15 pages in length (including screen dumps).
(Submit hardcopy)
Part 6 – Demonstrate Database Application

You and the members of your group are required to demonstrate your database application to your Lab
Tutor during your usual lab time in Week 11.
Part 7 – Individual Critical Evaluation
Each student should submit his or her own critical assessment of the coursework. The evaluation
should include a discussion on how the coursework has reinforced (or otherwise) his or her
appreciation of the techniques and processes employed in undertaking a database project. In addition
the evaluation may include a wider discussion on topics such as:
• How the Database Systems module relates to the other modules on your course.
• How the knowledge and skills taught on the module and/or course relate to your previous
experience as a student and/or employee.
• The appropriateness of the knowledge and skills taught on the module for future employment.
This component of the coursework can be submitted as a separate document from the main part of the
coursework.
(Submit hardcopy)
Marking Scheme
The assessment of this coursework will be carried out on the following components of the work. Please
note that each student should submit his or her own critical evaluation of the coursework and will
receive an individual mark for this component (out of 10%). This individual student mark will be
combined with the mark for the groupwork component (out of 90%) for the coursework.
Part 1 – Design the Database (30)

Part 2 – Implement the Database (15)
Part 3 – Query the Database (15)
Part 4 – Implement Prototype Database Application (15)
Part 5 – Document Database Application (10)
Part 6 – Demonstrate Database Application (5)
Part 7 – Individual Critical Evaluation (10)
The FastCabs Case Study
A private taxi company called FastCabs was established in Glasgow in 1992. Since then, the company
has grown steadily and now has offices in most of the main cities of Scotland. However, the company
is now so large that more and more administrative staff are being employed to cope with the ever-
increasing amount of paperwork. Furthermore, the communication and sharing of information within
the company is poor. The Director of the company, Paddy MacKay, feels that too many mistakes are
being made and that the success of his company will be short-lived if he does not do something to
remedy the situation. He knows that a database could help in part to solve the problem and has
approached you and your team to help in creating a database application to support the running of
FastCabs.
The Director has provided the following brief description of how FastCabs operates.
Each office has a Manager, several taxi owners, drivers, and administrative staff. The Manager is
responsible for the day-to-day running of the office. An owner provides one or more taxis to FastCabs
and each taxi is allocated for use to a number of drivers. The majority of owners are also drivers.
FastCabs taxis are not available for hire by the public hailing a taxi in the street but must be requested
by first phoning the company to attend a given address.
There are two kinds of clients, namely private and business. The business provided by private clients is
on an ad hoc basis. The details of private clients are collected on the first booking of a taxi. However,
the business provided by business clients is more formal and involves agreeing a contract of work with
the business. A contract stipulates the number of jobs that FastCabs will undertake for a fixed fee.
When a job comes into FastCabs the name, phone number and contract number (when appropriate) of
the client are taken and then the pick-up date/time and pick-up/drop-off addresses are noted. Each job is
allocated a unique jobID. The nearest driver to the pick-up address is called by radio and is informed of
the details of the job.
When a job is completed the driver should note the mileage used and the charge made (for private
clients only). If a job is not completed, the reason for the failed job should be noted.
The Director has provided some examples of typical queries that the database application for FastCabs
must support.
a) The names and phone numbers of the Managers at each office.

b) The names of all female drivers based in the Glasgow office.
c) The total number of staff at each office.
d) The details of all taxis at the Glasgow office.
e) The total number of W registered taxis.
f) The number of drivers allocated to each taxi.
g) The name and number of owners with more than one taxi.
h) The full address of all business clients in Glasgow.
i) The details of the current contracts with business clients in Glasgow.

j) The total number of private clients in each city.
k) The details of jobs undertaken by a driver on a given day.
l) The names of drivers who are over 55 years old.
m) The names and numbers of private clients who hired a taxi in November 2000.
n) The names and addresses of private clients who have hired a taxi more than three times.
o) The average number of miles driven during a job.
p) The total number of jobs allocated to each car.
q) The total number of jobs allocated to each driver.
r) The total amount charged for each car in November 2000.
s) The total number of jobs and miles driven for a given contract.
FastCabs Sample ER Model
[ER diagram. Entities (primary keys): Staff (staffNo) with subclasses Owner, Driver, and Manager {Optional, And}; Office (officeNo); TaxiDriver; Taxi (vehRegNo); Contract (contractNo); Job (jobID); Client with subclasses PrivateClient (pClientNo) and BusinessClient (bClientNo) {Mandatory, Or}. Relationships: WorksAt, Manages, Drives, Owns, Agrees, Takes, IsFor, Has, SentTo, Accepts, Requests.]
FastCabs Sample Relational Schema
Staff (staffNo, fName, lName, sAddress, jobDescription, salary, NIN, sex, DOB, officeNo)
Primary Key staffNo
Alternate Key NIN
Foreign Key officeNo references Office(officeNo)

Office (officeNo, oStreet, oCity, oPostcode, oTelNo, oFaxNo, mgrStaffNo)
Primary Key officeNo
Alternate Key oTelNo
Alternate Key oFaxNo

PrivateClient (pClientNo, fName, lName, street, city, postcode, telNo)
Primary Key pClientNo
Alternate Key fName, lName, telNo

BusinessClient (bClientNo, bName, bType, street, city, postcode, telNo, faxNo)
Primary Key bClientNo
Alternate Key bName
Alternate Key telNo
Alternate Key faxNo

Contract (contractNo, bClientNo, startDate, finishDate, totalCharge, maxNoOfJobs, officeNo)
Primary Key contractNo
Foreign Key bClientNo references BusinessClient(bClientNo)
Foreign Key officeNo references Office(officeNo)

Taxi (vehRegNo, model, make, color, capacity, currentMileage, MOTDueDate, ownerNo)
Primary Key vehRegNo
Foreign Key ownerNo references Staff(staffNo)

TaxiDriver (vehRegNo, driverNo)
Primary Key vehRegNo, driverNo
Foreign Key vehRegNo references Taxi(vehRegNo)
Foreign Key driverNo references Staff(staffNo)

Job (jobID, vehRegNo, driverNo, pClientNo, contractNo, pickupDate, pickupTime,
pickupAddress, dropOffAddress, mileageUsed, Charge)
Primary Key jobID
Alternate Key vehRegNo, pickupDate, pickupTime
Alternate Key driverNo, pickupDate, pickupTime
Alternate Key pClientNo, pickupDate, pickupTime
Foreign Key vehRegNo references Taxi(vehRegNo)
Foreign Key driverNo references Staff(staffNo)
Foreign Key pClientNo references PrivateClient(pClientNo)
Foreign Key contractNo references Contract(contractNo)
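To show how the sample schema supports the Director's queries, the sketch below implements a cut-down Office, Staff, and Taxi in SQLite and runs queries (c), (e), and (g). It is an illustration rather than a model answer: most columns are omitted, the rows and identifiers are invented, and a "W registered" taxi is assumed to be one whose registration number starts with the letter W.

```python
import sqlite3


def build_fastcabs_demo() -> sqlite3.Connection:
    """Create an in-memory database with abbreviated FastCabs tables and invented rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE Office (officeNo TEXT PRIMARY KEY, oCity TEXT);
        CREATE TABLE Staff (
            staffNo  TEXT PRIMARY KEY,
            fName    TEXT,
            lName    TEXT,
            officeNo TEXT NOT NULL REFERENCES Office(officeNo)
        );
        CREATE TABLE Taxi (
            vehRegNo TEXT PRIMARY KEY,
            ownerNo  TEXT NOT NULL REFERENCES Staff(staffNo)
        );
        INSERT INTO Office VALUES ('O1', 'Glasgow'), ('O2', 'Edinburgh');
        INSERT INTO Staff VALUES
            ('S1', 'Ann', 'Reid', 'O1'),
            ('S2', 'Bob', 'Kerr', 'O1'),
            ('S3', 'Cal', 'Muir', 'O2');
        INSERT INTO Taxi VALUES
            ('W123 ABC', 'S1'), ('W456 DEF', 'S1'), ('X789 GHI', 'S3');
    """)
    return conn


def staff_per_office(conn):
    """Query (c): total number of staff at each office."""
    return conn.execute(
        "SELECT officeNo, COUNT(*) FROM Staff GROUP BY officeNo ORDER BY officeNo"
    ).fetchall()


def w_registered_taxis(conn):
    """Query (e): number of taxis whose registration starts with W (assumed meaning)."""
    return conn.execute(
        "SELECT COUNT(*) FROM Taxi WHERE vehRegNo LIKE 'W%'"
    ).fetchone()[0]


def owners_with_multiple_taxis(conn):
    """Query (g): staff number, name, and taxi count of owners with more than one taxi."""
    return conn.execute(
        """SELECT s.staffNo, s.fName, s.lName, COUNT(*) AS taxis
           FROM Staff s JOIN Taxi t ON t.ownerNo = s.staffNo
           GROUP BY s.staffNo, s.fName, s.lName
           HAVING COUNT(*) > 1
           ORDER BY s.staffNo"""
    ).fetchall()


if __name__ == "__main__":
    db = build_fastcabs_demo()
    print(staff_per_office(db))
    print(w_registered_taxis(db))
    print(owners_with_multiple_taxis(db))
```

Owners and drivers are both stored in Staff, so Taxi.ownerNo and TaxiDriver.driverNo both reference Staff(staffNo). This is why query (g) needs only a single join.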
Database Systems Coursework 2 (Case Study 7 – University Database)
DATABASE SYSTEMS
COURSEWORK
GROUP WORK: 2 – 3 Students
CONTRIBUTION: 50% of Final Mark
SUBMISSION DATE: Week 11
DEMONSTRATION DATE: Lab Time Week 11
Introduction to Coursework
You have been approached by a University for the design and implementation of a relational database
system that will provide information on the courses it offers, the academic departments that run the
courses, the academic staff and the enrolled students. The system will be used mainly by the students
and the academic staff.
The requirement collection and analysis phase of the database design process provided the following
data requirements for the University Database system.
Coursework Requirements
Each department runs a number of courses. The university provides a set of modules used in different
courses. Each course uses a number of modules but not every module is used. A course is assigned a
unique course code and a module is identified by a unique module code. A module can be used in one
course only, but can be studied by many students. In addition to the module code, each module's unique
title, start date, end date, texts (books), and assessment scheme (i.e. coursework and exam mark
percentages) are also stored.
Each course is managed by a member of academic staff, and each module is also coordinated by a member
of academic staff. The database should also store each course's unique title and duration (in years).
A student can enrol in one course at a time. Once enrolled a student is assigned a unique matriculation
number. To complete a course, each student must undertake and pass all the required modules in his/her
course. This requires that the database store the performance (pass or fail) of each student in every
module.
Additional data stored on each student includes student name (first and last), address (town, street, and
post code), date-of-birth, sex, and financial loan. For emergency purposes the database stores the name
(not composite), address (not composite), phone, and relationship of each student's next-of-kin. None of
the next-of-kin's attributes is unique. Assume that every next-of-kin is a next-of-kin of one student only.
Each department is managed by a member of academic staff. The database should record the date
he/she started managing the department. Each department has a name, phone number, fax number, and
location (e.g. E Block). Each department employs many members of academic staff.
A member of academic staff can be the leader (i.e. manager) of at most one course, but can be the
coordinator of more than one module. A member of academic staff need not hold any of the
above-mentioned roles (coordinator, course leader, department manager). Every member of academic
staff teaches one or more modules, and a module may be taught by more than one member of academic
staff. The database should record the number of hours per week a member of academic staff spends
teaching each module. Each member of academic staff is identified by a unique staff number. All
members of staff and students have unique computer network user ID numbers. Additional data stored
on each member of academic staff includes name (first and last), phone extension number, office
number, sex, salary, post (lecturer, senior lecturer, Professor, etc.), qualifications, and address
(not composite). A member of academic staff works for one department only.
Part 1 – Design the Database

1. Create an Entity–Relationship (ER) model of the data requirements for the University Database
case study using the UML notation. Note: if necessary, use the additional concepts of the
Enhanced Entity–Relationship (EER) model. State any assumptions necessary to support your
design.
(Submit hardcopy)
2. Derive relational schema from your ER model that represents the entities and relationships.
Identify primary, alternate and foreign keys. Note: use the following notation to describe your
relational schema, as shown in the example of a Staff relation given below.
Staff (staffNo, fName, lName, address, NIN, sex, DOB, deptNo)

Primary Key staffNo
Alternate Key lName, DOB
Alternate Key NIN
Foreign Key deptNo references Department(deptNo) On Delete No Action On Update Cascade
(Submit hardcopy)
3. Use the technique of normalization to validate the structure of your relational schema.
Demonstrate that each of your relations is in third normal form (3NF) by displaying the functional
dependencies between attributes in each relation. Note, if any of your relations are not in 3NF, this
may indicate that your ER model is structurally incorrect or that you have introduced errors in the
process of deriving relations from your model.
(Submit hardcopy)
4. To further demonstrate your knowledge of normalization, assume that a proposed (badly
structured) relation for the University Database database has the following structure.

matricNo   name        sex   moduleTitle      module startDate   performance   flatNo   address
00/5021    Mcleod, A   F     BITS             27/09/01           Pass          F001     6 lady Lane, Paisley
00/4647    Smith, J    M     Software Dev.    01/10/01           Pass          F001     6 lady Lane, Paisley
01/4670    Owen, M     F     FDBS             27/09/01           Fail          F002     28 New Str, Paisley
01/4765    Smith, J    M     OOAD             01/10/01           Pass          F003     28 New Str, Paisley
00/5021    Mcleod, A   F     FDBS             27/09/01           Pass          F001     6 lady Lane, Paisley
00/4647    Smith, J    M     FDBS             27/09/01           Fail          F001     6 lady Lane, Paisley
Identify the functional dependencies represented in this relation and demonstrate the process of
normalizing this relation into 3NF relations.
(Submit hardcopy)
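Candidate functional dependencies can be tested against the six sample rows before normalizing. The sketch below is a hedged aid, not a model answer. The helper `fd_holds` and the shortened column name `startDate` (for the module start date column) are illustrative choices. Sample data can only refute a dependency; a dependency that survives the check still has to be justified from the meaning of the attributes.

```python
COLUMNS = ("matricNo", "name", "sex", "moduleTitle", "startDate",
           "performance", "flatNo", "address")

# The six rows of the sample relation, copied from the coursework text.
ROWS = [
    ("00/5021", "Mcleod, A", "F", "BITS", "27/09/01", "Pass", "F001", "6 lady Lane, Paisley"),
    ("00/4647", "Smith, J", "M", "Software Dev.", "01/10/01", "Pass", "F001", "6 lady Lane, Paisley"),
    ("01/4670", "Owen, M", "F", "FDBS", "27/09/01", "Fail", "F002", "28 New Str, Paisley"),
    ("01/4765", "Smith, J", "M", "OOAD", "01/10/01", "Pass", "F003", "28 New Str, Paisley"),
    ("00/5021", "Mcleod, A", "F", "FDBS", "27/09/01", "Pass", "F001", "6 lady Lane, Paisley"),
    ("00/4647", "Smith, J", "M", "FDBS", "27/09/01", "Fail", "F001", "6 lady Lane, Paisley"),
]


def fd_holds(lhs, rhs, rows=ROWS):
    """Return True if no two rows agree on the lhs columns but differ on the rhs columns."""
    lhs_idx = [COLUMNS.index(c) for c in lhs]
    rhs_idx = [COLUMNS.index(c) for c in rhs]
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs_idx)
        value = tuple(row[i] for i in rhs_idx)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


if __name__ == "__main__":
    candidates = [
        (("matricNo",), ("name", "sex", "flatNo")),
        (("flatNo",), ("address",)),
        (("moduleTitle",), ("startDate",)),
        (("matricNo", "moduleTitle"), ("performance",)),
        (("name",), ("matricNo",)),
        (("matricNo",), ("performance",)),
    ]
    for lhs, rhs in candidates:
        print(lhs, "->", rhs, ":", fd_holds(lhs, rhs))
```

In the sample rows, performance is not determined by matricNo alone but is consistent with the pair (matricNo, moduleTitle), which suggests a composite key with partial dependencies on each of its parts.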
Part 2 – Implement the Database

1. Create the tables for the University Database database. Where appropriate set field and table
properties, including any required indexes. Ensure that referential integrity is established between
related tables.
2. Create customised forms for data entry.
3. Enter some test data (approximately 5 – 10 rows) into each table.
Part 3 – Query the Database

Before starting this section, please ensure that your tables contain sufficient data to enable you to test
the query transactions described in the University Database case study.
1. Create and save the following query transactions:

(a) List details of all departments located in E Block.
(b) List title, start and end dates of all modules run in the PgDIT course.
(c) List name, address, and salary for each female member of academic staff who
manages a department.
(d) List name, sex, and salary for each lecturer with a PhD degree.
(e) List last name, post, and qualifications of all members of academic staff who are
employed by CIS department.
(f) List matriculation number, last name, and sex of all students who are studying 'multi-
media' module. Order result alphabetically by last name.
(g) List staff number, last name, sex, and post of all academic staff whose salary is
greater than the average salary of all academic staff.
(h) For each course with more than 10 students, list course title and the number of
students (under an appropriate header).
(i) List the number of female members of academic staff and the number of male
members of academic staff employed by CIS department.
(j) For each member of academic staff who spends more than 6 hours teaching any
module list the member of academic staff last name, the module title and the number
of hours.
(k) For each department list the department name, and the number of female members of
academic staff, and the number of male members of academic staff under appropriate
headers (use a crosstab query).
2. Create a customised form or a report for each saved query.
3. Provide 10 additional examples of queries, which retrieve useful data from the University
Database database. State the purpose of each query and attempt to use each example to
demonstrate the breadth of your knowledge of QBE/SQL.
(Only submit hardcopy for Part 3, point 3)

Part 4 – Implement Database Application

Implement a prototype database application for the University Database. The purpose of this prototype
is to allow the Director to provide feedback on your proposed design.
The prototype should facilitate the creation, maintenance and querying of records and where
appropriate automate various tasks for the user.
Part 5 – Document Database Application

1. Create a user manual that describes your prototype for the University Database database
application.
2. The user manual should introduce the database application to the users, describe the functionality
of the database application and clearly demonstrate how to use the application.
Note: The user manual should be a maximum of 15 pages in length (including screen dumps).
(Submit hardcopy)
Part 6 – Demonstrate Database Application

You and the members of your group are required to demonstrate your database application to your Lab
Tutor during your usual lab time in Week 11.
Part 7 – Individual Critical Evaluation
Each student should submit his or her own critical assessment of the coursework. The evaluation
should include a discussion on how the coursework has reinforced (or otherwise) his or her
appreciation of the techniques and processes employed in undertaking a database project. In addition
the evaluation may include a wider discussion on topics such as:
• How the Database Systems module relates to the other modules on your course.
• How the knowledge and skills taught on the module and/or course relate to your previous
experience as a student and/or employee.
• The appropriateness of the knowledge and skills taught on the module for future employment.
This component of the coursework can be submitted as a separate document from the main part of the
coursework.
(Submit hardcopy)
Marking Scheme
The assessment of this coursework will be carried out on the following components of the work. Please
note that each student should submit his or her own critical evaluation of the coursework and will
receive an individual mark for this component (out of 10%). This individual student mark will be
combined with the mark for the groupwork component (out of 90%) for the coursework.
Part 1 – Design the Database (30)

Part 2 – Implement the Database (15)
Part 3 – Query the Database (15)
Part 4 – Implement Prototype Database Application (15)
Part 5 – Document Database Application (10)
Part 6 – Demonstrate Database Application (5)
Part 7 – Individual Critical Evaluation (10)
Partial ER diagrams
Staff: staffNo {PK}, name (fName, lName), address, phone {AK}, officeNo, sex, salary, post,
computerId {AK}, qualifications [1..*]
Student: matricNo {PK}, computerId {AK}, name (fName, lName), address (town, street, postCode),
dob, sex, loan
Module: mCode {PK}, title, startDate, endDate, assessment (coursework, exam), texts [1..*]
Course: cCode {PK}, title {AK}, duration
Next-Of-Kin: name {PPK}, phone, relationship
Department: name {PK}, phone, faxNo, location
'Global' EER diagram
[Figure: EER diagram. Entities: Course (cCode), Staff (staffNo), Department (deptName),
Module (mCode), Student (matricNo), and Next-Of-Kin (name). Relationships: Runs, Manages, Leads,
Employs, Coordinates, Teaches, Enrol, Uses, Undertake, and Has. Relationship attributes shown:
startDate, hours (on Teaches), and performance (on Undertake).]
3. Logical Design (map ER to Relational)

1. Map entities and their attributes
Department {deptName, phone, faxNo, location}
Staff {staffNo, fName, lName, address, phone, officeNo, sex, salary, post, computerId}
Student {matricNo, fName, lName, town, street, postCode, dob, sex, loan}
Course {cCode, title, duration}
Module {mCode, title, startDate, endDate, coursework, exam}
Next-Of-Kin {matricNo, name, phone, relationship}
2. Final relational schema

Department(deptName, phone, faxNo, location, mgrStaffNo, mgrStartDate)

Primary key: deptName
Foreign key: mgrStaffNo references Staff(staffNo)
Staff(staffNo, fName, lName, address, phone, officeNo, sex, salary, post, computerId,
deptName)
Primary key: staffNo
Foreign key: deptName references Department(deptName)
Course(cCode, title, duration, leaderStaffNo, deptName)

Primary key: cCode
Foreign key: leaderStaffNo references Staff(staffNo)
Foreign key: deptName references Department(deptName)
Module(mCode, title, startDate, endDate, coursework, exam, courseCode, cordStaffNo)

Primary key: mCode
Foreign key: courseCode references Course(cCode)
Foreign key: cordStaffNo references Staff(staffNo)
Student(matricNo, fName, lName, town, street, postCode, dob, sex, loan, courseCode)
Primary key: matricNo
Foreign key: courseCode references Course(cCode)
Next-Of-Kin(matricNo, name, phone, relationship)

Primary key: matricNo, name
Foreign key: matricNo references Student(matricNo)
Undertake(stdMatricNo, moduleCode, performance)

Primary key: stdMatricNo, moduleCode
Foreign key: stdMatricNo references Student(matricNo)
Foreign key: moduleCode references Module(mCode)
Teaches(teachStaffNo, moduleCode, hours)
Primary key: teachStaffNo, moduleCode
Foreign key: teachStaffNo references Staff(staffNo)
Foreign key: moduleCode references Module(mCode)
Texts(moduleCode, text)
Primary key: moduleCode, text
Foreign key: moduleCode references Module(mCode)
Qualifications(qualStaffNo, qualification)
Primary key: qualStaffNo, qualification
Foreign key: qualStaffNo references Staff(staffNo)
4. Normalisation
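Functional dependencies
fd1: (matricNo, moduleTitle) → {name, sex, moduleStartDate, performance, flatNo, address}
fd2: matricNo → {name, sex, flatNo, address}
fd3: moduleTitle → moduleStartDate
fd4: flatNo → address
Relation:
(matricNo, name, sex, moduleTitle, moduleStartDate, performance, flatNo, address)
• Primary key: (matricNo, moduleTitle)
• 2NF – fd2 and fd3 are partial dependencies on the primary key and so violate the definition
of 2NF. Decompose the relation:
(matricNo, name, sex, flatNo, address), primary key matricNo (fd2, fd4)
(moduleTitle, moduleStartDate), primary key moduleTitle (fd3)
(matricNo, moduleTitle, performance), primary key (matricNo, moduleTitle) (fd1)
• 3NF – fd4 is a transitive dependency (matricNo → flatNo → address) and so violates the
definition of 3NF. Decompose the relation:
(matricNo, name, sex, flatNo), primary key matricNo, foreign key flatNo (fd2)
(flatNo, address), primary key flatNo (fd4)
(moduleTitle, moduleStartDate), primary key moduleTitle (fd3)
(matricNo, moduleTitle, performance), primary key (matricNo, moduleTitle), foreign keys
matricNo and moduleTitle (fd1)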
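The key and the 2NF violations identified above can be checked mechanically with an attribute-closure computation. The following Python sketch is only an illustration; the function and variable names are assumptions introduced here, and the dependencies are those listed in this answer (including flatNo → address).

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs of sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency whenever its left-hand side is already covered.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


ALL_ATTRS = {"matricNo", "name", "sex", "moduleTitle", "moduleStartDate",
             "performance", "flatNo", "address"}
KEY = {"matricNo", "moduleTitle"}
FDS = [
    (KEY, {"name", "sex", "moduleStartDate", "performance", "flatNo", "address"}),
    ({"matricNo"}, {"name", "sex", "flatNo", "address"}),
    ({"moduleTitle"}, {"moduleStartDate"}),
    ({"flatNo"}, {"address"}),
]

# The proposed primary key determines every attribute of the relation.
is_superkey = closure(KEY, FDS) == ALL_ATTRS

# Determinants that are proper subsets of the key give partial dependencies (2NF violations).
partial = [lhs for lhs, _ in FDS if lhs < KEY]
```

Here `partial` picks out matricNo and moduleTitle as the determinants of the partial dependencies, while flatNo → address is left over as the transitive dependency dealt with at the 3NF step.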
5. Query the Database
(a) SELECT *
FROM Department
WHERE location = 'E Block';
(b) SELECT Module.title, startDate, endDate
FROM Course, Module
WHERE (cCode = courseCode) AND (Course.title = 'PgDIT');
(c) SELECT fName, lName, address, salary
FROM Staff, Department
WHERE (mgrStaffNo = staffNo) AND (sex = 'F');
(d) SELECT fName, lName, sex, salary
FROM Staff, Qualifications
WHERE (staffNo = qualStaffNo) AND (post = 'lecturer') AND
(qualification = 'PhD');
(e) SELECT fName, lName, post, qualification
FROM Staff, Qualifications, Department
WHERE (qualStaffNo = staffNo) AND (Staff.deptName = Department.deptName)
AND (Department.deptName = 'CIS');
(f) SELECT matricNo, lName, sex
FROM Student, Undertake, Module
WHERE (matricNo = stdMatricNo) AND (mCode = moduleCode) AND
(title = 'multi-media')
ORDER BY lName;
(g) SELECT staffNo, lName, sex, post
FROM Staff
WHERE salary > (SELECT AVG(salary)
FROM Staff);
(h) SELECT title, COUNT(*) AS [No of Students]
FROM Student, Course
WHERE (cCode = courseCode)
GROUP BY title
HAVING COUNT(*) > 10;
(i) SELECT sex, COUNT(sex)
FROM Staff
WHERE deptName = 'CIS'
GROUP BY sex;
(j) SELECT lName, title, hours
FROM Staff, Module, Teaches
WHERE (Staff.staffNo = Teaches.teachStaffNo) AND
(Module.mCode = Teaches.moduleCode) AND
(hours > 6);
(k) Crosstab query (see database system).
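The crosstab itself is left to the database system used. Where no crosstab facility is available, a similar result can usually be produced with conditional aggregation. The sketch below uses Python's sqlite3 module purely as a test harness; the sample rows, the 'Maths' department, and the column aliases femaleStaff and maleStaff are assumptions made for illustration, and only the Staff column names come from the schema above.

```python
import sqlite3

# In-memory database with hypothetical sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Staff (staffNo TEXT PRIMARY KEY, sex TEXT, deptName TEXT)")
conn.executemany(
    "INSERT INTO Staff VALUES (?, ?, ?)",
    [("S1", "F", "CIS"), ("S2", "M", "CIS"), ("S3", "F", "CIS"), ("S4", "M", "Maths")],
)

# One row per department and one counted column per sex.
crosstab_sql = """
    SELECT deptName,
           SUM(CASE WHEN sex = 'F' THEN 1 ELSE 0 END) AS femaleStaff,
           SUM(CASE WHEN sex = 'M' THEN 1 ELSE 0 END) AS maleStaff
    FROM Staff
    GROUP BY deptName
    ORDER BY deptName
"""
rows = conn.execute(crosstab_sql).fetchall()
```

A department with no staff would not appear in this result; an outer join from Department would be needed to list it with zero counts.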

Part 5 Selected Database Issues

Chapter 20 Security and Administration
20.1 Describe a general plan of action for initiating a security policy, elaborating on each stage that
might be undertaken.
First of all, the need for a security policy must be appreciated, and there must be commitment on
the part of senior managers. Depending on course coverage, an IT security team may be formed to
oversee the development of the policy. The team may decide on an information classification
exercise for the area under consideration and then carry out a risk analysis. Following on from
this, the policy is prepared, specific responsibilities are identified, and standards and
procedures are formulated for implementation. The whole process is iterative: the policy should
be continually refined. Certain aspects of the plan should be elaborated, such as how information
might be classified, how risk analysis might be carried out, and what the policy should cover.
20.2 Detail the types of problems associated with microcomputer security and the types of
countermeasures that could be installed against loss.
The data a PC holds may be considerably more valuable than the machine itself. We are
concerned here with both data security and physical security. Some obvious precautions concern
the data: for example, careful storage of the media and regular backups that are labeled and, if
appropriate, classified. Working procedures should be appropriately defined. Obvious physical
security measures include fixing the machine to a surface, using locks, and/or fitting an alarm.
Other measures include using security programs, careful disposal of old media and equipment, and
staff training.
20.3 Discuss the problems associated with microcomputer security, and contrast the measures required
to provide a secure environment with those of a mainframe computing environment.
The problems concern data security and physical security. Problems and precautions in dealing
with these are generally as for the second part of the previous question. It is important that a
contrast is made with a mainframe environment. For example, in dealing with microcomputers, you
are dealing with individual machines and staff, possibly spread over a wide area. Much of the
responsibility rests with members of staff; consequently, training for all staff is important. In
the mainframe environment it is possible to set up centralized controls (physical and logical).
Consequently, it should be an easier environment to control, with the responsibility residing
with the IT manager.
20.4 Discuss the types of threat that might occur within the general database environment, and
indicate the measures that could be taken to safeguard against them.
The answer should cover the broad categories of theft and fraud, loss of confidentiality, loss of
privacy, loss of integrity, and loss of availability. These threats may be either accidental or
intentional. Examples should be given together with the type of measure that could be put in
place. For example, consider a programmer altering programs or data: programmers should not
normally have access to live programs or data, and any access should be properly authorized and
recorded in enough detail for it to be audited. Adequate audit controls should also uncover any
odd functioning of the system.
20.5 Explain the integrity features that a database management system may provide making reference
to the system used, and indicate the disadvantages that arise where they are not available.
This answer also requires knowledge of the particular DBMS that is being used. The basic
integrity features, such as domains, primary key specification, nulls, and referential integrity,
can be explained with reference to how the particular DBMS provides them. The advantages of
having the DBMS handle these features should be appreciated in order to determine the
disadvantages of their absence. For example, validation routines would need to be included in all
relevant programs; constraints that are buried in programming code are difficult to understand;
and effort is duplicated.
20.6 The increasing accessibility of databases on the Internet and intranets requires a reanalysis and
extension of the normal approaches to security. Discuss some of the issues associated with the
database system security in these environments.
See Section 20.5.

Chapter 22 Transaction Management
22.1 (a) Locking-based algorithms for concurrency control can be employed to synchronize the
execution of transactions. Explain what is meant by a serializable schedule and show
that the following locking-based schedule is not serializable:
S = [wl1(y), wl2(y), R1(y), W1(y), R2(y), W2(y), rl1(y), rl2(y), wl2(z), R2(z), W2(z), rl2(z), C2,
wl1(z), R1(z), W1(z), rl1(z), C1]
where Ri(x)/Wi(x) indicates a read/write by transaction i on data item x,
rli(x)/wli(x) indicates a release/write lock by transaction i on item x, and
Ci indicates a commit operation by transaction i.
A schedule S is said to be serializable if all the reads and writes of each transaction can be
reordered in such a way that, when they are grouped together as in a serial schedule, the net
effect of executing this reorganized schedule is the same as that of the original schedule S.
This reorganized schedule is called the equivalent serial schedule. A serializable schedule will
therefore be equivalent to, and have the same effect on the database as, some serial schedule.
Use a precedence graph to show that a cycle exists in the schedule: T1 writes y before T2 reads
it (edge T1 → T2), and T2 writes z before T1 reads it (edge T2 → T1).
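The precedence-graph argument can also be checked mechanically. The following Python sketch is an illustration only: the function names are assumptions introduced here, and the parser relies on the notation of this question (upper-case R and W for reads and writes, with lock, release, and commit operations ignored).

```python
import re

SCHEDULE = ("wl1(y), wl2(y), R1(y), W1(y), R2(y), W2(y), rl1(y), rl2(y), "
            "wl2(z), R2(z), W2(z), rl2(z), C2, "
            "wl1(z), R1(z), W1(z), rl1(z), C1")


def data_ops(schedule):
    """Extract (kind, transaction, item) for reads and writes only."""
    return [(k, int(t), x) for k, t, x in re.findall(r"\b([RW])(\d+)\((\w+)\)", schedule)]


def precedence_edges(ops):
    """Edge Ti -> Tj when an operation of Ti conflicts with a later operation of Tj."""
    edges = set()
    for i, (k1, t1, x1) in enumerate(ops):
        for k2, t2, x2 in ops[i + 1:]:
            # Two operations conflict if they come from different transactions,
            # touch the same item, and at least one of them is a write.
            if t1 != t2 and x1 == x2 and "W" in (k1, k2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as a set of edges."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return False
        if node in visiting:
            return True
        visiting.add(node)
        found = any(visit(nxt) for nxt in graph.get(node, ()))
        visiting.discard(node)
        done.add(node)
        return found

    return any(visit(node) for node in list(graph))


edges = precedence_edges(data_ops(SCHEDULE))
not_serializable = has_cycle(edges)
```

For this schedule the edge set contains both (1, 2) and (2, 1), so the graph has a cycle and the schedule is not conflict serializable.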
(b) Identify the problem with the above schedule, and produce a correct locking-based
serializable schedule.
The problem is that the locking algorithm releases the locks held by a transaction as soon as the
associated database command has been executed and the locked item no longer needs to be
accessed. However, the transaction itself goes on to lock other items after it has released its
lock on the data item. This permits transactions to interfere with one another, resulting in the
loss of total isolation and atomicity. Two-phase locking is needed. Correct schedule (for
example):
S = [wl1(y), wl1(z), R1(y), W1(y), R1(z), W1(z), rl1(y), rl1(z), C1,
wl2(y), wl2(z), R2(y), W2(y), R2(z), W2(z), rl2(y), rl2(z), C2]
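The two-phase rule (no lock may be acquired by a transaction once it has released a lock) is easy to test against both schedules. The Python sketch below is illustrative only; the function name is an assumption, and the parser relies on the wl/rl notation used in this question.

```python
import re

ORIGINAL = ("wl1(y), wl2(y), R1(y), W1(y), R2(y), W2(y), rl1(y), rl2(y), "
            "wl2(z), R2(z), W2(z), rl2(z), C2, "
            "wl1(z), R1(z), W1(z), rl1(z), C1")
CORRECTED = ("wl1(y), wl1(z), R1(y), W1(y), R1(z), W1(z), rl1(y), rl1(z), C1, "
             "wl2(y), wl2(z), R2(y), W2(y), R2(z), W2(z), rl2(y), rl2(z), C2")


def is_two_phase(schedule):
    """True if no transaction acquires a lock (wl) after it has released one (rl)."""
    released = set()
    for kind, txn in re.findall(r"\b(wl|rl)(\d+)\(", schedule):
        if kind == "rl":
            released.add(txn)      # the transaction has entered its shrinking phase
        elif txn in released:
            return False           # a lock requested during the shrinking phase
    return True


original_ok = is_two_phase(ORIGINAL)
corrected_ok = is_two_phase(CORRECTED)
```

The original schedule fails the check because T2 requests wl2(z) after rl2(y); the corrected schedule passes.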
22.2 ‘One of the potential advantages of Distributed Database Management Systems is improved
reliability and availability.’
(a) The consistency and reliability aspects of transactions are due to the ‘ACIDity’
properties of transactions. Discuss each of these properties and how they relate to the
concurrency control and recovery mechanisms. Give examples to illustrate your answer.
A – Atomicity: the all-or-nothing property; part of recovery control.
C – Consistency (correctness): a transaction is a correct program that maps a database from
one consistent database state to another; part of concurrency control.
I – Isolation: requires each transaction to see a consistent database at all times; part of
concurrency control.
D – Durability: ensures that once a transaction commits, its results are permanent and cannot
be erased from the database; part of recovery control.
22.3 (a) Produce a wait-for-graph for the transactions with locking information shown in Table
1. What can you conclude from this graph?
Transaction   Data items locked by transaction   Data items transaction is waiting for
T1            X2                                 X1
T2            X3, X10                            X7, X8
T3            X8                                 X4
T4            X7                                 X1
T5            X1, X5                             X3
T6            X4, X9                             X6
T7            X6                                 X5
Table 1
The WFG has cycles (for example, T2 → T4 → T5 → T2), implying that deadlock exists.
(b) Compare and contrast the approaches to deadlock management in database systems.
Deadlock free => claim all resources at once.

Deadlock prevention => order resources.
Deadlock detection => generate WFGs. Break cycle(s) by rolling back one or more transactions
and restarting them.
22.4 The locking information for several transactions is shown in Table 2. Produce a wait-for-graph
(WFG) for the transactions and determine whether deadlock exists.
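Transaction   Data items locked by transaction   Data items transaction is waiting for
T1            X2                                 X1
T2            X3, X10                            X7, X8
T3            X8                                 X4
T4            X7                                 X1
T5            X1, X5                             X3
T6            X4, X9                             X6
T7            X6                                 X5
Table 2
WFG edges (Ti → Tj means that Ti is waiting for an item locked by Tj):
T1 → T5, T2 → T3, T2 → T4, T3 → T6, T4 → T5, T5 → T2, T6 → T7, T7 → T5
The WFG contains cycles (for example, T2 → T4 → T5 → T2), so the system is in deadlock.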
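The wait-for-graph can be derived directly from the locking table and searched for a cycle. The Python sketch below is one possible way to do this; the function names are assumptions introduced for illustration, and the data are the rows of Table 2.

```python
HELD = {"T1": {"X2"}, "T2": {"X3", "X10"}, "T3": {"X8"}, "T4": {"X7"},
        "T5": {"X1", "X5"}, "T6": {"X4", "X9"}, "T7": {"X6"}}
WAITING = {"T1": {"X1"}, "T2": {"X7", "X8"}, "T3": {"X4"}, "T4": {"X1"},
           "T5": {"X3"}, "T6": {"X6"}, "T7": {"X5"}}


def wait_for_graph(held, waiting):
    """Map each transaction to the transactions holding the items it waits for."""
    owner = {item: txn for txn, items in held.items() for item in items}
    return {txn: sorted(owner[i] for i in items if i in owner)
            for txn, items in waiting.items()}


def find_cycle(graph):
    """Return one cycle as a list of nodes (first node repeated at the end), or None."""
    state = {}   # node -> "active" while on the current path, "done" afterwards
    path = []

    def visit(node):
        state[node] = "active"
        path.append(node)
        for nxt in graph.get(node, ()):
            if state.get(nxt) == "active":
                return path[path.index(nxt):] + [nxt]
            if nxt not in state:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        state[node] = "done"
        path.pop()
        return None

    for node in graph:
        if node not in state:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


wfg = wait_for_graph(HELD, WAITING)
cycle = find_cycle(wfg)
deadlocked = cycle is not None
```

Any cycle returned identifies a set of transactions from which a victim could be chosen for rollback, as described under deadlock detection in 22.3 (b).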
22.5 Locking-based algorithms for concurrency control can be employed to synchronize the execution
of transactions. Explain the rules for two-phase locking in a centralized Database Management
System and why each of these is necessary to avoid the database becoming inconsistent.
See Section 22.2.3.
22.6 A taxonomy of concurrency control algorithms can classify algorithms as pessimistic or
optimistic. Compare and contrast these algorithms.
Optimistic: assume that not too many transactions will conflict with one another. Delay the
synchronization of transactions until their termination.
Pessimistic: assume that many transactions will conflict with each other. Synchronize the
execution of transactions early in their lifecycle.
Student Project
Assignment – Transaction Management Design and Implementation
Introduction
The objective of this project is to design and implement part of a Relational Database Management System.
The work is to be undertaken in groups; it is left to you to form groups of two or three.
Specification of Requirements
Design and implement an interface to the proprietary Relational Database Management System, Oracle, on
the SUN workstations, which will provide layers for:
(a) authorization
(b) concurrency control
(c) deadlock management
(d) recovery control.
The interface will be implemented using embedded SQL based on the architecture of a client–server model.
There will be a regular project meeting scheduled for each group, at which time the group will present an
outline of the work carried out since the last meeting with a list of current problems/suggested solutions.
Marking Scheme
Design (including use of correct algorithms) 30

Test Strategy 10
Implementation 30
Critical Appraisal 10
Project Working 10
Demonstration 10
Chapter 23 Query Processing
Client (clientNo, cName, cAddress, cTelNo, cFaxNo, sex, DOB, officeNo)

Item (itemNo, itemDescription, itemSize, itemColor, itemPrice, itemType)
ItemType (itemType, typeDescription)
ClientOrder (orderNo, clientNo, dateOrder, dateDeliver)
OrderDetail (orderNo, itemNo, noOfItems)
Staff (staffNo, sName, sAddress, sTelNo, sex, DOB, NIN, taxCode, salary, officeNo)
Office (officeNo, oAddress, oTelNo, oFaxNo, mgrStaffNo, areaNo)
Area (areaNo, areaDescription)
For a more complete description, see Case Study 3 under Chapter 24.
23.1 Using the above relational schema, determine whether the following query is both type and
semantically correct:
SELECT CO.orderNo, C.cAddress, OD.itemNo
FROM OrderDetail OD, ClientOrder CO, Client C, Office O, Area A
WHERE CO.clientNo = C.clientNo AND
C.officeNo = O.officeNo AND
O.areaNo = A.areaNo AND
A.areaDescription = 'SE' AND
C.cName = 'J. Smith';
The query graph will show that OrderDetail has no join to the other relations, although it is
connected to the Result. The conclusion should be that the query is not semantically correct.
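The connectivity test on the join part of the query graph can be sketched as follows. This Python fragment is illustrative only: the function name is an assumption, the nodes are the aliases in the FROM clause, and the edges are the three join predicates in the WHERE clause.

```python
RELATIONS = {"OD", "CO", "C", "O", "A"}
JOINS = [("CO", "C"), ("C", "O"), ("O", "A")]   # one edge per join predicate


def components(nodes, edges):
    """Group the nodes of an undirected graph into connected components."""
    neighbours = {n: set() for n in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    seen, groups = set(), []
    for start in sorted(nodes):
        if start in seen:
            continue
        group, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in group:
                group.add(n)
                stack.extend(neighbours[n] - group)
        seen |= group
        groups.append(group)
    return groups


groups = components(RELATIONS, JOINS)
join_graph_connected = len(groups) == 1
```

More than one component means that some relation (here OrderDetail) takes part in the query without being joined to the rest, which is the condition flagged in the answer above.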
23.2 Consider the above relational schema. Map the following query onto a relational algebra tree,
and then transform it into a reduced query:
SELECT CO.orderNo, C.cAddress, O.oAddress
FROM ClientOrder CO, Client C, Office O, Area A
WHERE CO.clientNo = C.clientNo AND
C.officeNo = O.officeNo AND
CO.dateDeliver < '1-Jun-96' AND
O.areaNo = A.areaNo AND
A.areaDescription = 'NE' AND
C.cName = 'J. Smith' AND
CO.dateOrder > '1-Jan-96';
Give a full explanation of the reasoning behind each step and state any transformation rules
used during the reduction process.
Tree with Selections pushed down to the base relations (children are indented under their
parent; π = Projection, σ = Selection, ⋈ = Join):
π CO.orderNo, cAddress, oAddress
  ⋈ areaNo
    ⋈ officeNo
      ⋈ clientNo
        σ dateOrder > '1-Jan-96' AND dateDeliver < '1-Jun-96'
          CO
        σ cName = 'J. Smith'
          C
      O
    σ areaDescription = 'NE'
      A
Reduced tree with Projections pushed down:
π CO.orderNo, cAddress, oAddress
  ⋈ areaNo
    π orderNo, cAddress, oAddress, areaNo
      ⋈ officeNo
        π orderNo, cAddress, officeNo
          ⋈ clientNo
            π clientNo, orderNo
              σ dateOrder > '1-Jan-96' AND dateDeliver < '1-Jun-96'
                CO
            π clientNo, cAddress, officeNo
              σ cName = 'J. Smith'
                C
        π officeNo, oAddress, areaNo
          O
    π areaNo
      σ areaDescription = 'NE'
        A
Owner (ownerNo, oName, oAddress, oTelNo)

Pet (petNo, petName, petDescription, petSex, petDOB, dateRegistered,
ownerNo, surgeryNo)
Staff (staffNo, sName, sAddress, sTelNo, sex, DOB, position, NIN, taxCode, salary,
surgeryNo)
Surgery (surgeryNo, surgeryAddress, surgeryTelNo, surgeryFaxNo, mgrStaffNo,
areaNo)
Prescription (prescNo, petNo, medNo, prescStaffNo, adminMethod, unitsPerDay,
startDate, finishDate)
Medication (medNo, medName, description, dosage, cost)
Area (areaNo, areaName)
SELECT P.petNo, petName, oName, O.TelNo

FROM Pet P, Owner O, Prescription Pr, Medication M, Surgery S
WHERE Pr.medNo = M.medNo AND
S.surgeryNo = O.surgeryNo AND
PR.petNo = P.petNo AND
P.ownerNo = O.ownerNo AND
M.medNo = ‘J. Smith’ AND adminMethod = ‘Oral’ AND
S.surgeryNo = 100;
Various problems with the query:
surgeryNo is not part of the Owner table;
medNo is not a textual attribute but an integer;
TelNo is not part of Owner (it is oTelNo).
SELECT P.petNo, petName, oName, oAddress
FROM Pet P, Owner O, Prescription PR, Medication M
WHERE PR.medNo = M.medNo AND
PR.petNo = P.petNo AND
P.ownerNo = O.ownerNo AND
medName = 'Provac' AND
unitsPerDay > 200 AND
petDescription = 'Setter';
Canonical tree (children are indented under their parent; π = Projection, σ = Selection,
⋈ = Join):
π p.petNo, petName, oName, oAddress
  σ petDescription = 'Setter'
    σ unitsPerDay > 200
      σ medName = 'Provac'
        ⋈ ownerNo
          O
          ⋈ petNo
            P
            ⋈ medNo
              PR
              M
Tree with Selections pushed down:
π p.petNo, petName, oName, oAddress
  ⋈ ownerNo
    O
    ⋈ petNo
      σ petDescription = 'Setter'
        P
      ⋈ medNo
        σ unitsPerDay > 200
          PR
        σ medName = 'Provac'
          M
Reduced tree with Projections pushed down:
π p.petNo, petName, oName, oAddress
  ⋈ ownerNo
    π ownerNo, oName, oAddress
      O
    ⋈ petNo
      π p.petNo, petName, ownerNo
        σ petDescription = 'Setter'
          P
      π p.petNo
        ⋈ medNo
          π petNo, medNo
            σ unitsPerDay > 200
              PR
          π medNo
            σ medName = 'Provac'
              M
23.5 Now assume that the relation Medication given in Question 23.4 is horizontally fragmented
as follows:
M1 = σ medName = 'Provac' (Medication)
M2 = σ medName != 'Provac' (Medication)
and that Prescription is horizontally derived from Medication:
PRi = Prescription ⋉ medNo Mi, 1 ≤ i ≤ 2
Transform the relational algebra tree from Question 23.4 into a reduced query on fragments.
The medNo subtree with each relation replaced by the union of its fragments (children are
indented under their parent):
⋈ medNo
  π petNo, medNo
    σ unitsPerDay > 200
      ∪
        PR1
        PR2
  π medNo
    σ medName = 'Provac'
      ∪
        M1
        M2
The Selection medName = 'Provac' over M1 ∪ M2 reduces to M1:
⋈ medNo
  π petNo, medNo
    σ unitsPerDay > 200
      ∪
        PR1
        PR2
  π medNo
    M1
PR2 is derived from M2 and cannot join with M1, so the tree reduces to:
⋈ medNo
  π petNo, medNo
    σ unitsPerDay > 200
      PR1
  π medNo
    M1
Client (clientNo, cName, cAddress, cTelNo, cFaxNo, officeNo)

Unit (unitRegNo, unitDescription, maxPayload, officeNo)
Trailer (trailerNo, trailerDescription, trailerLength, maxCarryingWt, officeNo)
ClientOrder (orderNo, clientNo, dateOrder, collectDate, collectAddress, deliveryDate,
deliveryAddress, loadWeight, loadDescription)
TransportReq (orderNo, unitRegNo, trailerNo)
officeNo)
Office (officeNo, oAddress, oTelNo, oFaxNo, countryNo)
Country (countryNo, countryName)
SELECT CO.clientNo, cAddress

FROM TransportReq TR, ClientOrder CO, Client C, Unit U, Trailer T
TR.unitRegNo = U.unitRegNo AND
TR.trailerNo = T.trailerNo AND
maxCarryingWt < maxPayload AND
LoadWeight < maxPayload;
The query graph will show that the join graph is connected to the Result. The conclusion should
be that the query is semantically correct.
SELECT trailerDescription, unitDescription
FROM Trailer T, Unit U, Office O, TransportReq TR
WHERE TR.trailerNo = T.trailerNo AND
TR.unitRegNo = U.unitRegNo AND
U.officeNo = O.officeNo AND
maxPayload > maxCarryingWt AND
O.officeNo = 2;
Canonical tree (children are indented under their parent; π = Projection, σ = Selection,
⋈ = Join):
π trailerDescription, unitDescription
  σ officeNo = 2
    σ maxPayload > maxCarryingWt
      ⋈ officeNo
        ⋈ unitRegNo
          ⋈ trailerNo
            T
            TR
          U
        O
Tree with Selections pushed down:
π trailerDescription, unitDescription
  ⋈ officeNo
    σ maxPayload > maxCarryingWt
      ⋈ unitRegNo
        ⋈ trailerNo
          σ officeNo = 2
            T
          TR
        σ officeNo = 2
          U
    σ officeNo = 2
      O
Reduced tree with Projections pushed down:
π trailerDescription, unitDescription
  ⋈ officeNo
    σ maxPayload > maxCarryingWt
      π trailerDescription, unitDescription, maxPayload, maxCarryingWt, officeNo
        ⋈ unitRegNo
          π trailerDescription, maxCarryingWt, officeNo, unitRegNo
            ⋈ trailerNo
              σ officeNo = 2
                T
              TR
          σ officeNo = 2
            U
    π officeNo
      σ officeNo = 2
        O
23.8 Now assume that the relation Trailer given in Question 23.7 is horizontally fragmented as
follows:
T1 = σ officeNo < 2 (Trailer)
T2 = σ officeNo = 2 (Trailer)
T3 = σ officeNo > 2 (Trailer)
and that TransportReq is horizontally derived from Trailer:
TRi = TransportReq ⋉ trailerNo Ti, 1 ≤ i ≤ 3
Transform the relational algebra tree from Question 23.7 into a reduced query on fragments.
Tree with each fragmented relation replaced by the union of its fragments (children are
indented under their parent):
⋈ officeNo
  σ maxPayload > maxCarryingWt
    ⋈ unitRegNo
      ⋈ trailerNo
        σ officeNo = 2
          ∪
            T1
            T2
            T3
        ∪
          TR1
          TR2
          TR3
      σ officeNo = 2
        U
  O
Using the fact that the Selection operation on (officeNo = 2) will not produce any tuples from
T1 and T3, and that the derived fragments TR1 and TR3 cannot join with T2, this will simplify
to the following tree:
⋈ officeNo
  σ maxPayload > maxCarryingWt
    ⋈ unitRegNo
      ⋈ trailerNo
        T2
        TR2
      σ officeNo = 2
        U
  O
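The elimination step can be expressed as a simple test of each fragment's defining predicate against the selected value. The Python sketch below is a deliberately small illustration that only handles an equality Selection on a single attribute; the function and variable names are assumptions introduced here.

```python
import operator

# Defining predicates of the Trailer fragments, each on officeNo.
FRAGMENTS = {
    "T1": (operator.lt, 2),   # officeNo < 2
    "T2": (operator.eq, 2),   # officeNo = 2
    "T3": (operator.gt, 2),   # officeNo > 2
}


def relevant_fragments(fragments, selected_value):
    """Keep the fragments whose defining predicate is satisfied by the selected value."""
    return [name for name, (op, bound) in fragments.items() if op(selected_value, bound)]


# Selection officeNo = 2 from the query.
trailer_fragments = relevant_fragments(FRAGMENTS, 2)

# Derived fragments follow their owner: TRi only joins with Ti.
transport_fragments = [name.replace("T", "TR") for name in trailer_fragments]
```

Only T2 and TR2 survive, which matches the simplified tree.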
Part 6 Distributed DBMSs and Replication

Chapter 24 Distributed DBMSs – Concepts and Design
Case Study 1 – Real Estate Agency
A large real estate agency has decided to distribute its project management information at the regional
level. A part of the current centralized relational schema is as follows:
Employee (NIN, fName, lName, address, DOB, sex, salary, taxCode, agencyNo)
Agency (agencyNo, agencyAddress, managerNIN, propertyTypeNo,
regionNo)
Property (propertyNo, propertyTypeNo, propertyAddress, ownerNo,
askingPrice, agencyNo, contactNIN)
Owner (ownerNo, fName, lName, ownerAddress, ownerTelNo)
PropertyType (propertyTypeNo, propertyTypeName)
Region (regionNo, regionName)
where Employee contains employee details and the national insurance number NIN is the key.
Agency contains agency details and agencyNo is the key. managerNIN identifies
the employee who is the manager of the agency. There is only one manager
for each agency; an agency only handles one type of property.
Property contains details of the properties the company is dealing with and the key is
propertyNo. The agency that deals with the property is given by
agencyNo, and the contact in the estate agents by contactNIN; owner is
given by ownerNo.
Owner contains details of the owners of the properties and the key is ownerNo.
PropertyType contains the names of the property types and the key is propertyTypeNo.
and Region contains names of the regions and the key is regionNo.
Agencies are grouped regionally as follows:
Region 1: North; Region 2: South; Region 3: East; Region 4: West
Information is required by property type, which covers: Domestic, Industrial, and Letting. There are no
Industrial properties in the South and all Letting properties are in the West of Scotland. Properties are
handled by the local estate agents office. As well as distributing the data on a regional basis, there is an
additional requirement to access the employee data either by personal information (by Personnel) or by
salary-related information (by Payroll).
24.1 (a) Draw an Entity–Relationship Diagram for the above case study.
[Figure: ER diagram with entities Employee (NIN), Region (regionNo), Agency (agencyNo),
Property (propertyNo), Owner (ownerNo), and PropertyType (propertyTypeNo), linked by the
relationships Contains (Region–Agency), Runs (Agency–Property), Owns (Owner–Property), TypeFor,
Handles, Manages, and Has.]
(b) Using this diagram from (a) above, produce a distributed database design for the system
that satisfies the correctness rules for fragmentation and include:
(i) a suitable fragmentation schema for the system;

(ii) in the case of primary horizontal fragmentation, give a minimal set of
predicates;
(iii) the reconstruction of global relations from fragments.
Give a full explanation of the reasoning behind each step and state
any assumptions necessary to support your design.
Possible solution
Don’t fragment PropertyType/Region; replicate these relations at all sites, as they contain only
a small number of records.
Agency
Use primary horizontal fragmentation for Agency with minterm predicates:
{regionNo = 1 (‘North’) and propertyTypeNo = 1 (‘Domestic’),
regionNo = 1 (‘North’) and propertyTypeNo = 2 (‘Industrial’),
regionNo = 2 (‘South’) and propertyTypeNo = 1 (‘Domestic’),
regionNo = 3 (‘East’) and propertyTypeNo = 1 (‘Domestic’),
regionNo = 3 (‘East’) and propertyTypeNo = 2 (‘Industrial’),
regionNo = 4 (‘West’) and propertyTypeNo = 1 (‘Domestic’),
regionNo = 4 (‘West’) and propertyTypeNo = 2 (‘Industrial’),
regionNo = 4 (‘West’) and propertyTypeNo = 3 (‘Letting’)}
A1: σ regionNo = 1 and propertyTypeNo = 1 (Agency)
⋮
A8: σ regionNo = 4 and propertyTypeNo = 3 (Agency)
Reconstruction: A1 ∪ A2 ∪ A3 ∪ A4 ∪ A5 ∪ A6 ∪ A7 ∪ A8
Employee
Use vertical fragmentation for Employee:
E1: π NIN, fName, lName, address, DOB, sex, agencyNo (Employee)
E2: π NIN, salary, taxCode (Employee)
Reconstruction: E1 ⋈ NIN E2
Property and Owner
Use derived fragmentation for Property and Owner:
Pi: Property ⋉ Ai, 1 ≤ i ≤ 8
Oi: Owner ⋉ Pi, 1 ≤ i ≤ 8
Reconstruction: P1 ∪ P2 ∪ P3 ∪ P4 ∪ P5 ∪ P6 ∪ P7 ∪ P8
Reconstruction: O1 ∪ O2 ∪ O3 ∪ O4 ∪ O5 ∪ O6 ∪ O7 ∪ O8
Case Study 2 – Quack Consulting
Quack Consulting is a computer consulting firm that specializes in developing and installing PC-based
hardware/software systems. Quack Consulting has decided to distribute its project management
information at the regional level. A part of the current centralized relational schema is as follows:
Client (clientNo, fName, lName, address, regionNo, telNo)

Project (projectNo, clientNo, projectStart, projectEnd, managerNIN)
Speciality (skillID, skillDescription, costRate)
Consultant (NIN, fName, lName, address, DOB, sex, salary, taxCode, regionNo,
skillID)
Task (projectNo, consultantNIN, taskDescription, taskStart, taskEnd)
Time (projectNo, consultantNIN, date, hours)
Region (regionNo, regionName)
where Client contains client details and the client number clientNo is the key.
Project contains project details and the project number projectNo is the key.
Speciality contains details of all consultants’ skills (such as Project Management, C
Programming, SSADM) and the skill identifier skillID is the key.
Consultant contains consultant details and the national insurance number NIN is the key.
Task Projects are often quite complex and consist of separate tasks, each of which
consists of a description, a start and end date and the national insurance number
of the consultant assigned to the task. Only one consultant is ever assigned to a
task. The combination of projectNo/consultantNIN is the key.
Time contains details of the time a consultant has spent on a task and the key is the
combination of projectNo/consultantNIN/date.
and Region contains names of the regions and the key is regionNo.
Consultants are grouped regionally as follows:
Region 1: North; Region 2: South; Region 3: East; Region 4: West
In addition, clients are grouped into the same regions; projects are managed by the office closest to the
client. As well as distributing the data on a regional basis, there is an additional requirement to access the
employee data either by personal information (by Personnel) or by salary-related information (by Payroll).
[Figure: ER diagram with entities Time (NIN, projectNo, date), Project (projectNo),
Task (NIN, projectNo), Consultant (NIN), Speciality (skillID), Client (clientNo), and
Region (regionNo), linked by the relationships Has, HasTask (Project–Task), WorksOn
(Task–Consultant), For (Consultant–Speciality), Manages, Runs, Contains, and Handles
(Client–Region).]

predicates;
Give a full explanation of the reasoning behind each step and state any assumptions
necessary to support your design.
Possible solution
Don’t fragment Speciality/Region; replicate these relations at all sites, as they contain only a
small number of records.
Consultant
Use vertical fragmentation for Consultant first:
CA = π NIN, fName, lName, address, sex, regionNo, skillID (Consultant)
CB = π NIN, DOB, salary, taxCode (Consultant)
Then use primary horizontal fragmentation for Consultant with minterm predicates:
{regionNo = 1, regionNo = 2, regionNo = 3, regionNo = 4}
CA1 = σ regionNo = 1 (CA) (North)
CA2 = σ regionNo = 2 (CA) (South)
CA3 = σ regionNo = 3 (CA) (East)
CA4 = σ regionNo = 4 (CA) (West)
Reconstruction: CB ⋈ NIN (CA1 ∪ CA2 ∪ CA3 ∪ CA4)
Client
Use primary horizontal fragmentation for Client with minterm predicates:
{regionNo = 1, regionNo = 2, regionNo = 3, regionNo = 4}
Cl1 = σ regionNo = 1 (Client) (North)
Cl2 = σ regionNo = 2 (Client) (South)
Cl3 = σ regionNo = 3 (Client) (East)
Cl4 = σ regionNo = 4 (Client) (West)
Reconstruction: Cl1 ∪ Cl2 ∪ Cl3 ∪ Cl4
Task and Time
Use derived fragmentation for Task and Time:
Ti: Task ⋉ consultantNIN=NIN CAi, 1 ≤ i ≤ 4
Tmi: Time ⋉ consultantNIN=NIN CAi, 1 ≤ i ≤ 4
Reconstruction: T1 ∪ T2 ∪ T3 ∪ T4
Tm1 ∪ Tm2 ∪ Tm3 ∪ Tm4
Project
Use derived fragmentation for Project:
Pi: Project ⋉ clientNo Cli, 1 ≤ i ≤ 4
Reconstruction: P1 ∪ P2 ∪ P3 ∪ P4
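The correctness rules for fragmentation (completeness, reconstruction, and disjointness) can be checked on sample data. The following Python sketch does this for the primary horizontal fragmentation of Client by regionNo; the rows are hypothetical, and only the attribute names clientNo and regionNo come from the schema.

```python
from itertools import combinations

# Hypothetical Client rows reduced to (clientNo, regionNo).
CLIENT = {("C1", 1), ("C2", 2), ("C3", 2), ("C4", 3), ("C5", 4)}

# One fragment per minterm predicate regionNo = 1..4.
fragments = {r: {row for row in CLIENT if row[1] == r} for r in (1, 2, 3, 4)}

# Completeness and reconstruction: the union of the fragments gives back Client.
reconstructed = set().union(*fragments.values())
complete = reconstructed == CLIENT

# Disjointness: no row appears in more than one fragment.
disjoint = all(not (a & b) for a, b in combinations(fragments.values(), 2))
```

The same pattern applies to the other horizontally fragmented relations; for a vertical fragmentation the reconstruction check would use a join on the key instead of a union.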
Case Study 3 – InstantBuy
A home shopping catalog company called InstantBuy specializes in the provision of clothing and
household items for customers. InstantBuy has many offices throughout the UK and Eire to process
customer orders and has decided to distribute its operations according to areas of the UK and
Eire. There is also an additional requirement to distribute staff information according to area
and furthermore to allow access to staff data either by personal information (by Personnel) or
by salary-related information (by Payroll).
Client (clientNo, cName, cAddress, cTelNo, cFaxNo, Sex, DOB, officeNo)

Item (itemNo, itemDescription, itemSize, itemColor, itemPrice, itemTypeNo)
ItemType (itemTypeNo, typeDescription)
ClientOrder (orderNo, clientNo, dateOrder, dateDeliver)
OrderDetail (orderNo, itemNo, noOfItems)
Staff (staffNo, sName, sAddress, sTelNo, Sex, DOB, NIN, taxCode, salary,
officeNo)
Office (officeNo, oAddress, oTelNo, oFaxNo, mgrStaffNo, areaNo)
Area (areaNo, areaDescription)
where:
Client contains the details of clients and the client number (clientNo) is the key. Clients are
registered with the office nearest to their home.
Item contains the details of items and the item number (itemNo) is the key.
ItemType contains the description of types of items and item type number (itemTypeNo) is the key.
ClientOrder contains the details of client orders for clothing and/or household items and order
number (orderNo) is the key.
OrderDetail contains the details of items ordered and order number / item number (orderNo,
itemNo) is the key.
Staff contains the details of staff and staff number (staffNo) is the key.
Office contains the details of offices and office number (officeNo) is the key. Each office has a
Manager.
Area contains the description of areas of the UK and Eire and the key is area number
(areaNo).
InstantBuy offices are grouped into areas of the UK and Eire as follows:
Area 1: Scotland and Wales
Area 2: Eire and Northern Ireland
Area 3: North England
Area 4: South England
InstantBuy provides various types of items that fall into the following categories:
W: Womenswear
M: Menswear
C: Childrenswear
H: Household
InstantBuy only provides household items to customers in Scotland and Wales, and North England, and
does not provide childrenswear items in Eire and Northern Ireland.
[Figure: ER diagram for InstantBuy showing the entities Client (clientNo), ClientOrder (orderNo), OrderDetail (orderNo, itemNo), Office (officeNo), Area (areaNo), Item (itemNo), Staff (staffNo), and ItemType (itemTypeNo), and the relationship labels Orders, Contains, Handles, PartOf, HasTask, Has, Manages, and TypeFor.]

predicates;
Possible solution
Area/ItemType/Item Relations
No fragmentation.
Assumption – The Area/ItemType/Item relations will be used for reference purposes and will not be
subjected to frequent updates. Although some of the areas require access to only part of the Items table,
there are future plans to offer all of the item types in all of the areas.
Office/Client/ClientOrder/OrderDetail Relations
Minterm predicates for Office:
{areaNo = 1, areaNo = 2, areaNo = 3, areaNo = 4}
Office (primary horizontal fragmentation, PHF): O_i = σ_{areaNo = i}(Office), 1 ≤ i ≤ 4
Client (derived horizontal fragmentation, DHF): C_i = Client ⋉_{officeNo} O_i
ClientOrder (DHF): CO_i = ClientOrder ⋉_{clientNo} C_i
OrderDetail (DHF): OD_i = OrderDetail ⋉_{orderNo} CO_i
Staff – Mixed
S1 = Π_{staffNo, sName, sAddress, sTelNo, sex, officeNo}(Staff)
S2 = Π_{staffNo, DOB, NIN, taxCode, salary}(Staff)
then:
S1_i = S1 ⋉_{officeNo} O_i, 1 ≤ i ≤ 4
Reconstruction
Office = O_1 ∪ O_2 ∪ O_3 ∪ O_4
Client = C_1 ∪ C_2 ∪ C_3 ∪ C_4
ClientOrder = CO_1 ∪ CO_2 ∪ CO_3 ∪ CO_4
OrderDetail = OD_1 ∪ OD_2 ∪ OD_3 ∪ OD_4
Staff = (S1_1 ∪ S1_2 ∪ S1_3 ∪ S1_4) ⋈_{staffNo} S2
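Derived horizontal fragmentation, as used above for Client, amounts to a semijoin of the member relation with each fragment of the owner relation. The following is a minimal sketch under invented data: the office and client rows and the helper name semijoin are hypothetical and only illustrate the mechanism.

```python
# Hypothetical rows; only the attributes needed for the semijoin are shown.
OFFICE = [
    {"officeNo": 10, "areaNo": 1},
    {"officeNo": 20, "areaNo": 2},
    {"officeNo": 30, "areaNo": 1},
]
CLIENT = [
    {"clientNo": "C1", "officeNo": 10},
    {"clientNo": "C2", "officeNo": 20},
    {"clientNo": "C3", "officeNo": 30},
]


def semijoin(left, right, attr):
    """Return the rows of left that match some row of right on attr."""
    keys = {r[attr] for r in right}
    return [l for l in left if l[attr] in keys]


# Primary horizontal fragmentation of Office by area number.
office_frags = {i: [o for o in OFFICE if o["areaNo"] == i] for i in range(1, 5)}

# Derived horizontal fragmentation: Client follows the fragment of its office.
client_frags = {i: semijoin(CLIENT, office_frags[i], "officeNo") for i in range(1, 5)}
```

Because every client references exactly one office and every office falls in exactly one area, the derived fragments in this sketch are complete and disjoint, so their union reconstructs Client.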
A company called Complete Pet Care provides private health-care for domestic pets throughout the
UK. Complete Pet Care, which currently has over one hundred surgeries and opens a new surgery
almost every month, has decided to distribute its operations according to areas of the country (i.e.
Scotland, North England, South East England, South West England, Wales and Northern Ireland). The
company proposes to distribute staff details to the appropriate areas; however, staff payroll details will
be processed by the Head Office of Complete Pet Care, which is located in Scotland.
Owner (ownerNo, oName, oAddress, oTelNo)

Pet (petNo, petName, petDescription, petSex, petDOB, dateRegistered,
ownerNo, surgeryNo)
Staff (staffNo, sName, sAddress, sTelNo, sex, DOB, position, NIN, taxCode, salary,
surgeryNo)
Surgery (surgeryNo, surgeryAddress, surgeryTelNo, surgeryFaxNo, mgrStaffNo,
areaNo)
Prescription (prescNo, petNo, medNo, prescStaffNo, adminMethod, unitsPerDay,
startDate, finishDate)
Medication (medNo, medName, description, dosage, cost)
Area (areaNo, areaName)
where
Owner contains details of owners and the owner number (ownerNo) is the key. Owners are
registered with the nearest surgery in their area. Owners may own several pets.
Pet contains details of pets and the pet number (petNo) is the key. The owner is given by
the owner number (ownerNo) and the surgery by the surgery number (surgeryNo).
A pet can only be registered with one surgery at a time.
Staff contains details of staff and staff number (staffNo) is the key.
Surgery contains details of each surgery and the surgery number (surgeryNo) is the key. Each
surgery has a Manager represented as manager staff number (mgrStaffNo).
Prescription contains details of the prescriptions for the pets and the prescription number
(prescNo) is the key. The pet receiving the treatment is represented by the pet
number (petNo), the medication prescribed is represented by the medication number
(medNo) and the vet who prescribed the medication is given by the vet’s staff
number (prescStaffNo).
Medication contains details of medication prescribed to pets and the medication number
(medNo) is the key.
Area contains the names of each area and the area number (areaNo) is the key.
All primary key fields are physically implemented as integers.

[Figure: ER diagram for Complete Pet Care showing the entities Medication (medNo), Prescription (prescNo), Staff (staffNo), Owner (ownerNo), Pet (petNo), Surgery (surgeryNo), and Area (areaNo), and the relationships UsedIn, Prescribes, Receives, Owns, Manages, Has, CaresFor, and Contains.]
(i) a suitable fragmentation schema for the system;

predicates;
Possible solution
Area – Do not fragment.
Surgery – Primary horizontal fragmentation by area:
S_i = σ_{areaNo = i}(Surgery), i = 1..6
Minterm predicates: {areaNo = 1, areaNo = 2, areaNo = 3, areaNo = 4, areaNo = 5, areaNo = 6}
Pet – Horizontally derived from Surgery:
P_i = Pet ⋉_{surgeryNo} S_i, i = 1..6
Prescription – Horizontally derived from Pet:
PR_i = Prescription ⋉_{petNo} P_i, i = 1..6
Owner – Horizontally derived from Pet:
O_i = Owner ⋉_{ownerNo} P_i, i = 1..6
Medication – Horizontally derived from Prescription:
M_i = Medication ⋉_{medNo} PR_i, i = 1..6
Staff – Vertical fragmentation, then horizontally derived from Surgery:
ST1 = Π_{staffNo, sName, sAddress, sTelNo, surgeryNo}(Staff)
ST2 = Π_{staffNo, sex, DOB, position, NIN, taxCode, salary}(Staff)
ST1_i = ST1 ⋉_{surgeryNo} S_i, i = 1..6
Reconstructions:
Surgery: S_1 ∪ S_2 ∪ S_3 ∪ S_4 ∪ S_5 ∪ S_6
Pet: P_1 ∪ P_2 ∪ P_3 ∪ P_4 ∪ P_5 ∪ P_6
Prescription: PR_1 ∪ PR_2 ∪ PR_3 ∪ PR_4 ∪ PR_5 ∪ PR_6
Medication: M_1 ∪ M_2 ∪ M_3 ∪ M_4 ∪ M_5 ∪ M_6
Owner: O_1 ∪ O_2 ∪ O_3 ∪ O_4 ∪ O_5 ∪ O_6
Staff: (ST1_1 ∪ ST1_2 ∪ ST1_3 ∪ ST1_4 ∪ ST1_5 ∪ ST1_6) ⋈_{staffNo} ST2
A haulage company called Rapid Roads specializes in the transportation of loads throughout the UK
and Europe. Rapid Roads has many offices throughout the UK and Europe to process customer orders
and has decided to distribute its operations according to these countries. The company also proposes to
distribute staff details to the appropriate countries; however, staff payroll details will be processed by
the Head Office of Rapid Roads, which is located in the UK.
Client (clientNo, cName, cAddress, cTelNo, cFaxNo, officeNo)

Unit (unitRegNo, unitDescription, maxPayload, officeNo)
Trailer (trailerNo, trailerDescription, trailerLength, maxCarryingWt, officeNo)
ClientOrder (orderNo, clientNo, dateOrder, collectDate, collectAddress, deliveryDate,
deliveryAddress, loadWeight, loadDescription)
TransportReq (orderNo, unitRegNo, trailerNo)
Staff (staffNo, sName, sAddress, sTelNo, sex, DOB, position, NIN, taxCode, salary,
officeNo)
Office (officeNo, oAddress, oTelNo, oFaxNo, countryNo)
where
Client contains the details of clients and the client number (clientNo) is the key. Clients are
registered with an office in their country and nearest to the location of their company.
Unit contains the details of the unit that pulls one or two trailers and the registration
number (unitRegNo) is the key.
Trailer contains the description of the trailer that is pulled by the unit and the trailer number
(trailerNo) is the key.
ClientOrder contains the details of client orders for the transportation of a load from the collection
address (collectAddress) to the delivery address (deliveryAddress) and the order
number (orderNo) is the key.
TransportReq contains the details of the transportation (units plus trailers) required to fulfill a
client's order. The client's order number (orderNo) and the registration number of a
unit (unitRegNo) form the key.
Staff contains the details of staff and staff number (staffNo) is the key.
Office contains the details of offices and office number (officeNo) is the key. Each office
has a Manager.
Country contains the name of the country and country number (countryNo) is the key.
The offices of Rapid Roads are grouped into countries as follows:
Country 1 (C1): UK
Country 2 (C2): France
Country 3 (C3): Germany
Country 4 (C4): Switzerland
Country 5 (C5): Spain
Country 6 (C6): Italy
[Figure: ER diagram for Rapid Roads showing the entities Client (clientNo), ClientOrder (orderNo), TransportReq (orderNo, unitNo, trailerNo), Office (officeNo), Unit (unitNo), Trailer (trailerNo), and Country (countryNo), and the relationships Orders, Contains, Handles, HasUnit, HasTrailer, UPartOf, TPartOf, and Runs.]

predicates;
Possible solution
Country – Do not fragment.
Office – O_i = σ_{countryNo = i}(Office), i = 1..6
Minterm predicates: {countryNo = 1, countryNo = 2, countryNo = 3, countryNo = 4, countryNo = 5,
countryNo = 6}
Client/Unit/Trailer
C_i = Client ⋉_{officeNo} O_i, i = 1..6
U_i = Unit ⋉_{officeNo} O_i, i = 1..6
T_i = Trailer ⋉_{officeNo} O_i, i = 1..6
ClientOrder – CO_i = ClientOrder ⋉_{clientNo} C_i, i = 1..6
TransportReq – TR_i = TransportReq ⋉_{orderNo} CO_i, i = 1..6
Staff – S1 = Π_{staffNo, sName, sAddress, sTelNo, sex, DOB, position, officeNo}(Staff)
S2 = Π_{staffNo, NIN, taxCode, salary, officeNo}(Staff)
S1_i = S1 ⋉_{officeNo} O_i, i = 1..6
(ii) Reconstruction:
Office: O_1 ∪ O_2 ∪ O_3 ∪ O_4 ∪ O_5 ∪ O_6
Client: C_1 ∪ C_2 ∪ C_3 ∪ C_4 ∪ C_5 ∪ C_6
Unit: U_1 ∪ U_2 ∪ U_3 ∪ U_4 ∪ U_5 ∪ U_6
Trailer: T_1 ∪ T_2 ∪ T_3 ∪ T_4 ∪ T_5 ∪ T_6
ClientOrder: CO_1 ∪ CO_2 ∪ CO_3 ∪ CO_4 ∪ CO_5 ∪ CO_6
TransportReq: TR_1 ∪ TR_2 ∪ TR_3 ∪ TR_4 ∪ TR_5 ∪ TR_6
Staff: (S1_1 ∪ S1_2 ∪ S1_3 ∪ S1_4 ∪ S1_5 ∪ S1_6) ⋈_{staffNo} S2
Perilous Printing is a large printing company that does work for book publishers throughout Europe.
The company currently has over 50 offices, most of which operate autonomously, apart from salaries,
which are paid by the head office in each country. To improve the sharing and communication of data,
the company has decided to implement a Distributed DBMS. Perilous Printing jobs consist of printing
books or part of books. A printing job requires the use of materials, such as paper and ink, which are
assigned to a job via purchase orders. Each printing job may have several purchase orders assigned to
it. Likewise, each purchase order may contain several purchase order items.
Office (officeNo, oAddress, oTelNo, oFaxNo, mgrNIN, countryNo)

Staff (NIN, fName, lName, sAddress, sTelNo, sex, DOB, position, taxCode, salary,
officeNo)
Publisher (pubNo, pName, pCity, pTelNo, pFaxNo, creditCode, officeNo)
Bookjob (jobNo, pubNo, jobDate, jobDescription, jobType, supervisorNIN)
PurchaseOrder (jobNo, poNo, poDate)
POItem (jobNo, poNo, itemNo, quantity)
Item (itemNo, itemDescription, amountInStock, price)
Office contains details of each office and the office number (officeNo) is the key. Each
office has a Manager represented by the manager’s national insurance number
(mgrNIN).
Staff contains details of staff and the national insurance number (NIN) is the key. The
office that the member of staff works from is given by officeNo.
Publisher contains details of publishers and the publisher number (pubNo) is the key. Publishers
are registered with the nearest office in their country, given by officeNo.
Bookjob contains details of publishing jobs and the job number (jobNo) is the key. The
publisher is given by the publisher number (pubNo) and the supervisor for the job by
supervisorNIN.
PurchaseOrder contains details of the purchase orders for each job and the combination of job number
and purchase order number (jobNo, poNo) forms the key.
Item contains details of all materials that can be used in printing jobs and the item number
(itemNo) is the key.
POItem contains details of the items on the purchase order and (jobNo, poNo, itemNo)
forms the key.
Country contains the names of each country that Perilous Printing operates in and the country
number (countryNo) is the key.
As well as accessing printing jobs based on the publisher, jobs can also be accessed on the job type
(jobType), which can be: 1 – Normal; 2 – Rush.
The offices of Perilous Printing are grouped into countries as follows:
Country 1: UK
Country 2: France
Country 3: Germany
Country 4: Italy
Country 5: Spain
[Figure: ER diagram for Perilous Printing showing the entities Publisher (pubNo), BookJob (jobNo), PurchaseOrder (jobNo, poNo), POItem (jobNo, poNo, itemNo), Office (officeNo), Staff (staffNo), Item (itemNo), Country (countryNo), and ItemType (itemTypeNo), and the relationships Orders, Contains, Handles, Supervises, PartOf, Has, Manages, Runs, and TypeFor.]

predicates;
Possible solution
Country – Do not fragment.
Item – Do not fragment.
Office – Do not fragment.
Publisher
P_i = σ_{officeNo = i}(Publisher), i = 1…50
Minterm predicates: {officeNo = 1, officeNo = 2, …, officeNo = 50}
Staff
S1 = Π_{NIN, fName, lName, sAddress, sTelNo, officeNo}(Staff)
S2 = Π_{NIN, sex, DOB, position, taxCode, salary, officeNo}(Staff)
Note that the duplication of officeNo in S1 and S2 is necessary here.
S1_i = σ_{officeNo = i}(S1), i = 1…50
S2_j = σ_{officeNo = j}(S2), j = 1…5, where office j is the head office for country j
Minterm predicates for S1_i: {officeNo = 1, officeNo = 2, …, officeNo = 50}
Minterm predicates for S2_j: {officeNo = 1, officeNo = 2, …, officeNo = 5}
Bookjob
B_i = Bookjob ⋉_{pubNo} P_i, i = 1…50
PurchaseOrder
PO_i = PurchaseOrder ⋉_{jobNo} B_i, i = 1…50
POItem
POI_i = POItem ⋉_{jobNo, poNo} PO_i, i = 1…50
Reconstructions
Publisher: P_1 ∪ P_2 ∪ … ∪ P_50
Bookjob: B_1 ∪ B_2 ∪ … ∪ B_50
PurchaseOrder: PO_1 ∪ PO_2 ∪ … ∪ PO_50
POItem: POI_1 ∪ POI_2 ∪ … ∪ POI_50
Staff: (S1_1 ∪ S1_2 ∪ … ∪ S1_50) ⋈_{NIN} (S2_1 ∪ S2_2 ∪ … ∪ S2_5)
24.7 Discuss the advantages and disadvantages of fragmentation.
Advantages
Locality of Reference. Data stored close to where it is used, if possible. If a fragment is used at a
number of sites, it may be advantageous to store copies of the fragment at these sites.
Improved Reliability and Availability. Reliability and availability are improved by replication;
there is another copy of the fragment available at another site in the event of one site failing.
Performance. Inherent parallelism and distribution of resources.
Storage capacities and costs.
Reduced Communication costs.
Security. Data not required by local applications is not stored and so not available for
unauthorized users.
Disadvantages
Design. May be difficult to design and allocate fragments efficiently.
Query optimization is more difficult.
Integrity is more difficult.
24.8 (a) A DDBMS may be classified as homogeneous or heterogeneous. Compare and contrast
these two types of distributed systems.
In a homogeneous system, all sites use the same DBMS package. In a heterogeneous system, sites
may run different DBMS packages. Not only may the packages be different, but the packages
may not use the same underlying data model, so that there may be a mixture of relational, network,
and hierarchical DBMSs.
Homogeneous systems are much easier to design and manage. This approach provides
incremental growth, making the addition of a new site to the distributed system easy, and
increased performance, by exploiting the parallel processing capability of multiple sites.
Heterogeneous systems usually result when individual sites have implemented their own database
solutions, and integration is considered at a later date. In a heterogeneous system, translations are
required to allow the different DBMSs to communicate. To provide DBMS transparency, users
must be able to make requests in the language of the DBMS at the local site. The system then has
the task of locating the data and performing any necessary translation. Data may be required from
another site, which may have:
• different hardware
• different DBMS packages
• different hardware and different DBMS packages.
(b) Discuss the extended capabilities or services that a DDBMS must provide over a
centralized DBMS.
• extended communication services to provide access to remote sites and allow the transfer of
queries and data among the sites, using a network;
• extended Data Dictionary to store data distribution details;
• distributed query processing, including query optimization and remote data access;
• extended concurrency control to maintain consistency of replicated data;
• extended recovery services to take account of failures of individual sites and failures of
communication links.
24.9 Consider the following simplified relational schema for InstantBuy:
OrderDetail(orderNo, itemType) 10,000 records stored in London

Client(clientNo, cCity) 1,000 records stored in Glasgow
ClientOrder(clientNo, orderNo) 100,000 records stored in London
To list the clients in Edinburgh who have ordered items of type ‘TV3190’, we can use the SQL query:
SELECT C.clientNo
FROM Client C, OrderDetail OD, ClientOrder CO
WHERE C.clientNo = CO.clientNo AND CO.orderNo = OD.orderNo AND
C.cCity = 'Edinburgh' AND OD.itemType = 'TV3190';
For simplicity, assume that each tuple in each relation is 10 characters long, there are 100
clients who have ordered item ‘TV3190’, there are 10 clients in Edinburgh and computation
time is negligible compared to communication time. The communication system has a data
transmission rate of 10,000 characters per second and a 1-second access delay to send a
message from one site to another. For the following five possible strategies for this query,
calculate the communication times, using the following algorithm:
Communication Time = C0+(no_of_bits_in_message/transmission_rate_per_bit)
where C0 is the access delay.
Strategy Description
1 Move the Client relation to London and process query there.
2 Move the ClientOrder and OrderDetail relations to Glasgow and process query there.
3 Join the ClientOrder and OrderDetail relations at London, select tuples for items ‘TV3190’, and then
for each of these tuples in turn, check at Glasgow to determine if the associated Client city is Edinburgh.
4 Select Clients in Edinburgh at Glasgow, and for each one found, check at London to see if there is a
ClientOrder involving that Client and a ‘TV3190’ OrderDetail.
5 Join ClientOrder and OrderDetail relations at London, select ‘TV3190’ items and project result over
clientNo and orderNo and move this result to Glasgow for matching with Edinburgh Clients. For
simplicity, assume that the projected result is still 10 characters long.
State any assumptions necessary to support your calculations.
Strategy 1: Time = 1 + (1,000 * 10 / 10,000) = 2 seconds
Strategy 2: Time = 2 + [(100,000 + 10,000) * 10 / 10,000] = 112 seconds ≈ 1.9 minutes
Strategy 3: The check for each tuple will involve two messages: a query and a response.
Time = 100 * [1 + 10 / 10,000] + 100 * [1 + 10 / 10,000] = 200.2 seconds ≈ 3.3 minutes
Strategy 4: Again, two messages are needed.
Time = 10 * [1 + 10 / 10,000] + 10 * [1 + 10 / 10,000] ≈ 20 seconds
Strategy 5: Time = 1 + (100 * 10 / 10,000) = 1.1 seconds
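The five communication times can be reproduced with a few lines of code. The following is a minimal sketch of the stated cost formula, assuming (as the question does) 10-character tuples, a rate of 10,000 characters per second, and a 1-second access delay per message; the function name message_time and the constant names are illustrative only.

```python
ACCESS_DELAY = 1.0   # seconds per message (C0 in the formula)
RATE = 10_000        # characters transmitted per second
TUPLE_SIZE = 10      # characters per tuple


def message_time(n_tuples):
    """Time to send one message carrying n_tuples tuples."""
    return ACCESS_DELAY + n_tuples * TUPLE_SIZE / RATE


times = {
    # Strategy 1: ship the 1,000 Client tuples to London in one message.
    1: message_time(1_000),
    # Strategy 2: ship ClientOrder and OrderDetail to Glasgow, one message each.
    2: message_time(100_000) + message_time(10_000),
    # Strategy 3: 100 candidate tuples, each needing a query and a response.
    3: 100 * 2 * message_time(1),
    # Strategy 4: 10 Edinburgh clients, each needing a query and a response.
    4: 10 * 2 * message_time(1),
    # Strategy 5: ship the 100 projected tuples to Glasgow in one message.
    5: message_time(100),
}

for strategy, seconds in times.items():
    print(f"Strategy {strategy}: {seconds:.2f} s")
```

The per-message access delay dominates strategies 3 and 4, which is why shipping one small intermediate result (strategy 5) is cheapest.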
24.10 Consider the following two relations:
Staff (staffNo, name, DOB, salary, deptNo)

Department (deptNo, deptName, managerStaffNo)
which are horizontally fragmented on the department number, deptNo. Assume there is an
integrity constraint that requires that every member of staff earns less than every manager in
the same department. Further assume that we wish to insert the tuple (‘S9100’, ‘John Smith’,
‘1-May-1960’, 30000, ‘D1’) into the Staff relation. Under what conditions can this constraint
be checked locally?
If a tuple that satisfies the constraint has previously been inserted into the same fragment, it can
be used as the basis for a local integrity check. For example, suppose that department D1 already
has an employee S1000 whose salary is £40,000. Given that the existing tuple does not violate the
constraint, we know that the salary of every manager in department D1 is more than £40,000.
Therefore, a tuple with salary £30,000 can be inserted without retrieving the managers' salaries.
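The reasoning above can be expressed as a simple predicate. The following is a minimal sketch, assuming the local fragment holds only non-manager staff tuples that are already known to satisfy the constraint; the function name can_accept_locally and the sample rows are hypothetical.

```python
def can_accept_locally(local_staff, dept_no, new_salary):
    """True if a local tuple already proves the new salary is safe.

    Assumes every row in local_staff is a non-manager tuple that already
    satisfies the constraint, so its salary is below every manager's salary
    in its department. A new salary no higher than such a salary is then
    also below every manager's salary.
    """
    return any(
        s["deptNo"] == dept_no and s["salary"] >= new_salary
        for s in local_staff
    )


# Hypothetical local fragment for department D1.
local_d1 = [{"staffNo": "S1000", "deptNo": "D1", "salary": 40000}]
```

When the function returns False, nothing can be concluded locally and the managers' salaries have to be consulted.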
Student Project
Assignment – Distributed Database Analysis and Design
Objective
To undertake analysis of a commercial company’s database requirements, and to produce a distributed
database design based on those requirements.
Approach
It is essential to appreciate the importance of the analysis and design phase of a project and to complete this
before any implementation is carried out. The lectures are intended to familiarize you with the theory of
distributed database analysis and design, including fragmentation and allocation of fragments to sites,
although they cannot cover all aspects. The intention of this project is to make the distributed database
design process more realistic.
To this end, you are expected, firstly, to find a suitable business or organization with a requirement for
distributed sharing of data. The company does not necessarily need to have a distributed database at
present, or even have a current requirement for such a system. The objective is to demonstrate how the
company’s requirements could best be represented in a distributed database. Suitable companies may
include travel agents, estate agents, banks, insurance companies, retail chains, supermarket chains,
bookshops, video shops, etc.
You can then put theory into practice by investigating the needs of an existing company. You are asked to
interview relevant staff to discover the way in which the present system operates (which may be manual or
a centralized database system) and to determine the data and application requirements. You are free to
identify possible improvements and solutions to problems with the current system.
Finally, you must produce a comprehensive report for the analysis and design of your proposed distributed
system and include:
(i) analysis of the company’s data and application requirements;
(ii) a suitable fragmentation schema for the system;
(iii) in the case of primary horizontal fragmentation, give a minimal set of
predicates;
(iv) the reconstruction of global relations from fragments.
Any assumptions necessary to support your design must be clearly stated.

It may not be possible to achieve a complete database design within the time available. You should
therefore select carefully those areas you wish to consider.
Assessment
The assessment will be based on a written report and an oral justification of the design. Assessment will be
carried out on the following components of the work:
– Analysis of company data and application requirements (20)

– Distributed database design
– fragmentation schema (35)
– minimal set of predicates (10)
– reconstruction algorithm (5)
– Critical appraisal (20)
– Oral presentation (10)
Chapter 25 Distributed DBMSs – Advanced Concepts
25.1 The centralized two-phase commit protocol uses a series of timeouts to prevent unnecessary
blocking. Discuss the actions for both coordinator and participants when a timeout occurs.
Consideration should be given to the various stages of the commit protocol.
See Section 25.4.3.
25.2 Under the three-phase commit protocol, discuss how the coordinator and participants would
recover following a failure. Consideration should be given to the various stages of the commit
protocol.
Coordinator Failure
1. Failure in INITIAL state. The coordinator has not yet started the commit procedure.
Recovery in this case starts the commit procedure.
2. Failure in WAITING state. The coordinator has sent the prepare message and, although it
has not yet received all responses, it has not received an abort response. In this
case, recovery restarts the commit procedure.
3. Failure in DECIDED state. The coordinator has instructed the participants to globally
abort or commit the transaction. On restart, if the coordinator has received all
acknowledgements, it can complete successfully. Otherwise, it has to initiate the
termination protocol discussed above.
4. Failure in PRE-COMMITTED state. The coordinator has instructed the participants to
pre-commit the transaction. On restart, if the coordinator has received all
acknowledgements, it can send the global commit message. Otherwise, it can send the
pre-commit message again.
Participant Failure
The objective of the recovery protocol for a participant is to ensure that a participant process on
restart performs the same action as all other participants and that this restarting can be done
independently (i.e. without the need to consult either the coordinator or the other participants).
1. Failure in INITIAL state. The participant has not yet voted on the transaction. Therefore,
on recovery, it can unilaterally abort the transaction, as it would have been impossible for
the coordinator to have reached a global commit decision without this participant's vote.
2. Failure in PREPARED state. The participant has sent its vote to the coordinator. In this
case, recovery is via the termination protocol discussed above.
3. Failure in ABORTED/COMMITTED states. Participant has completed the transaction.
Therefore, on restart, no further action is necessary.
4. Failure in PRE-COMMITTED state. Participant informs coordinator and waits for
global commit message.
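The recovery rules above can be summarized as a lookup from the role and the state at the time of failure to the recovery action. The following is a minimal sketch that only transcribes the answer given here; the table name RECOVERY and the function recovery_action are illustrative, not part of any protocol implementation.

```python
# (role, state at failure) -> recovery action, transcribed from the answer above.
RECOVERY = {
    ("coordinator", "INITIAL"): "start the commit procedure",
    ("coordinator", "WAITING"): "restart the commit procedure",
    ("coordinator", "DECIDED"):
        "complete if all acknowledgements were received, "
        "otherwise initiate the termination protocol",
    ("coordinator", "PRE-COMMITTED"):
        "send global commit if all acknowledgements were received, "
        "otherwise resend pre-commit",
    ("participant", "INITIAL"): "unilaterally abort",
    ("participant", "PREPARED"): "run the termination protocol",
    ("participant", "ABORTED"): "no further action",
    ("participant", "COMMITTED"): "no further action",
    ("participant", "PRE-COMMITTED"):
        "inform the coordinator and wait for global commit",
}


def recovery_action(role, state):
    """Look up the recovery action for a failed site."""
    return RECOVERY[(role.lower(), state.upper())]
```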
25.3 Consider five transactions T1, T2, T3, T4, and T5 with:
– T1 initiated at site S1 and spawning an agent at site S2,

– T5 initiated at site S3.
The locking information for these transactions is shown in Table 1. Produce the local
wait-for-graphs (WFGs) for each of the sites. What can you conclude from the local
WFGs?
Transaction | Data items locked by transaction | Data items transaction is waiting for | Site involved in operations
T1 x1 x8 S1
T1 x6 x2 S2
T2 x4 x1 S1
T2 x5 S3
T3 x2 x7 S1
T3 x3 S3
T4 x7 S2
T4 x8 x5 S3
T5 x3 x7 S3
Table 1
[Figure: local wait-for graphs. Site 1 contains T1, T2, and T3; Site 2 contains T1 and T4; Site 3 contains T2, T3, T4, and T5.]
Conclusion: There is no local deadlock at any site.
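Checking a site for local deadlock means building its wait-for graph from the lock information and looking for a cycle. The following is a minimal sketch of that procedure; the function names build_wfg and has_cycle are hypothetical, and the input format (a map from item to holder plus a list of waiting pairs) is an assumption rather than the layout of Table 1.

```python
def build_wfg(holds, waits):
    """Build a local wait-for graph.

    holds maps a data item to the transaction that has it locked;
    waits is a list of (transaction, item) pairs for blocked requests.
    An edge T -> U means T is waiting for an item locked by U.
    """
    graph = {}
    for txn, item in waits:
        holder = holds.get(item)
        if holder is not None and holder != txn:
            graph.setdefault(txn, set()).add(holder)
    return graph


def has_cycle(graph):
    """Depth-first search for a cycle (a local deadlock) in the graph."""
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {}

    def visit(node):
        colour[node] = GREY
        for nxt in graph.get(node, ()):
            state = colour.get(nxt, WHITE)
            if state == GREY:
                return True          # back edge: a cycle exists
            if state == WHITE and visit(nxt):
                return True
        colour[node] = BLACK
        return False

    return any(colour.get(n, WHITE) == WHITE and visit(n) for n in list(graph))
```

Items held at another site simply do not appear in holds, so waits on them add no local edge; detecting cycles that span sites needs a distributed method such as the one in the next question.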
25.4 One of the most well-known methods for distributed deadlock detection was developed by
Obermarck. Explain how Obermarck’s method works and how deadlock is detected and resolved.
See Section 25.3.

25.5 Using the above transactions, demonstrate how Obermarck’s method for distributed deadlock
detection works.
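[Figure: local wait-for graphs for Sites 1, 2, and 3, each extended with the external node T_ext.]
The local WFG at Site 1 contains a cycle through T_ext, so transmit the WFG from Site 1 to Site 3. The
resulting combined WFG shows a cycle:
[Figure: combined wait-for graph for Sites 1 and 3, containing T_ext, T1, T2, T3, T4, and T5.]
This implies that the system is in global deadlock, and one of the transactions must be selected to be
aborted and restarted.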
Student Project
Assignment – Distributed Database System Implementation
Introduction
The objective of the project for this module is to design and implement part of a Distributed Database
Management System (DDBMS).
The work is to be undertaken in groups of two or three, which you may form yourselves. However, to
ensure an even distribution, each group should include a member from each course stream.
Each group member has to supply his/her own critical appraisal with the final report.
Design and implement a distributed database, which contains the following base functionality:
• ability to fragment a relation/relations based on some specified predicate.
• ability to query the global data dictionary to determine the location of any relation/field/data
item.
• ability to enter an SQL query involving one or more base relations that have been fragmented
and for the DDBMS to return and display the resulting data.
• ability to enter an SQL INSERT/UPDATE/DELETE statement (or otherwise indicate that
data is to be inserted, updated or deleted) and for the DDBMS to write the data to the
appropriate fragments.
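The role of the global data dictionary in these requirements can be illustrated with a toy in-memory model. The following is a minimal sketch only, not a suggested solution: the relation, fragment, and site names, the DICTIONARY layout, and the functions locate, insert, and select_all are all hypothetical.

```python
# Hypothetical global data dictionary:
# relation -> list of (fragment name, site, predicate deciding membership).
DICTIONARY = {
    "Client": [
        ("Client_1", "site_north", lambda row: row["regionNo"] == 1),
        ("Client_2", "site_south", lambda row: row["regionNo"] == 2),
    ],
}

# Stand-in for the local databases: (site, fragment) -> list of rows.
STORAGE = {}


def locate(relation):
    """Report which fragments of a relation exist and where they are stored."""
    return [(frag, site) for frag, site, _ in DICTIONARY[relation]]


def insert(relation, row):
    """Write a row to the fragment whose predicate it satisfies."""
    for frag, site, predicate in DICTIONARY[relation]:
        if predicate(row):
            STORAGE.setdefault((site, frag), []).append(row)
            return frag, site
    raise ValueError("no fragment accepts this row")


def select_all(relation):
    """Reconstruct a horizontally fragmented relation by union."""
    rows = []
    for frag, site, _ in DICTIONARY[relation]:
        rows.extend(STORAGE.get((site, frag), []))
    return rows
```

A real implementation would replace STORAGE with connections to the individual DBMSs and would parse SQL rather than take rows directly.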
A possible implementation for the DDBMS would be based on Visual Basic for the front-end, ADO
(ActiveX Data Objects) for accessing the individual databases, and Access for the individual DBMSs.
However, this is only a suggestion and clearly different implementation environments (for example, Visual
C++, Java/JDBC) are possible.
Marking Scheme
Analysis and Design 30

Test Strategy 10
Implementation 25
Project Working 10
Demonstration 10
Part 7 Object DBMSs

Chapter 27 Object-Oriented DBMSs – Concepts and Design
27.1 Discuss why traditional transaction management protocols are too restrictive for advanced
database applications.
• Conventional transaction management systems (TMSs) synchronize simple read and
write operations. However, TMSs for advanced database applications must be able to
deal with abstract operations. It may even be possible to improve concurrency by
utilizing semantic knowledge about the objects and their abstract operations.
• Advanced database applications have different database access patterns from
conventional database applications, where the access is competitive (e.g., two users
accessing the same bank account). Instead, sharing may be more cooperative, as in the
case of multiple users accessing and working on the same design document.
In this case, users' accesses need to be synchronized, but users are willing to cooperate
rather than compete for access to shared objects.
• Conventional transactions access ‘flat’ objects (e.g., pages, tuples), whereas transactions
for advanced database applications may require synchronized access to composite and
complex objects. Synchronization of access to such objects requires synchronization of
access to the component objects.
• These applications require the support of long-duration transactions spanning hours,
days or even weeks (e.g., working on a design object). Therefore, the transaction
mechanism must support the sharing of partial results. Furthermore, to avoid the failure
of partial tasks jeopardizing a long activity, it is necessary to distinguish between those
activities that are essential for the completion of the transaction and those that are not,
and to provide for alternative actions in case the primary activity fails.
• These applications may benefit from active capabilities for timely response to events and
changes in the environment. This new database paradigm requires the monitoring of
events and the execution of system-triggered activities within running transactions.
Case Study – Library System
The librarian will access the system to issue books, record reservations, generate return requests for
reserved books and record books being returned. Returning a book may involve generating an
availability notification if the returned book has been reserved by another member. If an availability
notification is generated then the librarian sends it out to the member. When a book is requested to be
issued or requested to be reserved then the system validates details on the membership card as well as
validating that the requested book is one stocked by the library. If the membership card is invalid or the
library does not stock the book then the request is rejected. If a book is requested for issue but is on-
loan then that book will be reserved. Members of the library can display details of books stocked. The
subscriptions section deals with membership issues e.g. renewing membership cards on an annual basis
etc. The purchasing section deals with adding new books to the library.
There may be multiple copies of books held in the library. When a book is borrowed the borrowing date
is noted and when the book is returned the return date is noted. The details held about each reservation
include the current borrower(s) and the member who has reserved the book.
When a copy of a book arrives from the suppliers, it is held in storage until its details are registered on
the system. These details include the number, author and title of the book. Once registered the copy is
put in the lending shelf which means that it is available to be borrowed. When a copy is returned, if the
book has been reserved then the copy is held in a reservation area otherwise it is returned to the lending
shelf. If a copy is reported lost by the borrower then the copy details are deleted from the system by the
librarian.
A reservation
Reservation        Number:    Date:
Book               Number:    Title:    Author:
Reserver           Number:    Name:     Address:
Current Borrower   Number:    Name:     Address:
27.2 Produce use case diagrams and a set of associated sequence diagrams for the above case
study.
The following are some example diagrams for the case study (produced in Rational Rose):
Use case diagram
Actors: Member, Librarian, Subscriptions, Purchasing, Supplier
Use cases: Display Books, Issue Book, Reserve Book, Return Book, Request Return, Remove Book,
Generate Availability Note, Validate Request, Maintain Membership, Add New Books
Stereotypes used between use cases: <<include>>, <<extend>>
Use Case Descriptions
Issue Book
    VALIDATE REQUEST
    if request valid then
        while copies to be checked
            if copy available
                dispense book
            end if
        end while
        if no copies available then
            RESERVE BOOK
        end if
    end if
Validate Request
    Librarian inputs member details
    if membership invalid then
        reject request
    else
        Librarian inputs book details
        if book is not stocked then
            reject request
        else
            accept request
        end if
    end if
Reserve Book
    if details not already validated then
        VALIDATE REQUEST
    end if
    if request valid then
        record reservation details
    end if
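The three use case descriptions above can be read as the following control flow. This is only a minimal Python sketch, not part of the model solution: the Library class, its in-memory dictionaries, and the returned status strings are hypothetical names chosen for illustration.

```python
class Library:
    """Toy in-memory library; every name here is an illustrative assumption."""

    def __init__(self, members, stock):
        self.members = set(members)   # valid membership numbers
        self.stock = stock            # book number -> list of copy states
        self.reservations = []        # recorded (member, book) pairs

    def validate_request(self, member, book):
        # Validate Request: check the membership first, then the book.
        if member not in self.members:
            return False
        return book in self.stock

    def reserve_book(self, member, book, validated=False):
        # Reserve Book: validate only if the details were not already validated.
        if not validated and not self.validate_request(member, book):
            return "rejected"
        self.reservations.append((member, book))
        return "reserved"

    def issue_book(self, member, book):
        # Issue Book: validate, look for an available copy, else reserve.
        if not self.validate_request(member, book):
            return "rejected"
        copies = self.stock[book]
        for i, state in enumerate(copies):
            if state == "available":
                copies[i] = "on loan"
                return "issued"
        return self.reserve_book(member, book, validated=True)
```

Issuing the only copy of a book twice in a row would return "issued" and then "reserved", which matches the rule that a book requested for issue while on loan is reserved.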
Reservation: date, number
Book: number, author, title, reservation status
Member: number, name, address
Copy: loan status
Loan: loan date, return date
Associations between Member and Reservation: is reserver, is borrower
Class Diagram, Version 1
Lifelines: : Librarian, : Library Controller, : (Member), : Book, : Copy, : Loan
Messages, in order:
    issue book
    member details
    validate membership (member details)
    validation results
    [invalid member] request rejected
    book details
    check stocked
    validation results
    [invalid book] request rejected
    get status
    * loan status
    [copy available] <<create>>
    set status (on loan)
    [copy available] copy
    [no copy available] request rejected
Sequence Diagram for Issue Book when copy available

Objects: : Librarian, : Library Controller, : (Member), : Book, : Copy, : Loan
Messages:
    1: issue book
    2: member details
    3: validate membership (member details)
    4: validation results
    5: [invalid member] request rejected
    6: book details
    7: check stocked
    8: validation results
    9: [invalid book] request rejected
    10: get status
    11: * loan status
    12: [copy available] <<create>>
    13: set status (on loan)
    14: [copy available] copy
    15: [no copy available] request rejected
Collaboration Diagram for Issue Book

Library Controller: issue book()
Reservation: date, number
Book: number, author, title, reservation status; check stocked()
Member: number, name, address; validate membership()
Copy: loan status; get status(), set status()
Loan: loan date, return date; create()
Associations between Member and Reservation: is reserver, is borrower
Class Diagram with updates from Issue Book Sequence Diagram.

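States: Waiting Registration, On Shelf, On Loan, On Reserve
Transitions:
    delivered / create: (initial) -> Waiting Registration
    registered: Waiting Registration -> On Shelf
    borrowed / record loan date: On Shelf -> On Loan
    returned [book not reserved] / record return date: On Loan -> On Shelf
    returned [book reserved] / record return date: On Loan -> On Reserve
    borrowed [by reserver] / record loan date: On Reserve -> On Loan
    lost / delete: On Loan -> (final)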
State Diagram for Copy
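The life cycle of a copy described in the case study (held until registered, shelved, lent, held on reserve, deleted when lost) can be sketched as a transition table. This is an illustrative Python sketch only: the TRANSITIONS table, the fire() method, and the "Deleted" end state are assumed names, not part of the model solution.

```python
# (current state, event, guard) -> next state
TRANSITIONS = {
    ("Waiting Registration", "registered", None): "On Shelf",
    ("On Shelf", "borrowed", None): "On Loan",
    ("On Loan", "returned", "book not reserved"): "On Shelf",
    ("On Loan", "returned", "book reserved"): "On Reserve",
    ("On Reserve", "borrowed", "by reserver"): "On Loan",
    ("On Loan", "lost", None): "Deleted",  # assumed name for the final state
}


class Copy:
    def __init__(self):
        # A delivered copy is created in the Waiting Registration state.
        self.state = "Waiting Registration"
        self.loan_date = None
        self.return_date = None

    def fire(self, event, guard=None, date=None):
        key = (self.state, event, guard)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} not allowed in state {self.state!r}")
        # Actions attached to the transitions: record loan or return date.
        if event == "borrowed":
            self.loan_date = date
        elif event == "returned":
            self.return_date = date
        self.state = TRANSITIONS[key]
        return self.state
```

Keeping the transitions in a table makes it easy to check the diagram for missing cases, for example that a copy on reserve can only be borrowed by the reserver.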

Library Controller: issue book()
Reservation: date, number
Book: number, author, title, reservation status; check stocked(), get reservation status()
Member: number, name, address; validate membership()
Copy: loan status, registration status; create(), delete(), get loan status(), set loan status(),
set registration status()
Loan: loan date, return date; create(), record loan date(), record return date()
Associations between Member and Reservation: is reserver, is borrower
Class Diagram with updates from Copy State Diagram.

27.3 Give your definition of an Object-Oriented Database Management System (OODBMS). Discuss
what you consider to be the three most important advantages of an OODBMS over a relational
DBMS. Justify your selection of the three advantages.
See Sections 27.2 and 27.5. Expect some justification for the top three selection. Also expect
detailed discussion of the advantages, not just bullet points.
27.4 Despite the superior expressive power of the Object-Oriented Database Management System
(OODBMS) in comparison to the established relational systems, the acceptance of the
OODBMS will ultimately depend on its performance. The key to this may well lie with how
persistent objects are accessed. Discuss the design goals for the incorporation of persistence
in a programming language.
Design goals:
• Persistence should be orthogonal to type. Persistence should be a property of object instances
and not types. It should be possible to allocate objects of any type in either volatile or
persistent store.
• There should be no run-time penalty for code that does not deal with persistent objects.
• Allocation and manipulation of persistent objects should be similar to the manipulation of
transient objects. For example, it should be possible to move objects from persistent store to
volatile store and vice versa in much the same way as it is possible to move objects from the
stack to the heap and vice versa.
• Inadvertent fabrication of object identities should be prevented.
• Language changes should be kept to a minimum.
27.5 Object databases have roots in both programming languages and database management.
However, not all aspects of these two technologies blend together easily. One area of potential
conflict arises when we try to completely separate persistence and type.
(a) Define the orthogonality of persistence and type.
Two axes on a graph are said to be orthogonal if they are perpendicular, that is, if they
represent different dimensions. Similarly, persistence and type are considered to be
orthogonal if they do not influence each other in any way: any type can have persistent
or transient instances, and any behavior or characteristics that apply to a type are totally
independent of whether its instances are persistent or transient.
(b) Discuss this issue for any three of the following subsystems:
(1) Queries
(2) Schemas
(3) Transactions
(4) Existence Semantics.
Queries
Which objects should a query consider? The traditional database point of view is that
declarative queries should range over persistent objects. In contrast, the programming
language point of view is that persistence should be orthogonal to type and that the
programmer should treat both transient and persistent objects in exactly the same way.
This view results in the conclusion that the range of a query should include both
transient and persistent objects. Therefore, there should be no distinction that makes
query capability applicable only to persistent objects.
If queries should apply to both transient and persistent objects, then does a given user’s
query need to filter the transient objects of the desired type that have been created in
other users’ applications but that have not yet been committed to persistent storage? Or
does the user’s query only include in its scope the transient objects of the desired type
created by that run unit, along with appropriate persistent objects from the object
database?
Deciding that transient objects are included in query scopes complicates the design of
the query processor. For example, the DBMS may need to construct and maintain
indexes on transient as well as persistent objects. If queries include filtering of local
transient objects, then the query processor must execute on the client application side
and may even need to be distributed across the client and server processes.
Schemas
The traditional database point of view is that there should be explicit declaration of
which types may have persistent instances. This viewpoint stems from a philosophy that
persistent data and transient data are inherently different. The programmer should
therefore be aware of when he is dealing with each. Database functionality, for example,
queries, indexes, transaction commits, versions, and so forth, apply only to persistent
objects. A DBMS should enforce semantic constraints, such as uniqueness and
referential integrity, only for persistent objects. Since an object DBMS uses class
specifications for its schema information, persistence capability should be explicitly
declared in those class specifications.
By contrast, the programming language point of view is that all classes should be
persistence-capable. There should be no distinction in either available functionality or
mode of interface between transient and persistent objects; therefore, there is no need to
declare which classes are persistence-capable.
Transactions
Programming language and database people alike generally agree that the commit of a
transaction must guarantee that updates to persistent objects are durably written to
persistent storage. Similarly, when a transaction aborts, any persistent-object updates that
occurred during the transaction are guaranteed not to be written to the database. The
question is, what should happen to transient objects that were updated during an aborted
transaction? Should these updates also be undone? Should object DBMSs support
transaction-consistent transient objects?
The traditional database point of view ignores updates to transient objects. Because a
DBMS does not manage transient objects, it does not log their updates and cannot undo
them. Whatever changes were made in program memory variables remain, even if the
transaction aborts. It is the application’s responsibility to manage whatever clean-up may
be required for program variables.
By contrast, the programming language view requiring consistency in treatment of
transient and persistent objects leads to an expectation that the DBMS will also undo
updates to transient objects. In fact, relatively strange things can happen if transient
updates are not undone and the programmer is not careful. These anomalies commonly
occur as bugs, for example, as variables that are not properly reset or counters that are
not adjusted to account for aborted iterations.
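The counter anomaly mentioned above can be reproduced with a toy example. This is a hedged sketch in Python, assuming a hypothetical ToyTransaction class that keeps a before-image of a dictionary standing in for the persistent objects; no real DBMS API is implied.

```python
import copy


class ToyTransaction:
    """Undoes updates to a 'persistent' dict on abort; knows nothing else."""

    def __init__(self, database):
        self.database = database
        self.snapshot = copy.deepcopy(database)  # before-image used for undo

    def abort(self):
        # Restore the persistent state to what it was at transaction start.
        self.database.clear()
        self.database.update(self.snapshot)


database = {"balance": 100}  # stands in for persistent objects
processed = 0                # transient program variable

txn = ToyTransaction(database)
database["balance"] -= 30    # persistent update, covered by the snapshot
processed += 1               # transient update, invisible to the 'DBMS'
txn.abort()

print(database["balance"])   # 100: the persistent update was undone
print(processed)             # 1: the transient update survived the abort
```

Under the traditional database view, resetting processed after the abort is the application's job; under the programming language view, the programmer would expect it to be rolled back as well.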
Existence Semantics
How long does an object exist and when is it actually deleted? This issue shows up when
programmers use an object database with Smalltalk. In Smalltalk, an object exists if it is
reachable, that is, if some other object references it. A Smalltalk object is
garbage-collected, that is, removed from the Smalltalk Image, when it is no longer
reachable. The traditional database point of view, however, is that an object exists until it
is explicitly deleted. Therefore, Smalltalk and traditional DBMSs have different
existence semantics – different understandings of what is necessary for an object to exist.
In Smalltalk, an object exists if it is reachable. To a DBMS, an object exists because it
exists.
One particular issue for the object DBMS vendor who wants to support Smalltalk is
determining what the object DBMS for Smalltalk should include in an extent, which is
the set of all instances of a class. Extents are commonly used as the scopes for queries. If
the only reference to an object is because it is part of a class extent, does the object exist?
One approach is for the object DBMS to treat the extent as a special kind of collection,
the semantics of which include making an object eligible for garbage collection if its
only reference is from the extent collection.
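The idea of an extent that does not by itself keep an object alive can be illustrated with weak references. This is a minimal sketch assuming Python's standard weakref module as a stand-in for the object DBMS; the Book class and its extent attribute are hypothetical.

```python
import gc
import weakref


class Book:
    # Hypothetical class extent: a weak set does not keep its members alive.
    extent = weakref.WeakSet()

    def __init__(self, title):
        self.title = title
        Book.extent.add(self)


a = Book("A")
b = Book("B")
titles_before = sorted(x.title for x in Book.extent)

del b          # the extent now holds the only (weak) reference to "B"
gc.collect()   # force a collection so the effect does not depend on timing

titles_after = sorted(x.title for x in Book.extent)
```

After the collection, a query ranging over the extent would see only the book that is still reachable from elsewhere, which is the reachability-based existence semantics described above.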
A related issue arises with the notion of keys, which are uniqueness constraints. A key is
a (possibly compound) attribute, the value of which is unique across an extent. If a
class has a defined key, the object DBMS should check that a new instance of that class
does not violate the uniqueness constraint. Should it check both transient and persistent
instances of the class? Should it check both transients local to this process and other
transient instances of the class? Should it check again against only persistent instances at
the end of the transaction? What if there is conflict with an about-to-be-garbage-
collected instance’s key value?
27.6 Discuss the concept of object identifiers (OIDs) in an object DBMS and discuss four different
approaches for the representation of OIDs.
OIDs are used both by application programs for referencing objects and for representing relations
between objects. The choice of the type of representation of OIDs can also influence the
performance of an ODBMS. OIDs can be represented in different ways (for a review of the
different approaches proposed, see Khoshafian and Copeland (1986)).
An OID can be physical or logical. The former contains the actual address of the object, whereas
the latter is an index from which the address of the object is obtained. Different approaches have
been proposed for the representation of both physical and logical OIDs, thereby producing at least
four types of OID.
• Physical address
The OID is the physical address of the object. This representation, normally used by
programming languages, has the advantage of being very efficient, but it is rarely used in
an ODBMS since, if a given object is moved or deleted, all the objects containing its
OID must be modified.
• Structured address
The OID consists of two parts – the first contains the segment number and the page
number, thus making it possible to obtain quickly the address on the disk to be read,
whereas the second part contains a logical slot number, which is used to determine the
position of the object within the page. With this representation, the object can be
relocated within the page simply by changing the slot array, or it can be moved to
another page by inserting its forward address in the slot array.
• Surrogate
The OID is generated by using an algorithm which guarantees its uniqueness (for
example, the time and date, or a monotonically increasing counter). Surrogate OIDs are
then transformed into the object’s physical addresses, normally by using an index.
• Typed surrogates
A variant of the surrogate for representing OIDs involves having both a type identifier
(type ID) and a portion of the object identifier. A different counter generates the object
identifier portion for each type. Thus, the address space is segmented. Moreover, the type
identifier in the object’s OID allows us to determine the object type without retrieving
the object from the disk.
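Two of the representations above, the structured address and the typed surrogate, can be sketched as follows. This is an illustrative Python sketch only; StructuredOID, Page, resolve, and TypedSurrogateGenerator are hypothetical names, and the page table is a plain dictionary rather than real disk storage.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class StructuredOID:
    segment: int
    page: int
    slot: int  # logical slot number, resolved through the page's slot array


class Page:
    """A page whose slot array maps slot numbers to offsets or forward OIDs."""

    def __init__(self):
        self.slot_array = {}


def resolve(oid, pages):
    """Follow a structured OID to a (segment, page, offset) location."""
    entry = pages[(oid.segment, oid.page)].slot_array[oid.slot]
    if isinstance(entry, StructuredOID):  # object was moved to another page
        return resolve(entry, pages)
    return (oid.segment, oid.page, entry)


class TypedSurrogateGenerator:
    """One monotonically increasing counter per type ID."""

    def __init__(self):
        self.counters = {}

    def new_oid(self, type_id):
        counter = self.counters.setdefault(type_id, count(1))
        return (type_id, next(counter))
```

Relocating an object within its page only changes the offset stored in the slot array, and moving it to another page stores a forwarding OID in the old slot; in both cases the OID held by referencing objects stays the same.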
27.7 Discuss how OIDs differ from C++ pointers.
To C++, an object identifier is an address in process memory space. This space is too small for
most database purposes. To an object DBMS, an object identifier cannot be just a memory
address. Scalability requires that object identifiers be valid across storage volumes. Distributed
object databases require that object identifiers be valid across machine boundaries. From the
database management perspective, an object identifier must be a unique identifier that persists
with an object for its entire lifetime, regardless of where it may be stored or moved or how it is
being used. The object DBMS can then use object identifiers as the basis of references used to
implement relationships.
However, from the programming language perspective, there should be no need to introduce
reference syntax to supplement pointers. Pointers should be used instead of object identifiers or
object identifiers should simply behave like pointers, even if the object DBMS eventually
converts the addresses to the equivalent of object identifiers with a larger scope.
Traversal paths are not the same as C++ pointers. First, an object DBMS creates and deletes
traversal paths in pairs. In contrast, just because one C++ object points to another does not
mean there is a reverse pointer. Second, it is the responsibility of an object DBMS to maintain
the referential integrity of relationships. If an object that participates in a relationship is
deleted, a subsequent attempt to traverse the relationship (in either direction) should raise an
exception rather than referencing invalid space. C++ associates no semantics with following a
pointer. An application is allowed to dereference any of its pointers, regardless of whether they
point to meaningful space.
An object DBMS cannot even use pointers directly to represent relationships, because by
definition pointers are not location-independent. Because relationships are logical, they must
remain valid even when the associated objects move. When an object is moved in memory, its
address changes. A pointer to that object now points to something else. However, when an
object is moved in memory, all of its relationships must retain their validity. This must be true,
even if the object is moved from main memory to disk, or to a different disk volume, or to a
different site on the network. An object DBMS can therefore use object identifiers, which are
location-independent, to implement relationships.
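The two properties of traversal paths discussed above, creation in pairs and an exception on traversal after deletion, can be sketched with OID-based links. This is a toy Python sketch; ObjectStore, DanglingReference, and the integer OIDs are assumptions made for illustration, not the interface of any real object DBMS.

```python
class DanglingReference(Exception):
    """Raised when a relationship is traversed to or from a deleted object."""


class ObjectStore:
    def __init__(self):
        self.objects = {}   # OID -> object state
        self.links = {}     # OID -> set of related OIDs
        self.next_oid = 1

    def create(self, state):
        oid = self.next_oid
        self.next_oid += 1
        self.objects[oid] = state
        self.links[oid] = set()
        return oid

    def relate(self, a, b):
        # Traversal paths are created in pairs: a -> b and b -> a.
        self.links[a].add(b)
        self.links[b].add(a)

    def delete(self, oid):
        # The object goes away; links that mention it are now dangling.
        del self.objects[oid]

    def traverse(self, oid):
        # Check both ends so traversal fails cleanly in either direction.
        if oid not in self.objects:
            raise DanglingReference(oid)
        result = []
        for other in self.links[oid]:
            if other not in self.objects:
                raise DanglingReference(other)
            result.append(self.objects[other])
        return result
```

Because the links hold OIDs rather than addresses, nothing in the sketch depends on where an object's state is stored, whereas a raw pointer would simply be dereferenced whether or not its target still existed.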
27.8 (a) ‘Pointer swizzling’ is a technique that can be used to optimize access to main memory
resident persistent objects. Discuss in general terms how pointer swizzling works.
The basic objective of pointer swizzling is to convert main memory pointers representing
interobject references to disk pointers when an object is being written to disk and subsequently to
convert disk pointers to main memory pointers when the object is being read back in again. Thus,
the programming language can access the interobject references as though they are normal pointers,
thereby improving performance.
(b) Classify and evaluate the different approaches to pointer swizzling.
Three dimensions:
(i) eager v. lazy
(ii) direct v. indirect
(iii) copy v. in-place
1. Eager: Guarantees that all the pointers in the main memory are swizzled. When an object is
loaded from disk, the object is scanned through and all the pointers in the object are
immediately swizzled.
2. Lazy: Only swizzles pointers on demand; i.e., a pointer is not swizzled until the object it refers
to is accessed via this particular pointer.
Advantage is that no pointers are swizzled unnecessarily. On negative side, lazy swizzling
must handle two different kinds of pointers at run-time: swizzled and non-swizzled.
3. Direct: Requires that the referenced object be resident in memory. A directly swizzled pointer
contains the main memory address of the object it references.
Problem is that if an object is displaced from the system buffer (i.e., is no longer resident) all the
directly swizzled pointers that reference the displaced object need to be unswizzled.
In the case of eager direct swizzling, one cannot simply unswizzle pointers because eager
swizzling guarantees that all pointers in the buffer are swizzled. Instead, these pointers (i.e., their
home objects) must be displaced too – possible snowball effect.
4. Indirect: Permits the swizzling of pointers that reference non-resident objects.
Induces an additional overhead when it comes to simple object lookups – leads to an additional
level of indirection (due to existence of a descriptor that stores the main memory address of
object) and to a residency check, a check of whether the descriptor is valid or not. For direct
swizzling, the information that an object is resident is coded in the swizzled pointer.
5. Copy: When faulting objects in, data is copied into the application’s local object cache.
6. In-place: When faulting objects in, data is accessed within the data manager’s cache.
Copy swizzling may be more efficient as, in worst case, only modified objects have to be
swizzled back to their OIDs, whereas with in-place swizzling it may be necessary to unswizzle an entire page of
objects if one object on the page is modified. On other hand, with copy approach, every object
must be explicitly copied into the local object cache.
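As one point in the classification above, the following sketch combines lazy, direct, and copy swizzling. It is a simplified Python illustration under stated assumptions: Ref, ObjectCache, and the dictionary standing in for the disk are hypothetical, and a Python object reference plays the role of a main memory pointer.

```python
class Ref:
    """Inter-object reference: holds an OID until it is swizzled on first use."""

    def __init__(self, oid):
        self.oid = oid
        self.target = None  # the in-memory object once swizzled


class ObjectCache:
    def __init__(self, disk):
        self.disk = disk      # OID -> stored state (stands in for the disk)
        self.resident = {}    # OID -> object copied into the local cache
        self.faults = 0       # number of object faults so far

    def deref(self, ref):
        if ref.target is None:                  # lazy: swizzle only on access
            if ref.oid not in self.resident:    # object fault
                self.faults += 1
                # copy: the state is copied into the application's cache
                self.resident[ref.oid] = dict(self.disk[ref.oid])
            ref.target = self.resident[ref.oid]  # direct: store the object itself
        return ref.target

    def unswizzle(self, ref):
        # Convert back to OID form, e.g. before the referencing object is written.
        ref.target = None
```

The cost of laziness is visible in deref(): every access has to test whether the reference is already swizzled, which is the run-time handling of two kinds of pointers noted above.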
(c) Briefly discuss any alternative approach that could be used to handle persistence in an
ODBMS.
Several alternative strategies (e.g.):

• Leave it to user.
• Map to relational system.
• Use a one pointer strategy that leaves it to the run-time system to determine whether
object is transient or persistent. Could trap references to persistent objects, for example, by
using illegal values such as negative values for pointers to persistent objects.
27.9 (a) Discuss why traditional transaction management protocols are too restrictive for
advanced database applications.
• Conventional transaction management systems (TMSs) synchronize simple read and
write operations. However, TMSs for advanced database applications must be able to
deal with abstract operations. It may even be possible to improve concurrency by
utilizing semantic knowledge about the objects and their abstract operations.
• Advanced database applications have different database access patterns from
conventional database applications, where the access is competitive (e.g., two users
accessing the same bank account). Instead, sharing may be more cooperative, as in the
case, for example, of multiple users accessing and working on the same design document.
In this case, users’ accesses need to be synchronized, but users are willing to cooperate
rather than compete for access to shared objects.
• Conventional transactions access ‘flat’ objects (e.g., pages, tuples) whereas transactions
for advanced database applications may require synchronized access to composite and
complex objects. Synchronization of access to such objects requires synchronization of
access to the component objects.
• These applications require the support of long-duration transactions spanning hours,
days or even weeks (e.g., working on a design object). Therefore, the transaction
mechanism must support the sharing of partial results. Furthermore, to avoid the failure
of partial tasks jeopardizing a long activity, it is necessary to distinguish between those
activities that are essential for the completion of the transaction and those that are not,
and to provide for alternative actions in case the primary activity fails.
• These applications may benefit from active capabilities for timely response to events and
changes in the environment. This new database paradigm requires the monitoring of
events and the execution of system triggered activities within running transactions.
(c) ‘Only Object-Oriented Database Management Systems can support alternative
transaction management protocols effectively.’ Discuss the validity of this statement.
There should be no reason why conventional systems cannot handle such protocols. Main reason
why ODBMSs handle them is because they were necessary for the types of applications
the systems were originally designed for.
Chapter 28 Object-Oriented DBMSs – Standards and Systems
28.1 Discuss the Object Model proposed by the Object Data Management Group.
See Section 28.2.2.
Case Study 1 – Cornucopia Ltd
Cornucopia Ltd is a large, multinational oil company that uses contractors for systems analysis
whenever possible. The following data is held about the contracts:
(a) Each contract consists of a contract name and contract number, the name of the main
contracting company, the names of any other companies involved in the contract, the
start and scheduled end date for the contract, the budget for the contract and the name of
the project manager for the contract.
(b) Each contract consists of a number of tasks, each with an activity name, a start and
scheduled end date, a task leader, a work group and a set of deliverables.
(c) A work group consists of a group code and a list of staff. A group leader is identified for
each group.
(d) Deliverables take the form of documents, consisting of a document code, a document
title, the names of the authors, a list of people the document is to be distributed to, an
issue date and the name of the person who has approved the document. Documents
come in two forms: requirements specifications and functional specifications.
(e) Company data consists of the name of the company, the technical head of staff, the
administrative head of staff, the company’s main address (street, city, town, and
postcode), a telephone, and fax number.
(f) Each company is divided into a number of divisions, consisting of an address (street,
city, town, and postcode), and a divisional head of staff.
(g) Staff data consists of a name, address, and a job title, which determines the hourly cost
for the member of staff.
28.2 Using the Object Definition Language (ODL) of the ODMG object model, define the interface of
the object schema’s types for this case study. For each type, define at least one method that you
consider appropriate. State any assumptions necessary to support your design.
See schema diagram on following page for possible solution. Specification for the Project class
shown below.
class Project
(extent projects key projectName)
{
    attribute string projectName;
    attribute date startDate;
    attribute date scheduledEndDate;
    attribute float budget;
    relationship Company IsMainContractor inverse Company::HasMainContract;
    relationship set<Company> IsContractingCompany inverse Company::HasContract;
    relationship set<Task> ConsistsOf inverse Task::PartOf;
    relationship Person Manages inverse Person::ManagerFor;
    /* Define operations */
    ...};
28.3 Write the following query in the Object Query Language (OQL) of the ODMG object model:
‘List the names of all documents delivered under the project named “SCCS”.’
SELECT title: x.title
FROM z IN projects,
     y IN z.ConsistsOf,
     x IN y.HasDeliverables
WHERE z.projectName = "SCCS"
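One way to read the dependent range variables in the FROM clause is as nested iteration: z ranges over the projects, y over the tasks of each z, and x over the deliverables of each y. The Python sketch below mirrors that reading; the dictionaries and the sample titles are invented purely for illustration and are not part of the case study data.

```python
# Invented sample data: each project has tasks, each task has deliverables.
projects = [
    {"projectName": "SCCS",
     "ConsistsOf": [
         {"HasDeliverables": [{"title": "Requirements"}, {"title": "Design"}]},
     ]},
    {"projectName": "Other",
     "ConsistsOf": [
         {"HasDeliverables": [{"title": "Plan"}]},
     ]},
]

# SELECT title: x.title FROM z, y IN z.ConsistsOf, x IN y.HasDeliverables WHERE ...
result = [
    {"title": x["title"]}
    for z in projects
    for y in z["ConsistsOf"]
    for x in y["HasDeliverables"]
    if z["projectName"] == "SCCS"
]
```

Each element of the result is a one-field structure, corresponding to the title: label in the SELECT clause.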
Project: projectName: STRING; mainContractingCompany; contractingCompanies*; startDate: DATE;
scheduledEndDate: DATE; budget: FLOAT; task; projectManager
Company: tradingName: STRING; technicalHead; admHead; address: struct STRING; fax: STRING;
tel: STRING; divisions*
Division: address: struct STRING; head
Task: nameActivity: STRING; contractingCompany; taskLeader; workGroup; startDate: DATE;
schedEndDate: DATE; deliverables
Person: name: struct STRING; company; division; title; address: struct STRING
Title: code: STRING; hourlyCost: FLOAT
Group: groupCode: STRING; members*; head
Document: docCode: STRING; title: STRING; authors*
FuncSpec: listDistrib*; issueDate: DATE; approvedBy
RequirementsAnalysis: issueDate: DATE; approvedBy
Example Schema for Cornucopia Ltd

Case Study 2 – Perilous Printing
Perilous Printing Ltd is a small printing company that does work for book publishers. Its jobs
consist of printing books or parts of books. The following data is held about the work:
(a) The company does work for many different publishing houses. The data on each publisher
consists of a name, an address (street, town, city, and postcode), a telephone, and fax number.
(b) Each printing job consists of a due date, a description and a job type (rush job, normal job, or
fill-in). Each printing job is associated with one or more members of staff.
(c) A printing job requires the use of several materials, such as paper and ink. The company holds
details of the amount of stock of each type of material that they have on hand together with the
price and a reorder level.
(d) Materials are assigned to a job via purchase orders. Each printing job may have several
purchase orders assigned to it. Purchase orders identify a vendor and a date for the purchase.
(e) Likewise, each purchase order may contain several purchase order items, containing the
quantity of material required.
(f) Company data consists of the name of the company, the technical head of staff, the
administrative head of staff, the company’s main address (street, town, city, and postcode), a
telephone and fax number.
(g) Staff data consists of a name (first and last name), address, and a job title.
28.4 Using the Object Definition Language (ODL) of the ODMG object model, define the interface
of the object schema’s types for this case study. For each type, define at least one method that
you consider appropriate. State any assumptions necessary to support your design.
See figure below for a possible schema. Class specification for Publisher shown below.
class Publisher
(extent publishers key pubName)
{
    attribute string pubName;
    attribute struct PAddress {string street, string town, string city, string postcode} address;
    attribute string tel;
    attribute string fax;
    relationship set<Job> HasJob inverse Job::ForPublisher;
    ...};
28.5 Write the following query in the Object Query Language (OQL) of the ODMG object model:
‘List the surnames of all people associated with printing jobs for Addison Wesley Longman.’
SELECT surname: s.lName
FROM p IN publishers,
     j IN p.HasJob,
     s IN j.HasStaff
WHERE p.pubName = "Addison Wesley Longman"
Publisher: pubName: STRING; address: struct STRING; fax: STRING; tel: STRING; jobs*
Company: tradingName: STRING; technicalHead; admHead; address: struct STRING; fax: STRING;
tel: STRING
Job: jobNo: STRING; description: STRING; respStaff*; jobType: CHAR; dueDate: DATE; publisher;
PurchaseOrder*
Staff: name: struct STRING; title; address: struct STRING
Material: type: STRING; stockLevel: FLOAT; price: FLOAT; reorderLevel: FLOAT
PurchaseOrder: vendor: STRING; dateOfPurchase: DATE; items*
POItem: quantityRequired: FLOAT; material
Example Schema for Perilous Printing

Case Study 3 – Perfect Pets
A practice called Perfect Pets provides private health care for domestic pets throughout
America. This service is provided through various clinics located in the main cities of
America. The Director has provided the following description of the current system.
Perfect Pets has many veterinary clinics located in the main cities of America. The details of
each clinic include the clinic number, clinic address (consisting of the street, city, state, and
zipcode), and the telephone and fax numbers. Each clinic has a Manager and a number of staff
(for example, vets, nurses, secretaries, cleaners). The clinic number is unique throughout the
practice.
The details stored on each member of staff include the staff number, name (first and last),
address (street, city, state, and zipcode), telephone number, date of birth, sex, social security
number (SSN), position, and current annual salary. The staff number is unique throughout the
practice.
When a pet owner first contacts a clinic of Perfect Pets the details of the pet owner are
recorded, which include an owner number, owner name (first name and last name), address
(street, city, state, and zipcode), and home telephone number. The owner number is unique to a
particular clinic.
The details of the pet requiring treatment are noted, which include a pet number, pet name,
type of pet, description, date of birth (if unknown, an approximate date is recorded), date
registered at clinic, current status (alive/deceased), and the details of the pet owner. The pet
number is unique to a particular clinic.
When a sick pet is brought to a clinic, the vet on duty examines the pet. The details of each
examination are recorded and include an examination number, the date and time of the
examination, the name of the vet, the pet number, pet name, and type of pet, and a full
description of the examination results. The examination number is unique to a particular
clinic. As a result of the examination, the vet may propose treatment(s) for the pet.
Perfect Pets provides various treatments for all types of pets. These treatments are provided at
a standard rate across all clinics. The details of each treatment include a treatment number, a
full description of the treatment, and the cost to the pet owner. For example, treatments
include:
T123   Penicillin antibiotic course             $50.00
T155   Feline hysterectomy                      $200.00
T112   Vaccination course against feline flu    $70.00
A standard rate of $20.00 is charged for each examination, which is recorded as a type of
treatment. The treatment number uniquely identifies each type of treatment and is used by all
Perfect Pets clinics.
Based on the results of the examination of a sick pet, the vet may propose one or more types
of treatment. For each type of treatment, the information recorded includes the examination
number and date, the pet number, name and type, treatment number, description, quantity of
each type of treatment, and date the treatment is to begin and end. Any additional comments
on the provision of each type of treatment are also recorded.
28.6 Using the Object Definition Language (ODL) of the ODMG object model, define the interface
of the object schema’s types for this case study. For each type, define at least one method that
you consider appropriate. State any assumptions necessary to support your design.
See figure below for a possible schema. Class specification for Pet shown below.
class Pet
(extent pets key petNo)
{
    enum pStatus {Alive, Deceased};
    attribute string petNo;
    attribute string petName;
    attribute string petType;
    attribute string petDescription;
    attribute date pDOB;
    attribute date dateRegistered;
    attribute pStatus petStatus;
    relationship set<Examination> HasExamination inverse Examination::ExaminationFor;
    relationship Clinic RegisteredWith inverse Clinic::Registers;
    relationship PetOwner OwnedBy inverse PetOwner::Owns;
    ...};
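[Figure: possible object schema for the Perfect Pets case study. Recoverable labels: classes PetOwner, Clinic, Staff, Pet (petNo), and Examination (examNo); attribute treatNo; relationships Has, IsContactedBy, Owns, Registers, Performs, and Undergoes, with multiplicities ranging from 0..1 to 1..*.]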
List the names of all pets that received the T123 treatment.
SELECT petName:p.petName
FROM t IN Treatment,
pt IN t.UsedIn,
e IN pt.ResultsFrom,
p IN e.Underwent
WHERE t.treatNo = 'T123'
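The path traversal in the FROM clause above can be mirrored in a general-purpose language. The following is only a minimal sketch in Python, assuming each relationship is held as a plain list; the class name PetTreatment and the sample pets are hypothetical and are not part of the case study schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Pet:
    petNo: str
    petName: str


@dataclass
class Examination:
    examNo: str
    # Mirrors the Underwent path step used in the OQL query.
    Underwent: List[Pet] = field(default_factory=list)


@dataclass
class PetTreatment:  # hypothetical link class between Treatment and Examination
    ResultsFrom: List[Examination] = field(default_factory=list)


@dataclass
class Treatment:
    treatNo: str
    UsedIn: List[PetTreatment] = field(default_factory=list)


def pets_treated_with(treatments, treat_no):
    """Follow Treatment -> UsedIn -> ResultsFrom -> Underwent, as the FROM clause does."""
    return [
        p.petName
        for t in treatments if t.treatNo == treat_no
        for pt in t.UsedIn
        for e in pt.ResultsFrom
        for p in e.Underwent
    ]


# Made-up sample objects, for illustration only.
rex = Pet("P1", "Rex")
tom = Pet("P2", "Tom")
treatments = [
    Treatment("T123", [PetTreatment([Examination("E1", [rex])])]),
    Treatment("T155", [PetTreatment([Examination("E2", [tom])])]),
]
names = pets_treated_with(treatments, "T123")
print(names)
```

Each nested `for` corresponds to one iterator variable in the OQL FROM clause, and the `if` corresponds to the WHERE clause.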
Student Project
Assignment – Persistence in a Programming Language
Introduction
The objective of the ODBMS project is to design and implement persistence in a programming language.
The work is to be undertaken in groups; you may form your own groups of two or three.
The aim of this project is to develop a method that will allow persistent objects to be automatically stored
and read as required. The solution should cope with standard object concepts, such as inheritance, and
also provide for interobject relationships, thereby providing basic database functionality.
It is acceptable to use an existing DBMS to store objects, rather than getting involved in writing disk I/O
software for paging, indexing and clustering. For example, Microsoft Access could be used to store
objects from Visual C++ using ADO (ActiveX Data Objects) or ODBC (Open Database Connectivity). It
is not acceptable to use MFC serializability classes (such as CArchive).
To demonstrate your particular implementation, a small application should be developed which handles
objects and relationships. For example, you may produce an application corresponding to a data model
for course administration, consisting of entities Course, Module, Student, Lecturer, Department, etc.
where Student and Lecturer are derived from a superclass Person, to exhibit inheritance.
Marking Scheme
Design (including use of correct algorithms) 30
  Basic persistence 10
  Inheritance 10
  Interobject relationships 10
Testing Strategy 5
Implementation 30
  Basic persistence 10
  Inheritance 10
  Interobject relationships 10
Project Working 10
Demonstration 10
Part 8 Web and DBMSs

Chapter 29 Web Technology and DBMSs
29.1 The World Wide Web is a distributed information system based on hypertext. Discuss how the two-
tier client–server architecture may not be entirely suitable for this environment and propose an
alternative architecture.
Discussion of traditional Two-Tier Client–Server Architecture versus Three-Tier Architecture – see Section 3.1.
29.2 Discuss what you consider to be the four most important advantages and the four most important
disadvantages of the World Wide Web as a distributed information system. Justify your selection
in each case.
See Section 29.2.7; the marker should look for justification of the advantages and disadvantages
selected, not just regurgitation.
Chapter 30 Semistructured Data and XML
30.1 Despite the excitement surrounding XML, it is important to note that most operational
business data, even for new Web-based applications, continues to be stored in relational
DBMSs. This is unlikely to change in the foreseeable future because of their reliability,
scalability, tools, and performance. Consequently, if XML is to fulfil its potential, some
mechanism is required to publish relational data in the form of XML documents. The
SQL:2003 standard has defined extensions to SQL to enable the publication of XML,
commonly referred to as SQL/XML. Discuss in detail these extensions.
SQL/XML contains:
• a new native XML data type, XML, which allows XML documents to be treated as
relational values in columns of tables, attributes in user-defined types, variables, and
parameters to functions;
• a set of operators for the type:
o XMLELEMENT, to generate an XML value with a single element as a
child of its root item. The element can have zero or more attributes specified
using an XMLATTRIBUTES subclause.
o XMLFOREST, to generate an XML value with a list of elements as
children of a root item.
o XMLCONCAT, to concatenate a list of XML values.
o XMLPARSE, to perform a non-validating parse of a character string to
produce an XML value.
o XMLROOT, to create an XML value by modifying the properties of the
root item of another XML value.
o XMLCOMMENT, to generate an XML comment.
o XMLPI, to generate an XML processing instruction.
o XMLSERIALIZE, to generate a character or binary string from an XML
value.
o XMLAGG, an aggregate function, to generate a forest of elements from a
collection of elements.
• an implicit set of mappings from relational data to XML. The mapping may take as its
source an individual table, all the tables in a particular schema, or all the tables in a given
catalog. The standard does not specify a syntax for the mapping; instead it is provided for
use by applications and as a reference for other standards. The mapping produces two
XML documents: one that contains the mapped table data and the other that contains an
XML Schema describing the first document.
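To give a feel for what XMLELEMENT (with XMLATTRIBUTES), XMLFOREST, and XMLSERIALIZE produce, the following is a rough Python analogue built on the standard xml.etree.ElementTree module. It is only a sketch under stated assumptions: the function names, the STAFF element, and the sample row are hypothetical, and the code imitates the operators rather than executing SQL/XML.

```python
import xml.etree.ElementTree as ET


def xml_forest(row, columns):
    """Rough analogue of XMLFOREST: one element per column, skipping NULL (None) values."""
    forest = []
    for column in columns:
        if row[column] is not None:
            element = ET.Element(column)
            element.text = str(row[column])
            forest.append(element)
    return forest


def xml_element(name, attributes, children):
    """Rough analogue of XMLELEMENT with an XMLATTRIBUTES subclause."""
    element = ET.Element(name, attributes)
    element.extend(children)
    return element


# Hypothetical relational row, for illustration only.
row = {"staffNo": "S001", "fName": "Ann", "lName": "Lee", "salary": None}

staff = xml_element(
    "STAFF",
    {"id": row["staffNo"]},
    xml_forest(row, ["fName", "lName", "salary"]),
)

# Rough analogue of XMLSERIALIZE: turn the XML value into a character string.
serialized = ET.tostring(staff, encoding="unicode")
print(serialized)
```

In SQL/XML itself the same nesting would be written as an XMLELEMENT call whose arguments include the XMLATTRIBUTES subclause and an XMLFOREST call over the chosen columns.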
30.2 Provide XQuery expressions for the following queries based on the sample XML file (books.xml)
below.
<?xml version = "1.0" encoding = "UTF-8" standalone = "yes"?>
<?xml-stylesheet type = "text/xsl" href = "bib_list.xsl"?>
<bib>
<book year = "2005">
<title>Database Systems</title>
<author><last>Connolly</last><first>Thomas</first></author>
<author><last>Begg</last><first>Carolyn</first></author>
<publisher>Addison Wesley</publisher>
<price>99.00</price>
</book>
<title>Database Solutions</title>
<author><last>Connolly</last><first>Thomas</first></author>
<author><last>Begg</last><first>Carolyn</first></author>
</book>
<title>Data on the Web</title>
<author><last>Abiteboul</last><first>Serge</first></author>
<author><last>Buneman</last><first>Peter</first></author>
<author><last>Suciu</last><first>Dan</first></author>
<publisher>Morgan Kaufmann</publisher>
</book>
<title>Modern Database Systems</title>
<editor><last>Kim</last><first>Won</first></editor>
</book>
</bib>
(a) List the title of the first book.
doc("books.xml")/bib/book[1]/title
This produces: <title>Database Systems</title>
(b) List the titles of all the books along with a count of the number of books.
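<titles count = "{ count(doc('books.xml')//title) }">
{
doc("books.xml")//title
}
</titles>
This produces:
<titles count = "4">
<title>Database Systems</title>
<title>Database Solutions</title>
<title>Data on the Web</title>
<title>Modern Database Systems</title>
</titles>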
(c) List the title and price of each book published in the year 2003.
for $b in doc("books.xml")//book
where $b/@year = "2003"
return $b/title
This produces: <title>Database Solutions</title>
(d) List the title and numbers of authors of each book.
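for $b in doc("books.xml")//book
let $c := $b/author
return <book> { $b/title, <count> { count($c) } </count> } </book>
This produces:
<book>
<title>Database Systems</title>
<count>2</count>
</book>
<book>
<title>Database Solutions</title>
<count>2</count>
</book>
<book>
<title>Data on the Web</title>
<count>3</count>
</book>
<book>
<title>Modern Database Systems</title>
<count>0</count>
</book>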
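The author counts for part (d) can be cross-checked outside an XQuery processor. The sketch below uses Python's standard xml.etree.ElementTree module on a cut-down copy of the sample file that keeps only the title, author, and editor elements; it is an illustration for checking the counts, not a substitute for the XQuery answer.

```python
import xml.etree.ElementTree as ET

# Cut-down copy of the sample data: titles, authors, and editors only.
BOOKS_XML = """
<bib>
  <book><title>Database Systems</title>
    <author><last>Connolly</last><first>Thomas</first></author>
    <author><last>Begg</last><first>Carolyn</first></author></book>
  <book><title>Database Solutions</title>
    <author><last>Connolly</last><first>Thomas</first></author>
    <author><last>Begg</last><first>Carolyn</first></author></book>
  <book><title>Data on the Web</title>
    <author><last>Abiteboul</last><first>Serge</first></author>
    <author><last>Buneman</last><first>Peter</first></author>
    <author><last>Suciu</last><first>Dan</first></author></book>
  <book><title>Modern Database Systems</title>
    <editor><last>Kim</last><first>Won</first></editor></book>
</bib>
"""

bib = ET.fromstring(BOOKS_XML.strip())

# For each book, pair its title with the number of author children,
# which is what count($b/author) computes in the XQuery answer.
counts = [
    (book.findtext("title"), len(book.findall("author")))
    for book in bib.findall("book")
]
for title, author_count in counts:
    print(title, author_count)
```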
(e) List the titles of books whose price is less than $60.
for $b in doc("books.xml")//book
where $b/price < 60
return $b/title
This produces: <title>Data on the Web</title>

(f) List the titles of books in alphabetical order.
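for $t in doc("books.xml")//title
order by $t
return $t
This produces:
<title>Data on the Web</title>
<title>Database Solutions</title>
<title>Database Systems</title>
<title>Modern Database Systems</title>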
(g) List the authors, sorted in reverse order of surname, then first name.
let $authors := doc("books.xml")//author
for $a in $authors
where not(some $p in $authors[. << $a] satisfies deep-equal($p, $a))
order by $a/last descending, $a/first descending
return $a
This produces:
<author>
<last>Suciu</last>
<first>Dan</first>
</author>
<author>
<last>Connolly</last>
<first>Thomas</first>
</author>
<author>
<last>Buneman</last>
<first>Peter</first>
</author>
<author>
<last>Begg</last>
<first>Carolyn</first>
</author>
<author>
<last>Abiteboul</last>
<first>Serge</first>
</author>
(h) List the authors as a single string, sorted in reverse order of surname, then first name.
let $authors := doc("books.xml")//author
for $a in $authors
where not(some $p in $authors[. << $a] satisfies deep-equal($p, $a))
order by $a/last descending, $a/first descending
return <author> { concat($a/first, " ", $a/last) } </author>
Note that the order by clause specifies conditions based on data that is not used in the return
clause. This produces:
<author>Dan Suciu</author>
<author>Thomas Connolly</author>
<author>Peter Buneman</author>
<author>Carolyn Begg</author>
<author>Serge Abiteboul</author>
(i) List the titles of books that have at least one author called Thomas Connolly.
for $b in doc("books.xml")//book
where some $a in $b/author
satisfies ($a/first = "Thomas" and $a/last = "Connolly")
return $b/title
This query uses the XQuery existential quantifier, which tests whether at least one item
satisfies a condition. This produces:
<title>Database Systems</title>
<title>Database Solutions</title>
(j) List the titles of books that have every author called Thomas Connolly.
for $b in doc("books.xml")//book
where every $a in $b/author
satisfies ($a/first = "Thomas" and $a/last = "Connolly")
return $b/title
This query uses the XQuery universal quantifier, which tests whether every node in a sequence
satisfies a condition. This produces:
<title>Modern Database Systems</title>
Note in this case that the title of the last book is returned. This book had no authors specified
(only an editor). In this case, the expression $b/author evaluates to an empty sequence. If a
universal quantifier is applied to an empty sequence, it always returns true, because every
item in that (empty) sequence satisfies the condition (even though there are no items).
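The same behaviour of quantifiers over an empty sequence can be demonstrated in Python, whose built-in all() and any() follow the same logic. This is only an analogy, using the author names from the sample file held in a plain dictionary; the helper function names are hypothetical.

```python
# Author lists (first, last) per title, taken from the sample books.xml.
books = {
    "Database Systems": [("Thomas", "Connolly"), ("Carolyn", "Begg")],
    "Database Solutions": [("Thomas", "Connolly"), ("Carolyn", "Begg")],
    "Data on the Web": [("Serge", "Abiteboul"), ("Peter", "Buneman"), ("Dan", "Suciu")],
    "Modern Database Systems": [],  # editor only, so no authors
}


def every_author_is(authors, first, last):
    """Universal quantifier: vacuously True when authors is empty."""
    return all(author == (first, last) for author in authors)


def some_author_is(authors, first, last):
    """Existential quantifier: False when authors is empty."""
    return any(author == (first, last) for author in authors)


every_titles = [t for t, a in books.items() if every_author_is(a, "Thomas", "Connolly")]
some_titles = [t for t, a in books.items() if some_author_is(a, "Thomas", "Connolly")]
print(every_titles)
print(some_titles)
```

To exclude books with no authors from the universal test, an additional check that the author sequence is non-empty would be needed, in XQuery as in Python.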
(k) List books by author.
<authors>
{
let $a := doc("books.xml")//author
for $l in distinct-values($a/last), $f in distinct-values($a[last=$l]/first)
order by $l, $f
return
<author>
<name> { concat($l, ", ", $f) } </name>
{
for $b in doc("books.xml")/bib/book
where some $ba in $b/author
satisfies ($ba/last = $l and $ba/first = $f)
order by $b/title
return $b/title
}
</author>
}
</authors>
This produces:
<authors>
<author>
<name>Abiteboul, Serge</name>
<title>Data on the Web</title>
</author>
<author>
<name>Begg, Carolyn</name>
<title>Database Solutions</title>
<title>Database Systems</title>
</author>
<author>
<name>Buneman, Peter</name>
<title>Data on the Web</title>
</author>
<author>
<name>Connolly, Thomas</name>
<title>Database Solutions</title>
<title>Database Systems</title>
</author>
<author>
<name>Suciu, Dan</name>
<title>Data on the Web</title>
</author>
</authors>
(l) Test whether the most expensive book is also the book with the largest number of
authors/editors.
let $aCount := for $b in doc("books.xml")//book
order by count($b/author) + count($b/editor)
return $b
let $bPrice := for $b in doc("books.xml")//book
order by $b/price
return $b
return $aCount[last()] is $bPrice[last()]
Note, this expression performs a node comparison using the is operator; the predicate [last()]
selects the last node in each ordered sequence.
(m) List the books where Begg is an author but is not listed as the first author.
for $b in doc("books.xml")//book
let $a := ($b/author)[1], $aBegg := ($b/author)[last = "Begg"]
where $a << $aBegg
return $b
Note, the operator << returns true if the first operand precedes the second operand
in document order.
(n) List the titles of books that are more expensive than the average book price.
let $b := doc("books.xml")//book
let $avgPrice := avg($b//price)
return $b[price > $avgPrice]/title
For our XML document, this produces the titles of the first two books, Database Systems and
Database Solutions.
30.3 State the underlying type of the following expressions:

(a) “hello” (i) 3.6e1 + 69
(b) 36 (j) 36.00 + 6.9e1
(c) 36.00 (k) “abc” + “def”
(d) 3.6e1 (l) “36” + 69.00
(e) 36 + 69 (m) “abc” < “def”
(f) 36.00 + 69.00 (n) 1<2
(g) 3.6e1 + 6.9e1 (o) 1 < 2.0
(h) 36 + 69.00 (p) “abc” < 2
(a) xs:string
(b) xs:integer
(c) xs:decimal
(d) xs:double
(e) xs:integer
(f) xs:decimal
(g) xs:double
(h) xs:decimal
(i) xs:double
(j) xs:double
(k) this is a static type error
(l) this is a static type error
(m) xs:boolean
(n) xs:boolean
(o) xs:boolean
(p) this is a static type error
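The pattern behind answers (e) to (l) is numeric type promotion: the result of + takes the 'wider' of the two operand types, and a string operand is an error. The following Python sketch encodes that rule for literals only; the function names are hypothetical, xs:float is omitted for brevity, and the classification of literals is a simplification of the XQuery rules.

```python
# Promotion order for the numeric types used in this question (xs:float omitted).
RANK = {"xs:integer": 0, "xs:decimal": 1, "xs:double": 2}


def literal_type(literal: str) -> str:
    """Classify a literal written as in the question, e.g. '36', '36.00', '3.6e1'."""
    if literal.startswith('"'):
        return "xs:string"
    if "e" in literal.lower():
        return "xs:double"   # exponent notation
    if "." in literal:
        return "xs:decimal"  # decimal point, no exponent
    return "xs:integer"


def plus_type(left: str, right: str) -> str:
    """Static type of left + right under the simplified promotion rule."""
    left_type, right_type = literal_type(left), literal_type(right)
    if left_type not in RANK or right_type not in RANK:
        return "static type error"
    return left_type if RANK[left_type] >= RANK[right_type] else right_type


for left, right in [("36", "69"), ("36", "69.00"), ("3.6e1", "69"), ('"36"', "69.00")]:
    print(left, "+", right, "->", plus_type(left, right))
```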
30.4 State the underlying type of the following expressions:

(a) let $a := 1 + 2 return $a + $a
(b) let $a := 1e0 + 1.0 return $a + $a
(c) let $a as xs:integer := 1 + 2 return $a + $a
(d) let $a as xs:decimal := 1 + 2 return $a + $a
(e) let $a as xs:double := 1 + 2 return $a + $a
(f) let $a := xs:double(1 + 2) return $a + $a
(g) let $a as xs:double := xs:double (1 + 2) return $a + $a
(h) let $a as xs:decimal := 1, $b as xs:integer := $a + $a return $b + $b
(i) if ($a < $b) then 3 else 4 (assume $a and $b are xs:integer)
(j) if ($a < $b) then “abc” else 4.00 (assume $a and $b are xs:integer)
(k) if ($a < $b) then 3.6e1 else 4 (assume $a and $b are xs:integer)
(l) (if ($a < $b) then 3.6e1 else 4) + 0.5 (assume $a and $b are xs:integer)
(a) xs:integer (1 + 2 has type xs:integer, so variable $a has type xs:integer and the return expression
is also xs:integer)
(b) xs:double
(c) xs:integer
(d) xs:decimal (this is well typed because xs:integer is a subtype of xs:decimal)
(e) this is a static type error (note that xs:integer is not a subtype of xs:double)
(f) xs:double
(g) xs:double
(h) this is a static type error
(i) xs:integer
(j) (xs:string | xs:decimal)
(k) (xs:double | xs:integer)
(l) (xs:double | xs:decimal)
Part 9 Business Intelligence

Chapter 31 Data Warehousing Concepts
31.1 ‘Data warehouse and the underlying support for informational processing will emerge as a growing
trend in the 1990s. With the advent of the data warehouse, some basic ideas about data management
will change’ (Inmon, W.H., 1993). Briefly discuss why the emergence of the data warehouse
phenomenon is causing such interest in the business world.
See introductory paragraphs to Chapter 31 for general introduction to data warehousing and Sections
31.1.1 for evolution of data warehousing and 31.1.3 for the potential benefits to an organization.
31.2 ‘A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in
support of management’s decision-making process’ (W.H. Inmon, 1993). Describe the major
characteristics of the data held in a data warehouse.
See Section 31.1.2.
31.3 Discuss the relationship between online transaction processing (OLTP) and data warehousing and identify
the major differences between these systems.
See Section 31.1.4
31.4 Describe the architecture and major components of a data warehouse.
See Section 31.2.
31.5 Discuss the current issues associated with the development and management of a data warehouse.
See Section 31.1.5.
31.6 Discuss the major problems associated with the design, development, and management of a data
warehouse.
See Section 31.1.5; for a discussion of the management of data warehouse metadata, see Section
31.4.3.
31.7 Describe how data marts differ from a data warehouse and identify the major issues associated with the
development and management of data marts.
See Section 31.5.
31.8 Explain why businesses have shown a growing interest in data warehousing in recent years.
Since the 1970s, organizations have mostly focused their investment on new computer systems that
automate business processes. In this way, the businesses gained competitive advantage through systems
that offered more efficient and cost-effective services to the customer. Throughout this period,
businesses accumulated growing amounts of data stored in their operational databases. However, in
recent times, where such systems are commonplace, businesses are focusing on ways to use
operational data to support decision-making, as a means of gaining competitive advantage.
The successful implementation of a data warehouse can bring major benefits to an organization
including:
Potential high returns on investment
Competitive advantage
Increased productivity of corporate decision-makers
31.9 Discuss the reasons why an organisation may have one or more Online Transaction Processing
(OLTP) systems but only a single data warehouse.
An organisation may have several OLTP systems due to the range and complexity of the business
processes that support the business. Several OLTP systems may also be the result of mergers and
acquisitions. It is also possible for an organisation with a single focus to be well supported by a single
OLTP system. However, the goal for an organisation is to have a single repository of data to support
analysis, i.e. a data warehouse that offers a single version of the truth.
31.10 “A subject-oriented, integrated, time-variant, and non-volatile collection of data in support of
management’s decision-making process.” (Inmon, 1993).
Discuss what this statement is saying about the data in a data warehouse and contrast the purpose of
such systems with OLTP systems.
Subject-oriented Data
The warehouse is organized around the major subjects of the enterprise (e.g. customers, products, and
sales) rather than the major application areas (e.g. customer invoicing, stock control, and product sales).
This is reflected in the need to store decision-support data rather than application-oriented data.
Integrated Data
The data warehouse integrates corporate application-oriented data from different source systems, which
often includes data that is inconsistent. The integrated data source must be made consistent to present a
unified view of the data to the users.
Time-variant Data
Data in the warehouse is only accurate and valid at some point in time or over some time interval.
Time-variance is also shown in the extended time that the data is held, the implicit or explicit
association of time with all data, and the fact that the data represents a series of snapshots.
Non-volatile Data
Data in the warehouse is not normally updated in real-time (RT) but is refreshed from operational
systems on a regular basis. (However, the emerging trend is towards RT or near real-time (NRT) data
warehouses.) New data is always added as a supplement to the database, rather than as a replacement.
A DBMS built for Online Transaction Processing (OLTP) is generally regarded as unsuitable for data
warehousing because each system is designed with a differing set of requirements in mind. For
example, OLTP systems are designed to maximize the transaction processing capacity, while data
warehouses are designed to support ad hoc query processing. An organization will normally have a
number of different OLTP systems for business processes such as inventory control, customer
invoicing, and point-of-sale. These systems generate operational data that is detailed, current, and
subject to change. The OLTP systems are optimized for a high number of transactions that are
predictable, repetitive, and update intensive. The OLTP data is organized according to the requirements
of the transactions associated with the business applications and supports the day-to-day decisions of a
large number of concurrent operational users. In contrast, an organization will normally have a single
data warehouse, which holds data that is historic, detailed, and summarized to various levels and rarely
subject to change (other than being supplemented with new data). The data warehouse is designed to
support relatively lower numbers of transactions that are unpredictable in nature and require answers to
queries that are ad hoc, unstructured, and heuristic. The warehouse data is organized according to the
requirements of potential queries and supports the long term strategic decisions of a relatively low
number of managerial users.
31.11 Explain why businesses have shown a growing interest in technologies that support their decision
makers.
Since the 1970s, organizations have mostly focused their investment on new computer systems that
automate business processes. In this way, the businesses gained competitive advantage through systems
that offered more efficient and cost-effective services to the customer. Throughout this period,
businesses accumulated growing amounts of data stored in their operational databases. However, in
recent times, where such systems are commonplace, businesses are focusing on ways to use
operational data to support decision-making, as a means of gaining competitive advantage.
The successful implementation of a decision support technology can bring major benefits to an
organisation including:
A growing number of vendors are providing a range of BI tools to support decision makers with
differing technical skills.
Decision makers can access tools that enable interactive querying/reporting/model building using a
range of internal and external data sources.
Potential high returns on investment
The move towards more fact-based decisions means that decisions can be faster and more correct.
Increased productivity of corporate decision-makers
Opportunities for launching new products into new markets can be explored with less risk.
Competitive advantage
31.12 The main characteristics for describing Online Transaction Processing (OLTP) systems and data
warehousing systems are listed below.
- Main purpose
- Data age
- Data latency
- Data granularity
- Data processing
- Reporting
- Users
Using each characteristic compare and contrast OLTP systems with data warehousing systems and
describe the relationship that can exist between these systems. Include in your answer any emerging
trends that are influencing the characteristics of data warehousing systems.
The student’s answer should cover the information contained in the second and third columns of the
table below.
Characteristic    OLTP Systems                        Data Warehouse Systems
Main purpose      Supports operational processing     Supports analytical processing
Data age          Current                             Historic (but trend is towards also
                                                      including current data)
Data latency      Real-time                           Depends on data cycle of
                                                      supplements to the warehouse
                                                      (but trend is to reduce this time to
                                                      real-time or near real-time (NRT))
Data granularity  Detailed                            Detailed, lightly and highly
                                                      summarised data
Data processing   Predictable patterns of data        Less predictable pattern of data
                  insertions, updates, deletions      queries. Low to medium level of
                  and queries. High level of          transaction throughput.
                  transaction throughput.
Reporting         Predictable, relatively static      Unpredictable, dynamic
                  fixed reporting displaying few      multi-dimensional reporting
                  dimensions
Users             Serves large number of              Serves relatively lower number of
                  operational users                   strategic and tactical users (but
                                                      trend is to also support
                                                      requirements of operational
                                                      users).
OLTP systems are the major source of data for the data warehouse. However, OLTP systems were never
designed to support such business activities and so tapping into these systems for decision-making may
never be an easy solution. The legacy is that a typical business may have numerous operational systems
with overlapping and sometimes contradictory definitions, such as data types. The challenge for
organizations is the need to turn their archives of data into a source of knowledge, so that a single
integrated / consolidated view of the organization’s data is presented to the user. The concept of a data
warehouse was deemed the solution to meet the requirements of a system capable of supporting
decision-making, receiving data from multiple operational data sources.
A DBMS built for Online Transaction Processing (OLTP) is generally regarded as unsuitable for data
warehousing because each system is designed with a differing set of requirements in mind. For
example, OLTP systems are designed to maximize the transaction processing capacity, while data
warehouses are designed to support ad hoc query processing. An organization will normally have a
number of different OLTP systems for business processes such as inventory control, customer
invoicing, and point-of-sale. These systems generate operational data that is detailed, current, and
subject to change. The OLTP systems are optimized for a high number of transactions that are
predictable, repetitive, and update intensive. The OLTP data is organized according to the requirements
of the transactions associated with the business applications and supports the day-to-day decisions of a
large number of concurrent operational users. In contrast, an organization will normally have a single
data warehouse, which holds data that is historic, detailed, and summarized to various levels and rarely
subject to change (other than being supplemented with new data). The data warehouse is designed to
support relatively lower numbers of transactions that are unpredictable in nature and require answers to
queries that are ad hoc, unstructured, and heuristic. The warehouse data is organized according to the
requirements of potential queries and supports the long term strategic decisions of a relatively low
number of managerial users.
Although OLTP systems and data warehouses have different characteristics and are built with different
purposes in mind, these systems are closely related, in that the OLTP systems provide the source data
for the warehouse. A major problem of this relationship is that the data held by the OLTP systems can
be inconsistent, fragmented, and subject to change, containing duplicate or missing entries. As such, the
operational data must be ‘cleaned up’ before it can be used in the data warehouse. OLTP systems are
not built to quickly answer ad hoc queries. They also tend not to store historical data, which is
necessary to analyze trends. Basically, OLTP offers large amounts of raw data, which is not easily
analyzed. The data warehouse allows more complex queries to be answered as opposed to simple
aggregations.
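The ‘clean up’ step mentioned above can be illustrated with a very small Python sketch. It assumes one simple policy only (drop rows with missing required values and keep the first row for each key); the function name, column names, and sample rows are hypothetical, and a real warehouse load would involve far more than this.

```python
def clean_operational_rows(rows, key, required):
    """Drop rows with missing required values and duplicate keys (first row wins)."""
    seen_keys = set()
    cleaned = []
    for row in rows:
        # Skip rows where any required column is missing or empty.
        if any(row.get(column) in (None, "") for column in required):
            continue
        # Skip rows whose key has already been loaded.
        if row[key] in seen_keys:
            continue
        seen_keys.add(row[key])
        cleaned.append(row)
    return cleaned


# Made-up operational data containing a duplicate and a missing entry.
source_rows = [
    {"orderNo": "O1", "customer": "C10", "amount": 25.0},
    {"orderNo": "O1", "customer": "C10", "amount": 25.0},
    {"orderNo": "O2", "customer": None, "amount": 40.0},
    {"orderNo": "O3", "customer": "C12", "amount": 15.5},
]

cleaned_rows = clean_operational_rows(
    source_rows, key="orderNo", required=["customer", "amount"]
)
print([row["orderNo"] for row in cleaned_rows])
```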
Chapter 32 Data Warehousing Design
32.1 Describe the main principles and key features of Kimball’s Business Dimensional Lifecycle. Support
your answer with a diagram showing the stages of the lifecycle.
The Business Dimensional Lifecycle – Principles
o Focus on the business e.g. identify business requirements and their associated value.
o Build an information infrastructure e.g. build a single, integrated, easy-to-use, high-performance
information foundation able to meet the business requirements.
o Deliver in meaningful increments e.g. 6 to 12 month timeframes.
o Deliver the entire solution i.e. provide all the elements necessary to deliver value to the
business user e.g. data warehouse, ad hoc query tools, reporting applications, advanced
analytics, training, support etc.
The Business Dimensional Lifecycle – Key Features
1. Business Requirements Definition plays central role by influencing Project Planning and providing the
foundation for the three tracks that follow concerning:
o Technology (top track)
o Data (middle track)
o BI applications (bottom track)
2. Comprehensive project management
3. Incremental and iterative approach
Answer should be supported with a diagram of lifecycle.
32.2 Describe the proposed approach to designing a star schema in the Dimensional Modelling stage of
Kimball’s Business Dimensional Lifecycle. Include the creation of a simple (maximum of four dimension
tables and one fact table with maximum of six attributes per table) star schema to illustrate each step.
Dimensional Modeling Stage
This stage can be used to create a dimensional model (DM) for a data warehouse or to ‘dimensionalize’
the relational schema of an OLTP database.
A high-level (simplified) version of the dimensional model is first created, which progressively gains
more detail.
The DM is created using a two-phased approach:
Part I Create a high-level DM – a graphical representation of the dimension, fact and utility tables
involved in representing the business process. The DM is created using a four-step process.
Part II Identify all dimension and fact attributes for the DM.
Part I Create a high-level DM (4 step process)
Step 1 Select Business Process
Enterprise-level business requirements should identify the business process (subject area) to be modelled.
The first business process should:
o Deliver significant value to the organisation,
o Be built in a reasonable time,
o Establish data foundations for the enterprise view by creating reusable or
conformed dimensions.
Step 2 Declare the Grain

Decide on level of detail or the grain needed for the selected business process. In other words, what a
row of the fact table represents.
Choosing the level of grain is determined by finding a balance between meeting business requirements
and what is possible given the data source. Recommendation is to build the DM using the lowest level
of detail available.
Step 3 Choose the Dimensions

Determine the most useful dimensions that apply to each fact row and how to best represent each
dimension.
Step 4 Identify the Facts

Identify numeric facts used to measure the business process. This step should also identify calculations
derived from the facts, which are used to monitor the business.
Part II - Identify Dimension and Fact Attributes

This part involves identifying the attributes needed by the business to analyse the selected business
process.
The usefulness of a DM is determined by the attributes associated with each dimension as this governs
how the data will be viewed for analysis.
32.3 Identify three differences between Kimball’s approach to data warehouse development and Inmon’s
approach.
There are a number of differences between the approaches and the student is free to identify any three,
such as: Kimball recommends first breaking down the project into smaller parts (called data marts),
whereas Inmon does not; Kimball uses new techniques, e.g. dimensional modelling, whereas Inmon uses
traditional database techniques; and Kimball recommends a denormalised database structure, whereas
Inmon's structure conforms to 3NF.
32.4. The star schema shown in Figure 32.1 describes part of the database that will provide decision-support
for a property sales company. Describe the main characteristics of fact and dimension tables and discuss
the purpose of the tables shown in the star schema of Figure 32.1.
Figure 32.1
The fact table has relatively few attributes but many records and constitutes the largest part of the
decision-support database. The primary key for the fact table is composed of the foreign keys, which
relate to the dimension tables.
Each dimension table has relatively more attributes but few records compared with the fact table. The
more attributes contained by dimensions the more different types of analysis supported. The primary
key for each dimension table is a single simple surrogate key, which is copied to the fact table.
The schema is referred to as a ‘star schema’ because of the star shape that results from surrounding the
fact table in the centre with the dimension tables. The schema is a star schema (rather than a snowflake
schema) because some of the dimension tables contain repeating data, that is, they are denormalized,
such as the Location and Date dimensions. The purpose of the star schema is to reduce the number of
joins between tables and hence speed up queries.
The purpose of the fact table is to contain the attributes that describe the important metrics associated
with property sales such as offerPrice and sellingPrice and foreign key attributes to allow queries about
these metrics to be set in context such as a query concerning the average property selling price to
buyers living in different cities.
The purpose of the dimension tables is to contain attributes to allow property sales to be queried from
different perspectives.
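As an illustration of how a fact table and its dimension tables are queried together, the following Python sketch builds a tiny star schema in an in-memory SQLite database and computes the average selling price by buyer city. Only offerPrice, sellingPrice, and position come from the discussion above; the table names Buyer and PropertySale, the remaining column names, and all data values are assumptions made for the example, since Figure 32.1 is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one surrogate key each, descriptive attributes.
CREATE TABLE Buyer (
    buyerID   INTEGER PRIMARY KEY,
    buyerName TEXT,
    city      TEXT
);
CREATE TABLE Staff (
    staffID   INTEGER PRIMARY KEY,
    staffName TEXT,
    position  TEXT
);
-- Fact table: primary key composed of the foreign keys, plus the metrics.
CREATE TABLE PropertySale (
    buyerID      INTEGER REFERENCES Buyer(buyerID),
    staffID      INTEGER REFERENCES Staff(staffID),
    offerPrice   REAL,
    sellingPrice REAL,
    PRIMARY KEY (buyerID, staffID)
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO Buyer VALUES (?, ?, ?)",
                 [(1, "Ann", "Glasgow"), (2, "Bob", "Glasgow"), (3, "Cal", "London")])
conn.executemany("INSERT INTO Staff VALUES (?, ?, ?)",
                 [(1, "Dee", "Manager"), (2, "Eve", "Assistant")])
conn.executemany("INSERT INTO PropertySale VALUES (?, ?, ?, ?)",
                 [(1, 1, 100000, 95000), (2, 1, 200000, 190000), (3, 2, 150000, 145000)])

# One join from the fact table to a dimension sets the metric in context.
avg_by_city = conn.execute("""
    SELECT b.city, AVG(s.sellingPrice)
    FROM PropertySale s JOIN Buyer b ON b.buyerID = s.buyerID
    GROUP BY b.city
    ORDER BY b.city
""").fetchall()
print(avg_by_city)
```

Because each dimension is a single table, the query needs only one join per dimension used, which is the point made above about reducing the number of joins.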
32.5 Identify three types of analysis that the star schema shown in Figure 32.1 can support about property
sales.
For example, the star schema can support analysis of property sales according to –
a. The highest performing staff in terms of property sales.
b. The most popular types of property in terms of property sales.
c. The types of promotions that encourage property sales.
32.6 What do slowly changing dimensions (SCDs) represent to a database designed for decision-support?
A database designed for decision-support will normally store data that can be several years old. During
that time the business is likely to change and this may impact on the corporate data. The star schema
has to continually evolve to support the business and this means that it is important to identify and
accommodate these changes where necessary.
Many dimension attribute values are not fixed but change over time e.g. an employee may change
department. Dimensions that have changeable attribute values are called slowly changing dimensions
(SCDs).
This requires the identification of dimensions and attributes to be tracked and decisions made on how
they are to be tracked as not all changes are significant.
32.7 Describe the three types of SCDs.

The most common techniques for dealing with SCDs are Type 1, used for changes that are not
significant, and Type 2, used for changes that are significant.
A Type 1 SCD overwrites the existing attribute value with the new value. This gives no tracking record
of historical values.
A Type 2 SCD captures the attribute values that were in effect at a point in time and relates them to the
business events in which they participated.
When a change occurs to a Type 2 SCD, a new record is created in the dimension table to capture the
new values that are to take effect from that point onwards or to the next change.
There is a third tracking technique called Type 3 SCD, which creates separate attributes for both the old
and new attribute values. However, Type 3 is less common because it involves changing physical
tables and is not very scalable.
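As an illustration only, the three tracking techniques described above can be sketched in Python. The StaffRow structure, its surrogate key, its validity dates and the sample staff number are all assumptions made for this sketch; they are not taken from Figure 32.1.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class StaffRow:
    staff_key: int                             # surrogate key (hypothetical)
    staff_no: str                              # business key (hypothetical)
    position: str
    previous_position: Optional[str] = None    # used only by Type 3
    valid_from: date = date(2000, 1, 1)        # hypothetical start date, used by Type 2
    valid_to: Optional[date] = None            # None marks the current row


def apply_type1(rows, staff_no, new_position):
    """Type 1: overwrite the value in place, so no history survives."""
    return [replace(r, position=new_position) if r.staff_no == staff_no else r
            for r in rows]


def apply_type2(rows, staff_no, new_position, change_date):
    """Type 2: close the current row and add a new row with a new surrogate key."""
    next_key = max(r.staff_key for r in rows) + 1
    out = []
    for r in rows:
        if r.staff_no == staff_no and r.valid_to is None:
            out.append(replace(r, valid_to=change_date))
            out.append(StaffRow(next_key, staff_no, new_position,
                                valid_from=change_date))
        else:
            out.append(r)
    return out


def apply_type3(rows, staff_no, new_position):
    """Type 3: keep the old value in a separate attribute of the same row."""
    return [replace(r, previous_position=r.position, position=new_position)
            if r.staff_no == staff_no else r
            for r in rows]


staff = [StaffRow(1, "SG37", "Assistant")]     # hypothetical dimension content
t1 = apply_type1(staff, "SG37", "Supervisor")
t2 = apply_type2(staff, "SG37", "Supervisor", date(2014, 6, 1))
t3 = apply_type3(staff, "SG37", "Supervisor")
```

Only the Type 2 result grows the dimension table: the old row stays linked to the facts recorded before the change, and the new row is used for facts recorded afterwards.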
32.9 Identify two possible examples of SCDs in the property sales star schema shown in Figure 32.1 and
discuss the types of change each represents.
An example of a SCD is the position attribute in Staff. The position held by a member of staff is likely
to change over time. If it is important to know the position held by a member of staff associated with a
property sale then this is a Type 2 SCD. This means that a new Staff dimension record has to be created
to represent the member of staff with the new position and this record takes effect from this point
onwards until the next change.
A second example of a possible SCD is the propertyNo attribute in PropertyForSale. It is possible that
the business may change the coding used to uniquely identify each property from e.g. PG1 to
PGW00001. If it is not important to know the old unique identifier for a property when analysing sales
then this is a Type 1 SCD. This means that the new value for each property overwrites the old value.
Evidence of the change associated with propertyNo is lost and the old values will not be available when
querying property sales.
32.10 Discuss what the bus matrix (shown in Figure 32.2) for an online retailer represents and how it can be
used to facilitate the creation of a data warehouse.
Dimensions (columns): Time, Customer, Promotion, Product, Courier, Location, Warehouse, Staff
Business processes (rows), each followed by its X marks:
Customer Registration: X X X X
Customer Purchase: X X X X X X X
Product Delivery: X X X X X X X
Product Promotion: X X X X
Product Return: X X X X X X X
Customer Product Review: X X X X X X
Figure 32.2
Following the establishment of analytical themes, the business processes that are associated with them
are identified. The student should describe the layout and features of the matrix using the example
bus matrix shown in Figure 32.2. For example, the student should indicate that the company’s
measurement-driven business processes are listed as the rows of the matrix, while the columns
represent the objects (dimensions) that participate in the business processes.
The bus matrix represents the enterprise dimensional data architecture. Each analytical theme
supported by one or more business processes shown in the bus matrix is subjected to a prioritization
process. The most important/most valuable business processes are scheduled to be built earliest.
However, the conformed dimensions that are shared across more than one business process form the
basis for the eventual formation of an enterprise-wide data warehouse.
32.11 Produce a star schema for the Product Delivery business process using the information shown in
Figure 32.2. Based on your understanding of this business process, add a maximum of 5 (possible)
attributes to each dimension table in your schema. Complete your star schema by adding a maximum
of 8 (possible) attributes to your fact table. Your choice of attributes should demonstrate that you have
a realistic idea of how this star schema is likely to be queried.
The student should present a star schema for the Product Delivery business process shown in
Figure 32.2. The attributes added should be reasonable and demonstrate that the student has a good
idea of how the schema is likely to be queried.
___________________________________________________________________________________
The data mart shown in Figure 32.3 supports the analysis of a media (TV programmes and films)
streaming service.
factStreaming (fact table): streamID, TVProgrammeID, filmID, memberID, startDate, startTime,
streamDuration, customerRating
dimMember: memberID, memberName, memberGender, memberDOB, memberHomeCityTown,
memberHomePostCode, memberJoinDate
dimFilm: filmID, filmNo, filmTitle, filmCertification, filmGenre, filmYearRelease, filmDirector,
filmProductionCompany, mainActor1, mainActor2, mainActor3
dimTVProgramme: TVProgrammeID, TVProgrammeNo, TVProgEpisodeNo, TVProgEpisodeTitle,
TVProgEpisodeDuration, TVSeries, TVSeason, TVGenre, TVSeasonYearRelease, TVSeriesLanguage,
TVSeriesDirector, [TVProduction Company], MainActor1, MainActor2, MainActor3
dimDate: dateID, fullDate, dayOfWeek, numberDayOfWeek, numberDayOfMonth, numberWeekOfYear,
month, numberMonthOfYear, season, year
dimTime: timeID, time, hour, [24Hour], am_pm, morn_aftern_even
Figure 32.3
32.12 Describe the characteristics and purpose of fact and dimension tables and explain how you recognise
that this data mart is based on a star schema design. Illustrate your answer using the data mart tables
in Figure 32.3
The fact table has relatively few attributes but many records and constitutes the largest part of the
decision-support database. The fact table is made up of foreign keys and (usually) one or more metrics.
Each dimension table has relatively more attributes but few records compared with the fact table. The
more attributes contained by dimensions the wider the range of analysis supported. The primary key for
each dimension table is a single simple surrogate key, which is copied to the fact table as foreign key.
The purpose of the fact table is to contain any important metrics (streamDuration) and any additional
descriptors (customerRating) about the TV programmes/films streamed. The attributes of the dimension
tables allow a range of queries about the events being analysed. For example, the dimension tables
allow analysis of the timing of streaming.
The purpose of the dimension tables is to contain attributes to allow streaming to be queried from
different perspectives.
The schema that describes this data mart is referred to as a ‘star schema’ because of the star shape that
results from surrounding the fact table in the centre with the dimension tables.
The schema is a star schema because the dimension tables are de-normalised and contain repeating data
such as that found in the dimFilm and dimTVProgramme dimensions.
The purpose of the star schema is to reduce the number of joins between dimension tables and hence
speed up queries.
32.13 Identify three types of analysis that the data mart shown in Figure 32.3 can support about media
streaming.
For example, the data mart can support analysis of media streaming according to the following.
a. The most popular actors in streamed films.
b. The most popular genre for streamed TV programmes.
c. The most popular dates and times for streaming.
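A query of this kind can be sketched with Python's built-in sqlite3 module. The sketch below is illustrative only: it uses a cut-down subset of the tables in Figure 32.3, it assumes that startTime in factStreaming is a foreign key to timeID in dimTime, and all of the row values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dimTVProgramme (TVProgrammeID INTEGER PRIMARY KEY, TVGenre TEXT);
CREATE TABLE dimTime (timeID INTEGER PRIMARY KEY, morn_aftern_even TEXT);
CREATE TABLE factStreaming (
    streamID INTEGER PRIMARY KEY,
    TVProgrammeID INTEGER REFERENCES dimTVProgramme(TVProgrammeID),
    startTime INTEGER REFERENCES dimTime(timeID),  -- assumed foreign key
    streamDuration INTEGER
);
-- hypothetical sample rows
INSERT INTO dimTVProgramme VALUES (1, 'Drama'), (2, 'Comedy');
INSERT INTO dimTime VALUES (1, 'morning'), (2, 'evening');
INSERT INTO factStreaming VALUES
    (1, 1, 2, 50), (2, 1, 2, 45), (3, 2, 1, 25), (4, 2, 2, 30);
""")

# One join per dimension: the fact table supplies the metric, the
# dimension tables supply the perspectives (genre, time of day).
rows = conn.execute("""
    SELECT p.TVGenre, t.morn_aftern_even,
           COUNT(*) AS streams, SUM(f.streamDuration) AS minutes
    FROM factStreaming f
    JOIN dimTVProgramme p ON p.TVProgrammeID = f.TVProgrammeID
    JOIN dimTime t ON t.timeID = f.startTime
    GROUP BY p.TVGenre, t.morn_aftern_even
    ORDER BY streams DESC, p.TVGenre, t.morn_aftern_even
""").fetchall()
```

Each dimension is reached with a single join from the fact table, which is the property of the star schema that keeps such queries simple.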
32.14 The data mart shown in Figure 32.3 cannot support the analysis of media streaming according to the
age of the member at the time of the streaming. Describe the changes necessary to the data mart to
support this type of analysis.
The student should answer as follows:
Analysing streaming by the age in years of the member requires that an age attribute is added to the
fact table. The member’s age at the time of the streaming could be calculated from the memberDOB
attribute (dimMember) and the startDate attribute (factStreaming) and stored in the fact table.
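The calculation itself is small. A minimal Python sketch is given below, assuming that memberDOB and startDate are both available as calendar dates; the function name and the sample dates are hypothetical.

```python
from datetime import date


def age_in_years(dob: date, on: date) -> int:
    """Whole years between dob and on, i.e. the member's age on that day."""
    had_birthday = (on.month, on.day) >= (dob.month, dob.day)
    return on.year - dob.year - (0 if had_birthday else 1)


member_dob = date(1990, 8, 15)   # hypothetical memberDOB value
start_date = date(2014, 3, 2)    # hypothetical startDate of one stream
member_age = age_in_years(member_dob, start_date)
```

Storing the result in the fact table fixes the age at the time of each stream, so later analysis does not depend on the date on which the query is run.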
32.15 What do slowly changing dimensions (SCDs) represent to a database designed for decision-support?
A database designed for decision-support will normally store data that can be several years old. During
that time the business is likely to change and this may impact on the corporate data. The star schema
has to continually evolve to support the business and this means that it is important to identify and
accommodate these changes where necessary.
Many dimension attribute values are not fixed but change over time e.g. an employee may change
department. Dimensions that have changeable attribute values are called slowly changing dimensions
(SCDs).
This requires the identification of dimensions and attributes to be tracked and decisions made on how
they are to be tracked as not all changes are significant.
32.16 Describe three types of slowly changing dimension (SCDs) and discuss the best SCD Type to track
changes over time to a member’s home address.
The most common techniques for dealing with SCDs are Type 1, used for changes that are not
significant, and Types 2 and 3, used for changes that are significant.
A Type 1 SCD overwrites the existing attribute value with the new value. This gives no tracking record
of historical values.
A Type 2 SCD captures the attribute values that were in effect at a point in time and relates them to the
business events in which they participated.
When a change occurs to a Type 2 SCD, a new record is created in the dimension table to capture the
new values that are to take effect from that point onwards or to the next change.
Type 3 SCD creates separate attributes for both the old and new attribute values. However, Type 3 is
less common because it involves changing physical tables and is not very scalable.
Only Type 2 or 3 should be discussed for the member’s home address SCD, as Type 1 does not track
changes.
Chapter 33 OLAP
The OLAP cube shown in Figure 33.1 has been prepared for a car rental company that rents out cars to
customers throughout the UK. The company wishes to explore which manufacturer, model, engine size and
trim (interior finish) generates the most rental income in each location of the UK. For example, the
manufacturer Ford has various models including Mondeo, Fiesta and Ka and each model comes in various
sizes of engine such as 1.8 or 1.6, with each available in one of three trim levels: high, medium and low.
Figure 33.1 (OLAP cube: Car Rental Income by Country, Car manufacturer and Year)
33.1 Present typical dimensional hierarchies (with 4 levels of aggregation) for each dimension shown in
Figure 33.1. The highest level of aggregation for each dimension is shown in Figure 33.1.
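Each hierarchy is listed from the highest to the lowest level of aggregation.
Location: Country, City, Area, Postcode
Car: Car manufacturer, Model, Engine size, Trim level
Time: Year, Quarter/Season, Month/Week, Day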
33.2 Describe the four common OLAP operations for querying data. Provide an example of each operation
using the OLAP cube in Figure 33.1 and your answer to question 33.1.
OLAP operations include:

Roll-up, drill-down, slice and dice, pivoting.
Roll-up performs aggregations on the data by moving up the dimensional hierarchy or by dimensional
reduction e.g. 3-D to 2-D.
For example, {location, manufacturer and time} analysis of rental income to {location, manufacturer}
analysis of rental income.
For example, analysis of weekly rental income to analysis of monthly rental income.
Drill-down is the reverse of roll-up and involves revealing the detailed data that forms the aggregated
data. Drill-down can be performed by moving down the dimensional hierarchy or by dimensional
introduction e.g. 2-D to 3-D.
For example, {location, manufacturer} analysis of rental income to {location, manufacturer and time}
analysis of rental income.
For example, analysis of monthly rental income to analysis of weekly rental income.
Slice and dice - ability to look at data from different viewpoints. The slice operation performs a
selection on one dimension of the data whereas dice uses two or more dimensions.
For example, a slice is analysis of rental income according to {time = ‘2012’} OR {manufacturer =
‘Ford’} OR {location = ‘London’}
For example, a dice is analysis of rental income according to {time = ‘2012’} and {manufacturer =
‘Ford’} OR {time = ‘2012’} and {manufacturer = ‘Ford’} and {location = ‘London’}
Pivot - ability to rotate the data to provide an alternative view of the same data.
For example, analysis of rental income using location (city) as the x-axis against time (year) as the
y-axis can be rotated so that time (year) is the x-axis and location (city) is the y-axis.
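The four operations can be illustrated with a small Python sketch over a three-dimensional cube held as a dictionary. The cell values, the helper names (roll_up, select, pivot) and the use of city-level locations are assumptions made for illustration, not part of Figure 33.1.

```python
from collections import defaultdict

# (location, manufacturer, time) -> rental income; hypothetical figures
cube = {
    ("London", "Ford", 2011): 100,
    ("London", "Ford", 2012): 120,
    ("London", "BMW", 2012): 200,
    ("Glasgow", "Ford", 2012): 80,
    ("Glasgow", "BMW", 2011): 150,
}
DIMS = ("location", "manufacturer", "time")


def roll_up(cells, keep):
    """Aggregate away every dimension not named in keep (dimensional reduction)."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, income in cells.items():
        out[tuple(key[i] for i in idx)] += income
    return dict(out)


def select(cells, **fixed):
    """A slice when one dimension is fixed, a dice when two or more are fixed."""
    return {k: v for k, v in cells.items()
            if all(k[DIMS.index(d)] == val for d, val in fixed.items())}


def pivot(table):
    """Swap the two axes of a 2-D view."""
    return {(b, a): v for (a, b), v in table.items()}


by_loc_time = roll_up(cube, ("location", "time"))      # 3-D to 2-D
a_slice = select(cube, time=2012)                       # one dimension fixed
a_dice = select(cube, time=2012, manufacturer="Ford")   # two dimensions fixed
rotated = pivot(by_loc_time)                            # (time, location) view
```

Drill-down is the reverse of roll_up: moving from by_loc_time back to the detailed cells of cube reintroduces the manufacturer dimension.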
33.3 Describe how SQL has been extended to include OLAP-type analysis of relational data.
The extensions to SQL are collectively referred to as the ‘OLAP package’ and include:
• Extended grouping capabilities such as ROLLUP and CUBE.
• Elementary OLAP operators such as ranking and window calculations.
Extended Grouping Capabilities

Aggregation is a fundamental part of OLAP. To improve aggregation capabilities the SQL standard
provides extensions to the GROUP BY clause such as the ROLLUP and CUBE functions. ROLLUP
supports calculations using aggregations such as SUM, COUNT, MAX, MIN, and AVG at increasing
levels of aggregation, from the most detailed up to a grand total. CUBE is similar to ROLLUP, enabling
a single statement to calculate all possible combinations of aggregations. CUBE can generate the
information needed in cross-tabulation reports with a single query.
Elementary OLAP Operators
The elementary OLAP operators support a variety of operations such as rankings and window
calculations. Ranking functions include cumulative distributions, percent rank, and N-tiles. Windowing
allows the calculation of cumulative and moving aggregations using functions such as SUM, AVG, MIN,
and COUNT.
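What ranking and windowing compute can be sketched in plain Python. This is an illustration of the ideas only, not of the SQL syntax; the function names and the monthly figures are hypothetical.

```python
def rank(values):
    """RANK()-style ranking, highest first; ties share a rank and leave a gap."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def moving_average(values, width):
    """Average over the current row and up to width - 1 preceding rows."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - width + 1): i + 1]
        out.append(sum(window) / len(window))
    return out


def cumulative_sum(values):
    """Running total from the first row to the current row."""
    total, out = 0, []
    for v in values:
        total += v
        out.append(total)
    return out


monthly_income = [100, 300, 200, 300]        # hypothetical monthly figures
ranks = rank(monthly_income)
moving = moving_average(monthly_income, 3)   # three-month moving average
running = cumulative_sum(monthly_income)
```

In each case the input rows are kept and a value is computed alongside each one, which is what distinguishes these operators from GROUP BY aggregation.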
_________________________________________________________________________________
The OLAP cube shown in the figure below has been prepared for a media (TV programmes and films) streaming
service.
Figure 33.2 (OLAP cube: Media Streaming by Country, Media and Year)
33.4 Describe the four common OLAP operations for querying data. Provide an example of each operation
using the OLAP cube in Figure 33.2.
Common OLAP operations include:

(1) Roll-up, (2) drill-down, (3) slice and dice, and (4) pivoting.
Roll-up performs aggregations on the data by moving up the dimensional hierarchy or by dimensional
reduction e.g. 3-D to 2-D.
For example, {country, media and time} analysis of media streaming to {country, media} analysis of
streaming.
For example, analysis of weekly media streaming to analysis of monthly streaming.
Drill-down is the reverse of roll-up and involves revealing the detailed data that forms the aggregated
data. Drill-down can be performed by moving down the dimensional hierarchy or by dimensional
introduction e.g. 2-D to 3-D.
For example, {country, media} analysis of media streaming to {country, media and time} analysis of
streaming.
For example, analysis of monthly media streaming to analysis of weekly streaming.
Slice and dice - ability to look at data from different viewpoints. The slice operation performs a
selection on one dimension of the data whereas dice uses two or more dimensions.
For example, a slice is analysis of media streaming according to {time = ‘2014’} OR {media = ‘film’}
OR {location = ‘London’}
For example, a dice is analysis of media streaming according to {time = ‘2014’} and {media = ‘film’}
OR {time = ‘2014’} and {media = ‘film’} and {location = ‘London’}
Pivot - ability to rotate the data to provide an alternative view of the same data.
For example, analysis of media streaming using location (city) as the x-axis against time (year) as the
y-axis can be rotated so that time (year) is the x-axis and location (city) is the y-axis.
33.5 What are the three key features that all OLAP applications share?
OLAP applications all require (1) multi-dimensional views of data, (2) support for complex
calculations (such as forecasting), and (3) time intelligence. Time intelligence is a key feature of almost
any analytical application as performance is almost always judged over time.
Multi-dimensional data can be characterized through many different views (e.g. DVD sales can be
viewed according to the characteristics of the DVD, such as genre, and/or the characteristics of buyers,
such as home address).
The types of analysis available from OLAP range from basic navigation and browsing (referred to as
‘slicing and dicing’) to calculations, to more complex analyses such as time series and complex
modeling.
33.6 Explain the difference between the output produced by SQL ROLLUP and CUBE queries.
The ROLLUP and CUBE extensions to GROUP BY generate OLAP-type summaries of the data with
subtotals and totals. The columns to be summarized are specified in much the same way that grouping
sets specify GROUP BY columns.
ROLLUP generates subtotal rows at each successive level of the GROUP BY column list, from the most
detailed to the least detailed. CUBE extends this capability by generating subtotal rows for every
combination of the GROUP BY columns. Both ROLLUP and CUBE also generate a grand total row.
A simple example of a ROLLUP query is shown below:

SELECT a, b, c, sum(x)
FROM t
GROUP BY ROLLUP (a,b,c)
Produces aggregates of x for (a,b,c), (a,b), a, (grand total)
A simple example of a CUBE query is shown below:

SELECT a, b, c, sum(x)
FROM t
GROUP BY CUBE (a,b,c)
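Produces aggregates of x for (a,b,c), (a,b), (a,c), a, (b,c), b, c, (grand total). In both examples, every
grouping other than (a,b,c) is an additional subtotal or total produced by ROLLUP or CUBE; a plain
GROUP BY a, b, c would return only the (a,b,c) aggregates.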
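The difference between the two sets of groupings can be checked with a short Python sketch that emulates the effect of the two clauses. It is a sketch under assumptions rather than a description of how any DBMS implements them: the helper names and the sample rows of t are hypothetical, and columns that have been grouped away are shown as None, in the same way that SQL returns NULL for them.

```python
from itertools import combinations


def rollup_sets(cols):
    """Grouping sets for ROLLUP(cols): each prefix, down to the grand total."""
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def cube_sets(cols):
    """Grouping sets for CUBE(cols): every subset of the columns."""
    return [combo for n in range(len(cols), -1, -1)
            for combo in combinations(cols, n)]


def aggregate(rows, cols, grouping_sets, measure):
    """SUM(measure) per grouping set; grouped-away columns appear as None."""
    totals = {}
    for gset in grouping_sets:
        for row in rows:
            key = tuple(row[c] if c in gset else None for c in cols)
            totals[(gset, key)] = totals.get((gset, key), 0) + row[measure]
    return totals


t = [  # hypothetical rows of table t
    {"a": 1, "b": 1, "c": 1, "x": 10},
    {"a": 1, "b": 2, "c": 1, "x": 20},
    {"a": 2, "b": 1, "c": 2, "x": 5},
]
cols = ("a", "b", "c")
rollup = aggregate(t, cols, rollup_sets(cols), "x")
cube = aggregate(t, cols, cube_sets(cols), "x")
```

For three columns, rollup_sets yields four grouping sets and cube_sets yields eight, matching the two lists of aggregates given above.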
Chapter 34 Data Mining
34.1 Data mining can provide huge paybacks for companies who have made a significant investment in data
warehousing. Describe the advantages that data mining tools offer the business analyst.
See introductory paragraphs to Section 34.2 and Section 34.2.1.
34.2 Explain why a data warehouse is well equipped for providing the data for data mining.
A data warehouse is well equipped for providing data for mining for the following reasons:
• Data quality and consistency is a pre-requisite for mining to ensure the accuracy of the predictive
models. Data warehouses are populated with clean, consistent data.
• It is advantageous to mine data from multiple sources to discover as many interrelationships as
possible. Data warehouses contain data from a number of sources.
• Selecting the relevant subsets of records and fields for data mining requires the query capabilities of
the data warehouse.
• The results of a data mining study are useful if there is some way to further investigate the
uncovered patterns. Data warehouses provide the capability to go back to the data source.
34.3 Identify the factors that have led to the growing popularity of data mining.
Students should identify factors such as:
Adoption of data mining tools is a natural path following on from the success of data warehousing
and OLAP.
Data mining tools are more intuitive to use and there is less need for specialised staff.
Data mining tools are often bundled together with other BI tools.
DBMS vendors are offering data mining functionality as an inexpensive add-on.
Growing number of data mining tools on the marketplace so there is more user choice.
Growing number of data mining tools are appearing to meet needs of a specific domain.
34.4 Discuss the relationship between data mining, OLAP and data warehousing.
Data mining and OLAP are complementary BI tools. While OLAP functions typically include
aggregations, allocations, ratios etc, which are descriptive in nature, data mining uses regression, neural
nets, decision trees and clustering, which are associated with pattern discovery or explanatory
modeling.
A data warehouse is well equipped for providing the data source for data mining and OLAP.
Data quality and consistency is a pre-requisite for mining and/or browsing to ensure the accuracy of the
predictive models/descriptive models. Data warehouses are populated with clean, consistent data.
It is advantageous to mine and/or browse data from multiple sources to discover as many
interrelationships as possible. Data warehouses contain data from a number of sources.
Selecting the relevant subsets of records and fields for data mining/browsing requires the query
capabilities of the data warehouse.
The results of a data mining/OLAP study are useful if there is some way to further investigate the
uncovered patterns. Data warehouses provide the capability to go back to the data source.
34.5 There are a growing number of data mining tools. Describe four key features that these tools offer the
analyst.
The student should discuss key features of these tools such as:
data preparation facilities;
selection of data mining operations (algorithms);
product scalability and performance;
facilities for visualization of results.
