
Mekelle University Faculty of Business & Economics

Computer Science Department

ICT 252 - Theory of Databases

Final Written Exam – June 2005

68 marks Marking Scheme

1. Index File Organization [22 marks total]

a. Compare index-sequential file and B-trees as structures for index file organization. Your
comparison should include information about how the different structures can be searched,
the search performance and any other information you think is relevant. [5 marks]

An index-sequential file consists of an ordered sequential file for data and another one for
the index. Each index entry has a pointer to the corresponding record(s) in the data file.
(.5)
To search the file based on the index search key, the index can be binary-searched. (.5)
This search is fast (log2(b) block accesses where b is the number of blocks taken up by
the index). (.5)
Fast for sequential search in order of the index search key. (.5)
Search based on a key other than the index or the ordering field is not fast, because the
data file has to be searched linearly. (.5)

A B-tree is an alternative structure for an index, where the index is stored in a B-tree. Each
key value in the tree has a pointer to the corresponding record(s) in the data file.
To search based on the index search key, start at the root and follow the sub-tree pointer
whose key range contains the search value; the keys within each node can be binary-searched. (.5)
The search is faster than an index-sequential file because the number of block accesses
required is (usually) <= the height of the tree so a low-height, bushy tree can make
searches faster. (.5)
The B-tree structure is not so fast for sequential search in order of the index search key
(.5) – because the tree has to be traversed in-order. Because there are multiple key values
in each node, this means visiting each node more than once. (.5 – to say why)
Search based on a key other than the index or the ordering field is not fast, because the
data file has to be searched linearly (as for an index-sequential file). (.5)

[if student does not give all the above points but does make other relevant points, give .5
for each point – to max of 10 points i.e. 5 marks]
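As an illustration of the index-sequential search described above, here is a minimal Python sketch. The index contents, the data file and the lookup function are hypothetical; the index is held in memory, so block accesses are not modelled.

```python
from bisect import bisect_left

# Hypothetical dense index: sorted (key, record position) pairs.
index = [(10, 0), (20, 1), (30, 2), (40, 3)]
data_file = ["rec-10", "rec-20", "rec-30", "rec-40"]

def lookup(key):
    """Binary-search the index, then follow the pointer into the data file."""
    keys = [k for k, _ in index]
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return data_file[index[i][1]]
    return None  # key is not in the index

found = lookup(30)
missing = lookup(25)
```

Each halving step of the binary search corresponds to one index block access on disk, which is where the log2(b) figure comes from.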

Page 1 of 11
b. Assume a database table is stored in an ordered sequential file and that it has an index
stored using the index-sequential organization. Assume the index is a dense index.
Outline an algorithm for inserting a new data record to the table (including any updates
necessary to the index). [4 marks]

Insert to the data file:
    If ordered file, find the correct position based on the value of the ordering field. (1)
Insert to the dense index:
    If key is not in the index, insert an index entry with the new value, in the correct
    position. (1)
    Else if [key is in the index and] index has pointers to all records for the key
    value, add a pointer to the new record. (1)
    Else if [key is in the index and] index has pointers only to first record for the
    key value, ensure the record is after other records with the same key value. (1)
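
The outline above could be sketched in Python roughly as follows. This is only one possible reading: it assumes the variant where the index keeps pointers to all records for a key value, and it uses record IDs as stand-ins for record pointers. All names are hypothetical.

```python
from bisect import bisect_left, bisect_right

def insert_record(data_file, index_keys, index_ptrs, key, rid):
    """data_file: list of (key, rid) in key order.
    index_keys: sorted distinct keys; index_ptrs: parallel list of rid lists."""
    # 1. Insert into the ordered data file, after any records with the same key.
    pos = bisect_right([k for k, _ in data_file], key)
    data_file.insert(pos, (key, rid))
    # 2. Update the dense index.
    i = bisect_left(index_keys, key)
    if i == len(index_keys) or index_keys[i] != key:
        # Key not yet indexed: add a new entry in the correct position.
        index_keys.insert(i, key)
        index_ptrs.insert(i, [rid])
    else:
        # Key already indexed: add a pointer to the new record.
        index_ptrs[i].append(rid)

data_file = [(10, "a"), (30, "b")]
index_keys = [10, 30]
index_ptrs = [["a"], ["b"]]
insert_record(data_file, index_keys, index_ptrs, 20, "c")
insert_record(data_file, index_keys, index_ptrs, 20, "d")
```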

c. What changes would you make to this algorithm if the index were a sparse index, with an
entry for each block in the data file? [3 marks]

Change the code for inserting to the index to this:
    If a new block has been created, insert the first key value in the block into the index.
    (1)
    Else if the new record has the lowest key value in its block, update the index entry
    pointing to that block, so it has the new key value. (1)
    Else no change to the index. (1)
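
A rough Python sketch of the three cases follows. The block capacity and the split-in-half policy for creating a new block are assumptions made for illustration; the marking scheme does not say how new blocks come about.

```python
from bisect import bisect_right, insort

BLOCK_CAPACITY = 3  # assumed block size in records, for illustration only

def insert_sparse(blocks, index, key):
    """blocks: list of sorted key lists; index[i] holds the first key of blocks[i]."""
    if not blocks:
        blocks.append([key])
        index.append(key)
        return
    # Locate the block whose key range should contain the new key.
    i = max(bisect_right(index, key) - 1, 0)
    insort(blocks[i], key)
    if len(blocks[i]) > BLOCK_CAPACITY:
        # Case 1: overflow creates a new block, so its first key joins the index.
        half = len(blocks[i]) // 2
        new_block = blocks[i][half:]
        blocks[i] = blocks[i][:half]
        blocks.insert(i + 1, new_block)
        index.insert(i + 1, new_block[0])
    if blocks[i][0] != index[i]:
        # Case 2: the new record is now the lowest key in its block.
        index[i] = blocks[i][0]
    # Case 3: otherwise the index is left unchanged.

blocks = [[10, 20, 30], [40, 50]]
index = [10, 40]
insert_sparse(blocks, index, 5)    # overflow plus a new lowest key
insert_sparse(blocks, index, 45)   # fits in its block, index untouched
```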

d. List the characteristics of a B tree that distinguish it from normal m-way search trees. [3
marks]
• Each node of the tree, except for the root and the leaf nodes, has at least ⌈m/2⌉ sub-trees
and no more than m sub-trees, i.e. each node is at least half-full. (1)
• The root of the tree must have at least 2 sub-trees, unless it is itself a leaf node. This
forces the tree to branch early, so searching is faster. (1)
• All leaf nodes of the tree must be on the same level. This gives faster searching. (1)
[NB: 0 for saying the tree is balanced or height-balanced or perfectly-balanced – to be clear,
must say that all leaf nodes are on the same level.]

e. What is a B+ tree? [2 marks]


A B+ tree is a B-tree where the leaf nodes form a sequential set of all the search key values, in
a linked list. (1) The other levels form a B-tree index. (1)
Or
A B+ tree is like a multi-level index (1) where the leaf nodes (or bottom level of the tree) are
equivalent to the first level of the index (1).

[or give 2 marks for a definition that shows the student understands there is an index set (1
mark) and a sequence set (1 mark) in the structure]

f. What benefit (and why) does a B+ tree have compared to a B tree as an index structure? [1
mark]
Faster sequential access (.5), because of the sequence set in the leaf nodes. (.5)
g. When talking about an index, what does the term ‘an extra level of indirection’ mean and
why is it used? Use a diagram to support your answer.
Give an example of an index where this might be used. [4 marks]
It means that the pointer from each index entry is to a bucket of pointers (1). This is done
so that the index entry size is consistent i.e. each entry has only one pointer. (1)

Diagram something like this – or alternatively showing leaf nodes in a B+ tree and a
pointer from each key value to a bucket. (1 mark)

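[Diagram: index entries Ababa, Sara, Tesfay, Tigist and Yohannis each hold one pointer to a
bucket; each bucket holds pointers to the matching records in the data file:]

C0954327 Yohannis Solomon
C1010398 Tesfay Kinfe
C2388597 Sara Abebe
C3340959 Ababa Tekle
C3499503 Tesfay Abraha
C8543321 Tigist GebreMariam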

Examples: (1 for either)


Used for a secondary index on a non-candidate key field.
OR
Used for the pointers from the leaf nodes in a B+ tree.
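
The bucket idea can be sketched in Python as below, using the names and record IDs from the diagram. Treating the index as a secondary index on first name, and using record IDs as record pointers, are assumptions made for illustration.

```python
# Data file: record ID -> full name (IDs stand in for record pointers).
records = {
    "C0954327": "Yohannis Solomon",
    "C1010398": "Tesfay Kinfe",
    "C2388597": "Sara Abebe",
    "C3340959": "Ababa Tekle",
    "C3499503": "Tesfay Abraha",
    "C8543321": "Tigist GebreMariam",
}

index = {}    # key value -> bucket number (exactly one pointer per index entry)
buckets = []  # bucket number -> list of record pointers

for rid, full_name in records.items():
    first_name = full_name.split()[0]
    if first_name not in index:
        index[first_name] = len(buckets)
        buckets.append([])
    buckets[index[first_name]].append(rid)

def find(first_name):
    """Follow index entry -> bucket -> records (the extra level of indirection)."""
    bucket_no = index.get(first_name)
    if bucket_no is None:
        return []
    return [records[rid] for rid in buckets[bucket_no]]

tesfays = find("Tesfay")
```

Every index entry stays the same size however many records share the key value; only the bucket grows.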

2. Transactions [22 marks total]

a. What 4 properties must a logical unit of work have in order to be considered a transaction
in a DBMS? Name each property and describe what it means. [10 marks]
Atomicity (.5)
All data changes made by the operations are reflected in the database or none of them are
(all data modifications performed or none). (1)
This means that all the steps in the transaction must succeed for the whole transaction to
succeed. If any one of the steps fails, the whole transaction must fail. (1)

Consistency (.5)
When completed, a transaction must preserve the consistency of a database. This means
that after completion, all data must be in a correct state (1) and comply with all data
constraints and validation rules (1) [or that data integrity in the database must be
maintained. (1)]

Isolation (.5)
Any transaction must be unaware of other transactions executing in the system
concurrently. (.5)

No other transactions or elements of the database can see the changes resulting from a
transaction until the transaction completes.
OR: Other transactions should see the data in the state it was in before the transaction or
after it completes – not in between.
OR: a transaction must see a consistent database – a transaction cannot read or write data
that is being modified by another transaction. (1.5)

Durability (.5)
After a transaction completes successfully, the changes it has made in the database will
persist – even if there is a system failure. (2)
OR: the changes must be permanent and cannot be erased from the database (after they
are committed).

b. Which two of the 4 properties are ensured by the concurrency control manager of a
DBMS? [1 mark]
Consistency and isolation (.5 for each)

c. Name 3 problems that can occur without concurrency control. [3 marks]


Lost update (1)
Uncommitted dependency (or dirty read) (1)
Incorrect summary (or phantom rows) (1)

d. Using a diagram, describe in detail one of the problems you named in part c. [3 marks]

Lost update – where 2 transactions update the same data but only one of the changes
remains after both have committed. (1)
OR: one update overwrites another update.
Diagram – with a timeline and showing the steps of 2 transactions reading and updating a
value. Something like this one (which we did in class): (2 – must clearly show the
problem)

Time | Transaction A                       | Transaction B
T1   | read X (result: 1)                  |
T2   | x := X + 50 (result: x = 51)        | read X (result: 1, because Trans A has not yet written the new value of X)
T3   | write X (X in db now has value 51)  | x := X + 20 (result: x = 21)
T4   |                                     | write X (X in db now has value 21)
T5   | Commit transaction                  |
T6   |                                     | Commit transaction

Uncommitted dependency (dirty read) – when a transaction B reads data that has been
updated but not committed by another transaction A; and the transaction B then does
something with the read data; transaction A then rolls back its changes – so B is using an
uncommitted data change. (1 mark)

Diagram – something like this one (which we did in class): (2 – must clearly show the
problem)
Time | Transaction A                       | Transaction B
T1   | read X (result: 1)                  |
T2   | x := X - 1 (result: x = 0)          |
T3   | write X (X in db now has value 0)   |
T4   |                                     | read X (result: 0, because Trans A has now written the new value of X)
T5   | rollback transaction                |

Incorrect summary (phantom rows): occurs when Transaction A is updating data while
Transaction B is reading the data to calculate a summary. (1)
Diagram – something like this one (which we did in class): (2 – must clearly show the
problem)
Time | Transaction A                       | Transaction B
T1   | read X (result: 1)                  |
T2   | x := X - 1 (result: x = 0)          |
T3   | write X (X in db now has value 0)   |
T4   |                                     | read X (result: 0, because Trans A has now written the new value of X)
T5   |                                     | sum = sum + X
T6   |                                     | read Y (result: 3)
T7   |                                     | sum = sum + Y (result: 0 + 3 = 3)
T8   | read Y (result: 3)                  |
T9   | y := Y - 1 (result: y = 2)          |
T10  | write Y (Y in db now has value 2)   |

e. Define what serializable means for transaction execution. [2 marks]


An interleaved execution order for 2 transactions is serializable (1) if the result is the same as
if the 2 transactions were executed one after the other (1).
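
To make the definition concrete, the sketch below replays a schedule of read/add/write steps and compares the final value of X with the two serial orders. The schedule encoding, the increments of 50 and 20, and the function names are assumptions; the test compares final results only, as in the definition above.

```python
DELTA = {"A": 50, "B": 20}  # hypothetical increments applied by each transaction

def run(schedule, x0=1):
    """Execute (transaction, operation) steps; each transaction has a private copy of X."""
    db = {"X": x0}
    local = {}
    for txn, op in schedule:
        if op == "read":
            local[txn] = db["X"]
        elif op == "add":
            local[txn] += DELTA[txn]
        elif op == "write":
            db["X"] = local[txn]
    return db["X"]

def steps(txn):
    return [(txn, "read"), (txn, "add"), (txn, "write")]

serial_ab = steps("A") + steps("B")
serial_ba = steps("B") + steps("A")
# Lost-update interleaving: both read before either writes.
interleaved = [("A", "read"), ("B", "read"), ("A", "add"),
               ("B", "add"), ("A", "write"), ("B", "write")]

def is_serializable(schedule):
    """True if the result matches one of the serial executions."""
    return run(schedule) in {run(serial_ab), run(serial_ba)}

lost_update_result = run(interleaved)
```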

f. What mechanism is used by the concurrency control manager of a DBMS to avoid these
problems? For one of the problems you described in part c, show how this mechanism
would be used to avoid the problem. [3 marks]
Locking (1 mark)

To avoid a problem, show how the first transaction gets a lock on the data it is updating
and how, once it releases the lock, the second transaction can then read the data.
For uncommitted dependency (dirty read), the lock must be held until after the rollback.
(2 marks)
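
As a rough analogy, the Python sketch below uses a thread lock in place of a DBMS lock manager to prevent the lost update. The two threads stand in for transactions and the shared dictionary stands in for the database; this is not how a DBMS implements locking.

```python
import threading

db = {"X": 1}
x_lock = threading.Lock()  # stands in for an exclusive lock on data item X

def update(delta):
    # Acquire the lock before reading and hold it until the write is done,
    # so the other transaction cannot read a stale value of X.
    with x_lock:
        value = db["X"]
        db["X"] = value + delta

threads = [threading.Thread(target=update, args=(d,)) for d in (50, 20)]
for t in threads:
    t.start()
for t in threads:
    t.join()

final_x = db["X"]
```

Whichever thread gets the lock first, both updates are applied, so X ends at 71 rather than 21 or 51.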

3. Indexing Calculations [12 marks total]

Read the information given and carry out the calculations asked for. Make the steps in your
calculations clear and show the formulae you use. This will help you to get marks if you have
the correct formulae but the wrong answer.

Assume Customer data is stored in an ordered sequential file.

There are r = 30000 Customer records and the file is stored on a disk with block size B = 1024
bytes.
The records are ordered by the CustomerID, a field which is 9 bytes long.
CustomerID is a candidate key for the data.
Records have a fixed size of R = 100 bytes (this includes 9 bytes for the CustomerID) and are
stored using an unspanned organization.

A block pointer, P, is 6 bytes long.


A record pointer, PR, is 7 bytes long.

a. Calculate the blocking factor, bfr, and the number of file blocks, b, required for the data
file. [2.5 marks]
NB: formulas must include correct ceiling/floor brackets – half marks if not correct but
the answer shows they did the right thing. 0 if both wrong.

bfr = ⌊B/R⌋ = ⌊1024/100⌋ = 10 (1 for correct formula & using correct numbers; .5 for
correct answer)
b = ⌈r/bfr⌉ = ⌈30000/10⌉ = 3000 (1 for correct formula & correct numbers based on
answer above)
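
The arithmetic can be checked with a few lines of Python; the variable names simply mirror the symbols in the question.

```python
import math

B = 1024   # block size in bytes
R = 100    # record size in bytes
r = 30000  # number of records

bfr = B // R             # floor: whole records per block (unspanned)
b = math.ceil(r / bfr)   # ceiling: blocks needed for all records
```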

Suppose that we want to build a primary index on CustomerID. Assume it is a sparse index
with one entry per block of the data file.

b. Calculate the number of first-level index entries (ri) and the number of first-level index
blocks (bi). [2.5 marks]

Ri = 9 + 6 = 15 (CustomerID plus block pointer) (1 for correct formula & correct
numbers used in the sum)
bfri = ⌊B/Ri⌋ = ⌊1024/15⌋ = 68 (.5 for correct formula & numbers – same as in a)
ri = 3000 (number of blocks in the data file) (.5 for using the right number based on
answers above)
bi = ⌈ri/bfri⌉ = ⌈3000/68⌉ = 45 (.5 for correct formula & numbers – same as in a)

c. Calculate the number of levels required to make it into a multi-level index. For each level,
show the number of blocks in the level. [1.5 marks]

Let b1 = bi = 45 (.5 for showing, based on answers above)
b2 = ⌈b1/bfri⌉ = ⌈45/68⌉ = 1 (.5 for correct formula)
=> need 2 levels (.5 for summing b1 and b2)
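
The primary-index figures from parts b and c can be reproduced with the sketch below. The helper function is hypothetical; it keeps dividing by the index blocking factor until a single block remains.

```python
import math

B = 1024            # block size in bytes
entry_size = 9 + 6  # CustomerID plus block pointer
data_blocks = 3000  # sparse index: one entry per data block

fan_out = B // entry_size                       # index entries per block
first_level = math.ceil(data_blocks / fan_out)  # blocks in level 1

def index_levels(first_level_blocks, fan_out):
    """Return the block count of each level, bottom-up, until one block remains."""
    levels = [first_level_blocks]
    while levels[-1] > 1:
        levels.append(math.ceil(levels[-1] / fan_out))
    return levels

levels = index_levels(first_level, fan_out)
```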

d. Which is faster to search for and retrieve a record from the file, given its CustomerID – to
use the index or to use the data file only? Why? Show the calculations you use to arrive at
your answer. [2.5 marks]

To search the index: one block access per level = 2 + 1 to read the data record = 3. (.5 for
number of levels; .5 for adding 1 for data read)

To search the data file only: ⌈log2 b⌉ = ⌈log2 3000⌉ = 12 block accesses + 1 to read the
data record = 13. (.5 for formula; .5 for using correct value and getting correct answer)
Searching the index is faster because it requires fewer block accesses. (1)
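
A quick check of the two counts in Python. The extra access added in both cases follows the marking scheme above.

```python
import math

index_levels = 2    # from part c
data_blocks = 3000  # from part a

# One block access per index level, plus one to read the data record.
with_index = index_levels + 1
# Binary search over the data blocks, plus one as in the marking scheme.
without_index = math.ceil(math.log2(data_blocks)) + 1
```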

Suppose now that the records are not ordered by the CustomerID and that we want to build a
secondary index on CustomerID.

e. Calculate the number of first-level index entries (ri) and the number of first-level index
blocks (bi) in the secondary index. [2 marks]

Using same formulas here, so do not mark for them again. Marks are for knowing that a
dense index has an entry for every record; that the first level index pointers are record
rather than block pointers and that the index pointers in other levels are block pointers.

ri = 30000 (because it is a dense index) (1)
Ri = 9 + 7 = 16 (CustomerID plus record pointer) (1)
bfri = ⌊B/Ri⌋ = ⌊1024/16⌋ = 64
bi = ⌈ri/bfri⌉ = ⌈30000/64⌉ = 469

f. Calculate the number of levels required to make it into a multi-level index. For each level,
show the number of blocks in the level. [1 mark]

Let b1 = bi = 469
Different blocking factor – because pointers are to blocks rather than records. Same bfr as
for primary index.
bfri = ⌊B/Ri⌋ = ⌊1024/(9+6)⌋ = 68 (.5 for dividing by 15 rather than 16)
b2 = ⌈b1/bfri⌉ = ⌈469/68⌉ = 7
b3 = ⌈b2/bfri⌉ = ⌈7/68⌉ = 1

=> need 3 levels (.5 )
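
The secondary-index figures from parts e and f can be checked the same way. Note the two different entry sizes: record pointers at the first level, block pointers above it.

```python
import math

B = 1024     # block size in bytes
r = 30000    # dense index: one entry per record

first_level_entry = 9 + 7   # CustomerID plus record pointer
upper_level_entry = 9 + 6   # CustomerID plus block pointer

bfr_first = B // first_level_entry   # entries per first-level block
fan_out = B // upper_level_entry     # entries per block in higher levels

levels = [math.ceil(r / bfr_first)]  # first-level blocks
while levels[-1] > 1:
    levels.append(math.ceil(levels[-1] / fan_out))
```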

4. Relational algebra/SQL [10 marks total]

a. You are a database programmer and you have been given the relational database schema
used by a clinic, as shown in the figure below. You are asked to
write queries to produce the following sets of data:

(i) Show all patient visits, including columns for the patient name and father’s name,
visit date, notes and staff name. The list should be ordered by patient name and then
by visit date. [2 marks]

select p.Name, p.Fathersname, v.Date, v.Notes, s.Name
from patients p, visits v, staff s
where p.patientid = v.patientid and v.staffid = s.staffid
order by p.name, v.date

1 for joins (.5 for each join)
.5 for correct columns
.5 for correct use of order by

(ii) Show only staff who have not been involved in any visits. The list should show
StaffID, name and father’s name. [2 marks]

Select s.StaffID, s.name, s.fathersname
from staff s left outer join visits v
on s.staffid = v.staffid
where v.visitID is null

.25 for correct columns


.75 for correct join (.5 for left/right outer join; .25 for on x=y)
1 for use of is null in where (can be for any column in Visits table)
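
The outer-join answer can be tried out with Python's built-in sqlite3 module. The table definitions and sample rows below are hypothetical, since the schema figure is not reproduced here; only the query shape comes from the answer above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Hypothetical cut-down schema and data, for illustration only.
con.executescript("""
    CREATE TABLE staff  (staffid INTEGER PRIMARY KEY, name TEXT, fathersname TEXT);
    CREATE TABLE visits (visitid INTEGER PRIMARY KEY, patientid INTEGER, staffid INTEGER);
    INSERT INTO staff  VALUES (1, 'Sara', 'Abebe'), (2, 'Tesfay', 'Kinfe');
    INSERT INTO visits VALUES (100, 7, 1);
""")

# Staff rows with no matching visit get NULLs in the visits columns.
rows = con.execute("""
    SELECT s.staffid, s.name, s.fathersname
    FROM staff s LEFT OUTER JOIN visits v ON s.staffid = v.staffid
    WHERE v.visitid IS NULL
""").fetchall()
```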

(iii) Get a count of patients who have taken each type of test. [2 marks]

select count(PatientID) 'No. of Patients', TestTypeID


from PatientTest
group by TestTypeID

1 for count(PatientID) (can count any column except TestTypeID)


1 for group by TestTypeID (must group on this column)

(iv) Get a list of patients whose first name begins with the letter T. [1 mark]
Select * from patients where name like 'T%'

.5 for like; .5 for T%

b. Write relational algebra expressions for the following queries (again using the relational
schema shown in the figure below).

(i) Show the name and father’s name for patients who live in kebele 9. [1.5 marks]
π name,fathersname (σ kebele=9 (Patients))

.5 for projection; .5 for selection, .5 for correct predicate in the selection

(ii) Show the PatientID, TestTypeID, test Name and test date for all tests taken by
patients. You should use the minimum number of operators and the result should
not have attributes other than PatientID, TestTypeID, test Name and test Date. [1.5
marks]

π PatientID,TestTypeID,Name,Date (TestTypes nat. join PatientTest)

1 for natural join; .5 for correct projection to get columns

(iii) Show the PatientID for patients who have taken the test that has name attribute
‘HIV’ and who have also taken the test that has name attribute ‘TB’. The result
does not have to show any attributes other than PatientID. [2 marks]

π PatientID (PatientTest nat. join Name='HIV' TestTypes)
∩
π PatientID (PatientTest nat. join Name='TB' TestTypes)

OR
π PatientID (σ Name='HIV' (PatientTest nat. join TestTypes))
∩
π PatientID (σ Name='TB' (PatientTest nat. join TestTypes))

.5 for natural joins (.25 for each)


.5 for predicate Name=’HIV’ and Name=’TB’ (.25 for each one) – can be on the
natural join or as a selection on the join
.5 for the projections (.25 for each)
.5 for the intersection operator
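
The second form of the expression maps directly onto Python set operations. The sample tuples below are made up for illustration; only the attribute names follow the question.

```python
# Hypothetical relation instances.
patient_test = {(1, 10), (1, 20), (2, 10), (3, 20)}  # (PatientID, TestTypeID)
test_types = {(10, "HIV"), (20, "TB")}               # (TestTypeID, Name)

def patients_with(test_name):
    """Project PatientID from a selection on Name over the natural join."""
    return {
        pid
        for pid, tid in patient_test
        for type_id, name in test_types
        if tid == type_id and name == test_name
    }

# Intersection keeps only patients who appear in both results.
both = patients_with("HIV") & patients_with("TB")
```

A single selection with Name='HIV' and Name='TB' would return nothing, because no one joined tuple has both names; that is why the intersection is needed.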

5. Queries [12 marks total]

a. What component of a DBMS deals with running queries? [1 mark]


The query optimiser. (1)

b. What are the steps that are taken to run a query? Explain what happens at each step. [9
marks]
check for syntax & semantic errors (1): check the SQL for errors and if there are errors,
stop processing (1)
query transformation (1): transform to a standard format (usually relational algebra) (1)
access plan evaluation: based on the relational algebra expression, come up with one or
more access plans that list the individual file operations needed (1); evaluate them and
decide on the best one (1) (based on statistics & costs of execution) (1)
execution of the access plan (1): interpret & execute the file operations in the chosen access
plan (1)

c. Name 2 algorithms that the DBMS can use to carry out a join. [2 marks]
Sort-merge join, nested-loop join, hash join (1 each for any two)
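
For reference, a hash join can be sketched in a few lines of Python. This is a simplified in-memory version with hypothetical rows; a DBMS would partition to disk when the build side does not fit in memory.

```python
from collections import defaultdict

def hash_join(left, right, key):
    """Join two lists of dict rows on a shared column name."""
    buckets = defaultdict(list)
    for row in left:                      # build phase: hash the left input
        buckets[row[key]].append(row)
    joined = []
    for row in right:                     # probe phase: look up each right row
        for match in buckets.get(row[key], []):
            joined.append({**match, **row})
    return joined

staff = [{"staffid": 1, "name": "Sara"}, {"staffid": 2, "name": "Tesfay"}]
visits = [{"staffid": 1, "visitid": 100}, {"staffid": 3, "visitid": 101}]
result = hash_join(staff, visits, "staffid")
```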

6. Entity Relationship Modelling [12 marks total]

a. Draw an E-R model based on the following information. Make sure your diagram clearly
shows entities, attributes, identifiers, relationships and cardinalities.
It should also distinguish between identifying relationships and non-identifying
relationships.

The Hiwa dealership in Mekelle sells cars to customers.


When a customer first buys a car from the dealership, his/her details (name, phone number
and PO box number) are recorded. Each customer is also assigned a unique ID number.
Some customers come back again to buy another car.

Each individual car that the dealership has available for sale has a manufacturer (e.g. Toyota
or Nissan), a colour, a model (e.g. Landcruiser, Landcruiser II, Patrol), an engine size and a
price (in birr). Each car is also assigned a unique ID number by the dealership.

The dealership employs sales people; each salesperson is identified by his or her name (no
two of them have the same name). Every car is sold by one sales person. When a customer
buys a car, the purchase is recorded, along with the date of the purchase. The sales person
gets a percentage commission for each sale. The percentage can be different for each sale.

Some cars are sold again – if the customer sells the car back to the dealership. But a car is
never sold again to the same customer. [6 marks]

Diagram should be as shown below. Main thing is to get the 3-way associative entity –
named Sale or Purchase.
1.5 for SalesPerson, Customer and Car entities (.5 each - deduct .25 for any one that is
incomplete)
1 for Sale entity (could also be named Purchase or something similar – must be associative
– deduct .5 if not);
.5 for date attribute in Sale entity
.5 for PercentCommission attribute in Sale entity
1 for all 3 relationships being identifying relationships (deduct .5 if one is not; give 0 if
only 1 or none shown as identifying)
.75 for cardinality/participation being 1/1 at SalesPerson/Customer/Car sides of
relationships (.25 for each one)
.5 for cardinality/participation being 0/1 for SalesPerson/Car in their relationships (.25 for
each one)
.25 for cardinality/participation being 1/M for Customer in buys relationship

[E-R diagram: associative entity Sale (Date, PercentCommission), linked to
SalesPerson (Name) by "sells",
Customer (CustomerID, Name, POBoxNumber, Phone) by "buys", and
Car (CarID, Colour, EngineSize, Manufacturer, Model, Price) by "is_sold".]

b. Convert your E-R diagram to a relational database schema. Draw your database schema,
clearly showing relations, attributes, primary keys, foreign keys and relationships. The
primary keys you choose should be candidate keys (minimal super keys). [3 marks]

1 mark for an overall, completely correct conversion from the student’s E-R diagram.
1 mark for correct PK for the associative Sale entity. Should be combination of
CustomerID and CarID (because a Customer cannot buy the same car again; with Name,
it is not a minimal superkey).
.75 for 3 foreign keys in the Sale entity (Name, CustomerID, CarID)
.25 for correct 1-M relationships (0 if any one of them is wrong)

c. In the relational data model, explain what the following concepts mean and how they are
enforced:
Entity Integrity
Referential integrity
[3 marks]

Entity integrity: means that each relation must have an attribute or combination of
attributes whose values uniquely identify each tuple in the relation. This gives a way to
distinguish different instances of an entity from each other. (1.5)
Referential integrity: is used when an attribute of one relation must have values that exist
in an attribute of another relation. (1.5)
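
In SQL these two rules are usually enforced by primary key and foreign key constraints. The sketch below shows this with Python's sqlite3 module; the cut-down Customer and Sale tables are hypothetical, loosely based on the dealership example in part a.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
con.execute("CREATE TABLE customer (customerid INTEGER PRIMARY KEY, name TEXT)")
con.execute("""
    CREATE TABLE sale (
        customerid INTEGER NOT NULL REFERENCES customer(customerid),
        carid      INTEGER NOT NULL,
        PRIMARY KEY (customerid, carid)
    )
""")
con.execute("INSERT INTO customer VALUES (1, 'Sara')")
con.execute("INSERT INTO sale VALUES (1, 500)")

def rejected(sql):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        con.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# Entity integrity: a second tuple with the same primary key is refused.
duplicate_key = rejected("INSERT INTO sale VALUES (1, 500)")
# Referential integrity: customer 99 does not exist, so the row is refused.
dangling_reference = rejected("INSERT INTO sale VALUES (99, 501)")
```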

END OF EXAM
