A Quick-Start Tutorial on Relational Database Design
Introduction
The relational database model was proposed by Edgar Codd (of IBM Research) around 1969.
It has since become the dominant database model for commercial applications
(in comparison with other database models such as hierarchical, network and
object models). Today, there are many commercial Relational Database
Management Systems (RDBMS), such as Oracle, IBM DB2 and Microsoft SQL
Server. There are also many free and open-source RDBMS, such as MySQL, mSQL
(mini-SQL) and the embedded JavaDB (Apache Derby).
A relational database organizes data in tables (or relations). A table is made up of
rows and columns. A row is also called a record (or tuple). A column is also called
a field (or attribute). A database table is similar to a spreadsheet. However, the
relationships that can be created among the tables enable a relational database
to efficiently store huge amounts of data, and effectively retrieve selected data.
A language called SQL (Structured Query Language) was developed to work with
relational databases.
Database Design Objective
A well-designed database shall:
- Eliminate Data Redundancy: the same piece of data shall not be stored in
  more than one place. This is because duplicate data not only wastes storage
  space but also easily leads to inconsistencies.
- Ensure Data Integrity and Accuracy:
  [TODO] more
Relational Database Design Process
Database design is more art than science, as you have to make many decisions.
Databases are usually customized to suit a particular application. No two
customized applications are alike, and hence, no two databases are alike.
Guidelines (usually in terms of what not to do instead of what to do) are provided
for making these design decisions, but the choices ultimately rest on you - the
designer.
Step 1: Define the Purpose of the Database (Requirement Analysis)
Gather the requirements and define the objective of your database, e.g. ...
Drafting out the sample input forms, queries and reports often helps.
Step 2: Gather Data, Organize in Tables and Specify the Primary Keys
Once you have decided on the purpose of the database, gather the data that
needs to be stored in the database. Divide the data into subject-based tables.
Choose one column (or a few columns) as the so-called primary key, which
uniquely identifies each of the rows.
Primary Key
In the relational model, a table cannot contain duplicate rows, because that
would create ambiguities in retrieval. To ensure uniqueness, each table should
have a column (or a set of columns), called the primary key, that uniquely
identifies every record of the table. For example, a unique number customerID
can be used as the primary key for the Customers table; productCode for the
Products table; isbn for the Books table. A primary key is called a simple key
if it is a single column; it is called a composite key if it is made up of
several columns.
Most RDBMSs build an index on the primary key to facilitate fast search and
retrieval.
The primary key is also what other tables use to reference a table's rows (to
be elaborated later).
You have to decide which column(s) to use as the primary key. The decision
may not be straightforward, but the primary key shall have these properties:
- The values of the primary key shall be unique (i.e., no duplicate values). For
  example, customerName may not be appropriate as the primary key for the
  Customers table, as there could be two customers with the same name.
- The primary key shall always have a value. In other words, it shall not
  contain NULL.
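The two properties above can be tried out directly. The following is a minimal sketch, assuming SQLite through Python's built-in sqlite3 module; the Products rows and the helper try_insert are made up for illustration and are not part of the tutorial's design.

```python
import sqlite3

def try_insert(conn, product_code, name):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO Products (productCode, name) VALUES (?, ?)",
                (product_code, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False

conn = sqlite3.connect(":memory:")
# SQLite needs an explicit NOT NULL on a non-integer primary key;
# standard SQL and most other RDBMSs imply it.
conn.execute(
    "CREATE TABLE Products ("
    " productCode TEXT NOT NULL PRIMARY KEY,"
    " name TEXT)"
)

first = try_insert(conn, "PEN-01", "Ballpoint pen")    # accepted
duplicate = try_insert(conn, "PEN-01", "Gel pen")      # rejected: duplicate key
missing = try_insert(conn, None, "Unlabelled item")    # rejected: NULL key
print(first, duplicate, missing)  # True False False
```

Only the first row ends up in the table; the database itself refuses the other two, so the application does not have to police uniqueness.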
Consider the following in choosing the primary key:
- The primary key shall be simple and familiar,
  e.g., employeeID for the employees table and isbn for the books table.
- The value of the primary key should not change. The primary key is used by
  other tables to reference this one. If you change its value, you have to
  change all its references; otherwise, the references will be lost. For
  example, phoneNumber may not be appropriate as the primary key for
  table Customers, because it might change.
- The primary key often uses an integer (or number) type, but it could also be
  of other types, such as text. However, it is best to use a numeric column as
  the primary key for efficiency.
- The primary key could take an arbitrary number. Most RDBMSs support the
  so-called auto-increment (or AutoNumber type) for an integer primary key, where
  (current maximum value + 1) is assigned to the new record. This arbitrary
  number is fact-less, as it contains no factual information. Unlike factual
  information such as a phone number, a fact-less number is ideal as a primary
  key, as it does not change.
- The primary key is usually a single column (e.g., customerID or productCode),
  but it could also be made up of several columns. You should use as few columns
  as possible.
Let's illustrate with an example: a table customers contains
columns lastName, firstName, phoneNumber, address, city, state, zipCode. The
candidates for the primary key are name=(lastName,
firstName), phoneNumber, Address1=(address, city, state), Address2=(address,
zipCode). Name may not be unique. Phone number and address may change.
Hence, it is better to create a fact-less auto-increment number, say customerID,
as the primary key.
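As a rough sketch of the customerID choice, assuming SQLite via Python's sqlite3 (where the auto-increment feature is spelled INTEGER PRIMARY KEY AUTOINCREMENT; other RDBMSs use different keywords): the helper add_customer and the phone numbers are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " customerID INTEGER PRIMARY KEY AUTOINCREMENT,"  # fact-less key
    " lastName TEXT NOT NULL,"
    " firstName TEXT NOT NULL,"
    " phoneNumber TEXT)"
)

def add_customer(conn, last_name, first_name, phone):
    """Insert a customer and return the key the database assigned."""
    cur = conn.execute(
        "INSERT INTO Customers (lastName, firstName, phoneNumber)"
        " VALUES (?, ?, ?)",
        (last_name, first_name, phone),
    )
    return cur.lastrowid

# Two different people who happen to share a name still get distinct keys.
id_a = add_customer(conn, "Tan", "Ah Teck", "555-0101")
id_b = add_customer(conn, "Tan", "Ah Teck", "555-0202")

# A phone number can change without touching the key or anything that refers to it.
conn.execute(
    "UPDATE Customers SET phoneNumber = ? WHERE customerID = ?",
    ("555-0303", id_a),
)
print(id_a, id_b)  # 1 2
```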
Step 3: Create Relationships among Tables
A database consisting of independent and unrelated tables serves little purpose
(you may consider using a spreadsheet instead). The power of a relational
database lies in the relationships that can be defined between tables. The most
crucial aspect in designing a relational database is to identify the relationships
among tables. The types of relationship include:
1. one-to-many
2. many-to-many
3. one-to-one
One-to-Many
In a "class roster" database, a teacher may teach zero or more classes, while a
class is taught by one (and only one) teacher. In a "company" database, a
manager manages zero or more employees, while an employee is managed by
one (and only one) manager. In a "product sales" database, a customer may
place many orders, while an order is placed by one particular customer. This kind
of relationship is known as one-to-many.
A one-to-many relationship cannot be represented in a single table. For example,
in a "class roster" database, we may begin with a table called Teachers, which
stores information about teachers (such as name, office, phone and email). To
store the classes taught by each teacher, we could create
columns class1, class2, class3, but we immediately face the problem of how many
columns to create. On the other hand, if we begin with a table called Classes,
which stores information about a class
(courseCode, dayOfWeek, timeStart and timeEnd), we could create additional
columns to store information about the (one) teacher (such
as name, office, phone and email). However, since a teacher may teach many
classes, the teacher's data would be duplicated in many rows of table Classes.
To support a one-to-many relationship, we need to design two tables: a
table Classes to store information about the classes, with classID as the primary
key; and a table Teachers to store information about teachers, with teacherID as
the primary key. We can then create the one-to-many relationship by storing the
primary key of the table Teachers (i.e., teacherID) (the "one"-end or the parent
table) in the table Classes (the "many"-end or the child table), as illustrated
below.
The column teacherID in the child table Classes is known as the foreign key. A
foreign key of a child table is a primary key of a parent table, used to reference
the parent table.
Take note that for every value in the parent table, there could be zero, one, or
more rows in the child table. For every value in the child table, there is one and
only one row in the parent table.
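The Teachers/Classes design described above might look as follows. This is a sketch under assumptions: SQLite through Python's sqlite3, only a few of the columns mentioned in the text, placeholder teacher names, and a hypothetical helper classes_per_teacher.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE Teachers (              -- parent table (the "one" end)
    teacherID INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    office    TEXT
);
CREATE TABLE Classes (               -- child table (the "many" end)
    classID    INTEGER PRIMARY KEY,
    courseCode TEXT NOT NULL,
    dayOfWeek  TEXT,
    teacherID  INTEGER NOT NULL REFERENCES Teachers(teacherID)  -- foreign key
);
INSERT INTO Teachers VALUES (1, 'Teacher A', 'Room 101'), (2, 'Teacher B', 'Room 102');
INSERT INTO Classes VALUES (10, 'DB101', 'Mon', 1), (11, 'DB102', 'Wed', 1);
""")

def classes_per_teacher(conn):
    """Count child rows per parent row; a teacher may have zero classes."""
    rows = conn.execute(
        "SELECT t.name, COUNT(c.classID) FROM Teachers t"
        " LEFT JOIN Classes c ON c.teacherID = t.teacherID"
        " GROUP BY t.teacherID ORDER BY t.teacherID"
    )
    return rows.fetchall()

print(classes_per_teacher(conn))  # [('Teacher A', 2), ('Teacher B', 0)]
```

Each teacher's details are stored once, however many classes refer to them, which is the redundancy the single-table attempts could not avoid.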
Many-to-Many
In a "product sales" database, a customer's order may contain one or more
products, and a product can appear in many orders. In a "bookstore" database, a
book is written by one or more authors, while an author may write zero or more
books. This kind of relationship is known as many-to-many.
Let's illustrate with a "product sales" database. We begin with two
tables: Products and Orders. The table Products contains information about the
products (such as name, description and quantityInStock) with productID as its
primary key. The table Orders contains customers' orders
(customerID, dateOrdered, dateRequired and status). Again, we cannot store the
items ordered inside the Orders table, as we do not know how many columns to
reserve for the items. We also cannot store the order information in
the Products table.
To support a many-to-many relationship, we need to create a third table (known as
a junction table), say OrderDetails (or OrderLines), where each row represents an
item of a particular order. For the OrderDetails table, the primary key consists of
two columns, orderID and productID, that together uniquely identify each row. The
columns orderID and productID in the OrderDetails table are used to
reference the Orders and Products tables; hence, they are also the foreign keys in
the OrderDetails table.
The many-to-many relationship is, in fact, implemented as two one-to-many
relationships, with the introduction of the junction table.
1. An order has many items in OrderDetails. An OrderDetails item belongs to
   one particular order.
2. A product may appear in many OrderDetails. Each OrderDetails item
   specifies one product.
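The junction table described above can be sketched like this, assuming SQLite via Python's sqlite3. The quantity column, the sample rows and the two query helpers are illustrative additions, not part of the tutorial's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Products (
    productID INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
CREATE TABLE Orders (
    orderID     INTEGER PRIMARY KEY,
    customerID  INTEGER NOT NULL,
    dateOrdered TEXT
);
-- Junction table: one row per (order, product) pair.
CREATE TABLE OrderDetails (
    orderID   INTEGER NOT NULL REFERENCES Orders(orderID),
    productID INTEGER NOT NULL REFERENCES Products(productID),
    quantity  INTEGER NOT NULL,        -- illustrative extra column
    PRIMARY KEY (orderID, productID)   -- composite primary key
);
INSERT INTO Products VALUES (1, 'Pen'), (2, 'Notebook');
INSERT INTO Orders VALUES (100, 7, '2024-01-05'), (101, 8, '2024-01-06');
INSERT INTO OrderDetails VALUES (100, 1, 3), (100, 2, 1), (101, 1, 10);
""")

def products_in_order(conn, order_id):
    """One order -> many products, through the junction table."""
    rows = conn.execute(
        "SELECT p.name FROM OrderDetails d"
        " JOIN Products p ON p.productID = d.productID"
        " WHERE d.orderID = ? ORDER BY p.productID",
        (order_id,),
    )
    return [name for (name,) in rows]

def orders_containing(conn, product_id):
    """One product -> many orders, through the same table."""
    rows = conn.execute(
        "SELECT orderID FROM OrderDetails WHERE productID = ? ORDER BY orderID",
        (product_id,),
    )
    return [order_id for (order_id,) in rows]

print(products_in_order(conn, 100))  # ['Pen', 'Notebook']
print(orders_containing(conn, 1))    # [100, 101]
```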
One-to-One
In a "product sales" database, a product may have optional supplementary
information such as image, moreDescription and comment. Keeping them inside
the Products table results in many empty spaces (in those records without these
optional data). Furthermore, these large data may degrade the performance of
the database.
Instead, we can create another table
(say ProductDetails, ProductLines or ProductExtras) to store the optional data. A
record will only be created for those products with optional data. The two
tables, Products and ProductDetails, exhibit a one-to-one relationship. That is, for
every row in the parent table, there is at most one row (possibly zero) in the child
table. The same column productID should be used as the primary key for both
tables.
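A sketch of the Products/ProductDetails split, again assuming SQLite via Python's sqlite3; the sample rows and the helpers add_details and products_with_details are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Products (
    productID INTEGER PRIMARY KEY,
    name      TEXT NOT NULL
);
-- Same key in both tables: at most one details row per product.
CREATE TABLE ProductDetails (
    productID       INTEGER PRIMARY KEY REFERENCES Products(productID),
    moreDescription TEXT,
    comment         TEXT
);
INSERT INTO Products VALUES (1, 'Pen'), (2, 'Notebook');
INSERT INTO ProductDetails VALUES (1, 'Refillable, blue ink', NULL);
""")

def add_details(conn, product_id, more_description):
    """Return False if the product already has a details row (or does not exist)."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO ProductDetails (productID, moreDescription)"
                " VALUES (?, ?)",
                (product_id, more_description),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def products_with_details(conn):
    """Every product appears once; details are None where no row exists."""
    rows = conn.execute(
        "SELECT p.name, d.moreDescription FROM Products p"
        " LEFT JOIN ProductDetails d ON d.productID = p.productID"
        " ORDER BY p.productID"
    )
    return rows.fetchall()

second_row_for_pen = add_details(conn, 1, "Another description")  # False
print(products_with_details(conn))
# [('Pen', 'Refillable, blue ink'), ('Notebook', None)]
```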
Some databases limit the number of columns that can be created inside a table.
You could use a one-to-one relationship to split the data into two tables. A
one-to-one relationship is also useful for storing certain sensitive data in a
secure table, while keeping the non-sensitive data in the main table.
Column Data Types
You need to choose an appropriate data type for each column. Common data
types include: integers, floating-point numbers, string (or text), date/time, binary,
collection (such as enumeration and set).
Step 4: Refine & Normalize the Design
For example,
- adding more columns,
- creating a new table for optional data using a one-to-one relationship,
- splitting a large table into two smaller tables,
- others.
Normalization
Apply the so-called normalization rules to check whether your database is
structurally correct and optimal.
First Normal Form (1NF): A table is 1NF if every cell contains a single value,
not a list of values. This property is known as atomic. 1NF also prohibits
repeating groups of columns such as item1, item2,.., itemN. Instead, you should
create another table using a one-to-many relationship.
Second Normal Form (2NF): A table is 2NF if it is 1NF and every non-key
column is fully dependent on the primary key. Furthermore, if the primary key is
made up of several columns, every non-key column shall depend on the entire
set and not part of it.
For example, the primary key of the OrderDetails table
comprises orderID and productID. If unitPrice is dependent only on productID, it
shall not be kept in the OrderDetails table (but in the Products table). On the other
hand, if the unitPrice is dependent on the product as well as the particular order,
then it shall be kept in the OrderDetails table.
Third Normal Form (3NF): A table is 3NF if it is 2NF and the non-key columns
are independent of each other. In other words, the non-key columns are
dependent on the primary key, only on the primary key and nothing else. For
example, suppose that we have a Products table with columns productID (primary
key), name and unitPrice. The column discountRate shall not belong
to the Products table if it is also dependent on the unitPrice, which is not part
of the primary key.
Higher Normal Form: 3NF has its inadequacies, which lead to higher normal
forms, such as Boyce/Codd Normal Form, Fourth Normal Form (4NF) and Fifth
Normal Form (5NF), which are beyond the scope of this tutorial.
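One informal way to look for the partial dependency in the 2NF example is to test sample data: does a column's value follow from only part of the composite key? The helper is_determined_by and the rows below are hypothetical, and sample data can only hint at a dependency, since the real rule comes from the business meaning of the columns.

```python
def is_determined_by(rows, determinant, column):
    """True if, in these rows, equal determinant values always give the same column value."""
    seen = {}
    for row in rows:
        key = tuple(row[name] for name in determinant)
        # setdefault stores the first value seen for this key and returns it afterwards
        if seen.setdefault(key, row[column]) != row[column]:
            return False
    return True

# OrderDetails rows keyed by (orderID, productID); the values are made up.
order_details = [
    {"orderID": 100, "productID": 1, "quantity": 3, "unitPrice": 1.50},
    {"orderID": 100, "productID": 2, "quantity": 1, "unitPrice": 4.00},
    {"orderID": 101, "productID": 1, "quantity": 10, "unitPrice": 1.50},
]

# unitPrice follows productID alone: a partial dependency, so under 2NF it
# would move to the Products table.
price_partial = is_determined_by(order_details, ["productID"], "unitPrice")

# quantity differs for the same product across orders, so it needs the whole key.
quantity_partial = is_determined_by(order_details, ["productID"], "quantity")

print(price_partial, quantity_partial)  # True False
```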
At times, you may decide to break some of the normalization rules, for
performance reasons (e.g., create a column called totalPrice in the Orders table which
can be derived from the OrderDetails records), or because the end-user requested
it. Make sure that you are fully aware of it, develop programming logic to handle
it, and properly document the decision.
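The "programming logic to handle it" could be as simple as always changing the detail rows and the stored total in one transaction. The sketch below assumes SQLite via Python's sqlite3, keeps unitPrice in OrderDetails (the per-order pricing case), and uses hypothetical helpers add_order_line and stored_total; floating-point prices are used only to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Orders (
    orderID    INTEGER PRIMARY KEY,
    totalPrice REAL NOT NULL DEFAULT 0   -- derived value, stored for fast reads
);
CREATE TABLE OrderDetails (
    orderID   INTEGER NOT NULL REFERENCES Orders(orderID),
    productID INTEGER NOT NULL,
    quantity  INTEGER NOT NULL,
    unitPrice REAL NOT NULL,
    PRIMARY KEY (orderID, productID)
);
INSERT INTO Orders (orderID) VALUES (100);
""")

def add_order_line(conn, order_id, product_id, quantity, unit_price):
    """Insert a line and refresh the stored total in the same transaction."""
    with conn:  # both statements commit together or not at all
        conn.execute(
            "INSERT INTO OrderDetails VALUES (?, ?, ?, ?)",
            (order_id, product_id, quantity, unit_price),
        )
        conn.execute(
            "UPDATE Orders SET totalPrice ="
            " (SELECT COALESCE(SUM(quantity * unitPrice), 0)"
            "    FROM OrderDetails WHERE orderID = ?)"
            " WHERE orderID = ?",
            (order_id, order_id),
        )

def stored_total(conn, order_id):
    row = conn.execute(
        "SELECT totalPrice FROM Orders WHERE orderID = ?", (order_id,)
    ).fetchone()
    return row[0]

add_order_line(conn, 100, 1, 3, 1.5)
add_order_line(conn, 100, 2, 1, 4.0)
print(stored_total(conn, 100))  # 8.5
```

Any other code path that edits OrderDetails would need the same treatment, which is exactly why the decision should be documented.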
Integrity Rules
You should also apply the integrity rules to check the integrity of your design:
Entity Integrity Rule: The primary key cannot contain NULL. Otherwise, it
cannot uniquely identify the row. For a composite key made up of several columns,
none of the columns can contain NULL. Most RDBMSs check and enforce this
rule.
Referential Integrity Rule: Each foreign key value must be matched to a
primary key value in the table referenced (or parent table).
- You can insert a row with a foreign key in the child table only if the value
  exists in the parent table.
- If the value of the key changes in the parent table (e.g., the row is updated
  or deleted), all rows with this foreign key in the child table(s) must be
  handled accordingly. You could either (a) disallow the changes; (b) cascade
  the change (or delete the records) in the child tables accordingly; (c) set the
  key value in the child tables to NULL.
Most RDBMSs can be set up to perform the check and ensure the referential
integrity, in the specified manner.
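The options (b) and (c) above map onto the ON UPDATE and ON DELETE clauses of a foreign key. A minimal sketch, assuming SQLite via Python's sqlite3, with placeholder data and hypothetical helpers insert_class and teacher_of; omitting the clauses gives option (a), where the change is refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.executescript("""
CREATE TABLE Teachers (teacherID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Classes (
    classID   INTEGER PRIMARY KEY,
    teacherID INTEGER REFERENCES Teachers(teacherID)
                  ON UPDATE CASCADE     -- (b) follow key changes in the parent
                  ON DELETE SET NULL    -- (c) clear the key when the parent goes
);
INSERT INTO Teachers VALUES (1, 'Teacher A'), (2, 'Teacher B');
INSERT INTO Classes VALUES (10, 1), (11, 2);
""")

def insert_class(conn, class_id, teacher_id):
    """Return False when the foreign key value has no match in Teachers."""
    try:
        with conn:
            conn.execute("INSERT INTO Classes VALUES (?, ?)", (class_id, teacher_id))
        return True
    except sqlite3.IntegrityError:
        return False

def teacher_of(conn, class_id):
    row = conn.execute(
        "SELECT teacherID FROM Classes WHERE classID = ?", (class_id,)
    ).fetchone()
    return row[0]

accepted = insert_class(conn, 12, 99)  # there is no teacher 99

with conn:
    conn.execute("UPDATE Teachers SET teacherID = 5 WHERE teacherID = 1")
    conn.execute("DELETE FROM Teachers WHERE teacherID = 2")

print(accepted, teacher_of(conn, 10), teacher_of(conn, 11))  # False 5 None
```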
Business Logic Integrity: Besides the above two general integrity rules, there
could be integrity (validation) rules pertaining to the business logic, e.g., zip
code shall be 5 digits within a certain range; delivery date and time shall fall
within business hours; quantity ordered shall be equal to or less than quantity in
stock, etc. These could be carried out in validation rules (for the specific
column) or programming logic.
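Both routes mentioned above can be sketched briefly, assuming SQLite via Python's sqlite3: a CHECK constraint acts as the column validation rule for the zip code, while the stock rule is handled in programming logic. The helpers add_customer and order_product, and the sample data, are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (
    customerID INTEGER PRIMARY KEY,
    zipCode    TEXT NOT NULL
        -- validation rule: exactly five characters, none of them a non-digit
        CHECK (length(zipCode) = 5 AND zipCode NOT GLOB '*[^0-9]*')
);
CREATE TABLE Products (
    productID       INTEGER PRIMARY KEY,
    quantityInStock INTEGER NOT NULL CHECK (quantityInStock >= 0)
);
INSERT INTO Products VALUES (1, 5);
""")

def add_customer(conn, zip_code):
    """Return False when the database rejects the zip code."""
    try:
        with conn:
            conn.execute("INSERT INTO Customers (zipCode) VALUES (?)", (zip_code,))
        return True
    except sqlite3.IntegrityError:
        return False

def order_product(conn, product_id, quantity):
    """Programming-logic rule: take stock only if enough is available."""
    with conn:
        cur = conn.execute(
            "UPDATE Products SET quantityInStock = quantityInStock - ?"
            " WHERE productID = ? AND quantityInStock >= ?",
            (quantity, product_id, quantity),
        )
    return cur.rowcount == 1

zip_results = [add_customer(conn, z) for z in ("12345", "1234", "12a45")]
order_results = [order_product(conn, 1, 3), order_product(conn, 1, 3)]
print(zip_results)    # [True, False, False]
print(order_results)  # [True, False]
```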
Column Indexing
You could create an index on selected column(s) to facilitate data searching and
retrieval. An index is a structured file that speeds up data access for SELECT, but
may slow down INSERT, UPDATE, and DELETE. Without an index structure, to
process a SELECT query with a matching criterion (e.g., SELECT * FROM Customers
WHERE name='Tan Ah Teck'), the database engine needs to compare every
record in the table. A specialized index (e.g., in BTREE structure) could reach the
record without comparing every record. However, the index needs to be updated
whenever a record is changed, which results in overhead associated with using
indexes.
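The difference between scanning every record and using an index can be observed by asking the engine for its query plan. This sketch assumes SQLite via Python's sqlite3 and its EXPLAIN QUERY PLAN statement; the generated customer rows, the index name idx_customers_name and the helper query_plan are illustrative, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " customerID INTEGER PRIMARY KEY, name TEXT, phoneNumber TEXT)"
)
conn.executemany(
    "INSERT INTO Customers (name, phoneNumber) VALUES (?, ?)",
    [(f"Customer {i}", f"555-{i:04d}") for i in range(1000)],
)

QUERY = "SELECT * FROM Customers WHERE name = 'Customer 42'"

def query_plan(conn, sql):
    """Return the human-readable plan steps SQLite reports for a statement."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

plan_before = query_plan(conn, QUERY)  # a full table scan is expected here
conn.execute("CREATE INDEX idx_customers_name ON Customers (name)")
plan_after = query_plan(conn, QUERY)   # the plan should now name the index

print(plan_before)
print(plan_after)
```

The query result is the same in both cases; only the amount of work needed to find the matching record changes.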
An index can be defined on a single column, a set of columns (called a concatenated
index), or part of a column (e.g., the first 10 characters of a VARCHAR(100)) (called
a partial index). You could build more than one index on a table. For example, if you
often search for a customer using either customerName or phoneNumber, you
could speed up the search by building an index on column customerName, as well
as phoneNumber. Most RDBMSs build an index on the primary key automatically.

REFERENCES & RESOURCES
"Database design basics (Microsoft Access 2007)", available
at http://office.microsoft.com/en-us/access/HA012242471033.aspx.
Paul Litwin, "Fundamentals of Relational Database Design", available
at http://www.deeptraining.com/litwin/dbdesign/FundamentalsOfRelationalDatabaseDesign.aspx.
Codd E. F., "A Relational Model of Data for Large Shared Data Banks",
Communications of the ACM, vol. 13, issue 6, pp. 377-387, June 1970.
