Abstract: Normalization is a technique for reducing redundancy in a relational database management system. It facilitates correct insertion, deletion, and modification of data in the database. A normalized database does not show anomalies due to future updates. It is very time-consuming to employ an automated technique to do this data analysis. At the same time, the process is tested to be reliable and correct. This paper proposes an efficient computation for database normalization. It uses dependency and directed graphs to generate a 2NF, 3NF, or BCNF database, depending on the requirement. Our proposed algorithm performs normalization in n²m steps, thus performing better than other algorithms.

Keywords: Relational Database, Functional Dependency, Normalization, Primary Key, Candidate Key, Canonical Cover, Functional Dependency Equivalence.

1. INTRODUCTION
Normalization as a method of producing good relational database designs is a well-understood topic in the relational database field [1]. The goal of normalization is to create a set of relational tables with a minimum amount of redundant data that can be consistently and correctly modified. The main goal of any normalization technique is to design a database that avoids redundant information and update anomalies [2]. The process of normalization was first formalized by E. F. Codd. Normalization is often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form. Three normal forms, called first (1NF), second (2NF), and third (3NF) normal forms, were initially proposed. An amendment was later added to the third normal form by R. Boyce and E. F. Codd, called Boyce-Codd Normal Form (BCNF). The trend of defining other normal forms continued up to eighth normal form. In practice, however, databases are normalized up to and including BCNF. Therefore, higher order normalization is not addressed in this paper.

The first normal form states that every attribute value must be atomic, in the sense that it cannot be broken into more than one singleton value. As a result, arrays, structures, and similar data structures are not allowed as attribute values. Each normal form is defined on top of the previous normal form. That is, a table is said to be in 2NF if and only if it is in 1NF and it satisfies further conditions. Except for 1NF, the normal forms of interest here rely on Functional Dependencies (FDs) among the attributes of a relation. Functional dependency is a fundamental notion of the Relational Model [3]. A functional dependency is a constraint between two sets of attributes in a relation of a database. Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R, (written as X → Y) if and only if each X value is associated with at most one Y value. That is, given a tuple and the values of the attributes in X, one can uniquely determine the corresponding value of the Y attribute. It is customary to call X the determinant set and Y the dependent attribute.

Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional dependencies. Among the most important ones are Armstrong's axioms. These axioms are used in database normalization:
Subset Property (Axiom of Reflexivity): If Y is a subset of X, then X → Y.
Augmentation (Axiom of Augmentation): If X → Y, then XZ → YZ.
Transitivity (Axiom of Transitivity): If X → Y and Y → Z, then X → Z.
By repeated application of Armstrong's rules, all functional dependencies can be generated. These functional dependencies provide the basis for database normalization.

Normalization is a major task in the design of relational databases [4]. Mechanization of the normalization process saves a tremendous amount of time and money. Despite its importance, very few algorithms have been developed for use in commercial automatic normalization tools. A mathematical normalization algorithm is implemented in [5]. In [6], students' perceptions of different database normalization approaches, and the effects on their performance, are compared. A graph rewrite rule is then obtained to transfer the data model from one normal form to a higher normal form. In Section 7, we use dependency graph diagrams to represent the functional dependencies of a database, and we generate the dependency matrix and the directed graph dependency matrix. In Section 8 a new algorithm is introduced to produce normal forms of the database. Section 9 is a short conclusion.

1.1 Super Key, Candidate Key, Primary Key
Super Key: A set of attributes in a relation that uniquely determines the values of all attributes is called a super key.
Candidate Key: A candidate key is a column, or set of columns, in a table that can uniquely identify any database record without referring to any other data. Candidate keys are the minimal super keys (in other words, if AB is a candidate key, then neither A nor B alone may be a super key; otherwise that subset would itself be the candidate key). Many efficient algorithms exist for finding the set of candidate keys. This paper focuses on normalization techniques.
Primary Key: Out of the n candidate keys, one is chosen as the primary key [21].
Super Key ⊇ Candidate Key ⊇ Primary Key
1.2 Canonical Cover
For our proposed algorithm, to obtain a more optimized computation for normalization, the given set of functional dependencies should first be converted to a minimal (canonical) cover. The steps for computing a canonical cover are as follows:
1. If the right-hand side of a functional dependency X → Y contains a composite attribute (that is, Y is a composite attribute), then by the decomposition rule Y can be decomposed into single attributes, each with X as its determinant.
2. Iterate over each functional dependency and check whether, without considering that dependency, the attribute closure of its left-hand side gives the same set as when the dependency was considered. If yes, remove that FD since it is redundant; otherwise keep it. Keep iterating until every FD has been covered.
3. By removing an attribute from the left-hand side (from X), check whether it can be recovered using the attribute closure of the remaining attributes. If it can be recovered, remove it; otherwise keep it and try the other possibilities. Keep iterating until every FD has been covered.
4. Go to step 2 and check whether any new transitive dependency has been formed. Keep repeating steps 2 and 3 until the results of the last and second-to-last passes are the same.

Example 1: FDs = {A → BC, B → C, A → B, AB → C}
Step 1: New FD set = {A → B, A → C, B → C, A → B, AB → C}
Step 2: New FD set = {A → B, B → C, AB → C}
Step 3: New FD set = {A → B, B → C, A → C}
Step 4: Again, checking for step 2: New FD set = {A → B, B → C}
Hence our canonical cover is {A → B, B → C}.

1.3 Different Normal Forms
We will be covering 1NF, 2NF, 3NF and BCNF. Decomposition up to 3NF is both lossless and dependency preserving. BCNF decomposition is lossless, but dependency preservation is not guaranteed.

First Normal Form (1NF): According to the integrity constraint, the attributes of a relational database must contain atomic values, that is, values that are indivisible [13]. No composite or multivalued attributes are allowed. If a relation is not in 1NF, decompose it: place the multivalued attribute in one relation together with the primary key of the original relation as a foreign key, and remove the multivalued attribute from the other relation. Since we consider every relational database to be in 1NF, 1NF will not be of much concern in our normalization.
Example: A person can have more than one phone number. Storing them all in a single attribute field is not allowed by the 1NF definition, so the table must be broken into two: one containing the multivalued attribute with the primary key, and the other without the multivalued attribute [14].

Second Normal Form (2NF): For a relation to be in 2NF, it must be in 1NF and its non-trivial FDs must not include any partial dependency. A dependency is partial if a proper part of a candidate key determines a non-prime attribute [20].

Example 2: R = (A, B, C, D), FDs = {AB → C, BC → D, A → C}
The only candidate key for the above FDs is AB, and the relation is in 1NF. However, in A → C, A is part of the candidate key AB and determines a non-prime attribute (an attribute which is not part of any candidate key), so the relation is not in second normal form.

Third Normal Form (3NF): For a relation to be in 3NF, it must be in 2NF and there must not be any transitive dependency [19]. A dependency is said to be transitive if either of the following conditions holds:
I) Part of a candidate key together with a non-prime attribute determines non-prime attributes.
II) Non-prime attribute(s) determine non-prime attribute(s).

Example 3: R = (A, B, C, D), FDs = {AB → C, C → D}
The only candidate key for the above FDs is AB, and the relation is in 2NF. However, in C → D, both C and D are non-prime attributes, so a non-prime attribute determines another non-prime attribute. The relation is therefore not in 3NF.

Boyce-Codd Normal Form (BCNF): For a relation to be in BCNF, it must be in 3NF and there must not be any overlapping candidate key; in other words, for each FD, the determining attribute set must be a super key.

Example 4: R = (A, B, C), FDs = {A → BC, B → A}
The candidate keys for the above FDs are A and B, and both determining attributes are super keys, so the relation is in BCNF.

(Diagram: nested sets, with BCNF innermost, then 3NF, then 2NF, and 1NF outermost.)

1.4 Equivalence of functional dependencies
After a relation is decomposed, it is necessary to check whether the dependencies are preserved; equivalence of FDs is used for this. We will use it in a later section.
Given two sets of FDs, F1 and F2, check whether F1 covers all the FDs of F2 and whether F2 covers all the FDs of F1. If both conditions hold, we can conclude that the two sets are equivalent to each other. Here, covering means the following: F2 covers F1 if, for every FD X → Y in F1, Y is contained in the closure of X computed under F2.
If (F1 ⊐ F2 and F2 ⊐ F1) then F1 = F2 [15].
In that case F1 and F2 are equivalent to each other and dependency is preserved.

Example 5: Let R = (A, B, C, D), FD1 = {A → B, B → C, AB → D} and FD2 = {A → B, B → C, A → C, A → D}.
Step 1: Checking whether all FDs of FD1 are present in FD2.
A → B in set FD1 is present in set FD2. B → C in set FD1 is also present in set FD2. AB → D is present in set FD1 but not directly in FD2, so we check whether it can be derived. For set FD2, (AB)+ = {A, B, C, D}. It means that AB can functionally determine A, B, C and D. So AB → D also holds in set FD2.
As all FDs in set FD1 also hold in set FD2, FD2 ⊐ FD1 is true.
Step 2: A → B in set FD2 is present in set FD1. B → C in set FD2 is also present in set FD1. A → C is present in FD2 but
not directly in FD1, so we check whether it can be derived. For set FD1, (A)+ = {A, B, C, D}. It means that A can functionally determine A, B, C and D. So A → C also holds in set FD1. A → D is present in FD2 but not directly in FD1, so we check whether it can be derived. For set FD1, (A)+ = {A, B, C, D}. It means that A can functionally determine A, B, C and D. So A → D also holds in set FD1.
As all FDs in set FD2 also hold in set FD1, FD1 ⊐ FD2 is true.
As FD2 ⊐ FD1 and FD1 ⊐ FD2 are both true, FD2 = FD1 is true. These two FD sets are equivalent.

1.5 Matrix method for lossless decomposition check
The tabular method is an efficient way to check whether a decomposition is lossless or lossy. Form a matrix of order m × n, where m is the number of decomposed relations and n is the number of attributes of the original relation. Initialize each cell of the matrix with the following rule:
M[α][β] = X, if the attribute of column β belongs to the relation of row α
M[α][β] = Yαβ, otherwise
For each FD, examine the rows that agree on its left-hand-side columns. If the functional dependency condition is violated in a right-hand-side column (the rows hold different values there and one of them is X), change the value to X in that column. If at least one row becomes all X, the decomposition is lossless. Otherwise, iterate until the last and second-to-last passes give the same matrix; if no row has become all X by then, the decomposition is lossy.
Example 6: R = (A, B, C, D), FDs = {AB → CD, D → A}, decomposition D = (AD, BCD).

show these dependencies by using a set of simple symbols. In these graphs, the arrow is the most important symbol used. Besides, in our way of representing the relationship graph, a (dotted) horizontal line separates simple keys (i.e., attributes) from composite keys (i.e., keys composed of more than one attribute). A dependency graph is generated using the following rules.
1. Each attribute of the table is encircled, and all attributes of the table are drawn at the lowest level (i.e., bottom) of the graph.
2. A horizontal line is drawn on top of all attributes.
3. Each composite key (if any) is encircled, and all composite keys are drawn on top of the horizontal line.
4. All functional dependency arrows are drawn.
5. All reflexivity rule dependencies are drawn using dotted arrows (for example AB → A, AB → B).
Consider the functional dependency set of Example 7 for a relation r.
Example 7: FDs = {A → BCD, C → DE, EF → DG, D → G}
(Figure: dependency graph for Example 7; the composite key EF appears above the horizontal line.)
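The five drawing rules can be sketched as a small data structure rather than a picture. The Python sketch below is only an illustration under stated assumptions: the function name `dependency_graph` and the dictionary layout are hypothetical, attributes are single characters, "composite keys" are taken to be the composite determinants (left-hand sides with more than one attribute), and one solid arrow is recorded per dependent attribute.

```python
def dependency_graph(fds):
    """Collect the nodes and arrows described by the drawing rules.

    `fds` is a list of (lhs, rhs) strings, one character per attribute.
    """
    # Rules 1-2: every simple attribute sits below the horizontal line.
    attributes = sorted({a for lhs, rhs in fds for a in lhs + rhs})
    # Rule 3: composite determinants sit above the horizontal line.
    composites = sorted({lhs for lhs, _ in fds if len(lhs) > 1})
    # Rule 4: solid arrows for the functional dependencies themselves.
    solid = [(lhs, a) for lhs, rhs in fds for a in rhs]
    # Rule 5: dotted reflexivity arrows from a composite key to its parts.
    dotted = [(key, a) for key in composites for a in key]
    return {"bottom": attributes, "top": composites,
            "solid": solid, "dotted": dotted}

# FDs = {A -> BCD, C -> DE, EF -> DG, D -> G}
example7 = [("A", "BCD"), ("C", "DE"), ("EF", "DG"), ("D", "G")]
graph = dependency_graph(example7)
print(graph["bottom"])  # ['A', 'B', 'C', 'D', 'E', 'F', 'G']
print(graph["top"])     # ['EF']
print(graph["dotted"])  # [('EF', 'E'), ('EF', 'F')]
```

Keeping solid and dotted arrows in separate lists mirrors the visual distinction in the rules and makes it easy to ignore the reflexivity arrows when only the given FDs are of interest.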
In part (a) of Figure 6, we start with the first row of the DM matrix. The determinant key of this row is AB. A and B are subsets of AB which appear in columns one and two of the matrix. In row one, columns one and two are both nonzero. Therefore, AB depends on AB. Considering the second row, columns one and two are both nonzero, too. Hence, AB depends on BC. However, for the third row, it is not the case that both A and B depend on DE. Therefore, a -1 value is put in the intersection of row DE and column AB in the DG matrix of part (b) of Figure 6.

Figure 6: Dependency Matrix and Directed Graph Matrix

(a) Dependency Matrix (DM)
      A   B   C   D   E
AB    2   2   0   0   1
BC    1   2   2   0   0
DE    1   0   0   2   2

(b) Directed Graph Matrix (DG)
      AB  BC  DE
AB     1  -1  -1
BC     1   1  -1
DE    -1  -1   1

The algorithm for producing the DG graph follows.

Directed-Graph-Matrix ()
{
  for (i=0; i<n; i++)
    for (k = each attribute that composes determinant key i)
      for (j=0; j<n; j++) {
        if (DM[j][k] != 0 && DG[j][i] != -1)
          DG[j][i] = 1;
        else DG[j][i] = -1;
      }
}

Dependency-closure ()
{
  for (i=0; i<n; i++)
    for (j=0; j<n; j++)
      if (i != j && Path[i][j] != -1) {
        for (k=0; k<m; k++)
          if (DM[j][k] != 0 && DM[j][k] != 2)
            DM[i][k] = j;
      }
}
Figure 8: Recognition of dependency closure

Analysis of the above algorithm:
Variable n = number of functional dependencies.
Variable m = number of attributes in the relation.
The outer loop runs over i = 0, 1, 2, ..., n-1. For each i, the loop over j runs n times, and each of those performs m iterations over k in the worst case.
So, (m + m + ... + m) (n times) × n = n²m.
The total number of iterations required in the worst case for the Dependency-closure() procedure is therefore the square of the number of functional dependencies times the number of attributes in the relation.

The DM of Figure 6 is updated as follows to reflect all dependencies, including those that are obtained by the Dependency-closure procedure.

Figure 9: E depends on BC via AB
      A   B   C   D   E
AB    2   2   0   0   1
BC    1   2   2   0   AB
DE    1   0   0   2   2
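The matrix steps on this page can be traced with a short Python sketch. Several things here are assumptions rather than the paper's code: the FD set {AB → E, BC → A, DE → A} is read off the nonzero entries of the three-row DM shown in this section, the helper names are hypothetical, and the path matrix is computed with a Warshall-style transitive closure because the path-finding algorithm itself is not given in this section.

```python
def build_dm(determinants, attributes, fds):
    """DM[d][a] = 2 if a is part of d, 1 if d -> a is given, else 0."""
    dm = {d: {a: 0 for a in attributes} for d in determinants}
    for lhs, rhs in fds:
        for a in rhs:
            dm[lhs][a] = 1
        for a in lhs:
            dm[lhs][a] = 2
    return dm

def build_dg(dm, determinants):
    """DG[j][i] = 1 if every attribute of determinant i is nonzero in row j."""
    return {j: {i: 1 if all(dm[j][a] != 0 for a in i) else -1
                for i in determinants} for j in determinants}

def path_matrix(dg, determinants):
    """Transitive closure of DG (assumed Warshall-style path finding)."""
    path = {j: dict(row) for j, row in dg.items()}
    for k in determinants:
        for i in determinants:
            for j in determinants:
                if path[i][k] == 1 and path[k][j] == 1:
                    path[i][j] = 1
    return path

def dependency_closure(dm, path, determinants, attributes):
    """Mirror of Dependency-closure(): store the intermediate determinant j."""
    for i in determinants:
        for j in determinants:
            if i != j and path[i][j] != -1:
                for a in attributes:
                    if dm[j][a] not in (0, 2):
                        dm[i][a] = j  # a depends on i via j
    return dm

attributes = list("ABCDE")
determinants = ["AB", "BC", "DE"]
fds = [("AB", "E"), ("BC", "A"), ("DE", "A")]  # inferred from the DM above

dm = build_dm(determinants, attributes, fds)
dg = build_dg(dm, determinants)
path = path_matrix(dg, determinants)
dependency_closure(dm, path, determinants, attributes)
print(dg["BC"])       # {'AB': 1, 'BC': 1, 'DE': -1}
print(dm["BC"]["E"])  # AB
```

The three nested loops in `dependency_closure` visit n × n × m cells, which matches the n²m count in the analysis above.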
In Figure 9, E depends on BC via AB. It is possible that E might depend on BC through some other determinant key, too. In that case it will not matter which determinant key is used in Figure 9 to represent this dependency. One issue to be careful of is that, by updating the DM matrix to reflect transitive dependencies, some direct dependencies may fade away.
Consider FDs = {A → B, B → A and B → C}. The DM and DG matrices are shown in Figure 10.

Figure 10: The DM and DG matrices

(a) Dependency Matrix
     A   B   C
A    2   1   0
B    1   2   1

(b) Directed Graph Matrix
     A   B
A    1   1
B    1   1

By applying the path finding algorithm, the updated matrix is shown in part (a) of Figure 11. As can be seen from part (a) of Figure 11, the direct dependency of C on B has faded away. To tackle this deficiency, the following Circular-Dependency algorithm is designed. This algorithm internally uses the FindOne recursive algorithm. The latter will find the direct dependency, if any, and replace the transitive one. This is reflected in part (b) of Figure 11.

Figure 11: The original B → C is returned

(a)
     A   B   C
A    2   1   B
B    1   2   A

(b)
     A   B   C
A    2   1   B
B    1   2   1

In Figure 12, DM2 represents the initial dependency matrix.

Circular-Dependency ()
{
  for (i=0; i<n; i++)
    for (j=0; j<m; j++)
      if (DM[i][j] not in {0, 1, 2})
        if (FindOne(i, j, j, n) && DM2[i][j] == 1)
          DM[i][j] = 1;
}
int FindOne (int i, element j, int k, int n)
{
  if (DM[j][k] == 1 && n >= 1) return 0;
  else if (n < 1) return 1;
  else return FindOne(i, DM[i][k], k, n-1);
}
Figure 12: Replacing transitive dependency with original direct dependency

Analysis of the above algorithm:
Variable n = number of functional dependencies.
Variable m = number of attributes in the relation.
The outer loop runs for n iterations and the middle loop for m iterations; for each of those, the FindOne() procedure performs at most n iterations. So, in the worst case, the total number of iterations is n²m.

Example 8: Consider the following case taken from [8]:
Relation GH {A, B, C, D, E, F, G, H, I, J, K, L} with dependencies: FDs = {A → BC, E → AD, G → AEJK, GH → FI, K → AL, and J → K}.
Figure 13 shows the original Dependency Matrix:

Figure 13: Initial Dependency Matrix
      A   B   C   D   E   F   G   H   I   J   K   L
A     2   1   1   0   0   0   0   0   0   0   0   0
E     1   0   0   1   2   0   0   0   0   0   0   0
G     1   0   0   0   1   0   2   0   0   1   1   0
GH    0   0   0   0   0   1   2   2   1   0   0   0
K     1   0   0   0   0   0   0   0   0   0   2   1
J     0   0   0   0   0   0   0   0   0   2   1   0

Figure 14 is the corresponding DG matrix.

Figure 14: The DG matrix for Example 8
      A   E   G   GH  K   J
A     1  -1  -1  -1  -1  -1
E     1   1  -1  -1  -1  -1
G     1   1   1  -1   1   1
GH   -1  -1   1   1  -1  -1
K     1  -1  -1  -1   1  -1
J    -1  -1  -1  -1   1   1

The path matrix is shown in Figure 15.

Figure 15: Determinant key transitive dependencies
      A   E   G   GH  K   J
A     1  -1  -1  -1  -1  -1
E     1   1  -1  -1  -1  -1
G     1   1   1  -1   1   1
GH    1   1   1   1   1   1
K     1  -1  -1  -1   1  -1
J     1  -1  -1  -1   1   1

New dependencies are applied to the DM, and Figure 16 is the semi-final result.

Figure 16: Dependency closure matrix
      A   B   C   D   E   F   G   H   I   J   K   L
A     2   1   1   0   0   0   0   0   0   0   0   0
E     1   A   A   1   2   0   0   0   0   0   0   0
G     K   E   E   E   1   0   2   0   0   1   J   K
GH    K   G   G   G   G   1   2   2   1   G   J   K
K     1   A   A   0   0   0   0   0   0   0   2   1
J     K   K   K   0   0   0   0   0   0   2   1   0

It is now time to restore direct dependencies which might have disappeared by applying transitive dependencies. However, the FindOne algorithm does not discover any faded-away dependency. Therefore, Figure 16 shows the optimal
dependency set. Entries with value 1 identify the components of this set.
We are now in a position to obtain candidate keys. A candidate key is a set of attributes on which all other attributes depend. From the final DM we notice that GH has this property.
There are other sets of attributes which can be considered as candidate keys. For example, the set {G, F, H, I} could be considered as a candidate key. However, the set with the least number of attributes amongst the determinant keys will be considered the primary key in the following discussions.

3. The Proposed Normalization Process
We have already described the normal forms in the previous section; now it is time to convert the relation to 2NF, 3NF, and BCNF using the information from all the sections above.

3.1 Second Normal Form (2NF)
To proceed with 2NF, it is assumed that the table is already in 1NF. The resulting 1NF relation is:
GH_Relation: {A, B, C, D, E, F, G, H, I, J, K, L}
The decomposition is lossless and dependency preserving, which can be checked using the methods described above.
The goal is to discover all partial dependencies [11], since these must be removed to produce the 2NF form. To do this, the DM is scanned row by row (ignoring the primary key row), starting from the first row. If all values of the simple keys that make up the determinant key of the row being scanned are equal to 2 and the values of the corresponding columns of the candidate key are equal to 2, then a partial dependency is found. In Figure 16, the dependency of G to GH is partial. Therefore, we have to create a new table. From the DM matrix, we notice that E and J are directly dependent on G. The new table will be composed of G, E, J, and all simple keys which are transitively dependent on G. The transitive dependencies are obtained from the determinant key transitive dependencies matrix. G is the primary key of this table. There is no other partial dependency. In Figure 17, the DM matrix is partitioned into two new DMs corresponding to the new tables.

determinant key is encountered whose dependency is neither partial (from Figure 17) nor wholly dependent on part of the primary key [9], a separate table has to be formed. Of course, if a table was previously formed, a duplicate is not generated. This new table will include the determinant key and all other attributes which are transitively dependent on this key. As can be seen, there is no transitive dependency in part (b) of Figure 17. However, the dependencies of A, E, K, and J in part (a) are of transitive form. Each of these dependencies led to the production of a new table.

Figure 18: Database Normalized up to 3NF

      E   G   J
G     1   2   1

      A   B   C
A     2   1   1

      A   K   L
K     1   2   1

      J   K
J     2   1

      A   D   E
E     1   1   2

      F   G   H   I
GH    1   2   2   1

3.3 The BCNF Normal Form
Since BCNF decomposition does not guarantee dependency preservation, it is not a widely used normal form. So we are not much concerned with BCNF conversion.
The resulting BCNF relations for Example 8 are:

GH_Relation: {GH, F, I}, J_Relation: {J, K}, K_Relation: {K, A, L}, G_Relation: {G, E, J}, E_Relation: {E, A, D} and A_Relation: {A, B, C}.

3.4 A Complete Normalization Example
The following is a complete example with multiple candidate keys. Example 9: Consider the following case taken from [9]:
Relation AB: {A, B, C, D, E, F, G, H} with dependencies: FDs = {AB → CEFGH, A → D, F → G, BF → H, BCH → ADEFG and BCF → ADE}

Figure 19 shows the Dependency Matrix:
Figure 19: Initial Dependency Matrix