EFFICIENT ALGORITHM FOR DATABASE NORMALIZATION
Devbrat Anand, Dr. Maroti Deshmukh
Abstract: Normalization is a technique to reduce redundancy in a relational database management system. It facilitates correct insertion, deletion and modification of data in the database. A normalized database does not show anomalies due to future updates. It is very time consuming to employ an automated technique to do this data analysis. At the same time, the process is tested to be reliable and correct. This paper proposes a computationally efficient method for database normalization. It uses dependency and directed graphs to generate a 2NF, 3NF or BCNF database, depending upon the requirement. Our proposed algorithm performs normalization in n²m steps, thus performing better than other algorithms.

Keywords: Relational Database, Functional Dependency, Normalization, Primary Key, Candidate Key, Canonical Cover, Functional Dependency Equivalence.

1. INTRODUCTION
Normalization as a method of producing good relational database designs is a well-understood topic in the relational database field [1]. The goal of normalization is to create a set of relational tables with a minimum amount of redundant data that can be consistently and correctly modified. The main goal of any normalization technique is to design a database that avoids redundant information and update anomalies [2]. The process of normalization was first formalized by E. F. Codd. Normalization is often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form. Three normal forms called first (1NF), second (2NF), and third (3NF) normal forms were initially proposed. An amendment was later added to the third normal form by R. Boyce and E. F. Codd, called Boyce-Codd Normal Form (BCNF). The trend of defining other normal forms continued up to eighth normal form. In practice, however, databases are normalized up to and including BCNF. Therefore, higher order normalization is not addressed in this paper.

The first normal form states that every attribute value must be atomic, in the sense that it cannot be broken into more than one singleton value. As a result, arrays, structures, and similar data structures are not allowed as attribute values. Each normal form is defined on top of the previous normal form. That is, a table is said to be in 2NF if and only if it is in 1NF and it satisfies further conditions. Except for 1NF, the other normal forms of interest rely on Functional Dependencies (FDs) among the attributes of a relation.

Functional dependency is a fundamental notion of the relational model [3]. A functional dependency is a constraint between two sets of attributes in a relation of a database. Given a relation R, a set of attributes X in R is said to functionally determine another attribute Y, also in R (written as X → Y), if and only if each X value is associated with at most one Y value. That is, given a tuple and the values of the attributes in X, one can uniquely determine the corresponding value of the Y attribute. It is customary to call X the determinant set and Y the dependent attribute.

Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of functional dependencies. Among the most important ones are Armstrong's axioms. These axioms are used in database normalization:
Subset Property (Axiom of Reflexivity): If Y is a subset of X, then X → Y.
Augmentation (Axiom of Augmentation): If X → Y, then XZ → YZ.
Transitivity (Axiom of Transitivity): If X → Y and Y → Z, then X → Z.
By repeated application of Armstrong's rules all functional dependencies can be generated. These functional dependencies provide the basis for database normalization.

Normalization is a major task in the design of relational databases [4]. Mechanization of the normalization process saves a tremendous amount of time and money. Despite its importance, very few algorithms have been developed for use in commercial automatic normalization tools. A mathematical normalization algorithm is implemented in [5]. In [6], students' perceptions of different database normalization approaches and the effects on their performance are compared. A graph rewrite rule is then obtained to transfer the data model from one normal form to a higher normal form. In Section 2, we use dependency graph diagrams to represent the functional dependencies of a database and generate the dependency matrix and the directed graph matrix. In Section 3 a new algorithm is introduced to produce normal forms of the database. Section 5 is a short conclusion.

1.1 Super Key, Candidate Key, Primary Key
Super Key: A set of attributes of a relation that uniquely determines the values of all attributes is called a super key.
Candidate key: A candidate key is a column, or set of columns, in a table that can uniquely identify any database record without referring to any other data. Candidate keys are the minimal super keys (in other words, if AB is a candidate key then neither A nor B alone may be a super key; otherwise that subset would be the candidate key). Many efficient algorithms exist to find the set of candidate keys; this paper focuses on normalization techniques.
Primary Key: Out of the n candidate keys, one key is chosen as the primary key [21].
Super Key ⊇ Candidate Key ⊇ Primary Key
1.2. Canonical Cover
To make the computation of our proposed algorithm more efficient, the given set of functional dependencies should first be converted to a minimal (canonical) cover. The steps for finding the canonical cover are as follows:
1. If the right-hand side of a functional dependency X → Y contains a composite attribute (that is, Y is a composite attribute), then by the decomposition rule Y is split into single attributes, each with X as its determinant.
2. For each functional dependency, check whether the attribute closure of its left-hand side is the same without that dependency as with it. If so, the FD is redundant and is removed; otherwise keep it. Keep iterating until every FD has been covered.
3. For each FD, try removing an attribute from the left-hand side (from X) and check whether it can be recovered from the attribute closure of the remaining attributes. If it can be recovered, remove it; otherwise keep it and try the other possibilities. Keep iterating until every FD has been covered.
4. Go to step 2 and check whether any new transitive dependency has been formed. Keep repeating steps 2 and 3 until two successive passes give the same result.

Example 1: FDs = {A → BC, B → C, A → B, AB → C}
Step 1: New FDs set = {A → B, A → C, B → C, A → B, AB → C}
Step 2: New FDs set = {A → B, B → C, AB → C}
Step 3: New FDs set = {A → B, B → C, A → C}
Step 4: Again, checking for step 2: New FDs set = {A → B, B → C}
Hence our canonical cover is: {A → B, B → C}.

1.3 Different Normal Forms
We will be covering 1NF, 2NF, 3NF and BCNF. Decomposition up to 3NF is both lossless and dependency preserving, whereas BCNF gives a lossless decomposition but dependency preservation is not guaranteed.

First Normal Form (1NF): According to the integrity constraint, the attributes of a relational database must contain atomic, that is indivisible, values [13]. No composite or multivalued attributes are allowed. If a relation is not in 1NF, decompose it: the first relation holds the multivalued attribute together with the primary key of the original relation as a foreign key, and the second relation is the original relation with the multivalued attribute removed. Since we consider every relational database to be in 1NF, 1NF is not of much concern in our normalization.
Example: A person can have more than one phone number, which is not allowed in a single attribute field as per the 1NF definition, so the table must be broken into two, one containing the multivalued attribute with the primary key and the other without the multivalued attribute [14].

Second Normal Form (2NF): For a relation to be in 2NF, it should be in 1NF and, for non-trivial FDs, there should not be any partial dependencies. A dependency is partial if part of a candidate key determines a non-prime attribute [20].

Example 2: R = (A, B, C, D), FDs = {AB → C, BC → D, A → C}
The only candidate key for the above FDs is AB, and the relation is in 1NF. But in A → C, A is part of the candidate key AB and determines a non-prime attribute (an attribute which is not part of any candidate key), so the relation is not in second normal form.

Third Normal Form (3NF): For a relation to be in 3NF, it should be in 2NF and there should not be any transitive dependency [19]. A dependency is said to be transitive if any one of the following conditions is satisfied:
I) Part of a candidate key together with a non-prime attribute determines non-prime attributes.
II) Non-prime attribute(s) determine non-prime attribute(s).

Example 3: R = (A, B, C, D), FDs = {AB → C, C → D}
The only candidate key for the above FDs is AB, and the relation is in 2NF. But in C → D, C and D are both non-prime attributes, so a non-prime attribute determines another non-prime attribute. So the relation is not in 3NF.

Boyce-Codd Normal Form (BCNF): For a relation to be in BCNF, it should be in 3NF and there should not be any overlapping candidate keys; in other words, for each FD the determining attribute set should be a super key.

Example 4: R = (A, B, C), FDs = {A → BC, B → A}
The candidate keys for the above FDs are A and B, and both determining attributes are super keys, so the relation is in BCNF.

(Diagram: nested normal forms, with BCNF innermost, then 3NF, then 2NF, then 1NF outermost.)

1.4 Equivalence of functional dependencies
After decomposition of a relation, it is required to check whether dependencies are preserved or not; equivalence of FDs is used for this. We will use it in a later section.
Given two sets of FDs, F1 and F2, check whether F1 covers all the FDs of F2 and whether F2 covers all the FDs of F1. If both conditions hold, the two FD sets are equivalent. Here F2 covers F1 if, for every FD in F1, the closure of its left-hand side computed under F2 contains its right-hand side.
If (F1 ⊐ F2 and F2 ⊐ F1) then F1 = F2 [15].
F1 and F2 are then equivalent to each other and dependency is preserved.
Example 5: Let R = (A, B, C, D), FD1 = {A → B, B → C, AB → D} and FD2 = {A → B, B → C, A → C, A → D}
Step 1: Checking whether all FDs of FD1 hold in FD2.
A → B in set FD1 is present in set FD2. B → C in set FD1 is also present in set FD2. AB → D is present in set FD1 but not directly in FD2, so we check whether it can be derived. For set FD2, (AB)+ = {A, B, C, D}. It means that AB can functionally determine A, B, C and D. So, AB → D also holds in set FD2.
As all FDs in set FD1 also hold in set FD2, FD2 ⊐ FD1 is true.
Step 2: A → B in set FD2 is present in set FD1. B → C in set FD2 is also present in set FD1. A → C is present in FD2 but
not directly in FD1, so we check whether it can be derived. For set FD1, (A)+ = {A, B, C, D}. It means that A can functionally determine A, B, C and D. So A → C also holds in set FD1. A → D is present in FD2 but not directly in FD1, so we check whether it can be derived. For set FD1, (A)+ = {A, B, C, D}. It means that A can functionally determine A, B, C and D. So A → D also holds in set FD1.
As all FDs in set FD2 also hold in set FD1, FD1 ⊐ FD2 is true.
As FD2 ⊐ FD1 and FD1 ⊐ FD2 are both true, FD2 = FD1 is true. These two FD sets are equivalent.

1.5 Matrix method for lossless decomposition check
The tabular method is an efficient way to check whether a decomposition is lossless or lossy. Form a matrix of order m × n, where m is the number of decomposed relations and n is the number of attributes of the original relation. Initialize each cell of the matrix with the following rule:
M[α][β] = X, if the attribute of column β belongs to the relation of row α
M[α][β] = Yαβ, otherwise
For each FD, check the corresponding columns: if the rows that agree on the left-hand side of the FD violate it by holding different values in a right-hand-side column, change the value in that column to X. If at least one row becomes all X, the decomposition is lossless. Otherwise iterate until two successive passes give the same matrix; if no row has become all X by then, the decomposition is lossy.
Example 6: R = (A, B, C, D), FDs = {AB → CD, D → A}, and the decomposition is D = (AD, BCD).

Figure 1: Initial Decomposition
        A     B     C     D
AD      X     Y12   Y13   X
BCD     Y21   X     X     X

Check the FDs one by one and change the cells according to the aforementioned rule.

Figure 2: Final Decomposition
        A     B     C     D
AD      X     Y12   Y13   X
BCD     X     X     X     X

Since the last row becomes all X, the given decomposition is lossless.

PROPOSED TECHNIQUE:

2. REPRESENTING DEPENDENCIES
We are now in a position to propose our technique, since everything required for normalization has been defined and illustrated with examples.
We will use three structures, Dependency Graph (DG), Dependency Matrix (DM), and Directed Graph Matrix (DG), to represent and manipulate dependencies amongst the attributes of a relation.

2.1. Dependency Graph Diagram
With functional dependencies we can monitor all relations between different attributes of a table. We can graphically show these dependencies by using a set of simple symbols. In these graphs, the arrow is the most important symbol used. Besides, in our way of representing the relationship graph, a (dotted) horizontal line separates simple keys (i.e., attributes) from composite keys (i.e., keys composed of more than one attribute). A dependency graph is generated using the following rules.
1. Each attribute of the table is encircled and all attributes of the table are drawn at the lowest level (i.e., bottom) of the graph.
2. A horizontal line is drawn on top of all attributes.
3. Each composite key (if any) is encircled and all composite keys are drawn on top of the horizontal line.
4. All functional dependency arrows are drawn.
5. All reflexivity rule dependencies are drawn using dotted arrows (for example AB → A, AB → B).
Consider the functional dependency set of Example 7 for a relation r.
Example 7: FDs = {A → BCD, C → DE, EF → DG, D → G}

[Figure: dependency graph with the attributes A, B, C, D, E, F, G at the bottom and the composite key EF above the horizontal line]
Figure 3: Graphical representation of dependencies

If we are able to obtain all dependencies between determinant keys, we can produce all dependencies between all attributes of a relation. These dependencies are represented by using a Dependency Matrix (DM). Using path finding algorithms and Armstrong's transitivity rule, new dependencies are discovered from the existing dependency set. This is the basis of the normalization algorithm discussed here.

2.2. Dependency Matrix
From a dependency graph, the corresponding Dependency Matrix (DM) is generated as follows:
I). Define matrix DM[n][m], where
n = number of Determinant Keys.
m = number of Simple Keys.
II). Suppose that β ⊆ α, γ ⊄ α, and β, γ ∈ {Simple key set}, α ∈ {Determinant key set}.
III). Establish DM elements as follows:
If α → β then DM[α][β] = 2
If α → γ then DM[α][γ] = 1
Otherwise, DM[α][γ] = 0
The DM for Example 7 is shown in Figure 4.

Figure 4: Initial dependency matrix
      A  B  C  D  E  F  G
A     2  1  1  1  0  0  0
C     0  0  2  1  1  0  0
D     0  0  0  2  0  0  1
EF    0  0  0  1  2  2  1
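
The construction rules of the Dependency Matrix can be sketched in a few lines of Python. This is an illustrative sketch only: the function name dependency_matrix and the encoding of FDs as (determinant, dependents) string pairs are assumptions made here. It fills one row per determinant key with 2 for the key's own attributes, 1 for attributes it directly determines, and 0 otherwise, using the FDs of Example 7.

```python
def dependency_matrix(fds, attributes):
    """Build the DM as {determinant key: {attribute: 0 | 1 | 2}}.

    `fds` is a list of (lhs, rhs) pairs of single-letter attribute strings
    (an assumed encoding). Rows are created in order of first appearance.
    """
    rows = {}
    for lhs, rhs in fds:
        row = rows.setdefault(lhs, {a: 0 for a in attributes})
        for a in lhs:
            row[a] = 2          # attribute is part of the determinant key
        for a in rhs:
            if row[a] != 2:
                row[a] = 1      # attribute directly depends on the key
    return rows


attributes = "ABCDEFG"
fds = [("A", "BCD"), ("C", "DE"), ("EF", "DG"), ("D", "G")]
dm = dependency_matrix(fds, attributes)

# Print the rows in the same order as the initial dependency matrix figure.
for key in ["A", "C", "D", "EF"]:
    print(f"{key:<3}", *[dm[key][a] for a in attributes])
```

Storing each row as a dictionary keyed by attribute name keeps the sketch readable; an implementation concerned with speed would more likely use a two-dimensional integer array indexed as in DM[n][m].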
2.3. Directed Graph Matrix
For further optimization, the canonical cover of the FDs should be found as described above, which reduces the number of FDs.
The Directed Graph (DG) matrix for determinant keys is used to represent all possible direct dependencies between determinant keys. The DG is an n × n matrix where n is the number of determinant keys. The process of determining the elements of this matrix follows. The elements of the DG matrix are initially set to zeros. Starting from the first row of the dependency matrix DM, this matrix is investigated in a row major approach. Suppose we are investigating the row corresponding to determinant key x. If all simple keys that x is composed of depend on a determinant key other than x, then x also depends on that determinant key (Armstrong's augmentation rule). The dependency of a simple key on a determinant key is represented by a non-zero entry in the DM matrix. For example, suppose that FDs = {AB → E, BC → A, DE → A}. The corresponding dependency matrix and the initial directed graph matrix are shown in Figure 5.

Figure 5: Initializing DM and DG Matrices
      A  B  C  D  E            AB  BC  DE
AB    2  2  0  0  1       AB    0   0   0
BC    1  2  2  0  0       BC    0   0   0
DE    1  0  0  2  2       DE    0   0   0
(a): Dependency Matrix    (b): Directed Graph Matrix

In part (a) of Figure 6, we start with the first row of the DM matrix. The determinant key of this row is AB. A and B are subsets of AB which appear in columns one and two of the matrix. In row one, columns one and two are both nonzero. Therefore, AB depends on AB. Considering the second row, columns one and two are both nonzero, too. Hence, AB depends on BC. However, for the third row, it is not the case that both A and B depend on DE. Therefore, a -1 value is put in the intersection of row DE and column AB in the DG matrix of part (b) of Figure 6.

Figure 6: Dependency Matrix and Directed Graph Matrix
      A  B  C  D  E            AB  BC  DE
AB    2  2  0  0  1       AB    1  -1  -1
BC    1  2  2  0  0       BC    1   1  -1
DE    1  0  0  2  2       DE   -1  -1   1

The algorithm for producing the DG matrix follows.

Directed-Graph-Matrix ()
{
  for (i = 0; i < n; i++)                  /* determinant key i: column of DG */
    for (k = each attribute that composes determinant key i)
      for (j = 0; j < n; j++) {            /* determinant key j: row of DG */
        if (DM[j][k] != 0 && DG[j][i] != -1)
          DG[j][i] = 1;                    /* so far, all attributes of i depend on j */
        else
          DG[j][i] = -1;                   /* some attribute of i does not depend on j */
      }
}

Analysis of above algorithm:
Variable n = number of functional dependencies
Variable m = number of attributes in the relation.
For i = 0, 1, 2, 3, 4, ..., n
For each i, there would be m iterations in the worst case.
So, (m + m + m + ... + m) (n times) × n = n²m.

After generating the DG matrix we turn our attention towards finding all possible paths between all pairs of determinant keys. The resulting matrix shows all transitive dependencies between determinant keys. Any all-pairs reachability algorithm, such as Warshall's algorithm, can be used for this. If there is a path from node x to node y, it means y transitively depends on x. As an example, Figure 7 shows the complete determinant key transitive dependencies corresponding to the DM of Figure 5.

Figure 7: Determinant key transitive dependencies
      AB  BC  DE
AB     1  -1  -1
BC     1   1  -1
DE    -1  -1   1

From Figure 7 we can deduce that AB depends on BC. On the other hand, E depends on AB. Therefore, E depends on BC. That is, BC → AB, AB → E => BC → E. These dependencies are recognized through the dependency closure procedure presented in Figure 8.

Dependency-closure ()
{
  for (i = 0; i < n; i++)
    for (j = 0; j < n; j++)
      if (i != j && Path[i][j] != -1) {    /* key j depends on key i */
        for (k = 0; k < m; k++)
          if (DM[j][k] != 0 && DM[j][k] != 2)
            DM[i][k] = j;                  /* attribute k depends on key i via key j */
      }
}
Figure 8: Recognition of dependency closure

Analysis of above algorithm:
Variable n = number of functional dependencies
Variable m = number of attributes in the relation.
For i = 0, 1, 2, 3, 4, ..., n
For each i, there would be m iterations in the worst case.
So, (m + m + m + ... + m) (n times) × n = n²m.
So the total number of iterations required in the worst case for the Dependency-closure() procedure is the square of the number of functional dependencies times the number of attributes in the relation.
The DM of Figure 5 is updated as follows to reflect all dependencies, including those that are obtained by the Dependency-closure procedure.

Figure 9: E depends on BC via AB
      A  B  C  D  E
AB    2  2  0  0  1
BC    1  2  2  0  AB
DE    1  0  0  2  2
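
The three steps of this section (DG matrix, path matrix, dependency closure) can be traced on the FDs {AB → E, BC → A, DE → A} with the following Python sketch. It is a sketch under stated assumptions rather than the authors' implementation: the function names are illustrative, Warshall's algorithm is used for the path matrix, and, unlike the pseudocode above, the closure step works on a copy and only fills cells that are still 0, so direct dependencies are never overwritten.

```python
def directed_graph_matrix(dm, keys, attributes):
    """dg[j][i] = 1 if every attribute of key i is non-zero in DM row j, else -1."""
    col = {a: c for c, a in enumerate(attributes)}
    n = len(keys)
    dg = [[-1] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            if all(dm[j][col[a]] != 0 for a in keys[i]):
                dg[j][i] = 1
    return dg


def warshall(dg):
    """Transitive closure of the DG matrix (1 = reachable, -1 = not reachable)."""
    n = len(dg)
    path = [row[:] for row in dg]
    for k in range(n):
        for a in range(n):
            for b in range(n):
                if path[a][k] == 1 and path[k][b] == 1:
                    path[a][b] = 1
    return path


def dependency_closure(dm, path, keys):
    """Record attributes reached through another determinant key.

    Assumption (differs from the pseudocode): only cells that are still 0
    are filled, and the original DM is left untouched.
    """
    n, m = len(dm), len(dm[0])
    out = [row[:] for row in dm]
    for i in range(n):
        for j in range(n):
            if i != j and path[i][j] != -1:      # key j depends on key i
                for k in range(m):
                    if dm[j][k] == 1 and out[i][k] == 0:
                        out[i][k] = keys[j]      # attribute k reached via key j
    return out


attributes = "ABCDE"
keys = ["AB", "BC", "DE"]
dm = [[2, 2, 0, 0, 1],
      [1, 2, 2, 0, 0],
      [1, 0, 0, 2, 2]]

dg = directed_graph_matrix(dm, keys, attributes)
path = warshall(dg)
closed = dependency_closure(dm, path, keys)
print(dg)         # [[1, -1, -1], [1, 1, -1], [-1, -1, 1]]
print(closed[1])  # [1, 2, 2, 0, 'AB']
```

On this input the path matrix equals the DG matrix, and the only new entry is E in row BC, reached via AB.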
In Figure 9, E depends on BC via AB. It is possible that E might depend on BC through some other determinant key, too. In that case it does not matter which determinant key is used in Figure 9 to represent this dependency. One issue to be careful of is that by updating the DM matrix to reflect transitive dependencies, some direct dependencies may fade away.
Consider FDs = {A → B, B → A and B → C}. The DM and DG matrices are shown in Figure 10.

Figure 10: The DM and DG matrices
     A  B  C          A  B
A    2  1  0      A   1  1
B    1  2  1      B   1  1
(a): Dependency Matrix    (b): Directed Graph Matrix

By applying the path finding algorithm, the updated matrix is shown in part (a) of Figure 11. As can be seen from part (a) of Figure 11, the direct dependency of C on B has faded away. To tackle this deficiency, the following Circular-Dependency algorithm is designed. This algorithm internally uses the FindOne recursive algorithm. The latter finds the direct dependency, if any, and uses it to replace the transitive one. This is reflected in part (b) of Figure 11.

Figure 11: The original B → C is returned
     A  B  C          A  B  C
A    2  1  B      A   2  1  B
B    1  2  A      B   1  2  1
(a)               (b)

In Figure 12, DM2 represents the initial dependency matrix.

Circular-Dependency ()
{
  for (i = 0; i < n; i++)
    for (j = 0; j < m; j++)
      if (DM[i][j] != {0, 1, 2})           /* entry is a transitive (via-key) dependency */
        if (FindOne (i, j, j, n) && DM2[i][j] == 1)
          DM[i][j] = 1;                    /* restore the original direct dependency */
}
int FindOne (int i, element j, int k, int n)
{
  if (DM[j][k] == 1 && n >= 1) return 0;
  else if (n < 1) return 1;
  else return FindOne (i, DM[i][k], k, n-1);
}
Figure 12: Replacing transitive dependency with original direct dependency

Analysis of above algorithm:
Variable n = number of functional dependencies
Variable m = number of attributes in the relation.
The outer loop runs for n iterations, the middle loop for m iterations, and for each of these the FindOne() procedure performs at most n iterations. So, in the worst case the total number of iterations is n²m.

Example 8: Consider the following case taken from [8]:
Relation GH {A, B, C, D, E, F, G, H, I, J, K, L} with dependencies: FDs = {A → BC, E → AD, G → AEJK, GH → FI, K → AL, and J → K}.
Figure 13 shows the original Dependency Matrix:

Figure 13: Initial Dependency Matrix
      A  B  C  D  E  F  G  H  I  J  K  L
A     2  1  1  0  0  0  0  0  0  0  0  0
E     1  0  0  1  2  0  0  0  0  0  0  0
G     1  0  0  0  1  0  2  0  0  1  1  0
GH    0  0  0  0  0  1  2  2  1  0  0  0
K     1  0  0  0  0  0  0  0  0  0  2  1
J     0  0  0  0  0  0  0  0  0  2  1  0

Figure 14 is the corresponding DG matrix.

Figure 14: The DG matrix for Example 8
       A   E   G  GH   K   J
A      1  -1  -1  -1  -1  -1
E      1   1  -1  -1  -1  -1
G      1   1   1  -1   1   1
GH    -1  -1   1   1  -1  -1
K      1  -1  -1  -1   1  -1
J     -1  -1  -1  -1   1   1

The path matrix is shown in Figure 15.

Figure 15: Determinant key transitive dependencies
       A   E   G  GH   K   J
A      1  -1  -1  -1  -1  -1
E      1   1  -1  -1  -1  -1
G      1   1   1  -1   1   1
GH     1   1   1   1   1   1
K      1  -1  -1  -1   1  -1
J      1  -1  -1  -1   1   1

New dependencies are applied to the DM and Figure 16 is the semi-final result.

Figure 16: Dependency closure matrix
      A  B  C  D  E  F  G  H  I  J  K  L
A     2  1  1  0  0  0  0  0  0  0  0  0
E     1  A  A  1  2  0  0  0  0  0  0  0
G     K  E  E  E  1  0  2  0  0  1  J  K
GH    K  G  G  G  G  1  2  2  1  G  J  K
K     1  A  A  0  0  0  0  0  0  0  2  1
J     K  K  K  0  0  0  0  0  0  2  1  0

It is now time to restore direct dependencies which might have disappeared by applying transitive dependencies. However, the FindOne algorithm does not discover any faded dependency. Therefore, Figure 16 shows the optimal
dependency set. Entries with value 1 identify the components of this set.
We are now in a position to obtain candidate keys. A candidate key is a set of attributes on which all other attributes depend. From the final DM we notice that GH has this property.
There are other sets of attributes which can be considered as candidate keys. For example, the set {G, F, H, I} could be considered as a candidate key. However, the set with the least number of attributes amongst the determinant keys will be considered the primary key in the following discussions.

3. The Proposed Normalization Process
The normal forms have been described in the previous section; it is now time to convert the relation to 2NF, 3NF, and BCNF using the material of the sections above.

3.1 Second Normal Form (2NF)
To proceed with 2NF, it is assumed that the table is already in 1NF. The resulting 1NF relation is:
GH_Relation: {A, B, C, D, E, F, G, H, I, J, K, L}
The decomposition is lossless and dependency preserving, which can be checked using the methods described above.
The goal is to discover all partial dependencies [11]. To produce the 2NF form, we should find all partial dependencies. To do this, the DM is scanned row by row (ignoring the primary key row), starting from the first row. If all values of the simple keys that make up the determinant key of the row being scanned are equal to 2 and the values of the corresponding columns of the candidate key are equal to 2, then a partial dependency is found. In Figure 16, the dependency of G on GH is partial. Therefore, we have to create a new table. From the DM matrix, we notice that E and J are directly dependent on G. The new table will be composed of G, E, J, and all simple keys which are transitively dependent on G. The transitive dependencies are obtained from the determinant key transitive dependencies matrix. G is the primary key of this table. There is no other partial dependency. In Figure 17, the DM matrix is partitioned into two new DMs corresponding to the new tables.

Figure 17: Database normalized up to 2NF
      A  B  C  D  E  G  J  K  L
A     2  1  1  0  0  0  0  0  0
E     1  A  A  1  2  0  0  0  0
G     K  E  E  E  1  2  1  J  K
K     1  A  A  0  0  0  0  2  1
J     K  K  K  0  0  0  2  1  0
(a): G_Relation: {G, E, J, K, A, B, C, D, L}

      F  G  H  I
GH    1  2  2  1
(b): GH_Relation: {GH, F, I}

3.2 Third Normal Form (3NF)
In order to transform the relations into 3NF, each DM is scanned row by row starting from the first row. The decomposition is lossless and dependency preserving, which can be checked using the methods described above. If a determinant key is encountered whose dependency is neither partial (from Figure 17) nor wholly dependent on part of the primary key [9], a separate table has to be formed. Of course, if such a table has already been formed, a duplicate is not generated. This new table will include the determinant key and all other attributes which are transitively dependent on this key.
As can be seen, there is no transitive dependency in part (b) of Figure 17. However, the dependencies of A, E, K, and J in part (a) are of transitive form. Each of these dependencies leads to the production of a new table.

Figure 18: Database Normalized up to 3NF
     E  G  J          A  B  C          A  K  L
G    1  2  1      A   2  1  1      K   1  2  1

     J  K             A  D  E           F  G  H  I
J    2  1         E   1  1  2      GH   1  2  2  1

3.3 The BCNF Normal Form
Since BCNF decomposition does not guarantee dependency preservation, it is not a widely used normal form. So, we are not much concerned with BCNF conversion.
The resulting BCNF relations for Example 8 are:
GH_Relation: {GH, F, I}, J_Relation: {J, K}, K_Relation: {K, A, L}, G_Relation: {G, E, J}, E_Relation: {E, A, D} and A_Relation: {A, B, C}.

3.4 A Complete Normalization Example
The major goal of relational database normalization is to maintain atomicity, remove data redundancy, remove data inconsistency from the database table and ensure that data dependence among attributes makes sense [10].
The following is a complete example with multiple candidate keys.
Example 9: Consider the following case taken from [9]:
Relation AB: {A, B, C, D, E, F, G, H} with dependencies: FDs = {AB → CEFGH, A → D, F → G, BF → H, BCH → ADEFG and BCF → ADE}

Figure 19 shows the Dependency Matrix:

Figure 19: Initial Dependency Matrix
       A  B  C  D  E  F  G  H
AB     2  2  1  0  1  1  1  1
A      2  0  0  1  0  0  0  0
F      0  0  0  0  0  2  1  0
BF     0  2  0  0  0  2  0  1
BCH    1  2  2  1  1  1  1  2
BCF    1  2  2  1  1  2  0  0

Figure 20 is the corresponding DG matrix.
Figure 20: The DG matrix for Example 9
       AB   A   F  BF  BCH  BCF
AB      1   1   1   1    1    1
A      -1   1  -1  -1   -1   -1
F      -1  -1   1  -1   -1   -1
BF     -1  -1   1   1   -1   -1
BCH     1   1   1   1    1    1
BCF     1   1   1   1   -1    1

The path matrix is shown in Figure 21.

Figure 21: Determinant key transitive dependencies
       AB   A   F  BF  BCH  BCF
AB      1   1   1   1    1    1
A      -1   1  -1  -1   -1   -1
F      -1  -1   1  -1   -1   -1
BF     -1  -1   1   1   -1   -1
BCH     1   1   1   1    1    1
BCF     1   1   1   1    1    1

New dependencies are applied to the DM and Figure 22 is the semi-final result.

Figure 22: Dependency closure matrix
       A  B  C  D   E    F    G   H
AB     2  2  1  A   BCH  BCH  F   BF
A      2  0  0  1   0    0    0   0
F      0  0  0  0   0    2    1   0
BF     0  2  0  0   0    2    0   1
BCH    1  2  2  AB  AB   AB   AB  2
BCF    1  2  2  AB  AB   2    AB  AB

It is now time to restore direct dependencies which might have been replaced by transitive dependencies. The FindOne algorithm discovers all faded dependencies. Such dependencies are shown in Figure 23 in the intersections of AB and F and of AB and E.

Figure 23: After Circular Dependency
       A  B  C  D   E   F   G   H
AB     2  2  1  A   1   1   F   BF
A      2  0  0  1   0   0   0   0
F      0  0  0  0   0   2   1   0
BF     0  2  0  0   0   2   0   1
BCH    1  2  2  AB  AB  AB  AB  2
BCF    1  2  2  AB  AB  2   AB  AB

3.4.1 Candidate Keys of Example 9
A candidate key is a set of attributes on which all other attributes completely depend. AB is a candidate key because all simple keys depend on AB either directly or via a determinant key which is not qualified to be a candidate key. For the other potential candidate keys BCH and BCF there are some dependencies of simple keys which are via a potential candidate key. For example, C depends on BCH via AB, which is a potential candidate key. These kinds of dependencies have to be reexamined to make sure that the dependency persists if the dependency through the potential candidate key is ignored. This can be done by applying the Dependency-closure routine on the initial DM of the relation. However, a modification has to be made to the Dependency-closure routine used here. We would like to ignore dependencies of a potential candidate key on another potential candidate key. To do so, the statement
if (i != j && Path[i][j] != -1)
is replaced by
if (i != j && Path[i][j] != -1 && j ∉ {Potential candidate key set})
The result is depicted in Figure 24. This figure shows the set of optimal dependencies and the real candidate keys.

Figure 24: The set of optimal dependencies
       A  B  C  D  E  F  G  H
AB     2  2  1  A  1  1  F  BF
A      2  0  0  1  0  0  0  0
F      0  0  0  0  0  2  1  0
BF     0  2  0  0  0  2  F  1
BCH    1  2  2  A  1  1  F  2
BCF    1  2  2  A  1  2  F  BF

In the following we will act on 2NF and 3NF. From now on we assume AB is the primary key.

3.4.2 2NF of Example 9
The normalization of 1NF relations to 2NF involves the removal of partial dependencies on the primary key. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation with a copy of their determinant. Having identified the functional dependencies, we continue the process of normalizing the relation. We begin by testing whether the relation is in 2NF by identifying the presence of any partial dependencies on the primary key. We see that the attribute D is partially dependent on part of the primary key, namely A. On the other hand, the attributes C, E, and F are fully dependent on the whole primary key. We note that H is not wholly dependent on part of the primary key AB and therefore does not violate 2NF. Hence, we need to create a new relation called A_Relation. As a result, the DM is also partitioned as follows.

Figure 25: Database normalized up to 2NF
       A  B  C  E  F  G  H
AB     2  2  1  1  1  F  BF
F      0  0  0  0  2  1  0
BF     0  2  0  0  2  F  1
BCH    1  2  2  1  1  F  2
BCF    1  2  2  1  2  F  BF
(a): AB_Relation: {AB, C, E, F, G, H}

       A    D
A      2    1
(b): A_Relation: {A, D}

3.4.4 3NF of Example 9
The normalization of a 2NF table to 3NF involves the removal of
transitive dependencies. If a transitive dependency exists, we
remove the transitively dependent attributes from the relation by
placing them in a new relation along with a copy of their
determinant. First, we examine the functional dependencies within
the A and AB relations, which are shown in Figure 25. The A_Relation
does not have transitive dependencies on the primary key. However,
although all the non-primary-key attributes within the AB_Relation
are functionally dependent on the primary key, G is also dependent
on F. This is an example of a transitive dependency, which occurs
when a non-primary-key attribute is dependent on another
non-primary-key attribute. Although BF → H, BF is not a
non-primary-key determinant (as B is part of the primary key).
Therefore, we do not remove this dependency at this stage. In other
words, H is not transitively dependent wholly on non-primary-key
attributes, and this dependency therefore does not violate 3NF. To
transform the AB_Relation into third normal form, we must first
remove the transitive dependency F → G. This is done by creating
two new relations called F_Relation and AB_Relation [9]. The
resulting 3NF relations are shown in Figure 26.

Figure 26: Third normal form of Example 9
       A    B    C    E    F    H
AB     2    2    1    1    1    BF
BF     0    2    0    0    2    1
BCH    1    2    2    1    1    2
BCF    1    2    2    1    2    BF
(a): AB_Relation: {AB, C, E, F, H}

       A    D
A      2    1
(b): A_Relation: {A, D}

       F    G
F      2    1
(c): F_Relation: {F, G}

3.4.5 Boyce-Codd Normal Form (BCNF)
We now examine the A, AB and F relations to determine whether they
are in BCNF. A relation is in BCNF if every determinant of the
relation is a candidate key. Therefore, to test for BCNF, we simply
identify all the determinants and make sure they are candidate keys.
To achieve a higher level of normalization, a relation must pass
through these four levels [18]. We can see from DM (b) and DM (c) in
Figure 26 that their relations are already in BCNF. To transform
AB_Relation into BCNF, we must remove the dependency that violates
BCNF by creating one new relation for BF → H. The resulting BCNF
relations are shown in Figure 27. In this example, the decomposition
of the original AB_Relation into BCNF relations has resulted in the
'loss' of the functional dependency BCH → AEF. On the other hand, we
recognize that if the functional dependency BF → H is not removed,
the AB_Relation will have data redundancy [9]. In practice, some
designers stop at 3NF and do not proceed to BCNF, in which case some
redundancy may remain in the designed database.

Figure 27: Database normalized up to BCNF
       A    B    C    E    F
AB     2    2    1    1    1
BCF    1    2    2    1    2
(a): AB_Relation: {AB, C, E, F}

       F    G
F      2    1
(b): F_Relation: {F, G}

       B    F    H
BF     2    2    1
(c): BF_Relation: {BF, H}

       A    D
A      2    1
(d): A_Relation: {A, D}

4. Comparison of our algorithm with others:
Some other algorithms for database normalization run in exponential
time, which is computationally very expensive, and some run in
polynomial time of degree greater than three. Our algorithm runs in
only n²m time, which is faster than these. We believe that the
algorithms are very efficient; however, we will compare them with
other similar algorithms in the future.

5. Conclusion
A new, complete, automated relational database normalization method
is presented. The process is based on the generation of the
dependency matrix, the directed graph matrix, and the determinant
key transitive dependency matrix. The details of the methods for
2NF, 3NF, and BCNF are discussed. Two examples, one without multiple
candidate keys and one with multiple candidate keys, are considered,
and the defined algorithms are applied to produce the desired final
tables.

6. REFERENCES
[1] M. Arenas, L. Libkin, An Information-Theoretic Approach to
Normal Forms for Relational and XML Data, Journal of the ACM (JACM),
Vol. 52(2), pp. 246-283, 2005.
[2] Kolahi, S., Dependency-Preserving Normalization of Relational
and XML Data, Journal of Computer and System Sciences, Vol. 73(4),
pp. 636-647, 2007.
[3] Mora, A., M. Enciso, P. Cordero, I. P. de Guzman, An Efficient
Preprocessing Transformation for Functional Dependencies Sets Based
on the Substitution Paradigm, CAEPIA 2003, pp. 136-146, 2003.
[4] Du, H., and L. Wery, A Normalization Tool for Relational
Database Designers, Journal of Network and Computer Applications,
Volume 22, No. 4, pp. 215-232, October 1999.
[5] Yazici, A., and Z. Karakaya, Normalizing Relational Database
Schemas Using Mathematica, LNCS, Springer-Verlag, Vol. 3992,
pp. 375-382, 2006.
[6] Kung, H. and T. Case, Traditional and Alternative Database
Normalization Techniques: Their Impacts on IS/IT Students'
Perceptions and Performance, International Journal of Information
Technology Education, Vol. 1, No. 1, pp. 53-76, 2004.
[7] Akehurst, D.H., B. Bordbar, P.J. Rodgers, and N.T.G. Dalgliesh,
Automatic Normalization via Metamodelling, ASE 2002 Workshop on
Declarative Meta Programming to Support Software Development, 2002.
[8] Date, C.J., An Introduction to Database Systems, Addison-Wesley,
Seventh Edition, 2000.
[9] Connolly, Thomas, Carolyn Begg, Database Systems: A Practical
Approach to Design, Implementation, and Management, Pearson
Education, Third Edition, 2005.
[10] S. Lucas, J. Meseguer, "Normal forms and normal theories in
Conditional rewriting", Elsevier Journal of Logical and Algebraic
Methods in Programming, vol. 85, pp. 67-97, 2016.
[11] S. K. Singh, Database Systems-Concepts Design and Application,
New Delhi, India: Dorling Kindersley, 2009.
[12] Lucas, J. Meseguer, N. R. Hennicker, "Localized operational
termination in general logics in: R.D" in Software Services and
Systems: Lecture Notes in Computer Science, Springer, vol. 8950,
pp. 91-114, 2019.
[13] Moussa Demba, "Algorithm for relational database Normalization
up to 3NF", International Journal of Database Management Systems,
vol. 5, no. 3, June 2013.
[14] R. Vangipuram, R. Velputa, V. Sravya, "A Web Based Relational
database design Tool to Perform Normalization", International
Journal of Wisdom Based Computing, vol. 1, no. 3, 2011.
[15] W.-D. Langeveldt, S. Link, "Empirical evidence for the
usefulness of Armstrong relations in the acquisition of meaningful
functional dependencies", Inf. Syst., vol. 35, no. 3, pp. 352-374,
2010.
[16] P. Bourque, R. Dupuis, "Guide to the Software Engineering Body
of Knowledge", IEEE Software, pp. 35-44, November, December 1999.
[17] S. Vimala, N H. Khanna, G. Saranya, A. Kannan, "Applying Game
Theory to Restructure PL|SQL Code", International Journal of Soft
Computing, vol. 7, no. 6, pp. 264-270, 2012.
[18] D. Boetticher Gary, "Computing program at UHCL changing the
world bit by bit", Graduate database course Department of computer
science, pp. 15-33, 2018.
[19] J. Yuan, L. He, E. C. Dragut, W. Meng, C. Yu, "Result merging
for structured queries on the deep web with active relevance weight
estimation", Inf. Syst., vol. 64, pp. 93-103, 2017.
[20] S. Vimala, K. Nehemiah, H. Bhuvaneswaran, R.S., G. Sarya,
"Design Methodology for Relational Databases: Issues Related to
Ternary Relationships in Entity-Relationship Model and Higher Normal
Forms", IJDMS, vol. 5, no. 3, pp. 15-37, 2013.
[21] Kunal Kumar, S.K Sazid, "Database normalization and design
issues", IEEE UPCON, May 5-7, 2017.
