You are on page 1of 12

Computer Science Journal of Moldova, vol.17, no.

2(50), 2009

Determination of the normalization level of
database schemas through equivalence classes of
attributes
Cotelea Vitalie

Abstract
In this paper, based on equivalence classes of attributes there
are formulated necessary and sufficient conditions that constraint
a database schema to be in the second, third or Boyce-Codd
normal forms. These conditions offer a polynomial complexity
for the testing algorithms of the normalizations level.
Keywords: Relational database schema, functional dependencies, equivalence classes of attributes, normal forms, polynomial algorithms.

1

Introduction

The anomalies that appear during database maintaining are known as
insertion, update and deletion anomalies. These are directly related to
the dependencies between attributes. A rigorous characterization of the
quality grade of a database schema can be made through the exclusion
of mentioned anomalies, with consideration of attributes dependencies,
which offers the possibility to define some formal techniques for design
of desirable relation schemes.
The process of design of some relation scheme structure with intend to eliminate the anomalies, is called normalization. Normalization consists in following a set of defined rules on data arrangement
with the scope to reduce the complexity of scheme structures and its
transformation into smaller and stable structures which will facilitate
c
°2009
by Vitalie Cotelea

123

Vitalie Cotelea data maintenance and manipulation. In Section 2. There exist several normalization levels that are called normal forms.e. third normal form (3NF) and Boyce-Codd normal form (BCNF). can be performed in a polynomial time [2]. the correlation is proven between nonredundant classes 124 . the problem of determination of the normalization level is known to be NP-complete [3. every relation in 3NF is also in 2NF and every relation in 2NF is in 1NF. For example. This approach can be a part of the database analysis and design toolset. 3NF and BCNF are important from a database design standpoint [1]. These forms have increasingly restrictive requirements: every relation in BCNF is also in 3NF. Thus a database designer may work in terms of attributes sets and data dependencies. because normalization testing requires finding the candidate keys and nonprime attributes. for the automation of database design and testing. But it is known that a relation can have an exponential number of keys under the number of all attributes of its scheme [5]. second normal form (2NF). A relation is in 1NF if every attribute contains only atomic values. most of the definitions needed in this paper are presented. Therefore. In this paper necessary and sufficient conditions for a scheme to be in 2NF. the design of a 3NF database schema. The problem of prime and nonprime attributes finding has been solved in a polynomial time [6]. 4]. Besides this. through the synthesizing method. several properties for equivalence classes of attributes. and not in terms of keys. The normal forms based on functional dependencies are first normal form (1NF). the definitions of normal forms use the notions of prime and nonprime attributes. the determination of normalization level of a scheme is also polynomial. proved in [6]. Unfortunately. Firstly. Secondly. which are also related to key. third or BCNF) contain the notion of key. are given. i. 3NF or BCNF are defined. In Section 3. 2NF is mainly of historical interest. the definitions of normal schemes (second. These conditions are described in terms of redundant and nonredundant equivalence classes of attributes and the computation of these classes can be performed in polynomial time [6].

The final section is about algorithmic aspects. The set X is a determinant for Y with respect to F if X 0 → Y is not in F + for every proper subset X 0 of X. Let X and Y be two nonempty finite subsets of R. where it is shown that the determination of the normalization level of database schemas can be performed in polynomial time. Otherwise A is nonprime in Sch(R. 5 and 6 there are presented necessary and sufficient conditions (Theorems 4-6). written as X + . If X is a determinant for R with respect to F . 3NF or BCNF. then X is a key for relation scheme Sch(R. An attribute A is prime in Sch(R. In Sections 4. Let Sch(R. 2 Preliminary notions In this and in the next section. F ) be a relation scheme. If F is a set of functional dependencies over R and X is a subset of R. respectively. Note that some relation scheme may have more than one key. F ). where F is a set of functional dependencies defined on a set R of attributes. that will be as concise as possible. Armstrong’s Axioms are sound in that they generate only functional dependencies in F + when applied to a set F . for a relation scheme to be in 2NF. 125 . in terms of equivalence classes of attributes. is the set of attributes A such that X → A can be inferred using the Armstrong Axioms. F ). some definitions and statements used in this paper are presented. They are complete in that repeated application of these rules will generate all functional dependencies in the closure F + [1]. F ) if A is contained in some key of Sch(R.Determination of the normalization level of database schemas of attributes and the right and left sides of functional dependency that is inferred from a given set of functional dependencies (Theorem 3). that is X + = {A|X → A ∈ F + } [7]. that is F + = {V → W |F | = V → W } [7]. then the closure of the set X with respect to F . F ). The set of all functional dependencies implied by a given set F of functional dependencies is called the closure of F and is denoted as F + .

E). (A. The relation of strong connectivity is an equivalence relation over the set S. Lemma 1.Vitalie Cotelea In what follows. Over the set S ∗ of vertices of graph G∗ a strict partial order is defined. . Tm is obtained. If X → Y ∈ F + and X is a determinant of Y under F . Evidently the condensed graph G∗ is free of directed circuits. Given a relation scheme Sch(R. . n. E ∗ ) is defined as follows: S ∗ = {S1 ... [6].. where: • for every attribute A in R. Vertex Si precedes vertex Sj . Then the condensed graph [8] of G. then for every attribute A ∈ (X − Y ) there is an attribute B ∈ Y so that in the contribution graph G there exists a path from vertex A to vertex B and for every attribute B ∈ (Y − X) there exists in X an attribute A. Let G = (S. A ∈ Si and B ∈ Sj }.. . . B) in E. From the ordered sequence of sets S1 . So. F ). there is a vertex labeled by A in S. where T1 = S1 and Tj = Sj − Sj−1 + ( i=1 Ti )F for j = 2.. it will be assumed that the set F of functional dependencies is reduced [7]. . keeping the precedence of prior sets. the set F can be represented by a graph... if Sj is accessible from Si . 126 . there is a partition of set of vertices S into pairwise disjoint Sn subsets. Let S1 . Strict partial orders are useful because they correspond more directly to directed acyclic graphs.. Sn } and E ∗ = {(Si . that is. Sn be the strongly connected components of a graph G = (S.. E). Sj )|i 6= j. E) be divided into strongly connected components... Tn ..... Sn a sequence of ordered nonredundant sets can be built T1 . All empty sets are excluded from the sequence and a sequence of nonempty sets T1 . from which the vertex B can be reached. G∗ = (S ∗ . S = i=1 Si . B) in E that is directed from vertex A to vertex B. • for every functional dependence X → Y in F and for every attribute A in X and every B in Y there is an edge a = (A. called contribution graph [6] for F and denoted by G = (S.

Sn4). then S Z.. Y ⊆ T1T . If an attribute ASin S Sn is nonprime in scheme Sch = ( i=1 Si . where Z = X (T . on the contribution graph of set F of dependencies. F ). where j = 1. S S Corollary 1. is a determinant for Y Tj under F . Theorem ([6].. Sn under F . ([6]. m. A contradiction has been reached.. Theorem S S 1.. then A ∈ i=1 Ti . In the following sections.. sufficient and necessary conditions for a relation scheme to be in a normal form are presented. ([6]. TEvidently that X ⊆ S S S j S S + 0 T1 . Lemma is a determinant under F of set S S 2.. Tm . Tm . Tj is redundant. Using above structures and statements it will be shown that the problem of determination of the normalization level has polynomial complexity. m..Determination of the normalization level of database schemas 3 Some properties of equivalence classes of attributes In this section a brief overview of several properties of equivalence classes of attributes is given. + Theorem 3... Tm under F .. For aTTj . in terms of equivalence classes of attributes... m. S of set T1 .. Tj−1 . but X Tj = ∅. then A ∈ ( ni=1 Si − m i=1 Ti ). Tm and X → T (Y Tj ) ∈ F . the following takes place: if Y Tj 6= ∅. if and only if X is determinant of set T1 . If an attribute SmA in S1 . Lemma 3). Tj−1 Tj+1 . TheoremT4).. And their proofs are presented in [6]. But in this case. Set X is a determinant S ofS set S1 . 0 where X ⊆ X. Tj under F . Corollary Sn 3). Tm . Sn is prime in scheme Sch = ( i=1 Si ... is a 1 j S determinant for T1 .. According to Lemma 1.. Let X . S S Corollary 2. X 0 ⊆ T1 . where i = 1.. Thereby. 127 . The T soundness of this T statement is proven by contradiction: let Y T = 6 ∅. then X Ti 6= ∅. from every vertex labeled with an attribute in X 0 there exists aSpath T Sto a vertex labeled with an attribute in Y Tj .. Proof. ([6]. then X Tj 6= ∅. T ) and j = 1... Let X S →SY ∈ F . where X is a determinant for Y under F and X. Theorem 2). If set of attributes X is a determinant S 2.. ([6]. F ). If X T S S T1 . Corollary S1 ..

then A is called that completely depends on X. Proposition 1.. if each constituent relation scheme is in the 2NF.... if and Theorem 4. Tj−1 )→A∈F + . X ⊆ T1 . From the construction of contribution Sm + that (T1 . Definition 2. Scheme Sch = ( ni=1 Si . then A completely depends on X.. Tj ) → A ∈ F . if there exists a proper subset X 0 of set X. namely S S S + . if it is in 1NF and each nonprime S attribute in ni=1 Si doesn’t partially depend on every key for Sch. T .Vitalie Cotelea 4 Second normal form Thus. or A ∈ ( ni=1 Si − m i=1 Ti ). F ) be inSthe 2NF.. F ) is in Sm + only if it is in the 1NF and for every T . S Proof. 128 . Necessity. Because A∈( i=1 Ti −Tj )+ . If such a proper subset doesn’t exist. [9]. Let X → A ∈ F be a nontrivial functional dependency (namely A ∈ / X). m. 1 m In S S addition. ( j i=1 Ti − Tj ) = Sm i=1 Ti − Tj takes place. Database schema is in the 2NF. n Then Sm every nonprime attribute A. Let scheme Sch = ( ni=1 Si . such that X 0 → A ∈ F + . X is a determinant of set T .. but there is an attribute A∈( i=1 Ti −Tj ) such S Sm (S i=1 Ti −Tj ). The next theorem gives a characterization of the 2NF in terms of equivalence classes of attributes. But this contradicts A∈( j−1 i=1 Ti −Tj ) . Let AS∈ TS graph. There are two cases: either A ∈ Tj . According to Theorem 1. j = 1. F ) is in the 2NF under a set of functional dependencies F . Assuming to the contrary.. follows j . If set of attributes X is a determinant for attribute A under set of attributes F . S the 2NF. Tm . the relation scheme in the 2NF can be defined: S Definition 1. that Sch is in Sm + that A ∈ / the 2NF. Relation scheme Sch = ( ni=1 Si . that is a member of set S i=1SSi − Sn .. An attribute A is called partially dependent on X. i=1 Ti completely depends on every determinant X of setSS1 S. then (T1 the fact that set Tj is nonredundant. [9]..

. or ( S − i i i i=1 i=1 i=1 Ti ) 6= ∅..Determination of the normalization level of database schemas S S S S Let A ∈ ( ni=1 Si − m for T .. the following equality takes place: ( every T . F ) is in 3NF under a set of functional dependencies F .. m and vice versa. So that (T . XS (TS .. that is in the case when set of nonprime attributes is not empty. if and only if for every i = 1. S Sufficiency. So... S S i=1 If ( ni=1 Si − m T ) = ∅... SmTwo cases are possible: either ( S − T ) = ∅. The + = T − Tj takes place when Ti = Si holds for every T − T ) ( m i i j i=1 i=1 i = 1. j = j i=1 Ti −Tj ) = Sm i=1 Ti −Tj .. In other words. the scheme is in the 2NF. Tm . Tj−1 ) is a de1 S S terminant for T . m Ti = Si holds. Scheme Sch = ( ni=1 Si . taking into account Lemma 2. fact that contradicts the assumption that scheme Sch is in the 2NF. S Definition 3. If X is a determinant T S S under F .. then 1 j−1 1 j−1 T S S + (X (T1 . 5 Third normal form In this section a characterization of the 3NF is given through the equivalence classes. T . Let scheme Sch = ( ni=1 Si . if it is in the 1NF and every nonprime attribute doesn’t transitively depend on a key of scheme Sch. that every nonprime attribute A comS results S pletely depends on T1 . 129 .. Database schema is in the 3NF. T )→A∈F + .. F ) be inS the 1NF and for m + 1. [9]. the nonprime attribute A partially depends on determinant X. That is A partially depends on key X. S S If ( ni=1 Si − m i=1 Ti ) 6= ∅. F ) is in the 2NF. Tm 1 i=1 Ti ). It will be Snproven that Sm scheme Sch isSinn the 2NF. S Corollary 3. therefore. soundness of this statement follows from the fact that Sm S Proof.. furthermore it completely depends on S S determinant X under F of set T1 . then scheme doesn’t contain nonprime i=1 i attributes and. m. if every constituent relation scheme is in the 3NF. Tm . Scheme Sch = ( ni=1 Si . Tj−1 )) → A ∈ F . scheme is in the 2NF and it is even in the third.

A ∈ / V W. Therefore. Hence. Sn is in the S3NF. fully functionally depends SS n ). if Theorem 5. all nonprime attributes A don’t depend transitively on determinant X. F ) S Sn Sm and only if ( i=1 Si − i=1 Ti ) is a determinant for ( ni=1 Si − m i=1 Ti ) under F . A ∈ ( i=1 Si − i=1 Ti ) and A ∈ / XW . Let X be a determinant of set i i i=1 S i=1 T1 . no attribute A. F ). Then scheme Sch S = ( ni=1 SSi . That is. namely it is fully functionally on key X of scheme Sch = ( i=1 Si . 3. V. F ) is also in the 2NF and each m attribute A. Sm ( S S − T ) under F . X → ( i=1 Si − S F holds. S Tm under F . where A ∈ ( ni=1 i − i=1 Ti ). In addition. Tm under F . 130 . then Snthere doesn’t Sm exist anySdependency Sm W → A ∈ F .. W ⊆ i i=1 i=1 Si Sn and A ∈ i=1 Si . where A ∈ ( ni=1 Si − m i=1 Ti ). transitively depends on + X.. F ) be in the 3NF. W → A ∈ F + . 2.Vitalie Cotelea Sn Sn Definition 4. 4. That is. W → V ∈ / F + (namely V doesn’t functionally depend on W )..Sthere doesn’t Sm exist any dependency W → A ∈ F . S S Assume ( ni=1 Si − m i=1 Ti ) is a determinant for SnSufficiency. if the following conditions are all satisfied: 1. LetS scheme Sch = ( ni=1 Si . F S S dependent on determinant XSof set T1S . n that confirms that ( i=1 Si − i=1 Ti ) is a determinant for ( i=1 Si − S m i=1 Ti ) under F . such n Ti ) and / XW . Bei=1 Ti ) ∈ S n m cause ( ni=1 Si − m T ) is a determinant for ( S − i=1 i i=1 i i=1 Ti ) under + F . such that n W ⊆ ( i=1 Si − i=1 Ti ).. X is a +determinant of set SS1 . dependency that S W ⊆S( i=1 Si − Si=1 Sm A ∈ n ( ni=1 Si − m T ) → ( S − T fact i=1 i Sn i=1 i Sm i=1 i ) is reduced on the leftSside.. SSn . V → W ∈ F + . Necessity. It is considered that the attribute A transitively depends on V through W . Let scheme Sch = ( S .. According to Theorem S Sn Sm1. Relation scheme Sch = ( i=1 Si . [10]. S Proof.

the case whenS A ∈ / V . V → Si=1 Si ∈ F holds. S Proof. if and only . that is. i i=1 i=1 Ti ∈ Without constraining the generality. It will be proven that scheme Sch = for ( j Sn i=1 i ( i=1 Si . m. S Let scheme Sch = ( ni=1 Si . the left side of each functional dependency functionally determines all attributes of scheme. j = 1. that Sn dependency + is. S so that V → Sna nontrivial m + S ∈ / F holds. the set Sjm Sm if it is in the 3NF and for every T of attributes ( i=1 Ti − Tj ) is a determinant for ( i=1 Ti − Tj ) under F. m. attributes in Tj j = 1. Then for every nontrivial functional V → A ∈ F + . so that m + V → ( i=1 Ti − Tj ) ∈ F . S Definition 5. j = 1. thenSthe last dependency is not trivial. the set of attributes ( m i=1 Ti − Tj ) is a determinant m T − T ) under F . In this case. Based on the m m + reflexivity rule. if it is in S the 1NF and for every nontrivial dependency V → A ∈ F + V → ni=1 Si ∈ F + takes place. In the determination of a database schema being in BCNF. namely X T = 6 ∅ for j = 1. T 2. Scheme Sch = ( ni=1 Si . let V be a determinant for A under F . m too. F ) be in BCNF. But this functional dependency contradicts the Smfact that every determinant Xof set S n according to Theorem i=1 Si and consequently a set i=1 Ti contains. Necessity. Let scheme Sch = ( ni=1 Si . [11]. F ) is in BCNF. F ) not be in BCNF. that is.Determination of the normalization level of database schemas 6 Boyce-Codd normal form The concept of BCNF is refined from the notion of 3NF. m. F ) is in the normal form BoyceCodd. there is functional dependency V → A ∈ F + . Relation scheme Sch = ( ni=1 Si .Sif there exists a set of attributes V ⊂ ( m i=1 Ti − Tj ). j Sn Sufficiency. Let scheme Sch = ( i=1 S Si . If it’s supposed Sm m that ( i=1 Ti − Tj ) is not a determinant for ( S i=1 Ti − Tj ) under F . F ) be in the 3NF and for everySTj . S Theorem 6. a given set F of functional dependencies is used. From the construction of the contribution graph and from the fact 131 . Then it can be stated that V → / F +. F ) is in BCNF under set F of functional dependencies. ( i=1 Ti − Tj ) → ( i=1 Ti − Tj )S∈ F . By the definition of BCNF V → ni=1 Si ∈ F + holds.

then the i i=1 i=1S m T − T ) of attributes is not a determinant for ( set ( m j i=1 Ti − Tj ) i=1 i under F . Indeed. A few comments about the complexity of the algorithms for finding the normal form of scheme are made below.Vitalie Cotelea that dependency V → A is reduced. S Sm 3. and the right side consists of a nonprime attribute. V ⊆ m i=1 Ti and A ∈ ( i=1 Si − i=1 Ti ) . let m > 1 and A ∈ Tk . left and right sides are formed just from prime attributes. Then by Theorem 3 and the construction way of drawing the contribution graph. S Sn Sm S Suppose that V ⊆ ( ni=1 Si − m i=1 Ti ) and A ∈ ( i=1 Si − i=1 Ti ). F ) will not be in the 2NF. so that ( m i=1S m − Tj ) under F . i = l. k. Tk ). but in this case Ti − Tj ) is not a there exists a Tj . the polynomiality of the normal form testing problem can be proved. S fact that contradicts Smthe hypothesis. which contradicts the hypothesis. S Sn Sm 2. that is. because (( m determinant i=1 Ti − {A}) − Sm for ( i=1 Ti + Tj ) → ( i=1 Ti − Tj ) ∈ F . then nonprime attribute S A would transitively depend through Sn V on n every determinant of set i=1 Si and then scheme Sch = ( i=1 Si . where V Ti 6= ∅. V ⊆ m i=1 Ti and A ∈ i=1 Ti .. itScanSexist S two casesTeither V ⊆ Tk . that is. 132 .. that is. or V 6⊂ Tk .the left side is formed from prime attributes. F ) will not be in the 3NF. left and right sides are formed just from nonprime attributes. three cases can be examined (other cases don’t exist): S S Sn Sm 1. S S S If it is considered that V ⊆ m T and A ∈ ( ni=1 Si − m i=1 i i=1 Ti ). 7 Algorithms’ complexities Based on the above characterization. IfSit is considered that V ⊆ m T and A ∈ Ti . V ⊆ ( ni=1 Si − m i=1 Ti ) and A ∈ ( i=1 Si − i=1 Ti ). neither in the third.Swhere V Tj = ∅. but V ⊆ (Tl Tl+1 . T Evidently m > k − S l + 1. then nonprime attribute A would partially S S depend on a determinant of set ni=1 Si therefore scheme Sch = ( ni=1 Si .

if for every Tj . Database Syst. Synthesizing Third placeNormal Form Relations from Funcional Dependencies. ( i=1 Ti − Tj ) = i=1 Ti − Tj ) requires a time O(|N onRedEquivClasses| · ||F ||). That is. m. 133 . ACM Trans. if for everySTj . set m of attributes ( m i=1 Ti − Tj ) is a determinant for ( i=1 Ti − Tj ) under F ) requires a time O(|R| · ||F ||). P. [3] Beeri. It is not hard to calculate the complexity of algorithms that determine whether a scheme is in the second. Bernstein. p. N 4. S Thus. third or Boyce-Codd normal form. |R| · ||F || for each of these algorithms. F ) toSbe in BCNF (that is. Database Management Systems. Philip A. m. Raghu and Gehrke. 1976.Determination of the normalization level of database schemas Both construction of equivalence classes of scheme’s attributes and redundancy elimination from these classes have a complexity O(|R| · ||F ||) [6].1. S Computation of the condition Sch = ( ni=1 Si . 2000. 900 pp. F )+ to be Smin the 2NF (that is. if ( i=1 Si − m i=1 Ti ) is a determinant for Snbe in theSm ( i=1 Si − i=1 Ti )) requires a time O(||F ||). Second Edition. Therefore the time is O(|R| · ||F ||). F ) Sn for theSscheme to 3NF (that is.A. Johannes. SnSimilarly. N 1.. j = 1. then calculation of the condition for the scheme Sch = ( Sm i=1 Si . References [1] Ramakrishnan.4.. p. j = 1. V..30–59. Database Syst. This is explained through the fact that the complexity of calculation of the classes of nonredundant attributes exceeds the complexity of calculation of the verification conditions that determine if a scheme is in one of the enumerated forms. [2] Bernstein. McGraw-Hill Higher Education.277–298. if nonredundant classes m i=1 T Si nare built. V. verification of the condition for the scheme Sch = ( i=1 Si . ACM Trans. C. March 1979. Computational Problems Related to the design of placeNormal Form Relations Schemes.

J. 1982.. 2009. Shimon. Englewood Cliffs. 2009 Vitalie Cotelea Academy of Economic Studies of Moldova Phone: (+373 22) 40 28 87 E-mail: vitalie. 89–99.T. [10] Chao-Chih Yang.H. Graph Algorithms. [9] Codd. Information Processing Letters. E. Fisher. [6] Cotelea.T. Prentice-Hall.cotelea@gmail. Rustin (ed). N. Inform. 260 p. 637 p. Johnson D. 1986. 33–64. R. [11] Codd. Relational Databases. [7] Maier. V. p. On the complexity of finding the set of candidate keys for a given set of functional dependencies. 1978.F. Computer Science press. 1972. Computer Science Press. Recent Investigation in Relation Data Base Systems. Data Base Systems. 1017–1021..1(49). Computer Science Journal of Moldova. p.C. The theory of relational database. 1983. p. Vitalie Cotelea Received May 27. 1974. p.4. E. Vitalie.17. NJ. An approach for testing the primeness of attributes in relational schemas. Further Normalization of the data base relational model. The complexity of recognizing 3FN relation schemes.187–190. p. Vol. Prentice Hall.5..com 134 .Vitalie Cotelea [4] Jou. P. 250 p. IFIP Congress. D. [5] Yu C. 1979. [8] Even.F. Chisinau. Process.100–101. Nr. Letters 14.