
Chapter 6

 Normalization - Process for evaluating and correcting table structures to minimize data redundancies
o Reduces data anomalies
 Series of stages called normal forms:
o First normal form (1NF)
o Second normal form (2NF)
o Third normal form (3NF)
 2NF is better than 1NF; 3NF is better than 2NF
o For most business database design purposes, 3NF is as high as needed in normalization
o Highest level of normalization is not always most desirable
 Denormalization - Produces a lower normal form (for example, 3NF → 2NF). Yields increased performance but greater data redundancy
 Prime attribute – any attribute that is at least part of a key
 Nonprime attribute (nonkey attribute) – Not part of any candidate key
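The prime/nonprime distinction can be sketched in code. The following is a minimal illustration, assuming a hypothetical relation and set of functional dependencies (the attribute names PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, and HOURS are made up for the example). It finds candidate keys by brute force and then labels each attribute.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute-force search for minimal attribute sets that determine the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(attributes):
                keys.append(combo)
    return keys

# Hypothetical relation and dependencies, for illustration only.
attributes = {"PROJ_NUM", "EMP_NUM", "PROJ_NAME", "EMP_NAME", "HOURS"}
fds = [
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME",)),
    (("PROJ_NUM", "EMP_NUM"), ("HOURS",)),
]

keys = candidate_keys(attributes, fds)
prime = {a for key in keys for a in key}   # part of at least one candidate key
nonprime = attributes - prime              # part of no candidate key
print(keys, prime, nonprime)
```

The brute-force search is exponential in the number of attributes, which is acceptable only for small classroom-sized relations.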
 Normalization Process:
o Each table represents a single subject
o No data item will be unnecessarily stored in more than one table
o All nonprime attributes in a table are dependent on the primary key
o Each table is void of insertion, update, and deletion anomalies
o Objective of normalization is to ensure that all tables are in at least 3NF
o Higher forms are not likely to be encountered in business environment
o Normalization works one relation at a time
o Progressively breaks table into new set of relations based on identified dependencies
 Partial dependency - Exists when there is a functional dependence in which the determinant is only part of the primary
key (assuming that there is only one candidate key)
o If (A, B) → (C, D), B → C, and (A, B) is the primary key, then the functional dependence B → C is a partial dependency
because only part of the primary key (B) is needed to determine the value of C
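As a rough sketch, a partial dependency can be detected by checking whether a determinant is a proper subset of the primary key. The function name below is hypothetical, and the check assumes (as the notes do) that there is only one candidate key.

```python
def partial_dependencies(primary_key, fds):
    """Return the dependencies whose determinant is only part of the primary key."""
    pk = set(primary_key)
    # set(lhs) < pk is True only for a proper subset of the key
    return [(lhs, rhs) for lhs, rhs in fds if set(lhs) < pk]

# The example from the notes: (A, B) -> (C, D) and B -> C, with (A, B) as the primary key.
fds = [
    (("A", "B"), ("C", "D")),
    (("B",), ("C",)),
]
partials = partial_dependencies(("A", "B"), fds)
print(partials)
```

Only B → C is reported, because (A, B) → (C, D) uses the whole key.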
 Transitive dependency - Exists when there are functional dependencies such that X → Y, Y → Z, and X is the primary key
o Given X → Y, Y → Z, and X as the primary key, the dependency X → Z is a transitive dependency because X
determines the value of Z via Y.
o Transitive dependencies occur only when a functional dependency exists among nonprime attributes. In the previous example, the
actual transitive dependency is X → Z. However, the dependency Y → Z signals that a transitive dependency exists.
Hence, throughout the discussion of the normalization process, the existence of a functional dependence among
nonprime attributes will be considered a sign of transitive dependency.
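The "signal" described above can be sketched as a simple filter: look for dependencies in which neither side involves the primary key. This is a minimal illustration with a hypothetical function name, again assuming a single candidate key so that every attribute outside the primary key is nonprime.

```python
def transitive_signals(primary_key, fds):
    """Return dependencies that hold purely among nonprime attributes."""
    pk = set(primary_key)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not (set(lhs) & pk) and not (set(rhs) & pk)
    ]

# X is the primary key; Y -> Z is the dependency among nonprime attributes.
fds = [
    (("X",), ("Y",)),
    (("Y",), ("Z",)),
    (("X",), ("Z",)),
]
signals = transitive_signals(("X",), fds)
print(signals)
```

The result contains only Y → Z, the dependency that reveals the transitive dependency X → Z.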
 Conversion to First Normal Form:
o Step 1: Eliminate the Repeating Groups
 Eliminate nulls: each repeating group attribute contains an appropriate data value
o Step 2: Identify the Primary Key
 Primary key must uniquely identify every attribute value in a row
 If no single attribute does so, a new key must be composed from a combination of attributes
o Step 3: Identify All Dependencies
 Dependencies are depicted with a dependency diagram
 Dependency diagram: Depicts all dependencies found within a given table structure
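Steps 1 and 2 can be sketched on a small in-memory table. The rows, attribute names, and helper functions below are hypothetical and serve only to illustrate the idea: nulls in the repeating group are filled from the row above, and then candidate primary keys are tested for uniqueness.

```python
def fill_repeating_groups(rows, group_attrs):
    """Step 1: copy each repeating-group value down into the rows that left it null."""
    filled, last_seen = [], {}
    for row in rows:
        new_row = dict(row)
        for attr in group_attrs:
            if new_row.get(attr) is None:
                new_row[attr] = last_seen.get(attr)
            else:
                last_seen[attr] = new_row[attr]
        filled.append(new_row)
    return filled

def is_unique_key(rows, key_attrs):
    """Step 2: a key is usable only if no two rows share the same key values."""
    keys = [tuple(row[a] for a in key_attrs) for row in rows]
    return len(keys) == len(set(keys))

# Hypothetical data: the second row belongs to the same project as the first.
rows = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Alpha", "EMP_NUM": 103, "HOURS": 23.8},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 101, "HOURS": 19.4},
    {"PROJ_NUM": 18, "PROJ_NAME": "Beta", "EMP_NUM": 103, "HOURS": 12.6},
]
table = fill_repeating_groups(rows, ["PROJ_NUM", "PROJ_NAME"])

print(is_unique_key(table, ["PROJ_NUM"]))             # single attribute
print(is_unique_key(table, ["PROJ_NUM", "EMP_NUM"]))  # composed key
```

In this sample neither PROJ_NUM nor EMP_NUM is unique on its own, so a composite key has to be composed.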

 First normal form describes tabular format in which:
o All key attributes are defined
o No repeating groups in the table
This study source was downloaded by 100000831743592 from CourseHero.com on 04-30-2022 14:13:17 GMT -05:00

https://www.coursehero.com/file/31247488/Ch6docx/
o All attributes are dependent on primary key
 All relational tables satisfy 1NF requirements
 Some tables contain partial dependencies
o Dependencies are based on part of the primary key
 Conversion to Second Normal Form:
o Step 1: Make New Tables to Eliminate Partial
Dependencies
 Write each key component on separate line,
then write original (composite) key on last line
 Each component will become key in new table
o Step 2: Reassign Corresponding Dependent Attributes
 Determine attributes that are dependent on
other attributes
 At this point, most anomalies have been
eliminated
 Table is in second normal form (2NF) when:
o It is in 1NF and
o It includes no partial dependencies: no attribute is dependent on only a portion of the primary key
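The two conversion steps can be sketched as follows. This is a simplified illustration rather than a general algorithm: the function name is hypothetical, the attribute names PROJ_NUM, EMP_NUM, PROJ_NAME, EMP_NAME, and HOURS are assumed for the example, and each determinant is expected to be written in the same attribute order as the primary key.

```python
def to_2nf(primary_key, fds):
    """Create one table per key component plus one for the composite key,
    then assign each dependent attribute to the table whose key determines it."""
    # Step 1: each key component on its own line, the composite key last.
    tables = {(part,): [] for part in primary_key}
    tables[tuple(primary_key)] = []
    # Step 2: reassign the corresponding dependent attributes.
    for lhs, rhs in fds:
        if tuple(lhs) in tables:
            tables[tuple(lhs)].extend(rhs)
    return tables

fds = [
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),
    (("PROJ_NUM", "EMP_NUM"), ("HOURS",)),
]
tables = to_2nf(("PROJ_NUM", "EMP_NUM"), fds)
for key, dependents in tables.items():
    print(key, "->", dependents)
```

The table keyed by EMP_NUM still contains JOB_CLASS and CHG_HOUR together, which is the kind of leftover that the 3NF conversion addresses.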
 Conversion to Third Normal Form:
o Step 1: Make New Tables to Eliminate Transitive Dependencies
 For every transitive dependency, write its determinant as PK for new table
 Determinant: any attribute whose value determines other values within a row
o Step 2: Reassign Corresponding Dependent Attributes
 Identify attributes dependent on each determinant identified in Step 1
 Identify dependency
 Name table to reflect its contents and function

 A table is in third normal form (3NF) when both of the following are true:
o It is in 2NF
o It contains no transitive dependencies
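The 3NF conversion can be sketched in the same spirit. The example below assumes a hypothetical EMPLOYEE table in which JOB_CLASS → CHG_HOUR is the dependency among nonprime attributes; the function name and the EMP_NUM and EMP_NAME attributes are illustrative.

```python
def to_3nf(table_attrs, transitive_fds):
    """Move each transitively dependent attribute into a new table keyed by its determinant."""
    remaining = list(table_attrs)
    new_tables = {}
    for determinant, dependents in transitive_fds:
        # Step 1: the determinant becomes the primary key of a new table.
        new_tables[tuple(determinant)] = list(dependents)
        # Step 2: the dependents leave the original table; the determinant
        # stays behind so it can serve as a foreign key.
        remaining = [a for a in remaining if a not in dependents]
    return remaining, new_tables

employee = ["EMP_NUM", "EMP_NAME", "JOB_CLASS", "CHG_HOUR"]
remaining, new_tables = to_3nf(employee, [(("JOB_CLASS",), ("CHG_HOUR",))])
print(remaining)
print(new_tables)
```

The new table would then be named to reflect its contents, for example JOB.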
 Improving Design
o Table structures should be cleaned up to eliminate initial partial and transitive dependencies
o Normalization cannot, by itself, be relied on to make good designs
o Valuable because it helps eliminate data redundancies
o Issues to address, in order, to produce a good normalized set of tables:
 Evaluate PK Assignments – each time a new employee is entered into the EMPLOYEE table, a JOB_CLASS
value must be entered
 Evaluate Naming Conventions - CHG_HOUR will be changed to JOB_CHG_HOUR to indicate its
association with the JOB table.
 Refine Attribute Atomicity
 Atomic attribute - One that cannot be further subdivided. Such an attribute is said to display
atomicity. EMP_NAME is not atomic because it can be decomposed into EMP_LNAME and
EMP_FNAME
 Identify New Attributes - If the EMPLOYEE table were used in a real-world environment, several other
attributes would be added (for example, Social Security payments, Medicare payments, and HIRE_DATE)
 Identify New Relationships- users need to track which employee is acting as the manager of each project.
This can be implemented as a relationship between EMPLOYEE and PROJECT.
 Refine Primary Keys as Required for Data Granularity – Granularity refers to the level of detail
represented by the values stored in a table's row. Data stored at its lowest level of granularity is said to be
atomic data.
 Maintain Historical Accuracy – Writing the job charge per hour into the ASSIGNMENT table is crucial to maintaining the
historical accuracy of the table's data.
 Evaluate Using Derived Attributes - You can use a derived attribute in the ASSIGNMENT table to store
the actual charge made to a project. That derived attribute, named ASSIGN_CHARGE, is the result of
multiplying ASSIGN_HOURS by ASSIGN_CHRG_HOUR. This creates a transitive dependency such that
(ASSIGN_CHARGE + ASSIGN_HOURS) → ASSIGN_CHRG_HOUR
 Surrogate Key – When the primary key is considered unsuitable, designers use surrogate keys
o Data entries in Table 6.4 are inappropriate because they duplicate existing records
 Yet they violate neither entity integrity nor referential integrity
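The points on historical accuracy and derived attributes can be illustrated with a small SQLite sketch. The schema is an assumption made for the example: JOB_CODE, ASSIGN_NUM (used here as a surrogate key), and all values are hypothetical. The sketch computes ASSIGN_CHARGE at query time, which is one alternative to storing the derived value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JOB (
    JOB_CODE     TEXT PRIMARY KEY,
    JOB_CHG_HOUR REAL
);
CREATE TABLE ASSIGNMENT (
    ASSIGN_NUM       INTEGER PRIMARY KEY,  -- surrogate key (assumed)
    JOB_CODE         TEXT REFERENCES JOB(JOB_CODE),
    ASSIGN_HOURS     REAL,
    ASSIGN_CHRG_HOUR REAL                  -- rate copied from JOB when the work was done
);
INSERT INTO JOB VALUES ('J1', 50.0);
INSERT INTO ASSIGNMENT VALUES (1, 'J1', 4.0, 50.0);
""")

# The job's hourly charge changes later; the past assignment must not change with it.
conn.execute("UPDATE JOB SET JOB_CHG_HOUR = 60.0 WHERE JOB_CODE = 'J1'")

# Derive the charge from the rate that was recorded in the ASSIGNMENT row.
charge = conn.execute(
    "SELECT ASSIGN_HOURS * ASSIGN_CHRG_HOUR AS ASSIGN_CHARGE "
    "FROM ASSIGNMENT WHERE ASSIGN_NUM = 1"
).fetchone()[0]
print(charge)
```

Because the rate was written into the ASSIGNMENT row, the computed charge still reflects the rate in effect when the hours were worked.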

 Higher-order normal forms are useful on occasion


 Two special cases of 3NF:
o Boyce-Codd normal form (BCNF)
 Every determinant in the table is a candidate key
 A candidate key has the same characteristics as a primary key but, for some reason, was not chosen to be the primary key
 When a table contains only one candidate key, 3NF and BCNF are equivalent
 BCNF can be violated only when the table contains more than one candidate key
 Most designers consider BCNF a special case of 3NF
 A table is in 3NF when it is in 2NF and there are no transitive dependencies
 A table can be in 3NF and fail to meet BCNF when:
 It contains no partial dependencies and no transitive dependencies, but
 A nonkey attribute is the determinant of a key attribute
o Fourth normal form (4NF)
 Table is in fourth normal form (4NF) when both of the following are true:
 It is in 3NF
 No multiple sets of multivalued dependencies
 4NF is largely academic if tables conform to following two rules:
 All attributes dependent on primary key, independent of each other
 No row contains two or more multivalued facts about an entity
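The BCNF condition can be checked mechanically with attribute closures. The sketch below uses hypothetical attributes A through D and flags every dependency whose determinant does not determine the whole table, that is, whose determinant is not at least a candidate key.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return dependencies whose determinant does not determine every attribute."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(attributes)]

# Hypothetical table: (A, B) -> (C, D), and C -> B.
attributes = {"A", "B", "C", "D"}
fds = [
    (("A", "B"), ("C", "D")),
    (("C",), ("B",)),
]
violations = bcnf_violations(attributes, fds)
print(violations)
```

Here C → B is reported: C on its own determines only B, yet B is part of the key (A, B), so the table can satisfy 3NF while failing BCNF.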
 ER diagram
o Identify relevant entities, their attributes, and their relationships
o Identify additional entities and attributes
 Normalization procedures
o Focus on characteristics of specific entities
o Micro view of entities within ER diagram
 Difficult to separate normalization process from ER modeling process
 Denormalization:
o Joining a larger number of tables reduces system speed
o Conflicts among design efficiency, information requirements, and processing speed are often resolved through compromises that may include denormalization
o Defects of unnormalized tables:
 Data updates are less efficient because tables are larger
 Indexing is more cumbersome
 No simple strategies for creating virtual tables known as views
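As a contrast to the defects listed above, the following SQLite sketch shows how a view can present joined data from normalized tables without storing it redundantly. The schema, the view name EMPLOYEE_RATE, and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE JOB (
    JOB_CODE     TEXT PRIMARY KEY,
    JOB_CHG_HOUR REAL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_LNAME TEXT,
    JOB_CODE  TEXT REFERENCES JOB(JOB_CODE)
);
INSERT INTO JOB VALUES ('J1', 50.0);
INSERT INTO EMPLOYEE VALUES (101, 'Lee', 'J1');
INSERT INTO EMPLOYEE VALUES (102, 'Kim', 'J1');

-- The view looks like a denormalized table, but the rate is stored only once.
CREATE VIEW EMPLOYEE_RATE AS
    SELECT E.EMP_NUM, E.EMP_LNAME, J.JOB_CHG_HOUR
    FROM EMPLOYEE E JOIN JOB J ON E.JOB_CODE = J.JOB_CODE;
""")

# A single-row update to JOB is reflected for every employee seen through the view.
conn.execute("UPDATE JOB SET JOB_CHG_HOUR = 55.0 WHERE JOB_CODE = 'J1'")
rows = conn.execute("SELECT * FROM EMPLOYEE_RATE ORDER BY EMP_NUM").fetchall()
print(rows)
```

The join behind the view still costs processing time on every query, which is the trade-off that can motivate denormalization.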
 Data Modeling Checklist:
o Data modeling translates specific real-world environment into data model
 Represents real-world data, users, processes, interactions
o Data-modeling checklist helps ensure that data-modeling tasks are successfully performed

 Normalization minimizes data redundancies
 First three normal forms (1NF, 2NF, and 3NF) are most commonly encountered
 Table is in 1NF when:
o All key attributes are defined
o All remaining attributes are dependent on primary key
 Table is in 2NF when it is in 1NF and contains no partial dependencies
 Table is in 3NF when it is in 2NF and contains no transitive dependencies
 Table that is not in 3NF may be split into new tables until all of the tables meet 3NF requirements
 Normalization is an important part, but only part, of the design process
 Table in 3NF may contain multivalued dependencies
o Numerous null values or redundant data
 Convert 3NF table to 4NF by:
o Splitting table to remove multivalued dependencies
 Tables are sometimes denormalized to yield less I/O, which increases processing speed
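Splitting a table to remove multivalued dependencies can be sketched as a projection per multivalued attribute. The table, the attribute names ORG_CODE and ASSIGN_NUM, and the function name below are hypothetical; the sketch assumes the two multivalued facts are independent of each other.

```python
def split_multivalued(rows, key, mv_attrs):
    """Project one (key, value) table per independent multivalued attribute."""
    tables = {}
    for attr in mv_attrs:
        # Keep each distinct pairing once and drop the null placeholders.
        pairs = {(row[key], row[attr]) for row in rows if row[attr] is not None}
        tables[attr] = sorted(pairs)
    return tables

# Hypothetical 3NF table holding two independent multivalued facts about an employee.
rows = [
    {"EMP_NUM": 101, "ORG_CODE": "RC", "ASSIGN_NUM": None},
    {"EMP_NUM": 101, "ORG_CODE": "UW", "ASSIGN_NUM": None},
    {"EMP_NUM": 101, "ORG_CODE": None, "ASSIGN_NUM": 1},
    {"EMP_NUM": 101, "ORG_CODE": None, "ASSIGN_NUM": 3},
]
tables = split_multivalued(rows, "EMP_NUM", ["ORG_CODE", "ASSIGN_NUM"])
print(tables["ORG_CODE"])
print(tables["ASSIGN_NUM"])
```

Each resulting table records one multivalued fact, and the null values from the original table disappear.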
