Normalization - Process for evaluating and correcting table structures to minimize data redundancies
o Reduces data anomalies
Series of stages called normal forms:
o First normal form (1NF)
o Second normal form (2NF)
o Third normal form (3NF)
2NF is better than 1NF; 3NF is better than 2NF
o For most business database design purposes, 3NF is as high as needed in normalization
o Highest level of normalization is not always most desirable
Denormalization - Produces a lower normal form (for example, 3NF → 2NF). Increases performance but also increases data redundancy
Prime attribute – any attribute that is at least part of a key
Nonprime attribute (non-key attribute) – any attribute that is not part of any candidate key
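The prime/nonprime split can be sketched in a few lines of Python. This is only an illustration: the function name, the attribute names PROJ_NUM, EMP_NUM, PROJ_NAME and HOURS, and the choice of candidate key are assumptions made for the example, not taken from these notes.

```python
def classify_attributes(attributes, candidate_keys):
    """Split attributes into prime (part of some candidate key) and nonprime."""
    prime = set()
    for key in candidate_keys:
        prime.update(key)
    nonprime = set(attributes) - prime
    return prime, nonprime


# Hypothetical table whose only candidate key is (PROJ_NUM, EMP_NUM).
attributes = ["PROJ_NUM", "EMP_NUM", "PROJ_NAME", "EMP_NAME", "HOURS"]
candidate_keys = [("PROJ_NUM", "EMP_NUM")]

prime, nonprime = classify_attributes(attributes, candidate_keys)
print(sorted(prime))     # ['EMP_NUM', 'PROJ_NUM']
print(sorted(nonprime))  # ['EMP_NAME', 'HOURS', 'PROJ_NAME']
```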
Normalization Process:
o Each table represents a single subject
o No data item will be unnecessarily stored in more than one table
o All nonprime attributes in a table are dependent on the primary key
o Each table is void of insertion, update, and deletion anomalies
o Objective of normalization is to ensure that all tables are in at least 3NF
o Higher forms are not likely to be encountered in business environment
o Normalization works one relation at a time
o Progressively breaks table into new set of relations based on identified dependencies
Partial dependency - Exists when there is a functional dependence in which the determinant is only part of the primary key (assuming there is only one candidate key)
o If (A, B) → (C, D), B → C, and (A, B) is the primary key, then the functional dependence B → C is a partial dependency because only part of the primary key (B) is needed to determine the value of C
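As a rough sketch of the definition above, the check below flags any dependency whose determinant is a proper subset of the primary key. It assumes, as the notes do, that there is only one candidate key; the function name and the list-of-pairs representation of dependencies are illustrative choices.

```python
def find_partial_dependencies(primary_key, dependencies):
    """Return dependencies whose determinant is only part of the primary key."""
    pk = frozenset(primary_key)
    partial = []
    for determinant, dependents in dependencies:
        det = frozenset(determinant)
        # Proper, non-empty subset of the key that determines a non-key attribute.
        if det and det < pk and set(dependents) - pk:
            partial.append((tuple(determinant), tuple(dependents)))
    return partial


# The example from the notes: (A, B) -> (C, D) and B -> C, with (A, B) as the key.
primary_key = ("A", "B")
dependencies = [
    (("A", "B"), ("C", "D")),
    (("B",), ("C",)),
]
partials = find_partial_dependencies(primary_key, dependencies)
print(partials)  # [(('B',), ('C',))]
```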
Transitive dependency - Exists when there are functional dependencies such that X → Y, Y → Z, and X is the primary key
o In that case, the dependency X → Z is a transitive dependency because X determines the value of Z via Y.
o Transitive dependencies occur only when a functional dependency exists among nonprime attributes. In the previous example, the actual transitive dependency is X → Z. However, the dependency Y → Z signals that a transitive dependency exists. Hence, throughout the discussion of the normalization process, the existence of a functional dependence among nonprime attributes will be considered a sign of transitive dependency.
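The "signal" rule above (a functional dependence among nonprime attributes) can be sketched as follows. The function name and data layout are assumptions for illustration, and the check again assumes a single candidate key, so every attribute outside the primary key is nonprime.

```python
def find_transitive_signals(primary_key, dependencies):
    """Return dependencies in which a nonprime attribute determines another one."""
    pk = set(primary_key)
    signals = []
    for determinant, dependents in dependencies:
        determinant_is_nonprime = not (set(determinant) & pk)
        nonprime_dependents = [a for a in dependents if a not in pk]
        if determinant_is_nonprime and nonprime_dependents:
            signals.append((tuple(determinant), tuple(nonprime_dependents)))
    return signals


# The example from the notes: X -> Y, Y -> Z, with X as the primary key.
dependencies = [
    (("X",), ("Y", "Z")),
    (("Y",), ("Z",)),
]
signals = find_transitive_signals(("X",), dependencies)
print(signals)  # [(('Y',), ('Z',))]
```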
Conversion to First Normal Form:
o Step 1: Eliminate the Repeating Groups
Eliminate nulls: each repeating group attribute contains an appropriate data value
o Step 2: Identify the Primary Key
Must uniquely identify attribute value
New key must be composed
o Step 3: Identify All Dependencies
Dependencies are depicted with a diagram
Dependency diagram: Depicts all dependencies found within a given table structure
https://www.coursehero.com/file/31247488/Ch6docx/
o All attributes are dependent on primary key
All relational tables satisfy 1NF requirements
Some tables contain partial dependencies
o Dependencies are based on part of the primary key
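Steps 1 and 2 of the 1NF conversion can be sketched in Python. The sample rows, the column names (PROJ_NUM, PROJ_NAME, EMP_NUM, HOURS) and both helper functions are hypothetical; the sketch assumes that a null in a repeating-group column means "same value as the row above".

```python
def fill_repeating_groups(rows, group_columns):
    """Step 1: replace nulls in repeating-group columns with the last value seen."""
    filled = []
    last_seen = {}
    for row in rows:
        new_row = dict(row)
        for col in group_columns:
            if new_row.get(col) is None:
                new_row[col] = last_seen.get(col)
            else:
                last_seen[col] = new_row[col]
        filled.append(new_row)
    return filled


def is_unique_key(rows, key_columns):
    """Step 2: check whether the chosen columns identify every row uniquely."""
    keys = [tuple(row[c] for c in key_columns) for row in rows]
    return len(keys) == len(set(keys))


# Hypothetical report-style data with repeating groups (nulls under each project).
rows = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 103, "HOURS": 23.8},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 101, "HOURS": 19.4},
    {"PROJ_NUM": 18, "PROJ_NAME": "Amber Wave", "EMP_NUM": 114, "HOURS": 24.6},
    {"PROJ_NUM": None, "PROJ_NAME": None, "EMP_NUM": 103, "HOURS": 12.0},
]

table_1nf = fill_repeating_groups(rows, ["PROJ_NUM", "PROJ_NAME"])
print(is_unique_key(table_1nf, ["PROJ_NUM"]))             # False
print(is_unique_key(table_1nf, ["PROJ_NUM", "EMP_NUM"]))  # True
```

In this sample PROJ_NUM alone does not identify a row, which is why a new composite key has to be composed.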
Conversion to Second Normal Form:
o Step 1: Make New Tables to Eliminate Partial Dependencies
Write each key component on a separate line, then write the original (composite) key on the last line
Each component will become the key in a new table
o Step 2: Reassign Corresponding Dependent Attributes
Determine attributes that are dependent on other attributes
At this point, most anomalies have been eliminated
Table is in second normal form (2NF) when:
o It is in 1NF and
o It includes no partial dependencies:
o No attribute is dependent on only a portion of the primary key
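The two 2NF steps can be sketched as a small function: one candidate table per key component plus one for the composite key, then each dependent attribute is reassigned to the smallest determinant that covers it. The function name, the dependency list and the attribute names PROJ_NUM, EMP_NUM, PROJ_NAME and HOURS are assumptions for illustration.

```python
def decompose_to_2nf(primary_key, dependencies):
    """Map each new table's key to the attributes that depend on it."""
    pk = tuple(primary_key)
    # Step 1: each key component on its own line, composite key last.
    tables = {(component,): [] for component in pk}
    tables[pk] = []
    assigned = set(pk)
    # Step 2a: attributes that depend on only one key component.
    for determinant, dependents in dependencies:
        det = tuple(determinant)
        if det != pk and det in tables:
            for attr in dependents:
                if attr not in assigned:
                    tables[det].append(attr)
                    assigned.add(attr)
    # Step 2b: whatever is left depends on the whole composite key.
    for determinant, dependents in dependencies:
        if tuple(determinant) == pk:
            for attr in dependents:
                if attr not in assigned:
                    tables[pk].append(attr)
                    assigned.add(attr)
    return tables


pk = ("PROJ_NUM", "EMP_NUM")
dependencies = [
    (pk, ("PROJ_NAME", "EMP_NAME", "JOB_CLASS", "CHG_HOUR", "HOURS")),
    (("PROJ_NUM",), ("PROJ_NAME",)),
    (("EMP_NUM",), ("EMP_NAME", "JOB_CLASS", "CHG_HOUR")),
    (("JOB_CLASS",), ("CHG_HOUR",)),  # not a key component, so left for the 3NF step
]
tables_2nf = decompose_to_2nf(pk, dependencies)
for key, attrs in tables_2nf.items():
    print(key, attrs)
```

The dependency JOB_CLASS → CHG_HOUR is deliberately left untouched here, because its determinant is not part of the key.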
Conversion to Third Normal Form:
o Step 1: Make New Tables to Eliminate Transitive Dependencies
For every transitive dependency, write its determinant as PK for new table
Determinant: any attribute whose value determines other values within a row
o Step 2: Reassign Corresponding Dependent Attributes
Identify attributes dependent on each determinant identified in Step 1
Identify dependency
Name table to reflect its contents and function
A table is in third normal form (3NF) when both of the following are true:
o It is in 2NF
o It contains no transitive dependencies
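A minimal sketch of the 3NF steps, under the same single-candidate-key assumption used throughout these notes: each nonprime determinant becomes the key of a new table, its dependents move there, and the determinant itself stays behind as a foreign key. The function name and the EMP_NUM attribute are illustrative assumptions.

```python
def decompose_to_3nf(table_key, attributes, dependencies):
    """Return (attributes kept in the original table, new tables by determinant)."""
    key = set(table_key)
    remaining = list(attributes)
    new_tables = {}
    for determinant, dependents in dependencies:
        det = tuple(determinant)
        # Only determinants made up of nonprime attributes still in this table.
        if set(det) & key or not set(det) <= set(remaining):
            continue
        moved = [a for a in dependents if a in remaining and a not in det]
        if moved:
            # Step 1: the determinant becomes the PK of a new table.
            new_tables[det] = moved
            # Step 2: its dependents leave the original table.
            remaining = [a for a in remaining if a not in moved]
    return remaining, new_tables


remaining, new_tables = decompose_to_3nf(
    table_key=("EMP_NUM",),
    attributes=["EMP_NAME", "JOB_CLASS", "CHG_HOUR"],
    dependencies=[(("JOB_CLASS",), ("CHG_HOUR",))],
)
print(remaining)   # ['EMP_NAME', 'JOB_CLASS']
print(new_tables)  # {('JOB_CLASS',): ['CHG_HOUR']}
```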
Improving Design
o Table structures should be cleaned up to eliminate initial partial and transitive dependencies
o Normalization cannot, by itself, be relied on to make good designs
o Valuable because it helps eliminate data redundancies
o Issues to address, in order, to produce a good normalized set of tables:
Evaluate PK Assignments – each time a new employee is entered into the EMPLOYEE table, a JOB_CLASS value must be entered
This study source was downloaded by 100000831743592 from CourseHero.com on 04-30-2022 14:13:17 GMT -05:00
Evaluate Naming Conventions - CHG_HOUR will be changed to JOB_CHG_HOUR to indicate its association with the JOB table.
Refine Attribute Atomicity
Atomic attribute - one that cannot be further subdivided. Such an attribute is said to display atomicity. EMP_NAME is not atomic because it can be decomposed into EMP_LNAME and EMP_FNAME
Identify New Attributes - If the EMPLOYEE table were used in a real-world environment, several other attributes would be added (for example, SS payments, Medicare payments, HIRE_DATE)
Identify New Relationships - Users need to track which employee is acting as the manager of each project. This can be implemented as a relationship between EMPLOYEE and PROJECT.
Refine Primary Keys as Required for Data Granularity – Granularity refers to the level of detail represented by the values stored in a table's row. Data stored at its lowest level of granularity is said to be atomic data.
Maintain Historical Accuracy – Writing the job charge per hour into the ASSIGNMENT table is crucial to maintaining the historical accuracy of the table's data.
Evaluate Using Derived Attributes - You can use a derived attribute in the ASSIGNMENT table to store the actual charge made to a project. That derived attribute, named ASSIGN_CHARGE, is the result of multiplying ASSIGN_HOURS by ASSIGN_CHRG_HOUR. This creates a transitive dependency such that (ASSIGN_CHARGE + ASSIGN_HOURS) → ASSIGN_CHRG_HOUR
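The last two items (historical accuracy and derived attributes) can be illustrated together. In this sketch the helper function, the dictionary layout and the sample rates are hypothetical; only the attribute names come from the notes. The rate is copied into the assignment at the time it is written, so a later change to the JOB table does not alter past charges.

```python
from decimal import Decimal


def make_assignment(assign_hours, job_chg_hour):
    """Build an ASSIGNMENT row that stores the rate in effect and the derived charge."""
    hours = Decimal(assign_hours)
    chrg_hour = Decimal(job_chg_hour)  # copied from JOB at assignment time
    return {
        "ASSIGN_HOURS": hours,
        "ASSIGN_CHRG_HOUR": chrg_hour,
        # Derived attribute: ASSIGN_HOURS multiplied by ASSIGN_CHRG_HOUR.
        "ASSIGN_CHARGE": hours * chrg_hour,
    }


# Hypothetical JOB row; the rate values are made up for the example.
job = {"JOB_CLASS": "Programmer", "JOB_CHG_HOUR": "35.75"}
assignment = make_assignment("2.0", job["JOB_CHG_HOUR"])

# A later rate change in JOB leaves the stored assignment untouched.
job["JOB_CHG_HOUR"] = "37.00"
print(assignment["ASSIGN_CHRG_HOUR"], assignment["ASSIGN_CHARGE"])
```

Storing ASSIGN_CHARGE trades some redundancy for not having to recompute the product on every report, which is the design question this checklist item raises.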
Surrogate Key – When the primary key is considered unsuitable, designers use surrogate keys
o Data entries in Table 6.4 are inappropriate because they duplicate existing records
Yet there is no violation of entity or referential integrity
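The point about duplicates can be sketched as follows: a system-generated surrogate key keeps every row unique at the key level, so entity integrity holds even when the descriptive data is repeated. The column names JOB_CODE and JOB_DESCRIPTION, the starting value 511 and the helper functions are assumptions for illustration and are not the actual contents of Table 6.4.

```python
from itertools import count


def insert_job(table, code_source, description, chg_hour):
    """Append a row whose primary key is a system-generated surrogate value."""
    row = {
        "JOB_CODE": next(code_source),
        "JOB_DESCRIPTION": description,
        "JOB_CHG_HOUR": chg_hour,
    }
    table.append(row)
    return row


def duplicated_values(table, column):
    """List values of a column that appear in more than one row."""
    seen, duplicates = set(), []
    for row in table:
        value = row[column]
        if value in seen:
            duplicates.append(value)
        seen.add(value)
    return duplicates


job_table = []
codes = count(511)  # hypothetical starting value
insert_job(job_table, codes, "Programmer", "35.75")
insert_job(job_table, codes, "Systems Analyst", "96.75")
insert_job(job_table, codes, "Programmer", "35.75")  # duplicate entry, new key

pk_values = [row["JOB_CODE"] for row in job_table]
print(len(pk_values) == len(set(pk_values)))            # True: entity integrity holds
print(duplicated_values(job_table, "JOB_DESCRIPTION"))  # ['Programmer']
```

One common remedy, not covered in these notes, is a unique constraint on the descriptive column in addition to the surrogate key.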
ER diagram
o Identify relevant entities, their attributes, and their relationships
o Identify additional entities and attributes
Normalization procedures
o Focus on characteristics of specific entities
o Micro view of entities within ER diagram
Difficult to separate normalization process from ER modeling process
Denormalization:
o Joining a larger number of tables reduces system speed
o Conflicts are often resolved through compromises that may include denormalization
o Defects of unnormalized tables:
Data updates are less efficient because tables are larger
Indexing is more cumbersome
No simple strategies for creating virtual tables known as views
Data Modeling Checklist:
o Data modeling translates specific real-world environment into data model
Represents real-world data, users, processes, interactions
o Data-modeling checklist helps ensure that data-modeling tasks are successfully performed
Normalization minimizes data redundancies
First three normal forms (1NF, 2NF, and 3NF) are most commonly encountered
Table is in 1NF when:
o All key attributes are defined
o All remaining attributes are dependent on primary key
Table is in 2NF when it is in 1NF and contains no partial dependencies
Table is in 3NF when it is in 2NF and contains no transitive dependencies
Table that is not in 3NF may be split into new tables until all of the tables meet 3NF requirements
Normalization is an important part, but only part, of the design process
Table in 3NF may contain multivalued dependencies
o Numerous null values or redundant data
Convert 3NF table to 4NF by:
o Splitting table to remove multivalued dependencies
Tables are sometimes denormalized to yield less I/O, which increases processing speed