DATABASE MANAGEMENT SYSTEMS
C. Krishna Priya, MCA, Lecturer in Computer Science, Sri Sai Degree College, Anantapur.
5. NORMALIZATION AND DATABASE TABLES
Database Tables and Normalization
Normalization is a process for evaluating and correcting table structures to minimize data redundancies and thereby eliminate data anomalies.
• Normalization works through a series of stages called normal forms. The first three stages are described as first normal form (1NF), second normal form (2NF), and third normal form (3NF). From a structural point of view, 2NF is better than 1NF, and 3NF is better than 2NF.
• To meet performance requirements, some portions of a database design may need to be denormalized.
• Denormalization produces a lower normal form; for example, a 3NF table is converted to a 2NF table through denormalization.
The Need for Normalization: To get a better idea of the normalization process, consider the database activities of a construction company that manages several building projects.
The data reflect the assignment of employees to projects. An employee can be assigned to more than one project. Example: Darlene M. Smithson (EMP_NUM = 112) has been assigned to two projects, Amber Wave and Star Flight.
The structure of the data set in the above table has the following deficiencies:
1. The project number (PROJ_NUM) is intended to be a primary key, or at least a part of a primary key, but it contains nulls.
2. The table entries invite data inconsistencies. For example, the JOB_CLASS value "Elect. Engineer" might be entered as "Elect. Engg".
3. The table displays data redundancies. Those redundancies yield the following anomalies:
• Update Anomalies: Modifying JOB_CLASS for employee number 105 requires many alterations, one for each row with EMP_NUM = 105.
• Insertion Anomalies: To complete a row definition, a new employee must be assigned to a project. If the employee is not assigned, a phantom project must be created to complete the employee data.
• Deletion Anomalies: Suppose only one employee is associated with a given project. If that employee leaves the company and the employee data are deleted, the project will also be deleted. To prevent the loss of information, a fictitious employee must be created to save the project information.
The Normalization Process: The objective of normalization is to ensure that each table conforms to the concept of well-formed relations, i.e., tables that have the following characteristics:
1. Each table represents a single subject. For example, a course table will contain only data that directly pertain to courses. Similarly, a student table will contain only student data.
2. No data item will be unnecessarily stored in more than one table. The reason for this requirement is to ensure that the data are updated in only one place.
3. All nonprime attributes in a table are dependent on the primary key. The reason for this requirement is to ensure that the data are uniquely identifiable by a primary key value.
4. Each table is void of insertion, update, and deletion anomalies. This is to ensure the integrity and consistency of the data.
The most common normal forms and their basic characteristics are listed in the table below.

NORMAL FORM                        CHARACTERISTICS
First Normal Form (1NF)            Table format, no repeating groups, and PK identified
Second Normal Form (2NF)           1NF and no partial dependencies
Third Normal Form (3NF)            2NF and no transitive dependencies
Boyce-Codd Normal Form (BCNF)      Every determinant is a candidate key (special case of 3NF)
Fourth Normal Form (4NF)           3NF and no independent multi-valued dependencies

Functional Dependency: The attribute B is functionally dependent on the attribute A if each value of A determines one and only one value of B.
Example: PROJ_NUM → PROJ_NAME (read as "PROJ_NUM functionally determines PROJ_NAME"). In this case, the attribute PROJ_NUM is known as the determinant attribute and the attribute PROJ_NAME is known as the dependent attribute.
Fully Functional Dependency: If the attribute B is functionally dependent on a composite key A but not on any subset of that composite key, the attribute B is fully functionally dependent on A.
FIRST NORMAL FORM (1NF): A relation is in 1NF if it contains no multivalued attributes, i.e., if repeating groups have been eliminated from the table. Equivalently, a relation is in 1NF if all the attributes come from domains with only atomic values. Thus a table that contains multivalued attributes or repeating groups is not a relation.
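The definitions of functional dependency and fully functional dependency given in this section can be tried out on a handful of rows. The following is a minimal sketch, not part of the original notes: the function names `determines` and `fully_determines` are hypothetical, and the sample rows (including the project numbers for Amber Wave and Star Flight and most HOURS values) are made up for illustration. Note that sample data can only refute a dependency; it cannot prove that the dependency holds for every possible row.

```python
from itertools import combinations

# Illustrative sample rows (assumed values, loosely based on the
# construction-company example in these notes).
ROWS = [
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 101, "HOURS": 23.8},
    {"PROJ_NUM": 15, "PROJ_NAME": "Evergreen", "EMP_NUM": 105, "HOURS": 19.4},
    {"PROJ_NUM": 18, "PROJ_NAME": "Amber Wave", "EMP_NUM": 112, "HOURS": 24.6},
    {"PROJ_NUM": 25, "PROJ_NAME": "Star Flight", "EMP_NUM": 112, "HOURS": 12.8},
]


def determines(rows, lhs, rhs):
    """True if every value of lhs maps to exactly one value of rhs in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


def fully_determines(rows, lhs, rhs):
    """True if lhs -> rhs holds and no proper subset of lhs determines rhs."""
    if not determines(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if determines(rows, subset, rhs):
                return False
    return True


# PROJ_NUM -> PROJ_NAME holds; PROJ_NUM -> HOURS does not.
name_by_project = determines(ROWS, ["PROJ_NUM"], ["PROJ_NAME"])
hours_by_project = determines(ROWS, ["PROJ_NUM"], ["HOURS"])

# HOURS needs the whole composite key; PROJ_NAME needs only part of it.
hours_full = fully_determines(ROWS, ["PROJ_NUM", "EMP_NUM"], ["HOURS"])
name_full = fully_determines(ROWS, ["PROJ_NUM", "EMP_NUM"], ["PROJ_NAME"])
```

In this sketch HOURS is fully functionally dependent on (PROJ_NUM, EMP_NUM), while PROJ_NAME is not, because PROJ_NUM alone already determines it.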
Example: The table below is not in 1NF. It contains multivalued attributes, so the relation is not in 1NF.
Table 1:
Normalizing the table structure will reduce the data redundancies.
Conversion to First Normal Form: The conversion to 1NF starts with a simple three-step procedure.
Step 1: Eliminate the Repeating Groups: To eliminate the repeating groups, eliminate nulls by making sure that each repeating group attribute contains an appropriate data value.
Table 2:
Table 2 is now in First Normal Form.
Step 2: Identify the Primary Key: In the above table, PROJ_NUM is not an adequate primary key because the project number does not uniquely identify a row. For example, the PROJ_NUM value 15 can identify any one of five employees. To maintain a proper primary key that will uniquely identify any attribute value, the new key must be composed of PROJ_NUM and EMP_NUM. For example, PROJ_NUM = 15 and EMP_NUM = 101 together uniquely identify the entries for the attributes PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR and HOURS: they must be Evergreen, John G. News, Database Engineer, $105.00, and 23.8.
Step 3: Identify All Dependencies: From the above example, the PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR and HOURS values are all dependent on, i.e., they are determined by, the combination of PROJ_NUM and EMP_NUM. That can be represented as:
PROJ_NUM, EMP_NUM → PROJ_NAME, EMP_NAME, JOB_CLASS, CHG_HOUR, HOURS
There are additional dependencies. For example, the project number identifies the project name, i.e., the project name is dependent on the project number:
PROJ_NUM → PROJ_NAME
Similarly,
EMP_NUM → EMP_NAME, JOB_CLASS, CHG_HOUR
All the dependencies for the above example can be represented in the figure below.
Figure 1NF
Note the following dependency diagram features:
1. The primary key attributes are bold and underlined.
2. The arrows above the attributes indicate all dependencies that are based on the primary key, i.e., in this example the attributes are dependent on the combination of PROJ_NUM and EMP_NUM.
3. The arrows below the dependency diagram indicate less desirable dependencies. Two types of such dependencies exist:
a. Partial Dependency: A dependency based on only a part of a composite primary key is called a partial dependency. For example, PROJ_NUM alone determines PROJ_NAME; that is, PROJ_NAME is dependent on only part of the primary key (PROJ_NUM, EMP_NUM).
b. Transitive Dependency: A transitive dependency is a dependency of one nonprime attribute on another nonprime attribute. For example, CHG_HOUR is dependent on JOB_CLASS. Because neither CHG_HOUR nor JOB_CLASS is a prime attribute, that is, neither attribute is at least part of a key, the condition is known as a transitive dependency.
NOTE: Transitive dependencies yield data anomalies.
SECOND NORMAL FORM: A relation is in 2NF if it is in 1NF and every non-key attribute is Fully Functionally Dependent (FFD) on the primary key, i.e., no non-key attribute is functionally dependent on part of the primary key.
CONVERSION TO SECOND NORMAL FORM: Converting to 2NF is done only when the 1NF table has a composite primary key. If the 1NF table has a single-attribute primary key, then the table is automatically in 2NF. The 1NF to 2NF conversion is simple, starting with the 1NF format shown in Figure 1NF above.
Step 1: Write Each Key Component on a Separate Line:
• Write each key component on a separate line, then write the original key on the last line.
• Example:
PROJ_NUM
EMP_NUM
PROJ_NUM EMP_NUM
• Each component will become the key in a new table, i.e., the original table is now divided into three tables (PROJECT, EMPLOYEE and ASSIGNMENT).
Step 2: Assign Corresponding Dependent Attributes: Determine those attributes that are dependent on other attributes. The three new tables (PROJECT, EMPLOYEE, and ASSIGNMENT) are described by the following relational schemas:
PROJECT(PROJ_NUM, PROJ_NAME)
EMPLOYEE(EMP_NUM, EMP_NAME, JOB_CLASS, CHG_HOUR)
ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
The results of Step 1 and Step 2 are shown in the figure below. Most of the anomalies have now been eliminated. For example, if we want to add, change or delete a PROJECT record, we need to go only to the PROJECT table and make the change to only one record.
FIGURE 2NF
NOTE: A partial dependency can exist only when a table's primary key is composed of several attributes. A table whose primary key consists of only a single attribute is automatically in 2NF once it is in 1NF.
THIRD NORMAL FORM: A relation is in 3NF if and only if it is in 2NF and no transitive dependencies exist. A transitive dependency is a functional dependency between two non-key attributes.
CONVERSION TO THIRD NORMAL FORM: The 2NF to 3NF conversion is simple, starting with the 2NF format shown in Figure 2NF above. The anomalies that remain in the database organization shown in Figure 2NF are easily eliminated by completing the following three steps:
Step 1: Identify Each New Determinant:
• For every transitive dependency, write its determinant as a PK for a new table.
• A determinant is any attribute whose value determines other values within the row.
Figure 2NF shows only one table that contains a transitive dependency. Therefore, write the determinant for this transitive dependency as:
JOB_CLASS
Step 2: Identify the Dependent Attributes: Identify the attributes that are dependent on each determinant identified in Step 1 and identify the dependency. In this example:
JOB_CLASS → CHG_HOUR
Name the table to reflect its contents and function. In this case the table name is JOB.
Step 3: Remove the Dependent Attribute from Transitive Dependencies: Eliminate all dependent attributes in the transitive relationships from each of the tables that have such a transitive relationship. In this example, eliminate CHG_HOUR from the EMPLOYEE table, i.e.,
EMP_NUM → EMP_NAME, JOB_CLASS
JOB_CLASS remains in the EMPLOYEE table to serve as a foreign key (FK).
After the 3NF conversion has been completed, the database contains four tables:
PROJECT(PROJ_NUM, PROJ_NAME)
EMPLOYEE(EMP_NUM, EMP_NAME, JOB_CLASS)
JOB(JOB_CLASS, CHG_HOUR)
ASSIGNMENT(PROJ_NUM, EMP_NUM, ASSIGN_HOURS)
IMPROVING THE DESIGN
Normalization cannot, by itself, be relied on to make good designs. Instead, normalization is valuable because its use helps eliminate data redundancies.
Evaluate PK Assignments: Each time a new employee is entered into the EMPLOYEE table, a JOB_CLASS value must be entered. Unfortunately, it is too easy to make data-entry errors that lead to referential integrity violations.
• For example, entering DB Designer instead of Database Designer for the JOB_CLASS attribute in the EMPLOYEE table will trigger such a violation. Therefore, it would be better to add a JOB_CODE attribute to create a unique identifier.
Evaluate Naming Conventions: It is best to adhere to the naming conventions. For example, CHG_HOUR will be changed to JOB_CHG_HOUR to indicate its association with the JOB table.
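The four tables listed after the 3NF conversion can be sketched as an actual schema. The following is a minimal illustration using Python's built-in `sqlite3` module, not the author's implementation: the column types, the sample rows, the hourly rates, and the helper name `try_insert_employee` are assumptions made for this sketch, and it uses the table layout as it stands right after the 3NF conversion (JOB_CLASS as the JOB key, CHG_HOUR not yet renamed).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE PROJECT (
    PROJ_NUM  INTEGER PRIMARY KEY,
    PROJ_NAME TEXT NOT NULL
);
CREATE TABLE JOB (
    JOB_CLASS TEXT PRIMARY KEY,
    CHG_HOUR  REAL NOT NULL
);
CREATE TABLE EMPLOYEE (
    EMP_NUM   INTEGER PRIMARY KEY,
    EMP_NAME  TEXT NOT NULL,
    JOB_CLASS TEXT NOT NULL REFERENCES JOB (JOB_CLASS)
);
CREATE TABLE ASSIGNMENT (
    PROJ_NUM     INTEGER NOT NULL REFERENCES PROJECT (PROJ_NUM),
    EMP_NUM      INTEGER NOT NULL REFERENCES EMPLOYEE (EMP_NUM),
    ASSIGN_HOURS REAL NOT NULL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);
""")

# Illustrative rows (assumed values).
conn.execute("INSERT INTO JOB VALUES ('Database Designer', 105.00)")
conn.execute(
    "INSERT INTO EMPLOYEE VALUES (101, 'John G. News', 'Database Designer')"
)


def try_insert_employee(emp_num, emp_name, job_class):
    """Insert an employee; return False if the FK to JOB rejects the row."""
    try:
        conn.execute(
            "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
            (emp_num, emp_name, job_class),
        )
        return True
    except sqlite3.IntegrityError:
        return False


# A misspelled JOB_CLASS has no matching JOB row, so the insert is rejected.
misspelled_accepted = try_insert_employee(102, "Hypothetical Employee", "DB Designer")

# The hourly charge now lives in one place: one UPDATE changes it for everyone.
conn.execute("UPDATE JOB SET CHG_HOUR = 110.00 WHERE JOB_CLASS = 'Database Designer'")
rate_for_101 = conn.execute(
    "SELECT J.CHG_HOUR FROM EMPLOYEE E "
    "JOIN JOB J ON J.JOB_CLASS = E.JOB_CLASS WHERE E.EMP_NUM = 101"
).fetchone()[0]
```

With the foreign key in place, the "DB Designer" entry is refused rather than silently stored, which is the referential integrity violation described under Evaluate PK Assignments.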
Refine Attribute Atomicity: It is generally good practice to pay attention to the atomicity requirement. An atomic attribute is one that cannot be further subdivided. Such an attribute is said to exhibit atomicity. For example, the use of EMP_NAME in the EMPLOYEE table is clearly not atomic, because EMP_NAME can be subdivided into a last name, a first name, and an initial. In general, designers prefer to use simple, single-valued attributes as indicated by the business rules and processing requirements.
Identify New Attributes: If the EMPLOYEE table were used in a real-world environment, several other attributes would have to be added. For example, gross salary payments, social security payments, medical payments, etc. would be desirable. The same principle must be applied to all other tables in the design.
Identify New Relationships: The designer must take care to place the right attributes in the right table by using normalization principles.
Refine Primary Key as Required for Data Granularity: Granularity refers to the level of detail represented by the values stored in a table's row. Data stored at their lowest level of granularity are said to be atomic data. For example, the ASSIGNMENT table uses the ASSIGN_HOURS attribute to represent the hours worked by a given employee on a given project. But does ASSIGN_HOURS represent an hourly total, daily total, weekly total, monthly total or yearly total? Clearly, ASSIGN_HOURS requires more careful definition.
Maintain Historical Accuracy: Writing the job charge per hour into the ASSIGNMENT table is crucial to maintaining the historical accuracy of the data in the ASSIGNMENT table. It would be appropriate to name this attribute ASSIGN_CHG_HOUR. Although this attribute would appear to have the same value as JOB_CHG_HOUR, that is true only if the JOB_CHG_HOUR value remains the same forever.
Evaluate Using Derived Attributes: We can use a derived attribute in the ASSIGNMENT table to store the actual charge made to a project. That derived attribute, to be named ASSIGN_CHARGE, is the result of multiplying ASSIGN_HOURS by ASSIGN_CHG_HOUR. Such derived attribute values can also be calculated when they are needed to write reports or invoices.
Surrogate Key Considerations: At the implementation level, a surrogate key is a system-defined attribute generally created and maintained via the DBMS. Usually, a surrogate key is numeric and its value is automatically incremented for each new row. For example, Microsoft Access uses an AutoNumber data type, Microsoft SQL Server uses an identity column, and Oracle uses a sequence object. The JOB_CODE attribute was designated to be the JOB table's primary key. However, JOB_CODE does not prevent duplicate entries from being made, as shown in the figure below.
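The point about surrogate keys and duplicate entries can be reproduced in a few lines. This is a hedged sketch using Python's built-in `sqlite3` module, with SQLite's `AUTOINCREMENT` standing in for the Access, SQL Server and Oracle mechanisms named above; the column types, the sample job description, the rate 35.75, and the index name `JOB_DESC_UQ` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE JOB (
    JOB_CODE        INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    JOB_DESCRIPTION TEXT NOT NULL,
    JOB_CHG_HOUR    REAL NOT NULL
)""")

INSERT_JOB = "INSERT INTO JOB (JOB_DESCRIPTION, JOB_CHG_HOUR) VALUES (?, ?)"

# The same job entered twice: each row gets a new JOB_CODE, so the primary
# key is satisfied even though the rows describe the same job.
conn.execute(INSERT_JOB, ("Programmer", 35.75))
conn.execute(INSERT_JOB, ("Programmer", 35.75))
duplicates = conn.execute(
    "SELECT COUNT(*) FROM JOB WHERE JOB_DESCRIPTION = 'Programmer'"
).fetchone()[0]

# Remove the duplicate, then add a unique index on the descriptive column.
conn.execute("DELETE FROM JOB WHERE JOB_CODE = 2")
conn.execute("CREATE UNIQUE INDEX JOB_DESC_UQ ON JOB (JOB_DESCRIPTION)")

try:
    conn.execute(INSERT_JOB, ("Programmer", 35.75))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Before the unique index exists, both "Programmer" rows are accepted; afterwards the second attempt raises an integrity error.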
The data entries in the above table are inappropriate because they duplicate existing records, yet there is no violation of either entity integrity or referential integrity. This "multiple duplicate records" problem was created when the JOB_CODE attribute was added as the PK. In any case, if JOB_CODE is to be the surrogate PK, we still must ensure the existence of unique values in JOB_DESCRIPTION through the use of a unique index.
Higher Normal Forms:
THE BOYCE-CODD NORMAL FORM (BCNF): When a relation has more than one candidate key, anomalies may be present even though the relation is in 3NF. A relation is said to be in BCNF if and only if every determinant in the table is a candidate key. That is, when a table contains only one candidate key, 3NF and BCNF are equivalent. For example, consider the following STUDENT_ADVISOR table.

SID    MAJOR          ADVISOR    MAJOR_GPA
100    Electronics    AAA        4.0
100    Computers      BBB
200    Physics        CCC
300    Computers      DDD
400    Electronics    AAA
300    Physics        CCC        3.9

In the above table, the primary key is SID + MAJOR. This reflects the constraint that a given STUDENT may have more than one MAJOR, and for each MAJOR a STUDENT has exactly one ADVISOR and GPA. The attributes ADVISOR and MAJOR_GPA are functionally dependent on the primary key. There is a second functional dependency, i.e., MAJOR is functionally dependent on ADVISOR, because each ADVISOR advises exactly one MAJOR. It is shown in the following figure.
Anomalies in the STUDENT_ADVISOR Table:
• Update Anomaly: To replace Physics advisor CCC by YYY, the change must be made in two rows in the table.
• Insertion Anomaly: To insert a row with the information that FFF advises in Computers, it is not possible until at least one student in Computers is assigned to advisor FFF.
• Deletion Anomaly: If student number 100 in Computers withdraws from college, we lose the information that BBB advises Computers.
• All the above anomalies result from the fact that there is a determinant (ADVISOR in this example) that is not a candidate key in the relation.
• A relation which is in 3NF can be converted into BCNF using a simple two-step process.
Step 1: Modify the relation by making the determinant a component of the primary key, as illustrated below.
Revised Table: SID, ADVISOR, MAJOR, MAJOR_GPA
Step 2: If we examine the above modified table, we will discover that the revised table has a partial functional dependency. So, as a second step, decompose the table to eliminate the partial functional dependency, as shown below:
(SID, ADVISOR, MAJOR_GPA)
(ADVISOR, MAJOR)
FOURTH NORMAL FORM (4NF): When a relation is in BCNF, there are no longer anomalies that result from functional dependencies. However, there may be some anomalies that result from multivalued dependencies.
Multi-valued Dependencies (MVD): This type of dependency exists when there are at least three attributes, for example A, B and C, in a relation. For each value of A there is a well-defined set of values of B and a well-defined set of values of C, but the B values and the C values are independent of each other.
Consider the following table:

OFFERING
COURSE        LECTURER     TEXTBOOK
Management    Pranav       Navathe
              Harsha       R.K.Taxali
              Sri chand
IT            Sashank      Navathe
                           Bala Guruswami

In this table the following assumptions hold:
• Each course has a well-defined set of lecturers.
• Each course has a well-defined set of textbooks that are used.
• The textbooks that are used for a given course are independent of the lecturers for that course.
The above table is converted into the OFFERING relation by filling the empty cells:

OFFERING
COURSE        LECTURER     TEXTBOOK
Management    Pranav       Navathe
Management    Pranav       R.K.Taxali
Management    Harsha       Navathe
Management    Harsha       R.K.Taxali
Management    Sri chand    Navathe
Management    Sri chand    R.K.Taxali
IT            Sashank      Navathe
IT            Sashank      Bala Guruswami

In the above relation the primary key is the combination (COURSE, LECTURER, TEXTBOOK).
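The independence assumption behind the filled-in OFFERING relation can be checked directly: if lecturers and textbooks are independent for each course, then OFFERING must equal the join, on COURSE, of its (COURSE, LECTURER) and (COURSE, TEXTBOOK) projections. The following is a minimal sketch in plain Python; the helper name `project` and the variable names are hypothetical, and the rows are the eight rows of the table above.

```python
# The eight rows of the filled-in OFFERING relation:
# (COURSE, LECTURER, TEXTBOOK)
OFFERING = {
    ("Management", "Pranav", "Navathe"),
    ("Management", "Pranav", "R.K.Taxali"),
    ("Management", "Harsha", "Navathe"),
    ("Management", "Harsha", "R.K.Taxali"),
    ("Management", "Sri chand", "Navathe"),
    ("Management", "Sri chand", "R.K.Taxali"),
    ("IT", "Sashank", "Navathe"),
    ("IT", "Sashank", "Bala Guruswami"),
}


def project(relation, positions):
    """Project a set of tuples onto the given column positions."""
    return {tuple(row[i] for i in positions) for row in relation}


course_lecturer = project(OFFERING, (0, 1))   # (COURSE, LECTURER)
course_textbook = project(OFFERING, (0, 2))   # (COURSE, TEXTBOOK)

# Natural join of the two projections on COURSE.
rejoined = {
    (course, lecturer, textbook)
    for (course, lecturer) in course_lecturer
    for (course2, textbook) in course_textbook
    if course == course2
}

# True exactly when lecturers and textbooks are independent per course.
independent = rejoined == OFFERING
```

Here the two projections hold four rows each, and joining them reproduces all eight OFFERING rows, so nothing is lost by storing the two smaller sets instead of the combined one.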
• The relation has no determinants other than the primary key. Hence the relation is in BCNF.
• The above relation nevertheless suffers from the following update anomaly. For example, if we want to add a third textbook, say by the author Yashwant, to the Management course, three rows must be added to the OFFERING relation, one for each lecturer.
• This type of dependency is known as a multi-valued dependency.
• To remove the multivalued dependency, the relation is decomposed into the following two relations:
LECTURER(COURSE, LECTURER)
TEXT(COURSE, TEXT)
• Now the relations are in 4NF.
NORMALIZATION AND DATABASE DESIGN: Normalization should be part of the design process. Therefore, make sure that proposed entities meet the required normal form before the table structures are created.
First, an ERD is created through an iterative process. We begin by identifying relevant entities, their attributes and their relationships. The ERD provides the big picture, or macro view, of an organization's data requirements and operations.
Second, normalization focuses on the characteristics of specific entities, i.e., normalization represents a micro view of the entities within the ERD.
To illustrate the proper role of normalization in the design process, let us examine the operations of the contracting company, starting from a simple description of the company's operations. For example, two entities and their attributes are defined as:
PROJECT(PROJ_NUM, PROJ_NAME)
EMPLOYEE(EMP_NUM, EMP_LNAME, EMP_FNAME, EMP_INITIAL, JOB_DESCRIPTION, JOB_CHG_HOUR)
PROJECT is in 3NF and needs no modification at this point. EMPLOYEE requires additional scrutiny. The JOB_DESCRIPTION attribute defines job classifications such as systems analyst, database designer, and programmer.
Initial Contracting Company ERD
Modified Contracting Company ERD
To represent the M:N relationship between EMPLOYEE and PROJECT, we might think that two 1:M relationships could be used: an employee can be assigned to many projects, and each project can have many employees assigned to it.
Incorrect M:N Relationship Representation
The final contracting company ERD is depicted below.
DENORMALIZATION: Denormalization is the process of combining two or more tables into a single table. In general, denormalization may partition a relation into several physical records, may combine attributes from several relations together into one physical record, or may do a combination of both. We can reduce the data in memory space or storage space.
The problem with normalization is that as tables are decomposed to conform to normalization requirements, the number of database tables expands. Therefore, in order to generate information, data must be put together from various tables. Joining a large number of tables takes additional input/output (I/O) operations and processing logic, thereby reducing the system speed.
The database design process could, in some cases, introduce some small degree of redundant data in the model. This, in effect, creates "denormalized" relations.
Common Denormalization Examples

Case: Redundant data
Example: Storing ZIP and CITY attributes in the CUSTOMER table when ZIP determines CITY.
Rationale and Controls:
• Avoid extra join operations.
• Program can validate city (drop-down box) based on the zip code.

Case: Derived data
Example: Storing STU_HRS and STU_CLASS (student classification) when STU_HRS determines STU_CLASS.
Rationale and Controls:
• Avoid extra join operations.
• Program can validate classification (lookup) based on the student hours.

Case: Pre-aggregated data (also derived data)
Example: Storing the student grade point average (STU_GPA) aggregate value in the STUDENT table when this can be calculated from the ENROLL and COURSE tables.
Rationale and Controls:
• Avoid extra join operations.
• Program computes the GPA every time a grade is entered or updated.
• STU_GPA can be updated only via an administrative routine.

Case: Information requirements
Example: Using a temporary denormalized table to hold report data. This is required when creating a tabular report in which the columns represent data that are stored in the table as rows.
Rationale and Controls:
• Impossible to generate the data required by the report using plain SQL.
• No need to maintain the table. The temporary table is deleted once the report is done.
• Processing speed is not an issue.

If there is enough storage space, the designer's choices could be narrowed down to:
• Store the data in a permanent denormalized table.
• Create a temporary denormalized table from the permanent normalized tables. The denormalized table exists only as long as it takes to generate the report.
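The second choice, a temporary denormalized table built from the permanent normalized tables, can be sketched as follows. This is an illustration only, using Python's built-in `sqlite3` module and the PROJECT, EMPLOYEE and ASSIGNMENT tables from earlier in these notes; the table name `REPORT`, the column types, and the single sample row are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE PROJECT  (PROJ_NUM INTEGER PRIMARY KEY, PROJ_NAME TEXT);
CREATE TABLE EMPLOYEE (EMP_NUM  INTEGER PRIMARY KEY, EMP_NAME  TEXT);
CREATE TABLE ASSIGNMENT (
    PROJ_NUM     INTEGER REFERENCES PROJECT (PROJ_NUM),
    EMP_NUM      INTEGER REFERENCES EMPLOYEE (EMP_NUM),
    ASSIGN_HOURS REAL,
    PRIMARY KEY (PROJ_NUM, EMP_NUM)
);
INSERT INTO PROJECT    VALUES (15, 'Evergreen');
INSERT INTO EMPLOYEE   VALUES (101, 'John G. News');
INSERT INTO ASSIGNMENT VALUES (15, 101, 23.8);
""")

# Build the denormalized report table once, by joining the normalized tables.
conn.execute("""
CREATE TEMP TABLE REPORT AS
SELECT P.PROJ_NAME, E.EMP_NAME, A.ASSIGN_HOURS
FROM ASSIGNMENT A
JOIN PROJECT  P ON P.PROJ_NUM = A.PROJ_NUM
JOIN EMPLOYEE E ON E.EMP_NUM  = A.EMP_NUM
""")

# The report reads from one table, with no joins at report time.
report_rows = conn.execute("SELECT * FROM REPORT").fetchall()

# The temporary table is dropped once the report is done.
conn.execute("DROP TABLE REPORT")
```

The permanent tables stay normalized, so updates still happen in one place; only the short-lived REPORT table carries the redundancy.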