You are on page 1of 54

Introduction to RDBMS

Day - 1

What is Data?
Data (plural of the word Datum) - “Data is a factual
information used as a basis for reasoning, discussion, or
calculation”
Data may be numerical data which may be integers or
floating point numbers, and non-numerical data such as
characters, date and etc. Data by itself normally doesn’t
have a meaning associated with it.
e.g:Krishnan
01-Jan-71
15-Jun-05
50000

Information
Related data is often called as information. Information will
always have a meaning and context attached to the data
element. Let us consider the same example that we gave
for data. When we add meaning and context to the data it
becomes information:




Employee Name: Krishnan
Date of Birth: 01-Jan-71
Data of Joining: 15-Jun-05
Salary: 50000
Department Number: 10

Database
A logically coherent collection of related data
(“information”) with inherent meaning, built for a certain
application, and representing a subset of the "real-world".
For e.g. a customer database in your bank, details of the
insurance policies that we hold etc.

Evolution of Databases

or reset a file Read. write or delete a record . close.File Systems (1950’s) Basic Constructs Sequential records A record contains sequential fields Relies on indexes for random access ISAM (Index Sequential Access Method) VSAM (Virtual Storage Access Method) Basic operation Open.

Disadvantages of File processing system Data redundancy and inconsistency Difficulty in accessing data Data isolation Integrity problems Atomicity problem Concurrent access anomalies Security problems .

Hierarchical Model (1960’s) Data are represented by collection of records and the relationship among data are represented by links. A link is an association between precisely two records. A record is a collection of fields (attributes). which can be viewed as pointers. . each of which contains only one data value. Records are organized as collection of rooted trees.

end The account record type can be defined as type account = record account number: string. type customer = record customer name: string. There are two record types. balance: integer. end . customer and account.… Hierarchical Model Again consider the database representing a customeraccount relationship in a banking system. customer city: string. customer street: string.

… Hierarchical Model .

Network Model (1960’s) Data are represented by collection of records and the relationship among data are represented by links. Records are organized as arbitrary graphs. each of which contains only one data Link is an association between precisely two records. Record is a collection of fields (attributes). . which can be viewed as pointers.

… Network Model consider a database representing a customer-account relationship in a banking system. end The account record type can be defined as type account = record account number: string. customer city: string. There are two record types. type customer = record customer name: string. customer and account. customer street: string. end . balance: integer.

… Network Model .

Relational Model (1980’s) Data is presented as a collection of relations Each relation is depicted as a table Columns are attributes Rows ("Tuples") represent entities Every table has one or more set of attributes that taken together as a "key" uniquely identifies each entity Customer Account emp_no name street city emp_no ac_no balance E1 Hayes Main harrison E1 A-102 400 E2 Jhonson Alma Palo alto E2 A-101 500 E3 Turner Putnam stamford E2 A-201 900 E3 A-304 300 .

… Relational Model Relationship is established with the help of keys emp_no name street city E1 Hayes Main harrison E2 Jhonson Alma Palo alto E3 Turner Putnam stamford emp_no ac_no balance E1 A-102 400 E2 A-101 500 E2 A-201 900 E3 A-304 300 .

and extract information from a database.Database Management System (DBMS) A database management system (DBMS) is software that allows databases to be defined. modify. . and management of data in a centralized area that can be shared by many users. constructed. storage. A collection of programs that enables you to store. and manipulated. The general purpose of a DBMS is to provide for the definition.

Purpose of a DBMS DBMS developed to handle the following difficulties: • • • • • • • Data redundancy and inconsistency Difficulty in accessing data Data isolation .multiple files and formats Integrity problems Atomicity of updates Concurrent access by multiple users Security problems .

Functionality of a DBMS Specifying the database structure data definition language Manipulation of the database query processing and query optimization Integrity enforcement integrity constraints Concurrent control multiple user environment Crash recovery Security and authorization .

DBMS Approach .

Data independence capability of changing a database scheme without having to change the scheme at the next higher level two level of data independence .

Data Abstraction Physical Level – describes how data are actually stored Logical Level – describes what data are stored View Level – describes only a part of the database useful for a particular user View 1 View 2 View n Logical level Physical level .

Database Languages data definition language (DDL) : specify the conceptual database scheme data manipulation language (DML): query language used to retrieve and update information in a database host language: a conventional high level language used to write application programs .

DBMS Structure .

Relations .

… Relations .

… Relations .

This can be accomplished through a combination of the table name. primary key. .Codd’s Rule for RDBMS Rule 1: The Information Rule All data should be presented to the user in table form Rule 2: Guaranteed Access Rule All data should be accessible without ambiguity. and column name.

which is distinct from an empty string or a number with a value of zero. Rule 4: Dynamic On-Line Catalog Based on the Relational Model A relational database must provide access to its structure through the same tools that are used to access the data. . this can't apply to primary keys. This is usually accomplished by storing the structure definition within special system tables.Rule 3: Systematic Treatment of Null Values A field should be allowed to remain empty. Of course. most database implementations support the concept of a not null field constraint that prevents null values in a specific table column. This involves the support of a null value. In addition.

Rule 6: View Updating Rule Data can be presented to the user in different logical combinations. Each view should support the same full range of data manipulation that direct-access to a table has available. . called views. and database transaction control. In practice. All commercial relational databases use forms of the standard SQL (Structured Query Language) as their supported comprehensive language. data manipulation. providing update and delete access to logical views is difficult and is not fully supported by any current database. data integrity.Rule 5: Comprehensive Data Sub-language Rule The database must support at least one clearly defined language that includes functionality for data definition.

.Rule 7: High-level Insert. Rule 8: Physical Data Independence The user is isolated from the physical method of storing and retrieving information from the database. update. Changes can be made to the underlying architecture ( hardware. This rule states that insert. and delete operations should be supported for any retrievable set rather than just for a single row in a single table. and Delete Data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. Update. disk storage methods ) without affecting how the user accesses it.

This rule is not fully implemented by most major vendors. Most databases rely on strong ties between the user view of the data and the actual structure of the underlying tables. At a minimum. all databases do preserve two constraints through SQL.No component of a primary key can have a null value -. any value in it must exist as a primary key in another table .If a foreign key is defined in one table. This rule is particularly difficult to satisfy.Rule 9: Logical Data Independence How a user views data should not change when the logical structure (tables structure) of the database changes. -. Rule 10: Integrity Independence The database language (like SQL) should support constraints on user input that maintain database integrity.

Rule 12: Non-subversion Rule There should be no way to modify the database structure other than through the multiple row database language (like SQL). A variety of reasons make this rule difficult to implement.Rule 11: Distribution Independence A user should be totally unaware of whether or not the database is distributed (whether parts of the database exist in multiple locations). Most databases today support administrative tools that allow some direct manipulation of the data Structure. . I will spend time addressing these reasons when we discuss distributed databases.

Functional dependency Given a relation R. also in R. (written X → Y) if and only if each X value is associated with at most one Y value. Customarily we call X the determinant set and Y the dependent attribute. a set of attributes X in R is said to functionally determine another attribute Y. .

then X → Y Augmentation: If X → Y. one can derive several properties of functional dependencies. then XZ → YZ Transitivity: If X → Y and Y → Z. Y. then X → Z . and Z are sets of attributes in a relation R.Properties of functional dependencies Given that X. Among the most important are Armstrong's axioms Reflexivity: If Y is a subset of X.

Closure of a set of Functional Dependencies Given a set F of functional dependencies. The set of all functional dependencies logically implied by F is the closure of F denoted by F+. there are certain other functional dependencies that are logically implied by F. F+ can be found by applying Armstrong’s Axioms: .

…Closure of a set of Functional Dependencies • • We can further simplify the computation of F+ by using additional rules: .

CG ---. I) F= { A ---.> B A ---. G.> C CG ---.> H (transitivity A ---.> C. B. H. CG ---.> H.…Closure of a set of Functional Dependencies(example) R=(A. B ---.> H CG ---. C.> H) AG ---.> I B ---.> I) .> B.> H } Some members of F+: A ---.> I (pseudotransitivity A ---.> I) CG ---.> HI (union CG ---.

NORMALIZATION .

100 S3 10 Paris p1 200 S4 20 London p4. Each supplier may provide many parts. The company creates a simple relational table to store this information that can be expressed in relational notation as: FIRST (s#.Normalization A company obtains parts from a number of suppliers.p5 100. status. qty) S# Status City P# qty S1 20 London p1.p2 200. p#. city.p2.50 S2 10 Paris p1. A city can have more than one supplier located there and each city has a status code associated with it. Each supplier is located in one city.50 .200.p3 100.

First Normal Form (1 NF) A relational table. is in first normal form if all values of the columns are atomic. they contain no repeating values . by definition. That is.

then six rows would have to be updated with this new information. . Redundancy causes what are called update anomalies. UPDATE. then not only is the information about quantity and part lost but also information about the supplier. information about the supplier's location and the location's status have to be repeated for every part supplied. If supplier s1 moved from London to New York.1 NF Anomalies contains redundant data. DELETE. If a row is deleted. • • • INSERT. For example. The fact that a certain supplier (s5) is located in a particular city (Athens) cannot be added until they supplied a part.

.Second Normal Form (2 NF) A relational table is in second normal form 2NF if it is in 1NF and every non-key column is fully dependent upon the primary key.

status city —> status (s#.…2 NF • functional dependencies in the “First” table: s# —> city.p#) —>qty .

…2 NF The process for transforming a 1NF table to 2NF is: 1. and the columns they determine. 3. The original table may be renamed to maintain semantic meaning. 2. 5. The determinate becomes the primary key of the new table. 4. . Delete the columns you just moved from the original table except for the determinate which will serve as a foreign key. Move the determined columns from the original table to the new table. Identify any determinants other than the composite key. Create and name a new table for each determinant and the unique columns it determines.

and city to a new table called SECOND. status. The column s# becomes the primary key of this new table .…2 NF To transform FIRST into 2NF we move the columns s#.

Deleting any row in SUPPLIER destroys the status information about the city as well as the association between supplier and city.2 NF Anomalies Tables in 2NF but not in 3NF still contain modification anomalies. • DELETE. they are: • INSERT. . The fact that a particular city has a certain status (Rome has a status of 50) cannot be inserted until there is a supplier in the city. In the example of SECOND.

In other words. . all nonkey attributes are functionally dependent only upon the primary key.Third Normal Form (3 NF) A relational table is in third normal form (3NF) if it is already in 2NF and every non-key column is non transitively dependent upon its primary key.

• SUPPLIER is in 2NF but not in 3NF because it contains the following transitive dependency. qty. p#). is fully dependent upon the primary key (s#.…3 NF • Table PARTS is already in 3NF. The non-key column. s# —> status s# —> city city —> status .

. and the columns they determine. Move the determined columns from the original table to the new table. 2. 4. Identify any determinants. The original table may be renamed to maintain semantic meaning. Create and name a new table for each determinant and the unique columns it determines. 5. 3.…3 NF The process of transforming a table into 3NF is: 1. The determinate becomes the primary key of the new table. other the primary key. Delete the columns you just moved from the original table except for the determinate which will serve as a foreign key.

city is left behind to serve as a foreign key to CITY_STATUS.…3 NF create a new table called CITY_STATUS and move the columns city and status into it. and the original table is renamed to SUPPLIER_CITY to reflect its semantic meaning . Status is deleted from the original table.

status) Primary Key (city) .s# SUPPLIER_CITY(s#.#p) Foreign Key (s#) references SUPPLIER_CITY. p#. These can be represented in "psuedo-SQL" as: PARTS (#s. city) Primary Key (s#) Foreign Key (city) references CITY_STATUS.city CITY_STATUS (city. qty) Primary Key (s#.… 3 NF The results of putting the original table into 3NF has created three tables.

UPDATE.Advantages of Third Normal Form The advantage of having relational tables in 3NF is that it eliminates redundant data which in turn saves space and reduces manipulation anomalies. . can be added even though there is not supplier in that city. Rome has a status of 50. Changing the location of a supplier or the status of a city requires modifying only one row. For example. • DELETE. facts about new suppliers can be added even though they have not yet supplied parts. the improvements to our sample database are: • INSERT. Likewise. Information about parts supplied can be deleted without destroying information about a supplier or a city. Facts about the status of a city.

• Boyce/Codd Normal Form: a relation is in BCNF if and only if every determinant is a candidate key.Boyce/Codd Normal Form • Determinant: an attribute or a group of attributes on which some other attribute is fully functionally dependent. .

igate.www.com .