
RDBMS CONCEPTS

What is a Database?

A database is a collection of information, preferably related information and preferably organized.


- A database is a structured object.
- The structured object consists of data and metadata.
- The metadata is the structural part (the table definitions).
- The data is the actual stored descriptive information.


NAME     EDUCATION   % OF MARKS
SMITH    M Tech      98
JONES    B Tech      89
MILLER   PhD         87
...      ...         ...

What is a DBMS (Database Management System)?

A DBMS is a collection of programs that enables you to store, modify, and extract information from a database. The general purpose of a DBMS is to provide for the definition, storage, and management of data in a centralized area that can be shared by many users.

Role of DBMS
Data accessibility: It must ensure that each piece of data is available to all the users who need it.

Data security: It must protect the data and ensure that it is not lost or damaged.

Data privacy or data confidentiality: It must protect the data from access and change by unauthorized users.

Data integrity: It must also guarantee that the data is consistent. This means that we must be confident that the data is accurate and reliable, and we must be able to insist on certain values for the data. For example:
- every employee must have a name (NOT NULL)
- every row in a table must be unique
- anyone who is a manager must be an employee (foreign key)
- a student's date of birth must be a valid date (30 February 1973 would be invalid)

Data concurrency: On a system where several users may be handling the same data at the same time, the DBMS must also ensure that there is no confusion when two or more users try to update the same record at the same time.

Data independence: For programmers who are writing programs for use on a DBMS, the DBMS should make sure that:
- individual users and programmers do not have to worry about how the data is held
- nor do they need to know exactly where the data is held
- nor are they concerned with pieces of data which are not used by their particular program
When these three features are available, the DBMS is said to offer data independence or data transparency.

Distributed database: It would be useful if the DBMS supported networking, so that a database stored across the network can be accessed.
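The data integrity examples listed above can be tried out in a small sketch. It uses Python's built-in sqlite3 module purely as a convenient stand-in for a DBMS; the employee table, its column names and the helper functions are assumptions made for this illustration, not part of the original material.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("""
    CREATE TABLE employee (
        empno   INTEGER PRIMARY KEY,                -- every row must be unique
        name    TEXT NOT NULL,                      -- every employee must have a name
        manager INTEGER REFERENCES employee(empno)  -- a manager must be an employee
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 'KING', NULL)")

def try_insert(row):
    """Return True if the DBMS accepts the row, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

ok_row = try_insert((2, "JONES", 1))        # valid row
no_name = try_insert((3, None, 1))          # violates NOT NULL
duplicate = try_insert((2, "BLAKE", 1))     # violates uniqueness of the key
bad_manager = try_insert((4, "SCOTT", 99))  # manager 99 is not an employee

def is_valid_date(year, month, day):
    """Return True only for dates that exist in the calendar."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False

print(ok_row, no_name, duplicate, bad_manager)
print(is_valid_date(1973, 2, 30))
```

Only the first insert is accepted; the other three are rejected by the DBMS itself, so no application program has to repeat these checks.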

Data Model

A data model is a model that describes in an abstract way how data is represented in an information system or a database management system (DBMS).
The evolution of database modeling techniques

Relational Database Model

The relational database model relaxes the restrictions of a hierarchical structure without completely abandoning the hierarchy of data, as shown in the figure. Any table can be accessed directly without having to access all parent objects. The trick is to know what to look for: if you want to find the address of a specific employee, you have to know which employee to look for, or you can simply examine all employees.

You don't have to search the entire hierarchy, from the company downward, to find a single employee.

Another benefit of the relational database model is that any tables can be linked together, regardless of their hierarchical position.

Relational Database Management System RDBMS


An RDBMS is a term used to describe an entire suite of programs for both managing a relational database and communicating with the relational database engine. Sometimes Software Development Kit (SDK) front-end tools and complete management kits are included with relational database packages (e.g. MS Access). In other words, an RDBMS is both the database engine and any other tools that come with it.

- Relation / Table
- Attribute / Column / Field
- Tuple / Record / Row
- Domain

Relational Databases

A relational database is organized as a
- set of tables of data, and a
- mathematical language (relational algebra) for manipulating the tables.

The presentation of data in the form of a table is known as a relation.

A relation is made up of a number of tuples (rows) and attributes (columns).


The domain of an attribute is the range of values which can appear in its column. Example: if AGE will always be in the range 16 to 65, the domain of the AGE attribute is the integers in the range 16 to 65.
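One way a domain can be enforced is with a CHECK constraint. The sketch below uses Python's sqlite3 module; the person table and the accepts helper are hypothetical names chosen for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name TEXT NOT NULL,
        age  INTEGER NOT NULL CHECK (age BETWEEN 16 AND 65)  -- the domain of AGE
    )
""")

def accepts(age):
    """Return True if the value lies inside the domain of the AGE attribute."""
    try:
        conn.execute("INSERT INTO person VALUES ('TEST', ?)", (age,))
        return True
    except sqlite3.IntegrityError:
        return False

results = {age: accepts(age) for age in (15, 16, 40, 65, 66)}
print(results)
```

Values at the two ends of the range (16 and 65) are accepted, while 15 and 66 fall outside the domain and are rejected.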

Relational algebra

In addition to establishing and maintaining the individual tuples, there are also facilities for manipulating entire relations.
These operations form what is known as relational algebra. Typically, the operations of relational algebra allow us to:
- SELECT: extract tuples to form a new relation
- PROJECT: extract attributes to form a new relation
- JOIN: add attributes from one relation to another relation
- PRODUCT: combine all the tuples of one relation with all those of another relation
- UNION: combine the tuples of two relations to form a new relation
- INTERSECT: form a new relation from the tuples that appear in both of two relations
The formal language of relational algebra is mathematical in form and differs from that of the commercial RDBMSs.
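To make the six operations concrete, here is a minimal sketch in plain Python in which a relation is modelled as a list of dictionaries, one per tuple. This representation, the function names and the hostels relation are assumptions made for illustration; the student rows reuse the sample table from the start of these notes.

```python
def select(relation, predicate):
    """SELECT: keep the tuples that satisfy the predicate."""
    return [row for row in relation if predicate(row)]

def project(relation, attributes):
    """PROJECT: keep the named attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

def product(r, s):
    """PRODUCT: pair every tuple of r with every tuple of s (disjoint attribute names assumed)."""
    return [{**a, **b} for a in r for b in s]

def join(r, s):
    """JOIN (natural join): combine tuples that agree on all shared attributes."""
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in set(a) & set(b))]

def union(r, s):
    """UNION: tuples found in r or in s, without duplicates."""
    result = list(r)
    result.extend(row for row in s if row not in result)
    return result

def intersect(r, s):
    """INTERSECT: tuples found in both r and s."""
    return [row for row in r if row in s]

students = [
    {"name": "SMITH", "education": "M Tech", "marks": 98},
    {"name": "JONES", "education": "B Tech", "marks": 89},
    {"name": "MILLER", "education": "PhD", "marks": 87},
]
hostels = [{"name": "SMITH", "hostel": "A"}, {"name": "MILLER", "hostel": "B"}]  # hypothetical

high = select(students, lambda row: row["marks"] > 88)  # SMITH, JONES
low = select(students, lambda row: row["marks"] < 90)   # JONES, MILLER
names = project(students, ["name"])
housed = join(students, hostels)
pairs = product(names, [{"grade": "A"}, {"grade": "B"}])
everyone = union(high, low)
both = intersect(high, low)
print([row["name"] for row in both])
```

Every operation takes relations and returns a new relation, which is why the operations can be freely combined.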

Database Model Design


Design is the process of ensuring that it all works without actually building it. Design is a little like testing something on paper before spending thousands of hours building it in possibly the wrong way.

Data structure diagram

A useful way in which we can depict the entities and the relationships between them is by means of a data structure diagram or DSD. It is a picture of the data encountered within a computer system.

The data structure diagram is also known as a data model or a logical model or an entity-relationship model.

A data structure diagram

Entities: TUTOR --(teaches on)--> COURSE --(attended by)--> STUDENT

This DSD tells us that we need to hold data about:
- the tutors
- the courses, and
- the students

The pieces of information which we need to know about each entity are its attributes.

A data structure diagram showing attributes

TUTOR: Tutor number, Name, Address, Subjects taught
    teaches on
COURSE: Title, Examinations available
    attended by
STUDENT: Student enrolment number, Name, Address, Telephone number, Subjects studied

Relationships

An entity does not exist in isolation, but is associated with other entities by means of a relationship.
On a DSD the entities are linked, one to another, by lines representing the relationships. The relationships in our DSD are:
- teaches on: one tutor teaches on many courses.
- is attended by: each course is attended by many students.


Note that the arrowed lines point from the one end to the many end: from the one tutor to the many courses, and from the one course to the many students.

Types of relationship

1. One-to-one relationship
2. One-to-many relationship
3. Many-to-many relationship

One-to-One Relationship

Better transformation of one-to-one relationship

One to Many Relationship

Correct transformation of one-to-many relationship

Many to Many Relationship

Transformation of a many-to-many relationship into two one-to-many relationships
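The transformation of a many-to-many relationship can be sketched with a linking table. The example below assumes that a student may attend many courses and a course may be attended by many students; the enrolment table, the column names and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (student_no INTEGER PRIMARY KEY, name  TEXT NOT NULL);
    CREATE TABLE course  (course_no  INTEGER PRIMARY KEY, title TEXT NOT NULL);

    -- The linking table: one student has many enrolments and
    -- one course has many enrolments (two one-to-many relationships).
    CREATE TABLE enrolment (
        student_no INTEGER NOT NULL REFERENCES student(student_no),
        course_no  INTEGER NOT NULL REFERENCES course(course_no),
        PRIMARY KEY (student_no, course_no)
    );

    INSERT INTO student   VALUES (1, 'SMITH'), (2, 'JONES');
    INSERT INTO course    VALUES (10, 'Databases'), (20, 'Networks');
    INSERT INTO enrolment VALUES (1, 10), (1, 20), (2, 10);
""")

courses_of_smith = [row[0] for row in conn.execute("""
    SELECT c.title
    FROM enrolment e JOIN course c ON c.course_no = e.course_no
    WHERE e.student_no = 1
    ORDER BY c.title
""")]

students_on_databases = [row[0] for row in conn.execute("""
    SELECT s.name
    FROM enrolment e JOIN student s ON s.student_no = e.student_no
    WHERE e.course_no = 10
    ORDER BY s.name
""")]

print(courses_of_smith, students_on_databases)
```

The composite primary key on the linking table prevents the same student from being enrolled on the same course twice.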

Navigating through a data structure diagram


In any data structure diagram, it should be possible conceptually to navigate from entity to entity by way of the relationships. By doing this, we shall be creating paths by which the DBMS or our programs can access the data and use these paths to answer enquiries about the data.

COMPANY
cno   cname   street       city
C1    IGATE   GUINDY       CHENNAI
C2    IBM     BGATTA Rd    BLORE
C3    ACCEN   TIDEL PARK   CHENNAI

ORDER
ord_no   cno   ord_date    ship_date
101      C1    12-FEB-87   15-FEB-87
102      C3    25-JUN-90   02-JUL-90
103      C1    29-AUG-92   05-SEP-92

PRODUCT
prodid   prodname       sellprice   costprice
P1       Mouse          350         300
P2       Keyboard       400         360
P3       CPU            4900        4200
P4       64MB RAM       550         520
P5       128MB RAM      1000        930
P6       Mother Board   5400        4900

LINEITEM
ord_no   lineno   prodid   qty   sellprice
101      1        P1       2     350
101      2        P2       3     400
101      3        P5       1     1000
102      1        P2       3     400
103      1        P3       1     4900
103      2        P6       1     5400
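Navigation through these four tables can be sketched as a chain of joins, again using Python's sqlite3 module. The sketch assumes that the ord_no column (101, 102, 103) belongs to the ORDER table, and it names that table orders because ORDER is a reserved word in SQL; the products_ordered_by function is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE company  (cno TEXT PRIMARY KEY, cname TEXT, street TEXT, city TEXT);
    CREATE TABLE orders   (ord_no INTEGER PRIMARY KEY, cno TEXT REFERENCES company(cno),
                           ord_date TEXT, ship_date TEXT);
    CREATE TABLE product  (prodid TEXT PRIMARY KEY, prodname TEXT,
                           sellprice INTEGER, costprice INTEGER);
    CREATE TABLE lineitem (ord_no INTEGER REFERENCES orders(ord_no), lineno INTEGER,
                           prodid TEXT REFERENCES product(prodid), qty INTEGER,
                           sellprice INTEGER, PRIMARY KEY (ord_no, lineno));

    INSERT INTO company VALUES ('C1', 'IGATE', 'GUINDY', 'CHENNAI'),
                               ('C2', 'IBM', 'BGATTA Rd', 'BLORE'),
                               ('C3', 'ACCEN', 'TIDEL PARK', 'CHENNAI');
    INSERT INTO orders VALUES (101, 'C1', '12-FEB-87', '15-FEB-87'),
                              (102, 'C3', '25-JUN-90', '02-JUL-90'),
                              (103, 'C1', '29-AUG-92', '05-SEP-92');
    INSERT INTO product VALUES ('P1', 'Mouse', 350, 300), ('P2', 'Keyboard', 400, 360),
                               ('P3', 'CPU', 4900, 4200), ('P4', '64MB RAM', 550, 520),
                               ('P5', '128MB RAM', 1000, 930),
                               ('P6', 'Mother Board', 5400, 4900);
    INSERT INTO lineitem VALUES (101, 1, 'P1', 2, 350), (101, 2, 'P2', 3, 400),
                                (101, 3, 'P5', 1, 1000), (102, 1, 'P2', 3, 400),
                                (103, 1, 'P3', 1, 4900), (103, 2, 'P6', 1, 5400);
""")

def products_ordered_by(cname):
    """Navigate COMPANY -> ORDER -> LINEITEM -> PRODUCT for one company."""
    return conn.execute("""
        SELECT o.ord_no, p.prodname, l.qty
        FROM company c
        JOIN orders   o ON o.cno = c.cno
        JOIN lineitem l ON l.ord_no = o.ord_no
        JOIN product  p ON p.prodid = l.prodid
        WHERE c.cname = ?
        ORDER BY o.ord_no, l.lineno
    """, (cname,)).fetchall()

igate_items = products_ordered_by("IGATE")
print(igate_items)
```

Each JOIN follows one relationship line of the diagram, so the enquiry "which products did IGATE order?" is answered by walking from entity to entity.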

Data Modeling Continued

When talking about data in a database, it is common to look at the data in three different ways:
- External or user views: the view of the data that each person has for their functional area.
- Logical Data Model or Conceptual Model: the entire collection of user views for each functional area. In other words, it is the enterprise's overall view of the database.
- Physical or Internal view: the structure used to physically implement the logical data model on a physical medium.

What is a Key field in a Database?

A key is the smallest subset of attributes from the relation such that the key is unique for each tuple.
A key functionally determines a tuple (row).

The selection of keys will depend on the particular application being considered.
Keys are crucial to a table structure for many reasons, some of which are identified below:
- They ensure that each record in a table is precisely identified.
- They help establish and enforce various types of integrity.
- They serve to establish table relationships.
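The idea of a key as a smallest unique subset of attributes can be explored with a short sketch over the LINEITEM sample rows shown earlier. The function names are hypothetical, and the search is a brute-force illustration rather than a design method.

```python
from itertools import combinations

ATTRS = ("ord_no", "lineno", "prodid", "qty", "sellprice")
LINEITEM = [
    (101, 1, "P1", 2, 350),
    (101, 2, "P2", 3, 400),
    (101, 3, "P5", 1, 1000),
    (102, 1, "P2", 3, 400),
    (103, 1, "P3", 1, 4900),
    (103, 2, "P6", 1, 5400),
]

def is_unique(rows, attrs, subset):
    """True if no two rows share the same values for the attributes in subset."""
    positions = [attrs.index(a) for a in subset]
    values = {tuple(row[i] for i in positions) for row in rows}
    return len(values) == len(rows)

def minimal_unique_sets(rows, attrs):
    """List the attribute subsets that are unique and contain no smaller unique subset."""
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if any(set(smaller) <= set(subset) for smaller in found):
                continue  # not minimal: a smaller unique subset is already inside it
            if is_unique(rows, attrs, subset):
                found.append(subset)
    return found

candidates = minimal_unique_sets(LINEITEM, ATTRS)
print(candidates)
```

No single attribute is unique in this sample, while (ord_no, lineno) is. The search also reports other pairs that happen to be unique in these six rows only by coincidence, which is why the choice of key depends on the meaning of the data in the application and not on the sample alone.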

Functional dependency (FD)

A Functional Dependency describes a relationship between attributes in a single relation.


An attribute is functionally dependent on another if we can use the value of one attribute to determine the value of the other.

Ex 1: Employee_Name is functionally dependent on Social_Security_Number because Social_Security_Number can be used to determine the value of Employee_Name.

Ex 2: If we know the value of the destination attribute, then we can find the corresponding value for the fare attribute. The fare attribute is therefore functionally dependent upon the destination attribute.

We use the symbol -> to indicate a functional dependency; -> is read as "functionally determines". Example: destination -> fare

The attributes listed on the left hand side of the -> are called determinants. One can read A -> B as, "A determines B". Not all determinants are keys.
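Whether a functional dependency holds in a given set of rows can be tested mechanically. The sketch below is a minimal check under assumed data: the tickets relation, its values and the holds function are hypothetical and only mirror the destination -> fare example.

```python
def holds(rows, lhs, rhs):
    """True if lhs -> rhs holds: equal lhs values always come with equal rhs values."""
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for this lhs value and compare later ones.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True

tickets = [  # hypothetical sample rows
    {"passenger": "SMITH", "destination": "CHENNAI", "fare": 350},
    {"passenger": "JONES", "destination": "BLORE", "fare": 400},
    {"passenger": "MILLER", "destination": "CHENNAI", "fare": 350},
    {"passenger": "ADAMS", "destination": "DELHI", "fare": 400},
]

dest_determines_fare = holds(tickets, ["destination"], ["fare"])
fare_determines_dest = holds(tickets, ["fare"], ["destination"])
print(dest_determines_fare, fare_determines_dest)
```

Here destination -> fare holds, but fare -> destination does not, because the fare 400 occurs with two destinations. Note also that destination is a determinant without being a key: CHENNAI appears in two rows.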

E F CODD's RULES

Codd presented twelve rules that a database must obey if it is to be considered truly relational.
1. The information rule. All information in a relational database is represented explicitly at the logical level and in exactly one way by values in tables.

Rule 1 is basically the informal definition of a relational database.

2. Guaranteed access rule. Each and every datum (atomic value) in a relational database is guaranteed to be logically accessible by resorting to a combination of table name, primary key value, and column name.

Rule 2 stresses the importance of primary keys for locating data in the database. The table name locates the correct table, the column name finds the correct column, and the primary key value finds the row containing an individual data item of interest.

3. Systematic treatment of null values. Null values are supported in a fully relational DBMS for representing missing information and inapplicable information in a systematic way, independent of the data type.

Rule 3 requires support for missing data through NULL values.

4. Dynamic online catalog based on the relational model. The database description is represented at the logical level in the same way as ordinary data, so that authorized users can apply the same relational language to its interrogation as they apply to the regular data.
Rule 4 requires that a relational database be self-describing. In other words, the database must contain certain system tables whose columns describe the structure of the database itself.

5. Comprehensive data sublanguage rule. A relational system may support several languages and various modes of terminal use (for example, the fill-in-the-blanks mode). However, there must be at least one language whose statements are expressible, per some well-defined syntax, as character strings, and that is comprehensive in supporting all of the following items:
- Data definition
- View definition
- Data manipulation (interactive and by program)
- Integrity constraints
- Authorization
- Transaction boundaries (begin, commit, and rollback)

Rule 5 mandates using a relational database language, such as SQL, although SQL is not specifically required. The language must be able to support all the central functions of a DBMS: creating a database, retrieving and entering data, implementing database security, and so on.

6. View updating rule. All views that are theoretically updateable are also updateable by the system.

Rule 6 deals with views, which are virtual tables used to give various users of a database different views of its structure. It is one of the most challenging rules to implement in practice, and no commercial product fully satisfies it today.

7. High-level insert, update, and delete. The capability of handling a base relation or a derived relation as a single operand applies not only to the retrieval of data but also to the insertion, update, and deletion of data.

Rule 7 stresses the set-oriented nature of a relational database. It requires that rows be treated as sets in insert, delete, and update operations. The rule is designed to prohibit implementations that only support row-at-a-time, navigational modification of the database.

8. Physical data independence. Application programs and terminal activities remain logically unimpaired whenever any changes are made in either storage representations or access methods.

9. Logical data independence. Application programs and terminal activities remain logically unimpaired when information-preserving changes of any kind that theoretically permit unimpairment are made to the base tables.

Rule 8 and Rule 9 insulate the user or application program from the low-level implementation of the database. They specify that specific access or storage techniques used by the DBMS, and even changes to the structure of the tables in the database, should not affect the user's ability to work with the data.

10. Integrity independence. Integrity constraints specific to a particular relational database must be definable in the relational data sublanguage and storable in the catalog, not in the application programs.

Rule 10 says that the database language should support integrity constraints that restrict the data that can be entered into the database and the database modifications that can be made. This is another of the rules that is not supported in most commercial DBMS products.

11. Distribution independence. A relational DBMS has distribution independence.

Rule 11 says that the database language must be able to manipulate distributed data located on other computer systems.

12. Nonsubversion rule. If a relational system has a low-level (single record at a time) language, that low level cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational language (multiple records at a time).

Rule 12 prevents "other paths" into the database that might subvert the relational structure and integrity.
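Rule 4 (the dynamic online catalog) is easy to observe in practice. As one illustration, SQLite keeps its schema in a table called sqlite_master that can be queried with ordinary SELECT statements; the dept table created here is only sample input for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT, loc TEXT)")

# The catalog is itself a table, so the same query language works on it.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per column: (cid, name, type, notnull, dflt_value, pk).
columns = [row[1] for row in conn.execute("PRAGMA table_info(dept)")]

print(tables, columns)
```

Other products expose their catalogs under different names, but the principle is the same: the description of the database is stored as data and queried like data.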

Data integrity
Data integrity means that the data values in the database are correct and consistent. Data integrity means, in part, that you can correctly and consistently navigate and manipulate the tables in the database. Data integrity is enforced in the relational model by entity and referential integrity rules.

Entity Integrity

The entity integrity rule states that for every instance of an entity, the value of the primary key must exist, be unique, and not be null. Without entity integrity, the primary key could not fulfill its role of uniquely identifying each instance of an entity.

Eg:
- EMPNO column is the primary key in the EMP table
- DEPTNO column is the primary key in the DEPT table

Referential Integrity

The referential integrity rule states that if a relational table has a foreign key, then every value of the foreign key must either be null or match a value of the primary key in the table that the foreign key refers to.
Referential integrity ensures that we can correctly navigate between related entities. Eg: the DEPTNO column in EMP is a FOREIGN KEY referring to the DEPTNO column in DEPT, where it is the PRIMARY KEY.
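The behaviour of the EMP and DEPT example under these rules can be sketched with Python's sqlite3 module. The tables are cut down to a few columns and the attempt helper is hypothetical. SQLite leaves foreign key enforcement switched off unless the pragma shown is set, and the delete behaviour shown is SQLite's default when no delete rule is declared.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enable referential integrity checks
conn.executescript("""
    CREATE TABLE dept (deptno INTEGER PRIMARY KEY, dname TEXT NOT NULL);
    CREATE TABLE emp (
        empno  INTEGER PRIMARY KEY,
        ename  TEXT NOT NULL,
        deptno INTEGER REFERENCES dept(deptno)   -- foreign key to the parent table
    );
    INSERT INTO dept VALUES (10, 'ACCOUNTING'), (20, 'RESEARCH');
    INSERT INTO emp  VALUES (7369, 'SMITH', 20);
""")

def attempt(sql):
    """Run one statement; return True if accepted, False if an integrity rule rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

null_fk = attempt("INSERT INTO emp VALUES (7839, 'KING', NULL)")    # null foreign key is allowed
bad_fk = attempt("INSERT INTO emp VALUES (7900, 'JAMES', 99)")      # no department 99
duplicate_pk = attempt("INSERT INTO emp VALUES (7369, 'CLONE', 20)")  # EMPNO must be unique
delete_used_parent = attempt("DELETE FROM dept WHERE deptno = 20")  # SMITH still refers to it
delete_unused_parent = attempt("DELETE FROM dept WHERE deptno = 10")

print(null_fk, bad_fk, duplicate_pk, delete_used_parent, delete_unused_parent)
```

A foreign key value is accepted only if it is null or matches an existing DEPTNO, and a department row cannot be deleted while an employee row still refers to it.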

Foreign key.

A foreign key creates a hierarchical relationship between two associated entities.


The entity containing the foreign key is the child, or dependent, and the table containing the primary key from which the foreign key values are obtained is the parent. In order to maintain referential integrity between the parent and child as data is inserted or deleted from the database certain insert and delete rules are to be considered.

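If the foreign key and the primary key it refers to are in the same table, the relationship is self-referential. For example, MGR in the EMP table refers to EMPNO in the same table.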

EMP table
EMPNO   ENAME    JOB         MGR    HIREDATE    SAL    COMM   DEPTNO
7369    SMITH    CLERK       7902   17-DEC-80   800           20
7499    ALLEN    SALESMAN    7698   20-FEB-81   1600   300    30
7521    WARD     SALESMAN    7698   22-FEB-81   1250   500    30
7566    JONES    MANAGER     7839   02-APR-81   2975          20
7654    MARTIN   SALESMAN    7698   28-SEP-81   1250   1400   30
7698    BLAKE    MANAGER     7839   01-MAY-81   2850          30
7782    CLARK    MANAGER     7839   09-JUN-81   2450          10
7788    SCOTT    ANALYST     7566   19-APR-87   3000          20
7839    KING     PRESIDENT          17-NOV-81   5000          10
7844    TURNER   SALESMAN    7698   08-SEP-81   1500   0      30
7876    ADAMS    CLERK       7788   23-MAY-87   1100          20
7900    JAMES    CLERK       7698   03-DEC-81   950           30
7902    FORD     ANALYST     7566   03-DEC-81   3000          20
7934    MILLER   CLERK       7782   23-JAN-82   1300          10

DEPT table
DEPTNO   DNAME        LOC
10       ACCOUNTING   NEW YORK
20       RESEARCH     DALLAS
30       SALES        CHICAGO
40       OPERATIONS   BOSTON

SALGRADE table
GRADE   LOSAL   HISAL
1       700     1200
2       1201    1400
3       1401    2000
4       2001    3000
5       3001    9999

Modification Anomalies

Once our E-R model has been converted into relations, we may find that some relations are not properly specified. There can be a number of problems:
- Deletion Anomaly: Deleting a tuple results in some related information (from another entity) being lost.
- Insertion Anomaly: Inserting a tuple requires that we have information from two or more entities, a situation that might not be feasible.

NORMALIZATION


Normalization is the process of analyzing the given relation schemas, based on their functional dependencies (FDs) and primary keys, to achieve the properties of:
- minimizing redundancy
- minimizing insertion, deletion and update anomalies.
Note: Do not include computed fields.

Normalization comprises a set of rules called normal forms, i.e. first normal form, second normal form, third normal form, fourth normal form, Boyce-Codd normal form, etc.

FIRST NORMAL FORM


Reduce entities to first normal form (1NF) by removing repeating or multivalued attributes to another, child entity

Relation in 1NF
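As a minimal sketch of the 1NF step, the code below moves a repeating attribute into a child entity. It borrows the "Subjects taught" attribute of TUTOR from the data structure diagram; the sample values and the function name are hypothetical.

```python
tutors = [  # unnormalized: subjects_taught holds a list of values
    {"tutor_no": 1, "name": "SMITH", "subjects_taught": ["Databases", "Networks"]},
    {"tutor_no": 2, "name": "JONES", "subjects_taught": ["Databases"]},
]

def to_first_normal_form(rows, key, repeating):
    """Split rows into a parent relation and a child relation holding the repeating attribute."""
    parent, child = [], []
    for row in rows:
        # The parent keeps every attribute except the repeating one.
        parent.append({k: v for k, v in row.items() if k != repeating})
        # The child gets one row per value, tied back to the parent by its key.
        for value in row[repeating]:
            child.append({key: row[key], repeating: value})
    return parent, child

tutor, tutor_subject = to_first_normal_form(tutors, "tutor_no", "subjects_taught")
print(tutor)
print(tutor_subject)
```

After the split every attribute holds a single value, and the child entity refers back to its tutor through tutor_no.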

Second Normal Form

Reduce first normal form entities to second normal form (2NF) by removing attributes that are not dependent on the whole primary key (partial dependency)

Relation in 2NF
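The 2NF step can be sketched in the same spirit. The relation ORDER_LINE(ord_no, prodid, prodname, qty) with key (ord_no, prodid) is a hypothetical unnormalized variant of the LINEITEM table, invented here so that prodname depends on only part of the key; the function names and the list of functional dependencies are likewise assumptions.

```python
KEY = ("ord_no", "prodid")
FDS = [  # (determinant, dependent attributes)
    (("ord_no", "prodid"), ("qty",)),
    (("prodid",), ("prodname",)),   # depends on part of the key only
]
order_line = [
    {"ord_no": 101, "prodid": "P1", "prodname": "Mouse", "qty": 2},
    {"ord_no": 101, "prodid": "P2", "prodname": "Keyboard", "qty": 3},
    {"ord_no": 102, "prodid": "P2", "prodname": "Keyboard", "qty": 3},
]

def partial_dependencies(key, fds):
    """Return the dependencies whose determinant is a proper subset of the key."""
    return [(lhs, rhs) for lhs, rhs in fds if set(lhs) < set(key)]

def project(rows, attributes):
    """Keep the named attributes and drop duplicate rows."""
    seen, result = set(), []
    for row in rows:
        values = tuple(row[a] for a in attributes)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result

partial = partial_dependencies(KEY, FDS)
moved = {attr for _, rhs in partial for attr in rhs}

# What stays behind: the key plus the attributes that depend on the whole key.
line_2nf = project(order_line, [a for a in order_line[0] if a not in moved])
# One new relation per partial dependency: its determinant plus its dependents.
split_off = [project(order_line, list(lhs) + list(rhs)) for lhs, rhs in partial]

print(line_2nf)
print(split_off)
```

The product name, which was stored once per order line, is now stored once per product.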

Transitive dependency

A transitive dependency exists when an attribute in a table depends on another non-primary-key attribute rather than directly on the primary key.
More formally, if A, B, and C are attributes of a relation such that A -> B and B -> C, then C is transitively dependent on A via B; that is, A -> C is a transitive dependency. This situation is eliminated when the table is in third normal form.

Third Normal Form

A relation is in third normal form (3NF) if it is in second normal form and it contains no transitive dependencies.
OR Reduce second normal form entities to third normal form (3NF) by removing attributes that depend on other, nonkey attributes.
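A transitive dependency and its removal can be sketched as follows. The relation EMP_DEPT(empno, ename, deptno, dname) is a hypothetical combination of the EMP and DEPT tables in which empno -> deptno and deptno -> dname; the function name and the way dependencies are written down are assumptions of this illustration.

```python
KEY = {"empno"}
FDS = [  # (determinant, dependent attributes)
    ({"empno"}, {"ename", "deptno"}),
    ({"deptno"}, {"dname"}),
]
emp_dept = [  # (empno, ename, deptno, dname)
    (7369, "SMITH", 20, "RESEARCH"),
    (7499, "ALLEN", 30, "SALES"),
    (7566, "JONES", 20, "RESEARCH"),
]

def transitive_dependencies(key, fds):
    """Find (B, C) pairs where key -> B and B -> C, with B made of non-key attributes."""
    found = []
    for lhs_a, rhs_a in fds:
        if lhs_a != key:
            continue
        for lhs_b, rhs_b in fds:
            if lhs_b <= rhs_a and not (lhs_b & key):
                found.extend((tuple(sorted(lhs_b)), c) for c in sorted(rhs_b - key))
    return found

transitive = transitive_dependencies(KEY, FDS)

# Remove the transitively dependent attribute into its own relation.
emp_3nf = [(empno, ename, deptno) for empno, ename, deptno, _ in emp_dept]
dept_3nf = sorted({(deptno, dname) for _, _, deptno, dname in emp_dept})

print(transitive)
print(emp_3nf)
print(dept_3nf)
```

The department name is now held once per department instead of once per employee, so renaming a department means changing a single row.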

Decomposed into 3NF

All the tables
