
Representation of facts, concepts, or instructions in a formalized manner suitable for communication, interpretation, or processing by humans or by automatic means.

Any representation, such as characters or analog quantities, to which meaning is or might be assigned. (Binary representation of logical entities.) Data may be defined as the smallest meaningful unit of information that cannot be broken down further without losing meaning, or as distinct pieces of information.

Knowledge acquired through study, experience, or instructions. A message received and understood.

Collection of facts from which conclusions may be drawn. Something known that reduces uncertainty. Information is data that has been put into a meaningful and useful context, so that it may solve a problem, answer a question, help to make a decision, improve knowledge, form a message, support a conclusion, or reduce uncertainty.

Data, Relationships among data, Constraints, Schema

Relationship has two meanings. First: a relationship among data elements is a mathematical relation, such as a subset of a Cartesian product, or an equation. Second: the cardinality of an association, which may be one-to-one, one-to-many, or many-to-many.

[Figure: examples of one-to-one, one-to-many (1:n) and many-to-many relationships among Bank, Clearing House, Customer, and bank accounts (Loan A/c, S/B A/c, FD A/c).]
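Both meanings of relationship can be sketched in a few lines of Python. This is an illustrative sketch only: the customer and account names, the `holds` relation, and the `cardinality` helper are hypothetical and do not come from the original slides.

```python
from itertools import product

# Hypothetical example sets (not from the original text).
customers = {"Asha", "Ben"}
accounts = {"SB-1", "FD-7", "LN-3"}

# The Cartesian product is every possible (customer, account) pairing.
all_pairs = set(product(customers, accounts))

# A relation is any subset of that product: here, who holds which account.
holds = {("Asha", "SB-1"), ("Asha", "FD-7"), ("Ben", "LN-3")}
assert holds <= all_pairs


def cardinality(relation):
    """Classify a binary relation by how often each side repeats."""
    left_counts, right_counts = {}, {}
    for a, b in relation:
        left_counts[a] = left_counts.get(a, 0) + 1
        right_counts[b] = right_counts.get(b, 0) + 1
    left_many = any(c > 1 for c in left_counts.values())    # some a pairs with several b
    right_many = any(c > 1 for c in right_counts.values())  # some b pairs with several a
    if left_many and right_many:
        return "many-to-many"
    if left_many:
        return "one-to-many"
    if right_many:
        return "many-to-one"
    return "one-to-one"


print(cardinality(holds))
```

In this sample one customer holds several accounts while every account has a single holder, so the relation is classified as one-to-many.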

A constraint is a limitation of possibilities. A limitation of any kind to be considered in planning, programming, scheduling, implementing, or evaluating programs. Constraint at its most general is a synonym for "rule". Often a constraint is conceptualized as a rule that restricts the types of the arguments that can appear within a tuple.

A constraint is a declaration that the data of the Actor and Target of an Operator have to satisfy. It is a data rule or restriction that is enforced within the database rather than at the application or object level, e.g. primary key, unique key, foreign key (references), check, NOT NULL.
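The constraint kinds listed above can be tried out with a small sketch. SQLite, accessed through Python's standard `sqlite3` module, is an assumption made here for convenience; the original names no product, and the `customer` and `account` tables are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical schema declaring each kind of constraint inside the database.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE
);
CREATE TABLE account (
    account_id  INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(customer_id),
    balance     REAL NOT NULL CHECK (balance >= 0)
);
""")
conn.execute("INSERT INTO customer VALUES (1, 'asha@example.com')")
conn.execute("INSERT INTO account VALUES (10, 1, 500.0)")


def violates(sql):
    """Return True if the database rejects the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    "primary key": violates("INSERT INTO customer VALUES (1, 'ben@example.com')"),
    "unique":      violates("INSERT INTO customer VALUES (2, 'asha@example.com')"),
    "not null":    violates("INSERT INTO customer VALUES (3, NULL)"),
    "foreign key": violates("INSERT INTO account VALUES (11, 99, 10.0)"),
    "check":       violates("INSERT INTO account VALUES (12, 1, -5.0)"),
}
print(results)
```

Each bad statement is rejected by the database itself, so no application code has to repeat the rule.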

A conceptual model of the structure of a database that defines the data contents and relationships. A database definition language specification is an implementation of a particular schema. An information model implemented in a database. A schema may be a logical schema, which will define, for example, tables, columns, and constraints, but which may not include any optimization. It may be a physical schema that includes optimization, for example, table clustering. A description of the organization and relationships of the data represented within a database. A schema is the set of objects (tables, views, indexes, etc.) belonging to one set of data defined by some relation.

A Database Management System (DBMS) is a suite of computer programs that manages (i.e. organizes and controls), manipulates, retrieves, stores, and processes data in a systematic way, handles requests from users and other programs, and ensures the security, recovery, and integrity of the data. It may use a variety of underlying storage models, such as hierarchical, network, relational, or object-oriented.

Hierarchical, Network, Relational, Multi-dimensional, Object-Oriented

Assumes that the relationship among data items is hierarchical and that the (inverted) tree is the prevalent data structure. It is based on the notion of Logical Adjacency (logical proximity) in a linearized tree.

Advantages: Simplicity, Data Security, Data Integrity, Efficiency.

Disadvantages: Implementation Complexity, Database Management Problems, Lack of Structural Independence, Programming Complexity, Implementation Limitation.

Many to many relationship cannot be represented.

Replaces the hierarchy with a network, thus allowing more than one parent and thereby many-to-many relationships. It evolved specifically to handle non-hierarchical relationships. A relationship is assumed to be between two sets: an owner set and a member set. This model allows a member to appear in more than one set.
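The difference between the two models can be sketched with plain Python dictionaries. This is a toy illustration, not how any hierarchical or network DBMS stores data; the record and set names are hypothetical.

```python
# Hierarchical model: every record has at most one parent (a tree).
parent_of = {
    "S/B A/c": "Bank",
    "FD A/c": "Bank",
    "Loan A/c": "Bank",
}

# Network model: named sets link an owner to its members,
# and the same member may appear in more than one set.
sets = {
    "bank_accounts":     {"owner": "Bank",     "members": ["S/B A/c", "FD A/c"]},
    "customer_accounts": {"owner": "Customer", "members": ["S/B A/c"]},
}


def owners_of(member):
    """Return every owner whose set contains the given member."""
    return sorted(s["owner"] for s in sets.values() if member in s["members"])


print(owners_of("S/B A/c"))  # the member has two owners
print(owners_of("FD A/c"))   # the member has one owner
```

A dictionary keyed by child can hold only one parent per record, which is why the tree form cannot express the second owner.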

Advantages: Simplicity, Data Security, Data Integrity, Efficiency, More relation types, Data Independence, Database Standards (CODASYL, the Conference on Data Systems Languages, DBTG task group).

Disadvantages: System Complexity, Absence of structural independence.

Based on Dr. E. F. Codd's principles, which use relational algebra.

Advantages: Structural Independence, Conceptual Simplicity, Data Integrity, Efficiency, Ease of design, implementation, maintenance and usage, Ad hoc query capability, Well-established Database Standards.

Disadvantages: Hardware overheads, Possibility of bad design, Information Islands.

The objects/classes used or developed in a programming language are used directly by this DBMS. Therefore, there is no need to transform the data into tables and establish relationships among entities.

A data model is a collection of mathematically well-defined concepts that help one to consider and express the static and dynamic properties of data:
Static properties, such as attributes and relations among objects.
Integrity rules over objects and operations.
Dynamic properties, such as operations or rules defining new database states based on applied state changes.

Advantages: Capability to handle a large number of different data types (pictures, video, audio and, in future, odor); Marriage of O-O with DBMS (support for multimedia); Improved productivity; Explicit relationships helping better data access.

Disadvantages: Difficult to maintain; Not suited for all applications; Performance degradation.

RDBMS is the brainchild of Dr. E. F. Codd, who first presented it in 1970 in a paper entitled A Relational Model of Data for Large Shared Data Banks. RDBMS is based on relational algebra. Relational algebra specifies the operations to be performed on existing relations to derive result relations. These are further separated into set-oriented operations and relation-oriented operations.

Set-oriented operations: a relation is defined as a set of tuples. Operations are: Union, Intersection, Set Difference, Cartesian Product.

Relation-oriented operations: these were specially devised for RDBMS by Dr. E. F. Codd. Operations are: Select, Project, Rename, Join, Divide.
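The operations named above can be sketched over Python sets, treating a relation as a set of tuples and each tuple as a frozen set of (attribute, value) pairs. This is a teaching sketch under that assumption, not how an RDBMS implements them; the helper names and the sample data are hypothetical.

```python
def row(**attrs):
    """Build a hashable tuple (row) from attribute=value pairs."""
    return frozenset(attrs.items())


def select(relation, predicate):
    """Keep the rows for which the predicate holds."""
    return {r for r in relation if predicate(dict(r))}


def project(relation, *names):
    """Keep only the named attributes; duplicate rows collapse automatically."""
    return {frozenset((k, v) for k, v in r if k in names) for r in relation}


def rename(relation, old, new):
    """Rename one attribute in every row."""
    return {frozenset((new if k == old else k, v) for k, v in r) for r in relation}


def cartesian_product(left, right):
    """Pair every left row with every right row (attribute names must be disjoint)."""
    return {l | r for l in left for r in right}


def natural_join(left, right):
    """Combine rows that agree on all attributes they have in common."""
    out = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[k] == rd[k] for k in ld.keys() & rd.keys()):
                out.add(frozenset({**ld, **rd}.items()))
    return out


def divide(relation, divisor):
    """Rows of the remaining attributes that pair with every divisor row."""
    divisor_attrs = {k for r in divisor for k, _ in r}
    keep = {k for r in relation for k, _ in r} - divisor_attrs
    return {c for c in project(relation, *keep)
            if all((c | d) in relation for d in divisor)}


# Hypothetical sample relations.
customers = {row(cid=1, name="Asha"), row(cid=2, name="Ben")}
accounts = {row(cid=1, kind="SB"), row(cid=1, kind="FD"), row(cid=2, kind="SB")}
kinds = {row(kind="SB"), row(kind="FD")}

savings = select(accounts, lambda r: r["kind"] == "SB")
holder_ids = project(accounts, "cid")
joined = natural_join(customers, accounts)
holds_every_kind = divide(accounts, kinds)

# Union, intersection and set difference are Python's own set operators.
a, b = {row(cid=1), row(cid=2)}, {row(cid=2), row(cid=3)}
union, intersection, difference = a | b, a & b, a - b
```

With this data, `holds_every_kind` contains only the customer who has both an SB and an FD account, which is the kind of "for all" question that Divide answers.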

Formal          Informal
Relation        Table
Tuple           Row, Record
Cardinality     Number of rows
Attribute       Column, Field
Degree          Number of fields

Dr. E. F. Codd presented a paper entitled A Relational Model of Data for Large Shared Data Banks, which laid the foundation of RDBMS. Chamberlin and Boyce of IBM presented a paper on a database language called SEQUEL at a 1974 ACM workshop in Ann Arbor, Michigan. The name was later shortened to SQL because of a trademark conflict. Larry Ellison, co-founder of Oracle, built on this published work to release an early commercial SQL-based RDBMS.

In 1985, Dr. Edgar Frank Codd published a list of 12 rules that concisely define an ideal relational database, which have provided a guideline for the design of all relational database systems ever since.

All data should be presented to the user in table form.

All data should be accessible without ambiguity. This can be accomplished through a combination of the table name, primary key, and column name.
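A minimal sketch of this addressing scheme follows, assuming SQLite through Python's standard `sqlite3` module. The `customer` table and the `fetch_value` helper are hypothetical names introduced for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                 [(1, "Asha", "Pune"), (2, "Ben", "Delhi")])


def fetch_value(table, key_column, key_value, column):
    """Address one value by table name, primary key value, and column name."""
    # Identifiers cannot be bound as parameters, so this sketch only
    # accepts names from a fixed whitelist before building the query.
    allowed = {"customer", "customer_id", "name", "city"}
    if not {table, key_column, column} <= allowed:
        raise ValueError("unknown table or column")
    sql = f"SELECT {column} FROM {table} WHERE {key_column} = ?"
    result = conn.execute(sql, (key_value,)).fetchone()
    return None if result is None else result[0]


city = fetch_value("customer", "customer_id", 2, "city")
print(city)
```

The three coordinates together identify exactly one value, and no physical address or record position is involved.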

A field should be allowed to remain empty. This involves the support of a null value, which is distinct from an empty string or a number with a value of zero. Of course, this can't apply to primary keys.
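The distinction between a null, an empty string, and zero can be observed directly. The sketch below assumes SQLite through Python's standard `sqlite3` module; the `contact` table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contact (id INTEGER PRIMARY KEY, phone TEXT, visits INTEGER)")
conn.executemany("INSERT INTO contact VALUES (?, ?, ?)", [
    (1, None, None),  # values unknown: stored as NULL
    (2, "", 0),       # values known: an empty string and the number zero
])


def ids(where):
    """Return the ids of the rows matching a WHERE clause."""
    return [r[0] for r in conn.execute(f"SELECT id FROM contact WHERE {where} ORDER BY id")]


null_phone = ids("phone IS NULL")
empty_phone = ids("phone = ''")
zero_visits = ids("visits = 0")
equals_null = ids("phone = NULL")  # comparing with = never matches a NULL
```

Only row 1 is null, only row 2 matches the empty string and zero, and the `= NULL` comparison matches nothing, which is why SQL provides `IS NULL`.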

A relational database must provide access to its structure through the same tools that are used to access the data.
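As one concrete instance of this rule, SQLite keeps its catalog in a table named `sqlite_master` that is read with an ordinary SELECT. The choice of SQLite is an assumption made for this sketch, and the table and view below are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE VIEW customer_names AS SELECT name FROM customer")

# The structure of the database is itself queried with the data language.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
print(catalog)

# The stored definition of each object is available the same way.
definition = conn.execute(
    "SELECT sql FROM sqlite_master WHERE name = 'customer'"
).fetchone()[0]
```

Other products expose their catalogs under different names, but the idea of querying structure with the same language is the same.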

The database must support at least one clearly defined language that includes functionality for data definition, data manipulation, data integrity, and database transaction control.

Data can be presented to the user in different logical combinations, called views. Each view should support the same full range of data manipulation that direct access to a table provides.
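A view can be sketched as follows, assuming SQLite through Python's standard `sqlite3` module; the `account` table and the `savings` view are hypothetical. The sketch also shows that real products may fall short of the rule: SQLite views are read-only unless INSTEAD OF triggers are defined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, kind TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [(1, "SB", 100), (2, "FD", 1000), (3, "SB", 250)])

# A view is a stored query presenting a logical combination of the data.
conn.execute("CREATE VIEW savings AS "
             "SELECT account_id, balance FROM account WHERE kind = 'SB'")
savings_rows = conn.execute(
    "SELECT account_id, balance FROM savings ORDER BY account_id").fetchall()

# The view reflects later changes made to the base table.
conn.execute("UPDATE account SET balance = 300 WHERE account_id = 3")
after_update = conn.execute(
    "SELECT balance FROM savings WHERE account_id = 3").fetchone()[0]

# A direct write through the view is rejected by SQLite.
try:
    conn.execute("UPDATE savings SET balance = 0")
    view_is_writable = True
except sqlite3.Error:
    view_is_writable = False
```

Reading through the view works like reading a table, while the failed update shows the gap between the rule and this particular implementation.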

This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.
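Set-level operations can be sketched with single statements that affect every matching row. SQLite through Python's standard `sqlite3` module is assumed, and the `account` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_id INTEGER PRIMARY KEY, kind TEXT, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?, ?)",
                 [(1, "SB", 100), (2, "SB", 200), (3, "FD", 1000)])

# One UPDATE statement changes the whole set of matching rows.
updated = conn.execute(
    "UPDATE account SET balance = balance + 10 WHERE kind = 'SB'").rowcount

# One DELETE statement removes the whole set of matching rows.
deleted = conn.execute("DELETE FROM account WHERE balance > 500").rowcount

remaining = conn.execute(
    "SELECT account_id, balance FROM account ORDER BY account_id").fetchall()
print(updated, deleted, remaining)
```

No row-by-row loop appears in the program; the set to operate on is described by the WHERE clause.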

The user should be isolated from the physical method of storing and retrieving information from the database. Changes can be made to the underlying architecture (hardware, disk storage methods) without affecting how the user accesses it.

How a user views data should not change when the logical structure (table structure) of the database changes.

The database language should support constraints on user input that maintain database integrity.

A user should be totally unaware of whether or not the database is distributed (whether parts of the database exist in multiple locations).

There should be no way to modify the database, or to bypass its integrity rules, other than through the multiple-row database language; a low-level, record-at-a-time interface must not be able to subvert it.

Data normalization is a process of organizing the data attributes of entities to increase the cohesion of entity types, promote integrity, and eliminate redundancy.

1NF: Eliminate repeating groups - Make a separate table for each set of related attributes, and give each table a primary key.
2NF: Eliminate redundant data - If an attribute depends on only part of a composite (multi-attribute) key, remove it to a separate table.
3NF: Eliminate columns not dependent on key - If attributes do not contribute to a description of the key, remove them to a separate table.
BCNF: Boyce-Codd Normal Form - If there are non-trivial dependencies between candidate key attributes, separate them out into distinct tables.
4NF: Isolate independent multiple relations - No table may contain two or more 1:n or n:m relationships that are not directly related.
5NF: Isolate semantically related multiple relationships - There may be practical constraints on information that justify separating logically related many-to-many relationships.
ONF: Optimal Normal Form - a model limited to only simple (elemental) facts, as expressed in Object Role Model notation.
DKNF: Domain-Key Normal Form - a model free from all modification anomalies.
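The 2NF step can be illustrated with a small decomposition in plain Python. The order data below is hypothetical: the key is (order_id, product_id), and product_name depends only on product_id, which is the partial dependency that 2NF removes.

```python
# Hypothetical unnormalized rows: (order_id, product_id, product_name, quantity).
rows = [
    (1, "P1", "Pen", 10),
    (1, "P2", "Ink", 2),
    (2, "P1", "Pen", 5),   # "Pen" is stored again: redundant data
]

# Decompose: attributes that depend on the whole key stay with it ...
order_items = {(order_id, product_id, qty) for order_id, product_id, _, qty in rows}

# ... and the attribute that depends on only part of the key moves out.
products = {(product_id, name) for _, product_id, name, _ in rows}

# Joining the two tables on product_id reconstructs the original rows,
# so the decomposition loses no information.
name_of = dict(products)
rejoined = sorted((o, p, name_of[p], q) for o, p, q in order_items)

print(sorted(products))
print(rejoined == sorted(rows))
```

After the split each product name is stored once, so renaming a product is a single change rather than one change per order line.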

The Conference on Data Systems Languages (CODASYL) Database Task Group defined standards for databases, usually referred to as the DDL (data definition language) and the DML (data manipulation language).

ANSI/SPARC (Standards Planning and Requirements Committee) recommended a visual representation for the conceptual model, which later became known as the Entity-Relationship Diagram (ERD).

It has entities, attributes of entities, and relationships. The notations are:

[Figure: notation symbols for Entity, Attribute, and Relationship.]

The entity name is written in upper case, whereas the attribute name is written in lower case. The primary keys are underlined. The attributes are connected to the entities using lines. If an attribute is single-valued, a single line is used. If it is multi-valued, double lines are used. If an attribute is derived, a dotted line is used. If an attribute is composite, ellipses are shown emanating from it.
Degree: the number of associated entities.
Dependency: Strong and Weak. A weak entity is shown in a double-lined rectangle.
Participation: Total or Partial (or Mandatory or Optional). Optional is shown by O.
Relationship type: "has a" and "is a".