
InfoGuru India Institute of Computer Sc.

& Info-Tech Education

DBMS Notes
What Is a Database?

In very simple terms, a database is a collection of inter-related data items, stored and maintained in the form of structured information. Databases are designed specifically to manage large bodies of information, and they store data in an organized and structured manner that makes it easy for users to manage and retrieve that data when required.

A Database Management System (DBMS) is a software program that enables users to create and maintain databases. A DBMS also allows users to write queries against an individual database to perform required actions like retrieving data, modifying data, deleting data, and so forth.

DBMSs support tables (a.k.a. relations or entities) to store data in rows (a.k.a. records or tuples) and columns (a.k.a. fields or attributes), similar to how data appears in a spreadsheet application.

A relational database management system, or RDBMS, is a type of DBMS that stores information in the form of related tables. An RDBMS is based on the relational model.

Spreadsheets vs. Databases

A database is designed to perform the following actions in an easier and more productive manner than a spreadsheet application would require:
Retrieve all records that match particular criteria.
Update or modify a complete set of records at one time.
Extract values from records distributed among multiple tables.
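The three actions listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customer and orders tables, their columns, and the sample rows are made up for the example and do not come from these notes.

```python
import sqlite3

# In-memory database with two small, hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customer VALUES (1, 'Asha', 'Pune'), (2, 'Ravi', 'Delhi'), (3, 'Meena', 'Pune');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# 1. Retrieve all records that match particular criteria.
pune_customers = [row[0] for row in conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY id", ("Pune",))]

# 2. Extract values from records distributed among multiple tables.
totals = conn.execute("""
    SELECT c.name, SUM(o.amount)
    FROM customer AS c
    JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

# 3. Update a complete set of records at one time.
updated = conn.execute(
    "UPDATE customer SET city = 'Mumbai' WHERE city = 'Pune'").rowcount
```

Each action is a single statement, whereas a spreadsheet would typically need manual filtering, lookups across sheets, or cell-by-cell edits.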

Advantages of Using a Database

Compactness: Databases help in maintaining large amounts of data, and thus completely replace voluminous paper files.
Speed: Searches for a particular piece of data or information in a database are much faster than sorting through piles of paper.
Less drudgery: Maintaining files by hand is dull work; using a database completely eliminates such maintenance.
Currency: Database systems can easily be updated and so provide accurate information all the time and on demand.

Benefits of Using a Relational Database Management System (RDBMS)

RDBMSs offer various benefits by controlling the following:
Redundancy: RDBMSs prevent having multiple duplicate copies of the same data, which takes up disk space unnecessarily.
Inconsistency: Each redundant set of data may no longer agree with other sets of the same data. When an RDBMS removes redundancy, inconsistency cannot occur.
Data integrity: Data values stored in the database must satisfy certain types of consistency constraints.


Data atomicity: In the event of a failure, data is restored to the consistent state it was in prior to the failure. For example, a fund transfer must be atomic: either both the debit and the credit take effect, or neither does.
Access anomalies: RDBMSs prevent more than one user from updating the same data simultaneously; such concurrent updates may result in inconsistent data.
Data security: Not every user of the database system should be able to access all the data. Security refers to the protection of data against any unauthorized access.
Transaction processing: A transaction is a sequence of database operations that represents a logical unit of work. In RDBMSs, a transaction either commits all the changes or rolls back all the actions performed until the point at which failure occurred.
Recovery: Recovery features ensure that data is restored to a consistent state after a transaction fails.
Storage management: RDBMSs provide a mechanism for data storage management. The internal schema defines how data should be stored.
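The atomicity and transaction-processing points above can be illustrated with a small fund-transfer sketch using Python's sqlite3 module. The account table, the CHECK constraint that forbids negative balances, and the transfer helper are assumptions made for this example only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical account table; the CHECK constraint rejects overdrafts.
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit of work; return True on commit."""
    try:
        with conn:  # commits if the block succeeds, rolls back on an exception
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst))
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


first = transfer(conn, 1, 2, 30)    # both updates succeed and are committed
second = transfer(conn, 1, 2, 500)  # the debit fails, so the credit is undone too
final = balances(conn)
```

In the second call the credit to account 2 has already been applied when the debit violates the constraint; the rollback removes it, so no money appears from nowhere.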

Comparing Desktop and Server RDBMS Systems

In the industry today, we mainly work with two types of databases: desktop databases and server databases. Here, we'll give you a brief look at each of them.

Desktop Databases: Desktop databases are designed to serve a limited number of users and run on desktop PCs, and they offer a less-expensive solution wherever a database is required. Chances are you have worked with a desktop database program: Microsoft SQL Server Express, Microsoft Access, Microsoft FoxPro, FileMaker Pro, Paradox, and Lotus represent a wide range of desktop database solutions. Desktop databases differ from server databases in the following ways:
Less expensive: Most desktop solutions are available for just a few hundred dollars. In fact, if you own a licensed version of Microsoft Office Professional, you're already a licensed owner of Microsoft Access, which is one of the most widely used desktop database programs around.
User friendly: Desktop databases are quite user friendly and easy to work with, as they do not require complex SQL queries to perform database operations (although some desktop databases also support SQL syntax if you would like to code). Desktop databases generally offer an easy-to-use graphical user interface.

Server Databases: Server databases are specifically designed to serve multiple users at a time and offer features that allow you to manage large amounts of data very efficiently by serving multiple user requests simultaneously. Well-known examples of server databases include Microsoft SQL Server, Oracle, Sybase, and DB2. Some other characteristics that differentiate server databases from their desktop counterparts:
Flexibility: Server databases are designed to be very flexible to support multiple platforms, respond to requests coming from multiple database users, and perform any database management task with optimum speed.


Availability: Server databases are intended for enterprises, and so they need to be available 24/7. To be available all the time, server databases come with some high availability features, such as mirroring and log shipping.
Performance: Server databases usually run on substantial hardware, with large amounts of RAM and multiple CPUs, which is what allows them to deliver high performance.
Scalability: This property allows a server database to expand its ability to process and store records as the volume of data grows tremendously.

The Database Life Cycle

The database life cycle defines the complete process from conception to implementation. The entire development and implementation process of this cycle can be divided into small phases; only after the completion of each phase can you move on to the next phase, and this is the way you build your database block by block.

Before getting into the development of any system, you need to have a strong life-cycle model to follow. The model must have all the phases defined in proper sequence, which will help the development team to build the system with fewer problems and full functionality as expected.

The database life cycle consists of the following stages, from the basic steps involved in designing a global schema of the database to database implementation and maintenance:
Requirement analysis: Requirements need to be determined before you can begin design and implementation. The requirements can be gathered by interviewing both the producer and the user of the data; this process helps in creating a formal requirement specification.
Logical design: After requirement gathering, data and relationships need to be defined using a conceptual data modeling technique such as an entity relationship (ER) diagram.
Physical design: Once the logical design is in place, the next step is to produce the physical structure for the database. The physical design phase involves table creation and selection of indexes.
Database implementation: Once the design is completed, the database can be created through implementation of the formal schema using the data definition language (DDL) of the RDBMS.
Data modification: The data manipulation language (DML) can be used to query and update the database. Indexes and constraints such as referential integrity are also set up at this stage (in most RDBMSs this is done with DDL statements).
Database monitoring: As the database begins operation, monitoring indicates whether performance requirements are being met; if they are not, modifications should be made to improve database performance. Thus the database life cycle continues with monitoring, redesign, and modification.
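As a rough illustration of the implementation, data modification, and monitoring stages above, the following sketch uses Python's sqlite3 module. The employee table, the index name, and the sample rows are hypothetical, and EXPLAIN QUERY PLAN is SQLite's own facility; other RDBMSs expose query plans through different commands.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Database implementation: DDL creates the schema (hypothetical employee table).
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        dept   TEXT NOT NULL
    )
""")
# Physical design: an index selected for lookups by department.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

# Data modification: DML inserts, updates, and queries rows.
with conn:
    conn.executemany(
        "INSERT INTO employee (name, dept) VALUES (?, ?)",
        [("Anil", "Sales"), ("Bina", "Sales"), ("Chitra", "HR")],
    )
    conn.execute("UPDATE employee SET dept = 'Marketing' WHERE name = 'Bina'")

sales_staff = [row[0] for row in conn.execute(
    "SELECT name FROM employee WHERE dept = 'Sales' ORDER BY emp_id")]

# Database monitoring: ask the query planner how a statement would be executed.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM employee WHERE dept = 'Sales'"
).fetchall()
```

Inspecting the plan rows is one simple way to check whether a query is served by an index or by a full table scan before deciding on a redesign.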

Mapping Cardinalities

Tables are the fundamental components of a relational database. In fact, both data and relationships are stored simply as data in tables. Tables are composed of rows and columns. Each column represents a piece of information.


Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set. (The term cardinality is also used in a second sense, for the uniqueness of data values contained in a particular column of a database table.)

The term relational database refers to the fact that different tables quite often contain related data. For example, one sales rep in a company may take many orders, which were placed by many customers. The products ordered may come from different suppliers, and chances are that each supplier can supply more than one product. All of these relationships exist in almost every database and can be classified as follows:
1. One-to-One (1:1): For each row in Table A, there is at most one related row in Table B, and vice versa. This relationship is typically used to separate data by frequency of use to optimally organize data physically. For example, one department can have only one department head.
2. One-to-Many (1:M): For each row in Table A, there can be zero or more related rows in Table B, but for each row in Table B, there is at most one row in Table A. This is the most common relationship.
3. Many-to-Many (M:M): For each row in Table A, there are zero or more related rows in Table B, and vice versa. Many-to-many relationships cannot be represented directly between two tables. They are implemented in a one-many-one format, which requires a third table (often referred to as a junction table) to be introduced in between that serves as the path between the related tables. This is a very common relationship.

The KEY concept in RDBMS: Relationships are represented by data in tables. To establish a relationship between two tables, you need to have data in one table that enables you to find related rows in another table. A key is one or more columns of a relation that is used to identify a row.
A primary key is an attribute (column) or combination of attributes (columns) whose values uniquely identify records in an entity. Before you choose an attribute as the primary key for an entity, it must have the following properties:
Each record of the entity must have a not-null value.
The value must be unique for each record entered into the entity.
The values must not change or become null during the life of each entity instance.
There can be only one primary key defined for an entity.

Besides uniquely identifying a record, the primary key also helps in searching records, because an index is generated automatically when you assign a primary key to an attribute. An entity may have more than one attribute that can serve as a primary key. Any attribute or minimal set of attributes that could be a primary key is called a candidate key. Once candidate keys are identified, choose one, and only one, primary key for each entity. Sometimes it requires more than one attribute to uniquely identify an entity. A primary key that consists of more than one attribute is known as a composite key. There can be only one primary key in an entity, but a composite key can have multiple attributes.
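A minimal sketch of these key concepts, assuming hypothetical student and enrollment tables in SQLite: roll_no is the chosen primary key, email is a candidate key that was not chosen (so it is declared UNIQUE), and enrollment uses a composite key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Single-attribute primary key; email is a candidate key left as UNIQUE.
    CREATE TABLE student (
        roll_no TEXT NOT NULL PRIMARY KEY,
        name    TEXT NOT NULL,
        email   TEXT NOT NULL UNIQUE
    );
    -- Composite primary key: neither column is unique alone, the pair is.
    CREATE TABLE enrollment (
        roll_no   TEXT NOT NULL,
        course_id TEXT NOT NULL,
        grade     TEXT,
        PRIMARY KEY (roll_no, course_id)
    );
""")


def try_insert(sql, params):
    """Run one INSERT; return False if a key constraint rejects it."""
    try:
        with conn:
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert("INSERT INTO student VALUES (?, ?, ?)",
               ("S1", "Asha", "asha@example.com")),   # accepted
    try_insert("INSERT INTO student VALUES (?, ?, ?)",
               ("S1", "Ravi", "ravi@example.com")),   # duplicate primary key
    try_insert("INSERT INTO enrollment VALUES (?, ?, ?)",
               ("S1", "DB101", "A")),                 # accepted
    try_insert("INSERT INTO enrollment VALUES (?, ?, ?)",
               ("S1", "OS201", "B")),                 # same student, new course
    try_insert("INSERT INTO enrollment VALUES (?, ?, ?)",
               ("S1", "DB101", "C")),                 # duplicate composite key
]
```

The fourth insert succeeds because only the combination of roll_no and course_id has to be unique.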


A foreign key is an attribute that completes a relationship by identifying the parent entity. Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity. Every relationship in the model must be supported by a foreign key.

Data integrity: Data integrity means that data values in a database are correct and consistent. There are two aspects to data integrity: entity integrity and referential integrity.

Entity Integrity: No part of a primary key can be null. This is to guarantee that primary key values exist for all rows. The requirement that primary key values exist and that they are unique is known as entity integrity (EI). The DBMS enforces entity integrity by not allowing operations (INSERT, UPDATE) to produce an invalid primary key. Any operation that creates a duplicate primary key or one containing nulls is rejected. That is, to establish entity integrity, you need to define primary keys so the DBMS can enforce their uniqueness.

Referential Integrity: Once a relationship is defined between tables with foreign keys, the key data must be managed to maintain the correct relationships, that is, to enforce referential integrity (RI). RI requires that all foreign key values in a child table either match primary key values in a parent table or (if permitted) be null. This is also known as satisfying a foreign key constraint.
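The following sketch shows how a DBMS might reject operations that break entity or referential integrity, using SQLite through Python. The department and employee tables are hypothetical. Two SQLite-specific details are assumptions of this example rather than general rules: foreign key enforcement has to be switched on per connection, and non-integer primary key columns need an explicit NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
    CREATE TABLE department (
        dept_id TEXT NOT NULL PRIMARY KEY,
        name    TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id  TEXT NOT NULL PRIMARY KEY,
        dept_id TEXT REFERENCES department (dept_id)  -- foreign key, NULL permitted
    );
    INSERT INTO department VALUES ('D1', 'Sales');
""")


def attempt(sql):
    """Run one statement; return False if an integrity constraint rejects it."""
    try:
        with conn:
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = {
    # Entity integrity
    "null primary key": attempt("INSERT INTO department VALUES (NULL, 'Ghost')"),
    "duplicate primary key": attempt("INSERT INTO department VALUES ('D1', 'Again')"),
    # Referential integrity
    "matching foreign key": attempt("INSERT INTO employee VALUES ('E1', 'D1')"),
    "null foreign key": attempt("INSERT INTO employee VALUES ('E2', NULL)"),
    "dangling foreign key": attempt("INSERT INTO employee VALUES ('E3', 'D9')"),
    "delete referenced parent": attempt("DELETE FROM department WHERE dept_id = 'D1'"),
}
```

Only the two inserts whose foreign key either matches a parent row or is NULL are accepted; every other operation would leave the data in an invalid state and is refused.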

Normalization: Normalization is a technique for avoiding potential update anomalies, basically by minimizing redundant data in a logical database design. Normalized designs are in a sense better designs because they (ideally) keep each data item in only one place. Normalized database designs usually reduce update processing costs but can make query processing more complicated. These trade-offs must be carefully evaluated in terms of the required performance profile of a database. Often, a database design needs to be de-normalized to adequately meet operational needs.

Normalizing a logical database design involves a set of formal processes to separate the data into multiple, related tables. The result of each process is referred to as a normal form. Five normal forms have been identified in theory, but most of the time third normal form (3NF) is as far as you need to go in practice. To be in 3NF, a relation (the formal term for what SQL calls a table and the precise concept on which the mathematical theory of normalization rests) must already be in second normal form (2NF), and 2NF requires a relation to be in first normal form (1NF).

Importance of Normalization: Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.


The Normal Forms: The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF)

First normal form (1NF) sets the very basic rules for an organized database:
Eliminate duplicative columns from the same table.
Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF)

Second normal form (2NF) further addresses the concept of removing duplicative data:
Meet all the requirements of the first normal form.
Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:
Meet all the requirements of the second normal form.
Remove columns that are not dependent upon the primary key.

Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:
Meet all the requirements of the third normal form.
A relation is in 4NF if it has no multi-valued dependencies.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.


Example of Normalization: A database for an online bookstore needs to store certain information about the books available to the site viewers, such as:
Title
Author
Author Biography
ISBN
Price
Subject
Number of Pages
Publisher
Publisher Address
Description
Review
Reviewer Name

Let's start by adding the book that coined the term Spreadsheet Syndrome. Because this book has two authors, we are going to need to accommodate both in our table. Let's take a look at a typical approach.

Table 1. Two Books (one row, shown column by column)
Title:     Beginning MySQL Database Design and Optimization
Author:    Chad Russell, Jon Stephens
Bio:       Chad Russell is a programmer and network administrator who owns his own Internet hosting company., Jon Stephens is a member of the MySQL AB documentation team.
ISBN:      1590593324
Subject:   MySQL, Database Design
Pages:     520
Publisher: Apress

Drawbacks involved in this design:
1. First, this table is subject to several anomalies:
I. We cannot list publishers or authors without having a book because the ISBN is a primary key which cannot be NULL (referred to as an insertion anomaly).
II. Similarly, we cannot delete a book without losing information on the authors and publisher (a deletion anomaly).
III. Finally, when updating information, such as an author's name, we must change the data in every row, potentially corrupting data (an update anomaly).

2. Second, this table is not very efficient with storage. Let's imagine for a second that our publisher is extremely busy and managed to produce 5000 books for our database. Across 5000 rows we would need to store information such as a publisher name, address, phone number, URL, contact email, etc. All that information repeated over 5000 rows is a serious waste of storage resources.
3. Third, this design does not protect data consistency. Let's once again imagine that Jon Stephens has written 20 books. Someone has had to type his name into the database 20 times, and it is possible that his name will be misspelled at least once (e.g., John Stevens instead of Jon Stephens). Our data is now in an inconsistent state, and anyone searching for a book by author name will find some of the results missing. This also contributes to the update anomalies mentioned earlier.
Normalization is a part of relational theory, which requires that each relation (AKA table) has a primary key. As a result, this article assumes that all tables have primary keys, without which a table cannot even be considered to be in first normal form.


First Normal Form

The first normal form (or 1NF) requires that the values in each column of a table are atomic. By atomic we mean that there are no sets of values within a column. In our example table, we have a set of values in our author and subject columns. With more than one value in a single column, it is difficult to search for all books on a given subject or by a specific author. In addition, the author names themselves are non-atomic: first name and last name are in fact different values. Without separating first and last names it becomes difficult to sort on last name.

One method for bringing a table into first normal form is to separate the entities contained in the table into separate tables. In our case this would result in Book, Author, Subject and Publisher tables.

Table 2. Book Table
ISBN        Title                                             Pages
1590593324  Beginning MySQL Database Design and Optimization  520

Table 3. Author Table
Author_ID  First_Name  Last_Name
1          Chad        Russell
2          Jon         Stephens
3          Mike        Hillyer

Table 4. Subject Table
Subject_ID  Name
1           MySQL
2           Database Design

Table 5. Publisher Table
Publisher_ID  Name    Address                         City      State       Zip
1             Apress  2560 Ninth Street, Station 219  Berkeley  California  94710
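The move from the multi-valued Author and Subject columns of Table 1 to atomic rows can be sketched in plain Python. The helper name split_values is hypothetical, and splitting a full name at the first space is a simplifying assumption that happens to work for these two authors but not for names in general.

```python
# The non-atomic columns of the original single-table design (from Table 1).
flat_row = {
    "ISBN": "1590593324",
    "Title": "Beginning MySQL Database Design and Optimization",
    "Author": "Chad Russell, Jon Stephens",
    "Subject": "MySQL, Database Design",
}


def split_values(cell):
    """Split a comma-separated cell into its individual values."""
    return [part.strip() for part in cell.split(",")]


# One row per author, with first and last name stored as separate values.
authors = []
for author_id, full_name in enumerate(split_values(flat_row["Author"]), start=1):
    first_name, last_name = full_name.split(" ", 1)  # naive split, see lead-in
    authors.append((author_id, first_name, last_name))

# One row per subject, each with its own surrogate ID.
subjects = list(enumerate(split_values(flat_row["Subject"]), start=1))
```

After this step, every column holds a single value, so searching by subject or sorting by last name no longer requires parsing text inside a cell.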

The Author, Subject, and Publisher tables use what is known as a surrogate primary key -- an artificial primary key used when a natural primary key is either unavailable or impractical. In the case of author we cannot use the combination of first and last name as a primary key because there is no guarantee that each author's name will be unique, and we cannot assume to have the author's government ID number (such as SIN or SSN), so we use a surrogate key. Some developers use surrogate primary keys as a rule, others use them only in the absence of a natural candidate for the primary key. From a performance point of view, an integer used as a surrogate primary key can often provide better performance in a join than a composite primary key across several columns. However, when using a surrogate primary key it is still important to create a UNIQUE key to ensure that duplicate records are not created inadvertently.
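A small sketch of a surrogate primary key combined with a UNIQUE key, using SQLite through Python. Treating the pair (name, zip) as the natural identifier of a publisher is an assumption made only for this example, and add_publisher is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE publisher (
        publisher_id INTEGER PRIMARY KEY,  -- surrogate key, assigned by SQLite
        name         TEXT NOT NULL,
        zip          TEXT NOT NULL,
        UNIQUE (name, zip)                 -- assumed natural identifier
    )
""")


def add_publisher(name, zip_code):
    """Insert a publisher; return its surrogate ID, or None for a duplicate."""
    try:
        with conn:
            cursor = conn.execute(
                "INSERT INTO publisher (name, zip) VALUES (?, ?)",
                (name, zip_code))
            return cursor.lastrowid
    except sqlite3.IntegrityError:
        return None


first_id = add_publisher("Apress", "94710")      # new row, gets a generated ID
duplicate_id = add_publisher("Apress", "94710")  # rejected by the UNIQUE key
row_count = conn.execute("SELECT COUNT(*) FROM publisher").fetchone()[0]
```

Without the UNIQUE key, the second insert would succeed with a fresh surrogate ID and the same publisher would exist twice.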


By separating the data into different tables according to the entities each piece of data represents, we can now overcome some of the anomalies mentioned earlier:
I. We can add authors who have not yet written books,
II. We can delete books without losing author or publisher information, and
III. Information such as author names is only recorded once, preventing potential inconsistencies when updating.

Depending on your point of view, the Publisher table may or may not meet the 1NF requirements because of the Address column: on the one hand it represents a single address, on the other hand it is a concatenation of a building number, street number, and street name. The decision on whether to further break down the address will depend on how you intend to use the data: if you need to query all publishers on a given street, you may want to have separate columns. If you only need the address for mailings, having a single address column should be acceptable (but keep potential future needs in mind).

Defining Relationships

As you can see, while our data is now split up, relationships between the tables have not been defined. There are various types of relationships that can exist between two tables:
One to (Zero or) One
One to (Zero or) Many
Many to Many

The relationship between the Book table and the Author table is a many-to-many relationship: A book can have more than one author, and an author can write more than one book. To represent a many-to-many relationship in a relational database we need a third table to serve as a link between the two. By naming the table appropriately, it becomes instantly clear which tables it connects in a many-to-many relationship (in the following example, between the Book and the Author table). Table 6. Book_Author Table
ISBN        Author_ID
1590593324  1
1590593324  2
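The linking table can be queried in both directions. The sketch below loads the Book, Author, and Book_Author rows shown above into SQLite through Python; the lowercase table and column names and the two helper functions are choices made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (isbn TEXT NOT NULL PRIMARY KEY, title TEXT NOT NULL);
    CREATE TABLE author (
        author_id  INTEGER PRIMARY KEY,
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    -- Linking table: one row per (book, author) pair.
    CREATE TABLE book_author (
        isbn      TEXT    NOT NULL REFERENCES book (isbn),
        author_id INTEGER NOT NULL REFERENCES author (author_id),
        PRIMARY KEY (isbn, author_id)
    );
    INSERT INTO book VALUES
        ('1590593324', 'Beginning MySQL Database Design and Optimization');
    INSERT INTO author VALUES
        (1, 'Chad', 'Russell'), (2, 'Jon', 'Stephens'), (3, 'Mike', 'Hillyer');
    INSERT INTO book_author VALUES ('1590593324', 1), ('1590593324', 2);
""")


def authors_of(isbn):
    """Full names of all authors linked to one book."""
    rows = conn.execute("""
        SELECT a.first_name || ' ' || a.last_name
        FROM book_author AS ba
        JOIN author AS a ON a.author_id = ba.author_id
        WHERE ba.isbn = ?
        ORDER BY a.author_id
    """, (isbn,))
    return [row[0] for row in rows]


def books_by(last_name):
    """Titles of all books linked to authors with the given last name."""
    rows = conn.execute("""
        SELECT b.title
        FROM author AS a
        JOIN book_author AS ba ON ba.author_id = a.author_id
        JOIN book AS b ON b.isbn = ba.isbn
        WHERE a.last_name = ?
        ORDER BY b.title
    """, (last_name,))
    return [row[0] for row in rows]


book_authors = authors_of("1590593324")
stephens_books = books_by("Stephens")
hillyer_books = books_by("Hillyer")  # author 3 has no rows in book_author
```

The composite primary key on book_author also prevents the same author from being linked to the same book twice.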

Similarly, the Subject table also has a many-to-many relationship with the Book table, as a book can cover multiple subjects, and a subject can be explained by multiple books:

Table 7. Book_Subject Table
ISBN        Subject_ID
1590593324  1
1590593324  2

Now we have established the relationships between the Book, Author, and Subject tables. A book can have an unlimited number of authors, and can refer to an unlimited number of subjects. We can also easily search for books by a given author or referring to a given subject.

The case of a one-to-many relationship exists between the Book table and the Publisher table. A given book has only one publisher (for our purposes), and a publisher will publish many books. When we have a one-to-many relationship, we place a foreign key in the table representing the many, pointing to the primary key of the table representing the one. Here is the new Book table:

Table 8. Book Table
ISBN        Title                                             Pages  Publisher_ID
1590593324  Beginning MySQL Database Design and Optimization  520    1

Since the Book table represents the many portion of our one-to-many relationship, we have placed the primary key value of the Publisher in a Publisher_ID column as a foreign key.

In the tables above the values stored refer to primary key values from the Book, Author, Subject and Publisher tables. Columns in a table that refer to primary keys from another table are known as foreign keys, and serve the purpose of defining data relationships. In database management systems (DBMSs) which support referential integrity constraints, such as the InnoDB storage engine for MySQL, defining a column as a foreign key will allow the DBMS to enforce the relationships you define. For example, with foreign keys defined, the InnoDB storage engine will not allow you to insert a row into the Book_Subject table unless the book and subject in question already exist in the Book and Subject tables or you are inserting NULL values. Such systems will also prevent the deletion of books from the Book table that have child entries in the Book_Subject or Book_Author tables.

Second Normal Form

Where the First Normal Form deals with atomicity of data, the Second Normal Form (or 2NF) deals with relationships between composite key columns and non-key columns. Under the second normal form, any non-key column must depend on the entire primary key. In the case of a composite primary key, this means that a non-key column cannot depend on only part of the composite key. Let's introduce a Review table as an example.

Table 9. Review Table
ISBN        Author_ID  Summary        Author_URL
1590593324  3          A great book!  http://www.openwin.org

In this situation, the URL for the author of the review depends on the Author_ID, and not on the combination of Author_ID and ISBN, which form the composite primary key. To bring the Review table into compliance with 2NF, the Author_URL must be moved to the Author table.

Third Normal Form

Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables violate the Third Normal Form when one column depends on another column, which in turn depends on the primary key (a transitive dependency). One way to identify transitive dependencies is to look at your table and see if any columns would require updating if another column in the table was updated. If such a column exists, it probably violates 3NF. In the Publisher table the City and State fields are really dependent on the Zip column and not the Publisher_ID. To bring this table into compliance with Third Normal Form, we would need a table based on zip code:

Table 10. Zip Table
Zip    City      State
94710  Berkeley  California

In addition, you may wish to instead have separate City and State tables, with the City_ID in the Zip table and the State_ID in the City table.

A complete normalization of tables is desirable, but you may find in practice that full normalization can introduce complexity to your design and application. More tables often means more JOIN operations, and in most database management systems (DBMSs) such JOIN operations can be costly, leading to decreased performance. The key lies in finding a balance where the first three normal forms are generally met without creating an exceedingly complicated schema.
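The partial dependency in the Review table and the transitive dependency in the Publisher table can both be probed with a small helper that tests whether one set of columns determines another in sample data. The function determines is a hypothetical name, and the second row in each sample is invented so that the check has something to compare. A dependency that holds in sample rows is only a hint; the real dependencies come from the meaning of the data.

```python
def determines(rows, lhs, rhs):
    """Return True if equal values in the lhs columns always go with
    equal values in the rhs columns across the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = tuple(row[col] for col in rhs)
        # Remember the first value seen for this key; any later mismatch
        # means lhs does not determine rhs.
        if seen.setdefault(key, value) != value:
            return False
    return True


reviews = [
    {"ISBN": "1590593324", "Author_ID": 3,
     "Summary": "A great book!", "Author_URL": "http://www.openwin.org"},
    # Hypothetical second review by the same reviewer (made-up ISBN).
    {"ISBN": "0000000000", "Author_ID": 3,
     "Summary": "Also useful.", "Author_URL": "http://www.openwin.org"},
]

publishers = [
    {"Publisher_ID": 1, "Name": "Apress", "Zip": "94710",
     "City": "Berkeley", "State": "California"},
    # Hypothetical second publisher in the same zip code.
    {"Publisher_ID": 2, "Name": "Example Press", "Zip": "94710",
     "City": "Berkeley", "State": "California"},
]

# 2NF: Author_URL depends on only part of the key (ISBN, Author_ID).
url_depends_on_author = determines(reviews, ["Author_ID"], ["Author_URL"])
# Summary needs the whole key, so it belongs in the Review table.
summary_depends_on_author = determines(reviews, ["Author_ID"], ["Summary"])
# 3NF: City and State depend on Zip, which is not the primary key.
city_state_depend_on_zip = determines(publishers, ["Zip"], ["City", "State"])
```

Columns flagged this way are candidates for moving into their own table, as was done with Author_URL and with the Zip table above.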
