InfoGuru India Institute of Computer Sc.

& Info-Tech Education

DBMS Notes
What Is a Database?

In very simple terms, a database is a collection of inter-related data items, stored and maintained in the form of structured information. Databases are designed specifically to manage large bodies of information, and they store data in an organized and structured manner that makes it easy for users to manage and retrieve that data when required.

A Database Management System (DBMS) is a software program that enables users to create and maintain databases. A DBMS also allows users to write queries for an individual database to perform required actions like retrieving data, modifying data, deleting data, and so forth. DBMSs support tables (a.k.a. relations or entities) to store data in rows (a.k.a. records or tuples) and columns (a.k.a. fields or attributes), similar to how data appears in a spreadsheet application.

A relational database management system, or RDBMS, is a type of DBMS that stores information in the form of related tables. An RDBMS is based on the relational model.

Spreadsheets VS Database

A database is designed to perform the following actions more easily and productively than a spreadsheet application can:

• Retrieve all records that match particular criteria.
• Update or modify a complete set of records at one time.
• Extract values from records distributed among multiple tables.
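The three actions listed above can be sketched with Python's built-in sqlite3 module. This is only a minimal illustration: the department and employee tables, their columns, and the sample rows are assumptions made up for this example and do not come from these notes.

```python
import sqlite3

# Hypothetical tables and data, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        salary  REAL,
        dept_id INTEGER REFERENCES department(dept_id)
    );
    INSERT INTO department VALUES (1, 'Sales'), (2, 'Support');
    INSERT INTO employee VALUES
        (1, 'Asha', 30000, 1),
        (2, 'Ravi', 42000, 1),
        (3, 'Meena', 38000, 2);
""")

# 1. Retrieve all records that match particular criteria.
high_paid = conn.execute(
    "SELECT name FROM employee WHERE salary > ? ORDER BY name", (35000,)
).fetchall()

# 2. Update a complete set of records at one time.
conn.execute("UPDATE employee SET salary = salary + 1000 WHERE dept_id = 1")

# 3. Extract values from records distributed among multiple tables.
pairs = conn.execute(
    """SELECT e.name, d.name FROM employee AS e
       JOIN department AS d ON d.dept_id = e.dept_id
       ORDER BY e.emp_id"""
).fetchall()

print(high_paid)
print(pairs)
```

Each action is a single statement, whereas a spreadsheet would typically need manual filtering, editing many cells, or lookup formulas across sheets.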

Advantages of using a Database

• Compactness: Databases help in maintaining large amounts of data, and thus completely replace voluminous paper files.
• Speed: Searches for a particular piece of data or information in a database are much faster than sorting through piles of paper.
• Less drudgery: Maintaining files by hand is dull work; using a database completely eliminates such maintenance.
• Currency: Database systems can easily be updated and so provide accurate information all the time and on demand.

Benefits of Using a Relational Database Management System (RDBMS)

RDBMSs offer various benefits by controlling the following:

• Redundancy: RDBMSs prevent having multiple duplicate copies of the same data, which takes up disk space unnecessarily.
• Inconsistency: Each redundant set of data may no longer agree with other sets of the same data. When an RDBMS removes redundancy, inconsistency cannot occur.
• Data integrity: Data values stored in the database must satisfy certain types of consistency constraints.



• Data atomicity: In the event of a failure, data is restored to the consistent state it existed in prior to the failure. For example, fund transfer activity must be atomic.
• Access anomalies: RDBMSs prevent more than one user from updating the same data simultaneously; such concurrent updates may result in inconsistent data.
• Data security: Not every user of the database system should be able to access all the data. Security refers to the protection of data against any unauthorized access.
• Transaction processing: A transaction is a sequence of database operations that represents a logical unit of work. In RDBMSs, a transaction either commits all the changes or rolls back all the actions performed until the point at which failure occurred.
• Recovery: Recovery features ensure that data is reorganized into a consistent state after a transaction fails.
• Storage management: RDBMSs provide a mechanism for data storage management. The internal schema defines how data should be stored.

Comparing Desktop and Server RDBMS Systems

In the industry today, we mainly work with two types of databases: desktop databases and server databases. Here, we'll give you a brief look at each of them.

Desktop Databases

Desktop databases are designed to serve a limited number of users and run on desktop PCs, and they offer a less-expensive solution wherever a database is required. Chances are you have worked with a desktop database program. Microsoft SQL Server Express, Microsoft Access, Microsoft FoxPro, FileMaker Pro, Paradox, and Lotus represent a wide range of desktop database solutions.

Desktop databases differ from server databases in the following ways:

• Less expensive: Most desktop solutions are available for just a few hundred dollars. In fact, if you own a licensed version of Microsoft Office Professional, you're already a licensed owner of Microsoft Access, which is one of the most commonly and widely used desktop database programs around.
• User friendly: Desktop databases are quite user friendly and easy to work with, as they do not require complex SQL queries to perform database operations (although some desktop databases also support SQL syntax if you would like to code). Desktop databases generally offer an easy-to-use graphical user interface.

Server Databases

Server databases are specifically designed to serve multiple users at a time and offer features that allow you to manage large amounts of data very efficiently by serving multiple user requests simultaneously. Well-known examples of server databases include Microsoft SQL Server, Oracle, Sybase, and DB2.

Some other characteristics that differentiate server databases from their desktop counterparts:

• Flexibility: Server databases are designed to be very flexible to support multiple platforms, respond to requests coming from multiple database users, and perform any database management task with optimum speed.

• Availability: Server databases are intended for enterprises, and so they need to be available 24/7. To be available all the time, server databases come with some high availability features, such as mirroring and log shipping.
• Scalability: This property allows a server database to expand its ability to process and store records even if it has grown tremendously.
• Performance: Server databases usually have huge hardware support, and so servers running these databases have large amounts of RAM and multiple CPUs. This is why server databases support rich infrastructure and give optimum performance.

The Database Life Cycle

The database life cycle defines the complete process from conception to implementation. The entire development and implementation process of this cycle can be divided into small phases; only after the completion of each phase can you move on to the next phase, and this is the way you build your database block by block.

Before getting into the development of any system, you need to have a strong life-cycle model to follow. The model must have all the phases defined in proper sequence, which will help the development team to build the system with fewer problems and full functionality as expected.

The database life cycle consists of the following stages, from the basic steps involved in designing a global schema of the database to database implementation and maintenance:

• Requirement analysis: Requirements need to be determined before you can begin design and implementation. The requirements can be gathered by interviewing both the producer and the user of the data; this process helps in creating a formal requirement specification.
• Logical design: After requirement gathering, data and relationships need to be defined using a conceptual data modeling technique such as an entity relationship (ER) diagram.
• Physical design: Once the logical design is in place, the next step is to produce the physical structure for the database. The physical design phase involves table creation and selection of indexes.
• Database implementation: Once the design is completed, the database can be created through implementation of the formal schema using the data definition language (DDL) of the RDBMS.
• Data modification: The data manipulation language (DML) is used to query and update the database. Indexes and constraints such as referential integrity are also set up at this stage; in most SQL systems they are declared with DDL statements.
• Database monitoring: As the database begins operation, monitoring indicates whether performance requirements are being met; if they are not, modifications should be made to improve database performance. Thus the database life cycle continues with monitoring, redesign, and modification.

Mapping Cardinalities

Tables are the fundamental components of a relational database. In fact, both data and relationships are stored simply as data in tables. Tables are composed of rows and columns. Each column represents a piece of information.
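To make the idea of a table as rows and columns concrete, and to show the difference between DDL and DML from the life cycle stages, here is a minimal sketch using Python's built-in sqlite3 module. The customer table, its columns, and the sample rows are hypothetical and exist only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the table structure (its columns) and an index.
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)
conn.execute("CREATE INDEX idx_customer_city ON customer (city)")

# DML: add, change, and read rows.
conn.executemany(
    "INSERT INTO customer (customer_id, name, city) VALUES (?, ?, ?)",
    [(1, "Asha", "Pune"), (2, "Ravi", "Delhi")],
)
conn.execute("UPDATE customer SET city = ? WHERE customer_id = ?", ("Mumbai", 2))

cursor = conn.execute(
    "SELECT customer_id, name, city FROM customer ORDER BY customer_id"
)
columns = [d[0] for d in cursor.description]  # column names of the result
rows = cursor.fetchall()                      # one tuple per row

print(columns)
print(rows)
```

The CREATE statements change the structure of the database, while INSERT, UPDATE, and SELECT only work with the rows stored inside that structure.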

Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be associated via a relationship set. (The term cardinality is also used in a second sense, for the uniqueness of data values contained in a particular column of a database table.) The term relational database refers to the fact that different tables quite often contain related data. All of these relationships exist in almost every database and can be classified as follows:

1. One-to-One (1:1): For each row in Table A, there is at most only one related row in Table B, and vice versa. This relationship is typically used to separate data by frequency of use to optimally organize data physically. For example, one department can have only one department head.
2. One-to-Many (1:M): For each row in Table A, there can be zero or more related rows in Table B, but for each row in Table B, there is at most one row in Table A. This is the most common relationship.
3. Many-to-Many (M:M): For each row in Table A, there are zero or more related rows in Table B, and vice versa. Many-to-many relationships are not so easy to achieve, and they require a special technique to implement them. This relationship is actually implemented in a one-many-one format, so it requires a third table (often referred to as a junction table) to be introduced in between that serves as the path between the related tables. This is a very common relationship. For example, one sales rep in a company may take many orders, which were placed by many customers. The products ordered may come from different suppliers, and chances are that each supplier can supply more than one product.

The KEY concept in RDBMS

Relationships are represented by data in tables. To establish a relationship between two tables, you need to have data in one table that enables you to find related rows in another table. A key is one or more columns of a relation that is used to identify a row.

A primary key is an attribute (column) or combination of attributes (columns) whose values uniquely identify records in an entity. Before you choose an attribute as the primary key for an entity, it must have the following properties:

• Each record of the entity must have a not-null value.
• The value must be unique for each record entered into the entity.
• The values must not change or become null during the life of each entity instance.
• There can be only one primary key defined for an entity.

Besides helping in uniquely identifying a record, the primary key also helps in searching records, as an index automatically gets generated when you assign a primary key to an attribute.

An entity may have more than one attribute that can serve as a primary key. Any key or minimum set of keys that could be a primary key is called a candidate key. Once candidate keys are identified, choose one, and only one, primary key for each entity.

Sometimes it requires more than one attribute to uniquely identify an entity. A primary key that consists of more than one attribute is known as a composite key. There can be only one primary key in an entity, but a composite key can have multiple attributes.
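The supplier and product example above can be sketched as a many-to-many relationship with a junction table whose primary key is a composite key. This is a minimal sketch using Python's built-in sqlite3 module; the table names, column names, and sample data are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Junction table: its primary key is a composite key made of two columns.
    CREATE TABLE supplier_product (
        supplier_id INTEGER NOT NULL REFERENCES supplier(supplier_id),
        product_id  INTEGER NOT NULL REFERENCES product(product_id),
        PRIMARY KEY (supplier_id, product_id)
    );
    INSERT INTO supplier VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO product  VALUES (10, 'Pen'), (20, 'Ink');
    INSERT INTO supplier_product VALUES (1, 10), (1, 20), (2, 10);
""")

# A supplier supplies many products, and a product comes from many suppliers.
pen_suppliers = [row[0] for row in conn.execute(
    """SELECT s.name FROM supplier AS s
       JOIN supplier_product AS sp ON sp.supplier_id = s.supplier_id
       WHERE sp.product_id = 10
       ORDER BY s.name"""
)]

# The composite primary key rejects a repeated (supplier, product) pair.
try:
    conn.execute("INSERT INTO supplier_product VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(pen_suppliers, duplicate_rejected)
```

Neither column of the junction table is unique on its own; only the combination identifies a row, which is exactly what makes the key composite.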

A foreign key is an attribute that completes a relationship by identifying the parent entity. Foreign keys provide a method for maintaining integrity in the data (called referential integrity) and for navigating between different instances of an entity. Every relationship in the model must be supported by a foreign key.

Data integrity

Data integrity means that data values in a database are correct and consistent. There are two aspects to data integrity: entity integrity and referential integrity.

• Entity Integrity: No part of a primary key can be null. This is to guarantee that primary key values exist for all rows. The requirement that primary key values exist and that they are unique is known as entity integrity (EI). The DBMS enforces entity integrity by not allowing operations (INSERT, UPDATE) to produce an invalid primary key. Any operation that creates a duplicate primary key or one containing nulls is rejected. That is, to establish entity integrity, you need to define primary keys so the DBMS can enforce their uniqueness.
• Referential Integrity: Once a relationship is defined between tables with foreign keys, the key data must be managed to maintain the correct relationships, that is, to enforce referential integrity (RI). RI requires that all foreign key values in a child table either match primary key values in a parent table or (if permitted) be null. This is also known as satisfying a foreign key constraint.

Normalization

Normalization is a technique for avoiding potential update anomalies, basically by minimizing redundant data in a logical database design. Normalized designs are in a sense “better” designs because they (ideally) keep each data item in only one place. Normalized database designs usually reduce update processing costs but can make query processing more complicated. These trade-offs must be carefully evaluated in terms of the required performance profile of a database. Often, a database design needs to be de-normalized to adequately meet operational needs.

Normalizing a logical database design involves a set of formal processes to separate the data into multiple, related tables. The result of each process is referred to as a normal form. Five normal forms have been identified in theory, but most of the time third normal form (3NF) is as far as you need to go in practice. To be in 3NF, a relation (the formal term for what SQL calls a table and the precise concept on which the mathematical theory of normalization rests) must already be in second normal form (2NF), and 2NF requires a relation to be in first normal form (1NF).

Importance of Normalization

Normalization is the process of efficiently organizing data in a database. There are two goals of the normalization process: eliminating redundant data (for example, storing the same data in more than one table) and ensuring data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensure that data is logically stored.
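The entity integrity and referential integrity rules described earlier in this section can be observed directly in a small experiment. The sketch below uses Python's built-in sqlite3 module; the department and employee tables are hypothetical. Two SQLite-specific details are assumptions of this sketch rather than general rules: foreign key checks must be switched on with a PRAGMA, and a non-integer primary key needs an explicit NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        dept_code TEXT PRIMARY KEY NOT NULL,
        name      TEXT NOT NULL
    );
    CREATE TABLE employee (
        emp_id    INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        dept_code TEXT REFERENCES department(dept_code)  -- nullable foreign key
    );
    INSERT INTO department VALUES ('HR', 'Human Resources');
""")


def rejected(sql):
    """Return True if the DBMS refuses the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


results = {
    # Entity integrity: primary keys must be unique and not null.
    "duplicate primary key": rejected("INSERT INTO department VALUES ('HR', 'Copy')"),
    "null primary key": rejected("INSERT INTO department VALUES (NULL, 'No code')"),
    # Referential integrity: a foreign key must match a parent row or be null.
    "unknown parent": rejected("INSERT INTO employee VALUES (1, 'Asha', 'IT')"),
    "matching parent": rejected("INSERT INTO employee VALUES (2, 'Ravi', 'HR')"),
    "null foreign key": rejected("INSERT INTO employee VALUES (3, 'Meena', NULL)"),
    "delete referenced parent": rejected("DELETE FROM department WHERE dept_code = 'HR'"),
}

for case, was_rejected in results.items():
    print(case, "->", "rejected" if was_rejected else "accepted")
```

Only the two statements that keep every key valid are accepted; the DBMS refuses the rest without any checking code in the application.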

The Normal Forms

The database community has developed a series of guidelines for ensuring that databases are normalized. These are referred to as normal forms and are numbered from one (the lowest form of normalization, referred to as first normal form or 1NF) through five (fifth normal form or 5NF). In practical applications, you'll often see 1NF, 2NF, and 3NF along with the occasional 4NF. Fifth normal form is very rarely seen and won't be discussed in this article.

Before we begin our discussion of the normal forms, it's important to point out that they are guidelines and guidelines only. Occasionally, it becomes necessary to stray from them to meet practical business requirements. However, when variations take place, it's extremely important to evaluate any possible ramifications they could have on your system and account for possible inconsistencies. That said, let's explore the normal forms.

First Normal Form (1NF)

First normal form (1NF) sets the very basic rules for an organized database:

• Eliminate duplicative columns from the same table.
• Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).

Second Normal Form (2NF)

Second normal form (2NF) further addresses the concept of removing duplicative data:

• Meet all the requirements of the first normal form.
• Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
• Create relationships between these new tables and their predecessors through the use of foreign keys.

Third Normal Form (3NF)

Third normal form (3NF) goes one large step further:

• Meet all the requirements of the second normal form.
• Remove columns that are not dependent upon the primary key.

Fourth Normal Form (4NF)

Finally, fourth normal form (4NF) has one additional requirement:

• Meet all the requirements of the third normal form.
• A relation is in 4NF if it has no multi-valued dependencies.

Remember, these normalization guidelines are cumulative. For a database to be in 2NF, it must first fulfill all the criteria of a 1NF database.

Example of Normalization

Normalization is a part of relational theory, which requires that each relation (AKA table) has a primary key. As a result, this article assumes that all tables have primary keys, without which a table cannot even be considered to be in first normal form.

A database for an online bookstore needs to store certain information about the books available to the site viewers, such as:

• Title
• Author
• Author Biography
• ISBN
• Price
• Subject
• Number of Pages
• Publisher
• Publisher Address
• Description
• Review
• Reviewer Name

Let's start by adding the book that coined the term “Spreadsheet Syndrome”. Because this book has two authors, we are going to need to accommodate both in our table. Let's take a look at a typical approach.

Table 1. Two Books

Title:      Beginning MySQL Database Design and Optimization
Author:     Chad Russell, Jon Stephens
Bio:        Chad Russell is a programmer and network administrator who owns his own Internet hosting company. Jon Stephens is a member of the MySQL AB documentation team.
ISBN:       1590593324
Subject:    MySQL, Database Design
Pages:      520
Publisher:  Apress

Drawbacks involved in this design:

1. First, this table is subject to several anomalies:
   I. We cannot list publishers or authors without having a book because the ISBN is a primary key which cannot be NULL (referred to as an insertion anomaly).
   II. Similarly, we cannot delete a book without losing information on the authors and publisher (a deletion anomaly).
   III. Finally, when updating information, such as an author's name, we must change the data in every row, potentially corrupting data (an update anomaly).
2. Second, this table is not very efficient with storage. Let's imagine for a second that our publisher is extremely busy and managed to produce 5000 books for our database. Across 5000 rows we would need to store information such as a publisher name, address, phone number, URL, contact email, etc. All that information repeated over 5000 rows is a serious waste of storage resources.
3. Third, this design does not protect data consistency. Let's once again imagine that Jon Stephens has written 20 books. Someone has had to type his name into the database 20 times, and it is possible that his name will be misspelled at least once (e.g., John Stevens instead of Jon Stephens). Our data is now in an inconsistent state, and anyone searching for a book by author name will find some of the results missing. This also contributes to the update anomalies mentioned earlier.
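The consistency and deletion problems described above are easy to reproduce. The sketch below uses Python's built-in sqlite3 module with a single flat table; the table name book_flat, the placeholder ISBNs, and the book titles are invented for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One flat table, as in the single-table design; the rows are made up.
conn.execute(
    "CREATE TABLE book_flat ("
    "isbn TEXT PRIMARY KEY, title TEXT, author TEXT, publisher TEXT)"
)
conn.executemany(
    "INSERT INTO book_flat VALUES (?, ?, ?, ?)",
    [
        ("isbn-1", "Book One", "Jon Stephens", "Apress"),
        ("isbn-2", "Book Two", "Jon Stephens", "Apress"),
        ("isbn-3", "Book Three", "John Stevens", "Apress"),  # same author, misspelled
    ],
)

# Inconsistency: searching by the correct name silently misses the misspelled row.
found = conn.execute(
    "SELECT COUNT(*) FROM book_flat WHERE author = ?", ("Jon Stephens",)
).fetchone()[0]

# Deletion anomaly: removing every book also removes all trace of the publisher.
conn.execute("DELETE FROM book_flat")
publishers_left = conn.execute(
    "SELECT COUNT(DISTINCT publisher) FROM book_flat"
).fetchone()[0]

print(found, publishers_left)
```

The search finds two of the three books, and after the books are deleted the database no longer knows that the publisher exists.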

First Normal Form

The first normal form (or 1NF) requires that the values in each column of a table are atomic. By atomic we mean that there are no sets of values within a column. In our example table, we have a set of values in our author and subject columns. With more than one value in a single column, it is difficult to search for all books on a given subject or by a specific author. In addition, the author names themselves are non-atomic: first name and last name are in fact different values. Without separating first and last names it becomes difficult to sort on last name.

One method for bringing a table into first normal form is to separate the entities contained in the table into separate tables. In our case this would result in Book, Author, Subject and Publisher tables.

Table 2. Book Table

ISBN        Title                                             Pages
1590593324  Beginning MySQL Database Design and Optimization  520

Table 3. Author Table

Author_ID  First_Name  Last_name
1          Chad        Russell
2          Jon         Stephens
3          Mike        Hillyer

Table 4. Subject Table

Subject_ID  Name
1           MySQL
2           Database Design

Table 5. Publisher Table

Publisher_ID  Name    Address                         City      State       Zip
1             Apress  2560 Ninth Street, Station 219  Berkeley  California  94710

The Author, Subject, and Publisher tables use what is known as a surrogate primary key, an artificial primary key used when a natural primary key is either unavailable or impractical. In the case of author we cannot use the combination of first and last name as a primary key because there is no guarantee that each author's name will be unique, and we cannot assume to have the author's government ID number (such as SIN or SSN), so we use a surrogate key.

Some developers use surrogate primary keys as a rule, others use them only in the absence of a natural candidate for the primary key. From a performance point of view, an integer used as a surrogate primary key can often provide better performance in a join than a composite primary key across several columns. However, when using a surrogate primary key it is still important to create a UNIQUE key to ensure that duplicate records are not created inadvertently.
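The combination of a surrogate primary key with a UNIQUE key can be sketched on the Subject table. This is a minimal illustration using Python's built-in sqlite3 module; the lowercase column names and the choice to let SQLite assign the key values are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE subject (
           subject_id INTEGER PRIMARY KEY,   -- surrogate key, assigned by SQLite
           name       TEXT NOT NULL UNIQUE   -- natural value, still kept unique
       )"""
)

first_id = conn.execute(
    "INSERT INTO subject (name) VALUES (?)", ("MySQL",)
).lastrowid
second_id = conn.execute(
    "INSERT INTO subject (name) VALUES (?)", ("Database Design",)
).lastrowid

# Without the UNIQUE key this insert would succeed and create a second
# 'MySQL' subject under a different surrogate key.
try:
    conn.execute("INSERT INTO subject (name) VALUES (?)", ("MySQL",))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(first_id, second_id, duplicate_rejected)
```

The surrogate key alone never prevents duplicates, because every new row simply receives a fresh number; the UNIQUE key on the name is what keeps each subject recorded only once.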

By separating the data into different tables according to the entities each piece of data represents, we can now overcome some of the anomalies mentioned earlier:

I. We can add authors who have not yet written books.
II. We can delete books without losing author or publisher information.
III. Information such as author names is only recorded once, preventing potential inconsistencies when updating.

Depending on your point of view, the Publisher table may or may not meet the 1NF requirements because of the Address column: on the one hand it represents a single address, on the other hand it is a concatenation of a building number, street number, and street name. The decision on whether to further break down the address will depend on how you intend to use the data: if you need to query all publishers on a given street, you may want to have separate columns. If you only need the address for mailings, having a single address column should be acceptable (but keep potential future needs in mind).

Defining Relationships

As you can see, while our data is now split up, relationships between the tables have not been defined. There are various types of relationships that can exist between two tables:

• One to (Zero or) One
• One to (Zero or) Many
• Many to Many

The relationship between the Book table and the Author table is a many-to-many relationship: A book can have more than one author, and an author can write more than one book. To represent a many-to-many relationship in a relational database we need a third table to serve as a link between the two. By naming the table appropriately, it becomes instantly clear which tables it connects in a many-to-many relationship (in the following example, between the Book and the Author table).

Table 6. Book_Author Table

ISBN        Author_ID
1590593324  1
1590593324  2

Similarly, the Subject table also has a many-to-many relationship with the Book table, as a book can cover multiple subjects, and a subject can be explained by multiple books:

Table 7. Book_Subject Table

ISBN        Subject_ID
1590593324  1
1590593324  2
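The link tables above can be queried in both directions with joins. The sketch below loads the Book, Author, and Book_Author data from Tables 2, 3, and 6 into an in-memory SQLite database through Python's built-in sqlite3 module; the lowercase table and column names and the helper functions authors_of and titles_by are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, pages INTEGER);
    CREATE TABLE author (
        author_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT
    );
    CREATE TABLE book_author (
        isbn      TEXT    NOT NULL REFERENCES book(isbn),
        author_id INTEGER NOT NULL REFERENCES author(author_id),
        PRIMARY KEY (isbn, author_id)
    );
    INSERT INTO book VALUES
        ('1590593324', 'Beginning MySQL Database Design and Optimization', 520);
    INSERT INTO author VALUES
        (1, 'Chad', 'Russell'), (2, 'Jon', 'Stephens'), (3, 'Mike', 'Hillyer');
    INSERT INTO book_author VALUES ('1590593324', 1), ('1590593324', 2);
""")


def authors_of(isbn):
    """Return the full names of all authors linked to one book."""
    rows = conn.execute(
        """SELECT a.first_name || ' ' || a.last_name
           FROM author AS a
           JOIN book_author AS ba ON ba.author_id = a.author_id
           WHERE ba.isbn = ?
           ORDER BY a.last_name""",
        (isbn,),
    )
    return [row[0] for row in rows]


def titles_by(last_name):
    """Return the titles of all books linked to authors with this last name."""
    rows = conn.execute(
        """SELECT b.title
           FROM book AS b
           JOIN book_author AS ba ON ba.isbn = b.isbn
           JOIN author AS a ON a.author_id = ba.author_id
           WHERE a.last_name = ?
           ORDER BY b.title""",
        (last_name,),
    )
    return [row[0] for row in rows]


print(authors_of("1590593324"))
print(titles_by("Stephens"))
print(titles_by("Hillyer"))
```

Mike Hillyer exists in the Author table without any row in Book_Author, so the last query returns an empty list: an author can now be stored before writing a book.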

Now we have established the relationships between the Book, Author, and Subject tables. A book can have an unlimited number of authors, and can refer to an unlimited number of subjects. We can also easily search for books by a given author or referring to a given subject.

The case of a one-to-many relationship exists between the Book table and the Publisher table. A given book has only one publisher (for our purposes), and a publisher will publish many books. When we have a one-to-many relationship, we place a foreign key in the table representing the “many”, pointing to the primary key of the table representing the “one”. Here is the new Book table:

Table 8. Book Table

ISBN        Title                                             Pages  Publisher_ID
1590593324  Beginning MySQL Database Design and Optimization  520    1

Since the Book table represents the “many” portion of our one-to-many relationship, we have placed the primary key value of the Publisher in a Publisher_ID column as a foreign key.

In the tables above the values stored refer to primary key values from the Book, Author, Subject and Publisher tables. Columns in a table that refer to primary keys from another table are known as foreign keys, and serve the purpose of defining data relationships. In database systems (DBMS) which support referential integrity constraints, such as the InnoDB storage engine for MySQL, defining a column as a foreign key will allow the DBMS to enforce the relationships you define. For example, with foreign keys defined, the InnoDB storage engine will not allow you to insert a row into the Book_Subject table unless the book and subject in question already exist in the Book and Subject tables, or you are inserting NULL values. Such systems will also prevent the deletion of books from the Book table that have “child” entries in the Book_Subject or Book_Author tables.

Second Normal Form

Where the First Normal Form deals with atomicity of data, the Second Normal Form (or 2NF) deals with relationships between composite key columns and non-key columns. Under the second normal form, any non-key columns must depend on the entire primary key. In the case of a composite primary key, this means that a non-key column cannot depend on only part of the composite key. Let's introduce a Review table as an example.

Table 9. Review Table

ISBN        Author_ID  Summary        Author_URL
1590593324  3          A great book!  http://www.openwin.

In this situation, the URL for the author of the review depends on the Author_ID, and not on the combination of Author_ID and ISBN, which form the composite primary key. To bring the Review table into compliance with 2NF, the Author_URL must be moved to the Author table.

Third Normal Form

Third Normal Form (3NF) requires that all columns depend directly on the primary key. Tables violate the Third Normal Form when one column depends on another column, which in turn depends on the primary key (a transitive dependency). One way to identify transitive dependencies is to look at your table and see if any columns would require updating if another column in the table was updated. If such a column exists, it probably violates 3NF.

In the Publisher table the City and State fields are really dependent on the Zip column and not the Publisher_ID. To bring this table into compliance with Third Normal Form, we would need a table based on zip code:

Table 10. Zip Table

Zip    City      State
94710  Berkeley  California

In addition, you may wish to instead have separate City and State tables, with the City_ID in the Zip table and the State_ID in the City table.

A complete normalization of tables is desirable, but you may find that in practice full normalization can introduce complexity to your design and application. More tables often means more JOIN operations, and in most database management systems (DBMSs) such JOIN operations can be costly, leading to decreased performance. The key lies in finding a balance where the first three normal forms are generally met without creating an exceedingly complicated schema.
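The trade-off between 3NF and JOIN operations can be seen in a small sketch. The code below uses Python's built-in sqlite3 module to store the Publisher and Zip data from Tables 5 and 10 in two tables and then reassemble the full address with a join; the table name zip_code and the lowercase column names are assumptions of this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- City and state depend on the zip code, so they live in their own table.
    CREATE TABLE zip_code (
        zip   TEXT PRIMARY KEY,
        city  TEXT NOT NULL,
        state TEXT NOT NULL
    );
    CREATE TABLE publisher (
        publisher_id INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        address      TEXT,
        zip          TEXT REFERENCES zip_code(zip)
    );
    INSERT INTO zip_code VALUES ('94710', 'Berkeley', 'California');
    INSERT INTO publisher VALUES
        (1, 'Apress', '2560 Ninth Street, Station 219', '94710');
""")

# Reading a complete address now needs a JOIN between the two tables.
full_address = conn.execute(
    """SELECT p.name, p.address, z.city, z.state, z.zip
       FROM publisher AS p
       JOIN zip_code AS z ON z.zip = p.zip
       WHERE p.publisher_id = ?""",
    (1,),
).fetchone()

print(full_address)
```

The city and state are stored once per zip code instead of once per publisher, at the cost of one extra join whenever a complete address is needed.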