“INTRODUCTION TO DATABASE SYSTEMS”

Introduction

• Q: What is a Database?
• Answer from Pratt/Adamski:
  o A Database (DB) is a structure that can store information about:
    1. multiple types of entities,
    2. the attributes that describe those entities; and
    3. the relationships among the entities
• Answer from Elmasri/Navathe:
  o A Database (DB) is a collection of related data with the following properties:
    1. A DB is logically coherent and has some relevant meaning
    2. A DB is designed, built and populated with data for a specific purpose
    3. A DB represents some aspect of the real world.
• Answer from Kroenke: An integrated, self-describing collection of related data
  o Integrated: Data is stored in a uniform way, typically all in one place (a single physical computer for example)
  o Self-Describing: A database maintains a description of the data it contains (Catalog)
  o Related: Data has some relationship to other data. In a University we have students who take courses taught by professors
  o By taking advantage of relationships and integration, we can provide information to users as opposed to simply data.
  o We can also say that the database is a model of what the users perceive.
  o Three main categories of models:
    1. User or Conceptual Models: How users perceive the world and/or the business.
    2. Logical Models: Represent the logic of how a business operates. For example, the relationship between different entities and the flow of data through the organization. Based on the User's model.
    3. Physical Models: Represent how the database is actually implemented on a computer system. This is based on the logical model.
• Database Management System (DBMS): A collection of software programs that are used to define, construct, maintain and manipulate data in a database.
• Database System (DBS) contains: The Database + The DBMS + Application Programs (what users interact with)


File Systems
• File System: A collection of individual files accessed by applications programs
• Limitations of a File System:
  o Separated and Isolated Data - Makes coordinating, assimilating and representing data difficult
  o Data Duplication - Wastes space and can lead to data integrity (inconsistency) problems
  o Application Program Dependencies - Changes to a single file can require changes to numerous application programs
  o Incompatible Files
  o Lack of Data Sharing - Difficult to control access to files, especially to individual portions of files

Advantages of a DBMS
• A DBMS can provide:
  o Data Consistency and Integrity - by controlling access and minimizing data duplication
  o Application program independence - by storing data in a uniform fashion
  o Data Sharing - by controlling access to data items, many users can access data concurrently
  o Backup and Recovery
  o Security and Privacy
  o Multiple views of data


Example Database

An Example Database

CustomerID  Name                Address          City         State  Acct_Number  Balance
123         Mr. Smith           123 Lexington    Smithville   KY     9987         4000
123         Mr. Smith           123 Lexington    Smithville   KY     9980         2000
124         Mrs. Jones          12 Davis Ave.    Smithville   KY     8811         1000
125         Mr. Axe             443 Grinder Ln.  Broadville   GA     4422         6000
125         Mr. Axe             443 Grinder Ln.  Broadville   GA     4433         9000
127         Mr. & Mrs. Builder  661 Parker Rd.   Streetville  GA     3322         500
127         Mr. & Mrs. Builder  661 Parker Rd.   Streetville  GA     1122         800

• What happens when a customer moves to a new house?
• Who should have access to what data in this database?
• What happens if Mr. and Mrs. Builder both try to withdraw $500 from account 3322?
• What happens if the system crashes just as Mr. Axe is depositing his latest paycheck?
• What data is the customer concerned with? What data is a bank manager concerned with?
• Send a mailing to all customers with checking accounts having greater than $2000 balance
• Let all GA customers know of a new branch location
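
As a rough illustration of the first and last questions above, the following Python sketch keeps the example data in a single flat table. It is only a sketch under stated assumptions: the helper names (change_address, addresses_on_file, names_in_state) and the new address "77 Elm St." are hypothetical and do not come from the notes.

```python
# Flat single-table version of the example data:
# (CustomerID, Name, Address, City, State, Acct_Number, Balance)
FLAT = [
    (123, "Mr. Smith", "123 Lexington", "Smithville", "KY", 9987, 4000),
    (123, "Mr. Smith", "123 Lexington", "Smithville", "KY", 9980, 2000),
    (124, "Mrs. Jones", "12 Davis Ave.", "Smithville", "KY", 8811, 1000),
    (125, "Mr. Axe", "443 Grinder Ln.", "Broadville", "GA", 4422, 6000),
    (125, "Mr. Axe", "443 Grinder Ln.", "Broadville", "GA", 4433, 9000),
    (127, "Mr. & Mrs. Builder", "661 Parker Rd.", "Streetville", "GA", 3322, 500),
    (127, "Mr. & Mrs. Builder", "661 Parker Rd.", "Streetville", "GA", 1122, 800),
]


def change_address(rows, acct_number, new_address):
    """Naive update keyed on one account: only that row's address changes."""
    return [
        (cid, name, new_address if acct == acct_number else addr, city, state, acct, bal)
        for cid, name, addr, city, state, acct, bal in rows
    ]


def addresses_on_file(rows, customer_id):
    """All distinct addresses stored for one customer."""
    return {addr for cid, _, addr, *_ in rows if cid == customer_id}


def names_in_state(rows, state):
    """Distinct customer names in a state; duplicates have to be filtered by hand."""
    return sorted({name for _, name, _, _, st, *_ in rows if st == state})


# Hypothetical move: Mr. Smith's new address is recorded on only one of his rows.
moved = change_address(FLAT, 9987, "77 Elm St.")
print(addresses_on_file(moved, 123))   # two conflicting addresses for customer 123
print(names_in_state(FLAT, "GA"))      # mailing list for GA customers
```

Because the address is repeated on every account row, a partial update leaves the customer with two addresses on file, which is the kind of inconsistency the questions above are pointing at.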

Brief History of Database Systems
• 1940's, 50's: Initial use of computers as calculators. Limited data, focus on algorithms. Science, military applications.
• 1960's: Business uses. Organizational data, customer data, sales, inventory, accounting, etc. File system based, high emphasis on applications programs to extract and assimilate data. Larger amounts of data, relatively simple calculations.
• 1970's: The relational model. Data separated into individual tables. Related by keys. Initially required heavy system resources. Examples: Oracle, Sybase, Informix, Digital RDB, IBM DB2.
• 1980's: Microcomputers - the IBM PC, Apple Macintosh. Database programs such as DBase (sort of), Paradox, FoxPro, MS Access. Individual users can create and maintain small databases.
• Late 1980's: Local area networks. Workgroups sharing resources such as files, printers, e-mail. Client/Server: Database resides on a central server, applications programs run on client PCs attached to the server over a LAN.
• 1990's: Internet and World Wide Web make databases of all kinds available from a single type of client - the Web Browser. Data warehousing and Data Mining also emerge.
• Other types of Databases:
  o Object-Oriented Database Systems. Objects (data and methods) stored persistently.
  o Distributed Database Systems. Copies of data reside at different locations for redundancy or for performance reasons.

Appropriate Use for a Database

• In addition to the advantages already mentioned:
  o Performance
  o Expandability, Flexibility, Scalability
  o Reduced application development times
  o Standards enforcement
• However, keep in mind:
  o DBMS has High initial cost (although falling)
  o DBMS has High Overhead - requires powerful computers
  o DBMS are not special purpose software programs, e.g., contrast a canned accounting software package like Quicken or QuickBooks with a DBMS like MS Access.

When is a DBMS Not Appropriate?
  o Database is small with a simple structure
  o Applications are simple, special purpose and relatively static.
  o Applications have real-time requirements. Examples: Traffic signal control, ECU patient monitoring
  o Concurrent, multi-user access to data is not required.

Contents of a Database
A Database contains:
• User Data
• Metadata
• Indexes
• Application metadata

User Data
• Data users work with directly by entering, updating and viewing.
• For our purposes, data will be generally stored in tables with some relationships between tables.
• Each table has one or more columns. A set of columns forms a database record.
• Recall our example database for the bank. What were some problems we discussed?
• Here is one improvement - split into 2 tables:

Customer Table
CustomerID  Name                Address          City         State
123         Mr. Smith           123 Lexington    Smithville   KY
124         Mrs. Jones          12 Davis Ave.    Smithville   KY
125         Mr. Axe             443 Grinder Ln.  Broadville   GA
127         Mr. & Mrs. Builder  661 Parker Rd.   Streetville  GA

Accounts Table
CustomerID  Acct_Number  Balance
123         9987         4000
123         9980         2000
124         8811         1000
125         4422         6000
125         4433         9000
127         3322         500
127         1122         800

• The Customer table has 4 records and 5 columns. The Accounts table has 7 records and 3 columns.
• Note the relationship between the two tables - the CustomerID column.
• How should we split data into the tables? What are the relationships between the tables? These are questions that are answered by Database Modeling and Database Design.

Metadata
• Recall that a database is self describing
• Metadata: Data about data.
• Data that describe how user data are stored in terms of table name, column name, data type, length, primary keys, etc.
• Metadata are typically stored in System tables or System Catalog and are typically only directly accessible by the DBMS or by the system administrator.
• Have a look at the Database Documentor feature of MS Access (under the Tools menu, choose Analyze and then Documentor). This tool queries the system tables to give all kinds of Metadata for tables, etc. in an MS Access database.

Indexes
• In keeping with our desire to provide users with several different views of data, indexes provide an alternate means of accessing user data.
• Sorting and Searching: An index for our new banking example might include the account numbers in a sorted order.
• Indexes allow the database to access a record without having to search through the entire table.
• Updating data requires an extra step: The index must also be updated.
• Example: Index in a book consists of two things: 1) A Keyword stored in order 2) A pointer to the rest of the information. In the case of the book, the pointer is a page number.

Applications Metadata
• Many DBMS have storage facilities for forms, reports, queries and other application components.
• Applications Metadata is accessed via the database development programs.
• Example: Look at the Documentor tool in MS Access. It can also show metadata for Queries, Reports, Forms, etc.

Data Modeling and Database Design
• Database Design: The activity of specifying the schema of a database in a given data model
• Database Schema: The structure of a database that:
  o Captures data types, relationships and constraints in data
  o Is independent of any application program
  o Changes infrequently
• Data Model:
  o A set of primitives for defining the structure of a database.
  o A set of operations for specifying retrieval and updates on a database
  o Examples: Relational, Hierarchical, Networked, Object-Oriented
• Database Instance or State: The actual data contained in a database at a given time.

In this course, we focus on the Relational data model.
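
Returning to the Indexes discussion above, the idea of "account numbers in a sorted order plus a pointer" can be sketched in a few lines of Python. This is a minimal illustration, not how any particular DBMS implements indexes: the function names are hypothetical, the "pointer" is simply a list position, and account 7001 is a made-up row.

```python
import bisect

# Accounts table rows as (CustomerID, Acct_Number, Balance), in stored order.
accounts = [
    (123, 9987, 4000), (123, 9980, 2000), (124, 8811, 1000),
    (125, 4422, 6000), (125, 4433, 9000), (127, 3322, 500), (127, 1122, 800),
]


def build_index(rows, column):
    """Return sorted (key, row_position) pairs, like keyword/page pairs in a book index."""
    return sorted((row[column], pos) for pos, row in enumerate(rows))


def lookup(index, rows, key):
    """Binary-search the index for key; return the matching row, or None if absent."""
    i = bisect.bisect_left(index, (key, -1))
    if i < len(index) and index[i][0] == key:
        return rows[index[i][1]]
    return None


def insert_account(rows, index, row):
    """Append a row to the table, then do the extra step: update the index too."""
    rows.append(row)
    bisect.insort(index, (row[1], len(rows) - 1))


acct_index = build_index(accounts, 1)        # index on Acct_Number
print(lookup(acct_index, accounts, 3322))    # found without scanning the whole table
insert_account(accounts, acct_index, (124, 7001, 250))  # hypothetical new account
print(lookup(acct_index, accounts, 7001))
```

The lookup touches only a handful of index entries instead of every row, and the insert shows why updates cost more once an index exists.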

The Database Development Process

Two overall approaches:
1. Top-Down: Design systems from an overall organization perspective
2. Bottom-Up: Design systems from a specific perspective - one system at a time.

The following is a very brief outline describing the database development process.
• User needs assessment and requirements gathering: Determine what the users are looking for, what functions should be supported, how the system should behave.
• Data Modeling: Based on user requirements, form a logical model of the system. This logical model is then converted to a physical data model (tables, columns, relationships, etc.) that will be implemented.
• Implementation: Based on the data model, a database can be created. Applications are then written to perform the required functions.
• Testing: The system is tested using real data.
• Deployment: The system is deployed to users. Maintenance of the system begins.

There are many variations to this basic development process. A Systems Analysis and Design course (such as CIS 3900 for undergraduates, CIS 9490 for graduates) covers these topics in greater detail.

Designing A Database - A Brief Example

For our Bank example, let's assume that the managers are interested in creating a database to track their customers and accounts.

• Tables
  CUSTOMERS: Customer_Id, Name, Street, City, State, Zip
  ACCOUNTS: Customer_Id, Account_Number, Account_Type, Date_Opened, Balance

  Note that we use an artificial identifier (a number we make up) for the customer called Customer_Id. Given a Customer_Id, we can uniquely identify the remaining information. We call Customer_Id a Key for the CUSTOMERS table.
  o Customer_Id is the key for the CUSTOMERS table.
  o Account_Number is the key for the ACCOUNTS table.
  o Customer_Id in the ACCOUNTS table is called a Foreign Key

  Notice that when naming columns in the tables we always use an underscore character and do not use any other punctuation. Even though Access allows you to use spaces, it is not a good idea.
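
The CUSTOMERS and ACCOUNTS tables described above, with their key and foreign key, can be tried out with a short script. The sketch below uses Python's built-in sqlite3 module rather than MS Access, and the SQLite column types (INTEGER, TEXT, REAL) are my own assumption since the notes have not fixed data types at this point. Customer 999 and account 5555 are hypothetical values used only to show the foreign key being enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE CUSTOMERS (
    Customer_Id INTEGER PRIMARY KEY,            -- key for CUSTOMERS
    Name TEXT, Street TEXT, City TEXT, State TEXT, Zip TEXT
);
CREATE TABLE ACCOUNTS (
    Customer_Id    INTEGER NOT NULL
                   REFERENCES CUSTOMERS(Customer_Id),  -- foreign key
    Account_Number INTEGER PRIMARY KEY,         -- key for ACCOUNTS
    Account_Type TEXT, Date_Opened TEXT, Balance REAL
);
""")

conn.execute(
    "INSERT INTO CUSTOMERS VALUES (123, 'Mr. Smith', '123 Lexington', 'Smithville', 'KY', '91232')"
)
conn.executemany(
    "INSERT INTO ACCOUNTS (Customer_Id, Account_Number, Account_Type, Balance) "
    "VALUES (?, ?, ?, ?)",
    [(123, 9987, "Checking", 4000.0), (123, 9980, "Savings", 2000.0)],
)

# Follow the foreign key from one customer to that customer's accounts.
smith_accounts = conn.execute(
    "SELECT a.Account_Number FROM CUSTOMERS c "
    "JOIN ACCOUNTS a ON a.Customer_Id = c.Customer_Id "
    "WHERE c.Customer_Id = ? ORDER BY a.Account_Number",
    (123,),
).fetchall()
print(smith_accounts)

# An account that points at a customer who does not exist should be refused.
try:
    conn.execute("INSERT INTO ACCOUNTS (Customer_Id, Account_Number) VALUES (999, 5555)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```

Given a Customer_Id, the join retrieves every related account row, and the failed insert shows what declaring Customer_Id as a Foreign Key buys us.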

• Relationships
  The relationship between CUSTOMERS and ACCOUNTS is by Customer_Id. Since a customer may have more than one account at the bank, we call this a One to Many relationship (1:N).

• Domains
  A domain is a set of values that a column may have. Domain also includes the type and length or size of data found in each column.

  CUSTOMERS
  Column                 Domain
                         Data Type   Size
  Customer_Id (Key)      Integer     20
  Name                   Character   30
  Street                 Character   30
  City                   Character   25
  State                  Character   2
  Zip                    Character   5

  ACCOUNTS
  Column                 Domain
                         Data Type   Size
  Customer_Id (FK)       Integer     20
  Account_Number (Key)   Integer     15
  Account_Type           Character   2
  Date_Opened            Date
  Balance                Real        12.2

• We use the above information to build a logical model of the database.
• This logical model is then converted to a physical model and implemented as tables.
• The following is some example data for the Accounts and Customers tables:

Customer Table
Customer_Id  Name                Address          City         State  Zip
123          Mr. Smith           123 Lexington    Smithville   KY     91232
124          Mrs. Jones          12 Davis Ave.    Smithville   KY     91232
125          Mr. Axe             443 Grinder Ln.  Broadville   GA     81992
127          Mr. & Mrs. Builder  661 Parker Rd.   Streetville  GA     81990

Accounts Table
Customer_Id  Account_Number  Account_Type  Date_Opened  Balance
123          9987            Checking      10/12/89     4000.00
123          9980            Savings       10/12/89     2000.00
124          8811            Savings       01/05/92     1000.00
125          4422            Checking      12/01/94     6000.00
125          4433            Savings       12/01/94     9000.00
127          3322            Savings       08/22/94     500.00
127          1122            Checking      11/13/88     800.00

• Business Rules
  Business rules allow us to specify constraints on what data can appear in tables and what operations can be performed on data in tables. For example:
  1. An account balance can never be negative.
  2. A Customer can not be deleted if they have an existing (open) account.
  3. Money can only be transferred from a "Savings" account to a "Checking" account.
  4. Savings accounts with less than a $500 balance incur a service charge.

  How do we enforce business rules?
  o Constraints on the database
  o Applications

Entity Relationship Modeling
• Entity Relationship Modeling: A set of constructs used to interpret, specify and document logical data requirements for database processing systems.

• E-R Models are Conceptual Models of the database. They can not be directly implemented in a database.
• Many variations of E-R Modeling used in practice. Mainly differences in notation, symbols used to represent the 4 main constructs.

E-R Modeling Constructs
• E-R Modeling Constructs are: Entity, Relationship, Attributes, Identifiers
• It is important to get used to this terminology and to be able to use it at the appropriate time. For example, in the ER Model, we do not refer to tables. Here we call them entities.
• Entity: Some identifiable object relevant to the system being built. Examples of Entities are:
    EMPLOYEE
    CUSTOMER
    ORGANIZATION
    PART
    INGREDIENT
    PURCHASE ORDER
    CUSTOMER ORDER
    PRODUCT
  An instance of an entity is like a specific example:
    Bill Gates is an Employee of Microsoft
    SPAM is a Product
    Greenpeace is an Organization
    Flour is an ingredient
• Attribute: A characteristic of an Entity. Properties used to distinguish one entity instance from another.
  Attributes of entity EMPLOYEE might include:
    EmployeeID
    Social Security Number
    First Name
    Last Name
    Street Address
    City
    State
    ZipCode
    Date Hired
    Health Benefits Plan
  Attributes of entity PRODUCT might include:
    ProductID
    Product_Description
    Weight
    Size
    Cost

  Exercise: Come up with a list of attributes for each of the entities above.

• Identifier: A special attribute used to identify a specific instance of an entity.
  o Typically we look for unique identifiers:
    Social Security Number uniquely identifies an EMPLOYEE
    CustomerID uniquely identifies a CUSTOMER
  o We can also use two attributes to indicate an identifier:
    ORDER_NUMBER and LINE_ITEM uniquely identify an item on an order.

  Exercise: Choose one of your attributes as the identifier for each of the entities above.

• Relationship: An association between two entities. Also called HAS-A relationship.
  o Examples:
    A CUSTOMER places a CUSTOMER ORDER
    An EMPLOYEE takes a CUSTOMER ORDER
    A STUDENT enrolls in a COURSE
    A COURSE is taught by a FACULTY MEMBER
  o Relationships are typically given names.
  o A relationship can include one or more entities
  o The degree of a relationship is the number of Entities that participate in the relationship.
  o Relationships of degree 2 are called binary relationships. Most relationships in databases are binary.
  o Relationship Cardinality refers to the number of entity instances involved in the relationship. For example:
    1:N "One to Many": one CUSTOMER may place many CUSTOMER ORDERS
    N:M "Many to Many": many STUDENTS may sign up for many CLASSES
    1:1 "One to One": one EMPLOYEE receives one PAYCHECK; one SALESPERSON is assigned one COMPANY_CAR
  o Beware of 1:1 relationships. The two entities involved might be coalesced into one.
  o Beware of N:M relationships. Typically split these into two 1:N relationships with an intersection entity.
  o Participation of instances in a relationship may be mandatory or optional. For example:
    one CUSTOMER may place many CUSTOMER ORDERS
    one EMPLOYEE must fill out one or more PAY SHEETS
    This is also called "minimal cardinality" or the "optionality" of a relationship.

E-R Diagrams
• The most common way to represent the E-R constructs is by using a diagram

• There are a wide variety of notations for E-R Diagrams. Most of the differences concern how relationships are specified and how attributes are shown.
• In almost all variations, entities are depicted as rectangles with either pointed or rounded corners. The entity name appears inside.
• Relationships can be displayed as diamonds (see below) or can be simply line segments between two entities.
• For Relationships, need to convey: Relationship name, degree, cardinality, optionality (minimal cardinality)
• Here we will give examples from 4 variations: The Kroenke textbook, Elmasri/Navathe textbook, Oracle Designer/2000 and Visible Analyst.

Variation One - What the Kroenke book uses
• Relationship Name: Displayed just outside of the relationship diamond.
• Degree: Shown by line segments between the relationship diamond and 2 or more entities.
• Cardinality: Displayed inside the relationship diamond.
• Optionality: Mandatory participation indicated by an intersecting hash mark made perpendicular to the relationship line segment. Optional participation indicated by a 0 intersecting the relationship line segment.

For this diagram:
• An ORDER must be placed by one and only one CUSTOMER.
• A CUSTOMER may place zero or more ORDERS.
• An ITEM must have one and only one ORDER.
• An ORDER may have zero or more ITEMS.
These are admittedly clumsy, but you get the point.

Variation Two - Elmasri/Navathe Book
• Relationship Name: Displayed just inside the relationship diamond.
• Degree: Shown by line segments between the relationship diamond and 2 or more entities.
• Cardinality: Displayed between the participating entity and the relationship diamond next to the relationship line.
• Optionality: Mandatory participation indicated by a double relationship line. Optional participation indicated by a single relationship line.

Variation Three - Oracle Designer/2000 CASE
• In Oracle Corporation's Designer/2000, relationships are expressed in a rigid sentence format.
• Relationship Names: Are expressed as a verb phrase starting with "be". There are two phrases, one for each direction of the relationship. The "be" is mandatory, making the verb difficult to get right. This phrase is then written along the line segments for the relationship.
• Degree: Shown by line segments between any two entities. Relationship diamonds are not used. As such, 3 way relationships as described in the Kroenke book can not exist.
• Cardinality: Single participation ("1" in the previous example) is indicated by a single line segment. Multiple participation ("N") is indicated by crow's feet.
• Optionality: Mandatory participation is indicated by a solid relationship line segment. Optional participation is indicated by a dotted line segment.
• Split up the cardinality. For example: An ORDER must be placed by one and only one CUSTOMER.

• One ORDER must be placed by one and only one CUSTOMER.
• One CUSTOMER may be placing zero or more ORDERS.
• One ORDER may be made up of zero or more ITEMS.
• One ITEM must be an item on one and only one ORDER.
There are a set of tools that can print these "relationship sentences".

Variation Four - Visible Analyst
• Visible Analyst Workbench (VAW) uses the rounded box to show an Attributive Entity - one that depends on the existence of a fundamental entity (noted by just the rectangle).
• The relationships use the following symbols:
  o For cardinality, the crow's feet are used to show a "Many" side of a relationship.
  o A single line shows a "One" side of the relationship.
  o Optional participation is shown with an open circle. Thus in the above diagram, a Customer May place one or more Orders.
  o Mandatory participation is shown with two hash marks. Thus in the above diagram, an Order Must be placed by one and only one Customer.

Variation Five - Sybase PowerDesigner

This is not an Entity Relationship Diagram!
It is true: The "Relationships" screen in MS Access is NOT an Entity Relationship diagramming tool. This is a "physical" level diagram of how the tables are actually created.

Displaying Attributes

• Technically, an Entity-Relationship diagram should show only entities and their relationships.
• Consider: Entity-Relationship-Attribute (ERA) model.
• Two main ways to display attributes associated with an entity.
  1. Attributes appear in ovals attached to the entity. Gets messy.
  2. List attributes inside of the entity box.

Weak Entities
• Broad definition. Weak Entity: An entity that depends on another for its existence.
• Examples of strong entities:
    People, Employees, Customers, Students, Clients, Vendors
    Products, Parts, Services, Resources, Materials
    Banks
• ID Dependent Entity: A weak entity that includes the identifier of the related strong entity.
• Examples of ID Dependent entities: Dependents (of employees), Bank Branches (of Banks).
• Elmasri/Navathe definition: Weak entity: Entity types that do not have key attributes of their own.
• ID Dependent entities are sometimes shown with curved boxes as in the Visible Analyst ER example. Note that an ITEM can not exist by itself. It must be identified with a specific Order.
• The Elmasri/Navathe notation shows the ID Dependent entity with a double box. The "identifying relationship" (from the strong entity to the weak entity) is shown with a double diamond.
• Final note: ID Dependent entities will always result in relations (and later on tables) with composite keys.

Subtype Entities
• Attributes of two or more Entities may overlap significantly but not completely.
• Consider:
    Phone Call (Source#, Destination#, Time of day, Duration)
    LongDistance Call (Source#, Destination#, Time of day, Duration, Long distance Carrier)
    Cell Phone Call (Source#, Destination#, Time of day, LandTime, AirTime)
• One approach would be to put all of the attributes into a single entity.
• Second approach, put common attributes into a parent or supertype entity and then have 3 subtype entities.

• Relationship is called an IS-A relationship.
• Only one subtype entity can participate in an instance.

The above diagram uses the Oracle Designer/2000 symbols for Supertype/Subtype. Below is the same diagram drawn using E-R symbols from the Elmasri/Navathe book. The d in the circle indicates the subtype entity is distinct. As before, the double line between the Call entity and the d in the circle indicates the relationship is mandatory.

The Relational Model
• Recall, the Relational Model consists of the elements: relations, which are made up of attributes.
• A relation is a set of columns (attributes) with values for each attribute such that:
  1. Each column (attribute) value must be a single value only.
  2. All values for a given column (attribute) must be of the same type.
  3. Each column (attribute) name must be unique.
  4. The order of columns is insignificant.
  5. No two rows (tuples) in a relation can be identical.
  6. The order of the rows (tuples) is insignificant.
• From our discussion of E-R Modelling, we know that an Entity typically corresponds to a relation and that the Entity's attributes become attributes of the relation.
• We also discussed how, depending on the relationships between entities, copies of attributes (the identifiers) were placed in related relations.

The process we are following is:
1. Gather user/business requirements.
2. Develop the E-R Model (shown as an E-R Diagram) based on the user/business requirements.
3. Convert the E-R Model to a set of relations in the relational model
4. Normalize the relations to remove any anomalies (***).
5. Implement the database by creating a table for each normalized relation.

Functional Dependencies
• A Functional Dependency describes a relationship between attributes in a single relation.
• An attribute is functionally dependent on another if we can use the value of one attribute to determine the value of another.
• Example: Employee_Name is functionally dependent on Social_Security_Number because Social_Security_Number can be used to determine the value of Employee_Name.
• We use the symbol -> to indicate a functional dependency. -> is read "functionally determines".
    Student_ID -> Student_Major
    Student_ID, Course#, Semester# -> Grade
    SKU -> Compact_Disk_Title, Artist
    Model, Options, Tax -> Car_Price
    Course_Number, Section -> Professor, Classroom, Number of Students
• The attributes listed on the left hand side of the -> are called determinants.
• One can read A -> B as, "A determines B".

Keys and Uniqueness
• Key: One or more attributes that uniquely identify a tuple (row) in a relation.
• The selection of keys will depend on the particular application being considered.
• Users can offer some guidance as to what would make an appropriate key. Also this is pretty much an art as opposed to an exact science.
• Recall that no two tuples (rows) in a relation should have exactly the same values, thus a candidate key would consist of all of the attributes in a relation.
• A key functionally determines a tuple (row).
• Not all determinants are keys.
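
The definitions of functional dependency and key above can be checked mechanically against sample data. The sketch below is only an illustration: the helper names (holds, is_key) and the three enrollment rows are hypothetical, and a check against data can refute a dependency but never prove that it holds for every future row.

```python
def holds(rows, determinant, dependent):
    """True if each combination of determinant values maps to exactly one
    combination of dependent values in these rows."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in determinant)
        right = tuple(row[a] for a in dependent)
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_key(rows, attrs):
    """True if no two rows agree on all of attrs (checked on this data only)."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))


# Hypothetical sample rows, invented for this illustration.
enrollments = [
    {"Student_ID": 1, "Student_Major": "CIS", "Course": "DB101", "Semester": "F94", "Grade": "A"},
    {"Student_ID": 1, "Student_Major": "CIS", "Course": "OS200", "Semester": "F94", "Grade": "B"},
    {"Student_ID": 2, "Student_Major": "Accounting", "Course": "DB101", "Semester": "F94", "Grade": "B"},
]

print(holds(enrollments, ["Student_ID"], ["Student_Major"]))                  # Student_ID -> Student_Major
print(holds(enrollments, ["Student_ID"], ["Grade"]))                          # Student_ID alone does not determine Grade
print(holds(enrollments, ["Student_ID", "Course", "Semester"], ["Grade"]))    # the composite determinant does
print(is_key(enrollments, ["Student_ID"]))                                    # a determinant that is not a key
print(is_key(enrollments, ["Student_ID", "Course", "Semester"]))
```

Student_ID is a determinant (it determines Student_Major) yet it is not a key of this relation, which is the point of "Not all determinants are keys".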

Modification Anomalies
• Once our E-R model has been converted into relations, we may find that some relations are not properly specified. There can be a number of problems:
  o Deletion Anomaly: Deleting a tuple (row) from a relation results in some related information (from another entity) being lost.
  o Insertion Anomaly: Inserting a tuple (row) into a relation requires we have information from two or more entities - this situation might not be feasible.
• Here is a quick example: A company has a Purchase order form:
• Our dutiful consultant creates the E-R Model:

LINE_ITEMS (PO_Number, ItemNum, PartNum, Description, Price, Qty)
PO_HEADER (PO_Number, PODate, Vendor, Ship_To, ...)

Consider some sample data for the LINE_ITEMS relation:

PO_Number  ItemNum  PartNum  Description  Price  Qty
O101       I01      P99      Plate        $3.00  7
O101       I02      P98      Cup          $1.00  11
O101       I03      P77      Bowl         $2.00  6
O102       I01      P99      Plate        $3.00  5
O102       I02      P77      Bowl         $2.00  5
O103       I01      P33      Fork         $2.50  8

• What are some of the problems with this relation?
• What happens when we delete item 2 from Order O101?
• These problems occur because the relation in question contains data about 2 or more themes.
• Typical way to solve these anomalies is to split the relation into two or more relations - a process called Normalization. Consider the performance impact.

Normalization

• Relations can fall into one or more categories (or classes) called Normal Forms
• Normal Form: A class of relations free from a certain set of modification anomalies.
• Normal forms are given names such as:
  o First normal form (1NF)
  o Second normal form (2NF)
  o Third normal form (3NF)
  o Boyce-Codd normal form (BCNF)
  o Fourth normal form (4NF)
  o Fifth normal form (5NF)
  o Domain-Key normal form (DK/NF)
• These forms are cumulative. A relation in Third normal form is also in 2NF and 1NF.

First Normal Form (1NF)
• A relation is in first normal form if it meets the definition of a relation:
  1. Each column (attribute) value must be a single value only.
  2. All values for a given column (attribute) must be of the same type.
  3. Each column (attribute) name must be unique.
  4. The order of columns is insignificant.
  5. No two rows (tuples) in a relation can be identical.
  6. The order of the rows (tuples) is insignificant.
• If you have a key defined for the relation, then you can meet the unique row requirement.
• Example relation in 1NF: STOCKS (Company, Symbol, Date, Close_Price)

Company   Symbol  Date      Close Price
IBM       IBM     01/05/94  101.00
IBM       IBM     01/06/94  100.50
IBM       IBM     01/07/94  102.00
Netscape  NETS    01/05/94  33.00
Netscape  NETS    01/06/94  112.00

Second Normal Form (2NF)
• A relation is in second normal form (2NF) if all of its non-key attributes are dependent on all of the key.
• Relations that have a single attribute for a key are automatically in 2NF.
• This is one reason why we often use artificial identifiers as keys.
• In the example below, Close Price is dependent on Company, Date and on Symbol, Date
• The following example relation is not in 2NF: STOCKS (Company, Symbol, Headquarters, Date, Close_Price)

Company   Symbol  Headquarters   Date      Close Price
IBM       IBM     Armonk, NY     01/05/94  101.00
IBM       IBM     Armonk, NY     01/06/94  100.50
IBM       IBM     Armonk, NY     01/07/94  102.00
Netscape  NETS    Sunnyvale, CA  01/05/94  33.00
Netscape  NETS    Sunnyvale, CA  01/06/94  112.00

• Company, Date -> Close Price
• Company -> Symbol, Headquarters
• Symbol, Date -> Close Price
• Symbol -> Company, Headquarters
• Consider that Company, Date -> Close Price. So we might use Company, Date as our key. However: Company -> Headquarters. This violates the rule for 2NF. Also, consider the insertion and deletion anomalies.
• One Solution: Split this up into two relations:
    COMPANY (Company, Symbol, Headquarters)
    STOCKS (Symbol, Date, Close_Price)

Company   Symbol  Headquarters
IBM       IBM     Armonk, NY
Netscape  NETS    Sunnyvale, CA

• Company -> Symbol, Headquarters
• Symbol -> Company, Headquarters

Symbol  Date      Close Price
IBM     01/05/94  101.00
IBM     01/06/94  100.50
IBM     01/07/94  102.00
NETS    01/05/94  33.00
NETS    01/06/94  112.00

• Symbol, Date -> Close Price

Third Normal Form (3NF)
• A relation is in third normal form (3NF) if it is in second normal form and it contains no transitive dependencies.

• Consider relation R containing attributes A, B and C. If A -> B and B -> C then A -> C
• Transitive Dependency: Three attributes with the above dependencies.
• Example: At CUNY:
    Course_Code -> Course_Num, Section
    Course_Num, Section -> Classroom, Professor
• Example: At Rutgers:
    Course_Index_Num -> Course_Num, Section
    Course_Num, Section -> Classroom, Professor
• Example:

Company  County  Tax Rate
IBM      Putnam  28%
AT&T     Bergen  26%

• Company -> County and County -> Tax Rate, thus Company -> Tax Rate
• What happens if we remove AT&T? We lose information about 2 different themes.
• Split this up into two relations:

Company  County
IBM      Putnam
AT&T     Bergen

• Company -> County

County  Tax Rate
Putnam  28%
Bergen  26%

• County -> Tax Rate

Boyce-Codd Normal Form (BCNF)
• A relation is in BCNF if every determinant is a candidate key.
• Recall that not all determinants are keys.
• Those determinants that are keys we initially call candidate keys.
• Eventually, we select a single candidate key to be the primary key for the relation.
• Consider the following example:
    Funds consist of one or more Investment Types.
    Funds are managed by one or more Managers.
    Investment Types can have one or more Managers.
    Managers only manage one type of investment.

FundID  InvestmentType   Manager
99      Common Stock     Smith
99      Municipal Bonds  Jones
33      Common Stock     Green
22      Growth Stocks    Brown
11      Common Stock     Smith

• FundID, InvestmentType -> Manager
• FundID, Manager -> InvestmentType
• Manager -> InvestmentType
• In this case, the combination FundID and InvestmentType form a candidate key because we can use FundID, InvestmentType to uniquely identify a tuple in the relation.
• Similarly, the combination FundID and Manager also form a candidate key because we can use FundID, Manager to uniquely identify a tuple.
• Manager by itself is not a candidate key because we cannot use Manager alone to uniquely identify a tuple in the relation.
• Is this relation R(FundID, InvestmentType, Manager) in 1NF, 2NF or 3NF? Given we pick FundID, InvestmentType as the Primary Key:
    1NF for sure.
    2NF because all of the non-key attributes (Manager) are dependent on all of the key.
    3NF because there are no transitive dependencies.
• Consider what happens if we delete the tuple with FundID 22. We lose the fact that Brown manages the InvestmentType "Growth Stocks."

The following are steps to normalize a relation into BCNF:
1. List all of the determinants.
2. See if each determinant can act as a key (candidate keys).
3. For any determinant that is not a candidate key, create a new relation from the functional dependency. Retain the determinant in the original relation.

For our example: Rorig(FundID, InvestmentType, Manager)
1. The determinants are:
     FundID, InvestmentType
     FundID, Manager
     Manager
2. Which determinants can act as keys?
     FundID, InvestmentType   YES
     FundID, Manager          YES
     Manager                  NO
3. Create a new relation from the functional dependency:
     Rnew(Manager, InvestmentType)
     Rorig(FundID, Manager)
   In this last step, we have retained the determinant "Manager" in the original relation Rorig.
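
The BCNF steps above can be traced on the fund example with a few lines of Python. This is a hedged sketch: the variable and function names are my own, and the rows follow the example table as I read it (in particular, pairing Jones and Municipal Bonds with fund 99 is an assumption about the original layout).

```python
# R_orig(FundID, InvestmentType, Manager) from the example.
R_ORIG = [
    (99, "Common Stock", "Smith"),
    (99, "Municipal Bonds", "Jones"),
    (33, "Common Stock", "Green"),
    (22, "Growth Stocks", "Brown"),
    (11, "Common Stock", "Smith"),
]

# Step 3: Manager is a determinant but not a candidate key, so the dependency
# Manager -> InvestmentType moves into its own relation...
r_new = sorted({(manager, inv_type) for _, inv_type, manager in R_ORIG})
# ...and the determinant Manager is retained in what is left of the original.
r_funds = sorted({(fund, manager) for fund, _, manager in R_ORIG})


def rejoin(funds, manager_types):
    """Join the two relations back together on Manager."""
    type_of = dict(manager_types)
    return sorted((fund, type_of[manager], manager) for fund, manager in funds)


# The decomposition loses nothing: joining on Manager rebuilds R_orig.
print(rejoin(r_funds, r_new) == sorted(R_ORIG))

# Deleting fund 22 no longer erases the fact that Brown manages Growth Stocks.
r_funds_after_delete = [(fund, manager) for fund, manager in r_funds if fund != 22]
print(("Brown", "Growth Stocks") in r_new)
```

After the split, every determinant in each relation is a candidate key of that relation, and the deletion anomaly described above disappears.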

Fourth Normal Form (4NF)
• A relation is in fourth normal form if it is in BCNF and it contains no multivalued dependencies.
• Multivalued Dependency: A type of functional dependency where the determinant can determine more than one value.
• More formally, there are 3 criteria:
  1. There must be at least 3 attributes in the relation, call them A, B, and C, for example.
  2. Given A, one can determine multiple values of B. Given A, one can determine multiple values of C.
  3. B and C are independent of one another.
• Book example: Student has one or more majors. Student participates in one or more activities.

    StudentID   Major        Activities
    100         CIS          Baseball
    100         CIS          Volleyball
    100         Accounting   Baseball
    100         Accounting   Volleyball
    200         Marketing    Swimming

• StudentID ->-> Major
• StudentID ->-> Activities

    Portfolio ID   Stock Fund            Bond Fund
    999            Janus Fund            Municipal Bonds
    999            Janus Fund            Dreyfus Short-Intermediate Municipal Bond Fund
    999            Scudder Global Fund   Municipal Bonds
    999            Scudder Global Fund   Dreyfus Short-Intermediate Municipal Bond Fund
    888            Kaufmann Fund         T. Rowe Price Emerging Markets Bond Fund

• A few characteristics:
  1. No regular functional dependencies
  2. All three attributes taken together form the key.
  3. Latter two attributes are independent of one another.
  4. Insertion anomaly: Cannot add a stock fund without adding a bond fund (NULL Value). Must always maintain the combinations to preserve the meaning.
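One way to see the multivalued dependencies in the portfolio table is to project it onto (Portfolio ID, Stock Fund) and (Portfolio ID, Bond Fund) and join the two projections back together: when both multivalued dependencies hold, the join gives back exactly the original tuples. The sketch below assumes Python sets of tuples as a stand-in for relations; join_of_projections is a name made up for this illustration.

```python
# Portfolio relation from the 4NF example: (Portfolio ID, Stock Fund, Bond Fund).
PORTFOLIO = {
    (999, "Janus Fund", "Municipal Bonds"),
    (999, "Janus Fund", "Dreyfus Short-Intermediate Municipal Bond Fund"),
    (999, "Scudder Global Fund", "Municipal Bonds"),
    (999, "Scudder Global Fund", "Dreyfus Short-Intermediate Municipal Bond Fund"),
    (888, "Kaufmann Fund", "T. Rowe Price Emerging Markets Bond Fund"),
}


def join_of_projections(relation):
    """Project onto (id, stock) and (id, bond), then join the two on id."""
    stock = {(pid, s) for pid, s, _ in relation}
    bond = {(pid, b) for pid, _, b in relation}
    return {(p1, s, b) for p1, s in stock for p2, b in bond if p1 == p2}


# Every stock fund is paired with every bond fund of the same portfolio,
# so the join of the projections reproduces the table exactly.
mvd_holds = join_of_projections(PORTFOLIO) == PORTFOLIO

# Drop a single combination and the pairing is no longer complete:
broken = PORTFOLIO - {(999, "Janus Fund", "Municipal Bonds")}
combinations_lost = join_of_projections(broken) != broken

print(mvd_holds, combinations_lost)
```

The second check mirrors characteristic 4: the relation only keeps its meaning while all stock fund / bond fund combinations for a portfolio are maintained.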

• Stock Fund and Bond Fund form a multivalued dependency on Portfolio ID.
• PortfolioID ->-> Stock Fund
• PortfolioID ->-> Bond Fund
• Resolution: Split into two tables with the common key:

    Portfolio ID   Stock Fund
    999            Janus Fund
    999            Scudder Global Fund
    888            Kaufmann Fund

    Portfolio ID   Bond Fund
    999            Municipal Bonds
    999            Dreyfus Short-Intermediate Municipal Bond Fund
    888            T. Rowe Price Emerging Markets Bond Fund

Fifth Normal Form (5NF)
• There are certain conditions under which after decomposing a relation, it cannot be reassembled back into its original form.
• We don't consider these issues here.

Domain Key Normal Form (DK/NF)
• A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains.
• Constraint: A rule governing static values of an attribute such that we can determine if this constraint is True or False. Examples:
  1. Functional Dependencies
  2. Multivalued Dependencies
  3. Inter-relation rules
  4. Intra-relation rules
  However: Does not include time dependent constraints.
• Key: Unique identifier of a tuple.
• Domain: The physical (data type, size, NULL values) and semantic (logical) description of what values an attribute can hold.
• There is no known algorithm for converting a relation directly into DK/NF.

De-Normalization
• Consider the following relation: CUSTOMER (CustomerID, Name, Address, City, State, Zip)

• This relation is not in DK/NF because it contains a functional dependency not implied by the key:
    Zip -> City, State
• We can normalize this into DK/NF by splitting the CUSTOMER relation into two:
    CUSTOMER (CustomerID, Name, Address, Zip)
    CODES (Zip, City, State)
• We may pay a performance penalty - each customer address lookup requires we look in two relations (tables).
• In such cases, we may de-normalize the relations to achieve a performance improvement.
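The trade-off can be made concrete with a small sketch using Python's built-in sqlite3 module. Everything specific here is an assumption for illustration: the lower-case table and column names, the customer_flat copy, and the single made-up sample row. The normalized lookup needs a join across two tables; the de-normalized copy answers the same question from one table, at the cost of repeating City and State in every customer row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE codes (zip TEXT PRIMARY KEY, city TEXT, state TEXT);
    CREATE TABLE customer (
        customer_id TEXT PRIMARY KEY,
        name        TEXT,
        address     TEXT,
        zip         TEXT REFERENCES codes (zip)
    );
    -- Hypothetical sample rows, for illustration only.
    INSERT INTO codes VALUES ('00001', 'Springfield', 'NJ');
    INSERT INTO customer VALUES ('C1', 'Pat Doe', '1 Main St.', '00001');
    -- De-normalized copy: city and state are stored with each customer.
    CREATE TABLE customer_flat AS
        SELECT c.customer_id, c.name, c.address, z.city, z.state, c.zip
        FROM customer c JOIN codes z ON c.zip = z.zip;
""")

# Normalized design: the address lookup has to visit both relations.
normalized = conn.execute(
    "SELECT c.name, c.address, z.city, z.state, c.zip "
    "FROM customer c JOIN codes z ON c.zip = z.zip "
    "WHERE c.customer_id = ?",
    ("C1",),
).fetchone()

# De-normalized design: the same lookup reads a single relation.
flat = conn.execute(
    "SELECT name, address, city, state, zip "
    "FROM customer_flat WHERE customer_id = ?",
    ("C1",),
).fetchone()

print(normalized == flat)
```

Both queries return the same address; the de-normalized table simply brings back the Zip -> City, State redundancy that normalization removed.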

All-in-One Example
Many of you asked for a "complete" example that would run through all of the normal forms from beginning to end using the same tables. This is tough to do, but here is an attempt:

Example relation: EMPLOYEE (Name, Project, Task, Office, Floor, Phone)

Example Data:

    Name   Project   Task   Office   Floor   Phone
    Bill   100X      T1     400      4       1400
    Bill   100X      T2     400      4       1400
    Bill   200Y      T1     400      4       1400
    Bill   200Y      T2     400      4       1400
    Sue    100X      T33    442      4       1442
    Sue    200Y      T33    442      4       1442
    Sue    300Z      T33    442      4       1442
    Ed     100X      T2     588      5       1588

• Name is the employee's name.
• Project is the project they are working on. Bill is working on two different projects, Sue is working on 3.
• Task is the current task being worked on. Bill is now working on Tasks T1 and T2. Note that Tasks are independent of the project. Examples of a task might be faxing a memo or holding a meeting.
• Office is the office number for the employee. Bill works in office number 400.
• Floor is the floor on which the office is located.
• Phone is the phone extension. Note this is associated with the phone in the given office.


First Normal Form
• Assume the key is Name, Project, Task.
• Is EMPLOYEE in 1NF ?

Second Normal Form
• List all of the functional dependencies for EMPLOYEE.
• Are all of the non-key attributes dependent on all of the key ?
• Split into two relations EMPLOYEE_PROJECT_TASK and EMPLOYEE_OFFICE_PHONE.

EMPLOYEE_PROJECT_TASK (Name, Project, Task)

    Name   Project   Task
    Bill   100X      T1
    Bill   100X      T2
    Bill   200Y      T1
    Bill   200Y      T2
    Sue    100X      T33
    Sue    200Y      T33
    Sue    300Z      T33
    Ed     100X      T2

EMPLOYEE_OFFICE_PHONE (Name, Office, Floor, Phone)

    Name   Office   Floor   Phone
    Bill   400      4       1400
    Sue    442      4       1442
    Ed     588      5       1588

Third Normal Form
• Assume each office has exactly one phone number.
• Are there any transitive dependencies ?
• Where are the modification anomalies in EMPLOYEE_OFFICE_PHONE ?
• Split EMPLOYEE_OFFICE_PHONE.

EMPLOYEE_PROJECT_TASK (Name, Project, Task)

    Name   Project   Task
    Bill   100X      T1
    Bill   100X      T2
    Bill   200Y      T1
    Bill   200Y      T2
    Sue    100X      T33
    Sue    200Y      T33
    Sue    300Z      T33
    Ed     100X      T2

EMPLOYEE_OFFICE (Name, Office, Floor)

    Name   Office   Floor
    Bill   400      4
    Sue    442      4
    Ed     588      5

EMPLOYEE_PHONE (Office, Phone)

    Office   Phone
    400      1400
    442      1442
    588      1588

Boyce-Codd Normal Form
• List all of the functional dependencies for EMPLOYEE_PROJECT_TASK, EMPLOYEE_OFFICE and EMPLOYEE_PHONE.
• Look at the determinants. Are all determinants candidate keys ?

Fourth Normal Form
• Are there any multivalued dependencies ?
• What are the modification anomalies ?
• Split EMPLOYEE_PROJECT_TASK.

EMPLOYEE_PROJECT (Name, Project)

    Name   Project
    Bill   100X
    Bill   200Y
    Sue    100X
    Sue    200Y
    Sue    300Z
    Ed     100X

EMPLOYEE_TASK (Name, Task)

    Name   Task
    Bill   T1
    Bill   T2
    Sue    T33
    Ed     T2

EMPLOYEE_OFFICE (Name, Office, Floor)

    Name   Office   Floor
    Bill   400      4
    Sue    442      4
    Ed     588      5

EMPLOYEE_PHONE (Office, Phone)

    Office   Phone
    400      1400
    442      1442
    588      1588

At each step of the process, we did the following:
1. Write out the relation
2. (optionally) Write out some example data.
3. Write out all of the functional dependencies
4. Starting with 1NF, go through each normal form and state why the relation is in the given normal form.
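A useful sanity check on a decomposition like this one is that joining the final relations on their common attributes gives back exactly the original EMPLOYEE tuples, so no information was lost or invented. The sketch below assumes Python sets of tuples as relations; the variable names are chosen for this illustration only.

```python
# Original EMPLOYEE tuples: (Name, Project, Task, Office, Floor, Phone).
EMPLOYEE = {
    ("Bill", "100X", "T1", 400, 4, 1400),
    ("Bill", "100X", "T2", 400, 4, 1400),
    ("Bill", "200Y", "T1", 400, 4, 1400),
    ("Bill", "200Y", "T2", 400, 4, 1400),
    ("Sue", "100X", "T33", 442, 4, 1442),
    ("Sue", "200Y", "T33", 442, 4, 1442),
    ("Sue", "300Z", "T33", 442, 4, 1442),
    ("Ed", "100X", "T2", 588, 5, 1588),
}

# The four relations reached at the end of the example, built as projections.
emp_project = {(name, project) for name, project, task, office, floor, phone in EMPLOYEE}
emp_task = {(name, task) for name, project, task, office, floor, phone in EMPLOYEE}
emp_office = {(name, office, floor) for name, project, task, office, floor, phone in EMPLOYEE}
emp_phone = {(office, phone) for name, project, task, office, floor, phone in EMPLOYEE}

# Natural join of the four relations on Name and Office.
rejoined = {
    (n, p, t, o, f, ph)
    for (n, p) in emp_project
    for (n2, t) in emp_task if n2 == n
    for (n3, o, f) in emp_office if n3 == n
    for (o2, ph) in emp_phone if o2 == o
}

lossless = rejoined == EMPLOYEE
print(len(emp_project), len(emp_task), len(emp_office), len(emp_phone), lossless)
```

The join is only lossless here because, in the sample data, every project of an employee is paired with every task of that employee, which is exactly the multivalued dependency removed in the 4NF step.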

Another short example
Consider the following example of normalization for a CUSTOMER relation.


Relation Name
    CUSTOMER (CustomerID, Name, Street, City, State, Zip, Phone)

Example Data

    CustomerID   Name         Street          City            State   Zip     Phone
    C101         Bill Smith   123 First St.   New Brunswick   NJ      07101   732-555-1212
    C102         Mary Green   11 Birch St.    Old Bridge      NJ      07066   908-555-1212

Functional Dependencies
    CustomerID -> Name, Street, City, State, Zip, Phone
    Zip -> City, State

Normalization
• 1NF: Meets the definition of a relation.
• 2NF: All non key attributes are dependent on all of the key.
• 3NF: CustomerID -> Zip and Zip -> City, State form a transitive dependency, so CUSTOMER is not in 3NF.
• BCNF: Relation CUSTOMER is not in BCNF either, because one of the determinants (Zip) cannot act as a key for the entire relation.
  Solution: Split CUSTOMER into two relations:
    CUSTOMER (CustomerID, Name, Street, Zip, Phone)
    ZIPCODES (Zip, City, State)
  Check both CUSTOMER and ZIPCODES to ensure they are both in 1NF up to BCNF.
• 4NF: There are no multi-valued dependencies in either CUSTOMER or ZIPCODES.

As a final step, consider de-normalization.

Relational Algebra

    Elmasri/Navathe (3rd ed.)                Chapter 7
    Kroenke (7th ed.)                        Chapter 8
    Connolly/Begg (3rd ed.)                  Chapter 4
    Rob/Coronel (5th ed.)                    N/A
    Hoffer, Prescott & McFadden (6th ed.)    N/A
    Mata-Toledo/Cushman Schaum's Outlines    Ch. 2

• Recall, the Relational Model consists of the elements: relations, which are made up of attributes.
• A relation is a set of attributes with values for each attribute such that:
  1. Each attribute value must be a single value only (atomic).
  2. All values for a given attribute must be of the same type (or domain).
  3. Each attribute name must be unique.
  4. The order of attributes is insignificant.
  5. No two rows (tuples) in a relation can be identical.
  6. The order of the rows (tuples) is insignificant.
• Relational Algebra is a collection of operations on Relations.
• Relations are operands and the result of an operation is another relation.
• Two main collections of relational operators:
  1. Set theory operations: Union, Intersection, Difference and Cartesian product.
  2. Specific Relational Operations: Selection, Projection, Join, Division

Set Theoretic Operations
Consider the following relations R and S

R
    First   Last    Age
    Bill    Smith   22
    Sally   Green   28
    Mary    Keen    23
    Tony    Jones   32

S
    First     Last      Age
    Forrest   Gump      36
    Sally     Green     28
    DonJuan   DeMarco   27

• Union: R ∪ S
  Result: Relation with tuples from R and S with duplicates removed.
• Difference: R - S
  Result: Relation with tuples from R but not from S

• Intersection: R ∩ S
  Result: Relation with tuples that appear in both R and S.

R ∪ S
    First     Last      Age
    Bill      Smith     22
    Sally     Green     28
    Mary      Keen      23
    Tony      Jones     32
    Forrest   Gump      36
    DonJuan   DeMarco   27

R - S
    First   Last    Age
    Bill    Smith   22
    Mary    Keen    23
    Tony    Jones   32

R ∩ S
    First   Last    Age
    Sally   Green   28

Union Compatible Relations
• Attributes of relations need not be identical to perform union, intersection and difference operations.
• However, they must have the same number of attributes or arity and the domains for corresponding attributes must be identical.
• Domain is the datatype and size of an attribute.
• The degree of relation R is the number of attributes it contains.
• Definition: Two relations R and S are union compatible if and only if they have the same degree and the domains of the corresponding attributes are the same.
• Some additional properties:
  o Union, Intersection and difference operators may only be applied to Union Compatible relations.

  o Union and Intersection are commutative operations:
      R ∪ S = S ∪ R
      R ∩ S = S ∩ R
  o Difference operation is NOT commutative.
      R - S not equal S - R
  o The resulting relations may not have meaningful names for the attributes. Convention is to use the attribute names from the first relation.

Exercises
• Assume relation T

    fName     lName      Score
    William   Smith      44
    Sally     Green      28
    Mary      Kontrary   27

• Compute R ∪ T
• Compute R ∩ T
• Show that R - T is not equal to T - R

Cartesian Product
• Produce all combinations of tuples from two relations.

R
    First   Last    Age
    Bill    Smith   22
    Mary    Keen    23
    Tony    Jones   32

S
    Dinner    Dessert
    Steak     Ice Cream
    Lobster   Cheesecake

R X S
    First   Last    Age   Dinner    Dessert
    Bill    Smith   22    Steak     Ice Cream
    Bill    Smith   22    Lobster   Cheesecake
    Mary    Keen    23    Steak     Ice Cream
    Mary    Keen    23    Lobster   Cheesecake
    Tony    Jones   32    Steak     Ice Cream
    Tony    Jones   32    Lobster   Cheesecake

Selection Operator
• Selection and Projection are unary operators.
• The selection operator is sigma: σ
• The selection operation acts like a filter on a relation by returning only a certain number of tuples.
• The resulting relation will have the same degree as the original relation.
• The resulting relation may have fewer tuples than the original relation.
• The tuples to be returned are dependent on a condition that is part of the selection operator.
• σ C (R) Returns only those tuples in R that satisfy condition C
• A condition C can be made up of any combination of comparison or logical operators that operate on the attributes of R.
  o Comparison operators: =, <, >, ≤, ≥, ≠
  o Logical operators: ∧ (AND), ∨ (OR), ¬ (NOT)
    Use the Truth tables (memorize these) for logical expressions:

    AND | T  F        OR | T  F        NOT |
    ----+------       ---+------       ----+---
     T  | T  F         T | T  T         T  | F
     F  | F  F         F | T  F         F  | T
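The set theoretic operations and the selection operator map closely onto Python's built-in set type, which gives a quick way to check results by hand. This is a sketch only: tuples stand in for rows, the helper names product and select are made up here, and MEALS is the Dinner/Dessert relation from the Cartesian product example under a different name to avoid clashing with the first S.

```python
# R and S from the Set Theoretic Operations section: (First, Last, Age).
R = {("Bill", "Smith", 22), ("Sally", "Green", 28),
     ("Mary", "Keen", 23), ("Tony", "Jones", 32)}
S = {("Forrest", "Gump", 36), ("Sally", "Green", 28),
     ("DonJuan", "DeMarco", 27)}

union = R | S          # duplicates (Sally Green) are removed automatically
intersection = R & S   # tuples that appear in both R and S
difference = R - S     # tuples from R but not from S


def product(r, s):
    """Cartesian product: every tuple of r concatenated with every tuple of s."""
    return {a + b for a in r for b in s}


def select(relation, condition):
    """Selection: keep only the tuples that satisfy the condition."""
    return {t for t in relation if condition(t)}


# The R used in the Cartesian product example happens to equal R - S above.
MEALS = {("Steak", "Ice Cream"), ("Lobster", "Cheesecake")}
combos = product(difference, MEALS)

# A condition combining comparison and logical operators:
# Age >= 23 AND NOT (Last = 'Jones')
picked = select(R, lambda t: t[2] >= 23 and not t[1] == "Jones")

print(len(union), len(intersection), len(difference), len(combos), len(picked))
```

Selection never changes the degree of the relation: every tuple in picked still has the three attributes of R.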

Selection Examples
Assume the following relation EMP has the following tuples:

    Name    Office   Dept   Rank
    Smith   400      CS     Assistant
    Jones   220      Econ   Adjunct
    Green   160      Econ   Assistant
    Brown   420      CS     Associate
    Smith   500      Fin    Associate

• Select only those Employees in the CS department:
    σ Dept = 'CS' (EMP)
  Result:
    Name    Office   Dept   Rank
    Smith   400      CS     Assistant
    Brown   420      CS     Associate

• Select only those Employees with last name Smith who are assistant professors:
    σ Name = 'Smith' ∧ Rank = 'Assistant' (EMP)
  Result:
    Name    Office   Dept   Rank
    Smith   400      CS     Assistant

• Select only those Employees who are either Assistant Professors or in the Economics department:
    σ Rank = 'Assistant' ∨ Dept = 'Econ' (EMP)
  Result:
    Name    Office   Dept   Rank
    Smith   400      CS     Assistant
    Jones   220      Econ   Adjunct
    Green   160      Econ   Assistant

• Select only those Employees who are not in the CS department or Adjuncts:
    σ ¬(Rank = 'Adjunct' ∨ Dept = 'CS') (EMP)
  Result:
    Name    Office   Dept   Rank
    Green   160      Econ   Assistant
    Smith   500      Fin    Associate

Exercises
• Evaluate the following expressions:
  1. σ (Rank = 'Adjunct' [...] Dept = 'CS') (EMP)
  2. σ Rank = 'Associate' ∧ Dept = 'CS' (EMP)
  3. σ Rank = 'Associate' ( σ Dept = 'CS' (EMP) )
  4. σ Dept = 'CS' ( σ Rank = 'Associate' (EMP) )
  5. σ Age > 26 (R [...] S)
     For this expression, use R and S from the Set Theoretic Operations section above.
• Do expressions 2, 3 and 4 above all evaluate to the same thing?

Projection Operator
• Projection is also a Unary operator.
• The Projection operator is pi: π
• Projection limits the attributes that will be returned from the original relation.
• The general syntax is: π attributes R
  Where attributes is the list of attributes to be displayed and R is the relation.
• The resulting relation will have the same number of tuples as the original relation (unless there are duplicate tuples produced).
• The degree of the resulting relation may be equal to or less than that of the original relation.

Projection Examples
Assume the same EMP relation above is used.
• Project only the names and departments of the employees:
    π name, dept (EMP)
  Results:
    Name    Dept
    Smith   CS
    Jones   Econ
    Green   Econ
    Brown   CS
    Smith   Fin

Combining Selection and Projection
• The selection and projection operators can be combined to perform both operations.
• Show the names of all employees working in the CS department:
    π name ( σ Dept = 'CS' (EMP) )
  Results:
    Name
    Smith
    Brown
• Show the name and rank of those Employees who are not in the CS department or Adjuncts:
    π name, rank ( σ ¬(Rank = 'Adjunct' ∨ Dept = 'CS') (EMP) )
  Result:
    Name    Rank
    Green   Assistant
    Smith   Associate

Exercises
• Evaluate the following expressions:
  1. π name, rank ( σ (Rank = 'Adjunct' [...] Dept = 'CS') (EMP) )
  2. π fname, age ( σ Age > 22 (R [...] S) )
     For this expression, use R and S from the Set Theoretic Operations section above.
  3. σ office > 300 ( π name, rank (EMP) )
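The worked examples in this section (not the exercises) can be reproduced with two small helper functions. This is an illustrative sketch, assuming each tuple is held as a Python dictionary keyed by attribute name; select and project are names chosen here, not part of any library.

```python
# EMP relation from the Selection Examples section.
EMP = [
    {"Name": "Smith", "Office": 400, "Dept": "CS", "Rank": "Assistant"},
    {"Name": "Jones", "Office": 220, "Dept": "Econ", "Rank": "Adjunct"},
    {"Name": "Green", "Office": 160, "Dept": "Econ", "Rank": "Assistant"},
    {"Name": "Brown", "Office": 420, "Dept": "CS", "Rank": "Associate"},
    {"Name": "Smith", "Office": 500, "Dept": "Fin", "Rank": "Associate"},
]


def select(relation, condition):
    """Selection (sigma): keep the tuples for which condition is true."""
    return [t for t in relation if condition(t)]


def project(relation, attributes):
    """Projection (pi): keep only the listed attributes, dropping duplicates."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:  # a relation cannot hold identical tuples
            result.append(row)
    return result


# pi name ( sigma Dept = 'CS' (EMP) )
cs_names = project(select(EMP, lambda t: t["Dept"] == "CS"), ["Name"])

# pi name, rank ( sigma NOT (Rank = 'Adjunct' OR Dept = 'CS') (EMP) )
others = project(
    select(EMP, lambda t: not (t["Rank"] == "Adjunct" or t["Dept"] == "CS")),
    ["Name", "Rank"],
)

# Projecting on Name alone merges the two Smith tuples into one.
names_only = project(EMP, ["Name"])

print(cs_names)
print(others)
print(len(names_only))
```

Because project returns dictionaries that contain only the projected attributes, a later selection on a dropped attribute raises a KeyError, which is worth keeping in mind when ordering the two operators.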

Aggregate Functions
• We can also apply Aggregate functions to attributes and tuples:
  o SUM
  o MINIMUM
  o MAXIMUM
  o AVERAGE, MEAN, MEDIAN
  o COUNT
• Aggregate functions are sometimes written using the Projection operator or the Script F character: ℱ as in the Elmasri/Navathe book.

Aggregate Function Examples
Assume the relation EMP has the following tuples:

    Name    Office   Dept   Salary
    Smith   400      CS     45000
    Jones   220      Econ   35000
    Green   160      Econ   50000
    Brown   420      CS     65000
    Smith   500      Fin    60000

• Find the minimum Salary:
    ℱ MIN (salary) (EMP)
  Results:
    MIN(salary)
    35000
• Find the average Salary:
    ℱ AVG (salary) (EMP)
  Results:
    AVG(salary)
    51000
• Count the number of employees in the CS department:
    ℱ COUNT (name) ( σ Dept = 'CS' (EMP) )
  Results:
    COUNT(name)
    2

• Find the total payroll for the Economics department:
    ℱ SUM (salary) ( σ Dept = 'Econ' (EMP) )
  Results:
    SUM(salary)
    85000

Join Operation
• Join operations bring together two relations and combine their attributes and tuples in a specific fashion.
• The generic join operator (called the Theta Join) is: ⋈
• It takes as arguments the attributes from the two relations that are to be joined.
• For example assume we have the EMP relation as above and a separate DEPART relation with (Dept, MainOffice, Phone):
    EMP ⋈ EMP.Dept = DEPART.Dept DEPART
• The join condition can be =, <, >, ≤, ≥ or ≠
• When the join condition operator is = then we call this an Equijoin
• Note that the attributes in common are repeated.

Join Examples
Assume we have the EMP relation from above and the following DEPART relation:

    Dept   MainOffice   Phone
    CS     404          555-1212
    Econ   200          555-1234
    Fin    501          555-4321
    Hist   100          555-9876

• Find all information on every employee including their department info:
    EMP ⋈ emp.Dept = depart.Dept DEPART
  Results:
    Name    Office   EMP.Dept   Salary   DEPART.Dept   MainOffice   Phone
    Smith   400      CS         45000    CS            404          555-1212
    Jones   220      Econ       35000    Econ          200          555-1234
    Green   160      Econ       50000    Econ          200          555-1234
    Brown   420      CS         65000    CS            404          555-1212
    Smith   500      Fin        60000    Fin           501          555-4321

• Find all information on every employee including their department info where the employee works in an office numbered less than the department main office:
    EMP ⋈ (emp.office < depart.mainoffice) ∧ (emp.dept = depart.dept) DEPART
  Results:
    Name    Office   EMP.Dept   Salary   DEPART.Dept   MainOffice   Phone
    Smith   400      CS         45000    CS            404          555-1212
    Green   160      Econ       50000    Econ          200          555-1234
    Smith   500      Fin        60000    Fin           501          555-4321

Natural Join
• Notice in the generic (Theta) join operation, any attributes in common (such as dept above) are repeated.
• The Natural Join operation removes these duplicate attributes.
• The natural join operator is: *
• We can also assume using * that the join condition will be = on the two attributes in common.
• Example: EMP * DEPART
  Results:
    Name    Office   Dept   Salary   MainOffice   Phone
    Smith   400      CS     45000    404          555-1212
    Jones   220      Econ   35000    200          555-1234
    Green   160      Econ   50000    200          555-1234
    Brown   420      CS     65000    404          555-1212
    Smith   500      Fin    60000    501          555-4321

Outer Join

• In the Join operations so far, only those tuples from both relations that satisfy the join condition are included in the output relation.
• The Outer join includes other tuples as well according to a few rules.
• Three types of outer joins:
  1. Left Outer Join includes all tuples in the left hand relation and includes only those matching tuples from the right hand relation.
  2. Right Outer Join includes all tuples in the right hand relation and includes only those matching tuples from the left hand relation.
  3. Full Outer Join includes all tuples in the left hand relation and from the right hand relation.
• Examples: Assume we have two relations: PEOPLE and MENU:

PEOPLE:
    Name    Age   Food
    Alice   21    Hamburger
    Bill    24    Pizza
    Carl    23    Beer
    Dina    19    Shrimp

MENU:
    Food        Day
    Pizza       Monday
    Hamburger   Tuesday
    Chicken     Wednesday
    Pasta       Thursday
    Tacos       Friday

• Left outer join: PEOPLE ⟕ people.food = menu.food MENU

    Name    Age    people.Food   menu.Food   Day
    Alice   21     Hamburger     Hamburger   Tuesday
    Bill    24     Pizza         Pizza       Monday
    Carl    23     Beer          NULL        NULL
    Dina    19     Shrimp        NULL        NULL

• Right outer join: PEOPLE ⟖ people.food = menu.food MENU

    Name    Age    people.Food   menu.Food   Day
    Bill    24     Pizza         Pizza       Monday
    Alice   21     Hamburger     Hamburger   Tuesday
    NULL    NULL   NULL          Chicken     Wednesday
    NULL    NULL   NULL          Pasta       Thursday
    NULL    NULL   NULL          Tacos       Friday

• Full outer join: PEOPLE ⟗ people.food = menu.food MENU

    Name    Age    people.Food   menu.Food   Day
    Alice   21     Hamburger     Hamburger   Tuesday
    Bill    24     Pizza         Pizza       Monday
    Carl    23     Beer          NULL        NULL
    Dina    19     Shrimp        NULL        NULL
    NULL    NULL   NULL          Chicken     Wednesday
    NULL    NULL   NULL          Pasta       Thursday
    NULL    NULL   NULL          Tacos       Friday

Outer Union
• The Outer Union operation is applied to partially union compatible relations.
• Operator is: *
• Example: PEOPLE * MENU

    Name    Age    Food        Day
    Alice   21     Hamburger   NULL
    Bill    24     Pizza       NULL
    Carl    23     Beer        NULL
    Dina    19     Shrimp      NULL
    NULL    NULL   Hamburger   Tuesday
    NULL    NULL   Pizza       Monday
    NULL    NULL   Chicken     Wednesday
    NULL    NULL   Pasta       Thursday
    NULL    NULL   Tacos       Friday

How to make Relational Algebra Symbols in MS Word
When doing homework assignments and projects, it is very helpful to be able to type these relational algebra symbols into MS Word or another word processor. Since we mainly use MS Word or another word processor running in Microsoft Windows, we demonstrate them here. Most of the relational algebra symbols can be produced using the "Symbol" font. One way to do this is to use the Symbol choice on the Insert menu in MS Word, which opens the Symbol dialog box.

By default, the symbols displayed in the Symbol dialog box will use the Symbol font. All of the relational algebra symbols are included. Some symbols such as join and outer join are not available in this fashion. For these you can copy and paste the graphics in the MS Word file linked here.

Structured Query Language
• SQL was first implemented in IBM's System R in the late 1970's.
• SQL is the de-facto standard query language for creating and manipulating data in relational databases.
• Some minor syntax differences, but the majority of SQL is standard across MS Access, Oracle, Sybase, Informix, etc.
• SQL is either specified by a command-line tool or is embedded into a general purpose programming language such as Cobol, Pascal, "C", etc.
• SQL is a standardized language monitored by the American National Standards Institute (ANSI) as well as by the National Institute of Standards and Technology (NIST).
  o ANSI 1986 (revised 1989) - SQL 1 standard
  o ANSI 1992 - SQL 2 Standard (sometimes called SQL-92)
  o SQL 3 - adds some Object oriented concepts
• SQL has two major parts:
  1. Data Definition Language (DDL) is used to create (define) data structures such as tables, indexes, clusters
  2. Data Manipulation Language (DML) is used to store, retrieve and update data from tables.
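As a small illustration of SQL embedded in a general purpose language, and of the DDL/DML split, here is a sketch that uses Python with its built-in sqlite3 module rather than one of the products named above. The table name emp and its columns are assumptions modelled on the EMP relation from the aggregate function examples; SQLite's column types differ from those of Oracle or MS Access.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: define the data structure.
conn.execute(
    "CREATE TABLE emp (name TEXT NOT NULL, office INTEGER, dept TEXT, salary INTEGER)"
)

# DML: store data.
conn.executemany(
    "INSERT INTO emp (name, office, dept, salary) VALUES (?, ?, ?, ?)",
    [
        ("Smith", 400, "CS", 45000),
        ("Jones", 220, "Econ", 35000),
        ("Green", 160, "Econ", 50000),
        ("Brown", 420, "CS", 65000),
        ("Smith", 500, "Fin", 60000),
    ],
)

# DML: retrieve data (here with two aggregate functions).
min_salary, avg_salary = conn.execute(
    "SELECT MIN(salary), AVG(salary) FROM emp"
).fetchone()

# DML: update data, then retrieve again to see the change.
conn.execute("UPDATE emp SET salary = salary + 1000 WHERE dept = 'Fin'")
fin_salary = conn.execute("SELECT salary FROM emp WHERE dept = 'Fin'").fetchone()[0]

cs_staff = conn.execute(
    "SELECT name, salary FROM emp WHERE dept = 'CS' ORDER BY name"
).fetchall()

print(min_salary, avg_salary, fin_salary, cs_staff)
```

The host language only passes SQL text and parameters to the database; all of the defining, storing, retrieving and updating is still expressed in SQL.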

SQL Data Types
• Each implementation of SQL uses slightly different names for the data types.

Numeric Data Types
• Integers: INTEGER, INT or SMALLINT
• Real Numbers: FLOAT, REAL, DOUBLE PRECISION
• Formatted Numbers: DECIMAL(i,j), NUMERIC(i,j)

Character Strings
• Two main types: Fixed length and variable length.
• Fixed length of n characters: CHAR(n) or CHARACTER(n)
• Variable length up to size n: VARCHAR(n)

Date and Time
• Note: Implementations vary widely for these data types.
• DATE Has 10 positions in the format: YYYY-MM-DD
• TIME Has 8 positions in the format: HH:MM:SS
• TIME(i) Defines the TIME data type with an additional i positions for fractions of a second. For example: HH:MM:SS:dd
• Offset from UTC: +/- HH:MM
• TIMESTAMP
• INTERVAL Used to specify some span of time measured in days or minutes, etc.
• Other ways of expressing dates:
  o Store as characters or integers with Year, Month, Day: 19971120
  o Store as Julian date: 1997283
• Both MS Access and Oracle store date and time information together in a DATE data type.

Examples of Data Types for Some Popular RDBMS
• MS Access Examples from the MS Access Help File (c) Microsoft:

    Data Type                                 Storage Size                Range of Values
    Byte                                      1 byte                      0 to 255
    Boolean                                   2 bytes                     True or False
    Integer                                   2 bytes                     -32,768 to 32,767
    Long (long integer)                       4 bytes                     -2,147,483,648 to 2,147,483,647
    Single (single-precision floating-point)  4 bytes                     -3.402823E38 to -1.401298E-45 for negative values; 1.401298E-45 to 3.402823E38 for positive values
    Double (double-precision floating-point)  8 bytes                     -1.79769313486232E308 to -4.94065645841247E-324 for negative values; 4.94065645841247E-324 to 1.79769313486232E308 for positive values
    Currency (scaled integer)                 8 bytes                     -922,337,203,685,477.5808 to 922,337,203,685,477.5807
    Date                                      8 bytes                     January 1, 100 to December 31, 9999
    Object                                    4 bytes                     Any Object reference
    String (variable-length)                  10 bytes + string length    0 to approx. 2 billion (approx. 65,400 for MS Windows version 3.1)
    String (fixed-length)                     Length of string            1 to approximately 65,400
    Variant (with numbers)                    16 bytes                    Any numeric value up to the range of a Double
    Variant (with characters)                 22 bytes + string length    Same range as for variable-length String

• Oracle supports the following data types:
  o Numeric: BINARY_INTEGER, DEC, DECIMAL, DOUBLE PRECISION, FLOAT, INT, INTEGER, NATURAL, NATURALN, NUMBER, NUMERIC, PLS_INTEGER, POSITIVE, POSITIVEN, REAL, SMALLINT
  o Date: DATE (Note: Also stores time.)
  o Character: CHAR, CHARACTER, LONG, STRING, VARCHAR, VARCHAR2
  o Others: BOOLEAN, LONG RAW, RAW

Note: You will not need to memorize the above two tables for exams. They are only there for your reference.

Data Definition Language
• DDL is used to define the schema of the database.
• Create a database schema
• Create, Drop or Alter a table
• Create or Drop an Index
• Define Integrity constraints
• Define access privileges to users

• Define access privileges on objects
• SQL2 specification supports the creation of multiple schemas per database each with a distinct owner and authorized users.

Creating a Schema
Note: To try out these SQL examples in MS Access, go to the Queries form and choose New, then choose Design View and then close the next dialog box. Under the View menu, choose SQL. From this point, you can type in any SQL statement and execute it. Note that MS Access's DDL syntax is extremely limited. Most of the DDL statements below (including domains, NOT NULL constraints and referential integrity constraints) are not supported.

• Note: When naming tables, columns and other database objects, do not include spaces in the names. For example, do not call the last name column: Last Name
  If you wish to separate words in a name, use the underscore character.

• Creating a Table:

    CREATE TABLE employee (
      Last_Name      VARCHAR(20)  NOT NULL,
      First_name     VARCHAR(18)  NOT NULL,
      Soc_Sec        VARCHAR(11)  NOT NULL,
      Date_of_Birth  DATE,
      Salary         NUMBER(8,2)
    );

    CREATE TABLE dependant (
      Last_Name         VARCHAR(20)  NOT NULL,
      First_name        VARCHAR(18)  NOT NULL,
      Soc_Sec           VARCHAR(11)  NOT NULL,
      Date_of_Birth     DATE,
      Employee_Soc_Sec  VARCHAR(11)  NOT NULL
    );

• Specifying Primary and Foreign keys:

    CREATE TABLE order_header (
      order_number     NUMBER(10,0)  NOT NULL,
      order_date       DATE,
      sales_person     VARCHAR(25),
      bill_to          VARCHAR(35),
      bill_to_address  VARCHAR(45),
      bill_to_city     VARCHAR(20),
      bill_to_state    VARCHAR(2),
      bill_to_zip      VARCHAR(10),
      PRIMARY KEY (order_number)
    );

    CREATE TABLE order_items (
      order_number  NUMBER(10,0)  NOT NULL,
      line_item     NUMBER(4,0)   NOT NULL,
      part_number   VARCHAR(12)   NOT NULL,
      quantity      NUMBER(4,0),
      PRIMARY KEY (order_number, line_item),
      FOREIGN KEY (order_number) REFERENCES order_header (order_number),
      FOREIGN KEY (part_number) REFERENCES parts (part_number)
    );

• Example from MS Access:

    CREATE TABLE employee (
      FirstName TEXT,
      LastName  TEXT,
      ssn       INTEGER CONSTRAINT ssnConstraint PRIMARY KEY
    );

    CREATE INDEX employee_index ON employee (ssn);
    CREATE INDEX order_index ON order_header (order_number ASC);
    CREATE INDEX items_index ON order_items (order_number ASC, line_item ASC);

Specifying Constraints on Columns and Tables
• Constraints on attributes:
  o NOT NULL - Attribute may not take a NULL value
  o DEFAULT - Store a given default value if no value is specified
  o PRIMARY KEY - Indicate which attribute(s) form the primary key
  o FOREIGN KEY - Indicate which attribute(s) form a foreign key. This enforces referential integrity
  o UNIQUE - Indicates which attribute(s) must have unique values.
• Referential Integrity Constraint: Specify the behavior for child tuples when a parent tuple is modified.
• Action to take if referential integrity is violated:
  o SET NULL - Child tuple's foreign key is set to NULL - Orphans.
  o SET DEFAULT - Set the value of the foreign key to some default value.
  o CASCADE - Child tuples are updated (or deleted) according to the action taken on the parent tuple.
• Specify when constraint should be enforced:
  o Immediate
  o Deferrable until commit time
• Examples of ON DELETE and ON UPDATE

    CREATE TABLE order_items (
      order_number  NUMBER(10,0)  NOT NULL,
      line_item     NUMBER(4,0)   NOT NULL,
      part_number   VARCHAR(12)   NOT NULL,
      quantity      NUMBER(4,0),
      PRIMARY KEY (order_number, line_item),
      FOREIGN KEY (order_number) REFERENCES order_header (order_number)
        ON DELETE SET DEFAULT
        ON UPDATE CASCADE,
      FOREIGN KEY (part_number) REFERENCES parts (part_number)
    );

• Constraints can also be given names so that they can later be modified or dropped easily.

    CREATE TABLE order_header (
      order_number     NUMBER(10,0)  NOT NULL,
      order_date       DATE,
      sales_person     VARCHAR(25),
      bill_to          VARCHAR(35),
      bill_to_address  VARCHAR(45),
      bill_to_city     VARCHAR(20),
      bill_to_state    VARCHAR(2),
      bill_to_zip      VARCHAR(10),
      CONSTRAINT pk_order_header PRIMARY KEY (order_number)
    );

    CREATE TABLE order_items (
      order_number  NUMBER(10,0)  NOT NULL,
      line_item     NUMBER(4,0)   NOT NULL,
      part_number   VARCHAR(12)   NOT NULL,
      quantity      NUMBER(4,0),
      CONSTRAINT pk_order_items PRIMARY KEY (order_number, line_item),
      CONSTRAINT fk1_order_items FOREIGN KEY (order_number)
        REFERENCES order_header (order_number)
        ON DELETE SET DEFAULT
        ON UPDATE CASCADE,
      CONSTRAINT fk2_order_items FOREIGN KEY (part_number)
        REFERENCES parts (part_number)
        ON DELETE SET DEFAULT
        ON UPDATE CASCADE
    );

• An even better approach is to create the tables without constraints and then add them separately with ALTER TABLE statements:

    CREATE TABLE order_header (
      order_number     NUMBER(10,0)  NOT NULL,
      order_date       DATE,
      sales_person     VARCHAR(25),
      bill_to          VARCHAR(35),
      bill_to_address  VARCHAR(45),
      bill_to_city     VARCHAR(20),
      bill_to_state    VARCHAR(2),
      bill_to_zip      VARCHAR(10)
    );

    CREATE TABLE order_items (
      order_number  NUMBER(10,0)  NOT NULL,
      line_item     NUMBER(4,0)   NOT NULL,
      part_number   VARCHAR(12)   NOT NULL,
      quantity      NUMBER(4,0)
    );

    ALTER TABLE order_header
      ADD CONSTRAINT pk_order_header PRIMARY KEY (order_number);

    ALTER TABLE order_items
      ADD CONSTRAINT pk_order_items PRIMARY KEY (order_number, line_item);

    ALTER TABLE order_items
      ADD CONSTRAINT fk1_order_items FOREIGN KEY (order_number)
      REFERENCES order_header (order_number)
      ON DELETE SET DEFAULT
      ON UPDATE CASCADE;

    ALTER TABLE order_items
      ADD CONSTRAINT fk2_order_items FOREIGN KEY (part_number)
      REFERENCES parts (part_number)
      ON DELETE SET DEFAULT
      ON UPDATE CASCADE;

Creating indexes on table columns
• To speed up retrieval of orders given order_number:
    CREATE INDEX idx_order_number ON order_header (order_number);
  We give the first part of the index name as "idx" just as a convention.
• To speed up retrieval of orders given sales person:
    CREATE INDEX idx_sales_person ON order_header (sales_person);
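To watch named constraints and referential actions at work, the following sketch runs a cut-down version of the order tables in SQLite through Python's sqlite3 module. Several things here are assumptions made for the sketch rather than part of the notes: SQLite has no ALTER TABLE ... ADD CONSTRAINT, so the constraints are declared inside CREATE TABLE; SQLite only enforces foreign keys after PRAGMA foreign_keys = ON; INTEGER replaces NUMBER; ON DELETE CASCADE is used instead of SET DEFAULT so the effect is easy to see; and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.executescript("""
    CREATE TABLE order_header (
        order_number INTEGER NOT NULL,
        sales_person VARCHAR(25),
        CONSTRAINT pk_order_header PRIMARY KEY (order_number)
    );
    CREATE TABLE order_items (
        order_number INTEGER NOT NULL,
        line_item    INTEGER NOT NULL,
        quantity     INTEGER,
        CONSTRAINT pk_order_items PRIMARY KEY (order_number, line_item),
        CONSTRAINT fk1_order_items FOREIGN KEY (order_number)
            REFERENCES order_header (order_number)
            ON DELETE CASCADE
            ON UPDATE CASCADE
    );
    CREATE INDEX idx_sales_person ON order_header (sales_person);
    -- Hypothetical sample rows.
    INSERT INTO order_header VALUES (1, 'Lee');
    INSERT INTO order_items VALUES (1, 1, 5);
""")

# A child row with no matching parent violates referential integrity.
try:
    conn.execute("INSERT INTO order_items VALUES (99, 1, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# ON UPDATE CASCADE: changing the parent key changes the child rows too.
conn.execute("UPDATE order_header SET order_number = 2 WHERE order_number = 1")
child_keys = conn.execute("SELECT order_number FROM order_items").fetchall()

# ON DELETE CASCADE: deleting the parent removes its child rows.
conn.execute("DELETE FROM order_header WHERE order_number = 2")
remaining_items = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]

print(orphan_rejected, child_keys, remaining_items)
```

With SET NULL or SET DEFAULT in place of CASCADE the child rows would survive the delete, which in this table would also require order_number to allow NULL or to have a DEFAULT.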

Removing Schema Components with DROP
• DROP SCHEMA schema_name CASCADE
  Drop the entire schema including all tables. CASCADE option deletes all data, all tables, indexes, domains, etc.
• DROP SCHEMA schema_name RESTRICT
  Removes the schema only if it is empty.
• DROP TABLE table_name
  Remove the table and all of its data.
• DROP TABLE table_name CASCADE
  Remove the table along with the objects that depend on it, such as FOREIGN KEY constraints in other tables that reference it.
• DROP TABLE table_name RESTRICT
  Remove the table only if it is not referenced (via a FOREIGN KEY constraint) by other tables.
• DROP INDEX index_name
  Removes an index.
• DROP CONSTRAINT table_name.constraint_name
  Removes a constraint from a table.

Changing Schema Components with ALTER
• Changing Attributes:
    ALTER TABLE student ALTER last_name VARCHAR(35);
    ALTER TABLE student ALTER gpa DROP DEFAULT;
    ALTER TABLE student ALTER gpa SET DEFAULT 0.00;
• Adding Attributes:
    ALTER TABLE student ADD admission DATE;
• Removing Attributes (not widely implemented):
    ALTER TABLE student DROP home_phone;

Data Manipulation Language
• DDL is used to create and specify the schema. DML is then used to manipulate (select, insert, update, delete) data.

Inserting Data into Tables
• General syntax:
    INSERT INTO tablename (column1, column2, ... columnX)
    VALUES (val1, val2, ... valX);
• Examples:

    INSERT INTO employee (first_name, last_name, street, city, state, zip)
    VALUES ("Buddy", "Rich", "123 Sticks Ln.", "Fillville", "TN", "31212");

    INSERT INTO stocks (symbol, close_date, close_price)
    VALUES ("IBM", "03-JUN-94", 104.25);

    INSERT INTO student_grades (student_id, test_name, score, grade)
    VALUES (101, "Quiz 1", 88, "B+");

• Quotes are placed around the data depending on the Data type and on the specific RDBMS being used:

    RDBMS       Text Data Type            Dates
    MS Access   TEXT: Either " or '       DATETIME: Either " or '
    Oracle      VARCHAR: '                DATE: '
    IBM DB2     VARCHAR: '                DATE: '
    Sybase      CHAR and VARCHAR: "       DATE: "

Retrieving Data from Tables with Select
• Main way of getting data out of tables is with the SELECT statement.
• SELECT syntax:

    SELECT    column1, column2, ... columnN
    FROM      tableA, tableB, ... tableZ
    WHERE     condition1, condition2, ... conditionM
    GROUP BY  column1, ...
    HAVING    condition
    ORDER BY  column1, column2, ... columnN

• Assume an employees table: employees(employee_id, first_name, last_name, street, city, state, zip)
  and a "Stocks" table: stocks(symbol, close_date, close_price)
• Some example queries:

    SELECT    employee_id, last_name, first_name
    FROM      employees
    WHERE     last_name = "Smith"
    ORDER BY  first_name DESC

    SELECT   employee_id, last_name, first_name
    FROM     employees
    WHERE    salary > 40000
    ORDER BY last_name, first_name DESC

    SELECT * FROM employees ORDER BY 2;

    SELECT   symbol, close_price
    FROM     stocks
    WHERE    close_date > "01-JAN-95" AND symbol = "IBM"
    ORDER BY close_date

    SELECT   symbol, close_date, close_price
    FROM     stocks
    WHERE    close_date >= "01-JAN-95"
    ORDER BY symbol, close_date

Relational Operators and SQL
• Relational operators each have implementations in SQL.
• Projection of a selection:
    π employee_id, last_name, first_name ( σ salary > 40000 (EMPLOYEE) )

    SELECT employee_id, last_name, first_name
    FROM   employee
    WHERE  salary > 40000
• Aggregate function applied to a selection:
    ℱ AVG(salary) ( σ state = 'NJ' (EMPLOYEE) )

    SELECT AVG(salary)
    FROM   employee
    WHERE  state = 'NJ'
• Selection with two conditions:
    σ last_name = 'Smith' AND state = 'NY' (EMPLOYEE)

    SELECT *
    FROM   employee
    WHERE  last_name = 'Smith' AND state = 'NY'

SQL Built-in Functions
• Example Table students:

    Name    Major       Grade
    Bill    CIS         95

    Mary    CIS         98
    Sue     Marketing   88
    Tom     Finance     92
    Alex    CIS         79
    Sam     Marketing   89
    Jane    Finance     83

  Note: To try out these examples, create the table in MS Access and enter the data shown above. Go to the Queries form and choose New, then choose Design View and then close the next dialog box. Under the View menu, choose SQL.
• Average grade in the class:
    SELECT AVG(grade) FROM students;
  Results:
    AVG(GRADE)
    ----------
    89.1428571
• Give the name of the student with the highest grade in the class. This is an example of a subquery:
    SELECT name, grade
    FROM   students
    WHERE  grade = ( SELECT MAX(grade) FROM students );
  Results:
    NAME    GRADE
    ------  -----
    Mary    98
• Show the students with the highest grades in each major:
    SELECT name, major, grade
    FROM   students s1
    WHERE  grade = ( SELECT MAX(grade)
                     FROM   students s2
                     WHERE  s1.major = s2.major )
    ORDER BY grade DESC;
  Results:
    NAME    MAJOR       GRADE
    ------  ----------  -----

    Mary    CIS         98
    Tom     Finance     92
    Sam     Marketing   89
• Note the two aliases given to the students table: s1 and s2. These allow us to refer to different views of the same table.

Selecting from 2 or More Tables
• In the FROM portion, list all tables separated by commas. This is called a Join.
• The WHERE part becomes the Join Condition.
• Example table EMPLOYEE:

    Name    Department   Salary
    Joe     Finance      50000
    Alice   Finance      52000
    Jill    MIS          48000
    Jack    MIS          32000
    Fred    Accounting   33000

• Example table DEPARTMENT:

    Department   Location
    Finance      NJ
    MIS          CA
    Accounting   CA
    Marketing    NY

• List all of the employees working in California:
    SELECT employee.name
    FROM   employee, department
    WHERE  employee.department = department.department
    AND    department.location = 'CA';
  Results:
    NAME
    ------
    Jill
    Jack
    Fred
• List each employee name and what state (location) they work in. List them in order of location and name:
    SELECT   employee.name, department.location
    FROM     employee, department
    WHERE    employee.department = department.department
    ORDER BY department.location, employee.name;
  Results:
    NAME    LOCATION
    ------  --------

    Fred    CA
    Jack    CA
    Jill    CA
    Alice   NJ
    Joe     NJ
• List each department and all employees that work there. Show the department and location even if no employees work there. This uses a RIGHT JOIN, which is similar to a LEFT JOIN.
    SELECT   department.department, department.location, employee.name
    FROM     employee RIGHT JOIN department
    ON       employee.department = department.department
    ORDER BY department.location, employee.name;
  Results:
    DEPARTMENT   LOCATION   NAME
    -----------  ---------  ------
    Accounting   CA         Fred
    MIS          CA         Jack
    MIS          CA         Jill
    Finance      NJ         Alice
    Finance      NJ         Joe
    Marketing    NY         NULL
• What is the highest paid salary in California?
    SELECT MAX(employee.salary)
    FROM   employee, department
    WHERE  employee.department = department.department
    AND    department.location = 'CA';
  Results:
    MAX(SALARY)
    -----------
    48000
• Cartesian Product of the two tables:
    SELECT * FROM employee, department;
  Results:
    Name    employee.Department   Salary   department.Department   Location
    Joe     Finance               50000    Finance                 NJ
    Joe     Finance               50000    MIS                     CA
    Joe     Finance               50000    Accounting              CA
    Joe     Finance               50000    Marketing               NY
    Alice   Finance               52000    Finance                 NJ

    Alice   Finance               52000    MIS                     CA
    Alice   Finance               52000    Accounting              CA
    Alice   Finance               52000    Marketing               NY
    Jill    MIS                   48000    Finance                 NJ
    Jill    MIS                   48000    MIS                     CA
    Jill    MIS                   48000    Accounting              CA
    Jill    MIS                   48000    Marketing               NY
    Jack    MIS                   32000    Finance                 NJ
    Jack    MIS                   32000    MIS                     CA
    Jack    MIS                   32000    Accounting              CA
    Jack    MIS                   32000    Marketing               NY
    Fred    Accounting            33000    Finance                 NJ
    Fred    Accounting            33000    MIS                     CA
    Fred    Accounting            33000    Accounting              CA
    Fred    Accounting            33000    Marketing               NY
• In which states do our employees work?
    SELECT DISTINCT location
    FROM   department;
• From our Bank Accounts example, list the Customer name and their total account holdings:
    SELECT   customers.LastName, Sum(Balance)
    FROM     customers, accounts
    WHERE    customers.CustomerID = accounts.customerid
    GROUP BY customers.LastName
  Results:
    LASTNAME   SUM(BALANCE)
    ---------  ------------
    Axe        $15,000.00
    Builder    $1,300.00
    Jones      $1,000.00
    Smith      $6,000.00
• We can also use a Column Alias to change the title of the columns:
    SELECT   customers.LastName, Sum(Balance) AS TotalBalance
    FROM     customers, accounts
    WHERE    customers.CustomerID = accounts.customerid
    GROUP BY customers.LastName
  Results:
    LASTNAME   TotalBalance
    ---------  ------------
    Axe        $15,000.00
    Builder    $1,300.00
    Jones      $1,000.00
    Smith      $6,000.00
• Here is a combination of a function and a column alias:

    SELECT name, department, salary AS CurrentSalary,
           (salary * 1.03) AS ProposedRaise
    FROM   employee;
  Results:
    name    department   CurrentSalary   ProposedRaise
    ------  -----------  -------------   -------------
    Alice   Finance      52000           53560
    Fred    Accounting   33000           33990
    Jack    MIS          32000           32960
    Jill    MIS          48000           49440
    Joe     Finance      50000           51500

Recursive Queries and Aliases
• Recall that some of the E-R diagrams and relations we dealt with had a recursive relationship.
• For example: A student can tutor one or more other students. A student has only one tutor.
• STUDENTS (StudentID, Name, Student_TutorID)

    StudentID   Name    Student_TutorID
    S101        Bill    NULL
    S102        Alex    S101
    S103        Mary    S101
    S104        Liz     S103
    S105        Ed      S103
    S106        Sue     S101
    S107        Petra   S106

• Provide a listing of each student and the name of their tutor:
    SELECT s1.name AS Student, tutors.name AS Tutor
    FROM   students s1, students tutors
    WHERE  s1.student_tutorid = tutors.studentid;
  Results:

    Student   Tutor
    --------  --------
    Alex      Bill
    Mary      Bill
    Sue       Bill
    Liz       Mary
    Ed        Mary
    Petra     Sue
• The above is called a "recursive" query because it accesses the same table two times.
• We give the table two aliases called s1 and tutors so that we can compare different aspects of the same table.
• However, as is, the result is missing something: Bill, who has no tutor, does not appear at all. Use LEFT JOIN:
    SELECT s1.name AS Student, tutors.name AS Tutor
    FROM   students s1 LEFT JOIN students tutors
    ON     s1.student_tutorid = tutors.studentid
  Results:
    Student   Tutor
    --------  --------
    Bill
    Alex      Bill
    Mary      Bill
    Sue       Bill
    Liz       Mary
    Ed        Mary
    Petra     Sue
• Here is one more twist: Suppose we were interested in those students who do not tutor anyone? Use RIGHT JOIN.
• How many students does each tutor work with?
    SELECT   s1.name AS TutorName,
             COUNT(tutors.student_tutorid) AS NumberTutored
    FROM     students s1, students tutors
    WHERE    s1.studentid = tutors.student_tutorid
    GROUP BY s1.name;
  Results:
    TutorName   NumberTutored
    ----------  -------------
    Bill        3
    Mary        2
    Sue         1

WHERE Clause Expressions
• There are a number of expressions one can use in a WHERE clause.
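The recursive STUDENTS queries earlier in this section (self-join, LEFT JOIN, and the count per tutor) can be checked with a small script. This is a sketch, assuming Python's built-in `sqlite3` module. The rows follow the STUDENTS table as far as it can be read from the notes. The ORDER BY clauses are additions so that the output order is predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (studentid TEXT PRIMARY KEY, name TEXT, student_tutorid TEXT)"
)
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("S101", "Bill", None),      # Bill has no tutor
        ("S102", "Alex", "S101"),
        ("S103", "Mary", "S101"),
        ("S104", "Liz", "S103"),
        ("S105", "Ed", "S103"),
        ("S106", "Sue", "S101"),
        ("S107", "Petra", "S106"),
    ],
)

# Inner self-join: the same table under two aliases. Bill drops out (no tutor).
inner = conn.execute(
    "SELECT s1.name, tutors.name FROM students s1, students tutors "
    "WHERE s1.student_tutorid = tutors.studentid ORDER BY s1.name"
).fetchall()

# LEFT JOIN keeps every student; the tutor column is NULL (None) for Bill.
left = conn.execute(
    "SELECT s1.name, tutors.name FROM students s1 "
    "LEFT JOIN students tutors ON s1.student_tutorid = tutors.studentid "
    "ORDER BY s1.name"
).fetchall()

# Number of students per tutor.
counts = conn.execute(
    "SELECT tutors.name, COUNT(*) FROM students s1, students tutors "
    "WHERE s1.student_tutorid = tutors.studentid "
    "GROUP BY tutors.name ORDER BY tutors.name"
).fetchall()

print(inner, left, counts, sep="\n")
```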

• Typical Logic expressions:
    COLUMN = value
  Also: < > = != <= >=
• Also consider BETWEEN:
    SELECT name, grade, "You Got an A"
    FROM   students
    WHERE  grade BETWEEN 91 AND 100
• Subqueries using = (equals):
    SELECT name, grade
    FROM   students
    WHERE  grade = ( SELECT MAX(grade) FROM students );
  This assumes the subquery returns only one tuple as a result. Typically used for aggregate functions.
• Subqueries using IN:
    SELECT name
    FROM   employee
    WHERE  department IN ('Finance', 'MIS');

    SELECT name
    FROM   employee
    WHERE  department IN (SELECT department
                          FROM   department
                          WHERE  location = 'CA');
  In the above case, the subquery returns a set of tuples. The IN clause returns true when a tuple matches a member of the set.
• Subqueries using EXISTS:
    SELECT name, salary
    FROM   employee
    WHERE  EXISTS (SELECT name
                   FROM   EMPLOYEE e2
                   WHERE  e2.salary > employee.salary)
    AND    EXISTS (SELECT name
                   FROM   EMPLOYEE e3

                   WHERE  e3.salary < employee.salary)
  Results:
    name    salary
    ------  ------
    Joe     50000
    Jill    48000
    Fred    33000
  The above query shows all employees' names and salaries where there is at least one person who makes more money (the first EXISTS) and at least one person who makes less money (the second EXISTS).
• NOT EXISTS:
    SELECT name, salary
    FROM   employee
    WHERE  NOT EXISTS (SELECT name
                       FROM   EMPLOYEE e2
                       WHERE  e2.salary > employee.salary)
  Results:
    name    salary
    ------  ------
    Alice   52000
  The above query shows all employees for whom there does not exist an employee who is paid more.
• LIKE operator: Use the LIKE operator to perform a partial string match. Generally, the % character is used as the wild card although in some DBMS, the * character is used. Note that characters within quotes are case sensitive.
  Show all employees whose name contains the letters 'en':
    SELECT name, salary
    FROM   employee
    WHERE  name LIKE '%en%';
  Show all employees whose name starts with 'S':
    SELECT name, salary
    FROM   employee
    WHERE  name LIKE 'S%';
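The EXISTS, NOT EXISTS and LIKE forms above can be run against the EMPLOYEE example table. The sketch below assumes Python's built-in `sqlite3` module. The outer-table alias `e` and the ORDER BY clauses are additions for readability. One caveat: SQLite's LIKE is case-insensitive for ASCII letters by default, which differs from the case-sensitive behaviour described in the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [
        ("Joe", "Finance", 50000),
        ("Alice", "Finance", 52000),
        ("Jill", "MIS", 48000),
        ("Jack", "MIS", 32000),
        ("Fred", "Accounting", 33000),
    ],
)

# Someone earns more AND someone earns less: excludes the top and bottom earners.
middle = conn.execute(
    "SELECT name, salary FROM employee e "
    "WHERE EXISTS (SELECT 1 FROM employee e2 WHERE e2.salary > e.salary) "
    "AND EXISTS (SELECT 1 FROM employee e3 WHERE e3.salary < e.salary) "
    "ORDER BY salary DESC"
).fetchall()

# Nobody earns more: the highest-paid employee.
top = conn.execute(
    "SELECT name, salary FROM employee e "
    "WHERE NOT EXISTS (SELECT 1 FROM employee e2 WHERE e2.salary > e.salary)"
).fetchall()

# Partial string match: names containing the letter 'e'.
has_e = conn.execute(
    "SELECT name FROM employee WHERE name LIKE '%e%' ORDER BY name"
).fetchall()

print(middle, top, has_e, sep="\n")
```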

  Show all employees whose name contains the letter 'e' and the letter 'n' in that order:
    SELECT name, salary
    FROM   employee
    WHERE  name LIKE '%e%n%';
  Show all employees whose name contains the letter 'e' and the letter 'n' in any order:
    SELECT name, salary
    FROM   employee
    WHERE  name LIKE '%e%n%' OR name LIKE '%n%e%';

Deleting Tuples with DELETE
• DELETE is used to remove tuples from a table.
• With no WHERE clause, DELETE will remove all tuples from a table.
• Remove all employees:
    DELETE FROM employee;
• Remove only employees making more than $50,000:
    DELETE FROM employee
    WHERE  salary > 50000;
• Remove all employees working in California:
    DELETE FROM employee
    WHERE  department IN (SELECT department
                          FROM   department
                          WHERE  location = 'CA');
• DELETE will not be successful if a constraint would be violated. For example, consider the department attribute in the Employee table as a Foreign Key. Removing a department would then be contingent upon no employees working in that department. This is what we call enforcing Referential Integrity.

Change Values using UPDATE
• The UPDATE command is used to change attribute values in the database.
• UPDATE uses the SET clause to overwrite the value.
• Change the last name of an Employee:
    UPDATE employee
    SET    last_name = 'Smith'
    WHERE  employee_id = 'E1001';
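The interaction between DELETE and referential integrity described above can be seen in a short script. This is a sketch, assuming Python's built-in `sqlite3` module. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`, so that line is specific to this sketch. The raise for Joe is a hypothetical UPDATE added for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable FK enforcement
conn.execute("CREATE TABLE department (department TEXT PRIMARY KEY, location TEXT)")
conn.execute(
    "CREATE TABLE employee (name TEXT, "
    "department TEXT REFERENCES department(department), salary REAL)"
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [("Finance", "NJ"), ("MIS", "CA"), ("Accounting", "CA"), ("Marketing", "NY")],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Joe", "Finance", 50000), ("Alice", "Finance", 52000), ("Jill", "MIS", 48000),
     ("Jack", "MIS", 32000), ("Fred", "Accounting", 33000)],
)

# Removing a department that still has employees violates the foreign key.
try:
    conn.execute("DELETE FROM department WHERE department = 'MIS'")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

# DELETE with a subquery: remove all employees working in California.
conn.execute(
    "DELETE FROM employee WHERE department IN "
    "(SELECT department FROM department WHERE location = 'CA')"
)

# UPDATE with SET: give one employee a 5% raise.
conn.execute("UPDATE employee SET salary = salary * 1.05 WHERE name = 'Joe'")

remaining = conn.execute("SELECT name, salary FROM employee ORDER BY name").fetchall()
print(blocked, remaining)
```

Once the California employees are gone, deleting the MIS department would succeed, because no employee row references it any longer.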

• Give an Employee a raise:
    UPDATE employee
    SET    salary = salary * 1.05
    WHERE  employee_id = 'E1001';

Defining Views
• It is possible to define a particular view of a table (or tables).
• For example, if we commonly access just 2 or 3 columns in a table, we can define a view on that table and then use the view name when specifying queries.
• Assume an employee table:
    employee(employee_id, first_name, last_name, street, city, state, zip, department, salary)

    CREATE VIEW emp_address AS
    SELECT first_name, last_name, street, city, state, zip
    FROM   employee;

    CREATE VIEW emp_salary AS
    SELECT first_name, last_name, salary
    FROM   employee;

    CREATE VIEW avg_sal_dept AS
    SELECT department, AVG(salary)
    FROM   employee
    GROUP BY department;
• One can then query these views as if they were tables:
    SELECT * FROM emp_address
    ORDER BY last_name;

    SELECT *
    FROM   avg_sal_dept
    WHERE  department = 'Finance';
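A view stores a query, not a copy of the data, so it reflects later changes to the underlying table. The sketch below illustrates this with the avg_sal_dept view, assuming Python's built-in `sqlite3` module and a cut-down employee table (name, department, salary). The column alias `avg_salary` and the extra employee "Nina" are hypothetical additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Joe", "Finance", 50000), ("Alice", "Finance", 52000), ("Jill", "MIS", 48000)],
)

# Define the view once; afterwards it can be queried like a table.
conn.execute(
    "CREATE VIEW avg_sal_dept AS "
    "SELECT department, AVG(salary) AS avg_salary FROM employee GROUP BY department"
)

before = conn.execute(
    "SELECT * FROM avg_sal_dept WHERE department = 'Finance'"
).fetchall()

# Change the base table; the view is re-evaluated on the next query.
conn.execute("INSERT INTO employee VALUES ('Nina', 'Finance', 48000)")
after = conn.execute(
    "SELECT * FROM avg_sal_dept WHERE department = 'Finance'"
).fetchall()

print(before, after)
```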

Data Storage Characteristics:
• For a significant amount of data, we require persistent, inexpensive, reliable and sharable storage methods with relatively rapid access time.
• Persistent - Data persists (lives on) after power is removed.
• Inexpensive - typically measured on a $ per Megabyte basis.
• Reliable - Should not have to be replaced due to excessive errors.
• Sharable - Should facilitate sharing of data among many users.
• Access time - Data should be accessible in a relatively short period of time.

Data Storage Hierarchy

    Storage                       Access time            Cost
    Processor Registers           1 - 5 ns               $1000's / MB
    Cache memory                  15 - 30 ns             $100's / MB
    Main Memory (Core)            40 - 100 ns            $10's / MB
    Magnetic Disk (hard disk)     5 - 30 ms              $1 / MB
    Optical Disk (CD-ROM)         50 - 100 ms            $1 / GB
    Magnetic Tape                 100's ms to seconds    less than $1 / GB

Magnetic Disk Characteristics
• We focus on magnetic disk

• Access time is the dominant cost to consider.
• Access time consists of:
  1. Seek time - moving the disk read/write head to the right track
  2. Disk Rotation time - waiting for the disk to rotate the track under the head
  3. Transfer time - time to actually read the data (blocks) from the disk and place it on the bus for main memory.
• The goal is to minimize seek and disk rotation delay by orienting related data on the same or adjacent tracks.
• Block - The smallest unit of memory a disk can read or write.
• Block Size - the size of the block. Typically 512 bytes, 1024, 2048, ... 32 Kilobytes.

Record Storage on Disk
• Relations (records) are stored on disk with each tuple written one after the other (end to end).
• Blocking Factor - the number of tuples (records) that can fit into a single block.
• Example: EMPLOYEE takes 100 bytes to store one tuple (record). If the Block Size is 2,000 bytes, then we can store 20 EMPLOYEE tuples (records) in one block. Thus the Blocking factor is 2000/100 = 20.
    f = B/R   (B is the block size, R is the record size)
• Fixed length records: Each record is of fixed length. Pad with spaces.
• Variable length records: Each record is only as long as the data it contains.
• Spanned Records: Records are allowed to span across block boundaries.
• Unspanned Records: A record is found in one and only one block, i.e., records do not span across block boundaries.

File Operations
• Consider four basic File Operations:

    Operation   Similar SQL Statement
    Find        Select
    Insert      Insert
    Modify      Update
    Delete      Delete

• Unordered file - New record is inserted at the end of the file.
  o Insert takes constant time.
  o Select, Update and Delete take n/2 time. (n is the number of records)
• Ordered file - New record is inserted in order in the file.
  o Insert takes log2(n) plus the time to re-organize records.

  o Select, Update, Delete take at least log2(n)
• Indexed file - New record is inserted at the end of the file.
  o An index is maintained that points to the location on disk where the record is found.
  o Insert takes constant time for the data itself plus log2(n) for the index
  o Select, Update, Delete take log2(n) lookup on the index followed by constant time to access data record.

Types of Indexing
• An index is made up of two components: A key and a pointer.
• The key is typically the key value for the relation and is mainly used to identify and look up records.
• The pointer is an address on disk where the rest of the data in the record can be found.
• Two types of indexes discussed here: Ordered index and Hashing.

Ordered Index
• Records are stored as they are inserted.
• Key attribute is stored in order in the index.

Hashing
• Identify a function f that takes, as input, the key for a relation and returns, as output, the physical disk address for the rest of the data in the record.
• Example: Assume employee records. Function f takes the ASCII values of the first and last name and adds them. The numeric result is the physical address for the record.
• Selection time is constant.
• It is possible function f can map two different keys to the same address. In this case, we use a series of hash buckets.
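The hashing scheme above can be sketched in a few lines of Python. The function `f` adds the character codes of the first and last name, as in the notes. Reducing the sum modulo a fixed number of buckets is an assumption made here so that the "address" is a small bucket number. The class name `HashFile`, the bucket count and the employee records are all hypothetical.

```python
def f(first_name, last_name, num_buckets=8):
    """Add the character codes of both names and reduce to a bucket number."""
    return sum(ord(c) for c in first_name + last_name) % num_buckets


class HashFile:
    """Toy hashed file: each bucket holds every record whose key hashes to it."""

    def __init__(self, num_buckets=8):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, first_name, last_name, record):
        bucket = self.buckets[f(first_name, last_name, len(self.buckets))]
        bucket.append(((first_name, last_name), record))

    def find(self, first_name, last_name):
        # One hash computation locates the bucket; then scan only that bucket.
        bucket = self.buckets[f(first_name, last_name, len(self.buckets))]
        for key, record in bucket:
            if key == (first_name, last_name):
                return record
        return None


employees = HashFile()
employees.insert("Amy", "Lee", {"salary": 40000})
employees.insert("May", "Lee", {"salary": 45000})  # same letters, so same sum

# Two different keys map to the same address: a collision handled by the bucket.
print(f("Amy", "Lee") == f("May", "Lee"))
print(employees.find("May", "Lee"))
```

Because "Amy" and "May" contain the same letters, their sums are equal and both records land in one bucket. A lookup then compares keys within that bucket only.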

e. Where are the application program executed (e. 3. One must examine several criteria: 1.g. Advantages: o Excellent security and control over applications o High reliability .g. Where do the data and DBMS reside ? 2.years of proven MF technology o Relatively low incremental cost per user (just add a terminal) Disadvantages: o Unable to effectively serve advanced user interfaces o Users unable to effectively manipulate data outside of standard applications ~ 69 ~ .. Business rules are enforced in the applications running on the mainframe. Example: DB2 database and COBOL application programs running on an IBM 390. which CPU) ? This may include the user interface..Database System Architectures: • • There are a number of database system architectures presently in use.. User interface is textmode screens. IBM 3270 terminals or VT220 terminals) that have no processing power of their own. Multiple users access the applications through simple terminals (e. COBOL programs or JCL scripts that access the database. Applications are run on the same mainframe computer.g. Where are business rules enforced ? Traditional Mainframe Architecture • • • • • • • Database (or files) resides on a mainframe computer.

Personal Computer - Stand-Alone Database
• Database (or files) reside on a PC - on the hard disk.
• Applications run on the same PC and directly access the database. In such cases, the application is the DBMS.
• Business rules are enforced in the applications running on the PC.
• A single user accesses the applications.
• Example: MS Access running on a PC.

File Sharing Architecture

• PCs are connected to a local area network (LAN).
• A single file server stores a single copy of the database files.
• PCs on the LAN map a drive letter (or volume name) on the file server.
• Applications run on each PC on the LAN and access the same set of files on the file server. The application is also the DBMS.
• Business rules are enforced in the applications. Also, the applications must handle concurrency control, possibly by file locking.
• Each user runs a copy of the same application and accesses the same files.
• Example: Sharing MS Access files on a file server.
• Advantages:
  o (limited) Ability to share data among several users
  o Costs of storage spread out among users
  o Most components are now commodity items - prices falling
• Disadvantages:
  o Limited data sharing ability - a few users at most

Classic Client/Server Architecture
• Client machines:
  o Run own copy of an operating system.
  o Run one or more applications using the client machine's CPU, memory, etc.
  o Application communicates with DBMS server running on server machine through a Database Driver
  o Database driver (middleware) makes a connection to the DBMS server over a network.
  o Examples of clients: PCs with MS Windows operating system. Forms and reports developed in: PowerBuilder, Oracle Developer, MS Visual Basic, Borland Delphi, "C" or "C++", MS Access, etc.
• Server Machines:
  o Run own copy of an operating system.

  o Run a Database Management System that manages a database.
  o Provides a Listening daemon that accepts connections from client machines and submits transactions to DBMS on behalf of the client machines.
  o Examples: Sun Sparc server running UNIX operating system, RDBMS such as Oracle Server, Sybase, Informix, DB2, etc. PC with Windows operating system, etc.
• Middleware:
  o Small portion of software that sits between client and server.
  o Establishes a connection from the client to the server and passes commands (e.g., SQL) between them.
  o See ODBC below.
  o Examples: For Oracle: SQL*Net (or Net8) running on both client and server. For Sybase: Sybase Open Client and Open Server.
• Business rules may be enforced at:
  1. The client application - so called "Fat Clients".
  2. Entirely on the database server - so called "Thin Clients".
  3. A Mix of both.
• Example: Oracle RDBMS running on a server, Sybase PowerBuilder running on a client PC.
• Advantages of client/server:
  1. Processing of the entire Database System is spread out over clients and server.
  2. DBMS can achieve high performance because it is dedicated to processing transactions (not running applications).
  3. Client Applications can take full advantage of advanced user interfaces such as Graphical User Interfaces.
• Disadvantages of client/server:
  1. Implementation is more complex because one needs to deal with middleware and the network.
  2. It is possible the network is not well suited for client/server communications and may become saturated.
  3. Additional burden on DBMS server to handle concurrency control, etc.
  4. As more business rule logic is programmed into the client side applications, they can become unwieldy. Stored procedures and triggers can help in this case.

Distributed Database Architecture
• In a distributed database system (DDS), multiple Database Management Systems run on multiple servers (sites) connected by a network.
• Data may be split up among the different servers or it may be replicated.

Data Partitioning
• Data may be split up (or partitioned) in several ways:
  1. Horizontal: Rows in a table are split up across multiple sites.
  2. Vertical: Columns in a table are split across multiple sites.
  3. Both vertical and horizontal.
• Splitting up data can improve performance by reducing contention for tables.
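The two kinds of partitioning can be illustrated with plain Python lists standing in for tables at different sites. This is a sketch: the rows are those of the Customer table used in these notes, and splitting horizontally by State is an assumption, since the notes do not say which rule produced their two partitions. The variable names are hypothetical.

```python
customers = [
    {"CustID": 1001, "Name": "Mr. Smith", "Address": "123 Lexington",
     "City": "Smithville", "State": "KY", "Zip": "91232"},
    {"CustID": 1002, "Name": "Mrs. Jones", "Address": "12 Davis Ave.",
     "City": "Smithville", "State": "KY", "Zip": "91232"},
    {"CustID": 1003, "Name": "Mr. Axe", "Address": "443 Grinder Ln.",
     "City": "Broadville", "State": "GA", "Zip": "81992"},
    {"CustID": 1004, "Name": "Mr. Builder", "Address": "661 Parker Rd.",
     "City": "Streetville", "State": "GA", "Zip": "81990"},
]

# Horizontal partitioning: whole rows go to one site or the other.
site_ky = [row for row in customers if row["State"] == "KY"]
site_ga = [row for row in customers if row["State"] == "GA"]

# Vertical partitioning: columns are split; the key is repeated at both sites.
names = [{"CustID": r["CustID"], "Name": r["Name"]} for r in customers]
addresses = [
    {k: r[k] for k in ("CustID", "Address", "City", "State", "Zip")}
    for r in customers
]

# Rebuilding the original table: a union for horizontal, a join on the key
# for vertical.
union = site_ky + site_ga
by_id = {a["CustID"]: a for a in addresses}
joined = [{**n, **by_id[n["CustID"]]} for n in names]

print(union == customers, joined == customers)
```

Repeating `CustID` in both vertical partitions is what makes the join possible. Without a shared key the columns could not be matched up again.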

Customer Table

    Customer ID   Name          Address           City          State   Zip
    1001          Mr. Smith     123 Lexington     Smithville    KY      91232
    1002          Mrs. Jones    12 Davis Ave.     Smithville    KY      91232
    1003          Mr. Axe       443 Grinder Ln.   Broadville    GA      81992
    1004          Mr. Builder   661 Parker Rd.    Streetville   GA      81990

Horizontal Partitioning:

  Partition 1
    Customer ID   Name          Address           City          State   Zip
    1001          Mr. Smith     123 Lexington     Smithville    KY      91232
    1002          Mrs. Jones    12 Davis Ave.     Smithville    KY      91232

  Partition 2
    Customer ID   Name          Address           City          State   Zip
    1003          Mr. Axe       443 Grinder Ln.   Broadville    GA      81992
    1004          Mr. Builder   661 Parker Rd.    Streetville   GA      81990

Vertical Partitioning:

  Partition 1
    CustID   Name
    1001     Mr. Smith
    1002     Mrs. Jones
    1003     Mr. Axe
    1004     Mr. Builder

  Partition 2
    CustID   Address           City          State   Zip
    1001     123 Lexington     Smithville    KY      91232
    1002     12 Davis Ave.     Smithville    KY      91232
    1003     443 Grinder Ln.   Broadville    GA      81992
    1004     661 Parker Rd.    Streetville   GA      81990

Data Replication
• Data may also be replicated across multiple sites:
  1. Improve performance by moving a copy of data closer to the users.
  2. Improve reliability - if one site fails, others can continue processing the transactions.
• We need mechanisms in place to ensure multiple copies of data are kept consistent.
• Recall in a centralized DB we had the notion of a commit point. In distributed DB, we need to consider committing a transaction that changes data on multiple sites.
• Distributed Commit Protocol such as Two Phase Commit (2PC). 2PC is an example of a synchronous replication protocol.
  1. Phase 1: Send a message to all sites: "Can you commit Transaction X?" All sites that can commit this transaction reply with "Y".
  2. Phase 2: If all sites reply with "Y", then send a "Commit" message to all sites. If any site replies "No", then the transaction is aborted.
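The two phases can be written down as a small simulation. This is a minimal sketch of the message flow only: the `Site` class, its method names and the site names are hypothetical, and real implementations also deal with timeouts, logging and coordinator failure, none of which is modelled here.

```python
class Site:
    """One participating database site (hypothetical, in-memory only)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "pending"

    def prepare(self, txn):
        # Phase 1 reply to "Can you commit Transaction X?"
        return "Y" if self.can_commit else "N"

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"


def two_phase_commit(sites, txn):
    # Phase 1: ask every site whether it can commit.
    votes = [site.prepare(txn) for site in sites]

    # Phase 2: commit everywhere only if every vote was "Y"; otherwise abort.
    if all(vote == "Y" for vote in votes):
        for site in sites:
            site.commit(txn)
        return "committed"
    for site in sites:
        site.abort(txn)
    return "aborted"


ok_sites = [Site("NJ"), Site("CA"), Site("NY")]
bad_sites = [Site("NJ"), Site("CA", can_commit=False), Site("NY")]

print(two_phase_commit(ok_sites, "X"))   # every site votes "Y"
print(two_phase_commit(bad_sites, "X"))  # one "N" vote aborts the transaction everywhere
```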

• In Asynchronous replication, we take snapshots of a master database and propagate the changes to other sites on some periodic basis.
• In general, distributed database systems offer more flexibility, higher performance and greater levels of independence over centralized systems.
• However, distributed database systems are also much harder to design and develop, control and administer. Security is also more difficult to enforce.

Open DataBase Connectivity (ODBC)
• Middleware has historically been proprietary.
• How can a single client access multiple DBMS servers with minimal changes?
• ODBC is middleware software that can connect a client to multiple servers from different vendors.
• ODBC has two main portions that reside on the client: A Driver Manager and one or more DBMS drivers.
• The Driver Manager presents a uniform interface to all clients. This consists of a set of function calls to query, update and manipulate data on a server.
• A DBMS Driver is typically supplied by the individual DBMS vendor and contains routines to convert requests from the Driver Manager into commands the specific DBMS understands.
• Try this: Visit several DBMS vendors' web sites and see if they offer an ODBC driver that can be downloaded to your PC. Note also subtle differences in SQL and how it is implemented in various DBMS.
• BTW, for those of you from the UNIX world, simply replace "PCs" with "UNIX Workstations" in the phrases above.

Triggers and Stored Procedures

• Triggers are procedures or functions stored in the DBMS and are invoked when certain events occur.
• Events include: Inserting a new row into a table, deleting a row in a table, updating data in a table, etc.
• Triggers are used to enforce business rules that all applications that use the database must adhere to.
• Example: A trigger may fire after each time an inventory record is updated. The trigger will automatically insert a new Order record in the Orders table if the quantity in inventory falls below a certain level.
• Most major DBMS support triggers. Oracle supports triggers written in PL/SQL. IBM DB2 supports triggers written in just about any language such as "C" and Java.
• Programming triggers requires that special attention be paid to how transactions execute. Triggers may cause locks to be held longer than expected or may have other side effects.
• Stored Procedures are similar to triggers: They are functions and procedures that are stored in the database.
• Stored procedures may be called by triggers or by application programs.
• Stored procedures are useful in cases when standard application logic must be implemented across all applications. Copies of this code do not need to be distributed to the clients. They are also very useful when a large number of database accesses must be done with just a small result being passed back to the client.

Internet and Intranet Databases
• Companies are discovering that databases can provide excellent content for web pages.
• Many examples: Retail store with current products and price lists, Employee directories, On-line banking - banks with account balance information, etc.
• Several approaches to making database data available on-line:
  1. Periodically dump a database table to an HTML file and make the HTML file available on the web (e.g., MS Access Internet Wizard).
  2. Provide the web user with a form or other means to invoke a query on the database in real time.
  3. Provide a mechanism to query the database in real time and format the results in HTML.
• The latter 2 are similar. The difference is in the last one.
• One needs: An HTTP (web) server, the DBMS, middleware to connect to the database, some language (e.g., Perl) that supports CGI.
• Many DBMS now have the web server built in (or closely tied) to the database, e.g., Oracle Web Applications Server.
• There are two main ways to carry out dynamic real-time queries from the web:
  1. Using traditional HTML forms, users can specify some or all of the query. Information is passed to a CGI script that formats the query and submits it to the DBMS. Results are returned to the CGI script which then formats the output in HTML. Results are formatted in HTML and returned to the user's browser. By far this is the predominant method.

  2. Stored procedures in the DBMS are used to accept input from HTML forms, perform the appropriate query and then format the results in HTML.

MultiUser Databases:
• Multiuser database - more than one user processes the database at the same time.
• Several issues arise:
  1. How can we prevent users from interfering with each other's work?
  2. How can we safely process transactions on the database without corrupting or losing data?
  3. If there is a problem (e.g., power failure or system crash), how can we recover without losing all of our data?

Transaction Processing
• We need the ability to control how transactions are run in a multiuser database.
• A transaction is a set of read and write operations that must either commit or abort.
• Consider the following transaction that reserves a seat on an airplane flight and charges the customer:
  1. Read customer information
  2. Write reservation information
  3. Write charges
• Suppose that after the second step, the database crashes. Or for some reason, changes can not be written.
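The effect of a failure after the second step can be sketched with a transaction that is rolled back. The sketch assumes Python's built-in `sqlite3` module, where `with conn:` commits if the block finishes and rolls back if an exception escapes. The table layouts, the `reserve` function and the raised exception (standing in for a crash) are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (customer TEXT, flight TEXT, seat TEXT)")
conn.execute("CREATE TABLE charges (customer TEXT, amount REAL)")


def reserve(conn, customer, flight, seat, amount, fail_after_reservation=False):
    """Write the reservation and the charge as one all-or-nothing unit."""
    try:
        with conn:  # commit on success, roll back if an exception escapes
            conn.execute(
                "INSERT INTO reservations VALUES (?, ?, ?)", (customer, flight, seat)
            )
            if fail_after_reservation:
                # Stand-in for a crash between step 2 and step 3.
                raise RuntimeError("simulated failure after writing the reservation")
            conn.execute("INSERT INTO charges VALUES (?, ?)", (customer, amount))
        return True
    except RuntimeError:
        return False


def counts(conn):
    r = conn.execute("SELECT COUNT(*) FROM reservations").fetchone()[0]
    c = conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]
    return r, c


failed = reserve(conn, "C1", "FL100", "12A", 250.0, fail_after_reservation=True)
after_failure = counts(conn)   # the half-finished reservation was rolled back

succeeded = reserve(conn, "C1", "FL100", "12A", 250.0)
after_success = counts(conn)   # both writes are now saved together

print(failed, after_failure, succeeded, after_success)
```

Without the transaction, the failed attempt would leave a reservation with no matching charge.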

• Transactions can either reach a commit point, where all actions are permanently saved in the database, or they can abort, in which case none of the actions are saved.
• Another way to say this is transactions are Atomic. All operations in a transaction must be executed as a single unit - a Logical Unit of Work.
• Consider two users, each executing similar transactions:

  Example #1:
    User A                            User B
    Read Salary for emp 101           Read Salary for emp 101
    Multiply salary by 1.03           Multiply salary by 1.04
    Write Salary for emp 101          Write Salary for emp 101

  Example #2:
    User A                            User B
    Read inventory for Prod 200       Read inventory for Prod 200
    Decrement inventory by 5          Decrement inventory by 7
    Write inventory for Prod 200      Write inventory for Prod 200

• First, what should the values for salary (in the first example) really be?
• The DBMS must find a way to execute these two transactions concurrently and ensure the result is what the users (and designers) intended.
• These two are examples of the Lost Update or Concurrent Update problem. Some changes to the database can be overwritten.
• Consider how the operations for users A and B might be interleaved as in example #2. Assume there are 10 units in inventory for Prod 200:

    Read inventory for Prod 200       for user A
    Read inventory for Prod 200       for user B
    Decrement inventory by 5          for user A
    Decrement inventory by 7          for user B
    Write inventory for Prod 200      for user A
    Write inventory for Prod 200      for user B

  Or something similar like:

    Read inventory for Prod 200       for user A
    Decrement inventory by 5          for user A
    Write inventory for Prod 200      for user A
    Read inventory for Prod 200       for user B
    Decrement inventory by 7          for user B
    Write inventory for Prod 200      for user B

• In the first case, the incorrect amount (3) is written to the database. This is called the Lost Update problem because we lost the update from User A - it was overwritten by user B.
• The second example works because we let user A write the new value of Prod 200 before user B can read it. Thus User B's decrement operation will fail, because only 5 units remain.
• Here is another example. Users A and B share a bank account. Assume an initial balance of $200.

    User A reads the balance

    User A deducts $100 from the balance
    User B reads the balance
    User A writes the new balance of $100
    User B deducts $100 from the balance
    User B writes the new balance of $100
• The reason we get the wrong final result (remaining balance of $100) is because transaction B was allowed to read stale data. This is called the inconsistent read problem.
• Suppose, instead of interleaving (mixing) the operations of the two transactions, we execute one after the other (note it makes no difference which order: A then B, or B then A):
    User A reads the balance
    User A deducts $100 from the balance
    User A writes the new balance of $100
    User B reads the balance (which is now $100)
    User B deducts $100 from the balance
    User B writes the new balance of $0
• If we insist only one transaction can execute at a time, in serial order, then performance will be quite poor.
• Transaction throughput: The number of transactions we can perform in a given time period. Often reported as Transactions per second or TPS.
• A group of two or more concurrent transactions is serializable if we can order their operations so that the final result is the same as if we had run them in serial order (one after another).
• Consider transactions A, B, C and D. Each has 3 operations. If executing an interleaved schedule of these operations (one that starts with A1, ends with C3, and mixes the operations of the different transactions in between) has the same result as executing:
    A1, A2, A3, B1, B2, B3, C1, C2, C3
  then the interleaved schedule of transactions and operations is serializable.

Concurrency Control and Locking
• We need a way to guarantee that our concurrent transactions can be serialized.
• Concurrency Control is a method for controlling or scheduling the operations in such a way that concurrent transactions can be executed safely.
• If we do concurrency control properly, then we can maximize transaction throughput while avoiding any chance of lost updates or inconsistent reads.
• Locking is one such means. Locking is done to data items in order to reserve them for future operations.
• A lock is a logical flag set by a transaction to alert other transactions that the data item is in use.

Characteristics of Locks
• Locks may be applied to data items in two ways:
    Implicit Locks are applied by the DBMS.
    Explicit Locks are applied by application programs.
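The interleaved and serial schedules for Prod 200 can be replayed in a few lines of Python. This is only a sketch under stated assumptions: `run_schedule` and `txn` are hypothetical helper names, each user keeps a private working copy of the value it read, and a decrement larger than that working copy aborts the user's transaction.

```python
def run_schedule(schedule, initial_stock):
    """Replay (user, operation, amount) steps against one shared inventory value."""
    stock = initial_stock      # the value stored in the "database"
    local = {}                 # each user's private working copy
    aborted = set()            # users whose transaction failed
    for user, op, amount in schedule:
        if user in aborted:
            continue           # an aborted transaction does no further work
        if op == "read":
            local[user] = stock
        elif op == "decrement":
            if local[user] < amount:
                aborted.add(user)      # not enough units: the transaction fails
            else:
                local[user] -= amount
        elif op == "write":
            stock = local[user]
    return stock, aborted


def txn(user, amount):
    """The three operations of one inventory transaction (hypothetical helper)."""
    return [(user, "read", 0), (user, "decrement", amount), (user, "write", 0)]


a, b = txn("A", 5), txn("B", 7)
interleaved = [a[0], b[0], a[1], b[1], a[2], b[2]]   # A B A B A B
serial = a + b                                       # A A A B B B

print(run_schedule(interleaved, 10))   # (3, set())   A's update is lost
print(run_schedule(serial, 10))        # (5, {'B'})   B's decrement fails
```

The interleaved result (3) matches neither serial outcome, which is why that schedule is not serializable.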

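As a rough illustration of an explicit lock taken by an application program, the sketch below guards the shared bank balance with Python's `threading.Lock`. The `Account` class is hypothetical, and an in-process thread lock only stands in for a DBMS lock; it is not how a DBMS implements locking.

```python
import threading


class Account:
    """Hypothetical shared account; `lock` stands in for an exclusive lock on the balance."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()

    def withdraw(self, amount):
        with self.lock:                 # wait here until no other thread holds the lock
            current = self.balance      # read the balance
            current -= amount           # deduct
            self.balance = current      # write the new balance
        # the lock is released on leaving the with-block


account = Account(200)
users = [threading.Thread(target=account.withdraw, args=(100,)) for _ in ("A", "B")]
for t in users:
    t.start()
for t in users:
    t.join()
print(account.balance)  # 0
```

Because the read, deduct and write steps of each user run while the lock is held, the second user always reads the balance written by the first.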
• Locks may be applied to:
    1. a single data item (value)
    2. an entire row of a table
    3. a page (memory segment) (many rows worth)
    4. an entire table
    5. an entire database
  This is referred to as the Lock granularity.
• Locks may be of two types depending on the requirements of the transaction:
    1. An Exclusive Lock prevents any other transaction from reading or modifying the locked item.
    2. A Shared Lock allows another transaction to read an item but prevents another transaction from writing the item.

Two Phase Locking
• The most commonly implemented locking mechanism is called Two Phased Locking or 2PL. 2PL is a concurrency control mechanism that ensures serializability.
• 2PL has two phases: Growing and shrinking.
    1. A transaction acquires locks on data items it will need to complete the transaction. This is called the growing phase.
    2. Once one lock is released, no other lock may be acquired. This is called the shrinking phase.
• Consider our prior example, this time using locks:
    User A places an exclusive lock on the balance
    User A reads the balance
    User A deducts $100 from the balance
    User B attempts to place a lock on the balance but fails because A already has an exclusive lock
    User B is placed into a wait state
    User A writes the new balance of $100
    User A releases the exclusive lock on the balance
    User B places an exclusive lock on the balance
    User B reads the balance
    User B deducts $100 from the balance
    User B writes the new balance of $0
• Here is a more involved example:
    User A places a shared lock on item raise_rate
    User A reads raise_rate
    User A places an exclusive lock on item Amy_salary
    User A reads Amy_salary
    User B places a shared lock on item raise_rate
    User B reads raise_rate
    User A calculates a new salary as Amy_salary * (1+raise_rate)
    User B places an exclusive lock on item Bill_salary
    User B reads Bill_salary
    User B calculates a new salary as Bill_salary * (1+raise_rate)
    User B writes Bill_salary
    User A writes Amy_salary
    User A releases exclusive lock on Amy_salary
    User B releases exclusive lock on Bill_salary
    User B releases shared lock on raise_rate
    User A releases shared lock on raise_rate
• Here is another example:
    User A places a shared lock on raise_rate
    User B attempts to place an exclusive lock on raise_rate
    User B is placed into a wait state
    User A places an exclusive lock on item Amy_salary
    User A reads raise_rate
    User A releases shared lock on raise_rate
    User B places an exclusive lock on raise_rate
    User A reads Amy_salary
    User B reads raise_rate
    User A calculates a new salary as Amy_salary * (1+raise_rate)
    User B writes a new raise_rate
    User B releases exclusive lock on raise_rate
    User A writes Amy_salary
    User A releases exclusive lock on Amy_salary

Deadlock
• Locking can cause problems, however. Consider:
    User A places an exclusive lock on item 1001
    User B places an exclusive lock on item 2002
    User A attempts to place an exclusive lock on item 2002
    User A placed into a wait state
    User B attempts to place an exclusive lock on item 1001
    User B placed into a wait state
• This is called a deadlock. One transaction has locked some of the resources and is waiting for locks so it can complete. A second transaction has locked those needed items but is awaiting the release of locks the first transaction is holding so it can continue.
• There are two main ways to deal with deadlock:
    1. Prevent it in the first place by giving each transaction exclusive rights to acquire all locks needed before proceeding.
    2. Allow the deadlock to occur, then break it by aborting one of the transactions.

Database Recovery and Backup
• There are many situations in which a transaction may not reach a commit or abort point:
    1. An operating system crash can terminate the DBMS processes
    2. The DBMS can crash
    3. The system might lose power
    4. A disk may fail or other hardware may fail
    5. Human error can result in deletion of critical data
• In any of these situations, data in the database may become inconsistent or lost.
• Database Recovery is the process of restoring the database and the data to a consistent state. This may include restoring lost data up to the point of the event (e.g., system crash).
• Two approaches are discussed here: Reprocessing and Rollback/Rollforward.

Reprocessing
• In a Reprocessing approach, the database is periodically backed up (a database save) and all transactions applied since the last save are recorded.
• If the system crashes, the latest database save is restored and all of the transactions are reapplied (by users) to bring the database back up to the point just before the crash.
• Several shortcomings:
    1. Time required to re-apply transactions
    2. Transactions might have other (physical) consequences
    3. Re-applying concurrent transactions is not straightforward

Automated Recovery with Rollback / Rollforward
• We apply a similar technique: Make periodic saves of the database (a time consuming operation). However, maintain a more intelligent log of the transactions that have been applied.
• This transaction log includes before images and after images.
    Before Image: A copy of the table record (or page) of data before it was changed by the transaction.
    After Image: A copy of the table record (or page) of data after it was changed by the transaction.
• Rollback: Undo any partially completed transactions (ones in progress when the crash occurred) by applying the before images to the database.
• Rollforward: Redo the transactions by applying the after images to the database. This is done for transactions that were committed before the crash.
• The recovery process uses both rollback and rollforward to restore the database.
• In the worst case, we would need to roll back to the last database save and then roll forward to the point just before the crash.
• Checkpoints can also be taken (less time consuming) in between database saves. At a checkpoint, the DBMS flushes all pending transactions and writes all data to disk and to the transaction log.
• The database can be recovered from the last checkpoint in much less time.

Database Backup
• When secondary media (disk) fails, data may become unreadable.
• We typically rely on backing up the database to cheaper magnetic tape or another backup medium for a copy that can be restored.
• However, when a DBMS is running, it is not possible to back up its files, as the resulting backup copy on tape may be inconsistent.
• One solution: Shut down the DBMS (and thus all applications), do a full backup (copy everything on to tape), then start up again. This may be infeasible to do often.
• Most modern DBMS allow for incremental backups.
• An Incremental backup will back up only those data changed or added since the last full backup. Sometimes called a delta backup.
• A backup schedule follows something like:
    1. Weekend: Do a shutdown of the DBMS, and a full backup of the database onto a fresh tape(s).
    2. Nightly: Do an incremental backup onto different tapes for each night of the week.

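The rollback and rollforward steps from the recovery section can be sketched in Python. Everything here is an assumption made for illustration: the log is a list of dictionaries, `recover` is a hypothetical function name, and locking is assumed to have kept in-progress transactions from touching items that a later committed transaction changed.

```python
def recover(disk_state, log):
    """Return a consistent state from the on-disk data and the transaction log.

    Assumed record layout (hypothetical):
      {"txn": id, "type": "update", "key": k, "before": v0, "after": v1}
      {"txn": id, "type": "commit"}
    """
    db = dict(disk_state)
    committed = {r["txn"] for r in log if r["type"] == "commit"}

    # Rollforward: redo committed work by applying after images in log order.
    for r in log:
        if r["type"] == "update" and r["txn"] in committed:
            db[r["key"]] = r["after"]

    # Rollback: undo in-progress work by applying before images, newest first.
    for r in reversed(log):
        if r["type"] == "update" and r["txn"] not in committed:
            db[r["key"]] = r["before"]

    return db


# T1 committed, but its write never reached disk; T2 wrote to disk but never committed.
disk_at_crash = {"Amy_salary": 50000, "Bill_salary": 41200}
log = [
    {"txn": "T1", "type": "update", "key": "Amy_salary", "before": 50000, "after": 51500},
    {"txn": "T1", "type": "commit"},
    {"txn": "T2", "type": "update", "key": "Bill_salary", "before": 40000, "after": 41200},
]
print(recover(disk_at_crash, log))  # {'Amy_salary': 51500, 'Bill_salary': 40000}
```

Starting the same procedure from the last checkpoint instead of the last full save shortens the portion of the log that has to be scanned.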