Q: What is a Database?

Answer from Pratt/Adamski:
o A Database (DB) is a structure that can store information about:
   1. multiple types of entities,
   2. the attributes that describe those entities; and
   3. the relationships among the entities.

Answer from Elmasri/Navathe:
o A Database (DB) is a collection of related data with the following properties:
   1. A DB is logically coherent and has some relevant meaning.
   2. A DB is designed, built and populated with data for a specific purpose.
   3. A DB represents some aspect of the real world.

Answer from Kroenke: An integrated, self-describing collection of related data.
o Integrated: Data is stored in a uniform way, typically all in one place (a single physical computer, for example).
o Self-Describing: A database maintains a description of the data it contains (Catalog).
o Related: Data has some relationship to other data. In a University we have students who take courses taught by professors.
o By taking advantage of relationships and integration, we can provide information to users as opposed to simply data.
o We can also say that the database is a model of what the users perceive.
o Three main categories of models:
   1. User or Conceptual Models: How users perceive the world and/or the business.
   2. Logical Models: Represent the logic of how a business operates. For example, the relationship between different entities and the flow of data through the organization. Based on the User's model.
   3. Physical Models: Represent how the database is actually implemented on a computer system. This is based on the logical model.

Database Management System (DBMS): A collection of software programs that are used to define, construct, maintain and manipulate data in a database.

Database System (DBS) contains: The Database + The DBMS + Application Programs (what users interact with)
File System: A collection of individual files accessed by applications programs.

Limitations of a File System:
o Separated and Isolated Data - Makes coordinating, assimilating and representing data difficult
o Data Duplication - Wastes space and can lead to data integrity (inconsistency) problems
o Application Program Dependencies - Changes to a single file can require changes to numerous application programs
o Incompatible Files
o Lack of Data Sharing - Difficult to control access to files, especially to individual portions of files

Advantages of a DBMS
A DBMS can provide:
o Data Consistency and Integrity - by controlling access and minimizing data duplication
o Application program independence - by storing data in a uniform fashion
o Data Sharing - by controlling access to data items, many users can access data concurrently
o Backup and Recovery
o Security and Privacy
o Multiple views of data
An Example Database

CustomerID  Name                Address          City         State  Acct_Number  Balance
123         Mr. Smith           123 Lexington    Smithville   KY     9987         4000
123         Mr. Smith           123 Lexington    Smithville   KY     9980         2000
124         Mrs. Jones          12 Davis Ave.    Smithville   KY     8811         1000
125         Mr. Axe             443 Grinder Ln.  Broadville   GA     4422         6000
125         Mr. Axe             443 Grinder Ln.  Broadville   GA     4433         9000
127         Mr. & Mrs. Builder  661 Parker Rd.   Streetville  GA     3322         500
127         Mr. & Mrs. Builder  661 Parker Rd.   Streetville  GA     1122         800
o What happens when a customer moves to a new house?
o Who should have access to what data in this database?
o What happens if Mr. and Mrs. Builder both try to withdraw $500 from account 3322?
o What happens if the system crashes just as Mr. Axe is depositing his latest paycheck?
o What data is the customer concerned with?
o What data is a bank manager concerned with?
o Send a mailing to all customers with checking accounts having greater than $2000 balance.
o Let all GA customers know of a new branch location.
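The first and last questions above can be tried out directly. The following is a minimal sketch, assuming the example data is loaded into one flat SQLite table; the table name `bank`, the column names, and the new address "9 Oak St." are made up for illustration.

```python
import sqlite3

# Rows copied from the example database; one row per account.
ROWS = [
    (123, "Mr. Smith", "123 Lexington", "Smithville", "KY", 9987, 4000),
    (123, "Mr. Smith", "123 Lexington", "Smithville", "KY", 9980, 2000),
    (124, "Mrs. Jones", "12 Davis Ave.", "Smithville", "KY", 8811, 1000),
    (125, "Mr. Axe", "443 Grinder Ln.", "Broadville", "GA", 4422, 6000),
    (125, "Mr. Axe", "443 Grinder Ln.", "Broadville", "GA", 4433, 9000),
    (127, "Mr. & Mrs. Builder", "661 Parker Rd.", "Streetville", "GA", 3322, 500),
    (127, "Mr. & Mrs. Builder", "661 Parker Rd.", "Streetville", "GA", 1122, 800),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bank (customer_id INTEGER, name TEXT, address TEXT, "
    "city TEXT, state TEXT, acct_number INTEGER, balance INTEGER)"
)
conn.executemany("INSERT INTO bank VALUES (?, ?, ?, ?, ?, ?, ?)", ROWS)

# A customer moves: the (hypothetical) new address has to be written
# once for every account row that customer owns.
cur = conn.execute(
    "UPDATE bank SET address = ? WHERE customer_id = ?", ("9 Oak St.", 125)
)
rows_touched = cur.rowcount

# Mailing to all GA customers: DISTINCT is needed because each customer
# appears once per account in the flat table.
ga_customers = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT name FROM bank WHERE state = 'GA' ORDER BY name"
    )
]
```

Moving one customer touches two rows here, which is exactly the duplication problem the questions are meant to expose: forget one row and the database holds two addresses for the same person.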
Brief History of Database Systems
1940's, 50's: Initial use of computers as calculators. Limited data, focus on algorithms. Science, military applications.

1960's: Business uses. Organizational data, customer data, sales, inventory, accounting, etc. File system based, high emphasis on applications programs to extract and assimilate data. Larger amounts of data, relatively simple calculations.

1970's: The relational model. Data separated into individual tables. Related by keys. Initially required heavy system resources. Examples: Oracle, Sybase, Informix, Digital RDB, IBM DB2.

1980's: Microcomputers - the IBM PC, Apple Macintosh. Database programs such as DBase (sort of), Paradox, FoxPro, MS Access. Individual users can create and maintain small databases.

Late 1980's: Local area networks. Workgroups sharing resources such as files, printers, e-mail. Client/Server: the database resides on a central server, and applications programs run on client PCs attached to the server over a LAN.

1990's: Internet and World Wide Web make databases of all kinds available from a single type of client - the Web Browser. Data warehousing and Data Mining also emerge.

Other types of Databases:
o Object-Oriented Database Systems. Objects (data and methods) stored persistently.
o Distributed Database Systems. Copies of data reside at different locations for redundancy or for performance reasons.
Appropriate Use for a Database
In addition to the advantages already mentioned:
o Performance
o Expandability, Flexibility, Scalability
o Reduced application development times
o Standards enforcement

However, keep in mind:
o A DBMS has a high initial cost (although falling)
o A DBMS has high overhead - requires powerful computers
o DBMS are not special purpose software programs, e.g., contrast a canned accounting software package like Quicken or QuickBooks with a DBMS like MS Access.
When is a DBMS Not Appropriate?
o Database is small with a simple structure.
o Applications are simple, special purpose and relatively static.
o Applications have real-time requirements. Examples:
   Traffic signal control
   ECU patient monitoring
o Concurrent, multi-user access to data is not required.
Contents of a Database
A Database contains:
• User Data
• Metadata
• Indexes
• Application metadata
User Data
• Data users work with directly by entering, updating and viewing.
• For our purposes, data will be generally stored in tables with some relationships between tables.
• Each table has one or more columns. A set of columns forms a database record.
• Recall our example database for the bank. What were some problems we discussed?
• Here is one improvement - split into 2 tables:

Customer Table
CustomerID  Name                Address          City         State
123         Mr. Smith           123 Lexington    Smithville   KY
124         Mrs. Jones          12 Davis Ave.    Smithville   KY
125         Mr. Axe             443 Grinder Ln.  Broadville   GA
127         Mr. & Mrs. Builder  661 Parker Rd.   Streetville  GA

Accounts Table
CustomerID  Acct_Number  Balance
123         9987         4000
123         9980         2000
124         8811         1000
125         4422         6000
125         4433         9000
127         3322         500
127         1122         800

• The Customer table has 4 records and 5 columns. The Accounts table has 7 records and 3 columns.
• Note the relationship between the two tables - the CustomerID column.
• How should we split data into the tables? What are the relationships between the tables? These are questions that are answered by Database Modeling and Database Design.

Metadata
• Recall that a database is self describing.
• Metadata: Data about data.
• Data that describe how user data are stored in terms of table name, column name, data type, length, primary keys, etc.
• Metadata are typically stored in System tables or the System Catalog and are typically only directly accessible by the DBMS or by the system administrator.
• Have a look at the Database Documentor feature of MS Access (under the Tools menu, choose Analyze and then Documentor). This tool queries the system tables to give all kinds of Metadata for tables, etc. in an MS Access database.

Indexes
• In keeping with our desire to provide users with several different views of data, indexes provide an alternate means of accessing user data.
• Sorting and Searching: Indexes allow the database to access a record without having to search through the entire table.
• Updating data requires an extra step: The index must also be updated.
• Example: An index in a book consists of two things:
   1) A Keyword stored in order
   2) A pointer to the rest of the information. In the case of the book, the pointer is a page number.
• An index for our new banking example might include the account numbers in a sorted order.

Applications Metadata
• Many DBMS have storage facilities for forms, reports, queries and other application components.
• Applications Metadata is accessed via the database development programs.
• Example: Look at the Documentor tool in MS Access. It can also show metadata for Queries, Forms, Reports, etc.

Data Modeling and Database Design
• Database Design: The activity of specifying the schema of a database in a given data model.
• Database Schema: The structure of a database that:
   o Captures data types, relationships and constraints in data
   o Is independent of any application program
   o Changes infrequently
• Data Model:
   o A set of primitives for defining the structure of a database.
   o A set of operations for specifying retrieval and updates on a database.
   o Examples: Relational, Hierarchical, Networked, Object-Oriented
   In this course, we focus on the Relational data model.
• Database Instance or State: The actual data contained in a database at a given time.
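The notes use MS Access and its Documentor tool to look at metadata. As a rough stand-in that can be run anywhere, the sketch below uses SQLite from Python: `sqlite_master` and `PRAGMA table_info` play the role of the system catalog, and `CREATE INDEX` builds an index on the account numbers. The table, column, and index names are assumptions chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (customer_id INTEGER, acct_number INTEGER, balance INTEGER)"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?, ?)",
    [(123, 9987, 4000), (123, 9980, 2000), (124, 8811, 1000), (125, 4422, 6000),
     (125, 4433, 9000), (127, 3322, 500), (127, 1122, 800)],
)

# Metadata: the database describes itself. sqlite_master is SQLite's
# catalog table; PRAGMA table_info lists the columns of one table.
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
columns = [(r[1], r[2]) for r in conn.execute("PRAGMA table_info(accounts)")]

# Index: account numbers kept in sorted order, each pointing at its row,
# so a lookup does not have to scan the whole table.
conn.execute("CREATE INDEX idx_accounts_acct_number ON accounts (acct_number)")
indexes = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'index'")]

# The query itself does not change; the DBMS decides whether to use the index.
balance = conn.execute(
    "SELECT balance FROM accounts WHERE acct_number = ?", (4422,)
).fetchone()[0]
```

Note that the index shows up in the catalog too: indexes are themselves described by metadata.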
Designing A Database - The Database Development Process

Two overall approaches:
1. Top-Down: Design systems from an overall organization perspective.
2. Bottom-Up: Design systems from a specific perspective - one system at a time.

The following is a very brief outline describing the database development process.
• User needs assessment and requirements gathering: Determine what the users are looking for, what functions should be supported, how the system should behave, etc.
• Data Modeling: Based on user requirements, form a logical model of the system. This logical model is then converted to a physical data model (tables, columns, relationships, etc.) that will be implemented.
• Implementation: Based on the data model, a database can be created. Applications are then written to perform the required functions.
• Testing: The system is tested using real data.
• Deployment: The system is deployed to users. Maintenance of the system begins.

There are many variations to this basic development process. A Systems Analysis and Design course (such as CIS 3900 for undergraduates, CIS 9490 for graduates) covers these topics in greater detail.

Designing A Database - A Brief Example

For our Bank example, let's assume that the managers are interested in creating a database to track their customers and accounts.

• Tables
   CUSTOMERS  Customer_Id, Name, Street, City, State, Zip
   ACCOUNTS   Customer_Id, Account_Number, Account_Type, Date_Opened, Balance

   Note that we use an artificial identifier (a number we make up) for the customer called Customer_Id. Given a Customer_Id, we can uniquely identify the remaining information. We call Customer_Id a Key for the CUSTOMERS table.
   o Customer_Id is the key for the CUSTOMERS table.
   o Account_Number is the key for the ACCOUNTS table.
   o Customer_Id in the ACCOUNTS table is called a Foreign Key.

   Notice that when naming columns in the tables we always use an underscore character and do not use any other punctuation. Even though Access allows you to use spaces, it is not a good idea.
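The key and foreign key ideas for CUSTOMERS and ACCOUNTS can be sketched in SQL. This is only an illustration, assuming SQLite (run from Python) rather than MS Access; the column types and the helper `try_insert_account` are choices made for this sketch, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,   -- the key for CUSTOMERS
        name TEXT, street TEXT, city TEXT, state TEXT, zip TEXT
    )""")
conn.execute("""
    CREATE TABLE accounts (
        customer_id INTEGER NOT NULL
            REFERENCES customers (customer_id),   -- the foreign key
        account_number INTEGER PRIMARY KEY,       -- the key for ACCOUNTS
        account_type TEXT, date_opened TEXT, balance REAL
    )""")

conn.execute(
    "INSERT INTO customers (customer_id, name) VALUES (?, ?)", (123, "Mr. Smith")
)
conn.execute(
    "INSERT INTO accounts (customer_id, account_number, account_type, balance) "
    "VALUES (?, ?, ?, ?)", (123, 9987, "Checking", 4000.00)
)

def try_insert_account(customer_id, account_number):
    """Hypothetical helper: report whether the DBMS accepts the new row."""
    try:
        conn.execute(
            "INSERT INTO accounts (customer_id, account_number) VALUES (?, ?)",
            (customer_id, account_number),
        )
        return True
    except sqlite3.IntegrityError:
        return False

orphan_ok = try_insert_account(999, 5555)     # customer 999 does not exist
duplicate_ok = try_insert_account(123, 9987)  # account number already used
second_ok = try_insert_account(123, 9980)     # a second account for customer 123
```

The first two inserts are rejected by the DBMS itself, not by application code: the foreign key stops an account from pointing at a missing customer, and the key stops two accounts from sharing a number.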
• Relationships
   The relationship between CUSTOMERS and ACCOUNTS is by Customer_Id. Since a customer may have more than one account at the bank, we call this a One to Many relationship (1:N).

• Domains
   A domain is a set of values that a column may have. Domain also includes the type and length or size of data found in each column.

   CUSTOMERS
   Column                Data Type  Size
   Customer_Id (Key)     Integer    20
   Name                  Character  30
   Street                Character  30
   City                  Character  25
   State                 Character  2
   Zip                   Character  5

   ACCOUNTS
   Column                Data Type  Size
   Customer_Id (FK)      Integer    20
   Account_Number (Key)  Integer    15
   Account_Type          Character  2
   Date_Opened           Date
   Balance               Real       12.2

• We use the above information to build a logical model of the database.
• This logical model is then converted to a physical model and implemented as tables.
• The following is some example data for the Accounts and Customers tables:

Customer Table
Customer_Id  Name                Address          City         State  Zip
123          Mr. Smith           123 Lexington    Smithville   KY     91232
124          Mrs. Jones          12 Davis Ave.    Smithville   KY     91232
125          Mr. Axe             443 Grinder Ln.  Broadville   GA     81992
127          Mr. & Mrs. Builder  661 Parker Rd.   Streetville  GA     81990

Accounts Table
Customer_Id  Account_Number  Account_Type  Date_Opened  Balance
123          9987            Checking      10/12/89     4000.00
123          9980            Savings       10/12/89     2000.00
124          8811            Savings       01/05/92     1000.00
125          4422            Checking      12/01/94     6000.00
125          4433            Savings       12/01/94     9000.00
127          3322            Savings       08/22/94     500.00
127          1122            Checking      11/13/88     800.00

• Business Rules
   Business rules allow us to specify constraints on what data can appear in tables and what operations can be performed on data in tables. For example:
   1. An account balance can never be negative.
   2. A Customer can not be deleted if they have an existing (open) account.
   3. Money can only be transferred from a "Savings" account to a "Checking" account.
   4. Savings accounts with less than a $500 balance incur a service charge.

   How do we enforce business rules?
   o Constraints on the database
   o Applications

Entity Relationship Modeling
• Entity Relationship Modeling: A set of constructs used to interpret, specify and document logical data requirements for database processing systems.
• E-R Models are Conceptual Models of the database. They can not be directly implemented in a database.
• Many variations of E-R Modeling are used in practice. Mainly differences in notation, i.e., the symbols used to represent the 4 main constructs.

E-R Modeling Constructs
• E-R Modeling Constructs are: Entity, Relationship, Attributes, Identifiers.
• It is important to get used to this terminology and to be able to use it at the appropriate time. For example, in the ER Model, we do not refer to tables. Here we call them entities.
• Entity: Some identifiable object relevant to the system being built. Examples of Entities are:
   EMPLOYEE
   CUSTOMER
   ORGANIZATION
   PART
   INGREDIENT
   PURCHASE ORDER
   CUSTOMER ORDER
   PRODUCT
  An instance of an entity is like a specific example:
   Bill Gates is an Employee of Microsoft
   SPAM is a Product
   Greenpeace is an Organization
   Flour is an Ingredient
• Attribute: A characteristic of an Entity. Properties used to distinguish one entity instance from another.
  Attributes of entity EMPLOYEE might include:
   EmployeeID, Social Security Number, First Name, Last Name, Street Address, City, State, ZipCode, Date Hired, Health Benefits Plan
  Attributes of entity PRODUCT might include:
   ProductID, Product_Description, Weight, Size, Cost
  Exercise: Come up with a list of attributes for each of the entities above.
• Identifier: A special attribute used to identify a specific instance of an entity.
   o Typically we look for unique identifiers:
      Social Security Number uniquely identifies an EMPLOYEE
      CustomerID uniquely identifies a CUSTOMER
   o We can also use two attributes to indicate an identifier:
      ORDER_NUMBER and LINE_ITEM uniquely identify an item on an order.
  Exercise: Choose one of your attributes as the identifier for each of the entities above.
• Relationship: An association between two entities. Also called a HAS-A relationship.
   o Examples:
      A CUSTOMER places a CUSTOMER ORDER
      An EMPLOYEE takes a CUSTOMER ORDER
      A STUDENT enrolls in a COURSE
      A COURSE is taught by a FACULTY MEMBER
   o Relationships are typically given names.
   o A relationship can include one or more entities.
   o The degree of a relationship is the number of Entities that participate in the relationship.
   o Relationships of degree 2 are called binary relationships. Most relationships in databases are binary.
   o Relationship Cardinality refers to the number of entity instances involved in the relationship. For example:
      one CUSTOMER may place many CUSTOMER ORDERS        1:N  "One to Many"
      many STUDENTS may sign up for many CLASSES         N:M  "Many to Many"
      one EMPLOYEE receives one PAYCHECK                 1:1  "One to One"
      one SALESPERSON is assigned one COMPANY_CAR        1:1  "One to One"
   o Beware of 1:1 relationships. The two entities involved might be coalesced into one.
   o Beware of N:M relationships. Typically split these into two 1:N relationships with an intersection entity.
   o Participation of instances in a relationship may be mandatory or optional. For example:
      one CUSTOMER may place many CUSTOMER ORDERS
      one EMPLOYEE must fill out one or more PAY SHEETS
     This is also called "minimal cardinality" or the "optionality" of a relationship.

E-R Diagrams
• The most common way to represent the E-R constructs is by using a diagram.
• There are a wide variety of notations for E-R Diagrams. Most of the differences concern how relationships are specified and how attributes are shown.
• In almost all variations, entities are depicted as rectangles with either pointed or rounded corners. The entity name appears inside.
• Relationships can be displayed as diamonds (see below) or can be simply line segments between two entities.
• For Relationships, we need to convey: Relationship name, degree, cardinality, optionality (minimal cardinality).
• Here we will give examples from 4 variations: The Kroenke textbook, Elmasri/Navathe textbook, Oracle Designer/2000 and Visible Analyst.

Variation One - What the Kroenke book uses
• Relationship Name: Displayed just outside of the relationship diamond.
• Degree: Shown by line segments between the relationship diamond and 2 or more entities.
• Cardinality: Displayed inside the relationship diamond.
• Optionality: Mandatory participation indicated by an intersecting hash mark made perpendicular to the relationship line segment. Optional participation indicated by a 0 intersecting the relationship line segment.

For this diagram:
• An ORDER must be placed by one and only one CUSTOMER.
• A CUSTOMER may place zero or more ORDERS.
• An ORDER may have zero or more ITEMS.
• An ITEM must have one and only one ORDER.
These are admittedly clumsy, but you get the point.

Variation Two - Elmasri/Navathe Book
• Relationship Name: Displayed just inside the relationship diamond.
• Degree: Shown by line segments between the relationship diamond and 2 or more entities.
• Cardinality: Displayed between the participating entity and the relationship diamond, next to the relationship line.
• Optionality: Mandatory participation indicated by a double relationship line. Optional participation indicated by a single relationship line.

Variation Three - Oracle Designer/2000 CASE
• In Oracle Corporation's Designer/2000, relationships are expressed in a rigid sentence format. Relationship diamonds are not used.
• Relationship Names: Are expressed as a verb phrase starting with "be". For example: An ORDER must be placed by one and only one CUSTOMER. The "be" is mandatory, making the verb difficult to get right. There are two phrases, one for each direction of the relationship. This phrase is then written along the line segments for the relationship.
• Degree: Shown by line segments between any two entities. As such, 3 way relationships as described in the Kroenke book can not exist.
• Cardinality: Split up the cardinality. Single participation ("1" in the previous example) is indicated by a single line segment. Multiple participation ("N") is indicated by crow's feet.
• Optionality: Mandatory participation is indicated by a solid relationship line segment. Optional participation is indicated by a dotted line segment.
• One ORDER must be placed by one and only one CUSTOMER.
• One CUSTOMER may be placing zero or more ORDERS.
• One ORDER may be made up of zero or more ITEMS.
• One ITEM must be an item on one and only one ORDER.
There is a set of tools that can print these "relationship sentences".

Variation Four - Visible Analyst
• Visible Analyst Workbench (VAW) uses the rounded box to show an Attributive Entity - one that depends on the existence of a fundamental entity (noted by just the rectangle).
• The relationships use the following symbols:
   o For cardinality, the crow's feet are used to show a "Many" side of a relationship.
   o A single line shows a "One" side of the relationship.
   o Optional participation is shown with an open circle. Thus in the above diagram, a Customer May place one or more Orders.
   o Mandatory participation is shown with two hash marks. Thus in the above diagram, an Order Must be placed by one and only one Customer.

Variation Five - Sybase PowerDesigner

This is not an Entity Relationship Diagram!
It is true: The "Relationships" screen in MS Access is NOT an Entity Relationship diagramming tool. This is a "physical" level diagram of how the tables are actually created.

Displaying Attributes
• Technically, an Entity-Relationship diagram should show only entities and their relationships.
• Consider: Entity-Relationship-Attribute (ERA) model.
• Two main ways to display attributes associated with an entity:
   1. Attributes appear in ovals attached to the entity. Gets messy.
   2. List attributes inside of the entity box.

Weak Entities
• Broad definition. Weak Entity: An entity that depends on another for its existence.
• ID Dependent Entity: A weak entity that includes the identifier of the related strong entity.
• Elmasri/Navathe definition: Weak entity: Entity types that do not have key attributes of their own.
• Examples of strong entities: People, Customers, Employees, Vendors, Clients, Students, Products, Parts, Resources, Services, Materials, Banks
• Examples of ID Dependent entities: Dependents (of employees), Bank Branches (of Banks).
• Note that an ITEM can not exist by itself. It must be identified with a specific Order. ID Dependent entities are sometimes shown with curved boxes as in the Visible Analyst ER example.
• The Elmasri/Navathe notation shows the ID Dependent entity with a double box. The "identifying relationship" (from the strong entity to the weak entity) is shown with a double diamond.
• Final note: ID Dependent entities will always result in relations (and later on tables) with composite keys.

Subtype Entities
• Attributes of two or more Entities may overlap significantly but not completely.
• Consider:
   Phone Call (Source#, Destination#, Time of day, Duration)
   LongDistance Call (Source#, Destination#, Time of day, Duration, Long distance Carrier)
   Cell Phone Call (Source#, Destination#, Time of day, LandTime, AirTime)
• One approach would be to put all of the attributes into a single entity.
• Second approach: put common attributes into a parent or supertype entity and then have 3 subtype entities.
• The relationship is called an IS-A relationship.
• The above diagram uses the Oracle Designer/2000 symbols for Supertype/Subtype.
• Below is the same diagram drawn using E-R symbols from the Elmasri/Navathe book. The d in the circle indicates the subtype entity is distinct. Only one subtype entity can participate in an instance. As before, the double line between the Call entity and the d in the circle indicates the relationship is mandatory.

The Relational Model
• Recall, the Relational Model consists of the elements: relations, which are made up of attributes.
• A relation is a set of columns (attributes) with values for each attribute such that:
   1. Each column (attribute) value must be a single value only.
   2. All values for a given column (attribute) must be of the same type.
   3. Each column (attribute) name must be unique.
   4. The order of columns is insignificant.
   5. No two rows (tuples) in a relation can be identical.
   6. The order of the rows (tuples) is insignificant.
• From our discussion of E-R Modeling, we know that an Entity typically corresponds to a relation and that the Entity's attributes become attributes of the relation.
• We also discussed how, depending on the relationships between entities, copies of attributes (the identifiers) were placed in related relations.
• The process we are following is:
   1. Gather user/business requirements.
   2. Develop the E-R Model (shown as an E-R Diagram) based on the user/business requirements.
   3. Convert the E-R Model to a set of relations in the relational model.
   4. Normalize the relations to remove any anomalies (***).
   5. Implement the database by creating a table for each normalized relation.

Functional Dependencies
• A Functional Dependency describes a relationship between attributes in a single relation.
• An attribute is functionally dependent on another if we can use the value of one attribute to determine the value of another.
• Example: Employee_Name is functionally dependent on Social_Security_Number because Social_Security_Number can be used to determine the value of Employee_Name.
• We use the symbol -> to indicate a functional dependency. -> is read "functionally determines".
   Student_ID -> Student_Major
   Student_ID, Course#, Semester# -> Grade
   SKU -> Compact_Disk_Title, Artist
   Model, Options, Tax -> Car_Price
   Course_Number, Section -> Professor, Classroom, Number of Students
• The attributes listed on the left hand side of the -> are called determinants.
• One can read A -> B as, "A determines B".

Keys and Uniqueness
• Key: One or more attributes that uniquely identify a tuple (row) in a relation.
• The selection of keys will depend on the particular application being considered.
• Users can offer some guidance as to what would make an appropriate key. Also this is pretty much an art as opposed to an exact science.
• Recall that no two rows (tuples) in a relation should have exactly the same values, thus a candidate key would consist of all of the attributes in a relation.
• A key functionally determines a tuple (row).
• Not all determinants are keys.
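Functional dependencies and keys can be checked mechanically against sample data. The sketch below is one possible way to do that; the function names `holds` and `is_key` and the three sample rows are invented for illustration. Keep in mind that sample data can only show that a dependency fails; whether a dependency really holds is a statement about the business, not about one table instance.

```python
def holds(rows, determinant, dependent):
    """True if determinant -> dependent is not contradicted by rows.

    rows is a list of dicts; determinant and dependent are lists of
    attribute names.
    """
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in determinant)
        right = tuple(row[a] for a in dependent)
        # The same determinant value must always give the same result.
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_key(rows, attributes):
    """True if the attribute values are different for every row."""
    values = [tuple(row[a] for a in attributes) for row in rows]
    return len(set(values)) == len(values)


# Hypothetical sample rows, made up for this sketch.
ENROLLMENT = [
    {"Student_ID": 1, "Student_Major": "CIS", "Course": "DB", "Grade": "A"},
    {"Student_ID": 1, "Student_Major": "CIS", "Course": "OS", "Grade": "B"},
    {"Student_ID": 2, "Student_Major": "Accounting", "Course": "DB", "Grade": "B"},
]

major_fd = holds(ENROLLMENT, ["Student_ID"], ["Student_Major"])
grade_fd = holds(ENROLLMENT, ["Student_ID"], ["Grade"])
student_is_key = is_key(ENROLLMENT, ["Student_ID"])
student_course_is_key = is_key(ENROLLMENT, ["Student_ID", "Course"])
```

In this sample, Student_ID determines Student_Major but does not identify a row on its own, which is a small instance of "not all determinants are keys".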
Modification Anomalies
• Once our E-R model has been converted into relations, we may find that some relations are not properly specified. There can be a number of problems:
   o Deletion Anomaly: Deleting a tuple from a relation results in some related information (from another entity) being lost.
   o Insertion Anomaly: Inserting a tuple into a relation requires we have information from two or more entities - this situation might not be feasible.
• Here is a quick example: A company has a Purchase order form:
• Our dutiful consultant creates the E-R Model:
00 6 $3. Ship_To.00 11 $2. PartNum. Description. Normalization ~ 21 ~ . Qty) PO_HEADER (PO_Number.. PODate.) Consider some sample data for the LINE_ITEMS relation: PO_Number O101 O101 O101 O102 O102 O103 • • • • ItemNum PartNum Description Price I01 I02 I03 I01 I02 I01 P99 P98 P77 P99 P77 P33 Plate Cup Bowl Plate Bowl Fork Qty $3.00 7 $1. Consider the performance impact. Vendor.50 8 What are some of the problems with this relation ? What happens when we delete item 2 from Order O101 ? These problems occur because the relation in question contains data about 2 or more themes..00 5 $2.00 5 $2. . ItemNum. Price.LINE_ITEMS (PO_Number. Typical way to solve these anomalies is to split the relation in to two or more relations Process called Normalization.
• Relations can fall into one or more categories (or classes) called Normal Forms.
• Normal Form: A class of relations free from a certain set of modification anomalies.
• Normal forms are given names such as:
   o First normal form (1NF)
   o Second normal form (2NF)
   o Third normal form (3NF)
   o Boyce-Codd normal form (BCNF)
   o Fourth normal form (4NF)
   o Fifth normal form (5NF)
   o Domain-Key normal form (DK/NF)
• These forms are cumulative. A relation in Third normal form is also in 2NF and 1NF.

First Normal Form (1NF)
• A relation is in first normal form if it meets the definition of a relation:
   1. Each column (attribute) value must be a single value only.
   2. All values for a given column (attribute) must be of the same type.
   3. Each column (attribute) name must be unique.
   4. The order of columns is insignificant.
   5. No two rows (tuples) in a relation can be identical.
   6. The order of the rows (tuples) is insignificant.
• If you have a key defined for the relation, then you can meet the unique row requirement.
• Example relation in 1NF:

STOCKS (Company, Symbol, Date, Close_Price)

Company   Symbol  Date      Close Price
IBM       IBM     01/05/94  101.00
IBM       IBM     01/06/94  100.50
IBM       IBM     01/07/94  102.00
Netscape  NETS    01/05/94  33.00
Netscape  NETS    01/06/94  112.00

Second Normal Form (2NF)
• A relation is in second normal form (2NF) if all of its non-key attributes are dependent on all of the key.
• Relations that have a single attribute for a key are automatically in 2NF.
• This is one reason why we often use artificial identifiers as keys.
• In the example below, Close Price is dependent on Company, Date and Symbol, Date.
• The following example relation is not in 2NF:

STOCKS (Company, Symbol, Headquarters, Date, Close_Price)

Company   Symbol  Headquarters   Date      Close Price
IBM       IBM     Armonk, NY     01/05/94  101.00
IBM       IBM     Armonk, NY     01/06/94  100.50
IBM       IBM     Armonk, NY     01/07/94  102.00
Netscape  NETS    Sunnyvale, CA  01/05/94  33.00
Netscape  NETS    Sunnyvale, CA  01/06/94  112.00

• Company, Date -> Close Price
  Symbol, Date -> Close Price
  Company -> Symbol, Headquarters
  Symbol -> Company, Headquarters
• Consider that Company, Date -> Close Price. So we might use Company, Date as our key.
• However: Company -> Headquarters. This violates the rule for 2NF, because Headquarters depends on only part of the key. Also, consider the insertion and deletion anomalies.
• One Solution: Split this up into two relations:

COMPANY (Company, Symbol, Headquarters)
STOCKS (Symbol, Date, Close_Price)

Company   Symbol  Headquarters
IBM       IBM     Armonk, NY
Netscape  NETS    Sunnyvale, CA

• Company -> Symbol, Headquarters
  Symbol -> Company, Headquarters

Symbol  Date      Close Price
IBM     01/05/94  101.00
IBM     01/06/94  100.50
IBM     01/07/94  102.00
NETS    01/05/94  33.00
NETS    01/06/94  112.00

• Symbol, Date -> Close Price

Third Normal Form (3NF)
• A relation is in third normal form (3NF) if it is in second normal form and it contains no transitive dependencies.
• Consider relation R containing attributes A, B and C. If A -> B and B -> C then A -> C.
• Transitive Dependency: Three attributes with the above dependencies.
• Example: At CUNY:
   Course_Code -> Course_Num, Section
   Course_Num, Section -> Classroom, Professor
• Example: At Rutgers:
   Course_Index_Num -> Course_Num, Section
   Course_Num, Section -> Classroom, Professor
• Example:

Company  County  Tax Rate
IBM      Putnam  28%
AT&T     Bergen  26%

• Company -> County and County -> Tax Rate, thus Company -> Tax Rate.
• What happens if we remove AT&T? We lose information about 2 different themes.
• Split this up into two relations:

Company  County
IBM      Putnam
AT&T     Bergen

• Company -> County

County  Tax Rate
Putnam  28%
Bergen  26%

• County -> Tax Rate

Boyce-Codd Normal Form (BCNF)
• A relation is in BCNF if every determinant is a candidate key.
• Recall that not all determinants are keys.
• Those determinants that are keys we initially call candidate keys.
• Eventually, we select a single candidate key to be the primary key for the relation.
• Consider the following example:
   Funds consist of one or more Investment Types.
   Funds are managed by one or more Managers.
   Investment Types can have one or more Managers.
   Managers only manage one type of investment.
FundID  InvestmentType   Manager
99      Common Stock     Smith
99      Municipal Bonds  Jones
33      Common Stock     Green
22      Growth Stocks    Brown
11      Common Stock     Smith

• FundID, InvestmentType -> Manager
  FundID, Manager -> InvestmentType
  Manager -> InvestmentType
• In this case, the combination FundID and InvestmentType forms a candidate key because we can use FundID, InvestmentType to uniquely identify a tuple in the relation.
• Similarly, the combination FundID and Manager also forms a candidate key because we can use FundID, Manager to uniquely identify a tuple.
• Manager by itself is not a candidate key because we cannot use Manager alone to uniquely identify a tuple in the relation.
• Is this relation R(FundID, InvestmentType, Manager) in 1NF, 2NF or 3NF? Given we pick FundID, InvestmentType as the Primary Key:
   1NF for sure.
   2NF because all of the non-key attributes (Manager) are dependent on all of the key.
   3NF because there are no transitive dependencies.
• Consider what happens if we delete the tuple with FundID 22. We lose the fact that Brown manages the InvestmentType "Growth Stocks."
• The following are steps to normalize a relation into BCNF:
   1. List all of the determinants.
   2. See if each determinant can act as a key (candidate keys).
   3. For any determinant that is not a candidate key, create a new relation from the functional dependency. Retain the determinant in the original relation.
• For our example: Rorig(FundID, InvestmentType, Manager)
   1. The determinants are:
         FundID, InvestmentType
         FundID, Manager
         Manager
   2. Which determinants can act as keys?
         FundID, InvestmentType   YES
         FundID, Manager          YES
         Manager                  NO
   3. Create a new relation from the functional dependency:
         Rnew(Manager, InvestmentType)
         Rorig(FundID, Manager)
      In this last step, we have retained the determinant "Manager" in the original relation Rorig.
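The decomposition in step 3 can be checked on the sample data. The sketch below, in plain Python, projects the original relation onto Rnew(Manager, InvestmentType) and Rorig(FundID, Manager), joins the two back together on Manager, and then repeats the deletion of fund 22. The rows follow the example table; the variable names are chosen for this illustration.

```python
# Rorig(FundID, InvestmentType, Manager) from the example.
R_ORIG = [
    (99, "Common Stock", "Smith"),
    (99, "Municipal Bonds", "Jones"),
    (33, "Common Stock", "Green"),
    (22, "Growth Stocks", "Brown"),
    (11, "Common Stock", "Smith"),
]

# Step 3: project onto the two new relations (sets remove duplicate rows).
r_new = sorted({(manager, inv_type) for _, inv_type, manager in R_ORIG})
r_fund_manager = sorted({(fund, manager) for fund, _, manager in R_ORIG})

# Joining the two relations on Manager should give back the original rows.
rejoined = sorted(
    (fund, inv_type, manager)
    for fund, manager in r_fund_manager
    for manager2, inv_type in r_new
    if manager == manager2
)
lossless = rejoined == sorted(R_ORIG)

# Deleting fund 22 now only touches Rorig(FundID, Manager) ...
r_fund_manager_after = [(f, m) for f, m in r_fund_manager if f != 22]
# ... so the fact that Brown manages Growth Stocks is still in Rnew.
brown_type = dict(r_new).get("Brown")
```

Manager is the key of Rnew, so each manager appears there exactly once, which is what makes the join return the original five rows and nothing extra.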
Fourth Normal Form (4NF)
• A relation is in fourth normal form if it is in BCNF and it contains no multivalued dependencies.
• Multivalued Dependency: A type of functional dependency where the determinant can determine more than one value.
• More formally, there are 3 criteria:
  1. There must be at least 3 attributes in the relation, call them A, B, and C, for example.
  2. Given A, one can determine multiple values of B. Given A, one can determine multiple values of C.
  3. B and C are independent of one another.
• Book example: Student has one or more majors. Student participates in one or more activities.
  StudentID  Major       Activities
  100        CIS         Baseball
  100        CIS         Volleyball
  100        Accounting  Baseball
  100        Accounting  Volleyball
  200        Marketing   Swimming
• StudentID ->-> Major
• StudentID ->-> Activities

  Portfolio ID  Stock Fund           Bond Fund
  999           Janus Fund           Municipal Bonds
  999           Janus Fund           Dreyfus Short-Intermediate Municipal Bond Fund
  999           Scudder Global Fund  Municipal Bonds
  999           Scudder Global Fund  Dreyfus Short-Intermediate Municipal Bond Fund
  888           Kaufmann Fund        T. Rowe Price Emerging Markets Bond Fund
• A few characteristics:
  1. No regular functional dependencies
  2. All three attributes taken together form the key.
  3. Latter two attributes are independent of one another.
  4. Insertion anomaly: Cannot add a stock fund without adding a bond fund (NULL Value). Must always maintain the combinations to preserve the meaning.
• Stock Fund and Bond Fund form a multivalued dependency on Portfolio ID.
  PortfolioID ->-> Stock Fund
  PortfolioID ->-> Bond Fund
• Resolution: Split into two tables with the common key:
  Portfolio ID  Stock Fund
  999           Janus Fund
  999           Scudder Global Fund
  888           Kaufmann Fund

  Portfolio ID  Bond Fund
  999           Municipal Bonds
  999           Dreyfus Short-Intermediate Municipal Bond Fund
  888           T. Rowe Price Emerging Markets Bond Fund
Fifth Normal Form (5NF)
• There are certain conditions under which after decomposing a relation, it cannot be reassembled back into its original form.
• We don't consider these issues here.
Domain Key Normal Form (DK/NF)
• A relation is in DK/NF if every constraint on the relation is a logical consequence of the definition of keys and domains.
• Constraint: A rule governing static values of an attribute such that we can determine if this constraint is True or False. Examples:
  1. Functional Dependencies
  2. Multivalued Dependencies
  3. Inter-relation rules
  4. Intra-relation rules
  However: Does not include time dependent constraints.
• Key: Unique identifier of a tuple.
• Domain: The physical (data type, size, NULL values) and semantic (logical) description of what values an attribute can hold.
• There is no known algorithm for converting a relation directly into DK/NF.
De-Normalization
• Consider the following relation:
  CUSTOMER (CustomerID, Name, Address, City, State, Zip)
• This relation is not in DK/NF because it contains a functional dependency not implied by the key.
  Zip -> City, State
• We can normalize this into DK/NF by splitting the CUSTOMER relation into two:
  CUSTOMER (CustomerID, Name, Address, Zip)
  CODES (Zip, City, State)
• We may pay a performance penalty - each customer address lookup requires that we look in two relations (tables). In such cases, we may de-normalize the relations to achieve a performance improvement.
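The trade-off can be made concrete with a small Python sketch. The sample rows and the function names (full_address, denormalize) are illustrative assumptions, not part of the notes: the normalized lookup touches both relations, while the de-normalized form pre-joins them at the cost of repeating City and State.

```python
# CUSTOMER (CustomerID, Name, Address, Zip) - illustrative sample rows.
CUSTOMER = [
    ("C101", "Bill Smith", "123 First St.", "07101"),
    ("C102", "Mary Green", "11 Birch St.", "07066"),
]

# CODES (Zip, City, State), keyed on Zip.
CODES = {
    "07101": ("New Brunswick", "NJ"),
    "07066": ("Old Bridge", "NJ"),
}


def full_address(customer_id):
    """Normalized lookup: one probe in CUSTOMER, then one in CODES."""
    for cid, name, address, zip_code in CUSTOMER:
        if cid == customer_id:
            city, state = CODES[zip_code]
            return (name, address, city, state, zip_code)
    return None


def denormalize():
    """Pre-join the two relations back into the wide CUSTOMER shape."""
    return [
        (cid, name, address, *CODES[zip_code], zip_code)
        for cid, name, address, zip_code in CUSTOMER
    ]


print(full_address("C101"))
print(denormalize())
```

With the de-normalized rows, an address lookup reads a single relation, but a change to the city for a Zip code must then be applied to every customer row that carries it.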
Many of you asked for a "complete" example that would run through all of the normal forms from beginning to end using the same tables. This is tough to do, but here is an attempt:
Example relation: EMPLOYEE ( Name, Project, Task, Office, Floor, Phone )
Note: Keys are underlined.
Example Data:
  Name  Project  Task  Office  Floor  Phone
  Bill  100X     T1    400     4      1400
  Bill  100X     T2    400     4      1400
  Bill  200Y     T1    400     4      1400
  Bill  200Y     T2    400     4      1400
  Sue   100X     T33   442     4      1442
  Sue   200Y     T33   442     4      1442
  Sue   300Z     T33   442     4      1442
  Ed    100X     T2    588     5      1588
• Name is the employee's name.
• Project is the project they are working on. Bill is working on two different projects, Sue is working on 3.
• Task is the current task being worked on. Bill is now working on Tasks T1 and T2. Note that Tasks are independent of the project. Examples of a task might be faxing a memo or holding a meeting.
• Office is the office number for the employee. Bill works in office number 400.
• Floor is the floor on which the office is located.
• Phone is the phone extension. Note this is associated with the phone in the given office.
First Normal Form
Assume the key is Name, Project, Task. Is EMPLOYEE in 1NF ?
Second Normal Form
• List all of the functional dependencies for EMPLOYEE.
• Are all of the non-key attributes dependent on all of the key ?
• Split into two relations EMPLOYEE_PROJECT_TASK and EMPLOYEE_OFFICE_PHONE.
EMPLOYEE_PROJECT_TASK (Name, Project, Task)
  Name  Project  Task
  Bill  100X     T1
  Bill  100X     T2
  Bill  200Y     T1
  Bill  200Y     T2
  Sue   100X     T33
  Sue   200Y     T33
  Sue   300Z     T33
  Ed    100X     T2
EMPLOYEE_OFFICE_PHONE (Name, Office, Floor, Phone)
  Name  Office  Floor  Phone
  Bill  400     4      1400
  Sue   442     4      1442
  Ed    588     5      1588
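One way to explore the functional dependencies in EMPLOYEE is to test them against the example data. The sketch below is an illustration only; fd_holds is a hypothetical helper, and sample data can refute a dependency but cannot prove that it holds for every possible tuple.

```python
ATTRS = ("Name", "Project", "Task", "Office", "Floor", "Phone")

# Example data for the EMPLOYEE relation.
EMPLOYEE = [
    ("Bill", "100X", "T1", 400, 4, 1400),
    ("Bill", "100X", "T2", 400, 4, 1400),
    ("Bill", "200Y", "T1", 400, 4, 1400),
    ("Bill", "200Y", "T2", 400, 4, 1400),
    ("Sue", "100X", "T33", 442, 4, 1442),
    ("Sue", "200Y", "T33", 442, 4, 1442),
    ("Sue", "300Z", "T33", 442, 4, 1442),
    ("Ed", "100X", "T2", 588, 5, 1588),
]


def fd_holds(rows, lhs, rhs):
    """True if equal lhs values always come with equal rhs values in rows."""
    seen = {}
    for row in rows:
        rec = dict(zip(ATTRS, row))
        left = tuple(rec[a] for a in lhs)
        right = tuple(rec[a] for a in rhs)
        # setdefault stores the first rhs seen for this lhs and returns it.
        if seen.setdefault(left, right) != right:
            return False
    return True


# Office, Floor and Phone depend on Name alone, which is only part of the
# key (Name, Project, Task): a partial dependency, so EMPLOYEE is not in 2NF.
partial = fd_holds(EMPLOYEE, ("Name",), ("Office", "Floor", "Phone"))

# Name does not determine Project: Bill works on both 100X and 200Y.
name_project = fd_holds(EMPLOYEE, ("Name",), ("Project",))

# Office -> Phone also holds in the sample.
office_phone = fd_holds(EMPLOYEE, ("Office",), ("Phone",))

print(partial, name_project, office_phone)
```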
Third Normal Form
• Assume each office has exactly one phone number.
• Are there any transitive dependencies ?
• Where are the modification anomalies in EMPLOYEE_OFFICE_PHONE ?
• Split EMPLOYEE_OFFICE_PHONE.
EMPLOYEE_PROJECT_TASK (Name, Project, Task)
  Name  Project  Task
  Bill  100X     T1
  Bill  100X     T2
  Bill  200Y     T1
  Bill  200Y     T2
  Sue   100X     T33
  Sue   200Y     T33
  Sue   300Z     T33
  Ed    100X     T2
EMPLOYEE_OFFICE (Name, Office, Floor)
  Name  Office  Floor
  Bill  400     4
  Sue   442     4
  Ed    588     5
EMPLOYEE_PHONE (Office, Phone)
  Office  Phone
  400     1400
  442     1442
  588     1588
Boyce-Codd Normal Form
List all of the functional dependencies for EMPLOYEE_PROJECT_TASK, EMPLOYEE_OFFICE and EMPLOYEE_PHONE. Look at the determinants. Are all determinants candidate keys ?
Fourth Normal Form
• Are there any multivalued dependencies ?
• What are the modification anomalies ?
• Split EMPLOYEE_PROJECT_TASK.
EMPLOYEE_PROJECT (Name, Project)
  Name  Project
  Bill  100X
  Bill  200Y
  Sue   100X
  Sue   200Y
  Sue   300Z
  Ed    100X
EMPLOYEE_TASK (Name, Task)
  Name  Task
  Bill  T1
  Bill  T2
  Sue   T33
  Ed    T2
EMPLOYEE_OFFICE (Name, Office, Floor)
  Name  Office  Floor
  Bill  400     4
  Sue   442     4
  Ed    588     5
EMPLOYEE_PHONE (Office, Phone)
  Office  Phone
  400     1400
  442     1442
  588     1588
At each step of the process, we did the following:
1. Write out the relation
2. (optionally) Write out some example data.
3. Write out all of the functional dependencies
4. Starting with 1NF, go through each normal form and state why the relation is in the given normal form.
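As a check on the fourth normal form split, the short Python sketch below (an illustration, not part of the original notes) projects EMPLOYEE_PROJECT_TASK onto (Name, Project) and (Name, Task) and joins the two projections back on Name. Because Project and Task are independent for each employee, the join reproduces the original tuples exactly.

```python
# EMPLOYEE_PROJECT_TASK (Name, Project, Task) from the example.
EMPLOYEE_PROJECT_TASK = {
    ("Bill", "100X", "T1"),
    ("Bill", "100X", "T2"),
    ("Bill", "200Y", "T1"),
    ("Bill", "200Y", "T2"),
    ("Sue", "100X", "T33"),
    ("Sue", "200Y", "T33"),
    ("Sue", "300Z", "T33"),
    ("Ed", "100X", "T2"),
}

# The two projections produced by the 4NF split.
employee_project = {(name, project) for name, project, _ in EMPLOYEE_PROJECT_TASK}
employee_task = {(name, task) for name, _, task in EMPLOYEE_PROJECT_TASK}

# Natural join of the projections on the common attribute Name.
rejoined = {
    (name, project, task)
    for name, project in employee_project
    for other_name, task in employee_task
    if name == other_name
}

print(len(employee_project), len(employee_task))
print(rejoined == EMPLOYEE_PROJECT_TASK)
```

The eight original tuples are now represented by six plus four smaller tuples, and adding a new task for Bill takes one insert instead of one per project.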
Another short example
Consider the following example of normalization for a CUSTOMER relation.
Relation Name
  CUSTOMER (CustomerID, Name, Street, City, State, Zip, Phone)
Example Data
  CustomerID  Name        Street         City           State  Zip    Phone
  C101        Bill Smith  123 First St.  New Brunswick  NJ     07101  732-555-1212
  C102        Mary Green  11 Birch St.   Old Bridge     NJ     07066  908-555-1212
Functional Dependencies
  CustomerID -> Name, Street, City, State, Zip, Phone
  Zip -> City, State
Normalization
• 1NF Meets the definition of a relation.
• 2NF All non key attributes are dependent on all of the key.
• 3NF There are no transitive dependencies.
• BCNF Relation CUSTOMER is not in BCNF because one of the determinants Zip cannot act as a key for the entire relation.
  Solution: Split CUSTOMER into two relations:
  CUSTOMER (CustomerID, Name, Street, Zip, Phone)
  ZIPCODES (Zip, City, State)
  Check both CUSTOMER and ZIPCODES to ensure they are both in 1NF up to BCNF.
• 4NF There are no multi-valued dependencies in either CUSTOMER or ZIPCODES.
As a final step, consider de-normalization.
Relational Algebra
Textbook references: Elmasri/Navathe (3rd ed.); Kroenke (7th ed.); Rob/Coronel (5th ed.); Hoffer, Prescott & McFadden (6th ed.); Connolly/Begg (3rd ed.); Mata-Toledo/Cushman, Schaum's Outlines.
• Recall, the Relational Model consists of the elements: relations, which are made up of attributes.
• A relation is a set of attributes with values for each attribute such that:
  1. Each attribute value must be a single value only (atomic).
  2. All values for a given attribute must be of the same type (or domain).
  3. Each attribute name must be unique.
  4. The order of attributes is insignificant
  5. No two rows (tuples) in a relation can be identical.
  6. The order of the rows (tuples) is insignificant.
• Relational Algebra is a collection of operations on Relations.
• Relations are operands and the result of an operation is another relation.
• Two main collections of relational operators:
  1. Set theory operations: Union, Intersection, Difference and Cartesian product.
  2. Specific Relational Operations: Selection, Projection, Join, Division
Set Theoretic Operations
Consider the following relations R and S
R
  First  Last   Age
  Bill   Smith  22
  Sally  Green  28
  Mary   Keen   23
  Tony   Jones  32
S
  First    Last     Age
  Forrest  Gump     36
  Sally    Green    28
  DonJuan  DeMarco  27
• Union: R ∪ S
  Result: Relation with tuples from R and S with duplicates removed.
• Difference: R - S
  Result: Relation with tuples from R but not from S
• Intersection: R ∩ S
  Result: Relation with tuples that appear in both R and S.
R ∪ S
  First    Last     Age
  Bill     Smith    22
  Sally    Green    28
  Mary     Keen     23
  Tony     Jones    32
  Forrest  Gump     36
  DonJuan  DeMarco  27
R - S
  First  Last   Age
  Bill   Smith  22
  Mary   Keen   23
  Tony   Jones  32
R ∩ S
  First  Last   Age
  Sally  Green  28
Union Compatible Relations
• Attributes of relations need not be identical to perform union, intersection and difference operations.
• However, they must have the same number of attributes or arity and the domains for corresponding attributes must be identical.
• Domain is the datatype and size of an attribute.
• The degree of relation R is the number of attributes it contains.
• Definition: Two relations R and S are union compatible if and only if they have the same degree and the domains of the corresponding attributes are the same.
• Some additional properties:
  o Union, Intersection and difference operators may only be applied to Union Compatible relations.
o o o Union and Intersection are commutative operations R S=S R R S=S R Difference operation is NOT commutative. Convention is to use the attribute names from the first relation.R The resulting relations may not have meaningful names for the attributes.T is not equal to T .R Cartesian Product • Produce all combinations of tuples from two relations. Exercises • Assume relation T fName Sally Mary lName Green Score 44 28 William Smith Kontrary 27 • Compute R T Compute R T Show that R . R First Last Bill Age Smith 22 Mary Keen 23 Tony Jones 32 S ~ 35 ~ .S not equal S . R .
  Dinner   Dessert
  Steak    Ice Cream
  Lobster  Cheesecake
R X S
  First  Last   Age  Dinner   Dessert
  Bill   Smith  22   Steak    Ice Cream
  Bill   Smith  22   Lobster  Cheesecake
  Mary   Keen   23   Steak    Ice Cream
  Mary   Keen   23   Lobster  Cheesecake
  Tony   Jones  32   Steak    Ice Cream
  Tony   Jones  32   Lobster  Cheesecake
Selection Operator
• Selection and Projection are unary operators.
• The selection operator is sigma: σ
• The selection operation acts like a filter on a relation by returning only a certain number of tuples.
• The resulting relation will have the same degree as the original relation.
• The resulting relation may have fewer tuples than the original relation.
• The tuples to be returned are dependent on a condition that is part of the selection operator.
• σ C (R) Returns only those tuples in R that satisfy condition C
• A condition C can be made up of any combination of comparison or logical operators that operate on the attributes of R.
  o Comparison operators: =, <, >, ≤, ≥, ≠
  o Logical operators: ∧ (AND), ∨ (OR), ¬ (NOT)
  Use the Truth tables (memorize these) for logical expressions:
  AND | T  F      OR | T  F      NOT |
   T  | T  F       T | T  T       T  | F
   F  | F  F       F | T  F       F  | T
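The idea of selection as a filter can be sketched in Python. This is only an illustration under assumed names (Person, select); it uses relation R from the Set Theoretic Operations section and treats the condition C as a function that returns True or False for each tuple.

```python
from typing import Callable, List, NamedTuple


class Person(NamedTuple):
    first: str
    last: str
    age: int


# Relation R from the Set Theoretic Operations section.
R = [
    Person("Bill", "Smith", 22),
    Person("Sally", "Green", 28),
    Person("Mary", "Keen", 23),
    Person("Tony", "Jones", 32),
]


def select(relation: List[Person], condition: Callable[[Person], bool]) -> List[Person]:
    """Selection: keep only the tuples for which the condition is true."""
    return [t for t in relation if condition(t)]


# A condition built from comparison operators combined with a logical AND.
older_not_jones = select(R, lambda t: t.age > 22 and t.last != "Jones")

for t in older_not_jones:
    print(t.first, t.last, t.age)
```

The result has the same degree as R (three attributes per tuple) but fewer tuples, which matches the properties listed above.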
Selection Examples
Assume the following relation EMP has the following tuples:
  Name   Office  Dept  Rank
  Smith  400     CS    Assistant
  Jones  220     Econ  Adjunct
  Green  160     Econ  Assistant
  Brown  420     CS    Associate
  Smith  500     Fin   Associate
• Select only those Employees in the CS department:
  σ Dept = 'CS' (EMP)
  Result:
  Name   Office  Dept  Rank
  Smith  400     CS    Assistant
  Brown  420     CS    Associate
• Select only those Employees with last name Smith who are assistant professors:
  σ Name = 'Smith' ∧ Rank = 'Assistant' (EMP)
  Result:
  Name   Office  Dept  Rank
  Smith  400     CS    Assistant
• Select only those Employees who are either Assistant Professors or in the Economics department:
  σ Rank = 'Assistant' ∨ Dept = 'Econ' (EMP)
  Result:
  Name   Office  Dept  Rank
  Smith  400     CS    Assistant
  Jones  220     Econ  Adjunct
  Green  160     Econ  Assistant
• Do expressions 2. • Project only the names and departments of the employees: name. 2. The degree of the resulting relation may be equal to or less than that of the original relation. 3 and 4 above all evaluate ot the same thing? Projection Operator • • • • • • Projection is also a Unary operator. use R and S from the Set Theoretic Operations section above. The general syntax is: attributes R Where attributes is the list of attributes to be displayed and R is the relation. dept (EMP) Results: ~ 38 ~ . (EMP) Rank = 'Associate' ( Dept = 'CS' EMP ) Dept = 'CS' ( Rank = 'Associate' EMP ) (Rank = 'Adjunct' Dept = 'CS') Rank = 'Associate' Age > 26 Dept = 'CS' (EMP) (R S) For this expression.• Select only those Employees who are not in the CS department or Adjuncts: (Rank = 'Adjunct' Dept = 'CS') (EMP) Result: Name Office Dept Green 160 Smith 500 Rank Econ Assistant Fin Associate Exercises • Evaluate the following expressions: 1. 4. 3. The resulting relation will have the same number of tuples as the original relation (unless there are duplicate tuples produced). The Projection operator is pi: Projection limits the attributes that will be returned from the original relation. Projection Examples Assume the same EMP relation above is used. 5.
rank (Rank = 'Adjunct' Age > 22 Dept = 'CS') (EMP) ) (R S) ) For this expression. age ( name. use R and S from the Set Theoretic Operations section above. office > 300 ( name. ( fname. rank ( (Rank = 'Adjunct' Dept = 'CS') (EMP) ) Result: Name Rank Green Assistant Smith Associate Exercises • Evaluate the following expressions: 1. rank (EMP)) ~ 39 ~ . Show the names of all employees working in the CS department: name ( Dept = 'CS' (EMP) ) Results: Name Smith Brown • Show the name and rank of those Employees who are not in the CS department or Adjuncts: name. 3.Name Smith Jones Green Dept CS Econ Econ Brown CS Smith Fin Combining Selection and Projection • • The selection and projection operators can be combined to perform both operations. 2.
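Combining the two operators can be sketched in Python as a selection followed by a projection. The helper names (select, project) are hypothetical and the code is an illustration rather than a definitive implementation; EMP is the relation from the Selection Examples.

```python
# EMP relation from the Selection Examples.
EMP = [
    {"Name": "Smith", "Office": 400, "Dept": "CS", "Rank": "Assistant"},
    {"Name": "Jones", "Office": 220, "Dept": "Econ", "Rank": "Adjunct"},
    {"Name": "Green", "Office": 160, "Dept": "Econ", "Rank": "Assistant"},
    {"Name": "Brown", "Office": 420, "Dept": "CS", "Rank": "Associate"},
    {"Name": "Smith", "Office": 500, "Dept": "Fin", "Rank": "Associate"},
]


def select(relation, condition):
    """Selection: keep the tuples that satisfy the condition."""
    return [t for t in relation if condition(t)]


def project(relation, attributes):
    """Projection: keep only the listed attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in result:
            result.append(row)
    return result


# Names of all employees working in the CS department.
cs_names = project(select(EMP, lambda t: t["Dept"] == "CS"), ["Name"])

# Name and rank of employees who are not in the CS department or Adjuncts.
others = project(
    select(EMP, lambda t: not (t["Rank"] == "Adjunct" or t["Dept"] == "CS")),
    ["Name", "Rank"],
)

# Projecting on Rank alone produces duplicates, which are removed.
ranks = project(EMP, ["Rank"])

print(cs_names)
print(others)
print(ranks)
```

Projecting EMP on Rank returns three tuples rather than five, which is the case where projection yields fewer tuples than the original relation.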
Aggregate Functions
• We can also apply Aggregate functions to attributes and tuples:
  o SUM
  o MINIMUM
  o MAXIMUM
  o AVERAGE, MEAN, MEDIAN
  o COUNT
• Aggregate functions are sometimes written using the Projection operator or the Script F character: ℱ as in the Elmasri/Navathe book.
Aggregate Function Examples
Assume the relation EMP has the following tuples:
  Name   Office  Dept  Salary
  Smith  400     CS    45000
  Jones  220     Econ  35000
  Green  160     Econ  50000
  Brown  420     CS    65000
  Smith  500     Fin   60000
• Find the minimum Salary:
  MIN (salary) (EMP)
  Results:
  MIN(salary)
  35000
• Find the average Salary:
  AVG (salary) (EMP)
  Results:
  AVG(salary)
  51000
• Count the number of employees in the CS department:
  COUNT (name) ( σ Dept = 'CS' (EMP) )
  Results:
  COUNT(name)
  2
Dept = depart. The generic join operator (called the Theta Join is: It takes as arguments the attributes from the two relations that are to be joined. MainOffice. Phone) : EMP EMP. For example assume we have the EMP relation as above and a separate DEPART relation with (Dept.Dept DEPART DEPART.Dept MainOffice CS Econ 41 Office EMP. Join Examples Assume we have the EMP relation from above and the following DEPART relation: Dept CS MainOffice 404 Phone 555-1212 555-1234 555-4321 555-9876 Econ 200 Fin Hist • 501 100 Find all information on every employee including their department info: EMP Results: Name Smith Jones emp.Dept = DEPART.Dept Salary 400 220 CS Econ 45000 35000 Phone 555-1212 555-1234 404 200 ~ ~ .Dept • • • DEPART • • • The join condition can be When the join condition operator is = then we call this an Equijoin Note that the attributes in common are repeated.• Find the total payroll for the Economics department: Results: SUM(salary) 85000 SUM (salary) ( Dept = 'Econ' (EMP) ) Join Operation • Join operations bring together two relations and combine their attributes and tuples in a specific fashion.
any attributes in common (such as dept above) are repeated.mainoffice) (emp. The natural join operator is: * We can also assume using * that the join condition will be = on the two attributes in common.dept = depart.Dept MainOffice CS Econ Fin 404 200 501 Phone 555-1212 555-1234 555-4321 Green 160 Smith 500 Natural Join • • • • • Notice in the generic (Theta) join operation. The Natural Join operation removes these duplicate attributes.dept) DEPART Name Office EMP.office < depart. Example: EMP * DEPART Results: Name Smith Jones Green Office Dept Salary 400 220 160 CS 45000 MainOffice 404 200 200 404 501 Phone 555-1212 555-1234 555-1234 555-1212 555-4321 Econ 35000 Econ 50000 CS Fin 65000 60000 Brown 420 Smith 500 Outer Join ~ 42 ~ .Dept Salary Smith 400 CS Econ Fin 45000 50000 60000 DEPART.Green 160 Econ CS Fin 50000 65000 60000 Econ CS Fin 200 404 501 555-1234 555-1212 555-4321 Brown 420 Smith • 500 Find all information on every employee including their department info where the employee works in an office numbered less than the department main office: EMP Results: (emp.
• • • In the Join operations so far. Right Outer Join 3. only those tuples from both relations that satisfy the join condition are included in the output relation.food = menu. Three types of outer joins: 1.Food menu. Full Outer Join • Examples: Assume we have two relations: PEOPLE and MENU: PEOPLE: Name Age Alice Bill Carl Dina 21 24 23 19 Food Hamburger Pizza Beer Shrimp MENU: Food Pizza Day Monday Hamburger Tuesday Chicken Pasta Tacos Wednesday Thursday Friday • PEOPLE people.Food Day ~ 43 ~ .food Name Age people. The Outer join includes other tuples as well according to a few rules.Food menu. Left Outer Join includes all tuples in the left hand relation and includes only those matching tuples from the right hand relation.food MENU Day Tuesday Monday NULL NULL Name Age people. includes all tuples in the left hand relation and from the right hand relation.food = menu. includes all tuples in the right hand relation and includes ony those matching tuples from the left hand relation. 2.Food Alice Bill Carl Dina • 21 24 23 19 Hamburger Pizza Beer Shrimp MENU Hamburger Pizza NULL NULL PEOPLE people.
Operator is: * Example: PEOPLE * MENU Name Alice Bill Carl Age 21 24 23 Food Day Hamburger NULL Pizza Beer NULL NULL ~ 44 ~ .Bill Alice 24 21 Pizza Hamburger Pizza Hamburger Chicken Pasta Tacos Monday Tuesday Wednesday Thursday Friday NULL NULL NULL NULL NULL NULL NULL NULL NULL • PEOPLE people.food = menu.Food menu.Food Hamburger Pizza Beer Shrimp Hamburger Pizza NULL NULL Chicken Pasta Tacos Day Tuesday Monday NULL NULL Wednesday Thursday Friday NULL NULL NULL NULL NULL NULL NULL NULL NULL Outer Union • • • The Outer Union operation is applied to partially union compatible relations.food MENU Name Alice Bill Carl Dina Age 21 24 23 19 people.
This is shown below: The following dialog box will appear: ~ 45 ~ . One way to do this is to use the Symbol choice on the Insert menu in MS Word. Since we mainly use MS Word or another word processor running in Microsoft Windows. Most of the relational algebra symbols can be produced using the "Symbol" font. we demonstrate them here. it is very helpful to be able to type these relational algebra symbols into MS Word or other work processor.Dina 19 Shrimp NULL NULL NULL Hamburger Monday NULL NULL Pizza NULL NULL Chicken NULL NULL Pasta NULL NULL Tacos Tuesday Wednesday Thursday Friday How to make Relational Algebra Symbols in MS Word When doing homework assignments and projects.
By default, the symbols displayed on this screen will use the Symbol font. All of the relational algebra symbols are included. Some symbols such as join and outer join are not available in this fashion. For these you can copy and paste the graphics in the MS Word file linked here.
Structured Query Language
• SQL was first implemented in IBM's System R in the late 1970's.
• SQL is the de-facto standard query language for creating and manipulating data in relational databases.
• Some minor syntax differences, but the majority of SQL is standard across MS Access, Oracle, Sybase, Informix, etc.
• SQL is either specified by a command-line tool or is embedded into a general purpose programming language such as Cobol, "C", Pascal, etc.
• SQL is a standardized language monitored by the American National Standards Institute (ANSI) as well as by the National Institute of Standards and Technology (NIST).
  o ANSI 1990 - SQL 1 standard
  o ANSI 1992 - SQL 2 Standard (sometimes called SQL-92)
  o SQL 3 - adds some Object oriented concepts
• SQL has two major parts:
  1. Data Definition Language (DDL) Used to create (define) data structures such as tables, indexes, clusters
  2. Data Manipulation Language (DML) is used to store, retrieve and update data from tables.
For example: HH:MM:SS:dd • • • • TIMESTAMP INTERVAL Offset from UTZ. +/. Numeric Data Types • • • Integers: INTEGER. Examples of Data Types for Some Popular RDBMS • • Data types most often used are shown in Bold letters MS Access Examples from the MS Access Help File (c) Microsoft: Storage Data Type Range of Values Size Byte 1 byte 0 to 255 ~ 47 ~ . DATE Has 10 positions in the format: YYYY-MM-DD TIME Has 8 positions in the format: HH:MM:SS TIME(i) Defines the TIME data type with an additional i positions for fractions of a second. REAL.SQL Data Types • Each implementation of SQL uses slightly different names for the data types. Month Day: 19972011 o Store as Julian date: 1997283 • Both MS Access and Oracle store date and time information together in a DATE data type. Other ways of expressing dates: o Store as characters or integers with Year.HH:MM Used to specify some span of time measured in days or minutes. etc. INT or SMALLINT Real Numbers: FLOAT. Fixed length of n characters: CHAR(n) or CHARACTER(n) Variable length up to size n: VARCHAR(n) Date and Time • • • • Note: Implementations vary widely for these data types.j) Character Strings • • • Two main types: Fixed length and variable length. DOUBLE. PRECISION Formatted Numbers: DECIMAL(i.j). NUMERIC(i.
402823E38 to -1.94065645841247E-324 to point) 1.477. String (fixedLength of 1 to approximately 65. VARCHAR2 o Others: BOOLEAN.483. Data Definition Language • • • • • • DDL is used to define the schema of the database. characters) string length • Oracle supports the following data types: o Numeric: BINARY_INTEGER. INT. 9999.402823E38 for positive values.Boolean 2 bytes True or False. LONG RAW. length) string Variant (with 16 bytes Any numeric value up to the range of a Double. SMALLINT o Date: DATE Note: Also stores time. DECIMAL. POSITIVEN.647. DEC. Drop or Alter a table Create or Drop an Index Define Integrity constraints Define access privileges to users ~ 48 ~ . NATURAL.203.4 bytes 1. Currency (scaled 8 bytes -922. RAW Note: You will not need to memorize the above two tables for exams. REAL.483. NUMBER.8 bytes negative values. point) Double (double-1. o Character: CHAR. PLS_INTEGER.400 for MS Windows length) string length version 3.400.5807. integer) Single (single-3.79769313486232E308 to -4. CHARACTER. 100 to December 31. LONG.1).767. 2 billion (approx. numbers) Variant (with 22 bytes + Same range as for variable-length String. NUMERIC.337.477.685.337. INTEGER.94065645841247E-324 for precision floating. precision floating.5808 to 922. 4. 65.79769313486232E308 for positive values. Create a database schema Create.648 to 2. POSITIVE. Long (long 4 bytes -2. Integer 2 bytes -32.685. They are only there for your reference.401298E-45 for negative values. integer) Date 8 bytes January 1. NATURALN.768 to 32. etc.203.147. VARCHAR. FLOAT. DOUBLE PRECISION.147. String (variable.401298E-45 to 3.10 bytes + 0 to approx. Object 4 bytes Any Object reference. STRING.
• Define access privileges on objects
• SQL2 specification supports the creation of multiple schemas per database each with a distinct owner and authorized users.
Creating a Schema
Note: To try out these SQL examples in MS Access, go to the Queries form and choose New, then choose Design View and then close the next dialog box. Under the View menu, choose SQL. From this point, you can type in any SQL statement and execute it. Note that MS Access's DDL syntax is extremely limited. Most of the DDL statements below (including domains, NOT NULL constraints and referential integrity constraints) are not supported.
Creating a Table:
  CREATE TABLE employee (
    Last_Name      VARCHAR(20) NOT NULL,
    First_name     VARCHAR(18) NOT NULL,
    Soc_Sec        VARCHAR(11) NOT NULL,
    Date_of_Birth  DATE,
    Salary         NUMBER(8,2)
  );
  CREATE TABLE dependant (
    Last_Name         VARCHAR(20) NOT NULL,
    First_name        VARCHAR(18) NOT NULL,
    Soc_Sec           VARCHAR(11) NOT NULL,
    Date_of_Birth     DATE,
    Employee_Soc_Sec  VARCHAR(11) NOT NULL
  );
Note: When naming tables, columns and other database objects, do not include spaces in the names. For example, do not call the last name column: Last Name
If you wish to separate words in a name, use the underscore character.
Specifying Primary and Foreign keys:
  CREATE TABLE order_header (
    order_number     NUMBER(10,0) NOT NULL,
    order_date       DATE,
    sales_person     VARCHAR(25),
    bill_to          VARCHAR(35),
    bill_to_address  VARCHAR(45),
    bill_to_city     VARCHAR(20),
    bill_to_state    VARCHAR(2),
    bill_to_zip      VARCHAR(10),
    PRIMARY KEY (order_number)
  );
  CREATE TABLE order_items (
    order_number  NUMBER(10,0) NOT NULL,
    line_item     NUMBER(4,0) NOT NULL,
    part_number   VARCHAR(12) NOT NULL,
    quantity      NUMBER(4,0),
    PRIMARY KEY (order_number, line_item),
    FOREIGN KEY (order_number) REFERENCES order_header (order_number),
    FOREIGN KEY (part_number) REFERENCES parts (part_number)
  );
  CREATE INDEX order_index ON order_header (order_number ASC);
  CREATE INDEX items_index ON order_items (order_number ASC, line_item ASC);
Example from MS Access:
  CREATE TABLE employee (
    FirstName TEXT,
    LastName TEXT,
    ssn INTEGER CONSTRAINT ssnConstraint PRIMARY KEY
  );
  CREATE INDEX employee_index ON employee (ssn);
Specifying Constraints on Columns and Tables
• Constraints on attributes:
  o NOT NULL - Attribute may not take a NULL value
  o DEFAULT - Store a given default value if no value is specified
  o PRIMARY KEY - Indicate which attribute(s) form the primary key
  o FOREIGN KEY - Indicate which attribute(s) form a foreign key. This enforces referential integrity
  o UNIQUE - Indicates which attribute(s) must have unique values.
• Referential Integrity Constraint: Specify the behavior for child tuples when a parent tuple is modified.
• Action to take if referential integrity is violated:
  o SET NULL - Child tuples foreign key is set to NULL - Orphans.
  o SET DEFAULT - Set the value of the foreign key to some default value.
  o CASCADE - Child tuples are updated (or deleted) according to the action taken on the parent tuple.
• Specify when constraint should be enforced:
  o Immediate
  o Deferrable until commit time
• Examples of ON DELETE and ON UPDATE
  CREATE TABLE order_items (
    order_number  NUMBER(10,0) NOT NULL,
    line_item     NUMBER(4,0) NOT NULL,
    part_number   VARCHAR(12) NOT NULL,
    quantity      NUMBER(4,0),
    PRIMARY KEY (order_number, line_item),
    FOREIGN KEY (order_number) REFERENCES order_header (order_number)
      ON DELETE SET DEFAULT ON UPDATE CASCADE,
    FOREIGN KEY (part_number) REFERENCES parts (part_number)
  );
• Constraints can also be given names so that they can later be modified or dropped easily.
  CREATE TABLE order_header (
    order_number     NUMBER(10,0) NOT NULL,
    order_date       DATE,
    sales_person     VARCHAR(25),
    bill_to          VARCHAR(35),
    bill_to_address  VARCHAR(45),
    bill_to_city     VARCHAR(20),
    bill_to_state    VARCHAR(2),
    bill_to_zip      VARCHAR(10),
    CONSTRAINT pk_order_header PRIMARY KEY (order_number)
  );
  CREATE TABLE order_items (
    order_number  NUMBER(10,0) NOT NULL,
    line_item     NUMBER(4,0) NOT NULL,
    part_number   VARCHAR(12) NOT NULL,
    quantity      NUMBER(4,0),
    CONSTRAINT pk_order_items PRIMARY KEY (order_number, line_item),
    CONSTRAINT fk1_order_items FOREIGN KEY (order_number) REFERENCES order_header (order_number)
      ON DELETE SET DEFAULT ON UPDATE CASCADE,
    CONSTRAINT fk2_order_items FOREIGN KEY (part_number) REFERENCES parts (part_number)
      ON DELETE SET DEFAULT ON UPDATE CASCADE
  );
• An even better approach is to create the tables without constraints and then add them separately with ALTER TABLE statements
  CREATE TABLE order_header (
    order_number     NUMBER(10,0) NOT NULL,
    order_date       DATE,
    sales_person     VARCHAR(25),
    bill_to          VARCHAR(35),
    bill_to_address  VARCHAR(45),
    bill_to_city     VARCHAR(20),
    bill_to_state    VARCHAR(2),
    bill_to_zip      VARCHAR(10)
  );
  CREATE TABLE order_items (
    order_number  NUMBER(10,0) NOT NULL,
    line_item     NUMBER(4,0) NOT NULL,
    part_number   VARCHAR(12) NOT NULL,
    quantity      NUMBER(4,0)
  );
  ALTER TABLE order_header
    ADD CONSTRAINT pk_order_header PRIMARY KEY (order_number);
  ALTER TABLE order_items
    ADD CONSTRAINT pk_order_items PRIMARY KEY (order_number, line_item);
  ALTER TABLE order_items
    ADD CONSTRAINT fk1_order_items FOREIGN KEY (order_number) REFERENCES order_header (order_number)
    ON DELETE SET DEFAULT ON UPDATE CASCADE;
  ALTER TABLE order_items
    ADD CONSTRAINT fk2_order_items FOREIGN KEY (part_number) REFERENCES parts (part_number)
    ON DELETE SET DEFAULT ON UPDATE CASCADE;
Creating indexes on table columns
• To speed up retrieval of orders given order_number:
  CREATE INDEX idx_order_number ON order_header (order_number);
• To speed up retrieval of orders given sales person:
  CREATE INDEX idx_sales_person ON order_header (sales_person);
• We give the first part of the index name as "idx" just as a convention.
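To watch a referential integrity action fire, the sketch below runs a cut-down version of the order tables in SQLite through Python's standard sqlite3 module. Several things here are assumptions made for the illustration and are not the notes' own setup: SQLite stands in for Oracle or MS Access, the column list is shortened, INTEGER replaces NUMBER, and ON DELETE CASCADE is swapped in for ON DELETE SET DEFAULT so the effect is easy to observe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys after this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE order_header (
    order_number INTEGER NOT NULL,
    sales_person VARCHAR(25),
    CONSTRAINT pk_order_header PRIMARY KEY (order_number)
);
CREATE TABLE order_items (
    order_number INTEGER NOT NULL,
    line_item    INTEGER NOT NULL,
    quantity     INTEGER,
    CONSTRAINT pk_order_items PRIMARY KEY (order_number, line_item),
    CONSTRAINT fk1_order_items FOREIGN KEY (order_number)
        REFERENCES order_header (order_number)
        ON DELETE CASCADE ON UPDATE CASCADE
);
CREATE INDEX idx_sales_person ON order_header (sales_person);
""")

conn.execute("INSERT INTO order_header VALUES (1, 'Smith')")
conn.executemany(
    "INSERT INTO order_items VALUES (?, ?, ?)",
    [(1, 1, 5), (1, 2, 3)],
)

# A child row whose parent does not exist violates the foreign key.
try:
    conn.execute("INSERT INTO order_items VALUES (99, 1, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the parent cascades to its child rows.
conn.execute("DELETE FROM order_header WHERE order_number = 1")
remaining = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]

print(orphan_rejected, remaining)
```

With SET NULL or SET DEFAULT in place of CASCADE, the child rows would be kept and only their order_number would change, which requires that column to allow NULL or to carry a default.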
Removing Schema Components with DROP
• DROP SCHEMA schema_name CASCADE
  Drop the entire schema including all tables. CASCADE option deletes all data, all tables, indexes, domains, etc.
• DROP SCHEMA schema_name RESTRICT
  Removes the schema only if it is empty.
• DROP TABLE table_name
  Remove the table and all of its data.
• DROP TABLE table_name CASCADE
  Remove the table and all related tables as specified by FOREIGN KEY constraints.
• DROP TABLE table_name RESTRICT
  Remove the table only if it is not referenced (via a FOREIGN KEY constraint) by other tables.
• DROP INDEX index_name
  Removes an index.
• DROP CONSTRAINT table_name.constraint_name
  Removes a constraint from a table.
Changing Schema Components with ALTER
• Changing Attributes:
  ALTER TABLE student ALTER last_name VARCHAR(35);
  ALTER TABLE student ALTER gpa DROP DEFAULT;
  ALTER TABLE student ALTER gpa SET DEFAULT 0.00;
• Adding Attributes:
  ALTER TABLE student ADD admission DATE;
• Removing Attributes (not widely implemented):
  ALTER TABLE student DROP home_phone;
Data Manipulation Language
• DDL is used to create and specify the schema. DML is then used to manipulate (select, insert, update, delete) data.
Inserting Data into Tables
• General syntax:
  INSERT INTO tablename (column1, column2, ... columnX)
  VALUES (val1, val2, ... valX);
• Examples:
  INSERT INTO employee (first_name, last_name, street, city, state, zip)
  VALUES ("Buddy", "Rich", "123 Sticks Ln.", "Fillville", "TN", "31212");
  INSERT INTO stocks (symbol, close_date, close_price)
  VALUES ("IBM", "03-JUN-94", 104.25);
  INSERT INTO student_grades (student_id, test_name, score, grade)
  VALUES (101, "Quiz 1", 88, "B+");
• Quotes are placed around the data depending on the Data type and on the specific RDBMS being used:
  RDBMS      Text Data Type          Dates
  MS Access  TEXT: Either " or '     DATETIME: Either " or '
  Oracle     VARCHAR: '              DATE: '
  IBM DB2    VARCHAR: '              DATE: '
  Sybase     CHAR and VARCHAR: "     DATE: "
Retrieving Data from Tables with Select
• Main way of getting data out of tables is with the SELECT statement.
• SELECT syntax:
  SELECT    column1, column2, ... columnN
  FROM      tableA, tableB, ... tableZ
  WHERE     condition1, condition2, ... conditionM
  GROUP BY  column1, ...
  HAVING    condition
  ORDER BY  column1, column2, ... columnN
• Assume an employees table:
  employees(employee_id, first_name, last_name, street, city, state, zip)
  and a "Stocks" table:
  stocks(symbol, close_date, close_price)
• Some example queries:
  SELECT    employee_id, last_name, first_name
  FROM      employees
  WHERE     last_name = "Smith"
  ORDER BY  first_name DESC
    SELECT   employee_id, last_name, first_name
    FROM     employees
    WHERE    salary > 40000
    ORDER BY last_name, first_name DESC

    SELECT * FROM employees ORDER BY 2;

    SELECT   symbol, close_date, close_price
    FROM     stocks
    WHERE    close_date > "01-JAN-95" AND symbol = "IBM"
    ORDER BY close_date

    SELECT   symbol, close_date, close_price
    FROM     stocks
    WHERE    close_date >= "01-JAN-95"
    ORDER BY symbol

Relational Operators and SQL
• Relational operators each have implementations in SQL.
• Projection of a selection:
    π employee_id, last_name, first_name ( σ salary > 40000 (EMPLOYEE) )

    SELECT employee_id, last_name, first_name
    FROM   employee
    WHERE  salary > 40000
• Aggregate over a selection:
    AVG (salary) ( σ state = 'NJ' (EMPLOYEE) )

    SELECT AVG(salary)
    FROM   employee
    WHERE  state = 'NJ'
• Selection with two conditions:
    σ last_name = 'Smith' ∧ state = 'NY' (EMPLOYEE)

    SELECT *
    FROM   employee
    WHERE  last_name = 'Smith' AND state = 'NY'

SQL Built-in Functions
Example Table students:

    Name    Major        Grade
    Bill    CIS          95
    Mary    CIS          98
    Sue     Marketing    88
    Tom     Finance      92
    Alex    CIS          79
    Sam     Marketing    89
    Jane    Finance      83

Note: To try out these examples, create the table in MS Access and enter the data shown above. Go to the Queries form and choose New, then choose Design View and then close the next dialog box. Under the View menu, choose SQL.

• Average grade in the class:
    SELECT AVG(grade) FROM students;
  Results:
    AVG(GRADE)
    ----------
    89.1428571
• Give the name of the student with the highest grade in the class. This is an example of a subquery:
    SELECT name, grade
    FROM   students
    WHERE  grade = ( SELECT MAX(grade) FROM students );
  Results:
    NAME    GRADE
    ----    -----
    Mary    98
• Show the students with the highest grades in each major:
    SELECT name, major, grade
    FROM   students s1
    WHERE  grade = ( SELECT max(grade)
                     FROM   students s2
                     WHERE  s1.major = s2.major )
    ORDER BY grade DESC;
  Results:
    NAME    MAJOR        GRADE
    ----    ---------    -----
    Mary    CIS          98
    Tom     Finance      92
    Sam     Marketing    89

  Note the two aliases given to the students table: s1 and s2. These allow us to refer to different views of the same table.

Selecting from 2 or More Tables
• In the FROM portion, list all tables separated by commas. Called a Join.
• The WHERE part becomes the Join Condition.
• Example table EMPLOYEE:
    Name     Department    Salary
    Joe      Finance       50000
    Alice    Finance       52000
    Jill     MIS           48000
    Jack     MIS           32000
    Fred     Accounting    33000
• Example table DEPARTMENTS:
    Department    Location
    Finance       NJ
    MIS           CA
    Accounting    CA
    Marketing     NY
• List all of the employees working in California:
    SELECT employee.name
    FROM   employee, department
    WHERE  employee.department = department.department
    AND    department.location = 'CA';
  Results:
    NAME
    ----
    Jill
    Jack
    Fred
• List each employee name and what state (location) they work in. List them in order of location and name:
    SELECT   employee.name, department.location
    FROM     employee, department
    WHERE    employee.department = department.department
    ORDER BY department.location, employee.name;
  Results:
    NAME     LOCATION
    -----    --------
    Fred     CA
    Jack     CA
    Jill     CA
    Alice    NJ
    Joe      NJ
• List each department and all employees that work there. Show the department and location even if no employees work there. This is similar to a LEFT JOIN.
    SELECT   department.department, department.location, employee.name
    FROM     employee RIGHT JOIN department
    ON       employee.department = department.department
    ORDER BY department.location, employee.name;
  Results:
    DEPARTMENT    LOCATION    NAME
    ----------    --------    -----
    Accounting    CA          Fred
    MIS           CA          Jack
    MIS           CA          Jill
    Finance       NJ          Alice
    Finance       NJ          Joe
    Marketing     NY          NULL
• What is the highest paid salary in California?
    SELECT MAX(employee.salary)
    FROM   employee, department
    WHERE  employee.department = department.department
    AND    department.location = 'CA';
  Results:
    MAX(SALARY)
    -----------
    48000
• Cartesian Product of the two tables:
    SELECT * FROM employee, department;
  Results:
    Name     employee.Department    Salary    department.Department    Location
    Joe      Finance                50000     Finance                  NJ
    Joe      Finance                50000     MIS                      CA
    Joe      Finance                50000     Accounting               CA
    Joe      Finance                50000     Marketing                NY
    Alice    Finance                52000     Finance                  NJ
    Alice    Finance                52000     MIS                      CA
    Alice    Finance                52000     Accounting               CA
    Alice    Finance                52000     Marketing                NY
    Jill     MIS                    48000     Finance                  NJ
    Jill     MIS                    48000     MIS                      CA
    Jill     MIS                    48000     Accounting               CA
    Jill     MIS                    48000     Marketing                NY
    Jack     MIS                    32000     Finance                  NJ
    Jack     MIS                    32000     MIS                      CA
    Jack     MIS                    32000     Accounting               CA
    Jack     MIS                    32000     Marketing                NY
    Fred     Accounting             33000     Finance                  NJ
    Fred     Accounting             33000     MIS                      CA
    Fred     Accounting             33000     Accounting               CA
    Fred     Accounting             33000     Marketing                NY
• In which states do our employees work?
    SELECT DISTINCT location
    FROM   department;
• From our Bank Accounts example, list the Customer name and their total account holdings:
    SELECT   customers.LastName, Sum(Balance)
    FROM     customers, accounts
    WHERE    customers.CustomerID = accounts.customerid
    GROUP BY customers.LastName
  Results:
    LASTNAME    SUM(BALANCE)
    --------    ------------
    Axe         $15,000.00
    Builder     $1,300.00
    Jones       $1,000.00
    Smith       $6,000.00
• We can also use a Column Alias to change the title of the columns:
    SELECT   customers.LastName, Sum(Balance) AS TotalBalance
    FROM     customers, accounts
    WHERE    customers.CustomerID = accounts.customerid
    GROUP BY customers.LastName
  Results:
    LASTNAME    TotalBalance
    --------    ------------
    Axe         $15,000.00
    Builder     $1,300.00
    Jones       $1,000.00
    Smith       $6,000.00
• Here is a combination of a function and a column alias:
    SELECT name, department, salary AS CurrentSalary,
           (salary * 1.03) AS ProposedRaise
    FROM   employee;
  Results:
    name     department    CurrentSalary    ProposedRaise
    -----    ----------    -------------    -------------
    Alice    Finance       52000            53560
    Fred     Accounting    33000            33990
    Jack     MIS           32000            32960
    Jill     MIS           48000            49440
    Joe      Finance       50000            51500

Recursive Queries and Aliases
• Recall some of the E-R diagrams and relations we dealt with had a recursive relationship.
• For example: A student can tutor one or more other students. A student has only one tutor.
    STUDENTS (StudentID, Name, Student_TutorID)

    StudentID    Name     Student_TutorID
    S101         Bill     NULL
    S102         Alex     S101
    S103         Mary     S101
    S104         Liz      S103
    S105         Ed       S103
    S106         Sue      S101
    S107         Petra    S106
• Provide a listing of each student and the name of their tutor:
    SELECT s1.name AS Student, tutors.name AS Tutor
    FROM   students s1, students tutors
    WHERE  s1.student_tutorid = tutors.studentid;
  Results:
    Student    Tutor
    -------    -----
    Alex       Bill
    Mary       Bill
    Sue        Bill
    Liz        Mary
    Ed         Mary
    Petra      Sue
• The above is called a "recursive" query because it accesses the same table two times. We give the table two aliases called s1 and tutors so that we can compare different aspects of the same table.
• However, as is, the table is missing something: We don't see who is tutoring Bill. Use LEFT JOIN:
    SELECT s1.name AS Student, tutors.name AS Tutor
    FROM   students s1 LEFT JOIN students tutors
    ON     s1.student_tutorid = tutors.studentid;
  Results:
    Student    Tutor
    -------    -----
    Bill       NULL
    Alex       Bill
    Mary       Bill
    Sue        Bill
    Liz        Mary
    Ed         Mary
    Petra      Sue
• Here is one more twist: Suppose we were interested in those students who do not tutor anyone? Use RIGHT JOIN.
• How many students does each tutor work with?
    SELECT   s1.name AS TutorName,
             COUNT(tutors.student_tutorid) AS NumberTutored
    FROM     students s1, students tutors
    WHERE    s1.studentid = tutors.student_tutorid
    GROUP BY s1.name;
  Results:
    TutorName    NumberTutored
    ---------    -------------
    Bill         3
    Mary         2
    Sue          1

WHERE Clause Expressions
• There are a number of expressions one can use in a WHERE clause.
• Typical Logic expressions: COLUMN = value
  Also: < > = != <= >=
• Also consider BETWEEN:
    SELECT name, grade, "You Got an A"
    FROM   students
    WHERE  grade BETWEEN 91 AND 100
• Subqueries using = (equals):
    SELECT name, grade
    FROM   students
    WHERE  grade = ( SELECT MAX(grade) FROM students );
  This assumes the subquery returns only one tuple as a result. Typically used for aggregate functions.
• Subqueries using IN:
    SELECT name
    FROM   employee
    WHERE  department IN ('Finance', 'MIS');

    SELECT name
    FROM   employee
    WHERE  department IN (SELECT department
                          FROM   department
                          WHERE  location = 'CA');
  In the above case, the subquery returns a set of tuples. The IN clause returns true when a tuple matches a member of the set.
• Subqueries using EXISTS:
    SELECT name, salary
    FROM   employee
    WHERE  EXISTS (SELECT name
                   FROM   EMPLOYEE e2
                   WHERE  e2.salary > employee.salary)
    AND    EXISTS (SELECT name
                   FROM   EMPLOYEE e3
                   WHERE  e3.salary < employee.salary)
  Results:
    name    salary
    ----    ------
    Joe     50000
    Jill    48000
    Fred    33000
  The above query shows all employee names and salaries where there is at least one person who makes more money (the first EXISTS) and at least one person who makes less money (the second EXISTS).
• NOT EXISTS:
    SELECT name, salary
    FROM   employee
    WHERE  NOT EXISTS (SELECT name
                       FROM   EMPLOYEE e2
                       WHERE  e2.salary > employee.salary)
  Results:
    name     salary
    -----    ------
    Alice    52000
  The above query shows all employees for whom there does not exist an employee who is paid more, i.e. the highest paid employee.
• LIKE operator: Use the LIKE operator to perform a partial string match. Generally, the % character is used as the wild card, although in some DBMS the * character is used.
• Show all employees whose name starts with 'S':
    SELECT name, salary
    FROM   employee
    WHERE  name LIKE 'S%';
• Show all employees whose name contains the letters 'en':
    SELECT name, salary
    FROM   employee
    WHERE  name LIKE '%en%';
  Note that characters within quotes are case sensitive.
For example. salary employee name LIKE '%e%n%'. Remove all employees: DELETE employee. With no WHERE clause. DELETE will remove all tuples from a table. salary employee name LIKE '%e%n%' OR name LIKE '%n%e%'. • • • • • • Remove all employees working in California: DELETE employee WHERE department IN (SELECT department FROM department WHERE location = 'CA'). consider the department attribute in the Employee table as a Foreign Key. ~ 64 ~ . • • • Remove only employees making more than $50. • will not be successful if a constraint would be violated. UPDATE uses the SET clause to overwrite the value. Change the last name of an Employee: UPDATE employee SET last_name = 'Smith' WHERE employee_id = 'E1001'.Show all employees whose name contains the letter 'e' and the letter 'n' in that order: SELECT FROM WHERE name. Deleting Tuples with DELETE • • • • DELETE is used to remove tuples from a table. Show all employees whose name contains the letter 'e' and the letter 'n' in any order: SELECT FROM WHERE name. This is what we call enforcing Referential Integrity DELETE Change Values using UPDATE • • • • • • The UPDATE command is used to change attribute values in the database.000 DELETE employee WHERE salary > 50000. Removing a department would then be contingent upon no employees working in that department.
• Give an Employee a raise:
    UPDATE employee
    SET    salary = salary * 1.05
    WHERE  employee_id = 'E1001';

Defining Views
• It is possible to define a particular view of a table (or tables).
• For example, if we commonly access just 2 or 3 columns in a table, we can define a view on that table and then use the view name when specifying queries.
• Assume an employees table:
    employees(employee_id, first_name, last_name, street, city, state, zip, department, salary)

    CREATE VIEW emp_address AS
    SELECT first_name, last_name, street, city, state, zip
    FROM   employee;

    CREATE VIEW emp_salary AS
    SELECT first_name, last_name, salary
    FROM   employee;

    CREATE VIEW avg_sal_dept AS
    SELECT department, AVG(salary)
    FROM   employee
    GROUP BY department;
• One can then query these views as if they were tables:
    SELECT * FROM emp_address ORDER BY last_name;

    SELECT *
    FROM   avg_sal_dept
    WHERE  department = 'Finance';
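The SQL examples in this part of the notes can also be tried outside MS Access. The following is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in DBMS (SQLite is not one of the systems covered in these notes, and its SQL dialect differs in small ways). It loads the students and EMPLOYEE example tables, then runs the class average, the highest-grade-per-major subquery, and a query against the avg_sal_dept view. The column alias avg_salary is an addition for readability.

```python
import sqlite3

# Throwaway in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The example students table from the SQL Built-in Functions section.
cur.execute("CREATE TABLE students (name TEXT, major TEXT, grade INTEGER)")
cur.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [("Bill", "CIS", 95), ("Mary", "CIS", 98), ("Sue", "Marketing", 88),
     ("Tom", "Finance", 92), ("Alex", "CIS", 79), ("Sam", "Marketing", 89),
     ("Jane", "Finance", 83)],
)

# Average grade in the class.
avg_grade = cur.execute("SELECT AVG(grade) FROM students").fetchone()[0]

# Highest grade in each major: a subquery that uses the aliases s1 and s2.
top_by_major = cur.execute(
    """SELECT name, major, grade
       FROM students s1
       WHERE grade = (SELECT MAX(grade) FROM students s2
                      WHERE s1.major = s2.major)
       ORDER BY grade DESC"""
).fetchall()

# The example EMPLOYEE table and the avg_sal_dept view.
cur.execute("CREATE TABLE employee (name TEXT, department TEXT, salary INTEGER)")
cur.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Joe", "Finance", 50000), ("Alice", "Finance", 52000),
     ("Jill", "MIS", 48000), ("Jack", "MIS", 32000),
     ("Fred", "Accounting", 33000)],
)
cur.execute(
    """CREATE VIEW avg_sal_dept AS
       SELECT department, AVG(salary) AS avg_salary
       FROM employee
       GROUP BY department"""
)

# Query the view as if it were a table.
finance_avg = cur.execute(
    "SELECT * FROM avg_sal_dept WHERE department = 'Finance'"
).fetchall()

print(round(avg_grade, 7))
print(top_by_major)
print(finance_avg)
```

The view stores only the query, not the data, so the Finance average is recomputed from the employee table each time the view is queried.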
Data Storage Characteristics:
• For a significant amount of data, we require persistent, inexpensive, reliable and sharable storage methods with relatively rapid access time.
• Persistent - Data persists (lives on) after power is removed.
• Inexpensive - typically measured on a $ per Megabyte basis.
• Reliable - Should not have to be replaced due to excessive errors.
• Sharable - Should facilitate sharing of data among many users.
• Access time - Data should be accessible in a relatively short period of time.

Data Storage Hierarchy

    Storage                      Access time             Cost
    Processor Registers          1 - 5 ns                $1000's / MB
    Cache memory                 15 - 30 ns              $100's / MB
    Main Memory (Core)           40 - 100 ns             $10's / MB
    Magnetic Disk (hard disk)    5 - 30 ms               $1 / MB
    Optical Disk (CD-ROM)        50 - 100 ms             $1 / GB
    Magnetic Tape                100's ms to seconds     less than $1 / GB

Magnetic Disk Characteristics
• We focus on magnetic disk
• Access time is the dominant cost to consider
• Access time consists of:
  1. Seek time - moving the disk read/write head to the right track
  2. Disk Rotation time - waiting for the disk to rotate the track under the head
  3. Transfer time - time to actually read the data (blocks) from the disk and place it on the bus for main memory.
• The goal is to minimize seek and disk rotation delay by orienting related data on the same or adjacent tracks.
• Block - The smallest unit of memory a disk can read or write.
• Block Size - the size of the block. Typically 512 bytes, 1024, 2048, ..., 32 Kilobytes, etc.

Record Storage on Disk
• Relations (records) are stored on disk with each tuple written one after the other (end to end).
• Blocking Factor - the number of tuples (records) that can fit into a single block.
• Example: EMPLOYEE takes 100 bytes to store one tuple (record). If the Block Size is 2,000 bytes, then we can store 20 EMPLOYEE tuples (records) in one block. Thus the Blocking factor is 2000/100 = 20.
    f = B/R    (B is the block size, R is the record size; round down when records may not span blocks)
• Fixed length records: Each record is of fixed length. Pad with spaces.
• Variable Length records: Each record is only as long as the data it contains.
• Spanned Records: Records are allowed to span across block boundaries.
• Unspanned Records: A record is found in one and only one block, i.e., records do not span across block boundaries.

File Operations
• Consider four basic File Operations:

    Operation    Similar SQL Statement
    Find         Select
    Insert       Insert
    Modify       Update
    Delete       Delete

• Unordered file - New record is inserted at the end of the file.
  o Insert takes constant time.
  o Select, Update and Delete take n/2 time on average. (n is the number of records)
• Ordered file - New record is inserted in order, in the file.
  o Insert takes log2(n) plus the time to re-organize records.
  o Select, Update, Delete take at least log2(n)
• Indexed file - New record is inserted at the end of the file.
  o An index is maintained that points to the location on disk where the record is found.
  o Insert takes constant time for the data itself plus log2(n) for the index
  o Select, Update, Delete take log2(n) lookup on the index followed by constant time to access the data record.

Types of Indexing
• An index is made up of two components: A key and a pointer
• The key is typically the key value for the relation and is mainly used to identify and look up records.
• The pointer is an address on disk where the rest of the data in the record can be found.
• Two types of indexes discussed here: Ordered index and Hashing.

Ordered Index
• Records are stored as they are inserted.
• Key attribute is stored in order in the index.

Hashing
• Identify a function f that takes, as input, the key for a relation and returns, as output, the physical disk address for the rest of the data in the record.
• Example: Assume employee records. Function f takes the ASCII values of the first and last name and adds them. The numeric result is the physical address for the record.
• Selection time is constant.
• It is possible function f can map two different keys to the same address. In this case, we use a series of hash buckets.
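The hashing example above can be sketched in a few lines. This is only an illustration under stated assumptions: the bucket count of 8, the modulo step that folds the sum into a bucket number, the in-memory lists standing in for disk addresses, and the employee names are all hypothetical and not part of the notes.

```python
NUM_BUCKETS = 8  # assumed size; a real file would use far more buckets


def name_hash(first_name, last_name):
    """Add the character codes of both names, then fold into a bucket number."""
    total = sum(ord(ch) for ch in first_name + last_name)
    return total % NUM_BUCKETS  # the modulo step is an assumption


# Each bucket holds every (key, record) pair that hashed to it.
buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(first_name, last_name, record):
    """Store the record in the bucket chosen by the hash function."""
    key = (first_name, last_name)
    buckets[name_hash(first_name, last_name)].append((key, record))


def find(first_name, last_name):
    """Go straight to one bucket, then scan only that bucket."""
    key = (first_name, last_name)
    for stored_key, record in buckets[name_hash(first_name, last_name)]:
        if stored_key == key:
            return record
    return None


insert("Amy", "Lee", {"salary": 48000})
insert("May", "Lee", {"salary": 52000})  # same letters, same sum: a collision

collided = name_hash("Amy", "Lee") == name_hash("May", "Lee")
print(collided)
print(find("May", "Lee"))
```

Because addition ignores letter order, names that are rearrangements of each other always collide, which is why the bucket has to keep the full key alongside each record.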
Database System Architectures:
• There are a number of database system architectures presently in use.
• One must examine several criteria:
  1. Where do the data and DBMS reside?
  2. Where are the application programs executed (e.g., which CPU)? This may include the user interface.
  3. Where are business rules enforced?

Traditional Mainframe Architecture
• Database (or files) resides on a mainframe computer.
• Applications are run on the same mainframe computer, e.g., COBOL programs or JCL scripts that access the database.
• Business rules are enforced in the applications running on the mainframe.
• Multiple users access the applications through simple terminals (e.g., IBM 3270 terminals or VT220 terminals) that have no processing power of their own.
• User interface is text-mode screens.
• Example: DB2 database and COBOL application programs running on an IBM 390.
• Advantages:
  o Excellent security and control over applications
  o High reliability - years of proven MF technology
  o Relatively low incremental cost per user (just add a terminal)
• Disadvantages:
  o Unable to effectively serve advanced user interfaces
  o Users unable to effectively manipulate data outside of standard applications
Personal Computer - Stand-Alone Database
• Database (or files) reside on a PC - on the hard disk.
• Applications run on the same PC and directly access the database. In such cases, the application is the DBMS.
• Business rules are enforced in the applications running on the PC.
• A single user accesses the applications.
• Example: MS Access running on a PC.

File Sharing Architecture
• PCs are connected to a local area network (LAN).
• A single file server stores a single copy of the database files.
• PCs on the LAN map a drive letter (or volume name) on the file server.
• Applications run on each PC on the LAN and access the same set of files on the file server. The application is also the DBMS.
• Each user runs a copy of the same application and accesses the same files.
• Business rules are enforced in the applications. Also, the applications must handle concurrency control, possibly by file locking.
• Example: Sharing MS Access files on a file server.
• Advantages:
  o (limited) Ability to share data among several users
  o Costs of storage spread out among users
  o Most components are now commodity items - prices falling
• Disadvantages:
  o Limited data sharing ability - a few users at most

Classic Client/Server Architecture
• Client machines:
  o Run own copy of an operating system.
  o Run one or more applications using the client machine's CPU, memory, etc.
  o Application communicates with DBMS server running on server machine through a Database Driver
  o Database driver (middleware) makes a connection to the DBMS server over a network.
  o Examples of clients: PCs with MS Windows operating system. Forms and reports developed in: PowerBuilder, MS Access, MS Visual Basic, Borland Delphi, Oracle Developer, "C" or "C++", etc.
• Server Machines:
  o Run own copy of an operating system.
  o Run a Database Management System that manages a database.
  o Provides a Listening daemon that accepts connections from client machines and submits transactions to DBMS on behalf of the client machines.
  o Examples: Sun Sparc server running UNIX operating system. PC with Windows operating system. RDBMS such as Oracle Server, Sybase, Informix, DB2, etc.
• Middleware:
  o Small portion of software that sits between client and server.
  o Establishes a connection from the client to the server and passes commands (e.g., SQL) between them.
  o See ODBC below.
  o Examples: For Oracle: SQL*Net (or Net8) running on both client and server. For Sybase: Sybase Open Client and Open Server.
• Business rules may be enforced at:
  1. The client application - so called "Fat Clients".
  2. Entirely on the database server - so called "Thin Clients"
  3. A Mix of both.
• Example: Oracle RDBMS running on a server. Sybase PowerBuilder running on a client PC.
• Advantages of client/server:
  1. Processing of the entire Database System is spread out over clients and server.
  2. DBMS can achieve high performance because it is dedicated to processing transactions (not running applications).
  3. Client Applications can take full advantage of advanced user interfaces such as Graphical User Interfaces.
• Disadvantages of client/server:
  1. Implementation is more complex because one needs to deal with middleware and the network.
  2. It is possible the network is not well suited for client/server communications and may become saturated.
  3. Additional burden on DBMS server to handle concurrency control, etc.
  4. As more business rule logic is programmed into the client side applications, they can become unwieldy. Stored procedures and triggers can help in this case.
Distributed Database Architecture
• In a distributed database system (DDS), multiple Database Management Systems run on multiple servers (sites) connected by a network.
• Data may be split up among the different servers or it may be replicated.

Data Partitioning
Data may be split up (or partitioned) in several ways:
  1. Horizontal: Rows in a table are split up across multiple sites.
  2. Vertical: Columns in a table are split across multiple sites.
  3. Both vertical and horizontal.
Splitting up data can improve performance by reducing contention for tables.
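As a rough illustration of the two partitioning styles, the sketch below splits the four customer rows used in this section's Customer table. The rule for the horizontal split (by state) is an assumption chosen for the example; the notes do not say what rule decides which site holds which rows. Plain Python lists stand in for the tables at each site.

```python
customers = [
    {"cust_id": 1001, "name": "Mr. Smith", "address": "123 Lexington",
     "city": "Smithville", "state": "KY", "zip": "91232"},
    {"cust_id": 1002, "name": "Mrs. Jones", "address": "12 Davis Ave.",
     "city": "Smithville", "state": "KY", "zip": "91232"},
    {"cust_id": 1003, "name": "Mr. Axe", "address": "443 Grinder Ln.",
     "city": "Broadville", "state": "GA", "zip": "81992"},
    {"cust_id": 1004, "name": "Mr. Builder", "address": "661 Parker Rd.",
     "city": "Streetville", "state": "GA", "zip": "81990"},
]

# Horizontal partitioning: whole rows go to one site or the other.
site1_rows = [row for row in customers if row["state"] == "KY"]
site2_rows = [row for row in customers if row["state"] == "GA"]

# Vertical partitioning: columns are split, and the key (cust_id) is
# repeated in both partitions so the rows can be matched up again.
name_part = [{"cust_id": r["cust_id"], "name": r["name"]} for r in customers]
address_cols = ("cust_id", "address", "city", "state", "zip")
address_part = [{col: r[col] for col in address_cols} for r in customers]

# Rebuilding the full table from the vertical partitions is a join on the key.
address_by_id = {a["cust_id"]: a for a in address_part}
rebuilt = [{**n, **address_by_id[n["cust_id"]]} for n in name_part]

print(len(site1_rows), len(site2_rows))
print(rebuilt == customers)
```

Repeating the key in every vertical partition is what makes the split reversible; without it there would be no way to tell which address belongs to which name.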
Customer Table

    Customer ID    Name           Address            City           State    Zip
    1001           Mr. Smith      123 Lexington      Smithville     KY       91232
    1002           Mrs. Jones     12 Davis Ave.      Smithville     KY       91232
    1003           Mr. Axe        443 Grinder Ln.    Broadville     GA       81992
    1004           Mr. Builder    661 Parker Rd.     Streetville    GA       81990

Horizontal Partitioning:

  Partition 1
    Customer ID    Name           Address            City           State    Zip
    1001           Mr. Smith      123 Lexington      Smithville     KY       91232
    1002           Mrs. Jones     12 Davis Ave.      Smithville     KY       91232

  Partition 2
    Customer ID    Name           Address            City           State    Zip
    1003           Mr. Axe        443 Grinder Ln.    Broadville     GA       81992
    1004           Mr. Builder    661 Parker Rd.     Streetville    GA       81990

Vertical Partitioning:

  Partition 1
    CustID    Name
    1001      Mr. Smith
    1002      Mrs. Jones
    1003      Mr. Axe
    1004      Mr. Builder

  Partition 2
    CustID    Address            City           State    Zip
    1001      123 Lexington      Smithville     KY       91232
    1002      12 Davis Ave.      Smithville     KY       91232
    1003      443 Grinder Ln.    Broadville     GA       81992
    1004      661 Parker Rd.     Streetville    GA       81990

Data Replication
• Data may also be replicated across multiple sites:
  1. Improve performance by moving a copy of data closer to the users.
  2. Improve reliability - if one site fails, others can continue processing the transactions.
• We need mechanisms in place to ensure multiple copies of data are kept consistent.
• Recall in a centralized DB we had the notion of a commit point. In distributed DB, we need to consider committing a transaction that changes data on multiple sites.
• Distributed Commit Protocol such as Two Phase Commit (2PC):
  1. Phase 1: Send a message to all sites: "Can you commit Transaction X?" All sites that can commit this transaction reply with "Y".
  2. Phase 2: If all sites reply with "Y", then send a "Commit" message to all sites. If any site replies "No", then the transaction is aborted.
• 2PC is an example of a synchronous replication protocol.
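The two phases can be sketched as a small coordinator loop. This is a toy model only: the Site class, its method names and the can_commit flag are hypothetical, and a real 2PC implementation also has to handle logging, timeouts and sites that fail between the two phases, none of which is shown here.

```python
class Site:
    """One participating database site (hypothetical, for illustration)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "pending"

    def prepare(self, txn):
        # Phase 1 reply to "Can you commit Transaction X?"
        return "Y" if self.can_commit else "No"

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"


def two_phase_commit(sites, txn):
    # Phase 1: ask every site whether it can commit the transaction.
    votes = [site.prepare(txn) for site in sites]

    # Phase 2: commit everywhere only if every site replied "Y".
    if all(vote == "Y" for vote in votes):
        for site in sites:
            site.commit(txn)
        return "committed"

    # Any "No" aborts the transaction at every site.
    for site in sites:
        site.abort(txn)
    return "aborted"


healthy = [Site("NY"), Site("NJ"), Site("CA")]
outcome_ok = two_phase_commit(healthy, "X")

mixed = [Site("NY"), Site("NJ", can_commit=False), Site("CA")]
outcome_bad = two_phase_commit(mixed, "X")

print(outcome_ok, [s.state for s in healthy])
print(outcome_bad, [s.state for s in mixed])
```

The point of the extra round of messages is that no site makes the change permanent until it is known that every site is able to.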
• In Asynchronous replication, we take snapshots of a master database and propagate the changes to other sites on some periodic basis.
• In general, distributed database systems offer more flexibility, higher performance and greater levels of independence over centralized systems.
• However, distributed database systems are also much harder to design and develop, control and administer. Security is also more difficult to enforce.
• BTW, for those of you from the UNIX world, simply replace "PCs" with "UNIX Workstations" in the phrases above.

Open DataBase Connectivity (ODBC)
• Middleware has historically been proprietary. How can a single client access multiple DBMS servers with minimal changes?
• ODBC is middleware software that can connect a client to multiple servers from different vendors.
• ODBC has two main portions that reside on the client: A Driver Manager and one or more DBMS drivers.
• The Driver Manager presents a uniform interface to all clients. This consists of a set of function calls to query, update and manipulate data on a server.
• A DBMS Driver is typically supplied by the individual DBMS vendor and contains routines to convert requests from the Driver Manager into commands the specific DBMS understands.
• Note also subtle differences in SQL and how it is implemented in various DBMS.
• Try this: Visit several DBMS vendors' web sites and see if they offer an ODBC driver that can be downloaded to your PC.

Triggers and Stored Procedures
• Triggers are procedures or functions stored in the DBMS and are invoked when certain events occur.
• Events include: Inserting a new row into a table, deleting a row in a table, updating data in a table, etc.
• Triggers are used to enforce business rules that all applications that use the database must adhere to.
• Example: A trigger may fire after each time an inventory record is updated. The trigger will automatically insert a new Order record in the Orders table if the quantity in inventory falls below a certain level.
• Most major DBMS support triggers. Oracle supports triggers written in PL/SQL. IBM DB2 supports triggers written in just about any language such as "C" and Java.
• Programming triggers requires special attention to how transactions execute. Triggers may cause locks to be held longer than expected or may have other side effects.
• Stored Procedures are similar to triggers: They are functions and procedures that are stored in the database.
• Stored procedures may be called by triggers or by application programs.
• Stored procedures are useful in cases when standard application logic must be implemented across all applications. Copies of this code do not need to be distributed to the clients. Also very useful when a large number of database accesses must be done with just a small result being passed back to the client.

Internet and Intranet Databases
• Companies are discovering that databases can provide excellent content for web pages.
• Many examples: Retail store with current products and price lists. On-line banking - banks with account balance information. Employee directories, etc.
• Several approaches to making database data available on-line:
  1. Periodically dump a database table to an HTML file and make the HTML file available on the web (e.g., MS Access Internet Wizard).
  2. Provide a mechanism to query the database in real time and format the results in HTML.
  3. Provide the web user with a form or other means to invoke a query on the database in real time.
• The latter 2 are similar. The difference is that in the last one, users can specify some or all of the query.
• There are two main ways to carry out dynamic real-time queries from the web:
  1. Using traditional HTML forms, information is passed to a CGI script that formats the query and submits it to the DBMS. Results are returned to the CGI script, which formats the output in HTML and returns it to the user's browser. By far this is the predominant method. One needs: An HTTP (web) server, some language (e.g., Perl) that supports CGI, middleware to connect to the database, the DBMS.
  2. Many DBMS now have the web server built in (or closely tied) to the database, e.g., Oracle Web Applications Server.
     Stored procedures in the DBMS are used to accept input from HTML forms, perform the appropriate query and then format the results in HTML.

MultiUser Databases:
• Multiuser database - more than one user processes the database at the same time
• Several issues arise:
  1. How can we prevent users from interfering with each other's work?
  2. How can we safely process transactions on the database without corrupting or losing data?
  3. If there is a problem (e.g., power failure or system crash), how can we recover without losing all of our data?

Transaction Processing
• We need the ability to control how transactions are run in a multiuser database.
• A transaction is a set of read and write operations that must either commit or abort.
• Consider the following transaction that reserves a seat on an airplane flight and charges the customer:
  1. Read customer information
  2. Write reservation information
  3. Write charges
• Suppose that after the second step, the database crashes. Or for some reason, changes can not be written.
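To make the commit-or-abort idea concrete, here is a minimal sketch of the seat reservation, again assuming Python's built-in sqlite3 module as a stand-in DBMS. The table layouts, the customer names and the fail_before_charge flag (which simulates the crash after step 2) are all made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reservations (customer TEXT, seat TEXT)")
conn.execute("CREATE TABLE charges (customer TEXT, amount REAL)")
conn.commit()


def reserve_seat(conn, customer, seat, amount, fail_before_charge=False):
    """Write the reservation and the charge as one unit of work."""
    try:
        # Step 2: write reservation information.
        conn.execute("INSERT INTO reservations VALUES (?, ?)", (customer, seat))
        if fail_before_charge:
            raise RuntimeError("simulated failure after step 2")
        # Step 3: write charges.
        conn.execute("INSERT INTO charges VALUES (?, ?)", (customer, amount))
        conn.commit()    # both writes become permanent together
        return True
    except RuntimeError:
        conn.rollback()  # the half-finished reservation is undone
        return False


ok = reserve_seat(conn, "Smith", "12A", 250.0)
failed = reserve_seat(conn, "Jones", "14C", 300.0, fail_before_charge=True)

reservation_count = conn.execute("SELECT COUNT(*) FROM reservations").fetchone()[0]
charge_count = conn.execute("SELECT COUNT(*) FROM charges").fetchone()[0]

print(ok, failed)
print(reservation_count, charge_count)
```

After the failed attempt there is no reservation for Jones, so the database never shows a seat that was reserved but not charged.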
• Transactions can either reach a commit point, where all actions are permanently saved in the database, or they can abort, in which case none of the actions are saved.
• Another way to say this is transactions are Atomic. All operations in a transaction must be executed as a single unit - Logical Unit of Work.
• Consider two users, each executing similar transactions:

  Example #1:
    User A                           User B
    Read Salary for emp 101          Read Salary for emp 101
    Multiply salary by 1.03          Multiply salary by 1.04
    Write Salary for emp 101         Write Salary for emp 101

  Example #2:
    User A                           User B
    Read inventory for Prod 200      Read inventory for Prod 200
    Decrement inventory by 5         Decrement inventory by 7
    Write inventory for Prod 200     Write inventory for Prod 200

• First, what should the values for salary (in the first example) really be?
• The DBMS must find a way to execute these two transactions concurrently and ensure the result is what the users (and designers) intended.
• These two are examples of the Lost Update or Concurrent Update problem. Some changes to the database can be overwritten.
• Consider how the operations for users A and B might be interleaved as in example #2. Assume there are 10 units in inventory for Prod 200:
    Read inventory for Prod 200       for user A
    Read inventory for Prod 200       for user B
    Decrement inventory by 5          for user A
    Decrement inventory by 7          for user B
    Write inventory for Prod 200      for user A
    Write inventory for Prod 200      for user B
  Or something similar like:
    Read inventory for Prod 200       for user A
    Decrement inventory by 5          for user A
    Write inventory for Prod 200      for user A
    Read inventory for Prod 200       for user B
    Decrement inventory by 7          for user B
    Write inventory for Prod 200      for user B
• In the first case, the incorrect amount (3) is written to the database. This is called the Lost Update problem because we lost the update from User A - it was overwritten by user B.
• The second example works because we let user A write the new value of Prod 200 before user B can read it. Thus User B's decrement operation will fail, because only 5 units remain.
• Here is another example. Users A and B share a bank account. Assume an initial balance of $200.
    User A reads the balance
instead of interleaving (mixing) the operations of the two transactions. A3. A2. If we do concurrency control properly. C3 Then the above schedule of transactions and operations is serialized. Concurrency Control is a method for controlling or scheduling the operations in such a way that concurrent transactions can be executed. C2. A lock is a logical flag set by a transaction to alert other transactions the data item is in use. Often reported as Transactions per second or TPS. B2. C1. B1. A group of two or more concurrent transactions are serializable if we can order their operations so that the final result is the same as if we had run them in serial order (one after another). Consider transaction A. C and D. B3. A2. ~ 79 ~ . C2. Characteristics of Locks • Locks may be applied to data items in two ways: Implicit Locks are applied by the DBMS Explicit Locks are applied by application programs. Transaction throughput: The number of transactions we can perform in a given time period.• • • • • • • User User User User User A B A B B deducts $100 from the balance reads the balance writes the new balance of $100 deducts $100 from the balance writes the new balance of $100 The reason we get the wrong final result (remaining balance of $100) is because transaction B was allowed to read stale data. Suppose. Locking is one such means. A3. B3. Each has 3 operations. or B then A) User User User User User User A A A B B B reads the balance deducts $100 from the balance writes the new balance of $100 reads the balance (which is now $100) deducts $100 from the balance writes the new balance of $0 • • • • • • • • • • • • If we insist only one transaction can execute at a time. If executing: A1. B2. then we can maximize transaction throughput while avoiding any chance. Locking is done to data items in order to reserve them for future operations. C1. This is called the inconsistent read problem. then performance will be quite poor. 
Concurrency Control and Locking • • • We need a way to guarantee that our concurrent transactions can be serialized. B1. we execute one after the other (note it makes no difference which order: A then B. B. C3 has the same result as executing: A1. in serial order.
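To make the idea of an explicit lock applied by an application program concrete, here is a minimal Python sketch that reuses the shared bank account from the earlier example. It is only an illustration under stated assumptions: the names `Account`, `withdraw` and `run_withdrawals` are hypothetical, the starting balance of $200 is the one implied by the example, and `threading.Lock` is an in-process mutex standing in for a DBMS lock, not a description of how any DBMS implements locking.

```python
import threading


class Account:
    """A shared balance guarded by one explicit, exclusive lock."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.Lock()  # the explicit lock on the data item

    def withdraw(self, amount):
        # Acquire the lock; a second thread arriving here must wait.
        with self.lock:
            current = self.balance   # read the balance
            current -= amount        # deduct the amount
            self.balance = current   # write the new balance
        # Leaving the with-block releases the lock.


def run_withdrawals(start=200, amount=100):
    """Run users A and B concurrently and return the final balance."""
    account = Account(start)
    users = [threading.Thread(target=account.withdraw, args=(amount,))
             for _ in ("A", "B")]
    for t in users:
        t.start()
    for t in users:
        t.join()
    return account.balance


if __name__ == "__main__":
    print(run_withdrawals())  # prints 0
```

Because each read, deduct and write sequence runs entirely while the lock is held, the second user always sees the first user's written balance, so both deductions survive.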
• Locks may be applied to:
    1. a single data item (value)
    2. an entire row of a table
    3. a page (memory segment) (many rows worth)
    4. an entire table
    5. an entire database
  This is referred to as the Lock granularity.
• Locks may be of two types depending on the requirements of the transaction:
    1. An Exclusive Lock prevents any other transaction from reading or modifying the locked item.
    2. A Shared Lock allows another transaction to read an item but prevents another transaction from writing the item.

Two Phase Locking

• The most commonly implemented locking mechanism is called Two Phased Locking or 2PL. 2PL is a concurrency control mechanism that ensures serializability.
• 2PL has two phases: Growing and shrinking.
    1. A transaction acquires locks on data items it will need to complete the transaction. This is called the growing phase.
    2. Once one lock is released, no other lock may be acquired. This is called the shrinking phase.
• Consider our prior example, this time using locks:
    • User A places an exclusive lock on the balance
    • User A reads the balance
    • User A deducts $100 from the balance
    • User B attempts to place a lock on the balance but fails because A already has an exclusive lock
    • User B is placed into a wait state
    • User A writes the new balance of $100
    • User A releases the exclusive lock on the balance
    • User B places an exclusive lock on the balance
    • User B reads the balance
    • User B deducts $100 from the balance
    • User B writes the new balance of $0
• Here is a more involved example:
    • User A places a shared lock on item raise_rate
    • User A reads raise_rate
    • User A places an exclusive lock on item Amy_salary
    • User A reads Amy_salary
    • User B places a shared lock on item raise_rate
    • User B reads raise_rate
    • User A calculates a new salary as Amy_salary * (1+raise_rate)
    • User B places an exclusive lock on item Bill_salary
    • User B reads Bill_salary
    • User B calculates a new salary as Bill_salary * (1+raise_rate)
    • User B writes Bill_salary
    • User A writes Amy_salary
    • User A releases exclusive lock on Amy_salary
    • User B releases exclusive lock on Bill_salary
    • User B releases shared lock on raise_rate
    • User A releases shared lock on raise_rate
• Here is another example:
    • User A places a shared lock on raise_rate
    • User B attempts to place an exclusive lock on raise_rate
    • User B is placed into a wait state
    • User A places an exclusive lock on item Amy_salary
    • User A reads raise_rate
    • User A releases shared lock on raise_rate
    • User B places an exclusive lock on raise_rate
    • User A reads Amy_salary
    • User B reads raise_rate
    • User A calculates a new salary as Amy_salary * (1+raise_rate)
    • User B writes a new raise_rate
    • User B releases exclusive lock on raise_rate
    • User A writes Amy_salary
    • User A releases exclusive lock on Amy_salary

Deadlock

• Locking can cause problems, however. Consider:
    • User A places an exclusive lock on item 1001
    • User B places an exclusive lock on item 2002
    • User A attempts to place an exclusive lock on item 2002
    • User A is placed into a wait state
    • User B attempts to place an exclusive lock on item 1001
    • User B is placed into a wait state
• This is called a deadlock. One transaction has locked some of the resources and is waiting for locks so it can complete. A second transaction has locked those needed items but is awaiting the release of locks the first transaction is holding so it can continue.
• Two main ways to deal with deadlock:
    1. Prevent it in the first place by giving each transaction exclusive rights to acquire all locks needed before proceeding.
    2. Allow the deadlock to occur, then break it by aborting one of the transactions.

Database Recovery and Backup

• There are many situations in which a transaction may not reach a commit or abort point:
    1. An operating system crash can terminate the DBMS processes.
    2. The DBMS can crash.
    3. The system might lose power.
    4. A disk may fail or other hardware may fail.
    5. Human error can result in deletion of critical data.
• In any of these situations, data in the database may become inconsistent or lost.
• Database Recovery is the process of restoring the database and the data to a consistent state. This may include restoring lost data up to the point of the event (e.g., system crash).
• Two approaches are discussed here: Reprocessing and Rollback/Rollforward.

Reprocessing

• In a Reprocessing approach, the database is periodically backed up (a database save) and all transactions applied since the last save are recorded.
• If the system crashes, the latest database save is restored and all of the transactions are reapplied (by users) to bring the database back up to the point just before the crash.
• Several shortcomings:
    1. Time required to re-apply transactions
    2. Transactions might have other (physical) consequences
    3. Re-applying concurrent transactions is not straightforward.

Automated Recovery with Rollback / Rollforward

• We apply a similar technique: Make periodic saves of the database (a time consuming operation). However, maintain a more intelligent log of the transactions that have been applied. This transaction log includes before images and after images.
    Before Image: A copy of the table record (or page) of data before it was changed by the transaction.
    After Image: A copy of the table record (or page) of data after it was changed by the transaction.
• Rollback: Undo any partially completed transactions (ones in progress when the crash occurred) by applying the before images to the database.
• Rollforward: Redo the transactions by applying the after images to the database. This is done for transactions that were committed before the crash.
• The recovery process uses both rollback and rollforward to restore the database.
• In the worst case, we would need to rollback to the last database save and then rollforward to the point just before the crash.
• Checkpoints can also be taken (less time consuming) in between database saves. The DBMS flushes all pending transactions and writes all data to disk and to the transaction log.
• The database can be recovered from the last checkpoint in much less time.

Database Backup

• When secondary media (disk) fails, data may become unreadable. We typically rely on backing up the database to cheaper magnetic tape or another backup medium for a copy that can be restored.
• However, when a DBMS is running, it is not possible to back up its files, as the resulting backup copy on tape may be inconsistent.
• One solution: Shut down the DBMS (and thus all applications), do a full backup (copy everything onto tape), then start up again.
• This may be infeasible to do often.
• Most modern DBMS allow for incremental backups.
• An Incremental backup will back up only those data changed or added since the last full backup. Sometimes called a delta backup.
• A schedule follows something like:
    1. Weekend: Do a shutdown of the DBMS, and a full backup of the database onto a fresh tape(s).
    2. Nightly: Do an incremental backup onto different tapes for each night of the week.
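Returning to the rollback and rollforward steps described under Automated Recovery, the following Python sketch shows how before images and after images from a transaction log might be applied. It is a simplified illustration, not a real DBMS recovery routine: the `LogRecord` layout, the function names, the transaction names T1 and T2, and the in-memory dictionaries standing in for the database are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class LogRecord:
    txn: str     # transaction that made the change
    item: str    # data item that was changed
    before: int  # before image: value prior to the change
    after: int   # after image: value following the change


def rollback(db, log, committed):
    """Undo in-progress transactions by applying before images, newest first."""
    restored = dict(db)
    for rec in reversed(log):
        if rec.txn not in committed:
            restored[rec.item] = rec.before
    return restored


def rollforward(saved_db, log, committed):
    """Redo committed transactions on a saved copy by applying after images."""
    restored = dict(saved_db)
    for rec in log:
        if rec.txn in committed:
            restored[rec.item] = rec.after
    return restored


log = [
    LogRecord("T1", "balance", before=200, after=100),  # committed
    LogRecord("T2", "balance", before=100, after=0),    # in progress at crash
]
committed = {"T1"}

last_save = {"balance": 200}  # database save taken before T1 ran
at_crash = {"balance": 0}     # state of the database when the crash occurred

print(rollback(at_crash, log, committed))      # {'balance': 100}
print(rollforward(last_save, log, committed))  # {'balance': 100}
```

Both routes arrive at the same consistent state: rolling back from the crashed database removes the partial work of T2, while rolling forward from the last save reapplies the committed work of T1.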