
1. The major purpose of a database system is to provide users with an abstract view of the system. The system hides certain details of how data is stored, created and maintained. Complexity should be hidden from database users.
2. There are several levels of abstraction:
   1. Physical Level:
      o How the data are stored.
      o E.g. index, B-tree, hashing.
      o Lowest level of abstraction.
      o Complex low-level structures described in detail.
   2. Conceptual Level:
      o Next highest level of abstraction.
      o Describes what data are stored.
      o Describes the relationships among data.
      o Database administrator level.
   3. View Level:
      o Highest level.
      o Describes part of the database for a particular group of users.
      o Can be many different views of a database.
      o E.g. tellers in a bank get a view of customer accounts, but not of payroll data.

Fig. 1.1 (figure 1.1 in the text) illustrates the three levels.

Figure 1.1: The three levels of data abstraction

The E-R Model

1. The entity-relationship model is based on a perception of the world as consisting of a collection of basic objects (entities) and relationships among these objects.
   o An entity is a distinguishable object that exists.
   o Each entity has associated with it a set of attributes describing it.
   o E.g. number and balance for an account entity.
   o A relationship is an association among several entities.
   o E.g. a cust_acct relationship associates a customer with each account he or she has.
   o The set of all entities or relationships of the same type is called the entity set or relationship set.
   o Another essential element of the E-R diagram is the mapping cardinalities, which express the number of entities to which another entity can be associated via a relationship set.

   We'll see later how well this model works to describe real world situations.
2. The overall logical structure of a database can be expressed graphically by an E-R diagram:
   o rectangles: represent entity sets.
   o ellipses: represent attributes.
   o diamonds: represent relationships among entity sets.
   o lines: link attributes to entity sets and entity sets to relationships.
   See figure 1.2 for an example.

Figure 1.2: A sample E-R diagram.

The Object-Oriented Model

1. The object-oriented model is based on a collection of objects, like the E-R model.
   o An object contains values stored in instance variables within the object.
   o Unlike the record-oriented models, these values are themselves objects.
   o Thus objects contain objects to an arbitrarily deep level of nesting.
   o An object also contains bodies of code that operate on the object.
   o These bodies of code are called methods.
   o Objects that contain the same types of values and the same methods are grouped into classes.
   o A class may be viewed as a type definition for objects.
   o Analogy: the programming language concept of an abstract data type.
   o The only way in which one object can access the data of another object is by invoking a method of that other object.
   o This is called sending a message to the object.
   o Internal parts of the object, the instance variables and method code, are not visible externally.
   o Result is two levels of data abstraction.

   For example, consider an object representing a bank account.
   o The object contains instance variables number and balance.
   o The object contains a method pay-interest which adds interest to the balance.
   o Under most data models, changing the interest rate entails changing code in application programs.
   o In the object-oriented model, this only entails a change within the pay-interest method.
2. Unlike entities in the E-R model, each object has its own unique identity, independent of the values it contains:
   o Two objects containing the same values are distinct.
   o Distinction is created and maintained at the physical level by assigning distinct object identifiers.
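The bank account example can be sketched in Python. This is only an illustration of the idea, not code from the text: the class name Account, the 5 percent rate and the integer balances are assumptions made for the sketch.

```python
class Account:
    """Hypothetical bank-account object with hidden instance variables."""

    INTEREST_PERCENT = 5  # assumed rate, for illustration only

    def __init__(self, number, balance):
        # Instance variables; the leading underscore marks them as internal.
        self._number = number
        self._balance = balance

    def pay_interest(self):
        # The only place that knows how interest is computed.
        # Changing the rate means changing this class, not the callers.
        self._balance += self._balance * self.INTEREST_PERCENT // 100

    def balance(self):
        # Other objects read the balance by sending this "message".
        return self._balance


a = Account(101, 200)
b = Account(101, 200)   # same values as a, but a distinct object

a.pay_interest()
print(a.balance())      # balance of a after interest
print(b.balance())      # b is unaffected
print(a is b)           # identity is independent of the stored values
```

Callers never touch the balance directly, so a change to the interest rule stays inside pay_interest, and the two accounts remain distinct objects even when they hold the same values.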

Data Independence

1. The ability to modify a scheme definition in one level without affecting a scheme definition in a higher level is called data independence.
2. There are two kinds:
   o Physical data independence
     - The ability to modify the physical scheme without causing application programs to be rewritten.
     - Modifications at this level are usually to improve performance.
   o Logical data independence
     - The ability to modify the conceptual scheme without causing application programs to be rewritten.
     - Usually done when the logical structure of the database is altered.
3. Logical data independence is harder to achieve as the application programs are usually heavily dependent on the logical structure of the data. An analogy is made to abstract data types in programming languages.

**Data Definition Language (DDL)**

1. Used to specify a database scheme as a set of definitions expressed in a DDL.
2. DDL statements are compiled, resulting in a set of tables stored in a special file called a data dictionary or data directory.
3. The data directory contains metadata (data about data).
4. The storage structure and access methods used by the database system are specified by a set of definitions in a special type of DDL called a data storage and definition language.
5. Basic idea: hide implementation details of the database schemes from the users.

**Data Manipulation Language (DML)**

1. Data Manipulation is:
   o retrieval of information from the database
   o insertion of new information into the database
   o deletion of information in the database
   o modification of information in the database
2. A DML is a language which enables users to access and manipulate data. The goal is to provide efficient human interaction with the system.
3. There are two types of DML:
   o procedural: the user specifies what data is needed and how to get it
   o nonprocedural: the user only specifies what data is needed
     - Easier for the user.
     - May not generate code as efficient as that produced by procedural languages.
4. A query language is a portion of a DML involving information retrieval only. The terms DML and query language are often used synonymously.

**Entities and Entity Sets**

An entity is an object that exists and is distinguishable from other objects. For instance, John Harris with S.I.N. 890-123-456 is an entity, as he can be uniquely identified as one particular person in the universe.

An entity may be concrete (a person or a book, for example) or abstract (like a holiday or a concept).

An entity set is a set of entities of the same type (e.g., all persons having an account at a bank).

Entity sets need not be disjoint. For example, the entity set employee (all employees of a bank) and the entity set customer (all customers of the bank) may have members in common.

An entity is represented by a set of attributes.
   o E.g. name, S.I.N., street, city for the "customer" entity.
   o The domain of an attribute is the set of permitted values (e.g. a telephone number must consist of seven digits).

Formally, an attribute is a function which maps an entity set into a domain.
   o Every entity is described by a set of (attribute, data value) pairs.
   o There is one pair for each attribute of the entity set.
   o E.g. a particular customer entity is described by the set {(name, Harris), (S.I.N., 890-123-456), (street, North), (city, Georgetown)}.

**An analogy can be made with the programming language notion of type definition.**

The concept of an entity set corresponds to the programming language type definition. A variable of a given type has a particular value at a point in time. Thus, a programming language variable corresponds to an entity in the E-R model.
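The analogy can be made concrete with a small Python sketch. The class name Customer and its field names are assumptions chosen to mirror the customer attributes listed earlier; the sketch is illustrative, not part of the text.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class Customer:
    """Type definition: plays the role of the entity set's description."""
    name: str
    sin: str
    street: str
    city: str


# A variable of that type holds one particular value: one entity.
harris = Customer("Harris", "890-123-456", "North", "Georgetown")

# The entity set is the collection of all such values at a point in time.
customer_set = {harris}

# Each entity is described by (attribute, data value) pairs.
pairs = set(asdict(harris).items())
print(pairs)
```

The dataclass is frozen so that its instances are hashable and can be collected in a Python set, which matches the notion of an entity set as a set of entities.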

**Figure 2-1 shows two entity sets. We will be dealing with five entity sets in this section:**

   o branch, the set of all branches of a particular bank. Each branch is described by the attributes branch-name, branch-city and assets.
   o customer, the set of all people having an account at the bank. Attributes are customer-name, S.I.N., street and customer-city.
   o employee, with attributes employee-name and phone-number.
   o account, the set of all accounts created and maintained in the bank. Attributes are account-number and balance.
   o transaction, the set of all account transactions executed in the bank. Attributes are transaction-number, date and amount.

**Relationships & Relationship Sets**

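A relationship is an association between several entities. A relationship set is a set of relationships of the same type. Formally it is a mathematical relation on $n \geq 2$ (possibly non-distinct) sets.

If $E_1, E_2, \ldots, E_n$ are entity sets, then a relationship set $R$ is a subset of

$$\{(e_1, e_2, \ldots, e_n) \mid e_1 \in E_1, e_2 \in E_2, \ldots, e_n \in E_n\}$$

where $(e_1, e_2, \ldots, e_n)$ is a relationship.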

For example, consider the two entity sets customer and account (Fig. 2.1 in the text). We define the relationship CustAcct to denote the association between customers and their accounts. This is a binary relationship set (see Figure 2.2 in the text).

Going back to our formal definition, the relationship set CustAcct is a subset of all the possible customer and account pairings. Occasionally there are relationships involving more than two entity sets.

The role of an entity is the function it plays in a relationship. For example, the relationship works-for could be ordered pairs of employee entities. The first employee takes the role of manager, and the second one will take the role of worker.

A relationship may also have descriptive attributes. For example, date (last date of account access) could be an attribute of the CustAcct relationship set.
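The formal definition can be illustrated with a few lines of Python. The customer names, account numbers and dates below are made-up sample data, not values from the text.

```python
from itertools import product

# Hypothetical entity sets, identified here by simple strings.
customers = {"Harris", "Smith"}
accounts = {"A-101", "A-215"}

# All possible (customer, account) pairings: the Cartesian product.
all_pairings = set(product(customers, accounts))

# The relationship set CustAcct is some subset of those pairings.
cust_acct = {("Harris", "A-101"), ("Smith", "A-101"), ("Smith", "A-215")}
print(cust_acct <= all_pairings)

# A descriptive attribute (date of last access) belongs to the pair,
# not to the customer or the account alone.
last_access = {
    ("Harris", "A-101"): "1994-05-24",
    ("Smith", "A-101"): "1994-05-30",
    ("Smith", "A-215"): "1994-06-02",
}
print(last_access[("Smith", "A-101")])
```

Here account A-101 is shared by two customers, so the date of last access differs per (customer, account) pair, which is why it is stored against the relationship.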

Attributes

It is possible to define a set of entities and the relationships among them in a number of different ways. The main difference is in how we deal with attributes.
   o Consider the entity set employee with attributes employee-name and phone-number.
   o We could argue that the phone be treated as an entity itself, with attributes phone-number and location.
   o Then we have two entity sets, and the relationship set EmpPhn defining the association between employees and their phones.
   o This new definition allows employees to have several (or zero) phones.
   o New definition may more accurately reflect the real world.
   o We cannot extend this argument easily to making employee-name an entity.

The question of what constitutes an entity and what constitutes an attribute depends mainly on the structure of the real world situation being modeled, and the semantics associated with the attribute in question.

Mapping Constraints

An E-R scheme may define certain constraints to which the contents of a database must conform.

Mapping Cardinalities: express the number of entities to which another entity can be associated via a relationship. For binary relationship sets between entity sets A and B, the mapping cardinality must be one of:
1. One-to-one: An entity in A is associated with at most one entity in B, and an entity in B is associated with at most one entity in A. (Figure 2.3)
2. One-to-many: An entity in A is associated with any number in B. An entity in B is associated with at most one entity in A. (Figure 2.4)
3. Many-to-one: An entity in A is associated with at most one entity in B. An entity in B is associated with any number in A. (Figure 2.5)
4. Many-to-many: Entities in A and B are associated with any number from each other. (Figure 2.6)

The appropriate mapping cardinality for a particular relationship set depends on the real world being modeled. (Think about the CustAcct relationship.)

Existence Dependencies: if the existence of entity X depends on the existence of entity Y, then X is said to be existence dependent on Y. (Or we say that Y is the dominant entity and X is the subordinate entity.) For example,
   o Consider account and transaction entity sets, and a relationship log between them.
   o This is one-to-many from account to transaction.
   o If an account entity is deleted, its associated transaction entities must also be deleted.
   o Thus account is dominant and transaction is subordinate.
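The four mapping cardinalities can be checked against a concrete set of pairs. The sketch below is an assumption-laden illustration: the function name mapping_cardinality and the sample data are hypothetical, and the function only reports what one particular instance exhibits, whereas a mapping cardinality is a constraint on every legal instance.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship instance given as (a, b) pairs."""
    pairs = set(pairs)
    per_a = Counter(a for a, _ in pairs)  # B entities linked to each A entity
    per_b = Counter(b for _, b in pairs)  # A entities linked to each B entity
    a_has_many = any(n > 1 for n in per_a.values())
    b_has_many = any(n > 1 for n in per_b.values())
    if a_has_many and b_has_many:
        return "many-to-many"
    if a_has_many:
        return "one-to-many"   # each B entity has at most one A entity
    if b_has_many:
        return "many-to-one"   # each A entity has at most one B entity
    return "one-to-one"


# The log relationship: each transaction belongs to exactly one account.
log = {("A-101", "t1"), ("A-101", "t2"), ("A-215", "t3")}
print(mapping_cardinality(log))
```

With account in the role of A and transaction in the role of B, the sample log instance is one-to-many, matching the account and transaction example above.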

Keys

Differences between entities must be expressed in terms of attributes.
   o A superkey is a set of one or more attributes which, taken collectively, allow us to identify uniquely an entity in the entity set.
   o For example, in the entity set customer, {customer-name, S.I.N.} is a superkey.
   o Note that customer-name alone is not, as two customers could have the same name.
   o A superkey may contain extraneous attributes, and we are often interested in the smallest superkey. A superkey for which no proper subset is a superkey is called a candidate key.
   o In the example above, S.I.N. is a candidate key, as it is minimal, and uniquely identifies a customer entity.
   o A primary key is a candidate key (there may be more than one) chosen by the DB designer to identify entities in an entity set.

An entity set that does not possess sufficient attributes to form a primary key is called a weak entity set. One that does have a primary key is called a strong entity set. For example,
   o The entity set transaction has attributes transaction-number, date and amount.
   o Different transactions on different accounts could share the same number.
   o These are not sufficient to form a primary key (uniquely identify a transaction).
   o Thus transaction is a weak entity set.

For a weak entity set to be meaningful, it must be part of a one-to-many relationship set. This relationship set should have no descriptive attributes. (Why?)

The idea of strong and weak entity sets is related to the existence dependencies seen earlier.
   o Member of a strong entity set is a dominant entity.
   o Member of a weak entity set is a subordinate entity.

A weak entity set does not have a primary key, but we need a means of distinguishing among the entities. The discriminator of a weak entity set is a set of attributes that allows this distinction to be made.

The primary key of a weak entity set is formed by taking the primary key of the strong entity set on which its existence depends (see Mapping Constraints) plus its discriminator.

To illustrate:
   o transaction is a weak entity. It is existence-dependent on account.
   o The primary key of account is account-number.
   o transaction-number distinguishes transaction entities within the same account (and is thus the discriminator).

   o So the primary key for transaction would be (account-number, transaction-number).

Just Remember: The primary key of a weak entity is found by taking the primary key of the strong entity on which it is existence-dependent, plus the discriminator of the weak entity set.

Primary Keys for Relationship Sets

The attributes of a relationship set are the attributes that comprise the primary keys of the entity sets involved in the relationship set. For example:
   o S.I.N. is the primary key of customer, and
   o account-number is the primary key of account.
   o The attributes of the relationship set custacct are then (account-number, S.I.N.).
   o This is enough information to enable us to relate an account to a person.

If the relationship has descriptive attributes, those are also included in its attribute set. For example, we might add the attribute date to the above relationship set, signifying the date of last access to an account by a particular customer.

Note that this attribute cannot instead be placed in either entity set as it relates to both a customer and an account, and the relationship is many-to-many.

The primary key of a relationship set depends on the mapping cardinality and the presence of descriptive attributes.

With no descriptive attributes:
   o many-to-many: all attributes in the relationship set.
   o one-to-many: primary key for the "many" entity.

Descriptive attributes may be added, depending on the mapping cardinality and the semantics involved (see text).

The Entity Relationship Diagram

We can express the overall logical structure of a database graphically with an E-R diagram. Its components are:
   o rectangles representing entity sets.
   o ellipses representing attributes.
   o diamonds representing relationship sets.
   o lines linking attributes to entity sets and entity sets to relationship sets.

In the text, lines may be directed (have an arrow on the end) to signify mapping cardinalities for relationship sets. Figures 2.8 to 2.10 show some examples.

Figure 2.7: An E-R diagram

Figure 2.8: One-to-many from customer to account

Figure 2.9: Many-to-one from customer to account

Figure 2.10: One-to-one from customer to account

Go back and review mapping cardinalities. They express the number of entities to which an entity can be associated via a relationship. The arrow positioning is simple once you get it straight in your mind, so do some examples. Think of the arrow head as pointing to the entity that "one" refers to.

Other Styles of E-R Diagram

The text uses one particular style of diagram. Many variations exist.

Some of the variations you will see are:
   o Diamonds being omitted: a link between entities indicates a relationship.
     - Fewer symbols, clearer picture.
     - What happens with descriptive attributes?
     - In this case, we have to create an intersection entity to possess the attributes.
   o Numbers instead of arrowheads indicating cardinality.
     - Symbols 1, n and m used.
     - E.g. 1 to 1, 1 to n, n to m.
     - Easier to understand than arrowheads.
   o A range of numbers indicating optionality of relationship. (See Elmasri & Navathe, p 58.)
     - E.g. (0,1) indicates minimum zero (optional), maximum 1.
     - Can also use (0,n), (1,1) or (1,n).
     - Typically used on near end of link: confusing at first, but gives more information.
     - E.g. entity 1 (0,1) -- (1,n) entity 2 indicates that entity 1 is related to between 0 and 1 occurrences of entity 2 (optional).
     - Entity 2 is related to at least 1 and possibly many occurrences of entity 1 (mandatory).
   o Multivalued attributes may be indicated in some manner.
     - Means attribute can have more than one value.
     - E.g. hobbies.
     - Has to be normalized later on.
   o Extended E-R diagrams allowing more details/constraints in the real world to be recorded. (See Elmasri & Navathe, chapter 21.)
     - Composite attributes.
     - Derived attributes.
     - Subclasses and superclasses.
     - Generalization and specialization.

Roles in E-R Diagrams

The function that an entity plays in a relationship is called its role. Roles are normally implicit and not specified. They are useful when the meaning of a relationship set needs clarification.

For example, the entity sets of a relationship may not be distinct. The relationship works-for might be ordered pairs of employees (first is manager, second is worker).

In the E-R diagram, this can be shown by labelling the lines connecting entities (rectangles) to relationships (diamonds). (See figure 2.11.)

Figure 2.11: E-R diagram with role indicators

Weak Entity Sets in E-R Diagrams

A weak entity set is indicated by a doubly-outlined box. For example, the previously-mentioned weak entity set transaction is dependent on the strong entity set account via the relationship set log. Figure 2.12 shows this example.

Figure 2.12: E-R diagram with a weak entity set

Nonbinary Relationships

Non-binary relationships can easily be represented. Figure 2.13 shows an example.

Figure 2.13: E-R diagram with a ternary relationship

This E-R diagram says that a customer may have several accounts, each located in a specific bank branch, and that an account may belong to several different customers.

Reducing E-R Diagrams to Tables

A database conforming to an E-R diagram can be represented by a collection of tables. We'll use the E-R diagram of Figure 2.14 as our example.

Figure 2.14: E-R diagram with strong and weak entity sets

For each entity set and relationship set, there is a unique table which is assigned the name of the corresponding set. Each table has a number of columns with unique names. (E.g. Figs. 2.14 - 2.18 in the text.)

Representation of Strong Entity Sets

We use a table with one column for each attribute of the set. Each row in the table corresponds to one entity of the entity set. For the entity set account, see the table of figure 2.14.

We can add, delete and modify rows (to reflect changes in the real world).

A row of a table will consist of an n-tuple where n is the number of attributes.

Actually, the table contains a subset of the set of all possible rows. We refer to the set of all possible rows as the cartesian product of the sets of all attribute values. We may denote this as $D_1 \times D_2$ for the account table, where $D_1$ and $D_2$ denote the set of all account numbers and all account balances, respectively.

In general, for a table of n columns, we may denote the cartesian product of $D_1, D_2, \ldots, D_n$ by $D_1 \times D_2 \times \cdots \times D_n$.

Representation of Weak Entity Sets

For a weak entity set, we add columns to the table corresponding to the primary key of the strong entity set on which the weak set is dependent.

For example, the weak entity set transaction has three attributes: transaction-number, date and amount. The primary key of account (on which transaction depends) is account-number. This gives us the table of figure 2.16.
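The rule for weak entity sets can be sketched as follows. The helper weak_entity_columns and all row values are hypothetical; only the column names come from the notes.

```python
def weak_entity_columns(strong_primary_key, weak_attributes):
    """Columns of a weak entity set's table: owner's primary key first."""
    return tuple(strong_primary_key) + tuple(weak_attributes)


# Strong entity set: one column per attribute.
account_columns = ("account-number", "balance")

# Weak entity set: add the primary key of account to its own attributes.
transaction_columns = weak_entity_columns(
    ("account-number",), ("transaction-number", "date", "amount")
)
print(transaction_columns)

# Made-up rows. Transaction number 1 occurs under two different accounts.
transaction = {
    ("A-101", 1, "1994-05-24", 50),
    ("A-101", 2, "1994-05-25", -20),
    ("A-215", 1, "1994-05-24", 100),
}

# transaction-number alone does not identify a row ...
numbers_only = [row[1] for row in transaction]
print(len(set(numbers_only)) < len(numbers_only))

# ... but (account-number, transaction-number) does.
keys = [(row[0], row[1]) for row in transaction]
print(len(set(keys)) == len(keys))
```

The last two checks show why the discriminator needs the owning account's primary key alongside it before the pair can serve as the table's primary key.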

Representation of Relationship Sets

Let $R$ be a relationship set involving entity sets $E_1, E_2, \ldots, E_m$. The table corresponding to the relationship set $R$ has the following attributes:

$$\bigcup_{i=1}^{m} \text{primary-key}(E_i)$$

If the relationship has k descriptive attributes $a_1, \ldots, a_k$, we add them too:

$$\bigcup_{i=1}^{m} \text{primary-key}(E_i) \cup \{a_1, \ldots, a_k\}$$

An example: The relationship set CustAcct involves the entity sets customer and account. Their respective primary keys are S.I.N. and account-number. CustAcct also has a descriptive attribute, date. This gives us the table of figure 2.17.

Non-binary Relationship Sets

The ternary relationship of Figure 2.13 gives us the table of figure 2.18. As required, we take the primary keys of each entity set. There are no descriptive attributes in this example.

Linking a Weak to a Strong Entity

These relationship sets are many-to-one, and have no descriptive attributes. The primary key of the weak entity set is the primary key of the strong entity set it is existence-dependent on, plus its discriminator. The table for the relationship set would have the same attributes, and is thus redundant.

Generalization

Consider extending the entity set account by classifying accounts as being either savings-account or chequing-account.

Each of these is described by the attributes of account plus additional attributes. (savings has interest-rate and chequing has overdraft-amount.)

We can express the similarities between the entity sets by generalization. This is the process of forming containment relationships between a higher-level entity set and one or more lower-level entity sets.

In E-R diagrams, generalization is shown by a triangle, as shown in Figure 2.19.

Figure 2.19: Generalization

Generalization hides differences and emphasizes similarities. Distinction made through attribute inheritance. Attributes of higher-level entity are inherited by lower-level entities.

Two methods for conversion to a table form:
   o Create a table for the high-level entity, plus tables for the lower-level entities containing also their specific attributes.
   o Create only tables for the lower-level entities.

Aggregation

The E-R model cannot express relationships among relationships. When would we need such a thing?

Consider a DB with information about employees who work on a particular project and use a number of machines doing that work. We get the E-R diagram shown in Figure 2.20.

Figure 2.20: E-R diagram with redundant relationships

Relationship sets work and uses could be combined into a single set. However, they shouldn't be, as this would obscure the logical structure of this scheme.

The solution is to use aggregation.

Aggregation is an abstraction through which relationships are treated as higher-level entities. For our example, we treat the relationship set work and the entity sets employee and project as a higher-level entity set called work. Figure 2.21 shows the E-R diagram with aggregation.

Figure 2.21: E-R diagram with aggregation

Transforming an E-R diagram with aggregation into tabular form is easy. We create a table for each entity and relationship set as before. The table for relationship set uses contains a column for each attribute in the primary key of machinery and work.

Design of an E-R Database Scheme

The E-R data model provides a wide range of choice in designing a database scheme to accurately model some real-world situation. Some of the decisions to be made are:
   o Using a ternary relationship versus two binary relationships.
   o Whether an entity set or a relationship set best fits a real-world concept.
   o Whether to use an attribute or an entity set.
   o Use of a strong or weak entity set.
   o Appropriateness of generalization.
   o Appropriateness of aggregation.

Mapping Cardinalities

The ternary relationship of Figure 2.13 could be replaced by a pair of binary relationships, as shown in Figure 2.22.

Figure 2.22: Representation of Figure 2.13 using binary relationships

However, there is a distinction between the two representations:
   o In Figure 2.13, relationship between a customer and account can be made only if there is a corresponding branch.
   o In Figure 2.22, an account can be related to either a customer or a branch alone.
   o The design of figure 2.13 is more appropriate, as in the banking world we expect to have an account relate to both a customer and a branch.

Use of Entity or Relationship Sets

It is not always clear whether an object is best represented by an entity set or a relationship set.
   o Both Figure 2.13 and Figure 2.22 show account as an entity.
   o Figure 2.23 shows how we might model an account as a relationship between a customer and a branch.

Figure 2.23: E-R diagram with account as a relationship set

   o This new representation cannot model adequately the situation where customers may have joint accounts. (Why not?)
   o If every account is held by only one customer, this method works.

Structure of Relational Database

1. A relational database consists of a collection of tables, each having a unique name. A row in a table represents a relationship among a set of values. Thus a table represents a collection of relationships.
2. There is a direct correspondence between the concept of a table and the mathematical concept of a relation. A substantial theory has been developed for relational databases.

Basic Structure

1. Figure 3.1 shows the deposit and customer tables for our banking example.

   Figure 3.1: The deposit and customer relations.

   o The deposit table has four attributes.
   o For each attribute there is a permitted set of values, called the domain of that attribute.
   o E.g. the domain of bname is the set of all branch names.

   Let $D_1$ denote the domain of bname, and $D_2$, $D_3$ and $D_4$ the remaining attributes' domains respectively. Then, any row of deposit consists of a four-tuple $(v_1, v_2, v_3, v_4)$ where

   $$v_1 \in D_1,\; v_2 \in D_2,\; v_3 \in D_3,\; v_4 \in D_4$$

   In general, deposit contains a subset of the set of all possible rows. That is, deposit is a subset of

   $$D_1 \times D_2 \times D_3 \times D_4$$

   In general, a table of n columns must be a subset of

   $$D_1 \times D_2 \times \cdots \times D_n$$

2. Mathematicians define a relation to be a subset of a Cartesian product of a list of domains. You can see the correspondence with our tables. We will use the terms relation and tuple in place of table and row from now on.

3. Some more formalities:
   o let the tuple variable $t$ refer to a tuple of the relation $r$.
   o We say $t \in r$ to denote that the tuple $t$ is in relation $r$.
   o Then $t[bname] = t[1]$ = the value of $t$ on the bname attribute.
   o So $t[bname] = t[1]$ = "Downtown",
   o and $t[cname] = t[3]$ = "Johnson".
4. We'll also require that the domains of all attributes be indivisible units.
   o A domain is atomic if its elements are indivisible units.
   o For example, the set of integers is an atomic domain.
   o The set of all sets of integers is not.
   o Why? Integers do not have subparts, but sets do: the integers comprising them.
   o We could consider integers non-atomic if we thought of them as ordered lists of digits.

Database Scheme

1. We distinguish between a database scheme (logical design) and a database instance (data in the database at a point in time).
2. A relation scheme is a list of attributes and their corresponding domains.
3. The text uses the following conventions:
   o italics for all names
   o lowercase names for relations and attributes
   o names beginning with an uppercase for relation schemes
   These notes will do the same. For example, the relation scheme for the deposit relation:
   o Deposit-scheme = (bname, account#, cname, balance)
   We may state that deposit is a relation on scheme Deposit-scheme by writing deposit(Deposit-scheme). If we wish to specify domains, we can write:
   o (bname: string, account#: integer, cname: string, balance: integer).
   Note that customers are identified by name. In the real world, this would not be allowed, as two or more customers might share the same name. Figure 3.2 shows the E-R diagram for a banking enterprise.

   Figure 3.2: E-R diagram for the banking enterprise

4. The relation schemes for the banking example used throughout the text are:
   o Branch-scheme = (bname, assets, bcity)
   o Customer-scheme = (cname, street, ccity)
   o Deposit-scheme = (bname, account#, cname, balance)
   o Borrow-scheme = (bname, loan#, cname, amount)
   Note: some attributes appear in several relation schemes (e.g. bname, cname). This is legal, and provides a way of relating tuples of distinct relations.
5. Why not put all attributes in one relation? Suppose we use one large relation instead of customer and deposit:
   o Account-scheme = (bname, account#, cname, balance, street, ccity)
   o If a customer has several accounts, we must duplicate her or his address for each account.
   o If a customer has an account but no current address, we cannot build a tuple, as we have no values for the address.
   o We would have to use null values for these fields.
   o Null values cause difficulties in the database.
   o By using two separate relations, we can do this without using null values.

Keys

1. The notions of superkey, candidate key and primary key all apply to the relational model.
2. For example, in Branch-scheme,
   o {bname} is a superkey.
   o {bname, bcity} is a superkey.
   o {bname, bcity} is not a candidate key, as the superkey {bname} is contained in it.
   o {bname} is a candidate key.
   o {bcity} is not a superkey, as branches may be in the same city.
   o We will use {bname} as our primary key.
3. The primary key for Customer-scheme is {cname}.
4. More formally, if we say that a subset $K$ of $R$ is a superkey for $R$, we are restricting consideration to relations $r(R)$ in which no two distinct tuples have the same values on all attributes in $K$. In other words, if $t_1$ and $t_2$ are in $r$, and $t_1 \neq t_2$, then $t_1[K] \neq t_2[K]$.
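The formal superkey condition can be tested against a concrete relation instance. The sketch below uses hypothetical helper names (is_superkey, is_candidate_key) and made-up branch rows. Note the limitation: a key is a constraint on every legal instance of a scheme, while this code can only confirm or refute it for the one instance it is given.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two distinct rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """A superkey none of whose proper subsets is also a superkey."""
    attrs = tuple(attrs)
    if not is_superkey(rows, attrs):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(len(attrs))
        for subset in combinations(attrs, size)
    )


# Made-up instance of Branch-scheme = (bname, assets, bcity).
branch = [
    {"bname": "Downtown", "assets": 9000000, "bcity": "Vancouver"},
    {"bname": "SFU", "assets": 2100000, "bcity": "Burnaby"},
    {"bname": "Lougheed", "assets": 1700000, "bcity": "Burnaby"},
]

print(is_superkey(branch, ["bname"]))                # superkey
print(is_superkey(branch, ["bname", "bcity"]))       # superkey, not minimal
print(is_candidate_key(branch, ["bname", "bcity"]))  # contains {bname}
print(is_candidate_key(branch, ["bname"]))           # minimal
print(is_superkey(branch, ["bcity"]))                # two Burnaby branches
```

The five calls correspond, in order, to the Branch-scheme statements listed in item 2 of the Keys section.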

Query Languages

1. A query language is a language in which a user requests information from a database. These are typically higher-level than programming languages. They may be one of:
   o Procedural, where the user instructs the system to perform a sequence of operations on the database. This will compute the desired information.
   o Nonprocedural, where the user specifies the information desired without giving a procedure for obtaining the information.
2. A complete query language also contains facilities to insert and delete tuples as well as to modify parts of existing tuples.

The Relational Algebra

1. The relational algebra is a procedural query language.
   o Six fundamental operations:
     - select (unary)
     - project (unary)
     - rename (unary)
     - cartesian product (binary)
     - union (binary)
     - set-difference (binary)
   o Several other operations, defined in terms of the fundamental operations:
     - set-intersection
     - natural join
     - division
     - assignment
   o Operations produce a new relation as a result.

Fundamental Operations

1. The Select Operation

   Select selects tuples that satisfy a given predicate. Select is denoted by a lowercase Greek sigma ($\sigma$), with the predicate appearing as a subscript. The argument relation is given in parentheses following the $\sigma$.

   For example, to select tuples (rows) of the borrow relation where the branch is "SFU", we would write

   $$\sigma_{bname = \text{"SFU"}}(borrow)$$

   Let Figure 3.3 be the borrow and branch relations in the banking example.

shown in Figure 3. Since a relation is a set. with the scheme we might write to find clients who have the same name as their banker. 2. (and). <. To get the names of customers having the same name as their bankers. The attributes to be copied appear as subscripts. Think of select as taking rows of a relation. . we write We can perform these operations on the relations resulting from other operations. . but ignoring amount and loan#. For example.Figure 3. > and (or) and in the selection predicate. client. duplicate rows are eliminated. We allow comparisons using =. For example: We also allow the logical connectives Figure 3. The new relation created as the result of this operation consists of one tuple: . The Project Operation Project copies its argument relation for the specified attributes only.4. .4: The client relation. Suppose there is one more relation. to obtain a relation showing customers and branches. and project as taking columns of a relation.3: The borrow and branch relations. Projection is denoted by the Greek capital letter pi ( ).

3. The Cartesian Product Operation
The cartesian product of two relations is denoted by a cross ($\times$), written $r \times s$. The result of $r \times s$ is a new relation with a tuple for each possible pairing of tuples from $r$ and $s$. In order to avoid ambiguity, the attribute names have attached to them the name of the relation from which they came. If no ambiguity will result, we drop the relation name.

If $r$ has $n_1$ tuples and $s$ has $n_2$ tuples, then $r \times s$ will have $n_1 n_2$ tuples, so the result is a very large relation. The resulting scheme is the concatenation of the schemes of $r$ and $s$, with relation names added as mentioned.

To find the clients of banker Johnson and the city in which they live, we need information in both client and customer relations. We can get this by writing
  $\sigma_{banker = \text{``Johnson''}}(client \times customer)$
However, the customer.cname column contains customers of bankers other than Johnson. (Why?) We want rows where client.cname = customer.cname. So we can write
  $\sigma_{client.cname = customer.cname}(\sigma_{banker = \text{``Johnson''}}(client \times customer))$
to get just these tuples. Finally, to get just the customer's name and city, we need a projection:
  $\Pi_{client.cname, ccity}(\sigma_{client.cname = customer.cname}(\sigma_{banker = \text{``Johnson''}}(client \times customer)))$

4. The Rename Operation
The rename operation solves the problems that occur with naming when performing the cartesian product of a relation with itself. Suppose we want to find the names of all the customers who live on the same street and in the same city as Smith. We can get the street and city of Smith by writing
  $\Pi_{street, ccity}(\sigma_{cname = \text{``Smith''}}(customer))$
To find other customers with the same information, we need to reference the customer relation again:
  $\sigma_{P}(customer \times \Pi_{street, ccity}(\sigma_{cname = \text{``Smith''}}(customer)))$

where $P$ is a selection predicate requiring street and ccity values to be equal.

Problem: how do we distinguish between the two street values appearing in the Cartesian product, as both come from a customer relation? Solution: use the rename operator, denoted by the Greek letter rho ($\rho$). We write
  $\rho_{x}(r)$
to get the relation $r$ under the name of $x$. If we use this to rename one of the two customer relations we are using, the ambiguities will disappear.

5. The Union Operation
The union operation is denoted $\cup$ as in set theory. It returns the union (set union) of two compatible relations. For a union operation $r \cup s$ to be legal, we require that
o $r$ and $s$ must have the same number of attributes.
o The domains of the corresponding attributes must be the same.
To find all customers of the SFU branch, we must find everyone who has a loan or an account or both at the branch. We need both borrow and deposit relations for this:
  $\Pi_{cname}(\sigma_{bname = \text{``SFU''}}(borrow)) \cup \Pi_{cname}(\sigma_{bname = \text{``SFU''}}(deposit))$
As in all set operations, duplicates are eliminated, giving the relation of Figure 3.5(a).

Figure 3.5: The union and set-difference operations.

6. The Set Difference Operation
Set difference is denoted by the minus sign ($-$). It finds tuples that are in one relation, but not in another. Thus $r - s$ results in a relation containing tuples that are in $r$ but not in $s$.
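A minimal sketch of union and set difference follows, assuming a relation is stored as a schema plus a frozenset of tuples. The `Relation` type, the helper names and the customer names are hypothetical; only the attribute-count half of the compatibility rule is checked, since this sketch does not model domains.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    schema: tuple    # attribute names, in order
    rows: frozenset  # one tuple of values per row


def check_compatible(r, s):
    # Only the "same number of attributes" condition is enforced here;
    # checking that corresponding domains match is left out of the sketch.
    if len(r.schema) != len(s.schema):
        raise ValueError("relations must have the same number of attributes")


def union(r, s):
    check_compatible(r, s)
    return Relation(r.schema, r.rows | s.rows)


def difference(r, s):
    check_compatible(r, s)
    return Relation(r.schema, r.rows - s.rows)


# Hypothetical results of projecting SFU borrowers and depositors onto cname.
sfu_borrowers = Relation(("cname",), frozenset({("Hayes",), ("Adams",)}))
sfu_depositors = Relation(("cname",), frozenset({("Adams",), ("Curry",)}))

all_sfu_customers = union(sfu_borrowers, sfu_depositors)
account_but_no_loan = difference(sfu_depositors, sfu_borrowers)
```

Because the rows are held in a set, Adams appears only once in the union even though he is both a borrower and a depositor.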

To find customers of the SFU branch who have an account there but no loan, we write
  $\Pi_{cname}(\sigma_{bname = \text{``SFU''}}(deposit)) - \Pi_{cname}(\sigma_{bname = \text{``SFU''}}(borrow))$
The result is shown in Figure 3.5(b).

We can do more with this operation. Suppose we want to find the largest account balance in the bank. Strategy:
o Find a relation $r$ containing the balances not the largest.
o Compute the set difference of the balances in the deposit relation and $r$.
To find $r$, we write
  $\Pi_{deposit.balance}(\sigma_{deposit.balance < d.balance}(deposit \times \rho_{d}(deposit)))$
This resulting relation contains all balances except the largest one. (See Figure 3.6(a)). Now we can finish our query by taking the set difference:
  $\Pi_{balance}(deposit) - \Pi_{deposit.balance}(\sigma_{deposit.balance < d.balance}(deposit \times \rho_{d}(deposit)))$
Figure 3.6(b) shows the result.

Figure 3.6: Find the largest account balance in the bank.

Formal Definition of Relational Algebra

1. A basic expression consists of either
o A relation in the database.
o A constant relation.
2. General expressions are formed out of smaller subexpressions using
o $\sigma_{p}(E_1)$ select (p a predicate)
o $\Pi_{s}(E_1)$ project (s a list of attributes)
o $\rho_{x}(E_1)$ rename (x a relation name)
o $E_1 \cup E_2$ union
o $E_1 - E_2$ set difference
o $E_1 \times E_2$ cartesian product

Additional Operations

1. Additional operations are defined in terms of the fundamental operations. They do not add power to the algebra, but are useful to simplify common queries.
2. The Set Intersection Operation

Set intersection is denoted by $\cap$, and returns a relation that contains tuples that are in both of its argument relations. It does not add any power as
  $r \cap s = r - (r - s)$
To find all customers having both a loan and an account at the SFU branch, we write
  $\Pi_{cname}(\sigma_{bname = \text{``SFU''}}(borrow)) \cap \Pi_{cname}(\sigma_{bname = \text{``SFU''}}(deposit))$

3. The Natural Join Operation
Often we want to simplify queries on a cartesian product. For example, to find all customers having a loan at the bank and the cities in which they live, we need borrow and customer relations:
  $\Pi_{borrow.cname, ccity}(\sigma_{borrow.cname = customer.cname}(borrow \times customer))$
Our selection predicate obtains only those tuples pertaining to only one cname. This type of operation is very common, so we have the natural join, denoted by a $\bowtie$ sign. Natural join combines a cartesian product and a selection into one operation. It performs a selection forcing equality on those attributes that appear in both relation schemes. Duplicates are removed as in all relation operations. To illustrate, we can rewrite the previous query as
  $\Pi_{cname, ccity}(borrow \bowtie customer)$
The resulting relation is shown in Figure 3.7.

Figure 3.7: Joining borrow and customer relations.

We can now make a more formal definition of natural join.
o Consider $R$ and $S$ to be sets of attributes.
o We denote attributes appearing in both relations by $R \cap S$.
o We denote attributes in either or both relations by $R \cup S$.
o Consider two relations $r(R)$ and $s(S)$.
o The natural join of $r$ and $s$, denoted by $r \bowtie s$, is a relation on scheme $R \cup S$.
o It is a projection onto $R \cup S$ of a selection on $r \times s$ where the predicate requires $r.A = s.A$ for each attribute $A$ in $R \cap S$.
Formally,
  $r \bowtie s = \Pi_{R \cup S}(\sigma_{r.A_1 = s.A_1 \wedge \cdots \wedge r.A_n = s.A_n}(r \times s))$

where $R \cap S = \{A_1, \ldots, A_n\}$.

To find the assets and names of all branches which have depositors living in Stamford, we need customer, deposit and branch relations:
  $\Pi_{bname, assets}(\sigma_{ccity = \text{``Stamford''}}(customer \bowtie deposit \bowtie branch))$
Note that $\bowtie$ is associative.

To find all customers who have both an account and a loan at the SFU branch:
  $\Pi_{cname}(\sigma_{bname = \text{``SFU''}}(borrow \bowtie deposit))$
This is equivalent to the set intersection version we wrote earlier. We see now that there can be several ways to write a query in the relational algebra.

If two relations $r(R)$ and $s(S)$ have no attributes in common, then $R \cap S = \emptyset$, and $r \bowtie s = r \times s$.

4. The Division Operation
Division, denoted $\div$, is suited to queries that include the phrase ``for all''. Suppose we want to find all the customers who have an account at all branches located in Brooklyn. Strategy: think of it as three steps.

We can obtain the names of all branches located in Brooklyn by
  $r_1 = \Pi_{bname}(\sigma_{bcity = \text{``Brooklyn''}}(branch))$
Figure 3.19 in the textbook shows the result.

We can also find all cname, bname pairs for which the customer has an account by
  $r_2 = \Pi_{cname, bname}(deposit)$
Figure 3.20 in the textbook shows the result.

Now we need to find all customers who appear in $r_2$ with every branch name in $r_1$. The divide operation provides exactly those customers:
  $\Pi_{cname, bname}(deposit) \div \Pi_{bname}(\sigma_{bcity = \text{``Brooklyn''}}(branch))$
which is simply $r_2 \div r_1$.
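The natural join and the three-step division strategy above can be sketched in Python. This is only an illustration under stated assumptions: relations are lists of dicts, `natural_join` and `divide` are hypothetical helper names, and the branch and deposit rows are invented sample data rather than the textbook figures.

```python
def natural_join(r, s):
    """r |x| s: pair up rows that agree on every attribute the two schemes share."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    result = []
    for tr in r:
        for ts in s:
            if all(tr[a] == ts[a] for a in common):
                merged = {**tr, **ts}
                if merged not in result:  # duplicates are removed
                    result.append(merged)
    return result


def divide(pairs, required):
    """r2 / r1 for r2 a set of (cname, bname) pairs and r1 a set of bname values."""
    candidates = {cname for cname, _ in pairs}
    return {c for c in candidates if all((c, b) in pairs for b in required)}


# Hypothetical sample data, restricted to the attributes the queries need.
branch = [
    {"bname": "Downtown", "bcity": "Brooklyn"},
    {"bname": "Brighton", "bcity": "Brooklyn"},
    {"bname": "SFU", "bcity": "Burnaby"},
]
deposit = [
    {"bname": "Downtown", "cname": "Johnson"},
    {"bname": "Brighton", "cname": "Johnson"},
    {"bname": "Downtown", "cname": "Smith"},
    {"bname": "SFU", "cname": "Smith"},
]

# Natural join on the shared attribute bname.
accounts_with_city = natural_join(deposit, branch)

# r1: names of all branches located in Brooklyn.
r1 = {b["bname"] for b in branch if b["bcity"] == "Brooklyn"}
# r2: all (cname, bname) pairs for which the customer has an account.
r2 = {(d["cname"], d["bname"]) for d in deposit}
# r2 / r1: customers who appear in r2 with every branch name in r1.
brooklyn_everywhere = divide(r2, r1)
```

In this data Smith has a Brooklyn account only at Downtown, so only Johnson survives the division.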

Formally,
o Let $r(R)$ and $s(S)$ be relations.
o Let $S \subseteq R$.
o The relation $r \div s$ is a relation on scheme $R - S$.
o A tuple $t$ is in $r \div s$ if for every tuple $t_s$ in $s$ there is a tuple $t_r$ in $r$ satisfying both of the following:
    $t_r[S] = t_s[S]$
    $t_r[R - S] = t[R - S]$
o These conditions say that the $R - S$ portion of a tuple $t$ is in $r \div s$ if and only if there are tuples with the $R - S$ portion and the $S$ portion in $r$ for every value of the $S$ portion in relation $s$.
We will look at this explanation in class more closely.

The division operation can be defined in terms of the fundamental operations.
  $r \div s = \Pi_{R-S}(r) - \Pi_{R-S}((\Pi_{R-S}(r) \times s) - r)$
Read the text for a more detailed explanation.

5. The Assignment Operation
Sometimes it is useful to be able to write a relational algebra expression in parts using a temporary relation variable (as we did with $r_1$ and $r_2$ in the division example). The assignment operation, denoted $\leftarrow$, works like assignment in a programming language. We could rewrite our division definition as
  $temp \leftarrow \Pi_{R-S}(r)$
  $temp - \Pi_{R-S}((temp \times s) - r)$
No extra relation is added to the database, but the relation variable created can be used in subsequent expressions. Assignment to a permanent relation would constitute a modification to the database.

The Tuple Relational Calculus

1. The tuple relational calculus is a nonprocedural language. (The relational algebra was procedural.) We must provide a formal description of the information desired.
2. A query in the tuple relational calculus is expressed as
  $\{t \mid P(t)\}$

i.e. the set of tuples $t$ for which predicate $P$ is true.
3. We also use the notation
o $t[a]$ to indicate the value of tuple $t$ on attribute $a$.
o $t \in r$ to show that tuple $t$ is in relation $r$.

Example Queries

1. For example, to find the branch-name, loan number, customer name and amount for loans over $1200:
  $\{t \mid t \in borrow \wedge t[amount] > 1200\}$
This gives us all attributes, but suppose we only want the customer names. (We would use project in the algebra.) We need to write an expression for a relation on scheme (cname).
  $\{t \mid \exists s \in borrow\,(t[cname] = s[cname] \wedge s[amount] > 1200)\}$
In English, we may read this equation as ``the set of all tuples $t$ such that there exists a tuple $s$ in the relation borrow for which the values of $t$ and $s$ for the cname attribute are equal, and the value of $s$ for the amount attribute is greater than 1200.''

The notation $\exists t \in r\,(Q(t))$ means ``there exists a tuple $t$ in relation $r$ such that predicate $Q(t)$ is true''.

How did we get the above expression? We needed tuples on scheme cname such that there were tuples in borrow pertaining to that customer name with amount attribute $> 1200$. The tuples $t$ get the scheme cname implicitly as that is the only attribute $t$ is mentioned with.

Let's look at a more complex example. Find all customers having a loan from the SFU branch, and the cities in which they live:
  $\{t \mid \exists s \in borrow\,(t[cname] = s[cname] \wedge s[bname] = \text{``SFU''} \wedge \exists u \in customer\,(u[cname] = s[cname] \wedge t[ccity] = u[ccity]))\}$
In English, we might read this as ``the set of all (cname, ccity) tuples for which cname is a borrower at the SFU branch, and ccity is the city of cname''. Tuple variable $s$ ensures that the customer is a borrower at the SFU branch.

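Tuple variable $u$ is restricted to pertain to the same customer as $s$, and also ensures that ccity is the city of the customer.

The logical connectives $\wedge$ (AND) and $\vee$ (OR) are allowed, as well as $\neg$ (negation). We also use the existential quantifier $\exists$ and the universal quantifier $\forall$.

Some more examples:
1. Find all customers having a loan, an account, or both at the SFU branch:
  $\{t \mid \exists s \in borrow\,(t[cname] = s[cname] \wedge s[bname] = \text{``SFU''}) \vee \exists u \in deposit\,(t[cname] = u[cname] \wedge u[bname] = \text{``SFU''})\}$
Note the use of the $\vee$ connective. As usual, set operations remove all duplicates.
2. Find all customers who have both a loan and an account at the SFU branch. Solution: simply change the $\vee$ connective in 1 to a $\wedge$.
3. Find customers who have an account, but not a loan at the SFU branch.
  $\{t \mid \exists u \in deposit\,(t[cname] = u[cname] \wedge u[bname] = \text{``SFU''}) \wedge \neg \exists s \in borrow\,(t[cname] = s[cname] \wedge s[bname] = \text{``SFU''})\}$
4. Find all customers who have an account at all branches located in Brooklyn. (We used division in relational algebra.) For this example we will use implication, denoted by a pointing finger in the text, but by $\Rightarrow$ here. The formula $P \Rightarrow Q$ means $P$ implies $Q$, or, if $P$ is true, then $Q$ must be true.
  $\{t \mid \forall u \in branch\,(u[bcity] = \text{``Brooklyn''} \Rightarrow \exists s \in deposit\,(t[cname] = s[cname] \wedge u[bname] = s[bname]))\}$
In English: the set of all cname tuples $t$ such that for all tuples $u$ in the branch relation, if the value of $u$ on attribute bcity is Brooklyn, then the customer has an account at the branch whose name appears in the bname attribute of $u$.
Division is difficult to understand. Think it through carefully.

Safety of Expressions

1. A tuple relational calculus expression may generate an infinite relation, e.g.
  $\{t \mid \neg(t \in borrow)\}$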

2. There are an infinite number of tuples that are not in borrow! Most of these tuples contain values that do not appear in the database.
3. Safe Tuple Expressions
We need to restrict the relational calculus a bit.
o The domain of a formula $P$, denoted $dom(P)$, is the set of all values referenced in $P$.
o These include values mentioned in $P$ as well as values that appear in a tuple of a relation mentioned in $P$.
o So, the domain of $P$ is the set of all values explicitly appearing in $P$ or that appear in relations mentioned in $P$.
o $dom(t \in borrow \wedge t[amount] < 1200)$ is the set of all values appearing in borrow, together with the constant 1200.
o $dom(\neg(t \in borrow))$ is the set of all values appearing in borrow.
We may say an expression $\{t \mid P(t)\}$ is safe if all values that appear in the result are values from $dom(P)$.
4. A safe expression yields a finite number of tuples as its result. Otherwise, it is called unsafe.

Expressive Power of Languages

1. The tuple relational calculus restricted to safe expressions is equivalent in expressive power to the relational algebra.

The Domain Relational Calculus

1. Domain variables take on values from an attribute's domain, rather than values for an entire tuple.

Formal Definitions

1. An expression is of the form
  $\{\langle x_1, x_2, \ldots, x_n \rangle \mid P(x_1, x_2, \ldots, x_n)\}$
where the $x_i$ represent domain variables, and $P$ is a formula.
2. An atom in the domain relational calculus is of the following forms
o $\langle x_1, \ldots, x_n \rangle \in r$ where $r$ is a relation on $n$ attributes, and the $x_i$ are domain variables or constants.
o $x \mathrel{\Theta} y$, where $x$ and $y$ are domain variables, and $\Theta$ is a comparison operator.
o $x \mathrel{\Theta} c$, where $c$ is a constant.
3. Formulae are built up from atoms using the following rules:
o An atom is a formula.
o If $P$ is a formula, then so are $\neg P$ and $(P)$.
o If $P_1$ and $P_2$ are formulae, then so are $P_1 \vee P_2$, $P_1 \wedge P_2$ and $P_1 \Rightarrow P_2$.

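o If $P(x)$ is a formula where x is a domain variable, then so are $\exists x\,(P(x))$ and $\forall x\,(P(x))$.

Example Queries

1. Find branch name, loan number, customer name and amount for loans of over $1200.
  $\{\langle b, l, c, a \rangle \mid \langle b, l, c, a \rangle \in borrow \wedge a > 1200\}$
2. Find all customers who have a loan for an amount greater than $1200.
  $\{\langle c \rangle \mid \exists b, l, a\,(\langle b, l, c, a \rangle \in borrow \wedge a > 1200)\}$
3. Find all customers having a loan from the SFU branch, and the city in which they live.
  $\{\langle c, x \rangle \mid \exists b, l, a\,(\langle b, l, c, a \rangle \in borrow \wedge b = \text{``SFU''} \wedge \exists y\,(\langle c, y, x \rangle \in customer))\}$
4. Find all customers having a loan, an account or both at the SFU branch.
  $\{\langle c \rangle \mid \exists b, l, a\,(\langle b, l, c, a \rangle \in borrow \wedge b = \text{``SFU''}) \vee \exists b, a, n\,(\langle b, a, c, n \rangle \in deposit \wedge b = \text{``SFU''})\}$
5. Find all customers who have an account at all branches located in Brooklyn.
  $\{\langle c \rangle \mid \forall x, y, z\,(\neg(\langle x, y, z \rangle \in branch) \vee z \neq \text{``Brooklyn''} \vee \exists a, n\,(\langle x, a, c, n \rangle \in deposit))\}$
If you find this example difficult to understand, try rewriting this expression using implication, as in the tuple relational calculus example. Here's my attempt:
  $\{\langle cn \rangle \mid \forall bn, as, bc\,((\langle bn, as, bc \rangle \in branch \wedge bc = \text{``Brooklyn''}) \Rightarrow \exists ac, ba\,(\langle bn, ac, cn, ba \rangle \in deposit))\}$
I've used two letter variable names to get away from the problem of having to remember what $x$ stands for.

Modifying the Database

1. Up until now, we have looked at extracting information from the database. We also need to add, remove and change information. Modifications are expressed using the assignment operator.

Deletion

1. Deletion is expressed in much the same way as a query. Instead of displaying, the selected tuples are removed from the database. We can only delete whole tuples.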

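In relational algebra, a deletion is of the form
  $r \leftarrow r - E$
where $r$ is a relation and $E$ is a relational algebra query. Tuples of $r$ that appear in the result of $E$ are deleted.
2. Some examples:
1. Delete all of Smith's account records.
  $deposit \leftarrow deposit - \sigma_{cname = \text{``Smith''}}(deposit)$
2. Delete all loans with loan numbers between 1300 and 1500.
  $borrow \leftarrow borrow - \sigma_{loan\# \geq 1300 \wedge loan\# \leq 1500}(borrow)$
3. Delete all accounts at branches located in Needham.
  $r_1 \leftarrow \sigma_{bcity = \text{``Needham''}}(deposit \bowtie branch)$
  $r_2 \leftarrow \Pi_{bname, account\#, cname, balance}(r_1)$
  $deposit \leftarrow deposit - r_2$

Insertions

1. To insert data into a relation, we either specify a tuple, or write a query whose result is the set of tuples to be inserted. Attribute values for inserted tuples must be members of the attribute's domain.
2. An insertion is expressed by
  $r \leftarrow r \cup E$
where $r$ is a relation and $E$ is a relational algebra expression.
3. Some examples:
1. To insert a tuple for Smith who has $1200 in account 9372 at the SFU branch.
  $deposit \leftarrow deposit \cup \{(\text{``SFU''}, 9372, \text{``Smith''}, 1200)\}$
2. To provide all loan customers in the SFU branch with a $200 savings account.
  $r_1 \leftarrow \sigma_{bname = \text{``SFU''}}(borrow)$
  $r_2 \leftarrow \Pi_{bname, loan\#, cname}(r_1)$
  $deposit \leftarrow deposit \cup (r_2 \times \{(200)\})$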
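Deletion and insertion map directly onto set difference and set union, which the sketch below shows with Python sets of tuples. The sample rows are invented, the tuple layouts (bname, account#, cname, balance) and (bname, loan#, cname, amount) are assumed, and reusing the loan number as the new account number is likewise an assumption of this example.

```python
# Hypothetical contents; deposit rows are (bname, account#, cname, balance),
# borrow rows are (bname, loan#, cname, amount).
deposit = {("SFU", 9001, "Smith", 500), ("Downtown", 9002, "Jones", 700)}
borrow = {("SFU", 1400, "Hayes", 1500), ("SFU", 1600, "Adams", 900)}

# Deletion, r <- r - E: delete all of Smith's account records.
smith_accounts = {t for t in deposit if t[2] == "Smith"}
deposit = deposit - smith_accounts

# Deletion of loans numbered 1300 to 1500 (kept in a separate variable so the
# original borrow relation can still be used below).
borrow_after = borrow - {t for t in borrow if 1300 <= t[1] <= 1500}

# Insertion, r <- r U E, with E a single constant tuple.
deposit = deposit | {("SFU", 9372, "Smith", 1200)}

# Insertion with E computed by a query: every SFU loan customer gets a $200
# account. Assumption: the loan number doubles as the new account number.
gift_accounts = {(b, loan, c, 200) for (b, loan, c, _amount) in borrow if b == "SFU"}
deposit = deposit | gift_accounts
```

Each statement rebinds the variable to a new set, which matches the algebra's view of a modification as assignment of a new relation value.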

Updating

1. Updating allows us to change some values in a tuple without necessarily changing all.
2. We use the update operator, $\delta$, with the form
  $\delta_{A \leftarrow E}(r)$
where $r$ is a relation with attribute $A$, which is assigned the value of expression $E$.
3. The expression $E$ is any arithmetic expression involving constants and attributes in relation $r$.
4. Some examples:
1. To increase all balances by 5 percent.
  $\delta_{balance \leftarrow balance * 1.05}(deposit)$
This statement is applied to every tuple in deposit.
2. To make two different rates of interest payment, depending on balance amount, we apply one update to the result of a selection for each balance range.
Note: in this example the order of the two operations is important. (Why?)

Views

1. We have assumed up to now that the relations we are given are the actual relations stored in the database.
2. For security and convenience reasons, we may wish to create a personalized collection of relations for a user.
3. We use the term view to refer to any relation, not part of the conceptual model, that is made visible to the user as a ``virtual relation''.
4. As relations may be modified by deletions, insertions and updates, it is generally not possible to store views. (Why?) Views must then be recomputed for each query referring to them.

View Definition

1. A view is defined using the create view command:
  create view v as <query expression>
where <query expression> is any legal query expression.

The view created is given the name $v$.
2. To create a view all-customer of all branches and their customers:
  create view all-customer as $\Pi_{bname, cname}(deposit) \cup \Pi_{bname, cname}(borrow)$
3. Having defined a view, we can now use it to refer to the virtual relation it creates. View names can appear anywhere a relation name can.
4. We can now find all customers of the SFU branch by writing
  $\Pi_{cname}(\sigma_{bname = \text{``SFU''}}(all\text{-}customer))$

Updates Through Views and Null Values

1. Updates, insertions and deletions using views can cause problems. The modifications on a view must be transformed to modifications of the actual relations in the conceptual model of the database.
2. An example will illustrate: consider a clerk who needs to see all information in the borrow relation except amount. Let the view loan-info be given to the clerk:
  create view loan-info as $\Pi_{bname, loan\#, cname}(borrow)$
3. Since SQL allows a view name to appear anywhere a relation name may appear, the clerk can write:
  $loan\text{-}info \leftarrow loan\text{-}info \cup \{(\text{``SFU''}, 3, \text{``Ruth''})\}$
This insertion is represented by an insertion into the actual relation borrow, from which the view is constructed. However, we have no value for amount. A suitable response would be
o Reject the insertion and inform the user.
o Insert (``SFU'', 3, ``Ruth'', null) into the relation.
The symbol null represents a null or place-holder value. It says the value is unknown or does not exist.
4. Another problem with modification through views: consider the view
  create view branch-city as $\Pi_{bname, ccity}(borrow \bowtie customer)$
This view lists the cities in which the borrowers of each branch live.

Now consider the insertion of a single (bname, ccity) tuple through this view. Using nulls is the only possible way to do this (see Figure 3.22 in the textbook).

If we do this insertion with nulls, now consider the expression the view actually corresponds to:
  $\Pi_{bname, ccity}(borrow \bowtie customer)$
As comparisons involving nulls are always false, this query misses the inserted tuple. To understand why, think about the tuples that got inserted into borrow and customer. Then think about how the view is recomputed for the above query.

Relational Database Design

The goal of relational database design is to generate a set of schemas that allow us to
o Store information without unnecessary redundancy.
o Retrieve information easily (and accurately).

Pitfalls in Relational DB Design

A bad design may have several properties, including:
o Repetition of information.
o Inability to represent certain information.
o Loss of information.

Representation of Information

1. Suppose we have a schema, Lending-schema,
  Lending-schema = (bname, assets, bcity, loan#, cname, amount)
and suppose an instance of the relation is Figure 7.1.

Figure 7.1: Sample lending relation.

2. A tuple t in the new relation has the following attributes:
o t[assets] is the assets for t[bname]
o t[bcity] is the city for t[bname]

o t[loan#] is the loan number made by branch t[bname] to t[cname].
o t[amount] is the amount of the loan for t[loan#].
3. If we wish to add a loan to our database, the original design would require adding a tuple to borrow:
  (SFU, L-31, Turner, 1K)
4. In our new design, we need a tuple with all the attributes required for Lending-schema. Thus we need to insert
  (SFU, 2M, Burnaby, L-31, Turner, 1K)
5. We are now repeating the assets and branch city information for every loan.
o Repetition of information wastes space.
o Repetition of information complicates updating.
6. Under the new design, we need to change many tuples if the branch's assets change.
7. Let's analyze this problem:
o We know that a branch is located in exactly one city.
o We also know that a branch may make many loans.
o The functional dependency bname $\rightarrow$ bcity holds on Lending-schema.
o The functional dependency bname $\rightarrow$ loan# does not.
o These two facts are best represented in separate relations.
8. Another problem is that we cannot represent the information for a branch (assets and city) unless we have a tuple for a loan at that branch.
9. Unless we use nulls, we can only have this information when there are loans, and must delete it when the last loan is paid off.

Decomposition

1. The previous example might seem to suggest that we should decompose schema as much as possible. Careless decomposition, however, may lead to another form of bad design.
2. Consider a design where Lending-schema is decomposed into two schemas
  Branch-customer-schema = (bname, assets, bcity, cname)
  Customer-loan-schema = (cname, loan#, amount)
3. We construct our new relations from lending by:
  branch-customer = $\Pi_{bname, assets, bcity, cname}(lending)$
  customer-loan = $\Pi_{cname, loan\#, amount}(lending)$

Figure 7.2: The decomposed lending relation.

4. It appears that we can reconstruct the lending relation by performing a natural join on the two new schemas.
5. Figure 7.3 shows what we get by computing branch-customer $\bowtie$ customer-loan.

Figure 7.3: Join of the decomposed relations.

6. We notice that there are tuples in branch-customer $\bowtie$ customer-loan that are not in lending.
7. How did this happen?
o The intersection of the two schemas is cname, so the natural join is made on the basis of equality in the cname.
o If two loans are for the same customer, there will be four tuples in the natural join.
o Two of these tuples will be spurious: they will not appear in the original lending relation, and should not appear in the database.
o Although we have more tuples in the join, we have less information.
o Because of this, we call this a lossy or lossy-join decomposition.
8. A decomposition that is not lossy-join is called a lossless-join decomposition.
9. The only way we could make a connection between branch-customer and customer-loan was through cname.
10. When we decomposed Lending-schema into Branch-schema and Loan-info-schema, we will not have a similar problem. Why not?
  Branch-schema = (bname, bcity, assets)
  Branch-loan-schema = (bname, cname, loan#, amount)
o The only way we could represent a relationship between tuples in the two relations is through bname.
o This will not cause problems.
o For a given branch name, there is exactly one assets value and branch city.
11. For a given branch name, there is exactly one assets value and exactly one bcity, whereas a similar statement associated with a loan depends on the customer, not on the amount of the loan (which is not unique).
12. We'll make a more formal definition of lossless-join:
o Let R be a relation schema.
o A set of relation schemas $\{R_1, R_2, \ldots, R_n\}$ is a decomposition of R if
    $R = R_1 \cup R_2 \cup \cdots \cup R_n$
o That is, every attribute in R appears in at least one $R_i$.
o Let r be a relation on R, and let $r_i = \Pi_{R_i}(r)$ for $1 \leq i \leq n$.

o That is, $\{r_1, r_2, \ldots, r_n\}$ is the database that results from decomposing R into $\{R_1, R_2, \ldots, R_n\}$.
o It is always the case that:
    $r \subseteq r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$
o To see why this is, consider a tuple $t \in r$.
o When we compute the relations $r_1, r_2, \ldots, r_n$, the tuple t gives rise to one tuple $t_i$ in each $r_i$.
o These n tuples combine together to regenerate t when we compute the natural join of the $r_i$.
o Thus every tuple in r appears in $r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$.
o However, in general,
    $r \neq r_1 \bowtie r_2 \bowtie \cdots \bowtie r_n$
o We saw an example of this inequality in our decomposition of lending into branch-customer and customer-loan.
o In order to have a lossless-join decomposition, we need to impose some constraints on the set of possible relations.
o Let C represent a set of constraints on the database.
o A decomposition $\{R_1, R_2, \ldots, R_n\}$ of a relation schema R is a lossless-join decomposition for R if, for all relations r on schema R that are legal under C:
    $r = \Pi_{R_1}(r) \bowtie \Pi_{R_2}(r) \bowtie \cdots \bowtie \Pi_{R_n}(r)$
In other words, a lossless-join decomposition is one in which, for any legal relation r, if we decompose r and then ``recompose'' r, we get what we started with: no more and no less.

Normalization Using Functional Dependencies

We can use functional dependencies to design a relational database in which most of the problems we have seen do not occur. Using functional dependencies, we can define several normal forms which represent ``good'' database designs.

Desirable Properties of Decomposition

We'll take another look at the schema
  Lending-schema = (bname, assets, bcity, loan#, cname, amount)

which we saw was a bad design. The set of functional dependencies we required to hold on this schema was:
  bname $\rightarrow$ assets bcity
  loan# $\rightarrow$ amount bname
If we decompose it into
  Branch-schema = (bname, assets, bcity)
  Loan-info-schema = (bname, loan#, amount)
  Borrow-schema = (cname, loan#)
we claim this decomposition has several desirable properties.

Lossless-Join Decomposition

1. We claim the above decomposition is lossless. How can we decide whether a decomposition is lossless?
o Let R be a relation schema.
o Let F be a set of functional dependencies on R.
o Let $R_1$ and $R_2$ form a decomposition of R.
o The decomposition is a lossless-join decomposition of R if at least one of the following functional dependencies is in $F^+$:
  1. $R_1 \cap R_2 \rightarrow R_1$
  2. $R_1 \cap R_2 \rightarrow R_2$
Why is this true? Simply put, it ensures that the attributes involved in the natural join ($R_1 \cap R_2$) are a candidate key for at least one of the two relations. This ensures that we can never get the situation where spurious tuples are generated, as for any value on the join attributes there will be a unique tuple in one of the relations.
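The two-schema test above can be sketched without computing all of $F^+$, since $R_1 \cap R_2 \rightarrow R_1$ is in $F^+$ exactly when $R_1$ is contained in the attribute closure of $R_1 \cap R_2$. Using attribute closure is a substitution made for this example rather than a method taken from these notes, and the function names `closure` and `is_lossless` are hypothetical. The first schema pair is the first decomposition step of Lending-schema; the second is the careless branch-customer / customer-loan decomposition.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """True if R1 n R2 -> R1 or R1 n R2 -> R2 follows from fds."""
    reachable = closure(set(r1) & set(r2), fds)
    return set(r1) <= reachable or set(r2) <= reachable


# The functional dependencies required on Lending-schema.
F = [
    (frozenset({"bname"}), frozenset({"assets", "bcity"})),
    (frozenset({"loan#"}), frozenset({"amount", "bname"})),
]

branch_schema = {"bname", "assets", "bcity"}
loan_info_schema = {"bname", "cname", "loan#", "amount"}
branch_customer_schema = {"bname", "assets", "bcity", "cname"}
customer_loan_schema = {"cname", "loan#", "amount"}

# Join attribute bname determines all of Branch-schema.
good = is_lossless(branch_schema, loan_info_schema, F)
# Join attribute cname determines nothing beyond itself.
bad = is_lossless(branch_customer_schema, customer_loan_schema, F)
```

The second pair fails because cname is not a key of either schema, which is the situation that produced the spurious tuples of Figure 7.3.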

2. We'll now show our decomposition is lossless-join by showing a set of steps that generate the decomposition:
o First we decompose Lending-schema into
    Branch-schema = (bname, assets, bcity)
    Loan-info-schema = (bname, cname, loan#, amount)
o Since bname $\rightarrow$ assets bcity, the augmentation rule for functional dependencies implies that
    bname $\rightarrow$ bname assets bcity
o Since Branch-schema $\cap$ Loan-info-schema = bname, our decomposition is lossless join.
o Next we decompose Loan-info-schema into
    Loan-schema = (bname, loan#, amount)
    Borrow-schema = (cname, loan#)
o As loan# is the common attribute, and
    loan# $\rightarrow$ amount bname
  this is also a lossless-join decomposition.

Dependency Preservation

1. Another desirable property in database design is dependency preservation.
o We would like to check easily that updates to the database do not result in illegal relations being created.
o It would be nice if our design allowed us to check updates without having to compute natural joins.
o To know whether joins must be computed, we need to determine what functional dependencies may be tested by checking each relation individually.
o Let F be a set of functional dependencies on schema R.
o Let $\{R_1, R_2, \ldots, R_n\}$ be a decomposition of R.
o The restriction of F to $R_i$ is the set $F_i$ of all functional dependencies in $F^+$ that include only attributes of $R_i$.
o Functional dependencies in a restriction can be tested in one relation, as they involve attributes in one relation schema.
o The set of restrictions $F_1, F_2, \ldots, F_n$ is the set of dependencies that can be checked efficiently.
o We need to know whether testing only the restrictions is sufficient.
o Let $F' = F_1 \cup F_2 \cup \cdots \cup F_n$.
o F' is a set of functional dependencies on schema R, but in general, $F' \neq F$.
o However, it may be that $F'^+ = F^+$.
o If this is so, then every functional dependency in F is implied by F', and if F' is satisfied, then F must also be satisfied.
o A decomposition having the property that $F'^+ = F^+$ is a dependency-preserving decomposition.
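The definition above can be checked mechanically. The sketch below does not compute $F^+$ or the restrictions explicitly. It substitutes the standard attribute-closure test: for each dependency $\alpha \rightarrow \beta$ in F, grow a set from $\alpha$ using only what can be derived inside one schema at a time, then see whether $\beta$ was reached. The function names are hypothetical, and the schema (A, B, C) in the second example is invented for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_preserved(fd, schemas, fds):
    """True if fd is implied by the union of the restrictions of fds to the schemas."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            # Attributes derivable from what we hold inside this one schema.
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return rhs <= result


def is_dependency_preserving(schemas, fds):
    return all(is_preserved(fd, schemas, fds) for fd in fds)


# The decomposition of Lending-schema used in these notes.
F = [
    (frozenset({"bname"}), frozenset({"assets", "bcity"})),
    (frozenset({"loan#"}), frozenset({"amount", "bname"})),
]
lending_parts = [
    {"bname", "assets", "bcity"},
    {"bname", "loan#", "amount"},
    {"cname", "loan#"},
]
lending_ok = is_dependency_preserving(lending_parts, F)

# Hypothetical counterexample: R = (A, B, C) with A -> B and B -> C,
# decomposed into (A, B) and (A, C). B -> C can no longer be checked locally.
G = [(frozenset({"A"}), frozenset({"B"})), (frozenset({"B"}), frozenset({"C"}))]
abc_ok = is_dependency_preserving([{"A", "B"}, {"A", "C"}], G)
```

This runs in polynomial time in the size of F and the schemas, whereas comparing the two closures of dependency sets directly can be exponential.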

2. The algorithm for testing dependency preservation follows this method:

  compute $F^+$;
  for each schema $R_i$ in D do
  begin
    $F_i$ := the restriction of $F^+$ to $R_i$;
  end
  $F'$ := $\emptyset$;
  for each restriction $F_i$ do
  begin
    $F'$ := $F' \cup F_i$;
  end
  compute $F'^+$;
  if ($F'^+ = F^+$) then return (true)
  else return (false);

3. We can now show that our decomposition of Lending-schema is dependency preserving.
o The functional dependency
    bname $\rightarrow$ assets bcity
  can be tested in one relation on Branch-schema.
o The functional dependency
    loan# $\rightarrow$ amount bname
  can be tested in Loan-schema.
4. As the above example shows, it is often easier not to apply the algorithm shown to test dependency preservation, as computing $F^+$ takes exponential time.
5. An Easier Way To Test For Dependency Preservation
Really we only need to know whether the functional dependencies in F and not in F' are implied by those in F'. In other words, are the functional dependencies not easily checkable logically implied by those that are? Rather than compute $F^+$ and $F'^+$, and see whether they are equal, we can do this:

o Find F - F', the functional dependencies not checkable in one relation.
o See whether this set is obtainable from F' by using Armstrong's Axioms.
o This should take a great deal less work, as we have (usually) just a few functional dependencies to work on.
Use this simpler method on exams and assignments (unless you have exponential time available to you).

Repetition of Information

1. Our decomposition does not suffer from the repetition of information problem.
o Branch and loan data are separated into distinct relations.
o Thus we do not have to repeat branch data for each loan.
o If a single loan is made to several customers, we do not have to repeat the loan amount for each customer.
o This lack of redundancy is obviously desirable.
o We will see how this may be achieved through the use of normal forms.

Boyce-Codd Normal Form

1. A relation schema R is in Boyce-Codd Normal Form (BCNF) with respect to a set F of functional dependencies if for all functional dependencies in $F^+$ of the form $\alpha \rightarrow \beta$, where $\alpha \subseteq R$ and $\beta \subseteq R$, at least one of the following holds:
o $\alpha \rightarrow \beta$ is a trivial functional dependency (i.e. $\beta \subseteq \alpha$).
o $\alpha$ is a superkey for schema R.
2. A database design is in BCNF if each member of the set of relation schemas is in BCNF.
3. Let's assess our example banking design:
    Customer-schema = (cname, street, ccity)
      cname $\rightarrow$ street ccity
    Branch-schema = (bname, assets, bcity)
      bname $\rightarrow$ assets bcity
    Loan-info-schema = (bname, cname, loan#, amount)
      loan# $\rightarrow$ amount bname
Customer-schema and Branch-schema are in BCNF.
4. Let's look at Loan-info-schema:
o We have the non-trivial functional dependency loan# $\rightarrow$ amount, and
o loan# is not a superkey.
o Thus Loan-info-schema is not in BCNF.
o We also have the repetition of information problem.
o For each customer associated with a loan, we must repeat the branch name and amount of the loan.
o We can eliminate this redundancy by decomposing into schemas that are all in BCNF.

5. If we decompose into
    Loan-schema = (bname, loan#, amount)
    Borrow-schema = (cname, loan#)
we have a lossless-join decomposition. (Remember why?)
6. To see whether these schemas are in BCNF, we need to know what functional dependencies apply to them.
o For Loan-schema, we have loan# $\rightarrow$ amount bname applying.
o Only trivial functional dependencies apply to Borrow-schema.
o Thus both schemas are in BCNF.
We also no longer have the repetition of information problem. Branch name and loan amount information are not repeated for each customer in this design.
7. Now we can give a general method to generate a collection of BCNF schemas.

  result := $\{R\}$;
  done := false;
  compute $F^+$;
  while (not done) do
    if (there is a schema $R_i$ in result that is not in BCNF)
    then begin
      let $\alpha \rightarrow \beta$ be a nontrivial functional dependency that holds on $R_i$
        such that $\alpha \rightarrow R_i$ is not in $F^+$, and $\alpha \cap \beta = \emptyset$;
      result := (result $- R_i$) $\cup$ ($R_i - \beta$) $\cup$ ($\alpha$, $\beta$);
    end
    else done := true;

8. This algorithm generates a lossless-join BCNF decomposition. Why?
o We replace a schema $R_i$ with $(R_i - \beta)$ and $(\alpha, \beta)$.
o The dependency $\alpha \rightarrow \beta$ holds on $R_i$, and
o $(R_i - \beta) \cap (\alpha, \beta) = \alpha$.

o So we have $\alpha \rightarrow (\alpha, \beta)$, that is, the common attributes determine one of the two new schemas, and thus a lossless join.
9. Let's apply this algorithm to our earlier example of poor database design:
    Lending-schema = (bname, assets, bcity, loan#, cname, amount)
The set of functional dependencies we require to hold on this schema are
    bname $\rightarrow$ assets bcity
    loan# $\rightarrow$ amount bname
A candidate key for this schema is {loan#, cname}.
We will now proceed to decompose:
o The functional dependency
    bname $\rightarrow$ assets bcity
  holds on Lending-schema, but bname is not a superkey.
o We replace Lending-schema with
    Branch-schema = (bname, assets, bcity)
    Loan-info-schema = (bname, loan#, cname, amount)
o Branch-schema is now in BCNF.
o The functional dependency
    loan# $\rightarrow$ amount bname
  holds on Loan-info-schema, but loan# is not a superkey.
o We replace Loan-info-schema with
    Loan-schema = (bname, loan#, amount)
    Borrow-schema = (cname, loan#)
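The decomposition steps above can be reproduced with a small sketch of the BCNF algorithm. It is a simplified stand-in, not the algorithm exactly as given in the notes: instead of computing $F^+$, it searches the attribute subsets of each schema for a violating $\alpha$ by attribute closure, which is exponential in the schema size and only sensible for small examples. The function names are assumptions of this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds):
    """Return (alpha, beta) with alpha -> beta nontrivial on schema and alpha
    not a superkey of schema, or None if schema is in BCNF."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for combo in combinations(attrs, size):
            alpha = frozenset(combo)
            determined = closure(alpha, fds) & schema
            beta = determined - alpha  # guarantees alpha and beta are disjoint
            if beta and determined != schema:
                return alpha, frozenset(beta)
    return None


def bcnf_decompose(schema, fds):
    result = [frozenset(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                result += [ri - beta, alpha | beta]  # (Ri - beta) and (alpha, beta)
                done = False
                break  # restart the scan, since result was modified
    return set(result)


lending = {"bname", "assets", "bcity", "loan#", "cname", "amount"}
F = [
    (frozenset({"bname"}), frozenset({"assets", "bcity"})),
    (frozenset({"loan#"}), frozenset({"amount", "bname"})),
]
decomposition = bcnf_decompose(lending, F)
```

On this input the sketch first splits on bname and then on loan#, the same two steps as in the walkthrough, and ends with Branch-schema, Loan-schema and Borrow-schema.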

o Loan-schema and Borrow-schema are both now in BCNF.
o We saw earlier that this decomposition is both lossless-join and dependency-preserving.

Some Things To Note About BCNF

o There is sometimes more than one BCNF decomposition of a given schema.
o The algorithm given produces only one of these possible decompositions.
o Some of the BCNF decompositions may also yield dependency preservation, while others may not.
o Changing the order in which the functional dependencies are considered by the algorithm may change the decomposition.
o For example, try running the BCNF algorithm on a schema with several functional dependencies. Then change the order of the last two functional dependencies and run the algorithm again. Check the two decompositions for dependency preservation.

Not every decomposition is dependency-preserving.
o Consider the relation schema
    Banker-schema = (bname, cname, banker-name)
o The set F of functional dependencies is
    banker-name $\rightarrow$ bname
    cname bname $\rightarrow$ banker-name
o The schema is not in BCNF as banker-name is not a superkey.
o If we apply our algorithm, we may obtain the decomposition
    Banker-branch-schema = (bname, banker-name)
    Cust-banker-schema = (cname, banker-name)
o The decomposed schemas preserve only the first (and trivial) functional dependencies.
o The closure of this dependency does not include the second one.
o Thus a violation of cname bname $\rightarrow$ banker-name cannot be detected unless a join is computed.
This shows us that not every BCNF decomposition is dependency-preserving.

It is not always possible to satisfy all three design goals:
o BCNF.
o Lossless join.
o Dependency preservation.
We can see that any BCNF decomposition of Banker-schema must fail to preserve
    cname bname $\rightarrow$ banker-name

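The claim that the Banker-schema decomposition loses cname bname → banker-name can be checked mechanically. The sketch below uses the standard test that grows the left-hand side using only dependencies visible inside each decomposed schema. The function names are hypothetical, chosen for this illustration.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def is_preserved(dependency, decomposition, fds):
    """Check whether dependency can be enforced without joining the schemas."""
    alpha, beta = dependency
    result = set(alpha)
    changed = True
    while changed:
        changed = False
        for schema in decomposition:
            # Only attributes inside this schema may be used and gained.
            gained = (closure(result & schema, fds) & schema) - result
            if gained:
                result |= gained
                changed = True
    return beta <= result


F = [
    (frozenset({"banker-name"}), frozenset({"bname"})),
    (frozenset({"cname", "bname"}), frozenset({"banker-name"})),
]
DECOMPOSITION = [
    frozenset({"bname", "banker-name"}),   # Banker-branch-schema
    frozenset({"cname", "banker-name"}),   # Cust-banker-schema
]

for dependency in F:
    print(sorted(dependency[0]), "->", sorted(dependency[1]),
          is_preserved(dependency, DECOMPOSITION, F))
```

The first dependency is reported as preserved and the second is not, which matches the observation that a violation of cname bname → banker-name cannot be detected without computing a join.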
Third Normal Form

1. When we cannot meet all three design criteria, we abandon BCNF and accept a weaker form called third normal form (3NF).
2. It is always possible to find a dependency-preserving lossless-join decomposition that is in 3NF.
3. A relation schema R is in 3NF with respect to a set F of functional dependencies if for all functional dependencies in F+ of the form α → β, where α ⊆ R and β ⊆ R, at least one of the following holds:
   o α → β is a trivial functional dependency.
   o α is a superkey for schema R.
   o Each attribute A in β − α is contained in a candidate key for R.
4. A database design is in 3NF if each member of the set of relation schemas is in 3NF.
5. We now allow functional dependencies satisfying only the third condition. These dependencies are called transitive dependencies, and are not allowed in BCNF.
6. As all relation schemas in BCNF satisfy the first two conditions only, a schema in BCNF is also in 3NF.
7. BCNF is a more restrictive constraint than 3NF.
8. Our Banker-schema decomposition did not have a dependency-preserving lossless-join decomposition into BCNF. The schema was already in 3NF though (check it out).
9. We now present an algorithm for finding a dependency-preserving lossless-join decomposition into 3NF.
10. Note that we require the set F of functional dependencies to be in canonical form.

    let Fc be a canonical cover for F;
    i := 0;
    for each functional dependency α → β in Fc do
        if none of the schemas Rj, 1 ≤ j ≤ i, contains αβ
        then begin
            i := i + 1;
            Ri := αβ;
        end
    if none of the schemas Rj, 1 ≤ j ≤ i, contains a candidate key for R
    then begin
        i := i + 1;
        Ri := any candidate key for R
    end
    return (R1, R2, ..., Ri)

11. Each relation schema Ri is in 3NF. Why? (A proof is given in [Ullman 1988].)
12. The design is dependency-preserving as a schema is built for each given dependency. Lossless-join is guaranteed by the requirement that a candidate key for R be in at least one of the schemas.
13. To review our Banker-schema consider an extension to our example:

    Banker-info-schema = (bname, cname, banker-name, office#)

    The set F of functional dependencies is

    banker-name → bname office#
    cname bname → banker-name

    The for loop in the algorithm gives us the following decomposition:

    Banker-office-schema = (banker-name, bname, office#)
    Banker-schema = (cname, bname, banker-name)

    Since Banker-schema contains a candidate key for Banker-info-schema, the process is finished.

Comparison of BCNF and 3NF

1. We have seen BCNF and 3NF.
   o It is always possible to obtain a 3NF design without sacrificing lossless-join or dependency-preservation.
   o If we do not eliminate all transitive dependencies, we may need to use null values to represent some of the meaningful relationships.
   o Repetition of information occurs.
2. These problems can be illustrated with Banker-schema.
   o As banker-name → bname, we may want to express relationships between a banker and his or her branch.

   o Figure 7.4 shows how we must either have a corresponding value for customer name, or include a null.
   o Repetition of information also occurs.
   o Every occurrence of the banker's name must be accompanied by the branch name.

Figure 7.4: An instance of Banker-schema.

3. If we must choose between BCNF and dependency preservation, it is generally better to opt for 3NF.
   o If we cannot check for dependency preservation efficiently, we either pay a high price in system performance or risk the integrity of the data.
   o The limited amount of redundancy in 3NF is then a lesser evil.
4. To summarize, our goal for a relational database design is
   o BCNF.
   o Lossless-join.
   o Dependency-preservation.
5. If we cannot achieve this, we accept
   o 3NF.
   o Lossless-join.
   o Dependency-preservation.
6. A final point: there is a price to pay for decomposition. When we decompose a relation, we have to use natural joins or Cartesian products to put the pieces back together. This takes computational time.

Normalization Using Multivalued Dependencies (not to be covered)

1. Suppose that in our banking example, we had an alternative design including the schema:

    BC-schema = (loan#, cname, street, ccity)

   We can see this is not BCNF, as the functional dependency cname → street ccity holds on this schema, and cname is not a superkey.
2. If we have customers who have several addresses, though, then we no longer wish to enforce this functional dependency, and the schema is in BCNF.
3. However, we now have the repetition of information problem. For each address, we must repeat the loan numbers for a customer, and vice versa.

Multivalued Dependencies

1. Functional dependencies rule out certain tuples from appearing in a relation. If A → B, then we cannot have two tuples with the same A value but different B values.
2. Multivalued dependencies do not rule out the existence of certain tuples. Instead, they require that other tuples of a certain form be present in the relation.
3. Let R be a relation schema, and let α ⊆ R and β ⊆ R. The multivalued dependency α →→ β holds on R if in any legal relation r(R), for all pairs of tuples t1 and t2 in r such that t1[α] = t2[α], there exist tuples t3 and t4 in r such that:

    t1[α] = t2[α] = t3[α] = t4[α]
    t3[β] = t1[β]
    t3[R − β] = t2[R − β]
    t4[β] = t2[β]
    t4[R − β] = t1[R − β]

4. Figure 7.5 (textbook 6.10) shows a tabular representation of this. It looks horrendously complicated, but is really rather simple. A simple example is a table with the schema (name, address, car), as shown in Figure 7.6.

Figure 7.5: Tabular representation of α →→ β.

Figure 7.6: (name, address, car) where name →→ address and name →→ car.

   o Intuitively, α →→ β says that the relationship between α and β is independent of the relationship between α and R − β.
   o If the multivalued dependency α →→ β is satisfied by all relations on schema R, then we say it is a trivial multivalued dependency on schema R.
   o Thus α →→ β is trivial if β ⊆ α or β ∪ α = R.
5. Look at the example relation bc in Figure 7.7 (textbook 6.11).

Figure 7.7: Relation bc, an example of redundancy in a BCNF relation.

   o We must repeat the loan number once for each address a customer has.
   o We must repeat the address once for each loan the customer has.
   o This repetition is pointless, as the relationship between a customer and a loan is independent of the relationship between a customer and his or her address.
   o If a customer, say "Smith", has loan number 23, we want all of Smith's addresses to be associated with that loan.
   o Thus the relation of Figure 7.8 (textbook 6.12) is illegal.
   o If we look at our definition of multivalued dependency, we see that we want the multivalued dependency cname →→ street ccity to hold on BC-schema.

Figure 7.8: An illegal bc relation.
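The definition of a multivalued dependency can be tested directly on a relation instance. The sketch below is illustrative only: the function name `satisfies_mvd` is hypothetical, and the rows are made-up sample data in the spirit of the bc relation, not the contents of Figures 7.7 and 7.8. Only t3 is checked explicitly, since t4 is the same requirement with t1 and t2 swapped and the loop visits both orders.

```python
def satisfies_mvd(rows, alpha, beta):
    """Check alpha ->> beta on rows, a list of dicts sharing the same keys."""
    present = {tuple(sorted(row.items())) for row in rows}
    for t1 in rows:
        for t2 in rows:
            if all(t1[a] == t2[a] for a in alpha):
                # t3 takes its beta values from t1 and everything else from t2.
                t3 = {a: (t1[a] if a in beta else t2[a]) for a in t1}
                if tuple(sorted(t3.items())) not in present:
                    return False
    return True


def row(loan, cname, street, ccity):
    return {"loan#": loan, "cname": cname, "street": street, "ccity": ccity}


# Hypothetical data: every loan of Smith is paired with every address.
legal_bc = [
    row(23, "Smith", "North", "Rye"),
    row(23, "Smith", "Main", "Manchester"),
    row(27, "Smith", "North", "Rye"),
    row(27, "Smith", "Main", "Manchester"),
]
# Hypothetical data: each loan is tied to just one of Smith's addresses.
illegal_bc = [
    row(23, "Smith", "North", "Rye"),
    row(27, "Smith", "Main", "Manchester"),
]

print(satisfies_mvd(legal_bc, {"cname"}, {"street", "ccity"}))
print(satisfies_mvd(illegal_bc, {"cname"}, {"street", "ccity"}))
```

The first relation satisfies cname →→ street ccity and the second does not, because the tuple pairing loan 27 with the North address is missing.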

6. Note that if a relation r fails to satisfy a given multivalued dependency, we can construct a relation r' that does satisfy the multivalued dependency by adding tuples to r.

Theory of Multivalued Dependencies

1. We will need to compute all the multivalued dependencies that are logically implied by a given set of multivalued dependencies.
   o Let D denote a set of functional and multivalued dependencies.
   o The closure D+ of D is the set of all functional and multivalued dependencies logically implied by D.
   o We can compute D+ from D using the formal definitions, but it is easier to use a set of inference rules.
2. The following set of inference rules is sound and complete. The first three rules are Armstrong's axioms from Chapter 5.
   1. Reflexivity rule: if α is a set of attributes and β ⊆ α, then α → β holds.
   2. Augmentation rule: if α → β holds, and γ is a set of attributes, then γα → γβ holds.
   3. Transitivity rule: if α → β holds, and β → γ holds, then α → γ holds.
   4. Complementation rule: if α →→ β holds, then α →→ R − β − α holds.
   5. Multivalued augmentation rule: if α →→ β holds, and γ ⊆ R and δ ⊆ γ, then γα →→ δβ holds.
   6. Multivalued transitivity rule: if α →→ β holds, and β →→ γ holds, then α →→ γ − β holds.
   7. Replication rule: if α → β holds, then α →→ β.
   8. Coalescence rule: if α →→ β holds, and γ ⊆ β, and there is a δ such that δ ⊆ R, δ ∩ β = ∅ and δ → γ, then α → γ holds.
3. An example of the multivalued transitivity rule is as follows: ... and ... hold, where ... Thus we have ...
   An example of the coalescence rule is as follows: if we have ... and ..., then we have ...
4. Let's do an example:
   o Let R=(A,B,C,G,H,I) be a relation schema.
   o Suppose A →→ BC holds.
   o The definition of multivalued dependencies implies that if t1[A] = t2[A], then there exist tuples t3 and t4 such that:

         t1[A] = t2[A] = t3[A] = t4[A]
         t3[BC] = t1[BC]
         t3[GHI] = t2[GHI]
         t4[GHI] = t1[GHI]
         t4[BC] = t2[BC]

   o The complementation rule states that if A →→ BC then A →→ GHI.
   o Tuples t3 and t4 satisfy A →→ GHI if we simply change the subscripts.
5. We can simplify calculating D+, the closure of D, by using the following rules, derivable from the previous ones:
   o Multivalued union rule: if α →→ β holds and α →→ γ holds, then α →→ βγ holds.
   o Intersection rule: if α →→ β holds and α →→ γ holds, then α →→ β ∩ γ holds.
   o Difference rule: if α →→ β holds and α →→ γ holds, then α →→ β − γ holds and α →→ γ − β holds.
6. An example will help: Let R=(A,B,C,G,H,I) with the set of dependencies:

       A →→ B
       B →→ HI
       CG → H

   We list some members of D+:
   o A →→ CGHI: since A →→ B, the complementation rule implies that A →→ R − B − A, and R − B − A = CGHI.
   o A →→ HI: since A →→ B and B →→ HI, the multivalued transitivity rule implies that A →→ HI − B, and HI − B = HI.
   o B → H: the coalescence rule can be applied. B →→ HI holds, H ⊆ HI, CG → H and CG ∩ HI = ∅, so we can satisfy the coalescence rule with α being B, β being HI, δ being CG, and γ being H. We conclude that B → H.
   o A →→ CG: now we know that A →→ CGHI and A →→ HI. By the difference rule, A →→ CGHI − HI = CG.

Fourth Normal Form (4NF)

1. We saw that BC-schema was in BCNF, but still was not an ideal design as it suffered from repetition of information. We had the multivalued dependency cname →→ street ccity, but no non-trivial functional dependencies.

2. We can use the given multivalued dependencies to improve the database design by decomposing it into fourth normal form.
3. A relation schema R is in 4NF with respect to a set D of functional and multivalued dependencies if for all multivalued dependencies in D+ of the form α →→ β, where α ⊆ R and β ⊆ R, at least one of the following holds:
   o α →→ β is a trivial multivalued dependency.
   o α is a superkey for schema R.
4. A database design is in 4NF if each member of the set of relation schemas is in 4NF.
5. The definition of 4NF differs from the BCNF definition only in the use of multivalued dependencies.
   o Every 4NF schema is also in BCNF.
   o To see why, note that if a schema is not in BCNF, there is a non-trivial functional dependency α → β holding on R, where α is not a superkey.
   o Since α → β implies α →→ β, by the replication rule, R cannot be in 4NF.
6. We have an algorithm similar to the BCNF algorithm for decomposing a schema into 4NF:

    result := {R};
    done := false;
    compute D+;
    while (not done) do
        if (there is a schema Ri in result that is not in 4NF)
        then begin
            let α →→ β be a nontrivial multivalued dependency that holds on Ri
                such that α → Ri is not in D+, and α ∩ β = ∅;
            result := (result − Ri) ∪ (Ri − β) ∪ (α, β);
        end
        else done := true;

7. If we apply this algorithm to BC-schema:
   o cname →→ loan# is a nontrivial multivalued dependency and cname is not a superkey for the schema.

   o We then replace BC-schema by two schemas:
         Cust-loan-schema = (cname, loan#)
         Customer-schema = (cname, street, ccity)
   o These two schemas are in 4NF.
8. We can show that our algorithm generates only lossless-join decompositions.
   o Let R be a relation schema and D a set of functional and multivalued dependencies on R.
   o Let R1 and R2 form a decomposition of R.
   o This decomposition is lossless-join if and only if at least one of the following multivalued dependencies is in D+:
         R1 ∩ R2 →→ R1
         R1 ∩ R2 →→ R2
   o We saw similar criteria for functional dependencies.
   o This says that for every lossless-join decomposition of R into two schemas R1 and R2, one of the two above dependencies must hold.
   o You can see, by inspecting the algorithm, that this must be the case for every decomposition.
9. Dependency preservation is not as simple to determine as with functional dependencies.
   o Let R be a relation schema.
   o Let R1, R2, ..., Rn be a decomposition of R.
   o Let D be the set of functional and multivalued dependencies holding on R.
   o The restriction of D to Ri is the set Di consisting of:
         All functional dependencies in D+ that include only attributes of Ri.
         All multivalued dependencies of the form α →→ β ∩ Ri, where α ⊆ Ri and α →→ β is in D+.
   o A decomposition of schema R is dependency preserving with respect to a set D of functional and multivalued dependencies if for every set of relations r1(R1), r2(R2), ..., rn(Rn) such that for all i, ri satisfies Di, there exists a relation r(R) that satisfies D and for which ri is the projection of r onto Ri for all i.
10. What does this formal statement say? It says that a decomposition is dependency preserving if for every set of relations on the decomposition schema satisfying only the restrictions on D there exists a relation r on the entire schema R that the decomposed schemas can be derived from, and that r also satisfies the functional and multivalued dependencies.
11. We'll do an example using our decomposition algorithm and check the result for dependency preservation.
   o Let R=(A,B,C,G,H,I).
   o Let D be
         A →→ B
         B →→ HI
         CG → H
   o R is not in 4NF, as we have A →→ B and A is not a superkey.
   o The algorithm causes us to decompose using this dependency into (A, B) and (A, C, G, H, I).

   o (A, B) is now in 4NF, but (A, C, G, H, I) is not.
   o Applying the multivalued dependency CG →→ H (how did we get this?), our algorithm then decomposes (A, C, G, H, I) into (C, G, H) and (A, C, G, I).
   o (C, G, H) is now in 4NF, but (A, C, G, I) is not. Why? As A →→ HI is in D+ (why?), the restriction of this dependency to (A, C, G, I) gives us A →→ I.
   o Applying this dependency in our algorithm finally decomposes (A, C, G, I) into (A, I) and (A, C, G).
   o The algorithm terminates, and our decomposition is (A, B), (C, G, H), (A, I) and (A, C, G).
12. Let's analyze the result.
   o This decomposition is not dependency preserving as it fails to preserve B →→ HI.
   o Figure 7.9 (textbook 6.14) shows four relations that may result from projecting a relation onto the four schemas of our decomposition.
   o The restriction of D to (A, B) is A →→ B and some trivial dependencies.
   o We can see that the relation on (A, B) satisfies A →→ B as there are no pairs with the same A value.
   o Also, the second relation satisfies all functional and multivalued dependencies since no two tuples have the same value on any attribute.
   o We can say the same for the third and fourth relations.
   o So our decomposed version satisfies all the dependencies in the restriction of D.
   o However, there is no relation r on (A,B,C,G,H,I) that satisfies D and decomposes into these four relations.
   o Figure 7.10 (textbook 6.15) shows the relation r obtained by joining the four relations.
   o Relation r does not satisfy B →→ HI.
   o Any relation s containing r and satisfying B →→ HI must include an additional tuple.
   o However, the projection of s onto one of the decomposed schemas then includes a tuple that is not in the corresponding relation of Figure 7.9.
   o Thus our decomposition fails to detect a violation of B →→ HI.

Figure 7.9: Projection of relation r onto a 4NF decomposition of R.

Figure 7.10: A relation r(R) that does not satisfy B →→ HI.

13. We have seen that if we are given a set of functional and multivalued dependencies, it is best to find a database design that meets the three criteria:
   o 4NF.
   o Dependency Preservation.
   o Lossless-join.
14. If we only have functional dependencies, the first criterion is just BCNF.
15. We cannot always meet all three criteria. When this occurs, we compromise on 4NF, and accept BCNF, or even 3NF if necessary, to ensure dependency preservation.
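As a closing illustration of the lossless-join criterion for the 4NF decomposition of BC-schema, the sketch below projects a relation onto Cust-loan-schema and Customer-schema and joins the pieces back together. The helper names and the sample rows are hypothetical and are not taken from the figures. The join reproduces the original relation exactly when the relation satisfies cname →→ loan#.

```python
def as_tuples(rows):
    """Represent each row (a dict) as a sorted tuple of (attribute, value) pairs."""
    return {tuple(sorted(row.items())) for row in rows}


def project(rows, attrs):
    """Project rows onto attrs, removing duplicates."""
    return {tuple((a, row[a]) for a in sorted(attrs)) for row in rows}


def natural_join(left, right):
    """Join two sets of tuples on the attributes they share."""
    joined = set()
    for l in left:
        for r in right:
            ld, rd = dict(l), dict(r)
            if all(ld[a] == rd[a] for a in ld.keys() & rd.keys()):
                joined.add(tuple(sorted({**ld, **rd}.items())))
    return joined


def row(loan, cname, street, ccity):
    return {"loan#": loan, "cname": cname, "street": street, "ccity": ccity}


CUST_LOAN = {"cname", "loan#"}
CUSTOMER = {"cname", "street", "ccity"}

# Hypothetical instance satisfying cname ->> loan#.
bc = [
    row(23, "Smith", "North", "Rye"),
    row(23, "Smith", "Main", "Manchester"),
    row(27, "Smith", "North", "Rye"),
    row(27, "Smith", "Main", "Manchester"),
]
# Hypothetical instance violating it.
bad_bc = [
    row(23, "Smith", "North", "Rye"),
    row(27, "Smith", "Main", "Manchester"),
]

for relation in (bc, bad_bc):
    rejoined = natural_join(project(relation, CUST_LOAN), project(relation, CUSTOMER))
    print(rejoined == as_tuples(relation), len(rejoined))
```

For the first instance the join gives back the same four tuples. For the second, the join produces four tuples from an original two, so the projections alone cannot tell whether the extra tuples were really present.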
