# Relational Data Model

One of the most important applications for computers is storing and managing information.

The manner in which information is organized can have a profound effect on how easy it is to access and manage. Perhaps the simplest but most versatile way to organize information is to store it in tables. The relational model is centered on this idea: the organization of data into collections of two-dimensional tables called "relations." We can also think of the relational model as a generalization of the set data model that we discussed in Chapter 7, extending binary relations to relations of arbitrary arity.

Originally, the relational data model was developed for databases, that is, information stored over a long period of time in a computer system, and for database management systems, the software that allows people to store, access, and modify this information. Databases still provide us with important motivation for understanding the relational data model. They are found today not only in their original, large-scale applications such as airline reservation systems or banking systems, but in desktop computers handling individual activities such as maintaining expense records, homework grades, and many other uses.

Other kinds of software besides database systems can make good use of tables of information as well, and the relational data model helps us design these tables and develop the data structures that we need to access them efficiently. For example, such tables are used by compilers to store information about the variables used in the program, keeping track of their data type and of the functions for which they are defined.

Relations:
Section 7.7 introduced the notion of a "relation" as a set of tuples. Each tuple of a relation is a list of components, and each relation has a fixed arity, which is the number of components each of its tuples has. The columns of the table are given names, called attributes. In Fig. 8.1, the attributes are Course, StudentId, and Grade.

The order in which the rows of a table are listed has no significance, and we can rearrange the rows in any way without changing the value of the table, just as we can rearrange the order of elements in a set without changing the value of the set. The order of the components in each row of a table is significant, since different columns are named differently, and each component must represent an item of the kind indicated by the header of its column. In the relational model, however, we may permute the order of the columns along with the names of their headers and keep the relation the same.

Each row in the table is called a tuple and represents a basic fact. The first row, (CS101, 12345, A), represents the fact that the student with ID number 12345 got an A in the course CS101.

A table has two aspects:

1. The set of column names, and
2. The rows containing the information.

The term "relation" refers to the latter, that is, the set of rows. Each row represents a tuple of the relation, and the order in which the rows appear in the table is immaterial. No two rows of the same table may have identical values in all columns.

Item (1), the set of column names (attributes), is called the scheme of the relation. The order in which the attributes appear in the scheme is immaterial, but we need to know the correspondence between the attributes and the columns of the table in order to write the tuples properly.

Databases:
A collection of relations is called a database. The first thing we need to do when designing a database for some application is to decide how the information to be stored should be arranged into tables. Design of a database, like all design problems, is a matter of business needs and judgment. In an example to follow, we shall expand our application of a registrar's database involving courses, and thereby expose some of the principles of good database design.

Some of the most powerful operations on a database involve the use of several relations to represent coordinated types of data. By setting up appropriate data structures, we can jump from one relation to another efficiently, and thus obtain information from the database that we could not uncover from a single relation.

Queries on a Database:
We saw in Chapter 7 some of the most important operations performed on relations and functions; they were called insert, delete, and lookup, although their appropriate meanings differed, depending on whether we were dealing with a dictionary, a function, or a binary relation. There is a great variety of operations one can perform on database relations, especially on combinations of two or more relations. Three basic operations on a single relation are the following.

1. insert(t, R). We add the tuple t to the relation R, if it is not already there. This operation is in the same spirit as insert for dictionaries or binary relations.
2. delete(X, R). Here, X is intended to be a specification of some tuples. It consists of components for each of the attributes of R, and each component can be either
   a) A value, or
   b) The symbol ∗, which means that any value is acceptable.
   The effect of this operation is to delete all tuples that match the specification X. For example, if we cancel CS101, we want to delete all tuples of the Course-Day-Hour relation that have Course = "CS101." We could express this condition by delete(("CS101", ∗, ∗), Course-Day-Hour). That operation would delete the first three tuples of the relation in Fig. 8.2(c), because their first components each are the same value as the first component of the specification, and their second and third components all match ∗, as any values do.
3. lookup(X, R). The result of this operation is the set of tuples in R that match the specification X; the latter is a symbolic tuple as described in the preceding item.
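The three operations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a relation is modelled as a Python set of tuples, the wildcard is a sentinel object named `ANY` (a name chosen here, standing in for the ∗ symbol), and the Course-Day-Hour rows are hypothetical, since Fig. 8.2(c) is not reproduced in this text.

```python
ANY = object()  # sentinel playing the role of the wildcard symbol

def matches(spec, tup):
    """True if each component of spec is ANY or equals the tuple's component."""
    return len(spec) == len(tup) and all(
        s is ANY or s == v for s, v in zip(spec, tup)
    )

def insert(t, R):
    """Add tuple t to relation R; a set silently ignores a tuple already present."""
    R.add(t)

def delete(X, R):
    """Remove every tuple of R that matches the specification X."""
    R.difference_update({t for t in R if matches(X, t)})

def lookup(X, R):
    """Return the set of tuples of R that match the specification X."""
    return {t for t in R if matches(X, t)}

# Hypothetical Course-Day-Hour rows, invented for this sketch.
course_day_hour = {
    ("CS101", "M", "9AM"),
    ("CS101", "W", "9AM"),
    ("CS101", "F", "9AM"),
    ("EE200", "Tu", "10AM"),
}

insert(("EE200", "Th", "10AM"), course_day_hour)
ee200_meetings = lookup(("EE200", ANY, ANY), course_day_hour)
delete(("CS101", ANY, ANY), course_day_hour)  # cancel CS101
```

Using a set gives the "if it is not already there" behaviour of insert for free; a real database system would use indexes rather than scanning every tuple on delete and lookup.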

History:
The relational model was invented by E.F. (Ted) Codd as a general model of data, and was subsequently maintained and developed by Chris Date and Hugh Darwen, among others. In The Third Manifesto (first published in 1995), Date and Darwen show how the relational model can accommodate certain desired object-oriented features.

Controversies:
Codd himself, some years after publication of his 1970 model, proposed a three-valued logic (True, False, Missing or NULL) version of it to deal with missing information, and in his The Relational Model for Database Management Version 2 (1990) he went a step further with a four-valued logic (True, False, Missing but Applicable, Missing but Inapplicable) version. But these have never been implemented, presumably because of the attendant complexity. SQL's NULL construct was intended to be part of a three-valued logic system, but fell short of that due to logical errors in the standard and in its implementations.

Introduction:
The Relation is the basic element in a relational data model.


A relation is subject to the following rules:
- A relation (file, table) is a two-dimensional table.
- An attribute (i.e. field or data item) is a column in the table.
- Each column in the table has a unique name within that table.
- Each column is homogeneous. Thus the entries in any column are all of the same type (e.g. age, name, employee-number, etc.).
- Each column has a domain, the set of possible values that can appear in that column.
- A tuple (i.e. record) is a row in the table.
- The order of the rows and columns is not important.
- Values of a row all relate to some thing or portion of a thing.
- Repeating groups (collections of logically related attributes that occur multiple times within one record occurrence) are not allowed.
- Duplicate rows are not allowed (candidate keys are designed to prevent this).
- Cells must be single-valued (but can be variable length). Single-valued means the following:
  - A cell cannot contain multiple values such as 'A1,B2,C3'.
  - A cell cannot contain combined values such as 'ABC-XYZ' where 'ABC' means one thing and 'XYZ' another.

Relationships:

One table (relation) may be linked with another in what is known as a relationship. Relationships may be built into the database structure to facilitate the operation of relational joins at runtime.
- A relationship is between two tables in what is known as a one-to-many, parent-child, or master-detail relationship, where an occurrence on the 'one' (parent, master) table may have any number of associated occurrences on the 'many' (child, detail) table. To achieve this, the child table must contain fields which link back to the primary key on the parent table. These fields on the child table are known as a foreign key, and the parent table is referred to as the foreign table (from the viewpoint of the child).
- It is possible for a record on the parent table to exist without corresponding records on the child table, but it should not be possible for an entry on the child table to exist without a corresponding entry on the parent table.
- A table may be the subject of any number of relationships, and it may be the parent in some and the child in others.
- Some database engines allow a parent table to be linked via a candidate key, but if this were changed it could result in the link to the child table being broken.
- Some database engines allow relationships to be managed by rules known as referential integrity or foreign key constraints. These will prevent entries on child tables from being created if the foreign key does not exist on the parent table, or will deal with entries on child tables when the entry on the parent table is updated or deleted.
- A relation may be expressed using the notation R(A1, A2, A3, ..., An) where:
  - R = the name of the relation.
  - (A1, A2, A3, ..., An) = the attributes within the relation.
  - A1 = the attribute(s) which form the primary key.
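As a rough illustration of a parent-child relationship managed by a foreign key constraint, the sketch below uses Python's standard `sqlite3` module with an in-memory database. The table and column names (`parent`, `child`, `parent_id`, and so on) are invented for the example, and `ON DELETE CASCADE` is just one of the ways an engine can "deal with" child entries when a parent is deleted. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves enforcement off by default

# Hypothetical parent (one) and child (many) tables.
conn.execute("CREATE TABLE parent (parent_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE child (
        child_id  INTEGER PRIMARY KEY,
        parent_id INTEGER NOT NULL
                  REFERENCES parent(parent_id) ON DELETE CASCADE,
        detail    TEXT
    )
""")

conn.execute("INSERT INTO parent VALUES (1, 'parent with children')")
conn.execute("INSERT INTO parent VALUES (2, 'parent with no children')")  # allowed
conn.execute("INSERT INTO child VALUES (10, 1, 'a')")
conn.execute("INSERT INTO child VALUES (11, 1, 'b')")

# A child row whose foreign key matches no parent row is rejected.
try:
    conn.execute("INSERT INTO child VALUES (12, 99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# Deleting the parent removes its child rows because of ON DELETE CASCADE.
conn.execute("DELETE FROM parent WHERE parent_id = 1")
remaining_children = conn.execute("SELECT COUNT(*) FROM child").fetchone()[0]
```

Parent 2 exists without any child rows, while the attempted child row for the non-existent parent 99 is refused, which mirrors the asymmetry described in the list above.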

Relational Data Model:
The relational model has provided the basis for:
- Research on the theory of data, relationships, and constraints
- Numerous database design methodologies
- The standard database access language, SQL
- Almost all modern commercial database management systems, which use this model

The relational data model describes the world as "a collection of inter-related relations (or tables)".

Fundamental concepts in Relational Data Model:

1. Domain: A domain D is a set of atomic values used to model data. By atomic, we mean that each value in the domain is indivisible as far as the relational model is concerned. For example:
   - The domain of day shift is the set of all possible days: {Mon, Tue, Wed, ...}
   - The domain of salary is the set of all floating-point numbers greater than 0 and less than 200,000 (say).
   - The domain of name is the set of character strings that represent names of persons.
2. Relation (relation state): A relation is a subset of the Cartesian product of a list of domains, characterised by a name. Given n domains denoted by D1, D2, ..., Dn, R is a relation defined on these domains if R ⊆ D1 × D2 × ... × Dn. A relation can be viewed as a "table". In that table, each row represents a tuple of data values and each column represents an attribute.
3. Attribute: A column of a relation, designated by a name. The name should be meaningful. Each attribute is associated with a domain.
4. Relation schema: A relation schema, denoted by R, is a list of attributes (A1, A2, ..., An). The degree of the relation is the number of attributes of its relation schema. The cardinality of the relation is the number of tuples in the relation.

Example of relation, relation schema and attribute: STUDENT is the relation name. Roll No., Name, Birthday, Semester, and Department are called attributes (or columns). Each row of the table is called a tuple (or row/record).

Characteristics of relations:

Ordering of tuples in a relation: A tuple is a set of values. A relation is a set of tuples. Since a relation is a set, there is no ordering on rows.

Ordering of values within a tuple: The order of attributes and their values within a relation is not important as long as the correspondence between attributes and values is maintained. Thus the following is a different representation of the above STUDENT relation.
STUDENT

| Roll No. | Name | Birthday | Semester | Department |
|---|---|---|---|---|
| 100001 | Samavia | 14 Feb | 5th | Fine Arts |
| 100002 | DX | 13 Feb | 5th | Law |
| 100003 | Xtremer | 12 Feb | 5th | C.S |

Values and NULL values in the tuple: Each value in a tuple is atomic, meaning that it cannot be divided into smaller components. Hence, composite and multivalued attributes are not allowed in a relation.
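The definitions above (a relation as a subset of a Cartesian product of domains, its degree and cardinality, and the irrelevance of row and attribute order) can be made concrete with a small Python sketch. The domains are deliberately tiny and hypothetical, and the helper `make_tuple` is a name introduced here for illustration only.

```python
from itertools import product

# Hypothetical, deliberately tiny domains so the Cartesian product stays small.
roll_numbers = {100001, 100002, 100003}
departments = {"Fine Arts", "Law", "C.S"}

# D1 x D2: every combination of a roll number with a department.
cartesian = set(product(roll_numbers, departments))

# A relation over these domains is any subset of that product.
enrolled = {(100001, "Fine Arts"), (100002, "Law"), (100003, "C.S")}
is_relation = enrolled <= cartesian

def make_tuple(**values):
    """Represent a tuple as an immutable set of (attribute, value) pairs."""
    return frozenset(values.items())

schema = ("roll_no", "department")
r1 = {
    make_tuple(roll_no=100001, department="Fine Arts"),
    make_tuple(roll_no=100002, department="Law"),
    make_tuple(roll_no=100003, department="C.S"),
}
# The same facts, with rows and attributes written in a different order.
r2 = {
    make_tuple(department="C.S", roll_no=100003),
    make_tuple(department="Fine Arts", roll_no=100001),
    make_tuple(department="Law", roll_no=100002),
}

same_relation = r1 == r2   # order of rows and of attributes does not matter
degree = len(schema)       # number of attributes in the schema
cardinality = len(r1)      # number of tuples in the relation
```

Keying each value by its attribute name is what keeps "the correspondence between attributes and values" intact when the column order changes.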

Constraints in Relational Data:

Constraints are a very important feature of the relational model. In fact, the relational model supports a well-defined theory of constraints on attributes and tables. Constraints are useful because they allow the designer to specify the semantics of the data in the database, and they are the rules that the DBMS enforces to check that new data satisfies those semantics.

Integrity constraint:
Relations allow us to represent data and associations. A domain restricts the values of the attributes in a relation and is itself a constraint of the relational model. However, there are real-world semantics of data that cannot be specified using domains alone. We need a more specific way to state which data values are not allowed and what format is suitable for an attribute. For example, a student number must be unique, and a student's age must lie in a given range, e.g. 20-30 years. Such information is provided in logical statements called integrity constraints. There are several kinds of integrity constraints:

1. Key constraint: A relation is a set of tuples. By definition, all elements in a set are distinct; hence all tuples in a relation must be distinct. In the relational model, tuples have no identity of their own, such as an object identifier. Tuple identity is entirely value based. Therefore, we need a key constraint, which is a way of uniquely identifying a tuple. Given a relation schema R whose list of attributes is U, let K be a subset of U. If, in every relation r of R, any two distinct tuples t1 and t2 satisfy t1[K] ≠ t2[K], then K is called a superkey of the relation schema R. A superkey that has no redundant attributes is called a candidate key. Since a relation schema may have more than one candidate key, one candidate key is chosen whose values are used to uniquely identify tuples in the relation. That key is the primary key. The primary key is usually the simplest candidate key (i.e. a key with a single attribute or a small number of attributes).
2. Entity constraint: No attribute in the primary key can be NULL. This is because NULL values in the primary key mean that we cannot identify some tuples. For example, in the EMPLOYEE relation shown above, CellPhone cannot be a key since we cannot use this attribute to identify employees 20012322 and 19991323.
3. Referential constraint: A constraint that is specified between two relations and maintains the correspondence between tuples in these relations. It means that a reference from a tuple in one relation to another relation must be valid. Example of a referential integrity constraint: in the Bank Database (from the Data Modelling lecture), the ACCOUNT relation needs to record the BRANCH where each account is held, so in the implementation each tuple of the ACCOUNT relation has an attribute such as branchName to identify the associated BRANCH. The referential integrity constraint must state that the branchName attribute in the ACCOUNT relation refers to a valid (i.e. existing) branch. The referential constraint in the relational model relates to the notion of a foreign key. A set of attributes FK in a relation schema R1 is a foreign key if:
   - The attributes in FK correspond to the attributes in the primary key of another relation schema R2.
   - The value for FK in each tuple of R1 either occurs as the value of the primary key of a tuple in R2 or is entirely NULL.

   In a database of many relations, there are usually many foreign keys. They provide the "glue" that links individual relations into a cohesive database structure.
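The superkey and candidate key definitions in item 1 can be tested against a concrete relation instance. The sketch below is a brute-force illustration, not how a DBMS works; the function names `is_superkey` and `candidate_keys` are invented here, and the rows are those of the ACCOUNT relation from the Bank Database example. One caveat is important: a key is a property of the schema, so a single instance can refute a proposed key but can never prove one.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """True if no two rows of this instance agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True

def candidate_keys(rows, schema):
    """Minimal superkeys of this instance, trying small attribute sets first."""
    found = []
    for size in range(1, len(schema) + 1):
        for attrs in combinations(schema, size):
            if any(set(k) <= set(attrs) for k in found):
                continue  # contains a smaller key, so it has redundant attributes
            if is_superkey(rows, attrs):
                found.append(attrs)
    return found

schema = ("branchName", "balance", "accountNumber")
account = [
    {"branchName": "HaThanh", "balance": 20000, "accountNumber": "C-12894349"},
    {"branchName": "DongDo", "balance": 20000, "accountNumber": "C-12894350"},
    {"branchName": "DongDo", "balance": 3500, "accountNumber": "S-141510751"},
    {"branchName": "HaThanh", "balance": 50000, "accountNumber": "S-520522620"},
]

keys = candidate_keys(account, schema)
```

On these four rows the function reports both `("accountNumber",)` and `("branchName", "balance")`. The second is an accident of this small instance (two accounts at one branch could easily have the same balance), which is why the choice of keys has to come from the designer's knowledge of the data rather than from inspecting rows.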

Semantic constraints: This is a special kind of constraint that may have to be enforced in a relational database. Such constraints describe the semantics of the data in the database, and are sometimes called the rules on data. For example, in the COMPANY database, we have the rule "An employee cannot take part in more than 5 projects" or "The salary of an employee cannot exceed the salary of the employee's manager".

Functional dependency constraints: This kind of constraint establishes a functional relationship between two sets of attributes.

Relational Database:
Relations, keys, foreign keys and integrity constraints provide a complete toolkit for building relational databases. A relational database consists of many relations, and tuples in those relations are related in various ways. Here, we define the relational database schema and the relational database instance.

A relational database schema is:
- A set of relation schemas S = {R1, R2, ..., Rn}, and
- A set of integrity constraints.

A relational database instance is:
- A set of relations (relation states) {r1(R1), r2(R2), ..., rn(Rn)} where all of the integrity constraints are satisfied.

Constraint Checking:
A relational database instance changes over time. At a given moment we can have an instance that satisfies all the constraints, but whenever an update operation is performed, we must recheck the constraints. There are three basic update operations on relations: insert a new record, delete an existing record, and modify an existing record.
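The functional dependency constraint mentioned above is only stated in one sentence, so a small sketch may help. Under the usual reading, a dependency X → Y holds in a relation instance when any two tuples that agree on the attributes X also agree on the attributes Y. The function name `fd_holds` and the employee rows below are hypothetical, introduced only for this illustration.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on the lhs attributes also agree on the rhs attributes."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value; any later
        # disagreement violates the dependency.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical rows: every department sits on exactly one floor,
# but one floor hosts two departments.
employees = [
    {"emp": "e1", "dept": "Sales", "floor": 2},
    {"emp": "e2", "dept": "Sales", "floor": 2},
    {"emp": "e3", "dept": "Audit", "floor": 2},
]

dept_determines_floor = fd_holds(employees, ["dept"], ["floor"])
floor_determines_dept = fd_holds(employees, ["floor"], ["dept"])
```

Here `dept → floor` holds while `floor → dept` does not. As with keys, an instance can only show that a dependency is violated; whether it should hold is a statement about the data's meaning.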

ACCOUNT

| branchName | balance | accountNumber |
|---|---|---|
| HaThanh | 20000 | C-12894349 |
| DongDo | 20000 | C-12894350 |
| DongDo | 3500 | S-141510751 |
| HaThanh | 50000 | S-520522620 |

BRANCH

| branchName | Address | assets |
|---|---|---|
| HaThanh | Hai Ba Trung | 900000000 |
| DongDo | Dong Da | 400000000 |
| ThangLong | Hoan Kiem | 500000000 |

ACCOUNT-HOLDER

| customerNumber | accountNumber |
|---|---|
| 111111 | C-12894349 |
| 121314 | C-12894350 |
| 121314 | S-141510751 |
| 515016 | S-520522620 |
| 111111 | C-12894350 |

CUSTOMER

| customerNumber | Name | address | homeBranch |
|---|---|---|---|
| 111111 | Anh | Hai Ba Trung | HaThanh |
| 121314 | Van Anh | Hai Ba Trung | DongDo |
| 515016 | Son | Hoan Kiem | HaThanh |

Domain constraint checking:
- For an insert operation, we need to check each attribute value for type and other domain restrictions.
- For a delete operation, there is no need to check any domain constraints.
- For an update operation, we also need to check each attribute value for type and other domain restrictions.

The following changes satisfy domain constraints:
- Insert Account(HaThanh, 50000, S-20071280)
- Insert Account(HaThan, 20000, C-20072242) (this looks OK, but the data value is actually not correct)
- Update Account(HaThanh, 50000, S-20071280) to Account(HaThanh, 60000, S-20071280)

Changes that do not satisfy domain constraints:
- Insert Account(HaThanh, 5000USD, S-20071280)
- Insert Account(DongDo, -20, C-12894349)
- Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-34252525)

Key constraint checking:
- For an insert operation, we need to check that the key value does not occur in any existing tuple in the relation.
- For a delete operation, there is no need to check any key constraints.
- For an update operation, if the key value is modified, then the same check as for insertion is needed.

Changes that satisfy key constraints:
- Insert Account(DongDo, 20000, C-12894350) (there is no account with that account number in the current relation)
- Insert Account-Holder(12334, C-12894350) (OK, but there is no customer with number 12334)

- Update Account(HaThanh, 50000, S-34252525) to Account(60000, HaThanh, S-34252525) (the key is not modified)

Changes that do not satisfy key constraints:
- Insert Account(DongDo, 50000, C-12894350) (the key is already in the relation)
- Update Account(DongDo, 20000, C-12894350) to Account(DongDo, 20000, C-12894349) (the account C-12894349 is already in the relation)

Referential integrity constraint checking:
- For an insert operation, we need to check that the foreign keys occur as primary keys in the referenced relation.
- For a delete operation, we need to check all relations that have foreign keys referring to this relation.
- An update needs to be treated as a delete followed by an insert for referential constraint checking.

Changes that satisfy referential constraints:
- Insert Account(ThangLong, 5000, C-12891230)
- Insert Account-Holder(111111, C-12891230)
- Update Customer(515016, Son, Hoan Kiem, HaThanh) to Customer(515016, Son, Hoan Kiem, ThangLong)
- Delete Account-Holder(111111, C-12894350)

Changes that do not satisfy referential constraints:
- Insert Account-Holder(12334, C-12894350) (no such customer)
- Insert Customer(222222, Nha, DongDa, An Binh) (no such branch)
- Delete Customer with customerNumber = '111111' (this is not acceptable, since there are tuples in the Account-Holder relation that refer to this customer)

Deletion can violate a referential constraint when the tuple being deleted is referenced by foreign keys from other tuples in a different relation. Several approaches are considered for handling this kind of violation:
1. Simply disallow the deletion.
2. Require the user to find the referring tuples and then either delete them manually or change their foreign key to an acceptable value or to NULL (the latter is not possible if the foreign key also forms part of the primary key, as in the Account-Holder relation).
3. Attempt to remove all referring tuples automatically (cascade).

When the referential constraint is specified in the database during the creation phase, the DBMS allows the user to specify which of the above approaches applies when a violation occurs.

Relational operations:
Users (or programs) request data from a relational database by sending it a query that is written in a special language, usually a dialect of SQL. Although SQL was originally intended for end-users, it is much more common for SQL queries to be embedded into software that provides an easier user interface. Many web sites, such as Wikipedia, perform SQL queries when generating pages.

In response to a query, the database returns a result set, which is just a list of rows containing the answers. The simplest query is just to return all the rows from a table, but more often, the rows are filtered in some way to return just the answer wanted. Often, data from multiple tables are combined into one by doing a join. Conceptually, this is done by taking all possible combinations of rows (the Cartesian product), and then filtering out everything except the answer. In practice, relational database management systems rewrite ("optimize") queries to perform faster, using a variety of techniques.

There are a number of relational operations in addition to join. These include project (the process of eliminating some of the columns), restrict (the process of eliminating some of the rows), union (a way of combining two tables with similar structures), difference (which lists the rows in one table that are not found in the other), intersect (which lists the rows found in both tables), and product (mentioned above, which combines each row of one table with each row of the other). Depending on which other sources you consult, there are a number of other operators, many of which can be defined in terms of those listed above. These include semi-join, outer operators such as outer join and outer union, and various forms of division. Then there are operators to rename columns, and summarizing or aggregating operators, and, if you permit relation values as attributes (RVA, relation-valued attribute), operators such as group and ungroup. The SELECT statement in SQL serves to handle all of these except for the group and ungroup operators.

The flexibility of relational databases allows programmers to write queries that were not anticipated by the database designers. As a result, relational databases can be used by multiple applications in ways the original designers did not foresee, which is especially important for databases that might be used for a long time (perhaps several decades). This has made the idea and implementation of relational databases very popular with businesses.

SQL and the relational model:
SQL, initially pushed as the standard language for relational databases, deviates from the relational model in several places. The current ISO SQL standard doesn't mention the relational model or use relational terms or concepts. However, it is possible to create a database conforming to the relational model using SQL if one does not use certain SQL features.

The following deviations from the relational model have been noted in SQL. Note that few database servers implement the entire SQL standard, and in particular many do not allow some of these deviations. Whereas NULL is ubiquitous, for example, allowing duplicate column names within a table or anonymous columns is uncommon.
- Duplicate rows: The same row can appear more than once in an SQL table. The same tuple cannot appear more than once in a relation.
- Anonymous columns: A column in an SQL table can be unnamed and thus unable to be referenced in expressions. The relational model requires every attribute to be named and referenceable.
- Duplicate column names: Two or more columns of the same SQL table can have the same name and therefore cannot be referenced, on account of the obvious ambiguity. The relational model requires every attribute to be referenceable.
- Column order significance: The order of columns in an SQL table is defined and significant, one consequence being that SQL's implementations of Cartesian product and union are both noncommutative. The relational model requires there to be no significance to any ordering of the attributes of a relation.
- Views without CHECK OPTION: Updates to a view defined without CHECK OPTION can be accepted but the resulting update to the database does not necessarily have the expressed effect on its target. For example, an invocation of INSERT can be accepted but the inserted rows might not all appear in the view, or an invocation of UPDATE can result in rows disappearing from the view. The relational model requires updates to a view to have the same effect as if the view were a base relvar.
- Columnless tables unrecognized: SQL requires every table to have at least one column, but there are two relations of degree zero (of cardinality one and zero) and they are needed to represent extensions of predicates that contain no free variables.
- NULL: This special mark can appear instead of a value wherever a value can appear in SQL, in particular in place of a column value in some row. The deviation from the relational model arises from the fact that the implementation of this ad hoc concept in SQL involves the use of three-valued logic, under which the comparison of NULL with itself does not yield true but instead yields the third truth value, unknown; similarly, the comparison of NULL with something other than itself does not yield false but instead yields unknown. It is because of this behaviour in comparisons that NULL is described as a mark rather than a value. The relational model depends on the law of excluded middle, under which anything that is not true is false and anything that is not false is true; it also requires every tuple in a relation body to have a value for every attribute of that relation. This particular deviation is disputed by some, if only because E.F. Codd himself eventually advocated the use of special marks and a 4-valued logic. But this was based on his observation that there are two distinct reasons why one might want to use a special mark in place of a value, which led opponents of the use of such logics to discover more distinct reasons; at least as many as 19 have been noted, which would require a 21-valued logic. SQL itself uses NULL for several purposes other than to represent "value unknown". For example, the sum of the empty set is NULL, meaning zero; the average of the empty set is NULL, meaning undefined; and NULL appearing in the result of a LEFT JOIN can mean "no value because there is no matching row in the right-hand operand".
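Two of these deviations, duplicate rows and NULL, are easy to observe directly. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the table names `t` and `empty_t` are invented for the example, and the behaviour shown is SQLite's, which other SQL engines may not match in every detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")  # no key declared

# Duplicate rows: the same row can be stored twice.
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("INSERT INTO t VALUES (1)")
duplicate_count = conn.execute("SELECT COUNT(*) FROM t WHERE x = 1").fetchone()[0]

# Three-valued logic: comparisons with NULL yield NULL (unknown), not true or false.
null_eq_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
null_eq_one = conn.execute("SELECT NULL = 1").fetchone()[0]

# A row whose x is NULL satisfies neither x = 1 nor x <> 1.
conn.execute("INSERT INTO t VALUES (NULL)")
total = conn.execute("SELECT COUNT(*) FROM t").fetchone()[0]
equal_one = conn.execute("SELECT COUNT(*) FROM t WHERE x = 1").fetchone()[0]
not_one = conn.execute("SELECT COUNT(*) FROM t WHERE x <> 1").fetchone()[0]

# Aggregates over an empty set return NULL.
conn.execute("CREATE TABLE empty_t (x INTEGER)")
sum_empty, avg_empty = conn.execute("SELECT SUM(x), AVG(x) FROM empty_t").fetchone()
```

After the run, `equal_one + not_one` is smaller than `total`: the row holding NULL falls through both predicates, which is exactly the failure of the law of excluded middle described above. Python's `sqlite3` reports SQL NULL as `None`.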