chp2 Rel Mod

The Relational Model Of Data
2.1 An Overview Of Data Models

2.1.1 What Is A Data Model?
2.1.2 Important Data Models
2.1.3 Relational Model In Brief
2.1.4 Semistructured Model In Brief
2.1.5 Other Models In Brief
2.1.6 Comparison Of Modeling Approaches
2.2 Basics Of Relational Model

2.2.1 Attributes
2.2.2 Schemas
2.2.3 Tuples
2.2.4 Domains
2.2.5 Equivalent Representations Of A Relation
2.2.6 Relation Instances
2.2.7 Keys Of Relations
2.2.8 An Example Database Schema
2.2.9 Exercises For Section 2.2
2.3 Defining A Relation Schema In SQL

2.3.1 Relations In SQL
2.3.2 Data Types
2.3.3 Simple Table Declarations
2.3.4 Modifying Relation Schemas
2.3.5 Default Values
2.3.6 Declaring Keys
2.3.7 Exercises For Section 2.3
2.4 An Algebraic Query Language

2.4.1 Why Do We Need A Special Query Language?
2.4.2 What Is An Algebra?
2.4.3 Overview Of Relational Algebra
2.4.4 Set Operations On Relations
2.4.5 Projection
2.4.6 Selection
2.4.7 Cartesian Product
2.4.8 Natural Joins
2.4.9 Theta Joins
2.4.10 Combining Operations To Form Queries
2.4.11 Naming And Renaming
2.4.12 Relationships Among Operators
2.4.13 A Linear Notation For Algebraic Expressions
2.5 Constraints On Relations

2.5.1 Relational Algebra As A Constraint Language
2.5.2 Referential Integrity Constraints
2.5.3 Key Constraints
2.5.4 Additional Constraint Examples
2.6 Summary Of Chapter 2

Notes:
The most important model of data is the two-dimensional table, or 'relation'.

We begin with an overview of data models.
2.1 An Overview Of Data Models
The notion of a data model is one of the most fundamental in the study of database
systems.
This section introduces some basic terminology and the most important data models.
2.1.1 What Is A Data Model?
A data model is a notation for describing data or information, generally consisting
of three parts:
1. Structure of the data
The data model is a *conceptual* model of the data, not a description of the
underlying data structures.
2. Operations on the data: a limited set of operations that can be performed, i.e.
a limited set of queries (operations that retrieve information) and
a limited set of modifications (operations that change the database).
These limits are not weaknesses, but strengths. They make it possible for
programmers to describe data operations at a very high level, yet have the database
system implement the operations efficiently.
3. Constraints on the data
Constraints describe limitations on the data,
e.g: (simple) "a day of the week is an integer between 1 and 7"
e.g: "a movie has at least one title"
Very complex constraints can be put on the data; see Chapter 7.
2.1.2 Important Data Models

Two of the most prominent data models for databases are:
1. the relational data model, including object-relational extensions
2. the semistructured data model, including XML and related standards.
2.1.3 Relational Model In Brief
The relational model is based on tables (the fundamental, conceptual 'data
structure').
e.g:
title, year, length, genre
Gone With The Wind, 1939, 231, drama
Star Wars, 1977, 124, sciFi
Wayne's World, 1992, 95, comedy
Each row of this relation/table *can* be implemented as a C struct with fields
corresponding to the column names, but in general relations are not implemented
as in-memory structures; their implementation must take into account the access
patterns (etc.) of large disks.
The *operations* normally associated with relational model form the 'relational
algebra'. (see section 2.4)
The operations are table oriented, e.g: we can ask for all rows of a table that
have a specific value in a specific column.
A brief example of constraints: we could require that the 'genre' values be drawn
from a fixed set, or that no two movies have the same title (although this is wrong
for the real-world domain of movie making, where remakes reuse titles).
2.1.4 Semistructured Model In Brief

Semistructured data resembles trees or graphs, rather than tables or arrays.
XML represents this with hierarchical tags.
These tags represent what roles are played by enclosed data, as column names do in
the relational model.
Example:
<Movies>
    <Movie title="Gone With the Wind">
        <Year>1939</Year>
        <Length>231</Length>
        <Genre>drama</Genre>
    </Movie>
    <Movie title="Star Wars">
        <Year>1977</Year>
        <Length>124</Length>
        <Genre>sciFi</Genre>
    </Movie>
    <Movie title="Wayne's World">
        <Year>1992</Year>
        <Length>95</Length>
        <Genre>comedy</Genre>
    </Movie>
</Movies>
Constraints often involve the data type of the values associated with a tag.
E.g: the values associated with the <Length> tag are integers (or strings).
E.g: each <Movie> element *must* have exactly one <Length> element within it.
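A minimal sketch of checking the two <Length> constraints above, assuming Python's standard xml.etree.ElementTree module; the function name check_movies is hypothetical, and the document is trimmed to two movies.

```python
import xml.etree.ElementTree as ET

# The Movies document from above, trimmed to two movies.
DOC = """
<Movies>
  <Movie title="Gone With the Wind">
    <Year>1939</Year>
    <Length>231</Length>
    <Genre>drama</Genre>
  </Movie>
  <Movie title="Star Wars">
    <Year>1977</Year>
    <Length>124</Length>
    <Genre>sciFi</Genre>
  </Movie>
</Movies>
"""

def check_movies(xml_text):
    """Return a list of constraint violations; an empty list means the document is valid."""
    problems = []
    root = ET.fromstring(xml_text.strip())
    for movie in root.findall("Movie"):
        title = movie.get("title")
        lengths = movie.findall("Length")
        # Constraint: each <Movie> has exactly one <Length> child.
        if len(lengths) != 1:
            problems.append(f"{title}: expected 1 <Length>, found {len(lengths)}")
            continue
        # Constraint: the <Length> value is an integer.
        text = (lengths[0].text or "").strip()
        if not text.isdigit():
            problems.append(f"{title}: <Length> is not an integer: {text!r}")
    return problems

print(check_movies(DOC))  # []
bad = '<Movies><Movie title="X"><Length>long</Length><Length>9</Length></Movie></Movies>'
print(check_movies(bad))  # ['X: expected 1 <Length>, found 2']
```

Unlike a relational schema, nothing in the XML itself enforces these rules; here they are checked after the fact.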
2.1.5 Other Models In Brief
There are many other data models that have been associated with databases.
E.g: a modern trend is to add object-oriented features to the relational model.
OO features have two effects on relations:
1. Values can have structure, instead of being elementary types such as integers
or strings.
2. Relations can have associated *methods*.
See section 10.3 for these.
There are also purely object-oriented database models; see section 4.9.
Several other models have fallen into disuse,
e.g: the hierarchical model, a tree-oriented model like the semistructured model.
Another is the network model, a graph-oriented, physical-level model. It is like
the hierarchical model, but unlike it, it does not favor trees.
2.1.6 Comparison Of Modeling Approaches

It seems that the semistructured model is more flexible than the relational model.
Nevertheless, the relational model is still preferred in DBMSs.
A brief argument:
because databases are large (they deal with large amounts of data), access to the
data and modifications of the data must be efficient. Ease of use is also very
important. Both can be achieved by a (relational) model that
a. provides a *simple*, *limited* approach to structuring data, yet is
*reasonably* versatile, so anything can be modeled.
b. provides a limited, yet (ultimately) versatile, collection of operations on
the data.
Together (KEY) these limitations turn into features: they allow us to implement
languages *like* SQL that let programmers express themselves at a very high
level (of abstraction). And since SQL has a limited set of operations, we can
optimize them to run fast.
2.2 Basics Of Relational Model
The relational model gives us a single way to represent data: as a two-dimensional
table called a relation, e.g.
title, year, length, genre
Gone With The Wind, 1939, 231, drama
Star Wars, 1977, 124, sciFi
Wayne's World, 1992, 95, comedy
2.2.1 Attributes
The columns of a relation are named by attributes.
In the above example, the attributes are title, year, length, genre
Usually an attribute describes the meaning of entries in the columns.
2.2.2 Schemas
The name of a relation and the set of attributes for the relation are called the
'schema' of that relation.
We represent the schema as the relation name followed by a parenthesized list of
its attributes.
e.g: Movies(title, year, length, genre)
The attributes of a relation form a set, not a list, but to write down a schema
(or a tuple) we must pick a 'standard order' for them.
For the relation above, we take the given order as the standard one.
In the relational model, a database consists of one or more relations.
The set of schemas for all the relations of a database is called the 'database
schema'.
2.2.3 Tuples
The rows of a relation, other than the header row containing the attribute names,
are called 'tuples'.
A tuple has one 'component' for each attribute of the schema.
e.g: The first of the three tuples in the relation above has the components "Gone
With The Wind", 1939, 231, drama for the attributes title, year, length, genre,
respectively.
Convention: relation names start with a capital letter;
attribute names start with a lower-case letter.
However, we also use R(A, B, C) for a generic relation with three named attributes.
2.2.4 Domains
The relational model requires that each component of each tuple be atomic, i.e. of
some elementary type like integer or string.
Components can*not* be compound values, like structures, sets, lists, arrays, or
any other type that can be broken down into smaller parts.
Each attribute has a 'domain': a particular elementary type, from which every
component for that attribute must be drawn.
We can include the domain of each attribute in the representation of the schema,
e.g:
Movies(title : string, year : integer, length : integer, genre : string)
2.2.5 Equivalent Representations Of A Relation

Relations are *sets* of tuples, and relations have a *set* of attributes.
So both tuples and attributes can be reordered,
i.e. rows and columns can be reordered without the relation 'being different'.
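To make "reordering does not change the relation" concrete, here is a small sketch that normalizes a table into an order-free form. The helper as_relation is hypothetical, and modeling each tuple as a set of (attribute, value) pairs is an assumption chosen only for this illustration.

```python
def as_relation(attrs, rows):
    """Order-free form of a table: a set of tuples, where each tuple is itself
    a set of (attribute, value) pairs, so neither row nor column order matters."""
    return frozenset(frozenset(zip(attrs, row)) for row in rows)

t1 = as_relation(("title", "year", "length", "genre"),
                 [("Star Wars", 1977, 124, "sciFi"),
                  ("Wayne's World", 1992, 95, "comedy")])
# The same relation with the columns permuted and the rows in another order.
t2 = as_relation(("year", "genre", "title", "length"),
                 [(1992, "comedy", "Wayne's World", 95),
                  (1977, "sciFi", "Star Wars", 124)])
print(t1 == t2)  # True
```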
2.2.6 Relation Instances
Relations can change over time in two ways:
1. attributes can be added or deleted (the schema changes).
2. *tuples* can be added, deleted, or modified.
A relation with a specific set of attributes, together with the set of tuples it
contains at a given moment, is called an 'instance' of that relation. (Thought
shortcut: a frozen relation is an 'instance'.)
A conventional database maintains only *one* instance of each relation: the current
one.
Databases that also keep historical versions of relations are called temporal
databases.
2.2.7 Keys Of Relations
The Relational Model allows many types of constraints. These are discussed in
Chapter 7.
However, one kind of constraint is crucial: key constraints.
A set of attributes forms a key for a relation if we do not allow two tuples in a
relation instance to have the same values for all the attributes of the key.
In a relation schema, the attributes forming a key are underlined (here marked with
asterisks), e.g:
Movies(*title*, *year*, length, genre)
so there cannot be two tuples in this relation that have the same values for both
title *and* year.
Important: many real-life relations use *artificial keys*,
e.g: employee-id, social_security_number, etc.
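A small sketch of what the key constraint means for one instance: no two distinct tuples may agree on all key attributes. The helper is_key is hypothetical, and the rows are illustrative values (two films sharing the title "King Kong" but not the year).

```python
def is_key(attrs, rows, key):
    """True if no two distinct tuples of the instance agree on every attribute of `key`."""
    idx = [attrs.index(a) for a in key]
    seen = set()
    for row in set(rows):               # a relation is a set: exact duplicates collapse
        values = tuple(row[i] for i in idx)
        if values in seen:
            return False
        seen.add(values)
    return True

ATTRS = ("title", "year", "length", "genre")
# Illustrative instance: two films share a title but not a year.
movies = [
    ("King Kong", 1933, 100, "adventure"),
    ("King Kong", 1976, 134, "adventure"),
    ("Star Wars", 1977, 124, "sciFi"),
]
print(is_key(ATTRS, movies, ("title", "year")))  # True
print(is_key(ATTRS, movies, ("title",)))         # False
```

Note that this only checks one instance; declaring a key is a statement about *all* instances the database may ever hold.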
2.2.8 An Example Database Schema
The schema itself is
Movies(
    *title* : string,
    *year* : integer,
    length : integer,
    genre : string,
    studioName : string,
    producerC# : integer
)
MovieStar(
    *name* : string,
    address : string,
    gender : char,
    birthdate : date
)
StarsIn(
    *movieTitle* : string,
    *movieYear* : integer,
    *starName* : string
)
MovieExec(
    name : string,
    address : string,
    *cert#* : integer,
    netWorth : integer
)
Studio(
    *name* : string,
    address : string,
    presC# : integer
)
Comments:
The relation Movies has a key consisting of the attributes title and year.
studioName is the name of the studio that owns the movie.
producerC# is an integer that represents the producer (see the MovieExec
comments).
The relation MovieStar has the name attribute as its key; we use here the
convenient fiction that movie star names are unique. A more conventional approach,
like using a social security number or assigning each individual a unique number,
would also work.
New data types: char for gender ('M' or 'F') and date.
The relation StarsIn connects stars to the movies they act in, and movies to their
stars.
Note that the key consists of all three attributes. Also, two of the attributes
(movieTitle, movieYear) together form the key of the Movies relation, though we use
different names for them here. (An explicit statement is needed to declare this as a
foreign key, etc.)
MovieExec represents movie executives. A unique certificate number, the attribute
cert#, is assigned to each movie exec in the database; it is the key.
Studio represents movie studios. The key is the studio name (we assume no two
movie studios have the same name).
We assume each movie studio has a president who is a movie executive, so the
MovieExec relation's key appears in this relation (as presC#) to identify the
individual who is the studio's president.
2.3 Defining A Relation Schema In SQL
SQL is the principal language used to describe and manipulate relational databases.
Current standard = SQL 99.
Two aspects to SQL
1. The Data Definition sublanguage - for describing database schemas
2. The Data Manipulation sublanguage - for querying and modifying databases
Here we give an overview of the DDL; more in Chapters 6 and 7.
2.3.1 Relations In Sql
SQL makes a distinction between three kinds of relations:
1. tables: 'ordinary' relations that exist in the database and can be modified and
queried.
2. views: relations defined by a computation. They are not stored, but are
constructed, in whole or in part, as needed (section 8.1).
3. temporary tables: constructed by the SQL language processor while it executes
queries and data modifications. These relations are thrown away afterwards and are
never stored.
This section covers how to declare tables (the first kind, not views or temporary
tables).
The CREATE TABLE statement declares the schema for a stored relation. It gives the
name of the table, its attributes and their data types, and allows us to declare a
key, or even several keys, as well as other constraints and indexes.
2.3.2 Data Types

Primitive data types supported by SQL:
1. Character strings of fixed or varying length.
CHAR(n) denotes a fixed-length string of up to n characters; shorter values are
typically padded with blanks to exactly n characters.
VARCHAR(n) also denotes a string of up to n characters. The difference between
the two is implementation-dependent, related to padding and string end markers
(or a stored length).
2. BIT(n) and BIT VARYING(n) denote bit strings of fixed and varying length
(up to n bits), respectively.
3. BOOLEAN denotes an attribute whose value is logical: TRUE, FALSE, *and
UNKNOWN*.
4. INT or INTEGER (synonyms) for integers.
5. FLOAT or REAL (synonyms) for floating-point numbers. Also DECIMAL(n, d) for
n decimal digits with the decimal point d positions from the right, e.g.
DECIMAL(6, 2) can hold 0123.45.
6. DATE and TIME, with constants written DATE '1948-05-14' and
TIME '15:03:02.5'.
2.3.3 Simple Table Declarations
The simplest way to declare a relation is to use the keywords CREATE TABLE followed
by the name of a relation, and a parenthesized, comma separated list of the
attribute names and their types.
e.g:
CREATE TABLE Movies (
    title CHAR(100),
    year INT,
    length INT,
    genre CHAR(10),
    studioName CHAR(30),
    producerC# INT
);
(note: the statement ends with a semicolon)
e.g:
CREATE TABLE MovieStar (
    name CHAR(30),
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE
);
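To try a declaration like the Movies table above, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory SQLite database (an assumption: SQLite is not the DBMS the book targets, and its typing is looser than standard SQL). The column producerC# is written as the delimited identifier "producerC#", since '#' is not a legal character in an unquoted identifier in standard SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Same schema as the Movies declaration above; "producerC#" is double-quoted.
conn.execute("""
    CREATE TABLE Movies (
        title CHAR(100),
        year INT,
        length INT,
        genre CHAR(10),
        studioName CHAR(30),
        "producerC#" INT
    )
""")
conn.execute("INSERT INTO Movies VALUES ('Star Wars', 1977, 124, 'sciFi', 'Fox', 12345)")
rows = conn.execute('SELECT title, "producerC#" FROM Movies').fetchall()
print(rows)  # [('Star Wars', 12345)]
```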
2.3.4 Modifying Relation Schemas
We know (previous section) how to create a schema.
We can modify an existing schema in two ways:
1. we can delete a relation completely from the database
2. we can modify the schema of an existing relation (the more common operation)
To delete a relation R from the database, use the statement
DROP TABLE R;
Relation R is then no longer available, nor are any of its tuples.
To modify an existing relation, we use a statement that *starts with* ALTER TABLE
followed by the name of the relation.
We have several options, the most important of which are
a. ADD followed by an attribute name and its data type
b. DROP followed by an attribute name
e.g: ALTER TABLE MovieStar ADD phone CHAR(16);
The tuples of MovieStar now have the phone attribute; existing tuples get the
special value NULL for it.
e.g: ALTER TABLE MovieStar DROP birthdate;
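A minimal sketch of ALTER TABLE ... ADD and DROP TABLE, again assuming Python's sqlite3 module with an in-memory SQLite database. It shows that an existing tuple gets NULL (None in Python) for the new column. ALTER TABLE ... DROP of a column is left out because SQLite only supports DROP COLUMN from version 3.35 on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieStar (name CHAR(30), address VARCHAR(255), "
             "gender CHAR(1), birthdate DATE)")
conn.execute("INSERT INTO MovieStar (name) VALUES ('Carrie Fisher')")

# Add a column: the existing tuple gets NULL (None in Python) for it.
conn.execute("ALTER TABLE MovieStar ADD phone CHAR(16)")
phone = conn.execute("SELECT phone FROM MovieStar").fetchone()[0]
print(phone)  # None

# Drop the whole relation: the table and all its tuples are gone.
conn.execute("DROP TABLE MovieStar")
tables = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchall()
print(tables)  # []
```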
2.3.5 Default Values
When we create or modify tuples (in a specific relation instance), we sometimes
don't have values for all the attributes.
As mentioned above, when we add a column to a relation, the existing tuples have no
value for the newly introduced attribute, so their component for that column is
NULL.
But we may want a different default value.
e.g:
ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';
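A sketch of the DEFAULT clause, assuming Python's sqlite3 module and an in-memory SQLite database. It shows that existing tuples, and new tuples that omit the column, get 'unlisted' instead of NULL, while a column without a DEFAULT still comes out NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MovieStar (name CHAR(30), address VARCHAR(255))")
conn.execute("INSERT INTO MovieStar (name) VALUES ('Mark Hamill')")

# With DEFAULT, the existing tuple gets 'unlisted' instead of NULL.
conn.execute("ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted'")
print(conn.execute("SELECT phone FROM MovieStar").fetchone())  # ('unlisted',)

# A new tuple that omits phone also gets the default, while address,
# which has no DEFAULT, is NULL (None).
conn.execute("INSERT INTO MovieStar (name) VALUES ('Harrison Ford')")
print(conn.execute(
    "SELECT address, phone FROM MovieStar WHERE name = 'Harrison Ford'").fetchone())
# (None, 'unlisted')
```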
2.3.6 Declaring Keys
There are two kinds of declarations that indicate 'keyness': PRIMARY KEY and
UNIQUE. They differ in that attributes declared PRIMARY KEY may never be NULL,
while UNIQUE attributes may be NULL; also, a table has at most one PRIMARY KEY but
may have several UNIQUE declarations.
There are two ways to declare an attribute, or a set of attributes, to be a key in
the CREATE TABLE statement:
1. We may declare *one attribute* to be a key when that attribute is declared in
the schema.
Example:
CREATE TABLE MovieStar (
    name CHAR(30) PRIMARY KEY,
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE
);
2. We may add to the *list of items declared in the schema* (so far only a list of
attributes) an additional declaration stating that a specific attribute, or a set
of attributes, is a key.
Example:
CREATE TABLE MovieStar (
    name CHAR(30),
    address VARCHAR(255),
    gender CHAR(1),
    birthdate DATE,
    PRIMARY KEY (name, gender)
);
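A sketch of a key declaration being enforced, assuming Python's sqlite3 module and an in-memory SQLite database. Here the table-level form PRIMARY KEY (name) is used, and a second tuple with the same key value is rejected with sqlite3.IntegrityError.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MovieStar (
        name CHAR(30),
        address VARCHAR(255),
        gender CHAR(1),
        birthdate DATE,
        PRIMARY KEY (name)
    )
""")
conn.execute("INSERT INTO MovieStar (name, gender) VALUES ('Carrie Fisher', 'F')")
try:
    # A second tuple with the same key value violates the key constraint.
    conn.execute("INSERT INTO MovieStar (name, gender) VALUES ('Carrie Fisher', 'F')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```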
2.4 An Algebraic Query Language
- the data manipulation aspect of the relational model. A data model is not just a
structure. There needs to be a way to modify and query data.
We learn an algebra - a relational algebra - that consists of several ways to
construct new relations from existing relations.
When given relations are data, the new relations can be answers to queries on that
data.
Relational algebra is not used directly as a query language in real-life DBMSs, but
the 'real' query language, SQL, incorporates relational algebra at its core.
Many SQL programs are 'syntactically sugared' relational algebra expressions. When
an RDBMS handles SQL queries, the first step is to transform the SQL query into
relational algebra, or an equivalent representation.
2.4.1 Why Do We Need A Special Query Language?
Why not use an existing programming language like C?
Surprising answer: relational algebra is less powerful than C or Java, and
paradoxically that makes it more useful.
There are computations one can perform in (say) Java that one cannot perform in
relational algebra.
E.g: computing whether the number of tuples in a relation is even or odd. (Pure
relational algebra cannot express this; SQL extends the algebra with aggregation,
such as COUNT, which can.)
But by restricting what we can say or do in our query language, we get two huge
advantages:
1. ease of programming (because the language does not try to express everything
that is Turing-computable, it can stay small and high-level)
2. the ability of the compiler to produce highly optimized code (again, because
the language to be compiled is smaller and simpler)
2.4.2 What Is An Algebra?
An algebra, in general, consists of operators and atomic operands.
In arithmetic, for example, the atomic operands are variables like x and constants
like 15, and the operators are addition, subtraction, etc.
An algebra allows us to build expressions, using parentheses to group operators and
operands.
(Also, an algebra is *closed* under its operators: e.g. if a and b are integers,
a + b is still an integer.)
In relational algebra, the atomic operands are
(a) variables that stand for relations
(b) constants, which are finite relations
In the next section we examine the operators of relational algebra.
2.4.3 Overview Of Relational Algebra
The operations of relational algebra fall into four categories:
a) the usual set operations, union, intersection, and difference, applied to
relations.
b) operations that remove part of a relation: 'selection' eliminates some
tuples (rows), 'projection' eliminates some columns.
c) operations that combine the tuples of two relations: 'cartesian product',
which pairs tuples of two relations in all possible ways, and various kinds of join
operations, which selectively pair tuples from two relations.
d) an operation called renaming, which does not affect the tuples of a relation
but changes the names of its attributes and/or the name of the relation itself.
Expressions of relational algebra are referred to as 'queries'.
2.4.4 Set Operations On Relations

The set operations are defined only for relations R and S that satisfy these
conditions:
- R and S must have schemas with identical sets of attributes, with the same
type (domain) for each attribute
- the columns of R and S must be in the same order of attributes
- sometimes R and S have the same number of attributes with corresponding
identical domains, but the attributes have different names in each relation; then
we use the renaming operator (see below) to make the names agree
Under these conditions, the following set operations are defined:
R union S = the set of tuples that are in R or S or both. Even if a tuple is
present in both R and S, it appears only once in the union (but see relations as
bags, later).
R intersection S = the set of tuples in both R and S
R difference S = the set of tuples that are in R but not in S
Example:
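Let R =
name, address, gender, birthdate
Carrie Fisher, "123 Maple St., Hollywood", F, 9/9/99
Mark Hamill, "456 Oak Rd., Brentwood", M, 8/8/88
Let S =
name, address, gender, birthdate
Harrison Ford, "789 Palm Dr., Beverly Hills", M, 7/7/77
then R union S =
Carrie Fisher, "123 Maple St., Hollywood", F, 9/9/99
Mark Hamill, "456 Oak Rd., Brentwood", M, 8/8/88
Harrison Ford, "789 Palm Dr., Beverly Hills", M, 7/7/77
R intersection S = ∅ (empty: no tuple is in both R and S)
R difference S = R (both tuples of R, since neither is in S)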
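Because a relation is a set of tuples, a sketch in Python can lean directly on the built-in set type. The modeling here is an assumption: each tuple is a plain Python tuple in the schema order (name, address, gender, birthdate), and both operands share that schema, so |, & and - behave as union, intersection, and difference.

```python
# MovieStar(name, address, gender, birthdate): R and S share this schema,
# in the same column order, so Python's set operators act as the relational ones.
R = {
    ("Carrie Fisher", "123 Maple St., Hollywood", "F", "9/9/99"),
    ("Mark Hamill", "456 Oak Rd., Brentwood", "M", "8/8/88"),
}
S = {
    ("Harrison Ford", "789 Palm Dr., Beverly Hills", "M", "7/7/77"),
}
union = R | S          # tuples in R or S (or both), each listed once
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S
print(len(union), intersection, difference == R)  # 3 set() True
```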
2.4.5 Projection
The projection operator (Greek pi) produces from a relation R a new relation by
removing some of R's columns.
The value of the expression pi_A1, A2, ..., A_n (R) is a relation that has only
the columns of R for the attributes A1, A2, ..., A_n.
The schema for the resulting relation is the set of attributes
{A1, A2, ..., A_n}, which we conventionally show in the order A1, ..., A_n.
Let the relation 'Movies' be
title, year, length, genre, studioName, producerC#
Star Wars, 1977, 124, scifi, Fox, 12345
Galaxy Quest, 1999, 104, comedy, DreamWorks, 67890
Wayne's World, 1992, 95, comedy, Paramount, 99999
Example 2.9
pi_title, year, length (Movies) gives us the relation
title, year, length
Star Wars, 1977, 124
Galaxy Quest, 1999, 104
Wayne's World, 1992, 95
Example
pi_genre (Movies) gives us
genre
scifi
comedy
Note: only two tuples, not three: both comedies project to the same tuple (comedy),
and a relation instance is a *set* of tuples, so that tuple appears only once.
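A sketch of projection, assuming a relation is modeled as an attribute-name tuple plus a set of value tuples; the helper project is hypothetical. Building the result as a set is what makes the duplicate (comedy) collapse, as in the example above.

```python
MOVIE_ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
movies = {
    ("Star Wars", 1977, 124, "scifi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}

def project(attrs, rows, keep):
    """pi_keep(R): keep only the listed columns. The result is a set,
    so tuples that become identical after projection appear once."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(row[i] for i in idx) for row in rows}

print(len(project(MOVIE_ATTRS, movies, ["title", "year", "length"])))  # 3
print(sorted(project(MOVIE_ATTRS, movies, ["genre"])))  # [('comedy',), ('scifi',)]
```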
2.4.6 Selection
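The selection operator (Greek sigma), applied to a relation R with a condition C,
produces a new relation containing the subset of R's tuples that satisfy C. C is a
boolean condition involving R's attributes (and constants).
We apply C to each tuple t of R, substituting for each attribute A in the
condition the component of t for A. If C then evaluates to true, t is included in
the result.
Example: with the Movies relation from 2.4.5,
select_(length > 100) (Movies) gives
title, year, length, genre, studioName, producerC#
Star Wars, 1977, 124, scifi, Fox, 12345
Galaxy Quest, 1999, 104, comedy, DreamWorks, 67890
and select_(length >= 100 AND studioName = 'Fox') (Movies)
gives
title, year, length, genre, studioName, producerC#
Star Wars, 1977, 124, scifi, Fox, 12345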
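A sketch of selection under the same modeling assumption (attribute-name tuple plus a set of value tuples); the helper select is hypothetical. The condition is a Python function that receives each tuple as a dict, which plays the role of substituting components for attribute names.

```python
MOVIE_ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
movies = {
    ("Star Wars", 1977, 124, "scifi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}

def select(attrs, rows, cond):
    """sigma_C(R): keep each tuple for which cond(t) is true, where t maps
    attribute names to the tuple's components."""
    return {row for row in rows if cond(dict(zip(attrs, row)))}

long_movies = select(MOVIE_ATTRS, movies, lambda t: t["length"] > 100)
long_fox = select(MOVIE_ATTRS, movies,
                  lambda t: t["length"] >= 100 and t["studioName"] == "Fox")
print(sorted(r[0] for r in long_movies))  # ['Galaxy Quest', 'Star Wars']
print(sorted(r[0] for r in long_fox))     # ['Star Wars']
```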

2.4.7 Cartesian Product

The Cartesian product (or cross product) of two *sets* R and S, denoted R X S, is
the set of pairs formed by choosing, in all possible ways, the first element of the
pair from R and the second from S.
For relations it is essentially the same, but the elements are tuples, and tuples
can have more than one component.
The result of pairing a tuple from R with a tuple from S is a longer tuple, with one
component for each attribute of R and of S.
Conventionally, the attributes of R precede the attributes of S in the result
tuple.
If R and S have an attribute A in common, we use R.A and S.A to tell the two copies
apart in the result.
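A sketch of the product for relations, assuming the same (attribute names, set of tuples) modeling; the helper product is hypothetical. It concatenates every R tuple with every S tuple and qualifies shared attribute names as R.A / S.A, using the R(A, B) and S(B, C, D) data from the natural-join example below.

```python
def product(r_name, r_attrs, r_rows, s_name, s_attrs, s_rows):
    """R x S: every tuple of R followed by every tuple of S. An attribute name
    that occurs in both schemas is qualified as R.A and S.A."""
    common = set(r_attrs) & set(s_attrs)
    left = [f"{r_name}.{a}" if a in common else a for a in r_attrs]
    right = [f"{s_name}.{a}" if a in common else a for a in s_attrs]
    rows = {r + s for r in r_rows for s in s_rows}
    return tuple(left + right), rows

attrs, rows = product("R", ("A", "B"), {(1, 2), (3, 4)},
                      "S", ("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)})
print(attrs)      # ('A', 'R.B', 'S.B', 'C', 'D')
print(len(rows))  # 6
```

The result always has |R| * |S| tuples here (2 * 3 = 6), since concatenating distinct tuples never produces duplicates.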
2.4.8 Natural Joins

Often we want to join only tuples that match in some way.
The simplest such operation, the natural join (written here R nj S), pairs a tuple
from R with a tuple from S only when they match on the common attributes, i.e. the
attributes common to the schemas of R and S, for which the two tuples must have
identical values.
More precisely, let A1, A2, ..., A_n be the common attributes of R and S.
Then a tuple from R and a tuple from S are joined (to form a tuple in R nj S) if
and only if they agree on all of A1, ..., A_n. The joined tuple contains one copy
of each common attribute.
let relation R be
A,B
1,2
3,4
let relation S be
B,C,D
2,5,6
4,7,8
9,10,11
then R nj S is (note that the common attribute here is B)
A,B,C,D
1,2,5,6
3,4,7,8
Here the only common attribute between relations R and S is B, so tuples of R and
S need only agree in the value of B to be joined as tuples of R nj S.
For a more complex example let R be
A,B,C
1,2,3
6,7,8
9,7,8
Let S be
B,C,D
2,3,4
2,3,5
7,8,10
Here the common attributes are B *and* C.
so R nj S
has tuples
A,B,C,D
1,2,3,4
1,2,3,5
6,7,8,10
9,7,8,10
Each tuple in R is joined to each tuple in S with which it agrees on the common
attribute values.
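A sketch of the natural join under the same modeling assumption; the helper natural_join is hypothetical. It keeps R's copy of each common attribute and appends S's remaining attributes, and it reproduces the B-and-C example above.

```python
def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """R nj S: pair tuples that agree on all common attributes; each common
    attribute appears once (R's copy) and S's other attributes follow."""
    common = [a for a in r_attrs if a in s_attrs]
    s_rest = [a for a in s_attrs if a not in common]
    ri = [r_attrs.index(a) for a in common]
    si = [s_attrs.index(a) for a in common]
    sr = [s_attrs.index(a) for a in s_rest]
    rows = {
        r + tuple(s[k] for k in sr)
        for r in r_rows
        for s in s_rows
        if all(r[i] == s[j] for i, j in zip(ri, si))
    }
    return tuple(r_attrs) + tuple(s_rest), rows

attrs, rows = natural_join(("A", "B", "C"), {(1, 2, 3), (6, 7, 8), (9, 7, 8)},
                           ("B", "C", "D"), {(2, 3, 4), (2, 3, 5), (7, 8, 10)})
print(attrs)         # ('A', 'B', 'C', 'D')
print(sorted(rows))  # [(1, 2, 3, 4), (1, 2, 3, 5), (6, 7, 8, 10), (9, 7, 8, 10)]
```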
2.4.9 Theta Joins
The natural join combines tuples from R and S on *one* specific condition: the
equality of shared attribute values.
Sometimes we need to combine tuples based on *other* conditions.
For this purpose we have the 'theta join', in which theta stands for an arbitrary
condition. We'll write C instead; the textbook uses the 'bowtie' symbol of the
natural join with C as a subscript to indicate the condition to be satisfied.
To compute the theta join R theta_join_C S:
1. take the cross product of R and S
2. select only those tuple pairs in which condition C is satisfied
3. this collection of 'condition-satisfying' tuples is the result of the theta
join
Example: (B and C are common attributes)
let R be
A,B,C
1,2,3
6,7,8
9,7,8
Let S be
B,C,D
2,3,4
2,3,5
7,8,10
We need R theta_join S where Condition is A < D
The result is
A, R.B, R.C , S.B, S.C, D (note: relation namespaced common attributes)
1,2,3,2,3,4
1,2,3,2,3,5
1,2,3,7,8,10
6,7,8,7,8,10
9,7,8,7,8,10
Note: in a theta join there is no guarantee that shared attributes will agree in
value in the combined tuple, so both copies are kept and listed separately, with the
name of the relation as a prefix (e.g. R.C, S.C).
Example: R theta_join S with condition A < D AND R.B != S.B
The resulting relation, with one tuple, is
A, R.B, R.C, S.B, S.C, D
1,2,3,7,8,10
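A sketch of the theta join computed literally by the steps above (cross product, then selection), under the same modeling assumption; the helper theta_join is hypothetical. It reproduces both examples: five tuples for A < D, one tuple for A < D AND R.B != S.B.

```python
def theta_join(r_name, r_attrs, r_rows, s_name, s_attrs, s_rows, cond):
    """R theta-join_C S as select_C(R x S). Shared attribute names are qualified
    as R.A / S.A; cond sees each combined tuple as a dict keyed by those names."""
    common = set(r_attrs) & set(s_attrs)
    left = [f"{r_name}.{a}" if a in common else a for a in r_attrs]
    right = [f"{s_name}.{a}" if a in common else a for a in s_attrs]
    attrs = tuple(left + right)
    rows = set()
    for r in r_rows:                         # step 1: every pair (cross product)
        for s in s_rows:
            t = r + s
            if cond(dict(zip(attrs, t))):    # step 2: keep pairs satisfying C
                rows.add(t)
    return attrs, rows

R = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
S = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}
attrs, rows = theta_join("R", ("A", "B", "C"), R, "S", ("B", "C", "D"), S,
                         lambda t: t["A"] < t["D"])
print(attrs)      # ('A', 'R.B', 'R.C', 'S.B', 'S.C', 'D')
print(len(rows))  # 5
_, one = theta_join("R", ("A", "B", "C"), R, "S", ("B", "C", "D"), S,
                    lambda t: t["A"] < t["D"] and t["R.B"] != t["S.B"])
print(one)        # {(1, 2, 3, 7, 8, 10)}
```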
2.4.10 Combining Operations To Form Queries
Basic idea: algebraic operations can be composed, the output of one operation
feeding into the input of another.
Parentheses group operators.
Example 2.17
We want the title and year of movies produced by Fox that are at least 100 minutes
long.
One option:
1. *select* the movies with studioName = 'Fox'
2. *select* the movies with length >= 100
3. compute the intersection of the results of 1 and 2
4. project the result of 3 onto title and year
We could also do:
1. select the movies with studioName = 'Fox' *AND* length >= 100
2. project the result onto title and year
Equivalent Expressions and Query Optimization
Most database systems have a query language based on relational algebra.
Therefore there are often many logically equivalent queries that return the same
relation.
Some of these logically equivalent queries may be more suitable for efficient
execution than others.
So a component called the query optimizer replaces queries with logically
equivalent but more efficiently executable queries.
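A sketch showing that the two plans for Example 2.17 return the same relation on the sample Movies data; the helpers select and project are hypothetical and work only over the Movies schema. Agreement on one instance illustrates the idea but is not a proof of equivalence.

```python
MOVIE_ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
movies = {
    ("Star Wars", 1977, 124, "scifi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}

def select(rows, cond):
    """sigma_cond over Movies tuples; cond sees each tuple as a dict."""
    return {r for r in rows if cond(dict(zip(MOVIE_ATTRS, r)))}

def project(rows, keep):
    """pi_keep over Movies tuples (a set, so duplicates disappear)."""
    idx = [MOVIE_ATTRS.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}

# Plan 1: two selections, intersect, then project.
fox = select(movies, lambda t: t["studioName"] == "Fox")
long_movies = select(movies, lambda t: t["length"] >= 100)
plan1 = project(fox & long_movies, ["title", "year"])

# Plan 2: one selection with an AND condition, then project.
plan2 = project(select(movies, lambda t: t["studioName"] == "Fox" and t["length"] >= 100),
                ["title", "year"])

print(plan1 == plan2, sorted(plan1))  # True [('Star Wars', 1977)]
```

Plan 2 scans Movies once instead of twice and skips the intersection, which is the kind of rewrite a query optimizer looks for.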
2.4.11 Naming And Renaming

It is often convenient to have an operator to rename relations (and their
attributes).
The book writes this with rho: rho_S(A1, ..., An) (R). In these notes I write
rename(R, S, A1, ..., A_n) instead.
So rename(R, S, A1, ..., A_n) gives a relation named S that has the same tuples as
R, but with its attributes renamed, in order, to A1 through A_n.
If we want to keep the attribute names intact, we write rename(R, S).
rename(S, S, X, Y, Z) renames the three attributes of S (from, say, A, B, C) to X,
Y, Z.
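A small concrete sketch of renaming, assuming a relation is modeled as a pair (attribute names, set of tuples); the helper rename is hypothetical. The relation's new name is just the Python variable the result is bound to, and the tuples are untouched.

```python
def rename(relation, new_attrs=None):
    """rename(R, S, A1, ..., An): same tuples, attributes renamed positionally.
    The new relation name S is simply the variable the result is bound to."""
    attrs, rows = relation
    if new_attrs is None:                  # rename(R, S): keep attribute names
        return tuple(attrs), set(rows)
    if len(new_attrs) != len(attrs):
        raise ValueError("need exactly one new name per attribute")
    return tuple(new_attrs), set(rows)

S = (("A", "B", "C"), {(1, 2, 3), (4, 5, 6)})
S = rename(S, ("X", "Y", "Z"))   # rename(S, S, X, Y, Z)
print(S[0])                      # ('X', 'Y', 'Z')
```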
2.4.12 Relationships Among Operators
Some operators can be expressed in terms of others.
e.g: intersection in terms of set difference:
R intersect S = R - (R - S)
theta join in terms of selection and product:
R theta-join(condition C) S = select_C (R X S)
natural join in terms of selection, product, and projection:
R natural-join S = project_L (select_C (R X S))
where C is R.A_1 = S.A_1 AND R.A_2 = S.A_2 AND ... AND R.A_n = S.A_n,
A_1 through A_n are the attributes that appear in both schemas, and L keeps only
one copy (say R's) of each shared attribute.
The 'core' or 'base' operations, which cannot be written in terms of the others,
are union, difference, selection, projection, (cross) product, and renaming.
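A sketch that checks two of these identities on small illustrative instances, R(A, B), S(B, C) and R2(A, B), modeled as Python sets of tuples. Agreement on examples illustrates the identities rather than proving them.

```python
# R(A, B), S(B, C), and R2(A, B): small illustrative instances.
R = {(1, 2), (3, 4), (5, 6)}
S = {(2, 7), (4, 8), (9, 9)}
R2 = {(1, 2), (5, 6), (0, 0)}

# Intersection from difference: R intersect R2 == R - (R - R2)
assert R & R2 == R - (R - R2)

# R x S: combined tuples (A, R.B, S.B, C)
product = {r + s for r in R for s in S}

# Natural join computed directly: match on B, keep one copy of B.
natural = {(a, b, c) for (a, b) in R for (b2, c) in S if b == b2}

# The same join as project_(A, R.B, C)(select_(R.B = S.B)(R x S))
via_product = {(a, rb, c) for (a, rb, sb, c) in product if rb == sb}

assert natural == via_product
print(sorted(natural))  # [(1, 2, 7), (3, 4, 8)]
```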
2.4.13 A Linear Notation For Algebraic Expressions
Basic idea: instead of a tree or a nested expression (an s-expression, which is
essentially functional style), use assignments to new variables, with the
statements in order.
So the query from 2.4.10 becomes
R(t,y,l,g,s,p) := select (Movies, length >= 100)
S(t,y,l,g,s,p) := select (Movies, studioName = 'Fox')
T(t,y,l,g,s,p) := R intersection S
Answer(title, year) := project (T, t, y)
(R, S and T rename the Movies attributes to t, y, l, g, s, p, so the final
projection refers to t and y; Answer then names its two columns title and year.)
Basically every interior node in the expression tree gets its own variable, on which
operators higher up in the tree operate.
(from the exercises)
(ex: 2.4.6) An operator is said to be monotone if, whenever a tuple is added to any
of its arguments, the result of the operator contains every tuple it did before, and
*possibly* more tuples.
Which of the operators we learned are monotone?
1. union is monotone.
2. intersection is monotone: adding tuples to either relation can only
increase the number of tuples in the intersection, never reduce it.
3. difference: consider R difference S, the set of tuples that are in R but not
in S. If we add to S a tuple that is also in R, that tuple drops out of the result,
so the result shrinks. So difference is *not* monotone (it is monotone in R, but not
in S).
4. selection is monotone.
5. projection is monotone.
6. cross product is monotone.
7. natural join is monotone.
8. theta join is monotone.
9. renaming is monotone (trivially: it changes only names and leaves the tuples
unchanged).
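A concrete counterexample for difference, using small illustrative single-attribute relations modeled as Python sets of tuples. Adding a tuple to S removes a tuple from R - S, while union and intersection keep everything they had.

```python
# Single-attribute relations with illustrative values.
R = {(1,), (2,)}
S = {(3,)}
before = R - S            # {(1,), (2,)}
S_bigger = S | {(1,)}     # add to S a tuple that is also in R
after = R - S_bigger      # {(2,)}: the tuple (1,) dropped out
print(before <= after)    # False, so difference is not monotone

# Union and intersection never lose tuples when an argument grows.
print((R | S) <= (R | S_bigger), (R & S) <= (R & S_bigger))  # True True
```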
(ex: 2.4.7)
Suppose relations R and S have m and n tuples respectively (note: m and n are
reversed from the text). Give the minimum and maximum numbers of tuples that the
results of the following expressions can have.
a. R union S
maximum = m + n (no tuples in common between R and S)
minimum = max(m, n) (every tuple of the smaller relation is also in the larger
one; when m = n this means R = S)
a2. R intersection S
maximum = min(m, n) (every tuple of the smaller relation is also in the larger
one)
minimum = 0 (no tuples in common)
b. R natural join S
maximum = m * n (e.g. no attributes in common, so the join is a product)
minimum = 0 (there are common attributes, but no pair of tuples agrees on
them)
c. sigma_C (R) cross S, for some condition C
maximum = m * n (the selection keeps all of R's tuples)
minimum = 0 (the selection keeps none of R's tuples)
d. project_L (R) difference S, for some list of attributes L
maximum = m (the projection keeps all m tuples distinct, and none of them
is in S)
minimum = 0 (every tuple of project_L (R) also appears in S)
The third important aspect of the relational model is the ability to restrict the
data that may be stored in the database.
So far we have seen only one kind of constraint, that of one or more attributes
acting as a key.
There are many more kinds of constraints,
e.g: 'referential integrity constraints': the value in one column of a relation
must also appear in another column of the same relation or in a column of another
relation.
(Here we use relational algebra; in Chapter 7 we see how SQL can express the same
constraints.)
(ex: 2.4.8)
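The semijoin of R and S (written here R semijoin S) is the set of tuples t in R
such that there is at least one tuple u in S that agrees with t on all the
attributes common to R and S.
That is a bit abstract, so try it with the data from the natural join example.
let relation R be
A,B
1,2
3,4
let relation S be
B,C,D
2,5,6
4,7,8
9,10,11
then R semijoin S is (note that the common attribute here is B)
A,B
1,2
3,4
(in this case, R semijoin S is the same as R)
S semijoin R would be
B,C,D
2,5,6
4,7,8
(This is like a natural join, except that the result keeps only the tuples of the
left operand, with only its attributes: tuples are filtered, not combined.)
Three ways to express R semijoin S with the basic operators:
1. project R natural-join S onto the attributes of R:
project_(attributes of R) (R natural-join S)
2. the same, but with a theta join whose condition C equates the common
attributes: project_(attributes of R) (R theta-join_C S)
3. R natural-join project_(common attributes) (S)
(Careful: testing each common attribute separately, e.g. R.B elementOf S.B and
R.C elementOf S.C as sets, is *not* equivalent when R and S share more than one
attribute, because a single tuple of S must match t on all common attributes at
once.)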
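A sketch of the semijoin, assuming the same (attribute names, set of tuples) modeling; the helper semijoin is hypothetical. It follows expression 3 above: project S onto the common attributes, then keep the R tuples whose common-attribute values occur there. The last call shows why separate per-attribute membership tests are not enough.

```python
def semijoin(r_attrs, r_rows, s_attrs, s_rows):
    """R semijoin S: the tuples of R that agree with at least one tuple of S
    on *all* common attributes at once."""
    common = [a for a in r_attrs if a in s_attrs]
    ri = [r_attrs.index(a) for a in common]
    si = [s_attrs.index(a) for a in common]
    # project_(common attributes)(S), as a set of value combinations
    s_keys = {tuple(s[j] for j in si) for s in s_rows}
    return {r for r in r_rows if tuple(r[i] for i in ri) in s_keys}

R = {(1, 2), (3, 4)}                       # R(A, B)
S = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}    # S(B, C, D)
print(semijoin(("A", "B"), R, ("B", "C", "D"), S) == R)     # True
print(sorted(semijoin(("B", "C", "D"), S, ("A", "B"), R)))  # [(2, 5, 6), (4, 7, 8)]

# R2(A, B, C), S2(B, C, D): B = 2 and C = 8 both occur in S2, but never together.
R2 = {(1, 2, 8)}
S2 = {(2, 3, 4), (7, 8, 10)}
print(semijoin(("A", "B", "C"), R2, ("B", "C", "D"), S2))  # set()
```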
(ex: 2.4.10)
A relation R has attributes A1, A2, ..., A_n, B1, B2, ..., B_m.
Let S be a relation with schema B1, B2, ..., B_m; in other words, S's attributes
are a subset of R's.
R quotient S is the set of tuples t over A1, ..., A_n (i.e. the non-S attributes of
R) such that for every tuple s in S, the tuple ts (t followed by s) is a tuple of R.
(Fair enough, but I suspect 'quotient' is clearer in terms of relations; see the Set
Theory book.)
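A sketch of the quotient under the same (attribute names, set of tuples) modeling; the helper divide and the tiny R(A, B), S(B) instances are hypothetical. It groups R by its non-S part and keeps the groups whose S-parts include every tuple of S.

```python
def divide(r_attrs, r_rows, s_attrs, s_rows):
    """R quotient S: tuples t over R's non-S attributes such that,
    for every s in S, the combination of t and s is a tuple of R."""
    a_attrs = [a for a in r_attrs if a not in s_attrs]
    ai = [r_attrs.index(a) for a in a_attrs]
    bi = [r_attrs.index(b) for b in s_attrs]
    # Group R by its A-part, collecting the B-parts that occur with it.
    partners = {}
    for r in r_rows:
        partners.setdefault(tuple(r[i] for i in ai), set()).add(tuple(r[i] for i in bi))
    wanted = set(s_rows)
    return tuple(a_attrs), {t for t, bs in partners.items() if wanted <= bs}

R = {(1, "x"), (1, "y"), (2, "x")}       # R(A, B)
S = {("x",), ("y",)}                     # S(B)
print(divide(("A", "B"), R, ("B",), S))  # (('A',), {(1,)})
```

Only A = 1 appears in R with *every* B value of S; A = 2 is missing the pair (2, "y").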
2.5 Constraints On Relations
The third aspect of a data model is constraints on the data (the first two are
structure and operations).
So far we saw only one kind of constraint: a set of attributes of a relation acting
as a key.
Now we *also* look at 'referential integrity' constraints, i.e. a value appearing
in a column of one relation must also appear in another column of the same relation
or of another relation.
2.5.1 Relational Algebra As A Constraint Language
(The key idea seems to be that there are often two equivalent ways to express a
constraint: with set containment, or as an equality with the empty set.)
e.g: R subset-of S is equivalent to R - S = ∅
R subset-of ∅ is equivalent to R = ∅
The 'equal to the empty set' style is more prevalent in SQL.
2.5.2 Referential Integrity Constraints
Example: in our movie database, if a person p appears in the StarsIn relation
under the 'starName' attribute, we also expect the same person p to appear in the
MovieStar relation under the 'name' attribute.
In general, a referential integrity constraint says: if a value v occurs as the component for attribute A of *some* tuple in relation R, we also expect v to appear as the component for attribute B of some tuple in relation S. This is driven by our design intentions.
We express this in relational algebra as
project_A (R) subsetOf project_B (S)   (_ so S.B *can* have values not in R.A, but every value in R.A must be in S.B)
or, in the alternative notation,
project_A (R) - project_B (S) = ∅
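A minimal sketch of the general check, assuming each relation is a list of Python dicts keyed by attribute name (a modelling choice for illustration only). `project` and `referential_violations` are hypothetical helper names; a non-empty result means the constraint is violated. Passing lists of attribute names lets A and B be one attribute or several.

```python
def project(relation: list[dict], attrs: list[str]) -> set[tuple]:
    """project_attrs(relation): the set of value tuples, duplicates removed."""
    return {tuple(row[a] for a in attrs) for row in relation}

def referential_violations(r, a_attrs, s, b_attrs) -> set[tuple]:
    """project_A(R) - project_B(S); the constraint holds iff this is empty."""
    return project(r, a_attrs) - project(s, b_attrs)

# Hypothetical toy relations: every R.x must appear as some S.y.
R = [{"x": 1}, {"x": 2}, {"x": 2}]
S = [{"y": 1}, {"y": 2}, {"y": 3}]   # S.y may contain extra values such as 3
print(referential_violations(R, ["x"], S, ["y"]))  # set()
R.append({"x": 4})
print(referential_violations(R, ["x"], S, ["y"]))  # {(4,)}
```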
Example 2.21
Consider these relations from our movie database:
Movies(title, year, length, genre, studioName, producerC#)
MovieExec(name, address, cert#, netWorth)
We expect every value of producerC# in Movies to appear as the cert# of some executive (tuple) in MovieExec (_ otherwise there would be a producer who is not a movie exec; there can, however, be movie execs who are not producers).
This constraint can be expressed as
project_producerC# (Movies) subsetOf project_cert# (MovieExec)
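The same constraint sketched in Python on a few invented rows, assuming each relation is a list of dicts keyed by attribute name. The attribute names follow the schema; the titles, names and certificate numbers are made up, and only the attributes this constraint needs are shown.

```python
# Hypothetical rows (other attributes omitted).
movies = [
    {"title": "Movie X", "year": 1990, "producerC#": 100},
    {"title": "Movie Y", "year": 1995, "producerC#": 200},
]
movie_exec = [
    {"name": "Exec P", "cert#": 100, "netWorth": 20_000_000},
    {"name": "Exec Q", "cert#": 200, "netWorth": 5_000_000},
    {"name": "Exec R", "cert#": 300, "netWorth": 1_000_000},  # produced nothing: allowed
]

# project_producerC#(Movies) and project_cert#(MovieExec) as Python sets
producer_certs = {m["producerC#"] for m in movies}
exec_certs = {e["cert#"] for e in movie_exec}

# The constraint: project_producerC#(Movies) subsetOf project_cert#(MovieExec)
print(producer_certs <= exec_certs)   # True
print(producer_certs - exec_certs)    # set(), i.e. no dangling producerC# values
```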
Example 2.22
A referential integrity constraint where the 'value' involved is represented by more than one attribute.
Any movie mentioned in the StarsIn relation must appear in the Movies relation.
The difference here is that a movie is identified uniquely by its primary key, the pair (title, year), so we express this constraint as a subset relation between sets of *pairs*:
project_(movieTitle, movieYear) (StarsIn) subsetOf project_(title, year) (Movies)
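A small Python illustration of why the pair matters, assuming relations are lists of dicts and using invented rows in which one title exists as two different movies (an original and a remake). Comparing titles alone misses the dangling StarsIn tuple; comparing (title, year) pairs catches it.

```python
# Hypothetical rows: the 2005 remake of "Movie Z" is missing from Movies.
movies = [
    {"title": "Movie Z", "year": 1950},
]
stars_in = [
    {"movieTitle": "Movie Z", "movieYear": 1950, "starName": "Star A"},
    {"movieTitle": "Movie Z", "movieYear": 2005, "starName": "Star B"},
]

# Checking titles alone misses the problem ...
title_only_ok = {s["movieTitle"] for s in stars_in} <= {m["title"] for m in movies}

# ... while comparing (title, year) pairs catches it.
starred = {(s["movieTitle"], s["movieYear"]) for s in stars_in}
known = {(m["title"], m["year"]) for m in movies}
missing = starred - known

print(title_only_ok)  # True
print(missing)        # {('Movie Z', 2005)}
```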
2.5.3 Key Constraints
We use the same notation to express that an attribute or set of attributes is a key for a relation.
e.g. 'name' is the key for the relation MovieStar(name, address, gender, birthdate).
(To keep the example short we only check the address attribute: since name is a key, two tuples that agree on name must also agree on address. Similar constraints are needed for gender and birthdate.)
Let us rename the MovieStar relation twice to get two new 'names', MS1 and MS2:
rename_MS1(name, address, gender, birthdate) (MovieStar)
rename_MS2(name, address, gender, birthdate) (MovieStar)
and then
select_(MS1.name = MS2.name AND MS1.address != MS2.address) (MS1 X MS2) = ∅
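A sketch of the same key check in Python, assuming MovieStar is a list of dicts with invented rows. Because MS1 and MS2 are just two names for the same relation, MS1 X MS2 is every ordered pair of rows, which `itertools.product(..., repeat=2)` produces.

```python
from itertools import product

# Hypothetical MovieStar rows; the last one breaks the key constraint on name.
movie_star = [
    {"name": "Star A", "address": "1 Main St", "gender": "F", "birthdate": "1960-01-01"},
    {"name": "Star B", "address": "2 Oak Ave", "gender": "M", "birthdate": "1955-05-05"},
    {"name": "Star A", "address": "9 Elm Rd", "gender": "F", "birthdate": "1960-01-01"},
]

# MS1 X MS2 followed by select_(MS1.name = MS2.name AND MS1.address != MS2.address)
violations = [
    (ms1, ms2)
    for ms1, ms2 in product(movie_star, repeat=2)
    if ms1["name"] == ms2["name"] and ms1["address"] != ms2["address"]
]

# The constraint requires this result to be empty; here it is not.
print(len(violations))  # 2 (the offending pair appears in both orders)
```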
2.5.4 Additional Constraint Examples
Many other kinds of constraints that restrict database contents can be expressed in relational algebra.
Two examples: a domain constraint on a single attribute, and a constraint that involves two relations.
Domain constraint: gender in MovieStar must be either 'M' or 'F', which becomes
select_(gender != 'M' AND gender != 'F') (MovieStar) = ∅
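The domain constraint as a Python filter, assuming MovieStar is a list of dicts with invented rows. The comparison is case-sensitive exactly as the algebra is written, so a lowercase 'f' would also be reported.

```python
# Hypothetical MovieStar rows; 'X' violates the domain constraint.
movie_star = [
    {"name": "Star A", "gender": "F"},
    {"name": "Star B", "gender": "M"},
    {"name": "Star C", "gender": "X"},
]

# select_(gender != 'M' AND gender != 'F')(MovieStar) must be empty
violations = [s for s in movie_star if s["gender"] != "M" and s["gender"] != "F"]
print([s["name"] for s in violations])  # ['Star C']
```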
To be a movie studio president, you need a net worth of at least $10,000,000.
Given
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
step 1. Studio ThetaJoin_(cert# = presC#) MovieExec
step 2. select_(netWorth < 10,000,000) [Studio ThetaJoin_(cert# = presC#) MovieExec] = ∅
or
step 1. select_(netWorth >= 10,000,000) (MovieExec), then
step 2. project_cert# [select_(netWorth >= 10,000,000) (MovieExec)]
step 3. project_presC# (Studio) subsetOf (result of step 2)
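Both formulations sketched in Python, assuming relations are lists of dicts; the function names and all rows are hypothetical. The theta join is kept as (studio_row, exec_row) pairs so the clashing 'name' and 'address' attributes of Studio and MovieExec stay separate.

```python
THRESHOLD = 10_000_000

def join_style_holds(studio: list[dict], movie_exec: list[dict]) -> bool:
    # Studio ThetaJoin_(cert# = presC#) MovieExec, as (studio_row, exec_row) pairs
    joined = [(st, ex) for st in studio for ex in movie_exec if ex["cert#"] == st["presC#"]]
    # select_(netWorth < THRESHOLD)(join) = ∅
    return not any(ex["netWorth"] < THRESHOLD for _, ex in joined)

def subset_style_holds(studio: list[dict], movie_exec: list[dict]) -> bool:
    # project_cert#(select_(netWorth >= THRESHOLD)(MovieExec))
    rich_certs = {ex["cert#"] for ex in movie_exec if ex["netWorth"] >= THRESHOLD}
    # project_presC#(Studio) subsetOf that set
    return {st["presC#"] for st in studio} <= rich_certs

movie_exec = [
    {"name": "Exec P", "cert#": 100, "netWorth": 20_000_000},
    {"name": "Exec Q", "cert#": 200, "netWorth": 5_000_000},
]
ok = [{"name": "Studio 1", "presC#": 100}]
poor_pres = ok + [{"name": "Studio 2", "presC#": 200}]
unknown_pres = ok + [{"name": "Studio 3", "presC#": 999}]  # no exec has cert# 999

for studios in (ok, poor_pres, unknown_pres):
    print(join_style_holds(studios, movie_exec), subset_style_holds(studios, movie_exec))
# True True
# False False
# True False
```

The last case shows the two formulations are not quite interchangeable: the theta-join version silently ignores a presC# that matches no MovieExec tuple, while the subset version rejects it, so it also enforces a referential integrity constraint from Studio.presC# to MovieExec.cert#.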
