
The Relational Model Of Data

2.1 An Overview Of Data Models


2.1.1 What Is A Data Model?
2.1.2 Important Data Models
2.1.3 Relational Model In Brief
2.1.4 Semistructured Model In Brief
2.1.5 Other Models In Brief
2.1.6 Comparison Of Modeling Approaches

2.2 Basics Of Relational Model


2.2.1 Attributes
2.2.2 Schemas
2.2.3 Tuples
2.2.4 Domains
2.2.5 Equivalent Representations Of A Relation
2.2.6 Relation Instances
2.2.7 Keys Of Relations
2.2.8 An Example Database Schema
2.2.9 Exercises For Section 2.2

2.3 Defining A Relation Schema in SQL


2.3.1 Relations In SQL
2.3.2 Data Types
2.3.3 Simple Table Declarations
2.3.4 Modifying Relation Schemas
2.3.5 Default Values
2.3.6 Declaring Keys
2.3.7 Exercises For Section 2.3

2.4 An Algebraic Query Language


2.4.1 Why Do We Need A Special Query Language?
2.4.2 What Is An Algebra?
2.4.3 Overview Of Relational Algebra
2.4.4 Set Operations On Relations
2.4.5 Projection
2.4.6 Selection
2.4.7 Cartesian Product
2.4.8 Natural Joins
2.4.9 Theta Joins
2.4.10 Combining Operations To Form Queries
2.4.11 Naming And Renaming
2.4.12 Relationships Among Operators
2.4.13 A Linear Notation For Algebraic Relations
2.4.14 Exercises For Section 2.4

2.5 Constraints On Relations


2.5.1 Relational Algebra As A Constraint Language
2.5.2 Referential Integrity Constraints
2.5.3 Key Constraints
2.5.4 Additional Constraint Examples
2.5.5 Exercises For Section 2.5

2.6 Summary Of Chapter 2


Notes:

The most important model of data today is the two-dimensional table, or 'relation'.

We begin with an overview of data models.

2.1 An Overview Of Data Models

The notion of a data model is one of the most fundamental in the study of database
systems.
This section covers some basic terminology and the most important data models.

2.1.1 What Is A Data Model?

A data model is a notation for describing data or information, generally consisting
of three parts.
1. Structure of the data
the data model is a *conceptual* model of the data, not a description of the
underlying data structures.
2. Operations on the data
a limited set of operations that can be performed:
a limited set of queries - operations that retrieve information
a limited set of modifications - operations that change the database.

These limits are not weaknesses, but strengths. They make it possible for
programmers to describe data operations at a very high level, yet have the database
system implement the operations efficiently.
3. Constraints on the data
constraints describe limitations on what the data can be.
e.g: (simple) "a day of the week is an integer between 1 and 7"
e.g: "a movie has at most one title"
very complex constraints can be put on the data, see chapter 7

2.1.2 Important Data Models


The two prominent data models (wrt databases) are
1. the relational data model, including object-relational extensions
2. the semistructured data model, including XML and related standards.

2.1.3 Relational Model In Brief

The Relational Model is based on tables (_ the fundamental (conceptual) 'data
structure').

e.g:
title, year, length, genre
Gone With The Wind, 1939, 231, drama
Star Wars, 1977, 124, sciFi
Wayne's World, 1992, 95, comedy

Each row of this relation/table *can* be implemented as a C struct with fields
corresponding to the column names but, in general, relations are not implemented
as in-memory structures; their implementation must take into account the access
patterns (etc) of large disks.

The *operations* normally associated with relational model form the 'relational
algebra'. (see section 2.4)
The operations are table oriented, e.g: we can ask for all rows of a table that
have a specific value in a specific column.
A brief example of constraints - we could require that the 'genre' values are drawn
from a fixed set, or that no two movies have the same title (the latter does not
hold in the real world of movie making).
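
To make the 'row as a struct' idea and the table-oriented query concrete, here is a minimal sketch in Python (standing in for the C struct mentioned above). The class name MovieRow and the variable names are my own, not from the book, and a real DBMS would not store rows this way.

```python
from typing import NamedTuple


class MovieRow(NamedTuple):
    """One row of the Movies table; field names mirror the column names."""
    title: str
    year: int
    length: int
    genre: str


# A relation is modeled as a set of rows: no duplicates, no inherent order.
movies = {
    MovieRow("Gone With The Wind", 1939, 231, "drama"),
    MovieRow("Star Wars", 1977, 124, "sciFi"),
    MovieRow("Wayne's World", 1992, 95, "comedy"),
}

# Table-oriented query: all rows with a specific value in a specific column.
comedies = {row for row in movies if row.genre == "comedy"}

# A simple constraint: genre values must come from a fixed set (assumed here).
ALLOWED_GENRES = {"drama", "sciFi", "comedy"}
genres_ok = all(row.genre in ALLOWED_GENRES for row in movies)
```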

2.1.4 Semistructured Model In Brief


Semistructured data resembles trees or graphs, rather than tables or arrays.
XML represents this with hierarchical tags.
These tags represent what roles are played by enclosed data, as column names do in
the relational model.
Example:

<Movies>
    <Movie title="Gone With the Wind">
        <Year>1939</Year>
        <Length>231</Length>
        <Genre>drama</Genre>
    </Movie>
    <Movie title="Star Wars">
        <Year>1977</Year>
        <Length>124</Length>
        <Genre>sciFi</Genre>
    </Movie>
    <Movie title="Wayne’s World">
        <Year>1992</Year>
        <Length>95</Length>
        <Genre>comedy</Genre>
    </Movie>
</Movies>

Constraints often involve the datatype of values associated with a tag, or which
tags may be nested within which.
E.g: whether the values associated with the <Length> tag must be integers or may be
arbitrary strings
E.g: whether each <Movie> element *must* have a (single) <Length> element nested
within it.
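
A small sketch of checking those two constraints, assuming the stricter reading of each (exactly one <Length> per <Movie>, and its text must be an integer). It uses Python's standard xml.etree.ElementTree; the function name length_constraint_holds is made up for this note, and real XML systems would express this with a schema language instead.

```python
import xml.etree.ElementTree as ET

DOC = """<Movies>
  <Movie title="Gone With the Wind">
    <Year>1939</Year><Length>231</Length><Genre>drama</Genre>
  </Movie>
  <Movie title="Star Wars">
    <Year>1977</Year><Length>124</Length><Genre>sciFi</Genre>
  </Movie>
</Movies>"""


def length_constraint_holds(xml_text):
    """True if every <Movie> has exactly one <Length> whose text is an integer."""
    root = ET.fromstring(xml_text)
    for movie in root.findall("Movie"):
        lengths = movie.findall("Length")
        if len(lengths) != 1:
            return False
        if not (lengths[0].text or "").strip().isdigit():
            return False
    return True


valid = length_constraint_holds(DOC)
# Drop one <Length> element to produce a document that breaks the constraint.
invalid = length_constraint_holds(DOC.replace("<Length>124</Length>", ""))
```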

2.1.5 Other Models In Brief

There are many other data models that have been associated with databases.
E.g: a modern trend is to add object oriented features to the relational model.
There are two effects of object-orientation on relations

1. Values can have structure, instead of being of elementary types such as
integers or strings.
2. Relations can have associated *methods*.
See section 10.3 for these.

There are database models of the purely object-oriented kind. See section 4.9.

There are several other models that have fallen into disuse.
e.g: the hierarchical model, a tree-oriented model, like the semistructured model.

Another is the network model, a graph-oriented, physical-level model. It is like
the hierarchical model but, unlike it, is not restricted to trees.

2.1.6 Comparison Of Modeling Approaches


At first glance the semistructured model seems more flexible than the relational
model.
Nevertheless, the relational model is still the one preferred in database systems.
A brief argument:
because databases are large (deal with large amounts of data), access to the data
and modifications to the data must be efficient. Ease of use is also very
important. Both can be achieved by a (relational) model that
a. provides a *simple*, *limited* approach to structuring data, yet is
*reasonably* versatile, so anything can be modeled.
b. provides a limited yet (_ultimately) versatile collection of operations on
the data

Together (KEY) these limitations turn into features: they allow us to implement
languages *like* SQL that let programmers express themselves at a very high
level (_of abstraction), and because the set of operations is small, the
operations can be optimized to run fast.

2.2 Basics Of Relational Model

The relational model gives us a single way to represent data - as a relation

title, year, length, genre


Gone With The Wind, 1939, 231, drama
Star Wars, 1977, 124, sciFi
Wayne's World, 1992, 95, comedy

2.2.1 Attributes
The columns of a relation are named by attributes.
In the above example, the attributes are title, year, length, genre
Usually an attribute describes the meaning of entries in the columns.

2.2.2 Schemas
The name of a relation and the set of attributes for the relation are called the
'schema' of that relation.
We represent the schema as the relation name followed by a parenthesized list of
its attributes.

e.g: Movies(title, year, length, genre)

The attributes of a relation schema are a set, not a list, but to write the
relation down as a table we have to pick a 'standard order' for them.
For the relation above, we take the given order as 'standard'.

In the Relational Model, a database consists of one or more relations.
The set of schemas for the relations of a database is called the 'database
schema'.

2.2.3 Tuples
The rows of a relation, other than the header row containing the attribute names,
are called 'tuples'.
A tuple has one 'component' for each attribute of the schema.
e.g: The first of the three tuples in the relation above has the components "Gone
With The Wind", 1939, 231, drama, for the attributes title, year, length, genre
respectively.

Convention: relation names start with a capital letter,
attribute names start with a lower case letter.
However, we also use R(A, B, C) for a generic relation having 3 attributes named by
capital letters.

2.2.4 Domains

The relational model demands that each component of each tuple is atomic, i.e. it
must be of some elementary type like integer or string.
Components can*not* be compound - structures, sets, lists, arrays, or any other
type that can be broken down into smaller components.

Each attribute has a 'domain' (in effect, the elementary type its values are drawn
from).
We can include the domain of each attribute in the representation of the schema

E.g

Movies(title : string, year : integer, length : integer, genre : string)

2.2.5 Equivalent Representations Of A Relation


Relations are *sets* of tuples, and a relation schema has a *set* of attributes.
So both tuples and attributes can be reordered.
aka rows and columns can be reordered without the relation 'being different', as
long as each value stays under its own attribute.
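
A quick way to convince yourself, sketched in Python: if each tuple is stored as a set of (attribute, value) pairs, no row order or column order is recorded anywhere, so two differently ordered presentations compare equal. The helper name relation is made up for this note.

```python
def relation(attributes, rows):
    """Build a relation as a set of tuples, each a frozenset of (attribute, value)."""
    return {frozenset(zip(attributes, row)) for row in rows}


by_title = relation(
    ("title", "year", "length", "genre"),
    [("Star Wars", 1977, 124, "sciFi"), ("Wayne's World", 1992, 95, "comedy")],
)

# Same data, with the columns permuted and the rows listed in the other order.
by_year = relation(
    ("year", "genre", "title", "length"),
    [(1992, "comedy", "Wayne's World", 95), (1977, "sciFi", "Star Wars", 124)],
)

same_relation = by_title == by_year
```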

2.2.6 Relation Instances

Relations can change over time in two ways.
1. attributes can be added or deleted (a change to the schema).
2. *tuples* can be added, deleted, or modified.

A set of tuples for a given relation is called an 'instance' of that relation.
(thought short cut - a relation frozen at one moment is an 'instance')

A conventional database maintains only *one* instance of each relation, the
'current instance'.
Databases that also keep historical versions of relations are called temporal
databases.

2.2.7 Keys Of Relations

The Relational Model allows many types of constraints. These are discussed in
Chapter 7.

However one kind of constraint is crucial : key constraints.

A set of attributes forms a key for a relation if we do not allow two tuples in a
relation instance to have the same values for all the attributes of the key.

In a relation schema, the attributes forming a key are underlined (in these notes,
marked *like this*).

e.g

Movies(*title*, *year*, length, genre)

so there cannot be two tuples for this relation which have the exact same values
for both title *and* year.
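
A minimal sketch of what the key constraint forbids, in Python. The function name violates_key and the extra 'conflicting' tuple are hypothetical, added only to show a violation.

```python
def violates_key(attributes, tuples, key):
    """Return True if two distinct tuples agree on every attribute of `key`."""
    positions = [attributes.index(a) for a in key]
    seen = set()
    for t in set(tuples):  # a relation is a set, so identical tuples count once
        key_value = tuple(t[p] for p in positions)
        if key_value in seen:
            return True
        seen.add(key_value)
    return False


ATTRS = ("title", "year", "length", "genre")
movies = [
    ("Gone With The Wind", 1939, 231, "drama"),
    ("Star Wars", 1977, 124, "sciFi"),
    ("Wayne's World", 1992, 95, "comedy"),
]

ok = violates_key(ATTRS, movies, ("title", "year"))

# Hypothetical tuple: same title and year as an existing movie, other values differ.
conflicting = movies + [("Star Wars", 1977, 130, "drama")]
bad = violates_key(ATTRS, conflicting, ("title", "year"))
```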

Important: many real life relations use *artificial keys*.
e.g: employee-id, social_security_number etc

2.2.8 An Example Database Schema

The schema itself is

Movies(
*title* : string,
*year* : integer,
length : integer,
genre : string,
studioName : string,
producerC# : integer
)

MovieStar(
*name* : string,
address : string,
gender : char,
birthdate : date
)

StarsIn(
*movieTitle* : string,
*movieYear* : integer,
*starName* : string
)

MovieExec(
name : string,
address : string,
*cert#* : integer,
netWorth : integer
)

Studio(
*name* : string,
address : string,
presC# : integer
)

Comments:
The relation Movies has a key consisting of the attributes title and year.
studioName is the name of the studio that owns the movie
producerC# is an integer that identifies the producer by certificate number (see
MovieExec comments)

The relation MovieStar has the name attribute as the key, we use here the
convenient fiction that movie star names are unique. A more conventional approach
like using SS number or assigning each individual a unique number would work.

new datatypes: character for Gender ('m' or 'f') and date.

The relation StarsIn connects stars to movies they act in, and movies to their
stars.
Note that the key consists of all three attributes. also, two of the
attributes are actually the movie relation's keys, though we use different names
here. (you need an explicit statement of this being a foreign key etc)

MovieExec represents movie executives. A unique certificate number is assigned to
each movie exec in the database, represented by the attribute cert#, which is the
key.

Studio represents movie studios. The key is the studio name (we assume two
movie studios don't have the same name).
We assume the movie studio has a president who is a movie executive, and so the
MovieExec relation's key is present in this relation (as presC#), to identify the
individual who is the studio's president.

2.2.9 Exercises For Section 2.2

2.3 Defining A Relation Schema In SQL

SQL is the principal language used to describe and manipulate relational databases.
The book treats SQL-99 as the current standard (the standard has been revised
several times since).
There are two aspects to SQL
1. The Data Definition sublanguage (DDL) - for describing database schemas
2. The Data Manipulation sublanguage (DML) - for querying and modifying databases

Here, an overview of the DDL. More in Chapters 6 and 7.


2.3.1 Relations In SQL

SQL makes a distinction between three kinds of relations

1. tables (stored relations) - 'ordinary' relations, that exist in the db, and can
be modified and queried.
2. views - relations that are defined by a computation. They are not stored,
but are constructed, in whole or part, as needed. (section 8.1)
3. temporary tables - constructed by the SQL language processor while it
executes queries and data modifications. These relations are then thrown
away and not stored.

In this section - how to declare tables (the first kind, not the second or third).

The CREATE TABLE statement declares the schema for a stored relation: it gives the
name of the table, its attributes, and their data types. It also allows us to
declare a key, or even several keys. Other features, such as further constraints
and the declaration of indexes, come later.

2.3.2 Data Types


Primitive data types supported by SQL
1. Character strings of fixed or varying lengths.
CHAR(n) denotes a fixed-length string of n characters; a shorter value is
typically padded with trailing blanks to length n.
VARCHAR(n) denotes a varying-length string of up to n characters. Exactly how the
two differ (padding, end-of-string markers) is implementation dependent.

2. BIT(n) and BIT VARYING(n) denote bit strings of fixed and varying lengths
(up to n) respectively.

3. BOOLEAN denotes an attribute whose value is logical - TRUE, FALSE, *and
UNKNOWN*.

4. The type INT, or INTEGER, for integers.

5. FLOAT or REAL (they are synonyms) for floating point numbers. Also
DECIMAL(n, d) for fixed-point decimals: n digits, with the decimal point d
positions from the right, e.g. 0123.45 is a possible value of type DECIMAL(6, 2).

6. DATE and TIME for dates and times, e.g. DATE '1948-05-14' and
TIME '15:03:02.5'.

2.3.3 Simple Table Declarations

The simplest way to declare a relation is to use the keywords CREATE TABLE followed
by the name of a relation, and a parenthesized, comma separated list of the
attribute names and their types.

e.g:

CREATE TABLE Movies (


title CHAR(100),
year INT,
length INT,
genre CHAR(10),
studioName CHAR(30),
producerC# INT
);    -- note: the statement ends with a semicolon

e.g:

CREATE TABLE MovieStar(


name CHAR(30),
address VARCHAR(255),
gender CHAR(1),
birthdate DATE
);

2.3.4 Modifying Relation Schemas

We know (previous section) how to declare a table.

We can change an existing schema in two ways
1. we can delete a relation completely from the database
2. we can modify the schema of an existing relation (the more common operation)

Delete a relation R from the database with the statement

DROP TABLE R;

Relation R is then no longer part of the database schema, and its tuples can no
longer be accessed.

To modify an existing relation, we *start with* the keywords ALTER TABLE followed
by the name of the relation.
We then have several options, the most important of which are
a. ADD followed by an attribute name and its data type
b. DROP followed by an attribute name.

e.g ALTER TABLE MovieStar ADD phone CHAR(16);

The tuples of MovieStar now have a phone component, but existing tuples have the
special value NULL there.

e.g ALTER TABLE MovieStar DROP birthdate;

This removes the birthdate attribute, so no tuple of MovieStar has a birthdate
component any longer.

2.3.5 Default Values

When we create or modify tuples ('in' a specific relation instance) sometimes we
don't have values for all the attributes.
As mentioned above, when we add a column to a relation, the existing tuples don't
have a value for the newly introduced attribute, and so the value in that column
for already existing tuples is NULL.
But we may want a different default value, which we declare with the keyword
DEFAULT followed by the value.

e.g:

ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';

A default can also be given when the table is declared:

CREATE TABLE MovieStar(
name CHAR(30) PRIMARY KEY,
address VARCHAR(255),
gender CHAR(1) DEFAULT '?',
birthdate DATE
);
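
To see NULL versus a declared default in action, here is a small sketch using Python's built-in sqlite3 module. SQLite is my choice for a runnable example, not something the book prescribes; it is lenient about type names and wants the COLUMN keyword spelled out here for clarity. The fax column is hypothetical, added only to contrast with phone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute(
    "CREATE TABLE MovieStar ("
    " name CHAR(30), address VARCHAR(255), gender CHAR(1), birthdate DATE)"
)
conn.execute("INSERT INTO MovieStar (name) VALUES ('Carrie Fisher')")

# Column added without DEFAULT: the existing tuple gets NULL (None in Python).
conn.execute("ALTER TABLE MovieStar ADD COLUMN fax CHAR(16)")

# Column added with DEFAULT: the existing tuple gets the default instead.
conn.execute("ALTER TABLE MovieStar ADD COLUMN phone CHAR(16) DEFAULT 'unlisted'")

row = conn.execute("SELECT name, fax, phone FROM MovieStar").fetchone()
conn.close()
```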

2.3.6 Declaring Keys

There are two kinds of declarations to indicate 'keyness' - PRIMARY KEY or UNIQUE
(more below).
There are two ways to declare an attribute, or a set of attributes, to be a key in
the CREATE TABLE statement
1. We may declare *one attribute* to be a key when that attribute is listed in
the schema.
Example:

CREATE TABLE MovieStar(


name CHAR(30) PRIMARY KEY,
address VARCHAR (255),
gender CHAR(1),
birthdate DATE
);

2. We may add to the *list of items declared in the schema* (so far the schema has
been only a list of attributes) an additional declaration stating that a
specific attribute, or a set of attributes, is a key. This form is required when
the key has more than one attribute.
Example:

CREATE TABLE MovieStar(


name CHAR(30),
address VARCHAR (255),
gender CHAR(1),
birthdate DATE,
PRIMARY KEY (name, gender)
);
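
A sketch of what a declared key buys us, again using Python's built-in sqlite3 (my choice of engine, not the book's). It declares the two-attribute key of Movies with the second form above and shows a duplicate key being rejected. The inserted rows other than the first are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movies ("
    " title CHAR(100), year INT, length INT, genre CHAR(10),"
    " PRIMARY KEY (title, year))"
)
conn.execute("INSERT INTO Movies VALUES ('Star Wars', 1977, 124, 'sciFi')")

# Hypothetical row: same title, different year. Allowed, the key is the pair.
conn.execute("INSERT INTO Movies VALUES ('Star Wars', 1999, 124, 'sciFi')")

try:
    # Hypothetical row: repeats an existing (title, year) pair, so it is refused.
    conn.execute("INSERT INTO Movies VALUES ('Star Wars', 1977, 95, 'comedy')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Movies").fetchone()[0]
conn.close()
```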

2.4 An Algebraic Query Language

This section covers the data manipulation aspect of the relational model. A data
model is not just a structure; there needs to be a way to query and modify the data.
We learn an algebra - a relational algebra - that consists of several ways to
construct new relations from existing relations.
When the given relations are stored data, the new relations can be answers to
queries on that data.

RelAlg is not used directly as a query language in real life databases, but the
'real' query language, SQL, incorporates relational algebra at its core.
Many SQL programs are 'syntactically sugared' relational algebra expressions. When
an RDBMS handles an SQL query, the first step is to translate the query into
relational algebra, or an equivalent internal representation.

2.4.1 Why Do We Need A Special Query Language?

Why not use an existing programming language like C?

Surprising answer: relational algebra is less powerful than C or Java and,
paradoxically, more useful because of it.
There are computations one can perform in (say) Java that one cannot in relalg.

E.g: compute whether the number of tuples in a relation is even or odd. (_ the
relational algebra of this chapter cannot express this; SQL can only because it
adds aggregation such as COUNT on top of the algebra)

But, by restricting what we can say or do in our query language, we get two huge
advantages.
1. ease of programming (_ because the language does not try to express
*everything* that is Turing computable)
2. the ability of the compiler to produce highly optimized code (_ again because
the language to be compiled is smaller/simpler)

2.4.2 What Is An Algebra?

An algebra in general consists of operators and atomic operands.
In arithmetic, for example, atomic operands are variables like x and constants like
15.
The operators are addition, subtraction etc.
An algebra allows us to build expressions, with parentheses to group operations.
(_ also I think an algebra is *closed* under its operators: a + b is still a
number)

In relational algebra the atomic operands are
1. variables that stand for relations
2. constants that are finite relations
In the next section we examine the operators of relalg.

2.4.3 Overview Of Relational Algebra

The operations of relalg fall into four categories.
a) set operations - union, intersection, difference - applied to relations.
b) operations that remove part of a relation - 'selection' eliminates some
tuples (rows), 'projection' eliminates some columns.
c) operations that combine tuples of two relations - including 'cartesian
product', which pairs tuples of two relations in all possible ways, and various
kinds of join operations which selectively pair tuples from two relations.
d) an operation called renaming, which does not affect the tuples of a relation
but changes the names of the attributes, and/or the name of the relation itself.

Expressions of relational algebra are known as 'queries'.

2.4.4 Set Operations On Relations


The set operations apply to relations R and S under these conditions
- R and S must have schemas with identical sets of attributes, with the same
types (domains) for each attribute
- before computing the result, the columns of R and S must be put in the same
attribute order
- sometimes R and S have the same number of attributes with corresponding
identical domains, but the attributes have different names in each relation; then
we first use the renaming operator (see section 2.4.11)

The following set operations are defined for relations

R union S = the set of tuples that are in R or S or both. Even if a tuple is
present in both R and S, it appears only once in the union (but, see relations as
bags below)

R intersection S = the set of tuples that are in both R and S

R difference S = the set of tuples that are in R but not in S

Example: (addresses are quoted because they contain commas)
Let R =
name, address, gender, birthdate
Carrie Fisher, "123 Maple St., Hollywood", F, 9/9/99
Mark Hamill, "456 Oak Rd., Brentwood", M, 8/8/88

Let S =
name, address, gender, birthdate
Carrie Fisher, "123 Maple St., Hollywood", F, 9/9/99
Harrison Ford, "789 Palm Dr., Beverly Hills", M, 7/7/77

then R union S =
name, address, gender, birthdate
Carrie Fisher, "123 Maple St., Hollywood", F, 9/9/99
Mark Hamill, "456 Oak Rd., Brentwood", M, 8/8/88
Harrison Ford, "789 Palm Dr., Beverly Hills", M, 7/7/77

R intersection S =
name, address, gender, birthdate
Carrie Fisher, "123 Maple St., Hollywood", F, 9/9/99

R difference S =
name, address, gender, birthdate
Mark Hamill, "456 Oak Rd., Brentwood", M, 8/8/88
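
Since relations are sets of tuples, Python's built-in set type gives a direct sketch of the three operations. The variable names are mine; the tuples follow the example above, with the schema (name, address, gender, birthdate) assumed to be in the same column order for both relations.

```python
carrie = ("Carrie Fisher", "123 Maple St., Hollywood", "F", "9/9/99")
mark = ("Mark Hamill", "456 Oak Rd., Brentwood", "M", "8/8/88")
harrison = ("Harrison Ford", "789 Palm Dr., Beverly Hills", "M", "7/7/77")

R = {carrie, mark}
S = {carrie, harrison}

union = R | S          # tuples in R or S or both; carrie appears only once
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S
```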

2.4.5 Projection

The projection operator (Greek pi) produces from a relation R a new relation that
has only some of R's columns.

The value of the expression pi_A1, A2, ..., An (R) is a relation that has only
the columns for attributes A1, A2, ..., An of R.
The schema of the resulting relation is the set of attributes
{A1, A2, ..., An}, which we conventionally show in the order listed.

Let the relation 'Movies' be

title, year, length, genre, studioName, producerC#


Star Wars, 1977, 124, scifi, Fox, 12345
Galaxy Quest, 1999, 104, comedy, DreamWorks, 67890
Wayne's World, 1992, 95, comedy, Paramount, 99999

Example 2.9
pi_title, year, length(Movies) gives us the relation

title,year, length
Star Wars, 1977, 124
Galaxy Quest, 1999, 104
Wayne's World, 1992, 95

Example

pi_genre(Movies) gives us

genre
scifi
comedy

Note: only two tuples, although Movies has three; the two 'comedy' values collapse
into one (rem: a relation instance is a *set* of tuples)
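
A minimal Python sketch of projection over the Movies relation above; the function name project and the (attributes, set of tuples) representation are my own. Building the result as a set is what removes the duplicate 'comedy'.

```python
ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "scifi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}


def project(attributes, tuples, wanted):
    """Keep only the columns named in `wanted`, in that order."""
    positions = [attributes.index(a) for a in wanted]
    return {tuple(t[p] for p in positions) for t in tuples}


title_year_length = project(ATTRS, MOVIES, ("title", "year", "length"))
genres = project(ATTRS, MOVIES, ("genre",))
```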

2.4.6 Selection

The selection operator, applied to a relation R, produces a new relation with a
subset of R's tuples: those that satisfy a condition C (a boolean expression) that
involves R's attributes (and constants). The schema of the result is the same as
R's.

We apply the condition C to each tuple t of R, substituting for each attribute A in
the condition the value of t's component for A. If C then evaluates to true, t is
included in the result; otherwise it is not.

Let the relation 'Movies' be

title, year, length, genre, studioName, producerC#


Star Wars, 1977, 124, scifi, Fox, 12345
Galaxy Quest, 1999, 104, comedy, DreamWorks, 67890
Wayne's World, 1992, 95, comedy, Paramount, 99999

then

select_(length > 100) (Movies) gives

title, year, length, genre, studioName, producerC#


Star Wars, 1977, 124, scifi, Fox, 12345
Galaxy Quest, 1999, 104, comedy, DreamWorks, 67890

select_(length >= 100 AND studioName = 'Fox') (Movies)

gives

title, year, length, genre, studioName, producerC#


Star Wars, 1977, 124, scifi, Fox, 12345
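
A sketch of selection in Python. The condition is passed as a function over a row viewed as a dict from attribute name to value, which mirrors 'substitute each attribute by the tuple's value'. The function name select is my own.

```python
ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "scifi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}


def select(attributes, tuples, condition):
    """Keep the tuples for which condition(row_as_dict) evaluates to true."""
    return {t for t in tuples if condition(dict(zip(attributes, t)))}


long_movies = select(ATTRS, MOVIES, lambda m: m["length"] > 100)
long_fox_movies = select(
    ATTRS, MOVIES, lambda m: m["length"] >= 100 and m["studioName"] == "Fox"
)
```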

2.4.7 Cartesian Product


The Cartesian product (or cross product) of two *sets* R and S, denoted R X S, is
the set of pairs formed by choosing (in all possible ways) the first element of
the pair from R and the second from S.

Essentially the same for relations, but the elements being paired are tuples, which
generally have more than one component.
The result of pairing one tuple from R with one from S is a longer tuple, with one
component for each attribute of R and each attribute of S.
Conventionally, the attributes of R precede the attributes of S in the result.
If R and S have an attribute A in common, we use R.A and S.A to tell the two apart
in the schema of the result.
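
A short Python sketch of the product, including the R.A / S.A renaming of shared attribute names. The function name cartesian_product is mine, and the sample relations R(A, B) and S(B, C, D) are the small ones used for the natural join in the next section.

```python
from itertools import product as all_pairs


def cartesian_product(r_attrs, r_tuples, s_attrs, s_tuples):
    """Return (attributes, tuples) of R X S, qualifying shared names as R.A / S.A."""
    shared = set(r_attrs) & set(s_attrs)
    attrs = tuple(f"R.{a}" if a in shared else a for a in r_attrs) + tuple(
        f"S.{a}" if a in shared else a for a in s_attrs
    )
    # Concatenating the Python tuples puts R's components before S's.
    return attrs, {r + s for r, s in all_pairs(r_tuples, s_tuples)}


R_ATTRS, R = ("A", "B"), {(1, 2), (3, 4)}
S_ATTRS, S = ("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)}

product_attrs, product_tuples = cartesian_product(R_ATTRS, R, S_ATTRS, S)
```

With 2 tuples in R and 3 in S, the product has 2 * 3 = 6 tuples.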

2.4.8 Natural Joins


More often than the full product, we want to pair only those tuples that match in
some way.

The simplest such match is the natural join: we join a tuple from R with a tuple
from S (into a new tuple of the resulting relation) only when the two tuples agree
on whatever attributes are common to the schemas of R and S.

More precisely, let A1, A2, ..., An be all the attributes that are in both the
schema of R and the schema of S.

Then a tuple from R and a tuple from S are joined (to form a tuple in R nj S) if
and only if they agree on every one of A1, ..., An. The joined tuple has one
component for each attribute in the union of the two schemas.

let relation R be

A,B
1,2
3,4

let relation S be

B,C,D
2,5,6
4,7,8
9,10,11

then R nj S is (note that the common attribute here is B)

A,B,C,D
1,2,5,6
3,4,7,8

Here the only common attribute between relations R and S is B


So tuples of R and S need only agree in the value of B to be joined as tuples of R
nj S

For a more complex example let R be

A,B,C
1,2,3
6,7,8
9,7,8

Let S be

B,C,D
2,3,4
2,3,5
7,8,10
Here the common attributes are B *and* C.

so R nj S
has tuples

A,B,C,D
1,2,3,4
1,2,3,5
6,7,8,10
9,7,8,10

each tuple in R is joined to each tuple in S where the common attribute values
match.
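
The definition can be sketched in Python as a nested loop over both relations (a naive approach; real systems use much smarter join algorithms). The function name natural_join is mine; the data is the 'more complex' example above.

```python
def natural_join(r_attrs, r_tuples, s_attrs, s_tuples):
    """Pair tuples that agree on all common attributes; keep one copy of each."""
    common = [a for a in r_attrs if a in s_attrs]
    s_only = [a for a in s_attrs if a not in r_attrs]
    out_attrs = tuple(r_attrs) + tuple(s_only)
    out = set()
    for r in r_tuples:
        r_row = dict(zip(r_attrs, r))
        for s in s_tuples:
            s_row = dict(zip(s_attrs, s))
            if all(r_row[a] == s_row[a] for a in common):
                out.add(r + tuple(s_row[a] for a in s_only))
    return out_attrs, out


R_ATTRS, R = ("A", "B", "C"), {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
S_ATTRS, S = ("B", "C", "D"), {(2, 3, 4), (2, 3, 5), (7, 8, 10)}

join_attrs, join_tuples = natural_join(R_ATTRS, R, S_ATTRS, S)
```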

2.4.9 Theta Joins

The natural join combines tuples from R and S on *one* specific condition - the
equality of shared attribute values.
It is sometimes necessary to combine tuples based on *other* conditions.
For this purpose we have the 'theta join', in which theta stands for an arbitrary
condition. We'll write C for the condition, and use the 'bowtie' notation of the
natural join (as in the textbook) with C as a subscript.

To compute the theta join:
1. Take the cross product of R and S
2. Select only those tuples of the product that satisfy condition C.
3. These tuples are the result of the theta join. The schema is that of the
product: the attributes of R followed by those of S, with shared names prefixed by
R. or S.

Example: (B and C are common attributes)

let R be

A,B,C
1,2,3
6,7,8
9,7,8

Let S be

B,C,D
2,3,4
2,3,5
7,8,10

We need R theta_join S where Condition is A < D

The result is
A, R.B, R.C , S.B, S.C, D (note: relation namespaced common attributes)
1,2,3,2,3,4
1,2,3,2,3,5
1,2,3,7,8,10
6,7,8,7,8,10
9,7,8,7,8,10

Note: In the case of a theta join there is no guarantee that shared attributes will
agree in value in the combined tuple (_ so we have to list them separately, prefixed
with the name of the relation, e.g. R.C, S.C)

Example: R theta_join S with Condition == A < D AND R.B != S.B

the resulting relation, with one tuple, is

A, R.B, R.C , S.B, S.C, D


1,2,3,7,8,10
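
The three steps (product, then selection) can be sketched in Python as follows. The function name theta_join is mine, and the condition is a function over the combined row viewed as a dict keyed by the qualified attribute names.

```python
def theta_join(r_attrs, r_tuples, s_attrs, s_tuples, condition):
    """Product of R and S, keeping only combined tuples that satisfy `condition`."""
    shared = set(r_attrs) & set(s_attrs)
    attrs = tuple(f"R.{a}" if a in shared else a for a in r_attrs) + tuple(
        f"S.{a}" if a in shared else a for a in s_attrs
    )
    result = set()
    for r in r_tuples:
        for s in s_tuples:
            if condition(dict(zip(attrs, r + s))):
                result.add(r + s)
    return attrs, result


R_ATTRS, R = ("A", "B", "C"), {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
S_ATTRS, S = ("B", "C", "D"), {(2, 3, 4), (2, 3, 5), (7, 8, 10)}

attrs, a_lt_d = theta_join(R_ATTRS, R, S_ATTRS, S, lambda t: t["A"] < t["D"])
_, a_lt_d_and_b_differs = theta_join(
    R_ATTRS, R, S_ATTRS, S, lambda t: t["A"] < t["D"] and t["R.B"] != t["S.B"]
)
```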

2.4.10 Combining Operations To Form Queries

basic idea: algebraic operations can be composed, the output of one operation
feeding into the input of another.
parentheses group operators.

example 2.17

Let the relation 'Movies' be

title, year, length, genre, studioName, producerC#


Star Wars, 1977, 124, scifi, Fox, 12345
Galaxy Quest, 1999, 104, comedy, DreamWorks, 67890
Wayne's World, 1992, 95, comedy, Paramount, 99999

we want the title and year of movies made by Fox that are at least 100 minutes
long

one option
a) *select* movies with studioName = 'Fox'
b) *select* movies with length >= 100
c) compute the intersection of the results of (a) and (b)
d) project the result of (c) onto title and year

we could also do
a) select movies with studioName = 'Fox' *AND* length >= 100
b) project the result onto title and year

Equivalent Expressions and Query Optimization.

Most db systems have a query language similar in power to relational algebra.
There are often many logically equivalent expressions that return the same
relation.
Some of these may be evaluated much more efficiently than others.
So a component called the query optimizer replaces a query with a logically
equivalent one that executes more efficiently.
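
To check that the two plans for the Fox query really are equivalent on the sample data, here is a small Python sketch. The helpers select and project are simplified versions written for this note, and comparing results on one instance is only a sanity check, not a proof of equivalence.

```python
ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "scifi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}


def select(tuples, condition):
    return {t for t in tuples if condition(dict(zip(ATTRS, t)))}


def project(tuples, wanted):
    positions = [ATTRS.index(a) for a in wanted]
    return {tuple(t[p] for p in positions) for t in tuples}


# Plan 1: two selections, an intersection, then a projection.
plan_1 = project(
    select(MOVIES, lambda m: m["studioName"] == "Fox")
    & select(MOVIES, lambda m: m["length"] >= 100),
    ("title", "year"),
)

# Plan 2: one selection with an AND condition, then a projection.
plan_2 = project(
    select(MOVIES, lambda m: m["studioName"] == "Fox" and m["length"] >= 100),
    ("title", "year"),
)
```

Plan 2 scans Movies once instead of twice, which is the kind of difference a query optimizer looks for.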

2.4.11 Naming And Renaming


It is often convenient to have an operator that renames a relation and, optionally,
its attributes.

The book's notation is a rho with the new schema as a subscript,
rho_S(A1, ..., An) (R); in these notes I write
rename(rel_name, target_name, attributes) instead.
So rename(R, S, A1, ..., An) gives a relation named S that has the same tuples as
R, but with the attributes renamed (in order) to A1 through An.
If we want to keep the attribute names intact we write rename(R, S).

e.g rename(S, S, X, Y, Z) keeps the name S but renames its three attributes (from,
say, A, B, C) to X, Y, Z.

(no concrete example with data in these notes)

2.4.12 Relationships Among Operators

some operators can be expressed in terms of others.

e.g intersection in terms of set difference

R intersect S = R - (R - S)

R theta-join(condition C) S = select_C (R X S)

R natural-join S = project_L (select_C (R X S))
where C = (R.A_1 = S.A_1 AND R.A_2 = S.A_2 AND ... AND R.A_n = S.A_n), A_1 thru
A_n are the attributes that appear in both schemas, and L keeps all the attributes
of R followed by the attributes of S that are not in R (iow, only one copy of each
shared attribute).

The 'core' or 'base' operations, none of which can be written in terms of the
others, are - union, difference, selection, projection, (cross) product, renaming.
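
Both identities can be spot-checked on the small relations from section 2.4.8, as in this Python sketch. The extra set T is hypothetical, chosen only so that the intersection is non-trivial, and a check on one instance is of course not a proof.

```python
from itertools import product

R = {(1, 2), (3, 4)}                        # schema R(A, B)
S = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}     # schema S(B, C, D)
T = {(3, 4), (5, 6)}                        # hypothetical, same schema as R

# Intersection written with difference only.
intersection_via_difference = R - (R - T)

# Natural join written as product, then selection, then projection.
pairs = {r + s for r, s in product(R, S)}            # (A, R.B, S.B, C, D)
matching = {t for t in pairs if t[1] == t[2]}        # select R.B = S.B
join_via_product = {(a, rb, c, d) for (a, rb, sb, c, d) in matching}
```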

2.4.13 A Linear Notation For Algebraic Relations


basic idea: instead of an expression tree or a nested (s-expression-like,
essentially functional) expression, use a sequence of assignments to new relation
variables.

so
R(t,y,l,g,s,p) := select (Movies, length >= 100)
S(t,y,l,g,s,p) := select (Movies, studioName = 'Fox')
T(t,y,l,g,s,p) := R intersection S
Answer(title, year) := project(T, t, y)

basically every interior node of the tree gets its own variable, on which operators
higher up in the tree operate.

2.4.14 Exercises For Section 2.4

(from the exercises)

(ex: 2.4.6) an operator is said to be monotone when, if a tuple is added to any of
its arguments, the result of the operator contains every tuple it did before, and
*possibly* more tuples (after adding tuples to its arguments).

Which of the operators we learned are monotone?


1. union is monotone.
2. intersection is monotone. adding tuples to either relation can only
increase the number of tuples in the intersection, never reduce it.
3. difference: consider R difference S. this is the set of tuples that are in
R but not in S. adding a tuple to R cannot remove anything from the result, but if
you add to S a tuple that is in R, that tuple drops out of the result. So
difference is *not* monotone.
4. selection is monotone
5. projection is monotone.
6. crossproduct is monotone
7. natural join is monotone
8. theta join is monotone
9. renaming is monotone (it does make sense: renaming keeps every tuple, so adding
a tuple to its argument just adds that tuple to the result)
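
A tiny Python sketch of the counterexample for difference, next to union for contrast. The relations are made-up integer pairs; 'monotone' is tested here as 'the old result is a subset of the new result'.

```python
R = {(1, 2), (3, 4)}
S = {(5, 6)}
S_bigger = S | {(1, 2)}   # add to S a tuple that is also in R

difference_before = R - S
difference_after = R - S_bigger
# The tuple (1, 2) was in the result before and is gone afterwards.
difference_kept_everything = difference_before <= difference_after

union_before = R | S
union_after = R | S_bigger
union_kept_everything = union_before <= union_after
```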

(ex: 2.4.7)
Suppose relations R and S have m and n tuples respectively (Note: m and n are
reversed from the text). Give the minimum and maximum numbers of tuples that the
results of the following expressions can have.

a. R union S
maximum = m + n (no common tuples between R and S)
minimum = max(m, n) (the smaller relation is a subset of the larger one)

a2. R intersection S
maximum = min(m, n) (the smaller relation is a subset of the larger one)
minimum = 0 (no tuples in common)

b. R natural join S
maximum = m * n (no attributes in common, or every pair of tuples agrees on the
common attributes)
minimum = 0 (have common attributes, which have no equal values in R, S)

c. sigma_C (R) cross S, for some condition C
maximum = m * n (selection returns all of R's tuples)
minimum = 0 (selection returns 0 of R's tuples)

d. project_L (R) difference S, for some list of attributes L
maximum = m (projection keeps m distinct tuples, none of which is in S)
minimum = 0 (every tuple of the projection is also a tuple of S)

(aside, previewing section 2.5) The third important aspect of the relational model
is the ability to restrict the data that may be stored in the database.
so far we have seen only one kind of constraint, that of one or more attributes
acting as a key.
there are many more kinds of constraints
e.g: 'referential integrity constraints' - a value in one column of a
relation must also appear in another column of the same relation or in a column of
another relation.
(here we use relalg but in chapter 7 we see how SQL can express the same
constraints)

(ex: 2.4.8)
The semijoin of relations R and S is the set of tuples t in R s.t there is at least
one tuple u in S such that t and u agree on all the attributes that R and S have in
common.

a bit abstract, so try with the data for the natural join

let relation R be

A,B
1,2
3,4

let relation S be

B,C,D
2,5,6
4,7,8
9,10,11
then R semijoin S is (note that the common attribute here is B)

A,B
1,2
3,4

(in this case, R semijoin S is the same as R)

S semijoin R would be

B,C,D
2,5,6
4,7,8

(this seems like a natural join but only the tuples in R are in the result, there
is no 'join')

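so
1. project R natural-join S onto the attributes of R
2. the same, but expressed with a theta join (equate each common attribute, then
project onto R's attributes)
3. join R with the projection of S onto the common attributes, i.e. keep the tuples
of R whose *combination* of common-attribute values appears in some single tuple of
S. (Testing each attribute separately, R.a elementOf S.a and R.b elementOf S.b, is
not enough when there are several common attributes.)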
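
A Python sketch of the semijoin, following the third formulation: collect the combinations of common-attribute values that occur in S, then keep the R tuples whose combination is among them. The function name semijoin is my own.

```python
def semijoin(r_attrs, r_tuples, s_attrs, s_tuples):
    """Tuples of R that agree with at least one tuple of S on all common attributes."""
    common = [a for a in r_attrs if a in s_attrs]
    r_pos = [r_attrs.index(a) for a in common]
    s_pos = [s_attrs.index(a) for a in common]
    # Each entry is one combination of common-attribute values taken from one S tuple.
    s_keys = {tuple(s[p] for p in s_pos) for s in s_tuples}
    return {r for r in r_tuples if tuple(r[p] for p in r_pos) in s_keys}


R_ATTRS, R = ("A", "B"), {(1, 2), (3, 4)}
S_ATTRS, S = ("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)}

r_semijoin_s = semijoin(R_ATTRS, R, S_ATTRS, S)
s_semijoin_r = semijoin(S_ATTRS, S, R_ATTRS, R)
```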

(ex:2.4.10 )

A relation R has attributes A1, A2, ..., A_n, B1, B2, ..., B_m.
Let S be a relation with schema B1, B2, ..., B_m. Iow, S's attributes are a subset
of R's.

R quotient S is the set of tuples t over A1 ... A_n (i.e. the non-S attributes of
R) such that for every tuple s in S, the tuple ts (the components of t followed by
the components of s) is a tuple of R.

(fair enough, but I suspect 'quotient' is clearer in terms of relations, see Set
Theory book)
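
A direct Python sketch of that definition, assuming R's tuples list the A components first and the B components last. The function name quotient and the sample data are hypothetical, made up for this note.

```python
def quotient(r_tuples, s_tuples, n):
    """R quotient S, where the first n components of each R tuple are the A's."""
    candidates = {r[:n] for r in r_tuples}
    # Keep t only if t followed by s is in R for *every* tuple s of S.
    return {t for t in candidates if all(t + s in r_tuples for s in s_tuples)}


# Hypothetical data: R(A, B) and S(B).
R = {(1, "x"), (1, "y"), (2, "x")}
S = {("x",), ("y",)}

result = quotient(R, S, 1)
```

Here 1 is paired with both 'x' and 'y' in R, so (1) is in the quotient; 2 is paired only with 'x', so (2) is not.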

2.5 Constraints on relations

third aspect of a data model == constraints on the data (_ first two: structure,
operations)
So far we only saw one kind of constraint, a set of attributes of a relation acting
as a key.
now we *also* look at 'referential integrity' constraints - iow, a value appearing
in a column of one relation must also appear in some column of the same (??!!)
or another relation.

2.5.1 Relational Algebra As A Constraint Language

(some confused writing here, but the key idea seems to be that there are 'two
ways' to express a constraint: as a subset relationship between two expressions, or
as an expression that must equal the empty set. Each style can be rewritten as the
other)

e.g. given : R subset-of S vs R - S = empty set
R subset-of empty set vs R = empty set

the 'equal to empty set' style is more prevalent in SQL
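
The two styles can be compared directly using Python sets as stand-ins for relations.
This is a sketch only; the function names are made up for these notes.

```python
def subset_style(r, s):
    """Constraint written as: R subset-of S."""
    return r <= s

def emptyset_style(r, s):
    """Same constraint written as: R - S = empty set."""
    return (r - s) == set()

ok = ({1, 2}, {1, 2, 3})    # R is contained in S
bad = ({1, 4}, {1, 2, 3})   # 4 is in R but not in S

# Both styles give the same verdict on each pair.
verdicts = [(subset_style(r, s), emptyset_style(r, s)) for r, s in (ok, bad)]
```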

2.5.2 Referential Integrity Constraints

Example: In our movie database, if a person p appears in the StarsIn relation,
under the 'starName' attribute, we also expect the same person p to appear in the
MovieStar relation, under the 'name' attribute.

Reminder: movie database schema

Movies(
*title* : string,
*year* : integer,
length : integer,
genre : string,
studioName : string,
producerC# : integer
)

MovieStar(
*name* : string,
address : string,
gender : char,
birthdate : date
)

StarsIn(
*movieTitle* : string,
*movieYear* : integer,
*starName* : string
)

MovieExec(
name : string,
address : string,
*cert#* : integer,
netWorth : integer
)

Studio(
*name* : string,
address : string,
presC# : integer
)

the attributes marked *...* make up the primary key of each relation.

In general, referential integrity constraint == if a value v occurs 'under' an
attribute A of *some* tuple in relation R, we also expect v to appear as the value of
attribute B in some tuple of relation S. This is driven by our design intentions.

We express this in relational algebra as

project_A (R) subsetOf project_B (S) (_ so B in S *can* have values not in R.A but
every value in R.A must be in S.B)
or with alternative notation

project_A (R) difference project_B (S) = empty set

Example 2.21

Consider these relations from our movie database

Movies(title, year, length, genre, studioName, producerC#)
MovieExec(name, address, cert#, netWorth)

we would expect every value of producerC# in Movies to appear as the cert# of
some executive (tuple) in MovieExec (_ otherwise there would be a producer who is
not a movieExec. also there can be movie execs who are not producers).

This constraint can be expressed as

project_producerC# (Movies) subsetOf project_cert#(MovieExec)
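
A sketch of checking this constraint over sample data. The rows below are invented for
illustration (they are not from the book) and only the attributes needed for the check
are included; dangling_producers is a made-up helper name.

```python
# Hypothetical rows, cut down to the attributes the constraint mentions.
movies = [
    {"title": "Movie One", "year": 1990, "producerC#": 101},
    {"title": "Movie Two", "year": 1995, "producerC#": 999},
]
movie_exec = [
    {"name": "Exec A", "cert#": 101},
    {"name": "Exec B", "cert#": 102},
]

def dangling_producers(movies, movie_exec):
    """project_producerC#(Movies) minus project_cert#(MovieExec)."""
    producers = {m["producerC#"] for m in movies}
    certs = {e["cert#"] for e in movie_exec}
    return producers - certs

violations = dangling_producers(movies, movie_exec)
# The constraint holds exactly when this set is empty. Here 999 has no
# matching executive, while Exec B (102) producing nothing is allowed.
```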

Example 2.22
A referential constraint where the 'value' involved is represented by more than one
attribute.
Any movie mentioned in the 'StarsIn' relation must appear in the Movies relation.
The key difference here is that movies are identified uniquely (via the primary key)
by title *and* year, so we use a subset of *pairs* to express this constraint

project_(movieTitle, movieYear) (StarsIn) subsetOf project_(title, year) (Movies)
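
A sketch of the pair-based check, with invented rows. The point of the sample data is
that comparing titles alone would miss the violation; missing_movies is a made-up
helper name.

```python
# Hypothetical rows: the same title exists for one year only.
movies = [{"title": "Remake", "year": 1960}]
stars_in = [
    {"movieTitle": "Remake", "movieYear": 1960, "starName": "Star A"},
    {"movieTitle": "Remake", "movieYear": 2005, "starName": "Star B"},
]

def missing_movies(stars_in, movies):
    """(movieTitle, movieYear) pairs of StarsIn with no (title, year) in Movies."""
    referenced = {(s["movieTitle"], s["movieYear"]) for s in stars_in}
    known = {(m["title"], m["year"]) for m in movies}
    return referenced - known

violations = missing_movies(stars_in, movies)

# Comparing only titles would wrongly report no problem.
titles_only = {s["movieTitle"] for s in stars_in} - {m["title"] for m in movies}
```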

2.5.3 Key Constraints

We use the same notation for key constraints, i.e.
to express "an attribute or set of attributes is a key for a relation"
e.g: 'name' is the key for the relation MovieStar(name, address, gender, birthdate)

Being a key means that two tuples which agree on name must agree on every other
attribute.
(For now we check only the address attribute; gender and birthdate need the same
treatment)
Let us rename the MovieStar relation to get two new 'names' MS1, MS2

rename_MS1(name, address, gender, birthdate) (MovieStar)
rename_MS2(name, address, gender, birthdate) (MovieStar)

and then

select_(MS1.name = MS2.name AND MS1.address != MS2.address) (MS1 X MS2) = empty set
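
The 'product of the relation with itself, then select' idea can be sketched as below.
The star rows are invented, and key_violations is a hypothetical helper, not anything
from the book.

```python
from itertools import product

def key_violations(rows, key, other):
    """Pairs from rows X rows that agree on `key` but differ on `other`."""
    return [(t1, t2) for t1, t2 in product(rows, rows)
            if t1[key] == t2[key] and t1[other] != t2[other]]

# Hypothetical MovieStar rows; "Star A" appears with two addresses.
movie_star = [
    {"name": "Star A", "address": "1 Example St"},
    {"name": "Star A", "address": "2 Sample Ave"},
    {"name": "Star B", "address": "3 Test Rd"},
]

bad_pairs = key_violations(movie_star, "name", "address")
# The key constraint holds exactly when this list is empty. Each offending
# pair shows up twice, as (t1, t2) and (t2, t1), just as it would in MS1 X MS2.
```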

2.5.4 Additional Constraint Examples

There are many kinds of constraints that can be expressed with relational algebra,
which are used for restricting database contents.

Two examples: a domain constraint, then a constraint spanning two relations

1. gender must be either 'M' or 'F' on relation MovieStar becomes
select_(gender != 'M' AND gender != 'F') (MovieStar) = empty set

2. to be a movie studio president, you need a net worth of at least 10 million.

given
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)

step 1. Studio ThetaJoin(presC# = cert#) MovieExec
step 2. select_(netWorth < 10,000,000) [Studio ThetaJoin(presC# = cert#)
MovieExec] = empty set

or
step 1. select_(netWorth >= 10,000,000) (MovieExec), then
step 2. project_(cert#) [select_(netWorth >= 10,000,000) (MovieExec)]
step 3. project_(presC#) (Studio) subsetOf (2)
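
Both formulations of the net worth constraint can be sketched side by side. The rows and
the function names are invented for illustration; only the attributes used by the
constraint are included.

```python
MINIMUM = 10_000_000

# Hypothetical rows.
movie_exec = [
    {"name": "Exec A", "cert#": 101, "netWorth": 25_000_000},
    {"name": "Exec B", "cert#": 102, "netWorth": 4_000_000},
]
rich_studio = [{"name": "Studio X", "presC#": 101}]
poor_studio = [{"name": "Studio Y", "presC#": 102}]

def emptyset_style(studio, movie_exec):
    """Theta join on presC# = cert#, select netWorth < MINIMUM, expect nothing."""
    joined = [(s, e) for s in studio for e in movie_exec
              if s["presC#"] == e["cert#"]]
    return [pair for pair in joined if pair[1]["netWorth"] < MINIMUM] == []

def subset_style(studio, movie_exec):
    """project_presC#(Studio) must be a subset of the cert#s of rich executives."""
    rich = {e["cert#"] for e in movie_exec if e["netWorth"] >= MINIMUM}
    presidents = {s["presC#"] for s in studio}
    return presidents <= rich

checks = [(emptyset_style(s, movie_exec), subset_style(s, movie_exec))
          for s in (rich_studio, poor_studio)]
```

The two styles agree here because every presC# has a matching cert#. If a president had
no MovieExec tuple at all, the join-based check would still pass while the subset check
would fail, so the second form also bundles in the referential integrity requirement.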

2.5.5 Exercises For Section 2.5
