**Simple SQL Queries**

An SQL query has the form:

SELECT . . .

FROM . . .

WHERE . . .;

The SELECT clause indicates which attributes should appear in the output.

The FROM clause gives the relation(s) the query refers to.

The WHERE clause is a Boolean expression indicating which tuples are of interest.

The query result is a relation.

Note that the result relation is unnamed.

Example SQL Query

Relation schema:

Course (courseNumber, name, noOfCredits)

Query:

Find all the courses stored in the database

Query in SQL:

SELECT *

FROM Course;

Note:

“*” means all the attributes of the relations involved.

Example SQL Query

Relation schema:

Movie (title, year, length, filmType)

Query:

Find the titles of all movies stored in the database

Query in SQL:

SELECT title

FROM Movie;

Example SQL Query

Relation schema:

Student (ID, firstName, lastName, address, GPA)

Query:

Find the ID of every student who has GPA > 3

Query in SQL:

SELECT ID

FROM Student

WHERE GPA > 3;

Example SQL Query

Relation schema:

Student (ID, firstName, lastName, address, GPA)

Query:

Find the ID and last name of every student with first name ’John’,

who has GPA > 3

Query in SQL:

SELECT ID, lastName

FROM Student

WHERE firstName = 'John' AND GPA > 3;

WHERE clause

The expressions that may follow WHERE are conditions

The standard comparison operators θ are =, <>, <, >, <=, and >=

The values that may be compared include constants and

attributes of the relation(s) mentioned in FROM clause

• Simple expression

• A op Value

• A op B

• Where A, B are attributes and op is a comparison operator

We may also apply the usual arithmetic operators, +,-,*,/, etc. to

numeric values before comparing them

• (year - 1930) * (year - 1930) < 100

The result of a comparison is a Boolean value TRUE or FALSE

Boolean expressions can be combined by the logical operators

AND, OR, and NOT

Example SQL Query

Relation schema:

Movie (title, year, length, filmType)

Query:

Find the titles of all color movies produced in 1990

Query in SQL:

SELECT title

FROM Movie

WHERE filmType = 'color' AND year = 1990;

Example SQL Query

Relation schema:

Movie (title, year, length, filmType)

Query:

Find the titles of all color movies that are either made after 1970 or

are less than 90 minutes long

Query in SQL:

SELECT title

FROM Movie

WHERE (year > 1970 OR length < 90) AND filmType = 'color';

Note on precedence rules:

AND takes precedence over OR, and

NOT takes precedence over both
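The effect of the precedence rules can be checked on a small instance. The sketch below is only an illustration: it uses Python's standard `sqlite3` module with an in-memory database, and the four movie tuples (M1 to M4) are made-up sample data, not part of the course database.

```python
import sqlite3

# In-memory database with a hypothetical Movie instance.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Movie (title TEXT, year INTEGER, length INTEGER, filmType TEXT)"
)
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?, ?, ?)",
    [
        ("M1", 1980, 120, "color"),
        ("M2", 1960, 80, "color"),
        ("M3", 1980, 120, "blackAndWhite"),
        ("M4", 1960, 120, "color"),
    ],
)

# Parentheses force the OR to be evaluated before the AND.
with_parens = [row[0] for row in conn.execute(
    "SELECT title FROM Movie "
    "WHERE (year > 1970 OR length < 90) AND filmType = 'color' "
    "ORDER BY title"
)]

# Without parentheses, AND binds tighter:
# year > 1970 OR (length < 90 AND filmType = 'color')
without_parens = [row[0] for row in conn.execute(
    "SELECT title FROM Movie "
    "WHERE year > 1970 OR length < 90 AND filmType = 'color' "
    "ORDER BY title"
)]

print(with_parens)
print(without_parens)
```

On this sample data the second query also returns the black-and-white movie M3, because its year alone satisfies the condition once the parentheses are dropped.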

Products and Joins

SQL has a simple way to couple relations in one query: list each relevant relation in the FROM clause

All the relations in the FROM clause are coupled through Cartesian product (×, in algebra)

Cartesian Product

From Set Theory:

The Cartesian Product of two sets R and S is the

set of all pairs (a, b) such that: a ∈ R and b ∈ S.

Denoted as R × S

Note:

• In general, R × S ≠ S × R

Example

Instance R:

A  B
1  2
3  4

Instance S:

B  C   D
2  5   6
4  7   8
9  10  11

R × S:

A  R.B  S.B  C   D
1  2    2    5   6
1  2    4    7   8
1  2    9    10  11
3  4    2    5   6
3  4    4    7   8
3  4    9    10  11
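As a quick check of the example, here is a minimal sketch in Python that builds the product of the two instances with `itertools.product`. Representing tuples of a relation as Python tuples is an assumption made for illustration only.

```python
from itertools import product

# Instances from the example: R(A, B) and S(B, C, D).
R = [(1, 2), (3, 4)]
S = [(2, 5, 6), (4, 7, 8), (9, 10, 11)]

# Each result tuple is a tuple of R followed by a tuple of S.
r_times_s = [r + s for r, s in product(R, S)]

# Swapping the operands puts the S attributes first.
s_times_r = [s + r for s, r in product(S, R)]

for row in r_times_s:
    print(row)
```

The product has |R| · |S| = 6 tuples, and as sets of ordered tuples the two products differ because the attribute order differs.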

Example

Instance of Student:

ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.

Instance of Course:

courseNumber  name             noOfCredits
Comp352       Data structures  3
Comp353       Databases        4

SELECT * FROM Student, Course;

ID   firstName  lastName  GPA  Address      courseNumber  name             noOfCredits
111  Joe        Smith     4.0  45 Pine av.  Comp352       Data structures  3
111  Joe        Smith     4.0  45 Pine av.  Comp353       Databases        4
222  Sue        Brown     3.1  71 Main st.  Comp352       Data structures  3
222  Sue        Brown     3.1  71 Main st.  Comp353       Databases        4
333  Ann        Johns     3.7  39 Bay st.   Comp352       Data structures  3
333  Ann        Johns     3.7  39 Bay st.   Comp353       Databases        4

Example

Instance of Student:

ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.

Instance of Course:

courseNumber  name             noOfCredits
Comp352       Data structures  3
Comp353       Databases        4

SELECT ID, courseNumber

FROM Student, Course;

ID   courseNumber
111  Comp352
111  Comp353
222  Comp352
222  Comp353
333  Comp352
333  Comp353

Example

Relation schemas:

Student (ID, firstName, lastName, address, GPA)

Ugrad (ID, major)

Query:

Find all information available about every undergraduate student

We can try to compute the Cartesian product (×)

SELECT * FROM Student, Ugrad;

Example

Instance of Student:

ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.

Instance of Ugrad:

ID   major
111  CS
333  EE

SELECT * FROM Student, Ugrad;

ID   firstName  lastName  GPA  Address      ID   major
111  Joe        Smith     4.0  45 Pine av.  111  CS
111  Joe        Smith     4.0  45 Pine av.  333  EE
222  Sue        Brown     3.1  71 Main st.  111  CS
222  Sue        Brown     3.1  71 Main st.  333  EE
333  Ann        Johns     3.7  39 Bay st.   111  CS
333  Ann        Johns     3.7  39 Bay st.   333  EE

Which tuples should be in the query result and which shouldn’t?

Example

Instance of Student:

ID   firstName  lastName  GPA  Address
111  Joe        Smith     4.0  45 Pine av.
222  Sue        Brown     3.1  71 Main st.
333  Ann        Johns     3.7  39 Bay st.

Instance of Ugrad:

ID   major
111  CS
333  EE

SELECT *

FROM Student, Ugrad

WHERE Student.ID = Ugrad.ID;

ID   firstName  lastName  GPA  Address      ID   major
111  Joe        Smith     4.0  45 Pine av.  111  CS
333  Ann        Johns     3.7  39 Bay st.   333  EE
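The difference between the plain product and the product filtered by a join condition can be reproduced on the instances above. This is a sketch using Python's `sqlite3` module with an in-memory database; the choice of SQLite and the column types are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Student (ID INTEGER, firstName TEXT, lastName TEXT,
                          address TEXT, GPA REAL);
    CREATE TABLE Ugrad (ID INTEGER, major TEXT);
    INSERT INTO Student VALUES (111, 'Joe', 'Smith', '45 Pine av.', 4.0);
    INSERT INTO Student VALUES (222, 'Sue', 'Brown', '71 Main st.', 3.1);
    INSERT INTO Student VALUES (333, 'Ann', 'Johns', '39 Bay st.', 3.7);
    INSERT INTO Ugrad VALUES (111, 'CS');
    INSERT INTO Ugrad VALUES (333, 'EE');
""")

# Cartesian product: every student paired with every Ugrad tuple.
product_rows = conn.execute("SELECT * FROM Student, Ugrad").fetchall()

# Join: keep only the pairs that describe the same student.
join_rows = conn.execute(
    "SELECT Student.ID, lastName, major "
    "FROM Student, Ugrad "
    "WHERE Student.ID = Ugrad.ID "
    "ORDER BY Student.ID"
).fetchall()

print(len(product_rows))
print(join_rows)
```

Because both relations have an attribute named ID, the condition has to qualify it with the relation name; an unqualified `ID` in the WHERE clause would be ambiguous.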

Joins in SQL

The above query is an example of a join operation

There are various kinds of joins, and we will study them in detail later

To join relations R₁, …, Rₙ in SQL:

List all these relations in the FROM clause

Express the conditions in the WHERE clause in order to get the desired join

Joining Relations

Relation schemas:

Movie (title, year, length, filmType)

Owns (title, year, studioName)

Query:

Find title, length, and studio name of every movie

Query in SQL:

SELECT Movie.title, Movie.length, Owns.studioName

FROM Movie, Owns

WHERE Movie.title = Owns.title AND Movie.year = Owns.year;

Is Owns in Owns.studioName necessary?

Joining Relations

Relation schemas:

Movie (title, year, length, filmType)

Owns (title, year, studioName)

Query:

Find the title and length of every movie produced by Disney

Query in SQL:

SELECT Movie.title, length

FROM Movie, Owns

WHERE Movie.title = Owns.title AND Movie.year = Owns.year

AND studioName = 'Disney';

Joining Relations

Relation schemas:

Movie (title, year, length, filmType)

Owns (title, year, studioName)

StarsIn (title, year, starName)

Query:

Find the title and length of each movie with Julia Roberts,

produced by Disney

Query in SQL:

SELECT Movie.title, Movie.length

FROM Movie, Owns, StarsIn

WHERE Movie.title = Owns.title AND Movie.year = Owns.year

AND Movie.title = StarsIn.title AND Movie.year = StarsIn.year

AND studioName = 'Disney' AND starName = 'Julia Roberts';

Example

Movie:

title  year  length  filmType
T1     1990  124     color
T2     1991  144     color

Owns:

title  year  studioName
T1     1990  Disney
T2     1991  MGM

StarsIn:

title  year  starName
T1     1990  JR
T2     1991  JR

SELECT Movie.title, Movie.length

FROM Movie, Owns, StarsIn

WHERE Movie.title = Owns.title AND Movie.year = Owns.year AND

Movie.title = StarsIn.title AND Movie.year = StarsIn.year AND

studioName = 'Disney' AND starName = 'Julia Roberts';

Result:

title  length
T1     124

Example

Relation schemas:

Movie (title, year, length, filmType, studioName, producerC#)

Exec (name, address, cert#, netWorth)

Query:

Find the name of the producer of “Star Wars”

Query in SQL:

SELECT Exec.name

FROM Movie, Exec

WHERE Movie.title = 'Star Wars' AND

Movie.producerC# = Exec.cert#;

Example

Relation schemas:

Movie (title, year, length, filmType, studioName, producerC#)

Exec (name, address, cert#, netWorth)

Query:

Find the name of the producer of “Star Wars”

Query with Subquery:

SELECT name

FROM Exec

WHERE cert# = ( SELECT producerC#

FROM Movie

WHERE title = 'Star Wars' );

Example

Relation schemas:

Movie(title, year, length, filmType, studioName, producerC#)

Exec(name, address, cert#, netWorth)

StarsIn(title, year, starName)

Query: Find the names of the producers of Harrison Ford’s movies

Query in SQL:

SELECT name

FROM Exec

WHERE cert# IN (SELECT producerC#

FROM Movie

WHERE (title, year) IN (SELECT title, year

FROM StarsIn

WHERE starName = 'Harrison Ford'));

Example

Relation schemas:

Movie(title, year, length, filmType, studioName, producerC#)

Exec(name, address, cert#, netWorth)

StarsIn(title, year, starName)

Query:Find names of the producers of Harrison Ford’s movies

Query in SQL:

SELECT Exec.name

FROM Exec, Movie, StarsIn

WHERE Exec.cert# = Movie.producerC# AND

Movie.title = StarsIn.title AND

Movie.year = StarsIn.year AND

starName = 'Harrison Ford';

Correlated Subqueries

Relation schema:

Movie(title, year, length, filmType, studioName, producerC#)

Query:

Find movie titles that appear more than once

Query in SQL:

SELECT title

FROM Movie Old

WHERE year < ANY (SELECT year

FROM Movie

WHERE title = Old.title);

Note the scopes of the variables in this query.

Correlated Subqueries

Query in SQL

SELECT title

FROM Movie Old

WHERE year < ANY (SELECT year

FROM Movie

WHERE title = Old.title);

The condition in the outer WHERE is true only if there is a movie with the same title as Old.title that has a later year

The query produces each title one fewer times than there are movies with that title

What would be the result if we used “<>” instead of “<”?

For a movie title appearing 3 times, we would get 3 copies of the title in the output
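The counting argument can be checked on a small instance. The sketch below uses Python's `sqlite3` module; since SQLite does not support the `ANY` quantifier, the condition `year < ANY (...)` is rewritten here as an equivalent `EXISTS` subquery, which is a substitution made for this illustration and not the formulation used in the slides. The Movie tuples are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Movie (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO Movie VALUES (?, ?)",
    [
        ("King Kong", 1933),
        ("King Kong", 1976),
        ("King Kong", 2005),
        ("Star Wars", 1977),
    ],
)

# "year < ANY (subquery)" holds iff some movie with the same title
# has a later year; EXISTS expresses the same test.
later = [row[0] for row in conn.execute(
    "SELECT title FROM Movie Old "
    "WHERE EXISTS (SELECT 1 FROM Movie "
    "              WHERE title = Old.title AND year > Old.year)"
)]

# The "<>" variant: some movie with the same title has a different year.
different = [row[0] for row in conn.execute(
    "SELECT title FROM Movie Old "
    "WHERE EXISTS (SELECT 1 FROM Movie "
    "              WHERE title = Old.title AND year <> Old.year)"
)]

print(later)
print(different)
```

Inside the subquery, the unqualified names `title` and `year` refer to the inner copy of Movie, while `Old.title` and `Old.year` refer to the tuple of the outer query.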

Aggregation in SQL

SQL provides five operators that apply to a column of

a relation and produce “some kind of summary”

These operators are called aggregations

These operators are used by applying them to a

scalar-valued expression, typically a column name, in

a SELECT clause

Aggregation Operators

SUM

the sum of values in the column

AVG

the average of values in the column

MIN

the least value in the column

MAX

the greatest value in the column

COUNT

the number of values in the column, including the duplicates, unless

the keyword DISTINCT is used explicitly

Example

Relation schema:

Exec(name, address, cert#, netWorth)

Query:

Find the average net worth of all movie executives

Query in SQL:

SELECT AVG(netWorth)

FROM Exec;

The sum of “all” values in the column netWorth divided by

the number of these values

In general, if a tuple appears n times in a relation, it will be

counted n times when computing the average

Example

Relation schema:

Exec (name, address, cert#, netWorth)

Query:

How many tuples are there in the Exec relation?

Query in SQL:

SELECT COUNT(*)

FROM Exec;

The use of * as a parameter is unique to COUNT;

using * does not make sense for other aggregation operations

Example

Relation schema:

Exec (name, address, cert#, netWorth)

Query:

How many different names are there in the Exec relation?

Query in SQL:

SELECT COUNT (DISTINCT name)

FROM Exec;

At query processing time, the system first eliminates the duplicates from column name, and then counts the number of remaining values
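The three aggregate queries above can be tried on a tiny instance. This sketch uses Python's `sqlite3` module; the table is named `MovieExec` and the certificate column `certNo` here, which are renamings chosen for this illustration to avoid the `#` character, and the three tuples are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE MovieExec (name TEXT, address TEXT, certNo INTEGER, netWorth REAL)"
)
conn.executemany(
    "INSERT INTO MovieExec VALUES (?, ?, ?, ?)",
    [
        ("Ann", "1 First st.", 1, 10.0),
        ("Bob", "2 Second st.", 2, 20.0),
        ("Ann", "3 Third st.", 3, 30.0),  # a second executive named Ann
    ],
)

avg_worth = conn.execute("SELECT AVG(netWorth) FROM MovieExec").fetchone()[0]
tuple_count = conn.execute("SELECT COUNT(*) FROM MovieExec").fetchone()[0]
name_count = conn.execute(
    "SELECT COUNT(DISTINCT name) FROM MovieExec"
).fetchone()[0]

print(avg_worth, tuple_count, name_count)
```

COUNT(*) counts all three tuples, while COUNT(DISTINCT name) counts the name Ann only once.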

Aggregation -- Grouping

Often we need to consider the tuples in an SQL query in

groups, with regard to the value of some other column(s)

Example: suppose we want to compute:

Total length in minutes of movies produced by each studio:

Movie(title, year, length, filmType, studioName, producerC#)

We must group the tuples in the Movie relation according to

their studio, and get the sum of the length values within each

group; the result would be something like:

studio SUM(length)

Disney 12345

MGM 54321

… …

Aggregation - Grouping

Relation schema:

Movie(title, year, length, filmType, studioName, producerC#)

Query: What is the total length in minutes produced by each studio?

Query in SQL:

SELECT studioName, SUM(length)

FROM Movie

GROUP BY studioName;

Any aggregation used in the SELECT clause is applied only within each group

Only those attributes mentioned in the GROUP BY clause may

appear unaggregated in the SELECT clause

Can we use GROUP BY without using aggregation? (Yes/No)

Aggregation -- Grouping

Relation schema:

Movie(title, year, length, filmType, studioName, producerC#)

Exec(name, address, cert#, netWorth)

Query:

For each producer (name), list the total length of the films produced

Query in SQL:

SELECT Exec.name, SUM(Movie.length)

FROM Exec, Movie

WHERE Movie.producerC# = Exec.cert#

GROUP BY Exec.name;

Aggregation – HAVING clause

We might be interested in not all but some groups of tuples

that satisfy certain conditions

We can follow a GROUP BY clause with a HAVING clause

HAVING is followed by some conditions about the group

HAVING normally follows a GROUP BY clause; without GROUP BY, the whole result is treated as a single group

Aggregation – HAVING clause

Relation schema:

Movie (title, year, length, filmType, studioName, producerC#)

Exec(name, address, cert#, netWorth)

Query:

For those producers who made at least one film prior to 1930, list the

total length of the films produced

Query in SQL:

SELECT Exec.name, SUM(Movie.length)

FROM Exec, Movie

WHERE producerC# = cert#

GROUP BY Exec.name

HAVING MIN(Movie.year) < 1930;

Aggregation – HAVING clause

This query chooses the group based on the property of the group

SELECT Exec.name, SUM(Movie.length)

FROM Exec, Movie

WHERE producerC# = cert#

GROUP BY Exec.name

HAVING MIN(Movie.year) < 1930;

This query chooses the movies based on the property of each movie tuple

SELECT Exec.name, SUM(Movie.length)

FROM Exec, Movie

WHERE producerC# = cert# AND Movie.year < 1930

GROUP BY Exec.name;

Note the difference!
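The difference between the two queries shows up on a small instance. This is a sketch using Python's `sqlite3` module; the names `MovieExec`, `certNo`, and `producerCNo` are renamings chosen for this illustration, and the tuples are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE MovieExec (name TEXT, certNo INTEGER);
    CREATE TABLE Movie (title TEXT, year INTEGER, length INTEGER,
                        producerCNo INTEGER);
    INSERT INTO MovieExec VALUES ('Ann', 1);
    INSERT INTO MovieExec VALUES ('Bob', 2);
    INSERT INTO Movie VALUES ('Old1', 1925, 60, 1);
    INSERT INTO Movie VALUES ('New1', 1950, 100, 1);
    INSERT INTO Movie VALUES ('New2', 1960, 90, 2);
""")

# HAVING filters whole groups: Ann's group qualifies because her
# earliest movie is from 1925, and then ALL her movies are summed.
having_rows = conn.execute("""
    SELECT MovieExec.name, SUM(Movie.length)
    FROM MovieExec, Movie
    WHERE producerCNo = certNo
    GROUP BY MovieExec.name
    HAVING MIN(Movie.year) < 1930
""").fetchall()

# WHERE filters individual tuples before grouping: only the
# pre-1930 movies themselves are summed.
where_rows = conn.execute("""
    SELECT MovieExec.name, SUM(Movie.length)
    FROM MovieExec, Movie
    WHERE producerCNo = certNo AND Movie.year < 1930
    GROUP BY MovieExec.name
""").fetchall()

print(having_rows)
print(where_rows)
```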

Order By

The queries we have looked at so far return an unordered relation/bag; an ORDER BY clause sorts the result

Movie (title, year, length, filmType, studioName, producerC#)

Exec (name, address, cert#, netWorth)

SELECT Exec.name, SUM(Movie.length)

FROM Exec, Movie

WHERE producerC# = cert#

GROUP BY Exec.name

HAVING MIN(Movie.year) < 1930

ORDER BY Exec.name ASC;

In general:

ORDER BY A1 ASC, B DESC, C ASC;

Database Modifications

SQL & Database Modifications?

Next we will look at SQL statements that do not return something,

but rather change the state of the database

There are three types of such SQL statements/transactions:

Insert tuples into a relation

Delete certain tuples from a relation

Update values of certain components of certain existing tuples

We refer to these types of operations collectively as database

modifications, and refer to such requests as transactions

Insertion

The insertion statement consists of:

The keyword INSERT INTO

The name of a relation R

A parenthesized list of attributes of the relation R

The keyword VALUES

A tuple expression, that is, a parenthesized list of concrete values,

one for each attribute in the attribute list

The form of an insert statement:

INSERT INTO R(A₁, …, Aₙ)

VALUES (v₁, …, vₙ);

• A tuple is created and added, where vᵢ is the value of attribute Aᵢ, for i = 1, 2, …, n

Insertion

Relation schema:

StarsIn (title, year, starName)

Update the database:

Add “Sydney Greenstreet” to the list of stars of The Maltese Falcon

In SQL:

INSERT INTO StarsIn (title, year, starName)

VALUES ('The Maltese Falcon', 1942, 'Sydney Greenstreet');

Another formulation of this statement:

INSERT INTO StarsIn

VALUES ('The Maltese Falcon', 1942, 'Sydney Greenstreet');

Insertion

The previous insertion statement was very simple

It added only one tuple into a relation

Instead of using explicit values for one tuple, we can

compute a set of tuples to be inserted using a subquery

This subquery replaces the keyword VALUES and the tuple

expression in the INSERT statement

Insertion

Database schema:

Studio(name, address, presC#)

Movie(title, year, length, filmType, studioName, producerC#)

Update the database:

Add to Studio, all studio names mentioned in the Movie relation

If the list of attributes does not include all attributes of relation

R, then the tuple created has default values for the missing

attributes

Since there is no way to determine an address or a

president for such a studio value, NULL will be used for the

attributes address and presC#

Insertion

Database schema:

Studio(name, address, presC#)

Movie(title, year, length, filmType, studioName, producerC#)

Update the database:

Add to Studio, all studio names mentioned in the Movie relation

In SQL:

INSERT INTO Studio(name)

SELECT DISTINCT studioName

FROM Movie

WHERE studioName NOT IN (SELECT name

FROM Studio);
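The insertion with a subquery can be tried out as follows. This is a sketch using Python's `sqlite3` module; the column name `presCNo`, the reduced Movie schema, and the sample tuples are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Studio (name TEXT, address TEXT, presCNo INTEGER);
    CREATE TABLE Movie (title TEXT, studioName TEXT);
    INSERT INTO Studio VALUES ('Disney', 'Burbank', 1);
    INSERT INTO Movie VALUES ('M1', 'Disney');
    INSERT INTO Movie VALUES ('M2', 'MGM');
    INSERT INTO Movie VALUES ('M3', 'MGM');
    INSERT INTO Movie VALUES ('M4', 'Fox');
""")

# The subquery takes the place of VALUES (...): every studio name
# that occurs in Movie but not yet in Studio is inserted once.
conn.execute("""
    INSERT INTO Studio(name)
    SELECT DISTINCT studioName
    FROM Movie
    WHERE studioName NOT IN (SELECT name FROM Studio)
""")

studios = conn.execute(
    "SELECT name, address, presCNo FROM Studio ORDER BY name"
).fetchall()
print(studios)
```

The attributes not listed in the INSERT receive their default value, which is NULL here and shows up as `None` in Python.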

Deletion

A deletion statement consists of :

The keyword DELETE FROM

The name of a relation R

The keyword WHERE

A condition

The form of a delete statement:

DELETE FROM R WHERE <condition>;

The effect of executing this statement is that every tuple in

relation R satisfying the condition will be deleted from R

Note: unlike INSERT, a DELETE normally needs a WHERE clause; if it is omitted, every tuple in R is deleted

Deletion

Relation schema:

StarsIn(title, year, starName)

Update:

Delete the fact that Sydney Greenstreet was a star in The Maltese Falcon

In SQL:

DELETE FROM StarsIn

WHERE title = 'The Maltese Falcon' AND

starName = 'Sydney Greenstreet';

Deletion

Relation schema:

Exec(name, address, cert#, netWorth)

Update:

Delete every movie executive whose net worth is < $10,000,000

In SQL:

DELETE FROM Exec

WHERE netWorth < 10,000,000;

Anything wrong here?!

Deletion

Relation schema:

Studio(name, address, presC#)

Movie(title, year, length, filmType, studioName, producerC#)

Update:

Delete from Studio all studios that are not mentioned in Movie (i.e., we don’t want to have non-producing studios)

In SQL:

DELETE FROM Studio

WHERE name NOT IN (SELECT StudioName

FROM Movie);

Defining Database Schema

To create a table in SQL:

CREATE TABLE name (list of elements);

• Principal elements are attributes and their types, but key

declarations and constraints may also appear

Example:

CREATE TABLE Star (

name CHAR(30),

address VARCHAR(255),

gender CHAR(1),

birthdate DATE

);

Defining Database Schema

To delete a table:

DROP TABLE name;

Example:

DROP TABLE Star;

Data types

INT or INTEGER

REAL or FLOAT

DECIMAL(n, d) -- NUMERIC(n, d)

DECIMAL(6, 2), e.g., 0123.45

CHAR(n) / BIT(n) fixed-length character/bit string

Unused part is padded with the “pad character”, denoted as ±

VARCHAR(n) / BIT VARYING(n) variable-length strings of up to n characters

Data types (cont’d)

Time:

SQL2 format is TIME 'hh:mm:ss[.ss...]'

Date:

SQL2 format is DATE 'yyyy-mm-dd' (the first digit of mm is 0 or 1)

The default format of date in Oracle is 'dd-mon-yy'

Example:

CREATE TABLE Days(d DATE);

INSERT INTO Days VALUES('08-aug-02');

Oracle function to_date converts a string in a specified format into a date value, e.g.,

INSERT INTO Days VALUES (to_date('2002-08-08', 'yyyy-mm-dd'));

Altering Relation Schemas

Adding Columns

Add an attribute to a relation R with

ALTER TABLE R ADD <column declaration>;

Example: Add attribute phone to table Star

ALTER TABLE Star ADD phone CHAR(16);

Removing Columns

Remove an attribute from a relation R using DROP:

ALTER TABLE R DROP COLUMN <column_name>;

Example: Remove column phone from Star

ALTER TABLE Star DROP COLUMN phone;

Note: Can’t drop if it is the only column

Attribute Properties

We can assert the following properties of the value of an attribute:

NOT NULL

• every tuple must have a “real” (non-null) value for this attribute

DEFAULT value

• Null is the default value for every attribute A

• The owner of the relation can define some other value as the

default, instead of NULL

Attribute Properties

CREATE TABLE Star (

name CHAR(30),

address VARCHAR(255),

gender CHAR(1) DEFAULT '?',

birthdate DATE NOT NULL);

Example: Add an attribute with a default value:

ALTER TABLE Star ADD phone CHAR(16) DEFAULT 'unlisted';

INSERT INTO Star(name, birthdate) VALUES ('Sally', '0000-00-00');

name   address  gender  birthdate   phone
Sally  NULL     ?       0000-00-00  unlisted

INSERT INTO Star(name, phone) VALUES ('Sally', '333-2255');

This insertion cannot be performed: no value is given for birthdate, and NULL is not allowed as its default
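The behaviour of DEFAULT and NOT NULL can be observed directly. This is a sketch using Python's `sqlite3` module; SQLite's syntax (`ADD COLUMN`) and its error type are specific to that system, and other DBMSs report the violation differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Star (
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1) DEFAULT '?',
        birthdate DATE NOT NULL
    )
""")
conn.execute("ALTER TABLE Star ADD COLUMN phone CHAR(16) DEFAULT 'unlisted'")

# Attributes that are not listed receive their default values.
conn.execute("INSERT INTO Star(name, birthdate) VALUES ('Sally', '0000-00-00')")
row = conn.execute(
    "SELECT name, address, gender, birthdate, phone FROM Star"
).fetchone()

# Omitting birthdate would require NULL, which NOT NULL forbids.
try:
    conn.execute("INSERT INTO Star(name, phone) VALUES ('Sally', '333-2255')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(row)
print(rejected)
```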

Schema Refinement

Functional Dependencies: Essential to Normalization Theory

Functional Dependencies

Consider the relation:

Movie (title, year, length, filmType, studioName, starName)

What are the functional dependencies?

title, year → length

title, year → filmType

title, year → studioName

title, year → length, filmType, studioName

Note that the FD title, year → starName does not hold

Logical Implication: Reasoning with FD’s

Consider relation R(A, B, C) with the set of FD’s:

F = {A → B, B → C}

We can deduce from F that A → C also holds on R.

How? Apply the definition…

To detect possible redundancy, is it necessary to

consider all the given FD’s?

As shown above, there might be some additional hidden

(nontrivial) FD’s “implied” by a given set of FD’s

Logical Implication (Cont’d)

Consider R(A₁, A₂, A₃, A₄, A₅) with FD’s:

F = { A₁ → A₂, A₂ → A₃, A₂A₃ → A₄, A₂A₃A₄ → A₅ }

Prove that F ⊭ A₅ → A₁

Solution method: Provide a counter-example; give a relation instance r of R that satisfies every FD in F but not A₅ → A₁

      A₁  A₂  A₃  A₄  A₅
t1:   0   1   1   1   1
t2:   1   1   1   1   1

A desired instance r of R.

Closure of a set of FD’s

Defn: The closure of F, denoted F⁺, is the set of FD’s that are logically implied by F

How can we compute F⁺?

Definitely, F⁺ includes F but possibly more FD’s

We need to know how to reason about FD’s

Equivalence

Defn: Suppose R is a relation schema, and S and T are sets of functional dependencies on R.

T and S are equivalent (S ≡ T) if each logically implies the other, i.e., S ⊨ T and T ⊨ S

Example: Suppose R = {A, B, C}, and

S = {A → B, B → C, A → C}

T = {A → B, B → C}

Can show that S ≡ T

Armstrong’s Axioms [1974]

R is a relation schema, and X, Y and Z are subsets of R.

Reflexivity

If Y ⊆ X, then X → Y (trivial FD’s)

Augmentation

If X → Y, then XZ → YZ, for every Z

Transitivity

If X → Y and Y → Z, then X → Z

These are sound and complete inference rules for FD’s

Additional rules / axioms

Other useful rules that follow from Armstrong Axioms

Union (Combining) Rule

If X → Y and X → Z, then X → YZ

Decomposition (Splitting) Rule

If X → YZ, then X → Y and X → Z

Pseudotransitivity Rule

If X → Y and WY → Z, then XW → Z

NOTE: X, Y, Z, and W are sets of attributes

Example – Discovering hidden FD’s

Consider a relation schema R = {A, B, C, G, H, I} with FD’s F = { A → B, A → C, CG → H, CG → I, B → H }

Using these rules, we can derive the following FD’s

Since A → B and B → H, then A → H, by transitivity

Since CG → H and CG → I, then CG → HI, by union

Since A → C, then AG → CG, by augmentation

Now, since AG → CG and CG → I, then AG → I, by transitivity (Exercise: derive AG → H)

Many trivial dependencies can be derived(!) by augmentation

Computing the Closure of Attributes

Given a set F of FD’s and a set X of attributes, how do we compute the closure of X w.r.t. F?

Starting with X, we repeatedly expand the set by adding the right hand side (RHS) of every FD whose left hand side (LHS) is already included in the set.

When the set cannot be expanded anymore, we have obtained the result, X⁺

An Algorithm to Compute X⁺ under F

X⁺ := X (initialization step)
repeat
    for each FD W → Z in F do:
        if W ⊆ X⁺ then
            X⁺ := X⁺ ∪ Z // include Z in the result
until X⁺ does not change anymore

Complexity question: In the worst case, how many times will the repeat loop be executed?

Example

Consider a relation scheme R = { A, B, C, D, E, F } with the set of FD’s { AB → C, BC → AD, D → E, CF → B }

Compute {A, B}⁺

Execution result at each iteration:

X⁺ = {A, B}
Using AB → C, we get X⁺ = {A, B, C}
Using BC → AD, we get X⁺ = {A, B, C, D}
Using D → E, we get X⁺ = {A, B, C, D, E}
No more change to X⁺ is possible.

X⁺ = {A, B}⁺ = {A, B, C, D, E}

Does the order in which FD’s appear matter in this computation?
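The closure computation above can be sketched in a few lines of Python. The representation is an assumption made for illustration: attributes are single letters, and each FD is a pair of strings `(lhs, rhs)`; the function name `closure` is likewise hypothetical.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the FDs `fds`.

    `fds` is a list of (lhs, rhs) pairs, each a string of one-letter
    attribute names, e.g. ("AB", "C") for AB -> C.
    """
    result = set(attrs)
    changed = True
    while changed:  # the "repeat ... until no change" loop
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its LHS is covered and it adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The FDs of the example: AB -> C, BC -> AD, D -> E, CF -> B.
F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
ab_closure = closure("AB", F)
print(sorted(ab_closure))
```

Listing the FDs in a different order can change how many passes of the outer loop are needed, but not the final set.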

Implication Problem Revisited

Is a given FD X → Y implied by a set F of FD’s?

That is to ask whether X → Y is in F⁺

How to answer this question?

An algorithm for this:

Compute X⁺ under F, and

Check if Y ⊆ X⁺

• If yes, then F ⊨ X → Y

• Otherwise F ⊭ X → Y

Example

Consider a relation schema R = { A, B, C, D, E, F } with the FD’s F = { AB → C, BC → AD, D → E, CF → B }

True/false: F ⊨ AB → D?

Two steps:

Compute X⁺ = {A, B}⁺ = {A, B, C, D, E}

Check if D ∈ X⁺

Yes, AB → D is implied by F

Example

Consider a relation scheme R = { A, B, C, D, E, F } with FD’s F = { AB → C, BC → AD, D → E, CF → B }

Is D → A implied by F?

Two steps:

Compute X⁺ = {D}⁺ = {D, E}

Check if A ∈ X⁺

Since A is not in {D, E}, we conclude that D → A is not implied by F
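The two-step implication test can be sketched as follows. As before, one-letter attribute names and FDs given as `(lhs, rhs)` string pairs are assumptions made for illustration, and `closure` and `implies` are hypothetical helper names.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True iff the FD lhs -> rhs follows from `fds`.

    Step 1: compute the closure of lhs under fds.
    Step 2: check that every attribute of rhs is in that closure.
    """
    return set(rhs) <= closure(lhs, fds)


F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(implies(F, "AB", "D"))
print(implies(F, "D", "A"))
```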

Closures and Keys

When will X⁺ include all attributes of a relation R?

Clearly, the answer is yes iff X is a superkey of R

To check if X is a candidate key of R, we may check if:

1. X⁺ contains all attributes of R, i.e., X⁺ = R, and

2. No proper subset S of X has this property, i.e., ∀A∈X, (X – {A})⁺ ≠ R

Knowledge about keys is essential for “Normal forms”
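The two conditions for a candidate key translate into a short check. The sketch below again assumes one-letter attribute names and FDs as `(lhs, rhs)` string pairs; the schema and FDs are those of the earlier closure example, and the function names are hypothetical.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """Condition 1: the closure of attrs contains every attribute."""
    return closure(attrs, fds) >= set(schema)


def is_candidate_key(attrs, schema, fds):
    """Conditions 1 and 2: a superkey with no removable attribute.

    Removing one attribute at a time is enough, because a subset of a
    non-superkey can never be a superkey.
    """
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    return all(not is_superkey(attrs - {a}, schema, fds) for a in attrs)


R = "ABCDEF"
F = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CF", "B")]
print(is_superkey("AB", R, F))
print(is_candidate_key("ABF", R, F))
print(is_candidate_key("ABCF", R, F))
```

Here {A, B} is not a superkey because its closure lacks F, {A, B, F} is a candidate key, and {A, B, C, F} is a superkey that is not minimal.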

Canonical Cover

Number of iterations of the algorithm for computing the

closure of a set of attributes depends on the number of

FD’s in F

The same will be observed for other algorithms that we will study

(such as the decomposition algorithms)

Can we “minimize” F?

Covers

FD’s can be represented in several different ways without changing

the set of legal/valid instances of the relation

Let F and G be sets of FD’s. We say “G follows from F”, if every

relation instance that satisfies F also satisfies G. In symbols: F ⊨ G.

We may also say: “G is implied by F” or “G is covered by F.”

If both F ⊨ G and G ⊨ F hold, then we say that G and F are

equivalent and denote this by F ≡ G

Note that F ≡ G iff F⁺ ≡ G⁺

If F ≡ G we may also say: G is a cover of F and vice versa

Canonical Cover

Let F be a set of FD’s. A canonical / minimal cover

of F is a set G of FD’s that satisfies the following:

1. G is equivalent to F; that is, G ≡ F

2. G is minimal; that is, if we obtain a set H of FD’s from

G by deleting one or more of its FD’s, or by deleting

one or more attributes from some FD in G, then F ≢ H

3. Every FD in G is of the form X → A, where A is a single attribute

Canonical Cover

A canonical cover G is minimal in two respects:

1. Every FD in G is “required” in order for G to be equivalent to F

2. Every FD in G is as “small” as possible, that is,

• each attribute on the left hand side is necessary.

• Recall: the RHS of every FD in G is a single attribute

Computing Canonical Cover

Given a set F of FD’s, how to compute a canonical cover G of F?

Step 1: Put the FD’s in the standard form

Initialize G := F

Replace each FD X → A₁A₂…Aₖ in G with X → A₁, X → A₂, …, X → Aₖ

Step 2: Minimize the left hand side of each FD

E.g., for each FD AB → C in G, check if A or B on the LHS is redundant, i.e., whether ((G – {AB → C}) ∪ {A → C})⁺ ≡ F⁺

Step 3: Delete redundant FD’s

For each FD X → A in G, check if it is redundant, i.e., whether (G – {X → A})⁺ ≡ F⁺

Computing Canonical Cover

R = { A, B, C, D, E, H}

F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }

Step one – put FD’s in the standard form

All present FD’s are in the standard form

G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }

Computing Canonical Cover

Step two – Check for left redundancy

For every FD X → A in G, check if the closure of a subset of X determines A. If so, remove redundant attribute(s) from X

R = { A, B, C, D, E, H }

F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }

Computing Canonical Cover

G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }

A → B
obviously OK (no left redundancy)

DE → A
D⁺ = D
E⁺ = E
OK (no left redundancy)

Computing Canonical Cover

G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }

BC → E
B⁺ = B
C⁺ = C
OK (no left redundancy)

Computing Canonical Cover

G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }

AC → E
A⁺ = AB
C⁺ = C
OK (no left redundancy)

Computing Canonical Cover

G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }

BCD → A
B⁺ = B
C⁺ = C
D⁺ = D
BC⁺ = BCE
CD⁺ = CD
BD⁺ = BD
OK (no left redundancy)

Computing Canonical Cover

G = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }

AED → B
A⁺ = AB
E & D are redundant; we can remove them from AED → B

G = { A → B, DE → A, BC → E, AC → E, BCD → A, A → B }

G = { DE → A, BC → E, AC → E, BCD → A, A → B }

Computing Canonical Cover

Step 3 – Check for redundant FD’s

For every FD X → A in G

Remove X → A from G; call the result G’

Compute X⁺ under G’

If A ∈ X⁺, then X → A is redundant and hence we remove the FD X → A from G (that is, we rename G’ to G)

R = { A, B, C, D, E, H }

F = { A → B, DE → A, BC → E, AC → E, BCD → A, AED → B }

Computing Canonical Cover

G = { DE → A, BC → E, AC → E, BCD → A, A → B }

Remove DE → A from G

G’ = { BC → E, AC → E, BCD → A, A → B }

Compute DE⁺ under G’

DE⁺ = DE (computed under G’)

Since A ∉ DE⁺, the FD DE → A is not redundant

G = { DE → A, BC → E, AC → E, BCD → A, A → B }

Computing Canonical Cover

G = { DE → A, BC → E, AC → E, BCD → A, A → B }

Remove BC → E from G

G’ = { DE → A, AC → E, BCD → A, A → B }

Compute BC⁺ under G’

BC⁺ = BC

BC → E is not redundant

G = { DE → A, BC → E, AC → E, BCD → A, A → B }

Computing Canonical Cover

G = { DE → A, BC → E, AC → E, BCD → A, A → B }

Remove AC → E from G

G’ = { DE → A, BC → E, BCD → A, A → B }

Compute AC⁺ under G’

AC⁺ = ACBE

Since E ∈ ACBE, AC → E is redundant; remove it from G

G = { DE → A, BC → E, BCD → A, A → B }

Computing Canonical Cover

G = { DE → A, BC → E, BCD → A, A → B }

Remove BCD → A from G

G’ = { DE → A, BC → E, A → B }

Compute BCD⁺ under G’

BCD⁺ = BCDEA

This FD is redundant; remove it from G

G = { DE → A, BC → E, A → B }

Computing Canonical Cover

G = { DE → A, BC → E, A → B }

Remove A → B from G

G’ = { DE → A, BC → E }

Compute A⁺ under G’

A⁺ = A

This FD is not redundant (Another reason why this is true?)

G = { DE → A, BC → E, A → B }

G is a minimal cover for F
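The three steps can be put together in a short Python sketch. It assumes one-letter attribute names and FDs given as `(lhs, rhs)` string pairs, and the function names are hypothetical. Which canonical cover is found, and which intermediate reductions are made, depends on the order in which attributes and FDs are examined; this sketch tries attributes in alphabetical order, so its intermediate steps need not match the walk-through above even when the final result does.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def minimal_cover(fds):
    # Step 1: standard form, one attribute on each right-hand side.
    g = [("".join(sorted(lhs)), a) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: drop left-hand-side attributes that are not needed.
    reduced = []
    for lhs, a in g:
        keep = set(lhs)
        for b in sorted(lhs):
            if len(keep) > 1 and a in closure(keep - {b}, g):
                keep.discard(b)
        reduced.append(("".join(sorted(keep)), a))
    g = list(dict.fromkeys(reduced))  # remove duplicates, keep order

    # Step 3: drop every FD that the remaining FDs already imply.
    for fd in list(g):
        rest = [other for other in g if other != fd]
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g


F = [("A", "B"), ("DE", "A"), ("BC", "E"),
     ("AC", "E"), ("BCD", "A"), ("AED", "B")]
cover = minimal_cover(F)
print(cover)
```

For the FDs of the worked example this sketch ends with the same three FDs as the walk-through: A → B, DE → A, and BC → E.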

Several Canonical Covers Possible?

Relation R = {A, B, C} with F = {A → B, A → C, B → A, B → C, C → B, C → A}

Several canonical covers exist

G = {A → B, B → A, B → C, C → B}

G = {A → B, B → C, C → A}

[Figure: three diagrams over the attributes A, B, and C]

Can you find more ?

How to Deal with Redundancy?

Relation Schema:

Star (name, address, representingFirm, spokesPerson)

F = { name → address, representingFirm, spokesPerson;
      representingFirm → spokesPerson }

Relation Instance:

Name           Address       RepresentingFirm  SpokesPerson
Carrie Fisher  123 Maple     Star One          Joe Smith
Harrison Ford  789 Palm dr.  Star One          Joe Smith
Mark Hamill    456 Oak rd.   Movies & Co       Mary Johns

We can decompose this relation into two smaller relations

How to Deal with Redundancy?

Relation Schema:

Star (name, address, representingFirm, spokesPerson)

Decompose this relation into the following relations:

Star (name, address, representingFirm)

with F1 = { name → {address, representingFirm} }

and

Firm (representingFirm, spokesPerson)

with F2 = { representingFirm → spokesPerson }

FD behind the redundancy: representingFirm → spokesPerson

How to Deal with Redundancy?

Relation Instance before decomposition:

Name           Address       RepresentingFirm  SpokesPerson
Carrie Fisher  123 Maple     Star One          Joe Smith
Harrison Ford  789 Palm dr.  Star One          Joe Smith
Mark Hamill    456 Oak rd.   Movies & Co       Mary Johns

Relation Instances after decomposition:

Star:

Name           Address       RepresentingFirm
Carrie Fisher  123 Maple     Star One
Harrison Ford  789 Palm dr.  Star One
Mark Hamill    456 Oak rd.   Movies & Co

Firm:

RepresentingFirm  SpokesPerson
Star One          Joe Smith
Movies & Co       Mary Johns
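The two smaller instances are simply projections of the original one. As a rough illustration, the sketch below models rows as Python dictionaries and a relation as a set of tuples, so duplicate rows collapse on projection; the helper name `project` is an assumption made for this example.

```python
rows = [
    {"name": "Carrie Fisher", "address": "123 Maple",
     "representingFirm": "Star One", "spokesPerson": "Joe Smith"},
    {"name": "Harrison Ford", "address": "789 Palm dr.",
     "representingFirm": "Star One", "spokesPerson": "Joe Smith"},
    {"name": "Mark Hamill", "address": "456 Oak rd.",
     "representingFirm": "Movies & Co", "spokesPerson": "Mary Johns"},
]


def project(rows, attrs):
    """Project rows onto `attrs`; the result is a set, so duplicates collapse."""
    return {tuple(row[a] for a in attrs) for row in rows}


star = project(rows, ["name", "address", "representingFirm"])
firm = project(rows, ["representingFirm", "spokesPerson"])

# The (Star One, Joe Smith) pairing is now stored once instead of twice.
print(len(rows), len(star), len(firm))  # 3 3 2
```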

Decomposition

A decomposition of a relation schema R consists of replacing R by two or more non-empty relation schemas such that each one is a subset of R and together they include all attributes of R. Formally, $\{R_1, \ldots, R_m\}$ is a decomposition of R if all conditions below hold:

(0) $R_i \neq \emptyset$, for all i in {1,…,m}

(1) $R_1 \cup \cdots \cup R_m = R$

(2) $R_i \neq R_j$, for different i and j in {1,…,m}

When m = 2, the decomposition $\{R_1, R_2\}$ is called binary

Not every decomposition of R is “desirable”

Properties of a decomposition?

(1) Lossless-join – this is a must

(2) Dependency-preserving – this is desirable

Explanation follows …

Example

Relation Instance:

A  B  C
1  2  3
4  2  5

Decomposed into:

A  B
1  2
4  2

B  C
2  3
2  5

To “recover” information, we join the relations:

A  B  C
1  2  3
4  2  5
4  2  3
1  2  5

Why do we have new tuples?
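The effect can be reproduced with a small project-and-join sketch. Relations are modelled here as sets of tuples with an explicit attribute list; `project` and `natural_join` are illustrative helpers written for this example, not library functions.

```python
def project(rows, schema, attrs):
    """Project a set of tuples over `schema` onto the attributes `attrs`."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}


def natural_join(left, left_schema, right, right_schema):
    """Natural join of two sets of tuples; output columns are left_schema
    followed by the attributes that appear only in right_schema."""
    common = [a for a in left_schema if a in right_schema]
    extra = [a for a in right_schema if a not in left_schema]
    out = set()
    for l in left:
        for r in right:
            if all(l[left_schema.index(a)] == r[right_schema.index(a)] for a in common):
                out.add(l + tuple(r[right_schema.index(a)] for a in extra))
    return out


schema = ["A", "B", "C"]
r = {(1, 2, 3), (4, 2, 5)}

ab = project(r, schema, ["A", "B"])
bc = project(r, schema, ["B", "C"])
joined = natural_join(ab, ["A", "B"], bc, ["B", "C"])

# Both AB-tuples share B = 2, so each one pairs with both BC-tuples.
spurious = joined - r
print(sorted(spurious))  # [(1, 2, 5), (4, 2, 3)]

# If B determines C in the instance, the join returns exactly the original tuples.
r2 = {(1, 2, 3), (4, 2, 3)}
joined2 = natural_join(project(r2, schema, ["A", "B"]), ["A", "B"],
                       project(r2, schema, ["B", "C"]), ["B", "C"])
print(joined2 == r2)  # True
```

In the first instance the shared attribute B determines neither A nor C, which is why the join cannot tell which C-value belonged to which A-value.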

Lossless-Join Decomposition

R is a relation schema and F is a set of FD’s over R.

A binary decomposition of R into relation schemas $R_1$ and $R_2$ with attribute sets X and Y is said to be a lossless-join decomposition with respect to F, if for every instance r of R that satisfies F, we have $\pi_X(r) \bowtie \pi_Y(r) = r$

Thm: Let R be a relation schema and F a set of FD’s on R. A binary decomposition of R into $R_1$ and $R_2$ with attribute sets X and Y is lossless iff $X \cap Y \to X$ or $X \cap Y \to Y$ is in $F^+$, i.e., this binary decomposition is lossless iff the common attributes of X and Y form a superkey of $R_1$ or $R_2$
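The theorem translates directly into a closure test: compute the closure of the common attributes under F and check whether it covers X or Y. The sketch below assumes FDs are given as (left side, right side) pairs of attribute collections; `closure` and `is_lossless_binary` are names chosen for this example.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(x, y, fds):
    """Binary decomposition into X and Y is lossless iff (X ∩ Y)+ covers X or Y."""
    reach = closure(set(x) & set(y), fds)
    return set(x) <= reach or set(y) <= reach


# Single-letter attributes can be written as strings.
print(is_lossless_binary("AB", "BC", [("B", "C")]))  # True: B -> C gives B+ = BC
print(is_lossless_binary("AB", "BC", []))            # False: B+ = B only

# Multi-character attribute names must be passed as tuples, not bare strings.
star = ("name", "address", "representingFirm")
firm = ("representingFirm", "spokesPerson")
star_fds = [
    (("name",), ("address", "representingFirm", "spokesPerson")),
    (("representingFirm",), ("spokesPerson",)),
]
print(is_lossless_binary(star, firm, star_fds))  # True
```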

Example: Lossless-join

Relation Instance:

A  B  C
1  2  3
4  2  3

F = { B → C }

Decomposed into:

A  B
1  2
4  2

B  C
2  3

To recover the original relation r, we join the two relations:

A  B  C
1  2  3
4  2  3

No new tuples!

Example: Dependency Preservation

Relation Instance:

A  B  C  D
1  2  5  7
4  3  6  8

F = { B → C, B → D, A → D }

Decomposed into:

A  B
1  2
4  3

B  C  D
2  5  7
3  6  8

Can we enforce A → D? How?

Dependency-Preserving Decomposition

A dependency-preserving decomposition allows us to enforce every FD, on each insertion or modification of a tuple, by examining a single relation instance

Let R be a relation schema that is decomposed into two schemas with attribute sets X and Y, and let F be a set of FD’s over R. The projection of F on X (denoted by $F_X$) is the set of FD’s in $F^+$ that involve only attributes in X

Recall that a FD $U \to V$ in $F^+$ is in $F_X$ if all the attributes in U and V are in X; in this case we say this FD is “relevant” to X

The decomposition of < R, F > into two schemas with attribute sets X and Y is dependency-preserving if $(F_X \cup F_Y)^+ = F^+$
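Computing $F_X$ and $F_Y$ explicitly can be expensive, since $F^+$ may be large. A common textbook shortcut, sketched below, checks each FD U → V in F by growing U using only what each schema can "see": repeatedly add $((Z \cap X)^+ \cap X)$ and $((Z \cap Y)^+ \cap Y)$ to the running set Z, with closures taken under F. The function names are illustrative and this is a sketch of that shortcut, not necessarily the procedure the slides have in mind.

```python
def closure(attrs, fds):
    """Return the closure of the attribute set `attrs` under the FDs `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fd, schemas, fds):
    """True if `fd` follows from the projections of `fds` onto `schemas`."""
    lhs, rhs = fd
    result = set(lhs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            s = set(schema)
            # Attributes derivable while looking at this schema only.
            gained = closure(result & s, fds) & s
            if not gained <= result:
                result |= gained
                changed = True
    return set(rhs) <= result


def is_dependency_preserving(schemas, fds):
    """True if every FD in `fds` is preserved by the decomposition."""
    return all(preserves(fd, schemas, fds) for fd in fds)


# Example from the slides: R(A, B, C, D) split into AB and BCD.
F = [("B", "C"), ("B", "D"), ("A", "D")]
print(preserves(("A", "D"), ["AB", "BCD"], F))   # False: A -> D spans both schemas
print(is_dependency_preserving(["AB", "BCD"], F))  # False

# The Star / Firm decomposition keeps every FD checkable in one relation.
star_fds = [
    (("name",), ("address", "representingFirm", "spokesPerson")),
    (("representingFirm",), ("spokesPerson",)),
]
schemas = [("name", "address", "representingFirm"),
           ("representingFirm", "spokesPerson")]
print(is_dependency_preserving(schemas, star_fds))  # True
```

In the first example, enforcing A → D would require joining the two relations on every update, which is exactly what dependency preservation is meant to avoid.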

Normal Forms

Given a relation schema R, we must be able to determine

whether it is “good” or we need to decompose it into

smaller relations, and if so, how?

To address these issues, we need to study normal forms

If a relation schema is in one of these normal forms, we

know that it is in some “good” shape in the sense that

certain kinds of problems (related to redundancy) cannot arise

1NF ⊇ 2NF ⊇ 3NF ⊇ BCNF

Normal Forms

The normal forms based on FD’s are

First normal form (1NF)

Second normal form (2NF)

Third normal form (3NF)

Boyce-Codd normal form (BCNF)

These normal forms have increasingly restrictive

requirements
