MOD4

Exercises 43
person (driver-id, name, address)

car (license, model, year)
accident (report-number, date, location)
owns (driver-id, license)
participated (driver-id, car, report-number, damage-amount)
Figure 4.12. Insurance database.
insert into accident
values (4007, '2001-09-01', 'Berkeley')
insert into participated
select o.driver-id, c.license, 4007, 3000
from person p, owns o, car c
where p.name = 'Jones' and p.driver-id = o.driver-id and
o.license = c.license and c.model = 'Toyota'
d. Delete the Mazda belonging to “John Smith”.
Since model is not a key of the car relation, we can either assume that only
one of John Smith’s cars is a Mazda, or delete all of John Smith’s Mazdas
(the query is the same). Again assume name is a key for person.
delete from car
where model = 'Mazda' and license in
(select license
from person p, owns o
where p.name = 'John Smith' and p.driver-id = o.driver-id)
Note: The owns, accident and participated records associated with the Mazda
still exist.
e. Update the damage amount for the car with license number “AABB2000” in
the accident with report number “AR2197” to $3000.
update participated
set damage-amount = 3000
where report-number = 'AR2197' and driver-id in
(select driver-id
from owns
where license = 'AABB2000')
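The three statements above can be tried on a toy instance of the insurance database. The following is a minimal sketch. It assumes Python's built-in `sqlite3` module and hypothetical sample rows. Hyphens in the attribute names become underscores, because `driver-id` is not a valid unquoted SQL identifier.

```python
import sqlite3

# Hypothetical toy instance of Figure 4.12 (hyphens in names become underscores).
con = sqlite3.connect(":memory:")
con.executescript("""
create table person (driver_id text primary key, name text, address text);
create table car (license text primary key, model text, year integer);
create table accident (report_number primary key, date text, location text);
create table owns (driver_id text, license text);
create table participated (driver_id text, car text, report_number, damage_amount integer);
insert into person values ('D1', 'Jones', 'Oak St'), ('D2', 'John Smith', 'Elm St');
insert into car values ('T100', 'Toyota', 1999), ('M200', 'Mazda', 2000),
                       ('AABB2000', 'Ford', 1998);
insert into owns values ('D1', 'T100'), ('D2', 'M200'), ('D2', 'AABB2000');
insert into accident values ('AR2197', '2001-05-01', 'Oakland');
insert into participated values ('D2', 'AABB2000', 'AR2197', 500);
""")

# c. Record accident 4007 and tie it to Jones's Toyota (values are assumptions).
con.execute("insert into accident values (4007, '2001-09-01', 'Berkeley')")
con.execute("""
    insert into participated
    select o.driver_id, c.license, 4007, 3000
    from person p, owns o, car c
    where p.name = 'Jones' and p.driver_id = o.driver_id
      and o.license = c.license and c.model = 'Toyota'""")

# d. Delete John Smith's Mazda (every Mazda he owns, if there are several).
con.execute("""
    delete from car
    where model = 'Mazda' and license in
      (select o.license from person p, owns o
       where p.name = 'John Smith' and p.driver_id = o.driver_id)""")

# e. Set the damage amount for car AABB2000 in accident AR2197 to 3000.
con.execute("""
    update participated
    set damage_amount = 3000
    where report_number = 'AR2197' and driver_id in
      (select driver_id from owns where license = 'AABB2000')""")
con.commit()
```

As the note on part d says, the `owns` row for the deleted Mazda survives, because nothing cascades the delete.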
4.2 Consider the employee database of Figure 4.13, where the primary keys are
underlined. Give an expression in SQL for each of the following queries.
a. Find the names of all employees who work for First Bank Corporation.
b. Find the names and cities of residence of all employees who work for First
Bank Corporation.
c. Find the names, street addresses, and cities of residence of all employees
who work for First Bank Corporation and earn more than $10,000.
44 Chapter 4 SQL
d. Find all employees in the database who live in the same cities as the companies
for which they work.
e. Find all employees in the database who live in the same cities and on the
same streets as do their managers.
f. Find all employees in the database who do not work for First Bank Corporation.
g. Find all employees in the database who earn more than each employee of
Small Bank Corporation.
h. Assume that the companies may be located in several cities. Find all companies
located in every city in which Small Bank Corporation is located.
i. Find all employees who earn more than the average salary of all employees
of their company.
j. Find the company that has the most employees.
k. Find the company that has the smallest payroll.
l. Find those companies whose employees earn a higher salary, on average,
than the average salary at First Bank Corporation.
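Answer:
a. Find the names of all employees who work for First Bank Corporation.
select employee-name
from works
where company-name = 'First Bank Corporation'
b. Find the names and cities of residence of all employees who work for First
Bank Corporation.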
select e.employee-name, city
from employee e, works w
where w.company-name = 'First Bank Corporation' and
w.employee-name = e.employee-name
c. Find the names, street address, and cities of residence of all employees who
work for First Bank Corporation and earn more than $10,000.
If people may work for several companies, the following solution will
only list those who earn more than $10,000 per annum from “First Bank
Corporation” alone.
select *
from employee
where employee-name in
(select employee-name
from works
where company-name = 'First Bank Corporation' and salary > 10000)
As in the solution to the previous query, we can use a join to solve this one
also.
d. Find all employees in the database who live in the same cities as the companies
for which they work.
select e.employee-name
from employee e, works w, company c
where e.employee-name = w.employee-name and e.city = c.city and
w.company-name = c.company-name
e. Find all employees in the database who live in the same cities and on the
same streets as do their managers.
select P.employee-name
from employee P, employee R, manages M
where P.employee-name = M.employee-name and
M.manager-name = R.employee-name and
P.street = R.street and P.city = R.city
f. Find all employees in the database who do not work for First Bank Corporation.
The following solution assumes that all people work for exactly one company.
select employee-name
from works
where company-name <> 'First Bank Corporation'
If one allows people to appear in the database (e.g. in employee) but not
appear in works, or if people may have jobs with more than one company,
the solution is slightly more complicated.
select employee-name
from employee
where employee-name not in
(select employee-name
from works
where company-name = 'First Bank Corporation')
g. Find all employees in the database who earn more than every employee of
Small Bank Corporation.
The following solution assumes that all people work for at most one company.
select employee-name
from works
where salary > all
(select salary
from works
where company-name = 'Small Bank Corporation')
If people may work for several companies and we wish to consider the
total earnings of each person, the problem is more complex. It can be solved
by using a nested subquery, but we illustrate below how to solve it using
the with clause.
with emp-total-salary as
(select employee-name, sum(salary) as total-salary
from works
group by employee-name
)
select employee-name
from emp-total-salary
where total-salary > all
(select total-salary
from emp-total-salary, works
where works.company-name = 'Small Bank Corporation' and
emp-total-salary.employee-name = works.employee-name
)
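To see how the two readings of part g differ, both can be run on a small instance. The following is a minimal sketch with these assumptions:

- It uses Python's `sqlite3`, hypothetical rows, and underscores for hyphens.
- SQLite has no `> all (subquery)`, so the sketch compares against `max(...)` instead.
- That rewrite matches `> all` only when Small Bank Corporation has at least one employee. Over an empty set, `> all` is true, but `> (select max ...)` compares with null and returns nothing.

```python
import sqlite3

# Hypothetical works instance; Dee works for two companies (95 + 10 = 105 in total).
con = sqlite3.connect(":memory:")
con.executescript("""
create table works (employee_name text, company_name text, salary integer);
insert into works values
  ('Ann', 'First Bank Corporation', 120),
  ('Bob', 'Small Bank Corporation', 90),
  ('Cid', 'Small Bank Corporation', 100),
  ('Dee', 'Other Corporation', 95),
  ('Dee', 'First Bank Corporation', 10);
""")

# Per-tuple reading (at most one company per person), with "> all" as "> max".
per_tuple = sorted(row[0] for row in con.execute("""
    select employee_name from works
    where salary > (select max(salary) from works
                    where company_name = 'Small Bank Corporation')"""))

# Total-earnings reading, using the with clause as in the solution above.
by_total = sorted(row[0] for row in con.execute("""
    with emp_total_salary as
      (select employee_name, sum(salary) as total_salary
       from works group by employee_name)
    select employee_name from emp_total_salary
    where total_salary > (select max(e.total_salary)
                          from emp_total_salary e, works w
                          where w.company_name = 'Small Bank Corporation'
                            and e.employee_name = w.employee_name)"""))

print(per_tuple)  # ['Ann']
print(by_total)   # ['Ann', 'Dee']
```

Dee appears only under the total-earnings reading. Neither of Dee's individual salaries exceeds 100, but Dee's combined 105 does.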
h. Assume that the companies may be located in several cities. Find all companies
located in every city in which Small Bank Corporation is located.
The simplest solution uses the contains comparison which was included
in the original System R Sequel language but is not present in the subsequent
SQL versions.
select T.company-name
from company T
where (select R.city
from company R
where R.company-name = T.company-name)
contains
(select S.city
from company S
where S.company-name = 'Small Bank Corporation')
Below is a solution using standard SQL.
select S.company-name
from company S
where not exists ((select city
from company
where company-name = 'Small Bank Corporation')
except
(select city
from company T
where S.company-name = T.company-name))
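The standard-SQL solution is a relational division written as "the Small Bank cities minus the cities of S is empty". The following is a minimal sketch of it. It assumes Python's `sqlite3`, hypothetical rows, and underscores for hyphens. It drops the parentheses around each operand of `except`, which SQLite does not accept. It adds `distinct`, because `company` holds one row per (company, city) pair, so without it a company would be listed once per city.

```python
import sqlite3

# Hypothetical company instance: one row per (company, city).
con = sqlite3.connect(":memory:")
con.executescript("""
create table company (company_name text, city text);
insert into company values
  ('Small Bank Corporation', 'Oslo'), ('Small Bank Corporation', 'Bergen'),
  ('Big Corporation', 'Oslo'), ('Big Corporation', 'Bergen'),
  ('Big Corporation', 'Paris'), ('Tiny Corporation', 'Oslo');
""")

# S qualifies when no Small Bank city is missing from S's own cities.
rows = con.execute("""
    select distinct S.company_name
    from company S
    where not exists (
        select city from company
        where company_name = 'Small Bank Corporation'
        except
        select T.city from company T
        where T.company_name = S.company_name)""").fetchall()

print(sorted(r[0] for r in rows))  # ['Big Corporation', 'Small Bank Corporation']
```

Small Bank Corporation always qualifies, because it trivially covers its own cities.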
i. Find all employees who earn more than the average salary of all employees
of their company.
The following solution assumes that all people work for at most one company.
employee (employee-name, street, city)

works (employee-name, company-name, salary)
company (company-name, city)
manages (employee-name, manager-name)
Figure 4.13. Employee database.
select employee-name
from works T
where salary > (select avg (salary)
from works S
where T.company-name = S.company-name)
j. Find the company that has the most employees.
select company-name
from works
group by company-name
having count (distinct employee-name) >= all
(select count (distinct employee-name)
from works
group by company-name)
k. Find the company that has the smallest payroll.
select company-name
from works
group by company-name
having sum (salary) <= all (select sum (salary)
from works
group by company-name)
l. Find those companies whose employees earn a higher salary, on average,
than the average salary at First Bank Corporation.
select company-name
from works
group by company-name
having avg (salary) > (select avg (salary)
from works
where company-name = 'First Bank Corporation')
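Parts j, k and l all filter groups with `having`, so each query needs a `group by company-name` clause. The following is a minimal sketch of the three queries, with these assumptions:

- It runs on Python's `sqlite3` with hypothetical rows, using underscores for hyphens.
- SQLite lacks `>= all` and `<= all`, so they are rewritten as comparisons with `max(...)` and `min(...)` over the per-company aggregates.
- These rewrites are equivalent to the originals whenever `works` is non-empty.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table works (employee_name text, company_name text, salary integer);
insert into works values
  ('Ann', 'First Bank Corporation', 100), ('Bob', 'First Bank Corporation', 200),
  ('Cid', 'Small Bank Corporation', 50),  ('Dee', 'Small Bank Corporation', 60),
  ('Eve', 'Rich Corporation', 400);
""")

def names(sql):
    """Run a query and return the first column, sorted."""
    return sorted(row[0] for row in con.execute(sql))

# j. ">= all (counts)" rewritten as ">= max(counts)"; ties are all returned.
most_employees = names("""
    select company_name from works group by company_name
    having count(distinct employee_name) >=
      (select max(n) from (select count(distinct employee_name) as n
                           from works group by company_name) as t)""")

# k. "<= all (payrolls)" rewritten as "<= min(payrolls)".
smallest_payroll = names("""
    select company_name from works group by company_name
    having sum(salary) <=
      (select min(p) from (select sum(salary) as p
                           from works group by company_name) as t)""")

# l. Average salary above First Bank Corporation's average (150 here).
above_first_bank = names("""
    select company_name from works group by company_name
    having avg(salary) > (select avg(salary) from works
                          where company_name = 'First Bank Corporation')""")

print(most_employees)    # ['First Bank Corporation', 'Small Bank Corporation']
print(smallest_payroll)  # ['Small Bank Corporation']
print(above_first_bank)  # ['Rich Corporation']
```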
4.3 Consider the relational database of Figure 4.13. Give an expression in SQL for
each of the following queries.
a. Modify the database so that Jones now lives in Newtown.
b. Give all employees of First Bank Corporation a 10 percent raise.
c. Give all managers of First Bank Corporation a 10 percent raise.
d. Give all managers of First Bank Corporation a 10 percent raise unless the
salary becomes greater than $100,000; in such cases, give only a 3 percent
raise.
e. Delete all tuples in the works relation for employees of Small Bank Corporation.
Answer: The solution for part a assumes that each person has only one tuple in
the employee relation. The solutions to parts c and d assume that each person
works for at most one company.
a. Modify the database so that Jones now lives in Newtown.
update employee
set city = 'Newtown'
where employee-name = 'Jones'
b. Give all employees of First Bank Corporation a 10-percent raise.
update works
set salary = salary * 1.1
where company-name = 'First Bank Corporation'
c. Give all managers of First Bank Corporation a 10-percent raise.
update works
set salary = salary * 1.1
where employee-name in (select manager-name
from manages)
and company-name = 'First Bank Corporation'
d. Give all managers of First Bank Corporation a 10-percent raise unless the
salary becomes greater than $100,000; in such cases, give only a 3-percent
raise.
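update works T
set T.salary = T.salary * 1.03
where T.employee-name in (select manager-name
from manages)
and T.salary * 1.1 > 100000
and T.company-name = 'First Bank Corporation'
update works T
set T.salary = T.salary * 1.1
where T.employee-name in (select manager-name
from manages)
and T.salary * 1.1 <= 100000
and T.company-name = 'First Bank Corporation'
The order of the two updates matters. If the 10-percent update ran first, a manager
earning $85,000 would be raised to $93,500, would then satisfy the first condition
(93,500 * 1.1 > 100,000), and would receive the 3-percent raise as well.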
SQL-92 provides a case operation (see Exercise 4.11), using which we give
a more concise solution:
update works T
set T.salary = T.salary *
(case
when (T.salary * 1.1 > 100000) then 1.03
else 1.1
end)
where T.employee-name in (select manager-name
from manages) and
T.company-name = 'First Bank Corporation'
e. Delete all tuples in the works relation for employees of Small Bank Corporation.
delete from works
where company-name = 'Small Bank Corporation'
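The single `case`-based update for part d can be checked on a few hypothetical managers. The following is a minimal sketch with these assumptions:

- It uses Python's `sqlite3` and underscores for hyphens.
- The alias `T` is omitted, which SQLite does not require here.
- The `case` expression is closed with `end`, as SQL requires.

Because `case` reads the old salary, no tuple can receive both raises.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table works (employee_name text, company_name text, salary real);
create table manages (employee_name text, manager_name text);
insert into works values
  ('Ann', 'First Bank Corporation', 50000), ('Bob', 'First Bank Corporation', 95000),
  ('Cat', 'First Bank Corporation', 40000), ('Dan', 'Small Bank Corporation', 60000);
-- Ann, Bob and Dan are managers; Cat is not.
insert into manages values ('Cat', 'Ann'), ('Ann', 'Bob'), ('Eve', 'Dan');
""")

# The multiplier is chosen from the salary before the update.
con.execute("""
    update works
    set salary = salary * (case when salary * 1.1 > 100000 then 1.03
                                else 1.1 end)
    where employee_name in (select manager_name from manages)
      and company_name = 'First Bank Corporation'""")

salaries = dict(con.execute("select employee_name, salary from works"))
print(salaries)  # Ann about 55000, Bob about 97850, Cat and Dan unchanged
```

Bob's 10-percent raise would reach $104,500, so he gets 3 percent instead. Dan manages, but not at First Bank Corporation, so his salary is unchanged.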
4.4 Let the following relation schemas be given:
R = (A, B, C)
S = (D, E, F )
Let relations r(R) and s(S) be given. Give an expression in SQL that is equivalent
to each of the following queries.
a. ΠA (r)
b. σB = 17 (r)
c. r × s
d. ΠA,F (σC = D (r × s))
Answer:
a. ΠA (r)
select distinct A
from r
b. σB = 17 (r)
select *
from r
where B = 17
c. r × s
select distinct *
from r, s
d. ΠA,F (σC = D (r × s))
select distinct A, F
from r, s
where C = D
4.5 Let R = (A, B, C), and let r1 and r2 both be relations on schema R. Give an
expression in SQL that is equivalent to each of the following queries.
a. r1 ∪ r2
b. r1 ∩ r2
c. r1 − r2
d. ΠAB (r1) ⋈ ΠBC (r2)
Answer:
a. r1 ∪ r2
(select *
from r1)
union
(select *
from r2)
b. r1 ∩ r2
We can write this using the intersect operation, which is the preferred
approach, but for variety we present a solution using a nested subquery.
select *
from r1
where (A, B, C) in (select *
from r2)
c. r1 − r2
select *
from r1
where (A, B, C) not in (select *
from r2)
This can also be solved using the except clause.
d. ΠAB (r1) ⋈ ΠBC (r2)
select distinct r1.A, r2.B, r2.C
from r1, r2
where r1.B = r2.B
4.6 Let R = (A, B) and S = (A, C), and let r(R) and s(S) be relations. Write an
expression in SQL for each of the queries below:
a. {< a > | ∃ b (< a, b > ∈ r ∧ b = 17)}
b. {< a, b, c > | < a, b > ∈ r ∧ < a, c > ∈ s}
c. {< a > | ∃ c (< a, c > ∈ s ∧ ∃ b1, b2 (< a, b1 > ∈ r ∧ < c, b2 > ∈ r ∧ b1 > b2))}
Answer:
a. {< a > | ∃ b (< a, b > ∈ r ∧ b = 17)}
select distinct A
from r
where B = 17
b. {< a, b, c > | < a, b > ∈ r ∧ < a, c > ∈ s}
employee (person-name, street, city)

works (person-name, company-name, salary)
company (company-name, city)
manages (person-name, manager-name)
Figure 3.39. Relational database for Exercises 3.5, 3.8 and 3.10.
3.4 In Chapter 2, we saw how to represent many-to-many, many-to-one, one-to-many,
and one-to-one relationship sets. Explain how primary keys help us to
represent such relationship sets in the relational model.
Answer: Suppose the primary key of relation schema R is $\{A_{i_1}, A_{i_2}, \ldots, A_{i_n}\}$
and the primary key of relation schema S is $\{B_{i_1}, B_{i_2}, \ldots, B_{i_m}\}$. Then a
relationship between the two sets can be represented as a tuple
$(A_{i_1}, A_{i_2}, \ldots, A_{i_n}, B_{i_1}, B_{i_2}, \ldots, B_{i_m})$. In a one-to-one relationship,
each value on $\{A_{i_1}, \ldots, A_{i_n}\}$ will appear in exactly one tuple, and likewise for
$\{B_{i_1}, \ldots, B_{i_m}\}$. In a many-to-one relationship (e.g., many A, one B), each value on
$\{A_{i_1}, \ldots, A_{i_n}\}$ will appear once, and each value on $\{B_{i_1}, \ldots, B_{i_m}\}$ may appear
many times. In a many-to-many relationship, values on both $\{A_{i_1}, \ldots, A_{i_n}\}$ and
$\{B_{i_1}, \ldots, B_{i_m}\}$ will appear many times. However, in all the above cases
$\{A_{i_1}, \ldots, A_{i_n}, B_{i_1}, \ldots, B_{i_m}\}$ is a primary key, so no tuple on
$(A_{j_1}, \ldots, A_{j_n}, B_{k_1}, \ldots, B_{k_m})$ will appear more than once.
3.5 Consider the relational database of Figure 3.39, where the primary keys are
underlined. Give an expression in the relational algebra to express each of the
following queries:
a. Find the names of all employees who work for First Bank Corporation.
b. Find the names and cities of residence of all employees who work for First
Bank Corporation.
c. Find the names, street address, and cities of residence of all employees who
work for First Bank Corporation and earn more than $10,000 per annum.
d. Find the names of all employees in this database who live in the same city
as the company for which they work.
e. Find the names of all employees who live in the same city and on the same
street as do their managers.
f. Find the names of all employees in this database who do not work for First
Bank Corporation.
g. Find the names of all employees who earn more than every employee of
Small Bank Corporation.
h. Assume the companies may be located in several cities. Find all companies
located in every city in which Small Bank Corporation is located.
Answer:
a. Πperson-name (σcompany-name = “First Bank Corporation” (works))
b. Πperson-name, city (employee ⋈
(σcompany-name = “First Bank Corporation” (works)))
32 Chapter 3 Relational Model
c. Πperson-name, street, city
(σ(company-name = “First Bank Corporation” ∧ salary > 10000)
(works ⋈ employee))
d. Πperson-name (employee ⋈ works ⋈ company)
e. Πperson-name ((employee ⋈ manages)
⋈(manager-name = employee2.person-name ∧ employee.street = employee2.street
∧ employee.city = employee2.city) (ρemployee2 (employee)))
f. The following solutions assume that all people work for exactly one company.
If one allows people to appear in the database (e.g. in employee) but
not appear in works, the problem is more complicated. We give solutions for
this more realistic case later.
Πperson-name (σcompany-name ≠ “First Bank Corporation” (works))
If people may not work for any company:
Πperson-name (employee) − Πperson-name
(σ(company-name = “First Bank Corporation”) (works))
g. Πperson-name (works) − (Πworks.person-name (works
⋈(works.salary ≤ works2.salary ∧ works2.company-name = “Small Bank Corporation”)
ρworks2 (works)))
h. Note: Small Bank Corporation will be included in each answer.
Πcompany-name (company ÷
(Πcity (σcompany-name = “Small Bank Corporation” (company))))
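Division can be made concrete on sets of tuples. The following is a minimal Python sketch under these assumptions:

- The relations are sets of pairs, and the company data is hypothetical.
- Division is computed two ways. The first is directly from its meaning. The second uses the standard algebraic definition r ÷ s = Π_A(r) − Π_A((Π_A(r) × s) − r), which uses only the basic operators.

```python
def divide_direct(r, s):
    """r ÷ s: keep each a in Π_A(r) that is paired in r with every b in s."""
    return {a for a in {a for a, _ in r} if all((a, b) in r for b in s)}

def divide_algebraic(r, s):
    """r ÷ s = Π_A(r) − Π_A((Π_A(r) × s) − r)."""
    pa = {a for a, _ in r}
    missing = {(a, b) for a in pa for b in s} - r  # required pairs absent from r
    return pa - {a for a, _ in missing}

# Hypothetical company(company-name, city) tuples.
company = {("Small Bank Corporation", "Oslo"), ("Small Bank Corporation", "Bergen"),
           ("Big Corporation", "Oslo"), ("Big Corporation", "Bergen"),
           ("Big Corporation", "Paris"), ("Tiny Corporation", "Oslo")}
small_bank_cities = {city for name, city in company if name == "Small Bank Corporation"}

print(sorted(divide_direct(company, small_bank_cities)))
# ['Big Corporation', 'Small Bank Corporation']
```

As the note on part h says, Small Bank Corporation appears in the result.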
3.6 Consider the relation of Figure 3.21, which shows the result of the query “Find
the names of all customers who have a loan at the bank.” Rewrite the query
to include not only the name, but also the city of residence for each customer.
Observe that now customer Jackson no longer appears in the result, even though
Jackson does in fact have a loan from the bank.
a. Explain why Jackson does not appear in the result.
b. Suppose that you want Jackson to appear in the result. How would you
modify the database to achieve this effect?
c. Again, suppose that you want Jackson to appear in the result. Write a query
using an outer join that accomplishes this desire without your having to
modify the database.
Answer: The rewritten query is
Πcustomer-name, customer-city, amount (borrower ⋈ loan ⋈ customer)
a. Although Jackson does have a loan, no address is given for Jackson in the
customer relation. Since no tuple in customer joins with the Jackson tuple of
borrower, Jackson does not appear in the result.
b. The best solution is to insert Jackson’s address into the customer relation. If
the address is unknown, null values may be used. If the database system
does not support nulls, a special value may be used (such as unknown) for
Jackson’s street and city. The special value chosen must not be a plausible
name for an actual city or street.
c. Πcustomer-name, customer-city, amount ((borrower ⋈ loan) ⟕ customer)
3.7 The outer-join operations extend the natural-join operation so that tuples from
the participating relations are not lost in the result of the join. Describe how the
theta join operation can be extended so that tuples from the left, right, or both
relations are not lost from the result of a theta join.
Answer:
a. The left outer theta join of r(R) and s(S), r ⟕θ s, can be defined as
(r ⋈θ s) ∪ ((r − ΠR (r ⋈θ s)) × (null, null, . . . , null))
The tuple of nulls is of size equal to the number of attributes in S.
b. The right outer theta join of r(R) and s(S), r ⟖θ s, can be defined as
(r ⋈θ s) ∪ ((null, null, . . . , null) × (s − ΠS (r ⋈θ s)))
The tuple of nulls is of size equal to the number of attributes in R.
c. The full outer theta join of r(R) and s(S), r ⟗θ s, can be defined as
(r ⋈θ s) ∪ ((null, null, . . . , null) × (s − ΠS (r ⋈θ s))) ∪
((r − ΠR (r ⋈θ s)) × (null, null, . . . , null))
The first tuple of nulls is of size equal to the number of attributes in R, and
the second one is of size equal to the number of attributes in S.
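The three definitions translate almost term by term into set operations. The following is a minimal Python sketch under these assumptions:

- Relations are sets of tuples, and `None` stands in for null.
- The arities of R and S are passed explicitly, so that ΠR and ΠS become tuple slices of the joined tuples.
- The relations and the condition θ are hypothetical.

```python
def theta_join(r, s, theta):
    """r ⋈θ s: concatenate every pair of tuples that satisfies θ."""
    return {x + y for x in r for y in s if theta(x, y)}

def left_outer(r, s, theta, r_arity, s_arity):
    """(r ⋈θ s) ∪ ((r − ΠR(r ⋈θ s)) × (null, ..., null))"""
    j = theta_join(r, s, theta)
    dangling = r - {t[:r_arity] for t in j}          # r − ΠR(r ⋈θ s)
    return j | {x + (None,) * s_arity for x in dangling}

def right_outer(r, s, theta, r_arity, s_arity):
    """(r ⋈θ s) ∪ ((null, ..., null) × (s − ΠS(r ⋈θ s)))"""
    j = theta_join(r, s, theta)
    dangling = s - {t[r_arity:] for t in j}          # s − ΠS(r ⋈θ s)
    return j | {(None,) * r_arity + y for y in dangling}

def full_outer(r, s, theta, r_arity, s_arity):
    """Union of the left and right outer theta joins."""
    return (left_outer(r, s, theta, r_arity, s_arity)
            | right_outer(r, s, theta, r_arity, s_arity))

r = {(1, "a"), (5, "b")}          # hypothetical r(R), R = (A1, A2)
s = {(3, "x"), (7, "y")}          # hypothetical s(S), S = (B1, B2)
theta = lambda x, y: x[0] > y[0]  # θ: r.A1 > s.B1

print(full_outer(r, s, theta, 2, 2))
```

Here (1, "a") matches nothing in s and (7, "y") matches nothing in r. They survive only in the left and right variants respectively, padded with `None`.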
3.8 Consider the relational database of Figure 3.39. Give an expression in the
relational algebra for each request:
a. Modify the database so that Jones now lives in Newtown.
b. Give all employees of First Bank Corporation a 10 percent salary raise.
c. Give all managers in this database a 10 percent salary raise.
d. Give all managers in this database a 10 percent salary raise, unless the salary
would be greater than $100,000. In such cases, give only a 3 percent raise.
e. Delete all tuples in the works relation for employees of Small Bank Corporation.
Answer:
a. employee ← Πperson-name, street, “Newtown”
(σperson-name=“Jones” (employee))
∪ (employee − σperson-name=“Jones” (employee))
b. works ← Πperson-name,company-name,1.1∗salary (
σ(company-name=“First Bank Corporation”) (works))
∪ (works − σcompany-name=“First Bank Corporation” (works))
c. The update syntax allows reference to a single relation only. Since this update
requires access to both the relation to be updated (works) and the manages
relation, we must use several steps. First we identify the tuples of works
to be updated and store them in a temporary relation (t1). Then we create
a temporary relation containing the new tuples (t2). Finally, we delete the
tuples in t1 from works and insert the tuples of t2.
t1 ← Πworks.person-name, company-name, salary
(σworks.person-name=manager-name (works × manages))
t2 ← Πperson-name, company-name, 1.1∗salary (t1)
works ← (works − t1) ∪ t2
d. The same situation arises here. As before, t1 holds the tuples to be updated
and t2 holds these tuples in their updated form.
t1 ← Πworks.person-name, company-name, salary
(σworks.person-name=manager-name (works × manages))
t2 ← Πworks.person-name, company-name, salary∗1.03
(σt1.salary ∗ 1.1 > 100000 (t1))
t2 ← t2 ∪ (Πworks.person-name, company-name, salary∗1.1
(σt1.salary ∗ 1.1 ≤ 100000 (t1)))
works ← (works − t1) ∪ t2
e. works ← works − σcompany-name=“Small Bank Corporation” (works)
3.9 Using the bank example, write relational-algebra queries to find the accounts
held by more than two customers in the following ways:
a. Using an aggregate function.
b. Without using any aggregate functions.
Answer:
a. t1 ← account-number Gcount customer-name (depositor)
Πaccount-number (σnum-holders>2 (ρaccount-holders(account-number, num-holders) (t1)))
b. t1 ← (ρd1 (depositor) × ρd2 (depositor) × ρd3 (depositor))
t2 ← σ(d1.account-number=d2.account-number ∧ d2.account-number=d3.account-number) (t1)
Πd1.account-number (σ(d1.customer-name≠d2.customer-name ∧
d2.customer-name≠d3.customer-name ∧ d3.customer-name≠d1.customer-name) (t2))
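Both formulations can be cross-checked on a small depositor instance. The following is a minimal Python sketch with hypothetical tuples, representing depositor as a set of (customer-name, account-number) pairs. The aggregate-free version needs pairwise-distinct customer names. With equality conditions instead, every account with at least one holder would qualify.

```python
from collections import Counter

# Hypothetical depositor(customer-name, account-number) tuples.
depositor = {("Ann", "A-101"), ("Bob", "A-101"), ("Cid", "A-101"),
             ("Ann", "A-102"), ("Bob", "A-102"),
             ("Dee", "A-103"), ("Eve", "A-103"), ("Fay", "A-103"), ("Gus", "A-103")}

def with_aggregate(dep):
    # account-number G count customer-name, then select num-holders > 2.
    holders = Counter(account for _, account in dep)  # dep is a set: pairs are distinct
    return {account for account, n in holders.items() if n > 2}

def without_aggregate(dep):
    # Three copies of depositor on the same account with pairwise-distinct customers.
    return {a1 for (c1, a1) in dep for (c2, a2) in dep for (c3, a3) in dep
            if a1 == a2 == a3 and c1 != c2 and c2 != c3 and c3 != c1}

print(sorted(with_aggregate(depositor)))  # ['A-101', 'A-103']
```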
3.10 Consider the relational database of Figure 3.39. Give a relational-algebra
expression for each of the following queries:
a. Find the company with the most employees.
b. Find the company with the smallest payroll.
c. Find those companies whose employees earn a higher salary, on average,
than the average salary at First Bank Corporation.
Answer:
a. t1 ← company-name Gcount-distinct person-name (works)
t2 ← Gmax num-employees (ρcompany-strength(company-name, num-employees) (t1))
Πcompany-name (ρt3(company-name, num-employees) (t1) ⋈ ρt4(num-employees) (t2))
b. t1 ← company-name Gsum salary (works)
t2 ← Gmin payroll (ρcompany-payroll(company-name, payroll) (t1))
Πcompany-name (ρt3(company-name, payroll) (t1) ⋈ ρt4(payroll) (t2))
c. t1 ← company-name Gavg salary (works)
t2 ← σcompany-name = “First Bank Corporation” (t1 )
Chapter 19
Query Optimization
Query optimization is the activity, performed by the query optimizer, of selecting the
best available strategy for executing a query.
1. Query Trees and Heuristics for Query Optimization
- Heuristic rules are applied to modify the internal representation of the
query
- The scanner and parser of an SQL query first generate a data
structure that corresponds to an initial query representation
- This representation is then optimized according to heuristic rules
- This leads to an optimized query representation, which corresponds
to the query execution strategy
- One of the main heuristic rules is to apply selection and projection
before join, to reduce the size of the data being processed
- A query tree is used to represent a relational algebra expression; a query
graph is used to represent a relational calculus expression
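The effect of the "selection before join" heuristic can be measured by counting the intermediate tuples each order produces. The following is a minimal Python sketch, assuming hypothetical EMPLOYEE(Ssn, Dno) and DEPARTMENT(Dnumber, Dname) instances. Each join is written as a Cartesian product followed by a selection, exactly as in an unoptimized query tree.

```python
# Hypothetical EMPLOYEE(Ssn, Dno) and DEPARTMENT(Dnumber, Dname) instances.
employees = [(f"E{i}", i % 10) for i in range(1000)]
departments = [(d, "Research" if d == 5 else f"Dept{d}") for d in range(10)]

def join_then_select():
    pairs = [(e, d) for e in employees for d in departments]   # |E| x |D| pairs
    joined = [(e, d) for e, d in pairs if e[1] == d[0]]         # Dno = Dnumber
    return [e[0] for e, d in joined if d[1] == "Research"], len(pairs)

def select_then_join():
    research = [d for d in departments if d[1] == "Research"]   # selection first
    pairs = [(e, d) for e in employees for d in research]       # |E| x 1 pairs
    return [e[0] for e, d in pairs if e[1] == d[0]], len(pairs)

late, late_pairs = join_then_select()
early, early_pairs = select_then_join()
print(late_pairs, early_pairs)  # 10000 1000
```

Both orders return the same 100 employees. Pushing the selection below the join shrinks the intermediate result tenfold in this instance.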
Notation for Query Tree and Query Graphs
Q2: For every project located in ‘Stafford’, list the project number, the
controlling department number, and the department manager’s last
name, address and birth date.
SELECT Pnumber, Dnum, Lname, Address, Bdate
FROM PROJECT, DEPARTMENT, EMPLOYEE
WHERE Dnum = Dnumber AND Mgr_ssn = Ssn AND Plocation = 'Stafford';
- Query tree: relations at the leaves, operators at the internal nodes
- Operations are executed bottom up until the root node is reached
- Nodes 1, 2, and 3 must be executed in sequence
- A query tree represents a specific order of execution
Heuristic Optimization of Query Trees
- There are many possible query trees for a given relational
algebra expression
- The query parser generates a standard initial query tree without doing
any optimization
Example:
Find the last names of employees born after 1957 who work on a
project named 'Aquarius'.
SELECT E.Lname
FROM EMPLOYEE E, WORKS_ON W, PROJECT P
WHERE P.Pname = 'Aquarius' AND P.Pnumber = W.Pno AND
E.Ssn = W.Ssn AND E.Bdate > '1957-12-31';
Fig. 19.2(a) is the initial query tree, not optimized
Fig. 19.2(b) is the query tree with an improvement
Fig. 19.2(c) shows a further improvement
Fig. 19.2(d) shows a further improvement
A query can be transformed step by step into an equivalent query
that is more efficient to execute; transformation rules are needed to do this.
General Transformation Rules for Relational Algebra Operations
1. Cascade of σ: a conjunctive selection condition can be broken up into a
cascade of individual σ operations.
σ c1 AND c2 AND c3 (R) = σ c1 (σ c2 (σ c3 (R)))
2. Commutativity of σ: the σ operator is commutative.
σ c1 (σ c2 (R)) = σ c2 (σ c1 (R))
3. Cascade of ∏: in a cascade of projections, all but the last (outermost) one can be ignored.
∏List1 (∏List2 (… (∏Listn (R)) …)) = ∏List1 (R)
4. Commuting σ with ∏: if the selection condition c involves only the
attributes A1, A2, …, An in the projection list, the two operators can be commuted.
∏<A1, A2, …, An> (σc (R)) = σc (∏<A1, A2, …, An> (R))
5. Commutativity of ⋈ and ×
R ⋈ S = S ⋈ R
R × S = S × R
6. Commuting σc with ⋈: if all the attributes in the selection condition c belong to
only one of the relations being joined (say R), then
σc (R ⋈ S) = (σc (R)) ⋈ S
If the selection condition c is c1 AND c2, where c1 involves only attributes
of R and c2 involves only attributes of S, then
σc (R ⋈ S) = (σc1 (R)) ⋈ (σc2 (S))
7. Commuting ∏ with ⋈c: suppose L = (A1, A2, …, An, B1, B2, …, Bm), where
(A1, A2, …, An) are attributes of R and
(B1, B2, …, Bm) are attributes of S.
If the join condition c involves only attributes in L, then
∏L (R ⋈c S) = (∏<A1, A2, …, An> (R)) ⋈c (∏<B1, B2, …, Bm> (S))
8. Commutativity of set operations: union and intersection are commutative, but
difference is not.
9. Associativity of ⋈, ×, ∪, and ∩: each of these operations is individually associative.
(R φ S) φ T = R φ (S φ T), where φ stands for any one of these operations,
used consistently throughout the expression
10. Commuting σ with set operations: the σ operation commutes with union and intersection.
σc (R φ S) = (σc (R)) φ (σc (S)), where φ is ∪ or ∩
11. ∏ commutes with union
∏L (R ∪ S) = (∏L (R)) ∪ (∏L (S))
12. Converting a (σ, ×) sequence into a join
σc (R × S) = (R ⋈c S)
13. Pushing σ in conjunction with set difference
σc (R − S) = σc (R) − σc (S)
However, the selection may be applied to only one relation:
σc (R − S) = σc (R) − S
14. Pushing σ to only one argument in ∩
If all attributes in the condition c are from relation R, then
σc (R ∩ S) = σc (R) ∩ S
15. If S is empty, then R ∪ S = R
Outline of a heuristic optimization algorithm
1. Using rule 1, break up SELECT operations with conjunctive conditions into a
cascade of SELECT operations; this allows selections to be moved down
different branches of the tree
2. Using rules 2, 4, 6, 10, 13, and 14, move each SELECT operation as far
down the query tree as the attributes in its condition permit
3. Using rules 5 and 9, rearrange the leaf nodes
4. Using rule 12, combine a Cartesian product (X) with a subsequent selection
into a JOIN
5. Using rules 3, 4, 7, and 11, break down lists of projection attributes and
move them as far down the tree as possible, cascading projections as needed
6. Reduce the size of intermediate results: perform selections as
early as possible
7. Apply projections as early as possible to reduce the number
of attributes
8. Execute first the select and join operations that are most restrictive, i.e.
that produce the fewest tuples
2. Choice of Query Execution Plans
Alternatives for Query Evaluation:
∏Fname, Lname, Address (σDname='Research' (DEPARTMENT) ⋈Dnumber=Dno (EMPLOYEE))
The query tree is shown in Fig. 19.3.
- The query tree includes information about the access method for
each relation and the algorithms used to execute the tree
- To execute this query plan, the optimizer might choose an index for the
SELECT operation on DEPARTMENT; assume also that an index
exists on the Dno attribute of EMPLOYEE. The optimizer may also use a
pipelined operation.
In materialized evaluation, the result of each operation is stored as a temporary
(intermediate) relation.
For example, a join operation can produce an intermediate result to which a
projection is then applied. In pipelined evaluation, the results of one operation
are fed to the next operation as they are produced.
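The two evaluation styles can be contrasted with Python lists and generators. The following is a minimal sketch, assuming hypothetical EMPLOYEE tuples and hypothetical `select`/`project` helpers. The materialized version builds the whole selection result before projecting. The pipelined version passes each qualifying tuple straight to the projection, so the first output tuple is available before the input has been fully scanned.

```python
# Hypothetical EMPLOYEE tuples: (Fname, Dno, Salary).
employee = [("Ann", 5, 60), ("Bob", 4, 40), ("Cid", 5, 55), ("Dee", 1, 70)]

def materialized(rel):
    temp = [t for t in rel if t[2] > 50]   # whole intermediate relation is stored
    return [(t[0],) for t in temp]         # projection then reads the stored result

def select(pred, rows):
    for t in rows:                         # hands each qualifying tuple on at once
        if pred(t):
            yield t

def project(indexes, rows):
    for t in rows:
        yield tuple(t[i] for i in indexes)

def pipelined(rel):
    # No intermediate relation: tuples flow from select into project one by one.
    return project((0,), select(lambda t: t[2] > 50, iter(rel)))

print(list(pipelined(employee)))  # [('Ann',), ('Cid',), ('Dee',)]
```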
Nested Subquery Optimization
Consider the query:
SELECT E1.Fname, E1.Lname
FROM EMPLOYEE E1
WHERE E1.Salary = (SELECT MAX(Salary)
FROM EMPLOYEE E2);
The inner query is evaluated once, producing M, the maximum salary in E2.
The outer query then tests E1.Salary = M for each employee in E1.
Because the inner query does not refer to the outer query, it needs to be
evaluated only once; this is a non-correlated query.
Consider the query:
SELECT Fname, Lname, Salary
FROM EMPLOYEE E
WHERE EXISTS (SELECT *
FROM DEPARTMENT D
WHERE D.Dnumber = E.Dno AND
D.Zipcode = 30332)
The inner query returns true or false depending on whether the
department in which the employee works is located in the 30332 zip code area.
Because the inner subquery refers to E.Dno from the outer query (a correlated
subquery), it has to be evaluated for every outer query tuple, which is inefficient.
The above nested query can be written as a block query
(unnesting) using a Join operation:
SELECT Fname, Lname, Salary
FROM EMPLOYEE E, DEPARTMENT D
WHERE D.Dnumber = E.Dno AND D.Zipcode = 30332
Temporary intermediate results can be used to perform the join
operation. Usually, nested queries can be converted into block
queries.
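The correlated EXISTS form and the unnested join form can be compared on a toy instance. The following is a minimal sketch, assuming Python's `sqlite3` and hypothetical rows. The two forms agree here because Dnumber is the key of DEPARTMENT, so each employee joins with at most one department. Without that key, the join could repeat employees and would need `DISTINCT`.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
create table department (dnumber integer primary key, zipcode integer);
create table employee (fname text, lname text, salary integer, dno integer);
insert into department values (1, 30332), (2, 10001), (3, 30332);
insert into employee values
  ('Ann', 'Ames', 10, 1), ('Bob', 'Boyd', 20, 2),
  ('Cid', 'Cole', 30, 3), ('Dee', 'Dunn', 40, 2);
""")

# Correlated subquery: conceptually re-evaluated for each employee tuple.
nested = """
    select fname, lname, salary from employee e
    where exists (select * from department d
                  where d.dnumber = e.dno and d.zipcode = 30332)"""

# Unnested block query: one join the optimizer can plan as a whole.
unnested = """
    select fname, lname, salary from employee e, department d
    where d.dnumber = e.dno and d.zipcode = 30332"""

nested_rows = sorted(con.execute(nested).fetchall())
unnested_rows = sorted(con.execute(unnested).fetchall())
print(nested_rows == unnested_rows)  # True
```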
SKIP (Views and Merging)
3. Use of Selectivities in Cost-based Optimization
A query optimizer estimates and compares the costs of executing a
query using different strategies, and then chooses the strategy with the
lowest estimated cost. The number of strategies considered has to be limited
to save computation time.
Queries are either interpreted or compiled. Because the cost functions are
only estimates and evaluating them takes time, this approach is best suited to
compiled queries, where optimization is done at compile time; for interpreted
queries, a full-scale optimization at run time would slow down the response.
The information used by the cost functions is stored in the catalog to speed up
the process.
Overall cost-based query optimization
1. For a given query, many equivalent expressions may exist and the search
over them has no definitive convergence, so it is difficult to examine them all
in a limited time
2. Quantitative measures are needed to evaluate the alternatives; they are
reduced to a common metric called cost
3. Keep the cheaper alternatives and prune the expensive ones
4. The scope of optimization is a query block: the optimizer considers various
index access paths, join permutations, join methods, group-by methods, etc.
5. In global query optimization, multiple query blocks are considered together
Cost components for query execution
1. Access cost to disk: disk I/O for searching for, reading, and writing data
blocks, which depends on the access structures (hashing, indexes), file
allocation methods, etc.
2. Disk storage cost: the cost of storing any intermediate files
3. Computation cost: CPU (processing) time for searching,
merging, performing computations on fields, etc.
4. Memory usage cost: the number of main memory buffers needed
5. Communication cost: the cost of shipping the query and its results from the
database site to the site where the query originated
For large databases, the main emphasis is on minimizing the access cost to disk.
For small databases, where most of the data can be stored in memory, the
emphasis is on minimizing computation cost.
Catalog information used in cost functions
Most of the information required by the cost functions is stored in the catalog
so that the query optimizer can access it:
- Number of records (tuples) (r)
- Average record size (R)
- Number of file blocks (b)
- Blocking factor (bfr)
- Primary file organization
- Indexes
- Number of levels (x) of each multilevel index
- Number of first-level index blocks (bl1)
- Number of distinct values of an attribute (d)
- Selectivity of an attribute (sl)
- Selection cardinality s = sl * r: the average number of records that satisfy
an equality selection condition on the attribute
Histograms
- Tables or data structures maintained by DBMS to record
distribution of data
- Without histograms, the optimizer assumes a uniform distribution of values
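The selection cardinality s = sl * r can be estimated with or without a histogram. The following is a minimal Python sketch with these assumptions:

- Under the uniform assumption, sl = 1/d for an equality condition. For a key attribute, d = r, so s = 1.
- The histogram is a hypothetical equi-width structure of (low, high, count, distinct) buckets. Values are assumed uniform only inside each bucket.
- All numbers are made up to show how skew changes the estimate.

```python
def uniform_selection_cardinality(r, d):
    """s = sl * r with sl = 1/d (equality on an attribute with d distinct values)."""
    sl = 1 / d
    return sl * r

def histogram_selection_cardinality(buckets, value):
    """Equality estimate from a hypothetical equi-width histogram.

    Each bucket is (low, high, count, distinct): `count` records fall in
    [low, high] and take `distinct` different values, assumed uniform
    inside the bucket.
    """
    for low, high, count, distinct in buckets:
        if low <= value <= high:
            return count / distinct
    return 0.0

# Hypothetical skewed attribute: r = 10,000 records, d = 50 distinct values (0..49),
# with 90% of the records in the lower half of the value range.
buckets = [(0, 24, 9000, 25), (25, 49, 1000, 25)]

print(histogram_selection_cardinality(buckets, 10))  # 360.0
print(histogram_selection_cardinality(buckets, 30))  # 40.0
```

The uniform assumption estimates 200 records for every value, versus 360 and 40 from the histogram.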
Examples of cost functions for SELECT
S1: Linear search (brute force) approach
o Cs1a = b
o For an equality condition on a key attribute, on average half of the file
blocks are searched before the record is found: Cs1a = b/2 if the record is
found; otherwise Cs1a = b
S2: Binary search:
o Cs2 = log2 b + ⌈s/bfr⌉ − 1
o For an equality search on a unique key attribute (s = 1), Cs2 = log2 b
S3: Using a primary index or a hash key to retrieve a single record:
o Cs3a = x + 1 (primary index: x index levels plus one data block)
o Cs3b = 1 for static or linear hashing
o Cs3b = 2 for extendible hashing
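The SELECT cost formulas are simple enough to evaluate directly. The following is a minimal Python sketch. The helper names and file parameters are hypothetical. Each function mirrors one formula above, with cost counted in block accesses.

```python
import math

# Hypothetical helpers mirroring the SELECT cost formulas (costs in block accesses).
def cost_s1_linear(b, key_equality=False, found=True):
    """S1: b in general; b/2 on average for a key equality match that is found."""
    return b / 2 if key_equality and found else b

def cost_s2_binary(b, s, bfr):
    """S2: log2 b + ceil(s / bfr) - 1."""
    return math.log2(b) + math.ceil(s / bfr) - 1

def cost_s3a_primary_index(x):
    """S3a: x index levels plus one data block."""
    return x + 1

def cost_s3b_hash(extendible=False):
    """S3b: 1 block for static or linear hashing, 2 for extendible hashing."""
    return 2 if extendible else 1

# Hypothetical file: b = 1024 blocks, bfr = 10, s = 30 matching records, x = 3.
print(cost_s1_linear(1024), cost_s2_binary(1024, 30, 10), cost_s3a_primary_index(3))
# 1024 12.0 4
```

With s = 1 (a unique key), `cost_s2_binary` reduces to log2 b, matching the special case above.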
Examples of cost functions for JOIN
o JOIN selectivity (js)
o For R ⋈c S with join condition c (e.g., c: R.A = S.B):
js = |R ⋈c S| / |R × S| = |R ⋈c S| / (|R| * |S|)
- If there is no join condition c, js = 1 (the join is a Cartesian product)
- If no tuples from the relations satisfy condition c, js = 0
- In general, 0 <= js <= 1
o Size of the result after the join operation:
|R ⋈c S| = js * |R| * |S|
o If A is a key of R (condition R.A = S.B):
|R ⋈c S| <= |S|, so js <= 1/|R|
o If B is a key of S (condition R.A = S.B):
|R ⋈c S| <= |R|, so js <= 1/|S|
J1: Nested-loop join:
o Cj1 = bR + (bR * bS) + ((js * |R| * |S|) / bfrRS)
The last term is the cost of writing the result file to disk
(R is used in the outer loop; bfrRS is the blocking factor of the resulting file)
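The join formulas can be evaluated for both choices of outer relation. The following is a minimal Python sketch with hypothetical helper names and sizes. It assumes EMPLOYEE ⋈Dno=Dnumber DEPARTMENT where every EMPLOYEE tuple matches exactly one DEPARTMENT tuple. In that case js equals its upper bound 1/|DEPARTMENT|.

```python
# Hypothetical helpers for the join formulas above.
def join_selectivity(result_size, r_size, s_size):
    """js = |R ⋈c S| / (|R| * |S|)."""
    return result_size / (r_size * s_size)

def join_result_size(js, r_size, s_size):
    """|R ⋈c S| = js * |R| * |S|."""
    return js * r_size * s_size

def nested_loop_cost(b_outer, b_inner, js, r_size, s_size, bfr_rs):
    """Cj1 = bR + bR * bS + (js * |R| * |S|) / bfrRS, with R in the outer loop."""
    return b_outer + b_outer * b_inner + join_result_size(js, r_size, s_size) / bfr_rs

# Hypothetical sizes: |EMPLOYEE| = 6000 in 2000 blocks,
# |DEPARTMENT| = 50 in 5 blocks, bfrRS = 4.
js = 1 / 50                                          # js = 1/|DEPARTMENT|
print(join_result_size(js, 6000, 50))                # about 6000 = |EMPLOYEE|
print(nested_loop_cost(2000, 5, js, 6000, 50, 4))    # EMPLOYEE outer: about 13500
print(nested_loop_cost(5, 2000, js, 6000, 50, 4))    # DEPARTMENT outer: about 11505
```

Under this simplified formula, putting the smaller file in the outer loop is cheaper, because the bR * bS term is unchanged while the leading bR term shrinks.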
4. Overview of Query Optimization in Oracle
- Cost based, introduced in Oracle 7.1
- It examines alternative table and index access paths, operator
algorithms, join orderings, join methods, parallel execution,
distributed methods, and so on
- Chooses the execution plan with the lowest estimated cost
- The estimated cost is a relative number proportional to the expected
elapsed time needed to execute the query with a given plan
Optimizer calculates cost based on:
o Object statistics
  - Table cardinalities
  - Number of distinct values in columns
  - Column high and low values
  - Data distribution of column values
  - Estimated usage of resources (CPU and I/O time)
  - Memory needed
o Estimated cost
  - Internal metric
  - Corresponds to the run time and required resources
  - Goal: find the best combination of lowest run time and least
resource utilization
Global Query Optimizer
- Query optimization consists of logical and physical phases
- In Oracle, logical transformation and physical optimization are
integrated to generate optimal execution plan (Fig. 19.7)
- Transformation can be heuristic based on cost based
- Cost based query transformation (CBQT) introduced in 10g.
- Applies one or more transformations
- An SQL statement may consist of multiple blocks, which are
transformed by physical optimizer
25
- This process is repeated several times, each time applying a
different transformation and computing the cost of the result
- At the end, one or more transformations are applied to the
original SQL statement if they result in an optimal execution plan
- To avoid a combinatorial explosion, it provides efficient strategies
for searching the state space of the various transformations
- Major transformations include: group-by, distinct, and subquery
merging, subquery unnesting, predicate move-around, common
subexpression elimination, join predicate push-down, OR
expansion, subquery coalescing, join factorization, subquery
removal through window functions, star transformation, group-by
placement, and bushy join trees
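The search over transformations can be illustrated with a toy cost function. This sketch contrasts an exhaustive search, which costs every subset of transformations, with a linear (greedy) search that keeps a transformation only if it lowers the cost found so far. The cost numbers and the interaction penalty are invented for illustration; Oracle's actual CBQT search strategies and cost formulas are not reproduced here.

```python
from itertools import combinations

# Hypothetical cost model: a base cost, per-transformation savings, and one
# interaction where two transformations interfere with each other.
BASE_COST = 1000
SAVINGS = {
    "subquery unnesting": 300,
    "subquery merging": 200,
    "join predicate push down": 150,
}


def toy_cost(applied: frozenset) -> int:
    cost = BASE_COST - sum(SAVINGS[t] for t in applied)
    if {"subquery merging", "join predicate push down"} <= applied:
        cost += 250  # invented penalty: the two interfere in this toy model
    return cost


def exhaustive_search(transformations, cost):
    # Costs every subset: 2**n evaluations.
    best_set, best_cost = frozenset(), cost(frozenset())
    for k in range(1, len(transformations) + 1):
        for combo in combinations(transformations, k):
            c = cost(frozenset(combo))
            if c < best_cost:
                best_set, best_cost = frozenset(combo), c
    return best_set, best_cost


def linear_search(transformations, cost):
    # Considers each transformation once, in the given order: n + 1 evaluations.
    chosen = frozenset()
    best_cost = cost(chosen)
    for t in transformations:
        candidate = chosen | {t}
        c = cost(candidate)
        if c < best_cost:
            chosen, best_cost = candidate, c
    return chosen, best_cost


order = ["join predicate push down", "subquery merging", "subquery unnesting"]
print(exhaustive_search(order, toy_cost))
print(linear_search(order, toy_cost))
```

With n transformations, exhaustive search needs 2^n cost evaluations (8 here) while linear search needs n + 1 (4 here). In the order shown, linear search settles on cost 550 instead of the optimum 500, which is the price paid for avoiding the combinatorial explosion.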
Adaptive Optimization
- Oracle’s physical optimizer is adaptive
- It uses a feedback loop from the execution level to improve on its
previous decisions (backtracking)
- It computes an optimal execution plan for a given SQL statement
using object statistics and system statistics
- Optimality depends on the accuracy of the statistics fed to the model
and on the sophistication of the model itself
- The feedback loop runs between the execution engine and the
physical optimizer (Fig. 19.7)
- Example: based on the estimated table cardinality, the optimizer
may choose an index-based nested-loop join; during execution, the
actual cardinality may turn out to differ from the estimate, and this
can trigger the physical optimizer to switch to a hash join instead
of the index join.
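The cardinality-triggered switch can be sketched as follows: the plan starts as an index nested-loop join, buffers outer rows, and falls back to a hash join once more rows arrive than the estimate allowed for. The `threshold` parameter, the row layout (join key in position 0), and the dictionary standing in for an index are assumptions of this sketch, not Oracle internals.

```python
from collections import defaultdict


def index_nested_loop(outer_rows, inner_index):
    # For each outer row, probe the index on the inner join column.
    for r in outer_rows:
        for s in inner_index.get(r[0], []):
            yield (r, s)


def hash_join(outer_rows, inner_rows):
    # Build a hash table on the outer input, then probe it with the inner input.
    buckets = defaultdict(list)
    for r in outer_rows:
        buckets[r[0]].append(r)
    for s in inner_rows:
        for r in buckets.get(s[0], []):
            yield (r, s)


def adaptive_join(outer_rows, inner_rows, inner_index, threshold):
    # The plan chose nested loops because the outer input was estimated to be small.
    # Buffer outer rows; if more than `threshold` arrive, the estimate was wrong.
    it = iter(outer_rows)
    buffered = []
    for row in it:
        buffered.append(row)
        if len(buffered) > threshold:
            buffered.extend(it)  # drain the rest of the outer input
            return "HASH JOIN", list(hash_join(buffered, inner_rows))
    return "NESTED LOOPS", list(index_nested_loop(buffered, inner_index))


# Hypothetical data: EMPLOYEE rows (Dno, Lname) joined to DEPARTMENT rows (Dnumber, Dname).
department = [(1, "Headquarters"), (4, "Administration"), (5, "Research")]
dept_index = {d[0]: [d] for d in department}
few = [(5, "Smith"), (4, "Wallace")]
many = [(5, "Smith"), (4, "Wallace"), (5, "Wong"), (1, "Borg"), (4, "Jabbar")]
print(adaptive_join(few, department, dept_index, threshold=3)[0])
print(adaptive_join(many, department, dept_index, threshold=3)[0])
```

Both branches produce the same join result; only the method changes, which is why the switch can be made at run time without affecting correctness.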
Array Processing
- Oracle lacks N-dimensional array based computation
- Extensions are made for OLAP features
- These extensions improve performance in complex modeling and analysis
- The computation clause allows a table to be treated as a
multidimensional array with a set of formulas specified over it; the
formulas replace multiple joins and union operations
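As a rough illustration of treating a table as a multidimensional array, this sketch indexes hypothetical SALES rows by (product, year) and applies one formula across the array. Oracle exposes this idea in SQL through its MODEL clause (DIMENSION BY, MEASURES, RULES); the sketch below only mimics the idea in Python and is not that syntax.

```python
# Hypothetical SALES rows: (product, year, amount).
sales_rows = [("TV", 2000, 10), ("TV", 2001, 12),
              ("Radio", 2000, 5), ("Radio", 2001, 4)]

# Treat the table as a 2-D array indexed by (product, year).
cube = {(product, year): amount for product, year, amount in sales_rows}


def apply_forecast(cube):
    # One formula over the array:
    #   sales[p, 2002] = sales[p, 2001] + (sales[p, 2001] - sales[p, 2000])
    # In plain SQL, each cell reference on the right would typically need a
    # self-join of SALES, and the new 2002 rows would be appended with a UNION.
    products = {p for p, _ in cube}
    for p in sorted(products):
        cube[(p, 2002)] = cube[(p, 2001)] + (cube[(p, 2001)] - cube[(p, 2000)])
    return cube


apply_forecast(cube)
print(cube[("TV", 2002)], cube[("Radio", 2002)])
```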
Hints
- The application developer can provide hints to the query optimizer
(query annotations or directives)
- Hints are embedded in the text of the query statement
- Hints are used in infrequent cases to help the optimizer
- Occasionally, the application developer can override the optimizer
when it has chosen a suboptimal plan
- E.g., the optimizer may assume that the Sex attribute of EMPLOYEE
records is half male and half female, whereas in the actual database
all employees may be male; the application developer can then supply
this information in a hint so that the index on that column is used
appropriately
- Some types of hints:
o The access path for a given table
o The join order for a query block
o A particular join method for a join between tables
o Enabling or disabling of transformations
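In Oracle, hints are written as special comments inside the statement, for example `SELECT /*+ INDEX(e emp_sex) */ ...` to request an index access path or `/*+ FULL(e) */` to request a full table scan. Since Oracle may not be at hand, this sketch uses SQLite's `INDEXED BY` and `NOT INDEXED` clauses, which similarly let the developer fix the access path for a table, and inspects the outcome with `EXPLAIN QUERY PLAN`. The table, the index name `emp_sex`, and the skewed data are hypothetical; note also that SQLite documents `INDEXED BY` mainly as a safeguard rather than as a tuning tool.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (ssn INTEGER PRIMARY KEY, lname TEXT, sex TEXT);
    CREATE INDEX emp_sex ON employee(sex);
""")
# Hypothetical skewed data: 1000 male employees and 1 female employee.
rows = [(i, f"E{i}", "M") for i in range(1, 1001)] + [(1001, "Wong", "F")]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " | ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


# Developer-fixed access paths, analogous to Oracle's INDEX and FULL hints.
with_index = plan("SELECT lname FROM employee INDEXED BY emp_sex WHERE sex = 'F'")
full_scan = plan("SELECT lname FROM employee NOT INDEXED WHERE sex = 'F'")
print(with_index)
print(full_scan)
```

The exact plan text differs between SQLite versions (for example `SEARCH employee USING INDEX emp_sex (sex=?)` versus `SEARCH TABLE employee ...`), so only its key phrases should be relied on.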
Outlines
- Outlines are used to preserve execution plans of SQL statements
or queries
- They are implemented as a collection of hints
- Outlines are used for plan stability, what-if analysis, and
performance experiments
SQL Plan Management
- Execution plans have a significant impact on overall performance
of a database management system
- SQL Plan Management (SPM) was introduced in Oracle 11g
- This option can be enabled for all execution plans or for specific
SQL statements
- Execution plans may become obsolete for a variety of reasons: new
statistics, configuration parameter changes, software updates, and
new optimization techniques
- SPM uses optimal plans and avoids suboptimal ones, creating new
plans and adding them to the system as needed
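The plan-management idea, keeping a baseline of accepted plans and adopting a new plan only after it has been verified to be better, can be sketched as a small class. `PlanBaseline`, its methods, the plan names, and the measured costs are all hypothetical; Oracle administers real baselines through its own interfaces (such as the `DBMS_SPM` package), which this sketch does not model.

```python
class PlanBaseline:
    """Toy plan baseline for one SQL statement (hypothetical, not Oracle's API)."""

    def __init__(self, initial_plan: str, measured_cost: float):
        self.accepted = {initial_plan: measured_cost}  # plans the optimizer may use
        self.unverified = set()                        # new plans not yet trusted

    def record_new_plan(self, plan: str) -> None:
        # e.g. produced after new statistics or a software update; stored, not used yet.
        if plan not in self.accepted:
            self.unverified.add(plan)

    def plan_to_use(self) -> str:
        # Only accepted plans are candidates, so an unverified plan cannot cause a regression.
        return min(self.accepted, key=self.accepted.get)

    def evolve(self, measure) -> None:
        # Verify each new plan by measuring it; accept only plans that beat the current best.
        # Rejected plans are simply dropped in this sketch.
        for plan in sorted(self.unverified):
            cost = measure(plan)
            if cost < self.accepted[self.plan_to_use()]:
                self.accepted[plan] = cost
        self.unverified.clear()


baseline = PlanBaseline("NESTED LOOPS via emp_dno_idx", 120)
baseline.record_new_plan("HASH JOIN with full scans")
baseline.record_new_plan("SORT-MERGE JOIN")
print(baseline.plan_to_use())  # the original plan is still used
measured = {"HASH JOIN with full scans": 80, "SORT-MERGE JOIN": 200}
baseline.evolve(measured.get)
print(baseline.plan_to_use())
```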
5. Semantic Query Optimization

- Along with other optimization techniques, semantic query
optimization uses constraints specified on the database schema
- These constraints are used to generate more efficient queries to
execute
- Consider:
SELECT E.Lname, M.Lname
FROM EMPLOYEE AS E, EMPLOYEE AS M
WHERE E.Super_ssn=M.Ssn AND E.Salary>M.Salary;
(Retrieve the names of employees who earn more than their
supervisors)
If there is a constraint stating that employees cannot earn more than
their supervisors, the result of the above query is known to be empty,
so there is no need to execute it.
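A minimal sketch of this first example, assuming a hypothetical constraint catalog: before executing, the optimizer checks whether the query predicate is the exact negation of a declared constraint along the same join path and, if so, returns an empty result without reading any data. Real semantic optimizers reason about implication more generally; this check only recognizes direct negation.

```python
# Negation of each comparison operator.
NEGATION = {"<=": ">", "<": ">=", ">=": "<", ">": "<=", "=": "<>", "<>": "="}

# Hypothetical constraint catalog: (join attribute, compared attribute, operator),
# read as "for an employee joined to its supervisor via Super_ssn,
# employee.Salary <= supervisor.Salary always holds".
CONSTRAINTS = [("Super_ssn", "Salary", "<=")]


def provably_empty(join_attr, compared_attr, op, constraints=CONSTRAINTS):
    # True if the query predicate is the negation of a declared constraint on
    # the same join path, so no row pair can satisfy it.
    return any(j == join_attr and a == compared_attr and NEGATION[c_op] == op
               for j, a, c_op in constraints)


def run(join_attr, compared_attr, op, execute):
    if provably_empty(join_attr, compared_attr, op):
        return []        # answer without touching the data
    return execute()     # otherwise evaluate the query normally


executed = []


def evaluate_self_join():
    # Stand-in for actually running the EMPLOYEE self-join.
    executed.append("E.Salary > M.Salary")
    return []


# Predicate of the example query: E.Super_ssn = M.Ssn AND E.Salary > M.Salary
result = run("Super_ssn", "Salary", ">", evaluate_self_join)
print(result, executed)
```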
- Consider another example:
SELECT Lname, Salary
FROM EMPLOYEE, DEPARTMENT
WHERE EMPLOYEE.Dno=DEPARTMENT.Dnumber AND
EMPLOYEE.Salary > 100000;
It can be rewritten as:
SELECT Lname, Salary
FROM EMPLOYEE
WHERE EMPLOYEE.Dno IS NOT NULL AND
EMPLOYEE.Salary > 100000;
(Because of the referential integrity constraint that EMPLOYEE.Dno is a
foreign key referring to the primary key DEPARTMENT.Dnumber, every
non-NULL Dno matches exactly one DEPARTMENT row. All the attributes
referenced in the query are from EMPLOYEE, so DEPARTMENT can be
eliminated and no join is needed; the condition Dno IS NOT NULL
preserves the join's effect of excluding employees with no department.)
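The join-elimination rewrite can be checked on a small instance. This sketch uses Python's built-in `sqlite3` module with hypothetical EMPLOYEE and DEPARTMENT rows; it only verifies that the two queries return the same rows when the foreign key is enforced, and does not claim that SQLite itself performs the rewrite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT);
    CREATE TABLE EMPLOYEE (
        Lname  TEXT,
        Salary INTEGER,
        Dno    INTEGER REFERENCES DEPARTMENT(Dnumber)
    );
    INSERT INTO DEPARTMENT VALUES (1, 'Headquarters'), (5, 'Research');
    INSERT INTO EMPLOYEE VALUES ('Borg', 155000, 1), ('Wong', 140000, 5),
                                ('Smith', 30000, 5), ('Zelaya', 120000, NULL);
""")

original = """SELECT Lname, Salary FROM EMPLOYEE, DEPARTMENT
              WHERE EMPLOYEE.Dno = DEPARTMENT.Dnumber AND EMPLOYEE.Salary > 100000"""
rewritten = """SELECT Lname, Salary FROM EMPLOYEE
               WHERE EMPLOYEE.Dno IS NOT NULL AND EMPLOYEE.Salary > 100000"""

r1 = sorted(conn.execute(original).fetchall())
r2 = sorted(conn.execute(rewritten).fetchall())
print(r1 == r2, r1)

# The foreign key is what makes the rewrite safe: a dangling Dno is rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE VALUES ('Ghost', 200000, 99)")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```

The equivalence depends on two facts: the foreign key guarantees that every non-NULL Dno has a matching DEPARTMENT row, and Dnumber being the primary key guarantees that it has at most one, so the join neither drops nor duplicates EMPLOYEE rows.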