IS312 - L04 - QP-Projection and Set Operation
Figure: Query → Query Processing → Result
Relational Algebra: Project Operator
• Produces a table containing a subset of the columns of its argument table:
π_{attribute list}(relation)
• Example:
π_{Name,Hobby}(Person)
Project Operator
• Example:
π_{Name,Address}(Person)
Figure: Person (Id, Name, Address, Hobby) → result (Name, Address)
Algorithms
• π_{<attribute list>}(R)
– Just remove unwanted attributes!
• Duplicate rows:
– No DISTINCT in query
• No need to remove duplicates
– DISTINCT in query
• Key ∈ attribute list ➔ No duplicates
• Key ∉ attribute list ➔ Duplicate elimination (Sort → Eliminate)
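The decision rules above can be sketched in Python. This is a minimal illustration under stated assumptions, not how a particular DBMS implements projection: rows are assumed to be dictionaries, the key attributes are passed in explicitly, and the names `project`, `sort_eliminate` and the sample `person` rows are hypothetical.

```python
from typing import Sequence

Row = dict[str, object]  # hypothetical row representation: attribute -> value


def sort_eliminate(rows: list[tuple]) -> list[tuple]:
    """Sort the rows, then keep a row only if it differs from the last row kept."""
    result: list[tuple] = []
    for row in sorted(rows):
        if not result or row != result[-1]:
            result.append(row)
    return result


def project(relation: Sequence[Row], attrs: Sequence[str],
            key: Sequence[str], distinct: bool) -> list[tuple]:
    """pi_attrs(relation): keep only the listed attributes of every row."""
    projected = [tuple(row[a] for a in attrs) for row in relation]
    if not distinct:
        return projected                # no DISTINCT: duplicates may stay
    if set(key) <= set(attrs):
        return projected                # key kept: rows are already unique
    return sort_eliminate(projected)    # key dropped: Sort -> Eliminate


# Hypothetical Person rows; Id is the key.
person = [
    {"Id": 1, "Name": "Ali", "Address": "Giza", "Hobby": "Chess"},
    {"Id": 2, "Name": "Ali", "Address": "Giza", "Hobby": "Tennis"},
    {"Id": 3, "Name": "Mona", "Address": "Cairo", "Hobby": "Chess"},
]

print(project(person, ["Name", "Address"], ["Id"], distinct=True))
# [('Ali', 'Giza'), ('Mona', 'Cairo')]
```

Projecting onto (Id, Name) instead would skip the sort entirely, because the key survives the projection and no two rows can collide.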
Sort → Eliminate
• Sort rows
• Remove repeated ones
Sort → Eliminate
• Compare each kept tuple (pointer i) with the remaining tuples (pointer j) of the sorted relation
• Relation columns: DeptName, Salary, Address, EmpName (e.g. CS, 4000, Giza, Nemin)
• Pointer trace: i:1, j:2 → i:1, j:3 → i:1, j:4 → i:4, j:5 → i:5, j:6 → and so on…
• While tuple j repeats tuple i, it is eliminated; once tuple j differs, i moves to j
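The pointer walk can be reproduced with a short Python sketch. It assumes the relation fits in memory as a list of tuples (a real system would sort a file on disk), and the function name and sample rows are hypothetical; only the column layout and the (CS, 4000, Giza, Nemin) row come from the slides. The function also records the 1-based (i, j) pairs it compares so the trace can be checked against the slides.

```python
def sort_eliminate(rows: list[tuple]) -> tuple[list[tuple], list[tuple[int, int]]]:
    """Sort, then eliminate every tuple that repeats the current reference tuple i.

    Returns the distinct tuples and the 1-based (i, j) pairs that were compared.
    """
    rows = sorted(rows)
    keep = [True] * len(rows)
    trace: list[tuple[int, int]] = []
    i = 0                              # reference tuple (last distinct one seen)
    for j in range(1, len(rows)):      # j scans the remaining tuples
        trace.append((i + 1, j + 1))
        if rows[j] == rows[i]:
            keep[j] = False            # duplicate of tuple i: eliminate it
        else:
            i = j                      # new value: tuple j becomes the reference
    distinct = [row for row, k in zip(rows, keep) if k]
    return distinct, trace


# Hypothetical (DeptName, Salary, Address, EmpName) rows
emp = [
    ("CS", 4000, "Giza", "Nemin"),
    ("IT", 3500, "Giza", "Sara"),
    ("CS", 4000, "Giza", "Nemin"),
    ("IS", 3000, "Cairo", "Omar"),
    ("CS", 4000, "Giza", "Nemin"),
    ("IT", 3500, "Giza", "Sara"),
]
distinct, trace = sort_eliminate(emp)
print(trace)     # [(1, 2), (1, 3), (1, 4), (4, 5), (5, 6)]
print(distinct)
```

Because the input is sorted first, every duplicate sits right after the tuple it repeats, so one forward pass is enough.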
SET Operations
• Algorithms for SET operations
• Set operations:
• UNION, INTERSECTION, SET DIFFERENCE and CARTESIAN PRODUCT
• The CARTESIAN PRODUCT of relations R and S
includes all possible combinations of records
from R and S. The attributes of the result
include all attributes of R and S.
• Cost analysis of CARTESIAN PRODUCT
• If R has n records and j attributes and S has m records and k
attributes, the result relation will have n*m records and j+k
attributes.
• The CARTESIAN PRODUCT operation is very
expensive and should be avoided whenever possible.
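The size rule can be checked with a small Python sketch built on `itertools.product`. The relations and the function name are hypothetical; the sketch only confirms that the result has n × m records of j + k attributes.

```python
from itertools import product


def cartesian_product(r: list[tuple], s: list[tuple]) -> list[tuple]:
    """R x S: concatenate every tuple of R with every tuple of S."""
    return [tr + ts for tr, ts in product(r, s)]


R = [("Ali", 1), ("Mona", 2), ("Omar", 3)]       # n = 3 records, j = 2 attributes
S = [("CS", 10, "Giza"), ("IS", 20, "Cairo")]   # m = 2 records, k = 3 attributes

RxS = cartesian_product(R, S)
print(len(RxS), len(RxS[0]))   # 6 5  (n*m records, j+k attributes)
```

The multiplication is what makes the operator expensive: with 10,000 records in R and 5,000 in S, the result already has 50,000,000 records.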
SET Operations
• Apply only to type-compatible (union-compatible) relations:
• Same number of attributes
• Names of attributes are the same in both
• Same attribute domains
• Tables:
Person (SSN, Name, Address, Hobby)
Professor (Id, Name, Office, Phone)
are not union compatible.
π_{Name}(Person) and π_{Name}(Professor)
are union compatible, so
π_{Name}(Person) − π_{Name}(Professor)
makes sense.
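The three compatibility conditions can be expressed as a small Python check. This is a sketch under assumptions: a schema is modelled as a list of (attribute name, domain) pairs, and the domains given to Person and Professor are invented for illustration since the slides do not state them; `union_compatible` and `project_schema` are hypothetical names.

```python
Schema = list[tuple[str, str]]   # hypothetical: (attribute name, domain) pairs


def union_compatible(a: Schema, b: Schema) -> bool:
    """Same number of attributes, same names, same domains, position by position."""
    if len(a) != len(b):
        return False
    return all(na == nb and da == db for (na, da), (nb, db) in zip(a, b))


def project_schema(schema: Schema, attrs: list[str]) -> Schema:
    """Schema of pi_attrs(R): keep only the listed attributes, in the listed order."""
    domains = dict(schema)
    return [(name, domains[name]) for name in attrs]


# Domains below are assumed for illustration only.
person = [("SSN", "CHAR(9)"), ("Name", "VARCHAR"), ("Address", "VARCHAR"), ("Hobby", "VARCHAR")]
professor = [("Id", "INT"), ("Name", "VARCHAR"), ("Office", "VARCHAR"), ("Phone", "VARCHAR")]

print(union_compatible(person, professor))                  # False
print(union_compatible(project_schema(person, ["Name"]),
                       project_schema(professor, ["Name"])))  # True
```

Person and Professor both have four attributes, so the check fails on the names and domains rather than on the attribute count.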
Algorithms for SET Operations
• UNION
• Sort the two relations on the same attributes.
• Scan and merge both sorted files concurrently; whenever
the same tuple exists in both relations, only one copy is kept in
the merged result.
• INTERSECTION
• Sort the two relations on the same attributes.
• Scan and merge both sorted files concurrently; keep in the
merged result only those tuples that appear in both
relations.
• SET DIFFERENCE R − S
• Sort the two relations on the same attributes, then scan and merge them concurrently;
keep in the merged result only those tuples that appear in
relation R but not in relation S.
• All three algorithms are sorting-based.
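One merge loop can serve all three operations, as the following Python sketch shows. It assumes both relations fit in memory as lists of tuples (a DBMS would merge sorted files page by page), and it assumes the `key` function maps each tuple to a sort key that is unique per tuple, so equal keys imply equal tuples. The names `merge_set_op`, `by_dept_id` and the `op` strings are hypothetical; the R and S data are the DeptName/DeptID relations used in the ∪ example, sorted on DeptID as there.

```python
def _sort_distinct(rel, key):
    """Sort on the chosen attributes and drop adjacent repeats."""
    out = []
    for t in sorted(rel, key=key):
        if not out or t != out[-1]:
            out.append(t)
    return out


def merge_set_op(r, s, op, key=lambda t: t):
    """Sort-merge UNION / INTERSECTION / DIFFERENCE (R - S) of two relations."""
    if op not in ("union", "intersection", "difference"):
        raise ValueError(f"unknown set operation: {op}")
    a, b = _sort_distinct(r, key), _sort_distinct(s, key)
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        ka, kb = key(a[i]), key(b[j])
        if ka == kb:                       # tuple in both relations
            if op != "difference":
                out.append(a[i])           # kept once for union / intersection
            i += 1
            j += 1
        elif ka < kb:                      # tuple only in R
            if op != "intersection":
                out.append(a[i])
            i += 1
        else:                              # tuple only in S
            if op == "union":
                out.append(b[j])
            j += 1
    if op != "intersection":
        out.extend(a[i:])                  # leftover R tuples
    if op == "union":
        out.extend(b[j:])                  # leftover S tuples
    return out


def by_dept_id(t):
    """Sort on DeptID, then DeptName, so the key is unique per tuple."""
    return (t[1], t[0])


R = [("Accounting", 1), ("Research", 2), ("HR", 4), ("Admin", 9)]
S = [("Management", 5), ("Logistics", 7), ("Accounting", 1)]
print(merge_set_op(R, S, "union", by_dept_id))
# [('Accounting', 1), ('Research', 2), ('HR', 4), ('Management', 5), ('Logistics', 7), ('Admin', 9)]
```

Each relation is read once after sorting, so the merge itself is linear in |R| + |S|; the sorts dominate the cost.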
Sorting-based ∪ Algorithm
• Both relations are first sorted on DeptID
• R (DeptName, DeptID): (Accounting, 1), (Research, 2), (HR, 4), (Admin, 9)
• S (DeptName, DeptID): (Accounting, 1), (Management, 5), (Logistics, 7)
• Merge order: Accounting 1 (in both, kept once) → Research 2 → HR 4 → Management 5 → Logistics 7 → Admin 9
• R∪S (DeptName, DeptID): (Accounting, 1), (Research, 2), (HR, 4), (Management, 5), (Logistics, 7), (Admin, 9)
Sorting-based ∩ and - Algorithms
Summary
• Relational algebra operators (selection, projection, grouping, set operations, rename) can be classified into three groups:
• Tuple-at-a-Time Unary Operators
• Selection and projection
• No need to bring the entire relation into memory at one time
• Full-Relation Unary Operators
• Duplicate elimination and grouping
• Require seeing all or most of the tuples in memory at once
• Full-Relation Binary Operators
• Set operators (union, intersection, difference), join, and Cartesian product
• Require seeing the tuples of both relations in memory
Aggregate Operators
• Functions that operate on sets:
– COUNT, SUM, AVG, MAX, MIN
• Produce numbers (not tables)
• Not part of relational algebra
Aggregate Operators with B-Tree
• MAX, MIN, SUM, AVERAGE, COUNT
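Figure: B-tree with root [26]; internal nodes [6 12] and [42 51 62]; leaves [1 2 4], [7 8], [13 15 18 25] under [6 12] and [27 29], [45 46 48], [53 55 60], [64 70 90] under [42 51 62]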
Aggregate Operators with B-Tree
• MAX
• Traverse down to the right-most key of the index
• MIN
• Traverse down to the left-most key of the index
• SUM
• Visit all keys
• Average
• Visit all keys
• Count
• Usually stored in the catalog, so it can be read directly
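The B-tree strategies can be sketched in Python using the example tree from the figure. The `Node` class and function names are hypothetical and the tree is held in memory; COUNT is computed here by visiting every key only to check the result, whereas a DBMS would usually read it from the catalog.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    keys: list[int]
    children: list[Node] = field(default_factory=list)   # empty list for a leaf


def btree_min(node: Node) -> int:
    while node.children:              # always follow the left-most pointer
        node = node.children[0]
    return node.keys[0]


def btree_max(node: Node) -> int:
    while node.children:              # always follow the right-most pointer
        node = node.children[-1]
    return node.keys[-1]


def all_keys(node: Node) -> Iterator[int]:
    """In-order traversal: yields every key once, in ascending order."""
    if not node.children:
        yield from node.keys
        return
    for child, key in zip(node.children, node.keys):
        yield from all_keys(child)
        yield key
    yield from all_keys(node.children[-1])


root = Node([26], [
    Node([6, 12], [Node([1, 2, 4]), Node([7, 8]), Node([13, 15, 18, 25])]),
    Node([42, 51, 62], [Node([27, 29]), Node([45, 46, 48]),
                        Node([53, 55, 60]), Node([64, 70, 90])]),
])

keys = list(all_keys(root))            # SUM / AVG must visit every key
total, count = sum(keys), len(keys)
print(btree_min(root), btree_max(root), total, count)   # 1 90 879 26
print(total / count)                   # AVG
```

MIN and MAX touch one node per level, so their cost grows with the tree height, while SUM and AVG cost grows with the number of keys.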
GROUP BY
SELECT Dno, AVG (Salary)
FROM EMPLOYEE
GROUP BY Dno;
• Sorting
• Sort the relation on the grouping attributes
• Partition the relation into groups by the grouping attributes
• Apply the aggregate operators on the groups
Computation is complex: the whole relation must be sorted before the groups can be formed
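The three sorting steps can be sketched in Python for the query above. The rows are hypothetical dictionaries standing in for EMPLOYEE, and `group_avg` is an illustrative name; the sketch sorts on the grouping attribute, detects group boundaries in a single pass, and applies AVG to each finished group.

```python
def group_avg(rows: list[dict], group_attr: str, agg_attr: str) -> list[tuple]:
    """Sort-based GROUP BY group_attr with AVG(agg_attr)."""
    ordered = sorted(rows, key=lambda r: r[group_attr])   # 1. sort on grouping attribute
    result = []
    current, total, count = None, 0, 0
    for row in ordered:
        g = row[group_attr]
        if count and g != current:                  # 2. a new group starts here
            result.append((current, total / count))  # 3. aggregate the finished group
            total, count = 0, 0
        current = g
        total += row[agg_attr]
        count += 1
    if count:                                       # aggregate the last group
        result.append((current, total / count))
    return result


# Hypothetical EMPLOYEE rows
employee = [
    {"Dno": 5, "Salary": 30000},
    {"Dno": 4, "Salary": 25000},
    {"Dno": 5, "Salary": 40000},
    {"Dno": 1, "Salary": 55000},
    {"Dno": 4, "Salary": 43000},
]
print(group_avg(employee, "Dno", "Salary"))
# [(1, 55000.0), (4, 34000.0), (5, 35000.0)]
```

After sorting, each group is a contiguous run, so only a running total and count for the current group need to be kept.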