CS 143 Final Exam Notes
Disks
o A typical disk
Platter diameter: 1-5 in
Cylinders: 100 – 2000
Platters: 1 – 20
Sectors per track: 200 – 500
Sector size: 512 – 50K
Overall capacity: 1G – 200GB
( sectors / track ) × ( sector size ) × ( cylinders ) × ( 2 × number of platters )
Sequential I/O
Execute a 200K program – Consisting of a single file
( 10ms ) + ( 5ms ) + ( 200K / 10MB/s) = 35ms
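A minimal Python sketch of the two calculations above; the parameter values (seek time, rotational delay, transfer rate) are the example's assumed numbers, not universal constants:

def disk_capacity(sectors_per_track, sector_size, cylinders, platters):
    # capacity = sectors/track x sector size x cylinders x surfaces (2 per platter)
    return sectors_per_track * sector_size * cylinders * 2 * platters

def sequential_read_ms(size_bytes, seek_ms=10, rotation_ms=5, mb_per_s=10):
    # one seek + one rotational delay + the sequential transfer time
    transfer_ms = size_bytes / (mb_per_s * 1_000_000) * 1000
    return seek_ms + rotation_ms + transfer_ms

print(disk_capacity(300, 512, 1000, 10))  # one plausible "typical disk": ~3GB
print(sequential_read_ms(200_000))        # 10 + 5 + 20 = 35.0 ms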
o Block modification
Byte-level modification not allowed
Can be modified by blocks
Steps:
1. Read the block from disk
2. Modify in memory
3. Write the block to disk
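A minimal sketch of block-level read-modify-write on a plain Unix file, assuming 4096-byte blocks (a real DBMS would go through its buffer pool instead):

import os

BLOCK_SIZE = 4096

def modify_block(path, block_no, offset, new_bytes):
    fd = os.open(path, os.O_RDWR)
    try:
        # 1. read the block from disk
        block = bytearray(os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE))
        # 2. modify in memory (byte-level changes happen only in the copy)
        block[offset:offset + len(new_bytes)] = new_bytes
        # 3. write the whole block back to disk
        os.pwrite(fd, bytes(block), block_no * BLOCK_SIZE)
    finally:
        os.close(fd)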
o Buffer, buffer pool
Keep disk blocks in main memory
Avoid future reads
Hide disk latency
Buffer, buffer pool
Dedicated main memory space to “cache” disk blocks
Most DBMS let users change buffer pool size
Files
o Spanned vs. Unspanned
Unspanned – Pack only whole tuples into a block; leave any remaining space unused
Spanned – Fill the block completely, storing the first part of the next tuple in this block and the rest in the following block
o Deletion
For now, ignore spanning issue, irrelevant for current discussion
What should we do?
Copy the last entry into the space
Shift all entries forward to fill the space
Leave it open and fill it with the next update
Have a pointer to point to the first available empty slot
Have a bit-map of the occupancy of the tuples
o Variable-Length Tuples
Reserved Space – Reserve the maximum space for each tuple
Variable Length
Tuple length in the beginning
End-of-record symbol
Pack the tuples tightly into a page
Update on Variable Length Tuples?
If the new tuple is shorter than the old one – just place it where the old one was
If the new tuple is longer than the old one – delete the old tuple and place the new one at the end of the block, in the free space
Slotted Page
Header slots at the beginning of the block point to tuples stored at the end of the block (see the sketch below)
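A minimal sketch of a slotted page: the slot directory grows from the front, tuple data grows from the back; the 8-byte slot-entry size is an illustrative assumption:

class SlottedPage:
    def __init__(self, size=4096):
        self.size = size
        self.slots = []        # slot i -> (offset, length) of tuple i
        self.free_end = size   # tuples are packed at the end of the page

    def insert(self, tup: bytes):
        # reject if the data would collide with the growing slot directory
        if self.free_end - len(tup) < 8 * (len(self.slots) + 1):
            raise ValueError("page full")
        self.free_end -= len(tup)
        self.slots.append((self.free_end, len(tup)))
        return len(self.slots) - 1   # the slot number is a stable tuple id

Because callers refer to tuples by slot number, a variable-length tuple can be moved inside the page (or deleted) without invalidating outside pointers.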
o Long Tuples
Spanning
Splitting tuples – Split the attributes of tuples into different blocks
o Sequential File – Tuples are ordered by some attributes (search key)
o Sequencing Tuples
Inserting a new tuple
Easy case – One tuple has been deleted in the middle
Insert new tuple into the block
Difficult case – The block is completely full
May shift some tuples into the next block, if there is space in the next block
If there is no space in the next block, use an overflow page
Overflow page
An overflow page may overflow as well
Use pointers to chain additional overflow pages
May slow down performance, because following the chain requires random access
PCTFREE in DBMS
Keeps a percentage of free space in blocks, to reduce the number of overflow pages
Not a SQL standard
Indexing
o Basic idea – Build an “index” on the table
An auxiliary structure to help us locate a record given a “key”
Example: Given a search key (e.g., 40), look up the matching record in the table
o Indexes to learn
Tree-based index
Index sequential file
Dense index vs. sparse index
Primary index (clustering index) vs. Secondary index (non-clustering index)
B+ tree
Hash table
Static hashing
Extensible hashing
o Dense index
For every tuple in the table, create an index entry containing the search-key value and a pointer to the tuple
An index block holds many more entries than a data block holds tuples, because an index entry (key + pointer) is much smaller than the tuple it points to
o Why dense index?
Example:
1,000,000 records (900-bytes/rec)
4-byte search key, 4-byte pointer
4096-byte block
How many blocks for table?
Tuples per block = ⌊block size / tuple size⌋ = ⌊4096 / 900⌋ = 4 tuples
Blocks = records / (tuples per block) = 1,000,000 / 4 = 250,000 blocks
250,000 blocks × 4096 bytes/block ≈ 1GB
How many blocks for index?
Index entries per block = 4096 / 8 = 512
Index blocks = ⌈1,000,000 / 512⌉ = 1,954
1,954 blocks × 4096 bytes/block ≈ 8MB
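The same arithmetic as a short sketch, with the ceilings made explicit:

import math

records, rec_size, entry_size, block = 1_000_000, 900, 8, 4096

tuples_per_block = block // rec_size                    # 4
table_blocks = math.ceil(records / tuples_per_block)    # 250,000 (~1GB)
entries_per_block = block // entry_size                 # 512
index_blocks = math.ceil(records / entries_per_block)   # 1,954 (~8MB)
print(table_blocks * block, index_blocks * block)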
o Sparse index
For every block (rather than every tuple), create an index entry with the search-key value of the block’s first tuple and a pointer to that block (an even smaller index than the dense index)
In the real world this reduces the index size dramatically, because a block may hold many tuples and the sparse index creates only one entry for all of them
o Sparse 2nd level
For every index block, create an index entry which to search on, and a pointer to the index block that it points to (an index on the
index, which further reduces in size)
Can create multiple levels of indexes (multi-level index)
o Terms
Index sequential file (Index Sequential Access Method)
Search key (≠ primary key)
o Hash function
A hash function h maps a search key to a bucket number, e.g., h(‘Susan’) = 7
Properties
Uniformity – entries are distributed uniformly across the buckets
Randomness – even if two keys are very similar, their hash values can be very different
o Why hash table?
Direct access
Saves space – do not need to reserve a slot for every possible key value
o Hashing for DBMS (static hashing)
Search key → h(key), which points to a (key, record) in disk blocks (buckets)
o Record storage
Can store as whole record or store as key and pointer, which points to the record
o Overflow
Size of the table is fixed, thus there is always a chance that a bucket would overflow
Solutions:
Overflow buckets (overflow block chaining) – link to an additional overflow bucket
More widely used
Open probing – go to the next bucket and look for space
Not used very often anymore
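A minimal sketch of static hashing with overflow-block chaining; the bucket capacity and bucket count are illustrative:

class Bucket:
    def __init__(self, capacity=2):
        self.entries = []      # (key, record) pairs
        self.capacity = capacity
        self.overflow = None   # link to an overflow bucket, if any

class StaticHashTable:
    def __init__(self, n_buckets=8):
        self.buckets = [Bucket() for _ in range(n_buckets)]

    def insert(self, key, record):
        b = self.buckets[hash(key) % len(self.buckets)]
        while len(b.entries) >= b.capacity:   # walk the overflow chain
            if b.overflow is None:
                b.overflow = Bucket()         # chain a new overflow bucket
            b = b.overflow
        b.entries.append((key, record))

    def lookup(self, key):
        b = self.buckets[hash(key) % len(self.buckets)]
        while b is not None:                  # may follow several chained buckets
            for k, r in b.entries:
                if k == key:
                    return r
            b = b.overflow
        return None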
o How many empty slots to keep?
50% to 80% of the blocks occupied
If less than 50% used
Waste space
Extra disk look-up time, since more blocks need to be examined
If more than 80% used
Overflow likely to occur
o Major problem of static hashing
How to cope with growth?
Data tends to grow in size
Overflow blocks unavoidable
o Extensible hashing
Two ideas
Use i of the b bits output by the hash function
Use the prefix of the first i bits of the b-bit hash value (i.e., use the first 3 bits of a 5-bit hash value)
To insert, use the first i bits of the key’s hash to traverse the directory (bucket address table) to the appropriate bucket
If there is space in the bucket, insert the record into the bucket
If there is no space and the bucket’s local depth equals the global depth i of the bucket address table
Only one entry in the bucket address table points to the bucket
Double the bucket address table (increasing the global depth i by 1), split the bucket into two (increasing its local depth by 1), and insert
If there is no space and the bucket’s local depth is less than the global depth
More than one entry in the bucket address table points to the bucket
Redirect half of those entries to a new bucket, increase the local depth of both buckets by 1, and insert (see the sketch below)
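A minimal sketch of extensible hashing following the two cases above: the directory is indexed by the first global_i bits of a b-bit hash, and each bucket tracks its own local depth. The bucket size, b, and the use of Python's hash() are illustrative assumptions:

class XBucket:
    def __init__(self, local_i):
        self.local_i = local_i   # this bucket's local depth
        self.items = []

class ExtensibleHash:
    BUCKET_SIZE = 2

    def __init__(self, b=8):
        self.b = b               # total hash bits
        self.global_i = 1        # directory uses the first global_i bits
        self.dir = [XBucket(1), XBucket(1)]

    def _prefix(self, key, bits):
        h = hash(key) & ((1 << self.b) - 1)   # a b-bit hash value
        return h >> (self.b - bits)           # its first `bits` bits

    def insert(self, key):
        bucket = self.dir[self._prefix(key, self.global_i)]
        if len(bucket.items) < self.BUCKET_SIZE:
            bucket.items.append(key)
            return
        if bucket.local_i == self.global_i:
            # case 1: only one directory entry points here, so double the
            # directory first (new entries 2j and 2j+1 share old entry j)
            self.dir = [self.dir[j >> 1] for j in range(2 * len(self.dir))]
            self.global_i += 1
        # case 2: several entries point at the full bucket; split it in two
        bucket.local_i += 1
        sibling = XBucket(bucket.local_i)
        for j in range(len(self.dir)):
            # redirect the half of the entries whose next prefix bit is 1
            if self.dir[j] is bucket and (j >> (self.global_i - bucket.local_i)) & 1:
                self.dir[j] = sibling
        pending, bucket.items = bucket.items, []
        for k in pending + [key]:
            self.insert(k)   # redistribute; may split again in skewed cases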
Join Algorithms
o Givens
10 tuples/block
Number of tuples in R (|R|) = 10,000
Number of blocks for table R (BR) = |R| / (tuples/block) = 10,000 / 10 = 1,000 blocks in R
Number of blocks for table S (BS) = 500
Main memory size (M) = 102 blocks
Join to compute: R ⋈ S
o Merge Join
Steps:
If the tables are already sorted, proceed straight to the merge-and-compare part of the algorithm
If the tables are not sorted, sort each table first using external merge sort, then merge and compare the two tables
To sort:
o Read M blocks into main memory, sort them, then output it to a resultant partition
o Repeat until done
To merge:
o Read the first blocks of the first M partitions (if there are at most M partitions), or of the first M – 1 partitions (if there are more than M partitions, reserving one block for writing output), into main memory
o Merge them, writing output blocks as they fill
o Repeat until done (see the sketch below)
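A minimal sketch of the two-phase external merge sort just described, with blocks simulated as Python lists and M the number of main-memory blocks:

import heapq

def external_merge_sort(blocks, M):
    # Splitting stage: read M blocks at a time, sort them, emit a sorted run
    runs = []
    for i in range(0, len(blocks), M):
        runs.append(sorted(x for b in blocks[i:i + M] for x in b))
    # Merging stage: merge up to M - 1 runs at a time (1 block is reserved
    # for output), repeating until a single sorted run remains
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + M - 1]))
                for i in range(0, len(runs), M - 1)]
    return runs[0] if runs else []

With 1,000 blocks and M = 102 this produces 10 runs in the splitting stage and finishes in one merge pass, matching the example below.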
Example:
Main memory usage
M blocks are used in the splitting stage
In the first pass of the merging stage (while sorting the tables), M blocks are used to store the blocks from the table
From the second pass of the merging stage on, M – 1 blocks are used for input, with one block reserved for the output
Sorting the R table
Number of partitions = ⌈1,000 / 102⌉ = 10
Splitting pass
Resulting in 10 partitions
Merging pass
Resulting in one sorted table
I/O for sorting = number of passes × (2 × number of blocks in the table) = 2 × (2 × 1,000) = 4,000
Sorting the S table
Number of partitions = ⌈500 / 102⌉ = 5
Splitting pass
Resulting in 5 partitions
Merging pass
Resulting in one sorted table
I/O for sorting = number of passes × (2 × number of blocks in the table) = 2 × (2 × 500) = 2,000
Merging (the join itself)
I/O for merging = BR + BS = 1,000 + 500 = 1,500
Total I/O = 4,000 + 2,000 + 1,500 = 7,500
o Index Join
Steps:
Read R blocks
For each R tuple, look up matching tuples using an index on S
For every R tuple, disk I/O = C + J, where C is the index look-up cost and J the cost of fetching the matching tuples (assume that disk blocks are not clustered)
Example:
Given: J = 0.1, BI (number of index blocks) = 90
Load index into main memory (because index will be accessed multiple times, and it is smaller than the main memory size) = BI
= 90
Therefore, the index lookup variable C is 0, because the whole index is in main memory.
For every tuple in R, look up the matching tuples in S = |R| × (C + J)
Read every block of R = BR
Total I/O = BI + (BR + |R| * (C + J)) = 90 + (1,000 + 10,000 (0 + 0.1)) = 2,090
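The cost formula above as a one-line helper; C is the per-lookup index I/O (0 once the index is memory-resident) and J the I/O for fetching matching tuples:

def index_join_io(B_I, B_R, R_tuples, C, J):
    # load index once + scan R + one lookup per R tuple
    return B_I + B_R + R_tuples * (C + J)

print(index_join_io(90, 1_000, 10_000, 0, 0.1))   # 2090.0, as above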
Example:
Given: J = 1, BI = 200
Functional Dependencies and Normalization
o Logical implication
Example: R ( A, B, C, G, H, I )
Functional dependencies:
o A → B
o A → C
o CG → H
o CG → I
o B → H
Conclusions:
o A → BCH, because A → ABCH
o AG → I, because AG → ABCGHI
Canonical cover – a minimal set of functional dependencies equivalent to the original set
Example functional dependencies:
A → BC
B → C
A → B
AB → C
Is A → BC necessary? Is AB → C necessary?
Attribute closures (under the first example’s FDs: A → B, A → C, CG → H, CG → I, B → H):
{A}+ = {A, B, C, H}
{AG}+ = {A, B, C, G, H, I}
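A minimal sketch of the attribute-closure computation behind {A}+ and {AG}+, with FDs represented as (lhs, rhs) pairs of attribute sets:

def closure(attrs, fds):
    # repeatedly apply every FD whose left side is already in the result
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({'A'}, {'B'}), ({'A'}, {'C'}), ({'C', 'G'}, {'H'}),
       ({'C', 'G'}, {'I'}), ({'B'}, {'H'})]
print(closure({'A'}, fds))        # {'A', 'B', 'C', 'H'}
print(closure({'A', 'G'}, fds))   # all of A, B, C, G, H, I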
Example: StudentClass
{sid}+ = sid, name,
Key & functional dependency
Notes:
A key determines a tuple
Functional dependency determines other attributes
X is a key of R if
X → all attributes of R (i.e., X+ = R)
Example: R ( dept, cnum, instructor, office, fax ), with FDs:
o instructor → office
o office → fax
Decomposed tables:
o R1 ( dept, cnum, instructor, office ), further decomposed on instructor → office into:
R3 ( instructor, office )
R4 ( dept, cnum, instructor )
o R2 ( office, fax )
o Boyce-Codd Normal Form (BCNF)
Definition: R is in BCNF with regard to F, iff for every non-trivial X → Y, X contains a key
No redundancy due to FD
Algorithm:
For any R in the schema
If (X → Y holds on R AND
X → Y is non-trivial AND
X does not contain a key), then
1) Compute X+ (X+ : the closure of X)
2) Decompose R into R1 (X+) and R2 (X, Z)
// X becomes the common attributes
// Z: all attributes in R except X+
Repeat until no more decomposition
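A minimal sketch of this BCNF algorithm, reusing closure() from the sketch above; testing keys against the full FD set (rather than FDs projected onto each subschema) is a simplification:

def is_key(X, R, fds):
    return closure(X, fds) >= R

def bcnf(R, fds):
    R = frozenset(R)
    for lhs, rhs in fds:
        X, Y = set(lhs), set(rhs) & R
        # a violating FD: applies to R, non-trivial, X is not a key
        if X <= R and Y and not Y <= X and not is_key(X, R, fds):
            Xplus = closure(X, fds) & R
            R1 = frozenset(Xplus)             # R1(X+)
            R2 = frozenset(X | (R - Xplus))   # R2(X, Z): X plus the rest
            return bcnf(R1, fds) | bcnf(R2, fds)
    return {R}   # no violation left: R is in BCNF

For the instructor/office/fax example above, bcnf() ends with (instructor, office), (dept, cnum, instructor), and (office, fax), matching the decomposition shown earlier.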
Example:
StudentAdvisor ( sid, sname, advisor )
FDs:
o sid → sname
o sid → advisor
Is it BCNF? Yes – sid is a key ({sid}+ = {sid, sname, advisor} = R), and sid appears on the left-hand side of every non-trivial FD
o Dependency-preserving decomposition
FD is a kind of constraint
Checking dependency preserving decomposition
Example: R ( office, fax ), office → fax
A local checking operation: look up office in the table, and make sure the newly inserted fax number is the same
Example: R1 ( A, B ), R2 ( B, C ), A → B, B → C, A → C
Check for each part of the tuple corresponding to each table to make sure that it does not violate any constraints
Do not need to check A → C because it is implied where A → B and B → C
Example: R1 ( A, B ), R2 ( B, C ), A → C
Have to join the two tables together to check whether A → C is violated, so the decomposition is not dependency preserving
BCNF does not guarantee dependency preserving decomposition
Example: R ( street, city, zip ), street, city → zip, zip → city
Use violating FD to split up table
o R1 ( zip, city )
o R2 ( zip, street )
Have to join the two tables together in order to check whether street, city → zip
o Third Normal Form (3NF)
Definition: R is in 3NF with regard to F iff, for every non-trivial X → Y, either
1. X contains a key, or
2. Y is a member of a key
Theorem: There exists a decomposition into 3NF that is dependency preserving
May have redundancy, because of the relaxed condition
o Multivalue dependency (MVD)
Example: Class(cnum, ta, sid). Every TA is for every student
Table:
cnum: 143, TA: tony, james, sid: 100, 101, 103
cnum: 248, TA: tony, susan, sid: 100, 102
cnum ta sid
-------------------------------
143 tony 100
143 tony 101
143 tony 103
143 james 100
143 james 101
143 james 103
248 tony 100
248 tony 102
248 susan 100
248 susan 102
Where does the redundancy come from?
In each class, every TA appears with every student
o For C1, if TA1 appears with S1, TA2 appears with S2, then TA1 also appears with S2
Definition: X →> Y if, for each value of X, the set of Y values that appears with it is independent of the values of the remaining attributes of R
Complementation rule: Given X →> Y, if Z is all attributes in R except (X, Y), then X →> Z
MVD as a generalization of FD
If X → Y, then X →> Y
4NF
Remove redundancies from MVD, FD
Not dependency preserving
BCNF
No redundancies from FD
Not dependency preserving
3NF
May have some redundancies
Dependency preserving.
BCNF may not lead to a unique decomposition when the dependency graph cannot be represented as a tree structure
Transactions and concurrency control
Transaction – Sequence of SQL statements that is considered as a unit
Example:
Transfer $1M from Susan to Jane
S1:
UPDATE Account
SET balance = balance – 1,000,000
WHERE owner = ‘Susan’
S2:
UPDATE Account
SET balance = balance + 1,000,000
WHERE owner = ‘Jane’
Increase Tony’s salary by $100 and by 40%
S1:
UPDATE Employee
SET salary = salary + 100
WHERE name = ‘Tony’
S2:
UPDATE Employee
SET salary = salary * 1.4
WHERE name = ‘Tony’
Transactions and ACID property
ACID property
Atomicity: “ALL-OR-NOTHING”
o Either ALL OR NONE of the operations in a transaction is executed.
o If the system crashes in the middle of a transaction, all changes by the transaction are "undone" during recovery.
Consistency: If the database is in a consistent state before a transaction, the database is in a consistent state after the
transaction
Isolation: Even if multiple transactions are executed concurrently, the result is the same as executing them in some sequential
order
o Each transaction is unaware of (is isolated from) other transaction running concurrently in the system
Durability
o If a transaction committed, all its changes remain permanently even after system crash
With AUTOCOMMIT mode OFF
Transaction implicitly begins when any data in DB is read or written
All subsequent read/write is considered to be part of the same transaction
A transaction finishes when COMMIT or ROLLBACK statement is executed
o COMMIT: All changes made by the transaction is stored permanently
o ROLLBACK: Undo all changes made by the transaction
With AUTOCOMMIT mode ON
Every SQL statement becomes its own transaction and is committed automatically
Serializable schedule
Example in handout
Schedule A
o T1
Read(A); A ← A + 100;
Write(A);
Read(B); B ← B + 100;
Write(B);
o T2
Read(A); A ← A x 2;
Write(A);
Read(B); B ← B x 2;
Write(B);
o Result = 250 vs. 250 (assuming A = B = 25 initially), database is still in a consistent state
Schedule B (switch the order that the transactions are executed)
o T2
Read(A); A ← A x 2;
Write(A);
Read(B); B ← B x 2;
Write(B);
o T1
Read(A); A ← A + 100;
Write(A);
Read(B); B ← B + 100;
Write(B);
o Result = 150 vs. 150, database is still in a consistent state
It is the job of the application to make sure that the transactions get to the database in the correct order
Schedule C (inter-mingled statements)
o T1
Read(A); A ← A + 100;
Write(A);
o T2
Read(A); A ← A x 2;
Write(A);
o T1
Read(B); B ← B + 100;
Write(B);
o T2
Read(B); B ← B x 2;
Write(B);
o Result = 250 vs. 250, database is still in a consistent state
Schedule D (inter-mingled statements)
o T1
Read(A); A ← A + 100;
Write(A);
o T2
Read(A); A ← A x 2;
Write(A);
Read(B); B ← B x 2;
Write(B);
o T1
Read(B); B ← B + 100;
Write(B);
o Result = 250 vs. 150, database is NOT in a consistent state
Schedule E (inter-mingled statements)
o T1
Read(A); A ← A + 100;
Write(A);
o T2
Read(A); A ← A x 1;
Write(A);
Read(B); B ← B x 1;
Write(B);
o T1
Read(B); B ← B + 100;
Write(B);
o Result = 125 vs. 125 (with A = B = 25 initially), database is still in a consistent state
Simplifying assumption
The "validity" of a schedule may depend on the initial state and the particular actions that transactions take
o It is difficult to consider all transaction semantics
We want to identify "valid" schedules that give us the "consistent" state regardless of
o 1) the initial state
o 2) transaction semantics
We only look at database read and write operation and check whether a particular schedule is valid or not.
o Read/write: input/output from/to database
o The only operations that can screw up the database
o Much simpler than analyzing the application semantics
Notation
Sa = r1(A) w1(A) r1(B) w1(B) r2(A) w2(A) r2(B) w2(B)
o Subscript 1 means transaction 1
o r(A) means read A
o w(A) means write to A
Schedule A: Sa = r1(A) w1(A) r1(B) w1(B) r2(A) w2(A) r2(B) w2(B)
o SERIAL SCHEDULE: all operations are performed without any interleaving
Schedule C: Sc = r1(A) w1(A) r2(A) w2(A) r1(B) w1(B) r2(B) w2(B)
o COMMENTS: Sc is good because Sc is "equivalent" to a serial schedule
Schedule D: Sd = r1(A) w1(A) r2(A) w2(A) r2(B) w2(B) r1(B) w1(B)
o Dependency in the schedule
w1(A) and r2(A): T1 -> T2
w2(B) and r1(B): T2 -> T1
o Cycle. T1 should precede T2 and T2 should precede T1
o Cannot be rearranged into a serial schedule
o Is not "equivalent" to any serial schedule
Conflicting actions: A pair of actions that may give different results if swapped
Conflict equivalence: S1 is conflict equivalent to S2 if S1 can be rearranged into S2 by a series of swaps of non-conflicting
actions
Conflict serializability: S1 is conflict serializable if it is conflict equivalent to some serial schedule
A “good” schedule
Precedence graph P(S)
Nodes: transactions in S
Edges: Ti → Tj if
o 1) pi(A), qj(A) are actions in S
o 2) pi(A) precedes qj(A)
o 3) At least one of pi, qj is a write
P(S) is acyclic ⇔ S is conflict serializable
Summary:
Good schedule: conflict serializable schedule
Conflict serializable <=> acyclic precedence graph
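A minimal sketch that builds the precedence graph from a schedule of (transaction, operation, item) triples and tests conflict serializability via cycle detection:

def conflict_serializable(schedule):
    nodes = {t for t, _, _ in schedule}
    edges = set()
    for i, (ti, oi, x) in enumerate(schedule):
        for tj, oj, y in schedule[i + 1:]:
            if ti != tj and x == y and 'w' in (oi, oj):
                edges.add((ti, tj))   # Ti must precede Tj
    # Kahn-style check: the graph is acyclic iff every node can be removed
    while nodes:
        source = next((n for n in nodes
                       if not any(u in nodes and v == n for u, v in edges)), None)
        if source is None:
            return False              # a cycle: not conflict serializable
        nodes.remove(source)
    return True

sd = [(1, 'r', 'A'), (1, 'w', 'A'), (2, 'r', 'A'), (2, 'w', 'A'),
      (2, 'r', 'B'), (2, 'w', 'B'), (1, 'r', 'B'), (1, 'w', 'B')]
print(conflict_serializable(sd))   # False: Schedule D has the cycle T1 -> T2 -> T1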
Recoverable/cascadeless schedule
Recoverable schedule: Schedule S is RECOVERABLE if, whenever Tj reads a data item written by Ti, the COMMIT operation of Ti appears before the COMMIT operation of Tj
Cascading rollback: a single transaction abort leads to a series of transaction rollbacks
Cascadeless schedule: a schedule that avoids cascading rollback – every transaction reads only data written by already-committed transactions
o Transaction
Sequence of SQL statements that is considered as a unit
Motivation
Crash recovery
Concurrency
Main questions:
What execution orders are "valid"?
o We first need to understand what execution orders are okay
Serializability, Recoverability, Cascading rollback
How can we allow only "valid" execution order?
o Concurrency control mechanism
Serializable and conflict serializable schedules
Serial schedule – Definition: all operations are performed without any interleaving
Is r1(A) w1(A) r2(A) w2(A) r1(B) w1(B) r2(B) w2(B) a serial schedule?
o No – r1(B) w1(B) of transaction 1 comes after r2(A) w2(A) of transaction 2, so the two transactions interleave (the schedule is, however, conflict serializable; it is Schedule C above)
Dependencies in the schedule
Example: r1(A) w1(A) r2(A) w2(A) r2(B) w2(B) r1(B) w1(B)
o w1(A) and r2(A): T1 → T2
o w2(B) and r1(B): T2 → T1 (a cycle, so not conflict serializable)
Serial → recoverable
Serial → cascadeless
Serial guarantees read/write/commit actions of one transaction are all grouped together, thus transactions are cascadeless
Cascadeless → recoverable