ER Model Basics
[Figure: Employees entity set with attributes ssn, name, lot.]
Entity: Real-world object distinguishable from other objects. An entity is described (in DB) using a set of attributes. Entity Set: A collection of similar entities. E.g., all employees.
All entities in an entity set have the same set of attributes. (Until we consider ISA hierarchies, anyway!) Each entity set has a key. Each attribute has a domain.
[Figure: Reports_To relationship set on Employees (name, lot), with roles supervisor and subordinate.]
Relationship: Association among two or more entities. E.g., Attishoo works in Pharmacy department. Relationship Set: Collection of similar relationships. An n-ary relationship set R relates n entity sets E1 ... En; each relationship in R involves entities e1, ..., en. Same entity set could participate in different relationship sets, or in different roles in same set.
Cartesian or Cross-Products
A tuple <a1, a2, ..., an> is just a list with n elements in order. A binary tuple <a,b> is called an ordered pair. Given two sets A, B, we can form a new set A x B containing all ordered pairs <a,b> such that a is a member of A and b is a member of B. In set notation: A x B = {<a,b> | a in A, b in B}. Example: {1,2,3} x {x,y} = {<1,x>,<1,y>,<2,x>,<2,y>,<3,x>,<3,y>}
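The cross-product definition above can be checked directly in code. The following is a minimal sketch in Python; the variable names and the sample relationship set `works_in` are illustrative assumptions, not part of the slides.

```python
from itertools import product

A = {1, 2, 3}
B = {"x", "y"}

# A x B = {<a,b> | a in A, b in B}, written as a set comprehension.
cross = {(a, b) for a in A for b in B}

# itertools.product enumerates the same ordered pairs.
assert cross == set(product(A, B))

# The cross product has |A| * |B| elements.
assert len(cross) == len(A) * len(B)

# Hypothetical relationship set: any set of pairs drawn from A and B
# is a subset of the cross product.
works_in = {(1, "x"), (3, "y")}
assert works_in <= cross
```

Python tuples are ordered, so `(1, "x")` and `("x", 1)` are different pairs, which matches the ordered-pair definition.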
[Figure: ER diagram; surviving labels: since, dname, lot, did, budget.]
Consider Works_In: An employee can work in many departments; a dept can have many employees. In contrast, each dept has at most one manager, according to the key constraint on Manages.
[Figure: Employees, Manages, Departments; relationship types illustrated: 1-to-1, 1-to-many, many-to-1, many-to-many.]
Participation Constraints
Does every department have a manager? If so, this is a participation constraint: the participation of Departments in Manages is said to be total (vs. partial). Every did value in Departments table must appear in a tuple of the Manages relation.
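Both the key constraint on Manages and the total participation of Departments can be phrased as simple checks over tuples. The sketch below assumes Manages tuples of the form (ssn, did, since); the sample rows and function names are hypothetical.

```python
# Hypothetical sample data: Departments keyed by did, Manages as (ssn, did, since).
departments = {51: "Pharmacy", 56: "Toys", 60: "Hardware"}
manages = [("123-22-3666", 51, 1996), ("231-31-5368", 56, 1998)]


def satisfies_key_constraint(manages):
    """Key constraint: each did appears in at most one Manages tuple."""
    dids = [did for _, did, _ in manages]
    return len(dids) == len(set(dids))


def unmanaged_departments(departments, manages):
    """Total participation is violated by every did missing from Manages."""
    managed = {did for _, did, _ in manages}
    return sorted(set(departments) - managed)


key_ok = satisfies_key_constraint(manages)
missing = unmanaged_departments(departments, manages)
print(key_ok, missing)
```

With this sample data the key constraint holds, but department 60 has no manager, so participation of Departments in Manages is only partial.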
[Figure: Employees (name, ssn, lot) and Departments (dname, did, budget), related by Manages (since) and Works_In (since).]
Weak Entities
A weak entity can be identified uniquely only by considering the primary key of another (owner) entity. Owner entity set and weak entity set must participate in a one-to-many relationship set (one owner, many weak entities). Weak entity set must have total participation in this identifying relationship set.
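One way to see why a weak entity needs its owner's key is to index the same records two ways. This is a minimal sketch that assumes Dependents is the weak entity set, Employees is the owner, and pname is unique only among one employee's dependents; the class and sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependent:
    owner_ssn: str  # primary key of the owning Employees entity
    pname: str      # assumed unique only within one employee's dependents
    age: int

    @property
    def key(self):
        # Full key of the weak entity: owner key plus its own partial key.
        return (self.owner_ssn, self.pname)


dependents = [
    Dependent("123-22-3666", "Joe", 7),
    Dependent("231-31-5368", "Joe", 4),  # same pname, different owner
]

by_pname = {d.pname: d for d in dependents}  # collides on "Joe"
by_key = {d.key: d for d in dependents}      # keeps both entities apart
```

Indexing by pname alone loses one of the two dependents, while the combined key identifies each one.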
[Figure: Employees (name, ssn, lot), Policy (cost), Dependents (pname, age).]
[Figure: ISA hierarchy with Employees (name, ssn, lot) and subclasses Hourly_Emps (hourly_wages) and Contract_Emps.]
As in C++, or other PLs, attributes are inherited. If we declare A ISA B, every A entity is also considered to be a B entity.
Overlap constraints: Can Joe be an Hourly_Emps as well as a Contract_Emps entity? (Allowed/disallowed) Covering constraints: Does every Employees entity also have to be an Hourly_Emps or a Contract_Emps entity? (Yes/no) Reasons for using ISA: To add descriptive attributes specific to a subclass. To identify entities that participate in a relationship.
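The ISA idea and the two constraint questions can be illustrated with ordinary class inheritance and set operations. This is only a sketch: the `contractid` attribute and the sample ssn values are assumptions made for illustration.

```python
class Employees:
    def __init__(self, ssn, name, lot):
        self.ssn, self.name, self.lot = ssn, name, lot


class HourlyEmps(Employees):  # Hourly_Emps ISA Employees
    def __init__(self, ssn, name, lot, hourly_wages):
        super().__init__(ssn, name, lot)  # inherited attributes
        self.hourly_wages = hourly_wages  # subclass-specific attribute


class ContractEmps(Employees):  # Contract_Emps ISA Employees
    def __init__(self, ssn, name, lot, contractid):
        super().__init__(ssn, name, lot)
        self.contractid = contractid  # hypothetical attribute


joe = HourlyEmps("111", "Joe", 12, hourly_wages=15.0)

# Entity sets represented by their keys (hypothetical ssn values).
employees = {"111", "222", "333", "444"}
hourly = {"111", "222"}
contract = {"222", "333"}

overlap = hourly & contract                  # non-empty only if overlap is allowed
uncovered = employees - (hourly | contract)  # non-empty only if covering is not required
```

Here employee "222" is in both subclasses and "444" is in neither, so this data is legal only when overlap is allowed and covering is not required.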
[Figure: Employees entity set with attributes name, ssn, lot.]
Aggregation
Used when we have to model a relationship involving (entity sets and) a relationship set. Aggregation allows us to treat a relationship set as an entity set for purposes of participation in (other) relationships.
[Figure: aggregation example with Monitors (until), Sponsors (since), Projects (pid, started_on, pbudget), and Departments (did, dname, budget).]
Aggregation vs. ternary relationship: Monitors is a distinct relationship with a descriptive attribute (until). Also, we can say that each sponsorship is monitored by at most one employee.
Should a concept be modeled as an entity or an attribute? Should a concept be modeled as an entity or a relationship? Identifying relationships: Binary or ternary? Aggregation?
[Figure: alternative designs; surviving labels: Employees, Departments (did, dname, budget), Duration (from, to), dbudget.]
If each policy is owned by just one employee, and each dependent is tied to the covering policy, the first diagram is inaccurate. What are the additional constraints in the second diagram?
[Figure: "Bad design" vs. "Better design", with Dependents, Beneficiary, and Policies (policyid, cost).]
Summary of ER (Contd.)
Several kinds of integrity constraints can be expressed in the ER model: key constraints, participation constraints, and overlap/covering constraints for ISA hierarchies. Some constraints (notably, functional dependencies) cannot be expressed in the ER model. (e.g., z = x + y) Constraints play an important role in determining the best database design for an enterprise.
ER design is subjective. There are often many ways to model a given scenario! Analyzing alternatives can be tricky, especially for a large enterprise. Common choices include: Entity vs. attribute, entity vs. relationship, binary or n-ary relationship, whether or not to use ISA hierarchies, and whether or not to use aggregation. Ensuring good database design: resulting relational schema should be analyzed and refined further. FD information and normalization techniques are especially useful.
Title: Operating System Support for Database Management
Author: Michael Stonebraker
Pages: 217-223
Problem Definition
Apparent disconnect between DBMS performance goals and operating system design and implementation. Services provided by the OS are inadequate and suboptimal. The paper evaluates the following services: buffer pool management, file system, interprocess communication, consistency control, and paged virtual memory.
Contributions
Demonstrates OS services are too slow or inappropriate for DBMS tasks.
Attempts to make OS designers aware of and more sensitive to DBMS needs.
Key Concepts
Buffer Pool Management
OS has a fixed buffer pool that handles all I/O.
UNIX uses an LRU replacement strategy, which may not be ideal for a DBMS.
Large performance overhead to pull a block into the buffer: approx. 5000 instructions for 512 bytes.
No good prefetch strategy.
UNIX does not implement a selected force-out buffer manager, where the DBMS can dictate the order of the commits.
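One access pattern often cited as a poor fit for LRU is a repeated sequential scan over slightly more blocks than the pool can hold. The sketch below is a toy simulation under that assumption, not the UNIX buffer cache; the class name and sizes are hypothetical.

```python
from collections import OrderedDict


class LRUBufferPool:
    """Toy buffer pool that evicts the least recently used block."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.frames = OrderedDict()  # block id -> None, oldest first
        self.misses = 0

    def access(self, block):
        if block in self.frames:
            self.frames.move_to_end(block)  # mark as most recently used
            return
        self.misses += 1                    # block must be read from disk
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)  # evict least recently used
        self.frames[block] = None


pool = LRUBufferPool(capacity=3)
for _ in range(5):          # five passes over the same file
    for block in range(4):  # file is one block larger than the pool
        pool.access(block)

print(pool.misses)
```

In this setup LRU always evicts the block that the scan will need next, so every access misses. A DBMS that knows the scan is cyclic could pick a different victim, which is the kind of control the slide says the OS does not offer.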
The File System
UNIX implements its file system as character arrays and forces the DBMS to implement its own higher-level objects.
Tree-structured file systems: UNIX implements two services using trees:
Keeping track of blocks in a given file
Hierarchical directory structure
The DBMS adds a third tree to support keyed access. A single tree holding all three kinds of information would be more efficient.
Scheduling, Process Management, and Interprocess Communication
Performance: Task switches are inevitable. Processes have a great deal of state information, making task switches expensive.
Critical sections: The buffer pool is a shared data segment. Problems arise if the OS deschedules a DB process holding a lock on the buffer pool.
Server model: The OS needs to provide a message facility for multiple processes to message a single process. The server must do its own scheduling and multitasking.
Consistency Control
Many operating systems can only place locks at the file level; a DBMS prefers finer granularity. When a DBMS implements its own buffer pool, crash recovery by the operating system would be impossible.
Paged Virtual Memory
Large files may not fit in memory. Binding chunks of the file into user space may incur a performance loss.
Validation
Content is mostly informational, based on previous papers and existing implementations of current systems.
Examples are cited primarily from the UNIX OS and the Ingres DBMS.
Issues could be biased and may not be common or applicable to all OS and DBMS combinations.
Assumptions
Presents the topic as one that is applicable across a number of DBMS and OS combinations. Author constrains his examples to UNIX and Ingres. Paper was written in 1981. Operating systems have advanced considerably since then, so his points may no longer be applicable.
The R* Tree: An Efficient and Robust Access Method for Points and Rectangles
Problem
Problem Statement Why is this problem important? Why is this problem hard?
Approaches
Approach description, key concepts Contributions (novelty, improved) Assumptions
Novelty of Contribution
Related Work
Traditional one-dimensional indexing structures (e.g., hash, B-tree) are not appropriate for multidimensional range search.
B+ tree: Represents sorted data in a way that allows for efficient insertion and removal of elements. Dynamic, multilevel index with maximum and minimum bounds on the number of keys in each node. Leaf nodes are linked together as a linked list to make range queries easy.
Related Work
R-tree: The R-tree is a foundation for spatial access methods. A complex spatial object is represented by its minimum bounding rectangle while preserving essential geometric properties. Regions may overlap. Heuristic: minimize the area of each enclosing rectangle in the inner nodes.
Principles of R-tree
Height-balanced tree similar to a B-tree with index records in its leaf nodes containing pointers to data objects. Heuristic Optimization: minimize the area of each enclosing rectangle in the inner nodes.
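A common way to act on the area heuristic during insertion is to descend into the entry whose rectangle needs the least area enlargement to include the new object, breaking ties by smaller area. The sketch below assumes axis-aligned 2-D rectangles stored as (x1, y1, x2, y2); the function names and sample rectangles are hypothetical and this is not a full R-tree.

```python
def area(r):
    """Area of an axis-aligned rectangle (x1, y1, x2, y2)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def union(r, s):
    """Minimum bounding rectangle enclosing both r and s."""
    return (min(r[0], s[0]), min(r[1], s[1]),
            max(r[2], s[2]), max(r[3], s[3]))


def enlargement(r, new):
    """Extra area needed for r to also enclose new."""
    return area(union(r, new)) - area(r)


def choose_entry(entries, new):
    """Index of the entry needing least enlargement (ties: smaller area)."""
    return min(range(len(entries)),
               key=lambda i: (enlargement(entries[i], new), area(entries[i])))


entries = [(0, 0, 2, 2), (5, 5, 9, 9)]
new_rect = (2, 2, 3, 3)
chosen = choose_entry(entries, new_rect)
print(chosen, enlargement(entries[chosen], new_rect))
```

In the example the first entry grows from area 4 to area 9, while the second would grow from 16 to 49, so the first entry is chosen.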
Reference: A. Guttman, "R-trees: A Dynamic Index Structure for Spatial Searching," 1984.
[Figure: Example grouping of rectangles R1 to R5, contrasting the grouping preferred by the R-tree with the grouping preferred by the R*-tree.]
Validation Methodology
Methodology
Experiments with simulated workloads
Evaluation of design decisions
Results
R*-tree outperforms variants of R-tree and 2-level grid file. R*-tree is robust against non-uniform data distributions.
Summary
Paper's focus
R*-tree implementations and performance
Ideas
Heuristic Optimizations (p. 208)
Reduction of area, margin, and overlap of the directory rectangles
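The three quantities named above are straightforward to compute for axis-aligned rectangles. The sketch below assumes 2-D rectangles stored as (x1, y1, x2, y2) and takes the margin to be the perimeter; the function names and sample rectangles are hypothetical.

```python
def area(r):
    """Area of rectangle (x1, y1, x2, y2)."""
    return (r[2] - r[0]) * (r[3] - r[1])


def margin(r):
    """Sum of the edge lengths of the rectangle (its perimeter in 2-D)."""
    return 2 * ((r[2] - r[0]) + (r[3] - r[1]))


def overlap(r, s):
    """Area of the intersection of r and s, or 0 if they are disjoint."""
    width = min(r[2], s[2]) - max(r[0], s[0])
    height = min(r[3], s[3]) - max(r[1], s[1])
    return max(width, 0) * max(height, 0)


r = (0, 0, 4, 2)
s = (3, 1, 6, 5)
print(area(r), margin(r), overlap(r, s))
```

For a fixed area, a smaller margin means a more square-like rectangle, and a smaller overlap between directory rectangles means fewer subtrees have to be searched for a given query.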