You are on page 1of 17

Database Modeling

Spring Valley Tech Corp ∙ Java SMP+ Teacher Training ∙ Jorge Cosgayon
Relational DBs and OLTP

 On-line Transactional Processing


 High throughput, simple queries
 INSERT/UPDATE intensive
 Used concurrently by many users
 Key goals:
 Integrity
 Availability
 Speed of transactions
 Concurrency
 Recoverability
Data Integrity

 The accuracy and consistency of data stored in a data store


 The database integrity is maintained in the relational model using primary and foreign keys
 Reliability perspective
 Data does not represent conflicting information
 Process perspective
 Data has remained unaltered in transit from creation to reception
 State perspective
 A measure of the validity and fidelity of a data object
 Security perspective
 Maintains information exactly as it was inputted
Codd’s Rules

1. The Information Rule. All information in a relational database is represented explicitly at the logical level
and in exactly one way: by values in tables
2. Guaranteed Access Rule. Each and every datum (atomic value) in a relational database is guaranteed to be
logically accessible by resorting to a combination of table name, primary key value, and column name
3. Systematic Treatment of Null Values. Null values (distinct from the empty character string or a string of
blank characters and distinct from zero or any other number) are supported in fully relational DBMS for
representing missing information and inapplicable information in a systematic way, independent of data
type.
Codd’s Rules

4. Dynamic On-line Catalog Based on the Relational Model. The database description is represented at the
logical level in the same way as ordinary data, so that authorized users can apply the same relational
language to its interrogation as they apply to the regular data
5. Comprehensive Data Sublanguage Rule. A relational system may support several languages and various
modes of terminal use. However, there must be at least one language whose statements are expressible, per
some well-defined syntax, as character strings and whose ability to support all the following is
comprehensive: data definition, view definition, data manipulation (interactive and by program), integrity
constraints, and transaction boundaries
Codd’s Rules

6. View Updating Rule. All views that are theoretically updatable are also updatable by the system
7. High-level Insert, Update, and Delete. The capacity of handling a base relation or a derived relation as a
single operand applies not only to the retrieval of data but also to the insertion, update, and deletion of data
8. Physical Data Independence. Application programs and terminal activities remain logically unimpaired
whenever any changes are made in either storage representations or access methods
Codd’s Rules

9. Logical Data Independence. Application programs and terminal activities remain logically unimpaired
whenever information-preserving changes of any kind that theoretically permit unimpairment are made to
the base tables
10. Integrity Independence. Integrity constraints specific to a particular relational database must be definable in
a relational data sublanguage and storable in the catalog, not in the application programs
11. Distribution Independence. Distribution independence implies that users should not have to be aware of
whether a database is distributed
12. Nonsubversion Rule. If a relational system has a low-level (single-record-at-a-time) language, that low level
cannot be used to subvert or bypass the integrity rules and constraints expressed in the higher level relational
language (multiple-records-at-a-time)
Key Features of a Relational Model

 Each row in the table is called tuple.


 Each column in the table is called attribute.
 The intersection of row with the column will have data value.
 In relational model rows can be in any order.
 In relational model attributes can be in any order.
 By definition, all rows in a relation are distinct. No two rows can be exactly the same.
 Relations must have a key. Keys can be a set of attributes.
 For each column of a table there is a set of possible values called its domain. The domain contains all
possible values that can appear under that column.
Key Features (cont.)

 Domain is the set of valid values for an attribute.


 Degree of the relation is the number of attributes (columns) in the relation.
 Cardinality of the relation is the number of tuples (rows) in the relation.
The Concept of Keys

 A key is an attribute or group of attributes, which is used to identify a row in a relation. Key can be broadly classified into (1)
Superkey (2) Candidate key, and (3) Primary key
 Superkey
 A superkey is a subset of attributes of an entity-set that uniquely identifies the entities.
 Superkeys represent a constraint that prevents two entities from ever having the same value for those attributes.
 Candidate Key
 Candidate key is a minimal superkey. A candidate key for a relation schema is a minimal set of attributes whose values uniquely identify
tuples in the corresponding relation. Composed of prime attributes.
 Primary Key
 The primary key is a designated candidate key. It is to be noted that the primary key should not be null.
 Foreign Key
 Foreign key is set of fields or attributes in one relation that is used to ―refer‖ to a tuple in another relation.
How to find the candidate keys

 Given: R(a, b, c, d, e, f)
 A → C, C → D, D → B, E → F
 C, D, B, F can be determined
 Those who can’t be determined must be in the candidate key
 A, E
 Find attributes that can be determined by the set AE
 (AE)+ = AECDBF
 If all attributes can be determined by that set, it is the candidate key
Normalization

 A process applied to a relation in order to conform to Codd’s rules


 Allows the data to be queried and manipulated using a language grounded in first-order logic (predicate
logic)
 Prevents side-effects (anomalies)
 Update, Insertion, and Deletion anomalies
Overview

 Student information database:


 Student info
 Student ID, course(s), major(s), etc
 Phone numbers, emails, addresses, etc
 Personal info: name, gender, etc
 Classes taken
 Scores for assignments, projects, quizzes, exams, etc
First Normal Form

 A relation is in first normal form if and only if the domain of each attribute contains only atomic (indivisible)
values, and the value of each attribute contains only a single value from that domain.
 Steps:
 Eliminate repeating groups in individual tables.
 Create a separate table for each set of related data.
 Identify each set of related data with a primary key
Second Normal Form

 A relation is in 2NF if it is in 1NF and non-key attributes are functionally dependent on the primary key.
 Nonprime attributes cannot be dependent on prime attributes
 Functional dependency is a relationship that exists when one attribute uniquely determines another attribute.
 A functional dependency is trivial if Y is a subset of X. In a table with attributes of employee name and
Social Security number (SSN), employee name is functionally dependent on SSN because the SSN is unique
for individual names. An SSN identifies the employee specifically, but an employee name cannot distinguish
the SSN because more than one employee could have the same name.
Third Normal Form

 A relation is in 3NF if it is in 2NF and has no transitive dependencies


 Nonprime attributes cannot be dependent on other nonprime attributes
 A transitive dependency is a functional dependency which holds by virtue of transitivity. A transitive
dependency can occur only in a relation that has three or more attributes.
  Let A, B, and C designate three distinct attributes (or distinct collections of attributes) in the relation. Suppose
all three of the following conditions hold:
 A→B
 It is not the case that B → A
 B→C
 Then the functional dependency A → C is a transitive dependency.
Boyce-Codd Normal Form (BCNF)

 A stronger version of 3NF


 A relation is in BCNF if it is in 3NF and contains no trivial dependencies
 For each functional dependency such that X → Y, X should be a super key
 Prime attribute cannot be dependent on another attribute
 Ex: R(A, B, C) where AB → C, C → B
 A must be part of the candidate key because no other attribute can find it
 (AB)+ and (AC)+ are our candidate keys
 R(A,B,C) is in 3NF, but not BCNF because C is not a superkey (does not determine all other keys)

You might also like