You are on page 1of 150

Data Modeling and Database Design

Minder Chen, Ph.D. Team


Team number
mchen@gmu.edu Specialty
Customer
Customer number
member Customer name
is a member of
Customer address
Customer activity
Customer telephone
Employee Customer fax
Division
Employee number
Division number
belongs to First name
Division name
Last name subcontract
Division address
Employee function
Employee salary staffed by

is assigned to Project
Project number
Project name
Task Project label
Task name contains Start date
Task cost End date
Data Modeling and Database Design Course Outline
• INTRODUCTION
– Introduction to Data Modeling
– Database Development Life Cycle Overview
• ENTITY AND RELATIONSHIP
– Develop the Subject Area Diagram
– Develop Preliminary Data Model: Entity & Relationship
Identification
• ATTRIBUTES AND SUBTYPES
– Attributes Identification and Definition
– Develop Fully Attributed Data Model
– Identifiers
– Data Modeling Exercise
– Partitioning and Entity Subtypes
• NORMALIZATION
– Normalization
– Normalization Exercise
– De-normalization
• DATA MODEL EVALUATION AND MAPPING TO RELATIONAL DBMS

– Refine a Data Model: Analysis and Simplification


– Transform to Physical Data Base Design
• PowerDesigner: Data Architect
• Pysical DB Design and Data Warehouse DB Design
© Minder Chen, 1993~2002 Data Modeling - 2 -
References
Data Modeling and Database Design
1. Batini, Ceri, Navathe, Conceptual Database Design, Redwood City, CA: The
Benjamin/Cummings Publishing Company, Inc., 1992.
2. Teorey, T. J., Database Modeling and Design: The Entity-Relationship Approach,
Morgan Kaufmann Publishers, Inc., 1990.
3. Thomas A. Bruce, Designing Quality Databases with IDEF1X Information Models,
Dorset House Publishing, NY: New York, 1991.
4. Texas Instruments, A Guide to IE Using IEF, 2nd edition, Part No. 2739756-0001,
1990.
5. Martin, James, Information Engineering Book II: Planning and Analysis, Prentice-
Hall Inc., 1989.
6. Dave Ensor, Ian Stevenson, Oracle Design, O'Reilly & Associates, 1997
7. Rob Gillette, etc., Physical Database Design for Sybase SQL Server, Prentice
Hall, 1995.
8. Ralph Kimball, The Data Warehouse Toolkit, Wiley, 1996.

JAD References
1. August, J. H.. Joint Application Design: The Group Session Approach to System
Design. Englewood Cliffs, NY, Prentice Hall, Inc., 1991.
2. Wood, J. and Silver, D. Joint Application Design: How to Design Quality Systems
in 40% Less Time. New York, NY, John Wiley & Sons, 1989.
3. Andrews, D. C. and Leventhal, N. S., Fusion: Integrating IE, CASE, and JAD: A
Handbook for Reengineering the Systems Organization, Englewood Cliffs, NJ:
Yourdon Press, 1993.
© Minder Chen, 1993~2002 Data Modeling - 3 -
Data Modeling and Database Design: INTRODUCTION

• Systems Development Life Cycle (SDLC)


in a Client/Server Environment
• Introduction to Data Modeling
• Database Development Life Cycle
Overview

© Minder Chen, 1993~2002 Data Modeling - 4 -


Rationales for Data Modeling
• Data is the foundation of modern information
systems enabled by data base technologies.
• Data in an organization exist and can be described
independently of how these data are used.
• Data should be managed as a corporate-wide
resource.
• The types of data used in an organization do not
change very much.
• Data have certain inherent properties which lead to
correct structuring.
• If we structure data according to their inherent
properties, the structure (i.e., data models) will be
stable.

© Minder Chen, 1993~2002 Data Modeling - 5 -


History of Data Modeling
• Importance of Entity-Relationship Modeling Technique
– Database
– Data modeling and enterprise-wide data
– Data quality
– Data updating and accessing tools and procedure
– Data sharing culture
• ER modeling technique was first developed by Peter Chen in
1976
– A conceptual/logical data modeling tool
– A user-oriented approach
– A graphic-based method
• ER modeling technique is the major data modeling method in
Information Engineering and is widely supported by most of
CASE tools.
• Data modeling is the foundation of most database-centered
transaction processing systems and data warehouse systems

© Minder Chen, 1993~2002 Data Modeling - 6 -


CSC Development Strategies
HIGH
HIGH
• RE-CREATE new business process & systems from scratch

• RE-ENGINEER business process & systems

Risk • RE-DESIGN current systems


Long Term Reward
Short Term Costs • RE-HOST current systems
Degree of Change
• RE-IMAGE current systems

LOW
LOW
© Minder Chen, 1993~2002 Data Modeling - 7 -
Distribution of Business Function (Logic)
Presentation
Space
Presentation Presentation Function Data Data Data
Service Logic Logic Logic Service
Space

Client Server
• • Functions that access data
Presentation logic
• Local input validation on the server
• • Functions that need input
Output production logic
• Local peripheral drivers from multiple users
• • Functions that coordinate
Performance critical processing
the work of several user
Issues:
• Distribution of data
• Platform-specific capabilities and interoperability
• Connectivity capabilities/platform
• Frequency of change to codes
• Configuration management
© Minder Chen, 1993~2002 Data Modeling - 8 -
C/S Development Methodology
SDLC rules=> performance =>
Conceptual Logical Physical
C/S Analysis Design Design
Architecture

User Work Form Forms,


Interface Flow Sequences Screens

Application Process Object Programs,


Logic Flow Interaction Procedures
Model

Information Data Database Tables,


& Data Base Model Schema Indexes

Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.


© Minder Chen, 1993~2002 Data Modeling - 9 -
Client/Server Application Development Methodology
Wh
Sta ere D
rt ? oY
ou
Requirements

Information Processes Workflow


Architecture
& Data Base Behavior User Interface

Application
Design and
Development

Source: David Vaskevitch, Client/Server Strategies, IDG Books, 1993.

© Minder Chen, 1993~2002 Data Modeling - 10 -


Data Modeling (Data Base Design) Process
Information Requirements
A conceptual DB schema is a high-
Conceptual level description of the database,
DB Design independent of the particular DBMS.

Conceptual (Enterprise) DB Schema


A logical DB schema is a description of
Logical
the structure of the database that can be
DB Design processed by a DBMS: relational, network,
or hierarchical.
Logical DB Schema
A physical DB schema is a description of the
Physical implementation of the database in external
memory; it describes the storage structures
DB Design and access methods used in order to
effectively access and maintain data.

Physical DB Schema
Source: Batini, C., Ceri, S., and Navathe, S. B., Conceptual Database Design: An Entity-
Relationship Approach, The Benjamin/Cummings Publishing Company, Inc., 1992.
© Minder Chen, 1993~2002 Data Modeling - 11 -
Multiple Perspectives
We use We do
this data ONE these things
BUSINESS

DATA ACTIVITY

HIRE PAY
EMPLOYEE EMPLOYEE

EMPLOYEE
......
......
....
....

PROMOTE FIRE
© Minder Chen, 1993~2002 EMPLOYEE EMPLOYEE Data Modeling - 12 -
Data Model (Entity Relationship Diagram)

Member placed by; is enrolled under;


Member Agreement
Order places applies to

sells; generates; established by;


is sold on generated by established

is featured in; sponsors;


Product Promotion Club
features is sponsored by

© Minder Chen, 1993~2002 Data Modeling - 13 -


Entity Relationship Diagram: Subject Area and Entity Type

• Subject Area and Subject Area


Diagram
• Entity Types
• Entity Instances
• Finding Entity Types
• Evaluating Entity Types

© Minder Chen, 1993~2002 Data Modeling - 14 -


Subject Area (Submodel)
• A natural area of interest to the business that is centered on a major
resource, inputs, outputs, or activity of the business.
• It contains a set of entity types.
• We start the data modeling in the ISP stage by identifying subject
areas with names and descriptions.
• In BAA stage, subject areas are used to as high level grouping of
entity types.
• Naming: a subject area is a noun in plural form and often has the
name as the central entity type in the subject area.
• Examples:

Projects

Project Member Task

Project

© Minder Chen, 1993~2002 Data Modeling - 15 -


Subject Area Diagram

Customers Products Raw-materials

Orders Suppliers
Sales-persons
Buyers Purchas
e
Legends
Orders
: Subject Area

: Association
© Minder Chen, 1993~2002 Data Modeling - 16 -
Entity Types
• Definition:
– An entity is an object or event, real or abstract, about
which we would like to store data. Entity is the
abbreviation of entity type. It represent a set of
entity instances which can be described by the
same set of attribute types. The value of the same
attribute for each entity instance may be different.
• Identifying Entity Types
– What information is required by the business?
– Things that are of interest to the business that need
to be remembered in order to manage and track
them.
– Things belong to the same entity type have common
characteristics.

© Minder Chen, 1993~2002 Data Modeling - 17 -


Naming Entity Types
• The name of each entity is in singular form
– a noun
– an adjective + a noun
– a noun + a noun => (noun string)
– an adjective + a noun + a noun
• Examples
– Customer, Customer Order, Product, Hourly Employee, Project, Department, Unfilled
Customer Order
• Be clear and concise
• Avoid abbreviation
• Be consist with user’s terminology
• Identify synonyms
– Customer Client
– Product Merchandise
– Supplier Vendor
– Teacher Faculty
• Use one name as the official name and document others as aliases

© Minder Chen, 1993~2002 Data Modeling - 18 -


Exercise: Entity Type Naming
• Courses
• Department
• Customer Order
• PO

© Minder Chen, 1993~2002 Data Modeling - 19 -


Properties of Entity Types
• Name
• Description
• Identifier
• Properties: Estimated number (Max., Min., Average) of
entity instances
• Expected growth rate of entity instances
• Subject Area in which the Entity Type resides
• Attributes that describe the Entity Types
• Examples of entity type instances

© Minder Chen, 1993~2002 Data Modeling - 20 -


Definition of an Entity Type
• A poor definition of Customer: Anyone
that buys something from the company.
– Can employees be a customer?
– Can a leasor be a customer?
– If the company sold a subsidiary to another
company, does the new owner consider a
customer?
• Good definition should be:
– Compatible
– Precise
– Concise
– Clear
– Complete
© Minder Chen, 1993~2002 Data Modeling - 21 -
Good Definition
• Compatible
– Customer: An ORGANIZATION that purchase
PRODUCTs for personal use.
– Distributor: An ORGANIZATION that purchase
PRODUCTs for resale.
• Precision:
– With appropriate qualifiers
– Example: An ORGANIZATION is considered to have
purchase a PRODUCT when we receive a valid
PURCHASE ORDER from it.
• Complete
– ORGANIZATION, PRODUCT, PURCHASE ORDER need
to be defined.
• Concise and Clear
– Use modular definition

© Minder Chen, 1993~2002 Data Modeling - 22 -


Example of Entity Type Descriptions

Entity Type Description


Information about all persons or organizations who
Customer purchases
Product All goods manufactured and sold
Raw-material Components used to manufacture Products.
Supplier Vendors of Raw Materials.
Company personnel responsible for purchasing
Buyer Raw-Materials from Suppliers

© Minder Chen, 1993~2002 Data Modeling - 23 -


Entity Type and Entity Instance (Occurrence)
Entity Types Entity Instance
Vendor ABC Co.
Employee John Smith
Course Intro. to IE
Department Marketing Department

© Minder Chen, 1993~2002 Data Modeling - 24 -


Exercise: Entity Types or Entity Instances?
• Maryland
• Organization Unit
• Customer
• President
• Bill Clinton
• Department of Commerce
• Address

© Minder Chen, 1993~2002 Data Modeling - 25 -


Finding Entity Types
• Interviews with users
• JAD workshops
• Business forms
• Reports
• Computer files using reverse engineering
• Operation manuals

© Minder Chen, 1993~2002 Data Modeling - 26 -


Where to Look for an Entity Type?
• Tangible or Intangible Things
– The nouns that are used to describe the problem domain will often
correspond to the major Entity Types of the system, at least at a high level.
– Examples: Product, Sensor, and Employee, Department, and Sale Office.
• Resources
– Any resources that an organization needs to manage should be represented
as an Entity Type. Information assists the efficient and effective use of other
resources through improved decision.
– Examples: Inventory, Machine, Bank Account, and Customer.
• Roles Played
– Roles can be played by persons or organizational units.
– Examples: Customers, Managers, and Account representatives.
• Events
– Events are incidents that occur at points in time. An event often involved an
interaction between two Entity Types or an action that changes the status of
an Entity Type.
– Examples: Sale, Delivery, and Registration of a motor vehicle.

© Minder Chen, 1993~2002 Data Modeling - 27 -


BIAIT: Business Information Analysis and Integration Technique

• Analysis of Orders
• Ordered entities can be a thing, a space, or a skill.
• View the order from supplier side.
• If an organization receives no orders, it has no reason
for existing.
• An organization unit can receive multiple types of
orders.
• 4 questions about the Supplier:
– Billing (Cash)?
– Deliver Late (Immediate)?
– Profile customer?
– Negotiate price (Fixed)?
• 3 questions about the Ordered Entity:
– Rented (Sold)?
– Tracked?
– Made to order (Stock)?
Source: Carlson, W. M., "BIAIT: Business Information Analysis and Integration Technique -
The New Horizon," Data Base, Vol. 10, No. 4, 1979, pp. 3-9.
© Minder Chen, 1993~2002 Data Modeling - 28 -
Criteria for Evaluating an Entity Type
• Need to be remembered by the information system in order to
be functional.
• Can be operated on: CREATE, READ, UPDATE, DELETE.
• Has a set of operations/services that always apply to change
the status of each occurrence of an Entity Type.
• Carry a set of attributes that always apply to describe each
occurrence of an Entity Type.
• Have at least one relationship with other entity type.
• Exist more than one entity occurrence (instance) in an Entity
Type.
• Have at least a unique identifier.
• Domain-based requirements: Something that the system must
have in order to operate. These may be clearly specified in the
problem description or known from subject matter experts.

© Minder Chen, 1993~2002 Data Modeling - 29 -


Entity Relationship Modeling and Diagramming
• Relationships
• Entity Relationship Diagramming
Notation
• Attributes
• Identifiers
• Partitioning and Entity Subtypes

© Minder Chen, 1993~2002 Data Modeling - 30 -


Relationship (Type)
• Definition
– A Relationship Type is an association among Entity
Types. It indicates that there is a business
relationship between these Entity Types.
– Relationship Membership is the participation of an
Entity Type in a Relationship.
– In IE, a Relationship Type can involve only two Entity
Types (binary relationship). Some other modeling
techniques allow n-ary relationships.
• Examples
– CUSTOMER places ORDER
– ORDER is placed by CUSTOMER
– EMPLOYEE works on PROJECT
– PROJECT has project member EMPLOYEE

© Minder Chen, 1993~2002 Data Modeling - 31 -


Paring (Relationship Instance)
• Relationship paring is a pair of Entity Instances of two
Entity Types associated by a Relationship Type between
these two Entity Types.
Entity Types Entity Instance
Student#1
Student
Student#2
Course#A
Course#B
Course Course#C
Course#D

Relationship Relationship Paring


Student#1 takes Course#A
Student Student#1 takes Course#B
Student#1 takes Course#D
takes Student#2 takes Course#A
Course Student#2 takes Course#C
Student#2 takes Course#D
© Minder Chen, 1993~2002 Data Modeling - 32 -
Relationship Instances Grouping
• Definition: A collection of pairings of
a Relationship Membership in which
an Entity Instance is involved.
• Examples:
– Student#1 takes Course#A, #B, and #D
– Student#2 takes Course#A, #C, and #D
– Course#A is taken by Student#1 and
Student#2

© Minder Chen, 1993~2002 Data Modeling - 33 -


Relationship Cardinality
E1 E2
One-to-One
One-to-One
1:1

E2
One-to-Many
One-to-Many
E1

1:M

Many-to-Many E2
Many-to-ManyE1
M:N

© Minder Chen, 1993~2002 Data Modeling - 34 -


Relationship Cardinality
• The number of Entity Instances involved in the
Relationship Instances Grouping in a Relationship Type.
• Three Forms of Cardinality
1. One-to-one (1:1)
DEPARTMENT has MANAGER
Each DEPARTMENT has one and only one MANAGER
Each MANAGER manages one and only one DEPARTMENT

2. One-to-many (1:m)
CUSTOMER places ORDER
Each CUSTOMER sometimes (95%) place one or more ORDERs
Each ORDER always is placed by exactly one CUSTOMER

3. Many-to-many (m:n)
INSTRUCTOR teaches COURSE
Each INSTRUCTION teaches zero, one, or more COURSEs
Each COURSE is taught by one or more INSTRUCTORs

© Minder Chen, 1993~2002 Data Modeling - 35 -


Entity Relationship Diagram (ERD): Notations
Graphical
GraphicalNotations
Notations Cardinality zero
indicator one
many
relationship-description
Entity-X reversed-relation-description Entity-Y
min max

Translate
Translateinto
intotwo
twostructured
structuredstatements
statements

Each Entity-X relationship-description cardinality-indicator (one-or-many) Entity-Y


Each Entity-Y reversed-relationship-description (zero-or-one) Entity-Y

Example
Example is-managed-by
Department manages Manager

© Minder Chen, 1993~2002 Data Modeling - 36 -


Optionality of Relationship Memberships
• Whether all entity instances of both entity
types need to participate in relationship
pairing.
• Optionality:
– Mandatory
– Optional
• Example:
– CUSTOMER membership is optional
– ORDER membership is mandatory
places
CUSTOMER is placed by ORDER
© Minder Chen, 1993~2002 Data Modeling - 37 -
Relationship Statements
Cardinality one
indicator one or more
Graphical
GraphicalNotations
Notations
places
CUSTOMER is placed by ORDER

Optionality zero (sometimes)


indicator one (always)

Each
EachEntity
EntityXX optionality
optionality relationship
relationship cardinality
cardinality Entity
EntityYY

Each CUSTOMER sometimes places one or more ORDER.


Each ORDER always is placed by one CUSTOMER.

© Minder Chen, 1993~2002 Data Modeling - 38 -


Defining Relationships
• Name
• Description
• Property
– Cardinality volumes
– Optionality percentage: % of Entity Type X's
instances pairing with Entity Type's Y's
instances
– Transferability: A relationship is transferable if
an entity instance can change its pairing within
the same relationship.
» TRANSFERABLE: An EMPLOYEE can change to a
different DEPARTMENT.
» NON-TRANSFERABLE: An ORDER cannot be
transferred to another CUSTOMER.

© Minder Chen, 1993~2002 Data Modeling - 39 -


ERD: More Examples
(a) places
Customer
belongs-to Order

is-contained-in
Product
contains

(b) manages Parallel


is-managed-by Relationsh
Employee Project
works-for ip
has-project-members

is-consists-of
(c)
Involuted or
Looped
Part contained-in Relationship

© Minder Chen, 1993~2002 Data Modeling - 40 -


ERD: Alternative Notations

places
Customer Order
belongs-to

Alternative
AlternativeNotations:
Notations:

places
Customer
belongs-to Order
places
Customer
belongs-to Order
1 M
Customer places Order

© Minder Chen, 1993~2002 Data Modeling - 41 -


Identifying Relationships
• Association between entity types
• Entity types that are used on the
same forms or documents.
• A description in a business document
that has a verb that relates two entity
types
– has
– consists of
– uses

© Minder Chen, 1993~2002 Data Modeling - 42 -


Attributes
• Definition
– Characteristics that could be used to describe Entity Types and
Relationship Types. However, in IE, relationship types are not allowed
to have attributes.
• Naming Conventions:
– Names that have business meaning
– Don't use abbreviation or possessive case, e.g., PN and Customer's
name
– Don't include entity type name because IEF will prefix the attribute name
with entity type name automatically
– Use standard format:
Entity Type Name (Qualifiers) Domain Name
Customer Name
Employee Starting Date
• Examples
– Customer has customer name, address, and telephone number
– Product has quantity-on-hand, weight, volume, color, and name.
– Employee has SSN, salary, and birthday.
– Employee-works-for-project has percentage-of-time, starting-date.

© Minder Chen, 1993~2002 Data Modeling - 43 -


Attributes: Notations
Student ID
Employee Student
Student Student Name
Employee number
First name
studentID
Birth date
Last name
Student ID Employee function name
enrollment Course no. Employee salary phone
Birth date

Student(Student
Student(Student ID,
ID, Student
Student Name,
Name, Birth
Birth Date)
Date)
Finding
FindingAttributes:
Attributes:
Attributes
Attributesareareidentified
identifiedprogressively
progressivelyduring
during BAA
BAA phase.
phase.
•• Data
DataAnalysis
Analysis
•• Activity
ActivityAnalysis
Analysis
•• Interaction
InteractionAnalysis
Analysis
•• Current
Current Systems
Systems Analysis
Analysis
© Minder Chen, 1993~2002 Data Modeling - 44 -
Attribute Value
• Definition
– Attribute Values are instances of Attributes used to describe
specific Entity Instances
• Examples
– Customer Number: 011334
– Customer Name: Minder Chen
– State: VA
– Order Total: $23,000
– Sale tax: $250
• An attribute of an entity type should have only one value
at any given time. (No repeating group)
• Avoid using complex coding scheme for an attribute.
For example: PART Number: X-XXX-XXX

Part Type Material Sequence Number


© Minder Chen, 1993~2002 Data Modeling - 45 -
Type & Instance
OBJECT TYPE OCCURRENCE
Entity Type Entity Instance
Entity Entity Instance
Entity Type Entity

Relationship (Type) Pairing (Relationship Instance)

Attribute (Type) (Attribute) Value

© Minder Chen, 1993~2002 Data Modeling - 46 -


Attribute Source Categories
• Basic
– Definition: An Attribute Value that cannot be deduced
or calculated.
– Examples: Student name and Birthday
• Derived
– Definition: The Attribute Value can be calculated or
deduced from relationship Groupings or from the
values of other Attributes. The value of a Derived
Attribute changes constantly.
– Examples: Student Age, Account Balance, Number of
courses taken.
• Designed
– Definition: The Attribute is created to overcome the
system constraints. The value of a Designed Attribute
does not change.
– Examples: Student ID, Course number.

© Minder Chen, 1993~2002 Data Modeling - 47 -


Data Types

© Minder Chen, 1993~2002 Data Modeling - 48 -


Properties of Attributes
• Name
• Description
• Attribute Source Category: Basic, Derived, Designed
• Domain or data type: Text, Number, Date, Time, Timestamp
• Optionality: Mandatory or optional
• Length and/or precision
• Permitted Values (Legal Values)
– Ranges
– A set of values (Code Table)
• Default value or algorithm

Tools such as PowerBuilder has additional properties for table’s


columns called extended attributes
– Validation Rule
– Editing Format
– Column Heading
– Reporting Format
– Form Label
– Code Table
© Minder Chen, 1993~2002 Data Modeling - 49 -
Composite Attribute
• Definition:
• Example:
– Telephone Number =
Area code + Exchange + Extension
• There is no support of composite attribute
type most of CASE tools. In such case,
an composite attribute must be stored as
an entity type.

© Minder Chen, 1993~2002 Data Modeling - 50 -


Domain
• A collection of values which can be taken by one or
more attributes.
• Date is the domain for Ordered Date, Student's
Birthday, Employee Starting Date.
• A used defined domain can have customized
validation rules and formats.
• CASE tools such as IEF only supports the
following basic domains:
– Text
– Number
– Date
– Time
– Timestamp

© Minder Chen, 1993~2002 Data Modeling - 51 -


Identifiers
• The identifier of an entity type is a set of
attributes and/or relationships whose
values can uniquely identify an entity.
• Entity types should have one identifier.
• Identifiers may consist of
– A single attribute: Student ID
– A set of attributes: Students ID + Course ID
– An attribute and a relationship membership
(implemented as a foreign Key): Order Item No +
Order Has Order Item

© Minder Chen, 1993~2002 Data Modeling - 52 -


Identifying Relationship

product
customer

places Symbol for is ordered by


Identifying Relationship

ORDERS
is placed by has
contains
order
order item
is part of

© Minder Chen, 1993~2002 Data Modeling - 53 -


Data Modeling Case Study
The following is description by a pharmacy owner:
"Jack Smith catches a cold and what he suspects is a flu
virus. He makes an appointment with his family doctor
who confirm his diagnosis. The doctor prescribes an
antibiotic and nasal decongestant tablets. Jack leaves the
doctor's office and drives to his local drug store. The
pharmacist packages the medication and types the labels
for pill bottles. The label includes information about
customer, the doctor who prescribe the drug, the drug
(e.g., Penicillin), when to take it, and how often, the content
of the pill (250 mg), the number of refills, expiration date,
and the date of purchase."

Please develop a data model for the entities and relationships


within the context of pharmacy. Also develop a definition for
"prescription". List all your underlying assumptions used in
your data models.

© Minder Chen, 1993~2002 Data Modeling - 54 -


Data Modeling Case Study
Given the following narrative description of entities and their
relationships, prepare a draft entity relationship diagram (ERD).
Be sure any reasonable assumptions that you are making.
Burger World Distribution Center serves as a supplier to 45
Burger World franchises. You are involved with a project to
build a database system for distribution. Each franchise
submits a day-by-day projection of sales for each of Burger
World's menu products - the products listed on the menu at
each restaurant - for the coming month. All menu product
require ingredients and/or packaging items. Based on
projected sales for the store, the system must generate a
day-by-day and ingredients need and then collapse those
needs into one-per-week purchase requisitions and
shipments.

© Minder Chen, 1993~2002 Data Modeling - 55 -


Data Modeling Process
• List entity types
• Create relationships
– Pick a central entity type
– Work around the neighborhood
» Add entity types to the diagram
» Build relationships among them
– Determine cardinalities of relationships
• Find/Create identifiers for each entity type
• Add attributes to the entity type in the data
model
• Analyze and revise the data model

© Minder Chen, 1993~2002 Data Modeling - 56 -


Classifying Attribute and Partitioning
• An Entity Subtype A collection of Entities of the same type
to which a narrower definition and additional Attributes
and Relationships apply. An Entity Subtype inherits
(retains) all the Attributes and Relationships of its parent
Entity Type.
• Classifying Attribute: An attribute of the Base Entity Type
whose values partition the Entity Instances into Subtypes.

• Partitioning: A basis for subdividing one entity type into


subtypes. The process of dividing an Entity Type into
several Subtypes based on a Classifying Attribute is called
Partitioning.
• The Classifying Attribute is recorded as a property of the
Partitioning and it appears on the diagram.

© Minder Chen, 1993~2002 Data Modeling - 57 -


Characteristics of Partitioning
• Optionality:
– Mandatory: Every Entity instances of the Entity Type
must fall into one of the Subtype categories.
– Optional: Not every Entity instances of the Entity Type
must fall into one of the Subtype categories.
• Entity Life Cycle: The states through which an
Entity Type can pass are used for Partitioning.
• Enumeration:
– Fully enumerated
– Not fully enumerated
• Classifying Attributes and Values
– Classifying Attribute: Type
– D: Domestic Subtype
– F: Foreign Subtype

© Minder Chen, 1993~2002 Data Modeling - 58 -


Partitioning and Entity Subtype: Notation
ATTRIBUTE:
Employee ID
Name ATTRIBUTE:
Birthday Teaching Quality Indicator

Employee
Type

Teaches
Lecturer Seminar
Staff

Status
Wage
Hourly

© Minder Chen, 1993~2002 Data Modeling - 59 -


Alternative Notations for Subtypes
IDEF1X PowerDesigner
Complete Category
All categories shown Account
Account Number
employeeID
name
Name
phone

employee type
Savings Checking
full-time-emp part-time-emp
Rate Fees
employeeID employeeID
(FK) (FK)
salary hourly-rate

© Minder Chen, 1993~2002 Data Modeling - 60 -


Entity Subtype Partitioning
Life
Life Cycle
Cycle Partitioning
Partitioning
Order

Order Status

Taken
Scheduled
Shipped
Billed
Paid

© Minder Chen, 1993~2002 Data Modeling - 61 -


Normalization
• A data base is a model or an image of the
reality.
• Logical Data Base Design is a process of
modeling and capturing the end-user views
of an application domain and synthesis
them into a data base structure.
• Normalization is a logical data base design
method.
• The basis for normalization is the functional
dependencies among attributes in a table.

© Minder Chen, 1993~2002 Data Modeling - 62 -


SQL Terminology
Column
Product
Product Table
Table
p_no product_name
quantity price
101 Color TV 24 500
Row
201 B&W TV 10 250
202 PC 5 2000

Create
Create aa table
table in
in SQL
SQL

CREATE TABLES
(p_no CHAR(5) NOT NULL,
product_name CHAR(20),
quantity SMALLINT,
price DECIMAL(10, 2));
© Minder Chen, 1993~2002 Data Modeling - 63 -
SQL Terminology
Set Theory Relational DB File Example
Relation Table File Product_table
Attribute Column Data item Product_name
Tuple Row Record Product_101's info.
Domain Pool of legal values Data type DATE

© Minder Chen, 1993~2002 Data Modeling - 64 -


SQL Principles
• The result of a SQL query is always a table (View
or Dynamic Table)
• Rows in a table are considered to be unordered
• Dominate the markets since late 1980s
• Can be used in interactive programming
environments
• Provide both data definition language (DDL) and
data manipulation language (DML)
• A non-procedural language
• Can be embedded in 3GL:
– Embedded SQL
– Dynamic SQL

© Minder Chen, 1993~2002 Data Modeling - 65 -


SQL: Data Definition Language (DDL)

CREATE TABLE
VIEW
DROP INDEX
DATABASE

ALTER TABLE

© Minder Chen, 1993~2002 Data Modeling - 66 -


SQL: Introduction
• A relational data base is perceived by its users as a
collection of tables
• E. F. Codd 1969
• Dominate the markets since late 1980s
• Strengths:
– Simplicity
– End-user orientation
– Standardization
– Value-based instead of pointer-based
– Endorsed by major computer companies
• Most CASE products support the development of
relational data base centered applications

© Minder Chen, 1993~2002 Data Modeling - 67 -


SQL: Data Manipulation Language (DML)

p_no product_name quantity price


SELECT
SELECT
UPDATE 101 Color TV 24 500
UPDATE
INSERT
INSERT 201 B&W TV 10 250
DELETE
DELETE 202 PC 5 2000

The
The Generic
Generic Form
Form of
of the
the SELECT
SELECT Statement
Statement

SELECT
SELECT [DISTINCT]
[DISTINCT]column(s)
column(s)
FROM
FROM table(s)
table(s)
[WHERE
[WHEREconditions]
conditions]
[GROUP
[GROUPBYBY column(s)
column(s) [HAVING
[HAVING condition]]
condition]]
[ORDER
[ORDERBYBY column(s)]
column(s)]

© Minder Chen, 1993~2002 Data Modeling - 68 -


Database Table
• The following code retrieves only the Last Name and the
Employee ID where the Employee ID is greater than 5. The
records are retrieved in descending order.
SELECT LastName, EmployeeID
FROM Employees
WHERE EmployeeID > 5
ORDER BY EmployeeID DESC

© Minder Chen, 1993~2002 Data Modeling - 69 -


WHERE Clause
• WHERE: Use the Where clause to limit the selection. The #
symbol indicates literal date values.

SELECT * FROM Employees


WHERE LastName = "Smith"

SELECT Employees.LastName FROM Employees


WHERE Employees.State in ('NY','WA')

SELECT OrderID FROM Orders


WHERE OrderDate BETWEEN #01/01/93# AND #01/31/93#

© Minder Chen, 1993~2002 Data Modeling - 70 -


Keys
• A key, also called identifier, is an Attribute or a
Composite Attribute that can be used to
uniquely identify an instance of an entity type.
• Examples:
Entity Type Key
Warehouse Warehouse Number
Product Product Number
Student Student ID or SSN
Ship Name and Port of Registration
Stock of Product Product Number and Warehouse No.

© Minder Chen, 1993~2002 Data Modeling - 71 -


Types of Key
• Primary Key: A unique key is an attribute or a set
of attributes that has been used by the DBMS as
the identifier of a table.
• Candidate (Alternative) Key: An attribute or a set
of attributes that could have been used as the
primary key of a table.
• Secondary (Index) Key: An attribute or a set of
attributes that has been used to construct the
data retrieval index.
• Concatenated (Combined or Composite) Key: A
set of attributes that has been used as the key.
• Foreign Key: An attribute or a set of attributes
that is used as the primary key in another table.
© Minder Chen, 1993~2002 Data Modeling - 72 -
Purposes of Normalization
• Avoid maintenance problems such as Update
.
• Insert: There may be no place to insert new
information.
• Delete: Some important information will be
lost by deletion.
• Update: Inconsistency may occur because of
the existence of data redundancy.
• Provide maximum flexibility to meet future
information needs by keeping tables
corresponding to object types in their
simplified forms.
© Minder Chen, 1993~2002 Data Modeling - 73 -
A Common Sense Approach to Normalization
• Don't rush to put all the information in one
table.
• Create a table to correspond to a class of
a simple object type that should exist by
itself, i.e., "one fact in one place."
• Include common fields (links) as ways of
joining information from several related
tables.
• Avoid redundancy by using links to
retrieve data from related tables.

© Minder Chen, 1993~2002 Data Modeling - 74 -


Normalization Theory
• Normalization is a process of systematically
breaking a complex table into simpler ones.
• It is built around the concept of normal forms.
• A relation is in a particular normal form if it
satisfies a specific set of constraints such as
dependencies among attributes in the relation.
• For x is an integer and x > 1,
if a relation is in x-NF than it is in (x-1)-NF.
• Higher order normal forms are usually more
desirable than lower order normal forms.
• Normalization process usually starts from
complex relations which are usually drawn from
some existing documents such as business
forms.

© Minder Chen, 1993~2002 Data Modeling - 75 -


A Business Form

© Minder Chen, 1993~2002 Data Modeling - 76 -


An Informal Example of Normalization
• A CUSTOMER ORDER contains the following
information:
– OrderNo
– OrderDate
– CustNo
– CustAddress
– CustType
– Tax
– Total
– one or more than one Order-Item which has
» ProductNo
» Description
» Quantity
» UnitPrice
» Subtotal.

© Minder Chen, 1993~2002 Data Modeling - 77 -


Solution
Unnormalized table
(OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax,
1{ProductNo, Description, Quantity, UnitPrice,Subtotal}

Remove repeating group

(OrderNo, ProductNo, Description, Quantity, UnitPrice, Subtotal)


Subtotal
1st NF

Remove partial FD

2nd NF (OrderNo, OrderDate, CustNo, CustAddress, CustType, Tax

Remove transitive FD

(OrderNo, ProductNo, Quantity, UnitPrice,


Subtotal)
Subtotal
(ProductNo, Description, UnitPrice)

(OrderNo, OrderDate, CustNo, Tax, Total


3rd NF
(CustNo, CustAddress, CustType)
© Minder Chen, 1993~2002 Data Modeling - 78 -
Unnormalized Form
• A relation that has multi-valued attributes (repeating
groups).
• Normalization Process: Remove Multi-value Attributes
• If an unnormalized relation R has a primary key K and a
multi-value attribute M, the normalization process is:
– The multi-value attribute M should be removed from R.
– A new relation will be created with (K,M) as the primary key of the
relation.
– There may be some other attributes associated with this new
relation.
– R will then be at least in 1NF.
• Example: An Employee relation has an attribute language-
spoken. For some employees there may be more than one
language that they can speak.

EMP (employeeID, empName, empAddress, (language1, language2, ...))



EMP (employeeID, empName, empAddress)
EMP-LANGUAGE (employeeID, language, skillLevel)
© Minder Chen, 1993~2002 Data Modeling - 79 -
How Do You Remove the Repeating Groups?
CREATE TABLE MEM_CONDITION (
MEMBER# VARCHAR2(12) NOT NULL,
CASE# VARCHAR2(16) NOT NULL,
DIAG_ARRAY_1 VARCHAR2(6) NOT NULL,
DIAG_ARRAY_2 VARCHAR2(6) NOT NULL,
DIAG_ARRAY_3 VARCHAR2(6) NOT NULL,
DIAG_ARRAY_4 VARCHAR2(6) NOT NULL,
DIAG_ARRAY_5 VARCHAR2(6) NOT NULL,
DIAG_EX_ARRAY_1 VARCHAR2(2) NOT NULL,
DIAG_EX_ARRAY_2 VARCHAR2(2) NOT NULL,
DIAG_EX_ARRAY_3 VARCHAR2(2) NOT NULL,
DIAG_EX_ARRAY_4 VARCHAR2(2) NOT NULL,
DIAG_EX_ARRAY_5 VARCHAR2(2) NOT NULL,
DRUG_ARRAY_1 VARCHAR2(12) NOT NULL,
DRUG_ARRAY_2 VARCHAR2(12) NOT NULL,
DRUG_ARRAY_3 VARCHAR2(12) NOT NULL,
DRUG_ARRAY_4 VARCHAR2(12) NOT NULL,
DRUG_ARRAY_5 VARCHAR2(12) NOT NULL,
LC_ARRAY_1 VARCHAR2(4) NOT NULL,
LC_ARRAY_2 VARCHAR2(4) NOT NULL,
LC_ARRAY_3 VARCHAR2(4) NOT NULL,
LC_ARRAY_4 VARCHAR2(4) NOT NULL,
LC_ARRAY_5 VARCHAR2(4) NOT NULL,
MEM_REVIEW VARCHAR2(4) NOT NULL,
OP# VARCHAR2(4) NOT NULL,
PROC_ARRAY_1 VARCHAR2(6) NOT NULL,
PROC_ARRAY_2 VARCHAR2(6) NOT NULL,
PROC_ARRAY_3 VARCHAR2(6) NOT NULL,
PROC_ARRAY_4 VARCHAR2(6) NOT NULL,
PROC_ARRAY_5 VARCHAR2(6) NOT NULL,
PROV_ARRAY_1 VARCHAR2(12) NOT NULL,
PROV_ARRAY_2 VARCHAR2(12) NOT NULL,
PROV_ARRAY_3 VARCHAR2(12) NOT NULL,
PROV_ARRAY_4 VARCHAR2(12) NOT NULL,
PROV_ARRAY_5 VARCHAR2(12) NOT NULL,
REC_TYPE VARCHAR2(2) NOT NULL,
SP_ARRAY_1 VARCHAR2(4) NOT NULL,
SP_ARRAY_2 VARCHAR2(4) NOT NULL,
SP_ARRAY_3 VARCHAR2(4) NOT NULL,
SP_ARRAY_4 VARCHAR2(4) NOT NULL,
SP_ARRAY_5 VARCHAR2(4) NOT NULL,
TRANSCODE VARCHAR2(2) NOT NULL,
TT_ARRAY_1 VARCHAR2(4) NOT NULL,
TT_ARRAY_2 VARCHAR2(4) NOT NULL,
TT_ARRAY_3 VARCHAR2(4) NOT NULL,
TT_ARRAY_4 VARCHAR2(4) NOT NULL,
TT_ARRAY_5 VARCHAR2(4) NOT NULL,
VOID VARCHAR2(2) NOT NULL,
YMDEFF VARCHAR2(8) NOT NULL,
YMDEND VARCHAR2(8) NOT NULL,
YMDTRANS VARCHAR2(8) NOT NULL,
PRIORITY VARCHAR2(2) NOT NULL
);

© Minder Chen, 1993~2002 Data Modeling - 80 -


Functional Dependency
• Notation: R.X => R.Y
• Definition: Attribute Y of Relation R is
functionally dependent on the
Attribute X of Relation R when there
is each value of R.Y associated with
no more than one value of R.X. R.X
and R.Y may be composite attributes.
• Description:
– R .Y is functionally dependent on R.X
– R.X functionally determines R.Y
© Minder Chen, 1993~2002 Data Modeling - 81 -
Full & Partial Dependency
• R.A => R.B
• If B is not functionally dependent on
any subset of A (other than A itself), B
is fully dependent on A in R.
• If B is functionally dependent on a
subset of A (other than A itself), B is
partially dependent on A in R.

© Minder Chen, 1993~2002 Data Modeling - 82 -


First Normal Form (1NF)
• A relation R is in the first normal form (1NF) if and only if all
attributes of any tuple in R contain only atomic values.
• Normalization Process:
– Remove Partial Functional Dependencies
– If R is in 1NF and has a composite primary key (K1,K2), an attribute P
is functionally dependent on K1 (K1 => P) (i.e., P is partially
dependent on (K1, K2)), the normalization process is:
– The attribute P should be removed from R and a new relation will be
created with K1 as the primary key and P as a non-key attribute.
– A relation that is in 1NF and not in 2NF must have a composite
primary key.
• Example
– Supplier-Part relation has attributes supplier#, part#, qty, city,
distance, where (supplier#, part#) is the key.
– City is partially dependent on supplier#.
SUPPLIER-PART (supplier#, part#, qty, city, distance)

SUPPLIER-PART (supplier#, Part#, qty)
SUPPLIER (supplier#, city, distance)
© Minder Chen, 1993~2002 Data Modeling - 83 -
Non-loss Decomposition
• Normalization is a reduction (decomposition) process
that replaces a relation by suitable projections. Each of
the projection is a new relation that is in a further
normalized form than the original relation. The
collection of projections is equivalent to the original
relation.
• The original relation can always be recovered by taking
the natural join of these projections.
• Any information that can be derived from the original
relation can also be derived from the further normalized
relations. The converse is not true.
• The process is reversible because no information is loss
in the reduction process.

© Minder Chen, 1993~2002 Data Modeling - 84 -


Transitive Dependency
In a relation R,
if R.A =>R.B and R.B => R.C
then attribute C is said to be transitively
dependent on attribute A.

© Minder Chen, 1993~2002 Data Modeling - 85 -


Second Normal Form (2NF)
• A relation R is in the second normal form (2NF) if and only if it is in
1NF and every non-key attribute is fully dependent on the primary
key.
• Normalization Process: Remove Transitive Dependencies
• If R is in 2NF and has two non-key attributes A1 and A2 where A2 is
functionally dependent on A1 (A1 => A2). The A2 should be
removed from R and a new relation will be created with A1 as the
primary key and A2 as a non-key attribute.
• Example
– Supplier relation has attributes supplier#, city, distance, where supplier# is
the key and distance to a supplier can be determined by the city of the
supplier.

SUPPLIER (supplier#, city, distance, quality_level)



SUPPLIER (Supplier#, city, quality_level)
CITY-DISTANCE (city, distance)
© Minder Chen, 1993~2002 Data Modeling - 86 -
Third Normal Form (3NF)
• A relation R is in the third normal form (3NF) if and only if the non-
key attributes (if there is any) are fully dependent on the primary key
of R (i.e., R is in its 2NF) and are mutually independent.
• Heuristic to Check Whether a Relation Is in 3NF
– All the non-key attributes (which are not multi-value attributes) are
dependent on the (primary) key, the whole key, and nothing but the key.

Explanation
Explanation
• All the non-key attributes have atomic value and dependent on the key
(1NF - No multi-value attribute),
• the whole key, (2NF - No Partially Functional Dependency)
• and nothing but the key (3NF - No Transitive Functional Dependency)

© Minder Chen, 1993~2002 Data Modeling - 87 -


Normalization Process
Unnormalized
UnnormalizedForm
Form

A B C D E F G H
remove repeating groups
1NF
1NF 2NF
2NF

A F G H A B C D E

remove partial dependencies remove transitive dependencies


3NF
3NF
3NF
3NF
3NF
3NF 3NF
3NF D E
A F G
F H A B C D

© Minder Chen, 1993~2002 Data Modeling - 88 -


Normalization: Pros and Cons
• Pros
– Reduce data redundancy & space required
– Enhance data consistency
– Enforce data integrity
– Reduce update cost
– Provide maximum flexibility in responding ad hoc queries
• Cons
– Many complex queries will be slower because joins have to be
performed to retrieve relevant data from several normalized
tables
– Programmers/users have to understand the underlying data
model of an database application in order to perform proper
joins among several tables
– The formulation of multiple-level queries is a nontrivial task.

© Minder Chen, 1993~2002 Data Modeling - 89 -


Join Two Tables
SELECT Categories.CategoryName, Products.ProductName
FROM Categories, Products
WHERE Products.CategoryID = Categories.Category ID

© Minder Chen, 1993~2002 Data Modeling - 90 -


Tables in Relational DB
• Identify Primary Keys and Foreign Keys in the
following Tables!!!

ID ID

ID

© Minder Chen, 1993~2002 Data Modeling - 91 -


Join Tables
SELECT Orders.OrderID, Orders.CustID,
LastName, Firstname, Orders.ItemID, Description
FROM Customer, Orders, Inventory
WHERE Customer.CustID = Orders.CustID AND
Orders.ItemID = Inventory.ItemID
ORDER BY CustID, Orders.ItemID

ID ID

© Minder Chen, 1993~2002 Data Modeling - 92 -


Foreign Keys & Primary Keys in a Sample Access Database

© Minder Chen, 1993~2002 Data Modeling - 93 -


An Example of a Complex Query
Please list name and phone number of customers
who have ordered product number 007.

SELECT customer_name, customer_phone


FROM customer
WHERE customer_number IN
SELECT customer_number
FROM order
WHERE order_no IN
SELECT order_no
FROM orderItem
WHERE product_number = 007

© Minder Chen, 1993~2002 Data Modeling - 94 -


Denormalization
• The process of intentionally backing away from
normalization to improve performance. Denormalization
should not be the first choice for improving performance
and should only be used for fine tuning a database for a
particular application.
• Requirements
– Prior normalization
– Knowledge of data usage
• Benefits
– Minimize the need for joins
– Reduce number of tables
– Reduce number of foreign keys
– Reduce number of indices
• Knowledge of Data Usage
– How often are two data items needed together
– How many rows are involved
– How volatile is denormalized data
– How important is visibility of data to users
– What is the minimum response time and frequency of an query
© Minder Chen, 1993~2002 Data Modeling - 95 -
De-normalization: An Example
JOIN
R1 R2

Denormalization

R1 * R 2 R2
• Where:
– R1 (ProductNo, SupplierNo, Price)
– R2 (SupplierNo, Name, Address, Phone)
– R1*R2 (ProductNo, SupplierNo, Name, Address, Phone, Price)
• R2 should be kept to prevent data loss.
• Data redundancy in R1*R2 and R2 could cause potential data
inconsistency problems if the redundant data in these two tables are
not maintained properly.

© Minder Chen, 1993~2002 Data Modeling - 96 -


Data Model Refinement and Transformation
• Data Model Refinement
• Associative Entity Type
• Removing Many-to-Many Relationships
• Keys
• Transformation to Relational Databases

© Minder Chen, 1993~2002 Data Modeling - 97 -


Refinement of a Data Model: Analysis and Simplification

• Isolated Entity Type


• Solitary Entity Type
• One-to-One Relationship
• Redundant Relationship
• Multi-Valued Attributes
• Attribute with Attributes
• Many-to-Many Relationship

© Minder Chen, 1993~2002 Data Modeling - 98 -


Isolated Entity Type
• An Entity Type that does not participate in a
Relationship.
• Since every Entity Type should participate in at
least one Relationship, there exist two
alternatives:
– Identify a relevant Relationship
– Remove the Entity Type from the model

© Minder Chen, 1993~2002 Data Modeling - 99 -


Solitary Entity Type
• An Entity Type that has only one Entity Instance.
Examples: Computer Center, Sales Tax, and Current
Order Number. Solitary Entity Types may be too
restrictive.
• Alternatives:
– Introduce another Entity Type with a wider scope.
– Computer Center ==> Organization Unit

– Define it as an Attribute of an Entity Type.


– Sales Tax ==> Sales Tax of Order

– Define it as a data element in an parameter table. A parameter


table has only one row.
– Current Order Number ==> Current Order Number of Parameter
Table

© Minder Chen, 1993~2002 Data Modeling - 100 -


Evaluate One-to-One Relationship
• It may be an unnecessary relationship between two
Entity Types if they have the same attribute and
relationships (i.e., they are identical).
• It should be then combined into one Entity Type.

Maybe
MaybeIncorrect
Incorrect
Purchase becomes Purchase
Request Order
has request

Correct
Correct
Purchase
Order

© Minder Chen, 1993~2002 Data Modeling - 101 -


Redundant Relationship
Is this relationship
redundant?
has ordered
product
customer

places is ordered by

ORDERS
is placed by has
contains
order
order item
is part of
Differences in timing of an entity type in its life cycle:
• Implemented as separate entity types or use subtypes
• Use value of attributes or additional attributes to differentiate them

© Minder Chen, 1993~2002 Data Modeling - 102 -


Redundant Relationship

Redundant
Redundant

Product Warehouse
stocks
is held as
holds
Stock
contains is held in

Non-redundant
Non-redundant
is contained in is contained in
Product contains Order Line Order
contains
is placed by

places
is contained in is contained in
Order History Customer
contains contains

© Minder Chen, 1993~2002 Data Modeling - 103 -


Multi-Valued Attribute
• Definition
– An Attribute that may have more than one value at a time is called a multi-
valued attribute.
• Solution:
– Create an Entity Type for the multi-valued attribute
• Example:
– Languages spoken by an Employee

– Employee(ID, Name, Phone, Languages)


– Employee(111, “John Smith”, 201-999-8888, (English, Chinese))

– Employee(ID, Name, Phone)


– Employee(111, “John Smith”, 210-999-8888)

– Employee_language(ID, Language)
– Employee_language(111, English)
– Employee_language(111, Chinese)

© Minder Chen, 1993~2002 Data Modeling - 104 -


Attribute with Attributes
• An Attribute that can be described by other
Attributes is called an attribute with
attributes.
• Example:
– College Degree by an Employee
– (John Smith has a College Degree in Computer
Sciences from George Mason University)
• Solution:
– Create an Entity Type to avoid an Attribute with
Attributes.
– Add new attributes to the existing Entity Type.

© Minder Chen, 1993~2002 Data Modeling - 105 -


Associative Entity Type
• An Associative Entity Type is an Entity Type
whose existence is meaningful only if it
participates in several (>=2) Relationship Types
at the same time.
• Associative Entity Types are often introduced to
represent additional information in many-to-
many Relationships or to decompose a many-to-
many Relationship into two one-to-many
Relationships.
• Associative Entity Types are also used to
represent n-ary Relationships in a binary data
model.

© Minder Chen, 1993~2002 Data Modeling - 106 -


Remove Many-to-Many Relationship
Given
Given
contains
Order Product
belongs-to

Why?
Why?
• There is no place to attach Attributes that are required to describe a many-to-many
Relationship.
• It is difficult to translate many-to-many Relationships into relational tables automatically.

How?
How? A many-to-many relationship can be decomposed into two
one-to-many Relationships by creating an Associative Entity
Type between the existing two Entity Types.

contains has
Order Order Line Product
belongs to is contained in

© Minder Chen, 1993~2002 Data Modeling - 107 -


Remove Many-to-Many Relationships: Exercises

Remove the many-to-many relationship from the


following ER diagrams
(a)
has-sources
Product Supplier
offers

(b)
takes
Student Course
is-taken-by

(c) consists-of

Part is-contained-in

© Minder Chen, 1993~2002 Data Modeling - 108 -


Bills of Material
A
Part

consists-of is-a-component-in B C
2 1

D E D F
Product Structure 1 3 2 2

Product-Structure(Parent Part No, Child Part No, Quantity)

A B 2
A C 1
B D 1
B E 3
C D 2
C F 2

© Minder Chen, 1993~2002 Data Modeling - 109 -


Using an Associative Entity Type to Represent an N-ary Relationship

involved in involved in
product product usage
Product usage Project

involved in
product usage
Supplier

Product
ProductUsage
Usageisisan
anAssociative
AssociativeEntity
EntityType
Typefor
foraa3-ary
3-aryRelationship.
Relationship.

is used in uses
Product Product Usage Project

supplies
Supplier

© Minder Chen, 1993~2002 Data Modeling - 110 -


Translate Data Models to Relational Tables

Given
Given
contains has
Order Order Line Product
belongs to is contained in

Key: Order# Key: Order#+Product# Key: Product#


Attribute: Attribute: Attribute:
Order date Quantity Description
Customer ID Unit Price Qty-on-hand
Sale Person ID Unit Price

Relational
RelationalTables
TablesCreated
Created

CREATE TABLE ORDER


(OrderNo CHAR(10) NOT NULL,
OrderDate DATE,
CustomerID CHAR(10),
SalePersonID CHAR(10));

© Minder Chen, 1993~2002 Data Modeling - 111 -


Transformation of Data Models to Relational Database Tables

• The entire, or part of, a data (entity-relationship)


model can be translated into a normalized
database design.

• Objects Created
– At most one relational database
– One or more relations (tables)
– Data structures (DDL) representing the elements
(attributes) and the primary key of each relation
– Data type of each data elements

© Minder Chen, 1993~2002 Data Modeling - 112 -


Heuristics of Transformation
• A table is created for each Entity Type in the ER diagram.
• A table is created for each multi-valued attribute.
• Relationship Types are implemented as tables or as foreign keys in
other tables.
• Many-to-many relationship types are translated into tables.
• Foreign keys are used for implementing one-to-one and one-to-
many Relationship Types.
• For one-to-many Relationship Types, the foreign key is placed in
the table that represents the Entity Type on the "many" end of the
Relationship Type.
• For identifying one-to-many Relationship Types, the PK of the
"one" table migrate to the "many" table as a FK and the FK is also
part of the PK of the "many" table.
• For non-identifying one-to-many Relationship Types, the PK of the
"one" table migrate to the "many" table as a FK and the FK is a
non-key attribute of the "many" table.

© Minder Chen, 1993~2002 Data Modeling - 113 -


PowerDesign: Data Architect

Generation/Reverse Engineering:
CDM, PDM

Generation & Reverse Engineering: Generation & Reverse Engineering:


Database Structure Extended Attributes
Triggers & Stored Procedures Database Structure

Target
4GL Tool http://www.powersoft.com/
Target DBMS
© Minder Chen, 1993~2002 Data Modeling - 114 -
PowerDesigner

© Minder Chen, 1993~2002 Data Modeling - 115 -


Conceptual Data Model
A Sample Conceptual Data Model Project : Management
Model : Project Management
Author : User Version 6.x 7/21/98
Team
Division Team number Customer
Division number Speciality Customer number
Division name
Customer name
Division address
Customer address
Is membersupervises
of Customer activity
Customer telephone
Employee Customer fax
Employee number
First name
Last name Subcontract Activity
Uses Employee function
Start date
Employee salary
End date

Is manager of

Project
Task
Material Project number
Task name
Material number Project name
Task cost
Material name Project label
Material type

composes composed of Participate


Start date
End date

© Minder Chen, 1993~2002 Data Modeling - 116 -


Notations

Employee
Employee number
Ent First name
ity Last name
Employee function
Employee salary

Relationship

Employee
Division Employee number
Division number First name
Division name Last name
Division address Employee function
Employee salary

One-to-many
© Minder Chen, 1993~2002 Data Modeling - 117 -
More on Relationships
Employee
is a member of
Employee number
First name Team
Last name Team number
Employee function Specialty
Employee salary member
Many-to-many cardinality

Project
Task
Project number
Task name
Project name
Task cost
Project label

A project 'contains’ one or more tasks, and a task's


existence is dependent on the project.

© Minder Chen, 1993~2002 Data Modeling - 118 -


Advanced Concepts

Material
Material number
Account Material name
Account Number Material type
Name

composes composed of

Savings Checking
Rate Fees

Employee
Employee number
Subtype First name
Last name
Employee function
Employee salary

Reflexive relationship

© Minder Chen, 1993~2002 Data Modeling - 119 -


Define Entities

© Minder Chen, 1993~2002 Data Modeling - 120 -


Define Attributes

© Minder Chen, 1993~2002 Data Modeling - 121 -


Check Parameters

© Minder Chen, 1993~2002 Data Modeling - 122 -


Relationship Definition

© Minder Chen, 1993~2002 Data Modeling - 123 -


Dependent (Identifying Relationship)
• Check the box to
indicate a
dependent
relationship. "One
to many" and
"mandatory" are
automatically
chosen as the
cardinality and
optionality.
• At the physical data
model level, the
parent entity type's
primary key (PK) will
become part of the
dependent child
entity type's PK. It
is also a foreign key.

© Minder Chen, 1993~2002 Data Modeling - 124 -


Inheritance (Super-Type and Sub-Type)

© Minder Chen, 1993~2002 Data Modeling - 125 -


Generate Physical Data Model

© Minder Chen, 1993~2002 Data Modeling - 126 -


Physical Data Model
Conceptual
Conceptual Employee
Division
Data
DataModel
Model Employee number
Division number
Division name belongs to First name
Division address Last name
Transformation

Employee function
Employee salary

Do not define FK
as an attribute.

EMPLOYEE
Physical
Physical EMPNUM <pk>
Data
DataModel
Model DIVISION DIVNUM <fk>
DIVNUM <pk> EMPFNAM
DIVNAME EMPLNAM
DIVADDR DIVNUM = DIVNUM EMPFUNC
EMPSAL

DIVNUM automatically migrates as a foreign key.

© Minder Chen, 1993~2002 Data Modeling - 127 -


Dependent Relationship
Conceptual
Conceptual Project
Task
Data
DataModel
Model Project number
Task name
Project name
Task cost
Project label
Transformation

Physical
Physical PROJECT
Data
DataModel
Model PRONUM <pk> TASK
CUSNUM <fk> PRONUM = PRONUM PRONUM <pk,fk>
EMPNUM <fk> TSKNAME <pk>
ACTBEG ACTBEG
ACTEND ACTEND
PRONAME TSKCOST
PROLABL

© Minder Chen, 1993~2002 Data Modeling - 128 -


Physical Data Model Physical Data Model
Project:Management
Model :Project Management
Author :User Version6.x 7/21/98
TEAM
TEANUM <pk>
TEASPE CUSTOMER
DIVISION EMPLOYE_MATERIAL
CUSNUM <pk>
DIVNUM <pk> CUSNAME MATERIAL.MATNAME char(30)
DIVNAME CUSADDR PROJ.EMPLOYEE.EMPNUM numeric(5)
TEANUM = TEANUM
DIVADDR CUSACT PROJ.EMPLOYEE.EMPFNAM char(30)
CUSTEL PROJ.EMPLOYEE.EMPLNAM char(30)
MEMBER PROJ.EMPLOYEE.EMPFUNC char(30)
CUSFAX
TEANUM <pk,fk> MATERIAL
EMPNUM <pk,fk>
PROJ.EMPLOYEE
DIVNUM = DIVNUM USED
CUSNUM = CUSNUM
EMPNUM = EMPNUM

USED EMPNUM = EMP_EMPNUM


MATNUM <pk,fk>
EMPNUM <pk,fk>
EMPLOYEE PROJECT
EMPNUM = EMPNUM
EMPNUM <pk> PRONUM <pk>
MATNUM = MATNUM EMP_EMPNUM <fk> CUSNUM <fk>
EMPNUM = EMPNUM EMPNUM <fk>
DIVNUM <fk>
EMPFNAM <ak> ACTBEG
EMPLNAM <ak> ACTEND
MATERIAL PRONAME
EMPFUNC <ak>
MATNUM <pk> EMPSAL PROLABL
MATNAME
MATTYPE

MATNUM = CPD_MATNUM EMPNUM = EMPNUM PRONUM = PRONUM

TNUM = CPN_MATNUM
PARTICIPATE TASK
COMPOSE PRONUM <pk,fk> PRONUM = PRONUM PRONUM <pk,fk>
TSKNAME <pk,fk> TSKNAME = TSKNAME TSKNAME <pk>
CPD_MATNUM <pk,fk>
EMPNUM <pk,fk> ACTBEG
CPN_MATNUM <pk,fk>
PARBEG ACTEND
PAREND TSKCOST

© Minder Chen, 1993~2002 Data Modeling - 129 -


References (Relationships at the Physical Data Model)

© Minder Chen, 1993~2002 Data Modeling - 130 -


Referential Integrity
• The arrow is
pointing from
the table
containing
the foreign
key to the
table where
the foreign
key is used
as a primary
key.

© Minder Chen, 1993~2002 Data Modeling - 131 -


Deletion Rules

• Update Constraints
• Delete Constraints
–None
–Restrict
–Cascade
–Set null
–Set Default

© Minder Chen, 1993~2002 Data Modeling - 132 -


Generation of Oracle SQL DLL
-- ============================================================
-- Datab ase name: PROJECT
-- DBMS name: ORACLE Version 8
-- Created on: 7/21/98 8:59 PM
-- ============================================================

-- ============================================================
-- Table: DIVISION
-- ============================================================
create table ADMIN.DIVISION
(
DIVNUM numeric(5) n ot null
co nstraint CKC_DIVNUM_DIVISION check (DIVNUM >= '1'),
DIVNAME char(30) not n ull,
DIVADDR char(80) null ,
constraint PK_DIVISION p rimary key (DIVNUM)
)
/

-- ============================================================
-- Table: CUST OMER
-- ============================================================
create table PROJ.CUSTOMER
(
CUSNUM numeric(5) n ot null
co nstraint CKC_CUSNUM_CUSTOMER check (
CUSNUM >= '1'),
CUSNAME ch ar(30) not nu ll,
CUSADDR ch ar(80) not nu ll,
CUSACT char(80) null ,
CUSTEL char(12) nu ll ,
CUSFAX char(12) n ull ,
constraint PK_CUST OMER primary key (CUSNUM)
)
/

-- ============================================================
-- Table: TEAM
-- ============================================================
create table PROJ.TEAM
(
TEANUM numeric(5) not null
co nstraint CKC_T EANUM_T EAM ch eck (TEANUM >= '1'),
TEASPE char(80) nu ll ,
constraint PK_TEAM primary key (TEANUM)
)
/

-- ============================================================
-- Table: MATERIAL
-- ============================================================
create table PROJ.MATERIAL
(
MATNUM numeric(5) not nu ll
co nstraint CKC_MATNUM_MATERIAL ch eck (MATNUM >= '1'),
MATNAME char(30) not n ull,
MATTYPE char(30) no t null,
constraint PK_MAT ERIAL primary key (MATNUM)
)
/

-- ============================================================
-- Table: EMPL OYEE
-- ============================================================
create table PROJ.EMPLOYEE
(
EMPNUM numeric(5) n ot null
co nstraint CKC_EMPNUM_EMPLOYEE ch eck (
EMPNUM >= '1'),
EMP_EMPNUM n umeric(5) null ,
DIVNUM numeric(5) n ot null,
EMPFNAM ch ar(30) null ,
EMPLNAM ch ar(30) not nu ll,
EMPFUNC char(30) n ull ,
EMPSAL n umeric(8,2) null ,
constraint PK_EMPL OYEE primary key (EMPNUM),
constraint AK_EMP_AK1_EMPLOYEE un ique (EMPLNAM, EMPF NAM, EMPFUNC)
)
/

-- ============================================================
-- Index: CHIEF _F K
-- ============================================================
create ind ex PROJ.CHIEF_FK on PROJ.EMPLOYEE (EMP_EMPNUM asc)
/

-- ============================================================
-- Index: BELONGS_TO_F K2
-- ============================================================
create ind ex PROJ.BEL ONGS_TO_F K2 on PROJ.EMPLOYEE (DIVNUM asc)
/

-- ============================================================
-- Table: PROJECT
-- ============================================================
create table PROJ.PROJECT
(
PRONUM numeric(5) n ot null
co nstraint CKC_PRONUM_PROJECT check (
PRONUM >= '1'),
CUSNUM numeric(5) n ot null,
EMPNUM numeric(5) n ull ,
ACTBEG timestamp null
co nstraint CKC_ACTBEG_PROJECT check (
ACT BEG is nu ll or ((activity.b egind ate < activity.enddate))),
ACTEND timestamp null
co nstraint CKC_ACTEND_PROJECT check (
ACT END is nu ll or ((activity.b egind ate < activity.enddate))),
PRONAME ch ar(30) not nu ll,
PROLABL char(80) n ull ,
constraint PK_PROJECT primary key (PRONUM)
)
/

-- ============================================================
-- Index: SUBCONTRACT_F K
-- ============================================================
create ind ex PROJ.SUBCONTRACT_FK on PROJ.PROJECT (CUSNUM asc)
/

-- ============================================================
-- Index: IS_RESPONSIBL E_FOR_FK
-- ============================================================
create ind ex PROJ.IS_RESPONSIBL E_FOR_FK on PROJ.PROJECT (EMPNUM asc)
/

-- ============================================================
-- Table: TASK
-- ============================================================
create table PROJ.TASK
(
PRONUM numeric(5) n ot null,
TSKNAME char(30) n ot null,
ACTBEG timestamp null
co nstraint CKC_ACTBEG_TASK check (ACT BEG is nu ll or ((activity.b egind ate < activity.enddate))),
ACTEND timestamp null
co nstraint CKC_ACTEND_TASK check (ACT END is nu ll or ((activity.b egind ate < activity.enddate))),
TSKCOST numeric(8,2) not nu ll,
constraint PK_TASK primary key (PRONUM, T SKNAME),
constraint CKT_TASK ch eck (
(task.beg in date < min(p articip ate.begindate)
an d
task.enddate < max(participate.enddate)))
)
/

-- ============================================================
-- Index: BELONGS_TO_F K
-- ============================================================
create ind ex PROJ.BEL ONGS_TO_F K on PROJ. TASK (PRONUM asc)
/

-- ============================================================
-- Table: PART ICIPATE
-- ============================================================
create table PROJ.PARTICIPATE
(
PRONUM numeric(5) n ot null,
TSKNAME char(30) n ot null,
EMPNUM numeric(5) n ot null,
PARBEG timestamp null
co nstraint CKC_PARBEG_PARTICIP check (PARBEG is null o r (((task.begindate < min(participate.beg in date)
an d
task.enddate < max(participate.enddate)) an d
(participate.beg in date < particip ate.enddate)))),
PAREND timestamp null
co nstraint CKC_PAREND_PARTICIP check (PAREND is null o r (((task.begindate < min(participate.beg in date)
an d
task.enddate < max(participate.enddate)) an d
(participate.beg in date < particip ate.enddate)))),
constraint PK_PART ICIPATE primary key (PRONUM, TSKNAME, EMPNUM),
constraint CKT_PARTICIPATE check (
((task.begindate < min(participate.begind ate)
an d
task.enddate < max(participate.enddate)) an d
(participate.beg in date < particip ate.enddate)))
)
/

-- ============================================================
-- Index: WORKS_ON_FK
-- ============================================================
create ind ex PROJ.WORKS_ON_FK on PROJ. PARTICIPAT E (EMPNUM asc)
/

-- ============================================================
-- Index: IS_DONE_BY_F K
-- ============================================================
create ind ex PROJ.IS_DONE_BY_FK on PROJ. PARTICIPATE (PRONUM asc, TSKNAME asc)
/

-- ============================================================
-- Table: MEMBER
-- ============================================================
create table PROJ.MEMBER
(
TEANUM numeric(5) not null,
EMPNUM numeric(5) n ot null,
constraint PK_MEMBER p rimary key (TEANUM, EMPNUM)
)
/

-- ============================================================
-- Index: MEMBER_FK
-- ============================================================
create ind ex PROJ.MEMBER_FK on PROJ.MEMBER (TEANUM asc)
/

-- ============================================================
-- Index: IS_MEMBER_OF_F K
-- ============================================================
create ind ex PROJ.IS_MEMBER_OF_FK on PROJ.MEMBER (EMPNUM asc)
/

-- ============================================================
-- Table: USED
-- ============================================================
create table PROJ.USED
(
MATNUM numeric(5) not nu ll,
EMPNUM numeric(5) n ot null,
constraint PK_USED primary key (MATNUM, EMPNUM)
)
/

-- ============================================================
-- Index: USED_FK
-- ============================================================
create ind ex PROJ.USED_FK on PROJ.USED (MATNUM asc)
/

-- ============================================================
-- Index: USES_FK
-- ============================================================
create ind ex PROJ.USES_FK on PROJ.USED (EMPNUM asc)
/

-- ============================================================
-- Table: COMPOSE
-- ============================================================
create table PROJ.COMPOSE
(
CPD_MATNUM numeric(5) not null,
CPN_MATNUM numeric(5) not null,
constraint PK_COMPOSE p rimary key (CPD_MATNUM, CPN_MATNUM)
)
/

-- ============================================================
-- Index: COMPOSES_FK
-- ============================================================
create ind ex PROJ.COMPOSES_FK o n PROJ.COMPOSE (CPD_MATNUM asc)
/

-- ============================================================
-- Index: COMPOSED_OF_FK
-- ============================================================
create ind ex PROJ.COMPOSED_OF_FK on PROJ.COMPOSE (CPN_MATNUM asc)
/

alter table PROJ.EMPL OYEE


add constrain t FK_EMPLOYEE_CHIEF_EMPLOYEE foreign key (EMP_EMPNUM)
references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.EMPL OYEE


add constrain t FK_EMPLOYEE_BELONGS_T_DIVISION fo reign key (DIVNUM)
references ADMIN.DIVISION (DIVNUM)
/

alter table PROJ.PROJECT


add constrain t FK_PROJECT_SUBCONT RA_CUSTOMER fo reign key (CUSNUM)
references PROJ.CUSTOMER (CUSNUM)
/

alter table PROJ.PROJECT


add constrain t FK_PROJECT_IS_RESPON_EMPLOYEE foreign key (EMPNUM)
references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.TASK


add constrain t FK_T ASK_BELONGS_T_PROJECT foreig n key (PRONUM)
references PROJ.PROJECT (PRONUM)
/

alter table PROJ.PART ICIPATE


add constrain t FK_PARTICIP_WORKS_ON_EMPLOYEE foreign key (EMPNUM)
references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.PART ICIPATE


add constrain t FK_PARTICIP_IS_DONE_B_TASK foreign key (PRONUM, TSKNAME)
references PROJ.T ASK (PRONUM, TSKNAME)
/

alter table PROJ.MEMBER


add constrain t FK_MEMBER_MEMBER_TEAM foreig n key (TEANUM)
references PROJ.T EAM (TEANUM)
/

alter table PROJ.MEMBER


add constrain t FK_MEMBER_IS_MEMBER_EMPLOYEE foreign key (EMPNUM)
references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.USED


add constrain t FK_USED_USED_MATERIAL foreig n key (MATNUM)
references PROJ.MATERIAL (MAT NUM)
/

alter table PROJ.USED


add constrain t FK_USED_USES_EMPL OYEE foreig n key (EMPNUM)
references PROJ.EMPLOYEE (EMPNUM)
/

alter table PROJ.COMPOSE


add constrain t FK_COMPOSE_COMPOSES_MATERIAL fo reign key (CPD_MATNUM)
references PROJ.MATERIAL (MAT NUM)
/

alter table PROJ.COMPOSE


add constrain t FK_COMPOSE_COMPOSED__MAT ERIAL foreign key (CPN_MATNUM)
references PROJ.MATERIAL (MAT NUM)
/

© Minder Chen, 1993~2002 Data Modeling - 133 -


Referential Integrity
alter table PROJ.EMPLOYEE
add constraint FK_EMPLOYEE_CHIEF_EMPLOYEE foreign key (EMP_EMPNUM)
references PROJ.EMPLOYEE (EMPNUM)
/
alter table PROJ.EMPLOYEE
add constraint FK_EMPLOYEE_BELONGS_T_DIVISION foreign key (DIVNUM)
references ADMIN.DIVISION (DIVNUM)
/
alter table PROJ.PROJECT
add constraint FK_PROJECT_SUBCONTRA_CUSTOMER foreign key (CUSNUM)
references PROJ.CUSTOMER (CUSNUM)
/
alter table PROJ.PROJECT
add constraint FK_PROJECT_IS_RESPON_EMPLOYEE foreign key (EMPNUM)
references PROJ.EMPLOYEE (EMPNUM)
/
alter table PROJ.TASK
add constraint FK_TASK_BELONGS_T_PROJECT foreign key (PRONUM)
references PROJ.PROJECT (PRONUM)
/

© Minder Chen, 1993~2002 Data Modeling - 134 -


Physical Database Design Activities
Define Tables & Columns

Define Keys

Identify Critical Transactions

Add Columns: Manipulate Tables: Add Tables:


• Redundant columns • Collapse tables • Derived data
• Derived data columns • Supertypes & subtypes tables

Handle Integrity Issues:


• Row uniqueness & Domain restrictions
• Referential integrity & Generate sequence numbers
• Derived and redundant data

Controlling Access
Source: Gillete, Rob, etc., Physical
Database Design for Sybase SQL
Server, Prentice Hall, 1995. Manage Objects:
• Sizes
• Placement
© Minder Chen, 1993~2002 Data Modeling - 135 -
Architecture of Data Warehouse
Data Warehouse End User
Access and
Corporate Metadata OLAP front-
Operational Info. Directory Summarized end Tools
Database

Data Derived
Replication
& Cleansing
Informational
Detailed Database
• EIS
Past Projecte • DSS
Current d • Report Writers
• Spreadsheets
• Data extraction
Data • Data filtering
Bridging/ • Table joining
Transformati • Translation
on • Re-Formatting
© Minder Chen, 1993~2002 Data Modeling - 136 -
Operational vs. Informational Databases
Characteristics Operational Database Informational Database
Data Archival data, summarized
Current value
Content data, calculated data

Data Subject areas across


Application by application
organizations enterprise

Data
Dynamic Static until refreshed
Volatility

Data Fully normalized for Joined views suitable for


normalization transaction processing business analysis

Access
High Low - Medium
frequency

Data Updated on a record and field Access only;


Update basis no direct update

Highly structured Highly unstructured, heuristic


Usage
transaction processing or analytical processing

Response
Sub-second to 2-3 seconds Several seconds to minutes
Time
© Minder Chen, 1993~2002 Data Modeling - 137 -
Excel Pivot
Table Wizard

Relational View

Multidimensional View
© Minder Chen, 1993~2002 Data Modeling - 138 -
Dimensional Model
Product Sale Market Region
• Key • Key
• Name • Description
• Description Product Key • District
• Size Market Key • Region
• Price Promotion Key • Demographics
Time Key
• Dollars
• Units Time
Promotion • Price • Key
• Key • Cost • Weekday
• Description • Holiday
• Discount • Fiscal
• Media

Region
Product

© Minder Chen, 1993~2002 Time Data Modeling - 139 -


Modeling a Data Warehouse
• MDM: Multidimensional Modeling
– A logical model of business information
– Easy to understand
– Applicable to relational and multidimensional
databases
– Extremely useful for analysis
– A tried-and-tested techniques
• Why?
– An OLTP (On-Line Transaction Process) design of an
order processing system may have dozens or
hundreds of tables. It becomes difficult for
business managers to understand the design in
order to analyze the data.

© Minder Chen, 1993~2002 Data Modeling - 140 -


Approach
• Designed around numeric data:
– values
– counts
– weights
– occurrence
• An example of a MDM problem statement:
– "What is my profitability by customer over
time, by organization?"

© Minder Chen, 1993~2002 Data Modeling - 141 -


The Classic Star Schema
Each dimension is described by its own table
and the facts are arranged in a single large
Market Dimension table with a concatenated primary key
comprises the individual keys of each
Market ID
dimension.
description
region
state Fact Table Period Dimension
district
Market ID Period ID
city
Product ID description
Period ID year
dollars quarter
Product Dimension units month
price current flag
Product ID
resolution
description
sequence
supplier ID
brand
color
size
© Minder Chen, 1993~2002 Data Modeling - 142 -
Snow Flake Structure
Brand
Brand identifier <pk> int
Brand name char(30) Customer
Customer identifier <pk> int
Brand identifier = Brand identifier Customer name char(30)
Customer address char(80)
Product Customer activity char(80)
Customer phone number char(12)
Product identifier <pk> int
Customer fax number char(12)
Brand identifier <fk> int
Product description char(80) Year
Product category char(30) Year identifier <pk> int
Product unit price int Year name char(30)
Customer identifier = Customer identifier
Year identifier = Year identifier
Product identifier = Product identifier
Quarter
Sale Quarter identifier <pk> int
Time identifier <fk> int Year identifier <fk> int
Customer identifier <fk> int Quarter name char(10)
Country Store identifier <fk> int
Country identifier <pk> int Product identifier <fk> int Quarter identifier = Quarter identifier
Country name char(80) Sale total real
Sale revenu real Month
Month identifier <pk> int
Country identifier = Country identifier Quarter identifier <fk> int
Month name char(10)
Region
Month identifier = Month identifier
Region identifier <pk> int Store identifier = Store identifier
Country identifier <fk> int Week
Time identifier = Time identifier
Region name char(30) Week identifier <pk> int
Month identifier <fk> int
Week name char(30)
Region identifier = Region identifier
Week number in year int
Store
Store identifier <pk> int Week identifier = Week identifier
Region identifier <fk> int
Store name char(50) Day
Store address char(80) Week identifier <fk> int
Store manager char(30) Time identifier <pk> int
Store phone number char(20) Date datetime
Store FAX number char(20) Day of week char(30)
Store financial services type char(10) Day number in month int
Store photo services type char(10)

© Minder Chen, 1993~2002 Data Modeling - 143 -


Steps to Build MDM
• Pick a business subject area
– Weekly sales reports, monthly financial statements,
insurance claim costs.
• Asking six fundamental questions:
– What business process is being modeled?
– At what level of detail (granularity) is "active"
analysis conducted?
– What do the measures have in common (the
"dimensions")?
– What are the dimensions' attributes?
– Are the attributes stable or variable over time and is
their "cardinality" bounded or unbounded?

© Minder Chen, 1993~2002 Data Modeling - 144 -


Issues
• Active analysis
– Mechanical manipulation: Pivoting, Drilling
down, Graphing
– Agent-based manipulation: Alert reporting,
exception reporting
– Workflow manipulation: Publishing,
distributing documents.
• Cardinality means "how many"
– A relational database usually has "unbounded"
cardinality
– A multidimensional database usually has
"bounded" cardinality. Complete
reorganization is needed to change cardinality.

© Minder Chen, 1993~2002 Data Modeling - 145 -


dept_id = parent_id A Data Model for an Electronic Commerce Application
dept product_attribute
dept_id int dept_id = dept_id pfid = pfid pfid varchar(30)
parent_id int attribute_id tinyint
name varchar(255) attribute_index tinyint
description text attribute_value varchar(20)
date_changed datetime product_family
pfid varchar(30)
dept_id int pfid = pfid promo_cross
manufacturer_id int pfid varchar(30)
pfid = pfid name varchar(255) related_pfid varchar(30)
short_description varchar(255) description varchar(255)
long_description text pfid = pfid
product_variant image_filename varchar(255)
sku int intro_date datetime
date_changed datetime pfid = pfid promo_upsell
pfid varchar(30) list_price int pfid varchar(30)
attribute0 tinyint monogramable tinyint related_pfid varchar(30)
attribute1 tinyint description varchar(255)
attribute2 tinyint
attribute3 tinyint
attribute4 tinyint
receipt_item
shopper pfid varchar(30)
sku = sku sku int
shopper_id char(32)
created datetime order_id = order_id order_id char(26)
name varchar(235) row_id int
password varchar(20) quantity int
street varchar(50) adjusted_price int
city varchar(50)shopper_id = shopper_id
state varchar(30) receipt
zip varchar(15) promo_price
country varchar(20) order_id char(26)
phone varchar(16) shopper_id char(32) promo_name varchar(255)
email varchar(50) total int promo_type int
status tinyint promo_description text
date_entered datetime promo_rank int
date_changed datetime active int
marshalled_receipt image date_start datetime
date_end datetime
shopper_id = shopper_id shopper_all int
shopper_column varchar(64)
shopper_op varchar(2)
shopper_value varchar(64)
basket cond_all int
shopper_id char(32) cond_column varchar(64)
date_changed datetime cond_op varchar(2)
marshalled_order image cond_value varchar(64)
cond_basis char(1)
cond_min int
award_all int
award_column varchar(64)
award_op varchar(2)
award_value varchar(64)
award_max int
disjoint_cond_award int
disc_type char(1)
© Minder Chen, 1993~2002 disc_value realData Modeling - 146 -
Attribute 0 of pfid 14 is size and
the attribute value 1 is Grande
and 2 is Tall and 3 is Short

© Minder Chen, 1993~2002 Data Modeling - 147 -


Web-based Build-To-Order Application

© Minder Chen, 1993~2002 Data Modeling - 148 -


Data Model for Build-To-Order Application

© Minder Chen, 1993~2002 Data Modeling - 149 -


http://www.oracle.com/tools/jdeveloper/documents/jsptwp/index.html?content.html

Auction Web
Site's Data Model
© Minder Chen, 1993~2002 Data Modeling - 150 -