
Chapter 1 – Database Systems

Engin Calisir University of Texas at Dallas


Learning Objectives
• After completing this chapter, you will be able to:
• Define the difference between data and information
• Describe what a database is, various types, and why they are valuable assets
for decision making
• Explain the importance of database design
• See how modern databases evolved from file systems
• Understand flaws in file system data management
• Outline the main components of the database system
• Describe the main functions of a database management system (DBMS)
Timeline

Source: Ben Stopford


Chapter 5 – Advanced Data Modeling
Learning Objectives
• After completing this chapter, you will be able to:
• Describe the main extended entity relationship (EER) model constructs and
how they are represented in ERDs and EERDs
• Use entity clusters to represent multiple entities and relationships in an entity
relationship diagram (ERD)
• Describe the characteristics of good primary keys and how to select them
• Apply flexible solutions for special data-modeling cases

The Extended Entity Relationship Model (EERM)
• Also called the enhanced entity relationship model
• Result of adding more semantic constructs to the original entity relationship (ER) model, such as inheritance, clustering, and supertype-subtype relationships
• EER diagrams (EERDs) use the EER model

Entity Supertypes and Subtypes
• Entity supertype (parent, e.g., EMPLOYEE)
• Generic entity type related to one or more entity subtypes
• Contains common characteristics
• Entity subtype (child, e.g., PILOT)
• Contains unique characteristics of each entity subtype
• Criteria to determine usage (both criteria must be met)
• There must be different, identifiable kinds of the entity in the user’s environment
• The different kinds of instances should each have one or more attributes that are unique to that kind of instance

Inheritance
• Enables an entity subtype to inherit attributes and relationships of the supertype
• All entity subtypes inherit their primary key attribute from their supertype
• At the implementation level, supertype and its subtype(s) maintain a 1:1 relationship
• Entity subtypes inherit all relationships in which the supertype entity participates
• Lower-level subtypes inherit all attributes and relationships from their upper-level supertypes
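
One way to picture the 1:1 supertype/subtype implementation is the sketch below. It uses Python's built-in sqlite3 module; the table and column names (employee, pilot, emp_num, emp_type, pil_license) and the sample row are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

conn.executescript("""
CREATE TABLE employee (            -- supertype: common attributes
    emp_num   INTEGER PRIMARY KEY,
    emp_lname TEXT NOT NULL,
    emp_type  TEXT                 -- hypothetical subtype discriminator
);
CREATE TABLE pilot (               -- subtype: unique attributes only
    emp_num     INTEGER PRIMARY KEY REFERENCES employee(emp_num),
    pil_license TEXT NOT NULL
);
""")

conn.execute("INSERT INTO employee VALUES (100, 'Smith', 'P')")
conn.execute("INSERT INTO pilot VALUES (100, 'ATP')")

# The subtype "inherits" the supertype attributes through the shared key.
pilot_rows = conn.execute("""
    SELECT e.emp_num, e.emp_lname, p.pil_license
    FROM employee e JOIN pilot p ON p.emp_num = e.emp_num
""").fetchall()
print(pilot_rows)
```

Because pilot.emp_num is both the primary key and a foreign key, each employee row can match at most one pilot row, which is the 1:1 relationship described above.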

Specialization Hierarchy (1 of 2)
• Entity supertypes and subtypes are organized in a specialization hierarchy
• Depicts arrangement of higher-level entity supertypes and lower-level entity
subtypes
• Relationships are described in terms of “is-a” relationships (1:1)
• Subtype exists within the context of a supertype
• Every subtype has one supertype to which it is directly related
• Supertype can have many subtypes

A specialization hierarchy provides the means to:
• Support attribute inheritance
• Define a special supertype attribute known as the subtype discriminator
• Define disjoint or overlapping constraints and complete or partial constraints

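Specialization Hierarchy (2 of 2)

Subtype Discriminator / Completeness Constraint
• The subtype discriminator is simply an attribute of the supertype entity; its value determines which subtype each supertype occurrence is related to
• Default comparison condition is the equality comparison
• In some situations the subtype discriminator is not necessarily based on an equality comparison
• Partial: the discriminator can be null / Complete: the discriminator cannot be null
• Disjoint: each supertype occurrence belongs to only one subtype / Overlapping: an occurrence can belong to more than one subtype

[Figure: the four constraint combinations, with "d" marking disjoint and "o" marking overlapping subtypes: disjoint/partial, overlapping/partial, disjoint/complete, overlapping/complete]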
Completeness Constraint
• Specifies whether each supertype occurrence must also be a member of at
least one subtype
• Partial completeness: not every supertype occurrence is a member of a
subtype
• Total completeness: every supertype occurrence must be a member of at
least one subtype

Example:
• Disjoint (d): PERSON with subtypes MALE and FEMALE
• Overlapping (o): PERSON with subtypes STAFF and FACULTY

Specialization and Generalization
• Specialization
• Top-down process
• Identifies lower-level, more specific entity
subtypes from a higher-level entity supertype
• Based on grouping unique characteristics and
relationships of the subtypes
• Generalization
• Bottom-up process
• Identifies a higher-level, more generic entity
supertype from lower-level entity subtypes
• Based on grouping common characteristics and
relationships of the subtypes

Entity Clustering
• “Virtual” entity type used to represent multiple entities and relationships
in an ERD
• Formed by combining multiple interrelated entities into a single, abstract entity
object
• More conceptual: it simplifies the diagram, so the cluster is shown without attributes, relationships, PKs, or FKs

Entity Integrity: Selecting Primary Keys
• Primary keys: single attribute or a combination of attributes
• Uniquely identifies each entity instance
• Guarantees entity integrity
• Works with foreign keys to implement relationships
Natural Keys and Primary Keys
• Natural key or natural identifier:
• A real-world identifier that uniquely distinguishes real-world objects
• A strong candidate for the primary key when one exists

Primary Key Guidelines
• Unique and not null
• No change over time: must be static (unlike, for example, a last name)
• Preferably single-attribute: desirable but not required
• Preferably numeric: easier to sort and increment
• Security-compliant: not composed of attributes that pose a security risk (e.g., SSN)
• Non-intelligent: should not carry embedded semantic meaning; its only job is to
uniquely identify each entity instance

When to Use Composite Primary Keys (CPK)
They are typically seen in associative entity tables or weak entity tables

• A weak entity table is a table that exists because another table exists
• An associative entity table is simply a table that is used for a many-to-many relationship between two or more other tables
• Each primary key combination is allowed only once in the M:N relationship
• Strong identifying relationship with the parent entity
• Identifiers of weak entities
• Represents a real-world object that is existence-dependent on another real-world object
• Represented in the data model as two separate entities in a strong identifying relationship (multivalued)
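
A composite primary key on an associative entity can be sketched as follows, using Python's built-in sqlite3 module. The STUDENT, CLASS, and ENROLL tables and their columns are hypothetical examples, not taken from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (stu_num INTEGER PRIMARY KEY, stu_name TEXT NOT NULL);
CREATE TABLE class   (class_code INTEGER PRIMARY KEY, class_title TEXT NOT NULL);

-- Associative (bridge) entity for the M:N relationship.
-- Its primary key is the combination of the two parent keys.
CREATE TABLE enroll (
    stu_num      INTEGER NOT NULL REFERENCES student(stu_num),
    class_code   INTEGER NOT NULL REFERENCES class(class_code),
    enroll_grade TEXT,
    PRIMARY KEY (stu_num, class_code)
);

INSERT INTO student VALUES (1, 'Ana');
INSERT INTO class   VALUES (10, 'Database Design');
INSERT INTO enroll  VALUES (1, 10, NULL);
""")

# The same (student, class) combination is allowed only once.
try:
    conn.execute("INSERT INTO enroll VALUES (1, 10, 'A')")
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False

print(duplicate_allowed)
```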
When to Use Surrogate Primary Keys
• Useful when there is no natural key
• Helpful if selected candidate key has embedded semantic contents or
is too long
• Must have “unique identification” and “not null” constraints
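
A minimal sketch of a surrogate key, assuming a hypothetical room-booking table whose natural candidate key (room, date, time slot) is long. The surrogate becomes the primary key, while the candidate key keeps its uniqueness and not-null constraints.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE booking (
    booking_id   INTEGER PRIMARY KEY,   -- surrogate key, system-generated
    room_code    TEXT NOT NULL,         -- the three columns below form the
    booking_date TEXT NOT NULL,         -- natural candidate key
    time_slot    TEXT NOT NULL,
    UNIQUE (room_code, booking_date, time_slot)
);
""")

cur = conn.execute(
    "INSERT INTO booking (room_code, booking_date, time_slot) VALUES (?, ?, ?)",
    ("R101", "2024-09-01", "09:00"),
)
first_id = cur.lastrowid  # value assigned by the DBMS, no business meaning

# The UNIQUE constraint still blocks a double booking of the same slot.
try:
    conn.execute(
        "INSERT INTO booking (room_code, booking_date, time_slot) VALUES (?, ?, ?)",
        ("R101", "2024-09-01", "09:00"),
    )
    double_booked = True
except sqlite3.IntegrityError:
    double_booked = False
```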

Design Case 1: Implementing 1:1 Relationships
• Foreign keys work with primary keys to properly implement relationships
in the relational model
• Place the primary key of the parent entity on the dependent entity as a foreign key
• One side is mandatory and the other side is optional: place the PK of the entity on the mandatory side in the entity on the optional side as a FK, and make the FK mandatory
• Both sides are optional or both are mandatory: select the FK that causes the fewest nulls, or the simpler option (follow the primary key guidelines as much as you can)
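
The mandatory/optional case can be sketched as below, assuming a hypothetical rule that every department must have exactly one professor as chair while a professor does not have to chair anything. The table and column names are illustrative only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE professor (
    emp_num   INTEGER PRIMARY KEY,
    emp_lname TEXT NOT NULL
);
-- PROFESSOR is mandatory to DEPARTMENT, so its PK is placed in DEPARTMENT.
-- NOT NULL makes the FK mandatory; UNIQUE keeps the relationship 1:1.
CREATE TABLE department (
    dept_code     TEXT PRIMARY KEY,
    dept_name     TEXT NOT NULL,
    chair_emp_num INTEGER NOT NULL UNIQUE REFERENCES professor(emp_num)
);
INSERT INTO professor  VALUES (1, 'Lee'), (2, 'Khan');
INSERT INTO department VALUES ('MIS', 'Information Systems', 1);
""")

# A second department cannot reuse the same chair.
try:
    conn.execute("INSERT INTO department VALUES ('ACCT', 'Accounting', 1)")
    chair_reused = True
except sqlite3.IntegrityError:
    chair_reused = False
```

Placing the FK in DEPARTMENT avoids nulls entirely; placing it in PROFESSOR instead would leave it null for every professor who is not a chair.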

Design Case 2: Maintaining History of Time-Variant Data
• Time-variant data: data whose values
change over time and for which a
history of the data changes must be
retained
• Requires creating a new entity in a 1:M
relationship with the original entity
• New entity contains the new value, date of
the change, and any other pertinent
attributes
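
As an illustration of the 1:M history entity, the sketch below keeps a salary history for an employee. The salary example, table names, and values are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_num   INTEGER PRIMARY KEY,
    emp_lname TEXT NOT NULL
);
-- New entity on the "many" side: one row per change.
CREATE TABLE salary_hist (
    emp_num        INTEGER NOT NULL REFERENCES employee(emp_num),
    sal_start_date TEXT NOT NULL,      -- date of the change (ISO format)
    sal_amount     REAL NOT NULL,      -- the new value
    PRIMARY KEY (emp_num, sal_start_date)
);
INSERT INTO employee VALUES (100, 'Smith');
INSERT INTO salary_hist VALUES
    (100, '2022-01-01', 65000),
    (100, '2023-07-01', 72000);
""")

# Current value = most recent history row; older rows are retained.
current_salary = conn.execute("""
    SELECT sal_amount FROM salary_hist
    WHERE emp_num = 100
    ORDER BY sal_start_date DESC
    LIMIT 1
""").fetchone()[0]

history_rows = conn.execute(
    "SELECT COUNT(*) FROM salary_hist WHERE emp_num = 100"
).fetchone()[0]
```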

Design Case 3: Fan Traps
• Design trap: occurs when a relationship is improperly or incompletely
identified
• Fan trap
• Loopbacks
• Chasm trap
• Fan trap: occurs when one entity is in two 1:M relationships to other entities, producing an association among those other entities that is not expressed in the model

[Figures: design with a fan trap; fan trap solution]

Design Case 3: Fan Traps (example from book)

Design Case 4: Redundant Relationships
• Occur when there are multiple relationship paths between related entities
• Must remain consistent across the model
• Help simplify the design

The relationship between PLAYER and DIVISION creates redundancy and should be deleted.

Questions?

SQL Databases

• Global and everywhere
• Persistent and shareable in a secure way
• Reusable for different needs
NoSQL Databases
Magic Quadrant: Data Management Solution
Today’s Database Trend
Data and Metadata
• Data: raw facts of interest to end user
• Names of customers, SSNs, addresses, songs, movies, etc.

• Metadata: “data about data”
• Describes data characteristics and relationships among data, such as:
• Means of creation of the data
• Purpose of the data
• Time and date of creation
• Creator or author of the data
• Location on a computer network where the data were created
• Standards used
• Examples: tab names of customers, name of artist or album, genre of movie, year of movie,
file size of movie, download date and time, etc.
Data versus Information
• Data consists of raw facts
• Not yet processed to reveal meaning to the end user; only numbers, text, and
symbols
• Building blocks of information

• Information results from processing raw data to reveal meaning
• Requires context
• Vital for knowledge
• Should be accurate, relevant, and timely
• Example:
• Raw data: 1, 5, -9, 11, -14, 22, 35
• Information: the working TV channels in the 75082 ZIP code area are 1, 5, 11, 22, 35
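
The step from data to information can be sketched in a few lines. The slide does not say how the channel list was derived; the rule used here (negative readings mark channels that are not working) is an assumption made purely for illustration.

```python
# Raw data: numbers with no context.
raw_data = [1, 5, -9, 11, -14, 22, 35]

# Assumed processing rule (not stated in the slide): a negative value
# marks a channel that is not working in this area.
working_channels = [value for value in raw_data if value > 0]

# Adding context turns the processed data into information.
information = f"Working TV channels in ZIP code 75082: {working_channels}"
print(information)
```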
DIKW Model
What is a Database?
• A database is a collection of integrated data or records
• The term “database” can refer to any collection of data items
• A record is a representation of a physical or conceptual entity

• A database consists of both data and metadata
• Metadata is information that describes the data structure inside the database
• The database stores metadata in the data dictionary, which describes the columns,
tables, and indexes

• A database contains a description of its own structure
• It integrates both the data and the relationships among them
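
To see a database describing its own structure, the sketch below reads SQLite's catalog, which plays the role of the data dictionary in that product. The customer table and index are hypothetical; sqlite_master and PRAGMA table_info are SQLite-specific, and other DBMSs expose their catalogs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (
    cus_code  INTEGER PRIMARY KEY,
    cus_lname TEXT NOT NULL
);
CREATE INDEX idx_cus_lname ON customer (cus_lname);
""")

# Metadata about tables and indexes, stored by the database itself.
objects = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY type, name"
).fetchall()

# Metadata about columns: each row is (cid, name, type, notnull, default, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(customer)")]

print(objects)
print(columns)
```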
Types of Databases (1 of 5)
• Databases can be classified to reflect the degree to which the data is
structured
• Unstructured data exists in its original (raw) state
• Structured data results from formatting
• Structure is applied based on type of processing to be performed
• Semistructured data: processed to some extent

• Business intelligence:
• Captures and processes business data to generate information that supports
decision making
Types of Databases (2 of 5)
• Classification by user:
• Single-user database: supports one user at a time
• Desktop database: single-user database on a personal computer
• Multiuser database: supports multiple users at the same time
• Workgroup database: supports a small number of users or a specific department
• Enterprise database: supports many users across many departments

• Classification by location
• Centralized database: data located at a single site
• Distributed database: data distributed across different sites
• Cloud database: created and maintained using cloud data services that
provide defined performance measures for the database
Types of Databases (3 of 5)

• Classification by data type
• General-purpose database: contains a wide variety of data used in multiple
disciplines
• Discipline-specific database: contains data focused on specific subject areas
• Operational database: designed to support a company’s day-to-day
operations
Types of Databases (4 of 5)
• Databases are commonly grouped into four major structural types:
• Hierarchical databases.
• Network databases.
• Relational databases.
• Object-oriented databases
Types of Databases (5 of 5)

• Analytical database: stores historical data and business metrics used
exclusively for tactical or strategic decision making
• Data warehouse: stores data in a format optimized for decision support
• Online analytical processing (OLAP): tools for retrieving, processing, and
modeling data from the data warehouse (put simply, for data mining)
• Trend analysis
• Time series etc.
OLAP vs OLTP
• We can divide IT systems into transactional (OLTP) and analytical (OLAP). In
general we can assume that OLTP systems provide source data to data
warehouses, whereas OLAP systems help to analyze it.
OLAP vs OLTP

• OLAP (Online Analytical Processing) is characterized by a relatively low volume of
transactions. Queries are often very complex and involve aggregations. For OLAP systems, response
time is the effectiveness measure. OLAP applications are widely used with data mining techniques. An
OLAP database holds aggregated, historical data stored in multidimensional schemas.

• OLTP (Online Transaction Processing) is characterized by a large number of short
online transactions (INSERT, UPDATE, DELETE). The main emphasis for OLTP systems is on very
fast query processing, maintaining data integrity in multi-access environments, and effectiveness
measured by the number of transactions per second. An OLTP database holds detailed, current data,
and the schema used to store transactional data is the entity model (usually 3NF).
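
The contrast between the two workloads can be sketched with one small table. This is only an illustration: a real OLAP system would normally query a separate data warehouse, and the sale table and its rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE sale (
        sale_id   INTEGER PRIMARY KEY,
        region    TEXT NOT NULL,
        sale_year INTEGER NOT NULL,
        amount    REAL NOT NULL
    )
""")

# OLTP style: many short transactions, each touching a few rows.
new_sales = [
    ("North", 2023, 100.0),
    ("North", 2023, 50.0),
    ("South", 2023, 75.0),
    ("North", 2024, 20.0),
]
for sale in new_sales:
    with conn:  # commits after each insert, rolls back on error
        conn.execute(
            "INSERT INTO sale (region, sale_year, amount) VALUES (?, ?, ?)", sale
        )

# OLAP style: one complex, read-only query that aggregates history.
summary = conn.execute("""
    SELECT region, sale_year, SUM(amount) AS total, COUNT(*) AS num_sales
    FROM sale
    GROUP BY region, sale_year
    ORDER BY region, sale_year
""").fetchall()
print(summary)
```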
Databases Performance
• Some other newer concepts for better performance
• Replication
• Read/reporting replicas
• Sharding
• Clustering
• Memory caching
• Oracle calls it “in-memory caching”
• AWS offers ElastiCache, which uses the Memcached and Redis in-memory data
stores
Problems with File System Data Processing
• Problems with file systems challenge the types of
information that can be created from data as well as
information accuracy

• Lengthy development times
• Difficulty of getting quick answers
• Complex system administration
• Lack of security and limited data sharing
• Extensive programming
Structural and Data Dependence
• Structural dependence
• Access to a file is dependent on its own structure
• All file system programs must be modified to conform to a new file structure
• Structural independence
• File structure is changed without affecting the application’s ability to access the data
• Data dependence
• Data access changes when data storage characteristics change
• Data independence
• Data storage characteristics are changed without affecting the program’s ability to access the data

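The practical significance of data dependence is the difference between the logical data format (how a person views the data) and the physical data format (how the computer must work with the data).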
Data Redundancy
• Unnecessarily storing the same data at different places
• Islands of information (i.e., scattered data locations)
• Increases the probability of having different versions of the same data

• Possible results of uncontrolled data redundancy
• Poor data security
• Data inconsistency
• Data-entry errors
• Data integrity problems

How about database redundancy?


Data Consistency
• Data inconsistency exists when different and conflicting versions of the same data appear
in different places.

• How about database consistency?
• Eventual consistency
• Strong consistency
Database Systems
• Logically related data stored in a single logical data repository
• Physically distributed among multiple storage facilities
• DBMS eliminates most of file system’s data inconsistency, data anomaly, data
dependence, and structural dependence problems
• Current generation DBMS software
• Stores data structures, relationships between structures, and access paths
• Defines, stores, and manages all access paths and components
DBMS
• Database management system (DBMS)
• Collection of programs that manages the database structure and controls access to the data
• Main parts: databases, security, data dictionary, storage engine, query processor
DBMS advantages

• Improve data sharing
• Improve data security
• Better data integration
• Minimize data inconsistency
• Improve data access
• Improve end-user productivity
• Improve decision making
The Database System Environment
• Database system: organization of components that
define and regulate the collection, storage,
management, and use of data within a database
environment

• Hardware
• Software
• People
• Procedures
• Data
DBMS Functions (1 of 2)
• Data dictionary management
• Data dictionary: stores definitions of data elements and their relationships
• Data storage management
• Performance tuning ensures efficient performance
• Data transformation and presentation
• Data is formatted to conform to logical expectations
• Security management
• Enforces user security and data privacy
• Multiuser access control
• Sophisticated algorithms ensure that multiple users can access the
database concurrently without compromising its integrity
DBMS Functions (2 of 2)
• Backup and recovery management
• Enables recovery of the database after a failure
• Data integrity management
• Minimizes redundancy and maximizes consistency
• Database access languages and application programming interfaces
• Query language: lets the user specify what must be done without having to
specify how
• Structured Query Language (SQL): de facto query language and data access
standard supported by the majority of DBMS vendors
• Database communication interfaces
• Accept end-user requests via multiple, different network environments
Disadvantages of database systems

• Increased costs
• Management complexity
• Maintaining currency
• Vendor dependence
• Frequent upgrade/replacement cycles
Why Database Design Is Important
• Focuses on design of database structure that will be used to store and
manage end-user data
• Well-designed database: facilitates data management and generates accurate
and valuable information
• Poorly designed database: causes difficult-to-trace errors that may lead to
poor decision making
Data Anomalies (1 of 3)
• Anomalies can occur in poorly planned or un-normalized databases

Insertion anomalies (occur when some attributes cannot be inserted without the presence of other attributes)

• Example: Suppose our university has approved a new course called ITSS5300: SQL & PL/SQL.
Can this information about the new course be entered (inserted) into the table COURSE in its
present form?

COURSE#   SECTION#   C_NAME
ITSS564   072        Database Design
ITSS564   073        Database Design
MIS570    072        Oracle Forms
ITSS564   074        Database Design
Data Anomalies (2 of 3)
Deletion anomalies (occur when certain attributes are lost because other attributes are deleted)

• Example: Suppose not enough students enrolled in the course MIS570, which had only one
section, 072. So the school decided to drop this section and delete section# 072 for
MIS570 from the table COURSE. But then, what other relevant information also got deleted in the
process?

COURSE#   SECTION#   C_NAME
ITSS564   072        Database Design
ITSS564   073        Database Design
MIS570    072        Oracle Forms
ITSS564   074        Database Design
Data Anomalies (3 of 3)
Update anomalies (occur when one or more instances of duplicated data are updated, but
not all)
• Example: Suppose the course name (C_NAME) for ITSS564 is changed to Database
Management. How many times do you have to make this change in the COURSE table in its
current form?

COURSE#   SECTION#   C_NAME
ITSS564   072        Database Design
ITSS564   073        Database Design
MIS570    072        Oracle Forms
ITSS564   074        Database Design
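
One possible way to avoid all three anomalies is to split the table so that each course name is stored once. The sketch below is an illustration rather than the slides' own solution; it assumes the original table is keyed on (COURSE#, SECTION#), and the names course, section, course_no, and section_no are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (
    course_no TEXT PRIMARY KEY,
    c_name    TEXT NOT NULL
);
CREATE TABLE section (
    course_no  TEXT NOT NULL REFERENCES course(course_no),
    section_no TEXT NOT NULL,
    PRIMARY KEY (course_no, section_no)
);
INSERT INTO course VALUES
    ('ITSS564', 'Database Design'),
    ('MIS570',  'Oracle Forms');
INSERT INTO section VALUES
    ('ITSS564', '072'), ('ITSS564', '073'),
    ('MIS570',  '072'), ('ITSS564', '074');
""")

# Insertion: a new course can be stored before it has any section.
conn.execute("INSERT INTO course VALUES ('ITSS5300', 'SQL & PL/SQL')")

# Deletion: dropping the only MIS570 section keeps the course name.
conn.execute("DELETE FROM section WHERE course_no = 'MIS570' AND section_no = '072'")
mis570_name = conn.execute(
    "SELECT c_name FROM course WHERE course_no = 'MIS570'"
).fetchone()[0]

# Update: the rename touches exactly one row instead of three.
rows_changed = conn.execute(
    "UPDATE course SET c_name = 'Database Management' WHERE course_no = 'ITSS564'"
).rowcount
```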
Preparing for Your Database Professional Career

TABLE 1.3 DATABASE CAREER OPPORTUNITIES

JOB TITLE | DESCRIPTION | SAMPLE SKILLS REQUIRED
Database Developer | Create and maintain database-based applications | Programming, database fundamentals, SQL
Database Designer | Design and maintain databases | Systems design, database design, SQL
Database Administrator | Manage and maintain DBMS and databases | Database fundamentals, SQL, vendor courses
Database Analyst | Develop databases for decision support reporting | SQL, query optimization, data warehouses
Database Architect | Design and implementation of database environments (conceptual, logical, and physical) | DBMS fundamentals, data modeling, SQL, hardware knowledge, etc.
Database Consultant | Help companies leverage database technologies to improve business processes and achieve specific goals | Database fundamentals, data modeling, database design, SQL, DBMS, hardware, vendor-specific technologies, etc.
Database Security Officer | Implement security policies for data administration | DBMS fundamentals, database administration, SQL, data security technologies, etc.
Cloud Computing Data Architect | Design and implement the infrastructure for next-generation cloud database systems | Internet technologies, cloud storage technologies, data security, performance tuning, large databases, etc.
Data Scientist | Analyze large amounts of varied data to generate insights, relationships, and predictable behaviors | Data analysis, statistics, advanced mathematics, SQL, programming, data mining, machine learning, data visualization
Questions?
Chapter 2 – Data Models
Chapter 9 – Database Design
Learning Objectives
• After completing this chapter, you will be able to:
• Discuss data modeling and why data models are important
• Describe the basic data-modeling building blocks
• Define what business rules are and how they influence database design
• Understand how the major data models evolved
• List emerging alternative data models and the needs they fulfill
• Explain how data models can be classified by their level of abstraction
Data Modeling and Data Models

• Data modeling is a process
• Data collection, business rules, and requirements collection
• Conceptual data model
• Logical data model
• Physical data model
• Implementation
• Maintenance

• A data model is the schema or blueprint of the solution (the outcome of data modeling)
• A simple graphical representation
The Importance of Data Models

• Facilitates communication
• Gives various views of the database
• Organizes data for various users
• Provides an abstraction for the creation of a good database
Business Rules
• Brief, precise, and unambiguous description of a policy, procedure, or
principle
• Create and enforce actions within that organization’s environment
• Support proper identification of entities, attributes, relationships, and constraints
Discovering Business Rules (1 of 2)
• Sources of business rules
• Company managers
• Policy makers
• Department managers
• Written documentation
• Direct interviews with end users and customers

• How about government regulations?

Discovering Business Rules (2 of 2)
• Reasons for identifying and documenting business rules
• Standardize company’s view of data
• Serve as a communication tool between users and designers
• Assist designers
• Understand the nature, role, scope of data, and business processes
• Develop appropriate relationship participation rules and constraints
• Create an accurate data model
Translating Business Rules into Data Model Components
• Business rules set the stage for the proper identification of entities, attributes,
relationships, and constraints
• Nouns translate into entities
• Verbs translate into relationships among entities
• Relationships are bidirectional
• Questions to identify the relationship type
• How many instances of B are related to one instance of A?
• How many instances of A are related to one instance of B?

• Each applicant can submit one or more applications (one application per year for multiple
years).
• Each application is submitted by only one applicant.
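
The two applicant/application rules above translate into a 1:M relationship, which could be sketched as follows. The table and column names are hypothetical, and the "one application per year" part is expressed here as a uniqueness constraint, which is one possible reading of the rule.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Noun "applicant" becomes an entity.
CREATE TABLE applicant (
    applicant_id INTEGER PRIMARY KEY,
    app_name     TEXT NOT NULL
);
-- Noun "application" becomes an entity; verb "submits" becomes the FK.
CREATE TABLE application (
    application_id INTEGER PRIMARY KEY,
    applicant_id   INTEGER NOT NULL REFERENCES applicant(applicant_id),
    app_year       INTEGER NOT NULL,
    UNIQUE (applicant_id, app_year)   -- one application per applicant per year
);
INSERT INTO applicant VALUES (1, 'Ana');
INSERT INTO application (applicant_id, app_year) VALUES (1, 2023), (1, 2024);
""")

# A second application in the same year violates the business rule.
try:
    conn.execute("INSERT INTO application (applicant_id, app_year) VALUES (1, 2024)")
    second_same_year = True
except sqlite3.IntegrityError:
    second_same_year = False

applications_for_ana = conn.execute(
    "SELECT COUNT(*) FROM application WHERE applicant_id = 1"
).fetchone()[0]
```

The NOT NULL foreign key captures "each application is submitted by only one applicant", and the absence of any limit on rows per applicant captures "one or more applications".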
Hierarchical and Network Models
• Hierarchical models: developed to manage large amounts of data for
complex manufacturing projects
• Represented by an upside-down tree which contains segments
• Segments are the equivalent of a file system’s record type
• Depicts a set of one-to-many (1:M) relationships
• The network model allows a record to have more than one parent and supports many-to-many (M:N) relationships

Schema
Hierarchical Model
• Advantages
• Promotes data sharing
• Parent/child relationship promotes conceptual simplicity and data integrity
• Database security is provided and enforced by DBMS
• Efficient with 1:M relationships
• Disadvantages
• Requires knowledge of physical data storage characteristics
• Navigational system requires knowledge of hierarchical path
• Changes in structure require changes in all application programs
• Implementation limitations
• No data definition
• Lack of standards
Network Model
• Advantages
• Conceptual simplicity
• Handles more relationship types
• Data access is flexible
• Data owner/member relationship promotes data integrity
• Conformance to standards
• Includes data definition language (DDL) and data manipulation language (DML)
• Disadvantages
• System complexity limits efficiency
• Navigational system yields complex implementation, application development, and
management
• Structural changes require changes in all application programs
The Relational Model
• Relational database management system (RDBMS)
• Performs basic functions provided by the hierarchical and network DBMS
systems
• Makes the relational data model easier to understand and implement
• Hides the complexities of the relational model from the user
• Relationship types: one-to-one (1:1), one-to-many (1:M), many-to-many (M:N)
Data Model Basic Building Blocks
• Entity: person, place, thing, or event about which data will be collected and stored
• Attribute: characteristic of an entity
• Relationship: association among entities
• One-to-many (1:M or 1..*)
• Many-to-many (M:N or *..*)
• One-to-one (1:1 or 1..1)
• Constraint: restriction placed on data
• Ensures data integrity
Naming Conventions
• Entity name requirements
• Be descriptive of the objects in the business environment
• Use terminology that is familiar to the users
• Attribute name
• Required to be descriptive of the data represented by the attribute
• Proper naming
• Facilitates communication between parties
• Promotes self-documentation

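• Snake vs. camel vs. kebab vs. Pascal case: what are they?
• Snake case: my_first_variable = my_second_variable - my_third_variable
• Camel case: myFirstVariable = mySecondVariable - myThirdVariable
• Kebab case: my-first-variable = my-second-variable - my-third-variable
• Pascal case: MyFirstVariable = MySecondVariable - MyThirdVariable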
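
The four conventions differ only in how the words of a name are joined. A small sketch with hypothetical helper functions, assuming the input is a lowercase snake_case name:

```python
def to_camel(snake_name: str) -> str:
    """my_first_variable -> myFirstVariable"""
    first, *rest = snake_name.split("_")
    return first + "".join(word.capitalize() for word in rest)


def to_pascal(snake_name: str) -> str:
    """my_first_variable -> MyFirstVariable"""
    return "".join(word.capitalize() for word in snake_name.split("_"))


def to_kebab(snake_name: str) -> str:
    """my_first_variable -> my-first-variable"""
    return snake_name.replace("_", "-")


name = "my_first_variable"
print(to_camel(name), to_pascal(name), to_kebab(name))
```

Kebab case is usually avoided for database and program identifiers because many languages, SQL included, read the hyphen as a minus sign.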
Relational Model
• Advantages
• Structural independence is promoted using independent tables
• Tabular view improves conceptual simplicity
• Ad hoc query capability is based on SQL
• Isolates the end user from physical-level details
• Improves implementation and management simplicity
• Disadvantages
• Requires substantial hardware and system software overhead
• Conceptual simplicity gives untrained people the tools to use a good system poorly
• May promote information problems
The Entity Relationship Model
• Graphical representation of entities and their relationships in a database
structure
• ER models are represented in an entity relationship diagram (ERD), which uses graphic
representations to model database components
• Entity: can be anything about which data are collected
• Drawn as a rectangle
• Named with a singular noun
• Entity instance or entity occurrence: a row in the relational table
• Attributes: describe particular characteristics
• Relationships: labeled with an active or passive verb
• Connectivity: term used to label the relationship types
Entity Relationship Model
• Advantages
• Visual modeling yields conceptual simplicity
• Visual representation makes it an effective communication tool
• Is integrated with the dominant relational model
• Disadvantages
• Limited constraint representation
• Limited relationship representation
• No data manipulation language
• Loss of information content occurs when attributes are removed from entities to avoid
crowded displays
The Object-Oriented Data Model (1 of 3)

• Both data and its relationships are contained in a single structure
known as an object
• Object-oriented database management system (OODBMS): based on the OODM
• Object: contains data and their relationships, along with the operations that are
performed on it
• Basic building block for autonomous structures
• Abstraction of real-world entity
• Attribute: describes the properties of an object
The Object-Oriented Data Model (2 of 3)

• Class: collection of similar objects with shared structure and behavior,
organized in a class hierarchy
• Class hierarchy: resembles an upside-down tree in which each class
has only one parent
• Inheritance: object inherits methods and attributes of classes above it
• Unified Modeling Language (UML): describes sets of diagrams and
symbols to graphically model a system
The Object-Oriented Data Model (3 of 3)
• Advantages
• Semantic content is added
• Visual representation includes semantic content
• Inheritance promotes data integrity
• Disadvantages
• Slow development of standards caused vendors to supply their own
enhancements
• Complex navigational system
• Learning curve is steep
• High system overhead slows transactions
Emerging Data Models: Big Data and NoSQL (1 of 3)

• Goals of Big Data
• Find new and better ways to manage large amounts of web and sensor-generated data
• Provide high performance at a reasonable cost
• Characteristics of Big Data
• Volume
• Velocity
• Variety
Emerging Data Models: Big Data and NoSQL (2 of 3)

• Challenges of Big Data
• Volume doesn’t allow usage of conventional structures
• Expensive
• OLAP tools proved inconsistent dealing with unstructured data
• New technologies of Big Data
• Hadoop
• Hadoop Distributed File System (HDFS)
• MapReduce
• NoSQL
Emerging Data Models: Big Data and NoSQL (3 of 3)

• NoSQL databases
• Not based on the relational model
• Support distributed database architectures
• Provide high scalability, high availability, and fault tolerance
• Support large amounts of sparse data
• Geared toward performance rather than transaction consistency
• Provides a broad umbrella for data storage and manipulation
NoSQL
• Advantages
• High scalability, availability, and fault tolerance are provided
• Uses low-cost commodity hardware
• Supports Big Data
• Key-value model improves storage efficiency
• Disadvantages
• Complex programming is required
• There is no relationship support
• There is no transaction integrity support
• In terms of data consistency, it provides an eventually consistent model
Data Modeling & Database Modeling
• Data modeling (a process)
• Conceptual
• Logical
• Physical

• Database modeling (the architectural decision within data modeling)
• Hierarchical
• Network
• Relational
• Object-Oriented
Degrees of Data Abstraction
The External Model
• End users’ view of the data environment
• People who use the application programs to manipulate the data and generate
information

• ER diagrams are used to represent the external views
• External schema: specific representation of an external view
The Conceptual Model
• Represents a global view of the entire database by the
entire organization
• Conceptual schema: basis for the identification and high-level
description of the main data objects
• Logical design: task of creating a conceptual data model
• Conceptual model advantages
• Macro-level view of data environment
• Software and hardware independent
The Internal Model
• Represents the database as seen by the DBMS,
mapping the conceptual model to the DBMS
• Internal schema: specific representation of an
internal model, using the database constructs
supported by the chosen database
• Logical independence: changing internal model
without affecting the conceptual model
• Hardware independent: unaffected by the type
of computer on which the software is installed
The Physical Model
• Operates at lowest level of abstraction
• Describes the way data are saved on storage media such as magnetic, solid
state, or optical media
• Requires the definition of physical storage and data access methods
• Software and hardware dependent
• Relational model aimed at logical level
• Does not require physical-level details
• Physical independence: changes in physical model do not affect
internal model
Models
Levels of Data Abstraction

Model | Degree of Abstraction | Focus | Independent of
External | High | End-user views | Hardware and software
Conceptual | Medium-High | Global view of data (database model independent) | Hardware and software
Internal | Medium-Low | Specific database model | Hardware
Physical | Low | Storage and access methods | Neither hardware nor software
Systems Development Life Cycle (SDLC)

• Traces the history of an information system
• It is a process framework defining the tasks performed at each step
• Traditional SDLC is divided into five phases
• Planning: yields a general overview of the company and its objectives
• Analysis: problems defined during planning phase are examined in greater
detail
• Detailed systems design: designer completes the design of the system’s
processes
• Implementation: hardware, DBMS software, and application programs are
installed, and the database design is implemented
• Maintenance: corrective, adaptive, and perfective
• Iterative rather than sequential process
DBLC vs SDLC
The Database Life Cycle
• How does the company run its business? Mission, vision, organizational structure
• Meet with all end users
• Gather information about problems
• Find relationships between problems
• Find constraints, limits, and scope of tasks
• ERM is a communication tool between parties
• Conceptual > Logical > Physical design
• Business view: What?
• Technical/designer view: How?
• Where are we going to install the DBMS? Cloud?
• DB creation with SysAdmins/DomainAdmins
• ETL performed? Migrated? Aggregated? Security?
• Data integrity and security > DBMS
• Check limits and boundaries
• Regulations and compliance? PCI, FERPA
• Patches, bugs, bottlenecks?
• Demands?
• Preventive, corrective, adaptive, audits, reporting


Database Design

Conceptual Design

Logical Design

Physical Design
(1 of 6)
Conceptual Design

• Goal: design a database independent of database software and


physical details
• Conceptual data model: describes main data entities, attributes,
relationships, and constraints
• Step 1: Data analysis and requirements
• Step 2: Entity relationship modeling and normalization
• Step 3: Data model verification
• Step 4: Distributed database design
(2 of 6)
Conceptual Design

• Goal: design a database independent of database software and


physical details

• Easy to understand
• Simple boxes, lines, and text
• Easy to enhance
• Highly abstract, no details
• Describes main data entities, attributes,
relationships, and constraints
• Translated from business requirements
• Most likely you will have multiple versions
(3 of 6)
Conceptual Design

Step 1: Data analysis and requirements

• Goal: Collect all data and requirements from different sources and
make sure all business processes and their needs are understood correctly

Activities during this first stage:

• Meet with users from all levels
• Observe the current system and its problems
• Meet with the system/DBA design team
• Create the scope of work and apply standards
(4 of 6)
Conceptual Design

Step 2: Entity relationship modeling and normalization

• Goal: Create the first relationships between entities and the details of
attributes

Activities during this stage:

• Create formal communication tools
• Define relationships among entities
• Decide PKs and FKs
(5 of 6)
Conceptual Design

Step 3: Data model verification

• Goal: finalize all tests, compare the model with the proposed system and needs, and confirm with all stakeholders.
Most of the time this is the last step of the conceptual design stage.

Activities during this stage:

• Make sure all modules are in place
• Check all boundaries and limits
• Make sure all access rights and security needs are in place

• Modules create loose coupling
• Sometimes we may not be able to remove all duplicated and
overlapping data
• Enterprise modules do not let you have loose coupling
(6 of 6)
Conceptual Design

Step 4: Distributed database design

• Goal: this step is not required, but from a performance standpoint you may need it if your database requires
very fast response, especially for geographically dispersed demand
Some questions to ask before deciding on a distributed database:

• Do we have an aggregation or performance problem?
• Do we have a Content Delivery Network (CDN)?
• Do we have mandatory government regulations?
• Do we have to follow compliance requirements?

We are still independent of any hardware and software.
DBMS Software Selection
• Factors that affect the purchasing
• Cost
• DBMS features and tools
• Underlying model
• Portability
• DBMS hardware requirements

• We have now decided to use MS-SQL
• We are no longer independent of software;
however, we are still independent of hardware
(1 of 9)
Logical Design

• Goal: design a database that is based on a specific data model
but independent of physical details
• We need to consider DBMS requirements and boundaries
• Step 1: Map the conceptual model to logical model components
• Step 2: Validate the logical model using normalization
• Step 3: Validate the logical model integrity constraints
• Step 4: Validate the logical model against user requirements
(2 of 9)
Logical Design
Step 1: Map the conceptual model to logical model components

• Goal: Create the first relationships between entities and the
details of attributes

Activities during this stage:

• All objects in the conceptual model must be mapped to the
specific constructs used by the selected database model
(3 of 9)
Logical Design
Step 1: Map the conceptual model to logical model components

Step   Activity
1      Map strong entities
2      Map supertype/subtype relationships
3      Map weak entities
4      Map binary relationships
5      Map higher-degree relationships

• A strong entity is one that resides on the “1” side of all its relationships

[Figure: ERD with the relationships "fills", "has", and "teaches"]
(4 of 9)
Logical Design
Step 1: Map the conceptual model to logical model components

Step   Activity
1      Map strong entities
2      Map supertype/subtype relationships
3      Map weak entities
4      Map binary relationships
5      Map higher-degree relationships

-- Each strong entity becomes its own table with its own primary key
CREATE TABLE [Instructor] (
    [InstructorID] Char(5),
    [InstructorLastName] VarChar(255),
    [InstructorFirstName] VarChar(255),
    [InstructorAddress] VarChar(255),
    PRIMARY KEY ([InstructorID])
);

CREATE TABLE [Professor] (
    [ProfessorID] Char(5),
    [ProfessorLastName] VarChar(255),
    [ProfessorFirstName] VarChar(255),
    [ProfessorAddress] VarChar(255),
    PRIMARY KEY ([ProfessorID])
);

CREATE TABLE [Seat] (
    [ClassroomNumber] Char(10),
    [NumberOfSeat] Decimal(4,0),
    PRIMARY KEY ([ClassroomNumber])
);
(5 of 9)
Logical Design
Step 1: Map the conceptual model to logical model components

Step   Activity
1      Map strong entities
2      Map supertype/subtype relationships
3      Map weak entities
4      Map binary relationships
5      Map higher-degree relationships

• For each weak entity, create a table that includes all of its simple attributes.
Include a foreign key that points to the primary key of the owner entity; the
foreign key and the partial key together form the primary key of the weak entity.
• A partial key uniquely identifies a weak entity for a given owner entity.

[Figure: ERD with the relationships "fills", "has", and "teaches"]
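The weak-entity mapping rule can be tried out with a small sketch. It uses Python's built-in `sqlite3` module rather than MS-SQL, and the `Loan` and `Payment` tables, their columns, and the `ON DELETE CASCADE` choice are assumptions made for illustration only; they are not part of the course example.

```python
import sqlite3

# Hypothetical owner (Loan) and weak entity (Payment), for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE Loan (
    LoanID TEXT PRIMARY KEY,
    Amount REAL NOT NULL
);
CREATE TABLE Payment (
    LoanID    TEXT    NOT NULL,  -- foreign key to the owner entity
    PaymentNo INTEGER NOT NULL,  -- partial key: unique only within one loan
    Paid      REAL    NOT NULL,
    PRIMARY KEY (LoanID, PaymentNo),
    FOREIGN KEY (LoanID) REFERENCES Loan (LoanID) ON DELETE CASCADE
);
INSERT INTO Loan VALUES ('L1', 1000.0), ('L2', 500.0);
INSERT INTO Payment VALUES ('L1', 1, 100.0), ('L1', 2, 100.0), ('L2', 1, 50.0);
""")

# Deleting the owner also removes the payments that depend on it.
conn.execute("DELETE FROM Loan WHERE LoanID = 'L1'")
remaining = conn.execute(
    "SELECT LoanID, PaymentNo FROM Payment ORDER BY LoanID, PaymentNo"
).fetchall()
```

Note that `PaymentNo` 1 appears under both loans: the partial key is unique only in combination with the owner's key, which is why the primary key is the pair `(LoanID, PaymentNo)`.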
(6 of 9)
Logical Design
Step 1: Map the conceptual model to logical model components

Step   Activity
1      Map strong entities
2      Map supertype/subtype relationships
3      Map weak entities
4      Map binary relationships
5      Map higher-degree relationships

• Our example has no supertypes or subtypes, so we skipped that step.
We have also mapped almost all entities except “Course”, which takes part in
binary relationships.

[Figure: ERD with the relationships "fills", "has", and "teaches"]
(7 of 9)
Logical Design
Step 2: Validate the logical model using normalization

• Goal: as the final data control step, make sure no data is redundant and all data is normalized.
During mapping you may add or remove some of the attributes.

Activities during this stage:

• All tables should be in at least 3NF
(8 of 9)
Logical Design
Step 3: Validate the logical model integrity constraints

• Goal: Make sure all defined constraints are in place and
working correctly

Activities during this stage:

• Check all limits and boundaries (upper and lower)
• Check that nulls work as they are supposed to
• Check that all codes are valid
(9 of 9)
Logical Design
Step 4: Validate the logical model against user requirements

• Goal: Before moving to the physical design stage, make sure that what you
prepared meets the requirements and the stakeholders' needs

Activities during this stage:

• Validate all end-user data and transactions
• Validate security requirements

This is the last step of logical design, and we are still independent of hardware.
Physical Design

• The volume of data to be used and the data usage pattern are very important
• Step 1: Define data storage organization
  Centralized vs distributed, on premises vs cloud, indexes and views
• Step 2: Define integrity and security measures
  Users, groups, security settings and assignments
• Step 3: Determine performance measures
  Performance metrics, read/write speed, caching, fine-tuning
Database Design Strategies
• Top-down design starts by identifying the data sets and then defines the data elements
for each of those sets
• Involves the identification of different entity types and the definition of each
entity’s attributes
• Bottom-up design first identifies the data elements (items) and then groups them
together in data sets
• First defines attributes, and then groups them to form entities
Centralized versus Decentralized Design
• Centralized design: process by which all
database design decisions are carried out
centrally by a small group of people
• Suitable in a top-down design approach
when the problem domain is relatively small,
as in a single unit or department in an
organization
Questions ?
Chapter 4 – Entity Relationship (ER)
Modeling

115
Learning Objectives
• After completing this chapter, you will be able to:
• Identify the main characteristics of entity relationship components
• Describe how relationships between entities are defined, refined, and
incorporated into the database design process
• See how ERD components affect database design and implementation
• Understand that real-world database design often requires the reconciliation
of conflicting goals

116
The Entity Relationship Model (ERM)
• Forms the basis of an entity relationship diagram (ERD)
• Conceptual database as viewed by end user

117
Database Components
• Database’s main components
1. Entities
2. Relationships
3. Attributes
4. Keys (PK,FK , etc..)
5. Cardinality
6. Existence Dependence
7. Relationship Strength
8. Relationship Participation
9. Relationship Degree
10.Associative (Composite) Entities

118
1.Entities (1 of 2)

Entity: a “thing”, such as any object, place, person, or product, that you want to track
or store in the database (think of it as a container or template)
• Refers to the entity set and not to a single entity occurrence
• ERM corresponds to a table—not to a row—in the relational environment
• ERM refers to a table row as an entity instance or entity occurrence
• In Chen, Crow’s Foot, and UML notations, an entity is represented by a
rectangle that contains the entity’s name
• The entity name, a noun, is usually written in all capital letters

Entity Name

Entity Attributes

119
1.Entities (2 of 2)
Entity: a “thing”, such as any object, place, person, or product (a representation of a class)
Entity instance: a single occurrence of an entity

• Most SQL variants are not case sensitive; however, the naming convention should be consistent

Attributes:         StudentID   LastName   FirstName   StudentAddress
Entity instances:   00001       Smith      John        555 PineApple Lane
                    00002       Smith      Jane        555 PineApple Lane
                    00003       Davidson   Brandon     2223 Oak Street

120
2.Relationships
• A relationship illustrates the association between two entities; the entities
that participate in a relationship are called “participants”.
• Relationships are drawn as straight lines and operate in both directions.
• A relationship usually has a name, expressed as a verb and written on the
relationship line
Works

fills
Teaches

121
3.Attributes (1 of 3)
• Characteristics of attributes

• Required attribute: must have a value and cannot be left empty
• Last name, phone number, SSN, etc.
• Optional attribute: does not require a value and can be left empty
• Middle name, employee’s spouse name, etc.
• Domain: set of possible values for a given attribute
• Date of birth: a date; student grade: a character from A to F, or a number from 0 to 100

122
3.Attributes (2 of 3)
• Characteristics of attributes

• Identifier (primary key): one or more attributes that uniquely identify each entity
instance
• SSN, LastName + EmailAddress, LastName + BirthDate; an identifier can be atomic or composite
• Key attributes are underlined in the table structure
• StudentTable(LastName, FirstName, SSN, PhoneNumber, etc.)
• Composite identifier: primary key composed of more than one attribute
• Composite attribute: attribute that can be subdivided to yield additional attributes
• Simple (atomic) attribute: attribute that cannot be subdivided

123
3.Attributes (3 of 3)
• Single vs Multivalued attributes
• Single-valued attribute: attribute that has only a single value
• RoomNumber, CustomerId, etc. Keep in mind that a single-valued attribute is not
necessarily a simple attribute.
• Multivalued attribute: attribute that can have many values
• Hobbies, PhoneNumbers, etc.
• Derived vs Stored Attributes
• Derived attribute: attribute derived (computed) from other stored
attributes. For example, age from date of birth and today’s date.
• Stored attribute: attribute that cannot be derived from other attributes.
For example, the BirthDate of an employee.
• Complex attributes: composite + multivalued attributes = complex attributes (multiple addresses,
multiple phone numbers with area codes, etc.)

124
4.Keys

• Keys are a very important part of the relational database model. They are
used to establish and identify relationships between tables and also
to uniquely identify any record or row of data inside a table.
• Primary Key (PK)
• Foreign Key (FK)

125
5.Connectivity and Cardinality
• Connectivity: the classification of a relationship
• Includes 1:1, 1:M, and M:N
• Cardinality: expresses the minimum and maximum number of entity
occurrences associated with one occurrence of the related entity
• In the ERD, cardinality is indicated by placing the appropriate numbers beside the
entities, using the format (x, y)
• In some notations, cardinality names only the maximum number of connections between table rows
(either one or many), while modality names the minimum number of row connections. Modality
has only two options: a minimum of 0 or a minimum of 1.

126
5.Connectivity and Cardinality
Example
Connectivity: either 1:1 or 1:M, depending on constraints
  1:1 means Max 1, Min 1
  1:M means Max M, Min 1

Perhaps a lecturer can teach only one class per semester (conceptual level).

Let’s say we have a business rule about the number of classes taught, giving the
cardinalities (1,1) and (1,4):
• An instructor can teach at most 4 classes per semester
<the maximum cardinality here is 4>

• What if professors can teach 3 classes per semester?
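A (min, max) cardinality is a business rule that can be checked against data. The sketch below is a hypothetical checker, not part of the course material: the function name, the instructor names, and the class codes are all made up for illustration.

```python
from collections import Counter

def cardinality_violations(assignments, min_classes=1, max_classes=4):
    """Return {instructor: class_count} for counts outside (min, max).

    Instructors with no assignment at all never appear in `assignments`,
    so a violated minimum of 1 has to be checked against a separate
    instructor list.
    """
    counts = Counter(instructor for instructor, _ in assignments)
    return {name: n for name, n in counts.items()
            if not (min_classes <= n <= max_classes)}

# Hypothetical (instructor, class) pairs for one semester.
assignments = [("Smith", c) for c in ("DB1", "DB2", "OS", "NET", "AI")]
assignments.append(("Jones", "DB1"))

violations = cardinality_violations(assignments)
```

With the (1,4) rule, Smith's five classes are reported as a violation and Jones's single class is not. Changing `max_classes` to 3 models the stricter rule asked about for professors.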

127
6.Existence Dependence (1 of 2)
• Existence dependence
• Entity exists in the database only when it is associated with another related
entity occurrence
• Referred to as a weak entity
• Has a primary key that is partially or totally derived from parent entity in
the relationship

What happens to the PAYMENT entity if we delete the LOAN entity?

PAYMENT is a weak entity.

128
6.Existence Dependence (2 of 2)
• Existence independence
• Entity exists apart from all of its related entities
• Referred to as a strong entity or regular entity

What happens to the LOAN entity if we delete the PAYMENT entity?

LOAN is a strong entity.

• The database designer determines whether an entity is weak or strong,
based on business rules

129
7.Relationship Strength
• Weak (non-identifying) relationship
• It cannot exist without the entity which it has a relationship
• Primary key of the related entity does not contain a primary key component of the
parent entity
• Strong (identifying) relationships
• Primary key of the related entity contains a primary key component of the parent
entity

130
8.Relationship Participation
• Optional participation (min. cardinality is 0)
• Any instance of one entity might participate in a relationship with another entity, but
this is not compulsory
• Mandatory participation (min. cardinality is 1)
• Every instance of one entity must participate in a relationship with another entity

There are four main types of membership relationships


1. Mandatory for both entities: A member of staff must be assigned to a given department, and any department must have staff. There
can be no unassigned staff, and it is not possible to have an 'empty' department.
2. Mandatory for one entity, optional for the other: Any member of staff must be attached to a department, but it is possible for a
department to have no staff allocated.
3. Optional for one entity, mandatory for the other: A member of staff does not have to be placed in a department, but all departments
must have at least one member of staff.
4. Optional for both entities: A member of staff might be assigned to work in a department, but this is not compulsory. A department
might, or might not, have staff allocated to work within it.

131
Relationship Degree
• Indicates the number of entities or participants
associated with a relationship
• Unary relationship: association is
maintained within a single entity
• Recursive relationship: relationship exists
within a single entity type

• Binary relationship: two entities are associated

• Ternary relationship: three entities are associated

132
Example
Vehicle          Relationship   Manufacturing Division
• Corolla        Built          • Compact Car
• Camry                         • SUV
• Rav4                          • Jeep
• C-HR                          • Van
• Prius                         • Truck
• Yaris                         • Hybrid
• Highlander                    • Maintenance
• 4Runner                       • Logistic Support
• Sienna
• Tacoma
• Tundra

133
Example
Vehicle          Built          Manufacturing Division
• Corolla                       • Compact Car
• Camry                         • SUV
• Rav4                          • Van
• C-HR                          • Truck
• Prius                         • Hybrid
• Yaris                         • Farm Vehicles
• Highlander                    • Electric Cars
• 4Runner
• Sienna
• Tacoma
• Tundra

134


Example

What is the degree of relationship ?


What is the connectivity ( relationship classification)?
What is the participation ?
What is the Cardinality ?

135


Example

What is the degree of relationship ?


<Indicates the number of entities or participants associated with a relationship>
What is the connectivity ?
< nothing but relationship classification 1:1, 1:M, and M:N>

136


Example

What is the degree of relationship ? =2


<Indicates the number of entities or participants associated with a relationship>
What is the connectivity ? = 1:M
< nothing but relationship classification 1:1, 1:M, and M:N>

137


Example

What is the participation ?


<Optional participation (min. cardinality is 0)
Any instance of one entity might participate in a relationship with another entity, but this is not
compulsory.>
<Mandatory participation (min. cardinality is 1)
Every instance of one entity must participate in a relationship with another entity>
138


Example
Vehicle          Built          Manufacturing Division
• Corolla                       • Compact Car
• Camry                         • SUV
• Rav4                          • Van
• C-HR                          • Truck
• Prius                         • Hybrid
• Yaris                         • Farm Vehicles
• Highlander                    • Electric Cars (no relationship with any car)
• 4Runner
• Sienna
• Tacoma
• Tundra

Participation (Vehicle) = 1     Participation (Division) = 0


What is the participation ?
<Optional participation (min. cardinality is 0)
Any instance of one entity might participate in a relationship with another entity, but this is
not compulsory.>
<Mandatory participation (min. cardinality is 1)
Every instance of one entity must participate in a relationship with another entity>


Example

What is the Cardinality ?


<In general ; As cardinality is the maximum number of connections
between table rows (either one or many) >

140


Example

Cardinality (Vehicle)=1 Cardinality (Division)=4

What is the Cardinality ?


<In general ; As cardinality is the maximum number of connections
between table rows (either one or many) >
141
Crow’s Foot Notation

142
The Entity Relationship Model

143
Crow’s Foot Notation Example

Registration for exams.

144
Chen Notation

145
Source: Wikipedia


Let’s recall our example and apply Crow’s Foot notation
Vehicle          Built          Manufacturing Division
• Corolla                       • Compact Car
• Camry                         • SUV
• Rav4                          • Van
• C-HR                          • Truck
• Prius                         • Hybrid
• Yaris                         • Farm Vehicles
• Highlander                    • Electric Cars
• 4Runner
• Sienna
• Tacoma
• Tundra

Relationship = Built
Cardinality (Vehicle) = 1       Cardinality (Division) = 4
Participation (Vehicle) = 1     Participation (Division) = 0

(min,max)   (min,max)
(0,M)       (1,1)
Associative (Composite) Entities
• Used to represent an M:N relationship between two or more entities
• Has a 1:M relationship with the parent entities
• Composed of the primary key attributes of each parent entity
• May also contain additional attributes that play no role in connective process

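[Figure: an M:N relationship replaced by an associative entity that has a 1:M relationship with each parent entity]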
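An associative entity can be sketched as a bridge table. The example below uses Python's built-in `sqlite3` module; the `Student`, `Course`, and `Enrollment` tables and their sample rows are assumptions for illustration, loosely based on the training-center example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Student (StudentID TEXT PRIMARY KEY, LastName TEXT NOT NULL);
CREATE TABLE Course  (CourseID  TEXT PRIMARY KEY, CourseName TEXT NOT NULL);
-- Associative entity: its primary key is composed of the parents' keys.
CREATE TABLE Enrollment (
    StudentID TEXT NOT NULL REFERENCES Student (StudentID),
    CourseID  TEXT NOT NULL REFERENCES Course (CourseID),
    Grade     TEXT,  -- extra attribute that plays no role in the connection
    PRIMARY KEY (StudentID, CourseID)
);
INSERT INTO Student VALUES ('00001', 'Smith'), ('00002', 'Davidson');
INSERT INTO Course  VALUES ('C1', 'Databases'), ('C2', 'Networks');
INSERT INTO Enrollment VALUES
    ('00001', 'C1', 'A'), ('00001', 'C2', NULL), ('00002', 'C1', 'B');
""")

# Each parent has a 1:M relationship with the bridge table.
courses_per_student = conn.execute(
    "SELECT StudentID, COUNT(*) FROM Enrollment "
    "GROUP BY StudentID ORDER BY StudentID").fetchall()
students_per_course = conn.execute(
    "SELECT CourseID, COUNT(*) FROM Enrollment "
    "GROUP BY CourseID ORDER BY CourseID").fetchall()
```

One student appears in many enrollment rows and one course appears in many enrollment rows, which together express the original M:N relationship.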

148
Database Design Challenges: Conflicting Goals
• Database designers must often make design compromises that are triggered by
conflicting goals
• Follow the design standards and best practices.
• High processing speed may limit the number and complexity of logically desirable relationships
• Maximum information generation may lead to loss of clean design structures and high
transaction speed

• Best practices.
• Prevent bad data from being entered into tables of interest.
• Enforce business logic at the database-level.
• Documentation of important database rules.
• Enforce relational integrity between any number of tables.
• Improve database performance.
• Enforce uniqueness.

149
Developing an ER Diagram-Example (1/4)
• Activities involved in building an ERD
1. Create a detailed narrative of the organization’s description of operations
2. Identify business rules based on the descriptions
3. Identify main entities and relationships from the business rules
4. Develop the initial ERD
5. Identify the attributes and primary keys that adequately describe entities
6. Revise and review ERD

150
Did you finalize your ERD?

151
Developing an ER Diagram-Example (2/4)
Task 1.Create a detailed narrative of the organization’s description of operations

• We have a small training center where we teach a number of topics. We have staff and
teachers who work in 5 classrooms every night.

Task 2. Identify business rules based on the descriptions


• Every student must take at least one course.
• Each course should have unique course id ( and perhaps section)
• Each course must have a name.
• Each professor must teach at least one class per semester
• Each professor must have their own unique ID , last name and name.

152
Developing an ER Diagram-Example (3/4)

Task3 :Identify main entities and relationships from the business rules

• Easy to understand
• Simple boxes, lines, and text
• Easy to enhance
• Highly abstract, no details
Task 4: Develop the initial ERD
• Only “entities” are visible [tables]
• Translated from business requirements
You are in the conceptual data modeling stage.
Developing an ER Diagram-Example (4/4)

Task 5: Identify the attributes and primary keys that adequately describe entities
Task 6: Revise and review the ERD
• Enhanced from the conceptual model
• Add attributes for each entity
• Add relationships
• Add cardinalities
• Decide key attributes
• Primary keys
• Foreign keys
• Still generic, not pointing to any specific database
You are in the logical data modeling stage.
[Figure: ERD with the relationships "register" and "teach"]
Questions ?

155
Chapter 3 – The Relational Database Model

156
Learning Objectives
• After completing this chapter, you will be able to:
• Describe the relational database model’s logical structure
• Identify the relational model’s basic components and explain the structure,
contents, and characteristics of a relational table
• Use relational database operators to manipulate relational table contents
• Explain the purpose and components of the data dictionary and system
catalog
• Identify appropriate entities and then the relationships among the entities in
the relational database model
• Describe how data redundancy is handled in the relational database model
• Explain the purpose of indexing in a relational database

157
A Logical View of Data
• Relational database model enables logical representation of the data and its relationships
• Logical simplicity yields simple and effective database design methodologies
• The logical view is facilitated by the creation of data relationships based on a logical construct called a
relation
• No worries about hardware requirements at this stage

Example of 1:M
Example of 1:1

fills teaches

158
Tables and Their Characteristics
1 A table is perceived as a two-dimensional structure composed of rows and columns.
2 Each table row (tuple) represents a single entity occurrence within the entity set.
3 Each table column represents an attribute, and each column has a distinct name.
4 Each table must have an attribute or combination of attributes that uniquely identifies each row.
5 All values in a column must conform to the same data format.
6 Each column has a specific range of values known as the attribute domain.
7 The order of the rows and columns is immaterial to the DBMS.
8 Each intersection of a row and column represents a single data value.

159
Types of Keys (1/2)
• Keys consist of one or more attributes that determine other attributes
• Ensure that each row in a table is uniquely identifiable
• Establish relationships among tables and to ensure the integrity of the data
• Superkey: key that can uniquely identify any row in the table (it can be one or more
attributes)
• Primary key (PK) is a superkey
• ID or ID+SensorID
• Candidate key: minimal superkey
• ID
• How about “Time”?
• How about PK?

ID   SensorID   Time   Value
1    3          10     14.3
2    2          16     124
3    1          40     3
4    3          70     14.4
5    1          100    8
6    2          116    186
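The superkey and candidate-key definitions can be tested mechanically on the sample rows. The sketch below is illustrative only: the helper names are made up, and it checks uniqueness in this one sample, which is a weaker condition than being a key by business rule.

```python
from itertools import combinations

COLUMNS = ("ID", "SensorID", "Time", "Value")
ROWS = [(1, 3, 10, 14.3), (2, 2, 16, 124), (3, 1, 40, 3),
        (4, 3, 70, 14.4), (5, 1, 100, 8), (6, 2, 116, 186)]

def is_superkey(attrs, rows=ROWS):
    """True if no two rows share the same values for `attrs`."""
    idx = [COLUMNS.index(a) for a in attrs]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)

def candidate_keys(rows=ROWS):
    """Minimal superkeys: unique sets with no smaller unique subset."""
    keys = []
    for size in range(1, len(COLUMNS) + 1):
        for attrs in combinations(COLUMNS, size):
            already_covered = any(set(k) <= set(attrs) for k in keys)
            if not already_covered and is_superkey(attrs, rows):
                keys.append(attrs)
    return keys
```

In this sample, `Time` and `Value` also happen to be unique, so `candidate_keys()` reports them alongside `ID`. That speaks to the "How about Time?" question: whether `Time` is really a candidate key depends on whether two readings can ever share a timestamp, which the data alone cannot tell you.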

160
Types of Keys (2/2)
• Simple key: key composed of a single attribute, such as LastName, FirstName, or ID
• Composite key: key that is composed of more than one attribute
• LastName+FirstName+StudentID or ID+SensorID, etc.
• Compound key: combination of more than one simple key
• StudentID+StudentEmail or StudentID+CourseID
• Difference between composite and compound keys:
• If CourseID is itself a combination of more than one attribute, then StudentID+CourseID becomes a
composite key
• Some DB vendors use the two terms interchangeably
• Foreign key: primary key of one table that has been placed into another table to create a common
attribute
• Secondary key: key used strictly for data retrieval purposes; it does not have to be unique
• A candidate key that is not selected as the PK is usually called an alternate key

161
Dependencies
• Determination
• State in which knowing the value of one attribute makes it possible to determine the value of
another
• Establishes the role of a key
• Based on the relationships among the attributes
• Functional dependence: value of one or more attributes determines the value of one or more
other attributes
• Determinant: attribute whose value determines another
• Dependent: attribute whose value is determined by the other attribute
• Full functional dependence: entire collection of attributes in the determinant is necessary for the
relationship <perhaps for OODB? >
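Functional dependence and full functional dependence can both be checked against sample rows. The functions and the enrollment data below are hypothetical and only illustrate the definitions; a dependency that holds in a sample is not proven to hold in general.

```python
def holds(rows, determinant, dependent):
    """True if equal determinant values always give equal dependent values."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in determinant)
        val = tuple(row[a] for a in dependent)
        if seen.setdefault(key, val) != val:
            return False
    return True

def fully_dependent(rows, determinant, dependent):
    """True if the dependency holds and no determinant attribute is removable."""
    if not holds(rows, determinant, dependent):
        return False
    if len(determinant) == 1:
        return True
    # If any smaller determinant worked, dropping one attribute would too.
    return not any(
        holds(rows, [a for a in determinant if a != dropped], dependent)
        for dropped in determinant
    )

# Hypothetical enrollment rows.
rows = [
    {"StudentID": 1, "CourseID": "C1", "Grade": "A", "StudentName": "Smith"},
    {"StudentID": 1, "CourseID": "C2", "Grade": "B", "StudentName": "Smith"},
    {"StudentID": 2, "CourseID": "C1", "Grade": "C", "StudentName": "Jones"},
]
```

Here `Grade` needs both `StudentID` and `CourseID`, so it is fully dependent on the pair. `StudentName` is determined by `StudentID` alone, so its dependence on the pair is only partial.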

162
Database Integrity
• Entity integrity: condition in which each row in the table has its own unique
identity
• All of the values in the primary key must be unique
• No key attribute in the primary key can contain a null

• Referential integrity: every reference to an entity instance by another entity
instance is valid
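Both integrity rules can be observed in a running database. This sketch uses Python's built-in `sqlite3` module; the `Department` and `Employee` tables and the `try_insert` helper are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
CREATE TABLE Department (DeptID TEXT PRIMARY KEY);
CREATE TABLE Employee (
    EmpID  INTEGER PRIMARY KEY,
    DeptID TEXT REFERENCES Department (DeptID)
);
INSERT INTO Department VALUES ('HR');
INSERT INTO Employee VALUES (1, 'HR');
""")

def try_insert(emp_id, dept_id):
    """Return True if the row is accepted, False if an integrity rule rejects it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?)", (emp_id, dept_id))
        return True
    except sqlite3.IntegrityError:
        return False

dangling_reference = try_insert(2, "IT")   # no such department
null_reference = try_insert(3, None)       # a null foreign key references nothing
duplicate_key = try_insert(1, "HR")        # primary key value already used
```

The first insert fails on referential integrity and the third on entity integrity. The null foreign key is accepted, because a null does not claim to reference any row.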

163
Integrity Rules
• Ways to handle nulls
• Flags
• Special codes used to indicate the absence of some value
• Constraints
• NOT NULL constraint: placed on a column to ensure that every row in the table has a value for
that column
• UNIQUE constraint: restriction placed on a column to ensure that no duplicate values exist for
that column
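The two constraints can be demonstrated with a short sketch using Python's built-in `sqlite3` module. The `Student` table, its columns, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE Student (
    StudentID  TEXT PRIMARY KEY,
    LastName   TEXT NOT NULL,   -- required attribute
    Email      TEXT UNIQUE,     -- no two rows may share a value
    MiddleName TEXT             -- optional attribute, nulls allowed
)""")

def accepted(row):
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    accepted(("00001", "Smith", "js@example.com", None)),     # valid row
    accepted(("00002", None, "jane@example.com", None)),      # NOT NULL violated
    accepted(("00003", "Davidson", "js@example.com", None)),  # UNIQUE violated
]
```

Only the first row is stored; the database itself rejects the other two, so the rule does not depend on application code.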

164
What is null ?
• Null: absence of any data value
• Unknown attribute value, known but missing attribute value, or inapplicable condition

165
Relational Algebra
• Theoretical way of manipulating table contents using relational operators
• Relational algebra is a procedural query language, which takes instances of
relations as input and yields instances of relations as output.
• It uses operators to perform queries.
• An operator can be either unary or binary.
• They accept relations as their input and yield relations as their output

166
Relational Set Operators (1 of 13)
Single-table (unary) operations
• Select (Restrict)
• Project

Multiple-table operations (set operations, as in math)
• Union
• Intersect
• Difference
• Product
• Division

Joins: multiple-table operations with selection options
• Natural Join
• Equijoin
• Theta Join (non-equijoin)
• Inner Join
• Left Outer Join
• Right Outer Join
• Full Join
• Cross Join

167
Relational Set Operators (2 of 13)

168
Relational Set Operators (3 of 13)
• SELECT (Restrict)
• Unary operator that yields a horizontal subset of a table
• Only one table is the “input”
• Output is always a subset of the input table
• Horizontal subset, no limit
• SELECT * FROM Table WHERE condition;

Input table:
ID   SensorID   Time   Value
1    3          10     14.3
2    2          16     124
3    1          40     3
4    3          70     14.4
5    1          100    8
6    2          116    186

Select Time < 100:
ID   SensorID   Time   Value
1    3          10     14.3
2    2          16     124
3    1          40     3
4    3          70     14.4

Select ID = 1:
ID   SensorID   Time   Value
1    3          10     14.3
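The two selections on the slide can be reproduced with Python's built-in `sqlite3` module. The rows come from the slide; the table name `Reading` is an assumption.

```python
import sqlite3

# Sample rows copied from the slide; the table name Reading is hypothetical.
ROWS = [(1, 3, 10, 14.3), (2, 2, 16, 124), (3, 1, 40, 3),
        (4, 3, 70, 14.4), (5, 1, 100, 8), (6, 2, 116, 186)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Reading "
             "(ID INTEGER PRIMARY KEY, SensorID INTEGER, Time INTEGER, Value REAL)")
conn.executemany("INSERT INTO Reading VALUES (?, ?, ?, ?)", ROWS)

# SELECT (restrict): keep whole rows that satisfy the condition.
early = conn.execute(
    "SELECT * FROM Reading WHERE Time < 100 ORDER BY ID").fetchall()
first = conn.execute("SELECT * FROM Reading WHERE ID = 1").fetchall()
```

Every returned row still has all four columns: the relational SELECT operator removes rows, never columns. In SQL it corresponds to the WHERE clause, not to the SELECT keyword.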

169
Relational Set Operators (4 of 13)
Unary Operators Example
PROJECT

• Unary operator that yields a vertical subset of a table
• Only one table is the “input”
• Output is always a subset of the input table
• Vertical subset, no limit
• SELECT column FROM Table;

Input table:
ID   SensorID   Time   Value
1    3          10     14.3
2    2          16     124
3    1          40     3
4    3          70     14.4
5    1          100    8
6    2          116    186

Project SensorID:
SensorID
3
2
1
3
1
2

Project Time and Value:
Time   Value
10     14.3
16     124
40     3
70     14.4
100    8
116    186
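The projections on the slide can be reproduced with Python's built-in `sqlite3` module. The rows come from the slide; the table name `Reading` is an assumption.

```python
import sqlite3

# Sample rows copied from the slide; the table name Reading is hypothetical.
ROWS = [(1, 3, 10, 14.3), (2, 2, 16, 124), (3, 1, 40, 3),
        (4, 3, 70, 14.4), (5, 1, 100, 8), (6, 2, 116, 186)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Reading "
             "(ID INTEGER PRIMARY KEY, SensorID INTEGER, Time INTEGER, Value REAL)")
conn.executemany("INSERT INTO Reading VALUES (?, ?, ?, ?)", ROWS)

# PROJECT: keep only the named columns, for every row.
sensor_ids = [row[0] for row in
              conn.execute("SELECT SensorID FROM Reading ORDER BY ID")]
distinct_ids = [row[0] for row in
                conn.execute("SELECT DISTINCT SensorID FROM Reading ORDER BY SensorID")]
time_value = conn.execute("SELECT Time, Value FROM Reading ORDER BY ID").fetchall()
```

SQL keeps duplicate values in a projection, as the slide shows for SensorID. In relational algebra a relation is a set, so the strict equivalent of PROJECT is `SELECT DISTINCT`.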

170
Relational Set Operators (5 of 13)
• UNION
• Combines all rows from two tables, excluding duplicate rows
• Union-compatible: tables share the same number of columns, and their corresponding
columns share compatible domains
• No conditions
SELECT column_name(s) FROM table1
UNION
SELECT column_name(s) FROM table2;
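A small sketch with Python's built-in `sqlite3` module shows the duplicate handling. The tables `A` and `B` and their contents are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE A (name TEXT);
CREATE TABLE B (name TEXT);
INSERT INTO A VALUES ('Ann'), ('Bob');
INSERT INTO B VALUES ('Bob'), ('Cy');
""")

# UNION removes duplicate rows; UNION ALL keeps them.
union_rows = conn.execute(
    "SELECT name FROM A UNION SELECT name FROM B ORDER BY name").fetchall()
union_all_rows = conn.execute(
    "SELECT name FROM A UNION ALL SELECT name FROM B ORDER BY name").fetchall()
```

Both tables have one column of the same type, so they are union-compatible. 'Bob' appears once in `union_rows` and twice in `union_all_rows`.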

171
Relational Set Operators (6 of 13)
• Full Outer Join
• Returns every row from both the left and the right table; where a row has no match
in the other table, the missing columns are filled with nulls
• Unlike UNION, the tables do not need to be union-compatible; only the join
columns must be comparable
SELECT * FROM table_A
FULL OUTER JOIN table_B
ON table_A.A = table_B.A;

172
What we learned so far

Single-table (unary) operations
✓ Select (Restrict)
✓ Project

Multiple-table operations (set operations, as in math)
✓ Union
• Intersect
• Difference
• Product
• Division

Joins: multiple-table operations with selection options
• Natural Join
• Equijoin
• Theta Join (non-equijoin)
• Inner Join
• Left Outer Join
• Right Outer Join
✓ Full (Outer) Join
• Cross Join

173
Relational Set Operators (7 of 13)
• Equijoin
• The SQL equijoin is a simple join that uses the equal sign (=) as the comparison operator in
the join condition.
• It has two types: outer join and inner join

SELECT * FROM Employee
FULL OUTER JOIN Departments ON Employee.EmpID = Departments.EmpID;

• Theta (non-equijoin)
• The SQL non-equijoin is a join that uses a comparison operator other than the equal sign, such as >, <,
>=, or <=, in the join condition

SELECT * FROM Employee
FULL OUTER JOIN Departments ON Employee.EmpID < 200;
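The difference between the two join types is easiest to see on a few rows. This sketch uses Python's built-in `sqlite3` module with inner joins; the tables, columns, and rows are hypothetical and do not match the `Employee` and `Departments` schema used in the slide's queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Department (DeptID TEXT PRIMARY KEY, Name TEXT);
CREATE TABLE Employee   (EmpID INTEGER PRIMARY KEY, DeptID TEXT);
INSERT INTO Department VALUES ('HR', 'Human Resources'), ('IT', 'Info Tech');
INSERT INTO Employee   VALUES (100, 'HR'), (150, 'IT'), (200, 'IT');
""")

# Equijoin: rows are paired when the compared columns are equal.
equi = conn.execute("""
    SELECT e.EmpID, d.Name
    FROM Employee AS e JOIN Department AS d ON e.DeptID = d.DeptID
    ORDER BY e.EmpID""").fetchall()

# Theta (non-equi) join: rows are paired by another comparison, here <.
theta = conn.execute("""
    SELECT a.EmpID, b.EmpID
    FROM Employee AS a JOIN Employee AS b ON a.EmpID < b.EmpID
    ORDER BY a.EmpID, b.EmpID""").fetchall()
```

The theta join here pairs each employee with every employee that has a higher ID, which is a common way to list unordered pairs without repeats.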
174
Relational Set Operators (8 of 13)
• Intersect
• Yields only the rows that appear in both tables A∩B
• Tables must be union-compatible to yield valid results

Inner join Left Join Right Join

175
Relational Set Operators (9 of 13)

176
Relational Set Operators (10 of 13)
• Natural Join
• Similar to an equijoin, but the join condition is implicit: tables are matched on all columns that have the same
name and type in both tables. The repeated (common) columns appear only once in the result.
• In practice, SQL mostly uses the INNER JOIN form of SELECT with an explicit ON predicate; unlike NATURAL JOIN,
that form keeps the join columns from both tables

SELECT * FROM Table1 NATURAL JOIN Table2;

SELECT tb1.ssn FROM Table1 tb1 INNER JOIN Table2 tb2
ON tb1.ssn = tb2.ssn;

A natural join is a three-step process:
PRODUCT > SELECT > PROJECT
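
The three steps can be sketched with Python's sqlite3 module: build the product, select the rows whose common column matches, then project so the common column appears once. The name and dept columns and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Table1 (ssn INTEGER, name TEXT);
    CREATE TABLE Table2 (ssn INTEGER, dept TEXT);
    INSERT INTO Table1 VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO Table2 VALUES (1, 'Sales'), (2, 'IT');
""")

# Step 1, PRODUCT: every row of Table1 paired with every row of Table2.
product_count = conn.execute(
    "SELECT COUNT(*) FROM Table1 CROSS JOIN Table2"
).fetchone()[0]

# Steps 2 and 3, SELECT then PROJECT: keep pairs with equal ssn,
# and list ssn only once in the output.
three_step = conn.execute("""
    SELECT tb1.ssn, tb1.name, tb2.dept
    FROM Table1 tb1 CROSS JOIN Table2 tb2
    WHERE tb1.ssn = tb2.ssn
    ORDER BY tb1.ssn
""").fetchall()

# The same result written as a natural join.
natural = conn.execute(
    "SELECT * FROM Table1 NATURAL JOIN Table2 ORDER BY ssn"
).fetchall()

print(product_count)
print(three_step)
print(natural)
```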

177
Relational Set Operators (11 of 13)
• PRODUCT
• Yields all possible pairs of rows from two tables
• Also called a Cartesian product or cross join (advanced class topic)
• DIVIDE
• Yields the values in one table that are associated with every row of the other table
(for example, the suppliers that supply every part)
• The divisor's columns must also appear in the dividend table
• SQL has no DIVIDE operator; in practice, division is written with a
subquery, typically "WHERE NOT EXISTS ... EXCEPT"
• Example:
SELECT * FROM suppliers AS s
WHERE NOT EXISTS ((SELECT p.pid FROM parts AS p)
EXCEPT
(SELECT sp.pid FROM supplies sp WHERE sp.sid = s.sid));
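
A hedged, runnable version of the division example, using Python's sqlite3 module. The table names follow the slide; the sname column and the sample rows are assumptions. SQLite does not accept parentheses around the individual SELECTs of a compound query, so they are dropped here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE suppliers (sid INTEGER, sname TEXT);
    CREATE TABLE parts (pid INTEGER);
    CREATE TABLE supplies (sid INTEGER, pid INTEGER);
    INSERT INTO suppliers VALUES (1, 'Acme'), (2, 'Bolt');
    INSERT INTO parts VALUES (10), (20);
    -- Acme supplies both parts; Bolt supplies only part 10.
    INSERT INTO supplies VALUES (1, 10), (1, 20), (2, 10);
""")

# A supplier qualifies when "all parts EXCEPT the parts it supplies"
# is empty, i.e. there is no part that it fails to supply.
supplies_everything = conn.execute("""
    SELECT s.sname FROM suppliers AS s
    WHERE NOT EXISTS (
        SELECT p.pid FROM parts AS p
        EXCEPT
        SELECT sp.pid FROM supplies AS sp WHERE sp.sid = s.sid
    )
    ORDER BY s.sname
""").fetchall()

print(supplies_everything)
```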

178
Relational Set Operators (12 of 13)
• DIFFERENCE
• Also called the minus operator: Oracle uses "MINUS", while standard SQL uses "EXCEPT" (not "-")
• Yields all rows in one table that are not found in the second table.

SELECT column1, column2, ... columnN
FROM table_name
WHERE condition
MINUS (or EXCEPT)
SELECT column1, column2, ... columnN
FROM table_name
WHERE condition;
• A left outer join and a difference are not the same operation; do not confuse them!
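
To make the distinction concrete, here is a minimal sketch in Python's sqlite3 module that runs a difference and a left outer join over the same two tables. SQLite uses the EXCEPT keyword; the table names and rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE table_a (id INTEGER);
    CREATE TABLE table_b (id INTEGER);
    INSERT INTO table_a VALUES (1), (2), (3);
    INSERT INTO table_b VALUES (2), (3), (4);
""")

# DIFFERENCE: only the rows of table_a that are absent from table_b.
difference = conn.execute("""
    SELECT id FROM table_a
    EXCEPT
    SELECT id FROM table_b
""").fetchall()

# LEFT OUTER JOIN: every row of table_a is kept; unmatched rows
# simply get NULL in the table_b column.
left_join = conn.execute("""
    SELECT a.id, b.id
    FROM table_a a LEFT JOIN table_b b ON a.id = b.id
    ORDER BY a.id
""").fetchall()

print(difference)
print(left_join)
```

The difference returns one row, while the left outer join returns three.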

179
Relational Set Operators (13 of 13)

180
What we learned so far

Single Table (Unary Operations)
✓ Select (Restrict)
✓ Project

Multiple Table Operations (in Math)
✓ Union
✓ Intersect
✓ Difference
✓ Product
✓ Division

Joins: Multiple Table Operations
✓ Natural Join with selection options
✓ Equijoin
✓ Theta Join (non-equijoin)
✓ Inner Join
✓ Left Outer Join
✓ Right Outer Join
✓ Full (Outer) Join
• Cross Join

181
Confused ?

182
Data Dictionary / System Catalog
• Describes all tables, views, etc. in the database that were created by users and
designers; it is vital to the database.
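
As a small illustration of a system catalog, the sketch below uses SQLite through Python's sqlite3 module. SQLite is an assumption chosen because it ships with Python; its catalog is the sqlite_master table, and other DBMSs expose their own catalog views. The employee table and view are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, name TEXT);
    CREATE VIEW employee_names AS SELECT name FROM employee;
""")

# The catalog is itself a table, so it is queried with ordinary SQL.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()

# Column-level metadata for one table (SQLite-specific PRAGMA);
# the second field of each row is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]

print(catalog)
print(columns)
```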

183
Microsoft System Catalog
• https://docs.microsoft.com/en-us/sql/relational-databases/system-catalog-views/querying-the-sql-server-system-catalog-faq?view=sql-server-2017

• General uses:
• Documentation
• Communicating with other users/programs about a common meaning of elements
• Troubleshooting

• Homonyms and synonyms must be avoided to lessen confusion
• Homonym: same name is used to label different attributes
• Synonym: different names are used to describe the same attribute
184
Relationships within the Relational Database (1 of 4)
• One-to-many (1:M)
• Most common norm for relational databases
• One-to-one (1:1)
• One entity can be related to only one other entity, and vice versa; this kind of relationship is very rare
• Many-to-many (M:N)
• Implemented by creating a new entity in 1:M relationships with the original entities; the most complex and centralized
tables have this kind of relationship
• Composite entity required (i.e., bridge or associative entity): helps avoid problems inherent to M:N relationships
• A logical third table is needed to resolve the relationship; it has a 1:M relationship with each of the original entities
• Includes the primary keys of the tables to be linked
• Self-Referencing (AKA Recursive) Relationship
• Used when a table needs to have a relationship with itself
• Examples: sponsorship, customer referral, assembly parts in a part table, etc.

• One-to-None
• Standalone tables with no relationship to any other table
• Examples: log tables, some security tables, etc.
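
The self-referencing (recursive) relationship from the list above can be sketched with a customer referral table, using Python's sqlite3 module. The table layout, the referred_by column, and the sample rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        cust_id     INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        -- Foreign key that points back to the same table.
        referred_by INTEGER REFERENCES customer (cust_id)
    );
    INSERT INTO customer VALUES (1, 'Ann', NULL), (2, 'Bob', 1), (3, 'Cy', 1);
""")

# Join the table to itself: alias c is the customer, alias r the referrer.
referrals = conn.execute("""
    SELECT c.name, r.name
    FROM customer c LEFT JOIN customer r ON c.referred_by = r.cust_id
    ORDER BY c.cust_id
""").fetchall()

print(referrals)
```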

185
Relationships within the Relational Database (2 of 4)
One-to-many (1:M): the most common norm for relational databases

186
Relationships within the Relational Database (3 of 4)
One-to-one (1:1)
One entity can be related to only one other entity, and vice versa; this kind of relationship is very
rare

187
Relationships within the Relational Database (4 of 4)
Many-to-many (M:N)

You always need a third (bridge) table to implement an M:N relationship!
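
A minimal sketch of resolving an M:N relationship with a bridge table, using Python's sqlite3 module. The student, class, and enroll tables and their rows are hypothetical; the bridge table's primary key is the combination of the two foreign keys.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (stu_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE class (class_id INTEGER PRIMARY KEY, title TEXT);
    -- Bridge (composite) entity: 1:M to student and 1:M to class.
    CREATE TABLE enroll (
        stu_id   INTEGER REFERENCES student (stu_id),
        class_id INTEGER REFERENCES class (class_id),
        PRIMARY KEY (stu_id, class_id)
    );
    INSERT INTO student VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO class VALUES (10, 'DB'), (20, 'Stats');
    INSERT INTO enroll VALUES (1, 10), (1, 20), (2, 10);
""")

enrollments = conn.execute("""
    SELECT s.name, c.title
    FROM enroll e
    JOIN student s ON s.stu_id = e.stu_id
    JOIN class c ON c.class_id = e.class_id
    ORDER BY s.name, c.title
""").fetchall()

# The composite primary key rejects a repeated (student, class) pair.
try:
    conn.execute("INSERT INTO enroll VALUES (1, 10)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(enrollments)
print(duplicate_rejected)
```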


188
Indexes
• Orderly arrangement to logically access rows in a table
• Index key: the index's reference point that leads to the data location identified by the
key
• Unique index: index key can have only one pointer value associated with it
• Each index is associated with only one table
• The index key can have multiple attributes
• They are not free! They are especially costly in cloud environments
• They increase query performance but also consume additional table space
• Indexing is a key element of an RDBMS
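
A short sketch of a unique index and a multi-attribute index, using Python's sqlite3 module. The employee table, index names, and rows are assumptions; EXPLAIN QUERY PLAN is SQLite-specific and is used here only to check whether the planner reports using the index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER, last_name TEXT, first_name TEXT);
    -- Unique index: each key value may point to only one row.
    CREATE UNIQUE INDEX idx_emp_id ON employee (emp_id);
    -- Index key made of multiple attributes.
    CREATE INDEX idx_emp_name ON employee (last_name, first_name);
    INSERT INTO employee VALUES (1, 'Smith', 'Ann');
""")

# A second row with the same emp_id violates the unique index.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Jones', 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Ask SQLite how it would run a lookup by last name; the last field
# of each plan row is a human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM employee WHERE last_name = 'Smith'"
).fetchall()
uses_index = any("idx_emp_name" in row[-1] for row in plan)

print(duplicate_rejected)
print(uses_index)
```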

189
Codd’s Relational Database Rules (1 of 2)
Rule | Rule Name | Description
1 | Information | All information in a relational database must be logically represented as column values in rows within tables.
2 | Guaranteed access | Every value in a table is guaranteed to be accessible through a combination of table name, primary key value, and column name.
3 | Systematic treatment of nulls | Nulls must be represented/supported and treated in a systematic way, independent of data type.
4 | Dynamic online catalog based on the relational model | The metadata must be stored and managed as ordinary data, that is, in tables within the database; such data must be available to authorized users using the standard database relational language. A data dictionary/catalog must exist.
5 | Comprehensive data sublanguage | The relational database may support many languages; however, it must support one well-defined, declarative language as well as data definition, view definition, data manipulation (interactive and by program), integrity constraints, authorization, and transaction management (begin, commit, and rollback). In practice, this language is SQL.
6 | View updating | Any view that is theoretically updatable must be updatable through the system. Logical view tables must be updatable through the system.
7 | High-level insert, update, and delete | The database must support set-level inserts, updates, and deletes.
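
Two of these rules are easy to observe directly. The sketch below, using Python's sqlite3 module, shows a set-level update (rule 7) and the systematic treatment of nulls (rule 3); the employee table and its rows are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (emp_id INTEGER, name TEXT, salary REAL, dept TEXT);
    INSERT INTO employee VALUES
        (1, 'Ann', 100.0, 'IT'),
        (2, 'Bob', NULL,  'IT'),
        (3, 'Cy',  200.0, 'HR');
""")

# Rule 7: one statement updates a whole set of rows, no row-by-row loop.
cursor = conn.execute("UPDATE employee SET dept = 'Tech' WHERE dept = 'IT'")
rows_updated = cursor.rowcount

# Rule 3: NULL is not a value, so "= NULL" never matches;
# IS NULL is the systematic way to test for it.
equals_null = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE salary = NULL"
).fetchone()[0]
is_null = conn.execute(
    "SELECT COUNT(*) FROM employee WHERE salary IS NULL"
).fetchone()[0]

print(rows_updated, equals_null, is_null)
```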

190
Codd’s Relational Database Rules (2 of 2)
Rule | Rule Name | Description
8 | Physical data independence | Application programs and ad hoc facilities are logically unaffected when physical access methods or storage structures are changed.
9 | Logical data independence | Application programs and ad hoc facilities are logically unaffected when changes are made to the table structures that preserve the original table values (changing order of columns or inserting columns).
10 | Integrity independence | All relational integrity constraints must be definable in the relational language and stored in the system catalog, not at the application level.
11 | Distribution independence | The end users and application programs are unaware of and unaffected by the data location (distributed vs. local databases).
12 | Nonsubversion | If the system supports low-level access to the data, users must not be allowed to bypass the integrity rules of the database. SQL is a high-level database query language.
13 | Rule zero / Foundation rule | A relational database management system must be able to use the relational model functionalities to organize, store, retrieve, and manipulate the data.

1. FoxPro database system follows a minimum of 7 Codd's rules.
2. Microsoft SQL Server follows around 11 Codd's rules.
3. Oracle database system follows 11 Codd's rules as well.
191
