DBMS Chapter 2
DATABASE MODELING
Database Environment
A major aim of a database system is to provide users with an abstract view
of data, hiding certain details of how the data is stored and manipulated.
Since a database is a shared resource, each user may require a
different view of the data.
To satisfy these needs, the architecture of most commercial
DBMSs available today is based on the so-called ANSI-SPARC
architecture (American National Standards Institute, Standards
Planning and Requirements Committee).
An early proposal for a standard terminology and general
architecture for databases was produced in 1971 by the DBTG
(Data Base Task Group).
– Two-level architecture (schema, subschema)
ANSI-SPARC produced a similar terminology and
architecture in 1975, but with a three-level approach
(external, conceptual, internal).
The external level is the way users perceive the data.
The internal level is the way the DBMS and the OS perceive
the data.
The conceptual level provides the mapping, and the desired
independence, between the external and internal levels.
Below the internal level there is a physical level that is
managed by the OS under the direction of the DBMS.
The Three-Level ANSI-SPARC Architecture
The objective of the three-level architecture is to separate each user's view of
the database from the way the database is physically represented.
Why is this separation desirable?
Each user should be able to access the same data, but have a different
customized view of the data.
Since a database is a shared resource, each user may require a different
view of the data held in the database.
Each user should be able to change the way he or she views the data,
without affecting other users.
Users should not have to deal directly with physical database storage
details.
The DBA should be able to change the database storage structures without
affecting the users' views.
A) External level:
The users' view of the database.
Each user's view has an external schema that describes the
part of the database relevant to that user.
The database therefore includes a number of external schemas
or user views, each describing the part of the database of
interest to a particular group of users.
This allows the users to see only those parts of the database
that are relevant to them.
For example, one user may view dates in the form
(day, month, year), while another may view dates as
(year, month, day); some views may include derived or
calculated data that is not actually stored in the database.
Entities, attributes, or relationships that are not of interest to
the users may still be represented in the database, but the
users will be unaware of them.
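The two date views above can be sketched with SQL views defined over a single stored format. This is a minimal illustration using Python's built-in sqlite3; the table, view, and column names are invented for the example:

```python
import sqlite3

# One stored (internal) representation, two external views of the same data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (name TEXT, birth_date TEXT)")  # stored as YYYY-MM-DD
con.execute("INSERT INTO student VALUES ('Abebe', '2001-05-14')")

# One user group sees dates as day/month/year ...
con.execute("""CREATE VIEW student_dmy AS
               SELECT name, strftime('%d/%m/%Y', birth_date) AS birth_date
               FROM student""")
# ... another sees the very same stored data as year-month-day.
con.execute("""CREATE VIEW student_ymd AS
               SELECT name, strftime('%Y-%m-%d', birth_date) AS birth_date
               FROM student""")

print(con.execute("SELECT * FROM student_dmy").fetchone())  # ('Abebe', '14/05/2001')
print(con.execute("SELECT * FROM student_ymd").fetchone())  # ('Abebe', '2001-05-14')
```

Each view is an external schema over the same base data: neither user needs to know how the date is actually stored.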
B) Conceptual level
The community view of the database. It has a conceptual
schema that describes what data is stored in the database
and the relationships among the data.
The conceptual level represents:
All entities, attributes, and their relationships,
The constraints on the data,
Semantic information about the data,
Security and integrity information.
– It is a complete view of the data requirements of
the organization.
– Any data available to a user must be contained in,
or derivable from, the conceptual level.
C) Internal level:
Describes the physical storage structure of the database in the
computer.
It has an internal schema which defines the storage of data,
using a physical data model that shows how data is
organized on the machine.
The internal level is concerned with such things as:
Storage space allocation for data
Record descriptions for storage
Record placement
Within the three-level architecture, conceptual modeling
(conceptual database design) is the "heart" of the database,
because it is independent of the target DBMS and of application
programs.
The following two figures describe the three levels of the
database architecture.
• ANSI-SPARC Architecture and Database Design Phases
Schemas, Instances and Data Models
In any data model, it is important to distinguish between the
description of the database and the database itself.
The description of a database is called the database
schema, which is specified during database design and is
not expected to change frequently.
Most data models have certain conventions for displaying
schemas as diagrams.
A displayed schema is called a schema diagram.
The following figure shows a schema diagram for the
database.
The diagram displays the structure of each record type but
not the actual instances of records.
We call each object in the schema—such as STUDENT or
COURSE—a schema construct.
• Figure: Schema Diagram
A schema diagram displays only some aspects of a schema,
such as the names of record types and data items, and some
types of constraints.
Other aspects are not specified in the schema diagram; for
example, the previous figure shows neither the data type
of each data item nor the relationships among the various
files.
Many types of constraints are not represented in schema
diagrams.
A constraint such as students majoring in computer science
must take CS1310 before the end of their sophomore year is
quite difficult to represent diagrammatically.
The actual data in a database may change quite frequently.
For example, the database shown in the previous figure
changes every time we add a new student or enter a new
grade.
The data in the database at a particular moment in time is
called a database state or snapshot.
It is also called the current set of occurrences or instances
in the database.
In a given database state, each schema construct has its
own current set of instances; for example, the STUDENT
construct will contain the set of individual student entities
(records) as its instances.
Many database states can be constructed to correspond to
a particular database schema.
Every time we insert or delete a record or change the
value of a data item in a record, we change one state of
the database into another state.
The distinction between database schema and database state
is very important.
When we define a new database, we specify its database
schema only to the DBMS.
At this point, the corresponding database state is the empty
state with no data.
We get the initial state of the database when the database is
first populated or loaded with the initial data
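The progression from empty state to initial state to later states can be sketched as follows. This is a minimal illustration with Python's sqlite3; the STUDENT-like table is hypothetical:

```python
import sqlite3

# The schema is defined once; each insert or delete then moves the
# database from one state (snapshot) to another.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")  # the schema

# Empty state: the schema exists, but there is no data yet.
assert con.execute("SELECT COUNT(*) FROM student").fetchone()[0] == 0

con.execute("INSERT INTO student VALUES (1, 'Sara')")   # initial (loaded) state
con.execute("INSERT INTO student VALUES (2, 'Biruk')")  # another valid state
assert con.execute("SELECT COUNT(*) FROM student").fetchone()[0] == 2

con.execute("DELETE FROM student WHERE id = 2")         # yet another state
assert con.execute("SELECT COUNT(*) FROM student").fetchone()[0] == 1
```

The schema never changed here; only the state did, which is exactly the distinction the text draws.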
Data Models
A data model is a collection of concepts that can be used to
describe the structure of a database, or
a relatively simple representation, usually graphical, of
more complex real-world data structures.
By the structure of a database we mean the data types,
relationships, and constraints that apply to the data.
In general terms, a model is an abstraction of a more
complex real-world object or event.
A model’s main function is to help you understand the
complexities of the real-world environment.
Within the database environment, a data model represents:
data structures and their characteristics,
relationships, constraints,
transformations, and
other constructs with the purpose of supporting a specific
problem domain.
Note:-The terms data model and database model are
often used interchangeably.
Data modeling is an iterative, progressive process.
You start with a simple understanding of the problem
domain, and as your understanding of the problem
domain increases, so does the level of detail of the data
model.
The final data model is in effect a “blueprint” containing
all the instructions to build a database that will meet all
end-user requirements.
This blueprint is narrative and graphical in nature,
meaning that it contains both text descriptions in plain,
unambiguous language and clear, useful diagrams
depicting the main data elements.
An implementation-ready data model should contain at
least the following components:
A description of the data structure that will store the
end-user data.
A set of enforceable rules to guarantee the integrity of
the data.
A data manipulation methodology to support the real-
world data transformations.
Keep in mind that a house blueprint is an abstraction;
you cannot live in the blueprint.
Similarly, the data model is an abstraction; you cannot
draw the required data out of the data model.
Just as you are not likely to build a good house without
a blueprint, you are equally unlikely to create a good
database without first creating an appropriate data
model.
Categories of Data Models
Data models can be categorized according to the types of
concepts they use to describe the database structure.
High-level or conceptual data models provide concepts that
are close to the way many users perceive data.
Low-level or physical data models provide concepts that
describe the details of how data is stored on computer
storage media, typically magnetic disks.
Concepts provided by low-level data models are generally
meant for computer specialists, not for end users.
Between these two extremes is a class of representational (or
implementation) data models.
These models provide concepts that may be easily understood by
end users but that are not too far removed from the way data
is organized in computer storage.
Representational data models hide many details of data
storage on disk but can be implemented on a computer system
directly.
A data model is a description of the way that data is stored in
a database.
A data model helps us understand the relationships between
entities and create the most effective structure to hold
data.
A data model is a collection of tools or concepts for describing:
Data
Data relationships
Data semantics
Data constraints
The main purpose of a data model is to represent the data in
an understandable way.
Categories of data models include:
Record-based
Object-based
Physical
Record-based Data Models:
Consist of a number of fixed-format records.
Each record type defines a fixed number of fields,
and each field is typically of fixed length.
Hierarchical Data Model
Network Data Model
Relational Data Model
Entity Relationship Model
1. Hierarchical Model
The simplest data model.
A record type is referred to as a node or segment.
The top node is the root node.
Nodes are arranged in a hierarchical structure, as a sort of
upside-down tree.
A parent node can have more than one child node.
A child node can have only one parent node.
The relationship between parent and child is one-to-many.
Relationships are established by creating physical links between
stored records (each is stored with a predefined access path
to other records).
To add a new record type or relationship, the database must
be redefined and then stored in a new form.
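The parent-child navigation described above can be sketched in plain Python. The tree and the `get_next_within_parent` helper are illustrative stand-ins for a real hierarchical DBMS's predefined access paths:

```python
# Every record sits under exactly one parent, so each record is reached
# along a predefined path from the root (names are invented).
tree = {
    "root": "University",
    "children": {
        "CS":   ["Course: CS1310", "Course: CS2210"],
        "Math": ["Course: MA1010"],
    },
}

def get_next_within_parent(parent):
    """Rough analogue of GET NEXT WITHIN PARENT: walk one parent's children in order."""
    yield from tree["children"][parent]

print(list(get_next_within_parent("CS")))  # ['Course: CS1310', 'Course: CS2210']
```

Note that each course appears under exactly one parent: the one-parent rule is what makes many-to-many relationships awkward in this model.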
ADVANTAGES of the Hierarchical Data Model:
Simple to construct and operate on.
Corresponds to a number of naturally hierarchically
organized domains, e.g., assemblies in manufacturing,
personnel organization in companies.
Constructed using a simple language, with constructs like
GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN
PARENT, etc.
DISADVANTAGES of Hierarchical Data Model:
Navigational and procedural nature of processing
Database is visualized as a linear arrangement of records
Little scope for "query optimization"
2. Network Model
Allows record types to have more than one parent,
unlike the hierarchical model.
A network data model sees records as set members.
Each set has an owner and one or more members.
A set itself is one-to-many; direct many-to-many relationships
between entities are not allowed (they must be modeled with
additional record types).
Like the hierarchical model, the network model is a collection
of physically linked records.
Allows member records to have more than one owner.
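Owner/member sets can be sketched the same way. The set names and records below are invented; the point is that one member record (Student_1) participates in sets owned by two different records:

```python
# Each set has one owner record and a list of member records.
sets = {
    ("ADVISES",  "Prof_A"): ["Student_1", "Student_2"],   # set owned by Prof_A
    ("ENROLLED", "CS1310"): ["Student_1", "Student_3"],   # set owned by CS1310
}

def owners_of(member):
    """Find every owner whose set contains the given member record."""
    return [owner for (_, owner), members in sets.items() if member in members]

print(owners_of("Student_1"))  # ['Prof_A', 'CS1310']
```

Student_1 has two owners through two different sets, which is precisely what the hierarchical model cannot express.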
ADVANTAGES of Network Data Model:
Able to model complex relationships and represents semantics of
add/delete on the relationships.
Handle most situations for modeling using record types and
relationship types.
The language is navigational: it uses constructs like FIND, FIND
member, FIND owner, FIND NEXT within set, GET, etc.
Programmers can do optimal navigation through the database.
DISADVANTAGES of Network Data Model:
Navigational and procedural nature of processing
Database contains a complex array of pointers that thread
through a set of records.
Little scope for automated "query optimization”
3. Relational Data Model
Its terminology originates from the branch of mathematics
called set theory and the concept of a relation.
Can define more flexible and complex relationships.
Viewed as a collection of tables called "relations,"
equivalent to a collection of record types.
Relation: a two-dimensional table.
Represents information or data in the form of tables,
with rows and columns.
A row of the table is called a tuple, equivalent to a record.
A column of a table is called an attribute, equivalent to a
field.
A data value is the value of an attribute.
Records are related by the data stored jointly in the fields
of records in two tables or files.
The related tables contain information that creates the
relation.
The tables seem to be independent but are related somehow.
No physical consideration of the storage is required by
the user.
Many tables can be merged together to come up with a new
virtual view of the relationship.
The rows represent records (collections of information about
separate items)
The columns represent fields (particular attributes of a record)
Conducts searches by using data in specified columns of one
table to find additional data in another table
In conducting searches, a relational database matches
information from a field in one table with information in a
corresponding field of another table to produce a third table
that combines requested data from both tables
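A minimal sketch of this matching, using Python's sqlite3 with invented table names: the join matches the `sid` field of one table against the corresponding field of the other and produces a third, combined result:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (sid INTEGER, name TEXT)")
con.execute("CREATE TABLE grade_report (sid INTEGER, course TEXT, grade TEXT)")
con.execute("INSERT INTO student VALUES (17, 'Smith')")
con.execute("INSERT INTO grade_report VALUES (17, 'CS1310', 'B')")

# Matching student.sid with grade_report.sid combines requested
# data from both tables into a third (virtual) table.
rows = con.execute("""SELECT s.name, g.course, g.grade
                      FROM student s JOIN grade_report g ON s.sid = g.sid""").fetchall()
print(rows)  # [('Smith', 'CS1310', 'B')]
```

The two base tables stay independent on disk; the relationship exists purely through the shared `sid` values.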
Properties of Relational Databases
Each row of a table is uniquely identified by a PRIMARY
KEY composed of one or more columns
Each tuple in a relation must be unique
Group of columns, that uniquely identifies a row in a table is
called a CANDIDATE KEY
ENTITY INTEGRITY RULE of the model states that no
component of the primary key may contain a NULL value.
A column or combination of columns that matches the
primary key of another table is called a FOREIGN KEY.
Used to cross-reference tables.
The REFERENTIAL INTEGRITY RULE of the model
states that, for every foreign key value in a table there must
be a corresponding primary key value in another table in
the database or it should be NULL.
All tables are LOGICAL ENTITIES
A table is either a BASE TABLE (named relation) or a
VIEW (unnamed relation).
Only base tables are physically stored.
VIEWS are derived from BASE TABLES with SQL
instructions like: [SELECT .. FROM .. WHERE ..
ORDER BY]
A relational database is a collection of tables:
– Each entity is in one table.
– Attributes are fields (columns) in the table.
The order of rows and columns is immaterial.
Entries with repeating groups are said to be un-normalized.
Entries are single-valued.
Each column (field or attribute) has a distinct name.
All values in a column represent the same attribute and
have the same data format.
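The entity and referential integrity rules above can be demonstrated with a small sqlite3 sketch. Schema names are illustrative; note that SQLite enforces foreign keys only when the pragma is enabled:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
con.execute("CREATE TABLE dept (dcode TEXT PRIMARY KEY NOT NULL, dname TEXT)")
con.execute("""CREATE TABLE student (
                   sid   INTEGER PRIMARY KEY,         -- entity integrity: key unique, non-null
                   dcode TEXT REFERENCES dept(dcode)  -- foreign key to dept's primary key
               )""")

con.execute("INSERT INTO dept VALUES ('CS', 'Computer Science')")
con.execute("INSERT INTO student VALUES (1, 'CS')")  # FK matches an existing primary key: accepted
con.execute("INSERT INTO student VALUES (2, NULL)")  # the referential rule also permits NULL

try:
    con.execute("INSERT INTO student VALUES (3, 'EE')")  # no dept 'EE' exists
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # referential integrity rule enforced by the DBMS
print("rejected:", rejected)  # rejected: True
```

The third insert violates referential integrity (its foreign key value matches no primary key in `dept` and is not NULL), so the DBMS refuses it.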
Building Blocks of the Relational Data Model
The building blocks of the relational data model are:
Entities: real-world physical or logical objects.
Attributes: properties used to describe each entity or real-world
object.
Relationships: the associations between entities.
Constraints: rules that should be obeyed while
manipulating the data.
Relational Data Model
Data Independence
The three-schema architecture can be used to further explain the
concept of data independence.
It can be defined as the capacity to change the schema at one level
of a database system without having to change the schema at the
next higher level.
We can define two types of data independence:
1. Logical data independence is the capacity to change the
conceptual schema without having to change external schemas or
application programs.
We may change the conceptual schema to expand the database (by
adding a record type or data item), to change constraints, or to
reduce the database (by removing a record type or data item).
In the last case, external schemas that refer only to the remaining
data should not be affected.
For example, an existing application may access customer
records in a database.
If an additional attribute is added to the customer schema,
for example a reference indicating passport number, then
only applications or views that need to access the new data
item need to be modified.
2. Physical data independence is the capacity to change the
internal schema without having to change the conceptual
schema.
Hence, the external schemas need not be changed as well.
Changes to the internal schema may be needed because some
physical files were reorganized—for example, by creating
additional access structures—to improve the performance of
retrieval or update.
If the same data as before remains in the database, we should
not have to change the conceptual schema.
An example of such a change could be the addition of a new
index for accessing customer addresses.
This does not affect the conceptual schema; queries on
addresses simply become faster, because the DBMS can
utilize the new access path.
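A small sqlite3 sketch of physical data independence: the query, written against the conceptual schema, is untouched by the addition of the index; only the access path changes. Table and index names are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (cid INTEGER, address TEXT)")
con.executemany("INSERT INTO customer VALUES (?, ?)",
                [(i, f"Street {i}") for i in range(1000)])

# The query refers only to the conceptual schema (table and column names).
query = "SELECT cid FROM customer WHERE address = 'Street 42'"
before = con.execute(query).fetchall()

con.execute("CREATE INDEX idx_addr ON customer(address)")  # internal-level change
after = con.execute(query).fetchall()

assert before == after == [(42,)]  # same conceptual result, new access path
```

Neither the query text nor its result changed; the index is invisible above the internal level.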
Generally, physical data independence exists in most
databases and file environments where physical details such
as:
the exact location of data on disk,
hardware details of storage encoding,
placement, compression,
splitting, merging of records, and so on are hidden from the
user.
Applications remain unaware of these details.
On the other hand, logical data independence is harder to
achieve because:
it allows structural and constraint changes without affecting
application programs—a much stricter requirement.
Whenever we have a multiple-level DBMS, its catalog must
be expanded to include information on how to map requests
and data among the various levels.
The DBMS uses additional software to accomplish these
mappings by referring to the mapping information in
the catalog.
Data independence occurs because when the:
schema is changed at some level,
schema at the next higher level remains unchanged;
only the mapping between the two levels is changed.
Hence, application programs referring to the higher-level
schema need not be changed.
SW Components of a DBMS
DBMSs are highly complex and sophisticated
pieces of software.
It is not possible to generalize the component
structure of a DBMS, as it varies greatly from
system to system.
But we can partition it into several software
components (modules), each with a specific
operation.
• SW Components of a DBMS
Query Processor: transforms queries into low-level
instructions and directs them to the database manager.
DB Manager: accepts instructions and examines the
external and conceptual schemas, then calls the file
manager.
DML Preprocessor: converts DML statements
embedded in application programs into calls in the
host language.
DDL Compiler: converts DDL statements into a set of
tables containing metadata and stores them in the system
catalog.
Catalog Manager: manages access to, and maintains, the
system catalog. It is accessed by most DBMS
components.
• Major Components of DB Manager
Major Components of the DB Manager
Authorization Control: checks the user's authorization.
Command Processor: processes the request.
Query Optimizer: determines the optimal strategy for queries.
Transaction Manager: performs the requested operations.
Scheduler: ensures that concurrent operations proceed without
conflicting with each other.
Recovery Manager: ensures the database remains in a consistent
state in the presence of failures.
Buffer Manager: responsible for transferring data between main
memory and secondary storage.
Multi-User DBMS Architectures
The common architectures for implementing a multi-user
database management system are:
1. Teleprocessing
2. File-Server
3. Client-Server
Teleprocessing
Traditional architecture.
Single mainframe with a number of terminals
attached.
Trend is now towards downsizing.
File-Server
File-server is connected to several workstations
across a network.
Disadvantages include:
Significant network traffic.
Copy of DBMS on each workstation.
Concurrency, recovery and integrity control more
complex.
File-Server Architecture
Traditional Two-Tier Client-Server Architecture
Client (tier 1) manages user interface and runs
applications.
Server (tier 2) holds database and DBMS.
Advantages include:
wider access to existing databases;
increased performance;
possible reduction in hardware costs;
reduction in communication costs;
increased consistency.
Traditional Two-Tier Client-Server
Three-Tier Client-Server
The client side presented two problems preventing true
scalability:
'Fat' client, requiring considerable resources on the client's computer to
run effectively.
Significant client-side administration overhead.
Database Development Lifecycle (a database cannot be
developed by judgment alone!)
Database Planning
System Definition
Requirements Collection and Analysis
Database Design
DBMS Selection
Prototyping
Implementation
Data conversion and loading
Testing
Operational maintenance
Database Development Lifecycle
Database Planning:
Analyze the company situation.
What is the organization's general operating environment, and
what is its mission (problem statement, objectives) within that
environment?
What is the organization's structure?
Identify the work to be done, the resources with which to do it, and the
money to pay for it all.
Preliminary assessment (time, budget, etc.).
Integrated with the overall IS strategy of the organization.
System Definition
How does the existing system function?
What input does the system require?
What documents and reports does the system generate?
How is the system output used? By Whom?
What are the operational relationships among business
units?
What are the limits and constraints imposed on the system?
Use modeling techniques for formal documentation (e.g.,
DFDs).
Requirements Collection and Analysis
Designer’s efforts are focused on
Information needs, Information users.
Information sources. Information constitution.
Sources of information for the designer
Developing and gathering end user data views
Direct observation of the current system: existing and desired output
Interface with the systems design group
The designer must identify the company’s business rules and analyze their
impacts.
Use different fact finding techniques (interview, questionnaire, document
analysis)
Database Design – Converting Requirements into a Database
Major aims:
Identify the functions the database will perform.
Represent the data, and the relationships between data, that are required by all
major application areas and user groups.
Provide a data model that supports any transactions required on the data.
Specify a minimal design that is appropriately structured to achieve the stated
performance requirements for the system, such as response times.
DBMS Selection
The selection of an appropriate DBMS to support the database
application.
Undertaken at any time prior to logical design provided sufficient
information is available regarding system requirements.
Also design the user interface and the application programs using
the selected DBMS
Implementation
The physical realization of the database and application designs.
Use DDL of DBMS to create database schemas and
empty database files.
Use DDL to create any specified user views.
Data Conversion and Loading
Transferring any existing data into the new database and
converting any existing applications to run on the new database.
Only required when a new database system is replacing
an old system.
DBMS normally have a utility that loads existing files
into the new database.
Where applicable, it may be possible to convert
application programs from the old system for use by the
new system.
Testing
The process of executing the application programs with the intent
of finding errors.
Use carefully planned test strategies and realistic data.
Testing cannot show the absence of faults; it can show only
that software faults are present.
Demonstrates that database and application programs appear
to be working according to requirements.
Operational Maintenance
The process of monitoring and maintaining the system
following installation.
Monitoring the performance of the system.
If performance falls, may require reorganization of the
database.
Maintaining and upgrading the database application (when
required).
Incorporating new requirements into the database application.