
CHAPTER TWO

DATABASE MODELING

Database Environment
 A major aim of a database system is to provide users with an abstract view
of the data, hiding certain details of how the data is stored and
manipulated.
 Since a database is a shared resource, each user may require a
different view of the data.
 To satisfy these needs, the architecture of most commercial
DBMSs available today is based on the so-called ANSI-
SPARC architecture (American National Standards Institute,
Standards Planning and Requirements Committee).
 An early proposal for a standard terminology and general
architecture for databases was produced in 1971 by the DBTG
(Data Base Task Group).
– Two – level architecture (Schema, subschema)
 The ANSI/SPARC committee produced a similar terminology and
architecture in 1975,
 but with a three-level approach (external,
conceptual, internal):
 External is the way users perceive the data.
 Internal is the way the DBMS and OS perceive the
data.
 The conceptual level provides the mapping and the
desired independence between the external and
internal levels.
 Below the internal level there is a physical level that must be
managed by the OS under the direction of the DBMS.
The Three-Level ANSI-SPARC Architecture
 The objective of the three-level architecture is to separate each user’s view of
the database from the way the database is physically represented.
 What are the reasons for the desirability of the separation?
 Each user should be able to access the same data, but have a different
customized view of the data.
 Since a database is a shared resource, each user may require a different
view of the data held in the database.
 Each user should be able to change the way he or she views the data,
without affecting other users.
 Users should not have to deal directly with physical database storage
details.
 The DBA should be able to change the database storage structures without
affecting the user’s views.
A) External level:
 The users’ view of the database.
 Each external schema describes the part of the database
that is relevant to a particular user.
 It includes a number of external schemas or user views.
 Each of these views or external schemas describes a part of
the database of interest to a particular group of users.
 This allows the users to see only those parts of the database
that are relevant to them.
 For example, one user may view dates in the form
(day, month, year), while another may view dates as
(year, month, day). Some views may include derived or
calculated data, that is, data not actually stored in the database.
 Entities, attributes or relationships that are not of interest to
the users may still be represented in the database, but the
users will be unaware of them.
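The idea of external schemas can be sketched with SQLite views; all table, column, and view names below are invented for this example, and the derived "age" view uses a fixed reference date so the result is predictable:

```python
import sqlite3

# One base (conceptual/internal) table; dates are stored one way only.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (id INTEGER, name TEXT, birth_date TEXT)")
con.execute("INSERT INTO student VALUES (1, 'Abebe', '2001-05-14')")

# External schema 1: this user sees dates as day/month/year.
con.execute("""CREATE VIEW student_dmy AS
    SELECT name, strftime('%d/%m/%Y', birth_date) AS born FROM student""")

# External schema 2: derived (calculated) data -- an age that is never stored,
# computed against the fixed reference date 2024-01-01.
con.execute("""CREATE VIEW student_age AS
    SELECT name,
           CAST((julianday('2024-01-01') - julianday(birth_date)) / 365.25 AS INTEGER) AS age
    FROM student""")

print(con.execute("SELECT born FROM student_dmy").fetchone()[0])  # 14/05/2001
print(con.execute("SELECT age FROM student_age").fetchone()[0])   # 22
```

Each view presents the same underlying data differently, without the viewing user being aware of how it is actually stored.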
B) Conceptual level
 The community view of the database.
 The conceptual schema describes what data is stored in the
database and the relationships among the data.
 The conceptual level represents:
 All entities, attributes and their relationships,
 The constraints on the data,
 semantic information about the data;
 Security and integrity information.
– It is a complete view of the data requirements of
the organization.
– Any data available to a user must be contained in,
or derivable from, the conceptual level.
C) Internal level:
 describes the physical storage structure of the database in the
computer.
 It has an internal schema which defines the storage of data.
 It uses a physical data model which shows how data is
organized on the machine.
 The internal level is concerned with such things as:
 Storage space allocation for data
 Record description for storage
 Record placement
 Of the three levels, conceptual modeling (conceptual database
design) is the “heart” of the database,
 because it is independent of the target DBMS and application
programs.
 The following two figures describe the three levels of the
database architecture.
• ANSI-SPARC Architecture and Database Design Phases
Schemas, Instances and Data Models
 In any data model, it is important to distinguish between the
description of the database and the database itself.
 The description of a database is called the database
schema, which is specified during database design and is
not expected to change frequently.
 Most data models have certain conventions for displaying
schemas as diagrams.
 A displayed schema is called a schema diagram.
 The following figure shows a schema diagram for the
database.
 The diagram displays the structure of each record type but
not the actual instances of records.
 We call each object in the schema—such as STUDENT or
COURSE—a schema construct.
• Figure: Schema Diagram
 A schema diagram displays only some aspects of a schema,
such as the names of record types and data items, and some
types of constraints.
 Other aspects are not specified in the schema diagram; for
example, the previous figure shows neither the data type
of each data item, nor the relationships among the various
files.
 Many types of constraints are not represented in schema
diagrams.
 A constraint such as “students majoring in computer science
must take CS1310 before the end of their sophomore year” is
quite difficult to represent diagrammatically.
 The actual data in a database may change quite frequently.
 For example, the database shown in the previous figure
changes every time we add a new student or enter a new
grade.
 The data in the database at a particular moment in time is
called a database state or snapshot.
 It is also called the current set of occurrences or instances
in the database.
 In a given database state, each schema construct has its
own current set of instances; for example, the STUDENT
construct will contain the set of individual student entities
(records) as its instances.
 Many database states can be constructed to correspond to
a particular database schema.
 Every time we insert or delete a record or change the
value of a data item in a record, we change one state of
the database into another state.
 The distinction between database schema and database state
is very important.
 When we define a new database, we specify its database
schema only to the DBMS.
 At this point, the corresponding database state is the empty
state with no data.
 We get the initial state of the database when the database is
first populated or loaded with the initial data
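The schema/state distinction can be sketched in a few lines of Python with SQLite; the STUDENT table and its columns here are illustrative, not taken from the chapter's figure:

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Defining the schema to the DBMS yields the empty state: no data yet.
con.execute("CREATE TABLE STUDENT (name TEXT, student_number INTEGER, class INTEGER)")
print(con.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0])  # 0

# Loading the initial data produces the initial database state.
con.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)",
                [("Smith", 17, 1), ("Brown", 8, 2)])

# Every later insert, delete, or update moves the database to another
# state; the schema itself stays unchanged.
print(con.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0])  # 2
```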
Data Models
 A data model is a collection of concepts that can be used to
describe the structure of a database, or
 a relatively simple representation, usually graphical, of
more complex real-world data structures.
 By structure of a database we mean the data types,
relationships, and constraints that apply to the data.
 In general terms, a model is an abstraction of a more
complex real-world object or event.
 A model’s main function is to help you understand the
complexities of the real-world environment.
 Within the database environment, a data model represents:
 data structures and their characteristics,
 relationships, constraints,
 transformations, and
 other constructs with the purpose of supporting a specific
problem domain.
 Note:-The terms data model and database model are
often used interchangeably.
 Data modeling is an iterative, progressive process.
 You start with a simple understanding of the problem
domain, and as your understanding of the problem
domain increases, so does the level of detail of the data
model.
 The final data model is in effect a “blueprint” containing
all the instructions to build a database that will meet all
end-user requirements.
 This blueprint is narrative and graphical in nature,
meaning that it contains both text descriptions in plain,
unambiguous language and clear, useful diagrams
depicting the main data elements.
 An implementation-ready data model should contain at
least the following components:
 A description of the data structure that will store the
end-user data.
 A set of enforceable rules to guarantee the integrity of
the data.
 A data manipulation methodology to support the real-
world data transformations.
 Keep in mind that a house blueprint is an abstraction;
you cannot live in the blueprint.
 Similarly, the data model is an abstraction; you cannot
draw the required data out of the data model.
 Just as you are not likely to build a good house without
a blueprint, you are equally unlikely to create a good
database without first creating an appropriate data
model.
Categories of Data Models
 Data models can be categorized according to the types of
concepts they use to describe the database structure.
 High-level or conceptual data models provide concepts that
are close to the way many users perceive data.
 Low-level or physical data models provide concepts that
describe the details of how data is stored on the computer
storage media, typically magnetic disks.
 Concepts provided by low-level data models are generally
meant for computer specialists, not for end users.
 Between these two extremes is a class of representational (or
implementation) data models.
 These models provide concepts that may be easily understood by
end users but that are not too far removed from the way data
is organized in computer storage.
 Representational data models hide many details of data
storage on disk but can be implemented on a computer system
directly.
 A data model is a description of the way that data is stored in
a database.
 Data model helps to understand the relationship between
entities and to create the most effective structure to hold
data.
 Data Model is a collection of tools or concepts for describing
 Data
 Data relationships
 Data semantics
 Data constraints
 The main purpose of Data Model is to represent the data in
an understandable way.
 Categories of data models include:
 Record-based
 Object-based
 Physical
 Record-based Data Models:
 Consist of a number of fixed-format records.
 Each record type defines a fixed number of fields.
 Each field is typically of a fixed length.
Hierarchical Data Model
Network Data Model
Relational Data Model
Entity Relationship Model
1. Hierarchical Model
 The simplest data model.
 A record type is referred to as a node or segment.
 The top node is the root node.
 Nodes are arranged in a hierarchical structure as a sort of
upside-down tree.
 A parent node can have more than one child node.
 A child node can have only one parent node.
 The relationship between parent and child is one-to-many.
 Relationships are established by creating physical links between
stored records (each is stored with a predefined access path
to other records).
 To add a new record type or relationship, the database must
be redefined and then stored in a new form.
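The parent–child structure above can be sketched as a toy tree in Python; the node names and fields are illustrative, not from any real hierarchical DBMS:

```python
# Each segment keeps a link to its single parent, so every relationship
# is one-to-many from parent to child.
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent          # a child has exactly one parent
        self.children = []            # a parent may have many children
        if parent is not None:
            parent.children.append(self)   # the "physical link"

root = Node("DEPARTMENT")             # the root node of the tree
emp = Node("EMPLOYEE", parent=root)
proj = Node("PROJECT", parent=root)

# "GET NEXT WITHIN PARENT"-style navigation: walk a parent's children in order.
print([c.name for c in root.children])  # ['EMPLOYEE', 'PROJECT']
```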
ADVANTAGES of Hierarchical Data Model:
simple to construct and operate on
Corresponds to a number of natural hierarchically
organized domains - e.g., assemblies in manufacturing,
personnel organization in companies
constructed using a simple language, with constructs like
GET, GET UNIQUE, GET NEXT, GET NEXT WITHIN
PARENT, etc.
DISADVANTAGES of Hierarchical Data Model:
Navigational and procedural nature of processing
Database is visualized as a linear arrangement of records
Little scope for "query optimization"
2. Network Model
 Allows record types to have more than one parent,
unlike the hierarchical model.
 The network data model sees records as set members.
 Each set has an owner and one or more members.
 Does not allow direct many-to-many relationships between
entities (these are represented through intermediate records).
 Like the hierarchical model, the network model is a collection
of physically linked records.
 Allows member records to have more than one owner.
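The owner/member sets can also be sketched as a toy structure in Python; the set and record names are illustrative:

```python
# Each set type has one owner record and many member records; a member
# record may participate in sets owned by different records, giving it
# more than one "parent", unlike the hierarchical model.
class SetType:
    def __init__(self, owner):
        self.owner = owner
        self.members = []

dept = {"name": "Sales"}
proj = {"name": "Alpha"}
emp = {"name": "Lemma"}

works_in = SetType(owner=dept)   # a DEPT-EMPLOYEE set
works_on = SetType(owner=proj)   # a PROJECT-EMPLOYEE set

# The same member record joins both sets, so it has two owners.
works_in.members.append(emp)
works_on.members.append(emp)

print(emp in works_in.members and emp in works_on.members)  # True
```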
ADVANTAGES of Network Data Model:
Able to model complex relationships and represents semantics of
add/delete on the relationships.
Handle most situations for modeling using record types and
relationship types.
Language is navigational: uses constructs like FIND, FIND
member, FIND owner, FIND NEXT within set, GET, etc.
Programmers can do optimal navigation through the database.
DISADVANTAGES of Network Data Model:
Navigational and procedural nature of processing
Database contains a complex array of pointers that thread
through a set of records.
Little scope for automated "query optimization"
3. Relational Data Model
Terminology originates from the branch of mathematics
called set theory and relations.
Can define more flexible and complex relationships.
Viewed as a collection of tables called “relations”,
equivalent to a collection of record types.
Relation: a two-dimensional table.
Represents information or data in the form of tables of
rows and columns.
 A row of the table is called a tuple, equivalent to a record.
 A column of a table is called an attribute, equivalent to a
field.
 A data value is the value of an attribute.
 Records are related by the data stored jointly in the fields
of records in two tables or files.
 The related tables contain information that creates the
relation
 The tables seem to be independent but are somehow
related.
 No physical consideration of the storage is required by
the user
 Many tables are merged together to come up with a new
virtual view of the relationship
 The rows represent records (collections of information about
separate items)
 The columns represent fields (particular attributes of a record)
 Conducts searches by using data in specified columns of one
table to find additional data in another table
 In conducting searches, a relational database matches
information from a field in one table with information in a
corresponding field of another table to produce a third table
that combines requested data from both tables
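This matching of a field in one table against a corresponding field in another is an SQL join; the sketch below uses SQLite with invented table and column names:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE student  (student_no INTEGER, name TEXT);
CREATE TABLE enrolment (student_no INTEGER, course_no TEXT);
INSERT INTO student  VALUES (1, 'Alemu'), (2, 'Sara');
INSERT INTO enrolment VALUES (1, 'CS101'), (2, 'CS102');
""")

# Rows are related purely by the shared data value (student_no), not by
# physical links as in the hierarchical and network models; the result is
# a third "table" combining data from both.
rows = con.execute("""
    SELECT s.name, e.course_no
    FROM student s JOIN enrolment e ON s.student_no = e.student_no
    ORDER BY s.student_no
""").fetchall()
print(rows)  # [('Alemu', 'CS101'), ('Sara', 'CS102')]
```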
Properties of Relational Databases
 Each row of a table is uniquely identified by a PRIMARY
KEY composed of one or more columns
 Each tuple in a relation must be unique
 A group of columns that uniquely identifies a row in a table is
called a CANDIDATE KEY.
 ENTITY INTEGRITY RULE of the model states that no
component of the primary key may contain a NULL value.
 A column or combination of columns that matches the
primary key of another table is called a FOREIGN KEY.
 Used to cross-reference tables.
 The REFERENTIAL INTEGRITY RULE of the model
states that, for every foreign key value in a table there must
be a corresponding primary key value in another table in
the database or it should be NULL.
 All tables are LOGICAL ENTITIES.
 A table is either a BASE TABLE (named relation) or a
VIEW (unnamed relation).
 Only base tables are physically stored.
 VIEWS are derived from BASE TABLES with SQL
instructions like: [SELECT .. FROM .. WHERE ..
ORDER BY]
 A relational database is a collection of tables:
– each entity in one table;
– attributes are fields (columns) in the table.
 Order of rows and columns is immaterial
 Entries with repeating groups are said to be
unnormalized.
 Entries are single-valued
 Each column (field or attribute) has a distinct name
 All values in a column represent the same attribute and
have the same data format.
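The entity and referential integrity rules above can be demonstrated with SQLite; the tables and names are illustrative, and note that SQLite only enforces foreign keys when the pragma is switched on:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces FKs only when enabled
con.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,                    -- entity integrity: never NULL
    name    TEXT,
    dept_id INTEGER REFERENCES department(dept_id)  -- foreign key
);
INSERT INTO department VALUES (10, 'Sales');
INSERT INTO employee VALUES (1, 'Lemma', 10);       -- matching PK exists: accepted
""")

# Referential integrity: a foreign key value with no matching primary key
# in department (and not NULL) is rejected by the DBMS.
try:
    con.execute("INSERT INTO employee VALUES (2, 'Hana', 99)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```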
Building Blocks of the Relational Data Model
 The building blocks of the relational data model are:
 Entities: real world physical or logical object
 Attributes: properties used to describe each Entity or real
world object.
 Relationship: the association between Entities
 Constraints: rules that should be obeyed while
manipulating the data.
Relational Data Model

[Figure: two example relations, Entity: Student and Entity: Course]

 Each of the tables will have a number of columns with
unique names.
 Note that the attribute course no. exists in both.
4. Entity Relationship Model
 The conceptual simplicity of relational database technology
triggered the demand for RDBMSs.
 In turn, rapidly increasing transaction and information
requirements created the need for more complex database
implementation structures, thus creating the need for more
effective database design tools.
 Complex design activities require conceptual simplicity to
yield successful results.
 Although the relational model was a vast improvement over
the hierarchical and network models, it still lacked the
features that would make it an effective database design
tool.
 Because it is easier to examine structures graphically than
to describe them in text, database designers prefer to use a
graphical tool in which entities and their relationships are
pictured.
 Thus, the entity relationship (ER) model, or ERM, has
become a widely accepted standard for data modeling.
 ER model is the graphical representation of entities and
their relationships in a database structure that quickly
became popular because it complemented the relational
data model concepts.
 The relational data model and ERM combined to provide
the foundation for tightly structured database design.
 ER models are normally represented in an entity
relationship diagram (ERD), which uses graphical
representations to model database components.
 The ER model is based on the following components:
1. Entity.
 An entity is defined as anything about which data are to
be collected and stored.
 is represented in the ERD by a rectangle, also known as an
entity box.
 the name of the entity, a noun, is written in the center of the
rectangle.
 the entity name is generally written in capital letters and is
written in the singular form:
 PAINTER rather than PAINTERS, and EMPLOYEE rather
than EMPLOYEES.
 Usually, when applying the ERD to the relational model, an
entity is mapped to a relational table.
 Each row in the relational table is known as an entity
instance or entity occurrence in the ER model.
 Each entity is described by a set of attributes that describes
particular characteristics of the entity.
 For example, the entity EMPLOYEE will have attributes
such as a Social Security number, a last name, and a first
name.
2. Relationships.
 Relationships describe associations among data.
Most relationships describe associations between two
entities.
When the basic data model components were introduced,
three types of relationships among data were illustrated:
One-to-Many (1:M)
Many-to-Many (M:N), and
One-to-One (1:1)
 The ER model uses the term connectivity to label the
relationship types.
 The name of the relationship is usually an active or passive
verb.
For example, a PAINTER paints many PAINTINGs; an
EMPLOYEE learns many SKILLs; an EMPLOYEE
manages a STORE.
Data Independence
 Three of the four important characteristics of the
database approach are:
1. Use of a catalog to store the database
description (schema) so as to make it self-
describing,
2. Insulation of programs and data (program-data
and program-operation independence), and
3. Support of multiple user views.
 In this section we discuss the concept of data
independence.
Data Independence
 The three-schema architecture can be used to further explain the
concept of data independence.
 It can be defined as the capacity to change the schema at one level
of a database system without having to change the schema at the
next higher level.
 We can define two types of data independence:
1. Logical data independence is the capacity to change the
conceptual schema without having to change external schemas or
application programs.
 We may change the conceptual schema to expand the database (by
adding a record type or data item), to change constraints, or to
reduce the database (by removing a record type or data item).
 In the last case, external schemas that refer only to the remaining
data should not be affected.
 For example, an existing application may access customer
records in a database.
 If an additional attribute is added to the customer schema,
for example a reference indicating a passport number, then
only applications or views that need to access the new data
item need to be modified.
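Logical data independence can be demonstrated with a small SQLite sketch (the customer table and column names are invented for this example): an existing query that names only the old columns keeps working after the conceptual schema is expanded.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (cust_id INTEGER, name TEXT)")
con.execute("INSERT INTO customer VALUES (1, 'Meron')")

# An "application" query written before the schema change.
existing_app_query = "SELECT cust_id, name FROM customer"
before = con.execute(existing_app_query).fetchall()

# Expand the conceptual schema: add a passport_no attribute.
con.execute("ALTER TABLE customer ADD COLUMN passport_no TEXT")

# The old application is unaffected; only programs that need the new
# data item would have to change.
after = con.execute(existing_app_query).fetchall()
print(before == after)  # True
```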
2. Physical data independence is the capacity to change the
internal schema without having to change the conceptual
schema.
 Hence, the external schemas need not be changed as well.
 Changes to the internal schema may be needed because some
physical files were reorganized—for example, by creating
additional access structures—to improve the performance of
retrieval or update.
 If the same data as before remains in the database, we should
not have to change the conceptual schema.
 An example of such a change could be the addition of a new
index for accessing customer addresses.
 This does not affect the conceptual schema; queries are simply
executed more efficiently by the DBMS utilizing the new
access path.
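The index example can be sketched in SQLite (table, column, and index names are invented): the internal schema changes, but the query text and its answer do not.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (cust_id INTEGER, address TEXT)")
con.execute("INSERT INTO customer VALUES (1, 'Addis Ababa')")

query = "SELECT cust_id FROM customer WHERE address = 'Addis Ababa'"
before = con.execute(query).fetchall()

# Internal-level change only: a new access structure is created.
con.execute("CREATE INDEX idx_cust_addr ON customer(address)")

after = con.execute(query).fetchall()
print(before == after)  # True: same answer, unchanged query

# The query plan typically now mentions idx_cust_addr, showing the DBMS
# using the new access path transparently.
print(con.execute("EXPLAIN QUERY PLAN " + query).fetchall())
```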
 Generally, physical data independence exists in most
databases and file environments where physical details such
as:
 the exact location of data on disk,
 hardware details of storage encoding,
 placement, compression,
 splitting, merging of records, and so on are hidden from the
user.
 Applications remain unaware of these details.
 On the other hand, logical data independence is harder to
achieve because:
 it allows structural and constraint changes without affecting
application programs—a much stricter requirement.
 Whenever we have a multiple-level DBMS, its catalog must
be expanded to include information on how to map requests
and data among the various levels.
 The DBMS uses additional software to accomplish these
mappings by referring to the mapping information in
the catalog.
 Data independence occurs because when the schema is
changed at some level, the schema at the next higher level
remains unchanged; only the mapping between the two
levels is changed.
 Hence, application programs referring to the higher-level
schema need not be changed.
SW Components of DBMS
 DBMSs are highly complex and sophisticated
pieces of software.
 It is not possible to generalize the component
structure of a DBMS, as it varies greatly from
system to system.
 But we can partition it into several software
components (modules), each with a specific
operation.
• SW Components of a DBMS
 Query processor: transforms queries into low-level
instructions and directs them to the database manager.
 Database manager: accepts instructions and examines the
external and conceptual schemas, then calls the file
manager.
 DML preprocessor: converts DML statements
embedded in application programs into calls in the host
language.
 DDL compiler: converts DDL statements into a set of
tables containing metadata, and stores them in the
system catalog.
 Catalog manager: manages access to, and maintains, the
system catalog, which is accessed by most DBMS
components.
• Major Components of DB Manager
Major Components of the DB Manager
 Authorization control: checks the user’s authorization.
 Command processor: processes the request.
 Query optimizer: determines an optimal strategy for queries.
 Transaction manager: performs the requested operations.
 Scheduler: allows concurrent operations to proceed without
conflicting with each other.
 Recovery manager: ensures the database remains in a
consistent state in the presence of failures.
 Buffer manager: responsible for transferring data between
main memory and secondary storage.
Multi-User DBMS Architectures
The common architectures used to implement a multi-user
database management system are:
1. Teleprocessing
2. File-server
3. Client-server
Teleprocessing
 Traditional architecture.
 Single mainframe with a number of terminals
attached.
 Trend is now towards downsizing.
File-Server
 File-server is connected to several workstations
across a network.

 Database resides on file-server.

 DBMS and applications run on each workstation.

 Disadvantages include:
 Significant network traffic.
 Copy of DBMS on each workstation.
 Concurrency, recovery and integrity control are more
complex.
File-Server Architecture
Client-Server Architecture
 Client (tier 1) manages user interface and runs
applications.
 Server (tier 2) holds database and DBMS.
 Advantages include:
 wider access to existing databases;
 increased performance;
 possible reduction in hardware costs;
 reduction in communication costs;
 increased consistency.
Traditional Two-Tier Client-Server
Three-Tier Client-Server
 The two-tier client side presented two problems preventing true
scalability:
 ‘Fat’ client, requiring considerable resources on the client’s computer to
run effectively.
 Significant client-side administration overhead.
 By 1995, a three-layer architecture was proposed, each layer
potentially running on a different platform.
 Advantages:
 ‘Thin’ client, requiring less expensive hardware.
 Application maintenance centralized.
 Easier to modify or replace one tier without affecting others.
 Separating business logic from database functions makes it easier to
implement load balancing.
 Maps quite naturally to the Web environment.
Three-Tier Client-Server
Database Development Lifecycle (cannot be developed
by judgment alone!)
 Database Planning
 System Definition
 Requirements Collection and Analysis
 Database Design
 DBMS Selection
 Prototyping
 Implementation
 Data conversion and loading
 Testing
 Operational maintenance
Database Development Lifecycle
Database Planning:
 Analyze the Company Situation
 What is the organization’s general operating environment, and
 what is its mission (problem statement, objectives) within that
environment?
 What is the organization’s structure?
 Identifies work to be done; the resources with which to do it; and the money to pay
for it all.
 preliminary assessment (time, budget, etc..)
 Integrated with the overall IS strategy of the organization.
System Definition
 How does the existing system function?
 What input does the system require?
 What documents and reports does the system generate?
 How is the system output used? By Whom?
 What are the operational relationships among business
units?
 What are the limits and constraints imposed on the system?
 Use modeling techniques for formal documentation (e.g.,
DFD).
Requirements Collection and Analysis
 Designer’s efforts are focused on
 Information needs, Information users.
 Information sources. Information constitution.
 Sources of information for the designer
 Developing and gathering end user data views
 Direct observation of the current system: existing and desired output
 Interface with the systems design group
 The designer must identify the company’s business rules and analyze their
impacts.
 Use different fact-finding techniques (interviews, questionnaires, document
analysis).
Database Design – Converting Requirements to a Database
 Major aims
 Identify functions the database will perform
 Represent data and relationships between data that is required by all major
application areas and user groups.
 Provide data model that supports any transactions required on the data.
 Specify a minimal design that is appropriately structured to achieve the stated
performance requirements for the system, such as response times.
DBMS Selection
 The selection of an appropriate DBMS to support the database
application.
 Undertaken at any time prior to logical design provided sufficient
information is available regarding system requirements.
 Also design the user interface and the application programs using
the selected DBMS
Prototyping – Building a working model of a database
application.
 Purpose
 To identify features of a system that work well, or are
inadequate
 To suggest improvements or even new features
 To clarify the users’ requirements
 To evaluate the feasibility of a particular system design.
Implementation
 The physical realization of the database and application designs.
 Use DDL of DBMS to create database schemas and
empty database files.
 Use DDL to create any specified user views.
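The implementation step can be sketched with DDL in SQLite (the staff table and view names are invented for this example): the schema and a user view are created, while the database itself starts out empty.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# DDL creates the database schema (empty tables).
con.execute("CREATE TABLE staff (staff_no INTEGER PRIMARY KEY, name TEXT, salary REAL)")

# DDL also creates a specified user view, here one that hides the
# sensitive salary column from ordinary users.
con.execute("CREATE VIEW staff_public AS SELECT staff_no, name FROM staff")

# The schema is in place; the database state is still empty.
print(con.execute("SELECT COUNT(*) FROM staff_public").fetchone()[0])  # 0
```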
Data Conversion and Loading
 Transferring any existing data into the new database and
converting any existing applications to run on the new database.
 Only required when a new database system is replacing
an old system.
 DBMSs normally have a utility that loads existing files
into the new database.
 Where applicable, it may be possible to convert
application programs from the old system for use by the
new system.
Testing
 The process of executing the application programs with the intent
of finding errors.
 Use carefully planned test strategies and realistic data.
 Testing cannot show the absence of faults; it can show only
that software faults are present.
 Demonstrates that database and application programs appear
to be working according to requirements.
Operational Maintenance
 The process of monitoring and maintaining the system
following installation.
 Monitoring the performance of the system.
 If performance falls, may require reorganization of the
database.
 Maintaining and upgrading the database application (when
required).
 Incorporating new requirements into the database application.