UNIT 1
Consider part of a savings-bank enterprise that keeps information about all customers and
savings accounts. One way to keep the information on a computer is to store it in operating
system files. To allow users to manipulate the information, the system has a number of
application programs that manipulate the files, including programs to debit or credit an
account, add a new account, find an account balance, and generate monthly statements.
System programmers wrote these application programs to meet the needs of the bank.
This typical file-processing system is supported by a conventional operating system. The system
stores permanent records in various files, and it needs different application programs to extract
records from, and add records to, the appropriate files. Before database management systems
(DBMSs) came along, organizations usually stored information in such systems. Keeping
organizational information in a file-processing system has a number of major disadvantages,
such as data redundancy and inconsistency, difficulty in accessing data, data isolation,
integrity problems, atomicity problems, concurrent-access anomalies, and security problems.
View of Data
A major purpose of a database system is to provide users with an abstract view of the
data. That is, the system hides certain details of how the data are stored and maintained.
Data Abstraction
The need for efficiency has led designers to use complex data structures to represent data in the
database. Since many database-systems users are not computer trained, developers hide the
complexity from users through several levels of abstraction, to simplify users’ interactions with
the system.
• Physical level. The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. The logical level thus describes the
entire database in terms of a small number of relatively simple structures. Database
administrators, who must decide what information to keep in the database, use the logical level
of abstraction.
• View level. The highest level of abstraction describes only part of the entire database. Many
users of the database system do not need all this information; instead, they need to access only a
part of the database. The view level of abstraction exists to simplify their interaction with the
system.
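The three levels can be sketched concretely with SQL, run here through Python's built-in sqlite3 module. The table, view, and column names below are illustrative assumptions, not taken from the text: the logical level is what CREATE TABLE describes, a view exposes only part of it, and the physical layout stays hidden inside the engine.

```python
import sqlite3

# In-memory database; schema names here are illustrative assumptions.
conn = sqlite3.connect(":memory:")

# Logical level: what data are stored and how they relate.
conn.execute("CREATE TABLE customer (customer_id TEXT PRIMARY KEY, "
             "customer_name TEXT, customer_city TEXT)")
conn.execute("INSERT INTO customer VALUES ('192-83-7465', 'Johnson', 'Palo Alto')")

# View level: expose only part of the database to a class of users.
conn.execute("CREATE VIEW customer_public AS "
             "SELECT customer_name, customer_city FROM customer")

# A view-level user sees names and cities but never customer ids.
print(conn.execute("SELECT * FROM customer_public").fetchall())
# -> [('Johnson', 'Palo Alto')]

# The physical level (pages, indices, file layout) is managed entirely
# by the engine and is invisible at both levels above.
```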
Instances and Schemas
Databases change over time as information is inserted and deleted. The collection of information
stored in the database at a particular moment is called an instance of the database. The overall
design of the database is called the database schema. Database systems have several schemas,
partitioned according to the levels of abstraction.
The physical schema describes the database design at the physical level, while the logical
schema describes the database design at the logical level. A database may also have several
schemas at the view level, sometimes called subschemas, that describe different views of the
database.
Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for
describing data, data relationships, data semantics, and consistency constraints.
The entity-relationship (E-R) model is one widely used data model. An E-R diagram might
indicate, for example, that there are two entity sets, customer and account, with attributes as
outlined earlier, together with a relationship depositor between customer and account. In
addition to entities and relationships, the E-R model represents certain constraints to which the
contents of a database must conform. One important constraint is mapping cardinalities, which
express the number of entities to which another entity can be associated via a relationship set.
For example, if each account must belong to only one customer, the E-R model can express that
constraint.
Relational Model
The relational model uses a collection of tables to represent both data and the relationships
among those data. Each table has multiple columns, and each column has a unique name.
An example table might show details of bank customers and which accounts belong to which
customers. The customer table could show, for example, that the customer identified by
customer-id 192-83-7465 is named Johnson and lives at 12 Alma St. in Palo Alto.
The relational model is an example of a record-based model. Record-based models are so named
because the database is structured in fixed-format records of several types. Each table contains
records of a particular type. Each record type defines a fixed number of fields, or attributes.
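The three-table design sketched above can be made concrete with SQL, run here through Python's sqlite3 module. The column names and the sample account A-101 with balance 500 are illustrative assumptions; the relationship depositor is itself just another table of fixed-format records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (customer_id TEXT PRIMARY KEY, customer_name TEXT,
                       customer_street TEXT, customer_city TEXT);
CREATE TABLE account  (account_number TEXT PRIMARY KEY, balance INTEGER);
-- The depositor relationship is also just a table of fixed-format records.
CREATE TABLE depositor (customer_id TEXT, account_number TEXT);
""")
conn.execute("INSERT INTO customer VALUES ('192-83-7465', 'Johnson', "
             "'12 Alma St.', 'Palo Alto')")
conn.execute("INSERT INTO account VALUES ('A-101', 500)")
conn.execute("INSERT INTO depositor VALUES ('192-83-7465', 'A-101')")

# Joining the tables reconstructs "which accounts belong to which customers".
rows = conn.execute("""
    SELECT c.customer_name, a.account_number, a.balance
    FROM customer c
    JOIN depositor d ON c.customer_id = d.customer_id
    JOIN account a   ON d.account_number = a.account_number
""").fetchall()
print(rows)  # -> [('Johnson', 'A-101', 500)]
```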
Other Data Models
The object-oriented model can be seen as extending the E-R model with notions of
encapsulation, methods (functions), and object identity. The object-relational data model
combines features of the object-oriented data model and the relational data model.
Semistructured data models permit the specification of data where individual data items
of the same type may have different sets of attributes. This is in contrast with the data models
mentioned earlier, where every data item of a particular type must have the same set of
attributes. The Extensible Markup Language (XML) is widely used to represent semistructured
data.
Two other data models, the network data model and the hierarchical data model, preceded the
relational data model. These models were tied closely to the underlying implementation, and
complicated the task of modeling data.
Database Languages
A database system provides a data definition language to specify the database schema and a
data manipulation language to express database queries and updates. In practice, the data-definition
and data-manipulation languages are not two separate languages; instead they simply form parts of a
single database language, such as the widely used SQL language.
Data-Definition Language
We specify a database schema by a set of definitions expressed by a special language called a
data-definition language (DDL). For instance, the following statement in the SQL language
defines the account table:
create table account
(account-number char(10),
balance integer)
Execution of the above DDL statement creates the account table. In addition, it updates a special
set of tables called the data dictionary or data directory.
A data dictionary contains metadata—that is, data about data. The schema of a table is an
example of metadata. A database system consults the data dictionary before reading or
modifying actual data. We specify the storage structure and access methods used by the database
system by a set of statements in a special type of DDL called a data storage and definition
language. These statements define the implementation details of the database schemas, which are
usually hidden from the users. The data values stored in the database must satisfy certain
consistency constraints. For example, suppose the balance on an account should not fall below
$100. The DDL provides facilities to specify such constraints. The database systems check these
constraints every time the database is updated.
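The DDL statement from the text, its effect on the data dictionary, and the $100 balance constraint can all be demonstrated with sqlite3, whose built-in sqlite_master catalog plays the role of the data dictionary. The CHECK clause and the sample values are assumptions layered on the text's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL adapted from the text; the CHECK clause encodes the consistency
# constraint that a balance must not fall below $100.
conn.execute("""
CREATE TABLE account (
    account_number CHAR(10) PRIMARY KEY,
    balance        INTEGER CHECK (balance >= 100)
)""")

# Executing the DDL also updated the engine's catalog (its data dictionary),
# which records the metadata (schema) of every table.
print(conn.execute("SELECT name FROM sqlite_master WHERE type='table'").fetchall())

# The system now checks the constraint every time the database is updated.
conn.execute("INSERT INTO account VALUES ('A-101', 500)")      # accepted
try:
    conn.execute("INSERT INTO account VALUES ('A-102', 50)")   # rejected
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```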
Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model. The types of access are:
• The retrieval of information stored in the database
• The insertion of new information into the database
• The deletion of information from the database
• The modification of information stored in the database
There are basically two types of DML:
• Procedural DMLs require a user to specify what data are needed and how to
get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what
data are needed without specifying how to get those data. Declarative DMLs are usually easier to
learn and use than are procedural DMLs. However, since a user does not have to specify how to
get the data, the database system has to figure out an efficient means of accessing data. The
DML component of the SQL language is nonprocedural.
A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language.
The following query in the SQL language finds the name of the customer whose customer-id is 192:
select customer.customer-name
from customer
where customer.customer-id = 192
The query specifies that those rows from the table customer where the customer-id is 192 must
be retrieved, and the customer-name attribute of these rows must be displayed.
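The declarative query above can be run as-is through sqlite3; the sample rows inserted first are illustrative assumptions. Note that the query states only what is wanted, never how to scan or index the table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (customer_id TEXT, customer_name TEXT)")
conn.execute("INSERT INTO customer VALUES ('192', 'Johnson')")
conn.execute("INSERT INTO customer VALUES ('677', 'Hayes')")

# The declarative query from the text: it names the data wanted
# (customer_name where customer_id = '192') and nothing about how to fetch it.
rows = conn.execute("""
    SELECT customer.customer_name
    FROM customer
    WHERE customer.customer_id = '192'
""").fetchall()
print(rows)  # -> [('Johnson',)]
```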
Database Access from Application Programs
Application programs are programs that are used to interact with the database. Application
programs are usually written in a host language, such as Cobol, C, C++, or Java. Examples in a
banking system are programs that generate payroll checks, debit accounts, credit accounts, or
transfer funds between accounts. To access the database, DML statements need to be executed
from the host language.
A primary goal of a database system is to retrieve information from and store new information in
the database. People who work with a database can be categorized as database users or database
administrators.
• Naive users interact with the system by invoking one of the application programs that have
been written previously. As an example, consider a user who wishes to find her account balance
over the World Wide Web. Such a user may access a form where she enters her account number.
Naive users may also simply read reports generated from the database.
• Application programmers are computer professionals who write application programs.
Application programmers can choose from many tools to develop user interfaces. Rapid
application development (RAD) tools enable an application programmer to construct forms and
reports without writing a program.
• Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor,
whose function is to break down DML statements into instructions that the storage manager
understands.
• Specialized users are sophisticated users who write specialized database applications that do
not fit into the traditional data-processing framework. Among these applications are computer-aided
design systems, knowledge-base and expert systems, systems that store data with complex data
types (for example, graphics data and audio data), and environment-modeling systems.
Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the
programs that access those data. A person who has such central control over the system is called
a database administrator (DBA). The functions of a DBA include:
•Schema definition. The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
• Schema and physical-organization modification. The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to alter the
physical organization to improve performance.
• Granting of authorization for data access. By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator's routine maintenance activities
are: periodically backing up the database, to prevent loss of data in case of disasters such as
flooding; ensuring that enough free disk space is available and upgrading disk space as required;
and monitoring jobs running on the database to ensure that performance is not degraded.
Database Architecture
Storage Manager
A storage manager is a program module that provides the interface between the low-level data
stored in the database and the application programs and queries submitted to the system. The
storage manager is responsible for the interaction with the file manager. The raw data are stored
on the disk using the file system, which is usually provided by a conventional operating system.
The storage manager translates the various DML statements into low-level file-system
commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in
the database. The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints
and checks the authority of users to access data.
• Buffer manager, which is responsible for fetching data from disk storage into main memory,
and deciding what data to cache in main memory. The buffer manager is a critical part of the
database system, since it enables the database to handle data sizes that are much larger than the
size of main memory.
The storage manager implements several data structures (such as data files, the data dictionary,
and indices) as part of the physical system implementation.
•Transaction manager, which ensures that the database remains in a consistent (correct) state
despite system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands. A query can
usually be translated into any of a number of alternative evaluation plans that all give the same
result. The DML compiler also performs query optimization, that is, it picks the lowest cost
evaluation plan from among the alternatives.
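The idea of alternative evaluation plans can be observed directly in SQLite, whose EXPLAIN QUERY PLAN command reports the plan its optimizer picked. The table and index names below are assumptions; the point is that the same declarative query gets a cheaper plan once an index exists.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (account_number TEXT, balance INTEGER)")

# Without an index, the only available plan is a full scan of the table.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM account WHERE account_number = 'A-101'"
).fetchall()
print(plan)

# After this DDL, the optimizer can choose a cheaper index-based plan
# for the very same query.
conn.execute("CREATE INDEX idx_acct ON account(account_number)")
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM account WHERE account_number = 'A-101'"
).fetchall()
print(plan)
```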
Application Architectures
Database applications are typically broken up into a front-end part that runs at client machines
and a part that runs at the back end. In two-tier architectures, the front end communicates
directly with a database running at the back end. In three-tier architectures, the back-end
part is itself broken up into an application server and a database server.
Entity-Relationship Model
The entity-relationship (E-R) data model perceives the real world as consisting of basic objects,
called entities, and relationships among these objects. It was developed to facilitate database
design by allowing specification of an enterprise schema, which represents the overall logical
structure of a database. The E-R data model is one of several semantic data models; the
semantic aspect of the model lies in its representation of the meaning of the data. The E-R model
is very useful in mapping the meanings and interactions of real-world enterprises onto a
conceptual schema.
Basic Concepts
The E-R data model employs three basic notions: entity sets, relationship sets, and attributes.
Entity Sets
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects.
For example, each person in an enterprise is an entity. An entity has a set of properties, and the
values for some set of properties may uniquely identify an entity. For instance, a person may
have a person-id property whose value uniquely identifies that person. Thus, the value 677-89-
9011 for person-id would uniquely identify one particular person in the enterprise. An entity may
be concrete, such as a person or a book, or it may be abstract, such as a loan, or a holiday, or a
concept.
An entity set is a set of entities of the same type that share the same properties, or attributes. The
set of all persons who are customers at a given bank, for example, can be defined as the entity set
customer. Similarly, the entity set loan might represent the set of all loans awarded by a
particular bank.
Attributes
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by
each member of an entity set. Possible attributes of the customer entity set are customer-id,
customer-name, customer-street, and customer-city. Possible attributes of the loan entity set are
loan-number and amount.
• Simple and composite attributes. In our examples thus far, the attributes have been simple; that
is, they are not divided into subparts. Composite attributes, on the other hand, can be divided
into subparts (that is, other attributes). For example, an attribute name could be structured as a
composite attribute consisting of first-name, middle-initial, and last-name.
• Single-valued and multivalued attributes. The attributes in our examples all have a single
value for a particular entity. For instance, the loan-number attribute for a specific loan entity
refers to only one loan number. Such attributes are said to be single valued. There may be
instances where an attribute has
a set of values for a specific entity. Consider an employee entity set with the attribute phone-
number. An employee may have zero, one, or several phone numbers, and different employees
may have different numbers of phones. This type of attribute is said to be multivalued.
• Derived attribute. The value for this type of attribute can be derived from the values of other
related attributes or entities. As an example, suppose that the customer entity set has an attribute
age, which indicates the customer's age. If the customer entity set also has an attribute
date-of-birth, we can calculate age from date-of-birth and the current date. Thus, age is a derived
attribute. In this case, date-of-birth may be referred to as a base attribute, or a stored attribute.
The value of a derived attribute is not stored, but is computed when required.
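A common way to realize two of the attribute kinds above is sketched below with sqlite3: a multivalued attribute (phone-number) becomes its own table, and a derived attribute (age) is computed from the stored base attribute rather than stored itself. The employee id, dates, and phone values are illustrative assumptions.

```python
import sqlite3
from datetime import date

conn = sqlite3.connect(":memory:")
# Multivalued attribute: phone-number moves into its own table, so an
# employee may have zero, one, or several rows.
conn.executescript("""
CREATE TABLE employee (employee_id TEXT PRIMARY KEY, date_of_birth TEXT);
CREATE TABLE employee_phone (employee_id TEXT, phone_number TEXT);
""")
conn.execute("INSERT INTO employee VALUES ('E-1', '1990-06-15')")
conn.execute("INSERT INTO employee_phone VALUES ('E-1', '555-0101')")
conn.execute("INSERT INTO employee_phone VALUES ('E-1', '555-0102')")

# Derived attribute: age is computed on demand from the stored base
# attribute date_of_birth; it is never stored in the table.
def age(date_of_birth: str, today: date) -> int:
    born = date.fromisoformat(date_of_birth)
    return today.year - born.year - ((today.month, today.day) < (born.month, born.day))

dob = conn.execute(
    "SELECT date_of_birth FROM employee WHERE employee_id = 'E-1'"
).fetchone()[0]
print(age(dob, date(2024, 1, 1)))  # -> 33
```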
Relationship Sets
A relationship is an association among several entities. For example, we can define a
relationship that associates customer Hayes with loan L-15. This relationship specifies that
Hayes is a customer with loan number L-15.
A relationship set is a set of relationships of the same type.
As an example, consider the entity sets employee, branch, and job. Examples of job entities
could include manager, teller, auditor, and so on. Job entities may have the attributes title and
level. The relationship set works-on among employee, branch, and job is an example of a ternary
relationship. A ternary relationship among Jones, Perryridge, and manager indicates that Jones
acts as a manager at the Perryridge branch. Jones could also act as auditor at the Downtown
branch, which would be represented by another relationship.
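One common realization of a ternary relationship set, sketched here with sqlite3, is a table with one column per participating entity set; each row is one relationship. The works-on example follows the text, while the table layout is an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A ternary relationship set becomes a table with one column per
# participating entity set (employee, branch, job).
conn.execute("CREATE TABLE works_on (employee TEXT, branch TEXT, job TEXT)")
conn.execute("INSERT INTO works_on VALUES ('Jones', 'Perryridge', 'manager')")
conn.execute("INSERT INTO works_on VALUES ('Jones', 'Downtown', 'auditor')")

# Each row is one relationship; the same employee may participate in
# several, one per (branch, job) combination.
rows = conn.execute(
    "SELECT branch, job FROM works_on WHERE employee = 'Jones'"
).fetchall()
print(rows)  # -> [('Perryridge', 'manager'), ('Downtown', 'auditor')]
```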
Constraints
An E-R enterprise schema may define certain constraints to which the contents of a database
must conform. In this section, we examine mapping cardinalities and participation constraints,
which are two of the most important types of constraints.
Mapping Cardinalities
Mapping cardinalities, or cardinality ratios, express the number of entities to which another
entity can be associated via a relationship set. Mapping cardinalities are most useful in
describing binary relationship sets, although they can contribute to the description of relationship
sets that involve more than two entity sets. In this section, we shall concentrate on only binary
relationship sets.
For a binary relationship set R between entity sets A and B, the mapping cardinality must be one
of the following:
• One to one. An entity in A is associated with at most one entity in B, and an entity in B is
associated with at most one entity in A
• One to many. An entity in A is associated with any number (zero or more) of entities in B. An
entity in B, however, can be associated with at most one entity in A.
• Many to one. An entity in A is associated with at most one entity in B. An entity in B, however,
can be associated with any number (zero or more) of entities in A.
• Many to many. An entity in A is associated with any number (zero or more) of
entities in B, and an entity in B is associated with any number (zero or more)
of entities in A.
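A mapping cardinality can be enforced in a relational schema; the sketch below, using sqlite3, encodes "each account belongs to at most one customer" (many-to-one from account to customer) with a UNIQUE constraint on the relationship table. The table names and sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (customer_id TEXT PRIMARY KEY);
CREATE TABLE account  (account_number TEXT PRIMARY KEY);
-- Many-to-one from account to customer: UNIQUE on account_number lets
-- each account appear with at most one customer, while one customer
-- may appear with many accounts.
CREATE TABLE depositor (
    customer_id    TEXT REFERENCES customer,
    account_number TEXT UNIQUE REFERENCES account
);
""")
conn.execute("INSERT INTO customer VALUES ('C-1')")
conn.execute("INSERT INTO account VALUES ('A-101')")
conn.execute("INSERT INTO depositor VALUES ('C-1', 'A-101')")
try:
    # Linking A-101 to a second customer violates the at-most-one rule.
    conn.execute("INSERT INTO customer VALUES ('C-2')")
    conn.execute("INSERT INTO depositor VALUES ('C-2', 'A-101')")
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```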
Entity-Relationship Diagram
An E-R diagram can express the overall logical structure of a database graphically. E-R
diagrams are simple and clear—qualities that may well account in large part for the widespread
use of the E-R model. Such a diagram consists of the following major components:
• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationship sets
• Lines, which link attributes to entity sets and entity sets to relationship sets
• Double ellipses, which represent multivalued attributes
• Dashed ellipses, which denote derived attributes
• Double lines, which indicate total participation of an entity in a relationship set
• Double rectangles, which represent weak entity sets
Three classic record-based data models are:
● Hierarchical Model
● Network Model
● Relational Model
Hierarchical Model
In this model each entity has only one parent but can have several children. At the top of the
hierarchy there is only one entity, which is called the root.
Network Model
In the network model, entities are organised in a graph, in which some entities can be accessed
through several paths.
Relational Model
In this model, data is organised in two-dimensional tables called relations. The tables (relations)
are related to each other.
Saveetha School of Engineering
UNIT 1
Consider part of a savings-bank enterprise that keeps information about all customers and savings
accounts. One way to keep the information on a computer is to store it in operating system files. To
allow users to manipulate the information, the system has a number of application programs that
manipulate the files, including
System programmers wrote these application programs to meet the needs of the bank.
This typical file-processing system is supported by a conventional operating system. The system stores
permanent records in various files, and it needs different application programs to extract records from,
and add records to, the appropriate files. Before database management systems (DBMSs) came along,
organizations usually stored information in such systems. keeping organizational information in a file-
processing system has a number of major disadvantages
View of Data
A major purpose of a database system is to provide users with an abstract view of the data. That
is, the system hides certain details of how the data are stored and maintained.
Data Abstraction
. The need for efficiency has led designers to use complex data structures to represent data in the
database. Since many database-systems users are not computer trained, developers hide the complexity
from users through several levels of abstraction, to simplify users’ interactions with the system.
• Physical level. The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures in detail.
• Logical level. The next-higher level of abstraction describes what data are stored in the database, and
what relationships exist among those data. The logical level thus describes the entire database in terms
of a small number of relatively simple structures. Database administrators, who must decide what
information to keep in the database, use the logical level of abstraction.
• View level. The highest level of abstraction describes only part of the entire database. Many users of
the database system do not need all this information; instead, they need to access only a part of the
database. The view level of abstraction exists to simplify their interaction with the system.
Instances and Schemas
Databases change over time as information is inserted and deleted. The collection of information stored
in the database at a particular moment is called an instance of the database. The overall design of the
database is called the database schema. Database systems have several schemas, partitioned according
to the levels of abstraction.
The physical schema describes the database design at the physical level, while the logical schema
describes the database design at the logical level.A database may also have several schemas at the view
level, sometimes called subschemas, that describe different views of the database.
Data Models
Underlying the structure of a database is the data model: a collection of conceptual tools for describing
data, data relationships, data semantics, and consistency constraints.
The overall logical structure (schema) of a database can be expressed graphically by an E-R diagram,
which is built up from the following components:
• Rectangles, which represent entity sets
• Ellipses, which represent attributes
• Diamonds, which represent relationships among entity sets
• Lines, which link attributes to entity sets and entity sets to relationships
E-R diagram indicates that there are two entity sets, customer and account, with attributes as outlined
earlier. The diagram also shows a relationship depositor between customer and account. In addition to
entities and relationships, the E-R model represents certain constraints to which the contents of a
database must conform. One important constraint is mapping cardinalities, which express the number
of entities to which another entity can be associated via a relationship set. For example, if each account
must belong to only one customer, the E-R model can express that constraint
Relational Model
The relational model uses a collection of tables to represent both data and the relationships among
those data. Each table has multiple columns, and each column has a unique name. Example table
shows details of bank customers, shows which accounts belong to which customers. The first table, the
customer table, shows, for example, that the customer identified by customer-id 192-83-7465 is named
Johnson and lives at 12 Alma St. in Palo Alto.
The relational model is an example of a record-based model. Record-based models are so named
because the database is structured in fixed-format records of several types. Each table contains records
of a particular type. Each record type defines a fixed number of fields, or attributes.
The object-oriented model can be seen as extending the E-R model with notions of encapsulation,
methods (functions), and object identityThe object-relational data model combines features of the
object-oriented data model and relational data model.
The Semistructured data models permit the specification of data where individual data items of the
same type may have different sets of attributes. every data item of a particular type must have the same
set of attributes. The extensible markup language (XML) is widely used to represent semi structured
data..
Two other data models, the network data model and the hierarchical data model, preceded the
relational data model. These models were tied closely to the underlying implementation, and
complicated the task of modeling data
Database Languages
A database system provides a data definition language to specify the database schema and a data
manipulation language to express database queries and updates. the data definition and data
manipulation languages are not two separate languages; instead they simply form parts of a single
database language, such as the widely used SQL language.
Data-Definition Language
We specify a database schema by a set of definitions expressed by a special language called a data-
definition language (DDL). For instance, the following statement in the SQL language defines the
account table:
create table account
(account-number char(10),
balance integer)
Execution of the above DDL statement creates the account table. In addition, it updates a special set of
tables called the data dictionary or data directory.
A data dictionary contains metadata—that is, data about data. The schema of a table is an example of
metadata. A database system consults the data dictionary before reading or modifying actual data. We
specify the storage structure and access methods used by the database system by a set of statements in a
special type of DDL called a data storage and definition language. These statements define the
implementation details of the database schemas, which are usually hidden from the users. The data
values stored in the database must satisfy certain consistency constraints. For example, suppose the
balance on an account should not fall below $100. The DDL provides facilities to specify such
constraints. The database systems check these constraints every time the database is updated.
Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate data
as organized by the appropriate data model. There are basically two types
• Procedural DMLs require a user to specify what data are needed and how to
get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are
needed without specifying how to get those data. Declarative DMLs are usually easier to learn and use
than are procedural DMLs. However, since a user does not have to specify how to get the data, the
database system has to figure out an efficient means of accessing data. The DML component of the
SQL language is nonprocedural.
A query is a statement requesting the retrieval of information. The portion of a DML that involves
information retrieval is called a query language.
This query in the SQL language finds the name of the customer whose customer-idis 192
select customer.customer-name
from customer
where customer.customer-id = 192
The query specifies that those rows from the table customer where the customer-id is 192 must be
retrieved, and the customer-name attribute of these rows must be displayed.
Database Access from Application Programs
Application programs are programs that are used to interact with the database. Application
programs are usually written in a host language, such as Cobol, C, C++, or Java. Examples in a
banking system are programs that generate payroll checks, debit accounts, credit accounts, or transfer
funds between accounts. To access the database, DML statements need to be executed from the host
language.
A primary goal of a database system is to retrieve information from and store new information in the
database. People who work with a database can be categorized as database users or database
administrators.
• Naive users users who interact with the system by invoking one of the application programs that have
been written previously. As example, consider a user who wishes to find her account balance over the
World Wide Web. Such a user may access a form, where a person enters her account number.. Naive
users may also simply read reports generated from the database.
• Application programmers are computer professionals who write application programs. Application
programmers can choose from many tools to develop user interfaces. Rapid application development
(RAD) tools are tools that enable an application programmer to construct forms and reports without
writing a program.
• Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language. They submit each such query to a query processor, whose
function is to break down DML statements into instructions that the storage manager understands.
• Specialized users are sophisticated users who write specialized database applications that do not fit
into the traditional data-processing framework. base and expert systems, systems that store data with
complex data types (for example, graphics data and audio data), and environment-modeling systems.
Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and the programs
that access those data. A person who has such central control over the system is called a database
administrator (DBA).
•Schema definition. The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
• Granting of authorization for data access. By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever someone attempts to access the data in the system.
• Routine maintenance. Examples of the database administrator’s routine maintenance activities are:
Periodically backing up the database, to prevent loss of data in case of disasters, flooding,etc.
Ensuring that enough free disk space is available and upgrading disk space if required.
Monitoring jobs running on the database and ensuring that performance is not degraded
Database Architecture
Storage Manager
A storage manager is a program module that provides the interface between the lowlevel data stored in
the database and the application programs and queries submitted to the system. The storage manager is
responsible for the interaction with the file manager. The raw data are stored on the disk using the file
system, which is usually provided by a conventional operating system. The storage manager translates
the various DML statements into low-level file-system commands. Thus, the storage manager is
responsible for storing, retrieving, and updating data in the database.
• Authorization and integrity manager, which tests for the satisfaction of integrity constraints and
checks the authority of users to access data.
• Buffer manager, which is responsible for fetching data from disk storage into main memory, and
deciding what data to cache in main memory. The buffer manager is a critical part of the database
system, since it enables the database to handle data sizes that are much larger than the size of main
memory. The storage manager implements several data structures as part of the physical system
implementation.
•Transaction manager, which ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
• DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation engine understands. A query can usually
be translated into any of a number of alternative evaluation plans that all give the same result. The
DML compiler also performs query optimization; that is, it picks the lowest-cost evaluation plan from
among the alternatives.
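As a sketch, this final step of the DML compiler can be viewed as picking the minimum of the estimated costs of the alternative plans. The plan names and cost figures below are purely illustrative, not taken from any real optimizer:

```python
# Illustrative sketch: among alternative evaluation plans that all give the
# same result, pick the one with the lowest estimated cost.
# Plan names and cost figures here are hypothetical.

def pick_plan(plans):
    """Return the evaluation plan with the lowest estimated cost."""
    return min(plans, key=lambda p: p["estimated_cost"])

plans = [
    {"name": "full-table-scan",       "estimated_cost": 1000},
    {"name": "index-scan-on-balance", "estimated_cost": 120},
    {"name": "hash-join-then-filter", "estimated_cost": 450},
]

print(pick_plan(plans)["name"])  # index-scan-on-balance
```

A real optimizer derives the cost estimates from statistics about the data; only the selection step is shown here.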
Application Architectures
Database applications are typically broken up into a front-end part that runs at client machines and a
part that runs at the back end. In two-tier architectures, the front-end directly communicates with a
database running at the back end. In three-tier architectures, the back end part is itself broken up into an
application server and a database server.
Entity-Relationship Model
The entity-relationship (E-R) data model perceives the real world as consisting of basic objects,
called entities, and relationships among these objects. It was developed to facilitate database design by
allowing specification of an enterprise schema, which represents the overall logical structure of a
database. The E-R data model is one of several semantic data models; the semantic aspect of the
model lies in its representation of the meaning of the data. The E-R model is very useful in mapping
the meanings and interactions of real-world enterprises onto a conceptual schema.
Basic Concepts
The E-R data model employs three basic notions: entity sets, relationship sets, and attributes.
Entity Sets
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects. For
example, each person in an enterprise is an entity. An entity has a set of properties, and the values for
some set of properties may uniquely identify an entity. For instance, a person may have a person-id
property whose value uniquely identifies that person. Thus, the value 677-89-9011 for person-id would
uniquely identify one particular person in the enterprise. An entity may be concrete, such as a person or
a book, or it may be abstract, such as a loan, or a holiday, or a concept.
An entity set is a set of entities of the same type that share the same properties, or attributes. The set of
all persons who are customers at a given bank, for example, can be defined as the entity set customer.
Similarly, the entity set loan might represent the set of all loans awarded by a particular bank.
Attributes
An entity is represented by a set of attributes. Attributes are descriptive properties possessed by each
member of an entity set. Possible attributes of the customer entity set are customer-id, customer-name,
customer-street, and customer-city. Possible attributes of the loan entity set are loan-number and amount.
• Simple and composite attributes. In our examples thus far, the attributes have been simple; that is,
they are not divided into subparts. Composite attributes, on the other hand, can be divided into subparts
(that is, other attributes). For example, an attribute name could be structured as a composite attribute
consisting of first-name, middle-initial, and last-name.
• Single-valued and multivalued attributes. The attributes in our examples all have a single value for a
particular entity. For instance, the loan-number attribute for a specific loan entity refers to only one
loan number. Such attributes are said to be single-valued. There may be instances where an attribute
has a set of values for a specific entity. Consider an employee entity set with the attribute phone-number.
An employee may have zero, one, or several phone numbers, and different employees may have
different numbers of phones. This type of attribute is said to be multivalued.
• Derived attribute. The value for this type of attribute can be derived from the values of other related
attributes or entities. For example, suppose that the customer entity set has an attribute age, which
indicates the customer's age. If the customer entity set also has an attribute date-of-birth, we can
calculate age from date-of-birth and the current date. Thus, age is a derived attribute. In this case,
date-of-birth may be referred to as a base attribute, or a stored attribute. The value of a derived
attribute is not stored, but is computed when required.
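The age computation described above can be sketched directly; date-of-birth is the stored base attribute and age is computed on demand (a minimal illustration, not tied to any particular DBMS):

```python
from datetime import date

def age(date_of_birth: date, today: date) -> int:
    """Derive the age attribute from the stored base attribute date-of-birth."""
    years = today.year - date_of_birth.year
    # Subtract one if the birthday has not yet occurred this year.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        years -= 1
    return years

print(age(date(1990, 6, 15), date(2024, 3, 1)))  # 33
```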
Relationship Sets
A relationship is an association among several entities. For example, we can define a relationship that
associates customer Hayes with loan L-15. This relationship specifies that Hayes is a customer with
loan number L-15.
A relationship set is a set of relationships of the same type.
As an example, consider the entity sets employee, branch, and job. Examples of job entities could
include manager, teller, auditor, and so on. Job entities may have the attributes title and level. The
relationship set works-on among employee, branch, and job is an example of a ternary relationship. A
ternary relationship among Jones, Perryridge, and manager indicates that Jones acts as a manager at the
Perryridge branch. Jones could also act as auditor at the Downtown branch, which would be
represented by another relationship.
Constraints
An E-R enterprise schema may define certain constraints to which the contents of a database must
conform. In this section, we examine mapping cardinalities and participation constraints, which are two
of the most important types of constraints.
Mapping Cardinalities
Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can
be associated via a relationship set. Mapping cardinalities are most useful in describing binary
relationship sets, although they can contribute to the description of relationship sets that involve more
than two entity sets. In this section, we shall concentrate on only binary relationship sets.
For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of the
following:
• One to one. An entity in A is associated with at most one entity in B, and an entity in B is associated
with at most one entity in A
• One to many. An entity in A is associated with any number (zero or more) of entities in B. An entity
in B, however, can be associated with at most one entity in A.
• Many to one. An entity in A is associated with at most one entity in B. An entity in B, however, can
be associated with any number (zero or more) of entities in A.
• Many to many. An entity in A is associated with any number (zero or more) of
entities in B, and an entity in B is associated with any number (zero or more)
of entities in A.
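As an illustration, the four mapping cardinalities can be detected mechanically from the pairs of a binary relationship set (a hypothetical helper, not part of any E-R tool):

```python
from collections import defaultdict

def cardinality(pairs):
    """Classify a binary relationship set between A and B, given as (a, b)
    pairs, into one of the four mapping cardinalities."""
    a_to_b, b_to_a = defaultdict(set), defaultdict(set)
    for a, b in pairs:
        a_to_b[a].add(b)
        b_to_a[b].add(a)
    many_b = any(len(s) > 1 for s in a_to_b.values())  # some A entity maps to many B
    many_a = any(len(s) > 1 for s in b_to_a.values())  # some B entity maps to many A
    return {(False, False): "one-to-one",
            (True, False): "one-to-many",
            (False, True): "many-to-one",
            (True, True): "many-to-many"}[(many_b, many_a)]

# A customer may hold several loans, but each loan belongs to one customer.
print(cardinality([("Hayes", "L-15"), ("Hayes", "L-23"), ("Jones", "L-17")]))
# one-to-many
```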
Entity-Relationship Diagram
An E-R diagram can express the overall logical structure of a database graphically. E-R diagrams are
simple and clear—qualities that may well account in large part for the widespread use of the E-R
model. Such a diagram consists of the following major components:
A relational database consists of a collection of tables, each of which is assigned a unique name. A row
in a table represents a relationship among a set of values.
Consider the account table of the figure. It has three column headers: account-number, branch-name, and
balance. For each attribute, there is a set of permitted values, called the domain of that attribute. For
the attribute branch-name, for example, the domain is the set of all branch names. Let D1 denote the
set of all account numbers, D2 the set of all branch names, and D3 the set of all balances. Any row of
account must consist of a 3-tuple (v1, v2, v3), where v1 is an account number (that is, v1 is in domain
D1), v2 is a branch name (that is, v2 is in domain D2), and v3 is a balance (that is, v3 is in domain D3).
In general, account will contain only a subset of the set of all possible
rows. Therefore, account is a subset of
D1 × D2 × D3
In general, a table of n attributes must be a subset of
D1 × D2 × · · · × Dn−1 × Dn
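The subset relationship can be checked directly; the tiny domains below are made up for illustration:

```python
from itertools import product

D1 = {"A-101", "A-215"}          # account numbers
D2 = {"Downtown", "Perryridge"}  # branch names
D3 = {500, 700}                  # balances

# The account relation holds only some of the possible 3-tuples.
account = {("A-101", "Downtown", 500), ("A-215", "Perryridge", 700)}

assert account <= set(product(D1, D2, D3))  # account is a subset of D1 x D2 x D3
print(len(account), len(D1) * len(D2) * len(D3))  # 2 8
```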
Modification Anomalies
All three kinds of anomalies are highly undesirable, since their occurrence
constitutes corruption of the database. Properly normalized databases are
much less susceptible to corruption than are unnormalized databases.
RELATIONAL ALGEBRA
NORMALIZATION
NORMAL FORMS
UNNORMALIZED RELATIONS
– NO REPEATING GROUPS
• UPDATE: THE SIDE EFFECTS FOR EACH DRUG APPEAR ONLY ONCE.
BCNF RELATIONS
Table 8 gives a relation for this problem and Figure 10 gives the dependency
diagram(s).
⮚ If vendor V1 has to supply to project P2, but the item is not yet decided,
then a row with a blank for item code has to be introduced.
⮚ The information about item 1 is stored twice for vendor V3.
Observe that the relation given in Table 8 is in 3NF and also in BCNF, yet it still
has the problems mentioned above. The problem is reduced by expressing this
relation as two relations in Fourth Normal Form (4NF). A relation is in 4NF
if it has no more than one independent multivalued dependency, or one
independent multivalued dependency with a functional dependency. Table 8
can be expressed as the two 4NF relations given in Table 9. The fact that
vendors are capable of supplying certain items and that they are assigned to
supply for some projects is independently specified in the 4NF relations.
These relations still have a problem. Even though vendor V1 is capable of
supplying certain items and is assigned to supply for specified projects, it does
not follow that he actually supplies those items to those projects. We thus need
another relation which specifies this. This is called the 5NF form. The 5NF
relations are the relations in Tables 9(a) and 9(b) together with the relation
given in Table 10.
JOIN DEPENDENCIES:
The normal forms discussed so far required that the given relation R, if
not in the given normal form, be decomposed into two relations to meet the
requirements of the normal form. In some cases, a relation can have problems
like redundant information and the update anomalies caused by it, but cannot be
decomposed into two relations to remove the problems. In such cases it may be
possible to decompose the relation into three or more relations using the 5NF.
n Read-write head
H Positioned very close to the platter surface (almost touching it)
H Reads or writes magnetically encoded information.
n Surface of platter divided into circular tracks
H Over 16,000 tracks per platter on typical hard disks
n Each track is divided into sectors.
H A sector is the smallest unit of data that can be read or written.
H Sector size typically 512 bytes
H Typical sectors per track: 200 (on inner tracks) to 400 (on outer tracks)
n To read/write a sector
H disk arm swings to position head on right track
H platter spins continually; data is read/written as sector passes under head
n Head-disk assemblies
H multiple disk platters on a single spindle (typically 2 to 4)
H one head per platter, mounted on a common arm.
n Cylinder i consists of ith track of all the platters
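A back-of-envelope capacity estimate follows from the figures above; the 300 sectors per track is simply an assumed midpoint of the quoted 200-400 range:

```python
# Rough per-surface capacity from the figures quoted above (illustrative).
tracks_per_surface = 16_000
avg_sectors_per_track = 300  # assumed midpoint of the 200-400 range
sector_size = 512            # bytes

bytes_per_surface = tracks_per_surface * avg_sectors_per_track * sector_size
print(bytes_per_surface / 1024**3)  # ~2.29 GiB per surface
```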
n Earlier generation disks were susceptible to head-crashes
n Surface of earlier generation disks had metal-oxide coatings which would disintegrate on
head crash and damage all data on disk
n Current generation disks are less susceptible to such disastrous failures, although
individual sectors may get corrupted
n Disk controller – interfaces between the computer system and the disk drive hardware.
n accepts high-level commands to read or write a sector
n initiates actions such as moving the disk arm to the right track and actually reading or
writing the data
n Computes and attaches checksums to each sector to verify that data is read back correctly
n If data is corrupted, with very high probability stored checksum won’t match recomputed
checksum
n Ensures successful writing by reading back sector after writing it
n Performs remapping of bad sectors
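The checksum round trip described above can be sketched as follows. The checksum used here is a toy byte-sum; real controllers use stronger codes such as CRCs:

```python
def checksum(sector: bytes) -> int:
    """Toy checksum: sum of all bytes modulo 2**16 (real disks use CRCs)."""
    return sum(sector) % (1 << 16)

def write_sector(data: bytes):
    # The controller stores a checksum alongside the sector's data.
    return (data, checksum(data))

def read_sector(stored):
    data, stored_sum = stored
    # If the recomputed checksum does not match, the data is corrupted.
    if checksum(data) != stored_sum:
        raise IOError("sector corrupted: checksum mismatch")
    return data

stored = write_sector(b"\x01\x02\x03" * 170)      # a 510-byte payload
assert read_sector(stored) == b"\x01\x02\x03" * 170

corrupted = (b"\xff" + stored[0][1:], stored[1])  # flip the first byte
try:
    read_sector(corrupted)
except IOError as e:
    print(e)  # sector corrupted: checksum mismatch
```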
DISK SUBSYSTEM
RAID Level 3: Bit-Interleaved Parity.
Faster data transfer than with a single disk, but fewer I/Os per second since every
disk has to participate in every I/O.
Subsumes Level 2 (provides all its benefits, at lower cost).
RAID Level 4: Block-Interleaved Parity; uses block-level striping, and keeps a parity
block on a separate disk for corresponding blocks from N other disks.
When writing data block, corresponding block of parity bits must also be
computed and written to parity disk
To find value of a damaged block, compute XOR of bits from corresponding blocks
(including parity block) from other disks
Provides higher I/O rates for independent block reads than Level 3
block read goes to a single disk, so blocks stored on different disks can be
read in parallel
Provides higher transfer rates for reads of multiple blocks than no striping
Before writing a block, parity data must be computed
Can be done by using old parity block, old value of current block and new
value of current block (2 block reads + 2 block writes)
Or by recomputing the parity value using the new values of blocks
corresponding to the parity block
– More efficient for writing large amounts of data sequentially
Parity block becomes a bottleneck for independent block writes since every block
write also writes to parity disk
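The parity computation and recovery described above amount to bytewise XOR; a minimal sketch with three tiny data blocks:

```python
def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks (the RAID parity operation)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"\x0f\x0f", b"\xf0\xf0", b"\xaa\x55"]  # blocks on three data disks
parity = xor_blocks(data)                       # block stored on the parity disk

# Disk 1 fails: recover its block from the surviving blocks plus parity.
recovered = xor_blocks([data[0], data[2], parity])
assert recovered == data[1]

# Updating data[0] by the "old parity ^ old block ^ new block" shortcut:
new_block = b"\x11\x22"
parity = xor_blocks([parity, data[0], new_block])
data[0] = new_block
assert xor_blocks(data) == parity
```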
RAID Level 5: Block-Interleaved Distributed Parity; partitions data and parity among all
N + 1 disks, rather than storing data in N disks and parity in 1 disk.
E.g., with 5 disks, parity block for nth set of blocks is stored on disk (n mod 5) +
1, with the data blocks stored on the other 4 disks.
Higher I/O rates than Level 4.
4 Block writes occur in parallel if the blocks and their parity blocks are on
different disks.
Subsumes Level 4: provides same benefits, but avoids bottleneck of parity disk.
RAID Level 6: P+Q Redundancy scheme; similar to Level 5, but stores extra redundant
information to guard against multiple disk failures.
H Better reliability than Level 5 at a higher cost; not used as widely.
FILE ORGANIZATION
n The database is stored as a collection of files. Each file is a sequence of records. A
record is a sequence of fields.
n One approach:
H assume record size is fixed
H each file has records of one particular type only
H different files are used for different relations
This case is the easiest to implement; we will consider variable-length records
later.
Fixed-Length Records
n Simple approach:
H Store record i starting from byte n * (i – 1), where n is the size of each record.
H Record access is simple but records may cross blocks
4 Modification: do not allow records to cross block boundaries
n Deletion of record i: alternatives:
H move records i + 1, . . . , n to i, . . . , n – 1
H move record n to i
H do not move records, but link all free records on a free list
Free Lists
n Store the address of the first deleted record in the file header.
n Use this first record to store the address of the second deleted record, and so on
n Can think of these stored addresses as pointers since they “point” to the location of a
record.
n More space efficient representation: reuse space for normal attributes of free records to
store pointers. (No pointers stored in in-use records.)
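The free-list scheme above can be sketched in a few lines; slots stand in for fixed-length record positions, and a deleted slot's space is reused to hold the pointer to the next free slot (an illustrative in-memory model, not an on-disk format):

```python
class FixedLengthFile:
    """Sketch of a fixed-length record file with a free list: deleted slots
    are chained through the file header, reusing record space for pointers."""
    FREE_END = -1

    def __init__(self):
        self.records = []               # slot i holds a record or a free-list pointer
        self.free_head = self.FREE_END  # "file header" pointer to first free slot

    def insert(self, record):
        if self.free_head != self.FREE_END:      # reuse a deleted slot
            slot = self.free_head
            self.free_head = self.records[slot]  # pointer stored in the free record
            self.records[slot] = record
        else:                                    # no free slot: append
            slot = len(self.records)
            self.records.append(record)
        return slot

    def delete(self, slot):
        self.records[slot] = self.free_head      # reuse slot space as a pointer
        self.free_head = slot

f = FixedLengthFile()
a = f.insert(("A-102", "Perryridge", 400))
b = f.insert(("A-305", "Round Hill", 350))
f.delete(a)
c = f.insert(("A-222", "Redwood", 700))
assert c == a  # the deleted slot was reused
```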
Variable-Length Records
n Variable-length records arise in database systems in several ways:
H Storage of multiple record types in a file.
H Record types that allow variable lengths for one or more fields.
H Record types that allow repeating fields (used in some older data models).
n Byte string representation
H Attach an end-of-record (⊥) control character to the end of each record
H Difficulty with deletion
H Difficulty with growth
Variable-Length Records: Slotted Page Structure
n Disadvantage to pointer structure: space is wasted in all records except the first in a
chain.
n Solution is to allow two kinds of blocks in a file:
H Anchor block – contains the first record of each chain
H Overflow block – contains records other than those that are the first records of
chains.
n An index entry consists of two fields: a search-key value and a pointer.
n Index files are typically much smaller than the original file
n Two basic kinds of indices:
n Ordered indices: search keys are stored in sorted order
n Hash indices: search keys are distributed uniformly across “buckets” using a “hash
function”.
Ordered Indices
Indexing techniques are evaluated on the basis of access types, access time, insertion time, deletion time, and space overhead.
n In an ordered index, index entries are stored sorted on the search key value. E.g., author
catalog in library.
n Primary index: in a sequentially ordered file, the index whose search key specifies the
sequential order of the file.
⮚ Also called clustering index
⮚ The search key of a primary index is usually but not necessarily the primary key.
n Secondary index: an index whose search key specifies an order different from the
sequential order of the file. Also called
non-clustering index.
n Index-sequential file: ordered sequential file with a primary index.
Dense Index Files
n Dense index — Index record appears for every search-key value in the file.
Multilevel Index
Primary and Secondary Indices
n Secondary indices have to be dense.
n Indices offer substantial benefits when searching for records.
n When a file is modified, every index on the file must be updated. Updating indices
imposes overhead on database modification.
n Sequential scan using primary index is efficient, but a sequential scan using a secondary
index is expensive
⮚ each record access may fetch a new block from disk
B+ TREE INDEX FILE
After deleting “Downtown”:
n The removal of the leaf node containing “Downtown” did not result in its parent having
too few pointers, so the cascaded deletions stopped with the deleted leaf node’s parent.
HASHING
Static Hashing
n A bucket is a unit of storage containing one or more records (a bucket is typically a disk
block).
n In a hash file organization we obtain the bucket of a record directly from its search-key
value using a hash function.
n Hash function h is a function from the set of all search-key values K to the set of all
bucket addresses B.
n Hash function is used to locate records for access, insertion as well as deletion.
n Records with different search-key values may be mapped to the same bucket; thus entire
bucket has to be searched sequentially to locate a record.
Example of Hash File Organization
Hash file organization of account file, using branch-name as key
There are 10 buckets,
n The binary representation of the ith character is assumed to be the integer i.
n The hash function returns the sum of the binary representations of the characters modulo
10
⮚ E.g., h(Perryridge) = 5; h(Round Hill) = 3; h(Brighton) = 3
Hash Functions
n The worst hash function maps all search-key values to the same bucket; this makes access time
proportional to the number of search-key values in the file.
n An ideal hash function is uniform, i.e., each bucket is assigned the same number of
search-key values from the set of all possible values.
n Ideal hash function is random, so each bucket will have the same number of records
assigned to it irrespective of the actual distribution of search-key values in the file.
n Typical hash functions perform computation on the internal binary representation of the
search-key.
⮚ For example, for a string search-key, the binary representations of all the
characters in the string could be added and the sum modulo the number of buckets
could be returned.
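That character-summing scheme can be sketched as follows; here ord() is an assumption standing in for the text's "binary representation of a character", so the bucket numbers produced need not match the worked figures above:

```python
def h(search_key: str, buckets: int = 10) -> int:
    """Sum the character codes of the key, modulo the number of buckets.
    (ord() stands in for the 'binary representation' in the text, so the
    resulting bucket values may differ from the worked example.)"""
    return sum(ord(c) for c in search_key) % buckets

for name in ("Perryridge", "Round Hill", "Brighton"):
    print(name, h(name))
```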
Handling of Bucket Overflows
n Bucket overflow can occur because of
⮚ Insufficient buckets
⮚ Skew in distribution of records. This can occur due to two reasons:
★ multiple records have same search-key value
★ chosen hash function produces non-uniform distribution of key values
n Although the probability of bucket overflow can be reduced, it cannot be eliminated; it is
handled by using overflow buckets.
n Overflow chaining – the overflow buckets of a given bucket are chained together in a
linked list.
n Above scheme is called closed hashing.
⮚ An alternative, called open hashing, which does not use overflow buckets, is not
suitable for database applications.
Hash Indices
n Hashing can be used not only for file organization, but also for index-structure creation.
n A hash index organizes the search keys, with their associated record pointers, into a hash
file structure.
n Strictly speaking, hash indices are always secondary indices
⮚ if the file itself is organized using hashing, a separate primary hash index on it
using the same search-key is unnecessary.
⮚ However, we use the term hash index to refer to both secondary index structures
and hash organized files.
Dynamic Hashing
n Each bucket j stores a value ij; all the entries that point to the same bucket have the same
value on their first ij bits.
n To locate the bucket containing search-key Kj:
⮚ Compute h(Kj) = X
⮚ Use the first i high-order bits of X as a displacement into the bucket address table, and
follow the pointer to the appropriate bucket
n To insert a record with search-key value Kj
⮚ follow same procedure as look-up and locate the bucket, say j.
⮚ If there is room in the bucket j insert record in the bucket.
⮚ Else the bucket must be split and insertion re-attempted
⮚ Overflow buckets used instead in some cases
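The lookup step (using the first i high-order bits of X as a displacement into the bucket address table) can be sketched as:

```python
def bucket_index(hash_value: int, i: int, hash_bits: int = 32) -> int:
    """Use the first i high-order bits of a hash value as the displacement
    into the bucket address table (assumes fixed-width hash values)."""
    return hash_value >> (hash_bits - i)

# With i = 2 the bucket address table has 2**2 = 4 entries.
x = 0b1101_0000_0000_0000_0000_0000_0000_0000  # h(Kj) = X, a 32-bit value
print(bucket_index(x, 2))  # 3  (top two bits are 11)
```

When a bucket splits, i grows by one bit and the table doubles; the displacement computation itself stays the same.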
UNIT-IV
QUERY PROCESSING
Basic Steps in Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation