DBMS Definition:

A database-management system (DBMS) is a collection of interrelated data and a set of programs
to access those data. The collection of data, usually referred to as the database, contains information
relevant to an enterprise.

Objective:
The primary goal of a DBMS is to provide a way to store and retrieve database information that is
both convenient and efficient.

Database systems are designed to manage large bodies of information. Management of data involves
both defining structures for storage of information and providing mechanisms for the manipulation of
information. The database system must ensure the safety of the information stored, despite system crashes
or attempts at unauthorized access.

Database-System Applications:
1. Enterprise Information
2. Banking and Finance
Banking: For customer information, accounts, loans, and banking transactions.
Credit card transactions: For purchases on credit cards and generation of monthly statements.
Finance: For storing information about holdings, sales, and purchases of financial instruments
such as stocks and bonds; also for storing real-time market data to enable online trading by
customers and automated trading by the firm.
3. Airlines
4. University
5. Web-Based services.
6. Telecommunication.

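Databases are used in two modes: online transaction processing (OLTP) and online analytical
processing (OLAP).
Purpose of Database Systems:
Database systems were developed to overcome the following problems of keeping data in conventional
file-processing systems: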
1. Data redundancy and inconsistency
2. Difficulty in accessing data
3. Data isolation
4. Integrity problems
5. Atomicity problems
6. Concurrent-access anomalies
7. Security problems
View of Data:
A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with an abstract
view of the data. That is, the system hides certain details of how the data are stored and maintained.
1 Data Models
• Relational Model: uses a collection of tables to represent both data and the relationships among
those data. Each table has multiple columns, and each column has a unique name. Tables are
also known as relations.
• Entity-Relationship Model: uses a collection of basic objects, called entities, and
relationships among these objects. An entity is a “thing” or “object” in the real world that is
distinguishable from other objects.
• Semi-structured Data Model: permits the specification of data where individual data items
of the same type may have different sets of attributes.
• Object-Based Data Model: object-oriented programming (especially in Java, C++, or C#)
has become the dominant software-development methodology, which led to data models that
bring the notion of objects into the database.
2 Relational Data Model
3 Data Abstraction: developers hide the complexity from users through several levels of data
abstraction, to simplify users' interactions with the system.
• Physical level: The lowest level of abstraction describes how the data are actually stored. The
physical level describes complex low-level data structures in detail.
• Logical level: The next-higher level of abstraction describes what data are stored in the
database, and what relationships exist among those data. Although implementing the simple
structures of the logical level may involve complex physical-level structures, the user of the
logical level does not need to be aware of this complexity. This is referred to as physical data
independence. Database administrators, who must decide what information to keep in the
database, use the logical level of abstraction.
• View level: The highest level of abstraction describes only part of the entire database. Even
though the logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database. Many users need only part of that information, so the
view level of abstraction exists to simplify their interaction with the system. The system may
provide many views for the same database.
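As a rough illustration of the view level, the sketch below uses Python's built-in sqlite3 module. The instructor table, its columns, and the instructor_directory view are hypothetical names chosen for this example; the view exposes only part of the logical-level table and hides the salary column.

```python
import sqlite3

# In-memory database so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")

# Logical level: the full table, including a column not every user should see.
conn.execute(
    "CREATE TABLE instructor (id INTEGER PRIMARY KEY, name TEXT, dept_name TEXT, salary REAL)"
)
conn.execute("INSERT INTO instructor VALUES (1, 'Asha', 'Physics', 95000.0)")

# View level: a view that describes only part of the database (no salary).
conn.execute(
    "CREATE VIEW instructor_directory AS SELECT id, name, dept_name FROM instructor"
)

cur = conn.execute("SELECT * FROM instructor_directory")
view_columns = [col[0] for col in cur.description]
view_rows = cur.fetchall()
print(view_columns, view_rows)
```

Several such views could be defined over the same table, one per group of users.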
4 Instances and Schemas:
Databases change over time as information is inserted and deleted. The collection of information
stored in the database at a particular moment is called an instance of the database. The overall design of
the database is called the database schema.
Database systems have several schemas, partitioned according to the levels of abstraction. The
physical schema describes the database design at the physical level, while the logical schema describes
the database design at the logical level. A database may also have several schemas at the view level,
sometimes called subschemas, that describe different views of the database.

Database Languages:
A database system provides a data-definition language (DDL) to specify the database schema and a data-
manipulation language (DML) to express database queries and updates.
Domain Constraints. A domain of possible values must be associated with every attribute (for example,
integer types, character types, date/time types). Domain constraints are the most elementary form of
integrity constraint. They are tested easily by the system whenever a new data item is entered into the
database.
Referential Integrity. There are cases where we wish to ensure that a value that appears in one relation
for a given set of attributes also appears in a certain set of attributes in another relation (referential
integrity). Example: the dept_name value in a course record must appear in the dept_name attribute of
some record of the department relation. Database modifications can cause violations of referential
integrity. When a referential-integrity constraint is violated, the normal procedure is to reject the action
that caused the violation.
Authorization. We may want to differentiate among the users as far as the type of access they are
permitted on various data values in the database. These differentiations are expressed in terms of
authorization.
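The constraints above can be tried out with a minimal sketch in Python's built-in sqlite3 module. The course and department relations follow the example in the text; the credits column, the sample rows, and the try_insert helper are assumptions made for illustration. SQLite does not strictly enforce declared column types, so a CHECK clause stands in for the domain constraint here, and foreign-key enforcement has to be switched on explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY)")
conn.execute(
    """CREATE TABLE course (
           course_id TEXT PRIMARY KEY,
           dept_name TEXT REFERENCES department(dept_name),  -- referential integrity
           credits   INTEGER CHECK (credits > 0)             -- domain-style constraint
       )"""
)
conn.execute("INSERT INTO department VALUES ('Physics')")


def try_insert(row):
    """Hypothetical helper: report whether the system accepts or rejects a course row."""
    try:
        conn.execute("INSERT INTO course VALUES (?, ?, ?)", row)
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert(("PHY-101", "Physics", 4)))   # valid row
print(try_insert(("PHY-102", "Physics", -1)))  # violates the CHECK constraint
print(try_insert(("HIS-101", "History", 3)))   # no such department
```

In both failing cases the action that caused the violation is rejected, which matches the normal procedure described above.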

o Data Definition Language (DDL)
It is a language that allows the user to define the data and their relationship to other types
of data. The DDL commands are: Create, Alter, Rename, Drop, Truncate.
o Data Manipulation Language (DML)
It is a language that provides a set of operations to support the basic data manipulation
operations on data held in the database. The DML commands are: Insert, Delete, Update,
Select, Merge, Call.
o Data Control Language (DCL)
DCL controls access to the stored data. It is mainly used to grant and revoke user
access to a database. The DCL commands are: Grant, Revoke.
o Transaction Control Language (TCL)
TCL is a language that manages the transactions within the database. It is used to
make permanent, or to undo, the changes made by data manipulation language statements. The TCL
commands are: Commit, Rollback.
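The difference between DDL, DML, and TCL statements can be seen in a short sketch using Python's built-in sqlite3 module. The account table and its values are hypothetical. DCL is left out because SQLite has no GRANT or REVOKE statements; a server DBMS would be needed to show those.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the structure.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")

# DML: insert data, then TCL: make the change permanent.
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()

# DML followed by TCL rollback: the update is undone.
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.rollback()
balance_after_rollback = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]

# The same DML followed by commit: the update is kept.
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.commit()
balance_after_commit = conn.execute(
    "SELECT balance FROM account WHERE id = 1"
).fetchone()[0]

print(balance_after_rollback, balance_after_commit)
```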
DBMS Architecture

o The DBMS design depends upon its architecture. The basic client/server architecture is used to
deal with a large number of PCs, web servers, database servers and other components that are
connected with networks.
o The client/server architecture consists of many PCs and a workstation which are connected via
the network.
o DBMS architecture depends upon how users are connected to the database to get their requests
done.

Types of DBMS Architecture

1-Tier Architecture
o In this architecture, the database is directly available to the user: the user works directly on
the DBMS itself.
o Any changes made here are applied directly to the database. This architecture does not provide
a convenient tool for end users.
o The 1-Tier architecture is used for developing local applications, where programmers can
communicate directly with the database for a quick response.

2-Tier Architecture
o The 2-Tier architecture is the same as the basic client-server architecture. Applications on
the client end communicate directly with the database at the server side. For this interaction,
APIs such as ODBC and JDBC are used.
o The application resides at the client machine and invokes database system
functionality at the server machine through query language statements.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionality such as query processing and
transaction management.
o To communicate with the DBMS, the client-side application establishes a connection with the
server side.

3-Tier Architecture
o The 3-Tier architecture contains another layer between the client and the server. In this
architecture, the client cannot communicate directly with the server.
o The application on the client end interacts with an application server, which in turn
communicates with the database system to access data.
o The end user has no idea about the existence of the database beyond the application server. The
database also has no idea about any user beyond the application.
o The 3-Tier architecture is used for large web applications.
o The client machine acts merely as a front end and does not contain any direct database
calls; web browsers and mobile applications are the most commonly used application clients
today.
o The business logic of the application, which says what actions to carry out under what conditions,
is embedded in the application server, instead of being distributed across multiple clients.
o Three-tier applications provide better security as well as better performance than two-tier
applications.

Database Users and Administrators


People who work with a database can be categorized as database users or database administrators.
Database Users and User Interfaces: There are four different types of database-system users,
differentiated by the way they expect to interact with the system. Different types of user interfaces have
been designed for the different types of users.
o Naïve users interact with the system through predefined user interfaces, such as web or mobile
applications. Naïve users may also simply read reports generated from the database. E.g., Vlearn.
o Application programmers are computer professionals who write application programs. E.g.,
developers.
o Sophisticated users interact with the system without writing programs. Instead, they form their
requests either using a database query language or by using tools such as data analysis software.
Database Administrator
One of the main reasons for using DBMSs is to have central control of both the data and
the programs that access those data. A person who has such central control over the system is
called a database administrator (DBA). The functions of a DBA include:
Schema definition: The DBA creates the original database schema by executing a set of data
definition statements in the DDL.
Storage structure and access-method definition: The DBA may specify some parameters
pertaining to the physical organization of the data and the indices to be created.
Schema and physical-organization modification: The DBA carries out changes to the schema
and physical organization to reflect the changing needs of the organization, or to improve
performance.
Granting of authorization for data access: By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access. The
authorization information is kept in a special system structure that the database system consults
whenever a user tries to access the data in the system.
Routine maintenance: Examples of the database administrator’s routine maintenance activities
are:
° Periodically backing up the database onto remote servers, to prevent loss of data in case
of disasters such as flooding.
° Ensuring that enough free disk space is available for normal operations, and upgrading
disk space as required.
° Monitoring jobs running on the database and ensuring that performance is not degraded
by very expensive tasks submitted by some users.
Distributed Databases:
A distributed database is a collection of multiple interconnected databases, which are spread physically
across various locations that communicate via a computer network.

Features

• Databases in the collection are logically interrelated with each other. Often they represent a
single logical database.
• Data is physically stored across multiple sites. Data in each site can be managed by a DBMS
independent of the other sites.
• The processors in the sites are connected via a network. They do not have any multiprocessor
configuration.
• A distributed database is not a loosely connected file system.
• A distributed database incorporates transaction processing, but it is not synonymous with a
transaction processing system.
Distributed Database Management System

A distributed database management system (DDBMS) is a centralized software system that manages a
distributed database in a manner as if it were all stored in a single location.

Features
• It is used to create, retrieve, update and delete distributed databases.
• It synchronizes the database periodically and provides access mechanisms by virtue of which
the distribution becomes transparent to the users.
• It ensures that the data modified at any site is universally updated.
• It is used in application areas where large volumes of data are processed and accessed by
numerous users simultaneously.
• It is designed for heterogeneous database platforms.
• It maintains confidentiality and data integrity of the databases.

Factors Encouraging DDBMS

The following factors encourage moving over to DDBMS −

• Distributed Nature of Organizational Units − Most organizations today are
subdivided into multiple units that are physically distributed over the globe. Each unit requires
its own set of local data. Thus, the overall database of the organization becomes distributed.
• Need for Sharing of Data − The multiple organizational units often need to communicate with
each other and share their data and resources. This demands common databases or replicated
databases that should be used in a synchronized manner.
• Support for Both OLTP and OLAP − Online Transaction Processing (OLTP) and Online
Analytical Processing (OLAP) work upon diversified systems which may have common data.
Distributed database systems support both kinds of processing by providing synchronized data.
• Database Recovery − One of the common techniques used in DDBMS is replication of data
across different sites. Replication of data automatically helps in data recovery if the database at
any site is damaged. Users can access data from other sites while the damaged site is being
reconstructed. Thus, database failure may become almost inconspicuous to users.
• Support for Multiple Application Software − Most organizations use a variety of application
software, each with its specific database support. DDBMS provides a uniform functionality for
using the same data among different platforms.
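The recovery benefit of replication can be sketched with plain Python dictionaries standing in for sites. This is a toy under stated assumptions, not how a real DDBMS synchronizes copies: the site names, the write and read helpers, and the availability flags are all hypothetical.

```python
# Each site keeps its own full copy of the data (full replication is assumed here).
sites = {"site_a": {}, "site_b": {}}
available = {"site_a": True, "site_b": True}


def write(key, value):
    """Apply the update to every copy so that all sites stay synchronized."""
    for store in sites.values():
        store[key] = value


def read(key, preferred):
    """Read from the preferred site, falling back to another site if it is down."""
    order = [preferred] + [name for name in sites if name != preferred]
    for name in order:
        if available[name]:
            return name, sites[name][key]
    raise RuntimeError("no site is available")


write("account:1", 100)
first = read("account:1", "site_a")   # served locally
available["site_a"] = False           # site_a is damaged
second = read("account:1", "site_a")  # transparently served by site_b
print(first, second)
```

The caller asks for the same key in both reads and never learns which site answered, which is the sense in which the failure stays almost inconspicuous.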

Advantages of Distributed Databases

Modular Development − If the system needs to be expanded to new locations or new units, in
centralized database systems, the action requires substantial efforts and disruption in the existing
functioning. However, in distributed databases, the work simply requires adding new computers and
local data to the new site and finally connecting them to the distributed system, with no interruption in
current functions.
More Reliable − When the database fails in a centralized system, the whole system comes to a halt.
However, in distributed systems, when a component fails, the system continues to function, possibly
at reduced performance. Hence DDBMS is more reliable.
Better Response − If data is distributed in an efficient manner, then user requests can be met from local
data itself, thus providing faster response. On the other hand, in centralized systems, all queries have to
pass through the central computer for processing, which increases the response time.
Lower Communication Cost − In distributed database systems, if data is located locally where it is
mostly used, then the communication costs for data manipulation can be minimized. This is not feasible
in centralized systems.

Difficulties of Distributed Databases

The following are some of the difficulties associated with distributed databases.

• Need for complex and expensive software − DDBMS demands complex and often expensive
software to provide data transparency and co-ordination across the several sites.
• Processing overhead − Even simple operations may require a large number of communications
and additional calculations to provide uniformity in data across the sites.
• Data integrity − The need to update data at multiple sites poses problems of data integrity.
• Overheads for improper data distribution − Responsiveness of queries is largely dependent
upon proper data distribution. Improper data distribution often leads to very slow response to
user requests.

Types of Distributed Databases

Distributed databases can be broadly classified into homogeneous and heterogeneous distributed
database environments, each with further sub-divisions, as shown in the following illustration.

Homogeneous Distributed Databases


In a homogeneous distributed database, all the sites use identical DBMS and operating systems. Its
properties are −
• The sites use very similar software.
• The sites use identical DBMS or DBMS from the same vendor.
• Each site is aware of all other sites and cooperates with other sites to process user requests.
• The database is accessed through a single interface as if it is a single database.
Types of Homogeneous Distributed Database
There are two types of homogeneous distributed database −
• Autonomous − Each database is independent and functions on its own. The databases are
integrated by a controlling application and use message passing to share data updates.
• Non-autonomous − Data is distributed across the homogeneous nodes and a central or master
DBMS co-ordinates data updates across the sites.

Heterogeneous Distributed Databases


In a heterogeneous distributed database, different sites have different operating systems, DBMS products
and data models. Its properties are −
• Different sites use dissimilar schemas and software.
• The system may be composed of a variety of DBMSs like relational, network, hierarchical or
object-oriented.
• Query processing is complex due to dissimilar schemas.
• Transaction processing is complex due to dissimilar software.
• A site may not be aware of other sites and so there is limited co-operation in processing user
requests.

Types of Heterogeneous Distributed Databases

• Federated − The heterogeneous database systems are independent in nature and integrated
together so that they function as a single database system.
• Un-federated − The database systems employ a central coordinating module through which the
databases are accessed.

Distributed DBMS Architectures

DDBMS architectures are generally developed depending on three parameters −

• Distribution − It states the physical distribution of data across the different sites.
• Autonomy − It indicates the distribution of control of the database system and the degree to
which each constituent DBMS can operate independently.
• Heterogeneity − It refers to the uniformity or dissimilarity of the data models, system
components and databases.

Architectural Models

• Client-Server Architecture for DDBMS
• Peer-to-Peer Architecture for DDBMS
• Multi-DBMS Architecture

Client-Server Architecture for DDBMS

This is a two-level architecture where the functionality is divided into servers and clients. The server
functions primarily encompass data management, query processing, optimization and transaction
management. Client functions mainly comprise the user interface, although clients also handle some
functions such as consistency checking and transaction management.
The two different client-server architectures are −
• Single Server Multiple Client
• Multiple Server Multiple Client (shown in the following diagram)

Peer-to-Peer Architecture for DDBMS

In these systems, each peer acts both as a client and a server for imparting database services. The peers
share their resources with other peers and co-ordinate their activities.
This architecture generally has four levels of schemas −
• Global Conceptual Schema − Depicts the global logical view of data.
• Local Conceptual Schema − Depicts logical data organization at each site.
• Local Internal Schema − Depicts physical data organization at each site.
• External Schema − Depicts user view of data.

Multi-DBMS Architectures

This is an integrated database system formed by a collection of two or more autonomous database
systems.
Multi-DBMS can be expressed through six levels of schemas −
• Multi-database View Level − Depicts multiple user views comprising subsets of the
integrated distributed database.
• Multi-database Conceptual Level − Depicts the integrated multi-database, which comprises global
logical multi-database structure definitions.
• Multi-database Internal Level − Depicts the data distribution across different sites and the
multi-database to local data mapping.
• Local database View Level − Depicts public view of local data.
• Local database Conceptual Level − Depicts local data organization at each site.
• Local database Internal Level − Depicts physical data organization at each site.
There are two design alternatives for multi-DBMS −
• Model with multi-database conceptual level.
• Model without multi-database conceptual level.
Data Model

A data model gives us an idea of how the final system will look once it is fully implemented. It
defines the data elements and the relationships between the data elements. Data models are used to show
how data is stored, connected, accessed and updated in the database management system. Here, we use a
set of symbols and text to represent the information so that members of the organization can communicate
and understand it.
1 Hierarchical Model
2 Network Model
3 Entity-Relationship Model
4 Relational Model
5 Object-Oriented Data Model
6 Object-Relational Data Model
7 Flat Data Model
8 Semi-Structured Data Model
9 Associative Data Model
10 Context Data Model

Hierarchical Model
The hierarchical model was the first DBMS model. This model organizes the data in a hierarchical tree
structure. The hierarchy starts from the root, which holds the root data, and then expands in the form
of a tree by adding child nodes to parent nodes.

Features of a Hierarchical Model


1. One-to-many relationship: The data is organised in a tree-like structure in which the records
are connected by one-to-many relationships. Also, there is only one path from the root to any
node. Example: In the above example, the only path to the sneakers node goes through the
men's shoes node.
2. Parent-Child Relationship: Each child node has exactly one parent node, but a parent node can
have more than one child node. Multiple parents are not allowed.
3. Deletion Problem: If a parent node is deleted, its child nodes are automatically deleted as well.
4. Pointers: Pointers are used to link the parent node with the child node and are used to navigate
between the stored data. Example: In the above example the 'shoes' node points to two other
nodes, the 'women's shoes' node and the 'men's shoes' node.
Advantages of Hierarchical Model
• It is very simple and fast to traverse through a tree-like structure.
• Any change in the parent node is automatically reflected in the child node, so the integrity of
data is maintained.
Disadvantages of Hierarchical Model
• Complex relationships are not supported.
• Because a child node cannot have more than one parent, a complex relationship in which a
child node needs two parent nodes cannot be represented using this model.
• If a parent node is deleted, its child nodes are automatically deleted as well.
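The features listed for the hierarchical model (a single parent per node, a single path from the root, and deletion cascading to children) can be sketched with a small Python tree. The Node class and names_in helper are hypothetical; the node names reuse the shoes example mentioned in the text.

```python
class Node:
    """A record in a hierarchical model: exactly one parent, any number of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # pointer to the single parent (None for the root)
        self.children = []        # pointers to child nodes
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Return the only path from the root to this node."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return list(reversed(names))

    def delete(self):
        """Detach this node from its parent; its whole subtree goes with it."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None


def names_in(node):
    """List the names still reachable from the given node."""
    result = [node.name]
    for child in node.children:
        result.extend(names_in(child))
    return result


shoes = Node("shoes")
women = Node("women's shoes", shoes)
men = Node("men's shoes", shoes)
sneakers = Node("sneakers", men)

print(sneakers.path())
men.delete()              # deleting the parent also removes 'sneakers' from the tree
print(names_in(shoes))
```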
Network Model
This model is an extension of the hierarchical model. It was the most popular model before the
relational model. It is the same as the hierarchical model, except that a record
can have more than one parent. It replaces the hierarchical tree with a graph.

Features of a Network Model


1. Ability to Merge more Relationships: In this model, records take part in more relationships, so
the data is more closely related. This model can manage one-to-one relationships as well as
many-to-many relationships.
2. Many paths: Because there are more relationships, there can be more than one path to the same
record. This makes data access fast and simple.
3. Circular Linked List: Operations on the network model are carried out with the help of
circular linked lists. A program maintains the current position and moves it through the
records by following the relationships.
Advantages of Network Model
• The data can be accessed faster than in the hierarchical model. This is because the data
is more related in the network model and there can be more than one path to reach a particular
node. So the data can be accessed in many ways.
• Because there is a parent-child relationship, data integrity is present. Any change in a parent
record is reflected in the child record.
Disadvantages of Network Model
• As more and more relationships need to be handled, the system might get complex. So, a user
must have detailed knowledge of the model to work with it.
• Any change, such as an update, deletion, or insertion, is very complex.
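The difference from the hierarchical model, namely that a record may have several parents and therefore several access paths, can be sketched with a small graph in Python. The link and paths helpers and the college example records are hypothetical, and the sketch assumes the graph has no cycles.

```python
from collections import defaultdict

# Owner -> members and member -> owners; a member may have several owners (parents).
children = defaultdict(set)
parents = defaultdict(set)


def link(owner, member):
    """Record a parent-child relationship between two records."""
    children[owner].add(member)
    parents[member].add(owner)


def paths(start, target):
    """Enumerate every path from start to target (assumes an acyclic graph)."""
    if start == target:
        return [[target]]
    found = []
    for child in sorted(children[start]):
        for tail in paths(child, target):
            found.append([start] + tail)
    return found


link("college", "department")
link("college", "library")
link("department", "student")
link("library", "student")   # 'student' now has two parents

print(sorted(parents["student"]))
print(paths("college", "student"))
```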
Entity-Relationship Model
The Entity-Relationship Model, or simply ER Model, is a high-level data model. In this model, we
represent the real-world problem in pictorial form to make it easy for the stakeholders to
understand. It is also very easy for the developers to understand the system by just looking at the ER
diagram. We use the ER diagram as a visual tool to represent an ER Model. An ER diagram has the
following three components:
• Entities: An entity is a real-world thing. It can be a person, place, or even a
concept. Example: Teachers, Students, Course, Building, Department, etc. are some of the
entities of a School Management System.
• Attributes: An attribute is a real-world property of an entity; it describes a characteristic
of that entity. Example: The entity teacher has properties like teacher id, salary, age, etc.
• Relationship: A relationship tells how two entities are related. Example: Teacher works for a
department.
Example:

In the above diagram, the entities are Teacher and Department. The attributes of the Teacher entity are
Teacher_Name, Teacher_id, Age, Salary, Mobile_Number. The attributes of the Department entity
are Dept_id, Dept_name. The two entities are connected using the relationship. Here, each teacher
works for a department.
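The Teacher and Department example can be written down as a minimal Python sketch, with one dataclass per entity, fields for the attributes named above, and a reference for the works-for relationship. The sample values and the works_for field name are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Department:
    # Attributes of the Department entity
    dept_id: int
    dept_name: str


@dataclass
class Teacher:
    # Attributes of the Teacher entity
    teacher_id: int
    teacher_name: str
    age: int
    salary: float
    mobile_number: str
    # Relationship: each teacher works for one department
    works_for: Department


physics = Department(dept_id=10, dept_name="Physics")
asha = Teacher(1, "Asha", 41, 72000.0, "555-0100", works_for=physics)

print(asha.teacher_name, "works for", asha.works_for.dept_name)
```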
Features of ER Model
• Graphical Representation for Better Understanding: It is very easy and simple to understand,
so it can be used by the developers to communicate with the stakeholders.
• ER Diagram: The ER diagram is used as a visual tool for representing the model.
• Database Design: This model helps the database designers to build the database and is widely
used in database design.
Advantages of ER Model
• Simple: Conceptually, the ER Model is very easy to build. If we know the relationships between
the attributes and the entities, we can easily build the ER Diagram for the model.
• Effective Communication Tool: This model is used widely by the database designers for
communicating their ideas.
• Easy Conversion to any Model: This model maps well to the relational model and can be easily
converted to it by turning the ER model into tables. This model can also be
converted to other models such as the network model or the hierarchical model.
Disadvantages of ER Model
• No industry standard for notation: There is no industry standard for developing an ER model.
So one developer might use notations which are not understood by other developers.
• Hidden information: Some information might be lost or hidden in the ER model. Because it is a
high-level view, some details might be hidden.
Relational Model
The relational model is the most widely used model. In this model, the data is maintained in the form
of two-dimensional tables. All the information is stored in the form of rows and columns. The basic
structure of the relational model is the table, so tables are also called relations in the relational
model. Example: In this example, we have an Employee table.

Features of Relational Model

• Tuples: Each row in the table is called a tuple. A row contains all the information about one
instance of the object. In the above example, each row has all the information about one
specific individual; for instance, the first row has information about John.
• Attribute or field: Attributes are the properties which define the table or relation. The values of
an attribute should be from the same domain. In the above example, we have different
attributes of the employee like Salary, Mobile_no, etc.
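Tuples and attributes can be inspected directly with Python's built-in sqlite3 module. The employee table below is a stand-in for the Employee example: John, Salary, and Mobile_no come from the text, while emp_id, the second row, and all values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE employee (
           emp_id    INTEGER PRIMARY KEY,  -- identifies each tuple
           name      TEXT NOT NULL,
           salary    INTEGER,
           mobile_no TEXT
       )"""
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [(1, "John", 50000, "555-0101"), (2, "Maya", 62000, "555-0102")],
)

cur = conn.execute("SELECT * FROM employee ORDER BY emp_id")
attributes = [col[0] for col in cur.description]  # column names of the relation
tuples = cur.fetchall()                           # one Python tuple per row

# A second row with the same key is rejected, so every tuple stays distinguishable.
try:
    conn.execute("INSERT INTO employee VALUES (1, 'Duplicate', 0, NULL)")
    duplicate_accepted = True
except sqlite3.IntegrityError:
    duplicate_accepted = False

print(attributes)
print(tuples)
print(duplicate_accepted)
```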
Advantages of Relational Model
• Simple: This model is simpler than the network and hierarchical models.
• Scalable: This model can be easily scaled, as we can add as many rows and columns as we want.
• Structural Independence: We can make changes in the database structure without changing the
way the data is accessed. When we can change the database structure without affecting
the ability of the DBMS to access the data, we say that structural independence has been
achieved.
• Data is stored in tables called relations.
• Relations can be normalized.
• In normalized relations, values saved are atomic values.
• Each row in a relation is unique.
• Each column in a relation contains values from the same domain.

Disadvantages of Relational Model

• Hardware Overheads: To hide the complexities and make things easier for the user, this
model requires more powerful computers and data storage devices.
• Bad Design: The relational model is very easy to design and use, and users do not need to
know how the data is stored in order to access it. This ease of design can lead to the
development of a poor database, which slows down as the database grows.
But all these disadvantages are minor as compared to the advantages of the relational model. These
problems can be avoided with the help of proper implementation and organisation.
Object-Oriented Data Model
Real-world problems are more closely represented through the object-oriented data model. In this
model, both the data and the relationships are present in a single structure known as an object. We can
store audio, video, images, etc. in the database, which was not practical in the relational model
(although audio and video can be stored in a relational database, doing so is not advised). In
this model, two or more objects are connected through links. We use a link to relate one object to
other objects. This can be understood by the example given below.

In the above example, we have two objects Employee and Department. All the data and
relationships of each object are contained as a single unit. The attributes like Name, Job_title of the
employee and the methods which will be performed by that object are stored as a single object. The two
objects are connected through a common attribute i.e the Department_id and the communication
between these two will be done with the help of this common id.
Object-Relational Model
As the name suggests, it is a combination of both the relational model and the object-oriented model.
This model was built to fill the gap between the object-oriented model and the relational model. It
offers many advanced features; for example, we can build complex data types according to our
requirements from the existing data types. The problem with this model is that it can get complex
and difficult to handle. So, proper understanding of this model is required.
Flat Data Model
It is a simple model in which the database is represented as a table consisting of rows and columns. To
access any data, the computer has to read the entire table. This makes the model slow and inefficient.
Semi-Structured Model
The semi-structured model is an evolved form of the relational model. In this model, we cannot
separate the data from the schema. Example: in web-based data sources, the schema and the data
of a website cannot be told apart. Some entities may have missing attributes while
others may have extra attributes. This gives flexibility both in storing the data and in the
attributes themselves. Example: a value stored in an attribute can be
either an atomic value or a collection of values.
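As a rough illustration, semi-structured records can be sketched as Python dictionaries, where each record carries its own attribute names. The records, attribute names and helper functions below are made up for this example.

```python
# Made-up records: every record carries its own attribute names, so there is
# no separate, fixed schema.
people = [
    {"name": "Ravi", "email": "ravi@example.com"},
    {"name": "Meera", "phone": ["555-0101", "555-0102"]},  # collection value
    {"name": "John", "email": "john@example.com", "nickname": "JJ"},  # extra attribute
]


def attribute_names(records):
    """Collect every attribute that appears in at least one record."""
    names = set()
    for record in records:
        names.update(record.keys())
    return sorted(names)


def is_atomic(value):
    """Treat anything that is not a list, tuple, set or dict as atomic."""
    return not isinstance(value, (list, tuple, set, dict))


print(attribute_names(people))  # ['email', 'name', 'nickname', 'phone']
for person in people:
    # .get() returns None where an entity lacks the attribute.
    print(person["name"], person.get("email"))
```

Meera has no email and John has an extra nickname, yet all three records sit in the same collection; the phone attribute shows a collection of values where others hold a single atomic value.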
Associative Data Model
The associative data model is a model in which the data is divided into two parts. Everything that has
an independent existence is called an entity, and the relationships among these entities are
called associations. The two parts into which the data is divided are called items and links.
• Item: An item contains a name and an identifier (some numeric value).
• Link: A link contains an identifier, a source, a verb and a target.
Example: Let us say we have a statement "The world cup is being hosted by London from 30 May
2020". In this data two links need to be stored:
1. The world cup is being hosted by London. The source here is 'the world cup', the verb is 'is being
hosted by' and the target is 'London'.
2. ...from 30 May 2020. The source here is the previous link, the verb is 'from' and the target is '30
May 2020'.
This is represented using the table as follows:
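One possible way to lay out these items and links in code is sketched below. The identifier values are arbitrary, the verb phrases are stored as items (assumed here to be 'is being hosted by' and 'from'), and the Item, Link and render names are hypothetical.

```python
from typing import NamedTuple


class Item(NamedTuple):
    identifier: int
    name: str


class Link(NamedTuple):
    identifier: int
    source: int  # identifier of an item or of another link
    verb: int    # identifier of the item that names the verb
    target: int  # identifier of the target item


# Identifier values are arbitrary and chosen only for this illustration.
items = [
    Item(1, "the world cup"),
    Item(2, "is being hosted by"),
    Item(3, "London"),
    Item(4, "from"),
    Item(5, "30 May 2020"),
]
links = [
    Link(101, source=1, verb=2, target=3),
    Link(102, source=101, verb=4, target=5),  # the source is the previous link
]


def render(identifier, items, links):
    """Turn an item or link identifier back into readable text."""
    names = {item.identifier: item.name for item in items}
    if identifier in names:
        return names[identifier]
    link = next(cand for cand in links if cand.identifier == identifier)
    parts = (link.source, link.verb, link.target)
    return " ".join(render(part, items, links) for part in parts)


print(render(102, items, links))
# prints: the world cup is being hosted by London from 30 May 2020
```

Because the second link uses the first link as its source, rendering link 102 rebuilds the whole statement.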

Context Data Model


The context data model is a collection of several models, such as the network model and the
relational model. Using it, we can perform various tasks that are not possible with
any single model alone.

Entity-Relationship Model
The entity-relationship (E-R) data model was developed to facilitate database design by allowing
specification of an enterprise schema that represents the overall logical structure of a database. The E-R
model is very useful in mapping the meanings and interactions of real-world enterprises onto a conceptual
schema.
The E-R data model employs three basic concepts: entity sets, relationship sets, and attributes.
Entity Sets
An entity is a “thing” or “object” in the real world that is distinguishable from all other objects.
For example, each person in a university is an entity. An entity has a set of properties, and the values for
some set of properties must uniquely identify an entity. For instance, a person may have a person id
property whose value uniquely identifies that person. Thus, the value 677-89-9011 for person id would
uniquely identify one particular person in the university.
Similarly, courses can be thought of as entities, and course id uniquely identifies a course entity
in the university. An entity set is a set of entities of the same type that share the same properties, or
attributes. An entity is represented by a set of attributes.
Attributes are descriptive properties possessed by each member of an entity set. The designation
of an attribute for an entity set expresses that the database stores similar information concerning each
entity in the entity set; however, each entity may have its own value for each attribute.
Possible attributes of the instructor entity set are ID, name, dept name, and salary. Possible attributes of the
course entity set are course id, title, dept name, and credits. Each entity has a value for each of its
attributes. For instance, a particular instructor entity may have the value 12121 for ID, the value Wu for
name, the value Finance for dept name, and the value 90000 for salary.
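A minimal sketch of an entity set in Python, assuming a frozen dataclass for the entity type and a plain set for the entity set. The Wu values come from the text; the second instructor's dept name and salary, and the find_by_id helper, are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes entities hashable, so they fit in a set
class Instructor:
    ID: str
    name: str
    dept_name: str
    salary: int


# An entity set: entities of the same type that share the same attributes.
instructor = {
    Instructor("12121", "Wu", "Finance", 90000),        # values from the text
    Instructor("45565", "Katz", "Comp. Sci.", 75000),   # dept and salary assumed
}


def find_by_id(entity_set, ID):
    """Return the single entity identified by ID, or None if there is none."""
    matches = [entity for entity in entity_set if entity.ID == ID]
    return matches[0] if len(matches) == 1 else None


wu = find_by_id(instructor, "12121")
print(wu.name, wu.dept_name, wu.salary)  # Wu Finance 90000
```

Every entity has its own value for each attribute, while the attribute names are fixed by the entity set.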
An entity set is represented in an E-R diagram by a rectangle, which is divided into two parts.
The first part, which in this text is shaded blue, contains the name of the entity set. The second part
contains the names of all the attributes of the entity set. The E-R diagram in Figure 6.1 shows two entity
sets instructor and student. The attributes associated with instructor are ID, name, and salary. The
attributes associated with student are ID, name, and tot cred. Attributes that are part of the primary key
are underlined.

Relationship Sets:
A relationship is an association among several entities. For example, we can define a relationship advisor
that associates instructor Katz with student Shankar. This relationship specifies that Katz is an advisor to
student Shankar. A relationship set is a set of relationships of the same type.
Consider two entity sets instructor and student. We define the relationship set advisor to denote the
associations between students and the instructors who act as their advisors. A relationship instance in an
E-R schema represents an association between the named entities in the real-world enterprise that is being
modeled. As an illustration, the individual instructor entity Katz, who has instructor ID 45565, and the
student entity Shankar, who has student ID 12345, participate in a relationship instance of advisor. This
relationship instance represents that in the university, the instructor Katz is advising student Shankar.
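As a small sketch, the relationship set advisor can be modelled as a set of (instructor ID, student ID) pairs, one pair per relationship instance. The IDs are the ones given in the text; the dictionary layout and the advisees helper are assumptions for illustration.

```python
# IDs taken from the text: instructor Katz = 45565, student Shankar = 12345.
instructors = {"45565": "Katz"}
students = {"12345": "Shankar"}

# The relationship set advisor: each element is one relationship instance,
# stored as an (instructor ID, student ID) pair.
advisor = {("45565", "12345")}


def advisees(instructor_id, advisor_set):
    """Return the IDs of the students advised by the given instructor."""
    return sorted(s_id for (i_id, s_id) in advisor_set if i_id == instructor_id)


for i_id, s_id in advisor:
    print(f"{instructors[i_id]} advises {students[s_id]}")
# prints: Katz advises Shankar
```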
A relationship set is represented in an E-R diagram by a diamond, which is linked via lines to a number of
different entity sets (rectangles). E.g., students and their enrolled courses are linked by the relationship set takes.

The function that an entity plays in a relationship is called that entity’s role. Since entity sets participating
in a relationship set are generally distinct, roles are implicit and are not usually specified. However, they
are useful when the meaning of a relationship needs clarification. Such is the case when the entity sets of
a relationship set are not distinct; that is, the same entity set participates in a relationship set more than
once, in different roles. In this type of relationship set, sometimes called a recursive relationship set,
explicit role names are necessary to specify how an entity participates in a relationship instance.

Figure 6.4 shows the role indicators course id and prereq id between the course entity set and the prereq
relationship set.
A relationship may also have attributes called descriptive attributes. As an example of descriptive
attributes for relationships, consider the relationship set takes which relates entity sets student and section.
We may wish to store a descriptive attribute grade with the relationship to record the grade that a student
received in a course offering. An attribute of a relationship set is represented in an E-R diagram by an
undivided rectangle. We link the rectangle with a dashed line to the diamond representing that
relationship set. A relationship set may have multiple descriptive attributes.
It is possible to have more than one relationship set involving the same entity sets. The relationship sets
advisor and takes provide examples of a binary relationship set—that is, one that involves two entity sets.
Most of the relationship sets in a database system are binary. Occasionally, however, relationship sets
involve more than two entity sets. The number of entity sets that participate in a relationship set is the
degree of the relationship set. A binary relationship set is of degree 2; a ternary relationship set is of
degree 3.
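The following sketch stores the descriptive attribute grade alongside each takes relationship and computes the degree of a relationship set. It assumes a relationship set is a dictionary keyed by tuples of participating entity identifiers; the section labels, grades and the ternary example are made up.

```python
# takes relates student and section. The key identifies the participating
# entities; the value holds the descriptive attribute grade.
takes = {
    ("12345", "CS-101/Fall"): "A",     # made-up section labels and grades
    ("12345", "CS-190/Spring"): "B+",
}


def degree(relationship_set):
    """Number of entity sets that participate in the relationship set."""
    arities = {len(key) for key in relationship_set}
    if len(arities) != 1:
        raise ValueError("every relationship must involve the same entity sets")
    return arities.pop()


# A hypothetical ternary relationship set: (instructor, student, project).
proj_guide = {("45565", "12345", "P1"): None}

print(degree(takes))       # 2 (binary)
print(degree(proj_guide))  # 3 (ternary)
```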

Complex Attributes
For each attribute, there is a set of permitted values, called the domain, or value set, of that attribute. The
domain of attribute course id might be the set of all text strings of a certain length. Similarly, the domain
of attribute semester might be strings from the set {Fall, Winter, Spring, Summer}.

Simple and composite attributes:


Simple attributes have not been divided into subparts. Composite attributes, on the other hand, can be
divided into subparts.
For example, an attribute name could be structured as a composite attribute consisting of first name,
middle initial, and last name. Similarly, a composite attribute address could consist of the attributes
street, city, state, and postal code.
Single-valued and multivalued attributes:
The attributes in our examples all have a single value for a particular entity. For instance, the student ID
attribute for a specific student entity refers to only one student ID. Such attributes are said to be single
valued. There may be instances where an attribute has a set of values for a specific entity.
An instructor may have zero, one, or several phone numbers, and different instructors may have different
numbers of phones. This type of attribute is said to be multivalued.
As another example, we could add to the instructor entity set an attribute dependent name listing all the
dependents. This attribute would be multivalued, since any particular instructor may have zero, one, or
more dependents.
Derived attributes:
The value for this type of attribute can be derived from the values of other related attributes or entities.
For instance, let us say that the instructor entity set has an attribute students advised, which represents
how many students an instructor advises. We can derive the value for this attribute by counting the
number of student entities associated with that instructor.
As another example, suppose that the instructor entity set has an attribute age that indicates the
instructor’s age. If the instructor entity set also has an attribute date of birth, we can calculate age from
date of birth and the current date. Thus, age is a derived attribute. In this case, date of birth may be
referred to as a base attribute, or a stored attribute. The value of a derived attribute is not stored but is
computed when required.
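A minimal sketch of both derived attributes, assuming date of birth is the stored attribute and that advisor is a set of (instructor ID, student ID) pairs. The sample date of birth and the function names are illustrative only.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Instructor:
    ID: str
    name: str
    date_of_birth: date  # base (stored) attribute

    def age(self, today: Optional[date] = None) -> int:
        """Derived attribute: computed on demand, never stored."""
        today = today or date.today()
        born = self.date_of_birth
        had_birthday = (today.month, today.day) >= (born.month, born.day)
        return today.year - born.year - (0 if had_birthday else 1)


def students_advised(instructor_id, advisor):
    """Derived attribute: count the students associated with the instructor."""
    return sum(1 for (i_id, _s_id) in advisor if i_id == instructor_id)


katz = Instructor("45565", "Katz", date(1980, 6, 15))  # made-up date of birth
advisor = {("45565", "12345")}

print(katz.age(today=date(2020, 6, 14)))  # 39, birthday not yet reached
print(katz.age(today=date(2020, 6, 15)))  # 40
print(students_advised("45565", advisor))  # 1
```

Passing today explicitly keeps the example reproducible; in normal use the method falls back to the current date.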
Here, a composite attribute name with component attributes first name, middle initial, and last name
replaces the simple attribute name of instructor. As another example, suppose we were to add an address
to the instructor entity set. The address can be defined as the composite attribute address with the
attributes street, city, state, and postal code. The attribute street is itself a composite attribute whose
component attributes are street number, street name, and apartment number. The figure also illustrates a
multivalued attribute phone number, denoted by “{phone number}”, and a derived attribute age,
depicted by “age ( )”.
An attribute takes a null value when an entity does not have a value for it. The null value may indicate
“not applicable”—that is, the value does not exist for the entity. For example, a person who has no middle
name may have the middle initial attribute set to null. Null can also designate that an attribute value is
unknown.
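Composite, multivalued and null-valued attributes can be sketched with nested dataclasses, a list, and None respectively. The attribute names follow the text; the class layout and all sample values are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Name:  # composite attribute
    first_name: str
    last_name: str
    middle_initial: Optional[str] = None  # None plays the role of null


@dataclass
class Street:  # composite attribute nested inside address
    street_number: str
    street_name: str
    apartment_number: Optional[str] = None


@dataclass
class Address:  # composite attribute
    street: Street
    city: str
    state: str
    postal_code: str


@dataclass
class Instructor:
    ID: str
    name: Name
    address: Address
    # Multivalued attribute: zero, one, or several phone numbers.
    phone_number: List[str] = field(default_factory=list)


# All values below are made up.
wu = Instructor(
    ID="12121",
    name=Name(first_name="Li", last_name="Wu"),
    address=Address(Street("12", "Main St"), "Springfield", "IL", "62701"),
    phone_number=["555-0100", "555-0199"],
)

print(wu.name.middle_initial)           # None: no value for this entity
print(wu.address.street.street_name)    # Main St
print(len(wu.phone_number))             # 2
```

Note that None here cannot distinguish "not applicable" from "unknown"; a design that needs to tell the two apart would have to record that separately.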
Mapping Cardinalities:
Mapping cardinalities, or cardinality ratios, express the number of entities to which another entity can be
associated via a relationship set. Mapping cardinalities are most useful in describing binary relationship
sets, although they can contribute to the description of relationship sets that involve more than two entity
sets. For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of
the following:
• One-to-one: An entity in A is associated with at most one entity in B, and an entity in B is associated
with at most one entity in A.
• One-to-many: An entity in A is associated with any number (zero or more) of entities in B. An entity in
B, however, can be associated with at most one entity in A.
• Many-to-one: An entity in A is associated with at most one entity in B. An entity in B, however, can be
associated with any number (zero or more) of entities in A.
• Many-to-many: An entity in A is associated with any number (zero or more) of entities in B, and
an entity in B is associated with any number (zero or more) of entities in A.
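The four cases can be told apart by counting how many partners each entity has. The sketch below classifies one sample of a binary relationship set given as (a, b) pairs. A mapping cardinality is a constraint on every valid instance, so the function only reports the most restrictive category that the given sample satisfies; the function name and sample data are hypothetical.

```python
from collections import Counter


def mapping_cardinality(pairs):
    """Classify a binary relationship set R between A and B, given as (a, b) pairs."""
    pairs = set(pairs)
    per_a = Counter(a for a, _b in pairs)  # number of B entities per A entity
    per_b = Counter(b for _a, b in pairs)  # number of A entities per B entity
    a_at_most_one = all(n <= 1 for n in per_a.values())
    b_at_most_one = all(n <= 1 for n in per_b.values())
    if a_at_most_one and b_at_most_one:
        return "one-to-one"
    if b_at_most_one:   # an a may have many b, but each b has at most one a
        return "one-to-many"
    if a_at_most_one:   # each a has at most one b, but a b may have many a
        return "many-to-one"
    return "many-to-many"


# A = instructor IDs, B = student IDs (made-up values).
print(mapping_cardinality({("i1", "s1"), ("i2", "s2")}))                # one-to-one
print(mapping_cardinality({("i1", "s1"), ("i1", "s2")}))                # one-to-many
print(mapping_cardinality({("i1", "s1"), ("i2", "s1")}))                # many-to-one
print(mapping_cardinality({("i1", "s1"), ("i1", "s2"), ("i2", "s1")}))  # many-to-many
```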
In the E-R diagram notation, we indicate cardinality constraints on a relationship by drawing either a
directed line (→) or an undirected line (—) between the relationship set and the entity set in question.
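Specifically, a directed line points to the entity set on the "one" side of the relationship, and an
undirected line indicates a "many" side, as in the university example above.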
The participation of an entity set E in a relationship set R is said to be total if every entity in E
must participate in at least one relationship in R. If it is possible that some entities in E do not participate
in relationships in R, the participation of entity set E in relationship R is said to be partial.
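As a quick check on one sample of data, participation can be tested by comparing the entity set with the entities that actually appear in the relationship set. The function name, the position argument and the sample IDs are assumptions; a real constraint applies to every valid instance, not just one sample.

```python
def participation(entity_set, relationship_set, position):
    """Return 'total' if every entity appears at the given position of at
    least one relationship, otherwise 'partial'."""
    participating = {rel[position] for rel in relationship_set}
    return "total" if set(entity_set) <= participating else "partial"


# Made-up data: (instructor ID, student ID) pairs in advisor.
instructor_ids = {"45565", "12121"}
student_ids = {"12345"}
advisor = {("45565", "12345")}

print(participation(student_ids, advisor, position=1))     # total
print(participation(instructor_ids, advisor, position=0))  # partial
```

Here every student has an advisor, so student participates totally, while instructor 12121 advises no one, so instructor participates only partially.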

E-R diagrams also provide a way to indicate more complex constraints on the number of times each entity
participates in relationships in a relationship set. A line may have an associated minimum and maximum
cardinality, shown in the form l..h, where l is the minimum and h the maximum cardinality. A minimum
value of 1 indicates total participation of the entity set in the relationship set; that is, each entity in the
entity set occurs in at least one relationship in that relationship set. A maximum value of 1 indicates that
the entity participates in at most one relationship, while a maximum value ∗ indicates no limit.
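The l..h bounds can be checked on sample data by counting how often each entity takes part in the relationship set. In this sketch, high=None stands for the unlimited maximum ∗; the function name and the sample data are hypothetical.

```python
from collections import Counter


def satisfies_bounds(entity_set, relationship_set, position, low, high=None):
    """Check that each entity participates between low and high times.

    high=None stands for '*', i.e. no upper limit.
    """
    counts = Counter(rel[position] for rel in relationship_set)
    for entity in entity_set:
        n = counts[entity]  # 0 for entities that never participate
        if n < low:
            return False
        if high is not None and n > high:
            return False
    return True


# Made-up data: (instructor ID, student ID) pairs in advisor.
instructor_ids = {"45565", "12121"}
student_ids = {"12345", "98765"}
advisor = {("45565", "12345"), ("45565", "98765")}

# 1..1 on the student side: every student has exactly one advisor.
print(satisfies_bounds(student_ids, advisor, position=1, low=1, high=1))  # True
# 0..* on the instructor side: any number of advisees, including none.
print(satisfies_bounds(instructor_ids, advisor, position=0, low=0))       # True
# 1..* on the instructor side fails because 12121 advises no one.
print(satisfies_bounds(instructor_ids, advisor, position=0, low=1))       # False
```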
