
Unit 2.3: Database Concept

Srawan Kumar KC
Full Time Faculty ( Computer and IT ), LACM
Learning Objectives

• Develop a brief understanding of databases and database management.

• Discuss traditional data file organization and its problems.

• Explain how the database approach overcomes the problems associated with the traditional file environment.

• Describe the three most common data models.


Unit 2.3 : Database Concept

Database :

• A database is a collection of related items or facts arranged in a specific structure.

• It can easily be accessed, managed, and updated.

• It is manipulated by a data-processing system for a specific purpose.

• It provides centralized control of data.


Database :

• A database consists of a file or set of files that can be broken down into records, each of which consists of one or more fields.

Characteristics of Database

Shared : Data in a database are shared among different users and applications.

Persistence : Data in a database exist permanently, in the sense that the data can live beyond the scope of the process that created them.

Integrity : Data should be correct with respect to the real-world entity that they represent.

Characteristics of Database

Security : Data should be protected from unauthorized access.

Consistency : Data elements in a database represent real-world values, which should be consistent with respect to the relationships among them.

Non-redundancy : No two data items in a database should represent the same real-world entity.

1). Basic of data arrangement and access :

Bytes : 8 bits = 1 byte.

Fields : Each piece of information is stored in its own location, called a field. For example, each student entry has a field for Student ID, Forename, Surname and Date of Birth.

Records : One full set of fields, i.e. all the related information about one person or object, is called a record.

Tables : A complete collection of records makes a table.

File : A logical grouping of related records is called a file.

Database : A logical grouping of related files is called a database.

1). Basic of data arrangement and access :

The Data Hierarchy


1). Basic of data arrangement and access :

Data Management Terminology

Entity : A person, place, thing, or event about which information is maintained (a real-world object or concept), e.g. Ram, a pen, etc. Records describe entities.

Attribute : Each characteristic or quality describing a particular entity is called an attribute. E.g. account-no, acc-holder-name, bal, etc. are attributes of the account entity. Fields describe attributes.

Primary Key : A field (attribute) that uniquely identifies a record in the database is called a primary key. E.g. acc-no can be the primary key for the account entity.
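
The terms above can be sketched in a few lines of Python. This is only an illustration: the class name Account, the helper add_account, and the sample values are assumptions made for the example. Each object is one record (an entity), its fields are the attributes, and a dictionary keyed by acc_no plays the role of the primary key.

```python
from dataclasses import dataclass


@dataclass
class Account:
    """One record: the fields below are the attributes of the account entity."""
    acc_no: str            # primary key (hypothetical sample values are used below)
    acc_holder_name: str
    bal: float


# The "table": records stored under their primary key value.
accounts = {}


def add_account(account):
    """Store a record, refusing a second record with the same primary key."""
    if account.acc_no in accounts:
        raise ValueError(f"duplicate primary key: {account.acc_no}")
    accounts[account.acc_no] = account


add_account(Account("A-101", "Ram", 5000.0))
add_account(Account("A-102", "Sita", 7500.0))
```

Because acc_no is unique, accounts["A-101"] always identifies exactly one record, which is the property that makes a field usable as a primary key.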
1). Basic of data arrangement and access :

Storing and Access Records

• Sequential access media (like tape) store records sequentially based on key values.

• Direct (or random) access media (like disks) use other techniques:

a) Indexed Sequential Access Method (ISAM):

• It uses an index to locate individual records. An index is a list of the key field of each record and the address where that record is physically located.
1). Basic of data arrangement and access :

Storing and Access Records

b) Direct File Access Method:

• It uses the key field to locate the physical address of a record. A transform algorithm translates the key field value directly into the record’s storage location.
1). Basic of data arrangement and access :

Index Sequential Access Methods

Index (College): LACM, KCM, NCM

Database

College   Name of Student   GPA
LACM      Rosha             3.8
LACM      Hemant            4.0
LACM      Priyanka          3.8
KCM       Pawan             2.9
KCM       Ojaswee           3.6
KCM       Manjila           3.8
NCM       Nirbachan         3.8
NCM       Neha              2.6
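
A minimal Python sketch of the indexed lookup illustrated above. It assumes that the records are kept sorted by the key (College) and that the index stores only the position of the first record for each key. The names build_index and find_students are illustrative and do not come from any real ISAM implementation.

```python
# Records stored sequentially, sorted by the key field (College).
records = [
    ("LACM", "Rosha", 3.8),
    ("LACM", "Hemant", 4.0),
    ("LACM", "Priyanka", 3.8),
    ("KCM", "Pawan", 2.9),
    ("KCM", "Ojaswee", 3.6),
    ("KCM", "Manjila", 3.8),
    ("NCM", "Nirbachan", 3.8),
    ("NCM", "Neha", 2.6),
]


def build_index(rows):
    """Map each key value to the position of its first record."""
    index = {}
    for position, (college, _name, _gpa) in enumerate(rows):
        index.setdefault(college, position)
    return index


def find_students(rows, index, college):
    """Jump to the first matching record via the index, then read sequentially."""
    start = index.get(college)
    if start is None:
        return []
    found = []
    for row in rows[start:]:
        if row[0] != college:
            break  # keys are sorted, so the group has ended
        found.append(row)
    return found


index = build_index(records)
kcm_students = find_students(records, index, "KCM")
```

The index avoids scanning from the start of the file: the lookup for KCM begins at position 3 and stops as soon as the key changes.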
1). Basic of data arrangement and access :

Direct File Access Methods

Transform algorithm: (Roll - 1) * 10
To find the record of the student having roll = 3: (3 - 1) * 10 = 20

Database

Memory Location   Roll   Name        Grade
0                 1      Rosha       3.8
10                2      Hemant      4.0
20                3      Priyanka    3.6
30                4      Pawan       2.9
40                5      Ojaswee     3.6
50                6      Manjila     3.8
60                7      Nirbachan   3.8
70                8      Neha        2.6
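
The transform algorithm above can be sketched in Python as follows. The dictionary named storage stands in for the storage device, and the names transform, lookup and RECORD_SIZE are assumptions made for this example.

```python
RECORD_SIZE = 10  # each record occupies 10 locations, as in the table above


def transform(roll):
    """Translate the key field (Roll) directly into a storage location."""
    return (roll - 1) * RECORD_SIZE


students = [
    (1, "Rosha", 3.8),
    (2, "Hemant", 4.0),
    (3, "Priyanka", 3.6),
    (4, "Pawan", 2.9),
    (5, "Ojaswee", 3.6),
    (6, "Manjila", 3.8),
    (7, "Nirbachan", 3.8),
    (8, "Neha", 2.6),
]

# Simulated storage: location -> record.
storage = {}
for roll, name, grade in students:
    storage[transform(roll)] = (roll, name, grade)


def lookup(roll):
    """Fetch a record in one step, without consulting an index or scanning."""
    return storage.get(transform(roll))
```

For roll = 3 the function computes location 20 and reads the record stored there directly, with no search involved.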
2). Traditional File Environment :

• In the traditional file management environment, related records are kept together in a data file of their own.

• It stores data in plain text files and is also called a flat file system.

• This approach is suitable for simple collections of data that do not contain a lot of repeated information.

• Examples include an Excel spreadsheet or a Word data list file.
2). Traditional File Environment :

Problems in Traditional File Environment


• Keeping organizational information in a file-processing system has a
number of major disadvantages:

Data Redundancy :

• The address and telephone number of a particular customer may


appear in a file that consists of savings-account records and in a file
that consists of checking-account records.

• This redundancy leads to higher storage and access cost.

Data Inconsistency :
• The various copies of the same data may no longer agree.
• For example, a changed customer address may be reflected in
savings-account records but not elsewhere in the system.
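
The two problems above can be illustrated with a short Python sketch. The customer ID, the addresses and the function name inconsistent_customers are made up for the example; the two dictionaries stand in for the separate savings-account and checking-account files.

```python
# The same customer details are stored twice (redundancy).
savings_file = {"C-001": {"name": "Ram", "address": "Ktm"}}
checking_file = {"C-001": {"name": "Ram", "address": "Ktm"}}

# An address change is applied to one file only.
savings_file["C-001"]["address"] = "Bkt"


def inconsistent_customers(file_a, file_b):
    """Return the IDs of customers whose address differs between the two files."""
    shared_ids = file_a.keys() & file_b.keys()
    return sorted(
        cid for cid in shared_ids
        if file_a[cid]["address"] != file_b[cid]["address"]
    )


mismatches = inconsistent_customers(savings_file, checking_file)
```

After the partial update the two copies no longer agree, which is exactly the inconsistency that redundant storage makes possible.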
2). Traditional File Environment :

Problems in Traditional File Environment


Difficulty in Accessing Data:

• Conventional file-processing environments do not allow


needed data to be retrieved in a convenient and efficient
manner.

Data Isolation:

• Because data are scattered in various files, and files may be in


different formats, writing new application programs to retrieve
the appropriate data is difficult.

Integrity Problems:

• The problem of integrity is the problem of ensuring that the data in the database are correct both before and after a transaction.
2). Traditional File Environment :

Problems in Traditional File Environment

Security Problems :

• Not every user of the database system should be able to


access all the data.

• For example, in a banking system, payroll personnel


need to see only that part of the database that has
information about the various bank employees. They do
not need access to information about customer accounts.

• However, enforcing such security constraints is difficult in a flat file system.
3) Database : The Modern Approach

• The modern database approach combines a database with a database management system.

• A database is a collection of related, persistent data that contains relevant information.

• A Database Management System (DBMS), in turn, is a set of programs to access those data.

• The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.
3) Database : The Modern Approach
Creating the Database

• To create a database, designers must develop a


conceptual design and a physical design.

Conceptual Design:

• An abstract model of a database from the user or


business perspective.

Physical Design:

• Layout that shows how a database is actually


arranged on storage devices.

3) Database : The Modern Approach
Entity Relation Modelling

• The process of designing a database by organizing data entities


to be used and identifying the relationships among them.

a) Entity-Relationship (ER) Diagram:

• A document that shows data entities, their attributes, and the relationships among them.

• It consists of entities, attributes and relationships; each is represented on the diagram.

b) Entity Class / Set:

• A grouping of entities of a given type.

• For example, set of all persons, companies, trees, holidays etc.


3) Database : The Modern Approach
Entity Relation Modelling

c) Instance:

• A particular entity within an entity class.

• For example, Ram is an instance of the Person entity set. An entity set is represented by a rectangular box.

d) Attributes / Identifier:

• An entity is described by a set of properties.

• For example, a customer entity can have customer-id,


customer-name, customer-street, and customer-city as
attributes.
3) Database : The Modern Approach
Entity Relation Modelling

e) Relationships:

• The conceptual linking of entities in a database.


• The number of entities in a relationship is the
degree of the relationship.

• Relationships between two items are common and


are called binary relationships.
3) Database : The Modern Approach
Entity Relation Modelling

• Overall logical structure of a database can be


expressed graphically by E-R diagram. The basic
components of this diagram are:

Rectangles (represent entity sets)


Ellipses (represent attributes)
Diamonds (represent relationship sets among
entity sets)
Lines (link attributes to entity sets and entity
sets to relationship sets)
3) Database : The Modern Approach
Entity Relation Modelling
• Binary relationships between entity sets can be categorized into three categories.

One-to-One
• In a 1:1 (one-to-one) relationship, a single entity instance of one type is related to a single entity instance of another type.

One-to-Many
• In a 1:M (one-to-many) relationship, a single entity instance of one type is related to many entity instances of another type.

Many-to-Many
• In an M:M (many-to-many) relationship, many entity instances of one type are related to many entity instances of another type, and vice versa.
3) Database : The Modern Approach
Normalization :

• Normalization is the process of organizing data in a database, in which a complex relation (table) is broken into simpler relations.

• This includes creating tables and establishing relationships between those tables according to rules designed to protect the data.

• Normalization helps to achieve

a) Minimum redundancy
b) Maximum data integrity
c) Best processing performance
3) Database : The Modern Approach
Normalization :
• Consider the employee table given below:

Eid    Name    Address   Age   Salary   Phone
E001   Ram     Ktm       34    50000    4567329
E001   Ram     Ktm       34    50000    4567328
E002   Alina   Ktm       26    40000    5567893
E002   Alina   Ktm       26    40000    5567892
E003   Bibek   Bkt       29    45000    4563219

• Since an employee may have more than one phone number, this layout causes heavy duplication of data. We can minimize the duplication by breaking the table up as below.
3) Database : The Modern Approach

Normalization :

Eid    Name    Address   Age   Salary
E001   Ram     Ktm       34    50000
E002   Alina   Ktm       26    40000
E003   Bibek   Bkt       29    45000

Eid    Phone
E001   4567329
E001   4567328
E002   5567893
E002   5567892
E003   4563219
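
The split shown above can be reproduced with Python's built-in sqlite3 module. This is a sketch under assumptions: the table names employee and employee_phone are chosen for the example, and Bibek's address is taken as Bkt, as in the original unnormalized table.

```python
import sqlite3

# The unnormalized rows: employee details repeat once per phone number.
flat_rows = [
    ("E001", "Ram", "Ktm", 34, 50000, "4567329"),
    ("E001", "Ram", "Ktm", 34, 50000, "4567328"),
    ("E002", "Alina", "Ktm", 26, 40000, "5567893"),
    ("E002", "Alina", "Ktm", 26, 40000, "5567892"),
    ("E003", "Bibek", "Bkt", 29, 45000, "4563219"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee ("
    "eid TEXT PRIMARY KEY, name TEXT, address TEXT, age INTEGER, salary INTEGER)"
)
conn.execute(
    "CREATE TABLE employee_phone ("
    "eid TEXT REFERENCES employee(eid), phone TEXT, PRIMARY KEY (eid, phone))"
)

for eid, name, address, age, salary, phone in flat_rows:
    # Employee details are stored once; repeated rows are ignored.
    conn.execute(
        "INSERT OR IGNORE INTO employee VALUES (?, ?, ?, ?, ?)",
        (eid, name, address, age, salary),
    )
    # Every phone number keeps its own row, linked by eid.
    conn.execute("INSERT INTO employee_phone VALUES (?, ?)", (eid, phone))
conn.commit()

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
phone_count = conn.execute("SELECT COUNT(*) FROM employee_phone").fetchone()[0]
```

Five flat rows become three employee rows and five phone rows, so name, address, age and salary are no longer repeated for each phone number.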
3) Database : The Modern Approach
Locating Data in Database

• We have two choices for locating the data: centralized or distributed.

• The choice will affect user accessibility, query response time, data entry, security, and cost.

a) Centralized Database

b) Distributed Database
3) Database : The Modern Approach
Locating Data in Database

a) Centralized database:

• All the related files are in one physical location.


• This strategy provides database administrators with the
ability to work on a database as a whole at one location.

• Data consistency is improved and security is easier.


• Files are only accessible via the centralized host computer.
• Slower access due to transmission delays is the main drawback of this approach.
3) Database : The Modern Approach
Locating Data in Database
b) Distributed database:

• Complete copies of a database, or portions of a database, are kept in more than one location, close to the users.

Type 1: Replicated database:

• Copies of the database are stored in many locations. The main aim of this approach is to improve responsiveness for user access.

Type 2: Partitioned databases:

• A portion of the database, rather than a complete copy, is stored at each of many locations. Each location is responsible for its own data.
3) Database : The Modern Approach
Centralized Vs Distributed Database
4) Database Management System :

• It is software that provides services for accessing a database while maintaining all the required features of the database.

• It allows any number of users to access and modify the data in a database.

• It provides tools that enable users to construct special requests (called queries) to obtain selected records easily from the database.

• A DBMS is an intermediate layer between programs and the data.
4) Database Management System :

Why DBMS ?

• A DBMS provides a secure and survivable medium for the storage and retrieval of data.

• In the real world, data have structure, are related to one another, and have constraints.

• Different users of the data also need to create, access and manipulate them.

• A DBMS provides mechanisms to achieve these objectives without compromising the security and integrity of the data.
4) Database Management System :
Function of DBMS
1) Transaction Processing :
• Transaction is a sequence of database operations
that represents a logical unit of work.

• It accesses a database and transforms it from one


state to another.

• A transaction can update a record, delete one,


modify a set of records etc.
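
A minimal sketch of a transaction as a logical unit of work, using Python's built-in sqlite3 module. The account table, its balances and the transfer function are assumptions made for the example. Either both updates take effect or neither does.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint forbids negative balances (an illustrative rule).
conn.execute(
    "CREATE TABLE account (acc_no TEXT PRIMARY KEY, bal INTEGER CHECK (bal >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-102", 200)]
)
conn.commit()


def transfer(connection, source, target, amount):
    """Move money between two accounts as one transaction."""
    try:
        with connection:  # commits on success, rolls back if an exception occurs
            connection.execute(
                "UPDATE account SET bal = bal + ? WHERE acc_no = ?", (amount, target)
            )
            connection.execute(
                "UPDATE account SET bal = bal - ? WHERE acc_no = ?", (amount, source)
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(connection):
    """Return the current state of the table as {acc_no: bal}."""
    return dict(connection.execute("SELECT acc_no, bal FROM account"))


first_ok = transfer(conn, "A-101", "A-102", 300)    # enough funds: committed
second_ok = transfer(conn, "A-101", "A-102", 1000)  # overdraw: rolled back
final_state = balances(conn)
```

The second transfer fails on the debit step, so the credit that had already been applied is undone and the database stays in its previous consistent state.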

2) Concurrency Management :
• Concurrency management addresses the conflicts that can occur when data are accessed or altered simultaneously in a multi-user system.

• Concurrency control governs multi-user access to the database.
4) Database Management System :
Function of DBMS
3) Recovery :
• Ensures that aborted or failed transactions do not create any adverse effects on the database.

• Makes sure the database is returned to a consistent state after a transaction fails or aborts.

• Keeps recovery procedures in place that will restore service to a failing component as quickly as possible.

4) Security :

• Refers to the protection of data against


unauthorized access.

• It makes sure that only authorized users are given access to the data in the database.
4) Database Management System :
Function of DBMS

5) Storage Management :

• A process used to maximize or improve the performance of data storage resources.

• Mechanism for the management of permanent


storage of the data.

• Responsible for the transfer of data between the


main memory and secondary storage (such as
disk or tape).
4) Database Management System :
Advantages of DBMS

• Redundancy can be reduced

• Inconsistency can be removed

• Data can be shared

• Security restrictions can be applied

• Integrity can be maintained

• Reduced data entry, storage, and retrieval costs

• Provide Backup and Recovery Procedures

• Conflict resolution

• Standards can be enforced


4) Database Management System :
Disadvantages of DBMS

• Complex, difficult and time consuming to design

• Extensive conversion cost from File Based System


to Database System

• Additional hardware and software cost

• Trainings required for programmers and users

• Complexity of Backup and Recovery

• Only efficient for particularly large organizations

• Higher impact of failure


4) Database Management System :
Logical Vs Physical View
a) Logical view :

 Represents data in a format that is meaningful to a user


(e.g., tables with fields and records).

b) Physical view :

 Deals with the actual, physical arrangement and location


of data in the direct access storage devices (DASD)

• The DBMS shields the end user and programmer from having to know about the physical location of the data; users only have to know the logical way it is organized.
4) Database Management System :
DBMS Components
a) Data Models:

• A data model is a collection of conceptual tools for describing


data, data relationships, data semantics, and consistency
constraints.

b) Database Schema:

• The overall design of the database which is not expected to


change frequently is called the database schema.

• The physical and logical schemas describe the database design at the corresponding levels.
4) Database Management System :
DBMS Components
c) Subschema:

• The schema at the view level is sometimes called a subschema and describes a view of the database.

• It is the specific set of data from the database that is required by each application. A database may have several subschemas.

d) Database Instances (state) :

 Databases change over time as information is inserted and


deleted.

 The collection of information stored in the database at a particular


moment is called an instance of the database. It is also known as
database state.
4) Database Management System :
DBMS Languages
a) Data Definition Language (DDL) :

• Set of statements that describe a database structure (all


record types and data set types).

b) Data Manipulation Language (DML) :

• Instructions used with higher-level programming languages


to query the contents of the database, store or update
information, and develop database applications.

c) Structured query language (SQL):

• Popular relational database language that enables users to


perform complicated searches with relatively simple instructions.
4) Database Management System :
DBMS Languages
d) Query by Example :

• Database language that enables the user to fill out a


grid (form) to construct a sample or description of the
data wanted.

e) Data Dictionary:

• A file that contains metadata, that is, data about data. This file is consulted before actual data are read or modified in the database system.
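
The difference between DDL and DML statements, and the idea of a data dictionary, can be sketched with SQL issued through Python's built-in sqlite3 module. The student table and its rows are assumptions made for the example. sqlite_master is SQLite's own catalogue table and is used here as a stand-in for a data dictionary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describes the structure of the database.
conn.execute(
    "CREATE TABLE student (roll INTEGER PRIMARY KEY, name TEXT NOT NULL, grade REAL)"
)

# DML: stores, updates and queries the contents.
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Rosha", 3.8), (2, "Hemant", 4.0), (4, "Pawan", 2.9)],
)
conn.execute("UPDATE student SET grade = 3.0 WHERE roll = 4")
conn.commit()

top_students = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM student WHERE grade >= 3.5 ORDER BY roll"
    )
]

# Metadata (data about data): the tables the database knows about.
table_names = [
    name
    for (name,) in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
```

The CREATE TABLE statement changes the structure, the INSERT, UPDATE and SELECT statements work on the contents, and the last query reads the description of the structure itself.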
Logical Data Models
 The basic structure or design of the database is the data
model. A data model is a collection of conceptual tools for
describing data, data relationships, data semantics, and
consistency constraints.
 The most common data models are Hierarchical, Network,
Entity Relationship and Relational. Other types of data
models include multidimensional, object-relational,
hypermedia, embedded, and virtual.

Logical Data Models
 Hierarchical Data Model:
◦ A hierarchical database model is a data model in which the
data is organized into a tree-like structure.
◦ The structure allows representing information using
parent/child relationships: each parent can have many
children, but each child has only one parent. All attributes
of a specific record are listed under an entity type.
◦ Hierarchical database model rigidly structures data into an
inverted “tree” in which each record contains two elements,
a single root or master field, often called a key, and a
variable number of subordinate fields.

Logical Data Models
 Hierarchical Data Model:
◦ The strongest advantage of the hierarchical database
approach is the speed and efficiency with which it can be
searched for data.
◦ The hierarchical model does have problems: Access to data
in this model is predefined by the database administrator
before the programs that access the data are written.
Programmers must follow the hierarchy established by the
data structure.
◦ It cannot handle many-to-many relationships.

Logical Data Models
 Network Data Model:
◦ The network model is a database model conceived as a flexible
way of representing objects and their relationships.
◦ Its distinguishing feature is that the schema, viewed as a graph in which object types are nodes and relationship types are arcs, is not restricted to being a hierarchy or lattice.
◦ Unlike the hierarchical model, the network model allows each record to have multiple parent and child records, forming a lattice structure. The network model replaces the hierarchical tree with a graph, thus allowing more general connections among the nodes. This model evolved specifically to handle non-hierarchical relationships. As a result, the same data may exist at two different levels.

Logical Data Models
 Network Data Model:
◦ Supports many-to-many relationships.
◦ Offers good processing speed, but is a very complex model to design, implement and maintain.

Logical Data Models
 Relational Data Model
◦ Relational data model represents the database as a collection of
relations where each relation resembles a table of values with
rows and columns.
◦ A relation may be regarded as a set of tuples, also called
records. A relation consists of relation schema & relation
instance. The relation schema specifies relation name &
description of tuples (name of attributes, domain). While
relation instance corresponds to a table of rows & columns
where each row is a tuple & the column is the attribute.
◦ Major advantages of relational model over the older data
models are the simple data representation & the ways complex
queries can be expressed easily.
Logical Data Models
 Relational Data Model
◦ Consider a relation representing employee record as;
Employee
Eid Name Address Depart_no
011 Ram Sing Kathmandu D01
012 Hari Saha Pokhara D02
◦ The relational model of data permits the database designer to
create a consistent, logical representation of information.
Consistency is achieved by including declared constraints in the
database design, which is usually referred to as the logical
schema. The theory includes a process of database
normalization whereby a design with certain desirable properties
can be selected from a set of logically equivalent alternatives
Three basic operations of a relational database

• “Select” operation: creates a subset consisting of all records in a file that meet stated criteria.

• “Join” operation: combines relational tables.

• “Project” operation: creates a subset consisting of columns in a table, permitting the user to create new tables that contain only the information required.
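
The three operations can be sketched in plain Python on lists of rows. The employee rows come from the Employee relation shown earlier; the departments table and its department names are hypothetical and were added only so that the join has something to combine. The function names select, project and join are illustrative.

```python
employees = [
    {"Eid": "011", "Name": "Ram Sing", "Address": "Kathmandu", "Depart_no": "D01"},
    {"Eid": "012", "Name": "Hari Saha", "Address": "Pokhara", "Depart_no": "D02"},
]

# Hypothetical second table, not part of the original example.
departments = [
    {"Depart_no": "D01", "Depart_name": "Accounts"},
    {"Depart_no": "D02", "Depart_name": "Sales"},
]


def select(rows, predicate):
    """Keep only the rows (records) that meet the stated criteria."""
    return [row for row in rows if predicate(row)]


def project(rows, columns):
    """Keep only the named columns of every row."""
    return [{col: row[col] for col in columns} for row in rows]


def join(left, right, key):
    """Combine rows from two tables that share the same value for key."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


in_pokhara = select(employees, lambda row: row["Address"] == "Pokhara")
names_only = project(employees, ["Eid", "Name"])
with_department = join(employees, departments, "Depart_no")
```

Select reduces the number of rows, project reduces the number of columns, and join widens each row with matching data from another table.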
Advantages and Disadvantages of Logical Data Models (network, hierarchical, relational)

Hierarchical database model
Advantages: Searching is fast and efficient.
Disadvantages: Access to data is predefined by exclusively hierarchical relationships, predetermined by the administrator. Limited search/query flexibility. Not all data are naturally hierarchical.

Network database model
Advantages: Many more relationships can be defined. There is greater speed and efficiency than with relational database models.
Disadvantages: This is the most complicated database model to design, implement, and maintain. Greater query flexibility than with the hierarchical model, but less than with the relational model.

Relational database model
Advantages: Conceptual simplicity; there are no predefined relationships among data. High flexibility in ad-hoc querying. New data and records can be added easily.
Disadvantages: Processing efficiency and speed are lower. Data redundancy is common, requiring additional maintenance.
Types of DBMS

• The type of a DBMS depends entirely on how the database is structured by that particular DBMS.

• There are four main types of Database Management System (DBMS).

a) Hierarchical Database

b) Network Database

c) Relational Database

d) Object-Oriented Database
a) Hierarchical Database

• One of the oldest database models.

• Uses a tree structure, reflecting the most frequently occurring kind of relationship.

• The database is established in such a way that one data item is present as the subordinate, or sub-unit, of another.

• Records contain information about their groups of parent/child relationships.

• Also described as a one-to-many relationship.


a) Hierarchical Database

• Organizes data elements as tabular rows, one for each instance of an entity; consider, for example, a company’s organizational structure.
b) Network Database

• Replaces the hierarchical tree with a graph, thus allowing more general connections among nodes.

• Evolved specifically to handle non-hierarchical relationships.

• No tier or hierarchy exists within the database; it supports many-to-many relationships.

• A network database looks more like a cobweb or


interconnected network of records.
b) Network Database

• Provides a tree-like hierarchy, with the exception that child tables are allowed to have more than one parent.
c) Relational Database

• A database made up of a set of tables.

• A common field existing in any two tables creates


a relationship between tables.

• Relationships link rows from two tables by embedding row identifiers (keys) from one table as attribute values in the other table.

• Every table shares at least one field with another


table in ‘one to one’, ‘one to many’, or ‘many to
many’ relationships.
c) Relational Database

• These relationships allow the data to be accessed in an almost unlimited number of ways.

• They also allow tables to be combined as building blocks to create complex and very large databases.
d) Object-oriented Database

• Object-oriented DBMSs borrow from the model of object-oriented programming.

• They add database functionality to object programming languages.

• Object-oriented databases use small, reusable pieces of software called objects.

• The objects themselves are stored in the object-oriented database.
d) Object-oriented Database

• Each object consists of two elements: a piece of data, and instructions or software programs called methods.
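
The two elements of an object can be seen in a small Python class. This sketch only illustrates the object itself, not how an object-oriented DBMS would store it; the class name BankAccount and its methods are assumptions made for the example.

```python
class BankAccount:
    """An object bundles a piece of data with the methods that act on it."""

    def __init__(self, acc_no, bal):
        # Data held by the object.
        self.acc_no = acc_no
        self.bal = bal

    def deposit(self, amount):
        """Method: add money to the balance."""
        self.bal += amount

    def withdraw(self, amount):
        """Method: remove money, refusing to overdraw the account."""
        if amount > self.bal:
            raise ValueError("insufficient balance")
        self.bal -= amount


account = BankAccount("A-101", 500)
account.deposit(250)
account.withdraw(100)
```

The balance is changed only through the object's own methods, so the rules for handling the data travel together with the data.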
6) Approaches to Manage Data :
Data Warehousing

• A data warehouse is a repository of information gathered


from multiple sources, stored under a unified schema, at a
single site.

• Data warehouses are designed to facilitate reporting and


analysis.

• A data warehouse is a subject-oriented, integrated,


time-variant, non-updatable collection of data used in
support of management decision-making processes and
business intelligence.
6) Approaches to Manage Data :
Data Warehousing

• Subject-oriented: It is organized around the key


subjects of the enterprise. e.g. customers, patients,
students, products.

• Integrated: Data stored in a data warehouse are defined using consistent naming conventions and formats.

• Time-variant/Historical: Data in the data warehouse


contain a time dimension so that they may be used to
study trends and change over time.
6) Approaches to Manage Data :
Data Warehousing

• Non-updatable/Nonvolatile: Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end users.

• Use Online Analytical Processing (OLAP): Data warehouses, which are designed to support decision makers, use OLAP. OLAP involves analysis of accumulated data by end users.

• Multidimensional: The data warehouse uses a


multidimensional data structure.
6) Approaches to Manage Data :
Data Mining

• Data mining is the process of looking for unknown relationships and patterns and extracting useful information from large volumes of data.

• Put simply, data mining is knowledge discovery in a database or data warehouse.

• Sophisticated tools employ algorithms to discover hidden patterns, correlations, and relationships among organizational data.
6) Approaches to Manage Data :
Data Mining

Examples:

• Market basket analysis (customer buying patterns): will a customer who buys product A also buy products B and C?

• Market segmentation: Identifies the common


characteristics of customers.
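
The market basket question above can be sketched with simple counting in Python. The five baskets are made-up sample data, and the function name confidence is illustrative. The sketch estimates how often customers who bought one product also bought another.

```python
from collections import Counter
from itertools import combinations

# Made-up shopping baskets; each set is one customer's purchase.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]

item_counts = Counter()  # number of baskets containing each product
pair_counts = Counter()  # number of baskets containing each pair of products
for basket in baskets:
    item_counts.update(basket)
    pair_counts.update(combinations(sorted(basket), 2))


def confidence(antecedent, consequent):
    """Share of baskets with `antecedent` that also contain `consequent`."""
    pair = tuple(sorted((antecedent, consequent)))
    return pair_counts[pair] / item_counts[antecedent]


a_then_b = confidence("A", "B")
```

In this sample, product A appears in four baskets and three of them also contain B, so the estimated chance that a buyer of A also buys B is 0.75.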
6) Approaches to Manage Data :
Function of Data Mining

 Data mining has five main functions:

 Classification: infers the defining characteristics of a


certain group (such as customers who have been lost to
competitors).

• Clustering: identifies groups of items that share a particular characteristic. (Clustering differs from classification in that no pre-defining characteristic is given in clustering.)

 Association: identifies relationships between events that


occur at one time (such as the contents of a shopping
basket).
6) Approaches to Manage Data :
Function of Data Mining
 Sequencing: similar to association, except that the
relationship exists over a period of time (such as repeat
visits to a supermarket or use of a financial planning
product).

 Forecasting: estimates future values based on patterns


within large sets of data (such as demand forecasting).

Applications :

 The applications for data mining are wide-ranging. They


include :

 Customer relationships (that is, customer retention);


cross-selling; campaign management; market, channel,
and pricing analysis.
6) Approaches to Manage Data :
Text Mining

• Text mining is the application of data mining to non-structured or less structured text files.

• Unlike data mining on stored, structured data, text mining operates with less structured information.

• Documents rarely have a strong internal structure, and when they do, it is frequently focused on document format rather than document content.
6) Approaches to Manage Data :
Text Mining

Text mining helps organizations to do the following:

 Find the “hidden” content of documents, including


additional useful relationships.

 Relate documents across previously unnoticed divisions


(e.g., discover that customers in two different product
divisions have the same characteristics).

 E.g., identify all the customers of an insurance firm who


have similar complaints and cancel their policies.
6) Approaches to Manage Data :
Data Mart

 A data mart is a scaled-down version of a data


warehouse that focuses on a particular subject.

 It is designed to support the unique business


requirements of a specific department or business
process.

• Because of its small scope, a data mart takes less time to build, costs less, and is less complex than a data warehouse.
6) Approaches to Manage Data :
Data Warehouse Framework
6) Approaches to Manage Data :
Data Governance

• It is an approach to managing information across an


entire organization.

• It involves a formal set of business processes and policies that are designed to ensure that data are handled in a well-defined fashion.

• The objective is to make information available, transparent and useful for the people authorized to access it, from the moment it enters an organization until it is outdated and deleted.
6) Approaches to Manage Data :
Data Governance

• One approach to implement Data Governance is


master data management.

• Master Data Management is a process that spans


all organizational business processes and applications.

• It provides companies with the ability to store,


maintain, exchange, and synchronize the data.
6) Approaches to Manage Data :
Data Governance

• Master data are a set of core data, such as customer, product, and employee, that span the enterprise information systems.

• Master data are different from transaction data.

• In contrast to transaction data, master data are applied to multiple transactions and are used to categorize, aggregate, and evaluate the transaction data.
6) Approaches to Manage Data :
Data Governance

For example,

• Suppose Binaya purchases one Samsung 42-inch plasma television, part number 675, from Sam House at New Road, for Rs 40000, on 5th April 2015.

• In this example, the master data are “product sold”, “vendor”, “sales store”, “purchase price”, “date” and “part number”.

• When specific values are applied to the master data, a transaction is represented.
END
