
Database

Chapter 2
Class 12

Data vs. Information
• Data are raw facts or figures.
• Information is the result of processing raw data to
reveal meaning.
• Information requires context to reveal meaning
• Raw data must be formatted for storage,
processing, and presentation.
• Data are the foundation of information, which is
the bedrock of knowledge
Database Systems, 9th Edition 2
Data vs. Information (cont’d.)
• Data: building blocks of information
• Information is produced by processing data.
• Information is used to reveal meaning in data.
• Accurate, relevant, timely information is the key to
good decision making
• Good decision making is the key to organizational
survival



Database definition
• In general, a database is anything that stores
data
– Example: a phone book, which stores names and
phone numbers
• In computing, a database is
– an organized collection of data that can be accessed,
retrieved and used.

Function of a database in general
• Allow anyone to:
– store (add)
– delete (remove)
– organize
– use
– present data

Example: data in Access database

Database Terminology
• A database or a table within a database is made of
RECORDS
• A RECORD contains information about a single item
in the database/table.
• Each record contains FIELDS.
• A FIELD is a category of data that has been broken
down into its simplest form.
– Examples: first name, surname, street, suburb

Terminology

• In a database or table, each ROW represents a record.

ID   CDTitle                 UnitsInStock   UnitPrice
1    Joe Cocker Essentials   3              $28.00
2    The Beatles             4              $35.00
3    Aussie Country Hits     2              $19.95

Terminology

• Each COLUMN represents a field.

ID   CDTitle                 UnitsInStock   UnitPrice
1    Joe Cocker Essentials   3              $28.00
2    The Beatles             4              $35.00
3    Aussie Country Hits     2              $19.95
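
The record/field terminology can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table name cd and the use of an in-memory SQLite database are assumptions, while the column names and values come from the table above.

```python
import sqlite3

# In-memory database; the table name "cd" is an assumption for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cd (ID INTEGER, CDTitle TEXT, UnitsInStock INTEGER, UnitPrice REAL)"
)
conn.executemany(
    "INSERT INTO cd VALUES (?, ?, ?, ?)",
    [
        (1, "Joe Cocker Essentials", 3, 28.00),
        (2, "The Beatles", 4, 35.00),
        (3, "Aussie Country Hits", 2, 19.95),
    ],
)

cursor = conn.execute("SELECT * FROM cd ORDER BY ID")
fields = [column[0] for column in cursor.description]  # one name per column (field)
records = cursor.fetchall()                            # one tuple per row (record)

# Pair each field name with the value stored in the first record.
first_record = dict(zip(fields, records[0]))
```

Here fields lists the columns and records lists the rows, so first_record maps every field name to the value held by one record.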

Why do we need databases?
• Databases solve many of the problems
encountered in data management
– Used in almost all modern settings involving data
management:
• Business
• Research
• Administration
• Important to understand how databases work and
interact with other applications
What is a DBMS?
• A Database Management System (DBMS) is software
that stores and manages databases.
• It is a computerized record-keeping system.
• It allows users to:
• create databases
• insert, update and delete data
• sort and query data
• create forms and reports
• Database Applications:
– Banking: all transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Manufacturing: production, inventory, orders, supply
chain
– Human resources: employee records, salaries, tax
deductions
• Examples of DBMSs include Microsoft Access, MySQL,
PostgreSQL, SQL Server, Oracle, dBASE, Clipper and
FoxPro
File system
A file is a collection of related records. It is the
traditional way of storing data electronically.
If there are 100 employees, then each employee would
have a record (e.g. called Employee Personal Details
record) and the collection of 100 such records would
constitute a file (in this case, called Employee Personal
Details file).

Files are integrated into a database. This is done
using a Database Management System.
• In the early days, database applications were built on
top of file systems
• Drawbacks of using file systems to store data:
– Data redundancy and inconsistency
• Multiple file formats, duplication of information
in different files
– Difficulty in accessing data
• Need to write a new program to carry out each
new task
– Data isolation — multiple files and formats
– Integrity problems
– Concurrent access by multiple users
• Concurrent access is needed for performance
• Uncontrolled concurrent accesses can lead to
inconsistencies
– E.g. two people reading a balance and updating
it at the same time
– Security problems
• Database systems offer solutions to all the above
problems
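
The concurrent-access problem mentioned above (two people reading a balance and updating it at the same time) can be sketched as a deterministic simulation. This is not real concurrency: the dictionary standing in for a file, the starting balance and the amounts are all assumptions chosen for illustration.

```python
# A plain dictionary stands in for a file with no concurrency control.
account = {"balance": 100}

# Both users read the balance before either one writes.
read_by_user_a = account["balance"]
read_by_user_b = account["balance"]

# Each user writes back a value computed from a stale read.
account["balance"] = read_by_user_a + 50  # user A deposits 50
account["balance"] = read_by_user_b - 30  # user B withdraws 30, overwriting A's write
uncontrolled_result = account["balance"]  # A's deposit has been lost

# Serialized version: each update reads the latest value before writing.
account = {"balance": 100}
account["balance"] = account["balance"] + 50
account["balance"] = account["balance"] - 30
serialized_result = account["balance"]
```

With the interleaved reads the final balance is 70, although a deposit of 50 and a withdrawal of 30 on a balance of 100 should leave 120. A DBMS avoids this kind of inconsistency by controlling how concurrent updates interleave.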
Role and Advantages of the DBMS
• DBMS is the intermediary between the user and
the database
– Database structure stored as file collection
– Can only access files through the DBMS
• DBMS enables data to be shared
• DBMS integrates many users’ views of the data

Advantages of the DBMS
• Advantages of a DBMS:
– Sharing data
– Reduced data redundancy
– Data backup and recovery
– Inconsistency avoided
– Data integrity
– Data security
– Data independence
– Multiple user interfaces
– Processing of complex queries
Disadvantages of DBMS
• Expensive
• Changing technology
• Needs technical training
• Backup is needed

DBMS Models
• A DBMS model describes:
– the rules and standards for how a database organizes data
– how users view the organization of data
• Common models include:
– Hierarchical Database model
– Network Database model
– Relational Database model
– ER Database Model

Hierarchical Database Model
• It organizes data in a tree structure
• All access to data starts at the top of the hierarchy
and moves downward
– for example, from customer to orders, vendor to
purchases, etc.
• There is a hierarchy of parent and child data
segments
• Supports one-to-many relationships

Advantages
• It is the simplest model.
• Each record has one or more attributes.
• Searching is fast and easy if the parent is known.
• It supports one-to-one and one-to-many relationships.

Disadvantages
• It is an old-fashioned and outdated database model.
• It does not support many-to-many relationships.
• It increases redundancy.
Network Database Model
• The network model allows each record to have multiple
parent and child records, forming a generalized graph
structure
• Similar to the hierarchical model, but it
– permits more than one parent per child
– thus permits the modeling of many-to-many relationships in data
• Very flexible
• Not widely used

Advantages
• It supports many-to-many relationships, so it is more flexible.
• Searching is faster.
• It reduces redundancy because data need not be repeated
when the same data is needed in more than one place.

Disadvantages
• It is difficult to handle.
• There is less security because data is shared.

Relational Model
• Data is stored in two-dimensional tables (rows and columns)
• These tables are called relations.
• Widely used
• Examples: Microsoft Access and MySQL

Advantages
• Complex tables can be broken down into simpler tables.
• Database processing is faster than in the other models.
• There is very little redundancy.
• Integrity rules can easily be implemented.

Disadvantages
• It is more complex than the other models.
• There are many rules because of complex relationships.
• It needs more powerful computers and data storage devices.

Entity-Relationship Database Model
• The entity-relationship database model (ER model)
is based on a perception of the real world as
a set of basic objects, called entities, together with
the attributes of those entities and the relationships
among them.
• It is an overall logical structure of a database that
can be represented graphically.

Data Hierarchy
Definition: the systematic organization of data, often in a hierarchical
form

• Database file: a physical file stored on storage media.
Example: StudentDB.accdb
• Table: contains information on a specific subject or topic.
Example: Student, Courses
• Record: contains information on a single data item in a
table. Example: information about a student. Also known
as a row in a table.
• Field: contains a specific piece of information within a
record. Example: Student Name, Student IC. Also known
as a column in a table.
Example of an Access table
(Figure: an Access table with its rows labelled as records and its columns labelled as fields.)
Relational database
• stores data in tables
• tables are organized into columns, and each
column stores one type of data
• data for a single “instance” of a table is stored as a
row

What is RDBMS?
• An RDBMS is a DBMS that manages a relational
database
– Examples: Microsoft Access and MySQL
• Data is structured in tables, records and fields
• Each table consists of rows (records)
• Each table row consists of one or more fields
(columns)
• An RDBMS stores data in a collection of tables, which
may be related by common fields
Advantages of RDBMS
• Minimum data redundancy
• Data consistency - less likelihood of incorrect or
incomplete data being stored or used
• Integrated data – data is organized in logical relationships,
making it easy to relate data items
• Data sharing – allows users from different departments to
share data
• Data accessibility – allows users to access or retrieve data in a
flexible manner
• Uniform security, privacy and integrity control – the database
administrator can establish controls for accessing, updating
and protecting the data
Categories of RDBMS
• Personal database
– Best for single-user or small workgroup environments (up to about 10 users)
– Example: Microsoft Access

• Client/Server database
– Supports multiple users in a network environment
– Runs on a server; clients request data from the server and
query, update and report locally
– Example: SQL Server
Master table
• Contains a primary key, which must be unique for
each row
• Normally a table that lists the properties of
things that have some permanence and are used
many times in other tables
• Examples: customers, teachers, students and
subjects offered

Transaction table
• Records some kind of interaction or event between master
tables
• Transaction tables are typically used in posting operations or
as lookup tables
• Examples:
– In a Student Information System, the actual classes taken by
students are transactions because they record specific interactions
between students and teachers
– In eCommerce software, the shopping cart tables are all
transaction tables; they record the purchase of items by customers

Primary Key
• A field, or combination of fields, that
uniquely identifies each row in a table
• Used to match up records in different tables
• Usually indexed
• Helps to define the relationships between tables

Primary Key
• Requirement: must be unique and cannot be
empty or null
• Functions:
• Associates data from multiple tables
• Prevents duplicate records
• Controls the order of records
• Makes locating records faster
• A table can have two or more fields as its primary key;
this is called a composite key

Foreign Key
• A field with the same data and data type as the primary key of a
corresponding table, to which it is linked
• Example:
– In the SalesTransaction table below, CustomerID is the
Foreign Key field
– The Foreign Key is used to look up the CustomerID in the
Customer table, where CustomerID is the primary key

Customer table               SalesTransaction table
CustomerID (PRIMARY KEY)     SalesID
CustomerName                 SalesDate
                             CustomerID (FOREIGN KEY)
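
The primary key and foreign key behaviour can be sketched with Python's built-in sqlite3 module. The table and column names follow the Customer and SalesTransaction example; the data values, the column types and the choice of SQLite are assumptions for illustration. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and other DBMSs behave differently in that respect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute(
    "CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)"
)
conn.execute(
    """
    CREATE TABLE SalesTransaction (
        SalesID    INTEGER PRIMARY KEY,
        SalesDate  TEXT,
        CustomerID INTEGER REFERENCES Customer(CustomerID)
    )
    """
)

# Illustrative rows (not taken from the slides).
conn.execute("INSERT INTO Customer VALUES (1, 'Example Customer')")
conn.execute("INSERT INTO SalesTransaction VALUES (10, '2024-01-15', 1)")

# The foreign key lets a sale look up its customer in the Customer table.
row = conn.execute(
    """
    SELECT s.SalesID, c.CustomerName
    FROM SalesTransaction AS s
    JOIN Customer AS c ON c.CustomerID = s.CustomerID
    """
).fetchone()

# A primary key value cannot be repeated.
try:
    conn.execute("INSERT INTO Customer VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# A foreign key value must match an existing primary key value.
try:
    conn.execute("INSERT INTO SalesTransaction VALUES (11, '2024-01-16', 99)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Both rejected inserts raise sqlite3.IntegrityError, which is how the DBMS reports that a key constraint would be violated.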
Relationships
• A relationship establishes the association between
common fields in two tables
• The common field links the two tables to each other, thus
ensuring a connection between the data in the
tables within the same database
• REMEMBER: Access uses related tables - one table
can find and use data in another table

Abstraction
• Database systems are made up of complex data
structures. To ease user interaction with the
database, developers hide irrelevant internal
details from users. This process of hiding irrelevant
details from the user is called data abstraction.

Levels of Abstraction
• Physical level: describes how a record (e.g., customer) is stored.
• Logical level: describes the data stored in the database, and the
relationships among the data.
    type customer = record
        name : string;
        street : string;
        city : string;
    end;
• View level: application programs hide details of data types.
Views can also hide information (e.g., salary) for security
purposes.
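
The view level can be sketched with an SQL view that hides a salary column. This is a minimal example using Python's sqlite3 module; the employee table, the employee_public view and the sample row are hypothetical names and values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full table, including the sensitive salary column.
conn.execute("CREATE TABLE employee (name TEXT, city TEXT, salary INTEGER)")
conn.execute("INSERT INTO employee VALUES ('Johnson', 'Palo Alto', 50000)")

# View level: a view that exposes only the non-sensitive columns.
conn.execute("CREATE VIEW employee_public AS SELECT name, city FROM employee")

cursor = conn.execute("SELECT * FROM employee_public")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

A user who is given access only to employee_public sees names and cities but never the salary, although the data still lives in one underlying table.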
View of Data
An architecture for a database system
Instances and Schemas
• Schema – the logical structure of the database
– e.g., the database consists of information about a set of
customers and accounts and the relationship between
them
– Analogous to type information of a variable in a program
– Physical schema: database design at the physical level
– Logical schema: database design at the logical level
• Definition of instance: the data stored in a database at a
particular moment in time is called an instance of the
database. The database schema defines the variable
declarations in the tables that belong to a particular database;
the values of these variables at a given moment are called
the instance of that database.
• For example, say we have a single table, student, in the
database. Today the table has 100 records, so today the
instance of the database has 100 records. If we
add another 100 records to this table by tomorrow,
the instance of the database tomorrow will have 200
records in the table. In short, the data stored in the
database at a particular moment is called the instance, and it
changes over time as we add or delete data.
• Physical Data Independence – the ability to
modify the physical schema without changing the
logical schema
– Applications depend on the logical schema
– In general, the interfaces between the various levels
and components should be well defined so that
changes in some parts do not seriously influence
others.

Data Models
• A collection of tools for describing
– data
– data relationships
– data semantics
– data constraints
• Entity-Relationship model
• Relational model
• Other models:
– object-oriented model
– semi-structured data models
– Older models: network model and hierarchical
model
Entity-Relationship Model
Example of schema in the entity-relationship model
Entity Relationship Model
• E-R model of real world
– Entities (objects)
• E.g. customers, accounts, bank branch
– Relationships between entities
• E.g. Account A-101 is held by customer Johnson
• Relationship set depositor associates customers with accounts
• Widely used for database design
– A database design in the E-R model is usually converted to a design in
the relational model, which is used for storage and
processing
Relational Model

• Example of tabular data in the relational model (the column headings are the attributes)

customer-id    customer-name   customer-street   customer-city   account-number
192-83-7465    Johnson         Alma              Palo Alto       A-101
019-28-3746    Smith           North             Rye             A-215
192-83-7465    Johnson         Alma              Palo Alto       A-201
321-12-3123    Jones           Main              Harrison        A-217
019-28-3746    Smith           North             Rye             A-201
A Sample Relational Database
Data Definition Language (DDL)

• Specification notation for defining the database schema
– E.g.
    create table account (
        account-number char(10),
        balance integer)
• DDL compiler generates a set of tables stored in a data dictionary
• Data dictionary contains metadata (i.e., data about data)
– database schema
• Data storage and definition language
– language in which the storage structure and access methods used by the
database system are specified
– Usually an extension of the data definition language
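
The DDL statement and the data dictionary can be sketched with Python's sqlite3 module. Two assumptions apply: the column is written account_number, because a hyphen is not valid in an unquoted SQL identifier, and SQLite's sqlite_master table plays the role of the data dictionary here (other DBMSs expose their catalogs differently).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema of the account table.
conn.execute("CREATE TABLE account (account_number CHAR(10), balance INTEGER)")

# The catalog table sqlite_master stores metadata about every schema object.
catalog = conn.execute("SELECT type, name, sql FROM sqlite_master").fetchall()

# PRAGMA table_info lists each column; positions 1 and 2 are its name and declared type.
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(account)")]
```

No data has been inserted at this point; catalog and columns describe only the structure of the table, which is what "data about data" means.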
Data Manipulation Language (DML)
• Language for accessing and manipulating the data
organized by the appropriate data model
– DML also known as query language
• Two classes of languages
– Procedural – user specifies what data is required and
how to get those data
– Nonprocedural – user specifies what data is required
without specifying how to get those data
• SQL is the most widely used query language
SQL
• SQL: widely used non-procedural language
– E.g. find the name of the customer with customer-id 192-83-7465
    select customer.customer-name
    from customer
    where customer.customer-id = '192-83-7465'
– E.g. find the balances of all accounts held by the customer with customer-id 192-83-7465
    select account.balance
    from depositor, account
    where depositor.customer-id = '192-83-7465' and
          depositor.account-number = account.account-number
• Application programs generally access databases through one of:
– Language extensions that allow embedded SQL
– An application program interface (e.g. ODBC/JDBC) that allows SQL queries to be sent
to a database
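
The two queries can be run from an application program through an API, sketched here with Python's sqlite3 module in place of ODBC/JDBC. The identifiers use underscores because hyphens are not valid in unquoted SQL names, and the balances are illustrative values that do not appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer  (customer_id TEXT, customer_name TEXT);
    CREATE TABLE account   (account_number TEXT, balance INTEGER);
    CREATE TABLE depositor (customer_id TEXT, account_number TEXT);

    INSERT INTO customer  VALUES ('192-83-7465', 'Johnson');
    INSERT INTO account   VALUES ('A-101', 500), ('A-201', 900), ('A-215', 700);
    INSERT INTO depositor VALUES ('192-83-7465', 'A-101'),
                                 ('192-83-7465', 'A-201'),
                                 ('019-28-3746', 'A-215');
    """
)

# Query 1: the name of the customer with a given customer id.
names = conn.execute(
    "SELECT customer.customer_name FROM customer WHERE customer.customer_id = ?",
    ("192-83-7465",),
).fetchall()

# Query 2: the balances of all accounts held by that customer.
balances = conn.execute(
    """
    SELECT account.balance
    FROM depositor, account
    WHERE depositor.customer_id = ?
      AND depositor.account_number = account.account_number
    ORDER BY account.balance
    """,
    ("192-83-7465",),
).fetchall()
```

Both queries state only what data is required; how the rows are located and joined is left to the DBMS, which is what makes SQL non-procedural.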
Database Users
• Users are differentiated by the way they expect to interact with
the system
• Application programmers – interact with system through DML
calls
• Sophisticated users – form requests in a database query language
• Specialized users – write specialized database applications that do
not fit into the traditional data processing framework
• Naïve users – invoke one of the permanent application programs
that have been written previously
– E.g. people accessing database over the web, bank tellers, clerical staff
Database Administrator
• Coordinates all the activities of the database system; the
database administrator has a good understanding of the
enterprise’s information resources and needs.
• Database administrator's duties include:
– Schema definition
– Storage structure and access method definition
– Schema and physical organization modification
– Granting user authority to access the database
– Specifying integrity constraints
– Acting as liaison with users
– Monitoring performance and responding to changes in
requirements
Transaction Management
• A transaction is a collection of operations that performs
a single logical function in a database application
• Transaction-management component ensures that the
database remains in a consistent (correct) state despite
system failures (e.g., power failures and operating
system crashes) and transaction failures.
• Concurrency-control manager controls the interaction
among the concurrent transactions, to ensure the
consistency of the database.
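
A transaction as a single logical function can be sketched with a funds transfer in Python's sqlite3 module. The transfer helper, the CHECK constraint and the balances are assumptions made for illustration; the point is only that a failed transaction leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "account_number TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO account VALUES (?, ?)", [("A-101", 500), ("A-215", 700)]
)
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount from source to target as one all-or-nothing transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE account_number = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE account_number = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


succeeded = transfer(conn, "A-101", "A-215", 200)
failed = transfer(conn, "A-101", "A-215", 1000)  # would overdraw A-101
balances = dict(conn.execute("SELECT account_number, balance FROM account"))
```

In the second call the credit to A-215 is executed first, but it is undone when the debit violates the constraint, so the database stays in a consistent state.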
Storage Management
• Storage manager is a program module that
provides the interface between the low-level data
stored in the database and the application
programs and queries submitted to the system.
• The storage manager is responsible for the
following tasks:
– interaction with the file manager
– efficient storing, retrieving and updating of data
Application Architectures

• Two-tier architecture: E.g. client programs using ODBC/JDBC to
communicate with a database
• Three-tier architecture: E.g. web-based applications, and
applications built using “middleware”
Normalization
Normalization is a systematic way of ensuring that a database
structure is suitable for general-purpose querying and free of
certain undesirable characteristics that could lead to a loss of
data integrity.

The objectives of normalization:


• Free the database of modification anomalies
• Minimize redesign when extending the database structure
• Make the data model more informative to users
• Avoid bias towards any particular pattern of querying

In general, relational databases should be normalized to the
"third normal form".
Background to Normalization: Definitions
Functional Dependency: If A and B are attributes of relation R, B is functionally
dependent on A (denoted A → B) if each A value is associated with precisely
one B value.

In other words, in every legal instance of the relation R, whenever two tuples agree
on their A values, they also agree on their B values.
The determinant of a functional dependency is the attribute or group of attributes on the
left-hand side of the arrow.

E.g. in an "Employee" table that includes the attributes "Employee ID" and "Employee
Date of Birth", the functional dependency {Employee ID} → {Employee Date of Birth}
would hold.
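
The definition can be turned into a small check, sketched below in Python. The fd_holds helper and the sample rows are hypothetical. A functional dependency is a statement about every legal instance of a relation, so a check on sample data can only show that the dependency is violated, or that it is not violated by these particular rows.

```python
def fd_holds(rows, determinant, dependent):
    """Return True if no two rows agree on determinant but differ on dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[attribute] for attribute in determinant)
        value = tuple(row[attribute] for attribute in dependent)
        # Remember the first dependent value seen for each determinant value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Illustrative rows: two employees happen to share a date of birth.
employees = [
    {"Employee ID": 1, "Employee Date of Birth": "1990-01-01"},
    {"Employee ID": 2, "Employee Date of Birth": "1990-01-01"},
    {"Employee ID": 3, "Employee Date of Birth": "1985-06-30"},
]

id_determines_dob = fd_holds(employees, ["Employee ID"], ["Employee Date of Birth"])
dob_determines_id = fd_holds(employees, ["Employee Date of Birth"], ["Employee ID"])
```

In these rows {Employee ID} → {Employee Date of Birth} is not violated, while the reverse direction fails because one date of birth is associated with two employee IDs.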
Background to Normalization: Definitions

Full Functional Dependency

– A and B are attributes (or sets of attributes) of a relation.
– B is fully functionally dependent on A if B is functionally dependent on A
but not on any proper subset of A.

A functional dependency X → Y is a full functional dependency if
removal of any attribute A from X means that the dependency
no longer holds.
Background to Normalization: Definitions
Transitive Dependency: A transitive dependency is an indirect
functional dependency. Let A, B, and C designate three distinct
attributes in the relation. Suppose all three of the following
conditions hold:
– A→B
– It is not the case that B → A
– B→C
Then the functional dependency A → C is a transitive dependency.
For example, in a table of books, the functional dependency {Book} → {Author Nationality} applies;
that is, if we know the book, we know the author's nationality.
Furthermore:
– {Book} → {Author}
– {Author} → {Author Nationality}
– It is not the case that {Author} → {Book}
Therefore {Book} → {Author Nationality} is a transitive
dependency.
Background to Normalization: Definitions
An Index or Key is an attribute or collection of attributes that may be used to identify or retrieve one or
more records.

Superkey: A superkey is a set of columns within a table whose values can be used to uniquely identify a
row.
E.g. imagine a table with the fields <Name>, <Age>, <SSN> and <Phone Extension>. This table has many
possible superkeys. Three of these are <SSN>, <Phone Extension, Name> and <SSN, Name>. Of those
listed, only <SSN> is a candidate key, as the others contain information not necessary to uniquely
identify records.

A candidate key is a minimal superkey: it uniquely identifies a record and contains no attribute that could
be removed without losing that property. I.e., it may be used to retrieve one specific record.
The primary key of a relation is a candidate key that has been designated as the main key.
A foreign key is an attribute (or collection of attributes) in a relation that can be used as a key to another
relation. Foreign keys link tables together to form an integrated database.
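
The difference between a superkey and a candidate key can be sketched in Python. The helper functions and the sample rows are hypothetical, and the check only examines the rows it is given, whereas a real key must hold for every legal instance of the table.

```python
from itertools import combinations


def is_superkey(rows, attributes):
    """True if no two rows share the same values for the given attributes."""
    projected = [tuple(row[a] for a in attributes) for row in rows]
    return len(set(projected)) == len(projected)


def is_candidate_key(rows, attributes):
    """True if attributes form a superkey and no proper subset is a superkey."""
    if not is_superkey(rows, attributes):
        return False
    return not any(
        is_superkey(rows, subset)
        for size in range(1, len(attributes))
        for subset in combinations(attributes, size)
    )


# Illustrative rows using the field names from the example above.
people = [
    {"Name": "Ann", "Age": 30, "SSN": "111", "Phone Extension": "201"},
    {"Name": "Bob", "Age": 30, "SSN": "222", "Phone Extension": "201"},
    {"Name": "Ann", "Age": 41, "SSN": "333", "Phone Extension": "305"},
]

ssn_is_superkey = is_superkey(people, ["SSN"])
ssn_name_is_superkey = is_superkey(people, ["SSN", "Name"])
ssn_is_candidate = is_candidate_key(people, ["SSN"])
ssn_name_is_candidate = is_candidate_key(people, ["SSN", "Name"])
```

<SSN, Name> identifies every row but is not a candidate key, because dropping Name still leaves a superkey.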
The Process of Normalization
The normalization process has two main goals:
eliminating redundant data (for example, the same
data stored in more than one table) and ensuring that data
dependencies make sense (only related data stored in a
table). Both are worthy goals, as they reduce the
amount of space a database consumes and ensure that
data is logically stored.
• Formal technique for analysing a relation based on its
primary key and functional dependencies between its
attributes.
• Often executed as a series of steps. Each step
corresponds to a specific normal form, which has known
properties.
• As normalization proceeds, relations become
progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
First Normal Form (1NF)
No Repeating Elements or Groups of Elements
A relation in which intersection of each row and column contains one
and only one value.
– All key attributes get defined
– No repeating groups in table
– All attributes dependent on primary key

UNF to 1NF:
• Eliminate duplicative columns from the same table (in other words,
remove subsets of data that apply to multiple rows of a table and
place them in separate tables).
• Create separate tables for each group of related data and identify
each row with a unique column or set of columns (the primary key).
• Create relationships between these new tables and their
predecessors through the use of foreign keys.
Second Normal Form (2NF)
No Partial Dependencies on a Concatenated Key
A relation that is in 1NF and in which every non-primary-key attribute
is fully functionally dependent on the primary key (no
partial dependency).
1NF to 2NF:
• Identify primary key for the 1NF relation.
• Identify functional dependencies in the relation.
• If partial dependencies exist on the primary key remove
them by placing them in a new relation along with copy of
their determinant (in other words, remove columns that
are not fully dependent upon the primary key).
• Create relationships between these new tables and their
predecessors through the use of foreign keys.
Third Normal Form (3NF)
No Dependencies on Non-Key Attributes
A relation that is in 1NF and 2NF and in which no
non-primary-key attribute is transitively dependent on the
primary key.
2NF to 3NF:
• Identify the primary key in the 2NF relation.
• Identify functional dependencies in the relation.
• If transitive dependencies exist on the primary key
remove them by placing them in a new relation along
with copy of their determinant.
Boyce-Codd normal form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if every
determinant is a candidate key.

• The difference between 3NF and BCNF is that, for a
functional dependency A → B, 3NF allows this
dependency in a relation if B is a primary-key attribute
and A is not a candidate key.
• BCNF, by contrast, insists that for this dependency to
remain in a relation, A must be a candidate key.
Normalization Example
• Normalization is the process of reducing the redundancy of data
in a table and improving data integrity. Why is this required?
Without normalization, we may face issues such as:
• Insertion anomaly: occurs when we cannot insert
data into the table without the presence of another
attribute.
• Update anomaly: a data inconsistency that results
from data redundancy and a partial update of data.
• Deletion anomaly: occurs when certain attributes
are lost because of the deletion of other attributes.
• In brief, normalization is a way of organizing the
data in the database. Normalization entails
organizing the columns and tables of a database to
ensure that their dependencies are properly
enforced by database integrity constraints.

1st Normal Form (1NF)

• In this Normal Form, we tackle the problem of
atomicity. Here atomicity means values in the table
should not be further divided. In simple terms, a
single cell cannot hold multiple values. If a table
contains a composite or multi-valued attribute, it
violates the First Normal Form.

Example
Employee ID   Name           Ph. Number               Salary
101           Ram Shrestha   9845012222, 057521000    55000
102           Hira Poudel    9812121212, 0575210020   45000
103           Rajiv Yadav    985100000, 057523000     35000

In the above table, each cell of the Ph. Number column
holds two values, so the table violates 1NF. If we
apply 1NF to the above table, we get the table below as
the result.

Employee ID   Name           Ph. Number   Salary
101           Ram Shrestha   9845012222   55000
101           Ram Shrestha   057521000    55000
102           Hira Poudel    9812121212   45000
102           Hira Poudel    0575210020   45000
103           Rajiv Yadav    985100000    35000
103           Rajiv Yadav    057523000    35000

• With this, we have achieved atomicity: every cell now
holds a single value. Note that Employee ID alone no longer
identifies a row; each row is identified by Employee ID
together with Ph. Number.
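
The conversion to 1NF shown above can be sketched in Python: each row with a multi-valued phone cell is split into one row per phone number. The variable names are assumptions; the data comes from the example table.

```python
# Unnormalized rows: the third value holds two phone numbers in one cell.
unnormalized = [
    (101, "Ram Shrestha", "9845012222, 057521000", 55000),
    (102, "Hira Poudel", "9812121212, 0575210020", 45000),
    (103, "Rajiv Yadav", "985100000, 057523000", 35000),
]

# 1NF: emit one row per phone number so that every cell holds a single value.
first_normal_form = [
    (employee_id, name, phone.strip(), salary)
    for employee_id, name, phones, salary in unnormalized
    for phone in phones.split(",")
]
```

The result has six rows, two per employee, matching the 1NF table above.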

2nd Normal Form (2NF)
• The first condition of 2NF is that the table
must be in 1NF. The table must also not
contain any partial dependency.
Example
Employee ID   Department ID   Office Location
Emp 101       Tr1             Hetauda
Emp 102       Dev2            Kathmandu
Emp 103       RAD1            Butwal

This table has the composite primary key (Employee ID, Department
ID). The non-key attribute is Office Location. In this case, Office
Location depends only on Department ID, which is only part of the
primary key. Therefore, this table does not satisfy the Second
Normal Form.
• To bring this table to Second Normal Form, we
need to break it into two tables:

Employee ID   Department ID
Emp 101       Tr1
Emp 102       Dev2
Emp 103       RAD1

Department ID   Office Location
Tr1             Hetauda
Dev2            Kathmandu
RAD1            Butwal

As you can see, we have removed the partial functional
dependency that we initially had. Now, in the second table, the
column Office Location is fully dependent on the
primary key of that table, which is Department ID.
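
The 2NF decomposition can be sketched in Python by projecting the original rows onto two smaller tables and then joining them back on Department ID. The variable names are assumptions; the rows come from the example.

```python
# Original table with the partial dependency Department ID -> Office Location.
original = [
    ("Emp 101", "Tr1", "Hetauda"),
    ("Emp 102", "Dev2", "Kathmandu"),
    ("Emp 103", "RAD1", "Butwal"),
]

# Projection 1: which employee belongs to which department.
employee_department = sorted({(emp, dept) for emp, dept, _ in original})

# Projection 2: each department's office location, stored once per department.
department_location = sorted({(dept, location) for _, dept, location in original})

# Joining the two tables on Department ID reproduces the original rows.
location_of = dict(department_location)
rejoined = sorted(
    (emp, dept, location_of[dept]) for emp, dept in employee_department
)
```

Because the join gives back exactly the original rows, no information is lost by the split, and each office location is now recorded only once per department.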
3rd Normal Form (3NF)
• The same rule applies as before, i.e. the table has to be in
2NF before proceeding to 3NF.
• The other condition is that there should be no transitive
dependency for non-prime attributes.
• That means non-prime attributes (attributes that are not part of any
candidate key) should not be dependent on other
non-prime attributes in a given table.
• A transitive dependency is a functional dependency in
which X → Z (X determines Z) indirectly, by virtue of X → Y
and Y → Z (where it is not the case that Y → X).
Example
Student ID   Name     Subject ID   Subject     Address
101          Ram      Cmp 101      Computer    Htd 1
102          Sita     ACC 102      Account     HTD 5
103          Rahul    Eco 103      Economics   HTd 4
104          Manish   Eng 104      English     Htd 10

• In the above table, Student ID determines Subject
ID, and Subject ID determines Subject.
• Therefore, Student ID determines Subject via
Subject ID.
• This implies that we have a transitive functional
dependency, and this structure does not satisfy the
third normal form.
Now, in order to achieve third normal form, we need to
divide the table as shown below:

Student ID   Name     Subject ID   Address
101          Ram      Cmp 101      Htd 1
102          Sita     ACC 102      HTD 5
103          Rahul    Eco 103      HTd 4
104          Manish   Eng 104      Htd 10

Subject ID   Subject
Cmp 101      Computer
ACC 102      Account
Eco 103      Economics
Eng 104      English

• As you can see from the above tables, all the non-key attributes
are now fully functionally dependent only on the primary key.
• In the first table, the columns Name, Subject ID and Address
are dependent only on Student ID.
• In the second table, Subject is dependent only on Subject ID.
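
The 3NF design can be sketched with Python's sqlite3 module. The table and column names, the column types and the renamed subject are assumptions for illustration; the rows come from the example. The sketch shows the practical benefit: a subject name is stored in one place, so one update is seen by every student row that refers to it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subject (
        subject_id TEXT PRIMARY KEY,
        subject    TEXT
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        subject_id TEXT REFERENCES subject(subject_id),
        address    TEXT
    );

    INSERT INTO subject VALUES ('Cmp 101', 'Computer'), ('ACC 102', 'Account'),
                               ('Eco 103', 'Economics'), ('Eng 104', 'English');
    INSERT INTO student VALUES (101, 'Ram', 'Cmp 101', 'Htd 1'),
                               (102, 'Sita', 'ACC 102', 'HTD 5'),
                               (103, 'Rahul', 'Eco 103', 'HTd 4'),
                               (104, 'Manish', 'Eng 104', 'Htd 10');
    """
)

# Renaming a subject touches exactly one row in the subject table.
conn.execute("UPDATE subject SET subject = 'Accounting' WHERE subject_id = 'ACC 102'")

# Joining on subject_id rebuilds the combined view with the new name.
joined = conn.execute(
    """
    SELECT st.student_id, st.name, su.subject
    FROM student AS st
    JOIN subject AS su ON su.subject_id = st.subject_id
    ORDER BY st.student_id
    """
).fetchall()
```

In the original single table the same rename would have to be repeated in every row for that subject, which is where update anomalies come from.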