Chapter 2
Class 12
Data vs. Information
• Data are raw facts or figures.
• Information is the result of processing raw data to
reveal meaning.
• Information requires context to reveal meaning
• Raw data must be formatted for storage,
processing, and presentation.
• Data are the foundation of information, which is
the bedrock of knowledge
Database Systems, 9th Edition 2
Data vs. Information (cont’d.)
• Data: building blocks of information
• Information is produced by processing data.
• Information is used to reveal meaning in data.
• Accurate, relevant, timely information is the key to
good decision making
• Good decision making is the key to organizational
survival
Function of a database in general
• Allow anyone to:
– store (add)
– delete (remove)
– organize
– use
– present data
Example: data in Access database
Database Terminology
• A database or a table within a database is made of
RECORDS
• RECORDS contain information about a single item
in the database/table.
• Each record contains FIELDS.
• A FIELD is a category of data that has been broken
down into its simplest form.
– Firstname, surname, street, suburb
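As a rough illustration of the terms above, a record can be modelled as one object and its fields as named attributes. The `Person` class and the sample values are hypothetical, chosen only to mirror the field names on this slide.

```python
from dataclasses import dataclass, fields


@dataclass
class Person:
    # Each attribute is one FIELD, broken down into its simplest form.
    firstname: str
    surname: str
    street: str
    suburb: str


# A table is a collection of RECORDS; each record describes a single item.
table = [
    Person("Ana", "Lee", "1 High St", "Newtown"),
    Person("Raj", "Singh", "22 Park Rd", "Glebe"),
]

field_names = [f.name for f in fields(Person)]
print(field_names)
print(len(table), "records")
```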
Terminology
ID   CDTitle                  UnitsInStock   UnitPrice
1    Joe Cocker Essentials    3              $28.00
2    The Beatles              4              $35.00
3    Aussie Country Hits      2              $19.95
Why do we need a database?
• Databases solve many of the problems
encountered in data management
– Used in almost all modern settings involving data
management:
• Business
• Research
• Administration
• Important to understand how databases work and
interact with other applications
What is a DBMS?
• A Database Management System (DBMS) is software
that stores and manages a database.
• It is a computerized record-keeping system.
• It allows users to:
• create databases
• insert, update and delete data
• sort and query data
• create forms and reports
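The operations listed above can be sketched with Python's built-in `sqlite3` module. SQLite is an assumption here, standing in for any DBMS; the `cd` table and its column names are hypothetical, with sample rows borrowed from the CD example on the Terminology slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Create a table.
conn.execute(
    "CREATE TABLE cd ("
    "id INTEGER PRIMARY KEY, title TEXT, units_in_stock INTEGER, unit_price REAL)"
)

# Insert data.
conn.executemany(
    "INSERT INTO cd VALUES (?, ?, ?, ?)",
    [
        (1, "Joe Cocker Essentials", 3, 28.00),
        (2, "The Beatles", 4, 35.00),
        (3, "Aussie Country Hits", 2, 19.95),
    ],
)

# Update and delete data.
conn.execute("UPDATE cd SET units_in_stock = units_in_stock - 1 WHERE id = 2")
conn.execute("DELETE FROM cd WHERE id = 1")

# Sort and query data.
titles_by_price = [
    row[0] for row in conn.execute("SELECT title FROM cd ORDER BY unit_price")
]
beatles_stock = conn.execute(
    "SELECT units_in_stock FROM cd WHERE id = 2"
).fetchone()[0]
print(titles_by_price, beatles_stock)
```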
• Database Applications:
– Banking: all transactions
– Airlines: reservations, schedules
– Universities: registration, grades
– Sales: customers, products, purchases
– Manufacturing: production, inventory, orders, supply
chain
– Human resources: employee records, salaries, tax
deductions
• Examples of DBMSs include Microsoft Access, MySQL,
PostgreSQL, SQL Server, Oracle, dBASE, Clipper and
FoxPro.
File system
A file is a collection of related records. It is the
traditional way of storing data electronically.
If there are 100 employees, then each employee would
have a record (e.g. called Employee Personal Details
record) and the collection of 100 such records would
constitute a file (in this case, called Employee Personal
Details file).
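A minimal sketch of the idea, assuming a CSV layout for the Employee Personal Details file. The file is held in memory with `io.StringIO` and the employee names are placeholders, so nothing here reflects a real data set.

```python
import csv
import io

# A "file" of employee records, kept in memory for the sake of the example.
employee_file = io.StringIO()
writer = csv.writer(employee_file)
for emp_id in range(1, 101):
    # One record per employee.
    writer.writerow([emp_id, f"Employee {emp_id}"])

# Reading the file back yields the collection of related records.
employee_file.seek(0)
records = list(csv.reader(employee_file))
print(len(records))
```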
Advantages of the DBMS
• Advantages of a DBMS:
– Sharing data
– Reduced data redundancy
– Data backup and recovery
– Inconsistency avoided
– Data integrity
– Data security
– Data independence
– Multiple user interfaces
– Processing of complex queries
Disadvantages of DBMS
• Expensive
• Changing technology
• Needs technical training
• Backup is needed
DBMS Models
• A DBMS model describes:
– the rules and standards for how a database organizes data
– how users view the organization of data
• Common models include:
– Hierarchical Database model
– Network Database model
– Relational Database model
– ER Database Model
Hierarchical Database Model
• It organizes data in a tree structure
• all access to data starts at the top of the hierarchy
and moves downward
– for example, from customer to orders, vendor to
purchases, etc
• There is a hierarchy of parent and child data
segments
• Supports one-to-many relationships
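A small sketch of top-down access in a tree of parent and child segments, using the customer-to-orders example from this slide. The nested-dictionary layout and the `items_for` helper are assumptions made for illustration, not how a hierarchical DBMS stores data.

```python
# Each parent segment owns a list of child segments (one-to-many).
tree = {
    "customer": "C1",
    "orders": [
        {"order": "O1", "items": ["pen", "ink"]},
        {"order": "O2", "items": ["paper"]},
    ],
}


def items_for(customer_tree, order_id):
    # Access starts at the root (customer) and moves down to the order.
    for order in customer_tree["orders"]:
        if order["order"] == order_id:
            return order["items"]
    return []


print(items_for(tree, "O2"))
```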
Advantages
• It is the easiest model.
• It has one or more attributes.
• The searching is fast and easy, if parent is known.
• It supports one-to-one or one-to-many relationship.
Disadvantages
• Old-fashioned and outdated database model.
• Does not support many-to-many relationship
• It increases redundancy.
Network Database Model
• The network model allows each record to have multiple
parent and child records, forming a generalized graph
structure
• Similar to the hierarchical model, but
– permit more than one parent per child
– thus permit the modeling of many-to-many relationships in data
• Very flexible
• Not widely used
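The graph structure can be sketched as a set of parent-to-child links in which a child record may have more than one parent. The student, course and enrolment records below are hypothetical; linking each enrolment to both a student and a course is one way to picture a many-to-many relationship.

```python
from collections import defaultdict

# Each link runs from a parent record to a child record.
links = [
    ("Student:Asha", "Enrol:1"),
    ("Course:DB", "Enrol:1"),
    ("Student:Asha", "Enrol:2"),
    ("Course:OS", "Enrol:2"),
]

parents = defaultdict(set)
children = defaultdict(set)
for parent, child in links:
    parents[child].add(parent)
    children[parent].add(child)

# Unlike a tree, a child record here has two parents.
print(sorted(parents["Enrol:1"]))
print(sorted(children["Student:Asha"]))
```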
Advantages
• It accepts many-to-many relationship, so it is more flexible.
• The searching is faster.
• It reduces redundancy because data needed in several places can be
shared rather than repeated.
Disadvantages
• It is difficult to handle.
• There is less security because of sharing data.
Relational Model
• Data is stored in two-dimensional tables (rows and columns)
• These tables are called relations.
• Widely used
• Example, Microsoft Access and MySQL
• Advantages
• The breaking of complex database table into simple database table
becomes possible.
• Database processing is faster than in other models.
• There is very little redundancy.
• The integrity rules can easily be implemented.
Disadvantages
• It is more complex than other models.
• There are too many rules because of complex relationships.
• It needs more powerful computers and data storage devices.
Entity-Relationship Database Model
• The entity-relationship database model (ER model)
is based on a perception of the real world as a set
of basic objects, called entities, together with the
attributes of those entities and the relationships
among them.
• It is an overall logical structure of a database that
can be represented graphically.
Data Hierarchy
Definition- systematic organization of data, often in a hierarchical
form
records
fields
Relational database
• stores data in tables
• tables are organized into columns, and each
column stores one type of data
• data for a single “instance” of a table is stored as a
row
What is RDBMS?
• RDBMS is a DBMS which manages a relational
database
– Example, Microsoft Access and MySQL
• Data is structured in tables, records and fields
• Each table consists of rows (records)
• Each table row consists of one or more fields
(columns)
• An RDBMS stores data in a collection of tables, which
may be related by common fields
Advantages of RDBMS
• Minimum data redundancy
• Data consistency - less likelihood of incorrect or
incomplete data being stored or used
• Integrated data – data is organized in logical relationship
thus making it easy to relate data items
• Data sharing – allows users from different department to
share data
• Data accessibility – allows users to access or retrieve in a
flexible manner
• Uniform security, privacy and integrity control – db
administrator can establish control for accessing, updating
and protecting the data
Categories of RDBMS
• Personal database
– Best in a single-user or small-workgroup environment (up to 10 users)
– Example: Microsoft Access
• Client/Server database
– Support multiple users in a network environment
– Runs on a server; clients request data from the server and
query, update and report locally
– Example: SQL Server
Master table
• Contains a primary key (which must be unique)
• Normally is a table that lists the properties of
things that have some permanence and used
many times in other tables
• Examples: customers, teachers, students and
subjects offered
Transaction table
• Records some kind of interaction or event between master
tables
• Transaction tables are typically used in posting operations or
as lookup tables
• Example,
– In Student Information System, the actual classes taken by
students are transactions because they record specific interactions
between students and teachers
– In an eCommerce software the shopping cart tables are all
transaction tables, they record the purchase of items by customers
Primary Key
• A field, or combination of fields, that uniquely
identifies each row in a table
• Used to match up records in different tables
• Usually indexed
• Help to define the relationships between tables
Primary Key
• Requirement: must be unique and cannot be
empty or null
• Functions:
• Used to associate data from multiple tables
• Prevent duplicate records
• Control the order of records
• Faster to locate records
• A table may use two or more fields together as its
primary key; this is called a composite key
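The uniqueness requirement and the composite key can be sketched with Python's `sqlite3` module. SQLite and the `enrolment` table are assumptions for illustration. The columns are declared `NOT NULL` explicitly because SQLite does not always reject NULLs in primary key columns by itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Composite primary key: the pair (student_id, subject_id) must be unique.
conn.execute(
    """
    CREATE TABLE enrolment (
        student_id INTEGER NOT NULL,
        subject_id INTEGER NOT NULL,
        grade TEXT,
        PRIMARY KEY (student_id, subject_id)
    )
    """
)
conn.execute("INSERT INTO enrolment VALUES (1, 10, 'A')")
conn.execute("INSERT INTO enrolment VALUES (1, 11, 'B')")  # same student, other subject

try:
    conn.execute("INSERT INTO enrolment VALUES (1, 10, 'C')")  # repeats an existing key
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_total = conn.execute("SELECT COUNT(*) FROM enrolment").fetchone()[0]
print(duplicate_rejected, row_total)
```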
Foreign Key
• A field with the same data and type as the primary key
of a related table, to which it is linked
• Example:
– In SalesTransaction table below, CustomerID would be the
Foreign Key field
– The Foreign Key is used to look up the CustomerID in the
Customer table where the CustomerID is the primary key
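A hedged sketch of the SalesTransaction and Customer example using Python's `sqlite3` module. SQLite is an assumption, and the `SaleID`, `Amount` and `Name` columns are hypothetical additions; only the table names and `CustomerID` come from the slide. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute("CREATE TABLE Customer (CustomerID INTEGER PRIMARY KEY, Name TEXT)")
conn.execute(
    """
    CREATE TABLE SalesTransaction (
        SaleID INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customer (CustomerID),
        Amount REAL
    )
    """
)
conn.execute("INSERT INTO Customer VALUES (1, 'Sita')")
conn.execute("INSERT INTO SalesTransaction VALUES (100, 1, 250.0)")

# Look up the customer for a sale through the foreign key.
customer_name = conn.execute(
    """
    SELECT c.Name
    FROM SalesTransaction AS s
    JOIN Customer AS c ON c.CustomerID = s.CustomerID
    WHERE s.SaleID = 100
    """
).fetchone()[0]

try:
    # CustomerID 99 does not exist in Customer, so the insert is refused.
    conn.execute("INSERT INTO SalesTransaction VALUES (101, 99, 10.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(customer_name, orphan_rejected)
```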
Abstraction
• Database systems are made up of complex data
structures. To ease user interaction with the
database, developers hide irrelevant internal
details from users. This process of hiding irrelevant
details from the user is called data abstraction.
Levels of Abstraction
• Physical level: describes how a record (e.g., customer) is stored.
• Logical level: describes data stored in database, and the
relationships among the data.
type customer = record
    name : string;
    street : string;
    city : integer;
end;
• View level: application programs hide details of data types.
Views can also hide information (e.g., salary) for security
purposes.
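One way to picture the view level is a SQL view that exposes only some columns of a table. The sketch below uses Python's `sqlite3` module; SQLite, the `employee` table and the `employee_public` view are all assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full table, including the sensitive salary column.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee VALUES (1, 'Ram', 55000)")

# View level: applications that query the view never see the salary.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")

cursor = conn.execute("SELECT * FROM employee_public")
visible_columns = [column[0] for column in cursor.description]
visible_rows = cursor.fetchall()
print(visible_columns, visible_rows)
```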
View of Data
An architecture for a database system
Instances and Schemas
• Schema – the logical structure of the database
– e.g., the database consists of information about a set of
customers and accounts and the relationship between
them
– Analogous to type information of a variable in a program
– Physical schema: database design at the physical level
– Logical schema: database design at the logical level
• Definition of instance: the data stored in a database at a
particular moment in time is called an instance of the
database. The database schema defines the variable
declarations in the tables that belong to a particular database;
the values of these variables at a given moment are
the instance of that database.
• For example, let's say we have a single table, student, in the
database. Today the table has 100 records, so today the
instance of the database has 100 records. If we add
another 100 records to this table by tomorrow,
the instance of the database tomorrow will have 200
records in the table. In short, the data stored in the database
at a particular moment is called the instance, and it changes
over time as we add or delete data.
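The student example above can be sketched with Python's `sqlite3` module: the schema stays fixed while the instance grows from 100 to 200 records. SQLite, the column names and the helper functions are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")


def table_schema(connection):
    # SQLite keeps each table's CREATE statement in sqlite_master.
    row = connection.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'student'"
    ).fetchone()
    return row[0]


def row_count(connection):
    return connection.execute("SELECT COUNT(*) FROM student").fetchone()[0]


def add_students(connection, how_many):
    connection.executemany(
        "INSERT INTO student (name) VALUES (?)",
        [(f"student {i}",) for i in range(how_many)],
    )


schema_today = table_schema(conn)
add_students(conn, 100)
count_today = row_count(conn)        # instance "today"

add_students(conn, 100)
count_tomorrow = row_count(conn)     # instance "tomorrow"
schema_tomorrow = table_schema(conn)

print(count_today, count_tomorrow, schema_today == schema_tomorrow)
```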
• Physical Data Independence – the ability to
modify the physical schema without changing the
logical schema
– Applications depend on the logical schema
– In general, the interfaces between the various levels
and components should be well defined so that
changes in some parts do not seriously influence
others.
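A small sketch of physical data independence, assuming SQLite through Python's `sqlite3` module. Adding an index changes how the data is physically accessed, yet the same query against the same logical schema returns the same rows. The table, index name and sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO student (name) VALUES (?)",
    [("Asha",), ("Bikash",), ("Chandra",)],
)

QUERY = "SELECT id, name FROM student WHERE name = 'Bikash'"
before = conn.execute(QUERY).fetchall()

# Physical change only: the tables and columns seen by applications are untouched.
conn.execute("CREATE INDEX idx_student_name ON student (name)")
after = conn.execute(QUERY).fetchall()

print(before == after)
```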
Data Models
• A collection of tools for describing
– data
– data relationships
– data semantics
– data constraints
• Entity-Relationship model
• Relational model
• Other models:
– object-oriented model
– semi-structured data models
– Older models: network model and hierarchical
model
Entity-Relationship Model
Example of schema in the entity-relationship model
Entity Relationship Model
• E-R model of real world
– Entities (objects)
• E.g. customers, accounts, bank branch
– Relationships between entities
• E.g. Account A-101 is held by customer Johnson
• Relationship set depositor associates customers with accounts
• Widely used for database design
– Database design in E-R model usually converted to design in
the relational model which is used for storage and
processing
Relational Model Attributes
A functional dependency A → B holds on a relation R when, in every possible legal value of R, whenever two tuples agree
on their A values, they also agree on their B values.
Determinant of a functional dependency refers to attribute or group of attributes on left-
hand side of the arrow.
e.g. in an "Employee" table that includes the attributes "Employee ID" and "Employee
Date of Birth", the functional dependency {Employee ID} → {Employee Date of Birth}
would hold.
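The definition can be checked mechanically against one set of rows. The `fd_holds` helper and the sample employees below are hypothetical. A check like this can only show that a dependency is violated or not violated in the given rows; it cannot prove that the dependency holds for every legal value of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        a_values = tuple(row[column] for column in lhs)
        b_values = tuple(row[column] for column in rhs)
        # Remember the first B values seen for these A values, then compare.
        if seen.setdefault(a_values, b_values) != b_values:
            return False
    return True


employees = [
    {"Employee ID": 101, "Employee Date of Birth": "1990-01-01"},
    {"Employee ID": 102, "Employee Date of Birth": "1990-01-01"},
    {"Employee ID": 103, "Employee Date of Birth": "1985-05-05"},
]

id_determines_dob = fd_holds(employees, ["Employee ID"], ["Employee Date of Birth"])
dob_determines_id = fd_holds(employees, ["Employee Date of Birth"], ["Employee ID"])
print(id_determines_dob, dob_determines_id)
```

Here {Employee ID} is the determinant of the first dependency. The second check fails because two employees share a date of birth.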
Background to Normalization: Definitions
SuperKey: A superkey is a set of columns within a table whose values can be used to uniquely identify a
row.
e.g. Imagine a table with the fields <Name>, <Age>, <SSN> and <Phone Extension>. This table has many
possible superkeys. Three of these are <SSN>, <Phone Extension, Name> and <SSN, Name>. Of those
listed, <SSN> is a candidate key because it is minimal. <SSN, Name> is not, as Name is not necessary to uniquely
identify records. <Phone Extension, Name> is a candidate key only if neither field identifies a record by itself.
A candidate key is a minimal superkey: it uniquely identifies a record, and no column can be removed from it
without losing that property. I.e., it may be used to retrieve one specific record.
The primary key of a relation is a candidate key that has been designated as the main key.
A foreign key is an attribute (or collection of attributes) in a relation that can be used as a key to another
relation. Foreign keys link tables together to form an integrated database.
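The difference between a superkey and a candidate key can be sketched by brute force over a small sample. The helper functions and rows below are hypothetical, and keys found this way describe only the sample rows; real keys are a design decision about every legal row.

```python
from itertools import combinations


def is_superkey(rows, columns):
    """True if the chosen columns take a distinct combination of values in every row."""
    projected = [tuple(row[column] for column in columns) for row in rows]
    return len(set(projected)) == len(projected)


def candidate_keys(rows, columns):
    """Return the minimal superkeys for the sample rows, smallest first."""
    keys = []
    for size in range(1, len(columns) + 1):
        for subset in combinations(columns, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(key) <= set(subset) for key in keys):
                continue
            if is_superkey(rows, subset):
                keys.append(subset)
    return keys


people = [
    {"Name": "Ann", "Age": 30, "SSN": "111", "Phone Extension": "10"},
    {"Name": "Ann", "Age": 41, "SSN": "222", "Phone Extension": "11"},
    {"Name": "Bob", "Age": 30, "SSN": "333", "Phone Extension": "10"},
]

keys = candidate_keys(people, ["Name", "Age", "SSN", "Phone Extension"])
print(keys)
```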
The Process of Normalization
There are two main steps of the normalization process:
eliminate redundant data (for example, storing the same
data in more than one table) and ensure data
dependencies make sense (only storing related data in a
table). Both of these are worthy goals as they reduce the
amount of space a database consumes and ensure that
data is logically stored.
• Formal technique for analysing a relation based on its
primary key and functional dependencies between its
attributes.
• Often executed as a series of steps. Each step
corresponds to a specific normal form, which has known
properties.
• As normalization proceeds, relations become
progressively more restricted (stronger) in format and
also less vulnerable to update anomalies.
First Normal Form (1NF)
No Repeating Elements or Groups of Elements
A relation in which intersection of each row and column contains one
and only one value.
– All key attributes get defined
– No repeating groups in table
– All attributes dependent on primary key
UNF to 1NF:
• Eliminate duplicative columns from the same table (in other words,
remove repeating groups so that each row-and-column intersection
holds a single value).
• Create separate tables for each group of related data and identify
each row with a unique column or set of columns (the primary key).
• Create relationships between these new tables and their
predecessors through the use of foreign keys.
Second Normal Form (2NF)
No Partial Dependencies on a Concatenated Key
A relation that is in 1NF and every non-primary-key attribute
is fully functionally dependent on the primary key (no
partial dependency).
1NF to 2NF:
• Identify primary key for the 1NF relation.
• Identify functional dependencies in the relation.
• If partial dependencies exist on the primary key remove
them by placing them in a new relation along with copy of
their determinant (in other words, remove columns that
are not fully dependent upon the primary key).
• Create relationships between these new tables and their
predecessors through the use of foreign keys.
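The 1NF to 2NF steps above can be sketched on a small in-memory relation. The `enrolment` rows and column names are hypothetical. The composite key is assumed to be (student_id, subject_id), with subject_name depending on subject_id alone.

```python
# 1NF relation with composite key (student_id, subject_id).
# subject_name depends only on subject_id: a partial dependency.
enrolment = [
    {"student_id": 1, "subject_id": "DB", "subject_name": "Databases", "grade": "A"},
    {"student_id": 2, "subject_id": "DB", "subject_name": "Databases", "grade": "B"},
    {"student_id": 1, "subject_id": "OS", "subject_name": "Operating Systems", "grade": "B"},
]

# New relation: the partially dependent column plus a copy of its determinant.
subject = {row["subject_id"]: row["subject_name"] for row in enrolment}

# Remaining relation: only columns that depend on the whole key.
# subject_id stays behind as a foreign key into `subject`.
grades = [
    {column: row[column] for column in ("student_id", "subject_id", "grade")}
    for row in enrolment
]

print(subject)
print(grades)
```

After the split, each subject name is stored once instead of once per enrolment.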
Third Normal Form (3NF)
No Dependencies on Non-Key Attributes
A relation that is in 1NF and 2NF and in which no non-
primary-key attribute is transitively dependent on the
primary key.
2NF to 3NF
• Identify the primary key in the 2NF relation.
• Identify functional dependencies in the relation.
• If transitive dependencies exist on the primary key
remove them by placing them in a new relation along
with copy of their determinant.
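The 2NF to 3NF steps can be sketched the same way. The rows below are illustrative, and the dependency department_id → office_location is an assumption made for this example; if it holds, office_location depends on the key employee_id only transitively.

```python
# 2NF relation keyed by employee_id.
# Assumed dependencies: employee_id -> department_id -> office_location.
employee = [
    {"employee_id": 101, "department_id": "Tr1", "office_location": "Hetauda"},
    {"employee_id": 102, "department_id": "Dev2", "office_location": "Kathmandu"},
    {"employee_id": 103, "department_id": "RAD1", "office_location": "Butwal"},
    {"employee_id": 104, "department_id": "Tr1", "office_location": "Hetauda"},  # hypothetical row
]

# New relation: the transitively dependent column with a copy of its determinant.
department = {row["department_id"]: row["office_location"] for row in employee}

# Remaining relation: department_id is kept as a foreign key into `department`.
employee_3nf = [
    {"employee_id": row["employee_id"], "department_id": row["department_id"]}
    for row in employee
]

print(department)
print(employee_3nf)
```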
Boyce-Codd normal form (BCNF)
A relation is in Boyce-Codd normal form (BCNF) if every
determinant is a candidate key.
1st Normal Form (1NF)
Example
Employee ID   Name           Ph. Number                Salary
101           Ram Shrestha   9845012222, 057521000     55000
102           Hira Poudel    9812121212, 0575210020    45000
103           Rajiv Yadav    985100000, 057523000      35000
In the above table, the Ph. Number column holds two values in
each row, which violates 1NF. Applying 1NF to the table gives
the table below.
Employee ID   Name           Ph. Number    Salary
101           Ram Shrestha   9845012222    55000
101           Ram Shrestha   057521000     55000
102           Hira Poudel    9812121212    45000
102           Hira Poudel    0575210020    45000
103           Rajiv Yadav    985100000     35000
103           Rajiv Yadav    057523000     35000
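The conversion shown in the two tables can be sketched in a few lines: each multi-valued Ph. Number cell is split so that every row holds exactly one phone number. The tuple layout and variable names are assumptions made for illustration.

```python
# Unnormalized rows: the phone column holds two comma-separated values.
unnormalized = [
    (101, "Ram Shrestha", "9845012222, 057521000", 55000),
    (102, "Hira Poudel", "9812121212, 0575210020", 45000),
    (103, "Rajiv Yadav", "985100000, 057523000", 35000),
]

# 1NF: one row per (employee, phone number) pair.
first_nf = [
    (emp_id, name, phone.strip(), salary)
    for emp_id, name, phones, salary in unnormalized
    for phone in phones.split(",")
]

for row in first_nf:
    print(row)
```

Employee ID alone no longer identifies a row in the result; the pair (Employee ID, Ph. Number) does.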
2nd Normal Form (2NF)
• The first condition in the 2nd NF is that the table
has to be in 1st NF. The table also should not
contain partial dependency.
Example
Employee ID   Department ID   Office Location
Emp 101       Tr1             Hetauda
Emp 102       Dev2            Kathmandu
Emp 103       RAD1            Butwal
• As you can see from the above tables, all the non-key attributes
are now fully functionally dependent on the primary key.
• In the first table, columns Student Name, Subject ID and Address
are only dependent on Student ID.
• In the second table, Subject is only dependent on Subject ID.