BSC 203 Database Management System

NOMINAL HOURS: 120
COURSE AIM: To equip students with knowledge of database systems and the
management of databases
LEARNING OUTCOMES: By the end of this course students should be able to:
1. Understand the main issues related to database systems in general
2. Describe database analysis and design techniques
3. Explain design theory for relational databases
4. Apply the principles of Structured Query Language (SQL)
5. Explain transaction management concepts
COURSE CONTENT
UNIT 1: UNDERSTANDING THE MAIN ISSUES RELATED TO DATABASE
SYSTEM IN GENERAL
1.1 Definition of Database
1.2 Conventional file based system and database approach
1.3 The traditional File based Approach
1.4 The role of DBMS
1.5 Advantages and Disadvantages of DBMS
1.6 Components of DBMS
1.6.1 Functions of DBMS
1.6.2 Physical and logical structures
1.6.3 Three Level Architecture
1.6.4 Logical and Physical Data Independence
Data Models
Designing a database properly is fundamental to establishing a database that meets the
needs of the users. Data models capture the nature of and relationships among data and
are used at different levels of abstraction as a database is conceptualized and designed.
A database management system (DBMS) is a software system that enables the use of a
database approach. The purpose of a DBMS is to provide a systematic method of
creating, updating, storing, and retrieving the data stored in a database.
Components of the Database Environment
1. Data modeling and design tools: Data modeling and design tools are automated
tools used to design databases and application programs. These tools help with
the creation of data models and in some cases can also help automatically generate
the “code” needed to create the database. We reference the use of automated tools
for database design and development throughout the text.
2. Repository: A repository is a centralized knowledge base for all data definitions,
data relationships, screen and report formats, and other system components.
A repository contains an extended set of metadata important for managing
databases as well as other components of an information system.
3. DBMS: A DBMS is a software system that is used to create, maintain, and provide
controlled access to user databases.
4. Database: A database is an organized collection of logically related data, usually
designed to meet the information needs of multiple users in an organization. It
is important to distinguish between the database and the repository. The repository
contains definitions of data, whereas the database contains occurrences of
data.
5. Application programs: Computer-based application programs are used to create
and maintain the database and provide information to users.
6. User interface: The user interface includes languages, menus, and other facilities
by which users interact with various system components, such as data modeling
and design tools, application programs, the DBMS, and the repository. User
interfaces are illustrated throughout this text.
7. Data and database administrators: Data administrators are persons who are
responsible for the overall management of data resources in an organization.
Database administrators are responsible for physical database design and for
managing technical issues in the database environment.
8. System developers: System developers are persons such as systems analysts and
programmers who design new application programs.
9. End users: End users are persons throughout the organization who add, delete,
and modify data in the database and who request or receive information from it.
All user interactions with the database must be routed through the DBMS.
1.5 Advantages and Disadvantages of DBMS
Advantages of Database Management System (DBMS)
1. Improved data sharing
The DBMS helps to create an environment in which end users have better access to
more and better-managed data. Such access makes it possible for end users to
respond quickly to changes in their environment.
2. Improved data security
The more users access the data, the greater the risks of data security breaches.
Corporations invest considerable amounts of time, effort, and money to ensure that
corporate data are used properly. A DBMS provides a framework for better
enforcement of data privacy and security policies.
3. Better data integration
Wider access to well-managed data promotes an integrated view of the
organization’s operations and a clearer view of the big picture. It becomes much
easier to see how actions in one segment of the company affect other segments.
4. Minimized data inconsistency
Data inconsistency exists when different versions of the same data appear in
different places. For example, data inconsistency exists when a company’s sales
department stores a sales representative’s name as “Bill Brown” and the company’s
personnel department stores that same person’s name as “William G. Brown,” or
when the company’s regional sales office shows the price of a product as $45.95
and its national sales office shows the same product’s price as $43.95. The
probability of data inconsistency is greatly reduced in a properly designed
database.
5. Improved data access
The DBMS makes it possible to produce quick answers to ad hoc queries. From a
database perspective, a query is a specific request issued to the DBMS for data
manipulation—for example, to read or update the data. Simply put, a query is a
question, and an ad hoc query is a spur-of-the-moment question. The DBMS sends
back an answer (called the query result set) to the application. For example, end
users, when dealing with large amounts of sales data, might want quick answers to
questions (ad hoc queries) such as:
- What was the dollar volume of sales by product during the past six months?
- What is the sales bonus figure for each of our salespeople during the past three
months?
- How many of our customers have credit balances of 3,000 or more?
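The ad hoc questions above can be sketched as queries sent to a DBMS. The example below is only an illustration: it assumes Python's built-in sqlite3 module as a stand-in DBMS, and the customer and sale tables, their columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical schema and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, credit_balance REAL);
    CREATE TABLE sale (id INTEGER PRIMARY KEY, product TEXT, amount REAL);
    INSERT INTO customer (name, credit_balance) VALUES
        ('Ama', 3500.0), ('Kofi', 1200.0), ('Esi', 3000.0);
    INSERT INTO sale (product, amount) VALUES
        ('Widget', 45.95), ('Widget', 45.95), ('Gadget', 120.00);
""")

# Ad hoc query: how many customers have credit balances of 3,000 or more?
high_credit_count = conn.execute(
    "SELECT COUNT(*) FROM customer WHERE credit_balance >= 3000"
).fetchone()[0]

# Ad hoc query: what was the volume of sales by product?
# The list of rows returned by fetchall() is the query result set.
sales_by_product = conn.execute(
    "SELECT product, SUM(amount) FROM sale GROUP BY product ORDER BY product"
).fetchall()

print(high_credit_count)
print(sales_by_product)
```

Each question is answered by a single request to the DBMS; no program has to be written in advance for it.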
6. Improved decision making
Better-managed data and improved data access make it possible to generate better-
quality information, on which better decisions are based. The quality of the
information generated depends on the quality of the underlying data. Data quality
is a comprehensive approach to promoting the accuracy, validity, and timeliness of
the data. While the DBMS does not guarantee data quality, it provides a framework
to facilitate data quality initiatives.
7. Increased end-user productivity
The availability of data, combined with the tools that transform data into usable
information, empowers end users to make quick, informed decisions that can make
the difference between success and failure in the global economy.
So far we have looked at the benefits of database management systems. A DBMS also
has certain limitations, or disadvantages, which are described next.
Disadvantages of Database Management System (DBMS):
Although the database system yields considerable advantages over previous data
management approaches, database systems do carry significant disadvantages. For
example:
1. Increased costs
Database systems require sophisticated hardware and software and highly skilled
personnel. The cost of maintaining the
hardware, software, and personnel required to operate and manage a database
system can be substantial. Training, licensing, and regulation compliance costs are
often overlooked when database systems are implemented.
2. Management complexity
Database systems interface with many different technologies and have a significant
impact on a company’s resources and culture. The changes introduced by the
adoption of a database system must be properly managed to ensure that they help
advance the company’s objectives. Given the fact that database systems hold
crucial company data that are accessed from multiple sources, security issues must
be assessed constantly.
3. Maintaining currency
To maximize the efficiency of the database system, you must keep your system
current. Therefore, you must perform frequent updates and apply the latest
patches and security measures to all components. Because database technology
advances rapidly, personnel training costs tend to be significant.
4. Vendor dependence
Given the heavy investment in technology and personnel training, companies might
be reluctant to change database vendors. As a consequence, vendors are less likely
to offer pricing point advantages to existing customers, and those customers might
be limited in their choice of database system components.
5. Frequent upgrade/replacement cycles
DBMS vendors frequently upgrade their products by adding new functionality.
Such new features often come bundled in new upgrade versions of the software.
Some of these versions require hardware upgrades. Not only do the upgrades
themselves cost money, but it also costs money to train database users and
administrators to properly use and manage the new features.
1.6 Components of DBMS
A DBMS has several components, each performing a significant task in the
database management system environment. Below is a list of components within
the database and its environment.
Software
This is the set of programs used to control and manage the overall database. This
includes the DBMS software itself, the Operating System, the network software
being used to share the data among users, and the application programs used to
access data in the DBMS.
Hardware
This consists of a set of physical electronic devices such as computers, I/O devices,
and storage devices. The hardware provides the interface between computers and
real-world systems.
Data
A DBMS exists to collect, store, process, and access data, which makes data its most
important component. The database contains both the actual or operational data and
the metadata.
Procedures
These are the instructions and rules for using the DBMS and for designing and
running the database. Documented procedures guide the users who operate and
manage it.
Database Access Language
This is used to write data to and read data from the database: to enter new data,
update existing data, or retrieve required data. The user writes a set of
appropriate commands in a database access language and submits these to the DBMS,
which then processes the data and generates and displays a set of results in a
user-readable form.
Query Processor
This transforms user queries into a series of low-level instructions. It reads
the online user’s query and translates it into an efficient series of operations in a
form capable of being sent to the run time database manager for execution.
Run Time Database Manager
Sometimes referred to as the database control system, this is the central software
component of the DBMS that interfaces with user-submitted application programs
and queries, and handles database access at run time. Its function is to convert
the operations in users’ queries into operations on the stored data. It provides
control to maintain the consistency, integrity, and security of the data.
Data Manager
Also called the cache manager, this is responsible for handling data in the
database and for providing a recovery mechanism that allows the system to recover
the data after a failure.
Database Engine
The core service for storing, processing, and securing data, this provides controlled
access and rapid transaction processing to address the requirements of the most
demanding data consuming applications. It is often used to create relational
databases for online transaction processing or online analytical processing data.
Data Dictionary
This is a reserved space within a database used to store information about the
database itself. A data dictionary is a set of read-only tables and views containing
information about the data used in the enterprise, which ensures that the
database representation of the data follows one standard as defined in the
dictionary.
Report Writer
Also referred to as the report generator, it is a program that extracts information
from one or more files and presents the information in a specified format. Most
report writers allow the user to select records that meet certain conditions and to
display selected fields in rows and columns, or to format the data into different
charts.
Great Performance through Effective DBMS
A company's performance is greatly affected by how it manages its data, and one of
the most basic tasks of data management is the effective management of its
database. Understanding the different components of the DBMS and how they work
and relate to each other is the first step to employing an effective DBMS.
1.6.1 Functions of DBMS
1. Data Dictionary Management
Data dictionary management is one of the most important functions of a database
management system. The DBMS stores definitions of the data elements and their
relationships (metadata) in a data dictionary. Because all programs that access the
data in the database work through the DBMS, the DBMS uses the data dictionary to
look up the required data component structures and relationships, which relieves you
from coding such complex relationships in each program.
Additionally, any changes made in a database structure are automatically recorded
in the data dictionary, thereby freeing you from having to modify all of the
programs that access the changed structure. In other words, the DBMS provides data
abstraction, and it removes structural and data dependence from the system.
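As a minimal sketch of a data dictionary in practice, the example below assumes Python's built-in sqlite3 module, where SQLite's sqlite_master catalog and the PRAGMA table_info command play the role of the dictionary. The employee table and the column_names helper are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

# SQLite keeps its metadata in the built-in sqlite_master catalog table.
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

def column_names(connection, table):
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    return [row[1] for row in connection.execute(f"PRAGMA table_info({table})")]

before = column_names(conn, "employee")

# A structural change is recorded in the dictionary automatically.
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")
after = column_names(conn, "employee")

print(tables, before, after)
```

The program never stores the table layout itself; it asks the DBMS, so the new column shows up without any change to column_names.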

2. Data Storage Management
The DBMS creates and manages the complex structures required for data storage,
thus relieving you from the difficult task of defining and programming the physical
data characteristics.
A modern DBMS provides storage not only for the data, but also for related
data entry forms or screen definitions, report definitions, data validation rules,
procedural code, structures to handle video and picture formats, and so on.
Data storage management is also important for database performance tuning.
Performance tuning relates to the activities that make the database perform more
efficiently in terms of storage and access speed.
3. Data Transformation and Presentation
The DBMS transforms entered data into the required data structures. The DBMS
relieves you of the chore of making a distinction between the logical data format
and the physical data format. That is, the DBMS formats the physically retrieved
data to make it conform to the user’s logical expectations.
For example, imagine an enterprise database used by a multinational company. An
end user in England would expect to enter data such as July 11, 2009, as
“11/07/2009.” In contrast, the same date would be entered in the United States as
“07/11/2009.” Regardless of the data presentation format, the DBMS must
manage the date in the proper format for each country.
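The date example can be sketched in a few lines. This is only an illustration of the idea of one stored value with several presentation formats; the DISPLAY_FORMATS table and the two helper functions are hypothetical, and a real DBMS relies on its own locale settings.

```python
from datetime import date, datetime

# Hypothetical locale-to-format table, for illustration only.
DISPLAY_FORMATS = {"UK": "%d/%m/%Y", "US": "%m/%d/%Y"}

def parse_entry(text: str, locale: str) -> date:
    # Convert what the user typed into one internal representation.
    return datetime.strptime(text, DISPLAY_FORMATS[locale]).date()

def present(stored: date, locale: str) -> str:
    # Format the stored value the way the user's country expects it.
    return stored.strftime(DISPLAY_FORMATS[locale])

uk_entry = parse_entry("11/07/2009", "UK")
us_entry = parse_entry("07/11/2009", "US")

print(uk_entry == us_entry)        # both entries denote July 11, 2009
print(present(uk_entry, "US"))
```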
4. Security Management
Security management is another important function of a DBMS. The DBMS creates a
security system that enforces user security and data privacy. Security rules
determine which users can access the database, which data items each user can
access, and which data operations (read, add, delete, or modify) the user can
perform. This is especially important in multiuser database systems.

5. Multi User Access Control
To provide data integrity and data consistency, the DBMS uses sophisticated
algorithms to ensure that multiple users can access the database concurrently
without compromising the integrity of the database.
6. Backup and Recovery Management
The DBMS provides backup and data recovery to ensure data safety and integrity.
Current DBMS systems provide special utilities that allow the DBA to perform
routine and special backup and restore procedures. Recovery management deals
with the recovery of the database after a failure, such as a bad sector in the disk or a
power failure. Such capability is critical to preserving the database’s integrity.
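A minimal sketch of both ideas, assuming Python's built-in sqlite3 module: Connection.backup copies a whole database, and a rollback undoes an uncommitted change. The account table is hypothetical, and the failure is simulated with an exception rather than a real disk or power fault.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance REAL)")
source.execute("INSERT INTO account (balance) VALUES (100.0)")
source.commit()

# Backup: copy the whole database into a second connection.
backup_copy = sqlite3.connect(":memory:")
source.backup(backup_copy)

# Recovery: a failure before commit is undone by rolling back the transaction.
try:
    source.execute("UPDATE account SET balance = balance - 500 WHERE id = 1")
    raise RuntimeError("simulated failure before commit")
except RuntimeError:
    source.rollback()

query = "SELECT balance FROM account WHERE id = 1"
balance_after_rollback = source.execute(query).fetchone()[0]
balance_in_backup = backup_copy.execute(query).fetchone()[0]
print(balance_after_rollback, balance_in_backup)
```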
7. Data Integrity Management
Data integrity management is another important function of DBMS.
The DBMS promotes and enforces integrity rules, thus minimizing data
redundancy and maximizing data consistency.
The data relationships stored in the data dictionary are used to enforce data
integrity. Ensuring data integrity is especially important in transaction-oriented
database systems.
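Integrity rules can be sketched as constraints that the DBMS checks on every change. The example assumes Python's built-in sqlite3 module; the department and employee tables and the try_insert helper are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE employee (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        salary REAL CHECK (salary >= 0),
        department_id INTEGER NOT NULL REFERENCES department(id)
    );
    INSERT INTO department (id, name) VALUES (1, 'Sales');
""")

INSERT_SQL = "INSERT INTO employee (name, salary, department_id) VALUES (?, ?, ?)"

def try_insert(params):
    # The DBMS, not the application, decides whether the row is acceptable.
    try:
        conn.execute(INSERT_SQL, params)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert(("Bill Brown", 4000.0, 1)),   # valid row
    try_insert(("No Dept", 4000.0, 99)),     # department 99 does not exist
    try_insert(("Negative", -1.0, 1)),       # violates the CHECK rule
]
print(results)
```

Because the rules live in the database definition, every program that writes to the table is subject to them.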
8. Database Access Languages and Application Programming Interfaces
The DBMS provides data access through a query language. A query language is a
nonprocedural language, one that lets the user specify what must be done without
having to specify how it is to be done.
Structured Query Language (SQL) is the de facto query language and data access
standard supported by the majority of DBMS vendors.
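The difference between stating what is wanted and spelling out how to get it can be sketched as follows. The example assumes Python's built-in sqlite3 module, and the product table and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (name TEXT, price REAL)")
conn.executemany("INSERT INTO product VALUES (?, ?)",
                 [("pen", 1.5), ("desk", 120.0), ("lamp", 43.95)])

# Nonprocedural: state WHAT is wanted; the DBMS decides how to find it.
declarative = [row[0] for row in conn.execute(
    "SELECT name FROM product WHERE price > 40 ORDER BY name")]

# Procedural equivalent: the program spells out HOW, step by step.
procedural = []
for name, price in conn.execute("SELECT name, price FROM product"):
    if price > 40:
        procedural.append(name)
procedural.sort()

print(declarative, procedural)
```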

9. Database Communication Interfaces
Current-generation DBMSs accept end-user requests via multiple, different
network environments. For example, the DBMS might provide access to the
database via the Internet through the use of Web browsers such as Mozilla Firefox
or Microsoft Internet Explorer. In this environment, communications can be
accomplished in several ways:
- End users can generate answers to queries by filling in screen forms through their
preferred Web browser.
- The DBMS can automatically publish predefined reports on a Website.
- The DBMS can connect to third-party systems to distribute information via e-mail
or other productivity applications.
1.6.2 Physical and logical structures
Physical and Logical Databases
• C/SIDE: Client/Server Integrated Development Environment
• C/AL: Client/Server Application Language
Description
C/AL is the programming language used within the development environment for
Microsoft Dynamics NAV, and the development environment is called C/SIDE. C/AL
is a database-specific programming language, and it is primarily used to retrieve,
insert, and modify the records in the Dynamics NAV database.
In this section, you will learn how the information in your application is structured.
When you use a database, you are not usually concerned with where each piece of data is
stored, or what size it is. You just want to be sure that when you refer to a name, for
example, the correct value is returned. This is why the C/SIDE database system provides
a conceptual representation of data that does not include many details about how the
data is stored. An abstract data model is used for this conceptual representation. This
data model uses logical concepts (such as objects, their properties, and their
relationships), which are much easier to understand.
This section distinguishes between the logical and the physical database. For this topic,
the logical database is the structure of the data and the relationships between different
pieces of information. There is no information about how these structures and relations
are implemented. For the physical database, this topic describes how the structures in
the logical database and the search paths between them are implemented. The term
database means the logical database, unless indicated otherwise.
What the user sees as a coherent set of information in the C/SIDE database system can
be stored in several physical disk files, but this is transparent to the user. The following
illustration shows how one logical database can be physically stored on three hard disks
but still comprise a single (logical) database.
[Illustration: the logical database]
The Logical Structures in Your Database

Access to the data is made possible by a well-defined logical organization composed of
the following logical structures.
[Illustration: logical structures]
Logical structure: Description
Fields: Fields are the smallest logical structure in a C/SIDE database. A field holds a
single piece of information, such as a name or an amount. A field can hold one specific
type of information. (The C/SIDE database system distinguishes between 17 different
types of information.) Fields are assembled into a structure called a record. On its
own, a field is not very useful, as it can hold only a limited amount of information.
Assembling these small bits of information into records produces a much more flexible
"information-holder", which also groups fields that belong together.
Records: A record is a logical structure assembled from an arbitrary number of fields.
A record stores a single entry in the database. The fields in a record store information
about important properties of the entry. Records are organized in tables.
Tables: A table can be thought of as an N times M matrix. Each of the N rows describes
a record and each of the M columns describes a field in the record. Tables are
organized in companies.
Companies: A company is the largest logical structure in a C/SIDE database. A company
is a sub-database; its primary use is to separate and group large portions of data
together. A company can contain private tables as well as tables that are shared with
other companies.
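The nesting of fields, records, tables, and companies can be sketched with plain Python data classes. This is an illustration of the logical structure only, not of how C/SIDE is implemented; the class names, the sample company, and the sample rows are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Table:
    name: str
    field_names: list[str]                              # the M columns
    records: list[tuple] = field(default_factory=list)  # the N rows

    def insert(self, *values):
        # A record holds exactly one value per field of the table.
        if len(values) != len(self.field_names):
            raise ValueError("a record needs one value per field")
        self.records.append(values)

    def shape(self):
        # (N, M): number of records by number of fields.
        return (len(self.records), len(self.field_names))

@dataclass
class Company:
    name: str
    tables: dict[str, Table] = field(default_factory=dict)

customer = Table("Customer", ["No.", "Name", "Balance"])
customer.insert(10000, "Bill Brown", 45.95)
customer.insert(20000, "Ama Mensah", 43.95)

company = Company("Sample Company", {"Customer": customer})
print(company.tables["Customer"].shape())
```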
Introduction to Physical Storage Structures

One characteristic of an RDBMS is the independence of logical data structures such as
tables, views, and indexes from physical storage structures. Because physical and
logical structures are separate, you can manage physical storage of data without
affecting access to logical structures. For example, renaming a database file does not
rename the tables stored in it.
An Oracle database is a set of files that store Oracle data in persistent disk storage.
This section discusses the database files generated when you issue a CREATE
DATABASE statement:
• Data files and temp files: A data file is a physical file on disk that was created by
Oracle Database and contains data structures such as tables and indexes. A temp
file is a data file that belongs to a temporary tablespace. The data is written to
these files in an Oracle proprietary format that cannot be read by other programs.
• Control files: A control file is a root file that tracks the physical components of the
database.
• Online redo log files: The online redo log is a set of files containing records of
changes made to data.
A database instance is a set of memory structures that manage database files. Figure
11-1 shows the relationship between the instance and the files that it manages.
Figure 11-1 Database Instance and Database Files
Introduction to Logical Storage Structures

Oracle Database allocates logical space for all data in the database. The logical units of
database space allocation are data blocks, extents, segments, and tablespaces. At a
physical level, the data is stored in data files on disk (see Chapter 11, "Physical Storage
Structures"). The data in the data files is stored in operating system blocks.
Figure 12-1 is an entity-relationship diagram for physical and logical storage. The crow's
foot notation represents a one-to-many relationship.
Figure 12-1 Logical and Physical Storage
Logical Storage Hierarchy

Figure 12-2 shows the relationships among data blocks, extents, and segments within a
tablespace. In this example, a segment has two extents stored in different data files.
Figure 12-2 Segments, Extents, and Data Blocks Within a Tablespace
At the finest level of granularity, Oracle Database stores data in data blocks. One logical
data block corresponds to a specific number of bytes of physical disk space, for
example, 2 KB. Data blocks are the smallest units of storage that Oracle Database can
use or allocate.
An extent is a set of logically contiguous data blocks allocated for storing a specific type
of information. In Figure 12-2, the 24 KB extent has 12 data blocks, while the 72 KB
extent has 36 data blocks.
A segment is a set of extents allocated for a specific database object, such as a table.
For example, the data for the employees table is stored in its own data segment,
whereas each index for employees is stored in its own index segment. Every database
object that consumes storage consists of a single segment.
Each segment belongs to one and only one tablespace. Thus, all extents for a
segment are stored in the same tablespace. Within a tablespace, a segment can
include extents from multiple data files, as shown in Figure 12-2. For example, one
extent for a segment may be stored in users01.dbf, while another is stored in
users02.dbf. A single extent can never span data files.
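The hierarchy described above, and the block arithmetic from the Figure 12-2 example, can be sketched as a small model. This is not Oracle code; the class names are hypothetical, and the 2 KB block size is the example value used in the text.

```python
from dataclasses import dataclass

BLOCK_SIZE_KB = 2  # example block size from the text

@dataclass(frozen=True)
class Extent:
    data_file: str   # an extent lives in exactly one data file
    size_kb: int

    @property
    def block_count(self) -> int:
        # Number of data blocks = extent size / block size.
        return self.size_kb // BLOCK_SIZE_KB

@dataclass
class Segment:
    name: str
    tablespace: str  # a segment belongs to exactly one tablespace
    extents: list

    def data_files(self):
        # A segment may draw extents from several data files.
        return sorted({extent.data_file for extent in self.extents})

    def total_blocks(self):
        return sum(extent.block_count for extent in self.extents)

employees = Segment("employees", "users", [
    Extent("users01.dbf", 24),
    Extent("users02.dbf", 72),
])

print([extent.block_count for extent in employees.extents])
print(employees.data_files(), employees.total_blocks())
```

The model makes the one-file-per-extent rule structural: an Extent carries a single data_file, while a Segment collects extents that may name different files.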
1.6.3 Three Level Architecture
Three-Level ANSI-SPARC Architecture

An early proposal for a standard terminology and general architecture for database
systems was produced in 1971 by the DBTG (Data Base Task Group) appointed by the
Conference on Data Systems and Languages (CODASYL, 1971). The DBTG recognized
the need for a two-level approach with a system view called the schema and user views
called sub-schemas.
[Figure: the ANSI-SPARC architecture of the database system]
The levels form a three-level architecture that includes an external, a conceptual, and
an internal level. The way users perceive the data is called the external level. The
way the DBMS and the operating system perceive the data is the internal level, where
the data is actually stored using data structures and file organizations. The conceptual
level provides both the mapping and the desired independence between the external and
internal levels.
What is Database Architecture?
The architecture of a DBMS depends on its design and can be of the following types:
• Centralized
• Decentralized
• Hierarchical
DBMS architecture can be seen as either single-tier or multi-tier. An n-tier
architecture splits the entire system into n related but independent modules that can
be independently customized, changed, altered, or replaced.
The architecture of a database system is very much influenced by the primary computer
system on which the database system runs. Database systems can be centralized, or
client-server, where one server machine executes work on behalf of multiple client
machines. Database systems can also be designed to exploit parallel computer
architectures. Distributed databases span multiple geographically separated machines.
The Three Tier Architecture
A 3-tier application is an application program that is structured into three major
parts, each of which is distributed to a different place or places in a network. These
three divisions are as follows:
• The workstation or presentation layer
• The business or application logic layer
• The database layer, together with the programming related to managing it
DBMS Architecture - Overview of Three-Level Architecture
The Database Task Group (DBTG), appointed by the Conference on Data Systems and
Languages (CODASYL), developed and published a proposal for a standard vocabulary
and architecture for database systems in 1971. The Standards Planning and
Requirements Committee (SPARC) of the American National Standards Institute (ANSI)
Committee on Computers and Information Processing developed and published a
similar vocabulary and architecture in 1975.
The result of these reports was the three-level architecture, which is the basis of
modern database architectures. A database can be viewed at three levels, and these
levels are described by three models known as the three-level schema. The models
describe the structure of the database system, not the data stored in it. The permanent
structure of a database is known as the intension of the database, or the database
schema. The data stored at a given time is known as the extension of the database, or
the database instance.
The intension of a database should not be changed once it has been defined, because
a small change in the intension may require many changes to the data stored in the
database. The extension is populated after the intension has been defined; that is,
data is stored in the database only once the database structure exists, and it must
follow the rules defined in the intension.
Schemas store the definitions of the structures of the database. A schema can describe
anything from a single entity to the whole organization. The three-level architecture
defines different schemas at different levels in order to isolate the details of each
level from the others.
Three Level Architecture of Database Management System

External Level Schema
In the relational database model, the external level schema also presents data
as a set of relations. An external level schema specifies a view of the data in terms
of the conceptual level. It is tailored to the needs of a particular category of
users. Because portions of the stored data should not be seen by some users, the
external schema begins to implement a level of security and also simplifies the view
for these users.
Examples:
• Students should not see faculty salaries.
• Faculty should not see billing or payment data.
Applications are written in terms of an external schema. The external view is
computed when accessed; it is not stored. Different external schemas can be
provided to different categories of users. Translation from the external level to the
conceptual level is done automatically by the DBMS at run time. The conceptual
schema can be changed without changing applications:
• The mapping from external to conceptual must be changed.
• This is referred to as conceptual data independence.
Conceptual Level Schema
The conceptual level schema represents the entire database. The conceptual
schema describes the records and relationships included in the conceptual view. It
also contains the method of deriving the objects in the conceptual view from the
objects in the internal view.
Internal Level Schema
The internal level schema describes how the data is stored, including the data
structures and access methods used by the database. It contains the definitions of
the stored records, the method of representing the data fields, and the access aids
used.
A mapping between the external and conceptual views gives the correspondence
among the records and relationships of the conceptual and external views. The
external view is an abstraction of the conceptual view, which in turn is an abstraction
of the internal view. An external view describes the contents of the database as
perceived by the user or application program of that view.
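An external schema of the kind described in this section can be sketched as a relational view. The example assumes Python's built-in sqlite3 module; the faculty table and the student_faculty_view name are hypothetical. Note that SQLite has no user accounts, so the view here only illustrates the restricted presentation; a multiuser DBMS would combine such a view with access privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the full relation.
    CREATE TABLE faculty (id INTEGER PRIMARY KEY, name TEXT, office TEXT, salary REAL);
    INSERT INTO faculty (name, office, salary) VALUES ('Dr. Mensah', 'B12', 5200.0);
    -- External level for students: salary is left out of the view.
    CREATE VIEW student_faculty_view AS SELECT name, office FROM faculty;
""")

# The view is computed when accessed; it stores no data of its own.
cursor = conn.execute("SELECT * FROM student_faculty_view")
visible_columns = [column[0] for column in cursor.description]
rows = cursor.fetchall()

print(visible_columns)
print(rows)
```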

1.6.4 Logical and Physical Data Independence
If a database system is not multi-layered, it becomes difficult to make any changes
in the database system. Database systems are therefore designed in multiple layers, as
we learnt earlier.
Data Independence
A database system normally contains a lot of data in addition to users’ data. For
example, it stores data about data, known as metadata, to locate and retrieve data easily.
It is rather difficult to modify or update a set of metadata once it is stored in the
database. But as a DBMS expands, it needs to change over time to satisfy the
requirements of the users. If all of the data were interdependent, making such changes
would be a tedious and highly complex job.
Metadata itself follows a layered architecture, so that when we change data at one layer,
it does not affect the data at another level. The layers are independent of, but mapped
to, each other.
Logical Data Independence
Logical data is data about the database; that is, it stores information about how data
is managed inside. An example is a table (relation) stored in the database together
with all the constraints applied to that relation.
Logical data independence is a mechanism that separates the logical data from the
actual data stored on the disk. If we make changes to the table format, the data
residing on the disk should not change.
Physical Data Independence
All the schemas are logical, and the actual data is stored in bit format on the disk.
Physical data independence is the power to change the physical data without impacting
the schema or logical data.
For example, in case we want to change or upgrade the storage system itself −
suppose we want to replace hard-disks with SSD − it should not have any impact
on the logical data or schemas.
Physical data independence is the ability to modify the physical
schema without making it necessary to rewrite application programs. Such
modifications include changing from unblocked to blocked record storage,
or from sequential to random access files.

Logical data independence is the ability to modify the conceptual
schema without making it necessary to rewrite application programs. Such
a modification might be adding a field to a record; an application program’s
view hides this change from the program.

Logical Data Independence: Logical data independence is the ability to modify the
conceptual schema without altering external schemas or application
programs. Alterations to the conceptual schema may include the addition or removal of
entities, attributes or relationships, and should be possible without
altering existing external schemas or rewriting application programs.
Physical Data Independence: Physical data independence is the ability to modify the
internal schema without altering the conceptual schema or application
programs. Alterations to the internal schema might include:
 Using new storage devices.
 Using different data structures.
 Switching from one access method to another.
 Using different file organizations or storage structures.
 Modifying indexes.
Physical Independence: The logical scheme stays unchanged even though the
storage space or type of some data is changed for reasons of optimisation or
reorganisation. The external schema does not change either; only the internal schema
changes when the physical storage is reorganized. Physical data independence is
present in most database and file environments, in which details such as the hardware
storage encoding, the exact location of data on disk, and the merging of records are
hidden from the user.
Logical Independence: The external scheme may stay unchanged for
most changes of the logical scheme. This is especially desirable as the application
software does not need to be modified or newly translated.
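The two kinds of independence can be tried out on a small scale. The following is a minimal sketch only, assuming Python's built-in sqlite3 module; the table staff, the view staff_names and the function application_query are hypothetical names chosen for illustration. The view plays the role of an external schema, adding a column is a change to the conceptual schema, and adding an index is a change at the physical level.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Conceptual schema, version 1 (table and column names are hypothetical).
conn.execute("CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO staff VALUES (1, 'Melvin'), (2, 'Edward')")

# External schema: the application only ever queries this view.
conn.execute("CREATE VIEW staff_names AS SELECT staff_id, name FROM staff")


def application_query(connection):
    """Stands in for an application program written against the view."""
    return connection.execute(
        "SELECT staff_id, name FROM staff_names ORDER BY staff_id"
    ).fetchall()


before = application_query(conn)

# Logical change: add a field to the record in the conceptual schema.
conn.execute("ALTER TABLE staff ADD COLUMN age INTEGER")

# Physical change: add an index, which alters how the data is accessed.
conn.execute("CREATE INDEX idx_staff_name ON staff (name)")

after = application_query(conn)
print(before == after)  # the application sees the same result both times
```

Neither change required the application query to be rewritten, which is the property both definitions describe.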
UNIT 2: DESCRIBING THE DATABASE ANALYSIS AND DESIGN
TECHNIQUES
Introduction
1. Database design is a technique that involves the analysis, design, description, and
specification of data designed for automated business data processing. This technique
uses models to enhance communication between developers and customers.
2. Data models and supporting descriptions are the tools used in database design.
These tools become the deliverables that result from applying database design. There
are two primary objectives for developing these deliverables. The first objective is
to produce documentation that describes a customer’s perspective of data and the
relationships among this data. The second objective is to produce documentation that
describes the customer organization's environment, operations and data needs. In
accomplishing these objectives, the following deliverables result:
 Decision Analysis and Description Forms
 Task Analysis and Description Forms
 Task/Data Element Usage Matrix
 Data Models
 Entity-Attribute Lists
 Data Definition Lists
 Physical Database Specifications Document
3. Consider a database approach if one or more of the following conditions exist in
the user environment:
 Multiple applications are to be supported by the system.
 Multiple processes or activities use multiple data
sources.
 Multiple data sources are used in the reports produced.
 The data, from the data definitions, are known to be in existing
database(s).
 The development effort is to enhance the capabilities of an existing
database.
4. If it appears that conditions would support database development, then
undertake the activities of logical database analysis and design. When the logical
schema and sub schemas are completed they are translated into their physical
counterparts. Then the physical sub schemas are supplied as part of the data
specifications for program design. The exact boundary between the last stages of
logical design and the first stages of physical analysis is difficult to assess because of
the lack of standard terminology. However, there seems to be general agreement that
logical design encompasses a DBMS-independent view of data and that physical
design results in a specification for the database structure, as it is to be physically
stored. The design step between these two that produces a schema that can be
processed by a DBMS is called implementation design.
5. Do not limit database development considerations to providing random access or
ad hoc query capabilities for the system. However, even if conditions appear to
support database development, postpone the decision to implement or not
implement a DBMS until after completing a thorough study of the current
environment. This study must clarify any alternatives that may or may not be
preferable to DBMS implementation.
2.1 Database planning, Design and Administration
What is Database Planning in DBMS?
Database planning is the management of activities that allow the stages of the database
system development life cycle to be realized as efficiently and effectively as possible.
Database planning must be integrated with the overall IS strategy of the organization.
There are three main issues involved in formulating an IS strategy:
 Identification of enterprise plans and goals, with subsequent determination of
information systems requirements
 Evaluation of current information systems to find out existing strengths and
weaknesses
 Appraisal of IT opportunities that might yield competitive advantage
An important first step in database planning is to clearly define the mission
statement for the database system. The mission statement describes the major aims of
the database system. Those driving the database project within the organization
normally define the mission statement. A mission statement helps to clarify the
purpose of the database system and provides a clearer path towards the efficient and
effective creation of the required database system.
Database Design
This is the process of creating a design that will support the enterprise’s mission
statement and mission objectives for the required database system. Two main
approaches to the design of a database are followed. These are:
 bottom-up and
 top-down
The bottom-up approach starts at the fundamental level of attributes (i.e. properties of
entities and relationships), which through analysis of the associations between attributes
are grouped into relations that represent types of entities and relationships between
entities.
A more appropriate strategy for the design of complex databases is to use the top-down
approach, which starts with the development of data models that contain a few high-level
entities and relationships and then applies successive top-down refinements to identify
lower-level entities, relationships, and the associated attributes. The top-down
approach is illustrated by the concepts of the Entity-Relationship (ER)
model, beginning with the identification of entities and relationships between the
entities, which are of interest to the organization.
Database Administration
A DBMS normally provides various utilities to aid database administration, including
utilities for loading data into the database and for monitoring
the system. The monitoring utilities give information on aspects such as query
execution strategy. The Database Administrator (DBA) can use this
information to tune the system for better performance, for example by
creating additional indexes to speed up queries, by altering storage structures, or by
combining or splitting tables.
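As a small illustration of this kind of tuning, the sketch below asks a DBMS for its query execution strategy before and after an index is created. It assumes Python's built-in sqlite3 module and SQLite's EXPLAIN QUERY PLAN statement; the orders table, its columns and the index name are hypothetical. The exact wording of the plan text differs between SQLite versions, so it is printed rather than quoted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_id, total) VALUES (?, ?)",
    [(i % 100, float(i)) for i in range(1000)],
)

QUERY = "SELECT * FROM orders WHERE customer_id = 42"


def query_plan(connection):
    """Return the execution strategy SQLite reports for QUERY."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
    # The last column of each row holds the human-readable description.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn)  # without an index, the whole table is scanned

# The tuning step: create an additional index on the column being searched.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")

plan_after = query_plan(conn)   # the plan should now mention the new index

print("before:", plan_before)
print("after: ", plan_after)
```

The query text is identical in both runs; only the strategy chosen by the DBMS changes.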
The monitoring process continues throughout the life of a database system and in time
may lead to restructuring of the database to satisfy changing requirements.
These changes in turn provide information on the likely evolution of the system and
the future resources that may be needed. This, together with knowledge of proposed
new applications, enables the DBA to engage in capacity planning and to notify or alert
senior staff so that they can adjust plans accordingly. If the DBMS lacks certain utilities, the
DBA can either develop the required utilities in-house or purchase additional vendor
tools based on the requirement.
2.2 The information system Lifecycle

2.3 The Database Development Lifecycle
Software has now overtaken hardware as the key to the success of many computer-based
systems. Unfortunately, the track record of software development is not particularly
impressive. The last few decades have seen the rise of software applications ranging
from small, relatively simple applications consisting of a few lines of code, to large,
complex applications consisting of millions of lines of code. Many of these applications
have required constant maintenance, which involved correcting faults that had been
detected, implementing new user requirements, and modifying the software to run on
new or upgraded platforms.
The effort spent on maintenance began to absorb resources at an
alarming rate. Databases likewise need to be properly maintained: they must be
planned and designed before deployment and administered after
deployment. In this chapter you will learn how these three activities (planning, design
and administration) are useful in a database management system.
Database System Development Life Cycle
As a database system is a primary element of the larger organization-wide information
system, the database system development life cycle is inherently connected with the life
cycle of the information system. The stages of the database system development lifecycle
are shown in the figure below:
UNIT 3: EXPLAINING THE DESIGN THEORY FOR RELATIONAL
DATABASE
Overview
Database design theory is a topic that many people avoid learning for lack of time. Many
others attempt to learn it, but give up because of the dry, academic treatment it is
usually given by most authors and teachers. But if creating databases is part of your job,
then you're treading on thin ice if you don't have a good solid understanding of
relational database design theory.
This article begins with an introduction to relational database design theory, including a
discussion of keys, relationships, integrity rules, and the often-dreaded "Normal
Forms." Following the theory, I present a practical step-by-step approach to good
database design.
The Relational Model
The relational database model was conceived in 1969 by E. F. Codd, then a researcher at
IBM. The model is based on branches of mathematics called set theory and predicate
logic. The basic idea behind the relational model is that a database consists of a series of
unordered tables (or relations) that can be manipulated using non-procedural
operations that return tables. This model was in vast contrast to the more traditional
database theories of the time that were much more complicated, less flexible and
dependent on the physical storage methods of the data.
Note: It is commonly thought that the word relational in the relational model comes
from the fact that you relate together tables in a relational database. Although this is a
convenient way to think of the term, it's not accurate. Instead, the word relational has its
roots in the terminology that Codd used to define the relational model. The table in
Codd's writings was actually referred to as a relation (a related set of information). In
fact, Codd (and other relational database theorists) use the terms relations, attributes
and tuples where most of us use the more common terms tables, columns and rows,
respectively (or the more physical—and thus less preferable for discussions of database
design theory—files, fields and records).
The relational model can be applied to both databases and database management
systems (DBMS) themselves. The relational fidelity of database programs can be
compared using Codd's 12 rules (since Codd's seminal paper on the relational model, the
number of rules has been expanded to more than 300) for determining how closely DBMS
products conform to the relational model. When compared with other database management
programs, Microsoft Access fares quite well in terms of relational fidelity. Still, it has a
long way to go before it meets all twelve rules completely.
Fortunately, you don't have to wait until Microsoft Access is perfect in a relational sense
before you can benefit from the relational model. The relational model can also be
applied to the design of databases, which is the subject of the remainder of this article.
Relational Database Design
When designing a database, you have to make decisions regarding how best to take
some system in the real world and model it in a database. This consists of deciding
which tables to create, what columns they will contain, as well as the relationships
between the tables. While it would be nice if this process was totally intuitive and
obvious, or even better automated, this is simply not the case. A well-designed database
takes time and effort to conceive, build and refine.
The benefits of a database that has been designed according to the relational model are
numerous. Some of them are:
 Data entry, updates and deletions will be efficient.
 Data retrieval, summarization and reporting will also be efficient.
 Since the database follows a well-formulated model, it behaves predictably.
 Since much of the information is stored in the database rather than in the
application, the database is somewhat self-documenting.
 Changes to the database schema are easy to make.
The goal of this article is to explain the basic principles behind relational database
design and demonstrate how to apply these principles when designing a database using
Microsoft Access. This article is by no means comprehensive and certainly not
definitive. Many books have been written on database design theory; in fact, many
careers have been devoted to its study. Instead, this article is meant as an informal
introduction to database design theory for the database developer.
Note: While the examples in this article are centered around Microsoft Access
databases, the discussion also applies to database development using the Microsoft
Visual Basic® programming system, the Microsoft FoxPro® database management
system, and the Microsoft SQL Server™ client-server database management system.
3.1 E-R Model
One of the most difficult aspects of database design is the fact that designers,
programmers and end-users tend to view data and its use in different ways.
Unfortunately, unless all of them gain a common understanding that
reflects how the enterprise operates, the design produced will fail to meet the
users’ requirements. To ensure a precise understanding of the nature of the
data and how it is used by the enterprise, you need a model for
communication that is non-technical, free of ambiguities, and easily readable by both
technical and non-technical members. The ER (Entity Relationship) model was
developed for this purpose, and it is represented by the ER diagram. In this chapter you
will learn about the ER diagram and how it works.
What is Entity Relationship Diagram (ER-Diagram)?

An ER diagram is a pictorial representation of data that describes how data items are
related to each other. Objects such as entities, attributes of an
entity, relationship sets, and attributes of relationships can all be represented with
the help of the ER diagram.
Entities: They are represented using rectangular boxes. Each rectangle is
named with the entity set it represents.
ER modeling is a top-down approach to database design that begins by identifying the
important data, called entities, and the relationships between the data that
must be represented in the model. Database designers then add more
details, such as the information they want to hold about the entities and relationships
(the attributes) and any constraints on the entities, relationships, and
attributes. ER modeling is an important technique for any database designer to master
and forms the basis of the methodology.
• Entity type: It is a group of objects with the same properties that are identified by the
enterprise as having an independent existence. The basic concept of the ER
model is the entity type that is used to represent a group of ‘objects’ in the ‘real
world’ with the same properties. An entity type has an independent existence
within a database.
• Entity occurrence: A uniquely identifiable object of an entity type.
Diagrammatic Representation of Entity Types
Each entity type is shown as a rectangle labeled with the name of the entity, which is
normally a singular noun.
What is Relationship Type?
A relationship type is a set of associations between one or more participating entity
types. Each relationship type is given a name that describes its function.
Here is a diagram showing how relationships are formed in a database.
What is degree of Relationship?

The entities involved in a particular relationship type are referred to as participants in
that relationship. The number of participants involved in a relationship type is termed
the degree of that relationship.
In the example “Branch has staff” figured above, there is a relationship between two
participating entities. A relationship of degree two is called a binary relationship.
What are Attributes?

Attributes are the properties of entities that are represented by means of ellipse shaped
figures. Every elliptical figure represents one attribute and is directly connected to its
entity (which is represented as rectangle).
It is to be noted that multi-valued attributes are represented using double ellipse like
this:
Relationships
Relationships are represented by diamond-shaped boxes. All the entities (rectangle
shaped) participating in a relationship are connected to it by lines.
There are four types of relationships. These are:

• One-to-one: When one instance of an entity is associated with at most one instance of
the other entity, the relationship is termed ‘1:1’.
• One-to-many: When one instance of the entity on the left can be associated with more
than one instance of the entity on the right, the relationship is termed ‘1:N’.
• Many-to-one: When more than one instance of the entity on the left can be associated
with one instance of the entity on the right, the relationship is termed ‘N:1’.
• Many-to-many: When more than one instance of the entity on the left and more than one
instance of the entity on the right can be associated with each other, the relationship is
termed ‘M:N’.
3.2 Concepts of Keys

Tables, Uniqueness and Keys
Tables in the relational model are used to represent "things" in the real world. Each
table should represent only one thing. These things (or entities) can be real-world
objects or events. For example, a real-world object might be a customer, an inventory
item, or an invoice. Examples of events include patient visits, orders, and telephone
calls. Tables are made up of rows and columns.
The relational model dictates that each row in a table be unique. If you allow duplicate
rows in a table, then there's no way to uniquely address a given row via programming.
This creates all sorts of ambiguities and problems that are best avoided. You guarantee
uniqueness for a table by designating a primary key—a column that contains unique
values for a table. Each table can have only one primary key, even though several
columns or combination of columns may contain unique values. All columns (or
combination of columns) in a table with unique values are referred to as candidate keys,
from which the primary key must be drawn. All other candidate key columns are
referred to as alternate keys. Keys can be simple or composite. A simple key is a key
made up of one column, whereas a composite key is made up of two or more columns.
The decision as to which candidate key is the primary one rests in your hands—there's
no absolute rule as to which candidate key is best. Fabian Pascal, in his book SQL and
Relational Basics, notes that the decision should be based upon the principles of
minimality (choose the fewest columns necessary), stability (choose a key that seldom
changes), and simplicity/familiarity (choose a key that is both simple and familiar to
users). Let's illustrate with an example. Say that a company has a table of customers
called tblCustomer, which looks like the table shown in Figure 1.
Figure 1. The best choice for primary key for tblCustomer would be CustomerId.
Candidate keys for tblCustomer might include CustomerId, (LastName + FirstName),
Phone#, (Address, City, State), and (Address + ZipCode). Following Pascal's guidelines,
you would rule out the last three candidates because addresses and phone numbers can
change fairly frequently. The choice among CustomerId and the name composite key is
less obvious and would involve tradeoffs. How likely would a customer's name change
(e.g., marriages cause names to change)? Will misspelling of names be common? How
likely will two customers have the same first and last names? How familiar will
CustomerId be to users? There's no right answer, but most developers favor numeric
primary keys because names do sometimes change and because searches and sorts of
numeric columns are more efficient than of text columns in Microsoft Access (and most
other databases).
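The uniqueness requirement behind candidate keys can be checked mechanically. The sketch below is illustrative only: the function is_unique and the sample rows are hypothetical (they are not the contents of Figure 1), and uniqueness in a sample of rows is a necessary but not a sufficient condition, since a candidate key must be unique for every row the table could ever hold.

```python
def is_unique(rows, columns):
    """True if no two rows share the same values for the given columns."""
    seen = set()
    for row in rows:
        key = tuple(row[column] for column in columns)
        if key in seen:
            return False
        seen.add(key)
    return True


# Hypothetical sample rows for tblCustomer.
customers = [
    {"CustomerId": 1, "LastName": "Jones", "FirstName": "Ann", "Phone": "555-0101"},
    {"CustomerId": 2, "LastName": "Jones", "FirstName": "Bob", "Phone": "555-0102"},
    {"CustomerId": 3, "LastName": "Smith", "FirstName": "Ann", "Phone": "555-0103"},
]

# A simple key (one column) and a composite key (two columns) both pass.
print(is_unique(customers, ["CustomerId"]))             # simple key
print(is_unique(customers, ["LastName", "FirstName"]))  # composite key

# Neither half of the composite key is unique on its own, so the
# composite cannot be reduced any further (the minimality principle).
print(is_unique(customers, ["LastName"]))
print(is_unique(customers, ["FirstName"]))
```

Whether a column set that passes this check is a good primary key still depends on the stability and familiarity considerations discussed above.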
Counter columns in Microsoft Access make good primary keys, especially when you're
having trouble coming up with good candidate keys, and no existing arbitrary
identification number is already in place. Don't use a counter column if you'll sometimes
need to renumber the values—you won't be able to—or if you require an alphanumeric
code—Microsoft Access supports only long integer counter values. Also, counter
columns only make sense for tables on the one side of a one-to-many relationship (see
the discussion of relationships in the next section).
Note: In many situations, it is best to use some sort of arbitrary static whole number
(e.g., employee ID, order ID, a counter column, etc.) as a primary key rather than a
descriptive text column. This avoids the problem of misspellings and name changes.
Also, don't use real numbers as primary keys since they are inexact.
Foreign Keys and Domains
Although primary keys are a function of individual tables, if you created databases that
consisted of only independent and unrelated tables, you'd have little need for them.
Primary keys become essential, however, when you start to create relationships that join
together multiple tables in a database. A foreign key is a column in a table used to
reference a primary key in another table.
Continuing the example presented in the last section, let's say that you choose
CustomerId as the primary key for tblCustomer. Now define a second table, tblOrder, as
shown in Figure 2.
Figure 2. CustomerId is a foreign key in tblOrder which can be used to reference a
customer stored in the tblCustomer table.
CustomerId is considered a foreign key in tblOrder since it can be used to refer to a given
customer (i.e., a row in the tblCustomer table).
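The same two tables can be sketched outside Microsoft Access. The example below assumes Python's built-in sqlite3 module; CustomerId, tblCustomer and tblOrder come from the text, while the other columns (LastName, OrderId, OrderDate) and the sample values are hypothetical because Figure 2 is not reproduced here. Note that SQLite checks foreign keys only after the foreign_keys pragma has been switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite

conn.execute(
    "CREATE TABLE tblCustomer (CustomerId INTEGER PRIMARY KEY, LastName TEXT)"
)
conn.execute(
    """CREATE TABLE tblOrder (
           OrderId    INTEGER PRIMARY KEY,
           CustomerId INTEGER NOT NULL REFERENCES tblCustomer (CustomerId),
           OrderDate  TEXT
       )"""
)

conn.execute("INSERT INTO tblCustomer VALUES (1, 'Jones')")
conn.execute("INSERT INTO tblOrder VALUES (100, 1, '2024-01-15')")  # customer 1 exists

try:
    # Customer 99 does not exist, so this order refers to nothing.
    conn.execute("INSERT INTO tblOrder VALUES (101, 99, '2024-01-16')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

order_count = conn.execute("SELECT COUNT(*) FROM tblOrder").fetchone()[0]
print(rejected, order_count)
```

The second insert is refused because its CustomerId value does not match any primary key value in tblCustomer.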
It is important that foreign keys and the primary keys that they reference
share a common meaning and draw their values from the same domain. Domains are
simply pools of values from which columns are drawn. For example, CustomerId is of
the domain of valid customer ID #'s, which in this case might be Long Integers ranging
between 1 and 50,000. Similarly, a column named Sex might be based on a one-letter
domain equaling 'M' or 'F'. Domains can be thought of as user-defined column types
whose definition implies certain rules that the columns must follow and certain
operations that you can perform on those columns.
Microsoft Access supports domains only partially. For example, Microsoft Access will
not let you create a relationship between two tables using columns that do not share the
same datatype (e.g., text, number, date/time, etc.). On the other hand, Microsoft Access
will not prevent you from joining the Integer column EmployeeAge from one table to the
Integer column YearsWorked from a second table, even though these two columns are
obviously from different domains.
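Many database systems let you approximate a domain with a column constraint. The sketch below is one possible way to do so, assuming Python's built-in sqlite3 module and SQL CHECK constraints; the table tblPerson and the helper try_insert are hypothetical, while the two domains (customer IDs from 1 to 50,000 and a one-letter Sex column) follow the examples in the text. Like the Access datatype check, it does not stop you from joining columns from different domains.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE tblPerson (
           CustomerId INTEGER PRIMARY KEY
                      CHECK (CustomerId BETWEEN 1 AND 50000),
           Sex        TEXT NOT NULL CHECK (Sex IN ('M', 'F'))
       )"""
)


def try_insert(customer_id, sex):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO tblPerson VALUES (?, ?)", (customer_id, sex))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, "M"),      # both values lie inside their domains
    try_insert(2, "X"),      # 'X' is outside the Sex domain
    try_insert(60000, "F"),  # 60000 is outside the customer ID domain
]
print(results)
```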
Relationships
You define foreign keys in a database to model relationships in the real world.
Relationships between real-world entities can be quite complex, involving numerous
entities each having multiple relationships with each other. For example, a family has
multiple relationships between multiple people—all at the same time. In a relational
database such as Microsoft Access, however, you consider only relationships between
pairs of tables. These tables can be related in one of three different ways: one-to-one,
one-to-many or many-to-many.
One-to-One Relationships
Two tables are related in a one-to-one (1—1) relationship if, for every row in the first
table, there is at most one row in the second table. True one-to-one relationships seldom
occur in the real world. This type of relationship is often created to get around some
limitation of the database management software rather than to model a real-world
situation. In Microsoft Access, one-to-one relationships may be necessary in a database
when you have to split a table into two or more tables because of security or
performance concerns or because of the limit of 255 columns per table. For example,
you might keep most patient information in tblPatient, but put especially sensitive
information (e.g., patient name, social security number and address) in tblConfidential
(see Figure 3). Access to the information in tblConfidential could be more restricted
than for tblPatient. As a second example, perhaps you need to transfer only a portion of
a large table to some other application on a regular basis. You can split the table into the
transferred and the non-transferred pieces, and join them in a one-to-one relationship.
Figure 3. The tables tblPatient and tblConfidential are related in a one-to-one
relationship. The primary key of both tables is PatientId.
Tables that are related in a one-to-one relationship should always have the same
primary key, which will serve as the join column.
One-to-Many Relationships
Two tables are related in a one-to-many (1—M) relationship if for every row in the first
table, there can be zero, one, or many rows in the second table, but for every row in the
second table there is exactly one row in the first table. For example, each order for a
pizza delivery business can have multiple items. Therefore, tblOrder is related to
tblOrderDetails in a one-to-many relationship (see Figure 4). The one-to-many
relationship is also referred to as a parent-child or master-detail relationship. One-to-
many relationships are the most commonly modeled relationship.
Figure 4. There can be many detail lines for each order in the pizza delivery business, so
tblOrder and tblOrderDetails are related in a one-to-many relationship.
One-to-many relationships are also used to link base tables to information stored in
lookup tables. For example, tblPatient might have a short one-letter DischargeDiagnosis
code, which can be linked to a lookup table, tlkpDiagCode, to get more complete
Diagnosis descriptions (stored in DiagnosisName). In this case, tlkpDiagCode is related
to tblPatient in a one-to-many relationship (i.e., one row in the lookup table can be used
in zero or more rows in the patient table).
Many-to-Many Relationships
Two tables are related in a many-to-many (M—M) relationship when for every row in
the first table, there can be many rows in the second table, and for every row in the
second table, there can be many rows in the first table. Many-to-many relationships
can't be directly modeled in relational database programs, including Microsoft Access.
These types of relationships must be broken into multiple one-to-many relationships.
For example, a patient may be covered by multiple insurance plans and a given
insurance company covers multiple patients. Thus, the tblPatient table in a medical
database would be related to the tblInsurer table in a many-to-many relationship. In
order to model the relationship between these two tables, you would create a third,
linking table, perhaps called tblPtInsurancePgm that would contain a row for each
insurance program under which a patient was covered (see Figure 5). Then, the many-
to-many relationship between tblPatient and tblInsurer could be broken into two one-
to-many relationships (tblPatient would be related to tblPtInsurancePgm and tblInsurer
would be related to tblPtInsurancePgm in one-to-many relationships).
Figure 5. A linking table, tblPtInsurancePgm, is used to model the many-to-many
relationship between tblPatient and tblInsurer.
In Microsoft Access, you specify relationships using the Edit—Relationships command.
In addition, you can create ad-hoc relationships at any point, using queries.
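The linking-table approach to a many-to-many relationship can be sketched as follows, assuming Python's built-in sqlite3 module. The three table names come from the text; the column names and sample rows are hypothetical because Figure 5 is not reproduced here. The linking table uses the two foreign keys together as its composite primary key, and a query with two joins recovers who is covered by whom.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE tblPatient (PatientId INTEGER PRIMARY KEY, PatientName TEXT);
    CREATE TABLE tblInsurer (InsurerId INTEGER PRIMARY KEY, InsurerName TEXT);

    -- Linking table: one row per (patient, insurer) pairing.
    CREATE TABLE tblPtInsurancePgm (
        PatientId INTEGER REFERENCES tblPatient (PatientId),
        InsurerId INTEGER REFERENCES tblInsurer (InsurerId),
        PRIMARY KEY (PatientId, InsurerId)
    );

    INSERT INTO tblPatient VALUES (1, 'Ann'), (2, 'Bob');
    INSERT INTO tblInsurer VALUES (10, 'Acme Health'), (20, 'Beta Care');
    INSERT INTO tblPtInsurancePgm VALUES (1, 10), (1, 20), (2, 10);
    """
)

coverage = conn.execute(
    """SELECT p.PatientName, i.InsurerName
       FROM tblPtInsurancePgm AS pi
       JOIN tblPatient AS p ON p.PatientId = pi.PatientId
       JOIN tblInsurer AS i ON i.InsurerId = pi.InsurerId
       ORDER BY p.PatientName, i.InsurerName"""
).fetchall()

for patient, insurer in coverage:
    print(patient, "is covered by", insurer)
```

Ann appears with two insurers and Acme Health appears with two patients, so both "many" sides are represented by two one-to-many relationships.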
3.3 Normalization
As mentioned earlier in this article, when designing databases you are faced with a
series of choices. How many tables will there be and what will they represent? Which
columns will go in which tables? What will the relationships between the tables be? The
answer to each of these questions lies in something called normalization. Normalization is
the process of simplifying the design of a database so that it achieves the optimum
structure.
Normalization is the process of organizing data in a database. This includes creating
tables and establishing relationships between those tables according to rules designed
both to protect the data and to make the database more flexible by eliminating
redundancy and inconsistent dependency.
Normalization theory gives us the concept of normal forms to assist in achieving the
optimum structure. The normal forms are a linear progression of rules that you apply to
your database, with each higher normal form achieving a better, more efficient design.
The normal forms are:
 First Normal Form
 Second Normal Form
 Third Normal Form
 Boyce Codd Normal Form
 Fourth Normal Form
 Fifth Normal Form
In this article I will discuss normalization through Third Normal Form.
Normalization is a concept that applies to relational databases and nothing else.
It is a way of structuring data so that it can be queried and manipulated using a "universal
data sub-language" grounded in first-order logic (a phrase quoted from Codd), and so that
modification anomalies are prevented.
Purpose of Normalization
In plain English:
1. To reduce the space needed to describe your data. (don’t duplicate the Countries of the world,
just use a number to represent each one)
2. To prevent arbitrary and artificial data absurdities between related items (meals contain
food, “food” does not contain meals).
3. To restrict how many of something can be related to something else. (a student can take
many classes, a class can have many students, a class may only have one primary teacher, an
exam may only have one grade, and so on)
The Rule of Relationships:

Either tables hold facts about nouns or facts about relationships between
nouns (Who owns which car?, Who works in which department?, Which
product was shipped to which customer?)
The Rule of the Primary Key for Relationships:
Tables of relationships have a natural primary key. It is the primary key
column(s) of both tables that are related. (Do not use the autoinc column as
the primary key in a table of relationships.)
The Rule of Adjectives (Amended):
Except for the primary key, think of each column as an adjective. It holds
descriptive information, facts, or attributes of the noun or relationship
represented by the row. Every column in that row should adhere to this rule.
A column must consistently hold the same attribute from row to row. (If that
attribute doesn't apply, the column is null or blank.)
Database normalization is a database schema design technique by which an existing
schema is modified to minimize redundancy and dependency of data.
Normalization splits a large table into smaller tables and defines relationships between
them to increase the clarity with which the data is organized.
Some facts about database normalization
• The words normalization and normal form refer to the structure of a database.
• Normalization was developed by IBM researcher E.F. Codd in the 1970s.
• Normalization increases the clarity with which data is organized in a database.
Normalization of a database is achieved by following a set of rules called 'forms' when
creating the database.
Database normalization rules
The database normalization process is divided into the following normal forms:
• First Normal Form (1NF)
• Second Normal Form (2NF)
• Third Normal Form (3NF)
• Boyce-Codd Normal Form (BCNF)
• Fourth Normal Form (4NF)
• Fifth Normal Form (5NF)
First Normal Form (1NF)
In 1NF, every column holds a single (atomic) value in each row, and there are no repeating
groups.
Example:
The sample Employee table below shows an employee who works in more than one department.
Employee   Age   Department
Melvin     32    Marketing, Sales
Edward     45    Quality Assurance
Alex       36    Human Resource

Employee table following 1NF:
Employee   Age   Department
Melvin     32    Marketing
Melvin     32    Sales
Edward     45    Quality Assurance
Alex       36    Human Resource
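The conversion above can be sketched in a few lines of Python. This is only an illustration: the function name to_first_normal_form is made up for this example, and the sketch assumes that departments in the un-normalized table are separated by commas.

```python
# Rows of the un-normalized sample table: Department may hold a comma-separated list.
unnormalized = [
    ("Melvin", 32, "Marketing, Sales"),
    ("Edward", 45, "Quality Assurance"),
    ("Alex", 36, "Human Resource"),
]


def to_first_normal_form(rows):
    """Return one row per (employee, department) pair so that every value is atomic."""
    result = []
    for employee, age, departments in rows:
        for department in departments.split(","):
            result.append((employee, age, department.strip()))
    return result


employee_1nf = to_first_normal_form(unnormalized)
for row in employee_1nf:
    print(row)
```

Melvin now appears in two rows, one per department, which matches the 1NF table shown above.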

Second Normal Form (2NF)
The entity must already be in 1NF, and every non-key attribute must depend on the whole
of the entity's unique identifier, not on just part of it.
Example:
Sample Products table:

productID   product      Brand
1           Monitor      Apple
2           Monitor      Samsung
3           Scanner      HP
4           Head phone   JBL

Product tables following 2NF:
Products Category table:

productID   product
1           Monitor
2           Scanner
3           Head phone
Brand table:
brandID   brand
1         Apple
2         Samsung
3         HP
4         JBL
Products Brand table:
pbID   productID   brandID
1      1           1
2      1           2
3      2           3
4      3           4
Third Normal Form (3NF)

The entity must already be in 2NF (and therefore in 1NF), and no non-key column should
depend on any column other than the key of the table.
If such a column exists, move it out into a new table.
Once 3NF is achieved, the database is generally considered normalized.
Boyce-Codd Normal Form (BCNF)

The table is in 3NF and, in addition, every determinant is a candidate key.
Fourth Normal Form (4NF)

The table is in BCNF and has no non-trivial multi-valued dependencies other than those on
a candidate key.
Fifth Normal Form (5NF)

The table is in 4NF and cannot be split into smaller tables without losing information;
that is, every join dependency is implied by the candidate keys.
This is a highly simplified explanation of database normalization, and the process can be
studied in much more depth. After working with databases for some time you will find
yourself creating normalized databases automatically, because it is logical and practical.
Data Redundancy and Update Anomalies

What are update anomalies in a database, and what types of anomaly are there?
"Update anomalies" is the common name for the problems that result from an
un-normalized database in a Relational Database Management System (RDBMS).
The term covers three kinds of anomaly: insertion anomalies, deletion anomalies and
modification anomalies. If any of the three can occur, the database can become
inconsistent, and problems arise when inserting, deleting and modifying records in the
database tables.
The three update anomalies affect the database in different ways:
1) Insertion anomalies occur when a fact cannot be inserted into a table without also
supplying other, unrelated data, or when inserting a record leaves the database inconsistent.
2) Deletion anomalies occur when deleting a record also removes other facts that were
stored only in that record.
3) Modification anomalies occur when the same fact is stored in several records, so that a
change must be made to every copy; if some copies are missed, the database becomes
inconsistent.
To solve these kinds of problems we use normalization. Normalization is the process of
converting a large table into smaller tables so that records can be inserted, updated and
deleted cleanly. It reduces duplication by storing each kind of data in its own table. In
data-warehouse terms the result resembles a snowflake schema rather than a star schema;
the star schema is denormalized, which makes it easier to read data from.
The normal forms, first normal form (1NF), second normal form (2NF) and third normal
form (3NF), are the foundation for designing a database for a large company or
organization. They reduce duplicate records and unwanted dependencies, and they avoid
confusion when inserting, updating and deleting records. If you work on a project as an
architect, you should know the principles of both normalization and its opposite,
de-normalization.
3.5 Functional Dependencies
Chapter 11 Functional Dependencies

A functional dependency (FD) is a relationship between two attributes, typically
between the PK and other non-key attributes within a table. For any relation R, attribute
Y is functionally dependent on attribute X (usually the PK), if for every valid instance of
X, that value of X uniquely determines the value of Y. This relationship is indicated by
the representation below :
X ———–> Y
The left side of the above FD diagram is called the determinant, and the right side is the
dependent. Here are a few examples.
In the first example, below, SIN determines Name, Address and Birthdate. Given SIN,
we can determine any of the other attributes within the table.
SIN ———-> Name, Address, Birthdate
For the second example, SIN and Course determine the date completed
(DateCompleted). This must also work for a composite PK.
SIN, Course ———> DateCompleted
The third example indicates that ISBN determines Title.
ISBN ———–> Title
Rules of Functional Dependencies
Consider the following table of data r(R) of the relation schema R(ABCDE) shown in
Table 11.1.
Table 11.1. Functional dependency example, by A. Watt.
As you look at this table, ask yourself: What kind of dependencies can we observe
among the attributes in Table R? Since the values of A are unique (a1, a2, a3, etc.), it
follows from the FD definition that:
A → B, A → C, A → D, A → E
• It also follows that A →BC (or any other subset of ABCDE).
• This can be summarized as A →BCDE.
• From our understanding of primary keys, A is a primary key.
Since the values of E are always the same (all e1), it follows that:
A → E, B → E, C → E, D → E
However, ABCD → E is not an adequate summary of the above, because it does not by
itself imply the individual dependencies A → E, B → E or AB → E.
Other observations:
• Combinations of BC are unique, therefore BC → ADE.
• Combinations of BD are unique, therefore BD → ACE.
• If C values match, so do D values.
Therefore, C → D
However, D values don't determine C values
So C determines D, but D does not determine C.
Looking at actual data can help clarify which attributes are dependent and which are
determinants.
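Checking a dependency against actual data can be sketched as follows. The helper name holds is made up for this example, and the five rows are hypothetical values chosen only to be consistent with the observations listed above; they are not claimed to be the contents of Table 11.1.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in `rows`.

    rows: list of dicts mapping attribute name to value.
    lhs, rhs: strings of single-letter attribute names, for example "BC".
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical instance of R(ABCDE), consistent with the observations in the text.
rows = [dict(zip("ABCDE", values)) for values in [
    ("a1", "b1", "c1", "d1", "e1"),
    ("a2", "b1", "c2", "d2", "e1"),
    ("a3", "b2", "c1", "d1", "e1"),
    ("a4", "b2", "c2", "d2", "e1"),
    ("a5", "b3", "c3", "d1", "e1"),
]]

print(holds(rows, "A", "BCDE"))  # A is unique
print(holds(rows, "BC", "ADE"))  # combinations of BC are unique
print(holds(rows, "C", "D"))     # matching C values imply matching D values
print(holds(rows, "D", "C"))     # but D does not determine C
```

A check like this only shows that a dependency holds in one particular set of rows. Whether it is a rule of the schema is a design decision that the data alone cannot prove.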
Inference Rules
Armstrong’s axioms are a set of inference rules used to infer all the functional
dependencies on a relational database. They were developed by William W. Armstrong.
The following describes what will be used, in terms of notation, to explain these axioms.
Let R(U) be a relation scheme over the set of attributes U. We will use the letters X, Y, Z
to represent any subset of U and, for short, write XY for the union of two sets of
attributes X and Y, instead of the usual X ∪ Y.
Axiom of reflexivity
This axiom says, if Y is a subset of X, then X determines Y (see Figure 11.1).
Figure 11.1. Equation for axiom of reflexivity: if Y ⊆ X, then X → Y.
For example, PartNo —> NT123 where X (PartNo) is composed of more than one
piece of information; i.e., Y (NT) and partID (123).
Axiom of augmentation
The axiom of augmentation says if X determines Y, then XZ determines YZ for any Z
(see Figure 11.2).
Figure 11.2. Equation for axiom of augmentation: if X → Y, then XZ → YZ for any Z.

A related design concern is partial dependency: every non-key attribute must be fully
dependent on the PK. In the example shown below, StudentName, Address, City, Prov, and PC
(postal code) are only dependent on the StudentNo, not on the StudentNo and Course.
StudentNo, Course —> StudentName, Address, City, Prov, PC, Grade, DateCompleted
This situation is not desirable because every non-key attribute has to be fully dependent
on the PK. In this situation, student information depends on only part of the PK
(StudentNo).
To fix this problem, we need to break the original table down into two as follows:
• Table 1: StudentNo, Course, Grade, DateCompleted
• Table 2: StudentNo, StudentName, Address, City, Prov, PC
Axiom of transitivity
The axiom of transitivity says if X determines Y, and Y determines Z, then X must also
determine Z (see Figure 11.3).
Figure 11.3. Equation for axiom of transitivity: if X → Y and Y → Z, then X → Z.

The table below has information not directly related to the student; for instance,
ProgramID and ProgramName should have a table of their own. ProgramName is not
dependent on StudentNo; it's dependent on ProgramID.
StudentNo —> StudentName, Address, City, Prov, PC, ProgramID, ProgramName
This situation is not desirable because a non-key attribute (ProgramName) depends on
another non-key attribute (ProgramID).
To fix this problem, we need to break this table into two: one to hold information about
the student and the other to hold information about the program.
• Table 1: StudentNo —> StudentName, Address, City, Prov, PC, ProgramID
• Table 2: ProgramID —> ProgramName
However we still need to leave an FK in the student table so that we can identify which
program the student is enrolled in.
Union
This rule suggests that if two tables are separate, and the PK is the same, you may want
to consider putting them together. It states that if X determines Y and X determines Z
then X must also determine Y and Z (see Figure 11.4).
Figure 11.4. Equation for the Union rule: if X → Y and X → Z, then X → YZ.
For example, if:
• SIN —> EmpName
• SIN —> SpouseName
You may want to join these two tables into one as follows:
SIN –> EmpName, SpouseName
Some database administrators (DBA) might choose to keep these tables separated for a
couple of reasons. One, each table describes a different entity so the entities should be
kept apart. Two, if SpouseName is to be left NULL most of the time, there is no need to
include it in the same table as EmpName.
Decomposition
Decomposition is the reverse of the Union rule. If you have a table that appears to
contain two entities that are determined by the same PK, consider breaking them up
into two tables. This rule states that if X determines Y and Z, then X determines Y and X
determines Z separately (see Figure 11.5).
Figure 11.5. Equation for the decomposition rule: if X → YZ, then X → Y and X → Z.
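One common way to apply these inference rules mechanically is to compute the closure of a set of attributes, that is, everything the set determines. The sketch below is a minimal illustration; the function name closure is made up for this example, and the dependencies are taken from the student and program example above.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies `fds`.

    fds: list of (lhs, rhs) pairs, each a set of attribute names.
    """
    result = set(attributes)  # reflexivity: a set determines its own members
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, so is the right side
            # (this step applies transitivity and union repeatedly).
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [
    ({"StudentNo"}, {"StudentName", "ProgramID"}),
    ({"ProgramID"}, {"ProgramName"}),
]

print(sorted(closure({"StudentNo"}, fds)))
print(sorted(closure({"ProgramID"}, fds)))
```

ProgramName ends up in the closure of StudentNo only by way of ProgramID, which is the transitive dependency described above.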

Dependency Diagram
A dependency diagram, shown in Figure 11.6, illustrates the various dependencies that
might exist in a non-normalized table. A non-normalized table is one that has data
redundancy in it.
Figure 11.6. Dependency diagram.

The following dependencies are identified in this table:
• ProjectNo and EmpNo, combined, are the PK.
• Partial dependencies:
   ProjectNo —> ProjName
   EmpNo —> EmpName, DeptNo
• Full dependency on the PK:
   ProjectNo, EmpNo —> HrsWork
• Transitive dependency:
   DeptNo —> DeptName
3.3 Decomposition of relation schemas
Decomposition
A functional decomposition is the process of breaking down the functions of an
organization into progressively greater (finer and finer) levels of detail.
In decomposition, one function is described in greater detail by a set of other supporting
functions.
The decomposition of a relation scheme R consists of replacing the relation schema by
two or more relation schemas that each contain a subset of the attributes of R and
together include all attributes in R.
Decomposition helps in eliminating some of the problems of bad design such as
redundancy, inconsistencies and anomalies.
There are two types of decomposition :
1. Lossy Decomposition
2. Lossless Join Decomposition
Lossy Decomposition :
"The decomposition of relation R into R1 and R2 is lossy when the join of R1 and R2 does
not yield the same relation as in R."
One of the disadvantages of decomposition into two or more relational schemes (or
tables) is that some information may be lost when the original relation or table is
reconstructed.
Consider a table STUDENT with three attributes: Roll_no, Sname and Dept.
STUDENT:

Roll_no   Sname     Dept
111       parimal   COMPUTER
222       parimal   ELECTRICAL

This relation is decomposed into two relations, No_name and Name_dept:
No_name:
Roll_no   Sname
111       parimal
222       parimal
Name_dept:
Sname     Dept
parimal   COMPUTER
parimal   ELECTRICAL

In a lossy decomposition, spurious tuples are generated when a natural join is applied to
the relations in the decomposition. Joining No_name and Name_dept on the common
column Sname gives:
stu_joined:
Roll_no   Sname     Dept
111       parimal   COMPUTER
111       parimal   ELECTRICAL
222       parimal   COMPUTER
222       parimal   ELECTRICAL

The rows (111, parimal, ELECTRICAL) and (222, parimal, COMPUTER) are spurious: they
were not in STUDENT. The above decomposition is therefore a bad, or lossy, decomposition.
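The effect can be reproduced with a small Python sketch. The helper names project and natural_join are made up for this illustration; relations are represented as lists of dictionaries.

```python
def project(rows, attrs):
    """Projection: keep only `attrs` and drop duplicate rows."""
    out = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in out:
            out.append(projected)
    return out


def natural_join(left, right):
    """Natural join: combine rows that agree on every attribute they share."""
    joined = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[c] == r[c] for c in common):
                row = {**l, **r}
                if row not in joined:
                    joined.append(row)
    return joined


student = [
    {"Roll_no": 111, "Sname": "parimal", "Dept": "COMPUTER"},
    {"Roll_no": 222, "Sname": "parimal", "Dept": "ELECTRICAL"},
]

no_name = project(student, ["Roll_no", "Sname"])
name_dept = project(student, ["Sname", "Dept"])

stu_joined = natural_join(no_name, name_dept)
spurious = [row for row in stu_joined if row not in student]

print(len(stu_joined), "rows after the join")
for row in spurious:
    print("spurious:", row)
```

The join returns more rows than STUDENT held, and the extra rows pair each roll number with the wrong department, so the original table cannot be recovered.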
Lossless Join Decomposition :
"The decomposition of relation R into R1 and R2 is lossless when the join of R1 and R2
yields the same relation as in R."
A relational table is decomposed (or factored) into two or more smaller tables, in such a
way that the designer can capture the precise content of the original table by joining
the decomposed parts. This is called lossless-join (or non-additive join) decomposition.
The lossless-join decomposition is always defined with respect to a specific set F of
dependencies.
Consider again the table STUDENT with three attributes: Roll_no, Sname and Dept.
STUDENT:
Roll_no   Sname     Dept
111       parimal   COMPUTER
222       parimal   ELECTRICAL

This relation is decomposed into two relations, Stu_name and Stu_dept:
Stu_name:
Roll_no   Sname
111       parimal
222       parimal
Stu_dept:
Roll_no   Dept
111       COMPUTER
222       ELECTRICAL

Now, when these two relations are joined on the common column Roll_no, the resultant
relation looks like stu_joined.
stu_joined:
Roll_no   Sname     Dept
111       parimal   COMPUTER
222       parimal   ELECTRICAL

In a lossless decomposition, no spurious tuples are generated when a natural join is
applied to the relations in the decomposition.
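Because losslessness is defined with respect to a set F of dependencies, a decomposition into two relations can be checked without looking at any data. The standard test is that the attributes shared by R1 and R2 must determine all of R1 or all of R2. The sketch below assumes that Roll_no is the key of STUDENT (Roll_no → Sname, Dept); the function names closure and is_lossless are made up for this example.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies `fds`."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary lossless-join test: the shared attributes must determine r1 or r2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# Assumption for this sketch: Roll_no is the key of STUDENT.
fds = [({"Roll_no"}, {"Sname", "Dept"})]

split_on_roll_no = is_lossless({"Roll_no", "Sname"}, {"Roll_no", "Dept"}, fds)
split_on_sname = is_lossless({"Roll_no", "Sname"}, {"Sname", "Dept"}, fds)

print(split_on_roll_no)  # Stu_name / Stu_dept
print(split_on_sname)    # No_name / Name_dept
```

The split that shares Roll_no passes the test, while the split that shares only Sname fails it, which agrees with the two examples above.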
3.4 First Normal Form

Before First Normal Form: Relations
The Normal Forms are based on relations rather than tables. A relation is a special type
of table that has the following attributes:
• They describe one entity.
• They have no duplicate rows; hence there is always a primary key.
• The columns are unordered.
• The rows are unordered.
Microsoft Access doesn't require you to define a primary key for each and every table,
but it strongly recommends it. Needless to say, the relational model makes this an
absolute requirement. In addition, tables in Microsoft Access generally meet attributes 3
and 4. That is, with a few exceptions, the manipulation of tables in Microsoft Access
doesn't depend upon a specific ordering of columns or rows. (One notable exception is
when you specify the data source for a combo or list box.)
For all practical purposes the terms table and relation are interchangeable, and I will use
the term table in the remainder of this chapter. It's important to note, however, that
when I use the term table, I actually mean a table that also meets the definition of a
relation.
First Normal Form
First Normal Form (1NF) says that all column values must be atomic.
First Normal Form (1NF): A relation in which the intersection of each row and column
contains one and only one value.
The word atom comes from the Greek atomos, meaning indivisible (or literally "not to
cut"). 1NF dictates that, for every row-by-column position in a given table, there exists
only one value, not an array or list of values. The benefits from this rule should be fairly
obvious. If lists of values are stored in a single column, there is no simple way to
manipulate those values. Retrieval of data becomes much more laborious and difficult to
generalize. For example, the table in Figure 6, tblOrder1, used to store order records for
a hardware store, would violate 1NF:
Figure 6. tblOrder1 violates First Normal Form because the data stored in the Items
column is not atomic.
You'd have a difficult time retrieving information from this table, because too much
information is being stored in the Items field. Think how difficult it would be to create a
report that summarized purchases by item.
1NF also prohibits the presence of repeating groups, even if they are stored in composite
(multiple) columns. For example, the same table might be improved upon by replacing
the single Items column with six columns: Quant1, Item1, Quant2, Item2, Quant3,
Item3 (see Figure 7).
Figure 7. A better, but still flawed, version of the Orders table, tblOrder2. The repeating
groups of information violate First Normal Form.
While this design has divided the information into multiple fields, it's still problematic.
For example, how would you go about determining the quantity of hammers ordered by
all customers during a particular month? Any query would have to search all three Item
columns to determine if a hammer was purchased and then sum over the three quantity
columns. Even worse, what if a customer ordered more than three items in a single
order? You could always add additional columns, but where would you stop? Ten items,
twenty items? Say that you decided that a customer would never order more than
twenty-five items in any one order and designed the table accordingly. That means you
would be using 50 columns to store the item and quantity information per record, even
for orders that only involved one or two items. Clearly this is a waste of space. And
someday, someone would want to order more than 25 items.
Tables in 1NF do not have the problems of tables containing repeating groups. The table
in Figure 8, tblOrder3, is 1NF since each column contains one value and there are no
repeating groups of columns. In order to attain 1NF, I have added a column,
OrderItem#. The primary key of this table is a composite key made up of OrderId and
OrderItem#.
Figure 8. The tblOrder3 table is in First Normal Form.

You could now easily construct a query to calculate the number of hammers ordered.
The query in Figure 9 is an example of such a query.
Figure 9. Since tblOrder3 is in First Normal Form, you can easily construct a Totals
query to determine the total number of hammers ordered by customers.
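Since the figures are not reproduced here, the following sketch shows what such a totals query might look like, using Python's built-in sqlite3 module in place of Microsoft Access. Only OrderId and OrderItem# are named in the text; the other columns (CustomerId, Quantity, Item) and all sample rows are assumptions made for illustration, and OrderItemNo stands in for OrderItem#.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblOrder3 (
        OrderId     INTEGER NOT NULL,
        OrderItemNo INTEGER NOT NULL,   -- stands in for OrderItem#
        CustomerId  INTEGER NOT NULL,   -- assumed column
        Quantity    INTEGER NOT NULL,   -- assumed column
        Item        TEXT    NOT NULL,   -- assumed column
        PRIMARY KEY (OrderId, OrderItemNo)
    )
""")

# Hypothetical sample rows: one atomic item per row.
conn.executemany(
    "INSERT INTO tblOrder3 VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1, 4, 5, "hammer"),
        (1, 2, 4, 3, "screwdriver"),
        (2, 1, 23, 1, "hammer"),
        (3, 1, 15, 2, "wrench"),
    ],
)

# One condition on one column is enough, however many items an order contains.
total_hammers = conn.execute(
    "SELECT SUM(Quantity) FROM tblOrder3 WHERE Item = ?", ("hammer",)
).fetchone()[0]
print(total_hammers)
```

Compare this with the repeating-group design, where the same question would need a condition on every Item column and a sum over every Quant column.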
3.5 Second Normal Form

Second Normal Form
A table is said to be in Second Normal Form (2NF), if it is in 1NF and every
non-key column is fully dependent on the (entire) primary key.
Second Normal Form (2NF): A relation that is in First Normal Form and every
non-primary-key attribute is fully functionally dependent on the primary key.
Put another way, tables should only store data relating to one "thing" (or entity) and
that entity should be described by its primary key.
The table shown in Figure 10, tblOrder4, is a slightly modified version of tblOrder3. Like
tblOrder3, tblOrder4 is in First Normal Form. Each column is atomic, and there are no
repeating groups.
Figure 10. The tblOrder4 table is in First Normal Form. Its primary key is a composite of
OrderId and OrderItem#.
To determine if tblOrder4 meets 2NF, you must first note its primary key. The primary
key is a composite of OrderId and OrderItem#. Thus, in order to be 2NF, each non-key
column (i.e., every column other than OrderId and OrderItem#) must be fully
dependent on the primary key. In other words, does the value of OrderId and
OrderItem# for a given record imply the value of every other column in the table? The
answer is no. Given the OrderId, you know the customer and date of the order, without
having to know the OrderItem#. Thus, these two columns are not dependent on the
entire primary key which is composed of both OrderId and OrderItem#. For this reason
tblOrder4 is not 2NF.
You can achieve Second Normal Form by breaking tblOrder4 into two tables. The
process of breaking a non-normalized table into its normalized parts is called
decomposition. Since tblOrder4 has a composite primary key, the decomposition
process is straightforward. Simply put everything that applies to each order in one table
and everything that applies to each order item in a second table. The two decomposed
tables, tblOrder and tblOrderDetail, are shown in Figure 11.
Figure 11. The tblOrder and tblOrderDetail tables satisfy Second Normal Form. OrderId
is a foreign key in tblOrderDetail that you can use to rejoin the tables.
Two points are worth noting here.
• When normalizing, you don't throw away information. In fact, this form of
decomposition is termed non-loss decomposition because no information is
sacrificed to the normalization process.
• You decompose the tables in such a way as to allow them to be put back together again
using queries. Thus, it's important to make sure that tblOrderDetail contains a foreign
key to tblOrder. The foreign key in this case is OrderId, which appears in both tables.
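The decomposition and the rejoin can be sketched with Python's sqlite3 module. The table and column names follow the text where it gives them; OrderItemNo stands in for OrderItem#, and the sample rows and date values are hypothetical.

```python
import sqlite3

# Hypothetical rows in the shape of tblOrder4:
# (OrderId, OrderItemNo, CustomerId, OrderDate, Quantity, ProductId, ProductDescription)
tbl_order4 = [
    (1, 1, 4, "1994-05-01", 5, 32, "hammer"),
    (1, 2, 4, "1994-05-01", 3, 2, "screwdriver"),
    (2, 1, 23, "1994-05-08", 1, 32, "hammer"),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblOrder (
        OrderId    INTEGER PRIMARY KEY,
        CustomerId INTEGER NOT NULL,
        OrderDate  TEXT    NOT NULL
    );
    CREATE TABLE tblOrderDetail (
        OrderId            INTEGER NOT NULL REFERENCES tblOrder(OrderId),
        OrderItemNo        INTEGER NOT NULL,
        Quantity           INTEGER NOT NULL,
        ProductId          INTEGER NOT NULL,
        ProductDescription TEXT    NOT NULL,
        PRIMARY KEY (OrderId, OrderItemNo)
    );
""")

for order_id, item_no, customer, date, qty, product, description in tbl_order4:
    # INSERT OR IGNORE keeps one row per order even though the flat table repeats it.
    conn.execute("INSERT OR IGNORE INTO tblOrder VALUES (?, ?, ?)",
                 (order_id, customer, date))
    conn.execute("INSERT INTO tblOrderDetail VALUES (?, ?, ?, ?, ?)",
                 (order_id, item_no, qty, product, description))

# Put the two tables back together through the OrderId foreign key.
rejoined = conn.execute("""
    SELECT d.OrderId, d.OrderItemNo, o.CustomerId, o.OrderDate,
           d.Quantity, d.ProductId, d.ProductDescription
    FROM tblOrderDetail AS d
    JOIN tblOrder AS o ON o.OrderId = d.OrderId
    ORDER BY d.OrderId, d.OrderItemNo
""").fetchall()

order_count = conn.execute("SELECT COUNT(*) FROM tblOrder").fetchone()[0]
print(rejoined == tbl_order4, order_count)
```

The customer and date of each order are now stored once per order, yet the join reproduces every original row, which is what non-loss decomposition means.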
3.6 Third Normal Form
Third Normal Form
A table is said to be in Third Normal Form (3NF), if it is in 2NF and if all
non-key columns are mutually independent.
An obvious example of a dependency is a calculated column. For example, if a table
contains the columns Quantity and PerItemCost, you could opt to calculate and store in
that same table a TotalCost column (which would be equal to Quantity*PerItemCost),
but this table wouldn't be 3NF. It's better to leave this column out of the table and make
the calculation in a query or on a form or a report instead. This saves room in the
database and avoids having to update TotalCost, every time Quantity or PerItemCost
changes.
Dependencies that aren't the result of calculations can also exist in a table. The
tblOrderDetail table from Figure 11, for example, is in 2NF because all of its non-key
columns (Quantity, ProductId and ProductDescription) are fully dependent on the
primary key. That is, given an OrderId and an OrderItem#, you know the values of
Quantity, ProductId and ProductDescription. Unfortunately, tblOrderDetail also
contains a dependency among two of its non-key columns, ProductId and
ProductDescription.
Dependencies cause problems when you add, update, or delete records. For example,
say you need to add 100 detail records, each of which involves the purchase of
screwdrivers. This means you would have to input a ProductId code of 2 and a
ProductDescription of "screwdriver" for each of these 100 records. Clearly this is
redundant. Similarly, if you decide to change the description of the item to "No. 2
Phillips-head screwdriver" at some later time, you will have to update all 100 records.
Another problem arises when you wish to delete all of the 1994 screwdriver purchase
records at the end of the year. Once all of the records are deleted, you will no longer
know what ProductId of 2 is, since you've deleted from the database both the history of
purchases and the fact that ProductId 2 means "No. 2 Phillips-head screwdriver." You
can remedy each of these anomalies by further normalizing the database to achieve
Third Normal Form.
Note: An Anomaly is simply an error or inconsistency in the database. A poorly
designed database runs the risk of introducing numerous anomalies. There are three
types of anomalies:
• Insertion: an anomaly that occurs during the insertion of a record. For example, the
insertion of a new row causes a calculated total field stored in another table to
report the wrong total.
• Deletion: an anomaly that occurs during the deletion of a record. For example, the
deletion of a row in the database deletes more information than you wished to
delete.
• Update: an anomaly that occurs during the updating of a record. For example,
updating a description column for a single part in an inventory database requires
you to make a change to thousands of rows.
The tblOrderDetail table can be further decomposed to achieve 3NF by breaking out the
ProductId—ProductDescription dependency into a lookup table as shown in Figure 12.
This gives you a new order detail table, tblOrderDetail1 and a lookup table, tblProduct.
When decomposing tblOrderDetail, take care to put a copy of the linking column, in this
case ProductId, in both tables. ProductId becomes the primary key of the new table,
tblProduct, and becomes a foreign key column in tblOrderDetail1. This allows you to
easily join together the two tables using a query.
Figure 12. The tblOrderDetail1 and tblProduct tables are in Third Normal Form. The
ProductId column in tblOrderDetail1 is a foreign key referencing tblProduct.
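The benefit of the lookup table can be seen in a short sqlite3 sketch. The table names come from the text; OrderItemNo stands in for OrderItem#, and the 100 detail rows are generated purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblProduct (
        ProductId          INTEGER PRIMARY KEY,
        ProductDescription TEXT NOT NULL
    );
    CREATE TABLE tblOrderDetail1 (
        OrderId     INTEGER NOT NULL,
        OrderItemNo INTEGER NOT NULL,
        Quantity    INTEGER NOT NULL,
        ProductId   INTEGER NOT NULL REFERENCES tblProduct(ProductId),
        PRIMARY KEY (OrderId, OrderItemNo)
    );
    INSERT INTO tblProduct VALUES (2, 'screwdriver');
""")

# 100 hypothetical detail rows that all refer to product 2 by its id only.
conn.executemany(
    "INSERT INTO tblOrderDetail1 VALUES (?, 1, 1, 2)",
    [(order_id,) for order_id in range(1, 101)],
)

# Update anomaly removed: the description changes in exactly one row.
cursor = conn.execute(
    "UPDATE tblProduct SET ProductDescription = ? WHERE ProductId = 2",
    ("No. 2 Phillips-head screwdriver",),
)
rows_changed = cursor.rowcount

descriptions_seen = conn.execute("""
    SELECT DISTINCT p.ProductDescription
    FROM tblOrderDetail1 AS d
    JOIN tblProduct AS p ON p.ProductId = d.ProductId
""").fetchall()

# Deletion anomaly removed: deleting every purchase keeps the product definition.
conn.execute("DELETE FROM tblOrderDetail1")
description_after_delete = conn.execute(
    "SELECT ProductDescription FROM tblProduct WHERE ProductId = 2"
).fetchone()[0]

print(rows_changed, descriptions_seen, description_after_delete)
```

One updated row is enough for all 100 detail records to show the new description, and the meaning of ProductId 2 survives the deletion of every purchase record.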
Higher Normal Forms
After Codd defined the original set of normal forms it was discovered that Third Normal
Form, as originally defined, had certain inadequacies. This led to several higher normal
forms, including the Boyce/Codd, Fourth and Fifth Normal Forms. I will not be covering
these higher normal forms, instead, several points are worth noting here:
• Every higher normal form includes all the requirements of the lower forms. Thus, if your
design is in Third Normal Form, by definition it is also in 1NF and 2NF.
• If you've normalized your database to 3NF, you've likely also achieved Boyce/Codd
Normal Form (and maybe even 4NF or 5NF).
• To quote C.J. Date, the principles of database design are "nothing more than
formalized common sense."
• Database design is more art than science.
This last item needs to be emphasized. While it's relatively easy to work through the
examples in this article, the process gets more difficult when you are presented with a
business problem (or another scenario) that needs to be computerized (or downsized). I
have outlined an approach to take later in this article, but first the subject of integrity
rules will be discussed.
Integrity Rules
The relational model defines several integrity rules that, while not part of the definition
of the Normal Forms are nonetheless a necessary part of any relational database. There
are two types of integrity rules: general and database-specific.
General Integrity Rules
The relational model specifies two general integrity rules. They are referred to as general
rules, because they apply to all databases. They are: entity integrity and referential
integrity.
The entity integrity rule is very simple. It says that primary keys cannot contain null
(missing) data. The reason for this rule should be obvious. You can't uniquely identify or
reference a row in a table, if the primary key of that table can be null. It's important to
note that this rule applies to both simple and composite keys. For composite keys, none
of the individual columns can be null. Fortunately, Microsoft Access automatically
enforces the entity integrity rule for you. No component of a primary key in Microsoft
Access can be null.
The referential integrity rule says that the database must not contain any unmatched
foreign key values. This implies that:
• A row may not be added to a table with a foreign key unless the referenced value exists
in the referenced table.
• If the value in a table that's referenced by a foreign key is changed (or the entire row is
deleted), the rows in the table with the foreign key must not be "orphaned."
In general, there are three options available when a referenced primary key value
changes or a row is deleted. The options are:
• Disallow. The change is completely disallowed.
• Cascade. For updates, the change is cascaded to all dependent tables. For deletions,
the rows in all dependent tables are deleted.
• Nullify. For deletions, the dependent foreign key values are set to Null.
Microsoft Access allows you to disallow or cascade referential integrity updates and
deletions using the Edit | Relationships command (see Figure 13). Nullify is not an
option.
Figure 13. Specifying a relationship with referential integrity between the tblCustomer
and tblOrder tables using the Edit | Relationships command. Updates of CustomerId in
tblCustomer will be cascaded to tblOrder. Deletions of rows in tblCustomer will be
disallowed if rows in tblOrder would be orphaned.
Note: When you wish to implement referential integrity in Microsoft Access, you must
perform one additional step outside of the Edit | Relationships dialog: in table design,
you must set the Required property for the foreign key column to Yes. Otherwise,
Microsoft Access will allow your users to enter a Null foreign key value, thus violating
strict referential integrity.
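The same rules can be expressed in SQL. The sketch below uses SQLite through Python's sqlite3 module rather than Microsoft Access, so the syntax is SQLite's: updates are cascaded, deletions are disallowed (RESTRICT), and NOT NULL plays the role of the Required property. The Name column, the helper attempt and the sample values are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
    CREATE TABLE tblCustomer (
        CustomerId INTEGER PRIMARY KEY,
        Name       TEXT NOT NULL
    );
    CREATE TABLE tblOrder (
        OrderId    INTEGER PRIMARY KEY,
        CustomerId INTEGER NOT NULL            -- no Null foreign keys
            REFERENCES tblCustomer(CustomerId)
            ON UPDATE CASCADE                  -- cascade key changes
            ON DELETE RESTRICT                 -- disallow orphaning deletes
    );
    INSERT INTO tblCustomer VALUES (1, 'Ada');
    INSERT INTO tblOrder VALUES (100, 1);
""")


def attempt(sql):
    """Run one statement; return False if an integrity rule blocked it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


orphan_insert_ok = attempt("INSERT INTO tblOrder VALUES (101, 999)")  # no customer 999
delete_ok = attempt("DELETE FROM tblCustomer WHERE CustomerId = 1")   # would orphan order 100
update_ok = attempt("UPDATE tblCustomer SET CustomerId = 7 WHERE CustomerId = 1")
cascaded_id = conn.execute(
    "SELECT CustomerId FROM tblOrder WHERE OrderId = 100"
).fetchone()[0]

print(orphan_insert_ok, delete_ok, update_ok, cascaded_id)
```

The unmatched insert and the orphaning delete are both rejected, while the change of CustomerId is carried over to tblOrder automatically.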
Database-Specific Integrity Rules

All integrity constraints that do not fall under entity integrity or referential integrity are
termed database-specific rules or business rules. These types of rules are specific to each
database and come from the rules of the business being modeled by the database. It is
important to note that the enforcement of business rules is as important as the
enforcement of the general integrity rules discussed in the previous section.
Note: Rules in Microsoft Access 2.0 are now enforced at the engine level, which means
that forms, action queries and table imports can no longer ignore your rules. Because of
this change, however, column rules can no longer reference other columns or use
domain, aggregate, or user-defined functions.
Without the specification and enforcement of business rules, bad data will get in the
database. The old adage, "garbage in, garbage out" applies aptly to the application (or
lack of application) of business rules. For example, a pizza delivery business might have
the following rules that would need to be modeled in the database:
• Order date must always be between the date the business started and the current date.
• Order time and delivery time can be only during business hours.
• Delivery time must be greater than or equal to Order time.
• New orders cannot be created for discontinued menu items.
• Customer zip codes must be within a certain range—the delivery area.
• The quantity ordered can never be less than 1 or greater than 50.
• Non-null discounts can never be less than 1 percent or greater than 30 percent.
Microsoft Access 2.0 supports the specification of validation rules for each column in a
table. For example, the first business rule from the above list has been specified in
Figure 14.
Figure 14. A column validation rule has been created to limit all order dates to some
time between the first operating day of the business (5/3/93) and the current date.
Microsoft Access 2.0 also supports the specification of a global rule that applies to the
entire table. This is useful for creating rules that cross-reference columns as the example
in Figure 15 demonstrates. Unfortunately, you're only allowed to create one global rule
per table, which could make for some awful validation error messages (e.g., "You have
violated one of the following rules: 1. Delivery Date > Order Date. 2. Delivery Time >
Order Time....").
Figure 15. A table validation rule has been created to require that deliveries be made on
or after the date the pizza was ordered.
Although Microsoft Access business-rule support is better than most other desktop
DBMS programs, it is still limited (especially the limitation of one global table rule), so
you will typically build additional business rule logic into applications, usually in the
data entry forms. This logic should be layered on top of any table-based rules and can be
built into the application using combo boxes, list-boxes and option groups that limit
available choices, form-level and field-level validation rules, and event procedures.
These application-based rules, however, should be used only when the table-based rules
cannot do the job. The more you can build business rules in at the table level, the better,
because these rules will always be enforced and will require less maintenance.
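In SQL databases, table-level business rules of this kind are usually written as CHECK constraints. The sketch below expresses some of the pizza delivery rules in SQLite through Python's sqlite3 module. The table name, the column names and the choice to store discounts as fractions are assumptions made for this example. The "not later than the current date" part of the first rule is left out because SQLite does not accept date('now') inside a CHECK constraint.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tblPizzaOrder (
        OrderId      INTEGER PRIMARY KEY,
        OrderDate    TEXT    NOT NULL CHECK (OrderDate >= '1993-05-03'),
        DeliveryDate TEXT    NOT NULL,
        Quantity     INTEGER NOT NULL CHECK (Quantity BETWEEN 1 AND 50),
        Discount     REAL    CHECK (Discount IS NULL OR Discount BETWEEN 0.01 AND 0.30),
        CHECK (DeliveryDate >= OrderDate)   -- rule that cross-references two columns
    )
""")


def try_insert(order_date, delivery_date, quantity, discount):
    """Return True if the row is accepted, False if a CHECK constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO tblPizzaOrder (OrderDate, DeliveryDate, Quantity, Discount) "
            "VALUES (?, ?, ?, ?)",
            (order_date, delivery_date, quantity, discount),
        )
        return True
    except sqlite3.IntegrityError:
        return False


valid = try_insert("1994-02-01", "1994-02-01", 2, None)
too_many = try_insert("1994-02-01", "1994-02-01", 51, None)
before_opening = try_insert("1993-01-01", "1993-01-01", 1, None)
delivered_early = try_insert("1994-02-02", "1994-02-01", 1, 0.10)
bad_discount = try_insert("1994-02-01", "1994-02-01", 1, 0.50)

print(valid, too_many, before_opening, delivered_early, bad_discount)
```

Only the first row is accepted. Because the rules live in the table definition, they apply to every program that writes to the table, which is the advantage of table-based rules described above.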
3.7 Boyce – Codd Normal Form
Boyce-Codd Normal Form (BCNF)

A table in BCNF is already in 3NF. A relation is in BCNF if, and only if, every determinant
is a candidate key, where a determinant is an attribute (or set of attributes) on which some
other attribute is fully functionally dependent.
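The BCNF test can be sketched in code. The following Python sketch is illustrative only: the function names closure and is_bcnf and the sample relation R(student, course, teacher) are assumptions made for this example. It uses the usual textbook formulation of the rule, namely that the left-hand side of every non-trivial functional dependency must determine all attributes of the relation.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_bcnf(attributes, fds):
    """True if the determinant of every non-trivial dependency is a superkey."""
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != set(attributes):
            return False
    return True

# Hypothetical relation: each teacher teaches one course,
# and a student has one teacher per course.
R = {"student", "course", "teacher"}
fds = [
    ({"student", "course"}, {"teacher"}),
    ({"teacher"}, {"course"}),
]
whole_relation_ok = is_bcnf(R, fds)

# Decomposing on teacher -> course gives two relations that pass the test.
teacher_course_ok = is_bcnf({"teacher", "course"}, [({"teacher"}, {"course"})])
student_teacher_ok = is_bcnf({"student", "teacher"}, [])
print(whole_relation_ok, teacher_course_ok, student_teacher_ok)
```

In the sample relation, teacher determines course but is not a key, so the relation fails the test until it is split.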
3.8 4NF
Fourth Normal Form (4NF)

A table must not contain non-trivial multi-valued dependencies, other than those whose
determinant is a candidate key.
Fifth Normal Form (5NF)

A table is in 5NF if it is in 4NF and every join dependency in it is implied by its
candidate keys.
This is a highly simplified explanation of database normalization, and the process can be
studied in much greater depth. After working with databases for some time, you will find
yourself creating normalized designs automatically, because normalization is both logical
and practical.
UNIT 4: APPLYING THE PRINCIPLES OF STRUCTURED QUERY
LANGUAGE (SQL)
4.1 Introduction to SQL
SQL is a standard language for accessing and manipulating databases.
What is SQL?
• SQL stands for Structured Query Language
• SQL lets you access and manipulate databases
• SQL is an ANSI (American National Standards Institute) standard
What Can SQL do?
• SQL can execute queries against a database
• SQL can retrieve data from a database
• SQL can insert records in a database
• SQL can update records in a database
• SQL can delete records from a database
• SQL can create new databases
• SQL can create new tables in a database
• SQL can create stored procedures in a database
• SQL can create views in a database
• SQL can set permissions on tables, procedures, and views
SQL is a Standard - BUT....
Although SQL is an ANSI (American National Standards Institute) standard, there are
different versions of the SQL language.
However, to be compliant with the ANSI standard, they all support at least the major
commands (such as SELECT, UPDATE, DELETE, and INSERT) and clauses (such as WHERE) in a
similar manner.
Note: Most of the SQL database programs also have their own proprietary extensions in
addition to the SQL standard!
Using SQL in Your Web Site
To build a web site that shows data from a database, you will need:
• An RDBMS database program (e.g. MS Access, SQL Server, MySQL)
• A server-side scripting language, like PHP or ASP
• SQL, to get the data you want
• HTML / CSS, to style the page
RDBMS
RDBMS stands for Relational Database Management System.
RDBMS is the basis for SQL, and for all modern database systems such as MS SQL
Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
The data in RDBMS is stored in database objects called tables. A table is a collection of
related data entries and it consists of columns and rows.
Look at the "Customers" table:
Example
SELECT * FROM Customers;
Every table is broken up into smaller entities called fields. The fields in the Customers
table consist of CustomerID, CustomerName, ContactName, Address, City, PostalCode
and Country. A field is a column in a table that is designed to maintain specific
information about every record in the table.
A record, also called a row, is each individual entry that exists in a table. For example,
there are 91 records in the above Customers table. A record is a horizontal entity in a
table.
A column is a vertical entity in a table that contains all information associated with a
specific field in a table.
The main characteristics of SQL are:

■ It’s relatively easy to learn.
■ It’s a non-procedural language: you specify what information you require,
rather than how to get it. In other words, SQL does not require you to specify
the access methods to the data.
■ Like most modern languages, SQL is essentially free-format, which means that
parts of statements don’t have to be typed at particular locations on the screen.
■ The command structure consists of standard English words such as SELECT,
INSERT, UPDATE, and DELETE.
■ It can be used by a range of users, including Database Administrators (DBAs),
management personnel, application programmers, and many other types of
end-users.
SQL is an important language for a number of reasons:
■ SQL is the first and, so far, only standard database language to gain wide
acceptance. Nearly every major current vendor provides database products
based on SQL or with an SQL interface, and most are represented on at least
one of the standard-making bodies.
■ There is a huge investment in the SQL language both by vendors and by users. It
has become part of application architectures such as IBM’s Systems Application
Architecture (SAA), and is the strategic choice of many large and influential
organizations, for example the X/OPEN consortium for UNIX standards.
■ SQL has also become a Federal Information Processing Standard (FIPS), to
which conformance is required for all sales of DBMSs to the US government.
■ SQL is used in other standards, and even influences the development of
other standards as a definitional tool (for example, the ISO Remote Data Access (RDA)
standard).
SQL Syntax
Properly defining the fields in a table is important to the overall optimization of your
database. You should use only the type and size of field you really need to use. For
example, do not define a field 10 characters wide if you know you are only going to use
2 characters. The kind of value a field (or column) can hold is called its data type,
after the type of data you will be storing in that field.
MySQL uses many different data types broken into three categories −
• Numeric
• Date and Time
• String Types
Let us now discuss them in detail.
Numeric Data Types

MySQL uses all the standard ANSI SQL numeric data types, so if you're coming to
MySQL from a different database system, these definitions will look familiar to you.
The following list shows the common numeric data types and their descriptions −
• INT − A normal-sized integer that can be signed or unsigned. If signed, the allowable range
is from -2147483648 to 2147483647. If unsigned, the allowable range is from 0 to
4294967295. You can specify a display width of up to 11 digits.
• TINYINT − A very small integer that can be signed or unsigned. If signed, the allowable
range is from -128 to 127. If unsigned, the allowable range is from 0 to 255. You can specify
a display width of up to 4 digits.
• SMALLINT − A small integer that can be signed or unsigned. If signed, the allowable range
is from -32768 to 32767. If unsigned, the allowable range is from 0 to 65535. You can
specify a display width of up to 5 digits.
• MEDIUMINT − A medium-sized integer that can be signed or unsigned. If signed, the
allowable range is from -8388608 to 8388607. If unsigned, the allowable range is from 0 to
16777215. You can specify a display width of up to 9 digits.
• BIGINT − A large integer that can be signed or unsigned. If signed, the allowable range is
from -9223372036854775808 to 9223372036854775807. If unsigned, the allowable range
is from 0 to 18446744073709551615. You can specify a display width of up to 20 digits.
• FLOAT(M,D) − A single-precision (4-byte) floating-point number, which is an approximate
type. You can define the total number of digits (M) and the number of decimals (D), but
neither is required; if they are omitted, values are stored to the limits of the hardware.
In the alternative form FLOAT(p), a precision p from 0 to 23 gives a single-precision column.
• DOUBLE(M,D) − A double-precision (8-byte) floating-point number, also an approximate type.
You can define the total number of digits (M) and the number of decimals (D), but neither
is required. FLOAT(p) with a precision p from 24 to 53 gives a double-precision column.
REAL is a synonym for DOUBLE.
• DECIMAL(M,D) − An exact fixed-point number, suited to values such as money where
rounding errors are unacceptable. M is the total number of digits and D is the number of
digits after the decimal point; if omitted, they default to 10 and 0. NUMERIC is a synonym
for DECIMAL.
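The integer ranges listed above follow directly from the storage size of each type. As a worked illustration, the Python sketch below recomputes them from the number of bytes MySQL uses for each integer type (1, 2, 3, 4 and 8); the helper name integer_range is an assumption made for this example.

```python
# Storage sizes in bytes for the MySQL integer types.
INTEGER_BYTES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def integer_range(type_name, unsigned=False):
    """Return (minimum, maximum) for a MySQL integer type."""
    bits = 8 * INTEGER_BYTES[type_name]
    if unsigned:
        return 0, 2**bits - 1
    # One bit is used for the sign, so half the values are negative.
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1

for name in INTEGER_BYTES:
    print(name, integer_range(name), integer_range(name, unsigned=True))
```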
Date and Time Types

The MySQL date and time datatypes are as follows −
• DATE − A date in YYYY-MM-DD format, between 1000-01-01 and 9999-12-31. For
example, December 30th, 1973 would be stored as 1973-12-30.
• DATETIME − A date and time combination in YYYY-MM-DD HH:MM:SS format, between
1000-01-01 00:00:00 and 9999-12-31 23:59:59. For example, 3:30 in the afternoon on
December 30th, 1973 would be stored as 1973-12-30 15:30:00.
• TIMESTAMP − A timestamp between 1970-01-01 00:00:01 UTC and 2038-01-19 03:14:07 UTC.
It is displayed in the same YYYY-MM-DD HH:MM:SS format as DATETIME. Very old MySQL
versions displayed it without separators, so 3:30 in the afternoon on December 30th, 1973
appeared as 19731230153000 ( YYYYMMDDHHMMSS ).
• TIME − Stores the time in HH:MM:SS format.
• YEAR(M) − Stores a year in a 2-digit or a 4-digit format. If the length is specified as 2 (for
example YEAR(2)), YEAR can be between 1970 and 2069 (70 to 69). If the length is specified
as 4, then YEAR can be 1901 to 2155. The default length is 4, and recent MySQL versions
support only the 4-digit form.
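To make the format patterns concrete, the sketch below uses Python's standard datetime module (not MySQL itself) to render the example moment used above, 3:30 in the afternoon on December 30th, 1973, in each of the layouts described. The variable names are chosen for this example only.

```python
from datetime import datetime

moment = datetime(1973, 12, 30, 15, 30, 0)   # 3:30 pm on December 30th, 1973

as_date = moment.strftime("%Y-%m-%d")               # DATE layout
as_datetime = moment.strftime("%Y-%m-%d %H:%M:%S")  # DATETIME layout
as_time = moment.strftime("%H:%M:%S")               # TIME layout
as_compact = moment.strftime("%Y%m%d%H%M%S")        # YYYYMMDDHHMMSS layout

print(as_date, as_datetime, as_time, as_compact)
```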
String Types
Most of the data you store will be in a string format. This list describes the common
string datatypes in MySQL.
• CHAR(M) − A fixed-length string between 0 and 255 characters in length (for example
CHAR(5)), right-padded with spaces to the specified length when stored. Defining a length
is not required; the default is 1.
• VARCHAR(M) − A variable-length string. For example, VARCHAR(25). You must define a
maximum length when creating a VARCHAR field. Older MySQL versions limit it to 255
characters; MySQL 5.0.3 and later allow up to 65535 bytes, subject to the maximum row size.
• BLOB or TEXT − A field with a maximum length of 65535 bytes. BLOBs are "Binary
Large Objects" and are used to store large amounts of binary data, such as images or other
types of files. Fields defined as TEXT also hold large amounts of data. The difference
between the two is that BLOBs hold binary strings, so sorts and comparisons on them are
case sensitive, whereas TEXT fields hold character strings and are compared according to
their collation, which is not case sensitive by default. You do not specify a length with
BLOB or TEXT.
• TINYBLOB or TINYTEXT − A BLOB or TEXT column with a maximum length of 255
bytes. You do not specify a length with TINYBLOB or TINYTEXT.
• MEDIUMBLOB or MEDIUMTEXT − A BLOB or TEXT column with a maximum length
of 16777215 bytes. You do not specify a length with MEDIUMBLOB or MEDIUMTEXT.
• LONGBLOB or LONGTEXT − A BLOB or TEXT column with a maximum length of
4294967295 bytes. You do not specify a length with LONGBLOB or LONGTEXT.
• ENUM − An enumeration, that is, a list. When defining an ENUM, you are
creating a list of items from which the value must be selected (or it can be NULL). For
example, if you wanted your field to contain "A" or "B" or "C", you would define your ENUM
as ENUM ('A', 'B', 'C') and only those values (or NULL) could ever populate that field.
Database Tables
A database most often contains one or more tables. Each table is identified by a name (e.g.
"Customers" or "Orders"). Tables contain records (rows) with data.
In this tutorial we will use the well-known Northwind sample database (included in MS
Access and MS SQL Server).
Below is a selection from the "Customers" table:
CustomerID  CustomerName                        ContactName         Address                        City         PostalCode
1           Alfreds Futterkiste                 Maria Anders        Obere Str. 57                  Berlin       12209
2           Ana Trujillo Emparedados y helados  Ana Trujillo        Avda. de la Constitución 2222  México D.F.  05021
3           Antonio Moreno Taquería             Antonio Moreno      Mataderos 2312                 México D.F.  05023
4           Around the Horn                     Thomas Hardy        120 Hanover Sq.                London       WA1 1D
5           Berglunds snabbköp                  Christina Berglund  Berguvsvägen 8                 Luleå        S-958 2
The selection above contains five records (one for each customer). The full table has seven
columns (CustomerID, CustomerName, ContactName, Address, City, PostalCode, and Country);
the Country column is not shown here.
SQL Statements
Most of the actions you need to perform on a database are done with SQL statements.
The following SQL statement selects all the records in the "Customers" table:
Example
SELECT * FROM Customers;
In this tutorial we will teach you all about the different SQL statements.
Keep in Mind That...
• SQL keywords are NOT case sensitive: select is the same as SELECT
In this tutorial we will write all SQL keywords in upper-case.

Semicolon after SQL Statements?
Some database systems require a semicolon at the end of each SQL statement.
Semicolon is the standard way to separate each SQL statement in database systems that
allow more than one SQL statement to be executed in the same call to the server.
In this tutorial, we will use a semicolon at the end of each SQL statement.
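Both points can be tried out in a few lines. The sketch below is an illustration using Python's built-in sqlite3 module with a cut-down, hypothetical Customers table: one call sends three statements separated by semicolons, and the same query is then run in upper-case and lower-case keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# executescript() accepts several statements in one call;
# the semicolons mark where each statement ends.
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER, City TEXT);
    INSERT INTO Customers VALUES (1, 'Berlin');
    insert into Customers values (2, 'London');
""")

upper = conn.execute("SELECT City FROM Customers ORDER BY CustomerID").fetchall()
lower = conn.execute("select City from Customers order by CustomerID").fetchall()
print(upper == lower, upper)
```

Note that only the keywords are case-insensitive here; the string values 'Berlin' and 'London' are stored exactly as typed.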
Some of The Most Important SQL Commands
• SELECT - extracts data from a database
• UPDATE - updates data in a database
• DELETE - deletes data from a database
• INSERT INTO - inserts new data into a database
• CREATE DATABASE - creates a new database
• ALTER DATABASE - modifies a database
• CREATE TABLE - creates a new table
• ALTER TABLE - modifies a table
• DROP TABLE - deletes a table
• CREATE INDEX - creates an index (search key)
• DROP INDEX - deletes an index
SQL SELECT Statement
The SQL SELECT Statement

The SELECT statement is used to select data from a database.
The data returned is stored in a result table, called the result-set.
SELECT Syntax
SELECT column1, column2, ...
FROM table_name;
Here, column1, column2, ... are the field names of the table you want to select data from. If
you want to select all the fields available in the table, use the following syntax:
SELECT * FROM table_name;
Demo Database
Below is a selection from the "Customers" table in the Northwind sample database:
CustomerID  CustomerName                        ContactName         Address                        City         PostalCode
2           Ana Trujillo Emparedados y helados  Ana Trujillo        Avda. de la Constitución 2222  México D.F.  05021
3           Antonio Moreno Taquería             Antonio Moreno      Mataderos 2312                 México D.F.  05023
4           Around the Horn                     Thomas Hardy        120 Hanover Sq.                London       WA1 1D
5           Berglunds snabbköp                  Christina Berglund  Berguvsvägen 8                 Luleå        S-958 2
SELECT Column Example

The following SQL statement selects the "CustomerName" and "City" columns from the
"Customers" table:
Example
SELECT CustomerName, City FROM Customers;
SELECT * Example
The following SQL statement selects all the columns from the "Customers" table:
Example
SELECT * FROM Customers;
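The difference between naming columns and using * can be seen by running both forms against the same data. The following is a small sketch using Python's built-in sqlite3 module; the in-memory Customers table holds only four of the real columns and two sample rows, which is an assumption made to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT, "
    "ContactName TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?, ?)",
    [
        (1, "Alfreds Futterkiste", "Maria Anders", "Berlin"),
        (4, "Around the Horn", "Thomas Hardy", "London"),
    ],
)

# Named columns: each result row contains only those fields, in that order.
some_columns = conn.execute(
    "SELECT CustomerName, City FROM Customers ORDER BY CustomerID").fetchall()
# Asterisk: each result row contains every field of the table.
all_columns = conn.execute(
    "SELECT * FROM Customers ORDER BY CustomerID").fetchall()

print(some_columns)
print(all_columns)
```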
4.1.1 Data definition commands

Data Definition Language (DDL) is a unique set of SQL commands that lets
you manipulate the structure of the database. In this lesson, we'll learn
these commands and see them in action.
Tweaking the Structure
It might sound like its own programming language, but Data Definition
Language (DDL) is really a way to categorize certain SQL commands. These
are commands that are used to modify the structure of a database, rather
than the data stored in it (the commands that work on the data are
categorized as Data Manipulation Language, or DML).
We'll take a look at some of the major commands in DDL. These
include CREATE, DROP, and ALTER.
The CREATE Command
The CREATE statement is used to create a table. Remember that we are
dealing with the structure of the database, and so will not be inserting any
data into the table; the command simply builds the table for use.
The command requires a table name and at least one column with its
corresponding data type (e.g. text, numeric and so on). In SQL Server,
there is an option to specify a primary key and/or require that a field not be
null.
And if we want to see how this would look in practice, let's create a table for
a music database. We'll create an artist table:
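A minimal sketch of such a statement is given here, run through Python's built-in sqlite3 module so that it can be tried directly. Apart from artistID, the column names and sizes (artistName, genre, country) are assumptions made for this illustration, and the SQL Server syntax may differ in detail.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Column names other than artistID are assumptions made for this sketch.
conn.execute("""
    CREATE TABLE artist (
        artistID   INTEGER PRIMARY KEY,
        artistName VARCHAR(50) NOT NULL,
        genre      VARCHAR(30),
        country    VARCHAR(50) NOT NULL
    )
""")

# PRAGMA table_info returns (cid, name, type, notnull, default, pk) per column.
columns = conn.execute("PRAGMA table_info(artist)").fetchall()
row_count = conn.execute("SELECT COUNT(*) FROM artist").fetchone()[0]
for col in columns:
    print(col[1], col[2], "NOT NULL" if col[3] else "", "PK" if col[5] else "")
print(row_count)
```

The row count confirms that CREATE only builds the structure; no data has been inserted.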
We've set a primary key (artistID), and ensured that some fields will not be
null/blank.
The DROP Command
The DROP command is used to drop a table from the database. When
dropped, all the data goes with it; however, for this lesson we are only
concerned with tweaking the structure.
The syntax for the command is quite simple, but very powerful!
Now if we want to drop our artist table (maybe we want to start over with a
new design), the following statements can be used:
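As an illustration, the sketch below issues DROP TABLE artist through Python's built-in sqlite3 module and checks the catalogue before and after. The cut-down artist table and the helper table_exists are assumptions made for this example; sqlite_master is SQLite's own catalogue table, and other systems expose theirs differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artist (artistID INTEGER PRIMARY KEY, artistName TEXT NOT NULL)")
conn.execute("INSERT INTO artist (artistName) VALUES ('Example Artist')")

def table_exists(name):
    """Look the table up in SQLite's catalogue."""
    rows = conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,)).fetchall()
    return rows[0][0] == 1

existed_before = table_exists("artist")
conn.execute("DROP TABLE artist")   # removes the structure and its rows
exists_after = table_exists("artist")
print(existed_before, exists_after)
```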
The ALTER command

The DROP command is quite extreme, as it completely wipes out the table
and any data in it; when we are first building the structure of the database
through DDL commands, this is not necessarily bad. However, once data
exists in the table(s) of our database, modifying the structure is easier
through other means, such as ALTER. ALTER is used to add, change, or
remove columns/fields in the table. It can also be used to rename the table.
Let's break this one down a little and look at each option: adding a
column(s), modifying column(s), removing columns, and renaming.
Add Column(s)
In order to add a new column, the ALTER command requires syntax similar
to the CREATE statement. The table name is required and so are the
column names/definitions.
We'll first see how we might add some new columns to our artist database.
In this example, we'll add a sub-genre (think Folk/Rock) and a state code
(we added country, so let's add state/province):
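One possible form of these statements is sketched below using Python's built-in sqlite3 module. The column names subGenre and stateCode and their types are assumptions made for this illustration. SQLite adds one column per ALTER TABLE statement, whereas some other systems accept several columns in a single statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE artist (artistID INTEGER PRIMARY KEY, "
    "artistName TEXT NOT NULL, country TEXT)")
conn.execute(
    "INSERT INTO artist (artistName, country) VALUES ('Example Artist', 'Sweden')")

# SQLite adds one column per ALTER TABLE statement.
conn.execute("ALTER TABLE artist ADD COLUMN subGenre VARCHAR(30)")
conn.execute("ALTER TABLE artist ADD COLUMN stateCode CHAR(2)")

column_names = [row[1] for row in conn.execute("PRAGMA table_info(artist)").fetchall()]
existing_row = conn.execute(
    "SELECT artistName, subGenre, stateCode FROM artist").fetchall()
print(column_names)
print(existing_row)
```

Rows that were already in the table keep their data and receive NULL in the new columns.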
Modify a Column
Here, we will alter the artist table to increase the size of the artist's name,
and require a genre. The command requires the table name, the column
name(s), and column type.
We've now successfully updated the size of the columns and ensured that
the genre column is required.
Remove a Column
Like DROP TABLE, dropping a column/field will remove the data from that
field! Right now we are only concerned with the structure, but it is important
to remember to proceed with caution if you are dropping columns from a
live database! In the following example, we will remove the sub-genre we
created earlier:
SQL, 'Structured Query Language', is a programming language designed to
manage data stored in relational databases. SQL operates through simple,
declarative statements. This keeps data accurate and secure, and helps
maintain the integrity of databases, regardless of size.
Here's an appendix of commonly used commands.
SQL CREATE DATABASE Statement
The SQL CREATE DATABASE Statement
The CREATE DATABASE statement is used to create a new SQL database.
Syntax
CREATE DATABASE databasename;
CREATE DATABASE Example

The following SQL statement creates a database called "testDB":
Example
CREATE DATABASE testDB;
SQL DROP DATABASE Statement
The SQL DROP DATABASE Statement

The DROP DATABASE statement is used to drop an existing SQL database.
Syntax
DROP DATABASE databasename;
Note: Be careful before dropping a database. Deleting a database will result in
the loss of all information stored in the database!
DROP DATABASE Example

The following SQL statement drops the existing database "testDB":
Example
DROP DATABASE testDB;
Tip: Make sure you have admin privilege before dropping any database. Once a
database is dropped, you can check it in the list of databases with the following
SQL command: SHOW DATABASES;
SQL CREATE TABLE Statement
The SQL CREATE TABLE Statement

The CREATE TABLE statement is used to create a new table in a database.
Syntax
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype,
....
);
The column parameters specify the names of the columns of the table.
The datatype parameter specifies the type of data the column can hold (e.g.
varchar, integer, date, etc.).
SQL CREATE TABLE Example

The following example creates a table called "Persons" that contains five
columns: PersonID, LastName, FirstName, Address, and City:
Example
CREATE TABLE Persons (
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);
The PersonID column is of type int and will hold an integer.
The LastName, FirstName, Address, and City columns are of type varchar and
will hold characters, and the maximum length for these fields is 255 characters.
The empty "Persons" table will now look like this:
PersonID LastName FirstName Address City

Tip: The empty "Persons" table can now be filled with data with the
SQL INSERT INTO statement.
Create Table Using Another Table

A copy of an existing table can be created using a combination of the CREATE
TABLE statement and the SELECT statement.
The new table gets the same column definitions. All columns or specific columns
can be selected.
If you create a new table using an existing table, the new table will be filled
with the existing values from the old table.
Syntax
CREATE TABLE new_table_name AS
SELECT column1, column2,...
FROM existing_table_name
WHERE ....;
CREATE TABLE AS Statement

Description
You can also use the SQL CREATE TABLE AS statement to create a table from an existing
table by copying the existing table's columns.
It is important to note that when creating a table in this way, the new table will be populated with
the records from the existing table (based on the SELECT Statement).
Create Table - By Copying all columns from another table
Syntax
The syntax for the CREATE TABLE AS statement when copying all of the columns in SQL is:
CREATE TABLE new_table
AS (SELECT * FROM old_table);
Example
Let's look at an example that shows how to create a table by copying all columns from another
table.
For Example:
CREATE TABLE suppliers
AS (SELECT *
FROM companies
WHERE id > 1000);
This would create a new table called suppliers that included all columns from
the companies table.
If there were records in the companies table, then the new suppliers table would also contain the
records selected by the SELECT statement.
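The behaviour described above can be checked with a small sketch using Python's built-in sqlite3 module. The three sample rows in companies are invented for this illustration. In SQLite the copy receives the column names and data only; constraints such as PRIMARY KEY are not carried over, and other systems have their own rules on this point.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER, address TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO companies VALUES (?, ?, ?)",
    [(999, "1 Old Road", "Accra"),
     (1001, "2 New Street", "Kumasi"),
     (1002, "3 Market Lane", "Tamale")],
)

# Copy the columns and only the rows that satisfy the WHERE clause.
conn.execute("CREATE TABLE suppliers AS SELECT * FROM companies WHERE id > 1000")

supplier_columns = [
    r[1] for r in conn.execute("PRAGMA table_info(suppliers)").fetchall()]
supplier_ids = [
    r[0] for r in conn.execute("SELECT id FROM suppliers ORDER BY id").fetchall()]
print(supplier_columns, supplier_ids)
```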
Create Table - By Copying selected columns from another table
Syntax
The syntax for the CREATE TABLE AS statement copying the selected columns is:
CREATE TABLE new_table
AS (SELECT column_1, column2, ... column_n
FROM old_table);
Example
Let's look at an example that shows how to create a table by copying selected columns from
another table.
For Example:
CREATE TABLE suppliers
AS (SELECT id, address, city, state, zip
FROM companies
WHERE id > 1000);
This would create a new table called suppliers, but the new table would only include the
specified columns from the companies table.
Again, if there were records in the companies table, then the new suppliers table would also
contain the records selected by the SELECT statement.
Create Table - By Copying selected columns from multiple tables
Syntax
The syntax for the CREATE TABLE AS statement copying columns from multiple tables is:
CREATE TABLE new_table
AS (SELECT column_1, column2, ... column_n
FROM old_table_1, old_table_2, ... old_table_n);
Example
Let's look at an example that shows how to create a table by copying selected columns from
multiple tables.
For Example:
CREATE TABLE suppliers
AS (SELECT companies.id, companies.address, categories.cat_type
FROM companies, categories
WHERE companies.id = categories.id
AND companies.id > 1000);
SQL DROP TABLE Statement
The SQL DROP TABLE Statement
The DROP TABLE statement is used to drop an existing table in a database.
Syntax
DROP TABLE table_name;
Note: Be careful before dropping a table. Deleting a table will result in loss of
complete information stored in the table!
SQL DROP TABLE Example

The following SQL statement drops the existing table "Shippers":
Example
DROP TABLE Shippers;
SQL TRUNCATE TABLE

The TRUNCATE TABLE statement is used to delete the data inside a table, but
not the table itself.
Syntax
TRUNCATE TABLE table_name;
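The difference between emptying a table and dropping it can be sketched as follows. Note the substitution: this illustration uses Python's built-in sqlite3 module, and SQLite has no TRUNCATE TABLE statement, so a DELETE without a WHERE clause stands in for it here. On systems that support it (for example MySQL or SQL Server), TRUNCATE TABLE Shippers; would be used instead. The sample rows are for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, ShipperName TEXT)")
conn.executemany("INSERT INTO Shippers (ShipperName) VALUES (?)",
                 [("Speedy Express",), ("United Package",)])

rows_before = conn.execute("SELECT COUNT(*) FROM Shippers").fetchall()[0][0]
# Stand-in for TRUNCATE TABLE Shippers; SQLite has no TRUNCATE statement.
conn.execute("DELETE FROM Shippers")
rows_after = conn.execute("SELECT COUNT(*) FROM Shippers").fetchall()[0][0]
still_defined = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'Shippers'"
).fetchall()[0][0] == 1
print(rows_before, rows_after, still_defined)
```

The data is gone, but the table definition remains and new rows can be inserted straight away.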
SQL ALTER TABLE Statement

The ALTER TABLE statement is used to add, delete, or modify columns in an
existing table.
The ALTER TABLE statement is also used to add and drop various constraints on
an existing table.
ALTER TABLE - ADD Column

To add a column in a table, use the following syntax:
ALTER TABLE table_name
ADD column_name datatype;
ALTER TABLE - DROP COLUMN

To delete a column in a table, use the following syntax (notice that some
database systems don't allow deleting a column):
ALTER TABLE table_name
DROP COLUMN column_name;
ALTER TABLE - ALTER/MODIFY COLUMN

To change the data type of a column in a table, use the following syntax:
SQL Server / MS Access:
ALTER TABLE table_name
ALTER COLUMN column_name datatype;
MySQL / Oracle (prior to version 10G):
ALTER TABLE table_name
MODIFY COLUMN column_name datatype;
Oracle 10G and later:
ALTER TABLE table_name
MODIFY column_name datatype;
SQL ALTER TABLE Example

Look at the "Persons" table:
ID  LastName   FirstName  Address       City
1   Hansen     Ola        Timoteivn 10  Sandnes
2   Svendson   Tove       Borgvn 23     Sandnes
3   Pettersen  Kari       Storgt 20     Stavanger
Now we want to add a column named "DateOfBirth" in the "Persons" table.
We use the following SQL statement:
ALTER TABLE Persons
ADD DateOfBirth date;
Notice that the new column, "DateOfBirth", is of type date and is going to hold a
date. The data type specifies what type of data the column can hold.
The "Persons" table will now look like this:
ID  LastName   FirstName  Address       City       DateOfBirth
1   Hansen     Ola        Timoteivn 10  Sandnes
2   Svendson   Tove       Borgvn 23     Sandnes
3   Pettersen  Kari       Storgt 20     Stavanger
Change Data Type Example

Now we want to change the data type of the column named "DateOfBirth" in the
"Persons" table.
ALTER TABLE Persons
ALTER COLUMN DateOfBirth year;
Notice that the "DateOfBirth" column is now of type year and is going to hold a
year in a two- or four-digit format.
DROP COLUMN Example

Next, we want to delete the column named "DateOfBirth" in the "Persons" table.
ALTER TABLE Persons
DROP COLUMN DateOfBirth;
SQL Constraints
SQL constraints are used to specify rules for data in a table.
SQL Create Constraints

Constraints can be specified when the table is created with the CREATE TABLE
statement, or after the table is created with the ALTER TABLE statement.
Syntax
CREATE TABLE table_name (
column1 datatype constraint,
....
);
Constraints are used to limit the type of data that can go into a table. This
ensures the accuracy and reliability of the data in the table. If there is any
violation between the constraint and the data action, the action is aborted.
Constraints can be column level or table level. Column level constraints apply to
a column, and table level constraints apply to the whole table.
The following constraints are commonly used in SQL:
• NOT NULL - Ensures that a column cannot have a NULL value
• UNIQUE - Ensures that all values in a column are different
• PRIMARY KEY - A combination of NOT NULL and UNIQUE. Uniquely
identifies each row in a table
• FOREIGN KEY - Refers to a row/record in another table, so that only values
present in that table can be stored
• CHECK - Ensures that all values in a column satisfy a specific condition
• DEFAULT - Sets a default value for a column when no value is specified
• INDEX - Used to speed up the retrieval of data from the database (strictly
speaking an index is not a constraint, but it is often listed with them)
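Several of these constraints can be seen at work in one short sketch. The following illustration uses Python's built-in sqlite3 module; the Email column, the age limit of 18 and the default city are assumptions made for the example. Each violating statement is aborted, as described above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        ID       INTEGER PRIMARY KEY,
        LastName TEXT NOT NULL,
        Email    TEXT UNIQUE,
        Age      INTEGER CHECK (Age >= 18),
        City     TEXT DEFAULT 'Sandnes'
    )
""")

def rejected(sql):
    """Return True if the statement violates a constraint and is aborted."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

first_ok = not rejected(
    "INSERT INTO Persons (ID, LastName, Email, Age) "
    "VALUES (1, 'Hansen', 'ola@example.com', 30)")
null_name = rejected(
    "INSERT INTO Persons (ID, Email, Age) VALUES (2, 'x@example.com', 40)")
dup_email = rejected(
    "INSERT INTO Persons (ID, LastName, Email, Age) "
    "VALUES (3, 'Svendson', 'ola@example.com', 25)")
too_young = rejected(
    "INSERT INTO Persons (ID, LastName, Age) VALUES (4, 'Pettersen', 12)")
dup_key = rejected(
    "INSERT INTO Persons (ID, LastName, Age) VALUES (1, 'Again', 50)")
default_city = conn.execute(
    "SELECT City FROM Persons WHERE ID = 1").fetchall()[0][0]
print(first_ok, null_name, dup_email, too_young, dup_key, default_city)
```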
COMMANDS
ALTER TABLE
ALTER TABLE table_name ADD column datatype;
ALTER TABLE lets you add columns to a table in a database.
AND
SELECT column_name(s)
FROM table_name
WHERE column_1 = value_1
AND column_2 = value_2;
AND is an operator that combines two conditions. Both conditions must be
true for the row to be included in the result set.
AS
SELECT column_name AS 'Alias'
FROM table_name;
AS is a keyword in SQL that allows you to rename a column or table using
an alias.
AVG
SELECT AVG(column_name)
FROM table_name;
AVG() is an aggregate function that returns the average value for a numeric
column.
BETWEEN
SELECT column_name(s)
FROM table_name
WHERE column_name BETWEEN value_1 AND value_2;
The BETWEEN operator is used to filter the result set within a certain range.
The values can be numbers, text or dates.
COUNT
SELECT COUNT(column_name)
FROM table_name;
COUNT() is a function that takes the name of a column as an argument and
counts the number of rows where the column is not NULL.
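The effect of NULL on COUNT() is easy to overlook, so here is a small sketch using Python's built-in sqlite3 module with three invented rows, one of which has no city. COUNT(City) skips the NULL, while COUNT(*) counts every row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (ID INTEGER, City TEXT)")
conn.executemany("INSERT INTO Persons VALUES (?, ?)",
                 [(1, "Sandnes"), (2, None), (3, "Stavanger")])

# Python's None is stored as SQL NULL.
count_city, count_rows = conn.execute(
    "SELECT COUNT(City), COUNT(*) FROM Persons").fetchone()
print(count_city, count_rows)
```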
CREATE TABLE
CREATE TABLE table_name (column_1 datatype, column_2 datatype, column_3 datatype);
CREATE TABLE creates a new table in the database. It allows you to specify
the name of the table and the name of each column in the table.
DELETE
DELETE FROM table_name WHERE some_column = some_value;
DELETE statements are used to remove rows from a table.
GROUP BY
SELECT COUNT(*)
FROM table_name
GROUP BY column_name;
GROUP BY is a clause in SQL that is typically used with aggregate functions. It is
used together with the SELECT statement to arrange identical data
into groups.
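As an illustration of grouping, the sketch below counts customers per country using Python's built-in sqlite3 module. The table is a cut-down, hypothetical version of Customers with only two columns and four sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerName TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [
    ("Alfreds Futterkiste", "Germany"),
    ("Ana Trujillo Emparedados y helados", "Mexico"),
    ("Antonio Moreno Taquería", "Mexico"),
    ("Around the Horn", "UK"),
])

# One result row per distinct Country, with the size of each group.
per_country = conn.execute(
    "SELECT Country, COUNT(*) FROM Customers GROUP BY Country ORDER BY Country"
).fetchall()
print(per_country)
```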
INNER JOIN
SELECT column_name(s) FROM table_1
JOIN table_2
ON table_1.column_name = table_2.column_name;
An inner join will combine rows from different tables if the join condition is
true.
INSERT
INSERT INTO table_name (column_1, column_2, column_3) VALUES (value_1, 'value_2', value_3);
INSERT statements are used to add a new row to a table.
LIKE
SELECT column_name(s)
FROM table_name
WHERE column_name LIKE pattern;
LIKE is a special operator used with the WHERE clause to search for a specific
pattern in a column.
LIMIT
SELECT column_name(s)
FROM table_name
LIMIT number;
LIMIT is a clause that lets you specify the maximum number of rows the result
set will have.
MAX
SELECT MAX(column_name)
FROM table_name;
MAX() is a function that takes the name of a column as an argument and
returns the largest value in that column.
MIN
SELECT MIN(column_name)
FROM table_name;
MIN() is a function that takes the name of a column as an argument and
returns the smallest value in that column.
OR
SELECT column_name
FROM table_name
WHERE column_name = value_1
OR column_name = value_2;
OR is an operator that filters the result set to only include rows where either
condition is true.
ORDER BY
SELECT column_name
FROM table_name
ORDER BY column_name ASC|DESC;
ORDER BY is a clause that indicates you want to sort the result set by a
particular column either alphabetically or numerically.
OUTER JOIN
SELECT column_name(s) FROM table_1
LEFT JOIN table_2
ON table_1.column_name = table_2.column_name;
An outer join will combine rows from different tables even if the join
condition is not met. Every row in the left table is returned in the result set,
and if the join condition is not met, then NULL values are used to fill in the
columns from the right table.
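The contrast between the two kinds of join shows up as soon as one row has no match. The sketch below uses Python's built-in sqlite3 module with small, invented Persons and Orders tables in which one person has placed no orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Persons (PersonID INTEGER, LastName TEXT);
    CREATE TABLE Orders (OrderID INTEGER, PersonID INTEGER);
    INSERT INTO Persons VALUES (1, 'Hansen'), (2, 'Svendson'), (3, 'Pettersen');
    INSERT INTO Orders VALUES (1, 3), (2, 3), (3, 2);
""")

# Inner join: only persons with at least one matching order appear.
inner_rows = conn.execute("""
    SELECT Persons.LastName, Orders.OrderID
    FROM Persons
    JOIN Orders ON Persons.PersonID = Orders.PersonID
    ORDER BY Persons.PersonID, Orders.OrderID
""").fetchall()

# Left outer join: every person appears; missing orders come back as NULL.
left_rows = conn.execute("""
    SELECT Persons.LastName, Orders.OrderID
    FROM Persons
    LEFT JOIN Orders ON Persons.PersonID = Orders.PersonID
    ORDER BY Persons.PersonID, Orders.OrderID
""").fetchall()

print(inner_rows)
print(left_rows)
```

Hansen has no orders, so that name is absent from the inner join and appears with None (SQL NULL) in the left join.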
ROUND
SELECT ROUND(column_name, integer)
FROM table_name;
ROUND() is a function that takes a column name and an integer as
arguments. It rounds the values in the column to the number of decimal places
specified by the integer.
SELECT
SELECT column_name FROM table_name;
SELECT statements are used to fetch data from a database. Every query will
begin with SELECT.
SELECT DISTINCT
SELECT DISTINCT column_name FROM table_name;
SELECT DISTINCT specifies that the statement is going to be a query that
returns unique values in the specified column(s).
SUM
SELECT SUM(column_name)
FROM table_name;
SUM() is a function that takes the name of a column as an argument and
returns the sum of all the values in that column.
UPDATE
UPDATE table_name
SET some_column = some_value
WHERE some_column = some_value;
UPDATE statements allow you to edit rows in a table.
WHERE
SELECT column_name(s)
FROM table_name
WHERE column_name operator value;
WHERE is a clause that indicates you want to filter the result set to include
only rows where the following condition is true.
4.1.2 Creating the database structure
4.1.3 Creating the table structure

4.1.4 SQL integrity constraints
SQL Integrity Constraints

Integrity Constraints are used to apply business rules for the database tables.
The constraints available in SQL are Primary Key, Foreign Key, Not Null, Unique, and Check.
Constraints can be defined in two ways
1) The constraints can be specified immediately after the column definition. This is
called column-level definition.
2) The constraints can be specified after all the columns are defined. This is called
table-level definition.
1. SQL PRIMARY KEY Constraint
The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain UNIQUE values, and cannot contain NULL values.
A table can have only one primary key, which may consist of single or multiple
fields.
SQL PRIMARY KEY on CREATE TABLE

The following SQL creates a PRIMARY KEY on the "ID" column when the
"Persons" table is created:
MySQL:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
Age int,
PRIMARY KEY (ID)
);
SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL PRIMARY KEY,
LastName varchar(255) NOT NULL,
Age int
);
To allow naming of a PRIMARY KEY constraint, and for defining a PRIMARY KEY
constraint on multiple columns, use the following SQL syntax:
MySQL / SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
Age int,
CONSTRAINT PK_Person PRIMARY KEY (ID,LastName)
);
Note: In the example above there is only ONE PRIMARY KEY (PK_Person).
However, the VALUE of the primary key is made up of TWO COLUMNS (ID +
LastName).
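What a two-column primary key allows and forbids can be sketched as follows, using Python's built-in sqlite3 module and invented sample rows. Two rows may share the same ID as long as the (ID, LastName) pair differs; repeating the whole pair is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        ID       int NOT NULL,
        LastName varchar(255) NOT NULL,
        Age      int,
        CONSTRAINT PK_Person PRIMARY KEY (ID, LastName)
    )
""")
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen', 30)")
# Same ID, different LastName: the pair is still unique, so this is allowed.
conn.execute("INSERT INTO Persons VALUES (1, 'Svendson', 25)")

try:
    # Repeats the (ID, LastName) pair of the first row.
    conn.execute("INSERT INTO Persons VALUES (1, 'Hansen', 41)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]
print(duplicate_rejected, row_count)
```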
SQL PRIMARY KEY on ALTER TABLE
To create a PRIMARY KEY constraint on the "ID" column when the table is
already created, use the following SQL:
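MySQL / SQL Server / Oracle / MS Access:
ALTER TABLE Persons
ADD PRIMARY KEY (ID);
To name the PRIMARY KEY constraint, or to define it on multiple columns, use the following SQL syntax:
ALTER TABLE Persons
ADD CONSTRAINT PK_Person PRIMARY KEY (ID,LastName);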
Note: If you use the ALTER TABLE statement to add a primary key, the primary
key column(s) must already have been declared to not contain NULL values
(when the table was first created).
DROP a PRIMARY KEY Constraint

To drop a PRIMARY KEY constraint, use the following SQL:
MySQL:
ALTER TABLE Persons
DROP PRIMARY KEY;
SQL Server / Oracle / MS Access:
ALTER TABLE Persons
DROP CONSTRAINT PK_Person;
2. SQL FOREIGN KEY Constraint
A FOREIGN KEY is a key used to link two tables together.
A FOREIGN KEY is a field (or collection of fields) in one table that refers to the
PRIMARY KEY in another table.
The table containing the foreign key is called the child table, and the table
containing the candidate key is called the referenced or parent table.
Look at the following two tables:
"Persons" table:
PersonID LastName FirstName
1 Hansen Ola
2 Svendson Tove
3 Pettersen Kari
"Orders" table:
OrderID OrderNumber PersonID
1 77895 3
2 44678 3
3 22456 2
4 24562 1
Notice that the "PersonID" column in the "Orders" table points to the "PersonID"
column in the "Persons" table.
The "PersonID" column in the "Persons" table is the PRIMARY KEY in the
"Persons" table.
The "PersonID" column in the "Orders" table is a FOREIGN KEY in the "Orders"
table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links
between tables.
The FOREIGN KEY constraint also prevents invalid data from being inserted into
the foreign key column, because it has to be one of the values contained in the
table it points to.
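The two protections described above (no orphaned child rows, no deletion of a parent row that is still referenced) can be tried out with the sketch below. It assumes Python's built-in sqlite3 module and an in-memory SQLite database; SQLite only enforces foreign keys after the PRAGMA shown, which is a SQLite-specific detail and not part of the SQL in this section.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE Persons (PersonID int PRIMARY KEY, LastName varchar(255))")
conn.execute("""
    CREATE TABLE Orders (
        OrderID int PRIMARY KEY,
        OrderNumber int NOT NULL,
        PersonID int,
        FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
    )
""")
conn.execute("INSERT INTO Persons VALUES (3, 'Pettersen')")
conn.execute("INSERT INTO Orders VALUES (1, 77895, 3)")  # parent row exists


def is_rejected(sql):
    """Return True if the statement violates a constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# No person 9 exists, so this child row would point at nothing.
orphan_rejected = is_rejected("INSERT INTO Orders VALUES (2, 44678, 9)")
# Person 3 is still referenced by order 1, so deleting it would break the link.
delete_rejected = is_rejected("DELETE FROM Persons WHERE PersonID = 3")
print(orphan_rejected, delete_rejected)  # True True
```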
SQL FOREIGN KEY on CREATE TABLE

The following SQL creates a FOREIGN KEY on the "PersonID" column when the
"Orders" table is created:
MySQL:
CREATE TABLE Orders (
OrderID int NOT NULL,
OrderNumber int NOT NULL,
PersonID int,
PRIMARY KEY (OrderID),
FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
);
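SQL Server:
CREATE TABLE Orders (
OrderID int NOT NULL PRIMARY KEY,
OrderNumber int NOT NULL,
PersonID int FOREIGN KEY REFERENCES Persons(PersonID)
);
To name a FOREIGN KEY constraint, or to define a FOREIGN KEY constraint on multiple columns, use the following SQL syntax:
MySQL / SQL Server / Oracle / MS Access:
CREATE TABLE Orders (
OrderID int NOT NULL,
OrderNumber int NOT NULL,
PersonID int,
PRIMARY KEY (OrderID),
CONSTRAINT FK_PersonOrder FOREIGN KEY (PersonID)
REFERENCES Persons(PersonID)
);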
SQL FOREIGN KEY on ALTER TABLE

To create a FOREIGN KEY constraint on the "PersonID" column when the
"Orders" table is already created, use the following SQL:
ALTER TABLE Orders
ADD FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);
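To name the FOREIGN KEY constraint, or to define it on multiple columns, use the following SQL syntax:
ALTER TABLE Orders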
ADD CONSTRAINT FK_PersonOrder
FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);
DROP a FOREIGN KEY Constraint
To drop a FOREIGN KEY constraint, use the following SQL:
MySQL:
ALTER TABLE Orders
DROP FOREIGN KEY FK_PersonOrder;
SQL Server / Oracle / MS Access:
ALTER TABLE Orders
DROP CONSTRAINT FK_PersonOrder;
3. SQL NOT NULL Constraint

This constraint ensures that every row in the table contains a definite value for the column
that is declared NOT NULL; a null value is not allowed in that column.
Syntax to define a NOT NULL constraint:
[CONSTRAINT constraint_name] NOT NULL
For example, to create an employee table whose name column does not accept null values, the query would be:
CREATE TABLE employee
( id number(5),
name char(20) CONSTRAINT nm_nn NOT NULL,
dept char(10),
age number(2),
salary number(10),
location char(10)
);
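A short sketch of the NOT NULL rule in action follows. It assumes Python's built-in sqlite3 module and an in-memory SQLite database, so the Oracle-style number(n) types above are replaced by SQLite types; the two employee rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        id INTEGER,
        name CHAR(20) CONSTRAINT nm_nn NOT NULL,
        dept CHAR(10)
    )
""")

# A row that supplies a name is accepted.
conn.execute("INSERT INTO employee (id, name, dept) VALUES (1, 'Ama', 'IT')")

# A row that leaves name out would store NULL there, so it is rejected.
try:
    conn.execute("INSERT INTO employee (id, dept) VALUES (2, 'HR')")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

stored = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(null_rejected, stored)  # True 1
```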
4. SQL UNIQUE Constraint
The UNIQUE constraint ensures that all values in a column are different.
Both the UNIQUE and PRIMARY KEY constraints provide a guarantee for
uniqueness for a column or set of columns.
A PRIMARY KEY constraint automatically has a UNIQUE constraint.
However, you can have many UNIQUE constraints per table, but only one
PRIMARY KEY constraint per table.
SQL UNIQUE Constraint on CREATE TABLE

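The following SQL creates a UNIQUE constraint on the "ID" column when the "Persons" table is created:
SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL UNIQUE,
LastName varchar(255) NOT NULL,
Age int
);
MySQL:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
Age int,
UNIQUE (ID)
);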
To name a UNIQUE constraint, and to define a UNIQUE constraint on multiple
columns, use the following SQL syntax:
MySQL / SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL,
LastName varchar(255) NOT NULL,
Age int,
CONSTRAINT UC_Person UNIQUE (ID,LastName)
);
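The sketch below illustrates that a table can carry several UNIQUE constraints next to its single primary key. It assumes Python's built-in sqlite3 module and an in-memory SQLite database; the Email column and all row values are hypothetical additions for the illustration. How NULL values are treated under UNIQUE differs between systems, so the last two inserts show SQLite's behaviour only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Email is a hypothetical column, added to show a second UNIQUE constraint.
conn.execute("""
    CREATE TABLE Persons (
        ID int NOT NULL PRIMARY KEY,
        LastName varchar(255) NOT NULL,
        Email varchar(255) UNIQUE,
        CONSTRAINT UC_Person UNIQUE (ID, LastName)
    )
""")
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen', 'ola@example.com')")

# A different person with the same Email value violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO Persons VALUES (2, 'Svendson', 'ola@example.com')")
    duplicate_email_rejected = False
except sqlite3.IntegrityError:
    duplicate_email_rejected = True

# In SQLite, NULLs do not count as duplicates of each other.
conn.execute("INSERT INTO Persons VALUES (3, 'Pettersen', NULL)")
conn.execute("INSERT INTO Persons VALUES (4, 'Monsen', NULL)")
null_rows = conn.execute(
    "SELECT COUNT(*) FROM Persons WHERE Email IS NULL").fetchone()[0]
print(duplicate_email_rejected, null_rows)  # True 2
```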
SQL UNIQUE Constraint on ALTER TABLE

To create a UNIQUE constraint on the "ID" column when the table is already
created, use the following SQL:
ALTER TABLE Persons
ADD UNIQUE (ID);
To name a UNIQUE constraint, and to define a UNIQUE constraint on multiple
columns, use the following SQL syntax:
ALTER TABLE Persons
ADD CONSTRAINT UC_Person UNIQUE (ID,LastName);
DROP a UNIQUE Constraint

To drop a UNIQUE constraint, use the following SQL:
MySQL:
ALTER TABLE Persons
DROP INDEX UC_Person;
SQL Server / Oracle / MS Access:
ALTER TABLE Persons
DROP CONSTRAINT UC_Person;
5. SQL CHECK Constraint
The CHECK constraint is used to limit the value range that can be placed in a
column.
If you define a CHECK constraint on a single column it allows only certain values
for this column.
If you define a CHECK constraint on a table it can limit the values in certain
columns based on values in other columns in the row.
SQL CHECK on CREATE TABLE

The following SQL creates a CHECK constraint on the "Age" column when the
"Persons" table is created. The CHECK constraint ensures that you can not have
any person below 18 years:
MySQL:
CREATE TABLE Persons (
ID int NOT NULL,
Age int,
CHECK (Age>=18)
);
SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL,
Age int CHECK (Age>=18)
);
To name a CHECK constraint, or to define a CHECK constraint on multiple columns, use the following SQL syntax:
MySQL / SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL,
Age int,
City varchar(255),
CONSTRAINT CHK_Person CHECK (Age>=18 AND City='Sandnes')
);
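To make the effect of a multi-column CHECK concrete, the sketch below tries four inserts against the named constraint. It assumes Python's built-in sqlite3 module and an in-memory SQLite database; the row values (including the city "Oslo") are hypothetical. The last case shows that a NULL age does not fail the check, because the condition is then unknown and not false.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        ID int NOT NULL,
        Age int,
        City varchar(255),
        CONSTRAINT CHK_Person CHECK (Age >= 18 AND City = 'Sandnes')
    )
""")


def try_insert(person_id, age, city):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Persons VALUES (?, ?, ?)", (person_id, age, city))
        return True
    except sqlite3.IntegrityError:
        return False


accepted_adult = try_insert(1, 25, "Sandnes")    # passes both conditions
accepted_minor = try_insert(2, 17, "Sandnes")    # fails Age >= 18
accepted_other = try_insert(3, 30, "Oslo")       # fails City = 'Sandnes'
accepted_null = try_insert(4, None, "Sandnes")   # unknown result, not a violation
print(accepted_adult, accepted_minor, accepted_other, accepted_null)
# True False False True
```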
SQL CHECK on ALTER TABLE

To create a CHECK constraint on the "Age" column when the table is already created, use the following SQL:
ALTER TABLE Persons
ADD CHECK (Age>=18);
To name a CHECK constraint, or to define a CHECK constraint on multiple columns, use the following SQL syntax:
ALTER TABLE Persons
ADD CONSTRAINT CHK_PersonAge CHECK (Age>=18 AND City='Sandnes');
DROP a CHECK Constraint

To drop a CHECK constraint, use the following SQL:
SQL Server / Oracle / MS Access:
ALTER TABLE Persons
DROP CONSTRAINT CHK_PersonAge;
MySQL:
ALTER TABLE Persons
DROP CHECK CHK_PersonAge;
6. SQL DEFAULT Constraint

The DEFAULT constraint is used to provide a default value for a column.
The default value will be added to all new records IF no other value is specified.
SQL DEFAULT on CREATE TABLE

The following SQL sets a DEFAULT value for the "City" column when the "Persons" table is created:
MySQL / SQL Server / Oracle / MS Access:
CREATE TABLE Persons (
ID int NOT NULL,
Age int,
City varchar(255) DEFAULT 'Sandnes'
);
The DEFAULT constraint can also be used to insert system values, by using
functions like GETDATE() (a SQL Server function; other systems name their current-date function differently):
CREATE TABLE Orders (
ID int NOT NULL,
OrderDate date DEFAULT GETDATE()
);
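The following sketch shows a literal default and a system-value default being filled in. It assumes Python's built-in sqlite3 module and an in-memory SQLite database, where CURRENT_DATE plays the role that GETDATE() plays in SQL Server; the CreatedOn column and the row values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        ID int NOT NULL,
        City varchar(255) DEFAULT 'Sandnes',
        CreatedOn date DEFAULT CURRENT_DATE
    )
""")

# Only ID is supplied: City and CreatedOn fall back to their defaults.
conn.execute("INSERT INTO Persons (ID) VALUES (1)")
# An explicit value always wins over the default.
conn.execute("INSERT INTO Persons (ID, City) VALUES (2, 'Stavanger')")

rows = conn.execute(
    "SELECT ID, City, CreatedOn FROM Persons ORDER BY ID").fetchall()
cities = [city for _, city, _ in rows]
print(cities)  # ['Sandnes', 'Stavanger']
```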
SQL DEFAULT on ALTER TABLE

To create a DEFAULT constraint on the "City" column when the table is already created, use the following SQL:
MySQL:
ALTER TABLE Persons
ALTER City SET DEFAULT 'Sandnes';
Standard SQL (support varies by system):
ALTER TABLE Persons
ALTER COLUMN City SET DEFAULT 'Sandnes';
Oracle:
ALTER TABLE Persons
MODIFY City DEFAULT 'Sandnes';
DROP a DEFAULT Constraint

To drop a DEFAULT constraint, use the following SQL:
MySQL:
ALTER TABLE Persons
ALTER City DROP DEFAULT;
Standard SQL (support varies by system):
ALTER TABLE Persons
ALTER COLUMN City DROP DEFAULT;
AUTO INCREMENT Field

Auto-increment allows a unique number to be generated automatically when a
new record is inserted into a table.
Often this is the primary key field that we would like to be created automatically
every time a new record is inserted.
Syntax for MySQL

The following SQL statement defines the "ID" column to be an auto-increment
primary key field in the "Persons" table:
CREATE TABLE Persons (
ID int NOT NULL AUTO_INCREMENT,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int,
PRIMARY KEY (ID)
);
MySQL uses the AUTO_INCREMENT keyword to perform an auto-increment feature.
By default, the starting value for AUTO_INCREMENT is 1, and it will increment by 1 for each new record.
To let the AUTO_INCREMENT sequence start with another value, use the
following SQL statement:
ALTER TABLE Persons AUTO_INCREMENT=100;
To insert a new record into the "Persons" table, we will NOT have to specify a
value for the "ID" column (a unique value will be added automatically):
INSERT INTO Persons (FirstName,LastName)
VALUES ('Lars','Monsen');
The SQL statement above would insert a new record into the "Persons" table.
The "ID" column would be assigned a unique value. The "FirstName" column
would be set to "Lars" and the "LastName" column would be set to "Monsen".
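The same idea can be tried without a MySQL server. The sketch below assumes Python's built-in sqlite3 module and an in-memory SQLite database, where the equivalent declaration is INTEGER PRIMARY KEY AUTOINCREMENT; the second inserted person is a hypothetical row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Persons (
        ID INTEGER PRIMARY KEY AUTOINCREMENT,
        LastName varchar(255) NOT NULL,
        FirstName varchar(255)
    )
""")

# Neither INSERT mentions ID; the database generates it.
first = conn.execute(
    "INSERT INTO Persons (FirstName, LastName) VALUES ('Lars', 'Monsen')")
second = conn.execute(
    "INSERT INTO Persons (FirstName, LastName) VALUES ('Kari', 'Pettersen')")

# lastrowid reports the key generated for each inserted row.
generated_ids = [first.lastrowid, second.lastrowid]
print(generated_ids)  # [1, 2]
```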
Syntax for SQL Server

The following SQL statement defines the "ID" column to be an auto-increment
primary key field in the "Persons" table:
CREATE TABLE Persons (
ID int IDENTITY(1,1) PRIMARY KEY,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
MS SQL Server uses the IDENTITY keyword to perform an auto-increment
feature.
In the example above, the starting value for IDENTITY is 1, and it will increment
by 1 for each new record.
Tip: To specify that the "ID" column should start at value 10 and increment by
5, change it to IDENTITY(10,5).
Syntax for Access

The following SQL statement defines the "ID" column to be an auto-increment
primary key field in the "Persons" table:
CREATE TABLE Persons (
ID Integer PRIMARY KEY AUTOINCREMENT,
LastName varchar(255) NOT NULL,
FirstName varchar(255),
Age int
);
MS Access uses the AUTOINCREMENT keyword to perform an auto-increment feature.
By default, the starting value for AUTOINCREMENT is 1, and it will increment by
1 for each new record.
Tip: To specify that the "ID" column should start at value 10 and increment by
5, change the autoincrement to AUTOINCREMENT(10,5).
In both systems, an INSERT statement that omits the "ID" column, like the one
shown for MySQL above, is assigned a unique "ID" value automatically.
4.2 Basic Data Management

4.2.1 Data entry
4.2.2 Saving table contents
Transaction Control
The following commands are used to control transactions.
• COMMIT - saves the changes.
• ROLLBACK - rolls back the changes.
• SAVEPOINT - creates points within a group of transactions to which you can
ROLLBACK.
• SET TRANSACTION - places a name on a transaction.
Transactional Control Commands

Transactional control commands are used only with the DML commands INSERT,
UPDATE and DELETE. They cannot be used while creating or dropping tables,
because in many database systems (Oracle and MySQL, for example) these
operations are committed automatically.
The COMMIT Command

The COMMIT command is the transactional command used to save changes invoked by a
transaction to the database. It saves all changes made since the last COMMIT
or ROLLBACK command.
The syntax for the COMMIT command is as follows.
COMMIT;
Example
Consider the CUSTOMERS table having the following records −
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which deletes the records with AGE = 25 from the
table and then commits the change to the database.
SQL> DELETE FROM CUSTOMERS
WHERE AGE = 25;
SQL> COMMIT;
Two rows are deleted from the table, and a subsequent SELECT statement
produces the following result.
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
The ROLLBACK Command

The ROLLBACK command is the transactional command used to undo
transactions that have not already been saved to the database. This
command can only be used to undo transactions since the last COMMIT or
ROLLBACK command was issued.
The syntax for a ROLLBACK command is as follows −

ROLLBACK;
Example
Consider the CUSTOMERS table having the following records:
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
Following is an example which deletes the records with AGE = 25 from the
table and then rolls the change back.
SQL> DELETE FROM CUSTOMERS
WHERE AGE = 25;
SQL> ROLLBACK;
The delete operation therefore has no effect on the table, and a SELECT
statement produces the following result.
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
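The COMMIT and ROLLBACK examples can be replayed side by side with the sketch below. It assumes Python's built-in sqlite3 module and an in-memory SQLite database instead of the SQL> prompt used in the text, and it keeps only the ID, NAME and AGE columns of the CUSTOMERS table to stay short.

```python
import sqlite3

ROWS = [(1, "Ramesh", 32), (2, "Khilan", 25), (3, "kaushik", 23),
        (4, "Chaitali", 25), (5, "Hardik", 27), (6, "Komal", 22),
        (7, "Muffy", 24)]

# isolation_level=None stops the sqlite3 module from opening transactions
# on its own, so BEGIN, COMMIT and ROLLBACK are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?)", ROWS)


def ids():
    """Return the IDs currently stored in CUSTOMERS."""
    return [row[0] for row in conn.execute("SELECT ID FROM CUSTOMERS ORDER BY ID")]


conn.execute("BEGIN")
conn.execute("DELETE FROM CUSTOMERS WHERE AGE = 25")
conn.execute("ROLLBACK")          # the delete is undone
after_rollback = ids()

conn.execute("BEGIN")
conn.execute("DELETE FROM CUSTOMERS WHERE AGE = 25")
conn.execute("COMMIT")            # the delete becomes permanent
conn.execute("BEGIN")
conn.execute("ROLLBACK")          # nothing uncommitted is left to undo
after_commit = ids()

print(after_rollback)  # [1, 2, 3, 4, 5, 6, 7]
print(after_commit)    # [1, 3, 5, 6, 7]
```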
The SAVEPOINT Command

A SAVEPOINT is a point in a transaction when you can roll the transaction
back to a certain point without rolling back the entire transaction.
The syntax for a SAVEPOINT command is as shown below.
SAVEPOINT SAVEPOINT_NAME;
This command serves only in the creation of a SAVEPOINT among all the
transactional statements. The ROLLBACK command is used to undo a group
of transactions.
The syntax for rolling back to a SAVEPOINT is as shown below.
ROLLBACK TO SAVEPOINT_NAME;
Following is an example where you plan to delete the three different records
from the CUSTOMERS table. You want to create a SAVEPOINT before each
delete, so that you can ROLLBACK to any SAVEPOINT at any time to return
the appropriate data to its original state.
Example
Consider the CUSTOMERS table having the following records.
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  1 | Ramesh   |  32 | Ahmedabad |  2000.00 |
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
The following code block contains the series of operations.
SQL> SAVEPOINT SP1;
Savepoint created.
SQL> DELETE FROM CUSTOMERS WHERE ID=1;
1 row deleted.
SQL> SAVEPOINT SP2;
Savepoint created.
SQL> DELETE FROM CUSTOMERS WHERE ID=2;
1 row deleted.
SQL> SAVEPOINT SP3;
Savepoint created.
SQL> DELETE FROM CUSTOMERS WHERE ID=3;
1 row deleted.
Now that the three deletions have taken place, let us assume that you have
changed your mind and decided to ROLLBACK to the SAVEPOINT that you
identified as SP2. Because SP2 was created after the first deletion, the last
two deletions are undone −
SQL> ROLLBACK TO SP2;

Rollback complete.
Notice that only the first deletion took place since you rolled back to SP2.
SQL> SELECT * FROM CUSTOMERS;
+----+----------+-----+-----------+----------+
| ID | NAME     | AGE | ADDRESS   | SALARY   |
+----+----------+-----+-----------+----------+
|  2 | Khilan   |  25 | Delhi     |  1500.00 |
|  3 | kaushik  |  23 | Kota      |  2000.00 |
|  4 | Chaitali |  25 | Mumbai    |  6500.00 |
|  5 | Hardik   |  27 | Bhopal    |  8500.00 |
|  6 | Komal    |  22 | MP        |  4500.00 |
|  7 | Muffy    |  24 | Indore    | 10000.00 |
+----+----------+-----+-----------+----------+
6 rows selected.
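The savepoint sequence above can be reproduced with the sketch below. It assumes Python's built-in sqlite3 module and an in-memory SQLite database, and it uses a shortened CUSTOMERS table (four rows, two columns) so the outcome is easy to read.

```python
import sqlite3

# isolation_level=None: transaction statements are issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?)",
                 [(1, "Ramesh"), (2, "Khilan"), (3, "kaushik"), (4, "Chaitali")])

conn.execute("BEGIN")
conn.execute("SAVEPOINT SP1")
conn.execute("DELETE FROM CUSTOMERS WHERE ID = 1")
conn.execute("SAVEPOINT SP2")
conn.execute("DELETE FROM CUSTOMERS WHERE ID = 2")
conn.execute("SAVEPOINT SP3")
conn.execute("DELETE FROM CUSTOMERS WHERE ID = 3")

# SP2 was set after the first delete, so only the deletes of
# IDs 2 and 3 are undone; the delete of ID 1 stays in effect.
conn.execute("ROLLBACK TO SP2")
conn.execute("COMMIT")

remaining = [row[0] for row in conn.execute("SELECT ID FROM CUSTOMERS ORDER BY ID")]
print(remaining)  # [2, 3, 4]
```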
The RELEASE SAVEPOINT Command
The RELEASE SAVEPOINT command is used to remove a SAVEPOINT that
you have created.
The syntax for a RELEASE SAVEPOINT command is as follows.
RELEASE SAVEPOINT SAVEPOINT_NAME;
Once a SAVEPOINT has been released, you can no longer use the
ROLLBACK TO command to return to that SAVEPOINT.
The SET TRANSACTION Command

The SET TRANSACTION command can be used to initiate a database
transaction. This command is used to specify characteristics for the
transaction that follows. For example, you can specify a transaction to be
read only or read write.
The syntax for a SET TRANSACTION command is as follows.
SET TRANSACTION [ READ WRITE | READ ONLY ];
4.2.3 Listing table contents
The SQL SELECT Statement

The SELECT statement is used to select data from a database.
The data returned is stored in a result table, called the result-set.
SELECT Syntax
SELECT column1, column2, ...
FROM table_name;
Here, column1, column2, ... are the field names of the table you want to select
data from. If you want to select all the fields available in the table, use the
following syntax:
SELECT * FROM table_name;
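Demo Database
Below is a selection from the "Customers" table in the Northwind sample
database:
| CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country |
| 1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany |
| 2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico |
| 3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico |
| 4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK |
| 5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden |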
SELECT Column Example

The following SQL statement selects the "CustomerName" and "City" columns
from the "Customers" table:
Example
SELECT CustomerName, City FROM Customers;
SELECT * Example
The following SQL statement selects all the columns from the "Customers"
table:
Example
SELECT * FROM Customers;
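The difference between naming columns and using * shows up in the shape of the result-set. The sketch below assumes Python's built-in sqlite3 module and an in-memory SQLite database, and loads only two rows and four columns of the "Customers" table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY,
        CustomerName TEXT,
        City TEXT,
        Country TEXT
    )
""")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?, ?)", [
    (1, "Alfreds Futterkiste", "Berlin", "Germany"),
    (4, "Around the Horn", "London", "UK"),
])

# Named columns: the result-set contains exactly those columns.
cursor = conn.execute("SELECT CustomerName, City FROM Customers")
column_names = [description[0] for description in cursor.description]
result_set = cursor.fetchall()

# SELECT *: every column of the table, in table order.
all_columns = [description[0] for description in
               conn.execute("SELECT * FROM Customers").description]

print(column_names)  # ['CustomerName', 'City']
print(all_columns)   # ['CustomerID', 'CustomerName', 'City', 'Country']
```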
4.2.4 Making Corrections

The ALTER TABLE statement is used to add, delete, or modify columns in an
existing table.
The ALTER TABLE statement is also used to add and drop various constraints on
an existing table.
The SQL UPDATE Statement

The UPDATE statement is used to modify the existing records in a table.
UPDATE Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Note: Be careful when updating records in a table! Notice the WHERE clause in
the UPDATE statement. The WHERE clause specifies which record(s) should
be updated. If you omit the WHERE clause, all records in the table will be
updated!
Demo Database
The examples below use the selection from the "Customers" table in the
Northwind sample database (CustomerID 1 to 5).
UPDATE Table
The following SQL statement updates the first customer (CustomerID = 1) with
a new contact person and a new city.
Example
UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;
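The selection from the "Customers" table will now look like this:
| CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country |
| 1 | Alfreds Futterkiste | Alfred Schmidt | Obere Str. 57 | Frankfurt | 12209 | Germany |
| 2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico |
| 3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico |
| 4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK |
| 5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden |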
UPDATE Multiple Records

It is the WHERE clause that determines how many records will be updated.
The following SQL statement will update the contactname to "Juan" for all
records where country is "Mexico":
Example
UPDATE Customers
SET ContactName='Juan'
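WHERE Country='Mexico';
The selection from the "Customers" table will now look like this:
| CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country |
| 1 | Alfreds Futterkiste | Alfred Schmidt | Obere Str. 57 | Frankfurt | 12209 | Germany |
| 2 | Ana Trujillo Emparedados y helados | Juan | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico |
| 3 | Antonio Moreno Taquería | Juan | Mataderos 2312 | México D.F. | 05023 | Mexico |
| 4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK |
| 5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden |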
Update Warning!
Be careful when updating records. If you omit the WHERE clause, ALL records
will be updated!
Example
UPDATE Customers
SET ContactName='Juan';
The selection from the "Customers" table will now look like this:
| CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country |
| 1 | Alfreds Futterkiste | Juan | Obere Str. 57 | Frankfurt | 12209 | Germany |
| 2 | Ana Trujillo Emparedados y helados | Juan | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico |
| 3 | Antonio Moreno Taquería | Juan | Mataderos 2312 | México D.F. | 05023 | Mexico |
| 4 | Around the Horn | Juan | 120 Hanover Sq. | London | WA1 1DP | UK |
| 5 | Berglunds snabbköp | Juan | Berguvsvägen 8 | Luleå | S-958 22 | Sweden |
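One way to guard against the mistake in the warning above is to count the rows a WHERE clause matches before running the UPDATE, and to compare that with the number of rows actually changed. The sketch below is only an illustration of that habit: it assumes Python's built-in sqlite3 module and an in-memory SQLite database, keeps three columns of the "Customers" table, and the helper count_matching is a hypothetical name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, ContactName TEXT, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    (1, "Maria Anders", "Germany"),
    (2, "Ana Trujillo", "Mexico"),
    (3, "Antonio Moreno", "Mexico"),
    (4, "Thomas Hardy", "UK"),
    (5, "Christina Berglund", "Sweden"),
])
conn.commit()


def count_matching(where_clause, params=()):
    """Preview how many rows a WHERE clause would touch (hypothetical helper)."""
    sql = "SELECT COUNT(*) FROM Customers WHERE " + where_clause
    return conn.execute(sql, params).fetchone()[0]


expected = count_matching("Country = ?", ("Mexico",))
cursor = conn.execute(
    "UPDATE Customers SET ContactName = 'Juan' WHERE Country = ?", ("Mexico",))
updated_with_where = cursor.rowcount

# Without WHERE, every row in the table is changed.
cursor = conn.execute("UPDATE Customers SET ContactName = 'Juan'")
updated_without_where = cursor.rowcount

conn.rollback()  # neither update was committed, so both are undone
restored = conn.execute(
    "SELECT ContactName FROM Customers WHERE CustomerID = 1").fetchone()[0]
print(expected, updated_with_where, updated_without_where, restored)
# 2 2 5 Maria Anders
```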
4.2.5 Restoring the table contents
The ROLLBACK Command

The ROLLBACK command is the transactional command used to undo
transactions that have not already been saved to the database. This
command can only be used to undo transactions since the last COMMIT or
ROLLBACK command was issued.
The syntax for a ROLLBACK command is as follows −
ROLLBACK;
4.2.6 Deleting the table rows
The SQL DELETE Statement

The DELETE statement is used to delete existing records in a table.
DELETE Syntax
DELETE FROM table_name
WHERE condition;
Note: Be careful when deleting records in a table! Notice the WHERE clause in
the DELETE statement. The WHERE clause specifies which record(s) should
be deleted. If you omit the WHERE clause, all records in the table will be
deleted!
Demo Database
The examples below use the selection from the "Customers" table in the
Northwind sample database (CustomerID 1 to 5).
SQL DELETE Example

The following SQL statement deletes the customer "Alfreds Futterkiste" from the
"Customers" table:
Example
DELETE FROM Customers
WHERE CustomerName='Alfreds Futterkiste';
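The "Customers" table will now look like this:
| CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country |
| 2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico |
| 3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico |
| 4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK |
| 5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden |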
Delete All Records

It is possible to delete all rows in a table without deleting the table. This means
that the table structure, attributes, and indexes will be intact:
DELETE FROM table_name;
MS Access also accepts the following form, which most other database systems reject as a syntax error:
DELETE * FROM table_name;
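The sketch below shows both uses of DELETE and confirms that the table structure survives when all rows are removed. It assumes Python's built-in sqlite3 module and an in-memory SQLite database, and keeps only two columns of the "Customers" table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [
    (1, "Alfreds Futterkiste"),
    (4, "Around the Horn"),
    (5, "Berglunds snabbköp"),
])


def row_count():
    return conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]


# DELETE with WHERE removes only the matching row.
conn.execute("DELETE FROM Customers WHERE CustomerName = ?", ("Alfreds Futterkiste",))
after_one = row_count()

# DELETE without WHERE removes every row but keeps the table itself.
conn.execute("DELETE FROM Customers")
after_all = row_count()
columns = [row[1] for row in conn.execute("PRAGMA table_info(Customers)")]

print(after_one, after_all, columns)  # 2 0 ['CustomerID', 'CustomerName']
```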
4.3 Queries
4.4 Partial listing of Table contents
The SQL SELECT DISTINCT Statement

The SELECT DISTINCT statement is used to return only distinct (different)
values.
Inside a table, a column often contains many duplicate values; and sometimes
you only want to list the different (distinct) values.

SELECT DISTINCT Syntax

SELECT DISTINCT column1, column2, ...
FROM table_name;
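Demo Database
Below is a selection from the "Customers" table in the Northwind sample
database:
| CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country |
| 1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany |
| 2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico |
| 3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico |
| 4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK |
| 5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden |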
SELECT Example
The following SQL statement selects all values, including duplicates, from the
"Country" column in the "Customers" table:
Example
SELECT Country FROM Customers;
Now, let us use the DISTINCT keyword with the above SELECT statement and
see the result.
SELECT DISTINCT Examples

The following SQL statement selects only the DISTINCT values from the
"Country" column in the "Customers" table:
Example
SELECT DISTINCT Country FROM Customers;
The following SQL statement lists the number of different (distinct) customer
countries:
Example
SELECT COUNT(DISTINCT Country) FROM Customers;
Note: COUNT(DISTINCT column_name) is not supported in Microsoft Access
databases.
Here is the workaround for MS Access:
Example
SELECT Count(*) AS DistinctCountries
FROM (SELECT DISTINCT Country FROM Customers);
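The three DISTINCT queries above can be compared on the five demo rows with the sketch below. It assumes Python's built-in sqlite3 module and an in-memory SQLite database, and loads only the CustomerID and Country columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, Country TEXT)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)", [
    (1, "Germany"), (2, "Mexico"), (3, "Mexico"), (4, "UK"), (5, "Sweden"),
])

# Plain SELECT: one value per row, duplicates included.
all_values = [row[0] for row in
              conn.execute("SELECT Country FROM Customers ORDER BY CustomerID")]

# SELECT DISTINCT: each value once (sorted here only to make the output stable).
distinct_values = sorted(row[0] for row in
                         conn.execute("SELECT DISTINCT Country FROM Customers"))

# Counting distinct values directly, and through the subquery workaround.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT Country) FROM Customers").fetchone()[0]
subquery_count = conn.execute(
    "SELECT COUNT(*) AS DistinctCountries "
    "FROM (SELECT DISTINCT Country FROM Customers)").fetchone()[0]

print(len(all_values), distinct_values, distinct_count, subquery_count)
# 5 ['Germany', 'Mexico', 'Sweden', 'UK'] 4 4
```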
4.5 Logical operators AND, OR and NOT
The SQL AND, OR and NOT Operators

The WHERE clause can be combined with AND, OR, and NOT operators.
The AND and OR operators are used to filter records based on more than one
condition:
• The AND operator displays a record if all the conditions separated by AND
are TRUE.
• The OR operator displays a record if any of the conditions separated by
OR is TRUE.
The NOT operator displays a record if the condition(s) is NOT TRUE.
AND Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;
OR Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;
NOT Syntax
SELECT column1, column2, ...
FROM table_name
WHERE NOT condition;
Demo Database
The examples below use the "Customers" table in the Northwind sample
database.
AND Example
The following SQL statement selects all fields from "Customers" where country
is "Germany" AND city is "Berlin":
Example
SELECT * FROM Customers
WHERE Country='Germany' AND City='Berlin';
OR Example
The following SQL statement selects all fields from "Customers" where city is
"Berlin" OR "München":
Example
SELECT * FROM Customers
WHERE City='Berlin' OR City='München';
NOT Example
The following SQL statement selects all fields from "Customers" where country
is NOT "Germany":
Example
SELECT * FROM Customers
WHERE NOT Country='Germany';
Combining AND, OR and NOT

You can also combine the AND, OR and NOT operators.
The following SQL statement selects all fields from "Customers" where country
is "Germany" AND city must be "Berlin" OR "München" (use parentheses to form
complex expressions):
Example
SELECT * FROM Customers
WHERE Country='Germany' AND (City='Berlin' OR City='München');
The following SQL statement selects all fields from "Customers" where country
is NOT "Germany" and NOT "USA":
Example
SELECT * FROM Customers
WHERE NOT Country='Germany' AND NOT Country='USA';
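The reason for the parentheses is that AND is evaluated before OR. The sketch below runs the examples above and adds one pair of queries that differ only in their parentheses. It assumes Python's built-in sqlite3 module and an in-memory SQLite database; the five rows are hypothetical stand-ins chosen so that every condition matches something.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, City TEXT, Country TEXT)")
# Hypothetical rows, not the full Northwind data.
conn.executemany("INSERT INTO Customers VALUES (?, ?, ?)", [
    (1, "Berlin", "Germany"), (2, "München", "Germany"),
    (3, "México D.F.", "Mexico"), (4, "London", "UK"), (5, "Portland", "USA"),
])


def ids(where_clause):
    """Return the CustomerIDs selected by a WHERE clause."""
    sql = "SELECT CustomerID FROM Customers WHERE " + where_clause + " ORDER BY CustomerID"
    return [row[0] for row in conn.execute(sql)]


and_ids = ids("Country='Germany' AND City='Berlin'")
or_ids = ids("City='Berlin' OR City='München'")
not_ids = ids("NOT Country='Germany'")
grouped = ids("Country='Germany' AND (City='Berlin' OR City='München')")
neither = ids("NOT Country='Germany' AND NOT Country='USA'")

# AND binds tighter than OR, so these two clauses select different rows.
without_parens = ids("Country='UK' OR Country='Germany' AND City='Berlin'")
with_parens = ids("(Country='UK' OR Country='Germany') AND City='Berlin'")

print(and_ids, or_ids, not_ids, grouped, neither)  # [1] [1, 2] [3, 4, 5] [1, 2] [3, 4]
print(without_parens, with_parens)                 # [1, 4] [1]
```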
4.6 Special operators
SQL Arithmetic Operators
Operator Description
+ Add
- Subtract
* Multiply
/ Divide
% Modulo
SQL Bitwise Operators
& Bitwise AND
| Bitwise OR
^ Bitwise exclusive OR
SQL Comparison Operators
= Equal to
> Greater than
< Less than
>= Greater than or equal to

<= Less than or equal to
<> Not equal to
SQL Compound Operators
These compound assignment operators are available in SQL Server (Transact-SQL);
most other database systems do not support them.
+= Add equals
-= Subtract equals
*= Multiply equals
/= Divide equals
%= Modulo equals
&= Bitwise AND equals
^= Bitwise exclusive OR equals
|= Bitwise OR equals
SQL Logical Operators
ALL TRUE if all of the subquery values meet the condition
AND TRUE if all the conditions separated by AND are TRUE
ANY TRUE if any of the subquery values meet the condition
BETWEEN TRUE if the operand is within the range of comparisons
EXISTS TRUE if the subquery returns one or more records
IN TRUE if the operand is equal to one of a list of expressions
LIKE TRUE if the operand matches a pattern
NOT TRUE if the condition is NOT TRUE
OR TRUE if any of the conditions separated by OR is TRUE
SOME TRUE if any of the subquery values meet the condition
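A few of the operators listed above (BETWEEN, IN, LIKE and EXISTS) are tried out in the sketch below on the CUSTOMERS rows used earlier in this unit. It assumes Python's built-in sqlite3 module and an in-memory SQLite database. SQLite does not implement ALL, ANY or SOME, so they are left out, and its LIKE ignores the case of ASCII letters, which other systems may not do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMERS (ID INTEGER PRIMARY KEY, NAME TEXT, AGE INTEGER, ADDRESS TEXT)")
conn.executemany("INSERT INTO CUSTOMERS VALUES (?, ?, ?, ?)", [
    (1, "Ramesh", 32, "Ahmedabad"), (2, "Khilan", 25, "Delhi"),
    (3, "kaushik", 23, "Kota"), (4, "Chaitali", 25, "Mumbai"),
    (5, "Hardik", 27, "Bhopal"), (6, "Komal", 22, "MP"),
    (7, "Muffy", 24, "Indore"),
])


def ids(where_clause):
    """Return the IDs selected by a WHERE clause; the table is aliased as c."""
    sql = "SELECT ID FROM CUSTOMERS c WHERE " + where_clause + " ORDER BY ID"
    return [row[0] for row in conn.execute(sql)]


between_ids = ids("AGE BETWEEN 23 AND 25")          # both end values are included
in_ids = ids("ADDRESS IN ('Delhi', 'Mumbai')")
like_ids = ids("NAME LIKE 'K%'")                    # names starting with K or k
# Customers for whom another customer of the same age exists.
exists_ids = ids("EXISTS (SELECT 1 FROM CUSTOMERS o "
                 "WHERE o.AGE = c.AGE AND o.ID <> c.ID)")

print(between_ids, in_ids, like_ids, exists_ids)
# [2, 3, 4, 7] [2, 4] [2, 3, 6] [2, 4]
```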
4.7: Advanced data management commands

4.7.1 Changing a column’s data type
4.7.2 Changing attribute characteristics
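4.7.3 Adding a column to table
The ALTER TABLE statement is used to add, delete, or modify columns in an
existing table.
To add a column to a table, use the following syntax:
ALTER TABLE table_name
ADD column_name datatype;
To delete a column from a table, use the following syntax:
ALTER TABLE table_name
DROP COLUMN column_name;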
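Adding a column and then entering data into it can be sketched as follows. The example assumes Python's built-in sqlite3 module and an in-memory SQLite database; the Email column and the address stored in it are hypothetical. Existing rows receive NULL in the new column until an UPDATE fills it in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Persons (ID INTEGER PRIMARY KEY, LastName TEXT NOT NULL)")
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen')")

# Email is a hypothetical new column.
conn.execute("ALTER TABLE Persons ADD COLUMN Email varchar(255)")
columns = [row[1] for row in conn.execute("PRAGMA table_info(Persons)")]

# The existing row has no value for the new column yet.
before = conn.execute("SELECT Email FROM Persons WHERE ID = 1").fetchone()[0]

# Data is entered into the new column with UPDATE.
conn.execute("UPDATE Persons SET Email = 'hansen@example.com' WHERE ID = 1")
after = conn.execute("SELECT Email FROM Persons WHERE ID = 1").fetchone()[0]

print(columns, before, after)
# ['ID', 'LastName', 'Email'] None hansen@example.com
```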

4.7.4 Entering data into a new column
The SQL INSERT INTO Statement

The INSERT INTO statement is used to insert new records in a table.
INSERT INTO Syntax

It is possible to write the INSERT INTO statement in two ways.
The first way specifies both the column names and the values to be inserted:
INSERT INTO table_name (column1, column2, column3, ...)

VALUES (value1, value2, value3, ...);
If you are adding values for all the columns of the table, you do not need to
specify the column names in the SQL query. However, make sure the order of
the values is in the same order as the columns in the table. The INSERT INTO
syntax would be as follows:
INSERT INTO table_name
VALUES (value1, value2, value3, ...);
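Demo Database
Below is a selection from the end of the "Customers" table in the Northwind
sample database (the PostalCode and Country columns are not shown):
| CustomerID | CustomerName | ContactName | Address | City |
| 89 | White Clover Markets | Karl Jablonski | 305 - 14th Ave. S. Suite 3B | Seattle |
| 90 | Wilman Kala | Matti Karttunen | Keskuskatu 45 | Helsinki |
| 91 | Wolski | Zbyszek | ul. Filtrowa 68 | Walla |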
INSERT INTO Example

The following SQL statement inserts a new record in the "Customers" table:
Example
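INSERT INTO Customers (CustomerName, ContactName, Address, City,
PostalCode, Country)
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');
The new row in the "Customers" table will look like this:
| CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country |
| 92 | Cardinal | Tom B. Erichsen | Skagen 21 | Stavanger | 4006 | Norway |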
Did you notice that we did not insert any number into the CustomerID
field?
The CustomerID column is an auto-increment field and will be generated
automatically when a new record is inserted into the table.
Insert Data Only in Specified Columns

It is also possible to only insert data in specific columns.
The following SQL statement will insert a new record, but only insert data in the
"CustomerName", "City", and "Country" columns (CustomerID will be generated
automatically):
Example
INSERT INTO Customers (CustomerName, City, Country)
VALUES ('Cardinal', 'Stavanger', 'Norway');
The new row in the "Customers" table will look like this:
| CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country |
| 92 | Cardinal | null | null | Stavanger | null | Norway |
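Both forms of INSERT INTO, and the automatically generated key, can be checked with the sketch below. It assumes Python's built-in sqlite3 module and an in-memory SQLite database, and a reduced "Customers" table with four columns; row 91 is inserted first so that the generated key continues from it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customers (
        CustomerID INTEGER PRIMARY KEY AUTOINCREMENT,
        CustomerName TEXT,
        ContactName TEXT,
        City TEXT
    )
""")

# Form without column names: one value per column, in table order.
conn.execute("INSERT INTO Customers VALUES (91, 'Wolski', 'Zbyszek', 'Walla')")

# Form with column names: unlisted columns are left NULL,
# and CustomerID is generated by the database.
cursor = conn.execute(
    "INSERT INTO Customers (CustomerName, City) VALUES (?, ?)",
    ("Cardinal", "Stavanger"))
new_id = cursor.lastrowid

new_row = conn.execute(
    "SELECT * FROM Customers WHERE CustomerID = ?", (new_id,)).fetchone()
print(new_row)  # (92, 'Cardinal', None, 'Stavanger')
```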
4.7.5 Arithmetic operators and the rule of precedence
SQL Arithmetic Operators
+ Add
- Subtract
* Multiply
/ Divide
% Modulo
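The rule of precedence for these operators is that multiplication, division and modulo are evaluated before addition and subtraction, operators of equal rank are evaluated from left to right, and parentheses override both. The sketch below checks this with a few expressions. It assumes Python's built-in sqlite3 module and an in-memory SQLite database; note that dividing two integers gives an integer in SQLite, which is not the case in every system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expression):
    """Let the database evaluate an arithmetic expression."""
    return conn.execute("SELECT " + expression).fetchone()[0]


results = {
    "2 + 3 * 4": evaluate("2 + 3 * 4"),        # multiplication first
    "(2 + 3) * 4": evaluate("(2 + 3) * 4"),    # parentheses first
    "10 - 4 - 3": evaluate("10 - 4 - 3"),      # left to right
    "7 / 2": evaluate("7 / 2"),                # integer division in SQLite
    "7 / 2.0": evaluate("7 / 2.0"),            # one real operand gives a real result
    "7 % 3": evaluate("7 % 3"),                # remainder
}
for expression, value in results.items():
    print(expression, "=", value)
```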
4.7.6 Copying parts of Tables

4.7.7 Deleting a table from a database
The SQL DROP TABLE Statement

The DROP TABLE statement is used to drop an existing table in a database.
Syntax
DROP TABLE table_name;
Note: Be careful before dropping a table. Deleting a table will result in loss of
complete information stored in the table!
SQL DROP TABLE Example

The following SQL statement drops the existing table "Shippers":
Example
DROP TABLE Shippers;
SQL TRUNCATE TABLE

The TRUNCATE TABLE statement is used to delete the data inside a table, but
not the table itself.
Syntax
TRUNCATE TABLE table_name;
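The difference between emptying a table and dropping it can be seen in the sketch below. It assumes Python's built-in sqlite3 module and an in-memory SQLite database. SQLite has no TRUNCATE TABLE statement, so DELETE without a WHERE clause stands in for it here; the table_exists helper and the shipper row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_exists(name):
    """Check SQLite's catalog for a table with this name (hypothetical helper)."""
    row = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (name,)).fetchone()
    return row is not None


conn.execute("CREATE TABLE Shippers (ShipperID INTEGER PRIMARY KEY, ShipperName TEXT)")
conn.execute("INSERT INTO Shippers VALUES (1, 'Shipper A')")

# Emptying the table removes the data but keeps the table definition.
conn.execute("DELETE FROM Shippers")
rows_left = conn.execute("SELECT COUNT(*) FROM Shippers").fetchone()[0]
exists_after_delete = table_exists("Shippers")

# Dropping the table removes the definition as well.
conn.execute("DROP TABLE Shippers")
exists_after_drop = table_exists("Shippers")

print(rows_left, exists_after_delete, exists_after_drop)  # 0 True False
```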
4.7.8 Primary and foreign key designation
SQL PRIMARY KEY Constraint

The PRIMARY KEY constraint uniquely identifies each record in a database table.
Primary keys must contain UNIQUE values, and cannot contain NULL values.
A table can have only one primary key, which may consist of single or multiple
fields.
SQL PRIMARY KEY on CREATE TABLE

The following SQL creates a PRIMARY KEY on the "ID" column when the
MySQL:
ID int NOT NULL,
Age int,
PRIMARY KEY (ID)
);
ID int NOT NULL PRIMARY KEY,
Age int
);
ID int NOT NULL,
Age int,
CONSTRAINT PK_Person PRIMARY KEY (ID,LastName)
);
Note: In the example above there is only ONE PRIMARY KEY (PK_Person).
However, the VALUE of the primary key is made up of TWO COLUMNS (ID +
LastName).
SQL PRIMARY KEY on ALTER TABLE

To create a PRIMARY KEY constraint on the "ID" column when the table is
already created, use the following SQL:
ALTER TABLE Persons
ADD PRIMARY KEY (ID);
ALTER TABLE Persons
ADD CONSTRAINT PK_Person PRIMARY KEY (ID,LastName);
Note: If you use the ALTER TABLE statement to add a primary key, the primary
key column(s) must already have been declared to not contain NULL values
(when the table was first created).
DROP a PRIMARY KEY Constraint

To drop a PRIMARY KEY constraint, use the following SQL:
MySQL:
ALTER TABLE Persons
DROP PRIMARY KEY;
ALTER TABLE Persons
DROP CONSTRAINT PK_Person;
SQL FOREIGN KEY Constraint

A FOREIGN KEY is a key used to link two tables together.
A FOREIGN KEY is a field (or collection of fields) in one table that refers to the
PRIMARY KEY in another table.
The table containing the foreign key is called the child table, and the table
containing the candidate key is called the referenced or parent table.
Look at the following two tables:
"Persons" table:
PersonID LastName FirstName
1 Hansen Ola
2 Svendson Tove
3 Pettersen Kari
"Orders" table:
OrderID OrderNumber PersonID
1 77895 3
2 44678 3
3 22456 2
4 24562 1
Notice that the "PersonID" column in the "Orders" table points to the "PersonID"
column in the "Persons" table.
The "PersonID" column in the "Persons" table is the PRIMARY KEY in the
"Persons" table.
The "PersonID" column in the "Orders" table is a FOREIGN KEY in the "Orders"
table.
The FOREIGN KEY constraint is used to prevent actions that would destroy links
between tables.
The FOREIGN KEY constraint also prevents invalid data from being inserted into
the foreign key column, because it has to be one of the values contained in the
table it points to.
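Both effects of the constraint can be observed with Python's built-in sqlite3 module, using the "Persons" and "Orders" rows listed above. This is a sketch under stated assumptions: SQLite is used only for convenience, the function name foreign_key_demo and the rejected order (OrderID 5, PersonID 42) are invented for illustration, and the PRAGMA line is specific to SQLite, which does not enforce foreign keys unless asked to.

```python
import sqlite3


def foreign_key_demo():
    """Try an orphan insert and a parent delete; report what the database did."""
    conn = sqlite3.connect(":memory:")
    # SQLite-specific: foreign key enforcement is off by default.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE Persons ("
        " PersonID int NOT NULL PRIMARY KEY,"
        " LastName varchar(255), FirstName varchar(255))"
    )
    conn.execute(
        "CREATE TABLE Orders ("
        " OrderID int NOT NULL PRIMARY KEY,"
        " OrderNumber int NOT NULL,"
        " PersonID int,"
        " FOREIGN KEY (PersonID) REFERENCES Persons(PersonID))"
    )
    conn.executemany(
        "INSERT INTO Persons VALUES (?, ?, ?)",
        [(1, "Hansen", "Ola"), (2, "Svendson", "Tove"), (3, "Pettersen", "Kari")],
    )
    conn.execute("INSERT INTO Orders VALUES (1, 77895, 3)")  # person 3 exists

    outcomes = {}
    try:
        # Hypothetical order for a person that does not exist.
        conn.execute("INSERT INTO Orders VALUES (5, 99999, 42)")
        outcomes["orphan_insert"] = "accepted"
    except sqlite3.IntegrityError:
        outcomes["orphan_insert"] = "rejected"
    try:
        # Person 3 is still referenced by order 1, so the link would break.
        conn.execute("DELETE FROM Persons WHERE PersonID = 3")
        outcomes["parent_delete"] = "accepted"
    except sqlite3.IntegrityError:
        outcomes["parent_delete"] = "rejected"
    conn.close()
    return outcomes


print(foreign_key_demo())
```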
SQL FOREIGN KEY on CREATE TABLE

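The following SQL creates a FOREIGN KEY on the "PersonID" column when the
"Orders" table is created:
MySQL:
CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    FOREIGN KEY (PersonID) REFERENCES Persons(PersonID)
);
SQL Server / Oracle / MS Access:
CREATE TABLE Orders (
    OrderID int NOT NULL PRIMARY KEY,
    OrderNumber int NOT NULL,
    PersonID int FOREIGN KEY REFERENCES Persons(PersonID)
);
To name a FOREIGN KEY constraint, or to define a FOREIGN KEY constraint on
multiple columns, use the following SQL syntax:
CREATE TABLE Orders (
    OrderID int NOT NULL,
    OrderNumber int NOT NULL,
    PersonID int,
    PRIMARY KEY (OrderID),
    CONSTRAINT FK_PersonOrder FOREIGN KEY (PersonID)
    REFERENCES Persons(PersonID)
);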
SQL FOREIGN KEY on ALTER TABLE

To create a FOREIGN KEY constraint on the "PersonID" column when the
"Orders" table is already created, use the following SQL:
ALTER TABLE Orders
ADD FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);
To name the FOREIGN KEY constraint, or to define it on multiple columns, use:
ALTER TABLE Orders
ADD CONSTRAINT FK_PersonOrder
FOREIGN KEY (PersonID) REFERENCES Persons(PersonID);
DROP a FOREIGN KEY Constraint

To drop a FOREIGN KEY constraint, use the following SQL:
MySQL:
ALTER TABLE Orders
DROP FOREIGN KEY FK_PersonOrder;
SQL Server / Oracle / MS Access:
ALTER TABLE Orders
DROP CONSTRAINT FK_PersonOrder;
4.8: More Complex Queries and SQL functions
4.8.1 Ordering a listing
The SQL ORDER BY Keyword

The ORDER BY keyword is used to sort the result-set in ascending or
descending order.
The ORDER BY keyword sorts the records in ascending order by default. To sort
the records in descending order, use the DESC keyword.
ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
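Demo Database
The examples below use the "Customers" table from the Northwind sample
database. Its columns include CustomerID, CustomerName, City and Country.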
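ORDER BY Example
The following SQL statement selects all customers from the "Customers" table,
sorted by the "Country" column:
Example
SELECT * FROM Customers
ORDER BY Country;
ORDER BY DESC Example

The following SQL statement selects all customers from the "Customers" table,
sorted DESCENDING by the "Country" column:
Example
SELECT * FROM Customers
ORDER BY Country DESC;
ORDER BY Several Columns Example

The following SQL statement selects all customers from the "Customers" table,
sorted by the "Country" and the "CustomerName" column. Rows with the same
Country are ordered among themselves by CustomerName:
Example
SELECT * FROM Customers
ORDER BY Country, CustomerName;
ORDER BY Several Columns Example 2

The following SQL statement selects all customers from the "Customers" table,
sorted ascending by the "Country" and descending by the "CustomerName"
column:
Example
SELECT * FROM Customers
ORDER BY Country ASC, CustomerName DESC;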
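The mixed ascending/descending sort can be checked with Python's built-in sqlite3 module. This is a minimal sketch rather than part of the Northwind material: the five rows are a hypothetical stand-in for the "Customers" table, and the Country values are assumptions chosen so that two customers share a country.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Country TEXT)"
)
# Hypothetical sample rows; two customers share the country 'Mexico'.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Alfreds Futterkiste", "Germany"),
        (2, "Ana Trujillo Emparedados y helados", "Mexico"),
        (3, "Antonio Moreno Taquería", "Mexico"),
        (4, "Around the Horn", "UK"),
        (5, "Berglunds snabbköp", "Sweden"),
    ],
)

# Country ascending first; within one country, CustomerName descending.
rows = conn.execute(
    "SELECT CustomerName, Country FROM Customers "
    "ORDER BY Country ASC, CustomerName DESC"
).fetchall()
sorted_names = [name for name, _country in rows]

for name, country in rows:
    print(country, "|", name)
```

Within "Mexico", "Antonio Moreno Taquería" is listed before "Ana Trujillo Emparedados y helados" because the second sort key is descending.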
4.8.2 Listing unique values
The SQL SELECT DISTINCT Statement

The SELECT DISTINCT statement is used to return only distinct (different)
values.
Inside a table, a column often contains many duplicate values; and sometimes
you only want to list the different (distinct) values.
SELECT DISTINCT Syntax

SELECT DISTINCT column1, column2, ...
FROM table_name;
Demo Database
The examples below use the "Customers" table from the Northwind sample
database, which has a "Country" column.
SELECT Example
The following SQL statement selects all (and duplicate) values from the
"Country" column in the "Customers" table:
Example
SELECT Country FROM Customers;
Now, let us use the DISTINCT keyword with the above SELECT statement and
see the result.
SELECT DISTINCT Examples

The following SQL statement selects only the DISTINCT values from the
"Country" column in the "Customers" table:
Example
SELECT DISTINCT Country FROM Customers;
The following SQL statement lists the number of different (distinct) customer
countries:
Example
SELECT COUNT(DISTINCT Country) FROM Customers;
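The three statements above can be compared side by side with Python's built-in sqlite3 module. The rows below are a hypothetical stand-in for the Northwind "Customers" table (the Country values are assumptions), so the sketch only illustrates the effect of DISTINCT, not the real data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Country TEXT)"
)
# Hypothetical sample rows; 'Mexico' appears twice on purpose.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Alfreds Futterkiste", "Germany"),
        (2, "Ana Trujillo Emparedados y helados", "Mexico"),
        (3, "Antonio Moreno Taquería", "Mexico"),
        (4, "Around the Horn", "UK"),
        (5, "Berglunds snabbköp", "Sweden"),
    ],
)

# Plain SELECT keeps the duplicate.
all_countries = [
    row[0]
    for row in conn.execute("SELECT Country FROM Customers ORDER BY CustomerID")
]
# DISTINCT collapses equal values into one row.
distinct_countries = [
    row[0]
    for row in conn.execute("SELECT DISTINCT Country FROM Customers ORDER BY Country")
]
# COUNT(DISTINCT ...) counts the different values.
distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT Country) FROM Customers"
).fetchone()[0]

print(all_countries)
print(distinct_countries)
print(distinct_count)
```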
4.8.3 Numeric functions in SQL
The table below lists the MySQL numeric functions.
Function   Description
ABS        Returns the absolute value of a number
ACOS       Returns the arc cosine of a number
ASIN       Returns the arc sine of a number
ATAN       Returns the arc tangent of a number or the arc tangent of n and m
ATAN2      Returns the arc tangent of n and m
AVG        Returns the average value of an expression
CEIL       Returns the smallest integer value that is greater than or equal to a number
CEILING    Returns the smallest integer value that is greater than or equal to a number
COS        Returns the cosine of a number
COT        Returns the cotangent of a number
COUNT      Returns the number of records in a select query
DEGREES    Converts a radian value into degrees
DIV        Used for integer division
EXP        Returns e raised to the power of number
FLOOR      Returns the largest integer value that is less than or equal to a number
GREATEST   Returns the greatest value in a list of expressions
LEAST      Returns the smallest value in a list of expressions
LN         Returns the natural logarithm of a number
LOG        Returns the natural logarithm of a number or the logarithm of a number to a specified base
LOG10      Returns the base-10 logarithm of a number
LOG2       Returns the base-2 logarithm of a number
MAX        Returns the maximum value of an expression
MIN        Returns the minimum value of an expression
MOD        Returns the remainder of n divided by m
PI         Returns the value of PI displayed with 6 decimal places
POW        Returns m raised to the nth power
POWER      Returns m raised to the nth power
RADIANS    Converts a value in degrees to radians
RAND       Returns a random number or a random number within a range
ROUND      Returns a number rounded to a certain number of decimal places
SIGN       Returns a value indicating the sign of a number
SIN        Returns the sine of a number
SQRT       Returns the square root of a number
SUM        Returns the summed value of an expression
TAN        Returns the tangent of a number
TRUNCATE   Returns a number truncated to a certain number of decimal places

4.8.4 Grouping Data
The SQL GROUP BY Statement

The GROUP BY statement is often used with aggregate functions (COUNT, MAX,
MIN, SUM, AVG) to group the result-set by one or more columns.
GROUP BY Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
Demo Database
The examples below use the "Customers" table from the Northwind sample
database, which has "CustomerID" and "Country" columns.
SQL GROUP BY Examples
The following SQL statement lists the number of customers in each country:
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
The following SQL statement lists the number of customers in each country,
sorted high to low:
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;
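The "customers per country, sorted high to low" query can be run with Python's built-in sqlite3 module. The rows are a hypothetical stand-in for the Northwind "Customers" table, and a second sort key (Country) is added here only so that countries with the same count come out in a predictable order; it is not part of the original query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, Country TEXT)"
)
# Hypothetical sample rows; only 'Mexico' has more than one customer.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Alfreds Futterkiste", "Germany"),
        (2, "Ana Trujillo Emparedados y helados", "Mexico"),
        (3, "Antonio Moreno Taquería", "Mexico"),
        (4, "Around the Horn", "UK"),
        (5, "Berglunds snabbköp", "Sweden"),
    ],
)

# One output row per country; COUNT is evaluated separately for each group.
per_country = conn.execute(
    "SELECT COUNT(CustomerID), Country "
    "FROM Customers "
    "GROUP BY Country "
    "ORDER BY COUNT(CustomerID) DESC, Country"
).fetchall()

for count, country in per_country:
    print(country, count)
```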
Demo Database
Below is a selection from the "Orders" table in the Northwind sample database:
OrderID CustomerID EmployeeID OrderDate ShipperID
10248 90 5 1996-07-04 3
10249 81 6 1996-07-05 1
10250 34 4 1996-07-08 2
And a selection from the "Shippers" table:
ShipperID ShipperName
1 Speedy Express
2 United Package
3 Federal Shipping
GROUP BY With JOIN Example

The following SQL statement lists the number of orders sent by each shipper:
Example
SELECT Shippers.ShipperName, COUNT(Orders.OrderID) AS NumberOfOrders
FROM Orders
LEFT JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID
GROUP BY ShipperName;
The SQL MIN() and MAX() Functions

The MIN() function returns the smallest value of the selected column.
The MAX() function returns the largest value of the selected column.
MIN() Syntax
SELECT MIN(column_name)
FROM table_name
WHERE condition;
MAX() Syntax
SELECT MAX(column_name)
FROM table_name
WHERE condition;
Demo Database
Below is a selection from the "Products" table in the Northwind sample
database:
ProductID  ProductName                   SupplierID  CategoryID  Unit
1          Chais                         1           1           10 boxes x 20 b...
2          Chang                         1           1           24 - 12 oz bottl...
3          Aniseed Syrup                 1           2           12 - 550 ml bot...
4          Chef Anton's Cajun Seasoning  2           2           48 - 6 oz jars
5          Chef Anton's Gumbo Mix        2           2           36 boxes
(The listing is cut off on the right; the "Price" column used in the examples
below is not shown.)

MIN() Example
The following SQL statement finds the price of the cheapest product:
Example
SELECT MIN(Price) AS SmallestPrice
FROM Products;
MAX() Example
The following SQL statement finds the price of the most expensive product:
Example
SELECT MAX(Price) AS LargestPrice
FROM Products;
The SQL COUNT(), AVG() and SUM() Functions

The COUNT() function returns the number of rows that matches a specified
criteria.
The AVG() function returns the average value of a numeric column.
The SUM() function returns the total sum of a numeric column.
COUNT() Syntax
SELECT COUNT(column_name)
FROM table_name
WHERE condition;
AVG() Syntax
SELECT AVG(column_name)
FROM table_name
WHERE condition;
SUM() Syntax
SELECT SUM(column_name)
FROM table_name
WHERE condition;
Demo Database
The COUNT() and AVG() examples below use the same selection from the
"Products" table shown in the MIN() and MAX() section above.
COUNT() Example
The following SQL statement finds the number of products:
Example
SELECT COUNT(ProductID)
FROM Products;
AVG() Example
The following SQL statement finds the average price of all products:
Example
SELECT AVG(Price)
FROM Products;
Demo Database
Below is a selection from the "OrderDetails" table in the Northwind sample
database:
OrderDetailID OrderID ProductID Quantity
1 10248 11 12
2 10248 42 10
3 10248 72 5
4 10249 14 9
5 10249 51 40
SUM() Example
The following SQL statement finds the sum of the "Quantity" fields in the
"OrderDetails" table:
Example
SELECT SUM(Quantity)
FROM OrderDetails;
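The aggregate functions can be applied to the five "OrderDetails" rows listed above using Python's built-in sqlite3 module. This is a sketch only: SQLite stands in for whichever database is used in class, and the final query, which combines SUM() with GROUP BY, is an added illustration rather than one of the original examples.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE OrderDetails ("
    " OrderDetailID INTEGER PRIMARY KEY,"
    " OrderID INTEGER, ProductID INTEGER, Quantity INTEGER)"
)
# The five rows shown in the selection from the "OrderDetails" table.
conn.executemany(
    "INSERT INTO OrderDetails VALUES (?, ?, ?, ?)",
    [
        (1, 10248, 11, 12),
        (2, 10248, 42, 10),
        (3, 10248, 72, 5),
        (4, 10249, 14, 9),
        (5, 10249, 51, 40),
    ],
)

# All five aggregates computed over the whole table in one statement.
row_count, total, average, smallest, largest = conn.execute(
    "SELECT COUNT(OrderDetailID), SUM(Quantity), AVG(Quantity),"
    " MIN(Quantity), MAX(Quantity) FROM OrderDetails"
).fetchone()

# Added illustration: the same SUM(), but computed once per order.
per_order = conn.execute(
    "SELECT OrderID, SUM(Quantity) FROM OrderDetails "
    "GROUP BY OrderID ORDER BY OrderID"
).fetchall()

print(row_count, total, average, smallest, largest)
print(per_order)
```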
4.8.5 Procurement planning
Definition
Procurement planning is the process of identifying and consolidating requirements and determining the
timeframes for their procurement with the aim of having them as and when they are required.
A good procurement plan describes the process for identifying and selecting
suppliers, contractors and consultants.
Legal Backing
Formulation and development of procurement plans is not just a good practice that must be embraced by
Procuring Entities but it is also a legal requirement.
Section 42 (1) of the Public Procurement Act No. 12 of 2008 mandates each procuring entity to plan its
procurements. In particular, the Act states that a procuring entity shall:
• aggregate its requirements wherever possible, both within the procuring entity and between procuring
entities, to obtain value for money and reduce procurement costs;
• make use of rate or running contracts wherever appropriate to provide an efficient, cost effective and
flexible means to procure goods, works and services that are required continuously or repeatedly over a
set period of time;
• avoid splitting of procurement to defeat the use of appropriate procurement methods; and
• integrate its expenditure programme with the procurement plan.
Further, Section 42 (2) of the Act states that procuring entities shall submit their procurement plans to the
Zambia Public Procurement Authority.
Steps in Preparing a Procurement Plan
• Assess/list the needs or requirements.
   o Collect the list of needs from the user departments
   o Research the local market for the prices and availability of goods
• Determine the quantities and estimated costs
• Determine when the requirements shall be needed for use
• Identify the inter-relationships between and among the requirements
• Consolidate similar requirements
• Identify appropriate procurement methods and processes
• Schedule lead times for each process
• Prepare an implementation table and/or a bar chart identifying key dates for each process
Importance of Procurement Planning

Procurement planning is important for the following reasons:
• It is one of the pre-requisites for successful implementation of projects;
• It limits the scope for non-compliance with agreed procurement procedures;
• It enhances transparency and predictability;
• It provides a good basis for monitoring; and
• It facilitates efficient and effective treasury management by spreading out annual procurement activities
consistent with the needs and resources available.
Consequences of Lack of Procurement Planning
• Delays in project implementation
• Inappropriate procurements
• Use of inappropriate procurement methods and procedures
• Increased packaging costs
Important Considerations for Procurement Planning
• Annual planning should be integrated with applicable budget processes and based on indicative or
approved budgets
• Procuring entities should revise and update their procurement plans, as appropriate, during the course
of each year
Maxims
• Good planning is 80% of the task completed
• Poor or no planning manifests in inefficiencies in the procurement function
• Failing to plan is planning to fail
4.8.6 Virtual Tables: Creating a view
SQL CREATE VIEW Statement

In SQL, a view is a virtual table based on the result-set of an SQL statement.
A view contains rows and columns, just like a real table. The fields in a view are
fields from one or more real tables in the database.
You can add SQL functions, WHERE, and JOIN statements to a view and present
the data as if the data were coming from one single table.
CREATE VIEW Syntax

CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Note: A view always shows up-to-date data! The database engine recreates the
data, using the view's SQL statement, every time a user queries a view.
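The "always up-to-date" behaviour can be observed with Python's built-in sqlite3 module. This is a minimal sketch under stated assumptions: the view is named CurrentProductList (without spaces or brackets), "Discontinued" is stored as 0 or 1 instead of Yes/No, and the three product rows and their flags are sample values chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products ("
    " ProductID INTEGER PRIMARY KEY, ProductName TEXT, Discontinued INTEGER)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [(1, "Chais", 0), (2, "Chang", 0)],
)

# The view stores only the query text, not a copy of the rows.
conn.execute(
    "CREATE VIEW CurrentProductList AS "
    "SELECT ProductID, ProductName FROM Products WHERE Discontinued = 0"
)


def current_products():
    """Query the view and return the product names it currently shows."""
    rows = conn.execute(
        "SELECT ProductName FROM CurrentProductList ORDER BY ProductID"
    )
    return [row[0] for row in rows]


before = current_products()

# Change the base table; the view itself is not touched.
conn.execute("INSERT INTO Products VALUES (3, 'Aniseed Syrup', 0)")
conn.execute("UPDATE Products SET Discontinued = 1 WHERE ProductName = 'Chang'")

after = current_products()
print(before)
print(after)
```

The second query reflects both changes because the view's SELECT statement is run again each time the view is queried.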
SQL CREATE VIEW Examples

If you have the Northwind database you can see that it has several views
installed by default.
The view "Current Product List" lists all active products (products that are not
discontinued) from the "Products" table. The view is created with the following
SQL:
CREATE VIEW [Current Product List] AS
SELECT ProductID, ProductName
FROM Products
WHERE Discontinued = No;
Then, we can query the view as follows:
SELECT * FROM [Current Product List];
Another view in the Northwind sample database selects every product in the
"Products" table with a unit price higher than the average unit price:
CREATE VIEW [Products Above Average Price] AS
SELECT ProductName, UnitPrice
FROM Products
WHERE UnitPrice > (SELECT AVG(UnitPrice) FROM Products);
We can query the view above as follows:
SELECT * FROM [Products Above Average Price];
Another view in the Northwind database calculates the total sale for each
category in 1997. Note that this view selects its data from another view called
"Product Sales for 1997":
CREATE VIEW [Category Sales For 1997] AS
SELECT DISTINCT CategoryName, Sum(ProductSales) AS CategorySales
FROM [Product Sales for 1997]
GROUP BY CategoryName;
We can query the view above as follows:
SELECT * FROM [Category Sales For 1997];
We can also add a condition to the query. Let's see the total sale only for the
category "Beverages":
SELECT * FROM [Category Sales For 1997]
WHERE CategoryName = 'Beverages';
SQL Updating a View
You can update a view by using the following syntax:
SQL CREATE OR REPLACE VIEW Syntax

CREATE OR REPLACE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Now we want to add the "Category" column to the "Current Product List" view.
We will update the view with the following SQL:
CREATE OR REPLACE VIEW [Current Product List] AS
SELECT ProductID, ProductName, Category
FROM Products
WHERE Discontinued = No;
SQL Dropping a View

You can delete a view with the DROP VIEW command.
SQL DROP VIEW Syntax

DROP VIEW view_name;
SQL Injection
SQL injection is a code injection technique that might destroy your database.
SQL injection is one of the most common web hacking techniques.

SQL injection is the placement of malicious code in SQL statements, via web
page input.
SQL in Web Pages

SQL injection usually occurs when you ask a user for input, like their
username/userid, and instead of a name/id, the user gives you an SQL
statement that you will unknowingly run on your database.
Look at the following example, which creates a SELECT statement by adding a
variable (txtUserId) to a select string. The variable is fetched from user input
(getRequestString):
Example
txtUserId = getRequestString("UserId");
txtSQL = "SELECT * FROM Users WHERE UserId = " + txtUserId;
The rest of this chapter describes the potential dangers of using user input in
SQL statements.
SQL Injection Based on 1=1 is Always True

Look at the example above again. The original purpose of the code was to
create an SQL statement to select a user, with a given user id.
If there is nothing to prevent a user from entering "wrong" input, the user can
enter some "smart" input like this:
UserId: 105 OR 1=1
Then, the SQL statement will look like this:
SELECT * FROM Users WHERE UserId = 105 OR 1=1;
The SQL above is valid and will return ALL rows from the "Users" table,
since OR 1=1 is always TRUE.
Does the example above look dangerous? What if the "Users" table contains
names and passwords?
The SQL statement above is much the same as this:
SELECT UserId, Name, Password FROM Users WHERE UserId = 105 or 1=1;
A hacker might get access to all the user names and passwords in a database,
by simply inserting 105 OR 1=1 into the input field.
SQL Injection Based on ""="" is Always True

Here is an example of a user login on a web site:
Username: John Doe
Password: myPass
Example
uName = getRequestString("username");
uPass = getRequestString("userpassword");
sql = 'SELECT * FROM Users WHERE Name ="' + uName + '" AND Pass ="' +
uPass + '"'
Result
SELECT * FROM Users WHERE Name ="John Doe" AND Pass ="myPass"
A hacker might get access to user names and passwords in a database by
simply inserting " OR ""=" into the user name or password text box:
User Name: " or ""="
Password: " or ""="
The code at the server will create a valid SQL statement like this:
Result
SELECT * FROM Users WHERE Name ="" or ""="" AND Pass ="" or ""=""
The SQL above is valid and will return all rows from the "Users" table, since OR
""="" is always TRUE.
SQL Injection Based on Batched SQL Statements

Most databases support batched SQL statements.
A batch of SQL statements is a group of two or more SQL statements,
separated by semicolons.
The SQL statement below will return all rows from the "Users" table, then delete
the "Suppliers" table.
Example
SELECT * FROM Users; DROP TABLE Suppliers
Look at the following example:
Example
txtSQL = "SELECT * FROM Users WHERE UserId = " + txtUserId;
And the following input:
User id: 105; DROP TABLE Suppliers
The valid SQL statement would look like this:
Result
SELECT * FROM Users WHERE UserId = 105; DROP TABLE Suppliers;
Use SQL Parameters for Protection
To protect a web site from SQL injection, you can use SQL parameters.
SQL parameters are values that are added to an SQL query at execution time,
in a controlled manner.
ASP.NET Razor Example

txtSQL = "SELECT * FROM Users WHERE UserId = @0";
db.Execute(txtSQL,txtUserId);
Note that parameters are represented in the SQL statement by a @ marker.
The SQL engine checks each parameter to ensure that it is correct for its
column. Parameter values are treated literally, and not as part of the SQL to
be executed.
Another Example
txtNam = getRequestString("CustomerName");
txtAdd = getRequestString("Address");
txtCit = getRequestString("City");
txtSQL = "INSERT INTO Customers (CustomerName,Address,City) Values(@0,@1,@2)";
db.Execute(txtSQL,txtNam,txtAdd,txtCit);
Examples
The following examples show how to build parameterized queries in some
common web languages.
SELECT STATEMENT IN ASP.NET:
sql = "SELECT * FROM Customers WHERE CustomerId = @0";
command = new SqlCommand(sql);
command.Parameters.AddWithValue("@0",txtUserID);
command.ExecuteReader();
INSERT INTO STATEMENT IN ASP.NET:
txtNam = getRequestString("CustomerName");
txtAdd = getRequestString("Address");
txtCit = getRequestString("City");
txtSQL = "INSERT INTO Customers (CustomerName,Address,City) Values(@0,@1,@2)";
command = new SqlCommand(txtSQL);
command.Parameters.AddWithValue("@0",txtNam);
command.Parameters.AddWithValue("@1",txtAdd);
command.Parameters.AddWithValue("@2",txtCit);
command.ExecuteNonQuery();
INSERT INTO STATEMENT IN PHP:
$stmt = $dbh->prepare("INSERT INTO Customers
(CustomerName,Address,City)
VALUES (:nam, :add, :cit)");
$stmt->bindParam(':nam', $txtNam);
$stmt->bindParam(':add', $txtAdd);
$stmt->bindParam(':cit', $txtCit);
$stmt->execute();
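The difference between string concatenation and a parameterized query can also be demonstrated in Python with the built-in sqlite3 module. This is a sketch under stated assumptions: SQLite uses a ? placeholder instead of the @0 marker shown above, the "Users" table is created in memory, and the second user row (Jane Roe) is a made-up value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Users (UserId INTEGER PRIMARY KEY, Name TEXT, Pass TEXT)")
conn.executemany(
    "INSERT INTO Users VALUES (?, ?, ?)",
    [(105, "John Doe", "myPass"), (106, "Jane Roe", "secret")],  # sample rows
)

malicious_input = "105 OR 1=1"

# UNSAFE: the input becomes part of the SQL text, so "OR 1=1" is executed.
unsafe_sql = "SELECT * FROM Users WHERE UserId = " + malicious_input
unsafe_rows = conn.execute(unsafe_sql).fetchall()

# SAFE: the input is passed separately and compared as one literal value.
safe_sql = "SELECT * FROM Users WHERE UserId = ?"
safe_rows = conn.execute(safe_sql, (malicious_input,)).fetchall()

# A legitimate id passed the same way still finds its row.
legit_rows = conn.execute(safe_sql, ("105",)).fetchall()

print(len(unsafe_rows), len(safe_rows), len(legit_rows))
```

With concatenation every row of "Users" is returned; with the placeholder the whole string "105 OR 1=1" is compared against UserId as a single value and matches nothing.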
4.8.7 SQL indexes
SQL CREATE INDEX Statement

The CREATE INDEX statement is used to create indexes in tables.
Indexes are used to retrieve data from the database more quickly than
otherwise. Users cannot see the indexes; they are just used to speed up
searches/queries.
Note: Updating a table with indexes takes more time than updating a table
without (because the indexes also need an update). So, only create indexes on
columns that will be frequently searched against.
CREATE INDEX Syntax

Creates an index on a table. Duplicate values are allowed:
CREATE INDEX index_name
ON table_name (column1, column2, ...);
CREATE UNIQUE INDEX Syntax

Creates a unique index on a table. Duplicate values are not allowed:
CREATE UNIQUE INDEX index_name
ON table_name (column1, column2, ...);
Note: The syntax for creating indexes varies among different databases.
Therefore: Check the syntax for creating indexes in your database.
CREATE INDEX Example

The SQL statement below creates an index named "idx_lastname" on the
"LastName" column in the "Persons" table:
CREATE INDEX idx_lastname
ON Persons (LastName);
If you want to create an index on a combination of columns, you can list the
column names within the parentheses, separated by commas:
CREATE INDEX idx_pname
ON Persons (LastName, FirstName);
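Whether a query actually uses an index can be inspected with Python's built-in sqlite3 module. This is a sketch under stated assumptions: EXPLAIN QUERY PLAN is a SQLite-specific command whose wording varies between versions (so the code only looks for the index name in it), and the unique index idx_unique_name is a hypothetical addition used to show that duplicates are refused.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (ID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?)",
    [(1, "Hansen", "Ola"), (2, "Svendson", "Tove"), (3, "Pettersen", "Kari")],
)


def plan_for_lastname_search():
    """Return SQLite's textual query plan for a search on LastName."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT * FROM Persons WHERE LastName = 'Hansen'"
    ).fetchall()
    return " ".join(row[-1] for row in rows)  # last column holds the description


plan_before = plan_for_lastname_search()  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_lastname ON Persons (LastName)")
plan_after = plan_for_lastname_search()   # the plan now names idx_lastname

# Hypothetical unique index: a repeated (LastName, FirstName) pair is refused.
conn.execute("CREATE UNIQUE INDEX idx_unique_name ON Persons (LastName, FirstName)")
try:
    conn.execute("INSERT INTO Persons VALUES (4, 'Hansen', 'Ola')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(plan_before)
print(plan_after)
print(duplicate_rejected)
```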
DROP INDEX Statement

The DROP INDEX statement is used to delete an index in a table.
MS Access:
DROP INDEX index_name ON table_name;
SQL Server:
DROP INDEX table_name.index_name;
DB2/Oracle:
DROP INDEX index_name;
MySQL:
ALTER TABLE table_name
DROP INDEX index_name;
4.8.8 Joining Database tables
SQL JOIN
A JOIN clause is used to combine rows from two or more tables, based on a related column
between them.
Let's look at a selection from the "Orders" table:
OrderID  CustomerID  OrderDate
10308    2           1996-0...
10309    37          1996-0...
10310    77          1996-0...
Then, look at a selection from the "Customers" table:
CustomerID  CustomerName
1           Alfreds Futterkiste
2           Ana Trujillo Emparedados y helados
3           Antonio Moreno Taquería
(Further columns are cut off in this listing.)

Notice that the "CustomerID" column in the "Orders" table refers to the "CustomerID" in the
"Customers" table. The relationship between the two tables above is the "CustomerID" column.
Then, we can create the following SQL statement (that contains an INNER JOIN), that selects
records that have matching values in both tables:
Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
and it will produce something like this:
OrderID CustomerName
10308 Ana Trujillo Emparedados y helados
10365 Antonio Moreno Taquería
10383 Around the Horn
10355 Around the Horn
10278 Berglunds snabbköp
Different Types of SQL JOINs

Here are the different types of the JOINs in SQL:
• (INNER) JOIN: Returns records that have matching values in both tables
• LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records from
the right table
• RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched records
from the left table
• FULL (OUTER) JOIN: Returns all records when there is a match in either the left or the right table

SQL INNER JOIN Keyword
The INNER JOIN keyword selects records that have matching values in both tables.
INNER JOIN Syntax
SELECT column_name(s)
FROM table1
INNER JOIN table2 ON table1.column_name = table2.column_name;
Demo Database
In this tutorial we will use the well-known Northwind sample database.
Below is a selection from the "Orders" table:
OrderID  CustomerID  EmployeeID  OrderDate
10308    2           7           1996-...
10309    37          3           1996-...
10310    77          8           1996-...
And a selection from the "Customers" table:
CustomerID  CustomerName                        ContactName     Address
1           Alfreds Futterkiste                 Maria Anders    Obere Str. 57
2           Ana Trujillo Emparedados y helados  Ana Trujillo    Avda. de la Constitu... 2222
3           Antonio Moreno Taquería             Antonio Moreno  Mataderos 2312

SQL INNER JOIN Example
The following SQL statement selects all orders with customer information:
Example
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
Note: The INNER JOIN keyword selects all rows from both tables as long as there is a match
between the columns. If there are records in the "Orders" table that do not have matches in
"Customers", these orders will not be shown!
JOIN Three Tables

The following SQL statement selects all orders with customer and shipper information:
Example
SELECT Orders.OrderID, Customers.CustomerName, Shippers.ShipperName
FROM ((Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID)
INNER JOIN Shippers ON Orders.ShipperID = Shippers.ShipperID);
SQL LEFT JOIN Keyword

The LEFT JOIN keyword returns all records from the left table (table1), and the matched records
from the right table (table2). The result is NULL from the right side, if there is no match.
LEFT JOIN Syntax
SELECT column_name(s)
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;
Note: In some databases LEFT JOIN is called LEFT OUTER JOIN.
Demo Database
This example uses the same selections from the "Customers" and "Orders" tables
of the Northwind sample database that are shown in the INNER JOIN section above.
SQL LEFT JOIN Example

The following SQL statement will select all customers, and any orders they might have:
Example
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID
ORDER BY Customers.CustomerName;
Note: The LEFT JOIN keyword returns all records from the left table (Customers), even if there
are no matches in the right table (Orders).
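The difference between INNER JOIN and LEFT JOIN shows up clearly on the three "Customers" and three "Orders" rows listed earlier, where only order 10308 belongs to a listed customer. The following is a minimal sketch using Python's built-in sqlite3 module; the tables are reduced to the columns needed for the join, which is a simplification made here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "Alfreds Futterkiste"),
        (2, "Ana Trujillo Emparedados y helados"),
        (3, "Antonio Moreno Taquería"),
    ],
)
# Customers 37 and 77 are not among the three customer rows above.
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(10308, 2), (10309, 37), (10310, 77)],
)

# INNER JOIN: only orders whose CustomerID has a match in Customers.
inner_rows = conn.execute(
    "SELECT Orders.OrderID, Customers.CustomerName "
    "FROM Orders "
    "INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID"
).fetchall()

# LEFT JOIN: every customer, with NULL (None in Python) where no order matches.
left_rows = conn.execute(
    "SELECT Customers.CustomerName, Orders.OrderID "
    "FROM Customers "
    "LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID "
    "ORDER BY Customers.CustomerName"
).fetchall()

print(inner_rows)
print(left_rows)
```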
SQL RIGHT JOIN Keyword
The RIGHT JOIN keyword returns all records from the right table (table2), and the matched
records from the left table (table1). The result is NULL from the left side, when there is no
match.
RIGHT JOIN Syntax
SELECT column_name(s)
FROM table1
RIGHT JOIN table2 ON table1.column_name = table2.column_name;
Note: In some databases RIGHT JOIN is called RIGHT OUTER JOIN.
Demo Database
Below is a selection from the "Orders" table:
OrderID  CustomerID  EmployeeID  OrderDate
10308    2           7           1996-...
10309    37          3           1996-...
10310    77          8           1996-...
And a selection from the "Employees" table:
EmployeeID  LastName   FirstName
1           Davolio    Nancy
2           Fuller     Andrew
3           Leverling  Janet
(Further columns are cut off in this listing.)
SQL RIGHT JOIN Example

The following SQL statement will return all employees, and any orders they might have
placed:
Example
SELECT Orders.OrderID, Employees.LastName, Employees.FirstName
FROM Orders
RIGHT JOIN Employees ON Orders.EmployeeID = Employees.EmployeeID
ORDER BY Orders.OrderID;
SQL FULL OUTER JOIN Keyword
The FULL OUTER JOIN keyword returns all records when there is a match in either the left
(table1) or the right (table2) table.
Note: FULL OUTER JOIN can potentially return very large result-sets!
FULL OUTER JOIN Syntax
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;
Demo Database
This example uses the same selections from the "Customers" and "Orders" tables
of the Northwind sample database that are shown in the INNER JOIN section above.
SQL FULL OUTER JOIN Example

The following SQL statement selects all customers, and all orders:
SELECT Customers.CustomerName, Orders.OrderID
FROM Customers
FULL OUTER JOIN Orders ON Customers.CustomerID=Orders.CustomerID
ORDER BY Customers.CustomerName;
A selection from the result set may look like this:
CustomerName
Alfreds Futterkiste
Ana Trujillo Emparedados y helados
Antonio Moreno Taquería

Note: The FULL OUTER JOIN keyword returns all the rows from the left table (Customers),
and all the rows from the right table (Orders). If there are rows in "Customers" that do not have
matches in "Orders", or if there are rows in "Orders" that do not have matches in "Customers",
those rows will be listed as well.
SQL Self JOIN
A self JOIN is a regular join, but the table is joined with itself.
Self JOIN Syntax

SELECT column_name(s)
FROM table1 T1, table1 T2
WHERE condition;
Demo Database
The example below uses the "Customers" table from the Northwind sample
database, which has "CustomerID", "CustomerName" and "City" columns.
SQL Self JOIN Example
The following SQL statement matches customers that are from the same city:
Example
SELECT A.CustomerName AS CustomerName1,
B.CustomerName AS CustomerName2, A.City
FROM Customers A, Customers B
WHERE A.CustomerID <> B.CustomerID
AND A.City = B.City
ORDER BY A.City;
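The self join can be tried out with Python's built-in sqlite3 module. This is a sketch under stated assumptions: the three rows and their City values are sample data chosen so that two customers share a city, a second sort key is added to make the row order predictable, and the variant with A.CustomerID < B.CustomerID is an addition that is not in the original example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY, CustomerName TEXT, City TEXT)"
)
# Sample rows; customers 2 and 3 are given the same city.
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Alfreds Futterkiste", "Berlin"),
        (2, "Ana Trujillo Emparedados y helados", "México D.F."),
        (3, "Antonio Moreno Taquería", "México D.F."),
    ],
)

# The table is listed twice under the aliases A and B and compared with itself.
both_directions = conn.execute(
    "SELECT A.CustomerName AS CustomerName1,"
    " B.CustomerName AS CustomerName2, A.City "
    "FROM Customers A, Customers B "
    "WHERE A.CustomerID <> B.CustomerID AND A.City = B.City "
    "ORDER BY A.City, A.CustomerName"
).fetchall()

# Variant: '<' instead of '<>' reports each pair of customers only once.
each_pair_once = conn.execute(
    "SELECT A.CustomerName, B.CustomerName, A.City "
    "FROM Customers A, Customers B "
    "WHERE A.CustomerID < B.CustomerID AND A.City = B.City "
    "ORDER BY A.City"
).fetchall()

print(both_directions)
print(each_pair_once)
```

With <> every matching pair appears twice, once in each direction; the customer from Berlin does not appear at all because no other customer shares that city.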
UNIT 5: EXPLAINING THE TRANSACTION MANAGEMENT CONCEPTS

5.1 Transaction Support
Understanding database transactions
A database transaction delimits a set of database operations (i.e. SQL statements) that are
processed as a whole.
Database operations included inside a transaction are committed or canceled as a single operation.
Figure 1. Database transaction
The database server is in charge of data concurrency and data consistency. Data concurrency
allows the simultaneous access of the same data by many users, while data consistency gives
each user a consistent view of the database.
Without adequate concurrency and consistency control, data can be changed improperly,
compromising integrity of your database. If you want to write applications that can work with
different kinds of database servers, you must adapt the program logic to the behavior of the
database servers, regarding concurrency and consistency management. This requires good
knowledge of multiuser database application programming, transactions, locking mechanisms,
isolation levels and wait mode. If you are not familiar with these concepts, carefully read the
documentation of each database server that covers this subject.
Usually, database servers set exclusive locks on rows that are modified or deleted inside a
transaction. These locks are held until the end of the transaction to control concurrent access to
that data. Some database servers implement row versioning (before modifying a row, the server
makes a copy of the original row). This technique allows readers to see a consistent copy of the
rows that are updated during a transaction not yet committed. When the isolation level is high
(REPEATABLE READ) or when using a SELECT FOR UPDATE statement, the database
server sets shared locks on fetched rows, to prevent other users from changing the rows fetched
by the reader. These locks are held until the end of the transaction. Some database servers allow
read locks to be held regardless of the transactions (WITH HOLD cursor option), but this is not a
standard.
Programs accessing the database can change transaction parameters such as the isolation level or
lock wait mode. To write portable applications, you must use a configuration that produces the
same behavior on every database engine.
The recommended programming pattern for transactions is as follows:
• The database must support transactions; this is usually the case.
• Transactions must be as short as possible (a few seconds).
• The isolation level must be at least COMMITTED READ.
• The wait mode for locks must be WAIT or WAIT n (lock timeout).
To write portable SQL applications, programmers use the BEGIN WORK, COMMIT WORK
and ROLLBACK WORK instructions described in this section to delimit transaction blocks and
define concurrency parameters with SET ISOLATION and SET LOCK MODE. These
instructions are part of the language syntax. At runtime, the database driver generates the
appropriate SQL commands to be used with the target database server. This allows you to use the
same source code for different kinds of database servers.
If you initiate a transaction with a BEGIN WORK statement, you must issue a COMMIT WORK
at the end of the transaction. If one of the SQL statements in the transaction fails, you typically
issue a ROLLBACK WORK to force the database server to cancel any modifications that the
transaction made to the database. If you do not issue a BEGIN WORK statement to start a
transaction, each statement executes within its own transaction. These single-statement
transactions do not require either a BEGIN WORK statement or a COMMIT WORK statement.
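The commit-or-rollback pattern described above can be sketched with Python's built-in sqlite3 module. Several things here are assumptions made for illustration: SQLite spells the instructions BEGIN, COMMIT and ROLLBACK (without WORK), the "Accounts" table, its CHECK constraint and the transfer function are hypothetical, and isolation_level=None is passed so that Python does not open transactions on its own.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Move amount from account src to dst as one transaction.

    Returns True if the transaction was committed, False if it was rolled back.
    """
    try:
        conn.execute("BEGIN")
        conn.execute(
            "UPDATE Accounts SET Balance = Balance + ? WHERE AccountID = ?",
            (amount, dst),
        )
        # This statement fails if it would make the source balance negative.
        conn.execute(
            "UPDATE Accounts SET Balance = Balance - ? WHERE AccountID = ?",
            (amount, src),
        )
        conn.execute("COMMIT")
        return True
    except sqlite3.Error:
        # Cancels the credit that had already been applied to dst.
        conn.execute("ROLLBACK")
        return False


# isolation_level=None: the module issues no implicit BEGIN or COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Accounts ("
    " AccountID INTEGER PRIMARY KEY,"
    " Balance INTEGER NOT NULL CHECK (Balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [(1, 100), (2, 50)])

first = transfer(conn, 1, 2, 30)    # both updates succeed and are committed
second = transfer(conn, 1, 2, 500)  # the debit fails, so the credit is undone
balances = conn.execute(
    "SELECT Balance FROM Accounts ORDER BY AccountID"
).fetchall()
print(first, second, balances)
```

After the failed second transfer the balances are the same as after the first one, which shows that the already executed credit was canceled together with the rest of the transaction.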
Recent database engines support transaction savepoints, which allow you to set markers in the
current transaction in order to roll back to a specific point without canceling the complete
transaction. The transaction savepoint instructions SAVEPOINT, ROLLBACK TO
SAVEPOINT and RELEASE SAVEPOINT are part of the language syntax and can be directly
used in the code.
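A savepoint can be tried out with Python's built-in sqlite3 module, which accepts the three savepoint instructions as plain SQL. This is a minimal sketch: the "Log" table, the savepoint name before_step_2 and the use of SQLite are assumptions made for illustration.

```python
import sqlite3

# isolation_level=None: transactions are controlled explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE Log (Entry TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO Log VALUES ('step 1')")

conn.execute("SAVEPOINT before_step_2")               # set a marker
conn.execute("INSERT INTO Log VALUES ('step 2')")
conn.execute("ROLLBACK TO SAVEPOINT before_step_2")   # undo step 2 only
conn.execute("RELEASE SAVEPOINT before_step_2")       # discard the marker

conn.execute("INSERT INTO Log VALUES ('step 3')")
conn.execute("COMMIT")                                # steps 1 and 3 are kept

entries = [
    row[0] for row in conn.execute("SELECT Entry FROM Log ORDER BY rowid")
]
print(entries)
```

Rolling back to the savepoint cancels only the work done after the marker; the transaction itself stays open and can still be committed.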
Some database servers do not support Data Definition Language (DDL) statements (like
CREATE TABLE) inside transactions, and some automatically commit the transaction when
such a statement is executed. Therefore, it is strongly recommended that you avoid DDL
statements inside transactions.
A transaction that processes many rows can exceed the limits that your operating system or the
database server configuration imposes on the maximum number of simultaneous locks. Include only a
limited number of SQL operations in each transaction block.
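One way to keep transaction blocks small is to commit after a fixed number of rows. The sketch below assumes Python's sqlite3 module; the item table, the insert_in_batches function and the batch size of 1000 are hypothetical choices for illustration.

```python
import sqlite3


def insert_in_batches(conn, rows, batch_size=1000):
    """Insert `rows`, using one short transaction per `batch_size` rows."""
    commits = 0
    for start in range(0, len(rows), batch_size):
        conn.execute("BEGIN")
        conn.executemany(
            "INSERT INTO item (n) VALUES (?)", rows[start:start + batch_size]
        )
        conn.execute("COMMIT")  # end the block so it never grows too large
        commits += 1
    return commits


conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE item (n INTEGER)")
commits = insert_in_batches(conn, [(i,) for i in range(2500)])
total = conn.execute("SELECT COUNT(*) FROM item").fetchone()[0]
```

The trade-off is that the load as a whole is no longer atomic: if a later batch fails, the earlier batches remain committed.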
5.1.1 Properties of Transaction
A transaction is a very small unit of a program and it may contain several low-level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation,
and Durability − commonly known as ACID properties − in order to ensure accuracy,
completeness, and data integrity.
• Atomicity − This property states that a transaction must be treated as an atomic unit, that
is, either all of its operations are executed or none are. There must be no state in a database
where a transaction is left partially completed. States should be defined either before the
execution of the transaction or after the execution, abortion, or failure of the transaction.
• Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.
• Durability − The database should be durable enough to hold all its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written to the disk, then that data will be
updated once the system comes back into action.
• Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that each transaction
will be carried out and executed as if it were the only transaction in the system. No
transaction will affect the existence of any other transaction.
Serializability
When multiple transactions are being executed by the operating system in a multiprogramming
environment, the instructions of one transaction may be interleaved with those of
some other transaction.
• Schedule − A chronological execution sequence of transactions is called a schedule. A
schedule can have many transactions in it, each comprising a number of
instructions/tasks.
• Serial Schedule − It is a schedule in which transactions are aligned in such a way that
one transaction is executed first. When the first transaction completes its cycle, then the
next transaction is executed. Transactions are ordered one after the other. This type of
schedule is called a serial schedule, as transactions are executed in a serial manner.
In a multi-transaction environment, serial schedules are considered as a benchmark. The
execution sequence of the instructions within a transaction cannot be changed, but the instructions
of two transactions can be interleaved in an arbitrary order. This does no harm if the two
transactions are mutually independent and work on different segments of data; but if
the two transactions work on the same data, then the results may vary. This ever-varying
result may bring the database to an inconsistent state.
To resolve this problem, we allow parallel execution of a transaction schedule if its transactions
are either serializable or have some equivalence relation among them.
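The effect of interleaving two transactions that work on the same data can be sketched in a few lines of Python. This is a toy simulation, not a DBMS: the run function, the step format and the two transactions T1 and T2 are hypothetical and exist only to show how the result varies with the schedule.

```python
def run(schedule, balance):
    """Execute (transaction, operation[, amount]) steps on one shared data item.

    Each transaction reads the shared balance into a private copy, changes
    the copy, and later writes the copy back.
    """
    local = {}
    for txn, op, *arg in schedule:
        if op == "read":
            local[txn] = balance
        elif op == "add":
            local[txn] += arg[0]
        elif op == "write":
            balance = local[txn]
    return balance


T1 = [("T1", "read"), ("T1", "add", -50), ("T1", "write")]  # withdraw 50
T2 = [("T2", "read"), ("T2", "add", 100), ("T2", "write")]  # deposit 100

serial_12 = run(T1 + T2, 200)
serial_21 = run(T2 + T1, 200)
# Both transactions read the old balance before either one writes.
interleaved = run([T1[0], T2[0], T1[1], T1[2], T2[1], T2[2]], 200)
```

Both serial schedules end with the same balance, while the interleaved schedule overwrites the withdrawal made by T1, so its result matches neither serial order.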
5.1.2 Database Architecture
Three-Level ANSI-SPARC Architecture

An early proposal for a standard terminology and general architecture for database systems was
produced in 1971 by the DBTG (Data Base Task Group) appointed by the Conference on Data
Systems and Languages (CODASYL, 1971). The DBTG recognized the need for a two-level
approach with a system view called the schema and user views called sub-schemas.
Here is the figure showing the ANSI-SPARC architecture of the database system:
The levels form a three-level architecture that includes an external, a conceptual, and an internal
level. The way users perceive the data is called the external level. The way the DBMS and the
operating system perceive the data is the internal level, where the data is actually stored using
data structures and files. The conceptual level provides both the mapping and the desired
independence between the external and internal levels.
What is Database Architecture?
A DBMS architecture depends on its design and can be of the following types:
• Centralized
• Decentralized
• Hierarchical
DBMS architecture can be seen as either single-tier or multi-tier. An n-tier architecture
splits the entire system into n related but independent modules that can be independently
customized, changed, altered, or replaced.
The architecture of a database system is very much influenced by the primary computer system
on which the database system runs. Database systems can be centralized, or client-server, where
one server machine executes work on behalf of multiple client machines. Database systems can
also be designed to exploit parallel computer architectures. Distributed databases span multiple
geographically separated machines.
The Three Tier Architecture
A 3-tier application is an application program that is structured into three major parts; each of
them is distributed to a different place or places in a network. These 3 divisions are as follows:
• The workstation or presentation layer
• The business or application logic layer
• The database and the programming related to managing it
5.2. Concurrency control
Definition
Concurrency control is a database management system (DBMS) concept that is used to address conflicts arising from
the simultaneous accessing or altering of data that can occur in a multi-user system. Concurrency control,
when applied to a DBMS, is meant to coordinate simultaneous transactions while preserving data integrity. [1]
In short, concurrency control is about controlling multi-user access to the database.
Illustrative Example
To illustrate the concept of concurrency control, consider two travelers who go to electronic kiosks at the same
time to purchase a train ticket to the same destination on the same train. There's only one seat left in the coach,
but without concurrency control, it's possible that both travelers will end up purchasing a ticket for that one
seat. However, with concurrency control, the database wouldn't allow this to happen. Both travelers would still
be able to access the train seating database, but concurrency control would preserve data accuracy and allow
only one traveler to purchase the seat.
This example also illustrates the importance of addressing this issue in a multi-user database. Obviously, one
could quickly run into problems with the inaccurate data that can result from several transactions occurring
simultaneously and writing over each other. The following section provides strategies for implementing
concurrency control.
Concurrency Control Locking Strategies

Pessimistic Locking: This concurrency control strategy involves keeping an entity in a database locked the
entire time it exists in the database's memory. [2] This limits or prevents users from altering the data entity that
is locked. There are two types of locks that fall under the category of pessimistic locking: write lock and read
lock. [2]
With write lock, everyone but the holder of the lock is prevented from reading, updating, or deleting the entity.
With read lock, other users can read the entity, but no one except for the lock holder can update or delete it. [2]
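Returning to the train ticket example, a pessimistic write lock can be sketched in Python with a threading.Lock standing in for the database lock. This is a simplified illustration: the SeatInventory class and its purchase method are hypothetical, and a real DBMS would lock rows or tables rather than an in-memory object.

```python
import threading


class SeatInventory:
    """One remaining seat, guarded by an exclusive (write) lock."""

    def __init__(self, seats_left=1):
        self.seats_left = seats_left
        self._lock = threading.Lock()

    def purchase(self):
        # Hold the lock for the whole check-then-update sequence so that no
        # other traveler can read or change the count in between.
        with self._lock:
            if self.seats_left > 0:
                self.seats_left -= 1
                return True
            return False


inventory = SeatInventory()
results = []
threads = [
    threading.Thread(target=lambda: results.append(inventory.purchase()))
    for _ in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Whichever traveler obtains the lock first gets the seat; the other one waits, then finds that no seat is left.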
Optimistic Locking: This strategy can be used when instances of simultaneous transactions, or collisions, are
expected to be infrequent. [2] In contrast with pessimistic locking, optimistic locking doesn't try to prevent the
collisions from occurring. Instead, it aims to detect these collisions and resolve them on the chance occasions
when they occur. [2]
Pessimistic locking provides a guarantee that database changes are made safely. However, it becomes less
viable as the number of simultaneous users or the number of entities involved in a transaction increases, because
the potential for having to wait for a lock to be released grows. [2]
Optimistic locking can alleviate the problem of waiting for locks to release, but then users have the potential to
experience collisions when attempting to update the database.
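A common way to implement optimistic locking is a version column that is checked at update time. The sketch below uses Python's sqlite3 module; the seat table, the version column and the helper functions are assumptions made for illustration, not a prescribed design.

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE seat (id INTEGER PRIMARY KEY, owner TEXT, version INTEGER NOT NULL)"
)
conn.execute("INSERT INTO seat VALUES (1, NULL, 0)")


def read_seat(conn, seat_id):
    """Read the row without taking any lock; remember the version seen."""
    return conn.execute(
        "SELECT owner, version FROM seat WHERE id = ?", (seat_id,)
    ).fetchone()


def try_claim(conn, seat_id, new_owner, seen_version):
    """Update only if nobody has changed the row since it was read."""
    cur = conn.execute(
        "UPDATE seat SET owner = ?, version = version + 1 "
        "WHERE id = ? AND version = ?",
        (new_owner, seat_id, seen_version),
    )
    return cur.rowcount == 1  # 0 rows updated means a collision was detected


# Both users read the same version before either of them writes.
_, version_seen_by_alice = read_seat(conn, 1)
_, version_seen_by_bob = read_seat(conn, 1)
alice_ok = try_claim(conn, 1, "alice", version_seen_by_alice)
bob_ok = try_claim(conn, 1, "bob", version_seen_by_bob)
final_owner, final_version = read_seat(conn, 1)
```

Nobody waits for a lock, but the second writer has to handle the detected collision, for example by reading the row again and retrying.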
Lock Problems:
Deadlock:
When dealing with locks, two problems can arise, the first of which is deadlock. Deadlock refers to a
particular situation where two or more processes are each waiting for another to release a resource, or more
than two processes are waiting for resources in a circular chain. Deadlock is a common problem in
multiprocessing where many processes share a specific type of mutually exclusive resource. Some computers,
usually those intended for the time-sharing and/or real-time markets, are often equipped with a hardware lock,
or hard lock, which guarantees exclusive access to processes, forcing serialization. Deadlocks are particularly
disconcerting because there is no general solution to avoid them.
A fitting analogy of the deadlock problem could be a situation like when you go to unlock your car door and
your passenger pulls the handle at the exact same time, leaving the door still locked. If you have ever been in a
situation where the passenger is impatient and keeps trying to open the door, it can be very frustrating.
Basically you can get stuck in an endless cycle, and since both actions cannot be satisfied, deadlock occurs.
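The circular chain of waiting described above can be detected by looking for a cycle in a wait-for graph, in which each transaction points to the transactions it is waiting on. The Python sketch below shows this detection technique only; the find_deadlock function and the transaction names are hypothetical, and it does not show how a particular DBMS chooses which transaction to roll back.

```python
def find_deadlock(waits_for):
    """Return the transactions that form a cycle in the wait-for graph, or None.

    `waits_for` maps each transaction to the transactions it is waiting on.
    """
    visiting, done = set(), set()

    def visit(node, path):
        if node in done:
            return None
        if node in visiting:  # reached a node already on the current path
            return path[path.index(node):]
        visiting.add(node)
        path.append(node)
        for nxt in waits_for.get(node, ()):
            cycle = visit(nxt, path)
            if cycle:
                return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for start in waits_for:
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None


circular = find_deadlock({"T1": ["T2"], "T2": ["T3"], "T3": ["T1"]})
no_cycle = find_deadlock({"T1": ["T2"], "T2": []})
```

Once a cycle is found, one transaction in it is rolled back so that the others can proceed.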
Livelock:
Livelock is a special case of resource starvation. A livelock is similar to a deadlock, except that the states of
the processes involved constantly change with regard to one another while never progressing. The general
definition only states that a specific process is not progressing. For example, the system keeps selecting the
same transaction for rollback causing the transaction to never finish executing. Another livelock situation can
come about when the system is deciding which transaction gets a lock and which waits in a conflict situation.
An illustration of livelock occurs when numerous people arrive at a four way stop, and are not quite sure who
should proceed next. If no one makes a solid decision to go, and all the cars just keep creeping into the
intersection afraid that someone else will possibly hit them, then a kind of livelock can happen.
Basic Timestamping:
Basic timestamping is a concurrency control mechanism that eliminates deadlock. This method doesn’t use
locks to control concurrency, so it is impossible for deadlock to occur. According to this method a unique
timestamp is assigned to each transaction, usually showing when it was started. This effectively allows an age
to be assigned to transactions and an order to be assigned. Data items have both a read-timestamp and a write-
timestamp. These timestamps are updated each time the data item is read or updated respectively.
Problems arise in this system when a transaction tries to read a data item which has been written by a younger
transaction. This is called a late read. This means that the data item has changed since the initial transaction
start time, and the solution is to roll back the transaction and restart it with a new timestamp. Another problem occurs when a
transaction tries to write a data item which has been read by a younger transaction. This is called a late write.
This means that the data item has been read by another transaction since the start time of the transaction that is
altering it. The solution for this problem is the same as for the late read problem: the transaction must be rolled
back and a new timestamp acquired.[3]
Adhering to the rules of the basic timestamping process allows the transactions to be serialized and a
chronological schedule of transactions can then be created. Timestamping may not be practical in the case of
larger databases with high levels of transactions. A large amount of storage space would have to be dedicated
to storing the timestamps in these cases.
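The late read and late write rules can be sketched in Python as follows. This is a minimal illustration under assumptions: timestamps are plain integers, and the Item class, the Rollback exception and the read and write functions are hypothetical names. The extra comparison against the write-timestamp in write is the usual textbook companion check and goes beyond the rules stated above.

```python
class Rollback(Exception):
    """Signals that the transaction must be rolled back and restarted."""


class Item:
    def __init__(self, value):
        self.value = value
        self.read_ts = 0   # largest timestamp of any transaction that read the item
        self.write_ts = 0  # timestamp of the transaction that last wrote the item


def read(item, ts):
    # Late read: the item was already written by a younger transaction.
    if ts < item.write_ts:
        raise Rollback(f"late read by transaction {ts}")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    # Late write: the item was already read by a younger transaction.
    # The write_ts comparison is an added assumption (see the lead-in).
    if ts < item.read_ts or ts < item.write_ts:
        raise Rollback(f"late write by transaction {ts}")
    item.value = value
    item.write_ts = ts


x = Item(10)
write(x, ts=2, value=20)      # transaction 2 writes x
try:
    read(x, ts=1)             # older transaction 1 arrives too late
    late_read = False
except Rollback:
    late_read = True
value_seen_by_3 = read(x, ts=3)
try:
    write(x, ts=2, value=99)  # x has since been read by younger transaction 3
    late_write = False
except Rollback:
    late_write = True
```

No locks are taken at any point, which is why deadlock cannot occur; the cost is that a rejected transaction has to start again.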
5.2.1 The need for concurrency control
Concurrency, or concurrent execution of transactions, is about executing multiple transactions
simultaneously. This is done by executing a few instructions of one transaction, then a few of the next, and
so on. This increases the transaction throughput (the number of transactions that can be
executed in a unit of time) of the system.
When you execute multiple transactions simultaneously, extra care should be taken to avoid
inconsistency in the results that the transactions produce. This care is mandatory especially when
two or more transactions are working (reading or writing) on the same database items (data
objects).
For example, one transaction is transferring money from account A to account B while the other
transaction is withdrawing money from account A. These two transactions should not be
permitted to execute in an interleaved fashion, as transactions that work on different data
items can. We need to execute such transactions serially (one after the other).
If we do not take care with concurrent transactions that deal with the same data items, we
end up with problems such as lost updates, dirty reads, and unrepeatable reads.
5.2.2 Serialization and Recoverability
Serializability in Database
A schedule S of n transactions is serializable if it is equivalent to some serial schedule of the same n
transactions. Every serializable schedule is consistent, i.e. it does not suffer from read-write (RW),
write-read (WR), or write-write (WW) anomalies.
The concept of serializability of schedules is used to identify which schedules are correct
when transaction executions have interleaving of their operations in the schedules. Serializable
schedules are always considered to be correct when concurrent transactions are executing.
Difference between serial schedule and a serializable schedule
• The main difference between a serial schedule and a serializable schedule is that a serial
schedule allows no concurrency, whereas a serializable schedule does.
• In a serial schedule, if there are two transactions to execute and no interleaving
of operations is permitted, then there are only two possible orders:
1. Execute all the operations of transaction T1 (in sequence) followed by all
the operations of transaction T2 (in sequence).
2. Execute all the operations of transaction T2 (in sequence) followed by all
the operations of transaction T1 (in sequence).
• In a serializable schedule, if there are two transactions executing at the same time and
interleaving of operations is allowed, there are many possible orders in
which the system can execute the individual operations of the transactions.
• In a serializable schedule, the concurrent execution must be equivalent to some serial
schedule, so that the schedule is considered correct even though the operations of its
transactions are interleaved.
Example of Serializable Schedule
Let us consider a schedule S.
What does the schedule S say?
• Read A after it is updated.
• Read B before it is updated.
Let us consider three schedules S1, S2, and S3. We have to check whether or not they are
serializable with respect to S.
Example of Serial Schedule :

Consider the above schedule S. The serial schedules will be
Relation Between Serializability and Recoverability :
When are two schedules equivalent?
There are three types of equivalence of schedules:
• Result equivalence
• Conflict equivalence
• View equivalence
Based on the types of equivalence, we define the types of serializability. Accordingly, there are
three types of serializability:
• Result serializable
• Conflict serializable
• View serializable
Result Equivalence and Result Serializable :
In result equivalence, the end result of a schedule depends heavily on the input to the schedule. The
final values are calculated for both schedules (the given one and the serial one) and checked for
equality.
Result serializability is not generally used because the check is a lengthy process.
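The result equivalence check described above can be sketched in Python by running the given schedule and every serial schedule on the same input and comparing the final values. This is a toy model under assumptions: the step format, the execute and result_serializable functions, and the two transactions (T1 multiplies A by 3, T2 multiplies A by 2) are hypothetical.

```python
from itertools import permutations


def execute(schedule, db):
    """Run (txn, op, item[, fn]) steps on a copy of `db`; return the final state."""
    db = dict(db)
    local = {}
    for txn, op, item, *fn in schedule:
        if op == "read":
            local[(txn, item)] = db[item]
        elif op == "update":
            local[(txn, item)] = fn[0](local[(txn, item)])
        elif op == "write":
            db[item] = local[(txn, item)]
    return db


def result_serializable(schedule, transactions, db):
    """True if, for this input, some serial order gives the same final state."""
    final = execute(schedule, db)
    return any(
        execute([step for t in order for step in transactions[t]], db) == final
        for order in permutations(transactions)
    )


transactions = {
    "T1": [("T1", "read", "A"), ("T1", "update", "A", lambda v: v * 3), ("T1", "write", "A")],
    "T2": [("T2", "read", "A"), ("T2", "update", "A", lambda v: v * 2), ("T2", "write", "A")],
}
t1, t2 = transactions["T1"], transactions["T2"]
interleaved = [t1[0], t2[0], t1[1], t1[2], t2[1], t2[2]]

equal_for_zero = result_serializable(interleaved, transactions, {"A": 0})
equal_for_five = result_serializable(interleaved, transactions, {"A": 5})
```

The same interleaved schedule matches a serial schedule when A starts at 0 but not when A starts at 5, which shows how strongly the outcome depends on the input and why the check would have to be repeated for every input.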
5.2.3 Locking Method
5.2.4 Dead Lock
5.2.5 Time Stamping
5.2.6 Multi version Time stamp Ordering
5.2.7 Optimistic Techniques
5.2.8 Granularity of Data Items
5.3 Database Recovery
5.3.1 The need for Recovery
5.3.2 Transactions and Recovery
5.3.3 Recovery Facilities
5.3.4 Recovery Techniques
METHODS OF TEACHING
• Lectures
• Group discussions
• Tutorials
• Exercises
ASSESSMENT METHODS
Written assignments, Presentations.
• Continuous Assessment 40%
• Final Examination 60%
Continuous Assessment will comprise:
• Test 20%
• Practical Communication Assignment 20%
PRESCRIBED READINGS
• Connolly T.M (2005) Database Systems: A Practical Approach to Design,
Implementation and Management, Addison Wesley Publications
• Fred R, Mary B (2005) Modern Database Management, Prentice Hall
Publications
RECOMMENDED READINGS
• Date C. J. (2004) An Introduction to Database Systems, Addison Wesley
Publications
