
1 module

A database management system (DBMS) is a software package with computer programs
that control the creation, maintenance, and the use of a database. It allows organizations to
conveniently develop databases for various applications by database administrators
(DBAs) and other specialists. A database is an integrated collection of data records, files,
and other database objects. A DBMS allows different user application programs to
concurrently access the same database. DBMSs may use a variety of database models,
such as the relational model or object model, to conveniently describe and support
applications. It typically supports query languages, which are in fact high-level
programming languages, dedicated database languages that considerably simplify writing
database application programs. Database languages also simplify the database
organization as well as retrieving and presenting information from it. A DBMS provides
facilities for controlling data access, enforcing data integrity, managing concurrency control,
recovering the database after failures and restoring it from backup files, as well as
maintaining database security.
Features of DBMS
Support for large amounts of data
Each DBMS is designed to support large amounts of data. It provides
special ways and means to store and manipulate large amounts of data.
Companies are trying to store more and more data. Some
of this data will have to be online (available at all times).
In most cases the amount of data that can be stored is not
actually constrained by the DBMS but by the
availability of the hardware. For example, Oracle can store terabytes of
data.
Data sharing, concurrency and locking
A DBMS also allows data to be shared by two or more users. The same
data can be accessed by multiple users at the same time; this is data
concurrency. However, when the same data is being manipulated at the
same time by multiple users, certain problems arise. To avoid these
problems, the DBMS locks data that is being manipulated, preventing two
users from modifying the same data at the same time.
The locking mechanism is transparent and automatic. We neither have
to inform the DBMS about locking nor need to know how and when
the DBMS is locking the data. However, as programmers, if we know the
intricacies of the locking mechanism used by the DBMS, we will be better
programmers.
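The locking behaviour described above can be observed with a small sketch. It assumes SQLite (through Python's standard sqlite3 module) as a stand-in DBMS; the accounts table and the demo_locking helper are made up for illustration, and other DBMSs lock at a different granularity (row, page or table) and usually make the second user wait rather than fail.

```python
import os
import sqlite3
import tempfile


def demo_locking():
    """Show that a second writer is refused while the first holds the write lock."""
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.db")
        # timeout=0: fail immediately instead of waiting for the lock.
        # isolation_level=None: we issue BEGIN/COMMIT ourselves.
        user1 = sqlite3.connect(path, timeout=0, isolation_level=None)
        user2 = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            user1.execute(
                "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)"
            )
            user1.execute("INSERT INTO accounts VALUES (1, 100)")

            user1.execute("BEGIN IMMEDIATE")  # user 1 takes the write lock
            user1.execute("UPDATE accounts SET balance = balance - 10 WHERE id = 1")
            try:
                user2.execute("BEGIN IMMEDIATE")  # user 2 tries to write as well
                events.append("user2 acquired lock")
            except sqlite3.OperationalError:
                events.append("user2 blocked")

            user1.execute("COMMIT")  # the lock is released here

            user2.execute("BEGIN IMMEDIATE")  # now it succeeds
            user2.execute("UPDATE accounts SET balance = balance - 5 WHERE id = 1")
            user2.execute("COMMIT")
            events.append("user2 updated after commit")

            balance = user2.execute(
                "SELECT balance FROM accounts WHERE id = 1"
            ).fetchone()[0]
        finally:
            user1.close()
            user2.close()
    return events, balance
```

Neither user asks for a lock by name; the DBMS takes and releases it as part of the transaction, which is what "transparent and automatic" means here.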
Data Security
While a DBMS allows data to be shared, it also ensures that data is
only accessed by authorized users. A DBMS provides the features needed to
implement security at the enterprise level. By default, the data of a
user cannot be accessed by other users unless the owner gives explicit
permission to other users to do so.
Data Integrity
Maintaining the integrity of the data is an important process. If data loses
integrity, it becomes unusable garbage. A DBMS provides means to
implement rules to maintain the integrity of the data. Once we specify
which rules are to be implemented, the DBMS makes sure that
these rules are always enforced.
Three integrity rules (discussed later in this chapter), domain, entity
and referential, are always supported by a DBMS.
Fault tolerance and recovery
A DBMS provides a great deal of fault tolerance. It continues to run in
spite of errors, if possible, allowing users to rectify the mistake in the
meantime.
A DBMS also allows recovery in the event of failure. For instance, if data
on the disk is completely lost due to disk failure, the data can still be
recovered to the point of failure if a proper backup of the data is
available.
Support for Languages
A DBMS supports a data access and manipulation language. The most
widely used data access language for RDBMS (relational database
management systems) is SQL. We will discuss more about RDBMS and
SQL later in this chapter.
A DBMS implementation of SQL is generally compliant with the SQL
standards set by ANSI, although most vendors add their own extensions.
Apart from supporting a non-procedural language like SQL to access
and manipulate data, a DBMS nowadays also provides a procedural
language for data processing. Oracle supports PL/SQL and SQL Server
provides T-SQL.
Entity and Attribute
An entity is any object that is stored in the database. Each entity is
associated with a collection of attributes. For example, if you take the
data of a training institute, student is an entity, as we store
information about each student in the database. Each student is
associated with certain values such as roll number, name, course
etc., which are called attributes of the entity.



Database characteristics:
supports multiple, concurrent users
one copy of data eliminates redundancy and avoids inconsistency
confidentiality, privacy, and security can be promoted
standards can be enforced, data quality ensured

When not to use a DBMS
Main costs of using a DBMS include the following.
high initial cost
(possibly) cost of extra hardware
cost of entering data
cost of training people to use DBMS
cost of maintaining DBMS
When a DBMS may be unnecessary
if access to data by multiple users is not required
if database and application are simple, well-defined, and not
expected to change
External, Conceptual and Physical Level
External level - an individual user's view of the organisation of
the data
Conceptual level - how the community views the organisation of
the data. Only one data model, conceptual schema.
Internal level - The internal view is the view about the actual physical
storage of data. Storage allocation, e.g. B-trees, hashing etc. Access
paths, e.g. specification of primary and secondary keys, indexes and
pointers.

Data Independence - The management of data at each level is
independent. An objective, not a feature.
- Physical level does not impact the conceptual level.
- Conceptual level does not impact the external level.

The DDL (data definition language) specifies how the data is related,
e.g., the schema. The schema is a description of the data. In terms of a
DBMS architecture the DDL involves the following components.
System Catalog - The schema is stored here.
DDL compiler - translates the DDL into actions.
Privileged Commands - Actions that only the DBA can do.
DML (data manipulation language) - a language to manipulate the data,
just a different kind of query language
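As a minimal sketch of the split between defining and manipulating data, the example below uses SQLite through Python's sqlite3 module. SQLite is only an assumed stand-in: its catalog table is called sqlite_master, while other DBMSs name and structure their system catalog differently. The students table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: describe the structure of the data (the schema).
conn.execute(
    "CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)

# The schema is recorded in the system catalog; in SQLite that is sqlite_master.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()

# DML: manipulate and query the data itself.
conn.execute("INSERT INTO students (rollno, name) VALUES (1, 'Asha')")
conn.execute("UPDATE students SET name = 'Asha K' WHERE rollno = 1")
rows = conn.execute("SELECT rollno, name FROM students").fetchall()
```

Here catalog holds the table name together with the stored definition text, and rows holds the data after the DML statements have run.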
Advantages of DBMS
Providing backup and recovery services.
Providing multiple interfaces to users.
Representing complex relationships among data.
Enforcing integrity constraints on the database.
Drawing inferences and actions using rules.
Data abstraction levels of a database system
Physical level: The lowest level of abstraction describes how the
data is actually stored. The physical level describes complex low-
level data structures in detail.
Logical level: The next higher level of abstraction describes what
data are stored in the database, and what relationships exist among
those data. The logical level thus describes an entire database in
terms of a small number of relatively simple structures. Although
implementation of the simple structures at the logical level may
involve complex physical level structures, the user of the logical level
does not need to be aware of this complexity. Database
administrators, who must decide what information to keep in a
database, use the logical level of abstraction.
View level: The highest level of abstraction describes only part of the
entire database. Even though the logical level uses simpler
structures, complexity remains because of the variety of information
stored in a large database. Many users of a database system do not
need all this information; instead, they need to access only a part of
the database. The view level of abstraction exists to simplify their
interaction with the system. The system may provide many views for
the same database.
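The view level can be sketched with an SQL view. The example assumes SQLite via Python's sqlite3 module; the students table, its fee_due column and the student_courses view are hypothetical names chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Logical level: the full table, as the DBA defines it.
    CREATE TABLE students (
        rollno  INTEGER PRIMARY KEY,
        name    TEXT NOT NULL,
        course  TEXT NOT NULL,
        fee_due INTEGER NOT NULL
    );
    INSERT INTO students VALUES (1, 'Asha', 'Oracle', 500);
    INSERT INTO students VALUES (2, 'Ravi', 'Java', 0);

    -- View level: users who only need names and courses never see fee details.
    CREATE VIEW student_courses AS
        SELECT rollno, name, course FROM students;
    """
)

cursor = conn.execute("SELECT * FROM student_courses ORDER BY rollno")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
```

Several such views can be defined over the same table, each exposing the part of the database one group of users needs.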

Three Tier Architecture
Three tier client server architecture is also known as multi-
tier architecture and signals the introduction of a middle tier
to mediate between clients and servers. The middle tier
exists between the user interface on the client side and the
database management system (DBMS) on the server side.
This third layer executes process management, which
includes implementation of business logic and rules. The
three tier model can accommodate hundreds of users. It
hides the complexity of process distribution from the user,
while being able to complete complex tasks through message
queuing, application implementation, and data staging, or the
storage of data before it is uploaded to the data warehouse.
As in two tiered architectures, the top level is the user system
interface (client) and the bottom level performs database
management. The database management level ensures data
consistency by using features like data locking and
replication. Data locking is also referred to as file or record
locking. This is a first-come, first-served DBMS feature used
to manage data and updates in a multi-user environment.
The first user to access a file or record locks it, denying any
other user access. It opens up again and becomes accessible
to other users once the update is complete.
The middle tier is also called the application server. It
contains centralized processing logic, which facilitates
management and administration. Localizing system
functionality in the middle tier makes it possible for
processing changes and updates to be made once and be
distributed throughout the network, available to both clients
and servers. Sometimes the middle tier is divided into two or
more units with different functions. This makes it a multi-
layer model.
For example, in web applications, the client side is usually
written in HTML, while the application servers are
usually written in C++ or Java. By using a scripting language
embedded in HTML, web servers act as translation layers
that allow for communication between the client and server
layers.
This layer receives requests from clients and generates
HTML responses after requesting the data from database servers.
Popular scripting languages include JavaScript, ASP (Active
Server Pages), JSP (JavaServer Pages), PHP (Hypertext
Preprocessor), Perl (Practical Extraction and Reporting
Language), and Python. One of the major benefits of three
tier architecture is the ability to partition software and drag
and drop modules onto different computers in a network.
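The separation of tiers can be sketched in a few functions. This is a toy, single-process illustration, assuming SQLite as the database tier; in a real deployment each tier would typically run on a different machine. All function names, the students table and the "cleared" rule are hypothetical.

```python
import sqlite3


# --- Data tier: the only code that talks to the DBMS -----------------------
def open_database():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT, fee_due INTEGER)"
    )
    conn.executemany(
        "INSERT INTO students VALUES (?, ?, ?)",
        [(1, "Asha", 500), (2, "Ravi", 0)],
    )
    return conn


def fetch_student(conn, rollno):
    return conn.execute(
        "SELECT rollno, name, fee_due FROM students WHERE rollno = ?", (rollno,)
    ).fetchone()


# --- Middle tier: business logic and rules ---------------------------------
def student_status(conn, rollno):
    row = fetch_student(conn, rollno)
    if row is None:
        return None
    _, name, fee_due = row
    # Hypothetical business rule: a student with no fee due is "cleared".
    return {"name": name, "status": "cleared" if fee_due == 0 else "fee pending"}


# --- Client tier: presentation only ----------------------------------------
def render_html(status):
    if status is None:
        return "<p>Student not found</p>"
    return f"<p>{status['name']}: {status['status']}</p>"
```

Because the rule lives only in student_status, changing it requires touching the middle tier once; neither the HTML rendering nor the SQL has to change.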




Data modeling is a process used to define and analyze data requirements needed to
support the business processes within the scope of corresponding information systems in
organizations. Therefore, the process of data modeling involves professional data modelers
working closely with business stakeholders, as well as potential users of the information
system. There are three different types of data models produced while progressing from
requirements to the actual database to be used for the information system [2]. The data
requirements are initially recorded as a conceptual data model which is essentially a set of
technology independent specifications about the data and is used to discuss initial
requirements with the business stakeholders. The conceptual model is then translated into
a logical data model, which documents structures of the data that can be implemented in
databases. To implement one conceptual data model may require multiple logical data
models.
Hierarchical Model
The hierarchical data model organizes data in a tree structure.
There is a hierarchy of parent and child data segments. This
structure implies that a record can have repeating information,
generally in the child data segments. Data is held in a series of
records, each of which has a set of field values attached to it. The
model collects all the instances of a specific record together as a
record type. These record types are the equivalent of tables in the
relational model, with the individual records being the equivalent of
rows. To create links between these record types, the hierarchical
model uses Parent Child Relationships. These are a 1:N mapping
between record types. This is done by using trees, which, like the set
theory used in the relational model, are "borrowed" from maths. For
example, an organization might store information about an employee,
such as name, employee number, department, salary. The organization
might also store information about an employee's children, such as
name and date of
Network
Data is stored along with pointers, which specify the
relationship between entities. This was used in Honeywell's
Integrated Data Store, IDS.
This model is complex. It is difficult to understand both the
way data is stored and the way data is manipulated. It is
capable of supporting many-to-many relationships between
entities, which the hierarchical model is not.
Relational
This stores data in the form of tables. A table is a collection
of rows and columns. We will discuss more about the
relational model in the next section.

2 module
Data mining is the principle of sorting through large amounts of
data and picking out relevant information. It is usually used by
business intelligence organizations, and financial analysts, but it is
increasingly used in the sciences to extract information from the
enormous data sets generated by modern experimental and
observational methods. It has been described as "the nontrivial
extraction of implicit, previously unknown, and potentially useful
information from data" and "the science of extracting useful
information from large data sets or databases".

Data mining is simply the task of extracting useful information from
recorded data. The captured data needs to be converted into
information and knowledge to become useful. Data mining is the
entire process of applying computer-based methodology, including
new techniques for knowledge discovery, from data. Data mining
identifies trends within data that go beyond simple analysis.

Weakness of Data Mining
Data mining relies on the use of real world data. This data is
extremely vulnerable to co-linearity precisely because data from the
real world may have unknown interrelations. An unavoidable
weakness of data mining is that the critical data that may explain the
relationships is never observed. For human generated data, an
alternative is an experiment based approach such as Choice
Modelling. Inherent correlations are either
controlled for or removed altogether through the construction of an
experimental design.

Uses of data mining
For example, a database of prescription drugs
taken by a group of people could be used to find combinations of
drugs exhibiting harmful interactions. Since any particular
combination may occur in only 1 out of 1000 people, a great deal of
data would need to be examined to discover such an interaction. A
project involving pharmacies could reduce the number of drug
reactions and potentially save lives. Unfortunately, there is also a
huge potential for abuse of such a database.
Essentially, data mining gives information that would not be available
otherwise. It must be properly interpreted to be useful. When the data
collected involve individual people, there are many questions
concerning privacy, legality, and ethics.
Data mining is most frequently used for CRM. Common goals are to
predict which people are most likely to: a) Be Acquired b) Be Cross-
Sold or Up-Sold c) Leave / Churn d) Be Retained, Saved, or Won
back

Superkey: It is an attribute or set of attributes that uniquely identifies
rows within a table; in other words, two distinct rows are always
guaranteed to have distinct superkeys. {Employee ID, Employee
Address, Skill} would be a superkey for the "Employees' Skills" table;
{Employee ID, Skill} would also be a superkey.

Candidate key: It is a minimal superkey, that is, a superkey for which
we can say that no proper subset of it is also a superkey. {Employee
ID, Skill} would be a candidate key for the "Employees' Skills" table.

Non-prime attribute: A non-prime attribute is an attribute that does
not occur in any candidate key. Employee Address would be a non-
prime attribute in the "Employees' Skills" table.

Primary key: Most DBMSs require a table to be defined as having a
single unique key, rather than a number of possible unique keys. A
primary key is a candidate key which the database designer has
designated for this purpose.
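The definitions of superkey and candidate key can be checked mechanically against sample rows. The sketch below is illustrative only: the rows are made up, the function names are hypothetical, and uniqueness within one sample does not prove a key, because a real key must hold for every row the table could ever contain.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def candidate_keys(rows, all_attrs):
    """Minimal superkeys: superkeys with no proper subset that is also a superkey."""
    found = []
    # Smaller attribute sets are tried first, so any superset of an
    # already-found key can be skipped as non-minimal.
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            if is_superkey(rows, attrs) and not any(
                set(key) <= set(attrs) for key in found
            ):
                found.append(attrs)
    return found


# Made-up sample rows for the "Employees' Skills" table.
ATTRS = ("Employee ID", "Employee Address", "Skill")
ROWS = [
    {"Employee ID": 1, "Employee Address": "12 Hill Road", "Skill": "Typing"},
    {"Employee ID": 1, "Employee Address": "12 Hill Road", "Skill": "Shorthand"},
    {"Employee ID": 2, "Employee Address": "9 Lake View", "Skill": "Carpentry"},
    {"Employee ID": 2, "Employee Address": "9 Lake View", "Skill": "Welding"},
    {"Employee ID": 3, "Employee Address": "12 Hill Road", "Skill": "Typing"},
]
```

For these rows, all three attributes together form a superkey, {Employee ID, Skill} is the only candidate key, and Employee Address is therefore the only non-prime attribute.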

Relational Database Management System (RDBMS)
A DBMS that is based on the relational model is called an
RDBMS. The relational model is the most successful of the three
models. Designed by E.F. Codd, the relational model is based on
the mathematical theory of sets and relations.
The relational model represents data in the form of a table. A table
is a two dimensional array containing rows and columns.
Each row contains data related to an entity such as a
student. Each column contains the data related to a single
attribute of the entity such as student name.
One of the reasons behind the success of the relational model is
its simplicity. It is easy to understand the data and easy to
manipulate it.
Another important advantage of the relational model,
compared with the remaining two models, is that it does not bind data
to the relationships between data items. Instead it allows you to
have dynamic relationships between entities using the values
of the columns.
Almost all database systems that are sold in the market
nowadays have either a complete or a partial implementation
of the relational model.
Figure 1 shows how data is represented in the relational model
and the terms used to refer to the various components
of a table. The following are the terms used in the relational
model.
Tuple / Row
A single row in the table is called a tuple. Each row represents the data of a single
entity.
Attribute / Column
A column stores an attribute of the entity. For example, if details of students are
stored then student name is an attribute, course is another attribute, and so on.
Column Name
Each column in the table is given a name. This name is used
to refer to the values in the column.
Table Name
Each table is given a name. This is used to refer to the table.
The name depicts the content of the table.
The following are two other terms, primary key and foreign
key, that are very important in the relational model.
Primary Key
A table contains the data related to entities. If you take the
STUDENTS table, it contains data related to students. For
each student there will be one row in the table. Each
student's data in the table must be uniquely identified. In
order to identify each entity uniquely in the table, we use a
column in the table. That column, which is used to uniquely
identify entities (students) in the table, is called the primary
key.
In the case of the STUDENTS table (see figure 1) we can use
ROLLNO as the primary key as it is not duplicated.
So a primary key can be defined as a set of columns used to
uniquely identify rows of a table.
Some other examples of primary keys are the account number
in a bank, the product code of a product, and the employee number
of an employee.
Composite Primary Key
In some tables a single column cannot be used to uniquely
identify entities (rows). In that case we have to use two or
more columns to uniquely identify rows of the table. When a
primary key contains two or more columns it is called a
composite primary key.
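A composite primary key might look like the following sketch, which assumes SQLite through Python's sqlite3 module. The marks table and its columns are hypothetical and do not appear in the text; the point is that neither column is unique on its own, but the pair is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE marks (
        rollno  INT  NOT NULL,
        subject TEXT NOT NULL,
        score   INT  NOT NULL,
        PRIMARY KEY (rollno, subject)  -- composite primary key
    )
    """
)
conn.execute("INSERT INTO marks VALUES (1, 'SQL', 80)")
conn.execute("INSERT INTO marks VALUES (1, 'Java', 70)")  # same rollno, new subject
conn.execute("INSERT INTO marks VALUES (2, 'SQL', 65)")   # same subject, new rollno

try:
    # Same (rollno, subject) pair as the first row: the key rejects it.
    conn.execute("INSERT INTO marks VALUES (1, 'SQL', 90)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM marks").fetchone()[0]
```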
Foreign Key
In the relational model, we often store data in different tables
and put them together to get complete information. For
example, in the PAYMENTS table we have only the ROLLNO of the
student. To get the remaining information about the student we
have to use the STUDENTS table. The roll number in the PAYMENTS
table can be used to obtain the remaining information about the
student. A column such as ROLLNO in PAYMENTS, which refers to
the primary key of another table, is called a foreign key.
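Putting the two tables together is done with a join on the roll number. The sketch below assumes SQLite via Python's sqlite3 module; STUDENTS, PAYMENTS and ROLLNO come from the text, while the remaining columns and all row values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (
        rollno INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        course TEXT NOT NULL
    );
    CREATE TABLE payments (
        rollno INTEGER NOT NULL REFERENCES students (rollno),  -- foreign key
        amount INTEGER NOT NULL
    );
    INSERT INTO students VALUES (1, 'Asha', 'Oracle'), (2, 'Ravi', 'Java');
    INSERT INTO payments VALUES (1, 500), (2, 300), (1, 250);
    """
)

# PAYMENTS holds only ROLLNO; name and course are looked up in STUDENTS.
payment_details = conn.execute(
    """
    SELECT p.rollno, s.name, s.course, p.amount
    FROM payments AS p
    JOIN students AS s ON s.rollno = p.rollno
    ORDER BY p.rollno, p.amount
    """
).fetchall()
```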
Integrity Rules
Data integrity is to be maintained at any cost. If data loses
integrity it becomes garbage. So every effort is to be made
to ensure data integrity is maintained. The following are the
main integrity rules that are to be followed.
Domain integrity
Data is said to contain domain integrity when the value of a
column is derived from the domain. Domain is the collection
of potential values. For example, column date of joining
must be a valid date. All valid dates form one domain. If the
value of date of joining is an invalid date, then it is said to
violate domain integrity.
Entity integrity
This specifies that all values in primary key must be not null
and unique. Each entity that is stored in the table must be
uniquely identified. Every table must contain a primary key
and primary key must be not null and unique.
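Entity integrity can be sketched as follows, assuming SQLite through Python's sqlite3 module. NOT NULL is written out explicitly because SQLite, for historical reasons, does not imply it for every PRIMARY KEY declaration; most other DBMSs do. The rejected helper and the sample names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE students (rollno INT NOT NULL PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute("INSERT INTO students VALUES (1, 'Asha')")


def rejected(sql, params):
    """Return True if the DBMS refuses the statement on integrity grounds."""
    try:
        conn.execute(sql, params)
        return False
    except sqlite3.IntegrityError:
        return True


# A second row with the same key would make two entities indistinguishable.
duplicate_key_rejected = rejected("INSERT INTO students VALUES (?, ?)", (1, "Ravi"))
# A row without a key could not be identified at all.
null_key_rejected = rejected("INSERT INTO students VALUES (?, ?)", (None, "Ravi"))
```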
Referential Integrity
This specifies that a foreign key must be either null or must
have a value that is derived from the corresponding parent key.
For example, if we have a table called BATCHES, then the
ROLLNO column of that table will be referencing the ROLLNO
column of the STUDENTS table. All the values of the ROLLNO
column of the BATCHES table must be derived from the ROLLNO
column of the STUDENTS table. This is because
no student who is not part of the STUDENTS table can join a
batch.
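The BATCHES example can be sketched with a foreign key constraint. This assumes SQLite via Python's sqlite3 module, where foreign key enforcement has to be switched on per connection with a PRAGMA; the batch_code column, the try_insert helper and the row values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only on request
conn.executescript(
    """
    CREATE TABLE students (rollno INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE batches (
        batch_code TEXT NOT NULL,
        rollno     INTEGER REFERENCES students (rollno)  -- foreign key, may be null
    );
    INSERT INTO students VALUES (1, 'Asha');
    """
)


def try_insert(batch_code, rollno):
    """Return True if the row was accepted, False if referential integrity refused it."""
    try:
        conn.execute("INSERT INTO batches VALUES (?, ?)", (batch_code, rollno))
        return True
    except sqlite3.IntegrityError:
        return False


known_student_ok = try_insert("B1", 1)       # value exists in STUDENTS
null_rollno_ok = try_insert("B1", None)      # null is permitted by the rule
unknown_student_ok = try_insert("B1", 99)    # no such student: refused
```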





3 module
Normalization is a technique for designing relational database
tables to minimize duplication of information and, in so doing, to
safeguard the database against certain types of logical or structural
problems, namely data anomalies. For example, when multiple
instances of a given piece of information occur in a table, the
possibility exists that these instances will not be kept consistent when
the data within the table is updated, leading to a loss of data integrity.
A table that is sufficiently normalized is less vulnerable to problems of
this kind, because its structure reflects the basic assumptions for
when multiple instances of the same information should be
represented by a single instance only.
Normalization typically involves decomposing an unnormalized table
into two or more tables that, were they to be combined (joined), would
convey exactly the same information as the original table.

Problems addressed by normalization
A table that is not sufficiently normalized can suffer from logical
inconsistencies of various types, and from anomalies involving data
operations. In such a table:
An update anomaly. Employee 519 is shown as having different
addresses on different records.
An insertion anomaly. Until the new faculty member is assigned to
teach at least one course, his details cannot be recorded.
A deletion anomaly. All information about Dr. Giddens is lost when
he temporarily ceases to be assigned to any courses.

A table is in first normal form (1NF) if and only if it faithfully
represents a relation. Given that database tables embody a relation-
like form, the defining characteristic of one in first normal form is that
it does not allow duplicate rows or nulls. Simply put, a table with a
unique key (which, by definition, prevents duplicate rows) and without
any nullable columns is in 1NF.

The criteria for second normal form (2NF) are:
The table must be in 1NF.
None of the non-prime attributes of the table are functionally
dependent on a part (proper subset) of a candidate key; in other
words, all functional dependencies of non-prime attributes on
candidate keys are full functional dependencies. For example, in an
"Employees' Skills" table whose attributes are Employee ID,
Employee Address, and Skill, the combination of Employee ID and
Skill uniquely identifies records within the table. Given that
Employee Address depends on only one of those attributes
(namely, Employee ID), the table is not in 2NF.
Note that if none of a 1NF table's candidate keys are composite
(i.e. every candidate key consists of just one attribute), then we can
say immediately that the table is in 2NF.
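The usual remedy for the "Employees' Skills" table is to split it so that the address is stored once per employee. The sketch below assumes SQLite via Python's sqlite3 module; the table and column names are one possible choice, and the rows are made up. It also checks that joining the two new tables gives back exactly the original information.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Not in 2NF: employee_address depends on employee_id alone,
    -- which is only part of the key (employee_id, skill).
    CREATE TABLE employees_skills (
        employee_id      INTEGER NOT NULL,
        employee_address TEXT NOT NULL,
        skill            TEXT NOT NULL,
        PRIMARY KEY (employee_id, skill)
    );
    INSERT INTO employees_skills VALUES
        (1, '12 Hill Road', 'Typing'),
        (1, '12 Hill Road', 'Shorthand'),
        (2, '9 Lake View', 'Typing');

    -- 2NF decomposition: one address per employee, skills kept separately.
    CREATE TABLE employees (
        employee_id      INTEGER PRIMARY KEY,
        employee_address TEXT NOT NULL
    );
    INSERT INTO employees
        SELECT DISTINCT employee_id, employee_address FROM employees_skills;

    CREATE TABLE skills (
        employee_id INTEGER NOT NULL REFERENCES employees (employee_id),
        skill       TEXT NOT NULL,
        PRIMARY KEY (employee_id, skill)
    );
    INSERT INTO skills SELECT employee_id, skill FROM employees_skills;
    """
)

original = conn.execute(
    "SELECT employee_id, employee_address, skill FROM employees_skills "
    "ORDER BY employee_id, skill"
).fetchall()
rejoined = conn.execute(
    "SELECT e.employee_id, e.employee_address, s.skill "
    "FROM employees AS e JOIN skills AS s ON s.employee_id = e.employee_id "
    "ORDER BY e.employee_id, s.skill"
).fetchall()
address_rows = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
```

After the split, an employee's address exists in exactly one row, so it can no longer be updated inconsistently.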

The criteria for third normal form (3NF) are:
The table must be in 2NF.
Every non-prime attribute of the table must be non-transitively
dependent on each candidate key. [7]
A violation of 3NF would mean
that at least one non-prime attribute is only indirectly dependent
(transitively dependent) on a candidate key. For example, consider
a "Departments" table whose attributes are Department ID,
Department Name, Manager ID, and Manager Hire Date; and
suppose that each manager can manage one or more departments.
{Department ID} is a candidate key. Although Manager Hire Date is
functionally dependent on the candidate key {Department ID}, this is
only because Manager Hire Date depends on Manager ID, which in
turn depends on Department ID. This transitive dependency means
the table is not in 3NF.
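The transitive dependency in the "Departments" table is usually removed by moving the manager's details into a table of their own. The sketch below assumes SQLite via Python's sqlite3 module; the table names departments_flat, departments and managers and all row values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Not in 3NF: manager_hire_date depends on manager_id,
    -- which in turn depends on department_id.
    CREATE TABLE departments_flat (
        department_id     INTEGER PRIMARY KEY,
        department_name   TEXT NOT NULL,
        manager_id        INTEGER NOT NULL,
        manager_hire_date TEXT NOT NULL
    );
    INSERT INTO departments_flat VALUES
        (10, 'Sales',   7, '2019-04-01'),
        (20, 'Support', 7, '2019-04-01'),
        (30, 'Finance', 8, '2021-09-15');

    -- 3NF decomposition: facts about a manager live in one place.
    CREATE TABLE managers (
        manager_id        INTEGER PRIMARY KEY,
        manager_hire_date TEXT NOT NULL
    );
    INSERT INTO managers
        SELECT DISTINCT manager_id, manager_hire_date FROM departments_flat;

    CREATE TABLE departments (
        department_id   INTEGER PRIMARY KEY,
        department_name TEXT NOT NULL,
        manager_id      INTEGER NOT NULL REFERENCES managers (manager_id)
    );
    INSERT INTO departments
        SELECT department_id, department_name, manager_id FROM departments_flat;
    """
)

# How many copies of manager 7's hire date exist before and after the split?
copies_before = conn.execute(
    "SELECT COUNT(*) FROM departments_flat WHERE manager_id = 7"
).fetchone()[0]
copies_after = conn.execute(
    "SELECT COUNT(*) FROM managers WHERE manager_id = 7"
).fetchone()[0]

original = conn.execute(
    "SELECT * FROM departments_flat ORDER BY department_id"
).fetchall()
rejoined = conn.execute(
    "SELECT d.department_id, d.department_name, d.manager_id, m.manager_hire_date "
    "FROM departments AS d JOIN managers AS m ON m.manager_id = d.manager_id "
    "ORDER BY d.department_id"
).fetchall()
```

In the flat table the hire date of a manager who runs two departments is stored twice; after the split it is stored once, and the join still reproduces the original rows.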








Database recovery
Numerous occurrences could lead to data loss: data or
database elements could be accidentally deleted; data could become
corrupted by the addition of bad data; hardware, such as a disk or
server, could fail; or disasters, such as flooding, could destroy your
server and storage media.
Since much time, effort, and money are usually invested in an
organization's data, it is unlikely that the loss of it would be a trivial
thing. For this reason, it is critical that you have a tested recovery plan
in place for your geodatabase. Note that the plan should be tested before
it is implemented: you can back up all the data you want, but if you
can't recover it, it is useless.
Backup and recovery strategy needs vary in accordance with your
specific situation. For ArcSDE geodatabases for SQL Server Express,
only simple backup and recovery are performed. A simple backup is a
full backup. Since ArcSDE geodatabases for SQL Server Express are
comparatively small and accessed by fewer users than ArcSDE
geodatabases on the other supported database management systems
(DBMS), it doesn't take as long to create full backup files, and they
can be done more frequently. To learn more about this type of
recovery model, see Simple backup and recovery.
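The idea of a tested full backup can be sketched in a few lines. This is not how ArcSDE or SQL Server Express back up a geodatabase; it assumes SQLite and the backup() method of Python's sqlite3 connections purely as a stand-in, and the parcels table and the backup_and_verify helper are hypothetical.

```python
import os
import sqlite3
import tempfile


def backup_and_verify():
    """Take a full backup of a live database, then prove it can be restored."""
    with tempfile.TemporaryDirectory() as tmp:
        backup_path = os.path.join(tmp, "full_backup.db")

        live = sqlite3.connect(":memory:")
        live.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT)")
        live.executemany(
            "INSERT INTO parcels VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")]
        )
        live.commit()

        # Full ("simple") backup: copy the whole database to a file.
        target = sqlite3.connect(backup_path)
        live.backup(target)
        target.close()

        # Simulate accidental data loss in the live database.
        live.execute("DELETE FROM parcels")
        live.commit()
        live.close()

        # The recovery test: open the backup and read the data back.
        restored = sqlite3.connect(backup_path)
        rows = restored.execute(
            "SELECT id, owner FROM parcels ORDER BY id"
        ).fetchall()
        restored.close()
    return rows
```

The last step is the important one: a backup only counts once reading the data back from it has actually been tried.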
