
UNIT 1

Concept of advanced database techniques


Advanced Database Techniques combines advanced techniques with practical advice and many new ideas, methods, and examples for database management students, system specialists, and programmers. It provides a wealth of technical information on database methods and encyclopedic coverage of advanced techniques that other current books on databases lack. An overview covers important definitions in the area of database management and describes such classical notions as file structures; conceptual, physical, and external schemas; and the relational, network, hierarchical, and entity-relationship models. The remaining chapters offer advanced techniques, methods, and practical advice for functional specification and system design of a database-oriented interactive application; database architecture with qualitative and quantitative optimizations; the prediction of loads and response times; data representation, packing, and protection; selection of data elements and structures in a database; practical extensions of relational theory to include dynamic relations and schemas, existence and processing constraints, and coroutines; software architectures (functional interface and decision machine); and open databases for robotics, image processing, CAD, and artificial intelligence.

Extended definitions are provided for conceptual schema, view, soft constraints and selection, relation, and dynamic schema, and an entire chapter is devoted to MSD, a new relational approach to specification and design. New software architectures for database applications are also covered. Advanced Database Techniques describes the 15 functions of a database management system and its internal mechanisms, and provides a complete product review of the DBMS ORACLE as well as advice on DBMS purchasing and database administration.

Impact of emerging database standards


Before a newly installed DBMS can be used effectively, standards
and procedures must be developed for database usage. Studies
have shown that companies with high levels of standardization
reduce the cost of supporting end users by 35 percent or more
compared to companies with low levels of standardization.
Standards are common practices that ensure the consistency and
effectiveness of the database environment, such as database
naming conventions. Procedures are defined, step-by-step
instructions that direct the processes required for handling
specific events, such as a disaster recovery plan. Failure to
implement database standards and procedures will result in a
database environment that is confusing and difficult to manage.
The DBA should develop database standards and procedures as
a component of corporate-wide IT standards and procedures.
They should be stored together in a central location as a printed
document, in an online format, or as both. Several vendors offer
“canned” standards and procedures that can be purchased for
specific DBMS products.

Architectures of Distributed DBMS (DDBMS)

The basic types of distributed DBMS architecture are as follows:

1. Client-server architecture of a distributed system.

• A client-server architecture has a number of clients and a few servers connected in a network.
• A client sends a query to one of the servers. The earliest available server solves it and replies.
• A client-server architecture is simple to implement and execute because of its centralized server system.
2. Collaborating server architecture.

• Collaborating server architecture is designed to run a single query on multiple servers.
• Servers break a single query into multiple small queries, and the result is sent to the client.
• Collaborating server architecture has a collection of database servers. Each server is capable of executing transactions across the databases.

3. Middleware architecture.

• Middleware architecture is designed in such a way that a single query is executed on multiple servers.
• This system needs only one server that is capable of managing queries and transactions spanning multiple servers.
• Middleware architecture uses local servers to handle local queries and transactions.
• The software used to execute queries and transactions across one or more independent database servers is called middleware.

New developments in database technology


An astounding array of new technologies and approaches has emerged on
the database scene over the past few years, promising to turn the next 12
months into a time of unprecedented transformation for the database
landscape. There are new developments, along with reinforcement of tried-
and-true technologies, some of which may help make the jobs of data
managers just a bit easier.

“Gone are the days of a terabyte of data sitting in a relational database
accessed by a few analysts using BI tools,” said Chris Doolittle, principal
consultant of Teleran. “Big data, IoT, specialized database platforms, AI and
machine learning, and the cloud are driving a generational transformation in
data management.”

There is still plenty of hard, brain-twisting, arm-twisting work ahead to get this
next generation of technologies into and aligned with organizations. “We are
on the cusp of an unprecedented intelligence revolution, and a lot of the
enabling technologies—cloud, machine learning, artificial intelligence, real
time databases, next-generation memory technologies—are already
available,” said Leena Joshi, VP of product marketing at Redis Labs. “What is
needed is for enterprises to develop stacks that can tie all the piece parts
together without generating layers of additional complexity.” This, more than
anything, describes the job of data managers in the year 2018.

Here are key developments that need to top data managers’ to-do lists in
terms of technology focus this year:

ANALYTICS WITH A PURPOSE


For a number of years, the goal of many enterprises—egged on by vendors
and analyst groups—was to find ways to disperse analytics across the
enterprise, a kind of “data democracy.” Now, it may be time to shift gears on
this vision, employing analytics not to empower single individuals, but to build
a collaborative culture. “The trend toward self-service analytics is not panning
out,” said Jon Pilkington, chief product officer at Datawatch. “Putting analytics
power in the hands of the business user was supposed to create agile
companies and deliver analytical, data-driven decisions. Instead, companies
are in worse shape than ever before. IT has lost control over data usage, and
analysts are working in silos, duplicating work efforts and experiencing a
severe lack of trust in their data and analytics outcomes.”

Pilkington urges data managers to move away from the self-service goal and
work toward more collaborative “team-based, enterprise data preparation and
analytics.” Such collaboration “will create a data-driven culture by bringing
analysts together for the common purpose of getting answers—answers that
are founded in the cross-business insights necessary to profoundly impact
operational processes and the bottom line. Teams will be able to create, find,
access, validate, and share governed, trustworthy datasets and models for
true enterprise collaboration and faster, more strategic decision making.”

ARTIFICIAL INTELLIGENCE TO IMPROVE ARTIFICIAL INTELLIGENCE


Artificial intelligence may go a long way in helping businesses understand and
predict their futures, but AI is only as good as the data feeding it. Ironically, AI
will help organizations achieve better AI results. “Many companies are faced
with challenges around whether their data is current, complete, and
consistent,” said Doug Rybacki, VP of product management at Conga. “When
you apply intelligent tools against data that is lacking in these components of
data quality, the result is disappointing and potentially misleading.” Ideally,
said Rybacki, intelligent tools must be used for data hygiene so that the larger
benefits of machine learning and artificial intelligence applications can be
realized.
Ultimately, of course, AI needs to deliver to the business. Doolittle sees more
intelligent data management solutions that “combine machine learning with
rule-based systems to watch and learn from changing data usage patterns
and user behaviors. They automatically create data management rules that
can automatically direct changes or actions to better serve changing business
demands. Examples include identifying resource-consuming user behaviors
that indicate a need for shifting data workloads to more appropriate or
cost-effective data platforms, or increasing use of more detailed data
indicating a demand for direct access to source data to improve analytical
outcomes and lower data handling costs.”

VIRTUAL ASSISTANTS
Another technology development that is fueled by data is the rise of virtual
assistants. The cutting-edge web companies are employing this type of
solution, and it is coming to mainstream enterprises as well. Google, for
example, is thriving with its Google Now virtual assistant, “which is only
getting smarter because of its ability to use available data from web
interactions to provide a personal experience for users,” observed Luc
Burgelman, CEO of NGDATA.

The most important part of creating virtual assistants will be the data and
having the data drive actions and decisions, Burgelman noted. “This means
considering all data—including real-time and behavioral data—and learning
from all channels to create connected experiences customers expect.
Powering these customer interactions through the understanding of all this
detail will be critical for companies.”

CLOUD ADVANCES
Cloud computing, which has been a major force in the IT and data
management space for close to a decade, continues to reshape database
technologies as well. Cloud is increasingly the home of “systems of insight”
that support advanced data analytics and artificial intelligence capabilities,
said Roman Stanek, CEO and founder at GoodData. In Stanek’s view, a
“unified technology platform in the cloud is the future of analytics and data in
the cloud.” Data in the cloud is growing rapidly, and there is no way to
manage that other than through a system of insight, he added. The industry is
facing a confluence of trends, he noted. These include data growing
exponentially and old BI failing, while advances in BI such as machine
learning and predictive analytics make it ripe to take off.
Not only will there be systems of insight in the cloud, but multiple clouds for
multiple use cases as well. Lately, there’s been movement to multi-cloud
strategies, especially as more applications and innovations open up. “Most
companies don’t set out to adopt a multi-cloud strategy,” said Jaspreet Singh,
CEO and founder of Druva. “Rather, they choose to work with cloud vendors
for specific use cases, and when we take a step back, we see a multi-cloud
implementation. In that regard, multi-cloud is not a strategy, it’s an outcome of
these decisions.”

BRING ON THE BLOCKCHAINS


Another technology that is seriously being explored by many enterprises is
blockchain—an online global database that stores and manages smart
contracts and transactions. Some observers see blockchain as the next great
frontier for data management. “Right now, we only hear about blockchain with
cryptocurrency and are starting to see emerging companies in the finance and
healthcare space discussing the value,” said Avani Desai, executive VP of
Schellman and Company.

The value may be in blockchain’s distributed nature—data is verified across
multiple nodes, and thus protected from tampering. “Blockchain provides you
100% assurance that a transaction was valid,” Desai observed. “It shows me
what was done, by whom, when, and maybe even the why. This provides
transparency and reconciliation, one of the most difficult aspects of a
distributed system.”

LATENCY BUSTERS
The move toward real-time computing and real-time enterprises is also
shaping the database technology landscape this year. At the same time, many
of the technologies with which data management teams are working may add
more latency into transactions and computing jobs. “While moving to real-time
is a trend, it competes directly with the move to microservices, distributed
logs, and asynchronous stream processing,” said John Hugg, founding
engineer and manager of developer outreach at VoltDB. “All of these things
can make our systems more resilient and dynamic, but they often compound
latency problems. Things that used to be a single network round trip might
become dozens of asynchronous network messages.”

Nowhere is the need for real-time and reduced latency felt more strongly than
in efforts to leverage the Internet of Things (IoT). Capturing data in real time,
tied to IoT, can be effective only with systems capable of cost-effectively
handling large data volumes with very low latencies, said Joshi of Redis Labs.
“Being able to implement adaptive applications powered by machine learning
in real time is a critical aspiration for most enterprises, but real-time databases
that can power such applications with built-in capabilities are most likely to
make these aspirations a reality.” Joshi added that another critical force in
making the data-driven enterprise a reality is the shift in hardware technology,
which puts more cost-effective memory such as flash within reach of
applications. “Datasets that can deliver the real-time performance of memory
but with the cost-effectiveness of flash are likely to create a competitive edge
for enterprises,” she said.

METADATA AND DATA CATALOGS


Despite all the hype and excitement about data-driven, AI-savvy enterprises,
there is a fundamental component of data management that managers are
beginning to embrace: keeping track of data assets and making them
discoverable to decision makers. With data streaming in from a wide variety of
internal and external sources, there needs to be a way to intelligently track,
archive, and identify what is available to decision makers and applications.
Metadata repositories and data catalogs are the way this can be achieved.
“People tend to focus on things like in-memory and other speed-and-feeds
sorts of metrics when they think about real-time technologies,” said Joe
Pasqua, executive VP at MarkLogic. “But that assumes you’ve got all the
relevant data in one place and you’re just trying to serve it up quickly. That’s
the easy part. The real enabler is making the data available in the first place.
This is made possible by a strong metadata solution to describe what and
where the data is, and a multi-model approach that allows access to the
varied shapes, sizes, and formats of the data across your organization,
including graphs, documents, rows, geospatial, and so on.”

Metadata is also key to the success of AI. “When AI can be leveraged
to automatically and accurately append metadata attributes to information, the
whole game changes,” said Greg Milliken, senior VP of marketing for M-Files.
“AI can automatically evaluate the file contents for specific terms like a
customer name, project, or case as well as the type or class of document—a
contract, invoice, project plan, financial report—and then apply those
metadata attributes to the file. This automatically initiates workflow processes
and establishes access permissions, so only authorized people can access
the information—such as the project team, the HR department, or those
managing a specific case.” The result, Milliken continued, “is a more intelligent
and streamlined information environment that not only ensures consistency in
how content is organized, but also that information is intelligently linked to
other relevant data, content, and processes to deliver a 360-degree view of
structured data and unstructured content residing in different business
systems.”

OPEN SOURCE PREVAILS


Open source technologies have emerged that support the emerging real-time
data center, said Marc Concannon, CTO of Clavis Insight. These include
Kafka for capturing and distributing incoming streaming data; NiFi for data
routing; Ignite for faster in-memory processing of the incoming data; Hadoop 2.0
for data access and storage; and Kubernetes for managing how we scale a
streaming infrastructure which is susceptible to bursts. “All of these
technologies are relatively new to our stack,” Concannon pointed out. “But
these technologies at their core are all about working with more and more
data and extracting the relevant insights from this data quicker and hence
making it available to our customers quicker.”

Hadoop “does enable a lot of tools which are focused on streaming, and it also
enables quicker access to the core insights on large datasets, which is not
really possible or would mean a long wait on more traditional technologies,”
said Concannon. “This, to me, is all about making more data available at
decision time.”

Introduction to PL/SQL
PL/SQL Database Objects
Procedures in PL/SQL
PL/SQL is a block-structured language that enables developers to
combine the power of SQL with procedural statements.
A stored procedure in PL/SQL is nothing but a series of declarative
SQL statements that can be stored in the database catalogue. A
procedure can be thought of as a function or a method. Procedures
can be invoked through triggers, other procedures, or applications
written in Java, PHP, etc.
All the statements of a block are passed to the Oracle engine at
once, which increases processing speed and decreases network traffic.
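As a quick illustration of the block structure, here is a minimal anonymous PL/SQL block (a sketch only; the output is visible when SERVEROUTPUT is enabled in the client):

DECLARE
    v_message VARCHAR2(50) := 'Hello from PL/SQL';
BEGIN
    -- procedural statement combined with a call to a built-in package
    dbms_output.put_line(v_message);
END;
/

The DECLARE, BEGIN and END keywords delimit the declarative and executable sections of the block, and the whole block is sent to the Oracle engine as a single unit.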
Advantages:
• They result in performance improvement of the application. If a
procedure is being called frequently in an application in a single
connection, then the compiled version of the procedure is
delivered.
• They reduce the traffic between the database and the
application, since the lengthy statements are already fed into
the database and need not be sent again and again via the
application.
• They add to code reusability, similar to how functions and
methods work in other languages such as C/C++ and Java.
Disadvantages:
• Stored procedures can cause a lot of memory usage. The
database administrator should decide an upper bound as to
how many stored procedures are feasible for a particular
application.
• MySQL does not provide the functionality of debugging the
stored procedures.

Syntax to create a stored procedure

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- Comments --

CREATE PROCEDURE procedure_name
    @parameter_1 datatype = default_value,
    @parameter_2 datatype = default_value,
    @parameter_3 datatype = default_value
AS
BEGIN
    -- Query --
END

GO

Example:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE GetStudentDetails


@StudentID int = 0
AS
BEGIN
SET NOCOUNT ON;
SELECT FirstName, LastName, BirthDate, City,
Country
FROM Students WHERE StudentID=@StudentID
END
GO
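Once created, the procedure can be invoked with a single call (a usage sketch; it assumes the Students table from the example exists and contains a row with StudentID 1):

EXEC GetStudentDetails @StudentID = 1;
GO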

Syntax to modify an existing stored procedure

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

-- Comments --

ALTER PROCEDURE procedure_name
    @parameter_1 datatype = default_value,
    @parameter_2 datatype = default_value,
    @parameter_3 datatype = default_value
AS
BEGIN
    -- Query --
END

GO

Example:
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE GetStudentDetails


@StudentID int = 0
AS
BEGIN
SET NOCOUNT ON;
SELECT FirstName, LastName, City
FROM Students WHERE StudentID=@StudentID
END
GO

Syntax to drop a Procedure:


DROP PROCEDURE procedure_name

Example:
DROP PROCEDURE GetStudentDetails

Functions in PL/SQL
A function can be used as part of a SQL expression, i.e., we can use
it with SELECT/UPDATE/MERGE statements. The most important
characteristic of a function is that, unlike a procedure, it must return
a value.
Syntax to create a function:
CREATE [OR REPLACE] FUNCTION function_name
[(parameter_name type [, ...])]

-- the RETURN clause is mandatory for functions
RETURN return_datatype
{IS | AS}

BEGIN
    -- program code

[EXCEPTION
    exception_section]

END [function_name];
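As a concrete sketch of this syntax, the function below returns the salary of a customer; it assumes the customers table (with columns id and salary) used in the packages example later in this unit:

CREATE OR REPLACE FUNCTION get_salary (c_id customers.id%TYPE)
RETURN customers.salary%TYPE
IS
    c_sal customers.salary%TYPE;
BEGIN
    -- fetch the salary for the given customer id and return it
    SELECT salary INTO c_sal FROM customers WHERE id = c_id;
    RETURN c_sal;
END get_salary;
/

Because it returns a value, the function can be used inside a SQL expression, for example: SELECT id, get_salary(id) FROM customers;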

Advantages:
1. We can make a single call to the database to run a block of
statements, which improves performance compared with running
the SQL statements one at a time. This reduces the number of
calls between the database and the application.
2. We can divide the overall work into small modules, which become
quite manageable and also enhance the readability of the code.
3. It promotes reusability.
4. It is secure, since the code stays inside the database, thus
hiding internal database details from the application (user). The
user only makes a call to the PL/SQL functions. Hence security
and data hiding are ensured.
Packages in PL/SQL

A package groups logically related PL/SQL types, variables, procedures, and functions into a single unit: a specification, which declares the public interface, and a body, which implements it. The following package cust_sal declares and implements a procedure find_sal:
CREATE PACKAGE cust_sal AS
    PROCEDURE find_sal(c_id customers.id%type);
END cust_sal;
/

CREATE OR REPLACE PACKAGE BODY cust_sal AS
    PROCEDURE find_sal(c_id customers.id%TYPE) IS
        c_sal customers.salary%TYPE;
    BEGIN
        SELECT salary INTO c_sal
        FROM customers
        WHERE id = c_id;
        dbms_output.put_line('Salary: '|| c_sal);
    END find_sal;
END cust_sal;
/
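The packaged procedure is invoked with its package-qualified name (a sketch; it assumes the customers table contains a row with id 1 and that SERVEROUTPUT is enabled):

BEGIN
    -- call the procedure declared in the cust_sal package specification
    cust_sal.find_sal(1);
END;
/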
Triggers in SQL
A trigger is a statement that the system executes automatically when
there is any modification to the database. In a trigger, we first
specify when the trigger is to be executed and then the action to be
performed when the trigger executes. Triggers are used to specify
certain integrity constraints and referential constraints that cannot
be specified using the constraint mechanism of SQL.
Example –
Suppose we are adding a tuple to the ‘Donors’ table, that is, some
person has donated blood. We can design a trigger that will
automatically add the value of the donated blood to the ‘Blood_record’
table.
Types of Triggers –
We can define 6 types of triggers for each table:
1. AFTER INSERT: activated after data is inserted into the table.
2. AFTER UPDATE: activated after data in the table is modified.
3. AFTER DELETE: activated after data is deleted/removed from
the table.
4. BEFORE INSERT: activated before data is inserted into the
table.
5. BEFORE UPDATE: activated before data in the table is
modified.
6. BEFORE DELETE: activated before data is deleted/removed
from the table.

Examples showing implementation of Triggers:


1. Write a trigger to ensure that no employee of age less than 25
can be inserted in the database.
delimiter $$
CREATE TRIGGER Check_age BEFORE INSERT ON employee
FOR EACH ROW
BEGIN
    IF NEW.age < 25 THEN
        SIGNAL SQLSTATE '45000'
        SET MESSAGE_TEXT = 'ERROR: AGE MUST BE ATLEAST 25 YEARS!';
    END IF;
END; $$
delimiter ;
Explanation: Whenever we want to insert any tuple into the table
‘employee’, then before inserting this tuple into the table, the trigger
named ‘Check_age’ will be executed. This trigger checks the age
attribute: if it is at least 25, the tuple is inserted into the table;
otherwise an error message is printed stating
“ERROR: AGE MUST BE ATLEAST 25 YEARS!”
2. Create a trigger which will work before deletion in the employee
table and create a duplicate copy of the record in another table
employee_backup.
Before writing the trigger, we need to create the table employee_backup:
create table employee_backup (employee_no int,
    employee_name varchar(40), job varchar(40),
    hiredate date, salary int,
    primary key(employee_no));
delimiter $$
CREATE TRIGGER Backup BEFORE DELETE ON employee
FOR EACH ROW
BEGIN
    INSERT INTO employee_backup
    VALUES (OLD.employee_no, OLD.name,
            OLD.job, OLD.hiredate, OLD.salary);
END; $$
delimiter ;
Explanation: We want to create a backup table that holds the values
of those employees who are no longer employees of the institution.
So we create a trigger named Backup that will be executed before the
deletion of any tuple from the table employee. Before deletion, the
values of all the attributes of the table employee are stored in the
table employee_backup.
3. Write a trigger to count the number of new tuples inserted using
each insert statement.
SET @count = 0;
delimiter $$
CREATE TRIGGER Count_tupples
AFTER INSERT ON employee
FOR EACH ROW
BEGIN
    SET @count = @count + 1;
END; $$
delimiter ;
Explanation: We want to keep track of the number of new tuples
in the employee table. For that, we first create a session variable
‘@count’ and initialize it to 0. After that, we create a trigger named
Count_tupples that will increment the value of @count after the
insertion of any new tuple into the table employee.
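To see the counter at work (a sketch using the session variable from the example above): initialize @count once in the session, run the INSERT statements, and then read the variable back:

SET @count = 0;
-- ... INSERT statements on the employee table go here ...
SELECT @count AS tuples_inserted;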
Programmatic SQL
Programmatic SQL is of two types:

1. Embedded SQL
2. Dynamic SQL

Embedded SQL
Embedded SQL is a method of inserting inline SQL statements
or queries into the code of a programming language, which is
known as a host language. Because the host language cannot
parse SQL, the inserted SQL is parsed by an embedded SQL pre-
processor.

Embedded SQL is a robust and convenient method of combining
the computing power of a programming language with SQL's
specialized data management and manipulation capabilities.

Embedded SQL is not supported by all relational database
management systems (RDBMS). Oracle DB and PostgreSQL
provide embedded SQL support. MySQL, Sybase and
SQL Server 2008 do not, although support was provided by
earlier versions of SQL Server (2000 and 2005).

The C programming language is commonly used for embedded
SQL implementation. For example, a commercial bank's
information system (IS) has a front-end user interface created in
the C language, and the IS interfaces with a back-end Oracle DB
database. One of the front-end interface modules allows quick
viewing and commission calculation for sales agents during
specified periods. An inefficient approach to handling this process
would be to store each commission value in a database table.
However, a more effective solution is to calculate and return
commission values based on unique user requests on specified
dates. The application accomplishes this by embedding a SQL
query within the C code, as follows:

SELECT 0.2*SALE_AMOUNT FROM TOTAL_SALES WHERE
SALE_DATE='MM/DD/YYYY' AND AGENT_NO=xx

In this example, the SQL statement calculates and returns 20
percent of the sale amount from a TOTAL_SALES table, while the
user is expected to input the SALE_DATE and AGENT_NO
values. This SQL query is then inserted inline into the C code of
the front-end module. The C code and SQL query work together
to deliver seamless user results.

Dynamic SQL
Dynamic Structured Query Language (SQL) is a SQL version that
facilitates the generation of dynamic (or variable) program
queries. Dynamic SQL allows a programmer to write code that
automatically adjusts to varying databases, environments, servers
or variables.

Dynamic SQL statements are not embedded in the source
program but stored as strings of characters that are manipulated
during a program's runtime. These SQL statements are either
entered by a programmer or automatically generated by the
program. This is the major difference between dynamic SQL and
static SQL statements. Dynamic SQL statements also may
change from one execution to the next without manual
intervention.

Dynamic SQL facilitates the automatic generation and manipulation
of program modules, so that automated, repeated tasks can be
prepared and performed efficiently.

Dynamic SQL facilitates the development of powerful applications
with the ability to create database objects for manipulation
according to user input. For example, a Web application may allow
parameters specifying a SQL query. Typical SQL queries
accommodate a few parameters. However, entering 10 or more
parameters often leads to highly complex SQL queries, especially
if a user is allowed to enter conditions (such as AND or OR)
between parameters.

Dynamic SQL increases processing and efficiency by running
simultaneous queries and distributing results from a single
interface query on multiple databases.

Early Oracle database versions with PL/SQL dynamic SQL
required programmers to use the complicated Oracle
DBMS_SQL package library. Later, a simpler "Native Dynamic
SQL" was introduced.

ODBC Standard
Open Database Connectivity (ODBC) is an interface standard for
accessing data and communicating with database systems,
regardless of the operating system (OS), database system (DS) or
programming language. This is accomplished by using ODBC
drivers that serve as a bridge between applications and database
systems.
In 1992, a group of manufacturers introduced the ODBC model as
a communications solution for the large number of OSs, DSs and
applications written in different programming languages. For
example, an application written in C to access an Oracle
database in UNIX had to be rewritten if the application changed to
Windows, or if the database platform was moved to Sybase.
These manufacturers recognized the need for an intermediate
translation mechanism and created a set of protocols and
application programming interfaces (APIs), which was the first
ODBC model.

The ODBC model contains the following three major components:

• Client (usually a programming application)
• Database Server
• ODBC Driver

The driver’s function, which is very similar to a human translator's,
is to bridge the gap between parties that would not otherwise
understand each other.

Parallel Database Architecture


Today everybody is interested in storing the information they have. Even
small organizations collect data and maintain mega databases. Though
databases eat space, they are really helpful in many ways; for example, they
help in taking decisions through a decision support system. Handling
such voluminous data through a conventional centralized system is a bit
complex: even simple queries become time consuming. The solution is to
handle those databases through Parallel Database Systems, where a
table/database is distributed among multiple processors, possibly equally,
to perform the queries in parallel. A system which shares resources to
handle massive data just to increase the performance of the whole system
is called a Parallel Database System.
We need certain architectures to handle the above. That is, we need
architectures which can handle data through data distribution and parallel
query execution, thereby producing good throughput of queries or
transactions. Figures 1, 2 and 3 show the different architectures proposed
and successfully implemented in the area of Parallel Database Systems. In
the figures, P represents Processors, M represents Memory, and D
represents Disks/Disk setups.

1. Shared Memory Architecture

Figure 1 - Shared Memory Architecture

In Shared Memory architecture, a single memory is shared among many
processors, as shown in Figure 1. Several processors are connected
through an interconnection network with the main memory and disk setup.
Here the interconnection network is usually a high speed network (it may
be a Bus, Mesh, or Hypercube) which makes data sharing (transporting)
easy among the various components (Processor, Memory, and Disk).

Advantages:

• Simple implementation
• Establishes effective communication between processors through a
single memory address space.
• The above point leads to less communication overhead.
Disadvantages:

• A higher degree of parallelism (a larger number of concurrent operations
in different processors) cannot be achieved, because all the processors
share the same interconnection network to connect with memory. This
causes a bottleneck in the interconnection network (interference),
especially in the case of a Bus interconnection network.

• Addition of a processor would slow down the existing processors.

• Cache coherency must be maintained. That is, if any processor tries to
read data used or modified by other processors, then we need to ensure
that the data is of the latest version.

• The degree of parallelism is limited. A larger number of parallel processes
might degrade the performance.

2. Shared Disk Architecture

Figure 2 - Shared Disk Architecture


In Shared Disk architecture, single disk or single disk setup is shared among
all the available processors and also all the processors have their own
private memories as shown in Figure 2.

Advantages:

• Failure of any processor does not stop the entire system (fault
tolerance).
• Interconnection to the memory is not a bottleneck. (It was a bottleneck in
Shared Memory architecture.)
• Supports a larger number of processors (when compared to Shared
Memory architecture).

Disadvantages:

• Interconnection to the disk is a bottleneck, as all processors share a
common disk setup.

• Inter-processor communication is slow. The reason is that all the
processors have their own memory. Hence, communication between
processors requires reading data from other processors’ memory, which
needs additional software support.

Example Real Time Shared Disk Implementation

• DEC clusters (VMScluster) running Rdb

3. Shared Nothing Architecture


Figure 3 - Shared Nothing Architecture

In Shared Nothing architecture, every processor has its own memory and
disk setup. This setup may be considered as a set of individual computers
connected through a high speed interconnection network, using regular
network protocols and switches, for example, to share data between
computers. (This architecture is used in Distributed Database Systems.)
In a Shared Nothing parallel database system implementation, we insist on
the use of similar nodes, that is, homogeneous systems. (In a distributed
database system we may use heterogeneous nodes.)

Advantages:

• The number of processors used here is scalable. That is, the design is
flexible enough to add more computers.
• Unlike in the other two architectures, only the data requests which cannot
be answered by local processors need to be forwarded through the
interconnection network.

Disadvantages:

• Non-local disk accesses are costly. That is, if one server receives a
request and the required data is not available locally, the request must be
routed to the server where the data is available. This is slightly complex.
• Communication cost is involved in transporting data among computers.

Example Real Time Shared Nothing Implementation


• Teradata
• Tandem
• Oracle nCUBE

Database System Structure


A database system is partitioned into modules for different
functions. Some functions (e.g. file systems) may be provided by
the operating system. Components include:
• File Manager: manages the allocation of disk space and the data
structures used to represent information on disk.
• Database Manager: the interface between low-level data and
application programs and queries.
• Query Processor: translates statements in a query language
into low-level instructions the database manager understands.
(It may also attempt to find an equivalent but more efficient form.)
The Query Processor simplifies and facilitates access to data. The
Query Processor includes the following components:
DDL Interpreter
DML Compiler
Query Evaluation Engine
The DDL interpreter interprets DDL statements and records the
definitions in the data dictionary. The DML compiler translates
DML statements in a query language into an evaluation plan
consisting of low-level instructions that the query evaluation
engine understands. The DML compiler also performs query
optimization, that is, it picks the lowest-cost evaluation plan from
among the alternatives. The query evaluation engine executes the
low-level instructions generated by the DML compiler.
• DML Precompiler: converts DML statements embedded in an
application program to normal procedure calls in a host language.
The precompiler interacts with the query processor.
• DDL Compiler: converts DDL statements to a set of tables
containing metadata stored in a data dictionary.
In addition, several data structures are required for physical system
implementation:
• Data Files: store the database itself.
• Data Dictionary: stores information about the structure of the
database. It is used heavily. Great emphasis should be placed on
developing a good design and an efficient implementation of the
dictionary.
• Indices: provide fast access to data items holding particular
values (see the sketch below).
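A small illustration of the last two structures (a sketch; Oracle-style data dictionary views are assumed, and Students is the table used in the stored-procedure examples earlier):

-- an index gives fast access to rows holding particular City values
CREATE INDEX idx_students_city ON Students (City);

-- the data dictionary itself can be queried like any other table
SELECT table_name FROM user_tables;

SELECT index_name, column_name
FROM   user_ind_columns
WHERE  table_name = 'STUDENTS';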

Storage Manager

The storage manager is important because databases typically
require a large amount of storage space, so it is very important to
use storage efficiently and to minimize the movement of data to
and from disk.
A storage manager is a program module that provides the interface
between the low-level data stored in the database and the
application programs and the queries submitted to the system. The
Storage manager is responsible for the interaction with the file
manager.
The Storage manager translates the various DML statements into
low level file system commands. Thus the storage manager is
responsible for storing, retrieving, and updating data in the
database. The storage manager components include the following.
• Authorization and Integrity Manager
• Transaction Manager
• File Manager
• Buffer Manager

The Authorization and Integrity Manager tests for the satisfaction of
integrity constraints and checks the authority of users to access
data. The Transaction Manager ensures that the database remains
in a consistent state while allowing concurrent transactions to
proceed without conflicts.
The file manager manages the allocation of space on disk storage
and the data structures used to represent information stored on
disk. The Buffer manager is responsible for fetching the data from
disk storage into main memory and deciding what data to cache in
main memory.
The storage manager implements the following data structures as
part of the physical system implementation: data files, data
dictionary, and indices. Data files store the database itself. The data
dictionary stores metadata about the structure of the database, in
particular the schema of the database. Indices provide fast access
to data items.

QUERY PROCESSOR
A query processor is one of the major components of a relational database or an
electronic database in which data is stored in tables of rows and columns. It
complements the storage engine, which writes and reads data to and from storage
media.
Basic Operation

A user, or an application program, interacts with the query processor, and the
query processor, in turn, interacts with the storage engine. Essentially, the query
processor receives an instruction or instructions written in Structured Query
Language (SQL), chooses a plan for executing the instructions, and carries out
the plan.

Optimization

The SQL syntax is transformed into a series of operations that can be performed on
data and its indices. The raw query plan, as it is known, is optimized to make it more
efficient before it is executed.
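One way to see the plan the query processor settles on is to ask the database to display it (a sketch; Oracle-style syntax is assumed, along with the Students table from the earlier examples and a default PLAN_TABLE):

EXPLAIN PLAN FOR
    SELECT FirstName, LastName FROM Students WHERE StudentID = 1;

-- display the optimized plan that was just generated
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);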

Separation

Effectively, a user specifies the result that he or she wants to achieve and the query
processor determines how the result is achieved. In this way, the query processor
separates the user from the unnecessary details of how a query is executed.
