ABSTRACT

The major issues in multi database system design are query processing and
query optimization. The query processing problem in large scale databases is
NP-hard. Heuristics and genetic algorithms are the preferred methods for solving
both the query processing and the query optimization problems in large scale database
systems.

The main idea of multiple query processing is to optimize a set of queries
together and execute the common operations once in order to save query execution time
and evaluation cost. The major tasks in multiple query processing are common
operation identification and global execution plan construction. Each query may have
several alternative evaluation plans, each with a different set of tasks. Therefore the
goal of multiple query processing is to choose the right set of plans for queries which
minimizes the total execution time by performing common tasks only once.
Minimization of query execution time is an important performance
objective in large scale databases. While total time is minimized for processing online
queries, response time is minimized for decision support queries. Thus
different allocations of sub queries and their execution plans are optimal depending on
the objective. To evaluate the objective functions, the sub query allocation problem
must be formulated along with its cost. Since the problem is NP-hard, it is solved using
a genetic algorithm. In general, query execution plans that minimize total time
are inefficient for the response time objective and vice versa. The genetic
procedure is tested with simulation experiments using complex queries.
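
The plan selection problem described above can be illustrated with a minimal sketch. It is not the procedure evaluated in this work: the task names, task costs, alternative plans, and genetic parameters below are all hypothetical, chosen only to show how a shared task is paid for once and how a simple genetic search picks one plan per query.

```python
import random

# Hypothetical toy instance (assumption): each query has alternative plans,
# each plan is a set of task names, and every task has a made-up cost.
TASK_COST = {"scan_A": 4, "scan_B": 5, "scan_C": 3, "join_AB": 6,
             "join_BC": 7, "join_AC": 8, "sort_A": 2}
QUERY_PLANS = [
    [{"scan_A", "scan_B", "join_AB"}, {"scan_A", "sort_A", "scan_B", "join_AB"}],
    [{"scan_B", "scan_C", "join_BC"}, {"scan_A", "scan_C", "join_AC"}],
    [{"scan_A", "scan_C", "join_AC"}, {"scan_A", "scan_B", "join_AB", "scan_C"}],
]

def total_cost(choice):
    """Cost of a global plan: every distinct task is counted only once."""
    tasks = set()
    for plans, idx in zip(QUERY_PLANS, choice):
        tasks |= plans[idx]
    return sum(TASK_COST[t] for t in tasks)

def genetic_search(generations=50, pop_size=12, mutation_rate=0.2, seed=0):
    """Search for a low-cost choice of one plan index per query."""
    rng = random.Random(seed)
    n = len(QUERY_PLANS)
    population = [[rng.randrange(len(p)) for p in QUERY_PLANS]
                  for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=total_cost)
        survivors = population[: pop_size // 2]      # keep the cheaper half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            cut = rng.randrange(1, n)                # one-point crossover
            child = a[:cut] + b[cut:]
            if rng.random() < mutation_rate:         # re-draw one query's plan
                q = rng.randrange(n)
                child[q] = rng.randrange(len(QUERY_PLANS[q]))
            children.append(child)
        population = survivors + children
    return min(population, key=total_cost)

best = genetic_search()
print(best, total_cost(best))
```

In this toy instance the cheapest global plan is not necessarily built from each query's individually cheapest plan, because a plan that reuses tasks already needed by other queries adds little to the total.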

Various techniques have been developed in the field of query processing for the
efficient and robust execution of queries. So far, this work has focused on issues
related to data-retrieval queries, with a strong grounding in relational algebra. However,
update operations can also exhibit a number of query processing issues, depending on the
complexity of the operations and the volume of data to process.

The objectives of multiple query processing are to increase system throughput
and decrease single query response time. The main aspects of multiple query processing
are to evaluate the common sub expressions during execution and to reduce
optimization overhead. To compare the total cost of the execution plan of a single query
with that of multiple queries, the search space of cost based processing, which contains
both single query and multi query plans, is evaluated. Optimizing multiple queries is one
of the key problems in multi query processing. Exploiting common operators can
increase the throughput, but may worsen the response time of some queries. After
the optimized sequential multi query execution plan is retrieved, the common operators
are usually dealt with according to the following steps.

(i) Divide the query into tasks, which are the basic units of scheduling and
execution.

(ii) Determine the parallelism degrees for each task.

(iii) Identify and mark the pseudo common operations and incomplete common
operations.

For integrated access to multiple databases, the data transmission cost and
the processing cost can usually be reduced by applying semi join algorithms.
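
The saving from a semi join can be sketched as follows. This is a generic textbook-style illustration, not the algorithm used in this work; the function name `semi_join_reduce`, the row format (lists of dictionaries), and the sample relations are assumptions made for the example.

```python
def semi_join_reduce(local_rows, remote_rows, key):
    """Join two relations held at different sites using a semi join.

    Only the join-key values of local_rows and the matching remote rows are
    'shipped'. Returns the joined rows and the number of remote rows sent.
    """
    # Step 1: project the join column at the local site and ship it.
    keys = {row[key] for row in local_rows}
    # Step 2: at the remote site, keep only rows that will find a partner.
    reduced = [row for row in remote_rows if row[key] in keys]
    # Step 3: ship the reduced relation back and finish the join locally.
    by_key = {}
    for row in reduced:
        by_key.setdefault(row[key], []).append(row)
    joined = [{**l, **r} for l in local_rows for r in by_key.get(l[key], [])]
    return joined, len(reduced)

# Hypothetical data: two orders at the local site, five customers remotely.
orders = [{"cust_id": 1, "item": "pen"}, {"cust_id": 3, "item": "ink"}]
customers = [{"cust_id": i, "name": f"c{i}"} for i in range(1, 6)]
joined, shipped = semi_join_reduce(orders, customers, "cust_id")
print(shipped, "of", len(customers), "remote rows transmitted")
```

Only the remote rows that actually participate in the join cross the network, which is where the reduction in transmission cost comes from.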

Data and task allocation are also important issues in multiple query processing. Data
allocation defines the type of data stored. A task, which is defined as a set of modules,
executes on one of the processing states and communicates with other modules of
the task through inter-module communication. It is a program or a part of a program in
execution. Task allocation is an essential phase in a multi database system. In a multi
database processing system, the software application is generally called a task and can
be defined as a set of cooperating modules. When resources are shared in the system,
data and operation allocation are closely interrelated and highly dependent on each other.
Modern database systems use a query optimizer to identify the most efficient strategy,
known as a plan, to execute declarative queries. Optimization is a mandatory exercise
since the difference between the cost of the best plan and that of a random choice can be
orders of magnitude. The role of query optimizers is especially critical for the decision
support queries featured in multi database applications. Usually the optimizer’s plan
choice is primarily a function of the selectivities of the base relations participating in
the query.

Multi database systems frequently execute a set of related queries that share
several common sub expressions, which can be exploited by finding evaluation plans that
share common results.
However, plans with pipelining may not always be realizable with limited buffer space.

Multiple processors can normally be used to improve the performance of
database systems, and parallelism can be viewed at three levels in query processing:
intra operation, inter operation, and inter query parallelism. Intra operation and inter
operation parallelism are together known as intra query parallelism. Inter query
parallelism is usually more effective in the case of multiple dependent queries. A
simulation study has been conducted, and global near optimal solutions are obtained
when the number of processors available is sufficient. Decision making queries, in
spite of their complexity, usually access a huge number of tuples that are involved in
joins between relations as well as in aggregations.

This work proposes to minimize the estimated cost of query plans
in multi query processing environments. The plans and the tasks from multiple query
processing environments are retrieved by applying genetic algorithm techniques.
The estimated cost of query plans, the degree coefficient between plans, and the CPU time
are also retrieved while evaluating the size of plans, the weight and fitness values, and
the average sum of plans in a multiple query processing environment. As this is an
NP-hard problem, genetic algorithm techniques are used to solve it. It is well understood
that modifications of traditional query processing techniques are applied in a
heterogeneous environment to produce optimized transformed queries. Large scale
database applications require dealing with larger queries. Initially,
for smaller query plans, it is observed that the estimated cost of
query plans remains similar within a specific range of query plans. The plan select value
depends strongly on the size of query plans. For the higher range of
query plans, it is found that the plan select value is directly proportional to the size of
query plans. It has also been seen that the size of a query plan is directly proportional to
the weight of the query plan. The fitness parameter of a query plan depends on the
weight of the query plan as well as the size of the query.

While retrieving the query plans and evaluating the degree coefficient between
plans in a multi dependent query processing environment, it is found that the size of a
query plan is directly proportional to the cost of the query plan. It is also observed that
the cost of query plans remains in a similar range for the high range of query plans. The
fitness parameter of a query plan is dependent on the weight of the plan.

While evaluating the cost of a plan and the functional value in a dynamic processing
environment, it is seen that the functional value is directly proportional to the resulting
plan value. The plan select value increases with the increasing size of query plans. In a
dynamic programming environment, techniques such as join enumeration have been
applied to augment the computation of expensive predicates. This approach uses several
optimizations to enumerate join orders, known as serial dynamic programming
enumeration. It first generates different query execution plans and then calls the plans.
It compares the query evaluation plans and joins the best query evaluation plans to form
larger query evaluation plans. The attribute join cost can be found by performing a
specific join implementation using join dependencies. The main idea is to create a hash
table with the two relations. It has been observed that the join dependencies
affect the attribute join cost, with increasing functional values obtained from the
relations.
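
The bottom-up enumeration described above, in which the best plans for small sets of relations are combined into plans for larger sets, can be sketched as follows. This is a generic dynamic programming join enumeration under stated assumptions, not the implementation evaluated in this work: the relation names, cardinalities, uniform selectivity, and the cost model (sum of intermediate result sizes) are all hypothetical.

```python
from itertools import combinations

# Hypothetical statistics (assumptions): base cardinalities and one uniform
# join selectivity applied to every join.
CARD = {"R": 1000, "S": 100, "T": 10}
SELECTIVITY = 0.01

def best_join_order(relations):
    """Return (cost, plan, rows) for the cheapest plan found bottom-up."""
    # best[subset] = (cost so far, plan tree, estimated output rows)
    best = {frozenset([r]): (0.0, r, float(CARD[r])) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            # Try every split of the subset into two smaller, solved parts.
            for k in range(1, size):
                for left in map(frozenset, combinations(sorted(subset), k)):
                    right = subset - left
                    lcost, lplan, lrows = best[left]
                    rcost, rplan, rrows = best[right]
                    rows = lrows * rrows * SELECTIVITY
                    cost = lcost + rcost + rows  # pay for each intermediate result
                    if subset not in best or cost < best[subset][0]:
                        best[subset] = (cost, (lplan, rplan), rows)
    return best[frozenset(relations)]

cost, plan, rows = best_join_order(["R", "S", "T"])
print(plan, cost, rows)
```

Under these made-up numbers, joining the two smallest relations first yields the smallest intermediate result, so the cheapest plan joins S and T before bringing in R.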
