Tuning SQL Queries for Better Performance in Management Information Systems Using Large Set of Data

Article · March 2007

2 authors:

Ion Lungu, Bucharest Academy of Economic Studies
Bâra Adela, Bucharest University of Economic Studies

All content following this page was uploaded by Bâra Adela on 03 February 2014.


Prof. univ. dr. Ion LUNGU


Asist. univ. drd. Adela BÂRA
Academy of Economic Studies, Bucharest, Romania

ABSTRACT
In order to improve the quality of Management Information Systems in an organization, we can choose to build the system using Business Intelligence techniques such as OLAP and data warehousing, or by using traditional reports based on SQL queries. The cost and development time of BI tools are greater than those of SQL reports, and these factors are important when deciding which type of technique to use for an MIS.
This paper presents some of the optimization methods that are used to improve the performance and reduce the response time of SQL queries.
Keywords: Management Information Systems (MIS), Structured Query Language (SQL),
tuning and optimization, SQL query plans.

1. INTRODUCTION
The purpose of Management Information Systems is to assist tactical-level managers in making decisions and to provide representative information in real time to support their activities, such as analyzing departmental data and planning and forecasting activities for their decision area.
MIS users can manage and manipulate large sets of data in a short period of time or in real time. In essence, managers at every departmental level can have a customized view that extracts information from transactional sources and summarizes it into meaningful indicators. MIS gather data from the ERP systems implemented in an organization, from different functional areas such as financials, inventory, purchasing, order management and production. Information from these functional areas within an ERP system is managed by a relational database management system such as Oracle Database. In the table below we present the main differences between ERP reports and MIS reports [LIBA01] (table 1):

Electronic copy of this paper is available at: http://ssrn.com/abstract=967687


CHARACTERISTICS | ERP REPORTS | MIS REPORTS
Objectives | Analyse indicators that measure current and internal activities, or daily reports | Process optimization, forecasting of internal data
Level of decision | Operational/Medium | Medium
User involved | Operational level of management | Tactical level of management
Data management | Relational databases/Datawarehouse | Relational databases/Datawarehouse
Typical operation | Report/Analyse | Analyse
Number of records/transactions | Limited | Large
Data orientation | Record | Record/Cube
Level of detail | Detailed, summarised, pre-aggregated | Aggregated
Age of data | Current | Historical/current/prospective
Table 1: A comparison of ERP and MIS systems.
OLTP applications usually operate on relatively few rows at a time. In this case an index can point to the required rows, and Oracle can construct an accurate plan to access those rows efficiently through the shortest possible path. In decision support system (DSS) environments such as MIS, selectivity is less important, because queries often access most of a table's rows. In such situations full table scans are common and indexes are not even used [ORA01].
Management Information Systems usually work with large sets of data and require a short response time. If you decide not to use analytical tools such as OLAP and data warehousing techniques, you have to build the system through SQL queries and retrieve data directly from OLTP systems. In this case, the large amount of data in ERP systems may increase the response time of the MIS. That is why the queries should be written using the best optimization techniques.


2. COMPUTE STATISTICS AND EXPLAIN THE EXECUTION PLAN OF SQL QUERIES
For demonstration, we propose a set of examples in which we build a set of views for retrieving data from the Order Management area, using tables for customers, purchase orders, units and products. We use these views for analyzing data with joins and aggregations and for computing statistics directly in ERP systems, through reports or in other tools such as Oracle Discoverer.
We use four tables in the following examples: CLIENTI (customers), PRODUSE (products), UNITATI (units) and COMENZI_DESFACERE (orders). To build our set of views for analyzing data, we need to join these four tables, each of which contains a large set of data. The main problem in this case is how to speed up the joins and aggregations. We use Explain Plan to visualize how Oracle executes each SQL query.
When a SQL statement is executed on an Oracle database, the Oracle query optimizer determines the most efficient execution plan after considering many factors related to the objects referenced and the conditions specified in the query. The optimizer estimates the cost of each potential execution plan based on data dictionary statistics about the data distribution and storage characteristics of the tables, indexes and partitions accessed by the statement. The cost is an estimated value proportional to the resources needed to execute the statement with a given plan. The optimizer calculates the cost of access paths and join orders based on the estimated computer resources, which include I/O, CPU and memory [ORA01]. This evaluation is an important factor in the processing of any SQL statement and can greatly affect execution time.
During the evaluation process, the query optimizer reviews the statistics gathered on the system to determine the best data access path, along with other considerations. We can override the execution plan chosen by the query optimizer with hints inserted in the SQL statement. A SQL statement can be executed in many different ways, such as full table scans, index scans, nested loops, hash joins and sort merge joins. We can set the query optimizer mode depending on our goal. By default, the optimizer goal is best throughput, which chooses the plan that uses the least amount of resources necessary to process all rows accessed by the statement. For MIS, however, time is one of the most important factors, and we should optimize a statement with the goal of best response time. To set the goal of the query optimizer for a particular SQL statement, we can use one of the hints that override the OPTIMIZER_MODE initialization parameter [ORA01]. The FIRST_ROWS(n) hint instructs Oracle to optimize an individual SQL statement with a goal of best response time to return the first n rows. The hint uses a cost-based approach for the SQL statement, regardless of the presence of statistics. The second option is the ALL_ROWS hint, which explicitly chooses the cost-based approach to optimize a SQL statement with a goal of best throughput.
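The difference between the two optimizer goals can be illustrated outside the database. The sketch below is plain Python, not Oracle code, and the table contents, the CountingScan helper and the join functions are hypothetical. It counts how many input rows each join strategy reads before it can return its first five rows. A nested loops join that probes an existing index returns rows as soon as they are found, while a hash join must read its whole build input first. This is one reason why a response-time goal such as FIRST_ROWS(n) may lead to a different plan than ALL_ROWS.

```python
from itertools import islice


class CountingScan:
    """Iterates over a list of rows and counts how many rows were read."""

    def __init__(self, rows):
        self.rows = rows
        self.read = 0

    def __iter__(self):
        for row in self.rows:
            self.read += 1
            yield row


def indexed_nested_loops(outer, inner_index, key):
    """Pipelined join: each outer row probes an existing index on the inner table."""
    for o in outer:
        for i in inner_index.get(o[key], []):
            yield o, i  # a joined row is available immediately


def hash_join(build, probe, key):
    """Hash join whose build phase must finish before any row is returned."""
    table = {}
    for b in build:  # blocking step: reads the entire build input first
        table.setdefault(b[key], []).append(b)
    for p in probe:
        for b in table.get(p[key], []):
            yield b, p


# Hypothetical data: 1,000 customers, one order per customer.
customers = [{"customer_id": n} for n in range(1000)]
orders = [{"customer_id": n, "ordered_quantity": 1} for n in range(1000)]
order_index = {}  # stands in for an existing index on orders.customer_id
for row in orders:
    order_index.setdefault(row["customer_id"], []).append(row)

nl_outer = CountingScan(customers)
first_nl = list(islice(indexed_nested_loops(nl_outer, order_index, "customer_id"), 5))

hj_build, hj_probe = CountingScan(orders), CountingScan(customers)
first_hj = list(islice(hash_join(hj_build, hj_probe, "customer_id"), 5))

# Rows read before the first 5 results: 5 for nested loops, 1005 for the hash join.
print(nl_outer.read, hj_build.read + hj_probe.read)  # 5 1005
```

When every row is fetched, the build phase of the hash join is no longer a disadvantage, which is why a throughput goal may still prefer it.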
We can collect exact or estimated statistics about the physical storage characteristics and data distribution of these schema objects by using the DBMS_STATS package. We can also use this package to collect histograms for table columns that contain values with large variations in the number of duplicates, called skewed data [ORMG01]. The resulting statistics provide information about data uniqueness and distribution, and based on them the query optimizer can compute plan costs with a high degree of accuracy. This enables the query optimizer to choose the best execution plan based on the least cost. For example, we can gather statistics on a table in our schema and then view its histograms:
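-- Gather table and column statistics for COMENZI_DESFACERE in the current schema
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'COMENZI_DESFACERE');
END;
/
-- Count the histogram endpoints recorded for two columns of the table
SELECT COLUMN_NAME, COUNT(*)
FROM USER_TAB_HISTOGRAMS
WHERE TABLE_NAME = 'COMENZI_DESFACERE'
  AND COLUMN_NAME IN ('INVENTORY_ORGANIZATION_ID', 'INVENTORY_ITEM_ID')
GROUP BY COLUMN_NAME;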
The results are presented in the figure below (figure 1):

Figure 1: Using DBMS_STATS package
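The effect of a histogram on skewed data can be shown with a simplified calculation. The sketch below is plain Python and does not reproduce Oracle's actual cost formulas. The INVENTORY_ITEM_ID values are hypothetical. It compares two row estimates for an equality predicate. The first assumes the values are evenly distributed, giving rows divided by the number of distinct values. The second uses the exact per-value counts that a frequency histogram records.

```python
from collections import Counter


def uniform_estimate(column):
    """Rows per value if the optimizer assumes an even distribution."""
    return len(column) / len(set(column))


def histogram_estimate(column, value):
    """Rows for `value` taken from a frequency histogram (one bucket per value)."""
    return Counter(column)[value]  # Counter returns 0 for values not present


# Hypothetical skewed INVENTORY_ITEM_ID column: one item dominates the rows.
inventory_item_id = [1] * 900 + list(range(2, 102))  # 1,000 rows, 101 distinct values

print(round(uniform_estimate(inventory_item_id), 1))  # 9.9 (same guess for every value)
print(histogram_estimate(inventory_item_id, 1), histogram_estimate(inventory_item_id, 2))  # 900 1
```

Without the histogram, both values are estimated at about 10 rows. With it, the optimizer can tell that item 1 matches 900 rows, where a full table scan is likely cheaper, and that item 2 matches a single row, where an index is likely cheaper.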


3. EXECUTING JOIN STATEMENTS WITH QUERY OPTIMIZER

To choose an execution plan for a join statement, the optimizer must choose an access path to retrieve data from each table in the join statement, choose a join method such as nested loops, sort merge, Cartesian or hash join, and choose the join order. The optimizer first determines whether joining two or more of the tables results in a row source containing at most one row, which it recognizes from UNIQUE and PRIMARY KEY constraints. If so, it places these tables first in the join order. The optimizer then optimizes the join of the remaining set of tables and determines the cost of a join depending on the following methods:
Hash joins are used for joining large data sets when the tables are related by an equality (equijoin) condition. The optimizer uses the smaller of the two tables or data sources to build a hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows. This method is best used when the smaller table fits in available memory. The cost is then limited to a single read pass over the data for the two tables.
In the following example we join the tables CLIENTI and COMENZI_DESFACERE. The table COMENZI_DESFACERE, with 86 rows, is used to build the hash table. CLIENTI, the larger table with 262 rows, is scanned afterwards (figure 2):

Figure 2: Hash join example
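The build and probe phases described above can be sketched in plain Python. This is an in-memory illustration, not Oracle's implementation, and the rows are hypothetical. As in the CLIENTI and COMENZI_DESFACERE example, the smaller input becomes the build side, and each input is read only once.

```python
def hash_join(left, right, key):
    """Equijoin of two lists of dicts on `key`, returning (left_row, right_row) pairs.

    The smaller input is the build side; the larger one is probed.
    """
    swapped = len(left) > len(right)
    build, probe = (right, left) if swapped else (left, right)

    table = {}
    for row in build:  # build phase: one pass over the smaller input
        table.setdefault(row[key], []).append(row)

    result = []
    for row in probe:  # probe phase: one pass over the larger input
        for match in table.get(row[key], []):
            # Keep the output order (left_row, right_row) whichever side was built.
            result.append((row, match) if swapped else (match, row))
    return result


# Hypothetical rows; the smaller COMENZI_DESFACERE-like list becomes the build side.
clienti = [{"customer_id": c, "customer_name": f"Customer {c}"} for c in (1, 2, 3, 4)]
comenzi = [{"customer_id": 1, "ordered_quantity": 5},
           {"customer_id": 1, "ordered_quantity": 2},
           {"customer_id": 3, "ordered_quantity": 7}]

pairs = hash_join(clienti, comenzi, "customer_id")
print([(c["customer_id"], o["ordered_quantity"]) for c, o in pairs])  # [(1, 5), (1, 2), (3, 7)]
```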


If the optimizer uses another method by default, we can specify the USE_HASH hint to instruct it to use a hash join when joining two tables with an equality condition.
Nested loop joins are useful when small subsets of data are being joined and when the join condition is an efficient way of accessing the second table. We specified the USE_NL hint to instruct the optimizer to use nested loops instead of a hash join (figure 3):

Figure 3: Nested Loops example
Comparing the two methods, the cost is 7 for the hash join and 24 for nested loops, so the hash join is the better choice in this case.
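A naive nested loops join can be sketched the same way. This is plain Python with hypothetical rows. Oracle would normally reach the inner table through an index or another access path rather than rescanning it. Counting key comparisons shows why the method suits small driving inputs: the inner input is visited once for every outer row, so restricting the driving rows reduces the work proportionally.

```python
def nested_loops_join(outer, inner, key):
    """Naive nested loops equijoin; returns (pairs, number_of_key_comparisons)."""
    pairs, comparisons = [], 0
    for o in outer:          # driving (outer) row source
        for i in inner:      # inner row source, visited once per outer row
            comparisons += 1
            if o[key] == i[key]:
                pairs.append((o, i))
    return pairs, comparisons


# Hypothetical rows.
clienti = [{"customer_id": c} for c in (1, 2, 3, 4)]
comenzi = [{"customer_id": 1, "ordered_quantity": 5},
           {"customer_id": 1, "ordered_quantity": 2},
           {"customer_id": 3, "ordered_quantity": 7}]

pairs, all_work = nested_loops_join(clienti, comenzi, "customer_id")
few = [c for c in clienti if c["customer_id"] in (1, 3)]  # restricted driving input
_, few_work = nested_loops_join(few, comenzi, "customer_id")
print(all_work, few_work)  # 12 6
```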
Sort merge joins can be used to join rows from two independent sources. Hash joins
generally perform better than sort merge joins. On the other hand, sort merge joins can perform
better than hash joins if the row sources are sorted already and a sort operation does not have to
be done. However, if a sort merge join involves choosing a slower access method (an index
scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables is an inequality
condition. Sort merge joins perform better than nested loop joins for large data sets [ORA01].
In a merge join, there is no concept of a driving table and the join consists of two steps:
• Sort join operation in which both the inputs are sorted on the join key. If the input is
already sorted by the join column, then a sort join operation is not performed for that
row source.
• Merge join operation when the sorted lists are merged together.
The optimizer can choose a sort merge join over a hash join for joining large amounts
of data if the join condition between two tables is not an equi-join or because of sorts already
required by other operations.
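The two steps of a sort merge join, sorting and then merging, can be sketched in plain Python. This is an illustration with hypothetical rows, not Oracle's implementation. It covers only equijoins, although, as noted above, sort merge joins are also useful for inequality conditions. The inputs_sorted flag (a hypothetical parameter) stands in for the case where a row source already arrives sorted on the join key, so the sort step is skipped.

```python
from operator import itemgetter


def sort_merge_join(left, right, key, inputs_sorted=False):
    """Sort merge equijoin of two lists of dicts; returns (left_row, right_row) pairs."""
    by_key = itemgetter(key)
    # Step 1, sort join: skipped when both inputs already arrive sorted on the key.
    if not inputs_sorted:
        left, right = sorted(left, key=by_key), sorted(right, key=by_key)

    # Step 2, merge join: walk both sorted lists with two cursors.
    result, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = by_key(left[i]), by_key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on the right side ...
            j_end = j
            while j_end < len(right) and by_key(right[j_end]) == lk:
                j_end += 1
            # ... and pair it with every left row that has the same key.
            while i < len(left) and by_key(left[i]) == lk:
                result.extend((left[i], right[k]) for k in range(j, j_end))
                i += 1
            j = j_end
    return result


# Hypothetical, unsorted rows.
clienti = [{"customer_id": c} for c in (3, 1, 2, 4)]
comenzi = [{"customer_id": 1, "ordered_quantity": 5},
           {"customer_id": 3, "ordered_quantity": 7},
           {"customer_id": 1, "ordered_quantity": 2}]

pairs = sort_merge_join(clienti, comenzi, "customer_id")
print([(c["customer_id"], o["ordered_quantity"]) for c, o in pairs])  # [(1, 5), (1, 2), (3, 7)]
```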
For example, we create an index on customer_id in the CLIENTI table and indexes on customer_id and ordered_date in the COMENZI_DESFACERE table:
create index clienti_cust_id_idx on clienti(customer_id);
create index cd_c_id_idx on comenzi_desfacere(customer_id);
create index cd_ord_date_idx on comenzi_desfacere(ordered_date);

Then we specified the USE_MERGE hint, and the optimizer used the sort merge method for the join (figure 4):

Figure 4: Sort Merge joins example


In this case we should choose the hash method, whose cost was lower than that of the sort merge method.
Cartesian joins are used when one or more of the tables does not have any join
conditions to any other tables in the statement. The optimizer joins every row from one data
source with every row from the other data source, creating the Cartesian product of the two
sets.
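The size of a Cartesian product is the product of the two row counts. The short Python sketch below shows how quickly this grows. The rows are placeholders, and only the row counts of CLIENTI (262) and COMENZI_DESFACERE (86) are taken from the hash join example.

```python
from itertools import product


def cartesian_join(left, right):
    """Join with no join condition: every left row paired with every right row."""
    return list(product(left, right))


# Placeholder rows with the row counts quoted in the hash join example.
clienti = [{"customer_id": n} for n in range(262)]
comenzi = [{"order_id": n} for n in range(86)]
print(len(cartesian_join(clienti, comenzi)))  # 22532 = 262 * 86
```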
Outer joins extend the result of a simple join. They return all rows that satisfy the join condition and also return some or all of the rows from one table for which no rows from the other table satisfy the join condition. We can instruct the optimizer to use one of the following types of outer joins [ORA01]:
• Nested Loop Outer Joins - In a regular outer join, the optimizer chooses the order of
tables (driving and driven) based on the cost. However, in a nested loop outer join, the
order of tables is determined by the join condition. The outer table, with rows that are
being preserved, is used to drive to the inner table. In the following example we used the USE_NL hint and obtained the highest cost (266) of all methods:
SELECT /*+ USE_NL(c cd) */
c.customer_id,
c.customer_name,
nvl(sum(cd.ordered_quantity),0) total_quantity
FROM clienti c,
comenzi_desfacere cd
WHERE c.customer_id = cd.customer_id(+)
group by c.customer_id,
c.customer_name;
However, if we add a condition that limits customer_id to a short list of values, we can improve performance with the nested loops method, as in the following example:
SELECT /*+ USE_NL(c cd) */
c.customer_id,
c.customer_name,
nvl(sum(cd.ordered_quantity),0) total_quantity
FROM clienti c,
comenzi_desfacere cd
WHERE c.customer_id = cd.customer_id(+)
and c.customer_id IN(1592, 1598)
group by c.customer_id,
c.customer_name;
In this case we obtain the best performance, with a cost of only 5, compared with 7 for the hash and merge methods.
• Hash Join Outer Joins - The optimizer uses hash joins for processing an outer join if
the data volume is high enough to make the hash join method efficient or if it is not
possible to drive from the outer table to the inner table. Below is an example using the hash method. In this case the cost is 8, lower than with the merge or nested loops methods:
SELECT /*+ USE_HASH(c cd) */
c.customer_id,
c.customer_name,
nvl(sum(cd.ordered_quantity),0) total_quantity
FROM clienti c,
comenzi_desfacere cd
WHERE c.customer_id = cd.customer_id(+)
group by c.customer_id,
c.customer_name;
• Sort Merge Outer Joins - When an outer join cannot drive from the outer (preserved) table to the inner (optional) table, it cannot use a hash join or nested loop joins. In that case it uses the sort merge outer join to perform the join operation. Below is an example using the merge method, which led to a cost of 9, lower than with the nested loops method:
SELECT /*+ USE_MERGE(c cd) */
c.customer_id,
c.customer_name,
nvl(sum(cd.ordered_quantity),0) total_quantity
FROM clienti c,
comenzi_desfacere cd
WHERE c.customer_id = cd.customer_id(+)
group by c.customer_id,
c.customer_name;
• Full Outer Joins - A full outer join acts like a combination of the left and right outer
joins. In addition to the inner join, rows from both tables that have not been returned in
the result of the inner join are preserved and extended with nulls.
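The outer join semantics above, and the NVL(SUM(ordered_quantity), 0) aggregation used in the example queries, can be sketched in plain Python. The rows and helper names are hypothetical. The code shows only the logical result, not how Oracle executes the join. The hash table is always built on the optional input, and unmatched preserved rows are padded with None, standing in for NULL. The full outer join adds the right-hand rows that found no partner.

```python
def left_outer_hash_join(preserved, optional, key):
    """Left outer equijoin; unmatched preserved rows are paired with None (SQL NULL)."""
    table = {}
    for row in optional:  # this sketch always builds on the optional input
        table.setdefault(row[key], []).append(row)
    result = []
    for row in preserved:  # every preserved row appears at least once
        matches = table.get(row[key])
        if matches:
            result.extend((row, m) for m in matches)
        else:
            result.append((row, None))
    return result


def full_outer_join(left, right, key):
    """Left outer join plus the right rows that matched no left row."""
    result = left_outer_hash_join(left, right, key)
    left_keys = {row[key] for row in left}
    result.extend((None, row) for row in right if row[key] not in left_keys)
    return result


def total_quantity(clienti, comenzi):
    """Mirrors the NVL(SUM(ordered_quantity), 0) ... GROUP BY outer-join queries.

    Assumes ordered_quantity itself is never NULL.
    """
    totals = {}
    for c, cd in left_outer_hash_join(clienti, comenzi, "customer_id"):
        group = (c["customer_id"], c["customer_name"])
        qty = cd["ordered_quantity"] if cd is not None else 0  # NVL(..., 0)
        totals[group] = totals.get(group, 0) + qty
    return totals


# Hypothetical rows: customer 1600 has no orders, order 9999 has no customer.
clienti = [{"customer_id": 1592, "customer_name": "Customer A"},
           {"customer_id": 1598, "customer_name": "Customer B"},
           {"customer_id": 1600, "customer_name": "Customer C"}]
comenzi = [{"customer_id": 1592, "ordered_quantity": 10},
           {"customer_id": 1592, "ordered_quantity": 5},
           {"customer_id": 1598, "ordered_quantity": 3},
           {"customer_id": 9999, "ordered_quantity": 1}]

print(total_quantity(clienti, comenzi))
# {(1592, 'Customer A'): 15, (1598, 'Customer B'): 3, (1600, 'Customer C'): 0}
```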

CONCLUSIONS
Management Information Systems are based on a set of views that extract, join and aggregate a large amount of data. To develop such systems we can choose BI tools or reports based on SQL queries. The latter solution requires less development time and lower costs. However, MIS require information in real time, drawn from different data sources.
For better performance, and in order to reduce the response time of SQL queries, we should consider taking advantage of the methods presented above and tuning our statements using optimizer hints.
REFERENCES
[LIBA01] – Lungu Ion, Bara Adela, Fodor Anca, “Business Intelligence tools for building the Executive Information Systems”, 5th RoEduNet International Conference, Universitatea Lucian Blaga, Sibiu, June 2006
[ORA01] – Oracle Corporation, “Database Performance Tuning Guide 10g Release 2 (10.2)”, Part Number B14211-01, oracle.com/technology/documentation
[ORMG01] – Kyte Tom, “On Joins and Query Plans”, Oracle Magazine, May/June 2006, pp. 69–72
[*NET] – http://www.oracle.com
