SSRN Id967687
All content following this page was uploaded by Bâra Adela on 03 February 2014.
ABSTRACT
In order to improve the quality of Management Information Systems in an organization, we can
choose to build the system using Business Intelligence techniques such as OLAP and
data warehousing, or by using traditional reports based on SQL queries. The cost and development
time for BI tools are greater than those for SQL reports, and these factors are important in
deciding which type of technique to use for MIS.
This paper presents some of the optimization methods that are used to obtain better performance
and lower response times for SQL queries.
Keywords: Management Information Systems (MIS), Structured Query Language (SQL),
tuning and optimization, SQL query plans.
1. INTRODUCTION
The purpose of Management Information Systems is to assist tactical level managers in
taking decisions and to provide representative information in real time, in order to support
activities such as analyzing departmental data, planning and forecasting for their
decision area.
MIS users can manage and manipulate large sets of data in a short period of time or in real
time. In essence, managers at every departmental level can have a customized view
that extracts information from transactional sources and summarizes it into meaningful
indicators. MIS gather data from the ERP systems implemented in an organization for different
functional areas such as: financials, inventory, purchase, order management, production.
Information from these functional areas within an ERP system is managed by a relational
database system such as Oracle Database. In the table below we present the main
differences between ERP reports and MIS reports [LIBA01] (table 1):
use the ALL_ROWS hint, which explicitly chooses the cost-based approach to optimize a SQL
statement with a goal of best throughput.
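As a minimal sketch of how such a hint is written, the statement below places ALL_ROWS in a hint comment immediately after the SELECT keyword. The table and column names are borrowed from the examples later in this paper; the choice of query is an assumption made only for illustration.

```sql
-- Illustrative only: ask the cost-based optimizer to favour best throughput
-- (least total resource use for the whole result set).
SELECT /*+ ALL_ROWS */
       c.customer_id,
       c.customer_name,
       NVL(SUM(cd.ordered_quantity), 0) AS total_quantity
FROM   clienti c,
       comenzi_desfacere cd
WHERE  c.customer_id = cd.customer_id(+)
GROUP  BY c.customer_id,
          c.customer_name;
```

A hint is read only if the comment starts with `/*+` and directly follows the statement keyword; a hint that cannot be parsed is silently ignored rather than reported as an error.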
We can collect exact or estimated statistics about physical storage characteristics and
data distribution in these schema objects by using the DBMS_STATS package. We can use this
package to collect histograms for table columns that contain values with large variations in
the number of duplicates, called skewed data [ORMG01]. The resulting statistics provide
information about data uniqueness and distribution and, based on this, the query optimizer can
compute plan costs with a high degree of accuracy. This enables the query optimizer to choose
the best execution plan based on the least cost. For example, we can gather and view statistics
for tables in our schema:
BEGIN
DBMS_STATS.GATHER_TABLE_STATS (USER, 'COMENZI_DESFACERE');
END;
/
SELECT COLUMN_NAME, COUNT(*)
FROM USER_TAB_HISTOGRAMS
WHERE TABLE_NAME='COMENZI_DESFACERE'
AND COLUMN_NAME IN ('INVENTORY_ORGANIZATION_ID',
'INVENTORY_ITEM_ID')
GROUP BY COLUMN_NAME;
The results are presented in figure 1.
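Histograms can also be requested explicitly for a column suspected of being skewed. The following is a sketch under stated assumptions, not the call used in this paper: the bucket count of 254 and the choice of INVENTORY_ITEM_ID as the skewed column are illustrative.

```sql
-- Assumption: INVENTORY_ITEM_ID is the skewed column; 254 buckets is an
-- illustrative size, not a value taken from the paper.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname    => USER,
    tabname    => 'COMENZI_DESFACERE',
    method_opt => 'FOR COLUMNS INVENTORY_ITEM_ID SIZE 254',
    cascade    => TRUE);   -- also gather statistics for the table's indexes
END;
/

-- Check what kind of histogram, if any, was stored for the column.
SELECT column_name, num_distinct, num_buckets, histogram
FROM   user_tab_columns
WHERE  table_name  = 'COMENZI_DESFACERE'
AND    column_name = 'INVENTORY_ITEM_ID';
```

If the HISTOGRAM column reports NONE, the optimizer has only basic column statistics and assumes a uniform distribution of values.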
To choose an execution plan for a join statement, the optimizer must choose an access
path to retrieve data from each table in the join statement, select a join method (nested loops,
sort merge, Cartesian or hash join) and choose the join order. The optimizer first determines
whether joining two or more tables definitely results in a row source containing at most one
row, which it recognizes from UNIQUE and PRIMARY KEY constraints, and
places these tables first in the join order. The optimizer then optimizes the join of the
remaining set of tables and determines the cost of a join depending on the following methods:
Hash joins are used for joining large data sets when the tables are related by an
equality join condition. The optimizer uses the smaller of the two tables or data sources to build a
hash table on the join key in memory. It then scans the larger table, probing the hash table to find the joined rows.
This method is best used when the smaller table fits in available memory. The cost is
then limited to a single read pass over the data for the two tables.
In the following example we joined the tables CLIENTI and COMENZI_DESFACERE. The table
COMENZI_DESFACERE, with 86 rows, is used to build the hash table, and CLIENTI is the
larger table, with 262 rows, which is scanned later (figure 2).
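The query behind this example is not reproduced in the text. A plan of this kind can be inspected as sketched below; the select list is an assumption, and only the table names and the join column come from the paper's other listings.

```sql
-- Assumed query shape: an equi-join on customer_id with a hash join requested.
EXPLAIN PLAN FOR
SELECT /*+ USE_HASH(c cd) */
       c.customer_id,
       c.customer_name,
       cd.ordered_quantity
FROM   clienti c,
       comenzi_desfacere cd
WHERE  c.customer_id = cd.customer_id;

-- Show the plan that was just explained (operations, estimated rows, cost).
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

In the displayed plan, the first row source listed under the HASH JOIN operation is the one used to build the hash table and the second is the one that is scanned to probe it.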
Figure 3: Nested Loops example
We can compare the results of the two methods: with the Hash method the cost is 7,
while with Nested Loops the cost is 24, so the first method is the better choice here.
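One way to obtain such cost figures side by side is to explain each variant under its own statement identifier and read the total cost from PLAN_TABLE. This is a sketch: the identifiers 'hash_join' and 'nl_join' and the select list are assumptions, and the costs returned depend on the data and statistics of the system at hand.

```sql
-- Explain the same join twice, once per join method (labels are hypothetical).
EXPLAIN PLAN SET STATEMENT_ID = 'hash_join' FOR
SELECT /*+ USE_HASH(c cd) */
       c.customer_id, c.customer_name, cd.ordered_quantity
FROM   clienti c, comenzi_desfacere cd
WHERE  c.customer_id = cd.customer_id;

EXPLAIN PLAN SET STATEMENT_ID = 'nl_join' FOR
SELECT /*+ USE_NL(c cd) */
       c.customer_id, c.customer_name, cd.ordered_quantity
FROM   clienti c, comenzi_desfacere cd
WHERE  c.customer_id = cd.customer_id;

-- The row with id = 0 is the top of each plan and carries its total cost.
SELECT statement_id, cost
FROM   plan_table
WHERE  id = 0
AND    statement_id IN ('hash_join', 'nl_join');
```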
Sort merge joins can be used to join rows from two independent sources. Hash joins
generally perform better than sort merge joins. On the other hand, sort merge joins can perform
better than hash joins if the row sources are sorted already and a sort operation does not have to
be done. However, if a sort merge join involves choosing a slower access method (an index
scan as opposed to a full table scan), then the benefit of using a sort merge might be lost.
Sort merge joins are useful when the join condition between two tables is an inequality
condition. Sort merge joins perform better than nested loop joins for large data sets [ORA01].
In a merge join, there is no concept of a driving table and the join consists of two steps:
• Sort join operation in which both the inputs are sorted on the join key. If the input is
already sorted by the join column, then a sort join operation is not performed for that
row source.
• Merge join operation in which the sorted lists are merged together.
The optimizer can choose a sort merge join over a hash join for joining large amounts
of data if the join condition between two tables is not an equi-join or because of sorts already
required by other operations.
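The inequality case can be sketched as follows. The predicate has no business meaning and was chosen only because it is not an equi-join, so a hash join is not available to the optimizer; the table and column names come from the paper's other listings.

```sql
-- Illustrative non-equi join: a hash join cannot be used for this predicate,
-- so the optimizer must choose between nested loops and sort merge.
EXPLAIN PLAN FOR
SELECT /*+ USE_MERGE(c cd) */
       c.customer_id,
       cd.customer_id AS other_customer_id
FROM   clienti c,
       comenzi_desfacere cd
WHERE  c.customer_id < cd.customer_id;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

When the hint is followed, the plan shows a MERGE JOIN operation with SORT JOIN steps for the inputs that are not already ordered on the join column.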
For example, we create an index on customer_id in the CLIENTI table and on customer_id
and ordered_date in the COMENZI_DESFACERE table:
create index clienti_cust_id_idx on clienti(customer_id);
create index cd_c_id_idx on comenzi_desfacere(customer_id);
create index cd_ord_date_idx on comenzi_desfacere(ordered_date);
Then we specified the USE_MERGE hint, and the optimizer used the merge method for the join (figure 4):
group by c.customer_id,
c.customer_name;
However, if we add a condition on customer_id that limits it to a short list of values, we can
improve performance with the Nested Loops method, as in the following example:
SELECT /*+ USE_NL(c cd) */
c.customer_id,
c.customer_name,
nvl(sum(cd.ordered_quantity),0) total_quantity
FROM clienti c,
comenzi_desfacere cd
WHERE c.customer_id = cd.customer_id(+)
and c.customer_id IN(1592, 1598)
group by c.customer_id,
c.customer_name;
In this case we obtain the best performance, with a cost value of only 5, compared
to 7 for the Hash and Merge methods.
• Hash Join Outer Joins - The optimizer uses hash joins for processing an outer join if
the data volume is high enough to make the hash join method efficient or if it is not
possible to drive from the outer table to the inner table. Below is an example of using the Hash
method. In this case the cost is 8, better than with the Merge or Nested Loops methods:
SELECT /*+ USE_HASH(c cd) */
c.customer_id,
c.customer_name,
nvl(sum(cd.ordered_quantity),0) total_quantity
FROM clienti c,
comenzi_desfacere cd
WHERE c.customer_id = cd.customer_id(+)
group by c.customer_id,
c.customer_name;
• Sort Merge Outer Joins - When an outer join cannot drive from the outer (preserved)
table to the inner (optional) table, it cannot use a hash join or nested loop joins. Then it
uses the sort merge outer join for performing the join operation. Below is an example of
using the Merge method, which leads to a cost value of 9, better than with the Nested Loops
method:
SELECT /*+ USE_MERGE(c cd) */
c.customer_id,
c.customer_name,
nvl(sum(cd.ordered_quantity),0) total_quantity
FROM clienti c,
comenzi_desfacere cd
WHERE c.customer_id = cd.customer_id(+)
group by c.customer_id,
c.customer_name;
• Full Outer Joins - A full outer join acts like a combination of the left and right outer
joins. In addition to the inner join, rows from both tables that have not been returned in
the result of the inner join are preserved and extended with nulls.
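The outer join examples above use Oracle's (+) operator, which cannot express a full outer join in a single join condition. A minimal sketch of both forms in ANSI join syntax is given below; the select list of the second query is an assumption made for illustration.

```sql
-- ANSI equivalent of "c.customer_id = cd.customer_id(+)":
-- every customer is kept, with NULLs where no order row matches.
SELECT c.customer_id,
       c.customer_name,
       NVL(SUM(cd.ordered_quantity), 0) AS total_quantity
FROM   clienti c
       LEFT OUTER JOIN comenzi_desfacere cd
         ON c.customer_id = cd.customer_id
GROUP  BY c.customer_id,
          c.customer_name;

-- Full outer join: unmatched rows from both tables are preserved.
SELECT c.customer_id  AS client_customer_id,
       c.customer_name,
       cd.customer_id AS order_customer_id,
       cd.ordered_quantity
FROM   clienti c
       FULL OUTER JOIN comenzi_desfacere cd
         ON c.customer_id = cd.customer_id;
```

In the second query, rows of COMENZI_DESFACERE with no matching customer appear with NULL in the CLIENTI columns, and customers without orders appear with NULL in the order columns.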
CONCLUSIONS
Management Information Systems are based on a set of views that extract, join and
aggregate a large amount of data. In order to develop such systems we can choose BI
tools or reports based on SQL queries. The latter solution requires less development time
and lower costs. However, MIS require information in real time and from different data
sources.
For better performance and in order to reduce the response time of SQL queries, we should
consider taking advantage of the methods presented above and tuning our statements using
optimizer hints.
REFERENCES
[LIBA01] - Lungu Ion, Bara Adela, Fodor Anca, “Business Intelligence tools for
building the Executive Information Systems”, 5th RoEduNet International Conference,
Universitatea Lucian Blaga, Sibiu, June 2006
[ORA01] - Oracle Corporation, “Database Performance Tuning Guide
10g Release 2 (10.2)”, Part Number B14211-01, oracle.com/technology/documentation
[ORMG01] - Kyte Tom, “On Joins and Query Plans”, Oracle Magazine, May/June
2006, pp. 69-72
[*NET] - http://www.oracle.com