Welcome to Scribd, the world's digital library. Read, publish, and share books and documents. See more
Standard view
Full view
of .
Look up keyword
Like this
0 of .
Results for:
No results containing your search query
P. 1
Analytical Functions 1

Analytical Functions 1

Ratings: (0)|Views: 132 |Likes:
Published by vinod_ce

More info:

Published by: vinod_ce on Apr 16, 2009
Copyright:Attribution Non-commercial


Read on Scribd mobile: iPhone, iPad and Android.
download as PDF, TXT or read online from Scribd
See more
See less





Page 1
One of the deficiencies of SQL is the lack of support for analytic calculations like movingaverages, cumulative sums, ranking, percentile andlead and lag which are critical for OLAPapplications. These functions work on ordered sets of data. Expressing these functions in current SQL isnot elegant, requires self-joins and is difficult tooptimize.To address this issue, Oracle has introduced SQLextensions called Analytic functions in Oracle8
Release 2. In this paper we discuss the motivationbehind these functions, their structure and semantics,their usage for answering reasonably complexbusiness questions and various algorithms for theiroptimization and execution in the RDBMS server.Several implementation techniques are discussedin the paper. It presents heuristic algorithms used inthe optimizer for reducing the number of sortsrequired by multiple Analytic functions. It alsocovers specialized processing of Top-N Rank queriesto reduce buffer usage in the iterator model and toimprove communication costs for parallel execution.The scope of OLAP support afforded by these newfunctions far exceeds existing SQL functionality.Oracle and IBM have proposed these SQL extensionsto ANSI to be included in the SQL-99 standard.
Analytic calculations like moving averages,cumulative sums, ranking, percentile and lag/lead arecritical for decision support applications. In order forROLAP tools and applications to have access to suchcalculations in an efficient and scalable manner, it isimportant that relational databases provide a succinctrepresentation in SQL which can be easily optimized.Analytic functions have been introduced in Oracle 8
Release 2 to meet these requirements.These functions typically operate on a windowdefined on an ordered set of rows. That is, for eachrow in the input, a window is defined to encompassrows satisfying certain condition relative to thecurrent row and the function is applied over the datavalues in the window. The function can be anaggregate function or a function like rank, ntile etc.Expressing these functions in the existing SQL isquite complicated, may require definition of viewsand joins and is difficult to optimize. In commercialdatabases, this functionality can be simulatedthrough the use of subqueries, rownum
andprocedural packages
. These simulations often resultin inefficient and non-scalable execution plans as thequery’s semantics are dispersed across multiplequery blocks or procedural extensions which arenon-parallelizable.Oracle has proposed extensions to SQL thatexpress analytic functions succinctly in a singlequery block, thus paving the way for efficient and
(1) An Oracle specific function whichgenerates a monotonically increasingsequence number for the result rows of aquery(2) PL/SQL in Oracle
Analytic Functions in Oracle 8
Srikanth Bellamkonda, Tolga Bozkaya, Bhaskar GhoshAbhinav Gupta, John Haydu, Sankar Subramanian, Andrew Witkowski
Oracle Corporation, 500 Oracle Parkway 4op7,Redwood Shores, CA 94065{sbellamk, tbozkaya, abgupta, bghosh, jhaydu, srsubram, awitkows}@us.oracle.com
Page 2
scalable execution and better support for OLAP toolsand analytic applications. A syntax proposal has beenmade to ANSI jointly with IBM[OraIBM99].
Business Needs
The business needs for analytic calculations fallinto the following categories.1.
Ranking Functions:
These functions rank databased on some criteria and address businessquestions like "rank products based on their an-nual sales", "find top 10 and bottom 5 salesper-sons in each region".2.
These functions operate ona sliding window of rows defined on an ordereddata set and for each row
return some aggregateover the window rooted at
. These functions caneither compute moving aggregates or cumulativeaggregates. Common business queries include"find the 13-week moving average of a stock price", or "find the cumulative monthly sales of aproduct in each region".3.
Reporting Functions:
These functions addressbusiness questions requiring comparisons of ag-gregates at different levels. A typical querywould be "for each region, find the sales for cit-ies which contribute at least 10% of regionalsales". This query needs access to aggregatedsales per (region) and (region, city) levels.4.
Lag/Lead Functions:
These functions provideaccess to other rows from any given row withoutthe need to perform a self-join. Queries ad-dressed by these functions include "find monthlychange in the account balance of a customer"where we can access a row one month before thecurrent row.5.
Inverse Distribution functions allow computationof median and similar functions on an ordereddata set. First/Last functions allow us to computeaggregates on the first or last value of an orderedset. For example, find the average balance for thefirst month of each year. Details of these func-tions are given in the Appendix.
Outline of the Paper
The rest of the paper is organized as follows. InSection 3we present queries for some interestingbusiness questions using SQL with and without ourextensions. We also present the syntax and semanticsof Analytic functions there. The computationalmodel is presented inSection 4.InSection 5,we present several families of Analytic functions.Section 6presents some optimization andexecution techniques, andSection 7presents resultsof experiments comparing our approach with usingexisting SQL for computing the queries given inSection 3.We present related work inSection 8.In the Appendix we present some more examples and afew more Analytical functions.
Motivating Examples
In the rest of the paper, we will use the followingSALESTABLE schema.
salesTable(region, state, product,salesperson, date, sales)
The key of the table is (region, state, product,salesperson, date). The table records each transactionmade by any salesperson. We show here somequeries which are difficult to express in SQL92 butare elegant and intuitive with our extensions.
Q1 Find Top-3 salespersons within each region basedon sales
This query can either be expressed usingprocedural extensions to SQL (e.g. PL/SQL inOracle) or using complicated non-equi joins inSQL92. We give the PL/SQL representation herebecause this is the better approach.
SELECT region,salesperson, s_sales,RANK(region,s_sales) rnkFROM (
Page 3
SELECT region, salesperson,SUM(sales) s_salesFROM salesTableGROUP region, salespersonORDER BY region, SUM(sales) DESCWHERE RANK(region, s_sales) <= 3;
The data from the in-line view is passed "in order"to the user defined PL/SQL function (RANK). Theoperation of assigning ranks inherently becomessequential, whereas we could have easily parallelizedby partitioning by regional level and keeping only theTop-3 values within each region.
Q2 Compute the moving sum of sales of each productwithin the last month
This query would require an expensive non-equiself-join where each row is joined with rows 1 monthbefore it:
SELECT s1.product, SUM(s2.sales) as m_avgFROM salesTable s1, salesTable s2WHERE s1.product = s2.product ANDs2.date <= s1.date ANDs2.date >= ADD_MONTHS(s1.date, -1)GROUP BY s1.product, s1.date;
Clearly, the query execution time would growlinearly with the size of window.
Q3 For each year, find products with maximum annualsales
This query requires defining the view V1 forcomputing yearly sales for each product and anotherview V2 (over V1) to compute maximum yearlysales.
CREATE VIEW V1AS SELECT product, year(date) yr,SUM(sales) s_salesFROM salesTableGROUP BY product, year(date);CREATE VIEW V2AS SELECT yr, MAX(s_sales) m_salesFROM V1GROUP BY yr;SELECT product, yr, s_salesFROM V1, V2WHERE V1.yr = V2.yrAND V1.s_sales = V2.m_sales
Q4 Give the Top-10 products for states which contrib-ute more than 25% of regional sales.
This query requires not only access to 2 differentlevels of aggregation, but also filtering based on rank values.
CREATE VIEW V4AS SELECT region, state, SUM(sales)s_salesFROM salesTableGROUP BY region, stateCREATE VIEW V5AS SELECT region, SUM(sales) s_salesFROM salesTableGROUP BY region;SELECT region, state, product,RANK(region, state, X.s_sales) rankFROM (SELECT region, state, product,SUM(sales) s_salesFROM salesTable, V4, V5WHERE salesTable.region = V4.regionAND salesTable.state = V4.stateAND V4.region = V5.regionAND V4.s_sales > 0.25 * V5.s_salesGROUP BY region,state, productORDERBYregion,state,SUM(sales)DESC)XWHERERANK(region, state, X.s_sales) <= 10;
Clearly, it is not very intuitive to define views for aparticular instance of a query. Also, in a typedlanguage with positional arguments, a differentRANK function would have to be defined for eachnew instance. Not only is the representation complex,but multiple aggregations and joins would make itquite inefficient as well.
Q5 For each product, compare sales of a month to itsprevious month
This query requires a self-join of monthlyaggregated salesTable.

You're Reading a Free Preview

/*********** DO NOT ALTER ANYTHING BELOW THIS LINE ! ************/ var s_code=s.t();if(s_code)document.write(s_code)//-->