- Part 2
by Mark Rittman
Introduction
The OLAP Option to Oracle 10g gives you the ability to store multidimensional cubes of data in your Oracle database, and
perform OLAP queries on them using OLAP DML, regular SQL or query tools such as Oracle Discoverer Plus OLAP.
In part 1 of this article, you learned tips and best practices for designing Oracle OLAP cubes. Part 2 covers loading,
aggregating, and querying Oracle OLAP cubes and takes a look at some of the new features coming with 10g Release 2.
Pre-aggregating cubes
When you load data into the measures in your cube, you generally load in values at the lowest level of aggregation for the
measure. If a user then queries the measure at a higher level of aggregation, by default Oracle OLAP will calculate the
summarised data on the fly, just as regular SQL statements calculate SUMs and AVGs in response to a SELECT statement.
But, in the same way that you can pre-aggregate regular relational data using materialized views and query rewrite, you can
pre-aggregate the data in your cube to improve query response times. Indeed, because pre-summarisation and summary
awareness are built into the OLAP engine, they work more reliably and with less initial setup than relational summaries
created using materialized views, and the tools required to create cube summaries are built right into Analytic Workspace
Manager 10g.
From a performance perspective, though, the question you need to ask yourself is to what extent you should pre-aggregate
your data. Should you always pre-summarise the entire cube, across all levels in all hierarchies, or should you selectively
summarise it (referred to as skip-level aggregation) to balance preparation time against improvements in query
performance? Or is it better to let the OLAP engine generate summaries on the fly?
In general, it is recommended that you pre-aggregate data when there are ten or more child dimension members for a parent
dimension member, and you should also select pre-summarisation for all of the bottom levels in a dimension hierarchy. So, to
take the Time dimension in the Global Widgets cube, it would make sense to pre-summarise the day and month levels, as 2926
days roll up to 96 months, but there would be little benefit in pre-summarising the levels above those, as calculating those
summaries in advance saves little compared to generating them on the fly. None of the other dimensions has a 10:1
children-to-parent ratio, and therefore you could decide to pre-aggregate just the bottom level in each hierarchy and leave
it at that.
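A quick way to check this ratio is to compare distinct member counts at adjacent levels in the dimension's source table. This is only a sketch; the table and column names below are hypothetical, so substitute those from your own Time dimension source:

SELECT COUNT(DISTINCT day_id) / COUNT(DISTINCT month_id) AS children_per_parent
FROM   gs_time;  -- hypothetical Time dimension source table

If the result is ten or more, the level is a good candidate for pre-summarisation under the guideline above.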
If you are using compression to reduce the size of your cube, you should experiment with leaving the topmost level in some
dimension hierarchies as unsummarised, as this can decrease the time required to maintain the cube at the cost of slightly
slower queries, due to the way that the compression algorithm works.
Of course, the cost of pre-aggregating data is the additional time and space that this part of the load process requires, and
therefore a test was carried out using the Global Widgets sample data to measure the overhead of pre-summarisation.
For the purposes of this test, the three versions of the Global Widgets cube (dense, sparse and compressed) were loaded first
using skip-level aggregation, and then using full pre-aggregation. The results of the test were as follows:
Cube Description  Initial Cube Size  Aggregation Strategy  Final Cube Size  Maintenance Time
Dense             20.289 Mb          Skip-level            149.1 Mb         3 mins 27 secs
Dense             20.289 Mb          Full                  1.385 Gb         46 mins 11 secs
Sparse            22.029 Mb          Skip-level            132.1 Mb         1 min 55 secs
Sparse            22.029 Mb          Full                  1.074 Gb         16 mins 55 secs
Compressed        21.021 Mb          Skip-level            214.1 Mb         2 mins 36 secs
Compressed        21.021 Mb          Full                  422.1 Mb         2 mins 48 secs
From these results, a number of conclusions can be drawn:
The slowest way to load and maintain a cube is to define all dimensions as dense, and fully aggregate the cube.
The fastest way to load and maintain a cube is to define all dimensions except Time as sparse, and partially (skip-level) pre-aggregate the cube.
The biggest impact on maintenance time for dense or sparse cubes is whether you partially or fully aggregate the cube, with full pre-aggregation increasing maintenance time by a factor of 15.
Moving from a dense cube to a sparse cube reduces maintenance time by a factor of three.
For dense and sparse cubes, pre-aggregating the cube increases its size by a factor of five.
If you can make use of the compression feature, you can fully pre-aggregate a cube in only slightly more time than it takes to partially pre-aggregate a sparse or dense cube.
Note that these measurements are specific to the Global Widgets cube, and other cubes with differing or additional dimensions,
measures, hierarchies and levels will produce different timings and differentials.
Calculations are something that you should already be familiar with; they are defined using the Calculated Measures feature
in Analytic Workspace Manager 10g (Figure 3).
4. Maintain your cube using the Aggregate the cube for only the incoming data values option (Figure 4).
Figure 4: Selecting the cube processing option using Analytic Workspace Manager 10g.
As an alternative to remapping your dimension and cube loads to point to tables that only contain new and changed data,
another technique that has been used successfully in the past is the following (sketched in SQL after the list):
1. Add an additional column to the source tables that contains a Data Loaded Y/N flag.
2. Create a view over this table that returns only those rows where the Data Loaded flag is set to N.
3. Use this view, rather than the underlying table, as the data source for the analytic workspace mapping.
4. After the load has completed, update the underlying table, setting the Data Loaded flag to Y.
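As a minimal sketch of this technique against the GS_SALES source table from the test below, where the DATA_LOADED column and the GS_SALES_NEW view name are assumptions of this example rather than part of the Global Widgets schema:

-- 1. Add the flag column; existing rows default to 'N' (not yet loaded)
ALTER TABLE gs_sales ADD (data_loaded VARCHAR2(1) DEFAULT 'N');

-- 2. Create a view returning only unloaded rows
CREATE OR REPLACE VIEW gs_sales_new AS
  SELECT * FROM gs_sales WHERE data_loaded = 'N';

-- 3. Remap the cube load to GS_SALES_NEW in AWM, run the maintenance task, then:
-- 4. Mark the rows as loaded
UPDATE gs_sales SET data_loaded = 'Y' WHERE data_loaded = 'N';
COMMIT;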
In addition to setting the cube processing options, you will also want to check the Delete all attribute values of the selected
dimensions and the Delete all members of the selected dimension checkboxes if you need to remove dimension members and
attribute values from the analytic workspace that no longer exist in the source data.
Using the Global Widgets dataset, timings were taken against the sparse version of the cube under the following scenarios:
1. Initial load of 8884 rows from the GS_SALES table, followed by a subsequent load of 2545 rows, with Aggregate the full cube as the selected cube processing option.
2. Initial load of 8884 rows with Aggregate the full cube selected, then a further load of 11429 rows (2545 of which were new) with the Aggregate the cube for only the incoming values option selected.
3. Initial load of 8884 rows with Aggregate the full cube selected, then a further load of 2545 rows with the Aggregate the cube for only the incoming values option selected.
The results of the test were as follows:
Scenario #  Load Type   Rows   Cube Processing Option                           Time to Maintain  Size of Cube
1           Initial      8884  Aggregate the full cube                          13 mins 34 secs   936 Mb
1           Subsequent   2545  Aggregate the full cube                          20 mins 12 secs   1.87 Gb
2           Initial      8884  Aggregate the full cube                          11 mins 50 secs   936 Mb
2           Subsequent  11429  Aggregate the cube for only the incoming values  21 mins 00 secs   1.87 Gb
3           Initial      8884  Aggregate the full cube                          12 mins 50 secs   936 Mb
3           Subsequent   2545  Aggregate the cube for only the incoming values  19 mins 35 secs   1.82 Gb
The conclusion from these tests was surprising: regardless of whether you load just the new and changed data into the cube or
load all of it, and regardless of whether you choose to aggregate just the new values or all values, with Oracle Database 10g
Release 1 there is no real difference in the time taken to perform cube maintenance. It is, however, my understanding that this
issue is being addressed in Oracle Database 10g Release 2, and that incremental loads will be significantly faster with this
release.
Figure 5: Saving a maintenance task to script using Analytic Workspace Manager 10g.
As this is nothing more than an SQL script containing an anonymous PL/SQL block, you can cut and paste it into SQL*Plus
after issuing SET TIMING ON, and by this method obtain precise timings for how long a maintenance task took to run. Ensure
that you have detached your cube in Analytic Workspace Manager 10g before you run the maintenance task, as the script
will first try to attach the cube in read/write mode, and will fail if AWM10g still has it attached.
SQL> set timing on
SQL> declare
2 xml_clob clob;
3 xml_str varchar2(4000);
4 isAW number;
5 begin
6 DBMS_AW.EXECUTE('AW ATTACH GSW_AW RW');
7 DBMS_LOB.CREATETEMPORARY(xml_clob,TRUE);
8 dbms_lob.open(xml_clob, DBMS_LOB.LOB_READWRITE);
9 dbms_lob.writeappend(xml_clob, 185, ' <BuildDatabase Id="Action2" AWName="GSW_AW.GSW_AW" BuildType="EXECUTE" RunSolve="true" CleanMeasures="false" CleanAttrs="false" CleanDim="false" TrackStatus="false" MaxJobQueues="0">');
10 dbms_lob.writeappend(xml_clob, 46, '    <BuildList XMLIDref="CHANNEL.DIMENSION" />');
11 dbms_lob.writeappend(xml_clob, 46, '    <BuildList XMLIDref="PRODUCT.DIMENSION" />');
12 dbms_lob.writeappend(xml_clob, 43, '    <BuildList XMLIDref="TIME.DIMENSION" />');
13 dbms_lob.writeappend(xml_clob, 47, '    <BuildList XMLIDref="CUSTOMER.DIMENSION" />');
14 dbms_lob.writeappend(xml_clob, 48, '    <BuildList XMLIDref="PROMOTION.DIMENSION" />');
15 dbms_lob.writeappend(xml_clob, 57, '    <BuildList XMLIDref="SALES.ORDER_QUANTITY.MEASURE" />');
16 dbms_lob.writeappend(xml_clob, 56, '    <BuildList XMLIDref="SALES.SHIP_QUANTITY.MEASURE" />');
17 dbms_lob.writeappend(xml_clob, 18, ' </BuildDatabase>');
18 dbms_lob.close(xml_clob);
19 xml_str := sys.interactionExecute(xml_clob);
20 dbms_output.put_line(xml_str);
21 end;
22 /
PL/SQL procedure successfully completed.
Elapsed: 00:14:20.39
SQL>
Whilst the script is running, you can open another SQL*Plus session and query the OLAPSYS.XML_LOAD_LOG table to check
the progress of the load. This is the same log that Analytic Workspace Manager displays after a load has completed, but by
querying the table directly you can follow the load whilst it is still running.
[Sample XML_LOAD_LOG output: at 12-AUG-05 14:08:46 the load had reached 'Started Loading Hierarchies for CHANNEL.DIMENSION (1 out of 5)'; the query returned 17 rows.]
You can also determine the disk space taken up by a cube by using the DBMS_LOB.GETLENGTH() built-in function, like
this:
SQL> SELECT sum(DBMS_LOB.GETLENGTH(AWLOB)) AW_SIZE
2 FROM AW$GSW_AW;
   AW_SIZE
----------
1039424108
As an alternative to embedding the AWXML definitions within calls to DBMS_LOB, you can also save the XML to a file and
load it into the database as a CLOB. Oracle Database 10gR2 comes with a new procedure,
DBMS_AW_XML.EXECUTEFILE, that can be used to load and process AWXML templates, and another,
DBMS_AW_XML.EXECUTE, that should be used instead of SYS.INTERACTIONEXECUTE.
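For example, a minimal 10gR2 sketch, assuming the AWXML build script has been saved as build_gsw.xml in a directory mapped to a (hypothetical) directory object named AW_SCRIPTS, and that EXECUTEFILE takes the directory-name/file-name form:

DECLARE
  result VARCHAR2(4000);
BEGIN
  -- AW_SCRIPTS and build_gsw.xml are assumptions for this sketch
  result := DBMS_AW_XML.EXECUTEFILE('AW_SCRIPTS', 'build_gsw.xml');
  DBMS_OUTPUT.PUT_LINE(result);
END;
/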
Other tips for speeding up the data-loading stage include:
Order the data in your source tables to match the order in which dimensions are listed in your cube (mentioned earlier).
Use the External Table feature to load data directly from flat files; external tables can also take advantage of parallelism to break large files into smaller ones and distribute them over multiple processors (see the sketch after this list).
When writing OLAP DML routines to load data into an analytic workspace, use the SQL IMPORT DIRECT command in preference to SQL FETCH, as this can be up to 53 times faster.
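As an illustration of the external table approach, here is a minimal sketch; the directory path, file name and column list are assumptions for the example, not the Global Widgets schema:

CREATE OR REPLACE DIRECTORY gsw_load_dir AS '/u01/app/oracle/load';

CREATE TABLE gs_sales_ext (
  order_id    NUMBER,
  day_id      NUMBER,
  product_id  NUMBER,
  order_qty   NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY gsw_load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
  )
  LOCATION ('gs_sales.csv')   -- hypothetical flat file
)
REJECT LIMIT UNLIMITED
PARALLEL 4;

You can then SELECT from GS_SALES_EXT, or map it directly, as the data source for the analytic workspace.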
Additional tips provided by Anthony Waite of Oracle on the OTN OLAP Forum (with amendments from Oracle development)
include:
Turn off LOGGING (REDO) during builds to improve data insertion performance, and turn it back on once the load is complete.
If you choose to set NOLOGGING for the LOB segment (of the AW$ table containing your analytic workspace), see MetaLink note 1058851.6 for information on event 10359, which can reduce I/O for frequently updated NOLOGGING LOBs:
http://metalink.oracle.com/metalink/plsql/ml2_documents.showDocument?p_database_id=NOT&p_id=1058851.6
Increase the REDO log size and the log_buffer parameter to reduce log-switch waits and improve overall build time. Ask the DBA to increase these to somewhere between 100M and 500M from the default of 10M, and use ADDM to determine the ideal size. This can be crucial on poorly performing disk subsystems.
Move TEMP, UNDO and the REDO logs to the fastest disks available to improve overall build performance. Avoid RAID 5, use raw devices whenever possible, and consider RAID 10 or 0+1; RAID 5 can severely affect performance on heavily updated databases.
Use AW TRUNCATE instead of AW DELETE if you wish to keep the analytic workspace name; it performs better, with less overhead, and preserves grants on the analytic workspace:
EXEC DBMS_AW.EXECUTE('AW TRUNCATE SCOTT.EMPAW');
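As a sketch of the LOB logging tip above, reusing the AW$GSW_AW table and AWLOB column from the sizing query shown earlier; treat this as an outline to test against the MetaLink note, not a production recipe:

-- Disable logging on the AW LOB segment before a large build
ALTER TABLE aw$gsw_aw MODIFY LOB (awlob) (NOCACHE NOLOGGING);

-- ...run the cube maintenance task...

-- Restore caching (which implies logging) afterwards, then take a backup
ALTER TABLE aw$gsw_aw MODIFY LOB (awlob) (CACHE);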
Users of OracleBI Spreadsheet Add-in 10.1.2 will also sometimes suffer additional performance problems due to the
Spreadsheet Add-in's need to retrieve all of the OLAP query results when the query first runs, as opposed to Discoverer Plus
OLAP and BI Beans, which can retrieve and display the first part of a query's results and fetch the rest afterwards.
If you want to look more closely at the SQL being generated by OLAP API tools, you can enable SQL trace and set the
_olap_continuous_trace_file database parameter so that the trace includes SQL generated by the OLAP API.
To set SQL and OLAP API tracing across the database, issue the following commands from the SQL*Plus prompt:
SQL> alter system set sql_trace = true scope = spfile;
System altered.
Elapsed: 00:00:00.13
SQL> alter system set "_olap_continuous_trace_file" = true scope
= spfile;
System altered.
Elapsed: 00:00:00.11
SQL> shutdown immediate;
Database closed.
Database dismounted.
ORACLE instance shut down.
SQL> startup
ORACLE instance started.
Total System Global Area 1073741824 bytes
Fixed Size                   792660 bytes
Variable Size             279963564 bytes
Database Buffers          792723456 bytes
Redo Buffers                 262144 bytes
Database mounted.
Database opened.
An OLAP API application - the OracleBI Spreadsheet Add-in, in this instance - can then be run, and the SQL emitted by the
application, and the OLAP API, will then be captured in a trace file in the UDUMP directory.
The trace file will contain all of the SQL generated by the application, and will typically contain a large number of SQL
statements, some of which will be the SQL issued by the OLAP API to retrieve data from the analytic workspace in response to
an OLAP query, such as the following:
=====================
PARSING IN CURSOR #42 len=2389 dep=1 uid=118 oct=3 lid=118 tim=1305539325657 hv=4029820088 ad='4a73e080'
SELECT /*+ bypass_recursive_check INDEX_COMBINE(V43) */
ALIAS_211 ALIAS_R93,
ALIAS_200 D29_T2_ET_COL_21,
ALIAS_201 D29_ALIAS_R94,
ALIAS_202 D29_T2_GID_COL_23,
CAST (NULL AS VARCHAR2(100) ) D29_T2_LEVEL_37,
CAST (NULL AS VARCHAR2(100) ) D29_T2_LEVEL_36,
CAST (NULL AS VARCHAR2(100) ) D29_T2_LEVEL_35,
CAST (NULL AS VARCHAR2(100) ) D29_T2_LEVEL_34,
ALIAS_203 D31_T1_ET_COL_39,
ALIAS_201 D31_ALIAS_R100,
ALIAS_204 D31_T1_GID_COL_41,
CAST (NULL AS VARCHAR2(100) ) D31_T1_LEVEL_53,
CAST (NULL AS VARCHAR2(100) ) D31_T1_LEVEL_52,
CAST (NULL AS VARCHAR2(100) ) D31_T1_LEVEL_51,
CAST (NULL AS VARCHAR2(100) ) D31_T1_LEVEL_50,
CAST (NULL AS VARCHAR2(100) ) D31_T1_LEVEL_49
FROM
(
SELECT
ALIAS_198 ALIAS_200,
0 ALIAS_201,
ALIAS_189 ALIAS_202,
ALIAS_190 ALIAS_203,
ALIAS_191 ALIAS_204,
ALIAS_192 ALIAS_205,
ALIAS_193 ALIAS_206,
ALIAS_194 ALIAS_207,
ALIAS_195 ALIAS_208,
ALIAS_196 ALIAS_209,
ALIAS_197 ALIAS_210,
ALIAS_187 ALIAS_211
FROM
(
SELECT
T42.MEASURE_63 ALIAS_187,
T42.ET_COL_21 ALIAS_198,
T42.GID_COL_23 ALIAS_189,
T42.ET_COL_39 ALIAS_190,
T42.GID_COL_41 ALIAS_191,
T42.ET_COL_1 ALIAS_192,
T42.GID_COL_3 ALIAS_193,
T42.ET_COL_54 ALIAS_194,
T42.GID_COL_56 ALIAS_195,
T42.ET_COL_9 ALIAS_196,
T42.GID_COL_11 ALIAS_197
FROM
(SELECT * FROM TABLE(OLAP_TABLE('GSW_AW.GSW_AW duration session',
...
Once you have a trace file containing the SQL generated by the OLAP API, you can format it using TKPROF to review the
individual SQL statements, their explain plans and details of wait events, and to profile the SQL in terms of total contribution
to response time.
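For example, from the operating system prompt (the trace file name here is hypothetical; substitute the file produced in your own UDUMP directory):

$ tkprof orcl_ora_12345.trc olap_trace.prf sys=no sort=prsela,exeela,fchela

The sort options order the report by parse, execute and fetch elapsed time, so the most expensive statements appear first.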
You can also generate a log file of the OLAP DML commands executed during an OLAP session by using the OLAP DML
debugoutfile command. For example, in SQL*Plus, you could use the commands:
SQL> exec dbms_aw.execute('aw attach my_aw ro');
SQL> exec dbms_aw.execute('debugoutfile my_diralias/my_filename.txt');
SQL> select . . . from view_using_olap_table where . .
SQL> exec dbms_aw.execute('dotf eof');
If you are using the OLAP Worksheet to directly enter OLAP DML, you could use the commands:
>aw attach my_aw ro
>debugoutfile my_diralias/my_filename.txt
>call 'dml program'
>dotf eof
You can monitor the effectiveness of the OLAP page pool using the V$AW_CALC view:
SQL> describe v$aw_calc
 Name                    Type
 ----------------------- ----------------
 SESSION_ID              NUMBER
 AGGREGATE_CACHE_HITS    NUMBER
 AGGREGATE_CACHE_MISSES  NUMBER
 SESSION_CACHE_HITS      NUMBER
 SESSION_CACHE_MISSES    NUMBER
 POOL_HITS               NUMBER
 POOL_MISSES             NUMBER
 POOL_NEW_PAGES          NUMBER
 POOL_RECLAIMED_PAGES    NUMBER
 CACHE_WRITES            NUMBER
 POOL_SIZE               NUMBER
 CURR_DML_COMMAND        VARCHAR2(64)
 PREV_DML_COMMAND        VARCHAR2(64)
 AGGR_FUNC_LOGICAL_NA    NUMBER
 AGGR_FUNC_PRECOMPUTE    NUMBER
 AGGR_FUNC_CALCS         NUMBER
The columns that you will normally need to monitor are POOL_HITS and POOL_MISSES, which record the number of times
data is found, and not found, in the OLAP pool. The pool is sized using the OLAP_PAGE_POOL_SIZE parameter: if set to 0 it
is managed automatically by ASMM, and if set to a positive value it is sized manually. A high ratio of misses to hits could
indicate that your data is not grouped closely enough together (your fastest-varying dimension is not at the top of the
dimension list), so that additional pages of data have to be retrieved into the cache.
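A quick way to check the ratio, sketched here using the columns from the DESCRIBE output above and assuming you are connected as a suitably privileged user:

SELECT session_id, pool_hits, pool_misses,
       ROUND(100 * pool_misses / NULLIF(pool_hits + pool_misses, 0), 2) AS miss_pct
FROM   v$aw_calc;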
Other V$ views that you might want to monitor include:
V$AW_LONGOPS, which provides dynamic information (status, rows processed, start time) on cursors opened in OLAP DML.
V$AW_SESSION_INFO, which provides dynamic statistics on the OLAP sessions currently connected (current and previous OLAP DML statement, total transactions executed, average time per transaction and so on).
V$AW_OLAP, which provides a record of active sessions and their use of analytic workspaces: the number of LOB reads, use of temporary segments, and the degree to which OLAP pool pages have been modified (see the example after this list).
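For example, a sketch that lists which users currently have analytic workspaces attached; the join from V$AW_OLAP.SESSION_ID to V$SESSION.SID is an assumption worth verifying on your own release:

SELECT s.username, o.aw_number, o.attach_mode
FROM   v$aw_olap o
JOIN   v$session s ON s.sid = o.session_id;  -- join column is an assumption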
Finally, you will have noticed in one of the earlier SQL trace file outputs that the OLAP API sometimes uses the SQL MODEL
clause to boost query performance. The SQL MODEL clause was introduced with Oracle Database 10g and is normally used to
generate spreadsheet-like models from the output of SELECT statements; it works by generating in-memory hash tables to
hold and process the model. The mechanism that builds, populates and then outputs data from these hash tables is, however,
OLAP-aware, and when dealing with data sourced from an analytic workspace (using the OLAP_TABLE function) it bypasses
the normal object layer and returns results directly to the calling application. Small queries may show no improvement, but if
you are pumping tens of thousands of rows through OLAP_TABLE, SQL MODEL can provide a significant performance
boost.
This performance optimisation is particularly welcome as the 10.1.0.4 and 10.1.0.5 releases of Oracle OLAP contain a bug that
forces the OLAP_TABLE row buffer to be non-paged; this bug has been fixed in the 10.2 release of Oracle OLAP but these
earlier releases try to retrieve all of the data from the OLAP_TABLE query before processing it, slowing down queries
significantly and using up all of the available memory.
When used in a SELECT statement that uses OLAP_TABLE to query an analytic workspace, the MODEL clause has the
following arguments (taken from the 10.1.0.4 OLAP Application Developer's Guide):
DIMENSION BY specifies the names of the embedded-total dimension columns, as defined in the limit map. Any other column in the DIMENSION BY list disables this optimization; a properly constructed SELECT statement still executes, but more slowly.
MEASURES lists the measures, attributes and any other columns excluded from the DIMENSION BY list.
UPDATE indicates that you are not adding any custom members in the DIMENSION BY clause. Be sure to include this keyword, because otherwise the SQL WHERE clauses for measures are discarded, which can significantly degrade performance.
SEQUENTIAL ORDER prevents Oracle from evaluating the rules to ascertain their dependencies.
Note that while the MODEL clause is used in relational queries for inter-row calculations, you should not use it for this
purpose with OLAP_TABLE; for OLAP_TABLE, the MODEL clause is used only to optimize the query.
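To make the pattern concrete, here is a hedged sketch loosely modelled on the Global Widgets cube; the limit map, column names and measure names are illustrative assumptions rather than the article's actual mapping:

SELECT time_et, product_et, order_quantity
FROM   TABLE(OLAP_TABLE(
         'GSW_AW.GSW_AW DURATION SESSION', '', '',
         'MEASURE order_quantity FROM sales_order_quantity
          DIMENSION time_et FROM time WITH HIERARCHY time_parent
          DIMENSION product_et FROM product WITH HIERARCHY product_parent'))
MODEL
  DIMENSION BY (time_et, product_et)   -- embedded-total columns only
  MEASURES (order_quantity)            -- everything else
  RULES UPDATE SEQUENTIAL ORDER ();    -- empty rule list: MODEL used purely as an optimization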
Conclusions
Oracle OLAP, though based on Express Server technology, is still relatively new as an option to the Oracle RDBMS, and as
time progresses techniques and approaches are being developed to optimise data loads, aggregation and user queries. Parts 1
and 2 of this article set out some tips and best practices for designing your cube, loading and aggregating data, and optimising
queries and the interface layer between your OLAP data and your chosen query tools. As adoption of Oracle OLAP continues,
more techniques and best practices will be documented, and I would be more than interested in hearing any feedback or
approaches that readers have used.
-Mark Rittman is a Certified Oracle Professional DBA and works for SolstonePlus as a consultant on Oracle BI and Data
Warehousing projects. Mark also chairs the UK Oracle User Group BI & Reporting Tools SIG and is an Oracle ACE.
Mark would also like to thank Heiko Becker, Chris Chiappa, Jameson White and Anthony Waite for their contributions to and
technical review of this article.