
Deploying ODI 11g in the Enterprise

Mark Rittman, Technical Director, Rittman Mead


BIWA Summit 2013, San Francisco, January 2013
T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com

About the Speaker

Mark Rittman, Co-Founder of Rittman Mead


Oracle ACE Director, specialising in Oracle BI&DW
14 Years Experience with Oracle Technology
Regular columnist for Oracle Magazine
Author of two Oracle Press Oracle BI books
Oracle Business Intelligence Developers Guide
Oracle Exalytics Revealed
Writer for Rittman Mead Blog :
http://www.rittmanmead.com/blog
Email : mark.rittman@rittmanmead.com
Twitter : @markrittman


About Rittman Mead

Oracle BI and DW platinum partner


World leading specialist partner for technical excellence, solutions delivery and innovation in Oracle BI
Approximately 50 consultants worldwide
All expert in Oracle BI and DW
Offices in US (Atlanta), Europe, Australia and India
Skills in broad range of supporting Oracle tools:
OBIEE
OBIA
ODIEE
Essbase, Oracle OLAP
GoldenGate
Exadata
Endeca


Oracle Data Integrator 11g

Oracle's strategic data integration tool, originally acquired with Sunopsis (2006)
Java architecture, part of the wider Oracle Data Integration Suite and Oracle Fusion Middleware
Heterogeneous database and source/target support
Long-term successor to OWB, and now the most common ETL tool on new projects
Commonly used alongside OBIEE, Essbase and Oracle RDBMS for BI/DW projects


Oracle Data Integrator 11g Key Features

Same philosophy as OWB and Oracle RDBMS - DB as the ETL engine
Declarative design - separates logic from implementation
Business rules define what goes where, and using which transformation rules
Technical implementation defines how data is moved
Built for SOA environments
Support for Web Services, EII etc
Supports batch, event-based and real-time integration
Extensible through Knowledge Modules
Change Data Capture
Slowly Changing Dimensions
Bulk load
Java client application with server elements (agents)


Part of the Wider Oracle Data Integration Suite

Oracle Data Integrator for large-scale data integration across heterogeneous sources and targets
Oracle GoldenGate for heterogeneous data replication and changed data capture
Oracle Enterprise Data Quality for data profiling and cleansing
Oracle Data Services Integrator for SOA message-based data federation


Part of Oracle Fusion Middleware 11g

Oracle's complete set of middleware servers and technologies
Based around Java, SOA, Oracle WebLogic Server and non-Java technologies
Foundation for Oracle's applications and platforms such as Oracle Fusion Applications


Deploying ODI within an Enterprise

As ODI becomes more mainstream, and data integration more mission-critical, ODI needed to evolve
Data warehousing and BI projects don't just access (Oracle) relational sources and targets any more
Data quality requires more thought than just ad-hoc corrections and filtering
ODI needs to participate in modern software development techniques such as continuous integration
It's no longer acceptable for ODI jobs to fail, and be unavailable all day or weekend
The stakes are raised - can ODI deliver?


Loading More than a Data Warehouse, Accessing More than Oracle RDBMS

Most of us know ODI through its ability to load Oracle data warehouses
Data typically sourced from Oracle databases, files, maybe the odd non-Oracle RDBMS source
Enterprises now work with many and varied data sources and applications, such as
Multidimensional servers such as Oracle Essbase, and associated EPM apps
XML sources, and JMS queues
SOA environments, using messaging and service buses, typically in real-time
More recently - big data sources such as Hadoop clusters, NoSQL databases


Working with Essbase data, and Hyperion Planning

ODI11g is the strategic, long-term DI tool for Essbase and associated EPM applications
IKMs and LKMs for loading, and extracting from, Essbase databases and EPM metadata stores
Data models for Essbase databases represented as tables and columns, the same as with other data sources
Data loads via rules files, Essbase / Planning / HFM APIs
However ... not really Essbase native, learning curve for admins
Good sources of ODI + Essbase/EPM Suite information:
http://john-goodwin.blogspot.co.uk
Cameron Lackpour OOW2012 Presentation
Slay the Bad Data in Essbase with ODI
http://tinyurl.com/lackpour-odi
Rittman Mead Blog


Support for SOA Environments, and Messaging

ODI has technology adapters and features for many SOA, queue and messaging-type technologies
JMS Queue, JMS Topic (plain message or XML), SOAP messages via Web Services etc
Main role for ODI in SOA environments is bulk-data movement, invoked by web service calls
Regular inter-service messaging for low volume, switching to ODI for high-volume
Web services provided by runtime agents
Start, monitor, stop and restart scenarios
Start, monitor, stop and restart load plans
Public introspection web service
List contexts
List scenarios
Requires deployment in Java EE container
Call from BPEL or any other standard process


Example: ODI 11g for Bulk Data Handling in an Orders Process

1. Large file arrives, detected by BPEL file
2. Execution starts (BPEL / ESB) - and a step for transforming a large document payload occurs
3. Pass XML payload, by reference, to ODI
4. ODI loads payload
5. ODI transforms payload
6. ODI sends payload wherever instructed
7. ODI notifies BPEL / ESB that job has completed
8. Core BPEL / ESB processing completes


Oracle Data Services Integrator - A Data Federation Alternative to ODI in SOA

ODI is essentially a batch-orientated DI tool, though batches can be micro-batches (and event-driven)
ODI moves and transforms data, loading it into a central, integrated location
In some cases though, you may wish to take a different approach

Data federation vs. integration - read and transform data in-place
Data read and integrated on-demand, as a service
Approach could be preferable for many reasons
Security rules don't allow data to be replicated
Development is dynamic, sources frequently added or changed
Data volumes don't warrant a full ETL solution
Data format is inherently nested and does not easily map onto a relational model


ODI, ODSI, Golden Gate and OEDQ in a SOA Environment


Big Data, Hadoop and Unstructured Data Sources

Big data is the hot topic in BI, DW and Analytics circles


The ability to harness vast datasets, at a highly granular level, using massively parallel computing
Crunching loosely-structured and modelled datasets using simple algorithms: Map (project) + Reduce (agg)
Largely based around open-source projects, non-relational technologies

Apache Hadoop
MapReduce
Hadoop Distributed File System
Apache Hive, Sqoop, HBase etc
Emerging commercial vendors
Cloudera
Hortonworks etc
Can be used standalone, or linked to an
enterprise DW/BI architecture


ODI as Part of Oracle's Big Data Strategy

ODI is the data integration tool for extracting data from Hadoop/MapReduce, and loading it into Oracle Big Data Appliance, Oracle Exadata and Oracle Exalytics
Oracle Data Integrator Application Adapter for Hadoop provides the required data adapters
Load data into Hadoop from the local filesystem, or HDFS (Hadoop clustered FS)
Read data from Hadoop/MapReduce using Apache Hive (JDBC) and HiveQL, load into Oracle RDBMS using Oracle Loader for Hadoop
Supported by Oracle's Engineered Systems
Exadata
Exalytics
Big Data Appliance (w/Cloudera Hadoop Distrib)


How ODI Accesses Hadoop and MapReduce

ODI accesses data in Hadoop clusters through Apache Hive
Metadata and query layer over MapReduce
Provides SQL-like language (HiveQL) and a metadata store (data dictionary)
Provides a means to define tables, into which file data is loaded, and then queried via MapReduce
Accessed via Hive JDBC driver (separate Hadoop install required on ODI server, for client libs)
Additional access through Oracle Direct Connector for HDFS and Oracle Loader for Hadoop

[Diagram: Hadoop Cluster (MapReduce), Hive Server (HiveQL), Oracle RDBMS, ODI 11g. Direct-path loads using Oracle Loader for Hadoop, transformation logic in MapReduce]

Running a MapReduce / Hive Job in ODI

Data is extracted and loaded using regular interfaces


LKMs and IKMs generate HiveQL queries
Functionally identical to RDBMS access/loading
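
As a rough illustration of the kind of HiveQL that travels over the Hive JDBC connection, the sketch below builds a simple aggregate query and shows how it could be submitted through plain JDBC. The class name HiveQuerySketch, the host hadoop-node1 and the table and column names used in the usage note are hypothetical. The driver class and URL form are assumed to be those of the original HiveServer driver and may differ in your Hadoop distribution. The code generated by ODI's knowledge modules will be more involved than this.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

final class HiveQuerySketch {
    // Assumed driver class and URL form for the original HiveServer; check your distribution.
    static final String DRIVER = "org.apache.hadoop.hive.jdbc.HiveDriver";
    static final String URL = "jdbc:hive://hadoop-node1:10000/default"; // hypothetical host

    /** Builds a simple HiveQL aggregate; Hive turns this into MapReduce jobs on the cluster. */
    static String countBy(String table, String column) {
        return "SELECT " + column + ", COUNT(*) AS cnt FROM " + table + " GROUP BY " + column;
    }

    /** Runs the query and prints each group; needs the Hive client libraries on the classpath. */
    static void run(String hiveQl) throws SQLException, ClassNotFoundException {
        Class.forName(DRIVER);
        try (Connection con = DriverManager.getConnection(URL, "", "");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery(hiveQl)) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```

For example, HiveQuerySketch.countBy("web_logs", "status_code") returns the query text, which could then be passed to run(...) on a machine that can reach the Hive server.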


Oracle Loader for Hadoop

Oracle technology for accessing Hadoop data, and loading it into an Oracle database
Pushes data transformation, heavy lifting to the Hadoop cluster, using MapReduce
Direct-path loads into Oracle Database, partitioned and non-partitioned
Online and offline loads
Key technology for fast load of
Hadoop results into Oracle DB


Profiling Data, and Managing Data Quality Issues

ODI has built-in capabilities for defining data rules, data firewalls
Static controls, Flow controls, constraints etc
But what if you don't know what issues your data actually has?
What if you need to profile, deduplicate, merge or otherwise manage your data?
This is almost a topic in itself...


Oracle Enterprise Data Quality

Data profiling, auditing and cleansing based on the industry-leading Datanomic platform
Integration with Oracle Data Integrator for a complex data management solution


Oracle EDQ Features Relevant to ODI 11g

Ability to profile data from many sources (file, RDBMS, JNDI, XML, MS Office)
Create data quality cases, track and assign to owner
Cleanse, transform, parse and match incoming data
via a palette of operators (processors)
Batch or real-time operation
All-Java architecture, thin-client and
runs in WebLogic Server
Replaces previous Trillium-based OEM solution
(but extra-cost option, as was Trillium solution)


ODI 11g Integration with Oracle EDQ

Limited integration at present, as Datanomic was only recently acquired
Can run in same WLS domain, environment
EDQ result schema can be on same DB as ODI staging area
EDQ processes can be executed from an ODI package or load plan using the EDQ Open Tool
Connection details to EDQ Server, and details of job


Participation in Large-Scale Enterprise Projects, and DevOps

As ODI and data integration become more integral to enterprises, expectations rise
ODI project elements, and executable code, need to go into source control
Build systems need to be able to include ODI functionality in their releases
Development Operations (DevOps) systems need to be able to spin up ODI environments automatically
Ideas such as continuous integration and smoke testing can also apply to ODI projects
ODI topologies need to be flexible enough to deal with DEV/PROD network & responsibility separations


Typical ODI Repository Topology : DEV, TEST and PROD

Typical enterprise customers deploy all non-PROD environments on their own network, isolated from the main production systems
This stops you having a single master repository for all ODI work repositories
Good practice is to have all non-DEV environments use execution work repositories


Only allows load plans and scenarios to be imported
Can only run existing code, not alter or change code
Challenge is how you deploy code without DEV assistance
Requires command-line tools
Requires scripting
Requires an API?


Accessing ODI 11g Admin Features from the Command-Line

ODI's admin functions are available through ODI Tools

Run from the command-line, from an ODI procedure, or other methods


Scriptable using the startcmd.bat|sh utility
Run from the agent home directory, connects to
master and work repositories
The key to automating the deployment and administration
of ODI projects and environments

cd c:\oracle\product\11.1.1\Oracle_ODI_2\oracledi\agent\bin
startcmd.bat OdiImportObject -FILE_NAME=c:\Test_Build_Files\SCEN_LOAD_PROD_DIM_Version_001.xml -WORK_REP_NAME=PROD_EXECREP -IMPORT_MODE=INSERT_UPDATE
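
If the deployment is driven from a Java-based build step rather than a batch file, the same invocation can be assembled as an argument list. This is only a sketch: the class OdiImportCommand and its methods are hypothetical helpers, and the parameter values are simply passed through as they appear on the slide. The command is run through cmd /c so that startcmd.bat is resolved in the agent bin directory supplied as the working directory.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

final class OdiImportCommand {
    /** Assembles the startcmd.bat OdiImportObject call as an argument list. */
    static List<String> build(String exportFile, String workRepName, String importMode) {
        List<String> cmd = new ArrayList<>();
        cmd.add("cmd");
        cmd.add("/c");
        cmd.add("startcmd.bat");
        cmd.add("OdiImportObject");
        cmd.add("-FILE_NAME=" + exportFile);
        cmd.add("-WORK_REP_NAME=" + workRepName);
        cmd.add("-IMPORT_MODE=" + importMode);
        return cmd;
    }

    /** Runs the command from the agent bin directory and returns the process exit code. */
    static int run(List<String> cmd, File agentBinDir) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd).directory(agentBinDir).inheritIO().start();
        return p.waitFor();
    }
}
```

A non-zero exit code from run(...) can then be used to fail the build step.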

Continuous Integration and Smoke Testing using ODI

For complex, multi-developer projects, continuous integration is a good practice


Continuously taking shipped code and testing it in a smoke test environment
Identifies changes that break the build early
Use a suite of regression tests that run the code
with optimal coverage, end-to-end ETL runs
Gives you confidence that a release shipped into test
will actually compile, deploy and pass functional tests
Enables more agile development, through having a robust
build and regression testing process that welcomes change
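
One way to picture the regression suite is as a list of named checks that all run on every build, with the build judged stable only when none fail. The sketch below is a generic illustration and not part of ODI or Jenkins: the class SmokeTestSuite is hypothetical, and in practice each check would start an ODI scenario and verify the loaded data.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BooleanSupplier;

final class SmokeTestSuite {
    // Checks are kept in insertion order so results are reported predictably.
    private final Map<String, BooleanSupplier> tests = new LinkedHashMap<>();

    void add(String name, BooleanSupplier test) {
        tests.put(name, test);
    }

    /** Runs every check and returns the names of those that returned false or threw. */
    List<String> run() {
        List<String> failed = new ArrayList<>();
        for (Map.Entry<String, BooleanSupplier> e : tests.entrySet()) {
            boolean ok;
            try {
                ok = e.getValue().getAsBoolean();
            } catch (RuntimeException ex) {
                ok = false; // an exception counts as a failed check
            }
            if (!ok) {
                failed.add(e.getKey());
            }
        }
        return failed;
    }
}
```

An empty list from run() means the build passed the smoke test; anything else identifies the change that broke it.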

[Diagram: DEV/TEST Master Repository (Security, Topology, Versioning); DEV Development Work Repository (Models, Projects, Execution); CI / SMOKE TEST Execution Work Repository; TEST Execution Work Repository; Regression Tests #1, #2 ... #n]


Using Jenkins and OdiImportObject Tool for Continuous Integration

Jenkins is an open-source build automation and continuous integration tool


Supports a range of build and source-control tools including Ant, Maven, Subversion, Git etc
Use to detect new ODI export files in a given directory, and then automatically deploy them to the CI / Smoke-Test environment


Or monitor a source-control system for new check-ins
Deploy ODI code through ODI Tools (OdiImportScen, OdiImportObject)
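
The "detect new export files" step could be sketched as a directory scan that returns only XML files changed since the last build. This is an illustration under stated assumptions, not how Jenkins itself polls: the class ExportFileScanner is hypothetical, and the caller is assumed to remember the timestamp of the previous run.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ExportFileScanner {
    /** Returns the *.xml files in dir modified after sinceMillis, sorted by path. */
    static List<Path> newExports(Path dir, long sinceMillis) throws IOException {
        List<Path> found = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.xml")) {
            for (Path p : stream) {
                if (Files.getLastModifiedTime(p).toMillis() > sinceMillis) {
                    found.add(p);
                }
            }
        }
        Collections.sort(found);
        return found;
    }
}
```

Each returned path would then be handed to an ODI Tools import call for the CI / Smoke-Test repository.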

[Diagram: Jenkins CI Server with scheduler runs "startcmd.bat OdiImportObject -FILE_NAME = %1.xml ..." against the PROD Master Repository (Security, Topology, Versioning) and the PROD Execution Work Repository]


Steps to Set up a Continuous Integration Environment using Jenkins

Download Jenkins from http://jenkins-ci.org


Set up a new build job, optionally integrate with SVN etc
Run ODI tools through the "Execute a Batch File" function
Or take it further using Maven, Ant etc
Run the build process manually, to a schedule, or
on check-in of new code to the source control system
Report on stability of build, see last failure, reason for fail


The ODI SDK

For other automation tasks, the ODI SDK can be used to perform all functions available in ODI Studio
Java-based API analogous to OMB+ within Warehouse Builder
Script the creation of repositories & interfaces, updating of models,
registering of data sources and topologies etc
Used either within Java applications (compiled),
or interpreted using Groovy (editor now shipped with ODI)

import oracle.odi.domain.project.OdiProject;
import oracle.odi.core.persistence.transaction.support.DefaultTransactionDefinition;

// odiInstance is supplied by the Groovy editor in ODI Studio
txnDef = new DefaultTransactionDefinition();
tm = odiInstance.getTransactionManager()
txnStatus = tm.getTransaction(txnDef)
// Create the project (name, code) and persist it within the transaction
project = new OdiProject("Project For Demo", "PROJECT_DEMO")
odiInstance.getTransactionalEntityManager().persist(project)
tm.commit(txnStatus)


Making ODI ETL Processes Resilient and Highly-Available

ODI routines deployed in the enterprise need to be resilient, fail gracefully and be restartable
They are often considered mission critical
You need to code defensively, and anticipate #fail
Make your ETL routines like this...
Not like this...
.. or this.


Why do ODI ETL and Data Integration Jobs Fail?

ODI ETL processes typically fail for one of two main reasons

Reason #1 : An error in your code, unexpected data, running out of disk space etc - the process fails
Reason #2 : An agent crashes, an ODI repository goes down etc - the infrastructure fails
Most modern databases (Oracle 11g+ etc) have capabilities to recover from DB process issues
Can we make use of these within ODI packages, KMs etc?


Enabling ETL Resumption : Resumable Space Allocation and ODI

Oracle Database 9i+ has provided resumable space allocation

When enabled, suspends INSERT operations when out of disk space, rather than fail load
Datafiles can then be extended, or new ones added
Can be incorporated into ODI KM to enable more load operations to complete
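
A knowledge module step that switches this on only needs to issue an ALTER SESSION statement before the load and, optionally, switch it off afterwards. The sketch below just builds those statements. The class ResumableSession, the timeout value and the session name are hypothetical, and the session user is assumed to hold the RESUMABLE system privilege.

```java
final class ResumableSession {
    /** Statement to run at the start of the load session; the timeout is in seconds. */
    static String enable(int timeoutSeconds, String name) {
        if (timeoutSeconds <= 0) {
            throw new IllegalArgumentException("timeout must be positive");
        }
        // Double any single quotes so the name is a valid SQL string literal.
        return "ALTER SESSION ENABLE RESUMABLE TIMEOUT " + timeoutSeconds
                + " NAME '" + name.replace("'", "''") + "'";
    }

    /** Statement to run once the load has finished. */
    static String disable() {
        return "ALTER SESSION DISABLE RESUMABLE";
    }
}
```

The NAME given here is what later appears in the NAME column of DBA_RESUMABLE, which makes a suspended ODI load easier to identify.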


Resumable Space Allocation in Action

Insert process becomes suspended, ODI Operator shows step as still running
Suspended operation can be detected using DBA_RESUMABLE, USER_RESUMABLE
Once more disk space added, step will resume, operation can complete

select name from dba_resumable;


NAME


Making ETL and Data Integration Processes Restartable

Recoverability is another enterprise ETL requirement - graceful failure and ability to restart process
Can be as simple as re-running the job, but some failures may be catastrophic - how do you unwind?
Oracle RDBMS has several flashback technologies that can help

Flashback database, to a given SCN or restore point
Flashback table, etc
Example : An ETL process performs an UPDATE, then an INSERT - if the INSERT fails, the UPDATE stays present. Can we use FLASHBACK TABLE to restore the table back to its original state, so the process can be restarted safely?


Using Load Plans to Aid Restartability

Alternative to packages for sequencing interfaces and other steps


Helps organize an optimal execution schedule for a batch
Advanced sequencing capabilities

Parallel or Serial, Conditional branching


Exception handling
Complements Scenarios and Packages, does not replace them
Exception handling feature could be very useful in
restart / graceful failure scenarios
Run ODI procedure, package, to correct errors
Run commands to roll-back/flashback the
database or tables
Let's use one for our example...


Defining a Load Plan Exception to Handle Catastrophic ETL Failures : I

Flashback table requires an SCN (System Change Number) to flash back to
Record the current SCN in a project variable before performing the integration
Requires SELECT privilege on V$DATABASE


Defining a Load Plan Exception to Handle Catastrophic ETL Failures : II

Load plan will define an exception, to be raised if the final INSERT operation fails
Exception will call an ODI Procedure that runs the FLASHBACK TABLE command, using the saved SCN
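
The SQL behind these two steps is short, and can be sketched as three statements: one to read the SCN into the variable, one to prepare the table, and one for the exception procedure. The helper class FlashbackSteps and the table name used in the usage note are hypothetical. Note that FLASHBACK TABLE needs row movement enabled on the target table, which the slides do not mention.

```java
final class FlashbackSteps {
    /** Query whose single value would be stored in the ODI variable before the load. */
    static String currentScnQuery() {
        return "SELECT current_scn FROM v$database";
    }

    /** One-off preparation: FLASHBACK TABLE requires row movement on the table. */
    static String enableRowMovement(String table) {
        return "ALTER TABLE " + table + " ENABLE ROW MOVEMENT";
    }

    /** Statement for the exception procedure, restoring the table to the saved SCN. */
    static String flashbackToScn(String table, long scn) {
        if (scn <= 0) {
            throw new IllegalArgumentException("SCN must be positive");
        }
        return "FLASHBACK TABLE " + table + " TO SCN " + scn;
    }
}
```

For example, FlashbackSteps.flashbackToScn("PRODUCT_DIM", 1234567) produces the statement the exception step would run, with the SCN substituted from the project variable.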


Defining a Load Plan Exception to Handle Catastrophic ETL Failures : III

Now, when the INSERT step fails due to an error, the UPDATE
is rolled-back as well through the FLASHBACK TABLE feature
Table restored to state at original recorded SCN


Agent and ODI Infrastructure Failure

Enterprises typically deploy ODI using standalone agents, in a parent/child load-balancing configuration
Repository database has regular backups, or ideally uses DataGuard / log-shipping
Scheduled jobs assigned to the parent, master runtime agent
Jobs then delegated to the child agents,
that then do the work based on load factor, availability
But what if the parent agent goes down?
What about the schedule?
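
To make the delegation idea concrete, here is a minimal sketch of a parent choosing a child by load factor and availability. It is not ODI's actual balancing algorithm: the classes ChildAgent and ParentAgentSketch are hypothetical, and "load" is assumed to mean running sessions divided by the agent's maximum sessions.

```java
import java.util.List;

final class ChildAgent {
    final String name;
    final int runningSessions;
    final int maxSessions;
    final boolean available;

    ChildAgent(String name, int runningSessions, int maxSessions, boolean available) {
        this.name = name;
        this.runningSessions = runningSessions;
        this.maxSessions = maxSessions;
        this.available = available;
    }

    /** Fraction of this agent's session capacity currently in use. */
    double load() {
        return (double) runningSessions / maxSessions;
    }
}

final class ParentAgentSketch {
    /** Picks the available child with the lowest load, or null if none can take work. */
    static ChildAgent pick(List<ChildAgent> children) {
        ChildAgent best = null;
        for (ChildAgent c : children) {
            if (!c.available || c.runningSessions >= c.maxSessions) {
                continue; // skip agents that are down or already full
            }
            if (best == null || c.load() < best.load()) {
                best = c;
            }
        }
        return best;
    }
}
```

The sketch also shows the weakness raised above: all of the selection logic lives in the parent, so nothing is delegated while the parent is down.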


Using OPMN To Manage, and Restart, Standalone Agents

OPMN (Oracle Process Manager and Notification Server) can be installed to manage standalone agents
Not part of the base install or license, but you probably have it somewhere
Standalone agents are then started, stopped, restarted and monitored using the OPMN server
Ensures that failed agents are restarted, including
the parent agent for load balancing


Deploying Agents within WebLogic Server - New with ODI 11g

Runtime agents can now be deployed in WebLogic Server managed servers (requires WebLogic Server license)
Benefit from WebLogic clustering, Enterprise Manager (+ODI Console), more resilient JVM
Better for high availability, as it protects the scheduler - but how?


How JEE Agents, WebLogic and Coherence Protect Against Agent Failure

Hardware load balancer provides the load-balancing


Agents are all equal - one elects to be the scheduler on
cluster start, another takes over if that one crashes
Oracle Coherence cache grid holds details of the schedule,
available to all nodes in the cluster
WebLogic Server clustering restarts failed managed servers,
and Java processes (JEE runtime agents)
However ... more complex setup, extra license cost, and may not be necessary if external scheduler used instead


Still benefits from running agents in production JVM though
And you get Enterprise Manager, ODI Console etc
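
The "one elects to be the scheduler, another takes over" behaviour can be pictured with a very small model. This is purely illustrative and is not how WebLogic or Coherence decide: the class SchedulerElection is hypothetical, and the rule used here (the first live agent in name order is the scheduler) is an assumption chosen for simplicity.

```java
import java.util.TreeSet;

final class SchedulerElection {
    // Live agents kept in sorted order so the choice of scheduler is deterministic.
    private final TreeSet<String> liveAgents = new TreeSet<>();

    void join(String agent) {
        liveAgents.add(agent);
    }

    void crash(String agent) {
        liveAgents.remove(agent);
    }

    /** The first live agent in name order acts as scheduler; null when none are left. */
    String scheduler() {
        return liveAgents.isEmpty() ? null : liveAgents.first();
    }
}
```

Because every agent is equal, a crash only changes which node answers scheduler(); the schedule itself is assumed to live in the shared cache rather than in any single agent.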


Further Reading - ODI11g in the Enterprise series on Rittman Mead Blog

Five-part series on the Rittman Mead Blog: ODI 11g in the Enterprise
Part 1: Beyond Data Warehouse Table Loading
Part 2: Data Integration using Essbase, Messaging, and Big Data Sources and Targets
Part 3: Data Quality and Data Profiling using Oracle EDQ
Part 4: Build Automation and Devops using the ODI SDK, Groovy and ODI Tools
Part 5: ETL Resilience and High-Availability

http://www.rittmanmead.com/2012/12/odi11g-in-the-enterprise-part-1-beyond-datawarehouse-table-loading/


Thank You for Attending!


Thank you for attending this presentation; more information can be found at http://www.rittmanmead.com
Contact us at info@rittmanmead.com or mark.rittman@rittmanmead.com
Look out for our book, Oracle Business Intelligence Developers Guide, out now!
Follow us on Twitter (@rittmanmead) or Facebook (facebook.com/rittmanmead)


Deploying OBIEE 11g in the Enterprise


Mark Rittman, Technical Director, Rittman Mead
UKOUG Conference & Exhibition, Birmingham December 2012
