Oracle's strategic data integration tool, originally acquired with Sunopsis (2006)
Java-based architecture, part of the wider Oracle Data Integration Suite and Oracle Fusion Middleware
Heterogeneous database and source/target support
Long-term successor to OWB, and the most common ETL tool on new projects now
Commonly used alongside OBIEE, Essbase and Oracle RDBMS for BI/DW projects
Oracle Data Integrator for large-scale data integration across heterogeneous sources and targets
Oracle GoldenGate for heterogeneous data replication and change data capture
Oracle Enterprise Data Quality for data profiling and cleansing
Oracle Data Service Integrator for SOA message-based data federation
As ODI becomes more mainstream, and data integration more mission-critical, ODI needs to evolve
Data warehousing and BI projects don't just access (Oracle) relational sources and targets any more
Data quality requires more thought than just ad-hoc corrections and filtering
ODI needs to participate in modern software development techniques such as continuous integration
It's no longer acceptable for ODI jobs to fail, and be unavailable all day or weekend
The stakes are raised - can ODI deliver?
Most of us know ODI through its ability to load Oracle data warehouses
Data typically sourced from Oracle databases, files, maybe the odd non-Oracle RDBMS source
Enterprises now work with many and varied data sources and applications, such as
Multidimensional servers such as Oracle Essbase, and associated EPM apps
XML sources, and JMS queues
SOA environments, using messaging and service buses, typically in real-time
More recently - big data sources such as Hadoop clusters, NoSQL databases
ODI has technology adapters and features for many SOA, queue and messaging-type technologies
JMS Queue, JMS Topic (plain message or XML), SOAP messages via Web Services etc
Main role for ODI in SOA environments is bulk-data movement, invoked by web service calls
Regular inter-service messaging for low volume, switching to ODI for high-volume
Web services provided by runtime agents
ODI is essentially a batch-orientated DI tool, though batches can be micro-batches (and event-driven)
ODI moves and transforms data, loading it into a central, integrated location
In some cases though, you may wish to take a different approach
Apache Hadoop
MapReduce
Hadoop Distributed File System
Apache Hive, Sqoop, HBase etc
Emerging commercial vendors
Cloudera
Hortonworks etc
Can be used standalone, or linked to an enterprise DW/BI architecture
ODI is the data integration tool for extracting data from Hadoop/MapReduce, and loading it into Oracle Big Data Appliance, Oracle Exadata and Oracle Exalytics
Oracle Data Integrator Application Adapter for Hadoop provides the required data adapters
[Diagram: ODI 11g, Hive Server (HiveQL), Hadoop Cluster (MapReduce), Oracle RDBMS]
Oracle technology for accessing Hadoop data, and loading it into an Oracle database
Pushes data transformation, heavy lifting to the Hadoop cluster, using MapReduce
Direct-path loads into Oracle Database, partitioned and non-partitioned
Online and offline loads
Key technology for fast load of Hadoop results into Oracle DB
ODI has built-in capabilities for defining data rules, data firewalls
Static controls, Flow controls, constraints etc
But what if you don't know what issues your data actually has?
What if you need to profile, deduplicate, merge or otherwise manage your data?
This is almost a topic in itself...
Data profiling, auditing and cleansing based on the industry-leading Datanomic platform
Integration with Oracle Data Integrator for a complex data management solution
As ODI and data integration become more integral to enterprises, expectations rise
ODI project elements, and executable code, need to go into source control
Build systems need to be able to include ODI functionality in their releases
Development Operations (DevOps) systems need to be able to spin up ODI environments automatically
Ideas such as continuous integration and smoke testing can also apply to ODI projects
ODI topologies need to be flexible enough to deal with DEV/PROD network & responsibility separations
cd c:\oracle\product\11.1.1\Oracle_ODI_2\oracledi\agent\bin
startcmd.bat OdiImportObject -FILE_NAME=c:\Test_Build_Files\SCEN_LOAD_PROD_DIM_Version_001.xml -WORK_REP_NAME=PROD_EXECREP -IMPORT_MODE=INSERT_UPDATE
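A build server could assemble the import command above for each exported scenario file rather than hard-coding it. The following is a minimal sketch in Python, assuming the agent path and repository name from the example; the function names `import_command` and `import_commands` are made up for illustration and are not part of ODI.

```python
from pathlib import PureWindowsPath

# Assumed agent location, copied from the example command above.
AGENT_BIN = PureWindowsPath(r"c:\oracle\product\11.1.1\Oracle_ODI_2\oracledi\agent\bin")


def import_command(scenario_file, work_rep="PROD_EXECREP", mode="INSERT_UPDATE"):
    """Return the argument list for one OdiImportObject call."""
    return [
        str(AGENT_BIN / "startcmd.bat"),
        "OdiImportObject",
        f"-FILE_NAME={scenario_file}",
        f"-WORK_REP_NAME={work_rep}",
        f"-IMPORT_MODE={mode}",
    ]


def import_commands(scenario_files):
    """One command per exported scenario, in a stable (sorted) order."""
    return [import_command(f) for f in sorted(scenario_files)]


if __name__ == "__main__":
    files = [r"c:\Test_Build_Files\SCEN_LOAD_PROD_DIM_Version_001.xml"]
    for cmd in import_commands(files):
        # A CI job would pass this list to subprocess.run(cmd, cwd=str(AGENT_BIN), check=True)
        print(" ".join(cmd))
```

Keeping the command as an argument list, rather than one string, avoids quoting problems when a file path contains spaces.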
T : +44 (0) 8446 697 995 E : enquiries@rittmanmead.com W: www.rittmanmead.com
[Diagram: ODI repository layout for continuous integration. DEV/TEST Master Repository (Security, Topology, Versioning) with a DEV Development Work Repository (Models, Projects, Execution) and CI / SMOKE TEST and TEST Execution Work Repositories; PROD Master Repository (Security, Topology, Versioning) with a PROD Execution Work Repository. Jenkins CI Server with scheduler runs Regression Tests #1 to #n and calls startcmd.bat OdiImportObject -FILE_NAME = %1.xml ...]
For other automation tasks, the ODI SDK can be used to perform all functions available in ODI Studio
Java-based API analogous to OMB+ within Warehouse Builder
Script the creation of repositories & interfaces, updating of models, registering of data sources and topologies etc
Used either within Java applications (compiled), or interpreted using Groovy (editor now shipped with ODI)
import oracle.odi.domain.project.OdiProject;
import oracle.odi.core.persistence.transaction.support.DefaultTransactionDefinition;

// odiInstance is supplied by the Groovy editor in ODI Studio
// Start a transaction against the repository
txnDef = new DefaultTransactionDefinition()
tm = odiInstance.getTransactionManager()
txnStatus = tm.getTransaction(txnDef)

// Create a project (name, code), persist it, then commit
project = new OdiProject("Project For Demo", "PROJECT_DEMO")
odiInstance.getTransactionalEntityManager().persist(project)
tm.commit(txnStatus)
ODI routines, when deployed in the enterprise, need to be resilient, fail gracefully and be restartable
They are often considered mission critical
You need to code defensively, and anticipate #fail
Make your ETL routines like this... or this.
ODI ETL processes typically fail for one of two main reasons
Reason #1 : An error in your code, unexpected data, running out of disk space etc - the process fails
Reason #2 : An agent crashes, an ODI repository goes down etc - the infrastructure fails
Most modern databases (Oracle 11g+ etc) have capabilities to recover from DB process issues
Can we make use of these within ODI packages, KMs etc?
Resumable space allocation: when enabled, suspends INSERT operations that run out of disk space, rather than failing the load
Datafiles can then be extended, or new ones added
Can be incorporated into ODI KM to enable more load operations to complete
Insert process becomes suspended, ODI Operator shows step as still running
Suspended operation can be detected using DBA_RESUMABLE, USER_RESUMABLE
Once more disk space added, step will resume, operation can complete
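The suspend-and-resume behaviour described above is Oracle's resumable space allocation, which is switched on per session. As a rough sketch of what a customised KM step might issue, the Python below builds the statements around a load; the function name, the statement label and the table names are assumptions for illustration, and the session would need the RESUMABLE system privilege.

```python
# Sketch only: builds the statements a customised KM step might issue.
# The session needs the RESUMABLE system privilege for this to take effect.

def resumable_statements(load_sql, timeout_seconds=3600, name="ODI_LOAD"):
    """Wrap one load statement so it suspends, rather than fails, on a space error."""
    if timeout_seconds <= 0:
        raise ValueError("timeout_seconds must be positive")
    if "'" in name:
        raise ValueError("name must not contain a single quote")
    return [
        f"ALTER SESSION ENABLE RESUMABLE TIMEOUT {timeout_seconds} NAME '{name}'",
        load_sql,
        "ALTER SESSION DISABLE RESUMABLE",
    ]


# Monitoring query: lists statements currently waiting for space.
SUSPENDED_SQL = (
    "SELECT name, sql_text, error_msg, suspend_time "
    "FROM dba_resumable WHERE status = 'SUSPENDED'"
)

if __name__ == "__main__":
    # SALES_FACT and SALES_STG are hypothetical table names.
    for stmt in resumable_statements("INSERT INTO SALES_FACT SELECT * FROM SALES_STG"):
        print(stmt + ";")
    print(SUSPENDED_SQL + ";")
```

The timeout bounds how long the statement waits for space; if nobody adds space before it expires, the statement fails with the original error, so the ODI step still ends rather than hanging indefinitely.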
Recoverability is another enterprise ETL requirement - graceful failure and ability to restart process
Can be as simple as re-running the job, but some failures may be catastrophic - how do you unwind?
Oracle RDBMS has several flashback technologies that can help
Load plan will define an exception, to be raised if the final INSERT operation fails
Exception will call an ODI Procedure that runs the FLASHBACK TABLE command, using the saved SCN
Now, when the INSERT step fails due to an error, the UPDATE is rolled back as well through the FLASHBACK TABLE feature
Table restored to state at original recorded SCN
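The record-then-rewind pattern above comes down to three SQL statements. The sketch below, in Python, generates them for a given table; the helper names and the table `SALES_DW.PRODUCT_DIM` are hypothetical, and it assumes the table has row movement enabled and that enough undo is retained to reach the saved SCN.

```python
import re

# Optional schema prefix plus table name, simple unquoted identifiers only.
_IDENT = re.compile(r"^[A-Za-z][A-Za-z0-9_$#]*(\.[A-Za-z][A-Za-z0-9_$#]*)?$")

# Run at the start of the load plan and store the result, e.g. in an ODI variable.
CAPTURE_SCN_SQL = "SELECT dbms_flashback.get_system_change_number FROM dual"


def _check_table(table):
    if not _IDENT.match(table):
        raise ValueError(f"unexpected table name: {table!r}")
    return table


def enable_row_movement_sql(table):
    """One-off prerequisite for FLASHBACK TABLE."""
    return f"ALTER TABLE {_check_table(table)} ENABLE ROW MOVEMENT"


def flashback_sql(table, scn):
    """Statement for the exception step: rewind the table to the saved SCN."""
    if not isinstance(scn, int) or scn <= 0:
        raise ValueError("scn must be a positive integer")
    return f"FLASHBACK TABLE {_check_table(table)} TO SCN {scn}"


if __name__ == "__main__":
    table = "SALES_DW.PRODUCT_DIM"  # hypothetical target table
    print(enable_row_movement_sql(table) + ";")
    print(CAPTURE_SCN_SQL + ";")
    print(flashback_sql(table, 1234567) + ";")  # 1234567 stands in for the saved SCN
```

Validating the table name and SCN before building the statement matters here because the values arrive from variables at run time, and a malformed value would otherwise surface only inside the exception handler.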
Enterprises typically deploy ODI using standalone agents, in a parent/child load-balancing configuration
Repository database has regular backups, or ideally uses Data Guard / log-shipping
Scheduled jobs assigned to the parent, master runtime agent
Jobs then delegated to the child agents, which then do the work based on load factor, availability
But what if the parent agent goes down?
What about the schedule?
OPMN (Oracle Process Manager and Notification Server) can be installed to manage standalone agents
Not part of the base install or license, but you probably have it somewhere
Standalone agents are then started, stopped, restarted and monitored using the OPMN server
Ensures that failed agents are restarted, including the parent agent for load balancing
Runtime agents can now be deployed in WebLogic Server managed servers (requires WebLogic Server license)
Benefit from WebLogic clustering, Enterprise Manager (+ODI Console), more resilient JVM
Better for high availability - protects the scheduler. How?
How JEE Agents, WebLogic and Coherence Protect Against Agent Failure
Five-part series on the Rittman Mead Blog: ODI 11g in the Enterprise
http://www.rittmanmead.com/2012/12/odi11g-in-the-enterprise-part-1-beyond-datawarehouse-table-loading/