

Big Data integration is the big deal in Informatica 9.1
Reference Code: OI00141-026 Publication Date: June 2011 Authors: Madan Sheina and Tony Baer

The highlights of Informatica's recent 9.1 platform release target Big Data integration, self-service, upgraded data quality, master data management (MDM), and data service capabilities. It provides solid functional updates to what is already a rich and ever-broadening data integration platform. The Informatica platform already supported data movements with Hadoop through partnerships with Cloudera and EMC, but the new release adds direct, bidirectional connectivity between Informatica and Hadoop, tapping an emergent use case for customers seeking the raw power of this NoSQL target. The 9.1 release also adds new connectors to social networks, supporting the increasingly popular use case of social media analytics.
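The "direct, bidirectional connectivity" point is worth unpacking: the pattern is to stage enterprise data into Hadoop for processing and then pull results back out to targets. A minimal sketch of that round trip follows — a temporary local directory stands in for HDFS, and all function names are invented for illustration; this is the shape of the pattern, not Informatica's actual API.

```python
import csv
import tempfile
from pathlib import Path

# Illustrative only: a temporary local directory stands in for HDFS.
# A real pipeline would use an HDFS client (or a connector such as
# Informatica's PowerExchange); the round trip itself is the point:
# source -> Hadoop for processing -> back out to a warehouse/target.

def stage_to_cluster(rows, landing_dir: Path, name: str) -> Path:
    """Move source rows into the (simulated) Hadoop landing zone."""
    path = landing_dir / name
    with path.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path

def pull_to_target(path: Path) -> list:
    """Read processed results back out for a warehouse or BI target."""
    with path.open(newline="") as f:
        return [row for row in csv.reader(f)]

landing = Path(tempfile.mkdtemp())   # simulated cluster storage
source_rows = [["cust_1", "premium"], ["cust_2", "basic"]]
staged = stage_to_cluster(source_rows, landing, "customers.csv")
round_trip = pull_to_target(staged)  # == source_rows
```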

Big Data challenges play directly into Informatica's integration strengths
Big Data represents the confluence of more and new/emerging types of transaction and interaction data with demands for more scalable and quicker processing of that data. The issue is not so much the size of these traditionally siloed repositories of information as the potential for understanding the relationships between them. This is where Informatica's competencies come into play. Combining traditional structured transactional information with unstructured interaction data generated by humans and the Internet (customer records, social media) and, increasingly, machines (sensor data, call detail records) is clearly the sweet spot. These types of interaction data have traditionally been difficult to access or process using conventional BI systems. The appeal of adding these new data types is to allow enterprises to achieve a more complete view of customers, with new insights into relationships and behaviors from social media data.

9.1's Big Data play

Informatica supports Big Data in two ways – backing both Hadoop and non-Hadoop processing platforms – and it is doing so largely through its PowerExchange family of data access products. Informatica already offers connectors to popular databases such as Oracle, DB2, and Teradata, and is planning to put purpose-built advanced SQL analytic databases onto its price list, including Teradata/Aster Data, EMC Greenplum, IBM Netezza, and HP Vertica. The 9.1 platform also includes a new set of connectors to various Big Data transactional systems to make it easier to meld structured transactional data with largely unstructured interaction data (including social media). Informatica has taken the logical first step in supporting social network integration by adding connectors for published Twitter, LinkedIn, and Facebook APIs.

Not surprisingly, Informatica is calling Big Data the next big growth opportunity for its business. Ovum believes the focus on Big Data is a natural corollary to the company's last stated big growth opportunity – the Informatica cloud – as both a data source/target and a platform on which to host its products. A big part of Big Data will be driven by enterprises seeking to build hybrid architectures that store and integrate data residing in on-premise systems and in the cloud.

Informatica PowerExchange provides the technical foundation for 9.1's Big Data play. In May 2011 the company announced support for EMC Greenplum's distribution of the Hadoop file system. The 9.1 release builds on this by adding a new PowerExchange for the Hadoop Distributed File System (HDFS) connectivity tool, which augments Big Data processing by moving enterprise data into Hadoop clustered environments for highly scalable parallel processing and out to targets (such as data warehouses) for consumption and analysis.

However, this is a longer-term effort, with 9.1 the first stab of many. In the next release, Informatica plans to build a more robust offering that includes a graphical integrated development environment (IDE) for Hadoop, the ability to prepare and integrate data directly inside Hadoop environments, codeless and metadata-driven development, and end-to-end metadata lineage across the Informatica, Hadoop, and target environments. The benefit is being able to reuse existing Informatica development skills in Hadoop environments. This addresses a major gap identified in the Ovum report What is Big Data: The Big Architecture: the lack of skills for Hadoop, MapReduce, and related technologies is currently one of the biggest impediments to adoption of NoSQL platforms. That, of course, presents Informatica with an opportunity to apply its data integration, profiling, and quality know-how directly to Big Data sets and processing environments to enrich data sets as well as master data.
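The MapReduce skills gap mentioned above is easier to appreciate with a concrete picture of the programming model. The sketch below simulates the three phases of a word count in pure Python — what Hadoop actually distributes across a cluster — and is a conceptual illustration only, not Hadoop or Informatica code.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record: str):
    # Map: emit a (key, 1) pair for every word in an input record.
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    # Shuffle: group all emitted values by key, as Hadoop does
    # between the map and reduce phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the grouped values for each key.
    return {key: sum(values) for key, values in groups.items()}

records = ["big data big deal", "big data integration"]
counts = reduce_phase(shuffle(chain.from_iterable(map(map_phase, records))))
# counts == {"big": 3, "data": 2, "deal": 1, "integration": 1}
```

The appeal of a visual IDE over Hadoop is precisely that developers specify the map and reduce logic declaratively instead of hand-writing and deploying code like this across a cluster.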

Informatica has also enhanced its B2B Data Exchange Transformation product to make it easier to connect to other interaction data gleaned from call detail records (CDRs), device/sensor data, scientific data (genomic and pharmaceutical), and large image files (through managed file transfer). Although the initial set of social media adapters is prescriptive to certain sites, Ovum expects Informatica to eventually offer a software development kit (SDK) approach that provides flexible connectivity to broader social media data sources.

Informatica is not alone in providing support for loading and accessing of data to and from Hadoop. The race is on to provide a standardized set of visual Hadoop-focused tools that build around pillars such as MapReduce and access and transformation languages such as Hive and Pig. The leader will be the one that makes the NoSQL environment comfortable enough for the SQL developer mainstream.

MDM gets tightened integration with the rest of the platform

As one of its more recent – and watershed – acquisitions, one of Informatica's biggest challenges for this release was tighter integration of the MDM technology that came from Siperian with the rest of the platform, such as data quality, application ILM, event processing, and low-latency messaging. Siperian provided a comprehensive multi-data domain solution (customer, product, location, chart of accounts, etc.), but architecturally it was rigid. That has changed in 9.1, which supports multiple MDM deployment styles – registry, single-instance/consolidated hub, coexistence, analytical, transactional or federated via cloud, or service-oriented architecture.

Informatica's first move, in the 9.0 release, was to allow customers to define data quality rules that could be applied to data integration. Hence, data quality policies can be surfaced and reused across data profiling, data cleansing, and MDM as a single process. The 9.1 release further advances integration across the platform by allowing end users to reuse the same data quality rules in the MDM environment. The key benefits are better governance (which avoids having conflicting data quality rules applied across systems) and safeguarding existing investments in data quality rule standardization and skills (allowing them to be retained and transferred over to the MDM environment). This helps organizations to deliver "authoritative and trustworthy data." Further flexibility is enabled through added features that prevent duplicate master data types from being created, make master data entity hierarchies and relationships more visible (within the Data Director tool), and enhance registry services for quicker on-boarding and updating of metadata (primarily through messaging) and more targeted master data search techniques.

9.1 encourages users to be self-sufficient

This release also comes with a long list of functional upgrades across the staple tools of the Informatica suite. There are simply too many to do each one justice in this research note. However, one common thread that stands out across many of these additional enhancements in 9.1 is a continued focus on self-service provisioning of (in Informatica parlance) "authoritative and trustworthy" data. Informatica has worked hard to make its core business more accessible to a broader, non-technical IT audience. This is a challenge, as data integration is a complicated IT task that has traditionally been the almost exclusive preserve of skilled DBAs and developers.

Notable functionality to support this accessibility initiative includes the introduction of so-called "proactive data quality assurance" services to identify data exceptions more quickly, which allow ETL developers to perform comparative profiling analysis to map certain data quality rules and logic against data profiles at early stages of the transformation pipeline, in order to prevent costly errors from surfacing downstream. This is based on a complex event processing (CEP)-like model, which works by dynamically generating and comparing profiles of data as it flows through the mapping pipeline. It also enables "top-down" validation of actual versus expected data in data integration projects – which is particularly useful when upgrading applications.

There is also a new interactive, self-service Data Integration Analyst workbench for data analysts and data stewards, which extends a similar capability introduced for data quality analysts in the 9.0 release. This workbench aims to empower non-technical users who are close to the business, and arguably have a better business understanding of data, to define their own data integration mappings and routines without having to constantly toggle back to IT developers. The creation and validation of source-to-target mappings is handled through a browser-based, guided interface that enables business analysts and data stewards to pinpoint data using business terms. For example, analysts can find and navigate data sources and targets using metadata such as a business glossary or data lineage trails; define and specify source-to-target mappings; selectively apply transform rules (including ETL and data quality) from a predefined inventory; validate the rules on the fly; embed existing ETL mapping logic and data quality rules into their specification; preview the results of their specifications; and save and share their own transformation logic with other analysts. The Data Integration Analyst tool then automatically generates the relevant PowerCenter or Informatica Data Services (IDS) transformation mapping logic, which can be deployed as virtualized SQL views, published web services, or batch ETL routines.

9.1 adds greater project awareness to data virtualization

Another notable addition to 9.1 is so-called adaptive data services, which wrap project-specific context and intelligence into the data federation creation and delivery process. This allows delivery of data from single sources to the business needs of all projects, without necessarily having to reinvent the wheel for every project, and ensures consistency.
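The "virtualized SQL view" delivery style can be made concrete with a minimal sketch. Here sqlite3 merely stands in for the SQL endpoint a consuming BI tool would reach over ODBC or JDBC, and all table and view names are invented for illustration; the design point is that one governed view definition serves every consumer instead of each project re-implementing the same join.

```python
import sqlite3

# sqlite3 is only a stand-in for an ODBC/JDBC-reachable SQL endpoint;
# the schema is hypothetical. The idea being illustrated: a single
# logical definition (the view) is the one object every project
# queries, which is what makes the delivery governable.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER, name TEXT);
    CREATE TABLE orders (customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (1, 100.0), (1, 50.0), (2, 75.0);

    -- The 'virtualized view': one governed definition of customer
    -- revenue that all downstream projects query the same way.
    CREATE VIEW customer_revenue AS
        SELECT c.name, SUM(o.amount) AS revenue
        FROM customers c JOIN orders o ON o.customer_id = c.id
        GROUP BY c.name;
""")
rows = conn.execute(
    "SELECT name, revenue FROM customer_revenue ORDER BY name"
).fetchall()
# rows == [('Acme', 150.0), ('Globex', 75.0)]
```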

Informatica leverages this data virtualization solution as part of the overall platform to enable physical and virtual data integration depending on business needs. Informatica calls this "multiprotocol data provisioning." It is technically an extension of Informatica's core data services architecture, and uses SQL endpoints via ODBC or JDBC, as a web service, or to PowerCenter as a batch process. The key benefit is governance, since the multi-provisioning is based on a common logical data object and policy definitions.

APPENDIX

Disclaimer

All Rights Reserved. No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form by any means, electronic, mechanical, photocopying, recording or otherwise, without the prior permission of the publisher, Ovum (a subsidiary company of Datamonitor plc).

The facts of this report are believed to be correct at the time of publication but cannot be guaranteed. Please note that the findings, conclusions, and recommendations that Ovum delivers will be based on information gathered in good faith from both primary and secondary sources, whose accuracy we are not always in a position to guarantee. As such Ovum can accept no liability whatever for actions taken based on any information that may subsequently prove to be incorrect.