Beyond ETL and Data Warehousing - InfoManagement Direct Article


www.information-management.com/infodirect/2009_109/-10014988-1.html?zkPrintable=1&nopagination=1 1/19

By Rick Sherman, FEB 19, 2009 1:13pm ET

Data integration suffers from an image problem. It has become synonymous with extract, transform and load (ETL). Likewise, ETL has been regarded as a data warehousing technology. Both of these viewpoints fail to reflect current capabilities, and they greatly inhibit enterprises in their attempt to integrate data to provide the information their business needs. Because of this short-sightedness, companies have lost opportunities to harness information as a corporate asset. The narrow view increases the cost of integrating data, encourages the creation of data silos and forces businesspeople to spend an inordinate amount of time filling the information gaps themselves, either through data shadow systems or by reconciling data.

More than Simply ETL Tasks

The basic tasks required in data integration are to gather data, transform it and put it into its target location. If that sounds like ETL, that’s because that’s exactly what it is (see Figure 1). ETL tools have automated these tasks and empowered developers with a toolkit beyond what they could easily have hand coded themselves. ETL tools, for example, include prebuilt transformations ranging from the elementary tasks of converting data and performing lookups to the more complex processes of change data capture and slowly changing dimensions. These prebuilt transformations greatly enhance developer productivity and improve the consistency of results.

Data integration tools offer many significant processes and technologies that extend beyond the basic ETL tasks (see Figure 2). These extensions are necessary to turn data into comprehensive, consistent, clean and current information. The extended processes include data profiling, data quality and operational processing. Together they make it possible to determine the state of the source systems, perform cleansing, ensure consistency and manage all the processing, including error handling and performance monitoring.
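To make the idea of a prebuilt transformation concrete, here is a minimal Python sketch of what a Type 2 slowly changing dimension step does when a tool performs it for you: it looks up the current dimension row for each incoming natural key, leaves unchanged rows alone, and, when a tracked attribute changes, expires the current row and appends a new version with a fresh surrogate key. The names (`DimRow`, `apply_scd2`, `customer_id`, `city`) are hypothetical. The sketch tracks a single attribute and ignores deletes and late-arriving data, all of which commercial ETL tools handle in their own ways.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    """One version of a customer in a hypothetical customer dimension."""
    surrogate_key: int          # warehouse-assigned key, unique per version
    customer_id: str            # natural (business) key from the source system
    city: str                   # the attribute whose history is tracked
    valid_from: date
    valid_to: Optional[date]    # None marks the current version


def apply_scd2(dim: List[DimRow], source: Dict[str, str], load_date: date) -> List[DimRow]:
    """Apply a Type 2 slowly changing dimension load and return a new dimension list.

    `source` maps customer_id -> city as extracted from the source system.
    Deletes are not handled in this sketch.
    """
    result = list(dim)  # copy so the caller's list is not modified
    next_key = max((row.surrogate_key for row in dim), default=0) + 1

    # Lookup step: index the current version of each natural key.
    current = {row.customer_id: i for i, row in enumerate(result) if row.valid_to is None}

    for customer_id, city in sorted(source.items()):
        idx = current.get(customer_id)
        if idx is not None and result[idx].city == city:
            continue  # unchanged attribute: keep the existing current row
        if idx is not None:
            # Changed attribute: expire the current version as of the load date.
            result[idx] = replace(result[idx], valid_to=load_date)
        # New customer or changed attribute: append a new current version.
        result.append(DimRow(next_key, customer_id, city, load_date, None))
        next_key += 1
    return result


if __name__ == "__main__":
    dim = [DimRow(1, "C001", "Boston", date(2008, 1, 1), None)]
    dim = apply_scd2(dim, {"C001": "Cambridge", "C002": "Salem"}, date(2009, 2, 19))
    for row in dim:
        print(row)
```

For example, loading `{"C001": "Cambridge", "C002": "Salem"}` on 2009-02-19 against a dimension that holds C001 in Boston closes the Boston row with that date and appends two current rows with surrogate keys 2 and 3. Rerunning the same load changes nothing, which is the repeatable behavior a prebuilt transformation gives every developer instead of each one hand coding it differently.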

More than Batch Processes

There are many integration initiatives in an enterprise. Because data integration was associated with batch-driven ETL processes that load a data warehouse (DW), integration initiatives that did not involve a DW sought out other technologies. These technologies included enterprise application integration (EAI), enterprise information integration (EII) and service-oriented architecture (SOA). Although each of these technologies has fundamental applications, the reality is that organizations had to reinvent the wheel for every data integration task. The results have been integration silos built with different technologies, producing inconsistent business information, with data integration generally built as an afterthought.

The good news is that data integration vendors that now combine all of the above technologies into data integration suites have emerged from the ETL ranks. Data integration suites have expanded to incorporate EAI, EII and SOA coupled with ETL to offer data integration in batch, interoperating with applications, or in real time from business intelligence (BI) applications. These suites enable an enterprise to integrate data in one consistent manner, yet deploy using whatever transport technology (i.e., ETL, EAI, EII or SOA) is appropriate.

More than Data Warehousing

With the emergence of the more powerful suites, data integration has moved beyond data warehousing to include other integration initiatives in an enterprise, such as:

Data migration
Application consolidation
Operational and real-time BI
Master data management (MDM), customer data integration (CDI) and product information management (PIM)

Companies often undertake data migration or application consolidation projects because of mergers and acquisitions or because they need to streamline applications. In the past, these projects were seen as one-offs and typically hand coded. As systems integrators (SIs) became proficient in ETL tools from DW projects, they realized that they would be much more productive at data migrations and application consolidation projects if they used these same data integration tools. Even though the projects are one-offs, data integration tools enabled the SIs to reuse code, leverage prebuilt transformations, better manage processes and produce documentation without a laborious manual effort. In addition, they did not have to deploy a cadre of coding gurus but could leverage the data integration developers they already employed.

Several market forces have converged to produce the perfect storm, enabling operational or real-time BI with the same data integration and BI tools as used in DW projects. These forces include enterprise applications built on relational databases and data integration tools no longer bound to batch ETL constraints. In addition, the major enterprise application vendors are also offering and bundling data integration and BI tools. The result of this convergence is more consistent, comprehensive and current information (business benefit) with the same data integration and BI infrastructure (IT benefit).

MDM, CDI and PIM all deal with conforming and maintaining master data or reference data for data subjects such as customers and products. The initial wave of technology solutions bundled a set of tools and applications that were business process or industry specific. What got lost in many of the initial implementations was that these applications relied heavily on data integration and that it made sense to leverage a company’s existing data integration platform to create MDM, CDI and PIM solutions.

Moving to Pervasive Data Integration

Data integration tools aren’t pervasive yet, despite the fact that their use is a best practice. The primary inhibitors have been cost and resources, lack of understanding of the tool capabilities and a market unawareness of tool offerings.

Although Fortune 1000-size corporations tend to use these tools to build their data warehouses, they still hand code their data marts, online analytical processing (OLAP) cubes and other reporting databases. The barriers to pervasive use in these enterprises include:

Expense. The licensing costs of these tools often inhibit more widespread use.
Fit. The data integration tool selected as the corporate standard may not be the best match for creating the data marts, cubes and reporting databases required by individual groups.
Resources. Often, the data integration developers are tied up in DW development and are not available for other database loading.

The issue of expense is best addressed when selecting a tool or negotiating pricing with the tool vendor. Licensing cost should not be the barrier it once was, with many vendors offering more scalable pricing options.

The issue of fit is political for large enterprises because the solution is to go against the grain of selecting one corporate standard. The pragmatic solution would be to create two corporate standards: one for enterprise-class data integration and the other for “downstream” databases such as data marts or cubes. The downstream databases do not require the many sources, data cleansing or data conforming that data warehousing requires. Most of these downstream databases could be loaded by more cost- and resource-effective ETL solutions than those required for the enterprise. Not only do corporate IT departments balk at this solution, but tool vendors often think they should “own” the account and can do everything that every other tool can do. The pragmatic answer I give both groups is that without the two-tier standard approach, the downstream databases will continue to be hand coded, with all the business and IT costs and risks associated with those applications. In addition, more data shadow systems will arise to plug the information gap, because hand-coded applications will take longer and incur more maintenance costs as they age.

Companies can address the issue of resources by forming a data integration competency center. Identifying and elevating the visibility of the extent of the data integration tasks occurring throughout an enterprise will enable IT to justify either more resources or a prioritization of all integration projects so that the enterprise DW does not consume all available resources.

For firms smaller than the Fortune 1000, such as those with annual revenue between $250 million and $1.5 billion, the inhibitors to using data integration tools are costs, resources and being unaware of the breadth of the data integration market. These firms do not have the IT budget or resources of the Fortune 1000 to dedicate to data integration solutions. Although this means they would spend less on solutions, it should not mean they are forced to hand code. In addition, they might not be aware of the tools, at least those that are in their budget. If they are aware of any data integration tools at all, it is most likely the high-end tools, which are expensive and require highly skilled resources. They are aware of these tools because industry analysts and publications seem to mention only the expensive ones. They also hear about them from employees who used to work for Fortune 1000 firms. From their perspective, you either pay for high-end tools or you hand code, and hand coding usually wins out. However, the data integration tool market includes products supporting a wide range of capabilities, skills and costs, and there may be multiple products available that would be an excellent fit for a midmarket firm’s needs, skills and budget.

Next Steps

If you work in a Fortune 1000 company, ask yourself if you’re stuck in the ETL rut and if business groups throughout your enterprise are hand coding or, worse, building data shadow systems. If your company is smaller than the Fortune 1000, determine if you are hand coding your data integration processes and why. After you get your answers, become an advocate to make data integration more pervasive in your enterprise and unleash the information your business needs.

Rick Sherman has more than 20 years of business intelligence and data warehousing experience, having worked on more than 50 implementations as a director/practice leader at PricewaterhouseCoopers and while managing his own firm. He is the founder of Athena IT Solutions, a Boston-based consulting firm that provides data warehouse and business intelligence consulting, training and vendor services. In addition to teaching at industry conferences, Sherman offers on-site data warehouse/business intelligence training. Sherman is a published author of over 50 articles, an industry speaker, a DM Review World Class Solution Awards judge and a data management expert at searchdatamanagement.com, and he has been quoted in CFO and Business Week. Sherman can be found blogging on performance management, data warehouse and business intelligence topics at The Data Doghouse. You can reach him at rsherman@athenasolutions.com or (617) 835-0546.

Data Profiling

Data profiling is the examination and assessment of your source systems’ data quality, integrity and consistency. Sometimes called source systems analysis, data profiling should be established as a best practice for every data warehouse, BI and data migration project. In addition to meeting project requirements, data profiling should be an ongoing activity to ensure that you maintain data quality levels.

Data profiling tools, which automate the process, help your staff truly understand not just the data definitions but, more importantly, the real content of the data. Profiling helps ensure your data integration works properly so you can avoid being surprised by bad data. Without these tools, data profiling requires writing hand-coded SQL statements and comparing query results with expected results. Not only is this laborious and time-consuming, it is likely to be incomplete, because people generally will not hand code every permutation of a source system’s tables, views and columns. It’s even worse when, overwhelmed, people skip it completely. Data warehousing and BI projects are often late or encounter data surprises when the staff hand codes data profiling.
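The contrast between hand-coded and automated profiling can be sketched with Python’s built-in `sqlite3` module, standing in here for a source system. Rather than writing a query per column, the hypothetical `profile_database` function enumerates every table and view, then generates the same row-count, null-count, distinct-count and min/max query for each column; `unexpected_nulls` is a hypothetical check that compares the results with one simple expectation. Commercial profiling tools go much further (pattern and domain analysis, cross-column and cross-table dependencies), so treat this only as an illustration of why automation covers permutations that people tend to skip.

```python
import sqlite3
from typing import Dict, List, Set, Tuple


def quote_ident(name: str) -> str:
    """Quote an SQL identifier so odd table or column names are safe to embed."""
    return '"' + name.replace('"', '""') + '"'


def profile_database(conn: sqlite3.Connection) -> List[Dict[str, object]]:
    """Run the same basic profiling query against every column of every table and view."""
    profile: List[Dict[str, object]] = []
    objects = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type IN ('table', 'view') AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (obj,) in objects:
        q_obj = quote_ident(obj)
        # PRAGMA table_info returns one row per column; index 1 is the column name.
        columns = [info[1] for info in conn.execute(f"PRAGMA table_info({q_obj})")]
        for col in columns:
            q_col = quote_ident(col)
            rows, nulls, distinct, low, high = conn.execute(
                f"SELECT COUNT(*), SUM({q_col} IS NULL), COUNT(DISTINCT {q_col}), "
                f"MIN({q_col}), MAX({q_col}) FROM {q_obj}"
            ).fetchone()
            profile.append({
                "object": obj,
                "column": col,
                "rows": rows,
                "nulls": nulls or 0,      # SUM over zero rows returns NULL
                "distinct": distinct,     # COUNT(DISTINCT ...) ignores NULLs
                "min": low,
                "max": high,
            })
    return profile


def unexpected_nulls(profile: List[Dict[str, object]],
                     required: Set[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Compare results with an expectation: list required columns that contain NULLs."""
    return [
        (p["object"], p["column"])
        for p in profile
        if (p["object"], p["column"]) in required and p["nulls"] > 0
    ]


if __name__ == "__main__":
    demo = sqlite3.connect(":memory:")
    demo.executescript(
        """
        CREATE TABLE customers (customer_id TEXT, city TEXT);
        INSERT INTO customers VALUES ('C001', 'Boston'), ('C002', NULL), ('C002', 'Salem');
        CREATE VIEW boston_customers AS SELECT * FROM customers WHERE city = 'Boston';
        """
    )
    result = profile_database(demo)
    for entry in result:
        print(entry)
    print(unexpected_nulls(result, {("customers", "city")}))
```

On this small `customers` table, where one customer_id appears twice and one city is missing, the profile reports 2 distinct customer_id values across 3 rows (a possible duplicate key) and 1 null city: exactly the kind of data surprise that surfaces late when profiling is hand coded or skipped.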
