• Embed Doc
  • Readcast
  • Collections
  • CommentGo Back
 
 
Solving the Source Data ProblemWith Automated Data Profiling
Presented by:Harte-Hanks Trillium Software
 
2
White Paper:Solving the Source Data Problem with Automated Data Profiling
The Data Integration and Migration Dilemma
Data ProfilingKeeps DataIntegration andMigration Projectson the Fast Track 
Companies often view data integration and/or migration as an unavoidableand rather unexciting step in building the right environment for a newenterprise application. But these projects are often far more complicated—and expensive—than companies anticipate. They may result in creatingadditional data problems and costing far more than was originallybudgeted.Why does this happen? Industry analysts agree that companies often makemistakes in initial assumptions, both about the integrity of the source dataand what an ETL (Extract, Transform, Load) tool can do about it.
Companies are finding that they can streamline data integration rojects and save large amounts of time and expense by using an advanced data profiling tool during the preliminary stage of their projects.
Costly assumptions 
The reason companies assume the data is fine is simple and easy tounderstand: The data works pretty well in the original source systems.Therefore, the assumption goes, it should work in target systems. Data insource systems, however, may serve one specific purpose while themigrated data is supposed to serve another or multiple purposes, withdifferent data quality requirements. Thus it is important to assess whetherthe data to be migrated is fit for the new purpose it will be used for.Furthermore, source systems themselves often contain special routines forcorrecting their own data. These routines have been written over time tosolve specific data anomalies in particular applications, but they are lostduring integration or migration. As a result, data may not load properly,compounding inaccuracies, time and cost overruns, and, in extreme cases,late-stage project cancellations. Understanding the data within sourcesystems, including all its inconsistencies and anomalies, is thus a criticalstep in any integration project.
Large investments forenterprise applicationinitiatives regularly fail toyield the expected resultsbecause of lack of attention to data qualityissues and correspondinglow adoption rates.” 
 Andreas Bitterer, Gartner Another source of additional data problems arises from the fact that datamigration projects nearly always involve combining data from more thanone system. Data records that may work relatively well within a singlesystem, when combined with other sets of equally questionable quality,produce massively inconsistent, cryptic results, based on divergentpurposes, and consequently different data structures, definitions, and setsof content.
ETL tools: What’s possible, what’s not 
ETL tools offer great connectivity into disparate applications and they makeit easy to map information from one system to another and handlemetadata, but they fall short when it comes to many tasks, such aschecking for continuity, profiling the integrity of the information, andidentifying misfielded data, anomalies, and other inconsistencies. This typeof analysis and intelligent transformation is crucial, researchers point out,but it is often overlooked, largely because organizations assume that thesource data does not need to be fixed.
 ©Copyright 2007 TRILLIUM SOFTWARE www.trilliumsoftware.comUS: (978) 436-8900 trlinfo@trilliumsoftware.comEMEA: +44 (0) 118 940 7666trillium.uk@trilliumsoftware.com 
 
3
White Paper:Solving the Source Data Problem with Automated Data Profiling
 Data contained in legacy applications may not fit precisely the intended purposes for a newenterprise application. To avoid costly rework after data is integrated, advanced data profiling tools uncover problems— inconsistencies, inaccuracies, and incomplete data fields, rows, or columns—early on, so that they can be addressed prior to migration and inconjunction with ETL conversions.
 
Common Techniques for Analyzing Your Data
Data profiling tools canreduce the time it takes toanalyze data by 90% ormore over manualmethods.
Companies can basically use one of three options when they want toanalyze source data before integrating it or migrating it to a new system:
Review the data manually
Write SQL scripts to profile the data
Use a data profiling tool
 An analysis of these options shows their effectiveness for large-scaleintegration and migration projects that enterprises need today.
Manual review of data 
The old-fashioned way of solving data quality problems with source datawas to hire a team of database workers to manually analyze the data.However, especially with today’s large databases, this can takes years,several people, and a big budget. What’s more, data changes constantly.Customers get married, move, buy new services or products, divorce,move again, and so on. Product information can be equally volatile,frequently making 30% or more of product information inconsistent orincorrect. A long-term manual process can never keep pace with suchchange and thus can never completely rectify the data.
 ©Copyright 2007 TRILLIUM SOFTWARE www.trilliumsoftware.comUS: (978) 436-8900 trlinfo@trilliumsoftware.comEMEA: +44 (0) 118 940 7666trillium.uk@trilliumsoftware.com 
of 00

Leave a Comment

You must be to leave a comment.
Submit
Characters: ...
You must be to leave a comment.
Submit
Characters: ...