
Data Quality

Standardize, Validate and Improve


Your Information Assets
Prepared by:
Larry English, Information Impact International &
the DataFlux Corporation
Data Quality:
Standardize, Validate and Improve Your Information Assets

Executive summary

The importance of high quality information
    Defining and addressing the problem

Eliminating defects: The elements of data correction
    Plan and prioritize a data correction initiative
    Parse data into atomic components
    Standardize, correct and normalize data
    Verify and validate data accuracy

How to solve information quality problems: Process improvement for defect prevention
    Overview
    Define a quality improvement initiative
    “Plan” improvement after discovering root cause(s)
    “Do” implement the improvement(s) in a controlled manner
    “Check” to verify the effectiveness of the improvement
    “Act” to put the process into control and roll it out
    Using information quality technology for data defect prevention

Getting started
    The growing role of data management

Figures
Figure 1. The PDCA Process Improvement Cycle
Figure 2. Cause-and-Effect Diagram of a "Duplicate Customer Record" Problem


Executive summary
Current data quality problems cost U.S. businesses more than $600 billion
per year.1

The most successful manufacturers have rigorous and defined quality assurance and
improvement programs that help them build better products faster and cheaper. These
techniques eliminate or minimize defects during the production and maintenance of their
products and help them meet their customers’ expectations.

But companies typically have not taken the same approach to improving the quality of their data
production and maintenance. The quality of information that serves as the cornerstone of virtually
every critical business process—customer intelligence, billing, accounting, inventory
management, product development, marketing, sales, logistics—is typically an “unknown” in most
organizations. This leaves both IT and business staffs to wonder: Is the data reliable? Can we
use this data to make informed, rational decisions? More importantly, do we know how much
nonquality information costs the enterprise when it causes processes to fail and alienates
customers?

The costs of failure can expand beyond lost customers. One insurance company had a number of
disparate systems, each with a different source of customer information, claims, policies and so
forth. To meet a state regulatory measure aimed to safeguard sensitive customer information, the
company attempted to move legacy data to a new mainframe system and combine this data into
one primary business application. Because the data was inconsistent and inaccurate across the
existing systems, loading the data into the new application was an arduous and costly process;
some pieces of the project were over 18 months behind schedule due to delays attributed to poor
data quality. Without a solution to analyze and correct data quality problems, the company faced
severe fines for non-compliance as well as immense cost overruns.

This is just one example of the business impact of bad data. To assure that you have high quality
information that can drive good, solid business processes, you must take a two-pronged
approach. These steps take different approaches to building reliable business information, but
each has the same goal: creating better, more usable data.

1. Implement a data correction process and technology solution that helps you parse,
standardize, verify and correct key pieces of data. With this functionality in place, you can
cleanse data in batch to make sure that high quality data reaches your business-critical
systems and enables your knowledge workers to take the right actions and make the
right decisions.
2. Implement a process improvement and technology solution that transforms the culture
and improves business processes to eliminate or significantly reduce defective data from
being produced in the first place. You can use the PDCA (Plan, Do, Check, Act)
approach to create better processes for creating, updating, managing and exploiting
information for competitive advantage.

The combination of these techniques ensures that you have not only the right technology to fix
problems, but also the ability to refine and, if necessary, reinvent your organization to make
information quality a priority. This paper covers techniques for implementing the processes and
technology necessary to correct and standardize information, and it teaches you a process for
improving business processes so that they consistently produce accurate and reliable information.

1
“Data Quality and the Bottom Line: Achieving Business Success through a Commitment to High Quality
Data.” The Data Warehousing Institute. Report Series 2002.

The importance of high quality information
During the 1990s, companies began to implement a wide variety of business applications, each
with a different focus, constituency and acronym: CRM, ERP, SCM. The rationale for these
implementations was that each system could improve business processes by aggregating and
controlling data specific to a certain business function. For example, a company would enact a
customer relationship management (CRM) system to build a cohesive repository of data on
customers and prospects to assist the sales and marketing groups.

By the beginning of the next decade, a typical corporation had a variety of different systems
across the enterprise. This spread of information led to a number of problems, including:

• Large “silos” of information: After years of compiling information in separate systems,
companies had a disjointed network of data that was not defined consistently, not connected
and, most likely, filled with inaccurate, missing, duplicate or extraneous information.
• Unmanaged growth: The data within these systems had grown haphazardly, without a
pre-determined plan for assuring that the information was consistent, accurate and
reliable.

So, when companies tried to integrate systems across boundaries, they found their “integrated”
processes and systems often failed. They realized that these unique sources had different data
formats, conventions and business rules. And, each database contained errors—null values,
missing fields, inconsistent entries—caused by poor data quality processes coupled with deficient
procedures and training for collecting and maintaining quality information.

The result? Companies typically have problems with the quality of information that serves as the
very foundation of their primary business applications. Inaccurate or inconsistent data can hinder
your company’s ability to understand its current—and future—business problems. This leads to
poor decisions that can cause a host of negative results, including lost profits, operational delays,
customer dissatisfaction, internal systems failures, inaccurate business forecasts, ineffective
sales and marketing efforts and much more.

And as companies look to implement tools to help make sense of their data, they find poor data
quality is a major obstacle in these efforts. Gartner Research estimates that through 2005, “more
than 50 percent of business intelligence and customer relationship management deployments will
suffer limited acceptance, if not outright failure, due to lack of attention to data quality issues.” 2
And, as companies continually add data from acquisitions, mergers, suppliers and so forth, the
problems are compounded over time.

What is the real cost of poor data quality? The direct costs of poor quality information have been
measured at 10 to more than 20 percent of an organization’s operating revenue (or budget in
not-for-profit entities). Moreover, organizations can waste 40 to 50 percent of their IT budget on
“information scrap and rework” in the form of reruns; fixes to requirements, design and coding
defects; and unnecessary redundant data handling, interfacing and data correction.3

Poor quality information, however, creates opportunity losses that can be even greater than the
direct costs. When you alienate customers by misspelling their name, mailing information to the
wrong address or sending customers the wrong items, you risk the loss of a customer’s lifetime
value. Poor quality information causes business intelligence and trend analysis to fail, causing
you to sub-optimize marketing campaigns, product development and other opportunity
development processes. This prevents you from exploiting opportunities.

2
Ted Friedman, “A Strategic Approach to Improving Data Quality,” Gartner. June 19, 2002.
3
Larry English, Improving Data Warehouse and Business Information Quality (IDW&BIQ), NY: John Wiley &
Sons, 1999, p. 12.

Defining and addressing the problem
Before addressing specific remedies for data quality, you must define what data quality is. Data
quality is about more than just what is in the databases. It is about the complete set of
interactions of people and the information they need to perform their work effectively and
efficiently, whether creating data or applying it. Data quality is “consistently meeting all
knowledge worker and end-customer expectations through information and information services
to accomplish knowledge worker objectives and customer objectives.”4 Information has value
only to the extent it is applied to perform work. Poor quality information causes processes to fail
and hinders those who must perform the business processes when they must stop to hunt for it,
verify it, correct it on the spot, perform workarounds, or attempt to keep from losing an unhappy
customer.

You must next understand the types of data you hold and how their nonquality can impact your organization.
Customer names, shipping addresses, inventory quantities, part numbers—all of this is critical
business information. If the information is accurate, complete, timely, unique, consistent and
valid, you have high quality information that can serve as the foundation for effective business
strategies.

This sounds simple, but achieving and maintaining useful and valuable information is rare.
Accuracy suffers each time the relevant information on a record is left incomplete or
critical data isn’t verified. Data becomes dangerous when records aren’t kept up to date. The
value of your data declines each time a duplicate record enters your database. And every
variation or contradiction entered while capturing data is a blow to data consistency and validity.

To build better business information, the most reliable method is to take a two-part, parallel
attack.

1. Standardize, correct and verify information at its source, giving you a base of quality
information that can be trusted.
2. Improve the process(es) that cause the defects to enable the processes to be trusted.

Eliminating defects: The elements of data correction


By implementing data quality improvement processes and using effective technology, it’s possible
to transform defective data into consistent, accurate and reliable information assets: the kind of
assets you can use to make more informed business decisions, serve customers better, reduce
operating costs, increase profit and solidify your strategic position.

But what types of capabilities do you need in a data correction technology to help you address
the problems that you currently have, and serve as a safeguard to prevent problems in the
future? This section outlines the processes and technology that you can use to improve the
quality of your current business information:

• Plan and prioritize a data correction initiative


• Parse data into atomic components
• Standardize, correct and normalize the data
• Verify and validate data accuracy

4
Larry English, “The Ten Essentials of Information Quality Management,” DM Review, September 2002, p.
37.

Plan and prioritize a data correction initiative
The first phase of a new program to increase the quality of your information assets is to define
what processes are needed: (1) to correct data errors, and (2) to improve processes to prevent
recurrence of those data errors. Then, you can build a plan for addressing the most critical issues
based on costs and risks.

The easiest way to start this phase is through a comprehensive data profiling process as part of
information quality assessment to identify where the problems are. Briefly, data profiling
encompasses such activities as frequency and basic statistic reports, table relationships, phrase
and element analysis and business rule discovery. This analysis can help you understand the
structure and interdependencies of your information and can be used to pinpoint anomalies
where further assessment is needed. (For more information on data profiling, see the DataFlux
white paper entitled, Data Profiling: The Foundation for Data Management).
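
As a simple illustration of what pattern-based profiling can reveal, the following sketch (written in Python, with hypothetical sample data and field names) counts the character patterns found in a set of phone numbers and reports how many conform to the standard (999) 999-9999 format.

    import re
    from collections import Counter

    def pattern_of(value: str) -> str:
        """Reduce a value to its character pattern: digits become 9, letters become A."""
        return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

    # Hypothetical sample of phone numbers pulled from a customer table.
    phones = ["(919) 447-3000", "919-447-3000", "9194473000", "447-3000"]

    patterns = Counter(pattern_of(p) for p in phones if p)
    total = sum(patterns.values())

    print("Pattern frequency report:")
    for pattern, count in patterns.most_common():
        print(f"  {pattern!r}: {count} ({count / total:.0%})")

    # Share of values already in the standard (999) 999-9999 format.
    standard = patterns.get("(999) 999-9999", 0)
    print(f"Conforming to standard format: {standard / total:.0%}")

A report like this makes it easy to quantify, for example, how many telephone numbers fail the ten-digit standard before you commit to a telemarketing campaign.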

To prioritize your data correction activities, ask the following questions:


• Where should you begin? Go to the business stakeholders and find out the most
troubling problems caused by poor quality information. Imagine your company is
preparing a new telemarketing campaign to reach customers by phone. But you find that
85% of the telephone numbers in your customer database do not follow a standard ten-
digit pattern for US and Canadian phone numbers. This will be an important item for you
during the correction phase.
• Based on the results of profiling, which department or business function should lead the
effort to decide how to standardize and correct the data? If the analysis shows that a
high degree of your financial data is “out of limits,” you may assign the accounting and
administrative departments several specific tasks within the quality effort to address
these issues.
• Which quality efforts will mean the most to your business? That depends on your
business processes and the cost of process failure caused by the nonquality information.
If your company does millions of online transactions with distributors and suppliers every
year, product and inventory data would be high priority. If your company sends millions of
catalogs to customers, then accurate name and address data would be a high priority.

Parse data into atomic components


After analyzing the data, the next step in data correction is parsing, or breaking a string of
characters into its constituent parts to facilitate correction and duplicate matching. Parsing
rules can be based on the type of data, the clues found within the data itself or a library of
common data patterns. Typically, data quality technology includes pre-built vocabularies,
grammars and a host of other modifiable expression files that can help you efficiently and
correctly parse data.

Parsing is a critical element of any information quality initiative because it enables improved data
matching. Parsing allows you to break free-form text strings into more usable parts. Therefore, if
you can correctly identify several pieces of a text string—such as the street number, street name
and street type—the chances of matching that string to other strings that are similar increases
dramatically. For later steps, having data broken into more manageable components increases
the reliability of correction techniques.

For instance, a full customer address may be stored in a single field in a table. If you are able to
identify component parts of that full address, such as which state or ZIP codes are available, you
can use those parts by themselves or in conjunction with other data to discover new information
about a particular record. Or, you could then determine that a ZIP code is invalid because it does
not match up correctly with the state, city or street information in the same address.

Finally, parsing is an important part of the verification process, which will be discussed in detail in
a later section. If you can accurately parse a full name so that you can identify the given name,
you may then be able to determine or verify whether the individual in your database is male or
female. For example, consider the following text field containing a customer name.

Input String                         Parsed String Elements

Herr Johann Wolfgang von Goethe      Name Prefix:  Herr
                                     Given Name:   Johann Wolfgang
                                     Family Name:  von Goethe

Pre-built registries understand the German conventions for prefix, given name and family name and
parse the data into the correct fields. Parsing isn’t confined to name and address data, either.
Another example would be a record for a vehicle in a fleet of trucks:

Input String                         Parsed String Elements

1999 Chevrolet Silverado 3500        Year:         1999
                                     Make:         Chevrolet
                                     Model:        Silverado
                                     Model Style:  3500

With parsed records, when you need to conduct other data correction techniques, you can now
match on different parts of the record, such as “Chevrolet” alone, or “Chevrolet” and “Silverado” together. This
gives you more latitude to find a matching entry if you are looking for duplicate or similar records.
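
To make the idea concrete, here is a minimal, rule-based parsing sketch in Python for the vehicle example above. A commercial data quality tool would rely on much richer vocabularies, grammars and pattern libraries; the single regular expression and field names here are illustrative assumptions only.

    import re

    VEHICLE_PATTERN = re.compile(
        r"(?P<year>\d{4})\s+(?P<make>\S+)\s+(?P<model>\S+)(?:\s+(?P<style>.+))?$"
    )

    def parse_vehicle(text: str) -> dict:
        """Split a free-form vehicle description into atomic components.

        Assumes the common 'Year Make Model [Style]' ordering.
        """
        match = VEHICLE_PATTERN.match(text.strip())
        return match.groupdict() if match else {"unparsed": text}

    print(parse_vehicle("1999 Chevrolet Silverado 3500"))
    # {'year': '1999', 'make': 'Chevrolet', 'model': 'Silverado', 'style': '3500'}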

Standardize, correct and normalize data


Most data sources that have not been assessed for data quality show multiple permutations of
data and other anomalies. Data standardization, correction and normalization help you address
and correct these instances, creating a uniform nomenclature for common records. For instance,
ACME Manufacturing Corporation may be represented in the same data source as Acme Mftg
Corp, ACME, and ACME Manufacturing. This becomes more complex when a company has a
legal long name (ACME Manufacturing Corporation), a commonly used short name (GE for
General Electric Company) and/or “doing-business-as” names.

With most data quality technologies, you initiate a standardization scheme to change data to a
standardized format. Once completed, you can get an accurate picture about the size of your total
business relationship with ACME Manufacturing Corporation, because all permutations have now
been standardized on one naming convention.

Not only can you manually create these schemes independent of your data, you can also allow
the technology to automatically build these schemes for you based on your own dataset. You can
join and modify schemes at will, and use these schemes to virtually and physically change data to
a desired format. Standardization schemes are also useful for individual phrases or elements. For
example, consider a scheme for the permutations of the word “Incorporated.”

Data Permutation        Standardized Data

Inc.                    Inc.
Incorporated            Inc.
Incroporated            Inc.
Inc                     Inc.

In this example, a standardization scheme will change company entries such as “Smithfield
Foods Incorporated” into a standardized “Smithfield Foods Inc.” The scheme can operate on data
elements independently to clean up noise words by modification or elimination (schemes can also
define which words or phrases should be deleted altogether), or to simply standardize commonly
occurring elements like “Street” to “St” or “Junior” to “Jr.”
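
In code, a standardization scheme can be as simple as a lookup table applied element by element. The sketch below mirrors the “Incorporated” scheme above; the entries and function name are illustrative, not the mechanics of any particular product.

    # Illustrative standardization scheme: data permutation -> standardized form.
    SCHEME = {
        "incorporated": "Inc.",
        "incroporated": "Inc.",  # common keying error
        "inc": "Inc.",
        "inc.": "Inc.",
    }

    def standardize(name: str) -> str:
        """Replace each word found in the scheme with its standardized form."""
        return " ".join(SCHEME.get(word.lower(), word) for word in name.split())

    print(standardize("Smithfield Foods Incorporated"))  # Smithfield Foods Inc.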

Similarly, normalization techniques help you create valid patterns of data across tables and
columns. Some pieces of data, such as phone numbers, product codes or Social Security
Numbers, may have a common pattern such as (999) 999-9999, where “9” is a valid numeric
character. Pattern standardization can take information in non-standard formats, such as
9999999999 or 999.999.9999, and turn it into a standard telephone number format.
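
A minimal sketch of pattern standardization for telephone numbers, assuming North American ten-digit numbers and flagging anything else for review:

    import re
    from typing import Optional

    def normalize_phone(raw: str) -> Optional[str]:
        """Reduce a phone value to its digits and re-emit it as (999) 999-9999.

        Returns None when the value does not contain exactly ten digits,
        flagging it for review rather than guessing.
        """
        digits = re.sub(r"\D", "", raw)
        if len(digits) != 10:
            return None
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"

    for value in ["9194473000", "919.447.3000", "447-3000"]:
        print(value, "->", normalize_phone(value))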

Verify and validate data accuracy


Verification and validation activities identify data that appears to be correct or standardized on the
surface but is actually invalid compared to other data either within or outside of a database.
Addresses with ZIP codes are a prime example. An address field in a customer database can
have a valid ZIP code value and format (it contains 5 digits), but the ZIP code itself may be
incorrect for the address.

Companies typically use these routines to verify that a mailing address is correct. In the example
below, the input string looks like a valid, actionable address. However, an address verification
routine shows that the street name is actually “Weston Parkway,” and that the ZIP code is 27513,
not 27503. To accomplish this, you connect to USPS data sources to verify the ZIP code, and
add the ZIP+4 designation when practical.

Input String            Validated Data

4001 Weston Park        4001 Weston Parkway
Suite 300               Suite 300
Cary                    Cary
NC                      North Carolina
27503                   27513-2311

For marketing departments, address verification is an invaluable tool that allows you to develop
better addresses. And by verifying the quality of these addresses, you can even receive discounts
on postage for bulk mailings because your addresses conform to established postal standards.

As another example, after a standardization process, you might determine that your company has
10 distinct product lines according to the data values. But after verifying these records, you find
that three of those product lines have been discontinued but are still appearing as new values in
your data. The data might be correct from one point of view, but a thorough verification of the
data—based on comparisons to other information that you know is accurate and up-to-date—may
reveal the data to be erroneous.
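
As a simplified illustration of this kind of cross-field validation, the sketch below checks whether a ZIP code is plausible for its state. The prefix table is a tiny, hypothetical stand-in for the USPS reference data a real verification routine would consult.

    # Hypothetical, abbreviated reference data: leading ZIP digits by state.
    ZIP_PREFIXES_BY_STATE = {
        "NC": ("27", "28"),
        "NY": ("10", "11", "12", "13", "14"),
    }

    def zip_matches_state(zip_code: str, state: str) -> bool:
        """Return True when the ZIP code's prefix is plausible for the state."""
        prefixes = ZIP_PREFIXES_BY_STATE.get(state.upper(), ())
        return zip_code[:2] in prefixes

    print(zip_matches_state("27513", "NC"))  # True: plausible for North Carolina
    print(zip_matches_state("90210", "NC"))  # False: flag the record for review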

How to solve information quality problems: Process
improvement for defect prevention
Finding missing data, correcting inaccurate information or consolidating and eliminating duplicate
records is only one step of a sound information quality environment. Fixing defective data attacks
the symptoms of the problem. To maximize the money and time you spend on data correction,
you should treat a data correction initiative as a one-time activity for a given dataset and couple it
with improvement of the processes causing the defects. To solve the real problem, one
must find and eliminate the root cause(s) that produced the defective data, not just the
precipitating cause. This requires a process improvement method.

There are two broad categories of precipitating cause. First, data is created incorrectly for some
reason. Second, data “goes bad” because a characteristic of a real world object (such as marital
status or the relationship of a person to an address) changes, but there is no process that
captures that updated fact. This is called information quality decay.

Walter Shewhart gave us a process improvement method that has been a standard in virtually
every valid quality management system. This method, generally referred to as PDCA or Plan-Do-
Check-Act5, is also called the Shewhart Cycle, or the Deming Cycle in Japan where W. Edwards
Deming taught it to Japanese companies. Six Sigma has a variation of this method, called DMAIC
or Define-Measure-Analyze-Improve-Control. They are basically identical in purpose.

Overview
Simply stated, to improve a process, you must Plan an improvement by first analyzing the root
cause(s) of a type of data defect, such as duplicate customer records or incorrectly spelled names. Only
then can you define improvements that can prevent or reduce recurrence of the defects
effectively. Then, Do implement the improvement(s) in a controlled way so you can assure the
effectiveness of the improvement. Next, Check the improvement to assure it achieved the
improvement goals. Finally, Act to put the process in control to hold the gains and roll the
improved process out to the rest of the enterprise.

The Total Information Quality Management (TIQM®) methodology describes the steps of PDCA
as applied to information processes.6 Figure 1 represents the process steps and their
interdependencies.

5
See Masaaki Imai, Kaizen: The Key to Japan's Competitive Success, NY: McGraw-Hill, 1986, pp. 60f.
6
Op. cit., English, IDW&BIQ, pp. 285-310.

[Figure: TIQM® Methodology, Process P5 “Improve Information Process Quality.” Steps S5.1 Define Project for Information Quality Improvement, S5.2 Develop Plan for Information Quality Improvement and S5.3 Do Implement Quality Improvements feed the Plan and Do quadrants of the Shewhart Cycle; S5.4 Check Impact of Implemented Information Quality Improvements and S5.5 Act to Standardize Information Quality Improvements complete the Check and Act quadrants. © Information Impact]

Figure 1. The PDCA Process Improvement Cycle7

Define a quality improvement initiative


In this step, you establish a project for improvement using your procedures for establishing
projects, large or small. Make it easy for a small “SWAT” team to quickly address small projects
that may have a big payoff. Prioritize candidate projects, beginning with high-payoff, high-visibility
opportunities, and select a process for improvement.

Then, you can identify the process improvement team members, specifically including the people
who actually perform the work of the process being improved. This Quality Improvement Team
includes information producers and any data entry personnel who perform the work, along with
their immediate information consumers and the technical staff who know the applications and
databases involved.

“Plan” improvement after discovering root cause(s)


The next step is to plan for process improvement. There are two required components of an
information quality plan.

1. Conduct a Root-Cause Analysis


Use a Cause-and-Effect diagram, also called an Ishikawa Diagram or fishbone diagram, as a tool
to capture possible causes of the nonquality information. Figure 2 shows an example diagram
that illustrates some typical causes of duplicate customer records.

7
English, IDW&BIQ, p. 290.

[Figure: Cause-and-effect (fishbone) diagram for the effect “Duplicate customer created.” Cause categories: Environment, Application/Database Technology, Data Source, Measurement, Process/Procedures and Information Producer. Example causes include slow response time, a faulty customer look-up algorithm, a database reloaded with duplicates, customers who do not mention a previous order, move or name change, privacy concerns, quota pressure, no step to ask for all information needed to detect a duplicate, conflicting selection procedures, and lack of knowledge of customer look-up procedures. © Information Impact]

Figure 2. Cause-and-Effect Diagram of a "Duplicate Customer Record" Problem8

Begin analyzing root causes by establishing a positive environment of trust in a non-blame, non-
judgmental atmosphere. Then define the “effect” or the quality problem clearly to avoid scope
creep.

Brainstorm possible causes, involving everyone in the cause identification. A key technique is
called “Why analysis,” a method of inquiry that keeps asking, “Why?” to get from the precipitating
cause to the root cause.

2. Define improvement(s) to prevent recurrence


After you have discovered and understood root causes, you define improvements to prevent
recurrence of the defects. These are essentially error-proofing techniques that prevent errors in
the future.

The key to this step is to assure you define the right improvement to solve the root problem, not
merely the symptom. There are many techniques for error-proofing and controlling processes.
They include edits and validation built into the applications that originally capture the data,
intuitive forms design with clear definitions and clear procedures to assure the quality of data
captured manually, and refinement of training and performance measures, among many
others.9 Often several improvement techniques will be applied to a specific problem.
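
One of the error-proofing techniques named above, edits and validation in the application of original data capture, might look like the minimal sketch below. The rules, field names and accepted values are assumptions for illustration only.

    def validate_customer_entry(record: dict) -> list:
        """Check a new customer record before it is written; return any error messages."""
        errors = []
        if not record.get("name", "").strip():
            errors.append("Customer name is required.")
        zip_code = record.get("zip", "")
        if len(zip_code) != 5 or not zip_code.isdigit():
            errors.append("ZIP code must be exactly five digits.")
        if record.get("state", "").upper() not in {"NC", "NY", "CA"}:  # illustrative list
            errors.append("State code is not recognized.")
        return errors

    print(validate_customer_entry({"name": "ACME Manufacturing", "zip": "2751", "state": "NC"}))
    # ['ZIP code must be exactly five digits.']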

“Do” implement the improvement(s) in a controlled manner


In this step you implement the defined improvements in a controlled environment so you can
assure they solve the problem. Application and database changes, of course, are implemented in
a “test” environment.

8
Larry English, Information Quality Improvement seminar, 2003 Ed. Brentwood, TN: Information Impact
International, 1993-2004, p. 8.16.
9
Op cit. See English, IDW&BIQ, pp. 302-309 for a listing of best practices for improving and error-proofing
processes and applications.

Other improvements, like forms design, procedure enhancement and training may be
implemented in a small part of the business, so that the effects of the improvements can be
evaluated in the real work environment.

“Check” to verify the effectiveness of the improvement


This step assesses the data to assure the improvement worked successfully without introducing
new problems. Select a random sample of records produced by the “improved” process.
This requires a means of identifying the data produced in the controlled environment. Assess the
data for completeness, accuracy and other quality requirements and compare that to the quality
target.
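
As a minimal sketch, assuming completeness of a required field is one of the quality targets and that you can identify the records produced by the improved process, the check might be computed like this:

    def completeness_rate(records: list, field: str) -> float:
        """Share of sampled records where the field is present and non-blank."""
        if not records:
            return 0.0
        filled = sum(1 for record in records if str(record.get(field, "")).strip())
        return filled / len(records)

    # Hypothetical random sample drawn from records created after the improvement.
    sample = [{"phone": "(919) 447-3000"}, {"phone": ""}, {"phone": "(212) 555-0137"}]

    rate = completeness_rate(sample, "phone")
    target = 0.95  # illustrative quality target
    print(f"Completeness: {rate:.0%} (target {target:.0%}) ->",
          "target met" if rate >= target else "re-examine root causes")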

Different kinds of problems require different types of assessment. For example, quality problems
that involve timeliness require you to measure the degree of “information float,” the time from
when the data is known until it is “knowable” to the information consumers.

If the improvement did not solve the problem, conduct a root-cause analysis to determine why.
Then re-implement or re-plan the improvement as necessary.

“Act” to put the process into control and roll it out


This step puts the process into repeatable control to hold the gain and rolls the improved process
out to all of the enterprise.

Be sure to measure the costs of nonquality before and after the improvement(s) to document the
value (ROI) of the improvement initiative.10

Process improvement is essential for an information quality management function to be effective.
As you improve processes to prevent defects, you prevent new problems from occurring that
would otherwise require unnecessary data cleansing.

Using information quality technology for data defect prevention


To help safeguard your systems from incorrect or unreliable data, companies can deploy
technology to perform real-time data management routines that correct, standardize and verify
information at the point of origin. Typically, this software provides pre-built processes to ensure
the quality of the information entering your systems from business applications, Web sites and
other entry points.

Data quality technology running in real time should use a client/server communication method
that infuses high-level matching technology across multiple applications without requiring custom
communication protocols. This processing method allows you to support an “always available”
data management engine that simply waits for client data processing requests. When assessing
real-time data management technology, look for a solution that includes:

• Comprehensive data quality functionality that detects and resolves data problems before
they enter your information systems.
• Adaptability to run on a host of platforms, including Solaris, UNIX, Windows and Linux, and
to be easily implemented in common development languages such as C, Java, Perl and
COM/ASP. This gives you the flexibility to implement data management routines
throughout the enterprise.
• Flexibility to satisfy any business requirement including Web site integration, custom
applications, CRM systems, ERP systems, and other enterprise information systems.
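
The sketch below illustrates the request/response pattern such an “always available” engine implies: a client sends a record to a running data quality service and acts on the verdict before committing the data. The endpoint, payload and response fields are hypothetical assumptions, not the API of any particular product.

    import json
    import urllib.request

    def verify_address(record: dict, service_url: str = "http://localhost:8080/verify") -> dict:
        """Send one record to a (hypothetical) data quality service and return its verdict."""
        request = urllib.request.Request(
            service_url,
            data=json.dumps(record).encode("utf-8"),
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(request, timeout=5) as response:
            return json.load(response)

    # Called from a web form or order-entry screen before the record is committed:
    # result = verify_address({"street": "4001 Weston Park", "city": "Cary", "state": "NC"})
    # if not result.get("valid"):
    #     ...prompt the user with result.get("suggestions", []) before accepting the entry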

10
Ibid. See Chapter 7, "Measuring Nonquality Information Costs," pp. 199-235 for a process defining how to
quantify both the direct and opportunity costs of nonquality information.

Conclusion
Customer names and addresses. Inventory information. Supplier catalogs. Product lists.
Consumer buying information. Purchased marketing lists. All told, companies have millions—or
billions—of pieces of data. Some data sources are updated frequently, while others contain
information that has not been touched in years. Some data entry is performed by trained staff (typically
internal staff such as customer relations or product support). Other data entry comes directly
through e-business applications from customers or partners—organizations that do not know your
data definitions, structures and standard procedures.

Therefore, creating, maintaining and improving the quality of your business information is a
never-ending core process for success in the Information Age, as the addition of new data causes
your information sources to become outdated over time. Faced with this dilemma, what should
companies do to overcome their data problems? By using a combination of quality management
processes and information technology, you can start to build higher quality information that can
help you know more about your current business landscape. Two of the processes, covered in
this paper, are:

1. Implement a data correction process and technology solution that helps you parse,
standardize, verify and correct key pieces of data. With this functionality in place, you
can cleanse data in batch to make sure that high quality data reaches your business-
critical systems and enables your knowledge workers to take the right actions and make
the right decisions.
2. Implement a process improvement and technology solution that transforms the culture
and improves business processes to eliminate or significantly reduce defective data from
being produced in the first place. You can use the PDCA (Plan, Do, Check, Act)
approach to create better processes for creating, updating, managing and exploiting
information for competitive advantage.

Next to your people, your information is the most important resource for competitive advantage
in the Information Age. As John Naisbitt correctly identified:

“In the new information society, that key resource has shifted [from capital] to information,
knowledge, creativity. And there is only one place where the corporation can mine this valuable
new resource—in its employees. That means a whole new emphasis on human resources.”11

For companies that rely on information to contact customers, communicate with trading partners
or share information across the enterprise, data is a critical component of success or failure. Take
steps to ensure the quality of your business information—and become a more competitive
company.

11
John Naisbitt and P. Aburdene, Re-inventing the Corporation, NY: Warner Books, 1985, p. 5.

Getting started
A pioneer in data management since 1997, DataFlux is a market leader in providing
comprehensive, end-to-end data management solutions. DataFlux products are designed to
significantly improve the consistency, accuracy and reliability of an organization’s business-critical
data, enhancing the effectiveness of both customer- and product-specific information.

DataFlux frequently works with clients to implement data management solutions within a
framework of overall process improvement. By integrating data quality technology into a
methodology such as “Plan-Do-Check-Act,” DataFlux allows companies to elevate the role of
data correction, standardization and verification into an effective data quality environment. As the
importance of data quality moves from the IT room to the board room, the ability to map into—and
work within—established, corporate-wide process improvement becomes a critical success
factor.

The growing role of data management

The process of data management begins with a discovery or data profiling phase that asks one
critical question: What points of data collection might have relevant, useful information for your
data-based applications and initiatives? Once you begin to understand your data, you can correct
errors and improve the processes that create it, so you can exploit quality information to build
more effective CRM, ERP, data warehousing and other applications.

DataFlux provides a total solution for your data management needs, which encompasses four
building blocks:

• Data Profiling – Discover and analyze data discrepancies


• Data Quality – Reconcile and correct data and improve the processes that create it
• Data Integration – Integrate and link data across disparate sources
• Data Augmentation – Enhance information using internal or external data sources

DataFlux's GUI-based product, dfPower® Studio, brings industrial-strength data management
capabilities to both business analysts and IT staff. dfPower Studio is completely customizable,
easy to implement, intuitive and usable by any department in your organization. With dfPower
Studio, you can identify and fix data inconsistencies, match and integrate items within and across
data sources and identify and correct duplicate data. dfPower Studio also provides data
augmentation functionalities that allow you to append existing data with information from other
data sources, including geographic or demographic information.

For real-time data quality capabilities, dfIntelliServer™ provides developers with a client/server
architecture designed for on-demand data quality. dfIntelliServer allows
for real-time data validation and correction to improve your front-end processes as data is entered
into Web pages or other applications. Working independently or together, dfPower Studio and
dfIntelliServer ensure a comprehensive data management environment, allowing for better
business decisions and improved data-driven initiatives.

DataFlux is a wholly-owned subsidiary of SAS, the market leader in providing business
intelligence software and services that create true enterprise intelligence.

DataFlux and all other DataFlux Corporation product or service names are registered trademarks or trademarks of, or licensed to, DataFlux
Corporation in the USA and other countries. ® indicates USA registration. Copyright © 2004 DataFlux Corporation, Cary, NC, USA. All
Rights Reserved.

TIQM is a registered trademark of Information Impact International.


02/04
