
Introduction to Data Quality

-Ashu Mehta
Database systems
Data quality
• Data Quality is a key factor in Business Intelligence
success.
• There is a common saying in analytics circles:
“Garbage in, garbage out.”
• This refers to data quality.
– It means that if you produce something from
poor-quality materials, the thing you produce will
also be of poor quality.
• If your data is poor, then the reporting and the decisions
made from those reports will be equally poor.
Data Quality Dimensions
• The six dimensions of data quality are:
– Completeness
– Conformity
– Accuracy/Correctness
– Timeliness
– Consistency
– Integrity
DIMENSIONS OF DATA QUALITY
• Completeness
– Are critical data values missing? A database with
missing data values is not unusual, but when the
missing information is critical, completeness
becomes an issue.
– If a customer’s first name and last name are
mandatory but the job title is optional, a record can
be considered complete even if a job title is not
available.
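
To make this concrete, a minimal completeness check might look like the sketch below (using pandas; the table, column names, and the choice of mandatory fields are assumptions for illustration):

import pandas as pd

# Hypothetical customer records: first_name and last_name are mandatory,
# job_title is optional, so only the last row fails the completeness check.
customers = pd.DataFrame({
    "first_name": ["Ada", "Grace", None],
    "last_name":  ["Lovelace", "Hopper", "Smith"],
    "job_title":  ["Analyst", None, "Engineer"],
})

mandatory = ["first_name", "last_name"]
incomplete = customers[customers[mandatory].isna().any(axis=1)]
print(f"{len(incomplete)} incomplete record(s) out of {len(customers)}")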
• Conformity
– Is the data following standard data definitions?
For example, are dates in a standard format?
– Maintaining conformity to standard formats is
important for keeping structure and nomenclature
consistent for sharing and internal data
management.
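
A conformity check can be as simple as testing whether every value parses in the agreed format. A rough pandas sketch, assuming YYYY-MM-DD is the standard date format and the data is hypothetical:

import pandas as pd

# Hypothetical order dates; the assumed standard format is YYYY-MM-DD.
orders = pd.DataFrame({"order_date": ["2024-01-15", "15/01/2024", "2024-02-30"]})

# Rows that do not parse as valid YYYY-MM-DD dates violate the conformity rule.
parsed = pd.to_datetime(orders["order_date"], format="%Y-%m-%d", errors="coerce")
print(orders[parsed.isna()])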
• Accuracy
– Is the data accurate to the “real-world” values expected?
– Incorrect spellings, misplaced decimals, and untimely
or out-of-date data can lead to inaccurate analysis.
– If the sales from a customer are not the true sales or the
email address of a contact is misspelled, the data is not
accurate.

Are your data values correct?
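
Accuracy is hard to verify without a trusted reference, but obvious problems such as malformed email addresses can be caught syntactically. A small pandas sketch with made-up contact data:

import pandas as pd

# Hypothetical contact data. A syntactic email check catches obvious problems
# such as a missing "@"; true accuracy (does the address really belong to this
# contact, are the sales figures the true sales?) needs a trusted reference
# source to compare against.
contacts = pd.DataFrame({
    "name":  ["Ada", "Grace"],
    "email": ["ada@example.com", "grace.example.com"],
})

bad_email = contacts[~contacts["email"].str.match(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")]
print(bad_email)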


• Timeliness
– Is the data available when expected and needed?
– Timeliness depends on the user’s expectations and
needs.
– If tracking information is delayed or a customer’s
purchasing information is not updated in real time,
timeliness could be an issue.
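
A simple timeliness check compares the last refresh time against an agreed freshness window. A plain-Python sketch, assuming a 24-hour expectation and invented timestamps:

from datetime import datetime, timedelta, timezone

# Hypothetical freshness check: the data is considered timely if it was
# refreshed within an assumed 24-hour expectation window.
last_refreshed = datetime(2024, 3, 1, 8, 0, tzinfo=timezone.utc)   # e.g. from a load log
checked_at     = datetime(2024, 3, 2, 12, 0, tzinfo=timezone.utc)

max_age = timedelta(hours=24)
age = checked_at - last_refreshed
print("timely" if age <= max_age else f"stale: {age - max_age} past the expectation")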
• Consistency
– Does the data across several systems reflect the
same information?
– If data is reported across multiple systems, it
should have the same information.
– If one database reports a customer’s account as
active, while another reports the account as
closed, the data set is not consistent.
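
A basic consistency check joins the same entity from two systems and flags any disagreement. A pandas sketch with hypothetical CRM and billing extracts:

import pandas as pd

# Hypothetical extracts from two systems; the same customer should carry
# the same account status in both.
crm     = pd.DataFrame({"customer_id": [1, 2], "status": ["active", "closed"]})
billing = pd.DataFrame({"customer_id": [1, 2], "status": ["active", "active"]})

merged = crm.merge(billing, on="customer_id", suffixes=("_crm", "_billing"))
print(merged[merged["status_crm"] != merged["status_billing"]])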
• Integrity
– Is the data valid across the relationships and can
all the data in a database be traced and
connected?
– For example, in a customer database there
should be a valid customer/sales relationship. If
there is a sales record without a corresponding
customer, that record is invalid and is called an
orphaned record. The inability to link related
records may also introduce duplication across
your systems.
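
Orphaned records can be detected with a referential-integrity check such as the pandas sketch below (the tables and keys are hypothetical):

import pandas as pd

# Hypothetical tables: every sale should reference an existing customer.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
sales     = pd.DataFrame({"sale_id": [10, 11, 12], "customer_id": [1, 2, 99]})

# A left join with an indicator exposes sales rows that have no matching
# customer (orphaned records), i.e. a referential-integrity violation.
joined  = sales.merge(customers, on="customer_id", how="left", indicator=True)
orphans = joined[joined["_merge"] == "left_only"]
print(orphans[["sale_id", "customer_id"]])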
DATA QUALITY IS CONTEXT DEPENDENT
HOW CAN DATA QUALITY BE ASSESSED?
EXAMPLE
TQDM SOLUTIONS
Benefits of data quality
• Accurate analytics
– Get granular control to profile, standardize, measure, and
monitor data quality so you can trust your analytics results.
• More efficient operations
– Ensure end-to-end data quality with automated
integrations between Informatica Data Quality, Axon Data
Governance, and Enterprise Data Catalog.
• Improved customer experience
– Ensure your CX apps and MDM tools will always run off the
most accurate and timely data, with a shared repository of
high-quality data.
Informatica data quality tools
• IDQ has been a front runner in the Data
Quality (DQ) tools market.
• IDQ has two variants:
– Informatica Analyst
– Informatica Developer
• Informatica Analyst:
– Intended for business users to view profiling
reports.
– It is a web-based tool that business analysts and
developers can use to analyze, profile, cleanse,
standardize, and scorecard data across an
enterprise.
• Informatica Developer:
– Intended for developers to develop data quality
solutions.
– It is a client-based tool where developers can
create mappings to implement data quality
transformations or services.
– This tool offers an editor where objects can be
built with a wide range of data quality
transformations, such as Parser, Standardizer,
Address Validator, Match-Merge, etc.
• The source data is sent to the Informatica Developer
tool (IDQ) for cleansing and standardization, which
ensures the quality of the data.
• The cleansed data is then sent to Informatica
PowerCenter, which loads it from the source to the
target tables.
• The data in the target tables is sent to Informatica
MDM (Master Data Management) to create the golden
record (single view) of the master data.
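
The sketch below is a conceptual, tool-agnostic illustration of that cleanse, load, and consolidate flow in pandas; it is not Informatica code, and the table and column names are invented for illustration:

import pandas as pd

# Conceptual sketch of the flow above (NOT Informatica code). It mimics the
# three roles: cleanse/standardize the source (the IDQ step), load it to a
# target (the PowerCenter step), and consolidate duplicates into one golden
# record per customer (the MDM step).
source = pd.DataFrame({
    "customer_id": [1, 1, 2],
    "name":  [" ada lovelace ", "Ada Lovelace", "grace hopper"],
    "email": ["ADA@EXAMPLE.COM", None, "grace@example.com"],
})

# Step 1: cleansing and standardization (trim whitespace, fix case).
cleansed = source.assign(
    name=source["name"].str.strip().str.title(),
    email=source["email"].str.lower(),
)

# Step 2: load into the target table (kept trivial here).
target = cleansed.copy()

# Step 3: consolidate to a single (golden) record per customer.
golden = target.groupby("customer_id", as_index=False).agg(
    name=("name", "first"),
    email=("email", lambda s: s.dropna().iloc[0] if s.notna().any() else None),
)
print(golden)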
