
Virtual University of Pakistan

Data Warehousing
Lecture-21
Introduction to Data Quality Management (DQM)

Ahsan Abdullah
Assoc. Prof. & Head
Center for Agro-Informatics Research
www.nu.edu.pk/cairindex.asp
National University of Computers & Emerging Sciences, Islamabad
Email: ahsan101@yahoo.com
1
DWH-Ahsan Abdullah

What is Quality? Informally
Some things are better than others, i.e. they are of higher quality. How much “better” is better?

Is the right item the best item to purchase? How about after the purchase?

What is quality of service? The bank example.

What is Quality? Formally

“Quality is conformance to requirements”
P. Crosby, “Quality is Free”, 1979

“Degree of excellence”
Webster’s Third New International Dictionary

What is Quality? Examples from Auto Industry

Quality means meeting customers’ needs, not necessarily exceeding them.

Quality means improving things customers care about, because that makes their lives easier and more comfortable.

Why an example from the auto industry?

What is Data Quality?

What is Data?

[Figure: a person, Muhammad Khan, described by data values: Height = 5’8”, Weight = 160 lbs, Emp_ID = 440, Gender = Male, Age = 35 yrs]

All data is an abstraction of something real.

What is Data Quality?

Intrinsic Data Quality
Electronic reproduction of reality.

Realistic Data Quality
Degree of utility or value of data to business.

Data Quality & Organizations

The intelligent learning organization:
High-quality data is an open, shared resource with value-adding processes.

The dysfunctional learning organization:
Low-quality data is a proprietary resource with cost-adding processes.


Orr’s Laws of Data Quality
Law #1 - “Data that is not used cannot be correct!”

Law #2 - “Data quality is a function of its use, not its collection!”

Law #3 - “Data quality will be no better than its most stringent use!”

Law #4 - “Data quality problems increase with the age of the system!”

Law #5 - “The less likely something is to occur, the more traumatic it will be when it happens!”

Total Quality Management (TQM)
A philosophy of involving everyone in systematic and continuous improvement.

It is customer oriented. Why?

TQM incorporates the concepts of product quality, process control, quality assurance, and quality improvement.

Quality assurance is NOT quality improvement.

Co$t of fixing data quality

[Figure: cost of achieving quality plotted against quality level, from lowest quality to highest quality; the cost rises exponentially.]

• Defect minimization is economical.
• Defect elimination is very, very expensive.


Co$t of Data Quality Defects

• Controllable Costs
  - Recurring costs for analyzing, correcting, and preventing data errors.

• Resultant Costs
  - Internal and external failure costs, and the cost of business opportunities missed.

• Equipment & Training Costs

Where is data quality critical?
• Almost everywhere. Some examples:
  - Marketing communications.
  - Customer matching.
  - Retail house-holding.
  - Combining MIS systems after an acquisition.


Characteristics or Dimensions of Data Quality

Data Quality Characteristic    Definition
Accuracy                       Qualitatively assessing lack of error, high accuracy corresponding to small error.
Completeness                   The degree to which values are present in the attributes that require them.
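
The completeness definition above can be turned into a simple ratio. The following is a minimal sketch, not a method taken from the lecture: the sample records, the choice of `None` as the marker for a missing value, and the function name `completeness` are all assumptions made for illustration.

```python
# Hypothetical employee records; None marks a missing value (an assumption).
records = [
    {"emp_id": 440, "gender": "Male", "age": 35},
    {"emp_id": 441, "gender": None, "age": 29},
    {"emp_id": 442, "gender": "Female", "age": None},
    {"emp_id": 443, "gender": "Male", "age": 41},
]


def completeness(rows, required):
    """Fraction of required attribute slots that actually hold a value."""
    total = len(rows) * len(required)
    if total == 0:
        return 1.0  # nothing is required, so nothing is missing
    present = sum(
        1 for row in rows for attr in required if row.get(attr) is not None
    )
    return present / total


# Completeness over all three attributes, and over the age attribute alone.
overall = completeness(records, ["emp_id", "gender", "age"])
age_only = completeness(records, ["age"])
print(f"overall: {overall:.2%}, age only: {age_only:.2%}")
```

Measuring per attribute as well as overall is a design choice: a single overall figure can hide one badly populated attribute that a particular application depends on.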

Completeness Vs Accuracy

95% accurate and 100% complete
OR
100% accurate and 95% complete

Which is better?

It depends on (i) the data quality tolerances, (ii) the corresponding application, and (iii) the cost of achieving that data quality vs. (iv) the business value.
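
To make the trade-off concrete, the two options can be worked through for a hypothetical table of 1,000 values. This is only an illustrative sketch: the table size, the function name `summarize`, and the reading of accuracy as "the fraction of present values that are correct" are assumptions, not definitions from the lecture.

```python
def summarize(total, completeness, accuracy):
    """Split `total` values into missing, wrong, and correct counts.

    Assumption: accuracy is measured only over the values that are present.
    """
    present = round(total * completeness)
    correct = round(present * accuracy)
    return {
        "missing": total - present,
        "wrong": present - correct,
        "correct": correct,
    }


# Option A: 95% accurate and 100% complete.
option_a = summarize(1000, completeness=1.00, accuracy=0.95)
# Option B: 100% accurate and 95% complete.
option_b = summarize(1000, completeness=0.95, accuracy=1.00)

print("A:", option_a)
print("B:", option_b)
```

Under these assumptions both options yield the same number of correct values; they differ in whether the remaining values are wrong but present, or simply absent. Which failure is cheaper to live with is exactly what the tolerances, the application, and the cost versus business value decide.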

Characteristics or Dimensions of Data Quality
Data Quality Characteristic    Definition
Consistency                    A measure of the degree to which a set of data satisfies a set of constraints.
Timeliness                     A measure of how current or up to date the data is.
Uniqueness                     The state of being only one of its kind or being without an equal or parallel.
Interpretability               The extent to which data is in appropriate languages, symbols, and units, and the definitions are clear.
Accessibility                  The extent to which data is available, or easily and quickly retrievable.
Objectivity                    The extent to which data is unbiased, unprejudiced, and impartial.
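
Some of these dimensions lend themselves to simple measurements. The sketch below shows one possible way to score consistency, uniqueness, and timeliness as ratios. Everything in it is an assumption for illustration: the sample rows, the age-range constraint, the 365-day freshness threshold, and the function names do not come from the lecture.

```python
from datetime import date

# Hypothetical rows: one has an impossible age, two share an emp_id,
# and one has not been updated for years.
rows = [
    {"emp_id": 440, "age": 35, "updated": date(2024, 1, 10)},
    {"emp_id": 441, "age": -4, "updated": date(2023, 6, 1)},
    {"emp_id": 440, "age": 52, "updated": date(2024, 1, 12)},
    {"emp_id": 443, "age": 41, "updated": date(2021, 3, 5)},
]


def consistency(rows, constraints):
    """Fraction of rows that satisfy every constraint."""
    if not rows:
        return 1.0
    ok = sum(1 for r in rows if all(check(r) for check in constraints))
    return ok / len(rows)


def uniqueness(rows, key):
    """Number of distinct key values divided by the number of rows."""
    if not rows:
        return 1.0
    return len({r[key] for r in rows}) / len(rows)


def timeliness(rows, as_of, max_age_days):
    """Fraction of rows updated within `max_age_days` of `as_of`."""
    if not rows:
        return 1.0
    fresh = sum(1 for r in rows if (as_of - r["updated"]).days <= max_age_days)
    return fresh / len(rows)


# Assumed constraint: a plausible human age.
constraints = [lambda r: 0 <= r["age"] <= 120]

consistency_score = consistency(rows, constraints)
uniqueness_score = uniqueness(rows, "emp_id")
timeliness_score = timeliness(rows, as_of=date(2024, 1, 31), max_age_days=365)
print(consistency_score, uniqueness_score, timeliness_score)
```

Interpretability, accessibility, and objectivity are harder to reduce to a ratio like this and usually need human judgment.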
