Understanding Data Quality

Understanding of data handling
• Read this passage.
• How many processes have you noticed?
• What are the processes involved?
• How is data handled in each process?

The first stage in data analysis is the preparation of an
appropriate form in which the relevant data can be
collected and coded in a format suitable for entry into
a computer; this stage is referred to as data
processing. The second stage is to review the
recorded data, checking for accuracy, consistency
and completeness; this process is often referred to as
data editing. Next, the investigator summarizes the
data in a concise form to allow subsequent analysis—
this is generally done by presenting the distribution of
the observations according to key characteristics in
tables, graphs and summary measures. This stage is
known as data reduction. Only after data processing,
editing and reduction should more elaborate statistical
manipulation of the data be pursued.
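
The three stages in the passage can be sketched as a small pipeline. This is only an illustration: the records, the coding scheme in `SEX_CODES`, the plausible age range, and all function names are assumptions made for the example, not part of the passage.

```python
from collections import Counter
from statistics import mean

# Hypothetical raw records, as they might be transcribed from paper forms.
RAW = [
    {"id": "1", "sex": "F", "age": "34"},
    {"id": "2", "sex": "M", "age": ""},      # missing age
    {"id": "3", "sex": "f", "age": "290"},   # implausible age
    {"id": "4", "sex": "M", "age": "51"},
]

SEX_CODES = {"F": 1, "M": 2}  # assumed coding scheme


def process(record):
    """Data processing: code raw answers into a computer-friendly format."""
    sex = SEX_CODES.get(record["sex"].strip().upper())
    age = int(record["age"]) if record["age"].strip().isdigit() else None
    return {"id": int(record["id"]), "sex": sex, "age": age}


def edit(records):
    """Data editing: check completeness and plausibility of coded records."""
    clean, problems = [], []
    for r in records:
        if r["sex"] is None or r["age"] is None:
            problems.append((r["id"], "incomplete"))
        elif not 0 <= r["age"] <= 120:  # assumed plausible range
            problems.append((r["id"], "implausible age"))
        else:
            clean.append(r)
    return clean, problems


def reduce_data(records):
    """Data reduction: summarize the distribution of key characteristics."""
    return {
        "n": len(records),
        "by_sex": dict(Counter(r["sex"] for r in records)),
        "mean_age": mean(r["age"] for r in records),
    }


coded = [process(r) for r in RAW]
clean, problems = edit(coded)
summary = reduce_data(clean)
```

Only the records that pass editing reach the reduction step, which mirrors the passage's point that elaborate analysis should wait until processing, editing and reduction are done.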

Data handling
• is the process of ensuring that data is
stored, archived or disposed of in a safe
and secure manner during and after
completion of any program/project. This
includes the development of policies and
procedures to manage data handled
electronically as well as through
non-electronic means.

Proper planning for data handling can result in
• efficient and economical storage,
• retrieval, and
• disposal of data.

• In the case of data handled
electronically, data integrity is a primary
concern to ensure that recorded data is
not altered, erased, lost or accessed by
unauthorized users.
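
One common way to detect whether recorded data has been altered is to store a checksum when the data is recorded and compare it later. The sketch below assumes SHA-256 from Python's standard `hashlib` module; the sample data and variable names are made up for illustration.

```python
import hashlib


def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical recorded data; the digest would be stored separately.
original = b"student_id,grade\n1001,A\n1002,B\n"
recorded_digest = checksum(original)

# Later: compare stored copies against the recorded digest.
unchanged = b"student_id,grade\n1001,A\n1002,B\n"
tampered = b"student_id,grade\n1001,A\n1002,A\n"

is_intact = checksum(unchanged) == recorded_digest
is_tampered = checksum(tampered) != recorded_digest
```

A checksum only reveals alteration after the fact. Preventing loss or unauthorized access still depends on backups and access controls.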

Issues that should be considered in ensuring the integrity
of data handled include the following:

• Type of data handled and its impact.
• Type of media containing data and its storage
capacity, handling and storage requirements, reliability,
longevity, retrieval effectiveness, and ease of upgrade
to newer media.
• Data handling responsibilities/privileges, that is, who
can handle which portion of data, at what point during
the program/project, for what purpose, etc.
• Data handling procedures that describe how long
the data should be kept, and when, how, and by whom
data should be handled for storage, sharing, archival,
retrieval and disposal purposes.
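
Handling privileges and retention rules like those listed above can be written down as an explicit policy that software can check. The following is a minimal sketch; the `HandlingPolicy` class, the roles, the actions, and the five-year retention period are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class HandlingPolicy:
    retention_days: int   # how long the data should be kept
    privileges: dict      # role -> set of allowed actions


# Hypothetical policy: roles, actions and retention period are illustrative.
POLICY = HandlingPolicy(
    retention_days=365 * 5,
    privileges={
        "data_manager": {"store", "share", "archive", "retrieve", "dispose"},
        "analyst": {"retrieve"},
    },
)


def may_handle(policy, role, action):
    """Who can do what: unknown roles get no privileges."""
    return action in policy.privileges.get(role, set())


def due_for_disposal(policy, collected_on, today):
    """True once the retention period has elapsed."""
    return today > collected_on + timedelta(days=policy.retention_days)
```

For example, `may_handle(POLICY, "analyst", "dispose")` is false under this policy, while a data manager may dispose of data once `due_for_disposal` reports that the retention period has passed.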

Data quality dimensions in the literature
• include accuracy, reliability, importance,
consistency, precision, timeliness,
understandability, conciseness and usefulness
• Wand and Wang (1996, p. 92)

• Kahn et al. (1997) developed a data
quality framework based on product and
service quality theory, in the context of
delivering quality information to
information consumers.

• Four levels of information quality were defined:
  - sound information,
  - useful information,
  - usable information, and
  - effective information.
• The framework was used to define a
process model to help organisations plan
to improve data quality.

A more formal approach to data quality is
provided in the framework of Wand and
Wang (1996), who use Bunge’s ontology to
define data quality dimensions.
• They formally define five intrinsic data
quality problems: incomplete,
meaningless, ambiguous, redundant,
incorrect.

Semiotic Theory
• Semiotic theory concerns the use of
symbols to convey knowledge. Stamper
(1992) defines six levels for analysing
symbols. These are the physical,
empirical, syntactic, semantic, pragmatic
and social levels.

Data quality can be considered at each of these levels:
• Physical - concerned with the physical media
used to communicate data
• Empirical -
• Syntactic - concerned with the structure of
data
• Semantic - concerned with the meaning of
data
• Pragmatic - concerned with the usage of data
(usability and usefulness)
• Social - concerned with the shared
understanding of the meaning of the
data/information generated from the data

Data Quality: How good is your data?
This is an example of data quality as perceived
by a company that produces GPS
• Scale
  - ratio of distance on a map to the equivalent distance on the earth's surface
  - primarily an output issue: at what scale do I wish to display?
• Precision or Resolution
  - the exactness of measurement or description
  - determined by input; can output at lower (but not higher) resolution
• Accuracy
  - the degree of correspondence between data and the real world
  - fundamentally controlled by the quality of the input
• Lineage
  - the original sources for the data and the processing steps it has undergone
• Currency
  - the degree to which data represents the world at the present moment in time
• Documentation or Metadata
  - data about data: recording all of the above
• Standards
  - common or “agreed-to” ways of doing things
  - data built to standards is more valuable since it’s more easily shareable
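
Three of these dimensions (scale, precision and currency) reduce to simple arithmetic, sketched below. The function names and the sample values are assumptions chosen for illustration, not taken from any GIS product.

```python
from datetime import date


def ground_distance_m(map_distance_cm, scale_denominator):
    """Scale: convert a map distance to metres on the ground.

    A scale of 1:50,000 has scale_denominator = 50_000.
    """
    return map_distance_cm * scale_denominator / 100  # 100 cm per metre


def coarsen(value, decimals):
    """Precision: output at a lower resolution than the input.

    The reverse is not possible: rounding cannot add detail that
    was never measured.
    """
    return round(value, decimals)


def age_in_days(last_updated, today):
    """Currency: how long ago the data last matched the real world."""
    return (today - last_updated).days


distance = ground_distance_m(2, 50_000)   # 2 cm on a 1:50,000 map
latitude = coarsen(3.14159, 2)
staleness = age_in_days(date(2024, 1, 1), date(2024, 1, 31))
```

Lineage, documentation and standards are descriptive rather than numeric, so they would typically be kept as metadata alongside values like these.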

DISCUSSIONS
Discuss strategies for ensuring data quality
for each category listed in the table, at each
of the levels given, in the context of
educational settings or institutions.
Semiotic Level | Goal                            | Dimension                                       | Improvement Strategy
Syntactic      | Consistent                      | Well-defined (perhaps formal) syntax            |
Semantic       | Complete and Accurate           | Comprehensive, Unambiguous, Meaningful, Correct |
Pragmatic      | Usable and Useful               | Timely, Concise, Easily Accessed, Reputable     |
Social         | Shared understanding of meaning | Understood, Awareness of Bias                   |

Semiotic Level | Goal                            | Dimension                                       | Improvement Strategy
Syntactic      | Consistent                      | Well-defined (perhaps formal) syntax            | Corporate data model, Syntax checking, Training for data producers
Semantic       | Complete and Accurate           | Comprehensive, Unambiguous, Meaningful, Correct | Training for data producers, Minimise data transformations and transcriptions
Pragmatic      | Usable and Useful               | Timely, Concise, Easily Accessed, Reputable     | Monitoring data consumers, Explanation and visualisation, High quality data delivery systems, Data tagging
Social         | Shared understanding of meaning | Understood, Awareness of Bias                   | Viewpoint analysis, Conflict resolution, Cultural Immersion

4 Common Data Challenges Faced During Modernization:
1. Data is fragmented across multiple
source systems - Each system holds its
own notion of the policyholder. This
makes developing a unified
customer-centric view extremely difficult.
The situation is further complicated because
the level and amount of detail captured
differs from system to system.
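
To make the fragmentation problem concrete, the sketch below merges records from two source systems into one record per customer. Everything here is hypothetical: the system names, the field names, and above all the assumption that both systems share a reliable `customer_id`, which in practice is often the hardest part.

```python
# Hypothetical extracts from two source systems; field names are illustrative.
policy_system = [
    {"customer_id": "C1", "name": "A. Rahman", "policy_no": "P-100"},
    {"customer_id": "C2", "name": "L. Tan", "policy_no": "P-200"},
]
claims_system = [
    {"customer_id": "C1", "claims": 2, "phone": "555-0101"},
    {"customer_id": "C3", "claims": 1, "phone": "555-0103"},
]


def unified_view(*systems):
    """Merge records from each system into one record per customer_id.

    When two systems supply the same field, the later system wins.
    """
    merged = {}
    for records in systems:
        for record in records:
            merged.setdefault(record["customer_id"], {}).update(record)
    return merged


customers = unified_view(policy_system, claims_system)
```

The result shows the uneven level of detail directly: customer C1 has both policy and claims fields, C2 has no claims information, and C3 has no name or policy number.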
2. Data formats across systems are inconsistent
- This arises when an organization operates systems
from multiple vendors, each of which has
chosen to implement a custom data
representation. Responding to evolving
business needs has then diluted the
meaning and usage of data fields:
the same field represents different data,
depending on the context.
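
A typical first step against inconsistent formats is to translate each vendor's representation into one agreed form. The sketch below does this for a date field; the vendor names and their date formats are assumptions invented for the example.

```python
from datetime import datetime

# Assumed per-vendor formats for the same logical field.
VENDOR_DATE_FORMATS = {
    "vendor_a": "%d/%m/%Y",   # e.g. 03/04/2021 (day first)
    "vendor_b": "%Y%m%d",     # e.g. 20210403
}


def normalize_date(vendor, raw):
    """Parse a vendor-specific date string into ISO 8601 (YYYY-MM-DD)."""
    fmt = VENDOR_DATE_FORMATS[vendor]
    return datetime.strptime(raw, fmt).date().isoformat()


a = normalize_date("vendor_a", "03/04/2021")
b = normalize_date("vendor_b", "20210403")
```

Both calls yield the same ISO date. Format differences like this are the easy case. Where the same field carries different meanings depending on context, a mapping of this kind needs the context as an extra input.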
3. Data is lacking in quality - This arises when an
organization has units that are organized
by line of function. Each unit holds
expertise in a specific field and operates
fairly autonomously. This has resulted in
different practices when it comes to
data entry. In addition, the data models from
decades-old systems weren’t designed
to handle today's business needs.
4. Systems are only available in defined
windows during the day, not 24/7 - This arises if the
organization's core systems are batch
oriented: updates are not available in the
system until batch processing has completed.
Furthermore, while the batch processing is
taking place, the systems are unavailable both
for querying and for accepting data. Another
aspect affecting availability is the closed
nature of the systems: they do not expose
functionality for reuse by other systems.

Lack of Centralized Approach Hurting Data Quality
“Data quality is the foundation for any data-
driven effort, but the quality of information
globally is poor. Organizations need to
centralize their approach to data
management to ensure information can be
accurately collected and effectively utilized
in today’s cross-channel environment.”
Thomas Schutz, senior vice president, general manager of
Experian Data Quality