
Introduction

Data quality is a measure of the condition of data based on factors such as accuracy,
completeness, consistency, reliability and whether it is up to date. Data quality
indicates how reliable a given dataset is, and it affects the user's ability to make
accurate decisions about the subject of their study. For example, if data are collected
from incongruous sources at varying times, they may not actually function as a good
basis for planning and decision-making. High-quality data are collected and analyzed
using a strict set of guidelines that ensure consistency and accuracy, whereas
lower-quality data often fail to track all of the relevant variables or carry a high
degree of error.

Data Quality Components

The quality of spatial data depends on factors such as spatial accuracy, attribute
accuracy, temporal accuracy, logical consistency, and lineage. Attending to these
factors helps ensure that the data we use are of high quality and supports better
decision-making.

Spatial Accuracy:
Spatial accuracy refers to quantifying the error in the location of data. The data
should represent features with good spatial accuracy: locational information should
be correct in both the horizontal and vertical directions. For example, if we are
representing a house, both its horizontal position and its elevation should be correct.
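
As a minimal illustration of quantifying spatial accuracy, the sketch below compares a surveyed point against a higher-accuracy reference point. The coordinates and the helper name are hypothetical, used only to show the idea:

```python
import math

def positional_error(surveyed, reference):
    """Horizontal and vertical error between a surveyed point and a
    higher-accuracy reference point, both given as (x, y, z) tuples."""
    dx = surveyed[0] - reference[0]
    dy = surveyed[1] - reference[1]
    horizontal = math.hypot(dx, dy)             # planimetric (horizontal) error
    vertical = abs(surveyed[2] - reference[2])  # elevation (vertical) error
    return horizontal, vertical

# Hypothetical house corner: surveyed vs. reference coordinates (metres)
h, v = positional_error((345.12, 210.48, 98.7), (345.00, 210.50, 98.5))
print(f"horizontal error: {h:.2f} m, vertical error: {v:.2f} m")
```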

Attribute Accuracy:
The non-spatial data linked to a location may also be inaccurate or imprecise.
Inaccuracies may result from mistakes of many sorts. Precise attribute information
describes phenomena in great detail. For example, a precise description of a person
living at a particular address might include gender, age, income, occupation, level
of education, and many other characteristics. Data should therefore be accurate in
their non-spatial aspects as well.

Temporal Accuracy:
Temporal accuracy refers to data that have a time component. The data should be
up to date, should represent features as they exist in time and space, and should be
revised periodically.

Logical Consistency:
Logical consistency requires that the data be topologically correct: polygons should
be closed, there should be no overshoots or undershoots, represented features should
be topologically correct, and no data should be missing.
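
As an illustrative sketch, topological correctness can be checked programmatically. The example below uses the shapely library (an assumed choice, not something this text prescribes) to flag a self-intersecting polygon:

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity

# A self-intersecting "bowtie" polygon: a classic logical-consistency error
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2), (0, 0)])

if not bowtie.is_valid:
    # explain_validity reports the kind and location of the problem
    print(explain_validity(bowtie))  # e.g. "Self-intersection[1 1]"
```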

Lineage:
Data lineage is the journey data takes from its creation through its transformations
over time. It describes a dataset's origin, movement, characteristics and quality,
and it provides the visibility needed to trace errors in a data analytics process
back to their root cause.

Accuracy Assessment
Accuracy assessment is a feedback system for checking and evaluating the
objectives and the results; it determines the correctness of the results.

Accuracy and Precision
“Accuracy” is the agreement of an experimental value, or the average of
several determinations of the value, with an accepted or theoretical (“true”) value
for a quantity. Accuracy is usually expressed either as a percent difference or in a
unit of measurement and can be positive or negative; the sign shows whether the
experimental value is higher or lower than the actual or theoretical value. Accuracy
is the degree of perfection achieved.

“Precision” refers to the closeness of agreement between test results. In other words,
precision is the agreement among several determinations of the same quantity. The
better the precision, the smaller the spread among the values, showing that the
results are highly reproducible. High precision is achieved only with high-quality
instruments and careful work. Precision is the degree of perfection used in methods.
Figure 1: Accuracy vs. Precision
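
To make the distinction concrete, here is a minimal sketch with hypothetical repeated measurements: the signed mean error captures accuracy, while the standard deviation among the determinations captures precision.

```python
import statistics

true_value = 100.0  # accepted ("true") value of the quantity
# Hypothetical repeated determinations of the same quantity
measurements = [100.4, 100.6, 100.5, 100.3, 100.5]

mean = statistics.mean(measurements)
accuracy_error = mean - true_value           # signed: + means too high
percent_difference = 100 * accuracy_error / true_value
precision = statistics.stdev(measurements)   # spread among determinations

print(f"mean = {mean:.2f}, error = {accuracy_error:+.2f} "
      f"({percent_difference:+.2f}%), precision (std dev) = {precision:.2f}")
# Here the results are precise (small spread) but not very accurate (biased high).
```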

Error Propagation
The accuracy of GIS results cannot naively be based on the quality of the graphical
output alone. The data stored in a GIS have been collected in the field, have been
classified, generalized, interpreted or estimated intuitively, and in all these cases
errors are introduced. Errors also derive from measurement errors, from spatial and
temporal variation, and from mistakes in data entry. Consequently, errors are
propagated or even amplified by GIS operations.
A Geographic Information System (GIS) is a system designed to capture, store,
manipulate, analyze, manage, and present spatial data. GIS applications are tools
that allow users to create interactive queries (user-created searches), analyze spatial
information, edit data, and present the results of all these operations. GIS can refer
to a number of different technologies, processes, techniques and methods. During
the capture, storage, manipulation, analysis, management, and presentation of data,
errors may be introduced knowingly or unknowingly. At each step those errors are
modified and amplified, because each step takes the output of the previous step as
its input. Errors introduced at one stage are thus propagated to the next operation.

For example, when we calculate the bearing of a line, the errors from the previous
line's bearing and from the included angle are carried into the newly calculated
bearing. That bearing, in turn, carries its accumulated error into the next
calculation. In GIS, many such operations are chained together to obtain the
required results.

Consider the preparation of a topographical map of a certain area. Errors will be
introduced at every stage, from data collection to final map delivery: while
collecting raw data in the field, while entering and storing those data in a database
(where new errors may be introduced, increasing the gross error), and during the
operations and methods used to produce the map. In this way, errors are propagated
or even amplified by GIS operations.
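
To make the bearing example concrete, here is a minimal sketch under the standard assumption of independent errors: when each bearing is obtained by adding a measured angle to the previous bearing, the variances add, so the standard error grows with every leg. The sigma values below are hypothetical.

```python
import math

def propagate_bearing_sigma(sigma_initial, angle_sigmas):
    """Standard error of successive bearings in a traverse, assuming each
    bearing is the previous bearing plus a measured angle and that all
    errors are independent: sigma_n^2 = sigma_(n-1)^2 + sigma_angle^2."""
    sigma = sigma_initial
    history = [sigma]
    for s in angle_sigmas:
        sigma = math.hypot(sigma, s)  # sqrt(sigma^2 + s^2)
        history.append(sigma)
    return history

# Hypothetical traverse: initial bearing known to 5", each angle to 10"
print(propagate_bearing_sigma(5.0, [10.0] * 4))
# [5.0, 11.18..., 15.0, 18.02..., 20.61...] -> uncertainty grows at each leg
```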
Such error propagation should be minimized as far as possible, and care should be
taken while performing operations in GIS. Since errors can be introduced even at
the data-entry stage, standard procedures and methods should be applied when
performing GIS tasks, and due consideration should be given to the quality of the data.

RMSE during accuracy assessment (user's accuracy and producer's accuracy)
Root Mean Square Error (RMSE) measures how much error there is between two
data sets. In other words, it compares a predicted value and an observed or known
value. The smaller an RMSE value, the closer predicted and observed values are.

RMSE is defined as:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

where:

• Σ is the summation symbol, meaning “sum”
• $y_i$ is the observed value for the ith observation in the dataset
• $\hat{y}_i$ is the predicted value for the ith observation in the dataset
• n is the sample size

The root mean square error is also sometimes called the root mean square deviation,
which is often abbreviated as RMSD.
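
As a quick sketch of the formula, the snippet below computes RMSE from paired observed and predicted values; the checkpoint elevations are hypothetical:

```python
import math

def rmse(observed, predicted):
    """Root mean square error between paired observed and predicted values."""
    assert len(observed) == len(predicted)
    squared_errors = [(y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))

# Hypothetical checkpoint elevations: observed (GPS) vs. predicted (DEM), metres
observed = [102.4, 98.1, 110.3, 105.0]
predicted = [101.9, 98.8, 109.5, 105.6]
print(f"RMSE = {rmse(observed, predicted):.3f} m")
```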
There are many different ways to look at the thematic accuracy of results. The error
matrix allows you to calculate the following accuracy metrics:
• Overall Accuracy and Error
• Errors of omission
• Errors of commission
• User’s accuracy
• Producer’s accuracy
• Accuracy statistics (e.g., Kappa)

Overall accuracy ultimately provides the map user and producer with basic accuracy
information.

Producer’s Accuracy
Producer's accuracy is the map accuracy from the point of view of the map maker
(the producer). It indicates how often real features on the ground are correctly
shown on the classified map, or the probability that a certain land cover of an area
on the ground is classified as such. Producer's accuracy is the complement of the
omission error: Producer's Accuracy = 100% − Omission Error.

User’s Accuracy
User's accuracy is the accuracy from the point of view of a map user, not the
map maker. It tells us how often the class shown on the map is actually present
on the ground; this is also referred to as reliability. User's accuracy is the
complement of the commission error: User's Accuracy = 100% − Commission Error.
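
To tie these metrics together, here is a minimal sketch that computes overall, user's, and producer's accuracy from a hypothetical two-class error matrix (rows are the classified map, columns are the reference data):

```python
# Hypothetical 2-class error matrix: rows = classified map, columns = reference
#                  reference: forest   reference: water
error_matrix = [
    [45, 5],    # classified as forest
    [10, 40],   # classified as water
]
classes = ["forest", "water"]

total = sum(sum(row) for row in error_matrix)
correct = sum(error_matrix[i][i] for i in range(len(classes)))
print(f"overall accuracy = {100 * correct / total:.1f}%")  # 85.0%

for i, name in enumerate(classes):
    row_total = sum(error_matrix[i])                  # everything mapped as `name`
    col_total = sum(row[i] for row in error_matrix)   # everything truly `name`
    users = 100 * error_matrix[i][i] / row_total      # = 100% - commission error
    producers = 100 * error_matrix[i][i] / col_total  # = 100% - omission error
    print(f"{name}: user's accuracy = {users:.1f}%, "
          f"producer's accuracy = {producers:.1f}%")
```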
