
Data Verification

Data Quality & Accuracy

Lecture 4
Lecturer: M. Gwena
Contents/Objectives
• Spatial data verification
• Spatial data quality and accuracy
• Spatial data quality categories
• Spatial data quality elements
• Assessment of data quality
• Sources of spatial data discrepancy
• Data quality improvement techniques
• Spatial data quality evaluation procedures
DATA VERIFICATION
Six clear steps in the data editing & verification process for spatial
data
1). Visual Review
• Usually done by check plotting
2). Cleanup of lines and junctions
• Usually done by software first, followed by interactive editing
3). Weeding of excess coordinates
• Involves the removal of redundant vertices by the software for linear and/or
polygonal features (a simplification sketch follows this list)
4). Correction for distortion and warping
• Most GIS software has functions for scale correction and rubber sheeting.
• The rubber-sheeting algorithm used varies depending on the spatial data
model (vector or raster) employed by the GIS
• Some raster techniques may be more computationally intensive
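
As a concrete illustration of step 3, the sketch below weeds excess coordinates with the Douglas-Peucker algorithm as implemented in the shapely Python library; the line coordinates and the 0.5-unit tolerance are invented values, and production GIS packages expose the same operation under names such as line simplification or generalization.

```python
# Weeding excess coordinates with Douglas-Peucker simplification
# (a minimal sketch; the line and the tolerance are invented values).
from shapely.geometry import LineString

# A digitized line carrying redundant, nearly collinear vertices
line = LineString([(0, 0), (1, 0.1), (2, -0.1), (3, 0), (10, 0)])

# simplify() drops vertices that deviate from the trend line by less
# than the tolerance; preserve_topology avoids self-intersections
weeded = line.simplify(tolerance=0.5, preserve_topology=True)

print(len(line.coords), "->", len(weeded.coords))  # 5 -> 2
```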
Steps in the data editing & verification process for spatial
data…..
5). Construction of polygons
• The majority of data used in GIS is polygonal, hence the construction
of features from lines/arcs is necessary
• Usually done in conjunction with the topology-building process

6). The addition of unique identifiers or labels
• Often this process is manual
• Some systems provide the capability to automatically build labels for a
data layer (a sketch combining steps 5 and 6 follows)
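
A minimal sketch of steps 5 and 6 together, assuming the shapely library is available: polygons are assembled from cleaned, noded arcs and then given sequential identifiers. The POLY_ labelling scheme is an invented convention, not a standard.

```python
# Steps 5 and 6: build polygons from cleaned arcs, then label them
# (a sketch; the arcs and the POLY_<n> ID scheme are assumptions).
from shapely.geometry import LineString
from shapely.ops import polygonize

# Four noded arcs that close into a single square polygon
arcs = [
    LineString([(0, 0), (4, 0)]),
    LineString([(4, 0), (4, 4)]),
    LineString([(4, 4), (0, 4)]),
    LineString([(0, 4), (0, 0)]),
]

# polygonize() assembles closed rings from the line work, and the
# dict comprehension attaches a unique label to each resulting polygon
labelled = {f"POLY_{i}": poly for i, poly in enumerate(polygonize(arcs), 1)}

for label, poly in labelled.items():
    print(label, poly.area)   # POLY_1 16.0
```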
SPATIAL DATA QUALITY AND ACCURACY
• Introduction
• The quality of data sources for GIS processing is becoming an ever-increasing
concern among GIS application specialists.
• With the influx of GIS software on the commercial market and the accelerating
application of GIS technology to problem solving and decision making roles,
the quality and reliability of GIS products is coming under closer scrutiny.
• Much concern has been raised as to the relative error that may be inherent in
GIS processing methodologies.
• While research is ongoing and no definitive standards have yet been adopted
in the commercial GIS marketplace, several practical recommendations have
been identified which help to locate possible error sources and define the
quality of data.
Spatial Data Quality
• Data quality is the degree of excellence of data that satisfies the given
objective.
• The completeness of attributes needed to achieve the given task can also be
termed data quality.
• Data producers, both in the private sector and in the various mapping
agencies, assess data against quality standards in order to produce better
results.
• Data created through different channels and with different techniques can
have discrepancies in terms of resolution, orientation and displacement.
• Data quality is a pillar of any GIS implementation and application, as
reliable data are indispensable if the user is to obtain meaningful results.
Data quality categories

• Data Completeness: basically the measure of the totality of features. A
data set with a minimal amount of missing features can be termed
complete.
• Data Precision: the degree of detail that is displayed in a uniform space
(the sketch below contrasts precision with accuracy).
• Data Accuracy: the discrepancy between the actual attribute value and the
coded attribute value.
• Data Consistency: the absence of conflicts in a particular database.
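
The distinction between precision and accuracy is easy to blur, so here is a tiny numeric sketch; all coordinate values are invented.

```python
# Precision vs. accuracy in one comparison (values are invented).
true_easting = 500123.400    # coordinate accepted as true

measured_a = 500123.4        # low precision, yet accurate
measured_b = 500129.8127     # high precision, yet ~6.4 m off

print(abs(true_easting - measured_a))   # 0.0    -> accurate
print(abs(true_easting - measured_b))   # 6.4127 -> precise but inaccurate
```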
Data quality elements
• The following review of data quality focuses on three distinct
components: data accuracy, quality, and error.
1. Accuracy: the fundamental issue with respect to data
2. Quality
• Lineage
• Positional accuracy
• Attribute accuracy
• Logical consistency
• Completeness
3. Error
1. Accuracy
• Accuracy is the closeness of results of observations to the true values or
values accepted as being true.
• This implies that observations of most spatial phenomena are usually only
considered to be estimates of the true value. The difference between observed and
true (or accepted as being true) values indicates the accuracy of the
observations.
• Two types of accuracy exist. These are positional and attribute accuracy.

i) Positional accuracy is the expected deviance in the geographic location of an
object from its true ground position. This is what we commonly think of when the
term accuracy is discussed.
• There are two components to positional accuracy. These
are relative and absolute accuracy.
Accuracy……
• Absolute accuracy concerns the accuracy of data elements with respect to a
coordinate scheme, e.g. UTM.
• Relative accuracy concerns the positioning of map features relative to one
another.
• Often relative accuracy is of greater concern than absolute accuracy.
• For example, most GIS users can live with the fact that their survey township
coordinates do not coincide exactly with the survey fabric; however, the absence of
one or two parcels from a tax map can have immediate and costly consequences.
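
Absolute positional accuracy is commonly summarized as a root-mean-square error (RMSE) over independent check points, as in the sketch below; the coordinates are invented, and national standards such as the NSSDA prescribe the exact sampling and reporting rules.

```python
# Estimating absolute positional accuracy as horizontal RMSE over
# independent check points (a sketch; the coordinates are invented).
import math

# (digitized_x, digitized_y, true_x, true_y) for each check point
checks = [
    (100.0, 200.0, 100.8, 199.5),
    (300.0, 150.0, 299.2, 150.6),
    (250.0, 400.0, 250.5, 400.9),
]

# Squared horizontal error at each check point
sq_err = [(dx - tx) ** 2 + (dy - ty) ** 2 for dx, dy, tx, ty in checks]
rmse = math.sqrt(sum(sq_err) / len(checks))
print(f"Horizontal RMSE: {rmse:.2f} map units")
```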

ii) Attribute accuracy is equally as important as positional accuracy.
• It also reflects estimates of the truth. Interpreting and depicting boundaries and
characteristics for forest stands or soil polygons can be exceedingly difficult and
subjective. Most resource specialists will attest to this fact.
• Accordingly, the degree of homogeneity found within such mapped boundaries is
not nearly as high in reality as it would appear to be on most maps.
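
Attribute accuracy is often quantified with an error (confusion) matrix built from field-checked samples; the sketch below computes the overall accuracy, with all class names and counts invented.

```python
# Overall attribute accuracy from an error (confusion) matrix
# (a sketch; the class names and counts are invented).
# Rows: class coded on the map; columns: class observed in the field.
classes = ["forest", "water", "urban"]
matrix = [
    [42,  3,  5],   # coded forest
    [ 1, 28,  1],   # coded water
    [ 4,  2, 34],   # coded urban
]

total = sum(sum(row) for row in matrix)
correct = sum(matrix[i][i] for i in range(len(classes)))
print(f"Overall attribute accuracy: {correct / total:.1%}")  # 86.7%
```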
2. Quality
• Quality can simply be defined as the fitness for use of a specific data set.
• Data that is appropriate for use with one application may not be fit for use with
another.
• It is fully dependent on the scale, accuracy, and extent of the data set, as well as
the quality of other data sets to be used.
• The U.S. Spatial Data Transfer Standard (SDTS) identifies five components
of data quality definitions. These are:
• Lineage
• Positional accuracy
• Attribute accuracy
• Logical consistency
• Completeness
Data quality elements – Quality…..

1. Lineage
• Concerned with historical and compilation aspects of data such as:
• Source of the data
• Content of the data
• Data capture specifications
• Geographic coverage of the data
• Compilation method of the data, e.g. digitized vs. scanned
• Transformation methods applied to the data, and
• The use of all pertinent algorithms during compilation, e.g. line
simplification and feature generalization.
Data quality elements- Quality….

2. Positional Accuracy
• The identification of positional accuracy is important.
• This includes consideration of inherent error (source error) and operational
error (introduced error).

3. Attribute Accuracy
• Consideration of the accuracy of attributes also helps to define the quality of
the data. This quality component concerns the identification of the reliability,
or level of purity (homogeneity), in a data set.
Data quality elements – Quality…..
4. Logical consistency
• This component is concerned with determining the faithfulness of the data
structure for a data set.
• Involves spatial data inconsistencies such as incorrect line intersections,
duplicate lines or boundaries, or gaps in lines.
• These are referred to as spatial or topological errors (a validity-check
sketch follows this list).
5. Completeness
• Involves a statement about the completeness of the data set.
• Includes consideration of holes in the data, unclassified areas, and any
compilation procedures that may have caused data to be eliminated.
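
As referenced under logical consistency above, topological errors such as self-intersections can be detected automatically; a minimal sketch using the shapely library follows, with an invented "bow-tie" polygon whose boundary crosses itself.

```python
# Detecting a topological (logical consistency) error with shapely
# (a sketch; the self-intersecting "bow-tie" polygon is invented).
from shapely.geometry import Polygon
from shapely.validation import explain_validity

# A bow-tie: the edges (0,0)-(4,4) and (4,0)-(0,4) cross at (2,2)
bowtie = Polygon([(0, 0), (4, 4), (4, 0), (0, 4)])

print(bowtie.is_valid)           # False
print(explain_validity(bowtie))  # e.g. "Self-intersection[2 2]"
```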
Data quality elements - Note

• The ease with which geographic data in a GIS can be used at any
scale highlights the importance of detailed data quality
information.
• Although a data set may not have a specific scale once it is
loaded into the GIS database, it was produced with levels of
accuracy and resolution that make it appropriate for use only at
certain scales, and in combination with data of similar scales.
3. Error
• Two sources of error, inherent and operational, contribute to the
reduction in quality of the products that are generated by geographic
information systems.

• Inherent error is the error present in source documents and data.
• Operational error is the amount of error produced through the data
capture and manipulation functions of a GIS.
• Possible sources of operational errors include:
• Mis-labelling of areas on thematic maps
• Misplacement of horizontal (positional) boundaries
• Human error in digitizing
• Classification error
• GIS algorithm inaccuracies
• Human bias
Error …..
• While error will always exist in any scientific process, the aim within
GIS processing should be to identify existing error in data sources
and minimize the amount of error added during processing.
• Because of cost constraints it is often more appropriate to manage
error than attempt to eliminate it. There is a trade-off between
reducing the level of error in a data base and the cost to create and
maintain the database.
• An awareness of the error status of different data sets will allow the
user to make a subjective statement on the quality and reliability of
a product derived from GIS processing.
Error …
• The validity of any decisions based on a GIS product is directly related
to the quality and reliability rating of the product.
• Depending upon the level of error inherent in the source data, and
the error operationally produced through data capture and
manipulation, GIS products may possess significant amounts of error.
• One of the major problems currently existing within GIS is the aura of
accuracy surrounding digital geographic data.
• Often hardcopy map sources include a map reliability
rating or confidence rating in the map legend.
• This rating helps the user in determining the fitness for use of the
map.
• However, rarely is this information encoded in the digital conversion
process.
Error………..
• Often, because GIS data is in digital form and can be represented
with high precision, it is considered to be totally accurate.
• In reality, a buffer exists around each feature which represents the
actual positional location of the feature. For example, data captured
at the 1:20,000 scale commonly has a positional accuracy of +/- 20
metres.
• This means the actual location of features may vary 20 metres in
either direction from the identified position of the feature on the
map.
• Considering that the use of GIS commonly involves the integration
of several data sets, usually at different scales and quality, one can
easily see how errors can be propagated during processing.
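
One common rule of thumb (assumed here, since the lecture does not prescribe a formula) treats the positional errors of independent layers as combining in quadrature, i.e. as the square root of the sum of squares:

```python
# Propagating positional uncertainty across overlaid layers, assuming
# independent errors that combine in quadrature (a rule of thumb, not
# a universal law; the per-layer error values are invented).
import math

layer_errors_m = [20.0, 12.5]   # e.g. a 1:20,000 layer and a finer one

combined = math.sqrt(sum(e ** 2 for e in layer_errors_m))
print(f"Combined positional uncertainty: +/- {combined:.1f} m")  # 23.6 m
```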
[Figure: Example of areas of uncertainty for overlaying data]
Several comments and guidelines on the recognition and assessment of error in GIS
processing have been promoted in papers on the subject. These are summarized below:
• There is a need for developing error statements for data contained within geographic
information systems (Vitek et al., 1984).
• The integration of data from different sources and in different original formats (e.g. points,
lines, and areas), at different original scales, and possessing inherent errors can yield a product
of questionable accuracy (Vitek et al., 1984).
• The accuracy of a GIS-derived product is dependent on characteristics inherent in the source
products, and on user requirements, such as scale of the desired output products and the
method and resolution of data encoding (Marble and Peuquet, 1983).
• The highest accuracy of any GIS output product can only be as accurate as the least accurate
data theme of information involved in the analysis (Newcomer and Szajgin, 1984).
• Accuracy of the data decreases as spatial resolution becomes more coarse (Walsh et al., 1987).
• As the number of layers in an analysis increases, the number of possible opportunities for error
increases (Newcomer and Szajgin, 1984).
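
The last two guidelines can be made concrete with a short worked example; the per-layer accuracy figures are invented, and the independence of errors between layers is an assumption.

```python
# Error accumulation across overlay layers (after Newcomer & Szajgin,
# 1984): the composite product can be no better than the least accurate
# theme, and with independent errors it falls to the product of the
# per-layer accuracies (the 0.95/0.90/0.85 figures are invented).
layer_accuracies = [0.95, 0.90, 0.85]

upper_bound = min(layer_accuracies)   # least accurate data theme
product = 1.0
for a in layer_accuracies:
    product *= a                      # independence assumed

print(f"Upper bound (weakest layer):  {upper_bound:.2f}")   # 0.85
print(f"If errors are independent:    {product:.3f}")       # 0.727
```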
Assessment of Data Quality:

• Data quality is assessed using different evaluation techniques by
different users.
• The first level of assessment is performed by the data producer.
• This level of assessment consists of data quality checks against the given
data specifications.
• The second level of data quality assessment is performed on the consumer
side, where feedback is taken from the consumer and processed.
• The data is then analyzed/rectified on the basis of the processed
feedback.
Sources of Spatial Data Discrepancy:
• Uncertainty analysis assesses the discrepancy between
geographic data in GIS and the geographic reality that the data are
intended to represent.
• Uncertainty exists in all three components of geographic data: the
typological attributes, the locational attributes, and attributes related
to spatial dependence.
1. Data Information Exchange:
2. Type and Source:
3. Data Capture:
4. Cartographic Effects:
5. Data Transfer:
6. Metadata:
Sources of Spatial Data Discrepancy……..
1. Data Information Exchange:
• The information about the data provided by the client to the organization.
• The degree of information provided by the client determines the accuracy
and completeness of the data.
2. Type and Source:
• The data type and source must be identified and evaluated in order to get
appropriate data values before proceeding to any analysis.
• There are many spatial data formats, and each of them has some
beneficial elements as well as some drawbacks.
• For example: in order to use CAD data on a GIS platform, the data must be
evaluated and problems rectified; otherwise the resulting values will
show large discrepancies.
Sources of Spatial Data Discrepancy……
3. Data Capture:
• There are many tools that rely on manual skills to capture data using
various software packages such as ArcGIS and QGIS.
• These packages allow the user to capture information from the base data.
• During this data capture (especially when relying on manual skills), the user
may misinterpret features in the base data and capture the features with
errors.
• For example: a user misinterprets two buildings as a single building and
captures them as a single feature, although in the real world there are two
features.
• Correct interpretation of features in the base data must be performed.
• There are many tools that enable the user to find and fix those errors,
but these tools are not used frequently due to a lack of awareness.
• Data capture must be performed at an appropriate scale at which one is able
to view the features distinctly.
Sources of Spatial Data Discrepancy……
4. Cartographic Effects:
• After the data is captured, cartographic effects such as symbology,
pattern, colors, orientation and size are assigned to the features.
This is required for a better representation of reality.
• These effects must be assigned according to the domain of the
features.
• For a forestry application, for example, forestry-specific cartographic
elements must be used.
• Using elements from another domain degrades the quality of the
output.
Sources of Spatial Data Discrepancy……
5. Data Transfer:
• Some discrepancies may occur while transferring the data from one place to
another, e.g. from a web source to a standalone, disconnected machine.
• Sometimes, in order to make accurate data even more accurate, the user applies
advanced rectification techniques which instead turn the data into highly
degraded data.
• "There is no bad or good data. There are only data which are suitable for a specific
purpose."
• Data must therefore be evaluated according to the domain in which it is supposed
to be used.

6. Metadata:
• Sometimes the metadata is not updated to match the current features.
• For example: a few features are edited on some software platform, but the
metadata is not updated with the name of the editor, the reason for editing and
other relevant information. Metadata must therefore be kept up to date with the data.
Data Quality Improvement Techniques:

• Choice of relevant data from a relevant source.
• Ensuring precision at the origin itself.
• Data quality testing in each phase of data capture.
• Using automated software tools for spatial and non-spatial data
validation.
• Assessment of the mode of data use and of the user.
• Determining map elements such as scale, visualization and feature
orientation.
Quality evaluation procedure - Techniques for
evaluation of spatial data quality

According to the ISO 19114 standard, the evaluation of data quality must be
done in five steps [Jak09, Joo06] (a sketch follows the list):

• Identify the data quality scope: elements and sub-elements,
• Identify the data quality measure,
• Select the evaluation method,
• Determine the data quality results,
• Determine conformance.
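
As a sketch of the five steps applied to a single completeness (omission) check, consider the following; the feature counts and the 5% conformance threshold are assumed values, not part of ISO 19114 itself.

```python
# The five ISO 19114 evaluation steps walked through for one
# completeness check (counts and threshold are assumed values).

# 1. Data quality scope: completeness (omission) of a building layer
expected = 1250    # features required by the capture specification
captured = 1198    # features counted in the delivered data set

# 2. Data quality measure: rate of omission
omission_rate = 1 - captured / expected

# 3. Evaluation method: full inspection of the data set (direct method)

# 4. Data quality result
print(f"Omission rate: {omission_rate:.2%}")   # 4.16%

# 5. Conformance: compare against the product specification
threshold = 0.05   # assumed: at most 5% omission tolerated
print("Conformant" if omission_rate <= threshold else "Non-conformant")
```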
THE END OF LECTURE 4