Managing Error, Accuracy, and Precision in GIS

Importance of Understanding Error
*Until recently, most people involved with GIS
paid little attention to error

*That situation has now changed dramatically

*Error management is vital to the proper
functioning of a GIS database and accounts for
a large percentage of the work in most GIS shops
Importance of Understanding Error
*The key point is that awareness, scrutiny, and
careful planning can minimize these errors and
their associated effects on management and
decision-making
Definitions for Understanding Error
*Accuracy: the degree to which information on
a map or in a digital database matches the true
or accepted values
-can vary greatly amongst datasets
-very high accuracy can be expensive

Definitions for Understanding Error
*Precision: refers to the level of measurement
and “exactness” of description in a GIS
-again, precision requirements vary greatly
depending on the dataset
-highly precise data can be much more
expensive to create
Definitions for Understanding Error
Accuracy vs. Precision . . .
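The distinction can be illustrated numerically. The sketch below reads accuracy as closeness of the average reading to the true value and precision as repeatability of the readings, which is one common interpretation. The control point, its true easting, and both sets of readings are made-up values for illustration only.

```python
import statistics

# Hypothetical control point whose true easting is assumed known.
TRUE_EASTING = 500.0  # metres, assumed for illustration

# Two made-up sets of repeated readings of that point.
tight_but_offset = [510.1, 510.0, 509.9, 510.0]
scattered_but_centred = [495.0, 505.0, 498.0, 502.0]


def bias(readings, truth):
    """Accuracy: how far the average reading is from the true value."""
    return statistics.mean(readings) - truth


def spread(readings):
    """Precision: how tightly the readings cluster (sample std. dev.)."""
    return statistics.stdev(readings)


for name, readings in [("tight_but_offset", tight_but_offset),
                       ("scattered_but_centred", scattered_but_centred)]:
    print(name, round(bias(readings, TRUE_EASTING), 2),
          round(spread(readings), 2))
```

The first set is precise but not accurate (small spread, large bias); the second is accurate on average but not precise.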
Types of Error
Positional Accuracy and Precision
*Refers to both horizontal and vertical positions
*Don’t use or compute locational information at
a level of precision beyond that for which the
data were intended

Accuracy Standards for US NTS Maps


Scale        Tolerance
1:1,200      ± 3.33 feet
1:2,400      ± 6.67 feet
1:4,800      ± 13.33 feet
1:10,000     ± 27.78 feet
1:12,000     ± 33.33 feet
1:24,000     ± 40.00 feet
1:63,360     ± 105.60 feet
1:100,000    ± 166.67 feet
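The tolerances listed above are consistent with allowing 1/30 inch on the map at scales larger than 1:20,000 and 1/50 inch at scales of 1:20,000 and smaller. That is the rule used by the US National Map Accuracy Standards. The slide does not state the rule, so treat it as an assumption here. A minimal sketch that reproduces the listed values (the function name is hypothetical):

```python
def tolerance_feet(scale_denominator):
    """Ground tolerance in feet for a map at 1:scale_denominator.

    Assumes 1/30 inch on the map for scales larger than 1:20,000
    and 1/50 inch for scales of 1:20,000 or smaller.
    """
    divisor = 30 if scale_denominator < 20000 else 50
    ground_inches = scale_denominator / divisor
    return ground_inches / 12  # inches to feet


for denominator in (1200, 2400, 4800, 10000, 12000, 24000, 63360, 100000):
    print(f"1:{denominator:,}  +/- {tolerance_feet(denominator):.2f} feet")
```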
Types of Error
Attribute Accuracy and Precision
*Attribute (non-spatial) information can also be
erroneous
*Some layers can be more precise than others

Conceptual Accuracy and Precision


*Use of inappropriate categories, or
misclassification
*Ex.-not classifying voltage in your power lines
layer would limit your ability to manage
electrical utilities infrastructure
Sources of Error
*Sources of error can be divided into three
groups:
-obvious sources of error
-errors resulting from natural variations or
from original measurements
-errors arising through processing

Obvious Sources of Error
*Age of Data
-some data sources may be too old to be
useful
-past collection standards may no longer be
acceptable
-the landscape the database describes could
have changed dramatically over time
(erosion/deposition, harvest, fire)
-updating a database is by far the most
common form of error management work
Obvious Sources of Error
*Areal Cover
-some datasets contain only part of the
required information (veg., soils are common)
-ex. FRI often contains no land cover
information for wetland areas
-some remote sensing data may be difficult to
acquire in consistently cloudy regions
Obvious Sources of Error
*Map Scale
-always remember the implications of scale!!!!

*Density of Observations
-an insufficient number of observations may
not provide the required level of resolution
-ex. If you have a 40’ contour interval, you
had better not be reporting on or making
decisions about features that differ by only a
few feet in elevation
Obvious Sources of Error
*Relevance
-surrogate data may be used to indirectly
describe/classify/quantify features
-Ex. We can create a forest polygon layer from
classification of remotely sensed data.
However, we are not classifying a “tree” as a
tree. Rather, we are classifying the imagery
based on spectral signatures, and those
signatures can be related to tree species.
Obvious Sources of Error
*Format
-methods of formatting data can introduce
errors
-conversion of scale, projection, or datum,
vectorization/rasterization, and pixel
resolution are possible areas of format error
-international mapping standards are not
established
Obvious Sources of Error
*Accessibility
-try getting a highway map of the former
USSR in the Cold War days . . . Good Luck!

*Cost
-highly accurate, precise data is expensive!!!
Errors from Natural Variation
or from Original Measurements
*Positional Accuracy
-many natural features do not exhibit “hard”
boundaries like roads or boundary lines
-examples include:
-soils
-vegetation communities
-climate variables
-drainage
-biomes, etc.
Errors from Natural Variation
or from Original Measurements
*Accuracy of Content
-qualitative accuracy refers to correct
labelling/classification (Ex.-pine forest vs.
spruce forest)
-quantitative inaccuracies often occur from
faulty equipment or poor readings
-what forestry equipment could give you bad
data? And how?
Errors Arising Through Processing
*Numerical Errors
-by far, the hardest errors to detect!!!
-different (faulty) computer chips can compute
differently, generating a different output
(response)
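One well-understood way that processing can silently change numbers is limited floating-point precision. This is a related effect, not the faulty-chip case mentioned above. The sketch below stores a coordinate as a 32-bit float, as some file formats and software do, and shows the resulting shift. The northing value is hypothetical.

```python
import struct


def as_float32(value):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", value))[0]


northing = 5432109.87  # hypothetical UTM northing in metres
stored = as_float32(northing)

# At this magnitude a 32-bit float can only represent steps of 0.5 m,
# so the stored value differs from the original by about 0.13 m.
print(stored, abs(stored - northing))
```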

*Topological Errors
-overlaying, or deriving/creating new variables
based on other data can cause slivers,
overshoots, and dangles
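Dangles can be flagged with a simple endpoint count. The sketch below assumes each line is a list of (x, y) tuples and that properly connected lines share exactly identical endpoint coordinates. The function name and the sample lines are hypothetical. An endpoint used by only one line is a dangle candidate, although a legitimate dead end looks the same, so results still need review.

```python
from collections import Counter


def dangling_endpoints(lines):
    """Return endpoints that belong to exactly one line."""
    counts = Counter()
    for line in lines:
        counts[line[0]] += 1   # start node
        counts[line[-1]] += 1  # end node
    return sorted(point for point, n in counts.items() if n == 1)


# Made-up network: a closed triangle plus one line that stops short.
lines = [
    [(0, 0), (1, 0)],
    [(1, 0), (1, 1)],
    [(1, 1), (0, 0)],
    [(1, 0), (2, 0.1)],
]
print(dangling_endpoints(lines))
```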
Errors Arising Through Processing
*Classification/Generalization Errors
-classification inaccuracies/class merging
-grouping data in different ways can lead to
dramatically different results (Ex.-a study of
causes of death amongst males would probably
give quite different results with an aged 18-25
group than with an 18-50 group)
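The effect of grouping can be shown with a few made-up records. The ages and causes below are invented for illustration and are not real mortality data. The same records give a different leading cause depending on where the class boundary is drawn.

```python
from collections import Counter

# Hypothetical (age, cause) records, invented for illustration.
records = [
    (19, "accident"), (22, "accident"), (24, "accident"), (23, "disease"),
    (35, "disease"), (41, "disease"), (44, "disease"), (48, "disease"),
    (47, "accident"),
]


def leading_cause(records, low, high):
    """Most frequent cause among records with low <= age <= high."""
    counts = Counter(cause for age, cause in records if low <= age <= high)
    return counts.most_common(1)[0][0]


print(leading_cause(records, 18, 25))
print(leading_cause(records, 18, 50))
```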
Errors Arising Through Processing
*Geocoding/Digitizing Errors
-what can cause digitizing errors?
-rasterizing will cause positional error
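The positional error from rasterizing can be estimated directly. The sketch below assumes a square-cell grid with its origin at (0, 0) and represents each point by the centre of the cell it falls in. The function names and sample point are hypothetical. Under those assumptions the shift can never exceed half the cell diagonal.

```python
import math


def rasterize_point(x, y, cell_size, origin_x=0.0, origin_y=0.0):
    """Return the centre of the grid cell that contains (x, y)."""
    col = math.floor((x - origin_x) / cell_size)
    row = math.floor((y - origin_y) / cell_size)
    centre_x = origin_x + (col + 0.5) * cell_size
    centre_y = origin_y + (row + 0.5) * cell_size
    return centre_x, centre_y


def positional_error(x, y, cell_size):
    """Distance between a point and the centre of its raster cell."""
    centre_x, centre_y = rasterize_point(x, y, cell_size)
    return math.hypot(centre_x - x, centre_y - y)


cell_size = 30.0                            # e.g. 30 m cells
max_error = cell_size * math.sqrt(2) / 2    # half the cell diagonal
print(rasterize_point(101.0, 259.0, cell_size))
print(positional_error(101.0, 259.0, cell_size), max_error)
```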
Error, Error, Everywhere . . .
How can we manage error?
1. Be aware of where error can be
generated (everything discussed in this
presentation)
2. Metadata, metadata, metadata . . . Fully
understand all data compiled for your GIS,
make notes of all work done with the data,
and pass that information on to future users
and with all GIS-generated output.
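One lightweight way to keep such notes is a record that travels with each layer. The sketch below is only an illustration. The class, its field names, and the sample values are hypothetical and do not follow any particular metadata standard.

```python
from dataclasses import dataclass, field


@dataclass
class LayerMetadata:
    """Hypothetical minimal metadata record for one GIS layer."""
    name: str
    source: str
    source_scale: int          # denominator, e.g. 24000 for 1:24,000
    collected: str             # date or year the data were captured
    lineage: list = field(default_factory=list)

    def log_step(self, description):
        """Append a note about work done on the data."""
        self.lineage.append(description)


# Made-up example layer and processing history.
meta = LayerMetadata("soils", "digitized paper map", 24000, "1998")
meta.log_step("reprojected to a new datum")
meta.log_step("rasterized to 30 m cells")
print(meta)
```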