Data Quality and Error

Presented By,
S Bensinghdhas, M.E (Design)
Asst. Lecturer
SJCET, Dar Es Salam

• Whenever you work with spatial data (or any
data for that matter) you will deal with some
sort of error due to the many steps involved
in creating spatial data.
• Spatial data is just an abstraction of what is
really there. Because of this abstraction, we
can expect error due to:
– How we conceptualize the data in the first place
– How we collect the data
– How we present the data
• Additionally, there are other sources of error
such as:
– Obvious Errors
– Errors in natural variation


Obvious Error
• The errors we just discussed are illustrative of the general types of obvious errors you will encounter when using geospatial information. Therefore, as a geospatial analyst, you should always approach a project with these obvious sources of error firmly in mind.
• When given a task to perform, and the associated data, the following should act as a good checklist:
– Is the data current?
– Were the data mapped at the correct scale? Do they have the same accuracies?
– What is the resolution of the data? Will it support the kinds of analysis we want to perform?
– Do we have all the data for the project areas, or is there some data missing?
– If we need other data sets, are they available, or will we have trouble getting them?
• Also, you will have to give thought as to how to correct those errors before proceeding with a project.

Components of Data Quality
• Positional Accuracy
• Attribute Accuracy
• Logical Consistency
• Resolution
• Completeness

Spatial Accuracy
• As we previously stated, positional accuracy relates to the coordinate values for the geographic objects. But even positional accuracy is divided into two different categories:
– Absolute accuracy: refers to the actual X,Y coordinates of a geographic object. If one knows the correct position of the geographic object, one can compare it with the position represented in the geographic database. Typically, absolute accuracy measures the total difference in an object's position, or the difference in the X coordinate and the difference in the Y coordinate.
– Relative accuracy: refers to the displacement of two or more points on a map (in both distance and angle), compared to the displacement of those same points in the real world.
• The figures on the right show two different maps of the Cornell campus and the City of Ithaca. The top map, a USGS quadrangle, has an absolute accuracy of around 40 feet. That is, the coordinates for a building on the quadsheet are probably within 40 feet of their real world coordinates. The bottom map, a photogrammetrically derived map of the same area, has an absolute accuracy of about 2.5 feet.
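The two categories of positional accuracy can be sketched in a few lines of Python. This is only an illustration: the function names and the coordinates are hypothetical, and azimuth is assumed to be measured in degrees clockwise from north on a flat X,Y plane.

```python
import math

def absolute_error(mapped, true):
    """Return (dx, dy, total) offset of a mapped point from its true position."""
    dx = mapped[0] - true[0]
    dy = mapped[1] - true[1]
    return dx, dy, math.hypot(dx, dy)

def distance_and_azimuth(p, q):
    """Distance and azimuth (degrees clockwise from north) from p to q."""
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    azimuth = math.degrees(math.atan2(dx, dy)) % 360.0
    return math.hypot(dx, dy), azimuth

def relative_error(mapped_a, mapped_b, true_a, true_b):
    """Difference in distance and azimuth between a mapped pair and the true pair."""
    d_map, az_map = distance_and_azimuth(mapped_a, mapped_b)
    d_true, az_true = distance_and_azimuth(true_a, true_b)
    # Wrap the azimuth difference into [-180, 180)
    d_az = (az_map - az_true + 180.0) % 360.0 - 180.0
    return d_map - d_true, d_az

# Hypothetical coordinates in feet: both mapped points are shifted by the same (30, 40)
true_a, true_b = (1000.0, 1000.0), (1000.0, 2000.0)
mapped_a, mapped_b = (1030.0, 1040.0), (1030.0, 2040.0)

print(absolute_error(mapped_a, true_a))                      # (30.0, 40.0, 50.0)
print(relative_error(mapped_a, mapped_b, true_a, true_b))    # (0.0, 0.0)
```

In this made-up case every point is 50 feet from its true position (poor absolute accuracy), yet the distance and angle between the two points are unchanged (good relative accuracy).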

Relative Accuracy
• Even though the USGS quadrangle has much less absolute accuracy than the photogrammetrically derived map, if we were to zoom into an area and measure the distance between two points, the relative distance and the angle would be fairly similar. In this case, the distance along Tower Road is only about 15 feet different, and the azimuth ...

Positional Accuracy

Attribute Accuracy
[Figure: map with state labels Connecticut, New Jersey, Pennsylvania, New York]

Logical Consistency
• Representation of data that does not make sense
– Road in the water
– Contours that cross or end
– Features on steep slopes

Resolution
• Generalization may improperly represent size and shape
• Cartographic Aesthetics
• Entire regions may be eliminated (islands, peninsulas, etc.)

Completeness
• Fragmented coverage of many developing countries
– Soils
– Vegetation
• Must determine methods for uniformity

Obvious Errors
• The statement "to err is human" is very applicable to creating spatial data. Humans make a lot of errors. Typing the wrong value into a computer is a common mistake. Computers and GIS software, however, really don't care what data you give them. That is, a GIS will process any of your data, whether the processing is appropriate or not. That being the case, there are other sources of obvious error besides human error:
– Age: a map is a representation of real-world objects at a given point in time. The reliability of a dataset typically goes down as it gets older. This is especially true of data that change frequently, such as housing within a city. Many GIS projects take years to complete, and it is entirely possible that much of the data collected at the beginning of a project may be out of date by the end of the project.
– Map Scale: In general, larger scale maps show more detail than smaller scale maps. Also, larger scale maps tend to have greater accuracy than smaller scale maps. In a GIS you can combine data from different scales rather easily, especially maps within the "same family" such as the 1:250,000, 1:100,000 and 1:24,000 USGS maps. However, doing so may not be a good idea due to the different accuracies of the products.
– Data Format: The way we represent data also presents an obvious source of error. For example, a raster map of landuse represented by 10 meter grid cells will differ significantly from a raster map of landuse represented by 100 meter grid cells. The following is a grid of landuse values around Ithaca, New York. You can see the differences in representation between a map with 10 meter grid cells, 30 meter grid cells, and 100 meter grid cells.
– Areal Coverage: Many data sets may not have uniform coverage. For example, there may be pieces missing in one section.
– Accessibility: Not all data sets are equally accessible. For example, land resource data may be available in one country, but considered a state secret in another. Also, due to the events of September 11, 2001, some data are unavailable for security reasons.

Problems with Age
• The following maps show the different land cover types between 1968 and 1995. You can see how the data has changed over nearly 30 years, and why using older data might present a problem.

Obvious Sources of Error
• Areal Coverage
– Many data sets do not have a uniform coverage of information
[Figure: Nassau County basemap and Suffolk County parcels]

Problems with Format
• You can see the different ways in which data is represented when using different formats. In this case, 10, 30, and 100 meter grid cells are used.
[Figure panels: 10 meter, 30 meter, 100 meter]
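The effect of cell size can be imitated with a small sketch that coarsens a categorical grid. The majority rule used here is an assumption chosen for illustration (the slides do not say how the coarser grids were produced), and the landuse codes F (forest), U (urban) and W (water) are hypothetical.

```python
from collections import Counter

def aggregate_majority(grid, factor):
    """Coarsen a categorical raster: each factor x factor block takes its most common value."""
    rows, cols = len(grid), len(grid[0])
    coarse = []
    for r in range(0, rows, factor):
        row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, min(r + factor, rows))
                     for j in range(c, min(c + factor, cols))]
            # On a tie, Counter keeps the value that was encountered first
            row.append(Counter(block).most_common(1)[0][0])
        coarse.append(row)
    return coarse

# Hypothetical 6 x 6 landuse grid with 10 meter cells
fine = [list(line) for line in (
    "FFFUUU",
    "FWFUUU",
    "FFFUUU",
    "FFFFFU",
    "FFFFUU",
    "FFFFFF",
)]

# Factor 3 turns 10 meter cells into 30 meter cells
for row in aggregate_majority(fine, 3):
    print("".join(row))
# FU
# FF
```

The single water cell and the small urban patch in the lower right both disappear at the coarser resolution, which is the kind of difference the three panels show.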

Errors Due to Natural Variation
• You can see why each of the previous error types is called an Obvious Error. But there are other types of errors that are not so obvious, and are oftentimes overlooked. Nonetheless, you will have to be aware of these kinds of errors too. These errors are termed errors in natural variation, and take the form of:
– Positional Errors Due to Natural Variation: there are natural variations in materials that might make them less accurate. For example, a paper map will actually shrink or expand as the humidity of the room it is stored in changes. The change in the material is virtually unnoticeable by a user, but depending upon the scale of the map, the real world errors could be quite large.
– Variations Due to Equipment: Some equipment may not measure information correctly, or may have slight variations from measurement to measurement. For example, a temperature gauge or pH meter may give slightly different readings when measuring the same location. If you've ever measured your blood pressure on one of the automatic machines in the drug store, you have probably noticed that two readings taken one after another can be different. While some of this is based on your own fluctuations in blood pressure, the machines themselves have some variability.
– The variations of measurements are often related to two important concepts called precision and accuracy…

Errors Resulting from Natural Variations from Original Measurements
• Positional Accuracy
– Result of poor field work, media shrinkage and expansion, poor vectorization (line digitizing)
– Correction through rubbersheeting
• Accuracy of Content
– Attribute errors caused by miscoding, or faulty equipment (thermometer, pH meter)
• Sources of Variation in Data:
– Data entry or output faults

Errors Resulting from Natural Variations from Original Measurements
• Measurement Error
– Accuracy vs. Precision
» Accuracy: extent to which an estimated value approaches the true value
» Precision: measure of dispersion of observations about a mean
» Accuracy vs. Precision example
– Laboratory Errors
» Results of World-wide Laboratory Exchange Program: variation between results for the same soil samples analyzed in different laboratories exceeded:
» 11% for clay content

Accuracy and Precision
• Accuracy is defined as displacement of a plotted point from its true position in relation to an established standard, while Precision is the degree of perfection, or repeatability, of a measurement.
• For mapping, accuracy is associated with the position of an object relative to its true position. Precision is then the ability to repeat a measurement, or how likely you are to return to the same location time and time again.
• The figures to the right illustrate the differences between accuracy and precision.
• Therefore, if there are natural variations in either the instruments used for measurement, or the ...
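One rough way to put numbers on the two concepts is to treat accuracy as the offset of the average reading from the true value, and precision as the spread of the readings about their own mean. The sketch below assumes that interpretation; the readings and the true value are made up for illustration.

```python
import statistics

def accuracy_and_precision(readings, true_value):
    """Return (bias, spread) for repeated measurements of one quantity.

    bias   : mean reading minus the true value (accuracy; closer to 0 is better)
    spread : sample standard deviation about the mean (precision; smaller is better)
    """
    mean = statistics.mean(readings)
    return mean - true_value, statistics.stdev(readings)

TRUE_VALUE = 100.0  # hypothetical true value

# Instrument A repeats itself well but is consistently off: precise, not accurate
bias_a, spread_a = accuracy_and_precision([104.9, 105.0, 105.1], TRUE_VALUE)

# Instrument B scatters widely but averages to the truth: accurate, not precise
bias_b, spread_b = accuracy_and_precision([95.0, 100.0, 105.0], TRUE_VALUE)

print(round(bias_a, 2), round(spread_a, 2))  # 5.0 0.1
print(round(bias_b, 2), round(spread_b, 2))  # 0.0 5.0
```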

Errors Arising Through Processing
• Numerical Errors in the Computer
– Numerical precision
– PC ARC/INFO is Single Precision
– Some GIS use Integer values to store coordinates, and large areas may not be stored precisely
» Scaling a triangle
• Faults Arising Through Topological Analysis
– Assumes
» Source data is uniform
» Digitizing procedures are infallible
» Map overlay is only concerned with line intersection
» Boundaries can be sharply defined and drawn
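The effect of single precision storage can be seen by round-tripping a coordinate through a 32-bit float. This is a minimal sketch using Python's standard struct module; the coordinate values are hypothetical, and it says nothing about how any particular GIS package stores its data internally.

```python
import struct

def to_single(value):
    """Round-trip a Python float (double precision) through 32-bit single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]

# Hypothetical northing in meters, with millimeter detail
northing = 4_700_000.123
stored = to_single(northing)
print(stored)                   # 4700000.0
print(abs(stored - northing))   # roughly 0.123 m lost

# The same detail on a small coordinate survives much better
local = 700.123
print(abs(to_single(local) - local) < 1e-4)  # True
```

Near 4.7 million, adjacent single precision values are 0.5 apart, so anything finer than about half a meter is rounded away. Storing coordinates relative to a local origin, or using double precision, avoids this.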

Raster to Vector
• GIS allows you to convert raster and vector features between one another. For example, we can take a vector feature and convert it to raster. Or, we can take a raster feature and convert it to vector format. But, depending upon the resolution of the features, the representation of the geographic objects may be quite different. In some cases, as the examples show, the raster version of the map actually caused some buildings to "merge" together.
[Figure panels: Vector data of buildings; Vector data converted to raster with 10-foot grid cells; Raster data converted back to vector, using 10-foot grid cells]
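The merging effect can be reproduced with a toy rasterizer. This sketch assumes a simple rule (a cell is marked as built if any building overlaps it) and uses two hypothetical rectangular buildings 4 feet apart; real conversion tools offer other rules, so treat it as an illustration only.

```python
def rasterize(rects, cell, ncols, nrows):
    """Mark a cell 1 if any rectangle (xmin, ymin, xmax, ymax) overlaps it, else 0."""
    grid = [[0] * ncols for _ in range(nrows)]
    for r in range(nrows):
        for c in range(ncols):
            x0, y0 = c * cell, r * cell
            x1, y1 = x0 + cell, y0 + cell
            for xmin, ymin, xmax, ymax in rects:
                if xmin < x1 and xmax > x0 and ymin < y1 and ymax > y0:
                    grid[r][c] = 1
                    break
    return grid

def count_regions(grid):
    """Count groups of built cells that touch edge to edge (4-neighbour flood fill)."""
    nrows, ncols = len(grid), len(grid[0])
    seen = set()
    regions = 0
    for r in range(nrows):
        for c in range(ncols):
            if grid[r][c] == 1 and (r, c) not in seen:
                regions += 1
                seen.add((r, c))
                stack = [(r, c)]
                while stack:
                    i, j = stack.pop()
                    for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                        if (0 <= ni < nrows and 0 <= nj < ncols
                                and grid[ni][nj] == 1 and (ni, nj) not in seen):
                            seen.add((ni, nj))
                            stack.append((ni, nj))
    return regions

# Two hypothetical buildings (units: feet) separated by a 4 ft gap
buildings = [(2, 2, 18, 18), (22, 2, 38, 18)]

print(count_regions(rasterize(buildings, cell=10, ncols=4, nrows=2)))   # 1
print(count_regions(rasterize(buildings, cell=2, ncols=20, nrows=10)))  # 2
```

With 10-foot cells the gap is narrower than a cell, so the two buildings become one block; with 2-foot cells they stay separate. Converting the coarse raster back to vector would therefore yield a single merged building.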

Errors in Data Processing
• Digitizing Data: Once again, scale presents a problem with digitized data. On a soil map drawn at a scale of 1:100,000, a 1 mm wide line (the thickness of a sharp pencil) would actually represent 100 meters on the ground. Or, as shown in the example below, the road edge on the USGS quadrangle is actually 4 meters wide in some spots. Also, converting data from raster to vector format will introduce errors.
• Spatial Analysis: Some GIS functions such as overlay present problems such as ambiguous locations, and the concept of "sliver polygons". Each of the examples is shown in the illustrations below.
[Figure annotation: Width of edge of pavement is greater than 4 meters]

Errors Associated with Spatial Analysis
• Errors in Digitizing a Map
– Source errors
» Distortion
» Boundaries drawn on a map have a "thickness": a 1 mm line is 1.25 m wide on a 1:1,250 map and 100 m wide on a 1:100,000 map
» Estimates show that 10% of a 1:24,000 soil map may represent the boundary lines alone
– Digital Representation
» Curves are approximated by many vertices
» Boundaries are not absolute, but should have a confidence interval
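The "thickness" of a drawn boundary follows directly from the map scale: ground width equals the drawn width multiplied by the scale denominator. A minimal sketch, with a hypothetical function name:

```python
def ground_width_m(line_width_mm, scale_denominator):
    """Ground width in meters covered by a line drawn line_width_mm wide at 1:scale_denominator."""
    return line_width_mm * scale_denominator / 1000.0

print(ground_width_m(1, 100_000))  # 100.0 meters on a 1:100,000 map
print(ground_width_m(1, 24_000))   # 24.0 meters on a 1:24,000 map
```

So any point digitized from the middle of a 1 mm line on a 1:100,000 map could be up to 50 meters from either edge of that line on the ground.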

Sliver Polygons
• In the following example, there are two polygons. When we overlay the two of them, the result contains not only the logical intersection between the two polygons, but also many small polygons that are probably due more to the fact that the representations of the polygon boundaries are slightly different. These smaller polygons, or sliver polygons, represent spatial errors in the data.
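A simplified sketch of how slivers arise: the same parcel is digitized twice with a slight offset, and the overlay leaves thin strips that belong to one version but not the other. To stay self-contained, this example assumes axis-aligned rectangles (real overlays work on arbitrary polygons), and the coordinates are hypothetical.

```python
def rect_area(rect):
    """Area of a rectangle (xmin, ymin, xmax, ymax); 0 if it is empty."""
    xmin, ymin, xmax, ymax = rect
    return max(0.0, xmax - xmin) * max(0.0, ymax - ymin)

def rect_intersection(a, b):
    """Overlap of two axis-aligned rectangles (may be empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))

# The same 100 x 100 parcel digitized twice, the second time slightly offset
first = (0.0, 0.0, 100.0, 100.0)
second = (1.0, 0.5, 101.0, 100.5)

common = rect_area(rect_intersection(first, second))
# Area that falls in exactly one of the two versions: the slivers
sliver_area = rect_area(first) + rect_area(second) - 2 * common

print(common)       # 9850.5
print(sliver_area)  # 299.0
```

Here an offset of about one unit produces slivers totalling roughly 3% of the parcel's area, none of which corresponds to anything real on the ground.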

Errors Associated with Spatial Analysis
• Boundary Problems
– Definitely in
– Definitely out
– Possibly in
– Possibly out
– Ambiguous (on the digitized border line)
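One possible reading of these five categories is that a digitized boundary carries an uncertainty band of some width on either side. The sketch below assumes that reading: the band half-width eps, the rectangular polygon, and the exact meaning given to "possibly in" and "possibly out" are all assumptions made for illustration, not definitions taken from the slides.

```python
import math

def classify(point, rect, eps):
    """Classify a point against a rectangle (xmin, ymin, xmax, ymax) whose
    boundary is assumed uncertain by eps on either side."""
    x, y = point
    xmin, ymin, xmax, ymax = rect
    # Distance to the nearest edge when inside; negative when outside
    inside_margin = min(x - xmin, xmax - x, y - ymin, ymax - y)
    if inside_margin > 0:
        return "definitely in" if inside_margin > eps else "possibly in"
    if inside_margin == 0:
        return "ambiguous"  # exactly on the digitized border line
    # Outside: distance from the point to the rectangle
    dx = max(xmin - x, 0.0, x - xmax)
    dy = max(ymin - y, 0.0, y - ymax)
    return "definitely out" if math.hypot(dx, dy) > eps else "possibly out"

parcel = (0.0, 0.0, 100.0, 100.0)
for p in [(50, 50), (2, 50), (0, 50), (-3, 50), (-20, 50)]:
    print(p, classify(p, parcel, eps=5.0))
# (50, 50) definitely in
# (2, 50) possibly in
# (0, 50) ambiguous
# (-3, 50) possibly out
# (-20, 50) definitely out
```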