GI 224 Classification

GI 224 Map production
Classification of
Geographic data for
mapping
Prep: Iriael Mlay

Classification
 In a mapping context, classification is the grouping of
geographic objects or phenomena on the
basis of identical/similar characteristics
 Classification is in fact a generalization
process.
 The bulk of the data is reduced, which means
that some of the original variation and accuracy
is lost
Reasons for Classification
 It reduces the complexity of reality
 It helps to understand reality better and to distinguish certain
patterns in the cartographic representation of geographic
data
 Human perceptual constraints
 When selective perception is required, the number of
classes that can be represented in a map is limited
 Technical constraints
 Matching each occurrence in a data set with a unique
symbol (particularly when the visual variable value has to
be used to represent data in a choropleth map)
is often impossible in conventional map production,
because it would require too many reproduction screens.
Classification cont.
 Classification is applied in all topographic maps,
and also in many, but not all, thematic maps.
Sometimes the decision made is not to classify the
data, and to produce, for example, unclassified
choropleth maps.
 Another example in which data are not always
classified is the proportional point symbol map
where graduated symbols are used
 In many other thematic map types the data are
classified, for example in chorochromatic maps, most
choropleth maps, flow line and isoline maps, and
proportional point symbol maps with range-graded
symbols.
Classification cont.
 If it is decided to classify data for mapping
purposes, then the classification should be
done carefully, because it strongly
influences what the map user will ultimately
see and understand of the data.
Classification decisions are influenced
by:
 The purpose of the map
 Questions that have to be answered include: Is the map meant for
detailed study or not? Is selective perception required? Is
comparison with other representations of data required?
 The user characteristics
 For school children, fewer classes and easily understood
methods are more important than for adults
 Characteristics of the data
 Critical values in a data set may also influence the class boundaries
(for example, values above or below a certain temperature)
 The scale of the map/the size of the symbols used to
represent the data.
 The smaller the scale of the map and the smaller the symbols used to
represent the data, the fewer classes can be used.
 The visual variable(s) applied for the representation
 The number of categories (classes) that can be discerned is
influenced by the visual variable(s) used
Aim of any classification
 The whole aim of any classification
procedure is to group into classes what is
similar/related, and separate what is
dissimilar/not related. In other words
classes have to be produced that are
internally homogeneous, and externally
heterogeneous
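One way to put a number on "internally homogeneous" is the sum of squared deviations of each value from its own class mean. The slides do not name a measure, so this choice and the function name `within_class_ss` are assumptions made for illustration. A minimal sketch:

```python
def within_class_ss(classes):
    """Sum of squared deviations from the class mean, added over all classes.

    `classes` is a list of lists of data values. A lower result means the
    classes are internally more homogeneous (assumed measure, not from the slides).
    """
    total = 0.0
    for members in classes:
        mean = sum(members) / len(members)
        total += sum((x - mean) ** 2 for x in members)
    return total


# Two ways of splitting the same six values into two classes.
good = [[1, 2, 3], [10, 11, 12]]   # similar values kept together
poor = [[1, 2], [3, 10, 11, 12]]   # the value 3 is grouped with dissimilar values
print(within_class_ss(good))  # 4.0
print(within_class_ss(poor))  # 50.5
```

The first grouping scores much lower, which matches the intuition that it keeps similar values together and separates dissimilar ones.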
Classification of quantitative data
 Rules for classification
 The classification should encompass the full
range of the data
 Classes may not overlap, and empty classes are
only allowed in exceptional cases
 The accuracy of the classification may not
exceed the accuracy of the original data
 Rounded class limits are more easily understood
and memorised
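The first two rules can be checked mechanically. The sketch below assumes integer data and classes given as inclusive (lower, upper) pairs in ascending order; the function name `check_classes` and the sample data are hypothetical.

```python
def check_classes(classes, data):
    """Return a list of rule violations for inclusive (lower, upper) class limits."""
    problems = []
    # Rule: the classification should encompass the full range of the data.
    if classes[0][0] > min(data) or classes[-1][1] < max(data):
        problems.append("classes do not cover the full data range")
    # Rule: classes may not overlap (each lower limit must exceed the previous upper limit).
    for (_, upper), (lower, _) in zip(classes, classes[1:]):
        if lower <= upper:
            problems.append(f"overlap at {lower}")
    # Rule: empty classes are only allowed in exceptional cases, so report them.
    for lower, upper in classes:
        if not any(lower <= x <= upper for x in data):
            problems.append(f"empty class {lower}-{upper}")
    return problems


data = [0, 4, 9, 12, 18, 25, 30, 36]
print(check_classes([(0, 9), (10, 18), (19, 27), (28, 36)], data))  # []
print(check_classes([(0, 10), (10, 20), (21, 30)], data))
# ['classes do not cover the full data range', 'overlap at 10']
```

An empty class is reported rather than rejected, since the rule allows it in exceptional cases and the decision stays with the map maker.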
Main classification decisions
 The main classification decisions refer to
the determination/ calculation of:
 The number of classes
 The size of the class intervals (depends on the
classification method)
 Determination of class boundaries
A systematic approach to quantitative
data classification
 Put the data in an array
 Order the raw data in ascending order, from low to high
 Produce a dispersal graph/scatter diagram
 Draw two orthogonal axes
 Along horizontal axis, the data values are plotted in ascending
order
 Data frequencies are plotted along the vertical axis
 It may reveal a number of clusters and gaps in the data set
 Produce a graphic array
 The vertical axis is scaled to accommodate all the values occurring in the
data set.
 The horizontal axis is scaled in such a way that all observations
(occurrences) in the data set can be plotted against it, at regular
distances from each other.
 The observations are arranged in an ascending order from the
smallest one to the largest one.
 Classification methods
 Classification methods can be grouped into three classes
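The first steps of the systematic approach can be sketched without any plotting library: order the data, count how often each value occurs (the heights in a dispersal graph), pair each observation with its rank (the points in a graphic array), and list the widest gaps as candidate breaks. All function names and the sample values below are hypothetical.

```python
from collections import Counter


def data_array(values):
    """Put the data in an array, ordered from low to high."""
    return sorted(values)


def dispersal_counts(values):
    """(value, frequency) pairs in ascending order, as plotted in a dispersal graph."""
    return sorted(Counter(values).items())


def graphic_array(values):
    """(rank, value) pairs: observations at regular distances along the horizontal axis."""
    return list(enumerate(sorted(values), start=1))


def largest_gaps(values, k):
    """The k widest gaps between neighbouring values, as (gap, low, high)."""
    arr = sorted(values)
    gaps = [(b - a, a, b) for a, b in zip(arr, arr[1:])]
    return sorted(gaps, reverse=True)[:k]


values = [13, 4, 27, 3, 12, 5, 28, 4, 14, 13]
print(data_array(values))        # [3, 4, 4, 5, 12, 13, 13, 14, 27, 28]
print(dispersal_counts(values))  # [(3, 1), (4, 2), (5, 1), (12, 1), (13, 2), (14, 1), (27, 1), (28, 1)]
print(largest_gaps(values, 2))   # [(13, 14, 27), (7, 5, 12)]
```

In this sample the two widest gaps separate three clusters, which is the kind of pattern the dispersal graph is meant to reveal.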
Classification methods
 Leading to irregular class intervals are methods based on:
 Natural breaks
 Quantiles
 Equal areas
 Nested means
 Optimal classification
 Leading to constant class intervals are methods based on:
 Equal intervals
 Standard deviation
 Leading to systematically changing class intervals are
methods based on:
 Arithmetic progression
 Geometric progression
 Reciprocal (harmonic) progression
Quantiles
 Refers to the division of a data set into
parts containing equal frequencies.
 Depending on the number of parts, more
specific names can be given like:
 Quartiles (four parts), quintiles (five parts) etc.
 So in a classification based on quantiles,
the classes contain an equal number of
observations
 The data have to be arrayed first
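A minimal sketch of a quantile classification, assuming that any remainder is spread over the lowest classes when the number of observations is not an exact multiple of the number of classes (the slides do not say how to handle that case). The function name `quantile_classes` is hypothetical.

```python
def quantile_classes(values, n_classes):
    """Split the data into n_classes groups with (nearly) equal frequencies."""
    arr = sorted(values)  # the data have to be arrayed first
    size, extra = divmod(len(arr), n_classes)
    classes, start = [], 0
    for i in range(n_classes):
        # The first `extra` classes receive one additional observation each.
        end = start + size + (1 if i < extra else 0)
        classes.append(arr[start:end])
        start = end
    return classes


# Quartiles of eight observations: two observations per class.
print(quantile_classes([7, 1, 12, 3, 9, 20, 5, 15], 4))
# [[1, 3], [5, 7], [9, 12], [15, 20]]
```

Note that this simple split can place identical values in two neighbouring classes; in practice the class limits would then need adjusting.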
Equal area
 Classes are chosen in such a way that the
area covered by each class on the map is
approximately the same
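One possible reading of the equal-area method is to order the mapping units by data value and close a class whenever the accumulated area reaches the next equal share of the total. This procedure, the function name `equal_area_classes` and the sample units are assumptions for illustration, not taken from the slides.

```python
def equal_area_classes(units, n_classes):
    """Group (value, area) units into classes covering roughly equal map area."""
    ordered = sorted(units)                      # ascending by data value
    total_area = sum(area for _, area in ordered)
    target = total_area / n_classes              # area share per class
    classes, current, cumulative = [], [], 0.0
    for value, area in ordered:
        current.append((value, area))
        cumulative += area
        # Close the class once the running area reaches the next multiple of
        # the target; the last class simply collects whatever remains.
        if cumulative >= target * (len(classes) + 1) and len(classes) < n_classes - 1:
            classes.append(current)
            current = []
    if current:
        classes.append(current)
    return classes


units = [(15, 40), (2, 10), (8, 20), (5, 30), (11, 20)]  # (value, area)
for members in equal_area_classes(units, 3):
    print(members, sum(area for _, area in members))
# [(2, 10), (5, 30)] 40
# [(8, 20), (11, 20)] 40
# [(15, 40)] 40
```

With real data the areas will only be approximately equal, because a mapping unit cannot be split between two classes.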
Nested means
 A mean divides a data set into two parts:
below and above the mean. Within each
part, new means can be calculated to
further subdivide the data set
 The method divides a data set into 2^n
classes (2, 4, 8, ...), where n is the number of
rounds of mean calculation
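The repeated splitting at the mean can be sketched recursively. The function name `nested_means_breaks` and the parameter `depth` (the number of rounds of mean calculation) are hypothetical; values equal to a mean are placed in the upper part, which is an arbitrary choice.

```python
def nested_means_breaks(values, depth):
    """Class breaks obtained by splitting at the mean, then at the means of each part.

    With `depth` rounds the data set is divided into up to 2**depth classes.
    """
    if depth == 0 or len(values) < 2:
        return []
    mean = sum(values) / len(values)
    below = [v for v in values if v < mean]
    above = [v for v in values if v >= mean]
    return (nested_means_breaks(below, depth - 1)
            + [mean]
            + nested_means_breaks(above, depth - 1))


values = [1, 2, 3, 4, 10, 12, 20, 28]
print(nested_means_breaks(values, 1))  # [10.0]
print(nested_means_breaks(values, 2))  # [2.5, 10.0, 17.5]
```

Two rounds give three breaks and therefore four classes; a third round would give up to eight.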
Equal class interval
 The range of the data set is calculated,
and then divided by the number of
classes required
I = (Xmax - Xmin) / n
 I = the class interval
 n = the number of classes
 Xmax = the maximum value of the data set
 Xmin = the minimum value of the data set

Example
 Consider a data set of integers, where
Xmin=0, Xmax=36 and n=4. The class
interval is:
(36-0)/4=9
class 1: 0 – 9
class 2: 10 – 18
class 3: 19 – 27
class 4: 28 – 36
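The worked example can be reproduced in a few lines. The sketch assumes integer data and class boundaries that fall on whole numbers, as in the example; the function names are hypothetical.

```python
def equal_interval_limits(x_min, x_max, n):
    """Class boundaries for n equal intervals: I = (Xmax - Xmin) / n."""
    interval = (x_max - x_min) / n
    return [x_min + i * interval for i in range(n + 1)]


def integer_classes(x_min, x_max, n):
    """Non-overlapping (lower, upper) classes for integer data.

    Assumes the boundaries are whole numbers; every class after the first
    starts one above the previous upper limit so that classes do not overlap.
    """
    bounds = equal_interval_limits(x_min, x_max, n)
    classes = []
    for i in range(n):
        lower = x_min if i == 0 else int(bounds[i]) + 1
        upper = int(bounds[i + 1])
        classes.append((lower, upper))
    return classes


print(equal_interval_limits(0, 36, 4))  # [0.0, 9.0, 18.0, 27.0, 36.0]
print(integer_classes(0, 36, 4))        # [(0, 9), (10, 18), (19, 27), (28, 36)]
```

Because the limits are inclusive integers, the first class spans ten values (0 to 9) and the others nine; this is a side effect of avoiding overlap at the boundaries.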
data classification cont.
 Decide on the number of classes
 In particular, the decision on the number of
classes should be based on the requirements
of the user
 In order to limit the variation within the
classes, there usually is some relationship
between the number of classes and the size
of the data set
 An indication of the number of classes can
be given by the following formula
n = log N / log 2
 Where:
 n= the number of classes
 N= the number of observations
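Reading the formula as n = log N / log 2 (an assumption, since only its terms survive in the slide text), the indication can be computed as follows. Rounding to the nearest whole number is also an assumption, and the function name `suggested_classes` is hypothetical.

```python
import math


def suggested_classes(n_observations):
    """Indicative number of classes, n = log N / log 2, rounded to a whole number."""
    # log N / log 2 is the base-2 logarithm of N.
    return round(math.log2(n_observations))


for n_obs in (16, 64, 100):
    print(n_obs, suggested_classes(n_obs))
# 16 4
# 64 6
# 100 7
```

The result is only a starting point; the number of classes is adjusted later if the class limits or the user requirements call for it.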
data classification cont.
 Calculate class limits
 Adjust class limits to natural breaks
 Adjust number of classes (if necessary)
Statistical surfaces
