Remote Sensing and Image Analysis
Remote sensing is a technology used for obtaining information about a target through the
analysis of data acquired from the target at a distance. It is composed of three parts: the
targets - objects or phenomena in an area; data acquisition - through certain
instruments; and data analysis - through certain processing devices. This definition is so broad that
the vision system of human eyes, sonar sounding of the sea floor, ultrasound and x-rays
used in medical sciences, laser probing of atmospheric particles, are all included. The
target can be as big as the earth, the moon and other planets, or as small as biological cells
that can only be seen through microscopes. A diagrammatic illustration of the remote
sensing process is shown in Figure 1.1.
Remote sensing data acquisition can be conducted on such platforms as aircraft, satellites,
balloons, rockets, space shuttles, etc. Inside or on-board these platforms, we use sensors to
collect data. Sensors include aerial photographic cameras and non-photographic
instruments, such as radiometers, electro-optical scanners, radar systems, etc. The platform
and sensors will be discussed in detail later.
Once image data are acquired, we need methods for interpreting and analyzing the images. By
knowing "what" information we expect to derive from remote sensing, we can examine the
methods that can be used to obtain the desired information. We are interested in "how"
various methods of remote sensing data analysis can be used.
In summary, we want to know how electromagnetic energy is recorded as remotely sensed
data, and how such data are transformed into valuable information about the earth surface.
Figure 1.1 The Flows of Energy and Information in Remote Sensing
The following is a brief list of the times when innovative developments in remote sensing
were documented. More details may be found in Lillesand and Kiefer (1987) and
Campbell (1987).
We begin to ask: what are the factors that make remotely sensed images taken for the
same target different? Remotely sensed data record the dynamics of the earth surface. The
three-dimensional earth surface changes over time. Two images taken at the same
place with the same imaging condition will not be the same if they are obtained at different
times. Among many other factors that will be introduced in later chapters, sensor and
platform design affect the quality of remotely sensed data.
Remote sensing data can be considered as models of the earth surface at a very low level of
generalization. Among various factors that affect the quality and information content of
remotely sensed data, two concepts are extremely important for us to understand. They
determine the level of details of the modeling process. These are the resolution and the
sampling frequency.
For example, assume that the level of solar energy coming from the sun and passing
through the atmosphere in the spectral region between 0.4 µm and 1.1 µm is distributed as in
Fig. 1.2. This is a continuous curve.
After the solar energy interacts with a target such as a forest on the earth, the energy is
partly absorbed, transmitted, or scattered and reflected. Assume that the level of the
scattered and reflected energy collected by a sensor behaves in a manner as illustrated in
Fig. 1.3.
Fig. 1.3 Reflected Solar Energy by Trees
The process that makes the shape of the energy curve change from Fig. 1.2 to Fig.
1.3 will be discussed later. Let us use Fig. 1.3 to discuss the concepts of spectral resolution
and spectral sampling.
In Figure 1.4, the three shaded bars A, B, and C represent three spectral bands. The width
of each bar covers a spectral range within which no signal variation can be resolved. The
width of each spectral band represents its spectral resolution. The resolution of A is coarser
than the resolution of B. This is because spectral details within band A that cannot be
discriminated may be partly discriminated with a spectral resolution as narrow as band B.
The resolution relationships among the three bands are:
Sampling determines the various ways we use to record a spectral curve. If data storage is
not an issue, we may choose to sample the entire spectral curve with many narrow spectral
bands. Sometimes, we choose to make a discrete sampling over a spectral curve (Figure
1.4). The questions are: which way of sampling is more appropriate and what resolution is
better? It is obvious that if we use a low resolution, we are going to blur the curve. The
finer the resolution is, the more precise can we restore a curve, provided that sufficient
spectral sampling frequency is used.
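To make the idea concrete, here is a minimal sketch (with an invented spectral curve, not the one in Fig. 1.3) of how the same curve looks when sampled with a few coarse bands versus many narrow bands; the band centres and widths are arbitrary assumptions:

```python
import numpy as np

# Hypothetical reflected-energy curve sampled every 0.01 um between 0.4 and 1.1 um.
wavelengths = np.arange(0.40, 1.10, 0.01)
energy = np.exp(-((wavelengths - 0.55) ** 2) / 0.002) + 0.5 * np.exp(-((wavelengths - 0.85) ** 2) / 0.01)

def sample_band(wl, signal, centre, width):
    """Average the signal over one spectral band (centre +/- width/2).

    The band width plays the role of spectral resolution: variations narrower
    than the band cannot be resolved in the sampled value.
    """
    inside = np.abs(wl - centre) <= width / 2.0
    return signal[inside].mean()

# Coarse sampling: three wide bands (like A, B and C in Figure 1.4).
coarse = [sample_band(wavelengths, energy, c, 0.10) for c in (0.45, 0.65, 0.85)]

# Fine sampling: many narrow, contiguous bands (like an imaging spectrometer).
fine_centres = np.arange(0.45, 1.04, 0.02)
fine = [sample_band(wavelengths, energy, c, 0.02) for c in fine_centres]

print("coarse bands:", np.round(coarse, 3))
print(len(fine), "narrow bands:", np.round(fine, 3))
```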
The difference between imaging spectrometers and earlier generation sensors is in the
difference of the spectral sampling frequency. Sensors of earlier generations use selective
spectral sampling. Imaging spectrometers have a complete systematic sampling scheme
over the entire spectral range. An imaging spectrometer, such as CASI, has 288 spectral
bands in the 0.43 - 0.92 µm spectral region, while earlier generation sensors only have 3 - 7
spectral bands.
Similar to the spectral case, the surface has to be sampled with certain spatial resolution.
The difference is that spatial sampling is mostly systematic, i.e., a complete sampling over
an area of interest. The difference in spatial resolution can be seen in Figure 1.5.
Figure 1.5. Sampling the same target with different spatial resolutions.
A scene including a house with garage and driveway is imaged with two different spatial
resolutions. For each cell in Figure 1.5a no object occupies an entire cell. Each cell will
contain energy from different cover types. Such cells are called mixed pixels, also known
as mixels. In Chapter 7, we will introduce some methods that can be used to decompose
mixed pixels. Mixed pixels are very difficult to discriminate from each other. Obviously a
house cannot be easily recognized at the level of resolution in Figure 1.5a, but it may be
possible in Figure 1.5b. As spatial resolution becomes finer, more details about objects in a
scene become available. In general it is true that with finer spatial resolutions objects can
be better discriminated with human eyes. With computers, however, it may be harder to
recognize objects imaged with finer spatial resolutions. This is because finer spatial
resolutions increase the image size for a computer to handle. More importantly, for many
computer analysis algorithms, they cause the effect of "seeing the tree but not the forest."
Computer techniques are far poorer than the human brain at generalizing from fine details.
Temporal sampling can be regarded as similar to spectral sampling. For example, temporal
sampling refers to how frequently we image an area of interest. Are we going to use
contiguous systematic sampling, as in movie making, or selective sampling, as in most still
photography? To decide the temporal sampling scheme, the dynamic
characteristics of the target under study have to be considered. For instance, if the study
subject is to discriminate crop species, the phenological calendar of each crop type should
be considered for when to collect remotely sensed data in order to best characterize each
different crop species. The data could be selected from the entire growing season between
late April and early October for mid and high latitudes in the northern hemisphere. If the subject
is flood monitoring, the temporal sampling frequency should be high during the flood
period because floods usually last only a few hours to a few days.
The remaining chapters of this book are organized to help you take greater advantage
of remote sensing in the applications mentioned above. In Chapter two, we will first
introduce the very basic physics required to understand the imaging mechanism in remote
sensing. In Chapter three, we introduce the development of sensing systems following a
historical order. In Chapter four, we introduce imaging geometry and illustrate geometrical
calibration methods that are required to achieve precise measurement of spatial dimensions
of objects. In Chapter five, we explain various methods for recovering image radiometry
affected by sensor malfunctioning, atmospheric interference and terrain relief. In Chapter
six, we illustrate some of the most commonly used image processing methods for image
enhancement. In Chapter seven, we focus on the introduction of various strategies for
information extraction from remotely sensed data. In Chapter eight, following a brief
introduction on map making, we introduce some methods that are used to combine maps
and other spatial data with remotely sensed data for analysis and extraction of information
on various targets.
Chapter 1
References
Gonzalez, R.C., P. Wintz, 1987. Digital Image Processing. 2nd Ed., Addison-Wesley, Reading:MA.
Lillesand, T.M. and Kiefer, R.W., 1987, Remote Sensing and Image Interpretation, Sec. Ed., John Wiley and Sons,
Inc.: Toronto.
Staenz, K., 1992. A decade of imaging spectrometry in Canada. Canadian Journal of Remote Sensing. 18(4):187-
197.
Lists most of the imaging spectrometers developed worldwide. Sensor calibration and various applications.
Pratt, W., 1991. Digital Image Processing. John Wiley and Sons, Inc.: Toronto.
Further Readings
Asrar G., ed. 1989, Theory and Applications of Optical Remote Sensing, John Wiley and Sons, Toronto.
A selection of most important fields of optical remote sensing ranging from the physical basis of energy-meter
interaction, vegetation canopy modelling, atmospheric effects reduction, applications to forest, agriculture,
coastal wetland, geology, snow and ice, climatology and meteorology, and ecosystem. Its emphasis is on the
application of remote sensing to understanding land-surface processes globally.
Jensen, J.R., 1986, Digital Image Processing, an Introductory Perspective. Prentice-Hall: Englewood Cliffs, N.J.
A good introductory book on digital image analysis concepts and procedures. A "show-how" type of book. Easy
for beginners. Typical topics covered include image statistics, image enhancement in the spatial domain, geometric
correction, classification, and change detection. Entirely set in a remote sensing context.
A good introduction book. More mathematical than Jensen's book. Some additional materials in comparison to
Jensen's book include an entire chapter on Fourier Transform. Relationships among some basic image
enhancement and image classification algorithms.
2.1 Electromagnetic Energy
Electromagnetic energy can be regarded as a stream of particles with different frequencies, all
travelling at the same velocity. These particles have a dual nature: they are particles, but
they travel in wave form.
The wave equation c = λν (velocity = wavelength × frequency) shows that a shorter wavelength corresponds to a higher frequency.
Each wave represents a group of particles with the same frequency. All together they
have different frequencies and magnitudes.
With each wave, there is an electric (E) component and a magnetic (M) component.
The Amplitude (A) reflects the level of the electromagnetic energy. It may also be
considered as intensity or spectral irradiance. If we plot A against the wavelength λ we
then get an electromagnetic curve, or spectrum (Figure 2.1).
Any matter with a temperature greater than 0 K emits electromagnetic energy.
Therefore, it has a spectrum. Furthermore, different chemical elements have different
spectra. They absorb and reflect spectral energy differently. Different elements are
combined to form compounds. Each compound has a unique spectrum due to its unique
molecular structure. This is the basis for the application of spectroscopy to identify
chemical materials. It is also the basis for remote sensing in discriminating one matter from
the other. The spectrum of a material is like a human fingerprint.
The wavelength of electromagnetic energy has such a wide range that no instrument can
measure it completely. Different devices, however, can measure most of the major spectral
regions.
The division of the spectral wavelength is based on the devices which can be used to
observe particular types of energy, such as thermal, shortwave infrared and microwave
energy. In reality, there are no abrupt changes in the magnitude of the spectral energy.
The spectrum is conventionally divided into various parts, as shown below:
The optical region covers 0.3 - 15 µm, where energy can be collected through lenses. The
reflective region, 0.4 - 3.0 µm, is a subdivision of the optical region. In this spectral
region, we collect solar energy reflected by the earth surface. Another subdivision of the
optical spectral region is the thermal spectral range, between 3 µm and 15 µm,
where energy comes primarily from surface emittance. Table 2.1 lists major uses of some
spectral wavelength regions.
In the reflective spectral region, we are mainly concerned with the reflective properties of
an object. In the thermal spectral region, however, we have to rely on the emittance of an object.
This is because in the thermal region most matter at ordinary environmental temperatures
emits energy that can be measured. Therefore, we introduce some basics of
radiation theory.
The first theory treats electromagnetic radiation as many discrete particles called photons
or quanta. The energy of a quantum is given by

E = hν

where h is Planck's constant and ν is the frequency. Since c = λν, i.e., ν = c/λ, we thus have

E = hc/λ
Energy (or radiation) of a quantum is inversely proportional to the wavelength. The longer
the wavelength of a quantum, the smaller is its energy. (The shorter the wavelength, the
stronger is its energy.) Thus, radiation at very short wavelengths (UV and shorter) is
dangerous to human health. If we want to sense emittance from objects at longer
wavelengths, we will have to either use very sensitive devices or use a less sensitive device to
view a larger area to collect a sufficient amount of energy.
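As a small numerical illustration of this inverse relationship (the wavelengths below are arbitrary example values):

```python
# Photon energy E = h*c/lambda: shorter wavelengths carry more energy per quantum.
h = 6.626e-34  # Planck's constant, J s
c = 2.998e8    # speed of light, m/s

for name, wavelength_um in [("ultraviolet", 0.3), ("red", 0.65), ("thermal IR", 10.0)]:
    wavelength_m = wavelength_um * 1e-6
    energy_j = h * c / wavelength_m
    print(f"{name:12s} {wavelength_um:5.2f} um -> {energy_j:.2e} J per photon")
```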
This has implications for remote sensing sensor design. To use the available sensing
technology at hand, we have to balance wavelength against spatial resolution. If
we wish our sensor to have a higher spatial resolution, we may have to use shorter
wavelength regions.
The second theory is the Stefan-Boltzmann law, which states that any material with a temperature
greater than 0 K will emit energy, and that the total energy emitted from a surface is proportional
to T⁴ (M = σT⁴).
This law is expressed for an energy source that behaves as a blackbody - a hypothetical,
ideal radiator that absorbs and re-emits all energy incident upon it. Actual matter is not a
perfect blackbody. For any matter, we can measure its emitted energy (M) and compare it
with the energy emitted from a blackbody at the same temperature (Mb) by the ratio
ε = M/Mb. ε is the emissivity of the matter. A perfect reflector emits nothing;
therefore, its ε is 0. A true blackbody has an ε of 1. Most other matter falls in
between these two extremes.
The third theory is Wien's displacement law which specifies the relationship between the
peak wavelength of emittance and the temperature of a matter.
λmax = 2897.8 / T   (λmax in µm, T in kelvin)
As the temperature of a blackbody gets higher, the wavelength at which the blackbody
emits its maximum energy becomes shorter.
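A brief sketch applying Wien's displacement law, together with the T⁴ law mentioned above, to two round-number temperatures assumed for the Sun and the Earth:

```python
# Wien's displacement law: lambda_max = 2897.8 / T (um, with T in kelvin).
# Stefan-Boltzmann law: total emittance M = sigma * T**4 (W m-2).
SIGMA = 5.67e-8  # Stefan-Boltzmann constant, W m-2 K-4

for name, temp_k in [("Sun (approx.)", 6000.0), ("Earth surface (approx.)", 300.0)]:
    lam_max = 2897.8 / temp_k          # wavelength of peak emittance, um
    total_m = SIGMA * temp_k ** 4      # total emitted energy, W m-2
    print(f"{name:25s} peak at {lam_max:5.2f} um, M = {total_m:.3e} W/m^2")
```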
Figure 2.2 shows blackbody radiation curves for the temperatures of the Sun, an
incandescent lamp and the Earth. During the daytime, we can see that the energy from the sun is
overwhelming. During the night, however, we can use the spectral region between 3 µm
and 16 µm to observe the emittance properties of the earth surface.
At wavelengths longer than the thermal infrared region, i.e. at the microwave region, the
energy (radiation) level is very low. Therefore, we often use a human-made energy source to
illuminate the target (as in radar) and collect the backscatter from the target. A
remote sensing system relying on a human-made energy source is called an "active" remote
sensing system. Remote sensing relying on energy sources that are not human-made is
called "passive" remote sensing.
The atmosphere has different effects on EM energy transfer at different wavelengths. In this
section, we will mainly introduce the fact that the atmosphere can have a profound effect
on the intensity and spectral composition of the radiation that reaches a remote sensing system.
These effects are caused primarily by atmospheric scattering and absorption.
Different particle sizes will have different effects on the EM energy propagation.
When the particle diameter is comparable to the wavelength (dp ≈ λ), Mie scattering occurs.
The atmosphere can be divided into a number of well marked horizontal layers on the basis
of temperature.
Troposphere:
It is the zone where weather phenomena and atmospheric turbulence are most marked. It
contains 75% of the total molecular and gaseous mass of the atmosphere and virtually all
the water vapour and aerosols.
Stratosphere: to ~50 km (contains the ozone layer)
Mesosphere: to ~80 km
Thermosphere: to ~250 km
Exosphere: ~500 km to 750 km
Figure 2.3 Horizontal layers that divide the atmosphere (Barry and Chorley, 1982)
Scattering causes degradation of image quality for earth observation. At higher altitudes,
images acquired in shorter wavelengths (ultraviolet, blue) contain a large amount of
scattered noise which reduces the contrast of an image.
Absorption: the atmosphere selectively absorbs energy at different wavelengths with
different intensities.
The atmosphere is composed of N2 (78%), O2 (21%), CO2, H2O, CO, SO2, etc. Since each
chemical constituent has different spectral properties, each absorbs energy in different
spectral regions with different intensities. As a result, the atmosphere has the combined absorption features of the various
atmospheric gases. Figure 2.4 shows the major absorption wavelengths of CO2, H2O, O2 and
O3 in the atmosphere.
Figure 2.4 Major absorption wavelengths by CO2, H2O, O2, O3 in the atmosphere
(Source: Lillesand and Kiefer, 1994)
Transmission: The remaining amount of energy after being absorbed and scattered by the
atmosphere is transmitted.
Therefore, the absorption of EM energy by H2O and CO2 is the most difficult part to
characterize.
Atmospheric absorption reduces the number of spectral regions that we can work with in
observing the Earth. It affects our decisions in selecting and designing sensors. We have to
consider
3) the source, magnitude, and spectral composition of the energy available in these ranges.
For the third point, we have to base our decision of choosing sensors and spectral regions
on the manner in which the energy interacts with the target under investigation.
On the other hand, although certain spectral regions may not be as transparent as others,
they may be important spectral ranges in the remote sensing of the atmosphere.
As has been mentioned earlier, solar energy has to pass through the atmosphere in order
to reach the Earth's surface. Transmitted energy can also be measured from under water.
In the thermal spectral region, energy is primarily absorbed, and the reflected energy is
significantly smaller in magnitude than the emission of a target. Since what is absorbed will be
emitted, the absorptance "a", or equivalently the emissivity ε, is the parameter of concern in the thermal
region.
In the reflective region, the reflectance r is the easiest quantity to measure using remote sensing devices. Therefore, it is the most important parameter for remote sensing observation using the 0.3 - 2.5 µm region. r is called the spectral reflectance, or simply reflectance, or the spectral signature.
Our second question is: how is energy reflected by a target? It can be classified into three
cases, specular reflector, irregular reflector, and perfect diffusor.
Specular reflection is caused by the surface geometry of a matter. It is of little use in remote
sensing because the incoming energy is completely reflected in another direction. Still
water, ice and many minerals with crystal surfaces have this property.
A perfect diffuse reflector is a matter which reflects energy uniformly in all directions.
This type of reflector is desirable because it is possible to observe the matter from any
direction and obtain the same reflectance.
Unfortunately most targets have a behaviour between the ideal specular reflector and
diffuse reflector. This makes quantitative remote sensing and target identification purely
from reflectance data difficult. Otherwise, it would be easy to discriminate objects using
spectral reflectances from a spectral library. Due to the variability of spectral signatures, one
of the current research directions is to investigate the bidirectional reflectance properties of various
targets.
Plotting reflectance against wavelength, we will get a spectral reflectance curve. Examples
of spectral curves of typical materials such as vegetation, soil and water are shown in
Figure 2.5. Clear water has a low spectral reflectance (< 10%) in the visible region. At
wavelengths longer than 0.75 µm, water absorbs almost all the incoming energy.
Vegetation generally has three reflectance valleys. The one in the red spectral wavelength
region (0.65 µm) is caused by the high absorptance of energy by chlorophyll a and b in the
leaves. The other two, at 1.45-1.55 µm and 1.90-1.95 µm, are caused by high absorptance of
energy by water in the leaves. Dry soil has a relatively flat reflectance curve. When it is
wet, its spectral reflectance drops due to water absorption.
Figure 2.5 Typical Spectral Reflectance Curves for Soil, Vegetation and Water
Questions
1. Using the scattering properties of the atmosphere, explain why the sky is blue under clear-sky
conditions. Why does the sun look red at sunset or sunrise?
2. Why are X-rays used for medical examination? Using radiation law No. 3, explain why, as
a piece of iron is heated, its colour begins as dark red, then changes to red, to
yellow, and to white.
3. Describe how one may use absorptance and transmittance of a matter in remote sensing.
4. Using Figures 2.4 and 2.5 as references, answer the following questions:
7. Which spectral regions should be used to observe water content in the atmosphere?
What about water content in vegetation?
Chapter 2
References
Barry, R.G. and Chorley, R.J., 1982. Atmosphere, Weather and Climate, Longman: London.
Elachi, C., 1987. Introduction to the Physics and Techniques of Remote Sensing, John Wiley and Sons, Inc.: Toronto
Lillesand, T.M. and Kiefer, R.W., 1994, Remote Sensing and Image Interpretation, 3rd Ed., John Wiley and Sons,
Inc.: Toronto.
3.1 Camera Systems
A camera system is composed of the camera body, lens, diaphragm, shutter and a film
(Figure 3.1):
The lens focuses light according to the thin-lens equation 1/do + 1/di = 1/f, where
f = focal length
do = distance from the lens to the object
di = distance from the lens to the image (Figure 3.2)
Since in aerial photography do >> di, the image plane lies approximately at the focal length (di ≈ f).
The diameter of the diaphragm controls the depth of field. The smaller the diameter of the
opened diaphragm, the wider the range of distances over which the scene remains in sharp
focus. The diaphragm diameter can be adjusted to a particular aperture. What we
normally see on a camera's aperture setting is F 2.8, 4, 5.6, 8, 11, 16, 22.
These f-numbers are obtained by dividing the focal length by the aperture diameter (F# = f/d). When the diameter becomes smaller, F# becomes larger
and more energy is blocked. The actual amount of energy reaching the film is proportional
to d²t/f², where d is the aperture diameter, t is the exposure (shutter) time, and f is the focal
length; changing the aperture by one full stop therefore halves or doubles the exposure.
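A small illustration of these relationships (the focal length, aperture and shutter values are assumed, and the exposure figure is only a relative quantity):

```python
def f_number(focal_length_mm, aperture_diameter_mm):
    """F# = focal length / aperture diameter."""
    return focal_length_mm / aperture_diameter_mm

def relative_exposure(aperture_diameter_mm, shutter_time_s, focal_length_mm):
    """Film exposure is proportional to d**2 * t / f**2 (a relative value only)."""
    return aperture_diameter_mm ** 2 * shutter_time_s / focal_length_mm ** 2

# A 50 mm lens stopped from F2.8 to F4: the diameter shrinks, F# grows,
# and the exposure roughly halves for the same shutter time.
for diameter in (50 / 2.8, 50 / 4.0):
    print(round(f_number(50, diameter), 1), relative_exposure(diameter, 1 / 250, 50))
```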
Films
A film is primarily composed of an emulsion layer(s) and base (Figure 3.3)
Figure 3.3. Layers in black and white films and colour films
The most important part of the film is the emulsion layer. An emulsion layer contains light-
sensitive chemicals. When it is exposed to light, a chemical reaction occurs and a latent
image is formed. After the film is developed, the emulsion layer shows the image.
Films can be divided into negative and positive, or divided in terms of their ranges of
spectral sensitivities: black and white (B/W), B/W Infrared, Color, Color Infrared.
B/W negative films are those on which the brightest parts of the scene appear
darkest and the darker parts of the scene appear brighter on the developed film.
Color negative films are those on which a color from the scene is recorded as its
complementary color.
There are two important aspects of a film: its spectral sensitivity and its characteristic
curve.
Spectral sensitivity specifies the spectral region to which a film is sensitive (Figure 3.4).
Since infrared film is also sensitive to visible light, the visible light should be blocked by
some material. This is done by optical filtering (Figure 3.4). In this case, a dark red filter
can be used to intercept visible light.
Similarly, other filters can be used to stop light at certain spectral ranges from reaching the
film.
The characteristic curve indicates the response of a film's density to the exposure (energy) level.
If the density of a film develops quickly when the film is exposed to light, we say that the
film is fast (Fig. 3.5a); otherwise, the film is slow (Fig. 3.5b). Film speed is defined by
labels such as ASA 100, ASA 200, ..., ASA 1000. The greater the ASA number, the faster
a film is. High-speed films give good contrast on the image, but low-speed films
provide better detail.
Color Films
There are two types of colors: additive primaries and subtractive primaries:
The development procedure for the colour negative film is shown in Figure 3.9.
Color Infrared Films
The sensitivity curve of a colour infrared film is shown in Figure 3.10. Figure 3.11 shows
the structure of this type of film.
Figure 3.11 (schematic): the dye layers of a colour infrared film are yellow, magenta and cyan. In the final result, blue from the scene is rendered black, green appears as blue, red as green, and infrared as red.
Flight Height
For a given focal length of an aerial camera, the higher the camera is, the larger the area
each aerial photo can cover. Obviously, the scale of aerial photographs taken at higher
altitudes will be smaller than those taken at lower altitudes.
Camera Orientation
Two types of camera orientations may be used: vertical and oblique (slant) (Figure 3.13).
Oblique photography allows one to image a larger area, while vertical photography gives less distortion
in photo scale.
Figure 3.13. Vertical and slant aerial photography
View Angle:
View angle is normally determined by the focal length and the frame size of a film. For a
camera, the frame is fixed, therefore the ground coverage is determined by the altitude and
the camera viewing angle (Figure 3.14)
f1 > f2 > f3
a1 < a2 < a3
Photographic Resolution
Spatial resolution of aerial photographs is largely dependent on the following factors:
• Lens resolution
• Optical quality
• Film resolution
• Film flatness - normally not a problem
• Atmospheric conditions - change all the time
• Aircraft vibration and motion - random
If the scale of an aerial photograph is known, we can convert the photographic resolution
(rs) to ground resolution.
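A minimal sketch of this conversion, assuming the photographic resolution is given in line pairs per mm and the scale as 1:S:

```python
def ground_resolution_m(film_resolution_lp_mm, scale_denominator):
    """Convert photographic resolution to ground resolution.

    film_resolution_lp_mm : resolving power of the lens/film system, line pairs per mm
    scale_denominator     : S in a photo scale of 1:S

    The smallest element resolved on the film is 1/r mm; multiplying by the
    scale denominator gives its size on the ground (converted here to metres).
    """
    element_on_film_mm = 1.0 / film_resolution_lp_mm
    return element_on_film_mm * scale_denominator / 1000.0

# Example with assumed values: 40 lp/mm system resolution, 1:20,000 photo scale.
print(ground_resolution_m(40, 20000), "m")  # -> 0.5 m on the ground
```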
Figure 3.15. Resolving power test chart (from Lillesand and Kiefer, 1994).
Ground Coverage
A photograph may have a small coverage if it is taken either at a low flight height or with a
narrower viewing angle.
The advantages of photographs with small coverage are that they provide more detail,
and less distortion and displacement. It is easier to analyze a photograph with small
coverage because similar targets will have less distortion from the centre to the edge of the
photograph, and from one photograph to another.
The disadvantages of photographs with small coverage are that more flight time is needed to
cover an area, and thus the cost is higher. Moreover, mosaicking may introduce more
distortion.
A large coverage can be obtained by taking the photograph from a higher altitude or using
a wider angle. Photographs with a large coverage are likely to have poorer
photographic resolution due to the larger viewing angle and likely stronger atmospheric effects.
The advantages are that a large area is imaged simultaneously, less
geometric mosaicking is required, and the cost is lower.
The disadvantages are that it is difficult to analyze targets in detail and that targets are more
severely distorted.
Essentially, the size of photo coverage is related to the scale of the raw aerial photographs.
Choosing photographs with a large coverage or a small one should be based on the
following:
ï budget at hand
ï task
ï equipment available
Advantages:
Disadvantages:
What are the differences between a camera system and a scanning system? The following
are some of the major differences:
The first of the Landsat series was launched in 1972. The satellite was called the Earth
Resources Technology Satellite (ERTS-1); it was later renamed Landsat-1. On board
Landsat-1 were two sensing systems: the multispectral scanning system (MSS) and the return
beam vidicon (RBV). The RBV was discontinued after Landsat-3. The MSS is briefly introduced
here because it is still being used. The MSS sensor has 6 detectors per band (Figure 3.17).
The scanned radiance is measured in four image bands (Figure 3.18).
Figure 3.17. Each scan will collect six image lines.
Figure 3.18. Four image bands with six detectors in each band.
MSSs have been used on Landsat - 1, 2, 3, 4, 5. They are reliable systems. The spectral
region of each band is listed below:
Landsat 1, 2          Landsat 4, 5
B4   0.5 - 0.6 µm     B1
B5   0.6 - 0.7 µm     B2
B6   0.7 - 0.8 µm     B3
B7   0.8 - 1.1 µm     B4
Landsat 3 had a short life. The MSS systems on Landsat 3 were modified as compared to
Landsat 1 and 2. Landsat-6 was launched unsuccessfully in 1993.
Each scene of MSS image covers 185 km X 185 km in area. It has a spatial resolution of 79
m × 57 m. An advantage of MSS data is that they are less expensive. Sometimes one detector is left
blank or its signal is much different from the others', creating banding or striping. We will
discuss methods for correcting these problems in Chapter 5.
Since the launch of Landsat 4 in 1982, a new type of scanner, called Thematic Mapper
(TM), has been introduced. The TM:
• Increased the number of spectral bands.
• Improved spatial and spectral resolution.
• Increased the angle of view from 11.56° to 14.92°.
MSS data are collected in only one scanning direction. TM data are collected in both
scanning directions (Figure 3.19).
Figure 3.19. Major changes of the TM system as compared to the MSS system.
High Resolution Visible (HRV) Sensors
A French satellite called 'Le Système Pour l'observation de la Terre' (SPOT) (Earth
Observation System) was launched in 1986. On board this satellite, a different type of
sensors called High Resolution Visible (HRV) were used. The HRV sensors have two
modes: the panchromatic (PAN) mode and the multispectral (XS) mode.
The HRV panchromatic sensor has a relatively wide spectral range, 0.51 - 0.73 µm, with a
higher spatial resolution of 10 × 10 m². The multispectral (XS) mode has three bands:
B1   0.50 - 0.59 µm
B2   0.61 - 0.68 µm
B3   0.79 - 0.89 µm
Besides the difference of spectral and spatial resolution design from the Landsat sensor
systems, major differences between MSS/TM and HRV are the use of linear array (also
called pushbroom) detectors and the off-nadir observation capabilities with the HRV
sensors (Figure 3.20). Instead of mirror rotation in the MSS or the TM sensors which
collect data using only a few detectors, the SPOT HRV sensors use thousands of detectors
arranged in arrays called "charge-coupled devices" (CCDs). This has significantly reduced
the weight of the sensing system and power requirement.
Figure 3.20. The SPOT HRV systems
A mirror with the view angle of 4.13° is used to allow ±27° off nadir observation. An
advantage of the off-nadir viewing capability is that it allows more frequent observations
of certain targeted area on the earth and acquisitions of stereo-pair images. A disadvantage
of the HRV sensors is the difficulties involved in calibrating thousands of detectors. The
radiometric resolution of MSS is 6 to 7 bits, while both TM and HRVs have an 8 bit
radiometric resolution.
The orbital repeat cycle is 18 days for Landsats 1 - 3, 16 days for Landsats 4 and 5, and 26 days for SPOT-
1 (the SPOT HRV sensors can revisit the same target in 3 to 5 days owing to their off-nadir
viewing capability).
Among the many meteorological satellites, the Advanced Very High Resolution Radiometers
(AVHRR) on board the NOAA series (NOAA-6 through 12) have been widely used. The
NOAA series is named after the National Oceanic and Atmospheric Administration of
the United States.
B1   0.58 - 0.68 µm
B2   0.72 - 1.10 µm
B3   3.55 - 3.95 µm
B4   10.30 - 11.30 µm
B5   11.50 - 12.50 µm
Each area is imaged twice daily. This is an important feature for frequent
monitoring. NOAA AVHRR data have been used for vegetation and sea-ice studies
at continental and global scales.
To document and understand global change, NASA initiated Mission to Planet Earth. This
is a program involving international efforts to measure the Earth from space and ground.
The Earth Observing System (EOS) is a primary component of the Mission to Planet Earth. EOS
includes the launch of a series of satellites with advanced sensor systems by the end of this
century. Those sensors will be used to measure most of the measurable aspects of the land,
ocean and atmosphere, such as cloud, snow, ice, temperature, land productivity, ocean
productivity, ocean circulation, atmospheric chemistry, etc.
Among the various sensors on board the first six satellites to be launched, there is a sensor
called the Moderate Resolution Imaging Spectrometer (MODIS). It has 36 narrow spectral
bands between roughly 0.4 and 14.4 µm. The spatial resolution changes with the spectral band:
two bands have 250 m resolution, five have 500 m, and the rest have 1000 m resolution. The sensor is
planned to provide data covering the entire Earth daily.
Two private companies, Lockheed, Inc. and Worldview, Inc., are planning to launch their
own commercial satellites in 2-3 years' time, with spatial resolutions ranging from 1 m to 3
m. In Japan, NASDA (the National Space Development Agency) has developed the Marine
Observation System (MOS). On board this system there is a sensor called the Multispectral
Electronic Self-scanning Radiometer (MESSR), with spectral bands similar to the Landsat
MSS systems. However, the spatial resolution of the MESSR system is 50 × 50 m².
Other countries such as India and the former USSR have also launched Earth resources
satellites with different optical sensors.
Multispectral scanners
The mechanism of airborne multispectral sensors is similar to the Landsat MSS and TM.
The airborne sensor systems usually have more spectral bands ranging from ultraviolet to
visible through near infrared to thermal areas. For example, the Daedalus MSS system is a
widely used system that has 11 channels, with the first 10 channels ranging from 0.38 to
1.06 µm and the 11th being a thermal channel (9.75 - 12.25 µm).
Another airborne multispectral scanner being used for experimental purposes is the TIMS -
Thermal Infrared Multispectral Scanner. It has 6 channels: 8.2 - 8.6, 8.6 - 9.0, 9.0 - 9.4, 9.4 - 10.2,
10.2 - 11.2, and 11.2 - 12.2 µm.
MEIS-II
The Canada Centre for Remote Sensing developed the Multispectral Electro-optical Imaging
Scanner (MEIS-II). It uses 1728-element linear CCD arrays that acquire data in eight
spectral bands ranging from 0.39 to 1.1 µm. The spatial resolution of MEIS-II can reach
0.3 m.
Each time, only three colours (red, green and blue) can be used to display data on a colour
monitor. The colours used to display an image may not be the actual colour of the spectral
band that is used to acquire the image. Images displayed with such colour combinations are
called false colour composites. We can make many 3-band combinations out of a
multispectral image.
The number of such combinations is Nc = nb! / (3!(nb - 3)!), where Nc is the total number of
3-band combinations and nb is the number of spectral bands in the multispectral image. For
each of these 3-band combinations, we can use red, green, and blue to represent each band
and obtain a false-colour image.
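As a quick check of the combination count, here is a short sketch assuming a 7-band image; the band numbering is arbitrary:

```python
from itertools import combinations
from math import comb

nb = 7  # number of spectral bands (e.g., a TM-like sensor)

# Number of distinct 3-band subsets available for an RGB false-colour composite.
print(comb(nb, 3))  # 35

# Enumerate the subsets themselves; each could be displayed as one composite.
subsets = list(combinations(range(1, nb + 1), 3))
print(subsets[:3])   # [(1, 2, 3), (1, 2, 4), (1, 2, 5)]
print(len(subsets))  # 35, matching the formula above
```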
Videographic imaging includes the use of video cameras and digital CCD cameras. Video
images can be frame-grabbed, or quantized and stored as digital images; however, the
image resolution is relatively low (up to 550 lines/image). Digital CCD cameras use two-
dimensional silicon-based charge coupled devices that produce a digital image in standard
raster format. CCD detectors arranged in imaging chips of approximately 1024 X 1024 or
more photosites produce an 8-bit image (King, 1992).
Imaging Spectrometry
Imaging spectrometry refers to the acquisition of images in many, very narrow, continuous
spectral bands.
• The first imaging spectrometer was developed in 1983 by JPL. The system, called the
Airborne Imaging Spectrometer (AIS), collects data in 128 channels from 1.2 µm to
2.4 µm. Each image acquired has only 32 pixels in a line.
• The Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) represents an
immediate follow-up of the AIS (1987). It collects 224 bands from 0.40 - 2.45 µm
with 512 pixels in each line.
• In Canada, the first system was the FLI - Fluorescence Line Imager - manufactured
by Moniteq, a company that used to be located in Toronto, Ontario.
In Calgary, ITRES Research produces another imaging spectrometer called the
Compact Airborne Spectrographic Imager (CASI) (Figure 3.21).
For each line of ground targets, there will be nb × ns data values collected at 2-byte (16-bit)
radiometric resolution, where nb is the number of spectral bands and ns is the number of
pixels in a line.
In spectral mode, all 288 spectral bands are used, but only up to 39 spatial pixels (look
directions) can be transferred.
In the spatial mode, all 512 spatial pixels are used, but only up to 16 spectral bands can be
selected.
Where to obtain remote sensing data?
SLAR systems produce continuous strips of imagery depicting very large ground areas
located adjacent to the aircraft flight line. Since clouds are transparent to microwave
radiation, SLAR has been used to map tropical areas such as the Amazon River Basin.
Started in 1971 and ended in 1976, the project RADAM (Radar of the Amazon) was the
largest radar mapping project ever undertaken. In this project, the Amazon area was
mapped for the first time. In such remote and cloud-covered areas of the world, radar
systems are a prime source of information for mineral exploration, forest and range
inventory, water supply and transportation management, and site suitability assessment.
Radar imagery is currently neither as available nor as well understood as other image
products. An increasing amount of research is being conducted on interaction mechanism
between energy and surface targets, such as forest canopy, and on the combination of radar
image with other image products.
The time it takes for a transmitted signal to travel through the air, reach the target, and be
scattered back to the antenna can be measured. We can then determine the distance, or
'slant range', between the antenna and the target as SR = ct/2,
where c is the speed of light and t is the measured two-way travel time.
From Figure 3.23, it can be seen that SLAR depends on the time it takes for a transmitted
pulse to be scattered back to the antenna to determine the position of a target.
In the across-track direction, the spatial resolution is determined by the
duration of a pulse and the depression angle (Figure 3.24). This resolution is called the ground
range resolution (rg).
It is obvious that in order to minimize rg, one needs to reduce the pulse duration τ. For the azimuth
resolution ra, the limiting factor is the beam width β, which is a function of the wavelength λ and the
antenna length α (β ≈ λ/α).
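The following sketch uses the standard textbook forms of these two resolutions, rg = cτ/(2 cos γ) and ra ≈ SR·λ/α; the pulse length, depression angle, wavelength, antenna length and slant range are assumed example values, not the parameters of any particular system:

```python
import math

C = 3.0e8  # speed of light, m/s

def ground_range_resolution(pulse_length_s, depression_angle_deg):
    """r_g = c * tau / (2 * cos(depression angle)) -- across-track resolution."""
    return C * pulse_length_s / (2.0 * math.cos(math.radians(depression_angle_deg)))

def azimuth_resolution(slant_range_m, wavelength_m, antenna_length_m):
    """r_a = slant range * beam width, with beam width beta ~= lambda / antenna length."""
    return slant_range_m * wavelength_m / antenna_length_m

# Illustrative (assumed) numbers: 0.1 microsecond pulse, 45 deg depression angle,
# X-band (3 cm) wavelength, 3 m real antenna, 10 km slant range.
print(ground_range_resolution(0.1e-6, 45.0))      # ~21 m across track
print(azimuth_resolution(10_000.0, 0.03, 3.0))    # 100 m along track
```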
Those systems whose beam width is controlled by the physical antenna length are called
brute force or real aperture radar.
For a real aperture radar, the physical antenna length must be considerably longer than the
wavelength in order to achieve a high azimuth resolution. Obviously, there is a limit beyond which
the dimensions of the antenna are not realistic to put on board an aircraft or a satellite.
This limitation is overcome in synthetic aperture radar (SAR) systems. Such systems use
a short physical antenna, but through modified data recording and processing techniques,
they synthesize the effect of a very long antenna. This is achieved by making use of the
Doppler effect (Figure 3.26).
SAR records both amplitude and frequency of backscattering signals of objects throughout
the time period in which they are within the beam of moving antenna. These signals are
recorded on tapes or on films. This leads to two types of data processing.
One of the problems associated with processing radar signals from tapes is that the signal is
contaminated by random noise. When displayed on a video monitor, the radar image tends
to have a noisy or speckled appearance. Later, in the digital analysis section, we will
discuss speckle reduction strategies.
If the same antenna is used for both transmitting and receiving, the received power is given by the
radar equation. All parameters in this formula except δ are determined by the system; only δ is a
parameter related to the ground target. Unfortunately, δ is a poorly understood parameter, which
largely limits its use in remote sensing.
• Moisture influences the dielectric constant of the target, which in turn can significantly
change the backscattering pattern of the signal. Moisture also reduces the penetration
capability of microwaves.
In the field, we use an array of sticks arranged parallel to each other at a constant
interval to measure the surface roughness.
A common definition of a rough surface is one whose S(h) exceeds one eighth of the
wavelength divided by the cosine of the incidence angle, i.e., S(h) > λ / (8 cos θ).
As we illustrated in the spectral reflectance section, a smooth surface tends to reflect all
the incoming energy at an angle equal to the incidence angle, while a rough surface tends to
scatter the incoming energy more or less in all directions.
• Polarization
Microwave energy can be transmitted and received by the antenna at a selected orientation
of the electromagnetic field. The orientation, or polarization, of the EM field is labelled as
horizontal (H) or vertical (V). The antenna can transmit using either
polarization. This makes it possible for a radar system to operate in any of
four modes: transmit H and receive H, transmit H and receive V, transmit V and receive H, and
transmit V and receive V. By operating in different modes, the polarizing characteristics of a
ground target can be obtained.
• Corner reflector
A corner reflector tends to collect the
reflected signal within its corner and return it directly to the antenna.
Microwave Bands
Band   Wavelength        Frequency
Ka     0.75 - 1.1 cm     40 - 26.5 GHz
K      1.1 - 1.67 cm     26.5 - 18 GHz
Ku     1.67 - 2.4 cm     18 - 12.5 GHz
X      2.4 - 3.75 cm     12.5 - 8 GHz
C      3.75 - 7.5 cm     8 - 4 GHz
S      7.5 - 15 cm       4 - 2 GHz
L      15 - 30 cm        2 - 1 GHz
P      30 - 100 cm       1 GHz - 300 MHz
Geometric Aspects
Radar uses two types of image recording systems, a slant-range image recording system
and a ground-range image recording system.
In a slant-range recording system, the spacing of targets is proportional to the time interval
between the returning signals from adjacent targets.
In a ground-range recording system, the spacing is corrected to be approximately
proportional to the horizontal ground distance between ground targets.
If the terrain is flat, we can convert the slant-range spacing SR to the ground-range spacing GR.
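A minimal sketch of the flat-terrain conversion, assuming the platform altitude H is known so that GR = sqrt(SR² − H²):

```python
import math

def slant_to_ground_range(slant_range_m, altitude_m):
    """Flat-terrain conversion: GR = sqrt(SR**2 - H**2).

    SR is the antenna-to-target slant range and H the platform altitude;
    over flat terrain these form a right triangle with the ground range GR.
    """
    return math.sqrt(slant_range_m ** 2 - altitude_m ** 2)

# Assumed example: 12 km slant range measured from an aircraft flying at 6 km.
print(slant_to_ground_range(12_000.0, 6_000.0))  # ~10,392 m ground range
```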
• Relief distortion
• Frequency L band
• Swath width 100 km centered at 20° from nadir
• Polarization HH
• Ground Resolution 25 m x 25 m
• The European Space Agency launched a satellite in 1991: ERS-1, with a C-
band SAR sensor.
• In 1992, the Japanese JERS-1 satellite was launched with an L-band radar on board.
The L-band radar has a higher penetration capability than the C-band SAR.
• Radarsat
• Scheduled to be launched in mid 1995, Radarsat will contain a SAR system which is
very flexible in terms of configurations of incidence angle, resolution, number of
looks and swath width.
Frequency              C band, 5.3 GHz
Altitude               792 km
Repeat cycle           16 days
Subcycle               3 days
Period                 100.7 min (14 cycles per day)
Equatorial crossing    6:00 A.M.
Platforms and satellite orbits: see Campbell (1987), pp. 118-129.
Chapter 3
References
Ahmed, S. and H.R. Warren, 1989. The Radarsat System. IGARSS'89/12th Canadian Symposium on Remote
Sensing. Vol. 1. pp.213-217.
Anger, C.D., S. K. Babey, and R. J. Adamson, 1990, A New Approach to Imaging Spectroscopy, SPIE
Proceedings, Imaging Spectroscopy of the Terrestrial Environment, 1298: 72 - 86. - specifically, CASI
Curlander, J.C., and McDonough R. N., 1991. Synthetic Aperture Radar, Systems & Signal Processing. John Wiley
and Sons: New York.
Elachi, C., 1987. Introduction to the Physics and Techniques of Remote Sensing. John Wiley and Sons, New York.
King, D., 1992. Development and application of an airborne multispectral digital frame camera sensor. XVIIth
Congress of ISPRS, International Archives of Photogrammetry and Remote Sensing. B1:190-192.
Lenz, R. and D. Fritsch, 1990. Accuracy of videometry with CCD sensors. ISPRS Journal of Photogrammetry and
Remote Sensing, 90-110.
Lillesand, T.M. and Kiefer, R.W., 1994, Remote Sensing and Image Interpretation, 3rd. Ed., John Wiley and Sons,
Inc.: Toronto.
Luscombe, A.P., 1989. The Radarsat Synthetic Aperture Radar System. IGARSS'89/12th Canadian Symposium
on Remote Sensing. Vol. 1. pp.218-221.
Staenz, K., 1992. A decade of imaging spectrometry in Canada. Canadian Journal of Remote Sensing. 18(4):187-
197.
4.1 Digital Imagery
Different from a Cartesian coordinate system, the origin and axes of an image coordinate
system take the following form for printing and processing purposes:
Each picture element in an image, called a pixel, has coordinates (x, y) in a discrete
space that represents a sampling of the continuous earth surface. Image pixel values represent
samples of the surface radiance. The pixel value is also called image intensity, image
brightness or grey level. In a multispectral image, a pixel has more than one grey level.
Each grey level corresponds to a spectral band. These grey levels can be treated as grey-
level vectors.
From the continuous physical space to the discrete image space, a quantization process is
needed. The detail of quantization is determined by how we sample and what
resolution we use. General concepts of sampling and resolution were introduced in
Chapter 1.
Two concepts are of particular importance; image space and feature space. Image space
refers to the spatial coordinates of an image(s) which are denoted as I with m x n elements,
where m and n are respectively the number of rows and the number of columns in the
image(s). The elements in image space, I(i,j) (i = 1, 2,..., m; j = 1, 2,..., n) are image pixels.
They represent spatial sampling units from which electromagnetic energy or other
phenomena are recorded. All possible image pixel values constitute the feature space V.
One band of image constitutes a one-dimensional feature space. k bands in an image
denoted as Ik construct a k-dimensional feature space Vk. Each element in Vk is a unit
hypercube whose coordinate is a k-dimensional vector v = (v1, v2, ..., vk)T. When k = 1, 2,
and 3 the hypercube becomes a unit line, a unit area, and a real unit cube. Each pixel in
image space has one and only one vector in feature space. Different pixels may have the
same vector in feature space.
Multispectral images construct a special feature space, a multispectral space Sk. In Sk, each
unit becomes a grey-level vector g = (g1, g2, ..., gk)T. In multispectral images, each pixel
has a grey-level vector. There are other types of images which add additional dimensions
to the feature space. In the feature space, various operations can be performed. One of
these operations is to classify feature space into groups with similar grey-level vectors, and
give each group a same label that has a specific meaning. The classification decision made
for each image pixel is in feature space and the classification result is represented in image
space. Such an image is a thematic image which could also be used as an additional
dimension in feature space for further analysis.
A pixel window is defined in image space as a group of neighbouring pixels. For
computational simplicity, a square pixel neighbourhood wl(i,j), centred at pixel I(i,j) with a
window side length of l, is preferred. Without further explanation, we refer to a pixel
window as wl(i,j). In order to ensure that I(i,j) is located at the centre of the pixel window,
it is necessary for l to be an odd number. It is obvious that the size of a pixel window wl(i,j)
is l × l. The following condition holds for a pixel window: 1 ≤ l ≤ min(m, n), with l odd.
This means that the minimum pixel window is the centre pixel itself, and the maximum
pixel window could be the entire image space, provided that the image space is a square
with an odd number of rows and columns. When the image space has more than one image,
a pixel window can be used to refer to a window located in any one image or any
combinations of those images.
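A minimal sketch of extracting such a window with numpy, assuming l is odd and the window lies entirely inside the image (no border handling):

```python
import numpy as np

def pixel_window(image, i, j, l):
    """Return the l x l window w_l(i, j) centred on pixel (i, j).

    l must be odd so that (i, j) sits exactly at the window centre; the window
    is assumed to lie completely inside the image.
    """
    if l % 2 == 0:
        raise ValueError("window side length l must be odd")
    half = l // 2
    return image[i - half:i + half + 1, j - half:j + half + 1]

img = np.arange(49).reshape(7, 7)       # a toy 7 x 7 single-band image
print(pixel_window(img, 3, 3, 1))       # the centre pixel itself
print(pixel_window(img, 3, 3, 3))       # a 3 x 3 neighbourhood
```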
A histogram has two meanings: a table of the occurrence frequencies of all vectors in
feature space, or a graph plotting these frequencies against all the grey-level vectors. The
occurrence frequency in the histogram is the number of pixels in the image segment having
the same vector. When the entire image space is used as the image segment, the histogram
is referred to as h(I). When a histogram is generated from a specific pixel window, it is
identified as hl(i,j) where l, i, and j are the same as above. In practice, one-dimensional
feature space is mainly used. In this case, a histogram is a graphical representation of a
table with each grey level as an entry. Corresponding to each grey level is its
occurrence frequency f(vi), i = 0, 1, 2, ..., Nv-1, where Nv is the number of grey levels of the
image (e.g., Nv = 8 in Figure 4.2).
Figure 4.2. An example histogram
From a histogram h(I) we can derive the cumulative histogram hc(I)={fc(vi) , i = 0, 1, 2, ...,
Nv-1}. This is obtained for each grey level by summing up all frequencies whose grey
levels are not higher than the particular grey level under consideration (Figure 4.3).
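A short sketch computing h(I) and hc(I) for a toy single-band image with Nv = 8 grey levels (the pixel values are invented):

```python
import numpy as np

# A toy single-band image quantized to Nv = 8 grey levels (0..7).
img = np.array([[0, 1, 1, 2],
                [2, 2, 3, 3],
                [3, 3, 4, 5],
                [5, 6, 7, 7]])

n_levels = 8
hist, _ = np.histogram(img, bins=np.arange(n_levels + 1))  # f(v_i), i = 0..Nv-1
cum_hist = np.cumsum(hist)                                  # f_c(v_i): sum of f(v_j) for v_j <= v_i

print("h(I) :", hist)       # occurrence frequency of each grey level
print("hc(I):", cum_hist)   # cumulative histogram
```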
Two parameters of a sensor system at a specific height determine the quality of a digital
remote sensing image for a given spectral range: the spatial resolution rs and the
radiometric resolution rr. As discussed in Chapter 1, the spatial resolution determines how
finely the spatial detail of the real world an image can record (i.e., how small the spatial
sampling unit is) and therefore the number of pixels in the image space. The radiometric
resolution determines how finely a spectral signal can be quantized and therefore the
number of grey levels that is produced. The finer these resolutions are, the closer is the
information recorded in the image to the real world, and the larger are the sizes of the
image space and the grey-level vector space. The size (or alternatively the number of
pixels) of image space, S(I), has an exponential relation with the spatial resolution, and so
does the size (or the number of vectors) of the feature space, S(V), with the radiometric
resolution. Their relations take the following forms (with rr expressed in bits):

S(I) ∝ (1/rs)²,    S(V) = (2^rr)^k

where k, as defined above, is the number of images in the image space. While S(I) has a
fixed exponential order of 2 with rs, S(V) depends not only on rr, but also on k. The
number of vectors in Vk becomes extremely large when k grows while rr is unchanged. For
example, each band of a Landsat TM or SPOT HRV image is quantized into 8 bits (i.e., an
image has 256 possible grey levels). Thus, when k = 1, S(V) = 256 and when k = 3, S(V) =
16,777,216. If a histogram is built in such a three-dimensional multispectral space, it would
require at least 64 Megabytes of random access memory (RAM) or disk storage to process
it. Therefore, the feature space has to be somehow reduced for certain analyses.
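A small sketch of these size relations, writing S(I) as (scene side length / rs)² for a square scene and S(V) as (2^rr)^k; the 185 km scene side and 30 m resolution are only example values:

```python
# Sizes of image space and feature space, as described above:
#   S(I) grows with the square of 1/r_s (for a square scene of side length L),
#   S(V) = (2 ** r_r) ** k, where r_r is the radiometric resolution in bits.
def image_space_size(scene_side_m, spatial_resolution_m):
    pixels_per_side = scene_side_m / spatial_resolution_m
    return int(pixels_per_side ** 2)

def feature_space_size(radiometric_resolution_bits, k_bands):
    return (2 ** radiometric_resolution_bits) ** k_bands

print(image_space_size(185_000, 30))   # a 185 km scene at 30 m: ~38 million pixels
print(feature_space_size(8, 1))        # 256
print(feature_space_size(8, 3))        # 16,777,216 -- as in the text
```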
BIL (band interleaved by line) is typically used by the Landsat Ground Station Operators' Working Group
(LGSOWG). It takes the following form:
AAA BBB CCC, AAA BBB CCC
Band sequential (BSQ) takes the following form:
AAA AAA BBB BBB CCC CCC
Pixel Interleaved format is used by PCI. It takes the form of:
ABC ABC ABC ABC ABC ABC
These are the general formats that are being used. BIL is suitable for data transfer from the
sensor to the ground. It does not need a huge buffer for data storage on the satellite if the
ground station is within the transmission coverage of the satellite.
Band sequential and separate file formats are the proper forms to use when we are more
interested in single-band image processing, such as image matching, correlation, geometric
correction, and when we are more concerned with spatial information processing and
extraction. For example, we use these files when linear features or image texture are of our
concern.
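A minimal sketch showing how the same 2-line, 3-pixel, 3-band block is ordered under BIL, BSQ and pixel-interleaved (BIP) layouts, using the A/B/C band labels from above:

```python
import numpy as np

n_lines, n_pixels, n_bands = 2, 3, 3
band_labels = np.array(["A", "B", "C"])

# One "image cube": cube[line, pixel, band] holds the label of the band it belongs to.
cube = np.broadcast_to(band_labels, (n_lines, n_pixels, n_bands))

# BIL (band interleaved by line): for each line, all pixels of band A, then B, then C.
bil = cube.transpose(0, 2, 1).reshape(-1)
# BSQ (band sequential): the whole image of band A, then band B, then band C.
bsq = cube.transpose(2, 0, 1).reshape(-1)
# BIP (pixel interleaved): for each pixel, its A, B, C values one after another.
bip = cube.reshape(-1)

print("BIL:", "".join(bil))  # AAABBBCCC AAABBBCCC
print("BSQ:", "".join(bsq))  # AAAAAA BBBBBB CCCCCC
print("BIP:", "".join(bip))  # ABCABCABC ABCABCABC
```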
In remote sensing there are three major forms of imaging geometry as shown in Figure 4.5:
Figure 4.5 The major types of imaging geometry
The first is central perspective. It is the simplest because the entire image frame is
defined by the same set of geometric parameters. In the second imaging geometry, each
pixel has its own central perspective. This is the most complicated case because each pixel has
to be corrected separately if geometric distortion exists. In the third, each
line of an image has a central perspective.
The platform status, which can be represented by six parameters (three position coordinates and
three attitude angles), affects the image geometry:
(X, Y, Z, ψ, λ, ρ)
• airborne platform
• earth rotation - affects satellite
• continental drift
Most remote sensing satellites for earth resources studies, such as the Landsat series and
the SPOT, use Sun synchronous polar orbit around the earth (Figure 4.6) so that they
overpass the same area on the earth at approximately the same local time. Most of the
earth's surface can be covered by these satellites.
Figure 4.6. Sun synchronous polar orbit for Earth resources satellites
The effects of roll, pitch and yaw along the direction of satellite orbit or the airplane flight
track can be illustrated by using Figure 4.7.
Figure 4.7. The effects of roll, pitch and yaw on image geometry
Although the Earth's surface is spherical, we use flat maps to represent the phenomena on
the surface. We transform the coordinates on the spherical surface to a flat sheet of paper
using map projection. The most widely used map projection is Universal Transverse
Mercator (UTM) projection.
The purpose of georeferencing is to transform the image coordinate system (u,v), which
may be distorted due to the factors discussed above, to a specific map projection (x,y) as
shown in Figure 4.8. The imaging process involves the transformation of a real 3-D scene
geometry to a 2-D image
Figure 4.8. Georeferencing is a transformation between the image space to the geographical coordinate
space
In order to achieve this rigorously, every step involved in the imaging process has to be known, i.e., we
need to know the inverse process of the geometric transformation.
This is a complex and time consuming process. However, there is a simpler and widely-
used alternative: polynomial approximation.
Coefficients a's and b's are determined by using Ground Control Points (GCPs).
For example, we can use very low order polynomials such as the affine transformation
u = ax + by + c
v = dx + ey + f
A minimum of 3 GCPs will enable us to determine the coefficients in the above equations.
In this way, we don't need to use the transformation matrix T. However, in order to make
our coefficients representative of the whole image that is transformed, we have to make
sure that our GCPs are well distributed all over the image.
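A minimal sketch of fitting the affine coefficients from GCPs by least squares (the GCP coordinates are invented for illustration):

```python
import numpy as np

# Hypothetical ground control points: map coordinates (x, y) and the image
# coordinates (u, v) at which each point was observed.
xy = np.array([[500_000.0, 4_500_000.0],
               [500_800.0, 4_500_100.0],
               [500_200.0, 4_500_900.0],
               [500_900.0, 4_500_950.0]])
uv = np.array([[10.0,  12.0],
               [410.0,  40.0],
               [ 95.0, 460.0],
               [440.0, 510.0]])

# Affine model: u = a*x + b*y + c, v = d*x + e*y + f.
# Build the design matrix [x y 1] and solve each coordinate by least squares.
design = np.column_stack([xy[:, 0], xy[:, 1], np.ones(len(xy))])
coeffs_u, *_ = np.linalg.lstsq(design, uv[:, 0], rcond=None)
coeffs_v, *_ = np.linalg.lstsq(design, uv[:, 1], rcond=None)

def map_to_image(x, y):
    """Predict the image position (u, v) of a map coordinate with the fitted affine model."""
    u = coeffs_u @ np.array([x, y, 1.0])
    v = coeffs_v @ np.array([x, y, 1.0])
    return u, v

print(map_to_image(500_400.0, 4_500_500.0))
```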
The third choice is that we can combine the T-1 method with the polynomial technique in
order to reduce the transformation errors involved in the direct transformation of T-1
(Figure 4.9).
Figure 4.9. Larger magnitude of errors may be introduced
if direct transformation is used.
In the T-1 approach, polynomials are used to correct for the inaccuracies of satellite or aircraft
positioning. For the platform position the following formula can be used:
We can use GCPs to refine the coefficients. Global Positioning System (GPS) and/or
Inertial Navigation System (INS) techniques can also be used. The integration of GPS and
INS with remote sensing sensors is being investigated (Schwarz et al., 1993).
(4) Use a low order polynomial inside each block for detailed mapping (Figure 4.11)
Figure 4.11. Further transformation from u-v space to Dx-Dy space using lower order
polynomials
(i) Affine
(ii) Bilinear
Why is (ii) called bilinear? Because each coordinate can be written as the product of two
linear functions of x and y:
u = (a + bx)(c + dy)
Each factor is linear; their product is bilinear.
Since there are four knowns and four unknowns, we can solve (i) using least
squares and (ii) by directly solving the system of equations. We will only show how to obtain
a0, a1, a2, a3 in (ii).
For each ground control point, substituting its coordinates into (1) and (2) gives one equation per
coordinate, so with the n GCPs we obtain two groups of over-determined equations.
For the x coordinate, writing its group in matrix form as b = A a and multiplying both sides by
the transpose AT, we obtain the normal equations AT b = AT A a, whose solution
a = (AT A)-1 AT b gives the least squares estimate of the coefficients.
This can be applied to affine transformation and higher order polynomial transformation.
The results will appear as in Figure 4.13. Pixel position (1, 1) may be transformed to
(4850.672, 625.341).
convolution operator
Since most weight functions are limited to a local neighbourhood, only a limited number of
i's need to be used.
For instance, in nearest neighbour (NN) interpolation, i takes the integer value closest to u.
In linear interpolation, l takes the nearest integer less than or equal to u and h takes the nearest
integer greater than u. For cubic interpolation, l takes the two nearest integers less than or
equal to u, while h takes the two nearest integers greater than u.
For the sinc function, x can extend to infinity, but in practice only a limited number of
terms, typically up to 20, is used.
According to the above introduction of convolution, for the nearest neighbour case the
weight function is
For the linear case, it is
Then Z(u, v) is obtained by applying convolution along the dashed line. The convolution
process for all the three interpolation cases can be shown by
For cubic:
l = the two nearest integers less than or equal to u
m = the two nearest integers greater than u.
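As an illustration of these weight functions, the following sketch (assuming the resampling position (u, v) lies strictly inside the image grid) implements nearest-neighbour and bilinear resampling for a single output position.

import numpy as np

def nearest_neighbour(img, u, v):
    # All the weight goes to the sample closest to (u, v).
    return img[int(round(v)), int(round(u))]

def bilinear(img, u, v):
    # Linear weights along each axis between the two bracketing integers.
    l_u, l_v = int(np.floor(u)), int(np.floor(v))
    h_u, h_v = l_u + 1, l_v + 1
    wu, wv = u - l_u, v - l_v            # fractional distances from l
    return ((1 - wv) * ((1 - wu) * img[l_v, l_u] + wu * img[l_v, h_u]) +
            wv       * ((1 - wu) * img[h_v, l_u] + wu * img[h_v, h_u]))

img = np.arange(16, dtype=float).reshape(4, 4)
print(nearest_neighbour(img, 1.5, 2.25))
print(bilinear(img, 1.5, 2.25))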
Chapter 4
References
Schwarz, K-P., Chapman, M.A., Cannon, E.C. and Gong, P., 1993. An integrated INS/GPS approach to the
georeferencing of remotely sensed data. Photogrammetric Engineering and Remote Sensing, 59(11): 1667-1673.
Shlien, S., 1979. Geometric correction, registration, and resampling of Landsat Imagery. Canadian Journal of
Remote Sensing. 5(1):74-87.
5. Radiometric Correction
In addition to distortions in image geometry, image radiometry is affected by factors such
as system noise, sensor malfunction and atmospheric interference. The purpose of
radiometric calibration is to remove or reduce the sensor (detector) inconsistencies, sensor
malfunction, viewing geometry and atmospheric effects. We will first introduce the
calibration of detector responses.
As we have discussed before, Landsat MSS has 6 detectors at each band, TM has 16 and
SPOT HRVs have 3000 or 6000 detectors. The differences between the SPOT sensors and
Landsat sensors are that each SPOT detector collects one column of an image while each
detector of Landsat sensors corresponds to many lines of an image (Figure 5.1).
Figure 5.1. Images acquired using detectors in linear array sensors and in scanners
The problem is that no detector functions the same way as others. If the problem becomes
serious, we will observe banding or striping on the image.
There are two types of approaches to overcome the detector response problems: absolute
calibration and relative calibration.
In the absolute mode, we attempt to establish a relationship between the image grey level
and the actual incoming reflectance or radiation. A reference source is needed for this mode
and this source ranges from laboratory light, to on-board light, to the actual ground
reflectance or radiation.
For CASI, each detector is calibrated by the manufacturer in the laboratory. For the
Landsat MSS, a calibration wedge with 6 different grey levels is used. For the Landsat TM,
three lamps, which have 8 brightness combinations, are used.
vo = a · vi + b
where vo is the observed reading and vi is the reference input value.
Figure 5.2. Responses of the six Landsat MSS detectors. A least squares linear
fitting is applied to these detector responses.
Once each detector is calibrated, the calibrated image data (digital numbers) can be
converted into radiances or spectral reflectances. For the case of converting digital
numbers of an 8 bit image into radiances, we have
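The conversion equation itself is not reproduced here; a minimal sketch of the commonly used linear scaling form is shown below (the exact constants Lmin and Lmax are sensor-dependent, so the values used are illustrative only).

import numpy as np

def dn_to_radiance(dn, l_min, l_max, dn_range=255):
    """Linear scaling of calibrated 8-bit digital numbers to radiance.

    l_min: radiance corresponding to DN = 0
    l_max: radiance corresponding to DN = dn_range
    """
    dn = np.asarray(dn, dtype=float)
    return l_min + (l_max - l_min) * dn / dn_range

# Illustrative values only; real Lmin/Lmax come from the sensor documentation.
print(dn_to_radiance([0, 64, 255], l_min=0.5, l_max=24.3))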
Figure 5.3. When the six detectors of the Landsat MSS are viewing the same
water target, their responses should be the same.
The aim of this method is to make the mean m and standard deviation s the same for each
detector. For each detector n, we need a transfer function that transfers the measured mean
mn and standard deviation sn to a standard set:
measured mean = mn, measured standard deviation = sn
desirable mean = M
desirable standard deviation = S
The transfer function is
I'n = anIn + bn
where I'n is the calibrated intensity and In is the original intensity
For an 8-bit image, you may try to use M = 128 and S = 50 or may use the mean and
standard deviation calculated from the entire sample.
This may not always work. The assumption behind this strategy is that detector responses
are linear.
This is usually done by comparing their cumulative histograms as shown in Figure 5.4.
This process is done for each grey level g2: find its cumulative frequency fc2(g2) in F2,
then in F1 find the grey level g1 such that its cumulative frequency fc1(g1) = fc2(g2), and
assign g1 to g2 in the histogram being adjusted.
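A sketch of this cumulative-histogram matching for two detectors of an 8-bit image follows; the detector samples are simulated, and one detector is simply adjusted to the other.

import numpy as np

def match_histogram(values, reference):
    """Adjust one detector's grey levels so that its cumulative
    histogram matches that of a reference detector (8-bit data)."""
    bins = np.arange(257)
    f1, _ = np.histogram(reference, bins=bins)
    f2, _ = np.histogram(values, bins=bins)
    fc1 = np.cumsum(f1) / f1.sum()       # reference CDF
    fc2 = np.cumsum(f2) / f2.sum()       # CDF of the detector to adjust
    # For every grey level g2, find g1 with fc1(g1) close to fc2(g2).
    lut = np.searchsorted(fc1, fc2).clip(0, 255)
    return lut[np.asarray(values, dtype=np.uint8)]

# Illustrative striped image: even lines from one detector, odd from another.
rng = np.random.default_rng(0)
even = rng.normal(120, 20, 1000).clip(0, 255).astype(np.uint8)
odd = rng.normal(135, 25, 1000).clip(0, 255).astype(np.uint8)
odd_fixed = match_histogram(odd, even)
print(even.mean(), odd.mean(), odd_fixed.mean())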
Normally Lmax, Lmin and DNrange are known from the sensor manufacturer or operator.
However, Ls is composed of contributions from the target, background and the atmosphere
(Figure 5.5):
Figure 5.5 Target, background and scattered radiation received by the sensor.
As introduced before, the atmosphere has severe effects on the visible and near-infrared
radiance. First, it modifies the spectral and spatial distribution of the radiation incident on
the surface. Second, radiance being reflected is attenuated. Third, atmospheric scattered
radiance, called path radiance, is added to the transmitted radiance.
Assuming that Ls is the radiance received by a sensor, it can be divided into LT and LP
LS = LT + LP (1)
For a given spectral interval, the solar irradiance reaching the earth's surface is
EG =
where ES is the solar irradiance outside the atmosphere and i is the incidence angle.
Surface can be either specular or diffuse. Most surfaces can be considered as approximate
diffuse reflectors at high solar elevations, i.e. when i is small.
If the surface is assumed to be a perfect diffuse reflector, i.e. the Lambertian case, the ratio
of the radiation reflected in the viewing direction to the total radiation into the whole upper
hemisphere is 1/π, and the target radiance reaching the sensor is
LT = ρ · EG · Te / π
where ρ is the target reflectance and Te is the transmittance along the viewing direction.
Therefore in order to quantitatively analyze remotely sensed data, i.e. to find ρ,
atmospheric transmittance T and path radiance Lp have to be known.
Path radiance Lp
Lp is determined by at least two parameters: single scattering albedo and single scattering
phase function.
Single scattering albedo = 1 when no attenuation occurs. Single scattering phase function
denotes the fraction of radiation which is scattered from its initial forward direction to
some other direction.
A number of path radiance determination algorithms exist. For a nadir view, as is typical
for Landsat MSS, TM and SPOT HRV, Lp can be determined by:
For aerosol scattering, the phase function Pp(µi) does not change much with wavelength,
so the function for λ = 0.7 µm can be used for all wavelengths. This function is usually
given in diagram or table form; see, for example, the function given in Forster (1984).
The average background B is usually determined by collecting ground-truth information
for a region. A 3 km x 3 km square centred on the pixel to be corrected can be used.
In this section, we only tried to introduce some basic concepts of this complex topic. This
is only a single-scattering correction algorithm for the nadir viewing condition. More
sophisticated algorithms that account for multiple scattering do exist; examples include
LOWTRAN 7, 5S (Simulation of the Satellite Signal in the Solar Spectrum) and 6S
(Second Simulation of the Satellite Signal in the Solar Spectrum, which extends 5S to
aircraft observations and target altitude). FORTRAN codes are available for these
algorithms. 5S and 6S were proposed by Tanre and his colleagues (e.g. Tanre et al., 1990,
IGARSS '90, p. 187).
One has to be careful when conducting atmospheric correction since there are many factors
to be counted and to be estimated. If these estimations are not properly made, the
atmospheric correction might add more bias than does the atmosphere itself.
This is most suitable to the clear sky when Rayleigh atmosphere dominates since Rayleigh
scattering affects short wavelength, particularly visible, and we know that clear-deep water
has a very low spectral reflectance in the short wavelength region. If a relatively large
water body, say 1-2 km in diameter, can be found on an image, we can use the radiance of
water derived from the image as Lw and the real water radiance, L, to estimate Lp.
Lw = K · DNwater + Lmin
Lp = Lw - L
Lp can then be subtracted from other radiances in an image for the visible channels.
For the infrared channels, Rayleigh atmosphere has little effects and Lp is assumed to be 0.
It can be seen that this method only applies to Rayleigh atmosphere.
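A small sketch of the dark-water estimate follows; the calibration constants and the mean water digital number are invented for illustration.

import numpy as np

def path_radiance_dark_water(dn_water, k, l_min, l_water_true=0.0):
    """Estimate path radiance from a deep, clear water body.

    dn_water:      mean digital number over the water body in a visible band
    k, l_min:      sensor calibration gain and offset (Lw = k*DN + Lmin)
    l_water_true:  assumed true water radiance (near zero for clear, deep water)
    """
    l_water_observed = k * dn_water + l_min
    return l_water_observed - l_water_true

lp = path_radiance_dark_water(dn_water=12.0, k=0.06, l_min=0.3)
print("estimated path radiance:", lp)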
R = a · DN + b
By tying the ground reflectance measured during the flight overpass to the corresponding
pixel values on the image, we can solve the equation to obtain a and b. This is an empirical
method. In fact, both the dark-target and direct digital number conversion methods have
been most widely used in remote sensing.
In previous sections, we attempted to correct the atmospheric effects, i.e. convert image
digital numbers DNs to image radiance Ls. After atmospheric correction, we expect to
have the spectral reflectivity ρ.
Assuming that atmospheric effects can be completely removed from the image, the spectral
reflectivity obtained contains the real target reflectance r and the topographic modification
introduced during image acquisition, G:
ρ = r · G
The G contains information about the viewing and energy incidence geometric
relationship.
The Moon can be considered, approximately, as a surface that reflects an equal amount of
light in all directions.
What effects does the relief have on the image radiometry? To answer this question, a
different coordinate system will be used and Figure 5.6 shows this image coordinate
system. In this coordinate system, z is the viewing direction and x-y plane is the image
plane.
The actual relief of a small area is defined by its surface normal, which can be expressed
by the slope components p and q (the rates of change of elevation in the x and y directions);
the light source direction is defined in the same way. In the discrete case these are the
differences in elevation between the neighbouring cells and the grid cell under consideration.
If r is the same over the whole study area, we can use two sets of (p, q)'s to recover (p, q);
similarly, we can use three sets of (p, q)'s.
Using (p, q), we can generate a shaded map based on a DEM of an area.
Instead of calculating (p, q) for each grid on a DEM, we can calculate a two dimensional
lookup table
p:  -0.2  -0.1  0  0.1  0.2
q:  -0.2  -0.1  0  0.1  0.2
(Lookup table of shading values for each combination of p and q; the cell values are read
from the table rather than recomputed for every grid cell.)
The entire DEM {p, q} can be mapped using the above table.
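A minimal sketch of how a shaded map might be generated from a DEM using (p, q), assuming a Lambertian surface and a light source also expressed as gradient components (ps, qs); the finite-difference approximation of p and q follows the discrete definition above.

import numpy as np

def shaded_relief(dem, cell_size, ps=0.2, qs=0.2):
    """Simple Lambertian shading of a DEM."""
    p = np.gradient(dem, cell_size, axis=1)   # slope in the x direction
    q = np.gradient(dem, cell_size, axis=0)   # slope in the y direction
    # Cosine of the angle between the surface normal (-p, -q, 1)
    # and the illumination direction (-ps, -qs, 1).
    num = 1.0 + p * ps + q * qs
    den = np.sqrt(1 + p**2 + q**2) * np.sqrt(1 + ps**2 + qs**2)
    return np.clip(num / den, 0.0, 1.0)

dem = np.array([[100, 102, 104],
                [101, 103, 106],
                [102, 105, 108]], dtype=float)
print(shaded_relief(dem, cell_size=30.0))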
Chapter 5
References
Forster, B.C., 1984. Derivation of atmospheric correction procedures for Landsat MSS with particular reference
to urban data. Int. J. of Remote Sensing. 5(5):799-817.
Horn, B.K.P., and Woodham, R.J., 1979. Destriping Landsat MSS images by histogram modification. Computer
Graphics and Image Processing. 10:69-83.
Further Readings:
Woodham, R.J., and Gray, M.H., 1987. An analytic method for radiometric correction of satellite multispectral
scanner data. IEEE Transactions on Geosciences and Remote Sensing. 25(3):258-271.
6. Image Enhancement
A histogram of an image can tell us about the data distribution with respect to image grey
levels. The purpose of a histogram-based operation is that when a grey-level
transformation is made, pixels in the image having a specific range of grey levels can be
enhanced or suppressed. This is also called contrast adjustment. It can be done using:
1. histogram stretching
2. histogram compression
Both histogram stretching and histogram compression can be done either linearly or
nonlinearly.
DN' = a · DN
Figure 6.2.
Figure 6.3.
Figure 6.4. (a) original histogram of an image. (b) the histogram after adjustment.
This is realized by equally partitioning the cumulative histogram fc of the original image
into 255 pieces. Each piece will correspond to one digital number in the equalized image
(Figure 6.7). On the cumulative curve, find out the nth dividing point,
Figure 6.7. For the discrete case, modify the grey level value according to the principle of equal
frequency.
The equalization process can also be considered as a histogram matching method used in
image destriping as discussed in Section 5.1. Here we attempt to match the original
cumulative histogram Fc1 to the new cumulative histogram Fc2 (Figure 6.8).
Figure 6.8.
The following example shows how an equalization can be made in discrete digital form. It
starts with the generation of image histogram (first two columns in Table 6.1). Then
probability, Pi is calculated from frequency, f(vi) (third column). A cumulative histogram
Fc can be calculated from frequencies. Similarly, the cumulative distribution function
(CDF) can be derived from probabilities. Based on the cumulative distribution function we
can convert the original grey-levels into grey-levels of the equalized image (Table 2).
Table 6.1 Histogram, cumulative histogram and cumulative distribution function (CDF)
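A compact sketch of the discrete equalization using the cumulative distribution function (8-bit data assumed; the test image is simulated).

import numpy as np

def equalize(img):
    """Histogram equalization of an 8-bit image via its CDF."""
    img = np.asarray(img, dtype=np.uint8)
    freq, _ = np.histogram(img, bins=np.arange(257))
    cdf = np.cumsum(freq) / freq.sum()            # cumulative distribution
    lut = np.round(cdf * 255).astype(np.uint8)    # new grey level per old level
    return lut[img]

rng = np.random.default_rng(1)
img = rng.normal(90, 15, (50, 50)).clip(0, 255).astype(np.uint8)
eq = equalize(img)
print(img.min(), img.max(), "->", eq.min(), eq.max())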
Density slicing is to represent a group of contiguous digital numbers by a single value.
Although some details of the image will be lost, the effect of noise can also be reduced by
density slicing. As a result of density slicing, an image is segmented, or sometimes
contoured, into sections of similar grey level. Each of these segments is represented by a
user-specified brightness.
The last filter can be used to remove drop-out lines in Landsat images. This is done by
applying a filter only along the drop-out lines in those images.
3. Median filter
This filter is more useful in removing outliers, random noise, and speckles on RADAR
imagery, than a simple average filter. It has a desirable effect of keeping edges to some
extent. This filter can also be applied to drop-out line removal in some Landsat images.
By moving (i, j) all over an image, the original image, I, can be filtered and the new image,
I', can be created.
For 2,
In order to enhance edges, differences between neighbourhood digital numbers are taken.
We will start with a one-dimensional example:
I  : 1 1 1 1 2 2 2 2   (an edge between the fourth and fifth samples)
I' : 0 0 0 1 0 0 0
I'': 0 0 1 -1 0 0
I''(i) = I'(i+1) - I'(i) = I(i+2) - 2I(i+1) + I(i)
We call the first differencing taking a gradient, and the second differencing taking a
Laplacian.
We can use the matrix
1 -2 1 as a Laplacian filter, an edge enhancement filter.
In the two-dimension form, a Laplacian filter is:
The question is, can we write DN-kDN" in a filter form? The answer is yes.
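A sketch follows: a common 3 x 3 Laplacian kernel (one of several variants, assumed here) and the combined kernel for DN - k·DN'', applied to the one-dimensional edge example extended to two dimensions.

import numpy as np

# A common 3 x 3 Laplacian kernel (one of several possible variants).
laplacian = np.array([[0,  1, 0],
                      [1, -4, 1],
                      [0,  1, 0]], dtype=float)

def sharpen_kernel(k=1.0):
    """Kernel implementing DN' = DN - k * Laplacian(DN) in a single filter."""
    identity = np.zeros((3, 3))
    identity[1, 1] = 1.0
    return identity - k * laplacian

def filter2d(img, kernel):
    """Naive sliding-window filtering (no padding); for the symmetric
    kernels used here this is identical to convolution."""
    h, w = kernel.shape
    out = np.zeros((img.shape[0] - h + 1, img.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + h, j:j + w] * kernel)
    return out

img = np.tile([1, 1, 1, 1, 2, 2, 2, 2], (8, 1)).astype(float)
print(filter2d(img, sharpen_kernel(k=1.0)))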
Morphological filtering is one type of processing in which the spatial form or structure of
objects within an image is modified. Dilation, erosion and skeletonization are three
fundamental morphological operations.
In this section, we first introduce binary image morphological filtering. Two types of
connectivities are defined as following.
Basic morphological operations, dilation, erosion and many variants can be defined and
implemented by "hit or miss" transformations. A small odd-sized mask is scanned over a
binary image. If the binary-pattern of a mask matches the state of the pixels under the
mask, an output pixel in spatial correspondence to the center pixel of the mask is set to
some desired binary state. Otherwise, the output is set to the opposite binary state.
For example, to perform simple binary noise cleaning, if an isolated pixel is encountered,
i.e. a 3 x 3 window whose centre is '1' and whose eight neighbours are all '0', the '1' in
the centre is replaced by a '0'. Otherwise, the centre pixel value is not changed.
It is often possible to use simple neighbourhood logical relationships to define the
conditions for a hit. For the simple noise-removal case, the hit condition is written with
∧, the logical AND (intersection) operation, and ∨, the logical OR (union) operation.
Additive operators
The centre pixel of a 3 x 3 pixel window is converted by these operators from the zero
state to the one state when a hit is obtained. The basic operators include:
Interior Fill - create a one if all four-connected neighbour pixels are one.
Diagonal Fill - create a one if doing so eliminates eight-connectivity of the
background.
where
and
There are 119 patterns which satisfy the above condition. For example,
Eight-Neighbour Dilate - create a one if at least one eight-connected neighbour pixel is one.
where
H-Break - erase a one if it is H-connected.
Interior Pixel Removal - erase a one if all four-connected neighbours are ones.
It is expressed as
where I(i,j), 1 ≤ i, j ≤ N, is a binary-valued image and H(m,n), 1 ≤ m, n ≤ L, with L an odd
integer, is a structuring element. Minkowski addition is defined as the union of the
translates of I by the elements of the structuring element.
In order to compare I(i,j) with I'(i,j), I(i,j) should be translated to
TQ(I(i,j)), where Q = ((L-1)/2, (L-1)/2).
According to the rules defined above, you can observe what it looks like.
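A sketch of binary dilation and erosion implemented by shifting and combining the image under an odd-sized structuring element; a 3 x 3 block of ones is assumed here for H, and the image is a single isolated pixel.

import numpy as np

def dilate(image, se):
    """Binary dilation of `image` by structuring element `se`
    (Minkowski addition), implemented by shifting and OR-ing."""
    rows, cols = image.shape
    a = se.shape[0]                       # odd-sized structuring element
    half = a // 2
    padded = np.pad(image, half)
    out = np.zeros_like(image)
    for m in range(a):
        for n in range(a):
            if se[m, n]:
                out |= padded[m:m + rows, n:n + cols]
    return out

def erode(image, se):
    """Binary erosion: the complement of the dilation of the complement."""
    return 1 - dilate(1 - image, se[::-1, ::-1])

I = np.zeros((7, 7), dtype=int)
I[3, 3] = 1
H = np.ones((3, 3), dtype=int)            # 3 x 3 structuring element
print(dilate(I, H))                       # single pixel grows to a 3 x 3 block
print(erode(dilate(I, H), H))             # erosion shrinks it back to one pixel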
Some properties of dilation and erosion (writing I for I(i,j)):
Dilation is commutative:
I ⊕ J = J ⊕ I
But, in general, erosion is not commutative:
I ⊖ J ≠ J ⊖ I
Dilation and erosion are opposite in effect; dilation of the background of an object behaves
like erosion of the object.
A ⊕ (B ⊕ C) = (A ⊕ B) ⊕ C
A ⊖ (B ⊕ C) = (A ⊖ B) ⊖ C
Dilation and erosion are often applied to an image in concatenation. A dilation followed by
an erosion is called a closing operation, while an erosion followed by a dilation is called an
opening. The effect of closing on grey-scale images is that small objects brighter than the
background are preserved, and bright objects with small gaps in between may become
connected. Opening, on the other hand, removes bright objects that are small in size and
breaks narrow connections between two bright objects.
The multispectral or vector nature of most remote sensing data makes it possible for
spectral transformations to generate new sets of image components or bands. The
transformed image may make evident features not discernible in the original data or,
alternatively, it may preserve the essential information content of the image with a reduced
number of transformed dimensions. The last point is significant for the display of data in
three dimensions on a colour monitor or in colour hardcopy, and for the transmission and
storage of data.
Addition, subtraction, multiplication, and division of the pixel brightnesses from two bands
of image data form a new image. Multiplication is not as useful as the others.
We can plot the pixel values in a two-dimensional space (Figure 6.10.) This two-
dimensional diagram is called a scatter plot.
A multispectral space is a coordinate system in which each axis represents the grey-level
values of a specific image band.
Ratio
ak, bk are constants; at least one a and one b are nonzero.
To generate two ratios between SRNIR and SRR, one for the normal vegetation and one
for the vegetation which is under stress, we use the following equations.
From these ratios, RVN > RVS, we can observe the difference in the conditions of the two
types of vegetation
Vegetation Indices
Normalized Difference Vegetation Index (NDVI)
This is calculated from the raw remote sensing data. We can also calculate the NDVI using
the processed remote sensing data (after converting digital numbers to spectral
reflectances)
To suppress the effect of different soil backgrounds on the NDVI, Huete (1989)
recommended a soil-adjusted vegetation index (SAVI).
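Neither formula is written out above; as a sketch, commonly cited forms of the NDVI and the soil-adjusted index (with soil factor L, often taken as 0.5) are shown below with invented reflectance values.

import numpy as np

def ndvi(nir, red):
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (nir - red) / (nir + red)

def savi(nir, red, l=0.5):
    """Soil-adjusted vegetation index with soil factor L."""
    nir, red = np.asarray(nir, float), np.asarray(red, float)
    return (1 + l) * (nir - red) / (nir + red + l)

red = np.array([0.05, 0.10, 0.20])   # illustrative red reflectances
nir = np.array([0.45, 0.30, 0.25])   # illustrative near-infrared reflectances
print(ndvi(nir, red))
print(savi(nir, red))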
The dimension of the multispectral space constructed by a remotely sensed image is the
number of spectral bands. For example, a Landsat MSS image constructs a four-dimensional
multispectral space, while a Landsat TM image gives a seven-dimensional multispectral
space.
For simplicity, two-dimensional data will be used as examples to illustrate the procedure of
the principal component transformation. Without loss of generality, the procedure can be
applied to data in a multispectral space of any dimension.
Pixel B1 B2 Xi - M
X1 1 2 -2, -0.33
X2 2 1 -1, -1.33
X3 4 1 1, -1.33
X4 5 2 2, -0.33
X5 4 4 1, 1.67
X6 2 4 -1, 1.67
M 3 2.33
1 2 3
4 5 6
Example 2
Pixel B1 B2 Xi - M
X1 2 2 -1.5, -1.5
X2 4 3 0.5, -0.5
X3 5 4 1.5, 0.5
X4 5 5 1.5, 1.5
X5 3 4 -0.5, 0.5
X6 2 3 -1.5, -0.5
M 3.5 3.5
The mean vectors and (Xi - M) are as listed in the two example tables.
What are the differences between V1 and V2? We can answer this question by further
examining their corresponding correlation matrices R1 and R2.
From R1, we can see that the correlation between Band 1 and 2 is 0. This means that Band
1 and Band 2 contain independent information about our target. We cannot use B1 to
replace B2.
For R2, the correlation between Band 1 and Band 2 is 0.761, which is quite high. Using
either channel, we can obtain, to a large extent, information about the other channel.
X ──(transformation)──> Y
It is recommended that a rotation matrix be used to complete this process. The rotation
matrix is G,
Y = GX .
G can be found by deriving the eigenvalues and eigenvectors from the covariance matrix
Vx. To find eigenvalues we need to solve
| Vx - λI | = 0 (1)
Where "I" is an identity matrix. is the eigenvalue vector ( 1, 2, ...., nb)T.
For each non-zero eigenvalue, λi, we can find its corresponding eigenvector gi = (gi1, gi2,
...., ginb)T . This can be obtained from
[ Vx - λiI ] · gi = 0 (2)
The rotation matrix G can then be determined by
Figure 6.12. The new axes derived from the PCT in the original coordinate system.
B'1 and B'2 are the new axes. In this coordinate system, the data variance along B'1 is 2.67
while the variance along B'2 is only 0.33. This means that in the rotated space, the data variance
along one axis is the same as its corresponding eigenvalue.
From, 2.67 + 0.33 = 1.90 + 1.10 = 3.00, we can see that the rotation will not affect the total
variance of the original data. Using 1.90/3.00 and 1.10/3.00 we can determine the
percentage of total variances that B1 and B2 represent.
For B'1, it represents 2.67/3.00 = 89% of the total variance while B'2 contains only 11% of
the total variance.
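The numbers quoted above can be verified with a short calculation on the Example 2 data; numpy's generic eigenvector routine is used here, not a remote-sensing package.

import numpy as np

# Example 2 data: six pixels in two bands.
X = np.array([[2, 2], [4, 3], [5, 4], [5, 5], [3, 4], [2, 3]], dtype=float)

M = X.mean(axis=0)                       # mean vector (3.5, 3.5)
D = X - M
Vx = D.T @ D / (len(X) - 1)              # covariance matrix V2

eigvals, eigvecs = np.linalg.eigh(Vx)    # eigenvalues/eigenvectors of Vx
order = np.argsort(eigvals)[::-1]        # sort from largest to smallest
eigvals, G = eigvals[order], eigvecs[:, order].T

Y = (G @ D.T).T                          # rotated (principal component) data

print("covariance matrix:\n", Vx)        # [[1.9, 1.1], [1.1, 1.1]]
print("eigenvalues:", eigvals.round(2))  # approx. [2.67, 0.33]
print("variance of PCs:", Y.var(axis=0, ddof=1).round(2))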
From the loadings of B'1 and B'2, we can see that after the rotation we can add more
loading in one band while reducing the amount of loading in another band. For
multispectral space with nb dimensions, after the principal component transformation, we
will have a few higher loadings for the first few bands and a very low loading for the rest.
We call those bands containing relatively high loadings the principal components. We can,
therefore, make use of these principal components in our data analysis while ignoring the
relatively minor components. By so doing, we will not lose much of the original data
variability. This serves the purpose of reducing data dimensionality. Its application in
classification (keeping the maximum variance) and in change detection (keeping the
minimal variance) normally holds promise.
The PCT is a linear transformation technique which helps to enhance remotely sensed
imagery. Although, principal components are often used, minor components may also be
useful in highlighting information on low data variability that the remote sensing data have.
For example, a few researchers have applied the PCT to multi-temporal change detection.
They found that the change information of a scene is preserved in the minor components.
Different from the PCT which is based on the data covariance matrix, Kauth and Thomas
(1976) have developed a linear transformation which is physically-based on crop growth.
Figure 6.13. A 3-D data scatterplot of the multispectral space constructed by the
green, red and near-infrared bands (Which looks like a tasselled cap.)
The growing cycle of a crop starts from bare soil, moves to green vegetation, and ends with
crop maturation as the crop turns yellow. These different stages of vegetation growth make
the data distribution in the three-dimensional multispectral space (Figure 6.13) appear in
the shape of a tasselled cap.
Kauth and Thomas defined a linear transformation to enhance the data according to the
data structure. They defined four components, called brightness (soil), greenness
(vegetation), yellowness and noise, using the following transformation matrix for Landsat
MSS data
Later, Crist, Cicone and Kauth developed a new transformation technique for Landsat TM
data. (Crist and Kauth, 1986; Crist and Cicone, 1984)
Chapter 6
References
Crist, E.P. and Cicone, R.C., 1984. A physically-based transformation of the Thematic Mapper data - the
Tasseled Cap. IEEE Transactions on Geoscience and Remote Sensing. GE-23:256-263.
Crist, E.P., and Kauth, R.J., 1986. The Tasseled Cap De-Mystified. Photogrammetric Engineering and Remote
Sensing. 52(1):81-86.
Huete, A.R., 1989. Soil influences in remotely sensed vegetation canopy spectra. In Theory and Applications
of Optical Remote Sensing. Ed. by G. Asrar, John Wiley and Sons: New York.
Kauth, R.J., and Thomas, G.S., 1976. The tasseled cap - a graphic description of the spectral-temporal development
of agricultural crops as seen by Landsat. Proceedings of the Symposium on Machine Processing of Remotely Sensed
Data. Purdue University, West Lafayette, Indiana, pp. 4B41-51.
Pratt, W., 1991. Digital Image Processing. John Wiley and Sons: Toronto.
To derive useful spatial information from images is the task of image interpretation. It
includes
• detection: such as searching for hot spots in mechanical and electrical facilities and white
spots in x-ray images. This procedure is often used as the first step of image interpretation.
• delineation: to outline the recognized target for mapping purposes. Identification and
delineation combined are used to map certain subjects. If the whole image is processed by
these two procedures, we call it image classification.
• enumeration: to count certain phenomena from the image. This is done based on
detection and identification. For example, in order to estimate the household income of the
population, we can count the numbers of various types of residential units.
• mensuration: to measure the area, the volume, the amount, and the length of a certain
target from an image. This often involves all the procedures mentioned above. Simple
examples include measuring the length of a river and the acreage of a specific land-cover
class. More complicated examples include the estimation of timber volume, river discharge,
crop productivity, river basin radiation and evapotranspiration.
In order to do a good job in the image interpretation, and in later digital image analysis,
one has to be familiar with the subject under investigation, the study area and the remote
sensing system available to him. Usually, a combined team consisting of the subject
specialists and the remote sensing image analysis specialists is required for a relatively
large image interpretation task.
Depending on the facilities that an image interpreter has, he might interpret images in raw
form, corrected form or enhanced form. Correction and enhancement are usually done
digitally.
• Image texture
• Pattern
• Association
A specific object co-occurring with another object. Some examples of association are an
outdoor swimming pool associated with a recreation center and a playground associated
with a school.
• Shadow
Object shadow is very useful when the phenomena under study have vertical variation.
Examples include trees, high buildings, mountains, etc.
• Shape
Agricultural fields and human-built structures have regular shapes. These can be used to
identify various targets.
• Size
Relative size of buildings can tell us about the type of land uses while relative sizes of tree
crowns can tell us about the approximate age of trees.
• Site
Broad-leaf trees are distributed in lower, warmer valleys, while coniferous trees tend to be
distributed at higher elevations, up to the tundra. Location is therefore useful in image
interpretation.
Image interpretation strategies
Land-cover classification
- indirect interpretation
to map something that is not directly observable in the image. This is used to classify land
use types (Gong and Howarth, 1992b). Land-use is the human activities on a piece of land.
It is closely related to land-cover types. For example, a residential land-use type is
composed of roof cover, lawn, trees and paved surfaces.
Interpret the areas the interpreter is familiar with first, then interpret the areas the
interpreter is less familiar with (Chen et al., 1989). This can be assisted by field
observation.
In order to obtain forest volume, one might have to determine what is observable from the
image, such as tree canopies, shadows etc. Then the volume can be derived. We can also
estimate the depth of permafrost using the surface cover information (Peddle, 1991).
Census data, topographical maps and other thematic maps may all be useful during
image interpretation.
More details on the image interpretation can be found in Lillesand and Kiefer (1994) or
Campbell (1987).
e.g.
I(i,j) =
2 3 5 7
1 4 6 1
2 2 3 2
1 3 3 2
Thresholding with T = 3 (pixels greater than 3 are set to 1, the others to 0) gives
0 0 1 1
0 1 1 0
0 0 0 0
0 0 0 0
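The same result can be reproduced in a few lines of Python (the threshold value T = 3 is inferred from the example above).

import numpy as np

def threshold(image, t):
    """Binary thresholding: pixels greater than t become 1, others 0."""
    return (np.asarray(image) > t).astype(int)

I = np.array([[2, 3, 5, 7],
              [1, 4, 6, 1],
              [2, 2, 3, 2],
              [1, 3, 3, 2]])
print(threshold(I, t=3))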
Normally T is determined from the histogram of an image as shown in the following
example.
Label the eight neighbours of pixel I as follows:
I0 I1 I2
I7 I  I3
I6 I5 I4
(1) Suppose I, as a seed (starting point), is of label K; then Ii will also belong to K if
|Ii - I| < T, where T is a similarity threshold.
(2) If the second point is not found in the local neighbourhood, then remove the label K
from the seed point I.
(3) If a second point is found, then apply (1) to the second point using the mean m1 of the
two points. If a third point Ij is found, a new mean m2 will be generated from m1 and Ij.
(4) Gradually grow a local area using the criterion in (1). If an nth point is found, mn-1 is
adjusted to the group mean.
(5) Repeat (1) to (4) with different seeds and thresholds. Thresholding is faster; however,
it is not adaptive to local properties. For example, if a neighbourhood is as follows:
5 7 6
4 2 5
7 6 5
thresholding with T = 3 gives
1 1 1
1 0 1
1 1 1
while with the region-growing technique, if the seed I = 2 and T = 1, the value 2 will not be
assigned to a segment label, because no neighbouring pixel meets the criterion in (1).
Image segmentation can also be done using clustering algorithms. Segmentation is usually
used as the first step in image analysis. Once an image is properly segmented, the
following operation can be performed: classification, morphological operation, and image
understanding through knowledge-based or more advanced computation.
(3) Select representative areas on the image and analyze the initial clustering results or
generate training signatures.
The following diagram shows the major steps in two types of image classification:
Supervised:
Unsupervised
Spectral class: a class which includes similar grey-level vectors in the multispectral space.
In an ideal information extraction task, we can directly associate a spectral class in the
multispectral space with an information class. For example, we have in a two dimensional
space three classes: water, vegetation, and concrete surface.
By defining boundaries among the three groups of grey-level vectors in the two-
dimensional space, we can separate the three classes.
One of the differences between a supervised classification and an unsupervised one is the
ways of associating each spectral class to an information class. For supervised
classification, we first start with specifying an information class on the image. An
algorithm is then used to summarize multispectral information from the specified areas on
the image to form class signatures. This process is called supervised training. For the
unsupervised case, however, an algorithm is first applied to the image and some spectral
classes (also called clusters) are formed. The image analyst then tries to assign each
spectral class to the desired information class.
A pixel-labelling algorithm is used to assign a pixel to an information class. We can use the
previous diagram to discuss ways of doing this.
From the above diagram, there are two obvious ways of classifying this pixel.
As in the above diagram, we define two threshold values along each axis for each class. A
grey-level vector is classified into a class only if it falls between the thresholds of that class
along each axis.
The advantage of this algorithm is its simplicity. The drawback is the difficulty of
including all possible grey-level vectors into the specified class thresholds. It is also
difficult to properly adjust the class thresholds.
Fig. 1 shows spectral curves of two types of ground target: vegetation and soil. If we
sample the spectral reflectance values for the two types of targets (bold-curves) at three
spectral bands: green, red and near-infrared as shown in Fig. 1, we can plot the sampled
values in the three dimensional multispectral space (Fig. 2). The sampled spectral values
become two points in the multispectral space. Similar curves in Fig. 1 will be represented
by closer points in Fig. 2 (the two dashed curves in Fig. 1 are shown as empty dots in Fig. 2).
From Fig. 2, we can easily see that distance can be used as a similarity measure for classification.
The closer the two points, the more likely they are in the same class.
We can use various types of distance as similarity measures to develop a classifier, i.e.
minimum-distance classifier.
As an example, we show a special case in Fig. 3 where we have 3 classes (nc = 3) and two
spectral bands (nb = 2)
If we have a pixel with a grey-level vector located in the B1-B2 space shown as A (an
empty dot), we are asked to determine to which class it should belong. We can calculate
the distances between A and each of the centers. A is assigned to the class whose center
has the shortest distance to A.
In a general form, an arbitrary pixel with a grey-level vector g = (g1, g2, ..., gnb)T,
is classified as Ci if
Now, what form should the distance d take? The most popularly used form is the
Euclidean distance.
The second popularly used distance is Mahalanobis distance
For dm and de, because taking their squares does not change the relative magnitudes among
distances, in minimum distance classifiers we usually use dm² and de² as the distance
measures so as to save some computation.
Class centers C and the data covariance matrix V are usually determined from training
samples if a supervised classification procedure is used. They can also be obtained from
clustering.
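A sketch of how class statistics and a minimum-distance labelling might be computed follows; the training samples are invented, and the squared Euclidean distance is used as discussed above.

import numpy as np

def class_statistics(samples):
    """Mean vector and covariance matrix of one class's training pixels.
    `samples` is an (ns, nb) array: ns pixels, nb bands."""
    mean = samples.mean(axis=0)
    cov = np.cov(samples, rowvar=False)
    return mean, cov

def minimum_distance_classify(pixel, class_means):
    """Assign `pixel` to the class whose mean is closest
    (squared Euclidean distance, so no square root is needed)."""
    d2 = [np.sum((pixel - m) ** 2) for m in class_means]
    return int(np.argmin(d2))

# Illustrative two-band training samples for three classes.
water = np.array([[20, 10], [22, 12], [19, 11]], dtype=float)
veg = np.array([[40, 90], [45, 85], [42, 95]], dtype=float)
soil = np.array([[80, 60], [85, 65], [78, 58]], dtype=float)

means = [class_statistics(c)[0] for c in (water, veg, soil)]
print(minimum_distance_classify(np.array([44.0, 88.0]), means))  # -> 1 (veg)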
For example, suppose ns pixels are selected as training samples for class Ci. The class mean
in each band j is calculated from these pixels, where j = 1, 2, ..., nb and k = 1, 2, ..., ns.
If there are a total of nt pixels selected as training samples for all the classes, the overall
mean vector M = (m1, m2, ..., mnb)T is obtained in the same way, with i = 1, 2, ..., nb and
k = 1, 2, ..., nt.
The covariance matrix is then obtained through the following vector form
MLC is the most common classification method used for remotely sensed data. MLC is
based on Bayes' rule.
Let C = (C1, C2, ..., Cnc) denote a set of classes, where nc is the total number of classes.
For a given pixel with a grey-level vector x, the probability that x belongs to class ci is
P(Ci|x), i = 1, 2, ..., nc. If P(Ci|x) is known for every class, we can determine into which
class x should be classified. This can be done by comparing P(Ci|x)'s, i = 1, 2, ..., nc.
where
P(Ci) is the probability that Ci occurs in the image. It is called a priori probability.
For one-dimensional case, we can see from the above figure that by generating training
statistics of two classes, we have their probability distributions. If we use these statistics
directly, it will be difficult because it requires a large amount of computer memory. The
Gaussian normal distribution model can be used to save the memory. The one-dimensional
Gaussian distribution is:
where we only need two parameters for each class, µi and σi, i = 1, 2, ..., nc.
P(Ci) can also be determined with knowledge about an area. If they are not known, we can
assume that each class has an equal chance of occurrence.
With the knowledge of p(x|Ci) and P(Ci), we can conduct maximum likelihood
classification. p(x|Ci) ï P(Ci) i = 1, 2, ..., nc can be compared instead of P(Ci|x) in (1).
The interpretation of the maximum likelihood classifier is illustrated in the above figure.
An x is classified according to the maximum p(x|Ci) ï P(Ci). x1 is classified into C1, x2 is
classified into C2. The class boundary is determined by the point of equal probability.
(2)
Often, we assume P(Ci) is the same for each class. Therefore (2) can be further simplified
to
(3)
With the maximum likelihood classifier, it is guaranteed that the error of misclassification
is minimal if p(x|Ci) is normally distributed.
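A sketch of how the Gaussian maximum likelihood rule might be implemented is shown below; the class statistics are invented for illustration and equal priors are assumed.

import numpy as np

def gaussian_discriminant(x, mean, cov, prior):
    """log( p(x|Ci) * P(Ci) ) under a multivariate Gaussian model."""
    d = x - mean
    inv = np.linalg.inv(cov)
    log_det = np.log(np.linalg.det(cov))
    nb = len(x)
    return np.log(prior) - 0.5 * (nb * np.log(2 * np.pi) + log_det + d @ inv @ d)

def mlc(x, means, covs, priors):
    scores = [gaussian_discriminant(x, m, c, p)
              for m, c, p in zip(means, covs, priors)]
    return int(np.argmax(scores))

# Illustrative statistics for two classes in two bands (equal priors).
means = [np.array([20.0, 15.0]), np.array([60.0, 70.0])]
covs = [np.array([[9.0, 2.0], [2.0, 9.0]]),
        np.array([[16.0, 4.0], [4.0, 25.0]])]
print(mlc(np.array([25.0, 20.0]), means, covs, [0.5, 0.5]))  # -> 0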
Unfortunately, the normal distribution cannot always be achieved. In order to make the
best use of the MLC method, one has to make sure that his training sample will generate
distributions as close to the normal distribution as possible.
How large should one's training sample be? Usually, one needs 10 x nb, preferably 100 x
nb, pixels in each class (Swain and Davis, 1978).
MLC is relatively robust but it has the limitation when handling data at nominal or ordinal
scales. The computational cost increases considerably as the image dimensionality
increases.
For images for which the user has little knowledge of the number and spectral properties of
the spectral classes, clustering is a useful tool to determine the inherent data structure. Clustering
in remote sensing is the process of automatic grouping of pixels with similar spectral
characteristics.
ï Clustering measures - measures how similar two pixels are. The similarity is based on:
(1) Euclidean distance dE(x1, x2)
Although m can be arbitrarily selected, it is suggested that they be selected evenly in the
multispectral space. For example, they can be selected along the diagonal axis going
through the origin of the multispectral space.
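The K-means procedure itself can be sketched in a few lines; this is a minimal illustration with simulated two-band pixels, placing the initial means evenly along the diagonal of the data range as suggested above.

import numpy as np

def kmeans(pixels, k, n_iter=20):
    """Basic K-means clustering of an (n, nb) array of pixel vectors."""
    lo, hi = pixels.min(axis=0), pixels.max(axis=0)
    # Initial means spaced evenly along the diagonal of the data range.
    means = np.array([lo + (hi - lo) * (i + 1.0) / (k + 1) for i in range(k)])
    for _ in range(n_iter):
        # Assign each pixel to the nearest cluster mean.
        d = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute the means; keep the old mean if a cluster is empty.
        for i in range(k):
            if np.any(labels == i):
                means[i] = pixels[labels == i].mean(axis=0)
    return labels, means

rng = np.random.default_rng(2)
pixels = np.vstack([rng.normal(30, 3, (50, 2)), rng.normal(80, 5, (50, 2))])
labels, means = kmeans(pixels, k=2)
print(means.round(1))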
Clustering Algorithm 2
ISODATA - Iterative Self Organizing Data Analysis Technique A
Based on the K-means algorithm, ISODATA adds two additional steps to optimize the
clustering process.
At a suitable stage, e.g. after a number of iterations of steps 2 - 4 in the K-means algorithm,
all the clusters i = 1, 2, ..., nc are examined.
If the number of pixels in a particular cluster is too small, then that particular cluster is
deleted.
If two clusters are too close, then they are merged into one cluster.
2. Splitting a cluster
If the variance of a cluster is too large, that cluster can be divided into two clusters.
These two steps increase the adaptivity of the algorithm but also increase the complexity of
computation. Compared to K-means, ISODATA requires the specification of more
parameters: thresholds for deletion and merging, and a variance limit for splitting. The
variance has to be calculated for each cluster.
In the K-means algorithm, the clustering may not converge. Therefore, we might have to
specify a maximum number of iterations to terminate the clustering process.
This algorithm does not require an image analyst to specify the number of classes
beforehand. It assumes that all pixels are individual clusters and systematically merges
clusters by checking distances between means. This process is continued until all pixels are
in one cluster. The history of merging (fusion) is recorded and they are displayed on a
dendrogram, which is a diagram that shows at what distances the centers of particular
clusters are merged. The following figure shows an example of this algorithm.
This procedure is rarely used in remote sensing because the relatively large number of
pixels taken as initial cluster centres requires a huge amount of disk storage to keep track
of the cluster distances at the various levels. However, this algorithm can be used when a
smaller number of clusters has been obtained previously from some other method.
(1) Generate the multidimensional histogram of the image.
(2) Search for peaks in the multispectral space using an eight-neighbour comparison
strategy, i.e., check whether the centre frequency is the highest in a 3 x 3 grey-level-vector
neighbourhood. For a three-dimensional space, search for the peak in a 3 x 3 x 3 neighbourhood.
(3) If a local highest frequency grey-level vector is found, it is recorded as a cluster center.
(4) After all centers are found, they are examined according to the distance between each
pair of clusters. Certain clusters can be merged if they are close together. If a cluster center
has a low frequency it can be deleted.
The disadvantage of this algorithm is that it requires a large amount of memory (RAM).
For an 8-bit image with only one band, we require 256 x 4 bytes to store the frequencies
(each frequency is a 4-byte integer). As the dimensionality becomes higher, we need
256^nb x 4 bytes of memory; when nb = 3, this amounts to 64 MB. Nevertheless, this
limit could partly be overcome by a grey-level vector reduction algorithm (Gong and
Howarth, 1992a).
The process from remote sensing data to cartographic product can be summarized as
following:
The reference that the remote sensing products are to be compared with is created based on
human generalization. Depending on the scale of the reference map product, linear features
and object boundaries are allowed to have a buffer zone. As long as the boundaries fall in
their respective buffer zones, they are considered correct.
However, this has not been the case in assessing remote sensing products. In the evaluation
of remote sensing products, we have traditionally adopted a hit-or-miss approach, i.e., by
overlaying the reference map on top of the map product obtained from remote sensing,
instead of giving the RS products tolerant buffers.
Some of the classification accuracy assessment algorithms can be found in Rosenfield and
Fitzpatrick-Lins (1986) and Story and Congalton (1986).
The above table is an example confusion matrix. The diagonal elements in this matrix
indicate the numbers of samples for which the classification results agree with the reference
data.
The matrix contains the complete information on the categorical accuracy. The off-diagonal
elements in each row are the numbers of samples that have been misclassified by the
classifier, i.e., the classifier is committing a label to samples that actually belong to other
labels. This misclassification error is called commission error.
The off-diagonal elements in each column are the samples omitted by the classifier; this
misclassification error is therefore called omission error.
In order to summarize the classification results, the most commonly used accuracy measure
is the overall accuracy:
From the example confusion matrix, we obtain an overall accuracy of (28 + 15 + 20)/100 = 63%.
More specific measures are needed because the overall accuracy does not indicate how the
accuracy is distributed across the individual categories. The categories could, and
frequently do, exhibit drastically different accuracies, while the overall accuracy treats
them as having equivalent or similar accuracies.
By examining the confusion matrix, it can be seen that at least two methods can be used to
determine individual category accuracies.
(1) The ratio between the number of correctly classified and the row total
(2) The ratio between the number of correctly classified and the column total
(1) is called the user's accuracy because users are concerned about what percentage of the
samples assigned to each class has been correctly classified.
The producer is more interested in (2) because it tells how correctly the reference samples
are classified.
Kappa coefficient
The Kappa coefficient (κ) measures the agreement beyond chance relative to the expected
disagreement. This measure uses all elements in the matrix, not just the diagonal ones. The
estimate of Kappa is the proportion of agreement after chance agreement is removed from
consideration:
κ = (po - pe) / (1 - pe)
where pij = eij/NT, po = Σ pii is the observed proportion of agreement, and pe = Σ pi+ p+i
is the agreement expected by chance. For the example matrix, po = 0.63.
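The off-diagonal entries of the example matrix are not given in the text, so the matrix below is hypothetical apart from its diagonal (28, 15, 20) and total (100); the sketch shows how the overall accuracy, Kappa, and the user's and producer's accuracies might be computed.

import numpy as np

# Hypothetical confusion matrix with the diagonal quoted in the text.
cm = np.array([[28,  5,  7],
               [ 6, 15,  4],
               [ 5, 10, 20]], dtype=float)

n_total = cm.sum()
p = cm / n_total                              # pij = eij / NT
po = np.trace(p)                              # observed agreement (0.63)
pe = np.sum(p.sum(axis=1) * p.sum(axis=0))    # chance agreement
kappa = (po - pe) / (1 - pe)

users_acc = np.diag(cm) / cm.sum(axis=1)      # per-class user's accuracy
producers_acc = np.diag(cm) / cm.sum(axis=0)  # per-class producer's accuracy

print(f"overall = {po:.2f}, kappa = {kappa:.2f}")
print("user's accuracy:", users_acc.round(2))
print("producer's accuracy:", producers_acc.round(2))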
One of the advantages of using this method is that we can statistically compare two
classification products. For example, two classification maps can be made using different
algorithms, and we can use the same reference data to verify them. Two Kappa coefficients,
κ1 and κ2, can be derived, and for each κ the variance can also be calculated.
A normal distribution table can then be used to determine whether the two κ's are
significantly different, based on the test statistic Z; e.g., if Z > 1.96, the difference is said
to be significant at the 0.95 probability level. Z can be estimated by using the following
equation:
Given the above procedures, we need to know how many samples need to be collected
and where they should be placed.
Sample size
(1) The larger the sample size, the more representative the estimate that can be obtained
and, therefore, the more confidence that can be achieved.
(2) In order to give each class a proper evaluation, a minimum sample size should be
applied to every class.
(3) Researchers have proposed a number of pixel sampling schemes (e.g., Jensen, 1983).
These are:
ï Random
ï Stratified Random
ï Systematic
1. By conventional classification, we refer to the algorithms that make use of only
multispectral information in the classification process.
2.
3. The problem with multispectral classification is that no spatial information in the image
has been utilized. In fact, that is a key difference between human interpretation and
computer-assisted image classification. Human interpretation always involves the use of
spatial information such as texture, shape, shade, size, site and association. While the
strength of computer techniques lies in handling the grey-level values in an image, in
terms of making use of spatial information computer techniques lag far behind. Making
use of spatial patterns in an image is therefore an active field in image understanding
(a subfield of pattern recognition and artificial intelligence).
Preprocessing approach,
Thanks to the development in the image understanding field, we are able to use part of
the spatial information in image classification. Overall, there are two types of approaches
to make use of spatial information.
Object-based classification
In order to classify objects, one has to somehow partition the original imagery. This can be
done with image segmentation techniques that have been introduced previously, such as
thresholding, region-growing and clustering.
The resultant segmented image can then be passed on to the region extraction procedure,
where segments are treated as a whole object for the successive processing.
For instance, we can generate a table for each object as an entity table. From the entity
table, we can proceed with various algorithms to complete classification, or prior to
classification, we may do some preprocessing, such as filtering out some small objects.
We may have to base our classification decision on some neighbourhood information.
Gong and Howarth (1990) have developed a knowledge-based system to conduct a region-
based (object-based) classification.
In a pixel-window based classification, a labelling decision is made for one pixel according
to the multispectral data in a window centred on it. These data contain information not only
on the pixel itself but also on its neighbourhood.
A pixel window can be of any size, as long as it does not exceed the size of an image. For
computational simplicity, however, odd-sized squares are used.
The grey-level variability within a pixel window can be measured and used in a
classification algorithm. The grey-level variability is referred to as texture (Haralick,
1979). The following are some commonly used texture measures. For each pixel window,
we can calculate the parameters listed in Table 1 (Hsu, 1978; Gong and Howarth, 1993):
AVE - average
SKW - skewness
KRT - kurtosis
RXN - range
MED - median
From the grey-level co-occurrence matrix, one can generate a number of parameters
(Haralick et al., 1973). These include:
Homogeneity
Contrast
Entropy, etc.
Although these methods have been used in many remote sensing applications, they require
a large amount of computation and disk space. There are so many parameters that need to
be determined, such as size of pixel-window, distance, angle, statistics, etc.
Most of these spatial features can be categorized into two groups. The first group of spatial
features is similar to an average filtered image. The second group is similar to an edge-
enhanced image.
The simplest example for the post-processing contextual classification is through filtering
such as majority filtering.
Chapter 7
References
Chen, Q., and others, 1989. Remote Sensing and Image Interpretation. Higher Education Press, Beijing, China, (In
Chinese).
Gong P. and P.J. Howarth, 1990a. Land cover to land use conversion: a knowledge-based approach, Technical
Papers, Annual Conference of American Society of Photogrammetry and Remote Sensing, Denver, Colorado, Vol. 4,
pp.447-456.
_____, 1990c. Impreciseness in land-cover classification: its determination, representation and application. The
International Geoscience and Remote Sensing Symposium, IGARSS '90, pp. 929-932.
_____, 1992a. Frequency-based contextual classification and grey-level vector reduction for land-use
identification. Photogrammetric Engineering and Remote Sensing, 58(4):421-437.
_____, 1992b. Land-use classification of SPOT HRV data using a cover-frequency method. International Journal
of Remote Sensing, .
_____, 1993. An assessment of some small window-based spatial features for use in land-cover classification,
IGARSS'93, Tokyo, August 18-22, 1993.
Gonzalez, R. C., and P. Wintz, 1987. Digital Image Processing, 2nd. Ed., Addison-Wesley Publishing Company,
Reading, Mass.
Haralick, R. M., 1979. Statistical and structural approaches to texture. Proceedings of the IEEE, 67(5):786-804.
Haralick, R. M., Shanmugan, K. and Dinstein, I., 1973. Texture features for image classification. IEEE
Transactions on System, Man and Cybernetics, SMC-3(6):610-621.
Hsu, S., 1978. Texture-tone analysis for automated landuse mapping. Photogrammetric Engineering and Remote
Sensing, 44(11):1393-1404.
Jensen, J.R., 1983. Urban/Suburban Land Use Analysis. In R.N. Colwell (editor-in-chief), Manual of Remote
Sensing, Second Edition, American Society of Photogrammetry, Falls Church, USA, pp. 1571-1666.
Lillesand, T. M., and R. W. Kiefer, 1994. Remote Sensing and Image Interpretation. 3rd Edition, John Wiley and
Sons, New York.
Peddle, D., 1991. Unpublished Masters Thesis, Department of Geography, The University of Calgary.
Richards, J. A., 1986. Remote Sensing Digital Image Analysis: An Introduction. Springer-Verlag, Berlin.
Story, M. and R. G. Congalton, 1986. Accuracy assessment, a user's perspective. Photogrammetric Engineering
and Remote Sensing, 52(3):397-399.
Swain, P. H., and S. M. Davis (editors.), 1978. Remote Sensing: The Quantitative Approach. McGraw-Hill, New
York.
Yen, J., 1989. Gertis: a Dempster-Shafer approach to diagnosing hierarchical hypotheses. Communications of the
ACM. 32(5):573-585.
Further Readings
Ball, G. H., and J. D. Hall, 1967. A clustering technique for summarizing multivariate data. Behavioral Science,
12:153-155.
Bezdek, J.C., R. Ehrlich & W. Fall, 1984, FCM: the fuzzy c-means clustering algorithm, Computers and
Geoscience, 10:191-203.
Bishop, Y. M. M., S. E. Feinberg, and P. W. Holland, 1975. Discrete Multivariate Analysis - Theory and Practice.
The MIT Press, Cambridge, Mass.
Chittineni, C. B., 1981. Utilization of spectral-spatial information in the classification of imagery data.
Computer Graphics and Image Processing, 16:305-340.
Cibula, W. G., M. O. Nyquist, 1987, Use of topographic and climatological models in geographical data base
to improve Landsat MSS classification for Olympic national park. Photogrammetric Engineering and Remote
Sensing, 53(1):67-76.
Cohen, J., 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, Vol.
20, No. 1, pp. 37-46.
Congalton, R. G., and R. A. Mead, 1983. A quantitative method to test for consistency and correctness in
photointerpretation. Photogrammetric Engineering and Remote Sensing, 49(1):69-74.
Conners, R. W., and C. A. Harlow, 1980. A theoretical comparison of texture algorithms. IEEE Transactions on
Pattern Analysis and Machine Intelligence, PAMI-2(3): 204-222.
Fleiss, J. L., J. Cohen, and B. S. Everitt, 1969. Large sample standard errors of Kappa and weighted Kappa.
Psychological Bulletin, Vol. 72, No. 5, pp. 323-327.
Fu, K. S and Yu, T. S., 1980. Spatial Pattern Classification Using Contextual Information, Research Studies Press,
Chichester, England.
Fung, T., and E. F. LeDrew, 1987. Land cover change detection with Thematic Mapper spectral/textural data
at the rural-urban fringe. Proceedings of 21st Symposium on Remote Sensing of Environment, Ann Arbor, Mi., Vol. 2,
pp.783-789.
_____, 1988. The determination of optimal threshold levels for change detection using various accuracy
indices. Photogrammetric Engineering and Remote Sensing, 54(10):1449-1454.
Gong, P., D. Marceau, and P. J. Howarth, 1992. A comparison of spatial feature extraction algorithms for
land-use mapping with SPOT HRV data. Remote Sensing of Environment. 40:137-151.
Gong, P., J. R. Miller, J. Freemantle, and B. Chen, 1991. Spectral decomposition of Landsat TM data for urban
land-cover mapping, 14th Canadian Symposium on Remote Sensing, pp.458-461.
Ketting, R. J., and Landgrebe, D. A., 1976. Classification of multispectral image data by extraction and
classification of homogeneous objects. IEEE Transactions on Geoscience and Electronics, GE-14(1):19-26.
Landgrebe, D. A. and E. Malaret, 1986. Noise in remote sensing systems: the effects on classification error.
IEEE Transactions on Geoscience and Remote Sensing, GE-24(2):
8. Integrated Analysis of Multisource Data
1. Spatial Data
Any data associated with a locational aspect are spatial data. In real life, we often ask the question of
where. Where is the bus stop? Where is the post office? Knowing where is a major part of human life.
In our computerized information society, most questions of where can be answered in a computer
system. However, we are not satisfied with knowing only where something is; we may also need to
know how things at a specific location are related, and we want to use what is known to infer what is
unknown, at unknown locations.
From highly urbanized areas to sparsely populated areas, spatial data play an important role in modern
society. In this chapter, we focus on the natural environment, where different types of natural resources,
land covers and uses, and accessibility often concern us. To find out what exists at a particular location,
one would read a map. As surveyors or cartographers, it is our job to make such maps. Traditionally,
one has to go to the field (a particular place) to measure locations and record what exists there; this is
the traditional survey-and-map approach. A second approach is to use aerial photography and remote
sensing, techniques developed since World War I. As the technology advances, we observe a
revolutionary leap in instruments and associated data processing techniques, and satellite-based
technology now occupies an important position in the geomatics field. We begin by asking the
following questions:
_________
An image is a medium for communication; high-resolution TV is a tool for communication.
Knowing how spatial data are collected helps us to appreciate the possible level of errors or
uncertainties involved in the data collection process.
• In what forms are spatial data collected? How is spatial sampling done?
· Random collection
One needs to determine the density of sampling; obviously, the more densely one collects data, the
more likely one is to represent the reality.
Most of the time, we tend to use second-hand spatial data, i.e., currently available data, which are
often in map form.
· Thematic data transfer (from survey, aerial photographs remote sensing images) on to the base map
· Printing
- preserving area
- preserving length
- preserving direction
· Data transfer
· Interpolation or extrapolation
· Printing
• Areal
• Volumetric
• Man-made
• Municipal
• Cadastral, etc.
• Nominal
• Ordinal
• Interval
• Ratio
• Map overlay, for a particular location, collects all the necessary data so as to derive useful
information.
• Similar to decision making in ordinary life, where one needs to accumulate evidence in order to
arrive at a decision, in multi-source data analysis each piece of evidence recorded in the data will be
evaluated to validate certain hypotheses.
• It is the objective of this chapter to examine a number of schemes for integrated analysis of spatial
data. Algorithms developed in pattern recognition and artificial intelligence can be used.
In daily life, we use our sensing organs and brain to recognize things and then make decisions and
take actions. Our sensing organs include eyes, ears, nose, tongue and skin touch. The first three are our
remote sensors. Our sensors pass scenes, sounds, smells, tastes and feelings to our brains; our brains
process and analyze the evidence collected by the different sensors, and then compare it with things in
our memory that have been recognized before, to see whether, based on the data collected, we can
recognize (label) the newly detected thing as one of the things recognized before. If the
recognized thing is a tree in our way, our brain may decide to go around it. In an increasingly
competitive society, in order to make optimized decisions, we have to make the best use of all the
evidence available to arrive at an accurate recognition. In our daily life, we experience thousands of
processes like this: evidence collection - evidence analysis - decision making - action taking. Our
senses have limits; for example, our eyes cannot resolve details that are too far away or too small.
Such details have been made accessible with the help of the telescope and the microscope.
We cannot see in the spectral ranges outside the visible wavelength region, but various detectors
sensitive to different non-visible regions can record images for us to see as if our eyes were sensitive
to those spectral regions. In spatial data handling, our brains cannot memorize exactly the location and
spatial extent that a certain phenomenon occupies; electro-magnetic media can be used to do so. The
evidence volume is so large that our brain can only process a very small amount of it. Therefore, we
need to use computers to assist us to do so. In this chapter, we examine some of the techniques that
can be used in computer assisted handling of various spatial evidence, especially integrated analysis of
spatial evidence from multiple sources, such as from field survey, remote sensing and/or existing map
sources.
Data integration: integrate spatial data from different sources for a single application. What types of
application are we referring to?
• data structures
• data types
• spatial resolutions
• levels of generalization
* Discussion:
Do PM, LM, AM involve scale as their individual components?
* Scale, generalization, error and uncertainty are so closely interrelated that they deserve some
conceptual clarification.
• Aggregation
• Disaggregation
X' = r W X + e
The questions are: do we need to disaggregate our data, and what uncertainties are involved in the
disaggregation process?
Let Ω denote a finite collection of mutually exclusive statements about the world. By ℰ = 2^Ω we
denote the set of all events. The empty set ∅, by definition a subset of every set, is called the impossible
event, since the outcome of a random selection can never be an element of ∅. On the other hand, the set
Ω itself always contains the actual outcome and is therefore called the certain event. If A and B are
events, then so are the union A ∪ B and the complements Ā and B̄. For example, the event A ∪ B occurs
if and only if A occurs or B occurs. We call the pair (Ω, ℰ) the sample space. A function P: ℰ → [0, 1]
is called a probability if it could be induced in the way described, i.e., if it satisfies the following
conditions, well known as the Kolmogorov axioms:
(1) P(A) ≥ 0 for every A ∈ ℰ
(2) P(Ω) = 1
(3) P(A ∪ B) = P(A) + P(B) whenever A ∩ B = ∅
P(A) or P(B) is known as the prior probability of A or B occurring. The prior probability of an event is
not conditioned on the occurrence of any other event. Suppose it is noted in an experiment that for
1 ≤ i ≤ n, the event Ai occurred ki times. Then under the conventional evaluation, called the maximum
likelihood evaluation:
Pm(Ai) = ki / Σj kj .
Under the Bayesian evaluation:
Pb(Ai) = (ki + 1) / (Σj kj + n) ;
under this evaluation, we implicitly assume that each event has already occurred once even before the
experiment commenced. When Σi ki → ∞,
Pm(Ai) → Pb(Ai) .
Nevertheless, for small samples the two evaluations can differ considerably.
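A minimal sketch (with hypothetical event counts) comparing the two evaluations just discussed:

# Sketch: maximum likelihood vs. Bayesian (Laplace) probability evaluation.
# The counts below are hypothetical, only to illustrate the two formulas.

def p_max_likelihood(counts):
    """Pm(Ai) = ki / sum(k): relative frequency of each event."""
    total = sum(counts)
    return [k / total for k in counts]

def p_bayesian(counts):
    """Pb(Ai) = (ki + 1) / (sum(k) + n): assumes each event occurred once beforehand."""
    total, n = sum(counts), len(counts)
    return [(k + 1) / (total + n) for k in counts]

counts = [12, 3, 0, 5]            # observed occurrences of events A1..A4 (hypothetical)
print(p_max_likelihood(counts))   # the third event gets probability 0
print(p_bayesian(counts))         # the third event gets a small but non-zero probability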
Let P(A|B) denote the probability of event A occurring conditioned on event B having already
occurred. P(A|B) is known as the posterior probability of A subject to B, or the conditional probability
of A given B. Some useful properties follow from the axioms:
0 ≤ P(A) ≤ 1
P(Ā) = 1 - P(A)
P(A1 ∪ A2 ∪ ... ∪ An) = S1 - S2 + S3 - S4 + ... + (-1)^(n-1) Sn
where S1 = Σi P(Ai)
S2 = Σ(i<j) P(Ai ∩ Aj)
S3 = Σ(i<j<k) P(Ai ∩ Aj ∩ Ak) , and so on.
The conditional probability is defined as
P(A|B) = P(A ∩ B) / P(B) .
We then have, for mutually exclusive events Ai ⊆ Ω whose union is Ω,
P(B) = Σi P(B|Ai) P(Ai) .
This is the complete probability for event B. Therefore the conditional probability can be written as
P(Ai|B) = P(B|Ai) P(Ai) / Σj P(B|Aj) P(Aj) .
This is the Bayes formula. A number of different versions of this formula will be discussed.
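As a small illustration (with hypothetical priors and likelihoods), the Bayes formula above can be evaluated directly:

# Sketch of the Bayes formula P(Ai|B) = P(B|Ai)P(Ai) / sum_j P(B|Aj)P(Aj).
# Priors and likelihoods below are hypothetical illustration values.

def bayes_posterior(priors, likelihoods):
    """Return the posterior probability of each Ai given that B occurred."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    p_b = sum(joint)                      # complete probability of B
    return [j / p_b for j in joint]

priors = [0.5, 0.3, 0.2]                  # P(A1), P(A2), P(A3)
likelihoods = [0.1, 0.6, 0.3]             # P(B|A1), P(B|A2), P(B|A3)
print(bayes_posterior(priors, likelihoods))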
P(Ā|B) = 1 - P(A|B) .
The odds of event A are defined as
O(A) = P(A) / P(Ā) .
Since P(Ā) = 1 - P(A) , O(A) = P(A) / (1 - P(A)) , and conversely
P(A) = O(A) / (1 + O(A)) .
The conditional odds are
O(A|B) = P(A|B) / P(Ā|B) .
Similarly,
P(A|B) = O(A|B) / (1 + O(A|B)) .
Assume event A is a hypothesis h and event B is a piece of evidence e. With the definition of
conditional probability, the following hold:
P(h|ē) = P(h ∩ ē) / P(ē) ,
P(h̄|ē) = P(h̄ ∩ ē) / P(ē) ,
P(h|e) = P(h ∩ e) / P(e) , and
P(h̄|e) = P(h̄ ∩ e) / P(e) .
Hence
O(h|ē) = P(h|ē) / P(h̄|ē) = [ P(ē|h) / P(ē|h̄) ] · O(h) .
This is called an odds likelihood formulation of the Bayes theorem. Depending on the context, the
following expressions can be used synonymously: e does not occur, e is absent, e does not exist and e
is false.
Similarly,
O(h|e) = [ P(e|h) / P(e|h̄) ] · O(h) .
This is also an odds likelihood formulation of the Bayes theorem. The following expressions can be
used synonymously: e occurs, e is present, e exists and e is true.
For a hypothesis supported by multiple pieces of evidence, and assuming the pieces of evidence are
conditionally independent given h and h̄, generalizing the above gives
O(h|ē1 ∩ ē2 ∩ ... ∩ ēk ∩ ek+1 ∩ ... ∩ em) = [ Π(i=1..k) P(ēi|h) / P(ēi|h̄) ] · [ Π(i=k+1..m) P(ei|h) / P(ei|h̄) ] · O(h) .
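A minimal sketch of this odds-likelihood updating, assuming hypothetical conditional probabilities and conditionally independent evidence:

# Sketch: odds-likelihood updating of a hypothesis h from several pieces of evidence.
# All probabilities below are hypothetical; evidence is assumed conditionally independent.

def odds(p):
    return p / (1.0 - p)

def prob(o):
    return o / (1.0 + o)

def update_odds(prior_p, evidence):
    """evidence: list of (P(e|h), P(e|not h), present) triples."""
    o = odds(prior_p)
    for p_e_h, p_e_nh, present in evidence:
        if present:                      # multiply by P(e|h)/P(e|not h)
            o *= p_e_h / p_e_nh
        else:                            # multiply by P(not e|h)/P(not e|not h)
            o *= (1.0 - p_e_h) / (1.0 - p_e_nh)
    return o

evidence = [(0.8, 0.3, True), (0.6, 0.5, False), (0.9, 0.2, True)]
o_post = update_odds(prior_p=0.4, evidence=evidence)
print(o_post, prob(o_post))              # posterior odds and posterior probability of h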
In logical terms, when e implies h, that is, e → h, this can alternatively be read as 'e is sufficient for h'
or as 'h is necessary for e'. There is no ambiguity between e and h, i.e., the reliability is 100%. In
reality, however, the reliability of e in support of h is lower than that of a logical implication.
A piece of evidence e can usually be in two states: absent or present. When P(e) = 0 or P(e) = 1, it is of
no practical interest, since either way there is nothing to observe; the same applies to h. Therefore, we
shall assume 0 < P(e) < 1 and 0 < P(h) < 1.
To study the necessity and sufficiency measures of e for h, we need to explore the influence that a
state of e has on h. If the state of e makes h more plausible, we say that the state of e encourages h. If it
makes h less plausible, we say that the state of e discourages h. If it neither encourages nor
discourages h, then the state of e has no influence on h, or e and h are independent of each other.
For the necessity measure, we first explore how the absence of e influences h. From
O(h|ē) = [ P(ē|h) / P(ē|h̄) ] · O(h) we define
N = P(ē|h) / P(ē|h̄) , with 0 ≤ N ≤ ∞ ;
N > 1 encourages h, N = 1 has no influence on h, and N < 1 discourages h when e is absent.
Similarly, we define
S = P(e|h) / P(e|h̄) , with S > 1 encouraging h, S = 1 having no influence, and S < 1 discouraging h
when e is present.
From the above analysis, it is clear that N and S are the measures for necessity and sufficiency,
respectively. N, S and O(h), needed to evaluate O(h|ē) and O(h|e), are provided by domain experts.
Quite often, instead of directly supplying N and S, domain experts may supply values of P(e|h) and
P(e|h̄). This amounts to observing the evidential probabilities under the hypothesis h or its negation h̄.
Then
N = P(ē|h) / P(ē|h̄) = (1 - P(e|h)) / (1 - P(e|h̄)) , and
S = P(e|h) / P(e|h̄) .
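As a brief sketch (with hypothetical expert-supplied values), N and S can be derived from P(e|h) and P(e|h̄) and used to update the odds of h:

# Sketch: deriving the necessity (N) and sufficiency (S) measures from
# expert-supplied P(e|h) and P(e|not h); the numbers are hypothetical.

def necessity_sufficiency(p_e_h, p_e_nh):
    S = p_e_h / p_e_nh                       # effect of e being present
    N = (1.0 - p_e_h) / (1.0 - p_e_nh)       # effect of e being absent
    return N, S

N, S = necessity_sufficiency(p_e_h=0.85, p_e_nh=0.25)
prior_odds = 0.4 / 0.6                       # O(h) for a prior P(h) = 0.4
print("O(h|e)     =", S * prior_odds)        # evidence present
print("O(h|not e) =", N * prior_odds)        # evidence absent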
In the above section, it has been explained that, in order to determine the necessity and sufficiency
measures N and S, conditional probabilities such as P(e|h) and P(e|h̄) are provided by domain
experts. Sometimes, the system engineer may have to participate in the process of determining P(e|h)
and P(e|h̄), as will be explained in a later part of this lecture (e.g., classification of land-use/cover types
from remotely sensed images).
In spatial data handling, domain experts may provide us with the spatial data required, or we may be
requested to collect further data from sources such as remote sensing images. Domain experts may also
provide us with their knowledge on where a specific hypothesis has been validated. It may be our
responsibility to transform this type of knowledge into a computer system. The processes of collecting
and encoding expert knowledge are called knowledge acquisition and knowledge representation,
respectively. While various complex computer structures for knowledge representation may be used,
relatively simple procedures such as parametric statistical models or non-parametric look-up tables are
often employed. For the parametric method, a further reading is Richards (1986); for the non-parametric
approach, refer to Duda and Hart (1973). Remote sensing image classification can be considered a
process of hypothesis testing in which remotely sensed data are treated as evidence and a number of
classes represent a list of hypotheses. In remote sensing image classification, the equivalent of the
processes of knowledge acquisition and representation is supervised training (Gong and Howarth,
1990; Gong and Dunlop, 1991).
In a classification problem, we are given a data set X = { xi | i = 1, 2, ... , N }, where each xi is a vector
considered as a piece of evidence. It may support a number of classes (hypotheses) H = { hj | j = 1, 2, ...
, M }. To develop the general method for maximum likelihood classification, the penalty function or
loss function is introduced:
l(j|k) , j, k = 1, ... , M .
This is a measure of the loss or penalty incurred when a piece of evidence supports class hj when
in fact it should support class hk. It is reasonable to assume that l(j|j) = 0 for all j, which implies that
there is no loss when evidence supports the correct class. For a particular piece of evidence xi, the
penalty incurred when xi erroneously supports hj while hk is the correct class is:
l(j|k) · p(hk|xi)
where p(hk|xi) is, as before, the posterior probability that hk is the correct class for evidence xi.
Averaging the penalty over all possible hypotheses, we have the average penalty, called the
conditional average loss, associated with evidence xi supporting class hj. That is:
L(hj) = Σk l(j|k) · p(hk|xi) , k = 1, ... , M .
L is a measure of the accumulated penalty incurred, given that the evidence could have supported any of
the available classes, weighted by the penalty functions relating all these classes to class hj.
Thus a useful decision rule for evaluating a piece of evidence for support of a class is to choose the
class for which the conditional average loss is smallest, i.e.,
xi supports hj if L(hj) < L(hk) for all k ≠ j .
This is the decision rule that implements Bayes' rule. Because p(hk|xi) is usually not available, it is
evaluated through p(xi|hk), p(hk) and p(xi):
p(hk|xi) = p(xi|hk) p(hk) / p(xi) .
Thus
L(hj) = [ Σk l(j|k) · p(xi|hk) p(hk) ] / p(xi) .
Suppose l(j|k) = 1 - Fjk with Fjj = 1 and Fjk otherwise to be defined. Then from the above formula we have
L(hj) = [ Σk p(xi|hk) p(hk) ] / p(xi) - [ Σk Fjk p(xi|hk) p(hk) ] / p(xi)
= 1 - [ Σk Fjk p(xi|hk) p(hk) ] / p(xi) .
The minimum penalty decision rule then becomes searching for the maximum of g(hj), where
g(hj) = Σk Fjk p(xi|hk) p(hk) .
In the special case Fjk = δjk, where δjk = 1 if j = k and 0 otherwise, g(hj) reduces to p(xi|hj) p(hj),
which is the usual maximum likelihood decision rule.
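A small sketch of this minimum-loss (maximum posterior) decision rule, with hypothetical class priors, likelihoods and loss values:

# Sketch: minimum conditional-average-loss classification.
# Priors, likelihoods and the loss matrix below are hypothetical.

def classify(likelihoods, priors, loss):
    """likelihoods[k] = p(x|hk); priors[k] = p(hk); loss[j][k] = l(j|k)."""
    joint = [lk * pk for lk, pk in zip(likelihoods, priors)]
    p_x = sum(joint)
    post = [j / p_x for j in joint]                       # p(hk|x)
    avg_loss = [sum(loss[j][k] * post[k] for k in range(len(post)))
                for j in range(len(post))]                # L(hj)
    return min(range(len(avg_loss)), key=avg_loss.__getitem__)

likelihoods = [0.2, 0.5, 0.3]              # p(x|h1), p(x|h2), p(x|h3)
priors = [0.6, 0.3, 0.1]                   # p(h1), p(h2), p(h3)
loss = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]   # 0-1 loss: reduces to maximum posterior
print("chosen class index:", classify(likelihoods, priors, loss))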
A fuzzy set is a "class" with a continuum of grades of membership (Zadeh, 1965). More often than not,
the classes of objects encountered in the real physical world do not have precisely defined criteria of
membership. For example, the "class of all real numbers which are much greater than 1", or the "class
of beautiful cats", do not constitute classes or sets in the usual mathematical sense of these terms.
However, the fact remains that such imprecisely defined "classes" play an important role in human
thinking, particularly in the domains of pattern recognition and abstraction.
Let Ω, a non-empty set, be the formal basis of our further discussion. The set Ω is often called the
universe of discourse or frame of discernment. Our focus is primarily on finite sets; in such cases, the
number of elements in Ω, its cardinality, is abbreviated |Ω|. Any element in Ω is denoted by ω.
For a specific ω ∈ Ω and any set A, either ω ∈ A or ω ∉ A holds. This is the basic requirement in
ordinary set theory.
A set A is denoted by A = {ω1, ω2, ... , ωn}, where ωi is the ith element of A. When the elements of A
cannot be listed explicitly, A is denoted by { ω | ... }, where the latter part in the brackets is a description
of the elements that are included. In general,
A = { ω | A(ω) is true } ,
where A(ω) is a predicate defined on ω.
A ⊆ B denotes that A is a subset of B, i.e., every element of A is also an element of B.
If A ⊆ B and B ⊆ A, then A = B.
Every set A satisfies A ⊆ Ω.
The empty set is one that does not contain any element of Ω; it is denoted ∅.
For any A on Ω, ∅ ⊆ A ⊆ Ω.
The sets A discussed so far are sets of single elements. When sets A ⊆ Ω become elements of another set
U, U is also a set, sometimes called a set class. The collection of all subsets of Ω is 2^Ω. For
instance, if Ω = {black, white} then 2^Ω = {{black, white}, {black}, {white}, ∅}. In fact, the sets defined
on Ω form a set class. Therefore, a set A defined on Ω is sometimes denoted as A ∈ 2^Ω.
Definition 1. Given A, B ∈ 2^Ω,
A ∪ B = {ω | ω ∈ A or ω ∈ B},
A ∩ B = {ω | ω ∈ A and ω ∈ B},
Ā = {ω | ω ∉ A},
are called the union of A and B, the intersection of A and B, and the complement of A, respectively.
When the '∪', '∩' and complement operators are used in combination, the complement has higher priority
than '∪' and '∩'.
It can be proven that for any Ω and A, B, C ∈ 2^Ω, the following relationships hold:
(A ∪ B)‾ = Ā ∩ B̄ ,
(A ∩ B)‾ = Ā ∪ B̄ ,
A ∪ A = A , A ∩ A = A ,
A ∪ B = B ∪ A , A ∩ B = B ∩ A ,
(A ∪ B) ∪ C = A ∪ (B ∪ C) ,
(A ∩ B) ∩ C = A ∩ (B ∩ C) ,
(A ∩ B) ∪ B = B ,
(A ∪ B) ∩ B = B ,
A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C) ,
A ∪ Ω = Ω , A ∩ Ω = A ,
A ∪ ∅ = A , A ∩ ∅ = ∅ ,
(Ā)‾ = A ,
A ∪ Ā = Ω ,
A ∩ Ā = ∅ .
Definition 3. The difference of A and B is
A - B = A ∩ B̄ ;
in particular, Ā = Ω - A .
f : Ω → F .
A projection is an extension of the concept of a function. For any ω ∈ Ω, there exists an element
φ = f(ω) ∈ F; ω is the original (pre-image) and f(ω) is called the image of ω. If ω1 ≠ ω2 implies
f(ω1) ≠ f(ω2) ,
then f is a one-to-one projection.
For an ordinary set A, the characteristic function is
χA : Ω → { 0, 1 } such that
χA(ω) = 1 if ω ∈ A, and χA(ω) = 0 if ω ∉ A.
The value of the characteristic function of A at ω is χA(ω); it can be regarded as the degree of
membership of ω in A.
A fuzzy set Ã on Ω is defined as follows: for any ω ∈ Ω, there is a number μ_Ã(ω) ∈ [0, 1] which is the
degree of membership of ω in Ã. If, for example, μ_Ã(ω1) = 1, μ_Ã(ω2) = 0.8, μ_Ã(ω3) = 0.4 and
μ_Ã(ω4) = 0, then Ã is a fuzzy set. If Ã is used to represent the concept of "circular shape", then μ_Ã
indicates the degree of circularity of each element in Ω.
A fuzzy set defined on a finite Ω can be represented by a vector. For instance, the "circular shape"
defined on Ω constitutes a fuzzy set which can be written as
Ã = (1, 0.8, 0.4, 0) .
When there may be confusion between different elements, a fuzzy set may be represented as
Ã = μ_Ã(ω1)/ω1 + μ_Ã(ω2)/ω2 + ... + μ_Ã(ωn)/ωn ,
where '/' and '+' denote association and enumeration rather than division and addition.
Example: if age is the universe of discourse, say Ω = {0, 1, 2, ..., 200}, fuzzy sets for "old" and "young"
may be defined through membership functions μ_old(ω) and μ_young(ω). Although Ω is a finite set, we
can treat it as a continuous range between 0 and 200 to generate the curves of the fuzzy sets "old" and
"young".
Definition 6. Given Ã, B̃ ∈ F(Ω), where F(Ω) is the set of all fuzzy sets defined on Ω, the membership
functions for Ã ∪ B̃, Ã ∩ B̃ and the complement of Ã are:
μ_Ã∪B̃(ω) = max( μ_Ã(ω), μ_B̃(ω) ) ,
μ_Ã∩B̃(ω) = min( μ_Ã(ω), μ_B̃(ω) ) , and
μ(ω) = 1 - μ_Ã(ω) for the complement of Ã, respectively .
If any ω whose membership value reaches a threshold λ is considered a member, then the fuzzy set Ã
becomes the ordinary set Aλ = { ω | μ_Ã(ω) ≥ λ }, the λ-level cut of Ã. The decomposition theorem states
that
μ_Ã(ω) = sup over λ ∈ [0, 1] of min( λ, χAλ(ω) ) ,
where χAλ is the characteristic function of Aλ. This theorem and the level cut concept are the
linkages for conversions between fuzzy sets and ordinary sets.
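A minimal sketch of these operations (max/min/complement and the λ-level cut) for fuzzy sets on a small, hypothetical universe:

# Sketch: basic fuzzy set operations on a finite universe.
# The membership values are hypothetical illustration values.

universe = ["w1", "w2", "w3", "w4"]
circular = {"w1": 1.0, "w2": 0.8, "w3": 0.4, "w4": 0.0}   # fuzzy set "circular shape"
large    = {"w1": 0.2, "w2": 0.9, "w3": 0.5, "w4": 0.7}   # another fuzzy set

union        = {w: max(circular[w], large[w]) for w in universe}
intersection = {w: min(circular[w], large[w]) for w in universe}
complement   = {w: 1.0 - circular[w] for w in universe}

def level_cut(fuzzy, lam):
    """Ordinary set obtained by keeping elements with membership >= lambda."""
    return {w for w, m in fuzzy.items() if m >= lam}

print(union)
print(level_cut(circular, 0.5))    # {'w1', 'w2'}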
Fuzzy set theory and probability theory are used to handle two different types of uncertainty. We use
probability to study random phenomena. Each event itself has a distinct meaning and is not uncertain;
however, due to the lack of sufficient conditions, whether a certain event will occur during a process
cannot be determined in advance.
In fuzzy set theory, the concept or event itself does not have a clear definition. For example, for "tall
men", how tall a man must be is not defined. Here, whether a certain phenomenon belongs to the concept
is difficult to determine. We call fuzziness the uncertainty involved in a classification due to imprecise
concept definition. The root of fuzziness is that there exist gradual transitions between phenomena.
Such transitions make it difficult to label a phenomenon as belonging to either this or that class. Fuzzy
set theory is the basis on which we study membership relationships arising from the fuzziness of
phenomena.
Fuzzy statistics is used to estimate the degree of membership or the membership function. In order to do
so, we need to design a fuzzy statistical experiment. In such an experiment, similar to a random
experiment in probability statistics, there are four elements:
1. The universe of discourse Ω ;
2. An element ω in Ω ;
3. An ordinary set A which varies over Ω. A is related to a fuzzy set Ã which corresponds to a fuzzy
concept; each time A is fixed, it represents a deterministic definition of the fuzzy concept as an
approximation.
4. A condition S which contains all the objective and subjective factors related to the definition of the
fuzzy concept and therefore constrains the variation of A.
The purpose of fuzzy statistics is to use a deterministic approach to study uncertainty. The requirement
for a fuzzy statistical experiment is that in each trial a deterministic decision is made on whether ω
belongs to A; therefore, in each trial, A is a definite ordinary set. In fuzzy statistical experiments, ω is
fixed while A is changing. After n trials, the membership frequency is
f = (number of trials in which ω ∈ A) / n .
As n increases, f may stabilize. The stabilized membership frequency is the degree of membership of ω
in Ã. Fuzzy statistics involving more than one fuzzy concept is called multi-phase fuzzy statistics.
In multi-phase fuzzy statistics, each trial assigns ω to one of m phases, i.e.,
e : Ω → Pm .
The results of multi-phase fuzzy statistics enable us to obtain a fuzzy membership function for each
phase on Ω; at each ω, the membership values of the m phases sum to one.
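A small sketch of the membership-frequency idea, using hypothetical trial outcomes in which different trials draw the crisp set A differently:

# Sketch: estimating a degree of membership by fuzzy statistics.
# Each trial gives a crisp decision on whether the fixed element w belongs to A;
# the trial outcomes below are hypothetical.

trials = [True, True, False, True, True, False, True, True, True, False]

def membership_frequency(outcomes):
    """f = (number of trials in which w is judged to belong to A) / n."""
    return sum(outcomes) / len(outcomes)

print(membership_frequency(trials))   # 0.7 -> estimated degree of membership of w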
An important concept needed in fuzzy set theory is that of a fuzzy relation, which generalizes the
conventional set-theoretic notion of a relation. Let Ω1 and Ω2 be two universes. A fuzzy relation R̃ has
the membership function μ_R : Ω1 × Ω2 → [0, 1]. The projection of R̃ on Ω1 is the marginal fuzzy set
with membership function
μ_1(ω1) = sup over ω2 of μ_R(ω1, ω2)
for all (ω1, ω2) ∈ Ω1 × Ω2 .
Based on the above introduction, it can be seen that a fuzzy relation in R, the real number space, is a
fuzzy set in the product space R × R. For example, the relation denoted by x >> y, with x, y ∈ R, may be
regarded as a fuzzy set in R² with a membership function μ(x, y) taking, for instance, the value 0 when x
is not greater than y, 0.7 when x is moderately greater than y, and 1 when x is very much greater than y,
etc.
Let ω0 be an unknown value ranging over a set Ω, and let a piece of imprecise information be given as a
set E, i.e., ω0 ∈ E is known for sure and |E| ≥ 2. If we ask whether another set A contains ω0, there can
be two possible answers:
if A ∩ E ≠ ∅ then it is possible that ω0 ∈ A ;
if A ∩ E = ∅ then it is impossible that ω0 ∈ A .
More generally, when the imprecise information is given as a fuzzy set Ẽ with membership function
μ_Ẽ, a possibility measure
Poss : 2^Ω → [0, 1]
is defined by
Poss(A) = sup { μ_Ẽ(ω) | ω ∈ A } .
For example, for A = { x | x ≤ 3 }, we may obtain Poss(A) = 1 .
The possibility of "not A" tells us about the necessity of the occurrence of A:
Nec(A) = 1 - Poss(Ā) .
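A brief sketch of computing the possibility and necessity of a crisp set A from a fuzzy restriction on the unknown value, with hypothetical membership values:

# Sketch: possibility and necessity of a crisp set A, given a fuzzy restriction
# on where the unknown value may lie. Membership values are hypothetical.

restriction = {1: 0.2, 2: 0.6, 3: 1.0, 4: 0.7, 5: 0.1}   # mu(w) for each candidate w

def possibility(A, mu):
    """Poss(A) = sup { mu(w) | w in A }."""
    return max((mu[w] for w in A if w in mu), default=0.0)

def necessity(A, mu):
    """Nec(A) = 1 - Poss(complement of A)."""
    complement = [w for w in mu if w not in A]
    return 1.0 - possibility(complement, mu)

A = {w for w in restriction if w <= 3}
print(possibility(A, restriction))   # 1.0
print(necessity(A, restriction))     # 1 - max(0.7, 0.1) = 0.3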
In addition to the operations of union and intersection, one can define a number of other ways of
forming combinations of fuzzy sets and relating them to one another.
Algebraic product: given Ã and B̃, the algebraic product of Ã and B̃, denoted ÃB̃, is defined in terms of
the membership functions of Ã and B̃ by
μ_ÃB̃(ω) = μ_Ã(ω) · μ_B̃(ω) .
Algebraic sum: the algebraic sum Ã + B̃ is defined by
μ_Ã+B̃(ω) = μ_Ã(ω) + μ_B̃(ω) ,
provided that 0 ≤ μ_Ã(ω) + μ_B̃(ω) ≤ 1 .
Convex combination: given Ã, B̃ and a weighting fuzzy set Λ̃, the convex combination is
(Ã, B̃; Λ̃) = Λ̃Ã + (1 - Λ̃)B̃ , with
μ_(Ã,B̃;Λ̃)(ω) = μ_Λ̃(ω) · μ_Ã(ω) + (1 - μ_Λ̃(ω)) · μ_B̃(ω) .
Given any fuzzy set C̃ satisfying Ã ∩ B̃ ⊆ C̃ ⊆ Ã ∪ B̃, one can always find a fuzzy set Λ̃ such that
C̃ = (Ã, B̃; Λ̃) .
In fact,
μ_Λ̃(ω) = ( μ_C̃(ω) - μ_B̃(ω) ) / ( μ_Ã(ω) - μ_B̃(ω) ) for ω ∈ Ω .
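A short sketch of the algebraic product, algebraic sum and convex combination for two fuzzy sets on a small hypothetical universe:

# Sketch: algebraic product, algebraic sum and convex combination of fuzzy sets.
# All membership values are hypothetical.

A = {"w1": 0.9, "w2": 0.4, "w3": 0.1}
B = {"w1": 0.3, "w2": 0.5, "w3": 0.2}
L = {"w1": 0.8, "w2": 0.5, "w3": 0.2}     # weighting fuzzy set (Lambda)

product = {w: A[w] * B[w] for w in A}
alg_sum = {w: A[w] + B[w] for w in A if A[w] + B[w] <= 1.0}   # only defined when the sum <= 1
convex  = {w: L[w] * A[w] + (1.0 - L[w]) * B[w] for w in A}

print(product)
print(alg_sum)
print(convex)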
Given spatial data E = {e1, e2, ... , em} from m different sources S1, S2, ... , Sm, one wishes to decide
which hypothesis among the n hypotheses H = {H1, H2, ... , Hn} is most likely to hold. Or, in a
classification problem, one wishes to decide which class among n classes {C1, C2, ... , Cn} is the most
appropriate one into which E should be classified. Formally stated, one wishes to find a projection F
such that
F : S1 × S2 × ... × Sm → H
which satisfies
(1) 0 ≤ F_Hj(E) ≤ 1 for each j, and
(2) Σj F_Hj(E) = 1 .
It requires relatively deep mathematical knowledge to determine a projection from the Cartesian
product space S1 × S2 × ... × Sm to H; the interested reader may find Kruse et al. (1991) a starting point.
The problem may be relaxed by finding a projection from each source Si to H.
Therefore, one may follow the steps listed below to solve the problem posed.
Step 1. Consider each element Hj in H as a fuzzy set H̃j, j = 1, 2, ... , n. Determine the fuzzy membership
function on each source Si, i = 1, 2, ... , m, for each Hj, j = 1, 2, ... , n. Thus a total of m × n
membership functions need to be found; usually, expert knowledge or fuzzy statistics can be used to
determine them.
Step 2. Combine evidences from different sources to validate hypotheses or to conduct classification.
Fuzzy set operations including union, intersection, complement and algebraic operation can be used
for such purposes.
Step 3. Compare combined degree of membership for each hypothesis (class), confirm the hypothesis
with the highest degree of membership.
Gong (1993) and the fuzzy classifier used in a forest ecological classification study (Crain et al., 1993)
both follow this procedure, which still needs to be further validated; a small sketch is given below. The
assumption here is obviously that each hypothesis is independent of the others.
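A minimal sketch of Steps 1-3 above, with hypothetical membership values that might be obtained from expert knowledge or fuzzy statistics for three sources and two hypotheses:

# Sketch: combining fuzzy evidence from multiple sources to rank hypotheses.
# The m x n membership values below are hypothetical (Step 1 output).

memberships = {                      # memberships[hypothesis][source]
    "forest":      {"S1": 0.8, "S2": 0.6, "S3": 0.7},
    "agriculture": {"S1": 0.3, "S2": 0.5, "S3": 0.2},
}

def combine(values, rule="min"):
    """Step 2: combine per-source memberships (fuzzy intersection by default)."""
    return min(values) if rule == "min" else max(values)

combined = {h: combine(src.values()) for h, src in memberships.items()}

# Step 3: confirm the hypothesis with the highest combined membership.
best = max(combined, key=combined.get)
print(combined)     # {'forest': 0.6, 'agriculture': 0.2}
print(best)         # 'forest'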
One of the most exciting developments during the early days of pattern recognition was the
perceptron, the idea that a network of elemental processors arrayed in a manner reminiscent of
biological neural nets might be able to learn how to recognize or classify patterns in an autonomous
manner. However, it was realized that simple linear networks were inadequate for that purpose and
that non-linear networks based on threshold-logic units lacked effective learning algorithms. This
problem was solved by Rumelhart, Hinton and Williams (1986) with the generalized delta rule (GDR)
for learning. In the following section, a neural network model based on the generalized delta rule is
introduced.
8.7.1. The Generalized Delta Rule For the Semilinear Feed Forward Net With Back Propagation
of Error
The architecture of a layered net with feed forward capability is shown below:
In this system architecture, the basic elements are nodes and links ("→"). Nodes are arranged in layers.
Each input node accepts a single value, and each node generates an output value. Depending on the
layer in which a node is located, its output may be used as the input for all nodes in the next layer.
The links between nodes in successive layers carry weight coefficients. For example, wji is the weight
of the link between a node in layer i and a node in layer j. Each node is an arithmetic unit. Nodes in the
same layer are independent of each other and can therefore be implemented in parallel. Except for the
nodes of the input layer i, each node takes its inputs from all the nodes of the preceding layer and uses
the linear combination of those input values as its net input; for a node in layer j, the net input is
uj = Σi wji Oi .
The output of the node is
Oj = f(uj)
where f is the activation function. It often takes the form of a sigmoidal function,
Oj = 1 / ( 1 + exp( -(uj + θj) / θ0 ) ) .
θj serves as a threshold or bias. The effect of a positive θj is to shift the activation function to the left
along the horizontal axis, while the effect of θ0 is to modify the shape of the sigmoid. These effects are
illustrated in the following diagram.
This function allows each node to react to a given input differently: some nodes may be easily activated,
or fired, to generate a high output value when θ0 is low and θj is small; on the contrary, when θ0 is high
and θj is large, a node responds more slowly to the input uj. This is considered analogous to what occurs
in the human neural system, where neurons are activated by different levels of stimuli.
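A small sketch of how θj and θ0 affect the node output described above (all parameter values are hypothetical):

# Sketch: effect of the bias (theta_j) and shape parameter (theta_0)
# on the sigmoidal activation. Parameter values are hypothetical.
import math

def activation(u, theta_j, theta_0):
    """O = 1 / (1 + exp(-(u + theta_j) / theta_0))."""
    return 1.0 / (1.0 + math.exp(-(u + theta_j) / theta_0))

for theta_j, theta_0 in [(0.0, 1.0), (2.0, 1.0), (0.0, 5.0)]:
    outputs = [round(activation(u, theta_j, theta_0), 3) for u in (-4, -2, 0, 2, 4)]
    print(theta_j, theta_0, outputs)   # larger theta_j shifts left; larger theta_0 flattens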
Such a feed-forward network requires a single set of weights and biases that will satisfy all the (input,
output) pairs presented to it. The process of obtaining the weights and biases is called network learning
or training. In the training task, an input pattern Ip = {Ipi}, i = 1, 2, ..., ni, is presented, where ni is the
number of nodes in the input layer and p is the pattern index.
For the given input Ip, we require the network to adjust the set of weights in all the connecting links, and
also all the thresholds in the nodes, such that the desired outputs Tp (= {tpk}, k = 1, 2, ..., nk, where nk is
the number of output nodes) are obtained at the output nodes. Once this adjustment has been
accomplished by the network, another pair of input and output patterns is presented and the network is
asked to learn that association also.
In general, the output Op = {Opk} from the network will not be the same as the target or desired values
Tp. For each pattern, the squared error is
Ep = (1/2) Σk (tpk - Opk)²
and the average system error is
E = (1/P) Σp Ep = (1/(2P)) Σp Σk (tpk - Opk)² ,
where P is the number of patterns and the factor of a half is used purely for mathematical convenience at
a later stage.
The generalized delta rule (GDR) is used to determine the weights and biases. The correct set of
weights is obtained by varying the weights in a manner calculated to reduce the error Ep as rapidly as
possible. In general, different results will be obtained depending on whether one carries out the gradient
search in weight space based on Ep or on E. In the GDR, the determination of weights and thresholds is
carried out by minimizing Ep.
Convergence towards improved values for the weights and thresholds is achieved by taking
incremental changes Δwkj proportional to -∂Ep/∂wkj. The subscript p will be omitted subsequently,
thus
Δwkj = - η ∂E/∂wkj     (1)
where η is the learning rate and E is expressed in terms of the outputs Ok, each of which is the non-linear
output of node k,
Ok = f(uk)
uk = Σj wkj Oj     (2)
By the chain rule,
∂E/∂wkj = (∂E/∂uk) · (∂uk/∂wkj)     (3)
with
∂uk/∂wkj = Oj     (4)
Define
δk = - ∂E/∂uk     (5)
and thus
δk = - ∂E/∂uk = - (∂E/∂Ok) · (∂Ok/∂uk)     (7)
where
∂E/∂Ok = - (tk - Ok)     (8)
and
∂Ok/∂uk = f'k(uk)     (9)
so that, for an output node,
δk = (tk - Ok) · f'k(uk)     (10)
Thus, for the weight on the link from node i to node j,
Δwji = - η ∂E/∂wji
= - η (∂E/∂uj) · (∂uj/∂wji)
= - η (∂E/∂uj) · Oi
= η (- ∂E/∂Oj) · f'j(uj) · Oi
= η δj Oi     (12)
However, for a hidden node j, ∂E/∂Oj is not directly available. It has to be evaluated indirectly in terms
of quantities that are known and other quantities that can be evaluated. Since uk = Σm wkm Om,
- ∂E/∂Oj = - Σk (∂E/∂uk) · (∂uk/∂Oj)
= Σk δk wkj ,
so that δj = f'j(uj) Σk δk wkj for a hidden node. That is, the deltas at internal nodes can be evaluated in
terms of the deltas at a later layer. Thus, starting at the last layer, the output layer, we can evaluate δk
using equation (10); then we can propagate the "error" backward to earlier layers. This is the process of
error back-propagation.
In particular, if the sigmoid is used,
Oj = 1 / ( 1 + exp( -(uj + θj) / θ0 ) )     (18)
then f'j(uj) = Oj (1 - Oj) / θ0 , and
δk = (tk - Ok) Ok (1 - Ok) / θ0  and  δj = ( Oj (1 - Oj) / θ0 ) Σk δk wkj
for the output layer and the hidden layer nodes, respectively.
Note that the number of hidden layers can be greater than one. Although a three-layer network can form
arbitrarily complex decision regions, difficult learning tasks can sometimes be simplified by increasing
the number of internal layers. A preliminary assessment of this algorithm was made for ecological
land systems classification of a selected study site in Manitoba (Gong et al., 1994).
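A compact sketch of the training loop implied by equations (1)-(12), for a small network with one hidden layer; the layer sizes, learning rate and training data are hypothetical, and for brevity the biases are folded into ordinary weights with θ0 taken as 1:

# Sketch: error back-propagation with the generalized delta rule (one hidden layer).
# Network sizes, learning rate and training data are hypothetical; biases are
# treated as weights on a constant input and theta_0 is taken as 1.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(3, 2 + 1))   # hidden layer weights (incl. bias column)
W2 = rng.normal(scale=0.5, size=(1, 3 + 1))   # output layer weights (incl. bias column)
eta = 0.5                                     # learning rate

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)   # input patterns
T = np.array([[0], [1], [1], [0]], dtype=float)               # target outputs

for epoch in range(5000):
    for x, t in zip(X, T):
        # forward pass
        o_i = np.append(x, 1.0)               # input layer output plus bias input
        o_j = sigmoid(W1 @ o_i)               # hidden layer output O_j = f(u_j)
        o_jb = np.append(o_j, 1.0)
        o_k = sigmoid(W2 @ o_jb)              # output layer output O_k
        # backward pass: delta_k = (t_k - O_k) O_k (1 - O_k)    (eq. 10)
        d_k = (t - o_k) * o_k * (1.0 - o_k)
        # delta_j = O_j (1 - O_j) sum_k delta_k w_kj             (hidden layer)
        d_j = o_j * (1.0 - o_j) * (W2[:, :-1].T @ d_k)
        # weight updates: delta_w = eta * delta * O              (eq. 12)
        W2 += eta * np.outer(d_k, o_jb)
        W1 += eta * np.outer(d_j, o_i)

print(sigmoid(W2 @ np.append(sigmoid(W1 @ np.array([1.0, 0.0, 1.0])), 1.0)))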
References
Crain, I.K., Gong, P., Chapman, M.A., 1993. Implementation considerations for uncertainty
management in an ecologically oriented GIS. GIS'93, Vancouver, B.C., pp.167-172.
Duda, R. O. and P. E. Hart, 1973. Pattern Classification and Scene Analysis. Wiley and Sons, New
York, 482p.
Freeman J.A., D. M. Skapura, 1991. Neural Networks, Algorithms, Applications, and Programming
Techniques, Addison-Wesley:New York.
Gong, P., 1993. Change detection using principal component analysis and fuzzy set theory. Canadian
Journal of Remote Sensing. 19(1): 22-9.
Gong, P., and D.J. Dunlop, 1991. Comments on Skidmore and Turner's supervised non-parametric
classifier. PE&RS. 57(1):1311-1313.
Gong, P. and P. J. Howarth, 1990. Land cover to land use conversion: a knowledge-based approach,
Technical Papers, Annual Conference of American Society of Photogrammetry and Remote Sensing,
Denver, Colorado, Vol. 4, pp.447-456.
Gong, P., A. Zhang, J. Chen, R. Hall, I. Corns, 1994. Ecological land systems classification using
multisource data and neural networks. Accepted by GIS'94, Vancouver, B.C., February 1994.
Goodchild, M.F., G. Sun, S. Yang, 1992. Development and test of an error model for categorical data.
International Journal of Geographical Information Systems. 6(2): 87-104.
Kosko, B., 1992. Neural Networks and Fuzzy Systems. Prentice-Hall; Englewood Cliffs, New Jersey.
Kruse, R., E. Schwecke, J. Heinsohn, 1991. Uncertainty and Vagueness in Knowledge Based Systems:
Numerical Methods. Springer-Verlag: New York.
Mark, D. and Csillag, F., 1989. The nature of boundaries on area-class maps. Cartographica, pp. 65-
77.
Pao Y., 1989. Adaptive Pattern Recognition and Neural Networks. Addison-Wesley: Reading, MA.
Richards, J. A., 1986. Remote Sensing Digital Image Analysis: An Introduction. Springer-Verlag,
Berlin.
Shinghal R., 1992. Formal Concepts in Artificial Intelligence, Fundamentals. Chapman & Hall: New
York.