Zhou, Fuqun - A Data Mining Approach For Evaluation of Optimal Time-Series of MODIS Data For Land Cover Mapping at A Regional Level - 2013

ISPRS Journal of Photogrammetry and Remote Sensing 84 (2013) 114–129
A data mining approach for evaluation of optimal time-series of MODIS data for land cover mapping at a regional level
Fuqun Zhou a,⇑, Aining Zhang a, Lawrence Townley-Smith b
a Canada Centre for Remote Sensing, 588 Booth Street, Ottawa, Ontario K1A 0Y7, Canada
b Agriculture and Agri-Food Canada, Science and Technology Branch, 1800 Hamilton Street, Regina, Saskatchewan S4P 4L2, Canada
Article info
Article history: Received 4 January 2013; Received in revised form 24 May 2013; Accepted 17 July 2013; Available online 21 August 2013
Keywords: Land cover; Time-series imagery; Data mining; Information extraction; Optimal variable combination; MODIS data

Abstract
Optical Earth Observation data with moderate spatial resolutions, typically MODIS (Moderate Resolution Imaging Spectroradiometer), are of particular value to environmental applications due to their high temporal and spectral resolutions. Time-series of MODIS data capture dynamic phenomena of vegetation and its environment, and are considered one of the most effective data sources for land cover mapping at a regional and national level. However, the time-series, multiple bands and their derivations such as NDVI constitute a large volume of data that poses a significant challenge for automated mapping of land cover while optimally utilizing the information it contains. In this study, time-series of 10-day cloud-free MODIS composites and their derivatives, NDVI and vegetation phenology information, are fully assessed to determine the optimal data sets for deriving land cover. Three groups of variable combinations of MODIS spectral information and its derived metrics are thoroughly explored to identify the optimal combinations for land cover identification using a data mining tool.
The results, based on the assessment using time-series of MODIS data, show that in general using a longer time period of the time-series data and more spectral bands could lead to more accurate land cover identification than using a shorter period of the time-series and fewer bands. However, we reveal that, with some optimal variable combinations of few bands and a shorter period of time-series data, the highest possible accuracy of land cover classification can be achieved.
Crown copyright © 2013 International Society for Photogrammetry and Remote Sensing, Inc. (ISPRS) Published by Elsevier B.V. All rights reserved.
⇑ Corresponding author. E-mail address: fuqun.zhou@nrcan.gc.ca (F. Zhou).
http://dx.doi.org/10.1016/j.isprsjprs.2013.07.008

1. Introduction

Optical Earth Observation (EO) data with medium spatial resolution, typically MODIS (Moderate Resolution Imaging Spectroradiometer), are of particular value to environmental applications and are considered as the most effective data sources for land cover mapping at regional and national scales due to their high temporal and spectral resolutions. Time-series of EO data, including from Advanced Very High Resolution Radiometer (AVHRR), SPOT VGT, and Moderate Resolution Imaging Spectroradiometer (MODIS), have been widely used in land cover identification (Hirosawa et al., 1996; Kasischke and French, 1997; Liang, 2001; Friedl et al., 2002; Bagan et al., 2005; Latifovic and Pouliot, 2005; Xiao et al., 2005; Matsuoka et al., 2007; Wardlow et al., 2007; Carrão et al., 2008; Zhang et al., 2008; Clark et al., 2010; Lv and Liu, 2010; Wardlow and Egbert, 2010; Colditz et al., 2012a) and for land cover dynamics assessment and land cover change detection (Jakubauskas et al., 2002; Yang and Lo, 2002; Huttich et al., 2007; Donohue et al., 2008; Giriraj et al., 2008; Liu et al., 2011; Alcantara et al., 2012).

Time-series of EO data have many advantages compared to a single time image for land cover identification (Maxwell et al., 2002a) as the former captures dynamic spectral information of vegetation at various vegetation growth stages. In some of these studies, the derived metrics from the time-series EO spectral information such as Normalized Difference Vegetation Index (NDVI) (Fuller, 1998; Azzalil and Menentil, 1999; Weiss et al., 2001; Hill and Donald, 2003; Hilla et al., 2004; Knight et al., 2006; Wardlow and Egbert, 2008; Baldi et al., 2008; Evrendilek and Gulbeyaz, 2008; Kleynhans et al., 2011; Fensholt and Proud, 2012; Neeti et al., 2012) and vegetation phenology are used (Reed et al., 1994; de Beurs and Henebry, 2004; Lupo et al., 2007; Bradley and Mustard, 2008). These derived metrics augment the data dimension and enrich the information for vegetation mapping and monitoring.

However, the time-series, multiple bands and their derivations constitute a large volume of data that poses a significant challenge for automated mapping of land cover. How to optimally use this wealth of information and maximally realize its advantages remains an issue. For example:
(1) Would the utilization of all the information available give the highest accuracy for land cover identification?
(2) Would utilization of only a portion of the time-series of data (for example, a few spectral bands, a shorter time period, or the derived metrics such as NDVI and vegetation phenology) achieve similar or better accuracy than using all the information?
(3) Would utilization of the derived metrics such as NDVI and vegetation phenology metrics combined with spectral bands (fewer bands) result in land cover maps of the same or higher accuracy?
(4) If the answers to questions 2 and/or 3 are positive, what combinations of bands and time periods of data and NDVI or vegetation phenology may lead to the optimal result?

Maxwell et al. (2002a,b) examined AVHRR time-series data (16-day composites) for land cover identification, and addressed some of the above questions. In their first study (Maxwell et al., 2002a), they evaluated the effect of reducing the number of composite periods and altering the spacing of those composite periods on land cover classification accuracy. They found that the number of composite periods can be halved, from fourteen (14) composite dates to seven (7) composite dates, without significantly reducing overall classification accuracy, and that concentrating more composites near the beginning and end of the growing season, as compared to using evenly spaced time periods, produced slightly higher classification values. The study result was similar to the earlier studies by McGwire et al. (1992) and Ehrlich and Lambin (1996).

In another study, Maxwell et al. (2002b) assessed the reduction of spectral dimensionality of the AVHRR bands while retaining information needed for land cover mapping. For AVHRR, they concluded that three channels among five are necessary for effective land cover type discrimination.

Recently, Biradar et al. (2007) assessed the best spectral bands and timing of imagery for land use/land cover class separation using Landsat ETM+ and Terra MODIS data (16-day composites). Their study found that using certain bands and time-series period combinations of both datasets can better separate land cover and land use in their small study area. Although these studies have answered some of the above questions, further studies are needed to fully explore the optimal reduced dataset combinations for land cover identification for potential monitoring operations at a regional scale. Moreover, although NDVI and vegetation phenology derived from time-series of spectral bands are increasingly used for characterizing land cover and detecting land cover change, their contributions along with the spectral bands, as well as their optimal combination for accurate land cover mapping, need a thorough evaluation.

This study tries to answer the above questions by fully exploring the time-series of MODIS 10-day cloud-free composites in the growing season to identify the optimal variable combinations for land cover identification at a regional level. As the number of possible combinations of the time-series of MODIS data and its derivatives is enormous, we have developed and evaluated four variable combination models to identify the optimal combination of variables for land cover mapping, which are described in the section below.

2. Study area, data and method

2.1. Study area

The area under study is the southern agriculture region of Saskatchewan, Canada, as shown in Fig. 1. Saskatchewan is in the central part of the Canadian Prairies and is bounded on the west by Alberta, on the north by the Northwest Territories, on the east by Manitoba of Canada, and on the south by the states of Montana and North Dakota, USA.

Saskatchewan is vast (651,900 km²) and contains two major natural regions: the boreal forest in the north and the prairies in the south. The southern half of Saskatchewan contains almost half of Canada's total cultivated farmland. Saskatchewan continues to be a leader in cereal crop production across the country, supplying 5% of the world's total exported wheat, and is Canada's most important grain-producing region. Saskatchewan also produces oilseeds such as canola, flaxseed, mustard, sunflower and safflower. In the northern part of the province, forestry is also a significant industry. Overall, annual crop land dominates the study region, with substantial pasture/perennial crop land such as alfalfa. Forest land is mainly distributed in the north (http://www.agriculture.gov.sk.ca/Saskatchewan_Picture/, accessed on August 31st, 2012).

2.2. EO data and derived variables

The MODIS sensor aboard the Terra satellite provides global coverage every one to two days, acquiring data in 7 spectral bands suitable for land surface applications, with a spatial resolution of 250 m (bands 1–2) or 500 m (bands 3–7). It is one of the most advanced sensors available for large-scale terrestrial applications (Salomonson et al., 1989). To facilitate use of the datasets, Canada Centre for Remote Sensing has developed a technology to produce 10-day cloud-free composites of Terra MODIS' 7 bands (1–7) covering Canada and North America. In addition, bands 3–7 are downscaled from the original spatial resolution of 500 m to 250 m (Luo et al., 2008). Therefore all the 7 bands of the 10-day cloud-free MODIS composites used in the study have the same spatial resolution and dimension.

Canada is located at high latitude and has only one growing season. For this study, only the data in the growing season (April to October) are included. The selected time-series datasets span 7 months and each month has 3 composites. In total, 21 composites are available in the period of the vegetation growing season. Considering all the 7 bands for each composite, there is a total of 147 'bands' or layers in the selected time series.

Two additional datasets, NDVI and a set of vegetation phenological metrics in the vegetation-growing season, are derived from the time-series of MODIS bands and analyzed. As NDVI is derived from MODIS bands 1 and 2, it has the same length as the MODIS composites, i.e., there are 3 NDVI values per month, and in total, a time-series of 21 NDVI variables is derived during the studied season. Time-series of NDVI of three typical vegetation types (annual crop, native grass and deciduous tree) are shown in Fig. 2a, c, and e. It can be seen that the three types of vegetation, in general, have different NDVI patterns.

Vegetation phenology represents stages of growth or development of plants which occur during a growing season, and applies to both natural vegetation (e.g. trees and grassland) and cultivated crops. NDVI responds to the growth of vegetation, which differs with land cover and can be captured by EO sensors with a high revisit frequency. Vegetation phenological curves can be derived from a time-series of NDVI (Fig. 2) and described by phenological parameters from the curves (Fig. 3). The phenology curves of three typical vegetation types (annual crop, native grass, and deciduous tree) are shown in Fig. 2b, d, and f. In this study, a total of 11 vegetation phenological parameters were derived from the phenology curves using the TIMESAT package (Jönsson and Eklundh, 2003). We discuss these parameters and their applications in detail in later sections.

Fig. 2 shows the typical time-series of NDVI in a growing season and the derived vegetation phenological curve of three different
types of vegetation. The three vegetation types have different NDVI values at various growing stages and therefore different phenological curves. The phenological curves filter out some of the noise in the NDVI time series. The typical annual crop has a narrow shape, indicating a short growing period, and the peak of the curve is often during July. Native grass has a flat but relatively constant NDVI distribution over the growing season. The typical deciduous tree has the largest NDVI over almost all the growing season.

Fig. 1. The study area.

With all the spectral information (time-series of MODIS 10-day cloud-free 7-band composites) and the derived data (time-series of NDVI, and phenology parameters), there are a number of possible variable combinations that could be useful for land cover identification. To fully explore the optimal variable combinations for land cover identification, a data mining methodology is developed using a commercial data mining tool, See5/C5.0.

MODIS became operational in 2000 and there is a circa 2000 land cover map (c2000 map) of the study area which was generated from Landsat ETM images. Therefore year 2000 is selected for the evaluation of the optimal variable combination for the land cover mapping, using the c2000 map (Agriculture and Agri-Food Canada, 2012) as the reference.

2.3. Classification scheme

The circa 2000 land cover map of Saskatchewan has 12 land cover types. Ten of them are included in the study; the other 2 are disregarded because in total they make up only about 0.6% of the land area, and because the objective of the study is land cover identification at a regional level. The 10 land cover types and their descriptions are listed in Table 1. For the study, it is not critical to separate deciduous, coniferous, and mixed forest land cover. They are grouped into a forest land cover type in the study. Wetland was masked out before the mapping. All the analysis and results described below are based on the redefined land cover types.

2.4. Data mining approach

The datasets used in this study, as described above, are time-series of 10-day cloud-free MODIS 7-band composites and their derived NDVI and vegetation phenological metrics. As a variety and a large volume of data is available, the main objective of the study is to identify the optimal spectral and/or spectrally derived variable combinations and time period which could produce land cover maps with the highest possible accuracy. Instead of selecting a few variable combinations subjectively for the evaluation, this study develops 4 variable combination models and thoroughly evaluates all possible variable combinations with various time periods to explore the optimal variable set which can achieve the highest possible accuracy of land cover identification.

Fig. 4 shows the workflow of the data mining process. The input data are stored in the variable space and extracted into a feature space using the data mining process, which uses feature extractors to convert the variables into features. Then the data mining models
are trained and verified using training and verification features respectively before they are deployed to the data of the entire study area.

Fig. 2. Time-series of NDVI variables and phenology curves of three typical vegetation types. At the left are NDVI time series: (a) annual crop; (c) native grass; (e) deciduous tree. At the right, (b), (d), and (f) are the corresponding fitted phenology curves.

The data mining tool used in the study is See5/C5.0, a commercial and generic tool that extracts patterns from the input data into the form of a decision tree or rule sets for aiding decision-making. See5/C5.0 has been applied by several studies in land cover classification based on remote sensing (Keane et al., 2004; Pal and Mather 2003; Kumar et al., 2010; Evrendilek and Gulbeyaz, 2011; Boryan et al., 2011; Boulila et al., 2011; Colditz et al., 2012b; Schneider, 2012). It has been designed to analyze large volumes of data and incorporates innovations such as adaptive boosting, which constructs multiple classifiers for ensemble classification. When boosting is enabled, multiple classifiers are created; each classifier votes for its predicted classes and the votes are counted to determine the final classes (Quinlan, 1996). In these studies, ten ensemble classifiers (See5/C5 boost option with 10 trials) have been found to be optimal for improving accuracy and were used for all the models.

Fig. 3. Illustration of a phenology curve and some of the phenology parameters (the original figure from Eklundh and Jonsson (2012)). The black line is the NDVI curve and the red line is the fitted function. The phenology features include: (a) date of beginning of growing season; (b) date of end of season; (c) date of reaching 80% of seasonal amplitude (rising); (d) date of reaching 80% of seasonal amplitude (declining); (e) maximum value; (f) seasonal amplitude; (g) growing season length; (h) small seasonal integral (area between the base level and the phenology curve); and (i) large seasonal integral (area below phenology curve).

The variables of the input EO and EO-derived metrics in the variable space can be described as:

(a) Spectral data of MODIS time-series of 10-day cloud-free composites: spectral(b_t), where b ∈ (1, …, 7) (band index of the cloud-free composite) and t ∈ (1, …, 21) (time index of the time-series of cloud-free composites).
(b) Derived NDVI time-series: NDVI(t), where t ∈ (1, …, 21) (time index of the NDVI time-series).
(c) Vegetation phenological parameters: VPP(k), where k ∈ (1, …, 11) (index of phenology parameters).

As shown above and in Fig. 4, the variable space contains time-series of spectral information spectral(b_t), derived NDVI(t), and vegetation phenology parameters VPP(k). Four feature models are constructed to extract variable combinations from the variable space into the feature space; each of them generates a large number of feature sets:

\[ \xi_1(m, n) = \sum_{b=1}^{m} \sum_{t=1}^{n} \mathrm{spectral}(b_t), \quad m \ge 3 \qquad (1) \]

\[ \xi_2(m, n) = \sum_{t=1}^{n} \sum_{b=3}^{m} \bigl( \mathrm{NDVI}(t) + \mathrm{spectral}(b_t) \bigr), \quad b \ge 3,\ m \ge b \qquad (2) \]

\[ \xi_3(m, n) = \sum_{k=1}^{11} \mathrm{VPP}(k) + \sum_{b=3}^{m} \sum_{t=1}^{n} \mathrm{spectral}(b_t), \quad b \ge 3,\ m \ge b \qquad (3) \]

\[ \xi_4(l) = \sum_{k=1}^{l} \mathrm{VPP}(k) \qquad (4) \]

where \xi(m, n) is the feature model with parameters m and n; m ∈ (1, …, 7) the MODIS band index; n ∈ (1, …, 21) a composite index of the time-series; l ∈ (1, …, 11) the phenology parameter index; and \sum_{\min}^{\max} is the operator of variable combinations from min to max (max ≥ min).

Model 1, \xi_1(m, n), generates features with a minimum of 3 and a maximum of 7 spectral bands over various time-series periods, from one single time stamp to the entire length of the time-series; Model 2, \xi_2(m, n), generates features of NDVI and at least 1 and at most 5 spectral bands (excluding bands 1 and 2) over various lengths of the time-series; Model 3, \xi_3(m, n), generates features of vegetation phenology parameters and at least 1 and at most 5 spectral bands (excluding bands 1 and 2) with various time-series lengths; and Model 4, \xi_4(l), generates features of various combinations of vegetation phenology parameters. For Models 2 and 3, as shown in Eqs. (2) and (3), in order to avoid redundancy, bands 1 and 2 are excluded from the feature extraction when spectral bands are combined with NDVI or phenology metrics.
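The redundancy argument can be made explicit with the NDVI formula, which the text does not write out. As a sketch under the assumption that the standard NDVI definition and the usual MODIS band assignment apply (band 1 = red, band 2 = near-infrared), each value of the NDVI time-series would be:

\[ \mathrm{NDVI}(t) = \frac{\mathrm{spectral}(2_t) - \mathrm{spectral}(1_t)}{\mathrm{spectral}(2_t) + \mathrm{spectral}(1_t)}, \quad t \in (1, \ldots, 21) \]

Under that assumption NDVI(t) is a function of bands 1 and 2 alone, and the phenology parameters VPP(k) are in turn fitted to the NDVI time-series, so pairing either of them with bands 1 and 2 would add little independent information. This is consistent with the band index starting at b = 3 in Eqs. (2) and (3).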
Table 1
Major land cover types and their description.

Land cover type | Description
Annual cropland | Annually cultivated cropland and woody perennial crops. Includes annual field crops, vegetables, summer fallow, orchards and vineyards
Water bodies | Lakes, reservoirs, rivers, streams, salt water, etc.
Developed land | Land predominantly built-up or developed; including vegetation associated with these cover conditions. This may include road surfaces, railway surfaces, buildings and paved surfaces, urban areas, parks, industrial sites, mine structures and farmsteads
Native grassland | Predominantly native grasses and other herbaceous vegetation, may include some shrubland cover. Land used for range or native unimproved pasture may appear in this class
Perennial (perennial cropland and pasture) | Periodically cultivated cropland. Includes tame grasses and other perennial crops such as alfalfa and clover grown alone or as mixtures for hay, pasture or seed
Shrubland | Predominantly woody vegetation of relatively low height (generally ±2 m)
Forest land |
  Deciduous forest | Predominantly broadleaf/deciduous forests or treed areas
  Coniferous forest | Predominantly coniferous forests or treed areas
  Mixed forest | Mixed coniferous and broadleaf/deciduous forests or treed areas
Wetland | Land with a water table near/at/above soil surface for enough time to promote wetland or aquatic processes (semi-permanent or permanent wetland vegetation, including fens, bogs, swamps, sloughs, marshes, etc.)
[Figure: workflow diagram. Boxes: Training Data; Verification Data; Variable Space (Spectral(bt), NDVI(t), VPP(k)); Extractor; Feature space (Feature of variable combinations); Data Mining See5/C5.0; Decision Tree for Land Cover Classification; Land Cover Classification.]
Fig. 4. EO-based data mining procedures for land cover identification.

In the feature space, each feature set, i.e., a combination of variables with a time period (either a single time slot of one composite, or multiple time frames of multiple composites) generated by Model 1, 2, 3 or 4, is processed by the data mining tool and then evaluated to determine if any variable combination can sufficiently identify all the land cover types of the study.

2.5. Sampling strategies

Point samples, like ground reference collected from field surveys using GPS, are not the best representations for either model training or verification in a land cover study using MODIS images, due to the medium spatial resolution of the imagery. As the ground dimension of the MODIS data used in the study is 250 m by 250 m, a pixel may be a mixture of two or more land cover types on the ground. If one of the land cover types in such a pixel is sampled and then used to represent the entire pixel either for model training or verification, the result does not reflect the true situation. To avoid this, an area sampling method is used in the study instead.

A pixel of EO images (especially with a medium or coarse spatial resolution) could have three land cover situations: (1) only one land cover type; (2) a dominant land cover; or (3) mixed land cover types. A pixel with only one land cover type (situation 1) is called a homogeneous pixel. A pixel with a dominant land cover (situation 2) is one in which there are two or more land cover types within the ground dimension of the pixel, but one land cover type covers more than half of the ground area and the other land cover types occupy the rest of the pixel. A pixel in situation 3 can be called a heterogeneous pixel, in which the ground surface of the pixel is covered by multiple land cover types without a dominant one.

Homogeneity, dominance, and heterogeneity as described above are relative terms and change depending on the scale of a study. When they are applied to EO-based applications, they are determined by the spatial resolution of imagery in relation to the size of features on the ground. For this study, the three types of landscape settings are distinguished within the dimension of a MODIS pixel.

The c2000 map was used to identify and evaluate the distribution of these pixel types within the study area. The c2000 map is raster-based and generated mainly from Landsat ETM images, and its spatial resolution is 30 m. Each pixel of the map is assigned one land cover; therefore every pixel of the reference map can be considered a homogeneous one. When a MODIS pixel is superimposed on the land cover map, it covers about 64 pixels of the land cover map. A MODIS pixel is identified as homogeneous if and only if all the corresponding 64 pixels of the c2000 map have the same land cover, as dominant if more than one land cover type is found but 32 or more sub-pixels have the same land cover (the dominant one), or as heterogeneous otherwise (other land cover combinations). Within the study area, the proportions of homogeneous and dominant (MODIS) pixels are 48.16% and 47.76%, respectively. In total, they occupy about 96% of the study area. Only about 4% of the area is occupied by multiple land cover types (heterogeneity) without a dominant land cover as defined above.

In this study, we used only homogeneous samples for model training, but had two sampling strategies for model verification: (1) homogeneous samples only and (2) dominant and homogeneous samples. The latter samples represent about 96% of the landscape of the study region. All the samples were selected randomly within the study area and 1000 samples were gathered independently for each of the land cover types. Deciduous, coniferous, and mixed forest were sampled and classified separately, but their results were aggregated as forest land. Therefore, as a group, forest land has 3000 samples in total.

3. Result and analysis

3.1. Results and analysis of homogeneous pixels for training and verification

Four models were evaluated using the methodology developed in Section 2.4: (1) Model 1 (Eq. (1)) of spectral band combinations only, (2) Model 2 (Eq. (2)) of NDVI/spectral band combinations, (3) Model 3 (Eq. (3)) of phenology parameter/band combinations, and (4) Model 4 (Eq. (4)) of phenology parameter combinations only. This section discusses the results of the first 3 models (Models 1, 2 and 3) of the variable combinations; the results of Model 4 are discussed in Section 3.6.

In the first three groups of variable combinations, each variable is associated with a time period. To simplify the analysis, the unit of the time-series period is 1 month. Therefore all the 3 composites within a month are arranged as a unit. That means that if a month of the time-series is considered for evaluation, all the data of the three composites within the month are used.

Table 2 shows the lowest and highest Kappa values of the three models for land cover identification using homogeneous pixels for both training and verification processes. The highest Kappa values of the three models are very similar, but the lowest values vary widely. The combination of NDVI and one spectral band of a single time stamp (1 month) yields the lowest Kappa (0.55), while the combination of band and vegetation phenology parameters yields the highest low-end Kappa (0.75). An analysis of the Kappa values of different variable combinations reveals that, in general, the longer the time-series of data, the higher the Kappa value, and thus the higher the identification accuracy. This observation suggests that time series data are more informative than a single snapshot image for pattern recognition of land cover. For example, the low-end Kappa for phenology/spectral band combinations had a much larger value than that of the other two combinations, as phenology parameters are derived from all the data in a full growing season.

In the following sections, the analysis is based on the accuracy of land cover classification against the verification samples, instead of using the Kappa value. Table 3 shows the result of one of the optimal NDVI/band variable combinations (one from the variable combination group with the highest Kappa). The confusion matrix patterns of the land cover types are typical of the 3 models with the highest Kappa. The numbers in bold in the diagonal of the matrix represent the pixels that are correctly classified and suggest that the optimal variable combinations (the highest overall accuracy) can discriminate most of the land cover types with an overall average accuracy of about 89%. The land covers of water bodies, forest land, annual cropland, and developed land all have accuracies over 90%. Perennial land has the lowest accuracy (76%). Among the land covers with accuracies below 80%, perennial cover and native grassland are often misclassified as each other (130 and 184 out of 1000 samples, respectively). The perennial land includes seeded grasslands, which are similar to native grassland in spectral signature and phenology. Shrubland and forest are also misclassified as each other, as the main difference between them is height and density. In many situations, they likely have similar spectral signatures and phenology.

Although the native grassland and perennial land are difficult to separate from each other, they are easily distinguished from the other land cover types. Only 8 out of 1000 samples of perennial land are classified as forest land cover, and no grassland is mislabeled as forest land. Twenty-six and 50 out of 1000 samples of grassland and perennial land are misclassified as cropland, respectively. Developed land has an accuracy of 93%. Although it is not confused with water bodies, it is slightly confused with all other types of land cover. Water has the least confusion with others.
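The homogeneous/dominant/heterogeneous rule from Section 2.5 can be sketched in a few lines. This is only an illustration of the stated thresholds (all 64 sub-pixels equal, or 32 or more equal), not the authors' implementation; the function name `classify_modis_pixel`, the `dominant_min` parameter and the string labels are hypothetical.

```python
from collections import Counter

def classify_modis_pixel(sub_labels, dominant_min=32):
    """Label one MODIS pixel from the land cover codes of its c2000 sub-pixels.

    Assumption: `sub_labels` holds the (about 64) 30 m reference labels that
    fall inside one 250 m MODIS pixel; `dominant_min` follows the "32 or more
    sub-pixels" rule quoted in the text.
    """
    counts = Counter(sub_labels)
    top_label, top_count = counts.most_common(1)[0]
    if top_count == len(sub_labels):
        return "homogeneous", top_label      # every sub-pixel agrees
    if top_count >= dominant_min:
        return "dominant", top_label         # one type reaches the threshold
    return "heterogeneous", None             # mixed, no dominant type

print(classify_modis_pixel(["crop"] * 64))
print(classify_modis_pixel(["crop"] * 40 + ["grass"] * 24))
print(classify_modis_pixel(["crop"] * 22 + ["grass"] * 21 + ["shrub"] * 21))
```

An exact 32/32 split is an edge case the text does not resolve; this sketch reports it as dominant for whichever label `Counter` lists first.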
Table 2
Kappa of land cover classification.

Model | No. of variable combinations | Lowest Kappa | Highest Kappa
Spectral bands with various time-series periods | 12,573 | 0.58 | 0.84
NDVI + spectral bands with various time-series periods | 3937 | 0.55 | 0.85
Phenology metrics + spectral bands with various time-series periods | 3937 | 0.75 | 0.84
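The combination counts in Table 2 can be reproduced by direct enumeration. The sketch below assumes that a "time-series period" is any non-empty subset of the 7 growing-season months (an inference from the counts, not a statement in the text) and that band subsets follow the limits given for Models 1 to 3; the helper names `subsets` and `count_feature_sets` are hypothetical.

```python
from itertools import combinations

MONTHS = ["Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct"]

def subsets(items, min_size=1):
    """Yield every subset of `items` with at least `min_size` elements."""
    items = list(items)
    for size in range(min_size, len(items) + 1):
        yield from combinations(items, size)

def count_feature_sets(bands, min_bands):
    """Count (band subset, month subset) pairs for one model."""
    band_sets = sum(1 for _ in subsets(bands, min_bands))
    month_sets = sum(1 for _ in subsets(MONTHS, 1))  # assumed: any non-empty set of months
    return band_sets * month_sets

model1 = count_feature_sets(range(1, 8), min_bands=3)  # bands 1-7, at least 3 bands
model2 = count_feature_sets(range(3, 8), min_bands=1)  # NDVI + at least 1 of bands 3-7
model3 = count_feature_sets(range(3, 8), min_bands=1)  # phenology + at least 1 of bands 3-7
print(model1, model2, model3)  # 12573 3937 3937
```

Under these assumptions there are 99 band subsets for Model 1 and 31 for Models 2 and 3, each paired with 127 month subsets, which matches the 12,573 and 3937 reported in Table 2.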
Table 3
Accuracy matrix of a band/NDVI combination for land cover classification (one of the best variable combinations).
Land cover type Cropland Forest land Shrub land Grassland Perennial Developed Water %
Cropland 934 12 36 18 93.40
Forest land 4 2872 102 22 95.73
Shrubland 10 118 798 22 44 6 2 79.80
Grassland 26 8 760 184 20 2 76.00
Perennial 50 8 26 130 758 28 75.80
Developed 22 6 8 12 26 926 92.60
Water 2 2 2 2 992 99.20
Average 87.50
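Table 3 reports an average of 87.50%, while the text quotes an overall accuracy of about 89%. The two figures appear to be different summaries of the same matrix, as the following sketch suggests. It uses only the diagonal counts from Table 3 and the sample sizes stated in Section 2.5 (1000 per class, 3000 for forest land); the variable names are illustrative.

```python
# Correctly classified samples per class (diagonal of Table 3).
correct = {
    "Cropland": 934, "Forest land": 2872, "Shrubland": 798, "Grassland": 760,
    "Perennial": 758, "Developed": 926, "Water": 992,
}
# Verification samples per class: 1000 each, 3000 for the aggregated forest class.
totals = {name: 1000 for name in correct}
totals["Forest land"] = 3000

# Per-class accuracy in percent (last column of Table 3).
per_class = {name: 100.0 * correct[name] / totals[name] for name in correct}

# Unweighted mean of the per-class accuracies ("Average" row of Table 3).
mean_class_accuracy = sum(per_class.values()) / len(per_class)

# Sample-weighted overall accuracy: all correct samples over all samples.
overall_accuracy = 100.0 * sum(correct.values()) / sum(totals.values())

print(round(mean_class_accuracy, 2), round(overall_accuracy, 2))  # 87.5 89.33
```

The unweighted mean gives 87.50%, while weighting by sample count gives about 89.3% because the well-classified forest class contributes three times as many samples.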
3.2. Results of homogeneous pixels for training and dominant pixels for verification

In the study region, about 52% of the land cover is not homogeneous based on the spatial dimension of a MODIS pixel and the 30 m c2000 map. Therefore, we conducted an evaluation using homogeneous samples for model training, but dominant land cover samples for model verification. Statistically, the verification samples represent about 96% of all the MODIS pixels of the study region. The same three groups of features (Models 1, 2, and 3) described above were evaluated. Table 4 lists the agreement percentage for each land cover type from the best result of each of the three groups of variable combinations.

In comparison with the results of using homogeneous pixels for both training and verification, the optimal variable combinations of the three groups using homogeneous pixels for training and dominant pixels for verification yield similar but lower accuracy. However, the overall averaged accuracy of all the land cover identification is above 80%. Although there are some differences in the results for each land cover type, overall they are similar to the results using homogeneous samples: cropland, forest land and water bodies have the highest accuracy, and perennial and grassland have the lowest.

It can be seen, from comparison of Tables 3 and 4, that the overall averaged accuracy decreases by about 9% using dominant pixels for verification, which is likely caused by land cover mixing within a pixel. The accuracy of the cropland cover type decreases the least, from 93% to 90%. This is because cropland in Saskatchewan is usually in large parcels relative to the MODIS pixel size. Once cropland becomes the dominant land cover within a pixel, it is likely that most, if not all, of the 64 sub-pixels of a MODIS pixel are cropland. The accuracy of developed land cover decreases the most, from about 92% to about 75%, and the accuracy of water bodies drops by more than 10%. Other land covers yield a lower accuracy (about 4–8% lower). The results are explainable because the land covers other than the dominant one within a MODIS pixel contribute to the spectral information, which makes it more difficult to accurately identify the dominant land cover. The degree of the confusion may depend on the number and the types of land covers within the pixels of a dominant land cover.

Table 4
Maximum overall averaged accuracy (%) of land cover identification using dominant pixels for verification (one of the best combinations of each group).
Land cover type | Model 1 (Eq. (1)) | Model 2 (Eq. (2)) | Model 3 (Eq. (3))
Cropland | 89.20 | 89.80 | 90.40
Forest land | 86.29 | 85.56 | 84.53
Shrubland | 71.00 | 72.40 | 71.80
Grassland | 71.20 | 71.60 | 71.60
Perennial | 71.40 | 71.20 | 71.40
Developed | 74.20 | 75.60 | 75.00
Water | 86.60 | 86.20 | 89.40
Overall average | 80.27 | 80.39 | 80.35

3.3. Time-series of band variable combinations

In Section 3.2, the best results of land cover identification from all possible band variable combinations of various time-series periods were presented. In this section, we present and analyze the statistics of all the results of band variable combinations (Eq. (1)) based on the evaluation of homogeneous land cover samples for model training, and dominant land cover samples for validation. As described in Section 3.1, in order to make the statistical analysis concise, the time series of data are assessed in a unit of 1 month, leading to a total of 12,573 possible band variable combinations of the time-series data, with the condition that any variable combination must have at least 3 bands. For example, a combination can be bands 1, 2, and 6 in April; or bands 1, 2, and 6 in April and May; or bands 1, 2, 3, and 6 in June, July and August, and so on.

Two groups of statistical results for the band variable combinations are analyzed. One group presents the results of a fixed number of bands vs. various time-series periods; the other presents the results of a fixed time-series period vs. various numbers of bands. Fig. 5 shows the overall averaged accuracy of land cover classification with a fixed number of bands and various time-series periods, and Fig. 6 shows the overall averaged accuracy of land cover classification with a fixed time-series period and various numbers of bands.

As shown in Fig. 5, in general, when the number of bands is fixed, the overall averaged accuracy increases with the time-series length. For example, 3-band combinations from a 1-month time period could produce an overall averaged accuracy of about 73%; when the length of the time-series is increased to 2 months, the averaged accuracy is improved by about 3% to about 76%. This implies that the reduced accuracy of a smaller number of bands can be compensated by a longer time-series length. This
[Figure 5: panels (a)–(e), one per fixed number of bands (No. of Bands: 3 to 7); x-axis: Time-series length (in month), 1–7; y-axis: Accuracy (%), 70–82.]
Fig. 5. Overall averaged accuracy and uncertainty of band combinations grouped by a fixed number of bands and various time periods. The uncertainty is the standard deviation.
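The grouping behind Fig. 5 can be sketched as follows: results are grouped by time-series length for a fixed number of bands, and each group is summarized by its mean and standard deviation. The accuracy values below are hypothetical and serve only to illustrate the computation; whether the population or the sample standard deviation was used is not stated in this section, so the population form is an assumption.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical results: (number of bands, time-series length in months,
# overall accuracy in %). Illustrative values, not data from the study.
results = [
    (3, 1, 71.0), (3, 1, 75.0),
    (3, 2, 75.0), (3, 2, 77.0),
    (4, 1, 74.0), (4, 1, 76.0),
]


def summarize(results, n_bands):
    """Mean and standard deviation of accuracy per time-series length."""
    groups = defaultdict(list)
    for bands, months, accuracy in results:
        if bands == n_bands:
            groups[months].append(accuracy)
    return {m: (mean(a), pstdev(a)) for m, a in sorted(groups.items())}


summary = summarize(results, n_bands=3)
for months, (avg, sd) in summary.items():
    print(f"{months} month(s): {avg:.1f} +/- {sd:.1f}")
```

A wider spread within a group means that the choice of the specific bands and months matters more for that group.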
observation applies to all the other cases of a fixed number of band combinations. However, the accuracy improvement with increasing time-series length is not linear: the incremental improvement becomes smaller as the time-series length increases. In one case (all 7 bands), increasing the time-series length from 6 to 7 months causes a slight reduction (approximately 0.5%) in accuracy. A similar phenomenon was reported by Pouliot et al. (2009) and is caused by feature saturation.

These patterns indicate that time-series data do help improve the land cover classification accuracy, but there are diminishing returns from increasing the length, and eventually the improvement is small. For example, when 5 bands are used, the overall accuracy increases by only about 1% from a time-series length of 3 months to 7 months. As data with a longer time-series length involve more processing (such as geometrical and radiometric corrections) and more computation, there is a trade-off between computational cost and accuracy improvement.

The uncertainty bars shown in Fig. 5 are the standard deviation of all the results of the possible combinations of each group. In general, the accuracy of the combinations with fewer bands and a shorter time-series period has a larger uncertainty than that of the combinations with more bands and a longer time-series. This means that when fewer bands with a shorter time-series length are used in land cover classification, choosing a 'right' band combination for a better accuracy is more critical than when using the data of more bands and a longer time-series.

Similarly, as shown in Fig. 6, when the time-series period is fixed, the overall averaged accuracy is higher when more bands are utilized. There are also diminishing returns from increasing the number of bands, particularly above 4 bands. Moreover, when the length of the time-series is 5 months or longer, there is almost no improvement from 6 bands to 7. In the case of the 7-month period, the accuracy even decreases from 6 bands to 7 bands.

It is worth mentioning that, as discussed above, more uncertainty is associated with variable combinations of a shorter time-series length and fewer bands. Therefore, carefully selecting the bands and the time period for an optimal variable combination is necessary to achieve the most accurate output possible, especially when a short period and fewer bands are used.

Of the 12,573 possible combinations (Eq. (1)) with various numbers of bands and time-series periods, there are 439 combinations whose accuracy is over 80%. As shown in Fig. 7a, among the 439 band combinations that produced higher accuracy classifications, bands 1 and 2 have the highest occurrences: band 2 is part of all the band combinations, and band 1 is involved in 95% of the band combinations. They are followed by band 6. Band 5 has the lowest occurrence, with bands 4 and 7 having only slightly higher occurrences than band 5. Therefore, bands 1, 2 and 6 have an advantage over other bands in generating higher accuracy outputs in terms of band selection. The better performance of bands 1 and 2 is due to their higher (original) spatial resolution and to their position in the red and near infrared spectral ranges,
[Figure 6: panels (a)–(g), one per fixed time-series period (1 to 7 months); x-axis: No. of Bands, 3–7; y-axis: Accuracy (%), 70–82.]
Fig. 6. Overall averaged accuracy and uncertainty of various number of spectral band combinations grouped by a fixed time period. The uncertainty is the standard deviation.
[Figure 7: (a) Histogram of individual band for overall accuracy over 80%; x-axis: Band, 1–7; y-axis: Occurrence. (b) Histogram of individual month for overall accuracy over 80%; x-axis: April to October; y-axis: Occurrence.]
Fig. 7. Histogram of individual band (a) and month (b) for overall averaged accuracy over 80%.
[Figure 8: (a) Histogram of number of bands; y-axis: Occurrence. (b) Percent of number of bands; y-axis: Percent (%). x-axis in both panels: Number of Bands, 3–7.]
Fig. 8. Histogram of the bands (a) and percentage (b) for overall averaged accuracy over 80%.
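The difference between the occurrence counts of Fig. 8a and the percentages of Fig. 8b can be illustrated with a short calculation. The hit counts for 4 and 5 bands (137 and 159) are taken from the text; the denominators assume, as an illustration only, that every choice of k out of 7 bands is paired with every non-empty subset of 7 months. The resulting percentages are derived under that assumption and are not read from Fig. 8b.

```python
from math import comb

N_PERIODS = 2**7 - 1   # non-empty subsets of 7 monthly units (assumption)


def possible(k):
    """Number of possible combinations that use exactly k of the 7 bands."""
    return comb(7, k) * N_PERIODS


# Combinations with overall accuracy over 80%, as reported in the text.
hits = {4: 137, 5: 159}

# Share of each group that exceeds 80% accuracy, in percent.
share = {k: 100 * h / possible(k) for k, h in hits.items()}

for k in sorted(share):
    print(f"{k} bands: {hits[k]} of {possible(k)} ({share[k]:.1f}%)")
print("possible 7-band combinations:", possible(7))
```

Because far fewer combinations exist with all 7 bands than with 4 or 5, a group can have a low occurrence count and still the highest share of high-accuracy results.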
[Figure 9: (a) Histogram of time-series period; y-axis: Occurrence. (b) Percent of time-series period; y-axis: Percent (%). x-axis in both panels: Time-series length in month, 1–7.]
Fig. 9. Histogram of time-series period (a) and percentage (b) for overall averaged accuracy over 80%.
which are very good for vegetation identification. Band 6 is in the range of middle infrared, and is also good for identifying vegetation.

Fig. 7b shows individual month occurrence for the variable combinations whose accuracy is over 80%. It can be seen that almost all the months have a similar number of occurrences and no single month stands out, although August and October have a slightly higher occurrence. This means that, in terms of the time factor, no single period of the growing season is critical for land cover classification; rather, a time-series period of EO observations is required.

Furthermore, among the 439 variable combinations with overall average accuracy exceeding 80%, combinations involving 5 bands have the highest number (159), followed by combinations of 4 bands (137) (Fig. 8a). Combinations with 7 bands are the least common. However, in terms of the percentage of combinations whose classification accuracy exceeds 80%, the combinations engaging all 7 bands have the highest percentage, and combinations of 3 bands have the lowest (Fig. 8b). This implies that the combinations with more bands have a higher probability of producing a higher accuracy output. As the percentage of combinations with fewer bands that generate a more accurate output is low, it further confirms that when fewer bands are used, band selection is essential for achieving a high accuracy result.

When time-series length was examined, combinations with only 1 or 2 months did not produce any output with accuracy over 80% (Fig. 9a). Band combinations with a time-series period of 5 months have the highest occurrence, followed by those of 6 months. As with the number of bands, the longest period (7 months) has the highest probability of producing a high accuracy output. The probability of a good result from combinations with a time-series length of 3 months is very small, and for combinations with a time-series length of 1 or 2 months the probability is 0 (Fig. 9b). This further confirms that data with a longer time-series length is preferable for land cover classification.

3.4. NDVI and band combinations with various time-series lengths

As NDVI is derived from bands 1 and 2 of time-series MODIS data, the NDVI is used to replace bands 1 and 2 in all the band combinations involving them (Eq. (2)) to determine if NDVI contributes more to the accuracy of land cover classification than bands 1 and 2 directly. In this model, band 1 and band 2 in the form of NDVI are always present in the exploration. As presented in Section 3.3, bands 1 and 2 contribute most to the higher accuracy results of land cover classification. In the exploration, NDVI is combined with one or more bands from among bands 3 to 7 with various time periods of 1 month or more. As in the spectral band analysis above, the land cover classification accuracies are assessed in two groups: one group is a fixed number of bands from 1 to 5 with various time-series periods in the unit of month from 1 to 7; the other is a fixed time-series period from 1 to 7 months with different number of bands (3–5). The statistical results of the assessment are shown in Figs. 10–13.

As shown in Fig. 10, the results were similar to the band variable analysis; lower accuracy outputs result from the variable combinations of NDVI with few bands and a shorter time-series period. The low end of the averaged accuracy, for NDVI with one band within a 1-month period, is about 71%, and the high end of the accuracy is over 80%, which can be achieved with NDVI plus 3 bands or more within a time period of 5 months or longer, or NDVI plus 2 bands or more within a time-series period of 6 months or longer. It is noted that, except for the combinations with shorter time-series periods of 1 or 2 months, NDVI/band variable combinations in general could perform the same as or even better than band variable combinations (Fig. 5 vs. Fig. 10).
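For reference, the NDVI that replaces bands 1 and 2 in Section 3.4 is the normalized difference of the near infrared and red reflectances, with MODIS band 1 as red and band 2 as near infrared. A minimal sketch follows; the reflectance values are hypothetical, and the handling of a zero denominator is an illustrative choice rather than part of the study.

```python
def ndvi(red, nir):
    """NDVI = (NIR - red) / (NIR + red); None when the denominator is zero."""
    denominator = nir + red
    if denominator == 0:
        return None
    return (nir - red) / denominator


# Hypothetical surface reflectances for three composites of one pixel.
band1_red = [0.05, 0.08, 0.10]   # MODIS band 1 (red)
band2_nir = [0.45, 0.40, 0.30]   # MODIS band 2 (near infrared)

series = [ndvi(r, n) for r, n in zip(band1_red, band2_nir)]
print([round(v, 3) for v in series])
```

Because NDVI condenses two bands into one value per date, using it in place of bands 1 and 2 reduces the number of input variables for each time step.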
[Figure 10: panels (a)–(e), Overall Averaged Accuracy for NDVI plus a fixed number of bands (panel (e): No. of Bands: 5); x-axis: Time-series period in month, 1–7; y-axis: Accuracy (%), 70–82.]
Fig. 10. Overall averaged accuracy and its uncertainty of NDVI and spectral combinations. Uncertainty is the standard deviation.
As can be seen from Fig. 10, the uncertainty of most combinations is a little smaller than that of the band combinations; therefore, combinations of NDVI with the same number of bands and the same length of period perform similarly or even better.

Fig. 11 shows the overall average accuracy and uncertainty grouped by fixed time period in months and various numbers of bands. When the time period is more than 4 months, the accuracy differences among various numbers of bands are small; however, when only 1 month of data is used, the overall accuracies range only from about 71% (NDVI plus one band) to about 76% (NDVI and all 5 bands). Hence, it is confirmed again that time-series can improve the accuracy of land cover classification.

In total, there are 3937 possible NDVI/band combinations. Among them, 421 (11%) combinations produced an overall averaged accuracy over 80%, with a maximum of 81%. The distribution of the occurrences of these combinations is shown in Figs. 12 and 13. The majority of the combinations are those of 2–4 bands (Fig. 12a) and a 4–6 month period (Fig. 13a). The highest frequency of more accurate classifications is for NDVI plus 3 bands and a 5-month period. However, when the percentage of each combination is considered, the combinations with 4 or 5 bands have similar and the highest frequencies (Fig. 12b), and, in terms of time scale, the combinations with a time period of 6 months have the highest frequency (Fig. 13b). Although these results indicate that variable combinations with a longer period or more bands, in general, could have a larger chance to produce good results, variable combinations with a shorter period (say NDVI + 2 or 3 bands, and a time series of 4 or 5 months), if selected correctly, could also have a good chance to reach similar results.

Furthermore, in terms of band contribution, as shown in Fig. 14a, band 6 had the highest representation, followed by bands 4 and 3. Therefore, combinations including NDVI and bands 6, 4 and 3 could have the highest probability of producing a land cover classification with higher accuracy. Conversely, a combination with band 5 has a lower chance of producing good results.

For time period contribution, October has the most occurrences and is followed by the months of April, May and June. September has the lowest frequency (Fig. 14b). Hence, choosing the months of October, April, May or June offers a better opportunity to produce a better land cover map. As April, May and June show a higher frequency of more accurate land cover classification, it suggests that it is possible to generate an accurate land cover map by mid-season from MODIS data.

3.5. Phenology metrics and band variable combinations

As discussed in Section 2.2 and shown in Fig. 3, vegetation phenology metrics are derived from time-series of NDVI in a full vegetation growing season. Eleven phenology parameters are used to describe a vegetation phenology pattern: they are (a) time for
[Figure 11: panels (a)–(g), one per fixed time-series period (1 to 7 months); x-axis: No. of Bands, 1–5; y-axis: Accuracy (%), 70–82.]
Fig. 11. Overall averaged accuracy and its uncertainty of NDVI and a fixed time-series period with various numbers of bands. Uncertainty is the standard deviation.
[Figure 12: (a) Histogram of number of bands for overall accuracy over 80%; y-axis: Occurrence. (b) Percent of number of bands for overall accuracy over 80%; y-axis: Percent (%). x-axis in both panels: Number of Bands, 1–5.]
Fig. 12. Histogram of the bands combined with NDVI (a) and percentage (b) for overall averaged accuracy over 80%.
[Figure 13: (a) Histogram of time-series period for overall accuracy over 80%; y-axis: Occurrence. (b) Percent of time-series period for overall accuracy over 80%; y-axis: Percent (%). x-axis in both panels: Time-series length in month, 1–7.]
Fig. 13. Histogram of time-series period of band and NDVI combinations (a) and percentage (b) for overall averaged accuracy over 80%.
[Figure 14: (a) Histogram of individual band; x-axis: Band, 3–7; y-axis: Occurrence. (b) Histogram of individual month; x-axis: April to October; y-axis: Occurrence.]
Fig. 14. Histogram of NDVI and individual band (a) and month (b) for overall averaged accuracy over 80%.
the start of the season – time for which the left edge has increased to a user defined level (e.g., 20% of the seasonal amplitude) measured from the left minimum level; (b) time for the end of the season – time for which the right edge has decreased to a user defined level measured from the right minimum level; (g) length of the season; (j) base level – the average of the left and right minimum values; time for the middle of the season – the mean value of the times for which, respectively, the left edge has increased to the 80% level (c) and the right edge has decreased to the 80% level (d); (e) largest data value for the fitted function during the season; (f) seasonal amplitude – difference between the maximal value and the base level; rate of increase at the beginning of the season – calculated as the ratio between the values evaluated at the season start and at the left 80% level divided by the corresponding time difference; rate of decrease at the end of the season – calculated as the ratio between the values evaluated at the season end and at the right 80% level divided by the corresponding time difference; (i) large seasonal integral – integral of the function describing the season from the season start to the season end; and (h) small seasonal integral – integral of the difference between the function describing the season and the base level from season start to season end (Jönsson and Eklundh, 2003).

The phenology metrics and spectral band combinations (Eq. (3)) for land cover classification exploration consist of all phenology parameters plus one or more bands (bands 3, 4, 5, 6, 7) within various time-series periods. In order to avoid redundancy, bands 1 and 2 are excluded from all the combinations, as they were used to compute the NDVI time series from which the phenology parameters were derived. For the evaluation, as for the other models, overall averaged classification accuracies are analyzed in two groups: one with a fixed number of bands and various time-series periods, and the other with a fixed time-series period and various numbers of bands.

The low end of the overall accuracies of all the combinations is higher than that of any variable combinations of the other models (Eqs. (1) and (2)). This is expected because vegetation phenology parameters are derived from the data of a full vegetation growing season. The high end of the overall accuracies is a little more than 80%, which is similar to the high-end accuracy of the other variable combinations.

Fig. 15 shows the overall accuracy of a fixed number (1–5) of bands (3–7) with various time-series periods. Although, in general, the variable combinations with more bands and a longer time-series period would produce a higher accuracy, the accuracy improvement with the increase of the time-series period from 3 months to longer periods is not substantial. The accuracy difference from the lowest to the highest due to the time-series period difference is within 2% for almost all combinations. For example, for the combinations of 2 bands and phenology parameters, the overall averaged accuracy of a 1-month time-series period is about 78%, and the accuracy of a 7-month period is above 79%; for the combinations of 5 bands and phenology parameters, the averaged accuracy of a 1-month period is about 79%, and the accuracy of a 7-month time-series is slightly lower than 80%. Although the maximum accuracy of the combinations can be over 80% with 4 or 5 bands and a time-series period of 5 months or longer, the accuracy difference among these combinations is not substantial. Therefore, any combination of phenology parameters with more than 2 bands and a 3-month time-series period would produce accuracy similar to that of combinations with more bands and a longer time-series period. This is likely because phenology metrics contribute significantly to the land cover classification.

The observations above can be confirmed from Fig. 15a–e. All the overall accuracies of all the variable combinations are within the range of 78–80% (except the combinations of phenology parameters plus 1 band in less than a 4-month time period). It is also noted that the uncertainties of all the combination groups are smaller than those of the combinations of other models. This implies that which bands are used in addition to the phenology parameters is not critical if a certain number of bands (≥2) and a certain time-series period (≥3 months) are utilized.

Although phenology metrics are useful information for land cover identification, one obvious disadvantage of using the

[Figure 15: panels (a)–(e), phenology metrics plus a fixed number of bands (panel titles recovered: No. of Bands 3, 4 and 5 for (c), (d) and (e)); x-axis: Time-series period in month, 1–7; y-axis: Accuracy (%), 70–82.]
Fig. 15. Overall averaged accuracy of phenology metrics and time-series period with a fixed number of spectral bands. Uncertainty is the standard deviation.
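A few of the phenology metrics defined in Section 3.5 can be sketched directly from their definitions. The code below is a simplified illustration, not the TIMESAT implementation: it assumes a single, already smoothed NDVI season, uses linear interpolation between composites, and computes only the base level, seasonal amplitude, start, end and length of the season. The function name, the 20% level and the NDVI values are hypothetical.

```python
def phenology_metrics(times, values, level=0.2):
    """Base level, amplitude, start, end and length of one smoothed season."""
    peak = max(range(len(values)), key=lambda i: values[i])
    left = min(range(peak + 1), key=lambda i: values[i])
    right = min(range(peak, len(values)), key=lambda i: values[i])

    base = (values[left] + values[right]) / 2   # average of the two minima
    amplitude = values[peak] - base             # maximal value minus base

    def crossing(path, threshold):
        # `path` runs from a seasonal minimum toward the peak; return the
        # interpolated time at which the curve first reaches `threshold`.
        for i, j in zip(path, path[1:]):
            low, high = values[i], values[j]
            if high > low and low <= threshold <= high:
                fraction = (threshold - low) / (high - low)
                return times[i] + fraction * (times[j] - times[i])
        return None

    start = crossing(list(range(left, peak + 1)),
                     values[left] + level * amplitude)
    end = crossing(list(range(right, peak - 1, -1)),
                   values[right] + level * amplitude)
    length = None if start is None or end is None else end - start
    return {"base": base, "amplitude": amplitude,
            "start": start, "end": end, "length": length}


# Hypothetical smoothed NDVI values for nine consecutive composites.
times = list(range(9))
values = [0.2, 0.2, 0.4, 0.6, 0.8, 0.6, 0.4, 0.2, 0.2]
metrics = phenology_metrics(times, values)
print({name: round(value, 2) for name, value in metrics.items()})
```

The sketch makes the dependence on a complete season visible: the right minimum and the end of the season cannot be located until the declining part of the curve has been observed.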
phenology parameters is that it requires data spanning a full vegetation growing season. This is a severe constraint for most optical sensors, as cloud cover may prevent obtaining full season data. Furthermore, if land cover information is needed before the end of the vegetation growing season, phenology metrics do not help at all.

3.6. Vegetation phenology

For the final assessment, we tested the full range of phenology metrics, in combinations of one to all parameters, to assess if they are sufficient to distinguish different land covers without additional information. In general, higher overall accuracy was achieved with an increase in the number of phenology variables (Fig. 16). Similarly to the other tests, there is a large increase in accuracy when only a few metrics are used, and the increase diminishes as the number of metrics increases. For example, when 2 variables instead of one are used, the overall accuracy improves from about 42% to 53%; when the number of variables is increased to 6, the overall accuracy reaches about 72%. Once the number of parameters included exceeds 6, the overall accuracy only increases slightly. For example, from 6 variables to 10 variables, the overall accuracy increases by only about 2%, from 72% to about 74%.

[Figure 16: Overall Averaged Accuracy; x-axis: Number of Phenology Metric, 1–11; y-axis: Accuracy (%), 0–80.]
Fig. 16. Overall averaged accuracy and uncertainty of land cover classification with phenology metrics.

Like the other types of variable combinations for land cover classification, accuracy uncertainty is bigger for a smaller number of variable combinations, and accuracy variation is smaller when the number of variables is 9 or more. The choice of metrics for land cover identification is not critical if a large number of variables (>9) is used. However, when the number of variables is decreased, as shown in Fig. 16, the performance of different variable combinations in each group will be substantially different.

Overall, phenology parameters derived from seasonal time-series of EO data without other information are not the best metrics for land cover identification, as their overall best accuracy is lower than those of other types of variable combinations. Compared to phenology with band combinations of Model 3 (Eq. (3)), the accuracy is about 5% lower. One reason for the lower accuracy is that although phenology metrics are derived from the data of a full vegetation growing season, only two bands (MODIS bands 1 and 2) are
involved in the phenology parameters computation. As mentioned above, one of the drawbacks of using phenology parameters for land cover identification is that they require the data of a full vegetation growing season. If we need land cover information before the end of a growing season, it cannot be fulfilled using phenology information.

4. Conclusions

EO-based land cover mapping and its subsequent transitional land assessment at a regional and national level require large coverage and adequate spatial and temporal resolutions of EO data. MODIS data is a reasonable choice due to its large swath, medium spatial resolution, and its daily capability of image acquisition.

This study fully explored, with the developed data mining strategy, the potential of the time-series of MODIS data (10-day cloud-free composites) in various forms for land cover classification at a regional level. In particular, MODIS spectral bands and the derived NDVI and vegetation phenology metrics were studied through variable combinations of different numbers of bands and various time-series periods for optimal utilization of the time-series of MODIS data for land cover mapping. This study concludes that:

(a) Time-series of MODIS data would produce more accurate output than single time stamped data. The accuracy improvement can reach about 10%.
(b) In general, variable combinations of a longer time-series length and more spectral bands could produce the higher accuracy outputs among all possible variable combinations. However, reducing the time period from 7 to 3 months and the spectral bands from 7 to 4 would decrease the accuracy by only about 2%, but trim down the data volume by a factor of about 4.
(c) Variable combinations with more bands and a longer time-series length not only produce higher accuracy outputs, but also a smaller uncertainty, which means that when a longer period of time-series and more bands are incorporated for land cover identification, the selection of which bands in which period is not critical as long as the selected data consist of a sufficient number of bands over a sufficient time period. Therefore, MODIS data at early or middle season are sufficient to produce the most accurate possible outputs of land cover identification.
(d) Of the 4 groups of variable combinations, although the low-end accuracy for land cover classification is different, three of the variable combinations (Models 1, 2 and 3) can reach a similar high-end accuracy. Although NDVI and vegetation phenology provide us with some extra options, MODIS spectral band data by itself can produce the same highest possible accuracy of land cover classification as those with additional information.
(e) As the phenology parameters are derived from the data of a full vegetation growing season, other types of variable combinations are preferable, especially in order to meet the requirement of an early or middle season land cover identification.
(f) Although promising, based on our evaluations, the usefulness of time-series MODIS data for land cover identification also depends on the homogeneity–heterogeneity of the landscape being studied. The degree of landscape homogeneity–heterogeneity, i.e., the degree of mixed information within a pixel, is a function of the spatial distribution patterns of land cover and the size of land parcels relative to
landscape of homogeneous/heterogeneity needs to be conducted in order to determine if medium spatial resolution of EO data such as MODIS is sufficient for land cover mapping of a region.

Acknowledgments

The authors would like to thank Drs. Ian Olthof and Junhua Li for their valuable comments that have improved this manuscript. The authors wish to thank the two anonymous reviewers and Prof. Song, the associate editor of the journal, for their insightful critiques, which led to significant improvements of the manuscript. The long-term satellite record team of Canada Centre for Remote Sensing is acknowledged for the provision of input data. This research is partially financed by Canadian Space Agency through the Government Related Initiatives Program.

References

Agriculture and Agri-Food Canada, 2012. ISO 19131 Land Cover for Agricultural Regions of Canada, Circa 2000 – Data Product Specification. <http://www4.agr.gc.ca/AAFC-AAC/display-afficher.do?id=1343322562230&lang=eng#a8>.
Alcantara, C., Kuemmerle, T., Prishchepov, A.V., Radeloff, V.C., 2012. Mapping abandoned agriculture with multi-temporal MODIS satellite data. Remote Sensing of Environment 124 (2012), 334–347.
Azzali, S., Menenti, M., 1999. Mapping isogrowth zones on continental scale using temporal Fourier analysis of AVHRR-NDVI data. International Journal of Applied Earth Observation and Geoinformation 1 (1), 9–20.
Bagan, H., Wang, Q., Watanabe, M., Yang, Y., Ma, J., 2005. Land cover classification from MODIS EVI times-series data using SOM neural network. International Journal of Remote Sensing 26 (22), 4999–5012.
Baldi, G., Nosetto, M.D., Aragón, R., Aversa, F., Paruelo, J.M., Jobbágy, E.G., 2008. Long-term satellite NDVI data sets: evaluating their ability to detect ecosystem functional changes in South America. Sensors 8, 5397–5425. http://dx.doi.org/10.3390/s8095397.
Biradar, C.M., Thenkabail, P.S., Islam, Md.A., Anputhas, M., Tharme, R., Vithanage, J., Alankara, R., Gunasinghe, S., 2007. Establishing the best spectral bands and timing of imagery for land use-land cover (LULC) class separability using Landsat ETM+ and Terra MODIS data. Canadian Journal of Remote Sensing 33 (5), 421–444.
Boryan, C., Yang, Z., Mueller, R., Craig, M., 2011. Monitoring US agriculture: the US department of agriculture, national agricultural statistics service, cropland data layer program. Geocarto International 2011, 1–18.
Boulila, W., Farah, I.R., Ettabaa, K.S., Solaiman, B., Ghézala, H.B., 2011. A data mining based approach to predict spatiotemporal changes in satellite images. International Journal of Applied Earth Observation and Geoinformation 13 (2011), 386–395.
Bradley, B., Mustard, J., 2008. Comparison of phenology trends by land cover class: a case study in the Great Basin, USA. Global Change Biology 14, 334–346, doi: 10.1111/j.1365-2486.2007.01479.
Carrão, H., Gonçalves, P., Caetano, M., 2008. Contribution of multispectral and multitemporal information from MODIS images to land cover classification. Remote Sensing of Environment 112 (2008), 986–997.
Clark, M.I., Aide, T.M., Grau, H.R., Riner, G., 2010. A scalable approach to mapping annual land cover at 250 m using MODIS time series data: a case study in the dry chaco ecoregion of South America. Remote Sensing of Environment 114 (2010), 2816–2832.
Colditz, R.R., Saldaña, G.L., Maeda, P., Espinoza, J.A., Tovar, C.M., Victoria, A., 2012a. Generation and analysis of the 2005 land cover map for Mexico using 250 m MODIS data. Remote Sensing of Environment 123 (2012), 541–552.
Colditz, R.R., Schmidt, M., Conrad, C., Hansen, M.C., Dech, S., 2012b. Land cover classification with coarse spatial resolution data to derive continuous and discrete maps for complex regions. Remote Sensing of Environment 115 (2011), 3264–3275.
de Beurs, K.M., Henebry, G.M., 2004. Land surface phenology, climatic variation, and institutional change: analyzing agricultural land cover change in Kazakhstan. Remote Sensing of Environment 89 (2004), 497–509.
Donohue, R.J., Roderick, M.L., McVicar, T.R., 2008. Deriving consistent long-term vegetation information from AVHRR reflectance data using a cover-triangle-based framework. Remote Sensing of Environment 112 (2008), 2938–2949.
Ehrlich, D., Lambin, E.F., 1996. Broad scale land-cover classification and interannual climatic variability. International Journal of Remote Sensing 17, 845–862.
Eklundh, L., Jönsson, P., 2012. Timesat 3.1 Software manual.
Evrendilek, F., Gulbeyaz, O., 2008. Deriving vegetation dynamics of natural
terrestrial ecosystems from MODIS NDVI/EVI data over Turkey. Sensors 8,
the MODIS pixels. Thus heterogeneity affects the accuracy 5270–5302, doi: 10.3390/s8095270.
of land cover identification. Therefore, an initial analysis of Evrendilek, F., Gulbeyaz, O., 2011. Boosted decision tree classifications of land cover
over Turkey integrating MODIS, climate and topographic data. International resolution for the seven MODIS land bands over Canada and North America.
Journal of Remote Sensing 32 (12), 3461–3483. Remote Sensing of Environment 112 (12), 4167–4185.
Fensholt, R., Proud, S.R., 2012. Evaluation of earth observation based global long Lupo, F., Linderman, M., Vanacker, V., Bartholome, E., Lambin, E.F., 2007.
term vegetation trends — comparing GIMMS and MODIS global NDVI time Categorization of land-cover change processes based on phonological
series. Remote Sensing of Environment 119 (2012), 131–147. indicators extracted from time series of vegetation index data. International
Friedl, M.A., McIver, D.K., Hodges, J.C.F., Zhang, X.Y., Muchoney, D., Strahler, A.H., Journal of Remote Sensing 28 (11), 2469–2483.
Woodcock, C.E., Gopal, S., Schneider, A., Cooper, A., Baccini, A., Gao, F., Schaaf, C., Lv, T., Liu, C., 2010. Study on extraction of crop information using time-series
2002. Global land cover mapping from MODIS algorithms and early results. MODIS data in the Chao Phraya Basin of Thailand. Advances in Space Research
Remote Sensing of Environment 83 (2002), 287–302. 45 (2010), 775–784.
Fuller, D.O., 1998. Trends in NDVI time series and their relation to rangeland and Matsuoka, M., Hayasaka, T., Fukushima, Y., Honda, Y., 2007. Land cover in East Asia
crop production in Senegal, 1987–1993. International Journal of Remote classified using Terra MODIS and DMSP OLS products. International Journal of
Sensing 19 (10), 2013–2018. Remote Sensing 28 (2), 221–248.
Giriraj, A., Irfan-Ullah, M., Murthy, MSR., Beierkuhnlein, C., 2008. Modelling spatial Maxwell, S.K., Hoffer, R.M., Chapman, P.L., 2002a. AVHRR composite period
and temporal forest cover change patterns (1973–2020): a case study from selection for land cover classification. International Journal of Remote Sensing
South Western Ghats (India). Sensors 8, 6132–6153, doi: 10.3390/s8106132. 23, 5043–5059.
Hill, M.J., Donald, G.E., 2003. Estimating spatio-temporal patterns of agricultural Maxwell, S.K., Hoffer, R.M., Chapman, P.L., 2002b. AVHRR channel selection for land
productivity in fragmented landscapes using AVHRR NDVI time series. Remote cover classification. International Journal of Remote Sensing 23, 5061–5073.
Sensing of Environment 84 (2003), 367–384. McGwire, K.C., Fairbanks, D.H.K., Estes, J.E., 1992. Examining regional vegetation
Hilla, M.J., Donalda, G.E., Hyderb, M.W., Smith, R.C.G., 2004. Estimation of pasture associations using multi-temporal AVHRR imagery. In: Proceedings of the
growth rate in the south west of Western Australia from AVHRR NDVI and National ASPRS/ACSM 1992 Annual Convention, Bethesda, Maryland, pp. 304–
climate data. Remote Sensing of Environment 93 (2004), 528–545. 313.
Hirosawa, Y., Marsh, S.E., Kliman, D.H., 1996. Application of standardized principal Neeti, N., Rogan, J., Christman, Z., Eastman, J.R., Millones, M., Schneider, L., Nickl, E.,
component analysis to land-cover characterization using multitemporal AVHRR Schmook, B., Turner II, B.L., Ghimire, B., 2012. Mapping seasonal trends in
data. Remote Sensing of Environment 58, 267–281. vegetation using AVHRR-NDVI time series in the Yucatán Peninsula, Mexico.
Huttich, C., Herold, M., Schmullius, C., Egorov, V., Bartalev, S.A., 2007. Indicators of Remote Sensing Letters 3 (5), 433–442.
Northern Eurasia’s land-cover change trends from SPOT-VEGETATION time- Pal, M., Mather, P.M., 2003. An assessment of the effectiveness of decision tree
series analysis 1998–2005. International Journal of Remote Sensing 28 (18), methods for land cover classification. Remote Sensing of Environment 86, 554–
4199–4206. 565.
Jakubauskas, ME., Peterson, DL., Kastens, JH., Legates, D.R., 2002. Time series remote Pouliot, D., Latifovic, R., Fernandes, R., Olthof, I., 2009. Evaluation of annual forest
sensing of landscape-vegetation interactions in the Southern Great Plains. disturbance monitoring using a static decision tree approach and 250 m MODIS
Photogrammetric Engineering and Remote Sensing 68 (10), 1021–1030. data. Remote Sensing of Environment 113 (2009), 1749–1759.
Jönsson, P., Eklundh, L., 2003. Seasonality extraction from satellite sensor data. In: Quinlan, J.R., 1996. Bagging, boosting, and C4.5. In: Proceedings AAAI-96 Fourteenth
Chen, C.H. (Ed.), Frontiers of Remote Sensing Information Processing. World National Conference on Artificial Intelligence, Portland, OR.
Scientific Publishing, pp. 487–500. Reed, B.C., Brown, J.F., VanderZee, D., Loveland, T.R., Merchant, J.W., Ohlen, D.O.,
Kasischke, E.S., French, N.H.F., 1997. Constraints on using AVHRR composite index 1994. Measuring phenological variability from satellite imagery. Journal of
imagery to study patterns of vegetation cover in boreal forests. International Vegetation Science 5, 703–714.
Journal of Remote Sensing 18 (11), 2403–2426. Salomonson, VV., Barnes, WL., Maymon, PW., Montgomery, HE., Ostrow, H., 1989.
Keane, R.E., Cary, G.J., Davies, I.D., Flannigan, M.D., Gardner, R.H., Lavorel, S., Lenihan, MODIS: advanced facility instrument for studies of the earth as a system. IEEE
J.M., Li, C., Rupp, T.S., 2004. A classification of landscape fire succession models: Transactions on Geoscience Remote Sensing 27 (2), 145–153.
spatial simulations of fire and vegetation dynamics. Ecological Modelling 179 Schneider, A., 2012. Monitoring land cover change in urban and peri-urban areas
(2004), 3–27. using dense time stacks of Landsat satellite data and a data mining approach.
Kleynhans, W., Olivier, J.C., Wessels, K.J., Salmon, B.P., van den Bergh, F., Steenkamp, Remote Sensing of Environment 124 (2012), 689–704.
K., 2011. Detecting land cover change using an extended kalman filter on Wardlow, B.D., Egbert, S.L., 2008. Large-area crop mapping using time-series MODIS
MODIS NDVI time-series data. Geoscience and Remote Sensing Letters, IEEE 8 250 m NDVI data: an assessment for the US Central Great Plains. Remote
(3), 507–511. Sensing of Environment 112 (2008), 1096–1116.
Knight, J.F., Lunetta, R.L., Ediriwickrema, J., Khorram, S., 2006. Regional scale land- Wardlow, B.D., Egbert, S.L., 2010. A comparison of MODIS 250-m EVI and NDVI data
cover characterization using MODIS-NDVI 250 m multi-temporal imagery: a for crop mapping: a case study for southwest Kansas. International Journal of
phenology based approach. GIScience and Remote Sensing 43 (1), 1–23. Remote Sensing 31 (3), 805–830.
Kumar, U., Kerle, N., Punia, M., Ramachandra, T.V., 2010. Perceptron and decision Wardlow, B.D., Egbert, S.L., Kastens, J.H., 2007. Analysis of time-series MODIS 250 m
tree from MODIS data. Journal of the Indian Society of Remote Sensing 38 (4), vegetation index data for crop classification in the US Central Great Plains.
592–603, doi: 10.1007/s12524-011-0061-y. Remote Sensing of Environment 108 (2007), 290–310.
Latifovic, R., Pouliot, D., 2005. Multitemporal land cover mapping for Canada: Weiss, E., Marsh, S.E., Pfirman, E.S., 2001. Application of NOAA-AVHRR NDVI time-
methodology and products. Canadian Journal of Remote Sensing 31 (5), 347– series data to assess changes in Saudi Arabia’s rangelands. International Journal
363. of Remote Sensing 22 (6), 1005–1027.
Liang, S., 2001. Land-cover classification methods for multiyear AVHRR data. Xiao, X., Boles, S., Liu, J., Zhuang, D., Frolking, S., Li, C., Salas, W., Moore III, B., 2005.
International Journal of Remote Sensing 22 (8), 1479–1493. Mapping paddy rice agriculture in southern China using multi-temporal MODIS
Liu, Y., Wang, X., Guo, M., Tani, H., Matsuoka, N., Matsumura, S., 2011. Spatial and images. Remote Sensing of Environment 95 (2005), 480–492.
temporal relationships among NDVI, climate factors, and land cover changes in Yang, X., Lo, C.P., 2002. Using a time series of satellite imagery to detect land use
Northeast Asia from 1982 to 2009. GIScience and Remote Sensing 48 (3), 371– and land cover changes in the Atlanta, Georgia metropolitan area. International
393, doi: 10.2747/1548-1603.48.3.371. Journal of Remote Sensing 23 (9), 1775–1798.
Luo, Y., Trishchenko, A.P., Khlopenkov, K.V., 2008. Developing clear-sky, cloud and Zhang, X., Sun, R., Zhang, B., Tong, Q., 2008. Land cover classification of the North
cloud shadow mask for producing clear-sky composites at 250-meter spatial China Plain using MODIS EVI time series. ISPRS Journal of Photogrammetry and
Remote Sensing 63, 476–48.

