
remote sensing

Article
Evaluation of Three Deep Learning Models for Early
Crop Classification Using Sentinel-1A Imagery Time
Series—A Case Study in Zhanjiang, China
Hongwei Zhao 1,2 , Zhongxin Chen 1,2 , Hao Jiang 3,4,5, * , Wenlong Jing 3,4,5 , Liang Sun 1,2 and
Min Feng 6
1 Institute of Agricultural Resources and Regional Planning, CAAS, Beijing 100081, China;
zhaohongwei@caas.cn (H.Z.); chenzhongxin@caas.cn (Z.C.); sunliang@caas.cn (L.S.)
2 Key Laboratory of Agricultural Remote Sensing, Ministry of Agriculture and Rural Affairs,
Beijing 100081, China
3 Key Laboratory of Guangdong for Utilization of Remote Sensing and Geographical Information System,
Guangzhou 510070, China; jingwl@lreis.ac.cn
4 Guangdong Open Laboratory of Geospatial Information Technology and Application,
Guangzhou 510070, China
5 Guangzhou Institute of Geography, Guangzhou 510070, China
6 Institute of Tibetan Plateau Research, CAS, Beijing 100101, China; mfeng@itpcas.ac.cn
* Correspondence: jianghao@gdas.ac.cn; Tel.: +86-010-8210-5072

Received: 19 September 2019; Accepted: 12 November 2019; Published: 15 November 2019

Abstract: Timely and accurate estimation of the area and distribution of crops is vital for food security.
Optical remote sensing has been a key technique for acquiring crop area and conditions on regional
to global scales, but frequent cloud cover in southern China poses great challenges, often rendering optical imagery unavailable. Synthetic aperture radar (SAR) could
bridge this gap since it is less affected by clouds. The recent availability of Sentinel-1A (S1A) SAR
imagery with a 12-day revisit period at a high spatial resolution of about 10 m makes it possible to
fully utilize phenological information to improve early crop classification. In deep learning methods,
one-dimensional convolutional neural networks (1D CNNs), long short-term memory recurrent
neural networks (LSTM RNNs), and gated recurrent unit RNNs (GRU RNNs) have been shown
to efficiently extract temporal features for classification tasks. However, due to the complexity of
training, these three deep learning methods have been less used in early crop classification. In this
work, we attempted to combine them with an incremental classification method to avoid the need for
training optimal architectures and hyper-parameters for each time series length. First, we trained
1D CNNs, LSTM RNNs, and GRU RNNs based on the full images’ time series to attain three classifiers
with optimal architectures and hyper-parameters. Then, starting at the first time point, we performed
an incremental classification process to train each classifier using all of the previous data, and obtained
a classification network with all parameter values (including the hyper-parameters) at each time
point. Finally, test accuracies of each time point were assessed for each crop type to determine the
optimal time series length. A case study was conducted in Suixi and Leizhou counties of Zhanjiang
City, China. To verify the effectiveness of this method, we also implemented the classic random forest
(RF) approach. The results were as follows: (i) 1D CNNs achieved the highest Kappa coefficient
(0.942) of the four classifiers, and the highest value (0.934) in the GRU RNNs time series was attained
earlier than with other classifiers; (ii) all three deep learning methods and the RF achieved F measures
above 0.900 before the end of the growth seasons of banana, eucalyptus, second-season paddy rice, and sugarcane, while the 1D CNN classifier was the only one that could obtain an F-measure above 0.900
for pineapple before harvest. All results indicated the effectiveness of the solution combining the
deep learning models with the incremental classification approach for early crop classification. This
method is expected to provide new perspectives for early mapping of croplands in cloudy areas.

Remote Sens. 2019, 11, 2673; doi:10.3390/rs11222673 www.mdpi.com/journal/remotesensing



Keywords: early crop classification; one-dimensional CNNs; long short-term memory; gated recurrent
unit; incremental classification; RNNs; synthetic aperture radar; Sentinel-1A

1. Introduction
Crop-type information is important for food security due to its wide-ranging applicability, such
as for yield estimates, crop rotation, and soil productivity [1,2]. Timely and accurate estimation of crop
distribution provides crucial information for agricultural monitoring and management [3,4], and the
demand for accurate crop-type maps is increasing in government and society [5–7].
In the field of crop classification, optical data are useful to estimate the chemical contents of crops,
e.g., chlorophyll and water [8], whereas synthetic aperture radar (SAR) backscatter is more sensitive to
crop structure and field conditions [9]. However, in southern China, optical sensors do not perform well due to frequent rainy weather and prolonged cloud cover [10]. In contrast, active microwave remote sensing using SAR can work under almost any weather condition [11,12].
The phenological evolution of each crop structure produces a unique temporal profile of the SAR
backscattering coefficient [13,14]. In this way, multi-temporal SAR imagery is an efficient source of
time series observations that can be used to monitor growing dynamics for crop classification [15,16].
However, different classification tasks require different levels of temporal resolution in SAR data.
In previous studies, a lack of high spatial and temporal resolution SAR data was a major challenge
for crop identification in southern China, due to the small-scale planting structures and the rich variety of crops in the region [14]. Sufficiently frequent data acquisition during the growth season is especially important for early crop identification.
Sentinel-1A (S1A), launched on 3 April 2014, is equipped with a C-band SAR sensor with a 12-day
revisit interval, 20 m spatial resolution, and two polarizations (VH, VV) [17]. Moreover, the Level-1
Ground Range Detected (GRD) product, at an image resolution of 10 m, is open access. Therefore, S1A
SAR imagery provides new opportunities for early crop identification in southern China.
Classical machine learning approaches, such as the random forest (RF) and support vector machine
(SVM), are not designed to work with time series data; in crop classification tasks they treat the data acquired at each time point as an independent input feature [18,19]. Therefore, they ignore the temporal dependency of
the time series [20]. Deep learning algorithms have gained momentum in recent years, and have
shown unprecedented performance in combining spatial and temporal patterns for crop classification.
Different from classical machine learning methods that use the extracted features as input [21], deep
learning methods allow a machine to be fed raw data (such as the pixel values of raw imagery) and to
automatically discover representations needed for detection or classification at multiple levels [22,23].
For classification tasks, higher-level layers of representation in a network amplify aspects of the input
that are important for discrimination and that suppress irrelevant variations [22]. This is very helpful
for crop classification because of the complex relations of internal biochemical processes, inherent
relations between environmental variables, and variable crop behavior.
One-dimensional convolutional neural networks (1D CNNs) [24] and recurrent neural networks
(RNNs) [25] have been shown to be effective deep learning methods for end-to-end time series
classification problems [26,27]. Long short-term memory RNNs (LSTM RNNs) and gated recurrent
unit RNNs (GRU RNNs) are variants of RNNs that alleviate the vanishing and exploding gradient problems seen with increasing time series length [28,29]. Recently, some effort has been spent on exploiting
1D CNNs [2,30], LSTM RNNs [20,31–33], and GRU RNNs [20,33] for time series classification of crops.
Zhong et al. [30] classified 13 summer crops in Yolo County, California, USA, by applying 1D CNNs
to the Enhanced Vegetation Index (EVI) time series. Conversely, Kussul et al. [2] input time series of
multiple spectral features into 1D CNNs. Ndikumana et al. [20] evaluated the potential of LSTM RNNs
and GRU RNNs on Sentinel-1 remote sensing data.
To our best knowledge, there are almost no RNNs that have been applied to time series data for early crop identification. Cai et al. [34] used time series data (Landsat 5, 7, and 8) during the growing season of corn and soybean from 2000 to 2013 to train the hyper-parameters and parameters of a determinate 1D CNN architecture. Different time series data during the growing seasons of 2014 and 2015 were selected as testing data. The test started at day of year (DOY) 91, and more Landsat data were gradually input to generate the crop classification until DOY 270. This method is not suitable for RNNs because their parameters (but not hyper-parameters) are determined by the length of the time series [35]. However, if we train optimal architectures and hyper-parameters of RNNs and CNNs for each time series length, the workload is huge.
In this work, we proposed to train 1D CNNs, LSTM RNNs, and GRU RNNs based on full time
series data during the growing season of the main crop in the study area. The goal was to attain the
networks’ optimal architectures and hyper-parameters (we refer to these networks as classifiers). Next, starting at the first time point of the time series, we performed an incremental classification to train each classifier using all of the previous data, and obtained a classification network with all parameter values (including the hyper-parameters acquired before) at each time point. In the incremental classification method, more data will be input into the classifier as the growing season progresses [36]. Finally, test accuracies of each time point were assessed to find the earliest optimal classification performance for each crop type.
A case study was conducted in Suixi and Leizhou counties of Zhanjiang City, China. In addition, in order to verify the effectiveness of this solution, we also implemented the classic random forest (RF) approach.
This paper is organized as follows. In Section 2, the study area and data are introduced. In Section 3, the methodology is reported, while in Section 4 an analysis of the results is presented. In Section 5, a discussion is provided. Finally, conclusions are drawn in Section 6.

2. Data Resources

2.1. Ground Data


For our experiments, an 84 km × 128 km study area in Suixi and Leizhou counties of Zhanjiang City, China (Figure 1) was chosen as the area of interest (AOI). It has a humid subtropical climate with mild and overcast winters and a hot and dry summer period. The monthly average daily temperature in July is 29.1 °C, and in January is 16.2 °C. The rainy season is from May to October [37].

Figure 1. Study area and sample distribution.


Remote Sens. 2019, 11, 2673 4 of 23

The field campaign was conducted in the study area in October 2017. We specified six major
cropping sites based on expert knowledge, and we traveled to each site and recorded raw samples of
the crop types at each site. There were 198 samples acquired from the field survey, and 610 samples
were taken by interpretation. Therefore, a total of 808 sample points (at the center of the fields) were
obtained through ground surveys for five main types of local vegetation, as follows: (1) Paddy, (2)
sugarcane, (3) banana, (4) pineapple, and (5) eucalyptus. The size of fields ranged from 0.3 to 2 ha.
Figure 1 shows the position of ground samples, and the distribution of the number of samples per
class is shown in Table 1.

Table 1. Number of samples per type.

ID        1        2           3        4           5            Total
Type      Paddy    Sugarcane   Banana   Pineapple   Eucalyptus
Number    179      215         53       44          339          830

Sugarcane, with a growth period from mid-March to mid-February of the following year, is the
most prominent agricultural product of the AOI. Bananas, pineapples, and paddy rice are native
products that also play a significant role in the local agricultural economy. In addition, the study area
is the most important eucalyptus growing region in China. In the paddy fields of the AOI, paddy
rice includes first-season rice and second-season rice, whose growth periods are early March to late
July and early August to late November, respectively. The growth periods of banana, pineapple, and
eucalyptus generally last for 2–4, 1.5–2, and 4–6 years, respectively.
At present, no research has been carried out to determine the optimal distribution of samples in deep learning models for classification tasks using remote sensing data. In References [20] and [38], 11 crop types were classified based on 921 samples and 547 samples, respectively, so we believe that the ground data set in this study is sufficient for the classification of five crop types using deep learning models.

2.2. SAR Data


For this study, we used the S1A interferometric wide swath GRD product, which has a
12-day revisit time; thus, there were 30 images acquired from 10 March 2017 to 21 February 2018,
covering the 2017 growing season. The product was downloaded from the European Space Agency
(ESA) Sentinels Scientific Data Hub website [39] for free. The product contains both VH and VV polarizations, which allows measurement of both the polarization properties of terrestrial surfaces and the backscatter intensity.
The S1A data was preprocessed using Sentinel Application Platform (SNAP) open source software
version 6.0.0. The preprocessing stages included the following:
(i) Radiometric calibration. This step provided imagery in which the pixel values could be related
directly to the radar backscatter of the scene, and depended on metadata information downloaded
with the Sentinel-1A data. Using any of the four look-up tables provided with Level 1 products, the
pixel values are returned as either of the following: Gamma naught band (γ0), sigma naught band (σ0), beta naught band (β0), or the original digital number band. In this study, the sigma naught band (σ0) was used for analysis.
(ii) Orthorectification. To compensate for geometric distortions introduced from the side-looking
geometry of the images, range Doppler terrain orthorectification was applied to the images
radiometrically corrected in step (i). The orthorectification algorithm used the available metadata
information on the orbit state vector, the radar timing annotations, slant to ground range
conversion parameters, and a reference Digital Elevation Model (DEM) data set to derive the precise
geolocation information.

(iii) Re-projection. The orthorectified SAR image was further resampled to a spatial resolution of 10 m using bilinear interpolation, and re-projected to the Universal Transverse Mercator (UTM) coordinate system, Zone 49 North, World Geodetic System (WGS) 84.
(iv) Speckle filtering. In this study, the Gamma-MAP (maximum a posteriori) speckle filter with a 7 × 7 window size [40] was applied to all the images to remove the granular noise (i.e., speckle filtering).
(v) After speckle filtering, all intensity images were transformed to the logarithmic dB scale, and normalized to values between 0 and 255 (8 bits).
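As a minimal illustration of step (v), the sketch below converts a calibrated intensity (σ0) array to dB and rescales it to 8 bits with NumPy; the clipping range is an assumption for illustration, since the paper does not state the scaling bounds it used.

```python
import numpy as np

def intensity_to_8bit_db(sigma0, db_min=-25.0, db_max=0.0):
    """Convert linear sigma-naught intensity to dB and rescale to 0-255.

    The [-25, 0] dB clipping range is an assumed, typical range for C-band
    backscatter; the paper does not specify the bounds it used.
    """
    sigma0 = np.maximum(sigma0, 1e-10)          # avoid log10(0)
    db = 10.0 * np.log10(sigma0)                # linear intensity -> dB
    db = np.clip(db, db_min, db_max)            # limit the dynamic range
    scaled = (db - db_min) / (db_max - db_min)  # normalize to [0, 1]
    return (scaled * 255).astype(np.uint8)      # 8-bit (0-255) image
```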
3. Methodology
The overall methodology used in this study is presented in Figure 2. In Step 1, we processed S1A imagery to get the VH + VV-polarized backscatter data (see Section 2.2), and extracted the backscatter time series (VH + VV) of length 30 using the ground point data. In Step 2, we trained the deep learning networks (1D CNNs, LSTM RNNs, and GRU RNNs) and the RF to attain their optimal architectures and hyper-parameters using the full 30-date time series data. It should be noted that 80% of each crop type was randomly selected to constitute the training set, and the remaining 20% was used in the test set. In Step 3, we performed incremental classification to train the four classifiers with the optimal architectures and hyper-parameters using all of the previous data. Finally, we analyzed test performance in terms of overall accuracy and Kappa coefficient (Step 4) and the accuracy of each crop time series (Step 5).

Figure 2. Methodology used in this study. L refers to the length of the time series.
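Step 1 of Figure 2 (extracting the per-sample VH + VV backscatter time series at the ground points) could be sketched as follows; the file layout (one preprocessed two-band raster per acquisition) and all names are assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np
import rasterio  # assumed available for reading the preprocessed rasters

def extract_time_series(image_paths, points):
    """Sample VH and VV backscatter at each ground point for every date.

    image_paths: 30 preprocessed S1A images ordered by acquisition date,
                 each with band 1 = VH and band 2 = VV (an assumed layout).
    points:      list of (x, y) coordinates in the images' coordinate system.
    Returns an array of shape (n_points, n_dates, 2).
    """
    per_date = []
    for path in image_paths:
        with rasterio.open(path) as src:
            # src.sample yields one (VH, VV) value pair per point
            per_date.append(np.array(list(src.sample(points))))
    return np.stack(per_date, axis=1)
```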
3.1. 1D CNNs

Neural networks are parallel systems used for solving regression and classification problems in many fields [41]. Traditional neural networks (NNs) have various architectures, and the most popular form is the multilayer perceptron (MLP) network [42]. In an MLP, each neuron receives a weighted sum from each neuron in the preceding layer and provides an input to every neuron of the next layer [43].
Compared with traditional NNs, CNNs share local connections and weights using a convolution kernel (also known as a “filter”) [22]. The convolution kernel not only reduces the number of parameters, but also reduces the complexity of the model [44]. Therefore, CNNs are more suitable than traditional NNs for processing a large amount of image data [44].
LeCun et al. introduced the architecture of 2D CNNs [45]. It includes a convolutional layer (Conv), the rectified linear unit (Relu), the pooling layer (Pooling), and the fully connected layer (Fully-Con). The 1D CNN is a special form of the CNN, and employs a one-dimensional convolution kernel to capture the temporal pattern or shape of the input series [46]. Conv layers can be stacked so that lower layers focus on local features and upper layers summarize more general patterns [30].

3.2. LSTM RNNs

RNNs are neural networks specialized for processing sequential data. A standard RNN architecture is shown in Figure 3 ((b) is an expanded form of (a) on data sequences). The state of the network at each time point depends on both the current input and the previous information stored in the network. There are two types of RNN architectures, as follows: with output at each time point (many-to-many) or with output only at the last time point (many-to-one).
Given an input sequence {x1, x2, ..., xt}, the output yt of the unit at time t is given by the following:

ht = Uxt + Wst−1, (1)
st = f(ht + b), (2)
yt = g(Vst + c), (3)

where st is the state of the network at time t; U, V, and W are weight matrices; b and c are bias vectors; and f and g are usually tanh and softmax activation functions, respectively. When t is 1, s0 is normally initialized as 0.

Figure 3. A standard recurrent neural network architecture (many-to-many). (a) is the standard unit, and (b) is an expanded form of (a).

Since ordinary RNNs fail to learn long-term dependencies because of the problem of vanishing and exploding gradients, the LSTM unit was designed; the LSTM can “remember” values over arbitrary time intervals, long or short [47].
The LSTM unit reduces or increases the ability of information to pass through the unit through three gates, as follows: forget, input, and output. These gates are shown in Figure 4. Each gate is controlled by the state of the previous time step and the current input signal, and it contains a sigmoid layer and a multiplication operation. The sigmoid layer outputs a value between 0 and 1, which represents how much information can be passed. The forget gate decides what information will be discarded from the unit state C, and the input gate decides what new information is going to be stored in it. The output gate determines the new state st. Equations (4) to (9) describe the internal operations carried out in an LSTM neural unit.

ft = σ(Wf·[st−1, xt] + bf), (4)
it = σ(Wi·[st−1, xt] + bi), (5)
ot = σ(Wo·[st−1, xt] + bo), (6)
C̃t = tanh(WC·[st−1, xt] + bC), (7)
Ct = ft·Ct−1 + it·C̃t, (8)
st = ot·tanh(Ct), (9)

where Ct−1, C̃t, and Ct are unit memories; Wf, Wi, WC, and Wo are weight matrices; and bf, bi, bC, and bo are bias vectors.

Figure 4. Diagram of the long short-term memory recurrent neural network (LSTM RNN) unit.
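For readers who prefer code to notation, a single LSTM step following Equations (4) to (9) can be written in NumPy as below; this is only an illustrative sketch of the unit's arithmetic, not the TensorFlow implementation used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, s_prev, C_prev, W, b):
    """One LSTM step; W holds Wf, Wi, Wo, WC and b holds bf, bi, bo, bC."""
    z = np.concatenate([s_prev, x_t])       # the concatenation [s_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])       # forget gate, Eq. (4)
    i_t = sigmoid(W["i"] @ z + b["i"])       # input gate, Eq. (5)
    o_t = sigmoid(W["o"] @ z + b["o"])       # output gate, Eq. (6)
    C_tilde = np.tanh(W["C"] @ z + b["C"])   # candidate memory, Eq. (7)
    C_t = f_t * C_prev + i_t * C_tilde       # new unit memory, Eq. (8)
    s_t = o_t * np.tanh(C_t)                 # new state, Eq. (9)
    return s_t, C_t
```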
3.3. GRU RNNs

A gated recurrent unit (GRU) is an LSTM variant with a simpler architecture [48], as shown in Figure 5. The GRU unit has two gates, as follows: update and reset. The update gate determines if the hidden state is to be updated with a new hidden state, while the reset gate decides if the previous hidden state is to be ignored. Their outputs are zt and rt, respectively. The detailed operations of the GRU unit are illustrated in Equations (10) to (13).

zt = σ(Wz·[st−1, xt] + bz), (10)
rt = σ(Wr·[st−1, xt] + br), (11)
s̃t = tanh(Ws·[rt·st−1, xt] + bs), (12)
st = (1 − zt)·st−1 + zt·s̃t, (13)

where Wz, Wr, and Ws are weight matrices; and bz, br, and bs are bias vectors.

Figure 5. Diagram of the gated recurrent unit RNN (GRU RNN) unit.
3.4. RF
The RF classifier is an ensemble classifier proposed by Breiman (2001) [49]. It improves classification accuracy and controls overfitting by combining the results of multiple simple decision tree classifiers [48].
These simple decision tree classifiers act on a subset of the samples. To reduce the computational
complexity of the algorithm and the correlation between subsamples, tree construction can be stopped
when a maximum depth is reached or when the number of samples on the node is less than a minimum
sample threshold [20].
Previous studies have shown that in crop identification of time series data, compared with other
machine learning methods such as the SVM, the RF has the characteristics of high precision and fast
calculation speeds to process high-dimensional data [18,50]. Several studies have investigated the use
of the RF classifier for rice mapping with SAR datasets [18,51,52]. Considering that a time series of 30 SAR acquisitions is to be processed, we chose the RF for comparison with the deep learning methods.

3.5. Classifier Training


The objective of classifier training is to get the optimal architectures and hyper-parameters of each
method. The criterion is usually decided in order to obtain the highest accuracy with the least amount
of calculation. Thirty time series of dual-polarized (VH + VV) data were input when training was
executed, and the dimension of an input sample for the 1D CNN and RF was 60 (30 (time series) × 2 (VV and VH)). These values were (30, 2) for the LSTM RNN and GRU RNN. Since the distribution of the
number of samples of different crops was uneven, we randomly selected 80% of samples of each crop
type to form the training set, and the remaining samples (20%) constituted the test set. The trained
hyper-parameters are shown in Table 2. All of the training for both CNNs and RNNs was performed
by using the Adam optimizer with cross entropy loss [53], which has been shown to be superior to
other stochastic optimization methods [54], and it has been used successfully in some classification
tasks of time series [30,31,55].
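The input shaping and the stratified 80%/20% split described above might look like the following sketch (function and variable names are illustrative, not taken from the authors' code):

```python
import numpy as np
from sklearn.model_selection import train_test_split  # Scikit-learn, as used for the RF

def prepare_inputs(series, labels, seed=0):
    """Split samples 80%/20% per crop type and shape them for each model.

    series: array of shape (n_samples, 30, 2) holding the VH/VV time series.
    labels: integer crop-type labels of shape (n_samples,).
    """
    x_train, x_test, y_train, y_test = train_test_split(
        series, labels, test_size=0.2, stratify=labels, random_state=seed)
    flat_train = x_train.reshape(len(x_train), -1)  # (n, 60) for the 1D CNN and RF
    flat_test = x_test.reshape(len(x_test), -1)
    return (flat_train, flat_test), (x_train, x_test), (y_train, y_test)
```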

Table 2. Hyper-parameters. The test values of each hyper-parameter are sorted from small to large,
rather than assembled into a training sequence. 1D CNNs, one-dimensional convolutional neural
networks; LSTM RNNs, long short-term memory recurrent neural networks; GRU RNNs, gated
recurrent unit RNNs; RF, random forest.

Model        Hyper-Parameter   Description                                              Tested Values                                           Optimal Value
1D CNNs      num_filter1       Number of filters in the first 1D Conv layer             10, 12, 14, 16, 18                                      16
             num_filter2       Number of filters in the second 1D Conv layer            6, 8, 10, 12, 14, 16                                    14
             num_filter3       Number of filters in the third 1D Conv layer             4, 6, 8, 10                                             8
             num_neu1          Number of neurons in the first fully connected layer     20, 30, 36, 38, 40                                      38
             max_interations   Maximum number of iterations                             10,000, 12,000, 15,000, 20,000                          10,000
             batch_size        Number of samples for every batch of training            32, 64, 128                                             64
             dropout           Dropout rate of a neuron in the first fully-con layer    0.5, 1                                                  1
             learning_rate     Learning rate                                            0.00001, 0.00002, 0.00003, 0.00004, 0.00005, 0.0001     0.00002
LSTM RNNs    num_layers        Number of hidden layers                                  1, 2, 3, 4                                              3
             hidden_size       Number of hidden neurons per layer                       50, 100, 150, 200                                       100
             learning_rate     Learning rate                                            0.0005, 0.005, 0.004, 0.006                             0.005
             dropout           Dropout rate of a neuron in hidden layers                0.5, 1                                                  1
             max_grad_norm     Maximum gradient norm                                    1, 2.5, 5, 10                                           5
             max_interations   Maximum number of iterations                             10,000, 15,000, 18,000, 20,000                          15,000
             batch_size        Number of samples for every batch of training            32, 64, 128                                             64
GRU RNNs     num_layers        Number of hidden layers                                  1, 2, 3, 4                                              2
             hidden_size       Number of hidden neurons per layer                       50, 100, 150, 200                                       200
             learning_rate     Learning rate                                            0.0005, 0.005, 0.004, 0.006                             0.005
             dropout           Dropout rate of a neuron in hidden layers                0.5, 1                                                  1
             max_grad_norm     Maximum gradient norm                                    1, 2.5, 5, 10                                           5
             max_interations   Maximum number of iterations                             10,000, 15,000, 18,000, 20,000                          20,000
             batch_size        Number of samples for every batch of training            32, 64, 128                                             64
RF           n_estimators      Number of trees                                          100, 200, 300, 400, 500                                 400

In the case of the 1D CNN, the training started using an architecture with two convolutional
layers and two fully connected layers [56–59]. The number of neurons in the second fully connected
layer was equal to the number of categories (5), and thus was not trained as a parameter. When the
accuracy of the test set was unchanged by changing hyper-parameters, we added a convolutional
layer to generate a new network. In order to improve the training speed and generalization ability of
networks, we added a batch normalization (Batch-Norm) layer after each Conv layer [36,60]. Hence,
there was a Batch-Norm layer [61] and a Relu layer [45] after each convolutional layer. Using a model
with the best performance on the validation set as a seed, a new iterative training was started until an
acceptable accuracy above 0.950 was reached. Other parameters were set based on experience, e.g.,
the width of the “filter” was generally set to be small sizes (3–5) in order to capture local temporal
information [56,62].
The starting architecture of the LSTM RNN and GRU RNN included one hidden layer with 50
neurons, and iterative training was performed by adding hidden layers and changing hyper-parameters.
Experiential values of hyper-parameters in References [63,64] were referred to in the training process.
To run the RF model at each time point, it is necessary to tune several adjustable hyper-parameters.
The primary parameters are the number of predictors at each decision tree node split and the number
of decision trees to run [65]. These are parameters “max_features” and “n_estimators”, respectively, in
Scikit-learn. In this study, the number of features changed with the length of the time series, and therefore, the parameter “max_features” was set to its default value √p (where p is the number of input features) [66].
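With Scikit-learn, this RF configuration reduces to a few lines, sketched below with the optimal n_estimators value from Table 2 (variable names follow the split sketch in Section 3.5 and are illustrative):

```python
from sklearn.ensemble import RandomForestClassifier

# max_features="sqrt" makes the sqrt(p) choice explicit (it is also the library
# default for classification); p grows with the length of the time series.
rf = RandomForestClassifier(n_estimators=400, max_features="sqrt", random_state=0)
rf.fit(flat_train, y_train)                 # flat_train has shape (n_samples, 2 * L)
print(rf.score(flat_test, y_test))          # overall accuracy on the held-out 20%
```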
The optimized 1D CNN architecture includes three Conv layers and two fully-con layers (Figure 6). The convolution kernel sizes (width × height × input channel × output channel) of Conv1, Conv2, and Conv3 are 5 × 1 × 1 × 16, 4 × 1 × 16 × 14, and 3 × 1 × 14 × 8, respectively.

Figure 6. Architecture of the optimal one-dimensional convolutional neural network.
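A rough tf.keras re-creation of this architecture is sketched below for illustration; since the paper used TensorFlow 1.13 directly, the layer API and training details here are assumptions rather than the authors' code. Each Conv layer is followed by Batch-Norm and Relu as described above, and the two fully connected layers have 38 and 5 neurons.

```python
import tensorflow as tf

def build_1d_cnn(series_length=60, n_classes=5):
    """Sketch of the optimized 1D CNN: Conv blocks with 16, 14, and 8 filters
    (kernel widths 5, 4, 3), then fully connected layers of 38 and 5 neurons."""
    model = tf.keras.Sequential([
        tf.keras.layers.Conv1D(16, 5, input_shape=(series_length, 1)),  # Conv1
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Conv1D(14, 4),                                  # Conv2
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Conv1D(8, 3),                                   # Conv3
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Activation("relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(38, activation="relu"),                   # Fully-Con 1
        tf.keras.layers.Dense(n_classes, activation="softmax"),         # Fully-Con 2
    ])
    # Adam with cross-entropy loss (Section 3.5); learning rate from Table 2.
    model.compile(optimizer=tf.keras.optimizers.Adam(2e-5),
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```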

Architectures of the LSTM-based and the GRU-based networks with optimal performance are presented in Figures 7 and 8, respectively. The optimized LSTM RNN architecture consists of three hidden layers, and there are 100 LSTM neurons in each layer. The optimized GRU RNN architecture is shallower, with two hidden layers; there are 200 neurons in each layer. These two networks only output classification results at the last time step in a time series. Therefore, they are both many-to-one RNNs.

Figure 7. Architecture of the optimal long short-term memory recurrent neural network (many-to-one).

Figure 8. Architecture of the optimal gated recurrent unit recurrent neural network (many-to-one).

Implementation details: 1D CNNs, LSTM RNNs, and GRU RNNs were implemented based on Tensorflow-cpu, version 1.13.1; the RF classifier was used with Scikit-learn, version 0.19.1; and the Python version was 3.6.
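The two many-to-one networks can be sketched in the same illustrative tf.keras style; only the last recurrent layer returns a single, final-time-step state, which feeds the softmax output. Again, this is an assumed re-creation of the stated architectures, not the authors' TensorFlow 1.13 code.

```python
import tensorflow as tf

def build_rnn(cell="lstm", n_classes=5, series_length=30, n_features=2):
    """Sketch of the optimized many-to-one RNNs: LSTM with 3 x 100 hidden
    neurons, or GRU with 2 x 200 hidden neurons (Figures 7 and 8)."""
    layer = tf.keras.layers.LSTM if cell == "lstm" else tf.keras.layers.GRU
    sizes = [100, 100, 100] if cell == "lstm" else [200, 200]
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.InputLayer(input_shape=(series_length, n_features)))
    for i, units in enumerate(sizes):
        # intermediate layers return the full sequence; the last layer
        # returns only the final state (many-to-one)
        model.add(layer(units, return_sequences=(i < len(sizes) - 1)))
    model.add(tf.keras.layers.Dense(n_classes, activation="softmax"))
    model.compile(optimizer=tf.keras.optimizers.Adam(0.005),  # Table 2 learning rate
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model
```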

3.6. Incremental Classification


In order to perform crop-type identification before the end of crop seasons, an incremental
classification procedure was used. The objective of this method was to obtain the best classification
results based on the shortest time series data. Firstly, we set the first time point (10 March 2017) as
the start, and performed supervised classification using 1D CNNs, LSTM RNNs, GRU RNNs, and RF
classifiers with optimal architectures and hyper-parameters. Then, the four classifiers were triggered
at each time when a new S1A image acquisition was available, using all of the previously acquired
images [36]. Finally, we obtained three deep learning networks and an RF with all parameters at each
time point. The above configurations allowed us to analyze the evolution of the classification quality
as a function of time and thus we could find the earliest time points at which classifiers identified
different crop types effectively. The test protocol used 80% of the samples of each crop type for training
and the rest for testing.
To reduce the influence of random sample splitting bias, five random splits were performed, yielding five trainings and five corresponding tests. This allowed us to compute average performances.
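The incremental procedure amounts to retraining a classifier of fixed architecture and hyper-parameters on progressively longer prefixes of the time series. A schematic loop is sketched below; it reuses the illustrative build_rnn function from the sketch above, and the number of epochs is an arbitrary placeholder.

```python
import numpy as np

def incremental_classification(x_train, y_train, x_test, y_test, n_dates=30):
    """Train on time series prefixes of length 1..n_dates and record the
    test accuracy available at each time point."""
    accuracies = []
    for length in range(1, n_dates + 1):
        model = build_rnn(cell="gru", series_length=length)      # fixed hyper-parameters
        model.fit(x_train[:, :length, :], y_train,
                  epochs=50, batch_size=64, verbose=0)            # epochs value is illustrative
        _, acc = model.evaluate(x_test[:, :length, :], y_test, verbose=0)
        accuracies.append(acc)                                    # accuracy vs. series length
    return np.array(accuracies)
```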

3.7. Accuracy Assessment


As shown in Figure 2, the accuracy assessment of the proposed classification methods consisted
of two steps.

(i) First of all, the Kappa coefficient and overall accuracy (OA) were used for the overall accuracy
measure of different classifiers. Then, the confusion matrix, producer’s accuracy (PA), and user’s
accuracy (UA) were calculated using the highest OA time point of each classifier for further overall
assessment. All these calculations were introduced in Reference [67].
A confusion matrix (as demonstrated in Table 3) lists the values (A, B, C, and D) for known cover
types of the reference data in the columns and those of the classified data in the rows. A, B, C, and D
represent the number of true positives, false positives, false negatives, and true negatives, respectively.

Table 3. Example layout of a confusion matrix.

                           Reference Data
                           Crop     Urban
Classified Data   Crop      A        B
                  Urban     C        D

OA is calculated by dividing the correctly classified pixels (sum of the values in the main diagonal)
by the total number of pixels checked. The Kappa coefficient is a measure of overall agreement of a
matrix. In contrast to the overall accuracy—the ratio of the sum of diagonal values to the total number
of cell counts in the matrix—the Kappa coefficient also takes non-diagonal elements into account [68].
The PA is derived by dividing the number of correct pixels in one class by the total number of
pixels as derived from the reference data (column total in Table 3). Meanwhile, the UA is derived by
dividing the number of correct pixels in one class by the total number of pixels derived from classified
data (row total in Table 3) [69].
(ii) In order to evaluate the performances of different classifiers on each crop type and find the
optimal time series lengths for different crops, the F-measure was used. The F-measure is defined as a
harmonic mean of precision (P) and recall (R) (see Equation (14)) [70]. Recall equals PA, and precision
is the same as UA.
F = (2 × P × R) / (P + R), (14)
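For reference, all of these metrics are available in Scikit-learn; the sketch below uses its standard functions on one set of test predictions (variable names are illustrative).

```python
from sklearn.metrics import (accuracy_score, cohen_kappa_score,
                             confusion_matrix, f1_score)

# y_test holds the reference labels, y_pred the classifier output for one
# time point of the incremental classification.
oa = accuracy_score(y_test, y_pred)                   # overall accuracy (OA)
kappa = cohen_kappa_score(y_test, y_pred)             # Kappa coefficient
cm = confusion_matrix(y_test, y_pred)                 # rows are reference, columns predicted
per_class_f = f1_score(y_test, y_pred, average=None)  # F-measure per crop type
```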

4. Results

4.1. Temporal Profiles of the Sentinel-1A Backscatter Coefficient


Figure 9 summarizes the temporal profiles of the five crop types per polarization, and each point
is the average backscatter coefficient of samples per type. There are 30 points in each time series (one
for each acquisition). Figure 10 provides information on the temporal dynamic of crop types by giving
their averages and standard deviations.
Figure 9. Temporal profiles of the five different crop types with respect to the (a) VV and (b) VH backscatter coefficient (dB).

Figure 10. Averages and standard deviations of the backscatter coefficient (dB) of VV and VH polarizations for the five different crop types.

There are several important characteristics reflected in Figures 9 and 10:
(i) The backscatter coefficient curve of eucalyptus (Figure 9) was almost horizontal, meaning its backscatter coefficient rarely changed throughout the growing season. In addition, compared to other crop types, eucalyptus had the smallest backscatter standard deviations (VH + VV) (Figure 10). Therefore, it exhibited significant temporal characteristics, which benefit its identification.
(ii) The backscatter coefficient of banana, which had the highest average backscatter values (VH + VV) (Figure 10), was always higher than that of other crop types before October. Thus, banana can be identified early.
(iii) The backscatter coefficient curves of two-season paddy rice, sugarcane, and pineapple changed dramatically and intersected many times before September, and therefore early identification of these three crop types was more difficult than identification of other crop types.

4.2. Overall Accuracy Metrics

Figure 11 summarizes the evolution of average classification accuracies of five test sets as a function of time series using 1D CNNs (blue), LSTM RNNs (orange), GRU RNNs (red), and the RF (yellow). Each time point indicates the performance from five test sets on S1A imagery time series. The Kappa coefficient value given for each time point is the average from five repetitions.

Figure 11. Kappa coefficient profiles of the four classifiers.
In terms of the temporal profiles of the Kappa coefficient (Figure 11), we can see that the accuracies of 1D CNNs, LSTM RNNs, and GRU RNNs increase with the length of the time series. Moreover, the curve of 1D CNNs is very close to that of the RF. This is an important result supporting the early crop classification solution of combining deep learning models with the incremental classification.
In addition, from Figure 11 we can also observe that the highest accuracy of each classifier is above 0.900, thus showing the quality of the three deep learning methods for crop classification tasks using S1A imagery in the AOI. A summary of the overall performances of different classification approaches is reported in Table 4. This table includes four metrics: (i) Kappa-Max, the maximum Kappa coefficient each classifier can obtain; (ii) OA-Max, the maximum OA a classifier achieves; (iii) Date of Maximum, the date corresponding to the Kappa-Max and OA-Max metrics; and (iv) First Date of Kappa ≥ 0.900, the first time point when the Kappa coefficient is above 0.900.

Table 4. A summary of Kappa coefficients and the overall accuracies of the different classifiers. Bolded dates and values are the best results.

Classifier    Kappa-Max   OA-Max   The Date of Maximum   First Date of Kappa ≥ 0.900
1D CNNs       0.942       0.959    21 February 2018      30 September 2017
LSTM RNNs     0.931       0.951    23 December 2017      6 September 2017
GRU RNNs      0.934       0.954    11 December 2017      6 September 2017
RF            0.937       0.954    9 February 2018       24 October 2017

The maximum Kappa coefficient values of 1D CNNs, LSTM RNNs, GRU RNNs, and RF were
0.942, 0.931, 0.934, and 0.937, respectively, and the maximum OAs were 0.956, 0.951, 0.954, and
0.954, respectively. Therefore, the 1D CNNs achieved the highest overall crop classification accuracy.
However, GRU RNNs achieved the maximum OA two months earlier than 1D CNNs and the RF,
and had the Kappa coefficient above 0.900 for the first time before the 1D CNNs and RF. Confusion
matrices (averages from five test sets) on the time series data with the maximum OAs of the different
approaches are reported in Figure 12. The producer’s accuracy (PA) and user’s accuracy (UA) are
summarized in Tables 5 and 6. We can observe that 1D CNNs had the best UAs on three crop types
(sugarcane, banana, and eucalyptus), and the second-best UAs were obtained by GRU RNNs on two
crop types (paddy and pineapple).

Figure 12. Confusion matrices (averages from five test sets) on the time series data with the maximum overall accuracies (OAs) of the different approaches: (a) 1D CNNs; (b) LSTM RNNs; (c) GRU RNNs; and (d) RF.


Table 5. Producer’s accuracy (PA) on the time series data with the maximum OAs. Bolded values are
the best performances (PA) of each crop type in different classifiers.

Crop 1D CNNs PA LSTM PA GRU PA RF PA


Paddy 0.988 0.961 0.977 0.977
Sugarcane 0.936 0.930 0.919 0.932
Banana 0.945 0.944 0.981 0.929
Pineapple 0.907 0.905 0.889 0.905
Eucalyptus 0.968 0.965 0.970 0.970

Table 6. User’s accuracy (UA) on the time series data with the maximum OAs. Bolded values are the
best performances (UA) of each crop type in different classifiers.

Crop 1D CNNs UA LSTM UA GRU UA RF UA


Paddy 0.944 0.961 0.966 0.955
Sugarcane 0.953 0.921 0.949 0.949
Banana 0.981 0.962 0.962 0.981
Pineapple 0.886 0.864 0.909 0.864
Eucalyptus 0.976 0.973 0.956 0.968

The results showed that 1D CNNs exhibited the best accuracies overall, but GRU RNNs performed
better in classification before the end of growth seasons.
4.3. Incremental Classification Accuracy of Each Crop

The F-measure is used to measure a test’s accuracy, and it balances the use of precision and recall. Since crop type determined the optimal time series length, we reported per-type F-measure values as a function of time series using 1D CNNs (Figure 13), LSTM RNNs (Figure 14), GRU RNNs (Figure 15), and RF (Figure 16) to find the early classification time series of different crop types. In addition, a summary showing the dates and values of each crop type when F-measures were above 0.900 for the first time for different classifiers is given in Table 7.

Figure 13. F-measure time series of one-dimensional convolutional neural networks.

Figure 14. F-measure time series of long short-term memory recurrent neural networks.

Figure 15. F-measure time series of gated recurrent unit recurrent neural networks.

Figure 16. F-measure time series of random forest.

Table 7. Dates and values of each crop type when F-measures were above 0.900 for the first time for different classifiers.

Classifier    Paddy                       Sugarcane                    Banana                     Pineapple                  Eucalyptus
1D CNNs       13 August 2017 (0.903)      18 September 2017 (0.900)    1 August 2017 (0.914)      16 January 2018 (0.911)    20 July 2017 (0.917)
LSTM RNNs     6 September 2017 (0.935)    18 September 2017 (0.906)    13 August 2017 (0.947)     –                          8 July 2017 (0.900)
GRU RNNs      6 September 2017 (0.942)    24 October 2017 (0.921)      14 June 2017 (0.911)       –                          8 July 2017 (0.909)
RF            1 August 2017 (0.910)       13 August 2017 (0.908)       8 July 2017 (0.956)        –                          14 June 2017 (0.902)

From Figures 13–16, we can observe the following characteristics of the F-measure time series:
(i) The F-measure values of banana and eucalyptus changed slowly with all four methods, and
were first above 0.900 between June and July 2017. To explain this behavior, we can refer to the temporal
profiles of VH and VV presented in Figure 9. The dual-polarized backscatter coefficient values of
banana were the highest before October, and therefore banana was easily identified at this stage. As
discussed in Section 4.1, the temporal profiles of VH and VV of eucalyptus (with a tiny fluctuation and a small standard deviation value; see Figure 10) were more distinct than those of other crops.
(ii) All classifiers performed poorly on pineapple; it had the largest VH + VV backscatter standard
deviation values (see Figure 10). This might be related to its year-round planting. Moreover, its
F-measure time series values changed greatly with three deep learning methods, especially GRU RNNs,
thus causing great volatility of its Kappa coefficient temporal profile. It is worth mentioning that this
phenomenon is related to the long-term temporal dependence of GRU RNNs.
(iii) In all four methods, the F-measure values of the paddy and sugarcane were relatively higher
on 1 August 2017, and this was followed by slow or fluctuating growths. As shown in Figure 9, the VH
backscatter coefficient of the paddy decreased significantly on 1 August 2017 due to the harvest of
first-season paddy rice.

Furthermore, taking into account the dates and accuracies summarized in Table 7,
we can report that GRU RNNs had an advantage in classifying banana, the 1D CNN classifier was the
only one that achieved an accuracy on pineapple above 0.900, and the RF had better early classification
of the three other crop types (second-season paddy rice, sugarcane, and eucalyptus). However, on
account of the growth periods of second-season paddy rice (early August to late November), sugarcane
(mid-March to mid-February of the second year), and eucalyptus (4–6 years) (see Section 2.1), we
believe that 1D CNNs, LSTM RNNs, and GRU RNNs achieved acceptable results for early classification
of these crops.

5. Discussion
In this work, we evaluated three deep learning methods (1D CNNs, LSTM RNNs, and GRU RNNs) for early
crop classification using S1A image time series in the AOI. We used all 30 S1A images acquired during the
2017 growing season to train the architectures and hyper-parameters of the three deep learning models,
and we obtained three classifiers accordingly. Then,
starting at the first time point, we performed an incremental classification process to train each classifier
using all of the previous data. We obtained a classification network with all of the parameter values
(including the hyper-parameters acquired earlier) at each time point. In order to validate the solution,
we also implemented the classic RF approach.
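To make this incremental procedure concrete, the following minimal sketch assumes a Keras-style training interface and a hypothetical `build_model` factory; it illustrates the idea rather than reproducing the implementation used here.

```python
# A minimal sketch of the incremental classification loop described above
# (an illustration under assumptions, not the published implementation).
# `build_model` is a hypothetical factory that re-creates a 1D CNN, LSTM RNN,
# or GRU RNN with the architecture and hyper-parameters already tuned on the
# full 30-image series; X has shape (samples, 30, 2) holding VH/VV backscatter.
def incremental_classification(build_model, X, y, X_test, y_test, epochs=100):
    results = {}
    n_dates = X.shape[1]
    for t in range(1, n_dates + 1):           # grow the series one S1A acquisition at a time
        model = build_model(input_length=t)   # fixed hyper-parameters, fresh parameters
        model.fit(X[:, :t, :], y, epochs=epochs, verbose=0)          # train on all previous data
        results[t] = model.evaluate(X_test[:, :t, :], y_test, verbose=0)
    return results
```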
First of all, we showed that good classification performance could be achieved with S1A SAR time series
data using both the deep learning methods and the classical approach (the RF). To determine how early each
crop type could be classified, we reported the F-measure time series of each classifier for each crop type in
Figures 13–16 and summarized in Table 7 the first time points (and F-measure values) at which the F-measure
exceeded 0.900. We note that only 6 of the 33 optical images acquired by Sentinel-2 [71] during the 2017
growing season in the AOI were not hindered by cloud cover. Good performance in early crop classification
was therefore achievable because S1A SAR, with its 12-day revisit period, not only provides data under any
weather conditions but also permits a precise temporal follow-up of crop growth.
All results in Section 4 indicated the effectiveness of the proposed solution, which avoided training
optimal architectures and hyper-parameters at each time point. Although the performance of the three deep
learning methods was similar to that of the RF, as mentioned in Section 1, deep learning models have
advantages that other methods lack: for example, they can be fed with raw data and can learn long-term
dependencies and representations that handcrafted-feature models cannot [22,23,59]. Therefore, we believe
that deep learning methods will play an important role in early crop identification in the near future.
To further illustrate the performance of the three deep learning methods on S1A time series data,
Figure 12 shows that the 1D CNNs and the RF performed better than the LSTM RNNs and GRU RNNs before July.
As the length of the time series increased, the accuracies of the LSTM RNNs and GRU RNNs rose rapidly,
especially between June and August 2017, because longer sequences allowed these networks to exploit
long-term dependencies. Although the GRU RNNs achieved an accuracy above 0.900 earlier than the other
classifiers (on 6 September 2017), their Kappa coefficient temporal profile fluctuated more. This behavior
mainly reflects the fact that RNNs build long-term dependencies across the sequence, whereas the 1D CNN
convolutional kernel operates locally [72].
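To make this architectural contrast concrete, the sketch below gives Keras-style definitions of a 1D CNN, whose convolutional kernels operate on local temporal windows, and a GRU RNN, whose hidden state is propagated across the whole sequence; the layer counts and sizes are placeholders chosen only for illustration, not the tuned architectures used in this study.

```python
# Illustrative Keras-style definitions only: layer counts and sizes are
# placeholders. Inputs are (time steps, 2) arrays of VH/VV backscatter.
import tensorflow as tf

def cnn_1d(n_steps, n_classes=5):
    # The 1D convolutional kernels see only short local windows of the series.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_steps, 2)),
        tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
        tf.keras.layers.Conv1D(64, kernel_size=3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])

def gru_rnn(n_steps, n_classes=5):
    # The GRU hidden state is carried across the whole sequence, so the final
    # representation depends on long-term temporal context.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(n_steps, 2)),
        tf.keras.layers.GRU(64),
        tf.keras.layers.Dense(n_classes, activation="softmax"),
    ])
```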
As described in Section 1, Cai et al. demonstrated the effectiveness of training all of the parameters of
deep learning models using time series data from different years [34]. Accordingly, time series data from the
growing seasons of several years could be used to train the optimal architectures and hyper-parameters of
1D CNNs, LSTM RNNs, and GRU RNNs, and their parameters could then be trained at each time point. It is
worth pointing out that the training data must be sorted by day of year (DOY) when they come from different
years [55]. In addition, the inputs to all of the models were simply ground-truth labels and backscatter
coefficients, and therefore the solution is scalable to other regions.
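The sketch below shows one way such DOY-based ordering could be implemented, under the assumption that the day of year of every acquisition is stored alongside each sample (our illustration, not the authors' code).

```python
# A small sketch of the DOY-based ordering mentioned above: samples collected
# in different years are re-ordered so that their time steps follow day of year
# rather than acquisition order before being fed to the networks.
import numpy as np

def sort_by_doy(series, doys):
    """series: (n_samples, n_steps, n_features); doys: (n_samples, n_steps)."""
    order = np.argsort(doys, axis=1)                # per-sample chronological DOY order
    rows = np.arange(series.shape[0])[:, None]
    return series[rows, order, :], np.take_along_axis(doys, order, axis=1)
```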

6. Conclusions
In this paper, we investigated the potential of three deep learning methods (1D CNNs, LSTM
RNNs, and GRU RNNs) for early crop classification using S1A imagery time series in Zhanjiang City,
China. The main conclusions are as follows.
First, we validated the effectiveness of combining 1D CNNs, LSTM RNNs, and GRU RNNs with
an incremental classification method for early crop classification using the time series of 30 S1A images,
comparing them against the classical RF method. The key idea of this solution was that the three
deep learning models were trained on the full time series to obtain optimal architectures and
hyper-parameters, and then all of the parameters were re-trained at each time point using all of the
previous data. This solution improved the efficiency of applying deep learning models to early crop
classification by avoiding the training of optimal architectures and hyper-parameters at each time point.
Second, in terms of the early classification of different crop types, we demonstrated that the three
deep learning methods could achieve an F-measure above 0.900 before the end of the growth seasons of the
five crop types.
Finally, we found that, compared with the 1D CNNs, the performance metrics of the two RNNs
(LSTM- and GRU-based) were lower for very short time series. Moreover, the Kappa coefficient temporal
profiles of the two RNNs showed greater fluctuations. This is mainly because RNNs depend more strongly on
long-term temporal dependencies.
Future work will focus on parcel-based early crop identification using deep learning
methods in order to map crops intelligently for sustainable agricultural development.

Author Contributions: Conceptualization, H.Z. and Z.C.; methodology, H.Z. and H.J.; software, H.Z.; validation,
H.Z., L.S. and W.J.; formal analysis, H.Z., L.S. and W.J.; investigation, H.Z., H.J. and Z.C.; resources, Z.C. and
H.J.; data curation, H.Z. and H.J.; writing—original draft preparation, H.Z.; writing—review and editing, H.Z.,
W.J., L.S. and M.F.; visualization, H.Z. and W.J.; supervision, H.Z. and Z.C.; project administration, H.Z.; funding
acquisition, Z.C. and L.S.
Funding: This research was funded by the Atmospheric Correction Technology of GaoFen-6 Satellite Data (No.
30-Y20A02-9003-17/18), the Imported Talent of Innovative Project of CAAS (Agricultural Remote Sensing) (No. 960-3),
the Modern Agricultural Talent Support Project of the Ministry of Agriculture and Rural Affairs (Spatial Information
Technology Innovation Team of Agriculture) (No. 914-2), GDAS' Project of Science and Technology Development
(No. 2019GDASYL-0502001), the National Natural Science Foundation of China (No. 41601481), and the Guangdong
Provincial Agricultural Science and Technology Innovation and Promotion Project in 2018 (No. 2018LM2149).
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Kolotii, A.; Kussul, N.; Shelestov, A.Y.; Skakun, S.V.; Yailymov, B.; Basarab, R.; Lavreniuk, M.; Oliinyk, T.;
Ostapenko, V. Comparison of biophysical and satellite predictors for wheat yield forecasting in Ukraine.
ISPRS Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2015, 40, 39–44. [CrossRef]
2. Kussul, N.; Kogan, F.; Adamenko, T.I.; Skakun, S.V.; Kravchenko, A.N.; Krivobok, A.A.; Shelestov, A.Y.;
Kolotii, A.V.; Kussul, O.M.; Lavrenyuk, A.N. Winter Wheat Yield Forecasting: A Comparative Analysis of
Results of Regression and Biophysical Models. J. Autom. Inf. Sci. 2013, 45, 68–81.
3. Skakun, S.; Franch, B.; Vermote, E.; Roger, J.-C.; Becker-Reshef, I.; Justice, C.; Kussul, N. Early season
large-area winter crop mapping using MODIS NDVI data, growing degree days information and a Gaussian
mixture model. Remote Sens. Environ. 2017, 195, 244–258. [CrossRef]
4. McNairn, H.; Kross, A.; Lapen, D.; Caves, R.; Shang, J. Early season monitoring of corn and soybeans with
TerraSAR-X and RADARSAT-2. Int. J. Appl. Earth Obs. Geoinf. 2014, 28, 252–259. [CrossRef]
5. Vintrou, E.; Ienco, D.; Begue, A.; Teisseire, M. Data Mining, A Promising Tool for Large-Area Cropland
Mapping. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2013, 6, 2132–2138. [CrossRef]
6. Huang, J.; Ma, H.; Sedano, F.; Lewis, P.; Liang, S.; Wu, Q.; Su, W.; Zhang, X.; Zhu, D. Evaluation of regional
estimates of winter wheat yield by assimilating three remotely sensed reflectance datasets into the coupled
WOFOST–PROSAIL model. Eur. J. Agron. 2019, 102, 1–13. [CrossRef]
7. De Leeuw, J.; Vrieling, A.; Shee, A.; Atzberger, C.; Hadgu, K.; Biradar, C.; Keah, H.; Turvey, C. The potential
and uptake of remote sensing in insurance: A review. Remote Sens. 2014, 6, 10888–10912. [CrossRef]
8. Sonobe, R.; Yamaya, Y.; Tani, H.; Wang, X.; Kobayashi, N.; Mochizuki, K.-I. Crop classification from
Sentinel-2-derived vegetation indices using ensemble learning. J. Appl. Remote Sens. 2018, 12, 026019.
[CrossRef]
9. Vreugdenhil, M.; Wagner, W.; Bauer-Marschallinger, B.; Pfeil, I.; Teubner, I.; Rüdiger, C.; Strauss, P. Sensitivity
of Sentinel-1 Backscatter to Vegetation Dynamics: An Austrian Case Study. Remote Sens. 2018, 10, 1396.
[CrossRef]
10. Xie, L.; Zhang, H.; Li, H.; Wang, C. A unified framework for crop classification in southern China using fully
polarimetric, dual polarimetric, and compact polarimetric SAR data. Int. J. Remote Sens. 2015, 36, 3798–3818.
[CrossRef]
11. Lee, J.S.; Pottier, E. Polarimetric Radar Imaging: Basics to Applications, 2nd ed.; CRC Press: Boca Raton, FL,
USA, 2016.
12. Cloude, S. Polarisation: Applications in Remote Sensing. Phys. Today 2010, 63, 53–54.
13. Mosleh, M.; Hassan, Q.; Chowdhury, E. Application of remote sensors in mapping rice area and forecasting
its production: A review. Sensors 2015, 15, 769–791. [CrossRef] [PubMed]
14. Jiang, H.; Li, D.; Jing, W.; Xu, J.; Huang, J.; Yang, J.; Chen, S. Early Season Mapping of Sugarcane by Applying
Machine Learning Algorithms to Sentinel-1A/2 Time Series Data: A Case Study in Zhanjiang City, China.
Remote Sens. 2019, 11, 861. [CrossRef]
15. Rogan, J.; Franklin, J.; Roberts, D.A. A comparison of methods for monitoring multitemporal vegetation
change using Thematic Mapper imagery. Remote Sens. Environ. 2002, 80, 143–156. [CrossRef]
16. Xie, Y.; Sha, Z.; Yu, M. Remote sensing imagery in vegetation mapping: A review. J. Plant Ecol. 2008, 1, 9–23.
[CrossRef]
17. Potin, P.; Rosich, B.; Grimont, P.; Miranda, N.; Shurmer, I.; O’Connell, A.; Torres, R.; Krassenburg, M.
Sentinel-1 mission status. In Proceedings of the EUSAR 2016: 11th European Conference on Synthetic
Aperture Radar, Hamburg, Germany, 6–9 June 2016; pp. 1–6.
18. Onojeghuo, A.O.; Blackburn, G.A.; Wang, Q.; Atkinson, P.M.; Kindred, D.; Miao, Y. Mapping paddy rice
fields by applying machine learning algorithms to multi-temporal Sentinel-1A and Landsat data. Int. J.
Remote Sens. 2018, 39, 1042–1067. [CrossRef]
19. Hao, P.; Zhan, Y.; Li, W.; Zheng, N.; Shakir, M. Feature Selection of Time Series MODIS Data for Early
Crop Classification Using Random Forest: A Case Study in Kansas, USA. Remote Sens. 2015, 7, 5347–5369.
[CrossRef]
20. Ndikumana, E.; Ho Tong Minh, D.; Baghdadi, N.; Courault, D.; Hossard, L. Deep Recurrent Neural Network
for Agricultural Classification using multitemporal SAR Sentinel-1 for Camargue, France. Remote Sens. 2018,
10, 1217. [CrossRef]
21. Ünsalan, C.; Boyer, K.L. Review on Land Use Classification; Springer: London, UK, 2011.
22. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436. [CrossRef]
23. Zhu, X.; Tuia, D.; Mou, L.; Xia, G.S.; Fraundorfer, F. Deep Learning in Remote Sensing: A Review. IEEE
Geosci. Remote Sens. Mag. 2017, 5, 8–36. [CrossRef]
24. Wang, Z.; Yan, W.; Oates, T. Time Series Classification from Scratch with Deep Neural Networks: A Strong
Baseline. In Proceedings of the 2017 International Joint Conference on Neural Networks (IJCNN),
Anchorage, AK, USA, 14–19 May 2017.
25. Werbos, P.J. Backpropagation through time: What it does and how to do it. Proc. IEEE 1990, 78, 1550–1560.
[CrossRef]
26. Giles, C.L.; Miller, C.B.; Chen, D.; Sun, G.-Z.; Chen, H.-H.; Lee, Y.-C. Extracting and learning an unknown
grammar with recurrent neural networks. In Proceedings of the Advances in Neural Information Processing
Systems, Denver, CO, USA, 30 November–3 December 1992; pp. 317–324.
27. Lawrence, S.; Giles, C.L.; Fong, S. Natural language grammatical inference with recurrent neural networks.
IEEE Trans. Knowl. Data Eng. 2000, 12, 126–140. [CrossRef]
28. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
[PubMed]
29. Cho, K.; Merrienboer, B.V.; Bahdanau, D.; Bengio, Y. On the Properties of Neural Machine Translation:
Encoder-Decoder Approaches. Comput. Sci. 2014. [CrossRef]
30. Zhong, L.; Hu, L.; Zhou, H. Deep learning based multi-temporal crop classification. Remote Sens. Environ.
2019, 221, 430–443. [CrossRef]
31. Rußwurm, M.; Körner, M. Temporal Vegetation Modelling using Long Short-Term Memory Networks for
Crop Identification from Medium-Resolution Multi-Spectral Satellite Images. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Honolulu, HI, USA,
22–25 July 2017; pp. 11–19.
32. Rußwurm, M.; Körner, M. Multi-Temporal Land Cover Classification with Sequential Recurrent Encoders.
ISPRS Int. J. Geo-Inf. 2018, 7, 129. [CrossRef]
33. Castro, J.; Achanccaray Diaz, P.; Sanches, I.; Cue La Rosa, L.; Nigri Happ, P.; Feitosa, R. Evaluation of
Recurrent Neural Networks for Crop Recognition from Multitemporal Remote Sensing Images. In Anais do
XXVII Congresso Brasileiro de Cartografia; SBC: Rio de Janeiro, Brazil, 2017.
34. Cai, Y.; Guan, K.; Peng, J.; Wang, S.; Seifert, C.; Wardlow, B.; Li, Z. A high-performance and in-season
classification system of field-level crop types using time-series Landsat data and a machine learning approach.
Remote Sens. Environ. 2018, 210, 35–47. [CrossRef]
35. Sak, H.; Senior, A.; Beaufays, F. Long short-term memory recurrent neural network architectures for large
scale acoustic modeling. In Proceedings of the Fifteenth Annual Conference of the International Speech
Communication Association, Singapore, 14–18 September 2014.
36. Inglada, J.; Vincent, A.; Arias, M.; Marais-Sicre, C. Improved Early Crop Type Identification by Joint Use of
High Temporal Resolution SAR And Optical Image Time Series. Remote Sens. 2016, 8, 362. [CrossRef]
37. Ren, W.; Chen, J.; Tao, J.; Li, K.; Zhao, X. Spatiotemporal Characteristics of Seasonal Meteorological Drought
in Leizhou Peninsula during 1984–2013. J. China Hydrol. 2017, 37, 36–41.
38. Kussul, N.; Lavreniuk, M.; Skakun, S.; Shelestov, A. Deep Learning Classification of Land Cover and Crop
Types Using Remote Sensing Data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 778–782. [CrossRef]
39. Copernicus Open Access Hub. Available online: https://scihub.copernicus.eu/ (accessed on 15
November 2019).
40. Frost, V.S.; Stiles, J.A.; Shanmugan, K.S.; Holtzman, J.C. A model for radar images and its application to
adaptive digital filtering of multiplicative noise. IEEE Trans. Pattern Anal. Mach. Intell. 1982, 4, 157–166.
[CrossRef]
41. Wang, M.-H.; Hung, C. Extension neural network and its applications. Neural Netw. 2003, 16, 779–784.
[CrossRef]
42. Xie, T.; Yu, H.; Wilamowski, B. Comparison between traditional neural networks and radial basis function
networks. In Proceedings of the 2011 IEEE International Symposium on Industrial Electronics, Gdansk,
Poland, 27–30 June 2011; pp. 1194–1199.
43. Pandey, P.; Barai, S. Multilayer perceptron in damage detection of bridge structures. Comput. Struct. 1995, 54,
597–608. [CrossRef]
44. O’Shea, K.; Nash, R. An Introduction to Convolutional Neural Networks. arXiv 2015, arXiv:1511.08458.
45. Lecun, Y.; Bottou, L.; Orr, G.B.; Muller, K.R. Efficient BackProp. In Neural Networks: Tricks of the Trade;
Springer: Berlin/Heidelberg, Germany, 2012.
46. Hu, W.; Huang, Y.; Wei, L.; Zhang, F.; Li, H. Deep convolutional neural networks for hyperspectral image
classification. J. Sens. 2015. [CrossRef]
47. Bakker, B. Reinforcement learning with long short-term memory. In Proceedings of the Advances in Neural
Information Processing Systems, Vancouver, BC, Canada, 9–14 December 2002; pp. 1475–1482.
48. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical evaluation of gated recurrent neural networks on
sequence modeling. arXiv 2014, arXiv:1412.3555.
49. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [CrossRef]
50. Inglada, J.; Arias, M.; Tardy, B.; Hagolle, O.; Valero, S.; Morin, D.; Dedieu, G.; Sepulcre, G.; Bontemps, S.;
Defourny, P. Assessment of an operational system for crop type map production using high temporal and
spatial resolution satellite optical imagery. Remote Sens. 2015, 7, 12356–12379. [CrossRef]
51. Clauss, K.; Ottinger, M.; Leinenkugel, P.; Kuenzer, C. Estimating rice production in the Mekong Delta,
Vietnam, utilizing time series of Sentinel-1 SAR data. Int. J. Appl. Earth Obs. Geoinf. 2018, 73, 574–585.
[CrossRef]
52. Son, N.-T.; Chen, C.-F.; Chen, C.-R.; Minh, V.-Q. Assessment of Sentinel-1A data for rice crop classification
using random forests and support vector machines. Geocarto Int. 2018, 33, 587–601. [CrossRef]
53. Zhang, Z.; Sabuncu, M. Generalized cross entropy loss for training deep neural networks with noisy labels.
In Proceedings of the Advances in Neural Information Processing Systems, Montréal, QC, Canada, 3–8
December 2018; pp. 8778–8788.
54. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
55. Karim, F.; Majumdar, S.; Darabi, H.; Harford, S. Multivariate lstm-fcns for time series classification. Neural
Netw. 2019, 116, 237–245. [CrossRef] [PubMed]
56. Hatami, N.; Gavet, Y.; Debayle, J. Classification of Time-Series Images Using Deep Convolutional Neural
Networks. In Proceedings of the Tenth International Conference on Machine Vision (ICMV 2017), Vienna,
Austria, 13–15 November 2017.
57. Yang, J.; Nguyen, M.N.; San, P.P.; Li, X.L.; Krishnaswamy, S. Deep convolutional neural networks on
multichannel time series for human activity recognition. In Proceedings of the Twenty-Fourth International
Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015.
58. Borovykh, A.; Bohte, S.; Oosterlee, C.W. Conditional Time Series Forecasting with Convolutional Neural
Networks. arXiv 2017, arXiv:1703.04691.
59. Cui, Z.; Chen, W.; Chen, Y. Multi-scale convolutional neural networks for time series classification. arXiv
2016, arXiv:1603.06995.
60. Jaderberg, M.; Simonyan, K.; Zisserman, A.; Kavukcuoglu, K. Spatial Transformer Networks. In Proceedings
of the Neural Information Processing Systems Conference, Montreal, QC, Canada, 7–12 December 2015;
pp. 2017–2025.
61. Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal
Covariate Shift. arXiv 2015, arXiv:1502.03167.
62. Zheng, Y.; Liu, Q.; Chen, E.; Ge, Y.; Zhao, J.L. Time series classification using multi-channels deep convolutional
neural networks. In Proceedings of the International Conference on Web-Age Information Management,
Macau, China, 16–18 July 2014; pp. 298–310.
63. Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training recurrent neural networks. In Proceedings of
the International Conference on Machine Learning, Atlanta, GA, USA, 16–21 June 2013; pp. 1310–1318.
64. Jozefowicz, R.; Zaremba, W.; Sutskever, I. An empirical exploration of recurrent network architectures. In
Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2342–2350.
65. Tatsumi, K.; Yamashiki, Y.; Torres, M.A.C.; Taipe, C.L.R. Crop classification of upland fields using Random
forest of time-series Landsat 7 ETM+ data. Comput. Electron. Agric. 2015, 115, 171–179. [CrossRef]
66. Goldstein, B.A.; Polley, E.C.; Briggs, F.B. Random forests for genetic association studies. Stat. Appl. Genet.
Mol. Biol. 2011, 10. [CrossRef]
67. Congalton, R.G. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens.
Environ. 1991, 37, 35–46. [CrossRef]
68. Rosenfield, G.H.; Fitzpatrick-Lins, K. A coefficient of agreement as a measure of thematic classification
accuracy. Photogramm. Eng. Remote Sens. 1986, 52, 223–227.
69. Banko, G. A Review of Assessing the Accuracy of Classifications of Remotely Sensed Data and of Methods including
Remote Sensing Data in Forest Inventory; IIASA: Leiden, The Netherlands, 1998.
70. Sasaki, Y. The truth of the F-measure. Teach Tutor Mater 2007, 1, 1–5.
71. Drusch, M.; Bello, U.D.; Carlier, S.; Colin, O.; Fernandez, V.; Gascon, F.; Hoersch, B.; Isola, C.; Laberinti, P.;
Martimort, P. Sentinel-2: ESA’s Optical High-Resolution Mission for GMES Operational Services. Remote
Sens. Environ. 2012, 120, 25–36. [CrossRef]
72. Garnot, V.S.F.; Landrieu, L.; Giordano, S.; Chehata, N. Time-space tradeoff in deep learning models for crop
classification on satellite multi-spectral image time series. arXiv 2019, arXiv:1901.10503.

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access
article distributed under the terms and conditions of the Creative Commons Attribution
(CC BY) license (http://creativecommons.org/licenses/by/4.0/).
