... and DenseNet. We have demonstrated the application of our Hydra framework
in two datasets, FMOW and NWPU-RESISC45, achieving results comparable to the
state-of-the-art for the former and the best reported performance so far for
the latter. Code and CNN models are available at
https://github.com/maups/hydra-fmow.

Index Terms—Geospatial land classification, remote sensing image
classification, functional map of world, ensemble learning, on-line data
augmentation.
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
architecture that receives a large satellite image patch as input and outputs a label for all its pixels at once. They also train part of their network in an unsupervised manner to ease the lack of labeled training data. Although we do not conduct any experiments on the semantic segmentation task, we believe it could also benefit from the Hydra framework proposed in this work.
3 THE HYDRA FRAMEWORK
Ensembles of learning algorithms have been effectively used to improve the classification performance in many computer vision problems [17], [20], [21]. The reasons for that, as pointed out by Dietterich [5], are: 1) the training phase does not provide sufficient data to build a single best classifier; 2) the optimization algorithm fails to converge to the global minimum, but an ensemble using distinct starting points could better approximate the optimal result; 3) the space being searched may not contain the optimal global minimum, but an ensemble may expand this space for a better approximation.
Formally, a supervised machine learning algorithm receives a list of training samples {(x_1, y_1), (x_2, y_2), ..., (x_m, y_m)} drawn from some unknown function y = f(x) — where x is the feature vector and y the class — and searches for a function h through a space of functions H, also known as hypotheses, that best matches y. As described by Dietterich [5], an ensemble of learning algorithms builds a set of hypotheses {h_1, h_2, ..., h_k} ∈ H with respective weights {w_1, w_2, ..., w_k}, and then provides a classification decision ȳ through hypotheses fusion. Different fusion approaches may be exploited, such as the weighted sum ȳ = Σ_{i=1}^{k} w_i h_i(x), majority voting, etc.
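To make the fusion step concrete, the following minimal sketch in Python/NumPy implements the two fusion rules mentioned above. The function names and the toy score vectors are our own illustration, not part of the Hydra code.

import numpy as np

def weighted_sum_fusion(scores, weights):
    # Fuse per-hypothesis score vectors h_i(x) with weights w_i:
    # y_bar = sum_i w_i * h_i(x). Returns the winning class index.
    scores = np.asarray(scores)       # shape: (k, n_classes)
    weights = np.asarray(weights)     # shape: (k,)
    fused = (weights[:, None] * scores).sum(axis=0)
    return int(np.argmax(fused))

def majority_voting(scores):
    # Each hypothesis votes for its most probable class;
    # the class with the most votes wins.
    votes = np.argmax(np.asarray(scores), axis=1)
    return int(np.bincount(votes).argmax())

# Toy example: k = 3 hypotheses over 4 classes.
scores = [[0.1, 0.7, 0.1, 0.1],
          [0.2, 0.5, 0.2, 0.1],
          [0.6, 0.1, 0.2, 0.1]]
print(weighted_sum_fusion(scores, [1.0, 1.0, 1.0]))  # -> 1
print(majority_voting(scores))                       # -> 1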
Hydra is a framework to create ensembles of CNNs. As pointed out by Sharkey [22], a neural network ensemble can be built by varying the initial weights, varying the network architecture, and varying the training set. Varying the initial weights, however, requires a full optimization for each starting point and thus consumes too many computational resources. For the same reason, varying the network architecture demands caution when it is not possible to share learned information between different architectures. Unlike the previous possibilities, varying the training set can be done at any point of the optimization process for one or multiple network architectures, and, depending on how it is done, it may not impact the final training cost.

Our framework uses two state-of-the-art CNN architectures, Residual Networks (ResNet) [23] and Dense Convolutional Networks (DenseNet) [24], as shown in Figures 3 and 4. They were chosen after experimenting with them and other architectures: VGG-16 [17], VGG-19 [17], Inception V3 [25], and Xception [26]. Different reasons contribute to making ResNet and DenseNet the best combination of architectures: (1) they outperform the other evaluated architectures and the architectures previously mentioned in Section 2 (see Table 5); (2) they have a reasonable level of complementarity; and (3) they achieve similar classification results individually. The training process depicted in Figure 3 consists in first creating the body of the Hydra by coarsely optimizing the chosen architectures using weights pre-trained on ImageNet [27] as a starting point. Then, the obtained weights serve as a starting point for several copies of these CNNs, which are further optimized to form the heads of the Hydra. During testing, an image is fed to all of Hydra's heads and their outputs are fused to generate a single decision, as illustrated in Figure 4.

Fig. 3. Hydra's training flowchart: two CNN architectures, ResNet and DenseNet, are initialized with ImageNet weights and coarsely optimized to form the Hydra's body. The obtained weights are then refined several times through further optimization to create the heads of the Hydra. ResNets and DenseNets use the same training set in both optimization stages (Hydra's body and Hydra's heads).

A key point in ensemble learning, as observed by Krogh and Vedelsby [28] and Chandra and Yao [29], is that the hypotheses should be as accurate and as diverse as possible. In Hydra, such properties are promoted by applying different geometric transformations over the training samples, like zoom, shift, reflection, and rotations. More details regarding the proposed framework are presented in Sections 3.1, 3.2 and 3.3.

3.1 CNN architectures
Both CNN architectures used in this work, ResNet (ResNet-50) [23] and DenseNet (DenseNet-161) [24], were chosen for achieving state-of-the-art results in different classification problems while being complementary to each other. Let F_ℓ(·) be a non-linear transformation — a composite function of convolution, ReLU, pooling, batch normalization and so on — for layer ℓ. As described by He et al. [23], a ResNet unit can be defined as:
Fig. 5. The convolutional neural network architecture used by the Hydra
framework.
Fig. 4. Hydra's inference flowchart: an input image is fed to all heads of the Hydra, and each head outputs the probability of each class being the correct one. A majority voting is then used to determine the final label, considering that each head votes for its most probable label.
x_ℓ = F_ℓ(x_{ℓ−1}) + x_{ℓ−1}    (1)

where x_{ℓ−1} and x_ℓ are the input and output of layer ℓ, and x_0 is the input image. ResNets are built by stacking several of these units, creating bypasses for gradients that improve the back-propagation optimization in very deep networks. As a consequence, ResNets may end up generating redundant layers. As a different solution to improve the flow of gradients in deep networks, Huang et al. [24] proposed to connect each layer to all preceding layers, which was named a DenseNet unit:

x_ℓ = F_ℓ([x_0, x_1, ..., x_{ℓ−1}])    (2)

where [x_0, x_1, ..., x_{ℓ−1}] is the concatenation of the ℓ previous layer outputs. DenseNets also stack several units, but they use transition layers to limit the number of connections. In the end, DenseNets have fewer parameters than ResNets and compensate for this through feature reuse. This difference in behavior may be one of the causes of their complementarity.
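The contrast between Eqs. (1) and (2) can be sketched in a few lines of Keras code. This is a simplified illustration, with a generic convolution/batch-normalization/ReLU block standing in for F_ℓ; it is not the exact block design of ResNet-50 or DenseNet-161.

from tensorflow.keras import layers

def f_block(x, filters):
    # F_l: a composite function of convolution, batch norm and ReLU.
    x = layers.Conv2D(filters, 3, padding="same")(x)
    x = layers.BatchNormalization()(x)
    return layers.ReLU()(x)

def resnet_unit(x, filters):
    # Eq. (1): x_l = F_l(x_{l-1}) + x_{l-1}; the identity shortcut
    # requires x to already have `filters` channels.
    return layers.Add()([f_block(x, filters), x])

def densenet_unit(xs, growth_rate):
    # Eq. (2): x_l = F_l([x_0, x_1, ..., x_{l-1}]); all previous outputs
    # are concatenated along the channel axis before applying F_l.
    x = layers.Concatenate()(xs) if len(xs) > 1 else xs[0]
    return f_block(x, growth_rate)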
We initialize both DenseNet and ResNet with weights from ImageNet [27]. For each of them, the last fully connected layer (i.e., the classification layer) is replaced by three hidden layers with 4096 neurons and a new classification layer, as illustrated in Figure 5. The extension of these architectures is necessary to take image metadata into account. A vector of metadata that is useful for classification (e.g., ground sample distance, sun angle) is concatenated to the input of the first hidden layer so that subsequent layers learn the best way to combine metadata and visual features.
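A minimal sketch of this modified network top, using the Keras functional API, follows; the input resolution and the metadata dimensionality are placeholders, not values taken from the paper.

from tensorflow.keras import layers, models, applications

NUM_CLASSES = 63  # 62 FMOW classes + false detection
META_DIM = 8      # placeholder metadata size (e.g., gsd, sun angle, ...)

# Convolutional body initialized with ImageNet weights; the original
# classification layer is dropped (include_top=False).
body = applications.ResNet50(weights="imagenet", include_top=False,
                             pooling="avg", input_shape=(224, 224, 3))

meta_in = layers.Input(shape=(META_DIM,), name="metadata")
# Metadata is concatenated to the visual features before the first of
# the three 4096-neuron hidden layers, followed by a new classifier.
x = layers.Concatenate()([body.output, meta_in])
for _ in range(3):
    x = layers.Dense(4096, activation="relu")(x)
out = layers.Dense(NUM_CLASSES, activation="softmax")(x)

model = models.Model(inputs=[body.input, meta_in], outputs=out)

The DenseNet head would follow the same pattern, differing only in the convolutional body.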
3.2 On-line data augmentation
The process of artificially augmenting the image samples in machine learning by using class-preserving transformations is meant to reduce the overfitting in imbalanced classification problems or to improve the generalization ability in the absence of enough data [15], [16], [30], [31], [32]. The breakthroughs achieved by many state-of-the-art algorithms that benefit from data augmentation led many authors [33], [34], [35], [36] to study the effectiveness of this technique.

Different techniques can be considered, such as geometric (e.g., reflection, scaling, warping, rotation and translation) and photometric (e.g., brightness and saturation enhancements, noise perturbation, edge enhancement, and color palette changing) transformations. However, as pointed out by Ratner et al. [37], selecting transformations and tuning their parameters can be a tricky and time-consuming task. A careless augmentation may produce unrealistic or meaningless images that will negatively affect the performance.

The process of data augmentation can be conducted off-line, as illustrated in Figure 6. In this case, training images undergo geometric and/or photometric transformations before the training. Although this approach avoids repeating the same transformation several times, it also considerably increases the size of the training set. For instance, if we only consider basic rotations (90, 180 and 270 degrees) and reflections, the FMOW training set would have more than 2.8 million samples. This would increase not only the amount of required storage space, but the time consumed by a training epoch as well.

In our framework, we apply on-line augmentation, as illustrated in Figure 7. To this end, different random transformations are applied to the training samples every epoch. This way, the number of transformations seen during training increases with the number of epochs. In addition, different heads of the Hydra experience different versions of the training set and hardly converge to the same endpoint. On-line data augmentations were carried out by the ImageDataGenerator function from Keras¹. We used random flips in vertical and horizontal directions, random zooming and random shifts over different image crops. We did not use any photometric distortion to avoid creating meaningless training samples.

1. https://keras.io/
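A minimal sketch of such a setup is shown below; ImageDataGenerator and its flip/zoom/shift parameters are the real Keras API, but the specific parameter values are illustrative, not the ones used by the authors.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Geometric-only transformations; no photometric distortion is applied.
datagen = ImageDataGenerator(
    horizontal_flip=True,    # random flips in the horizontal direction
    vertical_flip=True,      # random flips in the vertical direction
    zoom_range=0.2,          # random zooming (illustrative range)
    width_shift_range=0.1,   # random horizontal shifts (illustrative)
    height_shift_range=0.1)  # random vertical shifts (illustrative)

# Each epoch draws freshly transformed samples from the same images:
# model.fit(datagen.flow(x_train, y_train, batch_size=32), epochs=n)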
[Figures 6 and 7: flowcharts of off-line and on-line data augmentation. Panel labels: "Off-line data augmentation"; "Randomly transformed samples for the 1st epoch (n regions)"; "CNN-Training".]

3.3 Fusion and decision
To discover the most appropriate label for an input region according to the models created by Hydra, we have to combine the results from all heads.
Each head produces a score vector that indicates how well this region matches each class. If a region shows up in more than one image, it will have multiple score vectors per head. In this case, these vectors are summed to form a single vector of scores per head for a region. After that, each head votes for the class with the highest score within its score vector for that region. Finally, majority voting is used to select the final label. If the number of votes is less than or equal to half the number of heads, however, the region is considered a false detection.
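The decision procedure can be summarized by the following sketch in Python/NumPy; the data layout and the false-detection label index are our own illustration.

import numpy as np

def hydra_decision(per_head_scores, false_detection=62):
    # per_head_scores: one entry per head; each entry is a list of score
    # vectors for the same region (one per image it appears in).
    votes = []
    for head in per_head_scores:
        summed = np.sum(head, axis=0)         # single score vector per head
        votes.append(int(np.argmax(summed)))  # head votes for its top class
    labels, counts = np.unique(votes, return_counts=True)
    # Without a strict majority, the region is labeled a false detection.
    if counts.max() <= len(per_head_scores) / 2:
        return false_detection
    return int(labels[np.argmax(counts)])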
4 EXPERIMENTAL RESULTS

4.1.1 Dataset
The FMOW dataset, as detailed by Christie et al. [1], contains almost half a million images split into training, evaluation and testing subsets, as presented in Table 1. It was publicly released³ in two image formats: JPEG (194 GB) and TIFF (2.8 TB). For both formats, high-resolution pan-sharpened [38] and multi-spectral images were available, although the latter varies with the format. While TIFF images provide the original 4/8-band multi-spectral information, the color space of JPEG images was compressed to fit their 3-channel limitation.

TABLE 1
Contents of the FMOW dataset.

2. https://www.topcoder.com/
3. https://www.iarpa.gov/challenges/fmow.html
Fig. 9. List of classes of the FMOW challenge. An image example, a label and an associated weight are presented for each class. The least challenging classes have a lower weight (0.6), while the most difficult ones have a higher weight (1.4). These weights impact the final classification score.
Weight 0.6: airport, crop field, debris or rubble, flooded road, military facility, nuclear powerplant, single-unit residential, solar farm, tunnel opening, wind farm.
Weight 1.4: border checkpoint, construction site, educational institution, factory or power plant, fire station, gas station, police station, road bridge, smokestack, tower.
Weight 1.0: airport hangar, airport terminal, amusement park, aquaculture, archaeological site, barn, burial site, car dealership, dam, electric substation, fountain, golf course, ground transportation station, helipad, hospital, impoverished settlement, interchange, lake or pond, lighthouse, multi-unit residential, office building, oil or gas facility, park, parking lot or garage, place of worship, port, prison, race track, railway bridge, recreational facility, runway, shipyard, shopping mall, space facility, stadium, storage tank, surface mine, swimming pool, toll booth, waste disposal, water treatment facility, zoo.
Weight 0.0: false detection.
Fig. 10. Class histogram distribution for the training (gray bars) and evaluation (black bars) sets using the 3-band pan-sharpened JPEG/RGB images. Note the highly unbalanced class distribution.
4.1.6 Results
In this section, we report and compare the Hydra performance on the FMOW dataset. In this experiment, we use the entire training subset and the false detections of the evaluation subset for training, and the remaining images of the evaluation subset for performance computation. The confusion matrix for the best ensemble is presented in Figure 12. The hardest classes for Hydra were: shipyard (mainly confused with port), hospital (confused with educational institution), multi-unit residential (confused with single-unit residential), police station (confused with educational institution), and office building (confused with fire station, police station, etc.). We also had high confusion between nuclear powerplant and port, between prison and educational institution, and between space facility and border checkpoint.
Fig. 12. Confusion matrix for the Hydra ensemble after 11 training epochs (Figure 14). Each row represents the ground-truth label, while each column shows the label obtained by Hydra. Large values outside the main diagonal indicate that the corresponding classes are hard to discriminate.
TABLE 5
Accuracy for the NWPU-RESISC45 dataset using training/testing splits of 10%/90% and 20%/80%. Average accuracy and standard deviation were computed over five different random splits. Evaluated classifiers may use on-line data augmentation (O) and/or ImageNet weights (I). Classifiers marked with † were reported by Cheng et al. [4].

Classifier                         10%/90%         20%/80%
Hydra (DenseNet + ResNet) O,I      92.44 ± 0.34    94.51 ± 0.21
DenseNet O,I                       91.06 ± 0.61    93.33 ± 0.55
ResNet O,I                         89.24 ± 0.75    91.96 ± 0.71
VGG16 I †                          87.15 ± 0.45    90.36 ± 0.18
GoogLeNet I †                      82.57 ± 0.12    86.02 ± 0.18
AlexNet I †                        81.22 ± 0.19    85.16 ± 0.18
AlexNet †                          76.69 ± 0.21    79.85 ± 0.13
VGG16 †                            76.47 ± 0.18    79.79 ± 0.15
GoogLeNet †                        76.19 ± 0.38    78.48 ± 0.26
BoVW †                             41.72 ± 0.21    44.97 ± 0.28
LLC †                              38.81 ± 0.23    40.03 ± 0.34
BoVW + SPM †                       27.83 ± 0.61    32.96 ± 0.47
Color histograms †                 24.84 ± 0.22    27.52 ± 0.14
LBP †                              19.20 ± 0.41    21.74 ± 0.18
GIST †                             15.90 ± 0.23    17.88 ± 0.22

Fig. 14. Evaluation of the ensemble accuracy. (top) Standard ensemble, with all classifiers trained from the starting point. Fusion results are shown for each epoch. (middle) The first four epochs show the accuracy of the Hydra's body using DenseNet and ResNet. The following epochs show the accuracy for each head separately as well as for the fusion of all heads. (bottom) Comparison between the standard ensemble and Hydra.

... the most recent state-of-the-art developments.

ACKNOWLEDGMENTS
Part of the equipment used in this project is supported by a grant (CNS-1513126) from NSF-USA. The Titan Xp used for this research was donated by the NVIDIA Corporation.

REFERENCES
[1] G. Christie, N. Fendley, J. Wilson, and R. Mukherjee, "Functional map of the world," arXiv preprint arXiv:1711.07846, 2017.
[2] K. Nogueira, O. A. Penatti, and J. A. dos Santos, "Towards better exploiting convolutional neural networks for remote sensing scene classification," Pattern Recognition, vol. 61, pp. 539–556, 2017.
[3] S. Chaib, H. Liu, Y. Gu, and H. Yao, "Deep feature fusion for VHR remote sensing scene classification," IEEE TGRS, vol. 55, no. 8, pp. 4775–4784, 2017.
[4] G. Cheng, J. Han, and X. Lu, "Remote sensing image scene classification: Benchmark and state of the art," Proceedings of the IEEE, vol. 105, no. 10, pp. 1865–1883, 2017.
[5] T. G. Dietterich, Ensemble Methods in Machine Learning. Springer Berlin Heidelberg, 2000, pp. 1–15.
[6] X. Xu, W. Li, Q. Ran, Q. Du, L. Gao, and B. Zhang, "Multisource remote sensing data classification based on convolutional neural network," IEEE TGRS, vol. PP, no. 99, pp. 1–13, 2017.
[7] G. Cheng, Z. Li, J. Han, X. Yao, and L. Guo, "Exploring hierarchical convolutional features for hyperspectral image classification," IEEE TGRS, vol. 56, no. 11, pp. 6712–6722, Nov 2018.
[8] G. Cheng, P. Zhou, and J. Han, "Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images," IEEE TGRS, vol. 54, no. 12, pp. 7405–7415, Dec 2016.
[9] E. Li, J. Xia, P. Du, C. Lin, and A. Samat, "Integrating multilayer features of convolutional neural networks for remote sensing scene classification," IEEE TGRS, vol. 55, no. 10, pp. 5653–5665, 2017.
[10] X. Lu, X. Zheng, and Y. Yuan, "Remote sensing scene classification by unsupervised representation learning," IEEE TGRS, vol. 55, no. 9, pp. 5148–5157, 2017.
[11] G. Cheng, C. Yang, X. Yao, L. Guo, and J. Han, "When deep learning meets metric learning: Remote sensing image scene classification via learning discriminative CNNs," IEEE TGRS, vol. 56, no. 5, pp. 2811–2821, May 2018.
[12] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in SIGSPATIAL, 2010, pp. 270–279.
[13] D. Dai and W. Yang, "Satellite image classification via two-layer sparse coding with biased image representation," IEEE Geoscience and Remote Sensing Letters, vol. 8, no. 1, pp. 173–176, 2011.
[14] G. S. Xia, J. Hu, F. Hu, B. Shi, X. Bai, Y. Zhong, L. Zhang, and X. Lu, "AID: A benchmark data set for performance evaluation of aerial scene classification," IEEE TGRS, vol. 55, no. 7, pp. 3965–3981, 2017.
[15] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Commun. ACM, vol. 60, no. 6, pp. 84–90, 2017.
[16] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich, "Going deeper with convolutions," in IEEE CVPR, 2015, pp. 1–9.
[17] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-scale image recognition," in ICLR, 2015, pp. 1–14.
[18] Y. Tao, M. Xu, F. Zhang, B. Du, and L. Zhang, "Unsupervised-restricted deconvolutional neural network for very high resolution remote-sensing image classification," IEEE TGRS, vol. 55, no. 12, pp. 6805–6823, 2017.
[19] M. D. Zeiler, G. W. Taylor, and R. Fergus, "Adaptive deconvolutional networks for mid and high level feature learning," in ICCV, 2011, pp. 2018–2025.
[20] H. Li, Z. Lin, X. Shen, J. Brandt, and G. Hua, "A convolutional neural network cascade for face detection," in IEEE CVPR, 2015, pp. 5325–5334.
[21] C. Ding and D. Tao, "Trunk-branch ensemble convolutional neural networks for video-based face recognition," IEEE TPAMI, vol. PP, no. 99, pp. 1–11, 2017.
[22] A. J. Sharkey, Ed., Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems, 1st ed. Springer-Verlag, 1999.
[23] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in IEEE CVPR, 2016.
[24] G. Huang, Z. Liu, L. van der Maaten, and K. Q. Weinberger, "Densely connected convolutional networks," in IEEE CVPR, 2017, pp. 4700–4708.
[25] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, "Rethinking the inception architecture for computer vision," CoRR, vol. abs/1512.00567, 2015. [Online]. Available: http://arxiv.org/abs/1512.00567
[26] F. Chollet, "Xception: Deep learning with depthwise separable convolutions," CoRR, vol. abs/1610.02357, 2016. [Online]. Available: http://arxiv.org/abs/1610.02357
[27] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A large-scale hierarchical image database," in IEEE CVPR, 2009.
[28] A. Krogh and J. Vedelsby, "Neural network ensembles, cross validation and active learning," in NIPS, 1994, pp. 231–238.
[29] A. Chandra and X. Yao, "Evolving hybrid ensembles of learning machines for better generalisation," Neurocomputing, vol. 69, no. 7, pp. 686–700, 2006.
[30] A. G. Howard, "Some improvements on deep convolutional neural network based image classification," CoRR, vol. abs/1312.5402, 2013. [Online]. Available: http://arxiv.org/abs/1312.5402
[31] K. He, X. Zhang, S. Ren, and J. Sun, "Spatial pyramid pooling in deep convolutional networks for visual recognition," IEEE TPAMI, vol. 37, no. 9, pp. 1904–1916, 2015.
[32] E. Ahmed, M. Jones, and T. K. Marks, "An improved deep learning architecture for person re-identification," in IEEE CVPR, 2015, pp. 3908–3916.
[33] K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman, "Return of the devil in the details: Delving deep into convolutional nets," in BMVC, 2014.
[34] T. DeVries and G. W. Taylor, "Dataset augmentation in feature space," arXiv e-prints, 2017.
[35] L. Perez and J. Wang, "The effectiveness of data augmentation in image classification using deep learning," arXiv e-prints, 2017.
[36] S. C. Wong, A. Gatt, V. Stamatescu, and M. D. McDonnell, "Understanding data augmentation for classification: When to warp?" in International Conference on Digital Image Computing: Techniques and Applications, 2016, pp. 1–6.
[37] A. J. Ratner, H. R. Ehrenberg, Z. Hussain, J. Dunnmon, and C. Ré, "Learning to compose domain-specific transformations for data augmentation," in NIPS, 2017, pp. 1–11.
[38] H. Li, L. Jing, Y. Tang, Q. Liu, H. Ding, Z. Sun, and Y. Chen, "Assessment of pan-sharpening methods applied to WorldView-2 image fusion," in IEEE IGARSS, 2015, pp. 3302–3305.
[39] D. M. W. Powers, "Evaluation: From precision, recall and F-measure to ROC, informedness, markedness & correlation," Journal of Machine Learning Technologies, vol. 2, no. 1, pp. 37–63, 2011.
[40] D. P. Kingma and J. Ba, "Adam: A method for stochastic optimization," CoRR, vol. abs/1412.6980, 2014.
[41] G. King and L. Zeng, "Logistic regression in rare events data," Political Analysis, vol. 9, pp. 137–163, 2001.

Rodrigo Minetto is an assistant professor at the Federal University of Technology - Paraná (UTFPR), Brazil. He received the Ph.D. in computer science in 2012 from the University of Campinas (UNICAMP), Brazil, and Université Pierre et Marie Curie (UPMC), France. His research interests include image processing, computer vision and machine learning. Currently he is a visiting scholar at the University of South Florida (USF), USA.

Maurício Pamplona Segundo is a professor of the Department of Computer Science at the Federal University of Bahia (UFBA). He received his BSc, MSc and DSc in Computer Science from the Federal University of Paraná (UFPR). He is a researcher of the Intelligent Vision Research Lab at UFBA, and his areas of expertise are computer vision and pattern recognition. His research interests include biometrics, 3D reconstruction, accessibility tools, medical images and vision-based automation.

Sudeep Sarkar is a professor of Computer Science and Engineering and Associate Vice President for Research & Innovation at the University of South Florida in Tampa. He received his MS and PhD degrees in Electrical Engineering, on a University Presidential Fellowship, from The Ohio State University. He is the recipient of the National Science Foundation CAREER award in 1994, the USF Teaching Incentive Program Award for Undergraduate Teaching Excellence in 1997, the Outstanding Undergraduate Teaching Award in 1998, and the Theodore and Venette Askounes-Ashford Distinguished Scholar Award in 2004. He is a Fellow of the American Association for the Advancement of Science (AAAS), the Institute of Electrical and Electronics Engineers (IEEE), the American Institute for Medical and Biological Engineering (AIMBE), and the International Association for Pattern Recognition (IAPR), and a charter member and member of the Board of Directors of the National Academy of Inventors (NAI). He has 25 years of expertise in computer vision and pattern recognition algorithms and systems, holds three U.S. patents, and has published high-impact journal and conference papers.