Understanding data augmentation for classification: when to warp?

Sebastien C. Wong
Defence Science and Technology
Edinburgh, SA, Australia
Email: sebastien.wong@dsto.defence.gov.au

Adam Gatt
Australian Defence Force
Edinburgh, SA, Australia

Victor Stamatescu and Mark D. McDonnell
Computational Learning Systems Laboratory
Information Technology and Mathematical Sciences
University of South Australia
Mawson Lakes, SA, Australia

arXiv:1609.08764v2 [cs.CV] 26 Nov 2016
©2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. DOI: <DOI No. TBD>
[Fig. 1 image: two-stage pipeline, with data warping applied before Stage 1 and feature-space augmentation applied after it.]
Fig. 1. Experimental architecture. Stage 1 was kept the same for all experiments, while stage 2 consisted of either a backpropagation-trained neural network, support vector machine or extreme learning machine classifier.

[Fig. 2 image: original MNIST digits alongside warped versions.]
Fig. 2. Data warping using elastic deformations [5]. Original MNIST digits compared to warped digits for α = 1.2 (left) and α = 8 pixels (right).
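To make the data-space warping concrete, the following is a minimal sketch of elastic deformation in the style of [5], assuming a single MNIST digit stored as a 2-D NumPy array; the function name and the default values of alpha and sigma are illustrative, not the paper's exact settings.

import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_deform(image, alpha=8.0, sigma=4.0, rng=None):
    # Elastic deformation after Simard et al. [5]: a random displacement
    # field is smoothed with a Gaussian and scaled by alpha (in pixels).
    rng = np.random.default_rng() if rng is None else rng
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    y, x = np.meshgrid(np.arange(image.shape[0]),
                       np.arange(image.shape[1]), indexing="ij")
    coords = np.vstack([(y + dy).ravel(), (x + dx).ravel()])
    # Bilinear resampling of the image at the displaced coordinates.
    return map_coordinates(image, coords, order=1,
                           mode="reflect").reshape(image.shape)

Larger alpha produces stronger warping, consistent with the visual difference between the α = 1.2 and α = 8 pixel examples in Fig. 2.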
[Figs. 3 and 4 images: error % (y-axis, 0-2) versus training samples per class (x-axis, 500-5000).]
Fig. 3. Baseline performance of CNN, CSVM and CELM on MNIST using real data. The dashed lines indicate training error %, and the solid lines indicate test error %. Lower test error % indicates better performance. Reducing the difference between training and test error % is indicative of reducing overfitting.

Fig. 4. CNN performance on MNIST. The dashed lines indicate training error %, and the solid lines indicate test error %.
B. CNN Results

The performance of the CNN is shown in Figure 4, which illustrates how error % varies as the number of samples is increased. Again this shows that increasing the number of samples resulted in a performance improvement. Most notably, the test error % decreased steadily. The CNN results were consistent, with multiple runs of the same experiment producing similar error %.

Augmentation in data-space using elastic deformations gave a better improvement in error % than augmentation in feature-space. Interestingly, while test error % decreased, training error % also decreased, and thus the gap between test and training error was roughly maintained.
[Fig. 5 image: error % (y-axis, 0-3) versus training samples per class (x-axis, 500-5000); legend: REAL, ELASTIC, SMOTE, DBSMOTE.]
Fig. 5. CSVM performance on MNIST. The dashed lines indicate training error %, and the solid lines indicate test error %.

Augmentation in feature-space using SMOTE showed marginally promising results. Using SMOTE, the test error % continued to decrease as the number of samples was increased, all the way to 50,000 samples.

Augmentation in feature-space using DBSMOTE resulted in a slight improvement in test error %. Another interesting observation is that the training error % for SMOTE and DBSMOTE followed very similar curves.
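Feature-space augmentation with SMOTE creates each synthetic sample by interpolating between a real sample and one of its k nearest neighbours from the same class. A minimal sketch of this interpolation, assuming the feature vectors of one class are the rows of a NumPy array X (names and defaults are illustrative):

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote(X, n_synthetic, k=5, rng=None):
    # SMOTE-style interpolation: new point = x_i + gap * (x_j - x_i),
    # where x_j is one of the k nearest same-class neighbours of x_i.
    rng = np.random.default_rng() if rng is None else rng
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)  # +1: self included
    _, idx = nn.kneighbors(X)
    out = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X))
        j = idx[i, rng.integers(1, k + 1)]  # column 0 is the point itself
        gap = rng.uniform()                 # interpolation factor in [0, 1)
        out.append(X[i] + gap * (X[j] - X[i]))
    return np.asarray(out)

Because every synthetic point lies on a segment between two real same-class points, SMOTE only broadens the class manifold locally, which is consistent with the marginal gains reported above.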
C. Convolutional SVM Results

The performance of the CSVM is shown in Figure 5, which illustrates how error % varies as the number of samples is increased. Perhaps the most interesting result is that increasing the number of synthetic samples using DBSMOTE caused the performance to degrade, i.e., the error % increased as more samples were used. Augmentation in feature-space using the DBSMOTE algorithm was not effective in reducing overfitting in the SVM classifier. However, this result is not surprising, as DBSMOTE generates synthetic samples towards the centre of class clusters. By creating synthetic samples that are “more of the same”, DBSMOTE encourages overfitting of the training data. Augmentation in feature-space using the SMOTE algorithm provided little to no improvement in performance. Even augmentation in data-space using elastic warping only provided a modest improvement in test error %, and this only occurred at very large amounts of data augmentation. However, it should be noted that the gap between training error % and test error % did decrease steadily as the amount of augmentation was increased, thus indicating a reduction in overfitting.
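The tendency of DBSMOTE to concentrate samples in dense regions can be illustrated with a simplified sketch: the full algorithm interpolates along a density-reachable shortest path within a DBSCAN cluster, whereas this sketch interpolates directly toward the cluster centroid (names and defaults are illustrative):

import numpy as np
from sklearn.cluster import DBSCAN

def dbsmote_simplified(X, n_synthetic, eps=0.5, min_samples=5, rng=None):
    # Cluster one class's feature vectors with DBSCAN, then pull each
    # sampled point a random fraction of the way toward its cluster
    # centre, so synthetic points concentrate in dense regions.
    rng = np.random.default_rng() if rng is None else rng
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    keep = labels != -1                     # drop DBSCAN noise points
    Xc, lc = X[keep], labels[keep]
    centroids = {c: Xc[lc == c].mean(axis=0) for c in np.unique(lc)}
    out = []
    for _ in range(n_synthetic):
        i = rng.integers(len(Xc))
        out.append(Xc[i] + rng.uniform() * (centroids[lc[i]] - Xc[i]))
    return np.asarray(out)

Pulling samples toward cluster centres yields points that are “more of the same”, the mechanism behind the overfitting observed for the CSVM.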
D. Convolutional ELM Results

The performance of the CELM is shown in Figure 6. This figure illustrates how the training error and test error vary as the number of samples is increased, which again shows that performance typically improves as the number of synthetic samples is increased. However, this improvement is less than that given by increasing the number of real samples.

[Fig. 6 image: error % (y-axis, 0-3) versus training samples per class; legend: REAL, ELASTIC, SMOTE, DBSMOTE.]

… trend is to construct architectures with deep layered hierarchies of feature transformations, with more complex features built upon simpler features. Nevertheless, these results should also provide insight into performing data augmentation for more modern architectures. A good overview of label preserving transformations for image data for a more modern classification architecture is given by [17].
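Label preserving transformations for natural-image classification commonly include padded random crops and horizontal flips. A minimal sketch (names and defaults are illustrative; note that a horizontal flip is not label preserving for digits such as MNIST):

import numpy as np

def random_crop_and_flip(image, pad=4, rng=None):
    # Pad-and-random-crop plus random horizontal flip, a standard
    # label-preserving recipe for natural-image classification.
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]
    spatial = ((pad, pad), (pad, pad))
    padded = np.pad(image, spatial + ((0, 0),) * (image.ndim - 2),
                    mode="reflect")
    top = rng.integers(0, 2 * pad + 1)
    left = rng.integers(0, 2 * pad + 1)
    out = padded[top:top + h, left:left + w]
    if rng.uniform() < 0.5:
        out = out[:, ::-1]                  # horizontal flip
    return out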