
A Framework Pipeline to Address Imbalanced Class Distribution Problem in Real-world Datasets


Uma Chinta, Adham Atyabi
Computer Science and Engineering
University of Colorado at Colorado Springs
Email: uchinta@uccs.edu, aatyabi@uccs.edu

2023 IEEE 13th Annual Computing and Communication Workshop and Conference (CCWC) | 979-8-3503-3286-5/23/$31.00 ©2023 IEEE | DOI: 10.1109/CCWC57344.2023.10099163

Abstract—Imbalanced distribution of the target class in real-world datasets is a common problem in machine learning that leads to biased results favoring the majority class. This paper proposes a classification framework pipeline that includes preprocessing and feature extraction with a convolutional neural network (CNN), accompanied by data augmentation using hybrid sampling methods and classification, aiming to address the imbalanced distribution of the target class in a study of emotional facial expression detection. The study utilizes two datasets to evaluate the proposed framework: a) the Facial Expression Recognition (FER2013) dataset containing 30k facial RGB images and b) the Autism Brain Imaging Data Exchange (ABIDE) dataset containing 1.1k neuroimaging samples (resting-state functional magnetic resonance imaging, fMRI, data) of individuals with and without autism spectrum disorder (ASD). The proposed model extracted features using a pre-trained ResNet 50 model and augmented data using a hybrid sampling technique with under-sampling and over-sampling methods. Four classifiers are considered in the final step, including Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Extreme Gradient Boosting (XGB). The highest average classification performance achieved by the proposed framework on the FER2013 dataset is obtained with RF (97.09%), outperforming this dataset's state-of-the-art (96%). XGB achieved 96.83% average classification accuracy on the ABIDE dataset in the ASD vs. non-ASD classification task, outperforming the state-of-the-art on this dataset (92%). The results indicate the feasibility of the proposed framework in addressing imbalanced data classification and improving overall classification results.

Index Terms—facial expression, Autism Spectrum Disorder, convolutional neural network, imbalance classification

I. INTRODUCTION

Dealing with an imbalanced dataset is one of the main complexities of using classification models in real-world applications. In imbalanced datasets, the classes have unequal representation, with one or more classes represented by substantially more samples than others. Conventional approaches for training classification models with these types of datasets result in biased training because the model will be less capable of identifying underrepresented classes [1]. In such scenarios, the overall accuracy of the classification model is likely to be high simply because the accuracy of predicting the majority class is high [2]. Oversampling the minority class, under-sampling the majority class, and using hybrid methods are common approaches to address imbalanced sample classification. Aiming to address this problem, studies have focused on using different augmentation techniques and creating new loss functions that push the model to predict accurately for both majority and minority classes [3].

This study focuses on creating a framework combining different preprocessing and augmentation structures to address the problem of imbalanced classification. The proposed framework utilizes a Convolutional Neural Network (CNN) to extract deep features and includes a hybrid sampling method for data augmentation and a classification method for class prediction. Four well-known classification algorithms, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN), and Extreme Gradient Boosting (XGB), are considered. To assess the feasibility of the proposed framework and its standing with respect to state-of-the-art methods, two imbalanced datasets are used: facial expression (FER2013 [4]) and Autism Brain Imaging Data Exchange (ABIDE [5]). The FER2013 dataset contains seven classes representing emotional faces, in which the class sample size ranges from 0.015% to 25.7%. The ABIDE dataset includes 1112 samples from 539 individuals with ASD and 573 typical controls, aged between 7 and 64.

The paper's outline is as follows: Section II presents related work. Section III introduces the two datasets used in this study, the ResNet-based CNN layout architecture used for deep feature extraction, the proposed over-under-sampling data augmentation mechanism, and the classification methods used. Section IV presents the proposed framework for addressing the imbalanced data classification problem. Section V reports the results achieved with the proposed framework on the two datasets, and Sections VI to VIII present the discussion, study limitations, and conclusion.

II. RELATED WORK

Imbalanced classification is a prevalent problem in real-world applications [6], [7]. An example of such a problem can be seen in medical data. Rare diseases are less likely to emerge in the wild, resulting in smaller sample sizes than for more common diseases and healthy controls. Classifying these samples is more challenging than classifying common diseases because of the small number of available samples.
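To make this difficulty concrete, the following minimal sketch scores a trivial classifier that always predicts the majority class. The label counts (950 controls, 50 rare-disease cases) are hypothetical values chosen for illustration only and are not drawn from any dataset used in this study.

```python
# Hypothetical label counts for illustration only:
# 950 healthy controls and 50 rare-disease cases.
y_true = ["control"] * 950 + ["rare"] * 50

# A trivial "classifier" that always predicts the majority class.
y_pred = ["control"] * len(y_true)


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred, label):
    """Fraction of the samples of class `label` that were predicted as `label`."""
    relevant = [p for t, p in zip(y_true, y_pred) if t == label]
    return sum(p == label for p in relevant) / len(relevant)


majority_accuracy = accuracy(y_true, y_pred)
minority_recall = recall(y_true, y_pred, "rare")
print(f"accuracy = {majority_accuracy:.2f}, minority recall = {minority_recall:.2f}")
# prints: accuracy = 0.95, minority recall = 0.00
```

The classifier reaches 95% accuracy while detecting none of the minority cases, which is why per-class measures such as recall are worth reporting alongside accuracy on imbalanced data.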
Authorized licensed use limited to: University of Grenoble Alpes. Downloaded on August 16,2023 at 14:44:58 UTC from IEEE Xplore. Restrictions apply.
Peng et al. [8] proposed a method to calculate the prediction loss so as to improve performance in favor of the minority class. The authors presented new regularization mechanisms: DropOut, DropBlock, and Modified RandAugment. Assessment was done using the HAM dataset [9] and the ISIC 2018 challenge test set [10]. The HAM dataset is a well-known skin image dataset publicly available through the ISIC archive, containing 10,015 skin lesion images in seven skin lesion classes. More than 80% of all instances in this dataset belong to only two classes, while the two minority classes have only 2% of all instances. Using their proposed method, Peng et al. [8] achieved 86% classification accuracy.

Sungho et al. [6] introduced a method based on a Generative Adversarial Network (GAN) [11], which takes a random noise input and generates a new dataset, and concluded that their proposed method would improve imbalanced classification performance by 2.73%, 3.13%, 5.19%, 6.78%, and 3.31% on the MNIST, EMNIST, Fashion-MNIST, CIFAR-10, and CINIC-10 datasets, respectively. Al-Badarneh et al. [12] proposed a combination of heuristic optimization techniques (grey wolf optimization, particle swarm optimization, and the salp swarm algorithm) with a multi-layer perceptron to increase the accuracy of imbalanced classification [13]. The authors concluded that the problem of imbalanced sample classification could be addressed with their method. At the same time, these metaheuristic algorithms showed no clear superiority over each other in classifying imbalanced samples.

Mollahosseini et al. [7] reported the prediction of facial expressions using a dataset with imbalanced class samples. The authors utilized a deep neural network structure incorporating an inception layer and reported an overall facial recognition accuracy of 81.7%. Georgescu et al. [14] proposed a feature extraction network based on the well-known VGG network. Georgescu et al. utilized three variants of the VGG network, namely VGG-13, VGG-f, and VGG-face, as their deep feature extraction models [15] and used a Support Vector Machine (SVM) [16] for classification. The authors reported 87.76% accuracy for the imbalanced classification of the FER+ dataset. The study findings highlighted the importance of preprocessing in developing and tuning the network model.

Li et al. [17] proposed a new CNN model incorporating an attention mechanism in its architecture. Their work focused on extracting useful information from the lips and eye regions to recognize facial expressions. Using the FER dataset, the study demonstrated that information from the lips region alone can predict facial expressions with 75.82% accuracy, and highlighted the role of the attention layer in detecting facial expressions faster and more reliably even when only a partial face is visible in the image.

Yang et al. [18] proposed a hybrid algorithm that combines the synthetic minority over-sampling technique (SMOTE) with edited nearest neighbor (ENN). They resampled diabetes and missed abortion datasets to balance the classes, which resulted in 95.6% accuracy and compared favorably with the other sampling techniques tested. Liu et al. [19] proposed a fast network intrusion detection system using adaptive synthetic oversampling (ADASYN), increasing the number of minority samples to address the low detection rate caused by imbalanced data, which improved the overall accuracy to 92.57%, 89.56%, and 99.91% on three test sets, respectively.

Classification of autism spectrum disorder (ASD) diagnosis is another problem that requires specific classification algorithms due to the imbalanced nature of autism study datasets. Reliance on tedious and time-consuming psychological measures such as the Autism Diagnostic Observation Schedule (ADOS), which require a high level of skill to administer correctly, hinders the chances of early detection of ASD [20]–[28]. Among various approaches used for the early detection of autism, Machine Learning (ML) methods have demonstrated promising performance [20], [21], [24], [25]. Jagadeesan et al. [29] used electroencephalogram (EEG) signals to detect ASD. The study utilized SVM and decision tree [30] models for classification and achieved 93.08% precision and 87.44% recall using 500 samples.

Based on previous studies [31]–[34], it is clear that classifying samples in an imbalanced dataset is challenging, and that balancing the dataset for accurate results takes additional time and may lead to the loss of some samples through pseudo-balancing. Studies have focused on using different data augmentation, pseudo-sampling, and preprocessing structures to deal with imbalanced classification problems. This article proposes a novel data augmentation and preprocessing approach with hybrid sampling methods. First, a Convolutional Neural Network (CNN) is used to extract deep features, and then a hybrid data sampling approach is used to augment the dataset. Random Forest, Support Vector Machine, K-Nearest Neighbor, and Extreme Gradient Boosting are used for sample classification. The FER2013 and ABIDE datasets are used to evaluate the proposed framework.

III. MATERIALS AND METHODS

A. Data

1) Facial Expression Recognition (FER2013)

The facial expression dataset used in this study contains 28,000 training, 3,500 validation, and 3,500 testing instances, reflecting facial images of a total of 7 emotions: “sad”, “happy”, “angry”, “disgust”, “neutral”, “surprised”, and “fear” [4]. Approximately 25.7% of samples are associated with the “happy” class, and only 0.015% of samples in the dataset are associated with the “disgust” class. The facial images in the dataset have a size of 48 × 48 pixels. Figure 1 depicts four samples from the FER2013 dataset, each illustrating a different emotion. Due to the low resolution of the images (only 48 × 48 pixels), the image quality is compromised, which can impact the classification accuracy and the overall ability to identify deep features to some extent.

2) Autism Brain Imaging Data Exchange (ABIDE)

Autism Spectrum Disorder is a neurodevelopmental disorder associated with impaired social behavior and repetitive

behaviors. Estimates from the Centers for Disease Control and Prevention (CDC) indicate that 1 in 44 8-year-old children is expected to have an autism diagnosis [36]. Autism is associated with various phenotypes in social, communicative, and sensorimotor deficits [37]. Diagnosis of autism is a complex task that requires assessing characteristic social behaviors and language skills [38]. In this work, the ABIDE dataset [5] is used as a secondary dataset for assessing the performance of the proposed framework in dealing with imbalanced data. ABIDE contains neuroimaging data of 539 ASD patients and 573 typically developing (TD) controls, mostly representing the resting-state fMRI of each participant.

Fig. 1. Different types of class in the dataset; (a) happy, (b) sad, (c) angry, (d) disgust [35]

ABIDE contains connectivity matrices in which each cell holds a Pearson correlation coefficient [39]. These coefficients have been extracted from a correlation between the anterior and posterior areas of the brain, and their values range from -1 to 1. A value close to 1 indicates that the time series are highly correlated, while a value close to -1 indicates that they are anti-correlated. This dataset contains information from 17 different sources gathered from patients aged between 7 and 64 years, with a median of 14.7 years across groups.

B. Feature extraction

The convolutional neural network (CNN) is a popular deep learning (DL) model that shows superior performance in image and video recognition tasks. A CNN uses shared weights to extract deep features from the input image and can learn spatial hierarchies of features for the input image.

Different layout architectures have been introduced for CNNs, aiming to improve their capabilities. One well-known example of such a layout modification is ResNet [40], where a residual layer is used to accommodate a proper flow of information in the forward and backward paths. ResNet is one of the most successful and widely applicable models for feature selection and classification of images [41]. To identify the best base structure for our model, ResNet, VGG, and Inception structures were compared in limited ad-hoc evaluations, in which ResNet was found to be more stable and was therefore selected as the base model for feature extraction. ResNet can address overfitting, a common problem when dealing with an imbalanced dataset. Overfitting impacts the model's generalizability through very high performance on the training set and considerably lower performance on the unseen validation set.

The difference between an overfitting problem and classifying imbalanced samples is important: in imbalanced sample classification, some classes have more samples than others, while in an overfitting problem, samples are mostly from one class, which results in the predictive model mistakenly assigning all samples to the same class.

In this study, ResNet 50, which consists of 50 layers, is used as the base network for deep feature extraction. The features extracted from the first 10, 20, and 40 layers of the model are evaluated for feature extraction. The proposed structure for feature extraction is illustrated in Figure 2.

Fig. 2. The architecture of ResNet with three paths for feature extraction

As shown in Figure 2, using features from the 10th and 40th layers results in extracting 128 and 512 channels, respectively, while the width and height of the image extracted from each layer are different. Increasing the number of layers in the CNN layout architecture reduces the width and height of the output image in subsequent layers. An example of extracted features on one picture is presented in Figure 3.

Fig. 3. Extracted features from each layer of ResNet 50

Considering the features presented in Figure 3, it is noticeable that a) in the features extracted from the first ten layers, the eye and mouth regions have more distinguishable features, reflected by the lighter pixels in the lower right corner image in Figure 3, b) through rigorous testing and evaluation on the validation set, it is found that using features from the 40th layer results in better overall performance, and c) features extracted from the 40th layer have a smaller spatial size, i.e., a more compact and descriptive representation of face features, which helps to decrease the run time.

In addition to the assessment above, a shallow model with only two convolution layers and one average pooling layer [42] is also evaluated, which did not give a better result

for the ABIDE dataset in ASD diagnosis classification. Using only two convolution layers did not help preserve the input structure. In this CNN, the number of filters in the first and second convolution layers is set to 3 and 8, and the window size for extracting features is set to 11 and 9, respectively.

C. Data augmentation

The main issue in an imbalanced dataset is the uneven class sample size. One of the common mechanisms for addressing this problem is to use pseudo-sampling methods. Several sampling methods have been introduced by the deep learning community, including under-sampling [43], over-sampling [44], the Synthetic Minority Over-sampling Technique (SMOTE) [45], the Modified Synthetic Minority Over-sampling Technique (MSMOTE) [46], and so on. These techniques either eliminate samples from or generate samples for the dataset.

Through a limited ad-hoc evaluation, various combinations of undersampling methods (random undersampling, Tomek links, edited nearest neighbors [47]) and oversampling methods (random oversampling, SMOTE, MSMOTE [44]) were assessed, with the best results achieved using random undersampling and SMOTE. Thus, in this study, the results of under-sampling are combined with an over-sampling method, i.e., MSMOTE, to augment the dataset. Through this mechanism, the under-sampling method decreases the number of instances of the majority class through random selection. This process continues until the number of samples in each class becomes even. In addition, the over-sampling method increases the number of instances in the minority class to create an even distribution across classes. Modified SMOTE (MSMOTE), used in this approach, is an improved SMOTE algorithm that divides samples of the minority class into three groups of i) safe, ii) border, and iii) latent noise instances by calculating distances among all samples.

Random oversampling methods create random samples in minority classes to balance the dataset and improve the prediction accuracy on the minority class. This method is an additive extension to Forward Stage-wise Regression, as it generates additional candidate features from under-sampled minority classes. This helps to improve the predictive model's accuracy and provides insights about under-sampled classes.

Each sampling method is applied to the original data, and the results are then concatenated to form a new dataset. The combination of these sampling methods increases the number of samples in minority classes, which helps reduce the bias toward the majority class and creates a semi-balanced dataset. Without augmentation techniques, ML and DL model predictions have shown results biased toward the majority classes, and the overall accuracy did not exceed 80% [48]–[50].

D. Classification

In this study, a set of well-known classification methods, including Random Forest (RF) [51], the Extreme Gradient Boosting classifier (XGB) [52], Support Vector Machine (SVM), and K-Nearest Neighbors (KNN) [53], are used to attain the final sample class prediction. The final class prediction is generated by giving priority to the classifier (e.g., XGB, SVM, KNN, RF) that has the higher potential for correctly predicting the class label. The results indicated that RF and XGB are the most commonly used classifiers for the FER2013 and ABIDE datasets [54]. A schema of the RF classifier is presented in Figure 4. The core unit of random forest classifiers is the decision tree, which is a hierarchical structure built using features of the data. Each decision tree is trained using the Classification and Regression Tree (CART) algorithm and evaluated using metrics such as Gini impurity, information gain, or mean square error (MSE). The output of multiple decision trees is combined to reach a single result. Random forest is well suited to imbalanced data because class weights can be incorporated into the classifier, making it cost-sensitive. It also combines sampling with ensemble learning, which leads to down-sampling the majority class and growing trees on a more balanced dataset.

Fig. 4. Structure of RF classifier algorithms.

The structure of XGB is shown in Figure 5. The core unit of XGBoost is the decision tree, a hierarchical structure built using data features. Each decision tree is trained using the Gradient Boosting algorithm and evaluated using a convex loss function, such as mean square error (MSE). The output of the multiple decision trees is then combined to reach a single result.

Fig. 5. Structure of XGB classifier algorithms.

IV. PROPOSED FRAMEWORK

The proposed model consists of feature extraction, preprocessing, and classification stages. The main steps involved in this approach are as follows:

• In the first step, using a CNN, deep features are extracted from FER2013 and ABIDE samples using the 40th layer of ResNet 50. The model is pre-trained on the ImageNet dataset [55], and we used the pre-trained weights without modification.
• In the second step, augmentation addresses the imbalanced sample problem in the datasets while increasing the number of samples. In this step, samples are presented as images representing the deep features extracted in the first step (output of the 40th layer of ResNet 50). The augmentation phase entails a combination of over-sampling, under-sampling, and MSMOTE.
• In the third step, predictive models, e.g., SVM, KNN, XGB, and RF, are used to predict the class labels of testing set samples.

Grid search with the training and validation sets is used to identify the best hyperparameters for the proposed model [56]. The schema of the proposed framework pipeline is illustrated in Figure 6.

Fig. 6. Structure of the proposed framework.

V. EXPERIMENTAL RESULTS

To facilitate better learning, the FER2013 and ABIDE datasets are normalized. In FER2013 (emotional facial images), RGB channel normalization is performed using each RGB channel's mean and standard deviation. In the FER2013 dataset, the average values of the green, red, and blue channels are 137.1, 127.2, and 123.9, respectively, and the standard deviations of the channels are 49.5, 52.4, and 51.2, respectively. In the ABIDE dataset, a similar normalization process is performed, with the understanding that this process is applied to signals rather than images, given that the ABIDE dataset contains fMRI samples from ASD and TD participants. Using the hybrid sampling introduced earlier, data augmentation is performed to address the imbalanced distribution of class samples. To better understand how the proposed data augmentation method impacts the sample distribution, the result of using the proposed hybrid sampling method on the FER2013 dataset is presented in Figure 7. The proposed augmentation process reduces the difference between the most (Happy) and the least (Disgust) populated classes from 23% to 7%, making the classes more closely matched in terms of the number of samples available for training. Using the proposed hybrid sampling methods, the number of samples of the minority class is increased while the relation between minority and majority classes is preserved, as shown in Figure 7. The difference between the most and least populated classes in the ABIDE dataset decreases from 5% to 1.2% after the data augmentation.

Fig. 7. Distribution of dataset instances; (a) Imbalance, (b) Balance

Five-fold cross-validation is performed to assess classification performance with SVM, RF, XGB, and KNN. Accuracy, precision, recall, confusion matrix, and AUROC are used to assess the proposed architecture's performance with the FER2013 and ABIDE datasets. The average classification performances achieved using the FER2013 and ABIDE datasets are presented in Tables I and II, respectively.

TABLE I
AVERAGE CLASSIFICATION PERFORMANCE WITH FER2013 DATASET.

Model   Accuracy   Precision   Recall    AUROC
RF      97.09%     97.10%      97.05%    0.99
XGB     95.43%     95.45%      94.78%    0.97
KNN     73.70%     73.84%      67.2%     0.71
SVM     84.80%     84.81%      80.21%    0.83

The results indicate that using deep features extracted from the 40th layer of ResNet50 together with the RF classifier achieved the highest classification accuracy of 97.09% on the FER2013 dataset. On the ABIDE dataset, the XGB classifier achieved the highest classification accuracy of 96.83%. To better characterize the performance, the additional measures of precision, recall, and AUROC are included in the tables.

TABLE II
AVERAGE CLASSIFICATION PERFORMANCE WITH ABIDE DATASET.

Model   Accuracy   Precision   Recall    AUROC
RF      95.76%     95.78%      95.65%    0.99
XGB     96.83%     97.02%      96.90%    0.99
KNN     77.45%     77.49%      61.96%    0.81
SVM     94.35%     94.48%      94.12%    0.99
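The hybrid sampling step applied before classification can be sketched roughly as follows. This is a simplified stand-in under stated assumptions, not the implementation used in this study: it pairs random under-sampling with plain SMOTE-style interpolation and omits the safe/border/noise grouping that distinguishes MSMOTE. The function names, the `target` class size, the neighbour count `k`, and the toy two-dimensional data are all hypothetical.

```python
import random
from collections import Counter


def smote_like(samples, n_new, k, rng):
    """Create n_new synthetic points by interpolating between a sample
    and one of its k nearest neighbours (plain SMOTE-style, not MSMOTE)."""
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(samples)
        others = [s for s in samples if s is not base]
        if not others:  # a single-sample class can only be duplicated
            synthetic.append(base)
            continue
        # Rank the remaining samples by squared Euclidean distance to base.
        others.sort(key=lambda s: sum((a - b) ** 2 for a, b in zip(base, s)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


def hybrid_resample(X, y, target, k=5, seed=0):
    """Bring every class to `target` samples: classes above the target are
    randomly under-sampled, classes below it are over-sampled."""
    rng = random.Random(seed)
    by_class = {}
    for features, label in zip(X, y):
        by_class.setdefault(label, []).append(tuple(features))
    X_out, y_out = [], []
    for label, samples in by_class.items():
        if len(samples) > target:
            kept = rng.sample(samples, target)  # random under-sampling
        else:
            kept = samples + smote_like(samples, target - len(samples), k, rng)
        X_out.extend(kept)
        y_out.extend([label] * len(kept))
    return X_out, y_out


# Toy data: 90 majority and 10 minority points in two dimensions.
data_rng = random.Random(42)
X = [(data_rng.gauss(0, 1), data_rng.gauss(0, 1)) for _ in range(90)]
X += [(data_rng.gauss(3, 1), data_rng.gauss(3, 1)) for _ in range(10)]
y = ["majority"] * 90 + ["minority"] * 10

X_bal, y_bal = hybrid_resample(X, y, target=50)
print(Counter(y), "->", Counter(y_bal))
```

With a single shared target, both toy classes end up with 50 samples. A semi-balanced result such as the one reported in Figure 7 would instead correspond to bringing the classes closer together without fully equalizing them, for example by using per-class targets.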

The results indicated that in the FER2013 dataset, the performance for all classes is in the range of 96%, except for the “Disgust” class, where 99% prediction accuracy is achieved.

The best hyperparameters for the RF model are max-depth=200, max-features=500, min-samples-leaf=3, min-samples-split=12, and n-estimators=90. For the XGB classifier, the results of hyperparameter tuning are n-estimators=40, max-depth=40, and learning-rate=0.8. Python 3.7, TensorFlow 2.7, and scikit-learn libraries are used.

VI. DISCUSSION

Facial expression recognition and prediction of autism diagnosis are challenging tasks. This study proposed a hybrid model for facial expression and autism diagnosis classification. The results achieved with the proposed approach indicate a 3% improvement in the classification of FER2013 compared to the state-of-the-art [57]. Results from a collection of recent articles that have reported classification performance on FER2013 are presented in Table III. The results indicate a clear improvement with our approach (97.09%), with approximately a 1% performance improvement over Chowdary et al. [58] and a 3% performance difference from the second-best performing method, introduced by Ngo et al. [57].

While not all articles listed in Table III intended to address the imbalanced nature of this dataset as part of their solution, the comparison is still considered fair given that the same dataset (FER2013) is used for this assessment. It is also worth noting that the results reported for our proposed approach are not the best performance achieved but the average over the cross-validation folds, which further supports the achieved performance.

TABLE III
FACIAL EXPRESSION CLASSIFICATION WITH DIFFERENT MODELS.

Author                 Model                               Accuracy   AUROC   Recall
Kim et al. [30]        Multi-task cascade neural network   75%        -       66.8%
Chen [59]              3D ResNet-RGB                       80.6%      -       -
Zahara et al. [60]     GAN                                 88.26%     -       -
Ngo et al. [57]        CNN + Auxiliary path                93.94%     0.98    -
Li et al. [61]         Deep residual network ResNet        95.39%     -       -
Chowdary et al. [58]   ResNet                              96%        -       -
Proposed work          ResNet+sampling+RF                  97.09%     0.99    97.05%

TABLE IV
RESULTS OF ASD CLASSIFICATION WITH DIFFERENT MODELS.

Author                            Model                         Accuracy   AUROC   Recall
Gwyn, Roy, and Atay et al. [62]   ResNet50, ResNet101           52%        -       84%
Ma et al. [63]                    Alexnet, ZF-5net, GoogLeNet   69.44%     -       92.74%
Thabtah et al. [64]               Auto encoder                  82%        -       92%
Ding et al. [65]                  ResNet-face18                 82.08%     -       93.26%
Song et al. [66]                  FaceNet + XGBoost             88%        -       -
Katuwal et al. [67]               XGB                           92%        0.92    -
Proposed work                     ResNet+sampling+XGB           96.83%     0.99    96.90%

Table IV presents a collection of recent studies reporting classification performance on the ABIDE dataset. Given that the ABIDE dataset contains samples from a wide age range of participants, several studies investigated ASD vs. non-ASD classification in specific age groups and within the ASD group but focused on well-known phenotypical impacts such as age (ASD: children vs. adults). Eslami et al. [68] reported 82% classification accuracy in stratifying children with ASD from adults in this dataset. Suman and Sarfaraz [69] reported 80% and 96% classification accuracy in stratifying children and adults with ASD using KNN and CNN, respectively.

It is noteworthy that Suman and Sarfaraz [69] also reported 98.5% classification accuracy in stratifying children with ASD from non-ASD children using a subset of samples that removed phenotypical variation. While this performance is higher than the average classification performance achieved by our proposed method on this dataset (ResNet50+Augmentation+XGB, 96.83%), the authors only utilized samples from children in their evaluation and eliminated all samples from adults. Given that it is well understood in the autism research community that measurements of individuals with autism are influenced by phenotype variation, we consider the comparison between the results of our proposed approach, which utilized samples impacted by phenotype effects, and the results of Suman and Sarfaraz [69] on the sub-sample of ABIDE without phenotype variations unfair, and we report their results only for completeness.

Within studies that utilized all samples in the ABIDE dataset (see Table IV), our proposed method achieved the highest average classification accuracy in stratifying ASD from non-ASD, achieving 96.83%, followed by Katuwal et al. [67] with 92%.

VII. STUDY LIMITATIONS

• The ABIDE dataset, used for assessment of the proposed framework, has a low number of samples, and similar datasets with larger sample sizes might be better positioned to assess the feasibility of the proposed method.
• This study is limited in the number of datasets considered for the evaluation. A higher variety of real-world datasets from different problem spaces, with a larger number of classes and more severely imbalanced sample populations, can better illustrate the limitations, potential, and generalizability of the proposed framework.
• A more extensive ablation study can further assist in understanding the contribution of the different components of the proposed framework.

VIII. CONCLUSION

This study utilized a hybrid data augmentation approach, including under-sampling, over-sampling, and Modified Synthetic Minority Over-sampling Technique (MSMOTE) methods, to address imbalanced data in real-world classification problems. The proposed deep-learning architecture included ResNet50's 40th layer output, hybrid data augmentation combining over- and under-sampling, and a final classification layer (e.g., RF, XGB, KNN, SVM). The FER2013 dataset, containing images of emotional faces (7 classes), and the ABIDE dataset, containing fMRI data of individuals with and without Autism Spectrum Disorder (ASD), are used to evaluate the proposed architecture. The result indicated

the superiority of the proposed approach in both datasets, achieving average classification performances of 97.09% and 96.83% with the FER2013 and ABIDE datasets, respectively, outperforming the state of the art in both datasets (96% and 92%, respectively). The proposed hybrid augmentation approach reduced the sample-size difference between the least and most populated classes from 23% to 7% in the FER2013 dataset and from 5% to 1.2% in the ABIDE dataset, indicating its success in minimizing the sample imbalance problem in these datasets and its positive contribution to classification performance by providing more sample representation for under-sampled classes.

REFERENCES

[1] C. Huang, Y. Li, C. C. Loy, and X. Tang, “Learning deep representation for imbalanced classification,” in Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 5375–5384, 2016.
[2] Q. Zou, S. Xie, Z. Lin, M. Wu, and Y. Ju, “Finding the best classification threshold in imbalanced classification,” Big Data Research, vol. 5, pp. 2–8, 2016.
[3] Y. Feng, M. Zhou, and X. Tong, “Imbalanced classification: A paradigm-based review,” Statistical Analysis and Data Mining: The ASA Data Science Journal, vol. 14, no. 5, pp. 383–406, 2021.
[4] P. Giannopoulos, I. Perikos, and I. Hatzilygeroudis, “Deep learning approaches for facial emotion recognition: A case study on fer-2013,” in Advances in hybridization of intelligent methods, pp. 1–16, Springer, 2018.
[5] C. Craddock, Y. Benhajali, C. Chu, F. Chouinard, A. Evans, A. Jakab, B. S. Khundrakpam, J. D. Lewis, Q. Li, M. Milham, et al., “The neuro bureau preprocessing initiative: open sharing of preprocessed neuroimaging data and derivatives,” Frontiers in Neuroinformatics, vol. 7, 2013.
[6] S. Suh, H. Lee, P. Lukowicz, and Y. O. Lee, “Cegan: Classification enhancement generative adversarial networks for unraveling data imbalance problems,” Neural Networks, vol. 133, pp. 69–86, 2021.
[7] A. Mollahosseini, D. Chan, and M. H. Mahoor, “Going deeper in facial expression recognition using deep neural networks,” in 2016 IEEE Winter conference on applications of computer vision (WACV), pp. 1–10, IEEE, 2016.
[8] P. Yao, S. Shen, M. Xu, P. Liu, F. Zhang, J. Xing, P. Shao, B. Kaffenberger, and R. X. Xu, “Single model deep learning on imbalanced small datasets for skin lesion classification,” arXiv preprint arXiv:2102.01284, 2021.
[9] P. Tschandl, C. Rosendahl, and H. Kittler, “The ham10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions,” Scientific data, vol. 5, no. 1, pp. 1–9, 2018.
[10] N. Codella, V. Rotemberg, P. Tschandl, M. E. Celebi, S. Dusza, D. Gutman, B. Helba, A. Kalloo, K. Liopyris, M. Marchetti, et al., “Skin lesion analysis toward melanoma detection 2018: A challenge hosted by the international skin imaging collaboration (isic),” arXiv preprint arXiv:1902.03368, 2019.
[11] J. Gui, Z. Sun, Y. Wen, D. Tao, and J. Ye, “A review on generative adversarial networks: Algorithms, theory, and applications,” IEEE Transactions on Knowledge and Data Engineering, 2021.
[12] I. Al-Badarneh, M. Habib, I. Aljarah, and H. Faris, “Neuro-evolutionary models for imbalanced classification problems,” Journal of King Saud University-Computer and Information Sciences, 2020.
[13] C. Iwendi, P. K. R. Maddikunta, T. R. Gadekallu, K. Lakshmanna, A. K. Bashir, and M. J. Piran, “A metaheuristic optimization approach for energy efficiency in the iot networks,” Software: Practice and Experience, vol. 51, no. 12, pp. 2558–2571, 2021.
[14] M.-I. Georgescu, R. T. Ionescu, and M. Popescu, “Local learning with deep and handcrafted features for facial expression recognition,” IEEE Access, vol. 7, pp. 64827–64836, 2019.
[15] W. Wang, C. Zhang, J. Tian, X. Wang, J. Ou, J. Zhang, and J. Li, “High-resolution radar target recognition via inception-based vgg (ivgg) networks,” Computational Intelligence and Neuroscience, vol. 2020, 2020.
[16] G. N. Kouziokas, “Svm kernel based on particle swarm optimized vector and bayesian optimized svm in atmospheric particulate matter forecasting,” Applied Soft Computing, vol. 93, p. 106410, 2020.
[17] J. Li, K. Jin, D. Zhou, N. Kubota, and Z. Ju, “Attention mechanism-based cnn for facial expression recognition,” Neurocomputing, vol. 411, pp. 340–350, 2020.
[18] F. Yang, K. Wang, L. Sun, M. Zhai, J. Song, and H. Wang, “A hybrid sampling algorithm combining synthetic minority over-sampling technique and edited nearest neighbor for missed abortion diagnosis,” BMC Medical Informatics and Decision Making, vol. 22, no. 1, pp. 1–14, 2022.
[19] J. Liu, Y. Gao, and F. Hu, “A fast network intrusion detection system using adaptive synthetic oversampling and lightgbm,” Computers & Security, vol. 106, p. 102289, 2021.
[20] A. Atyabi, F. Shic, J. Jiang, C. E. Foster, E. Barney, M. Kim, B. Li, P. Ventola, and C. H. Chen, “Stratification of children with autism spectrum disorder through fusion of temporal information in eye-gaze scan-paths,” ACM Trans. Knowl. Discov. Data, jun 2022. Just Accepted.
[21] B. Li, N. Nuechterlein, E. Barney, C. Foster, M. Kim, M. Mahony, A. Atyabi, L. Feng, Q. Wang, P. Ventola, et al., “Learning oculomotor behaviors from scanpath,” in Proceedings of the 2021 International Conference on Multimodal Interaction, pp. 407–415, 2021.
[22] F. Shic, K. J. Dommer, A. Atyabi, M. Mademtzi, R. A. Øien, J. Kientz, and J. Bradshaw, “Advancing technology to meet the needs of infants and toddlers at risk for autism spectrum disorder, in autism spectrum disorder in the first years of life: Research, assessment, and treatment,” A Guilford Press, p. 300–351, 2020.
[23] S. J. Webb, A. J. Naples, A. R. Levin, G. Hellemann, H. Borland, J. Benton, C. Carlos, T. McAllister, M. Santhosh, H. Seow, et al., “The autism biomarkers consortium for clinical trials: initial evaluation of a battery of candidate eeg biomarkers,” American Journal of Psychiatry, pp. appi–ajp, 2022.
[24] B. Li, A. Atyabi, M. Kim, E. Barney, A. Y. Ahn, Y. Luo, M. Aubertine, S. Corrigan, T. St. John, Q. Wang, M. Mademtzi, M. Best, and F. Shic, “Social influences on executive functioning in autism: Design of a mobile gaming platform,” in Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI ’18, (New York, NY, USA), p. 1–13, Association for Computing Machinery, 2018.
[25] A. Atyabi, B. Li, Y. A. Ahn, M. Kim, E. Barney, and F. Shic, “An exploratory analysis targeting diagnostic classification of aac app usage patterns,” in 2017 International Joint Conference on Neural Networks (IJCNN), pp. 1633–1640, 2017.
[26] F. Shic, A. J. Naples, E. C. Barney, S. A. Chang, B. Li, T. McAllister, M. Kim, K. J. Dommer, S. Hasselmo, A. Atyabi, et al., “The autism biomarkers consortium for clinical trials: evaluation of a battery of candidate eye-tracking biomarkers for use in autism clinical trials,” Molecular Autism, vol. 13, no. 1, pp. 1–17, 2022.
[27] S.-J. Cho, S. Brown-Schmidt, P. De Boeck, and M. Naveiras, “Space-time modeling of intensive binary time series eye-tracking data using a generalized additive logistic regression model,” Psychological Methods, 2022.
[28] S. J. Webb, A. J. Naples, A. R. Levin, G. Hellemann, H. Borland, J. Benton, C. Carlos, T. McAllister, M. Santhosh, H. Seow, A. Atyabi, R. Bernier, K. Chawarska, G. Dawson, J. Dziura, S. Faja, S. Jeste, M. Murias, C. A. Nelson, M. Sabatos-DeVito, D. Senturk, F. Shic, C. A. Sugar, and J. C. McPartland, “The autism biomarkers consortium for clinical trials: Initial evaluation of a battery of candidate eeg biomarkers,” American Journal of Psychiatry, vol. 180, no. 1, pp. 41–49, 2023. PMID: 36000217.
[29] M. Jagadeesan, P. Selvaraj, M. Harikrishnan, T. Kamalavalli, and V. Jayakumar, “Behavioral features based autism spectrum disorder detection using decision trees,” Annals of the Romanian Society for Cell Biology, pp. 8069–8075, 2021.
[30] Y. H. Kim, M.-J. Kim, H. J. Shin, H. Yoon, S. J. Han, H. Koh, Y. H. Roh, and M.-J. Lee, “Mri-based decision tree model for diagnosis of biliary atresia,” European Radiology, vol. 28, no. 8, pp. 3422–3431, 2018.
[31] D. Ramyachitra and P. Manikandan, “Imbalanced dataset classification and solutions: a review,” International Journal of Computing and Business Research (IJCBR), vol. 5, no. 4, pp. 1–29, 2014.
[32] S. Kotsiantis, D. Kanellopoulos, P. Pintelas, et al., “Handling imbalanced datasets: A review,” GESTS international transactions on computer science and engineering, vol. 30, no. 1, pp. 25–36, 2006.
[33] H. Ali, M. N. M. Salleh, R. Saedudin, K. Hussain, and M. F. Mushtaq, “Imbalance class problems in data mining: a review,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 14, no. 3, pp. 1560–1571, 2019.
[34] A. D. Amirruddin, F. M. Muharam, M. H. Ismail, N. P. Tan, and M. F. Ismail, “Synthetic minority over-sampling technique (smote) and logistic model tree (lmt)-adaptive boosting algorithms for classifying imbalanced datasets of nutrient and chlorophyll sufficiency levels of oil palm (elaeis guineensis) using spectroradiometers and unmanned aerial vehicles,” Computers and Electronics in Agriculture, vol. 193, p. 106646, 2022.
[35] I. J. Goodfellow, D. Erhan, P. L. Carrier, A. Courville, M. Mirza, B. Hamner, W. Cukierski, Y. Tang, D. Thaler, D.-H. Lee, et al., “Challenges in representation learning: A report on three machine learning contests,” in International conference on neural information processing, pp. 117–124, Springer, 2013.
[36] M. J. Maenner, K. A. Shaw, A. V. Bakian, D. A. Bilder, M. S. Durkin, A. Esler, S. M. Furnier, L. Hallas, J. Hall-Lande, A. Hudson, et al., “Prevalence and characteristics of autism spectrum disorder among children aged 8 years—autism and developmental disabilities monitoring network, 11 sites, united states, 2018,” MMWR Surveillance Summaries, vol. 70, no. 11, p. 1, 2021.
[37] H. Hodges, C. Fealko, and N. Soares, “Autism spectrum disorder: definition, epidemiology, causes, and clinical evaluation,” Translational pediatrics, vol. 9, no. Suppl 1, p. S55, 2020.
[38] H. Tager-Flusberg and E. Caronna, “Language disorders: autism and other pervasive developmental disorders,” Pediatric Clinics of North America, vol. 54, no. 3, pp. 469–481, 2007.
[39] A. S. Heinsfeld, A. R. Franco, R. C. Craddock, A. Buchweitz, and F. Meneguzzi, “Identification of autism spectrum disorder using deep learning and the abide dataset,” NeuroImage: Clinical, vol. 17, pp. 16–23, 2018.
[40] Z. Lu, Y. Bai, Y. Chen, C. Su, S. Lu, T. Zhan, X. Hong, and S. Wang, “The classification of gliomas based on a pyramid dilated convolution resnet model,” Pattern Recognition Letters, vol. 133, pp. 173–179, 2020.
[41] S. Targ, D. Almeida, and K. Lyman, “Resnet in resnet: Generalizing residual architectures,” arXiv preprint arXiv:1603.08029, 2016.
[42] P. Satti, N. Sharma, and B. Garg, “Min-max average pooling based filter for impulse noise removal,” IEEE Signal Processing Letters, vol. 27, pp. 1475–1479, 2020.
[43] K. S. Prendergast, M. H. Menz, K. W. Dixon, and P. W. Bateman, “The relative performance of sampling methods for native bees: an empirical test and review of the literature,” Ecosphere, vol. 11, no. 5, p. e03076, 2020.
[44] A. R. Rahmani, M. Leili, G. Azarian, and A. Poormohammadi, “Sampling and detection of corona viruses in air: A mini review,” Science of The Total Environment, vol. 740, p. 140207, 2020.
[45] H. Mansourifar and W. Shi, “Deep synthetic minority over-sampling technique,” arXiv preprint arXiv:2003.09788, 2020.
[46] N. Chakrabarty and S. Biswas, “Navo minority over-sampling technique (nmote): A consistent performance booster on imbalanced datasets,” Journal of Electronics, vol. 2, no. 02, pp. 96–136, 2020.
[47] C. K. Aridas, S. Karlos, V. G. Kanas, N. Fazakis, and S. B. Kotsiantis, “Uncertainty based under-sampling for learning naive bayes classifiers under imbalanced data sets,” IEEE Access, vol. 8, pp. 2122–2133, 2019.
[48] K. Liu, M. Zhang, and Z. Pan, “Facial expression recognition with cnn ensemble,” in 2016 international conference on cyberworlds (CW), pp. 163–166, IEEE, 2016.
[49] C. Pramerdorfer and M. Kampel, “Facial expression recognition using convolutional neural networks: state of the art,” arXiv preprint arXiv:1612.02903, 2016.
[50] Y. Tang, “Deep learning using linear support vector machines,” arXiv preprint arXiv:1306.0239, 2013.
[51] P. Palimkar, R. N. Shaw, and A. Ghosh, “Machine learning technique to prognosis diabetes disease: Random forest classifier approach,” in Advanced Computing and Intelligent Technologies, pp. 219–244, Springer, 2022.
[52] D. Yu, Z. Liu, C. Su, Y. Han, X. Duan, R. Zhang, X. Liu, Y. Yang, and S. Xu, “Copy number variation in plasma as a tool for lung cancer prediction using extreme gradient boosting (xgboost) classifier,” Thoracic cancer, vol. 11, no. 1, pp. 95–102, 2020.
[53] A. Ali, M. Hamraz, P. Kumam, D. M. Khan, U. Khalil, M. Sulaiman, and Z. Khan, “A k-nearest neighbours based ensemble via optimal model selection for regression,” IEEE Access, vol. 8, pp. 132095–132105, 2020.
[54] A. Jamali, M. Mahdianpari, B. Brisco, J. Granger, F. Mohammadimanesh, and B. Salehi, “Deep forest classifier for wetland mapping using the combination of sentinel-1 and sentinel-2 data,” GIScience & Remote Sensing, vol. 58, no. 7, pp. 1072–1089, 2021.
[55] B. Recht, R. Roelofs, L. Schmidt, and V. Shankar, “Do imagenet classifiers generalize to imagenet?,” in International Conference on Machine Learning, pp. 5389–5400, PMLR, 2019.
[56] L. Zahedi, F. G. Mohammadi, S. Rezapour, M. W. Ohland, and M. H. Amini, “Search algorithms for automated hyper-parameter tuning,” arXiv preprint arXiv:2104.14677, 2021.
[57] Q. T. Ngo and S. Yoon, “Facial expression recognition based on weighted-cluster loss and deep transfer learning using a highly imbalanced dataset,” Sensors, vol. 20, no. 9, p. 2639, 2020.
[58] M. K. Chowdary, T. N. Nguyen, and D. J. Hemanth, “Deep learning-based facial emotion recognition for human–computer interaction applications,” Neural Computing and Applications, pp. 1–18, 2021.
[59] H. Chen, X. Duan, F. Liu, F. Lu, X. Ma, Y. Zhang, L. Q. Uddin, and H. Chen, “Multivariate classification of autism spectrum disorder using frequency-specific resting-state functional connectivity—a multi-center study,” Progress in Neuro-Psychopharmacology and Biological Psychiatry, vol. 64, pp. 1–9, 2016.
[60] L. Zahara, P. Musa, E. P. Wibowo, I. Karim, and S. B. Musa, “The facial emotion recognition (fer-2013) dataset for prediction system of micro-expressions face using the convolutional neural network (cnn) algorithm based raspberry pi,” in 2020 Fifth International Conference on Informatics and Computing (ICIC), pp. 1–9, IEEE, 2020.
[61] B. Li and D. Lima, “Facial expression recognition via resnet-50,” International Journal of Cognitive Computing in Engineering, vol. 2, pp. 57–64, 2021.
[62] T. Gwyn, K. Roy, and M. Atay, “Face recognition using popular deep net architectures: A brief comparative study,” Future Internet, vol. 13, no. 7, p. 164, 2021.
[63] Z. Ma, Y. Ding, B. Li, and X. Yuan, “Deep cnns with robust lbp guiding pooling for face recognition,” Sensors, vol. 18, no. 11, p. 3876, 2018.
[64] F. Thabtah and D. Peebles, “A new machine learning model based on induction of rules for autism detection,” Health informatics journal, vol. 26, no. 1, pp. 264–286, 2020.
[65] A. Di Martino, D. O’connor, B. Chen, K. Alaerts, J. S. Anderson, M. Assaf, J. H. Balsters, L. Baxter, A. Beggiato, S. Bernaerts, et al., “Enhancing studies of the connectome in autism using the autism brain imaging data exchange ii,” Scientific data, vol. 4, no. 1, pp. 1–15, 2017.
[66] A. Gupta, A. Gupta, S. Saxena, R. Kaliyaperumal, D. Upreti, and A. Kazim, “Face mask detector using machine learning applications,” INTERNATIONAL JOURNAL OF SPECIAL EDUCATION, vol. 37, no. 3, 2022.
[67] G. J. Katuwal, S. A. Baum, N. D. Cahill, and A. M. Michael, “Divide and conquer: sub-grouping of asd improves asd detection based on brain morphometry,” PloS one, vol. 11, no. 4, p. e0153331, 2016.
[68] T. Eslami, V. Mirjalili, A. Fong, A. R. Laird, and F. Saeed, “Asd-diagnet: a hybrid learning approach for detection of autism spectrum disorder using fmri data,” Frontiers in neuroinformatics, vol. 13, p. 70, 2019.
[69] S. Raj and S. Masood, “Analysis and detection of autism spectrum disorder using machine learning techniques,” Procedia Computer Science, vol. 167, pp. 994–1004, 2020.
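
Supplementary sketch. The hybrid sampling idea summarized in the conclusion (under-sampling the over-populated classes and over-sampling the under-populated ones) might look roughly like the Python below. This is an illustration under stated assumptions, not the authors' implementation: the function names `smote_like`, `hybrid_resample`, and `class_gap` are hypothetical, the median class size as the common target is an assumption, the over-sampling step is plain SMOTE-style interpolation rather than MSMOTE, and the gap measure (difference between the largest and smallest class shares) is only one plausible reading of the "difference between least and most populated classes" reported in the paper.

```python
import numpy as np


def smote_like(X, n_new, k, rng):
    """Create n_new synthetic rows by interpolating each picked row toward
    one of its k nearest neighbours from the same class (SMOTE-style)."""
    n = len(X)
    if n < 2:
        # Too few points to interpolate: fall back to duplicating rows.
        return X[rng.integers(0, n, size=n_new)]
    k = min(k, n - 1)
    # Pairwise squared distances inside the class (fine for small classes).
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    np.fill_diagonal(d2, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(d2, axis=1)[:, :k]
    base = rng.integers(0, n, size=n_new)
    picked = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return X[base] + gap * (X[picked] - X[base])


def hybrid_resample(X, y, k=5, seed=0):
    """Bring every class to the median class size (assumed target):
    random under-sampling above it, SMOTE-style over-sampling below it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = int(np.median(counts))
    X_parts, y_parts = [], []
    for c, n in zip(classes, counts):
        Xc = X[y == c]
        if n > target:
            Xc = Xc[rng.choice(n, size=target, replace=False)]
        elif n < target:
            Xc = np.vstack([Xc, smote_like(Xc, target - n, k, rng)])
        X_parts.append(Xc)
        y_parts.append(np.full(len(Xc), c))
    return np.vstack(X_parts), np.concatenate(y_parts)


def class_gap(y):
    """Difference between the largest and smallest class shares."""
    _, counts = np.unique(y, return_counts=True)
    return (counts.max() - counts.min()) / counts.sum()
```

In the pipeline described by the paper, `X` would hold the deep features extracted by the CNN and the resampled output would be passed to the final classifier; resampling should be applied to the training split only so that synthetic rows do not leak into evaluation.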