This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2021.3068223, IEEE Access
Date of publication xxxx 00, 0000, date of current version xxxx 00, 0000.
Digital Object Identifier 10.1109/ACCESS.2017.Doi Number
ABSTRACT Exam proctoring is a hectic task, i.e., monitoring students' activities in examination rooms is difficult for supervisors. It is a costly approach that requires much labor, and it is hard for supervisors to keep an eye on all students at a time. Automatic exam activities recognition is therefore a necessary and demanding field of research. In this research work, categorization of students' activities during the exam is performed using a deep learning approach. A new deep CNN architecture with 46 layers is proposed, which combines the characteristics of AlexNet and SqueezeNet. The model is engineered first with slight modifications to AlexNet. Then, the squeezed branch-like structure of SqueezeNet is grafted/embedded at two locations in the modified AlexNet architecture. The model is named L2-GraftNet because of its dual grafting blocks. The proposed model is first converted to a pre-trained model by training it with a SoftMax classifier on the CIFAR-100 dataset. Afterwards, the features of the dataset prepared for exam activities categorization are extracted from the above-mentioned pre-trained model. The extracted features are then fed to the atom search optimization (ASO) approach for feature optimization. The optimized features are passed to different variants of SVM and KNN classifiers. The best performance is attained by the Fine KNN classifier with an accuracy of 93.88%. The satisfactory results prove the robustness of the proposed framework. Also, the proposed categorization provides a base for automated exam proctoring without the need for proctors in the exam halls.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
whether the student is focused on his exam task or is watching around [27]. Jalali and Fakhroddin [16] monitor exam activities using a webcam. The images captured from the webcam are subtracted from a reference image, and if the subtracted outcome is found to be more than a threshold value, the activity in the image is considered cheating. Idemudia et al. [28] utilize face biometrics and head poses to detect legal and illegal exam activities. To the best of our insight, offline automated EAR methods are scarce in the literature. Nishchal et al. [29], by combining both gesture and mood interpretation, aim to discover cheating behavior in an examination. The authors use the OpenPose tool for posture analysis and the convolutional neural network (CNN) based AlexNet architecture for recognizing the exam deception activity. Four different classes are incorporated, such as looking straight, turning backward, stretching the arms towards the back, and bending down. The total accuracy of the model is claimed as 77.8%. Aini et al. [30] employ background subtraction and pixel shift techniques to assess unusual motions during examinations. The presented dataset contains six classes to predict the type of exam deception. Hendryli and Fanany [31] employ multimodal decomposable models (MODEC) for feature engineering and the Multi-class Markov Chain Latent Dirichlet Allocation (MCMCLDA) approach as a predictor to categorize five classes belonging to suspicious activities conducted during exams. These categories include visual contact, exchanging documents, making hand motions to engage with some other student, gazing at the reaction of another student, and using a visual aid; plus one non-cheating practice. Ahmad et al. use two classes for cheating prediction; the Speeded Up Robust Features (SURF) approach is utilized for feature mining. For developing further understanding of the domain, EAR can be associated with other activities recognition works. Such works include human activity recognition (HAR) [32-34], activities recognition in sports [35-37], and infants monitoring [38].
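The reference-image subtraction scheme of [16] can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name, the per-pixel threshold, and the changed-pixel ratio are illustrative assumptions, and frames are taken as 8-bit grayscale images stored as nested lists.

```python
# Hedged sketch of threshold-based cheating detection via reference-image
# subtraction, in the spirit of [16]. All parameter values are assumptions.

def is_suspicious(frame, reference, threshold=30, min_changed=0.05):
    """Flag a frame as suspicious when the fraction of pixels differing
    from the reference image by more than `threshold` exceeds `min_changed`."""
    changed = 0
    total = 0
    for row_f, row_r in zip(frame, reference):
        for p_f, p_r in zip(row_f, row_r):
            total += 1
            if abs(p_f - p_r) > threshold:
                changed += 1
    return changed / total >= min_changed
```

An unchanged frame yields a zero difference ratio and is classified as normal; a frame whose pixels deviate strongly from the reference is flagged.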
In the context of activities recognition, many methodologies are found for feature extraction in the recent literature. Such methodologies include traditional feature extraction methods and deep learning approaches. Apart from feature extraction, several feature optimization approaches are used to reduce the dimensionality of the features. The mainstay methods are (a) filter approaches involving feature scoring, including PCA [39], entropy [40], etc., and (b) wrapper approaches involving feature learning, including ant colony optimization (ACO) [41], genetic algorithms (GA) [42], etc. In the recent past, manifold learning [43, 44] has been gaining attention for mapping high-dimensional features to low-dimensional feature spaces. Although manifold learning is easy to implement for classification procedures, deep learning approaches often show superior performance because of automatic feature learning [45]. The focus of this manuscript is on deep feature extraction with learning-based feature selection for activity classification. Some useful classification methods include SVM [46-51], KNN [52-55], and decision trees [56, 57].

III. MATERIAL AND METHODS
This section presents a new CNN-based model named L2-GraftNet for EAR. The section describes the proposed model and the steps involved to perform EAR. The main steps of the proposed model include pre-training, dataset augmentation, feature extraction from the pre-trained L2-GraftNet, feature subset selection through ASO, and classification. Figure 2 provides a brief overview of the proposed model of EAR using the proposed CNN network. The mentioned steps are discussed one by one in the accompanying section.

A. PROPOSED L2-GRAFTNET
This work is contributed by a new CNN-based architecture, named L2-GraftNet, for EAR.
TABLE I
THE DETAIL WITH LAYERS CONFIGURATIONS OF L2-GRAFTNET (LAYERS 17-46)

No. | Layer | Output size | Filter size | Stride | Padding | Remarks
17 | R2 | 27 × 27 × 256 | - | - | - | -
18 | CN2 | 27 × 27 × 256 | - | - | - | -
19 | P2 | 13 × 13 × 256 | - | [2 2] | [0 0 0 0] | Max pooling 3 × 3
20 | BN5 | 13 × 13 × 256 | - | - | - | -
21 | GC2(C6) | 13 × 13 × 384 | 3 × 3 × 256 × 384 | [1 1] | [1 1 1 1] | -
22 | R3 | 13 × 13 × 384 | - | - | - | -
23 | BN7 | 13 × 13 × 384 | - | - | - | -
24 | C7 | 13 × 13 × 192 | 1 × 1 × 384 × 192 | [1 1] | Same | -
25 | BN6 | 13 × 13 × 192 | - | - | - | -
26 | C9 | 13 × 13 × 384 | 3 × 3 × 384 × 384 | [1 1] | Same | -
27 | BN8 | 13 × 13 × 384 | - | - | - | -
28 | LR4 | 13 × 13 × 384 | - | - | - | Scale 0.01
29 | C8 | 13 × 13 × 384 | 5 × 5 × 192 × 384 | [1 1] | Same | -
30 | LR3 | 13 × 13 × 384 | - | - | - | Scale 0.01
31 | ADD2 | 13 × 13 × 384 | - | - | - | -
32 | GC3(C10) | 13 × 13 × 384 | Two groups of 3 × 3 × 192 × 192 | [1 1] | [1 1 1 1] | -
33 | R4 | 13 × 13 × 384 | - | - | - | -
34 | GC4(C11) | 13 × 13 × 256 | Two groups of 3 × 3 × 192 × 128 | [1 1] | [1 1 1 1] | -
35 | R5 | 13 × 13 × 256 | - | - | - | -
36 | P3 | 6 × 6 × 256 | - | [2 2] | [0 0 0 0] | Max pooling 3 × 3
37 | BN9 | 6 × 6 × 256 | - | - | - | -
38 | FC12 | 1 × 1 × 4096 | - | - | - | -
39 | R6 | 1 × 1 × 4096 | - | - | - | -
40 | D1 | 1 × 1 × 4096 | - | - | - | 50% Dropout
41 | FC13 | 1 × 1 × 4096 | - | - | - | -
42 | R7 | 1 × 1 × 4096 | - | - | - | -
43 | D2 | 1 × 1 × 4096 | - | - | - | -
44 | FC14 | 1 × 1 × 100 | - | - | - | -
45 | SOFTMAX | 1 × 1 × 100 | - | - | - | -
46 | OUTPUT | - | - | - | - | -
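The spatial dimensions listed in Table I can be checked against the standard output-size formula for convolution and pooling layers. The following small sketch (plain Python; the helper name is ours) reproduces three of the table's rows.

```python
# The spatial sizes in Table I follow the standard convolution/pooling
# output-size formula; this check reproduces a few rows.

def out_size(n, kernel, stride, pad=0):
    """Output spatial size for an n x n input with a square kernel,
    symmetric padding `pad` on each side, and the given stride."""
    return (n + 2 * pad - kernel) // stride + 1

# Row 19 (P2): 3x3 max pooling, stride 2, no padding, 27x27 input -> 13
# Row 21 (GC2/C6): 3x3 convolution, stride 1, padding 1, 13x13 input -> 13
# Row 36 (P3): 3x3 max pooling, stride 2, no padding, 13x13 input -> 6
```

The same formula accounts for the earlier (omitted) rows of the table, since the input size matches AlexNet's 227×227×3.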
The proposed model is developed after studying two famous CNN networks, AlexNet [58] and SqueezeNet [59]. AlexNet comprises five convolutional and three fully connected layers. Along with these layers, the model also contains three pooling, seven ReLU, two dropout, and SoftMax layers. The proposed model consists of 46 layers, including the input and output layers. Starting with AlexNet, several new layers, including batch normalization and fire-module-like structures of SqueezeNet, are added to form the proposed L2-GraftNet. Figure 3 depicts the architectural view of L2-GraftNet. The detailed architecture is described as follows: the input size is the same as that of AlexNet, i.e., 227×227×3. The network starts with a convolution (C1) layer, after which the first grafted fire block (GB1) (resembling SqueezeNet's fire module) is embedded. GB1 contains ten layers in three branches that emerge from the ReLU (R1) layer. The first branch consists of a batch normalization layer (BN2). The second branch contains four
passage of atoms. The optimization function is used as follows:

Root Mean Square Error = RMSE = sqrt( (1/N) Σ_{i=1}^{N} (ρ_i − Α_i)² )    (7)

where N illustrates the number of training samples, ρ_i is the ith categorization outcome, and Α_i is the actual value to be predicted. The main aim of the ASO algorithm is to lessen the value of the optimization function. An atom community of N atoms is suggested (represented as ϱ_i = [ϱ_i1, ϱ_i2, …, ϱ_iS], i = 1, 2, …, N) in S-dimensional space for the minimization solution. In this space, the Euclidean distance is represented as d_ij, where i and j index the atoms. Through the force Force_ij (the interaction known as the Lennard-Jones (L-J) potential), atoms communicate with one another as follows:

Force_ij = (24 / Γ²) [ 2(Γ/d_ij)^14 − (Γ/d_ij)^8 ]    (8)

The KNN variants can further be studied from [52-55]. The classifiers are evaluated on numerous performance evaluation metrics. Based on experimentation, the two best classifiers observed are CSVM and FKNN. FKNN is observed with the highest accuracy, while CSVM is found with the second-best accuracy. The detailed results and experiments are presented in the results section.

E. DATASET AUGMENTATION
Deep learning requires a huge dataset for robust performance at the training level. To improve the size of the dataset, augmentation is performed. Four different types of augmentation are chosen: mirroring, adding Gaussian noise, adding salt-and-pepper noise, and color shifting (see Figure 4 for visualization). This increases the size of the dataset to four times.
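Equations (7) and (8) can be sketched directly. This is a minimal illustration of the two formulas only, not the full ASO iteration; the function names and the sample value of Γ are our assumptions.

```python
import math

# Hedged sketch of the ASO ingredients in Eqs. (7) and (8): the RMSE
# fitness over predicted/actual pairs and the Lennard-Jones style
# interaction force. Gamma and sample values are illustrative.

def rmse(predicted, actual):
    """Eq. (7): root mean square error over N training samples."""
    n = len(predicted)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)

def lj_force(d_ij, gamma):
    """Eq. (8): interaction force between atoms i and j at distance d_ij."""
    r = gamma / d_ij
    return (24.0 / gamma ** 2) * (2.0 * r ** 14 - r ** 8)
```

Minimizing the RMSE fitness drives the atom community toward feature subsets whose classification outcomes match the ground truth.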
Wah Campus, Pakistan. The dataset consists of a total of 2268 snaps of student activities obtained through surveillance cameras while offline examinations are set up. The dataset is annotated from 50 acquired video streams. The bounding boxes of individual students are cropped manually from the frames of the video streams and separated category-wise according to 6 classes. The action categories are: (a) seeing back, (b) watching towards the front, (c) normal performing, (d) passing gestures to other fellows, (e) watching towards left or right, and (f) other suspicious actions (see Figure 5 for sample dataset snaps). The dataset is challenging because many bounding boxes are acquired from images that are far from the camera and have a blurring effect. To increase the size of the dataset, dataset augmentation is performed by following the procedure defined in section 3.5. Table II portrays the size of the dataset before and after the augmentation.

TABLE II
CUI-EXAM DATASET DETAIL

Class | Original | Augmented
Back watching | 406 | 2030
Front watching | 285 | 1425
Normal | 560 | 2800
Showing gestures | 326 | 1630
Side watching | 379 | 1895
Suspicious | 312 | 1560
Total | 2268 | 11340
FIGURE 5. Some sample images from the CUI Exam dataset representing six classes: (a) seeing back, (b) watching towards the front, (c) normal
performing, (d) passing gestures to other fellows, (e) watching towards left or right, and (f) other suspicious actions
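The four augmentation types from section E (mirroring, Gaussian noise, salt-and-pepper noise, color shifting) can be sketched with NumPy as below. The noise levels, shift amounts, and function names are illustrative assumptions, since the paper does not specify them.

```python
import numpy as np

# Hedged sketch of the four augmentations applied to the CUI-EXAM images.
# All parameter values are assumptions; the paper does not state them.

rng = np.random.default_rng(0)

def mirror(img):
    """Horizontal flip (mirroring)."""
    return img[:, ::-1, :]

def gaussian_noise(img, sigma=10.0):
    """Add zero-mean Gaussian noise, clipped back to 8-bit range."""
    noisy = img.astype(float) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)

def salt_and_pepper(img, amount=0.02):
    """Set a random fraction of pixels to black (pepper) or white (salt)."""
    out = img.copy()
    mask = rng.random(img.shape[:2])
    out[mask < amount / 2] = 0
    out[mask > 1 - amount / 2] = 255
    return out

def color_shift(img, shift=(10, -10, 5)):
    """Shift each color channel by a fixed offset, clipped to 8-bit range."""
    shifted = img.astype(int) + np.array(shift)
    return np.clip(shifted, 0, 255).astype(np.uint8)
```

Applying all four transforms to each original image is what multiplies the dataset size by five (original plus four augmented copies), matching the counts in Table II.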
Accuracy (Ac) = (TP + TN) / (TP + TN + FP + FN)    (14)

Sensitivity (Si) = TP / (TP + FN)    (15)

Specificity (Sp) = TN / (TN + FP)    (16)

Area under the ROC curve (AUC) = (TP + TN) / (TP + TN + FP + FN)    (17)

Precision (Pr) = TP / (TP + FP)    (18)

F-measure (FM) = 2 × (Precision × Recall) / (Precision + Recall)    (19)

G-Mean (GM) = sqrt(TPR × TNR)    (20)

where TP, TN, FP, and FN denote the true positive, true negative, false positive, and false negative values, respectively.
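The measures in Eqs. (14)-(20) can be computed directly from raw counts; recall equals sensitivity here. The helper name and return format are ours.

```python
import math

# Hedged sketch computing the evaluation measures of Eqs. (14)-(20)
# from true/false positive/negative counts.

def metrics(tp, tn, fp, fn):
    ac = (tp + tn) / (tp + tn + fp + fn)   # accuracy, Eq. (14)
    si = tp / (tp + fn)                    # sensitivity (recall), Eq. (15)
    sp = tn / (tn + fp)                    # specificity, Eq. (16)
    pr = tp / (tp + fp)                    # precision, Eq. (18)
    fm = 2 * pr * si / (pr + si)           # F-measure, Eq. (19)
    gm = math.sqrt(si * sp)                # G-mean, Eq. (20)
    return {"Ac": ac, "Si": si, "Sp": sp, "Pr": pr, "FM": fm, "GM": gm}
```

For multi-class results such as Tables IV-VIII, these counts are typically accumulated per class in a one-vs-rest fashion and then averaged.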
A 5-fold cross-validation scheme is used for training and testing purposes. Figure 6 represents some intermediate visualizations of features taken at various convolution layers of the proposed L2-GraftNet.
FIGURE 6. L2-GraftNet visualizations of an image at different convolution layers (a) C1, (b) C2, (c) C3, (d) C4, (e) GC1(C5), (f) GC2(C6), (g) C7, (h) C8, (i)
C9, (j) GC3(C10), (k) GC4(C11)
The second-highest performance of 85.85 percent in terms of accuracy was reached by CSVM. In Table IV, the findings of the research for this experiment are presented. Similarly, Figure 8 contains a time plot and training speed for this experiment.
FIGURE 7. ASO Fitness charts at different number of selected features: (a) 100, (b) 250, (c) 500, (d) 750 and (e) 1000
TABLE IV
PERFORMANCE RESULTS OF EXPERIMENT 1 (100 FEATURES)
Classifier Ac (%) Si (%) Sp (%) Pr (%) FM (%) GM (%)
LSVM 48.02 49.61 47.67 17.13 25.46 48.63
QSVM 76.77 84.63 75.06 42.52 56.61 79.70
FGSVM 64.89 68.67 64.07 29.42 41.19 66.33
MGSVM 62.03 69.21 60.46 27.62 39.49 64.69
CGSVM 42.29 45.57 41.58 14.53 22.04 43.53
CSVM 85.85 91.48 84.62 56.46 69.83 87.98
CKNN 79.20 89.90 76.86 45.87 60.74 83.13
CrKNN 52.69 60.79 50.92 21.26 31.51 55.64
FKNN 92.43 95.76 91.71 71.58 81.92 93.71
FIGURE 8. Training time (sec) and prediction speed (obs/sec) plot with 100 features
2) EXPERIMENT # 2: (250 SELECTED FEATURES)
The second experiment is accomplished by picking 250 features by the ASO algorithm. The size of the feature matrix becomes 4536 × 250. 5-fold cross-validation is applied on this feature matrix with ground truth class labels on the CUI-EXAM dataset. This feature matrix is given to the variants of SVM and KNN. The best result of 93.17 percent accuracy is obtained in this test by FKNN. The second-highest performance of 90.03 percent in terms of accuracy is reached by CSVM. In Table V, the outcomes of the research for this experiment are introduced. Similarly, Figure 9 covers a time plot and training speed for this experiment.
TABLE V
PERFORMANCE RESULTS OF EXPERIMENT 2 (250 FEATURES)
Classifier Ac (%) Si (%) Sp (%) Pr (%) FM (%) GM (%)
LSVM 58.15 63.10 57.07 24.27 35.06 60.01
QSVM 83.13 90.34 81.56 51.65 65.72 85.84
FGSVM 36.28 36.21 36.29 11.03 16.90 36.25
MGSVM 83.85 90.69 82.36 52.86 66.79 86.43
CGSVM 52.45 57.29 51.40 20.45 30.14 54.26
CSVM 90.03 93.69 89.23 65.47 77.08 91.43
CKNN 81.20 91.63 78.93 48.67 63.57 85.04
CrKNN 55.74 62.36 54.30 22.93 33.53 58.19
FKNN 93.17 96.90 92.36 73.45 83.56 94.60
FIGURE 9. Training time (sec) and prediction speed (obs/sec) plot with 250 features
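The 5-fold splitting applied in each experiment can be sketched without any library. The index partitioning below is a generic k-fold scheme under our own naming, not the authors' exact implementation.

```python
# Hedged sketch of the 5-fold cross-validation splitting used in the
# experiments: 4536 samples partitioned into five folds, each fold
# serving once as the held-out test set.

def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

With 4536 samples and k = 5, the folds contain 908, 907, 907, 907, and 907 samples, and every sample is tested exactly once.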
3) EXPERIMENT # 3: (500 SELECTED FEATURES)
In this experiment, 500 features are chosen. The size of the feature matrix turns out to be 4536 × 500. Again, 5-fold cross-validation is operated on the CUI-EXAM dataset. This feature matrix is supplied for classification to the SVM and KNN based classifiers. The best result of 93.50 percent accuracy is attained in this test by FKNN, and the second-best performance of 91.41 percent in terms of accuracy was grasped by CSVM. In Table VI, the findings of the research for this experiment are presented. Also, it is observed that the performance of MGSVM is increased by almost six percent in terms of accuracy. Figure 10 encompasses a time plot and prediction speed of training in this experiment. The training time is increased as compared to previous experiments because of the increase in the number of features.
TABLE VI
PERFORMANCE RESULTS OF EXPERIMENT 3 (500 FEATURES)
FIGURE 10. Training time (sec) and prediction speed (obs/sec) plot with 500 features
4) EXPERIMENT # 4: (750 SELECTED FEATURES)
In this experiment, 750 features are chosen. The size of the feature matrix is now 4536 × 750. This feature matrix is provided for the categorization of the SVM and KNN classifiers. The finest result of 93.53 percent accuracy is attained in this test by FKNN. It is observed that there is a slight difference in accuracy results when the number of features is increased from 500 to 750, that is, 0.03 percent. The second-best execution of 92.21 percent in terms of accuracy was found by CSVM. In Table VII, the results of the research for this experiment are presented. Also, it is observed that the performance of MGSVM is now decreased in terms of accuracy. Figure 11 encompasses a time plot and prediction speed of training in this experiment. The training time is increased as compared to previous experiments because of the increase in the number of features.
TABLE VII
PERFORMANCE RESULTS OF EXPERIMENT 4 (750 FEATURES)
Classifier Ac (%) Si (%) Sp (%) Pr (%) FM (%) GM (%)
LSVM 66.53 74.09 64.88 31.50 44.21 69.33
QSVM 86.34 91.13 85.30 57.47 70.49 88.17
FGSVM 34.56 29.36 35.69 09.05 13.84 32.37
MGSVM 88.27 91.77 87.51 61.57 73.69 89.62
CGSVM 64.94 71.18 63.58 29.88 42.09 67.27
CSVM 92.21 95.22 91.56 71.09 81.41 93.37
CKNN 82.28 91.38 80.30 50.28 64.87 85.66
CrKNN 57.40 64.73 55.80 24.20 35.23 60.10
FKNN 93.53 96.95 92.78 74.55 84.28 94.84
FIGURE 11. Training time (sec) and prediction speed (obs/sec) plot with 750 features
5) EXPERIMENT # 5: (1000 SELECTED FEATURES)
In this experiment, 1000 features are chosen. The size of the feature matrix is now 4536 × 1000. This feature matrix is provided for the categorization of the SVM and KNN classifiers. The finest result of 93.88 percent accuracy is attained in this test by FKNN. It is observed that there is again a slight difference in accuracy results when the number of features is increased from 750 to 1000, that is, 0.35 percent. The second-best performance of 92.58 percent in terms of accuracy was found by CSVM. In Table VIII, the findings of the research for this experiment are shown. The third-highest accomplishment of MGSVM is further declined in terms of accuracy. Figure 12 covers a time plot and prediction speed of training, showing the effective training times of CKNN, CrKNN, and FKNN. All the experiments are performed on the same classifiers (mentioned in the classification subsection of the material and methods section) except experiment #5, where five more classifiers' results are added to further analyze the results. The additional classifiers include Medium KNN (MKNN), Weighted KNN (WKNN) [54, 55], Fine Tree (FTree), Medium Tree (MTree), and Coarse Tree (CTree). Tree-based classifiers can further be studied in detail from [56, 57]. It is observed that FKNN is dominant over all the mentioned classifiers (see Table VIII).
TABLE VIII
PERFORMANCE RESULTS OF EXPERIMENT 5 (1000 FEATURES)
FIGURE 12. Training time (sec) and prediction speed (obs/sec) plot with 1000 features and FKNN classifier
The performance outcomes with 1000 features are selected as the best, as the performance reduces, or sometimes increases, with a very small difference beyond that point. Table IX presents the confusion matrix of the best results. The true positive values of the six classes are highlighted on the diagonal. Overall, all classes show a good recognition rate. The class "Watching towards left or right" shows less accuracy. This is also depicted by the ROC and AUC in Figure 13. Still, the AUC is found to be more than 0.94.

D. DISCUSSION
The manuscript is focused on EAR. For this purpose, the CUI-EXAM dataset is obtained. To increase the performance of the proposed approach, the proposed 46-layer L2-GraftNet is developed with extensive experimentation. The model is first pre-trained on the CIFAR-100 dataset, which contains 100 classes. The features of the CUI-EXAM dataset are then obtained from the trained network. ASO-based feature selection is adopted on the extracted features. The classification is applied in different experiments with varying numbers of optimal features to observe the performance of the proposed work. The results of the experiments provided in Tables IV-VIII depict the performance of the proposed system in terms of accuracy (Ac), sensitivity (Si), specificity (Sp), precision (Pr), F1-measure, and G-mean. These results are obtained while varying the number of features at the feature selection step. All the experiments are performed on the same classifiers. Five additional classifiers' results are added in the experiment 5 subsection because the performance of experiment 5 is found better with 1000 features.
TABLE IX
CONFUSION MATRIX OF BEST OUTCOME WITH 1000 FEATURES AND FKNN CLASSIFIER

True Classes \ Predicted Classes | Seeing back | Watching towards front | Normal performing | Passing gestures to other fellows | Watching towards left or right | Other suspicious actions
Watching towards front | 12 | 1300 | 60 | 17 | 31 | 5
Normal performing | 22 | 57 | 2621 | 10 | 47 | 43
Passing gestures to other fellows | 5 | 26 | 17 | 1546 | 30 | 6
Watching towards left or right | 43 | 34 | 80 | 17 | 1699 | 22
Other suspicious actions | 5 | 4 | 19 | 9 | 9 | 1514
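The per-class recognition rates implied by Table IX can be recomputed directly from its rows: each row total matches the corresponding augmented class size in Table II, and "Watching towards left or right" indeed has the lowest recall among the five rows shown (the "Seeing back" row is not reproduced in this excerpt).

```python
# Per-class recall from the rows of Table IX that appear in this excerpt.
# The variable names are ours; the counts are taken verbatim from Table IX.

rows = {
    "Watching towards front":            [12, 1300, 60, 17, 31, 5],
    "Normal performing":                 [22, 57, 2621, 10, 47, 43],
    "Passing gestures to other fellows": [5, 26, 17, 1546, 30, 6],
    "Watching towards left or right":    [43, 34, 80, 17, 1699, 22],
    "Other suspicious actions":          [5, 4, 19, 9, 9, 1514],
}
# Correctly classified count (diagonal entry) for each row above.
diagonal = {
    "Watching towards front": 1300,
    "Normal performing": 2621,
    "Passing gestures to other fellows": 1546,
    "Watching towards left or right": 1699,
    "Other suspicious actions": 1514,
}
recall = {name: diagonal[name] / sum(counts) for name, counts in rows.items()}
```

For example, the "Watching towards left or right" row sums to 1895 (the augmented side-watching count in Table II) and yields a recall of 1699/1895, about 0.897, below the other four classes, which is consistent with the discussion of Figure 13.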
FIGURE 13. ROCs and AUCs of all classes with the best-considered outcome (1000 features) and FKNN classifier
It is analyzed that as the number of features increases, the performance also increases, but after a certain level the rate of increase becomes very low. For instance, the accuracy difference between 500 features and 750 features is 0.03 percent. Similarly, the accuracy difference between 750 features and 1000 features is 0.35 percent. However, the accuracy drops slightly on 1500 and 2000 features (see Table III). Therefore, the framework with 1000 features is found better in all aspects. The performance in experiment 5 is measured on variants of three classifiers including SVM, KNN, and tree. Overall, tree-based classifiers performed very poorly, with all having accuracy less than or equal to 35%. The accuracy of SVM based classifiers is found between 34.56% and 92.58%. Performance on nearly all KNN based classifiers is found better, having an accuracy range of 57.16% to 93.88%. Also, in all cases, the training time of all SVM variants is found lower as compared to KNN based variants. The prediction speeds of KNN variants, on the other hand, are relatively better as compared to SVM and tree variants.

V. CONCLUSION
This work is based on a proposed CNN network, i.e., L2-GraftNet, for feature extraction along with the ASO algorithm for feature selection. The proposed L2-GraftNet is first converted to a pre-trained model by performing training on the CIFAR-100 dataset. The features of the CUI-EXAM dataset are then taken from the pre-trained model. The CUI-EXAM dataset is first annotated and augmented before taking its image features. These features are forwarded to the ASO based feature optimization method, and through various classifier variants of SVM and KNN, the model performance is observed on the selected features. 5-fold cross-validation is applied for testing and training on the dataset. Many experiments are performed; however, only five experiments are discussed in detail. It is observed that the least performance outcomes are obtained for experiments with 100 features, having an accuracy of 92.43 percent by using the FKNN classifier. Similarly, the best performance outcomes are considered for experiments with 1000 features, having an accuracy of 93.88 percent by using the FKNN classifier. From all classifiers' outcomes, in all experiments, it is observed that FKNN shows the highest performance and CSVM depicts the second-highest performance in terms of the selected performance measures.
Although the proposed system shows satisfactory outcomes in terms of performance measures, there can still be improvements for further enhancement of accuracy. In the future, the work can further be explored with more state-of-the-art approaches such as manifold learning, LSTMs, quantum deep learning, and brain-like computing approaches.
[33] Z. Ren, Q. Zhang, X. Gao, P. Hao, and J. Cheng, "Multi-modality learning for human action recognition," Multimedia Tools and Applications, pp. 1-19, 2020.
[34] C. Dhiman and D. K. Vishwakarma, "A review of state-of-the-art techniques for abnormal human activity recognition," Engineering Applications of Artificial Intelligence, vol. 77, pp. 21-45, 2019.
[35] T. Kautz, B. H. Groh, J. Hannink, U. Jensen, H. Strubberg, and B. M. Eskofier, "Activity recognition in beach volleyball using a Deep Convolutional Neural Network," Data Mining and Knowledge Discovery, vol. 31, pp. 1678-1705, 2017.
[36] J. W. McGrath, J. Neville, T. Stewart, and J. Cronin, "Cricket fast bowling detection in a training setting using an inertial measurement unit and machine learning," Journal of Sports Sciences, vol. 37, pp. 1220-1226, 2019.
[37] T. Aditya, S. S. Pai, K. Bhat, P. Manjunath, and G. Jagadamba, "Real Time Patient Activity Monitoring and Alert System," in 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), 2020, pp. 708-712.
[38] Ø. Meinich-Bache, S. L. Austnes, K. Engan, I. Austvoll, T. Eftestøl, H. Myklebust, S. Kusulla, H. Kidanto, and H. Ersdal, "Activity Recognition From Newborn Resuscitation Videos," IEEE Journal of Biomedical and Health Informatics, vol. 24, pp. 3258-3267, 2020.
[39] S. N. Gowda, "Human activity recognition using combinatorial Deep Belief Networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2017, pp. 1-6.
[40] M. Sharif, M. A. Khan, T. Akram, M. Y. Javed, T. Saba, and A. Rehman, "A framework of human detection and action recognition based on uniform segmentation and combination of Euclidean distance and joint entropy-based features selection," EURASIP Journal on Image and Video Processing, vol. 2017, pp. 1-18, 2017.
[41] S.-C. Huang, Y.-T. Peng, C.-H. Chang, K.-H. Cheng, S.-W. Huang, and B.-H. Chen, "Restoration of Images With High-Density Impulsive Noise Based on Sparse Approximation and Ant-Colony Optimization," IEEE Access, vol. 8, pp. 99180-99189, 2020.
[42] R. Guha, A. H. Khan, P. K. Singh, R. Sarkar, and D. Bhattacharjee, "CGA: A new feature selection model for visual human action recognition," Neural Computing and Applications, pp. 1-20, 2020.
[43] F. Luo, B. Du, L. Zhang, L. Zhang, and D. Tao, "Feature learning using spatial-spectral hypergraph discriminant analysis for hyperspectral image," IEEE Transactions on Cybernetics, vol. 49, pp. 2406-2419, 2018.
[44] G. Shi, H. Huang, and L. Wang, "Unsupervised
[49] S. Rüping, "SVM kernels for time series analysis," Technical report, 2001.
[50] N.-E. Ayat, M. Cheriet, and C. Y. Suen, "Automatic model selection for the optimization of SVM kernels," Pattern Recognition, vol. 38, pp. 1733-1745, 2005.
[51] B. Haasdonk, "Feature space interpretation of SVMs with indefinite kernels," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 27, pp. 482-492, 2005.
[52] A. P. Singh, "Analysis of Variants of KNN Algorithm based on Preprocessing Techniques," in 2018 International Conference on Advances in Computing, Communication Control and Networking (ICACCCN), 2018, pp. 186-191.
[53] A. Lamba and D. Kumar, "Survey on KNN and its variants," International Journal of Advanced Research in Computer and Communication Engineering, vol. 5, pp. 430-435, 2016.
[54] L. Jiang, H. Zhang, and J. Su, "Learning k-nearest neighbor naive Bayes for ranking," in International Conference on Advanced Data Mining and Applications, 2005, pp. 175-185.
[55] Y. Xu, Q. Zhu, Z. Fan, M. Qiu, Y. Chen, and H. Liu, "Coarse to fine K nearest neighbor classifier," Pattern Recognition Letters, vol. 34, pp. 980-986, 2013.
[56] Y.-l. Chen, T. Wang, B.-s. Wang, and Z.-j. Li, "A survey of fuzzy decision tree classifier," Fuzzy Information and Engineering, vol. 1, pp. 149-159, 2009.
[57] Priyanka and D. Kumar, "Decision tree classifier: a detailed survey," International Journal of Information and Decision Sciences, vol. 12, pp. 246-269, 2020.
[58] A. Krizhevsky, I. Sutskever, and G. E. Hinton, "ImageNet classification with deep convolutional neural networks," Advances in Neural Information Processing Systems, vol. 25, pp. 1097-1105, 2012.
[59] F. N. Iandola, S. Han, M. W. Moskewicz, K. Ashraf, W. J. Dally, and K. Keutzer, "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size," arXiv preprint arXiv:1602.07360, 2016.
[60] J. Bouvrie, "Notes on convolutional neural networks," 2006.
[61] Y. Li, Z. Hao, and H. Lei, "Survey of convolutional neural network," Journal of Computer Applications, vol. 36, pp. 2508-2515, 2016.
[62] J. Wu, "Introduction to convolutional neural networks," National Key Lab for Novel Software Technology, Nanjing University, China, vol. 5, p. 23, 2017.
[63] S. Balocco, M. González, R. Ñanculef, P. Radeva, and G. Thomas, "Calcified plaque detection in IVUS sequences: Preliminary results using convolutional nets," in
dimensionality reduction for hyperspectral imagery via International Workshop on Artificial Intelligence and
local geometric structure feature learning," IEEE Pattern Recognition, 2018, pp. 34-42.
Geoscience and Remote Sensing Letters, vol. 17, pp. [64] Y. Liu, X. Wang, L. Wang, and D. Liu, "A modified leaky
1425-1429, 2019. ReLU scheme (MLRS) for topology optimization with
[45] H. Cheng and W. Hongmei, "Comparison of Manifold multiple materials," Applied Mathematics and
Learning and Deep Learning on Target Classification," in Computation, vol. 352, pp. 188-204, 2019.
Proceedings of the 2017 International Conference on [65] S. Ioffe and C. Szegedy, "Batch normalization:
Cloud and Big Data Computing, 2017, pp. 100-104. Accelerating deep network training by reducing internal
[46] M. K. Khaleel, M. A. Ismail, U. Yunan, and S. Kasim, covariate shift," in International conference on machine
"Review on intrusion detection system based on the goal learning, 2015, pp. 448-456.
of the detection system," International Journal of [66] A. Krizhevsky and G. Hinton, "Learning multiple layers
Integrated Engineering, vol. 10, 2018. of features from tiny images," 2009.
[47] M. Aljanabi, M. A. Ismail, and V. Mezhuyev, "Improved [67] M. H. Pham, T. H. Do, V.-M. Pham, and Q.-T. Bui,
TLBO-JAYA algorithm for subset feature selection and "Mangrove forest classification and aboveground
parameter optimisation in intrusion detection system," biomass estimation using an atom search algorithm and
Complexity, vol. 2020, 2020. adaptive neuro-fuzzy inference system," PloS one, vol.
[48] Z. F. Hussain, H. R. Ibraheem, M. Alsajri, A. H. Ali, M. 15, p. e0233110, 2020.
A. Ismail, S. Kasim, and T. Sutikno, "A new model for [68] W. Zhao, L. Wang, and Z. Zhang, "Atom search
iris data set classification based on linear support vector optimization and its application to solve a hydrogeologic
machine parameter's optimization," International Journal parameter estimation problem," Knowledge-Based
of Electrical and Computer Engineering, vol. 10, p. 1079, Systems, vol. 163, pp. 283-304, 2019.
2020.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI
10.1109/ACCESS.2021.3068223, IEEE Access
[69] J. Too and A. R. Abdullah, "Chaotic atom search optimization for feature selection," Arabian Journal for Science and Engineering, vol. 45, pp. 6063-6079, 2020.
[70] W. S. Noble, "What is a support vector machine?," Nature Biotechnology, vol. 24, pp. 1565-1567, 2006.
[71] L. E. Peterson, "K-nearest neighbor," Scholarpedia, vol. 4, p. 1883, 2009.
[72] Y.-W. Chang and C.-J. Lin, "Feature ranking using linear SVM," in Causation and Prediction Challenge, 2008, pp. 53-64.
[73] I. Dagher, "Quadratic kernel-free non-linear support vector machine," Journal of Global Optimization, vol. 41, pp. 15-30, 2008.
[74] P. Virdi, Y. Narayan, P. Kumari, and L. Mathew, "Discrete wavelet packet based elbow movement classification using fine Gaussian SVM," in 2016 IEEE 1st International Conference on Power Electronics, Intelligent Control and Energy Systems (ICPEICES), 2016, pp. 1-5.
[75] Z. Liu, M. J. Zuo, X. Zhao, and H. Xu, "An Analytical Approach to Fast Parameter Selection of Gaussian RBF Kernel for Support Vector Machine," J. Inf. Sci. Eng., vol. 31, pp. 691-710, 2015.

Dr. Tanzila Saba earned her PhD in document information security and management from the Faculty of Computing, Universiti Teknologi Malaysia (UTM), Malaysia, in 2012, where she won the Faculty of Computing best student award for 2012. She is currently serving as a Research Professor in the College of Computer and Information Sciences, Prince Sultan University, Riyadh, KSA. Her primary research focus in recent years has been medical imaging, MRI analysis, and soft computing. She has more than two hundred publications that have attracted over 5000 citations. Owing to her research achievements, she was included in Marquis Who's Who (S & T) 2012. She currently serves as an editor and reviewer for reputed journals and on the technical program committees of international conferences. She has taught a variety of subjects at both graduate and postgraduate levels and chairs the Information Systems Department for females. On the accreditation side, she is skilled in ABET and NCAAA quality assurance. She is also a senior member of IEEE.

Dr. Rehman is a senior researcher in the Artificial Intelligence & Data Analytics Lab, Prince Sultan University, Riyadh, Saudi Arabia. He received his PhD and Postdoc from the Faculty of Computing, Universiti Teknologi Malaysia, with a specialization in Forensic Documents Analysis and Security, with honors, in 2010 and 2011 respectively, and received the rector award for the university's best student of 2010. He is currently PI in several funded projects and has also completed projects funded by MOHE Malaysia and Saudi Arabia. His keen interests are in data mining, health informatics, and pattern recognition. He is the author of more than 200 ISI journal and conference papers and is a senior member of IEEE with an h-index of 42.

Dr. Nor Shahida Mohd Jamail is currently an Assistant Professor at Prince Sultan University, Riyadh, Saudi Arabia. She obtained her PhD in Software Engineering from Universiti Putra Malaysia. She specializes in software engineering, software process modelling, software testing, and cloud computing services. She is involved in the Machine Learning Research Group at Prince Sultan University and has served as co-chair, track chair, and committee member for several international conferences published by IEEE and indexed in Scopus. Her current research interests are cloud computing, quality in software engineering and software modelling, machine learning, and the commercialization of research products. Dr. Nor Shahida is very committed to her academic career and can be contacted at njamail@psu.edu.sa.

SOUAD LARABI MARIE-SAINTE graduated in operational research from the University of Sciences and Technology Houari Boumediene, Algeria. She received dual M.Sc. degrees, in mathematics, decision, and organization from Paris Dauphine University, France, and in computing and applications from Sorbonne Paris 1 University, France, and a Ph.D. degree in bio-inspired algorithms (Artificial Intelligence) from the Computer Science Department, Toulouse 1 University Capitole, France, in 2011. She was an Assistant Professor with the College of Computer and Information Sciences, King Saud University, and an Associate Researcher with the Department of Computer Science, Toulouse 1 Capitole University, France. She is currently an Assistant Professor with the Department of Computer Science, Associate Director of the Software Engineering and Cyber Security Master Programs, and Vice-Chair of the ACM Professional Chapter at Prince Sultan University. She has written several articles, attended various specialized international conferences, and participated in several research projects. Her research interests include combinatorial optimization, metaheuristics, artificial intelligence (especially bio-inspired algorithms applied to multidisciplinary domains), algorithm analysis and design, bioinformatics, data mining, machine and deep learning, biometric identification, natural language processing, and software engineering.

Mudassar Raza, PhD, is currently an Assistant Professor at COMSATS University Islamabad (CUI), Wah Campus, Pakistan, with an overall job experience of around 17 years at both graduate and undergraduate levels. He did his Ph.D. at the University of Science and Technology of China (USTC), Hefei, Anhui, China. He remained a very active