6.1 Introduction
Android dominates the smartphone industry, holding 87.7% of the market share at
the end of the second quarter of 2019 [1]. Google Play, the official Android app
store, had 3.7 million apps available at the end of December 2018 [2], and the
total app download count reached 205.4 billion by the end of 2018 [3]. The
popularity of Android apps attracts cybercriminals, who view the platform as a
lucrative target. According to the McAfee 2018 mobile threat report [4], 16 million
malware samples were found in the Google Play store, double the number of
previous years. More than 4000 mobile threats are present in Android [5].
Google Play Protect [6] is reported to secure more than 2 billion devices on a
daily basis. However, the report published by McAfee in 2018 [4] contradicts this:
Google Play Protect failed to protect against the most common malware threats.
Research in this direction found that 4,964,460 devices were infected by
pre-installed apps in the first quarter of 2018 [6].
As with desktop operating systems, anti-malware software is available for mobile
phones too. The effectiveness of anti-malware software depends on a
signature-based approach, which relies on a unique sequence of bytes that is
consistently present inside malware-infected software. The critical problem with
this process is that it fails to find new malware: the malware analyst must wait
for a new malware sample to appear in the market, then generate a signature file
and provide a solution to users. This approach is effective only when a significant
number of new malware signatures are present in its database.
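The signature-based idea described above can be illustrated with a minimal sketch. The signature names and byte values below are hypothetical placeholders, not real malware signatures, and real scanners use far more elaborate matching.

```python
# Hypothetical signature database: name -> byte sequence (illustrative values only).
SIGNATURES = {
    "demo_trojan": bytes.fromhex("deadbeef"),
    "demo_worm": b"\x13\x37\xc0\xde",
}


def scan(data, signatures=SIGNATURES):
    """Return the sorted names of all signatures whose bytes occur in `data`."""
    return sorted(name for name, sig in signatures.items() if sig in data)


# A sample carrying a known byte sequence is flagged; anything else passes,
# which is exactly why previously unseen malware goes undetected.
infected = b"header" + bytes.fromhex("deadbeef") + b"payload"
print(scan(infected))        # names of matching signatures
print(scan(b"clean file"))   # empty list: no known signature present
```

The sketch makes the limitation concrete: a sample is only detected if its signature has already been added to the database.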
A machine learning approach is generally followed to overcome the problem of the
signature-based approach and to detect unknown malware on Android [7]. Machine
learning based approaches train classification algorithms on data sets composed
of several characteristics, or features, extracted from both malicious and benign
apps. In a machine learning approach, feature selection is the main step that
determines the accuracy of the classifier.
In previous studies, researchers applied various supervised machine learning
techniques [8–11] to predict whether an app is infected with malware or not.
Supervised machine learning techniques need a significant amount of labeled data
for each family. It is very hard to collect a considerable amount of labeled data
from the real world, such as known malicious Android apps. Gathering labeled data
for both classes is a time-consuming process, and in such a process some malware
apps can escape detection.
From the research done so far [8–11], it has been observed that supervised
learning assumes the testing data follow the same distribution as the training
data. The performance of supervised machine learning algorithms degrades when
they are tested on samples that are not distributed identically to the training
data. Therefore, to overcome the problems faced by supervised techniques, a
semi-supervised machine learning technique is helpful, in which only a fixed
amount of labeled data is available for both classes. Semi-supervised machine
learning techniques are trained with the help of a supervised classifier on the
labeled data and then predict the label of each unlabeled
6 Feature-Based Semi-supervised Learning … 95
Fig. 6.1 Flow chart of the proposed Android malware detection approach
instance. These techniques help us to improve accuracy even when most of the data
set is unlabeled.
In this work, a malware detection approach is developed based on the principle of
semi-supervised machine learning. We apply LLGC (Learning with Local and Global
Consistency) [12, 13] to our collected data set, which consists of the dynamic
behavior of Android apps. The novel contributions of this study are as follows:
– To develop a malware detection model by using distinct feature sets that are
suitable for all categories of Android apps.
– To demonstrate that a semi-supervised machine learning approach is as good as a
supervised machine learning approach.
The steps followed in building an effective Android malware detection model are
shown in Fig. 6.1. First, we collect Android application packages (.apk) from the
different sources mentioned in Sect. 6.3. Next, the class (i.e., benign or
malware) of each .apk file is identified. Features are then extracted from the
collected .apk files using tools available in the literature, and these features
form our data set. Next, the right set of features is chosen by applying feature
sub-set selection methods. The selected features are used as input to build a
model with a semi-supervised machine learning approach. Finally, the developed
model is validated with the proposed detection framework to check whether it can
detect malware in real-world apps. The rest of the chapter is arranged as follows.
Section 6.2 discusses previously developed Android malware detection models.
Section 6.3 describes the data set. Section 6.4 presents the feature sub-set
selection methods. Section 6.5 explains the semi-supervised machine learning
classifier. Section 6.6 presents the proposed detection framework. Section 6.7
presents the performance evaluation parameters. Sections 6.8 and 6.9 present the
experimental setup, the experimental results and the conclusion of this chapter
along with its future scope.
The combination of artificial intelligence and statistics gives machine learning
its foundation of probabilistic models and data-driven parameter estimation [13].
Machine learning techniques can be further categorized into three distinct
learning principles, viz. supervised, unsupervised and semi-supervised [7].
Table 6.1 summarizes the
96 A. Mahindru and A. L. Sangal
existing frameworks and approaches developed in the past. The first column of
Table 6.1 gives the name of the framework or approach developed in the literature.
The second column describes the principle followed by the researchers to detect
malware in Android apps. The third column presents the findings of the previously
developed approaches.
Two techniques are used to identify malware on Android: static and dynamic. The
static technique involves disassembling and examining the code to validate its
function, which makes it possible to evaluate an app without running it [7]. The
static detection method is further divided into three parts, i.e., permission,
signature and Dalvik. The dynamic technique is based on the principle that Android
malware is detected while the app is executing. The dynamic technique is further
divided into three methods, based on taint analysis, anomaly detection and
emulation [7].
In supervised learning, the model is trained on labeled classes and, after
learning from the features, is tested on the remaining data set. Traditional
methods include Support Vector Machine [14, 15], Decision Tree [11, 16], Nearest
Neighbor [17], Naive Bayes [11], Random Forest [11], etc. In unsupervised
learning, the model is trained on unlabeled data and, after learning from the
features, is tested on the remaining data set. Traditional methods include
K-means [18], Model-Based Clustering [19], Hierarchical Clustering [20], etc.
Semi-supervised learning lies between unsupervised and supervised learning: the
model is trained on a small amount of labeled data together with unlabeled data,
and labels are then predicted for the unlabeled instances. Typical methods include
the hidden Markov model, low-density separation and so on.
Faruki et al. [21] proposed AndroSimilar, which identifies unknown malware on the
basis of signature length; the signature is compared with the signatures that
already exist in its database to detect malware apps. Felt et al. [22] developed
an approach to examine whether Android apps were over-privileged or not. The
authors applied their approach to a collection of distinct Android apps and
concluded that 33% of the apps were over-privileged. Tang et al. [23] developed a
model that works on the principle of security distance and is built on the key
idea that if an app requests more than one feature, i.e., permission, at a time,
it raises a security issue for Android-based devices. DroidAnalytics [24] uses the
signature of the app together with API calls to identify malware apps.
Wognsen et al. [25] developed a formalized version of Dalvik bytecode, which
covers the reflective features of Java. The developed technique is then used to
detect malware by employing data flow analysis. PUMA [9] obtained an accuracy of
80% by applying machine learning techniques to extracted permissions to detect
malware in Android apps. KIRIN [26] is a light-weight framework based on
certification that is used at install time. An app is considered malicious if it
fails to clear all of its security checks. Sato et al. [27] suggested a
light-weight approach for malware detection that investigates the
"AndroidManifest.xml" file. It matches the extracted features against the manifest
file, achieves an accuracy of 90%, and also calculates a score to evaluate whether
the app is malware or not.
DroidMat [28] is based on the extraction of data from the Android “Manifest.xml”
file. To strengthen the performance of the machine learning classifier, the
authors apply the K-means algorithm in addition to the K-nearest neighbors
algorithm to the collected data set. Zhou et al. [29] developed DroidMOSS, an
approach that evaluates apps on the principle of similarity. A fuzzy hashing
technique is used to discover modifications made to an app by re-packaging. This
framework is restricted to a limited number of malware samples. Huang et al. [30]
were able to identify 81% of the malicious apps by applying machine learning
algorithms that work on the basis of labeling.
Aafer et al. [31] presented DroidAPIMiner, which combines permissions, based on
the principle of behavioral footprints, with a filtering mechanism to discover the
existence of malware in Android apps. The authors achieved an accuracy of 99% by
using API-level features to distinguish malware from benign apps. ComDroid [32]
detects app communication vulnerabilities.
DREBIN is a light-weight approach proposed by Arp et al. [33] that discovers
malicious apps by embedding their features in a joint vector space. DREBIN
achieves a performance of 94% with some false alarms by using machine learning
techniques to discover malware apps. CrowDroid, proposed by Burguera et al. [34],
uses the behavior of Android apps to discover malware by applying unsupervised
machine learning techniques; the outcomes are stored on a server.
Zhao et al. [35] proposed AntiMal-Droid, which relies on the behavior of apps to
detect whether an app is malware or benign. AntiMal-Droid works on the principle
of signature comparison to identify whether an app belongs to the benign or the
malware category. Enck et al. [36] developed TaintDroid, which is based on
real-time analysis. It tracks various sources of sensitive information and
recognizes data leakage. Shabtai et al. [10] proposed Andromaly, which is based on
machine learning techniques to monitor Android devices and to identify whether an
app belongs to the benign or the malware category.
Yan et al. [37] developed DroidScope, which operates on an Android device, assists
custom analysis and identifies privilege-based attacks. Feng et al. [38] proposed
Apposcopy, which combines static analysis, taint analysis and an inter-component
call graph to identify malware apps. Narayanan et al. [39] developed a
context-aware, adaptive and scalable Android malware detector that is able to
identify all kinds of malicious apps and adapts to evolving malware.
Earlier, researchers presented feature selection techniques for detecting malware
in real-world apps. Table 6.2 highlights the research conducted by different
authors to choose the best features, which are then used to develop a model for
malware detection in real-world apps.
RQ1: Is it feasible to detect malware in Android apps by using a semi-supervised
machine learning technique?
With this question, we examine the performance of LLGC in detecting malware in
Android apps. In this study, the LLGC semi-supervised machine learning classifier
is used to build a model that takes a set of features as input and detects whether
the app is benign or malware.
RQ2: Do the feature sub-set selection approaches have any impact on the
performance of the semi-supervised machine learning classifier?
It has been noticed that certain feature sub-set selection approaches work very
well with certain classification techniques. Therefore, in this work, four distinct
sub-set selection approaches are evaluated with LLGC as the classifier.
RQ3: Which feature sub-set selection approach works best for the task of detecting
malware in Android apps?
This question helps us to choose the best features by applying feature sub-set
selection methods to our collected data set. The features selected by a feature
sub-set selection method are then used to develop a model that detects whether the
app is benign or malware.
RQ4: Does a selected set of features perform better than the full set of features
for the task of detecting whether the app is benign or malware?
In this research question, our objective is to select the best set of features by
applying a feature sub-set selection method, which helps us to differentiate
between benign and malware apps.
1 https://play.google.com/store?hl=en.
2 http://android.pandaapp.com/.
3 http://www.gfan.com/.
4 http://www.hiapk.com/.
5 http://andrdoid.d.cn/.
6 http://www.appchina.com/.
7 http://www.mumayi.com/.
8 http://slideme.org/.
9 https://www.virustotal.com/.
10 https://www.microsoft.com/en-in/windows/comprehensive-security.
Table 6.3 Categories of .apk files and the families to which they belong
ID Category N T Ba W Bo S
D1 Arcade and action (AA) 8291 440 100 204 130 600
D2 Books and reference (BR) 8235 200 250 56 150 150
D3 Brain and puzzle (BP) 4928 820 54 28 50 50
D4 Business (BU) 8308 152 150 150 22 22
D5 Cards and casino (CC) 8886 76 65 81 100 44
D6 Casual (CA) 8010 321 69 46 150 140
D7 Comics (CO) 8667 65 95 35 3 0
D8 Communication (COM) 18,414 250 50 500 3 3
D9 Education (ED) 8744 560 68 50 50 68
D10 Entertainment (EN) 19,222 500 500 500 100 42
D11 Finance (FI) 7999 50 200 99 65 92
D12 Health and fitness (HF) 8551 98 65 45 140 140
D13 Libraries and demo (LD) 8655 70 100 100 6 500
D14 Lifestyle (LS) 7650 155 200 100 193 192
D15 Media and video (MV) 8019 100 123 162 450 71
D16 Medical (ME) 6000 12 13 12 24 25
D17 Music and audio (MA) 8621 65 100 65 165 165
D18 News and magazines (NM) 8164 100 100 100 100 32
D19 Personalization (PE) 9334 500 42 500 200 22
D20 Photography (PH) 9133 100 120 50 96 500
D21 Productivity (PR) 9850 100 516 250 250 62
D22 Racing (RA) 9766 50 100 210 100 180
D23 Shopping (SH) 9673 100 100 120 150 50
D24 Social (SO) 6159 100 50 210 150 150
D25 Sports (SP) 9669 100 240 100 450 112
D26 Sports games (SG) 9889 100 145 145 650 198
D27 Tools (TO) 8346 120 500 550 475 563
D28 Transportation (TR) 8796 2 500 100 100 20
D29 Travel and local (TL) 9180 500 220 150 48 100
D30 Weather (WR) 9841 120 23 700 50 25
“N” stands for Normal, “T” for Trojan, “Ba” for Backdoor, “W” for Worm, “Bo” for Botnet, and “S” for Spyware
Table 6.4 Formulation of sets (permissions, API calls, number of user downloads
of the app, and rating of the app)
Set No.  Description
S1       SYNCHRONIZATION_DATA
S2       CONTACT_INFORMATION
S3       PHONE_STATE and PHONE_CONNECTION
S4       AUDIO and VIDEO
S5       SYSTEM_SETTINGS
S6       BROWSER_INFORMATION
S7       BUNDLE
S8       LOG_FILE
S9       LOCATION_INFORMATION
S10      WIDGET
S11      CALENDAR_INFORMATION
S12      ACCOUNT_SETTINGS
S13      DATABASE_INFORMATION
S14      IMAGE
S15      UNIQUE_IDENTIFIER
S16      FILE_INFORMATION
S17      SMS_MMS
S18      READ
S19      ACCESS_ACTION
S20      READ_AND_WRITE
S21      YOUR_ACCOUNTS
S22      STORAGE_FILE
S23      SERVICES_THAT_COST_YOU_MONEY
S24      PHONE_CALLS
S25      SYSTEM_TOOLS
S26      NETWORK_INFORMATION and BLUETOOTH_INFORMATION
S27      HARDWARE_CONTROLS
S28      DEFAULT GROUP
S29      API CALLS
S30      RATING and USER DOWNLOADS THE APP
samples are projected onto the sub-set of attributes [55]. The consistency rate is
measured using the inconsistency rate, where two data points are considered
inconsistent if they have the same feature values but different class labels
(i.e., benign or malware). In this work, the target variable, i.e., the app class,
takes two distinct values (0 for benign apps and 1 for malware apps). A group of
features (GF) has Z samples, and there are z distinct instances such that
Z = X_1 + X_2 + · · · + X_z. Instance X_i appears in A samples in total, of which
A_0 samples are labeled 0 and A_1 samples are labeled 1, so A = A_0 + A_1. If A_1
is less than A_0, then the inconsistency count for instance X_i is
INC_i = A − A_0. The inconsistency rate (INCR) of the feature set is computed using
the following equation:
\[
INCR = \frac{\sum_{i=1}^{z} INC_i}{Z} \tag{6.1}
\]
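As a worked illustration of Eq. (6.1), the following minimal sketch computes the inconsistency rate of a candidate feature sub-set. The function name, the toy rows and the column indices are illustrative assumptions; the inconsistency count of each pattern is taken as its total count minus the count of its majority label, which matches INC_i = A − A_0 when A_1 < A_0.

```python
from collections import Counter, defaultdict


def inconsistency_rate(rows, labels, feature_idx):
    """Inconsistency rate of the features in `feature_idx`.

    rows        -- list of feature tuples, one per sample
    labels      -- list of class labels (0 = benign, 1 = malware)
    feature_idx -- indices of the features kept in the candidate sub-set
    """
    # Group samples by their values on the selected features (one group per X_i).
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        pattern = tuple(row[i] for i in feature_idx)
        groups[pattern][label] += 1
    # INC_i = A - max(A_0, A_1) for each pattern; INCR = sum(INC_i) / Z.
    inc_total = sum(sum(c.values()) - max(c.values()) for c in groups.values())
    return inc_total / len(rows)


rows = [(1, 0), (1, 0), (1, 0), (0, 1), (0, 1)]
labels = [0, 0, 1, 1, 1]
print(inconsistency_rate(rows, labels, [0, 1]))  # one conflicting sample out of five
```

A sub-set with a lower inconsistency rate separates benign and malware samples more cleanly, so a search procedure can prefer the smallest sub-set whose rate stays close to that of the full feature set.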
Filtered sub-set evaluation is based on the principle of applying an arbitrary
sub-set evaluator to the data set obtained after an arbitrary filtering approach
has been applied [56]. The filtering technique does not depend on any learning
induction algorithm. The filtered sub-set evaluation approach is fast and scalable.
The rough set analysis (RSA) method is based on the principle of approximating a
conventional crisp set (see footnote 11) in terms of a pair of sets, which give
the upper and the lower approximation of the original data set [57]. This formal
approximation describes the upper and lower limits of the original data set. The
rough set analysis approach makes the information model apparent by decreasing the
“degree of precision” [58]. We use RSA to search for a reduced set of features.
RSA uses three distinct kinds of notions: reduced attributes, approximations, and
the information system.
11 https://en.wikipedia.org/wiki/Rough_set.
where card() denotes the number of elements in the lower or upper approximation
of the set Z. All feasible sub-sets are chosen whose accuracy is equal to the
accuracy of the universal set.
iii. Information system: It is defined as Z = (C, B), where C is a universe
consisting of a non-empty set of finite objects and B is the finite set of
attributes. There exists a mapping F_b : C → V_b for every b ∈ B, where V_b is the
set of values of attribute b. For every attribute set Z ⊂ B, there exists an
associated equivalence relation called the B-indiscernibility relation (Ind(Z)).
Ind(Z) is defined in the following way:
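Under the standard rough-set definition (assumed here), two objects are indiscernible with respect to an attribute set when they take identical values on every attribute in that set. The following minimal sketch builds the resulting equivalence classes and the lower and upper approximations of a target set. The app identifiers, attribute names and values are hypothetical toy data, not taken from the chapter's data set.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Partition objects into blocks that agree on every attribute in `attrs`."""
    blocks = defaultdict(set)
    for obj, values in table.items():
        blocks[tuple(values[a] for a in attrs)].add(obj)
    return list(blocks.values())


def approximations(table, attrs, target):
    """Return (lower, upper) approximations of `target` with respect to `attrs`."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attrs):
        if block <= target:   # block lies entirely inside the target set
            lower |= block
        if block & target:    # block overlaps the target set
            upper |= block
    return lower, upper


# Hypothetical toy information system: four apps, two binary attributes.
table = {
    "a1": {"sms": 1, "net": 1},
    "a2": {"sms": 1, "net": 1},
    "a3": {"sms": 0, "net": 1},
    "a4": {"sms": 0, "net": 0},
}
malware = {"a1", "a3"}
lower, upper = approximations(table, ["sms", "net"], malware)
print(lower, upper, len(lower) / len(upper))  # card(lower) / card(upper)
```

The ratio of the two cardinalities is the usual rough-set accuracy measure; an attribute sub-set that preserves this ratio for the full attribute set is a candidate reduct.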
the expected value have the same label, and (2) points lying on the same structure
are expected to have the same label.
The LLGC algorithm [12, 59] is described as follows:
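A minimal sketch of LLGC in its closed form, following the formulation of Zhou et al. [12]: build an RBF affinity matrix W with zero diagonal, normalize it symmetrically as S = D^(-1/2) W D^(-1/2), and propagate the initial labels Y through F = (I − αS)^(-1) Y. The values of `alpha` and `sigma`, the function name and the toy points are assumptions chosen for illustration; this is not the chapter's MATLAB implementation.

```python
import numpy as np


def llgc(X, y, alpha=0.99, sigma=1.0):
    """Label every sample with LLGC; y uses -1 for unlabeled, else a class id."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n = X.shape[0]
    classes = np.unique(y[y >= 0])

    # Affinity matrix W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with W_ii = 0.
    sq_dist = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    W = np.exp(-sq_dist / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Symmetric normalization S = D^{-1/2} W D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    S = W * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Initial label matrix Y: one column per class, rows of zeros if unlabeled.
    Y = np.zeros((n, classes.size))
    for j, c in enumerate(classes):
        Y[y == c, j] = 1.0

    # Closed-form limit of F(t+1) = alpha * S * F(t) + (1 - alpha) * Y,
    # up to a positive scale factor that does not affect the argmax.
    F = np.linalg.solve(np.eye(n) - alpha * S, Y)
    return classes[F.argmax(axis=1)]


# Two well-separated groups with one labeled sample each (0 = benign, 1 = malware).
X = [[0, 0], [0.1, 0], [0, 0.2], [5, 5], [5.1, 5], [5, 5.2]]
y = [0, -1, -1, 1, -1, -1]
print(llgc(X, y))
```

Solving the linear system directly is convenient for small data sets; for larger ones the iterative update in the comment is the usual alternative, since it avoids forming and factorizing an n × n system.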
This section gives the basic definitions of the performance parameters used for
malware detection. Each of these performance parameters is calculated from the
confusion matrix, which contains the actual and the detected classification
information produced by the detection approach. Table 6.5 gives the confusion
matrix for the malware detection model. In our work, two performance parameters,
F-measure and Accuracy, are used to evaluate the performance of the malware
detection methods. F-measure and accuracy can be measured by using Eqs. (6.6) and
(6.7).
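\[
\text{F-measure}
= \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}
= \frac{2\,N_{\text{Malware}\to\text{Malware}}}
       {2\,N_{\text{Malware}\to\text{Malware}} + N_{\text{Benign}\to\text{Malware}} + N_{\text{Malware}\to\text{Benign}}}
\tag{6.7}
\]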
Table 6.5 Confusion matrix to classify an Android app as benign or malware (.apk)
                   Detected: Benign     Detected: Malware
Actual: Benign     Benign → Benign      Benign → Malware
Actual: Malware    Malware → Benign     Malware → Malware
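The two performance parameters can be computed directly from the four cells of the confusion matrix. In this minimal sketch the F-measure follows Eq. (6.7); accuracy is taken as the standard ratio of correctly classified apps to all apps, which is an assumption about the form of Eq. (6.6). The function name and the counts are illustrative.

```python
def f_measure_and_accuracy(n_bb, n_bm, n_mb, n_mm):
    """Compute (F-measure, accuracy) from confusion-matrix counts.

    n_bb -- benign apps detected as benign   (Benign -> Benign)
    n_bm -- benign apps detected as malware  (Benign -> Malware)
    n_mb -- malware apps detected as benign  (Malware -> Benign)
    n_mm -- malware apps detected as malware (Malware -> Malware)
    """
    f_measure = 2 * n_mm / (2 * n_mm + n_bm + n_mb)
    accuracy = (n_bb + n_mm) / (n_bb + n_bm + n_mb + n_mm)
    return f_measure, accuracy


# Illustrative counts for 100 apps.
f1, acc = f_measure_and_accuracy(n_bb=50, n_bm=10, n_mb=5, n_mm=35)
print(round(f1, 4), acc)
```

Because the F-measure ignores the Benign → Benign cell, it is less flattering than accuracy on categories where benign apps heavily outnumber malware, which is one reason to report both.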
This part of the chapter presents the experimental set-up used to assess the
efficacy of the malware detection model with the proposed detection framework.
LLGC is used to build a model that detects whether the app is benign or malware.
The approaches are applied to thirty different categories of Android apps, as
shown in Table 6.3. Each of these categories has a different percentage of benign
and malware apps. Figure 6.2 shows the proposed framework for malware detection.
The following steps are carried out when selecting a set of features to build the
malware detection model, which helps us to detect whether the app is benign or
malware. Feature sub-set selection approaches are applied to thirty different
categories of Android apps. Consequently, a total of 150 [(4 feature sub-set
selection approaches + 1 using all features) × 30 distinct Android app data sets
× 1 detection approach] different detection models have been built in this work.
1. In this work, four feature sub-set selection approaches are applied to thirty
different categories of Android apps to choose the appropriate set of features for
malware detection.
2. The sub-sets of features obtained from the above step are used as input to the
semi-supervised machine learning algorithm while building a model. To assess the
effectiveness of the Android malware detection model, we apply a 20-fold
cross-validation technique. The effectiveness of all the malware detection models
built is compared by using two distinct performance parameters, namely F-measure
and Accuracy.
3. The effective model built in the above two stages is validated with the
proposed malware detection framework.
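The experimental grid and the 20-fold split described in the steps above can be sketched as follows. The labels FS1 to FS3 and "ALL" are assumed placeholders (only FS4 = RSA and the data set IDs D1 to D30 appear in the text), and the round-robin fold assignment is one simple choice among several.

```python
from itertools import product

FEATURE_SETS = ["FS1", "FS2", "FS3", "FS4", "ALL"]  # "ALL" = all features (assumed label)
DATA_SETS = [f"D{i}" for i in range(1, 31)]          # D1..D30 as in Table 6.3
CLASSIFIERS = ["LLGC"]


def model_grid():
    """Enumerate every (feature set, data set, classifier) combination."""
    return list(product(FEATURE_SETS, DATA_SETS, CLASSIFIERS))


def k_fold_indices(n_samples, k=20):
    """Yield (train_indices, test_indices) for each of the k folds."""
    # Round-robin assignment keeps fold sizes within one sample of each other.
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


print(len(model_grid()))  # (4 + 1) x 30 x 1 combinations
for train, test in k_fold_indices(100):
    pass  # fit on `train`, evaluate F-measure and accuracy on `test`
```

In practice the sample order would be shuffled (and ideally stratified by class) before the folds are formed, so that each fold contains both benign and malware apps.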
This section of the chapter describes the relationship between distinct sets of
features and malware detection at the Android level. F-measure and Accuracy are
used as performance assessment parameters to compare the performance of the
malware detection models built with LLGC as the classifier. To present the
outcomes, we use the abbreviations given in Table 6.6 in place of the actual
names.
In this work, four distinct kinds of feature sub-set selection approaches are
applied to thirty data sets of Android apps, one after another. Feature sub-set
selection approaches work on the hypothesis that selecting the best features from
the available features yields models with better accuracy and fewer
misclassification errors. These selected sub-sets of features are then used as
input for building a model that detects whether the app is benign or malware. The
features selected by the distinct feature sub-set selection approaches are shown
in Fig. 6.3.
Tables 6.7 and 6.8 show the performance values obtained for the distinct data sets
with LLGC as the classifier. On the basis of Tables 6.7 and 6.8, it can be inferred
that:
– In the case of LLGC, the malware detection model built from the set of features
selected by FS4, i.e., RSA, achieved better outcomes than those built with the
other feature sub-set selection approaches.
In this chapter, one classifier and two evaluation parameters are used to detect
whether the app belongs to the benign or the malware class. Figure 6.4 shows two
box-plot diagrams, one for each case, i.e., F-measure and Accuracy. Each diagram
contains five box-plots. The model with a high median value and few outliers is
considered the superior model. On the basis of these box-plot diagrams, we can
observe that:
– Among all feature sub-set selection approaches, FS4 achieved a high median value
with fewer outliers. Based on the box-plots shown in Fig. 6.4, FS4 produced the
better outcome, i.e., feature sub-set selection using RSA yields the best set of
features for distinguishing malware from benign apps and gives the best results
compared to the others.
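The two quantities read off a box-plot in this comparison, the median and the outliers, can be computed as in the following minimal sketch. The 1.5 × IQR whisker rule is the common plotting convention and is assumed here; the accuracy values are made-up illustrations, not results from Tables 6.7 or 6.8.

```python
from statistics import median, quantiles


def boxplot_summary(values):
    """Return (median, outliers) using the conventional 1.5 * IQR fences."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [v for v in values if v < low or v > high]
    return median(values), outliers


# Illustrative per-data-set accuracies for one feature sub-set selection approach.
accuracies = [0.90, 0.91, 0.92, 0.93, 0.94, 0.95, 0.96, 0.50]
med, outliers = boxplot_summary(accuracies)
print(med, outliers)
```

Computing the summary for each of the five feature configurations and ranking them by median (breaking ties by the number of outliers) reproduces numerically the visual comparison described above.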
A pair-wise t-test is used to identify whether some feature sub-set selection
approaches perform better than others or whether all approaches work equally well.
Feature Sub-set Selection Approaches: In this work, we consider four distinct
feature sub-set selection approaches as input to build a model with thirty
distinct categories of Android apps, and we consider two outcome parameters, i.e.,
F-measure and Accuracy. Each feature sub-set selection approach therefore yields
two sets of values, each with 30 points (1 classifier × 30 data sets). A t-test
between each pair of feature sub-set selection approaches is carried out, and the
resulting P-value is used to measure statistical significance. Figure 6.5 shows
the outcome of the t-test study. The P-values are represented by two distinct
symbols: a green circle for P-value > 0.05 (no significant difference) and a red
circle for P-value <= 0.05 (significant difference). On the basis of Fig. 6.5, it
is observed that a large number of cells are filled with a green circle, which
means that the applied feature selection approaches do not differ significantly
from one another. Even so, the set of features selected by FS4, i.e., RSA, gives
better outcomes than the other techniques.
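The pair-wise comparison can be sketched as a paired t statistic computed over the per-data-set scores of two approaches. The function name and the scores are illustrative assumptions; converting the statistic to a P-value requires the t distribution with n − 1 degrees of freedom (for example via `scipy.stats.ttest_rel`, if SciPy is available).

```python
import math
from statistics import mean, stdev


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for paired samples of equal length."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Illustrative accuracies of two feature sub-set selection approaches
# on the same four data sets (the chapter uses 30 data sets per approach).
fs4 = [0.90, 0.80, 0.85, 0.95]
fs1 = [0.80, 0.75, 0.80, 0.85]
t, dof = paired_t_statistic(fs4, fs1)
print(round(t, 3), dof)
```

The test is paired because both approaches are evaluated on the same data sets, so the per-data-set differences, rather than the two raw score lists, carry the information.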
Table 6.10 Comparison with previously used classifiers having full dataset
Name of the machine learning classifier Averaged accuracy (%)
SimpleLogistic [9] 84.08
BayesNet K2 [9] 82
BayesNet TAN [9] 68.51
RandomTree [9] 83.32
Our proposed model (LLGC+FS4) 97.8
This part of the chapter presents the overall findings of the empirical work done
so far. The empirical work was conducted on thirty different categories of Android
apps, with features selected by four distinct feature sub-set selection
techniques. The selected features are used to train LLGC as the classifier, and
the performance is measured by using two performance parameters, i.e., F-measure
and Accuracy.
On the basis of the empirical studies, this chapter is able to answer the
following research questions.
RQ1: For building a malware detection model, LLGC has been used to detect whether
the app is benign or malware. On the basis of Tables 6.7 and 6.8, it can be
inferred that the model developed using LLGC with the set of features selected by
FS4 as input gives better outcomes than the others.
RQ2: To answer RQ2, Fig. 6.4 was examined, and it is noted that the outcome
obtained with LLGC varies with the feature sub-set selection approach used. This
indicates that the performance of LLGC in building a model that detects whether
the app is benign or malware is influenced by the feature sub-set selection
approach.
RQ3: In this work, four distinct kinds of feature sub-set selection approaches are
used to select a smaller sub-set of features. On the basis of the t-test study, it
has been found that feature sub-set selection using FS4, i.e., the RSA approach,
produces the best outcomes compared to the others.
RQ4: To answer RQ4, Figs. 6.4 and 6.5 were analyzed. We have seen that the models
developed by using the four different feature sub-set selection methods are more
capable of detecting malware than the model that considers all features extracted
from Android apps.
6.9.6 Conclusion
This study focused on building a malware detection framework and on assessing the
effectiveness of the malware detection model built from selected sets of features.
In this chapter, thirty distinct sets of features are used to build a model with
LLGC. The experiments were conducted on thirty different categories of Android
apps. The experiments were carried out and the outcomes generated in the MATLAB
environment.
References
1. https://www.statista.com/statistics/266136/global-market-share-held-by-smartphone-
operating-systems/
2. https://www.statista.com/statistics/266210/number-of-available-applications-in-the-google-
play-store/
3. https://www.statista.com/statistics/271644/worldwide-free-and-paid-mobile-app-store-
downloads/
4. https://www.mcafee.com/in/resources/reports/rp-mobile-threat-report-2018.pdf
5. https://source.android.com/security/reports/Google Android Security 2017 Report Final.pdf
6. https://thehackernews.com/2018/03/android-botnet-malware.html
7. I.H. Witten, E. Frank, M.A. Hall, C.J. Pal, Data Mining: Practical Machine Learning Tools
and techniques (Morgan Kaufmann, 2016)
8. J. Sahs, L. Khan, A machine learning approach to android malware detection, in 2012 European
Intelligence and Security Informatics Conference (IEEE, 2012), pp. 141–147
9. B. Sanz, I. Santos, C. Laorden, X. Ugarte-Pedrero, P. Garcia Bringas, G. Álvarez, Puma:
permission usage to detect malware in android, in International Joint Conference CISIS’12-
ICEUTE 12-SOCO 12 Special Sessions (Springer, Berlin, Heidelberg, 2013), pp. 289–298
10. A. Shabtai, U. Kanonov, Y. Elovici, C. Glezer, Y. Weiss, Andromaly: a behavioral malware
detection framework for android devices. J. Intell. Inf. Syst. 38(1), 161–190 (2012)
11. A. Mahindru, P. Singh, Dynamic permissions based android malware detection using machine
learning techniques, in Proceedings of the 10th Innovations in Software Engineering Confer-
ence (ACM, 2017), pp. 202–210
12. D. Zhou, O. Bousquet, T.N. Lal, J. Weston, B. Schölkopf, Learning with local and global
consistency, in Advances in Neural Information Processing Systems (2004) pp. 321–328
13. L. Chen, M. Zhang, C. Yang, R. Sahita, POSTER: semi-supervised classification for dynamic
android malware detection, in Proceedings of the 2017 ACM SIGSAC Conference on Computer
and Communications Security (ACM, 2017), pp. 2479–2481
14. C. Cortes, V. Vapnik, Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)
15. J. Li, L. Sun, Q. Yan, Z. Li, W. Srisa-an, H. Ye, Significant permission identification for
machine-learning-based android malware detection. IEEE Trans. Ind. Inf. 14(7), 3216–3225
(2018)
16. A. Zulkifli, I.R.A. Hamid, W. Md Shah, Z. Abdullah, Android malware detection based on
network traffic using decision tree algorithm, in International Conference on Soft Computing
and Data Mining (Springer, Cham, 2018), pp. 485–494
17. W. Wang, M. Zhao, J. Wang, Effective android malware detection with a hybrid model based
on deep autoencoder and convolutional neural network. J. Ambient Intell. Humanized Comput.
10(8), 3035–3043 (2019)
18. Z. Aung, W. Zaw, Permission-based android malware detection. Int. J. Sci. Technol. Res. 2(3),
228–234 (2013)
19. L. Cen, C.S. Gates, L. Si, N. Li, A probabilistic discriminative model for android malware
detection with decompiled source code. IEEE Trans. Dependable Secure Comput. 12(4), 400–
412 (2014)
20. L. Weichselbaum, M. Neugschwandtner, M. Lindorfer, Y. Fratantonio, V. van der Veen, C.
Platzer, Andrubis: android malware under the magnifying glass. Vienna University of Tech-
nology, Tech. Rep. TR-ISECLAB-0414-001 (2014)
21. P. Faruki, V. Ganmoor, V. Laxmi, M.S. Gaur, A. Bharmal, AndroSimilar: robust statistical
feature signature for Android malware detection, in Proceedings of the 6th International
Conference on Security of Information and Networks (ACM, 2013), pp. 152–159
22. A.P. Felt, E. Chin, S. Hanna, D. Song, D. Wagner, Android permissions demystified, in Pro-
ceedings of the 18th ACM Conference on Computer and Communications Security (ACM,
2011), pp. 627–638
23. W. Tang, G. Jin, J. He, X. Jiang, Extending android security enforcement with a security
distance model, in 2011 International Conference on Internet Technology and Applications
(IEEE, 2011), pp. 1–4
24. M. Zheng, M. Sun, J.C.S. Lui, Droid analytics: a signature based analytic system to collect,
extract, analyze and associate android malware, in 2013 12th IEEE International Conference
on Trust, Security and Privacy in Computing and Communications (IEEE, 2013), pp. 163–171
25. E.R. Wognsen, H.S. Karlsen, M.C. Olesen, R.R. Hansen, Formalisation and analysis of Dalvik
bytecode. Sci. Comput. Program. 92, 25–55 (2014)
26. W. Enck, M. Ongtang, P. McDaniel, On lightweight mobile phone application certification, in
Proceedings of the 16th ACM Conference on Computer and Communications Security (ACM,
2009), pp. 235–245
27. R. Sato, D. Chiba, S. Goto, Detecting android malware by analyzing manifest files. Proc.
Asia-Pac. Adv. Netw. 36, 23–31 (2013)
28. D.J. Wu, C.-H. Mao, T.-E. Wei, H.-M. Lee, K.-P. Wu, Droidmat: android malware detection
through manifest and api calls tracing, in 2012 Seventh Asia Joint Conference on Information
Security (IEEE, 2012), pp. 62–69
29. W. Zhou, Y. Zhou, X. Jiang, P. Ning, Detecting repackaged smartphone applications in third-
party android marketplaces, in Proceedings of the Second ACM Conference on Data and
Application Security and Privacy (ACM, 2012), pp. 317–326
30. C.-Y. Huang, Y.-T. Tsai, C.-H. Hsu, Performance evaluation on permission-based detection for
android malware, in Advances in Intelligent Systems and Applications, vol. 2 (Springer, Berlin,
Heidelberg, 2013), pp. 111–120
31. Y. Aafer, W. Du, H. Yin, Droidapiminer: mining api-level features for robust malware detection
in android, in International Conference on Security and Privacy in Communication Systems
(Springer, Cham, 2013), pp. 86–103
32. E. Chin, A.P. Felt, K. Greenwood, D. Wagner, Analyzing inter-application communication in
Android, in Proceedings of the 9th International Conference on Mobile Systems, Applications,
and Services (ACM, 2011), pp. 239–252
33. D. Arp, M. Spreitzenbarth, M. Hübner, H. Gascon, K. Rieck, Drebin: effective and explainable
detection of android malware in your pocket, in Network and Distributed System Security
Symposium (NDSS) (2014)
34. I. Burguera, U. Zurutuza, S. Nadjm-Tehrani, Crowdroid: behavior-based malware detection
system for android, in Proceedings of the 1st ACM Workshop on Security and Privacy in
Smartphones and Mobile Devices (ACM, 2011), pp. 15–26
35. M. Zhao, F. Ge, T. Zhang, Z. Yuan, AntiMalDroid: an efficient SVM-based malware detection
framework for android, in International Conference on Information Computing and Applications
(Springer, Berlin, Heidelberg, 2011), pp. 158–166
36. W. Enck, P. Gilbert, S. Han, V. Tendulkar, B.-G. Chun, L.P. Cox, J. Jung, P. McDaniel, A.N.
Sheth, TaintDroid: an information-flow tracking system for realtime privacy monitoring on
smartphones. ACM Trans. Comput. Syst. (TOCS) 32(2) (2014)
37. L.K. Yan, H. Yin, DroidScope: seamlessly reconstructing the OS and Dalvik semantic views
for dynamic android malware analysis, in 21st USENIX Security Symposium (USENIX
Security 12) (2012), pp. 569–584
38. Y. Feng, S. Anand, I. Dillig, A. Aiken, Apposcopy: semantics-based detection of android
malware through static analysis, in Proceedings of the 22nd ACM SIGSOFT International
Symposium on Foundations of Software Engineering (ACM, 2014), pp. 576–587
39. A. Narayanan, M. Chandramohan, L. Chen, Y. Liu, Context-aware, adaptive and scalable
android malware detection through online learning (extended version). arXiv preprint
arXiv:1706.00947 (2017)
40. BlackHat, Reverse Engineering with Androguard https://code.google.com/androguard (Online;
Accessed 29 Mar. 2013)
41. H. Kang, J. Jang, A. Mohaisen, H.K. Kim, Detecting and classifying android malware using
static analysis along with creator information. Int. J. Distrib. Sens. Netw. 11(6) (2015)
42. D. Octeau, S. Jha, M. Dering, P. McDaniel, A. Bartel, L. Li, J. Klein, Y.L. Traon, Combining
static analysis with probabilistic models to enable market-scale android inter-component
analysis, in ACM SIGPLAN Notices, vol. 51, no. 1 (ACM, 2016), pp. 469–484
43. B. Amos, H. Turner, J. White, Applying machine learning classifiers to dynamic android
malware detection at scale, in 2013 9th International Wireless Communications and Mobile
Computing Conference (IWCMC) (IEEE, 2013), pp. 1666–1671
44. W.-C. Wu, S.-H. Hung, DroidDolphin: a dynamic Android malware detection framework using
big data and machine learning, in Proceedings of the 2014 Conference on Research in Adaptive
and Convergent Systems (ACM, 2014), pp. 247–252
45. S. Sheen, R. Anitha, V. Natarajan, Android based malware detection using a multifeature
collaborative decision fusion approach. Neurocomputing 151, 905–912 (2015)
46. M. Damshenas, A. Dehghantanha, K.-K. Raymond Choo, R. Mahmud, M0droid: an android
behavioral-based malware detection model. J. Inf. Priv. Secur. 11(3), 141–157 (2015)
47. R. Vinayakumar, K.P. Soman, P. Poornachandran, S. Sachin Kumar, Detecting android malware
using long short-term memory (LSTM). J. Intell. Fuzzy Syst. 34(3), 1277–1288 (2018)
48. M.Z. Mas’ud, S. Sahib, M.F. Abdollah, S. Rahayu Selamat, R. Yusof, Analysis of features
selection and machine learning classifier in android malware detection, in 2014 International
Conference on Information Science & Applications (ICISA) (IEEE, 2014), pp. 1–5
49. A. Narayanan, M. Chandramohan, L. Chen, Y. Liu, A multi-view context-aware approach
to Android malware detection and malicious code localization. Empirical Softw. Eng. 23(3),
1222–1274 (2018)
50. K. Allix, T.F. Bissyandé, Q. Jérome, J. Klein, Y. Le Traon, Empirical assessment of machine
learning-based malware detectors for Android. Empirical Softw. Eng. 21(1), 183–211 (2016)
51. A. Azmoodeh, A. Dehghantanha, K.-K. Raymond Choo, Robust malware detection for internet
of (battlefield) things devices using deep eigenspace learning. IEEE Trans. Sustain. Comput.
4(1), 88–95 (2018)
52. A.F.A. Kadir, N. Stakhanova, A.A. Ghorbani, Android botnets: what urls are telling us, in
International Conference on Network and System Security (Springer, Cham, 2015), pp. 78–91
53. Y. Zhou, X. Jiang, Dissecting android malware: characterization and evolution, in 2012 IEEE
Symposium on Security and Privacy (IEEE, 2012), pp. 95–109
54. Botnet Research Team, SandDroid: An APK Analysis Sandbox. Xi’an Jiaotong University
(2014)
55. M. Dash, H. Liu, Consistency-based search in feature selection. Artif. Intell. 151(1–2), 155–176
(2003)
56. R. Kohavi, G.H. John, Wrappers for feature subset selection. Artif. Intell. 97(1–2), 273–324
(1997)
57. Z. Pawlak, Rough sets. Int. J. Comput. Inf. Sci. 11(5), 341–356 (1982)
58. C.-Y. Huang, Y.-T. Tsai, C.-H. Hsu, Performance evaluation on permission-based detection
for android malware, in Advances in Intelligent Systems and Applications, vol. 2 (Springer,
Berlin, Heidelberg, 2013), pp. 111–120
59. I. Santos, B. Sanz, C. Laorden, F. Brezo, P.G. Bringas, Opcode-sequence-based semi-supervised
unknown malware detection, in Computational Intelligence in Security for Information Systems
(Springer, Berlin, Heidelberg, 2011), pp. 50–57
60. S. Kokoska, C. Nevison, Critical values for Cochran’s test, in Statistical Tables and Formulae
(Springer, New York, 1989), p. 74