
Tuning Deep Learning Performance for Android Malware Detection
Jarrett Booz, Josh McGiff, William G. Hatcher, Wei Yu, James Nguyen, and Chao Lu
Department of Computer and Information Sciences
Towson University, Maryland, USA
Emails: {jbooz1, jmcgif1, whatch2}@students.towson.edu, {wyu, clu}@towson.edu, james.huy.nguyen@gmail.com

Abstract: In this paper, we address the issue of Android malware detection by implementing a deep learning environment and fine-tuning its parameters to determine optimal settings for the classification of Android malware from extracted permission data. By determining the optimal settings, we demonstrate the potential performance of a deep learning environment for Android malware detection. Specifically, we conduct an extensive study of various hyper-parameters to determine optimal configurations, and then carry out a performance evaluation on those configurations to compare and maximize detection accuracy in our target networks. Our results achieve approximately 95% detection accuracy, with an approximate F1 score of 93%.

Keywords: Deep Learning, Malware Detection, Performance Tuning

I. INTRODUCTION

Recent advances in machine learning have projected neural networks and deep learning systems into the public consciousness [8], [14]. This is attributable to the significant strides that deep learning systems continue to make in a large variety of areas, including image and video analysis and feature recognition, autonomous vehicles, natural language processing, the control of robotic systems, and others. Indeed, deep learning has emerged as an extremely powerful tool for processing complex data [9]. Deep learning models, and deep neural networks in particular, have the capacity to learn and represent extremely complex systems and reveal features or patterns at a level of abstraction that is not feasible for simpler algorithms. Encompassing a variety of learning tasks, from clustering and dimensionality reduction to classification and reinforcement learning, deep learning architectures apply systems of hierarchical layers to fit and generalize large, often multi-dimensional, feature sets more accurately than their shallow learning counterparts [5].

For this reason, deep learning has great appeal for applications in the realm of computer security as well. Indeed, a wide array of security applications exists that can benefit from deep learning approaches. For instance, malware and intrusion detection systems that employ anomaly detection can benefit immensely from the application of machine learning, as they require accurate detection of possibly yet unknown threats in highly complex environments [3], [16]–[18], [20]–[23]. In addition, mobile malware detection is an area of pressing concern due to the rapid growth of the smartphone market, which now encompasses approximately 209 million users in the U.S., and over 1.9 billion users worldwide [7].

In seeking to address the issues of smartphone security, and Android malware detection in particular, we make the following contributions in this paper:

• We address the utility of deep learning for analyzing permission data from Android applications in order to classify apps as malicious or benign. To accomplish this, we use various Python libraries to construct a deep learning infrastructure, which consists of the TensorFlow machine learning backend, the Keras framework, and Scikit-Learn utilities, to extract and vectorize the target features, and then use these dense vectors for deep learning analysis. With only a rudimentary initial implementation, our neural network yielded about 90% accuracy.
• Once this initial result was obtained and a framework built, we then targeted mechanisms to optimize the results. To identify the optimal network hyper-parameters, we utilized the grid search technique to test many combinations of tunable parameters for the deep learning environment. We also tested various neural network shapes, making some networks deeper or wider, to determine the best model shape and size for our application. The results showed that, by tuning six different parameters, we were able to increase the accuracy of the classification network to a maximum of approximately 95% correct classifications.

The remainder of this paper is organized as follows: In Section II, we provide background information pertaining to deep learning, mobile malware, and our testing environment. In Section III, we describe our approach, including the basic idea, system workflow, and the parameters and scenarios that were tested. In Section IV, we present our performance evaluation. Finally, in Section V, we provide concluding remarks.

II. PRELIMINARY

We now provide background details central to this investigation, including deep learning, mobile malware, and our specific testbed environment.

978-1-5386-5889-5/18/$31.00 ©2018 IEEE


SNPD 2018, June 27-29, 2018, Busan, Korea

A. Deep Learning

Deep learning, in general, is the term used to describe the function of neural networks that feature multiple hidden layers of many interconnected neurons to process their data. Though many differing definitions have been given to describe deep learning, two common elements are: (i) the presence of multiple, non-linear layers for data processing, and (ii) the notion of processing data at an increasingly higher and more abstract level throughout the network [4]. Neural networks, and more specifically deep neural networks, have emerged in recent years as highly effective tools for approaching machine learning problems [8], [14].

Deep learning has been demonstrated for all types of learning tasks, whether they be directed, undirected, or reinforcement-based. As a primary goal of learning systems, the ability to generalize from a given data set for prediction on new and unforeseen data can enable the discovery of new targets and threats in the wild, but the model must be tuned to prevent over-fitting. In deep neural networks, each layer is composed of many neurons, each with different weights and potentially different activation functions, and the network as a whole is trained with a loss function and an optimizer. As data is applied to the network, the loss function calculates the error of the prediction, and the optimizer is used to update weights to progressively reduce the loss function error and increase accuracy. Deep learning algorithms incorporate association, classification, regression, clustering, etc. [13], supporting various tasks, such as preprocessing, prediction, and detection.

B. Malware in Mobile

Android continues to dominate the global market for mobile operating systems, holding what is estimated to be as high as 87% of global market share [15], a figure which has remained relatively stable since 2013. Clearly, as smartphone saturation continues, the imminent threat of malicious intrusion cannot be downplayed. Current trends in Android malware include increasingly nuanced means of evading existing malware detection tools, both on-device and in robust detection systems, such as the Android marketplace's Google Bouncer or virtualized detection sandboxes. Such evasion mechanisms include the repackaging of legitimate applications with malicious code and the utilization of dynamically loaded code to load in malware [11]. In addition, Android threats have emerged that exploit privilege escalation, aggressively target users via drive-by downloads, implement native code execution and dynamic payloads, and utilize code obfuscation and the limitations of mobile device hardware to circumvent commercial detection systems [6]. As malware strains continue to evolve and develop invasive techniques, there is a clear and present need to keep pace in malware analysis and identification [21].

C. Testbed Overview

We now provide a brief overview of key components of the testbed environment used in our scenario. The testbed includes the TensorFlow machine learning backend, the Keras framework, and Scikit-Learn utilities. TensorFlow is a relatively new machine learning system, compared with Torch, Caffe, MXNet, CNTK, etc., and was developed by Google for experimenting with learning models and training on large datasets. TensorFlow can be run at a large scale, on single and multiple CPU or GPU systems and servers, or at a small scale, on mobile devices [2].

Keras is a Python library that serves as a wrapper for various deep learning infrastructures (MXNet, TensorFlow, DeepLearning4J, etc.), providing a higher-level Application Programming Interface (API) for building and developing with neural network frameworks. The goal of the Keras project was to develop an easier way for programmers to take advantage of neural network technologies [1]. In our work, Keras was used to speed up the development process, allowing us to build various models faster and more easily. In many cases, building a model in Keras can be done with very few lines of code, whereas in a native TensorFlow environment, much more code would be needed, especially for very large and complex learning systems.

In addition, Scikit-learn is a Python library that provides easy access to well-known machine learning algorithms and related tools. By building on a high-level language, Scikit-learn offers a straightforward approach to handling machine learning problems [12]. We used this library for its well-known implementations of several machine learning components.

III. OUR APPROACH

In this section, we introduce our approach. We first discuss concepts essential to our approach, then the system workflow, and finally the specific parameters and scenarios that were tested.

A. Basic Concepts

Feature Extraction: Feature extraction concerns the process of transforming raw data into a format compatible with machine learning algorithms. From our dataset of Android APK files, the permission information was extracted through the use of "aapt" (Android Asset Packaging Tool). The raw text output of this process contains information on the permissions for each application.

To carry out feature extraction, the Scikit-learn library was utilized to transform this text data into feature matrices for use in training our neural networks. A vectorizer parsed the input data according to a token pattern defined to recognize Android permission strings. As the vectorizer parsed through the input data, it built a vocabulary of unique tokens as they were encountered. Each input source, in this case each line of data, is represented by a row of the token matrix. Applying this regular-expression tokenization to our entire dataset yields a sparse m × n matrix, denoting that, across a data set of m Android applications, the vectorizer identified a vocabulary of n permissions and permission contexts. For our particular dataset, this yielded a 48643 × 22300 matrix.
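As a rough illustration of the vectorization step, the following standard-library sketch mimics what a token-pattern vectorizer does: it scans one line of text per application, grows a vocabulary as new permission tokens are encountered, and emits one row per application. The regular expression, the aapt-style sample lines, and the function name are assumptions made for illustration; the paper's pipeline uses a Scikit-learn vectorizer with its own token pattern and a sparse output, neither of which is reproduced here.

```python
import re

# Assumed token pattern: a dotted identifier containing ".permission."
# followed by an upper-case permission name. The pattern actually used
# in the paper is not given.
PERMISSION_RE = re.compile(r"[A-Za-z_][\w.]*\.permission\.[A-Z_]+")


def vectorize(lines):
    """Build a vocabulary and a binary m x n matrix from m text lines."""
    vocabulary = {}  # token -> column index, in order of first appearance
    rows = []        # one set of column indices per application
    for line in lines:
        cols = set()
        for token in PERMISSION_RE.findall(line):
            if token not in vocabulary:
                vocabulary[token] = len(vocabulary)
            cols.add(vocabulary[token])
        rows.append(cols)
    # Expand to a dense 0/1 matrix (a real pipeline would keep this sparse).
    n = len(vocabulary)
    matrix = [[1 if j in cols else 0 for j in range(n)] for cols in rows]
    return vocabulary, matrix


# Hypothetical aapt-style output, one application per line.
sample = [
    "uses-permission: android.permission.INTERNET "
    "uses-permission: android.permission.SEND_SMS",
    "uses-permission: android.permission.INTERNET",
    "uses-permission: com.example.permission.CUSTOM_PUSH",
]
vocabulary, matrix = vectorize(sample)
```

For the three sample lines this produces a 3 × 3 matrix; the dataset described above would produce 48643 rows and 22300 columns in the same way, which is why a sparse representation matters at that scale.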


Neural Network Model: In this particular experiment, we implemented deep neural networks with both fully-connected and dropout layers. In dense, or fully-connected, layers, each neuron in the layer is directly connected to every neuron in the preceding layer. The neurons are initialized with an activation function and randomized weights, and on each back propagation step, the weights are updated. The activation function defines what type of calculations and classifications the neurons in that layer will complete. In our neural networks, two different activation functions were used: the ReLU (Rectified Linear Unit) activation function, and the Sigmoid activation function.

The ReLU activation function is an implementation of the rectifier function, f(x) = max(0, x), where x is the input to the neuron. The Sigmoid activation function was used as the last layer in every model. The purpose of using this function as the last layer is to normalize the output. In addition, in the dropout layers, a fraction of the neurons in the previous layer is prevented from passing input to the subsequent layer. The purpose of this type of layer is to prevent over-fitting. A sample neural network utilizing dense layers with dropout can be seen in Fig. 1.

Hyper-parameter Tuning: Neural networks often have many hyper-parameters that can be set by the programmer or left at a default value. The Scikit-Learn library implements a hyper-parameter tuning function, called grid search, that was utilized here. The tuning function takes as input a dictionary of parameter names and values, then runs a full test using each combination of parameters. The hyper-parameters that we were concerned with tuning include batch size, epochs, number of neurons per layer, optimizer, dropout rate, and weight constraint. The grid search function tests every possible set of the supplied parameters, allowing us to determine the best combination. We also tested various shapes of neural networks to determine the best layout for the network.

B. System Workflow

Because of the quantity of hyper-parameters that we wished to tune, and the number of models we wished to test, trying every possible combination of all parameters and models would have been infeasible due to the amount of computation time it would take. To circumvent this, hyper-parameters were tested in pairs. Each pair of hyper-parameters was used to train and then test one model at a time. The following pairs of hyper-parameters were tested together: epochs and batch size, number of neurons and optimizer, and dropout rate and weight constraint.

Each test yielded an optimized pair of hyper-parameters. This optimized pair was then used in the training of the next set of tested parameters. Thus, for each new test, the default values for hyper-parameters set by Keras were replaced with the optimized values found as a result of the previous tests. Fig. 1 shows the overall workflow of the system.

Fig. 1. System Workflow

• Step 1. In Step 1, we must prepare the data for input to the neural network. In this phase, we utilize feature extraction to extract the permissions data from the APK files. The output is a list of permissions for each application, with each application's permissions on its own line. Next, we must convert this list to a dense matrix using Scikit-Learn as described in Section III-A.
• Step 2. Step 2 of the process is where we use the dense matrix as input to the neural network for learning and classification. With the matrix as input, during training, the neural network will learn which permissions are most likely associated with benign applications, and which are most likely associated with malicious applications. After completion of the neural network training and testing, we are left with a matrix of all the applications with their predicted and actual classifications. This can then be used to determine accuracy and F1 score as described in Section IV.

C. System Parameters and Scenarios

We now provide a description of the parameters and scenarios that were tested. The differences between the various models tested are described, followed by descriptions of each of the hyper-parameters that were tuned.

Neural Network Models: The following three neural network designs were tested: (i) The first model was the simplest, containing a single dense layer with a base number of neurons. (ii) The second model was a deeper network that utilized four dense layers, all with the base number of neurons. (iii) The third model was also four layers deep; however, the number of neurons in each layer decreased. With the first layer using the base number of neurons, each subsequent layer contains the base number divided by its layer depth (i.e., Layer 1: x neurons, Layer 2: x/2 neurons, Layer 3: x/3 neurons, Layer 4: x/4 neurons). A sigmoid layer of size 1 was used as the output-normalizing last layer in all model configurations. Also, a dropout layer was added after each dense layer in all model configurations.
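The three designs can be summarized by the widths of their dense layers. The sketch below is a plain-Python description of those shapes, not the authors' Keras code: the model labels and tuple format are hypothetical, and the use of ReLU in every hidden dense layer is an assumption (the paper only states that ReLU and Sigmoid were the two activations used and that Sigmoid is the last layer). The integer division with a floor of one neuron follows the rule the paper gives for the decreasing-size model.

```python
def layer_sizes(model, base):
    """Return the dense-layer widths for one of the three tested designs."""
    if model == "one_layer":
        return [base]
    if model == "four_same":
        return [base] * 4
    if model == "four_decreasing":
        # Layer k gets base // k neurons, never fewer than one.
        return [max(1, base // k) for k in range(1, 5)]
    raise ValueError(f"unknown model: {model}")


def layer_stack(model, base, dropout_rate):
    """List the layers in order: dense + dropout pairs, then a sigmoid output."""
    stack = []
    for width in layer_sizes(model, base):
        stack.append(("dense_relu", width))      # hidden activation is assumed
        stack.append(("dropout", dropout_rate))  # dropout after each dense layer
    stack.append(("dense_sigmoid", 1))           # size-1 output layer
    return stack


sizes_45 = layer_sizes("four_decreasing", 45)
sizes_1 = layer_sizes("four_decreasing", 1)
```

With a base of 45 neurons the decreasing model has widths 45, 22, 15, and 11; with a base of 1 every layer is clamped to a single neuron, so the network keeps its four-layer depth.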


Fig. 3. Number of Neurons vs. Time


Fig. 2. Batch Size & Epochs vs. Accuracy
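The staged, pairwise tuning procedure of Section III-B, whose per-pair results are discussed next, can be sketched as follows. This is a minimal standard-library illustration rather than the Scikit-learn grid search the paper used: the function name, the candidate grids, the default values, and above all the scoring function are assumptions. In particular, `toy_score` is a synthetic stand-in for training and evaluating a network; it simply counts matches against an arbitrary target and does not represent measured accuracy.

```python
from itertools import product


def staged_grid_search(stages, defaults, evaluate):
    """Tune one pair of hyper-parameters at a time, carrying winners forward."""
    best = dict(defaults)
    for grid in stages:  # grid maps parameter name -> candidate values
        names = list(grid)
        top_score, top_combo = float("-inf"), None
        for combo in product(*(grid[name] for name in names)):
            trial = {**best, **dict(zip(names, combo))}
            score = evaluate(trial)
            if score > top_score:
                top_score, top_combo = score, combo
        best.update(zip(names, top_combo))  # fix this pair for later stages
    return best


# Arbitrary target for the synthetic objective below (illustrative only).
TARGET = {"epochs": 16, "batch_size": 10, "neurons": 45,
          "optimizer": "Nadam", "dropout": 0.3, "weight_constraint": 3}


def toy_score(params):
    """Synthetic objective: number of settings that match TARGET."""
    return sum(params[name] == value for name, value in TARGET.items())


# Illustrative grids, grouped into the three pairs named in the paper.
STAGES = [
    {"epochs": [1, 8, 16, 32], "batch_size": [10, 100, 1000]},
    {"neurons": [1, 15, 30, 45], "optimizer": ["SGD", "Adam", "Nadam"]},
    {"dropout": [0.0, 0.3, 0.6], "weight_constraint": [1, 3, 5]},
]
DEFAULTS = {"epochs": 1, "batch_size": 100, "neurons": 1,
            "optimizer": "SGD", "dropout": 0.0, "weight_constraint": 1}

best = staged_grid_search(STAGES, DEFAULTS, toy_score)
```

With these illustrative grids, the staged search scores 12 + 12 + 9 = 33 configurations, whereas the full Cartesian product would require 12 × 12 × 9 = 1,296. The trade-off is that later stages only explore around the values fixed earlier, so interactions between parameters in different pairs are not examined.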

Epochs and Batch Size: An epoch is defined as a single pass over an entire training set. More epochs means that the model is trained against the same training data more times. Epochs were tested in a range from 1 epoch to 32 epochs. In most cases, as the number of epochs was increased, accuracy increased as well. In some cases, however, the accuracy declined slightly after 16 epochs. This could be due to the model over-fitting to the specific features of the training data, and therefore performing slightly worse on data that it has never seen before. In addition, as epochs increased, the time to train the model increased significantly.

Batch size is defined as the number of data points that are observed at once during training. The model will be fit with the number of data points defined in the batch size repeatedly until the entire data set has been trained on for all epochs. Batch size in our scenario ranged from a very small batch of size 10 to a large batch of size 5000. Based on the results, it can be determined that smaller batch sizes lead to the model taking a longer time to train and test. Additionally, it was seen that smaller batch sizes lead to higher accuracy.

When examined together, epochs and batch size have an interesting relationship. Fig. 2 shows that increasing batch size and epochs tends to lead to higher accuracy. However, a small batch size does not see as significant an improvement from increasing epochs as a larger batch size does. It can be seen in Fig. 2 that a small batch tends to achieve slightly better accuracy than a larger batch with a high number of epochs. In contrast, there is a significant difference in the accuracy achieved by a model with a large batch size trained over low and high epochs. In addition, in terms of time, batch size had a much larger impact on training time than did the number of epochs.

Number of Neurons: The number of neurons per layer was tuned with a grid search. A neuron in this sense is the artificial entity that makes a decision or classification based on its known criteria. Each model has its own neuron scheme as defined in the model descriptions above. Values for the number of neurons were tested as multiples of 5 ranging from 1 to 50 neurons per layer (1, 5, 10, ..., 50). For the four layers of decreasing size model, integer division was used and the minimum number of neurons in any given layer was 1. Any division resulting in a quotient of 0 was assigned to 1 in order to preserve the shape/depth of the model.

It was evident that increasing the number of neurons in the model generally led to a slight increase in accuracy. In some cases, too many neurons could lead to the network over-fitting to the training data, and therefore under-performing on unseen testing data.

The one-layer model showed the least variation in accuracy with additional neurons added. The best result for this model was the 45 neuron test. The four layers of the same size model showed improvement with added neurons. The one neuron test produced an average accuracy of approximately 68%, while the best result produced 94.6% at both 45 and 50 neurons. The four layers with decreasing size model produced an accuracy of approximately 65%, on average, at the 1, 5, and 10 neuron tests. The model showed an increase in accuracy at 15 neurons. The best result for this model was the 45 neuron test with 94.6% accuracy.

Additionally, as displayed in Fig. 3, it was noticed in all models that increasing the number of neurons increases the amount of time taken to train the model. Because there was only a small increase in accuracy from adding neurons, one may consider using a simpler neural network to achieve relatively high accuracy if time is a constraining factor.

Optimizer: Several optimizers were chosen based on compatibility with the Keras wrapping infrastructure that we were using. Enumerated, these optimizers are: SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, and Nadam. Each of these optimizers performs slightly different mathematical calculations to optimize the results produced by the loss function.

It was evident, through this test, that all of the optimizers tested produce similar results with this data set, in terms of test accuracy. Fig. 4 illustrates the performance of many optimizers with tests using different numbers of neurons. The best result, as displayed by Fig. 4, was the Nadam optimizer when using 45 neurons. In addition, the biggest difference among the optimizers was seen in the time taken to train the model, not the achieved test accuracy. The fastest optimizer was consistently the SGD optimizer and the slowest was consistently the Nadam optimizer. Optimizers besides SGD and Nadam produced similar accuracies, in times falling between SGD and Nadam.

Fig. 4. Number of Neurons vs. Accuracy for Optimizers

Dropout Rate and Weight Constraint: The dropout rate and weight constraint were tuned together in a grid search. Dropout rate is the ratio that is passed to the dropout layers in each model and designates the rate at which neurons are dropped out of each layer. Weight constraint is used together with dropout layers to give the neurons that remain after the dropout a certain classification weight. A higher weight means the remaining neurons have more influence on the ultimate classification. Dropout rate was tested at 10% intervals from 0% to 90%. Weight constraint was tested at integer values from 1 to 5.

Based on the results of the grid search, it can be determined that the best values to use for dropout rate and weight constraint are 30% and 3, respectively. Results showed an increase in accuracy as dropout rate was increased up to 30%. This shows that dropout can help reduce over-fitting of a model to training data. Nonetheless, after 30% dropout, there was a decline in accuracy, showing that dropping out too many neurons can restrict the network too much. This can be seen by the dark coloring of the graph in Fig. 5 at the 0.3 dropout rate marker. The same pattern is seen with the weight constraint. An increase in accuracy is seen with increasing weight until the weight of 3. After this point, there is a decline in accuracy, showing that too much weight on the remaining neurons will give them too much influence on the classification, possibly resulting in over-fitting. Fig. 5 shows that utilizing appropriate values for both weight constraint and dropout rate can lead to significant increases in accuracy.

Fig. 5. Dropout and Weight Constraint vs. Accuracy

IV. PERFORMANCE EVALUATION

To perform the best-case tests, the best parameters from each of the previously described searches were combined. The utilized parameters for this test were: batch size of 10, 16 epochs, 45 neurons, Nadam optimizer, dropout rate of 30%, and weight constraint of 3. These parameters were used in tests with training percentages of 20%, 40%, 60%, and 80%. A five-fold cross validation was performed to ensure the validity of the results. A Stratified Shuffle Split was used to mix the data and ensure that a different portion of the data was used for each fold of cross validation.

For scoring this test, accuracy along with the F1 score were considered. Accuracy is the standard measure of the percentage of correct classifications. F1 score is the harmonic mean of precision and recall, given by

$$F_1 = \frac{2 \cdot (\mathrm{Precision} \cdot \mathrm{Recall})}{\mathrm{Precision} + \mathrm{Recall}},$$

where $\mathrm{Precision} = \frac{TP}{FP + TP}$, $\mathrm{Recall} = \frac{TP}{FN + TP}$, $TP$ = True Positives, $FP$ = False Positives, and $FN$ = False Negatives. Though more concerned with accuracy, we can more fully determine the effectiveness of the network classification by also considering the F1 metric.

Fig. 6. Training Ratio vs. Accuracy for 3 Models
Fig. 7. Training Ratio vs. F1 Score for 3 Models
Fig. 8. Training Ratio vs. Fit Time

The results of the test showed that at all training ratios, the four layers of decreasing size model performed the best. Fig. 6 and Fig. 7 can be used together to display the accuracy and F1 score results. The four layers of the same size model had the worst accuracy, but often the best F1 score, as shown by Fig. 6 and Fig. 7. The accuracies for this model were consistently lower than those of the other two models. At the 20% training ratio, the F1 score was the lowest of the models. Nonetheless, the
F1 scores for this model were ranked the best out of the three for the 40%, 60%, and 80% training ratios, as seen in Fig. 7.

The model that improved the most by increasing the training ratio was the one layer model. This model showed the largest increase in accuracy by increasing the training ratio from 20% to 80%. At the 80% ratio, accuracy was the second best of the models tested and the F1 score was the best of all tested ratios and models.

In terms of time, all models performed similarly. This can be seen in Fig. 8. The fastest time to train was seen in the one layer model, which took an average of 637 seconds to train at the 20% training ratio. The longest time to train a model was seen in the one layer model, which took an average of 2461 seconds to train at the 80% training ratio. Based on these statistics, it can be seen that training time increases approximately linearly with the training ratio, as shown in Fig. 8. Thus, doubling the training ratio will roughly double the amount of time taken to train the model.

Based on these results, the best model to use for classification with this data set is the four layers of decreasing size model. At all training ratios, this model outperformed the other two in terms of accuracy. By also considering the F1 score, this model was the best performer at the 20% training ratio, and the second best in all other ratios. Overall, by increasing the training ratio, the analysis will be able to achieve higher accuracy, though at the cost of a much higher training time. By tuning hyper-parameters of the deep learning infrastructure, we were able to optimize the performance.

V. FINAL REMARKS

In this paper, we have successfully demonstrated the application of deep neural networks for permission-based Android malware classification. Distinct from similar research efforts, we examine both core-Android and user-defined permissions, and distinguish between optional and required permissions, allowing us to establish a much larger feature vocabulary for classification. Utilizing the grid search method of hyper-parameter tuning, we were able to optimize our networks for the best possible performance in terms of accuracy, precision, and recall. As ongoing research, we plan to extend our work to edge computing platforms [19] and Internet of Things applications [10].

ACKNOWLEDGEMENT

This work was supported in part by the US National Science Foundation (NSF) under grants: CNS 1350145 (Faculty CAREER Award), and the University System of Maryland (USM) Endowed Wilson H. Elkins Professorship Award Fund. Any opinions, findings and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the agencies.

REFERENCES

[1] Keras: The python deep learning library. https://keras.io/. Accessed: 2018-02-14.
[2] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, et al. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. arXiv preprint arXiv:1603.04467, 2016.
[3] Z. Chen, G. Xu, V. Mahalingam, L. Ge, J. Nguyen, W. Yu, and C. Lu. A cloud computing based network monitoring and threat detection system for critical infrastructures. Big Data Research, 3:10–23, 2016.
[4] L. Deng, D. Yu, et al. Deep learning: methods and applications. Foundations and Trends in Signal Processing, 7(3–4):197–387, 2014.
[5] Z. M. Fadlullah, F. Tang, B. Mao, N. Kato, O. Akashi, T. Inoue, and K. Mizutani. State-of-the-art deep learning: Evolving machine intelligence toward tomorrows intelligent network traffic control systems. IEEE Communications Surveys and Tutorials, 19(4):2432–2455, Fourthquarter 2017.
[6] P. Faruki, A. Bharmal, V. Laxmi, V. Ganmoor, M. S. Gaur, M. Conti, and M. Rajarajan. Android security: A survey of issues, malware penetration, and defenses. IEEE Communications Surveys and Tutorials, 17(2):998–1022, Secondquarter 2015.
[7] S. GmbH. Number of smartphone users in the United States from 2010 to 2022 (in millions)*. https://www.statista.com/statistics/201182/forecast-of-smartphone-users-in-the-us/, 2017.
[8] W. G. Hatcher and W. Yu. A survey of deep learning: Platforms, applications and emerging research trends. IEEE Access, 2018.
[9] Y. LeCun, Y. Bengio, and G. Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
[10] J. Lin, W. Yu, N. Zhang, X. Yang, H. Zhang, and W. Zhao. A survey on Internet of Things: Architecture, enabling technologies, security and privacy, and applications. IEEE Internet of Things Journal, 4(5):1125–1142, Oct 2017.
[11] M. Lindorfer, M. Neugschwandtner, L. Weichselbaum, Y. Fratantonio, V. Van Der Veen, and C. Platzer. Andrubis–1,000,000 apps later: A view on current android malware behaviors. In Building Analysis Datasets and Gathering Experience Returns for Security (BADGERS), 2014 Third International Workshop on, pages 3–17. IEEE, 2014.
[12] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
[13] SAP SE. Data Science and Machine Learning in the Internet of Things and Predictive Maintenance. https://www.sap.com/documents/2016/10/8ec7f23f-917c-0010-82c7-eda71af511fa.html, 2017.
[14] J. Schmidhuber. Deep learning in neural networks: An overview. Neural networks, 61:85–117, 2015.
[15] L. Sui. Global smartphone os market share by region: Q3 2016, 2016.
[16] V. L. L. Thing. IEEE 802.11 network anomaly detection and attack classification: A deep learning approach. In 2017 IEEE Wireless Communications and Networking Conference (WCNC), pages 1–6, March 2017.
[17] W. Yu, Z. Chen, G. Xu, S. Wei, and N. Ekedebe. A threat monitoring system for smart mobiles in enterprise networks. In Proceedings of the 2013 Research in Adaptive and Convergent Systems, RACS ’13, pages 300–305, New York, NY, USA, 2013. ACM.
[18] W. Yu, L. Ge, G. Xu, and X. Fu. Towards Neural Network Based Malware Detection on Android Mobile Devices, pages 99–117. Springer International Publishing, Cham, 2014.
[19] W. Yu, F. Liang, X. He, W. G. Hatcher, C. Lu, J. Lin, and X. Yang. A survey on the edge computing for the Internet of Things. IEEE Access, 6:6900–6919, 2018.
[20] W. Yu, G. Xu, Z. Chen, and P. Moulema. A cloud computing based architecture for cyber security situation awareness. In 2013 IEEE Conference on Communications and Network Security (CNS), pages 488–492, Oct 2013.
[21] W. Yu, H. Zhang, L. Ge, and R. Hardy. On behavior-based detection of malware on android platform. In 2013 IEEE Global Communications Conference (GLOBECOM), pages 814–819, Dec 2013.
[22] Z. Yuan, Y. Lu, Z. Wang, and Y. Xue. Droid-Sec: deep learning in android malware detection. In ACM SIGCOMM Computer Communication Review, volume 44, pages 371–372. ACM, 2014.
[23] D. Zhu, H. Jin, Y. Yang, D. Wu, and W. Chen. DeepFlow: deep learning-based malware detection by mining Android application for abnormal usage of sensitive data. In 2017 IEEE Symposium on Computers and Communications (ISCC), pages 438–443, July 2017.
