Chapter · July 2019

DOI: 10.1007/978-3-030-22868-2_34
Evaluating Machine Learning Models
on the Ethereum Blockchain for Android
Malware Detection
Md. Shohel Rana, Charan Gudla, and Andrew H. Sung
School of Computing Sciences and Computer Engineering,
The University of Southern Mississippi, Hattiesburg, MS 39406, USA
{md.rana,charan.gudla,andrew.sung}@usm.edu
Abstract. Android, the most popular mobile operating system, with billions
of active users and more than 2 million apps, has motivated advertisers,
hackers, fraudsters and cyber-criminals to develop malware of all types for it.
In recent years, extensive research has been conducted on malware analysis
and detection for Android devices, even though Android has already
implemented various security mechanisms to deal with the problem. In this
paper, we developed a consortium blockchain network to evaluate various
machine learning models for a given malware dataset. A reward is offered
using smart contracts as an incentive to the competitors for their work by
allowing them to submit solutions through training with selected machine
learning models in a secure and trustworthy manner. The analysis of datasets
by competitors helps various organizations in the network to enhance or
boost their current malware detection or defense tools. The decentralized
network provides transparency, enhances security and reduces the cost in
managing all relevant data by eliminating third parties. We used the DREBIN
dataset in the developed framework for initial experiments, and the encouraging
results are presented.
Keywords: Machine learning · Blockchain · Smart contract · Google · Malware
1 Introduction
Android, the most popular mobile operating system, is developed by Google and built
on a modified version of the Linux kernel in conjunction with additional open source
software. It is designed primarily for touchscreen mobile devices. Its architecture is
divided into five components and has two permission models: (i) a sandbox environment
at the kernel level, which prevents access to the file system and other resources, and
(ii) API permissions that are exposed to the user during installation of an application [1].

© Springer Nature Switzerland AG 2019
K. Arai et al. (Eds.): CompCom 2019, AISC 998, pp. 446–461, 2019.
https://doi.org/10.1007/978-3-030-22868-2_34

Every application is packaged as an Android Application Package (APK) file, which
contains the application code (.dex files), resources, and the AndroidManifest.xml file.
The manifest is an important element because it describes the features and the security
configuration of the application (e.g. permissions, APIs, activities, services, content
providers and broadcast receivers). After decompiling an APK file, we examined the
AndroidManifest.xml file (see Fig. 1) to see how the permissions are used, the Java files
to see where the API functions are invoked, and the asset and resource files to check
whether they contain any dex or executable (ELF) image file or any code-hiding image
script [2–5].
Fig. 1. An AndroidManifest.xml file with used permissions
In Fig. 1, the permissions shown in red are commonly used in malicious Android apps.
Some sensitive function calls have also been found to be often related to malware, as follows:
// Reading the device ID and the phone number of the device
((TelephonyManager) context.getActivity().getSystemService("phone")).getDeviceId()
((TelephonyManager) context.getActivity().getSystemService("phone")).getLine1Number()

// Checking whether a given process is currently running
(ActivityManager) context.getActivity().getSystemService("activity")
List<RunningAppProcessInfo> procInfos = ((ActivityManager)
    context.getActivity().getSystemService("activity")).getRunningAppProcesses();
for (int i = 0; i < procInfos.size(); i++) {
    if (((RunningAppProcessInfo)
            procInfos.get(i)).processName.equals(args[0].getAsString())) {
        b = Boolean.valueOf(true);
    }
}

// Permissions referenced in the code
String[] strArr = new String[]{
    "android.permission.ACCESS_COARSE_LOCATION",
    "android.permission.ACCESS_FINE_LOCATION",
    "android.permission.ACCESS_WIFI_STATE",
    "android.permission.CHANGE_WIFI_STATE",
    "android.permission.VIBRATE",
    "android.permission.READ_CALENDAR",
    "android.permission.WRITE_CALENDAR",
    "com.google.android.gms.permission.ACTIVITY_RECOGNITION"
};

// Receiver for package added/removed/replaced events
IntentFilter intentFilter = new IntentFilter("android.intent.action.PACKAGE_ADDED");
intentFilter.addAction("android.intent.action.PACKAGE_REMOVED");
intentFilter.addAction("android.intent.action.PACKAGE_REPLACED");
intentFilter.addDataScheme("package");
registerReceiver(this.f2955D, intentFilter);

// Receiver for Wi-Fi scan and state change events
intentFilter = new IntentFilter();
intentFilter.addAction("android.net.wifi.SCAN_RESULTS");
intentFilter.addAction("android.net.wifi.STATE_CHANGE");
intentFilter.addAction("android.net.wifi.WIFI_STATE_CHANGED");
intentFilter.addAction("android.net.wifi.supplicant.CONNECTION_CHANGE");
registerReceiver(this.an, intentFilter);
External links:
public static String f2920C = "http://www.shinhwa21.net/new/apps_judis_end.php?pkg=";

private static String al = "http://www.shinhwa21.net/new/apps_kakao_judis_7.php?pkg=";
To deal with the Android malware problem, much research has been done recently.
For example, TaintDroid [6], DroidRanger [7] and DroidScope [8] are methods proposed
to monitor the runtime behavior of applications. Static analysis methods, which assess
malicious behavior in source code, data or binary files without running the application,
have also been proposed, such as Kirin [9], Stowaway [10] and Risk-Ranker [11].
In this paper, we evaluate machine learning solutions to the Android malware
detection problem by building a consortium blockchain based on the DanKu [12]
protocol that allows the initiator to post a dataset, the evaluation criteria, and a reward
for the best machine learning model submitted. The proposed system is depicted in
Fig. 2 below.
The paper is organized as follows: Sect. 2 describes related works, Sect. 3
describes the proposed malware detection system architecture, Sect. 4 describes the
implementation, Sect. 5 describes the competition rules and the threats they address,
Sect. 6 describes the machine learning algorithms used in our experiment, Sect. 7
describes the methodology, including the dataset description and feature extraction,
Sect. 8 describes the results and analysis, including measurement metrics and
comparison, and finally Sect. 9 presents the conclusion and future work.
Fig. 2. Overview of proposed framework
2 Related Works
There are many studies in the literature on using machine learning algorithms for
malware detection. Gu et al. [13] proposed a multi-feature detection method by
constructing a framework named CB-MMDE for detection and classification of malware
on Android devices through blockchain technology.
Raje et al. [14] proposed a decentralized firewall built using blockchain technology
to classify Portable Executable (PE) files as malicious or benign using a deep belief
neural network (DBN) as the detection engine. The DBN classifies grayscale images
into two classes, and a dataset of 10,000 files is used to train the network.
Ouaguid et al. [15] proposed a new framework, ANDROSCANREG (Android
Permissions Scan Registry), which extracts and analyzes the permissions requested by an
Android application through a decentralized and distributed system based on
blockchain technology. ANDROSCANREG consists of two blockchains: (i) PERMBC,
which handles analysis, validation and preparation of the raw results so that they can
persist in the second blockchain, and (ii) BTCBC, which saves the permissions history
of each version of the scanned applications via financial transactions. The presented
approach can perform different analyses of an Android application, for example to
determine whether acquiring administrator permissions would allow it to download and
install malicious modules on the device in order to upload, destroy or encrypt the
victim’s private data.
Noyes [16] proposed the design and implementation of a novel anti-malware
environment called BitAV, which uses decentralization to update and maintain the software.
Its feedforward scanning technique significantly enhances the performance of the
malware matching system by decomposing the file matching process into efficient
interrogations that take constant (O(1)) time.
Firdaus et al. [17] proposed a bio-inspired method with machine learning to detect
root exploits by examining three types of features, namely system command, directory
path, and code-based features, in conjunction with the Android Debug Bridge (ADB).
The proposed method used four boosting algorithms for classification (AdaBoost,
Real AdaBoost, LogitBoost, and MultiBoost), and the resulting system is called RODS.
Moubarak et al. [18] used blockchain technology to explore the feasibility of new
untraceable malware. They developed a 4-ary malware and tested it in real time; each
chunk of the code interacts with the Bitcoin network to validate and assure whether it
belongs to the malicious software.
3 Proposed System Architecture
3.1 Basic Structure

In order to ensure trust and integrity, the proposed system consists of three phases
(see Fig. 3).
Fig. 3. Structure of the proposed system
• Phase 1: The initiator (Alice) initializes the genesis block by revealing to the
Ethereum smart contract a dataset with a defined train/test ratio, an evaluation
criterion that must be fulfilled by the miners or machine learning engineers, and a
reward amount, typically denominated in cryptocurrency, for the submitter of the best
model. The Ethereum smart contract provides a list of machine learning algorithms
for modeling the data and evaluates the models for classification tasks.
• Phase 2: Miners or machine learning engineers, acting as submitters, download the
training set provided by the initiator and train a model with the selected machine
learning algorithm. After successfully evaluating the model, they submit their
solutions to the blockchain. The smart contract acknowledges each submission by
providing a Submitter_ID that uniquely identifies the submission results in the blockchain.
• Phase 3: After the competition is finalized, the blockchain analyzes the submission
results and selects as the winner of the competition a submitter who meets the
accuracy criteria.
3.2 Definitions
• user is anyone who can interact with Ethereum contracts.
• initiator is a user who initializes the blockchain with the genesis file and creates
smart contracts.
• smart contract is an Ethereum contract.
• submitter (miner or machine learning engineer) is a user who submits solutions
to the blockchain for a reward.
• period is the timeframe needed to mine a block.
• data point is made up of input(s) and prediction(s).
• data group is a matrix made up of data points.
• hashed data group is the hash of a data group including a random nonce.
• contract wallet address is an account that holds the reward amount until the
finalization of the contract.
3.3 Features of Smart Contracts
• Initialization() is used by the initiator to create a genesis block with parameters
including the dataset, the accuracy criteria and a reward amount.
• IsBlockInitialized() checks whether the contract has been created and initialized,
and returns two arrays containing the training and test indexes of the dataset.
• RevealData() splits the dataset into training and testing sets by creating two arrays
of randomly generated indexes and reveals them to the chain.
• DownloadTrainDataAndMLASelection() is used by miners/submitters to get the
data revealed in the previous step and returns two arrays: (i) the list of machine
learning algorithms, and (ii) the list of training data indexes.
• SubmitSolution() is invoked after modeling the data using the training indexes with
one of the machine learning algorithms and returns a Submitter_ID as
acknowledgement.
• EvaluateSubmission() takes a submitter id as its input parameter, evaluates the model
from the previous step and returns the evaluation result (accuracy, precision,
recall, F1-score, etc.) as an array.
• SubmissionQueue() stores the submission information in the blockchain along with
the Submitter_ID.
• BestSubmitter() retrieves the submitted information from the blockchain, analyzes
it against the accuracy criteria and returns the Submitter_Address and Submitter_ID.
• ContractFinalized() finalizes the competition and returns a Boolean value
(True or False).
• CancelContract() cancels the contract if any exception occurs.
• RewardOrRefund() sends the reward amount to the best submitter, or refunds the
reward amount to the initiator if no one fulfills the conditions.
• GetTrainingIndex() is used to retrieve the training indexes.
• GetTestingIndex() is used to retrieve the testing indexes.
• GetSubmissionId() is used to retrieve the submission id.
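The life cycle described by these functions can be sketched off-chain in a few lines of Python. This is only an illustrative mock under stated assumptions, not the Solidity contract used in the experiments: the class name CompetitionContract, the snake_case method names, the in-memory dictionaries and the rule that a submission is a list of predicted labels for the test set are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CompetitionContract:
    """Off-chain mock of the competition contract (all names hypothetical)."""
    initiator: str
    accuracy_criteria: float      # e.g. 0.80 for an 80% criterion
    reward: int                   # amount held by the contract wallet
    test_labels: list             # kept secret until submissions close
    submissions: dict = field(default_factory=dict)  # submitter_id -> record
    by_address: dict = field(default_factory=dict)   # address -> submitter_id
    finalized: bool = False

    def submit_solution(self, address, predictions):
        """SubmitSolution(): one submission per address; returns Submitter_ID."""
        if self.finalized:
            raise RuntimeError("competition already finalized")
        if address in self.by_address:
            raise ValueError("each address may submit only once")
        submitter_id = len(self.submissions) + 1
        self.submissions[submitter_id] = {
            "address": address,
            "predictions": list(predictions),
        }
        self.by_address[address] = submitter_id
        return submitter_id

    def evaluate_submission(self, submitter_id):
        """EvaluateSubmission(): accuracy of the stored predictions."""
        record = self.submissions[submitter_id]
        correct = sum(
            p == t for p, t in zip(record["predictions"], self.test_labels)
        )
        record["accuracy"] = correct / len(self.test_labels)
        return record["accuracy"]

    def best_submitter(self):
        """BestSubmitter(): best accuracy meeting the criteria; earliest wins ties."""
        best = None
        for submitter_id in sorted(self.submissions):
            accuracy = self.evaluate_submission(submitter_id)
            if accuracy >= self.accuracy_criteria and (
                best is None or accuracy > best[2]
            ):
                best = (self.submissions[submitter_id]["address"],
                        submitter_id, accuracy)
        return best

    def reward_or_refund(self):
        """RewardOrRefund(): pay the winner, or refund the initiator."""
        self.finalized = True
        best = self.best_submitter()
        recipient = best[0] if best else self.initiator
        return recipient, self.reward


contract = CompetitionContract("alice", 0.75, 10, [1, 0, 1, 1])
contract.submit_solution("bob", [1, 0, 0, 0])    # 2 of 4 correct
contract.submit_solution("carol", [1, 0, 1, 0])  # 3 of 4 correct
print(contract.reward_or_refund())
```

Because ties are resolved in favour of the lower Submitter_ID, the mock also reflects the rule in Sect. 5.4 that the first of two identical solutions is the one rewarded.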
4 Implementation
4.1 Dataset Hashing

The initiator, who creates the contract, splits the dataset into data groups. A nonce is
a randomly generated number that accompanies each data group and is used to identify
it. The initiator hashes each data group, together with its nonce, using the SHA-256 algorithm.
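The hashing step can be illustrated with Python's standard hashlib module. This is a minimal sketch under stated assumptions: the JSON serialization of a data group, the 64-bit nonce size and the function names are hypothetical choices made for illustration and are not taken from the DanKu contract.

```python
import hashlib
import json
import secrets


def hash_data_group(data_group, nonce):
    """Return the SHA-256 hex digest of a data group combined with its nonce."""
    # Assumption: the group is serialized as JSON before hashing.
    payload = json.dumps({"data": data_group, "nonce": nonce}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def commit_dataset(data_groups):
    """Hash every data group with a fresh random nonce."""
    nonces = [secrets.randbits(64) for _ in data_groups]
    hashes = [hash_data_group(g, n) for g, n in zip(data_groups, nonces)]
    return nonces, hashes


def verify_reveal(data_group, nonce, committed_hash):
    """Check that a revealed data group matches the hash published earlier."""
    return hash_data_group(data_group, nonce) == committed_hash


# Each data group is a small matrix of data points (inputs plus prediction).
groups = [[[1, 0, 1], [0, 1, 0]], [[1, 1, 0], [0, 0, 1]]]
nonces, hashes = commit_dataset(groups)
print(verify_reveal(groups[0], nonces[0], hashes[0]))
```

Publishing only the hashes lets the initiator commit to the data in advance, and anyone can later check a revealed group against its hash.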
4.2 Determine Train and Test Dataset

During initialization in Phase 1, the initiator calls the function Initialization(), which
uses the hashes of previously mined blocks as the seed for randomization. The data
groups are randomly selected using the nonce. The dataset is split into 80% for training
and 20% for testing. The procedure is repeated until all data group indexes are selected.
The algorithm used in the DanKu protocol (in Solidity) for randomly selecting indexes is:
function randomly_select_index(uint[] array) private {
    uint t_index = 0;
    uint array_length = array.length;
    uint block_i = 0;
    // Randomly select training indexes
    while (t_index < training_partition.length) {
        // Derive a pseudo-random position from the hash of a recent block
        uint random_index =
            uint(sha256(block.blockhash(block.number-block_i))) %
            array_length;
        training_partition[t_index] = array[random_index];
        // Move the last unselected element into the freed slot and shrink the pool
        array[random_index] = array[array_length-1];
        array_length--;
        block_i++;
        t_index++;
    }
    // The elements left in the pool form the testing partition
    t_index = 0;
    while (t_index < testing_partition.length) {
        testing_partition[t_index] = array[array_length-1];
        array_length--;
        t_index++;
    }
}
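To make the selection logic easier to follow, the routine above can be ported to Python. This is a sketch under stated assumptions rather than the contract code: the block_hashes argument is a hypothetical stand-in for block.blockhash(block.number - i), the partition sizes are passed in through n_train, and all indexes that remain after the training draw are assigned to the test partition.

```python
import hashlib


def randomly_select_index(indexes, n_train, block_hashes):
    """Split indexes into training and testing partitions.

    block_hashes[i] stands in for the hash of the i-th most recent block
    (as bytes); it is a hypothetical replacement for block.blockhash(...).
    """
    pool = list(indexes)
    length = len(pool)
    training = []
    for block_i in range(n_train):
        digest = hashlib.sha256(block_hashes[block_i]).digest()
        # Interpret the 32-byte digest as a big-endian unsigned integer.
        random_index = int.from_bytes(digest, "big") % length
        training.append(pool[random_index])
        # Overwrite the chosen slot with the last live element, then shrink.
        pool[random_index] = pool[length - 1]
        length -= 1
    # The live part of the pool, read from the end, becomes the test set.
    testing = [pool[i] for i in range(length - 1, -1, -1)]
    return training, testing


# Ten data group indexes, an 80/20 split, and eight made-up block hashes.
fake_hashes = [bytes([i]) * 32 for i in range(8)]
train, test = randomly_select_index(range(10), 8, fake_hashes)
print(train, test)
```

The swap with the last live element removes a chosen index in constant time, so no index can be drawn twice and the two partitions never overlap.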
5 Competition Rules
The proposed system has been designed in such a way that no participant can cheat or
gain an advantage over other users (e.g. the initiators, the submitters, and/or machine
learning engineers of the Ethereum blockchain).
5.1 Overfitting by Submitter

To prevent overfitting, the test dataset is kept secret until the submission period
ends, because submitters or machine learning engineers could overfit their selected
machine learning model if they had access to the test dataset. The smart contract
evaluates the submissions using the test dataset.
5.2 Too Many Submissions

A competitor cannot submit a solution more than once, because after the first successful
submission the smart contract saves the information to the chain with the competitor's
address along with the submission id.
5.3 Block Hash Manipulation by Miners

A miner does not have full control over the selection of the training indexes.
However, after observing which data groups are going to be selected, the initiator can
decide whether or not to mine the block, which could result in unfavorable
training indexes.
5.4 Distributed Reward System Abuse

A distributed reward system may de-incentivize submitters, because a malicious submitter
could re-submit a similar solution by making small changes to the original submitter's
solution and call the evaluation function before the original submitter. For this reason,
the system ensures that only the best submitter is paid. If two evaluated submissions
contain the same solution, the first submitter is rewarded.
6 Machine Learning Algorithms
This section describes the basics of the machine learning algorithms used in this
experiment:
• Decision Tree (DT): Builds classification models in the form of a tree structure by
breaking down a dataset (categorical and/or numerical data) into smaller subsets. The
result is a tree with decision nodes, which have two or more branches, and leaf nodes,
which represent a classification or decision. The topmost decision node, also known as
the root node, corresponds to the best predictor [19].
• Random Forest (RF): A supervised learning algorithm trained with the bagging
method that builds multiple decision trees and merges them together to get a more
accurate and stable prediction [20].
• Extremely Randomized Tree (ERT): Compared with random forests, this method
drops the idea of using bootstrap copies of the learning sample and, instead of trying
to find an optimal cut-point for each one of the K randomly chosen features at each
node, selects a cut-point at random [21].
• Gradient Boosting (GB): An ensemble machine learning algorithm for both
regression and classification problems that uses the boosting technique, combining
several weak learners to form a strong learner [22].
• Support Vector Machine (SVM): A discriminative classifier defined by a separating
hyperplane. The algorithm outputs an optimal hyperplane which categorizes new
examples. In 2D space this hyperplane is a line dividing the plane into two parts, with
each class lying on one side [23].
• Neural Networks (NN-MLP): The multilayer perceptron is a feedforward artificial
neural network model that consists of multiple layers, where each layer is fully
connected to the next layer, and that maps sets of input data onto a set of appropriate
outputs. The nodes of the layers are neurons using nonlinear activation functions,
except for the nodes of the input layer. There can be one or more non-linear hidden
layers between the input and the output layer [24].
• Naïve Bayes (NB): A supervised classification algorithm for solving binary or
multi-class classification problems and for predictive modeling. It calculates the
probabilities for each factor and then selects the outcome with the highest probability
using Bayes' theorem [25].
• k-Nearest Neighbors (k-NN): One of the simplest machine learning algorithms, which
can be used for both classification and regression problems. In a classification
task, each of the K most similar instances votes for its class, and the class with the
most votes is taken as the prediction. Class probabilities are calculated as the
normalized frequency of the samples that belong to each class in the set of the K most
similar instances for a new data instance [26].
• Discriminant Analysis (DA): Discovers a set of prediction equations based on
independent variables that are used to classify individuals into groups. It has two
possible objectives: (i) finding a predictive equation for classifying new individuals,
and (ii) interpreting the predictive equation to better understand the relationships
that exist among the variables [27].
• Logistic Regression (LR): The appropriate regression analysis to conduct when the
dependent variable (target) is categorical (e.g. to predict whether an email is spam (1)
or not (0)). It is used to describe data and to explain the relationship between one
dependent binary variable and one or more nominal, ordinal, interval or ratio-level
independent variables [28].
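Of the classifiers listed above, k-NN is the easiest to show without a library. The following minimal sketch assumes binary feature vectors and therefore uses the Hamming distance; the function name knn_predict, the toy data and the choice of distance are assumptions for illustration and do not describe the implementation used in the experiments.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority class among the k nearest training samples.

    Features are assumed to be binary vectors, so the Hamming distance
    (number of differing positions) is used as the distance measure.
    """
    distances = sorted(
        (sum(a != b for a, b in zip(x, query)), i)
        for i, x in enumerate(train_x)
    )
    nearest = [train_y[i] for _, i in distances[:k]]
    votes = Counter(nearest)
    label = votes.most_common(1)[0][0]
    # Class probabilities are the normalized vote frequencies.
    probabilities = {c: n / len(nearest) for c, n in votes.items()}
    return label, probabilities


# Toy data: 1 = malware, 0 = benign (values invented for illustration).
train_x = [[1, 1, 0, 0], [1, 1, 1, 0], [0, 0, 1, 1], [0, 0, 0, 1], [1, 0, 0, 0]]
train_y = [1, 1, 0, 0, 1]
print(knn_predict(train_x, train_y, [1, 1, 0, 1], k=3))
```

Here two of the three nearest neighbours are labelled 1, so the query is classified as malware with a vote share of 2/3.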
7 Methodology
In this experiment we used the ‘DREBIN’ [29] dataset, from which we took 11,120 of
123,453 real Android applications: 5,560 real malware samples from 179 different
malware families and 5,560 real benign samples. The samples were collected in the
period of August 2010 to October 2012. An overview of the top 20 malware families in
our dataset is provided in Table 1. The dataset includes numerous feature families:
api_call, feature, url, service_receiver, permission, call, intent, real_permission,
activity, provider. Note that only the top 20 malware families are shown, i.e. those
with more than 40 entries in our dataset.
Table 1. Top malware families of our dataset (values above 40)

Malware family    # Entries    Malware family        # Entries
FakeInstaller     925          Adrd                  91
DroidKungFu       667          DroidDream            81
Plankton          625          ExploitLinuxLotoor    70
Opfake            613          Glodream              69
Ginmaster         339          MobileTx              69
BaseBridge        330          FakeRun               61
Iconosys          152          SendPay               59
Kwin              147          Gappusin              58
FakeDoc           132          Imlog                 43
Geinimi           92           SMSreg                41
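The feature families named in this section are typically turned into binary feature vectors before training, and Sect. 8.2 reports that features were selected with information gain. The sketch below shows one way this could look. It assumes that every application is described by a set of strings of the form family::value; this string format, the sample values and all function names are assumptions for illustration, and the information gain is the standard entropy-based definition rather than the authors' code.

```python
import math


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(feature_column, labels):
    """Entropy reduction obtained by splitting the labels on a binary feature."""
    gain = entropy(labels)
    for value in (0, 1):
        subset = [y for f, y in zip(feature_column, labels) if f == value]
        if subset:
            gain -= (len(subset) / len(labels)) * entropy(subset)
    return gain


def encode(apps, vocabulary):
    """Turn each app's feature set into a 0/1 vector over the vocabulary."""
    return [[1 if feat in app else 0 for feat in vocabulary] for app in apps]


def top_features(apps, labels, k):
    """Rank all observed features by information gain and keep the best k."""
    vocabulary = sorted(set().union(*apps))
    matrix = encode(apps, vocabulary)
    scored = [
        (information_gain([row[j] for row in matrix], labels), feat)
        for j, feat in enumerate(vocabulary)
    ]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [feat for _, feat in scored[:k]]


# Four invented apps: label 1 = malware, 0 = benign.
apps = [
    {"permission::SEND_SMS", "api_call::getDeviceId"},
    {"permission::SEND_SMS", "permission::INTERNET"},
    {"permission::INTERNET"},
    {"permission::INTERNET", "api_call::getDeviceId"},
]
labels = [1, 1, 0, 0]
print(top_features(apps, labels, 2))
```

In this toy example the SEND_SMS permission separates the two classes perfectly, so it receives the highest information gain and is ranked first.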
8 Results and Analysis

8.1 Measurement Metrics
Confusion Matrix: A confusion matrix contains information about the actual and
predicted classifications and is used to measure the performance of an algorithm
[30, 31]. Table 2 shows the confusion matrix.

Table 2. Confusion matrix

                               Actual class
                               Positive               Negative
Predicted class   Positive     True Positive (TP)     False Positive (FP)
                  Negative     False Negative (FN)    True Negative (TN)

Accuracy (AC) is the proportion of the total number of predictions that are correct.
Overall, how often is the classifier correct?

$$\text{accuracy (AC)} = \frac{\text{No. of correctly classified data}}{\text{Total}} = \frac{TP + TN}{TP + TN + FP + FN}$$

Precision (P) is the proportion of the predicted positive cases that are correct,
determined by

$$P = \frac{TP}{TP + FP}$$

Recall or True Positive Rate (TPR) is the proportion of the positive cases that are
correctly identified, defined by

$$TPR = \frac{TP}{TP + FN}$$

False Positive Rate (FPR) is the proportion of negative cases that were incorrectly
classified as positive, defined by

$$FPR = \frac{FP}{FP + TN}$$

F1-score or F-measure is a weighted harmonic mean of the True Positive Rate (TPR), or
recall, and Precision (P), defined by

$$F = \frac{(\beta^2 + 1) \cdot P \cdot TPR}{\beta^2 \cdot P + TPR}$$

where β has a value from 0 to infinity and is used to control the weight assigned to
TPR and P.
ROC Curve is a graph that summarizes the performance of the classifier over all
possible thresholds. It is generated by plotting the True Positive Rate (TPR) on the
Y-axis against the False Positive Rate (FPR) on the X-axis.
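The metrics of this section can be computed directly from the four confusion-matrix counts. The sketch below is an illustration with invented counts and hypothetical function names; it uses the standard definition of the false positive rate, FP / (FP + TN), and does not guard against empty classes (division by zero).

```python
def classification_metrics(tp, fp, fn, tn, beta=1.0):
    """Compute accuracy, precision, recall, FPR and F-measure from counts."""
    total = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # true positive rate
    fpr = fp / (fp + tn)           # false positive rate
    f_measure = ((beta ** 2 + 1) * precision * recall
                 / (beta ** 2 * precision + recall))
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "false_positive_rate": fpr,
        "f_measure": f_measure,
    }


def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per distinct score used as a threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Invented counts: 90 TP, 10 FP, 20 FN, 80 TN.
print(classification_metrics(tp=90, fp=10, fn=20, tn=80))
# Invented classifier scores with their true labels (1 = positive).
print(roc_points([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

With beta = 1 the F-measure reduces to the usual F1-score, which for these counts equals 2·TP / (2·TP + FP + FN) = 180/210.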
8.2 Results
We compared the results of the machine learning classifiers submitted by the competitors
or miners in a simulated environment. Several competitions were held, and their
respective accuracy results were recorded (see Figs. 4 and 5). We conducted 4
competitions for our experiment, which are described in Table 3.
Table 3. Result of the performance

Competition            Submitter  Algorithm  Precision    Recall       f1-score     Accuracy (%)
(accuracy criteria)                          0     1      0     1      0     1
COMP1 (80%)            1          k-NN       0.92  0.89   0.89  0.92   0.90  0.91   90.47
                       2          DT         0.93  0.88   0.88  0.94   0.91  0.91   91.78
                       3          LR         0.78  0.84   0.86  0.75   0.82  0.80   80.94
                       4          NB         0.68  0.52   0.15  0.92   0.25  0.66   53.60
                       5          NN (MLP)   0.85  0.91   0.92  0.83   0.88  0.87   87.54
COMP2 (85%)            1          ERT        0.93  0.93   0.93  0.93   0.93  0.93   93.66
                       2          DA         0.76  0.88   0.91  0.70   0.83  0.78   80.71
                       3          SVM        0.91  0.91   0.91  0.91   0.91  0.91   90.74
                       4          GB         0.87  0.88   0.89  0.86   0.88  0.87   87.50
                       5          k-NN       0.92  0.89   0.89  0.92   0.90  0.91   90.47
COMP3 (90%)            1          DA         0.76  0.88   0.91  0.70   0.83  0.78   80.71
                       2          GB         0.87  0.88   0.89  0.86   0.88  0.87   87.50
                       3          SVM        0.91  0.91   0.91  0.91   0.91  0.91   90.74
                       4          LR         0.78  0.84   0.86  0.75   0.82  0.80   80.94
                       5          RF         0.95  0.94   0.94  0.95   0.94  0.94   94.33
COMP4 (95%)            1          RF         0.95  0.94   0.94  0.95   0.94  0.94   94.33
                       2          ERT        0.93  0.93   0.93  0.93   0.93  0.93   93.66
                       3          DT         0.93  0.88   0.88  0.94   0.91  0.91   91.78
                       4          NN (MLP)   0.85  0.91   0.92  0.83   0.88  0.87   87.54
                       5          SVM        0.91  0.91   0.91  0.91   0.91  0.91   90.74
Fig. 4. Accuracy curves

Fig. 5. ROC curves
Finally, we obtained our best outcome by running several trials, using 80% of the data
for training and 20% for testing, with the top 100 most frequent API calls and system
permissions selected by information gain. Each competition in the experiment allowed 5
competitors to submit their trained models to the blockchain. The results differ from
competition to competition. In the first competition, the accuracy criterion is 80%, so
any submitter whose accuracy is greater than or equal to this criterion is eligible for
the reward; Submitter_2 wins the reward by obtaining the highest accuracy, 91.78%, using
the Decision Tree classifier. Similarly, in the second competition Submitter_1 is
rewarded for obtaining an accuracy of 93.66% using Extremely Randomized Tree, which
fulfills the accuracy criterion (85%). In the third competition, Submitter_5 is rewarded
for obtaining 94.33% using the Random Forest classifier, which fulfills the accuracy
criterion (90%). In the fourth competition, however, no one is rewarded because no
submission fulfills the accuracy criterion of 95%, so the reward amount is returned to
the initiator who created the competition. We define the “BEST” outcome as the
experiment that maximizes the difference between the TPR and the FPR, since we
achieved a higher TPR and a lower FPR, as shown in Table 3.
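The outcome of the four competitions can be reproduced from the accuracy column of Table 3 with a few lines of Python. The accuracy values and criteria are copied from Table 3; the function name select_winner and the data layout are hypothetical, and the rule applied (highest accuracy among the submissions that meet the criterion, earliest submitter on ties) is a simplified reading of the competition rules.

```python
# Accuracy criterion (%) and per-submitter (algorithm, accuracy %) from Table 3.
COMPETITIONS = {
    "COMP1": (80.0, [("k-NN", 90.47), ("DT", 91.78), ("LR", 80.94),
                     ("NB", 53.60), ("NN (MLP)", 87.54)]),
    "COMP2": (85.0, [("ERT", 93.66), ("DA", 80.71), ("SVM", 90.74),
                     ("GB", 87.50), ("k-NN", 90.47)]),
    "COMP3": (90.0, [("DA", 80.71), ("GB", 87.50), ("SVM", 90.74),
                     ("LR", 80.94), ("RF", 94.33)]),
    "COMP4": (95.0, [("RF", 94.33), ("ERT", 93.66), ("DT", 91.78),
                     ("NN (MLP)", 87.54), ("SVM", 90.74)]),
}


def select_winner(criteria, submissions):
    """Return (submitter_id, algorithm, accuracy) of the winner, or None."""
    best = None
    for submitter_id, (algorithm, accuracy) in enumerate(submissions, start=1):
        # Strict '>' keeps the earlier submitter when accuracies are equal.
        if accuracy >= criteria and (best is None or accuracy > best[2]):
            best = (submitter_id, algorithm, accuracy)
    return best


for name, (criteria, submissions) in COMPETITIONS.items():
    winner = select_winner(criteria, submissions)
    print(name, winner if winner else "no winner, reward refunded to initiator")
```

Running the loop selects Submitter_2 (DT), Submitter_1 (ERT) and Submitter_5 (RF) for the first three competitions and no winner for the fourth, matching the discussion above.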
9 Conclusion and Future Work
In this paper, we evaluated various machine learning models for a given dataset in a
consortium blockchain network. An incentive is awarded to the submitter of the best
machine learning model. The analysis of the dataset by competitors helps various
organizations in the network to enhance or boost their current malware detection tools.
The decentralized network provides transparency, enhances security and reduces the cost
of managing all relevant data by eliminating third parties.
For future work, we will develop a real-time malware detection system with blockchain
integration to detect and prevent unknown malicious attacks on a network. Also, we
will conduct more experiments with additional datasets and other machine learning
algorithms to improve malware detection.
Acknowledgment. The authors wish to acknowledge the valuable help received from Besir
Kurtulmus, Algorithmia Inc., for his guidance on technology and domain knowledge pertaining
to applying machine learning within blockchain.
References
1. Drake, J.J., Lanier, Z., Mulliner, C., Fora, P.O., Ridley, S.A., Wicherski, G.: Android
Hacker’s Handbook. Wiley, Indianapolis (2014)
2. Rana, M.S., Sung, A.H.: Malware analysis on android using supervised machine learning
techniques. Int. J. Comput. Commun. Eng. 7(4), 178–188 (2018)
3. Rana, M.S., Rahman, S.S.M.M., Sung, A.H.: Evaluation of tree based machine learning
classifiers for android malware detection. In: Nguyen, N., Pimenidis, E., Khan, Z.,
Trawiński, B. (eds.) Computational Collective Intelligence. ICCCI 2018. Lecture Notes in
Computer Science, vol. 11056. Springer, Cham (2018).
https://doi.org/10.1007/978-3-319-98446-9_35
4. Rana, M.S., Gudla, C., Sung, A.H.: Android malware detection using stacked generalization.
In: Proceeding of 27th International Conference on Software Engineering and Data
Engineering, pp. 15–19 (2018)
5. Rana, M.S., Gudla, C., Sung, A.H.: Evaluating machine learning models for android
malware detection – a comparison study. In: Proceeding of International Conference on
Network, Communication, and Computing, Taipei, Taiwan (2018)
6. Enck, W., Gilbert, P., Chun, B., Cox, L.P., Jung, J., McDaniel, P., Sheth, A.: Taintdroid: an
information-flow tracking system for real-time privacy monitoring on smartphones. In:
Proceeding of USENIX Symposium on Operating Systems Design and Implementation
(OSDI), pp. 393–407 (2010)
7. Zhou, Y., Wang, Z., Zhou, W., Jiang, X.: Hey, you, get off my market: detecting malicious
apps in official and alternative android markets. In: Proceeding of Network and Distributed
System Security Symposium (NDSS) (2012)
8. Yan, L.K., Yin, H.: Droidscope: seamlessly reconstructing OS and dalvik semantic views for
dynamic android malware analysis. In: Proceeding of USENIX Security Symposium (2012)
9. Enck, W., Ongtang, M., McDaniel, P.D.: On lightweight mobile phone application
certification. In: Proceeding of ACM Conference on Computer and Communications
Security (CCS), pp. 235–245 (2009)
10. Felt, A.P., Chin, E., Hanna, S., Song, D., and Wagner, D.: Android permissions demystified.
In: Proceeding of ACM Conference on Computer and Communications Security (CCS),
pp. 627–638 (2011)
11. Grace, M., Zhou, Y., Zhang, Q., Zou, S., Jiang, X.: Risk-ranker: scalable and accurate zero-
day android malware detection. In: Proceeding of International Conference on Mobile
Systems, Applications, and Services (MOBISYS), pp. 281–294 (2012)
12. Kurtulmus, A.B., Daniel, K.: Trustless Machine Learning Contracts; Evaluating and
Exchanging Machine Learning Models on the Ethereum Blockchain, Algorithmia Research (2018).
https://algorithmia.com/static/documents/d3a4c04/Machine-Learning-Models-on-the-Ethereum-Blockchain.pdf.
Accessed 18 Sept 2018
13. Gu, J., Sun, B., Du, X., Wang, J., Zhuang, Y., Wang, Z.: Consortium blockchain-based
malware detection in mobile devices. In: IEEE Access, vol. 6, pp. 12118–12128 (2018).
https://doi.org/10.1109/access.2018.2805783
14. Raje, S., Vaderia, S., Wilson, N., Panigrahi, R.: Decentralised firewall for malware detection.
In: 2017 International Conference on Advances in Computing, Communication and Control
(ICAC3), pp. 1–5 (2017)
15. Ouaguid, A., Abghour, N., Ouzzif, M.: A novel security framework for managing android
permissions using blockchain technology. Int. J. Cloud Appl. Comput. (IJCAC) 8(1), 55–79
(2018)
16. Noyes, C.: BitAV: Fast Anti-Malware by Distributed Blockchain Consensus and
Feedforward Scanning, CoRR, abs/1601.01405 (2016)
17. Firdaus, A., Anuar, N.B., Razak, M.F., Hashem, I.A., Bachok, S., Sangaiah, A.K.: Root
exploit detection and features optimization: mobile device and blockchain based medical
data management. J. Med. Syst. 42, 1–23 (2018)
18. Moubarak, J., Filiol, E., Chamoun, M.: Developing a K-ary malware using Blockchain.
https://arxiv.org/abs/1804.01488. Accessed 20 Oct 2018
19. Decision Tree – Classification. https://www.saedsayad.com/decision_tree.htm. Accessed 20
Oct 2018
20. Towards Data Science | The Random Forest Algorithm.
https://towardsdatascience.com/the-random-forest-algorithm-d457d499ffcd. Accessed 20 Oct 2018
21. Geurts, P., Ernst, D., Wehenkel, L.: Extremely randomized trees. Mach. Learn. 63(1), 3–42
(2006)
22. A Comprehensive Guide to Ensemble Learning.
https://www.analyticsvidhya.com/blog/2018/06/comprehensive-guide-for-ensemble-models/.
Accessed 20 Oct 2018
23. Towards Data Science | Support Vector Machine - Introduction to Machine Learning
Algorithms.
https://towardsdatascience.com/support-vector-machine-introduction-to-machine-learning-algorithms-934a444fca47.
Accessed 20 Oct 2018
24. Neural Networks with Scikit. https://www.python-course.eu/neural-networks-with-scikit.php.
Accessed 20 Oct 2018
25. Naive Bayes for Machine Learning.
https://machinelearningmastery.com/naive-bayes-for-machine-learning/. Accessed 20 Oct 2018
26. K-Nearest Neighbors for Machine Learning.
https://machinelearningmastery.com/k-nearest-neighbors-for-machine-learning/.
Accessed 20 Oct 2018
27. Discriminant Analysis.
https://ncss-wpengine.netdna-ssl.com/wp-content/themes/ncss/pdf/Procedures/NCSS/Discriminant_Analysis.pdf.
Accessed 20 Oct 2018
28. Towards Data Science | Logistic Regression - Detailed Overview.
https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc.
Accessed 20 Oct 2018
29. Arp, D., Spreitzenbarth, M., Hubner, M., Gascon, H., Rieck, K.: DREBIN: effective and
explainable detection of android malware in your pocket. In: NDSS, vol. 14, pp. 23–26,
USA (2014)
30. Confusion Matrix.
http://www2.cs.uregina.ca/~dbd/cs831/notes/confusion-matrix/confusion-matrix.html.
Accessed 20 Oct 2018
31. Simple guide to confusion matrix terminology.
http://www.dataschool.io/simple-guide-to-confusion-matrix-terminology/. Accessed 20 Oct 2018