detection: A roadmap
1 Introduction
Machine Learning (ML) [1] plays a crucial role in analyzing voluminous data today,
because improved hardware and sophisticated algorithms are readily available and
continue to evolve. ML algorithms take the following approaches to solving
real-world problems:
Unsupervised learning - It is also known as the data-driven approach, in which labels
for the output variables are not available for any data point. Based on the properties
of the data itself, the ML model can reveal interesting patterns. It is generally used
to find anomalies in data.
Semi-supervised learning - This approach combines the advantages of both supervised
and unsupervised learning and applies when only some labeled data is available.
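The idea of finding anomalies without labels can be illustrated with a minimal sketch. It assumes a simple statistical rule (the modified z-score based on the median absolute deviation) rather than any model used in this paper; the function name, the threshold of 3.5 and the sample data are all hypothetical.

```python
from statistics import median

def mad_anomalies(values, threshold=3.5):
    """Return the values whose modified z-score exceeds `threshold`.

    No labels are used: the score is derived from the data itself via the
    median and the median absolute deviation (MAD).
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        return []  # no spread to measure against
    # 0.6745 scales the MAD so the score is comparable to a standard z-score.
    return [v for v in values if 0.6745 * abs(v - med) / mad > threshold]

# Hypothetical per-minute request counts from one app; one burst stands out.
requests_per_minute = [12, 14, 13, 15, 12, 13, 14, 240, 13, 12]
print(mad_anomalies(requests_per_minute))
```

The burst of 240 requests is the only value flagged, although no data point was ever labeled as normal or anomalous.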
The second dimension answers “what”, that is, the technical level at which issues are
monitored. The layers listed in this dimension are network, endpoint, application,
user and process. Table 2 describes how ML/DL techniques can be used to monitor
anomalies in different layers.
2 Related Work
There are two aspects of related research work summarized in this paper: the role of
ML/DL in cyber-security domains and the role of ML/DL in detecting Android
vulnerabilities.
Tchakounté and Hayata [13] used supervised ML to detect Android malware. They
used permissions as a feature to detect malicious behavior. Maier et al. [14]
demonstrated that Android malware can bypass many AntiVirus (AV) tools and
Google Bouncer. Hussain et al. [15] presented a conceptual framework for improving
user privacy and securing medical data in Android mobile health (mHealth)
applications. Liang et al. [16] proposed an end-to-end DL model for
Android malware detection using raw system call sequences. They achieved an
accuracy of 93.16%. Ganesh et al. [17] presented a CNN-based malware detection
solution using permissions. This solution detected malware with an accuracy of 93%.
Garg and Baliyan [18] proposed a novel parallel classifier scheme for detection of
vulnerabilities in Android with an accuracy of 98.27%. Details of the data collection
and various steps of preprocessing are discussed in [19].
3 Android Architecture
Android is a Linux-based mobile operating system with features such as a shared
memory mechanism, the binder IPC mechanism and a power manager. Five software
layers sit on top of the Linux kernel: the hardware abstraction layer, native
libraries, Android runtime, application framework and system applications
(apps) [20], as shown in Figure 2.
Hardware Abstraction Layer (HAL) - It acts as the interface through which the
Android application framework communicates with hardware-specific device drivers
such as camera and Bluetooth. HAL is hardware-specific, and its implementation
varies from vendor to vendor.
Native Libraries - Core system components and services of Android, such as Android
Runtime (ART) and the Hardware Abstraction Layer (HAL), are built from native
libraries written in C/C++. There are different libraries, such as application
framework libraries and libraries for building user interfaces, drawing graphics and
accessing databases.
Android Runtime (ART) - ART was introduced as the new runtime environment in
Android 5.0 and later versions. It combines ahead-of-time (AOT) compilation,
performed during app installation, with just-in-time (JIT) compilation to compile the
Dalvik bytecode into native binaries (ELF format). This optimizes garbage collection
and power consumption and achieves high runtime performance.
Application Framework - The Android SDK provides tools and API libraries for
developing Android applications in Java. This framework is known as the Android
Application Framework. Important features include a database for storing data,
support for audio, video and image formats, and debugging tools.
System Applications - Applications are located at the topmost layer of the
Android stack. These consist of both native and third-party applications such as web
browser, email and SMS messenger; third-party applications are installed by the user.
Fig. 2. Android Architecture
4 Android Vulnerabilities
Denial of Service (DoS) - A DoS vulnerability makes resources unavailable by
tampering with network packets, logic, programming, etc. Service to legitimate users
ceases when the system is flooded with a large number of requests.
Arbitrary code can be injected and executed while performing DoS attacks to access
critical information. DoS attacks have a direct impact on availability by
introducing large response delays, service interruptions and excessive losses.
Code Execution - This vulnerability is exploited by executing arbitrary code and is
caused by improper validation of input/output data. The arbitrary code can take
control of privileges and change or delete data with full user rights.
Overflow - An overflow vulnerability occurs when a malicious program places more
data in a memory area than was originally allocated for it. The excess data can
corrupt or overwrite the existing data. It can also carry special instructions
that trigger a response to damage files, access personal information or change
the data.
Memory Corruption - A memory corruption vulnerability occurs when the contents of
a memory location are altered without an explicit assignment. Programming errors can
enable attackers to execute arbitrary code that modifies the contents of a
memory location.
SQL Injection - An SQL query is injected via the input data sent from the client to
the application. This query can access sensitive data in the database, modify the
database, shut down the entire database management system, etc.
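The injection mechanism can be sketched with Python's standard sqlite3 module. This is an illustration under assumptions, not code from any application studied here: the table, the function names and the payload are hypothetical. It contrasts a query built by string concatenation with a parameterized query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("alice", "a-secret"), ("bob", "b-secret")],
)

def find_user_unsafe(name):
    # Vulnerable: the input is pasted into the SQL text and parsed as SQL.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()

def find_user_safe(name):
    # Parameterized: the input is bound as a value and never parsed as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()

payload = "x' OR '1'='1"  # hypothetical attacker input
print(len(find_user_unsafe(payload)))  # every row in the table is returned
print(len(find_user_safe(payload)))    # no row matches the literal string
```

With the concatenated query the payload rewrites the WHERE clause so that it is always true, whereas the bound parameter is compared as plain text.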
Cross Site Scripting (XSS) - An XSS vulnerability arises from the injection of
malicious scripts into benign and trusted websites. The attacker uses a browser-side
script to send malicious code to the end user. This succeeds when the web application
takes input from the user and generates output without any validation. The malicious
script can change the HTML content of a web page and access session tokens, cookies,
or any other sensitive information used by the browser.
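One common mitigation for output generated without validation is to escape user input before placing it in HTML. The following is a minimal sketch using Python's standard `html.escape`; the function `render_comment` and the sample input are hypothetical, and real applications typically rely on a templating engine that escapes by default.

```python
from html import escape

def render_comment(comment):
    # Escape <, >, & and quotes so the browser displays the text
    # instead of interpreting it as markup or script.
    return "<p>" + escape(comment, quote=True) + "</p>"

# Hypothetical malicious input submitted through a comment form.
print(render_comment("<script>steal(document.cookie)</script>"))
```

The escaped output contains `&lt;script&gt;` rather than a live script tag, so the payload is shown as text.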
Directory Traversal - A directory traversal vulnerability, also known as path
traversal, gives access to directories and files that are stored outside the web root
folder. Arbitrary files and directories, including critical system files, can be
accessed by manipulating absolute file paths. On Android, it takes the form of an
HTTP exploit in which attackers carry out a path traversal attack in the context of a
user application and read/write files inside internal storage.
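A typical defence is to resolve the requested path and check that it still lies under the intended base directory. The sketch below uses Python's `pathlib`; the function name `is_inside` and the base directory are hypothetical and only illustrate the check.

```python
from pathlib import Path

def is_inside(base_dir, user_path):
    """Return True only if user_path, resolved against base_dir, stays under it."""
    base = Path(base_dir).resolve()
    # Joining with an absolute user_path discards base, so both
    # "../" sequences and absolute paths are caught by the check below.
    candidate = (base / user_path).resolve()
    return candidate == base or base in candidate.parents

base = "/srv/app/files"  # hypothetical storage root
print(is_inside(base, "notes/a.txt"))
print(is_inside(base, "../../etc/passwd"))
print(is_inside(base, "/etc/passwd"))
```

Only the first request stays inside the base directory; the other two resolve to locations outside it and are rejected.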
HTTP Response Splitting - This vulnerability occurs when malicious characters
like carriage return (\r) and line feed (\n) are inserted in the HTTP response header
and sent to the end user without any validation. These characters not only give
attackers direct control of the remaining headers and body of the response the
application intends to send, but also allow them to create additional responses
entirely under their control.
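The missing validation step can be sketched as a check that rejects CR and LF in any value placed in a response header. The helper names and the redirect response below are hypothetical, and the sketch builds the response by hand only to make the header boundary visible.

```python
def safe_header_value(value):
    """Reject header values containing CR or LF instead of passing them on."""
    if "\r" in value or "\n" in value:
        raise ValueError("CR/LF not allowed in header value")
    return value

def build_redirect(location):
    # The header ends at the first CRLF, so an unchecked CRLF in `location`
    # would let the caller start new headers or a whole new response.
    return "HTTP/1.1 302 Found\r\nLocation: " + safe_header_value(location) + "\r\n\r\n"

print(repr(build_redirect("/home")))
try:
    build_redirect("/home\r\nSet-Cookie: session=attacker")
except ValueError as err:
    print("rejected:", err)
```

The second call is refused because the injected CRLF would otherwise add a `Set-Cookie` header chosen by the attacker.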
Bypass Something - This vulnerability occurs when attackers can bypass
authentication mechanisms. Attackers can access unprotected files and can attack
protected applications by evading the authentication system.
Gain Information - This vulnerability allows attackers to gain privileges via a
malicious program in the affected application. It allows local users to gain privileges
via a crafted application that makes an API call to access sensitive information in the
registry.
Gain Privileges - It can occur when an attacker exploits a design or
configuration flaw in an application or operating system to gain access to
resources and confidential data. The resources are then unavailable to the users.
Attackers can steal credentials and other sensitive information and can execute
arbitrary code.
It is important to analyze the trend of Android vulnerabilities from 2009 to 2019.
The number of vulnerabilities increased continuously until 2017 and then decreased
steeply in 2018 and 2019.
This decrease in the number of vulnerabilities is attributed to better detection
rates using ML and DL algorithms. Table 3 shows detection rates of ML/DL algorithms
from 2016 onwards.
Memory Corruption 38 32 12 19 101 4%
Others 174 260 190 -181 (not mutually exclusive) 443 18%
# of Vulnerabilities 525 843 613 414 2395 100%
Memory Corruption 38 19 7% 5% -3 pp
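The percentages and the percentage-point (pp) change in the row above can be reproduced from the counts, assuming that 38 and 19 belong to the same two columns as the totals 525 and 414 in the "# of Vulnerabilities" row. That column pairing is an assumption based on the numbers, since the column headers did not survive extraction; the helper name is hypothetical.

```python
def share_percent(count, total):
    """Share of `count` in `total`, in percent."""
    return 100.0 * count / total

first = share_percent(38, 525)  # Memory Corruption count / total, first column
last = share_percent(19, 414)   # Memory Corruption count / total, last column
change_pp = last - first        # difference of unrounded shares, in pp

print(round(first), round(last), round(change_pp))
```

The change is computed from the unrounded shares, which is why it comes out at -3 pp rather than the -2 pp that subtracting the rounded values would give.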
The assessment is carried out not merely on the impact score of each vulnerability,
but also on its number of instances. Table 6 shows the average impact score of each
vulnerability, and Count of CVE ID shows the instances of each vulnerability. The
number of instances depicts how widespread each vulnerability is, and the impact
score depicts its severity level. This ensures that both the volume and the impact of
each vulnerability are captured.
Further, impact analysis of a single instance is not enough to judge the severity of
a vulnerability. A vulnerability with an average impact score but a large number of
instances is widespread, and the higher number of cases makes it more dangerous.
This is shown in Figure 4.
Fig. 4. Total impact of different types of vulnerabilities
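One way to combine volume and severity into a single figure is to multiply the average impact score by the number of instances. The sketch below assumes this product as the aggregation, which the text does not state explicitly, and uses hypothetical values that are not taken from Table 6.

```python
def total_impact(avg_impact_score, instance_count):
    """Assumed aggregation: average impact score times number of CVE IDs."""
    return avg_impact_score * instance_count

# Hypothetical (average impact score, number of CVE IDs) pairs.
examples = {
    "rare but severe": (10.0, 5),
    "common but moderate": (6.5, 200),
}

ranked = sorted(examples, key=lambda name: total_impact(*examples[name]), reverse=True)
for name in ranked:
    print(name, total_impact(*examples[name]))
```

Under this assumption the moderate but frequent vulnerability ranks above the severe but rare one, which matches the argument that a single-instance score alone understates widespread vulnerabilities.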
High 4 0 1 5
Medium 33 5 13 51
Low 25 4 15 44
Table 8 shows the impact of vulnerability scores on integrity. It is seen that high-,
medium- and low-level vulnerabilities have a complete impact on integrity, i.e., the
attacker can modify the information, resulting in a total compromise of integrity.
High 4 0 0 4
Medium 32 11 8 51
Low 24 11 10 45
High 4 0 0 4
Medium 35 7 8 50
Low 27 7 11 45
7 Conclusions
References