
FORM 2

THE PATENTS ACT, 1970

(39 of 1970)

&

THE PATENT RULES, 2003



COMPLETE SPECIFICATION

[See Section 10 and Rule 13]

TITLE:
“A BYTE CODE FEATURE BASED MALWARE
DETECTION SYSTEM FOR MOBILE PLATFORM”

APPLICANT:

NATIONAL INSTITUTE OF TECHNOLOGY, PATNA


An Indian Autonomous body
having address at
Patna-800005, Bihar, India

PREAMBLE TO THE DESCRIPTION:

The following specification particularly describes the invention and the
manner in which it is to be performed:

FIELD OF THE INVENTION

The present invention relates to a malware detection system. More particularly,
the present invention is directed towards a byte code feature-based malware
detection system for identifying and reporting a malicious application on
mobile platforms such as Android.

BACKGROUND OF THE INVENTION

A malware attack is a malicious activity that executes unauthorized actions to
harm a user's system or to gain access to it without the user's permission.
Malware attacks are extremely harmful to a person, company or government
target if the malicious user steals confidential information. Such attacks
cause various problems, ranging from corrupting a single computer system to
compromising an orchestrated large system or extracting payment information
from the victim's device. The objective of the malware depends on the
requirement or imagination of the cyber-criminal. Hence, it is necessary to
detect and remove malware in order to resolve the problems described above.

Malware is delivered through e-mail, text messages, applications, weak network
services and various physical devices. Attackers use evasion and obfuscation
techniques, such as hiding the IP address and changing the malware's code, to
fool users and to avoid detection by security administrators and anti-malware
products. Additionally, there are various types of malware attacks, such as
adware, fileless malware, viruses and worms. Therefore, there is a long-felt
need for a platform, system or device for identifying and securing against all
types and techniques of malware attack.

In order to protect the system from malware activities, various malware
detection methods are in use. Most existing malware identification is signature
based: a specific pattern from a known malware sample is used as a signature to
identify future malware. These signature-based methods cannot detect a zero-day
attack, as its signature is not present in the signature database. There are
two types of malware detection methods: static detection methods and dynamic
detection methods. Static detection methods are widely used; the suspect file
is reverse-engineered and information such as permissions and API classes,
called features, is extracted from it. These features are then used to classify
the suspect file. Dynamic approaches analyse the suspect application by
executing it in a closed, safe environment such as a virtual machine or sandbox
in order to extract features from the application. The extracted features are
then used to detect whether the suspect application is malware or not.
Executing an application in this way is difficult on resource-constrained or
general-use devices, and it may infect the system, which makes dynamic
approaches undeployable on a general-purpose mobile device.

One research article suggested a nearest neighbor-based model to detect malware
using opcodes, application programming interfaces and actions. The method used
Doc2vec to generate sensitive semantic information, which makes it difficult to
deploy on mobile devices.

With rapid development in the data security field, several approaches were
introduced that use limited permissions to classify benign and malicious
applications through association rules, with a convolutional neural network
(CNN) model to detect malicious code. This method uses permission trigger risk
and pattern recognition of IoT devices. However, this methodology could not
cope with the advancement of malware families, due to a dearth of behavioral
semantics analysis.

US9853997B2 discloses a multi-channel change point malware detection system
that detects changes in host behavior whenever malware execution happens, using
linear discriminant analysis (LDA) for feature extraction, multi-channel change
point detection and a host-wide diagnosis through a data fusion center (DFC),
which uses the decisions from the local detectors to infer whether the host
computer is infected by malware. The device is monitored through sensors
corresponding to predetermined features. However, this invention does not work
for applications using obfuscation and simple encryption techniques.

US10511614B1 discloses subscription-based malware detection under management
system control that collectively provides a distributed malware detection
scheme for network traffic analysis. The malware detection system is designed
to set the first level of malware detection based on a first subscription level
purchased by a subscriber and to control operability. However, this invention
does not work with byte code information, the use of which makes a system
lightweight and faster.

US10382479B2 discloses a method of analysis using internal and/or external
malware detection operations by modifying an environment, executing on a
particular device, to form a modified environment. The system performs the
external operation by performing a communication from the particular device.
The system monitors the modified environment for a first behavior indicative of
the malware infection. The system detects the first or second behavior whenever
it occurs. The notification causes one or more network devices to block network
traffic to or from the client device. However, this invention does not scan for
viruses that are incorporated using obfuscation techniques.

WO2016/191486A1 discloses a method that involves a network infrastructure
device receiving a flow of packets determined using network infrastructure. The
first subset is a first datagram, and the first length of the first datagram is
determined. The second subset is a second datagram that arrives after the first
one, and the second length of the second datagram is determined. A duration
value is calculated between a first arrival time of the first datagram and a
second arrival time of the second datagram.

Therefore, due to the above-mentioned drawbacks, there is a need for an improved
malware detection system that produces results faster and more efficiently.

OBJECT OF THE INVENTION

The main object of the present invention is to provide a byte code feature-based
malware detection system for mobile platforms such as Android devices.

Another object of the present invention is to provide a byte code feature-based
malware detection system that uses only the complete hex code value from the
JAR file of an application on the mobile platform to detect whether the
application is malware or not.

Yet another object of the present invention is to provide a byte code
feature-based malware detection system that takes the difference between two
hex values, which defies simple encryption techniques, so that the system works
for obfuscated applications as well.

Yet another object of the present invention is to provide a byte code
feature-based malware detection system that normalizes the extracted features
to ensure consistency.

Still another object of the present invention is to provide a byte code
feature-based malware detection system that is lightweight and faster as
compared to conventional systems.

SUMMARY OF THE INVENTION

The present invention relates to a system for detecting mobile malware by using
byte code features. The system scans the mobile platform, detects obfuscated
applications and extracts several features, such as 2-gram byte counts and byte
pair differences, in a manner that ensures consistency.

In an embodiment, the present invention provides a byte code feature-based
malware detection system for identifying and reporting a malicious activity in
the mobile platform, comprising a data acquisition unit and a processor,
wherein: said data acquisition unit extracts a hex opcode sequence from the
mobile platform and converts said hex opcode sequence into a long hex string;
said processor processes said long hex string to detect malware in said mobile
platform via a feature extraction module and a classification module; said
feature extraction module applies a 2-gram feature extraction protocol to
extract a set of feature vectors from said long hex string; and said
classification module divides said feature vectors into a training set, a
validation test set and a testing set and evaluates said sets to distinguish
whether said mobile application is malware or benign.

In another embodiment, the present invention provides the byte code
feature-based malware detection system wherein said system works via a method
comprising: a) extracting said hex opcode sequence from said mobile platform
and converting said hex opcode sequence into said long hex string; b)
processing said long hex string to detect malware in said mobile platform via
said feature extraction module and said classification module; c) applying a
2-gram feature extraction protocol to extract a set of feature vectors from
said long hex string; and d) using said feature vectors to evaluate said
application on said mobile platform as malware or benign.

The above objects and advantages of the present invention will become
apparent from the hereinafter set forth brief description of the drawings,
detailed description of the invention, and claims appended herewith.

BRIEF DESCRIPTION OF THE DRAWINGS

An understanding of the byte code feature-based malware detection system for
mobile platform may be obtained by reference to the following drawings:

Figure 1 is a block diagram of the system for detecting malware on a mobile
platform according to an embodiment of the present invention.

Figure 2 is a block diagram representing data acquisition, data pre-processing,
and model training at the cloud according to an embodiment of the present
invention.

Figure 3 is a block diagram representing the working steps of malware
classification according to an embodiment of the present invention.

Figure 4 is a pictorial view of sample hex code of a mobile application
according to an embodiment of the present invention.

Figure 5 is a graphical view of the sequence of hex code from the two-gram
calculation according to an embodiment of the present invention.

Figure 6 is a tabular representation of feature-extracted values from a mobile
application according to an embodiment of the present invention.

Figure 7 is a tabular representation of normalized feature-extracted values
from a mobile application according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention will now be described hereinafter with reference to the
accompanying drawings, in which a preferred embodiment of the invention is
shown. This invention may, however, be embodied in many different forms and
should not be construed as being limited to the embodiment set forth herein.
Rather, the embodiment is provided so that this disclosure will be thorough,
and will fully convey the scope of the invention to those skilled in the art.

Many aspects of the invention can be better understood with reference to the
drawings below. The components in the drawings are not necessarily drawn to
scale. Instead, emphasis is placed upon clearly illustrating the components of
the present invention. Moreover, like reference numerals designate
corresponding parts throughout the several views in the drawings. Before
explaining at least one embodiment of the invention, it is to be understood
that the embodiments of the invention are not limited in their application to
the details of construction and to the arrangement of the components set forth
in the following description or illustrated in the drawings. The embodiments of
the invention are capable of being practiced and carried out in various ways.
In addition, the phraseology and terminology employed herein are for the
purpose of description and should not be regarded as limiting.

The present invention relates to a byte code feature-based malware detection
system that scans the mobile platform whenever any application is installed and
identifies a malicious application on the mobile platform.

In an embodiment, the present invention provides a byte code feature-based
malware detection system for identifying and reporting a malicious activity in
the mobile application, comprising a data acquisition unit and a processor,
wherein: said data acquisition unit extracts a hex opcode sequence from the
mobile platform and converts said hex opcode sequence into a long hex string;
said processor processes said long hex string to detect malware in said mobile
platform via a feature extraction module and a classification module; said
feature extraction module applies a 2-gram feature extraction technique to
extract a set of feature vectors from said long hex string; and said
classification module divides said feature vectors into a training set, a
validation test set and a testing set and evaluates said sets to distinguish
whether said mobile application is malware or benign.

In another embodiment, the present invention provides the byte code
feature-based malware detection system wherein said system works via a method
comprising: a) extracting said hex opcode sequence from said mobile platform
and converting said hex opcode sequence into said long hex string; b)
processing said long hex string to detect malware in said mobile platform via
said feature extraction module and said classification module; c) applying a
2-gram feature extraction protocol to extract a set of feature vectors from
said long hex string; and d) classifying said application using the features
obtained, to distinguish whether said mobile application is malware or benign.

Figure 1 represents a block diagram of a byte code feature-based malware
detection system (5). The system comprises a data acquisition unit (1) and a
processor (2), wherein the processor (2) includes a feature extraction module
and a classification module. Said data acquisition unit (1) extracts a hex
opcode sequence from the mobile platform and converts the hex opcode sequence
into a long hex string; said feature extraction module applies a 2-gram feature
extraction and produces a set of feature vectors in tabular format from said
long hex string; said classification module divides said set of feature vectors
into a training set and a testing set; and said training and testing sets are
used to train and validate the model on said set of feature vectors and to
classify the application as malware or benign.
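
As a minimal sketch of how the classification module might partition labelled
feature vectors, the following Python function shuffles the samples with a fixed
seed and splits them by ratio. The function name `split_dataset`, the seed and
the default ratio of 0.7 (chosen to mirror the 70:30 ratio recited in claim 5)
are illustrative assumptions and are not taken from the disclosed system.

```python
import random


def split_dataset(samples, train_ratio=0.7, seed=42):
    """Shuffle labelled samples and split them into training and testing sets.

    `samples` is a list of (feature_vector, label) pairs. The ratio and the
    seed are illustrative assumptions.
    """
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must lie strictly between 0 and 1")
    shuffled = list(samples)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle for repeatability
    cut = int(round(len(shuffled) * train_ratio))
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical applications, each with a dummy feature vector and a label.
apps = [([i / 10.0], "malware" if i % 2 else "benign") for i in range(10)]
train, test = split_dataset(apps)
```

With ten samples and the default ratio, seven samples go to the training set
and three to the testing set. A separate validation set could be carved out of
the training portion by calling the same function a second time.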

Figure 2 shows the initial process of the malware detection system (5), which
takes place at the cloud (100). The embodiment consists of data extraction,
data preprocessing and model training at the cloud.

The byte code feature-based malware detection system works via a method
comprising: a) generating JAR files (102) from mobile platform application
files such as Android application files (apk files); b) extracting said hex
opcode sequence from said JAR files and converting said hex opcode sequence
into said long hex string (103); c) processing said long hex string to detect
malware in said mobile platform via said feature extraction module and said
classification module; d) applying a 2-gram feature extraction (108) protocol
to extract a set of feature vectors from said long hex string; e) counting the
occurrences of hexadecimal features from 00 to FF and creating said hexcode
table (104); f) normalizing the values of the hexcode table to negate the
differences in the values due to the size of the application, thereby
generating said normalized hexcode table (105); and g) dividing said feature
vectors into said training set, validation test set and testing set and
evaluating said sets to distinguish whether said mobile application is malware
or benign.

The complete architecture of the present invention is divided into two
sections, i.e., training and normal platform classification. The system is
trained and validated through the training and validation module on a cloud
server. Once the system is trained, it is ready for regular use in classifying
any application as malware or benign and may be installed on the mobile
platform.

Figure 3 shows a process flow for regular use of the malware classification
system (200) in classifying any application as malware or benign according to
another embodiment, in which, whenever any application is downloaded on a
mobile platform, malicious activity is detected by the system and the user is
alerted about it. The classification works via a method in which: a) the apk
(201) of an application is converted into a jar file (202); b) the hexcode is
extracted from the jar file and converted into a long hex string (203); c) the
2-gram feature extraction protocol is applied to the long hex string; d) said
hexcode table (104) is generated from the feature vectors; e) said normalized
hexcode table (105) is generated; and f) the feature vectors are sent as input
to said trained model (107) to evaluate the application as malicious or benign.
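
The specification names adaptive boosting as the classifier (claim 2 and Table
1) but does not describe the internals of the trained model (107). The
following is therefore only a textbook sketch of adaptive boosting with
one-feature threshold rules ("decision stumps") as weak learners. The function
names, the label encoding (+1 for malware, -1 for benign), the number of rounds
and the toy data are all assumptions made for illustration.

```python
import math


def stump_predict(row, feature, threshold, sign):
    """Predict `sign` when the chosen feature exceeds the threshold, else -sign."""
    return sign if row[feature] > threshold else -sign


def train_stump(X, y, weights):
    """Return (error, feature, threshold, sign) with the lowest weighted error."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({row[feature] for row in X}):
            for sign in (1, -1):
                error = sum(
                    w for w, row, label in zip(weights, X, y)
                    if stump_predict(row, feature, threshold, sign) != label
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, sign)
    return best


def adaboost_fit(X, y, rounds=5):
    """Train a list of weighted stumps (alpha, feature, threshold, sign)."""
    n = len(X)
    weights = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        error, feature, threshold, sign = train_stump(X, y, weights)
        error = min(max(error, 1e-10), 1.0 - 1e-10)   # avoid log(0) and division by 0
        alpha = 0.5 * math.log((1.0 - error) / error)
        model.append((alpha, feature, threshold, sign))
        # Raise the weight of misclassified samples, lower the rest, renormalize.
        weights = [
            w * math.exp(-alpha * label * stump_predict(row, feature, threshold, sign))
            for w, row, label in zip(weights, X, y)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def adaboost_predict(model, row):
    """Return +1 or -1 from the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(row, f, t, s) for a, f, t, s in model)
    return 1 if score >= 0 else -1


# Toy normalized feature vectors (two features instead of 256).
X = [[0.1, 0.9], [0.2, 0.8], [0.8, 0.2], [0.9, 0.1]]
y = [-1, -1, 1, 1]
model = adaboost_fit(X, y)
```

In regular use, only `adaboost_predict` would need to run on the device, since
training takes place beforehand; this matches the split between training at the
cloud and classification on the mobile platform described above.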

Figure 4 is a pictorial view of sample hex code (103) of a mobile application.
The data from the mobile application is decompressed into a specific format to
obtain hex opcodes, and the obtained hex opcode sequence is converted into a
long hex string as shown in Figure 4. These operations take place in the data
acquisition unit (1), which includes a dex2jar tool and a hex dump tool
respectively.
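
The hex dump step can be illustrated with a short Python sketch that treats a
JAR file as the ZIP archive it is and encodes its contents as one long hex
string. This is not the dex2jar or hex dump tooling referred to above: the
function name `jar_to_hex_string`, the choice to read every decompressed entry,
and the sorted entry order are assumptions of this sketch.

```python
import io
import zipfile


def jar_to_hex_string(jar_bytes):
    """Concatenate every entry of a JAR (a ZIP archive) into one uppercase hex string.

    Reading all entries, in sorted name order, is an assumption of this sketch.
    """
    parts = []
    with zipfile.ZipFile(io.BytesIO(jar_bytes)) as jar:
        for name in sorted(jar.namelist()):
            parts.append(jar.read(name).hex().upper())
    return "".join(parts)


# Build a tiny in-memory archive standing in for a converted application.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr("A.class", bytes([0xCA, 0xFE, 0xBA, 0xBE]))
    jar.writestr("B.class", bytes([0x00, 0xFF]))

hex_string = jar_to_hex_string(buffer.getvalue())
print(hex_string)  # CAFEBABE00FF
```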

Figure 5 is a graphical view of the sequence of hex code (108) from the
two-gram calculation. A two-gram calculation is performed on the sample hex
code over all possible combinations of two-gram hex code by the feature
extraction module in the processor (2). The figure shows the two-gram sliding
window method used to count the feature occurrences.
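
A sliding-window count of two-character hex grams can be sketched as follows.
The text does not state whether the window advances by one hex digit or by one
byte, so the `step` parameter below, like the function name `two_gram_counts`,
is an assumption; `step=2` reads byte-aligned pairs and `step=1` slides one
digit at a time.

```python
from collections import Counter


def two_gram_counts(hex_string, step=2):
    """Count two-character hex grams ("00" to "FF") with a sliding window.

    All 256 grams are present in the result, with zero for grams never seen.
    """
    counts = Counter({format(value, "02X"): 0 for value in range(256)})
    text = hex_string.upper()
    for start in range(0, len(text) - 1, step):
        counts[text[start:start + 2]] += 1
    return counts


aligned = two_gram_counts("CAFEBABE00FF")           # byte-aligned windows
sliding = two_gram_counts("CAFEBABE00FF", step=1)   # one hex digit at a time
```

Either way the result has one entry per gram from "00" to "FF", which is the
shape of one row of the hexcode table (104).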

Figure 6 is a tabular representation of extracted feature values from a mobile
platform and shows the final extracted data sample in tabular format (104). The
table provides 256 features, ranging from “00” to “FF”, as 256 columns, and
each row represents a single mobile application. To defy simple encryption
techniques, the difference of two consecutive hex values is taken; the data is
normalized between '0' and '1' and is then labelled according to its class,
malware or benign.
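
One way to see why differences of consecutive hex values can survive a simple
encryption is sketched below. The specification does not say which encryption
techniques are meant or how negative differences are handled, so two
assumptions are made here: differences are wrapped modulo 256, and the "simple
encryption" is a toy cipher that adds a constant key to every byte. The
invariance shown holds for that additive cipher only and is not claimed for
other schemes.

```python
def byte_differences(hex_string):
    """Return differences of consecutive byte values, wrapped into 0..255."""
    data = bytes.fromhex(hex_string)
    return [(b - a) % 256 for a, b in zip(data, data[1:])]


def shift_cipher(hex_string, key):
    """Toy 'simple encryption': add a constant key to every byte (mod 256)."""
    shifted = bytes((b + key) % 256 for b in bytes.fromhex(hex_string))
    return shifted.hex().upper()


plain = "CAFEBABE00FF"
masked = shift_cipher(plain, 0x5A)   # hypothetical key

plain_diffs = byte_differences(plain)
masked_diffs = byte_differences(masked)
```

Here `masked` differs from `plain` in every byte, yet `plain_diffs` and
`masked_diffs` are identical, because the constant key cancels out when one
shifted byte is subtracted from the next.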

Figure 7 is a normalized hexcode table (105), generated to negate the variation
in feature vector values between applications that is due to their sizes.
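
The effect of normalization on applications of different sizes can be sketched
as follows. The specification states only that values are normalized between 0
and 1; dividing each count by the row total, as done here, is one assumed way
to achieve that, and the function name and the four-column rows are
illustrative.

```python
def normalize_counts(counts):
    """Scale raw counts to relative frequencies in the range 0 to 1."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [value / total for value in counts]


small_app = [2, 1, 1, 0]      # four columns instead of 256, for brevity
large_app = [20, 10, 10, 0]   # same byte profile, ten times the size

small_row = normalize_counts(small_app)
large_row = normalize_counts(large_app)
```

Both rows normalize to the same vector, so two applications with the same byte
profile are no longer separated merely by their size.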

The present invention utilizes the complete hex code value of a jar file
(extracted from the apk file) to determine whether an application is malware or
not. The difference between two hex values is also considered, which defies
simple encryption techniques so that the system works for obfuscated
applications as well. This makes the system lightweight and faster as compared
to others that extract several features such as permissions, calls, intents and
native codes.

Further, Table 1 depicts the accuracy, precision, recall and F1 score of the
present invention on a sample testing dataset, the Drebin dataset.

Table 1: Accuracy, precision, recall and F1 score of the present invention on
the sample testing dataset.

S.no   Present Invention   Accuracy (%)   Precision   Recall   F1
1      Adaptive boosting   99.71          0.99        0.99     0.99
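
For reference, the four metrics reported in Table 1 follow the standard
definitions in terms of true and false positives and negatives. The sketch
below computes them from a confusion matrix; the counts passed in are made up
purely to exercise the formulas and are not the Drebin results.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts, with malware treated as the positive class.
metrics = classification_metrics(tp=90, fp=10, tn=95, fn=5)
```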

Further, the byte code feature-based malware detection system may run on other
computing devices, including a mobile telephone, a laptop, tablet or desktop
computer, a netbook, a video game device, a pager, a smart phone, an
ultra-mobile personal computer (UMPC), a personal digital assistant (PDA),
etc., which use the Android OS to run one or more applications, such as
Internet browsers, voice calls, video games, video conferencing and email,
among others, and these devices may be coupled to a network.

The present invention works on a network that offers data transport and other
services. In general, the network may include and implement any commonly
defined network architecture, including those defined by standards bodies such
as the Global System for Mobile communication (GSM) Association, the Internet
Engineering Task Force (IETF), and the Worldwide Interoperability for Microwave
Access (WiMAX) forum. For example, the network may implement one or more of a
GSM architecture, a General Packet Radio Service (GPRS) architecture, a
Universal Mobile Telecommunications System (UMTS) architecture, and an
evolution of UMTS referred to as Long Term Evolution (LTE).

The processor of the present invention includes any type of processor,
microprocessor, or processing logic that may interpret and execute instructions
(e.g., a field programmable gate array (FPGA)). The processor includes a single
device (e.g., a single core) and/or a group of devices (e.g., multi-core).
Further, the present invention includes a memory unit for storing all the data
extracted and generated in the byte code feature-based malware detection
system, such as random-access memory (RAM) or another type of dynamic storage
device that may store information and instructions for execution by the
processor. The memory is also used to store temporary variables or other
intermediate information during execution of instructions by the processor.

The memory unit acts as a data storage unit, which includes a single storage
device or multiple storage devices, such as multiple storage devices operating
in parallel. Moreover, the storage device resides locally on the computing
devices or processor of the present invention and/or may be remote with respect
to a server and connected thereto via a network and/or another type of
connection, such as a dedicated link or channel.

The byte code feature-based malware detection system includes an input device
and an output device. The input device includes any mechanism or combination of
mechanisms that permits an operator to input information to the computing
device and processor. The input device is selected from a keyboard, a mouse, a
touch-sensitive display device, a microphone, a pen-based pointing device,
and/or a biometric input device, such as a voice recognition device and/or a
fingerprint scanning device. The output device may include any mechanism or
combination of mechanisms that outputs information to the operator, including a
display, a printer, a speaker, etc.

Therefore, the present invention provides a byte code feature-based malware
detection system for identifying and reporting malware on a mobile platform by
extracting features such as 2-gram byte code values and byte code difference
values. Further, the present invention utilizes the complete hex code value of
a jar file (extracted from the apk file) to determine whether an application is
malware or not, and the difference between two hex values is considered in the
system, which defies simple encryption techniques so that the system works for
obfuscated applications as well. This makes the system lightweight and faster
as compared to others that extract several features such as permissions, calls
and intents.

Many modifications and other embodiments of the invention set forth herein will
readily occur to one skilled in the art to which the invention pertains, having
the benefit of the teachings presented in the foregoing descriptions and the
associated drawings. Therefore, it is to be understood that the invention is
not to be limited to the specific embodiments disclosed and that modifications
and other embodiments are intended to be included within the scope of the
appended claims. Although specific terms are employed herein, they are used in
a generic and descriptive sense only and not for purposes of limitation.

The foregoing description of embodiments of the invention has been presented
for purposes of illustration and description. It is not intended to be
exhaustive or to limit the invention to the precise form disclosed, and
modifications and variations are possible in light of the above teachings or
may be acquired from practice of the invention. The embodiments were chosen and
described in order to explain the principles of the invention and its practical
application to enable one skilled in the art to utilize the invention in
various embodiments and with various modifications as are suited to the
particular use contemplated.

CLAIMS

We claim:

1. A byte code feature-based malware detection system (5) for mobile
platform, comprising:

a data acquisition unit (1); and

a processor (2);

wherein:

said data acquisition unit (1) extracts a hex opcode sequence from
mobile platform and converts said hex opcode sequence into a long hex
string;

said processor (2) processes said long hex string to detect malware in
said mobile platform via a feature extraction module and a classification
module;

said feature extraction module applies a 2-gram feature extraction
protocol to extract a set of feature vectors from said long hex string; and

said classification module divides said feature vectors into a training set,
validation test set and testing set and evaluates said sets to distinguish
whether said mobile platform is malware or benign.

2. The byte code feature-based malware detection system (5) for mobile
platform as claimed in claim 1, wherein said classification module
works via an adaptive boosting classifier.

3. The byte code feature-based malware detection system (5) for mobile
platform as claimed in claim 1, wherein said data acquisition unit (1) is
preferably a data extraction unit.

4. The byte code feature-based malware detection system (5) for mobile
platform as claimed in claim 1, wherein said system works via a method
that includes the steps of:

a. extracting said hex opcode sequence from said mobile platform and
converting said hex opcode sequence into said long hex string;

b. processing said long hex string to detect malware in said mobile
platform via said feature extraction module and said classification
module;

c. applying a 2-gram feature extraction protocol to extract a set of
feature vectors from said long hex string; and

d. dividing said feature vectors into said training set, validation test set
and testing set and evaluating said sets to distinguish whether said
mobile platform is malware or benign.

5. The byte code feature-based malware detection system (5) for mobile
platform as claimed in claim 1, wherein said classification module (3)
divides said set of feature vectors in 70:30 ratio.

6. The byte code feature-based malware detection system (5) for mobile
platform as claimed in claim 1, wherein said byte code feature-based
malware detection system (5) achieves an accuracy ranging from 99.60
to 99.75%.

Dated this 03rd day of March, 2023

SHRUTI KAUSHIK
of PATENTWIRE
Agent for the Applicant
[IN/PA 1324]

ABSTRACT

“A BYTE CODE FEATURE BASED MALWARE DETECTION
SYSTEM FOR MOBILE PLATFORM”

The present invention discloses a byte code feature-based malware detection
system (5) for identifying and reporting a malicious activity in the mobile
platform, comprising a data acquisition unit (1), a processor (2), a feature
extraction module and an output module, wherein: said data acquisition unit (1)
extracts a hex opcode sequence from the mobile platform and converts said hex
opcode sequence into a long hex string; said processor (2) processes said long
hex string to detect malware in said mobile platform via a feature extraction
module and a classification module; said feature extraction module applies a
2-gram feature extraction protocol to extract a set of feature vectors from
said long hex string; and said classification module divides said feature
vectors into a training set, validation test set and testing set and evaluates
said sets to distinguish whether said mobile platform is malware or benign.

Figure 1 on sheet no. 1 of the drawings may accompany the abstract when published.
