Form 2: The Patents Act, 1970 (39 of 1970) & The Patent Rules, 2003
COMPLETE SPECIFICATION
TITLE:
“A BYTE CODE FEATURE BASED MALWARE
DETECTION SYSTEM FOR MOBILE PLATFORM”
APPLICANT:
Malware activities happen through e-mail, text, applications, weak network
services and through various physical devices. Attackers use evasion and
obfuscation techniques to fool users, security administrators and anti-malware
platforms. Attackers use simple evasion and obfuscation techniques, such as
hiding the IP address and changing the malware code, to avoid detection by
security administrators and anti-malware products. Additionally, there are
various types of malware attacks such as adware, fileless malware, viruses,
worms etc. Therefore, there is a long-felt need for a platform, system or device
for securing against and identifying all types and techniques of malware attacks.
is signature based where some specific pattern from a malware is used as a
signature to identify future malware. These signature-based methods cannot
detect zero-day attacks, as their signatures are not present in the signature
database. There are two types of malware detection methods: static detection
methods and dynamic detection methods. Static detection methods are widely
used; in these, the suspect application may be reverse-engineered and
information such as permissions, API classes etc., called features, is
extracted from the suspect file. These features are then used to classify the
suspect file. Dynamic approaches analyse the suspect application by
executing it in a closed, safe environment such as a virtual machine or
sandbox to extract features from the application. The extracted
features are then used to detect whether the suspect application is malware or
not. Executing an application is difficult on resource-constrained
devices or general-use devices, as it may infect the system, making dynamic
approaches undeployable on a general-purpose mobile device.
With rapid development in the data security field, several approaches were
introduced that rely on limited permissions to classify benign and malicious
applications using association rules, with a convolutional neural network
(CNN) model used to detect malicious code. This method uses permission trigger
risk and pattern recognition of IoT devices. However, this methodology could
not cope with the advancement of malware families, due to a dearth of
behavioral semantics analysis.
channel change point detection and a host wide diagnosis through data fusion
center (DFC) which uses the decisions from the local detectors to infer whether
the host computer is infected by malware. The device is monitored through
sensors corresponding to predetermined features. However, this invention
does not work for applications using obfuscation and simple encryption
techniques.
Therefore, due to above mentioned drawbacks, there is a need for an improved
malware detection system that produces faster results in an efficient format.
The main object of the present invention is to provide a byte code feature-based
malware detection system for mobile platforms such as Android devices.
Yet another object of the present invention is to provide a byte code feature
based malware detection system that takes the difference between two hex values,
which defeats simple encryption techniques so that the system works for
obfuscated applications as well.
Still another object of the present invention is to provide a byte code feature
based malware detection system that is lightweight and faster than
conventional systems.
In an embodiment, the present invention provides a byte code feature based
malware detection system for identifying and reporting a malicious activity in
the mobile platform comprising a data acquisition unit and a processor,
wherein, said data acquisition unit extracts a hex opcode sequence from the
mobile platform and converts said hex opcode sequence into a long hex string;
said processor processes said long hex string to detect malware in said mobile
platform via a feature extraction module and a classification module; said
feature extraction module applies a 2-gram feature extraction protocol to
extract a set of feature vectors from said long hex string; said classification
module divides said feature vectors into a training set, validation test set and
testing set and evaluates said sets to distinguish whether said mobile
application is malware or benign.
In another embodiment, the present invention provides the byte code feature
based malware detection system wherein said system works via a method: a)
extracting said hex opcode sequence from said mobile platform and converting
said hex opcode sequence into said long hex string; b) processing said long hex
string to detect malware in said mobile platform via said feature extraction
module and said classification module; c) applying a 2-gram feature extraction
protocol to extract a set of feature vectors from said long hex string; and d)
using said feature vectors to evaluate an application on said mobile
platform as malware or benign.
The above objects and advantages of the present invention will become
apparent from the hereinafter set forth brief description of the drawings,
detailed description of the invention, and claims appended herewith.
Figure 1 is a block diagram of the system for detecting malware on a mobile
platform according to an embodiment of the present invention.
The present invention will now be described hereinafter with reference to the
accompanying drawings in which a preferred embodiment of the invention is
shown. This invention may, however, be embodied in many different forms
and should not be construed as being limited to the embodiment set forth
herein. Rather, the embodiment is provided so that this disclosure will be
thorough, and will fully convey the scope of the invention to those skilled in
the art.
Many aspects of the invention can be better understood with references made
to the drawings below. The components in the drawings are not necessarily
drawn to scale. Instead, emphasis is placed upon clearly illustrating the
components of the present invention. Moreover, like reference numerals
designate corresponding parts through the several views in the drawings.
Before explaining at least one embodiment of the invention, it is to be
understood that the embodiments of the invention are not limited in their
application to the details of construction and to the arrangement of the
components set forth in the following description or illustrated in the drawings.
The embodiments of the invention are capable of being practiced and carried
out in various ways. In addition, the phraseology and terminology employed
herein are for the purpose of description and should not be regarded as
limiting.
In another embodiment, the present invention provides the byte code feature
based malware detection system wherein said system works via a method: a)
extracting said hex opcode sequence from said mobile platform and converting
said hex opcode sequence into said long hex string; b) processing said long hex
string to detect malware in said mobile platform via said feature extraction
module and said classification module; c) applying a 2-gram feature extraction
protocol to extract a set of feature vectors from said long hex string; and d)
classifying said application using the features obtained, to distinguish
whether said mobile application is malware or benign.
Figure 2 shows the initial process of the malware detection system (5), which
happens at the cloud (100). The embodiment consists of data extraction, data
preprocessing and model training at the cloud.
The byte code feature-based malware detection system works via a method
comprising: a) generating JAR files (102) from mobile platform application
files such as Android application files (APK files); b) extracting said hex
opcode sequence from said JAR files and converting said hex opcode sequence
into said long hex string (103); c) processing said long hex string to detect
malware in said mobile platform via said feature extraction module and said
classification module; d) applying a 2-gram feature extraction (108) protocol
to extract a set of feature vectors from said long hex string; e) counting the
occurrences of hexadecimal features from 00 to FF and creating said hexcode
table (104); f) normalizing the values of the hexcode table to negate the
differences in the values due to the size of the application, whereby said
normalized hexcode table (105) is generated; and g) dividing said feature
vectors into said training set, validation test set and testing set and
evaluating said sets to distinguish whether said mobile platform is malware or
benign.
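The counting and normalizing steps (e) and (f) above may be illustrated by the following minimal sketch. It assumes that a feature is any two adjacent hex characters of the long hex string and that normalization divides each count by the total count; the specification states neither detail, and the names build_hexcode_table and normalize_hexcode_table are hypothetical.

```python
from collections import Counter

# All 256 two-character hex features, "00" through "FF".
HEX_FEATURES = [f"{value:02X}" for value in range(256)]


def build_hexcode_table(hex_string):
    """Count how often each feature 00..FF occurs in the hex string.

    Assumption: a feature is any two adjacent hex characters (window step 1).
    """
    text = hex_string.upper()
    counts = Counter(text[i:i + 2] for i in range(len(text) - 1))
    return {feature: counts.get(feature, 0) for feature in HEX_FEATURES}


def normalize_hexcode_table(table):
    """Divide each count by the total (assumed scheme) to offset application size."""
    total = sum(table.values())
    if total == 0:
        return {feature: 0.0 for feature in table}
    return {feature: count / total for feature, count in table.items()}


# "CAFE" contains the windows CA, AF and FE, so each gets a share of 1/3.
normalized = normalize_hexcode_table(build_hexcode_table("CAFE"))
```

Under this assumed scheme the normalized table always has 256 entries that sum to 1, so a large application and a small application yield feature vectors on the same scale.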
Figure 3 shows a process flow for regular use of the malware classification
system (200), which classifies any application as malware or benign, according
to another embodiment in which, whenever any application is downloaded on a
mobile platform, malicious activity is detected via the system and the user is
alerted about it. The classification of the application as malware or benign
works via a method in which: a) the APK (201) of an application is converted
into a JAR file (202); b) hexcode is extracted from the JAR file and converted
into a long hex string (203); c) the 2-gram feature extraction protocol is
applied to the long hex string; d) the hexcode table (104) is generated using
the feature vectors; e) the normalized hexcode table (105) is generated; and
f) the feature vectors are sent as input to said trained model (107) to
evaluate the application as malicious or benign.
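Step (b) of the above method, extracting hexcode from the JAR file as a long hex string, may be illustrated by the following minimal sketch. It assumes that the hexcode is the raw content of the .class entries of the JAR archive, concatenated in sorted name order; the specification does not state which entries are used, and the name jar_to_hex_string is hypothetical. The conversion of the APK into the JAR file in step (a) is not shown.

```python
import io
import zipfile


def jar_to_hex_string(jar_file):
    """Concatenate the bytes of every .class entry in a JAR as one hex string.

    `jar_file` may be a path or a binary file-like object.
    Assumption: only .class entries contribute, read in sorted name order.
    """
    parts = []
    with zipfile.ZipFile(jar_file) as jar:
        for name in sorted(jar.namelist()):
            if name.endswith(".class"):
                parts.append(jar.read(name).hex().upper())
    return "".join(parts)


# Build a tiny in-memory JAR so that the sketch runs without a real application.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as jar:
    jar.writestr("A.class", bytes([0xCA, 0xFE, 0xBA, 0xBE]))
    jar.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
buffer.seek(0)

long_hex_string = jar_to_hex_string(buffer)
print(long_hex_string)  # CAFEBABE
```

A JAR file is a ZIP archive, which is why the standard zipfile module suffices for this illustration.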
Figure 5 is a graphical view of the sequence of hex code (108) from the two-gram
calculation. A two-gram calculation is performed on the sample hex code over
all possible combinations of two-gram hex code by the feature extraction module
in the processor (2). The figure shows the two-gram sliding window method used
to count the feature occurrences.
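The two-gram sliding window may be illustrated by the following minimal sketch. The specification does not state how far the window advances at each step, so the step parameter is an assumption: step=1 slides one hex character at a time, while step=2 reads the string byte by byte. The name two_gram_counts and the sample string are hypothetical.

```python
from collections import Counter


def two_gram_counts(hex_string, step=1):
    """Count two-character windows taken every `step` characters."""
    grams = [hex_string[i:i + 2] for i in range(0, len(hex_string) - 1, step)]
    return Counter(grams)


sample = "0A0A1B"

# step=1 yields the windows 0A, A0, 0A, A1, 1B.
overlapping = two_gram_counts(sample)

# step=2 yields the windows 0A, 0A, 1B (one window per byte).
byte_aligned = two_gram_counts(sample, step=2)
```

With step=1 the window also captures pairs that straddle two bytes (A0 and A1 in the sample), whereas step=2 counts whole bytes only; either choice yields features in the range 00 to FF.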
The present invention utilizes the complete hex code of a JAR file (extracted
from the APK file) to determine whether the application is malware or not. The
system considers the difference between two hex values, which defeats simple
encryption techniques so that the system also works for obfuscated
applications. This approach makes the system lightweight and faster than
others that extract several features such as permissions, calls, intents and
native code.
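Why a difference between hex values can survive simple encryption may be illustrated by the following toy sketch. The specification does not define which two hex values are differenced or which encryption is meant, so the sketch assumes differences between consecutive byte values and an encryption that adds one fixed key to every byte modulo 256; the names byte_differences and shift_encrypt are hypothetical.

```python
def byte_differences(data):
    """Return the differences between consecutive byte values, modulo 256."""
    return [(b - a) % 256 for a, b in zip(data, data[1:])]


def shift_encrypt(data, key):
    """Toy 'simple encryption': add the same key to every byte, modulo 256."""
    return bytes((value + key) % 256 for value in data)


original = bytes.fromhex("12345678")
obfuscated = shift_encrypt(original, key=0x5A)

# The byte values change, but the consecutive differences do not, because
# (b + key) - (a + key) equals b - a modulo 256.
same_differences = byte_differences(original) == byte_differences(obfuscated)
```

This invariance is specific to the additive shift assumed here; it does not hold in general for other transformations, such as XOR with a key.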
Further, Table 1 depicts the result representing the accuracy, precision, recall
and F1 score of the present invention on a sample testing dataset, the Drebin
dataset.
Table 1 shows the result representing the accuracy, precision, recall and F1
score of the invented module on sample testing dataset.
S. No. | Classifier        | Accuracy (%) | Precision | Recall | F1 score
1      | Adaptive boosting | 99.71        | 0.99      | 0.99   | 0.99
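The four measures reported in Table 1 follow the standard definitions, which may be illustrated by the following minimal sketch. It assumes that malware is the positive class; the counts used are hypothetical and are not taken from the Drebin results, and the name classification_metrics is likewise hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts.

    tp: malware flagged as malware      fp: benign flagged as malware
    tn: benign flagged as benign        fn: malware flagged as benign
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, chosen only to exercise the formulas.
metrics = classification_metrics(tp=90, fp=10, tn=95, fn=5)
```

Accuracy in Table 1 is stated as a percentage, while precision, recall and F1 score are stated as fractions between 0 and 1.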
Further, the byte code feature based malware detection system includes other
computing devices, including a mobile telephone, a laptop, tablet or desktop
computer, a netbook, a video game device, a pager, a smart phone, an
ultra-mobile personal computer (UMPC), a personal data assistant (PDA), etc.,
which use the Android OS to run one or more applications, such as Internet
browsers, voice calls, video games, video conferencing, and email, among
others, and these devices may be coupled to a network.
The present invention works on a network that offers data transport and other
services. In general, the network may include and implement any commonly
defined network architecture, including those defined by standards bodies,
such as the Global System for Mobile communication (GSM) Association, the
Internet Engineering Task Force (IETF), and the Worldwide Interoperability
for Microwave Access (WiMAX) forum. For example, the network may
implement one or more of a GSM architecture, a General Packet Radio
Service (GPRS) architecture, a Universal Mobile Telecommunications System
(UMTS) architecture, and an evolution of UMTS referred to as Long Term
Evolution (LTE).
The memory is also used to store temporary variables or other intermediate
information during execution of instructions by the processor.
The memory unit acts as a data storage unit, which includes a single storage
device or multiple storage devices, such as multiple storage devices operating
in parallel. Moreover, the storage device may reside locally on the computing
devices or processor of the present invention and/or may be remote with
respect to a server and connected thereto via a network and/or another type of
connection, such as a dedicated link or channel.
The byte code feature-based malware detection system includes an input device
and an output device. The input device includes any mechanism or
combination of mechanisms that permit an operator to input information to
the computing device and processor. The input device is selected from a
keyboard, a mouse, a touch sensitive display device, a microphone, a
pen-based pointing device, and/or a biometric input device, such as a voice
recognition device and/or a fingerprint scanning device. The output device
may include any mechanism or combination of mechanisms that outputs
information to the operator, including a display, a printer, a speaker, etc.
Many modifications and other embodiments of the invention set forth herein
will readily occur to one skilled in the art to which the invention pertains, having
the benefit of the teachings presented in the foregoing descriptions and the
associated drawings. Therefore, it is to be understood that the invention is not
to be limited to the specific embodiments disclosed and that modifications and
other embodiments are intended to be included within the scope of the
appended claims. Although specific terms are employed herein, they are used
in a generic and descriptive sense only and not for purposes of limitation.
CLAIMS
We claim:
a processor (2);
wherein:
said data acquisition unit (1) extracts a hex opcode sequence from
mobile platform and converts said hex opcode sequence into a long hex
string;
said processor (2) processes said long hex string to detect malware in
said mobile platform via a feature extraction module and a classification
module;
said classification module divides said feature vectors into a training set,
validation test set and testing set and evaluates said sets to distinguish
whether said mobile platform is malware or benign.
2. The byte code feature-based malware detection system (5) for mobile
platform as claimed in claim 1, wherein said classification module
works via an adaptive boosting classifier.
3. The byte code feature-based malware detection system (5) for mobile
platform as claimed in claim 1, wherein said data acquisition unit (1) is
preferably a data extraction unit.
4. The byte code feature-based malware detection system (5) for mobile
platform as claimed in claim 1, wherein said system works via a method
that includes the steps of:
a. extracting said hex opcode sequence from said mobile platform and
converting said hex opcode sequence into said long hex string;
d. dividing said feature vectors into said training set, validation test set
and testing set and evaluating said sets to distinguish whether said
mobile platform is malware or benign.
5. The byte code feature-based malware detection system (5) for mobile
platform as claimed in claim 1, wherein said classification module (3)
divides said set of feature vectors in a 70:30 ratio.
6. The byte code feature-based malware detection system (5) for mobile
platform as claimed in claim 1, wherein said byte code feature-based
malware detection system (5) achieves an accuracy ranging from 99.60
to 99.75%.
SHRUTI KAUSHIK
of PATENTWIRE
Agent for the Applicant
[IN/PA 1324]
ABSTRACT
The present invention discloses a byte code feature based malware detection
system (5) for identifying and reporting a malicious activity in the mobile
platform comprising a data acquisition unit (1), a processor (2), a feature
extraction module and an output module, wherein, said data acquisition unit (1)
extracts a hex opcode sequence from the mobile platform and converts said hex
opcode sequence into a long hex string; said processor (2) processes said long
hex string to detect malware in said mobile platform via a feature extraction
module and a classification module; said feature extraction module applies a
2-gram feature extraction protocol to extract a set of feature vectors from said
long hex string; said classification module divides said feature vectors into a
training set, validation test set and testing set and evaluates said sets to
distinguish whether said mobile platform is malware or benign.
Figure 1 on sheet no. 1 of the drawings may accompany the abstract when published.