You are on page 1of 31

Statistical

Deobfuscation for Android Applications


Why De-obfuscate?
Android binaries (APKs)
(no code available)

Number of APKs on Google Play


2.4M APKs

’10 ’12 ’14 ’16


Google Play
Layout Obfuscation in Android
Non-descriptive
names

package com.example.dbhelper package a.b.c

class DBHelper extends SQLiteHelper { class a extends SQLiteHelper {


SQLiteDatabase db; SQLiteDatabase b;
Some names

Obfuscate
public DBHelper(Context ctx) { public a(Context ctx) {
db = getWritableDatabase(); remain
b = getWritableDatabase();
} }

Cursor execSQL(String str) { Cursor c(String str) {


return db.rawQuery(str); return b.rawQuery(str);
} }
} Names provide }

key semantic
information
Layout Obfuscation in Android
Non-descriptive
names

package com.example.dbhelper package a.b.c

Security Challenges
class DBHelper extends SQLiteHelper { class a extends SQLiteHelper {
SQLiteDatabase db; SQLiteDatabase b;
Some names

Obfuscate
public DBHelper(Context ctx) Code Inspection
{ public a(Context ctx) {
db = getWritableDatabase(); remain
b = getWritableDatabase();
} }
Third-party Library Detection
Cursor execSQL(String str) { Cursor c(String str) {
return db.rawQuery(str); return b.rawQuery(str);
} … many others }
} Names provide }

key semantic
information
Layout Obfuscation in Android
Non-descriptive
names

package com.example.dbhelper package a.b.c

class DBHelper extends SQLiteHelper { class a extends SQLiteHelper {


SQLiteDatabase db; SQLiteDatabase b;
Can we reverse Some names
public DBHelper(Context ctx) { public a(Context ctx) {
db = layout obfuscationb = getWritableDatabase();
getWritableDatabase(); remain
} }

Cursor execSQL(String str) { Cursor c(String str) {


return db.rawQuery(str); return b.rawQuery(str);
} }
} Names provide }

key semantic
information
Layout Obfuscation in Android
Non-descriptive
names

package com.example.dbhelper package a.b.c

class DBHelper extends SQLiteHelper { class a extends SQLiteHelper {


SQLiteDatabase db; SQLiteDatabase b;

public DBHelper(Context ctx) { public a(Context ctx) {


db = getWritableDatabase(); www.apk-deguard.com
b = getWritableDatabase();
} }

Cursor execSQL(String str) { Cursor c(String str) {


return db.rawQuery(str); return b.rawQuery(str);
} }
} Names provide }

key semantic
Yes, with roughly 80% accuracy!
information
www.apk-deguard.com
Released last week, so far: > 5K users > 5GB APKs

Reddit posts/comments
. . . Tweets

. . .
How Does DeGuard Work?
DeGuard: System Overview
class a extends SQLiteHelper { class DBHelper extends SQLiteHelper{
SQLiteDatabase b; SQLiteDatabase db;
public a(Context ctx) { Static MAP public DBHelper(Context ctx) {
Transform
}
b = getWritableDB(); analysis Inference }
db = getWritableDB();

} }

Obfuscated Code De-obfuscated Code


Prediction Phase

Learning Phase
Probabilistic model
! )
Open-source, Static
unobfuscated Training
analysis
applications

Semantic
representation
Probabilistic Graphical Models
Probabilistic Graphical Models
class a extends SQLiteHelper {
name1 name2 weight SQLiteDatabase b;
$% SQLiteHelper DBUtils 0.3 public a(Context ctx) {
$& SQLiteHelper DBHelper 0.2 b = getWritableDB();
}
SQLiteHelper extends
a name1 name2 weight
}
$) DBUtils instance 0.5
field-in
$* DBHelper db 0.4
gets
getWritableDB b $+ … … …

name1 name2 weight


$' getWritableDB db 0.7 Graph + features define a probabilistic graphical model
$( getWritableDB instance 0.4
!
, )
- =
= ! /, 1 234567879:7;, <76=;56/197>? )
- Known variables
1
, Unknown variables = exp (0.3 I $% 234567879:7;, /
A
$% , $& , . . Feature functions + 0.2 I $& 234567879:7;, / + ⋯ )
Probabilistic Graphical Models
class a extends SQLiteHelper {
name1 name2 weight SQLiteDatabase b;
$% SQLiteHelper DBUtils 0.3 public a(Context ctx) {
$& SQLiteHelper DBHelper 0.2 b = getWritableDB();
}
SQLiteHelper extends
a name1 name2 weight
}
$) DBUtils instance 0.5
field-in
$* DBHelper db 0.4
getWritableDB
gets
b $+ … …Next …
How are the weights
name1 name2 weight
$' getWritableDB db 0.7 and features learned?
Graph + features define a probabilistic graphical model
$( getWritableDB instance 0.4
!
, )
- =
= ! /, 1 234567879:7;, <76=;56/197>? )
- Known variables
1
, Unknown variables = exp (0.3 I $% 234567879:7;, /
A
$% , $& , . . Feature functions + 0.2 I $& 234567879:7;, / + ⋯ )
Learning
Learning Actual graphs have
>1,000 nodes

>2,000

name1 name2 weight


Dependency graphs $% SQLiteHelper DBUtils 0.3
Unobfuscated $& SQLiteHelper DBHelper 0.2
APKs name1 name2 $' getWritableDB db 0.7
Static Train
$% SQLiteHelper DBUtils $( getWritableDB instance 0.4
analysis Model
$& SQLiteHelper DBHelper $) DBUtils instance 0.5
$' getWritableDB db $* DBHelper db 0.4
Feature
$( getWritableDB instance $+ … … …
templates
$) DBUtils instance
$* DBHelper db
28 templates $+ … …
Compute weights that maximize
Features (with >100,000 ! , = MN - = ON for all
candidate names) training samples (MN , ON )
DeGuard: System Overview
class a extends SQLiteHelper { class DBHelper extends SQLiteHelper{
SQLiteDatabase b; SQLiteDatabase db;
public a(Context ctx) { Static MAP public DBHelper(Context ctx) {
Transform
}
b = getWritableDB(); analysis Inference }
db = getWritableDB();

} }

Obfuscated Code De-obfuscated Code


Prediction Phase

Learning Phase
Probabilistic model
! )
Open-source, Static
unobfuscated Training
analysis
applications
DeGuard: System Overview
class a extends SQLiteHelper { class DBHelper extends SQLiteHelper{
SQLiteDatabase b; SQLiteDatabase db;
public a(Context ctx) { Static MAP public DBHelper(Context ctx) {
Transform
}
b = getWritableDB(); analysis Inference }
db = getWritableDB();

} }

Obfuscated Code De-obfuscated Code


Prediction Phase

Probabilistic model
! )
Prediction Phase
name1 name2 weight
SQLiteHelper DBUtils 0.3
class a extends SQLiteHelper { SQLiteHelper DBHelper 0.2
SQLiteDatabase b;
public a(Context ctx) { Static
b = getWritableDB(); analysis SQLiteHelper a
} extends
} Obfuscated Code field-in
gets
getWritableDB b

name1 name2 weight


name1 name2 weight DBUtils instance 0.5
getWritableDB db 0.7 DBHelper db 0.4
getWritableDB instance 0.4 DBUtils db 0.2
DBHelper instance 0.2
Prediction Phase
name1 name2 weight
SQLiteHelper DBUtils 0.3

SQLiteDatabase b; MAP Inference


class a extends SQLiteHelper { SQLiteHelper DBHelper 0.2

public a(Context ctx) { M Program


⃗ = /;<T/U !
analysis , = M⃗′ - = O
b = getWritableDB();
M⃗′ ∈ Ω
SQLiteHelper a
} extends
} Obfuscated Code field-in
Candidate assignment P Q P R)* gets
getWritableDB b
a = DBUtils b = instance 1.2
a = DBHelper b = db 1.3
name1 name2 weight
a = DBUtils b = db name1 0.8 name2 weight DBUtils instance 0.5
a = DBHelper getWritableDB
b = instance 1.2 db 0.7 DBHelper db 0.4
getWritableDB instance 0.4 DBUtils db 0.2
DBHelper instance 0.2
*Non-normalized
Prediction Phase
name1 name2 weight
SQLiteHelper DBUtils 0.3

SQLiteDatabase b; MAP Inference


class a extends SQLiteHelper { SQLiteHelper DBHelper 0.2

public a(Context ctx) { M Program


⃗ = /;<T/U !
analysis , = M⃗′ - = O
b = getWritableDB();
M⃗′ ∈ Ω
SQLiteHelper a
} extends
} Obfuscated Code field-in
Candidate assignment P Q P R)* gets
getWritableDB b
a = DBUtils b = instance 1.2
a = DBHelper b = db 1.3
name1 name2 weight
a = DBUtils b = db name1 0.8 name2 weight DBUtils instance 0.5
a = DBHelper getWritableDB
b = instance 1.2 db 0.7 DBHelper db 0.4
getWritableDB instance 0.4 DBUtils db 0.2
DBHelper instance 0.2
*Non-normalized
Prediction Phase
name1 name2 weight
SQLiteHelper DBUtils 0.3
class a extends SQLiteHelper { SQLiteHelper DBHelper 0.2
SQLiteDatabase b;
public a(Context ctx) { Static
b = getWritableDB(); analysis SQLiteHelper DBHelper
} extends
} Obfuscated Code field-in
gets
getWritableDB db
class DBHelper extends SQLiteHelper {
SQLiteDatabase db; name1 name2 weight
public DBHelper(Context ctx) { name1 name2 weight DBUtils instance 0.5
db = getWritableDB(); Transform
getWritableDB db 0.7 DBHelper db 0.4
} getWritableDB instance 0.4 DBUtils db 0.2
} Deobfuscated Code DBHelper instance 0.2
Preserving Semantics
class A
Freely renaming fields/variables/methods must have
int a must have
may change the program semantics distinct
distinct
Object b
names
names
void a()

Syntactic constraints class B


e.g. fields within a class must have distinct names extends must not
A override
void b() method a()
Semantic constraints
void c(A a)
e.g. method overloads must be preserved
DeGuard: System Overview
class a extends SQLiteHelper { class DBHelper extends SQLiteHelper{
SQLiteDatabase b; SQLiteDatabase db;
public a(Context ctx) { Static MAP public DBHelper(Context ctx) {
Transform
}
b = getWritableDB(); analysis Inference }
db = getWritableDB();

} }

Obfuscated Code De-obfuscated Code


Prediction Phase

Learning Phase

Open-source, Static
unobfuscated Training
analysis
applications
DeGuard Implementation
DeGuard Implementation
Static Analysis
§ Static analysis framework
for Java and Android

Learning and MAP Inference


§ Scalable open-source framework
for structured prediction
§ Open-source: http://nice2predict.org
www.apk-deguard.com
§ Training data: 2K open-source,
unobfuscated Android applications
Evaluation
Evaluation
1. Can DeGuard reverse ProGuard?
2. Can DeGuard detect third-party libraries?
3. Is DeGuard useful for malware inspection?
ProGuard Experiment

?
Non-obfuscated APK =
Source Code

Obfuscated APK De-obfuscated APK


After Obfuscation
% of program elements

100 Known names

80

60

40 only 13%
known names
20

0
Fields Methods Classes Packages Total
Can DeGuard reverse ProGuard?
% of program elements Package names are i.e., identical to
80.6% correct directly used to the original
100
predict third-party Known namesnames
names
libraries Correctly predicted names
80
Mis-predicted names
60

40

20 1.6% known
names
0
Fields Methods Classes Packages Total

80% of the names are identical to the original ones


Can DeGuard Detect Third-Party Libraries?

ProGuard
obfuscates library
package names
?
Source Code
ProGuard

Obfuscated APK De-obfuscated APK


Library Code

Precision: 93.1% Recall: 91%


Is DeGuard Useful for Malware Inspection?
De-obfuscating samples from the Android Malware Genome Project

class d { class Base64 {


String a = System.getProperty(..) Base64
String NL = System.getProperty(..)
char[] b; char[] ENC; Decoder
byte [] c; byte [] DEC;
byte[] a(String) {} byte[] decode(String) {}
} }
Malware Sample De-obfuscated Malware Sample

Reveals string Reveals classes that handle Hard to handle heavily-obfuscated


decoders sensitive data (e.g. Location) code (e.g. reflection)
Summary
name1 name2 weight
package com.example.dbhelper package a.b.c SQLiteHelper a
SQLiteHelper DBUtils 0.3
class DBHelper extends class a extends
SQLiteHelper { SQLiteHelper { SQLiteHelper DBHelper 0.2
SQLiteDatabase db; SQLiteDatabase b; getWritableDB b
public DBHelper(Context ctx) { public a(Context ctx) {
db = getWritableDB(); b = getWritableDB();
} } name1 name2 weight
Cursor execSQL(String str) { Cursor c(String str) { getWritableDB db 0.7
return db.rawQuery(str); return b.rawQuery(str);
getWritableDB instance 0.4

Probabilistic Models
100

80

60

40

20
0

Try online: www.apk-deguard.com Fields Methods Classes Packages Total

High Prediction Accuracy

More info: http://www.srl.inf.ethz.ch/spas

You might also like