- Actual goals
- Learn the tech details on what's going on under the hood
- Learn how to go from APK back to higher-level representations
The Compilation Process
- Java source code (.java)
- Kotlin source code (.kt, .kts)
- C/C++ source code (.c, .cpp, .h)
- DEX compiler: dx
Android Application Package (APK)
- $ unzip app.apk
- Content
- AndroidManifest.xml (compressed)
- classes.dex (raw Dalvik bytecode)
- resources.arsc (compressed)
- res/*.xml (compressed)
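Since an APK is a ZIP archive, the same listing that unzip gives can be sketched in a few lines of Python. This is a minimal illustration, not a replacement for proper tooling: build_demo_apk is a hypothetical helper that creates a tiny in-memory archive with placeholder content, so the example runs without a real app.apk.

```python
import io
import zipfile


def list_apk_entries(apk_bytes: bytes) -> list[str]:
    """Return the names of all entries stored in an APK (a ZIP archive)."""
    with zipfile.ZipFile(io.BytesIO(apk_bytes)) as apk:
        return apk.namelist()


def build_demo_apk() -> bytes:
    """Hypothetical helper: build a tiny in-memory ZIP with placeholder entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("AndroidManifest.xml", b"placeholder")
        archive.writestr("classes.dex", b"dex\n035\x00")
        archive.writestr("resources.arsc", b"placeholder")
        archive.writestr("res/layout/main.xml", b"placeholder")
    return buf.getvalue()
```

For a real file, reading it with open("app.apk", "rb").read() and passing the bytes to list_apk_entries should print the same entries that unzip shows.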
Java/Dalvik bytecode
- DVM (Dalvik Virtual Machine)
- It's a program (which your machine can run) that takes Dalvik bytecode as input and "executes" the intended behavior
Dalvik bytecode, it looks like this:
.method foo(ILcom/mobisec/Peppa;)I
.registers 5
Why a Virtual Machine?
Dalvik bytecode verifier
Bad guys' tricks
- Obfuscation tricks!
- Native code
- Reliable disassembling of arbitrary native code is an open problem
- There is no such thing as a "native code verifier"
- Well, there is something called "Google Native Client" (NaCl) but it doesn't apply here
Dalvik Bytecode
- Destination-first syntax (dest, then src)
- E.g., "move v3, v2" means v2 → v3
- Types
- Built-in: V (void), B (byte), S (short), C (char), I (int), Z (boolean), ...
- Actual Classes (syntax: L<fullyqualifiedclassname>;)
- Landroid/content/Intent;
- Lcom/mobisec/Peppa;
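To make the type syntax concrete, here is a small sketch that turns a Dalvik type descriptor into the Java-style name a decompiler would show. It assumes the slash-separated class form used in bytecode (as in Lcom/mobisec/Peppa; from the method signature shown earlier) and also covers the primitive letters J, F, D and the "[" array prefix, which the list above only hints at with "...". The function name descriptor_to_java is made up for this illustration.

```python
PRIMITIVES = {
    "V": "void", "Z": "boolean", "B": "byte", "S": "short", "C": "char",
    "I": "int", "J": "long", "F": "float", "D": "double",
}


def descriptor_to_java(desc: str) -> str:
    """Convert one Dalvik type descriptor to a Java-style type name."""
    dims = 0
    while desc.startswith("["):  # each leading '[' adds one array dimension
        dims += 1
        desc = desc[1:]
    if desc in PRIMITIVES:
        base = PRIMITIVES[desc]
    elif desc.startswith("L") and desc.endswith(";"):
        base = desc[1:-1].replace("/", ".")  # strip 'L' and ';', slashes to dots
    else:
        raise ValueError(f"not a valid type descriptor: {desc!r}")
    return base + "[]" * dims
```

For example, descriptor_to_java("Landroid/content/Intent;") gives android.content.Intent, and "[[B" gives byte[][].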
Dalvik Bytecode
- Dalvik is register-based
Register Model
class Peppa {
    int pig(int x) {
        return 2*x;
    }
}
Example (Dalvik bytecode)
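.method pig(I)I
    .registers 3
    mul-int/lit8 v0, v2, 0x2
    return v0
.end method
- Corresponding Java source: int pig(int x) { return 2*x; }
- Why *3* registers? One local (v0), plus one for the implicit "this" reference (v1), plus one for x (v2)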
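The register count can be worked out mechanically. The sketch below assumes the usual Dalvik convention that arguments occupy the last registers of the frame, that a non-static method receives "this" as its first argument, and that long (J) and double (D) values take two registers each. The names split_params and register_layout are made up for this illustration.

```python
def split_params(params: str) -> list[str]:
    """Split the parameter part of a method descriptor into type descriptors."""
    out, i = [], 0
    while i < len(params):
        start = i
        while params[i] == "[":  # array prefix belongs to the following type
            i += 1
        if params[i] == "L":     # class type runs up to and including ';'
            i = params.index(";", i) + 1
        else:                    # primitive type is a single letter
            i += 1
        out.append(params[start:i])
    return out


def register_layout(registers: int, descriptor: str, is_static: bool = False) -> dict:
    """Map each register vN to 'local', 'this', or an argument."""
    params = descriptor[descriptor.index("(") + 1 : descriptor.index(")")]
    ins = [] if is_static else [("this", 1)]
    for n, t in enumerate(split_params(params)):
        ins.append((f"arg{n}:{t}", 2 if t in ("J", "D") else 1))
    ins_size = sum(width for _, width in ins)
    if ins_size > registers:
        raise ValueError("method needs more registers than declared")
    layout = {f"v{r}": "local" for r in range(registers - ins_size)}
    reg = registers - ins_size  # arguments start right after the locals
    for name, width in ins:
        layout[f"v{reg}"] = name
        if width == 2:
            layout[f"v{reg + 1}"] = name + " (high half)"
        reg += width
    return layout
```

Calling register_layout(3, "pig(I)I") yields v0 as a local, v1 as this, and v2 as the int argument, which matches the mul-int/lit8 v0, v2, 0x2 instruction in the example.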
Example of Dalvik instructions
- Method invocation
- invoke-virtual {v4, v3}, Lcom/mobisec/Peppa;->pig(I)I
- invoke-static ...
- invoke-{direct, super, interface} ...
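When reading smali, it helps to pull an invoke line apart into its pieces. The following is a rough sketch using a regular expression; it only handles the simple {vA, vB, ...} register list shown above (not the /range variants or array receivers), and parse_invoke is a made-up name for this illustration. For every invoke kind except invoke-static, the first listed register holds the receiver object.

```python
import re

INVOKE_RE = re.compile(
    r"^(?P<op>invoke-[a-z]+)\s+\{(?P<regs>[^}]*)\},\s*"
    r"(?P<cls>L[^;]+;)->(?P<name>[^(]+)\((?P<params>[^)]*)\)(?P<ret>.+)$"
)


def parse_invoke(line: str) -> dict:
    """Split a smali invoke-* line into opcode, registers and method reference."""
    m = INVOKE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unsupported invoke syntax: {line!r}")
    regs = [r.strip() for r in m.group("regs").split(",") if r.strip()]
    is_static = m.group("op") == "invoke-static"
    return {
        "opcode": m.group("op"),
        # static calls have no receiver; otherwise it is the first register
        "receiver": None if is_static or not regs else regs[0],
        "args": regs if is_static else regs[1:],
        "class": m.group("cls")[1:-1].replace("/", "."),
        "method": m.group("name"),
        "params": m.group("params"),
        "returns": m.group("ret"),
    }
```

On the invoke-virtual example above, this reports v4 as the receiver (the Peppa instance) and v3 as the single int argument passed to pig.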
Which component is actually executing Dalvik?
Then, Android ART
- Ahead-Of-Time compilation
- Compilation happens at app installation time
ART vs DVM
- Major cons:
- Installation takes MUCH longer
- Bad repercussions on system upgrades, which could take ~15 minutes
New Version of ART
- Profile-guided JIT/AOT
- Introduced in Android 7
New Version of ART
- It is pretty smart...
- It automatically precompiles methods that are likely to be used soon
- Precompilation only happens when the device is idle and charging
DVM JIT vs ART AOT vs ART JIT/AOT
ODEX: Optimized DEX
- Cons
- ODEX files take space
- Device-dependent (note: it is still bytecode)
The analogue of ODEX for ART is tricky...
When are these two formats used?
- ART format:
- Only one file: boot.art
- It contains the pre-initialized memory for most of the Android framework
- Huge optimization trick
- OAT format:
- One important file: boot.oat
- It contains the most important Android framework libraries, pre-compiled
- Under ART, all the "traditional" ODEX files are actually OAT files
- You can inspect them with oatdump, a tool provided by Android
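Before reaching for oatdump, a quick look at the first bytes of a file often tells which format it is. The sketch below is a rough heuristic and rests on assumptions worth checking against the AOSP sources: DEX files start with "dex\n", Dalvik-era ODEX files with "dey\n", ART images with "art\n", and on-disk OAT files (including .odex files produced under ART) are ELF files, so they start with the ELF magic. sniff_format is a made-up name.

```python
def sniff_format(data: bytes) -> str:
    """Guess the container format from the leading magic bytes (heuristic)."""
    if data.startswith(b"\x7fELF"):
        # Assumption: OAT files are stored as ELF, so this *may* be an OAT file
        return "ELF (possibly OAT)"
    if data.startswith(b"dex\n"):
        return "DEX"
    if data.startswith(b"dey\n"):
        return "ODEX (Dalvik-era)"
    if data.startswith(b"art\n"):
        return "ART image"
    return "unknown"
```

An ELF result only says the file might be OAT; confirming it would take a real ELF parser or oatdump itself.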
When a new app is starting
- Optimization trick
- boot.oat is already mapped in memory
- No need to re-load the framework!
The Big Picture
Tools time!
Unpacking APKs
- unzip app.apk
- AndroidManifest.xml (compressed)
- classes.dex
- resources (compressed)
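Once classes.dex is extracted, its fixed-size header is easy to read by hand. The sketch below assumes the header layout from the DEX format documentation (8-byte magic, Adler-32 checksum at offset 8, SHA-1 signature at offset 12, then file_size, header_size and endian_tag from offset 32) and checks the checksum over everything after the checksum field. build_demo_header is a hypothetical helper that fabricates a header-only file so the example runs without a real DEX.

```python
import struct
import zlib

HEADER_SIZE = 0x70  # size of the DEX header in bytes


def parse_dex_header(data: bytes) -> dict:
    """Read a few fields of the DEX header and verify the Adler-32 checksum."""
    if len(data) < HEADER_SIZE or data[:4] != b"dex\n" or data[7:8] != b"\x00":
        raise ValueError("not a DEX file")
    (checksum,) = struct.unpack_from("<I", data, 8)
    file_size, header_size, endian_tag = struct.unpack_from("<III", data, 32)
    return {
        "version": data[4:7].decode("ascii"),
        "checksum": checksum,
        # the checksum covers everything after the magic and the checksum itself
        "checksum_ok": checksum == (zlib.adler32(data[12:]) & 0xFFFFFFFF),
        "file_size": file_size,
        "header_size": header_size,
        "little_endian": endian_tag == 0x12345678,
    }


def build_demo_header() -> bytes:
    """Hypothetical helper: a header-only 'DEX' with a valid checksum."""
    body = bytearray(HEADER_SIZE)
    body[0:8] = b"dex\n035\x00"
    struct.pack_into("<III", body, 32, HEADER_SIZE, HEADER_SIZE, 0x12345678)
    struct.pack_into("<I", body, 8, zlib.adler32(bytes(body[12:])) & 0xFFFFFFFF)
    return bytes(body)
```

A checksum mismatch on a real file is a hint that the DEX was patched after compilation without fixing the header.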
smali/baksmali
apktool
- apktool is awesome
- It embeds baksmali/smali
Disassembly vs. Decompilation
- Disassembly
- classes.dex binary file ~> Dalvik bytecode "smali" representation
- machine code bytes ~> assembly representation (mov eax, edx)
- Decompilation
- Go from assembly/bytecode to source code-level representation
- Dalvik bytecode ~> Java source code
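To show what the disassembly step amounts to, here is a toy disassembler that understands just two Dalvik instructions. It assumes the opcode values 0xda (mul-int/lit8, two 16-bit code units) and 0x0f (return, one code unit) and the byte order op, AA, BB, CC from the Dalvik bytecode reference. A real tool such as baksmali handles the full instruction set.

```python
import struct


def disassemble(code: bytes) -> list[str]:
    """Decode a byte string containing only mul-int/lit8 and return."""
    out, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == 0x0F:    # return vAA
            out.append(f"return v{code[pc + 1]}")
            pc += 2
        elif op == 0xDA:  # mul-int/lit8 vAA, vBB, #+CC
            dest, src = code[pc + 1], code[pc + 2]
            (lit,) = struct.unpack_from("<b", code, pc + 3)  # signed 8-bit literal
            out.append(f"mul-int/lit8 v{dest}, v{src}, {hex(lit)}")
            pc += 4
        else:
            raise ValueError(f"opcode {op:#04x} not handled by this sketch")
    return out
```

Under those assumptions, the bytes da 00 02 02 0f 00 decode to the two instructions of the pig method. Decompilation is the harder step that would turn them back into return 2*x.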
How to decompile
- All-in-one tools
- JEB (commercial, VERY expensive)
- BytecodeViewer (pretty good one)
- jadx
- NEW: Ghidra, an open-source tool developed by the NSA
aapt
- $ adb devices
adb
- $ adb logcat
adb
- $ adb shell
- Get a shell on the device
- $ adb shell ls
- Execute "ls" on the device
- https://hackernoon.com/how-i-hacked-modern-vending-machines-43f4ae8decec
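The adb invocations above are easy to script. The sketch below only builds the argument list, so it can be inspected or tested without a device attached; passing the result to subprocess.run would actually execute it and requires adb on the PATH. adb_command is a made-up helper name, and the -s option is adb's way of selecting a device by serial number.

```python
from typing import Optional


def adb_command(args: list[str], serial: Optional[str] = None) -> list[str]:
    """Build the argv for an adb invocation, optionally targeting one device."""
    cmd = ["adb"]
    if serial is not None:
        cmd += ["-s", serial]  # pick a specific device when several are attached
    return cmd + list(args)
```

For example, adb_command(["shell", "ls"]) corresponds to "$ adb shell ls", and adb_command(["logcat"], serial="emulator-5554") targets one specific emulator (the serial here is only an example value).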