You are on page 1of 46

Messing with Android Apps

Mobile Systems and Smartphone Security


(MOBISEC 2020)
Prof: Yanick Fratantonio
EURECOM 1
This class

- Learn how we go from source code (Java, C/C++) to APK

- Actual goals
- Learn the tech details on what's going on under the hood
- Learn how to go from APK back to higher-level representations

2
The Compilation Process
Java source code Kotlin source code C/C++ source code
.java .kt .kts .c .cpp .h

Java compiler Kotlin compiler


javac kotlinc

Java bytecode Executable


.class .jar by the JVM

DEX compiler
dx

Dalvik bytecode Executable Machine code


.dex by the DVM .so

3
Android Application Package (APK)

- An APK is a zip file (kinda)

- $ unzip app.apk

- Content
- AndroidManifest.xml (compressed)
- classes.dex (raw Dalvik bytecode)
- resources.arsc (compressed)
- res/*.xml (compressed)
4
Java/Dalvik bytecode

- It cannot be run by your processor

- It's code that can be run by a Virtual Machine


- Java VM / JVM
- Dalvik VM / DVM

- DVM
- It's a program (which your machine can run) that takes as input Dalvik
bytecode and somehow "executes" the intended behavior
5
Dalvik bytecode, it looks like this:

.method foo(ILcom/mobisec/Peppa;)I
.registers 5

invoke-virtual {v4, v3}, Lcom/mobisec/Peppa;->pig(I)I


move-result v0

add-int v1, v3, v0


return v1
.end method

6
Why a Virtual Machine?

- The same Dalvik bytecode (classes.dex) can run across


multiple devices / architectures
- The DVM is, instead, "custom" for each architecture

- Security benefit: since everything is run in a virtual


machine, the app's code is isolated

7
Dalvik bytecode verifier

- Before the Dalvik bytecode is run, it is processed by a


component called "Bytecode Verifier"

- The verifier checks that the bytecode is well-formed


- You can't do weird tricks ~> Dalvik is easy to disassemble
- Rephrased: If you want to invoke a method, no way you can hide an
"invoke-*" instruction + which method you are invoking

- Are the bad guys out of luck?

8
Bad guys' tricks

- Obfuscation tricks!

- Reflection (target methods are specified via "strings")

- Dynamic Code Loading

- Native code
- Reliable disassembling of arbitrary native code is an open problem
- There is no such thing as a "native code verifier"
- Well, there is something called "Google Native Client" (NaCl) but it doesn't apply here

9
Dalvik Bytecode

- Dalvik knows about OO concepts


- Classes, methods, fields, "object instances"

- Dest-to-src syntax
- E.g., "move r3, r2" means r2 → r3

- Types
- Built-in: V (void), B (byte), S (short), C (char), I (int), Z (boolean), ...
- Actual Classes (syntax: L<fullyqualifiedclassname>;)
- Landroid.content.Intent;
- Lcom.mobisec.Peppa;
10
Dalvik Bytecode

- The bytecode is nicely split in "methods"


- A.smali file for each class

- Dalvik is register-based

- Each method has its own register "frame"

- Methods' args are placed in the last registers of the frame

- If a method is non-static, the first argument is "this"


11
Register Frame

- Consider a method s.t.


- it takes 3 arguments
- its register frame has 6 registers

- The method will use


- Registers v0, v1, v2, v3, v4, v5
- Arguments are placed in v3, v4, v5

12
Register Model

- Very different model than CPU's registers


- They are NOT shared across methods

- But registers can contain values (for built-in types) and


references to objects

- Each object is stored in ONE register


- Including "complex" objects: you just store the reference
- Exception: LONG / DOUBLE, they take TWO *contiguous* registers
13
Example (Java)

class Peppa {
int pig(int x) {
return 2*x;
}

static int foo(int a, Peppa p) {


int b = p.pig(a);
return a+b;
}
}

14
Example (Dalvik bytecode)

.method pig(I)I
int pig(int x) {
.registers 3
return 2*x;
}
mul-int/lit8 v0, v2, 0x2

return v0
.end method Why *3* registers?

This!

15
Example (Dalvik bytecode)

.method static foo(ILcom/example/Peppa;)I


.registers 4

invoke-virtual {v3, v2}, Lcom/example/Peppa;->pig(I)I


move-result v0

add-int v1, v2, v0


static int foo(int a, Peppa p) {
int b = p.pig(a);
return v1
return a+b;
.end method
}
16
Example of Dalvik instructions (doc)

- Moving constants/immediates/registers into registers


- const v5, 0x123
- move v4, v5

- Math-related operations (many, many variants)


- add-int v1, v3, v0
- mul-int/lit8 v0, v2, 0x2

17
Example of Dalvik instructions

- Method invocation
- invoke-virtual {v4, v3}, Lcom/mobisec/Peppa;->pig(I)I
- invoke-static ...
- invoke-{direct, super, interface} ...

- Getting return value


- invoke-virtual {v4, v3}, Lcom/mobisec/Peppa;->pig(I)I
- move-result v5

18
Example of Dalvik instructions

- Set/get values from fields


- iget, iget-object, ...
- iput, iput-object, ...
- sget, sput ... (for static fields)

- Instantiate new object


- new-instance v2, Lcom/mobisec/Peppa;

19
Example of Dalvik instructions

- Conditionals / control flow redirection


- if-ne v0, v1, :label_a
...
:label_a
...
- goto :label_b

- Meta-instructions that contain "data"


- filled-new-array

20
Which component is actually executing Dalvik?

- In the past (up to Android 4.4)


- DVM, libdvm.so
- When about to execute a method, compile it and run
- Compile process: "Dalvik bytecode ~> machine code"
- Rephrasing: compilation is done "on demand"
- We refer to this as Just-In-Time compilation (JIT)

- Compiled code is stored in a cache

21
Then, Android ART

- ART stands for Android Run-Time


- It replaced the old DVM
- It was introduced in Android 4.4 as optional, mandatory in Android 5

- Ahead-Of-Time compilation
- Compilation happens at app installation time

22
ART vs DVM

- Pro: The app's boot and execution are MUCH faster


- Because everything is already compiled

- Cons: ART takes more space on RAM & disk

- Major cons:
- Installation time takes MUCH longer
- Bad repercussion on system upgrades, could take ~15 minutes

23
New Version of ART

- Profiled-guided JIT/AOT
- Introduced in Android 7

- ART profiles an app and precompiles only the


"hot" methods, the ones most likely to be used

- Other parts of the app are left uncompiled

24
New Version of ART

- It is pretty smart...
- It automatically precompiles methods that are
"near to be used"
- Precompilation only happens when the device is
idle and charging

- Biggest Pro: quick path to install / upgrade

25
DVM JIT vs ART AOT vs ART JIT/AOT

DVM JIT ART AOT ART JIT/AOT

App boot time slowest fastest trade-off

App speed slowest fastest trade-off

App install time fastest slowest trade-off

System upgrade time fastest slowest trade-off

RAM/disk usage lowest highest trade-off

26
ODEX: Optimized DEX

- DEX → dexopt → ODEX

- It is optimized DEX: faster to boot and to run

- Most (all?) system apps that start at boot are ODEXed

- Note: ODEX is an additional file, next to an APK

- Cons
- ODEX files take space
- Device-dependent (note: it is still bytecode)
27
The analogous of ODEX for ART is tricky...

- The new Android Run-Time uses two formats

- The ART format (.art files)


- It contains pre-initialized classes / objects

- The OAT files


- Compiled bytecode to machine code, wrapped in an ELF file
- It can contain one or more DEX files (the actual Dalvik bytecode)
- Obtained with dex2oat (usually run at install time)
28
The analogous of ODEX for ART is tricky...

- The confusing part: you still have .odex files!

- Now .odex files are OAT-formatted files!

29
When are these two formats used?

- ART format:
- Only one file: boot.art
- It contains the pre-initialized memory for most of the Android framework
- Huge optimization trick

- OAT format:
- One important file: boot.oat
- It contains the pre-compiled most important Android framework libraries
- All the "traditional" ODEX files are OAT files
- You can inspect them with Android-provided oatdump
30
When a new app is starting

- All apps processes are created by forking Zygote

- Zygote can be seen as the "init" of Android


- A "template" process for each app

- Optimization trick
- boot.oat is already mapped in memory
- No need to re-load the framework!

31
The Big Picture

Taken from stackoverflow 32


More information

- "Dalvik and ART" slides: link

- Write-up for an old CTF challenge I've solved a while ago


- It involves ART / OAT / etc.
- Here it is: link

33
Tools time!

34
Unpacking APKs

- unzip app.apk
- AndroidManifest.xml (compressed)
- classes.dex
- resources (compressed)

35
smali/baksmali

- $ baksmali classes.dex -o output


- Disassemble DEX files
- Output: a .smali file for each class
- Dalvik bytecode in "smali" format

- $ smali output -o patched.apk


- Assembler for DEX files

36
apktool

- apktool is awesome

- It embeds baksmali/smali

- It unpacks / packs APKs, including resources and


manifest files

- $ apktool d app.apk -o output


- $ apktool b output -o patched.apk
37
Signing apps

$ keytool -genkey -v -keystore debug.keystore -alias


androiddebugkey -keyalg DSA -sigalg SHA1withDSA -keysize
1024 -validity 10000

$ jarsigner -keystore <path to debug.keystore> -verbose


-storepass android -keypass android -sigalg SHA1withDSA
-digestalg SHA1 app.apk androiddebugkey

38
Disassembly vs. Decompilation

- Disassembly
- classes.dex binary file ~> Dalvik bytecode "smali" representation
- machine code bytes ~> assembly representation (mov eax, edx)

- Decompilation
- Go from assembly/bytecode to source code-level representation
- Dalvik bytecode ~> Java source code

39
How to decompile

- All-in-one tools
- JEB (commercial, VERY expensive)
- BytecodeViewer (pretty good one)
- jadx
- NEW: Ghidra, open source tool developed by NSA

- Using a Java decompiler (Java bytecode ~> Java)


- Dalvik bytecode ~> Java bytecode
- dex2jar
- Java bytecode ~> Java source code
- Jd-GUI
40
Decompilation

- Decompiling Dalvik bytecode is usually simple

- Packing techniques and obfuscation tricks try to make


decompilers' lives very difficult

- When they don't work, you gotta read the bytecode

41
aapt

- It comes with Android SDK


- <sdk>/build-tools/26.0.2/aapt

- It takes an APK as input

- It can dump tons of useful info


- Package name, components, main activity, permissions
- strings, resources, ...

- $ aapt dump badging <apk_path>


42
adb

- Tool to interact with apps and devices/emulators

- $ adb devices

- $ adb install app.apk


- $ adb uninstall com.mobisec.testapp # package name

43
adb

- $ adb logcat

- $ adb push file.txt /sdcard/file.txt # push to device


- $ adb pull /sdcard/file.txt file.txt # pull from device

44
adb

- $ adb shell
- Get a shell on the device

- $ adb shell ls
- Execute "ls" on the device

- $ adb shell am start -n <pkgname>/<component>


- $ adb shell pm grant <pkgname> <permission>
- $ adb shell dumpsys
45
Is this stuff actually useful?

- News from (3 days + 2 year) ago

- One guy analyzed an app for vending machines

- They were storing the "amount of money" in a content


provider INSIDE the app

- https://hackernoon.com/how-i-hacked-modern-vending-m
achines-43f4ae8decec

46

You might also like