Playing with Binary Analysis

Deobfuscation of VM based software protection

Jonathan Salwan,
Sébastien Bardin and Marie-Laure Potet
SSTIC 2017

Topic
● Binary protection
○ Virtualization-based software protection
● Automatic deobfuscation, our approach
● The Tigress challenges
● Limitations
● What next?
● Conclusion

Binary Protection

Binary Protection ● Goal ○ Turn your program to make it hard to analyze ■ Protect your software against reverse engineering P Transformation P’ .

] ○ Virtualization-based software protection ..Binary Protection ● There are several kinds of protection ○ [..

Binary Protection .Virtualization ● Also called Virtual Machine (VM) ● Virtualize a custom Instruction Set Architecture (ISA) .

Virtualization ● Also called Virtual Machine (VM) ● Virtualize a custom Instruction Set Architecture (ISA) long secret(long x) { [transformations on x] return x. } .Binary Protection . } bool auth(long user_input) { long h = secret(user_input). return (h == 0x9e3779b97f4a7c13).

Virtualization ● Also called Virtual Machine (VM) ● Virtualize a custom Instruction Set Architecture (ISA) long secret(long x) { [transformations on x] Bytecodes . return (h == 0x9e3779b97f4a7c13). } .Custom ISA return x. } bool auth(long user_input) { long h = secret(user_input).Binary Protection .

Custom ISA return x.Binary Protection . } . &h. return (h == 0x9e3779b97f4a7c13). VM(opcodes. } bool auth(long user_input) { long h = 0. user_input).Virtualization ● Also called Virtual Machine (VM) ● Virtualize a custom Instruction Set Architecture (ISA) long secret(long x) { [transformations on x] Bytecodes .

} bool auth(long user_input) { long h = 0.Virtualization ● Also called Virtual Machine (VM) ● Virtualize a custom Instruction Set Architecture (ISA) Removed long secret(long x) { [transformations on x] Bytecodes . return (h == 0x9e3779b97f4a7c13). VM(opcodes. &h. } .Binary Protection . user_input).Custom ISA return x.

Go to the next instruction or terminate . Decode the opcode .Binary Protection .VM Design (a simple one) Fetching Decoding Dispatcher ● Close to a CPU design Operator 1 Operator 2 Operator 3 a.mnemonic / operands c. Dispatch to the appropriate semantics handler d. Fetch the opcode pointed via the virtual IP b. Execute the semantics Terminator e.

VM Design (a simple one) Fetching long secret(long x) { [transformations on x] Decoding Bytecodes . Go to the next instruction or terminate .Binary Protection . Decode the opcode . Dispatch to the appropriate semantics handler d. } Dispatcher ● Close to a CPU design Operator 1 Operator 2 Operator 3 a. Execute the semantics Terminator e. Fetch the opcode pointed via the virtual IP b.Custom ISA return x.mnemonic / operands c.

Custom ISA Decoding Fetch : Dispatcher Decode : Code : Operator 1 Operator 2 Operator 3 Terminator .VM Design (a simple one) Fetching Bytecodes .Binary Protection .

Custom ISA Decoding Fetch : 0xaabbccdd Dispatcher Decode : Code : Operator 1 Operator 2 Operator 3 Terminator .Binary Protection .VM Design (a simple one) Fetching Bytecodes .

Custom ISA Decoding Fetch : 0xaabbccdd Dispatcher Decode : mov r/r Code : Operator 1 Operator 2 Operator 3 Terminator .VM Design (a simple one) Fetching Bytecodes .Binary Protection .

Custom ISA Decoding Fetch : 0xaabbccdd Dispatcher Decode : mov r/r Code : Operator 1 Operator 2 Operator 3 Terminator .VM Design (a simple one) Fetching Bytecodes .Binary Protection .

Custom ISA Decoding Fetch : 0xaabbccdd Dispatcher Decode : mov r/r Code : mov r1. input Operator 1 Operator 2 Operator 3 Terminator .VM Design (a simple one) Fetching Bytecodes .Binary Protection .

Custom ISA Decoding Fetch : Dispatcher Decode : Code : mov r1.VM Design (a simple one) Fetching Bytecodes . input Operator 1 Operator 2 Operator 3 Terminator .Binary Protection .

Custom ISA Decoding Fetch : 0x11223344 Dispatcher Decode : Code : mov r1.VM Design (a simple one) Fetching Bytecodes . input Operator 1 Operator 2 Operator 3 Terminator .Binary Protection .

VM Design (a simple one) Fetching Bytecodes . input Operator 1 Operator 2 Operator 3 Terminator .Binary Protection .Custom ISA Decoding Fetch : 0x11223344 Dispatcher Decode : mov r/i Code : mov r1.

Custom ISA Decoding Fetch : 0x11223344 Dispatcher Decode : mov r/i Code : mov r1.VM Design (a simple one) Fetching Bytecodes .Binary Protection . input Operator 1 Operator 2 Operator 3 Terminator .

VM Design (a simple one) Fetching Bytecodes .Binary Protection . 2 Operator 1 Operator 2 Operator 3 Terminator .Custom ISA Decoding Fetch : 0x11223344 Dispatcher Decode : mov r/i Code : mov r1. input mov r2.

Binary Protection - VM Design (a simple one)
Fetching

Bytecodes - Custom ISA Decoding

Fetch :
Dispatcher
Decode :

Code : mov r1, input
mov r2, 2
Operator 1 Operator 2 Operator 3

Terminator

Binary Protection - VM Design (a simple one)
Fetching

Bytecodes - Custom ISA Decoding

Fetch : 0x5577aabb
Dispatcher
Decode :

Code : mov r1, input
mov r2, 2
Operator 1 Operator 2 Operator 3

Terminator

Binary Protection - VM Design (a simple one)
Fetching

Bytecodes - Custom ISA Decoding

Fetch : 0x5577aabb
Dispatcher
Decode : mul r/r/r

Code : mov r1, input
mov r2, 2
Operator 1 Operator 2 Operator 3

Terminator

2 Operator 1 Operator 2 Operator 3 Terminator .VM Design (a simple one) Fetching Bytecodes . input mov r2.Binary Protection .Custom ISA Decoding Fetch : 0x5577aabb Dispatcher Decode : mul r/r/r Code : mov r1.

r1. 2 Operator 1 Operator 2 Operator 3 mul r3.Custom ISA Decoding Fetch : 0x5577aabb Dispatcher Decode : mul r/r/r Code : mov r1.VM Design (a simple one) Fetching Bytecodes . input mov r2.Binary Protection . r2 Terminator .

2 Operator 1 Operator 2 Operator 3 mul r3. input mov r2. r1. r2 Terminator .Custom ISA Decoding Fetch : Dispatcher Decode : Code : mov r1.Binary Protection .VM Design (a simple one) Fetching Bytecodes .

VM Design (a simple one) Fetching Bytecodes .Binary Protection . 2 Operator 1 Operator 2 Operator 3 mul r3. r1. r2 Terminator . input mov r2.Custom ISA Decoding Fetch : 0x1337dead Dispatcher Decode : Code : mov r1.

Custom ISA Decoding Fetch : 0x1337dead Dispatcher Decode : ret r Code : mov r1. 2 Operator 1 Operator 2 Operator 3 mul r3. input mov r2. r1.Binary Protection .VM Design (a simple one) Fetching Bytecodes . r2 Terminator .

Binary Protection . r2 Terminator . r1. 2 Operator 1 Operator 2 Operator 3 mul r3.Custom ISA Decoding Fetch : 0x1337dead Dispatcher Decode : ret r Code : mov r1.VM Design (a simple one) Fetching Bytecodes . input mov r2.

Custom ISA Decoding Fetch : 0x1337dead Dispatcher Decode : ret r Code : mov r1. 2 Operator 1 Operator 2 Operator 3 mul r3. r2 ret r3 Terminator . input mov r2.Binary Protection .VM Design (a simple one) Fetching Bytecodes . r1.

Binary Protection . r2 ret r3 Terminator .VM Design (a simple one) Fetching Bytecodes . input mov r2. 2 Operator 1 Operator 2 Operator 3 mul r3. r1.Custom ISA Decoding Fetch : Dispatcher Decode : Code : mov r1.

Virtual Machine .Standard Reverse Process ● Reverse and understand the virtual machine’s structure / components ● Create a disassembler and then reverse the bytecodes ? Bytecodes ? Create a disassembler ? Disassembly ? ? ? ? Start Reversing .

Our Approach Automatic Deobfuscation .

Automatic Deobfuscation ● We don’t care about reconstructing a disassembler ● Our goal: .Our Approach .

Automatic Deobfuscation ● We don’t care about reconstructing a disassembler ● Our goal: ○ Directly reconstruct a devirtualized binary from the obfuscated one .Our Approach .

Our Approach .Automatic Deobfuscation ● We don’t care about reconstructing a disassembler ● Our goal: ○ Directly reconstruct a devirtualized binary from the obfuscated one ○ The crafted binary must have a control flow graph close to the original one .

Automatic Deobfuscation ● We don’t care about reconstructing a disassembler ● Our goal: ○ Directly reconstruct a devirtualized binary from the obfuscated one ○ The crafted binary must have a control flow graph close to the original one ○ The crafted binary must have instructions close to the original ones .Our Approach .

VM(opcodes. return (h == 0x9e3779b97f4a7c13). } FROM bool auth(long user_input) { long h = 0. } . &h. user_input).Our Approach .Automatic Deobfuscation Removed long secret(long x) { [transformations on x] Bytecodes return x.

Automatic Deobfuscation TO Obfuscated Traces .Our Approach .

Automatic Deobfuscation THEN FROM Simplified Traces .Our Approach .

} TO bool auth(long user_input) { long h = secret(user_input).Our Approach . } .Automatic Deobfuscation long secret_prime(long x) { [transformations on x] return x. return (h == 0x9e3779b97f4a7c13).

return (h == 0x9e3779b97f4a7c13). } .Automatic Deobfuscation long secret_prime(long x) { Where secret_prime() is semantically [transformations on x] identical to the original code but without return x. the process of the virtual machine } TO bool auth(long user_input) { long h = secret(user_input).Our Approach .

e. it must execute the original instruction (or its equivalent.Important fact ● Our approach is based on an important fact: Fetching ○ trace P' = instr P + instr VM Decoding Whatever the process of the VM execution.Our Approach .g: div / shr) Dispatcher Operator 1 Operator 2 Operator 3 Terminator . at the end.

g: div / shr) Dispatcher Operator 1 Operator 2 Operator 3 Terminator .Important fact ● Our approach is based on an important fact: Fetching ○ trace P' = instr P + instr VM Decoding Whatever the process of the VM execution. e.Our Approach . it must execute the original instruction (or its equivalent. at the end.

Concretize everything which is not related to these instructions (discard VM) 4. Unfolding program (tree-like program) 6. Perform a code coverage to recover the original CFG (iterate on more traces) 5. Compacted program (folding program) . Recompile with compiler optimizations a.Our Approach .Overview 1. Isolate these pertinent instructions using a taint analysis along a trace 2. Transform our representation into the LLVM one a. Keep a semantics transition between these isolated instructions using a SE 3.

VM(opcodes. &h.Step 1: Taint Analysis ● Track the input(s) of the function into the process of the VM execution long secret(long x) { [transformations on x] Custom ISA return x. } bool auth(long user_input) { Tainted long h = 0. } . user_input). return (h == 0x9e3779b97f4a7c13).

rax VM(opcodes. user_input). cl [transformations on x] Custom ISA mov rax. rax } mov rdx. [.. qword ptr [rdx] mov qword ptr [rax]. rbx return x.] return (h == 0x9e3779b97f4a7c13).Step 1: Taint Analysis ● Track the input(s) of the function into the process of the VM execution ● Pertinent instructions isolated mov rsi. rsi shr rbx. &h. rdx bool auth(long user_input) { Tainted mov rcx. qword ptr [rax] long secret(long x) { mov rbx. mov qword ptr [rdx]. rcx long h = 0.. } . qword ptr [rax] xor rax. mov qword ptr [rdx].

qword ptr [rdx] mov qword ptr [rax]. qword ptr [rax] mov rbx. rbx mov qword ptr [rdx].. cl mov rax. rax [.Step 1: Taint Analysis ● Track the input(s) of the function into the process of the VM execution ● Pertinent instructions isolated mov rsi. rsi shr rbx..] . qword ptr [rax] xor rax. rdx mov rcx. rcx mov qword ptr [rdx]. rax mov rdx.

. cl mov rax. rax mov rdx. rbx mov qword ptr [rdx]. qword ptr [rax] xor rax. rax [.] . qword ptr [rdx] mov qword ptr [rax]. rsi shr rbx. rcx mov qword ptr [rdx]. qword ptr [rax] mov rbx..Step 1: Taint Analysis ● Track the input(s) of the function into the process of the VM execution ● Pertinent instructions isolated ○ Now. rdx mov rcx. the problem is that this sub-trace has no sense without the VM’s state mov rsi.

rsi shr rbx. rbx mov qword ptr [rdx]. rdx mov rcx.] .. cl mov rax. qword ptr [rax] xor rax. qword ptr [rdx] mov qword ptr [rax]. rax mov rdx. qword ptr [rax] mov rbx.Step 2: Symbolic Representation ● A symbolic representation is used to provide a sense to these tainted instructions mov rsi. rcx mov qword ptr [rdx]. rax [..

rax ref!1334 := (((_ extract 63 0) ref!1131)) [. qword ptr [rax] ) xor rax... rbx (bvlshr mov qword ptr [rdx].] [. rcx ) mov qword ptr [rdx]. qword ptr [rax] ref!228 := SymVar_0 mov rbx. rdx (_ bv63 64) ) mov rcx.. cl ref!1131 := ( mov rax.] .. qword ptr [rdx] ((_ zero_extend 56) (_ bv5 8)) mov qword ptr [rax]. rax ((_ extract 63 0) ref!243) (bvand mov rdx. rsi ref!243 := (((_ extract 63 0) ref!228)) shr rbx.Step 2: Symbolic Representation ● A symbolic representation is used to provide a sense to these tainted instructions Symbolic representation of a given path mov rsi.

x 9 x 1 x π + π 7 2 5 3 4 .Step 3: Concretization Policy ● Input(s) of the function are both tainted and symbolized ● In order to remove the process of the VM execution ○ We concretize every LOAD and STORE ○ We concretize everything which is not related to the input(s) ■ Untainted values are concretized + + .

we must discover its paths ○ SMT solver is used onto our symbolic representation .Step 4: Code Coverage .Discovering Paths ● In order to find the original CFG.

From a Paths Tree to a CFG? ● Two approaches ○ Custom algorithm (not trivial) ○ LLVM optimizations (-02) (the lazy way) .Step 4: Code Coverage .

Step 5: Transformation to LLVM-IR ● In order to reconstruct a valid binary and apply paths merging ○ Move from our representation to the LLVM-IR ○ Arybo as crossroad Medusa Optimizations Binary code Triton AST Arybo IR LLVM-IR Binary code Miasm Sspam Bit-blasting https://github.com/quarkslab/arybo .

Step 6: Recompilation ● Based on the LLVM-IR we are able to: ○ Recompile a valid (and deobfuscated) code ○ Move to another architecture ○ Apply LLVM’s analysis and optimizations .

The Tigress Challenges .

cs.The Tigress Challenges ● Tigress ○ C Diversifier/Obfuscator ○ http://tigress.edu ● Challenges ○ 35 VMs ○ f(x) → x’ ■ Function f is virtualized and we have to find the transformation algorithm .arizona.

The Tigress Challenges .

The Tigress Challenges .

Limitations .

asynchronous codes… ● Currently. we also have these limitations: ○ Loops reconstruction ○ Arrays reconstruction ■ Due to our concretization policy ○ Calls graph reconstruction . IPC.Limitations ● Our limitations are those of the symbolic execution ○ Code coverage of the virtualized function ■ Complexity of expressions ○ Multi-threading.

What Next? .

What Next? ● Be able to determine on what designs of VM this approach works and doesn't ● Tests onto others protections .

What Next? ● Be able to determine on what designs of VM this approach works and doesn't ● Tests onto others protections ○ Teasing: It’s working well on VMProtect .

Demo .

Conclusion .

Conclusion ● Dynamic Taint Analysis + DSE ○ Powerful against VM based protections simplification ■ Automatic. etc ● LLVM optimizations ○ Powerful for paths merging (and code simplification) ● Worked well for the Tigress protection ○ They (Tigress team) released a new protection ■ Code obfuscation against symbolic execution attacks ACSAC '16 Recommendation: Protections should also be applied onto the custom ISA instead of the process of the VM execution . dispatcher. vpc. independent from custom opcode.

Thanks .com https://github.com/JonathanSalwan/Tigress_protection .Questions? https://triton.quarkslab.

Marion Videau ○ Review.Acknowledgements ● Adrien Guinet ○ Arybo support ● Romain Thomas ○ Ideas around path merging ● Gabriel Campana. proofreading . Fred Raynal.