Introduction to Modern Code Virtualization

by Nooby
This paper describes how code is protected with virtual machines and which techniques popular
virtual machines use, giving readers from beginners to professionals a reasonable understanding
of such virtual machines.
Why Virtual Machines?
In the early years of software protection, techniques like obfuscation and mutation were developed.
Such methods insert junk code into the original code flow, change the original instructions to their
synonyms, replace constants with calculations, insert conditional and unconditional branches,
and write random bytes into the gaps between executable code (making sequential disassembly fail),
etc. An example of such processing is included as example_obfuscated.exe.
Over time, the level of complexity introduced by such methods became insufficient due to the
development of debugging tools (many now feature run-time tracing) and the rising skill of the
average reverse engineer. An averagely skilled reverse engineer can manually or semi-automatically
clean up the code to make it readable/alterable. Heavier obfuscation increases the size of the
protected code dramatically but brings little increase in complexity. People began to seek a method
of code protection that withstands such brute force. By breaking instructions down into an execution
loop with a set of reusable micro operations, the length of the code being executed can be increased
exponentially without growing much in size. Such loop code acts like an "emulator" of the original
code (aka. Interpreter): it takes in a flow of data (aka. Pseudo-code or P-Code) and performs micro
operations (aka. Handlers), much like a "virtual machine" executing its own instruction set. This
gave rise to the term: code virtualization.
How Does a Virtual Machine Work?
We know that a real processor has registers, instruction decoders and execution logic. Virtual
machines are much the same. The virtual machine entry code collects context information from
the real processor and stores it in its own context, then the execution loop reads P-Code and
dispatches to the corresponding handler. When the virtual machine exits, it updates the real
processor registers from its stored context.
As a quick example, take a function executed through a pseudo virtual machine.
Original Instructions:
    add eax, ebx
Transformed into virtualized code:
    ; call site that replaces the original instruction
    push address_of_pcode
    jmp VMEntry

    ; VMEntry: save the real processor state
    push all register values
    jmp VMLoop

    ; VMLoop: fetch and dispatch
    fetch p-code from VMEIP
    dispatch to handler

    ; initialization handler
    pop all register values into VMContext
    pop address_of_pcode into VMEIP
    jmp VMLoop

    ; Add_EAX_EBX_Handler
    do "add eax, ebx" on VMContext
    jmp VMLoop

    ; exit handler
    restore register values from VMContext
    do "retn"
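The entry, loop and exit steps can also be sketched in Python. This is only an illustration under assumptions: the opcode numbers, the dictionary used as VMContext and the function names are made up for the sketch and are not taken from any real protector.

```python
# Hypothetical opcode numbers; a real protector picks its own encoding.
OP_ADD_EAX_EBX = 0
OP_RETN = 1

MASK32 = 0xFFFFFFFF


def add_eax_ebx_handler(ctx):
    """Dedicated handler: performs "add eax, ebx" on the VM context."""
    ctx["eax"] = (ctx["eax"] + ctx["ebx"]) & MASK32


HANDLERS = {OP_ADD_EAX_EBX: add_eax_ebx_handler}


def vm_run(real_regs, pcode):
    # Entry: copy the real register values into the VM's own context.
    ctx = dict(real_regs)
    vmeip = 0
    # Loop: fetch one P-Code byte and dispatch to its handler.
    while True:
        opcode = pcode[vmeip]
        vmeip += 1
        if opcode == OP_RETN:
            break
        HANDLERS[opcode](ctx)
    # Exit: write the stored context back to the real registers.
    real_regs.update(ctx)
    return real_regs


regs = vm_run({"eax": 2, "ebx": 40}, bytes([OP_ADD_EAX_EBX, OP_RETN]))
print(regs)
```

The handler table is the only place that knows what an opcode means, so a trace of the loop alone shows the same fetch and dispatch steps for every virtualized instruction.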
Note that a virtual machine does not, and need not, emulate all x86 instructions. Some can be
executed as-is by the real processor: the virtual machine exits at that point, points EIP to the
raw instruction and then re-enters the VM.
Actual virtual machine handlers are usually designed to be more generic than the example handler
above. Usually the P-Code also determines the operands. The "Add_EAX_EBX_Handler" can
be defined as an "Add_Handler" which takes 2 parameters and produces a result. There will also be
load/store register handlers which produce the parameters and save the results. This makes the
handlers more reusable, so that tracing through them without understanding the virtual machine
architecture does not yield a good understanding of the original code. Now let us see how it
works on a stack-based virtual machine:
    ; Add handler
    pop REG                     ; REG = parameter2
    add [STACK], REG            ; [STACK] points to parameter1

    ; GetREG handler
    fetch P-Code for operand
    push VMCONTEXT[operand]     ; push value of REG on stack

    ; SetREG handler
    fetch P-Code for operand
    pop VMCONTEXT[operand]      ; pop value of REG from stack
The P-Code of the above function will be:
    GetREG EBX
    GetREG EAX
    Add
    SetREG EAX
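A minimal Python sketch of such a stack-based interpreter follows. It assumes three generic handlers named GetREG, SetREG and Add, and it encodes the P-Code as (handler, operand) tuples; that encoding is a simplification chosen for readability, not a real P-Code format.

```python
MASK32 = 0xFFFFFFFF


def run_stack_vm(pcode, context):
    """Interpret a list of (handler, operand) pairs on a value stack."""
    stack = []
    for handler, operand in pcode:
        if handler == "GetREG":      # push VMCONTEXT[operand]
            stack.append(context[operand])
        elif handler == "SetREG":    # pop into VMCONTEXT[operand]
            context[operand] = stack.pop()
        elif handler == "Add":       # pop parameter2, add it to parameter1
            reg = stack.pop()
            stack[-1] = (stack[-1] + reg) & MASK32
        else:
            raise ValueError(f"unknown handler {handler!r}")
    return context


# P-Code for "add eax, ebx"
pcode = [("GetREG", "EBX"), ("GetREG", "EAX"), ("Add", None), ("SetREG", "EAX")]
result = run_stack_vm(pcode, {"EAX": 2, "EBX": 40})
print(result)
```

None of the handlers mentions EAX or EBX; the registers only appear as operands in the P-Code, which is what makes the handlers reusable.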
What Modern Virtual Machines Do Against Reverse Engineering?
Code obfuscation and mutation are important to code virtualization: since the virtual machine
interpreters are directly exposed, these techniques help protect the virtual machine against
automated analysis tools. Heavily obfuscated virtual machine handlers can take a while to
de-obfuscate without knowledge of the underlying virtual machine architecture. Since some of the
processor registers are not used (the VMContext is stored separately and the virtual machine
interpreter can be designed to use only a few registers), they can be used for extra obfuscation.
Virtual machine handlers can be designed to have as little operand/context dependency as possible.
Furthermore, because the real stack pointer can be tracked within the VMContext, the stack can be
filled with junk during the interpreter loop. With these aspects, code obfuscation and mutation can
be very effective. An example of such an obfuscated virtual machine can be found in
example_virtualized.exe.
Now that we know how the execution part of a virtual machine is protected, let's look at the
techniques used when transforming instructions into P-Code, which is the awesome part of
code virtualization.
Instruction Decomposition
Logical Instructions
To increase complexity and reusability, logical operations can be broken down into
operations like NAND/NOR, according to the following:
    NOT(X) = NAND(X, X) = NOR(X, X)
    AND(X, Y) = NOT(NAND(X, Y)) = NOR(NOT(X), NOT(Y))
    OR(X, Y) = NAND(NOT(X), NOT(Y)) = NOT(NOR(X, Y))
    XOR(X, Y) = NAND(NAND(NOT(X), Y), NAND(X, NOT(Y))) = NOR(AND(X, Y), NOR(X, Y))
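These identities are easy to check mechanically. The Python sketch below builds NOT, AND, OR and XOR once from NAND only and once from NOR only, on 32-bit values; the function names are made up for the example.

```python
MASK32 = 0xFFFFFFFF


def nand(x, y):
    return ~(x & y) & MASK32


def nor(x, y):
    return ~(x | y) & MASK32


# NAND-only decomposition
def not_nand(x):
    return nand(x, x)

def and_nand(x, y):
    return not_nand(nand(x, y))

def or_nand(x, y):
    return nand(not_nand(x), not_nand(y))

def xor_nand(x, y):
    return nand(nand(not_nand(x), y), nand(x, not_nand(y)))


# NOR-only decomposition
def not_nor(x):
    return nor(x, x)

def and_nor(x, y):
    return nor(not_nor(x), not_nor(y))

def or_nor(x, y):
    return not_nor(nor(x, y))

def xor_nor(x, y):
    return nor(and_nor(x, y), nor(x, y))


samples = [0, 1, 0x12345678, 0x80000000, 0xFFFFFFFF]
for x in samples:
    assert not_nand(x) == not_nor(x) == ~x & MASK32
    for y in samples:
        assert and_nand(x, y) == and_nor(x, y) == x & y
        assert or_nand(x, y) == or_nor(x, y) == x | y
        assert xor_nand(x, y) == xor_nor(x, y) == x ^ y
print("all identities hold")
```

A single XOR ends up as five NAND operations or six NOR operations, which is where the growth in executed P-Code comes from.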
Arithmetic Instructions
Subtraction can be substituted by addition, with the overhead of an EFLAGS calculation:
    SUB(X, Y) = NOT(ADD(NOT(X), Y))
Taking the EFLAGS before the final NOT as A and the EFLAGS after the final NOT as B, the
calculation is as follows:
    EFLAGS = OR(AND(A, 0x815), AND(B, NOT(0x815))) ; 0x815 masks OF, AF, PF and CF
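The substitution and the flag merge can be checked with a small Python model. This is a sketch under assumptions: only the six arithmetic status flags are modelled, and the virtual machine's NOT is assumed to update flags like a logical instruction (for example because it is built from NAND/NOR), which differs from the native x86 NOT that leaves EFLAGS untouched. All function names are made up for the example.

```python
MASK32 = 0xFFFFFFFF
CF, PF, AF, ZF, SF, OF = 0x001, 0x004, 0x010, 0x040, 0x080, 0x800
ARITH_MASK = 0x815  # OF | AF | PF | CF


def result_flags(r):
    """ZF, SF and PF depend only on the 32-bit result."""
    f = 0
    if r == 0:
        f |= ZF
    if r & 0x80000000:
        f |= SF
    if bin(r & 0xFF).count("1") % 2 == 0:
        f |= PF
    return f


def add32(x, y):
    r = (x + y) & MASK32
    f = result_flags(r)
    if x + y > MASK32:
        f |= CF
    if (x & 0xF) + (y & 0xF) > 0xF:
        f |= AF
    if (~(x ^ y) & (x ^ r)) & 0x80000000:
        f |= OF
    return r, f


def sub32(x, y):
    """Reference model of a native SUB."""
    r = (x - y) & MASK32
    f = result_flags(r)
    if x < y:
        f |= CF
    if (x & 0xF) < (y & 0xF):
        f |= AF
    if ((x ^ y) & (x ^ r)) & 0x80000000:
        f |= OF
    return r, f


def vm_not(x):
    """Assumed VM NOT: a logical operation, so CF and OF come out clear."""
    r = ~x & MASK32
    return r, result_flags(r)


def vm_sub(x, y):
    nx, _ = vm_not(x)
    s, a = add32(nx, y)          # A: EFLAGS before the final NOT
    r, b = vm_not(s)             # B: EFLAGS after the final NOT
    return r, (a & ARITH_MASK) | (b & ~ARITH_MASK)


values = [0, 1, 5, 7, 0x7FFFFFFF, 0x80000000, 0xFFFFFFFF]
assert all(vm_sub(x, y) == sub32(x, y) for x in values for y in values)
print("SUB(X, Y) matches NOT(ADD(NOT(X), Y)) including flags")
```

OF, AF, PF and CF of the addition already equal those of the subtraction, while ZF and SF have to be taken from the final result, which is why the two flag sets are merged through the 0x815 mask.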
Register Abstraction
Since a virtual machine can have more registers than an actual x86 processor, real processor
registers can be dynamically mapped to virtual machine registers. The extra registers can be used
to store intermediate values or simply to cause confusion. This also allows further
obfuscation/optimization across instructions, as described below.
Context Rotation
Due to register abstraction, different pieces of P-Code can have different register mappings, and
this correspondence can be designed to change from time to time, making reverse engineering
more difficult. The virtual machine only swaps the values in its context when the next piece of
P-Code has different register mappings. When transforming instructions like XCHG, it can simply
change the register mappings without producing any P-Code. See the following example:
Original Instructions:
    xchg ebx, ecx
    add eax, ecx
P-Code Without Context Rotation:
    GetREG R2                   ; R2 = ECX
    GetREG R1                   ; R1 = EBX
    SetREG R2                   ; ECX = value of EBX
    SetREG R1                   ; EBX = value of ECX
    GetREG R2
    GetREG R0                   ; R0 = EAX
    Add
    SetREG R0
P-Code With Context Rotation (exchange done during P-Code generation):
    [Map R1 = ECX, R2 = EBX]    ; exchange
    GetREG R1                   ; R1 = ECX
    GetREG R0                   ; R0 = EAX
    Add
    SetREG R0                   ; R0 = EAX
Such rotation can also be applied to the last SetREG operation, so that the result of the addition
is written to another, unused virtual machine register (e.g. R3), leaving R0 with useless data.
The following piece of P-Code operates on 3 virtual machine registers, which makes it hard for
reverse engineers to find its x86 equivalent.
Register mappings for the exchange example above:

Before Exchange
    Real Registers    Virtual Registers
    EAX               R0
    EBX               R1
    ECX               R2

After Exchange
    Real Registers    Virtual Registers
    EAX               R0
    EBX               R2
    ECX               R1

Current Register Mappings
    Real Registers    Virtual Registers
    EAX               R0
    EBX               R1
    ECX               R2
P-Code With Context Rotation 2:
    [Map R1 = ECX, R2 = EBX]        ; exchange
    GetREG R1                       ; R1 = ECX
    GetREG R0                       ; R0 = EAX
    Add
    [Map R0 = Unused, R3 = EAX]     ; rotation
    SetREG R3                       ; R3 = EAX
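On the generator side, context rotation amounts to bookkeeping on a mapping table. The Python sketch below is a toy generator written for this illustration: the class, its method names and the "Add" P-Code are assumptions, and only XCHG and ADD between registers are handled.

```python
class PCodeGen:
    """Toy P-Code generator that keeps a real-to-virtual register map."""

    def __init__(self):
        self.mapping = {"EAX": "R0", "EBX": "R1", "ECX": "R2"}
        self.unused = ["R3"]        # spare virtual registers
        self.pcode = []

    def xchg(self, a, b):
        # Context rotation: swap the mappings, emit no P-Code at all.
        self.mapping[a], self.mapping[b] = self.mapping[b], self.mapping[a]

    def add(self, dst, src, rotate_result=False):
        self.pcode.append(f"GetREG {self.mapping[src]}")
        self.pcode.append(f"GetREG {self.mapping[dst]}")
        self.pcode.append("Add")
        if rotate_result and self.unused:
            # Write the result to a spare register; the old one becomes unused.
            old = self.mapping[dst]
            self.mapping[dst] = self.unused.pop(0)
            self.unused.append(old)
        self.pcode.append(f"SetREG {self.mapping[dst]}")


gen = PCodeGen()
gen.xchg("EBX", "ECX")
gen.add("EAX", "ECX", rotate_result=True)
print(gen.pcode)     # ['GetREG R1', 'GetREG R0', 'Add', 'SetREG R3']
print(gen.mapping)   # {'EAX': 'R3', 'EBX': 'R2', 'ECX': 'R1'}
```

The mapping exists only inside the generator. The emitted P-Code carries no trace of it, so an analyst has to reconstruct the mapping at every point to know which real register a virtual register currently stands for.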
Register Aliasing
When processing assignment instructions, especially assignments between registers, it is possible
to make temporary mappings between the source and destination registers. Unless the source
register is about to be changed (which forces a remapping or a GetREG and SetREG pair), this
mapping can redirect read accesses of the destination register to its source without actually
performing the assignment. Take the following piece of code as an example:
Original Instructions:
    mov eax, ecx
    add eax, ebx
    mov ecx, eax
    mov eax, ebx
P-Code With Register Aliasing:
    [Make alias R0 = R2]
    GetREG R1                       ; R1 = EBX
    GetREG R2                       ; reading of R0 redirects to R2
    Add
    [R0(EAX) is being changed; since R0 is the destination of an alias, just clear its alias]
    [Map R0 = Unused, R3 = EAX]     ; rotation
    SetREG R3                       ; R3 = EAX
    [Make alias R2 = R3]
    GetREG R1
    [R3(EAX) is being changed; since R3 is the source of an alias, we need to do the assignment]
    [Map R3 = ECX, R2 = EAX]        ; we can simplify the R2 = R3 assignment by rotation
    [Map R0 = EAX, R2 = Unused]     ; another rotation
    SetREG R0                       ; R0 = EAX
Current Register Mappings
    Real Registers    Virtual Registers
    EAX               R0
    EBX               R1
    ECX               R2
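The alias bookkeeping can be sketched in Python as well. To keep it short, this toy generator resolves a changed alias source with the plain GetREG and SetREG pair instead of the rotation used in the listing above, and it works directly on virtual register names; the class, its methods and the "Add" P-Code are assumptions made for the example.

```python
class AliasGen:
    """Toy generator that defers register-to-register moves as aliases."""

    def __init__(self):
        self.alias = {}     # destination register -> register it aliases
        self.pcode = []

    def _read(self, reg):
        # Reads of an aliased destination are redirected to its source.
        return self.alias.get(reg, reg)

    def _before_write(self, reg):
        # reg is about to change: as a destination, its alias is dropped;
        # as a source, every pending assignment from it is performed now.
        self.alias.pop(reg, None)
        for dst, src in list(self.alias.items()):
            if src == reg:
                self.pcode += [f"GetREG {src}", f"SetREG {dst}"]
                del self.alias[dst]

    def mov(self, dst, src):
        src = self._read(src)
        if src == dst:
            return
        self._before_write(dst)
        self.alias[dst] = src       # no P-Code emitted

    def add(self, dst, src):
        self.pcode += [f"GetREG {self._read(src)}",
                       f"GetREG {self._read(dst)}", "Add"]
        self._before_write(dst)
        self.pcode.append(f"SetREG {dst}")

    def flush(self):
        # Perform all pending assignments, e.g. before leaving the VM.
        for dst, src in self.alias.items():
            self.pcode += [f"GetREG {src}", f"SetREG {dst}"]
        self.alias.clear()


gen = AliasGen()
gen.mov("R0", "R2")     # mov eax, ecx
gen.add("R0", "R1")     # add eax, ebx
gen.mov("R2", "R0")     # mov ecx, eax
gen.mov("R0", "R1")     # mov eax, ebx
pending = dict(gen.alias)
gen.flush()
print(pending)
print(gen.pcode)
```

Without aliasing, the same four instructions would need ten P-Code operations (two per mov, four for the add); here the first mov never produces any P-Code at all.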
Register Usage Analysis
Given the context of a set of instructions, it can be determined that at some points the values of
certain registers can be changed without affecting the program logic, and that some of the EFLAGS
calculation overhead can be omitted. For example, take a piece of code at 0x406?A8 in example.exe:
    PUSH EBP
    MOV EBP, ESP                          ; EAX|ECX|EBP|OF|SF|ZF|PF|CF
    SUB ESP, 0x10                         ; EAX|ECX|OF|SF|ZF|PF|CF
    MOV ECX, DWORD PTR [EBP+0x8]          ; EAX|ECX|OF|SF|ZF|PF|CF
    MOV EAX, DWORD PTR [ECX+0x10]         ; EAX|OF|SF|ZF|PF|CF
    PUSH ESI                              ; OF|SF|ZF|PF|CF
    MOV ESI, DWORD PTR [EBP+0xC]          ; ESI|OF|SF|ZF|PF|CF
    PUSH EDI                              ; OF|SF|ZF|PF|CF
    MOV EDI, ESI                          ; EDI|OF|SF|ZF|PF|CF
    SUB EDI, DWORD PTR [ECX+0xC]          ; OF|SF|ZF|PF|CF
    ADD ESI, -0x4                         ; ECX|OF|SF|ZF|PF|CF
    SHR EDI, 0xF                          ; ECX|OF|SF|ZF|PF|CF
    MOV ECX, EDI                          ; ECX|OF|SF|ZF|PF|CF
    IMUL ECX, ECX, 0x204                  ; OF|SF|ZF|PF|CF
    LEA ECX, DWORD PTR [ECX+EAX+0x144]    ; OF|SF|ZF|PF|CF
    MOV DWORD PTR [EBP-0x10], ECX         ; OF|SF|ZF|PF|CF
    MOV ECX, DWORD PTR [ESI]              ; ECX|OF|SF|ZF|PF|CF
    DEC ECX                               ; OF|SF|ZF|PF|CF
    TEST CL, 0x1                          ; OF|SF|ZF|PF|CF
    MOV DWORD PTR [EBP-0x4], ECX
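Annotations of this kind can be derived with a backward pass over the instructions. The Python sketch below is an illustration only: the four-instruction sample, the hand-written read/write sets and the assumption that every tracked register and flag is needed at the end of the sample are all made up for the example.

```python
def unused_before(instructions, live_at_exit, tracked):
    """Backward pass: for each instruction, list what is unused just before it."""
    live = set(live_at_exit)
    result = []
    for _text, reads, writes in reversed(instructions):
        # A location written here is unused above the write unless it is also read.
        live = (live - set(writes)) | set(reads)
        result.append(sorted(set(tracked) - live))
    result.reverse()
    return result


FLAGS = {"OF", "SF", "ZF", "PF", "CF"}

# (text, locations read, locations written); memory accesses are ignored.
code = [
    ("MOV ECX, DWORD PTR [EBP+0x8]", {"EBP"}, {"ECX"}),
    ("MOV EAX, DWORD PTR [ECX+0x10]", {"ECX"}, {"EAX"}),
    ("DEC ECX", {"ECX"}, {"ECX", "OF", "SF", "ZF", "PF"}),
    ("TEST CL, 0x1", {"ECX"}, FLAGS),
]

tracked = {"EAX", "ECX"} | FLAGS
report = unused_before(code, live_at_exit=tracked, tracked=tracked)
for (text, _, _), unused in zip(code, report):
    print(f"{text:32} ; {'|'.join(unused)}")
```

Anything reported as unused before an instruction may be overwritten with junk at that point, and a flag that is unused after an instruction does not have to be computed by that instruction's P-Code.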
The comments in the x86 listing show the unused register/flag state before each instruction. This
information is used to generate register rotations, to omit EFLAGS calculations and to insert junk
operations, which makes the generated P-Code even harder to analyze.
Other P-Code Obfuscations & Optimizations
Const Encryption
Intermediate values and constants from the original instructions can be replaced by results that
are calculated at run-time, which reduces the chance of constants being directly exposed in the
P-Code.
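One simple way to do this is to split a constant into two values that are only combined at run-time. The Python sketch below uses XOR with a random key; the "PushImm" and "Xor" P-Codes, the tuple encoding and the sample constant are hypothetical, and a real protector may use any reversible calculation.

```python
import random

MASK32 = 0xFFFFFFFF


def encrypt_const(value, rng):
    """Replace one constant by P-Code that recomputes it at run-time."""
    key = rng.getrandbits(32)
    while key in (0, value):          # keep both operands different from value
        key = rng.getrandbits(32)
    return [("PushImm", value ^ key), ("PushImm", key), ("Xor", None)]


def evaluate(pcode):
    """Tiny stack evaluator for the two hypothetical P-Codes."""
    stack = []
    for handler, operand in pcode:
        if handler == "PushImm":
            stack.append(operand)
        elif handler == "Xor":
            b = stack.pop()
            a = stack.pop()
            stack.append((a ^ b) & MASK32)
    return stack.pop()


secret = 0x00401000
pcode = encrypt_const(secret, random.Random(1234))
print([hex(op) for _, op in pcode if op is not None], hex(evaluate(pcode)))
```

Searching the P-Code for the original constant finds nothing, yet the value on the stack after the three operations is the original one.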
Stack Obfuscation
The virtual machine's stack can be obfuscated by pushing/writing random values, because the real
ESP can be calculated/tracked from the VMContext.
Multiple Virtual Machine Interpreters
It is possible to use multiple virtual machines to execute one series of P-Code. At certain points,
a special handler leading to another interpreter loop is executed. The P-Code data after such points
is processed by a different virtual machine. These virtual machines only need to share the
intermediate run-time information, such as register mappings, at the switch points. Tracing such
P-Code requires analyzing all virtual machine instances, which is considerably more work.
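A minimal Python sketch of the idea follows. It assumes two interpreters that share a value stack but use different opcode encodings, plus a hypothetical SWITCH handler whose operand selects the interpreter for the P-Code that follows; all opcode values are invented for the example.

```python
MASK32 = 0xFFFFFFFF

# Two hypothetical interpreters: same handlers, different opcode encodings.
VM_TABLES = [
    {0x01: "PUSH", 0x02: "ADD", 0x03: "SWITCH", 0x04: "EXIT"},
    {0x9A: "PUSH", 0x17: "ADD", 0x5C: "SWITCH", 0x33: "EXIT"},
]


def run(pcode):
    stack = []      # run-time state shared across the switch point
    active = 0      # index of the interpreter currently decoding P-Code
    pc = 0
    while True:
        handler = VM_TABLES[active][pcode[pc]]
        pc += 1
        if handler == "PUSH":
            stack.append(pcode[pc])
            pc += 1
        elif handler == "ADD":
            b = stack.pop()
            a = stack.pop()
            stack.append((a + b) & MASK32)
        elif handler == "SWITCH":
            active = pcode[pc]      # continue in another interpreter
            pc += 1
        elif handler == "EXIT":
            return stack


# The first half is encoded for interpreter 0, the second half for interpreter 1.
pcode = [0x01, 40, 0x03, 1, 0x9A, 2, 0x17, 0x33]
print(run(pcode))  # [42]
```

The bytes after the switch point are meaningless to the first interpreter, so understanding one instance is not enough to decode the whole P-Code stream.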
Code Virtualizer
ReWolf's x86 Virtualizer