
Frida handbook

Learn about binary instrumentation with the Frida toolkit.

Fernando Diaz (@entdark_)


This book is for sale at http://leanpub.com/fridahandbook

This version was published on 2022-04-21

This is a Leanpub book. Leanpub empowers authors and publishers
with the Lean Publishing process. Lean Publishing is the act of
publishing an in-progress ebook using lightweight tools and many
iterations to get reader feedback, pivot until you have the right book
and build traction once you do.

© 2021 - 2022 Fernando Diaz (@entdark_)


To my parents, and my “adoptive family” at VirusTotal.

Contents

1. Introduction
1.1 Handbook structure

2. What we will need
2.1 System requirements
2.2 Software requirements
2.3 Programming language requirements

3. Binary instrumentation and Frida
3.1 Application and code-level instrumentation
3.2 Frida: a binary instrumentation toolkit
3.3 Instrumentation tool structure under Frida
3.4 Frida architecture basics

4. Frida usage basics
4.1 JavaScript vs TypeScript
4.2 An overview of Frida API
4.3 Main features
4.3.1 Stalker: a code tracing engine
4.3.2 Hooks and the Interceptor API
4.4 frida-tools
4.4.1 Frida command line interface
4.4.2 frida-trace

5. Dealing with data types with Frida
5.1 Dealing with strings: Reading and allocation
5.1.1 Practical use case: Reading a WinAPI UTF16 string parameter
5.2 Numbers
5.2.1 Numerical arguments passed by value
5.2.2 Numerical values by reference
5.2.3 Writing numbers
5.3 Pointers
5.4 Pointer to offsets
5.5 Getting pointers to exports
5.5.1 findExportByName vs getExportByName
5.6 Pointer to ArrayBuffers
5.7 Hexdump: getting a picture from a memory region
5.8 Writing our first agent
5.8.1 Writing the control script
5.9 Injecting our scripts using Frida’s command line
5.10 Remote instrumentation

6. Intermediate usage
6.1 Defining globals in Frida’s REPL
6.2 Following child processes
6.3 Creating NativeFunctions
6.3.1 Using NativeFunction to call system APIs
6.4 Modifying return values
6.5 Access values after usage
6.6 CryptDecrypt: A practical case
6.7 Modifying values before execution
6.8 Undoing instrumentation
6.9 std::string
6.9.1 std::vector in MSVC
6.10 Operating with ArrayBuffers

7. Advanced usage
7.1 NOP functions
7.1.1 Using the replace API
7.1.2 Patching memory
7.2 Memory scanning
7.2.1 Reacting on memory patterns
7.3 Using custom libraries (DLL/.so)
7.3.1 Creating a custom DLL
7.3.2 Using our custom library
7.4 Reading and writing registers
7.5 Reading structs
7.5.1 Reading from a user-controlled struct
7.6 SYSCALL struct
7.7 WINAPI struct
7.8 Tips for calculating structure offsets
7.9 CModule
7.9.1 CModule: A practical use case
7.9.2 CModule: Reading return values
7.9.3 CModule vs JavaScript agent performance
7.9.4 CModule: Sharing state between JS and C
7.10 Sharing state between two CModule objects
7.10.1 Notifying from C code
7.11 CModule boilerplates
7.12 Stalker
7.12.1 Getting a thread id
7.12.2 Stalker: Tracing from a known function call
7.12.3 Tracing instructions
7.12.4 Getting RET addresses

8. MacOS
8.1 ObjC
8.2 Intercepting NSURL InitWithString
8.3 Obj-C: Intercepting fileExistsAtPath
8.4 ObjC: Methods with multiple arguments
8.5 ObjC: Reading a CFDataRef
8.6 Getting CryptoKit’s AES.GCM.seal data before encryption
8.7 Swift.String

9. r2frida
9.0.1 Testing r2frida
9.1 Tracing functions
9.1.1 Tracing functions from imports/exports
9.1.2 Tracing functions by using offsets
9.2 Disassembling functions in memory
9.3 Replace return values
9.4 Replacing return values (hijacking)
9.5 Allocating strings
9.6 Calling functions

10. Optimizing our Frida setup
10.1 Building an optimized Frida agent

11. A real-world use case: Building an anti-cheat with Frida
11.1 Background
11.2 Anti-cheat Requirements
11.2.1 Timenudge
11.3 Quick environment setup
11.4 Anti-cheat architecture
11.5 Extending the banlist
11.5.1 Monitoring userinfo changes
11.5.2 Predicting timenudge values
11.6 Optimizing G_RunFrame calls
11.6.1 Persistence across map changes
11.6.2 Conclusions

12. Resources


1. Introduction
The aim of this book is to introduce the reader to the world of binary
instrumentation using the Frida (frida.re)¹ toolkit. The book takes a
practical approach to learning the framework, which means that we will
keep theory to a minimum and work with practical examples as much as
possible.
Although the Frida API documentation is quite good, there’s still a
gap between the basic use cases and the most complex ones. Moreover,
although Frida supports desktop operating systems, it is mostly used
for mobile instrumentation, so most of the examples found on the
internet target Android. This gap is what I attempt to fill with this
book.
We are going to see a variety of scenarios, from reading and
manipulating simple arguments to reading and writing structs. The
most interesting features, such as remote instrumentation, are also
covered.
An online version of this handbook is available and kept up to date at
https://learnfrida.info.

1.1 Handbook structure


Here is a brief explanation of how this book is structured:

1. System/Software requirements: Although the examples are
self-explanatory, you will need to meet the software requirements
if you want to try them or play around.
2. Binary Instrumentation: The basic concepts of binary
instrumentation are explained to help us understand the underlying
techniques used in Frida or other frameworks.
3. Frida internals: We learn how Frida works on the inside and
what makes Frida interesting for us to use compared to other
toolkits or frameworks.
4. Frida basics: We will learn the basics of how Frida operates, its
tools and most interesting APIs, as well as recommendations on
how to approach certain tasks and how not to. We will also
learn how to create our first control-instrumentation tandem.
5. Intermediate usage: We will perform tasks that are useful in
real-life examples (modifying return values, modifying function
parameters, reading buffers…). They are set in a separate category
because they rely on already understanding the Frida API and are
error-prone (that is, you might crash the application if they are
done incorrectly) or cover more advanced topics.
6. Advanced usage: This area covers more advanced topics that
require understanding more complex concepts unrelated to Frida,
such as struct offsets, NOPing functions, optimizing certain
tasks with CModule…

¹https://frida.re

This handbook aims to cover only desktop OSes, so most of the book
can be followed using either a Linux distro of your choice or a
modern Windows system. Be sure to understand each part of the book
before moving on to more advanced topics (as I said, you will probably
end up crashing the instrumented process if you make mistakes).
The examples are done on different platforms to illustrate the fact
that an understanding of the Frida toolkit applies anywhere.
As an extra, I added instrumentation for Objective-C/Swift under
MacOS. Objective-C interaction with Frida works a bit differently from
what we will see throughout the book (mostly indexes and how classes
are loaded), so it has its own separate section.
2. What we will need
2.1 System requirements
To follow this book, you need a Linux distribution that supports
NPM, Python and clang/gcc/g++, so any major one should suffice (I will
be working on a Debian machine throughout the book).
For the Windows part, I will be using Windows 10. You are free to use
older versions of Windows if you want.
Virtual machines are ideal; however, they are not required unless you
intend to run malicious applications ;).

2.2 Software requirements

For the most part this is what we will need:
1. Python 3.7 or greater (Frida still supports Python 2.7 at the time
of writing)
2. Clang (personal preference; you can use gcc or any other C
compiler)
3. NPM for TypeScript usage (not required, but recommended)
4. (Windows) Visual C++ or GCC.
5. (MacOS) Xcode for Swift/Objective-C.
Throughout this book we will be using JavaScript (JS) for
instrumentation code (with some equivalent examples in TypeScript) and
Python to interface with Frida when needed, but you are free to use a
different language (Java, Swift…) to interface with Frida if you wish.

2.3 Programming language requirements

Although most information displayed here will be explained in detail,
the following knowledge will make this book easier to understand:
1. General programming concepts.
2. JavaScript and/or TypeScript knowledge.
3. Basic Python knowledge.
4. Understanding of C language basics and pointers.
(*) If you want to follow the MacOS part, Objective-C knowledge is
recommended.
3. Binary instrumentation and Frida
This section should give you a brief but general understanding of
what binary instrumentation is, and will be useful even if you use
other tools or write your own (you are free to read Section 3.2,
which introduces Frida, first).
Binary instrumentation consists of injecting instrumentation code
that is transparent to the target application, so that we can obtain
behavioral information during its execution. With Frida, during the
instrumentation process not only do we have detailed information about
the binary structure, but it is also possible to modify the execution
flow of an application while it is running.
In particular, the instrumentation process allows us to obtain
information on:

• The assembly instructions executed by the process.
• Function arguments (whenever functions are called).
• Returned values from called functions.
• Pointer data.

This information is very valuable because it allows us to quickly
learn how functions and their data (such as parameters, local
variables or return values) are used in the target process.
However, from the point of view of a malware analyst, the capacity
to alter the course of the execution of an application is perhaps
even more useful. The instrumentation process allows modifying the
execution flow of the application given certain conditions, as well as
modifying registers given certain instruction patterns.
There are multiple applications of binary instrumentation:

• Reverse engineering: It lets us quickly obtain information
about a binary or process, especially in scenarios where static
analysis time is limited. For example, during static analysis we
may notice that certain functions are called many times but not
have the time to check their inputs manually; instrumentation
lets us automate this.
• Malware analysis: It can be used to obtain a quick behavioral
report from a malicious process by inspecting popular APIs,
introducing breakpoints or modifying its control flow. It also
lets us apply the knowledge gained from a quick static analysis
to check whether our findings are correct.
• Fuzzing: Manipulating data throughout the execution of the
application so that we can force errors or race conditions, which
may reveal vulnerabilities.
• Taint analysis: Tracking all the variables controlled by the user
and examining memory regions and registers to see how they are
affected by the user-controlled input.
• Performance measurement: It is possible to measure the
performance of specific code sections or instruction sets, although
this is usually done along with the source code.

3.1 Application and code-level instrumentation
Depending on the level of access to our target application, two main
types of instrumentation can be differentiated:

1. Application-level instrumentation: We are able to instrument
applications whose source code we don’t have access to, as long as
we provide an environment where they can run. Aside from the few
cases where code is leaked, most malware binaries do not have their
source code available, so this method fits this use case perfectly.
2. Code-level instrumentation: Given access to the application’s
source code, we can instrument and trace certain sections of the
code to measure its performance, find bugs or obtain any
information that is of interest to either the analyst or the
developer.

Application-level instrumentation also helps us speed up static
analysis, which is where this type of instrumentation shines. The
reason is that it allows us to retrieve information from function
arguments, return values, the stack and registers without the need to
manually reverse engineer the complete binary. It is also possible to
probe certain blocks of code to monitor how they are being accessed or
where they are called from, which is very helpful when tracking down
the execution chain of functions.
Frida can be embedded as a library to add probes that allow tracing
our own code (although this task is better handled by more complex
IDEs and debuggers), but it is mostly used for application-level
instrumentation.
Up to this point, the concepts of binary instrumentation and the
levels of instrumentation have been introduced. Before going into
detail about how instrumentation tools are structured, the next
section describes what Frida is and the role it takes in the
instrumentation process.

3.2 Frida: a binary instrumentation toolkit
Frida is a binary instrumentation toolkit developed by Ole André V.
Ravnås¹ and sponsored by NowSecure. There are other frameworks
available to achieve similar things, like Intel PIN² and DynamoRIO³,
but some key points make Frida an interesting toolkit over the
others:

• Cross-platform: Frida works on Windows, Linux and MacOS systems
as well as mobile platforms (Android, iOS).
• We can develop our instrumentation code in JavaScript or
TypeScript, which speeds things up a lot compared to other tools
or frameworks.
• There are bindings available in various languages: Python, Java,
Swift…
• Compared to other instrumentation frameworks such as PIN,
development is easier and faster thanks to Frida’s very
straightforward API and setup process.
• Open-source: We can add features that we need or check out
how Frida works internally.

¹https://github.com/oleavr
²https://software.intel.com/content/www/us/en/develop/articles/pin-a-dynamic-binary-instrumentation-tool.html
³https://dynamorio.org/

These are the main ‘features’ that make this framework interesting
to us. There are some more, such as the possibility of working on
other architectures like ARM⁴ or MIPS⁵, and the fact that
instrumentation software built with the Frida libraries and/or toolkit
can be used for commercial purposes.
This table helps illustrate the main advantages of Frida over other
frameworks:
Feature                                    Frida  DynamoRIO      PIN
Open source                                Yes    Yes            No
Cross-platform                             Yes    Yes (limited)  Yes (limited)
Bindings in different languages            Yes    No             No
Write quick instrumentation tools          Yes    No             No
Support writing instrumentation without C  Yes    No             No
Mobile support                             Yes    No             No
Free                                       Yes    Yes            No

⁴https://en.wikipedia.org/wiki/ARM_architecture
⁵https://en.wikipedia.org/wiki/MIPS_architecture

Ole describes Frida as “the Greasemonkey for native apps, a dynamic
code instrumentation toolkit that lets you inject snippets of
JavaScript or your own library into native apps on multiple systems”.
For us, this means that we can do all the flashy things that other
instrumentation frameworks allow, but faster, because instrumentation
scripts are written in JavaScript, and with high portability.
Regarding portability, Frida supports the following operating
systems and architectures:

Supported architectures and systems

Supported OS list and architectures:

• Windows (x86, x64)
• Linux (x86, x64, arm, arm64, arm64e)
• MacOS (x86, x64, Apple Silicon M1)
• Android (including x86)
• iOS (arm64, arm64e, x64)

Frida works with x86, x64 and ARM without problems. For other
architectures like MIPS, support can be added because Frida is
open-source, or prebuilt binaries from the community can be used.
There is also support for termux⁶ (an Android terminal emulator).
For a complete and up-to-date set of releases please refer
to https://github.com/frida/frida/releases.

Now that Frida has been briefly introduced, in the next section we’ll
see how instrumentation tools are structured when using Frida and
its role in them.
⁶https://termux.com/

3.3 Instrumentation tool structure under Frida

Figure 1. Instrumentation flow

The pattern described in Figure 1 is the one to follow when
instrumenting an application, unless there is a need to increase
performance or to reduce complexity. In those cases, the
instrumentation script and Frida’s REPL should be good enough.
When writing an instrumentation tool there are two main parts to
differentiate: the control script and the instrumentation script.
The control script is the one in charge of communicating with
Frida via bindings. Bindings are libraries that offer access to the
underlying Frida API from our language of choice. These are available
in multiple languages: Python, Node, Swift, C#… In this book, Python
is the language of choice for writing the control script (but you are
always free to use others).
This script takes the role of loading the instrumentation script and
injecting it into the target process. It also enables the child_gating
feature, which allows child processes of an instrumented process to be
automatically instrumented too (reminder: this implementation is
OS-dependent).
The control script also communicates with the instrumentation
script: it sends and receives all the messages to and from the
instrumentation script and handles them as required (e.g. saving
instrumentation messages or deactivating a hook on a function that is
flooding our tool).
Finally, it is also able to execute code from the instrumentation
script via RPC (remote procedure calls⁷).
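The responsibilities listed above can be sketched in a few lines of
Python. This is only a minimal sketch, assuming the frida Python
bindings are installed; the agent source, the ping export, the
collected list and the instrument helper are made up for illustration
and are not the book’s later examples.

```python
# Hypothetical JavaScript agent: sends one message and exposes one RPC method.
AGENT_SOURCE = """
send({ event: 'loaded', pid: Process.id });
rpc.exports = {
    ping: function () { return 'pong'; }
};
"""

collected = []  # payloads received from the instrumentation script


def on_message(message, data):
    """Handle one message coming from the instrumentation script.

    Frida delivers a dict whose 'type' is 'send' (with a 'payload') or
    'error' (with a 'description'); 'data' carries optional raw bytes.
    """
    if message["type"] == "send":
        collected.append(message["payload"])
    elif message["type"] == "error":
        print("agent error:", message.get("description"))


def instrument(target):
    """Attach to a process name or PID, load the agent and call its export."""
    import frida  # imported here so the handler can be tried without Frida

    session = frida.attach(target)
    script = session.create_script(AGENT_SOURCE)
    script.on("message", on_message)
    script.load()
    # Newer frida-python releases prefer the name script.exports_sync.
    return script.exports.ping()
```

Calling instrument with the name or PID of a running process should
return the string produced by the agent’s ping export, while anything
the agent passes to send() ends up in collected.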
On the other hand, there’s the instrumentation script, which takes
care of any interaction with the running process. This script is
written in JavaScript, and most of the book examples are written in
JS (because it is what Frida’s CLI accepts and it saves us time);
however, you are free to use TypeScript if you prefer. The
instrumentation script is able to intercept function calls, monitor
process memory, trace instructions or function calls and even modify
the process execution flow. It is also able to send messages from the
target process and to expose RPC methods to be called from Frida’s
REPL (a.k.a. Frida’s command line interface) or an external script.

3.4 Frida architecture basics


Let’s get a picture of how Frida works internally with the help of a
diagram. But first, let’s introduce the keywords used in said
diagram:

• frida-core is the part of Frida’s internals that enables
instrumentation with JS, among other features (two-way
communication, process enumeration…)
• frida-python are the bindings chosen to interact with frida-core.
• frida-agent is a Frida library (from frida-core) that is injected
into our target process and does the low-level stuff for us
(installing hooks, communicating with our instrumentation code,
which is written in JavaScript).
• frida-gum is a cross-platform instrumentation and introspection
library (again, part of frida-core). For more details on the
features it supports please refer to frida-gum’s repository⁸.

⁷https://en.wikipedia.org/wiki/Remote_procedure_call

Figure 2. Frida instrumentation internals.

Taking Figure 2 as a reference, on the left, labeled A, there’s the
control tool written in Python (it can also be Frida’s REPL), which
communicates with frida-core, labeled B, through bindings (you can
use C too, but that kind of defeats Frida’s conveniences).
On the right side, there is the frida-agent, labeled D, which is
injected in the target process and interfaces with our JavaScript
code, labeled E, via a P2P DBUS channel that transports a
bidirectional exchange of JSON messages and is labeled C in Figure 2.
If more control over the instrumentation is required, such as handling
certain messages from JavaScript, processing child processes or
calling functions from the agent’s RPC exports, it is up to the user
to write the control tool on the left side of Figure 2.
A more detailed explanation of Frida’s architecture can be read at
frida.re/docs/hacking⁹.
⁸https://github.com/frida/frida-gum
⁹https://frida.re/docs/hacking/

After this brief introduction, if you are interested in some real-world
projects powered by Frida you can take a look at the following list:

Projects using Frida

There are some interesting projects based on Frida. Here I list some
of them; a quick Google search shows that there are many more:

• Dwarf¹⁰: A debugger using Frida as backend.
• AppMon¹¹: A tool to analyze MacOS and iOS applications. It
instruments interesting APIs to inspect their usage or retrieve
valuable information.
• r2frida¹²: An extension that allows us to work with radare2 and
Frida together.
• frida-fuzzer¹³: An experimental fuzzer for API in-memory fuzzing.
• Objection¹⁴: A runtime mobile exploration toolkit based on Frida.
• More projects can be found under the Frida topic on GitHub:
https://github.com/topics/frida

¹⁰http://www.giovanni-rocca.com/dwarf/features/
¹¹https://github.com/dpnishant/appmon
¹²https://github.com/nowsecure/r2frida
¹³https://github.com/andreafioraldi/frida-fuzzer
¹⁴https://github.com/sensepost/objection
4. Frida usage basics
This chapter introduces the basic usage of Frida: how tools based on
Frida work, how to use the frida-tools package and Frida’s CLI
(Command Line Interface), and how to write our first basic
instrumentation scripts.
Before going on, be sure to install frida and frida-tools packages
using Python’s pip:
$ pip install frida frida-tools

The frida package includes the libraries that can be used from Python
and the frida-tools package includes Frida’s prebuilt command line
tools. For more information on the frida-tools package refer to
Section 4.4, frida-tools.
Important: From now on, whenever frida (lowercase) is mentioned it
refers to Frida’s CLI. Whenever Frida (capitalized) is mentioned the
text refers to the toolkit as a whole.
Frida development can be done using JavaScript or TypeScript, although
the latter is transpiled into compatible JavaScript. The next section
shows the differences between both.

4.1 JavaScript vs TypeScript


Frida supports writing instrumentation code in JavaScript (JS) and
TypeScript (TS). While the use of TypeScript is encouraged,
everything can be written using JS.
The main reasons for writing instrumentation tools in TypeScript are
code auto-completion, modularity and compile-time errors. However,
compile-time checks will not prevent runtime errors that end up
wrongly manipulating an instrumented process.
It is also possible to use modules developed by other users like
frida-panic¹ (provides easy crash-reporting functions) and
swift-frida² (provides interop with Swift’s datatypes). Loading
external modules like the ones mentioned above is a feature exclusive
to TypeScript development.

Feature                  TypeScript  JavaScript
Editor autocompletion    Yes         No
Extension support        Yes         Yes, but limited
Error checking on build  Yes         No
Runtime error checking   Yes         Yes

Although this process is seen in detail in Section 5.10, here is a small
diagram displaying the main difference between developing an agent
in TypeScript versus JavaScript:

Figure 4. Steps difference between JavaScript and TypeScript based agents.

In essence, the TypeScript agent has to be transpiled to compatible
JavaScript first.
As a general rule of thumb, if you are writing a quick and simple
script you can stick with JavaScript. For bigger and more complex
instrumentation scripts, on the other hand, TypeScript is strongly
recommended.
¹https://github.com/nowsecure/frida-panic
²https://github.com/maltek/swift-frida

Throughout the book, the examples are written in JavaScript for
the most part. The reasoning behind this is that it saves time (and
space in the book) and that the scripts are easily usable and
debuggable in Frida’s command line with no extra steps (such as
transpiling the project into JavaScript).

4.2 An overview of Frida API


The Frida JavaScript API has several modules that provide
functionality to the users. These features are mostly cross-platform
and thus work in almost every environment. There are, however,
exceptions to this rule, such as the Java module, which is only
available on Android, and the ObjC module, which is only available
when an Objective-C runtime is present. Here are the ones I consider
the most important:
The Thread module provides functionality to operate with threads,
such as sleeping them and obtaining backtraces.
The Process module provides information about the instrumented
process: its architecture, pointer size, code signing policy, thread
IDs… It can also provide information about loaded modules and their
addresses, enumerate memory ranges and set a process-wide exception
handler.
The Memory module provides functionality to operate with memory:
reading and writing strings, numbers, pointers, structures… It also
provides the ability to scan the process’ memory for specific patterns
and to set protection modes.
The Module module provides functionality to retrieve information
from loaded modules, such as their in-memory base addresses, and is
also able to load external modules.
The Kernel module provides access to kernel-mode APIs and allows
enumerating modules and memory ranges and changing the memory
protection of specific memory regions. This feature is limited to
MacOS systems.
The CModule module allows mapping a snippet of C source code in
memory and making it available to the JavaScript runtime. This API
is mostly used in the final steps of the instrumentation development
in order to optimise the end result. This API is later seen in detail
in Section …
For code instrumentation, there are 4 modules:

• Interceptor provides the functionality to intercept function
calls, modify their behavior, replace functions… This API is one
of the most important, if not the most important, and is used
continuously throughout the book.
• ObjC provides functionality to interact with Objective-C
constructs, methods and classes.
• Stalker is a code tracing engine built into Frida. This module
will be seen in detail later in Section …
• Java provides instrumentation for Java APIs in the Android
ecosystem. This module is not available to instrument Java
applications on desktop operating systems.

A more detailed, API-by-API documentation can be found at the
frida.re³ site along with some examples of their usage.

4.3 Main features


Frida provides us with two main features: the Interceptor API and
the Stalker code-tracing engine. In later sections such as Section
6.9, both the Interceptor API and the Stalker API are put into
practice.

4.3.1 Stalker: a code tracing engine


Stalker is a code tracing engine that allows transparently following
threads and tracing every instruction and function that is executed.
³https://frida.re/docs/javascript-api

The Stalker engine works by applying a technique named dynamic
recompilation⁴.
Dynamic recompilation is an emulation technique that recompiles,
while a program is running, the target program’s machine code
(instructions) into a local copy that is able to run on the target
CPU. These instructions are kept in caches which can be read, written
or executed on demand.

Figure 5. Stalker’s dynamic recompilation.⁵


Thus, as seen in Figure 5, Stalker creates a copy of the instructions that are about to be executed, adds its modifications to the copy, and executes the copy instead. The original instructions are left unmodified, which preserves the original checksum⁶ while still allowing the execution to be traced. This technique was chosen because it is very performant.
Dynamic recompilation is a rather complex technique mainly used in emulators, and explaining in depth how it works is out of the scope of this book (and takes some time to fully grasp). If you are interested in learning more, by all means check Marcosatti’s guide⁷ on the topic or Ole’s post about the anatomy of a code tracer⁸.
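As a rough sketch of how Stalker is driven from a script, the snippet below follows one thread and prints the most frequently called targets. It assumes the Stalker.follow options documented on frida.re (events, onCallSummary); topTargets is a hypothetical helper written for this example, and the one-second duration is arbitrary.

```javascript
// Rank call targets from the summary object handed to onCallSummary:
// keys are target addresses (as strings), values are call counts.
// topTargets is a hypothetical helper, not part of Frida.
function topTargets(summary, limit) {
  return Object.entries(summary)
    .sort((a, b) => b[1] - a[1])
    .slice(0, limit)
    .map(([address, count]) => ({ address, count }));
}

// Only runs inside Frida, where Stalker and Process are defined.
if (typeof Stalker !== 'undefined') {
  // Assumption: the first enumerated thread is the one worth tracing.
  const threadId = Process.enumerateThreads()[0].id;
  Stalker.follow(threadId, {
    events: { call: true },            // only record CALL instructions
    onCallSummary(summary) {
      console.log(JSON.stringify(topTargets(summary, 5)));
    },
  });
  // Stop tracing after one second (arbitrary value for this sketch).
  setTimeout(() => {
    Stalker.unfollow(threadId);
    Stalker.flush();
  }, 1000);
}
```

Recording only call events keeps the overhead low; enabling per-instruction events produces far more data.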

⁴https://en.wikipedia.org/wiki/Dynamic_recompilation#:~:text=In%20computer%20science%2C%20dynamic%20recompilation,of%20a%20program%20during%20execution
⁵https://miro.medium.com/max/2400/1*8rD7LvTUEldL7wRPDjWhKQ.png
⁶https://en.wikipedia.org/wiki/Checksum#:~:text=A%20checksum%20is%20a%20small,upon%20to%20verify%20data%20authenticity.
⁷https://github.com/marcosatti/Dynarec_Guide
⁸https://medium.com/@oleavr/anatomy-of-a-code-tracer-b081aadb0df8

4.3.2 Hooks and the Interceptor API

API hooking is a technique which allows inspecting or modifying the flow of function calls. It can be achieved through different techniques, but the most common one is trampoline-based hooking. These hooks work by inserting, at the beginning of the function we want to instrument, a jump to a function under our control, so that our function is executed instead of the original one. Let’s see a more graphical example of how this works:
Say there is a program that has a function_A function, and the intent is to execute the function_B function instead. function_A’s prologue is overwritten with a JMP instruction to our function_B. Once function_B’s code has executed, the trampoline ensures that execution returns to the intended function_A flow.

Figure 6. Trampoline internals
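The same flow can be mimicked in plain JavaScript as a rough analogy. This is only a sketch: real trampolines patch machine code, whereas here a slot in an object stands in for function_A’s prologue. installHook is a hypothetical helper made up for this illustration, not a Frida API.

```javascript
// Analogy only: a property slot stands in for function_A's prologue.
const program = {
  function_A(x) {
    return x + 1;
  },
};

// installHook is a hypothetical helper, not part of Frida.
function installHook(table, name, replacement) {
  const original = table[name];            // keep a way back to the original code
  table[name] = function (...args) {       // the "JMP" to code under our control
    return replacement(original, ...args);
  };
  return () => { table[name] = original; }; // calling this reverts the patch
}

const calls = [];
const revert = installHook(program, 'function_A', (original, x) => {
  calls.push(x);       // function_B: observe the argument...
  return original(x);  // ...then resume the intended function_A flow
});

const hooked = program.function_A(41); // goes through the hook
revert();
const plain = program.function_A(41);  // hook removed, original runs directly
console.log(hooked, plain, calls);
```

The result of the call is the same with and without the hook; only the detour through our code differs.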

This is a very brief explanation of what API hooking is and of one of its techniques. For more information I recommend checking this post⁹.

⁹http://jbremer.org/x86-api-hooking-demystified/#ah-basic

The Interceptor API allows us to easily hook functions or code sections given a valid memory address. It takes a NativePointer (a pointer type defined by Frida) holding the target function address and lets us attach some useful callbacks:

• onEnter: Allows us to see or modify what is passed to the function before its execution begins, that is, the function arguments before they are manipulated, as well as other memory sections.
• onLeave: Allows us to see or modify return values, or to see how function arguments were modified during the function execution. The same applies to memory regions.

Any hooks installed by the Interceptor API are automatically reverted after closing our tool or destroying our instrumentation script.
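A minimal sketch of the two callbacks follows. It assumes the target process exports a function named open (as libc does on Linux and macOS) and uses the Module.getExportByName call that appears later in this book; replace both with whatever matches your target.

```javascript
// Callbacks in the shape Interceptor.attach expects.
const callbacks = {
  onEnter(args) {
    // `this` is a per-call object, handy for passing data to onLeave.
    this.firstArg = args[0];
    console.log('enter, arg0 = ' + args[0]);
  },
  onLeave(retval) {
    console.log('leave, arg0 = ' + this.firstArg + ', retval = ' + retval);
  },
};

// Only runs inside Frida, where Interceptor and Module are defined.
if (typeof Interceptor !== 'undefined') {
  // Assumption: an export named "open" exists in the target process.
  const target = Module.getExportByName(null, 'open');
  const listener = Interceptor.attach(target, callbacks);
  // listener.detach() removes the hook by hand if that is ever needed.
}
```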

4.4 frida-tools
frida-tools is a Python package that offers some CLI tools for quick instrumentation. It can be installed by simply running the following pip command:
$ pip install frida-tools

The frida-tools package includes a set of small tools, although this book only covers the most used ones: frida-trace and Frida’s command line interface.

4.4.1 Frida command line interface


One of the two most important tools in the frida-tools package is the Frida command line interface. Once installed, it can be accessed by typing frida in the system console, be it bash, Windows’ CMD or macOS Terminal/iTerm.
Frida’s CLI is a very important tool because it largely removes the need for a control script (the one usually written with bindings like frida-python), and thus allows us to quickly instrument a binary or run quick tests without writing a full-fledged instrumentation tool.
Almost every example in this book can be run through Frida’s CLI. Let’s see how this can be achieved and what options the CLI brings to the table.
To instrument a process that is already running, simply pass the PID (Process IDentifier) of the process or its name:
$ frida 1234
$ frida notepad.exe

However, if there are two processes with the same name, Frida will fail because it doesn’t know which process to attach to. To avoid this issue, prefer the PID whenever possible.
The -f switch spawns a process given a path. The instrumented binary is spawned by Frida and suspended. Frida then gives us access to a command line, which allows for early instrumentation: we can defeat anti-debugging techniques or inject our instrumentation code before the process runs.
To resume the execution from within the command line simply type %resume and the process will continue its execution.

The -l switch loads an instrumentation script. This is the most useful switch because it loads an instrumentation script directly into the process to be instrumented (without the user having to write any instrumentation tool).
--runtime allows choosing between the QuickJS runtime (--runtime=qjs) and the JavaScript V8 one (--runtime=v8). The differences between these runtimes are explained in the following quote.

Runtimes in Frida
Frida supports running instrumentation scripts using duktape (an embedded JavaScript engine), the JavaScript V8 engine and, in recent versions, QuickJS (which replaces duktape). For basic scripts QuickJS is enough, whereas V8 provides better language features as well as more detailed error logs. V8 is also more performant than QuickJS, but JS VM exits (that is, every time our code has to communicate with the agent) are more expensive than with QuickJS.
Since both are included when you install Frida, you are free to choose whichever engine suits your use case. If you are having trouble figuring out where exactly your instrumentation code is failing, be sure to try the V8 engine because its errors are more detailed (you can learn how to switch engines in Section 4.4.2, frida-trace).
https://duktape.org
https://v8.dev
https://bellard.org/quickjs/

After this brief explanation of Frida’s runtimes, let’s continue with Frida’s CLI parameters.
--no-pause instruments a process and automatically resumes it after the instrumentation is applied, which saves the user from having to manually enter %resume.
By default Frida instruments processes on our local machine, but it also allows for:

• -U for instrumentation of devices connected via USB (mainly smartphones).
• -R for remote instrumentation. Requires a frida-server of a matching version running on the remote host.
• -H for remote instrumentation specifying a host. For example, it is possible to instrument a machine in our local network: frida -H 192.168.1.81
• -D to instrument a device given its identifier.

--stdio=inherit|pipe controls the standard input/output of the instrumented application. It is inherited by default, but it can be piped by setting --stdio=pipe.
-C loads a CModule. This is discussed in detail later in the CModule section. The same goes for the --toolchain switch.
-o logs output to a file.
-c loads an instrumentation script hosted in Frida’s codeshare repository¹⁰. If someone has already written a script that does what is needed for the task, it can be fetched from this repository automagically.
There are other command line switches present in the Frida CLI but they are either focused on mobile devices or self-explanatory. Also note that some of these command line switches are shared with the frida-trace tool, which is explained in the next section.

¹⁰https://codeshare.frida.re/

4.4.2 frida-trace
frida-trace is a tool that allows us to instrument processes or apps without writing an instrumentation tool. It is essentially based on Frida’s Interceptor API. Its main features are:

a. Instrumenting live processes or spawning new processes.
b. Following child processes (Note: this is implemented per system, which means that not all methods are covered. E.g., on Windows only CreateProcessInternalW is used as a reference for detecting child processes.)
c. Instrumenting all the functions included in a module: KERNEL32.DLL!* or NTDLL.DLL!*
d. Instrumenting specific APIs: KERNEL32.DLL!CreateFileW
e. Working in a local environment, in a remote environment and over USB for mobile devices.

This tool provides different command line arguments. Here we will see how they work and how they are useful to us.
• $ frida-trace -f binary.exe: The -f parameter stands for file path and is used to spawn a new process given a path to a binary. This allows instrumenting a process before any app code has the opportunity to run (or bypassing anti-tampering measures).
• $ frida-trace <PID|process_name>: If no -f parameter is specified, frida-trace attaches to the process matching the provided PID or process name.
It is also possible to choose the JavaScript runtime, either V8 or QuickJS:
$ frida-trace --runtime=qjs|v8 1234
One important detail of Frida’s CLI tools is that parameters are case sensitive in most cases. For example, lowercase parameters are used for functions and uppercase parameters for modules, be it for inclusions or exclusions.
With frida-trace, functions are included with the -i parameter whereas modules are included with the -I parameter. For exclusions of functions and modules, there are -x and -X respectively.
-I and -X only affect module exports and not module imports; for module imports use the -T parameter.
It is possible to use the * wildcard for partial matches such as CreateFile*.

Before introducing some usage examples, say we have a binary with the following exports: KERNEL32.DLL!CreateFileA, KERNELBASE.DLL!CreateFileA, KERNEL32.DLL!CreateFileW, KERNELBASE.DLL!CreateFileW, KERNEL32.DLL!DeleteFileW and ADVAPI32.DLL!RegOpenKeyExW.

• $ frida-trace -i "CreateFileW": Instruments every API call that matches the exact string, regardless of which module it comes from. When no exclusions are specified, Frida intercepts all the matching functions. In this case, Frida would instrument both KERNEL32’s and KERNELBASE’s CreateFileW.
• $ frida-trace -i "CreateFile*": Instruments all the APIs or functions whose names start with CreateFile. In this case, it would instrument both KERNEL32’s and KERNELBASE’s CreateFileW and CreateFileA.
• $ frida-trace -i "CreateFileW" -I "KERNEL32.DLL": Instruments only KERNEL32.DLL!CreateFileW
• $ frida-trace -i "CreateFileW" -X "KERNEL32.DLL": Instruments any CreateFileW outside of KERNEL32.DLL. In this case, Frida only instruments KERNELBASE’s CreateFileW.
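The wildcard and exclusion cases in the list above can be modelled in a few lines of plain JavaScript. This is only a sketch of the matching behaviour described in the text, not frida-trace’s actual implementation; select and globToRegExp are hypothetical helpers, and only the -i pattern and a single -X module are modelled.

```javascript
// The example exports from the text, as MODULE!FUNCTION strings.
const available = [
  'KERNEL32.DLL!CreateFileA', 'KERNELBASE.DLL!CreateFileA',
  'KERNEL32.DLL!CreateFileW', 'KERNELBASE.DLL!CreateFileW',
  'KERNEL32.DLL!DeleteFileW', 'ADVAPI32.DLL!RegOpenKeyExW',
];

// Turn a pattern with * wildcards into an anchored regular expression.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

// Hypothetical model of "-i <pattern>" plus an optional "-X <module>".
function select(functionGlob, excludedModule) {
  const matcher = globToRegExp(functionGlob);
  return available.filter((entry) => {
    const [moduleName, functionName] = entry.split('!');
    return matcher.test(functionName) && moduleName !== excludedModule;
  });
}

console.log(select('CreateFileW'));                 // both modules
console.log(select('CreateFile*'));                 // A and W variants, both modules
console.log(select('CreateFileW', 'KERNEL32.DLL')); // KERNELBASE only
```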

These parameters can be used repeatedly to include or exclude multiple functions or modules. It is also possible to instrument function calls given module offsets thanks to the -a parameter. An example:
$ frida-trace <PID> -a "customLib.DLL!0x1234"

Other modifiers to remember and/or take into account:

• -q: removes Frida’s API call formatting for each instrumented call.
• --runtime: chooses the desired runtime. QuickJS is recommended for performance and V8 for modern JS features and more in-depth error reporting.
• --debug: opens Frida’s debug console.

The following table summarizes the different command line switches of frida-trace:

Option      Effect
-i          Include function
-x          Exclude function
-I          Include module
-X          Exclude module
-T          Include imports
-s          Include debug symbols
--runtime   Switch between the QuickJS and V8 runtimes
-f          Spawn from file

Whenever we instrument a module, an API call or a function, frida-trace auto-generates a handler with the basic structure for us to write the instrumentation code.
To see how this tool works in detail, we will make use of a practical example. In this case, the target process is notepad.exe, Windows’ signature plain text editor. This binary uses KERNEL32’s CreateFileW to create and/or open files, so it is interesting to instrument it in order to see what it is trying to open.
The first thing to do is to open notepad.exe and let it run in the background. Then, run the following command from our terminal:
$ frida-trace -i "CreateFileW" notepad.exe

Be sure to check that no other processes named notepad.exe are running in the background, or use the PID of the notepad instance that you want to instrument. In Windows, you can get the PID with the tasklist command.

If successful, frida-trace generates the following output:

Instrumenting functions...
CreateFileW: Auto-generated handler at "/Users/fernandou/__handlers__/KERNEL32.DLL/CreateFileW.js"
CreateFileW: Auto-generated handler at "/Users/fernandou/__handlers__/KERNELBASE.dll/CreateFileW.js"
Started tracing 2 functions. Press Ctrl+C to stop.

As a result, CreateFileW has been instrumented in both KERNELBASE.DLL and KERNEL32.DLL, and a default auto-generated stub has been created (one per function per module). Its contents are:

Now there is something that catches our attention: why are two stubs generated? CreateFileW is present in both KERNEL32.DLL and KERNELBASE.DLL, with one referencing the other, and since we did not specify which module we want to instrument, frida-trace instruments both by default. The next problem is to extract meaningful information from this API call, and for this purpose we can examine the official Microsoft MSDN documentation¹¹:
HANDLE CreateFileW(
  LPCWSTR               lpFileName,
  DWORD                 dwDesiredAccess,
  DWORD                 dwShareMode,
  LPSECURITY_ATTRIBUTES lpSecurityAttributes,
  DWORD                 dwCreationDisposition,
  DWORD                 dwFlagsAndAttributes,
  HANDLE                hTemplateFile
);

In this case, CreateFileW’s most important parameter is the first one, lpFileName, which points to a wide string (UTF-16) holding the path of the file it wants to open. The W suffix in CreateFileW stands for wide: the function takes C strings encoded as UTF-16, the wide string encoding used by the Windows API. Therefore, the handler can be completed by writing in the onEnter code section of the generated stub:

onEnter: function (log, args, state) {
  log('CreateFileW(' + args[0].readUtf16String() + ')');
},

¹¹https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-createfilew

An important note is that the onEnter function takes 3 parameters: log, args and state. log is a callable function that works like console.log. args is the array of arguments that the instrumented function receives; its size is unknown to Frida, so the number of arguments must be figured out beforehand. state is an object used to keep state between handler invocations (for example, to share information gathered before the function executes with the code that runs on return).
In this case the auto-generated log call is extended by adding args[0].readUtf16String(), which takes the first argument from the args array and reads it as a UTF-16 encoded string.
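Putting the three parameters together, a handler that also reports the return value could look like the sketch below. It assumes the handler file holds an object with onEnter and onLeave callbacks, as the stub shown earlier suggests; the const handler wrapper and the createFileCalls counter are additions for this illustration only, and per-call data is kept on this while state carries data across calls.

```javascript
// Sketch of a frida-trace handler for CreateFileW. In the generated file
// only the object itself is kept; `handler` exists so it can be exercised alone.
const handler = {
  onEnter(log, args, state) {
    // lpFileName is the first argument of CreateFileW (a UTF-16 string).
    this.fileName = args[0].readUtf16String();
    // `state` outlives a single call: count the calls seen so far.
    state.createFileCalls = (state.createFileCalls || 0) + 1;
  },
  onLeave(log, retval, state) {
    log('CreateFileW(' + this.fileName + ') = ' + retval +
        ' [call #' + state.createFileCalls + ']');
  },
};
```

Logging in onLeave rather than onEnter puts the file name and the returned handle on a single line.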
Finally we get the following output:

/* TID 0x325c */
5877 ms  CreateFileW()
5877 ms  CreateFileW(C:\Users\fdiaz\Documents)
5877 ms  CreateFileW()
5878 ms  CreateFileW(C:\Users\fdiaz\Documents\test.dat)
5879 ms  CreateFileW()

We have duplicates of CreateFileW due to KERNELBASE’s stub. The TID (Thread ID) and timestamps will differ when you run frida-trace in your environment, but this does not affect the final output at all.

After instrumenting notepad.exe, it can be observed that the process is trying to create a file in the user’s Documents folder and that this file is named test.dat.
The Frida API documentation is available on the official website; however, it does not have as many examples as one would desire. There are examples for the most popular use cases, but we will play with the Frida REPL (Frida’s command line) to learn about this API.

All the instrumentation code examples found in this text use the appropriate scope modifier for variables according to the situation (const, let…). However, when using the Frida REPL (that is, the Frida command line) remember to remove these modifiers. The reason is that Frida’s REPL uses eval() to handle commands, and thus variables declared with these modifiers are scoped to the current block. If you are using a script, then there is nothing to worry about. :)
5. Dealing with data types with Frida

5.1 Dealing with strings: Reading and allocation

It is possible to work with strings, be it reading strings from memory or allocating them. This section covers the basic use cases and some platform-specific ones.
For allocating strings we have the following APIs:

API                        Description
Memory.allocAnsiString     Allocating ANSI strings (windows-only)
Memory.allocUtf8String     Allocating UTF8 strings
Memory.allocUtf16String    Allocating UTF16 strings (windows-only)

When you allocate strings, always store the returned pointer in a constant to avoid problems with the string being wiped from memory at some point (this might happen for several reasons, mainly the program freeing memory regions):
const myTestString = Memory.allocAnsiString("HELLO WORLD");

myTestString can be read using the string read APIs:

API                  Description
.readCString         Read C-style strings
.readAnsiString      Read ANSI strings
.readUtf8String      Read UTF8 strings
.readUtf16String     Read UTF16 strings
These APIs are called on a pointer to an address containing the string. It is possible to pass a number as an argument to these APIs to specify the number of bytes to read.

For example, myTestString can be read using the readAnsiString() API:

myTestString.readAnsiString();

If it were a C string that is 1024 bytes long, it would be possible to pass the size of the string as an argument:
myTestString.readCString(1024);

In most cases Frida figures out where the string ends for each string type; however, when you are sure of the size of the string, by all means share it with Frida! :)

5.1.1 Practical use case: Reading a WinAPI UTF16 string parameter

This example comes from a user of the Frida IRC / Telegram channel who requested assistance, so I ended up doing a quick POC for it. In this case, he wanted to hook SearchPathW to get the lpFileName argument; this API is part of KERNEL32.DLL or KERNELBASE.DLL.

DWORD SearchPathW(
  LPCWSTR lpPath,
  LPCWSTR lpFileName,
  LPCWSTR lpExtension,
  DWORD   nBufferLength,
  LPWSTR  lpBuffer,
  LPWSTR  *lpFilePart
);

First let’s take a look at the SearchPathW parameters. In this case the second argument is lpFileName and its type is LPCWSTR, which means a pointer to a constant wide string (UTF-16 in the case of Windows).
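To see why the UTF-16 read API matters here, the plain JavaScript sketch below lays out a wide string byte by byte and reads it back in two ways. It runs outside Frida; readAsCString and readAsUtf16 are hypothetical stand-ins that only imitate what a C-string read and a UTF-16 read do with the same bytes.

```javascript
// UTF-16LE bytes of "c:\windows\" followed by a 2-byte terminator,
// as they would sit in memory behind an LPCWSTR.
const text = 'c:\\windows\\';
const bytes = new Uint8Array((text.length + 1) * 2);
for (let i = 0; i < text.length; i++) {
  const unit = text.charCodeAt(i);
  bytes[i * 2] = unit & 0xff;     // low byte first (little-endian)
  bytes[i * 2 + 1] = unit >> 8;   // high byte, 0 for ASCII characters
}

// Read byte by byte until the first 0x00, like a C-string read would.
function readAsCString(buffer) {
  let out = '';
  for (const byte of buffer) {
    if (byte === 0) break;
    out += String.fromCharCode(byte);
  }
  return out;
}

// Read 16-bit units until a 0x0000 terminator, like a UTF-16 read would.
function readAsUtf16(buffer) {
  let out = '';
  for (let i = 0; i + 1 < buffer.length; i += 2) {
    const unit = buffer[i] | (buffer[i + 1] << 8);
    if (unit === 0) break;
    out += String.fromCharCode(unit);
  }
  return out;
}

console.log(readAsCString(bytes)); // stops after the first character
console.log(readAsUtf16(bytes));   // recovers the whole path
```

Because every ASCII character is followed by a zero byte in UTF-16, a C-string read of a wide string ends right after the first character.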
I made an example program to test it out; you can compile it under Windows using Visual Studio:

#include <iostream>
#include <Windows.h>

int main()
{
    WCHAR lpBuffer[MAX_PATH];   // receives the path that was found
    LPWSTR *lpFilePart{};       // optional output, left as nullptr
    DWORD result;

    // Call the wide (UTF-16) variant explicitly so the SearchPathW hook fires.
    result = SearchPathW(NULL, L"c:\\windows\\", NULL, MAX_PATH, lpBuffer, lpFilePart);
    std::cout << "SearchPath retval: " << result << std::endl;
}

This program can be modified further to test more things if you are interested, but for this basic example we will just check whether the c:\windows folder path exists.
It is possible to instrument this application from Frida’s REPL (command line interface), but first let’s write an instrumentation script.
As mentioned in Section 4.1 JavaScript vs TypeScript, it is possible to write instrumentation scripts in JavaScript and TypeScript. For the time being the instrumentation is written in JavaScript, but the equivalent TypeScript code is also shown (refer to Section 5.8 Writing our first agent for building the agent with TypeScript).
First, let’s create a file named instrumentation.js. In it, we need to:

1. Get a pointer to the function SearchPathW from KERNELBASE.DLL and intercept it.
2. Read the second argument, lpFileName (index 1 in the args array).
3. Print it to the console.

const searchPathPtr = Module.getExportByName("KERNELBASE.DLL", "SearchPathW");

Interceptor.attach(searchPathPtr, {
  onEnter(args) {
    // args[1] is lpFileName, the second parameter of SearchPathW
    console.log("Output: " + args[1].readUtf16String());
  }
});

Then, we can launch the C++ app we created before with the instrumentation code:
frida -l instrumentation.js -f searchPathCpp.exe --no-pause

The --no-pause flag means that the app will run right after the instrumentation code is applied by Frida. The -l flag sets the instrumentation script.

And then our console prints:

Output: c:\windows\

For learning purposes, this is the equivalent code in case we write our instrumentation script in TypeScript:

const searchPathPtr: NativePointer = Module.getExportByName("KERNELBASE.DLL", "SearchPathW");

class SearchPathW {
  onEnter(args: NativePointer[]) {
    console.log("Output: " + args[1].readUtf16String());
  }
}

Interceptor.attach(searchPathPtr, new SearchPathW);

As you can see, the code is a bit longer (mostly due to types) but also looks cleaner and clearer. The main difference is that instead of passing the callbacks directly to Interceptor.attach, they are wrapped in a class which implements the onEnter callback.
5.2 Numbers

It is possible to operate with numbers in a similar fashion as with strings, but there are some caveats to take into account.
The first and most important one is that we need to know whether the argument is the number itself or an address pointing to it. If it is not an address in memory, we cannot use Frida’s read and write APIs for numerical types, and if we do we are likely to crash the target process.
Now we are going to see how to read and write these values, whether they are passed by value or by reference.

5.2.1 Numerical arguments passed by value

When the arguments are simple integers, as in the following stub:

int
add(int a, int b)
{
  return a + b;
}

args[0] and args[1] do not point to memory: each one is a NativePointer whose value is the argument itself, shown as hex. We can simply call toInt32() to get the real input:

Interceptor.attach(addPtr, {
  onEnter(args) {
    console.log("a: " + args[0].toInt32());
    console.log("b: " + args[1].toInt32());
  }
});

5.2.2 Numerical values by reference

In this case we will always be working with an address that points to the value we want to read, so let’s take a look at which APIs we have available:

API                                      Description
{}.readInt()                             Read an integer from the given address
{}.readUInt()                            Read an unsigned integer from the given address
{}.readS8(), readS16(), readS32(),       Read a signed 8-16-32-64 bit integer from the given address
  readS64()
{}.readShort()                           Read a short integer from the given address
{}.readFloat()                           Read a float number from the given address
{}.readDouble()                          Read a double number from the given address
{}.readLong()                            Read a long number from the given address
{}.readULong()                           Read an unsigned long number from the given address
{}.readUShort()                          Read an unsigned short number from the given address
{}.readU8(), readU16(), readU32(),       Read an unsigned 8-16-32-64 bit integer from the given address
  readU64()

Where {} equals a NativePointer or ptr().

For this example we will use a variation of the previous function which instead prints two values to the screen:

// Given a = 1337, b = 7331
void
print_numbers(int *a, int *b)
{
  printf("a:%d\nb:%d\n", *a, *b);
}

If we try to read args[0] and args[1] in this case, we will only get addresses that are not meaningful to us by themselves, but we can use the hexdump API to see their contents:

7ffecdce5c08  a3 1c 00 00 39 05 00 00 c0 91 d0 b3 49 56 00 00  ....9.......IV..

(Due to the writing format limitations, hexdump output is limited.)

We can see the bytes a3 1c, which read in little-endian order give 0x1CA3, that is, decimal 7331. So there is our number but… how do we read it? We can call the aforementioned readInt() API:

Interceptor.attach(addPtr, {
  onEnter(args) {
    console.log("a: " + args[0].readInt());
    console.log("b: " + args[1].readInt());
  }
});

Which will print out the a=1337 and b=7331 values.
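The relation between the dumped bytes and the two numbers can be checked by hand with plain JavaScript, outside Frida. The sketch below copies the first eight bytes of the hexdump line above and decodes them as two little-endian 32-bit integers, which is the interpretation assumed here for an int on this target.

```javascript
// First eight bytes of the hexdump line above.
const dumped = new Uint8Array([0xa3, 0x1c, 0x00, 0x00, 0x39, 0x05, 0x00, 0x00]);
const view = new DataView(dumped.buffer);

// `true` selects little-endian byte order, as used on x86 and most ARM targets.
const first = view.getInt32(0, true);   // bytes a3 1c 00 00 -> 0x00001ca3
const second = view.getInt32(4, true);  // bytes 39 05 00 00 -> 0x00000539

console.log(first, second);
```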

5.2.3 Writing numbers

It is also possible to write our desired values to an address, provided we respect the size of the target type and call the right API.
The write APIs mirror the read APIs listed above (writeInt(), writeUInt(), writeShort(), writeUShort(), writeLong(), writeULong(), writeFloat(), writeDouble() and the fixed-width writeS8()/writeU8() to writeS64()/writeU64() variants); each one takes the value to write as its argument.
Using the C example we have just seen as a basis, we will now change its values to 10 and 20.

Interceptor.attach(addPtr, {
  onEnter: function (args) {
    args[0].writeInt(10);
    args[1].writeInt(20);
  }
});

This only works if the args we receive point to an address containing the data we want to change; it does not work for numbers passed by value. If you want to see how to modify those values, check Section 6.7 Modifying values before execution.

The output will be modified on our target program, which as a result prints a=10 and b=20 instead.
In the next section we will see how to deal with pointers.

5.3 Pointers
It is possible to read the address that a pointer is pointing to by using the readPointer() API. This is useful when there is a pointer to a struct to be read; a more in-depth use case is covered later in Section 7.4.
The following example shows a use case where the readPointer API returns useful information. The recvfrom function takes a pointer to a socklen_t as its last argument, as documented in the man pages¹:

ssize_t recvfrom(int sockfd, void *buf, size_t len, int flags,
                 struct sockaddr *src_addr, socklen_t *addrlen);

The man pages state:

socklen_t, which is an unsigned opaque integral type of length of at least 32 bits

¹https://linux.die.net/man/2/recvfrom

Since there is a pointer to it in the 6th parameter, it can be retrieved using readPointer. Note that readPointer() reads a pointer-sized value, so on 64-bit targets readU32() returns the exact 32-bit length:

// ...
onEnter: function (args) {
  console.log(args[5].readPointer());
}
// ...

If you are interested in understanding how Frida is able to interact with pointers from JS, please read the following quote:

About NativePointers
Frida is able to interact with pointers thanks to its NativePointer objects. The NativePointer data type exists because the JS number type is backed by a double, which cannot represent all 64-bit pointers. Therefore, whenever pointers are used in Frida, they are always backed by this data type.
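The precision limit mentioned in the quote can be observed with plain JavaScript, outside Frida. The sketch below uses a BigInt to hold an arbitrary 64-bit address exactly and then forces it through the number type; the address value is made up for the illustration.

```javascript
// A made-up pointer value that needs more than 53 bits.
const address = 0x7ffffffffffff001n;      // BigInt keeps every bit

// Converting to a JS number (a double) rounds to the nearest representable value.
const asNumber = Number(address);
const roundTripped = BigInt(asNumber);

console.log(address.toString(16));           // the exact address
console.log(roundTripped.toString(16));      // low bits were lost
console.log(Number.isSafeInteger(asNumber)); // false
```

A double only has 53 bits of integer precision, so nearby addresses collapse into the same number; NativePointer avoids this by never going through the number type.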

It is also possible to work with pointers to offsets; these are introduced in the next sub-section.

5.4 Pointer to offsets

When offsets are obtained through reverse engineering or online offset tables, it is possible to use them in Frida. In order to do so, the first requirement is getting the base address of the module to which the offset belongs:
myBaseAddr = Module.findBaseAddress('myLib.so');

findBaseAddress returns the base address of the module, which can now be used to apply offsets:
myOffsetPtr = myBaseAddr.add(ptr('0x76E'))

This myOffsetPtr pointer can now be used in our instrumentation code in conjunction with other APIs like Interceptor.attach, as seen in Section 5.4.1.
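The two steps can be wrapped together with a check for a missing module, as in the sketch below. resolveOffset is a hypothetical helper written for this example; myLib.so and 0x76e are the placeholder values used above, and Module.findBaseAddress is assumed to return null when the module is not loaded.

```javascript
// resolveOffset is a hypothetical helper; the module name and offset
// come from your own reverse engineering.
function resolveOffset(moduleName, offset) {
  const base = Module.findBaseAddress(moduleName);
  if (base === null) {
    throw new Error('module not loaded: ' + moduleName);
  }
  return base.add(offset); // NativePointer arithmetic, returns a new pointer
}

// Only runs inside Frida, where Interceptor and Module are defined.
if (typeof Interceptor !== 'undefined') {
  const target = resolveOffset('myLib.so', 0x76e);
  Interceptor.attach(target, {
    onEnter(args) {
      console.log('reached myLib.so+0x76e');
    },
  });
}
```

Failing early with a clear message is easier to debug than calling add() on null.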

Finding a process’s modules

In the previous example myLib.so was used to get the base address, but what if the module name is not known to us or is named differently from what was expected?
For this scenario, the best option is calling the Process.enumerateModulesSync() API, which returns a list of modules along with their base addresses, sizes and paths.
For an example of this API, check Section 4.9

5.5 Getting pointers to exports

It is possible to use Interceptor.attach with a process’s exports and imports; however, to do that we need an address pointing to the export.
To retrieve this pointer we need to know the module name and the name of the export. If we pass null as the module argument, Frida will try to return the first match of the export name. To get the export’s address we have the Module.findExportByName API.
Syntax:
Module.findExportByName([MODULE_NAME], [EXPORT_NAME])

This API returns a pointer in case a valid export is found and null in case nothing matches.
5.5.1 findExportByName vs getExportByName

It is important to notice (especially if we are using autocomplete) that there are two methods which look similar: Module.getExportByName and Module.findExportByName. The main difference resides in what happens if an export is not found. .getExportByName throws an exception in case the export is not found, whereas .findExportByName simply returns null. I recommend using .getExportByName to be able to spot errors, but if you want to use .findExportByName be sure to check the return values.
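The difference in failure handling can be sketched as follows. resolveWithFind and resolveWithGet are hypothetical wrappers written for this example, and definitely_not_exported is a made-up export name used to trigger the failure path.

```javascript
// Two ways of resolving an export, differing only in how failure shows up.
function resolveWithFind(moduleName, exportName) {
  const address = Module.findExportByName(moduleName, exportName);
  if (address === null) {
    console.log('export not found: ' + exportName); // must be checked by hand
  }
  return address;
}

function resolveWithGet(moduleName, exportName) {
  try {
    return Module.getExportByName(moduleName, exportName);
  } catch (e) {
    console.log('export not found: ' + e.message);  // failure is an exception
    return null;
  }
}

// Only runs inside Frida, where Interceptor and Module are defined.
if (typeof Interceptor !== 'undefined') {
  console.log(resolveWithFind(null, 'definitely_not_exported'));
  console.log(resolveWithGet(null, 'definitely_not_exported'));
}
```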

5.6 Pointer to ArrayBuffers


When allocating strings, it is noticeable that the allocation returns a pointer, and that this pointer can already be used to substitute other strings in the code:

[Local::]-> Memory.allocUtf8String('foo')
"0x7f81143f6be0"

The same applies to pointers to numerical types, where all that is needed is to call the respective read/write APIs (for example readU32()/writeU32()). However, this is not the case for ArrayBuffers, which don’t return an address, as can be seen in this example:

[Local::]-> test = new ArrayBuffer(10)
0000  00 00 00 00 00 00 00 00 00 00                    ..........

In this case the data type has access to the .unwrap() method, which returns a pointer to the first element of the ArrayBuffer:

[Local::]-> test.unwrap()
"0x7fc17c210930"

Size of pointers
The size of pointers is something that must be taken into account when performing more complex operations and to ensure that an instrumentation script is portable enough.
The API Process.pointerSize returns the size in bytes of a pointer in the process that is being instrumented. This will be needed in later sections like Section 6.3 and Section 6.4

5.7 Hexdump: getting a picture from a memory region

The hexdump function returns a hexdump given a NativePointer or an ArrayBuffer. This is useful when we want to observe a given address or get a better representation of an ArrayBuffer.
Syntax:
hexdump(target[, options])

Where options can be:

{
  offset: number,
  length: number,
  header: true|false,
  ansi: true|false,
}

To get a better representation, use console.log for pretty printing.

For this quick example, write a simple “hello world” program, compile it and fire it up in Frida. Once we are in it, we will call Process.enumerateModulesSync() and get the module matching our binary:

$ clang hello.c
$ frida -f a.out

[Local::a.out]-> Process.enumerateModulesSync()
[
    {
        "base": "0x1072f1000",
        "name": "a.out",
        "path": "/Users/fernandou/Desktop/a.out",
        "size": 16384
    },
    ...
]

This gives us the base address of our binary, whose contents we can now print using Frida:
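A minimal sketch of such a call is shown below. The address is the base value from the module listing above and will differ on your machine; the option values are arbitrary choices for the illustration.

```javascript
// Options accepted by hexdump; the values here are arbitrary.
const options = { offset: 0, length: 64, header: true, ansi: false };

// Only runs inside Frida, where hexdump and ptr are defined.
if (typeof hexdump !== 'undefined') {
  // Base address taken from the module listing above; yours will differ.
  const base = ptr('0x1072f1000');
  console.log(hexdump(base, options));
}
```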

You are free to use custom options in case you want to start at a different offset or need longer lengths.

The hexdump API displays data byte by byte and only shows printable ASCII characters. Do not use the output of this API as a reference for calling Memory APIs such as readUtf8String.

5.8 Writing our first agent

Until now, the examples we have seen can be followed with a binary
of our choice and Frida’s REPL. However, it is interesting to see how
to write a full-fledged instrumentation tool (with our agent and the
control script).
Thankfully, in recent versions of Frida the agent boilerplate can
be generated with frida-create, a CLI tool that creates a boilerplate
agent for us to fill in:
$ frida-create agent

Which is the same as:


$ git clone git://github.com/oleavr/frida-agent-example.git

Beware that frida-create will create the agent in the current working
directory. The output we should get is this:

frida ) frida-create agent
Created ./package.json
Created ./tsconfig.json
Created ./agent/index.ts
Created ./agent/logger.ts
Created ./.gitignore

Run `npm install` to bootstrap, then:
- Keep one terminal running: npm run watch
- Inject agent using the REPL: frida Calculator -l _agent.js
- Edit agent/*.ts – REPL will live-reload on save

Tip: Use an editor like Visual Studio Code for code completion, inline docs,
     instant type-checking feedback, refactoring tools, etc.

frida ) ls -l
total 16
drwxr-xr-x 4 fernandou primarygroup 128 14 Feb 15:46 agent/
-rw-r--r-- 1 fernandou primarygroup 449 14 Feb 15:46 package.json
-rw-r--r-- 1 fernandou primarygroup 167 14 Feb 15:46 tsconfig.json

The first time we create the agent we need to run npm install to
bootstrap. Then, when we want to build our agent we will run:
npm run build

And it will create a file _agent.js with the instrumentation script for
us to use. It is also possible to run npm run watch to get live-reload
when a file is saved.
When we are finished with our agent, we have two options:

• Write a control script to manage messages and inject our agent.
• Use Frida’s REPL (frida -l _agent.js notepad.exe) to inject the
  agent.

Although we can use the REPL for quick tests, we will go the long way
now by writing a control script.

5.8.1 Writing the control script

import sys

import frida

# Output of `npm run build`, as described above.
_SCRIPT_FILENAME = '_agent.js'


def on_message(message, data):
    """Print received messages."""
    print(message)


def main(process_name):
    with open(_SCRIPT_FILENAME, 'r') as script_file:
        code = script_file.read()

    device = frida.get_local_device()
    # spawn() leaves the new process suspended until resume() is called.
    pid = device.spawn(process_name)
    print('pid: %d' % pid)

    session = device.attach(pid)

    script = session.create_script(code)
    script.on('message', on_message)
    script.load()

    device.resume(pid)

    # sys.stdin.read() returns on end-of-file: CTRL-Z then Enter on
    # Windows, CTRL-D on Linux and macOS.
    print('Press CTRL-Z to stop execution.')
    sys.stdin.read()
    session.detach()


if __name__ == '__main__':
    main(sys.argv[1])


This is the core control script that we are going to be using for our
examples. You are free to work directly with Frida’s REPL, but if you
want to automate things or want more control over what is happening
then be sure to use this script.
After some reading, we will see how to extend this script to enable
child-gating, which translates into handling child processes
automagically.
Let’s explain the most important parts of this script:

def on_message(message, data):
    """Print received messages."""
    print(message)

The on_message callback receives the messages sent by the agent. For
now we simply print them without any further handling.

device = frida.get_local_device()
pid = device.spawn(process_name)
print('pid: %d' % pid)

frida.get_local_device() gets the local device (in this case, our
desktop OS). Once we have the device object we can call the
spawn(program) API, where program is the path of the binary to run
(or an argv list). If successful, this returns a pid (int) we can
use, and the process remains suspended.

session = device.attach(pid)

script = session.create_script(code)
script.on('message', on_message)
script.load()

device.resume(pid)

We then create a session object with the attach(pid) API, which
allows us to interact with the process. The session provides the
.create_script(str) method, which takes our instrumentation script
(which we have opened and read before, although we could also
hardcode it) and returns the script object.
Once we have the script object, we can assign how each event will be
handled (for now, just the message one) with .on(event, callback).
When we are finished assigning callbacks we can load the
instrumentation script.
When we have finished with everything, we can call
device.resume(pid) to resume the process and instrumentation will
begin. When we are done, we can call session.detach() to detach
from the instrumented process and revert any instrumentation
(hooks will be reverted).
This control script is fairly simple and might not seem useful
at first, but once we learn about the child-gating feature and RPC
exports we will see how this script can be extended and become more
useful.
For now, you can extend this script to log messages to a file instead
of printing them to the console.
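
As a starting point for that exercise, here is a minimal sketch of a logging callback. The helper name make_file_logger and the one-JSON-object-per-line format are choices made for this example, not something Frida prescribes. It relies on the shape of the messages delivered to on_message: values passed to send() arrive as a dict whose type is 'send' with the value under payload, while uncaught exceptions in the agent arrive with type 'error' and a description. The last lines feed it a hand-made message so it can be tried without a live session.

```python
import json
import os
import tempfile


def make_file_logger(log_path):
    """Return an on_message callback that appends each message to log_path."""
    def on_message(message, data):
        if message.get("type") == "send":
            record = {"kind": "send", "payload": message.get("payload")}
        else:
            # 'error' messages describe an uncaught exception in the agent.
            record = {"kind": "error",
                      "description": message.get("description")}
        with open(log_path, "a", encoding="utf-8") as log_file:
            log_file.write(json.dumps(record) + "\n")
    return on_message


# Try it with hand-made messages instead of a live Frida session.
log_path = os.path.join(tempfile.mkdtemp(), "messages.log")
log_message = make_file_logger(log_path)
log_message({"type": "send", "payload": "hello"}, None)
log_message({"type": "error", "description": "ReferenceError: x"}, None)

with open(log_path, encoding="utf-8") as log_file:
    lines = log_file.read().splitlines()
print(lines)
```

In the control script above, it would be registered with script.on('message', make_file_logger('messages.log')) in place of the printing callback.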

When you emit lots of send() events from JS, the on_message callback
will be overloaded, slowing things down. Be sure to batch messages
before sending them to reduce overhead.
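
One possible way to batch, sketched under a few assumptions: the agent collects events in an array and flushes it with a single send() on a timer, and the control script unpacks the array again. The AGENT_SOURCE string, the 'batch' tag and the 250 ms interval are all invented for this example. Only the Python side is exercised here; the agent source is shown as the string that would be handed to session.create_script().

```python
# Hypothetical agent: queue events and flush them in one send() call.
AGENT_SOURCE = """
const pending = [];
function queue(event) { pending.push(event); }
setInterval(() => {
  if (pending.length > 0) {
    send({ type: 'batch', events: pending.splice(0) });
  }
}, 250);
"""

received = []


def on_message(message, data):
    """Unpack batched payloads so the rest of the tool sees single events."""
    if message.get("type") != "send":
        print(message)  # agent errors are printed unchanged
        return
    payload = message["payload"]
    if isinstance(payload, dict) and payload.get("type") == "batch":
        received.extend(payload["events"])
    else:
        received.append(payload)


# Simulate what would be delivered for one flushed batch of two events,
# followed by one message that was sent on its own.
on_message({"type": "send",
            "payload": {"type": "batch",
                        "events": ["open /etc/hosts", "open /etc/passwd"]}},
           None)
on_message({"type": "send", "payload": "single event"}, None)
print(received)
```

The trade-off is latency: events reach the control script up to one flush interval late, in exchange for far fewer callback invocations.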

5.9 Injecting our scripts using Frida’s command line
Although most of the examples in the book can be done in Frida’s
REPL (that is, writing each statement separately), I encourage you
to write everything in an instrumentation script so that it is
easier to debug and read. There are also some limitations in Frida’s
REPL, such as not being able to use the let and const keywords, so
be careful!
To inject instrumentation scripts we use the -l parameter:
frida -l myscript.js <PID|process name>

However, this will drop us into Frida’s REPL and we will need to
%resume execution. In this case, we can take advantage of the
--no-pause flag (newer frida-tools releases reverse this default:
the target is resumed automatically and --pause keeps it suspended).

$ frida -l myscript.js <PID|process_name> --no-pause

Or spawning a process from scratch:


$ frida -l myscript.js -f <filepath> --no-pause

To close this section, the next one takes a look at remote
instrumentation.

5.10 Remote instrumentation


There are situations when we want to instrument applications
remotely, such as when instrumenting malware (so that our
development environment is safe) or whenever working on the machine
itself is not an option (a mainframe, a remote server, mobile
devices…).
Fortunately, Frida provides pre-built binaries for the server side.
These binaries can be found under the releases² and are tagged as
frida-server (for example
frida-server-14.2.14-windows-x86_64.exe.xz).

To remotely instrument an application with Frida we need:

1. A remote system running frida-server.
2. A binary that is able to run under the remote system mentioned
   in point 1.
3. A local (this being our physical machine) installation of Frida
   (frida-tools and frida packages).
4. Matching versions of frida-server and our local installation of
   Frida.

First, let’s set up the remote server. Only the frida-server binary
is needed, and it can be downloaded from the aforementioned GitHub
releases page:

$ wget https://github.com/frida/frida/releases/download/14.2.14/frida-server-14.2.14-linux-x86_64.xz

Then extract it and give it execution permissions:

$ unxz frida-server-14.2.14-linux-x86_64.xz
$ file frida-server-14.2.14-linux-x86_64
$ chmod +x frida-server-14.2.14-linux-x86_64

²https://github.com/frida/frida/releases

And then start listening:


./frida-server-14.2.14-linux-x86_64 -l 0.0.0.0

Now that the server-side part is covered, we can finally get to the
client part (our local computer).
The -H flag in frida and frida-trace allows us to connect to a
specific address/port combination:
frida -H IP:PORT
or
frida-trace -H IP:PORT
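
The control script from Section 5.8.1 can target the same server. A sketch, assuming frida-server listens on its default TCP port (27042) and using made-up helper names (remote_address, get_remote_device): instead of frida.get_local_device(), the device is obtained through the device manager. The frida import is done lazily so that the address helper can be tried on a machine without Frida installed.

```python
DEFAULT_PORT = 27042  # port frida-server listens on when none is given


def remote_address(host, port=DEFAULT_PORT):
    """Build the 'host:port' string that add_remote_device() accepts."""
    return f"{host}:{port}"


def get_remote_device(host, port=DEFAULT_PORT):
    """Return a frida Device bound to a remote frida-server."""
    import frida  # imported lazily; only needed for a real connection
    manager = frida.get_device_manager()
    return manager.add_remote_device(remote_address(host, port))


address = remote_address("192.168.1.101")
print(address)
```

The returned device can then be used exactly like the local one in the control script: device.spawn(...), device.attach(pid) and device.resume(pid).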

In this example, we open the /bin/ls binary in the remote server and
try to obtain the Process.pointerSize of it. In this case, the syntax
would be:
frida -H 192.168.1.101 -f /bin/ls

-f spawns a binary from the remote filesystem and -H sets the IP
address with no port, so the default one is used. And the final
result is:

The Frida CodeShare repository

The Frida community has a CodeShare repository where Frida
instrumentation scripts are shared. There are multiple scripts that
provide interesting functionality, such as tracing JNI APIs on
Android or an ObjC method observer. What is interesting about this
repository is that the Frida command line is able to fetch and load
scripts remotely. For example, say we want to dump an iOS
application using @Lichao890427’s script. It would be possible by
writing the following command:
$ frida --codeshare lichao890427/dump-ios -f YOUR_BINARY

Use these scripts wisely and avoid reinventing the wheel as much as
possible, although there is always room for improvement.
Frida’s CodeShare repository can be found at
https://codeshare.frida.re/
6. Intermediate usage
6.1 Defining globals in Frida’s REPL
One thing we notice when executing scripts via Frida’s REPL (frida
-l script.js) is that any variables we have in our script are not
accessible after it has been executed. The reason is that the code inside
frida-compile (in the REPL) runs inside an anonymous function so
that variables aren’t leaked out to the global namespace.
To make our variables accessible, we need to use the global trick.
Say we have a variable named CreateFileWPtr that stores the pointer
to the CreateFileW function. We would create it this way:
const CreateFileWPtr = Module.getExportByName('kernelbase.dll', 'CreateFileW')

But we wouldn’t be able to access it from the REPL. So, we use the
following syntax:
(global).variableName = realVariable - For JavaScript.
(global as any).variableName = realVariable - For TypeScript.
Where realVariable is any variable, constant or function we want to
expose. So for our CreateFileWPtr:
(global).CreateFileWPtr = CreateFileWPtr

or
(global as any).CreateFileWPtr = CreateFileWPtr

And once we run our small script we can access our variable:

frida -l globalstest.js notepad.exe
[Local::notepad.exe]-> CreateFileWPtr
"0x7f7a59bfe500"


6.2 Following child processes


In previous pages we have seen how to create a basic control script to
inject our instrumentation agent in a target process. What we want to
do now is to detect child processes using Frida’s child-gating feature.

This feature is not available from the CLI tools (Frida’s REPL and
frida-trace).

It is important to understand that the child-gating feature is
OS-dependent and hence will not detect every possible way to spawn a
child process. On Windows it detects CreateProcessInternalW calls to
follow child processes, while on Linux it detects fork() and vfork()
calls.
For this example, we use Frida’s official child-gating script¹ (after
all, we don’t need to reinvent the wheel) and explain it:

# -*- coding: utf-8 -*-
from __future__ import print_function

import threading

import frida
from frida_tools.application import Reactor


class Application(object):
    def __init__(self):
        self._stop_requested = threading.Event()
        self._reactor = Reactor(run_until_return=lambda reactor: self._stop_requested.wait())

        self._device = frida.get_local_device()
        self._sessions = set()

        self._device.on("child-added", lambda child: self._reactor.schedule(lambda: self._on_child_added(child)))
        self._device.on("child-removed", lambda child: self._reactor.schedule(lambda: self._on_child_removed(child)))
        self._device.on("output", lambda pid, fd, data: self._reactor.schedule(lambda: self._on_output(pid, fd, data)))

    def run(self):
        self._reactor.schedule(lambda: self._start())
        self._reactor.run()

    def _start(self):
        argv = ["/bin/sh", "-c", "cat /etc/hosts"]
        env = {
            "BADGER": "badger-badger-badger",
            "SNAKE": "mushroom-mushroom",
        }
        print("[*] spawn(argv={})".format(argv))
        pid = self._device.spawn(argv, env=env, stdio='pipe')
        self._instrument(pid)

    def _stop_if_idle(self):
        if len(self._sessions) == 0:
            self._stop_requested.set()

    def _instrument(self, pid):
        print("[*] attach(pid={})".format(pid))
        session = self._device.attach(pid)
        session.on("detached", lambda reason: self._reactor.schedule(lambda: self._on_detached(pid, session, reason)))
        print("[*] enable_child_gating()")
        session.enable_child_gating()
        print("[*] create_script()")
        script = session.create_script("""\
Interceptor.attach(Module.getExportByName(null, 'open'), {
  onEnter: function (args) {
    send({
      type: 'open',
      path: args[0].readUtf8String()
    });
  }
});
""")
        script.on("message", lambda message, data: self._reactor.schedule(lambda: self._on_message(pid, message)))
        print("[*] load()")
        script.load()
        print("[*] resume(pid={})".format(pid))
        self._device.resume(pid)
        self._sessions.add(session)

    def _on_child_added(self, child):
        print("[+] child_added: {}".format(child))
        self._instrument(child.pid)

    def _on_child_removed(self, child):
        print("[-] child_removed: {}".format(child))

    def _on_output(self, pid, fd, data):
        print("[*] output: pid={}, fd={}, data={}".format(pid, fd, repr(data)))

    def _on_detached(self, pid, session, reason):
        print("[-] detached: pid={}, reason='{}'".format(pid, reason))
        self._sessions.remove(session)
        self._reactor.schedule(self._stop_if_idle, delay=0.5)

    def _on_message(self, pid, message):
        print("[*] message: pid={}, payload={}".format(pid, message["payload"]))


app = Application()
app.run()

¹https://raw.githubusercontent.com/frida/frida-python/master/examples/child_gating.py

This script will instrument /bin/sh -c cat /etc/hosts, which in turn
spawns a child process (the cat command).

1. _start sets the application to instrument and its arguments.
   Once the process is spawned and suspended, its PID is sent to
   _instrument.
2. _instrument attaches to the received PID and creates a session.
   This session sets the detached callback to log when a process is
   detached and the reason why it happened. The session also enables
   the child-gating feature using session.enable_child_gating().
   Then, it creates a script from the string we pass it, which
   instruments the open function. After this, the script sets the
   message callback to receive message events from our
   instrumentation script. Once everything is set, our target
   process is resumed.
3. The _on_child_added callback receives the child-added event from
   our instrumented process and repeats step (2) for the child.
4. The control script continues its execution until no more
   processes are instrumented (that is, neither parent nor child
   processes are alive).

Although this can be a bit difficult to understand, the following
sequence diagram should be helpful:

When we are finished writing the script, we can run it and get the
following output:


6.3 Creating NativeFunctions


It is possible to use NativeFunctions to create auxiliary functions we
can call from JavaScript. This is useful in case that we want to call
exports for our own sake or use the process’s functions at will.
NativeFunction syntax:
new NativeFunction(address, returnType, argTypes[, abi])

What we need to create the NativeFunction object:

• A valid pointer to a function.
• Argument(s) types.
• Return value type.
• (Optional) ABI or calling convention (stdcall, fastcall…)

Let’s do a quick example. Say we have the following add function:

int
add(int a, int b)
{
    return a + b;
}

And we want to call it on our own terms, at will. How do we create
the native function? We are not taking the ABI into account for now.
This leaves us with a=int, b=int, return=int. So, we can build our own
NativeFunction now:
new NativeFunction(ptr(my_address), 'int', ['int', 'int'])

And we are able to call it on our own terms:

const myAdd = new NativeFunction(ptr(0x0065fd40), 'int', ['int', 'int']);

myAdd(10, 10);

In case you are sure the ABI is fastcall, for example, you can add
the calling convention parameter:
new NativeFunction(ptr(my_address), 'int', ['int', 'int'], 'fastcall')

6.3.1 Using NativeFunction to call system APIs
We can now reuse the previously acquired knowledge to create a
NativeFunction from a known system API and call it ourselves. When
doing this, as mentioned before, we must be very careful when setting
up the function’s parameter and return types, and also with the types
of the values we pass to the function when calling it.
In this first example, we are going to create a NativeFunction from
the mkdir API:


#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>

int
main(void)
{
    mkdir("/home/fernandou/frida/test_folder", 0700);

    struct stat st = { 0 };
    if (stat("/home/fernandou/frida/test_folder", &st) == -1)
    {
        puts("Folder does not exist.\n");
    }
    else
    {
        puts("Folder exists.\n");
    }
    return 0;
}

We can then run the program in Frida’s REPL:

$ clang mkdir.c -o cMkdir
$ frida -f cMkdir

Once we are inside the program, we first need a pointer to the mkdir
API:

[Local::a.out]-> mkdir = Module.getExportByName(null, 'mkdir')
"0x7fff204ca3b4"

Once we have the pointer we have to build our NativeFunction. The
argument that mkdir receives is a const char*, so we need to declare
the argument as a pointer and allocate a UTF-8 string to pass to it:

[Local::a.out]-> folderName = Memory.allocUtf8String("fridahandbook")
"0x108b6dbe0"

When outside of Frida’s REPL, always store allocated buffers in a
const before passing them to NativeFunctions, so that the buffer
stays referenced (and therefore allocated) while it is in use.

We allocate the string fridahandbook, as we want that as the folder
name, and then we need to create the NativeFunction:

[Local::a.out]-> frida_mkdir = new NativeFunction(mkdir, 'int', ['pointer'])
function
[Local::a.out]-> frida_mkdir(folderName)
0
[Local::a.out]->

Thank you for using Frida!

➜ frida ls -l
d---r----- 2 fernandou primarygroup 64 26 Feb 14:22 fridahandbook

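We create our NativeFunction with a pointer as argument and a return
value of int; it is not necessary in this example to specify the ABI.
Once we have defined the NativeFunction, we can call it using the
previously allocated UTF-8 string buffer. We can see that the return
value is 0, about which the man pages² state:

mkdir() and mkdirat() return zero on success, or -1 if an error
occurred (in which case, errno is set appropriately).

Which means our NativeFunction worked and we have our folder created!
Note that mkdir also takes a second argument, the mode_t permission
bits. Because this example neither declares nor passes it, the new
folder gets whatever value happens to be in that argument slot,
which explains the unusual d---r----- permissions in the listing
above. Declaring a second integer argument and passing a mode such
as 0o700 avoids this.
²https://man7.org/linux/man-pages/man2/mkdir.2.html
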

6.4 Modifying return values


In case we want to modify the control flow of the program or pretend
that a function call returned something different, we can do it using
the .replace() method of the return value. A quick example is
KERNEL32.DLL!MoveFileW, which returns 0 if the function fails or a
non-zero value if it succeeds.
Let’s say the program checks whether a file was moved before
continuing, and we only want it to believe that it was actually moved:

const retvalOne: NativePointer = ptr(0x1);

class MoveFileW {
  onLeave(retval: InvocationReturnValue) {
    retval.replace(retvalOne);
  }
}

InvocationReturnValue exposes the replace() method, which we can use
to overwrite the value the caller receives. It is again recommended
to work with const as much as possible to prevent problems.

6.5 Access values after usage


Sometimes we want to know what has changed in an input parameter
during the function’s execution: how a buffer’s contents change, what
is written to it, or we may want to change the value after execution.
It is possible to do this with Frida. However, we need to take some
extra information into account.
If we want to store any of the original arguments of a function, keep
in mind that these will always be NativePointers unless we read them
with an API such as .readUtf16String(). Reading them early, however,
is not what we want when we need to see how they have changed.
Basic structure

class myInstrumentedFunction {
  firstParam: NativePointer = NULL;

  onEnter(args: NativePointer[]) {
    this.firstParam = args[0];
  }

  onLeave(retval: InvocationReturnValue) {
    console.log(this.firstParam.readCString());
  }
}

The first thing we want to do is create the variable we want to
access during the onLeave stage and give it a NativePointer type.
After that, we store the pointer to the argument in firstParam using
the this keyword. In this case, the this keyword affects the class
scope so it is accessible during both stages.
Then we can access it during the onLeave stage. In case it was a
buffer, we can print it during the onLeave stage using
this.firstParam.readCString() to see its contents, or we can check
its contents with the hexdump() API.
We will now work with a real example of how this can be used.

6.6 CryptDecrypt: A practical case

The CryptDecrypt API is the perfect example of the aforementioned
subject. Let’s see what MSDN has about this API:

BOOL CryptDecrypt(
  HCRYPTKEY  hKey,
  HCRYPTHASH hHash,
  BOOL       Final,
  DWORD      dwFlags,
  BYTE       *pbData,
  DWORD      *pdwDataLen
);

MSDN notes:

pbData

A pointer to a buffer that contains the data to be decrypted. After
the decryption has been performed, the plaintext is placed back into
this same buffer.

The number of encrypted bytes in this buffer is specified by
pdwDataLen.

So we have an hKey, but that is not the most important argument for
us here; the important ones are the *pbData pointer and the
*pdwDataLen pointer. The way this API works is that once the function
body has executed and we are in the onLeave (return) stage, the
buffer that pbData points to, which was initially encrypted, is
decrypted and we can read it.
To achieve this, we need to store the pbData and pdwDataLen pointers
to be able to access them later on.

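class CryptDecrypt {
  buffer_size?: NativePointer;
  buffer?: NativePointer;

  onEnter(args: NativePointer[]) {
    this.buffer = args[4];      // BYTE *pbData
    this.buffer_size = args[5]; // DWORD *pdwDataLen
  }

  onLeave(retval: InvocationReturnValue) {
    if (this.buffer && this.buffer_size) {
      // pdwDataLen points to a DWORD holding the plaintext length.
      const size = this.buffer_size.readU32();
      console.log(this.buffer.readCString(size));
    }
  }
}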

In this case we have two values to store: buffer_size, matching
args[5], and buffer, matching args[4]. What we want to do is keep
the addresses they point to at the beginning of the function and see
how their contents have changed at the end.
If we try to inspect this.buffer = args[4] during onEnter, we will
only get random or unintelligible data. However, once we enter the
onLeave stage we can see the decrypted data and its size.
In case what we are trying to inspect is not a string or a number,
we can work with the hexdump API, especially if we know the length:

class CryptDecrypt {
  buffer_size?: NativePointer;
  buffer?: NativePointer;

  onEnter(args: NativePointer[]) {
    this.buffer = args[4];
    this.buffer_size = args[5];
  }

  onLeave(retval: InvocationReturnValue) {
    if (this.buffer && this.buffer_size) {
      console.log(hexdump(this.buffer, {
        length: this.buffer_size.readU32()
      }));
    }
  }
}

6.7 Modifying values before execution


If all we want is to modify the input values a function receives
when the program calls it, replacing its arguments is a safe bet,
although it must be done with caution.
If what we want is to call functions with custom parameters for our
own sake (e.g. calling CreateFileW to obtain a HANDLE), it is best to
create a NativeFunction and do it that way. Replacing arguments when
the program itself calls a function is what this section covers.
Say we have a program that checks whether a file exists and, if it
does, stops its execution. We do not want to remove this file because
it is a dependency for other programs in our environment. What we are
going to do is deceive the program by redirecting the check to a
different path.
This is our sample program:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>

int
main(void)
{
    struct stat st = { 0 };
    if (stat("/bin/ls", &st) == -1)
    {
        puts("File does not exist.\nInstalling our own busybox binaries");
        // execute real code
    }
    else
    {
        puts("File exists.\n");
        // exit without doing anything
    }
    return 0;
}

This program is similar to the one that we have seen before: it calls
stat to check if the file exists and, if it doesn’t, it continues its
execution.

In this case, our first idea would be to get the pointer to the stat
function, but that will lead us to an error. Frida will give us a
valid pointer to stat, but that address is not the one that is going
to be called in the end. To see this, we will check with a
disassembler (radare2 in my case) and Frida:

As we can notice by checking the disassembly:

│           0x100003ed6      e843000000     sym.imp.memset ()  ; void *memset(void *s, int c, size_t n)
│           0x100003edb      488d3d7c0000.  rdi = [str.bin_ls] ; section.3.__TEXT.__cstring
│                                                              ; 0x100003f5e ; "/bin/ls"
│           0x100003ee2      488bb568ffff.  rsi = qword [var_98h]
│           0x100003ee9      e83c000000     sym.imp.stat_INODE64 ()
│           0x100003eee      83f8ff         var = eax - 0xffffffffffffffff
│       ┌─< 0x100003ef1      0f8511000000   if (var) goto 0x100003f08

What ends up being called is stat_INODE64. Now we will check how
this function is named in Frida’s REPL:

[Local::a.out]-> Module.enumerateImports("a.out")
[
    {
        "address": "0x7fff2051725c",
        "module": "/usr/lib/libSystem.B.dylib",
        "name": "dyld_stub_binder",
        "slot": "0x103229000",
        "type": "function"
    },
    {
        "address": "0x7fff205456f8",
        "module": "/usr/lib/libSystem.B.dylib",
        "name": "memset",
        "slot": "0x10322d000",
        "type": "function"
    },
    {
        "address": "0x7fff20410274",
        "module": "/usr/lib/libSystem.B.dylib",
        "name": "puts",
        "slot": "0x10322d008",
        "type": "function"
    },
    {
        "address": "0x7fff204ca39c",
        "module": "/usr/lib/libSystem.B.dylib",
        "name": "stat$INODE64",
        "slot": "0x10322d010",
        "type": "function"
    }
]

And in this list we can notice a mangled name:

{
    "address": "0x7fff204ca39c",
    "module": "/usr/lib/libSystem.B.dylib",
    "name": "stat$INODE64",
    "slot": "0x10322d010",
    "type": "function"
}

This means that if we want to call Module.getExportByName we need
the stat$INODE64 function name instead of stat.
Once we have this data, we can write our instrumentation script:

const redirectString = Memory.allocUtf8String("/bin/foobar");
const statPtr = Module.getExportByName(null, "stat$INODE64");

Interceptor.attach(statPtr, {
  onEnter(args) {
    const firstArg = args[0];
    const statArg = firstArg.readUtf8String();
    console.log("stat is checking: " + statArg);
    if (statArg.indexOf("bin/ls") != -1) {
      // Point stat at our own string instead of the original path.
      args[0] = redirectString;
    }

    console.log("final stat path?: " + args[0].readUtf8String());
  }
});

We can fire this script up:

$ frida -f a.out -l script.js --no-pause

And the output:

➜ Desktop frida -f a.out -l script.js --no-pause
     ____
    / _  |   Frida 14.2.12 - A world-class dynamic instrumentation toolkit
   | (_| |
    > _  |   Commands:
   /_/ |_|       help      -> Displays the help system
   . . . .       object?   -> Display information about 'object'
   . . . .       exit/quit -> Exit
   . . . .
   . . . .   More info at https://www.frida.re/docs/home/
Spawned `a.out`. Resuming main thread!

File does not exist.
Installing our own busybox binaries
stat is checking: /bin/ls
final stat path?: /bin/foobar
[Local::a.out]-> Process terminated

The output is the one we expected: the stat parameter gets redirected
to /bin/foobar instead.

6.8 Undoing instrumentation


There are times when we want to stop instrumenting a function after
a certain set of conditions is met, either because we do not have any
more checks to do against the instrumented code or because it is
going to impact the process’s performance. With Frida it is possible
to undo hooks at runtime.
For this part, we will reuse the previous example, but it will check
two files instead:

#include <sys/types.h>
#include <sys/stat.h>
#include <unistd.h>
#include <stdio.h>

void
check_file(const char *path)
{
    struct stat st = { 0 };
    if (stat(path, &st) == -1)
    {
        printf("File [%s] does not exist.\n", path);
    }
    else
    {
        printf("File [%s] exists.\n", path);
    }
}

int
main(void)
{
    check_file("/bin/ls");
    check_file("/bin/cd");
    return 0;
}

So now, we want to check if ls exists but we do not want to do any
further checks. For this, we will undo instrumentation using the
InvocationListener, which is what Interceptor.attach returns (we can
check this if we are using Frida with TypeScript’s autocomplete).
Our script will be:

const redirectString = Memory.allocUtf8String("/bin/foobar");
const statPtr = Module.getExportByName(null, "stat$INODE64");

const statListener = Interceptor.attach(statPtr, {
  onEnter(args) {
    this.removeHook = false;
    const statArg = args[0].readUtf8String();
    console.log("stat is checking: " + statArg);
    if (statArg.indexOf("bin/ls") != -1) {
      args[0] = redirectString;
      this.removeHook = true;
    }
    console.log("final stat path?: " + args[0].readUtf8String());
  },
  onLeave(retval) {
    if (this.removeHook) {
      console.log("Removing stat instrumentation...");
      statListener.detach();
    }
  }
});

In this case we set statListener to the return value of
Interceptor.attach. We also set a variable shared between onEnter and
onLeave named this.removeHook, which is set to false by default. When
this variable is set to true, meaning we want to remove
instrumentation, we call the detach() method of statListener and the
hook is removed.
And finally, this is the output we get:

$ frida -f a.out -l script.js --no-pause

File [/bin/ls] does not exist.
File [/bin/cd] does not exist.
stat is checking: /bin/ls
final stat path?: /bin/foobar
Removing stat instrumentation...
[Local::a.out]-> Process terminated

We can see that our instrumentation code is checking stat until
/bin/ls is checked. Right afterwards, we drop instrumentation and
let the process run as usual.

6.9 std::string
Something that is very interesting to us is the ability to read
strings. However, this is not always possible by simply calling
Frida’s readUtf8String/readCString built-ins, due to the different
ways a string can be represented. For example, Windows’
UNICODE_STRING is defined in a struct as follows:

typedef struct _UNICODE_STRING {
  USHORT Length;
  USHORT MaximumLength;
  PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

A common string type to parse is a C++ std::string. A similar
concept will be seen in Swift.String’s datatype later on. For a
std::string, the LSB (Least Significant Bit) of its first byte is 0
if it is a short string (up to 22 bytes) or 1 if it is a long
string. If it is a long string, the pointer to the characters we
want is stored at an offset of two times the Process.pointerSize of
the process we are attached to.
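
The layout just described can be modelled without Frida. The Python sketch below decodes a raw 24-byte image of a std::string, assuming the default libc++ layout on a 64-bit little-endian target (capacity, size and data pointer for long strings; a size byte followed by inline characters for short ones). Other platforms and standard libraries lay the object out differently. The function name and the read_memory callback are hypothetical stand-ins for the reads an agent would perform with readPointer() and readUtf8String().

```python
import struct

POINTER_SIZE = 8  # assumption: 64-bit process


def read_libcxx_string(obj, read_memory):
    """Decode the bytes of a libc++ std::string object (assumed layout)."""
    is_long = obj[0] & 1  # least significant bit of the first byte
    if is_long:
        # Long form: capacity (carrying the flag bit), size, data pointer.
        _capacity, size, data_ptr = struct.unpack_from("<QQQ", obj, 0)
        return read_memory(data_ptr, size)
    size = obj[0] >> 1      # short form keeps the size in the same byte
    return obj[1:1 + size]  # characters are stored inline


# Fake target memory holding a long string at address 0x1000.
text = b"Frida is great, you should check it out at frida.re"
heap = {0x1000: text}
long_obj = struct.pack("<QQQ", 64 | 1, len(text), 0x1000)
short_obj = (bytes([5 << 1]) + b"frida").ljust(3 * POINTER_SIZE, b"\x00")

long_value = read_libcxx_string(long_obj, lambda addr, size: heap[addr][:size])
short_value = read_libcxx_string(short_obj, lambda addr, size: b"")
print(long_value, short_value)
```

The data pointer of the long form sits at byte offset 16, which is the 2 * Process.pointerSize offset mentioned above.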
To test this knowledge out and see how to obtain the string, let’s see
this simple program:

1 #include <iostream>
2
3 void print_std_string(std::string arg_1) {
4 std::cout << arg_1 << std::endl;
5 }
6
7 int
8 main(void) {
9 std::string my_string =
10 "Frida is great,"
11 " you should check it out at frida.re";
12 print_std_string(my_string);
13 return 0;
14 }

This program simply calls the print_std_string(std::string arg_1) function, which prints its argument to the screen. This makes it easy to get the std::string parameter and inspect it.
Once we fire up this program in Frida’s REPL and run Module.enumerateExportsSync() on our binary we notice that names are mangled, but thanks to the name we have chosen for the test function we can spot a mangled function named _Z16print_std_stringNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE. This is the function we want to use Interceptor.attach on.

    Interceptor.attach(Module.getExportByName(null,
        '_Z16print_std_stringNSt3__112basic_stringIcNS_11char_traitsIcEENS_9allocatorIcEEEE'), {
      onEnter(args) {
        // The lowest bit of the first byte tells short (0) from long (1) strings.
        const LSB = args[0].readU8() & 1;
        console.log('LSB: ' + LSB);
        // Long strings keep the data pointer at 2 * pointerSize.
        const stdString = args[0]
          .add(Process.pointerSize * 2)
          .readPointer()
          .readUtf8String();
        console.log('std::string: ' + stdString);
      }
    });

Then, we can run this small script and get the following output:

    LSB: 1
    std::string: Frida is great, you should check it out at frida.re
    [Local::a.out]-> Process terminated

It is important to note that this was tested using clang++ 12.0.0; the memory layout may differ with other compilers such as GCC, which uses a union to store small strings.
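The short/long distinction can be explored without attaching to a process. The following is a minimal sketch in plain JavaScript, assuming the 64-bit little-endian libc++ layout described above (long form: allocation size with the flag bit, then size, then data pointer; short form: size shifted left by one, then the inline characters). The function name describeLibcxxString and both sample objects are made up for illustration; in a Frida script the 24 bytes could come from args[0].readByteArray(24).

```javascript
// Sketch: classify a 24-byte libc++ std::string object (64-bit, little-endian).
// Assumed layout, long form:  [allocation size | 1][size][data pointer]
// Assumed layout, short form: [size << 1][up to 22 inline characters + NUL]
function describeLibcxxString(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const isLong = (view.getUint8(0) & 1) === 1;
  if (isLong) {
    return {
      isLong: true,
      size: Number(view.getBigUint64(8, true)),
      dataPointer: '0x' + view.getBigUint64(16, true).toString(16),
    };
  }
  const size = view.getUint8(0) >> 1;
  let text = '';
  for (let i = 0; i < size; i++) {
    text += String.fromCharCode(view.getUint8(1 + i));
  }
  return { isLong: false, size, text };
}

// Hypothetical sample objects built by hand for illustration.
const shortObj = new Uint8Array(24);
shortObj[0] = 5 << 1;                              // size 5, LSB 0
shortObj.set([0x46, 0x72, 0x69, 0x64, 0x61], 1);   // "Frida"

const longObj = new Uint8Array(24);
const longView = new DataView(longObj.buffer);
longView.setBigUint64(0, 64n | 1n, true);          // allocation size with the long flag set
longView.setBigUint64(8, 51n, true);               // size
longView.setBigUint64(16, 0x600000c04000n, true);  // made-up heap address

console.log(describeLibcxxString(shortObj));
console.log(describeLibcxxString(longObj));
```

Checking the flag first avoids dereferencing inline character data as if it were a pointer, which is what would happen if the long-string read were applied to a short string.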
Another common use case is examining a std::vector. The way a std::vector<> is represented in memory varies with the process architecture and the compiler; in this example we are going to demonstrate how it works under clang++.
    // clang++ vectortest.cc
    #include <vector>
    #include <iostream>

    void print_vector_size(std::vector<int> v)
    {
        std::cout << "vector size:" << v.size() << std::endl;
    }

    int main()
    {
        std::vector<int> v = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        print_vector_size(v);
        return 0;
    }

The above code shows a very simple procedure where a function prints the length of the vector. When inspecting a vector, the most important things to be aware of are its size and the datatype of its elements. Given the datatype, it is possible to walk from the first element up to the last element of the vector.
The way this vector is represented in memory is as follows:

Every 4 bytes there is a member of the vector, from 0x01 to 0x0a. The pointer to the tail of this vector can be read by offsetting the original pointer by Process.pointerSize (in this case, a 64-bit application, pointerSize is 8). Because every member of the vector is an int, we can obtain each value by reading 4 bytes at a time.
In other words:

• std::vector pointer at offset 0: Stores the pointer to the head of the vector.
• std::vector pointer at offset Process.pointerSize: Stores the pointer to the tail of the vector.
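The head/tail walk can be rehearsed on simulated memory before trying it on a live process. This is only a sketch in plain JavaScript: the ArrayBuffer stands in for process memory, offsets into it stand in for addresses, and POINTER_SIZE stands in for Process.pointerSize. The layout values and the readIntVector helper are made up for illustration.

```javascript
// Sketch: walk a simulated std::vector<int> header {head, tail} (64-bit).
// "Memory" is a plain ArrayBuffer and addresses are offsets into it.
const memory = new DataView(new ArrayBuffer(64));
const POINTER_SIZE = 8; // stands in for Process.pointerSize
const INT_SIZE = 4;

// Hypothetical layout: the header lives at offset 0, the elements at offset 16.
const values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
values.forEach((v, i) => memory.setInt32(16 + i * INT_SIZE, v, true));
memory.setBigUint64(0, 16n, true); // head pointer
memory.setBigUint64(POINTER_SIZE, BigInt(16 + values.length * INT_SIZE), true); // tail pointer

function readIntVector(mem, headerOffset) {
  let cursor = Number(mem.getBigUint64(headerOffset, true));
  const end = Number(mem.getBigUint64(headerOffset + POINTER_SIZE, true));
  const elements = [];
  while (cursor < end) {
    elements.push(mem.getInt32(cursor, true));
    cursor += INT_SIZE;
  }
  return elements;
}

const elements = readIntVector(memory, 0);
console.log(elements.join(','), 'vector_size:' + elements.length);
```

The element count is never stored explicitly here; it falls out of (tail - head) / element size, which is why the datatype must be known up front.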

To test this out, the following code will iterate every member of the
std::vector and store them in a list:

    const vectorPrintPtr = ptr(0x00400af0);

    Interceptor.attach(vectorPrintPtr, {
      onEnter: function (args) {
        // The tail pointer lives one pointer size past the head pointer.
        const finalAddr = args[0].add(Process.pointerSize).readPointer();
        let startAddr = args[0].readPointer();
        let vector_size = 0;
        let elements = [];
        // compare() avoids relying on implicit NativePointer conversion.
        while (startAddr.compare(finalAddr) < 0) {
          vector_size += 1;
          elements.push(startAddr.readInt());
          startAddr = startAddr.add(0x4);
        }
        console.log(elements);
        console.log('vector_size:' + vector_size);
      }
    });

And when running this instrumentation script we get the following output:

    [Local::a.out ]->
    1,2,3,4,5,6,7,8,9,10
    vector_size:10

6.9.1 std::vector in MSVC


MSVC++ generates a different memory layout for the vector structure. In this case the pointer to the head is not stored at the address the vector pointer points to. Instead, the pointer to the head is found by offsetting the vector pointer by Process.pointerSize, and the pointer to the tail is at double the pointerSize of the process (16 in a 64-bit process).
In short:

• The pointer to the vector head is placed at offset Process.pointerSize
• The pointer to the vector tail is placed at offset 2 * Process.pointerSize

Which leaves us with the following code:

    const vectorPrintPtr = ptr(0x00400af0);

    Interceptor.attach(vectorPrintPtr, {
      onEnter: function (args) {
        // Tail pointer: two pointer sizes into the vector object.
        const finalAddr = args[0]
          .add(Process.pointerSize * 2)
          .readPointer();
        // Head pointer: one pointer size into the vector object.
        this.startAddr = args[0]
          .add(Process.pointerSize)
          .readPointer();
        this.vector_size = 0;
        this.elements = [];
        // compare() avoids relying on implicit NativePointer conversion.
        while (this.startAddr.compare(finalAddr) < 0) {
          this.vector_size += 1;
          this.elements.push(this.startAddr.readInt());
          this.startAddr = this.startAddr.add(0x4);
        }
        console.log(this.elements);
        console.log('vector_size:' + this.vector_size);
      }
    });

And when instrumenting our MSVC application, the instrumentation script prints the following output:

    [Local::vectortest.exe ]->
    1,2,3,4,5,6,7,8,9,10,11
    vector_size:11

6.10 Operating with ArrayBuffers


Many situations require working directly with ArrayBuffers, but operating on them might not be straightforward because the data is not always a simple string. To get a better understanding of how to operate on them in Frida, we will hook the fprintf function and replace the contents of its second argument (our aim is to replace “target” with “foobar”).

    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]){
        if(argc < 3){
            fprintf(stderr, "Usage: %s <target> <port>\n", argv[0]);
            exit(1);
        }
        return 0;
    }

This C program calls the fprintf function, which is declared as: int fprintf(FILE *stream, const char *format, ...);. Usually, functions provide us with the length of the string, but that is not the case here. To get around this limitation, we read the string with .readCString() and use its length as the length of the *format parameter.

    const fprintfPtr = Module.getExportByName(null, "fprintf");

    Interceptor.attach(fprintfPtr, {
      onEnter(args) {
        const formatArg = args[1];
        // Length of the format string plus its null terminator.
        this.bufferSize = formatArg.readCString().length + 1;
        this.arrayBuf = formatArg.readByteArray(this.bufferSize);
        this.str = String.fromCharCode.apply(null, new Uint8Array(this.arrayBuf));
        this.str = this.str.replace("target", "foobar");
        // Keep a reference so the new buffer outlives this callback.
        this.newBuf = str2ab(this.str);
        args[1] = this.newBuf.unwrap();
      },
    });

    function str2ab(str) {
      let buf = new ArrayBuffer(str.length);
      let bufView = new Uint8Array(buf);
      for (let i = 0, strLen = str.length; i < strLen; i++) {
        bufView[i] = str.charCodeAt(i);
      }
      return buf;
    }

To understand how this instrumentation code works we will examine it step by step.
str2ab is an auxiliary function that converts a string back to an ArrayBuffer (filled in through a Uint8Array view).

    this.buffer_size = args[1].readCString().length + 1;
    console.log("buffer_size:" + this.buffer_size);
    this.arrayBuf = args[1].readByteArray(this.buffer_size);

readCString() returns the contents of the *format string and gives us the length of the content (+ 1 for the null terminator). Having the size allows us to call readByteArray with the correct size.

    let str = String.fromCharCode.apply(null, new Uint8Array(arrayBuf));
    str = str.replace("target", "foobar");

When the ArrayBuffer is obtained by calling readByteArray we use String.fromCharCode.apply(null, new Uint8Array(arrayBuf)) to convert it to a human-readable string (you can skip this step and modify the ArrayBuffer directly).

    args[1] = str2ab(str).unwrap();

Once the string is modified, the str2ab function transforms it back into an ArrayBuffer, but we cannot just assign this ArrayBuffer to args[1] because a pointer is expected. For this purpose, Frida adds an auxiliary method called .unwrap() that returns a pointer to the first element of the ArrayBuffer.
Then, it is possible to verify the output:

    frida -f a.out -l ins.js --no-pause -q
    . . . . Connected to Local System (id=local)
    Spawned `a.out`. Resuming main thread!

    Usage: a.out <foobar> <port>

The str2ab (String to ArrayBuffer) function is a slightly modified version of the one found in developers.google.com³ adapted to Uint8Arrays.
³https://developers.google.com/web/updates/2012/06/How-to-convert-ArrayBuffer-to-and-from-String
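The string and ArrayBuffer conversions used in this section do not depend on Frida, so the round trip can be tried in any JavaScript engine first. The sketch below assumes one byte per character (ASCII input); ab2str is a helper name introduced here for illustration, and the sample string simply mirrors the format string of the test program.

```javascript
// Sketch: string <-> ArrayBuffer round trip with one byte per character.
function str2ab(str) {
  const buf = new ArrayBuffer(str.length);
  const view = new Uint8Array(buf);
  for (let i = 0; i < str.length; i++) {
    view[i] = str.charCodeAt(i); // values above 255 would be truncated
  }
  return buf;
}

function ab2str(buf) {
  return String.fromCharCode.apply(null, new Uint8Array(buf));
}

// The trailing "\0" plays the role of the C null terminator.
const original = 'Usage: %s <target> <port>\n\0';
const patched = ab2str(str2ab(original)).replace('target', 'foobar');
const patchedBuf = str2ab(patched);

console.log(JSON.stringify(patched));
console.log(patchedBuf.byteLength);
```

Because "target" and "foobar" have the same length, the patched buffer keeps the original size; a longer replacement would still work here, since a fresh buffer is allocated rather than writing over the original string.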
7. Advanced usage
7.1 NOP functions
There are scenarios where all we want is to NOP out some calls, be it because they trigger undesired behavior or because we want to disable some functionality. For this, Frida offers us two options: the replace API or memory patching. In this example we will try to NOP KERNEL32.DLL’s CreateFileW (implemented in kernelbase.dll on current Windows versions).

7.1.1 Using the replace API


    const CreateFileWPtr = Module.getExportByName("kernelbase.dll", "CreateFileW");

    // Replace CreateFileW with a stub that never opens anything.
    Interceptor.replace(CreateFileWPtr, new NativeCallback((lpFileName,
        dwDesiredAccess, dwShareMode, lpSecurityAttributes,
        dwCreationDisposition, dwFlagsAndAttributes, hTemplateFile) => {
      // All bits set is INVALID_HANDLE_VALUE, so callers see a failed call.
      return NULL.not();
    }, 'pointer', ['pointer', 'uint32', 'uint32', 'pointer', 'uint32', 'uint32',
        'pointer']));

In essence, what we are doing is grabbing the pointer to CreateFileW and replacing the function with a stub that does no work; however, this adds quite an overhead if we are only NOPing.
With memory patching we will target a different API, open(), using the following program:

    #include <stdio.h>
    #include <stdlib.h>
    #include <fcntl.h>

    int
    main(int argc, char *argv[])
    {
        int fd;
        fd = open("code.dat", O_RDONLY);
        if (fd == -1)
        {
            fprintf(stderr, "file not found\n");
        }

        return 0;
    }

7.1.2 Patching memory

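Memory.patchCode example on x86:

    const openPtr = Module.getExportByName(null, 'open');
    // Only the first bytes of open() need to change to make it return.
    const patchSize = 16;

    Memory.patchCode(openPtr, patchSize, function (code) {
      const cw = new X86Writer(code, { pc: openPtr });
      // Stay inside the patched region: padding plus one RET.
      cw.putNopPadding(patchSize - 1);
      cw.putRet();
      cw.flush();
    });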

The Memory.patchCode API allows us to safely modify N bytes at address X, which is given as a NativePointer. The callback receives a writable pointer (code in the example above) where the modifications must be written. Do not assume that this pointer is the same location as X: on some systems, such as iOS, the modifications are written to a temporary location before being mapped into memory on top of the original page, so beware of these caveats.
Patching is platform and architecture dependent, so be sure to use the correct code writer (ARM, X86, AArch64, MIPS…). For a full list of CPU code writers refer to: https://frida.re/docs/javascript-api/#x86writer

To NOP functions or code blocks, memory patching is the preferred approach: it generates less overhead for the target binary and is cleaner. If you are not sure, just go with the .replace() approach.

7.2 Memory scanning


Frida allows scanning memory for patterns if we provide the base address and the size of the memory range we want to scan. The pattern must be a hex string with the bytes separated by spaces.
For example, the string “FRIDA rocks!” translates as "46 52 49 44 41 20 72 6f 63 6b 73 21".
It is also possible to use wildcards with the ? character. An example:
"46 52 49 44 41 20 ?? ?? ?? ?? ?? 21"

This pattern will match any “FRIDA _____!” sequence found in memory, and the matches are returned in a list.
Now, we will use our previous example that checks whether a file exists or not and we will ask it to search for a file named “FRIDA rocks!”. We will use Memory.scanSync to find any sequence matching FRIDA _____! in memory.
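Writing the hex pattern by hand is error-prone, so it can help to generate it. The sketch below is plain JavaScript and assumes ASCII text; toPattern and matchesPattern are hypothetical helper names, and matchesPattern only imitates the wildcard rule described above for a buffer of the same length as the pattern (it is not Frida's scanner).

```javascript
// Sketch: build a scan pattern from text and test it against raw bytes.
function toPattern(text) {
  return Array.from(text, (ch) =>
    ch.charCodeAt(0).toString(16).padStart(2, '0')).join(' ');
}

// Returns true when `bytes` matches `pattern`; "?" matches any nibble.
function matchesPattern(bytes, pattern) {
  const tokens = pattern.split(' ');
  if (tokens.length !== bytes.length) return false;
  return tokens.every((token, i) => {
    const hex = bytes[i].toString(16).padStart(2, '0');
    return [0, 1].every((n) => token[n] === '?' || token[n] === hex[n]);
  });
}

const pattern = toPattern('FRIDA rocks!');
console.log(pattern);

const candidate = Array.from('FRIDA rules!', (ch) => ch.charCodeAt(0));
console.log(matchesPattern(candidate, '46 52 49 44 41 20 ?? ?? ?? ?? ?? 21'));
```

The generated string can be passed as the third argument of Memory.scanSync, with wildcards substituted for the bytes that are expected to vary.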

We first fire up our Frida REPL and get the information of the first
module:

    [Local::a.out]-> bin = Process.enumerateModulesSync()[0]
    {
        "base": "0x10a87e000",
        "name": "a.out",
        "path": "/Users/fernandou/Desktop/a.out",
        "size": 16384
    }

We now have the bin variable that stores the base address of the
module, path, and size. Once we have this information, we can scan
using the previous pattern string:

    [Local::a.out]-> Memory.scanSync(bin.base, bin.size, "46 52 49 44 41 20 ?? ?? ?? ?? ?? 21")
    [
        {
            "address": "0x102ae5fa0",
            "size": 12
        }
    ]

The scanSync API returns a single match at the address 0x102ae5fa0, and the size of the match is 12 bytes. If we want to see what is at that address, we can do so by calling hexdump on it.

It is also possible to use partial wildcards: instead of ??, you can use a single ? paired with a hex digit, as in 46 2? 21.

7.2.1 Reacting on memory patterns


One of the applications of the Memory.scan API is reacting whenever a match is found. This is a useful feature, especially when the user wants to modify data on the fly. Unlike in the previous section, where the API was used to identify the address that matched the pattern, this time we will modify the matched bytes to change the flow of the application.
To demonstrate the power of this feature, the following program will be used:

    #include <stdio.h>
    #include <time.h>
    #include <unistd.h>

    struct keyPress {
        int key_type;
        int timestamp;
        int scan_code;
        int virtual_scan_code;
    };

    void guess_pressed_key(struct keyPress* p)
    {
        printf("key_type: %d scan_code: %d\n",
               p->key_type, p->scan_code);
        sleep(5);
        if(p->scan_code == 52)
        {
            printf("arrow up\n");
        }

        if(p->scan_code == 51)
        {
            printf("arrow right\n");
        }
    }

    int main()
    {
        struct keyPress kp;
        kp.key_type = 301;
        kp.timestamp = (int)time(NULL);
        kp.scan_code = 52;
        kp.virtual_scan_code = 52;
        printf("%p\n", guess_pressed_key);
        guess_pressed_key(&kp);
        return 0;
    }

The code above takes a simple struct and prints “arrow up” or “arrow right” depending on the value of scan_code, which is an int member of the struct. The main idea behind this example is to get the program to print “arrow right” by modifying memory.


The struct is composed of four integers, which means that in memory the members are placed 4 bytes apart. Although the timestamp member changes on every run, it is possible to identify each value when dumping the memory (using hexdump):

2d 01 00 00 is the first member, 301. 24 8a 46 62 is the second member, which is the timestamp, and the remaining two members are 34 00 00 00 and 34 00 00 00, meaning 52 each. Now that we have a clear idea of how this information is represented in memory, we can use the Memory.scan API to react whenever this pattern is seen in memory and modify it on the fly:
2d 01 00 00 ?? ?? ?? ?? 34 00 00 00 34 00 00 00

Here ?? is used to match any value, allowing us to get a match even though the timestamp is always changing. Now the Memory.scan API can be used to match the pattern and replace it:

    // get the last rw- range.
    const range = Process.enumerateRangesSync('rw-').pop();

    Memory.scan(range.base, range.size,
        '2d 01 ?? ?? ?? ?? ?? ?? 34 ?? ?? ?? 34 ?? ?? 00', {
      onMatch(address, size) {
        console.log("Pattern matched @ " + address);

        // Same struct with scan_code and virtual_scan_code set to 51 (0x33).
        address.writeByteArray([
          0x2d, 0x01, 0x00, 0x00, 0x00,
          0x00, 0x00, 0x00, 0x33, 0x00,
          0x00, 0x00, 0x33, 0x00, 0x00,
          0x00
        ]);
      }
    });

Whenever there is a match, the address is printed and then the method .writeByteArray is called on address to write the bytes that trigger “arrow right”. When running this script in Frida it delivers the following output:

    Spawned `a.out`. Resuming main thread!

    key_type: 301 scan_code: 52
    [Local::a.out ]-> Pattern matched @ 0xffb57990
    arrow right
    Process terminated
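The byte values used in this subsection can be double-checked outside Frida. The following sketch, in plain JavaScript, encodes the integer members of the keyPress struct as little-endian bytes and wildcards the timestamp; int32ToHex and keyPressPattern are hypothetical helper names introduced only for this illustration.

```javascript
// Sketch: encode the keyPress members as little-endian bytes and derive patterns.
function int32ToHex(value) {
  const view = new DataView(new ArrayBuffer(4));
  view.setInt32(0, value, true); // true = little-endian
  return Array.from(new Uint8Array(view.buffer), (b) =>
    b.toString(16).padStart(2, '0')).join(' ');
}

// key_type, timestamp (wildcarded), scan_code, virtual_scan_code
function keyPressPattern(keyType, scanCode, virtualScanCode) {
  return [int32ToHex(keyType), '?? ?? ?? ??',
    int32ToHex(scanCode), int32ToHex(virtualScanCode)].join(' ');
}

console.log(keyPressPattern(301, 52, 52)); // what the program stores in memory
console.log(keyPressPattern(301, 51, 51)); // layout that triggers "arrow right"
```

Generating both lines makes it easy to see that the search pattern and the replacement differ only in the two 0x34/0x33 bytes.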

7.3 Using custom libraries (DLL/.so)


There might be scenarios where using custom libraries is required, either because the library contains functions that are useful to call from our instrumentation code, or because the library already contains replacements for the functions that are going to be instrumented (essentially, to avoid reinventing the wheel). For this use case, Frida offers the Module.load method.
Module.load allows us to load an external library into our instrumentation session. Once loaded, it behaves as a regular module in Frida, meaning it has access to Module’s methods such as findExportByName, enumerateExports, enumerateImports, etc.

To illustrate how this works, the following C program is used:

    #include <stdio.h>
    #include <stdlib.h>

    int main() {
        FILE *fp;
        fp = fopen("file.txt", "w+");
        fprintf(fp, "%s %s", "May the force", "be with you");
        fclose(fp);
        return 0;
    }

The purpose of this example is to replace the function fopen using a custom DLL instead of using Frida’s Interceptor.replace.

7.3.1 Creating a custom DLL


The first step is having a custom DLL; if you already know how this process works, you can skip this subsection.
The DLL will have a single function that wraps fopen and prints the file path argument. Therefore, what is needed is a libtest.c file containing the function my_fopen:

    #include <stdlib.h>
    #include <stdio.h>

    FILE *my_fopen(const char *filename, const char *mode) {
        printf("lib: %s\n", filename);
        return fopen(filename, mode);
    }

And a separate libtest.h with the my_fopen declaration:

    #include <stdlib.h>
    #include <stdio.h>

    FILE *my_fopen(const char *filename, const char *mode);

Once these files are created, the only remaining task is using clang to create a shared library:

    $ clang -shared -undefined dynamic_lookup -o libtest.so libtest.c

If everything went well, there should be a libtest.so shared library file in your current folder:

    file libtest.so
    libtest.so: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, not stripped

7.3.2 Using our custom library


Once the custom library is created, it can be used from Frida. In this example everything is done in Frida’s REPL with the aforementioned program; no special arguments are required. This workflow requires us to first load our custom DLL, then obtain a pointer to our custom function and create a NativeFunction from it so that it can be called from our code. The final step is replacing the original function with the custom one.
Once inside the REPL, the first step is loading our custom library with Module.load:

    [Local::a.out]-> myModule = Module.load('/home/lazarus/libtest.so')
    {
        "base": "0x7f0cf409c000",
        "name": "libtest.so",
        "path": "/home/lazarus/libtest.so",
        "size": 20480
    }

When Frida loads the module it returns a Module object that behaves like any other module. For example, it is possible to enumerate the exports of the loaded library:

    [Local::a.out]-> myModule.enumerateExports()
    [
        {
            "address": "0x7f0cf409d120",
            "name": "my_fopen",
            "type": "function"
        }
    ]

Our custom my_fopen function is at the address 0x7f0cf409d120; this is the address that is needed when creating a NativeFunction for the custom function:

    myfopen = new NativeFunction(ptr("0x7f0cf409d120"), 'pointer', ['pointer', 'pointer']);

The return value is a pointer to a FILE object, so it is declared as a pointer type, and the arguments are pointers (to const char) as well. Note that once the NativeFunction is created, it can be called from the instrumentation code as many times as desired. The final step is to call Interceptor.replace and call the custom function instead:

    const fopenPtr = Module.getExportByName(null, 'fopen');

    Interceptor.replace(fopenPtr, new NativeCallback((pathname, mode) => {
      return myfopen(pathname, mode);
    }, 'pointer', ['pointer', 'pointer']))

As can be seen, the custom myfopen function is called instead of the regular fopen and the program continues working as intended. The effects of this replacement can be seen when running the %resume command:

    [Local::a.out]-> %resume
    [Local::a.out]-> lib: file.txt
    lib: /dev/urandom

The custom library function correctly prints the values of the first
argument.

7.4 Reading and writing registers


Frida has code writers that generate machine code directly in memory at a specific address for x86, x64 and ARM. We can use them to write instructions at a given address, as we have seen before when NOPing instructions.
In order to see how this works we have this simple program to test with:

    #include <stdio.h>

    int add(int a, int b) {
        return a + b;
    }

    int
    main(void)
    {
        printf("result: %d", add(10, 20));
        return 0;
    }

This program simply calls the function add(int a, int b), which returns the sum of its arguments.
Say we are on ARM64: once this function is called, a is stored in the x0 register and b is stored in the x1 register. We can quickly check this is true by writing the following script:

    const addPtr = Module.getExportByName(null, "add");

    Interceptor.attach(addPtr, {
      onEnter (args) {
        console.log('x0:' + this.context.x0.toInt32());
        console.log('x1:' + this.context.x1.toInt32());
      }
    });

Running it with frida -l script.js -f a.out prints:

    Spawned `a.out`. Resuming main thread!
    result: 30
    x0:10
    x1:20
    [Local::a.out]-> Process terminated

Although we could simply use args[0] and args[1] to modify the values, we will use the code writers instead. In this case, we need the Arm64Writer to modify the x0 register:

    const addPtr = Module.getExportByName(null, "add");
    Memory.patchCode(addPtr, Process.pointerSize, function (code) {
      const cw = new Arm64Writer(code, { pc: addPtr });
      cw.putLdrRegU64('x0', 1337);
      cw.putRet();
      cw.flush();
    });

Since the LDR instruction loads the number 1337 into the x0 register and a RET instruction follows, the caller will use whatever is stored in x0 as the return value.
Now, we can run the script:

    Spawned `a.out`. Resuming main thread!
    result: 1337
    Process terminated

And you can see we have easily modified the function in memory!
For a complete list of methods available, refer to:
https://frida.re/docs/javascript-api/#arm64writer

7.5 Reading structs


We are able to read function arguments with Frida using the args:NativePointer[] array. However, this is not directly possible with arguments that are not simple types, such as structs.
Where can we find structs? We can find structs in the Unix time libraries, for example, or more importantly in Windows API calls such as the ones in NTDLL.
Stages:
1. Understanding and reading a user-controlled struct.
2. Reading a UNIX syscall structure.
3. Reading a Windows NTDLL structure.

7.5.1 Reading from a user-controlled struct.


Given this declaration:

    void print_struct(myStruct s)

We want to log each member of s. As we can see, the only thing that we have is s and we can’t directly apply a Frida API method such as .readInt() or .readCString(). We first need to gather the offsets of the struct members to be sure of what we are reading.
myStruct corresponds to the following:

    struct myStruct
    {
        short member_1;
        int member_2;
        int member_3;
        char *member_4;
    } sample_struct;

In order to gather the offsets we need to figure out the size of each type. A short list:

    {
      "short": 2,
      "int": 4,
      "pointer": Process.pointerSize,
      "char": 1,
      "long": Process.pointerSize,
      "longlong": 8,
      "ansi": Process.pointerSize,
      "utf8": Process.pointerSize,
      "utf16": Process.pointerSize,
      "string": Process.pointerSize,
      "float": 4,
    };

So what we can see is that short has a size of 2, longlong a size of 8 and char a size of 1, but then there is Process.pointerSize for the ansi, string and pointer ones. The reason for this is that the size of these types depends on the architecture of the process, so we need to take this information into account. (long matches the pointer size on Linux and macOS, but it is 4 bytes on Windows.) Compilers also insert padding so that each member starts at a multiple of its own size, which is why a short followed by an int occupies 4 bytes in practice.
It’s important to note that we can always read the first member without any major issues, because its offset is 0.
So, what are the offsets of the previous structure in a 64-bit process?

    struct myStruct
    {
        short member_1;  // 0x0  (2 bytes + 2 bytes of padding)
        int member_2;    // 0x4  (4 bytes)
        int member_3;    // 0x8  (4 bytes)
        char *member_4;  // 0x10 (8 bytes; 0xc and 4 bytes in a 32-bit process)
    } sample_struct;

How can we check this is true for each type? We can compile a test program and get these values from sizeof() and offsetof().
So, now we have the offsets of the structure and we want to read each value. In this case we will use the .add() operator.
.add(), as the name says, adds an offset to a given NativePointer. Therefore, we can place our pointer at the desired offset to read each value:

    // Given s = args[0]:NativePointer (64-bit process)

    s.readShort()                           // 1st member.
    s.add(4).readInt()                      // 2nd member.
    s.add(8).readInt()                      // 3rd member.
    s.add(16).readPointer().readCString();  // 4th member.

This way we will have obtained the values for each structure offset.
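Offsets like these can also be computed mechanically. The sketch below is plain JavaScript and assumes natural alignment, meaning every member is aligned to its own size, which holds for the primitive types used here on mainstream compilers but not for packed structs. computeLayout is a hypothetical helper, and pointerSize stands in for Process.pointerSize.

```javascript
// Sketch: compute member offsets using natural alignment (align = size).
function computeLayout(members) {
  let offset = 0;
  let maxAlign = 1;
  const offsets = {};
  for (const [name, size] of members) {
    offset = Math.ceil(offset / size) * size; // pad up to the member's alignment
    offsets[name] = offset;
    offset += size;
    maxAlign = Math.max(maxAlign, size);
  }
  // The struct itself is padded to a multiple of its largest alignment.
  const totalSize = Math.ceil(offset / maxAlign) * maxAlign;
  return { offsets, totalSize };
}

const pointerSize = 8; // stands in for Process.pointerSize
const layout = computeLayout([
  ['member_1', 2],           // short
  ['member_2', 4],           // int
  ['member_3', 4],           // int
  ['member_4', pointerSize], // char *
]);
console.log(layout);
```

Changing pointerSize to 4 shows how the last member moves when the same struct is compiled for a 32-bit process.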
Next, we will try to parse a Linux syscall struct.

7.6 SYSCALL struct


For this example we will be using a well-known Linux syscall named gettimeofday.

MAN page for gettimeofday: https://man7.org/linux/man-pages/man2/gettimeofday.2.html
We have the following declaration:

    int gettimeofday(struct timeval *tv, struct timezone *tz);

From this we can quickly figure out that timeval and timezone are
two structs. And we cannot check what these values are by simply
using Frida’s API.
The timeval struct is:

    struct timeval {
        time_t      tv_sec;   /* seconds */
        suseconds_t tv_usec;  /* microseconds */
    };

The size of time_t even depends on the API level you are targeting on Android systems. Do not forget to take its size into account, for example by checking Process.pointerSize.

And the timezone struct is:

    struct timezone {
        int tz_minuteswest;  /* minutes west of Greenwich */
        int tz_dsttime;      /* type of DST correction */
    };

For this example we will write a simple program and compile it with clang:

    #include <sys/time.h>
    #include <stdio.h>

    int
    main()
    {
        struct timeval current_time;
        gettimeofday(&current_time, NULL);
        printf("seconds : %ld\nmicro seconds : %ld\n",
               current_time.tv_sec, current_time.tv_usec);

        printf("%p", &current_time);
        getchar();
        return 0;
    }

And run: clang -Wall program.c. The expected output should be:

    pala@jkded:~/code$ ./a.out
    seconds : 1601394944
    micro seconds : 402896
    0x7fff4a1f8d48

So, given this, we will try to access the timeval structure through the pointer printed by the program (0x7fff4a1f8d48 above; the address changes on every run, so the session below uses 0x7fff0b9a3118):

    [Local::a.out]-> structPtr = ptr("0x7fff0b9a3118")
    "0x7fff0b9a3118"
    [Local::a.out]-> structPtr.readLong()
    "1601395177"
    [Local::a.out]-> structPtr.add(8).readLong()
    "439353"

As we can see, the first member is already at offset 0; however, we need to get the process pointer size to work out the next offset:

    [Local::a.out]-> Process.pointerSize
    8

Now that we know that the pointerSize is 8, we can infer that long’s
size will be 8 bytes and place ourselves in the right offset.
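The same two reads can be rehearsed on a simulated buffer. This sketch is plain JavaScript: a DataView stands in for process memory, POINTER_SIZE stands in for Process.pointerSize on a 64-bit target, and readTimeval is a hypothetical helper. The two values are the ones shown in the REPL session above, written into the buffer by hand.

```javascript
// Sketch: read a simulated 64-bit struct timeval {tv_sec, tv_usec}.
const POINTER_SIZE = 8; // stands in for Process.pointerSize on this target

function readTimeval(view, base) {
  return {
    tv_sec: view.getBigInt64(base, true),                 // offset 0
    tv_usec: view.getBigInt64(base + POINTER_SIZE, true), // offset 8
  };
}

// Hypothetical memory holding the values from the REPL session.
const view = new DataView(new ArrayBuffer(16));
view.setBigInt64(0, 1601395177n, true);
view.setBigInt64(8, 439353n, true);

const tv = readTimeval(view, 0);
console.log(tv.tv_sec.toString(), tv.tv_usec.toString());
console.log(new Date(Number(tv.tv_sec) * 1000).toISOString());
```

Converting tv_sec to a date is a quick sanity check: a plausible timestamp is a good sign that the offset and the integer width were guessed correctly.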

7.7 WINAPI struct.


There are a lot of structures in the Windows API and therefore we need to be confident in our structure parsing skills. We can find these structures in NTDLL calls, where they represent strings (such as UNICODE_STRING) and other data (such as SYSTEM_INFO).
For this example we will take a look at the WINAPI call GetSystemInfo, which takes an LPSYSTEM_INFO (a pointer to a SYSTEM_INFO structure) as an argument. This is what a SYSTEM_INFO struct looks like:

    typedef struct _SYSTEM_INFO {
        union {
            DWORD dwOemId;
            struct {
                WORD wProcessorArchitecture;
                WORD wReserved;
            } DUMMYSTRUCTNAME;
        } DUMMYUNIONNAME;
        DWORD     dwPageSize;
        LPVOID    lpMinimumApplicationAddress;
        LPVOID    lpMaximumApplicationAddress;
        DWORD_PTR dwActiveProcessorMask;
        DWORD     dwNumberOfProcessors;
        DWORD     dwProcessorType;
        DWORD     dwAllocationGranularity;
        WORD      wProcessorLevel;
        WORD      wProcessorRevision;
    } SYSTEM_INFO, *LPSYSTEM_INFO;

Wow! Quite a complicated struct that we have here, right? Let’s first find the size of each member, especially the ones that can be troublesome such as LPVOID.
On a Windows 10 64-bit system, compiling for 32-bit under Visual C++, we get the following values:

    Type        Size (bytes)
    WORD        2
    DWORD       4
    DWORD_PTR   4
    LPVOID      4

We can check this is true by reading Process.pointerSize in an attached process:

    [Local::ConsoleApplication2.exe]-> Process.pointerSize
    4

Beware that these numbers will change if compiled on 64 bit:

    Type        Size (bytes)
    WORD        2
    DWORD       4
    DWORD_PTR   8
    LPVOID      8

Beware that compilers may add padding to align struct members, so ALWAYS be careful when calculating offsets.
Once we have these values, we can infer the offset for each member. Don’t be afraid of the union keyword, it won’t affect our calculations for the time being.
Getting all the values is out of the scope of this part, so we will get some of them as an example:

    dwPageSize
    lpMinimumApplicationAddress
    dwNumberOfProcessors

Complete offset list (32-bit build):

    typedef struct _SYSTEM_INFO {
        union {
            DWORD dwOemId;                         // offset: 0
            struct {
                WORD wProcessorArchitecture;
                WORD wReserved;
            } DUMMYSTRUCTNAME;
        } DUMMYUNIONNAME;
        DWORD     dwPageSize;                  // offset: 4
        LPVOID    lpMinimumApplicationAddress; // offset: 8
        LPVOID    lpMaximumApplicationAddress; // offset: 12
        DWORD_PTR dwActiveProcessorMask;       // offset: 16
        DWORD     dwNumberOfProcessors;        // offset: 20
        DWORD     dwProcessorType;             // offset: 24
        DWORD     dwAllocationGranularity;     // offset: 28
        WORD      wProcessorLevel;             // offset: 32
        WORD      wProcessorRevision;          // offset: 34
    } SYSTEM_INFO, *LPSYSTEM_INFO;

And this is the example program that we will be using to test our
guesses:

    #include <cstdio>
    #include <Windows.h>

    int main()
    {
        SYSTEM_INFO sysInfo;
        GetSystemInfo(&sysInfo);
        printf("%p", &sysInfo);
        getchar();
    }

Now that we have the complete offset list, we can get the values of dwPageSize, lpMinimumApplicationAddress, and dwNumberOfProcessors respectively, where sysInfoPtr is a NativePointer built from the address printed by the program:

    [Local::ConsoleApplication2.exe]-> sysInfoPtr.add(4).readInt()
    4096
    [Local::ConsoleApplication2.exe]-> sysInfoPtr.add(8).readInt()
    65536
    [Local::ConsoleApplication2.exe]-> sysInfoPtr.add(20).readInt()
    8
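Since these offsets depend on the pointer size, it can be useful to derive them for both architectures instead of hardcoding them. The following sketch is plain JavaScript and assumes natural alignment (each member aligned to its own size); systemInfoOffsets is a hypothetical helper, and its argument plays the role of Process.pointerSize.

```javascript
// Sketch: SYSTEM_INFO member offsets for 32-bit and 64-bit processes,
// assuming natural alignment (each member aligned to its own size).
function systemInfoOffsets(pointerSize) {
  const members = [
    ['dwOemId', 4], // the union occupies one DWORD
    ['dwPageSize', 4],
    ['lpMinimumApplicationAddress', pointerSize],
    ['lpMaximumApplicationAddress', pointerSize],
    ['dwActiveProcessorMask', pointerSize],
    ['dwNumberOfProcessors', 4],
    ['dwProcessorType', 4],
    ['dwAllocationGranularity', 4],
    ['wProcessorLevel', 2],
    ['wProcessorRevision', 2],
  ];
  const offsets = {};
  let offset = 0;
  for (const [name, size] of members) {
    offset = Math.ceil(offset / size) * size; // pad up to the member's alignment
    offsets[name] = offset;
    offset += size;
  }
  return offsets;
}

console.log(systemInfoOffsets(4).dwNumberOfProcessors);
console.log(systemInfoOffsets(8).dwNumberOfProcessors);
```

In a Frida script the result could be used as sysInfoPtr.add(systemInfoOffsets(Process.pointerSize).dwNumberOfProcessors).readInt(), so that the same code reads the right member in both 32-bit and 64-bit targets.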

7.8 Tips for calculating structure offsets
The hardest part of interacting with structures in Frida is calculating each offset, and that is usually what takes most of the time, but there are some tricks that can be used.
If the structure whose offsets we want to calculate is documented, such as the one used by GetSystemInfo, the values can be figured out by checking the type of each member and the architecture, then looking up what each type really means (DWORD means 4 bytes). It must always be taken into account that the size of pointer types changes based on the program architecture.
Instead of reading the source, another trick is to simply apply the sizeof operator to a data type to get its size:

    printf("%zu", sizeof(DWORD));

An alternative approach, which is limited to cases where there is access to the source, is leveraging clang’s record layout feature to get the complete offset calculation of a struct. For example, MSDN’s _stat API is defined as:

    int _stat(
        const char *path,
        struct _stat *buffer
    );

With clang, we can get the record layout with two steps:

    clang -E [-I] test.c > ptest.c

This generates a preprocessed file that can later be used with the -cc1 parameter:

    clang -cc1 -fdump-record-layouts ptest.c

This prints the offsets of each struct member:

With this information, if we are interested in obtaining the offset of the member st_size, the record layout shows that the offset should be 20 when compiled as a 64-bit application under clang.

In some cases, it is required to add an extra parameter, -fms-extensions, to enable support for __declspec attributes:

    clang -cc1 -fdump-record-layouts -fms-extensions ptest.c

7.9 CModule
The CModule API allows us to pass a string of C code and compile it to machine code in memory. It is important to note, however, that this feature compiles with tinycc¹ and thus is somewhat limited.
CModule is useful to implement functions that need to run as fast as possible. It is also useful to implement hot callbacks for Interceptor and Stalker, either to increase performance or to interact more easily with C objects and pointers.
CModule syntax:

    new CModule(source[, symbols])

source is the string containing C code and symbols is an object where it is possible to specify additional symbol names and their NativePointer values.
It is recommended to define:

    void init(void)
    void finalize(void)

as functions for initialisation and memory clean-up. We can also call the .dispose() method of a CModule object to unmap it eagerly, instead of waiting for it to be garbage collected or for the script to unload.

¹https://bellard.org/tcc/

const openImpl = Module.getExportByName(null, 'open');

Interceptor.attach(openImpl, new CModule(`
  #include <gum/guminterceptor.h>
  #include <stdio.h>

  void
  onEnter (GumInvocationContext * ic)
  {
    const char *path;
    path = gum_invocation_context_get_nth_argument (ic, 0);
    printf ("open() path=\\"%s\\"\\n", path);
  }
`));

In this example we instrument the open() function using CModule,
replacing our JavaScript callbacks with C code.
We need to include the frida-gum interceptor header and the standard
I/O header. void onEnter(GumInvocationContext *ic) is a function that
gum recognizes and to which it passes the InvocationContext (the
information available when the function has been called but has not
executed yet).
With this InvocationContext we can call
gum_invocation_context_get_nth_argument(ic, N), where N is 0 in this
case, to get the first argument. We can then print the value with
<stdio.h>'s printf function.

This however defeats the purpose of writing instrumentation code in
JavaScript, so use it when you really need the performance or for more
complex tasks.

7.9.1 CModule: A practical use case


In this example we are going to work with the UNIX library
<sys/time.h> with the same struct as we have seen before (timeval).

In this use case our aim is to read the timeval structure with ease.
However, as mentioned before, tinycc does not give us access to
libraries outside of its built-in headers. It is possible to pass
{toolchain: 'external'} to CModule as an argument so that it can work
with system libraries, but note that at the time of writing this is
only supported on MacOS and (some) Linux systems. I tested this under
MacOS 11.1 and Debian 10.
This is a 'hidden' argument (in the sense that there is not much
documentation; you would have to read the test cases to know it
exists), but it is very useful. It leaves us with the following syntax
when creating CModule objects:

new CModule(`c_code_goes_here`, symbols, {toolchain: 'external|internal|any'});

Alright, so we have the same program as in the SYSCALL structure part.
We are going to use that as a base program to test this feature out.
What we want to achieve with this example:

• Replicating the onEnter behaviour
• Saving invocation state between callbacks (that is, sharing
arguments, thread state, etc.)
• Printing the first member of the struct, tv_sec, in the onLeave
callback.

This example uses void * arg, which is not recommended (gpointer
should be used instead), but I think this is a more familiar use case.

#include <gum/guminterceptor.h>
#include <stdio.h>
#include <sys/time.h>

typedef struct _IcState IcState;
struct _IcState
{
  void * arg;
};

void onEnter(GumInvocationContext * ic)
{
  IcState * is = GUM_IC_GET_INVOCATION_DATA(ic, IcState);
  is->arg = gum_invocation_context_get_nth_argument(ic, 0);
  printf("%p\\n", is->arg);
}

void onLeave(GumInvocationContext * ic)
{
  IcState * is = GUM_IC_GET_INVOCATION_DATA(ic, IcState);
  printf("%p\\n", is->arg);
  struct timeval * t = (struct timeval *) is->arg;
  /* One conversion specifier per argument. */
  printf("timeval: %ld\\nusec: %ld\\n", (long) t->tv_sec, (long) t->tv_usec);
}

Notes: This way of coding is a bit different from standard programs
because we want to operate with callbacks.
We write \n with two backslashes because this C source lives inside a
JavaScript template literal.
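The escaping rule is easy to check in plain JavaScript, without Frida. The snippet below is only an illustration: the variable names are made up, and the strings stand in for the C source passed to CModule. It compares what the C compiler would receive with a doubled backslash, a single backslash, and the standard String.raw tag.

```javascript
// C source embedded in JavaScript template literals.
const doubled = `printf("%p\\n", ptr);`;        // JS turns \\ into one backslash
const single = `printf("%p\n", ptr);`;          // JS turns \n into a real newline
const raw = String.raw`printf("%p\n", ptr);`;   // String.raw leaves escapes alone

console.log(doubled);                 // what the C compiler sees: \ followed by n
console.log(JSON.stringify(single));  // shows the embedded newline character
console.log(doubled.includes('\n'));  // false: no newline character in the source
console.log(single.includes('\n'));   // true: the C string literal is now broken
console.log(raw === doubled);         // true
```

With the single backslash the C string literal ends up split across two lines, which a C compiler rejects. String.raw is an alternative when a listing contains many escapes.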
First we include <gum/guminterceptor.h> so that we are able to access
the onEnter and onLeave callbacks.
We also need to store the InvocationState between callbacks, so we
are creating a struct named IcState that stores a single member:
void *arg;

Now we have the onEnter callback in C:
void onEnter(GumInvocationContext * ic)

With this context we are able to use auxiliary functions of the gum
API such as GUM_IC_GET_INVOCATION_DATA, which we use for initializing
the struct, and gum_invocation_context_get_nth_argument to get the
argument.
We store the first argument in the IcState struct:
is->arg = gum_invocation_context_get_nth_argument(ic, 0);

and then we are able to use it in our onLeave callback, but first we
need to cast the argument so that we are able to use the struct:
struct timeval * t = (struct timeval *) is->arg;
And then we are able to access the timeval struct members with
t->tv_sec and t->tv_usec.

And this should be the expected output:

[Local::a.out]-> %resume
[Local::a.out]-> cmodule struct pointer: 0x7ffd8e826e00
Myprogram struct pointer 0x7ffd8e826e00
cmodule timeval: 1612343654
cmodule usec: 263111
myprogram seconds : 1612343654
myprogram micro seconds : 263111

7.9.2 CModule: Reading return values


It is possible to read the return value of an instrumented function
from CModule. We will see now a brief example on how to do it:

void
onLeave(GumInvocationContext * ic)
{
  int retval;
  retval = (int) gum_invocation_context_get_return_value(ic);

  printf("=> return value=%d\\n", retval);
}

This example assumes that the return value is an integer, but there
is a cleaner way to solve this:

void
onLeave(GumInvocationContext * ic)
{
  const int retval = GPOINTER_TO_INT(gum_invocation_context_get_return_value(ic));

  printf("=> return value=%d\\n", retval);
}

See GPOINTER_TO_INT? What gum_invocation_context_get_return_value
returns is not typed as the return value itself but as a pointer-sized
value (which is why we always get NativePointers when working in JS),
and we need to convert it to an integer, be it with an (int) cast or
with the GPOINTER_TO_INT macro. Both produce the same result, but the
macro stays in sync with frida-gum's API.
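To make the conversion more concrete, the following sketch mimics in plain JavaScript what narrowing a 64-bit pointer-sized value to a 32-bit signed int does. It is an illustration only: pointerToInt is a hypothetical helper, not part of frida-gum, and the raw values are made-up examples of what a return register might hold.

```javascript
// Mimics narrowing a 64-bit pointer-sized value to a 32-bit signed int.
// Plain JavaScript illustration; this is not frida-gum code.
function pointerToInt(rawValue) {
  return Number(BigInt.asIntN(32, rawValue));
}

console.log(pointerToInt(0x0000000000000003n)); // 3, e.g. a small file descriptor
console.log(pointerToInt(0xffffffffffffffffn)); // -1, e.g. a failed open()
console.log(pointerToInt(0x00000000ffffffffn)); // -1, only the low 32 bits matter
```

This is also why printing the raw pointer in JS can show a large hexadecimal number where the C program sees -1.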

7.9.3 CModule vs JavaScript agent performance
Once the instrumentation agent is written, there are situations where
the agent reaches a practical state but is still too slow due to JavaScript
VM exits or simply heavy workloads (networking, memory, file
operations…). For these scenarios, it is important to take into account
the performance upgrade of CModule.
To test how performant CModule is against the same instrumentation
script in JavaScript let’s use the following C program:

#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>

double local_sqrt(double a) {
    return sqrt(a);
}

int main() {
    clock_t t;
    t = clock();
    for (int i = 0; i < 100000; i++) {
        local_sqrt((double)i);
    }
    t = clock() - t;
    double total_time = (double)t / CLOCKS_PER_SEC;
    printf("Time ellapsed: %f", total_time);

    return 0;
}

Compiled with $ clang main.c -lm

This program just takes a number from the for iteration and cal-
culates its square root. When executed without instrumentation it
takes 0.002 seconds to complete.
Now, to test how instrumentation affects performance the following
instrumentation script is used:

const localSqrtPtr = ptr(0x401140);

Interceptor.attach(localSqrtPtr, {
  onLeave: function (retval) {
    console.log(retval);
  }
});

This script simply instruments the double local_sqrt(double) function
and prints the return value on screen. When executing this script
using the QuickJS runtime, it takes 5.8 seconds to complete. With V8
as the JavaScript runtime it takes 5.321 seconds.
On the other hand, let's see what happens when CModule is used for
the same purpose:

const localSqrtPtr = ptr(0x401140);

Interceptor.attach(localSqrtPtr, new CModule(`
  #include <stdio.h>
  #include <gum/guminterceptor.h>

  void onLeave(GumInvocationContext * ic)
  {
    double fd;
    fd = (double) gum_invocation_context_get_return_value(ic);
    printf("cmodule: %.2lf\\n", fd);
  }
`));

When executed using CModule in the instrumentation script and
printing every return value, it takes 1.3 seconds to complete.
The difference is very noticeable. When writing instrumentation
scripts it is therefore recommended to first write all the logic in
JavaScript and see how the instrumented target performs. If this
performance is sufficient for the task there is no need for further
optimization, but in case more performance is needed CModule opens up
a new world of optimization.
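To put the measurements quoted in this section into perspective, the rough arithmetic below converts them into an approximate cost per instrumented call. It is a back-of-the-envelope sketch that assumes the whole difference from the uninstrumented run is instrumentation overhead spread evenly over the 100000 calls. The variable and function names are made up for illustration.

```javascript
// Timings (in seconds) quoted in this section for 100000 calls.
const CALLS = 100000;
const timings = { baseline: 0.002, quickjs: 5.8, v8: 5.321, cmodule: 1.3 };

// Approximate overhead per call in microseconds, assuming everything
// above the uninstrumented baseline is instrumentation cost.
function perCallOverheadMicros(totalSeconds) {
  return ((totalSeconds - timings.baseline) / CALLS) * 1e6;
}

for (const name of ['quickjs', 'v8', 'cmodule']) {
  const micros = perCallOverheadMicros(timings[name]).toFixed(1);
  console.log(`${name}: ~${micros} microseconds per call`);
}

const speedup = timings.quickjs / timings.cmodule;
console.log(`CModule vs QuickJS: ~${speedup.toFixed(1)}x faster`);
```

Numbers like these depend heavily on the machine and on what the callback does (here, printing on every call), so they are best read as orders of magnitude.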

It is important to notice that the slowdown caused by VM exits is
paid for every transition. This means that whenever an onEnter or
onLeave callback finishes and the instrumentation script accesses a
variable from CModule, that slowdown is paid and it limits the
performance gains of CModule. In the previously shown example all the
functionality is contained within CModule (it is not returning values
to the JavaScript side) and thus the performance gain is significant.

7.9.4 CModule: Sharing state between JS and C
When instrumenting some binaries we might eventually come across
calls that are a hotspot: they are called so many times per second
that they pay a high performance toll for being instrumented from the
JS side of Frida. Frida allows us to instrument such code with CModule
and to access the values of the instrumented function from JS only
when needed, reducing the toll on the instrumented binary while still
allowing the user to keep their JS instrumentation code.
To do this we will use the previous program that repeatedly calls the
sqrt function and share the return value with our JS code. To prepare
for this scenario, the first step in our JS code is to allocate a
buffer to share with our CModule. It must be large enough to hold a
double, which is 8 bytes:
const sqrtReturnPtr = Memory.alloc(8);

This creates a NativePointer that is going to be shared with our
CModule code. Our CModule code then looks like this:

const myCm = new CModule(`
  #include <gum/guminterceptor.h>
  extern double sqrtReturnPtr;
  void onLeave(GumInvocationContext * ic)
  {
    double result;
    result = (double) gum_invocation_context_get_return_value(ic);
    sqrtReturnPtr = result;
  }
`, {sqrtReturnPtr});

The C code declares an extern variable which is shared between our JS
instrumentation code and our C code. Our JS code sees this variable
as a NativePointer and only pays the performance price when accessing
it.
To test this out, we are going to increase the size of the for loop to
ensure that the application takes longer to finish and call setTimeout
to get the current return value after 2 seconds:

Interceptor.attach(localSqrtPtr, myCm);

setTimeout(() => {
  console.log("sqrt value after 2 seconds: " + sqrtReturnPtr.readDouble());
}, 2000);

sqrtReturnPtr is a pointer shared between our CModule and our JS
code, so in order to obtain the real value we need to call the
.readDouble() API. The same goes for other data types: int, char[],
float…
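The idea of one side writing into a shared buffer and the other side reading it on demand can be mimicked in plain JavaScript. The sketch below uses a DataView as a stand-in for the native buffer and involves no Frida API. Note that a double occupies 8 bytes, so the buffer needs at least that much room. A DataView reports an undersized buffer as an error, whereas native memory would not.

```javascript
// Plain JavaScript stand-in for the shared buffer: no Frida APIs are used.
const shared = new DataView(new ArrayBuffer(8)); // a double needs 8 bytes

// "C side": store the latest result in the shared memory.
shared.setFloat64(0, Math.sqrt(2), true);

// "JS side": read it back only when it is actually needed.
const value = shared.getFloat64(0, true);
console.log(value);

// A 4-byte buffer is too small to hold a double.
let tooSmallError = null;
try {
  new DataView(new ArrayBuffer(4)).setFloat64(0, 1.5, true);
} catch (e) {
  tooSmallError = e;
}
console.log(tooSmallError instanceof RangeError);
```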

Finally, when instrumenting the aforementioned application this is
the output obtained:
[Local::a.out ]-> sqrt value after 2 seconds: 8915179
State can also be shared from the onEnter callback, or when using
NativeFunctions that interact with CModule's code. Use it wisely!

7.10 Sharing state between two CModule objects
The previous example showed how to share state/variables between
our JS code and the C code, but what if there are two different
functions to instrument that need to share their state? This can be
done by using the second parameter of the CModule constructor, as
seen before. For this example the same code as in the previous section
is reused. This time however, an extra function is added:

const cmFunction = new CModule(`
  #include <stdio.h>

  extern double sqrtReturnPtr;

  void printCurrentValue()
  {
    /* sqrtReturnPtr is a double, so it needs %f rather than %d. */
    printf("sqrt current value: %f", sqrtReturnPtr);
  }`, {sqrtReturnPtr});

This code exposes the function void printCurrentValue(), which prints
the current shared value of sqrtReturnPtr. However, to be able to call
this function from our JS code a NativeFunction is required:

const printCurrentValue = new NativeFunction(cmFunction.printCurrentValue, 'void', []);

cmFunction.printCurrentValue returns the pointer to the function,
and the NativeFunction constructor replicates its definition,
returning a function callable from JS. The aforementioned code that
calls setTimeout can then be replaced with a call to our CModule
function:

setTimeout(() => {
  printCurrentValue();
}, 1000);

And then when instrumenting our application it shows the following
output:
sqrt current value: 861554496

7.10.1 Notifying from C code


Another use case when using CModule might be allowing the C code
to work on its own and only report feedback to JS when needed. This
can be done by passing a NativeCallback when creating a CModule
and calling this function from C which triggers the NativeCallback
on the JS side.

To illustrate this, the previous square root example is going to be
reused with the purpose of notifying the JS code only when the result
of the square root operation modulo 10000 is 0. The for loop now
iterates up to 100000, so the notification should only arrive 10 times
in total (once for each multiple of 10000, including 0). The first step
is adding to the CModule code an extern declaration of the function
that will notify the JS code:
extern void notify_from_c(const double * value);

This function can now be called from the CModule side like this:
notify_from_c(&value);. The next step is adding, on the JS side, a
callback to the CModule symbols that receives the value from the
notify_from_c function and acts on it. This is done by expanding
the symbols argument in the CModule constructor:

const cm = new CModule(`/* code goes here */`, {
  sqrtReturnPtr,
  notify_from_c: new NativeCallback(notifyPtr => {
    const notifyValue = notifyPtr.readDouble();
    console.log('cmodule notify_from_c:' + notifyValue);
  }, 'void', ['pointer'])
});

With this set, the onLeave callback in our CModule will call the
notify_from_c function whenever the square root value modulo 10000
is zero:

const cm = new CModule(`
  #include <gum/guminterceptor.h>

  extern double sqrtReturnPtr;
  extern void notify_from_c(const double * value);

  void onLeave(GumInvocationContext * ic)
  {
    double result;
    result = (double) gum_invocation_context_get_return_value(ic);
    sqrtReturnPtr = result;
    if ((int) sqrtReturnPtr % 10000 == 0)
    {
      notify_from_c(&sqrtReturnPtr);
    }
  }
`, {
  sqrtReturnPtr,
  notify_from_c: new NativeCallback(notifyPtr => {
    const notifyValue = notifyPtr.readDouble();
    console.log('notification from C code, value: ' + notifyValue);
  }, 'void', ['pointer'])
});

When executing this script against the target program we get the
following output:

[Local::a.out ]-> notification from C code, value: 0
notification from C code, value: 10000
notification from C code, value: 20000
notification from C code, value: 30000
notification from C code, value: 40000
notification from C code, value: 50000
notification from C code, value: 60000
notification from C code, value: 70000
notification from C code, value: 80000
notification from C code, value: 90000
Time ellapsed: 0.024194

These notifications are useful whenever the C code can do most of the
work on its own and only needs to send data back to JS occasionally,
or when JS only needs to act on the data it receives.
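The pattern itself (do the hot work on one side and cross the boundary only when a condition holds) can be sketched in plain JavaScript. In this stand-in the loop plays the role of the C code and the notify argument plays the role of the NativeCallback. runWorker is a hypothetical name, and the loop assumes the observed values run from 0 to 99999, as the output above suggests.

```javascript
// Plain JavaScript stand-in for the pattern: the loop plays the role of
// the C code and notify plays the role of the NativeCallback.
function runWorker(limit, notify) {
  let calls = 0;
  for (let value = 0; value < limit; value++) {
    calls++;
    if (value % 10000 === 0) {
      notify(value); // the only point where the "JS side" gets involved
    }
  }
  return calls;
}

const received = [];
const totalCalls = runWorker(100000, value => received.push(value));
console.log(`${totalCalls} calls, ${received.length} notifications`);
```

Out of 100000 iterations only a handful cross the boundary, which is where the saving comes from.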

7.11 CModule boilerplates


Until now we have only been exposed to the GUM APIs shown in the
examples. This is because, when this was written, CModule code still
had to be written by checking types and functions against the source
code. However, since Frida 14.2.12 it is possible to generate a
boilerplate with the most commonly used methods:
frida-create cmodule|agent

• agent: creates a boilerplate of a TypeScript agent.
• cmodule: creates a boilerplate of a CModule, which includes all
the built-in headers so that an external toolchain can be used.
This adds support for code completion in your editor of choice.

Be careful, because this command creates the boilerplate in the
current working directory!
Once you execute frida-create cmodule, this is what you should get
in a boilerplate:
frida-create cmodule output.

$ ls
include/ meson.build test.c
$ ls include/
capstone.h glib.h gum/ json-glib/ platform.h x86.h
$ ls include/gum/
arch-x86/ guminterceptor.h gummetalarray.h gummodulemap.h gumspinlock.h
gumdefs.h gummemory.h gummetalhash.h gumprocess.h gumstalker.h

And the .c file should look like this:

CModule boilerplate.

#include <gum/guminterceptor.h>

static void frida_log (const char * format, ...);
extern void _frida_log (const gchar * message);

void
init (void)
{
  frida_log ("init()");
}

void
finalize (void)
{
  frida_log ("finalize()");
}

void
on_enter (GumInvocationContext * ic)
{
  gpointer arg0;

  arg0 = gum_invocation_context_get_nth_argument (ic, 0);

  frida_log ("on_enter() arg0=%p", arg0);
}

void
on_leave (GumInvocationContext * ic)
{
  gpointer retval;

  retval = gum_invocation_context_get_return_value (ic);

  frida_log ("on_leave() retval=%p", retval);
}

static void
frida_log (const char * format,
           ...)
{
  gchar * message;
  va_list args;

  va_start (args, format);
  message = g_strdup_vprintf (format, args);
  va_end (args);

  _frida_log (message);

  g_free (message);
}

We now have the basic methods in place and can extend or modify them
to suit our needs. We also have access to the GumInvocationContext
members and type-checking.
To build the CModule, the following commands are required:
$ meson build && ninja -C build

With the cmodule.so file generated, it can be injected into our target
process via:
frida -C cmodule.so <PID>

7.12 Stalker
Stalker is a code tracing engine which allows following threads and
capturing every function, block and instruction being called.
Explaining how a code tracer works is out of the scope of this book;
however, if you are interested you can read about the anatomy of a
code tracer².
It is possible to run Stalker directly from C (via frida-gum) but we
will focus on using it from JS. This is the basic syntax of Stalker (to
follow what is happening on a thread):
Stalker.follow([threadId, options])

Where threadId is the id of the thread we want to follow and options
enables the events to trace:

events: {
  call: true,     // CALL instructions: yes please
  ret: false,     // RET instructions
  exec: false,    // all instructions
  block: false,   // block executed: coarse execution trace
  compile: false  // block compiled: useful for coverage
}

Only use the exec option when you are sure you need it, because it
has a huge impact on performance and produces a lot of data for Frida
to digest.
²https://medium.com/@oleavr/anatomy-of-a-code-tracer-b081aadb0df8

7.12.1 Getting a thread id


As we have seen before, we need a thread identifier to use Stalker.
There are two ways to get one.
The first is obtaining the process' thread list via
Process.enumerateThreadsSync(), which returns a list of threads:

[
  {
    "context": {
      "pc": "0x113341568",
      "r10": "0x10f363000",
      "r11": "0x246",
      "r12": "0x10f363578",
      "r13": "0x0",
      "r14": "0x1133e3298",
      "r15": "0x1133eb070",
      "r8": "0x31",
      "r9": "0x0",
      "rax": "0x1133c7132",
      "rbp": "0x7ffee089c8d0",
      "rbx": "0x3722d28603514",
      "rcx": "0x10f363000",
      "rdi": "0x1133e46e0",
      "rdx": "0x0",
      "rip": "0x113341568",
      "rsi": "0x4",
      "rsp": "0x7ffee089bab8",
      "sp": "0x7ffee089bab8"
    },
    "id": 1031,
    "state": "waiting"
  }
]
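Once we have such a list, picking an identifier is ordinary array handling. The sketch below works on a trimmed-down mock of the structure shown above so that it can run outside Frida. In a real script the array would come from Process.enumerateThreadsSync(). The second thread entry and both helper names are made up for illustration.

```javascript
// Trimmed-down mock of the thread list; the second entry is made up.
const threads = [
  { id: 1031, state: 'waiting', context: { pc: '0x113341568' } },
  { id: 2050, state: 'running', context: { pc: '0x113341a00' } },
];

// Hypothetical helper: id of the first thread, or null for an empty list.
function firstThreadId(threadList) {
  return threadList.length > 0 ? threadList[0].id : null;
}

// Hypothetical helper: ids of all threads in a given state.
function threadIdsInState(threadList, state) {
  return threadList.filter(t => t.state === state).map(t => t.id);
}

console.log(firstThreadId(threads));
console.log(threadIdsInState(threads, 'waiting'));
```

The resulting id is what would be passed as the first argument of Stalker.follow.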

The second is from within an instrumented function, using
this.threadId:

Interceptor.attach(myInstrumentedFunction, {
  onEnter (args) {
    Stalker.follow(this.threadId, {
      // ...
    });
    // ...
  },
  onLeave (retval) {
    Stalker.unfollow(this.threadId);
  }
});

In case you want to follow the thread where an instrumented function
is called, the second method is the preferred one.

7.12.2 Stalker: Tracing from a known function call
Now, we will see the Stalker engine in action. For this example, we
will use a basic program that tries to open a file:

#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

int
main(int argc, char *argv[])
{
    pause();
    int fd;
    fd = open("code.dat", O_RDONLY);
    if (fd == -1)
    {
        fprintf(stderr, "file not found\n");
    }

    return 0;
}

Once we compile it, we will open it in Frida's REPL and check its
exports:

[Local::a.out]-> Module.enumerateExportsSync("a.out")
[
  {
    "address": "0x10bfa8000",
    "name": "_mh_execute_header",
    "type": "variable"
  },
  {
    "address": "0x10bfabf00",
    "name": "main",
    "type": "function"
  }
]

As we can see, the main function is exported and it is a good enough
entrypoint for our code tracing to begin:

let mainPtr = Module.getExportByName(null, "main");
Interceptor.attach(mainPtr, {
  onEnter (args) {
    Stalker.follow(this.threadId, {
      events: {
        call: true,
        ret: false,
        exec: false,
        block: false,
        compile: false,
      },
      onReceive: function (events) {
        var calls = Stalker.parse(events, {
          annotate: true,
        });
        for (var i = 0; i < calls.length; i++) {
          let call = calls[i];
          console.log(call[2]);
        }
      },
      onCallSummary: function (summary) {
        console.log(JSON.stringify(summary, null, 4));
      }
    });
  },

  onLeave(retval) {
    Stalker.unfollow(this.threadId);
  }
});

Now let’s break down this script before we execute it:

onEnter (args) {
  Stalker.follow(this.threadId, {
    events: {
      call: true,
      ret: false,
      exec: false,
      block: false,
      compile: false,
    },

Stalker.follow is invoked with the thread id on every call to main();
in this program, however, main() is only called once.

onReceive: function (events) {
  var calls = Stalker.parse(events, {
    annotate: true,
  });
  for (var i = 0; i < calls.length; i++) {
    let call = calls[i];
    console.log(call[2]);
  }
},

The onReceive callback receives every event collected. We can parse
the events using the built-in Stalker.parse method to get a list of
events, each of which includes the event type, the caller, and the
callee. We can use this to our advantage and print the callee, which
is at call[2].

onCallSummary: function (summary) {
  console.log(JSON.stringify(summary, null, 4));
}

The onCallSummary callback receives a summary of what has been called
throughout the lifetime of the Stalker object and how many times each
target was called. We can pretty-print this via console.log.
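To clarify how the individual call events relate to the summary, the following plain JavaScript sketch builds a per-callee counter from a list of events. The events are made up and use the [type, caller, callee] shape described above. summarize is a hypothetical helper and not a Frida API, since Stalker computes its own summary internally.

```javascript
// Made-up events in the shape [type, caller, callee].
const events = [
  ['call', '0x1000', '0x2000'],
  ['call', '0x1010', '0x3000'],
  ['call', '0x2040', '0x3000'],
  ['call', '0x1020', '0x2000'],
  ['call', '0x3008', '0x3000'],
];

// Hypothetical helper: counts how many times each callee was called.
function summarize(callEvents) {
  const summary = {};
  for (const [type, , callee] of callEvents) {
    if (type !== 'call') continue; // ignore other event types
    summary[callee] = (summary[callee] || 0) + 1;
  }
  return summary;
}

const summary = summarize(events);
console.log(JSON.stringify(summary, null, 4));
```

The result maps each target address to a call count, which is the kind of object the script above pretty-prints.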
Finally, we will run the stalker script:
frida -l stalker.js -f a.out --no-pause

And we will get output displaying each called address and the total
number of times it was called (for illustration purposes, only the
summary is properly displayed).

7.12.3 Tracing instructions


In our previous example we were only tracing call instructions to
obtain the address of the function being called. Using the transform
callback it is also possible to log all the instructions we are
tracing.
In this example we will only trace instructions, without any further
processing, using the previous example program (you can use whichever
program you want to try it on).
Inside our previous Stalker code, we will insert a new callback
named transform, whose argument, iterator, lets us walk the
instructions:

transform: function (iterator) {
  let instruction = iterator.next();
  do {
    console.log(instruction);
    iterator.keep();
  } while ( (instruction = iterator.next()) !== null );
}

Each item of the iterator contains the instruction(s) executed, so we
can keep iterating until none are left. This instruction object is
special since it contains not only the instruction itself but also
members such as .address and .mnemonic. We will see more detailed
examples covering these members.
For now, our console.log(instruction) is only logging the complete
instruction being traced, and our output will be this:

push rbp
mov rbp, rsp
push r15
push r14
push r13
push r12
push rbx
sub rsp, 0xb8
mov qword ptr [rbp - 0xd8], r9
mov r12, r8
mov r13, rcx
mov rbx, rdx
mov r15, rsi
mov r14, rdi
lea rax, [rip + 0x6870031f]
mov rax, qword ptr [rax]
mov qword ptr [rbp - 0x30], rax
movsx eax, word ptr [rdx + 0x10]
test al, 8
je 0x7fff2043972d
bt eax, 9
jb 0x7fff2043974e
cmp qword ptr [rbx + 0x18], 0
This way we get a trace of all the instructions being executed in
real time. It is also possible to filter by a certain mnemonic:

transform: function (iterator) {
  let instruction = iterator.next();
  do {
    if (instruction.mnemonic === 'jne') {
      console.log(instruction);
    }
    iterator.keep();
  } while ( (instruction = iterator.next()) !== null );
}

This checks every mnemonic and prints only the JNE ones:

jne 0x7fff20409785
jne 0x7fff20409753
jne 0x7fff20410fe3
jne 0x7fff2040c0e0
jne 0x7fff20416095
jne 0x7fff204399fc
jne 0x7fff204f9259
jne 0x7fff204f93f7
jne 0x7fff204f9375
jne 0x7fff204f93e1
jne 0x7fff204f9407
jne 0x7fff204f937e
jne 0x7fff2041007b

7.12.4 Getting RET addresses


It is possible to use the putCallout method to safely capture the
context values of an instruction at the time of its execution.
putCallout takes a callback, which receives a context object with
access to the CPU registers and the ability to read/modify them.
In this example, we can take advantage of putCallout to log the
addresses that a RET instruction returns to:

let statPtr = Module.getExportByName(null, "main");
Interceptor.attach(statPtr, {
  onEnter(args) {
    // The options object was missing its opening brace in the listing.
    Stalker.follow(this.threadId, {
      transform: function (iterator) {
        let instruction = iterator.next();
        do {
          if (instruction.mnemonic == 'ret') {
            iterator.putCallout(printRet);
          }
          iterator.keep();
        } while ((instruction = iterator.next()) !== null);
      },
    });
  },

  onLeave(retval) {
    Stalker.unfollow(this.threadId);
  }
});

function printRet(context) {
  console.log('RET @ ' + context.pc);
}

And returns:

RET @ 0x10ac8ee8a
RET @ 0x10ac8ee8a
RET @ 0x10ac8ee8a
RET @ 0x10ac8ee8a
RET @ 0x10ac8ee8a
RET @ 0x10ac8ee8a
RET @ 0x10ac8901e
8. MacOS
Although I am not yet very familiar with MacOS and Swift, I thought
it was interesting to write about using Frida with MacOS (and this
translates into knowledge for iOS) with some example applications.
It is important to notice that working with MacOS and iOS apps is
a bit different from what we have seen until now, in the sense that
ObjC classes and methods syntax are different.

8.1 ObjC
The ObjC object allows us to access a variety of useful information:

• ObjC.available tells us if there is an ObjC runtime loaded. Be
sure to check that it is indeed available before using any other
property or method of ObjC.*.
• ObjC.classes is an object that maps class names to ObjC.Object
bindings:

ObjC.classes.NSString.stringWithString_
Which we can use: ObjC.classes.NSString.stringWithString_("foobar");
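Note the trailing underscore in stringWithString_: on the JavaScript side, every colon of an Objective-C selector is written as an underscore. The helper below, selectorToJsName, is a hypothetical illustration of that naming rule in plain JavaScript and not a Frida API.

```javascript
// Hypothetical helper: maps an Objective-C selector to the property name
// used on the JavaScript side, where each ':' becomes '_'.
function selectorToJsName(selector) {
  return selector.replace(/:/g, '_');
}

console.log(selectorToJsName('stringWithString:'));             // stringWithString_
console.log(selectorToJsName('initWithString:relativeToURL:')); // initWithString_relativeToURL_
console.log(selectorToJsName('string'));                        // string
```

Selectors without arguments have no colon and therefore keep their name unchanged.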

It is also possible to list all methods loaded by a class by calling
ObjC.classes.NSString.$ownMethods. Here is an example output
limited to the first 10 results (due to format restrictions):

ObjC.classes.NSString.$ownMethods.slice(0, 10)
[
  "+ NSStringFromLSInstallPhase:",
  "+ NSStringFromLSInstallState:",
  "+ NSStringFromLSInstallType:",
  "+ stringWithUTF8String:",
  "+ stringWithFormat:",
  "+ string",
  "+ allocWithZone:",
  "+ initialize",
  "+ supportsSecureCoding",
  "+ stringWithCharacters:length:"
]

$ownMethods is limited to the object's class. In case you want to
include the parent class's results you can use $methods instead.
We can obtain the module name of a class using $moduleName:

[Local::objCLI]-> ObjC.classes.NSString.$moduleName
"/System/Library/Frameworks/Foundation.framework/Versions/C/Foundation"

8.2 Intercepting NSURL InitWithString


We will now begin to learn how to instrument MacOS apps with a simple
use case, a Swift program that simply queries stackoverflow.com:

let url = URL(string: "http://www.stackoverflow.com")!
print(url);
let task = URLSession.shared.dataTask(with: url) {(data, response, error) in

    if error != nil || data == nil {
        print("Client error!")
        return
    }
    guard let response = response as? HTTPURLResponse, (200...299).contains(response.statusCode) else {
        print("Server error!")
        return
    }
    print("The Response is : ",response)
    print(data);
}

task.resume()

Our first option here is using frida-trace to see which NSURL* classes
and methods are being called, the syntax for this however is different
from what we have seen until now.
frida-trace -f swiftApp -m "-[NSURL **]"

Which will in turn create a .js file for each handler it has detected and
print us something like this:

Started tracing 1987 functions. Press Ctrl+C to stop.
/* TID 0x407 */
1007 ms  -[NSURL initWithString:0x7fa35550bb40]
1008 ms     | -[NSURL initWithString:0x7fa35550bb40 relativeToURL:0x0]
1008 ms  -[NSURL isFileReferenceURL]
1008 ms     | -[NSURL _cfurl]
1008 ms  -[NSURL retain]
1009 ms  -[NSURL retain]
1009 ms  -[NSURL retain]
1009 ms  -[NSURL release]
1009 ms  -[NSURL retain]
1009 ms  -[NSURL release]
1009 ms  -[NSURL retain]
1010 ms  -[NSURL release]
1010 ms  -[NSURL retain]
1010 ms  -[NSURL description]
1010 ms     | -[NSURL scheme]
1010 ms     |    | -[NSURL _cfurl]
1010 ms     | -[NSURL baseURL]
1010 ms     |    | -[NSURL _cfurl]
1010 ms     | -[NSURL relativeString]
1010 ms     |    | -[NSURL _cfurl]
1010 ms  -[NSURL release]
1010 ms  -[NSURL release]
1010 ms  -[NSURL release]
1010 ms  -[NSURLSessionConfiguration setDisposition:0x7fff88b58508]

If we check the beginning of the frida-trace output, we can see
something that is of interest to us:

/* TID 0x407 */
1003 ms  -[NSURL initWithString:0x7fa35550bb40]
1008 ms     | -[NSURL initWithString:0x7fa35550bb40 relativeToURL:0x0]
1008 ms  -[NSURL isFileReferenceURL]
1008 ms     | -[NSURL _cfurl]
1008 ms  -[NSURL retain]

-[NSURL initWithString:] is being called at the beginning to initialize
the URL object. This is the call that receives the string we are
sending to it. Now let's go ahead and open the file at
__handlers__/NSURL/initWithString_.js.

frida-trace has generated a stub file for us to fill in, but
something catches our attention here: Frida's stub prints
${args[2]}. Why is that?
When we intercept ObjC methods, we need to take into account that
the args[] array does not contain elements the same way it would for
Windows or Linux binaries. Instead, args[0] holds self, args[1] holds
the selector, and the n-th argument (counting from 1) is at
args[2 + (n - 1)].
This means we have to work with args[2] to get the first argument,
and we use this address to create an ObjC.Object.
Note: In case the above formula is not clear, if a function/method
has 2 arguments, the address of the second argument will be placed at
args[3] in the args[] array (instead of the usual args[1]). We will
have an example of this later on.
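The index rule can be written down as a tiny helper. This is only a sketch of the formula in plain JavaScript: objcArgIndex is a hypothetical name, and the args array is a mock with placeholder strings standing in for the NativePointers a real callback would receive.

```javascript
// Hypothetical helper: index into args[] of the n-th method argument
// (n counts from 1), skipping self and the selector.
function objcArgIndex(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError('argument positions start at 1');
  }
  return 2 + (n - 1);
}

// Mock args[] for -[NSURL initWithString:relativeToURL:]; placeholder values.
const args = ['<self>', '<selector>', '0x7fa35550bb40', '0x0'];

console.log(args[objcArgIndex(1)]); // first argument, the string
console.log(args[objcArgIndex(2)]); // second argument, the base URL
```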
We can now fill the stub using ObjC.Object and the address provided
by args[2] as follows:

onEnter(log, args, state) {
  log(`-[NSURL initWithString:${args[2]}]`);
  const myString = new ObjC.Object(args[2]);
  log(myString.toString());
},

We create myString, an ObjC.Object that wraps the address in args[2], and render it with the .toString() method to get a readable representation. The result is as follows:

/* TID 0x407 */
1003 ms -[NSURL initWithString:0x7fa35550bb40]
1003 ms http://www.stackoverflow.com
1008 ms | -[NSURL initWithString:0x7fa35550bb40 relativeToURL:0x0]
1008 ms -[NSURL isFileReferenceURL]
1008 ms | -[NSURL _cfurl]

As we can see, when -[NSURL initWithString:] is called we are able to print the string used to initialize the object.

8.3 Obj-C: Intercepting fileExistsAtPath
Now we will directly instrument an Objective-C application (so no Swift layer). This is a simple application that just checks whether a file exists using the NSFileManager API:

#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        NSString *filepath = @"/Users/fernandou/Desktop/test.c";
        NSFileManager *fileManager = [NSFileManager defaultManager];

        // Report whether the file exists on disk.
        if ([fileManager fileExistsAtPath:filepath]) {
            NSLog(@"File exists.");
        }
        else {
            NSLog(@"File does not exist.");
        }
    }
    return 0;
}

We will use frida-trace to check that the NSFileManager API is indeed being called:
$ frida-trace -m "-[NSFileManager **]" -f objCLI

And the output we will be getting is:

/* TID 0x407 */
60 ms -[NSFileManager fileExistsAtPath:0x10f24f018]
60 ms | -[NSFileManager getFileSystemRepresentation:0x7ffee09b43e0 maxLength:0x400 withPath:0x10f24f018]

We will now learn how to get the address of -[NSFileManager fileExistsAtPath:] so that we can write an instrumentation script.

First, we will fire up Frida’s REPL:


$ frida -f myObjCApp

Once we are inside the REPL, we need an API resolver to get the
handler and so we will create it:
myResolver = new ApiResolver('ObjC');

This resolver exposes a method named enumerateMatchesSync, which we will use to our advantage. We will try to get all methods matching fileExists*:

myResolver.enumerateMatchesSync('-[NSFileManager fileExists*]')

And this will return us two possible methods:


enumerateMatchesSync output.

[
    {
        "address": "0x7fff211855af",
        "name": "-[NSFileManager fileExistsAtPath:]"
    },
    {
        "address": "0x7fff2117d115",
        "name": "-[NSFileManager fileExistsAtPath:isDirectory:]"
    }
]

Now we have two addresses. However, if we want to automate this we have to be more specific, which means dropping the wildcard operator and writing the method name exactly as we want it:

[Local::objCLI]-> myResolver.enumerateMatchesSync("-[NSFileManager fileExistsAtPath:]")
[
    {
        "address": "0x7fff211855af",
        "name": "-[NSFileManager fileExistsAtPath:]"
    }
]
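The difference between the wildcard query and the exact query can be imitated offline. The sketch below is plain JavaScript and does not use Frida's resolver: globToRegExp and matchMethods are hypothetical helpers, and the two method names are copied from the output above.

```javascript
// Method names copied from the resolver output above.
const methods = [
  '-[NSFileManager fileExistsAtPath:]',
  '-[NSFileManager fileExistsAtPath:isDirectory:]',
];

// Hypothetical stand-in for the resolver's matching: escape regex
// metacharacters, then let '*' match any run of characters.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('^' + escaped.replace(/\*/g, '.*') + '$');
}

function matchMethods(query) {
  const re = globToRegExp(query);
  return methods.filter((name) => re.test(name));
}

// The wildcard matches both methods, the exact name only one.
console.log(matchMethods('-[NSFileManager fileExists*]'));
console.log(matchMethods('-[NSFileManager fileExistsAtPath:]'));
```

With a wildcard, the script would have to pick one entry out of the list by position, which is fragile; the exact name leaves a single candidate.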

And now we have the address of the -[NSFileManager fileExistsAtPath:] method. What is left is to actually write our instrumentation script:

ObjC address hook implementation.

const myResolver = new ApiResolver('ObjC');

// enumerateMatchesSync() returns a list, so we grab the first element and
// obtain its address. Using the exact method name keeps
// fileExistsAtPath:isDirectory: out of the results.
const NSFileManagerFileExistsPtr = ptr(myResolver.enumerateMatchesSync('-[NSFileManager fileExistsAtPath:]')[0].address);

Interceptor.attach(NSFileManagerFileExistsPtr, {
    onEnter(args) {
        // args[2] is the first method argument: the NSString path.
        const filePath = new ObjC.Object(args[2]);
        console.log("[frida] filepath: " + filePath.toString());
    }
});

And we can inject this script via:


$ frida -l agent.js myObjCApp

Which results in the following output:

[Local::objCLI]-> [frida] filepath: path/Users/fernandou/Desktop/test.c
2021-02-16 07:54:52.802 objCLI[1087:22680899] File exists.

And that’s it! We have created our first instrumentation script for
ObjC apps without REPL interaction.

8.4 ObjC: Methods with multiple arguments
We can take a step further and read a method that has two arguments
instead of a single one. For this purpose, we will modify the previous
program to call [fileManager fileExistsAtPath: isDirectory:]
instead.
We will also use Frida’s REPL to get the method pointer differently.

#import <Foundation/Foundation.h>

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        NSString *filepath = @"/Users/fernandou/Desktop/test.c";
        NSFileManager *fileManager = [NSFileManager defaultManager];
        BOOL res;

        if ([fileManager fileExistsAtPath:filepath isDirectory:&res]) {
            NSLog(@"File exists.");
        }
        else {
            NSLog(@"File does not exist.");
        }
    }
    return 0;
}

isDirectory is an output parameter: after the call, the BOOL it points to (res in our program) is set to NO when the target path is not a folder.


Now we build the file and fire up frida’s REPL. From there, we will
now see how to get the pointer to a method using the ObjC.classes
object which maps ObjC classes to JavaScript objects. If we write in
Frida’s REPL ObjC.classes we can see that it begins to autocomplete:

For instance, if we want to know all methods available in ObjC.classes.NSFileManager we can do it using $ownMethods, which returns an array containing the native method names exposed by the object class:

[Local::objCLI]-> ObjC.classes.NSFileManager.$ownMethods
[
    "+ defaultManager",
    "- dealloc",
    "- delegate",
    "- setDelegate:",
    "- fileExistsAtPath:",
    "- createDirectoryAtPath:withIntermediateDirectories:attributes:error:",
    "- createDirectoryAtURL:withIntermediateDirectories:attributes:error:",
    "- homeDirectoryForCurrentUser",
    "- URLsForDirectory:inDomains:",
    "- getRelationship:ofDirectoryAtURL:toItemAtURL:error:",
    "- enumeratorAtURL:includingPropertiesForKeys:options:errorHandler:",
    "- temporaryDirectory",
    "- stringWithFileSystemRepresentation:length:",
    "- removeItemAtPath:error:",
    "- enumeratorAtPath:",
    "- contentsOfDirectoryAtPath:error:",
    "- isExecutableFileAtPath:",
    "- destinationOfSymbolicLinkAtPath:error:",
    ...

If we take a look at the full list, we can see that the method fileExistsAtPath:isDirectory: is available to us. We can access it and then read its .implementation member, which gives us a pointer to the method’s implementation:

[Local::objCLI]-> t = ObjC.classes.NSFileManager['- fileExistsAtPath:isDirectory:'].implementation
function
[Local::objCLI]-> ptr(t)
"0x7fff2117d115"

Once we have this information, we can do something similar to what we have seen before, except that this time the element storing our parameter of interest is not args[2] but args[3]:

Interceptor.attach(ptr(t), {
    onEnter(args) {
        this.isDir = args[3];
    },
    onLeave(retval) {
        let objCIsDir = new ObjC.Object(this.isDir);
        console.log(objCIsDir);
    }
})

And once we %resume:

[Local::objCLI]-> %resume
2021-02-16 22:48:51.725 objCLI[96805:23540150] File exists.
[Local::objCLI]-> nil

Our .c file is not a directory, so the value behind the isDirectory pointer is cleared by the method, and the script prints it as nil.
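Since isDirectory is declared as a BOOL pointer in the C program above, an alternative worth trying is to read the byte it points to rather than wrapping the pointer in an ObjC.Object. The following is a hedged sketch, not the approach used in this section: objcBoolFromByte is a hypothetical helper, and the Frida part only runs inside an agent where the ObjC bridge is available.

```javascript
// Hypothetical helper: interpret the byte behind a BOOL* as a boolean.
function objcBoolFromByte(byte) {
  return byte !== 0;
}

// Only runs inside a Frida agent with the Objective-C bridge loaded.
if (typeof ObjC !== 'undefined' && ObjC.available) {
  const impl = ObjC.classes.NSFileManager['- fileExistsAtPath:isDirectory:'].implementation;
  Interceptor.attach(impl, {
    onEnter(args) {
      // args[3] is the second method argument: the BOOL* out-parameter.
      this.isDirPtr = args[3];
    },
    onLeave(retval) {
      // Callers may pass NULL when they do not care about the answer.
      if (!this.isDirPtr.isNull()) {
        console.log('isDirectory: ' + objcBoolFromByte(this.isDirPtr.readU8()));
      }
    }
  });
}
```

The pointer is saved in onEnter and only read in onLeave because the method fills it in while it runs.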

8.5 ObjC: Reading a CFDataRef


This came up as a question in the Frida IRC/Telegram channel and I
thought it was interesting to illustrate it in these pages. The question
in turn is how to access CFDataRef bytes. For this use case, we will
write a small example application:

#import <Foundation/Foundation.h>

void print_ptr(CFDataRef dRef) {
    NSLog(@"%@", dRef);
}

int main(int argc, const char * argv[]) {

    @autoreleasepool {
        const char *myString = "foobar";
        // Wrap the string bytes in a CFData without copying them.
        CFDataRef data = CFDataCreateWithBytesNoCopy(NULL, (const UInt8 *)myString, strlen(myString), kCFAllocatorNull);

        // Print the address of print_ptr and wait so Frida can be attached.
        NSLog(@"%p", print_ptr);
        getchar();
        print_ptr(data);
    }
    return 0;
}

In this example we have a print_ptr function that prints the CFDataRef named data, which is created by calling CFDataCreateWithBytesNoCopy on a simple string: foobar.
For illustration purposes, we print the address of the print_ptr function and call getchar() so that we can copy the address and fire up Frida’s REPL.

You can also find the address by calling Process.enumerateModulesSync(), then calling Module.enumerateExportsSync() for the main module and picking out the print_ptr() pointer.

[Local::objCLI]-> printPtr = ptr(0x10606c20)
[Local::objCLI]-> Interceptor.attach(printPtr, {
    onEnter(args) {
        cfData = new ObjC.Object(args[0]);
        cfString = cfData.bytes().readUtf8String();
        console.log(`string output: ${cfString}`);
    }
})
[Local::objCLI]-> %resume
[Local::objCLI]-> string output: foobar

First things first: since we are hooking a C function, we don’t have to think about args[2] this time; args[0] holds our first argument.
If we try to print the argument without parsing it, we will get this output:

{length = 6, bytes = 0x666f6f626172}

So, we get the bytes and their length but not the string representation. To get it, we can call the .bytes() method on the ObjC.Object and then call .readUtf8String() on the returned pointer to read it as a UTF-8 string.
If the data is not a UTF-8 string, be sure to use the appropriate method to read it.
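As a quick offline check, the hex digits in the description printed above can be decoded by hand. This sketch is plain Node.js rather than Frida code; decodeCFDataDescription is a hypothetical helper, and it assumes the description shows all the bytes, which may not hold for longer buffers.

```javascript
// The description printed above when the argument is logged unparsed.
const description = '{length = 6, bytes = 0x666f6f626172}';

// Pull the hex digits out of the description and decode them as UTF-8.
function decodeCFDataDescription(text) {
  const match = /bytes = 0x([0-9a-fA-F]+)/.exec(text);
  if (match === null) {
    throw new Error('no bytes field found');
  }
  return Buffer.from(match[1], 'hex').toString('utf8');
}

console.log(decodeCFDataDescription(description)); // prints "foobar"
```

Inside a hook, reading the memory through .bytes() remains the more direct route, since it does not depend on how the description is formatted.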

8.6 Getting CryptoKit’s AES.GCM.seal data before encryption
CryptoKit provides AES.GCM to encrypt and decrypt data using AES-GCM with key sizes from 128 up to 256 bits. With Frida, it is possible to obtain the data and the key before the data is encrypted, so let’s write up a quick example:

import Foundation
import CryptoKit

let pass = "foobar"
let data = "frida is fun!".data(using: .utf8)!
let key = SymmetricKey(data: SHA256.hash(data: pass.data(using: .utf8)!))
let iv = AES.GCM.Nonce()
let mySealedBox = try AES.GCM.seal(data, using: key, nonce: iv)
let dataToShare = mySealedBox.combined?.base64EncodedData()

This simple example takes the “frida is fun!” string and encrypts it using a key derived from “foobar”. After building this sample, we can disassemble the binary before opening it up in Frida. What we will find in the list of functions is:

0x100003c30 1 6 sym.imp.static_CryptoKit.AES.GCM.seal_A_where_A:_Foundation.DataProtocol___:_A__using:_CryptoKit.SymmetricKey__nonce:_Swift.Optional_CryptoKit.AES.GCM.Nonce___throws____CryptoKit.AES.GCM.SealedBox
0x100003c42 1 6 sym.imp.CryptoKit.AES.GCM.SealedBox.combined.getter_:_Swift.Optional_Foundation.Data
0x100003c48 1 6 sym.imp.type_metadata_accessor_for_CryptoKit.AES.GCM.SealedBox

And if we check the disassembly itself we can find that the CryptoKit function we wrote is being called:

Now, we can fire up Frida and check our binary imports (I named this
example binary swiftCLI):
Module.enumerateImportsSync("swiftCLI")

And we can quickly notice our target among all the imports:

...
{
    "address": "0x7fff56762210",
    "module": "/System/Library/Frameworks/CryptoKit.framework/Versions/A/CryptoKit",
    "name": "$s9CryptoKit3AESO3GCMO4seal_5using5nonceAE9SealedBoxVx_AA12SymmetricKeyVAE5NonceVSgtK10Foundation12DataProtocolRzlFZ",
    "slot": "0x10079d030",
    "type": "function"
},
...

There are other functions that might be interesting to inspect but they
are not needed in this case:

$s8swiftCLI3key9CryptoKit12SymmetricKeyVvp
$s8swiftCLI2iv9CryptoKit3AESO3GCMO5NonceVvp
$s8swiftCLI11mySealedBox9CryptoKit3AESO3GCMO0dE0Vvp

Once we know the name of the function we want to instrument, we can see that the first and second arguments are the data and the key respectively.
For illustration purposes, the script we will write prints the data together with a hexdump of the memory around it, so we can see that both the data and the key are indeed stored in memory:

let imports = Module.enumerateImportsSync("swiftCLI");

imports.forEach(function(value) {
    if (value.name === "$s9CryptoKit3AESO3GCMO4seal_5using5nonceAE9SealedBoxVx_AA12SymmetricKeyVAE5NonceVSgtK10Foundation12DataProtocolRzlFZ") {
        console.log("[+] Address: " + value.address);
        Interceptor.attach(value.address, {
            onEnter(args) {
                console.log("Raw data: " + args[0].readCString());
                console.log(hexdump(args[0]));
            }
        });
    }
});

And after running our script:


frida -l script.js -f swiftCLI --no-pause

We can get the raw data and spot the key:

One thing we are able to notice is that in memory our password is always 3 bytes after the data that is going to be encrypted. This means we can skip past the data (whose length we obtain via readCString()) and add our magical offset 0x3, so we will modify our code to be able to intercept the key:

let imports = Module.enumerateImportsSync("swiftCLI");

imports.forEach(function(value) {

    if (value.name === "$s9CryptoKit3AESO3GCMO4seal_5using5nonceAE9SealedBoxVx_AA12SymmetricKeyVAE5NonceVSgtK10Foundation12DataProtocolRzlFZ") {
        console.log("[+] Address: " + value.address);
        Interceptor.attach(value.address, {
            onEnter(args) {
                let rawData = args[0].readCString();
                let key = args[0].add(rawData.length + 0x3).readCString();

                console.log("Raw data: " + rawData);
                console.log("Key: " + key);

            },
        });
    }
});

And after running it, we get the following output:

[Local::swiftCLI]-> Raw data: frida is fun!
Key: foobar
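The offset arithmetic in the hook can be rehearsed on a simulated buffer. Everything below is an assumption made for illustration: the buffer is built by hand in Node.js to mimic the layout described above (data, then the key 3 bytes after the end of the data), and readCString is a local helper that imitates the Frida method of the same name.

```javascript
// Local imitation of NativePointer.readCString(): read up to the first NUL.
function readCString(buf, offset = 0) {
  let end = buf.indexOf(0, offset);
  if (end === -1) end = buf.length;
  return buf.toString('utf8', offset, end);
}

// Simulated memory: the data, then the key 3 bytes past the end of the data.
const plain = 'frida is fun!';
const memory = Buffer.alloc(32);
memory.write(plain, 0, 'utf8');
memory.write('foobar', plain.length + 0x3, 'utf8');

// Same arithmetic as the hook: data length plus the 0x3 offset.
// Note: .length counts characters, which equals bytes only for ASCII data.
const rawData = readCString(memory);
const key = readCString(memory, rawData.length + 0x3);

console.log('Raw data: ' + rawData);
console.log('Key: ' + key);
```

The fixed 0x3 offset is an observation about this particular sample, so it is worth re-checking with a hexdump before relying on it elsewhere.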

8.7 Swift.String
Another thing you might have noticed is that strings are inlined when
they are small but once they grow bigger than 15 bytes it is not
possible to parse them easily as before. The reason is Swift’s memory
layout for string types.
To test this, we are going to take a very basic Swift example: A
program that builds a hello string and receives the person’s name as
an argument.

func greet(person: String) -> String {
    let greeting = "Hello, " + person + "!!"
    return greeting
}

let greeting = greet(person: "Ole Andre Vadla Ravnas")

print(greeting)

For strings of 15 bytes or fewer, the flags are stored in the last byte and the string itself occupies the first 15 bytes. We can read the string as usual.
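The small-string layout just described can be sketched on a 16-byte snapshot. This is a plain Node.js illustration under stated assumptions: readSmallSwiftString is a hypothetical helper, the snapshot is built by hand, and the flags value 0xe5 is only an illustrative placeholder.

```javascript
// Read a small Swift string from a 16-byte snapshot: text in the first
// 15 bytes, flags in the last byte (layout as described in the text).
function readSmallSwiftString(raw) {
  if (raw.length !== 16) {
    throw new Error('expected 16 bytes');
  }
  const payload = raw.subarray(0, 15);
  const nul = payload.indexOf(0);
  const end = nul === -1 ? payload.length : nul;
  return { text: payload.toString('utf8', 0, end), flags: raw[15] };
}

// Simulated snapshot; the flags byte is an illustrative value.
const snapshot = Buffer.alloc(16);
snapshot.write('Hello', 0, 'utf8');
snapshot[15] = 0xe5;

console.log(readSmallSwiftString(snapshot));
```

Stopping at the first NUL mirrors what readCString() does in a hook, while capping the read at 15 bytes keeps the flags byte out of the text.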
Strings longer than 15 bytes are considered large by Swift’s memory layout and are thus split into 8 bytes for countAndFlagsBits, a UInt64 containing flags such as isASCII, isNFC and isNativelyStored, and 8 bytes for a Builtin.BridgeObject.
According to Swift’s ABI¹, the data is split across two registers (which we can quickly check): rax holds metadata, where the least significant bits (LSB) hold the length and the most significant bits (MSB) contain flags. The rdx register should contain, in its least significant bits, a pointer which offset by 32 gives our string. Its most significant bits again store flags/metadata.
Now, we will instrument the greet(person: String) -> String
function. Since Swift’s exports are mangled, first we will check the
exports to find the name of our target function:

[Local::swiftCLI]-> Module.enumerateExportsSync("swiftCLI")
[
    {
        "address": "0x108a4f000",
        "name": "_mh_execute_header",
        "type": "variable"
    },
    {
        "address": "0x108a52a80",
        "name": "main",
        "type": "function"
    },
    {
        "address": "0x108a52dc0",
        "name": "$s8swiftCLI5greet6personS2S_tF",
        "type": "function"
    },
    {
        "address": "0x108a57048",
        "name": "$s8swiftCLI8greetingSSvp",
        "type": "variable"
    }
]

And so we get our function, named $s8swiftCLI5greet6personS2S_tF. Now, we can instrument it:

¹https://github.com/apple/swift/blob/main/docs/ABI/RegisterUsage.md

const greetFuncPtr = Module.getExportByName(null, '$s8swiftCLI5greet6personS2S_tF');
Interceptor.attach(greetFuncPtr, {
    onEnter: function(args) {
        // Note: on x86_64, this.context.rdx corresponds to args[2]
        // Keep the low 40 bits (the pointer) and drop the flag bits.
        const tmpAddr = this.context.rdx.and('0xFFFFFFFFFF').toString(16);
        const lsbAddr = ptr(new UInt64('0x' + tmpAddr));
        // The string bytes start 32 bytes past that pointer.
        console.log(lsbAddr.add(32).readCString());
    }
});

Which outputs the string:


[Local::swiftCLI]-> Ole Andre Vadla Ravnas
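The masking and offset arithmetic used by the hook can be checked offline with BigInt. The register value below is hypothetical (it was not captured from a real session), and swiftLargeStringAddress is a helper name chosen only for this sketch.

```javascript
// Hypothetical register value: flag bits in the top byte, pointer below.
const rdx = 0x8000000100704080n;

// Mirror the hook: keep the low 40 bits, then skip 32 bytes to the text.
function swiftLargeStringAddress(objectBits) {
  const pointer = objectBits & 0xFFFFFFFFFFn;
  return pointer + 32n;
}

console.log('0x' + swiftLargeStringAddress(rdx).toString(16)); // 0x1007040a0
```

BigInt is used because a 64-bit register value does not fit safely in a JavaScript number.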

Note: Tested under Swift 5. This information should be stable but Swift’s memory layout might change anytime in later versions.
Since reading Swift’s ABI can be a tough task, I recommend reading TannerJin’s Swift-MemoryLayout² repository with very good explanations of Swift’s core types.
²https://github.com/TannerJin/Swift-MemoryLayout
9. r2frida
r2frida is a plugin for radare2¹, a reverse engineering framework. This plugin aims to join the static analysis capabilities of radare2 with the instrumentation provided by Frida. Even though it is possible to achieve similar results using Frida alone (when it comes to in-memory work), combining it with radare2’s ability to assemble patches in memory and perform static analysis, all in one tool, makes for a great tandem.
To get this plugin working, a working radare2 setup is required. From now on, radare2 will be referred to as r2. To get r2 set up in your environment, please refer to their INSTALL.md².
To set up this plugin, the simplest way is by running radare2’s package manager:
r2pm -ci r2frida

r2frida commands
When searching for r2frida documentation or blog posts, it is likely that you will find commands starting with a backslash “\”. This is the old way of running commands within r2frida; commands are now run starting with :.

9.0.1 Testing r2frida


In order to confirm that r2frida is set up properly, the simplest way is to run it against a known binary. In this case /bin/ls is a good candidate on Unix systems:
r2 frida:///bin/ls

And then it should get into r2’s shell:


¹https://www.radare.org/n/
²https://github.com/radareorg/radare2/blob/master/INSTALL.md

$ r2 frida:///bin/ls
-- This binary no good. Try another.
[0x00000000]>

The next step is verifying that Frida is working and is read properly
by r2frida:

[0x00000000]> :?V
{"version":"15.1.17"}

Congratulations, your r2frida setup is now working!

Attaching to running processes

To attach to an already running process, the name or the PID of the process is required instead of the absolute path:
For a PID: r2 frida://<PID>. For example r2 frida://1234
For a process name: r2 frida://<PROCESS_NAME>. For example r2 frida://notepad.exe

9.1 Tracing functions


r2frida allows the user to trace functions without writing instrumentation code, using the commands dt (trace) and dtf (trace format). :dtf lets us format the output of the traced functions, instrument onEnter blocks and display backtraces. Typing :dtf? displays the following help:

For example, a function that receives a const char* as its first argument can be instrumented with:
:dtf 0x7f101d1ed000 z^

z reads the first argument as a UTF-8 string and ^ traces the onEnter block instead of the onLeave block. The tracing command is best understood through practical examples.

9.1.1 Tracing functions from imports/exports


The easiest functions to instrument are those whose addresses are readily available from the process’ imports and exports. With r2frida it is possible to retrieve the address of the function that we want to instrument by using the :iE command, but before going further we will pick a known binary to test with: wget.
To instrument wget with parameters in r2frida, we run:
r2 "frida:///usr/bin/wget man7.org"

Note that, unlike the previous example where r2frida spawned a binary, this time the frida:/// block is surrounded by double quotes, which allows us to pass arguments to the target binary. The target function to instrument in this case is fopen, which receives the filename as its first argument and the mode as its second, both as const char*. Once we are in the r2 shell we should have the following console:

r2 "frida:///usr/bin/wget man7.org"
r_config_set: variable 'asm.cmtright' not found
-- If you're having fun using radare2, odds are that you're doing something wrong.
[0x00000000]>

fopen is a function imported from libc, and to be able to list the imports/exports of this module we need to position ourselves at the correct module address. To do so, we can use the :dm command to enumerate modules and their addresses:

The output has been shortened, but the important bit is the first entry
of libc-2.27.so. By entering the address in r2’s console, we position
ourselves in the module address:

[0x00000000]> 0x00007ff0a9db9000
[0x7ff0a9db9000]>

From here, it is possible to enumerate exports by using the :iE command. However, the output of :iE is huge, so we use the ~ operator to filter the results down to those containing fopen:

[0x7ff0a9db9000]> :iE~fopen
0x7ff0a9e45450 f _IO_file_fopen
0x7ff0a9e37de0 f fopen
0x7ff0a9e380d0 f fopencookie
0x7ff0a9e37de0 f _IO_fopen
0x7ff0a9e37de0 f fopen64
[0x7ff0a9db9000]>

0x7ff0a9e37de0 is the memory address of libc’s fopen and, as mentioned before, it takes two arguments that can be read as UTF-8 strings. With this information it is now possible to use the :dtf command to trace the function and print out the value of each argument:

[0x7ff0a9db9000]> :dtf 0x7ff0a9e37de0 zz^
true
[0x7ff0a9db9000]> :dc
resumed spawned process.

The :dtf command returns true, signaling that the command was issued successfully. The first z displays the first argument as a UTF-8 string and the second z does the same for the second argument. ‘^’ also shows the backtrace of the function. To resume execution, we use the :dc command.

It is possible to combine different format types for different arguments. For example, it is possible to print the hexdump of the second argument instead by replacing the second z with h: :dtf 0x7ff0a9e37de0 zh^
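To make the format string easier to read, the sketch below spells out what each character stands for. It is plain JavaScript and not part of r2frida: describeTraceFormat is a hypothetical helper that only knows the three specifiers mentioned in this section (z, h and ^), with the meanings given in the text.

```javascript
// Only the specifiers mentioned in this section are covered here.
const SPECIFIERS = { z: 'UTF-8 string', h: 'hexdump' };

function describeTraceFormat(format) {
  const result = { onEnter: false, args: [] };
  for (const ch of format) {
    if (ch === '^') {
      // '^' selects the onEnter block instead of onLeave.
      result.onEnter = true;
    } else if (Object.prototype.hasOwnProperty.call(SPECIFIERS, ch)) {
      // Each remaining character describes the next argument in order.
      result.args.push(SPECIFIERS[ch]);
    } else {
      throw new Error('specifier not covered by this sketch: ' + ch);
    }
  }
  return result;
}

console.log(describeTraceFormat('zz^'));
console.log(describeTraceFormat('zh^'));
```

The full list of specifiers is shown by :dtf? and should be taken as the reference.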

When the execution is resumed, the following output is shown in our console:

The output has been filtered to include only the interesting bits, but
it can be seen that each argument is interpreted correctly as a UTF8
string and displayed along their backtrace.

9.1.2 Tracing functions by using offsets


Unlike the previous section, this time the objective is to instrument a
function that is not directly listed in the imports/exports table which
is something common when tracing functions. In this situation what
we need to do is to retrieve the offset to the function(s) that are to be
instrumented.
To illustrate this example, the following code will be used to trace the
memcmp and check functions:

// gcc check_password.c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Damn_YoU_Got_The_Flag
char password[] = "\x18\x3d\x31\x32\x03\x05\x33\x09\x03\x1b\x33\x28\x03\x08\x34\x39\x03\x1a\x30\x3d\x3b";

inline int check(char* input);

int check(char* input) {
    for (int i = 0; i < sizeof(password) - 1; ++i) {
        password[i] ^= 0x5c;
    }
    return memcmp(password, input, sizeof(password) - 1);
}

int main(int argc, char **argv) {
    if (argc != 2) {
        printf("Usage: %s <password>\n", argv[0]);
        return EXIT_FAILURE;
    }
    int size_of_password = (sizeof(password) - 1);
    printf("size: %d", size_of_password);
    if (strlen(argv[1]) == (sizeof(password) - 1) && check(argv[1]) == 0) {
        puts("You got it !!");
        return EXIT_SUCCESS;
    }

    puts("Wrong");
    return EXIT_FAILURE;

}

This time the code is compiled using gcc, and when opening it in r2frida and inspecting the imports/exports of the binary we get a blank slate:

[0x00000000]> :dm
0x00005591d1947000 - 0x00005591d1948000 r-x /tmp/a.out
0x00005591d1b47000 - 0x00005591d1b48000 r-- /tmp/a.out
0x00005591d1b48000 - 0x00005591d1b49000 rw- /tmp/a.out
# ...
[0x00000000]> s 0x00005591d1947000
[0x5591d1947000]> :iE
[0x5591d1947000]> :ii
[0x5591d1947000]>

When opening this binary with r2 -A to analyze it, this is the output
obtained when listing functions:

$ r2 -A a.out
[0x00000610]> afl
0x00000610 1 42 entry0
0x00000640 4 50 -> 40 sym.deregister_tm_clones
0x00000680 4 66 -> 57 sym.register_tm_clones
0x000006d0 5 58 -> 51 sym.__do_global_dtors_aux
0x00000600 1 6 sym.imp.__cxa_finalize
0x00000710 1 10 entry.init0
0x000008a0 1 2 sym.__libc_csu_fini
0x000008a4 1 9 sym._fini
0x00000830 4 101 sym.__libc_csu_init
0x0000077b 7 170 main
0x0000071a 4 97 sym.check
0x00000598 3 23 sym._init
0x000005c0 1 6 sym.imp.puts
0x000005d0 1 6 sym.imp.strlen
0x000005e0 1 6 sym.imp.printf
0x00000000 2 25 loc.imp._ITM_deregisterTMCloneTable
0x000005f0 1 6 sym.imp.memcmp
0x000001a5 1 38 fcn.000001a5
[0x00000610]>

What is seen in the first column are the offsets for the functions
and the ones which are of interest to us are sym.imp.memcmp and
sym.check:

# offset function
0x0000071a 4 97 sym.check
0x000005f0 1 6 sym.imp.memcmp

After retrieving the values for both functions, the next step is spawning the binary using r2frida to calculate the memory addresses of these functions. To ensure that both memcmp and the check function are called, the binary is spawned with the following argument:
r2 "frida:///tmp/a.out testtesttesttesttestt"

The next step is retrieving the base address of a.out, which can be done by using the :dm command to list modules together with ~ to filter the results:

[0x00000000]> :dm~out
0x000055c05fc86000 - 0x000055c05fc87000 r-x /tmp/a.out
0x000055c05fe86000 - 0x000055c05fe87000 r-- /tmp/a.out
0x000055c05fe87000 - 0x000055c05fe88000 rw- /tmp/a.out

In this situation, the base address for the spawned process is 0x000055c05fc86000, to which we can add the offset of each function to get its real address. For example, for sym.check:
[0x55c05fc8671a]> 0x000055c05fc86000 + 0x0000071a

0x55c05fc8671a is the address of the sym.check function. The next step is to trace it:

[0x55c05fc8671a]> :dtf 0x55c05fc8671a z^
true

With z the value is read as a UTF-8 string. The next function to instrument is memcmp:

[0x55c05fc8671a]> 0x000055c05fc86000 + 0x000005f0
[0x55c05fc865f0]>
[0x55c05fc865f0]> :dtf 0x55c05fc865f0 hh
true
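The base-plus-offset arithmetic performed in the r2 console can be double-checked outside of it. This is a minimal sketch in plain JavaScript; rebase is a hypothetical helper, and the base address and offsets are the ones from the listings above.

```javascript
// Add a static offset to a runtime base address. BigInt is used because
// 64-bit addresses can exceed the safe integer range of JavaScript numbers.
function rebase(base, offset) {
  return '0x' + (BigInt(base) + BigInt(offset)).toString(16);
}

const base = '0x000055c05fc86000';

console.log(rebase(base, '0x0000071a')); // sym.check      -> 0x55c05fc8671a
console.log(rebase(base, '0x000005f0')); // sym.imp.memcmp -> 0x55c05fc865f0
```

The base address changes on every run, so it has to be read again from :dm each time the binary is spawned.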

Since memcmp receives two const void* parameters, the tracing format that we are using here is hh, which hexdumps the memory at both arguments. Now that both functions are being traced, the execution of the process can be resumed by calling :dc:

When the execution is resumed, the argument we passed to the check function, “testtesttesttesttestt”, can be read when the function is traced. The hexdumps for both arguments are also printed, and from there the flag can be obtained.
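The bytes that show up in the memcmp hexdump are the result of the XOR loop in check(). The same transformation can be reproduced offline from the source code; the sketch below is plain JavaScript, with the byte values and the 0x5c key copied from the C listing, and decodePassword being a hypothetical helper name.

```javascript
// Encoded password bytes, copied from the C source above.
const encoded = [
  0x18, 0x3d, 0x31, 0x32, 0x03, 0x05, 0x33, 0x09, 0x03, 0x1b, 0x33,
  0x28, 0x03, 0x08, 0x34, 0x39, 0x03, 0x1a, 0x30, 0x3d, 0x3b,
];

// XOR every byte with the key, exactly like the loop in check().
function decodePassword(bytes, key) {
  return String.fromCharCode(...bytes.map((b) => b ^ key));
}

console.log(decodePassword(encoded, 0x5c)); // Damn_YoU_Got_The_Flag
```

Tracing memcmp gives the same result without needing the source, which is the point of the exercise when only the binary is available.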

9.2 Disassembling functions in memory
With r2frida it is possible to analyze and disassemble functions
in memory provided the right addresses, something that is very
powerful to inspect what functions do to extract valuable information
from them. To learn to do this, we are going to use the previous code
with the known offsets of the sym.check function and the memcmp
function.
Again, we open the binary the same way as before:

r2 "frida:///tmp/a.out testtesttesttesttestt"

And then set emu.str=true to view the strings obtained from emulation and place ourselves at the sym.check address:

[0x00000000]> e emu.str=true
[0x00000000]> :dm~out
0x000055bf4aefa000 - 0x000055bf4aefb000 r-x /tmp/a.out
0x000055bf4b0fa000 - 0x000055bf4b0fb000 r-- /tmp/a.out
0x000055bf4b0fb000 - 0x000055bf4b0fc000 rw- /tmp/a.out
[0x00000000]> 0x000055bf4aefa000 + 0x0000071a
[0x55bf4aefa71a]>

The next step is analyzing the function, which is done by typing af @ address, in our case:

And with this, we have access to the disassembly of the function in memory and the addresses used by it.

9.3 Replace return values


Intercepting and replacing the return value of an address is possible
using the :di command. The help of :di? shows:

[0x00000000]> :di?
di intercept help
di-1 intercept ret_1
di0 intercept ret0
di1 intercept ret1
dif intercept fun help
dif-1 intercept fun ret_1
dif0 intercept fun ret0
dif1 intercept fun ret1
difi intercept fun ret int
difs intercept fun ret string
dii intercept ret int
dis intercept ret string
div intercept ret void

What this means is that :di-1 replaces the return value of the function at the given address with -1, :di0 makes the return value 0, and :di1 sets the return value to 1. We are using the same code as in the previous section to test this command out.
The idea is to patch the check function’s return value so that it returns 0, allowing the program to print the string “You got it !!”. The first thing to do is to get the address of the check function:

[0x00000000]> 0x0000563b168e5000 + 0x0000071a
[0x563b168e571a]> :di0 0x563b168e571a

The address of the check function for this execution is 0x563b168e571a. The next step is to modify the return value by using the :di0 command:
[0x563b168e571a]> :di0 0x563b168e571a

And when checking the main function, the last function called is at 0x563b168e55c0, on which we are going to place a breakpoint by using the :db command to be able to see what happens:

Now the execution can be resumed by using the :dc command:

We can see that although the process was spawned with the “testtesttesttesttestt” string instead of the correct flag, the check function returned 0 and the program prints “You got it !!” in turn.
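For comparison, a similar effect can be obtained with a plain Frida script. The sketch below is not how r2frida implements :di0; it is an assumed equivalent built on Interceptor.attach, with the 0x71a offset taken from the afl listing earlier in this chapter and the module name a.out taken from the :dm output. makeReturnZeroCallbacks is a hypothetical helper name.

```javascript
// Offset of sym.check, taken from the afl listing earlier in this chapter.
const CHECK_OFFSET = 0x71a;

// Build the callbacks separately so they can be inspected without Frida.
function makeReturnZeroCallbacks() {
  return {
    onLeave(retval) {
      // Overwrite whatever check() computed with 0.
      retval.replace(ptr(0));
    },
  };
}

// Only runs inside a Frida agent, where Interceptor and Process exist.
if (typeof Interceptor !== 'undefined') {
  const base = Process.getModuleByName('a.out').base;
  Interceptor.attach(base.add(CHECK_OFFSET), makeReturnZeroCallbacks());
}
```

Resolving the base address at run time avoids hardcoding an address that changes between executions.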


9.5 Allocating strings


A common use case is allocating strings on the heap, which can be done with the :dmas command. The :dmas command takes a string value and returns the address of the allocated string in the heap, for example:

[0x00000000]> :dmas r2fridarul3s
0x7f58d1436b60

The string r2fridarul3s is allocated at the address 0x7f58d1436b60.

To keep track of what we have allocated throughout our session, the :dmal command returns the list of our current allocations:

[0x00000000]> :dmal
0x7f58d1436b60 "r2fridarul3s"

9.6 Calling functions


It is possible to call functions of the process within r2frida by using
the :dx command. The help for the :dx command is as follows:

[0x00000000]> :dx?
dxc dx call
dxo dx objc
dxs dx syscall

The :dx command is able to perform regular CALLs, Objective-C calls or syscalls. To test this feature out, the following code example is used:

#include <stdio.h>

int main()
{
    FILE *fp = NULL;
    fp = fopen("sample_file.dat", "w");
    fclose(fp);
    return 0;
}

What we are going to do is call the fopen function of this binary with a custom string. To do so, the first step is opening the binary in r2frida and allocating the required strings for the filename and the mode respectively:

[0x00000000]> :dmas r2fridarul3s
0x7fd21c121cb0
[0x00000000]> :dmas w
0x7fd21e825e90
[0x00000000]> :dmal
0x7fd21c121cb0 "r2fridarul3s"
0x7fd21e825e90 "w"

Now that both strings have been allocated the next step is figuring
out the address of the fopen function. This can be done as previously
learned by getting the base address of the process:

The result is that the address of the fopen function for this process is
0x55c153ffa560 which can now be used to call :dxc:

[0x55c153ffa560]> :dxc 0x55c153ffa560 0x7fd21c121cb0 0x7fd21e825e90
"0x7fd200000bc0"

The result 0x7fd200000bc0 is the FILE* pointer returned by fopen.
When inspecting our local folder the file is now present:

$ ls | grep r2
r2fridarul3s
10. Optimizing our Frida setup
When instrumenting applications it is important to optimize not only
our instrumentation code for edge cases, but also the library that we
are injecting in our target application.
Frida's config.mk file exposes certain features that might not be
needed in our agent; disabling them helps reduce the memory footprint
of the injected agent. Among the features that can be disabled are:

• V8 Runtime
• Frida connectivity (TLS and ICE, OpenSSL)
• Frida Objective-C bridge
• Swift Bridge
• Java Bridge

The following graph illustrates what percentage of the agent each of
the components represents:

Size of components in frida-agent.so

Note: The representation of each component is done as of 03/10/2022
and might change in future releases. Use this graph as a rough
estimate only.
After seeing what can be easily disabled, it is time to decide what
functionality is needed in our application. For example, when
instrumenting Android applications the Swift and Objective-C bridges
are not needed at all. If we have no use for V8 functionality, it
is better to simply strip it from our agent and get a lighter version.
On the other hand, when instrumenting applications on iOS the Java
bridge is likely not needed.
For desktop applications outside of MacOS (where the Obj-C bridge
is useful), all the bridges can be removed, and so can the V8 runtime
unless you want to instrument binaries using it.
These optimizations vary on a per-case basis and should be
considered in the later stages of instrumentation development.
As a result of these optimizations, when compiling an agent with no
V8 runtime and no bridges the results are as follows:
optimized: 9.5M frida-agent.so
non-optimized: 24M frida-agent.so

9.5M vs 24M: this is roughly a 60% decrease in the size of the agent
(and that’s not counting other libraries such as frida-gadget). The
biggest decrease of size in the agent is due to disabling the V8 runtime
whereas the other features do not have such a big impact (but every
byte counts).
The following table shows suggested settings for different target
operating systems:

Feature        Windows     Linux       MacOS       Android     iOS
V8 Runtime     disabled*   disabled*   disabled*   disabled*   disabled*
ObjC bridge    disabled    disabled    enabled     disabled    enabled
Java bridge    disabled    disabled    disabled    enabled     disabled
Swift bridge   disabled    disabled    enabled     disabled    enabled
Database       disabled*   disabled*   disabled*   disabled*   disabled*

An asterisk (*) next to disabled means that you should enable the
feature if your use case requires it.
For a more detailed overview of Frida’s memory footprint I recommend
reading through frida.re/docs/footprint/¹.

10.1 Building an optimized Frida agent

If you are interested in building a custom agent by yourself, you can
do so by cloning the Frida repository:
git clone --recurse-submodules https://github.com/frida/frida

Once you clone this repository, you will find a file named config.mk
inside with the following settings (among others):

¹https://frida.re/docs/footprint/

FRIDA_V8 ?= enabled
FRIDA_CONNECTIVITY ?= enabled
FRIDA_DATABASE ?= enabled
FRIDA_JAVA_BRIDGE ?= auto
FRIDA_OBJC_BRIDGE ?= auto
FRIDA_SWIFT_BRIDGE ?= auto

These are the default settings. It is possible to disable specific
features by changing their value to disabled. For example, this would
disable the Frida V8 Runtime:

FRIDA_V8 ?= disabled
FRIDA_CONNECTIVITY ?= enabled
FRIDA_DATABASE ?= enabled
FRIDA_JAVA_BRIDGE ?= auto
FRIDA_OBJC_BRIDGE ?= auto
FRIDA_SWIFT_BRIDGE ?= auto

And once this change has been made, the agent can be compiled:
make python-linux-x86_64
11. A real-world use case: Building an anti-cheat with Frida
This project is a proof of concept of an anti-cheat that emerged from
a challenge: writing an anti-cheat without modding either the client
or the server side.
I believe this small project is worth documenting in this book so
that you are able to see how powerful FRIDA is and the many
possibilities you have when using such a toolkit.

11.1 Background
This proof of concept was born while playing a Quake 3 engine based
game named Jedi Knight: Jedi Academy. This game features lightsabers
(swords, for the ones not familiar with Star Wars) and, along with
Jedi Knight: Jedi Outcast, it is the only game (that I know of) that
features swordfighting using the Quake 3 engine.
These games are still played, and competitive players always require
playing on the original game, that is, using the original November
2003 binaries. Why? The main problem with this game is that it was
built using an ancient compiler, a version of Intel’s ICC that is no
longer possible to retrieve, and the code was later adapted to modern
compilers such as the latest MSVC++ and GCC.
This produced some side effects such as differences in FPU
calculations and some extra instructions here and there. As a result
the swordfighting changed, altering clashes and damage and rendering
dual-wielding useless (due to an increased block rate).
This issue did not only affect the base game built with newer
compilers. Mods that were built and run along with the original server
binaries still generated alterations in the swordfighting.
With all these issues, competitive players only accepted playing on
the original binaries, but this generated some issues:

• No possibility to patch engine exploits.
• No possibility to develop anti-cheat capabilities.
• Ban list is limited.

And this is how the idea itself was born.

11.2 Anti-cheat Requirements


These are the initial requirements for a valid anti-cheat, using the
original binaries:

• No alteration of the game files.
• No alteration of the game binaries or libraries.
• No changes to the combat system.
• No dependency on the client side.

And the initial set of features:

• Ability to inspect the player’s network information: snaps, rate,
hilt.
• Increase banlist limit.
• Infer timenudge settings.
• Everything must be done in real time.

Why hilts? In this game, lightsaber hilt models have slightly different
ignition tags, which means they are slightly longer or shorter (which
gives an edge in a fight).

11.2.1 Timenudge
What is timenudge? It is a client command that adds local lag to try
to make sure you interpolate instead of extrapolate. It may give you
an advantage when there is a ping difference of ±10ms between two
players (it helps to predict where a player will land, or makes you
look erratic in the eyes of other players).
The interpolation window is 50ms in servers with a tickrate of 20
(sv_fps 20), the default. With the command cl_timenudge it is possible
to remove the interpolation window, forcing the client to show the
latest position instead of the smooth trajectory.
The cl_timenudge command only accepts values in the range of -30
(thus reducing the interpolation time by 30ms) to +30, but it is
possible to modify the client to bypass this restriction. This is what
happens to the lagometer (network graph) when setting a timenudge
value of -60, 0 and +60:

The upper blue line where timenudge=0 means that the client is in
sync with the server in real time, whereas with timenudge=-60 it
shows yellow spikes due to the game desyncing.

11.3 Quick environment setup


For reference, this is the quick setup I made for this part:

• CentOS 8:
- vim
- dnf install patch pkg-config gcc gcc-c++ make glibc-devel glibc-devel.i686
• frida:
- pip install frida frida-tools
• Original server binaries:
- wget https://files.jkhub.org/jka/official/jalinuxded_1.011.zip

11.4 Anti-cheat architecture


Since we are not able to modify the game files, we will have to build
an anti-cheat around the API that is exposed to us.

For an API reference, we can look at
https://github.com/id-Software/Quake-III-Arena but since the 2003
binary and the repository code probably differ I will mostly base
this part on the original binary (thus disassembly).

The Quake3 Virtual Machine is a complex topic and out of the scope
of these pages, thus I will only be covering what we need to follow
this part.
What we need to know is that an export named vmMain acts as a
syscall dispatcher: it receives an integer as its first argument and
then checks it against the gameExport_t table to decide which event
has to be handled. A quick overview:

Member                         Info
GAME_INIT                      Called every time a level changes
GAME_SHUTDOWN                  Same as above, plus server shutdown
GAME_CLIENT_CONNECT            Player or bot is connected to the server
GAME_CLIENT_USERINFO_CHANGED   Player modifies their info: network,
                               playermodel, names
GAME_CLIENT_DISCONNECT         Player or bot disconnects
GAME_CLIENT_THINK              Frames when the server is idle
GAME_CONSOLE_COMMAND           Fallback to engine when a command is not
                               recognized but might be available
BOTAI_*                        AI management (bots)
ICARUS_*                       ICARUS scripting engine stuff

At first sight, we would be tempted to instrument this vmMain function
but there is a big issue when instrumenting it: this function is
called constantly, and even when there is nothing for us to look
at (e.g. GAME_CLIENT_THINK) it will make the server suffer an
unnecessary overhead.
It is possible to reduce the overhead of the vmMain function
instrumentation by using CModule, but the impact would still increase
with the tickrate of the server. However, there might be a different
approach; let’s take a look again at the vmMain function:

From this function, we can see that each time an event is triggered
there is another function being called which handles the event (since
vmMain is a dispatcher after all). If we only instrument the
functions triggered by the events that are interesting to us, we
remove the overhead issue and are able to build our agent around
them.
For example, if we wanted to instrument the event of a player/bot
connecting to our server we could check the GAME_CLIENT_CONNECT event
of the gameExport_t table. We would only have to instrument the
ClientConnect function. Once the game is loaded and the GAME_INIT
event is triggered, the gamecode library jampgamei386.so is loaded in
memory and gives us access to mangled mpgame exports (which include
functions such as ClientConnect, and some other engine exports).
After obtaining this information, the anti-cheat design can be
decided:

a. Only instrument functions triggered by the events that we are
interested in.
b. No need to instrument the Quake’s virtual machine itself.
c. Everything has to be built around what is given to us by the
binary.
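
The first design decision can be sketched as a small helper that attaches only to a chosen set of exports. This is a minimal sketch, not the agent's actual code: hookSelectedExports, the Address type and the fake export table are hypothetical names introduced here, and the resolve and attach parameters stand in for Frida's export lookup and Interceptor.attach so that the logic can run outside an agent.

```typescript
// Stand-in for Frida's NativePointer so the sketch runs outside an agent.
type Address = string;

interface HookCallbacks {
    onEnter?: (args: unknown[]) => void;
    onLeave?: (retval: unknown) => void;
}

// Attaches only to the exports listed in `hooks` and returns the names
// that could not be resolved, so the caller can log or retry them.
function hookSelectedExports(
    hooks: { [exportName: string]: HookCallbacks },
    resolve: (exportName: string) => Address | null,
    attach: (target: Address, callbacks: HookCallbacks) => void
): string[] {
    const unresolved: string[] = [];
    for (const exportName of Object.keys(hooks)) {
        const target = resolve(exportName);
        if (target === null) {
            unresolved.push(exportName);
            continue;
        }
        attach(target, hooks[exportName]);
    }
    return unresolved;
}

// Fake export table and attach recorder, for illustration only.
const fakeExports: { [name: string]: Address } = {
    ClientConnect: "0x1000",
    G_RunFrame: "0x2000",
};
const attachedTargets: Address[] = [];
const unresolvedExports = hookSelectedExports(
    { ClientConnect: {}, ClientUserinfoChanged: {} },
    (name) => (name in fakeExports ? fakeExports[name] : null),
    (target) => { attachedTargets.push(target); }
);
console.log(attachedTargets, unresolvedExports);
```

Keeping the lookup and attach steps as parameters means the same table of hooks can be applied again whenever the game library is reloaded.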

11.5 Extending the banlist


The game is limited when it comes to banning players, so it would be
interesting to have an extended banlist, as well as one that could be
updated in real time. For this first task, we will check the
ClientConnect function:

This function provides lots of interesting intel. It uses auxiliary
functions to get the information we need: trap_GetUserinfo and
Info_ValueForKey. trap_GetUserinfo (1) returns the userinfo string of
the connected player. This string, however, has a specific format, and
to avoid having to parse it ourselves we have the Info_ValueForKey (2)
function, which is used to grab the IP field and check it against
G_FilterPacket (3).
This is the shape of the userinfo string in memory:

With the information that is now in our hands, it is possible to extend
the banlist:

1. It is possible to detect when a new player connects by
instrumenting every call to the ClientConnect function.
2. trap_GetUserinfo has access to the player’s IP address ONLY
while they are in the connection stage.
3. Info_ValueForKey lets us read the IP field of the userinfo string
or any other fields such as rate and snaps.
4. Use G_FilterPacket to deny access to the server.

This would be the path to filter out banned IPs:

jampgamei386!(ClientConnect -> trap_GetUserinfo ->
Info_ValueForKey("ip") -> G_FilterPacket)
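
To make the role of Info_ValueForKey more concrete, here is a minimal sketch of the lookup it performs. It assumes the usual Quake 3 info string layout of backslash-separated key and value pairs with case-insensitive keys; infoValueForKey and the sample string are made up for illustration, and in the agent the real engine function is called instead.

```typescript
// Minimal sketch of an Info_ValueForKey style lookup.
// Assumed layout: "\key1\value1\key2\value2...", keys are case-insensitive.
function infoValueForKey(info: string, key: string): string {
    const parts = info.split("\\");
    // parts[0] is the empty string that precedes the leading backslash.
    for (let i = 1; i + 1 < parts.length; i += 2) {
        if (parts[i].toLowerCase() === key.toLowerCase()) {
            return parts[i + 1];
        }
    }
    // Assumption: a missing key yields an empty string.
    return "";
}

// Made-up userinfo string, for illustration only.
const sampleUserinfo = "\\name\\Padawan\\rate\\25000\\snaps\\20\\ip\\192.0.2.10:29070";
const sampleIp = infoValueForKey(sampleUserinfo, "ip");
const sampleSnaps = infoValueForKey(sampleUserinfo, "SNAPS");
console.log(sampleIp, sampleSnaps);
```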

The next step is writing the instrumentation code:

class ClientConnect {
    onEnter (args:NativePointer[]) {
        let userinfo = Memory.alloc(MAX_INFO_STRING); // size: 1024 or game goes brrrr
        let isBot = 0;
        const clientId:number = args[0].toInt32();
        if (args[2].toInt32() == 1) {
            isBot = 1
            log('(bot) clientConnect: ' + clientId);
            clientList[clientId] = false;
        }
        else {
            clientList[clientId] = true;
            log('clientConnect: ' + clientId);
        }

        if (!isBot) {
            getUserInfo(clientId, userinfo, MAX_INFO_STRING);
            let tmpIp:any = InfoValueforKey(userinfo, ipKey);
            const clientIP:string|null = tmpIp.readUtf8String();

            log("clientIP: " + tmpIp.readUtf8String());
            if (bannedIPsArray.includes(tmpIp.readUtf8String())) {
                // if banned, we ban :)
                log('filtered: ' + clientIP);
                Interceptor.replace(G_FilterPacketPtr, new NativeCallback((packet) => {
                    return qTrue;
                }, 'bool', ['pointer']));
            }
        }

        const tmpSnaps:any = InfoValueforKey(userinfo, snapsKey);
        log("Snaps: " + tmpSnaps.readUtf8String());
        const tmpRate:any = InfoValueforKey(userinfo, rateKey);
        log("Rate: " + tmpRate.readUtf8String());
        let nameKey = Memory.allocUtf8String("name");
        const tmpName:any = InfoValueforKey(userinfo, nameKey);
        log("playername: " + tmpName.readUtf8String());
    }
    onLeave (retval:NativeReturnValue) {
        Interceptor.revert(G_FilterPacketPtr);
    }
}

The function being instrumented is ClientConnect, which receives
three arguments: the client ID, whether they are new to the server or
not, and whether they are a bot or not. Whenever there is a map
change, the client is considered as not new.
args[0] stores the client identifier, args[1] tells us if they are a
new client or not and args[2] tells us if they are a bot or not.

let isBot = 0;
const clientId:number = args[0].toInt32();
if (args[2].toInt32() == 1) {
    isBot = 1
    log('(bot) clientConnect: ' + clientId);
    clientList[clientId] = false;
}
else {
    clientList[clientId] = true;
    log('clientConnect: ' + clientId);
}

The first block reads the clientId and stores it in a key-value
structure (clientList) to keep track of the current clients in the
server, flagging whether each one is a bot or not. Bot clients will be
ignored.

if (!isBot) {
    getUserInfo(clientId, userinfo, MAX_INFO_STRING);
    let tmpIp:any = InfoValueforKey(userinfo, ipKey);
    const clientIP:string|null = tmpIp.readUtf8String();

    log("clientIP: " + tmpIp.readUtf8String());
    if (bannedIPsArray.includes(tmpIp.readUtf8String())) {
        // if banned, we ban :)
        log('filtered: ' + clientIP);
        Interceptor.replace(G_FilterPacketPtr, new NativeCallback((packet) => {
            return qTrue;
        }, 'bool', ['pointer']));
    }
}

If the client is not a bot, the trap_GetUserinfo function returns the
userinfo string and the Info_ValueForKey function returns the value
of a given key in the userinfo string.
If the IP is in the ban list, Interceptor.replace forces G_FilterPacket
to always return 1, thus triggering the in-game ban. The list of
blocked IPs is retrieved from a file using the frida-fs API:

// Load banned hosts.
const bannedIpsData = fs.readFileSync('/home/jedi/bans.dat');
const bannedIPsArray = bannedIpsData.toString().split('\n');

onLeave (retval:NativeReturnValue) {
    Interceptor.revert(G_FilterPacketPtr);
}

Once ClientConnect returns and the ban has been triggered, the
G_FilterPacket replacement can be safely reverted.
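
Splitting the ban file on '\n' alone keeps an empty entry after a trailing newline and leaves a '\r' on each line if the file uses Windows line endings. The following is a minimal sketch of a slightly more tolerant parser; parseBanList and the sample contents are hypothetical and assume the file holds one entry per line.

```typescript
// Parses the raw text of a ban file into a set of entries, one per line.
// Blank lines and surrounding whitespace are dropped.
function parseBanList(raw: string): Set<string> {
    const entries = new Set<string>();
    for (const line of raw.split(/\r?\n/)) {
        const entry = line.trim();
        if (entry.length > 0) {
            entries.add(entry);
        }
    }
    return entries;
}

// Made-up contents with a Windows line ending, a blank line and padding.
const banned = parseBanList("192.0.2.10\r\n\n  198.51.100.7  \n");
const isBanned = (ip: string): boolean => banned.has(ip);
console.log(isBanned("198.51.100.7"), isBanned("203.0.113.1"));
```

A Set also keeps the lookup cost flat as the list grows, which is convenient if the file is re-read periodically to update the bans in real time.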

11.5.1 Monitoring userinfo changes


One of the requirements for the MVP was that we were able to
somehow verify the console variables (known as CVARs) for the
client rate and the client snaps value. The information required
is stored in the player’s userinfo string and can be queried by the trap
function trap_GetUserinfo.

Initially it might seem a good idea to get the data via this function,
but the problem is that we are not sure when to query for it. There is,
however, an alternative path: checking whenever the player changes
their userinfo string.

From the cross-references to the ClientUserinfoChanged(int) function,
it is possible to notice that there are several situations in which
the client userinfo changes: whenever the client changes their
userinfo string by themselves, whenever the player changes teams, when
the player connects, etc. So this would be the optimal place to
implement our verification.
Before writing the instrumentation for this function, there are some
auxiliary functions required:

• GetUserInfo function to extract the userinfo string of a client,
defined as getUserInfo = new NativeFunction(ptr(value.address),
'void', ['int', 'pointer', 'int']);
• The InfoValueforKey function that was used in the ClientConnect
function instrumentation.
• The trap_SendConsoleCommand function to send console commands to
the server: trap_SendConsoleCommand = new
NativeFunction(ptr(value.address), 'void', ['int', 'pointer']);

Once the auxiliary functions are clear we are now ready to instrument
the ClientUserinfoChanged function.

let userinfo = Memory.alloc(MAX_INFO_STRING); // size: 1024 or game goes brrrr
const clientId:number = args[0].toInt32();

getUserInfo(clientId, userinfo, MAX_INFO_STRING);

const tmpSnaps:any = InfoValueforKey(userinfo, snapsKey);
const snaps = tmpSnaps.readUtf8String();
const tmpRate:any = InfoValueforKey(userinfo, rateKey);
const rate = tmpRate.readUtf8String();
let nameKey = Memory.allocUtf8String("name");
const tmpName:any = InfoValueforKey(userinfo, nameKey);

The purpose of the first part of our onEnter block is to retrieve the
userinfo string of the player that has triggered the action, in the
same way it was done when a player connects to the server. It is
possible to extract more information from the infostring but for now
snaps, rate, and the player name are enough.

if(parseInt(snaps) > SV_TICKRATE) {
    const clientKickString = Memory.allocUtf8String("clientkick " + clientId.toString() + "\n");
    trap_SendConsoleCommand(0, clientKickString);
}

If the user sets snaps that are not valid (in this case, a snaps number
bigger than the server’s tickrate) then it is possible to reject the player
by straight up issuing a kick command to the server, leveraging the
trap_SendConsoleCommand function to issue the clientkick command.
The next step is to identify whenever a player is trying to set a
rate different from the standard (25000).

if(parseInt(rate) != 25000) {
    const clientKickString = Memory.allocUtf8String("clientkick " + clientId.toString() + "\n");
    trap_SendConsoleCommand(0, clientKickString);
}

This way, it would not be possible for the players in the server to
set invalid snaps or rate values without being removed from the server
immediately.
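
The two checks above can be gathered into one pure function, which also makes the handling of missing values explicit. With the checks as written, a missing snaps field passes (parseInt yields NaN and NaN > tickrate is false) while a missing rate field results in a kick (NaN != 25000 is true). The sketch below is an assumption about how one might unify that behaviour; checkNetSettings, NetVerdict and the policy of kicking on malformed values are hypothetical.

```typescript
interface NetVerdict {
    kick: boolean;
    reason: string;
}

// Validates the snaps and rate fields read from a userinfo string.
// Policy assumption: missing or non-numeric values are treated as invalid.
function checkNetSettings(
    snaps: string | null,
    rate: string | null,
    svTickrate: number,
    expectedRate: number
): NetVerdict {
    const snapsValue = parseInt(snaps || "", 10);
    const rateValue = parseInt(rate || "", 10);
    if (isNaN(snapsValue) || isNaN(rateValue)) {
        return { kick: true, reason: "malformed" };
    }
    if (snapsValue > svTickrate) {
        return { kick: true, reason: "snaps" };
    }
    if (rateValue !== expectedRate) {
        return { kick: true, reason: "rate" };
    }
    return { kick: false, reason: "" };
}

// Sample inputs for a server running sv_fps 20 with the standard rate.
const verdicts = [
    checkNetSettings("20", "25000", 20, 25000),
    checkNetSettings("40", "25000", 20, 25000),
    checkNetSettings("20", "5000", 20, 25000),
    checkNetSettings(null, "25000", 20, 25000),
];
console.log(verdicts.map((v) => v.reason));
```

In the agent, a verdict with kick set to true would be followed by the same clientkick command shown in the listings above.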

11.5.2 Predicting timenudge values


The timenudge value is one of the most interesting bits of data that
the players are interested in obtaining, but obtaining this data is
not trivial. This value is a client-side controlled value and there is
no direct query for it in the binary. The only remaining path to
obtaining this bit of data is by guessing, but how?
As explained in the timenudge section, the important bit is that
it plays with client syncing/desyncing, so there might be a way to
predict a client’s timenudge value by checking how synced they are
compared to the server and compensating for their ping. After testing
out several formulas (thank you Google Spreadsheets) the following
one, which is the one implemented by the isBogusTimenudge function
later in this section, brought very good results:

timenudge = -((commandTime - currentFrame) + ping
              - highPingCompensation + 1000 / tickrate)

Before writing the instrumentation code we need to think how to
obtain this data. The commandTime is one of the fields of the
playerState struct whereas the currentFrame can be determined by
instrumenting an auxiliary function called G_RunFrame:

The G_RunFrame function is called periodically by the Quake 3 VM
(as seen in the vmMain() x-reference) and executes the current frame,
which is represented as an int in the only param that the function
receives. The player ping is read from the same structure that holds
the commandTime, and the server tickrate can either be hardcoded or
obtained by querying the server. The G_RunFrame function can be
instrumented to share the current levelTime with our instrumentation
code:

let currentFrame:number = 0;

class G_RunFrame {
    onEnter (args:NativePointer[]) {
        currentFrame = args[0].toInt32();
    }
}

The biggest issue when obtaining the values from the playerState
struct is that it is nested inside another struct named gentity_s,
and the only way to reach it is by manually calculating the offsets
within the gentity struct or by reverse engineering the code. In my
case, I have manually calculated the required offsets, but I
understand this is a troublesome task for the reader, hence the best
way to go through this is to reverse engineer them.

In this block of disassembly it is possible to obtain the offset
to the playerState ping member and, thanks to it, it is possible to
infer the commandTime playerState member, which is the first member of
the playerState struct. Once this information is gathered, the final
step is deciding where to set up our instrumentation code.
An interesting function is the player_die function, which is triggered
whenever a player is killed by another and whose params receive
gentity_t values. Having access to the gentity_t of the player is
important to be able to determine their commandTime whenever the
command is issued, as well as other data such as the player identifier
and whether they are a bot or not (as seen in previous
instrumentation). The player_die function is declared as follows:
void player_die( gentity_t *self, gentity_t *inflictor,
gentity_t *attacker, int damage, int meansOfDeath )

In this call it would be possible to track the attacker and the attacked
players as well as damage and the means of death (fall, weapon,
environment…). Let’s get our hands on instrumenting this function
and target the attacker’s timenudge value:

class PlayerDie {

    killer_clientNum:number = -1;
    killer_cmdTime:number = 0;
    killer_clientPing:number = 999;
    onEnter (args:NativePointer[]) {
        this.killer_clientNum = args[2].readInt();
        const killer_playerState_s:NativePointer = args[2].add(532);

        this.killer_cmdTime = killer_playerState_s.readPointer().readInt();
        this.killer_clientPing = killer_playerState_s.readPointer().add(524).readInt();
    }

The attacker value is stored in the third argument (args[2]) and the
first member of the gentity_t structure is the client identifier.
The pointer to the playerState structure is stored at offset 532 of
gentity_t (decimal, as used in the code above). The first member of
playerState is the client’s commandTime, whereas the client ping is
stored at offset 524 of playerState.
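
The double indirection in the listing above (read a pointer inside gentity_t, then read fields relative to it) can be illustrated with a simulated memory buffer. This is a sketch under stated assumptions: the buffer, the addresses 0x100 and 0x800 and the stored values are made up, pointers are assumed to be 32-bit little-endian as in the i386 game library, and in the agent these reads are NativePointer calls rather than DataView calls.

```typescript
// Simulated 32-bit little-endian process memory.
const memory = new DataView(new ArrayBuffer(4096));
const readInt = (address: number): number => memory.getInt32(address, true);
const readPointer = (address: number): number => memory.getUint32(address, true);

// Offsets taken from the listing above (decimal).
const GENTITY_PLAYERSTATE_PTR = 532;
const PLAYERSTATE_PING = 524;

// Lay out a fake entity at 0x100 whose playerState lives at 0x800.
const gentity = 0x100;
const playerState = 0x800;
memory.setInt32(gentity, 3, true);                          // client number
memory.setUint32(gentity + GENTITY_PLAYERSTATE_PTR, playerState, true);
memory.setInt32(playerState, 123456, true);                 // commandTime
memory.setInt32(playerState + PLAYERSTATE_PING, 48, true);  // ping

// Same pointer chase as args[2].add(532).readPointer() in the agent.
const playerStatePtr = readPointer(gentity + GENTITY_PLAYERSTATE_PTR);
const killer = {
    clientNum: readInt(gentity),
    cmdTime: readInt(playerStatePtr),
    ping: readInt(playerStatePtr + PLAYERSTATE_PING),
};
console.log(killer);
```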

    onLeave () {

        if (clientList[this.killer_clientNum]) {
            let {timenudge, bogusTimenudge} = isBogusTimenudge(this.killer_cmdTime, currentFrame, this.killer_clientPing);
            if (bogusTimenudge === true) {
                SITHagent_sendServerChatMessage("timenudge: " + timenudge.toString());

                const killer_clientKickString = Memory.allocUtf8String("clientkick " + this.killer_clientNum.toString() + "\n");
                trap_SendConsoleCommand(0, killer_clientKickString);
            }
        }
    }
} // end of class PlayerDie

The onLeave callback looks the client up in clientList to verify
whether it is a bot or not and proceeds to calculate their timenudge
value. If the timenudge value is invalid, the player will be
automatically kicked from the server. This code block displays a call
to an auxiliary function, isBogusTimenudge, that is used to calculate
the timenudge value and validate it. This function is described as
follows:

// Declared with let (not const) so that the assignment below compiles.
let highPingCompensation:number = 19;

function isBogusTimenudge(cmdTime:number, currentFrame:number, ping:number): {timenudge: number, bogusTimenudge:boolean} {
    if (ping > 50) {
        // Same value as the default; kept as in the original listing.
        highPingCompensation = 19;
    }
    const timenudge:number = ((cmdTime - currentFrame) + ping - highPingCompensation + (1000/SV_TICKRATE)) * -1;
    const bogusTimenudge:boolean = timenudge >= ping || timenudge < -7;
    return {timenudge, bogusTimenudge}
}

This function takes the ping, the currentFrame and the commandTime
of the client and calculates the timenudge using the aforementioned
formula. Values below -7 will drop the client, and positive values
equal to or over the player’s ping are invalid.
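
To get a feel for the numbers, the same calculation can be run on sample values outside the agent. This is a minimal sketch: the constants mirror the listing above, SV_TICKRATE is assumed to be 20 (the sv_fps default mentioned earlier), estimateTimenudge and isBogus are illustrative names, and the command times, frame times and pings are made up.

```typescript
const SV_TICKRATE = 20;            // assumed: sv_fps 20, the default
const HIGH_PING_COMPENSATION = 19; // constant taken from the listing above

// Same arithmetic as isBogusTimenudge, split into two small functions.
function estimateTimenudge(cmdTime: number, currentFrame: number, ping: number): number {
    return -((cmdTime - currentFrame) + ping - HIGH_PING_COMPENSATION + 1000 / SV_TICKRATE);
}

function isBogus(timenudge: number, ping: number): boolean {
    return timenudge >= ping || timenudge < -7;
}

// Made-up samples, all with a ping of 40ms.
const nearSync = estimateTimenudge(100000, 100066, 40); // -((-66) + 40 - 19 + 50) = -5
const negative = estimateTimenudge(100000, 100050, 40); // -((-50) + 40 - 19 + 50) = -21
const positive = estimateTimenudge(100000, 100120, 40); // -((-120) + 40 - 19 + 50) = 49

console.log(isBogus(nearSync, 40), isBogus(negative, 40), isBogus(positive, 40));
```

Only the first sample falls inside the accepted band; the second is below -7 and the third is above the player's ping, so both would be flagged.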
Of course, there are alternatives like instrumenting SV_UserMove or
ClientThink_Real that would allow us to get a real-time calculation
of timenudge values, but these alternatives rely heavily on server
frames and hence need to be optimized. Later on we will revisit this
idea but first, let’s optimize our current code.

11.6 Optimizing G_RunFrame calls


The G_RunFrame function is the core of our timenudge predictions,
but the issue with this function is that it is called on every server
frame and hence it affects the performance of the server itself. This is
especially noticeable if the V8 runtime is used instead of the QuickJS
one. To optimize this, it is possible to use Frida’s CModule and
only get the leveltime whenever it is required, reducing the
performance toll drastically.
The first thing required is to allocate 4 bytes to store the current
frame value:
const currentFrame:NativePointer = Memory.alloc(4);

let runFrameCModule:CModule = new CModule(`
    #include <gum/guminterceptor.h>

    extern int currentFrame;

    void onEnter (GumInvocationContext *ic)
    {
        currentFrame = (int)gum_invocation_context_get_nth_argument(ic, 0);
    }
`, {currentFrame});

The next step is instantiating a CModule object with a single onEnter
callback. Inside the callback, the only remaining step is to get the
first argument and cast it to int. currentFrame is a value shared
between the JS runtime and the CModule, so the cost of crossing into
the JS runtime is only paid when the currentFrame pointer is read from
JavaScript. This only happens when our code attempts to read the
currentFrame value, which is only when calculating the player
timenudge values. To instrument the G_RunFrame function the only
remaining step is to call Interceptor.attach:

Interceptor.attach(G_RunFramePtr, runFrameCModule);

Once this is done, the function is instrumented directly in C. The
value behind this pointer can be easily accessed by calling any of
Frida’s memory read APIs: currentFrame.readInt() returns the
currentFrame value when accessed.
This use case is a great example of how Frida allows us to do high
level stuff directly in JavaScript and also optimize whenever it is
required. It is a good practice to first write our instrumentation in
JavaScript and, whenever a hotspot is identified, research the best
way to optimize it.

11.6.1 Persistence across map changes


So far we have a good enough (for starters) anti-cheat; however,
there is an issue with it: whenever there is a map change, the game
library gets reloaded and with it our intercepted functions are lost.
Internally, the server sends a syscall to vmMain to shut down
the game, which triggers the G_ShutdownGame function.
G_ShutdownGame receives a single int argument telling it whether to
restart (switch maps) or completely shut down the server.

To persist across map changes and map restarts, the best approach is
to hook this function and re-enable our instrumentation once the
server has restarted (in this case, reloaded the libraries).

class GShutDownGame {
    onLeave() {
        setTimeout(hookJampgameExports, 3000);
    }
}

This can be achieved by delaying the call to the function that
instruments the game exports by a couple of seconds (the example above
waits 3 seconds; usually 2 is more than enough).
Once this has been achieved, our anti-cheat is persistent across map
changes.
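
As an alternative to a fixed delay, the re-hook could poll until the reloaded library is available. The following is only a sketch of that idea, not what the original agent does: rehookWhenReady is a hypothetical helper, and the readiness probe and scheduler are passed in as parameters so that the logic can be exercised without a running server. In an agent, the scheduler could be setTimeout and the probe would need to detect the newly loaded library rather than the one that is about to be unloaded.

```typescript
// Polls `isReady` and runs `hook` once it reports true; gives up after
// `attemptsLeft` retries. `schedule` has the same shape as setTimeout.
function rehookWhenReady(
    isReady: () => boolean,
    hook: () => void,
    schedule: (callback: () => void, delayMs: number) => void,
    attemptsLeft: number = 10,
    delayMs: number = 500
): void {
    if (isReady()) {
        hook();
        return;
    }
    if (attemptsLeft > 0) {
        schedule(
            () => rehookWhenReady(isReady, hook, schedule, attemptsLeft - 1, delayMs),
            delayMs
        );
    }
}

// Simulation: the library becomes "ready" on the third probe and the fake
// scheduler runs callbacks immediately instead of waiting.
let probes = 0;
let hookRuns = 0;
rehookWhenReady(
    () => ++probes >= 3,
    () => { hookRuns += 1; },
    (callback) => callback()
);
console.log(probes, hookRuns);
```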

11.6.2 Conclusions
This is only a ‘simple’ proof of concept anti-cheat, but the main idea
behind this development is to demonstrate how many things are
possible by using the Frida toolkit. In most scenarios it will be used
to bypass or extract information from an application, but it is also
possible to use it to build around an existing closed binary. This proof
of concept has also helped in demonstrating how it is possible to
optimize our code after identifying bottlenecks.
12. Resources
Resources and references that have helped write this handbook or
are just useful.

• frida.re - the official Frida website¹
• The Frida API Bible²: Official resource for the Frida JavaScript
API. Keep this page bookmarked.
• Frida’s Telegram Channel³
• Awesome Frida⁴: A curated list of awesome projects, libraries,
and tools powered by Frida.

Technical concepts

• Anatomy of a code tracer⁵: Explains how Frida’s Stalker engine
works in the background
• Marcosatti’s dynamic recompilation guide⁶
• Swift ABI⁷

Scripts and code examples

• Frida CodeShare repository⁸: Repository of snippets that are
usable with Frida’s command line.
• iddoeldor’s frida snippets⁹: Excellent repository, with several
useful and practical examples.
¹https://frida.re
²https://frida.re/docs/javascript-api/
³https://t.me/fridadotre
⁴https://github.com/dweinstein/awesome-frida
⁵https://medium.com/@oleavr/anatomy-of-a-code-tracer-b081aadb0df8
⁶https://github.com/marcosatti/Dynarec_Guide
⁷https://github.com/apple/swift/blob/main/docs/ABI/RegisterUsage.md
⁸https://codeshare.frida.re
⁹https://github.com/iddoeldor/frida-snippets

• Android-related Frida snippets¹⁰: Repository of rocco8620 with
various Frida code examples.
• @enovella’s r2frida practical examples¹¹: Practical examples
using r2frida.

Tutorials

• R2Frida for noobs¹²: Contains various tutorials on reverse
engineering and r2frida. Generally his resources are really good.
• r2frida’s wiki¹³: Wiki with examples on how to use the r2frida
plugin.
• entdark’s blogposts¹⁴: Blogposts mostly about Frida and with
practical examples (shameless plug).

¹⁰https://github.com/rocco8620/useful-android-frida-snippets
¹¹https://github.com/enovella/r2frida-wiki
¹²https://bananamafia.dev/post/r2frida-1/
¹³https://r2wiki.readthedocs.io/en/latest/radare-plugins/frida/
¹⁴https://www.entdark.net/search/label/frida
