
Reverse engineering and malware analysis

COMP7905A – 2023/24
Behavioral Analysis
Today's Agenda
• Learning malware analysis
• What is malware analysis, and why do it? Who is a malware analyst?
• Type of skills used for malware analysis
• Malware lab environment and malware sources
• Basic code check
• Windows binary (PE) and Function calls
• Windows environment tools
• Linux environment tools
• Behavioral analysis
• Artifacts to collect
• Tools
Learning Malware Analysis
- Why malware analysis? What is malware analysis?
Why is malware analysis important?
Malware is used as a component of many cyber attacks
Most of these cyber-attacks use malicious
software (also called malware) to infect
their targets. Knowledge, skills, and tools
required to analyze malicious software
are essential to detect, investigate, and
defend against such attacks.
Malicious Software == Malware

• From anywhere
• In any form
• Can affect any OS
• Can target any victim
Malware analysis: technical hands-on course
• Our proposed method is iterative and recursive (spiral of analysis), alternately using
dynamic (behavioral) and static (code) analysis techniques to extract the full functionality of
the executable.
• Run malware inside a virtual machine to monitor how it interacts with the Windows OS.
• Examine the essential forensic artifacts/traces identified during the execution of malware.
• I will look at the packets collected
• I will examine the assembly code
• However, we can learn more when this is integrated with network forensics and memory forensics.
Questions to be answered
• How was the malware initiated?
• What processes were created?
• Which files were linked to the malicious processes?
• Which APIs were called and which DLLs were accessed?
• What persistence mechanism was used?
• What network connections were established?
• Is C2 involved? DNS or IP addresses?
• What messages were transmitted?
• Are the connections encrypted?
• What category is this malware?
• When were these files compiled?
• Is it packed or obfuscated?
• What imports does this malware use?
• Any hints of the malware’s functionality?
Job: Malware Analysts
Job Descriptions: Threat Analysts
This candidate is expected to have these qualifications:
• Bachelor’s degree or equivalent and experience in analyzing malware
• Must be knowledgeable of operating system internals
• Proactive and self-motivated
• Be able to work in an environment with little supervision
• Knowledge and experience installing and configuring sandbox environments (Cuckoo Sandbox)
• Knowledge of Computer Forensics
• Experience with:
• Software development with programming languages: C, C++, Java, Python, Shell scripting
• Virtual environments (VMware or VirtualBox)
• Malware reverse engineering (static and behavioral analysis tools & techniques)
• Network traffic analysis (Pcap Analysis)
• Memory Forensics (Volatility)
• Writing regular expressions
• Writing rules for malicious software and their network traffic (e.g. using YARA and/or regular expressions)
• Threat Research

• General Responsibilities
• Threat research and malware reverse engineering
• Applying threat research and malware reverse engineering to the analysis of attack incidents
• Software development (for analyzing malware)
• Encoding and encryption algorithm analysis
• Network traffic analysis
• Memory Analysis
• Detection rule writing (regular expressions, YARA, etc.)
Malware Analysis / Write up Template
Learning Malware Analysis
- Type and methodology of malware analysis
Malware Analysis Types
• Dynamic analysis
• IOCs?
• Coding
• File formats
• OS Internals
• Reverse engineering
• Anti-analysis
• Memory forensics
Methodology: A Spiral of Malware Analysis
• First proposed by Lenny Z. (2007), then further extended by Murry and Andrew (2010).
• An iterative and recursive methodology
• Alternately using behavioral and code analysis techniques to extract the key functionalities of the malicious executable.
• A Spiral of Analysis: collect, analyze, review feedback (influences), re-analyze (collect further traces)
Learning Malware Analysis
- Malware lab environment and malware sources
Your personal Malware Lab Environment
Virtualization Software
• Basic hardware requirements for Host:
• X64-compatible 2.4 GHz CPU, 8GB RAM, 60GB hard drive space, and one of VMware Workstation, VMware Fusion or VirtualBox
• You can check out this document and watch the demo videos for further information
• VirtualBox
• Free
• VMware
• Player (No snapshot function)
• Workstation for Windows
• Fusion for Mac OSX
VM Images
• Windows 10 Build 18363
• CPU: Enable Virtualization Support from BIOS
• RAM: 1 GB Free
• HDD: > 60GB

• Linux REMnux VM
• CPU: Enable Virtualization Support from BIOS
• RAM: 1 GB Free
• For Cuckoo Sandbox (Recommend 2+ GB Free)
• HDD: > 80GB
Basic Code Check
- PE, Life of binaries, CreateProcess
Tools & Techniques
• File type identification (file or FileAlyzer)
• Fingerprint by hashing (md5 or sha1)
• Strings (ASCII and Unicode: -a -el -td)*
• Packed and obfuscated malware (PEiD)
• Identify import libraries and API calls (PE - IAT)
• Identify export functions (PE - EAT)
• Check online sandboxes (such as VirusTotal or Hybrid-analysis) by hash; do not upload the binary
• Offline malware scans:
• PEStudio v8.97 | pe-bear v0.5.3.1
• Simple antivirus scanning (clamav)
• APT scanner Thor-Lite with YARA
*On 24/10/14, Michal Zalewski published a warning (CVE-2014-8485) that running “strings” on an untrusted file may expose the analyst to attack.
http://lcamtuf.blogspot.hk/2014/10/psa-dont-run-strings-on-untrusted-files.html
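The hashing step above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name fingerprint is hypothetical. It reads the sample in chunks so that large files do not need to fit in memory, and it never executes or parses the sample.

```python
import hashlib


def fingerprint(path, chunk_size=65536):
    """Return MD5, SHA-1 and SHA-256 hex digests of a file, read in chunks."""
    digests = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


if __name__ == "__main__":
    import sys

    # Usage: python fingerprint.py sample1.exe sample2.dll
    for sample in sys.argv[1:]:
        for name, value in fingerprint(sample).items():
            print(f"{name:7}{value}  {sample}")
```

The resulting hash (not the binary) is what would be searched on an online sandbox.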
PE Format
PE (Portable Executable) Format
• Native format for: Win32 executable, 32-bit DLLs, COM files, OCX
controls, Control Panel Applets (.CPL files), .NET executables and
kernel mode drivers
• Divided into sections (section names are only a convention; the OS ignores them)
• .text - Executable Code Section
• .data (.rdata or .bss) - Data Sections (global or static variables, strings or constants)
• .rsrc - Resources Section (menu, bitmap, dialog, strings, icon, version info)
• .edata - Export Data Section
• .idata - Import Data Section
• .debug - Debug Section
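To make the section layout concrete, here is a small sketch that walks the PE headers by hand with Python's struct module and lists the section table. The helper names (parse_pe_sections, build_demo_pe) and the synthetic header-only image are assumptions for illustration; in practice a tool such as objdump or PeStudio does this job. The same header also carries the compile timestamp.

```python
import struct
from datetime import datetime, timezone

SECTION_FORMAT = "<8sIIIIIIHHI"  # IMAGE_SECTION_HEADER, 40 bytes


def parse_pe_sections(data):
    """Return (compile_time, sections) parsed from raw PE bytes."""
    if data[:2] != b"MZ":
        raise ValueError("missing MZ signature")
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)  # e_lfanew
    if data[pe_offset:pe_offset + 4] != b"PE\x00\x00":
        raise ValueError("missing PE signature")
    # IMAGE_FILE_HEADER (20 bytes) follows the 4-byte PE signature.
    (_machine, section_count, timestamp, _symtab, _symcount,
     optional_size, _flags) = struct.unpack_from("<HHIIIHH", data, pe_offset + 4)
    # The section table starts right after the optional header.
    table = pe_offset + 4 + 20 + optional_size
    sections = []
    for index in range(section_count):
        fields = struct.unpack_from(SECTION_FORMAT, data, table + index * 40)
        sections.append({
            "name": fields[0].rstrip(b"\x00").decode("ascii", "replace"),
            "virtual_size": fields[1],
            "virtual_address": fields[2],
            "raw_size": fields[3],
            "raw_offset": fields[4],
            "characteristics": fields[9],
        })
    compiled = datetime.fromtimestamp(timestamp, tz=timezone.utc)
    return compiled, sections


def build_demo_pe():
    """Build a tiny header-only PE image (hypothetical test data, not loadable)."""
    dos = b"MZ" + b"\x00" * 58 + struct.pack("<I", 64)
    coff = struct.pack("<HHIIIHH", 0x14C, 2, 0, 0, 0, 0, 0x102)
    text = struct.pack(SECTION_FORMAT, b".text", 0x1000, 0x1000, 0x200, 0x200,
                       0, 0, 0, 0, 0x60000020)
    rsrc = struct.pack(SECTION_FORMAT, b".rsrc", 0x800, 0x2000, 0x200, 0x400,
                       0, 0, 0, 0, 0x40000040)
    return dos + b"PE\x00\x00" + coff + text + rsrc


if __name__ == "__main__":
    compiled, sections = parse_pe_sections(build_demo_pe())
    print("compiled:", compiled.isoformat())
    for section in sections:
        print(f"{section['name']:8} VA={section['virtual_address']:#x} "
              f"raw={section['raw_size']:#x}")
```

On a real sample, the bytes would come from open(path, "rb").read() instead of build_demo_pe().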
Example: iauzzy (unpacked)
objdump -p iauzzy.exe
The Exports Section
• Exported function names and variables are called “symbols”. Each symbol has an ordinal number and may also have an ASCII name associated with it
• Symbols can be imported by name or by ordinal
• The export directory points to an array called the Export Address Table (EAT), an array of function pointers, each holding the address of an exported function (or symbol)
• DLLs are modules that contain exported functions and data. A DLL is loaded at runtime by its calling module (EXE or DLL). When a DLL is loaded, it is mapped into the address space of the calling process.
The Imports Section
• The counterpart of the Exports Section
• There is one descriptor for each imported module (typically a DLL)
• Each descriptor points to two nearly identical arrays: the Import Address Table (IAT) and the Import Name Table (INT)
• Each element of these tables is a data structure that holds one of:
• Function // memory address of the imported function
• Ordinal // Ordinal value of imported API
• AddressOfData // RVA* to an imported API
• ForwarderString // RVA* to a forwarder string
• Delay-load resolves the library function at run time through the LoadLibrary and GetProcAddress API calls. Delay-load data is pointed to by the IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT entry in the data directory
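As an illustration of how one table element is read on disk, the sketch below decodes a 32-bit import thunk. In a 32-bit PE, the top bit (IMAGE_ORDINAL_FLAG32, 0x80000000) marks an import by ordinal; otherwise the value is an RVA to the name entry, and a zero value terminates the array. The function name describe_thunk32 is hypothetical.

```python
IMAGE_ORDINAL_FLAG32 = 0x80000000


def describe_thunk32(value):
    """Classify one 32-bit import thunk as read from the file on disk."""
    if value == 0:
        return ("end", None)               # a null thunk terminates the array
    if value & IMAGE_ORDINAL_FLAG32:
        return ("ordinal", value & 0xFFFF)  # import by ordinal (low 16 bits)
    return ("name_rva", value)              # RVA of the hint/name entry


if __name__ == "__main__":
    for raw in (0x00002A10, 0x80000010, 0x00000000):
        print(f"{raw:#010x} -> {describe_thunk32(raw)}")
```

Once the loader has bound the imports, the IAT entries no longer hold these values; they hold the resolved function addresses.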
The Resources Section
• Contains raw data such as icons, bitmaps and dialogs
• The data directory contains the RVA and size
• The resources are organized like a file system (with directory and file nodes)
• Each directory can be named by an ID value
• Malware authors frequently keep encrypted code in this section
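One common way to spot encrypted or compressed data in a section is its byte entropy. The sketch below computes Shannon entropy in bits per byte (0 to 8). Treating values close to 8 as suspicious is a rule of thumb rather than a rule from these slides, and the function name is hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Return the Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    padding = b"\x00" * 1024           # constant data: lowest entropy
    uniform = bytes(range(256)) * 4    # every byte value equally likely
    print(f"padding: {shannon_entropy(padding):.2f} bits/byte")
    print(f"uniform: {shannon_entropy(uniform):.2f} bits/byte")
```

Applied to the raw bytes of .rsrc (or any other section), a high value suggests the content deserves a closer look; legitimate compressed images can score high as well.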
Life of binaries
[Figure: life of binaries. Source → compiler → object file → linker (with static libraries, dynamic libraries and symbols) → binaries; binaries → loader → memory (user code, DLLs, dynamic libraries, kernel, drivers)]
Call Stack

Mark R. & Aaron M. (2011) Windows Sysinternals Administration Reference. Microsoft Press
Function Call, Memory Stack and CPU Instructions
• During a function call, the callee function might require one or more parameters. A parameter is a variable of a specific type, such as a string, a (signed | unsigned) integer, or a pointer to an array (of strings)
• The caller function puts (pushes) the parameter(s) onto the stack of the running thread before entering (calling) the callee
• Two main instructions put parameters onto, or remove them from, the memory stack (inside the running thread): PUSH and POP
• The PUSH instruction puts the data on “top” of the stack and the POP instruction removes that value from the top of the stack.
• Because of the way the stack operates, it is considered a LIFO data structure
• Each thread has its own memory stack, placed so that it does not overlap the code and growing in the reverse direction of virtual memory addresses. Therefore a PUSH (i.e. adding 4 bytes to the stack in a 32-bit environment) results in a 4-byte decrease of the virtual memory address held in the stack pointer ESP
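The PUSH/POP behavior described above can be modeled with a toy 32-bit stack. This is only a simulation for intuition: the class name Stack32 and the starting ESP value are made up, and real CPUs do this in hardware.

```python
class Stack32:
    """Toy model of a 32-bit thread stack that grows toward lower addresses."""

    def __init__(self, top=0x0012FFC0):
        self.esp = top       # stack pointer (hypothetical starting address)
        self.memory = {}     # address -> 32-bit value

    def push(self, value):
        self.esp -= 4                          # PUSH: ESP decreases by 4 bytes
        self.memory[self.esp] = value & 0xFFFFFFFF

    def pop(self):
        value = self.memory.pop(self.esp)      # POP: read the value at ESP
        self.esp += 4                          # then ESP increases by 4 bytes
        return value


if __name__ == "__main__":
    stack = Stack32()
    print(f"start     ESP={stack.esp:#010x}")
    for argument in (0x00000002, 0x00000001):  # caller pushes two parameters
        stack.push(argument)
        print(f"push {argument:#x}  ESP={stack.esp:#010x}")
    print(f"pop  {stack.pop():#x}  ESP={stack.esp:#010x}")  # last in, first out
```

The last value pushed is the first one popped, which is the LIFO property in action.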
Create process
Life of CreateProcess
• Stage 1: Convert and validate parameters and flags
• Stage 2: Open image (EXE) and create section object
• Stage 3: Create Windows process object
• Stage 4: Create Windows thread object
• Stage 5: Perform Windows subsystem specific process initialization
• Stage 6: Start execution of the initial thread
• Return to caller
• Windows subsystem: Set up for new process and thread
• New process: Final process/image initialization, then start execution at entry point to image
Tracing NotePad Startup
Basic Code Check
- Tools and Techniques
Basic Code Check with Windows
• Analysis tools in Windows
• FileAlyzer
• PEiD
• FileInsight
• BinText
• Dumpbin
• Strings
• PeStudio
FileAlyzer
• Analysis tools in Windows
• FileAlyzer
• Right Click the file -> Analyze file with FileAlyzer 2

• MD5: 97E17AD0883F8B44CF4869C4E0ED4E3C
• SHA-1: B51A237BB4F682473C772C7FFD6C6A890CDF6AB1
FileAlyzer (cont.)

• MZ
• PE Header and Sections
• Import Libraries and Functions
FileAlyzer: General
FileAlyzer: IAT
FileAlyzer: Classification

Requires an Internet connection; not encouraged


PEiD
Check if packed, IAT, EAT
FileInsight
BinText

PeStudio

https://www.youtube.com/live/i2-NQ_73V50?si=uFwruY2EbVzk-o3l
Strings2
Basic Code Check
- Linux environment tools
Basic Code Check with Linux
• Analysis tools in Linux / REMnux
• file
• objdump
• xxd
• strings
• nm
• ldd

• Analysis tools for macOS
• otool
• strings
file & objdump
• $ file withme_vbox.exe
• $ objdump -h withme_vbox.exe
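The file command identifies a sample largely from "magic" bytes at the start of the file. The sketch below shows that idea with a tiny, hand-picked signature table; it is an illustration only and far less thorough than the real file utility. The names SIGNATURES and identify are hypothetical.

```python
# A few well-known leading signatures; the real `file` database has thousands.
SIGNATURES = [
    (b"MZ", "MZ executable (DOS/Windows PE candidate)"),
    (b"\x7fELF", "ELF executable or shared object"),
    (b"%PDF", "PDF document"),
    (b"PK\x03\x04", "ZIP container (also DOCX, JAR, APK)"),
]


def identify(data):
    """Return a coarse file type from the leading bytes of a sample."""
    for magic, label in SIGNATURES:
        if data.startswith(magic):
            return label
    return "unknown"


if __name__ == "__main__":
    import sys

    # Usage: python identify.py withme_vbox.exe
    for path in sys.argv[1:]:
        with open(path, "rb") as handle:
            print(f"{path}: {identify(handle.read(16))}")
```

Checking the content rather than the extension matters because malware is often delivered with a misleading file name.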
xxd & strings
• $ xxd -g1 withme_vbox.exe | less
• $ strings -a withme_vbox.exe > withme_vbox_ascii.txt
• $ strings -el withme_vbox.exe > withme_vbox_unicode.txt
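The two strings runs above (ASCII, then 16-bit little-endian) can be approximated with regular expressions. This is a rough sketch, not a reimplementation of strings: it only looks for runs of printable ASCII characters, either as single bytes or as UTF-16LE code units. The function names and the minimum length of 4 are choices made for this example.

```python
import re


def ascii_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]


def utf16le_strings(data, min_len=4):
    """Return runs of printable ASCII characters stored as UTF-16LE."""
    pattern = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % min_len)
    return [match.group().decode("utf-16-le") for match in pattern.finditer(data)]


if __name__ == "__main__":
    # Hypothetical blob mixing both encodings with binary noise.
    blob = (b"\x00\x01hello.exe\x00\xff"
            + "cmd.exe".encode("utf-16-le")
            + b"\x00\x00ab")
    print("ASCII  :", ascii_strings(blob))
    print("UTF-16 :", utf16le_strings(blob))
```

Because it only pattern-matches raw bytes and never parses the file format, this approach sidesteps the class of parser issue flagged for strings in the footnote earlier.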
Behavioral Analysis
Behavioral analysis
• Behavioral analysis is any examination performed after executing malware: dynamic monitoring of the running malware, whose behavior is more difficult to conceal
• Behavioral analysis is typically performed after preliminary checking of malware (aka ‘basic code check’ or ‘basic static analysis’)
• [However] basic static analysis [will usually] reach a dead end, whether due to obfuscation, packing or the analyst having exhausted the available … techniques
• A substantial number of free tools are available for collecting, monitoring and run-time analysis of malware
• Behavioral analysis also helps the analyst extract the binary or payload if the malware is a generic dropper
• However, dynamic techniques do have their limitations, because not all code paths may execute when a piece of malware is run
Workflow of Behavioral Analysis
• Revert VMs to a clean snapshot
• Ensure it is an isolated environment
• Launch analysis tools to monitor:
• File system
• Network activity
• Process
• Registry
• Execute malware with Admin privilege
• Stop the analysis tools and collect information
• Analyze the results
Sandboxing approach
• A sandbox is a security mechanism for running untrusted programs in a safe
environment without fear of harming “real” systems
• A malware sandbox enables users to track how potential malware applications
execute, what system changes were made, and what network traffic was generated,
without risking loss of data or compromising a network
• There are many malware sandboxes, such as:
• Commercial solutions: Norman SandBox (BlueCoat), CWSandbox (ThreatTrack) and JoeSandbox
• Free: VirusTotal, Cuckoo and Malwr
• Good for triage, but you need to submit your malware to the sandbox hosting websites
• An in-house sandbox solution is extremely expensive and not worth purchasing unless you have a lot of malware to analyze
Any.run
• https://app.any.run
Install behavioral tools
• Autoruns (v13.91): List auto-start locations
• Process Explorer (v16.21): Display processes, threads, DLLs loaded
• Process Monitor (v3.50): Log files, registry, network, process, thread changes
• ListDLLs (v3.2): Display DLLs loaded
• TCPView (v3.05): Lists active TCP/UDP endpoints
• VMmap (v3.21): Display of a process’ virtual and physical memory usage
• Winobj (v.2.22): Display Windows’ Object Manager namespace
• BinText (v3.03): Text extractor
• Regshot (v1.8.2): Shows registry and file changes between snapshots
• CaptureBAT (v2.0.0): a client honeypot
• HandleDiff (v.0.2): Detect changes to the handle tables of process
• Wireshark (2.6.2): Network Protocol Analyzer and packet capture utility
• ProcDot (1.22 Build 57): Use Process Monitor and PCAP-log to generate a graph
• Fakenet (v 1.0): FakeNet to simulate DNS and various web services
• Noriben (v 1.7.2): Noriben is a Python-based malware analysis sandbox
Autoruns – Create the baseline
• Create an Autoruns baseline
Process Explorer – display network information
• Configure Process Explorer to monitor the real-time network activity of processes
• View -> Select Columns… > Process Network
Process Monitor – Start Capture
• Launch the Process Monitor and Start the capture
procmon: Load filters
• After the filters are imported, they must be loaded before they take effect on the output screen
procmon: Highlight list
• CreateFile
• CreatePipe
• Load Image
• Process Create
• Process Exit
• Process Start
• RegCreateKey
• RegSetValue
• SetAllInformationFile
• SetBasicInformationFile
• SetPipeInformation
• StartDevice
• StopDevice
• TCP Connect
• TCP Receive
• TCP Send
• Thread Create
• Thread Exit
• WriteFile
10/13/23
CaptureBAT
• CaptureBAT records local processes’ interactions
with their environment. It is able to monitor the
state of a system during the execution of
applications
• CaptureBAT’s logs tend to be less noisy than
those created by other similar tools like Process
Monitor. This is because CaptureBAT comes with
filters that eliminate the majority of standard,
non-malicious activities from the logs.
• If CaptureBAT is used with the “-c” parameter, it
will capture any files deleted in the background,
allowing the user to inspect and even restore those
files.
• Using CaptureBAT with the “-n” parameter tells
the tool to capture network traffic, like a sniffer
would, saving the result into a local .cap file.
CaptureBAT – All in one
• Start CaptureBAT with the command below:
• "C:\Program Files\Capture\CaptureBAT.exe" -c -n -l .\logs\CaptureBatLog.txt

• Collect pcap and log files from C:\Program Files\Capture after analysis
• capture_ddmyyyy_xxx.zip (you will find the pcap file inside the zip)
• logs\*.*
• Press “Enter” to stop capture !!!
• Closing the CaptureBAT window or pressing Ctrl+C will lose all the results!
Noriben: Seaweed Lunch Box
• By Brian Baskin @bbaskin (github.com/Rurik)
https://github.com/Rurik/Noriben
System Monitor (Sysmon)
• A Windows system service and device driver that, once
installed, remains resident across system reboots to
monitor and log system activity to the Windows event
log
• It provides detailed information about process
creations, network connections, and changes to file
creation time
• The events can be collected with Windows Event
Collection or a SIEM for subsequent analysis to identify
malicious or anomalous activity
• Note that Sysmon does not provide analysis of the
events it generates, nor does it attempt to protect or
hide itself from attackers.
Using Sysmon for malware investigation
• Sysmon from Sysinternals is a substantial host-level tracing tool that can help in detecting advanced
threats on your network. In contrast to common Anti-Virus / Host-based intrusion detection system
(HIDS) solutions, Sysmon performs deep monitoring of system activity and logs high-confidence indicators of
advanced attacks.
• Sysmon monitors the following activities:
• Process creation (with full command line and hashes)
• Process termination
• Network connections
• File creation timestamp changes
• Driver/image loading
• Create remote threads
• Raw disk access
• Process memory access
• ProcessTampering (Process image change: Process Herpaderping)
• Download the xml files from:
• Sophos.xml or
• SwiftOnSecurity.xml
Source: https://support.sophos.com/support/s/article/KB-000038882 and
https://github.com/SwiftOnSecurity/sysmon-config
Default Installation (1/2)
• The default configuration [only -i
switch] includes the following
events:
• Process create (with SHA1)
• Process terminate
• Driver loaded
• File creation time changed
• RawAccessRead
• CreateRemoteThread
• Sysmon service state changed
Source: https://support.sophos.com/support/s/article/KB-000038882
Default Installation (2/2)
• As the screenshot shows, the [-n] switch
configures Sysmon to log
network connections as well
• The Sysmon logs can be viewed
in Event Viewer:
• Applications and Services
Logs/Microsoft/Windows/Sysmon/
Operational

Source: https://support.sophos.com/support/s/article/KB-000038882
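Events saved from Event Viewer as XML can be post-processed with a short script. The sketch below uses Python's xml.etree.ElementTree to pull the Event ID and the named data fields out of one event. The sample event is hypothetical and trimmed to a few fields; Event ID 1 is Sysmon's process creation event, and the PIDs and paths are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by Windows event XML.
NS = {"e": "http://schemas.microsoft.com/win/2004/08/events/event"}

# Hypothetical, trimmed process-creation event (Sysmon Event ID 1).
SAMPLE_EVENT = r"""<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID>1</EventID>
    <Channel>Microsoft-Windows-Sysmon/Operational</Channel>
  </System>
  <EventData>
    <Data Name="ProcessId">3856</Data>
    <Data Name="Image">C:\Users\lab\AppData\Local\Temp\child.exe</Data>
    <Data Name="CommandLine">child.exe -install</Data>
    <Data Name="ParentProcessId">3260</Data>
    <Data Name="ParentImage">C:\Users\lab\Desktop\sample.exe</Data>
  </EventData>
</Event>"""


def parse_sysmon_event(xml_text):
    """Return (event_id, fields) for a single event rendered as XML."""
    root = ET.fromstring(xml_text)
    event_id = int(root.findtext("e:System/e:EventID", namespaces=NS))
    fields = {
        data.get("Name"): (data.text or "")
        for data in root.findall("e:EventData/e:Data", NS)
    }
    return event_id, fields


if __name__ == "__main__":
    event_id, fields = parse_sysmon_event(SAMPLE_EVENT)
    print(f"Event {event_id}: {fields['ParentImage']} "
          f"(PID {fields['ParentProcessId']}) started {fields['Image']} "
          f"(PID {fields['ProcessId']})")
```

Linking ProcessId to ParentProcessId across events is one way to rebuild the process tree that the malware created.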
Custom Installation
• We can also install Sysmon with a custom
configuration by specifying an XML file
during or after installation.
• For example, there are times when we are
interested in the DNS queries made by a
certain endpoint and the executable behind
those requests. Sysmon gained the ability
to log DNS queries in v10, but it is only
supported on Windows 10 and later.
• By default, Sysmon does not log DNS requests.
• Fresh installation of Sysmon:
• sysmon -accepteula -i C:\Sophos\Sophos_Sysmon.xml

• If Sysmon is already installed:
• sysmon.exe -c C:\Sophos\Sophos_Sysmon.xml
Source: https://support.sophos.com/support/s/article/KB-000038882
Behavioral Analysis
- The demo
Execute the malware
• Execute the malware with Admin privilege
• Observe the behavior of the malware in Process Explorer
Strings in memory
• Even if the malware is obfuscated
by a packer, it is often possible to
read its strings from memory.
DLLs and Handles
View > Lower Pane View > DLLs / Handles
TCP/IP and Threads information
Parent Process: PID 3260
• TCP/IP of parent process
• Threads of parent process
Image Path and Parent PID
Child Process: PID 3856, PPID 3260
• Command line of PID 3856
• Threads of child process
• TCP/IP of child process


Fakedns results
• REMnux fakedns shows…
Kill the malware

Can you kill the malware?


Killed the Process = Safe?
Autoruns - Check persistence
Compare current status with the captured baseline
• An item that wasn’t there last time is shown in green
• Further investigation is needed when there are new findings
• Fine-tune your analysis
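The baseline comparison that Autoruns (or Regshot) performs boils down to a set difference between two snapshots. The sketch below shows the idea on plain dictionaries. The snapshot contents, key names and the function name compare_to_baseline are hypothetical examples, not Autoruns output.

```python
def compare_to_baseline(baseline, current):
    """Return (added, removed, changed) entries between two snapshots."""
    added = {key: current[key] for key in current.keys() - baseline.keys()}
    removed = {key: baseline[key] for key in baseline.keys() - current.keys()}
    changed = {
        key: (baseline[key], current[key])
        for key in baseline.keys() & current.keys()
        if baseline[key] != current[key]
    }
    return added, removed, changed


if __name__ == "__main__":
    # Hypothetical auto-start entries: location -> command line.
    run_key = r"HKCU\Software\Microsoft\Windows\CurrentVersion\Run"
    baseline = {
        run_key + r"\OneDrive": r"C:\Program Files\OneDrive\OneDrive.exe",
    }
    current = {
        run_key + r"\OneDrive": r"C:\Program Files\OneDrive\OneDrive.exe",
        run_key + r"\updater": r"C:\Users\lab\AppData\Local\Temp\child.exe",
    }
    added, removed, changed = compare_to_baseline(baseline, current)
    for location, command in added.items():
        print(f"NEW  {location} -> {command}")
```

Entries in the added set correspond to the green items in Autoruns and are the first candidates for a persistence mechanism.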

Process Monitor – Export Data
Select “All events”
Save in two different formats:
- CSV (used in ProcDOT)
- PML (native format)
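The exported CSV can also be filtered outside Process Monitor. The sketch below keeps only rows whose operation is on a highlight list, optionally for one process. It assumes the default Procmon columns ("Process Name", "PID", "Operation", "Path", "Result", "Detail"); the sample rows and the function name are hypothetical. When reading a real export from disk, opening it with encoding="utf-8-sig" avoids trouble with a leading byte-order mark.

```python
import csv
import io

HIGHLIGHT_OPERATIONS = {
    "CreateFile", "WriteFile", "Load Image", "Process Create", "Process Start",
    "Process Exit", "RegCreateKey", "RegSetValue", "TCP Connect", "TCP Send",
    "TCP Receive", "Thread Create", "Thread Exit",
}

# Hypothetical rows shaped like a default-column Procmon CSV export.
SAMPLE_CSV = r'''"Time of Day","Process Name","PID","Operation","Path","Result","Detail"
"10:00:01.0000000 AM","sample.exe","3260","Process Create","C:\Users\lab\AppData\Local\Temp\child.exe","SUCCESS","PID: 3856"
"10:00:01.1000000 AM","sample.exe","3260","RegQueryValue","HKLM\Software\Example","SUCCESS",""
"10:00:02.0000000 AM","child.exe","3856","RegSetValue","HKCU\Software\Microsoft\Windows\CurrentVersion\Run\updater","SUCCESS","Type: REG_SZ"
"10:00:03.0000000 AM","child.exe","3856","TCP Connect","lab-pc:49712 -> 192.0.2.10:443","SUCCESS",""
"10:00:04.0000000 AM","explorer.exe","1500","ReadFile","C:\Windows\explorer.exe","SUCCESS",""
'''


def highlight_events(csv_text, process_name=None):
    """Return rows whose Operation is on the highlight list."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Operation"] not in HIGHLIGHT_OPERATIONS:
            continue
        if process_name and row["Process Name"].lower() != process_name.lower():
            continue
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in highlight_events(SAMPLE_CSV):
        print(f"{row['Process Name']:12} {row['Operation']:16} {row['Path']}")
```

Passing process_name="child.exe" narrows the output to the child process only, which is useful once the parent and child PIDs are known.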
Analysis result by Process Monitor
ProcDOT – Generate the Graph

Procmon: Exported as CSV format from Process Monitor


Windump: pcap file captured by
- Wireshark
- CaptureBAT
- FakeNet
- Others…

ProcDOT Result: Parent Process
Takeaways
• Behavioral analysis is any examination performed after executing malware: dynamic monitoring of the running malware, whose behavior is more difficult to conceal
• Many free tools (such as CaptureBAT, Noriben and Sysmon) are available for
collecting, monitoring, and run-time analysis of malware.
• Behavioral analysis also helps the analyst extract the binary or payload if the malware
is a generic dropper
• However, dynamic techniques are limited because not all code paths may execute
when a piece of malware is run.
• Security frameworks have started recommending the use of behavioral analysis to
monitor processes, binaries and network activities in enterprise environments to
detect malicious or hacking activities. Tools such as EDR continuously record and store
comprehensive endpoint activity data to allow analysts to hunt threats in real time.
Q&A
Capture – A behavioral analysis tool for
applications and documents
• Thousands of events are generated that would overwhelm an analyst if one would ‘‘listen’’ to
all events.
• Three event-based techniques: user-level API hooking, kernel-level API hooking, and kernel-
level callbacks.
• Some drawbacks: in particular, applications that call the kernel directly and avoid the
Win32 API cannot be monitored by user-level hooking; the kernel-level callback mechanism is the only
portable solution.
• There is one exclusion list for each monitor: FileSystemMonitor.exl, RegistryMonitor.exl, and
ProcessMonitor.exl.
• Techniques:
• CmRegistryCallback function
• PsSetCreateProcessNotifyRoutine function
• the file monitor driver is a minifilter driver
• Try to understand Fig. 5, the Capture architecture diagram
Malware Sandboxing (Build your own Sandbox)
• Compare the tools this document recommended and the tools we
used in 7905A VM
• Read the Static analysis phase (code analysis)
• Read the Dynamic analysis phase (behavioral analysis)
• YARA will be covered in our YARA lecture
