
Digital Evidence Search with

A Pattern Matching Game


The Power of Regular Expressions
What types of evidence to match?
• Personal information
  • name, phone number, email address, date of birth, zip code, SSN
• User account validation
  • username, password
• Source code
  • HTML, Java
• Network
  • website visited (URL), IP address, hex, MAC address, timestamp
• File
  • file names, file attributes
What is a regular expression (regex)?
• A special text string for describing a search pattern
  • the string is written in an expression language
• Extremely useful for extracting information from
  • text files: source code, log files
  • documents: spreadsheets, PowerPoint, Word (need to unzip them first)
  • binary strings in files
• Often used with the grep command

Example pattern: (f|F)rank
f, r, a, n, k   literal characters
|               logic: or
( | )           or relation in a group

https://jex.im/regulex/#!flags=&re=(f%7CF)rank
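
The (f|F)rank pattern can be tried out with a minimal sketch. It uses Python's re module rather than grep (an assumption made for illustration); the helper name find_frank and the sample sentence are hypothetical.

```python
import re

# (f|F)rank: a group offering f or F, followed by the literal characters r, a, n, k.
FRANK = re.compile(r"(f|F)rank")

def find_frank(text):
    """Return every substring that matches (f|F)rank, in order of appearance."""
    return [m.group(0) for m in FRANK.finditer(text)]

sample = "Frank met frank and FRANK at the frankfurter stand."
print(find_frank(sample))
```

Note that FRANK (all uppercase) is not matched, while the "frank" inside "frankfurter" is: the pattern says nothing about word boundaries.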
Download lab files
unzip the file

Verify some unzipped files


Verify the txt file. We will search the content of the file using regular expressions.
Search for a specific name “Frank”
A simple pattern of all names

Frank Xu = Frank + space + Xu

[A-Z]         one uppercase letter
[a-z]{1,10}   1-10 lowercase letters
\s+           1 or more spaces
[A-Z]         one uppercase letter
[a-z]{1,10}   1-10 lowercase letters
Match all names in a text file
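
A minimal sketch of the name pattern, written with Python's re module as a stand-in for grep. The sample text and the helper find_names are hypothetical.

```python
import re

# One uppercase letter, 1-10 lowercase letters, 1 or more spaces, then the same again.
NAME = re.compile(r"[A-Z][a-z]{1,10}\s+[A-Z][a-z]{1,10}")

def find_names(text):
    """Return all 'First Last' style matches found in text."""
    return NAME.findall(text)

sample = "contact Frank Xu or John  Smith today"
print(find_names(sample))
```

The pattern is deliberately simple: any two capitalized words in a row match, which is why negative tests are needed later.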
Word character: [a-zA-Z0-9_]
Word string: a run of consecutive word characters
Word boundary: \b
1. Before the first character in the word string
2. After the last character in the word string

Any character outside [a-zA-Z0-9_] is NOT a word character.

Example: I am 50 years old, he is 2!
With each \b marked as |: |I| |am| |50| |years| |old|, |he| |is| |2|!
(16 boundaries: one before and one after each of the 8 word strings)

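Search for two characters between any two \b


\<..\>: must start at the beginning of a word and end at the end of a word, so the match does not cross a \b and is always a word string [a-zA-Z0-9_]

\b..\b: any two boundaries, so the two characters between them may also be non-word characters (for example ", ")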
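
The difference can be checked on the example sentence. Python's re module has no \< and \> anchors, so this sketch uses \b\w\w\b as an assumed equivalent of \<..\> for two-character words.

```python
import re

text = "I am 50 years old, he is 2!"

# \b..\b: any two characters that sit between two word boundaries.
any_boundaries = re.findall(r"\b..\b", text)

# \b\w\w\b: two word characters forming a whole word (stand-in for \<..\>).
whole_words = re.findall(r"\b\w\w\b", text)

print(any_boundaries)
print(whole_words)
```

The first list contains matches such as "I " and ", " that include non-word characters; the second contains only the two-character words.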


Example: I am 50 years old, he is 2!
-w: match only whole words, [a-zA-Z0-9_]+

! is not a word character


Test the pattern in a text file
Lookbehind: (?<=foo)xxx   match xxx with a preceding string foo
Lookahead:  xxx(?=foo)    match xxx with a following string foo

The pattern has to be Perl-compatible.

Shorthand classes
\w   "word" character (letter, digit, or underscore): [a-zA-Z0-9_]
\d   digit
\s   whitespace (space, tab, vtab, newline)

grep -Po: -P enables Perl-compatible regex, -o prints only the matched part
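
A short sketch of lookbehind and lookahead combined with the \d shorthand, using Python's re module (which accepts the same (?<=...) and (?=...) syntax). The sample string is made up for illustration.

```python
import re

text = "price: $42, weight: 17kg, id: 99"

# Lookbehind: digits that are preceded by a dollar sign (the sign is not part of the match).
after_dollar = re.findall(r"(?<=\$)\d+", text)

# Lookahead: digits that are followed by "kg" (the unit is not part of the match).
before_kg = re.findall(r"\d+(?=kg)", text)

print(after_dollar, before_kg)
```

Only the xxx part is returned, which mirrors what -o does when combined with -P.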
Negative lookup

Negative lookbehind: (?<!foo)xxx   match xxx without a preceding string foo
Negative lookahead:  xxx(?!foo)    match xxx without a following string foo

• A negative lookup pattern MUST be enclosed in single quotes ' ' (otherwise the shell may interpret the !)
• The length of a lookbehind string must be fixed
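
The two negative forms, and the fixed-length rule, can be sketched as follows in Python's re module. The helper compiles is hypothetical and only reports whether a pattern is accepted.

```python
import re

text = "foobar bar barfoo"

# Start offsets of every "bar" that is NOT preceded by "foo".
not_after_foo = [m.start() for m in re.finditer(r"(?<!foo)bar", text)]

# Start offsets of every "bar" that is NOT followed by "foo".
not_before_foo = [m.start() for m in re.finditer(r"bar(?!foo)", text)]

def compiles(pattern):
    """Return True if the regex engine accepts the pattern."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(not_after_foo, not_before_foo)
print(compiles(r"(?<!foo)bar"), compiles(r"(?<!fo+)bar"))
```

The last pattern is rejected because fo+ can match strings of different lengths, which Python's engine does not allow inside a lookbehind.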
Fix name mismatch problems using lookup
This is not a name!
First try (negative lookup): a name can't have numbers before it.
Second try (negative lookup): "Ave" can't be the last name. More testing may be needed.
Exclude Ave, St, and Dr from the last name
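
One way the two fixes could look, as a sketch in Python's re module. The exact lookarounds and the sample sentence are assumptions made for illustration, not the patterns from the lab.

```python
import re

NAIVE = re.compile(r"[A-Z][a-z]{1,10}\s+[A-Z][a-z]{1,10}")

STRICT = re.compile(
    r"(?<!\d\s)"             # first try: no digit plus whitespace right before the name
    r"\b[A-Z][a-z]{1,10}\s+"
    r"(?!(?:Ave|St|Dr)\b)"   # second try: the last name must not be Ave, St, or Dr
    r"[A-Z][a-z]{1,10}\b"
)

text = "Frank Xu lives at 1420 Charles St and meets John Smith on Maryland Ave."
naive_hits = NAIVE.findall(text)
strict_hits = STRICT.findall(text)
print(naive_hits)
print(strict_hits)
```

The naive pattern reports the two street names as names; the stricter one drops them. Other false positives (for example a capitalized word at the start of a sentence) would still need further negative tests.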
Match phone numbers
• 1234567890
• 123.456.7890
• 123-456-7890
• 123 456 7890
• (123)456-7890
• +11234567890

Match any 10-digit phone number (xxxxxxxxxx, where x is a digit)

1234567890

[0-9]{10}

Match any 10-digit phone number with the pattern xxx.xxx.xxxx, where x is a digit

123.456.7890

\b[0-9]{3}\.[0-9]{3}\.[0-9]{4}\b

Match both?
1234567890
123.456.7890

{0,1} makes the preceding item optional (0 or 1 occurrences)
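
A sketch of the three steps in Python's re module. The first two patterns are the ones given above; the combined pattern EITHER is an assumption about how {0,1} is applied to each dot.

```python
import re

TEN_DIGITS = re.compile(r"[0-9]{10}")
DOTTED = re.compile(r"\b[0-9]{3}\.[0-9]{3}\.[0-9]{4}\b")
# Assumed combination: each dot is made optional with {0,1}.
EITHER = re.compile(r"\b[0-9]{3}\.{0,1}[0-9]{3}\.{0,1}[0-9]{4}\b")

samples = ["1234567890", "123.456.7890", "123-456-7890"]
for s in samples:
    print(s,
          bool(TEN_DIGITS.fullmatch(s)),
          bool(DOTTED.fullmatch(s)),
          bool(EITHER.fullmatch(s)))
```

Because the two dots are optional independently, EITHER also accepts mixed forms such as 123.4567890, which may or may not be wanted.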
Test both phone number types

Match all four?
1234567890
123.456.7890
123-456-7890
123 456 7890

Test all four phone number types

Match all five?
1234567890
123.456.7890
123-456-7890
123 456 7890
(123)456-7890

Conditional syntax: (?(if)then|else)
(\()?\b[0-9]{3}(?(1)\)|[. -]?)[0-9]{3}[. -]?[0-9]{4}\b
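
The conditional pattern can be exercised with Python's re module, which also supports the (?(1)then|else) form. The list of samples, including the two that should fail, is chosen here for illustration.

```python
import re

# If group 1 (an opening parenthesis) matched, require a closing parenthesis;
# otherwise allow an optional '.', ' ' or '-' separator.
PHONE = re.compile(r"(\()?\b[0-9]{3}(?(1)\)|[. -]?)[0-9]{3}[. -]?[0-9]{4}\b")

samples = [
    "1234567890",
    "123.456.7890",
    "123-456-7890",
    "123 456 7890",
    "(123)456-7890",
    "(123 456 7890",   # unbalanced parenthesis, should be rejected
    "+11234567890",    # 11 digits, should be rejected
]
matched = [s for s in samples if PHONE.fullmatch(s)]
print(matched)
```

fullmatch is used so that each sample must match as a whole; with grep the same pattern would be searched for anywhere in a line.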
Match email addresses
john.doe@gmail.com
john-doe@md.gov
johnDoe001@mozilla.org
johndoe@ubalt.edu

• Local part: consists of letters, digits, -, .; 2-15 characters long: [a-zA-Z0-9.-]{2,15}
• @
• Domain name: consists of letters, digits; 1-15 characters long: [a-zA-Z0-9]{1,15}
• . (escaped as \.)
• Last part of the domain name: consists of letters, digits; 3-4 characters long: [a-zA-Z0-9]{3,4}

[a-zA-Z0-9.-]{2,15}@[a-zA-Z0-9]{1,15}\.[a-zA-Z0-9]{3,4}
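
A quick check of the email pattern against the four sample addresses, sketched in Python's re module. The surrounding sentence in text is invented.

```python
import re

# local part @ domain name . last part, exactly as broken down above
EMAIL = re.compile(r"[a-zA-Z0-9.-]{2,15}@[a-zA-Z0-9]{1,15}\.[a-zA-Z0-9]{3,4}")

text = ("Contacts: john.doe@gmail.com, john-doe@md.gov, "
        "johnDoe001@mozilla.org, johndoe@ubalt.edu")
emails = EMAIL.findall(text)
print(emails)
```

The pattern is intentionally narrow: an address whose last part has only two characters, or whose domain has several dots, would need a looser pattern.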


Check if a password is valid
Pattern definition:
• Minimum length of 3, maximum length of 18
• Composed of letters, digits, dashes, or @
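
The two rules translate into a character class plus a length quantifier. The pattern below is an assumption derived from those rules, and is_valid_password is a hypothetical helper.

```python
import re

# Letters, digits, dash, or @; between 3 and 18 characters in total.
PASSWORD = re.compile(r"[A-Za-z0-9@-]{3,18}")

def is_valid_password(candidate):
    """Return True if the whole candidate satisfies the pattern definition."""
    return PASSWORD.fullmatch(candidate) is not None

for candidate in ["ab", "abc", "p@ss-word1", "bad pass", "a" * 19]:
    print(repr(candidate), is_valid_password(candidate))
```

With grep the same effect needs anchors (^ and $), since grep otherwise accepts a line when any part of it matches.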
Match Java source code
Search for a string in Java source code and show line numbers

Show how many times the keyword appears in the Java source code
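
Both tasks can be sketched in Python. The Java snippet, the helper names grep_n and count_occurrences, and the search terms are all hypothetical; grep's own line-number and count options behave similarly but count matching lines rather than individual occurrences.

```python
import re

JAVA_SOURCE = """\
public class Hello {
    public static void main(String[] args) {
        String name = "Frank";
        System.out.println("Hello " + name);
    }
}
"""

def grep_n(pattern, text):
    """Return (line_number, line) pairs, 1-based, for lines containing a match."""
    rx = re.compile(pattern)
    return [(i, line) for i, line in enumerate(text.splitlines(), start=1)
            if rx.search(line)]

def count_occurrences(pattern, text):
    """Return how many times the pattern occurs in the whole text."""
    return len(re.findall(pattern, text))

for number, line in grep_n(r"\bString\b", JAVA_SOURCE):
    print(number, line)
print(count_occurrences(r"\bpublic\b", JAVA_SOURCE))
```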
Match HTML code (including tags and content) using backreference \n
<h1>This is a heading. </h1>
Opening tag content Closing tag
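
A sketch of a backreference tying the closing tag to the opening tag, in Python's re module. The pattern itself is an assumption (the original one is not shown), and the mismatched second element in the sample is added as a negative test.

```python
import re

# Group 1 captures the tag name; \1 requires the same name in the closing tag.
ELEMENT = re.compile(r"<([a-z][a-z0-9]*)>.*</\1>")

html = "<h1>This is a heading. </h1><p>body</b>"
elements = [m.group(0) for m in ELEMENT.finditer(html)]
print(elements)
```

The second element is skipped because its closing tag name (b) differs from the opening one (p).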
Match HTML tags using a lazy quantifier

<h1>This is a heading. </h1>
MUST use -P

Greedy quantifier   Lazy quantifier   Description
*                   *?                Star quantifier: 0 or more
+                   +?                Plus quantifier: 1 or more
?                   ??                Optional quantifier: 0 or 1
{n}                 {n}?              Quantifier: exactly n
{n,}                {n,}?             Quantifier: n or more
{n,m}               {n,m}?            Quantifier: between n and m
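
The effect of the lazy quantifier is easiest to see side by side. This sketch uses Python's re module; the patterns <.+> and <.+?> are chosen for illustration.

```python
import re

html = "<h1>This is a heading. </h1>"

# Greedy: .+ grabs as much as possible, so the match runs to the last '>'.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? grabs as little as possible, so each tag is matched on its own.
lazy = re.findall(r"<.+?>", html)

print(greedy)
print(lazy)
```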
Match content of HTML code
<h1>This is a heading. </h1>

(?<=<([a-z0-9]{2})>).*(?=</\1>)

Lookbehind: (?<=<([a-z0-9]{2})>)   Lookahead: (?=</\1>)
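
The same pattern can be tried in Python's re module, which accepts a capturing group inside a fixed-length lookbehind and a backreference to it in the lookahead. This is a sketch, not the grep command from the lab.

```python
import re

# Lookbehind: an opening tag with a two-character name, captured as group 1.
# Lookahead: the closing tag with the same name. Neither is part of the match.
CONTENT = re.compile(r"(?<=<([a-z0-9]{2})>).*(?=</\1>)")

html = "<h1>This is a heading. </h1>"
match = CONTENT.search(html)
content = match.group(0) if match else None
print(repr(content))
```

The {2} keeps the lookbehind at a fixed length, so tags such as p or div would need their own pattern.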


Match IPv4 addresses with group ()
192.168.0.1
443.125.121.1
255.255.255.0
89.25.23.0
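
One possible grouped pattern, sketched in Python's re module. The octet alternation is an assumption (a common way to restrict each part to 0-255), not necessarily the pattern used in the lab.

```python
import re

# One octet: 250-255, 200-249, 100-199, or 0-99.
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"

# Three "octet + dot" groups followed by a final octet.
IPV4 = re.compile(r"(" + OCTET + r"\.){3}" + OCTET)

candidates = ["192.168.0.1", "443.125.121.1", "255.255.255.0", "89.25.23.0"]
valid = [ip for ip in candidates if IPV4.fullmatch(ip)]
print(valid)
```

443.125.121.1 is rejected because 443 is not a valid octet. When searching inside longer text, word boundaries (\b) around the pattern would be needed to stop it from matching the tail "43.125.121.1".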
Match HTTP requests
http://www.ubalt.edu
Match variations of a website
• http://www.ubalt.edu
• https://www.ubalt.edu
• www.ubalt.edu
• ubalt.edu
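
The four variations differ only in two optional prefixes, which suggests optional groups. The pattern below is an assumption sketched in Python's re module.

```python
import re

# Optional scheme (http:// or https://), optional www., then the literal domain.
SITE = re.compile(r"(https?://)?(www\.)?ubalt\.edu")

variations = [
    "http://www.ubalt.edu",
    "https://www.ubalt.edu",
    "www.ubalt.edu",
    "ubalt.edu",
    "ftp://ubalt.edu",   # negative test: different scheme
]
accepted = [v for v in variations if SITE.fullmatch(v)]
print(accepted)
```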
Match Hexadecimal number

Match Hex of colors
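
Neither hex pattern is shown, so the following is only a sketch under assumptions: hexadecimal numbers are taken to be written with a 0x prefix, and colors as # followed by six hex digits. The sample string is invented.

```python
import re

# Assumed formats: 0x-prefixed numbers and six-digit #RRGGBB colors.
HEX_NUMBER = re.compile(r"\b0[xX][0-9a-fA-F]+\b")
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")

text = "color: #FF5733; background: #00ff00; offset 0x1F"
numbers = HEX_NUMBER.findall(text)
colors = HEX_COLOR.findall(text)
print(numbers, colors)
```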


Match MAC address
The standard (IEEE 802) format for printing MAC-48 addresses in human-friendly form is six groups of two hexadecimal digits, separated by hyphens (-) or colons (:).
01-23-45-67-89-AB
01:23:45:67:89:AB
PaloAlto_00:0a:30
VMware_86:44:c3

Match MAC patterns
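
Two patterns cover the samples above: the full six-group form and the shortened vendor-name form. Both are assumptions sketched in Python's re module; the backreference \1 is one way to require the same separator throughout.

```python
import re

# Six groups of two hex digits; group 1 captures the separator and \1 reuses it.
FULL_MAC = re.compile(
    r"\b[0-9A-Fa-f]{2}([:-])(?:[0-9A-Fa-f]{2}\1){4}[0-9A-Fa-f]{2}\b"
)

# Vendor name, underscore, then the last three groups separated by colons.
VENDOR_MAC = re.compile(r"\b[A-Za-z]+_[0-9a-f]{2}(?::[0-9a-f]{2}){2}\b")

samples = [
    "01-23-45-67-89-AB",
    "01:23:45:67:89:AB",
    "01-23:45-67-89-AB",   # negative test: mixed separators
    "PaloAlto_00:0a:30",
    "VMware_86:44:c3",
]
full = [s for s in samples if FULL_MAC.fullmatch(s)]
vendor = [s for s in samples if VENDOR_MAC.fullmatch(s)]
print(full)
print(vendor)
```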


Match MAC from a pcap file
tshark help
• -r <infile>, --read-file <infile>: set the filename to read from (or '-' for stdin)
• -e <field>: field to print if -Tfields selected (e.g. tcp.port, _ws.col.Info)
Convert pcap to text
Match the first pattern in a pcap file

Match the second pattern in a pcap file


Grep email from customers.docx
A .docx file is a compressed (zip) archive.

Unzip the .docx to a directory.
The content of the .docx is saved in an XML file.

Searching with grep for “ubalt.edu” works, but the results include a lot of Word formatting information.


Remove XML tags using sed
sed commands
Replace the character “1” of a phone number with “4”

Replace the character “-” of a phone number with “.”

Remove all “-”
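
The three substitutions can be sketched with re.sub in Python as a stand-in for sed. Whether the lab replaces only the first "1" or every "1" is not shown, so replacing only the first occurrence is an assumption here.

```python
import re

phone = "123-456-7890"

# Replace the first "1" with "4" (count=1 limits the substitution to one occurrence).
first_digit_swapped = re.sub(r"1", "4", phone, count=1)

# Replace every "-" with "." (the dot in the replacement string is a literal).
dotted = re.sub(r"-", ".", phone)

# Remove every "-" by substituting the empty string.
digits_only = re.sub(r"-", "", phone)

print(first_digit_swapped, dotted, digits_only)
```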


<h1>This is a heading. </h1>
Remove all HTML tags using a lazy match: this first attempt fails because sed doesn't support -P

Remove all HTML tags using a negated character set, [^characters not allowed]
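
The negated character set avoids the need for a lazy quantifier: [^>]* cannot run past the end of a tag. A minimal sketch with re.sub in Python, where the pattern <[^>]*> is an assumed form of the technique:

```python
import re

html = "<h1>This is a heading. </h1>"

# '<', then any characters except '>', then '>': exactly one tag per match.
without_tags = re.sub(r"<[^>]*>", "", html)
print(repr(without_tags))
```

Since the pattern uses no Perl-only features, the same expression also works in tools that lack lazy quantifiers.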


Remove all XML tags

Minor issue: replace the paragraph (paraID) tag with spaces

grep emails
Show .docx content without unzipping to disk
-p: extract files to pipe, no messages

Show .docx props without unzipping to disk
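
The whole .docx workflow (read the XML without extracting to disk, strip the tags, grep the emails) can be sketched with Python's zipfile and re modules. The in-memory archive below is a hypothetical stand-in for customers.docx, and docx_emails is an illustrative helper; word/document.xml is the archive member that holds the main text of a .docx file.

```python
import io
import re
import zipfile

EMAIL = re.compile(r"[a-zA-Z0-9.-]{2,15}@[a-zA-Z0-9]{1,15}\.[a-zA-Z0-9]{3,4}")
TAG = re.compile(r"<[^>]+>")

def docx_emails(docx_file):
    """Read word/document.xml straight from the archive, drop XML tags, return emails."""
    with zipfile.ZipFile(docx_file) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    text = TAG.sub(" ", xml)  # a space per tag keeps neighbouring words apart
    return EMAIL.findall(text)

# Hypothetical stand-in for customers.docx: a tiny archive built in memory.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "word/document.xml",
        '<w:document><w:p w14:paraId="1"><w:r><w:t>johndoe@ubalt.edu</w:t></w:r></w:p>'
        '<w:p w14:paraId="2"><w:r><w:t>john.doe@gmail.com</w:t></w:r></w:p></w:document>',
    )
buffer.seek(0)
found = docx_emails(buffer)
print(found)
```

Replacing each tag with a space rather than nothing prevents text from adjacent paragraphs from running together, which is the minor issue noted above.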


Summary
• grep is a powerful tool for extracting digital forensic evidence
• sed is a stream editor
• grep/sed use regular expressions (regex/patterns) to match text
• Key regex operations
  • literal string: cat; character classes: [], [^]; or: a|b; group: ()
  • quantification: ?, *, +, {}
  • scope: \b, \< \>, \w
  • greedy vs. lazy: +?, *?, {}?
  • back reference: \1, \2, …, \n
  • lookahead and lookbehind: (?=), (?<=)
• Need both positive tests and negative tests
https://staff.washington.edu/weller/grep.html
