8 1 Evidence Search A Pattern Match Game

Digital Evidence Search with
A Pattern Matching Game

The Power of Regular Expressions
What types of evidence to match?
• Personal information
  • name, phone number, email address, date of birth, zip code, SSN
• User account validation
  • username, password
• Source code
  • HTML, Java
• Network
  • website visited (URL), IP address, Hex, MAC address, timestamp
• File
  • file names, file attributes
What is a regular expression (regex)?
• A special text string that describes a search pattern
  • the string is written in an expression language
• Extremely useful for extracting information from
  • text files: source code, log files
  • documents: spreadsheets, PowerPoint, Word (need to unzip them first)
  • binary strings in files
• Often used with the grep command

f, r, a, n, k   Literal characters
|               Logic: or
( | )           or relation in a group, e.g. (f|F)rank
https://jex.im/regulex/#!flags=&re=(f%7CF)rank
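The pattern in the link above, (f|F)rank, can be tried outside grep as well. The following is a minimal sketch that assumes Python's re module as a stand-in for grep; the sample text is hypothetical and not taken from the lab files.

```python
import re

# (f|F)rank: a group with an "or" between two literal characters,
# followed by the literal characters r, a, n, k.
pattern = re.compile(r"(f|F)rank")

# Hypothetical sample text, not taken from the lab files.
text = "Frank met frank, but not FRANK or Fran."

# group(0) is the whole match; group(1) would be only the f or F.
matches = [m.group(0) for m in pattern.finditer(text)]
print(matches)
```

Only the first letter may vary, so an all-uppercase FRANK is not matched.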
Download the lab files
Unzip the file
Verify some unzipped files

Verify the txt file. We will search the content of the file using regular expressions.
Search for a specific name “Frank”
A simple pattern of all names
Frank Xu (first name, space, last name)

Part          Quantity     Regex
Uppercase     one          [A-Z]
Lowercase     1-10         [a-z]{1,10}
Space         1 or more    \s+
Uppercase     one          [A-Z]
Lowercase     1-10         [a-z]{1,10}

Full pattern: [A-Z][a-z]{1,10}\s+[A-Z][a-z]{1,10}
Match all names in a text file
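As a rough illustration of the simple name pattern (uppercase letter, 1-10 lowercase letters, one or more spaces, then the same again), here is a minimal sketch. It assumes Python's re module instead of grep, and the sample sentence is hypothetical.

```python
import re

# One uppercase letter, 1-10 lowercase letters, 1 or more whitespace
# characters, then one uppercase letter and 1-10 lowercase letters.
NAME = re.compile(r"[A-Z][a-z]{1,10}\s+[A-Z][a-z]{1,10}")

# Hypothetical sample text; note the two spaces in "John  Smith".
text = "Frank Xu met John  Smith at 3pm."

names = NAME.findall(text)
print(names)
```

Because \s+ accepts one or more whitespace characters, the name with two spaces is still found.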
Word character: [a-zA-Z0-9_]
Word string: a run of word characters
Word boundary: \b
1. Before the first character in the word string
2. After the last character in the word string
Anything else is NOT a word character

Example: I am 50 years old, he is 2!
The 8 word strings (I, am, 50, years, old, he, is, 2) give 16 \b positions, one on each side of every word string.

Search two characters between any two \b

\<..\>: does not cross a \b, AND must consist of a word string [a-zA-Z0-9_]
\b..\b: any two boundaries, so the two characters may be non-word characters

-w: match only whole words [a-zA-Z0-9_]+
! is not a word character
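The difference between "any two boundaries" and "a whole two-character word" can be checked on the example sentence. This is a minimal sketch assuming Python's re module; Python has no \< and \>, so \b\w\w\b is used here as an equivalent for the two-character case.

```python
import re

sentence = "I am 50 years old, he is 2!"

# Every \b position: one before and one after each of the 8 word strings.
boundary_count = len(re.findall(r"\b", sentence))

# \b..\b accepts any two characters that sit between two boundaries,
# including spaces and punctuation.
any_two = re.findall(r"\b..\b", sentence)

# \b\w\w\b accepts only two word characters between two boundaries,
# which is what \<..\> matches in grep.
whole_words = re.findall(r"\b\w\w\b", sentence)

print(boundary_count, any_two, whole_words)
```

The \b..\b version also returns strings such as "I " and ", ", which is why the whole-word form is usually the one wanted.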

Test the pattern in a text file
Lookbehind: (?<=foo)xxx   Match xxx with a preceding string foo
Lookahead:  xxx(?=foo)    Match xxx with a following string foo
The pattern has to be Perl-compatible (grep -P)

Shorthand classes
\w   "word" character (letter, digit, or underscore): [a-zA-Z0-9_]
\d   digit
\s   whitespace (space, tab, vtab, newline)

-Po: -P enables Perl-compatible patterns, -o prints only the matched part
Negative lookaround
(?<!foo)xxx   Match xxx without a preceding string foo
xxx(?!foo)    Match xxx without a following string foo
• A negative lookaround pattern MUST be in single quotes ' ' (inside double quotes an interactive shell may interpret the !)
• A lookbehind string must have a fixed length
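The four lookaround forms can be compared side by side. The sketch below assumes Python's re module, whose lookaround syntax matches the Perl-compatible syntax shown here; the sample string and the use of "bar" in place of xxx are hypothetical.

```python
import re

# "bar" occurs four times: after foo, after baz, before foo, before baz.
text = "foobar bazbar barfoo barbaz"

def starts(pattern: str) -> list[int]:
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]

after_foo = starts(r"(?<=foo)bar")       # positive lookbehind
before_foo = starts(r"bar(?=foo)")       # positive lookahead
not_after_foo = starts(r"(?<!foo)bar")   # negative lookbehind
not_before_foo = starts(r"bar(?!foo)")   # negative lookahead

print(after_foo, before_foo, not_after_foo, not_before_foo)
```

In every case only "bar" is part of the match; foo is checked but never consumed.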
Fix name mismatch problems using negative lookaround
First try: a name can't have numbers before it. (A match that follows a number is not a name!)
Second try: "Ave" can't be the last name. More testing may be needed.
Exclude Ave, St, and Dr from last names
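The two fixes can be combined into one pattern. The original commands are not reproduced in the text, so the pattern below is an assumption about how such a fix could look, written for Python's re module; the sample sentence is hypothetical.

```python
import re

# The simple name pattern, which also matches street names.
NAIVE = re.compile(r"[A-Z][a-z]{1,10}\s+[A-Z][a-z]{1,10}")

# Assumed refinement:
#   (?<![0-9]\s)          no digit plus whitespace right before the name
#   (?!(?:Ave|St|Dr)\b)   the last name is not Ave, St, or Dr
REFINED = re.compile(
    r"(?<![0-9]\s)\b[A-Z][a-z]{1,10}\s+(?!(?:Ave|St|Dr)\b)[A-Z][a-z]{1,10}\b"
)

# Hypothetical sample text.
text = "Frank Xu lives at 1420 Charles St near Maryland Ave, not far from Jane Doe."

naive_hits = NAIVE.findall(text)
refined_hits = REFINED.findall(text)
print(naive_hits)
print(refined_hits)
```

The lookbehind is exactly two characters wide, which satisfies the fixed-length rule. Other street suffixes would still slip through, so more negative tests are needed.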
Match phone numbers
• 1234567890
• 123.456.7890
• 123-456-7890
• 123 456 7890
• (123)456-7890
• +11234567890

Match any 10-digit phone number (xxxxxxxxxx, where x is a digit)
1234567890
[0-9]{10}

Match any 10-digit phone number with the pattern xxx.xxx.xxxx, where x is a digit
123.456.7890
\b[0-9]{3}\.[0-9]{3}\.[0-9]{4}\b

Match both? Make each dot optional with {0,1}
1234567890
123.456.7890
Test both phone number types

Match all four?
1234567890
123.456.7890
123-456-7890
123 456 7890
Test all four phone number types

Match all five?
1234567890
123.456.7890
123-456-7890
123 456 7890
(123)456-7890
Conditional syntax: (?(if)then|else)
(\()?\b[0-9]{3}(?(1)\)|[. -]?)[0-9]{3}[. -]?[0-9]{4}\b
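The conditional pattern can be tested against every format in the list. This is a minimal sketch assuming Python's re module, which supports the same (?(1)then|else) conditional; the four-format pattern FOUR is an assumption built from the [. -]? separator used in the five-format pattern.

```python
import re

# Assumed pattern for the first four formats: an optional dot, space,
# or dash between the digit groups.
FOUR = re.compile(r"\b[0-9]{3}[. -]?[0-9]{3}[. -]?[0-9]{4}\b")

# Pattern from the text: if group 1 matched "(", then ")" is required,
# else an optional separator is allowed.
FIVE = re.compile(r"(\()?\b[0-9]{3}(?(1)\)|[. -]?)[0-9]{3}[. -]?[0-9]{4}\b")

samples = [
    "1234567890",
    "123.456.7890",
    "123-456-7890",
    "123 456 7890",
    "(123)456-7890",
    "+11234567890",
    "123)456-7890",   # negative test: closing parenthesis without an opening one
]

accepted_by_four = [s for s in samples if FOUR.fullmatch(s)]
accepted_by_five = [s for s in samples if FIVE.fullmatch(s)]
print(accepted_by_four)
print(accepted_by_five)
```

Neither pattern accepts the +1 form, so that format would need its own extension and its own tests.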
Match email addresses
john.doe@gmail.com
john-doe@md.gov
johnDoe001@mozilla.org
johndoe@ubalt.edu

Local part
• Consists of letters, digits, -, .
• 2-15 characters long
@
Domain name, before the dot
• Consists of letters, digits
• 1-15 characters long
.
Domain name, after the dot
• Consists of letters, digits
• 3-4 characters long

[a-zA-Z0-9.-]{2,15}@[a-zA-Z0-9]{1,15}\.[a-zA-Z0-9]{3,4}
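The email pattern can be checked with both positive and negative tests. This is a minimal sketch assuming Python's re module; the negative samples are hypothetical.

```python
import re

# Local part, @, domain name before the dot, a literal dot, domain after the dot.
EMAIL = re.compile(r"[a-zA-Z0-9.-]{2,15}@[a-zA-Z0-9]{1,15}\.[a-zA-Z0-9]{3,4}")

positive = [
    "john.doe@gmail.com",
    "john-doe@md.gov",
    "johnDoe001@mozilla.org",
    "johndoe@ubalt.edu",
]

# Hypothetical negative samples.
negative = [
    "j@ubalt.edu",             # local part shorter than 2 characters
    "johndoe@ubalt",           # no dot after the domain name
    "johndoe@mail.ubalt.edu",  # two dots in the domain: not covered by this pattern
]

passed = [s for s in positive if EMAIL.fullmatch(s)]
rejected = [s for s in negative if not EMAIL.fullmatch(s)]
print(passed, rejected)
```

The last negative sample is a legitimate address, which shows that the pattern is deliberately simple and would need extending for subdomains.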

Check if a password is valid
Pattern definition:
• Minimum length of 3, maximum length of 18
• Composed of letters, numbers, dashes, or @
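The two rules translate directly into a character class and a length quantifier. The pattern below is an assumption derived from the definition above, sketched with Python's re module; the function name is hypothetical.

```python
import re

# Letters, digits, dash, or @, repeated 3 to 18 times.
# The dash is placed last in the class so it is read literally.
PASSWORD = re.compile(r"[A-Za-z0-9@-]{3,18}")

def is_valid_password(candidate: str) -> bool:
    """Return True when the whole string satisfies the pattern definition."""
    return PASSWORD.fullmatch(candidate) is not None

print(is_valid_password("p@ss-word1"))
```

With grep, the same whole-string check would need ^ and $ anchors (or the -x option) around the pattern.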
Match Java source code
Search for a string in Java source code and show line numbers
Show how many times the keywords appear in Java source code
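Line numbers and counts can be sketched as follows. This assumes Python's re module rather than grep, and the Java snippet is hypothetical; in grep, -n prints line numbers and -c counts matching lines.

```python
import re

# Hypothetical Java source; in the lab this would be a .java file on disk.
java_source = '''public class Hello {
    public static void main(String[] args) {
        String name = new String("Frank");
        System.out.println(name);
    }
}'''

keyword = re.compile(r"\bString\b")

# Like grep -n: keep the 1-based line number of every matching line.
numbered_hits = [
    (number, line.strip())
    for number, line in enumerate(java_source.splitlines(), start=1)
    if keyword.search(line)
]

# Like grep -c: count matching lines, not individual occurrences.
matching_lines = len(numbered_hits)

# Count every occurrence, including several on the same line.
occurrences = len(keyword.findall(java_source))

print(numbered_hits, matching_lines, occurrences)
```

The line count and the occurrence count differ whenever a keyword appears more than once on a line.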
Match HTML code (including tags and content) using a backreference \n
<h1>This is a heading. </h1>
Opening tag, content, closing tag
Match HTML tags using a lazy quantifier (MUST use -P)
<h1>This is a heading. </h1>
Greedy quantifier   Lazy quantifier   Description
*                   *?                Star quantifier: 0 or more
+                   +?                Plus quantifier: 1 or more
?                   ??                Optional quantifier: 0 or 1
{n}                 {n}?              Quantifier: exactly n
{n,}                {n,}?             Quantifier: n or more
{n,m}               {n,m}?            Quantifier: between n and m
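The effect of greedy and lazy quantifiers, and of a backreference that ties the closing tag to the opening tag, can be seen on the heading example. This is a minimal sketch assuming Python's re module; the three patterns are assumptions, since the original commands are not reproduced in the text.

```python
import re

html = "<h1>This is a heading. </h1>"

# Greedy: .+ runs to the last ">" on the line, so tags and content come back as one match.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">", so each tag is matched separately.
lazy = re.findall(r"<.+?>", html)

# Backreference: \1 must repeat whatever group 1 captured as the tag name.
PAIRED = re.compile(r"<([a-z0-9]+)>.*</\1>")
paired = PAIRED.search(html)
mismatched = PAIRED.search("<h1>oops</h2>")

print(greedy, lazy, paired.group(1), mismatched)
```

The mismatched <h1>...</h2> pair is rejected because \1 can only match the captured tag name h1.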
Match content of HTML code
(?<=<([a-z0-9]{2})>).*(?=</\1>)
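The pattern keeps only the content because both tags sit inside lookarounds. The sketch below assumes Python's re module, which accepts this pattern since the lookbehind has a fixed length of four characters.

```python
import re

# Lookbehind: an opening tag with a two-character name, captured as group 1.
# Lookahead: the closing tag that repeats the same name through \1.
CONTENT = re.compile(r"(?<=<([a-z0-9]{2})>).*(?=</\1>)")

match = CONTENT.search("<h1>This is a heading. </h1>")
content = match.group(0)   # text between the tags only
tag_name = match.group(1)  # tag name captured inside the lookbehind

no_match = CONTENT.search("<h1>oops</h2>")

print(repr(content), tag_name, no_match)
```

The {2} keeps the lookbehind fixed-length, so the pattern as written only handles two-character tag names such as h1 or h2.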

Match IPv4 addresses with group ()
192.168.0.1
443.125.121.1 (not a valid address, since 443 is greater than 255; useful as a negative test)
255.255.255.0
89.25.23.0
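A group makes it possible to repeat "digits plus dot" three times. The sketch below assumes Python's re module and shows two assumed patterns: a simple one that accepts any 1-3 digits per part, and a stricter one that limits each part to 0-255.

```python
import re

# Simple: the group (1-3 digits and a dot) is repeated three times.
SIMPLE_IPV4 = re.compile(r"([0-9]{1,3}\.){3}[0-9]{1,3}")

# Strict: each part is 250-255, 200-249, 100-199, or 0-99.
OCTET = r"(25[0-5]|2[0-4][0-9]|1[0-9]{2}|[1-9]?[0-9])"
STRICT_IPV4 = re.compile(rf"({OCTET}\.){{3}}{OCTET}")

samples = ["192.168.0.1", "443.125.121.1", "255.255.255.0", "89.25.23.0"]

simple_hits = [s for s in samples if SIMPLE_IPV4.fullmatch(s)]
strict_hits = [s for s in samples if STRICT_IPV4.fullmatch(s)]
print(simple_hits)
print(strict_hits)
```

The simple pattern accepts 443.125.121.1, while the strict one rejects it.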
Match HTTP requests
• http://www.ubalt.edu
Match variations of a website
• http://www.ubalt.edu
• https://www.ubalt.edu
• www.ubalt.edu
• ubalt.edu
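All four variations can be covered by making the scheme and the www. prefix optional. The pattern is an assumption, sketched with Python's re module; the negative samples are hypothetical.

```python
import re

# Optional http:// or https://, optional www., then the literal site name.
SITE = re.compile(r"(https?://)?(www\.)?ubalt\.edu")

variations = [
    "http://www.ubalt.edu",
    "https://www.ubalt.edu",
    "www.ubalt.edu",
    "ubalt.edu",
]

# Hypothetical negative samples.
others = ["ftp://ubalt.edu", "www.ubalt.com", "ubaltXedu"]

site_hits = [s for s in variations if SITE.fullmatch(s)]
site_misses = [s for s in others if not SITE.fullmatch(s)]
print(site_hits, site_misses)
```

Escaping the dots matters: an unescaped . would also accept ubaltXedu.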
Match hexadecimal numbers
Match hex color codes
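The exact formats used in the lab are not given in the text, so the sketch below makes two assumptions: hexadecimal numbers are written with a 0x prefix, and colors are written as # followed by six hex digits. It uses Python's re module, and the sample text is hypothetical.

```python
import re

# Assumed format: 0x or 0X followed by one or more hex digits.
HEX_NUMBER = re.compile(r"\b0[xX][0-9a-fA-F]+\b")

# Assumed format: # followed by exactly six hex digits.
HEX_COLOR = re.compile(r"#[0-9a-fA-F]{6}\b")

# Hypothetical sample text.
text = "color: #FF00aa; offset 0x1F; bad #GGHHII"

hex_numbers = HEX_NUMBER.findall(text)
hex_colors = HEX_COLOR.findall(text)
print(hex_numbers, hex_colors)
```

#GGHHII is rejected because G, H, and I are not hexadecimal digits.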

The standard (IEEE 802) format for printing MAC-48 addresses in
human-friendly form is six groups of two hexadecimal digits,
separated by hyphens - or colons :.
Match MAC address
01-23-45-67-89-AB
01:23:45:67:89:AB
Match MAC patterns
PaloAlto_00:0a:30
VMware_86:44:c3
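Both MAC forms can be described with a repeated group. The patterns below are assumptions sketched with Python's re module: one for six groups of two hex digits, and one for the form with a name, an underscore, and three groups. The sample line is hypothetical.

```python
import re

# Six groups of two hex digits separated by hyphens or colons.
MAC = re.compile(r"\b(?:[0-9A-Fa-f]{2}[:-]){5}[0-9A-Fa-f]{2}\b")

# A name, an underscore, and the last three groups of the address.
NAMED_MAC = re.compile(r"\b[A-Za-z]+_(?:[0-9A-Fa-f]{2}:){2}[0-9A-Fa-f]{2}\b")

# Hypothetical line of packet summary text.
line = "Src: VMware_86:44:c3 (00:0c:29:86:44:c3), Dst: PaloAlto_00:0a:30 (00:1b:17:00:0a:30)"

full_macs = MAC.findall(line)
named_macs = NAMED_MAC.findall(line)
print(full_macs)
print(named_macs)
```

The groups are non-capturing, written (?:...), so that findall returns the whole address rather than only the last repeated group.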

Match MAC from a pcap file
tshark help
• -r <infile>, --read-file <infile>
  • set the filename to read from (or '-' for stdin)
• -e <field>
  • field to print if -Tfields selected (e.g. tcp.port, _ws.col.Info)
Convert pcap to text
Match the first pattern in a pcap file
Match the second pattern in a pcap file

Grep email from customers.docx
A .docx file is a compressed (zip) file
Unzip the .docx to a directory
The content of the .docx is saved in an XML file.
grep “ubalt.edu”, but the results show a lot of Word formatting information

Remove XML tags using sed
sed commands
Replace the character “1” in a phone number with 4
Replace the character “-” in a phone number with “.”
Remove all “-”
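The three substitutions can be sketched with re.sub from Python's re module, used here as a stand-in for sed. The sample number is hypothetical, and the sed expressions named in the comments are the presumed equivalents, not commands quoted from the lab.

```python
import re

phone = "123-456-7890"

# Like sed 's/1/4/': without the g flag only the first match is replaced.
first_one_replaced = re.sub(r"1", "4", phone, count=1)

# Like sed 's/-/./g': replace every dash with a dot.
dashes_to_dots = re.sub(r"-", ".", phone)

# Like sed 's/-//g': replace every dash with nothing.
dashes_removed = re.sub(r"-", "", phone)

print(first_one_replaced, dashes_to_dots, dashes_removed)
```

In the replacement string the dot is an ordinary character, so it needs no escaping there.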

Remove all HTML tags using a lazy match: the first attempt failed because sed does not support Perl-compatible (-P) patterns
Remove all HTML tags using [^ ], a set of characters that are not allowed
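A negated character class gives the effect of a lazy match without needing lazy quantifiers: [^>]* can never run past the end of the current tag. The sketch below assumes Python's re module in place of sed; the HTML sample is hypothetical.

```python
import re

# Hypothetical HTML sample.
html = "<h1>This is a heading. </h1><p>Email: johndoe@ubalt.edu</p>"

# "<", then any characters except ">", then ">": exactly one tag at a time.
TAG = re.compile(r"<[^>]*>")

text_only = TAG.sub("", html)
print(text_only)
```

The same pattern works in tools without lazy quantifiers, which is why it suits sed.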

Remove all XML tags
Minor issue: paragraphs run together once the tags are removed
Replace the paragraph (paraID) tag with spaces
grep emails
Show .docx content without unzipping to disk (unzip -p: extract files to pipe, no messages)
Show .docx props without unzipping to disk
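The whole workflow (read the XML from the archive without writing to disk, separate paragraphs, strip tags, search for emails) can be sketched in a few lines. This assumes Python's zipfile and re modules instead of unzip, sed, and grep. The sample archive is built in memory and is hypothetical: it contains only a stripped-down word/document.xml, not a complete Word file.

```python
import io
import re
import zipfile

def make_sample_docx() -> bytes:
    """Build a tiny in-memory archive that mimics the layout of a .docx file."""
    xml = (
        '<?xml version="1.0"?><w:document><w:body>'
        '<w:p w14:paraId="0A1B2C3D"><w:r><w:t>John Doe</w:t></w:r></w:p>'
        '<w:p w14:paraId="0A1B2C3E"><w:r><w:t>johndoe@ubalt.edu</w:t></w:r></w:p>'
        '</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    return buffer.getvalue()

def docx_text(data: bytes) -> str:
    """Read word/document.xml straight from the archive and strip its tags."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    xml = re.sub(r"</w:p>", " ", xml)   # keep paragraphs apart with a space
    return re.sub(r"<[^>]*>", "", xml).strip()

EMAIL = re.compile(r"[a-zA-Z0-9.-]{2,15}@[a-zA-Z0-9]{1,15}\.[a-zA-Z0-9]{3,4}")

text = docx_text(make_sample_docx())
emails = EMAIL.findall(text)
print(text)
print(emails)
```

Replacing each paragraph end with a space before stripping the tags keeps the name and the email address from running together.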

Summary
• grep is a powerful tool to extract digital forensic evidence
• sed is a stream editor
• grep/sed use regular expressions (regex/patterns) to match text
• Key regex operations
• literal string: cat, character classes: [], [^], or: a|b, group: ()
• quantification: ?, *, +, {}
• scope: \b, \< \>, \w
• greedy vs. lazy: +?, *?, {}?
• back reference: \1, \2, …,\n
• lookahead and lookbehind: (?=), (?<=)
• Need both positive tests and negative tests
https://staff.washington.edu/weller/grep.html
