https://jex.im/regulex/#!flags=&re=(f%7CF)rank
Download lab files
unzip the file
Frank Xu (Frank, space, Xu)
[A-Z][a-z]{1,10}\s+[A-Z][a-z]{1,10}
• [A-Z]: one uppercase letter
• [a-z]{1,10}: 1-10 lowercase letters
• \s+: one or more spaces
• [A-Z]: one uppercase letter
• [a-z]{1,10}: 1-10 lowercase letters
Match all names in a text file
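As a rough sketch of the name pattern in action, the snippet below uses Python's `re` module as a stand-in for grep. The sample text is hypothetical and only illustrates which strings the pattern picks up.

```python
import re

# One uppercase letter, 1-10 lowercase letters, whitespace, then the same again.
NAME = re.compile(r"[A-Z][a-z]{1,10}\s+[A-Z][a-z]{1,10}")

# Hypothetical sample text standing in for the lab's text file.
text = "Frank Xu met Alice Smith and bob jones."

names = NAME.findall(text)
print(names)  # ['Frank Xu', 'Alice Smith']
```

The lowercase "bob jones" is skipped because both words must start with an uppercase letter.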
Word character: [a-zA-Z0-9_]
Word string: a sequence of word characters
Word boundary: \b
1. Before the first character in the word string
2. After the last character in the word string
I am 50 years old, he is 2!
\bI\b \bam\b \b50\b \byears\b \bold\b, \bhe\b \bis\b \b2\b!
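The boundary positions in the example sentence can be checked with a short sketch. It uses Python's `re` module, whose `\b` follows the same two rules for ASCII text.

```python
import re

sentence = "I am 50 years old, he is 2!"

# Every position where \b matches (a zero-width match, so only the offset matters).
boundaries = [m.start() for m in re.finditer(r"\b", sentence)]

# Each word string sits between a pair of boundaries.
words = re.findall(r"\b\w+\b", sentence)

print(len(boundaries))  # 16
print(words)            # ['I', 'am', '50', 'years', 'old', 'he', 'is', '2']
```

Eight word strings give sixteen boundaries: one before and one after each.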
Match xxx with a preceding string foo (lookbehind): (?<=foo)xxx
Match xxx with a following string foo (lookahead): xxx(?=foo)
The pattern has to be Perl-compatible: use grep -Po
Shorthand classes
\w  "word" character (letter, digit, or underscore), same as [a-zA-Z0-9_]
\d  digit
\s  whitespace (space, tab, vtab, newline)
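A minimal sketch of matching with a preceding or a following string, using Python's `re` module in place of `grep -P` (both accept the same `(?<=...)` and `(?=...)` syntax). The sample string and the use of "bar" for xxx are made up for illustration.

```python
import re

text = "foobar bazbar barfoo"

# (?<=foo)bar: "bar" only when "foo" comes immediately before it.
after_foo = [m.start() for m in re.finditer(r"(?<=foo)bar", text)]

# bar(?=foo): "bar" only when "foo" comes immediately after it.
before_foo = [m.start() for m in re.finditer(r"bar(?=foo)", text)]

print(after_foo)   # [3]
print(before_foo)  # [14]
```

In both cases only "bar" is part of the match; "foo" is checked but not consumed, which is why `grep -o` would print just "bar".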
Negative lookaround
Match xxx without a preceding string foo: (?<!foo)xxx
Match xxx without a following string foo: xxx(?!foo)
• A negative lookaround pattern MUST be in single quotes ' '
• A lookbehind string must have a fixed length
Fix name mismatch problems using negative lookaround
This is not a name!
First try (negative lookbehind): a name can't have numbers before it.
Second try (negative lookahead): "Ave" can't be the last name. Need more testing if necessary.
Exclude Ave, St, and Dr from the last name
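The two fixes can be combined in one pattern. This is only a sketch under assumptions: the address text is hypothetical, the lookbehind `(?<!\d\s)` is one possible fixed-length way to say "no number right before the name", and Python's `re` stands in for `grep -P`.

```python
import re

NAME = re.compile(
    r"(?<!\d\s)"               # not preceded by a digit and a space (fixed length: 2)
    r"\b[A-Z][a-z]{1,10}\s+"   # first name
    r"(?!(?:Ave|St|Dr)\b)"     # the last name must not be Ave, St, or Dr
    r"[A-Z][a-z]{1,10}\b"      # last name
)

# Hypothetical text with one real name and two street names.
text = "Frank Xu lives at 1420 Charles St near Maryland Ave."

names = NAME.findall(text)
print(names)  # ['Frank Xu']
```

"Charles St" is rejected by both checks, while "Maryland Ave" has no number in front of it and is rejected only by the negative lookahead, which is why the second fix is needed.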
Match phone numbers
• 1234567890
• 123.456.7890
• 123-456-7890
• 123 456 7890
• (123)456-7890
• +11234567890
Match any 10-digit phone number (xxxxxxxxxx, where x is a digit)
1234567890
[0-9]{10}
Match any 10-digit phone number with the pattern xxx.xxx.xxxx, where x is a digit
123.456.7890
\b[0-9]{3}\.[0-9]{3}\.[0-9]{4}\b
Match both?
1234567890
123.456.7890
Make each dot optional with {0,1}: \b[0-9]{3}\.{0,1}[0-9]{3}\.{0,1}[0-9]{4}\b
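A small sketch of the optional-dot idea, assuming the `{0,1}` quantifier is applied to each dot of the dotted pattern. Python's `re` is used in place of grep, and the third sample is included only to show what is still not matched.

```python
import re

# Each dot may appear zero or one time.
PHONE = re.compile(r"\b[0-9]{3}\.{0,1}[0-9]{3}\.{0,1}[0-9]{4}\b")

samples = ["1234567890", "123.456.7890", "123-456-7890"]
matched = [s for s in samples if PHONE.fullmatch(s)]

print(matched)  # ['1234567890', '123.456.7890']
```

`{0,1}` is equivalent to `?`; the dashed form still fails because only a dot is allowed as a separator.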
Test both phone number types
Match all four?
1234567890
123.456.7890
123-456-7890
123 456 7890
Test all four phone number types
Match all five?
1234567890
123.456.7890
123-456-7890
123 456 7890
(123)456-7890
Conditional syntax: (?(condition)then|else)
(\()?\b[0-9]{3}(?(1)\)|[. -]?)[0-9]{3}[. -]?[0-9]{4}\b
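The conditional pattern can be tried against all six formats from the phone number list. This sketch uses Python's `re`, which supports the same `(?(1)then|else)` conditional as Perl-compatible regex; the helper name `is_phone` is made up for illustration.

```python
import re

PHONE = re.compile(
    r"(\()?"               # group 1: optional opening parenthesis
    r"\b[0-9]{3}"          # area code
    r"(?(1)\)|[. -]?)"     # if group 1 matched, require ")", else an optional separator
    r"[0-9]{3}[. -]?[0-9]{4}\b"
)

def is_phone(s):
    """Return True when the whole string matches the phone pattern."""
    return PHONE.fullmatch(s) is not None

samples = ["1234567890", "123.456.7890", "123-456-7890",
           "123 456 7890", "(123)456-7890", "+11234567890"]
results = {s: is_phone(s) for s in samples}
print(results)
```

The first five formats match. The sixth, +11234567890, does not, because it contains eleven digits and the pattern allows exactly ten between word boundaries.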
Match email addresses
john.doe@gmail.com
john-doe@md.gov
johnDoe001@mozilla.org
johndoe@ubalt.edu
Local part
• Consists of letters, digits, -, .
• 2-15 characters long
@
Domain name
• Consists of letters, digits
• 1-15 characters long
. followed by the suffix
• Consists of letters, digits
• 3-4 characters long
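The three parts described above translate fairly directly into one pattern. The sketch below is one possible reading of those rules, not necessarily the exact pattern used in the lab; the surrounding sentence is hypothetical and Python's `re` stands in for grep.

```python
import re

EMAIL = re.compile(
    r"[a-zA-Z0-9.-]{2,15}"   # local part: letters, digits, -, . (2-15 characters)
    r"@"
    r"[a-zA-Z0-9]{1,15}"     # domain name: letters, digits (1-15 characters)
    r"\."
    r"[a-zA-Z0-9]{3,4}"      # suffix: letters, digits (3-4 characters)
)

text = ("Contact john.doe@gmail.com, john-doe@md.gov, "
        "johnDoe001@mozilla.org or johndoe@ubalt.edu.")

emails = EMAIL.findall(text)
print(emails)
```

These rules are deliberately simple: addresses with subdomains or two-letter suffixes would need a looser pattern.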
Show how many times each keyword appears in Java source code
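The slide's own command is not reproduced in the text, so the following is only an illustrative sketch of the idea in Python: match a list of keywords between word boundaries and tally the hits. The Java snippet and the keyword list are hypothetical.

```python
import re
from collections import Counter

# Hypothetical Java source used as sample input.
java_source = """
public class Hello {
    public static void main(String[] args) {
        if (args.length > 0) {
            return;
        }
    }
}
"""

# A small, made-up subset of Java keywords.
KEYWORDS = ["public", "class", "static", "void", "if", "return"]

# \b keeps "if" from matching inside longer identifiers.
pattern = re.compile(r"\b(?:" + "|".join(KEYWORDS) + r")\b")
counts = Counter(pattern.findall(java_source))

for keyword, count in counts.most_common():
    print(count, keyword)
```

Here "public" is counted twice and every other keyword once.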
Match HTML code (including tags and content) using a backreference \n
<h1>This is a heading. </h1>
Opening tag: <h1>, content: This is a heading., closing tag: </h1>
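A minimal sketch of the backreference idea: capture the tag name in group 1 and require the same name in the closing tag with `\1`. The pattern is an assumption that covers only simple, attribute-free tags, and the last element in the sample is deliberately broken.

```python
import re

# Group 1 captures the tag name; \1 requires the same name in the closing tag.
ELEMENT = re.compile(r"<([a-z][a-z0-9]*)>(.*?)</\1>")

html = "<h1>This is a heading. </h1><p>Text</p><b>broken</i>"

elements = [m.group(0) for m in ELEMENT.finditer(html)]
print(elements)  # ['<h1>This is a heading. </h1>', '<p>Text</p>']
```

`<b>broken</i>` is skipped because the closing tag name does not equal the captured opening tag name.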
Match HTML tags using a lazy quantifier
content
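The difference between a greedy and a lazy quantifier shows up clearly on the heading example. The sketch below uses Python's `re`; the same `+?` syntax is available with `grep -P`.

```python
import re

html = "<h1>This is a heading. </h1>"

# Greedy: .+ runs to the last ">" on the line, swallowing the content as well.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first ">", so each tag is matched on its own.
lazy = re.findall(r"<.+?>", html)

print(greedy)  # ['<h1>This is a heading. </h1>']
print(lazy)    # ['<h1>', '</h1>']
```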
Unzip the .docx to a directory
The content of a .docx is saved in an XML file.
Minor issue
Replace the paragraph (paraID) tags with spaces
grep emails
Show .docx content without unzipping to disk
unzip -p: extract files to pipe, no messages
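The same steps can be sketched in Python, with the standard `zipfile` module playing the role of `unzip -p` (nothing is written to disk). Assumptions: the helper name `docx_text` is made up, the archive built in memory is a tiny stand-in and not a complete .docx, and every XML tag is replaced with a space rather than only the paragraph tags.

```python
import io
import re
import zipfile

def docx_text(docx_file):
    """Read word/document.xml straight from the archive and strip the XML tags."""
    with zipfile.ZipFile(docx_file) as archive:
        xml = archive.read("word/document.xml").decode("utf-8")
    # Replace each tag with a space so text from adjacent paragraphs does not run together.
    return re.sub(r"<[^>]+>", " ", xml)

# Build a minimal stand-in archive in memory (not a complete .docx).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr(
        "word/document.xml",
        '<w:document><w:p w14:paraId="0A1B2C3D"><w:r><w:t>Mail johndoe@ubalt.edu</w:t></w:r></w:p>'
        "<w:p><w:r><w:t>today</w:t></w:r></w:p></w:document>",
    )
buffer.seek(0)

text = docx_text(buffer)
emails = re.findall(r"[a-zA-Z0-9.-]{2,15}@[a-zA-Z0-9]{1,15}\.[a-zA-Z0-9]{3,4}", text)
print(emails)  # ['johndoe@ubalt.edu']
```

For a real file, `docx_text("report.docx")` would work the same way, where the file name is a placeholder.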