Introduction to Regular Expressions
• Basic syntax
• In Perl and JavaScript, a RegEx literal must begin and end with /
• /something/
• Escaping reserved characters is crucial
• /(i.e. / is invalid because ( must be closed
• However, /\(i\.e\. / is valid for finding ‘(i.e. ’
• Reserved characters include:
• .*?+()[]{}/\|
• Also some characters have special meanings
based on their position in the statement
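The escaping rule above can be sketched in Python. This is a minimal illustration, not part of the original slides: Python writes patterns as plain strings with no / delimiters, and the sample sentence is made up.

```python
import re

text = "Use a qualifier (i.e. a hint) here."  # made-up sample text

# An unescaped "(" opens a group that is never closed, so compiling fails.
try:
    re.compile("(i.e. ")
    compiled = True
except re.error:
    compiled = False

# Escaping the reserved characters makes them match literally.
literal = re.search(r"\(i\.e\. ", text)

# re.escape builds an escaped version of a literal string automatically.
auto = re.search(re.escape("(i.e. "), text)

print(compiled, repr(literal.group()), repr(auto.group()))
```

When the text to find comes from user input, re.escape is usually safer than escaping by hand.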
Regular Expression Matching
• Text Matching
• A RegEx can match plain text
• ex. if ($name =~ /Dan/) { print "match"; }
• But this matches any string containing Dan: Dan, Danny, Daniel, etc.
• Full Text Matching with Anchors
• Might want to match a whole line (or string)
• ex. if ($name =~ /^Dan$/) { print "match"; }
• This will only match Dan
• ^ anchors to the front of the line
• $ anchors to the end of the line
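A small Python sketch of the difference between unanchored and anchored matching. The list of names is an assumption chosen for illustration.

```python
import re

names = ["Dan", "Danny", "Daniel", "McDaniel"]  # made-up sample names

# Unanchored: matches every name that contains "Dan" anywhere.
contains = [n for n in names if re.search(r"Dan", n)]

# Anchored at both ends: only the exact string "Dan" matches.
exact = [n for n in names if re.search(r"^Dan$", n)]

print(contains)
print(exact)
```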
Regular Expression Matching
• Order of results
• The search will begin at the start of the string
• This can be altered, don’t ask yet
• Every character is important
• Any plain text in the expression is treated literally
• Nothing is neglected (close doesn’t count)
• / s/ is not the same as /  s/ (one space versus two; whitespace counts)
• Far easier to write than to debug!
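The two points on this slide, left-to-right scanning and literal treatment of every character, can be checked with a short Python sketch. The sample strings are assumptions.

```python
import re

# The scan starts at the left, so the first "is" found is the one inside "this".
first = re.search(r"is", "this is it")

# Whitespace in a pattern is literal: one space is not two spaces.
one_space = re.search(r" s", "a s")
two_spaces = re.search(r"  s", "a s")

print(first.start(), one_space is not None, two_spaces is not None)
```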
Regular Expression Char Classes
• General Quantifiers
• Some more special characters
• $favoriteNumber =~ /\d*/;
• Matches a run of digits of any length, including none at all
• $firstName =~ /\w+/;
• Matches one or more word characters (letters, digits, or underscore)
• $middleInitial =~ /\w?/;
• Matches zero or one word character
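The three quantifiers can be tried out in Python as follows. This is only a sketch with made-up inputs; note that \d* happily matches the empty string at the first position it tries.

```python
import re

# \d* means zero or more digits, so it can succeed with an empty match.
digits = re.search(r"\d*", "42 apples").group()
no_digits = re.search(r"\d*", "apples").group()
# The match is taken at the leftmost position, even if it is empty there.
leftmost = re.search(r"\d*", "apples 42").group()

# \w+ means one or more word characters (letters, digits, underscore).
first_name = re.search(r"\w+", "Dan Smith").group()
nothing = re.search(r"\w+", "!!!")

# \w? means zero or one word character.
initial = re.search(r"\w?", "Q.").group()
no_initial = re.search(r"\w?", ".").group()

print(repr(digits), repr(no_digits), repr(leftmost))
print(repr(first_name), nothing, repr(initial), repr(no_initial))
```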
Regular Expression Repetition
• Backreferences
• With all these wildcards and possible matches, we
usually need to know what the expression finally
ended up matching.
• Backreferences let you see what was matched
• Can be used after the expression has evaluated or
even inside the expression itself
• Handled very differently in different languages
• Numbered from left to right, starting at 1
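As a rough illustration of the numbering rule, here is a Python sketch with nested groups. Groups are numbered by the position of their opening parenthesis; the phone-number style input is an assumption.

```python
import re

# Opening parentheses, left to right: group 1 wraps groups 2 and 3, then group 4.
m = re.search(r"((\d+)-(\d+))-(\d+)", "call 555-867-5309 now")

print(m.group(1))  # the outer group
print(m.group(2), m.group(3), m.group(4))
```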
Grouping for Backreferences
• Perl backreferences
• Used inside the expression
• $txt =~ /\b(\w+)\s+\1\b/
• Finds any duplicated word, must use \1 here
• Used after the expression
• $class =~ /(.+?)-(\d+)/
• The text before the hyphen is stored in the
Perl variable $1 (not \1) and the number goes in $2
• print "I am in class $1, section $2";
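For comparison, the same two uses can be sketched in Python, which also writes \1 inside the pattern but reads captured text through the match object rather than $1. The sample strings are assumptions.

```python
import re

# Inside the expression: \1 must repeat whatever group 1 captured.
dup = re.search(r"\b(\w+)\s+\1\b", "this is is a test")

# After the expression: captured text is read from the match object.
m = re.search(r"(.+?)-(\d+)", "Biology-3")
message = f"I am in class {m.group(1)}, section {m.group(2)}"

print(dup.group(1))
print(message)
```

The \b anchors matter here: without them, the "is" at the end of "this" followed by " is" would count as a duplicate.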
Grouping for Backreferences
• Java backreferences
• Annoying but still useful
• Pattern p = Pattern.compile("(.+?)-(\\d+)");
Matcher m = p.matcher(mySchedule);
if (m.find()) { // group() is only valid after a successful find()
System.out.println("I am in class " + m.group(1) +
", section " + m.group(2));
}
• Ugly, but usually better than the alternative
• m.group() returns the entire string matched
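The same pattern of "check that the match succeeded, then read the groups" applies in other languages. A minimal Python sketch, where describe is a hypothetical helper name and the inputs are made up:

```python
import re

def describe(schedule):
    """Hypothetical helper: report the class and section found in a string."""
    m = re.search(r"(.+?)-(\d+)", schedule)
    if m is None:
        # No match: there are no groups to read.
        return "no class found"
    # group() with no argument returns the entire matched text.
    return f"matched {m.group()}: class {m.group(1)}, section {m.group(2)}"

found = describe("Biology-3 lab")
missing = describe("no section here")
print(found)
print(missing)
```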
Grouping for Backreferences
• Javascript backreferences
• Used inside the expression
• Supported with \1, as in Perl: /\b(\w+)\s+\1\b/
• Used after the expression
• /(.+?)-(\d+)/.test(className); (class is a reserved word in Javascript)
• alert(RegExp.$1);
• str = str.replace(/(\S+)\s+(\S+)/, "$2 $1");
• RegExp supports several of Perl's special backreference
variables (wait a few slides)
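The replace-with-swapped-groups idea carries over to other languages. A Python sketch for comparison, assuming a made-up two-word input; Python refers to groups in the replacement text as \1 or \g<1> rather than $1.

```python
import re

name = "Smith John"  # made-up sample input

# Swap the first two whitespace-separated words; count=1 replaces only
# the first match, like a Javascript replace without the g flag.
swapped = re.sub(r"(\S+)\s+(\S+)", r"\2 \1", name, count=1)

# \g<N> is the unambiguous spelling of the same group references.
same = re.sub(r"(\S+)\s+(\S+)", r"\g<2> \g<1>", name, count=1)

print(swapped, "|", same)
```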
Grouping for Backreferences
• PHP/Python backreferences
• Both allow explicitly named groups, e.g. (?P<name>...)
• Named groups also keep their numbers
• .NET backreferences
• Allows named backreferences
• Named groups are numbered after all unnamed groups, so
accessing them by number can give unexpected results
• In-depth syntax
• http://kobesearch.cpan.org/htdocs/perl/perlreref.html
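The Python named-group behavior mentioned on this slide can be sketched as follows. The group names course, section, and word are assumptions chosen for illustration.

```python
import re

# (?P<name>...) defines a named group; it still gets a number as well.
m = re.search(r"(?P<course>.+?)-(?P<section>\d+)", "Biology-3")
by_name = (m.group("course"), m.group("section"))
by_number = (m.group(1), m.group(2))
as_dict = m.groupdict()

# (?P=name) is a named backreference inside the expression.
dup = re.search(r"\b(?P<word>\w+)\s+(?P=word)\b", "it was was late")

print(by_name, by_number, as_dict, dup.group("word"))
```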