
Learn Regular Expression (Regex) syntax with C# and .NET
What are Regular Expressions?
Regular Expressions are a powerful pattern matching language that is part of many modern programming languages. Regular Expressions allow you to apply a pattern to an input string and return a list of the matches within the text. Regular expressions also allow text to be replaced using replacement patterns. It is a very powerful version of find and replace. There are two parts to learning Regular Expressions: learning the Regex syntax, and learning how to work with Regex in your programming language.

This article introduces you to the Regular Expression syntax. After learning the syntax for Regular Expressions you can use it in many different languages, as the syntax is fairly similar between languages. Microsoft's .NET Framework contains a set of classes for working with Regular Expressions in the System.Text.RegularExpressions namespace.

Download the Regular Expression Designer
When learning Regular Expressions, it helps to have a tool that you can use to test Regex patterns. Rad Software has a Free Regular Expression Tool available for download that will help as you go through the article.

The basics - Finding text
Regular Expressions are similar to find and replace in that ordinary characters match themselves. If I want to match the word "went" the Regular Expression pattern would be "went".
Text: Anna Jones and a friend went to lunch
Regex: went
Matches: went
The following are special characters when working with Regular Expressions. They will be discussed throughout the article.
. $ ^ { [ ( | ) * + ? \
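A minimal C# sketch of literal matching, assuming C# 9 or later so that top-level statements can be used (for example, a .NET 5+ console project). It applies the "went" pattern with Regex.Matches and, because the dot is one of the special characters listed above, uses the .NET helper Regex.Escape to turn "1.5" into a pattern that only matches a real full stop. The sample strings and variable names are illustrative.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Ordinary characters match themselves, so "went" finds the literal word.
string sentence = "Anna Jones and a friend went to lunch";
string[] found = Regex.Matches(sentence, "went")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(", ", found));

// '.' is a special character, so the pattern "1.5" also matches "125".
// Regex.Escape backslash-escapes special characters, producing the pattern 1\.5.
string literalPattern = Regex.Escape("1.5");
bool plainDot = Regex.IsMatch("125", "1.5");
bool escapedDot = Regex.IsMatch("125", literalPattern);
Console.WriteLine($"{literalPattern} {plainDot} {escapedDot}");
```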

Matching any character with dot
The full stop or period character (.) is known as dot. It is a wildcard that will match any character except a new line (\n). For example, the following Regular Expression matches the 'a' character followed by any two characters.
Text: abc def ant cow
Regex: a..
Matches: abc ant
If the Singleline option is enabled, a dot matches any character including the new line character.
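The dot examples can be reproduced in C#. This is a sketch under the same assumption (C# 9+ top-level statements); AllMatches is a hypothetical helper written for this example, not part of .NET. The "\n" in "a\nb" is an ordinary C# string escape that puts a real new line character into the input.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every match value for a pattern, as an array.
static string[] AllMatches(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

// 'a' followed by any two characters.
string[] dotMatches = AllMatches("abc def ant cow", "a..");
Console.WriteLine(string.Join(" ", dotMatches)); // abc ant

// By default the dot does not match a new line...
bool defaultMode = Regex.IsMatch("a\nb", "a.b");
// ...but with RegexOptions.Singleline it matches every character, including \n.
bool singleLine = Regex.IsMatch("a\nb", "a.b", RegexOptions.Singleline);
Console.WriteLine($"{defaultMode} {singleLine}"); // False True
```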

Matching word characters
Backslash and a lowercase 'w' (\w) is a character class that will match any word character (letters, digits and the underscore). The following Regular Expression matches 'a' followed by two word characters.
Text: abc anaconda ant cow apple
Regex: a\w\w
Matches: abc ana ant app
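A short C# sketch of \w and its opposite \W, again assuming C# 9+ top-level statements. The patterns use C# verbatim strings (@"...") so that the backslash reaches the Regex engine without having to be doubled.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "abc anaconda ant cow apple";

// \w matches one word character, so a\w\w is 'a' plus exactly two word characters.
string[] wordMatches = Regex.Matches(text, @"a\w\w")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
Console.WriteLine(string.Join(" ", wordMatches)); // abc ana ant app

// \W matches any non-word character; in this input that is each of the spaces.
int nonWordCount = Regex.Matches(text, @"\W").Count;
Console.WriteLine(nonWordCount); // 4
```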

Backslash and an uppercase 'W' (\W) will match any non-word character.

Matching white-space
White-space can be matched using \s (backslash and 's'). The following Regular Expression matches the letter 'a' followed by two word characters then a white space character.
Text: "abc anaconda ant"
Regex: a\w\w\s
Matches: "abc "
Note that ant was not matched as it is not followed by a white space character.
White-space includes the space character, new line (\n), carriage return (\r), tab (\t), form feed (\f) and vertical tab (\v). Be careful using \s as it can lead to unexpected behaviour by matching line breaks (\n and \r). Sometimes it is better to explicitly specify the characters to match instead of using \s, e.g. to match Tab and Space use [\t\u0020].

Matching digits
The digits zero to nine can be matched using \d (backslash and lowercase 'd'). In .NET, \d also matches decimal digits from other scripts unless the ECMAScript option is set. For example, the following Regular Expression matches any three digits in a row.
Text: 123 12 843 8472
Regex: \d\d\d
Matches: 123 843 847

Matching sets of single characters
The square brackets are used to specify a set of single characters to match. Any single character within the set will match. For example, the following Regular Expression matches any three characters where the first character is either 'd' or 'a'.
Text: abc def ant cow
Regex: [da]..
Matches: abc def ant
The caret (^) can be added to the start of the set of characters to specify that none of the characters in the character set should be matched. The following Regular Expression matches any three characters where the first character is not 'd' and not 'a'.
Text: abc def ant cow
Regex: [^da]..
Matches: "bc " "ef " "nt " "cow"

Matching ranges of characters
Ranges of characters can be matched using the hyphen (-). For example, the following Regular Expression matches any three characters where the second character is either 'a', 'b', 'c' or 'd'.
Text: abc pen nda uml
Regex: .[a-d].
Matches: abc nda
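The white-space, digit, set and range examples above can be checked together in one small program. This sketch assumes C# 9+ top-level statements, and Find is a hypothetical helper defined only for this example. The last two checks show the difference between \s and the explicit set [\t\u0020] when the input is a line break.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every match value for a pattern, as an array.
static string[] Find(string input, string pattern) =>
    Regex.Matches(input, pattern).Cast<Match>().Select(m => m.Value).ToArray();

string[] whiteSpace = Find("abc anaconda ant", @"a\w\w\s"); // "abc "
string[] digits     = Find("123 12 843 8472", @"\d\d\d");   // 123 843 847
string[] inSet      = Find("abc def ant cow", "[da]..");    // abc def ant
string[] notInSet   = Find("abc def ant cow", "[^da]..");   // "bc " "ef " "nt " "cow"
string[] inRange    = Find("abc pen nda uml", ".[a-d].");   // abc nda

// \s also matches line breaks, while the explicit set only matches tab and space.
bool sMatchesNewLine   = Regex.IsMatch("\n", @"\s");
bool setMatchesNewLine = Regex.IsMatch("\n", @"[\t\u0020]");

foreach (string[] result in new[] { whiteSpace, digits, inSet, notInSet, inRange })
{
    Console.WriteLine(string.Join("|", result));
}
Console.WriteLine($"{sMatchesNewLine} {setMatchesNewLine}"); // True False
```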

Ranges of characters can also be combined together. For example, the following Regular Expression matches any of the characters from 'a' to 'z' or any digit from '0' to '9' followed by two word characters.
Text: abc no 0aa i8i
Regex: [a-z0-9]\w\w
Matches: abc 0aa i8i
The pattern could be written more simply as [a-z\d]\w\w

Specifying the number of times to match with Quantifiers
Quantifiers let you specify the number of times that an expression must match. The most frequently used quantifiers are the asterisk character (*) and the plus sign (+). Note that the asterisk (*) is usually called the star when talking about Regular Expressions.

Matching zero or more times with star (*)
The star tells the Regular Expression to match the character, group, or character class that immediately precedes it zero or more times. This means that the character, group, or character class is optional: it can be matched but it does not have to match. The following Regular Expression matches the character 'a' followed by zero or more word characters.
Text: Anna Jones and a friend owned an anaconda
Regex: a\w*
Options: IgnoreCase
Matches: Anna and a an anaconda

Matching one or more times with plus (+)
The plus sign tells the Regular Expression to match the character, group, or character class that immediately precedes it one or more times. This means that the character, group, or character class must be found at least once. Once it has been found, any further repetitions that immediately follow are matched as well. The following Regular Expression matches the character 'a' followed by at least one word character.
Text: Anna Jones and a friend owned an anaconda
Regex: a\w+
Options: IgnoreCase
Matches: Anna and an anaconda
Note that "a" was not matched as it is not followed by any word characters.

Matching zero or one times with question mark (?)
To specify an optional match use the question mark (?). The question mark matches zero or one times. The following Regular Expression matches the character 'a' optionally followed by the character 'n'.
Text: Anna Jones and a friend owned an anaconda
Regex: an?
Options: IgnoreCase
Matches: An a an a an an a a
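The quantifier examples above can be run with RegexOptions.IgnoreCase, matching the "Options: IgnoreCase" lines. This is a sketch assuming C# 9+ top-level statements; Find is a hypothetical helper. The final comparison only shows that [a-z0-9] and [a-z\d] agree for this ASCII input, since in .NET \d can also match digits from other scripts.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every match value for a pattern, as an array.
static string[] Find(string input, string pattern, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

const string Sentence = "Anna Jones and a friend owned an anaconda";
var ic = RegexOptions.IgnoreCase;

string[] star     = Find(Sentence, @"a\w*", ic); // 'a' then zero or more word characters
string[] plus     = Find(Sentence, @"a\w+", ic); // 'a' then one or more word characters
string[] optional = Find(Sentence, "an?", ic);   // 'a' optionally followed by 'n'

// Combined ranges: both spellings give the same result for this input.
string[] withRange      = Find("abc no 0aa i8i", @"[a-z0-9]\w\w");
string[] withDigitClass = Find("abc no 0aa i8i", @"[a-z\d]\w\w");

Console.WriteLine(string.Join(" ", star));     // Anna and a an anaconda
Console.WriteLine(string.Join(" ", plus));     // Anna and an anaconda
Console.WriteLine(string.Join(" ", optional)); // An a an a an an a a
Console.WriteLine(string.Join(" ", withRange) + " / " + string.Join(" ", withDigitClass));
```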

Specifying the number of matches
An exact number of matches for a character, group, or character class can be specified with the curly brackets ({n}). The following Regular Expression matches the character 'a' followed by exactly two 'n' characters. There must be two 'n' characters for a match to occur.
Text: Anna Jones and Anne owned an anaconda
Regex: an{2}
Options: IgnoreCase
Matches: Ann Ann
To require a minimum number of matches with no maximum, leave out the second number ({n,}).
A range of matches can be specified by curly brackets with two numbers inside ({n,m}). The first number (n) is the minimum number of matches required, the second (m) is the maximum number of matches permitted. This Regular Expression matches the character 'a' followed by a minimum of two 'n' characters and a maximum of three 'n' characters.
Text: Anna and Anne lunched with an anaconda annnnnex
Regex: an{2,3}
Options: IgnoreCase
Matches: Ann Ann annn
The Regex stops matching after the maximum number of matches has been found.

Matching the start and end of a string
To specify that a match must occur at the beginning of a string use the caret character (^). For example, the following Regular Expression pattern matches the beginning of the string followed by the character 'a'.
Text: an anaconda ate Anna Jones
Regex: ^a
Matches: "a" (the first character of the string)
The pattern above only matches the a in "an".
Note that the caret (^) has different behaviour when used inside the square brackets.
If the Multiline option is on, the caret (^) will match the beginning of each line in a multiline string rather than only the start of the string.
To specify that a match must occur at the end of a string use the dollar character ($). If the Multiline option is on then the pattern will match at the end of each line in a multiline string. This Regular Expression pattern matches the word at the end of the line in a multiline string.
Text: "an anaconda ate Anna Jones"
Regex: \w+$
Options: Multiline, IgnoreCase
Matches: Jones

Microsoft have an online reference for Regex in .NET: Regular Expression Syntax on MSDN
To learn more about Regular Expression syntax see the next article: C# Regular Expression (Regex) Examples in .NET
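The counted quantifiers and anchors described in the first article can be exercised in C#. This sketch assumes C# 9+ top-level statements and uses a hypothetical Find helper. It also tries {n,} (a minimum with no maximum) and shows one platform detail: with RegexOptions.Multiline, .NET's $ matches before \n but not before \r, so on text with Windows (\r\n) line endings \w+$ misses the word that ends the first line.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: every match value for a pattern, as an array.
static string[] Find(string input, string pattern, RegexOptions options = RegexOptions.None) =>
    Regex.Matches(input, pattern, options).Cast<Match>().Select(m => m.Value).ToArray();

const string Lunch = "Anna and Anne lunched with an anaconda annnnnex";
var ic = RegexOptions.IgnoreCase;

string[] exactlyTwo = Find("Anna Jones and Anne owned an anaconda", "an{2}", ic); // Ann Ann
string[] twoToThree = Find(Lunch, "an{2,3}", ic);                                 // Ann Ann annn
string[] twoOrMore  = Find(Lunch, "an{2,}", ic);                                  // Ann Ann annnnn

// ^ anchors the match to the start of the string; Match.Index is zero-based.
Match start = Regex.Match("an anaconda ate Anna Jones", "^a");

// With Multiline, $ matches before each \n, so the last word of every line is found.
string[] unixLines = Find("an anaconda ate\nAnna Jones", @"\w+$", RegexOptions.Multiline);
// $ does not match before \r, so with \r\n line endings "ate" is missed.
string[] windowsLines = Find("an anaconda ate\r\nAnna Jones", @"\w+$", RegexOptions.Multiline);

Console.WriteLine(string.Join(" ", exactlyTwo));
Console.WriteLine(string.Join(" ", twoToThree));
Console.WriteLine(string.Join(" ", twoOrMore));
Console.WriteLine($"'{start.Value}' at index {start.Index}");
Console.WriteLine(string.Join(" ", unixLines) + " / " + string.Join(" ", windowsLines));
```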

C# Regular Expression (Regex) Examples in .NET
More Advanced Regular Expression Syntax
This article continues from Learn Regular Expression (Regex) syntax with C# and .NET and covers character escapes, match grouping, matching boundaries and RegexOptions, along with some C# code examples.

Matching special characters with character escapes
Special characters such as Tab and carriage return are matched using character escapes. The syntax is similar to C and C#. The common character escapes are listed below.

Special Character   Description
\t                  Matches a tab
\r                  Matches a carriage return
\n                  Matches a new line
\u0020              Matches a Unicode character using hexadecimal representation. Exactly four digits must be specified.

In this example, the Regular Expression pattern matches one or more word characters followed by a carriage return then a new line.
Text: "an anaconda ate\r\nAnna Jones" (a line break follows "ate")
Regex: \w+\r\n
Match: ate (followed by the carriage return and new line)
Depending on your operating system you might have to combine the \r and \n character escapes to create the correct new line sequence for your platform. For Microsoft Windows systems you should generally use \r\n, which is a carriage return then line feed (CRLF). To simply match the end of a line or string use the dollar sign ($).

Match Grouping
Groups perform a few different functions. They allow the quantifiers (such as plus and star) to be applied to sections of the match instead of just individual characters. Groups also allow sections of the match to be used in replacement expressions when using Regex.Replace(). A group is specified by the round brackets ( and ). If you want to match the round bracket characters you must use the escape character before the bracket, e.g. \( or \).
Text: http://www.yahoo.com/index.html and http://yahoo.com
Regex: http://(www\.)?([^\.]+)\.com
Matches: http://www.yahoo.com http://yahoo.com
This regex matches 'http://' optionally followed by 'www.', then starts a group and matches one or more of any character that is not a full stop/period (.), closes the group, then matches '.com'. The question mark after the group (www\.) applies to the whole group, making it optional.

An example in C#
The regular expression classes are in the System.Text.RegularExpressions namespace. The Regex class represents a regular expression. A regular expression pattern must be specified when creating a Regex object, and the pattern cannot be changed afterwards. The MatchCollection class stores a list of successful matches found by applying the regular expression pattern to an input string. Each Match object has a Groups collection. The Group class represents a group within the regex pattern. The Success property on the group can be used to check if the Group matched or not.

    using System;
    using System.Text.RegularExpressions;

    string InputText = "http://www.yahoo.com/";
    Regex exp = new Regex(
        @"http://(www\.)?([^\.]+)\.com",
        RegexOptions.IgnoreCase);

    MatchCollection MatchList = exp.Matches(InputText);
    if (MatchList.Count > 0)
    {
        Match FirstMatch = MatchList[0];
        Console.WriteLine(FirstMatch.Value);

        // Group 0 is the whole match, so the first bracketed group is Groups[1]
        if (FirstMatch.Groups[1].Success)
        {
            Console.WriteLine("Group 1 matched");
        }

        // Walk every numbered group and report whether it took part in the match
        Group GroupCurrent;
        for (int i = 1; i < FirstMatch.Groups.Count; i++)
        {
            GroupCurrent = FirstMatch.Groups[i];
            if (GroupCurrent.Success)
            {
                Console.WriteLine("\tMatched:" + GroupCurrent.Value);
            }
            else
            {
                Console.WriteLine("\tGroup didn't match");
            }
        }
    }

Groups within a Match can be referenced by number or by name (see below).

Named Groups
Groups can be named to allow easier identification with the following syntax.
(?<NameOfGroup>expression)

Matching boundaries between words
To match a boundary between a word character (\w) and a non-word character (\W) use \b. A boundary is a position rather than a character: it sits before the first or after the last character of a word, where words are separated by any non-alphanumeric characters. For example, the following Regular Expression matches one or more word characters followed by a word boundary followed by a hyphen (-) followed by another word boundary followed by one or more word characters.
Text: Anna Jones and John William-Scott went to lunch with an anaconda
Regex: \w+\b-\b\w+
Options: IgnoreCase
Matches: William-Scott
Use \B to specify that a match must not occur on a \b boundary.

Regular Expression Options
Regular Expression Options can be used in the constructor for the Regex class. Several options can be combined with the bitwise OR operator (|).
RegexOptions.None - Specifies that no options are set.
RegexOptions.IgnoreCase - Specifies case-insensitive matching.
RegexOptions.Multiline - Multiline mode. Changes the meaning of ^ and $ so they match at the beginning and end, respectively, of any line, and not just the beginning and end of the entire string.
RegexOptions.ExplicitCapture - Specifies that the only valid captures are explicitly named or numbered groups of the form (?<name>...).
RegexOptions.Compiled - Specifies that the regular expression is compiled to MSIL code instead of being interpreted. The regular expression will be faster to match but it takes more time to compile initially. This option (although tempting) should only be used when the expression will be used many times, e.g. in a foreach loop.
RegexOptions.Singleline - Specifies single-line mode. Changes the meaning of the dot (.) so it matches every character (instead of every character except \n).
RegexOptions.IgnorePatternWhitespace - Eliminates unescaped white space from the pattern and enables comments marked with the hash sign (#).
RegexOptions.RightToLeft - Specifies that the search will be from right to left instead of from left to right.
RegexOptions.ECMAScript - Enables ECMAScript-compliant behavior for the expression. This flag can be used only in conjunction with the IgnoreCase, Multiline, and Compiled flags. The use of this flag with any other flags results in an exception.
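A sketch that pulls together the character escapes, named groups, replacement patterns, word boundaries and the ExplicitCapture option described in this article. It assumes C# 9+ top-level statements; the group names sub, domain and second, and all sample strings, are made up for illustration. In the replacement strings passed to Replace, ${name} refers to a named group and $1, $2 refer to numbered groups.

```csharp
using System;
using System.Text.RegularExpressions;

// Character escapes: \u0020 is a space; \( and \) match literal round brackets.
bool unicodeSpace = Regex.IsMatch("Anna Jones", @"Anna\u0020Jones");
bool literalParens = Regex.IsMatch("f(x)", @"f\(x\)");

// Named groups use (?<name>expression); the names here are illustrative.
var urlPattern = new Regex(@"http://(?<sub>www\.)?(?<domain>[^\.]+)\.com", RegexOptions.IgnoreCase);
Match url = urlPattern.Match("http://www.yahoo.com/index.html");
string domain = url.Groups["domain"].Value;
bool hasWww = url.Groups["sub"].Success;

// Groups reused in a replacement: ${name} for named groups, $1, $2 for numbered ones.
string domainOnly = urlPattern.Replace("visit http://www.yahoo.com today", "${domain}");
string swapped = Regex.Replace("William-Scott", @"(\w+)-(\w+)", "$2-$1");

// \b sits between a \w and a \W character; \B matches wherever \b does not.
Match hyphenated = Regex.Match(
    "Anna Jones and John William-Scott went to lunch",
    @"\w+\b-\b\w+",
    RegexOptions.IgnoreCase);
bool catInsideWord = Regex.IsMatch("concatenate", @"\Bcat\B");
bool catOnItsOwn = Regex.IsMatch("a cat sat", @"\Bcat\B");

// ExplicitCapture: the unnamed (...) group stops capturing, leaving group 0 and "second".
var explicitOnly = new Regex(@"(\w+)-(?<second>\w+)", RegexOptions.ExplicitCapture);
int groupCount = explicitOnly.Match("William-Scott").Groups.Count;

Console.WriteLine($"{unicodeSpace} {literalParens} {domain} {hasWww}");
Console.WriteLine($"{domainOnly} | {swapped} | {hyphenated.Value}");
Console.WriteLine($"{catInsideWord} {catOnItsOwn} {groupCount}");
```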