
3.1. Regular Expressions


3.1.1 Definition and Example
For users already familiar with the concept of regular expression metacharacters, this section may be bypassed. However, this preliminary
material is crucial to understanding the variety of ways in which grep, sed, and awk are used to display and manipulate data.
What is a regular expression? A regular expression[1] is just a pattern of characters used to match the same characters in a search. In most
programs, a regular expression is enclosed in forward slashes; for example, /love/ is a regular expression delimited by forward slashes, and
the pattern love will be matched any time the same pattern is found in the line being searched. What makes regular expressions interesting is
that they can be controlled by special metacharacters.
[1] If you receive an error message that contains the string RE, there is a problem with the regular expression you are using in the
program.
If you are new to the idea of regular expressions, let us look at an example that will help you understand what this whole concept is about.
Suppose that you are working in the vi editor on an e-mail message to your friend. It looks like this:
% vi letter
------------------------------------------------------------------
Hi tom,
I think I failed my anatomy test yesterday. I had a terrible
stomachache. I ate too many fried green tomatoes.
Anyway, Tom, I need your help. I'd like to make the test up
tomorrow, but don't know where to begin studying. Do you
think you could help me? After work, about 7 PM, come to
my place and I'll treat you to pizza in return for your help. Thanks.
Your pal,
guy@phantom
~
~
~
~
----------------------------------------------------------------------
Now, suppose you find out that Tom never took the test either, but David did. You also notice that in the greeting, you spelled Tom with a
lowercase t. So you decide to make a global substitution to replace all occurrences of tom with David, as follows:
% vi letter
------------------------------------------------------------------
Hi David,
I think I failed my anaDavidy test yesterday. I had a terrible
sDavidachache. I ate too many fried green Davidatoes.
Anyway, Tom, I need your help. I'd like to make the test up
Davidorrow, but don't know where to begin studying. Do you
think you could help me? After work, about 7 PM, come to
3.1. Regular Expressions file:///G:/cubtraining/__UNIX_R__Shells_by_Example__4th_Edition_/0...
1 of 8 1/30/2012 5:29 PM
my place and I'll treat you to pizza in return for your help. Thanks.
Your pal,
guy@phanDavid
~
~
~
--> :1,$s/tom/David/g
----------------------------------------------------------------------
The regular expression in the search string is tom. The replacement string is David. The vi command reads "for lines 1 to the end of the file
($), substitute tom everywhere it is found on each line and replace it with David." Hardly what you want! And one of the occurrences of Tom
was untouched because you only asked for tom, not Tom, to be replaced with David. So what to do? Enter the regular expression
metacharacters.
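The same mistake can be reproduced outside vi. The following minimal Python sketch uses re.sub as a stand-in for vi's substitute command; this is an assumption of the sketch, but for a literal pattern such as tom, Python's re module and vi match exactly the same text. LETTER is only an illustrative variable holding the letter, one string per line.

```python
import re

# Illustrative copy of the letter, one string per line.
LETTER = [
    "Hi tom,",
    "I think I failed my anatomy test yesterday. I had a terrible",
    "stomachache. I ate too many fried green tomatoes.",
    "Anyway, Tom, I need your help. I'd like to make the test up",
    "tomorrow, but don't know where to begin studying. Do you",
    "think you could help me? After work, about 7 PM, come to",
    "my place and I'll treat you to pizza in return for your help. Thanks.",
    "Your pal,",
    "guy@phantom",
]

# Rough equivalent of :1,$s/tom/David/g: visit every line (1,$) and
# replace every occurrence on each line (re.sub replaces all matches by
# default, much like the g flag). The pattern is the literal string "tom".
naive = [re.sub(r"tom", "David", line) for line in LETTER]

for line in naive:
    print(line)
```

The output reproduces the damaged letter: anaDavidy, sDavidachache, Davidatoes, Davidorrow, and guy@phanDavid, while the capitalized Tom on the fourth line is left alone.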
3.1.2 Regular Expression Metacharacters
Metacharacters are characters that represent something other than themselves. The two types of metacharacters that you will learn about in
this book are shell metacharacters and regular expression metacharacters. They serve different purposes. Shell metacharacters are evaluated
by the UNIX/Linux shell. For example, when you type the command rm *, the asterisk is a shell metacharacter, called a wildcard, and is
evaluated by the shell to mean "Match on all filenames in the current working directory." The shell metacharacters are described for the shells
in their respective chapters.
Regular expression metacharacters are evaluated by the programs that perform pattern matching, such as vi, grep, sed, and awk.[2] They
are special characters that allow you to delimit a pattern in some way so that you can control what substitutions will take place. There are
metacharacters to anchor a word to the beginning or end of a line. There are metacharacters that allow you to specify any characters, or
some number of characters, to find both upper- and lowercase characters, digits only, and so forth. For example, to change the name tom or
Tom to David, the following vi command would have done the job:
:1,$s/\<[Tt]om\>/David/g
This command reads, "From the first line to the last line of the file (1,$), substitute (s) the word Tom or tom with David," and the g flag says
to do this globally (i.e., make the substitution even if it occurs more than once on the same line). The regular expression metacharacters are \<
and \> for the beginning and end of a word, and the pair of brackets, [Tt], which matches one of the characters enclosed within them (in this
case, either T or t). There are five basic metacharacters that all UNIX/Linux pattern-matching utilities recognize. Table 3.1 presents regular
expression metacharacters that can be used in all versions of vi, ex, grep, egrep, sed, and awk. Additional metacharacters are described for
each of the utilities where applicable.
[2] The Korn and Bash shells now support pattern-matching metacharacters similar to the regular expression metacharacters
described for grep, sed, and awk.
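The corrected vi substitution can be approximated in Python as a quick check. Python's re module does not support vi's \< and \> anchors, so this sketch swaps in \b, which matches at any word boundary; for this letter the effect is the same, but the two are not identical in every editor configuration. LETTER is again just an illustrative variable.

```python
import re

# Illustrative copy of the letter, one string per line.
LETTER = [
    "Hi tom,",
    "I think I failed my anatomy test yesterday. I had a terrible",
    "stomachache. I ate too many fried green tomatoes.",
    "Anyway, Tom, I need your help. I'd like to make the test up",
    "tomorrow, but don't know where to begin studying. Do you",
    "think you could help me? After work, about 7 PM, come to",
    "my place and I'll treat you to pizza in return for your help. Thanks.",
    "Your pal,",
    "guy@phantom",
]

# Rough equivalent of :1,$s/\<[Tt]om\>/David/g.
# [Tt] matches exactly one character, either T or t, as in vi.
# \b stands in for vi's \< and \>: it matches only where a word character
# meets a non-word character (or the start or end of the string).
word_tom = re.compile(r"\b[Tt]om\b")
fixed = [word_tom.sub("David", line) for line in LETTER]

for line in fixed:
    print(line)
```

Only the two whole-word names change; anatomy, stomachache, tomatoes, tomorrow, and phantom are left intact.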
Table 3.1. Regular Expression Metacharacters
Metacharacter   Function                      Example        What It Matches
^               Beginning-of-line anchor      /^love/        Matches all lines beginning with love
$               End-of-line anchor            /love$/        Matches all lines ending with love
.               Matches one character         /l..e/         Matches lines containing an l, followed by
                                                             two characters, followed by an e
*               Matches zero or more of the   / *love/       Matches lines with zero or more spaces,
                preceding character                          followed by the pattern love
[ ]             Matches one in the set        /[Ll]ove/      Matches lines containing love or Love
[x-y]           Matches one character within  /[A-Z]ove/     Matches letters from A through Z followed
                a range in the set                           by ove
[^ ]            Matches one character not in  /[^A-Z]/       Matches any character not in the range
                the set                                      between A and Z
\               Used to escape a              /love\./       Matches lines containing love, followed by
                metacharacter                                a literal period; normally the period
                                                             matches any one character
Additional Metacharacters Supported by Many UNIX/Linux Programs That Use RE Metacharacters
\<              Beginning-of-word anchor      /\<love/       Matches lines containing a word that begins
                                                             with love (supported by vi and grep)
\>              End-of-word anchor            /love\>/       Matches lines containing a word that ends
                                                             with love (supported by vi and grep)
\(..\)          Tags matched characters to    /\(love\)able \1er/
                be used later                                May use up to nine tags, starting with the
                                                             first tag at the leftmost part of the
                                                             pattern. For example, the pattern love is
                                                             saved as tag 1, to be referenced later as
                                                             \1. In this example, the search pattern
                                                             consists of loveable followed by lover
                                                             (supported by sed, vi, and grep)
x\{m\} or       Repetition of character x:    o\{5,10\}      Matches if the line contains between 5
x\{m,\} or      m times, at least m times,                   and 10 consecutive occurrences of the
x\{m,n\}        or at least m and not more                   letter o (supported by vi and grep)
                than n times[a]
[a] Not dependable on all versions of UNIX/Linux or all pattern-matching utilities; usually works with vi and grep.
Assuming that you know how the vi editor works, each metacharacter is described in terms of the vi search string. In the following
examples, characters are highlighted to demonstrate what vi will find in its search.
Example 3.1.
(A simple regular expression search)
% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/love/
-----------------------------------------------------------------
EXPLANATION
The regular expression is love. The pattern love is found by itself and as part of other words, such as lovely, gloves, and clover.
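To see which lines of picnic this search hits, here is a small Python sketch. It relies on the fact that, for a plain literal pattern such as love, Python's re.search finds the same text as vi's / search; PICNIC and matching_lines are illustrative names, not part of vi.

```python
import re

# Illustrative copy of the picnic file, one string per line.
PICNIC = [
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh",
    "love, how much I adore you. Do you know",
    "the extent of my love? Oh, by the way, I think",
    "I lost my gloves somewhere out in that field of",
    "clover. Did you see them? I can only hope love",
    "is forever. I live for you. It's hard to get back in the",
    "groove.",
]

def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines in which pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if re.search(pattern, line)]

print(matching_lines(r"love", PICNIC))  # [1, 3, 4, 5, 6]
```

Line 2 is not reported: the search is case-sensitive, so Lovers does not contain love.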
Example 3.2.
(The beginning-of-line anchor (^))
% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/^love/
-----------------------------------------------------------------
EXPLANATION
The caret (^) is called the beginning-of-line anchor. Vi will find only those lines where the regular expression love is matched at the beginning
of the line, i.e., love is the first set of characters on the line; it cannot be preceded by even one space.
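The beginning-of-line anchor behaves the same way in Python's re module, so a short sketch can confirm which picnic line qualifies. PICNIC and matching_lines are illustrative names; the one-line test string is made up.

```python
import re

# Illustrative copy of the picnic file, one string per line.
PICNIC = [
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh",
    "love, how much I adore you. Do you know",
    "the extent of my love? Oh, by the way, I think",
    "I lost my gloves somewhere out in that field of",
    "clover. Did you see them? I can only hope love",
    "is forever. I live for you. It's hard to get back in the",
    "groove.",
]

def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines in which pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if re.search(pattern, line)]

# ^ pins love to the very start of the line.
print(matching_lines(r"^love", PICNIC))  # [3]

# A single leading space is enough to prevent a match.
print(re.search(r"^love", " love, how much") is None)  # True
```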
Example 3.3.
(The end-of-line anchor ($))
% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/love$/
----------------------------------------------------------------
EXPLANATION
The dollar sign ($) is called the end-of-line anchor. Vi will find only those lines where the regular expression love is matched at the end of the
line, i.e., love is the last set of characters on the line and is directly followed by a newline.
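A Python sketch of the end-of-line anchor follows. One caveat to keep in mind: Python's $ also matches just before a final newline character, but the illustrative PICNIC strings below carry no newlines, so that difference does not come into play here. PICNIC and matching_lines are illustrative names.

```python
import re

# Illustrative copy of the picnic file, one string per line (no newlines).
PICNIC = [
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh",
    "love, how much I adore you. Do you know",
    "the extent of my love? Oh, by the way, I think",
    "I lost my gloves somewhere out in that field of",
    "clover. Did you see them? I can only hope love",
    "is forever. I live for you. It's hard to get back in the",
    "groove.",
]

def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines in which pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if re.search(pattern, line)]

# $ pins love to the very end of the line.
print(matching_lines(r"love$", PICNIC))  # [6]

# Anything after love, even one trailing space, prevents a match.
print(re.search(r"love$", "I can only hope love ") is None)  # True
```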
Example 3.4.
(Any Single Character (.))
% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/l.ve/
-----------------------------------------------------------------
EXPLANATION
The dot (.) matches any one character, except the newline. Vi will find those lines containing an l, followed by any single character,
followed by a v and an e. It finds both love and live.
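The dot has the same meaning in Python's re module, so this sketch lists both the matching lines and the distinct pieces of text the dot let through. PICNIC, matching_lines, and found are illustrative names.

```python
import re

# Illustrative copy of the picnic file, one string per line.
PICNIC = [
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh",
    "love, how much I adore you. Do you know",
    "the extent of my love? Oh, by the way, I think",
    "I lost my gloves somewhere out in that field of",
    "clover. Did you see them? I can only hope love",
    "is forever. I live for you. It's hard to get back in the",
    "groove.",
]

def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines in which pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if re.search(pattern, line)]

# . matches any single character, so l.ve accepts love and live alike.
print(matching_lines(r"l.ve", PICNIC))  # [1, 3, 4, 5, 6, 7]

# Collect the distinct pieces of text that actually matched.
found = {m for line in PICNIC for m in re.findall(r"l.ve", line)}
print(sorted(found))  # ['live', 'love']
```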
Example 3.5.
(Zero or more of the preceding character (*))
% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/o*ve/
-----------------------------------------------------------------
EXPLANATION
The asterisk (*) matches zero or more of the preceding character.[a] It is as though the asterisk were glued to the character directly before it
and controls only that character. In this case, the asterisk is glued to the letter o. It matches only the letter o: as many consecutive
occurrences of o as appear in the line, or even no occurrences of o at all. Vi searches for zero or more occurrences of the letter
o followed by a v and an e, finding love, loooove, lve, and so forth.
[a] Do not confuse this metacharacter with the shell wildcard (*). They are totally different. The shell asterisk matches zero or
more of any character, whereas the regular expression asterisk matches zero or more of the preceding character.
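Because o* is allowed to match no o's at all, /o*ve/ succeeds on any line that merely contains ve. This Python sketch (same * semantics as vi for this pattern) makes that visible; PICNIC and matching_lines are illustrative names.

```python
import re

# Illustrative copy of the picnic file, one string per line.
PICNIC = [
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh",
    "love, how much I adore you. Do you know",
    "the extent of my love? Oh, by the way, I think",
    "I lost my gloves somewhere out in that field of",
    "clover. Did you see them? I can only hope love",
    "is forever. I live for you. It's hard to get back in the",
    "groove.",
]

def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines in which pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if re.search(pattern, line)]

# o* may match zero o's, so every line containing "ve" qualifies.
print(matching_lines(r"o*ve", PICNIC))  # [1, 2, 3, 4, 5, 6, 7, 8]

# The * is greedy: it takes as many consecutive o's as are present.
print(re.findall(r"o*ve", "groove."))  # ['oove']
# In "forever" and "live" there is no o before ve, so zero o's are used.
print(re.findall(r"o*ve", PICNIC[6]))  # ['ve', 've']
```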
Example 3.6.
(A set of characters ([]))
% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/[Ll]ove/
----------------------------------------------------------------
EXPLANATION
The square brackets match one of a set of characters. Vi will search for either an uppercase or
lowercase l followed by an o, v, and e.
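Comparing the bracketed pattern with the plain literal search shows exactly what [Ll] adds. A minimal Python sketch, with PICNIC and matching_lines as illustrative names; bracket sets like this one behave the same in Python's re module as in vi.

```python
import re

# Illustrative copy of the picnic file, one string per line.
PICNIC = [
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh",
    "love, how much I adore you. Do you know",
    "the extent of my love? Oh, by the way, I think",
    "I lost my gloves somewhere out in that field of",
    "clover. Did you see them? I can only hope love",
    "is forever. I live for you. It's hard to get back in the",
    "groove.",
]

def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines in which pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if re.search(pattern, line)]

# [Ll] matches exactly one character, either L or l.
print(matching_lines(r"[Ll]ove", PICNIC))  # [1, 2, 3, 4, 5, 6]
# The case-sensitive literal search misses "Lovers" on line 2.
print(matching_lines(r"love", PICNIC))     # [1, 3, 4, 5, 6]
```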
Example 3.7.
(A range of characters ( [ - ] ))
% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/ove[a-z]/
-----------------------------------------------------------------
EXPLANATION
The dash between characters enclosed in square brackets matches one character in a range of characters. Vi will search for the regular
expression containing an o, v, and e, followed by any character in the ASCII range between a and z. Because the range follows ASCII order,
it must be written from low to high; it cannot be represented as [z-a].
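The sketch below applies the same range in Python's re module (ranges there follow character code order, which agrees with ASCII for a to z) and also checks that a reversed range is refused. PICNIC, matching_lines, found, and reversed_range_accepted are illustrative names.

```python
import re

# Illustrative copy of the picnic file, one string per line.
PICNIC = [
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh",
    "love, how much I adore you. Do you know",
    "the extent of my love? Oh, by the way, I think",
    "I lost my gloves somewhere out in that field of",
    "clover. Did you see them? I can only hope love",
    "is forever. I live for you. It's hard to get back in the",
    "groove.",
]

def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines in which pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if re.search(pattern, line)]

# [a-z] matches one lowercase letter, so ove must be followed by a letter.
print(matching_lines(r"ove[a-z]", PICNIC))  # [1, 2, 5, 6]

found = [m for line in PICNIC for m in re.findall(r"ove[a-z]", line)]
print(found)  # ['ovel', 'over', 'oves', 'over']

# Python rejects a range written high-to-low with re.error.
try:
    re.compile(r"[z-a]")
    reversed_range_accepted = True
except re.error:
    reversed_range_accepted = False
print(reversed_range_accepted)  # False
```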
Example 3.8.
(Not one of the characters in the set ([^]))
% vi picnic
----------------------------------------------------------------
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them? I can only hope love
is forever. I live for you. It's hard to get back in the
groove.
~
~
~
/ove[^a-zA-Z0-9]/
----------------------------------------------------------------
EXPLANATION
The caret inside square brackets is a negation metacharacter. Vi will search for the regular expression containing an o, v, and e, followed by
any character not in the ASCII range between a and z, not in the range between A and Z, and not a digit between 0 and 9. For example, it will
find ove followed by a comma, a space, a period, and so on, because those characters are not in the set.
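Here is the negated set applied to the picnic lines in Python, where [^...] means the same thing as in vi. PICNIC, matching_lines, and found are illustrative names. In this sketch the love at the very end of line 6 is not found, because the bracket expression must consume a real character after ove and these line strings carry no newline.

```python
import re

# Illustrative copy of the picnic file, one string per line (no newlines).
PICNIC = [
    "I had a lovely time on our little picnic.",
    "Lovers were all around us. It is springtime. Oh",
    "love, how much I adore you. Do you know",
    "the extent of my love? Oh, by the way, I think",
    "I lost my gloves somewhere out in that field of",
    "clover. Did you see them? I can only hope love",
    "is forever. I live for you. It's hard to get back in the",
    "groove.",
]

def matching_lines(pattern, lines):
    """Return the 1-based numbers of the lines in which pattern matches."""
    return [n for n, line in enumerate(lines, start=1) if re.search(pattern, line)]

# [^a-zA-Z0-9] matches one character that is not a letter or a digit.
print(matching_lines(r"ove[^a-zA-Z0-9]", PICNIC))  # [3, 4, 8]

found = [m for line in PICNIC for m in re.findall(r"ove[^a-zA-Z0-9]", line)]
print(found)  # ['ove,', 'ove?', 'ove.']
```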