
Learning sed and awk

Presented by Yogesh Sawant


January 2008
Course Goal
• This course provides you with
knowledge and skills to
– Develop sed and awk scripts
– Use sed and awk to automate
common tasks
– Use sed and awk to create formatted
reports
Course Map
• A primer on sed and awk
– Conceptual understanding of sed and awk
– Similarities of sed and awk
– Working of sed and awk
– How to invoke sed and awk
• UNIX regular expressions
– Overview of UNIX regular expressions
– Metacharacters
• Delving deeper into sed
– Syntax of sed commands
– Commonly used sed commands
• Delving deeper into awk
– Programming model of awk
– Variables
– Operators
– Conditionals
– Loops
– Arrays
– Functions
A primer on sed and awk
• Objectives
– Learn what sed and awk are
– Learn the similarities of sed and awk
– Understand how sed and awk work
– Learn how to use sed and awk
What is sed
• sed is a non-interactive stream-oriented UNIX utility
• sed is used for parsing text files, and to apply textual
transformations to a sequential stream of data
• sed reads the input data stream line by line, applies the
operations that have been specified, and then outputs
the modified data
– $ sed 's/needle/magnet/g' haystack > new_haystack
• sed is often used as a filter in a pipeline
• sed has its origins in ed, the original UNIX line editor
• Basic difference between sed and ed is that ed is not
stream oriented, whereas sed is
• ed is an interactive editor, whereas sed is not
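A minimal sketch of sed used as a filter in a pipeline. The three input lines are hypothetical and are generated with printf rather than read from a file.

```shell
# Hypothetical input: three lines written to the pipe by printf.
# sed reads them from standard input and writes the edited lines to standard output.
out=$(printf 'a needle here\nno match\nneedle and needle\n' | sed 's/needle/magnet/g')
printf '%s\n' "$out"
```

Because sed reads standard input when no file is named, it can sit between any two commands in a pipeline.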
When should you use sed

• To automate editing actions to be
performed on one or more files
• To simplify the task of performing the
same edits on multiple files
• To write data conversion programs
What is AWK
• AWK can be described as: A Pattern-Matching
Programming Language
• AWK is designed for processing text-based data,
either in files or data streams
• A typical example of an awk program is one that
transforms data into a formatted report
• You can trace the lineage of awk to sed and
grep, and through these two programs to ed,
the original UNIX line editor
What does awk offer me?
• View a text file as a textual database made up of
records and fields
• Use variables to manipulate this database
• Use arithmetic and string operations
• Use programming constructs such as loops and
conditionals
• Generate formatted reports
• Define functions
• Execute UNIX commands from a script
• Process the result of UNIX commands
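A small sketch of the "records and fields" view. The two input records (a name and a count) are made up for illustration.

```shell
# Hypothetical data: each line is a record, each whitespace-separated word a field.
out=$(printf 'frodo 10\nsam 25\n' |
  awk 'BEGIN { print "NAME ITEMS" }
       { printf "%s has %d items\n", $1, $2 }')
printf '%s\n' "$out"
```

BEGIN prints the heading once before any input is read; the second rule runs for every record.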
Similarities of sed and awk
• They are invoked using similar syntax
– $ sed 'instructions' /foo/bar
– $ awk 'instructions' /foo/bar
• They are both stream-oriented, reading input
from text files one line at a time and directing the
result to standard output
• They use regular expressions for pattern
matching
• They allow the user to specify instructions in a
script
How sed and awk work
• Read one line at a time from the input file
• Make a copy of the input line
• Execute the given instructions on the copy
of input line
• Output the modified line
[Figure: input stream -> input line -> instructions -> output stream]
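The "copy of the input line" step can be observed directly: the edited text goes to standard output while the input file stays as it was. This sketch assumes mktemp is available; the scratch file and its contents are hypothetical.

```shell
tmp=$(mktemp)                      # hypothetical scratch file
printf 'one\ntwo\n' > "$tmp"
out=$(sed 's/one/1/' "$tmp")       # edited copy, written to standard output
orig=$(cat "$tmp")                 # the file itself is unchanged
rm -f "$tmp"
printf 'output:\n%s\nfile:\n%s\n' "$out" "$orig"
```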
Instructions to sed and awk
• Each instruction has two parts: a pattern and a
procedure
• Pattern is a regular expression delimited with
forward slashes (/)
• Procedure specifies one or more actions to be
performed
• In sed, procedure consists of editing commands
like those used in the line editor
• In awk, procedure consists of programming
statements and functions
Invoking sed
• Specifying instructions on the command line
– sed [-e] 'instructions' /foo/bar
– sed 's/us/them/' ring_file.text
– Enclosing single quotes prevent the shell from
interpreting special characters
– sed 's/us/them/' ring_file > new_file
– -e option is necessary only when you specify more
than one instruction
• -e option tells sed to interpret the next argument as an
instruction
• sed -e 's/us/them/' -e 's/ring/stone/'
ring_file.text
Invoking sed
• Using a script file
– Editing instructions can be placed in a file
– sed -f scriptfile /foo/bar
– Editing instructions in the file are executed in the
order in which they appear
$ cat subs_file
s/us/them/
s/ring/stone/
$ sed -f subs_file ring_file.text
– Comments can be added in the sed script with the
help of number sign (#)
Invoking sed
• Suppressing automatic display of input lines
– -n --quiet, --silent
– By default, sed writes each input line to output after
processing it. This option prevents that.
– $ sed -n '/Mordor/ p' ring_file
• In-place editing of files
– -i --in-place
– GNU sed provides the feature of replacing the original
file with the result of applying sed program
– $ sed -i 's/Ring/Stone/' ring_file
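A short sketch of what -n changes. The three place names are hypothetical input.

```shell
input='Shire
Mordor
Gondor'
# With -n, only lines explicitly printed by p appear.
with_n=$(printf '%s\n' "$input" | sed -n '/Mordor/p')
# Without -n, every line is printed automatically, and p prints the match a second time.
without_n=$(printf '%s\n' "$input" | sed '/Mordor/p')
printf '%s\n---\n%s\n' "$with_n" "$without_n"
```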
Invoking awk
• Specifying instructions on the command line
– awk 'instructions' /foo/bar
– Enclosing single quotes prevent the shell from
interpreting special characters
– $ awk '/ya/' /etc/passwd
– If procedure is not specified in the instruction, default
action is to print the line
– $ awk -F : '/ya/ { print $5 }' /etc/passwd
• -F --field-separator
– This option lets you change the field delimiter
– The default field delimiter is one or more spaces and / or tabs
– Procedure should be enclosed within braces ({})
Invoking awk
• Specifying instructions on the command line
– Multiple instructions can be mentioned separated with
semicolons
– $ awk -F : '/ya/ { print $5 ; print $6; print $7 }' /etc/passwd
– awk interprets each input line as a record, and each
word on that line as a field
– $0 represents entire input line
– $1, $2, $3 represent individual fields on the input line
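A sketch of field splitting on a single record. The passwd-style line is invented for illustration so the example does not depend on the local /etc/passwd.

```shell
# Hypothetical passwd-style record with seven colon-separated fields.
line='frodo:x:1000:1000:Frodo Baggins:/home/frodo:/bin/bash'
out=$(printf '%s\n' "$line" | awk -F : '{ print $1; print $5; print $7 }')
printf '%s\n' "$out"
```

With -F : the fifth field keeps its embedded space, because only the colon separates fields.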
Invoking awk
• Using a script file
– awk instructions can be placed in a file
• -f scriptfile
• --file=scriptfile
– This option instructs the awk utility to get the script from the
specified file

awk -f scriptfile /foo/bar


$ cat awkscr_file
/ya/ {
print $5
print $6
print $7
}
$ awk -F : -f awkscr_file /etc/passwd
Invoking awk
• Assigning value to a variable
– -v var=value
– --assign=var=value
• This option sets value to a variable before the script is
executed. This happens even before the BEGIN
procedure is run.
• The -v option and its assignment must precede all the
file name arguments, as well as the program text
$ cat awkscr_option-v
{
if (match ($0, user))
print user, "exists"
}
$ awk -v user=yogeshs -f awkscr_option-v /etc/passwd
UNIX Regular Expressions
• Objective
– Learn what regular expressions are
– Learn to use regular expressions in
the UNIX environment
UNIX Regular Expressions
• An expression is something that can not be interpreted
literally
• An expression is something that needs to be evaluated
• An expression describes a result
• A regular expression is a string that is used to describe
or match a set of strings, according to certain syntax
rules
Metacharacters

.        Matches any single character except newline
*        Matches zero or more occurrences of the single character that
         immediately precedes it
[…]      Matches any one of the characters enclosed between brackets
         A circumflex (^) as first character inside brackets reverses the
         match: any character not listed is matched
         A hyphen (-) is used to indicate a range of characters
^        As first character of regular expression, matches beginning of line
$        As last character of regular expression, matches end of line
\{n\}    Matches exactly n occurrences of the single character
         that immediately precedes it
\{n,\}   Matches at least n occurrences
\{n,m\}  Matches any number of occurrences between n and m
\        Escapes the special character that follows
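The metacharacters above can be tried with sed -n '/regex/p', which prints only the matching lines. The word list is hypothetical.

```shell
words='ct
cat
caat
caaat
dog'
star=$(printf '%s\n' "$words"  | sed -n '/^ca*t$/p')         # zero or more a
exact=$(printf '%s\n' "$words" | sed -n '/^ca\{2\}t$/p')     # exactly two a
range=$(printf '%s\n' "$words" | sed -n '/^ca\{1,2\}t$/p')   # one or two a
class=$(printf '%s\n' "$words" | sed -n '/^[^c]/p')          # first character is not c
printf '%s\n' "$star" "$exact" "$range" "$class"
```

Note that ca*t matches ct as well, since * allows zero occurrences.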
Extended Metacharacters (egrep and awk)

+        Matches one or more occurrences of the preceding regular expression
?        Matches zero or one occurrence of the preceding regular expression
|        Specifies that either the preceding or following regular
         expression can be matched (alternation)
         $ egrep 'an|the' a_case_of_identity
()       Groups regular expressions
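A sketch of the extended metacharacters in awk, which uses extended regular expressions by default. The word list is hypothetical.

```shell
words='color
colour
colouur
gray
grey'
# ? makes the u optional; | and () choose between alternatives.
opt_alt=$(printf '%s\n' "$words" | awk '/^colou?r$|^gr(a|e)y$/')
# + requires at least one u.
plus=$(printf '%s\n' "$words" | awk '/^colou+r$/')
printf '%s\n---\n%s\n' "$opt_alt" "$plus"
```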
Delving deeper into sed
• Objective
– Understand how sed commands work
– Learn commonly used sed
commands
sed commands

• sed command set consists of 25 commands


• An address is optional with any command
– [address] command
– Address can be a pattern described as a regular expression
• $ sed '/Dark/ d' ring_file
– Address can be specified with the help of a line number
• $ sed '3 d' ring_file
• $ sed '5,10 d' ring_file
• $ sed '$ d' ring_file
– Appending the ! character to the end of an address negates the sense of match
• $ sed '/Dark/! d' ring_file
• Multiple commands can be placed on the same line, separated by
semicolon (;)
– $ sed 's/Mortal/Immortal/; s/Men/Gods/' ring_file
• Commands can be grouped at the same address by surrounding the list of
commands in braces
– $ sed '2,10 {s/Mortal/Immortal/; s/Men/Gods/}' ring_file
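The address forms above, sketched on a hypothetical five-line input. The grouped command ends with a semicolon before the closing brace, which is an assumption made here for portability across sed implementations.

```shell
input='line1
line2 Dark
line3
line4
line5'
range_del=$(printf '%s\n' "$input" | sed '2,4d')       # delete lines 2 through 4
last_del=$(printf '%s\n' "$input"  | sed '$d')         # delete the last line
negated=$(printf '%s\n' "$input"   | sed '/Dark/!d')   # delete lines NOT matching Dark
grouped=$(printf '%s\n' "$input"   | sed '1,3{s/line/row/;s/Dark/Light/;}')
printf '%s\n' "$range_del" "$last_del" "$negated" "$grouped"
```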
sed commands – substitution (s)

• [address] s/pattern/replacement/flags
– Regular expression can be delimited with any character except newline
• s#/usr/mail#/usr2/mail#
– If address is mentioned, substitute command is applied to lines
matching it
• $ sed '3,5 s/One/None/' ring_file
• $ sed '/Dark/ s/One/None/' ring_file
– In the replacement section, following characters have special meaning:
• & Replaced by the string matched by the regular expression
– $ sed 's/sky/blue &/' ring_file
• \n Replaced by the nth substring matched in the pattern between
\( and \)
– $ sed 's/\(Dark\) \(Lord\)/very \1 and sluggardly \2/' ring_file
• \ Used to escape the ampersand (&), the backslash (\), and the delimiter
when they are used literally in the replacement section
– $ sed 's/\/usr\/mail/\/usr2\/mail/' mail_user
sed commands – substitution (s)

– Flags that modify the substitution are:
• n A number (1 to 512) indicating that a replacement should
be made for only the nth occurrence of the pattern
– $ sed 's/Ring/Stone/2' ring_file
• g Make changes globally on all occurrences in the
pattern space
– $ sed 's/Ring/Stone/g' ring_file
• p Print the contents of the pattern space if a substitution was made
– $ sed -n 's/Ring/Stone/p' ring_file
• w file Write the contents of the pattern space to file if a substitution was made
– $ sed 's/Ring/Stone/w stonefile' ring_file
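A sketch contrasting the special replacement characters and the numeric and g flags. The input strings are hypothetical.

```shell
line='Dark Lord on his dark throne'
amp=$(printf '%s\n' "$line"  | sed 's/Lord/[&]/')                    # & is the matched text
lit=$(printf '%s\n' "$line"  | sed 's/Lord/\&/')                     # \& is a literal ampersand
swap=$(printf '%s\n' "$line" | sed 's/\(Dark\) \(Lord\)/\2 \1/')     # \1 and \2 are the saved groups
second=$(printf 'a-a-a\n' | sed 's/a/X/2')                           # only the 2nd occurrence
global=$(printf 'a-a-a\n' | sed 's/a/X/g')                           # every occurrence
printf '%s\n' "$amp" "$lit" "$swap" "$second" "$global"
```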
sed commands – delete (d)

• [address] d
• This command takes an address and
deletes the contents of the pattern space if
the line matches the address
• $ sed '/ring/ d' ring_file
• If the line matches the address, the entire
line is deleted, not just the portion of the
line that is matched
sed commands – append (a)
• [address] a\
– text
• This command places the given text after the line that is
matched by address
$ cat sedscr_append
/find them/ a\
wherever they are
$ sed -f sedscr_append ring_file
• This command can be applied only to a single line, not a
range of lines
• A backslash is required after a to escape end-of-line
• Text to be appended must be placed on the next line
• To append multiple lines of text, all lines must end with a
backslash, except the last line
sed commands – insert (i)

• [address] i\
– text
• This command places the given text before the line that
is matched by address
$ cat sedscr_insert
/find them/ i\
And what the rings do?
$ sed -f sedscr_insert ring_file
• This command can be applied only to a single line, not a
range of lines
• A backslash is required after i to escape end-of-line
• Text to be inserted must be placed on the next line
• To insert multiple lines of text, all lines must end with a
backslash, except the last line
sed command – change (c)

• [address] c\
– text
• This command replaces the line selected by address with given text
$ cat sedscr_change
/bind them/ c\
One Ring to bring Frodo Baggins, and put an end to all the rings
$ sed -f sedscr_change ring_file
• A backslash is required after c to escape end-of-line
• Replacement text must be placed on the next line
• To provide multiple lines as replacement text, all lines must end with
a backslash, except the last line
• When a range of lines is specified as address, all lines as a group
are replaced by a single copy of text
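The append, insert, and change commands side by side, as a sketch on a hypothetical three-line input. The scripts are given inline in single quotes, so the backslash and the newline that follows it reach sed unchanged.

```shell
input='one
two
three'
appended=$(printf '%s\n' "$input" | sed '/two/a\
after two')
inserted=$(printf '%s\n' "$input" | sed '/two/i\
before two')
# With a range address, c replaces the whole group with one copy of the text.
changed=$(printf '%s\n' "$input" | sed '1,2c\
replaced')
printf '%s\n' "$appended" "$inserted" "$changed"
```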
sed commands – transform (y)

• [address] y/source-chars/dest-chars/
• This command transliterates each character in the
pattern space that matches one of the source-chars
into the corresponding character in dest-chars
• $ sed 'y/DL/dl/' ring_file
• The replacement is made by character position,
so the command has no notion of words
• This command affects the entire contents of the
pattern space.
sed commands – print (p)

• [address] p
• This command causes the contents of the
pattern space to be output
• $ sed -n '/Mordor/ p' ring_file
• It is useful when default output is
suppressed using -n option of sed
• Command = prints the line number
• $ sed -n '/Mordor/ =' ring_file
sed commands – write (w)

• [address] w file
• This command appends the contents of the
pattern space to the given file
• $ sed '/Mordor/ w places' ring_file
• Exactly one space must be present between w
and file
• This command will create the file if it does not
exist
• If the file exists, its contents will be overwritten
each time the script is executed
sed commands – next (n)

• [address] n
• This command outputs the contents of pattern space and
then reads the next line of input
• In effect, this command causes the next line of input to
replace the current line in the pattern space. Subsequent
commands in the sed script are applied to the
replacement line, not the current line.
/Men/{
n
s/Dark/White/
}
• Matches any line containing word Men and substitutes
word Dark with White on the next line
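The script above can be run on a small hypothetical input to see that only the line following a match is edited.

```shell
# The second line follows "Men", so it is changed; the third line is left alone.
out=$(printf 'Men\nDark\nDark\n' | sed '/Men/{
n
s/Dark/White/
}')
printf '%s\n' "$out"
```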
sed commands – quit (q)

• [line-address] q
• This command causes sed to stop
reading new input lines (and stop sending
them to output)
• This command can take only a single-line
address. Once the address is reached, the
script will be terminated.
• $ sed '/Mordor/ q' ring_file
• $ sed '100q' ring_file
Delving deeper into awk
• Objective
– Understand programming model of
awk
– Learn variables of awk
– Learn operators of awk
– Learn conditionals, loops of awk
– Learn arrays of awk
– Learn functions of awk
Programming model of AWK
• The essential organization of an awk script is of
the form:
– pattern { action }
– An action is one or more statements that will be
performed on those input lines that match the pattern
• $ awk '/ring/ { print }' ring_file
– If no pattern is specified, the action is performed for
every input line
• $ awk '{ print }' ring_file
– If no action is specified, the default action is to print
the line
• $ awk '/ring/' ring_file
Programming model of AWK

• A pattern can be any of the following:


– /regular expression/
• $ awk '/[Ll]ord|king/ {print}' ring_file
– BEGIN
• Specify action to be taken before any lines are read
• $ awk 'BEGIN {print "howdy, folks"} //' ring_file
– END
• Specify action to be taken after last line is read
• $ awk 'BEGIN {lc=0} // {lc++} END {print lc}'
ring_file
– relational expression
• $ awk 'BEGIN {i=1} i<4 {print; i++}' ring_file
– pattern,pattern
• $ awk '/Dark/,/Shadow/ {print}' ring_file
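A sketch of three of the pattern types on a hypothetical five-line input: END, a range pattern, and a relational expression.

```shell
input='a
Dark start
middle
Shadow end
z'
count=$(printf '%s\n' "$input" | awk '{ lc++ } END { print lc }')          # runs END after the last line
range=$(printf '%s\n' "$input" | awk '/Dark/,/Shadow/')                    # from first match to second
first_two=$(printf '%s\n' "$input" | awk 'BEGIN { i = 1 } i < 3 { print; i++ }')
printf '%s\n' "$count" "$range" "$first_two"
```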
Variables in awk

• There are three kinds of variables in awk:


– user-defined
– built-in
– fields
• A variable need not be declared or initialized
• A variable can contain a string or numeric value
• An uninitialized variable has empty string as its
string value and zero as its numeric value
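A one-line sketch of the last point: the variable x is never assigned, yet it can be used both as a string and as a number.

```shell
# x is deliberately never assigned.
out=$(awk 'BEGIN { print "[" x "]", x + 0, length(x) }')
printf '%s\n' "$out"
```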
Variables in awk

• User-defined variables
– Name of a variable must be a sequence of letters,
digits, and underscores, and it may not begin with a
digit

$ cat awkscr_var-user-defined
BEGIN { FS=":" }
{
if ($7 == "/bin/bash")
bash_users++
}
END { print bash_users, "users have bash as their default shell" }
$ awk -f awkscr_var-user-defined /etc/passwd
Variables in awk

• Built-in variables
• There are two types of built-in variables in awk:
– Variables whose values can be changed
• FS defines field separator
• OFS defines output field separator
• RS defines record separator
• ORS defines output record separator
– Variables that can be used, and whose values are
internally updated by awk
• FILENAME contains name of current input file
– The names of all built-in variables are entirely uppercase
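A sketch of the changeable built-in variables, using hypothetical input. FS controls how input is split, OFS is placed between the comma-separated arguments of print, and ORS ends each printed record.

```shell
fields=$(printf 'a:b:c\n' | awk 'BEGIN { FS = ":"; OFS = "-" } { print $1, $3 }')
records=$(printf 'x\ny\n' | awk 'BEGIN { ORS = ";" } { print }')
printf '%s\n%s\n' "$fields" "$records"
```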
Variables in awk

• Field variables
• awk considers each input line as a record, and
each word as a field
– $1, $2, $3 etc. refer to individual fields in the input
record
• $ awk 'BEGIN {FS=":"} {print $1, "is known as", $5}' /etc/passwd
– $0 refers to the entire input record
• $ awk 'BEGIN {FS=":"} {if ($7 == "/sbin/nologin") print $0}' /etc/passwd
Operators of awk
• Arithmetic operators
+ Addition
- Subtraction
* Multiplication
/ Division
% Modulo
^ Exponentiation
** Exponentiation
• Assignment operators
++ Add 1 to variable
-- Subtract 1 from variable
+= Assign result of addition
-= Assign result of subtraction
*= Assign result of multiplication
/= Assign result of division
%= Assign result of modulo
^= Assign result of exponentiation
**= Assign result of exponentiation
Operators of awk

• Relational operators
< Less than
> Greater than
<= Less than or equal to
>= Greater than or equal to
== Equal to
!= Not equal to
~ Matches
!~ Does not match
• Boolean operators
– Boolean operators allow you to combine a series of comparisons
|| Logical OR
&& Logical AND
! Logical NOT
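A sketch combining the match operators with a Boolean operator. The two-field records (user and shell) are hypothetical.

```shell
input='root:/bin/bash
frodo:/bin/bash
sam:/bin/sh'
# Field 2 must match the regular expression AND field 1 must not be root.
bash_non_root=$(printf '%s\n' "$input" | awk -F : '$2 ~ /bash$/ && $1 != "root" { print $1 }')
# !~ selects records whose field 2 does not match.
not_bash=$(printf '%s\n' "$input" | awk -F : '$2 !~ /bash$/ { print $1 }')
printf '%s\n%s\n' "$bash_non_root" "$not_bash"
```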
Conditionals in awk

• A conditional statement allows you to make a test before performing an
action
if (expression)
action1
[else
action2]
• An expression might contain arithmetic, relational, or Boolean operators
• If action consists of more than one statement, it is enclosed within a pair of
braces
$ cat awkscr_if
{
print $1;
if (index ($7, "bash"))
print " uses bash"
else
print " does not use bash"
}
$ awk -F : -f awkscr_if /etc/passwd
Conditionals in awk

• awk provides a conditional operator that is found
in C programming language
• expr ? action1 : action2
$ cat awkscr_ternary
{
print $1,
index ($7, "bash") ? "uses bash" : "does not use bash"
}
$ awk -F : -f awkscr_ternary /etc/passwd
Looping in awk

• Loops can be specified using while, do, or for statement


• While loop
while (condition)
action
• If the conditional expression is never true, the action is not
performed
• An action consisting of more than one statement must be enclosed
in braces
$ cat awkscr_while
BEGIN {
i=1
while (i<=10) {
print num, "*", i , "=", num * i
i++
}
}
$ awk -f awkscr_while -v num=100
Looping in awk

• Do loop
do
action
while (condition)
• The action is performed at least once
• An action consisting of more than one statement must be enclosed
in braces
$ cat awkscr_do-while
BEGIN {
i=1
do {
print num, "*", i , "=", num * i
i++
}
while (i<=10)
}
$ awk -f awkscr_do-while -v num=5
Looping in awk

• For loop
for (initialization; condition; increment)
action
• This loop starts by executing initialization
• As long as condition is true, it repeatedly executes action, and
then increment
• An action consisting of more than one statement must be
enclosed in braces
$ cat awkscr_for
BEGIN {
for (i=1; i<=10; i++)
print num, "*", i, "=", num * i
}
$ awk -f awkscr_for -v num=3
Looping in awk

• Statements that affect the flow control of a loop are:


– break
• Breaks out of the loop so that no more iterations of the loop are
performed
– continue
• Stops the current iteration and starts a new iteration at the top
• Statements that affect the main input loop of awk are:
– next
• Causes next line of input to be read and then resume execution at
the top of the main loop
– exit
• Exits the main loop and passes control to the END block, if there is
one
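A sketch of next and exit acting on the main input loop. The input is hypothetical: a comment line, two numbers, a stop marker, and a number that is never reached.

```shell
out=$(printf '# comment\n1\n2\nstop\n3\n' |
  awk '/^#/   { next }          # skip this record, go read the next one
       /stop/ { exit }          # leave the main loop; END still runs
              { sum += $1 }
       END    { print sum }')
printf '%s\n' "$out"
```

The total is 3 rather than 6, because exit stops input before the last line is read.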
Arrays in awk

• An array is a variable that can be used to store a set of values


• In awk, you don’t have to declare size of the array
– users[1] = "root"
• Individual elements are accessed by their index in the array
– print users[3]
• Whether an element exists in an array at a certain index can be
determined as:
if (key in array)
print "subscript", key, "is present"
• To remove an individual element of an array, use the delete
statement:
– delete array[key]
• You can not have a variable and an array with the same name in the
same awk program
Associative arrays in awk

• Index of an associative array can be a string or a number


• There is a special looping syntax for accessing all
elements of an associative array
– for (key in array)
print key, array[key]
• All arrays in awk are associative arrays

$ cat awkscr_array-associative
BEGIN { FS=":" }
{ shells[$7]++ }
END { for (s in shells) print shells[s], "users have", s, "as their default shell" }
$ awk -f awkscr_array-associative /etc/passwd
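A sketch of the in test and the delete statement on an array that mixes a numeric and a string subscript. The array contents are hypothetical.

```shell
out=$(awk 'BEGIN {
  users[1] = "root"; users["admin"] = "frodo"
  if (1 in users) print "1 present"
  delete users[1]
  if (!(1 in users)) print "1 removed"
  print users["admin"]
}')
printf '%s\n' "$out"
```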
Functions in awk

• A function is a self-contained computation
that accepts a number of arguments and
returns some value
• Arithmetic functions
– int(x) Returns truncated value of x
• $ awk 'BEGIN {print int(57.43)}'
– sqrt(x) Returns square root of x
• $ awk 'BEGIN {print sqrt(25)}'
Functions in awk

• String functions
– length(s) Returns length of string s, or length of $0 if no string is
supplied
• $ awk '{ if (length ($0)) print }' ring_file
– index(s,t) Returns position of substring t in string s, or zero if not
present
• $ awk -F : '{ if(index($0, "bash")) print}' /etc/passwd
– tolower(s) Translate all uppercase characters in string s to
lowercase and return the new string
• $ awk -F : '{ print tolower ($5)}' /etc/passwd
– toupper(s) Translate all lowercase characters in string s to
uppercase and return the new string
• $ awk -F : '{ print toupper ($5)}' /etc/passwd
– sprintf("format", expr) Returns expr formatted according to the
printf format specification
• $ awk -F : '{name=sprintf ("%s is known as %s", $1, $5);
print name}' /etc/passwd
Functions in awk

• String functions
– match(s,r) Returns the position in s where regular expression r
begins, or 0 if no occurrences are found
• $ awk -F : '{ if (match ($0, "bash")) print $1 " uses " $7}' /etc/passwd
– sub(r,s,t) Substitute first occurrence of regular expression r with s
in string t
• If t is not supplied, defaults to $0
• $ awk '{sub ("Dark", "Blue"); print}' ring_file
– gsub(r,s,t) Substitute all occurrences of regular expression r with s
in string t
• If t is not supplied, defaults to $0
• $ awk '{ gsub (":", " "); print }' /etc/passwd
– split(s,a,sep) Parses string s into elements of array a using field
separator sep
• $ awk '{split ($0, all, ":"); print all[1], "is known as", all[5]}' /etc/passwd
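A sketch comparing sub, gsub, and split on one hypothetical record. Each call works on a copy named line so that $0 stays intact; the return values (number of substitutions, number of array elements) are printed alongside the results.

```shell
out=$(printf 'a:b:c\n' | awk '{
  line = $0; n = sub(/:/, "-", line);  print n, line      # first occurrence only
  line = $0; n = gsub(/:/, "-", line); print n, line      # every occurrence
  n = split($0, parts, ":");           print n, parts[3]  # number of elements, third element
}')
printf '%s\n' "$out"
```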
Writing functions with awk
• Function is a program component that can be reused
function name(parameter-list) {
statements
}
• A valid function name is a sequence of letters, digits, and underscores that
doesn’t start with a digit
• The parameter-list is a comma-separated list of function’s arguments and
local variable names
– The argument names are used to hold the argument values passed to the
function
– The local variables are initialized to empty string
– A function can not have a parameter with the same name as the function itself
• When calling a user-defined function, whitespace characters (spaces and tabs)
are not allowed between function name and the open-parenthesis of the argument list
• Passing variables as parameters to a function is the case of pass by value
• When an array is passed as parameter to function, it is the case of call by
reference in awk
• If return statement is not written, then the function returns an unpredictable
value
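A sketch of the parameter rules above. The function names fill and bump are hypothetical; the extra parameter i in fill serves as a local variable, following the convention of listing locals after the real arguments.

```shell
out=$(awk '
function fill(arr, n,    i) {        # i is local: an extra, unused parameter
  for (i = 1; i <= n; i++) arr[i] = i * i
}
function bump(x) { x = x + 1; return x }
BEGIN {
  v = 5
  r = bump(v)        # scalar passed by value: v stays 5
  fill(sq, 3)        # array passed by reference: sq is filled in
  print v, r, sq[3]
}')
printf '%s\n' "$out"
```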
Epilogue

• Part of solving a problem is knowing which tool to use


• Using awk for simple problems such as 'printing fourth
column from a file' is wasting a mighty tool on trivial
problems
• Use sed and / or cut for such simple problems
• When the problem is context-oriented, use awk
• Context-oriented means problems like "sum all the numbers
in a file" or "read the content of a certain line
and apply changes to the lines that follow it
depending on this content"
Why not to use awk when sed can
do the job?

• Using awk instead of sed has the price of performance
and size
• awk takes a substantially longer time to load compared
to sed or ed, and does its job at a considerably slower
pace
• The real distinguishing point between sed and awk as a
text processor is that awk is able to work with a
persistent context, whereas capabilities of sed in this
area are very limited. For instance, if you
have to sum one field to a total, you would do it with awk (it is
possible with sed, but sed is poorly suited to the job)
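A sketch of the two kinds of task discussed here, on hypothetical input: picking one column is a job for cut, while keeping a running total needs the persistent variable that awk provides.

```shell
# Simple task: print the fourth column (cut is enough).
fourth=$(printf 'a b c d\n' | cut -d ' ' -f 4)
# Context-oriented task: sum the second field across all lines.
total=$(printf 'a 1\nb 2\nc 3\n' | awk '{ sum += $2 } END { print sum }')
printf '%s\n%s\n' "$fourth" "$total"
```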
How to Learn More
• Books
– sed & awk,
By Dale Dougherty and Arnold Robbins,
O’Reilly and Associates Inc.
– Mastering Regular Expressions,
By Jeffrey E. F. Friedl,
O’Reilly and Associates Inc.
– Effective awk Programming,
By Arnold Robbins,
O’Reilly and Associates Inc.
How to Learn More
• Internet
– sed
• The sed tutorial from Grymoire
– http://www.grymoire.com/Unix/Sed.html
• Handy one-line sed scripts
– http://sed.sourceforge.net/sed1line.txt
• sed Tutorial
– http://www.gnulamp.com/sed.html
– AWK
• The awk tutorial from Grymoire
– http://www.grymoire.com/Unix/Awk.html
• awk Tutorial
– http://www.grymoire.com/Unix/Awk.html
• The GNU Awk User’s Guide
– http://www.gnu.org/software/gawk/manual/html_node/index.html