AWK Examples

How awk works (in brief):
Awk works as: "search for a pattern in the records of the file and, if it is found, perform actions on that record."
An awk program consists of rules. A rule consists of two things: a pattern and an action.
The pattern is what to search for, and the action is what to perform when the pattern is found.
###########################################
Example 1:
Search for the pattern 81491 and print the records having this text.
bos90631:sm017r awk ' /81491/ {print $0} ' test1

sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
Note that the entire program is enclosed inside single quotes.
The pattern /.../ is used for regular expressions.
print $0 and print are the same: both print the entire record.
Here print is the action, and the action is enclosed within {}.
###########################################
Example 2:
The pattern is what helps us select the rows, and the action indicates
what to do with the rows selected. Actions are always written inside {}.
Either the pattern or the action can be omitted, but not both.
If the pattern is omitted, the action is performed on every line.
If the action is omitted, the default action is to print the entire record.
# skipping pattern
bos90631:sm017r awk ' {print }' test1
sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
# skipping action
bos90631:sm017r awk '/81491/ ' test1
sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
###########################################
Example 3: Empty action field
An empty action is different from no action.
Not writing {} at all and writing nothing inside {} are two different things.
An empty {} does nothing, so the matching records are not printed.
There will be no output.
bos90631:sm017r awk '/81491588/ {} ' test1

[no output]
###########################################
Example 4: Multiple rules in single awk program.
We can have multiple rules in an awk program.
Multiple rules can be coded one after another.
If a record satisfies multiple rules, then multiple actions will be carried out on that record.
Example:
awk '/sh/ {print $0}
/nepal/ {print $0}' test1
In the above example /sh/ is the 1st pattern and /nepal/ is the 2nd pattern.
The output below shows that the shushant record matches both the patterns "sh" and "nepal"
and hence is printed twice (2 actions). The himanshu record matches only "sh" and is printed once.
bos90631:sm017r awk '/sh/ {print $0}
/nepal/ {print $0}' test1
shushant 7798977047 nepal 200/9000/100
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
Example 5: awk program inside a file (-f option)
An awk program can be put inside a file and then used with the -f option.
The example below creates a file nepal.awk with the awk program
in it.
Note that when the program is put inside a file, we don't need to write the single quotes.
bos90631:sm017r more nepal.awk

/nepal/ {print $0}
bos90631:sm017r awk -f nepal.awk test1

shushant 7798977047 nepal 200/9000/100
Note: If we don't specify the input file (data file), awk expects input from standard
input, which normally is the keyboard.
We end the input from the keyboard by pressing Ctrl+D.
##########################################
Example 6: awk is case sensitive.
Note that awk is case sensitive, so /SUKUL/ does not match "sukul".
bos90631:sm017r awk '/SUKUL/ ' test1
[No Output]
##########################################
Example 7: Creating awk executable.
We can make the awk script executable and then run it like any other command we use, e.g. sort.
We can do this by putting #! /bin/awk -f at the beginning of the script (use the path of awk on your system)
and making the file executable using chmod.
bos90631:sm017r more nepalexec.awk

#! /bin/awk -f
/nepal/ {print $0}
bos90631:sm017r ls -lrt nepalexec.awk
-rwxrwxrwx 1 sm017r dba 34 Aug 7 04:43 nepalexec.awk
bos90631:sm017r /export/home/sm017r/nepalexec.awk test1
shushant 7798977047 nepal 200/9000/100
Note: In this example I had to give the full path because the current directory is not in my PATH
variable. This may not be true on your installation; ./nepalexec.awk test1 also works from the current directory.
##########################################
Example 8: Comments in awk program and continuing on next line.
Comments in awk are like shell comments, i.e. they start with #.
We can continue program text on the next line after any of , { ? : || && do else.
The other way is to put a '\' at the end of the line. This technique works
anywhere, even in the middle of a regular expression.
In the example below, note that the word 'Jhansi' is broken across two lines.
The program still interprets it as "Jhansi" because of the use of "\".
With this "\" we are actually escaping the newline character.
bos90631:sm017r more jhansi.awk

#this code will search for jhansi
/Jh\
ansi/ {print $0}

bos90631:sm017r awk -f jhansi.awk test1
bhanu 8097123451 Jhansi 200/1000/500
Example 9: Using semicolon
We can put two rules on the same line separated by a semicolon.
bos90631:sm017r awk '/77989/{print $0}; /nep/ {print $0}' test1

shushant 7798977047 nepal 200/9000/100
shushant 7798977047 nepal 200/9000/100
Earlier we saw a similar example where we wrote two rules on
separate lines. However, that is not compulsory: we can write different rules on the same line
separated by a semicolon.
Example 10: Multiple files as input to awk
We can have multiple files as input to an awk program.
The name of the current file is held in the built-in variable
FILENAME.
cp test1 test2
# we copy our test file as test2
bos90631:sm017r awk '{print FILENAME}' test1 test2

test1
test1
test1
test1
test1
test2
test2
test2
test2
test2
The program above prints one line for each record in the input files.
That is why it prints "test1" 5 times and "test2" 5 times.
Example 11: Using the record separator (built-in variable RS)
The built-in variable RS indicates the record separator.
The default value is \n.
We can change the value to any single character.
This will cause the new character to be treated as the record separator.
We can change the value of RS to another character at any point in the program.
However, this is normally done in the BEGIN pattern so that it applies to all records.
bos90631:sm017r awk 'BEGIN {RS="/"} {print $0}' test1

sukul 8149158828 mumbai 100
900
200
uma 8149122222 chennai 100
800
300
bhanu 8097123451 Jhansi 200
1000
500
shushant 7798977047 nepal 200
9000
100
himanshu 9090909090 bokharo 100
800
300
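We can see that this program treats / as the record separator.
Thus the program treats "sukul 8149158828 mumbai 100" as the 1st record and "900" as the 2nd.
The 3rd record is "200", a newline and "uma 8149122222 chennai 100": once RS is "/", the newline is
no longer a record separator but part of the record, which is why the output still looks line-broken.
RS="" means the records are separated by blank lines.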
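As a minimal sketch of the blank-line case (RS="", often called paragraph mode), the snippet below feeds two multi-line records to awk on standard input. The sample data is made up for illustration, and a POSIX-compatible awk (such as gawk or mawk) is assumed. With the default FS, the newline inside a record also separates fields.

```shell
# Hypothetical sample data: two records, each spread over two lines,
# separated by one blank line
data='sukul 8149158828
mumbai

uma 8149122222
chennai'

# RS="" makes every blank-line-separated block one record.
# With the default FS, the newline inside a block also separates fields,
# so $3 is the city.
out=$(printf '%s\n' "$data" |
    awk 'BEGIN { RS = "" } { print NR ": " $1 " lives in " $3 " (" NF " fields)" }')
printf '%s\n' "$out"
```

This should print "1: sukul lives in mumbai (3 fields)" and "2: uma lives in chennai (3 fields)".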
Example 12: Built in variable FNR and NR
FNR: Current record number in the current file. The value is reset to 1 when a new file starts.
NR: Number of records read so far, overall. This is never reset.
bos90631:sm017r awk '{print NR}' test1 test2

1
2
3
4
5
6
7
8
9
10
Note: FNR did not work with the awk on my installation. Very old awk versions may lack it; POSIX awk (for example gawk or nawk) provides it.
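Since FNR could not be shown on my installation, here is a minimal sketch of how it differs from NR, assuming a POSIX-compatible awk. The file names f1 and f2 and their contents are hypothetical.

```shell
# Hypothetical sample files f1 and f2 in a scratch directory
dir=$(mktemp -d)
printf 'sukul\numa\n' > "$dir/f1"
printf 'bhanu\n' > "$dir/f2"

# FNR restarts at 1 for every file; NR keeps counting across all files
out=$(cd "$dir" && awk '{ print FILENAME, FNR, NR, $1 }' f1 f2)
printf '%s\n' "$out"
rm -r "$dir"
```

The third line of output is "f2 1 3 bhanu": FNR is back to 1 for the new file while NR is 3. Because of this, the condition NR == FNR is true only while awk is reading the first file (as long as that file is not empty).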
Example 13: Accessing individual fields in the Record.
We can access individual fields by using $ followed by the field number.
By default, fields are assumed to be separated by whitespace (spaces and tabs).
We can change this by changing the value of the built-in variable FS (field separator).
Notice the comma used to separate each field when we print.
If the comma is not used, the two fields will be concatenated.
bos90631:sm017r awk '{print $2,$1}' test1

8149158828 sukul
8149122222 uma
8097123451 bhanu
7798977047 shushant
9090909090 himanshu
The program above prints the 2nd and 1st fields of each record (assuming space/tab as the delimiter).
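To see the effect of the comma directly, here is a small sketch that runs both forms on one record taken from test1 (fed on standard input, so no file is needed). It assumes a POSIX-compatible awk with the default OFS of a single space.

```shell
line='sukul 8149158828 mumbai 100/900/200'   # one record taken from test1

with_comma=$(printf '%s\n' "$line" | awk '{ print $2, $1 }')    # items joined by OFS (a space)
without_comma=$(printf '%s\n' "$line" | awk '{ print $2 $1 }')  # items concatenated directly
printf '%s\n%s\n' "$with_comma" "$without_comma"
```

The first line is "8149158828 sukul" and the second is "8149158828sukul".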
Example 14: Built in variable NF(number of fields)
NF: The built-in variable NF contains the number of fields in a record.
We can also refer to fields beyond the last one, but we only get the null (empty) string.
$0 represents the entire record. Note that referring to any field beyond the last field
does not give any error.
awk '{ print NF, $8}' test1
4
4
4
4
4
This prints the number of fields in each record of the file test1; $8 does not exist, so it prints as an empty string.
Example 15: Using non-numeric Constant with $
It is not compulsory to use a numeric constant with $ to print a field.
We can use any expression after $ to refer to a field.
NF indicates the number of fields, and hence $NF will always print the last column.
NR indicates the record number, and hence $NR will print the 1st field for the 1st record, the 2nd field for the 2nd
record and so on (the 5th record has only 4 fields, so $NR is empty there).
bos90631:sm017r awk '{print $NF, $NR}' test1

100/900/200 sukul
100/800/300 8149122222
200/1000/500 Jhansi
200/9000/100 200/9000/100
100/800/300
We can use any expression after $. Notice the use of () to enclose the
expression.
Below example will print the 4th field.
bos90631:sm017r awk '{print $(2*2)}' test1

100/900/200
100/800/300
200/1000/500
200/9000/100
100/800/300
If the result of the expression is 0, the entire record is printed.
Negative field numbers are not allowed.
In the example below, (3-3) makes it $0 and hence the entire record is printed.
bos90631:sm017r awk '{print $(3-3)}' test1

sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
Example 16: Changing value of any field
We can change the value of any field.
This does not change the input file (awk never does that).
However, the internal copy of the record is changed. This means any change
made to a field is reflected in $0.
Below example changes the phone number field to 9999999999
bos90631:sm017r awk '/81491/ { $2="9999999999";print $0}' test1

sukul 9999999999 mumbai 100/900/200
uma 9999999999 chennai 100/800/300
Example 17: Creating New fields
We can create new fields also.
Assigning to a field beyond the last one creates it, with the appropriate field separators in between.
Adding a new field changes the internal copy of $0.
To make the added field separators visible, the example below sets the built-in variable OFS,
which controls the output field separator, to ":".
Creating a new field also changes the value of NF.
NF is set to the highest field number we create.
(Note that just referencing an out-of-range field does not change
the value of $0 or NF.)
bos90631:sm017r awk 'BEGIN{ OFS=":"}{ $8="amdocs"; print $0}' test1

sukul:8149158828:mumbai:100/900/200::::amdocs
uma:8149122222:chennai:100/800/300::::amdocs
bhanu:8097123451:Jhansi:200/1000/500::::amdocs
shushant:7798977047:nepal:200/9000/100::::amdocs
himanshu:9090909090:bokharo:100/800/300::::amdocs
Example 18: Built in Variable FS(Field Separator)
The field separator is a single character or a regex which is used to determine
how awk splits the records into fields.
The field separator is held in the variable FS.
We can change the value of FS in the BEGIN pattern so that it affects all the records.
The default value of FS is " " (a single space).
With this default, runs of spaces and tabs count as one separator and leading and trailing blanks are ignored,
so two consecutive spaces (or tabs) do not create an empty field.
But if FS=";" then two consecutive ; will create an empty field.
In the example below, we use / as the field separator.
bos90631:sm017r awk 'BEGIN{ FS="/"}{ print $1}' test1

sukul 8149158828 mumbai 100
uma 8149122222 chennai 100
bhanu 8097123451 Jhansi 200
shushant 7798977047 nepal 200
himanshu 9090909090 bokharo 100
We can set FS="[ ]" if we want to force a single space as the delimiter.
This will cause two consecutive spaces to be counted as an empty field.
We can also set the field separator on the command line using the -F option.
awk -F/ '{ print $2 }' test1

900
800
1000
9000
800
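The difference between the default FS and FS="[ ]" can be checked with a one-line sketch. The input line "a  b" (two spaces between a and b) is made up for illustration, and a POSIX-compatible awk is assumed.

```shell
line='a  b'   # hypothetical line with TWO spaces between a and b

default_nf=$(printf '%s\n' "$line" | awk '{ print NF }')                     # a run of blanks is one separator
single_nf=$(printf '%s\n' "$line" | awk 'BEGIN { FS = "[ ]" } { print NF }')  # each single space separates
echo "default FS: $default_nf fields; FS=\"[ ]\": $single_nf fields"
```

With the default FS the line has 2 fields; with FS="[ ]" it has 3, the middle one being empty.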
Example 19: getline command
awk reads the input file one record at a time implicitly.
We can also read the next record explicitly by using the getline command (with no arguments).
The command getline returns a number indicating whether it was successful:
1) 1 if a record is read
2) 0 if end of file is encountered
3) -1 if there is an error (for example, the file cannot be opened)
When we execute getline without arguments, the next record is read into $0.
The original record that was already in $0 will be overwritten.
So we should use getline only after we are done working with the
current record, because it is lost the moment we read the next record using getline.
The values of NF, NR, FNR and $0 are also set as per the new record.
Below is an example of a file whose records are broken across lines, and code using getline to join them.
[test file]
test3:
sukul 8149158828
mumbai 100/900/200
uma 8149122222 chennai
100/800/300
bhanu
8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
Joining the broken records:
awk '{record=$0;noofflds=split(record,a)
while (noofflds!=4 && (getline > 0))
{
record=record " " $0   # keep a space between the joined pieces
noofflds=split(record,a)
}
print record
}' test3
sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
#split is used to split data into an array. The number returned by
split is the number of array elements it creates.
Example 20: getline with variable
getline var
This is used to read the next record explicitly into a variable.
This does not change the value of $0.
This changes the value of variables NR and FNR, but not NF and $0
because the record is not split into fields.
pht022e2:/home/nemo_dev/sm017r> awk '{print $0;(rc=getline var);print var,rc}' test1

sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300 1
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100 1
0
Example 20: getline with a file

getline < "file1"
This is used to take input from a file other than awk's main input.
We may use this when we want a file to be used as a lookup.
The filename should be specified in double quotes (" ").
Since the main input stream is not used, this does not change the value of NR or FNR.
But the record is split into fields in the normal manner, so the values of $0 and NF are changed.
[test file]
vi lookup
sukul
himanshu
uma
shushant
awk '{ name=$1; flag=0;
while ((getline < "lookup") > 0)
{
if (name==$1)
{ flag=1 }
};
if (flag==0)
{ print name " is the black sheep"};
close("lookup")
}' test1
bhanu is the black sheep
Note that we need to close the lookup file so that it is read again
from the top for the next input record.
Also note that getline <"filename" overwrites the value of $0
that came from the main input. So make sure you save the value to another variable
before running the getline command.
Example 21: getline with file (using variable)
getline var <"file1"
This reads the data from the file into a variable and hence does not overwrite the value of $0.
The drawback of reading into a variable is that the entire record goes into this variable, and we don't get
the benefit of accessing individual fields, because $0 is not changed.
pht022e2:/home/nemo_dev/sm017r> awk '{ flag=0;
while ((getline var < "lookup") > 0)
{
if (var==$1)
{ flag=1 }
};
if (flag==0)
{ print $1 " is the black sheep"};
close("lookup")
}' test1
bhanu is the black sheep
Example 22: getline from another command through a pipe.
command | getline
The string command is executed and its output is piped to awk as input.
The command is a string, so it must be enclosed in double quotes (or stored in a variable).
This allows us to read one line at a time from the command output.
Note that this will change the value of $0 and NF.
pht022e2:/home/nemo_dev/sm017r> awk '{ command="date"; rc=(command | getline);
if (rc>0) {print $0}
close(command)}' test1
Wed Aug 8 05:52:51 CDT 2012
Wed Aug 8 05:52:51 CDT 2012
Wed Aug 8 05:52:51 CDT 2012
Wed Aug 8 05:52:51 CDT 2012
Wed Aug 8 05:52:51 CDT 2012
Note that we saved the command in a variable "command" and then used it with getline.
This is easier when the same command is executed several times, and also when we close the command.
Otherwise we have to write the exact command every time, which can get troublesome in case of a long
command.
Note that the command has to be exactly the same every time it is referenced.
Because close(command) is called for every record, date is run again for each of the 5 records.
We also have a variation: command | getline var
This will not change the value of $0. Instead, the value read from the command will be stored in the
variable.
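A minimal sketch of the command | getline var variation, assuming a POSIX-compatible awk. The command string "echo hello from the shell" is only a hypothetical stand-in for a real command.

```shell
out=$(awk 'BEGIN {
    cmd = "echo hello from the shell"   # hypothetical command string
    if ((cmd | getline line) > 0)        # one output line goes into "line"; $0 and NF stay untouched
        print "read: " line
    close(cmd)                           # close using exactly the same string
}')
printf '%s\n' "$out"
```

This should print "read: hello from the shell".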
Example 23: print command in brief
print is used to print data to the standard output.
We can use print "" to print blank lines.
We can also insert a \n to print data on two different lines using a single print command.
Note that the items should be separated by commas for the results to be separated by spaces (OFS);
items written side by side, as below, are concatenated.
pht022e2:/home/nemo_dev/sm017r> awk '{ print $1 "\n" $2 } ' test1

sukul
8149158828
uma
8149122222
bhanu
8097123451
shushant
7798977047
himanshu
9090909090
Example 24: Built in variable OUTPUT FIELD SEPARATOR(OFS)
OFS stands for output field separator.
The default value is a single space, and hence we see values separated by spaces when we print data items
separated by commas.
Basically, this decides what characters replace the commas we put
in the print command.
We can change it to any other value and the field separator in the output will change.
The following will change a space-delimited file to a comma-delimited file.
We generally set the value of OFS in the BEGIN pattern so that it applies to all rows.
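pht022e2:/home/nemo_dev/sm017r> awk 'BEGIN { OFS="," };{ print $0}' test1
sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
Note that print $0 prints the record unchanged: OFS is used only between items separated by commas
in print, or when awk rebuilds $0 after a field is assigned. So we list the fields explicitly: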
pht022e2:/home/nemo_dev/sm017r> awk 'BEGIN { OFS="," };{ print $1,$2,$3,$4}' test1

sukul,8149158828,mumbai,100/900/200
uma,8149122222,chennai,100/800/300
bhanu,8097123451,Jhansi,200/1000/500
shushant,7798977047,nepal,200/9000/100
himanshu,9090909090,bokharo,100/800/300
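Listing every field is not the only option. A minimal sketch of the common idiom of assigning a field to itself, which forces awk to rebuild $0 with OFS, is shown below on one record taken from test1 (a POSIX-compatible awk is assumed).

```shell
line='sukul 8149158828 mumbai 100/900/200'   # one record taken from test1

# Assigning to a field (even $1 = $1) makes awk rebuild $0 with OFS between the fields
out=$(printf '%s\n' "$line" | awk 'BEGIN { OFS = "," } { $1 = $1; print $0 }')
printf '%s\n' "$out"
```

This prints "sukul,8149158828,mumbai,100/900/200" without naming each field, so it also works for records with a varying number of fields.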
Example 25: Built in Variable OUTPUT RECORD SEPARATOR (ORS)
ORS: output record separator.
Each print statement outputs ORS after the items it prints.
The default value is \n, and hence we move to the next line after every print statement.
We can set this value in the BEGIN pattern so as to apply to all records.
pht022e2:/home/nemo_dev/sm017r> awk 'BEGIN{ ORS="OOOO"} ;{print $0}' test1
sukul 8149158828 mumbai 100/900/200OOOOuma 8149122222 chennai 100/800/300OOOObhanu 8097123451 Jhansi 200/1000/500OOOOshushant 7798977047 nepal 200/9000/100OOOOhimanshu 9090909090 bokharo 100/800/300OOOO
pht022e2:/home/nemo_dev/sm017r> awk 'BEGIN{ ORS="\n\n"} ;{print $0}' test1
sukul 8149158828 mumbai 100/900/200

uma 8149122222 chennai 100/800/300

bhanu 8097123451 Jhansi 200/1000/500

shushant 7798977047 nepal 200/9000/100

himanshu 9090909090 bokharo 100/800/300

Note that if ORS does not contain a newline character, then all output will run on the same line.
Example 26: Built in variable Output Format (OFMT)
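When we print numbers, awk converts them to strings before printing.
It does this as if by calling sprintf,
using the built-in variable OFMT as the format for printing numbers.
The default value of OFMT is %.6g.
awk 'BEGIN {OFMT="%.0f"; print 17.35}'
This should print 17. (The rule is in BEGIN so that it runs once, not once per record of an input file.)
(With OFMT="%d" it did not work on my installation; POSIX leaves the result unspecified unless OFMT
is a floating-point format such as %.0f or %.2f.)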
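One detail worth seeing: OFMT is applied only to numbers that are not integer values, while integer values are printed as integers. A minimal sketch, assuming a POSIX-compatible awk; the values 17 and 17.356 are arbitrary.

```shell
out=$(awk 'BEGIN {
    OFMT = "%.2f"   # format used when print converts a non-integer number
    x = 17          # integer value: printed as an integer, OFMT is not used
    y = 17.356      # non-integer value: converted with OFMT
    print x, y
}')
printf '%s\n' "$out"
```

This should print "17 17.36".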
Example 27: printf - formatted output
printf is used for output formatting.

We use a format string when working with printf.
Note that printf does not add a newline character at the end.
We need to add the newline character explicitly.
The output separators OFS and ORS do not have an effect on printf.
A format specifier starts with % and ends with a format control character.
Following are the format control characters:

c--> ASCII character.
d--> decimal integer.
i--> decimal integer.
e--> exponential notation.
f--> floating point notation.
g--> prints a number in either scientific notation or floating point notation, whichever is shorter.
o--> unsigned octal integer.
s--> string.
x--> unsigned hexadecimal integer.
%% --> used to print a %.
Modifiers for printf statements:
Modifiers are specified between % and the format control letter.
1) - (minus): a minus sign before the width modifier specifies left justification.
2) width: specifies the desired width of the output field.
The value is the minimum width and not the maximum width.
If the item is wider, it is printed as wide as necessary.
3) .prec: specifies the precision to be used.
For e and f it is the number of digits printed after the decimal point; for s it is the maximum number of characters printed.
awk 'BEGIN { printf "%-20s | %-20s | %-20s | %30s \n","NAME","PHONENO","CITY","SCORE"}
{ printf "%-20s | %-20s | %-20s | %30s \n",$1,$2,$3,$4 }' test1
NAME                 | PHONENO              | CITY                 |                          SCORE
sukul                | 8149158828           | mumbai               |                    100/900/200
uma                  | 8149122222           | chennai              |                    100/800/300
bhanu                | 8097123451           | Jhansi               |                   200/1000/500
shushant             | 7798977047           | nepal                |                   200/9000/100
himanshu             | 9090909090           | bokharo              |                    100/800/300
Example 28: Sending print/printf output to files

Normally print and printf both send output to the standard output, which is the screen.
But we can make the output go to other places, like another file or another command.
To print output to a file we use print > "filename".
Note that the filename should be inside double quotes.
awk ' {print $1 >"namefile"}
{print $2 > "phonefile"} ' test1
This shows that we can create multiple files in a single read of the input file.
pht022e2:/home/nemo_dev/sm017r> more namefile

sukul
uma
bhanu
shushant
himanshu
pht022e2:/home/nemo_dev/sm017r> more phonefile
8149158828
8149122222
8097123451
7798977047
9090909090
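We can also use >> to append to an existing file. (With >, the file is emptied only when it is first opened; later prints in the same run add to it.)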
Example 29: Sending the print output to a command as input
This means giving print output as input to another command.

The command should be inside double quotes.
awk '{ print $1 > "namefile" ;print $1 | "sort -r > namefilesrt"}' test1
bos90631:sm017r more namefile
sukul
uma
bhanu
shushant
himanshu
bos90631:sm017r more namefilesrt
uma
sukul
shushant
himanshu
bhanu
Though we did not close the output file and command above, it is recommended
to close the command and file using the close function.
The file or pipe stays open until we actually close the file/command
or awk exits.
close(filename)
close(command)
The exact string used to open the pipe must be used to close it.
Reasons to close the command/file:
1) To start reading a file again from the top in the same awk program, the file should be closed and then
reopened.
2) To avoid exceeding the number of files that can be open at any given time.
3) If we pipe data to a command, the command keeps waiting for more input; only after we close it can
the command finish
(or at the end of awk).
awk 'END { report="mailx sukulm@gmail.com"
print "Awk script worked" | report
print " Please check logs" | report
close(report) }' test1
Only when we close report does the mail command get all of its input and send the mail.
The code is in the END rule so that one mail is sent; in a main rule it would send one mail per input record.
Example 30: Using Regular Expressions
Regular expressions are written between // for matching.
We have seen this earlier.
The following 3 are the same:
awk '/foo/ { print $0}' test1
awk '$0 ~ /foo/ {print $0}' test1
awk '{ if ($0 ~ /foo/) print $0 }' test1
Within // we can write any regular expression.
For matching a regular expression we use the operator ~ .
For not matching we use !~ .
Details of regular expressions:
^ --> matches the beginning of the string or beginning of line.
awk '$1 ~ /^s/ {print $1} ' test1

bos90631:sm017r awk '$1 ~ /^s/ {print $1} ' test1
sukul
shushant
____________________________________________
$ --> matches the end of string or end of line.
awk '$1 ~ /u$/ {print $1}' test1

bhanu
himanshu
____________________________________________
[...] --> character set. Matches any one of the characters in the brackets.
We can use - to provide ranges, e.g. [0-9].
To include the characters \, ], - or ^ in the character set, we should put a '\' in front of it.
awk '$0 ~ /n[en]/ { print $0} ' test1
bos90631:sm017r awk '$0 ~ /n[en]/ { print $0} ' test1

uma 8149122222 chennai 100/800/300
shushant 7798977047 nepal 200/9000/100
Searches for n followed by either n or e.

____________________________________________
[^...] --> complemented character set.
This matches any character except those in the square brackets:
awk '$0 ~ /n[^en]/ {print $0}' test1
bos90631:sm017r awk '$0 ~ /n[^en]/ {print $0}' test1
uma 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
____________________________________________
| --> Alternation operator and used to specify alternatives.
awk '$0 ~ /uma|sukul/ { print $0}' test1

sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
____________________________________________
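* --> used to match zero or more occurrences of the preceding regular expression.
Here * applies only to the 0 just before it, so /900*/ matches any "90", including the one in himanshu's phone number.
awk ' $0 ~ /900*/ {print $0}' test1
sukul 8149158828 mumbai 100/900/200
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
____________________________________________
+ --> used to match 1 or more occurrences of the preceding regular expression.
bos90631:sm017r awk '$0 ~ /90+/ {print $0}' test1
sukul 8149158828 mumbai 100/900/200
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300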
____________________________________________
? --> used to match 0 or 1 (but not more than 1) occurrence of the preceding expression.
awk ' $0 ~ /900?/ {print $0}' test1
____________________________________________
\ --> used to escape the special meaning of special characters like $.
To search for $ we use \$.
Example 31: Using boolean operators
&&- AND
||-OR
!- NOT
awk '{if($1 ~ /u/ && $2=="8149158828") print $1 " IS THE LUCKY PERSON"}' test1
sukul IS THE LUCKY PERSON
awk '{if($1 ~ /hu/ || $2 ~ /81491/) print $1 " IS THE LUCKY PERSON"}' test1
bos90631:sm017r awk '{if($1 ~ /hu/ || $2 ~ /81491/) print $1 " IS THE LUCKY PERSON"}' test1
sukul IS THE LUCKY PERSON
uma IS THE LUCKY PERSON
shushant IS THE LUCKY PERSON
himanshu IS THE LUCKY PERSON
Example 32: Using BEGIN and END pattern.
The BEGIN pattern is executed before the 1st record is read.
The END pattern is executed after the last record of the last input file has been read.
awk 'BEGIN {print "DATAFILE HEADER"}

{n++;print $0}
END{ print "DATAFILE TRAILER COUNT:" n}' test1

DATAFILE HEADER
sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
DATAFILE TRAILER COUNT:5
Note that we can have multiple BEGIN and END patterns and
they get executed in order they are written

Example 33: Setting variable values on command line
The -v option can be used to set variables on the command line.
The variables are set even before the BEGIN pattern is executed.
The -v option is written before the program text and the filename arguments.
(It did not work on my installation; older awk versions may lack -v, but POSIX awk supports it.)
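Since -v could not be shown on my installation, here is a minimal sketch of it, assuming a POSIX-compatible awk. The variable name n mirrors the example that follows; the input line is taken from test1.

```shell
# -v assignments happen before the BEGIN rule runs, so n is already set there
in_begin=$(awk -v n=2 'BEGIN { print "n is " n }')

# The same variable used as a field number in a main rule
field=$(printf 'sukul 8149158828 mumbai\n' | awk -v n=2 '{ print $n }')

printf '%s\n%s\n' "$in_begin" "$field"
```

This should print "n is 2" and then "8149158828".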
Variables can also be set on the command line before the filenames.
The same variable can be given a different value before every file name; each assignment takes effect when awk reaches that argument.
awk '{ print $n}' n=2 test1 n=3 test2
bos90631:sm017r awk '{ print $n}' n=2 test1 n=3 test2

8149158828
8149122222
8097123451
7798977047
9090909090
mumbai
chennai
Jhansi
nepal
bokharo
Example 34: if - else statement.
awk '{ if ($0 ~ /h/)

{ print " H FOUND IN RECORD " NR}
else
{ print " H NOT FOUND IN RECORD " NR}}' test1

H NOT FOUND IN RECORD 1
H FOUND IN RECORD 2
H FOUND IN RECORD 3
H FOUND IN RECORD 4
H FOUND IN RECORD 5
Example 35: while statement
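while (condition)
body
Note that we initialize i ourselves: an uninitialized variable prints as an empty string,
and a variable set while processing one record keeps its value for the next record.
awk 'BEGIN { i = 1
while ( i <= 3 )
{print i;
i++}
}'
1
2
3
do-while executes the body at least once and then checks the condition.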
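A minimal do-while sketch, assuming a POSIX-compatible awk: the condition is false from the start, yet the body still runs once. The starting value 10 is arbitrary.

```shell
out=$(awk 'BEGIN {
    i = 10
    do {
        print "body ran with i = " i   # executes once even though i <= 3 is false
        i++
    } while (i <= 3)
}')
printf '%s\n' "$out"
```

This should print the single line "body ran with i = 10"; a plain while loop with the same condition would print nothing.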
Example 36: for loop

We can use the for loop just as we use it in C.
for (initialization; condition; increment)
body
awk ' { for(i=0;i<=3;i++){ print i}}' test1
0
1
2
3
0
1
2
3
0
1
2
3
0
1
2
3
0
1
2
3
The loop executed for each record in the file.
The limitation is that we can only do one initialization and one increment in the for structure.
Unlike C, awk has no comma operator, so we cannot use "," to provide multiple initializations or increments.
Example 37: For loop for arrays.

We also have another for structure for arrays.
Following is the syntax:
for ( i in arrayname)
{
do something with array[i]
}

bos90631:sm017r awk '{ a[1]=1;a[2]=2;a[3]=3;
for ( i in a)
{ print "Element no " i " is " a[i]
}}' testx
Element no 2 is 2
Element no 3 is 3
Element no 1 is 1
Note that for (i in a) visits the elements in an unspecified order, not necessarily 1, 2, 3.
Example 38: continue statement
This causes awk to skip the rest of the body of the loop, so the next cycle around the loop begins.
awk '{ a[1]=1;a[2]=2;a[3]=3;

for ( i in a)
{ print "Element no: " i
if (a[i]==2)
{ continue
}
print "Element value: " a[i]
}}' testx

Element no: 2
Element no: 3
Element value: 3
Element no: 1
Element value: 1
Note that when the element is 2, continue goes to the next cycle and skips the rest of the code, so its value is not printed.
Example 39: next statement.
The next statement causes awk to immediately stop processing the current record and go on to the
next record.
The rest of the code after next is not executed for the current record.
It starts the entire processing again for the next record.
next is different from getline: next abandons the remaining code
and starts from the beginning of the program with the new record,
whereas getline continues the remaining processing using the newly fetched record.
awk ' {if ($1=="bhanu")

next
print $1}' test1

sukul
uma
shushant
himanshu
Note that when name was bhanu processing started with next record and thus
print statement was not executed for bhanu.
Example 40: exit statement
The exit statement makes awk immediately stop the program and ignore the remaining records.
(If there is an END rule, it is still executed.)
awk ' {if ($1=="bhanu")

exit
print $1}' test1

sukul
uma
As soon as text "bhanu" is found the program exits.
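A minimal sketch showing that an END rule still runs after exit, assuming a POSIX-compatible awk. The names come from test1 and are fed on standard input.

```shell
names='sukul
uma
bhanu
shushant'

# exit stops reading input, but the END rule still runs
out=$(printf '%s\n' "$names" |
    awk '$1 == "bhanu" { exit } { print $1 } END { print "END ran after " NR " records" }')
printf '%s\n' "$out"
```

This should print "sukul", "uma" and then "END ran after 3 records": the record containing bhanu was read (NR is 3) but not printed.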
Example 41: Arrays
Arrays in awk are associative.
Each array element is identified by its index.
Awk arrays are different from arrays in other languages:
1) there is no need to specify the size of an array before using it
2) any number or string can be an index.
array1["CAT"]="meoww"
array2["DOG"]="barks"
The above arrays are valid even though they don't have numeric indices.
Also we can add elements at any position.

a[1]="Sukul"
a[2]="uma"
a[20]="shushant"
Note that we can add an element at the 20th position whether or not we have added elements 3, 4, 5...
Compare the 2 for loops below to understand why for (i in array) is used with awk arrays.
awk '{ a[1]="sukul";a[2]="uma";a[5]="bhanu";

for (i=1;i<=5;i++)
{ print a[i]
}
}' testx

sukul
uma


bhanu
Note that since we had not assigned values to a[3] and a[4], the above for loop printed blank lines for them.
Ideally nothing should be printed for them because they don't exist; in fact, merely referencing a[3] and a[4] creates them as empty elements.
Thus the above for loop is not intelligent enough to understand whether
the element exists or not.
Instead the for loop below makes more sense.
awk '{ a[1]="sukul";a[2]="uma";a[5]="bhanu";

for (i in a )
{ print a[i]
}
}' testx
uma
bhanu
sukul
Note that this for loop visits only the elements that exist and prints them accordingly
(in an unspecified order).
This is the reason why we use the for (i in array) syntax when working with arrays in awk.
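When we do need to walk indices in numeric order, the (i in a) test checks whether an element exists without creating it. A minimal sketch, assuming a POSIX-compatible awk and reusing the array from the example above:

```shell
out=$(awk 'BEGIN {
    a[1] = "sukul"; a[2] = "uma"; a[5] = "bhanu"

    # (i in a) tests for an element WITHOUT creating it
    for (i = 1; i <= 5; i++)
        if (i in a)
            print i ": " a[i]

    n = 0
    for (k in a) n++
    print "elements:", n

    # merely referencing a[3] creates it as an empty element
    if (a[3] == "") print "a[3] is empty"
    n = 0
    for (k in a) n++
    print "elements after referencing a[3]:", n
}')
printf '%s\n' "$out"
```

The first loop prints only indices 1, 2 and 5, and the count grows from 3 to 4 once a[3] has been referenced.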
Example 42: numeric built in functions
awk ' {
print int(17.23) #gives integer part
print sqrt(900) #gives square root
print exp(2) # exponential
print log(10) # natural log
print sin(30) # sine. (x in radians)
print cos(30) # cosine. (x in radians)
} ' testx

17
30
7.38906
2.30259
-0.988032
0.154251
Example 43: String built in function- index
index(string1,string2): searches string1 for the 1st occurrence of string2 and returns the position where
string2 begins.
If it is not found, it returns zero.
The example below shows the position of the 1st "u" in each record of the data file.
awk '{ print index($0,"u")}' test1

2
1
5
3
8
Example 44: String built in function- length
Returns the length of the string input
#prints the lengths of names

awk '{ print length($1)}' test1
5
3
5
8
8
Example 45: String built in function- match
match(string,regexp): searches for regexp in the string
and returns the position where the matching substring begins, or 0 if no match is found.
It also sets two built-in variables:
1) RSTART: the index where the matching substring begins
2) RLENGTH: the length of the matched string (-1 if there is no match)
Note: match did not work on my installation; it is part of POSIX awk (for example gawk or nawk).
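Since match could not be shown on my installation, here is a minimal sketch of it together with RSTART and RLENGTH, assuming a POSIX-compatible awk. The string is a record from test1 without its last field.

```shell
out=$(awk 'BEGIN {
    s = "shushant 7798977047 nepal"
    pos = match(s, /[0-9]+/)            # first run of digits
    print pos, RSTART, RLENGTH           # start position and length of the match
    print substr(s, RSTART, RLENGTH)     # extract the matched text

    pos = match(s, /xyz/)                # no match
    print pos, RSTART, RLENGTH           # 0, 0 and -1 when nothing matches
}')
printf '%s\n' "$out"
```

The digits start at position 10 and are 10 characters long, so the first line is "10 10 10", followed by "7798977047" and then "0 0 -1". Pairing match with substr like this is a common way to pull out the matched text.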
Example 46: String built in function- split

split(string,arrayname,separator)
awk splits the string 'string' into the array 'arrayname' based on the separator we provide.
split returns the number of array elements it created.
If we skip the separator, the value of FS is used.
awk '{ numberofelements=split($0,array1,"u")

print "Record no:" NR
print "Number of array elements created:" numberofelements
print array1[1],"|",array1[2],"|",array1[3]}' test1

Record no:1
Number of array elements created:4
s | k | l 8149158828 m
Record no:2
Number of array elements created:2
| ma 8149122222 chennai 100/800/300 |
Record no:3
Number of array elements created:2
bhan | 8097123451 Jhansi 200/1000/500 |
Record no:4
Number of array elements created:2
sh | shant 7798977047 nepal 200/9000/100 |
Record no:5
Number of array elements created:2
himansh | 9090909090 bokharo 100/800/300 |
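A more typical use of split is breaking up a single field. This sketch splits the fourth field of a test1 record ("100/900/200") on "/" and adds up the pieces. Because split numbers the elements from 1 to n, a counted loop visits them in order.

```shell
echo 'sukul 8149158828 mumbai 100/900/200' |
awk '{
  n = split($4, parts, "/")       # $4 is "100/900/200"
  total = 0
  for (i = 1; i <= n; i++) total += parts[i]
  print $1, n " parts", "total=" total
}'
```

This prints "sukul 3 parts total=1200".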
Example 47: String built-in function - sub
sub stands for substitute.
sub(regexp, replacement, target)
sub replaces the first match of regexp in target with replacement. If target is omitted, $0 is used.
It returns the number of substitutions made, which is either 0 or 1.
awk 'BEGIN { str = "water, water, everywhere"
sub(/at/, "ith", str)
print str }'
wither, water, everywhere
pht022e2:/home/nemo_dev/sm017r> awk '{ sub(/uma/,"shri",$0);print $0}' test1
sukul 8149158828 mumbai 100/900/200
shri 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
Note that the first occurrence of "uma" is replaced by "shri"; records that do not contain "uma" are printed unchanged.
Another variation uses & in the replacement text.
& stands for the text that was matched, so "& shri" keeps the original "uma" and appends " shri" after it.
pht022e2:/home/nemo_dev/sm017r> awk '{ noofrep=sub(/uma/,"& shri",$0);print "Replace cnt:" noofrep, "|", $0}' test1
Replace cnt:0 | sukul 8149158828 mumbai 100/900/200
Replace cnt:1 | uma shri 8149122222 chennai 100/800/300
Replace cnt:0 | bhanu 8097123451 Jhansi 200/1000/500
Replace cnt:0 | shushant 7798977047 nepal 200/9000/100
Replace cnt:0 | himanshu 9090909090 bokharo 100/800/300
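The return value of sub can also be used directly as a condition. A sketch, using made-up input rather than test1, that strips leading spaces from each record; with no third argument, sub edits $0.

```shell
# Hypothetical sample input: the first line starts with three spaces
printf '   uma\nsukul\n' |
awk '{
  # sub() returns 1 if it replaced something, 0 otherwise
  if (sub(/^ +/, "")) print "trimmed:", $0
  else                print "unchanged:", $0
}'
```

This prints "trimmed: uma" and "unchanged: sukul".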
Example 48: String built-in function - gsub (global substitution)
gsub works like sub but replaces every match in the target (here the whole record) and returns the number of replacements made.
pht022e2:/home/nemo_dev/sm017r> awk '{ noofrep=gsub(/u/,"A",$0);print "Replace cnt:" noofrep, "|", $0}' test1
Replace cnt:3 | sAkAl 8149158828 mAmbai 100/900/200
Replace cnt:1 | Ama 8149122222 chennai 100/800/300
Replace cnt:1 | bhanA 8097123451 Jhansi 200/1000/500
Replace cnt:1 | shAshant 7798977047 nepal 200/9000/100
Replace cnt:1 | himanshA 9090909090 bokharo 100/800/300
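Two common gsub idioms, sketched on one test1 record: replacing each match with "&" (the matched text itself) leaves the record unchanged, so the return value simply counts occurrences; and passing a field as the target limits the replacement to that field.

```shell
echo 'sukul 8149158828 mumbai 100/900/200' |
awk '{
  n = gsub(/8/, "&")          # "&" re-inserts each match, so $0 is unchanged
  print "count of 8:", n
  gsub(/\//, "-", $4)         # replace "/" only inside field 4
  print $0
}'
```

This prints "count of 8: 4" and then "sukul 8149158828 mumbai 100-900-200".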
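Example 49: String built-in function - substr
substr is used to extract part of a string.
substr(string, start, length) returns length characters of string, starting at position start (counting from 1). If length is omitted, everything from start to the end of string is returned.
pht022e2:/home/nemo_dev/sm017r> awk '{ s1=substr($0,5,10);print s1}' test1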
l 81491588
8149122222
u 80971234
hant 77989
nshu 90909
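Combining substr with length gives the last few characters of a string. A sketch that masks all but the last 4 digits of the phone number field; the "XXXXXX" mask is just an illustrative choice.

```shell
printf 'sukul 8149158828\numa 8149122222\n' |
awk '{
  last4 = substr($2, length($2) - 3)   # no length argument: up to the end
  print $1, "XXXXXX" last4
}'
```

This prints "sukul XXXXXX8828" and "uma XXXXXX2222".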
Example 50: String built-in functions - toupper, tolower
toupper(string) returns a copy of string with lowercase letters converted to uppercase; tolower(string) converts uppercase letters to lowercase.
pht022e2:/home/nemo_dev/sm017r> awk '{ record=toupper($0);print record}' test1
SUKUL 8149158828 MUMBAI 100/900/200
UMA 8149122222 CHENNAI 100/800/300
BHANU 8097123451 JHANSI 200/1000/500
SHUSHANT 7798977047 NEPAL 200/9000/100
HIMANSHU 9090909090 BOKHARO 100/800/300
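Since awk comparisons are case sensitive, tolower is a simple way to match regardless of case. A sketch on two test1-style records; note that the field itself keeps its original case when printed, because tolower returns a copy.

```shell
printf 'bhanu 8097123451 Jhansi\numa 8149122222 chennai\n' |
awk 'tolower($3) == "jhansi" { print $1, $3 }'
```

This prints "bhanu Jhansi".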
Example 51: Built-in function - system
system is used to execute any system command from within awk.
The command is run, and when it finishes, control returns to awk.
pht022e2:/home/nemo_dev/sm017r> awk '{ record=toupper($0);print record}
END { system("ls -lrt test*")}' test1
SUKUL 8149158828 MUMBAI 100/900/200
UMA 8149122222 CHENNAI 100/800/300
BHANU 8097123451 JHANSI 200/1000/500
SHUSHANT 7798977047 NEPAL 200/9000/100
HIMANSHU 9090909090 BOKHARO 100/800/300
-rw-r----- 1 sm017r nemo_dev 187 Aug 8 05:31 test1
Note the last line of the output: it is the result of ls -lrt test*, which was run from within awk.
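system also returns the exit status of the command, so awk can react to success or failure. A minimal sketch using the shell commands "exit 3" and "true" as stand-ins for a real command:

```shell
awk 'BEGIN {
  status = system("exit 3")       # runs the command through the shell
  print "exit status:", status
  if (system("true") == 0) print "true succeeded"
}'
```

This should print "exit status: 3" and "true succeeded".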
Example 52: Understanding ARGV and ARGC
The command-line arguments passed to an awk program are stored in an array called ARGV. The awk program text itself is not stored in ARGV; in the output below, ARGV[0] is awk.
ARGC contains the number of elements in ARGV.
ARGV is indexed from 0 to ARGC-1.
awk '{print ARGC
print ARGV[0]
print ARGV[1]}' test1
This prints all three values once for each line of the input file.
Note that ARGV[1] is the name of the input file.
2
awk
test1
2
awk
test1
2
awk
test1
2
awk
test1
2
awk
test1
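To list the arguments only once, the loop can go in a BEGIN block. A sketch with the made-up arguments one, two and three; because the program has only a BEGIN rule, awk exits without trying to open them as files. The loop starts at 1 because the value of ARGV[0] may differ between awk implementations.

```shell
awk 'BEGIN {
  print "ARGC =", ARGC
  # ARGV[1] .. ARGV[ARGC-1] are the arguments after the program text
  for (i = 1; i < ARGC; i++) print i, ARGV[i]
}' one two three
```

This prints "ARGC = 4" followed by "1 one", "2 two" and "3 three".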
Example 53: Built-in variables ENVIRON and FILENAME
awk also has an array ENVIRON that contains the values of the environment variables.
The index for this array is the name of the variable.
The FILENAME variable gives the name of the current input file.
If the data is read from standard input, the value is set to "-".
awk '{print ENVIRON["HOME"], ENVIRON["SHELL"], FILENAME }' test1

/home/nemo_dev/sm017r /usr/bin/ksh test1
We can see that ENVIRON["HOME"] prints the value of the HOME environment variable, and the same applies to ENVIRON["SHELL"].
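ENVIRON is also a convenient way to pass a value from the shell into an awk program. A sketch using a hypothetical variable named CITY; the shell syntax VAR=value command sets the variable only for that one command.

```shell
# CITY is a hypothetical variable, set only for this awk command
CITY=mumbai awk 'BEGIN { print "CITY is", ENVIRON["CITY"] }'
```

This prints "CITY is mumbai".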
