AWK Examples
Awk works like this: "search for a pattern in the records of the file and, if found, perform actions on that record".
An awk program consists of rules. A rule consists of two things: a pattern and an action.
###########################################
Example 1:
Search for the pattern 81491 and print the records containing this text.
print $0 and print are the same: both print the entire record.
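A minimal sketch of Example 1. The records are copied from the test1 listing shown in Example 2, and the shell variable names are only illustrative.

```shell
# Sample records copied from the test1 listing in this document.
data='sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300'

# Same pattern, once with a bare print and once with print $0.
with_print=$(printf '%s\n' "$data" | awk '/81491/ { print }')
with_dollar0=$(printf '%s\n' "$data" | awk '/81491/ { print $0 }')
printf '%s\n' "$with_print"
```

Both commands should print the sukul and uma records.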
###########################################
Example 2:
# skipping pattern
bos90631:sm017r awk ' {print }' test1
sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
bhanu 8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
# skipping action
bos90631:sm017r awk '/81491/ ' test1
sukul 8149158828 mumbai 100/900/200
uma 8149122222 chennai 100/800/300
###########################################
Example:
awk '/sh/ {print $0}
/nepal/ {print $0}' test1
In the above example /sh/ is the 1st pattern and /nepal/ is the 2nd pattern.
When one record matches both the patterns "sh" and "nepal",
it is printed twice (once for each of the 2 actions).
An awk program can be put inside a file and then run using the -f option.
For example, we can create a file nepal.awk with the awk program
in it.
Note that when the program is inside a file, we don't need to write the single quotes.
Note: If we don't specify the input file (data file), awk expects input from standard
input, which normally is the keyboard.
We end the input from the keyboard by pressing Ctrl+D.
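A small sketch of the -f option. The contents of nepal.awk are not shown in these notes, so the one-rule program here is an assumption made for illustration.

```shell
dir=$(mktemp -d)

# nepal.awk holds the program text; no single quotes are needed inside the file.
cat > "$dir/nepal.awk" <<'EOF'
/nepal/ { print $1 }
EOF

# No data file is named, so awk reads standard input (a pipe here, not the keyboard).
out=$(printf 'shushant 7798977047 nepal 200/9000/100\numa 8149122222 chennai 100/800/300\n' |
      awk -f "$dir/nepal.awk")
echo "$out"
rm -rf "$dir"
```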
##########################################
We can make the awk script executable and then run it like any other command we use, e.g. sort.
##########################################
In the below example, note that the word 'Jhansi' is broken across two lines.
The program still interprets it as "Jhansi" because of the use of "\".
With this "\" we are actually escaping the newline character.
Earlier we saw a similar example where we wrote two rules on
separate lines. However, that's not compulsory. We can write different statements on the same line
separated by a semicolon.
cp test1 test2
# we copy our test file as test2
The above awk prints one line for each record in the input files.
That is why it prints "test1" 5 times and "test2" 5 times.
FNR: Current record number in the current file. The value is reset each time a new file is started.
NR: Number of records read so far overall. This is never reset.
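The difference between FNR and NR can be sketched as below. To keep the output short, a 3-record stand-in is used instead of the 5-record test1.

```shell
dir=$(mktemp -d)
printf 'a\nb\nc\n' > "$dir/test1"   # small stand-in for test1
cp "$dir/test1" "$dir/test2"        # same copy step as in the notes

# FILENAME is the current file, FNR restarts per file, NR keeps counting.
out=$(cd "$dir" && awk '{ print FILENAME, FNR, NR }' test1 test2)
echo "$out"
rm -rf "$dir"
```

The last line should read "test2 3 6": the 3rd record of test2, the 6th record overall.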
We can also refer to fields beyond the last one, but we will get only the null string.
$0 represents the entire record. Note that referring to any field beyond the last field
does not give any error.
awk '{ print NF, $8}' test1
4
4
4
4
4
This prints the number of fields in each record of the file test1.
NF indicates the number of fields and hence $NF will always print the last column.
NR indicates the record number and hence $NR will print the 1st field for the 1st record, the 2nd field for the 2nd
record and so on.
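A short sketch of $NF and $NR on a made-up 3x3 input (the letters are only illustrative data):

```shell
# For each record print the last field ($NF) and the field whose
# number equals the record number ($NR).
out=$(printf 'a b c\nd e f\ng h i\n' | awk '{ print $NF, $NR }')
echo "$out"
```

The three lines should be "c a", "f e" and "i i".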
We can use any expression after $. Notice the use of () to enclose the
expression.
For example, $(2*2) will print the 4th field.
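As a sketch, using the first record of test1 as input:

```shell
# $(2*2) is field 4; $(NF-1) is the field just before the last one.
out=$(echo 'sukul 8149158828 mumbai 100/900/200' |
      awk '{ print $(2*2); print $(NF-1) }')
echo "$out"
```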
Adding a new field changes the value of the internal copy of $0.
To check whether the appropriate number of field separators are added
to the record, we use the OFS built-in variable, which controls the
output field separator.
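A minimal sketch of this: assigning to field 5 of a 3-field record makes awk rebuild $0 with OFS between every field, including the empty 4th field. The "-" separator is chosen only to make the separators visible.

```shell
out=$(echo 'a b c' | awk 'BEGIN { OFS = "-" } { $5 = "e"; print; print NF }')
echo "$out"
```

The record should come out as a-b-c--e and NF as 5.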
We can change the value of FS in the BEGIN pattern so that it affects all the records.
Note that with the default FS, two consecutive spaces (or tabs) do not create an empty field.
But if FS=";" then two consecutive ; will create an empty field.
We can set FS="[ ]" if we want to force a single space as the delimiter.
This will cause two consecutive spaces to be counted as an empty field.
We can set the field separator on the command line using the -F option.
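The three cases above can be sketched by counting fields (NF). The input strings are made up for illustration.

```shell
# Default FS: a run of spaces is one separator.
default_nf=$(printf 'a  b\n' | awk '{ print NF }')

# FS=";" set in BEGIN: two consecutive ; enclose an empty field.
semi_nf=$(printf 'a;;b\n' | awk 'BEGIN { FS = ";" } { print NF }')

# -F with "[ ]": every single space is a separator.
single_nf=$(printf 'a  b\n' | awk -F'[ ]' '{ print NF }')

echo "$default_nf $semi_nf $single_nf"
```

This should print "2 3 3".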
When we execute getline without arguments, the next record is read into $0.
The original record that was already in $0 will be overwritten.
So we should use getline only after we are done working with the
current record, because it is discarded the moment we read the next record using getline.
The values of NF, NR, FNR and $0 are also set as per the new record.
Below is an example of a file whose records are broken across lines, and code using getline to fix it.
[test file]
test3:
sukul 8149158828
mumbai 100/900/200
uma 8149122222 chennai
100/800/300
bhanu
8097123451 Jhansi 200/1000/500
shushant 7798977047 nepal 200/9000/100
himanshu 9090909090 bokharo 100/800/300
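awk '{ record = $0; noofflds = split(record, a)
# keep reading lines until the record has all 4 fields
while (noofflds < 4 && (getline > 0))
{
record = record " " $0    # join with a space so the fields stay separate
noofflds = split(record, a)
}
print record
}' test3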
getline var reads the next record into the variable var.
This changes the value of the variables NR and FNR, but not NF and $0,
because the record is not split into fields.
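A small sketch of getline with a variable. The variable name nxt and the numeric input are only illustrative.

```shell
# Each pass reads one extra record into nxt; $0 keeps the original record,
# while NR advances past the record that getline consumed.
out=$(printf '1\n2\n3\n4\n' | awk '{ getline nxt; print $0 "+" nxt, NR }')
echo "$out"
```

The output should be "1+2 2" followed by "3+4 4".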
getline < file is used to take input from another file which is not the main input to awk.
We may use this when we want a file to be used as a lookup.
Since the main input stream is not used, this does not change the value of NR or FNR.
But the record is split into fields in the normal manner, so the values of $0 and NF are changed.
[test file]
vi lookup
sukul
himanshu
uma
shushant
awk '{ name=$1; flag=0;
while ((getline < "lookup") > 0)
{
if (name==$1)
{ flag=1 }
};
if (flag==0)
{ print name " is the black sheep"};
close("lookup")
}' test1
Note that we need to close the lookup file before we can start reading it again
from the top.
Also note that the normal getline < "filename" will overwrite the value of $0
that came from the main input. So make sure you save the value to another variable
before you run the getline command.
getline var < file reads the data from the file into a variable and hence does not overwrite the value of $0.
The drawback of reading into a variable is that the entire record goes into this variable and we don't enjoy
the benefit of being able to access individual fields, as $0 is not changed.
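A sketch of a lookup that reads each lookup line into a variable, so $1 of the main record never has to be saved first. The two-name lookup file, the shortened input records and the message text are assumptions for illustration.

```shell
dir=$(mktemp -d)
printf 'sukul\numa\n' > "$dir/lookup"

out=$(printf 'sukul 1\nbhanu 2\numa 3\n' | awk -v lookup="$dir/lookup" '{
    found = 0
    while ((getline line < lookup) > 0)   # $0 and $1 stay untouched
        if (line == $1) found = 1
    close(lookup)                         # so the next record rereads from the top
    if (!found) print $1 " is not in the lookup file"
}')
echo "$out"
rm -rf "$dir"
```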
Note that we saved the command in a variable "command" and then used it with getline.
This is easier when the same command is executed several times, and also when closing the command.
Otherwise we have to write the exact command every time, which can get troublesome in the case of a long
command.
Note that the command has to be exactly the same every time it is referenced.
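A minimal sketch of reading a command's output with getline. The echo command is a harmless stand-in chosen for illustration.

```shell
out=$(awk 'BEGIN {
    command = "echo one; echo two"        # stored once, reused below
    while ((command | getline line) > 0)  # one output line per call
        print "got: " line
    close(command)                        # same string closes the pipe
}')
echo "$out"
```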
The default value of OFS is a single space. We can change it to any other value and the field separator in the output will change.
The following will change a space-delimited file to a comma-delimited file.
We generally set the value of OFS in the BEGIN pattern so that it applies to all rows.
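One way to sketch the space-to-comma conversion is shown here. The assignment $1 = $1 changes nothing by itself, but it makes awk rebuild $0 using OFS; without some field assignment, print would output the record unchanged.

```shell
out=$(echo 'sukul 8149158828 mumbai 100/900/200' |
      awk 'BEGIN { OFS = "," } { $1 = $1; print }')
echo "$out"
```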
Each print statement outputs the ORS after the string is printed.
The default value is \n and hence we move to the next line after every print statement.
We can set this value in the BEGIN pattern so that it applies to all records.
Note that if the ORS does not contain a newline character then all output will run on the same line.
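A short sketch of ORS without a newline. The " | " separator is an arbitrary choice.

```shell
# Every print ends with ORS instead of a newline, so the three
# records run together on one line; END adds the final newline.
out=$(printf 'a\nb\nc\n' | awk 'BEGIN { ORS = " | " } { print } END { printf "\n" }')
echo "$out"
```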
print converts numbers using sprintf with the format held in the variable OFMT (default "%.6g"), which indicates how to print numbers.
Note that printf does not add a newline character at the end.
We need to add newline character explicitly.
The output separators OFS and ORS do not have an effect on printf.
A format specifier starts with % and ends with a format control character.
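A small printf sketch. OFS and ORS are set to odd values on purpose to show that printf ignores them, and the numbers are made up.

```shell
out=$(echo 'sukul 8149158828' |
      awk 'BEGIN { OFS = "-"; ORS = "!" }
           { printf "%-8s|%5.2f|%d\n", $1, 3.14159, 42 }')
echo "$out"
```

Here %-8s left-justifies the name in 8 columns, %5.2f prints the number in 5 columns with 2 decimals, and the \n has to be written explicitly.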
By default, print and printf write to standard output.
But we can make the output go to other places, like another file or another command.
Even where we did not close the output file or command, it is recommended
to close the command and file using the close function.
The file or the pipe stays open until we actually close the file/command
or awk exits.
close(filename)
close(command)
The exact command string used to open the pipe must be used to close it.
If we pipe data to a command, the data may be buffered, and the command may finish its work only after we close the pipe
(or at the end of awk).
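A sketch of sending output to a file and to a command, then closing both. The file name names.txt and the three input names are only illustrative.

```shell
dir=$(mktemp -d)
out=$(printf 'uma\nsukul\nbhanu\n' | awk -v outfile="$dir/names.txt" '{
    print $1 > outfile        # redirect to a file
    print $1 | "sort"         # pipe to a command
}
END {
    close(outfile)
    close("sort")             # same string that opened the pipe
}')
echo "$out"
lines=$(wc -l < "$dir/names.txt")
rm -rf "$dir"
```

The sorted names come from the sort command, while names.txt keeps them in input order.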
&&- AND
||-OR
!- NOT
awk '{if($1 ~ /u/ && $2=="8149158828") print $1 " IS THE LUCKY PERSON"}' test1
sukul IS THE LUCKY PERSON
bos90631:sm017r awk '{if($1 ~ /hu/ || $2 ~ /81491/) print $1 " IS THE LUCKY PERSON"}' test1
sukul IS THE LUCKY PERSON
uma IS THE LUCKY PERSON
shushant IS THE LUCKY PERSON
himanshu IS THE LUCKY PERSON
Note that we can have multiple BEGIN and END patterns, and
they get executed in the order they are written.
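A minimal sketch with two BEGIN and two END rules. The input is empty, so only these rules run.

```shell
out=$(awk 'BEGIN { print "begin 1" }
           BEGIN { print "begin 2" }
           END   { print "end 1" }
           END   { print "end 2" }' < /dev/null)
echo "$out"
```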
Example 33: Setting variable values on command line
The -v option can be used to set variables on the command line.
The variables are set even before the BEGIN pattern is executed.
The -v option is written before the filename arguments as well as
the program text.
(not working on my installation)
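On awk versions that support -v (it is part of POSIX awk), a minimal sketch looks like this. The variable name greeting is made up.

```shell
# greeting is already set when the BEGIN rule runs.
out=$(awk -v greeting="hello" 'BEGIN { print greeting }')
echo "$out"
```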
while (condition)
body
awk 'BEGIN { i = 1 }    # start at 1; an unset i would first print an empty line
{ while ( i <= 3 )
{ print i;
i++ }
}' test1
1
2
3
do-while executes the body at least once and then checks the condition.
for (initialization; condition; increment)
body
awk ' { for(i=0;i<=3;i++){ print i}}' test1
0
1
2
3
0
1
2
3
0
1
2
3
0
1
2
3
0
1
2
3
continue skips the rest of the body of the loop, causing the next cycle around the loop to begin.
Note that when the element is 2, it continues to the next cycle and ignores the rest of the code.
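A minimal sketch of continue, skipping the value 2 as in the description above:

```shell
out=$(awk 'BEGIN {
    for (i = 1; i <= 4; i++) {
        if (i == 2) continue   # jump straight to the next cycle
        print i
    }
}')
echo "$out"
```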
next is different from getline: the next statement abandons the remaining code
and starts processing from the beginning for the new record.
But getline continues the remaining processing using the newly fetched record.
Note that when the name was bhanu, processing started over with the next record and thus the
print statement was not executed for bhanu.
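A sketch of next with three of the names from test1:

```shell
# For bhanu the first rule runs next, so the print rule is never reached.
out=$(printf 'sukul\nbhanu\numa\n' | awk '$1 == "bhanu" { next } { print $1 }')
echo "$out"
```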
array1["CAT"]="meoww"
array2["DOG"]="barks"
The above arrays are valid even though they don't have numeric indices.
Notice the below 2 for loops and understand why for(i in array) is used with awk arrays.
bhanu
Note that since we had not assigned values to a[3] and a[4], the above for loop printed blanks for them.
Ideally it should not have printed anything because they don't exist.
Thus the above for loop is not intelligent enough to understand whether
an element exists or not.
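A sketch of the difference, assuming an array with indices 1, 2 and 5 only (the values are made up). for (i in a) visits just the elements that exist, and the in operator tests existence without creating the element.

```shell
out=$(awk 'BEGIN {
    a[1] = "sukul"; a[2] = "uma"; a[5] = "bhanu"
    n = 0
    for (i in a) n++                 # visits only existing elements
    print "existing elements:", n
    missing = 0
    for (i = 1; i <= 5; i++)
        if (!(i in a)) missing++     # existence test, does not create a[i]
    print "missing indices:", missing
}')
echo "$out"
```

Note that the order in which for (i in a) visits the indices is not guaranteed, which is why this sketch only counts them.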
awk ' {
print int(17.23) #gives integer part
print sqrt(900) #gives square root
print exp(2) # exponential
print log(10) # natural log
print sin(30) # sine. (x in radians)
print cos(30) # cosine. (x in radians)
} ' testx
17
30
7.38906
2.30259
-0.988032
0.154251
index(string1,string2): searches string1 for the 1st occurrence of string2 and returns the position where
string2 begins.
If not found, it returns zero.
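A minimal sketch of index, using a name from test1 as the string to search:

```shell
out=$(awk 'BEGIN {
    print index("himanshu", "man")   # "man" starts at character 3
    print index("himanshu", "xyz")   # not found
}')
echo "$out"
```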
awk '{ noofrep=sub(/uma/,"& shri",$0);print "Replace cnt:" noofrep, "|", $0}' test1
gsub is the same as sub, but it replaces all the occurrences in the input record.
awk '{ noofrep=gsub(/u/,"A",$0);print "Replace cnt:" noofrep, "|", $0}' test1
Note the last line of the output. It contains the result of ls -lrt test* that was run
from within awk.
awk also has an array ENVIRON which contains the values of the environment variables.
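A short sketch of ENVIRON. MY_CITY is a made-up environment variable, set only for this one awk invocation.

```shell
out=$(MY_CITY=mumbai awk 'BEGIN { print ENVIRON["MY_CITY"] }')
echo "$out"
```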