
Awk Programming

WHAT IS AWK?
created by: Aho, Weinberger, and Kernighan
scripting language used for manipulating data and generating reports

versions of awk:
  awk, nawk, mawk, pgawk, ...
  GNU awk: gawk

WHAT CAN YOU DO WITH AWK?

awk operation:
  scans a file line by line
  splits each input line into fields
  compares input line/fields to pattern
  performs action(s) on matched lines

Useful for:
  transforming data files
  producing formatted reports

Programming constructs:
  format output lines
  arithmetic and string operations
  conditionals and loops

THE COMMAND: AWK

BASIC AWK SYNTAX

awk [options] script file(s)

awk [options] -f scriptfile file(s)

Options:
-F to change input field separator
-f to name script file

BASIC AWK PROGRAM

consists of patterns & actions:
  pattern {action}
  if pattern is missing, action is applied to all lines
  if action is missing, the matched line is printed
  must have either pattern or action

Example:
  awk '/for/' testfile
  prints all lines containing the string "for" in testfile
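
As a minimal sketch of the two defaults above (the sample input is made up for illustration and is not from the slides): the first command has a pattern but no action, so matching lines are printed; the second has an action but no pattern, so the action runs on every line.

```shell
# Hypothetical sample input, not from the slides.
input='wait for me
go now
for sure'

# Pattern only: prints each line that contains "for".
matched=$(printf '%s\n' "$input" | awk '/for/')
echo "$matched"

# Action only: runs on every line and prints its first field.
firsts=$(printf '%s\n' "$input" | awk '{print $1}')
echo "$firsts"
```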

BASIC TERMINOLOGY: INPUT FILE


A field is a unit of data in a line
Each field is separated from the other fields by the field separator
  default field separator is whitespace
A record is the collection of fields in a line
A data file is made up of records

EXAMPLE INPUT FILE

BUFFERS

awk supports two types of buffers: record and field
field buffer:
  one for each field in the current record
  names: $1, $2, ...
record buffer:
  $0 holds the entire record

SOME SYSTEM VARIABLES


FS        Field separator (default=whitespace)
RS        Record separator (default=\n)
NF        Number of fields in current record
NR        Number of the current record
OFS       Output field separator (default=space)
ORS       Output record separator (default=\n)
FILENAME  Current filename
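
A small sketch of how some of these variables behave, using a made-up two-line input: NR is the record number, NF the field count, $NF the last field, and OFS is placed between the comma-separated arguments of print.

```shell
# Hypothetical input: the first record has 3 fields, the second has 2.
out=$(printf 'a b c\nd e\n' | awk 'BEGIN { OFS = ":" } { print NR, NF, $NF }')
echo "$out"
```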

EXAMPLE: RECORDS AND FIELDS


% cat emps
Tom Jones    4424  5/12/66  543354
Mary Adams   5346  11/4/63  28765
Sally Chang  1654  7/22/54  650000
Billy Black  1683  9/23/44  336500
% awk '{print NR, $0}' emps
1 Tom Jones    4424  5/12/66  543354
2 Mary Adams   5346  11/4/63  28765
3 Sally Chang  1654  7/22/54  650000
4 Billy Black  1683  9/23/44  336500

EXAMPLE: SPACE AS FIELD SEPARATOR

% cat emps
Tom Jones    4424  5/12/66  543354
Mary Adams   5346  11/4/63  28765
Sally Chang  1654  7/22/54  650000
Billy Black  1683  9/23/44  336500
% awk '{print NR, $1, $2, $5}' emps
1 Tom Jones 543354
2 Mary Adams 28765
3 Sally Chang 650000
4 Billy Black 336500

EXAMPLE: COLON AS FIELD SEPARATOR

% cat em2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
% awk -F: '/Jones/{print $1, $2}' em2
Tom Jones 4424

AWK SCRIPTS
awk scripts are divided into three major parts: BEGIN, BODY, and END
comment lines start with #

CSCI 330 - The UNIX System


AWK SCRIPTS

BEGIN: pre-processing
  performs processing that must be completed before the file processing starts (i.e., before awk starts reading records from the input file)
  useful for initialization tasks such as initializing variables and creating report headings

AWK SCRIPTS

BODY: Processing
  contains main processing logic to be applied to input records
  like a loop that processes input data one record at a time:
    if a file contains 100 records, the body will be executed 100 times, once for each record

AWK SCRIPTS

END: post-processing
  contains logic to be executed after all input data have been processed
  logic such as printing a report grand total should be performed in this part of the script
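
The three parts can be sketched together in one short script. The input numbers here are made up for illustration; the BEGIN block prints a heading, the body accumulates a sum once per record, and the END block prints the total.

```shell
out=$(printf '3\n4\n5\n' | awk '
BEGIN { print "values" }        # runs once, before any input is read
      { sum += $1 }             # runs once per input record
END   { print "total", sum }    # runs once, after the last record
')
echo "$out"
```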

PATTERN / ACTION SYNTAX


CATEGORIES OF PATTERNS


EXPRESSION PATTERN TYPES

match entire input record
  regular expression enclosed by /s
explicit pattern-matching expressions
  ~ (match), !~ (not match)
expression operators
  arithmetic
  relational
  logical

EXAMPLE: MATCH INPUT RECORD

% cat employees2
Tom Jones:4424:5/12/66:543354
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500
% awk -F: '/00$/' employees2
Sally Chang:1654:7/22/54:650000
Billy Black:1683:9/23/44:336500

EXAMPLE: EXPLICIT MATCH


% cat datafile
northwest  NW  Charles Main       3.0  .98  34
western    WE  Sharon Gray        5.3  .97  23
southwest  SW  Lewis Dalsass      2.7  .8   18
southern   SO  Suan Chin          5.1  .95  15
southeast  SE  Patricia Hemenway  4.0  .7   17
eastern    EA  TB Savage          4.4  .84  20
northeast  NE  AM Main            5.1  .94  13
north      NO  Margot Weber       4.5  .89
central    CT  Ann Stephens       5.7  .94  13

% awk '$5 ~ /\.[7-9]+/' datafile
southwest  SW  Lewis Dalsass      2.7  .8   18
central    CT  Ann Stephens       5.7  .94  13

EXAMPLES: MATCHING WITH RES

% awk '/^[ns]/{print $1}' datafile
northwest
southwest
southern
southeast
northeast
north

% awk '$2 !~ /E/{print $1, $2}' datafile
northwest NW
southwest SW
southern SO
north NO
central CT

ARITHMETIC OPERATORS
Operator  Meaning      Example
+         Add          x + y
-         Subtract     x - y
*         Multiply     x * y
/         Divide       x / y
%         Modulus      x % y
^         Exponential  x ^ y

Example:
% awk '$3 * $4 > 500 {print $0}' file

RELATIONAL OPERATORS
Operator  Meaning                   Example
<         Less than                 x < y
<=        Less than or equal to     x <= y
==        Equal to                  x == y
!=        Not equal to              x != y
>         Greater than              x > y
>=        Greater than or equal to  x >= y
~         Matched by reg exp        x ~ /y/
!~        Not matched by reg exp    x !~ /y/

LOGICAL OPERATORS
Operator  Meaning      Example
&&        Logical AND  a && b
||        Logical OR   a || b
!         NOT          !a

Examples:
% awk '($2 > 5) && ($2 <= 15) {print $0}' file
% awk '$3 == 100 || $4 > 50' file

RANGE PATTERNS

Matches ranges of consecutive input lines

Syntax:
  pattern1 , pattern2 {action}

pattern can be any simple pattern
pattern1 turns action on
pattern2 turns action off
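
A minimal sketch of a range pattern, using made-up input in which the words START and END serve as the two patterns. The action runs from the line matching the first pattern through the line matching the second, inclusive.

```shell
# Hypothetical input: lines 2 to 5 fall inside the range.
out=$(printf 'a\nSTART\nb\nc\nEND\nd\n' | awk '/START/,/END/ { print NR ": " $0 }')
echo "$out"
```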

RANGE PATTERN EXAMPLE


AWK ACTIONS


AWK EXPRESSIONS

Expression is evaluated and returns value
  consists of any combination of numeric and string constants, variables, operators, functions, and regular expressions
Can involve variables
  As part of expression evaluation
  As target of assignment

AWK VARIABLES
A user can define any number of variables within an awk script
The variables can be numbers, strings, or arrays
Variable names start with a letter, followed by letters, digits, and underscores
Variables come into existence the first time they are referenced; therefore, they do not need to be declared before use
All variables are initially created as strings and initialized to a null string, which evaluates to 0 when used as a number
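
A short sketch of the last point: the variables s and n below are never assigned (the names are arbitrary), so s prints as an empty string and n counts as 0 in arithmetic.

```shell
# s and n are deliberately left unset.
out=$(awk 'BEGIN { print "[" s "]", n + 0, length(s) }')
echo "$out"
```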

AWK VARIABLES

Format:
  variable = expression

Examples:
% awk '$1 ~ /Tom/ {wage = $3 * $4; print wage}' filename
% awk '$4 == "CA" {$4 = "California"; print $0}' filename

AWK ASSIGNMENT OPERATORS


=    assign result of right-hand-side expression to left-hand-side variable
++   Add 1 to variable
--   Subtract 1 from variable
+=   Assign result of addition
-=   Assign result of subtraction
*=   Assign result of multiplication
/=   Assign result of division
%=   Assign result of modulo
^=   Assign result of exponentiation
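
As an illustration, the following sketch chains several of these operators on one variable (the name n and the starting value are arbitrary); the comments trace the value after each step.

```shell
out=$(awk 'BEGIN {
    n = 5      # n is 5
    n += 3     # n is 8
    n *= 2     # n is 16
    n++        # n is 17
    n %= 7     # n is 3
    n ^= 2     # n is 9
    print n
}')
echo "$out"
```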

AWK EXAMPLE
File: grades
john 85 92 78 94 88
andrea 89 90 75 90 86
jasper 84 88 80 92 84
awk script: average
# average five grades
{ total = $2 + $3 + $4 + $5 + $6
avg = total / 5
print $1, avg }
Run as:
awk -f average grades

OUTPUT STATEMENTS

print
  print easy and simple output
printf
  print formatted (similar to C printf)
sprintf
  format string (similar to C sprintf)

FUNCTION: PRINT
Writes to standard output
Output is terminated by ORS
  default ORS is newline
If called with no parameter, it will print $0
Printed parameters are separated by OFS
  default OFS is blank
Print control characters are allowed:
  \n \f \a \t \\
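
A small sketch of OFS and ORS in action, with made-up input. Setting both in a BEGIN block changes what print puts between its arguments and at the end of each record.

```shell
# With ORS=";" no newline is written, so the records end up on one line.
out=$(printf 'a b\nc d\n' | awk 'BEGIN { OFS = "-"; ORS = ";" } { print $1, $2 }')
echo "$out"
```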

PRINT EXAMPLE

% awk '{print $0}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86

% awk '{print}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86

% awk '{print($0)}' grades
john 85 92 78 94 88
andrea 89 90 75 90 86

PRINT EXAMPLE

% awk '{print $1 "," $2}' grades
john,85
andrea,89

% awk '{print $1, $2}' grades
john 85
andrea 89

PRINT EXAMPLE

% awk '{OFS="-";print $1 "," $2}' grades
john,85
andrea,89

% awk '{OFS="-";print $1 , $2}' grades
john-85
andrea-89

REDIRECTING PRINT OUTPUT


Print output goes to standard output unless redirected via:
  > file
  >> file
  | command
will open file or command only once
subsequent redirections append to already open stream
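
The "opened only once" behavior can be seen in a sketch like the one below. The temporary file from mktemp and the variable name out are assumptions for this example. Although the print with > runs three times, the file is truncated only when it is first opened, so all three lines survive.

```shell
tmp=$(mktemp)    # temporary output file, used only for this sketch
printf 'x\ny\nz\n' | awk -v out="$tmp" '{ print $1 > out }'
result=$(cat "$tmp")
echo "$result"
rm -f "$tmp"
```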

PRINT EXAMPLE
% awk '{print $1 , $2 > "file"}' grades
% cat file
john 85
andrea 89
jasper 84

PRINT EXAMPLE

% awk '{print $1,$2 | "sort"}' grades
andrea 89
jasper 84
john 85

% awk '{print $1,$2 | "sort -k 2"}' grades
jasper 84
john 85
andrea 89

PRINT EXAMPLE

% date
Wed Nov 19 14:40:07 CST 2008

% date | awk '{print "Month: " $2 "\nYear: " $6}'
Month: Nov
Year: 2008

PRINTF: FORMATTING OUTPUT


Syntax:
  printf(format-string, var1, var2, ...)
works like C printf
each format specifier in format-string requires argument of matching type

FORMAT SPECIFIERS
%d, %i  decimal integer
%c      single character
%s      string of characters
%f      floating point number
%o      octal number
%x      hexadecimal number
%e      scientific floating point notation
%%      the letter %

FORMAT SPECIFIER EXAMPLES


Given: x = A, y = 15, z = 2.3, and $1 = Bob Smith

Printf Format Specifier   What it Does
%c   printf("The character is %c \n", x)
     output: The character is A
%d   printf("The boy is %d years old \n", y)
     output: The boy is 15 years old
%s   printf("My name is %s \n", $1)
     output: My name is Bob Smith
%f   printf("z is %5.3f \n", z)
     output: z is 2.300

FORMAT SPECIFIER MODIFIERS


between % and letter:
  %10s
  %7d
  %10.4f
  %-20s
meaning:
  width of field, field is printed right justified
  precision: number of digits after decimal point
  - will left justify
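
A brief sketch of the modifiers, with arbitrary sample values; the | characters are there only to make the field edges visible. %-10s left-justifies in 10 columns, %5d right-justifies in 5, and %8.3f uses 8 columns with 3 digits after the decimal point.

```shell
out=$(awk 'BEGIN { printf("%-10s|%5d|%8.3f|\n", "abc", 42, 3.14159) }')
echo "$out"
```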

SPRINTF: FORMATTING TEXT


Syntax:
  sprintf(format-string, var1, var2, ...)
Works like printf, but does not produce output
Instead it returns formatted string

Example:
{
    text = sprintf("1: %d 2: %d", $1, $2)
    print text
}

AWK BUILTIN FUNCTIONS

tolower(string)
  returns a copy of string, with each upper-case character converted to lower-case. Nonalphabetic characters are left unchanged.
  Example: tolower("MiXeD cAsE 123") returns "mixed case 123"
toupper(string)
  returns a copy of string, with each lower-case character converted to upper-case.

AWK EXAMPLE: LIST OF PRODUCTS

103:sway bar:49.99
101:propeller:104.99
104:fishing line:0.99
113:premium fish bait:1.00
106:cup holder:2.49
107:cooler:14.89
112:boat cover:120.00
109:transom:199.00
110:pulley:9.88
105:mirror:4.99
108:wheel:49.99
111:lock:31.00
102:trailer hitch:97.95


AWK EXAMPLE: OUTPUT

Marine Parts R Us
Main catalog
Part-id  name                   price
======================================
101      propeller             104.99
102      trailer hitch          97.95
103      sway bar               49.99
104      fishing line            0.99
105      mirror                  4.99
106      cup holder              2.49
107      cooler                 14.89
108      wheel                  49.99
109      transom               199.00
110      pulley                  9.88
111      lock                   31.00
112      boat cover            120.00
113      premium fish bait       1.00
======================================
Catalog has 13 parts

AWK EXAMPLE: COMPLETE


BEGIN {
    FS= ":"
    print "Marine Parts R Us"
    print "Main catalog"
    print "Part-id\tname\t\t\t price"
    print "======================================"
}
{
    printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3)
    count++
}
END {
    print "======================================"
    print "Catalog has " count " parts"
}

is output sorted ?
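
The script above prints records in input order, so it does not sort by itself. One possible way to get output ordered by part id, sketched here as an assumption rather than as the slides' own solution, is to sort the input numerically on the first colon-separated field before awk reads it. The three records are copied from the product list.

```shell
# Three records from the product list, deliberately out of order.
out=$(printf '103:sway bar:49.99\n101:propeller:104.99\n102:trailer hitch:97.95\n' |
  sort -t: -k1,1n |
  awk -F: '{ printf("%3d\t%-20s\t%6.2f\n", $1, $2, $3) }')
echo "$out"
```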

AWK ARRAY
awk allows one-dimensional arrays to store strings or numbers
index can be number or string
array need not be declared
  its size
  its elements
array elements are created when first used
  initialized to 0 or ""

ARRAYS IN AWK

Syntax:
  arrayName[index] = value

Examples:
  list[1] = "one"
  list[2] = "three"
  list["other"] = "oh my !"

ILLUSTRATION: ASSOCIATIVE ARRAYS

awk arrays can use string as index

AWK BUILTIN SPLIT FUNCTION


split(string, array, fieldsep)
  divides string into pieces separated by fieldsep, and stores the pieces in array
  if the fieldsep is omitted, the value of FS is used.

Example:
  split("auto-da-fe", a, "-")
  sets the contents of the array a as follows:
    a[1] = "auto"
    a[2] = "da"
    a[3] = "fe"
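
A short sketch building on the example above. split also returns the number of pieces, which is handy as a loop bound; the second call omits the separator, so the default FS (whitespace) is used. The second string is made up for illustration.

```shell
out=$(awk 'BEGIN {
    n = split("auto-da-fe", a, "-")    # explicit separator
    m = split("one two  three", b)     # separator omitted: FS is used
    print n, a[1], a[3]
    print m, b[3]
}')
echo "$out"
```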

EXAMPLE: PROCESS SALES DATA


input file:
output:
  summary of category sales

ILLUSTRATION: PROCESS EACH INPUT LINE

ILLUSTRATION: PROCESS EACH INPUT LINE

SUMMARY: AWK PROGRAM

EXAMPLE: COMPLETE PROGRAM

% cat sales.awk
{
    deptSales[$2] += $3
}
END {
    for (x in deptSales)
        print x, deptSales[x]
}
% awk -f sales.awk sales
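
The sales input file is not reproduced in this text, so the sketch below runs the same logic on three made-up records (the layout of id, department, and amount is an assumption). The order in which for (x in deptSales) visits keys is unspecified, so the result is piped through sort to make it predictable.

```shell
# Hypothetical records: id, department, amount.
out=$(printf '1 clothing 3141\n1 computers 9161\n2 clothing 1252\n' |
  awk '{ deptSales[$2] += $3 }
       END { for (x in deptSales) print x, deptSales[x] }' |
  sort)
echo "$out"
```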

DELETE ARRAY ENTRY

The delete statement can be used to delete an element from an array.

Format:
  delete array_name [index]

Example:
  delete deptSales["supplies"]
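
A minimal sketch of the effect of delete, using two made-up entries. After the delete, the in operator reports that the key is gone and the for loop no longer visits it.

```shell
out=$(awk 'BEGIN {
    deptSales["supplies"] = 10
    deptSales["clothing"] = 20
    delete deptSales["supplies"]
    if ("supplies" in deptSales)
        print "still there"
    else
        print "gone"
    for (x in deptSales)
        print x, deptSales[x]
}')
echo "$out"
```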

AWK CONTROL STRUCTURES

Conditional
  if-else
Repetition
  for
    with counter
    with array index
  while
  do-while
  also: break, continue

IF STATEMENT
Syntax:
  if (conditional expression)
      statement-1
  else
      statement-2

Example:
  if ( NR < 3 )
      print $2
  else
      print $3

FOR LOOP
Syntax:
  for (initialization; limit-test; update)
      statement

Example:
  for (i = 1; i <= NF; i++)
  {
      total += $i
      count++
  }
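
As a runnable sketch of a counter loop over the fields of each record (the input is made up), the loop below runs from 1 to NF and sums the fields; total is reset at the start of each record.

```shell
out=$(printf '1 2 3\n10 20\n' | awk '{
    total = 0
    for (i = 1; i <= NF; i++)
        total += $i
    print total
}')
echo "$out"
```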

FOR LOOP FOR ARRAYS


Syntax:
  for (var in array)
      statement

Example:
  for (x in deptSales)
  {
      print x, deptSales[x]
  }

WHILE LOOP
Syntax:
  while (logical expression)
      statement

Example:
  i = 1
  while (i <= NF)
  {
      print i, $i
      i++
  }

DO-WHILE LOOP
Syntax:
  do
      statement
  while (condition)

statement is executed at least once, even if condition is false at the beginning

Example:
  i = 1
  do {
      print $0
      i++
  } while (i <= 10)
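
The "at least once" property can be checked with a small sketch. The condition i < 5 is already false when the loop starts (the values are arbitrary), yet the body still runs one time.

```shell
out=$(awk 'BEGIN {
    i = 10
    do {
        print i
        i++
    } while (i < 5)
}')
echo "$out"
```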

LOOP CONTROL STATEMENTS


break
  exits loop
continue
  skips rest of current iteration, continues with next iteration

LOOP CONTROL EXAMPLE


for (x = 0; x < 20; x++) {
    if ( array[x] > 100) continue
    printf "%d ", x
    if ( array[x] < 0 ) break
}
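
A self-contained variant of the same idea, offered as a sketch with made-up values rather than the slide's exact loop. The array is filled with split, values above 100 are skipped with continue, and the first negative value ends the loop with break.

```shell
out=$(awk 'BEGIN {
    n = split("5 200 7 -1 9", arr, " ")   # hypothetical data
    for (x = 1; x <= n; x++) {
        if (arr[x] > 100) continue        # skips 200
        if (arr[x] < 0) break             # stops at -1, so 9 is never reached
        printf "%d ", arr[x]
    }
    print ""
}')
echo "$out"
```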

EXAMPLE: SENSOR DATA


1 Temperature
2 Rainfall
3 Snowfall
4 Windspeed
5 Winddirection

also: sensor readings

Plan: print average readings in descending order

EXAMPLE: SENSOR READINGS

2008-10-01/1/68
2008-10-02/2/6
2007-10-03/3/4
2008-10-04/4/25
2008-10-05/5/120
2008-10-01/1/89
2007-10-01/4/35
2008-11-01/5/360
2008-10-01/1/45
2007-12-01/1/61
2008-10-10/1/32


EXAMPLE: PRINT SENSOR DATA

BEGIN {
    printf("id\tSensor\n")
    printf("----------------------\n")
}
{
    printf("%d\t%s\n", $1, $2)
}

EXAMPLE: PRINT SENSOR READINGS

BEGIN {
    FS="/"
    printf(" Date\t\tValue\n")
    printf("---------------------\n")
}
{
    printf("%s %7.2f\n", $1, $3)
}

EXAMPLE: PRINT SENSOR SUMMARY

BEGIN {
    FS="/"
}
{
    sum[$2] += $3;
    count[$2]++;
}
END {
    for (i in sum) {
        printf("%d %7.2f\n",i,sum[i]/count[i])
    }
}

EXAMPLE: REMAINING TASKS


awk -f sense.awk sensors readings
         Sensor Average
----------------------
  Winddirection  240.00
    Temperature   59.00
      Windspeed   30.00
       Rainfall    6.00
       Snowfall    4.00

Slide callouts: 2 input files, sensor names, sorted

EXAMPLE: PRINT SENSOR AVERAGES

Remaining tasks:
  recognize nature of input data
    use: number of fields in record
  substitute sensor id with sensor name
    use: associative array
  sort readings
    use: sort -gr -k 2

EXAMPLE: SENSE.AWK

NF > 1 {
    name[$1] = $2
}
NF < 2 {
    split($0,fields,"/")
    sum[fields[2]] += fields[3];
    count[fields[2]]++;
}
END {
    for (i in sum) {
        printf("%15s %7.2f\n", name[i],
            sum[i]/count[i]) | "sort -gr -k 2"
    }
}

EXAMPLE: PRINT SENSOR AVERAGES

Remaining tasks:
  Sort
    use: sort -gr
  Substitute sensor id with sensor name
    1. use: join -j 1 sensor-data sensor-averages
    2. within awk

EXAMPLE: SOLUTION 1 (1/3)

#! /bin/bash
trap '/bin/rm /tmp/report-*-$$; exit' 1 2 3
cat << HERE > /tmp/report-awk-1-$$
BEGIN {FS="/"}
{
    sum[\$2] += \$3;
    count[\$2]++;
}
END {
    for (i in sum) {
        printf("%d %7.2f\n", i, sum[i]/count[i])
    }
}
HERE

EXAMPLE: SOLUTION 1 (2/3)

cat << HERE > /tmp/report-awk-2-$$
BEGIN {
    printf(" Sensor Average\n")
    printf("-----------------------\n")
}
{
    printf("%15s %7.2f\n", \$2, \$3)
}
HERE

EXAMPLE: SOLUTION 1 (3/3)

awk -f /tmp/report-awk-1-$$ sensor-readings |
    sort > /tmp/report-r-$$
join -j 1 sensor-data /tmp/report-r-$$ > /tmp/report-t-$$
sort -gr -k 3 /tmp/report-t-$$ |
    awk -f /tmp/report-awk-2-$$
/bin/rm /tmp/report-*-$$

EXAMPLE: OUTPUT

         Sensor Average
----------------------
  Winddirection  240.00
    Temperature   59.00
      Windspeed   30.00
       Rainfall    6.00
       Snowfall    4.00

EXAMPLE: SOLUTION 2 (1/2)


#! /bin/bash
trap '/bin/rm /tmp/report-*$$; exit' 1 2 3
cat << HERE > /tmp/report-awk-3-$$
NF > 1 {
    name[\$1] = \$2
}
NF < 2 {
    split(\$0,fields,"/")
    sum[fields[2]] += fields[3];
    count[fields[2]]++;
}

EXAMPLE: SOLUTION 2 (2/2)

END {
    for (i in sum) {
        printf("%15s %7.2f\n", name[i],
            sum[i]/count[i])
    }
}
HERE
echo "         Sensor Average"
echo "-----------------------"
awk -f /tmp/report-awk-3-$$ sensor-data sensor-readings | sort -gr -k 2
/bin/rm /tmp/report-*$$
