Methodologies for Population/Quantitative Genetics Animal Science 562

AWK Programming
Introduction
“Computer users spend a lot of time doing simple, mechanical data manipulation - changing the format of data, checking its validity, finding items with some property, adding up numbers, printing reports, and the like. All of these jobs ought to be mechanized, but it’s a real nuisance to have to write a special-purpose program in a standard language like C or Pascal each time such a task comes up.

“Awk is a programming language that makes it possible to handle simple, mechanical data manipulation tasks with very short programs, often only one or two lines long. An awk program is a sequence of patterns and actions that tell what to look for in the input data and what to do when it’s found.”

Aho, Kernighan and Weinberger. 1988. “The AWK Programming Language”

1.1 Getting Started
File emp.data contains the name, the pay rate in dollars per hour, and the number of hours worked, with one employee record per line:

    Beth 4.00 0
    Dan 3.75 0
    Kathy 4.00 10
    Mark 5.00 20
    Mary 5.50 22
    Susie 4.25 18

Task: Print the name and pay (rate times hours) for everyone who worked more than zero hours.

Program:

    awk '$3 > 0 { print $1, $2 * $3 }' emp.data

The Structure of an AWK Program

Each AWK program is a sequence of one or more pattern-action statements:

    pattern { action }
    pattern { action }
    ...

Running an AWK Program

Type a command line of the form

    awk 'program' input files

If you omit the input files from the command line,

    awk 'program'

awk will apply the program to whatever you type next on your terminal until you type an end-of-file signal (control-d on UNIX systems).
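You can try the first program without typing the data by hand. The shell sketch below writes the six sample records into emp.data using a here-document. It then runs the program in both of the ways described above: once with the file named on the command line, and once with no file, so that awk reads standard input. The sketch assumes a POSIX shell with awk on the PATH. The pipe from printf stands in for typing a line at the terminal and ending with control-d.

```shell
# Write the sample records (name, rate per hour, hours worked) to emp.data.
cat > emp.data <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

# Input file named on the command line: awk reads emp.data.
awk '$3 > 0 { print $1, $2 * $3 }' emp.data

# No input file named: awk reads standard input instead.
# The pipe plays the part of typing one line and then control-d.
printf 'Kathy 4.00 10\n' | awk '$3 > 0 { print $1, $2 * $3 }'
```

The first awk command should print Kathy 40, Mark 100, Mary 121 and Susie 76.5, one per line. Beth and Dan are skipped because their third field is 0.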

Executing long programs

Put the program in a file and run it with the -f option:

    awk -f progfile optional list of input files

1.2 Simple Output

There are only two types of data in awk: numbers and strings of characters. Awk reads its input one line at a time and splits each line into fields. A field is a sequence of characters that doesn't contain any blanks or tabs. The first field in the current input line is called $1, the second $2, and so forth. The entire line is called $0. The number of fields can vary from line to line.

Print Every Line

    awk '{ print }' emp.data
    or
    awk '{ print $0 }' emp.data

Print Certain Fields

    awk '{ print $1, $3 }' emp.data

NF, the Number of Fields

    awk '{ print NF, $1, $NF }' emp.data

(Any expression can be used after $ to denote a field number, so $NF is the last field on the line.)

Computing and Printing

    awk '{ print $1, $2 * $3 }' emp.data

Printing Line Numbers

    awk '{ print NR, $0 }' emp.data

Putting Text in the Output

    awk '{ print "total pay for", $1, "is", $2 * $3 }' emp.data

(print separates its arguments with a blank, so the quoted strings need no extra spaces of their own.)

1.3 Fancier Output

• The print statement is meant for quick and easy output.
• Use the printf statement to format the output exactly the way you want it.

Lining Up Fields

The printf statement has the form

    printf(format, value1, value2, ..., valuen)

where format is a string that contains text to be printed verbatim, interspersed with specifications of how each of the values is to be printed. A specification is a % followed by a few characters that control the format of a value.

Task: Use printf to print the total pay for every employee.

    awk '{ printf("total pay for %s is $%.2f\n", $1, $2 * $3) }' emp.data

• The format contains two % specifications:
  %s prints the first value, $1, as a string of characters;
  %.2f prints the second value, $2 * $3, as a number with 2 digits after the decimal point.
• No blanks or newlines are produced automatically; you must create them yourself. Don't forget the \n.
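A short shell sketch lets you compare the output programs above side by side. It recreates emp.data with a single blank between fields, as in the listing in 1.1, and then runs the NF/$NF, NR, print and printf programs on it. The sketch assumes a POSIX shell and any POSIX awk. The contrast to notice is that print inserts the blanks and the final newline for you, while printf prints exactly what its format string says and nothing more.

```shell
# Recreate the sample data (name, rate, hours), one blank between fields.
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
    'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > emp.data

# NF is the number of fields on the line, so $NF is the last field.
awk '{ print NF, $1, $NF }' emp.data

# NR is the number of lines read so far, i.e. the current line number.
awk '{ print NR, $0 }' emp.data

# print puts a blank between its arguments and a newline at the end.
awk '{ print "total pay for", $1, "is", $2 * $3 }' emp.data

# printf adds nothing by itself: every blank and the \n come from the format.
awk '{ printf("total pay for %s is $%.2f\n", $1, $2 * $3) }' emp.data
```

For Mary, the last two commands should print "total pay for Mary is 121" and "total pay for Mary is $121.00". This shows the difference between print's default number output and the %.2f specification.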

Task: Print each employee's name and pay, lined up in columns.

    awk '{ printf("%-8s $%6.2f\n", $1, $2 * $3) }' emp.data

• %-8s prints the name left-justified in a field 8 characters wide; %6.2f prints the pay right-justified in a field 6 characters wide, with 2 digits after the decimal point.

Sorting the Output

Task: Print all data for each employee, along with his or her pay, sorted in order of increasing pay.

    awk '{ printf("%6.2f %s\n", $2 * $3, $0) }' emp.data | sort -n

• The | pipes the output of awk into the sort command; -n makes sort compare the leading pay figures as numbers.

1.4 Selection

• Awk patterns are good for selecting interesting lines from the input for further processing.

Selection by Comparison

Task: Use a comparison pattern to select the records of employees who earn $5.00 or more per hour.

    awk '$2 >= 5' emp.data

(A pattern with no action prints every line that matches it.)

Selection by Computation

Task: Print the pay of those employees whose total pay exceeds $50.

    awk '$2 * $3 > 50 { printf("$%.2f for %s\n", $2 * $3, $1) }' emp.data

Selection by Text Content

Task: Print all lines in which the first field is Susie.

    awk '$1 == "Susie"' emp.data

Combinations of Patterns

• Patterns can be combined with parentheses and the logical operators &&, ||, and !, which stand for AND, OR, and NOT, respectively.

Task: Print lines where $2 is at least 4 or $3 is at least 20.

    awk '$2 >= 4 || $3 >= 20' emp.data

Data Validation

• Awk is an excellent tool for checking that data has reasonable values and that it is in the right format.

Task: Use comparison patterns to apply five plausibility tests to each line of emp.data.

    awk 'NF != 3   { print $0, "number of fields is not equal to 3" }' emp.data
    awk '$2 < 3.35 { print $0, "rate is below minimum wage" }' emp.data
    awk '$2 > 10   { print $0, "rate exceeds $10 per hour" }' emp.data
    awk '$3 < 0    { print $0, "negative hours worked" }' emp.data
    awk '$3 > 60   { print $0, "too many hours worked" }' emp.data
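The sketch below runs the lined-up report, the sorted report, the selection patterns and the validation checks on emp.data. It makes two assumptions. First, the five plausibility tests are combined into one program file and run with awk -f; the file name validate.awk is hypothetical. Second, the sample data passes every test, so the sketch writes a hypothetical bad record to bad.data to give the checks something to report.

```shell
# Recreate the sample data (name, rate, hours).
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
    'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > emp.data

# %-8s: name left-justified in 8 columns; %6.2f: pay right-justified in 6.
awk '{ printf("%-8s $%6.2f\n", $1, $2 * $3) }' emp.data

# Pay first, then the whole line; sort -n orders lines by that leading number.
awk '{ printf("%6.2f %s\n", $2 * $3, $0) }' emp.data | sort -n

# Patterns with no action print every matching line.
awk '$2 >= 5' emp.data                  # rate of 5.00 or more
awk '$1 == "Susie"' emp.data            # text match on the first field
awk '$2 >= 4 || $3 >= 20' emp.data      # either condition is enough

# Pattern plus action: only the matching lines reach printf.
awk '$2 * $3 > 50 { printf("$%.2f for %s\n", $2 * $3, $1) }' emp.data

# The five plausibility tests, one pattern-action statement per test.
cat > validate.awk <<'EOF'
NF != 3   { print $0, "number of fields is not equal to 3" }
$2 < 3.35 { print $0, "rate is below minimum wage" }
$2 > 10   { print $0, "rate exceeds $10 per hour" }
$3 < 0    { print $0, "negative hours worked" }
$3 > 60   { print $0, "too many hours worked" }
EOF

awk -f validate.awk emp.data            # prints nothing: every line passes

# Hypothetical bad record: four fields, low rate, too many hours.
printf 'Joe 2.00 70 extra\n' > bad.data
awk -f validate.awk bad.data            # three complaints about the same line
```

Combining the five tests in one file means each line is read once and checked against all of them. Running the separate one-line commands from the handout gives the same messages, but they are grouped by test rather than by line.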

BEGIN and END

The special pattern BEGIN matches before the first line of the first input file is read, and END matches after the last line of the last file has been processed.

Task: Use BEGIN to print a heading.

    awk 'BEGIN { print "Name Rate Hours"; print "" } { print }' emp.data

• You can put several statements on a single line if you separate them by semicolons.

1.5 Computing with AWK

• In awk, user-created variables are not declared. A variable that has not been assigned yet starts out as 0 (or as the empty string when used as text).

Counting

Task: Use a variable emp to count employees who have worked more than 15 hours.

    $3 > 15 { emp = emp + 1 }
    END     { print emp, "employees worked more than 15 hours" }

Computing Sums and Averages

Task: Use the built-in variable NR to count the number of employees.

    awk 'END { print NR, "employees" }' emp.data

Task: Compute the average pay. (Note: this is a multiple-line program; put it in a file and run it with awk -f progfile emp.data.)

        { pay = pay + $2 * $3 }
    END { print NR, "employees"
          print "total pay is", pay
          print "average pay is", pay / NR
        }

Handling Text

• One strength of awk is its ability to handle strings of characters as conveniently as most languages handle numbers.

Task: Find the employee who is paid the most per hour.

    $2 > maxrate { maxrate = $2; maxemp = $1 }
    END          { print "highest hourly rate:", maxrate, "for", maxemp }

String Concatenation

Task: Create new strings by combining old ones.

        { names = names $1 " " }
    END { print names }
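The section 1.5 programs can all run in a single pass over the data. The sketch below first runs the BEGIN heading one-liner. It then collects the counting, sum and average, maximum-rate and concatenation programs into one program file and runs it with awk -f, following the note on multiple-line programs. The file name summary.awk is hypothetical, and the sketch assumes a POSIX shell and awk. None of the variables emp, pay, maxrate, maxemp and names is declared; each starts out as 0 or the empty string.

```shell
# Recreate the sample data (name, rate, hours).
printf '%s\n' 'Beth 4.00 0' 'Dan 3.75 0' 'Kathy 4.00 10' \
    'Mark 5.00 20' 'Mary 5.50 22' 'Susie 4.25 18' > emp.data

# BEGIN runs before the first line is read; { print } then echoes every line.
awk 'BEGIN { print "Name Rate Hours"; print "" } { print }' emp.data

# The counting, sum/average, maximum and concatenation programs in one file.
cat > summary.awk <<'EOF'
$3 > 15      { emp = emp + 1 }                # count: emp starts out as 0
             { pay = pay + $2 * $3 }          # running total of pay
$2 > maxrate { maxrate = $2; maxemp = $1 }    # highest rate seen so far
             { names = names $1 " " }         # append the name and a blank
END {
    print emp, "employees worked more than 15 hours"
    print NR, "employees"
    print "total pay is", pay
    print "average pay is", pay / NR
    print "highest hourly rate:", maxrate, "for", maxemp
    print names
}
EOF

awk -f summary.awk emp.data
```

With the sample data, the summary should report 3 employees over 15 hours, 6 employees in all, a total pay of 337.5, an average pay of 56.25, and Mary as the highest paid per hour. The names line ends with a trailing blank, because a blank is appended after every name.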

Built-in Functions

• Awk provides built-in variables that maintain frequently used quantities, such as the number of fields (NF) and the input line number (NR).
• Awk provides built-in functions for computing square roots (sqrt), logarithms (log), and random numbers (rand).

1.6 Control-Flow Statements

(Note: These constructs are available in gawk and not awk at ISU.)

If-Else Statement

The following program computes the total and average pay of employees paid more than $6.00 per hour. The if-else avoids dividing by zero when no employee qualifies.

    $2 > 6 { n = n + 1; pay = pay + $2 * $3 }
    END     { if (n > 0)
                  print n, "employees, total pay is", pay, "average pay is", pay / n
              else
                  print "no employees are paid more than $6/hour"
            }

While Statement

Task: Show how the value of an amount of money invested at a particular interest rate grows over a number of years, using the formula $\text{value} = \text{amount} \times (1 + \text{rate})^{\text{years}}$.

    # interest1 - compute compound interest
    #   input:  amount rate years
    #   output: compounded value at the end of each year

    { i = 1
      while (i <= $3) {
          printf("\t%.2f\n", $1 * (1 + $2) ^ i)
          i = i + 1
      }
    }

Try

    gawk -f interest1

and then type the input lines

    1000 .06 5
    1000 .12 5

References

Aho, A. V., B. W. Kernighan, and P. J. Weinberger. 1988. The AWK Programming Language. Addison-Wesley, New York.

Dougherty, D. 1990. Sed and Awk: UNIX Power Tools. O'Reilly and Associates, Inc., California.
