
Perl Basics

A Perl Tutorial NLP Course - 2006

What is Perl?

- Practical Extraction and Report Language
- Interpreted language
- Optimized for string manipulation and file I/O
- Full support for regular expressions

Running Perl Scripts

Windows

- Download ActivePerl from ActiveState
- Just run the script from a 'Command Prompt' window

UNIX / Cygwin

- Put the following in the first line of your script: #!/usr/bin/perl
- Run the script: % perl script_name

Basic Syntax

- Statements end with a semicolon (;)
- Comments start with #
- Only single-line comments
- You don't have to declare a variable before you access it
- You don't have to declare a variable's type
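A minimal sketch of the rules above. It adds `use strict` and `use warnings` and declares variables with `my`; the slides point out that Perl does not require this, so treat those lines as an optional (but common) habit rather than part of the syntax. The point shown is that one value can be used as a string or a number without any type declaration.

```perl
#!/usr/bin/perl
# Comments start with '#' and run to the end of the line.
use strict;    # optional: requires variables to be declared with 'my'
use warnings;  # optional: warns about suspicious code

my $text   = "42";        # a string...
my $sum    = $text + 1;   # ...used as a number: no type declaration needed
my $joined = $text . 1;   # ...and the number 1 used as a string
print "$sum $joined\n";   # prints "43 421"
```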

Variables

Scalars and Identifiers

Identifiers

- A variable name
- Case sensitive

Scalar

- A single value (string or numerical)
- Accessed by prefixing an identifier with '$'
- Assignment with '=': $scalar = expression
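A short sketch of scalar assignment and case-sensitive identifiers. The variable names are illustrative, and `use strict`/`my` are an optional addition the slides do not require.

```perl
use strict;
use warnings;

my $name = "Perl";     # a string scalar
my $Name = 5.8;        # a different variable: identifiers are case sensitive
my $year = 2006;       # a numeric scalar
$year = $year + 1;     # '=' assigns the value of any expression
print "$name $Name $year\n";   # prints "Perl 5.8 2007"
```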

Strings

Quoting Strings

- With ' (apostrophe): everything is interpreted literally
- With " (double quotes): variables get expanded
- With ` (backtick): the text is executed as a separate process, and the output of the command is returned as the value of the string
Check 01_printDate.pl
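A sketch contrasting the first two quoting styles. The backtick case is left as a comment because its output depends on the operating system; `date` is assumed to exist only on UNIX-like systems.

```perl
use strict;
use warnings;

my $lang   = "Perl";
my $single = 'Hello $lang\n';   # apostrophes: $lang and \n are kept literally
my $double = "Hello $lang\n";   # double quotes: $lang is expanded, \n is a newline
print $single, "\n";            # prints: Hello $lang\n
print $double;                  # prints: Hello Perl
# Backticks run a command and return its output, e.g. on a UNIX-like system:
# my $today = `date`;
```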

Comparison Operators
String   Arithmetic   Operation
lt       <            less than
gt       >            greater than
eq       ==           equal to
le       <=           less than or equal to
ge       >=           greater than or equal to
ne       !=           not equal to
cmp      <=>          compare: return -1, 0 or 1 (left is less than, equal to, or greater than right)
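The two operator families can disagree on the same values, which is the usual reason to pick one deliberately. A small sketch: string operators compare character by character, arithmetic operators compare numerically.

```perl
use strict;
use warnings;

my $num_less = (10 < 9) ? "yes" : "no";        # numeric: "no"
my $str_less = ("10" lt "9") ? "yes" : "no";   # "1" sorts before "9": "yes"
my $num_cmp  = 10 <=> 9;                       # 1
my $str_cmp  = "10" cmp "9";                   # -1
my $same     = ("1.0" == 1) ? "yes" : "no";    # numerically equal: "yes"
my $diff     = ("1.0" eq "1") ? "yes" : "no";  # but different strings: "no"
print "$num_less $str_less $num_cmp $str_cmp $same $diff\n";
```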

Logical Operators
Operator    Operation
||, or      logical or
&&, and     logical and
!, not      logical not
xor         logical xor
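A sketch of the logical operators. Two details worth knowing: `||` and `&&` return the last operand they evaluated (not just 1 or 0), and the word forms `or`, `and`, `not` bind more loosely than the symbol forms, even more loosely than `=`. The `open ... or die` line is only a commented illustration; `$file` is a hypothetical name.

```perl
use strict;
use warnings;

my $have_input = 1;
my $is_empty   = 0;
my $ok       = $have_input && !$is_empty;              # both true: 1
my $either   = $is_empty || "default";                 # first true operand: "default"
my $one_only = ($have_input xor $is_empty) ? 1 : 0;    # exactly one is true: 1
# The low-precedence 'or' suits error handling, e.g.:
#   open(my $fh, '<', $file) or die "Cannot open $file: $!";
print "$ok $either $one_only\n";
```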

String Operators
Operator    Operation
.           string concatenation
x           string repetition
.=          concatenation and assignment

    $string1 = "potato";
    $string2 = "head";
    $newstring = $string1 . $string2;    # "potatohead"
    $newerstring = $string1 x 2;         # "potatopotato"
    $string1 .= $string2;                # "potatohead"

Check concat_input.pl

Perl Functions

- Perl functions are identified by their unique names (print, chop, close, etc.)
- Function arguments are supplied as a comma-separated list in parentheses
  - The commas are necessary
  - The parentheses often are not
- Be careful! You can write some nasty and unreadable code this way!

Check 02_unreadable.pl
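A sketch of why optional parentheses need care. The `join`/`sort` lines give the same result either way, but `int` without parentheses takes the whole `7.9 + 0.5` as its argument, because `+` binds more tightly than a named unary operator such as `int`. The sample values are arbitrary.

```perl
use strict;
use warnings;

# Parentheses around the arguments are optional...
my $with    = join(", ", sort("pear", "apple", "fig"));
my $without = join ", ", sort "pear", "apple", "fig";   # same result
# ...but leaving them out can change how an expression is grouped:
my $grouped   = int(7.9) + 0.5;   # int applied to 7.9 only: 7.5
my $ungrouped = int 7.9 + 0.5;    # parsed as int(7.9 + 0.5): 8
print "$with | $without | $grouped | $ungrouped\n";
```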

Lists

- Ordered collection of scalars
- Zero-indexed (first item in position 0)
- Elements addressed by their positions
- (): list constructor
- , : element separator
- []: take slices (single or multiple element chunks)
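A sketch of the list notation above: building a list with `()` and `,`, indexing from zero, and slicing with `[]`. The values are arbitrary.

```perl
use strict;
use warnings;

my @colors = ("red", "green", "blue", "yellow");   # () builds the list, ',' separates
my $first  = $colors[0];                 # zero-indexed: "red"
my @middle = @colors[1, 2];              # a slice: ("green", "blue")
my ($left, $right) = ("x", "y");         # a list on the left assigns element by element
my $third  = ("a", "b", "c")[2];         # a literal list can be indexed too: "c"
print "$first @middle $third\n";         # prints "red green blue c"
```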

List Operators

Function               Operation
sort(LIST)             a new list, the sorted version of LIST (string order by default)
reverse(LIST)          a new list, the reverse of LIST
join(EXPR, LIST)       a string version of LIST, delimited by EXPR
split(PATTERN, EXPR)   a list of the portions of EXPR that lie between matches of PATTERN

Check 03_listOps.pl
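A sketch chaining the four list functions. It also shows that plain `sort` compares as strings, so numbers need a comparison block such as `{ $a <=> $b }`.

```perl
use strict;
use warnings;

my @words    = split(/,/, "pear,apple,fig");   # ("pear", "apple", "fig")
my @sorted   = sort(@words);                   # ("apple", "fig", "pear")
my @reversed = reverse(@sorted);               # ("pear", "fig", "apple")
my $line     = join(" ", @reversed);           # "pear fig apple"
# sort compares as strings by default; pass a block for numeric order
my @by_string = sort(10, 9, 100);                 # (10, 100, 9)
my @by_number = sort { $a <=> $b } (10, 9, 100);  # (9, 10, 100)
print "$line | @by_string | @by_number\n";
```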

Arrays

- A named list
- Dynamically allocated, can be saved
- Zero-indexed
- Shares list operations, and adds to them
- @: refers to the array (or a portion of it, with [])
- $: refers to a single element (used with [])
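A sketch of the two sigils in use, plus two array conveniences not on the slide: a negative index counts from the end, and `$#array` is the index of the last element (one less than the count).

```perl
use strict;
use warnings;

my @days = ("Mon", "Tue", "Wed", "Thu", "Fri");
my $second = $days[1];        # '$' + [] : one element -> "Tue"
my @mid    = @days[1 .. 3];   # '@' + [] : a slice -> ("Tue", "Wed", "Thu")
my $last   = $days[-1];       # negative index counts from the end -> "Fri"
my $count  = scalar(@days);   # 5
my $top    = $#days;          # index of the last element -> 4
$days[7] = "Sun";             # assigning past the end grows the array to 8 elements
```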

Array Operators

Function                 Operation
push(@ARRAY, LIST)       add the LIST to the end of @ARRAY
pop(@ARRAY)              remove and return the last element of @ARRAY
unshift(@ARRAY, LIST)    add the LIST to the front of @ARRAY
shift(@ARRAY)            remove and return the first element of @ARRAY
scalar(@ARRAY)           return the number of elements in @ARRAY
Check 04_arrayOps.pl
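A sketch that exercises each array operation once, including `unshift`, treating the array as a queue that can grow and shrink at both ends.

```perl
use strict;
use warnings;

my @queue = ("b", "c");
push(@queue, "d", "e");      # add to the end:   (b, c, d, e)
unshift(@queue, "a");        # add to the front: (a, b, c, d, e)
my $last  = pop(@queue);     # removes and returns "e"
my $first = shift(@queue);   # removes and returns "a"
my $size  = scalar(@queue);  # 3 elements left: (b, c, d)
print "$first $last $size @queue\n";   # prints "a e 3 b c d"
```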

Associative Arrays - Hashes

- Arrays indexed on arbitrary string values
- Key-value pairs
- Use the "key" to find the element that has the "value"
- %: refers to the hash
- {}: denotes the key
- $: the value of the element indexed by the key (used with {})
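A sketch of the hash sigils. It initializes the hash from a list of pairs written with `=>`, which works like a comma and makes the pairing easier to read. The country data is only illustrative.

```perl
use strict;
use warnings;

my %capital = ("France" => "Paris", "Japan" => "Tokyo");   # '%' names the whole hash
$capital{"Peru"} = "Lima";            # '$' + {} : one value, selected by its key
my $city  = $capital{"Japan"};        # "Tokyo"
my $count = scalar(keys(%capital));   # 3 key-value pairs
print "$city $count\n";
```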

Hash Operators

Function              Operation
keys(%HASH)           return a list of all the keys in %HASH
values(%HASH)         return a list of all the values in %HASH
each(%HASH)           return the next key-value pair of %HASH (used to iterate through it)
delete($HASH{KEY})    remove the key-value pair associated with KEY from %HASH
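A sketch of the four hash operations. Because `keys`, `values` and `each` return pairs in no particular order, the sketch sorts its results before printing them.

```perl
use strict;
use warnings;

my %age = ("ann" => 31, "bob" => 27, "cy" => 45);
delete($age{"bob"});                       # remove the "bob" pair
my @names = sort(keys(%age));              # keys come back unordered, so sort them
my $total = 0;
$total += $_ for values(%age);             # 31 + 45
my @lines;
while (my ($name, $years) = each(%age)) {  # one key-value pair per iteration
    push(@lines, "$name=$years");
}
@lines = sort(@lines);
print "@names $total @lines\n";            # prints "ann cy 76 ann=31 cy=45"
```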

Arrays Example
    #!/usr/bin/perl
    # Simple list operations

    # Address an element in the list
    @stringInstruments = ("violin", "viola", "cello", "bass");
    @brass = ("trumpet", "horn", "trombone", "euphonium", "tuba");
    $biggestInstrument = $stringInstruments[3];
    print("The biggest instrument: ", $biggestInstrument, "\n");

    # Join elements at positions 0, 1, 2 and 4 into a white-space delimited string
    print("Orchestral brass: ", join(" ", @brass[0,1,2,4]), "\n");

    @unsorted_num = ('3', '5', '2', '1', '4');
    @sorted_num = sort(@unsorted_num);    # Sort the list
    print("Numbers (sorted, 1-5): ", join(" ", @sorted_num), "\n");

    # Add a few more numbers
    @numbers_10 = @sorted_num;
    push(@numbers_10, ('6', '7', '8', '9', '10'));
    print("Numbers (1-10): ", join(" ", @numbers_10), "\n");

    # Remove the last (prints the removed element, 10)
    print("Removed the last: ", pop(@numbers_10), "\n");
    # Remove the first (prints the removed element, 1)
    print("Removed the first: ", shift(@numbers_10), "\n");

    # Count the elements ($#numbers_10 would give the last index, one less)
    print("Count elements (2-9): ", scalar(@numbers_10), "\n");
    print("What's left (numbers 2-9): ", join(" ", @numbers_10), "\n");

Hashes Example
    #!/usr/bin/perl
    # Simple hash operations
    $player{"clarinet"} = "Susan Bartlett";
    $player{"bassoon"} = "Andrew Vandesteeg";
    $player{"flute"} = "Heidi Lawson";
    $player{"oboe"} = "Jeanine Hassel";
    @woodwinds = keys(%player);
    @woodwindPlayers = values(%player);

    # Who plays the oboe?
    print("Oboe: ", $player{'oboe'}, "\n");

    $playerCount = scalar(@woodwindPlayers);
    print("Number of players: ", $playerCount, "\n");

    while (($instrument, $name) = each(%player)) {
        print("$name plays the $instrument\n");
    }

Pattern Matching

A pattern is a sequence of characters to be searched for in a character string.

Match operators

- /pattern/ : the pattern to search for
- =~ : tests whether a pattern is matched
- !~ : tests whether a pattern is not matched
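A sketch of the two match operators. It also shows that a bare `/pattern/` with no `=~` is tested against the default variable `$_`, which the `$&` example later in these slides relies on.

```perl
use strict;
use warnings;

my $sentence = "The quick brown fox";
my $has_fox = ($sentence =~ /fox/) ? 1 : 0;   # =~ : true when the pattern matches -> 1
my $no_dog  = ($sentence !~ /dog/) ? 1 : 0;   # !~ : true when it does NOT match -> 1
$_ = $sentence;
my $bare = /quick/ ? 1 : 0;                   # no =~ : the pattern is tested against $_ -> 1
print "$has_fox $no_dog $bare\n";
```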

Patterns
Pattern       Matches
/def/         "define" (def anywhere in the string)
/\bdef\b/     a def word (def as a whole word)
/^def/        def at the start of the string
/^def$/       a string consisting only of def
/de?f/        df, def
/d[eE]f/      def, dEf
/d[^eE]f/     daf, dzf
/d.f/         dif
/d.+f/        dabcf
/d.*f/        df, daffff
/de{1,3}f/    def, deef, deeef
/de{3}f/      deeef
/de{3,}f/     deeef, deeeeef
/de{0,3}f/    df up to deeef
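A sketch that checks a few rows of the table, plus some near misses. It stores each pattern with `qr//`, which builds a reusable pattern object; that operator is not covered in these slides and is used here only to keep the test data in one list.

```perl
use strict;
use warnings;

# Each pair is [pattern, string]; all of these should match...
my @should_match = (
    [qr/\bdef\b/, "a def word"],
    [qr/^def$/,   "def"],
    [qr/de?f/,    "df"],
    [qr/d[^eE]f/, "dzf"],
    [qr/d.+f/,    "dabcf"],
    [qr/de{3,}f/, "deeeeef"],
);
# ...and none of these should.
my @should_not = (
    [qr/\bdef\b/, "define"],    # "def" is followed by a word character
    [qr/^def/,    "undef"],     # def is not at the start
    [qr/d.f/,     "df"],        # '.' needs exactly one character
    [qr/de{3}f/,  "deef"],      # exactly three e's required
);
my $ok = 0;
for my $case (@should_match) { $ok++ if $case->[1] =~ $case->[0] }
for my $case (@should_not)   { $ok++ if $case->[1] !~ $case->[0] }
print "$ok of ", @should_match + @should_not, " checks passed\n";
```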

Character Ranges
Escape Sequence   Pattern           Description
\d                [0-9]             Any digit
\D                [^0-9]            Anything but a digit
\w                [_0-9A-Za-z]      Any word character
\W                [^_0-9A-Za-z]     Anything but a word character
\s                [ \r\t\n\f]       White space
\S                [^ \r\t\n\f]      Anything but white space
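A sketch pulling runs of each character class out of one sample string. It uses the `g` option from the Pattern Matching Options slide: in list context, a `/.../g` match returns every match. The sample record is made up.

```perl
use strict;
use warnings;

my $record = "id_42:  7 apples";
my @digits = ($record =~ /\d+/g);   # runs of digits: ("42", "7")
my @words  = ($record =~ /\w+/g);   # runs of word characters: ("id_42", "7", "apples")
my @gaps   = ($record =~ /\s+/g);   # runs of white space: ("  ", " ")
my $has_punct = ($record =~ /[^\w\s]/) ? 1 : 0;   # ':' is neither: 1
print "@digits | @words | ", scalar(@gaps), " | $has_punct\n";
```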

Backreferences
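- Memorize the matched portion of the input by using parentheses.
- Inside the pattern, \1 refers to whatever the first parenthesized group matched. In

    /[a-z]+(.)[a-z]+\1[a-z]+/

  \1 is what was matched by (.), so the same separator must appear twice.
  - Matches: asd-eeed-sdsa, sd-sss-ws
  - Does NOT match: as_eee-dfg
- The memorized text can also be accessed immediately after the pattern is matched, as $1, $2, and so on.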
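A sketch using the slide's pattern: inside the pattern the group is `\1`, and after a successful match the same text is in `$1`. The last line shows that a match in list context returns all its groups at once; the date string is only an example.

```perl
use strict;
use warnings;

my $sep = "";
if ("asd-eeed-sdsa" =~ /[a-z]+(.)[a-z]+\1[a-z]+/) {
    $sep = $1;    # after the match, $1 holds the text that (.) matched: "-"
}
my $rejected = ("as_eee-dfg" =~ /[a-z]+(.)[a-z]+\1[a-z]+/) ? 0 : 1;   # no repeated separator: 1
# Groups are numbered by their opening parenthesis; list context returns them all
my ($year, $month) = ("2006-09" =~ /(\d+)-(\d+)/);   # ("2006", "09")
print "$sep $rejected $year $month\n";
```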

Pattern Matching Options


Option   Description
g        Match all possible occurrences of the pattern
i        Ignore case
x        Ignore white space in the pattern

Options go after the closing delimiter, e.g. /pattern/gi.
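A sketch of each option on a made-up string. With `x`, the spaces inside the date pattern are ignored, so it means the same as `/^\d{4}-\d{2}-\d{2}$/` but is easier to read.

```perl
use strict;
use warnings;

my $text = "Perl, perl and PERL";
my $any_case = ($text =~ /perl/i) ? 1 : 0;   # i: ignore case, so "Perl" matches
my @all      = ($text =~ /perl/gi);          # g: every match -> ("Perl", "perl", "PERL")
my $date_ok  = ("2006-09-15" =~ /^ \d{4} - \d{2} - \d{2} $/x) ? 1 : 0;   # x: spaces ignored
print "$any_case @all $date_ok\n";
```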

Substitutions

Substitution operator

s/pattern/substitution/options

If $string = "abc123def" (each line below starts from that value):

    $string =~ s/123/456/       Result: "abc456def"
    $string =~ s/123//          Result: "abcdef"
    $string =~ s/(\d+)/[$1]/    Result: "abc[123]def"   (use of a backreference!)
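A sketch of the `g` option with substitution, on a made-up string with two digit runs. It also uses the fact that `s///` returns the number of substitutions it made.

```perl
use strict;
use warnings;

my $string = "abc123def456";
my $first_only = $string;
$first_only =~ s/\d+/#/;               # without g: only the first run -> "abc#def456"
my $all = $string;
$all =~ s/\d+/#/g;                     # with g: every run -> "abc#def#"
my $copy = $string;
my $n = ($copy =~ s/(\d+)/[$1]/g);     # returns the number of substitutions: 2
print "$first_only $all $copy $n\n";   # $copy is now "abc[123]def[456]"
```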

Predefined Read-only Variables


$&   the part of the string that matched the regular expression
$`   the part of the string before the part that matched
$'   the part of the string after the part that matched

EXAMPLE

    $_ = "this is a sample string";
    /sa.*le/;    # matches "sample" within the string
    # $` is now "this is a "
    # $& is now "sample"
    # $' is now " string"

Because these variables are set on each successful match, you should save the values elsewhere if you need them later in the program.

The split and join Functions


The split function takes a regular expression and a string, and looks for all occurrences of the regular expression within that string. The parts of the string that don't match the regular expression are returned in sequence as a list of values.

The join function takes a list of values and glues them together with a glue string between each list element.

Split Example

    $line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
    @fields = split(/:/, $line);   # split $line, using : as delimiter
    # now @fields is ("merlyn", "", "118", "10", "Randal",
    #                 "/home/merlyn", "/usr/bin/perl")

Join Example

    $bigstring = join($glue, @list);

For example, to rebuild a line of the password file, try something like:

    $outline = join(":", @fields);
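A sketch of the split/join round trip on the password-file line above. It also shows a detail that can break the round trip: `split` drops trailing empty fields unless it is given a negative limit as a third argument.

```perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
my @fields  = split(/:/, $line);     # 7 fields; the empty second field is kept
my $home    = $fields[5];            # "/home/merlyn"
my $rebuilt = join(":", @fields);    # glue the fields back together
print $rebuilt eq $line ? "round trip ok\n" : "mismatch\n";
# Trailing empty fields are dropped unless a negative limit is passed:
my @short = split(/:/, "a:b::");       # ("a", "b")
my @full  = split(/:/, "a:b::", -1);   # ("a", "b", "", "")
```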

String - Pattern Examples


A Simple Example

    #!/usr/bin/perl
    print("Ask me a question politely:\n");
    $question = <STDIN>;
    # What about a capital P in "Please"?
    if ($question =~ /please/) {
        print("Thank you for being polite!\n");
    } else {
        print("That was not very polite!\n");
    }
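One way to answer the "capital P" question is the `i` option. This sketch wraps the same test in a hypothetical helper, `reply_to`, so it can be checked without reading from STDIN.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the reply the script above would print.
sub reply_to {
    my ($question) = @_;
    if ($question =~ /please/i) {    # i: "Please" and "PLEASE" also count
        return "Thank you for being polite!";
    }
    return "That was not very polite!";
}
print reply_to("Please pass the salt?\n"), "\n";
```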

String Pattern Example


    #!/usr/bin/perl
    print("Enter a variable name:\n");
    $varname = <STDIN>;
    chomp($varname);    # remove the trailing newline
    # Try asd$asdas... It gets accepted! (the patterns are not anchored)
    if ($varname =~ /\$[A-Za-z][_0-9a-zA-Z]*/) {
        print("$varname is a legal scalar variable\n");
    } elsif ($varname =~ /\@[A-Za-z][_0-9a-zA-Z]*/) {
        print("$varname is a legal array variable\n");
    } elsif ($varname =~ /[A-Za-z][_0-9a-zA-Z]*/) {
        print("$varname is a legal file variable\n");
    } else {
        print("I don't understand what $varname is.\n");
    }
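A sketch of one fix for the `asd$asdas` problem: anchor each pattern with `^` and `$` so the whole input, not just part of it, has to match. The helper name `classify` and its return labels are hypothetical; the character classes are the ones from the example.

```perl
use strict;
use warnings;

# Hypothetical helper: the same checks, anchored so the WHOLE name must match.
sub classify {
    my ($varname) = @_;
    return "scalar" if $varname =~ /^\$[A-Za-z][_0-9a-zA-Z]*$/;
    return "array"  if $varname =~ /^\@[A-Za-z][_0-9a-zA-Z]*$/;
    return "file"   if $varname =~ /^[A-Za-z][_0-9a-zA-Z]*$/;
    return "unknown";
}
print classify('asd$asdas'), "\n";   # prints "unknown"
```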