Learning Perl
Randal L. Schwartz, merlyn@stonehenge.com
Version 4.1.2 on 27 Nov 2010

This document is copyright 2010 by Randal L. Schwartz, Stonehenge Consulting Services, Inc.

1
Overview

• Anything you miss is in the book


• Including more examples
• Some “rounding off the corners” in here
• Book contains footnotes

2
History
• Created by Larry Wall in 1987
• Version 4 released in 1991, along with Camel book
• And sysadmins rejoiced!
• Version 5 released in 1994 (complete internal rewrite)
• Helped propel emerging “www” into interactivity
• Version 6 proposed in 2001 (in progress)
• Far from dead
• Continual development on new 5.x releases
• More CPAN uploads than ever
• Perl jobs listings strong

3
Philosophy
• Easy to solve easy problems (“scripting”)
• Possible to solve hard problems (“programming”)
• Many ways to say the same thing
• But with different emphasis
• Shortcuts for common operations
• Why say “can not” when you can say “can’t”?
• No reserved words
• Prefix characters distinguish data from built-ins

4
Community
• perl.com and perl.org
• Mailing lists at lists.perl.org
• Perlmonks
• StackOverflow
• IRC channels (including entire IRC network)
• Local user groups (“Perl Mongers”) listed at www.pm.org
• Conferences (OSCON and smaller regionals)
• CPAN (module sharing unlike any other project)

5
Program Syntax
• Plain text file
• Whitespace mostly insignificant
• Most statements delimited by semicolons (not newline!)
• Comment is pound-sign to end of line
• Unix execution typical of scripting languages:
#!/usr/bin/perl
print "Hello world!\n";

6
Scalars

7
Numbers
• Numbers are always floating point
• Numbers represented sanely:
2 3.5 -5.01 4.25e15
• Numbers can be non-decimal:
0777 0xFEEDBEEF 0b11001001
• Numbers take traditional operators and parens:
2 + 3, 2 * 5, 3 * (4 + 5)

8
Strings

• Sequence of characters (any bytes, any length)


• String literals can be single quoted:
'foo' 'longer string' 'don\'t leave home without it!'
• Double-quoted strings get backslash escapes:
"hello\n" "coke\tsprite"
• Double-quoted strings also interpolate variables (later)

9
String operators

• Concatenation:
'Fred' . ' ' . 'Flintstone'
• Replication:
'hello' x 3
'hello' x 3.5

10
Scalar conversion
• Numbers can be used as strings:
(13 + 4) . ' monkeys'
• Strings can be used as numbers:
'12' / 3
'12fred' / 3
'fred' / 3
• Trailing garbage ignored, completely garbage = 0
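A small sketch of the conversions above. The variable names are made up for illustration, and `no warnings 'numeric'` is there only so the demo runs quietly:

```perl
use strict;
use warnings;
no warnings 'numeric';   # silence "isn't numeric" so the demo stays quiet

my $label   = (13 + 4) . ' monkeys';  # number used as a string
my $clean   = '12' / 3;               # string used as a number
my $partial = '12fred' / 3;           # trailing garbage ignored
my $garbage = 'fred' / 3;             # no leading number at all: treated as 0

print "$label\n";     # 17 monkeys
print "$clean\n";     # 4
print "$partial\n";   # 4
print "$garbage\n";   # 0
```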

11
Warnings
• Add to your program:
use warnings;
• Or on shebang line:
#!/usr/bin/perl -w
• Questionable actions noted
• Doesn’t change execution behavior though
• Example:
"12fred" + 15 # warning about not strictly numeric
• Even more info:
use diagnostics;

12
Scalar variables
• Dollar plus one or more letters, digits, underscore:
$a $b $fred $FRED
• Case matters
• Lowercase for locals, capital for globals
• Perl doesn’t care—just a convention
• Typically get value through assignment:
$a = 3;
$b = $a + 2;
$b = $b * 4; # same as "$b *= 4"

13
Simple output

• print takes a comma-separated list of values:


$x = 3 + 4;
print "Hello, the answer is ", $x, "\n";
• Double-quoted strings are variable interpolated:
print "Hello, the answer is $x\n";

14
Comparison operators
• Numeric comparison:
== != < > <= >=
• String comparison:
eq ne lt gt le ge
• Needed to distinguish:
'12' lt '3'
'12' < '3'
• Use math symbols with numbers, words for words
• Common mistake:
$x == 'foo'
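To see why both operator families are needed, here is a minimal sketch comparing the same two values both ways (variable names are illustrative only):

```perl
use strict;
use warnings;

# String comparison goes character by character: "1" sorts before "3".
my $as_strings = ('12' lt '3') ? 'true' : 'false';

# Numeric comparison converts both sides to numbers first: 12 is not < 3.
my $as_numbers = ('12' < '3') ? 'true' : 'false';

print "'12' lt '3' is $as_strings\n";   # true
print "'12' <  '3' is $as_numbers\n";   # false
```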

15
Simple if
• Choice based on a comparison:
if ($x > 3) {
print "x is greater than 3\n";
} else {
print "x is less than or equal to 3\n";
}
• “else” part is optional
• Braces are required (called “blocks”)
• Prevents the “dangling else” issue from C

16
Boolean logic
• Comparisons can be stored:
$is_bigger = $x > 3;
if ($is_bigger) { ... }
• What’s in $is_bigger? “true” or “false”
• False: 0, empty string, '0', or undef (described later)
• True: everything else
• Can’t count on specific return value for “true”
• It'll be some true value though
• Use “!” for not:
if (! $is_bigger) { ... }

17
Simple user input
• Use the line-input function:
$x = <STDIN>;
• Includes the newline, typically removed:
chomp($x);
• Combine those two:
chomp($x = <STDIN>);
• Parens are needed on that

18
Simple loops
• Use “while”:
$count = 0;
while ($count < 10) {
$count += 2;
print "count is now $count\n"; # gives 2 4 6 8 10
}
• Controlled by boolean expression
• Braces required
• May execute 0 times if boolean is initially false

19
The undef value
• Initial value for all variables
• Treated like 0 for math, empty string for strings
• Great for looping accumulators:
$n = 1;
while ($n < 10) {
$sum += $n; # undef initially acts like 0
$n += 2;
}
print "The total was $sum.\n";

20
Detecting undef
• Detect undef with defined()
• Often useful with “error” returns:
$line = <STDIN>; # might be at eof
if (defined $line) {
print "got a line, it was $line";
} else {
print "at eof!\n";
}

21
Lists and Arrays

22
List element access
• Use element access to initialize a list:
$fred[0] = "yabba";
$fred[1] = "dabba";
$fred[2] = "doo";
• This is not $fred! Different namespace!
• Use these like individual scalar variables:
print "$fred[0]\n";
$fred[1] .= "whatsis";

23
Computing the element
• Subscript expression can be any integer-yielding value:
$n = 2;
print $fred[$n]; # prints $fred[2]
print $fred[$n / 3]; # truncates 2/3 to 0, thus $fred[0]
• Maximum element index available:
$max = $#fred; # $max = 2
$last = $fred[$#fred]; # always last element

24
Going out of bounds
• Out-of-bound subscripts return undef:
defined $fred[17]; # false
• Assigning beyond end of array just stretches it:
$fred[17] = "hello";
• Negative subscripts count back from end:
$fred[-1] # last element
$fred[-2] # second to last element

25
List literals
• List of scalars, separated by commas, enclosed in parens:
(1, 2, 3) # same as (1, 2, 3, )
("fred", 4.5)
( ) # empty list
(1..100) # same as (1, 2, 3, ... up to ... , 98, 99, 100)
(0..$#fred) # all indices
• Quoted words:
qw(fred barney betty wilma)
qw[fred barney betty wilma]
• Works with “paired” delimiters
• Or duplicate non-paired delimiters:
qw|foo bar|

26
List assignment

• Corresponding values are copied:


($fred, $barney, $dino) = ("flintstone", "rubble", undef);
• Too short on right? Extras get undef:
($fred, $barney, $dino) = qw(flintstone rubble);
• Too short on left? Extras are ignored

27
“All of the” shortcut
• Imagine assigning to consecutive array elements:
($rocks[0], $rocks[1], $rocks[2], $rocks[3]) =
qw(talc mica feldspar quartz);
• Simpler: “all of the”
@rocks = qw(talc mica feldspar quartz);
• Previous value is always completely erased
• Can also be used on right side of assignment:
($a, $b, $c, $d) = @rocks;
@his_rocks = @rocks;
@her_rocks = ("diamond", @rocks, "emerald");

28
Array operations
• Remove end of array:
@numbers = (1..10);
$final = pop @numbers; # $final gets 10
• Add to end of array:
push @numbers, 10..15;
• Add to beginning of array:
unshift @numbers, -10..0;
• Remove from beginning of array:
$minus_ten = shift @numbers;
• pop and shift are destructive, removing single element
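The four operations above, run in sequence as a sketch, with comments tracing what the array should hold after each step:

```perl
use strict;
use warnings;

my @numbers = (1..10);
my $final = pop @numbers;          # removes 10; @numbers is now 1..9
push @numbers, 10..15;             # @numbers is now 1..15
unshift @numbers, -10..0;          # @numbers is now -10..15
my $minus_ten = shift @numbers;    # removes -10; @numbers is now -9..15

print "final=$final first_removed=$minus_ten\n";
print "now holds ", scalar(@numbers), " elements: $numbers[0] .. $numbers[-1]\n";
```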

29
Array interpolation
• Single elements act like scalars:
@rocks = qw(flintstone slate rubble);
print "barney $rocks[2]\n"; # barney rubble\n
• “all of the” inserts spaces:
print "got @rocks\n"; # got flintstone slate rubble\n
• Beware email addresses in double quotes:
print "My email address is merlyn@stonehenge.com\n";
• Precede @ with \ to avoid interpolation

30
Using foreach
• Simplest way to walk a list:
foreach $rock (qw(bedrock slate lava)) {
print "One rock is $rock\n";
}
• $rock is set to each element in the list in turn
• Any outer $rock is unaffected
• Assigning to $rock affects the element:
@rocks = qw(bedrock slate lava);
foreach $rock (@rocks) { $rock = "hard $rock" }
• Leaving variable off uses $_
foreach (@rocks) { $_ = "hard $_" }
• Aside: $_ is often the default for many operations

31
List operations
• Reverse:
@rocks = qw(bedrock slate rubble granite);
@reversed = reverse @rocks;
@rocks = reverse @rocks;
• Sort (stringwise):
@rocks = sort @rocks;
@rocks = reverse sort @rocks;
• Default sort is not numeric:
@result = sort 97..102; # 100, 101, 102, 97, 98, 99
• Numeric and other user-defined sorts covered later

32
Scalar and list context
• Important to recognize when Perl needs which:
42 + something # looking for a scalar
sort something # looking for a list
• ... because some things return different values
@people = qw(fred barney betty);
@sorted = sort @people; # @people returns elements
$number = 42 + @people; # @people returns 3 (count)
• Even assignment itself has context:
@copy = @people; # list assignment (elements)
$count = @people; # scalar assignment (count)
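A short sketch of how the same array behaves in each context. The `($first)` line is an extra case not on the slide: parentheses on the left side make it a list assignment.

```perl
use strict;
use warnings;

my @people = qw(fred barney betty);

my @copy    = @people;        # list context: the elements
my $count   = @people;        # scalar context: the number of elements
my ($first) = @people;        # parens on the left make it a list assignment
my $total   = 42 + @people;   # addition wants a scalar, so @people gives 3

print "count=$count first=$first total=$total\n";   # count=3 first=fred total=45
```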

33
List context <STDIN>
• Scalar context: one line at a time, undef at EOF
• List context: all remaining lines:
@lines = <STDIN>;
• Kill those newlines:
chomp(@lines = <STDIN>);
• Once read, we’re at EOF
• No more use of <STDIN> in that invocation
• Makes an entire list in memory
• Not good for 4GB web logs

34
Subroutines

35
Define and invoke
• Define with “sub”:
sub marine {
$n += 1;
print "Hello, sailor number $n!\n";
}
• Invoke in an expression with & in front of name:
&marine; # Hello, sailor number 1!
&marine; # Hello, sailor number 2!

36
Return values
• Last expression evaluated is return value:
sub add_a_to_b {
print "hey, I was invoked!\n";
$a + $b;
}
$a = 3; $b = 4; $c = &add_a_to_b;
• Not necessarily textually last:
sub bigger_of_a_or_b {
if ($a > $b) { $a } else { $b }
}

37
Arguments
• Values passed in parens get assigned to @_
$n = &max(10, 15);
sub max {
if ($_[0] > $_[1]) { $_[0] } else { $_[1] }
}
• Note that @_ has nothing to do with $_
• @_ is automatically local to the subroutine
• Perl doesn’t care if you pass too many or too few

38
Private sub vars
• Use “my” to create your own local vars:
sub max {
my ($x, $y);
($x, $y) = @_; # copy args to $x, $y
if ($x > $y) { $x } else { $y }
}
• Variables declared in my() are local to block
• Steps can be combined:
my($x, $y) = @_;
• Typical first line of a subroutine
• Or maybe:
my $x = shift; my $y = shift;

39
Variable arg lists
• Just respond to all of @_:
sub max {
my $best = shift;
foreach $next (@_) {
if ($next > $best) { $best = $next }
}
$best;
}
• Now it works for 15 args, 2 args, 1 arg, even 0 args!
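A runnable sketch of the same subroutine with a few calls. The only change is `my $next`, added so it compiles under `use strict`; the sample arguments are arbitrary:

```perl
use strict;
use warnings;

sub max {
    my $best = shift;                  # first argument, or undef if none
    foreach my $next (@_) {            # remaining arguments
        if ($next > $best) { $best = $next }
    }
    $best;                             # last expression is the return value
}

my $largest = &max(3, 17, 5, 11);
my $single  = &max(42);
my $nothing = &max();

print "largest=$largest single=$single\n";           # largest=17 single=42
print "empty call gives undef\n" unless defined $nothing;
```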

40
“my” not just for subs
• Lexical vars (introduced with my()) work in any block
foreach (1..10) {
my $square = $_ * $_;
print "$_ squared is $square\n";
}
• Or any file:
my $value = 10;
...
print $value; # 10
• But why use that last one?

41
“use strict”
• Ensure declared variable names:
use strict; # at top of file
• Now all user-defined vars must be “introduced”
my $x = 3;
$x; # ok
$y; # not ok... no "my" in scope
{ my $z;
$z; # ok
}
$z; # not ok (out of scope)
• Helps catch typos
• Use in every program over 10 lines

42
Early subroutine exit
• Break out of a subroutine with “return”
• Sets the return value as a side effect
my @names = qw(fred barney betty dino wilma);
my $result = &which_element_is("dino", @names);
sub which_element_is {
my($what, @array) = @_;
foreach (0..$#array) { # indices of @array's elements
if ($what eq $array[$_]) { return $_ }
}
-1; # not found
}
• Most legacy code has no explicit returns

43
Persistent but local
• Back to first example:
use 5.010; # state needs Perl 5.10 or later
sub marine {
state $n = 0; # initial value
$n += 1;
print "Hello, sailor number $n!\n";
}
• Introduced in 5.10
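A minimal runnable sketch of the persistent counter. As an assumption for easier checking, this variant returns the greeting instead of printing it inside the subroutine:

```perl
use strict;
use warnings;
use 5.010;          # enables the "state" keyword

sub marine {
    state $n = 0;   # initialized once, then kept between calls
    $n += 1;
    return "Hello, sailor number $n!";
}

print &marine(), "\n";   # Hello, sailor number 1!
print &marine(), "\n";   # Hello, sailor number 2!
```

Unlike the first version of marine, $n here is invisible to the rest of the program, so nothing else can reset or clobber it.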

44
Simple I/O

45
Reading to EOF
• Reading a line:
chomp(my $line = <STDIN>);
• Reading all lines, one at a time:
while (defined(my $line = <STDIN>)) {
do something with $line;
}
• Shortcut for defined($_ = <STDIN>)
while (<STDIN>) { ... }
• Very common
• Don’t confuse with:
foreach (<STDIN>) { ... }

46
Filters
• Act like a Unix filter
• Read from files named on command line
• Write to standard output
• Any “-” arg, or no args at all: read standard input
• Perl uses <> (diamond) for this:
while (<>) {
# one line is in $_
}
• Invoke like a filter:
./myprogram fred barney betty

47
@ARGV
• Really a two step process
• Args copied into @ARGV at startup
• <> looks at current @ARGV
• Common to alter @ARGV before diamond:
if ($ARGV[0] eq "-v") {
$verbose = 1; shift;
}
while (<>) { ... } # now process lines

48
Formatting Output
• Use printf:
printf "hello %s, your password expires in %d days!\n",
$user, $expires;
• First arg defines constant text, and denotes parameters
• Remaining args provide parameter values
• All the standard conversions
• s(tring), f(loat), g(eneral), d(ecimal), e(xponential)
• Conversions are generally formed as:
% [-] [width] [.precision] type
• Negative width is left-justify; missing width is “minimal”
• Double up the % to get a literal one: %%
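A few conversions in action, as a sketch. It uses sprintf (the built-in that returns the formatted string rather than printing it) so the results can be stored; the sample values are arbitrary:

```perl
use strict;
use warnings;

my $name_col  = sprintf "[%-10s]", "fred";      # left-justified in 10 columns
my $right_col = sprintf "[%10s]",  "fred";      # right-justified in 10 columns
my $money     = sprintf "%.2f",    3.14159;     # two digits after the point
my $percent   = sprintf "%d%%",    75;          # doubled % gives a literal %

print "$name_col\n";    # [fred      ]
print "$right_col\n";   # [      fred]
print "$money\n";       # 3.14
print "$percent\n";     # 75%
```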

49
Filehandles
• Name for input or output connection
• No prefix character
• STDIN, STDOUT, STDERR, DATA, ARGV, ARGVOUT
• Open to connect:
open MYHANDLE, "<", "inputfile";
• Direction can be:
"<" for read, ">" for write, ">>" for append
• Returns success:
my $opened = open ...;
if (! $opened) { ... }
• Close to disconnect:
close MYHANDLE;

50
die and warn
• What if opening is essential?
• Use die to abort early:
if (! $opened) { die "cannot open inputfile: $!"; }
• Sends diagnostic to Standard Error
• Adds filename and line number of error
• Use $! to indicate text of system-related error
• Not enough for fatal? Just warn:
if ($n < 100) { warn "needed 100 units!\n" }

51
Using filehandles
• Use line-input operator with your handle name:
my $one_line = <MYHANDLE>;
while (<MYHANDLE>) { ... }
• Add handle to print or printf:
print THATHANDLE "hello, world!\n";
printf STDERR "%s: %s\n", $0, $reason;
• Default filehandle is STDOUT unless you select one:
my $old = select MYHANDLE;
print "this goes to MYHANDLE\n";
select $old; # restore it

52
say
• Introduced in 5.10 to make things easier:
# in 5.8 and earlier:
print "hello, world!\n";
# in 5.10 and later:
use 5.010;
say "hello, world!";
• Newline automatically added
• Saves at least 4 characters of typing!

53
Hashes

54
Overview
• Mapping from key to value
• Keys are unique strings
• Values are any scalar (including undef)
• Useful for:
• Aggregating data against many items
• Mapping things from one domain to another
• Guaranteeing uniqueness
• Scales well, even for large datasets

55
Hash element access
• Like array element, but with {} instead of []
$family_name{"fred"} = "flintstone";
$family_name{"barney"} = "rubble";
• Access the elements:
foreach my $first_name (qw(fred barney)) {
my $family_name = $family_name{$first_name};
print "That's $first_name $family_name for ya!\n";
}

56
“All of the” hash
• Use % for hash like @ for array:
my %family_name = qw(fred flintstone barney rubble);
• Key/value pairs initialize elements in the hash
• Unwind in a list context:
my @values = %family_name;
my %new_hash = %family_name; # copy
my %given_name = reverse %family_name;
• Use “big arrow” for clarity:
my %family_name = (
fred => "flintstone", barney => "rubble",
"bamm-bamm" => "rubble", dino => undef,
);

57
Hash operations
• keys/values access... the keys and values:
my %data = (a => 1, b => 2, c => 3);
my @keys = keys %data; # some permutation of (a, b, c)
my @values = values %data; # corresponding (1, 2, 3)
• Don’t change hash between keys and values!
• Order is unpredictable (like unwinding in list context)
• Thus, often combined with sort:
foreach my $key (sort keys %data) {
my $value = $data{$key};
... do something with $key and $value
}

58
each(%hash)
• Efficient walking of a hash
• Each call to each() returns the “next” key/value pair
• If all pairs are used up, returns empty list
• Typically used in loop:
while (my($key, $value) = each %somehash) { ... }
• Note list assignment in scalar (boolean) context here
• Internal order again (like flatten or keys or values)

59
Typical hash usage
• Bedrock library tracks numbers of books checked out
• Key in hash = “has a library card”
• Value in hash = “number of books”
• Could be undef (never used card)
• Has some books checked out?
if ($books{"fred"}) { ... }
• Has used their card?
if (defined $books{"barney"}) { ... }

60
exists and delete
• How do we say “has a card”?
• Value is undef whether there is no card or an unused card
• Use exists:
if (exists $books{"dino"}) { ... }
• True if any key matches selected key
• Revoke library card with delete:
delete $books{"slate"};
• Removes key if it exists
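The three tests (true value, defined, exists) can be lined up in one place. This is only a sketch: card_status is a hypothetical helper, and the sample data is invented for the example.

```perl
use strict;
use warnings;

my %books = (fred => 3, barney => 0, wilma => undef);

# Hypothetical helper: classify one patron using the three tests.
sub card_status {
    my ($person) = @_;
    if (! exists $books{$person})     { return "no card" }
    elsif (! defined $books{$person}) { return "card never used" }
    elsif (! $books{$person})         { return "nothing checked out" }
    else                              { return "has books checked out" }
}

foreach my $person (qw(fred barney wilma dino)) {
    print "$person: ", card_status($person), "\n";
}

delete $books{wilma};   # revoke the card
print "wilma now: ", card_status("wilma"), "\n";   # wilma now: no card
```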

61
Counting things
• Count the words:
my %count;
while (<>) {
chomp;
$count{$_} += 1;
}
foreach my $word (sort keys %count) {
print "$word was seen $count{$word} times\n";
}
• $count{$_} += 1 creates the initial entry automatically
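The slide's loop counts whole lines, which works when the input has one word per line. A sketch for lines holding several words, assuming an in-memory list as a stand-in for <> and using split (covered later) to break each line apart:

```perl
use strict;
use warnings;

# Stand-in for lines that would normally come from <>.
my @lines = ("fred barney fred\n", "wilma fred barney\n");

my %count;
foreach my $line (@lines) {
    foreach my $word (split ' ', $line) {   # split on whitespace, newline included
        $count{$word} += 1;                 # first use starts from undef, acting as 0
    }
}

foreach my $word (sort keys %count) {
    print "$word was seen $count{$word} times\n";
}
```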

62
The %ENV

• Your %ENV reflects your process environment:


foreach my $key (sort keys %ENV) {
print "$key=$ENV{$key}\n";
}
• Setting values affects child processes:
$ENV{PATH} .= ":/some/additional/place";

63
Regular Expressions

64
Overview
• Patterns that match strings
• Many examples in Unix
grep 'flint.*stone' somefile
• Not a filename match (“glob”)
• Most common use in Perl:
$_ = "yabba dabba do";
if (/abba/) { print "It matched!\n" }

65
Metacharacters
• Most characters match themselves
• /i/ matches “i”, and /2/ matches “2”
• Period matches any single character except newline
• /bet.y/ matches “betty” “betsy” “bet.y”
• Regular expressions understand double-quote things
• /coke\tsprite/
• Backslash also removes specialness
• /3\.14159/

66
Simple quantifiers
• Star means “0 or more” of preceding item
• /fred\t*barney/
• /fred.*barney/
• Plus means “1 or more” of preceding item
• /fred\t+barney/
• Question mark means “0 or 1” of preceding item
• /bamm-?bamm/

67
Parens in patterns
• Quantifiers apply to smallest item before them
• /fred+/ matches “freddddd” not “fredfred”
• Use parens to group items
• /(fred)+/ matches “fredfred” not “fredddd”
• Parens also establish backreferences
• \1 matches another copy of the first paren pair:
/(.)\1/ # same character doubled up
• Multiple backreferences permitted:
/(.)(.)\2\1/ # matches “abba” “acca” “bddb” “aaaa”
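A quick way to try both backreference patterns against a few words. The helper names has_double and has_mirror are hypothetical, chosen for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping the two patterns from the slide.
sub has_double { my ($word) = @_; return $word =~ /(.)\1/      ? 1 : 0 }
sub has_mirror { my ($word) = @_; return $word =~ /(.)(.)\2\1/ ? 1 : 0 }

foreach my $word (qw(abba aaaa abab bookkeeper)) {
    printf "%-10s doubled=%d mirror=%d\n",
        $word, has_double($word), has_mirror($word);
}
# abba and aaaa match both; abab matches neither;
# bookkeeper has doubled letters but no four-character mirror
```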

68
Relative backreferences

• Introduced in 5.10
• \g{N} counts from beginning if N is positive
• But counts relatively backwards if N is negative
• Generalizing the “abba” pattern:
• /.... (.)(.)\g{-1}\g{-2} ..../
• Other parens ahead of this won’t break it

69
Alternatives
• Vertical bar gives choice:
/fred|barney|betty/
• Low precedence
• Often used with parens:
/fred (and|or) barney/
• Careful! These aren’t the same:
/fred( +|\t+)barney/ # locked in spaces or tabs
/fred( |\t)+barney/ # any combination of spaces and tabs

70
Character classes
• List of characters delimited by []
• Matches only one character in string
• Often used with quantifiers
• List every character (order doesn’t matter):
[abcwxyz]
• Use “-” for ranges:
[abcw-z]
[a-zA-Z0-9]
• Negate with initial caret:
[^0-9] # everything except digits
[^\n] # everything except newline, same as “.”

71
Class shortcuts
• [0-9] is same as \d (“digits”)
/HAL-\d+/
• [^0-9] is same as \D
• [a-zA-Z0-9_] is same as \w (“word” characters)
• Often used as /\w+/
• And \W is the opposite of that
• [\f\t\n\r ] is the same as \s (“space”)
• \S is non-space
• Frequently used for parsing:
/(\S+)\s+(\S+)/ # two data items separated by whitespace
• Perl 5.10 adds:
\h for [\t ], \v for [\f\n\r], \R for “linebreak”

72
Using Regular Expressions

73
Alternate delimiters
• /foo/ is actually m/foo/
• But “m” can take other delimiters like qw():
m(hello) m[fred|barney] m!this!
• Balanced delimiters nest properly:
m(fred (and|or) barney)
• Otherwise, you can always backslash the terminator:
m!foo\!bar!
• Use alternate delimiters to avoid escaping forward slash:
/http:\/\// # works, but ugly
m{http://} # much nicer

74
Modifiers
• Case insensitive matching:
print "Would you like to play a game? ";
chomp($_ = <STDIN>);
if (/yes/i) { # case-insensitive match
print "In that case, I recommend that you go bowling.\n";
}
• Match newlines with period:
/Barney.*Fred/s
• Combine the modifiers in any order
/yes.*no/is or /yes.*no/si

75
More readable regex
• Add “x” modifier to ignore most whitespace:
/-?\d+\.?\d*/ # what is this
/ -? \d+ \.? \d* /x # a little better
• Spaces and tabs no longer match themselves
• Unless escaped or inside a character class
• More common to use \s+ instead
• Pound-sign to end of line is also a comment:
/ -? # optional prefix
\d+ # some digits
\.? # optional period
\d* # optional digits
/x # closing of regex with “x”

76
Anchors
• In absence of anchors, regex float from left to right
• Caret anchors to beginning:
/^fred/ # only match fred at beginning of string
• Dollar anchors to end:
/fred$/ # only match fred at end of string
• Use both to ensure entire string is matched:
/^fred$/
• Common mistake: validation without anchor:
if (/\d+/) { # only digits allowed
• Wrong! Allows “foo34bar”
• Better: if (/\D/) { # has a non digit, fail
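A sketch contrasting the unanchored and anchored checks. The helper all_digits is hypothetical; it uses the /^...$/ form from this slide. Note that $ also tolerates a single trailing newline, so chomp input first if that matters.

```perl
use strict;
use warnings;

# Hypothetical helper: true only when the whole string is one or more digits.
sub all_digits {
    my ($text) = @_;
    return $text =~ /^\d+$/ ? 1 : 0;
}

foreach my $candidate ("1234", "foo34bar", "") {
    my $unanchored = $candidate =~ /\d+/      ? "accepts" : "rejects";
    my $anchored   = all_digits($candidate)   ? "accepts" : "rejects";
    print "'$candidate': unanchored $unanchored, anchored $anchored\n";
}
```

Only the anchored version rejects “foo34bar”; both reject the empty string because \d+ needs at least one digit.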

77
Word boundaries
• Match at edge (beginning or ending) of word with “\b”:
/\bfred\b/ # matches fred but not frederick or manfred
• Words are defined as things that match /\w+/
• Thus, in “That's a word boundary!”, the words are:
That s a word boundary
• Note the “s” is by itself as a separate word
• This means /\bcan\b/ will match in “can’t stop it!”
• Also available: “not word boundary” /\B/

78
Binding operator

• So far, matches are against $_


• Use =~ to bind to another value instead:
my $some_other = <STDIN>;
if ($some_other =~ /rubble/) { ... }
• Note this isn’t an assignment
• Merely says “don’t look at $_, look over here”

79
Interpolating patterns
• Sometimes, pieces of the regex come from operations:
my $what = "Larry";
while (<>) {
if (/^($what)/) {
print "We saw $what in beginning of $_\n";
}
}
• The value “Larry” becomes part of the regex
• Replace first line to get pattern from arguments:
my $what = shift;
• Pattern is parenthesized to permit “fred|barney”
• Ill-formed patterns are fatal exceptions

80
Match variables
• Backreferences are available after a successful match
• \1 in the regex maps to $1 as a read-only variable:
$_ = "Hello there, neighbor";
if (/\s(\w+),/) {
print "the word was $1\n"; # the word was there
}
if (/(\S+) (\S+), (\S+)/) {
print "words were $1 $2 $3\n";
}
• Failed matches do not reset the memories
• Always check success first!

81
Noncapturing parens

• Parens used for precedence also trigger memory:


if (/(\S+) (and|or) (\S+)/) { ... } # value in $1 and $3
• To avoid triggering memory, change (...) to (?:...):
if (/(\S+) (?:and|or) (\S+)/) { ... } # value in $1 and $2
• Especially handy when accessed from a distance
• But see also “named captures”...

82
Named captures
• New in 5.10
• Instead of mapping to $1, $2, $3, map to $+{name}:
if (/(?<name1>\S+) (?:and|or) (?<name2>\S+)/) {
print "$+{name1} and $+{name2}\n";
}
• Now it’s relatively safe from maintenance trouble
• Names can even be “out of order”
• And \g{name1} is a backreference to (?<name1>...)
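A runnable sketch of named captures and a named backreference. The sample strings and the capture name “letter” are made up for the example:

```perl
use strict;
use warnings;
use 5.010;   # named captures need Perl 5.10 or later

my $text = "fred and barney";
my ($left, $right);
if ($text =~ /(?<name1>\S+) (?:and|or) (?<name2>\S+)/) {
    ($left, $right) = ($+{name1}, $+{name2});
    print "left=$left right=$right\n";   # left=fred right=barney
}

# \g{letter} refers back to whatever (?<letter>...) matched.
my $has_double = "bookkeeper" =~ /(?<letter>.)\g{letter}/ ? 1 : 0;
print "doubled letter found\n" if $has_double;
```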

83
Automatic Match Vars
• A regex matching a string divides the string into 3 parts:
• The part that actually matched (available in $&)
• The part before the part that matched (available in $`)
• The part after the part that matched (available in $')
• Example test harness:
my $regex = "Fred|Barney";
my $text = "I saw Barney with Fred today!";
if ($text =~ $regex) {
print "($`)($&)($')\n"; # delimits pieces with parens
}

84
Generalized quantifiers
• Using *, +, and ? for quantifiers frequently covers it
• Sometimes, you want “5 through 15” though:
/a{5,15}/
• Two numbers in braces indicate lower and upper bound
• A single number means “exactly that many”:
• /c{7}/
• /,{5}chameleon/
• Single number followed by comma: “that many or more”:
/(fred){3,}/
• Thus, * is {0,} and + is {1,} and ? is {0,1}
• But if you type those instead, people look funny at you

85
Regex precedence
• In precedence from highest to lowest:
(...) (?:...) (?<label>...) # parens for grouping/backrefs
a* a+ a? a{n} a{n,m} # quantifiers
abc ^a a$ # anchors and sequence
a|b|c # alternation
a [abc] \d \1 # atoms
• Like in math, in absence of parens, highest goes first
• Surprising example:
/^fred|barney$/ # should be /^(fred|barney)$/

86
Processing with regular expressions

87
Substitutions
• s/old/new/ looks for regex /old/ in $_, replacing with new
$_ = "He's out bowling with Barney tonight.";
s/Barney/Fred/; # Replace Barney with Fred
print "$_\n";
• Nothing happens if match fails
• Regex match triggers normal backreference variables:
s/with (\w+)/against $1's team/;
print "$_\n";
• s/// returns true if replacement succeeds:
$_ = "fred flintstone";
if (s/fred/wilma/) { ... }

88
Global changes
• Normally, first matching wins, and we’re done
• Add “g” option for “global”:
$_ = "home, sweet home!";
s/home/cave/g; # “cave, sweet cave!”
• Transform to canonical whitespace:
s/\s+/ /g;
• Additionally, the “i”, “x”, and “s” options can be used

89
Alternate delimiters
• Just like qw() and m(), we can pick other delimiters
• If a non-paired delimiter, appears three times:
s#^https://#http://#;
• If a paired delimiter, use two pairs:
s{fred}{barney};
s{fred}[barney];
s<fred>#barney#;

90
Case shifting
• Easy to capitalize or lowercase a string
• Uppercase remaining replacement with \U
$_ = "I saw Barney with Fred.";
s/(fred|barney)/\U$1/gi;
• Lowercase with \L
• Case shifting continues until \E or end of replacement:
s/(\w+) with (\w+)/\U$2\E with $1/i;
• Lowercase versions \l and \u affect only next character:
s/(fred|barney)/\u$1/ig;
• Combine them for “initial cap”:
s/(fred|barney)/\u\L$1/ig;
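The case-shifting escapes applied to copies of a string, as a sketch. It borrows the (my $copy = $original) =~ s/.../.../ idiom so the original stays untouched; the variable names are illustrative:

```perl
use strict;
use warnings;

my $text = "I saw Barney with Fred.";

(my $shout = $text) =~ s/(fred|barney)/\U$1/gi;
(my $swap  = $text) =~ s/(\w+) with (\w+)/\U$2\E with $1/i;
(my $fixed = "fRED and bARNEY") =~ s/(fred|barney)/\u\L$1/ig;

print "$shout\n";   # I saw BARNEY with FRED.
print "$swap\n";    # I saw FRED with Barney.
print "$fixed\n";   # Fred and Barney
```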

91
The split function
• Break a string by delimiter:
my @fields = split /separator/, $string;
my @things = split /:/, "abc:def:g:h";
• Adjacent delimiter matches create empty values:
my @things = split /:/, "abc:def::g:h";
• .... unless the delimiter matches it all at once:
my @things = split /:+/, "abc:def::g:h";
• Leading empty fields are kept; trailing ones discarded:
my @things = split /:/, ":::a:b:c:::";
• Default args are split on whitespace in $_:
while (<>) { my @words = split; ... }
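Counting the fields makes the empty-field rules easier to see. A sketch using the same strings as above, with illustrative array names:

```perl
use strict;
use warnings;

my @plain    = split /:/,  "abc:def:g:h";     # 4 fields
my @with_gap = split /:/,  "abc:def::g:h";    # 5 fields, one of them empty
my @squeezed = split /:+/, "abc:def::g:h";    # 4 fields, "::" counts as one delimiter
my @edges    = split /:/,  ":::a:b:c:::";     # 6 fields: three leading empties kept

print scalar(@plain), " ", scalar(@with_gap), " ",
      scalar(@squeezed), " ", scalar(@edges), "\n";   # 4 5 4 6
print join("|", @edges), "\n";                        # |||a|b|c
```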

92
The join function

• Reversing split, but without a regex, just a string:


my $whole = join $delimiter, @pieces;
my $x = join ":", 4, 6, 8, 10, 12;
• Might not need the glue at all:
my @words = qw(Fred);
my $output = join ", ", @words;

93
List-context match
• m// in scalar context returns true/false
• m// in list context returns ordered backreferences:
$_ = "Hello there, neighbor!";
my($first, $second, $third) = /(\S+) (\S+), (\S+)/;
print "$second is my $third\n";
• m//g in list context returns every match:
my @words = /(\w+)/g;
• This is like a split, but you say what to keep, not discard

94
Non-greedy quantifiers
• Quantifiers normally “go long” then “back off”:
"fred and barney weigh more than just barney!"
=~ /fred.*barney/ # matches nearly entire string
• Append “?” to quantifier to make it lazy:
"fred and barney weigh more than just barney!"
=~ /fred.*?barney/ # matches first barney, not last
• Can be more efficient
• Can’t be the default for compatibility reasons
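Capturing the matched text shows the difference directly. A sketch with parentheses added around each pattern so a list-context match returns what was matched:

```perl
use strict;
use warnings;

my $text = "fred and barney weigh more than just barney!";

my ($greedy) = $text =~ /(fred.*barney)/;    # runs to the last barney
my ($lazy)   = $text =~ /(fred.*?barney)/;   # stops at the first barney

print "greedy: $greedy\n";   # fred and barney weigh more than just barney
print "lazy:   $lazy\n";     # fred and barney
```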

95
Multi-line text

• Typically, matches are against one line


• Could slurp an entire file into a variable:
my $text = join "", <MYHANDLE>; # not efficient
• The ^ and $ anchors bind to entire string by default
• Use “m” option to add “and any embedded line”:
if ($text =~ /^wilma\b/im) { ... }

96
Updating many files
• Editing text files “in place” requires a rename dance
• Configure diamond to do this automatically:
@ARGV = qw(my list of files here);
$^I = ".bak"; # appended after opening
while (<>) { # read lines from old version
s/foo/bar/g; # make changes
print; # print to new copy of file
}
• Even do this from the command line:
perl -pi.bak -e 's/foo/bar/g' my list of files here

97
More Control Structures

98
unless
• Reverses “if”
if (! ($n > 10)) { ... }
unless ($n > 10) { ... }
• Useful to avoid that “empty if” strategy:
if ($n > 10) {
# do nothing
} else {
... do this ...
}
• Please don’t use “unless .. else”
• ...forbidden in Perl6 anyway

99
until
• Reverses “while”
while (! ($j > $i) ) {
$j *= 2;
}
until ($j > $i) {
$j *= 2;
}
• Just another way of saying it, sometimes clearer

100
Expression modifiers
• Simplify structures that have single-expression bodies
• Turn them “inside-out”:
if ($n < 0) { print "$n is a negative number.\n" }
print "$n is a negative number.\n" if $n < 0;
• Note the decrease in punctuation
• Doesn’t change execution order: just syntax
• Also works for other kinds:
print "invalid input" unless &valid($input);
$i *= 2 until $i > $j;
print " ", ($n += 2) while $n < 10;
&greet($_) foreach @person;

101
The naked block
• Defines a syntax boundary, useful for lexical (“my”) vars:
{
print "Enter a number: ";
chomp(my $n = <STDIN>);
my $root = sqrt $n; # square root
print "The square root of $n is $root.\n";
}
• Now $root and $n won’t pollute the neighboring code

102
The elsif clause
• Multi-way if statements:
if (first expression) {
... first expression is true ...
} elsif (second expression) {
... second expression is true ...
} elsif (third expression) {
... third expression is true ...
} else {
... none of them were true ...
}
• Only one block will be executed
• Note the spelling: not “elseif” or “else if”. Larry’s rules.

103
Autoincrement
• ++ adds one to a variable:
$a++;
• Can appear before or after the variable
• After means “value is used before increment”
my $old = $n++;
• Before means “value is used after increment”
print "we got here ", ++$n, " times\n";
• Similarly, “--” reduces by one.

104
The for loop
• Like C/Java/Javascript:
for (initializer; test; increment) {
... body ...
}
• Equivalent:
initializer; while (test) {
... body ...;
increment;
}

105
Using for
• Typically used for computed iterations:
for ($i = 1; $i <= 10; $i++) {
print "I can count to $i!\n";
}
• Note that $i is not local to the loop here
• To make it local, declare it:
for (my $i = 1; ...
• Increment doesn’t have to be 1, or even numeric:
for ($_ = "bedrock"; s/(.)//; ) {
print "one character is $1\n";
}

106
foreach and for
• The foreach loop walks a variable through a list
• The for loop computes a sequence on the fly
• And yet, the words themselves can be interchanged
for $n (1..10) { ... } # foreach loop
foreach ($n = 1; $n <= 10; $n++) { # for loop
• Perl figures out which you mean by syntax
• “for” saves 4 characters when used for “foreach”
• Most Perl loops are typically foreach

107
The last function
• Breaks out of a loop early
• Similar to “break” in C
• Jumps out of one level of loop:
while (<>) {
for (split) {
last if /#/;
print "$_ ";
}
print "\n";
}
• Loop is while, until, for, foreach, or naked block

108
The next function
• Skips the remaining processing on this iteration
• Similar to “continue” in C
• Example:
while (<>) {
next if /^#/;
for (split) {
print "$_\n";
}
}

109
The redo function
• Jumps upward in current iteration:
foreach (@words) {
print "please type $_: ";
if ("$_\n" ne <STDIN>) {
print "sorry, that isn’t it!\n";
$errors++;
redo;
}
}
• Like last and next, works with innermost block

110
Labeled blocks
• How do you get to an outer block? Name it!
LINE: while (<>) {
WORD: for (split) {
last LINE if /__END__/;
$word_count++;
}
$line_count++;
}
• Use label with last/next/redo to say “this loop”
• Perl Best Practices recommends naming all loops
• Larry recommends using nouns for names (as above)

111
The ternary ?: operator
• Like if/then/else, but within an expression
my $absolute = $n > 0 ? $n : -$n;
• Guaranteed to “short circuit”: skips unneeded expression:
my $average = $n ? ($sum/$n) : "n/a";
• Don’t use in place of full if/then/else:
$some_expression ? &do_this() : &do_that();
• If you’re not using the return value, likely bad idea

112
Logical operators
• Logical “and” is &&: both sides true for true result
• Logical “or” is ||: either side true for true result
• Short-circuit... stops when we know the answer
• “true || something” never needs to look at “something”
• “false && something” ditto
• Returns the last expression evaluated
my $last_name = $last_name{$first} || "no last name";
• Can also be spelled out (“and” “or”) for low precedence
• Perl 5.10 introduces // “defined or”:
my $last_name = $last_name{$first} // "no last name";
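The difference between || and // only shows up when the left side is defined but false. A small sketch, using a made-up %goals hash and two hypothetical helper subs:

```perl
use strict;
use warnings;
use 5.010;    # the // operator needs Perl 5.10 or later

# barney's 0 is real data; dino has no entry at all
my %goals = (fred => 3, barney => 0);

# || falls back whenever the left side is false (0, "", "0", or undef)
sub goals_with_or         { my $name = shift; return $goals{$name} || "unknown" }

# // falls back only when the left side is undef
sub goals_with_defined_or { my $name = shift; return $goals{$name} // "unknown" }

for my $name (qw(fred barney dino)) {
    printf "%-6s  ||: %-7s  //: %s\n",
        $name, goals_with_or($name), goals_with_defined_or($name);
}
```

With ||, barney's legitimate score of 0 is reported as "unknown"; with //, only dino's missing entry is.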

113
Short-circuit controls
• Short circuiting can be exploited (obfuscated):
if ($next > $best) { $best = $next }
$next > $best and $best = $next;
unless ($m > 10) { print "m is too small" }
$m > 10 or print "m is too small";
• Don’t do this, except for one case:
open MYHANDLE, "<", "somefile"
or die "cannot open somefile: $!";
• That’s not only sanctioned... it’s recommended!

114
Perl Modules

115
Installing modules
• Core vs CPAN
• Check if installed with “perldoc Module::Name”
• Install individual MakeMaker-based modules
• Extract distribution and cd into it
• Type “perl Makefile.PL”
• Type “make install”
• For Module::Build-based modules, slightly different:
• Extract distro
• Type “perl Build.PL”
• Type “./Build install”
• But what about dependencies?

116
CPAN shell
• Most modules get installed directly from the CPAN
• Modules in CPAN have dependencies noted
• CPAN shell handles download, test, depends, and installs
• Two methods of invocation:
• perl -MCPAN -eshell (works nearly everywhere)
• cpan (most modern systems)
• Once you reach a prompt, it’s just “install Foo::Bar”
• You might get asked about dependencies
• Also check out “CPANPLUS” and “cpanminus” (the cpanm command)

117
Using Modules
• Simple task: get basename of file:
(my $basename = $name) =~ s#.*/##; # broken
• Didn’t consider \n in filename (yes, legal)
• Didn’t consider portability
• Easy fix:
use File::Basename;
my $basename = basename($name);
• Module defines basename, dirname, fileparse
• To get just some:
use File::Basename qw(basename dirname);

118
OO Modules
• Don’t fear the OO! Just enough to get you by
• The File::Spec module doesn’t export subs:
use File::Spec;
my $new_name =
File::Spec->catfile($dirname, $basename);
• The -> syntax there is calling a “class method”
• Until you learn Perl OO, just copy the syntax
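To see imported subs and a class method side by side, here is a small sketch that swaps a file's extension. The helper name with_suffix and the sample path are made up for illustration; the suffix pattern is the one shown in the File::Basename documentation.

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname fileparse);
use File::Spec;

# Hypothetical helper: change a path's extension using only module calls.
sub with_suffix {
    my ($path, $new_suffix) = @_;
    # fileparse returns (name, directory, suffix) when given a suffix pattern
    my ($name, $dir, $old_suffix) = fileparse($path, qr/\.[^.]*/);
    return File::Spec->catfile($dir, $name . $new_suffix);   # class method call
}

my $path = File::Spec->catfile("home", "fred", "report.txt");
print "basename: ", basename($path), "\n";
print "dirname:  ", dirname($path), "\n";
print "renamed:  ", with_suffix($path, ".bak"), "\n";
```

Because neither module is told what the directory separator is, the same code should behave sensibly on non-Unix systems too.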

119
OO Instances
• The DBI module (found in the CPAN) uses instances
• Like File::Spec, no subroutines are imported
• Calling “constructors” returns “objects”
use DBI;
my $dbh = DBI->connect(...);
• These objects can then take methods themselves:
my $sth = $dbh->prepare("...");
$sth->execute();
my @row = $sth->fetchrow_array;
$sth->finish;
• Until you learn proper OO, the examples should work

120
File tests

121
File test operators
• Many file tests, most boolean, some valued
• Test if a file exists with -e
die "$filename already exists!" if -e $filename;
• Works with a filehandle or a filename, default $_
• Boolean tests: -r (read), -w (write), -x (execute)
• -e (exists), -z (zero size), -f (plain file), -d (directory)
• -t (is a terminal), -T (“text” file), -B (“binary” file)
• Numeric values: -s (size in bytes)
• -M (mod time in days), -A (access time), -C (inode change)
• Skip retesting with underscore:
if (-r $somefile and -w _) { ... } # both read and write
if (-w -r $file) { ... } # stacked tests, Perl 5.10 and later
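A sketch that combines several of these tests with the underscore shortcut. The helper describe and the scratch file names are hypothetical; File::Temp (a core module) provides a throwaway directory so nothing real is touched.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: summarize a path using file tests.
sub describe {
    my $path = shift;
    return "missing"    unless -e $path;     # this test does the real stat()
    return "directory"  if -d _;             # _ reuses that stat() result
    return "empty file" if -f _ and -z _;
    return sprintf "file of %d bytes", -s _;
}

my $dir  = tempdir(CLEANUP => 1);            # removed when the program exits
my $file = "$dir/notes.txt";
open my $fh, ">", $file or die "cannot create $file: $!";
print {$fh} "hello\n";
close $fh or die "cannot close $file: $!";

print "$_: ", describe($_), "\n" for $dir, $file, "$dir/nothing";
```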

122
stat and lstat
• Even more info!
my ($dev, $ino, $mode, $nlink, $uid, $gid, $rdev,
$size, $atime, $mtime, $ctime, $blksize, $blocks)
= stat($filename);
• Modeled on the traditional Unix stat call
• Mapped as well as possible on non-Unix
• stat() chases a symlink, lstat() gives info on symlink itself
• Timestamps are Unix epoch time—convert with localtime
my @stat = stat("/tmp");
my $when = localtime $stat[9];
print "/tmp last modified at $when\n";

123
Bit manipulation
• Bitwise “and”:
10 & 12 # result is 8, the common bits of 1010 and 1100
• Similarly:
10 | 12 # bitwise “or”, result is 14
10 ^ 12 # bitwise “xor”, result is 6
6 << 2 # left shift, result is 24
25 >> 2 # right shift, result is 6
~ 10 # bit complement, result depends on int size
• Use this with stat to extract mode:
my $permissions = $mode & 0777;
$permissions |= 0111; # “or”-in the executable bits
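The two mode manipulations can be sketched as pure functions, which makes the bit arithmetic easy to check without touching any files. Both sub names are hypothetical, and the sample mode is an assumed typical stat value for a plain file.

```perl
use strict;
use warnings;

# Keep only the permission bits of a mode, then turn on every execute bit.
sub add_execute_bits {
    my $mode = shift;
    return ($mode & 0777) | 0111;
}

# Turn off every write bit instead: "and" with the complement of the mask.
sub remove_write_bits {
    my $mode = shift;
    return ($mode & 0777) & ~0222;
}

my $mode = 0100644;   # assumed stat() mode for a plain rw-r--r-- file
printf "%04o -> %04o (executable)\n", $mode & 0777, add_execute_bits($mode);
printf "%04o -> %04o (read-only)\n",  $mode & 0777, remove_write_bits($mode);
```

This prints 0644 -> 0755 and 0644 -> 0444. The %04o format shows the values in the same octal notation used for literals like 0777.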

124
Directory operations

125
Using chdir
• Changes the per-process “current directory”
• Used for all “relative” filenames
• Inherited from parent, inherited by children (later)
• Might fail, so test result:
chdir "/etc" or die "Cannot chdir /etc: $!";
• Always test result
• Does not understand tilde-expansion (like shells)

126
Globbing
• Getting files that match a filename pattern:
$ echo *.pm
barney.pm dino.pm fred.pm wilma.pm
• Performed by shell automatically; mostly you don’t worry
$ perl -e 'print "@ARGV\n"' *.pm
barney.pm dino.pm fred.pm wilma.pm
• But what if you have “*.pm” inside your program?
my @names = glob "*.pm"; # expand as the shell would
• Ignores dotfiles... include them explicitly:
my @all = glob(".* *");

127
Ancient globbing

• You may see <*> in place of glob('*')


• Same internal function, different syntax
• Collides with <HANDLE> reading
• Perl has weird rules to sort out which was which
• If you really want to avoid <>, you can use readline:
my @lines = readline HANDLE;

128
Directory handles
• Globbing is easy to type
• Sorts its results; might not be needed
• Lower level access with readdir():
opendir DH, "/some/dir" or die "opendir: $!";
foreach $file (readdir DH) { ... }
closedir DH;
• opendir is like open, readdir is like readline, etc
• Names are unsorted, and include all names
• Even . and ..
• Names don’t include any directory part
• If you need recursive directory processing, check out:
“perldoc File::Find”
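A sketch of the readdir points above: skip . and .., and put the directory part back on each name. The helper entries_in and the scratch file names are hypothetical; a lexical directory handle is used here in place of the bareword DH.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

# Hypothetical helper: sorted full paths of a directory's entries,
# leaving out the . and .. entries that readdir reports.
sub entries_in {
    my $dir = shift;
    opendir my $dh, $dir or die "cannot opendir $dir: $!";
    my @names = readdir $dh;
    closedir $dh;
    my @paths;
    foreach my $name (sort @names) {
        next if $name eq "." or $name eq "..";
        push @paths, File::Spec->catfile($dir, $name);   # add the directory back
    }
    return @paths;
}

# Demonstrate on a scratch directory that is removed at exit.
my $dir = tempdir(CLEANUP => 1);
foreach my $name (".hidden", "fred.pm") {
    open my $fh, ">", File::Spec->catfile($dir, $name) or die "create $name: $!";
    close $fh;
}
print "$_\n" for entries_in($dir);
```

Unlike glob "*", this lists the dotfile as well, since readdir returns every name in the directory.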

129
File operations

130
Removing files
• The equivalent to command-line “rm” is unlink():
unlink "slate", "bedrock", "lava";
• Since unlink takes a list, and glob is happy to return one:
unlink glob "*.pm"; # like "rm *.pm" at the shell
• Return value from unlink is number of files deleted
• Can’t diagnose trouble unless done one at a time:
foreach $goner (qw(slate bedrock lava)) {
    unlink $goner or warn "cannot unlink $goner: $!";
}

131
Renaming files
• Like Unix “mv”:
rename "old", "new" or warn "Cannot rename: $!";
• Batch rename all *.old to *.new:
foreach my $old (glob "*.old") {
    (my $new = $old) =~ s/\.old$/.new/;
    if (-e $new) {
        warn "cannot rename $old to $new: existing file\n";
    } else {
        rename $old, $new or warn "$old => $new: $!\n";
    }
}

132
Hard/soft links
• Hard links, like “ln”:
link $old, $new or die "Cannot link: $!";
• Symlinks (“soft” links), like “ln -s”:
symlink $old, $new or die "Cannot symlink: $!";
• Read where the link points:
my $link = readlink $new;
• readlink() returns undef if it’s not a symlink
• Interesting trick: symlinks that are invalid:
-l $name and not -e $name

133
Making directories
• Make directories with mkdir:
mkdir "fred" or warn "Cannot mkdir fred: $!";
• Default permissions applied, unless you provide it:
mkdir "private", 0700;
• The “umask” is still applied to this value though
• Note the permission is given in octal here
• Remove directories with rmdir:
rmdir "fred" or warn "Cannot remove fred: $!";
• Directories must be empty:
unlink glob "fred/.* fred/*"; ...
• Even that might fail if there are subdirs
• Consider rmtree() in File::Path

134
Modifying permissions
• Change permissions with chmod():
chmod 0755, "fred", "barney";
• Again, note the octal value for permissions
• Combine with stat() to set relative permissions:
for my $file (glob "*") {
    my @stat = stat($file) or next;
    my $new_perms = ($stat[2] & 0777) & ~ 0111;
    chmod $new_perms, $file or warn "chmod $file: $!";
}
• Or see File::chmod in the CPAN:
use File::chmod; chmod "-UGx", $_ for glob("*");

135
Modifying ownership
• Use chown to change owner and group:
defined(my $user = getpwnam "merlyn") or die;
defined(my $group = getgrnam "users") or die;
chown $user, $group, glob "/home/merlyn/*";
• Ability to change owner and group is restricted
• “-1” means “no change” for owner and/or group
• On most modern systems

136
Modifying timestamps

• Make everything look accessed now, modified yesterday:


my $now = time; # current unix epoch time
my $ago = $now - 86400; # seconds in a day
utime $now, $ago, glob "*"; # set atime, mtime, for all files
• The ctime value is always set to “now”

137
Strings and sorting

138
Finding a substring
• index() returns 0-based value of first occurrence:
my $stuff = "Howdy world!";
my $where = index($stuff, "wor"); # $where gets 6
• Start later with third parameter:
my $first_w = index($stuff, "w"); # 2
my $second_w = index($stuff, "w", $first_w + 1); # 6
my $third_w = index($stuff, "w", $second_w + 1); # -1
• Go from right to left with rindex():
my $last_slash = rindex("/etc/passwd", "/");
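The "start later" idiom turns naturally into a loop that finds every occurrence. A minimal sketch, with a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: every 0-based position where $target occurs in $string.
sub positions_of {
    my ($string, $target) = @_;
    return if $target eq "";     # an empty target matches everywhere; avoid looping forever
    my @found;
    my $pos = index($string, $target);
    while ($pos != -1) {
        push @found, $pos;
        $pos = index($string, $target, $pos + 1);   # resume just past the last hit
    }
    return @found;
}

print join(", ", positions_of("Howdy world!", "w")), "\n";   # prints "2, 6"
```

Stepping by one position means overlapping matches are found too; step by the length of the target instead if that is not wanted.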

139
Manipulate with substr
• Select from a string with substr():
my $part = substr($string, $start, $length);
print "J", substr("Hi, Hello!", 5, 4), "!\n";
• Initial position can be negative to count from end
• substr can also be used on left side of assignment:
my $string = "Hello, world!";
substr($string, 0, 5) = "Goodbye";
• Handy for “regional” substitutions:
substr($string, -20) =~ s/fred/barney/g;
• Or use fourth arg:
my $previous = substr($string, 0, 5, "Goodbye");

140
Formatting with sprintf

• Like printf, but into string, not handle:


my $date_tag =
sprintf "%04d/%02d/%02d %2d:%02d:%02d",
$year, $month, $day, $hour, $minute, $second;
• Great for rounding off numbers:
my $rounded = sprintf "%.2f", 2.49997;
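When the date parts come from localtime or gmtime, the year and month need adjusting before they are formatted. A sketch with a hypothetical date_tag helper; it uses gmtime so the result does not depend on the local time zone, and zero-pads the hour with %02d.

```perl
use strict;
use warnings;

# Hypothetical helper: format an epoch time as YYYY/MM/DD HH:MM:SS in UTC.
sub date_tag {
    my $epoch = shift;
    my ($second, $minute, $hour, $day, $month, $year) = gmtime $epoch;
    return sprintf "%04d/%02d/%02d %02d:%02d:%02d",
        $year + 1900,    # gmtime counts years from 1900
        $month + 1,      # and months from 0
        $day, $hour, $minute, $second;
}

print date_tag(0), "\n";       # prints "1970/01/01 00:00:00"
print date_tag(time), "\n";    # the current time in UTC
```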

141
Advanced sorting
• Replace the “comparison” in the built-in sort:
• Define a subroutine that compares $a to $b
• Return -1 if $a is “less than” $b, +1 if other way around
• Return 0 if they are “equal”, or incomparable
• Example:
sub numeric {
    return -1 if $a < $b;
    return 1 if $a > $b;
    return 0;
}
• Shortcut for this:
sub numeric { return $a <=> $b }

142
Using your compare
• Place your comparison sub name after “sort” before list:
my @sorted = sort numeric 97..102;
• Routine gets called roughly “n log n” times for n items to provide order
• No need for external routine!
my @sorted = sort { $a <=> $b } @numbers;
• Descending? just swap $a and $b or add reverse:
my @down = sort { $b <=> $a } @numbers;
my @down = reverse sort { $a <=> $b } @numbers;
• For strings, use “cmp”
my @sorted = sort { "\L$a" cmp "\L$b" } @items;

143
Sorting hash by value
• Trophy time: sort names by value descending:
my %score =
("barney" => 195, "fred" => 205, "dino" => 30);
my @winners = sort by_score keys %score;
• What’s in by_score? Indirect sort of keys:
sub by_score { $score{$b} <=> $score{$a} }
• What if the scores tie?
$score{"bamm-bamm"} = 195;
• Add a second level sort:
sub by_score
{ $score{$b} <=> $score{$a} or $a cmp $b }
• Now sorts by descending score, and ascending name
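Putting the pieces of this slide together as one runnable program (the winners wrapper is a hypothetical addition so the sorted list can be reused):

```perl
use strict;
use warnings;

my %score = (
    "barney" => 195, "fred" => 205, "dino" => 30, "bamm-bamm" => 195,
);

# Descending by score; ties broken by ascending name.
sub by_score { $score{$b} <=> $score{$a} or $a cmp $b }

# Hypothetical wrapper: the names in trophy order.
sub winners { return sort by_score keys %score }

print "$_: $score{$_}\n" for winners();
```

The order printed is fred, bamm-bamm, barney, dino. The two 195 scores make <=> return 0, which is false, so "or" moves on to the name comparison.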

144
Process management

145
The system function
• Fire off child process:
system "date"
• Child inherits stdin, stdout, stderr
• Perl is waiting for child
• Return value is related to exit status of child
• But 0 is good!
system "date" and die "can't launch date!";
• Any shell metachars in string cause /bin/sh to interpret
• Multiple args for pre-parsed commands (never /bin/sh)
system "tar", "cvf", $tarfile, @dirs;
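The return value packs several things together: the child's exit code sits in the high byte, and the low bits report death by signal. A sketch of decoding it, with a hypothetical helper name. It runs the current perl binary ($^X) as the child so that the example does not depend on any other command being installed.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command without a shell and return its exit code,
# or nothing if it could not be started or was killed by a signal.
sub exit_code_of {
    my @command = @_;
    # The block form names the program explicitly, so no shell is ever used.
    my $status = system { $command[0] } @command;
    return if $status == -1;      # could not launch at all
    return if $status & 127;      # child died from a signal
    return $status >> 8;          # the child's own exit code
}

my $code = exit_code_of($^X, "-e", "exit 3");
print "child exited with $code\n";     # prints "child exited with 3"
```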

146
The exec function
• Like system, but doesn’t fork
• Command overlays current Perl process
• Good for things that don’t need to return:
chdir "/tmp" or die;
$ENV{PATH} .= ":/usr/rockers/bin";
exec "bedrock", "-o", "args1", @ARGV;
• Once exec succeeds, Perl isn’t there!
• Only reason still in Perl: command not found:
exec "date";
die "date not found: $!";

147
Backquotes
• Grab output of command:
chomp(my $now = `date`);
• Includes all stdout, typically ending in newline
• Backquotes are double-quote interpolated:
$arg = "sleep";
my $doc = `perldoc -t -f $arg`;
• Might want to merge stderr in there:
my $output_and_errors = `somecommand 2>&1`;
• Stdin also inherited—might want to force it away:
my $result = `date </dev/null`;

148
Backquotes as list
• If output has multiple lines, use backquotes in list context
my @who_lines = `who`;
• Each element will be like:
merlyn tty/42 Dec 7 19:41
• Use in a loop:
foreach (`who`) {
    my ($user, $tty, $date) = /(\S+)\s+(\S+)\s+(.*)/;
    $ttys{$user} .= "$tty at $date\n";
}

149
Processes as filehandles
• open a pipe directly within Perl:
open DATE, "date|" or die;
open MAIL, "|mail merlyn" or die;
• Processes run in parallel, coordinated by kernel
• Read and write like any filehandle:
my $now = <DATE>;
print MAIL "The time is now $now";
• Close the write handle to indicate EOF
close MAIL;
my $status = $?;
• Closing a process handle sets $? (like system return)
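The same idea with a lexical handle and the list form of a piped open, which passes the arguments directly instead of through a shell. This is a sketch assuming a Unix-like system; the helper name lines_from is hypothetical, and the child is again the running perl ($^X) so the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command, returning its exit code and output lines.
sub lines_from {
    my @command = @_;
    open my $pipe, "-|", @command          # "-|" means "read from this command"
        or die "cannot run $command[0]: $!";
    chomp(my @lines = <$pipe>);
    close $pipe;                           # waits for the child and sets $?
    return ($? >> 8, @lines);
}

my ($code, @lines) = lines_from($^X, "-e", 'print "fred\nbarney\n"; exit 2');
print "exit $code, lines: @lines\n";       # prints "exit 2, lines: fred barney"
```

The close is not followed by "or die" here because close returns false whenever the child exits non-zero, which this example does on purpose.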

150
Why filehandles?
• For reading something simple, doesn’t help much
• For writing, about the only way to do it
• One example where reading works well:
open F, "find / -atime +90 -size +1000 -print|"
    or die "fork: $!";
while (<F>) {
    chomp;
    printf "%s size %dK last accessed %.2f days ago\n",
        $_, (1023 + -s $_)/1024, -A $_;
}

151
How low can we go?
• Full support for:
• fork
• exec
• waitpid
• exit
• arbitrary pipes and file descriptors
• System V IPC
• waiting on multiple handles for I/O
• Sophisticated frameworks have been built (like POE)

152
Simple signal handling
• Send a signal with kill():
kill 2, 4201; # send SIGINT to 4201
• Wait for a SIGINT, and clean up:
sub int_handler {
    unlink glob "/tmp/*$$*"; # remove my files
    exit 0;
}
$SIG{INT} = 'int_handler'; # register
• Handler doesn’t have to exit:
my $flag = 0;
sub int_handler { $flag++ }
• Now check $flag from time to time
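A sketch of the "check the flag from time to time" approach, assuming a Unix-like system. The worker process_items is hypothetical, and it sends SIGINT to its own process ($$) partway through only to simulate someone pressing Ctrl-C.

```perl
use strict;
use warnings;

my $interrupted = 0;
$SIG{INT} = sub { $interrupted++ };    # a code reference works as well as a name

# Hypothetical worker: process items until done or until SIGINT arrives.
sub process_items {
    my @items = @_;
    $interrupted = 0;                  # start each run with a clear flag
    my @done;
    for my $item (@items) {
        last if $interrupted;          # check the flag between units of work
        push @done, $item;
        kill "INT", $$ if $item eq "wilma";   # simulate Ctrl-C partway through
    }
    return @done;
}

my @done = process_items(qw(fred wilma barney dino));
print "finished @done before the interrupt\n";
```

The item in progress when the signal arrives is completed, and the loop stops cleanly before starting the next one.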

153
But wait... there’s more

154
In Llama book

• Smart matching
• Given/when
• Trapping errors with eval
• Simple uses of grep and map
• Slices

155
In Alpaca book
• The debugger
• Packages
• References
• Storing complex data structures
• Objects
• Writing Modules
• Embedded documentation
• Testing
• Publishing to CPAN

156
Beyond that
• Even more regex things
• More functions, operators, built-in variables, switches
• Network operations
• Security
• Embedding Perl in other applications
• Dynamic loading
• Operator overloading
• Data structure tie-ing
• Unicode handling
• Direct DBM access for simple persistence

157
And Perl 6

• Early-early-early adopters can play already


• Still a ways before early adopters can use for production
• Everything you learn about Perl 5 will still be useful in Perl 6
• Except some of the syntax is different
• And some things are a lot easier
• And a few things are harder

158
For more information

• merlyn@stonehenge.com
• http://www.stonehenge.com/merlyn/

159
