
A brief Perl primer

Overview

- Roles of Perl
- Styles of Perl Programming
- Perl data types and operations on them
- Subroutines, scopes, and other flow control
- OO in Perl
- Regexes
- Cool Perl tricks
- My recommended style
- Resources
- Program overview
- Weaknesses of Perl

Roles of Perl

Perl is useful for a number of things.

- Thanks to a strong developer community, there is a central repository called CPAN through which software modules for tasks from fuzzy logic to talking to Oracle are distributed.
- A rich (if cryptic) pattern matching/transformation/extraction sublanguage for regular expressions (regexes) provides powerful text manipulation capabilities.
- Perl datatypes are freeform, automatically sized to meet needs.
- Perl's syntax is very loose, allowing easy migration from many different languages.
- Integration with other languages is fairly easy, using low-level tools such as libperl or XS, or using modules such as Inline::Java.
- Perl is portable -- Pure Perl code is generally portable across Unices, Windows, MacOS, and other platforms.
- Memory management in Perl is very easy -- the Perl garbage collector handles most tasks for you, only needing help to break circular references.
- Perl is also semi-interpreted, giving it better performance than purely interpreted languages.

Styles of Perl Programming

    #!/usr/bin/perl
    print "Hello, world!\n";

Perl started life inspired by C, shell, various Unix utilities such as sed and awk, and some other languages, and has continued to borrow useful/beautiful constructs from others throughout its evolution. It has been said that Perl is a language in which people can write their own language, and this is equally valid as criticism and praise.

It is very possible to use Perl primarily as a better shell script language than shell -- shell programming is subject to the whims of various Unix vendors, not just in the details of the shell, but also in how sed, awk, and the like work. Portable shell programming is very difficult.

Many programmers have reported being happy moving from languages such as LISP to Perl.
I don't know LISP well enough to comment on this :)

Due to the popularity of C, C++, and Java, many people's Perl resembles those languages in structure and flow. Perl is, in my eyes, a better C than C, and similarly for those other languages, and you can see a lot of C in my Perl. There are, however, advantages to picking up Perl idioms over time. Unlike a lot of these other languages, you don't need to know much Perl to do something useful with it.

Perl is also capable of Object-Oriented Programming, with most CPAN modules offering only object interfaces.

Perl Data Types

Perl has 5 essential data types.

Data types

Scalar ($) -- Scalars hold single values. They hold numbers, strings, objects, references, and any other single element you might want. Perl isn't picky about what you put in a scalar, and will determine by context and content what you want when you use a scalar. Scalars are marked by the '$' prefix.

    $a = "Cat";
    $a = $a . "'s Whiskers";
    print $a . "\n";

prints Cat's Whiskers. If we then do:

    $a += 2;
    print "$a\n";

we get 2. We then might do

    $a .= " is it\n";
    print $a;

We'd get 2 is it. When Perl is asked to interpret as a number a scalar that was last used as a string, it looks for something that looks like a number at the beginning, numifies it, and uses it. A string with no leading number, such as "Cat's Whiskers", counts as 0 (and draws a warning when warnings are on), which is why the addition above gives 2. When asked to use a number as a string, it simply stringifies it. Perl strings are sized automatically, can be assigned into, and can be manipulated in powerful ways using substr and regular expressions.

Arrays (@) -- Arrays hold ordered, indexed lists of scalars. To refer to an array as a whole, the '@' prefix is used. To refer to a single scalar in it, '$' is used with the index. It is also possible to refer to a slice of an array, assigning into or reading out of those parts, with a continuous or explicit range. A list is an immediate form of an array, specified as values in parentheses.

    @cats = ("Wally", "Nemesis", "Sammy", "Oliver");
    foreach $cat (@cats) {
        print "We have a cat called $cat\n";
    }

is the same thing as

    @cats = ("Wally", "Nemesis", "Sammy", "Oliver");
    for($i=0; $i<= $#cats; $i++) {
        print("We have a cat called " . $cats[$i] . "\n");
    }

Assigning lists is handled intelligently:

    ($foo,undef,$bar) = @ARGV[0..$#ARGV];
    print $foo . "\n";

This prints out the first argument passed to the script. Assigning to undef in a list is useful to discard parts of the source list. Note that it is not a problem if @ARGV is bigger than the target list -- extra elements are discarded. Notice also that if @ARGV is shorter, the assignment will still happen, but the unmatched elements will get the value undef (more on that later). Finally, note that the @ARGV array (familiar to C programmers as argv) holds only the arguments: @ARGV's element 0 is C's argv[1], and the program name is in $0 instead.

The last index of an array @foo is available as $#foo, whereas the size of the same array is available as scalar(@foo) or by otherwise evaluating @foo in scalar context.
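The array features described so far (slices, $#foo, scalar context, and list assignment with undef) can be sketched together as follows. This is only an illustration: @cats comes from the examples above, while the other variable names and the replacement cat names are made up here.

```perl
use strict;
use warnings;

my @cats = ("Wally", "Nemesis", "Sammy", "Oliver");

# A slice with a continuous range, and one with explicit indices.
my @middle = @cats[1 .. 2];    # elements 1 and 2
my @ends   = @cats[0, -1];     # first element and last element

# Last index versus number of elements.
my $last_index = $#cats;
my $count      = scalar(@cats);

# List assignment: undef discards the second element, and the
# fourth element is silently dropped because nothing receives it.
my ($first, undef, $third) = @cats;

# Assigning into a slice replaces those elements in place.
@cats[0, 1] = ("Tom", "Felix");

print "Middle: @middle\n";
print "Ends: @ends\n";
print "Last index $last_index, count $count\n";
print "First $first, third $third\n";
print "Now: @cats\n";
```

Note that a slice uses the '@' prefix even though it has an index list, because it yields several values rather than one.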
It is possible to push, pop, shift, and unshift an array. push and pop add and remove values at the end of an array, while unshift and shift add and remove values at the start. A destructive way to do the above examples (visiting the cats in reverse order, since pop takes from the end) would be

    @cats = ("Wally", "Nemesis", "Sammy", "Oliver");
    while(my $cat = pop(@cats) ) {
        printf("We have a cat called %s\n", $cat);
    }

Hash/Associative Array (%) -- Hashes are another kind of list that, instead of using numeric indices, uses scalar indices (called keys). They can be used for a number of purposes, including code-organizational purposes. Like arrays, they use their own symbol when referring to the entire hash, but individual members are accessed using the scalar symbol $ with the key.

    $a = "Haha\n";
    my %prefs;
    $prefs{directory} = $ENV{HOME} . "/.myprogram";
    $prefs{username} = $ENV{USER};
    $prefs{foo} = $a;

Notice that the environment variables are available as the hash %ENV in Perl. It is also possible to retrieve the keys of a hash using the keys function, which returns them as a list (in no particular order). It is possible to delete keys and their associated values using the delete function.

Filehandles (no symbol) -- Filehandles in Perl have no symbol, and are global (unfortunately). There are modules that get around this, not described here. Filehandles can be opened with open, closed with close, and read with readline. There is a special syntax that can make readline implicit within a certain type of while loop.

    open(RESOLV, "/etc/resolv.conf");
    while($line = <RESOLV>) {
        print $line;
    }
    close(RESOLV);

This code prints out the contents of /etc/resolv.conf (not very useful unless you're on a Unix system). Notice that this is equivalent to the following examples:

    $myfile = "/etc/resolv.conf";
    open(RESOLV, $myfile) or die "Could not open $myfile: $!\n";
    while($line = readline(RESOLV)) {
        chomp($line); # Remove newline, if present
        print $line . "\n";
    }

and this too (idiomatic Perl, I'll explain in a second what this is doing)

    open(RESOLV, "/etc/resolv.conf") or die "Could not open /etc/resolv.conf: $!\n";
    while(<RESOLV>) {
        print;
    }

In the last 2 examples, the global variable $! is accessed if the open failed. This variable contains a string indicating why the last system call failed, and might contain a string such as "Permission denied" or "No such file or directory". The last example uses a special Perl scalar called the accumulator, which will be discussed momentarily. Notice that the angle brackets around the filehandle act to read a line from it, and when that fails, the while loop terminates. To write to a file, open it with a > marker at the start of its name. You may then direct print statements to it by listing the filehandle before the print content (no comma):

    open(MYFILE, ">" . $ENV{HOME} . "/myfile.txt") or die "Can't do example: $!\n";
    print MYFILE "Hello, world!\n";
    close(MYFILE);
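The examples above use global bareword filehandles. Perl 5.6 and later also let open store the handle in an ordinary scalar and take the mode as a separate argument, which avoids both the global name and the need to glue ">" onto the filename. The sketch below writes a file and reads it back; the temporary file (via the core File::Temp module) and the variable names are choices made for this illustration, not part of the examples above.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file, so the sketch does not touch anything in $HOME.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
close($tmp_fh);

# Write: the mode (">") is its own argument, and $out is an ordinary
# lexical variable rather than a global bareword handle.
open(my $out, ">", $path) or die "Could not open $path for writing: $!\n";
print $out "Hello, world!\n";
print $out "Second line\n";
close($out) or die "Could not close $path: $!\n";

# Read the file back line by line.
open(my $in, "<", $path) or die "Could not open $path for reading: $!\n";
my @lines;
while (my $line = <$in>) {
    chomp($line);    # Remove newline, if present
    push(@lines, $line);
}
close($in);

print scalar(@lines), " lines read, the first is '$lines[0]'\n";
```

As with bareword handles, there is no comma between the filehandle and the content in the print statements.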

Code (&) -- The equivalent of C's function pointers is possible, as are anonymous functions, local functions, piecemeal construction of functions through regexes and eval, and various other black magic. If you decide you need to use any of this, I suggest you borrow or purchase O'Reilly's Programming Perl or Perl Cookbook. Alternatively, you can come find me, or dig out one of the tutorials on the web, or perhaps you'd care to read the Perl manpages/docs.

The accumulator

Perl has a special scalar that's accessed in many functions if no argument is provided to them. It has the name $_. The Perl idiom above,

    while(<FILEHANDLE>) { ... }

assigns the value of each readline() into $_ for each iteration of the loop. print, without arguments, prints out the accumulator. Perl is very variadic, which might bother C folk. Still, the use of this idiom can result in less code on a line, reducing visual clutter. Other useful functions that can use the accumulator include: split, chomp, (regular expressions), map, grep, foreach

References

Perl doesn't have pointers (they don't work well with garbage collection). Instead, Perl has references, which are almost as powerful, and considerably less dangerous. Scalars hold references. To take a reference to something, prefix it with a backslash (\), and to dereference, prefix it with the symbol for the type you expect to get back from it (it thus should have a double prefix). In some cases (such as passing through multiple references at once), it's necessary to be more verbose and wrap the reference in braces after the symbol for the type you desire, using the syntax (symbol){reference}, nesting as needed. This last tip shouldn't be needed too often.

    $a = "Foo";
    $b = \$a;
    print $$b . "\n";
    $$b = "Bar";
    print "Hey! Weird way to do it gives " . ${$b} . "\n";
    print "Now A is $a\n";

This gives the output

    Foo
    Hey! Weird way to do it gives Bar
    Now A is Bar

Definedness

Variables can have the special value undef.
This can be used to catch things that don't have a value yet or have had their value removed. The operators defined and undef are used to test and set this value. Notice that undefining a value in a hash does not remove its key. Use delete to remove both. With the right options (such as -w), Perl will complain when an undefined value is used in some ways.

Truth

Perl's notion of truth is simple. All strings (apart from the empty string and the string "0") are true. The number 0 is false, and all other numbers are true. Things not defined are false.

Subroutines, scopes, and other flow control

Perl has subroutines, and a number of other flow control mechanisms. The more useful/commonly used of them are described below.

- foreach $scalar (list or array) {...} - This iterates over an array or list, aliasing each value in it to $scalar for that iteration. This can be used on hashes by using the keys or values function in the list slot. Notice that it's done with an alias, so it's possible to alter the list using this.
- for($variable; CONDITION ; POSTLOOP) {...} - This is a traditional C loop. I'll assume you know how it works.
- while(CONDITION) {...} - Also from C, works similarly.
- if(CONDITION) {...} - Same as in C. Note that C's else if is written elsif.

The C keywords continue and break are replaced by next and last, respectively. They also can take an argument, which, if the loop is labeled, lets them specify which loop they're talking about. This lets you break out of nested loops without a lot of ugly logic.

Scopes

Without additional qualification, all variables in Perl are global. There are two types of scoping, done with the qualifiers my and local. my is equivalent to C's scoping (except with garbage collecting, so it's safe to return something made with my). local saves the old value of the variable away, and arranges for it to be restored when the current block exits. Generally, use of local should be discouraged unless it's being used to modify Perl's builtin globals for just a moment. There is another scoping mechanism that's almost exclusively used by the OO facilities that we won't discuss here. Note that the my and local keywords can be positioned flexibly, such as in the variable slot of foreach.

Subroutines

Perl starts execution at the top of a program (outside a subroutine) and progresses downwards. It's possible (and definitely recommended) to organize programs through subroutines. Subroutines receive their arguments through the @_ array (a cousin to $_, which has its own set of array-oriented functions that use it if nothing else is specified). Variadic functions are natural and easy in Perl. Unfortunately, if you want named, checked parameters, you need to manage that yourself.
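One common way to manage named, checked parameters yourself is to pass key/value pairs and copy @_ into a hash. Here is a minimal sketch of that approach; the sub named_subtract and its parameter names base and decrement are hypothetical, invented for this illustration.

```perl
use strict;
use warnings;

# Hypothetical sub: takes its parameters as key/value pairs.
sub named_subtract {
    my %args = @_;    # the flat argument list becomes a hash

    # Check the parameters by hand, since Perl will not do it for us.
    foreach my $name (qw(base decrement)) {
        die "named_subtract() needs a '$name' parameter\n"
            unless defined($args{$name});
    }
    return $args{base} - $args{decrement};
}

my $result = named_subtract(base => 10, decrement => 3);
print "$result\n";

# A missing parameter is only caught at run time; eval traps the die.
my $ok    = eval { named_subtract(base => 10); 1 };
my $error = $ok ? "" : $@;
print "Caught: $error";
```

The order of the pairs does not matter to the caller, which is the main attraction of this style once a sub takes more than two or three arguments.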
Recursion is safe in Perl (don't forget to use my). Here's a sub that takes two arguments, does subtraction, and returns the result:

    print my_subtract(10,3) . "\n";

    sub my_subtract {
        my ($base, $decrement) = @_;
        if(!( defined($base) && defined($decrement)))
            {die "my_subtract() passed bad arguments!\n";}
        return($base - $decrement);
    }

That script prints 7. Note that there's no restriction on where subroutines can be in your program -- all global subroutines declared this way are defined when a Perl script is compiled, before it is run. It's possible to take a reference to a sub or a code block.

    $foo = \&my_subtract;
    print &$foo(3,1) . "\n";

It is possible to return a number of values from a function. However, note that returning an array or a hash flattens it and returns its elements one-by-one. If you're only returning one array/hash, it's best to return it as the last value -- otherwise it's best to return a reference.

OO in Perl

Objects in Perl are handled through scalars, and implemented through a special namespace mechanism. Typically, an object is declared in a separate file, which is loaded through the use keyword. For further examples in this section, below is an object declaration that we'll assume lies in AutonDemo.pm

    #!/usr/bin/perl -w
    package AutonDemo;

    sub new {
        my $self =  # This is taking a reference to an anonymous hash
            {       # This is the syntax for initializing hashes.
            reads => 0,
            writes => 0,
            value => undef,
            karma => 2
            };
        bless $self;
        return $self;
    }

    sub getvalue {
        my $self = shift; # Shifts @_
        $self->{reads}++;
        if(rand(10) < 3) {
            print "You feel ill\n";
            $self->{karma}--;
        }
        if($self->{karma} < 0)
            {$self->{value} = 0;}
        return $self->{value};
    }

    sub setvalue {
        my($self,$value) = @_;
        $self->{writes}++;
        if(rand(10) < 1) {
            print "You feel safe\n";
            $self->{karma}++;
        }
        $self->{value} = $value;
    }

    sub report {
        my $self = shift; # Without this, $self would be an unset global
        print "Value is " . $self->{value} . "\n";
        print "Reads/Writes " . $self->{reads} . " " . $self->{writes} . "\n";
        print "Karma is " . $self->{karma} . "\n";
    }

    1;

This class implements a logged variable. Here's a regression test for that class

    #!/usr/bin/perl -w
    use AutonDemo;

    $foo = AutonDemo::new();
    $foo->setvalue(4);
    foreach $UNUSED (0 .. 10) {
        print $foo->getvalue . "\n";
    }
    $foo->report();

Regexes in Perl

Regular expressions are an important feature in Perl. Unfortunately, they're also hard to read (in any language). There are many places you may have used some subset or relative of the regular expression language. DOS and shell wildcards are cousins to the language, offering a minute subset of what it is capable of. C offers POSIX regular expressions, unfortunately with a cumbersome API. Sed and Awk implement variants on POSIX regular expressions. Perl's regexes are the result of gradual expansion on POSIX standard regexes over several years, and have proved popular enough that Perl-style regexes have been backported to C (via the pcre package) and into some other languages (such as Python).

Perl regexes are normally specified via enclosing forward slashes (/), and applied via the =~ operator. This syntax is used for matching. Perl regexes are also capable of substitution, using the same operator but prepending an s and using three slashes, as in s/pattern/replacement/. Parentheses inside Perl regexes are used to capture content (escape any parentheses that you're trying to match with a backslash). This content is stored in the variables $1, $2, and upwards, each corresponding to the position of one of the sets of parentheses. If a regex appears alone on a line, it is applied to the accumulator.

    $foo = "My name is Pat, I think";
    $foo =~ /is\s([^,]+)/;
    $name = $1;
    print "I found the name $name\n";
    $foo =~ s/$name/Andrew/;
    print $foo . "\n";

Note that if a match or a substitution fails, it will return false. It's possible to add modifiers after the closing slash of a regex. Perl's regex style is documented in the perlre manpage.
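Because a failed match returns false, it is worth testing the result before trusting $1, and modifiers change how a pattern is applied. A small sketch, reusing the sentence from the example above; the variable names are illustrative, and the i (ignore case) and g (global) modifiers are just two of those listed in perlre.

```perl
use strict;
use warnings;

my $text = "My name is Pat, I think";

# Test the result of the match before trusting $1.
my $name = "unknown";
if ($text =~ /is\s([^,]+)/) {
    $name = $1;
}

# The i modifier makes the match case-insensitive.
my $has_name = ($text =~ /NAME/i) ? 1 : 0;

# The g modifier replaces every occurrence; s/// returns the
# number of substitutions made (false when there were none).
my $copy  = $text;
my $count = ($copy =~ s/\bI\b/we/g);

# A substitution that does not match leaves the string alone.
my $changed = ($copy =~ s/Oliver/Sammy/) ? 1 : 0;

print "Name: $name\n";
print "Has name: $has_name\n";
print "$count change(s): $copy\n";
print "Changed: $changed\n";
```

Working on a copy ($copy) keeps the original string intact, since s/// modifies its target in place.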
If you're not all that familiar with regexes, it might be helpful to find a book about them (O'Reilly makes a good one; alternatively, O'Reilly's Programming Perl has a section on them that's entirely Perl-centric).

Cool Perl tricks

Perl has a number of cool, quirky, and useful features

- tie lets you bind a scalar to an object so writes and reads to that scalar transparently become method calls to that object. It works with hashes and arrays too.
- dbmopen/dbmclose let you bind a hash to a DBM file, making it easy for your programs to have persistent storage
- DBI is an abstraction over all the database drivers (DBD modules) that Perl supports, letting you write (more) portable SQL without losing as many features as you might otherwise
- DBD::CSV is a DBD database driver that allows you to talk SQL to CSV files (that is, no real database).
- DBD::Excel is another DBD driver (still under development) that lets you talk SQL to Excel files
- DBD::RAM lets you talk SQL to an in-memory database
- Coy is a module that translates fatal errors you make with die() (or that Perl might generate itself) into haiku
- Math::BigInt is a module that lets you do arbitrary-precision math
- Tk is a module that lets Perl talk to the Tk graphics toolkit
- eval() interprets a string as Perl code, compiling and running it in-place
- taint mode tracks what variables are acquired from the user versus the script and trusted files, making it safer to write CGIs and setuid scripts

My recommended style

My style is, to a certain degree, the style accepted by the Perl community at large.

- Start your program with definitions of variables that the user might want to change. Try to keep these definitions in a format in which end users are unlikely to accidentally cause syntax errors by editing
- Below those definitions, call a subroutine main(), which holds the core logic of your program.
- Keep main() small -- use subroutines to do most of the real work. main() should be an overview of what your program does.
- Reduce visual clutter by using $_
- Format code to reduce visual clutter, breaking statements over several lines
- Use warnings (in the #! line at the top of your program, add the -w flag to perl)
- For larger programs, also put use strict; at the top of your program
- If you have a chain of functions calling other functions, put each on a separate line instead of doing a long multiline function. Comment these well.
- For large printed sections of code, use a HERE document instead of lots of quotes. HERE documents replace a single scalar in a statement with a <<IDENTIFIER (the statement finishes normally). After that line finishes, all following text until IDENTIFIER appears alone on a line is seen as the content of that scalar.

Example of a HERE document

    $foo = <<EOHERE;
    Meow
    This is multiline
    I think
    EOHERE
    print $foo;

Resources

You may find the following helpful

- http://www.cpan.org => CPAN - CPAN is the community repository for Perl modules.
- http://www.perl.com => Perl.com - Has articles and several other resources
- http://www.perlmonks.org => Perlmonks is a good place to ask questions
- http://www.perldoc.com => Perldoc.com is another documentation source
- man perl is an index into the perl manpages that are useful to learn concepts
- perldoc -f function is a way to learn about particular perl functions
- O'Reilly's Programming Perl is a good book

Weaknesses of Perl

Perl has a few weaknesses

- It's irritating that parameter passing for subroutines must be done manually
- It's possible to code illegibly in Perl
- Dealing with standard filehandles is awkward
- It can be confusing that @a and $a have nothing to do with each other, especially given that $a[0] belongs to @a and not to $a
- With such a fluid language, error checking is less powerful than in more static languages
- There are a number of internal variables (see man perlvar) that can change Perl's behavior in surprising ways when altered
- Perl's defaults (without -w and use strict) are very lax in the code it accepts
- The syntax for accessing a value through several references of different kinds can be very hairy (more so than going through chains of structs in C)
- Exception handling is fairly weak in Perl
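To give a feel for the point about reference syntax, here is a small sketch that reaches through a hash reference into an array reference, first with fully explicit dereferencing and then with the arrow operator. The data structure and all names in it are hypothetical, made up for this illustration.

```perl
use strict;
use warnings;

# Hypothetical data: a reference to a hash whose values are array references.
my $pets = {
    cats => [ "Wally", "Nemesis" ],
    dogs => [ "Rex" ],
};

# Fully explicit form: dereference the hash, then the array.
my $first_cat = ${ ${$pets}{cats} }[0];

# The arrow operator says the same thing more readably;
# arrows between adjacent subscripts may be omitted.
my $same_cat = $pets->{cats}[0];

# Getting at a whole inner array still needs the block form.
my @cats      = @{ $pets->{cats} };
my $cat_count = scalar(@cats);

push(@{ $pets->{dogs} }, "Fido");
my $dog_count = scalar(@{ $pets->{dogs} });

print "$first_cat / $same_cat\n";
print "$cat_count cats, $dog_count dogs\n";
```

The arrow form takes much of the sting out of single-element access, but whole-array and whole-hash dereferences still need the braces.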