
Perl for Linguists

Michael Hammond

University of Arizona

Perl for Linguists – p.1/38


Perl for linguists
Why programming? Why Perl?
Learn a little Perl.
More advanced Perl...


Why programming?
Collect data
Analyze data
Model theory
General professional skills


Code for this tutorial
All the code for this presentation is available over
the web at the following URL:

http://linguistics.arizona.edu/~hammond/berkeley.html



Unzipping the programs
1. Download the file programfiles.zip by
right-clicking on the link, selecting ‘save target
as’, and selecting your desktop as the
destination.
2. Unzip the file by double-clicking on the
downloaded file on your desktop and moving
all the files there to a new directory on the
desktop.
3. Open the MS-DOS prompt in the new
directory by double-clicking on the file
doswindow.bat.

Collecting data
Running experiments locally (expprog.pl)
Running experiments locally with a GUI (tkexp.pl)
Running experiments remotely (Bailey & Hahn replication, bhrep.cgi)
Assembling corpora from local static resources (makecorpus.pl)
Assembling corpora from nonlocal dynamic resources: the web (websearch.pl)


Analyzing data
Looking for patterns (visgrep.pl)
Counting things (neightk.pl)
Finding verbs (verbs.pl)


Modeling theory
Optimality Theory (web interface, sylpars.pl)
N-gram models (a bunch of examples from a course on Statistical NLP that I did recently)


General professional skills
General programming skills
Web programming


Why Perl?
Free
Multi-platform
Easy
Multiple dialects
Powerful regular expression tools
Written by a “linguist”
Perl poetry
Obfuscated perl, “japhs”, etc.


Learn a little perl...
Windows basics
Perl syntax
IO
Regular expressions
Where to find out more


Windows basics
It is easiest to invoke perl programs from the DOS prompt. (If configured properly, you can just double-click them though.)
perl myprogram.pl: invokes the perl interpreter on the program myprogram.pl.
ctrl-c: stops the program that is currently running.


A goal
We will build our efforts around an example program: a program that parses a text file in English into sentences and then tries to find all the verbs (verbs.pl).
This exemplifies simple data and control structures and what we might call “computational linguistic reasoning”.
We won’t get to the full version of this program (it’s infinitely expandable, actually), but we can get a good start.


Programming overview
Implement some idea as perl code (write the program).
Convert code into something the computer can execute (run perl on your code).
Program execution (run perl on your code).
Results (what happens as a consequence).


Perl syntax
A perl program is a series of statements.
Statements can be organized into groups.
Statements are operations on some bit of data, e.g. “print this string”, “add these numbers”, etc.
Groups of statements can apply:
1. only when specific conditions hold, or
2. more than once, etc.


Statements
A statement includes some operation (typically marked with following parentheses), ends with a semicolon, and may contain more. For example:
print("Hello!"); (prints the string “Hello!”)
rand(5); (gets a random fractional number from 0 up to, but not including, 5)
localtime(); (gets the current time)


Try it
1. Open the DOS window (through the start menu or by clicking on the doswindow.bat icon).
2. Type edit myprogram.pl
3. Type print("I am a linguist!");
4. Select Save and then Exit from the File menu.
5. Type perl myprogram.pl at the prompt.


Variables
Notice how rand() and localtime() don’t seem to do anything.
What they do is return a value (a number for rand(), a string for localtime() as used here). You can save that result and then print it:

$myvariable = localtime();
print($myvariable);


Variables
Simple “scalar” variables: $myvar, $aVariable, $x.
A value is assigned to a variable with the assignment operator =.
Variables can hold numbers, strings, etc.
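
As a small illustration of these points, here is a sketch that stores a number, a string, and the results of rand() and localtime() in scalar variables and then prints them. The variable names are made up for this example. The my keyword (which declares a variable) and the strict and warnings lines are additions that the slides themselves do not use.

```perl
use strict;
use warnings;

# A scalar variable can hold a number...
my $count = 3;
# ...or a string.
my $word = "linguist";

# Save what rand() and localtime() return so that we can use it later.
my $number = rand(5);       # a fractional number, at least 0 and less than 5
my $now    = localtime();   # the current date and time as one string

# Variables inside double quotes are replaced by their values.
print("I know $count words like '$word'.\n");
print("Random number: $number\n");
print("Current time: $now\n");
```

The random number and the time will differ every time the program runs.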


Reading a file
First use open() to open a file and associate it with a handle, e.g. open(F, "myfile.txt");.
Read a line from the file with the record-reading operator: <F>.
When you’re done, close the handle: close(F);.


Sample code
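# Open the file for reading; stop with a message if that fails.
open(F, "myfile.txt") or die("Cannot open myfile.txt: $!");
$line = <F>;     # read one line, including its line ending
print($line);
close(F);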


Control structures
if: conditional application
while: iteration as long as some condition is true
for: iteration for a specific number of times
foreach: iteration for every member of a list
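
To make the four control structures concrete, here is a minimal sketch that uses each of them once. The word list and the variable names are invented for the illustration, and the my declarations and the strict and warnings lines are additions not used on the slides.

```perl
use strict;
use warnings;

my $long_words = 0;

# foreach: do something for every member of a list
foreach my $word ("walk", "walked", "sing", "singing") {
    # if: do something only when a condition holds
    if (length($word) > 4) {
        print("$word is longer than four letters\n");
        $long_words = $long_words + 1;
    }
}

# for: repeat a specific number of times (here, three)
my $passes = 0;
for (my $i = 1; $i <= 3; $i++) {
    print("pass number $i\n");
    $passes = $passes + 1;
}

# while: repeat as long as some condition is true
my $remaining = 2;
while ($remaining > 0) {
    print("$remaining to go\n");
    $remaining = $remaining - 1;
}
```

In each case the statements between the curly braces form the group that is applied conditionally or repeatedly.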


More sample code
open(F, "myfile.txt") or die("Cannot open myfile.txt: $!");
# Keep reading and printing lines until the file runs out.
while ($line = <F>) {
    print($line);
}
close(F);


Arrays
Array variables: a list of values grouped together under one name: @myarray, @thisBigArray, @them.
Adding an element to the end of an array: push(@myarray, "hat");.
Removing and returning the element at the end of an array: $it = pop(@myarray);.
Retrieving a specific element: print($myarray[4]); (array indices start at 0).
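
The array operations can be tried out together in a short sketch like the following. The contents of the array are invented for the example, and scalar(), which gives the number of elements, is a standard perl function that the slides do not mention.

```perl
use strict;
use warnings;

my @myarray = ("cat", "dog");      # an array with two elements

push(@myarray, "hat");             # now: cat, dog, hat
my $first = $myarray[0];           # "cat", because indices start at 0
my $size  = scalar(@myarray);      # 3, the number of elements

my $it = pop(@myarray);            # removes "hat" from the array and returns it

print("first: $first, size before pop: $size, popped: $it\n");
print("left over: @myarray\n");    # an array in double quotes is joined with spaces
```

Note that a single element is written with $ rather than @, since one element of an array is a scalar.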

Sample code again
open(F, "myfile.txt") or die("Cannot open myfile.txt: $!");
# Read every line of the file into the array @mylines.
while ($line = <F>) {
    push(@mylines, $line);
}
close(F);
# Print the saved lines one by one.
foreach $theline (@mylines) {
    print($theline);
}


Regular expressions
“Regular expression”: a restricted notation for describing string patterns.
$myvar =~ /regexp/: does $myvar contain a match for the regular expression?
$myvar =~ s/regexp/string/: if $myvar contains a match for the regular expression, replace the first match with the string.
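
Here is a minimal sketch of matching and substitution at work. The sentence and the pattern are invented for the illustration. The \b marker (a word boundary) and the g modifier (replace every match rather than only the first) are standard perl features that go slightly beyond what the slides cover.

```perl
use strict;
use warnings;

my $myvar = "The linguist walked and talked.";

# Matching: is there a match for the pattern anywhere in $myvar?
my $has_past = 0;
if ($myvar =~ /ed\b/) {        # "ed" at the end of a word
    $has_past = 1;
    print("Found a word ending in -ed\n");
}

# Substitution: replace the first match with a string.
my $once = $myvar;
$once =~ s/ed\b/s/;            # only "walked" changes

# With the g modifier, every match is replaced.
my $all = $myvar;
$all =~ s/ed\b/s/g;            # "walked" and "talked" both change

print("$once\n");
print("$all\n");
```

The copies $once and $all are made first because a substitution changes the variable it is applied to.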


The verb-finding program
Recall that the program breaks an English text file into sentences and does its best at finding all the verbs.
An hour is not enough time to figure out how to do all of this, but just the bits we’ve covered are a lot of it (verbs.pl).
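
To give a feel for how the pieces covered so far could fit together, here is a rough sketch of the general idea. It is not the verbs.pl program from the tutorial: the function names split_sentences and guess_verbs, the tiny list of verb forms, and the -ed/-ing test are all assumptions made up for this illustration, and the sketch works on a string rather than on a file.

```perl
use strict;
use warnings;

# Split a text into sentences: break at spaces that follow ., ! or ?
sub split_sentences {
    my ($text) = @_;
    my @sentences = split(/(?<=[.!?])\s+/, $text);
    return @sentences;
}

# Guess the verbs in a sentence with two crude tests:
# a short list of known verb forms, and the endings -ed and -ing.
sub guess_verbs {
    my ($sentence) = @_;
    my @verbs;
    foreach my $word (split(/\s+/, $sentence)) {
        $word =~ s/[^A-Za-z']//g;          # strip punctuation
        if ($word eq "") {
            next;                          # nothing left, skip it
        }
        if ($word =~ /^(is|are|was|were|has|have|had)$/i) {
            push(@verbs, $word);
        } elsif ($word =~ /(ed|ing)$/ && length($word) > 4) {
            push(@verbs, $word);
        }
    }
    return @verbs;
}

my $text = "The dog barked. Birds are singing! Who has the red hat?";
foreach my $sentence (split_sentences($text)) {
    my @verbs = guess_verbs($sentence);
    print("$sentence => @verbs\n");
}
```

A guesser this crude makes plenty of mistakes (it would call “thing” a verb and miss “sang”), which is exactly why a program of this kind can be expanded more or less without end.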


Where to find out more
The free ActiveState Perl we’re using comes with extensive web-based documentation.
In any perl implementation the perldoc command can be used to find out lots and lots of stuff (for example, perldoc -f push describes the push function).
The main archive of perl modules and documentation is www.cpan.org, but see also www.perl.org.


Advanced stuff
Let’s now do some more advanced stuff.
‘Advanced’ in terms of perl.
‘Advanced’ in terms of linguistics.


Advanced perl
Object-oriented programming and public perl modules
Tk (editperl.pl, visgrep.pl)
Remote computing (bhexp.cgi, dbiex.pl, websearch.pl)


Object-oriented programming
Object-oriented (OO) programming is another style of programming.
OO programs are a network of things or objects, not a list of commands.
Objects have their own data and specialized functions for dealing with their own data.
Objects can refer to other objects or inherit the properties of other objects.
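
A minimal sketch may make these ideas less abstract. The classes Word and Verb below are hypothetical and are not taken from verbsOO.pl. They rely on perl’s standard OO machinery: a package serves as the class, bless() turns a data structure into an object, and @ISA names the class to inherit from.

```perl
use strict;
use warnings;

# A class is a package; an object is a reference "blessed" into it.
package Word;

sub new {
    my ($class, $form) = @_;
    my $self = { form => $form };     # the object's own data
    return bless($self, $class);
}

# Methods: functions that work on the object's own data.
sub form {
    my ($self) = @_;
    return $self->{form};
}

sub describe {
    my ($self) = @_;
    return $self->form() . " is a word";
}

# Verb inherits new() and form() from Word and replaces describe().
package Verb;
our @ISA = ("Word");

sub describe {
    my ($self) = @_;
    return $self->form() . " is a verb";
}

package main;

my $noun = Word->new("hat");
my $verb = Verb->new("walked");
print($noun->describe() . "\n");
print($verb->describe() . "\n");
```

Even this toy example is noticeably longer than a plain list of statements doing the same job would be.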


Translating to OO
Original verb-finding program: verbs.pl.
OO verb-finding program: verbsOO.pl.


Caveats
OO programs are longer.
OO programs are slower.
OO programming in perl is not orthodox OO (lots of “oddments”).
OO programming isn’t intuitive for most folks.
Unfortunately, OO programming is essential for some modules.


Tk
Tk is an independent toolkit for making graphical user interfaces (it was originally written for the Tcl language).
There is a special perl module so that you can build GUIs indirectly using Tk.
This module is widely available for Unix/Linux and Windows, but it is not clear whether it’s available for Macs.
If you want to write programs in Perl that really look like modern computer programs, then you need to use Tk.

GUI coding
Tk programs use OO programming style; buttons, windows, etc. are all “objects”.
You create these objects and then your program waits for the user to interact with them.
makecorpus.pl makecorpusTk.pl


Remote computing
CGI (“Common Gateway Interface”): programs that run remotely (generating javascript and HTML: bhexp.cgi)
Interacting with local or remote databases (generating sql: dbiex.pl)
Interacting with the web generally (websearch.pl)
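
To show roughly what a CGI program does, here is a minimal sketch. It is not bhexp.cgi: the function make_page and the form field name are hypothetical, and the sketch assumes only the basic CGI convention that the web server passes form data in the QUERY_STRING environment variable and that the program prints a header, a blank line, and then the page.

```perl
use strict;
use warnings;

# Build the complete reply: a header, a blank line, then the HTML page.
sub make_page {
    my ($query) = @_;
    my $name = "stranger";
    # Form data arrives as text such as "name=Kim"; accept plain letters only.
    if ($query =~ /(?:^|&)name=([A-Za-z]+)/) {
        $name = $1;
    }
    my $page = "Content-type: text/html\n\n";
    $page = $page . "<html><body><p>Hello, $name!</p></body></html>\n";
    return $page;
}

# When run by a web server, the form data is in QUERY_STRING.
my $query = "";
if (defined($ENV{QUERY_STRING})) {
    $query = $ENV{QUERY_STRING};
}
print(make_page($query));
```

Run from the DOS prompt, the program simply prints the page with “stranger” as the name. A real experiment would need more careful handling of form data than this single pattern, which is probably best left to an existing module.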


Modeling
Perl is not good for computational clarity; it is not well-defined
Perl is exceptionally good at string processing
Computational tasks as string processing: sylpars.pl


What now?
Programming and perl hopefully demystified...
Some ideas about what you can do with it if you’re thinking that you need to program.
If you already know some perl, perhaps some other ideas about what to do with what you already know.
