Bash Shell-Scripting PDF
Table of Contents
Module 3  Processes
Module 5  A Overview
Module 8  Parameters
Module 1
Introduction to Operating System:
In simple terms, an operating system is a manager. It manages all the
available resources on a computer. These resources can be the hard disk, a
printer, or the monitor screen. Even memory is a resource that needs to be
managed. Within an operating system are the management functions that
determine who gets to read data from the hard disk, what file is going to be
printed next, what characters appear on the screen, and how much memory a
certain program gets.
Operating systems may be classified by both how many tasks they can
perform `simultaneously' and by how many users can be using the system
`simultaneously'. That is: single-user or multi-user and single-task or multi-
tasking. A multi-user system must clearly be multi-tasking.
Here the system is such that many users can work at a time. There is
one large CPU and a high-capacity storage medium enclosed in what
is called the system unit, and different terminals are attached to it.
Each user works on a separate terminal and utilizes the CPU's
resources.
Each user's programs and other files are stored in the system unit's
storage media. Thus the CPU is one and many users are using it.
Therefore there is a need for an OS that will effectively divide the
resources of the CPU among all users. Such an OS is called a
multi-user OS.
Features of Multi User OS
1. Multi Processing
As many users are working at a time, every user will run their
own program. When one program is run by a user, it is a
process. When the same program is run by another user, it is
another process. If there are different users running different
programs, there are many processes undergoing execution. A
user should not have to wait until other users' programs finish
execution. The same program can be shared by many users at a
time and run together. This ability of the OS to run several
processes together is called multi-processing.
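The idea of several processes running together can be seen from any shell by starting background jobs. The sketch below is a minimal illustration, not specific to any one UNIX variant:

```shell
# Start two processes in the background; the shell does not wait for either.
sleep 2 &
pid1=$!
sleep 2 &
pid2=$!

# Both processes exist at the same time.
ps -p "$pid1" > /dev/null && echo "first process running"
ps -p "$pid2" > /dev/null && echo "second process running"

# wait blocks until both background processes have finished.
wait "$pid1" "$pid2"
echo "both done"
```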
2. Time Sharing
The CPU can execute only one instruction at a time. Since there
are several users running their programs the OS divides the CPU
time for each user. It allots a definite time interval called a time slice
within which that user's program is executed. Once the time slice is
over, the CPU switches to the next user and executes that user's
program. After that user's time slice is over, the next user's program
is executed.
3. Memory Management
4. Multi Tasking
The Shell: The shell acts as an interface between the user and the
machine, effectively interpreting every command given by the user and
advising the kernel to act accordingly.
A single user OS will have only one shell devoted entirely to the user
whereas in a multi user OS every user will have a separate shell.
Kernel: The Kernel is the part of OS that interacts directly with the
hardware of the Computer system.
During the past 25 years the UNIX Operating System has evolved into a
powerful, flexible, and versatile operating system. It serves as the Operating
System for all types of computers, including single user personal computers
and engineering workstations, multi-user microcomputers, minicomputers,
mainframes and supercomputers, as well as special purpose devices, with
approximately 20 million computers now running UNIX and more than 100
million people using these systems. This rapid growth is expected to continue.
The success of UNIX is due to many factors, including its portability to a wide
range of machines, its adaptability and simplicity, the wide range of tasks that
it can perform, its multi-user and multi-tasking nature, and its suitability for
networking, which has become increasingly important as the Internet has
blossomed. What follows is a description of the features that have made the
UNIX system so popular.
Understanding UNIX:
The features that made UNIX a hit from the start are:
Multitasking capability
Multi-user capability
Portability
Cooperative Tools and Utilities
Excellent Networking capability
Open Source Code
Multitasking
Many computers do just one thing at a time, as anyone who uses a PC or
laptop can attest. Try logging onto your company's network while opening
your browser while opening a word processing program. Chances are the
processor will freeze for a few seconds while it sorts out the multiple
instructions.
UNIX, on the other hand, lets a computer do several things at once, such as
printing out one file while the user edits another file. This is a major feature for
users, since users don't have to wait for one application to end before starting
another one.
Multi-user
The same design that permits multitasking permits multiple users to use the
computer. The computer can take the commands of a number of users --
determined by the design of the computer -- to run programs, access files,
and print documents at the same time.
The computer can't tell the printer to print all the requests at once, but it does
prioritize the requests to keep everything orderly. It also lets several users
access the same document by compartmentalizing the document so that the
changes of one user don't override the changes of another user.
Portability
A major contribution of the UNIX system was its portability, permitting it to
move from one brand of computer to another with a minimum of code
changes. At a time when different computer lines of the same vendor didn't
talk to each other -- let alone machines of multiple vendors -- that meant a
great savings in both hardware and software upgrades.
It also meant that the operating system could be upgraded without having all
the customer's data inputted again. And new versions of UNIX were backward
compatible with older versions, making it easier for companies to upgrade in
an orderly manner.
UNIX comes with hundreds of programs that are divided into two classes:
Integral utilities that are absolutely necessary for the operation of the
computer, such as the command interpreter, and
Tools that aren't necessary for the operation of UNIX but provide the
user with additional capabilities, such as typesetting capabilities and e-
mail.
Examples include man, dc, mail, calendar, fsck, nroff, and vi.
Tools can be added or removed from a UNIX system, depending upon the
applications required.
Open Source Code:
UNIX has provision for protecting data and communicating with other users.
The source code (Open Source) for the UNIX system has been made
available to users and programmers.
History of UNIX:
1965 Bell Laboratories joins with MIT and General Electric in the
development effort for the new operating system, Multics, which would
provide multi-user, multi-processor, and multi-level (hierarchical) file
system, among its many forward-looking features.
1969 AT&T was unhappy with the progress and drops out of the Multics
project. Some of the Bell Labs programmers who had worked on this project,
Ken Thompson, Dennis Ritchie, Rudd Canaday, and Doug McIlroy
designed and implemented the first version of the Unix File System on a PDP-
7 along with a few utilities. It was given the name UNIX by Brian Kernighan as
a pun on Multics.
1971 The system now runs on a PDP-11, with 16Kbytes of memory, including
8Kbytes for user programs and a 512Kbyte disk.
Its first real use is as a text processing tool for the patent department at Bell
Labs. That utilization justified further research and development by the
programming group. UNIX caught on among programmers because it was
designed with these features:
Programmers environment
Simple user interface
Simple utilities that can be combined to perform powerful functions
Hierarchical file system
Simple interface to devices consistent with file format
Multi-user, multi-process system
Architecture independent and transparent to the user.
By 1977, the fifth and sixth editions had been released; these contained many
new tools and utilities. The number of machines running the UNIX System,
primarily at Bell laboratories and Universities, increased to more than 600 by
1978. The seventh edition, the direct ancestor of the UNIX Operating
System available today, was released in 1979.
UNIX System III, based on the Seventh edition, became AT&T's first
commercial release of the UNIX System in 1982. However, after System III
was released, AT&T, through its Western Electric manufacturing subsidiary,
continued to sell versions of the UNIX system. UNIX System III, the various
research editions, and experimental versions were distributed to colleagues at
universities and other research laboratories.
UC Berkeley release with many networking capabilities
1984  System V Release 2
1995  Solaris 2.5 (Motif supported)
1995  HP-UX 10.0 (X/Open mark for systems registered under the Single UNIX Specification; CDE supported)
1997  Solaris 2.6 (registered under the Single UNIX Specification and the Common Desktop Environment (CDE))
1997  Single UNIX Specification, Version 2 (performance improvements and networking software added)
Today, the UNIX leaders include Solaris, Linux, HP-UX, AIX, and SCO.
Why UNIX Is Popular?
One of the most significant points of UNIX is the availability of source code for
the system. (For those new to software, source code contains the
programming elements that, when passed through a compiler, will produce a
binary program, which can be executed.) The binary program contains
specific computer instructions, which tell the system "what to do." When the
source code is available, it means that the system (or any subcomponent) can
be modified without consulting the original author of the program. Access to
the source code is a very positive thing and can result in many benefits. For
example, if software defects (bugs) are found within the source code, they can
be fixed right away— without perhaps waiting for the author to do so.
Another great reason is that new software functions can be integrated into the
source code, thereby increasing the usefulness and the overall functionality of
the software. Having the ability to extend the software to the user's
requirements is a massive gain for the end user and the software industry as
a whole. Over time, the software can become much more useful. One
downside to having access to the source code is that it can become hard to
manage, because it is possible that many different people could have
modified the code in unpredictable (and perhaps negative) ways. However,
this problem is typically addressed by having a "source code maintainer,"
who reviews the source code changes before the modifications are
incorporated into the original version.
Another downside to source code access is that individuals may use this
information with the goal in mind of compromising system or component
security. The Internet Worm of 1988 is one such popular example. The
author, who was a graduate student at Cornell University at the time, was able
to exploit known security problems within the UNIX system to launch a
software program that gained unauthorized access to systems and was able
to replicate itself to many networked computers. The Worm was so successful
in attaching and attacking systems that it caused many of the computers to
crash due to the amount of resources needed to replicate. Although the Worm
didn't actually cause significant permanent damage to the systems it infected,
it opened the eyes of the UNIX community about the dangers of source code
access and security on the Internet as a whole.
Flexible Design
UNIX was designed to be modular, which makes it a very flexible architecture.
The modularity helps provide a framework that makes it much easier to
introduce new operating system tools, applications, and utilities, or to help in
the migration of the operating system to new computer platforms or other
devices. Although some might argue that UNIX isn't flexible enough
for their needs, it is quite adaptable and can handle most requirements.
This is evidenced by the fact that UNIX runs on more general computer
platforms and devices than any other operating system.
GNU
The GNU project, started in the early 1980s, was intended to act as a
counterbalance to the widespread activity of corporate greed and adoption of
license agreements for computer software. The "GNU is Not UNIX" project
was responsible for producing some of the world's most popular UNIX
software.
This includes the Emacs editor and the gcc compiler. They are the
cornerstones of the many tools that a significant number of developers use
every day.
Open Software
Programming Environment
There are tools to handle many system administration tasks that you might
encounter. Also, there are tools for development, graphics manipulation, text
processing, database operations: just about any user- or system-related
requirement. If the basic operating system version doesn't provide a
particular tool that you need, chances are that someone has already
developed the tool and it would be available via the Internet.
System Libraries
Well Documented
UNIX is well documented with both online manuals and with many reference
books and user guides from publishers. Unlike some operating systems, UNIX
provides online man page documentation of all tools that ship with the
system.
Further, the UNIX community provides journals and magazine articles about
UNIX, tools, and related topics of interest.
ARCHITECTURE OF UNIX SYSTEM:
To understand how the UNIX System works, you need to understand its
structure. The UNIX Operating System is made up of several major
components. Those components include the Kernel, the shell, the file
system, and the commands or user programs.
UNIX is a layered operating system. The innermost layer is the hardware that
provides the services for the OS. The operating system, referred to in UNIX
as the kernel, interacts directly with the hardware and provides the services
to the user programs. These user programs don't need to know anything
about the hardware. They just need to know how to interact with the
kernel, and it's up to the kernel to provide the desired service. One of the
big appeals
of UNIX to programmers has been that most well written user programs are
independent of the underlying hardware, making them readily portable to new
systems.
Note: The core of the UNIX system is the Kernel. The kernel controls the
computer's resources, allotting them to different users and to different
tasks.
User programs interact with the kernel through a set of standard system
calls. These system calls request services to be provided by the kernel. Such
services would include accessing a file: open, close, read, write, link, or
execute a file; starting or updating accounting records; changing ownership of
a file or directory; changing to a new directory; creating, suspending, or killing
a process; enabling access to hardware devices; and setting limits on system
resources.
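These services can be exercised from the shell. The sketch below maps everyday shell actions to the kernel services just listed; the paths used are scratch locations, not part of any real system layout:

```shell
# Each action below is carried out by the kernel on the shell's behalf.
tmp=$(mktemp -d)                # create a directory
cd "$tmp"                       # change to a new directory
echo "hello" > note.txt         # open, write, and close a file
read first < note.txt           # open, read, and close a file
ln note.txt alias.txt           # link a file to a second name
sleep 30 &                      # create a process
kill $!                         # kill that process
ulimit -n                       # view a limit on system resources (open files)
cd /
rm -r "$tmp"                    # clean up the scratch directory
```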
Apart from the utilities that are provided as part of the UNIX operating system,
more than a thousand UNIX-based application programs, like database
management systems, word processors, accounting software, etc., are available.
The basic unit used to organize information in the UNIX System is called a
file. The UNIX file system provides a logical method for organizing, storing,
retrieving, manipulating, and managing information.
UNIX SHELLS
The Shell reads your commands and interprets them as requests to execute
a program or programs, which it then arranges to have carried out. Because
the shell plays this role, it is called a command interpreter. Besides being a
command interpreter, the shell is also a programming language. As a
programming language, it permits you to control how and when commands
are carried out. For each user working with UNIX at any time, different shell
programs are running. There may be several shells running in memory, but
only one kernel.
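You can check which shell you are running at any moment. A small sketch; the ps options used here are POSIX, but output details vary by system:

```shell
# $SHELL records your login shell; $$ is the process ID of the shell
# (or script interpreter) you are in right now.
echo "login shell: $SHELL"

# ps reports which program is actually running as that process.
current=$(ps -p $$ -o comm=)
echo "current shell: $current"
```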
The original UNIX system shell, sh, was written by Steve Bourne, and as a
result it is known as the Bourne shell.
The C shell, csh, was originally developed as part of BSD UNIX. csh
introduced a number of important enhancement to sh, including the concept
of a command history list and job control.
The Korn shell, ksh, builds on the sh and extends it by adding many features
from the C shell.
Each of these shells has its own prompt. The Bourne shell has
the $ prompt. So when you log in, it is the Bourne shell that is established
for you and the stage is set for you to work on the machine.
Features of Shell:
Shell Variables: The user can control the behavior of the shell, as well
as other programs utilities by storing data in variables.
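A minimal sketch of this feature; the backup_dir name and its path are made up for illustration:

```shell
# Store data in a shell variable, then use it to control commands.
backup_dir="${TMPDIR:-/tmp}/backup_demo_$$"   # hypothetical location
mkdir -p "$backup_dir"
echo "backups will go to: $backup_dir"

# export makes the variable visible to programs the shell starts.
export backup_dir
child=$(sh -c 'echo "$backup_dir"')
echo "child shell saw: $child"

rm -r "$backup_dir"
```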
The File System
The UNIX file system looks like an inverted tree structure. You start with the
root directory, denoted by /, at the top and work down through sub-directories
underneath it.
Each node is either a file or a directory of files, where the latter can contain
other files and directories. You specify a file or directory by its path name,
either the full, or absolute, path name or the one relative to a location. The full
path name starts with the root, /, and follows the branches of the file system,
each separated by /, until you reach the desired file, e.g.:
/home/Sreedhar/source/xntp
A relative path name specifies the path relative to another, usually the current
working directory that you are at. Two special directory entries should be
introduced now: . (the current directory) and .. (the parent directory). For
example:
../Sreedhar/source/xntp
This indicates that I should first go up one directory level, then come down
through the Sreedhar directory, followed by the source directory and then to
xntp.
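The same file can be reached by an absolute path, a relative path, or a path that climbs with `..`. A sketch using a scratch copy of the directory tree above:

```shell
# Build a scratch tree mirroring the example path.
base=$(mktemp -d)
mkdir -p "$base/home/Sreedhar/source"
echo "ntp sources" > "$base/home/Sreedhar/source/xntp"

cd "$base/home/Sreedhar"
abs=$(cat "$base/home/Sreedhar/source/xntp")   # absolute: from the root down
rel=$(cat source/xntp)                         # relative: from the cwd down

cd source
up=$(cat ../source/xntp)                       # .. climbs one level first

cd /
rm -r "$base"
```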
Every directory and file is listed in its parent directory. In the case of the root
directory, that parent is itself. A directory is a file that contains a table listing
the files contained within it, mapping file names to inode numbers in the list.
An inode is a special file designed to be read by the kernel to learn the
information about each file. It specifies the permissions on the file, ownership,
date of creation and of last access and change, and the physical location of
the data blocks on the disk containing the file.
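Hard links make the inode visible: two directory entries can map to the same inode number, which ls -i prints. A minimal sketch in a scratch directory:

```shell
dir=$(mktemp -d)
echo "contents" > "$dir/original"
ln "$dir/original" "$dir/hardlink"   # a second name for the same inode

# ls -i prints the inode number before each file name.
ino1=$(ls -i "$dir/original" | awk '{print $1}')
ino2=$(ls -i "$dir/hardlink" | awk '{print $1}')
echo "inode numbers: $ino1 and $ino2"

rm -r "$dir"
```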
The system does not require any particular structure for the data in the file
itself. The file can be ASCII or binary or a combination, and may represent
text data, a shell script, compiled object code for a program, directory table,
junk, or anything you would like.
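Because the system imposes no structure, the file command guesses a type by inspecting the data itself. A sketch; the exact output wording varies between UNIX versions:

```shell
dir=$(mktemp -d)
printf 'plain text\n' > "$dir/notes"
printf '#!/bin/sh\necho hi\n' > "$dir/script"

# file -b prints its guess at the content type, without the file name.
kind=$(file -b "$dir/notes")
echo "notes: $kind"
file -b "$dir/script"

rm -r "$dir"
```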
Unix Programs
The shell is a command line interpreter. The user interacts with the kernel
through the shell. You can write ASCII (text) scripts to be acted upon by a
shell.
System programs are usually binary, having been compiled from C source
code. These are located in places like /bin, /usr/bin, /usr/local/bin, /usr/ucb,
etc.
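The split between text scripts and compiled binaries can be seen directly. A sketch; command -v is the portable way to ask where a program lives:

```shell
# A shell script is just an ASCII text file marked executable.
dir=$(mktemp -d)
cat > "$dir/hello.sh" <<'EOF'
#!/bin/sh
echo "hello from a script"
EOF
chmod +x "$dir/hello.sh"

msg=$("$dir/hello.sh")
echo "$msg"

# System programs are binaries found in directories such as /bin.
command -v ls

rm -r "$dir"
```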
Module 2
Exploring the UNIX Shell:
The shell is a rather unique component of the UNIX operating system since it
is one of the primary ways to interact with the system. It is typically through
the shell that users execute other commands or invoke additional functions.
UNIX supports a large number of different shells, and also many of the
popular ones are freely available on the Internet. Also, many versions of UNIX
come with one or more shells and as the system administrator, you can install
additional shells when necessary and configure the users of the system to use
different shells, depending on specific preferences or requirements. The table
below lists many of the popular shells and a general description of each.
Once a user has logged into the system, the default shell prompt appears and
the shell simply waits for input from the user. Thus, logging into a Solaris
system as the root user, for example, the standard Bourne shell prompt will be #.
The system echoes this prompt to signal that it is ready to receive input from
the keyboard. At this point, the user is free to type in any standard UNIX
command, application, or custom script name, and the system will attempt to
execute or run the command. The shell assumes that the first argument given
on the command line is the name of the command to execute.

bash  GNU Bourne-Again shell that includes elements from the Korn
      shell and C shell.
ksh   The Korn shell combines the best features of the Bourne and C
      shells and includes powerful programming tools.
zsh   Korn shell like, but also provides many more features, such as
      built-in spell correction and programmable command completion.
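On many systems the file /etc/shells lists the installed login shells, though its presence is not guaranteed; any installed shell can also be invoked directly to run a command:

```shell
# Show the installed login shells, where the file exists.
if [ -f /etc/shells ]; then
    cat /etc/shells
fi

# Run a single command under a specific shell (sh is always present).
greeting=$(sh -c 'echo hello from sh')
echo "$greeting"
```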
The configuration you use to access your UNIX System can be based on one
of two basic models: using multi-user computer or single user computer.
On a multi-user system, you use your own terminal device to access the UNIX
system. The computer you access can be a workstation, a microcomputer, a
mainframe computer, or even a super computer.
Single user systems are direct personal computer. In this you can directly run
UNIX OS. (UnixWare 7.1 by SCO, Solaris 7 from SunSoft, Public domain
Version of UNIX, and popular variant of UNIX known as Linux can use on
single user system).
Your display can be character-based, or it can be bit mapped. It may display a
single window or multiple windows, as in the X Window System.
UNIX System from a Terminal: If your terminal has not been set to work with
a UNIX System, you must have its options set appropriately. Setting options is
done in different ways on different terminals.
Selecting a LOGIN : Every UNIX System has at least one person, called the
System Administrator, whose job is to maintain the system, and make it
available to its users. The system administrator is also responsible for adding
new users to the system and setting up their initial work environment on the
computer.
Login name must be more than two characters long, and if it is longer
than eight, only the first eight characters are relevant.
Your logname should not have any symbols or spaces in it, and it must
be unique for each user. Some lognames are reserved customarily for
certain uses. For example, the root normally refers to the system
administrator or superuser who is responsible for the whole system.
login:
Dial in Access: You may have to dial into the computer using a modem
before you are connected. Use your emulator or dial function to dial the UNIX
System access number. When the system answers the call, you will hear a
high-pitched tone and you should see some characters appear on the screen.
Then you get the UNIX system login prompt.
Logging In:
As a multi-user system, the UNIX System first requires that you identify
yourself before you can access the system.
Changing Your Password:
When you first log into a UNIX System, you will have either no password at all
(a null password) or an arbitrary password assigned by the system
administrator. These are only intended for temporary use. Neither offers any
real security. A null password gives anyone access to your account; one
assigned by the system administrator is likely to be easily guessed by
someone. Officially assigned passwords often consist of simple combinations
of your initials and your student, employee, or social security number. If your
password is simply your employee number and the letter X, anyone with
access to this information has access to all of your computer files. Sometimes
random combinations of letters and numbers are used. Such passwords are
difficult to remember, and consequently users will be tempted to write them
down in a convenient place. (Resist this temptation!)
You change your password by using the passwd command. When you issue
this command, the system checks to see if you are the owner of the login.
This prevents someone from changing your password and locking you out of
your own account. passwd first announces that it is changing the password,
and then it asks for your (current) old password, like this:
$ passwd
Old password:
New password:
The system asks for a new password and asks for the password to be verified
(you do this by retyping it). The next time you log in, the new password is
effective. Although you can ordinarily change your password whenever you
want, on some systems after you change your password you must wait a
specific period of time before you can change it again.
Don't:
Use a word (or words) in any language
Use a proper name
Use information that can be found in your wallet
Use information commonly known about you (car license, pet name, etc.)
Use control characters. Some systems can't handle them
Write your password anywhere
Ever give your password to *anybody*
On some systems, you will be required to change your password the first time
you log in. This will work as described previously and will look like this:
login: sreedhar
Password:
Your password has expired.
Choose a new one.
Old password:
New password:
Re-enter new password:
Password Aging
To ensure the secrecy of your password, you will not be allowed to use the
same password for long stretches of time. On UNIX Systems, passwords age.
When yours gets to the end of its lifespan, you will be asked to change it. The
length of time your password will be valid is determined by your system
administrator. However, you can view the status of your password on most
UNIX systems. Generally, the -s option to the passwd command shows you
the status of your password, like this:
$ passwd -s
rayjay PW 04/01/99 7 30 5
name
passwd status
date last changed
min days between changes
max days between changes
days before user will be warned to change password
The first field contains your login name; the next fields list the status of your
password, the date it was last changed, and the minimum and maximum days
allowed between password changes; and the last field is the number of days
before your password will need to be changed. Note that this is simply an
example; on your system, you may not be allowed to read all of these fields.
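The fields can be pulled apart with awk. A sketch using the sample line above:

```shell
# Sample passwd -s output line from the text.
line="rayjay PW 04/01/99 7 30 5"

name=$(echo "$line"     | awk '{print $1}')   # login name
status=$(echo "$line"   | awk '{print $2}')   # password status
changed=$(echo "$line"  | awk '{print $3}')   # date last changed
min_days=$(echo "$line" | awk '{print $4}')   # min days between changes
max_days=$(echo "$line" | awk '{print $5}')   # max days between changes

echo "$name ($status) last changed the password on $changed"
echo "changes allowed every $min_days to $max_days days"
```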
An Incorrect Login
If you make a mistake in typing either your login or your password, the UNIX
System will respond this way:
login: sreedhar
Password:
Login Incorrect
login:
You will receive the "Password:" prompt even if you type an incorrect or
nonexistent login name. This prevents someone from guessing login names
and learning which one is valid by discovering one that yields the
"Password:" prompt. Because any login results in "Password:" an intruder
cannot guess login names in this way.
If you repeatedly type your login or password incorrectly (three to five times,
depending on how your system administrator has set the default), the UNIX
System will disconnect your terminal if it is connected via modem or LAN. On
some systems, the system administrator will be notified of erroneous login
attempts as a security measure. If you do not successfully log in within some
time interval (usually a minute), you will be disconnected.
If you have problems logging in, you might also check to make sure that your
CAPS LOCK key has not been set. If it has been set, you will inadvertently enter
an incorrect logname or password, because in UNIX uppercase and
lowercase letters are treated differently. (Note that unlike in some other
environments, your account will not get locked if you enter your password
incorrectly some number of times, you will just get disconnected.)
When you successfully enter your login and password, the UNIX System
responds with a set of messages, similar to this:
login: sreedhar
Password:
UNIX System V/386/486 Release 4.0 Version 3.0
minnie
Copyright (c) 1984, 1986, 1987, 1988, 1989, 1990 AT&T
Copyright (C) 1987, 1988 Microsoft Corp.
Copyright (C) 1990, NCR Corp.
All Rights Reserved
Last login: Mon January 29 19:55:17 on term/17
You first see the UNIX System announcement that tells you the particular
version of UNIX you are using. Next you see the name of your system, minnie
in this case. This is followed by the copyright notice.
Finally, you see a line that tells you when you logged in last. This is a security
feature. If the time of your last login does not agree with when you remember
logging in, call your system administrator. This discrepancy could be an
indication that someone has broken into your system and is using your login.
After this initial announcement, the UNIX System presents system messages
and news.
Because every user has to log in, the login sequence is the natural place to
put messages that need to be seen by all users. When you log in, you will first
see a message of the day (MOTD). Because every user must see this MOTD,
the system administrator (or root) usually reserves these messages for
comments of general interest, such as this:
After you log in, you will see the UNIX System command prompt at the far left
side of the current line. The default system prompt (for most UNIX Systems) is
the dollar sign:
$
This $ is the indication that the UNIX System is waiting for you to enter a
command.
In the examples in this book, you will see the $ at the beginning of a line as it
would be seen on the screen, but you are not supposed to type it.
The UNIX System enables you to define a prompt string, PS1, which is used
as a command prompt. The symbol PS1 is a shell variable (see Chapter 7)
that contains the string you want to use as your prompt. To change the
command prompt, set PS1 to some new string. For example,

$ PS1="UNIX:> "

changes your primary prompt string from whatever it currently is to the string
"UNIX:> ". From that point, whenever the UNIX System is waiting for you to
enter a command, it will display this new prompt at the beginning of the line.
You can change your prompt to any string of characters you want. You can
use it to remind yourself which system you are on, like this:
$ PS1="MyUnix-> "
MyUnix->
If you redefine your prompt, it stays effective until you change it or until you
log off. Later in this chapter, you will learn how to make these changes
automatically when you first log in.
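The usual mechanism is to put the assignment in a startup file such as ~/.profile, which the login shell reads. The sketch below uses a scratch file rather than touching your real startup file:

```shell
# Write the prompt assignment to a stand-in for ~/.profile.
profile=$(mktemp)
cat > "$profile" <<'EOF'
PS1="MyUnix-> "
export PS1
EOF

# Source it the way a login shell would, then inspect the result.
. "$profile"
echo "prompt is now: $PS1"

rm "$profile"
```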
The UNIX System makes a large number of programs available to the user.
To run one of these programs you issue a command. For example, when you
type news or passwd, you are really instructing the UNIX System command
interpreter to execute a program with the name news or passwd, and to
display the results on your screen.
Some commands simply provide information to you; news works this way. An
often-used command is date, which prints out the current day, date, and time.
There are hundreds of other commands, and you will learn about many of
them in this book. Different variants of the UNIX system share a large
common set of commands (sometimes different names are used for the same
command in different UNIX variants) and provide other commands that are
unique for that particular version of UNIX.
The UNIX system offers several file and directory related commands which
the user can use according to his requirement.
Commands are case sensitive. command and Command are not the same.
Options are generally preceded by a hyphen (-), and for most commands,
more than one option can be strung together, in the form:
command -[option][option][option]
e.g.: ls -alR
will perform a long list on all files in the current directory and recursively
perform the list through all sub-directories.
For most commands you can separate the options, preceding each with a
hyphen, as in:
ls -a -l -R
These are the standard conventions for commands. However, not all UNIX
commands will follow the standard. Some don't require the hyphen
before options and some won't let you group options together, i.e., they may
require that each option be preceded by a hyphen and separated by white
space from other options and arguments.
Options and syntax for a command are listed in the man page for the
command.
UNIX Commands:
UNIX comes with a large number of commands that fall under each of the
categories listed above for both the generic user and the system
administrator. It is quite hard to list and explain all of the available UNIX
functions and/or commands in a single book. Therefore, a review of some of
the more important user-level commands and functions has been provided
and subsequent modules provide a more in-depth look at system-level
commands. All of the commands discussed below can be run by generic
users and of course by the system administrator. However, one or more
subfunctions of a command may be available only to the system
administrator.
The standard commands listed below are available across many different
versions of UNIX. For example, if we wanted to get a listing of all the
users that are currently logged into the system, the who command can be
used.
Metacharacters and Wildcards
The metacharacters have special meaning to the shell; they should not
normally be used as any part of a file name.
The "-" symbol can usually be used in a filename provided it is not the first
character. For example, if we had a file called -l then issuing the command ls
-l would give you a long listing of the current directory because the ls
command would think the l was an option rather than -l being a file name
argument. Some UNIX commands provide facilities to overcome this problem.
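Modern shells and commands offer a couple of conventional escapes, sketched below; the "--" end-of-options marker is widely but not universally supported:

```shell
# Two common workarounds for a file whose name begins with a hyphen.
mkdir -p /tmp/dashdemo && cd /tmp/dashdemo
touch ./-l          # create a file literally named -l
ls -- -l            # "--" marks the end of options; -l is now an argument
ls ./-l             # a leading ./ also stops -l looking like an option
rm ./-l             # remove it the same way
```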
The shell offers certain special characters, called wild card characters, that
help us to specify patterns. The shell will then match the pattern against file
names, select all the files whose names match the pattern, and apply the
specified file command to them. The wild card characters are as follows
The wildcard ? is expanded by the shell to match any single character in a file
name. The exception is that the ? will NOT match a dot (.) as the
first character of a file name (for example, in a hidden file).
For example:
2. ?ell? matches any five-character filenames with 'ell' in the
middle.
3. he* matches any filename beginning with 'he'.
4. [m-z]*[a-l] matches any filename that begins with a letter from 'm'
to 'z' and ends in a letter from 'a' to 'l'.
5. {/usr,}{/bin,/lib}/file expands to /usr/bin/file /usr/lib/file /bin/file and
/lib/file.
Note that the UNIX shell performs these expansions (including any filename
matching) on a command's arguments before the command is executed.
Example
*.c
includes all files ending with '.c' because * stands for any number of
any characters, e.g. new.c, ptr.c, str.c.
A command like rm *.c will therefore delete all files ending with '.c' The
other files which do not end with '.c' will be retained. The pattern
specifies that the files must necessarily end with '.c'.
Example
cat ab?xy
The above command will display the contents of all files whose name starts
with ab followed by any one character followed by xy.
This wild card specifies any one of the characters listed within the [ ].
Example
rm ab[efg]yz
The above command will delete all the files that begin with ab followed by
either e, f, or g followed by yz.
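The wildcard examples above can be tried in a scratch directory (the file names are invented for illustration):

```shell
# Demonstrate *, ? and [ ] expansion on some sample files.
mkdir -p /tmp/wilddemo && cd /tmp/wilddemo
touch new.c ptr.c str.c notes.txt abcxy abfyz
echo *.c            # new.c ptr.c str.c
echo ab?xy          # ? matches exactly one character: abcxy
echo ab[efg]yz      # [efg] matches a single e, f or g: abfyz
```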
PIPES
UNIX offers a provision whereby the output of one program can be made the
input of another program. The two programs are separated by the | symbol.
Example
$ cat fil.c | pg
The above command will display the contents of the file fil.c page by page
because the output is piped to a program called pg, which displays the output
only one screenful at a time.
Every process has three standard files associated with it, referred to as
standard input, standard output and standard error.
Standard output (stdout) and standard error (stderr) are where the command
expects to put its output, usually the screen.
Note: Remember that in AIX, not all file names refer to real data files!
Two or more commands can be separated by a pipe on a single command
line. The requirement is that any command to the left of a pipe must send
output to standard output.
Any command to the right of the pipe must take its input from standard input.
The example on the visual shows that the output of who is passed as input to
wc -l, which gives us the number of active users on the system.
A command is referred to as a filter if it can read its input from standard input,
alter it in some way, and write its output to standard output. A filter can be
used as an intermediate command between pipes.
The output of the grep command is then piped to the wc -l command. The
result is that the command is counting the number of directories. In this
example, the grep command is acting as a filter.
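The pipeline described can be sketched as follows; the ls -l first stage is an assumption, since the original example from the visual is not shown:

```shell
# grep as a filter between two pipes: count the subdirectories of the
# current directory. In long-listing format, directory lines start with 'd'.
mkdir -p /tmp/filterdemo/sub1 /tmp/filterdemo/sub2
cd /tmp/filterdemo
touch plainfile
ls -l | grep '^d' | wc -l     # prints 2: sub1 and sub2
```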
The \ must be the last character on the line and immediately followed by
pressing Enter.
Do not confuse the continuation prompt > with the redirection character >. The
secondary prompt will not form part of the completed command line. If you
require a redirection character you must type it explicitly.
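For example, a command continued with a trailing backslash; the shell shows its secondary prompt until the command is complete:

```shell
# The backslash must be the very last character before Enter; the shell
# joins the two lines into a single command.
echo one two \
three              # prints: one two three
```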
Module 3
Processes:
UNIX can run a number of different processes at the same time as well as
many occurrences of a program (such as vi) existing simultaneously in the
system.
To identify the running processes, execute the command ps, which will be
covered later in this course. For example, ps -u team01 shows all running
processes from user team01.
ps prints information only about processes started from your current terminal.
Only the Process ID, Terminal, Elapsed Time and Command are displayed.
The -e option displays information about EVERY process running in the
system.
The -f option displays the User Name, PPID and start time for each process, in
addition to the default information provided by ps (that is, a FULL listing).
The -l option displays the User ID, PPID and priorities for each process in
addition to the information provided by ps (that is, a LONG listing)
Processes that are started from and require interaction with the terminal are
called foreground processes. Processes that are run independently of the
initiating terminal are referred to as background processes.
Background processes are most useful with commands that take a long time
to run.
Notes: The <ctrl-c> may not always work. A Shell script or program can trap
the signal a <ctrl-c> generates and ignore its meaning.
You can stop a foreground process by pressing <ctrl-z>. This does not
terminate the process; it suspends it so that you can subsequently restart it.
To find out what suspended/background jobs you have, issue the jobs
command.
The bg, fg and kill commands can be used with a job number. For instance, to
kill job number 3, you can issue the command: kill %3
The jobs command
does not list jobs that were started with the nohup command if the user has
logged off and then logged back into the system. On the other hand, if a user
invokes a job with the nohup command and then issues the jobs command
without logging off, the job will be listed.
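A sketch of these commands in action; sleep 60 stands in for any long-running command:

```shell
# Run a long command in the background, list jobs, then terminate it.
sleep 60 &
bgpid=$!            # $! holds the PID of the most recent background process
jobs                # shows something like: [1] + Running  sleep 60 &
kill $bgpid         # interactively, 'kill %1' would do the same
wait                # collect the terminated job
echo "background job $bgpid finished"
```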
Module 4
Shell Script:
When a shell script is executed, the shell reads the file one line at a time and
processes the commands in sequence.
Any UNIX command can be run from within a shell script. There are also a
number of built-in shell facilities which allow more complicated functions to be
performed. These will be illustrated later.
A shell script is a collection of commands in a file. In the example a shell
script hello is shown.
To execute this script, start the program ksh and pass the name of the shell
script as argument:
$ ksh hello
This shell reads the commands from the script and executes all commands
line by line.
The .profile contains a sequence of commands that help you customize your
environment. Because the .profile is read each time you start a new Korn
shell, the commands you put in this file to customize your environment will be
executed each time you start a new ksh.
These commands can include, but are certainly not limited to, the following:
1. aliases
2. terminal control characteristics
3. creation/definition of shell environment variables (including your
prompt)
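A minimal sketch of the kind of customization a .profile might hold; the particular prompt, editor and alias shown are invented examples:

```shell
# Read once at login by the Korn shell, after /etc/environment
# and /etc/profile.
PS1='$PWD $ '            # prompt that shows the current directory (ksh)
EDITOR=/usr/bin/vi       # editor used by programs that honour $EDITOR
export PS1 EDITOR
alias ll='ls -l'         # a personal shorthand command
```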
The first file that the operating system uses at login is the /etc/environment
file. This file contains variables specifying the basic environment for all
processes and can only be changed by the system administrator.
The second file that the operating system uses at login time is the /etc/profile
file. This file controls system-wide default variables such as the mail
messages and terminal types.
The .profile file is the third file read at login time. It resides in a user's login
directory and enables a user to customize their individual working
environment. The .profile file overrides commands run and variables set and
exported by the /etc/profile file.
Ensure that newly created variables do not conflict with standard variables
such as MAIL, PS1, PS2 and so forth.
At startup time the shell checks to see if there is any new mail in
/usr/spool/mail/$LOGNAME. If there is then MAILMSG is echoed back. In
normal operation, the shell checks periodically.
The .profile file is read only when the user logs in.
Be aware that your .profile file may not be read if you are accessing the
system through CDE (the Common Desktop Environment). By default, CDE
instead uses a file called .dtprofile. In the CDE environment, if you wish to
use the .profile file, it is necessary to uncomment the DTSOURCEPROFILE
variable assignment at the end of the .dtprofile file.
Module 5
Overview
The tilde (~) Expansion:
The C shell provides an easy way to abbreviate the pathname of your home
directory. When the tilde symbol (~) appears at the beginning of a word in
your command line, the shell replaces it with the full pathname of your login
directory.
Example:
% mv file ~/newfile
% mv file $home/newfile
The whence command can be used to determine exactly where the command
you specify is located. For instance, it may be a command located on the disk
drive, it may be an alias, or it may be built-in to the Korn shell. whence reports
the proper location.
whence
Aliases
Aliases in the Korn shell allow you to create your own commands. You can
simply rename existing commands, or you can group commands together to
create entirely new commands. This feature is also available in the C shell,
but the command syntax is slightly different.
alias name='value'
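For example (the alias names here are invented):

```shell
# Rename an existing command and group commands under a new name.
alias dir='ls -l'           # 'dir' now behaves like 'ls -l'
alias where='pwd; ls'       # two commands grouped under one name
alias                       # with no arguments: list all aliases as name=value
unalias where               # remove an alias again
```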
The ENV variable specifies a Korn shell script to be invoked every time a new
shell is created. The shell script in this example is .kshrc (which is the
standard name used), but any other filename can also be used.
The difference between .profile and .kshrc is that .kshrc is read each time a
subshell is spawned, whereas .profile is read once at login.
EDITOR=/usr/bin/vi
export EDITOR
Exporting EDITOR in this way does the same thing that the set -o vi
command does, as shown in the example.
The alias command invoked with no arguments prints the list of aliases in the
form name=value on standard output.
The Korn shell sets up a number of aliases by default. Notice that the history
and r commands are in fact aliases of the fc command. Once this alias is
established, typing r will re-execute the previously entered command.
To carry down the value of an alias to subsequent subshells, the ENV variable
has to be modified. The ENV variable is normally set to $HOME/.kshrc in the
.profile file (although you can set ENV to any shell script). By adding the alias
definition to the .kshrc file (by using one of the editors) and invoking the
.profile file, the value of the alias will be carried down to all subshells, because
the .kshrc file is run every time a Korn shell is explicitly invoked.
The file pointed to by the ENV variable should contain Korn shell specifics.
The unalias command will cancel the alias named. The names of the aliases
specified with the unalias command will be removed from the alias list.
The /etc/environment file contains default variables set for each process.
Only the system administrator can change this file. PATH is the sequence of
directories that is searched when looking for a command whose path name is
incomplete.
LOCPATH is the full path name of the location of National Language Support
information, part of this being the National Language Support Table.
NLSPATH is the full path name for messages.
Module 6
The vi Editor
This unit covers only a subset of the vi functions. It is a very powerful editor.
Refer to the online documentation for additional functions.
vi does its editing in a buffer. When a session is initiated, one of two things
happens:
The editor starts in command mode.
Module 7
The Variables:
There are a number of variables automatically set by the shell when it starts.
These allow you to reference arguments on the command line.
User Variables
It is important to note that you must NOT precede or follow the equal sign with
a space or TAB character.
Sample Session:
$person=Sreedhar
This sample session indicates that person does not represent the string
Sreedhar. The string person is echoed as person. The BourneShell will only
do the substitution of the value of the variable when the name of the variable
is preceded with a dollar sign ($).
Sample Session:
$echo person
person
$echo $person
Sreedhar
$
Sample Session:
$person='Sreedhar and Venkatesh'
$echo $person
Sreedhar and Venkatesh
$
Shell variables are an integral part of shell programming. They provide the
ability to store and manipulate information within a shell program.
All shell variable names are case sensitive. For example, HOME and home
are not the same.
As a convention uppercase names are used for the standard variables set by
the system and lowercase is used for the variables set by the user.
The set command displays your current option settings for all the variables.
The set command is a built-in command of the shell, and therefore gives a
different output depending on the shell being run, for instance a Bourne or a
Korn shell.
The echo command displays the string of text to standard out (by default to
the screen).
To set a variable, use the = with NO SPACES on either side. Once the
variable has been set, to refer to the value of that variable precede the
variable name with a $. There must be NO SPACE between the $ and the
variable name.
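A short session illustrating these rules:

```shell
# No spaces around '=', and a leading '$' to read the value back.
person=Sreedhar
echo $person               # Sreedhar
greeting='Hello World'     # quotes keep the embedded space
echo "$greeting"           # Hello World
```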
Notice there need not be a space BEFORE the $ of the variable in order for
the shell to do variable substitution. Note, though, what happened when there
was no space AFTER the variable name. The shell searched for a variable
whose name was xylong, which did not exist. When a variable that has not
been defined is referenced, the user does not get an error. Rather a null string
is returned.
To eliminate the need for a space after the variable name, the curly braces { }
are used.
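A sketch of the problem and the brace fix (the variable names are invented):

```shell
fruit=apple
echo "${fruit}s"     # apples: the braces mark where the name ends
echo "$fruits"       # empty line: the shell looks up a variable named 'fruits'
```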
A variable can be set to the output of some command or group of commands
by using the backquotes (also referred to as grave accents). They should not
be mistaken for single quotes. In the examples the output of the date and
who commands are stored in variables.
The backquotes are supported by the Bourne shell, C shell and Korn shell.
The use of $(command) is specific to the Korn shell.
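Both substitution styles side by side:

```shell
# Capture command output in a variable.
now=`date`           # backquote (grave accent) form
here=$(pwd)          # Korn-shell form
echo "It is $now"
echo "We are in $here"
```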
The contents of the user variables and the shell variables can be modified by
the user. It is possible to assign a new value to them. The new value can be
assigned from the dollar ($) prompt or from inside a BourneShell script.
Read-only variables are different. The value of read-only variables can not be
changed.
The variable must be initialized to some value; and then, by entering the
following command, it can be made read only.
Sample Session:
$person=Sreedhar
$readonly person
$echo $person
Sreedhar
$person=Venkatesh
person: is read only
$
The readonly command given without any arguments will display a list of all
the read-only variables.
Sample Session:
$person=Sreedhar
$readonly person
$example=Venkatesh
$readonly example
$readonly
readonly person
readonly example
$
The read-only shell variables are similar to the read-only user variables;
except the value of these variables is assigned by the shell, and the user
CANNOT modify them.
The shell will store the name of the command you used to call a program in
the variable named $0.
It has the number zero because it appears before the first argument on the
command line.
Sample Session:
$cat name_ex
echo 'The name of the command used'
echo 'to execute this script was' $0
$name_ex
The name of the command used
to execute this script was name_ex
$
Arguments
The BourneShell will store the first nine command line arguments in the
variables named $1, $2, ..., $9. These variables appear in this section
because you cannot change them using the equal sign. It is possible to
modify them using the set command.
Sample Session:
$cat arg_ex
echo 'The first five command line'
echo 'arguments are' $1 $2 $3 $4 $5
$arg_ex Sreedhar Venkatesh Santhosh
The first five command line
arguments are Sreedhar Venkatesh Santhosh
$
The script arg_ex will display the first five command-line arguments. The
variables representing $4 and $5 have a null value.
Sample Session:
$cat display_all
echo $*
$display_all Sreedhar venkatesh Santhosh
Sreedhar venkatesh Santhosh
$
Sample Session:
$cat num_args
echo 'This script was called with'
echo $# 'arguments'
$num_args Sreedhar venkatesh Santhosh
This script was called with
3 arguments
$
BourneShell Environment - Exporting Variables
Within a process, you can declare, initialize, read, and modify variables. The
variable is local to that process. When a process forks a child process, the
parent process does not automatically pass the value of the variable to the
child process.
Sample Session:
$cat no_export
car=mercedes # set the variable
echo $0 $car $$ # $0 = name of file executed
# $car =value of variable car
# $$ = PID number (process id)
inner # execute another BourneShell script
echo $0 $car $$ # display same as above
$cat inner
echo $0 $car $$ # display variables for this process
$chmod a+x no_export
$chmod a+x inner
$no_export
no_export mercedes 4790
inner 4792
no_export mercedes 4790
$
Can the value be passed from parent to child process? Yes, by using the
export command. Let's look at an example.
Sample Session:
$cat export_it
car=mercedes
export car
echo $0 $car $$
inner1
echo $0 $car $$
$cat inner1
echo $0 $car $$
car=chevy
echo $0 $car $$
$chmod a+x export_it
$chmod a+x inner1
$export_it
export_it mercedes 4798
inner1 mercedes 4800
inner1 chevy 4800
export_it mercedes 4798
$
Exporting variables is only valid from the parent to the child process. The
child process cannot change the parent's variable.
The BourneShell script can read user input from standard input. The read
command will read one line from standard input and assign the line to one or
more variables. The following example shows how this works.
Sample Session:
$cat read_script
echo "Please enter a string of your choice"
read a
echo $a
$
This simple script will read one line from standard input (keyboard) and assign
it to the variable a.
Sample Session:
$read_script
Please enter a string of your choice
Here it is
Here it is
$
The line read from standard input can also be assigned to several variables
as shown in the following example.
Sample Session:
$cat reads
echo "Please enter three strings"
read a b c
echo $a $b $c
echo $c
echo $b
echo $a
$
This time, we will turn on the trace mechanism and follow the execution of this
BourneShell script.
Sample Session:
$sh -x reads
+ echo Please enter three strings
Please enter three strings
+ read a b c
this is more than three strings
+ echo this is more than three strings
this is more than three strings
+ echo more than three strings
more than three strings
+ echo is
is
+ echo this
this
$
It is interesting to note that the spaces separate the values for the variables
a,b, and c. For example, the variable a was assigned the string this, the
variable b was assigned the string is, and the remainder of the line was
assigned to c (including the spaces).
Sample Session:
$cat read_ex
echo 'Enter line: \c'
read line
echo "The line was: $line"
$
In this example, the \c option will suppress the carriage return.
The single quote marks protect the backslash from being interpreted
by the shell. Also notice that the double quote marks have no
effect on the substitution of the variable line.
Sample Session:
$read_ex
Enter line: All's well that ends well
The line was: All's well that ends well
$
Module 8
Parameters:
A shell script is invoked by typing its name. Parameters are passed to the
script by appending them to the script name, with spaces as separators.
POSITIONAL PARAMETERS
Let's look at an example BourneShell script to see how these are used.
Sample Session:
$cat neat_shell
echo $1 $2 $3
echo $0 is the name of the shell script
echo "There were $# arguments."
echo $*
$
Now, if we type the name of the BourneShell script with no arguments, we get
the following results.
Sample Session:
$neat_shell

neat_shell is the name of the shell script
There were 0 arguments.

$
In this sample session, there were no arguments given so none were printed.
$0 is the positional parameter that refers to the name of the script. Since
there were no arguments given with this invocation of neat_shell, there were
zero arguments listed.
The special variable $0 represents the name of the executing program. The
following shell script, if called script.sh, would output: This program is called
script.sh.
#!/bin/sh
echo This program is called $0.
exit 0
The first parameter to the shell is known as $1, the second as $2, etc. The
collection of ALL parameters is known as $*.
#!/bin/sh
echo the first parameter is $1
echo the second parameter is $2
echo the collection of ALL parameters is $*
exit 0
We cannot assign values to the positional parameters with the equal sign, as
we do to any other user-defined variables, or system variables for that matter.
Saying a=10 or b=alpha is fine, but $1=dollar or $2=100 is simply not done.
There is one way to assign values to the positional parameters: using the set
command.
$ set Friends come and go, but enemies accumulate
$ echo $1 $2 $3 $4 $5 $6 $7
Friends come and go, but enemies accumulate
#!/bin/sh
echo The first parameter is $1.
shift
echo The second parameter is $1.
shift
echo The third parameter is $1.
exit 0
Module 9
Regular Expression:
What is a Regular Expression?
Regular expressions are used when you want to search for specific lines of
text containing a particular pattern. Most of the UNIX utilities operate on ASCII
files a line at a time. Regular expressions search for patterns on a single line,
and not for patterns that start on one line and end on another.
Regular expressions confuse people because they look a lot like the file
matching patterns the shell uses. They even act the same way--almost. The
square brackets are similar, and the shell's asterisk acts similarly to, but not
identically to, the asterisk in a regular expression. In particular, the Bourne shell, C shell,
find, and cpio use file name matching patterns and not regular expressions.
There are three important parts to a regular expression. Anchors are used to
specify the position of the pattern in relation to a line of text. Character Sets
match one or more characters in a single position. Modifiers specify how
many times the previous character set is repeated. A simple example that
demonstrates all three parts is the regular expression "^#*." The caret ("^") is
an anchor that indicates the beginning of the line. The character "#" is a
simple character set that matches the single character "#." The asterisk is a
modifier. In a regular expression it specifies that the previous character set
can appear any number of times, including zero. This is a useless regular
expression, as you will see shortly.
There are also two types of regular expressions: the "Basic" regular
expression, and the "extended" regular expression. A few utilities like awk and
egrep use the extended expression. Most use the "regular" regular
expression. From now on, if I talk about a "regular expression," it describes a
feature in both types.
Here is a table of the Solaris (around 1991) commands that allow you to
specify regular expressions:
Most UNIX text facilities are line oriented. Searching for patterns that span
several lines is not easy to do. You see, the end of line character is not
included in the block of text that is searched. It is a separator. Regular
expressions examine the text between the separators. If you want to search
for a pattern that is at one end or the other, you use anchors. The character
"^" is the starting anchor, and the character "$" is the end anchor. The regular
expression "^A" will match all lines that start with a capital A. The expression
"A$" will match all lines that end with the capital A. If the anchor characters
are not used at the proper end of the pattern, then they no longer act as
anchors. That is, the "^" is only an anchor if it is the first character in a regular
expression. The "$" is only an anchor if it is the last character. The expression
"$1" does not have an anchor. Neither is "1^." If you need to match a "^" at the
beginning of the line, or a "$" at the end of a line, you must escape the special
characters with a back slash. Here is a summary:
Pattern Matches
^A "A" at the beginning of a line
A$ "A" at the end of a line
A^ "A^" anywhere on a line
$A "$A" anywhere on a line
^^ "^" at the beginning of a line
$$ "$" at the end of a line
The use of "^" and "$" as indicators of the beginning or end of a line is a
convention other utilities use. The vi editor uses these two characters as
commands to go to the beginning or end of a line. The C shell uses "!^" to
specify the first argument of the previous line, and "!$" is the last argument on
the previous line.
The character "." is one of those special meta-characters. By itself it will match
any character, except the end-of-line character. The pattern that will match a
line with a single character is
^.$
If you want to match specific characters, you can use the square brackets to
identify the exact characters you are searching for. The pattern that will match
any line of text that contains exactly one number is
^[0123456789]$
This is verbose. You can use the hyphen between two characters to specify a
range:
^[0-9]$
You can intermix explicit characters with character ranges. This pattern will
match a single character that is a letter, number, or underscore:
[A-Za-z0-9_]
Character sets can be combined by placing them next to each other. If you
wanted to search for a word that
You can easily search for all characters except those in square brackets by
putting a "^" as the first character after the "[." To match all characters except
vowels use "[^aeiou]."
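These character-set patterns can be exercised with grep, with sample lines supplied by printf:

```shell
printf '7\nx\n42\n'   | grep '^[0-9]$'      # prints 7: exactly one digit
printf 'a1\n--\nB_\n' | grep '[A-Za-z0-9_]' # lines containing a word character
printf 'sky\nsea\n'   | grep '[^aeiou]'     # lines with at least one non-vowel
```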
Just as the anchor characters lose their special meaning when they appear
where an anchor makes no sense, the characters "]" and "-" do not have a
special meaning if they directly follow "[." Here are
some examples:
a "-", a "z", or a "]"
The third part of a regular expression is the modifier. It is used to specify how
many times you expect to see the previous character set. The special character
"*" matches zero or more copies. That is, the regular expression "0*"
matches zero or more zeros, while the expression "[0-9]*" matches zero or
more numbers.
This explains why the pattern "^#*" is useless, as it matches any number of
"#'s" at the beginning of the line, including zero. Therefore this will match
every line, because every line starts with zero or more "#'s."
At first glance, it might seem that starting the count at zero is stupid. Not so.
Looking for an unknown number of characters is very important. Suppose you
wanted to look for a number at the beginning of a line, and there may or may
not be spaces before the number. Just use "^ *" to match zero or more spaces
at the beginning of the line. If you need to match one or more, just repeat the
character set. That is, "[0-9]*" matches zero or more numbers, and "[0-9][0-
9]*" matches one or more numbers.
You can continue the above technique if you want to specify a minimum
number of character sets. You cannot specify a maximum number of sets with
the "*" modifier. There is a special pattern you can use to specify the minimum
and maximum number of repeats. This is done by putting those two numbers
between "\{" and "\}." The back slashes deserve a special discussion.
Normally a backslash turns off the special meaning for a character. A period
is matched by a "\." and an asterisk is matched by a "\*."
If a backslash is placed before a "<," ">," "{," "}," "(," ")," or before a digit, the
back slash turns on a special meaning. This was done because these special
functions were added late in the life of regular expressions. Changing the
meaning of "{" would have broken old expressions. This is a horrible crime
punishable by a year of hard labor writing COBOL programs. Instead, adding
a back slash added functionality without breaking old programs. Rather than
complain about the asymmetry, view it as evolution.
Having convinced you that "\{" isn't a plot to confuse you, an example is in
order. The regular expression to match 4, 5, 6, 7 or 8 lower case letters is
[a-z]\{4,8\}
Any numbers between 0 and 255 can be used. The second number may be
omitted, which removes the upper limit. If the comma and the second number
are omitted, the pattern must be duplicated the exact number of times
specified by the first number.
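For example, matching whole lines of four to eight lower case letters:

```shell
# \{min,max\} applied to a character set; only 'elephant' (8 letters)
# fits the 4-to-8 range when the pattern is anchored at both ends.
printf 'cat\nelephant\nhippopotamus\n' | grep '^[a-z]\{4,8\}$'
```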
You must remember that modifiers like "*" and "\{1,5\}" only act as modifiers if
they follow a character set. If they were at the beginning of a pattern, they
would not be a modifier. Here is a list of examples, and the exceptions:
Searching for a word isn't quite as simple as it at first appears. The string "the"
will match the word "other." You can put spaces before and after the letters
and use this regular expression: " the ." However, this does not match words
at the beginning or end of the line. And it does not match the case where
there is a punctuation mark after the word.
There is an easy solution. The characters "\<" and "\>" are similar to the "^"
and "$" anchors, as they don't occupy a position of a character. They do
"anchor" the expression between to only match if it is on a word boundary.
The pattern to search for the word "the" would be "\<[tT]he\>." The character
before the "t" must be either a new line character, or anything except a letter,
number, or underscore. The character after the "e" must also be a character
other than a number, letter, or underscore or it could be the end of line
character.
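Tried with GNU grep (as the next section notes, not every utility supports these anchors):

```shell
# \< and \> word anchors: "other" contains "the" but not on a word boundary.
printf 'the cat\nother\nThe end\n' | grep '\<[tT]he\>'
# matches "the cat" and "The end" but not "other"
```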
Backreferences - Remembering patterns with \(, \) and \1
\([a-z]\)\([a-z]\)[a-z]\2\1
The escaped parentheses remember what they matched, and \1 and \2 refer
back to the first and second remembered groups. This pattern therefore
matches five-letter palindromes such as "radar": the fourth character must
repeat the second, and the fifth must repeat the first.
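The pattern can be exercised with grep; anchored at both ends it matches five-letter lower case palindromes:

```shell
printf 'radar\nlevel\nhello\n' | grep '^\([a-z]\)\([a-z]\)[a-z]\2\1$'
# prints radar and level; "hello" fails because \2 (e) is not its fourth letter
```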
Potential Problems
The "\<" and "\>" characters were introduced in the vi editor. The other
programs didn't have this ability at that time. Also the "\{min,max\}" modifier is
new and earlier utilities didn't have this ability. This made it difficult for the
novice user of regular expressions, because it seemed each utility has a
different convention. Sun has retrofited the newest regular expression library
to all of their programs, so they all have the same ability. If you try to use
these newer features on other vendor's machines, you might find they don't
work the same way.
The other potential point of confusion is the extent of the pattern matches.
Regular expressions match the longest possible pattern. That is, the regular
expression
A.*B
matches from the first "A" to the last "B" on the line, even when a shorter
match is available.
Two programs use the extended regular expression: egrep and awk. With
these extensions, those special characters preceded by a back slash no
longer have the special meaning: "\{," "\}," "\<," "\>," "\(," "\)" as well as the
"\digit." There is a very good reason for this, which I will delay explaining to
build up suspense.
The character "?" matches 0 or 1 instances of the character set before, and
the character "+" matches one or more copies of the character set. You can't
use the \{ and \} in the extended regular expressions, but if you could, you
might consider the "?" to be the same as "\{0,1\}" and the "+" to be the same
as "\{1,\}."
By now, you are wondering why the extended regular expressions are even
worth using. Except for two abbreviations, there are no advantages, and a lot
of disadvantages. Therefore, examples would be useful.
The three important characters in the expanded regular expressions are "(,"
"|," and ")." Together, they let you match a choice of patterns. As an example,
you can use egrep to print all From: and Subject: lines from your incoming mail:
All lines starting with "From:" or "Subject:" will be printed. There is no easy
way to do this with the Basic regular expressions. You could try
"^[FS][ru][ob][mj]e*c*t*: " and hope you don't have any lines that start with
"Sromeet:." Extended expressions don't have the "\<" and "\>" characters. You
can compensate by using the alternation mechanism. Matching the word "the"
in the beginning, middle, end of a sentence, or end of a line can be done with
the extended regular expression:
(^| )the([^a-z]|$)
There are two choices before the word, a space or the beginning of a line.
After the word, there must be something besides a lower case letter or else
the end of the line. One extra bonus with extended regular expressions is the
ability to use the "*," "+," and "?" modifiers after a "(...)" grouping. The
following will match "a simple problem," "an easy problem," as well as "a
problem."
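The pattern itself is not shown in the text; a likely form, using "?" after a "(...)" grouping (the exact pattern is an assumption):

```shell
# Matches "a problem", "an easy problem", and "a simple problem":
printf 'a problem\nan easy problem\na simple problem\nno match here\n' |
egrep 'an? ((easy|simple) )?problem'
```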
I promised to explain why the back slash characters don't work in extended
regular expressions. Well, perhaps the "\{...\}" and "\<...\>" could be added to
the extended expressions. These are the newest addition to the regular
expression family. They could be added, but this might confuse people if
those characters are added and the "\(...\)" are not. And there is no way to add
that functionality to the extended expressions without changing the current
usage. Do you see why? It's quite simple. If "(" has a special meaning, then
"\(" must be the ordinary character. This is the opposite of the Basic regular
expressions, where "(" is ordinary, and "\(" is special. The usage of the
parentheses is incompatible, and any change could break old programs.
If the extended expression used "(...|...)" as regular characters, and "\(...\|...\)"
for specifying alternate patterns, then it is possible to have one set of regular
expressions that has full functionality. This is exactly what GNU emacs does,
by the way.
The rest of this is random notes.
Regular Expression   Class      Type            Meaning
.                    all        Character Set   A single character (except newline)
^                    all        Anchor          Beginning of line
$                    all        Anchor          End of line
[...]                all        Character Set   Range of characters
*                    all        Modifier        Zero or more duplicates
\<                   Basic      Anchor          Beginning of word
\>                   Basic      Anchor          End of word
\(..\)               Basic      Backreference   Remembers pattern
\1..\9               Basic      Reference       Recalls pattern
\{M,N\}              Basic      Modifier        M to N duplicates
+                    Extended   Modifier        One or more duplicates
?                    Extended   Modifier        Zero or one duplicate
(...|...)            Extended   Alternation     Shows alternation
\(...\|...\)         EMACS      Alternation     Shows alternation
\w                   EMACS      Character Set   Matches a letter in a word
\W                   EMACS      Character Set   Opposite of \w
Module 10
A Sample Shell Script
This visual shows another way of invoking a shell script. This method relies on
the user first making the script an executable file with the chmod command.
Note that the shell uses the PATH variable to find executable files. If you get
an error message like the following,
$ hello
ksh: hello: not found
check your PATH variable. The directory in which the shell script is stored
must be defined in the PATH variable.
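A minimal sketch of this method (the script's contents are invented; the key steps are chmod and the PATH setting):

```shell
# Create a small script named hello:
cat > hello <<'EOF'
#!/bin/sh
echo "Hello, world"
EOF
chmod +x hello       # make the script executable
PATH=$PATH:.         # the script's directory must appear in PATH
export PATH
hello                # now the shell can find and run it
```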
Each shell script is executed in a subshell. Variables defined in a shell script
cannot be passed back to the parent shell.
If you invoke a shell script with a . (dot), it runs in the current shell. Variables
defined in this script (dir1, dir2) are therefore defined in the current shell.
Every process gives back an exit status to its parent process. By convention,
0 is given back when the process ends successfully, and a value not equal to
0 in all other cases.
To find out the exit code of a completed command, use echo $?:
$ date
$ echo $?
0
$_
This shows successful execution of the date command. The visual shows an
example for an unsuccessful execution of a command.
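The unsuccessful case appears only in the visual; a sketch of what it might look like (the directory name is invented):

```shell
ls /no/such/directory   # fails with an error message
echo $?                 # prints a nonzero exit status
```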
CONTROL CONSTRUCTS:
The BourneShell control constructs can alter the flow of control within the
script. The BourneShell provides simple two-way branch if statements and
multiple-branch case statements, plus for, while, and until statements.
You can negate any criterion by preceding it with an exclamation mark (!).
Parentheses can be used to group criteria. If there are no parentheses, the -a
(logical AND operator) takes precedence over the -o (logical OR operator).
The test utility will evaluate operators of equal precedence from left to right.
Within the expression itself, you must put special characters, such as
parentheses, in quote marks so the BourneShell will not evaluate them but will
pass them to test.
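A small sketch of both rules, with invented variables: the parentheses are escaped so the shell passes them through to test, and -a binds more tightly than -o:

```shell
x=5
y=0
if test \( "$x" -gt 0 -o "$y" -gt 0 \) -a "$x" -lt 10
then
    echo "in range"
fi
```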
The test utility will work from the command line but it is more often used in a
script to test input or verify access to a file.
Another way to do the test evaluation is to surround the expression with left
and right brackets. A space character must appear after the left bracket and
before the right bracket.
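A minimal sketch of the bracket form (the file name is invented):

```shell
touch sample.txt            # create a file to test
if [ -f sample.txt ]        # note the spaces inside the brackets
then
    echo "regular file"
fi
```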
Test expressions can be in many different forms. The expressions can appear
as a set of evaluation criteria. The general form for testing numeric values is:
int1 op int2
This criterion is true if the integer int1 has the specified algebraic relationship
to integer int2. The operators (op) are:
-eq equal
-ne not equal
-gt greater than
-ge greater than or equal to
-lt less than
-le less than or equal to
The general form for testing strings is:
string1 op string2
where op is = (equal) or != (not equal).
Sample Session:
$cat test_string
number=1
numero=0001
if test $number = $numero
then echo "String vals for $number and $numero are ="
else echo "String vals for $number and $numero not ="
fi
if test $number -eq $numero
then echo "Numeric vals for $number and $numero are ="
else echo "Numeric vals for $number and $numero not ="
fi
$sh -x test_string
number=1
numero=0001
+ test 1 = 0001
+ echo String vals for 1 and 0001 not =
String vals for 1 and 0001 not =
+ test 1 -eq 0001
+ echo Numeric vals for 1 and 0001 are =
Numeric vals for 1 and 0001 are =
$test_string
String vals for 1 and 0001 not =
Numeric vals for 1 and 0001 are =
The test utility can be used to determine information about file types. All of
the criteria can be found in Appendix B. A few of them are listed here:
-d filename true if filename exists and is a directory
-f filename true if filename exists and is a regular file
-r filename true if filename exists and is readable
-w filename true if filename exists and is writable
-x filename true if filename exists and is executable
Example:
$test -d new_dir
If new_dir is a directory, this criterion will evaluate to true. If it does not exist,
then it will be false.
The if statement evaluates the expression and then returns control based on
this status. The fi statement marks the end of the if; notice that fi is if spelled
backward.
Sample Session:
$cat check_args
if (test $# = 0)
then echo 'Please supply at least 1 argument'
exit
fi
echo 'Program is running'
$
This little script will check to ensure that you are giving at least one argument.
If none are given it will display the error message and exit. If one or more
arguments are given it will display "Program is running" and run the rest of the
script, if any.
Sample Session:
$check_args
Please supply at least 1 argument
$check_args xyz
Program is running
$
The else part of this structure makes the single-branch if statement into a two-
way branch. If the expression returns a true status, the commands between
the then and the else statement will be executed. After these have been
executed, control will start again at the statement after the fi.
If the expression returns false, the commands following the else statement will
be executed.
Sample Session:
$cat test_string
number=1
numero=0001
if test $number = $numero
then echo "String values of $number and $numero are equal"
else echo "String values of $number and $numero not equal"
fi
if test $number -eq $numero
then echo "Numeric values of $number and $numero are equal"
else echo "Numeric values of $number and $numero not equal"
fi
$
Taking Decision using if then elif
The elif construct combines the else and if statements and allows you to
construct a nested set of if then else structures.
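The document's own elif example is not shown; a small sketch of such a chain (the values are invented):

```shell
number=42
if [ "$number" -lt 10 ]
then
    echo "small"
elif [ "$number" -lt 100 ]
then
    echo "medium"
else
    echo "large"
fi
```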
Sample Session:
$cat case_ex
echo 'Enter A, B, or C: \c'
read letter
case $letter in
A) echo 'You entered A' ;;
B) echo 'You entered B' ;;
C) echo 'You entered C' ;;
*) echo 'You did not enter A, B, or C' ;;
esac
$chmod a+x case_ex
$case_ex
Enter A, B, or C: B
You entered B
$case_ex
Enter A, B, or C: b
You did not enter A, B, or C
$
This example uses the value of a character that the user entered as the test
string. The value is represented by the variable letter. If letter has the value
of A, the structure will execute the command following A. If letter has a value
of B or C, then the appropriate commands will be executed. The asterisk
indicates any string of characters; and it, therefore, functions as a catchall for
a no-match condition. The lowercase b in the second sample session is an
example of a no match condition.
This structure assigns the value of the first item in the argument list to the
loop index and executes the commands between the do and done
statements. The do and done statements indicate the beginning and end of
the for loop.
After the structure passes control to the done statement, it assigns the value
of the second item in the argument list to the loop index and repeats the
commands. The structure will repeat the commands between the do and
done statements once for each argument in the argument list. When the
argument list has been exhausted, control passes to the statement following
the done.
Sample Session:
$cat find_henry1
for x in project1 project2 project3
do
grep henry $x
done
Sample Session:
$head project?
==> project1 <==
henry
joe
mike
sue
$find_henry1
henry
henry
$
Each file in the argument list was searched for the string, henry. When a
match was found, the string was printed.
As long as the expression returns a true exit status, the structure continues to
execute the commands between the do and the done statement. Before each
loop through the commands, the structure executes the expression. When
the exit status of the expression is false (non-zero), control is passed to the
statement following the done statement.
The until and while structures are very similar; in both, the test is at the top of
the loop. The difference is the sense of the test. The until structure will
continue to loop until the expression returns true, that is, a nonerror (zero)
exit status. The while loop will continue as long as a true, nonerror condition
is returned.
Sample Session:
$cat until_ex
secretname='jenny'
name='noname'
echo 'Try to guess the secret name!'
echo
until (test "$name" = "$secretname")
do
echo 'Your guess: \c'
read name
done
echo 'You did it!'
$
The until loop will continue until name is equal to the secret name.
Sample Session:
Your guess: gaylan
Your guess: art
Your guess: richard
Your guess: jenny
You did it!
$
The break and continue loop control commands correspond exactly to their
counterparts in other programming languages. The break command
terminates the loop (breaks out of it), while continue causes a jump to the next
iteration (repetition) of the loop, skipping all the remaining commands in that
particular loop cycle.
#!/bin/bash
LIMIT=19   # LIMIT assumed to be 19 so the loop prints 1 through 20
echo
echo "Printing Numbers 1 through 20 (but not 3 and 11)."
a=0
while [ "$a" -le "$LIMIT" ]
do
  a=$(($a+1))
  if [ "$a" -eq 3 ] || [ "$a" -eq 11 ]
  then
    continue       # Skip the rest of this loop cycle.
  fi
  echo -n "$a "    # This will not execute for 3 and 11.
done
# Exercise:
# Why does the loop print up to 20?
echo; echo
##############################################################
a=0
while [ "$a" -le "$LIMIT" ]
do
  a=$(($a+1))
  if [ "$a" -gt 2 ]
  then
    break          # Skip entire rest of loop.
  fi
  echo -n "$a "
done
echo; echo
exit 0
Module 11
Useful Utilities for Shells
cat - concatenate and display files
Common Options
You can list a series of files on the command line, and cat will concatenate
them, starting each in turn, immediately after completing the previous one,
e.g.:
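For example (the file names are invented):

```shell
printf 'first\n'  > part1
printf 'second\n' > part2
cat part1 part2 > whole   # whole now holds part1 followed by part2
cat whole
```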
DATE
Example
$date
%T Gives time as HH:MM:SS
Example
$ date "+DATE IS %D TIME IS %T"
Example
$ date "+DAY %d MONTH %m YEAR %y"
The find command recursively searches the directory tree for each specified
path, seeking files that match a Boolean expression written using the terms
given in the text that follows the expression. The output from the find
command depends on the terms specified by the final parameter.
Note that the -print option is the default, so it is not required. This was not
always the case. In earlier versions of AIX and on other UNIX systems that
have not yet implemented the POSIX standard for the find command, the
-print option is required for the result to be displayed or used in a pipe.
The command following -exec, in this example ls, is executed for each file
name found.
find replaces the {} with the names of the files matched. It is used as a
placeholder for matches.
Note use of the escaped ; to terminate the command that find is to execute.
The find command may also be used with a -ls option; that is, $ find . -name
'm*' -ls.
The \; is hard coded with the find command. This is required for use with
-exec and -ok.
It is a good idea to use the -ok option rather than -exec if there are not a lot of
files that match the search criteria. It is a lot safer if your pattern is not exactly
what you think it is.
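A sketch of the -exec form described above (the file names are invented); -ok behaves the same but asks for confirmation before each command:

```shell
mkdir -p demo
touch demo/m1.txt demo/note.txt
# ls -l runs once per match; {} is replaced by each matched name,
# and the escaped \; terminates the command:
find demo -name 'm*' -exec ls -l {} \;
```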
The grep command searches for the pattern specified and writes each
matching line to standard output.
The search can be for simple text, like a string or a name. grep can also look
for logical constructs, called regular expressions, that use patterns and
wildcards to symbolize something special in the text, for example, only lines
that start with an uppercase T.
The command displays the name of the file containing the matched line, if
more than one file is specified for the search.
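A sketch of both points, using the regular expression mentioned above for lines that start with an uppercase T (the file contents are invented):

```shell
printf 'The cat\nthe dog\nTree\n' > g1
printf 'Tiger\nlion\n' > g2
grep '^T' g1 g2   # with two files, matches are prefixed with the file name
```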
On-Line Documentation:
The UNIX manual, usually called man pages, is available on-line to explain
the usage of the UNIX system and commands. To use a man page, type the
command "man" at the system prompt followed by the command for which
you need information.
Syntax
man [options] command_name
Common Options
Another program used to read and write files associated with an archive is tar.
Some of the available options are
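The option list itself is missing from the text; the classic ones are c (create), x (extract), t (list), v (verbose), and f (archive file name). A sketch:

```shell
mkdir -p docs
touch docs/a.txt docs/b.txt
tar -cvf docs.tar docs   # c: create, v: verbose, f: name the archive
tar -tvf docs.tar        # t: list the archive's contents
```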
gzip
This reduces the size of a file, thus freeing valuable disk space. For example,
type
% ls -l science.txt
and note the size of the file using ls -l . Then to compress science.txt, type
% gzip science.txt
This will compress the file and place it in a file called science.txt.gz. To
restore the file to its original form, type
% gunzip science.txt.gz
nslookup
The command
nslookup host
displays domain name, IP address, and alias information for the given host.
e.g., nslookup www.kent.edu gives related data for www.kent.edu
Cut command.
The cut command selects a list of columns or fields from one or more files.
Option -c is for columns and -f for fields. It is entered as
cut options [files]
for example if a file named testfile contains
this is firstline
this is secondline
this is thirdline
Examples:
cut -c1,4 testfile will print this to standard output (screen)
ts
ts
ts
It is printing columns 1 and 4 of this file which contains t and s (part of this).
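The -f option works on fields rather than columns. With the same testfile (the delimiter is assumed to be a single space):

```shell
printf 'this is firstline\nthis is secondline\nthis is thirdline\n' > testfile
cut -d' ' -f3 testfile   # -d sets the field delimiter, -f selects field 3
```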
Options:
Examples:
The sed command launches a stream editor which you can use at the
command line.
You can enter your sed commands in a file and then, using the -f option, edit
your text file. It works as
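A sketch of the -f form (the file names and the edit are invented):

```shell
printf 's/day/night/\n' > commands.sed   # sed commands kept in a file
printf 'good day\n' | sed -f commands.sed
```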
options:
for more information about sed, enter man sed at command line in your
system.
Module 12
Arithmetic on Shell Variables
A Unix command called expr evaluates an expression given to it on the
command line
Each operator and operand given to expr must be a separate argument
The usual arithmetic operators (+, -, *, /, %) are recognized by expr
Remember to use backslashes to protect the expression from the shell
expr only evaluates integer arithmetic expressions
Use the ':' operator with expr to match characters in the first operand
against a regular expression given as the second argument; by default
it returns the number of characters matched
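For instance, the ':' operator anchors the pattern at the start of the operand and prints the number of characters matched:

```shell
expr abcdef : 'abc'   # three characters match
expr abcdef : '.*'    # the whole string matches
```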
#!/bin/sh
# Perform some arithmetic
x=24
y=4
Result=`expr $x \* $y`
echo "$x times $y is $Result"
#!/bin/sh
# Usage: read a value and echo it back (completed sketch)
echo "Enter a value: \c"
read value
echo "You entered: $value"
Module 13
Functions
Like "real" programming languages, Bash has functions, though in a
somewhat limited implementation. A function is a subroutine, a code block
that implements a set of operations, a "black box" that performs a specified
task. Wherever there is repetitive code, when a task repeats with only slight
variations, then consider using a function.
function function_name {
command...
}
or
function_name () {
command...
}
This second form will cheer the hearts of C programmers (and is more
portable).
Functions may even be declared on a single line:
function_name () { command...; }
In this case, however, a semicolon must follow the final command in the
function.
#!/bin/bash
JUST_A_SECOND=1
funky ()
{ # This is about as simple as functions get.
echo "This is a funky function."
echo "Now exiting funky function."
} # Function declaration must precede call.
fun ()
{ # A somewhat more complex function (loop body reconstructed).
  i=0
  REPEATS=30
  echo
  echo "And now the fun really begins."
  echo
  while [ $i -lt $REPEATS ]
  do
    echo "<----------FUN---------->"
    sleep $JUST_A_SECOND
    i=`expr $i + 1`
  done
}
funky
fun
exit 0
exit 0
Debugging
The Bash shell contains no debugger, nor even any debugging-specific
commands or constructs. Syntax errors or outright typos in the script generate
cryptic error messages that are often of no help in debugging a non-functional
script.
#!/bin/bash
# ex74.sh
a=37
if [$a -gt 27 ]
then
echo $a
fi
exit 0
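The bug above is the missing space after [ : since [ is a command name, the shell goes looking for a command named [37. A corrected sketch:

```shell
a=37
if [ "$a" -gt 27 ]   # a space must follow [ because [ is a command
then
  echo $a
fi
```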
Module 14
Sed:
Sed is the ultimate stream editor. If that sounds strange, picture a stream
flowing through a pipe. Okay, you can't see a stream if it's inside a pipe.
That's what I get for attempting a flowing analogy.
Anyhow, sed is a marvelous utility. Unfortunately, most people never learn its
real power. The language is very simple, but the documentation is terrible.
The Solaris on-line manual pages for sed are five pages long, and two of
those pages describe the 34 different errors you can get. A program that
spends as much space documenting the errors as it does documenting the
language has a serious learning curve.
Sed has several commands, but most people only learn the substitute
command: s. The substitute command changes all occurrences of the regular
expression into a new value. A simple example is changing "day" in the "old"
file to "night" in the "new" file:
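The command itself appears only in the visual; it would be:

```shell
printf 'day\n' > old
sed s/day/night/ <old >new   # no quotes needed in this simple case
cat new
```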
I didn't put quotes around the argument because this example didn't need
them. If you read my earlier tutorial, you would understand why it doesn't need
quotes. If you have meta-characters in the command, quotes are necessary.
In any case, quoting is a good habit, and I will henceforth quote future
examples. That is:
s Substitute command
/../../ Delimiter
day Regular Expression Pattern String
night Replacement string
If you have many commands and they won't fit neatly on one line, you can
break up the line using a backslash:
sed -e 's/a/A/g' \
-e 's/e/E/g' \
-e 's/i/I/g' \
-e 's/o/O/g' \
-e 's/u/U/g' <old >new
Sed is extremely powerful, and you can do things in sed that you can't do in
any standard word processor. And because sed is external to the word
processor and comes with every Unix system in the world, once you learn sed
you'll have a very handy tool in your toolkit, even if (like me) you rarely use
Unix.
How it works: You feed sed a script of editing commands (like, "change every
line that begins with a colon to such-and-such") and sed sends your revised
text to the screen. To save the revisions on disk, use the redirection arrow,
>newfile.txt. Sample syntax:
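The sample command is not shown; a minimal sketch of the workflow just described (the file names and the editing rule for colon-lines are invented):

```shell
# Change every line that begins with a colon to "such-and-such",
# then save the revised text with the redirection arrow:
printf ':header\nbody\n' > oldfile.txt
sed 's/^:.*/such-and-such/' oldfile.txt > newfile.txt
cat newfile.txt
```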
awk:
Awk is a ``pattern scanning and processing language'' which is useful for
writing quick and dirty programs that don't have to be compiled. The calling
syntax of awk is like sed:
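The syntax itself is not shown; the two usual forms are:

```shell
# awk 'program' file ...       program given on the command line
# awk -f program.awk file ...  program read from a file
printf 'one two\n' | awk '{ print $2 }'
```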
Like sed, awk can work on standard input or on a file. Like the shell, if you
start an awk program with
#!/bin/awk -f
then you can execute the program directly from the shell.
Most systems also have nawk, which stands for ``new awk.'' Nawk has many
more features than awk and is generally more useful. I am just going to cover
awk, but you should check out nawk too in your own time. Nawk has some
nice things like a random number generator, that awk doesn't have.
pattern { action }
What such a statement does is apply the action to all lines that match the
pattern. If there is no pattern, then it applies the action to all lines. If there is
no action, then the default action is to copy the line to standard output.
Patterns can be regular expressions enclosed in slashes (they can be more
than that, but for now, just assume that they are regular expressions).
So, for example, the program awkgrep works just like ``grep Jim''.
/Jim/
UNIX> cat input
Which of these lines doesn't belong:
Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX> awkgrep input
Jimmy Carter
UNIX> awkgrep < input
Jimmy Carter
UNIX>
Actions basically look like C programs. There are some big differences, but for the
most part, you can do most basic things that you can do in C.
Awk breaks up each line into fields, which are basically whitespace-separated
words. You can get at word i by specifying $i. The variable NF contains the
number of words on the line. The variable $0 is the line itself.
So, to print out the first and last words on each line, you can do:
Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX> awk '{ print $1, $NF }' input
Which belong:
Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX>
An alternative awkgrep prints out $0 when it finds the pattern:
UNIX> cat awkgrep2
#!/bin/awk -f
/Jim/ { print $0 }
UNIX> awkgrep2 input
Jimmy Carter
UNIX>
Awk has a printf just like C. You don't have to use parentheses when you call
it (although you can if you'd like). Unlike print, printf will not print a newline if
you don't want it to. So, for example, awkrev reverses the lines of a file:
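The awkrev script itself is not in the text; a reconstruction that fits the later remark about the undeclared variable i (loop over the fields in reverse):

```shell
printf 'Bill Clinton\nGeorge Bush\n' | awk '{
    for (i = NF; i > 1; i--) printf "%s ", $i
    printf "%s\n", $1
}'
```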
Clinton Bill
Bush George
Reagan Ronald
Carter Jimmy
Stallone Sylvester
UNIX>
A few things that you'll notice about awkrev: Actions can be multiline. You
don't need semicolons to separate lines like in C. However, you can specify
multiple commands on a line and separate them with semi-colons as in C.
And you can block commands with curly braces as in C. If you want a
command to span two lines (this often happens with complex printf
statements), you need to end the first line with a backslash.
Also, you'll notice that awkrev didn't declare the variable i. Awk just figured
out that it's an integer.
Type casting
Awk lets you convert variables from one type to another on the fly. For
example, to convert an integer to a string, you simply use it as a string. String
construction can be done with concatenation, which is often very convenient.
These principles are used in awkcast:
0 appended: number: 0, string Jim0
UNIX>
There are two special patterns, BEGIN and END, which cause the
corresponding actions to be executed before and after any lines are
processed respectively. Therefore, the following program (awkwc) counts the
number of lines and words in the input file.
BEGIN { nl = 0; nw = 0 }
{ nl++ ; nw += NF }
END { print "Lines:", nl, "words:", nw }
UNIX> awkwc awkwc
Lines: 5 words: 26
UNIX> wc awkwc
5 26 103 awkwc
UNIX>
Awk tries to process each statement on each line. Unlike sed, there is no
``hold space.'' Instead, each statement is processed on the original version of
each line. Two special commands in awk are next and exit. Next specifies to
stop processing the current input line, and to go directly to the next one,
skipping all the rest of the statements. Exit specifies for awk to exit
immediately.
Here are some simple examples. awkpo prints out only the odd numbered
lines (note that this is an awkward way to do this, but it works):
BEGIN { ln=0 }
{ ln++
if (ln%2 == 0) next
print $0
}
3 Bill Clinton
4 George Bush
5 Ronald Reagan
6 Jimmy Carter
7 Sylvester Stallone
awkptR prints out all lines until it reaches a line with a capital R
/R/ { exit }
{ print $0 }
Bill Clinton
George Bush
UNIX>
Arrays
Arrays in awk are a little odd. First, you don't have to malloc() any storage --
just use it and there it is. Second, arrays can have any indices -- integers,
floating point numbers or strings. This is called ``associative'' indexing, and
can be very convenient. You cannot have multi-dimensional arrays or arrays
of arrays though. To simulate multidimensional arrays, you can just
concatenate the indices.
BEGIN { nt = 0 ; np = 0 }
This simply initializes two variables: nt is the number of tournaments, and np
is the number of players.
This only works on lines that are all capital letters. These are the lines that
identify tournaments. On these lines, it does the following:
The next part works on all lines that contain the pattern '--'. These are the
lines with golfers' scores:
/--/ { golfer = $1
for (i = 2; $i != "--" ; i++) golfer = golfer" "$i
if (isgolfer[golfer] != "yes") {
isgolfer[golfer] = "yes"
g[np] = golfer
np++;
}
score[golfer" "this] = $(i+1)
}
The first two lines of this action set the golfer variable to be the golfer's name.
Note that you can do string comparison in awk using standard boolean
operators, unlike in C where you would have to use strcmp().
The next 5 lines use awk's associative arrays: The array isgolfer is checked
to see if it contains the string ``yes'' under the golfer's name. If so, we have
processed this golfer before. If not, we set the golfer's entry in isgolfer to
``yes,'' set the np-th entry of the array g to be the golfer, and increment np.
Finally, we set the golfer's score for the tournament in the score array. Note
that we don't use double-indirection. Instead, we simply concatenate the
golfer's name and the tournament's name, and use that as the index for the
array.
printf("\n")
}
}
The first three lines print out 25 spaces, and then the names of the
tournaments as held in the tourn array. Then we loop through each golfer,
and print the golfer's name, padded to 25 characters, and then his score in
each tournament. Note that if the golfer didn't play in the tournament, that
entry of the tournament array will be the null string. This is quite convenient,
because we don't have to test for whether the golfer played the tournament --
we can just use awk's default values.
UNIX> awkgolf kemper # Note that the output is only sorted because it's
# sorted in the input file
KEMPER
Justin Leonard -10
Greg Norman -7
Nick Faldo -7
Nick Price -7
Loren Roberts -6
Jay Haas -5
Paul Stankowski -5
Lee Janzen -4
Phil Mickelson -4
Davis Love III -3
Tom Lehman 0
Vijay Singh 0
Kirk Triplett 1
Steve Jones 2
Mark O'Meara 5
Don Pooley missed
Ernie Els missed
Fred Couples missed
Hal Sutton missed
Jesper Parnevik missed
Scott McCarron missed
Steve Stricker missed
UNIX> cat masters usopen kemper memorial | awkgolf
MASTERS USOPEN KEMPER MEMORIAL
Tiger Woods 281 6 5
Tommy Tolles 283 2 -11
Tom Watson 284 16 0
Paul Stankowski 285 6 -5 -3
Fred Couples 286 13 missed
Davis Love III 286 5 -3 -7
Justin Leonard 286 9 -10 0
Steve Elkington 287 7
Tom Lehman 287 -2 0 -3
Ernie Els 288 -4 missed -1
Vijay Singh 288 21 0 -14
Jesper Parnevik 289 11 missed -4
Lee Westwood 291 6
Nick Price 291 6 -7
Lee Janzen 292 13 -4 -11
Jim Furyk 293 2 -12
Mark O'Meara 294 9 5 -2
Scott McCarron 294 3 missed missed
Scott Hoch 298 3 -11
Jumbo Ozaki 300 missed
Frank Nobilo 303 9 -10
Bob Tway missed 2 -7
Brad Faxon missed 17 2
David Duval missed 11 -5
Greg Norman missed missed -7 -12
Loren Roberts missed 4 -6
Nick Faldo missed 11 -7
Phil Mickelson missed 10 -4
Steve Jones missed 15 2 3
Steve Stricker missed 9 missed -1
Jay Haas 2 -5 -4
Billy Andrade 4 -7
Hal Sutton 6 missed -1
Kirk Triplett 1 -2
Don Pooley missed -4
UNIX>
File indirection
You can specify that the output of print and printf go to a file with indirection.
For example, to copy standard input to the file f1 you could do:
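The command itself is not shown; a likely form (the input lines are the sample data used earlier):

```shell
printf 'Bill Clinton\nGeorge Bush\n' | awk '{ print $0 > "f1" }'
cat f1   # the lines went to the file f1, not to the screen
```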
Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX>
Sometimes you just want to write a program that doesn't use standard input.
To do this, you just write the whole program as a BEGIN statement, exiting at
the end.
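A minimal sketch of such a program, written entirely as a BEGIN statement:

```shell
awk 'BEGIN { for (i = 1; i <= 3; i++) print i; exit }'
```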
Multiline awk programs in the Bourne shell
The Bourne shell lets you define multiline strings simply by putting newlines in
the string (within single or double quotes, of course). This means that you can
embed simple multiline awk scripts in a sh program without having to use
cumbersome backslashes, or intermediate files. For example, shwc works
just like awkwc, but works as a shell script rather than an awk program.
Awk's limitations
Awk is useful for simple data processing. It is not useful when things get more
complex for a few reasons. First, if your data file is huge, you'll do better to
write a C program (using for example the fields library from CS302/360)
because it will be more efficient sometimes by a factor of 60 or more. Second,
once you start writing procedure calls in awk, it seems to me you may as well
be writing C code. Third, you often find awk's lack of double indirection and
string processing cumbersome and inefficient.
Awk is not a good language for string processing. Irritatingly, it doesn't let you
get at string elements with array operations. I.e. the following will fail:
Module 15
Database Using Shell Scripts
Here are one or two facts about databases. If you know anything at all about
databases, you'll know everything that follows.
echo "<p>"
echo "use mydatabase;" > /tmp/$$.sql
echo "select latitude,longitude,easting,northing from gazetteer where feature
= '$PLACE';" >> /tmp/$$.sql
mysql -u demo < /tmp/$$.sql > /tmp/$$.res
ROWS=`cat /tmp/$$.res | wc -l`
if [ $ROWS -eq 0 ]
then
echo "No information for" $PLACE
else
echo "<table border=2><tr>"
tail +2 /tmp/$$.res | sed -e 's/^/<tr><td>/
s/ /<td>/g'
echo "</table>"
fi
echo "</body></html>"
rm /tmp/$$.*
Actual database access is performed using the command line MySQL client
programme. To ensure that this can be found the search path is modified by
the second and third lines of the script.
PATH=$PATH:/usr/local/mysql/bin
export PATH
The name of the location being queried is then extracted from the
QUERY_STRING environment variable.
On a normal Unix system any user can create files in the directory /tmp. The
symbol $$ in the file name is replaced by the current process identification
number; this is always unique, so it avoids any problems with two instances of
the back end running simultaneously.
use mydatabase;
select latitude,longitude,easting,northing from gazetteer where feature =
'Prague';
The output from the MySQL client is also written to a temporary file. Typical
text is shown below (for a different query).
188820 -11160 325 284
197880 -5820 424 563
It will be noted that the output file includes column names and that columns
are separated by TAB characters.
The next step is to determine the number of lines in the output file; this will be
zero if no matches have been found. This is done by arranging for the
standard Unix command wc to read the file and write the number of lines to its
standard output.
The code
if [ $ROWS -eq 0 ]
then
echo "No information for" $PLACE
else
echo "<table border=2><tr>"
tail +2 /tmp/$$.res | sed -e 's/^/<tr><td>/
s/ /<td>/g'
echo "</table>"
fi
Note that the sed edit script, introduced by the sed command line argument -e
spreads over two lines.
Example cat and variables
cat >> $sql0 <<-EOA
SET ECHO OFF
SET FEEDBACK OFF
SET HEADING OFF
SELECT my_package.my_function($column)
FROM v\$database
WHERE name LIKE '%&1%';
EXIT
EOA
sqlplus -s $uid/$password@database @$sql0 $sql_arg_1 > $log0
The file created has its name stored in the variable $sql0 and as we can see
the block between the EOA flags is the data that goes into the file. The data
block is actually a segment of SQL*Plus statements, as indicated by the
filename variable. As is common with SQL*Plus code, the key words are
picked out in ALL CAPS, with objects (tables, procedures, columns, etc.) all in
lower case. The SELECT line contains a reference to a called, packaged,
PL/SQL function which has a column name as an argument. Here the column
name is held in a variable called $column and this will be substituted at script
run-time by the real value.
Complex File Creation:
So what's the point of all this extra typing? Well for one thing it allows you to
put special bits of code into the block which will only be used at certain times,
by hiding them in complex command groups. This example shows how this is
done below.
This is basically the same block, except that the WHERE clause has been
hidden inside an if statement. Now, depending on the database type in the
$db_type variable, the WHERE clause can take one of two forms.
Conveniently, the additional argument, which is not required by SQL*Plus in
the first form, is ignored at execution time even though it is still supplied on
the last line. This is common to all scripts: arguments are only used if they are
referenced from within the script.
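The example itself is not reproduced here, but the technique it describes can be sketched like this (the db_type values and the SQL text are invented for illustration):

```shell
# Build an SQL file with echo statements, hiding the WHERE clause inside an
# if so its form depends on $db_type. All names here are illustrative.
db_type=${db_type:-LIVE}
sql0=/tmp/sketch.$$.sql

echo "SET ECHO OFF"                         >  $sql0
echo "SELECT my_package.my_function(name)"  >> $sql0
if [ "$db_type" = "LIVE" ]
then
    echo "WHERE name LIKE '%&1%';"          >> $sql0   # first form: uses argument 1
else
    echo "WHERE name IS NOT NULL;"          >> $sql0   # second form: ignores it
fi
echo "EXIT"                                 >> $sql0

cat $sql0
rm -f $sql0
```

Because the WHERE line is chosen as the file is built, the same script serves both database types, and the unused SQL*Plus argument does no harm.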
So there you have the first two ways of creating another file from a script. The
version using cat can only cope with a single output form; the version using
echo can output a multitude of forms depending on the complex command
forms you use. The choice is yours. There are, however, other ways to create
output files. You can use direct generation, as in the example List, to create a
list of files; the indirect method shown in the example Counted List, where
lines are built inside a loop construct and then appended to the file to create a
menu file; or the example Sorted List, where a list of words is sorted into
alphabetic order, duplicates are removed, and the rest is stored in a file.
Example list
count=1
for file in *.log
do
    echo "$count: $file" >> $mnu0
    count=`expr $count + 1`
done
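The Sorted List example mentioned earlier is not shown, but its approach can be sketched as follows (the words and file names are invented for illustration):

```shell
# Sort a list of words, remove duplicates, then store the result in a file,
# as in the Sorted List example. The words and file names are illustrative.
printf 'pear\napple\npear\nbanana\n' > /tmp/words.$$
sort /tmp/words.$$ | uniq > /tmp/sorted.$$
cat /tmp/sorted.$$
# → apple
# → banana
# → pear
rm -f /tmp/words.$$ /tmp/sorted.$$
```

Note that uniq only removes adjacent duplicates, which is why the sort must come first in the pipeline.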
Module 16
OVERVIEW OF PERL
What is Perl?
Perl, sometimes referred to as the Practical Extraction and Report Language,
is an interpreted programming language with a huge number of uses,
libraries and resources. Arguably one of the most discussed and used
languages on the internet, it is often referred to as the Swiss Army knife, or
duct tape, of the web.
Perl was first brought into being by Larry Wall circa 1987 as a general-
purpose Unix scripting language to make his programming work simpler.
Although it has far surpassed his original creation, Larry Wall still oversees
development of the core language, and the newest version, Perl 6.
Running Perl
The simplest way to run a Perl program is to invoke the Perl interpreter with
the name of the Perl program as an argument:
perl sample.pl
The name of the Perl file is sample.pl, and perl is the name of the Perl
interpreter. This example assumes that Perl is in the execution path; if not,
you will need to supply the full path to Perl too:
/usr/local/bin/perl sample.pl
This is the preferred way of invoking Perl because it eliminates the possibility
that you might accidentally invoke a copy of Perl other than the one you
intended. We will use the full path from now on to avoid any confusion. On a
Windows system the corresponding command might look like this:
c:\NTperl\perl sample.pl
Invoking Perl on UNIX
UNIX systems have another way to invoke an interpreter on a script file. Place
a line like
#!/usr/local/bin/perl
at the start of the Perl file. This tells UNIX that the rest of this script file is to be
interpreted by /usr/local/bin/perl. Then make the script itself executable:
chmod +x sample.pl
You can then "execute" the script file directly and let the script file tell the
operating system what interpreter to use while running it. Command-line
switches for the interpreter can be added to this line; for example:
#!/usr/local/bin/perl -w -t
A Perl Script
Perl code can be quite free-flowing. The broad syntactic rules governing
where a statement starts and ends are simple: statements are separated by
semicolons, and whitespace is largely insignificant. Consider this single
statement:
print "My name is Sreedhar\n";
No prizes for guessing what happens when Perl runs this code; it prints
My name is Sreedhar
If the \n doesn't look familiar, don't worry; it simply means that Perl should
print a newline character after the text; in other words, Perl should go to the
start of the next line.
That's right, print is a function. It may not look like it in any of the examples so
far, where there are no parentheses to delimit the function arguments, but it is
a function, and it takes arguments. You can use parentheses in Perl functions
if you like; it sometimes helps to make an argument list clearer. More
accurately, in this example the function takes a single argument consisting of
an arbitrarily long list. We'll have much more to say about lists and arrays
later, in the "Data Types" section. There will be a few more examples of the
more common functions in the remainder of this chapter, but refer to the
"Functions" chapter for a complete run-down on all of Perl's built-in functions.
So what does a complete Perl program look like? Here's a trivial UNIX
example, complete with the invocation line at the top and a few comments:
That's not at all typical of a Perl program though; it's just a linear sequence of
commands with no structural complexity. The "Flow Control" section later in
this overview introduces some of the constructs that make Perl what it is. For
now, we'll stick to simple examples like the preceding for the sake of clarity.
Exercise:
Appendix A
List of basic UNIX Commands:
The basic UNIX commands include some of the most commonly used commands for
users and constructs for building shell scripts.
The following charts offer a summary of some simple UNIX commands. These are
certainly not all of the commands available in this robust operating system, but they
will help you get started.
Ten VALUABLE UNIX Commands:
Once you have mastered the basic UNIX commands, these will be quite valuable in
managing your own account.
Ten FUN UNIX Commands:
These are ten commands that you might find interesting or amusing. They are
actually quite helpful at times, and should not be considered idle entertainment.
Ten HELPFUL UNIX Commands:
These ten commands are very helpful, especially with graphics and word processing
type applications.
Ten USEFUL UNIX Commands:
These ten commands are useful for monitoring system access, or simplifying your
own environment.