
UNIX SHELL SCRIPTING

Perhaps the most important achievement of UNIX is to demonstrate that a
powerful operating system for interactive use need not be expensive
either in equipment or in human effort: UNIX can run on hardware costing
as little as $40,000, and less than two man-years were spent on the main
system software.

The UNIX Time-Sharing System (1974)
Dennis M. Ritchie and Ken Thompson

Table of Contents

Module 1  Introduction to Operating System
Module 2  Exploring the UNIX Shell
Module 3  Processes
Module 4  A Shell Script
Module 5  An Overview
Module 6  The vi Editor
Module 7  The Variable
Module 8  Parameters
Module 9  Regular Expressions
Module 10  A Sample Shell Script
Module 11  Useful Utilities of Shell
Module 12  Arithmetic on Shell
Module 13  Functions
Module 14  Sed and AWK
Module 15  Database Using Shell Script
Module 16  Overview of Perl

Exercise for Lab Experience

Appendix  List of UNIX Commands

Module 1
Introduction to Operating System:
In simple terms, an operating system is a manager. It manages all the
available resources on a computer. These resources can be the hard disk, a
printer, or the monitor screen. Even memory is a resource that needs to be
managed. Within an operating system are the management functions that
determine who gets to read data from the hard disk, what file is going to be
printed next, what characters appear on the screen, and how much memory a
certain program gets.

Note: An operating system (OS) is a collection of system programs that
together control the operation of a computer system.

Operating systems may be classified both by how many tasks they can
perform 'simultaneously' and by how many users can be using the system
'simultaneously': that is, single-user or multi-user, and single-tasking or
multi-tasking. A multi-user system must clearly be multi-tasking.

Single User Operating System

 MS DOS/PC DOS was designed specifically to suit a single user's
requirements.
 The user can run only one program at a time.
 At any instant of time there is only one process going on in the CPU.

Multi User Operating System

 Here the system is such that many users can work at a time. There is
one large CPU and a high-capacity storage medium enclosed in what
is called the system unit, and different terminals are attached to it.
Each user works on a separate terminal and utilizes the CPU's
resources.
 Each user's programs and other files are stored in the system unit's
storage media. Thus the CPU is one and many users are using it.
Therefore there is a need for an OS that will effectively divide the
resources of the CPU among all users. Such an OS is called a multi-
user OS.

Features of Multi User OS

1. Multi Processing

 As many users are working at a time, every user will run their
own program. When one program is run by a user it is a
process. When the same program is run by another user it is
another process. If there are different users running different
programs there are many processes undergoing execution. A
user should not have to wait until other users' programs finish
execution.

 The same program can be shared by many users at a time and
run together. This ability of the OS to run several processes
together is called multi-processing.
2. Time Sharing

 The CPU can execute only one instruction at a time. Since there
are several users running their programs, the OS divides the CPU
time among the users. It allots a definite time interval called a time
slice within which that user's program is executed. Once the time
slice is over, the CPU switches to the next user and executes that
user's program.

Thus every user's program is constantly being interrupted by
another user's program, but no user realizes this because the CPU
is very fast. Thus the OS effectively divides the CPU time between
several users.

3. Memory Management

 A program can run only if it is loaded into the internal memory. So
when many users are running their programs, all those programs have
to be loaded into the memory. The memory is therefore divided
logically so that all users' programs get their share of it. Also, when
a user's program finishes execution it has to be removed from the
internal memory, and that part of the memory should be reused for
storing another user's program.

4. Multi Tasking

 Many users work in a multi-user environment, each running their
own process. Thus there is more than one process executing
at a time. But a user can also run more than one process or program
if his requirement demands it. Such an activity, when a
number of processes are running for one user, is called multi-
tasking.
Parts of the Operating System:

 Any Operating System consists of two parts.


o The Shell and
o The Kernel

The Shell: The shell acts as an interface between the user and the
machine; it interprets every command given by the user and
advises the kernel to act accordingly.

A single user OS will have only one shell devoted entirely to the user
whereas in a multi user OS every user will have a separate shell.

Kernel: The Kernel is the part of OS that interacts directly with the
hardware of the Computer system.

Why is UNIX Important?

During the past 25 years the UNIX Operating System has evolved into a
powerful, flexible, and versatile operating system. It serves as the Operating
System for all types of computers, including single user personal computers
and engineering workstations, multi-user microcomputers, minicomputers,
mainframes and supercomputers, as well as special purpose devices, with
approximately 20 million computers now running UNIX and more than 100
million people using these systems. This rapid growth is expected to continue.
The success of UNIX is due to many factors, including its portability to a wide
range of machines, its adaptability and simplicity, the wide range of tasks that
it can perform, its multi-user and multi tasking nature, and its suitability for
networking, which has become increasingly important as the Internet has
blossomed. What follows is a description of the features that have made the
UNIX system so popular.

Understanding UNIX:

The UNIX operating system was designed to let a number of programmers
access the computer at the same time and share its resources.
The operating system coordinates the use of the computer's resources,
allowing one person, for example, to run a spell check program while another
creates a document, lets another edit a document while another creates
graphics, and lets another user format a document -- all at the same time, with
each user oblivious to the activities of the others.
The operating system controls all of the commands from all of the keyboards
and all of the data being generated, and permits each user to believe he or
she is the only person working on the computer.
This real-time sharing of resources makes UNIX one of the most powerful
operating systems ever.
Although UNIX was developed by programmers for programmers, it provides
an environment so powerful and flexible that it is found in businesses,
sciences, academia, and industry. Many telecommunications switches and
transmission systems also are controlled by administration and maintenance
systems based on UNIX.
While initially designed for medium-sized minicomputers, the operating
system was soon moved to larger, more powerful mainframe computers. As
personal computers grew in popularity, versions of UNIX found their way into
these boxes, and a number of companies produce UNIX-based machines for
the scientific and programming communities.

The uniqueness of UNIX

The features that made UNIX a hit from the start are:

 Multitasking capability
 Multi-user capability
 Portability
 Cooperative Tools and Utilities
 Excellent Networking capability
 Open Source Code

Multitasking
Many computers do just one thing at a time, as anyone who uses a PC or
laptop can attest. Try logging onto your company's network while opening
your browser and a word processing program at the same time. Chances are
the processor will freeze for a few seconds while it sorts out the multiple
instructions.
UNIX, on the other hand, lets a computer do several things at once, such as
printing out one file while the user edits another file. This is a major feature for
users, since users don't have to wait for one application to end before starting
another one.

Multi-user
The same design that permits multitasking permits multiple users to use the
computer. The computer can take the commands of a number of users --
determined by the design of the computer -- to run programs, access files,
and print documents at the same time.
The computer can't tell the printer to print all the requests at once, but it does
prioritize the requests to keep everything orderly. It also lets several users
access the same document by compartmentalizing the document so that the
changes of one user don't override the changes of another user.

Portability
A major contribution of the UNIX system was its portability, permitting it to
move from one brand of computer to another with a minimum of code
changes. At a time when different computer lines of the same vendor didn't
talk to each other -- let alone machines of multiple vendors -- that meant a
great savings in both hardware and software upgrades.
It also meant that the operating system could be upgraded without having all
the customer's data inputted again. And new versions of UNIX were backward
compatible with older versions, making it easier for companies to upgrade in
an orderly manner.

Cooperative Tools and Utilities

UNIX comes with hundreds of programs that are divided into two classes:
 Integral utilities that are absolutely necessary for the operation of the
computer, such as the command interpreter, and

 Tools that aren't necessary for the operation of UNIX but provide the
user with additional capabilities, such as typesetting capabilities and e-
mail.

Fig 1.1 UNIX Tools (examples: man, dc, mail, calendar, fsck, nroff, vi)

Tools can be added or removed from a UNIX system, depending upon the
applications required.

Excellent Networking Capability:


The UNIX system provides an excellent environment for networking. It offers
programs and utilities that provide the services needed to build networked
applications-the basis for distributed, networked computing. With networked
computing, information and processing is shared among different computers
in a network. The UNIX system has proved to be useful in client/server
computing. The UNIX system also has been the base system for the
development of Internet Services. UNIX provides an excellent platform for
Web Servers.

Open Source Code:
UNIX has provision for protecting data and communicating with other users.
The source code (Open Source) for the UNIX system has been made
available to users and programmers.

History of UNIX:
1965 Bell Laboratories joins with MIT and General Electric in the
development effort for the new operating system, Multics, which would
provide a multi-user, multi-processor system with a multi-level (hierarchical)
file system, among its many forward-looking features.

1969 AT&T, unhappy with the progress, drops out of the Multics
project. Some of the Bell Labs programmers who had worked on this project -
Ken Thompson, Dennis Ritchie, Rudd Canaday, and Doug McIlroy -
designed and implemented the first version of the Unix File System on a PDP-
7, along with a few utilities. It was given the name UNIX by Brian Kernighan as
a pun on Multics.

1971 The system now runs on a PDP-11, with 16Kbytes of memory, including
8Kbytes for user programs and a 512Kbyte disk.

Its first real use is as a text processing tool for the patent department at Bell
Labs. That utilization justified further research and development by the
programming group. UNIX caught on among programmers because it was
designed with these features:

 Programmer's environment
 Simple user interface
 Simple utilities that can be combined to perform powerful functions
 Hierarchical file system
 Simple interface to devices consistent with file format
 Multi-user, multi-process system
 Architecture independent and transparent to the user.

1973 UNIX is re-written using C, a new language developed by Dennis
Ritchie. Being written in this high-level language greatly decreased the effort
needed to port it to new machines.

1974 Thompson and Ritchie publish a paper in the Communications of the
ACM describing the new Unix OS. This generates enthusiasm in the
academic community, which sees a potentially great teaching tool for studying
programming systems development. Since AT&T is prevented from marketing
the product due to the 1956 Consent Decree, they license it to universities for
educational purposes and to commercial entities.

By 1977, the fifth and sixth editions had been released; these contained many
new tools and utilities. The number of machines running the UNIX System,

primarily at Bell laboratories and Universities, increased to more than 600 by
1978. The seventh edition, the direct ancestor of the UNIX Operating
System available today, was released in 1979.

UNIX System III, based on the Seventh Edition, became AT&T's first
commercial release of the UNIX System in 1982. However, after System III
was released, AT&T, through its Western Electric manufacturing subsidiary,
continued to sell versions of the UNIX system. UNIX System III, the various
research editions, and experimental versions were distributed to colleagues at
universities and other research laboratories.

A UNIX System Timeline

The following timeline summarizes the development of UNIX from its
beginning:

Year   UNIX Variant or Standard - Comments

1969   UNICS (later called UNIX) - A new operating system invented by Ken Thompson and Dennis Ritchie for the PDP-7
1973   Fourth Edition - Written in C programming language; widely used inside Bell Laboratories
1975   Sixth Edition - First version widely available outside of Bell Labs; more than 600 machines ran it
1978   3BSD - Virtual memory
1979   Seventh Edition - Included the Bourne shell, UUCP, and C; the direct ancestor of modern UNIX
1980   Xenix - Introduced by Microsoft
1980   4BSD - Introduced by UC Berkeley
1982   System III - First public release outside of Bell Labs
1983   System V Release 1 - First supported release
1983   4.1BSD - UC Berkeley release with performance enhancements
1984   4.2BSD - UC Berkeley release with many networking capabilities
1984   System V Release 2 - Protection and locking of files, enhanced system administration, and job control features added
1986   HP-UX - First version of HP-UX released for HP Precision Architecture
1987   System V Release 3 - STREAMS, RFS, TLI added
1987   4.3BSD - Minor enhancements to 4.2BSD
1988   POSIX - POSIX.1 published
1989   System V Release 4 - Unified System V, BSD, and Xenix
1990   XPG3 - X/Open specification set
1990   OSF/1 - Open Software Foundation release designed to compete with SVR4
1991   Linux 0.01 - Linus Torvalds started development of Linux
1992   SVR4.2 - USL-developed version of SVR4 for the desktop
1992   HP-UX 9.0 - Supported workstations including a GUI
1993   Solaris 2.3 - POSIX compliant
1993   4.4BSD - Final Berkeley release
1993   SVR4.2MP - Last version of UNIX developed by USL
1994   Linux 1.0 - First version of Linux not considered a "beta"
1994   Solaris 2.4 - Motif supported
1995   UNIX 95 - X/Open mark for systems registered under the Single UNIX Specification
1995   Solaris 2.5 - CDE supported
1995   HP-UX 10.0 - Conformed to the Single UNIX Specification and the Common Desktop Environment (CDE)
1996   Linux 2.0 - Performance improvements and networking software added
1997   Solaris 2.6 - UNIX 95 compliant, Java supported
1997   Single UNIX Specification, Ver 2 - Open Group specification set
1997   System V Release 5 (SVR5) (SCO) - Enhanced SV kernel, including 64-bit support, increased reliability, and performance enhancements
1997   UnixWare 7 - SCO UNIX based on SVR5 kernel
1997   HP-UX 11.0 - 64-bit operating system
1998   UNIX 98 - Open Group mark for systems registered under the Single UNIX Specification, Version 2
1998   Solaris 7 - Support for 64-bit applications, free for noncommercial users
1999   Linux 2.2 - Device drivers added

Versions of UNIX Today


As with most things in life, where there is active competition the best will
ultimately survive and triumph. This is the case with the several different versions,
or flavors, of UNIX. Although many different versions exist, a common design
and/or code base is present in most of them. Also, two major kinds of UNIX
operating system software markets exist today. The commercial market is
where customers generally have to pay for the operating system software and
generally may not get any source code (well, not for free anyway!). The other
market is also commercial, but is considered open source. Open source
means that you get full access to the source code of the system or programs
and can make changes or modifications to that source code as long as you
maintain the rights of the original software owner.

Today, the UNIX leaders include Solaris, Linux, HP-UX, AIX, and SCO.

Why Is UNIX Popular?

Many people ask why UNIX is so popular or why it is used so much, in so
many different ways and in so many computing environments. The answer
lies with the very nature of UNIX and the model that was used to design,
build, and continuously improve the operating system.

Availability of Source Code

One of the most significant points of UNIX is the availability of source code for
the system. (For those new to software, source code contains the
programming elements that, when passed through a compiler, will produce a
binary program, which can be executed.) The binary program contains
specific computer instructions, which tell the system "what to do." When the
source code is available, it means that the system (or any subcomponent) can
be modified without consulting the original author of the program. Access to
the source code is a very positive thing and can result in many benefits. For
example, if software defects (bugs) are found within the source code, they can
be fixed right away, without perhaps waiting for the author to do so.

Another great reason is that new software functions can be integrated into the
source code, thereby increasing the usefulness and the overall functionality of
the software. Having the ability to extend the software to the user's
requirements is a massive gain for the end user and the software industry as
a whole. Over time, the software can become much more useful. One
downside to having access to the source code is that it can become hard to
manage, because it is possible that many different people could have
modified the code in unpredictable (and perhaps negative) ways. However,
this problem is typically addressed by having a "source code maintainer,"
who reviews the source code changes before the modifications are
incorporated into the original version.

Another downside to source code access is that individuals may use this
information with the goal in mind of compromising system or component
security. The Internet Worm of 1988 is one such popular example. The
author, who was a graduate student at Cornell University at the time, was able
to exploit known security problems within the UNIX system to launch a
software program that gained unauthorized access to systems and was able
to replicate itself to many networked computers. The Worm was so successful
in attaching and attacking systems that it caused many of the computers to
crash due to the amount of resources needed to replicate. Although the Worm
didn't actually cause significant permanent damage to the systems it infected,
it opened the eyes of the UNIX community to the dangers of source code
access and security on the Internet as a whole.

Flexible Design

UNIX was designed to be modular, which makes it a very flexible architecture.
The modularity helps provide a framework that makes it much easier to
introduce new operating system tools, applications, and utilities, or to help in
the migration of the operating system to new computer platforms or other
devices. Although some might argue that UNIX isn't flexible enough for
their needs, it is quite adaptable and can handle most requirements.
This is evidenced by the fact that UNIX runs on more general computer
platforms and devices than any other operating system.

GNU

The GNU project, started in the early 1980s, was intended to act as a
counterbalance to the widespread activity of corporate greed and the adoption
of license agreements for computer software. The "GNU is not UNIX" project
was responsible for producing some of the world's most popular UNIX
software.

This includes the Emacs editor and the gcc compiler. They are the
cornerstones of the many tools that a significant number of developers use
every day.

Open Software

UNIX is open, which basically means that no single company, institution, or
individual owns UNIX, nor can it be controlled by a central authority.
However, the UNIX name remains a trademark. Anyone using
the Internet may obtain open source software, install it, modify it, and
then redistribute the software without ever having to shell out any money in
the process. The open source movement has made great advances
and has clearly demonstrated that
quality software can, in fact, be free. Granted, it is quite true that certain
versions of UNIX are not open, and you do indeed need to pay to use
these operating systems in the form of an end-user licensing agreement.
Generally speaking, vendors that charge for UNIX represent only
a portion of the total number of UNIX releases available within the UNIX
community.

Programming Environment

UNIX provides one of the best development environments available by
providing many of the important tools software developers need. Also, there
are software tools such as compilers and interpreters for just about every
major programming language known in the world. Not only can one write
programs in just about any computer language, UNIX also provides additional
development tools such as text editors, debuggers, linkers, and related
software. UNIX was conceived and developed by programmers for
programmers, and it stands to reason that it will continue to be the
programmer's development platform of choice now and in the future.
Availability of Many Tools

UNIX comes with a large number of useful applications, utilities, and
programs, which many people consider to be one of UNIX's greatest
strengths. They are collectively known, or commonly referred to, as UNIX
"tools," and they cover a wide range of functions and purposes. One of the
most significant aspects of UNIX is the availability of software to accomplish
one or more very specific tasks. You will find throughout this
text that the concept of tools is quite universal and is used repeatedly. This
book not only discusses the subject of system administration but also
provides detailed descriptions of UNIX-based tools. As a system
administrator, you will come to depend on certain tools to help you do your
job. Just as construction workers
rely on the tools they use, so too will the administrator rely on the software
that permits them to handle a wide range of functions, tasks, issues, and
problems.

There are tools to handle many system administration tasks that you might
encounter. Also, there are tools for development, graphics manipulation, text
processing, database operations - just about any user- or system-related
requirement. If the basic operating system version doesn't provide
a particular tool that you need, chances are that someone has already
developed the tool and it is available via the Internet.

System Libraries

A system library is a collection of software that programmers use to augment
their applications. UNIX comes with quite a large collection of functions or
routines that can be accessed from several different languages to aid the
routines that can be accessed from several different languages to aid the
application writer with a variety of tasks. For example, should the need arise
to sort data, UNIX provides several different sort functions.

Well Documented

UNIX is well documented, both with online manuals and with many reference
books and user guides from publishers. Unlike some operating systems, UNIX
provides online man page documentation for all tools that ship with the
system.

Also, it is quite customary that open source tools provide good
documentation.

Further, the UNIX community provides journals and magazine articles about
UNIX, tools, and related topics of interest.

ARCHITECTURE OF UNIX SYSTEM:
To understand how the UNIX System works, you need to understand its
structure. The UNIX Operating System is made up of several major
components. Those components include the Kernel, the shell, the file
system, and the commands or user programs.

UNIX is a layered operating system. The innermost layer is the hardware that
provides the services for the OS. The operating system, referred to in UNIX
as the kernel, interacts directly with the hardware and provides the services
to the user programs. These user programs don't need to know anything
about the hardware. They just need to know how to interact with the kernel,
and it's up to the kernel to provide the desired service. One of the big appeals
of UNIX to programmers has been that most well written user programs are
independent of the underlying hardware, making them readily portable to new
systems.

Note: The core of the UNIX system is the Kernel. The kernel controls the
computer's resources, allotting them to different users and to different
tasks.

User programs interact with the kernel through a set of standard system
calls. These system calls request services to be provided by the kernel. Such
services would include accessing a file: open, close, read, write, link, or
execute a file; starting or updating accounting records; changing ownership of
a file or directory; changing to a new directory; creating, suspending, or killing
a process; enabling access to hardware devices; and setting limits on system
resources.

UNIX is a multi-user, multi-tasking operating system. You can have many
users logged into a system simultaneously, each running many programs.
It's the kernel's job to keep each process and user separate and to regulate
access to system hardware, including CPU, memory, disk and other I/O
devices.

UNIX utilities or commands are a collection of about 200 programs that
service the day-to-day processing requirements. These programs are invoked
through the shell, which is itself another utility.

Apart from the utilities that are provided as part of the UNIX operating system,
more than a thousand UNIX-based application programs, like database
management systems, word processors, accounting software, etc., are also
available.

The basic unit used to organize information in the UNIX System is called a
file. The UNIX file system provides a logical method for organizing, storing,
retrieving, manipulating, and managing information.

UNIX SHELLS

The Shell reads your commands and interprets them as requests to execute
a program or programs, which it then arranges to have carried out. Because
the shell plays this role, it is called a command interpreter. Besides being a
command interpreter, the shell is also a programming language. As a
programming language, it permits you to control how and when commands
are carried out. For each user working with UNIX at any time, a different shell
program is running. There may be several shells running in memory, but
only one kernel.

There are many variants of the UNIX shell, including three major ones:

1. The Bourne shell

2. The C Shell

3. The Korn shell

The original UNIX system shell, sh, was written by Steve Bourne, and as a
result it is known as the Bourne shell.

The C shell, csh, was originally developed as part of BSD UNIX. csh
introduced a number of important enhancements to sh, including the concept
of a command history list and job control.

The Korn shell, ksh, builds on the sh and extends it by adding many features
from the C shell.

Each of these shells has its own prompt. The Bourne shell has
the $ prompt. So when you log in, it is the Bourne shell that is established for
you and the stage is set for you to work on the machine.

Features of Shell:

 Interactive Processing: It acts as an interface and provides
communication between the users and the system.

 Background Processing: Time-consuming, non-interactive tasks can
proceed while the user continues with other processing.

 Input/Output Redirection: Programs which normally interact with a user
can be made to take their input from another source, such as a file, and
send their output to another destination, such as a printer.

 Shell Scripts: A frequently used sequence of shell commands can be
stored in a file. The name of the file can later be used to execute the
stored sequence with a single command (a minimal example follows
this list).

 Shell Variables: The user can control the behavior of the shell, as well
as other programs and utilities, by storing data in variables.
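
As a minimal sketch of the Shell Scripts and Shell Variables features (the file
name greet.sh and the variable NAME are made-up illustrations, not standard
names):

$ cat greet.sh
# a frequently used sequence of commands stored in a file
NAME="Sreedhar"        # a shell variable that controls the script's behavior
echo "Hello, $NAME"
date                   # print the current date and time
$ sh greet.sh          # execute the stored sequence with a single command
Hello, Sreedhar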

The File System
The UNIX file system looks like an inverted tree structure. You start with the
root directory, denoted by /, at the top and work down through sub-directories
underneath it.

[Figure: inverted tree of the UNIX file system, showing directories such as Sreedhar, Solo, and STUD beneath the root]

Each node is either a file or a directory of files, where the latter can contain
other files and directories. You specify a file or directory by its path name,
either the full, or absolute, path name or the one relative to a location. The full
path name starts with the root, /, and follows the branches of the file system,
each separated by /, until you reach the desired file, e.g.:

/home/Sreedhar/source/xntp

A relative path name specifies the path relative to another, usually the current
working directory that you are at. Two special directory entries should be
introduced now:

.     the current directory

..    the parent of the current directory

So if I'm at /home/frank and wish to specify the path above in a relative
fashion I could use:

../Sreedhar/source/xntp

This indicates that I should first go up one directory level, then come down
through the Sreedhar directory, followed by the source directory and then to
xntp.
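
For instance, assuming the directories shown above exist, the same location
can be reached with either form of path name (cd changes the working
directory, pwd prints it):

$ cd /home/Sreedhar/source/xntp     # absolute path, starting from the root /
$ pwd
/home/Sreedhar/source/xntp
$ cd /home/frank                    # move somewhere else
$ cd ../Sreedhar/source/xntp        # relative path: up one level, then back down
$ pwd
/home/Sreedhar/source/xntp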

Unix Directories, Files and Inodes

Every directory and file is listed in its parent directory. In the case of the root
directory, that parent is itself. A directory is a file that contains a table listing
the files contained within it, matching each file name to an inode number.
An inode is a special file designed to be read by the kernel to learn the
information about each file. It specifies the permissions on the file, ownership,
date of creation and of last access and change, and the physical location of
the data blocks on the disk containing the file.
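
A quick way to see this in practice is with ls: the -i option prints the inode
number recorded in the directory entry, and -l shows the metadata kept in the
inode itself (the file name and values below are only illustrative):

$ ls -i notes.txt      # -i prints the inode number of the file
12345 notes.txt
$ ls -l notes.txt      # permissions, owner, size and dates come from the inode
-rw-r--r--  1 sreedhar  staff  1024 Feb  5 09:30 notes.txt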

The system does not require any particular structure for the data in the file
itself. The file can be ASCII or binary or a combination, and may represent
text data, a shell script, compiled object code for a program, directory table,
junk, or anything you would like.

There's no header, trailer, label information or EOF character as part of the
file.

Unix Programs

A program, or command, interacts with the kernel to provide the
environment and perform the functions called for by the user. A program can
be: an executable shell file, known as a shell script; a built-in shell command;
or a source-compiled, object code file.

The shell is a command line interpreter. The user interacts with the kernel
through the shell. You can write ASCII (text) scripts to be acted upon by a
shell.

System programs are usually binary, having been compiled from C source
code. These are located in places like /bin, /usr/bin, /usr/local/bin, /usr/ucb,
etc.
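
A hedged sketch of how you might check what kind of program a command is:
type reports whether a name is built into the shell, and file reports whether a
file is a compiled binary or a text script (the exact output wording varies from
one UNIX version to another, and myscript.sh is a hypothetical file):

$ type cd              # cd is built into the shell itself
cd is a shell builtin
$ file /bin/ls         # a compiled system program
/bin/ls: executable ...
$ file myscript.sh     # an ASCII shell script
myscript.sh: ascii text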

Module 2
Exploring the UNIX Shell:
The shell is a rather unique component of the UNIX operating system since it
is one of the primary ways to interact with the system. It is typically through
the shell that users execute other commands or invoke additional functions.

The shell is commonly referred to as a command interpreter and is


responsible for executing tasks on behalf of the user. Figure 2-1 shows a
pictorial view of how the shell fits with the UNIX system. As you can see, the
shell operates within the framework just like any other program. It provides an
interface between the user, the operating system functions, and ultimately the
system Kernel.

[Figure 2-1: The UNIX Shell]


Another powerful feature of the UNIX shell is the ability to support the
development and execution of custom shell scripts. The shell contains a mini
programming language that provides a lightweight way to develop new tools
and utilities without having to be a heavyweight software programmer. A UNIX
shell script is a combination of internal shell commands, regular UNIX
commands, and some shell programming rules.

UNIX supports a large number of different shells, and many of the
popular ones are freely available on the Internet. Also, many versions of UNIX
come with one or more shells and as the system administrator, you can install
additional shells when necessary and configure the users of the system to use
different shells, depending on specific preferences or requirements. The table
below lists many of the popular shells and a general description of each.

Once a user has logged into the system, the default shell prompt appears and
the shell simply waits for input from the user. Thus, logging into a Solaris
system as the root user, for example, the standard Bourne shell prompt will be
the # character.

The system echoes this prompt to signal that it is ready to receive input from
the keyboard. At this point, the user is free to type in any standard UNIX
command, application, or custom script name and the system will attempt to
execute or run the command. The shell assumes that the first argument given
on the command line is the name of the command to execute.
Shell Name General Description


sh Standard Bourne shell, which is one of the most popular shells
around.

csh Standard shell with C like language support

bash GNU Bourne-Again shell that includes elements from the Korn
shell and C shell.

tcsh Standard C shell with command-line editing and filename


completion capabilities.

ksh The Korn shell combines the best features of the Bourne and C
shells and includes powerful programming tools

zsh Korn shell like, but also provides many more features such as
built-in spell correction and programmable command completion.
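
As a small illustration (assuming the shells involved are installed on your
system), a user can check which login shell is recorded for them and start a
different shell by name:

$ echo $SHELL          # the SHELL variable records your login shell
/bin/sh
$ ksh                  # start a Korn shell on top of the current shell
$ exit                 # leave the Korn shell and return to the previous one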

Accessing a UNIX System

The configuration you use to access your UNIX System can be based on one
of two basic models: using a multi-user computer or a single-user computer.

On a multi-user system, you use your own terminal device to access the UNIX
system. The computer you access can be a workstation, a microcomputer, a
mainframe computer, or even a super computer.

Single-user systems are personal computers on which you run the UNIX OS
directly. (UnixWare 7.1 by SCO, Solaris 7 from SunSoft, public domain
versions of UNIX, and the popular variant of UNIX known as Linux can be
used on a single-user system.)

Your display can be character-based, or it can be bit mapped. It may display a
single window or multiple windows, as in the X-Windows system.

Before You Start


UNIX System from a PC: Many different application packages, called
terminal emulators, run on a PC and enable you to connect to a UNIX system.
Terminal emulators all function the same basic way, in that they act as a
terminal attached to the UNIX machine. This allows you to enter commands
the same way that you would if you were using a terminal.

UNIX System from a Terminal: If your terminal has not been set to work with
a UNIX System, you must have its options set appropriately. Setting options is
done in different ways on different terminals.

Selecting a LOGIN : Every UNIX System has at least one person, called the
System Administrator, whose job is to maintain the system, and make it
available to its users. The system administrator is also responsible for adding
new users to the system and setting up their initial work environment on the
computer.

Login names are created by the system administrator. In general, a login
name (logname) can be almost any combination of letters and numbers, but
the UNIX System places some constraints on logname selections:

 Login name must be more than two characters long, and if it is longer
than eight, only the first eight characters are relevant.

 It can contain any combination of lowercase letters and numbers and


must begin with a lowercase letter. If you log in using uppercase
letters, a UNIX system will assume that your terminal can only receive
uppercase letters, and will only send uppercase letters for the entire
session.

 Your logname should not have any symbols or spaces in it, and it must
be unique for each user. Some lognames are reserved customarily for
certain uses. For example, the root normally refers to the system
administrator or superuser who is responsible for the whole system.

Connecting to a UNIX System:


Direct Connect: With single user workstations and personal computers, and
with the primary administration terminal on a multi-user system (console), a
cable permanently connects the terminal with the computer. After booting
your PC and invoking your terminal emulator or turning on your terminal, hit
the carriage return and you should see the UNIX System prompt that says

login:

Dial-in Access: You may have to dial into the computer using a modem
before you are connected. Use your emulator or dial function to dial the UNIX
System access number. When the system answers the call, you will hear a
high-pitched tone and should see some characters appear on screen. Then
you get the UNIX system login prompt.

Local Area Network: Another means of connecting your PC or terminal to the
UNIX System is via a local area network. A local area network (LAN) is a set of
communication devices and cables that connects several PCs or terminals and
computers. A number of LAN environments are in use today, such as LAN
Manager and NetWare. Each LAN environment provides a set of software that
can be used in conjunction with a specialized hardware card at each end of
the network, called a NIC (network interface card) or a LAN card, that enables
you to connect a client machine to a server machine. The clients and servers
may be running Windows or UNIX, or both. The protocol most frequently used
to connect a client machine to a UNIX server is TCP/IP, with other protocols
such as IPX and SPX also widely used on LANs.

An example of this environment would be a group of Windows PCs connected
to a common UNIX server running a UNIX operating system such as
UnixWare 7, Solaris, or Linux. This type of environment usually is maintained
by a LAN administrator, a person who knows how local area networks work.
This is often the same person as the system administrator.

In accessing a UNIX System on a LAN, you first need to configure your PC to
be able to recognize the system you wish to connect to.

IP Network: If your PC is connected to an IP network, such as the Internet or an
intranet, you can use the telnet command to access any computer on this
network that allows such connections. The computer you access may be a
UNIX computer, or a computer running some other operating system, and it
may be a local computer or one located thousands of miles away. A variety of
telnet commands can help you manage a telnet session with the computer
you are accessing.
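
For example, a telnet session to a UNIX machine on an IP network might begin
like this (the host name is hypothetical, and the exact messages vary by telnet
client):

$ telnet unixhost.example.com
Trying ...
Connected to unixhost.example.com.
login: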

Logging In:

As a multi-user system, the UNIX System first requires that you identify
yourself before you can access the system.

login: <user login name>

Changing Your Password:

When you first log into a UNIX System, you will have either no password at all
(a null password) or an arbitrary password assigned by the system
administrator. These are only intended for temporary use. Neither offers any
real security. A null password gives anyone access to your account; one
assigned by the system administrator is likely to be easily guessed by
someone. Officially assigned passwords often consist of simple combinations
of your initials and your student, employee, or social security number. If your
password is simply your employee number and the letter X, anyone with
access to this information has access to all of your computer files. Sometimes
random combinations of letters and numbers are used. Such passwords are
difficult to remember, and consequently users will be tempted to write them
down in a convenient place. (Resist this temptation!)

The passwd Command :

You change your password by using the passwd command. When you issue
this command, the system checks to see if you are the owner of the login.
This prevents someone from changing your password and locking you out of
your own account. passwd first announces that it is changing the password,
and then it asks for your (current) old password, like this:

$ passwd

passwd: changing password

Old password:

New password:

Re-enter new password:

The system asks for a new password and asks for the password to be verified
(you do this by retyping it). The next time you log in, the new password is
effective. Although you can ordinarily change your password whenever you
want, on some systems after you change your password you must wait a
specific period of time before you can change it again.

How to pick a password?

When choosing a password, it is important that it be something that could not


be guessed -- either by somebody unknown to you trying to break in, or by an
acquaintance who knows you. Suggestions for choosing and using a
password follow:

 Don't:
 Use a word (or words) in any language
 Use a proper name
 Use information that can be found in your wallet
 Use information commonly known about you (car license, pet
name, etc.)
 Use control characters. Some systems can't handle them
 Write your password anywhere
 Ever give your password to *anybody*

 Do:
 Use a mixture of character types (alphabetic, numeric, special)
 Use a mixture of upper case and lower case
 Use at least 6 characters
 Choose a password you can remember
 Change your password often
 Make sure nobody is looking over your shoulder when you
are entering your password

Caution: If you do forget your password, there is no way to retrieve it.
Because it is encrypted, even your system administrator cannot look up
your password. If you cannot remember it, the administrator will have to give
you a new password.

Changing a Password at Initial Login

On some systems, you will be required to change your password the first time
you log in. This will work as described previously and will look like this:

login: sreedhar
Password:
Your password has expired.
Choose a new one.
Old password:
New password:
Re-enter new password:

Password Aging

To ensure the secrecy of your password, you will not be allowed to use the
same password for long stretches of time. On UNIX Systems, passwords age.
When yours gets to the end of its lifespan, you will be asked to change it. The
length of time your password will be valid is determined by your system
administrator. However, you can view the status of your password on most
UNIX systems. Generally, the -s option to the passwd command shows you
the status of your password, like this:

$ passwd -s
rayjay PW 04/01/99 7 30 5
name
passwd status
date last changed
min days between changes
max days between changes
days before user will be warned to change password

The first field contains your login name; the next fields list the status of your
password, the date it was last changed, and the minimum and maximum days
allowed between password changes; and the last field is the number of days
before your password will need to be changed. Note that this is simply an
example; on your system, you may not be allowed to read all of these fields.

An Incorrect Login

If you make a mistake in typing either your login or your password, the UNIX
System will respond this way:

login: sreedhar
Password:
Login Incorrect
login:

You will receive the "Password:" prompt even if you type an incorrect or
nonexistent login name. This prevents someone from guessing login names
and learning which one is valid by discovering one that yields the
"Password:" prompt. Because any login results in "Password:" an intruder
cannot guess login names in this way.

If you repeatedly type your login or password incorrectly (three to five times,
depending on how your system administrator has set the default), the UNIX
System will disconnect your terminal if it is connected via modem or LAN. On
some systems, the system administrator will be notified of erroneous login
attempts as a security measure. If you do not successfully log in within some
time interval (usually a minute), you will be disconnected.

If you have problems logging in, you might also check to make sure that your
CAPS LOCK key has not been set. If it has been set, you will inadvertently enter
an incorrect logname or password, because in UNIX uppercase and
lowercase letters are treated differently. (Note that unlike in some other
environments, your account will not get locked if you enter your password
incorrectly some number of times, you will just get disconnected.)

When you successfully enter your login and password, the UNIX System
responds with a set of messages, similar to this:

login: sreedhar
Password:
UNIX System V/386/486 Release 4.0 Version 3.0
minnie
Copyright (c) 1984, 1986, 1987, 1988, 1989, 1990 AT&T
Copyright (C) 1987, 1988 Microsoft Corp.
Copyright (C) 1990, NCR Corp.
All Rights Reserved
Last login: Mon January 29 19:55:17 on term/17

You first see the UNIX System announcement that tells you the particular
version of UNIX you are using. Next you see the name of your system, minnie
in this case. This is followed by the copyright notice.

Finally, you see a line that tells you when you logged in last. This is a security
feature. If the time of your last login does not agree with when you remember
logging in, call your system administrator. This discrepancy could be an
indication that someone has broken into your system and is using your login.

After this initial announcement, the UNIX System presents system messages
and news.

Message of the Day (MOTD)

Because every user has to log in, the login sequence is the natural place to
put messages that need to be seen by all users. When you log in, you will first
see a message of the day (MOTD). Because every user must see this MOTD,
the system administrator (or root) usually reserves these messages for
comments of general interest, such as this:

Attention ALL Users !!!


minnie will be coming down on Sunday Feb. 5, 2007 from
8:00am until 12:00pm (noon) for system maintenance. Please
schedule your work accordingly. Thank you.

The UNIX System Prompt

After you log in, you will see the UNIX System command prompt at the far left
side of the current line. The default system prompt (for most UNIX Systems) is
the dollar sign:

$

This $ is the indication that the UNIX System is waiting for you to enter a
command.

In the examples in this book, you will see the $ at the beginning of a line as it
would be seen on the screen, but you are not supposed to type it.

The command prompt is frequently changed by users. Users who have
accounts on different machines may use a different prompt on each one to
remind them which computer they are using. Some users change their prompt
to tell them where they are in the UNIX file system or you may simply find the
$ symbol unappealing and wish to use a different symbol or set of symbols
that you find more attractive. It is simple to do this.

The UNIX System enables you to define a prompt string, PS1, which is used
as a command prompt. The symbol PS1 is a shell variable (see Chapter 7)
that contains the string you want to use as your prompt. To change the
command prompt, set PS1 to some new string. For example,

$ PS1 = "UNIX:> "

changes your primary prompt string from whatever it currently is to the string "
UNIX:> ". From that point, whenever the UNIX System is waiting for you to
enter a command, it will display this new prompt at the beginning of the line.
You can change your prompt to any string of characters you want. You can
use it to remind yourself which system you are on, like this:

$ PS1="MyUnix-> "
MyUnix->

or simply to give yourself a reminder:

$ PS1="Leave at 4:30 PM> "
Leave at 4:30 PM>

If you redefine your prompt, it stays effective until you change it or until you
log off. Later in this chapter, you will learn how to make these changes
automatically when you first log in.

Some Basic UNIX Commands


Entering Commands on UNIX Systems

The UNIX System makes a large number of programs available to the user.
To run one of these programs you issue a command. For example, when you
type news or passwd, you are really instructing the UNIX System command
interpreter to execute a program with the name news or passwd, and to
display the results on your screen.

Some commands simply provide information to you; news works this way. An
often-used command is date, which prints out the current day, date, and time.
There are hundreds of other commands, and you will learn about many of
them in this book. Different variants of the UNIX system share a large
common set of commands (sometimes different names are used for the same
command in different UNIX variants) and provide other commands that are
unique for that particular version of UNIX.
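
For example, typing date at the prompt runs the date program and prints its
result (the output format shown is only illustrative and differs slightly between
UNIX versions):

$ date
Mon Jan 29 19:55:17 IST 2007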

Unix Command Line Structure

The UNIX system offers several file and directory related commands which
the user can use according to his requirement.

A command is a program that tells the Unix system to do something. It has
the form:

command [options] [arguments]

where an argument indicates on what the command is to perform its action,
usually a file or series of files. An option modifies the command, changing the
way it performs.

Commands are case sensitive. command and Command are not the same.

Options are generally preceded by a hyphen (-), and for most commands,
more than one option can be strung together, in the form:

command -[option][option][option]

e.g.: ls -alR

will perform a long list on all files in the current directory and recursively
perform the list through all sub-directories.

For most commands you can separate the options, preceding each with a
hyphen, e.g.:

command -option1 -option2 -option3

as in:

ls -a -l -R

Some commands have options that require parameters. Options requiring
parameters are usually specified separately,

e.g.: lpr -P printer3 -# 2 file

will send 2 copies of file to printer3.

These are the standard conventions for commands. However, not all Unix
commands will follow the standard. Some don't require the hyphen
before options and some won't let you group options together, i.e. they may
require that each option be preceded by a hyphen and separated by white
space from other options and arguments.

Options and syntax for a command are listed in the man page for the
command.

UNIX Commands:

UNIX comes with a large number of commands that fall under each of the
categories listed above for both the generic user and the system
administrator. It is quite hard to list and explain all of the available UNIX
functions and/or commands in a single book. Therefore, a review of some of
the more important user-level commands and functions has been provided
and subsequent modules provide a more in-depth look at system-level
commands. All of the commands discussed below can be run by generic
users and of course by the system administrator. However, one or more
subfunctions of a command may be available only to the system
administrator.

The standard commands are listed below; they are available across many
different versions of UNIX. For example, if we wanted to get a listing of all the
users that are currently logged into the system, the who command can be
used.

UNIX Command Meaning

cat Show the content of file.

date Show system date and time.

hostname Display name of system.

find Search for a specific file.

grep Search a file for specified pattern.

ls List files in a directory.

more Another command to show content of file.

ps Show status of processes.

who Show current users on the system.
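
For instance, a short sketch of the who command (the user names and
terminals shown are made up):

$ who
sreedhar   term/17    Jan 29 19:55
team01     pts/2      Jan 29 20:03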

Metacharacters and Wildcards

The metacharacters have special meaning to the shell; they should not
normally be used as any part of a file name.

The "-" symbol can usually be used in a filename provided it is not the first
character. For example, if we had a file called -l then issuing the command ls
-l would give you a long listing of the current directory because the ls
command would think the l was an option rather than -l being a file name
argument. Some UNIX commands provide facilities to overcome this problem.

The shell offers certain special characters, called wild card characters, that
help us to specify certain patterns. The shell will then match the pattern
against the file names, select all the files whose names match the pattern, and
apply the specified file command to them. The wild card characters are as
follows:

* - This wild card character matches any number of characters.
Therefore, wherever the * symbol appears in a pattern, it will be replaced
by any number of any characters.

The wildcard ? is expanded by the shell to match any single character in a file
name. The exception is that the ? will NOT match a dot "." as the
first character of a file name (for example, in a hidden file).

The wildcard * is expanded by the shell to match zero to any number of
characters in a file name. The single * will be expanded to mean all files in the
current directory except those beginning with a dot. Beware of the command
current directory except those beginning with a dot. Beware of the command
rm * which could cause serious damage removing all files!

Specifying a Multiple File Names

Multiple filenames can be specified using special pattern-matching characters.
The rules are:

 '?' matches any single character in that position in the filename.


 '*' matches zero or more characters in the filename. A '*' on its
own will match all files. '*.*' matches all files containing a '.'.
 Characters enclosed in square brackets ('[' and ']') will match
any filename that has one of those characters in that position.
 A list of comma separated strings enclosed in curly braces ("{"
and "}") will be expanded as a Cartesian product with the
surrounding characters.

For example:

1. ??? matches all three-character filenames.

2. ?ell? matches any five-character filenames with 'ell' in the
middle.
3. he* matches any filename beginning with 'he'.
4. [m-z]*[a-l] matches any filename that begins with a letter from 'm'
to 'z' and ends in a letter from 'a' to 'l'.
5. {/usr,}{/bin,/lib}/file expands to /usr/bin/file /usr/lib/file /bin/file and
/lib/file.

Note that the UNIX shell performs these expansions (including any filename
matching) on a command's arguments before the command is executed.

Example
*.c

includes all files ending with '.c' because * stands for any number of
any characters, e.g. new.c, ptr.c, str.c, etc.

A command like rm *.c will therefore delete all files ending with '.c'. The
other files which do not end with '.c' will be retained. The pattern
specifies that the files must necessarily end with '.c'.

? - This wild card specifies any one character. Therefore, in a pattern, if the
wild card ? appears then it will be replaced by any one character.

Example
cat ab?xy

The above command will display the contents of all files whose name starts
with ab followed by any one character followed by xy.

[ ] - This wild card specifies any one of the characters listed within the [ ].

Example
rm ab[efg]yz

The above command will delete all the files whose names begin with ab,
followed by either e, f, or g, followed by yz.

PIPES  UNIX offers a provision whereby the output of one program can be
made the
input of another program. Both the programs are separated by the |
symbol.

Example
$ cat fil.c | pg

The above command will display the contents of the file fil.c page by page,
because the output is piped to a program called pg which displays the output
only one screenful at a time.

UNIX Standard Files:


There are three files that are automatically opened for each process in the system.

These files are referred to as standard input, standard output and standard
error.

Standard input, sometimes abbreviated to stdin is where a command expects


to find its input, usually the keyboard.

Standard output (stdout) and standard error (stderr) are where the command
expects to put its output, usually the screen.

These defaults can be changed using redirection.
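For example (the file names used here are only illustrations):

$ sort < names > names.sorted     # standard input from names, standard output to names.sorted
$ ls -l /nosuchdir 2> errors.log  # standard error redirected to the file errors.log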

Note: Remember that in AIX, not all file names refer to real data files!

Some files may be "special files" which in reality are a pointer to some of
the devices on the system (for example /dev/tty0).

Two or more commands can be separated by a pipe on a single command
line. The requirement is that any command to the left of a pipe must send
output to standard output.

Any command to the right of the pipe must take its input from standard input.

The example on the visual shows that the output of who is passed as input to
wc -l, which gives us the number of active users on the system.
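The command line described would look like this:

$ who | wc -l

The who command lists one line per logged-in user, and wc -l counts those lines.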

A command is referred to as a filter if it can read its input from standard input,
alter it in some way, and write its output to standard output. A filter can be
used as an intermediate command between pipes.

A filter is commonly used with a string of piped commands, as in the example


above. The ls -l command lists all the files in the current directory and then
pipes this information to the grep command. The grep command will be
covered in more detail later in the course, but in this example, the grep
command is used to find all lines beginning with a d (directories).

The output of the grep command is then piped to the wc -l command. The
result is that the command is counting the number of directories. In this
example, the grep command is acting as a filter.
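The complete pipeline described above would look like this:

$ ls -l | grep '^d' | wc -l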

Placing multiple commands separated by a ";" on a single line produces the
same result as entering each command on a separate command line.
There need be no association between the two commands.
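For example:

$ date ; pwd

runs date and then pwd, exactly as if each command had been typed on its own line.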

The \ must be the last character on the line and immediately followed by
pressing Enter.

Do not confuse the continuation prompt > with the redirection character >. The
secondary prompt will not form part of the completed command line. If you
require a redirection character you must type it explicitly.

Module 3
Processes:

A program or a command that is actually running on a system is referred to as
a process.

UNIX can run a number of different processes at the same time as well as
many occurrences of a program (such as vi) existing simultaneously in the
system.

The process ID (PID) is extracted from a process table.

In a shell environment, the process ID is stored in the variable $$.

To identify the running processes, execute the command ps, which will be
covered later in this course. For example, ps -u team01 shows all running
processes from user team01.
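For example:

$ echo $$                 # displays the PID of the current shell
$ ps -u team01            # lists the processes belonging to user team01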

ps prints information only about processes started from your current terminal.
Only the Process ID, Terminal, Elapsed Time and Command are displayed.
The -e option displays information about EVERY process running in the
system.

The -f option in addition to the default information provided by ps, displays the
User Name, PPID, start time for each process (that is, a FULL listing).

The -l option displays the User ID, PPID and priorities for each process in
addition to the information provided by ps (that is, a LONG listing).
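For example (the options can also be combined):

$ ps                      # processes started from this terminal
$ ps -e                   # every process on the system
$ ps -ef                  # every process, with a full listing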

Processes that are started from and require interaction with the terminal are
called foreground processes. Processes that are run independently of the
initiating terminal are referred to as background processes.

Background processes are most useful with commands that take a long time
to run.

A process can only be run in the background if:

1. It doesn't require keyboard input, and


2. It is invoked with an ampersand & as the last character in the command
line.
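For example, a long-running sort could be started in the background like this
(the file names are only illustrations):

$ sort bigfile > bigfile.sorted &

The shell prints the job number and PID of the background process and
immediately returns a prompt.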

Notes: The <ctrl-c> may not always work. A Shell script or program can trap
the signal a <ctrl-c> generates and ignore its meaning.

You can stop a foreground process by pressing <ctrl-z>. This does not
terminate the process; it suspends it so that you can subsequently restart it.

To restart a suspended process in the background, use the bg command.


To bring a suspended or background process into the foreground, use the fg
command.

To find out what suspended/background jobs you have, issue the jobs
command.

The bg, fg, kill commands can be used with a job number. For instance, to
kill job number 3, you can issue the command: kill %3 The jobs command
does not list jobs that were started with the nohup command if the user has
logged off and then logged back into the system. On the other hand, if a user
invokes a job with the nohup command and then issues the jobs command
without logging off, the job will be listed.
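A typical sequence might look like this (job numbers will vary):

$ sleep 200               # a long-running foreground command
<ctrl-z>                  # suspend it
$ jobs                    # list suspended and background jobs
$ bg %1                   # resume job 1 in the background
$ fg %1                   # bring it back to the foreground
$ kill %1                 # or terminate it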

Module 4
Shell Script:

A shell script is a simple text file that contains UNIX commands.

When a shell script is executed, the shell reads the file one line at a time and
processes the commands in sequence.

Any UNIX command can be run from within a shell script. There are also a
number of built-in shell facilities which allow more complicated functions to be
performed. These will be illustrated later.

Any UNIX editor can be used to create a shell script.

A shell script is a collection of commands in a file. In the example a shell
script hello is shown.
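The example script itself is not reproduced here; a minimal hello script might
contain nothing more than a couple of ordinary commands, for instance:

$ cat hello
echo "Hello, world!"
date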

To execute this script, start the program ksh and pass the name of the shell
script as argument:

$ ksh hello

This shell reads the commands from the script and executes all commands
line by line.

The .profile file


After a user logs in, and as part of starting up the user's shell, two profile
files are executed. The first is the system profile /etc/profile, which is run by
every user, and the second is the .profile in the user's home directory, which
is only run by the user who owns it.

The .profile contains a sequence of commands that help you customize your
environment. Because the .profile is read each time you start a new Korn
shell, the commands you put in this file to customize your environment will be
executed each time you start a new ksh.

These commands can include, but are certainly not limited to, the following:

1. aliases
2. terminal control characteristics
3. creation/definition of shell environment variables (including your
prompt)

The first file that the operating system uses at login is the /etc/environment
file. This file contains variables specifying the basic environment for all
processes and can only be changed by the system administrator.

The second file that the operating system uses at login time is the /etc/profile
file. This file controls system-wide default variables such as the mail
messages and terminal types.

/etc/profile can only be changed by the administrator.

The .profile file is the third file read at login time. It resides in a user's login
directory and enables a user to customize their individual working
environment. The .profile file overrides commands run and variables set and
exported by the /etc/profile file.

Ensure that newly created variables do not conflict with standard variables
such as MAIL, PS1, PS2 and so forth.
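As an illustration only (the particular values shown are assumptions, not
requirements), a simple .profile might contain lines such as:

EDITOR=/usr/bin/vi        # preferred editor
ENV=$HOME/.kshrc          # Korn shell startup file for subshells
PS1='$PWD => '            # customized prompt
export EDITOR ENV PS1
set -o vi                 # vi-style command-line editing
alias ll='ls -l'          # a personal alias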

At startup time the shell checks to see if there is any new mail in
/usr/spool/mail/$LOGNAME. If there is then MAILMSG is echoed back. In
normal operation, the shell checks periodically.

The ENV="$HOME/.kshrc" variable will cause the file $HOME/.kshrc to be run


every time a new Korn shell is explicitly started. This file will usually contain
Korn shell specifics.

The .profile file is read only when the user logs in.

Be aware that your .profile file may not be read if you are accessing the
system through CDE (the Common Desktop Environment). By default, CDE
instead uses a file called .dtprofile. In the CDE environment, if you wish to
use the .profile file, it is necessary to uncomment the DTSOURCEPROFILE
variable assignment at the end of the .dtprofile file.

Module 5
Overview
The tilde (~) Expansion:

The C shell provides an easy way to abbreviate the pathname of your home
directory. When the tilde symbol (~) appears at the beginning of a word in
your command line, the shell replaces it with the full pathname of your login
directory.

Example:

% mv file ~/newfile

is the abbreviated way of typing this:

% mv file $home/newfile

The whence Command

The whence command can be used to determine exactly where the command
you specify is located. For instance, it may be a command located on the disk
drive, it may be an alias, or it may be built-in to the Korn shell. whence reports
the proper location.

whence

$ whence ls <works with basic commands>


/bin/ls

$ whence dir <works with aliases>


/bin/ls -al | more

$ whence echo <works with built-in commands>


echo

Aliases
Aliases in the Korn shell allow you to create your own commands. You can
simply rename existing commands, or you can group commands together to
create entirely new commands. This feature is also available in the C shell,
but the command syntax is slightly different.

The ksh syntax for alias commands:

alias name='value'
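For example (these particular aliases are only illustrations):

$ alias dir='ls -al | more'
$ alias rm='rm -i'
$ dir                      # now runs ls -al | more
$ unalias rm               # removes the rm alias again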

The ENV variable specifies a Korn shell script to be invoked every time a new
shell is created. The shell script in this example is .kshrc (which is the
standard name used), but any other filename can also be used.

The difference between .profile and .kshrc is that .kshrc is read each time a
subshell is spawned, whereas .profile is read once at login.

You can also set the following variable in $HOME/.profile:

EDITOR=/usr/bin/vi
export EDITOR

It will do the same thing that the set -o vi command does as shown in the
example.

The alias command invoked with no arguments prints the list of aliases in the
form name=value on standard output.

The Korn shell sets up a number of aliases by default. Notice that the history
and r commands are in fact aliases of the fc command. Once this alias is
established, typing an r will re-execute the previously entered command.

To carry down the value of an alias to subsequent subshells, the ENV variable
has to be modified. The ENV variable is normally set to $HOME/.kshrc in the
.profile file (although you can set ENV to any shell script). By adding the alias
definition to the .kshrc file (by using one of the editors) and invoking the
.profile file, the value of the alias will be carried down to all subshells, because
the .kshrc file is run every time a Korn shell is explicitly invoked.

The file pointed to by the ENV variable should contain Korn shell specifics.

The unalias command will cancel the alias named. The names of the aliases
specified with the unalias command will be removed from the alias list.

The /etc/environment file contains default variables set for each process.
Only the system administrator can change this file. PATH is the sequence of
directories that is searched when looking for a command whose path name is
incomplete.

TZ is the time zone information.

LANG is the locale name currently in effect.

LOCPATH is the full path name of the location of National Language Support
information, part of this being the National Language Support Table.
NLSPATH is the full path name for messages.

Module 6
The vi Editor

It is important to know vi for the following reasons:

• It is the only editor available in maintenance mode on RISC System/6000
• Standard editor across all UNIX systems
• Command-line editing feature
• Used as default editor for some programs

This unit covers only a subset of the vi functions. It is a very powerful editor.
Refer to the online documentation for additional functions.

vi does its editing in a buffer. When a session is initiated, one of two things
happens:

• If the file to be edited exists, a copy of the file is put into a buffer in /tmp by
default.
• If the file does not exist, an empty buffer is opened for this session.

Tildes represent empty lines in the editor.

The editor starts in command mode.

Module 7
The Variables:
There are a number of variables automatically set by the shell when it starts.
These allow you to reference arguments on the command line.

User Variables

It is legal to assign any sequence of non-blank characters as the name of a


variable. The sample session below creates a variable called person and
initializes it with the string Sreedhar.

It is important to note that you must NOT precede or follow the equal sign with
a space or TAB character.

Sample Session:

$person=Sreedhar

The sample session below indicates that the word person by itself does not
represent the value of the variable. The string person is echoed as person.
The BourneShell will only do the substitution of the value of the variable when the name of the variable
is preceded with a dollar sign ($).

Sample Session:

$echo person
person
$echo $person
Sreedhar
$

If you want to have imbedded spaces in a variable, it is necessary to quote


the string.

Sample Session:

$person='Sreedhar and Venkatesh'
$echo $person
Sreedhar and Venkatesh
$

Shell variables are an integral part of shell programming. They provide the
ability to store and manipulate information within a shell program.

All shell variable names are case sensitive. For example, HOME and home
are not the same.

As a convention, uppercase names are used for the standard variables set by
the system and lowercase is used for the variables set by the user.

The set command displays your current option settings for all the variables.
The set command is a built-in command of the shell, and therefore gives a
different output depending on the shell being run, for instance a Bourne or a
Korn shell.

The echo command displays the string of text to standard out (by default to
the screen).

To set a variable, use the = with NO SPACES on either side. Once the
variable has been set, to refer to the value of that variable precede the
variable name with a $. There must be NO SPACE between the $ and the
variable name.
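A short illustration (the variable name x is arbitrary):

$ x=hello
$ echo $x
hello
$ echo x
x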

Notice there need not be a space BEFORE the $ of the variable in order for
the shell to do variable substitution. Note, though, what happened when there
was no space AFTER the variable name. The shell searched for a variable
whose name was xylong, which did not exist. When a variable that has not
been defined is referenced, the user does not get an error. Rather a null string
is returned.

To eliminate the need for a space after the variable name, the curly braces { }
are used.

Note that the $ is OUTSIDE of the braces.
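A sketch of the situation described above (the variable name xy is assumed for
the illustration):

$ xy=test
$ echo $xylong            # the shell looks for a variable called xylong
                          # and prints a null string
$ echo ${xy}long
testlong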

A variable can be set to the output of some command or group of commands
by using the backquotes (also referred to as grave accents). They should not
be mistaken for single quotes. In the examples the output of the date and
who commands are stored in variables.

The backquotes are supported by the Bourne shell, C shell and Korn shell.
The use of $(command) is specific to the Korn shell.
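For example (the variable names are arbitrary):

$ now=`date`
$ echo $now
Mon Nov 27 11:24:35 EST 2006
$ users=$(who | wc -l)    # Korn shell form
$ echo $users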

Read-Only User Variables

The contents of the user variables and the shell variables can be modified by
the user. It is possible to assign a new value to them. The new value can be
assigned from the dollar ($) prompt or from inside a BourneShell script.

Read-only variables are different. The value of read-only variables can not be
changed.

The variable must be initialized to some value; and then, by entering the
following command, it can be made read only.

Command format: readonly variable_name

variable_name = name of the variable to be made read only

Sample Session:

$person=Sreedhar
$readonly person
$echo $person
Sreedhar
$person=Venkatesh
person: is read only
$

The readonly command given without any arguments will display a list of all
the read-only variables.

Sample Session:

$person=Sreedhar
$readonly person
$example=Venkatesh
$readonly example
$readonly
readonly person
readonly example
$

Read-Only Shell Variables

The read-only shell variables are similar to the read-only user variables;
except the value of these variables is assigned by the shell, and the user
CANNOT modify them.

Name of the Calling Program

The shell will store the name of the command you used to call a program in
the variable named $0.

It has the number zero because it appears before the first argument on the
command line.

Sample Session:

$cat name_ex
echo 'The name of the command used'
echo 'to execute this script was' $0

$name_ex
The name of the command used
to execute this script was name_ex

$

Arguments

The BourneShell will store the first nine command line arguments in the
variables named $1, $2, ..., $9. These variables appear in this section
because you cannot change them using the equal sign. It is possible to
modify them using the set command.

Sample Session:

$cat arg_ex
echo 'The first five command line'
echo 'arguments are' $1 $2 $3 $4 $5
$arg_ex Sreedhar Venkatesh Santhosh
The first five command line
arguments are Sreedhar Venkatesh Santhosh
$

The script arg_ex will display the first five command-line arguments. The
variables representing $4 and $5 have a null value.

The BourneShell variable $* represents all of the command-line arguments as


shown in the following example.

Sample Session:

$cat display_all
echo $*
$display_all Sreedhar venkatesh Santhosh
Sreedhar venkatesh Santhosh
$

The BourneShell variable $# contains the number of arguments on the


command line. This is a string variable that represents a decimal number.
You can use the expr utility to perform calculations with that number and test
to perform logical tests on it.

Sample Session:

$cat num_args
echo 'This script was called with'
echo $# 'arguments'
$num_args Sreedhar venkatesh Santhosh
This script was called with
3 arguments
$

BourneShell Environment - Exporting Variables

Within a process, you can declare, initialize, read, and modify variables. The
variable is local to that process. When a process forks a child process, the
parent process does not automatically pass the value of the variable to the
child process.

Here is an example of the variables not being exported.

Sample Session:

$cat no_export
car=mercedes # set the variable
echo $0 $car $$ # $0 = name of file executed
# $car =value of variable car
# $$ = PID number (process id)
inner # execute another BourneShell script
echo $0 $car $$ # display same as above
$cat inner
echo $0 $car $$ # display variables for this process
$chmod a+x no_export
$chmod a+x inner
$no_export
no_export mercedes 4790
inner 4792
no_export mercedes 4790
$

When no_export was executed, it, of course, assigned a value of mercedes to


the variable car and printed it out. The call to inner created a child process.
Its PID is 4792, while the parent PID is 4790. Notice, when inner tried to print
the value of car, it printed nothing. The reason is because the value of car
was not passed by the parent.

Can the value be passed from parent to child process? Yes, by using the
export command. Let's look at an example.

Sample Session:

$cat export_it
car=mercedes
export car
echo $0 $car $$
inner1
echo $0 $car $$
$cat inner1
echo $0 $car $$
car=chevy

echo $0 $car $$
$chmod a+x export_it
$chmod a+x inner1
$export_it
export_it mercedes 4798
inner1 mercedes 4800
inner1 chevy 4800
export_it mercedes 4798
$

In the export_it BourneShell script, the variable car was initialized to


mercedes; and then it was exported. This means that the value of car is now
available to a child process. When inner1 prints out the value of car it has the
value of mercedes. This is as we expect because the value of car was
exported from the parent. The next line of inner1 changes the value of car to
chevy. This is shown in the next line of the sample session. The last line of
the session shows the return to the parent process and the value is still
mercedes. How is this possible?

Exporting variables is only valid from the parent to the child process. The
child process cannot change the parent's variable.

Reading Input Into a Shell Variable

The BourneShell script can read user input from standard input. The read
command will read one line from standard input and assign the line to one or
more variables. The following example shows how this works.

Sample Session:

$cat read_script
echo "Please enter a string of your choice"
read a
echo $a
$

This simple script will read one line from standard input (keyboard) and assign
it to the variable a.

Sample Session:

$read_script
Please enter a string of your choice
Here it is
Here it is
$

The line read from standard input can also be assigned to several variables
as shown in the following example.

Sample Session:

$cat reads
echo "Please enter three strings"
read a b c
echo $a $b $c
echo $c
echo $b
echo $a
$

This time, we will turn on the trace mechanism and follow the execution of this
BourneShell script.

Sample Session:

$sh -x reads
+ echo Please enter three strings
Please enter three strings
+ read a b c
this is more than three strings
+ echo this is more than three strings
this is more than three strings
+ echo more than three strings
more than three strings
+ echo is
is
+ echo this
this
$

It is interesting to note that the spaces separate the values for the variables
a, b, and c. For example, the variable a was assigned the string this, the
variable b was assigned the string is, and the remainder of the line was
assigned to c (including the spaces).

Sample Session:

$cat read_ex
echo 'Enter line: \c'
read line
echo "The line was: $line"
$

In this example, the \c option will suppress the carriage return.
The single quote marks protect the backslash from being interpreted
by the shell. Also notice that the double quote marks have no
effect on the substitution of the variable line.

Sample Session:

$read_ex
Enter line: All's well that ends well
The line was: All's well that ends well
$

Module 8
Parameters:
A shell script is invoked by typing its name. Parameters are passed to the script by
appending them to the script name, with spaces as separators.

POSITIONAL PARAMETERS

A BourneShell script can also read in command-line arguments. The first


argument is referred to as $1, the second is $2, and so on. Command-line
arguments are referred to as positional parameters.

Let's look at an example BourneShell script to see how these are used.

Sample Session:

$cat neat_shell
echo $1 $2 $3
echo $0 is the name of the shell script
echo "There were $# arguments."
echo $*
$

Ensure that the BourneShell script is executable by issuing this command:

Sample Session:

$chmod a+x neat_shell


$

Now, if we type the name of the BourneShell script with no arguments, we get
the following results.

Sample Session:

$neat_shell

neat_shell is the name of the shell script


There were 0 arguments.

In this sample session, there were no arguments given so none were printed.
$0 is the positional parameter that refers to the name of the script. Since
there were no arguments given with this invocation of neat_shell, there were
zero arguments listed.

$0: The Name of the Invoking Command

The special variable $0 represents the name of the executing program. The
following shell script, if called script.sh, would output This program is called
script.sh.:

#!/bin/sh
echo This program is called $0.
exit 0

$1 $2 $3 ... $9, $*: Shell Parameters

The first parameter to the shell is known as $1, the second as $2, etc. The
collection of ALL parameters is known as $*.

Consider the following as an example (file prog):

#!/bin/sh
echo the first parameter is $1
echo the second parameter is $2
echo the collection of ALL parameters is $*
exit 0

The output of that program could be:

$ prog first second


the first parameter is first
the second parameter is second
the collection of ALL parameters is first second
$

$#: Number of Parameters

The number of parameters used can be obtained by looking at the value of


$#.

Setting values of positional Parameters


Though we have compared the positional parameters with variables, they are
in essence quite different. For instance, you can't assign values to $1, $2,
etc. as we do to any other user-defined variables, or system variables for that
matter.

Saying a=10 or b=alpha is fine but $1=dollar or $2=100 is simply not done.
There is one way to assign values to the positional parameters using the set
command.

$ set Friends come and go, but enemies accumulate

The above command sets the value of $1 to 'Friends', $2 to 'come', and so
on. To verify, we use the echo statement to display their values.

$ echo $1 $2 $3 $4 $5 $6 $7
Friends come and go, but enemies accumulate

Using shift: Shifts Parameters


When a large number of parameters (more than 9) are passed to the shell,
shift can be used to read those parameters. If the number of parameters to be
read is known, say three, a program similar to the following could be written:

#!/bin/sh
echo The first parameter is $1.
shift
echo The second parameter is $1.
shift
echo The third parameter is $1.
exit 0

Obviously the above example contains redundancy, especially if there are a


large number of parameters.

To solve this problem: use a for or while loop.
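A sketch of such a loop, using while and shift (the test utility is covered later
in the course):

#!/bin/sh
# Print every parameter, however many were passed
while test $# -gt 0
do
  echo "parameter: $1"
  shift
done
exit 0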

Module 9
Regular Expressions:
What is a Regular Expression?

A regular expression is a set of characters that specify a pattern. The term


"regular" has nothing to do with a high-fiber diet. It comes from a term used to
describe grammars and formal languages.

Regular expressions are used when you want to search for specific lines of
text containing a particular pattern. Most of the UNIX utilities operate on ASCII
files a line at a time. Regular expressions search for patterns on a single line,
and not for patterns that start on one line and end on another.

It is simple to search for a specific word or string of characters. Almost every


editor on every computer system can do this. Regular expressions are more
powerful and flexible. You can search for words of a certain size. You can
search for a word with four or more vowels that end with an "s." Numbers,
punctuation characters, you name it, a regular expression can find it. W hat
happens once the program you are using find it is another matter. Some just
search for the pattern. Others print out the line containing the pattern. Editors
can replace the string with a new pattern. It all depends on the utility.

Regular expressions confuse people because they look a lot like the file
matching patterns the shell uses. They even act the same way--almost. The
square brackets are similar, and the asterisk acts similarly to, but not identically
to, the asterisk in a regular expression. In particular, the Bourne shell, C shell,
find, and cpio use file name matching patterns and not regular expressions.

The Structure of a Regular Expression

There are three important parts to a regular expression. Anchors are used to
specify the position of the pattern in relation to a line of text. Character Sets
match one or more characters in a single position. Modifiers specify how
many times the previous character set is repeated. A simple example that
demonstrates all three parts is the regular expression "^#*." The up arrow is
an anchor that indicates the beginning of the line. The character "#" is a
simple character set that matches the single character "#." The asterisk is a
modifier. In a regular expression it specifies that the previous character set
can appear any number of times, including zero. This is a useless regular
expression, as you will see shortly.

There are also two types of regular expressions: the "Basic" regular
expression, and the "extended" regular expression. A few utilities like awk and
egrep use the extended expression. Most use the "regular" regular

expression. From now on, if I talk about a "regular expression," it describes a
feature in both types.

Here is a table of the Solaris (around 1991) commands that allow you to
specify regular expressions:

Utility Regular Expression Type


vi Basic
sed Basic
grep Basic
csplit Basic
dbx Basic
dbxtool Basic
more Basic
ed Basic
expr Basic
lex Basic
pg Basic
nl Basic
rdist Basic
awk Extended
nawk Extended
egrep Extended
EMACS EMACS Regular Expressions
PERL PERL Regular Expressions

The Anchor Characters: ^ and $

Most UNIX text facilities are line oriented. Searching for patterns that span
several lines is not easy to do. You see, the end of line character is not
included in the block of text that is searched. It is a separator. Regular
expressions examine the text between the separators. If you want to search
for a pattern that is at one end or the other, you use anchors. The character
"^" is the starting anchor, and the character "$" is the end anchor. The regular
expression "^A" will match all lines that start with a capital A. The expression
"A$" will match all lines that end with the capital A. If the anchor characters
are not used at the proper end of the pattern, then they no longer act as
anchors. That is, the "^" is only an anchor if it is the first character in a regular
expression. The "$" is only an anchor if it is the last character. The expression
"$1" does not have an anchor. Neither is "1^." If you need to match a "^" at the
beginning of the line, or a "$" at the end of a line, you must escape the special
characters with a back slash. Here is a summary:

Pattern Matches
^A "A" at the beginning of a line
A$ "A" at the end of a line
A^ "A^" anywhere on a line
$A "$A" anywhere on a line
^^ "^" at the beginning of a line
$$ "$" at the end of a line

The use of "^" and "$" as indicators of the beginning or end of a line is a
convention other utilities use. The vi editor uses these two characters as
commands to go to the beginning or end of a line. The C shell uses "!^" to
specify the first argument of the previous line, and "!$" is the last argument on
the previous line.

It is one of those choices that other utilities go along with to maintain
consistency. For instance, "$" can refer to the last line of a file when using ed
and sed. cat -e marks the end of each line with a "$". You might see it in other
programs as well.

Matching a character with a character set

The simplest character set is a character. The regular expression "the"


contains three character sets: "t," "h" and "e." It will match any line with the
string "the" inside it. This would also match the word "other." To prevent this,
put spaces before and after the pattern: " the ." You can combine the string
with an anchor. The pattern "^From: " will match the lines of a mail message
that identify the sender. Use this pattern with grep to print every address in
your incoming mail box:

grep '^From: ' /usr/spool/mail/$USER

Some characters have a special meaning in regular expressions. If you want


to search for such a character, escape it with a back slash.

Match any character with .

The character "." is one of those special meta-characters. By itself it will match
any character, except the end-of-line character. The pattern that will match a
line with a single character is
^.$

Specifying a Range of Characters with [...]

If you want to match specific characters, you can use the square brackets to
identify the exact characters you are searching for. The pattern that will match
any line of text that contains exactly one number is

^[0123456789]$

This is verbose. You can use the hyphen between two characters to specify a
range:

^[0-9]$

You can intermix explicit characters with character ranges. This pattern will
match a single character that is a letter, number, or underscore:

[A-Za-z0-9_]

Character sets can be combined by placing them next to each other. If you
wanted to search for a word that

1. Started with a capital letter "T."


2. Was the first word on a line
3. The second letter was a lower case letter
4. Was exactly three letters long, and
5. The third letter was a vowel

the regular expression would be "^T[a-z][aeiou] ."

Exceptions in a character set

You can easily search for all characters except those in square brackets by
putting a "^" as the first character after the "[." To match all characters except
vowels use "[^aeiou]."

Like the anchors in places that can't be considered an anchor, the characters
"]" and "-" do not have a special meaning if they directly follow "[." Here are
some examples:

Regular Expression Matches


[] The characters "[]"
[0] The character "0"
[0-9] Any number
[^0-9] Any character other than a number
[-0-9] Any number or a "-"
[0-9-] Any number or a "-"
[^-0-9] Any character except a number or a "-"
[]0-9] Any number or a "]"
[0-9]] Any number followed by a "]"
[0-9-z] Any number, or any character between "9" and "z"
[0-9\-a\]] Any number, or a "-", an "a", or a "]"

Repeating character sets with *

The third part of a regular expression is the modifier. It is used to specify how
many times you expect to see the previous character set. The special character
"*" matches zero or more copies. That is, the regular expression "0*"

matches zero or more zeros, while the expression "[0-9]*" matches zero or
more numbers.

This explains why the pattern "^#*" is useless, as it matches any number of
"#'s" at the beginning of the line, including zero. Therefore this will match
every line, because every line starts with zero or more "#'s."

At first glance, it might seem that starting the count at zero is stupid. Not so.
Looking for an unknown number of characters is very important. Suppose you
wanted to look for a number at the beginning of a line, and there may or may
not be spaces before the number. Just use "^ *" to match zero or more spaces
at the beginning of the line. If you need to match one or more, just repeat the
character set. That is, "[0-9]*" matches zero or more numbers, and "[0-9][0-
9]*" matches one or more numbers.

Matching a specific number of sets with \{ and \}

You can continue the above technique if you want to specify a minimum
number of character sets. You cannot specify a maximum number of sets with
the "*" modifier. There is a special pattern you can use to specify the minimum
and maximum number of repeats. This is done by putting those two numbers
between "\{" and "\}." The back slashes deserve a special discussion.
Normally a backslash turns off the special meaning for a character. A period
is matched by a "\." and an asterisk is matched by a "\*."

If a backslash is placed before a "<," ">," "{," "}," "(," ")," or before a digit, the
back slash turns on a special meaning. This was done because these special
functions were added late in the life of regular expressions. Changing the
meaning of "{" would have broken old expressions. This is a horrible crime
punishable by a year of hard labor writing COBOL programs. Instead, adding
a back slash added functionality without breaking old programs. Rather than
complain about the asymmetry, view it as evolution.

Having convinced you that "\{" isn't a plot to confuse you, an example is in
order. The regular expression to match 4, 5, 6, 7 or 8 lower case letters is

[a-z]\{4,8\}

Any numbers between 0 and 255 can be used. The second number may be
omitted, which removes the upper limit. If the comma and the second number

are omitted, the pattern must be duplicated the exact number of times
specified by the first number.

You must remember that modifiers like "*" and "\{1,5\}" only act as modifiers if
they follow a character set. If they were at the beginning of a pattern, they
would not be a modifier. Here is a list of examples, and the exceptions:

Regular Expression Matches


* Any line with an asterisk
\* Any line with an asterisk
\\ Any line with a back slash
^* Any line starting with an asterisk
^A* Any line
^A\* Any line starting with an "A*"
^AA* Any line if it starts with one "A"
^AA*B Any line with one or more "A"'s followed by a "B"
^A\{4,8\}B Any line starting with 4, 5, 6, 7 or 8 "A"'s
followed by a "B"
^A\{4,\}B Any line starting with 4 or more "A"'s
followed by a "B"
^A\{4\}B Any line starting with "AAAAB"
\{4,8\} Any line with "{4,8}"
A{4,8} Any line with "A{4,8}"

Matching words with \< and \>

Searching for a word isn't quite as simple as it at first appears. The string "the"
will match the word "other." You can put spaces before and after the letters
and use this regular expression: " the ." However, this does not match words
at the beginning or end of the line. And it does not match the case where
there is a punctuation mark after the word.

There is an easy solution. The characters "\<" and "\>" are similar to the "^"
and "$" anchors, as they don't occupy a position of a character. They do
"anchor" the expression between to only match if it is on a word boundary.
The pattern to search for the word "the" would be "\<[tT]he\>." The character
before the "t" must be either a new line character, or anything except a letter,
number, or underscore. The character after the "e" must also be a character
other than a number, letter, or underscore or it could be the end of line
character.

Backreferences - Remembering patterns with \(, \) and \1

Another pattern that requires a special mechanism is searching for repeated


words. The expression "[a-z][a-z]" will match any two lower case letters. If you
wanted to search for lines that had two adjoining identical letters, the above
pattern wouldn't help. You need a way of remembering what you found, and
seeing if the same pattern occurred again. You can mark part of a pattern
using "\(" and "\)." You can recall the remembered pattern with "\" followed by
a single digit. Therefore, to search for two identical letters, use "\([a-z]\)\1."
You can have 9 different remembered patterns. Each occurrence of "\(" starts
a new pattern. The regular expression that would match a 5 letter palindrome,
(e.g. "radar"), would be

\([a-z]\)\([a-z]\)[a-z]\2\1

Potential Problems

That completes a discussion of the Basic regular expression. Before I discuss


the extensions the extended expressions offer, I wanted to mention two
potential problem areas.

The "\<" and "\>" characters were introduced in the vi editor. The other
programs didn't have this ability at that time. Also the "\{min,max\}" modifier is
new and earlier utilities didn't have this ability. This made it difficult for the
novice user of regular expressions, because it seemed each utility had a
different convention. Sun has retrofitted the newest regular expression library
to all of their programs, so they all have the same ability. If you try to use
these newer features on other vendors' machines, you might find they don't
work the same way.

The other potential point of confusion is the extent of the pattern matches.
Regular expressions match the longest possible pattern. That is, the regular
expression

A.*B

matches "AAB" as well as "AAAABBBBABCCCCBBBAAAB." This doesn't


cause many problems using grep, because an oversight in a regular
expression will just match more lines than desired. If you use sed, and your
patterns get carried away, you may end up deleting more than you wanted
to.

Extended Regular Expressions

Two programs use the extended regular expression: egrep and awk. With
these extensions, those special characters preceded by a back slash no
longer have the special meaning: "\{," "\}," "\<," "\>," "\(," "\)" as well as the
"\digit." There is a very good reason for this, which I will delay explaining to
build up suspense.

The character "?" matches 0 or 1 instances of the character set before, and
the character "+" matches one or more copies of the character set. You can't
use the \{ and \} in the extended regular expressions, but if you could, you
might consider the "?" to be the same as "\{0,1\}" and the "+" to be the same
as "\{1,\}."

By now, you are wondering why the extended regular expressions are even
worth using. Except for two abbreviations, there are no advantages, and a lot
of disadvantages. Therefore, examples would be useful.

The three important characters in the expanded regular expressions are "(,"
"|," and ")." Together, they let you match a choice of patterns. As an example,
you can egrep to print all From: and Subject: lines from your incoming mail:

egrep '^(From|Subject): ' /usr/spool/mail/$USER

All lines starting with "From:" or "Subject:" will be printed. There is no easy
way to do this with the Basic regular expressions. You could try
"^[FS][ru][ob][mj]e*c*t*: " and hope you don't have any lines that start with
"Sromeet:." Extended expressions don't have the "\<" and "\>" characters. You
can compensate by using the alternation mechanism. Matching the word "the"
in the beginning, middle, end of a sentence, or end of a line can be done with
the extended regular expression:

(^| )the([^a-z]|$)

There are two choices before the word, a space or the beginning of a line.
After the word, there must be something besides a lower case letter or else
the end of the line. One extra bonus with extended regular expressions is the
ability to use the "*," "+," and "?" modifiers after a "(...)" grouping. The
following will match "a simple problem," "an easy problem," as well as "a
problem."

egrep "a[n]? (simple|easy)? problem" data

I promised to explain why the back slash characters don't work in extended
regular expressions. Well, perhaps the "\{...\}" and "\<...\>" could be added to
the extended expressions. These are the newest addition to the regular
expression family. They could be added, but this might confuse people if
those characters are added and the "\(...\)" are not. And there is no way to add
that functionality to the extended expressions without changing the current
usage. Do you see why? It's quite simple. If "(" has a special meaning, then
"\(" must be the ordinary character. This is the opposite of the Basic regular
expressions, where "(" is ordinary, and "\(" is special. The usage of the
parentheses is incompatible, and any change could break old programs.

If the extended expression used "( ..|...)" as regular characters, and "\(...\|...\)"
for specifying alternate patterns, then it is possible to have one set of regular
expressions that has full functionality. This is exactly what GNU emacs does,
by the way.

The rest of this is random notes.

Regular Expression   Class      Type            Meaning

.                    all        Character Set   A single character (except newline)
^                    all        Anchor          Beginning of line
$                    all        Anchor          End of line
[...]                all        Character Set   Range of characters
*                    all        Modifier        Zero or more duplicates
\<                   Basic      Anchor          Beginning of word
\>                   Basic      Anchor          End of word
\(..\)               Basic      Backreference   Remembers pattern
\1..\9               Basic      Reference       Recalls pattern
+                    Extended   Modifier        One or more duplicates
?                    Extended   Modifier        Zero or one duplicate
\{M,N\}              Basic      Modifier        M to N duplicates
(...|...)            Extended   Anchor          Shows alternation
\(...\|...\)         EMACS      Anchor          Shows alternation
\w                   EMACS      Character Set   Matches a letter in a word
\W                   EMACS      Character Set   Opposite of \w

Module 10
A Sample Shell Script

This visual shows another way of invoking a shell script. This method relies on
the user first making the script an executable file with the chmod command.

After this step the script can be invoked by its name.

Note that the shell uses the PATH variable to find executable files. If you get
an error message like the following,

$ hello
ksh: hello: not found

check your PATH variable. The directory in which the shell script is stored
must be defined in the PATH variable.
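For example (assuming the script is stored in $HOME/bin):

$ chmod a+x hello
$ PATH=$PATH:$HOME/bin
$ export PATH
$ hello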

Each shell script is executed in a subshell. Variables defined in a shell script
cannot be passed back to the parent shell.

If you invoke a shell script with a . (dot), it runs in the current shell. Variables
defined in this script (dir1, dir2) are therefore defined in the current shell.
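A small sketch of the difference (the directory values are only examples):

$ cat setdirs
dir1=/home/team01/src
dir2=/home/team01/doc
$ setdirs                 # runs in a subshell
$ echo $dir1              # prints nothing; dir1 was lost with the subshell
$ . setdirs               # runs in the current shell
$ echo $dir1
/home/team01/src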

Every process gives back an exit status to its parent process. Per convention
0 is given back when the process ended successfully and not equal 0 in all
other cases.

To find out the exit code of a completed command, use echo $?:

$ date
$ echo $?
0
$_

This shows successful execution of the date command. The visual shows an
example for an unsuccessful execution of a command.

CONTROL CONSTRUCTS:

The BourneShell control constructs can alter the flow of control within the
script. The BourneShell provides simple two-way branch if statements and
multiple-branch case statements, plus for, while, and until statements.

In discussing these control structures, the BourneShell keywords will be shown
in bold type, and normal type indicates the user-supplied items that cause the
desired effect in the command format boxes.

Types of Tests Used with Control Constructs:

The test utility evaluates expressions and returns a condition indicating
whether or not the expression is true (an exit status of zero) or false (a
non-zero exit status). There are no options with this utility. The format for
this utility is as follows:

Command Format: test expression

expression - composed of constants, variables, and


operators

Expressions will be looked at in greater detail later with some examples.


There are a few items that need to be mentioned that apply to expressions.
Expressions can contain one or more evaluation criteria that test will evaluate.
A -a that separates two criteria is a logical AND operator. In this case, both
criteria must evaluate to true in order for test to return a value of true. The -o
is the logical OR operator. When this operator separates two criteria, one or
the other (or both) must be true for test to return a true condition.

You can negate any criterion by preceding it with an exclamation mark (!).
Parentheses can be used to group criteria. If there are no parentheses, the -a
(logical AND operator) takes precedence over the -o (logical OR operator).
The test utility will evaluate operators of equal precedence from left to right.

Within the expression itself, you must put special characters, such as
parentheses, in quote marks so the BourneShell will not evaluate them but will
pass them to test.

Since each element (evaluation criterion, string, or variable) in an expression


is a separate argument, each must be separated by a space.

The test utility will work from the command line but it is more often used in a
script to test input or verify access to a file.

Another way to do the test evaluation is to surround the expression with left
and right brackets. A space character must appear after the left bracket and
before the right bracket.

test expression = [ expression ]
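For example (the results shown assume that /tmp exists and is writable, which
is true on any normal system):

$ test -d /tmp
$ echo $?
0
$ [ -d /tmp -a -w /tmp ]
$ echo $?
0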

Test on Numeric Values

Test expressions can be in many different forms. The expressions can appear
as a set of evaluation criteria. The general form for testing numeric values is:

int1 op int2

This criterion is true if the integer int1 has the specified algebraic relationship
to integer int2.

The valid operators (op) are:

-eq equal

-ne not equal

-gt greater than

-lt less than

-ge greater than or equal

-le less than or equal

Test on Character Strings

The evaluation criterion for character strings is similar to numeric


comparisons. The general form is:

string1 op string2

The operators (op) are:

string1 = string2 true if string1 and string 2 are equal

string1 != string2 true if string1 and string2 are not equal

string1 true if string1 is not the null string

Sample Session:

$cat test_string
number=1
numero=0001
if test $number = $numero
then echo "String vals for $number and $numero are ="
else echo "String vals for $number and $numero not ="
fi
if test $number -eq $numero
then echo "Numeric vals for $number and $numero are ="
else echo "Numeric vals for $number and $numero not ="
fi

$chmod 755 test_string

$sh -x test_string
number=1
numero=0001
+ test 1 = 0001
+ echo String vals for 1 and 0001 not =
String vals for 1 and 0001 not =
+ test 1 -eq 0001
+ echo Numeric vals for 1 and 0001 are =
Numeric vals for 1 and 0001 are =

$test_string
String vals for 1 and 0001 not =
Numeric vals for 1 and 0001 are =

Test on File Types

The test utility can be used to determine information about file types. All of
the criterion can be found in Appendix B. A few of them are listed here:

-r filename true if filename exists and is readable

-w filename true if filename exists and is writable

-x filename true if filename exists and is executable

-f filename true if filename exists and it is a plain file

-d filename true if filename exists and it is a directory.

-s filename true if filename exists and it contains
information (has a size greater than 0 bytes)

Example:
$test -d new_dir

If new_dir is a directory, this criterion will evaluate to true. If it does not exist,
then it will be false.

Taking Decisions using if then

The format for this construct is:

Command Format: if expression


then commands
fi

The if statement evaluates the expression and then returns control based on
this status. The fi statement marks the end of the if, notice that fi is if spelled
backward.

The if statement executes the statements immediately following it if the


expression returns a true status. If the return status is false, control will
transfer to the statement following the fi.

Sample Session:

$cat check_args
if (test $# = 0)
then echo 'Please supply at least 1 argument'
exit
fi
echo 'Program is running'
$

This little script will check to ensure that you are giving at least one argument.
If none are given it will display the error message and exit. If one or more
arguments are given it will display "Program is running" and run the rest of the
script, if any.

Sample Session:

$check_args
Please supply at least 1 argument
$check_args xyz
Program is running
$

Taking Decision using if then else

The format for this construct is:

Command Format: if expression


then commands
else commands
fi

The else part of this structure makes the single-branch if statement into a two-
way branch. If the expression returns a true status, the commands between
the then and the else statement will be executed. After these have been
executed, control will start again at the statement after the fi.

If the expression returns false, the commands following the else statement will
be executed.

Sample Session:

$cat test_string
number=1
numero=0001
if test $number = $numero
then echo "String values of $number and $numero are equal"
else echo "String values of $number and $numero not equal"
fi
if test $number -eq $numero
then echo "Numeric values of $number and $numero are equal"
else echo "Numeric values of $number and $numero not equal"
fi
$

Taking Decision using if then elif

The format for this construct is:

Command Format: if expression


then commands
elif expression
then commands
else commands
fi

The elif construct combines the else and if statements and allows you to
construct a nested set of if then else structures.
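A small sketch using elif, in the style of the other sample sessions (the script
name check_num is assumed for the illustration):

$cat check_num
echo 'Enter a number: \c'
read num
if test $num -gt 0
then echo 'The number is positive'
elif test $num -lt 0
then echo 'The number is negative'
else echo 'The number is zero'
fi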

The case control Structure

The format for this construct is:

Command Format: case test-string in


pattern-1 ) commands-1 ;;
pattern-2 ) commands-2 ;;
pattern-3 ) commands-3 ;;
.
.
.
*) commands ;;
esac

The case structure allows a multiple-branch decision mechanism. The path


that is taken depends on a match between the test-string and one of the
patterns.

Sample Session:

$cat case_ex
echo 'Enter A, B, or C: \c'
read letter
case $letter in
A) echo 'You entered A' ;;
B) echo 'You entered B' ;;
C) echo 'You entered C' ;;
*) echo 'You did not enter A, B, or C' ;;
esac
$chmod a+x case_ex
$case_ex

Enter A, B, or C: B
You entered B
$case_ex
Enter A, B, or C: b
You did not enter A, B, or C
$

This example uses the value of a character that the user entered as the test
string. The value is represented by the variable letter. If letter has the value
of A, the structure will execute the command following A. If letter has a value
of B or C, then the appropriate commands will be executed. The asterisk
indicates any string of characters; and it, therefore, functions as a catchall for
a no-match condition. The lowercase b in the second sample session is an
example of a no match condition.

The Loop Control Structure


The for Loop:

The format for this construct is:

Command Format: for loop-index in argument-list


do
commands
done

This structure will assign the value of the first item in the argument list to the
loop index and executes the commands between the do and done
statements. The do and done statements indicate the beginning and end of
the for loop.

After the structure passes control to the done statement, it assigns the value
of the second item in the argument list to the loop index and repeats the
commands. The structure will repeat the commands between the do and
done statements once for each argument in the argument list. When the
argument list has been exhausted, control passes to the statement following
the done.

Sample Session:

$cat find_henry1
for x in project1 project2 project3
do
grep henry $x
done

Sample Session:

$head project?
==> project1 <==
henry
joe
mike
sue

==> project2 <==


joe
mike
sue

==> project3 <==


joe
mike
sue
henry

==> project4 <==


joe
mike

$find_henry1
henry
henry
$

Each file in the argument list was searched for the string, henry. When a
match was found, the string was printed.

The while Loop

The format for this construct is:

Command Format: while expression


do
commands
done

As long as the expression returns a true exit status, the structure continues to
execute the commands between the do and the done statement. Before each
loop through the commands, the structure executes the expression. When
the exit status of the expression is false (non-zero), control is passed to the
statement following the done statement.

The commands to be executed must change the expression test or an infinite


loop can result.
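A small sketch of a while loop that counts from 0 to 4, using the expr utility
to do the arithmetic:

$cat while_ex
number=0
while (test $number -lt 5)
do
echo $number
number=`expr $number + 1`
done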

The until Loop

The format for this construct is:

Command Format: until expression


do
commands
done

The until and while structures are very similar; in both, the test is at the top
of the loop. The difference is that the until structure will continue to loop
until the expression returns a true (zero exit status) or nonerror condition,
whereas the while loop will continue as long as a true or nonerror condition is
returned.

The commands to be executed must change the expression test or an infinite


loop can result.

Sample Session:

$cat until_ex
secretname='jenny'
name='noname'
echo 'Try to guess the secret name!'
echo
until (test "$name" = "$secretname")
do
echo 'Your guess: \c'
read name
done
echo 'You did it!'
$

The until loop will continue until name is equal to the secret name.

Sample Session:

$chmod a+x until_ex


$until_ex
Try to guess the secret name!

Your guess: gaylan
Your guess: art
Your guess: richard
Your guess: jenny
You did it!
$

The break, and continue Statement

The break and continue loop control commands correspond exactly to their
counterparts in other programming languages. The break command
terminates the loop (breaks out of it), while continue causes a jump to the next
iteration (repetition) of the loop, skipping all the remaining commands in that
particular loop cycle.

#!/bin/bash

LIMIT=19 # Upper limit

echo
echo "Printing Numbers 1 through 20 (but not 3 and 11)."

a=0

while [ $a -le "$LIMIT" ]


do
a=$(($a+1))

if [ "$a" -eq 3 ] || [ "$a" -eq 11 ] # Excludes 3 and 11.


then
continue # Skip rest of this particular loop iteration.
fi

echo -n "$a " # This will not execute for 3 and 11.
done

# Exercise:
# Why does loop print up to 20?

echo; echo

echo Printing Numbers 1 through 20, but something happens after 2.

##################################################################

# Same loop, but substituting 'break' for 'continue'.

a=0

while [ "$a" -le "$LIMIT" ]
do
a=$(($a+1))

if [ "$a" -gt 2 ]
then
break # Skip entire rest of loop.
fi

echo -n "$a "


done

echo; echo; echo

exit 0

The break command may optionally take a parameter. A plain break


terminates only the innermost loop in which it is embedded, but a break N
breaks out of N levels of loop.

Module 11
Useful Utilities for Shells
cat - concatenate a file

Display the contents of a file with the concatenate command, cat.

Syntax cat [options] [file]

Common Options

-n precede each line with a line number


-v display non-printing characters, except tabs, new-lines, and
form-feeds
-e display $ at the end of each line (prior to new-line)
(when used with -v option)

Examples % cat filename

You can list a series of files on the command line, and cat will concatenate
them, starting each in turn, immediately after completing the previous one,
e.g.:

% cat file1 file2 file3

DATE

 The date command gives the current date and time.



Syntax
date ["+format-string"]

Example
$date

Its output is as follows

Mon Nov 27 11:24:35 EST 2006

The date command can also be used with a format-string. The format-string
can include the following format characters:

%D - gives the date in MM/DD/YY format
%T - gives the time as HH:MM:SS
%H - gives the hour from 00 to 23
%M - gives the minute from 00 to 59
%S - gives the second from 00 to 59
%m - gives the month of the year
%d - gives the day of the month
%y - gives the last two digits of the year

Example
$ date "+DATE IS %D TIME IS %T"

will give the output as: DATE IS 11/27/06 TIME IS 16:38:23

Example
$ date "+DAY %d MONTH %m YEAR %y"

will give the output as: DAY 27 MONTH 11 YEAR 06

The find command recursively searches the directory tree for each specified
path, seeking files that match a Boolean expression written using the terms
given in the text that follows the expression. The output from the find
command depends on the terms specified by the final parameter.

Note that the -print option is the default, so it is not required. This was not
always the case. In earlier versions of AIX and on other UNIX systems that
have not yet implemented the POSIX standard for the find command, the
-print option is required for the result to be displayed or used in a pipe.

The command following -exec, in this example ls, is executed for each file
name found.

find replaces the {} with the names of the files matched. It is used as a
placeholder for matches.

Note the use of the escaped ; to terminate the command that find is to execute.

The find command may also be used with a -ls option; that is, $ find . -name
'm*' -ls.

Note that the -exec option is non-interactive.

The \; must be given literally with the find command. It is required for use
with -exec and -ok.

It is a good idea to use the -ok option rather than -exec if there are not a lot of
files that match the search criteria. It is a lot safer if your pattern is not exactly
what you think it is.
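For example, a pair of commands of the kind described above (the name
patterns are only illustrations) might be:

$ find . -name 'm*' -exec ls -l {} \;
$ find . -name '*.tmp' -ok rm {} \;

The first runs ls -l on every file whose name begins with m; the second asks
for confirmation before removing each file ending in .tmp.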

The grep command searches for the pattern specified and writes each
matching line to standard output.

The search can be for simple text, like a string or a name. grep can also look
for logical constructs, called regular expressions, that use patterns and
wildcards to symbolize something special in the text, for example, only lines
that start with an uppercase T.

The command displays the name of the file containing the matched line, if
more than one file is specified for the search.
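For example (the file names are illustrative):

$ grep henry /etc/passwd
$ grep '^T' chapter1 chapter2

The first command prints every line of /etc/passwd containing henry; the
second prints, preceded by the file name, each line of chapter1 and chapter2
that starts with an uppercase T.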

On-Line Documentation:

The UNIX manual, usually called man pages, is available on-line to explain
the usage of the UNIX system and commands. To use a man page, type the
command "man" at the system prompt followed by the command for which
you need information.

Syntax
man [options] command_name

Common Options

-k keyword    list the command synopsis line for all keyword matches
-M path       path to man pages
-a            show all matching man pages (SVR4)

Backup using tar

Another program used to read and write files associated with an archive is tar.
Some of the available options are

-A Append files to an archive


-c Create a new archive
-f Name of archive
-P Keep absolute paths of files
-t List the files in an archive
-v Verbose mode
-x Extract files from an archive
-z Compress/decompress files using gzip
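For example, to create a gzip-compressed archive of a directory, list its
contents, and extract it again (the names are illustrative):

$ tar -cvzf backup.tar.gz mydir
$ tar -tzf backup.tar.gz
$ tar -xvzf backup.tar.gz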

gzip

This reduces the size of a file, thus freeing valuable disk space. For example,
type

% ls -l science.txt

and note the size of the file using ls -l . Then to compress science.txt, type

% gzip science.txt

This will compress the file and place it in a file called science.txt.gz

To see the change in size, type ls -l again.

To expand the file, use the gunzip command.

% gunzip science.txt.gz

nslookup

Nslookup is a program to query Internet domain name servers. Nslookup
has two modes: interactive and non-interactive. Interactive mode allows the
user to query name servers for information about various hosts and domains
or to print a list of hosts in a domain. Non-interactive mode is used to print just
the name and requested information for a host or domain.

nslookup host displays the domain name, IP address, and alias information
for the given host, e.g., nslookup www.kent.edu gives related data for
www.kent.edu.

The cut command

The cut command selects a list of columns or fields from one or more files.
Option -c is for columns and -f is for fields. It is entered as

cut options [files]

For example, if a file named testfile contains

this is firstline
this is secondline
this is thirdline

Examples:
cut -c1,4 testfile will print this to standard output (screen):
ts
ts
ts
It prints columns 1 and 4 of each line, which contain t and s (part of "this").

Options:

-c list   cut the column positions identified in list.
-f list   cut the fields identified in list (see the field example below).
-s        could be used with -f to suppress lines without delimiters.
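For instance, fields are selected with -f together with a field delimiter given by
-d (the file and field numbers here are just an illustration):

$ cut -d: -f1,6 /etc/passwd

This prints the login name and home directory, fields 1 and 6 of the
colon-separated /etc/passwd file.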

Awk and Sed

Awk is a programming language that can be applied to data-manipulation and
computing tasks on a UNIX operating system. Sed, a stream editor, acts like
a filter by executing a group of editing instructions for a text file.

Examples:

df -t | awk 'BEGIN {tot=0} $2 == "total" {tot=tot+$1} END {print (tot*512)/1000000}'

will give the total space in your system in megabytes.

Here the output of the command df -t is piped into awk, which sums field 1 of
every line whose second field is "total". In the same way, if you change $1 to
$4, it will accumulate and display the sum of field 4.

The sed command invokes a stream editor which you can use at the command
line.

You can also enter your sed commands in a file and then edit your text file
using the -f option. It works as

sed [options] files

options:

-e 'instruction'   Apply the editing instruction to the files.
-f script          Apply the set of instructions from the editing script.
-n                 Suppress default output.

For more information about sed, enter man sed at the command line on your
system.

Module 12
Arithmetic on Shell Variables

- A UNIX command called expr evaluates an expression given to it on the
  command line.
- Each operator and operand given to expr must be a separate argument.
- The usual arithmetic operators (+, -, *, /, %) are recognized by expr.
- Remember to use backslashes to protect the expression from the shell.
- expr only evaluates integer arithmetic expressions.
- Use the ':' operator with expr to match characters in the first operand
  against a regular expression given as the second argument; by default it
  returns the number of characters matched (see the example after this list).
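For instance (the string here is arbitrary):

expr abcdef : 'abc'

prints 3, because the regular expression matches the first three characters of
abcdef.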

Arithmetic is done with expr:

expr 5 + 7
expr 5 \* 7

A backslash is required in front of '*' since it is a filename wildcard and would
otherwise be translated by the shell into a list of file names.

You can save arithmetic result in a variable

Store the following in a file named arith.sh and execute it

#!/bin/sh
# Perform some arithmetic
x=24
y=4
Result=`expr $x \* $y`
echo "$x times $y is $Result"

read and echo Revisited


Let us consider the simple shell script given below:

#!/bin/sh
#Usage read echo

echo Enter the Values of a, b and c


read a b c
echo $a $b $c

Module 13
Functions
Like "real" programming languages, Bash has functions, though in a
somewhat limited implementation. A function is a subroutine, a code block
that implements a set of operations, a "black box" that performs a specified
task. Wherever there is repetitive code, when a task repeats with only slight
variations, then consider using a function.

function function_name {
command...
}

or

function_name () {
command...
}

This second form will cheer the hearts of C programmers (and is more
portable).

As in C, the function's opening bracket may optionally appear on the second
line.

function_name ()
{
command...
}

A function may be "compacted" into a single line.

fun () { echo "This is a function"; echo; }

In this case, however, a semicolon must follow the final command in the
function.

fun () { echo "This is a function"; echo } # Error!

Functions are called, triggered, simply by invoking their names.

Example: Simple functions

#!/bin/bash

JUST_A_SECOND=1

funky ()
{ # This is about as simple as functions get.
echo "This is a funky function."
echo "Now exiting funky function."
} # Function declaration must precede call.

fun ()
{ # A somewhat more complex function.
i=0
REPEATS=30

echo
echo "And now the fun really begins."
echo

sleep $JUST_A_SECOND # Hey, wait a second!


while [ $i -lt $REPEATS ]
do
echo "----------FUNCTIONS---------->"
echo "<------------ARE-------------"
echo "<------------FUN------------>"
echo
let "i+=1"
done
}

# Now, call the functions.

funky
fun
exit 0

Debugging
The Bash shell contains no debugger, nor even any debugging-specific
commands or constructs. Syntax errors or outright typos in the script generate
cryptic error messages that are often of no help in debugging a non-functional
script.

Example: A buggy script

#!/bin/bash
# ex74.sh

# This is a buggy script.


# Where, oh where is the error?

a=37

if [$a -gt 27 ]
then
echo $a
fi

exit 0

Output from script:

./ex74.sh: [37: command not found


What's wrong with the above script (hint: after the if)?
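For reference, the problem is the missing space after the opening bracket: [ is
itself a command, so the shell reads [$a as a single word, [37, and reports that
no such command exists. The corrected line is:

if [ $a -gt 27 ]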

Module 14
Sed:
Sed is the ultimate stream editor. If that sounds strange, picture a stream
flowing through a pipe. Okay, you can't see a stream if it's inside a pipe.
That's what I get for attempting a flowing analogy.

Anyhow, sed is a marvelous utility. Unfortunately, most people never learn its
real power. The language is very simple, but the documentation is terrible.
The Solaris on-line manual pages for sed are five pages long, and two of
those pages describe the 34 different errors you can get. A program that
spends as much space documenting the errors as it does documenting the
language has a serious learning curve.

Sed has several commands, but most people only learn the substitute
command: s. The substitute command changes all occurrences of the regular
expression into a new value. A simple example is changing "day" in the "old"
file to "night" in the "new" file:

sed s/day/night/ <old >new

I didn't put quotes around the argument because this example didn't need
them. If you read my earlier tutorial, you would understand why it doesn't need
quotes. If you have meta-characters in the command, quotes are necessary.
In any case, quoting is a good habit, and I will henceforth quote future
examples. That is:

sed 's/day/night/' <old >new

There are four parts to this substitute command:

s Substitute command
/../../ Delimiter
day Regular Expression Pattern String
night Replacement string

sed in shell script

If you have many commands and they won't fit neatly on one line, you can
break up the line using a backslash:

sed -e 's/a/A/g' \
    -e 's/e/E/g' \
    -e 's/i/I/g' \
    -e 's/o/O/g' \
    -e 's/u/U/g' <old >new

Sed is extremely powerful, and you can do things in sed that you can't do in
any standard word processor. And because sed is external to the word
processor and comes with every Unix system in the world, once you learn sed
you'll have a very handy tool in your toolkit, even if (like me) you rarely use
Unix.

How it works: You feed sed a script of editing commands (like, "change every
line that begins with a colon to such-and-such") and sed sends your revised
text to the screen. To save the revisions on disk, use the redirection arrow,
>newfile.txt. Sample syntax:

sed "one-or-two-sed-commands" input.file >newfile.txt


sed -f bigger_sed.script input.file >newfile.txt

awk:
Awk is a ``pattern scanning and processing language'' which is useful for
writing quick and dirty programs that don't have to be compiled. The calling
syntax of awk is like sed:

UNIX> awk program [ file ]


or

UNIX> awk -f program-file [ file ]

Like sed, awk can work on standard input or on a file. Like the shell, if you
start an awk program with

#!/bin/awk -f

then you can execute the program directly from the shell.

Most systems also have nawk, which stands for ``new awk.'' Nawk has many
more features than awk and is generally more useful. I am just going to cover
awk, but you should check out nawk too in your own time. Nawk has some
nice things like a random number generator, that awk doesn't have.

awk programs are composed of ``pattern-action'' statements of the form:

pattern { action }

What such a statement does is apply the action to all lines that match the
pattern. If there is no pattern, then it applies the action to all lines. If there is
no action, then the default action is to copy the line to standard output.
Patterns can be regular expressions enclosed in slashes (they can be more
than that, but for now, just assume that they are regular expressions).

So, for example, the program awkgrep works just like ``grep Jim''.

UNIX> cat awkgrep


#!/bin/awk -f

/Jim/
UNIX> cat input
Which of these lines doesn't belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX> awkgrep input
Jimmy Carter
UNIX> awkgrep < input
Jimmy Carter
UNIX>

Awk actions basically look like C programs. There are some big differences,
but for the most part, you can do most basic things that you can do in C.

Awk breaks up each line into fields, which are basically whitespace-separated
words. You can get at word i by specifying $i. The variable NF contains the
number of words on the line. The variable $0 is the line itself.

So, to print out the first and last words on each line, you can do:

UNIX> cat input


Which of these lines doesn't belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX> awk '{ print $1, $NF }' input
Which belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX>

An alternative awkgrep prints out $0 when it finds the pattern:
UNIX> cat awkgrep2
#!/bin/awk -f

/Jim/ { print $0 }
UNIX> awkgrep2 input
Jimmy Carter
UNIX>

Awk has a printf just like C. You don't have to use parentheses when you call
it (although you can if you'd like). Unlike print, printf will not print a newline if
you don't want it to. So, for example, awkrev reverses the lines of a file:

UNIX> cat awkrev
#!/bin/awk -f

{ for (i = NF; i > 0; i-- ) printf "%s ", $i
  printf "\n" }
UNIX> awkrev input
belong: doesn't lines these of Which

Clinton Bill
Bush George
Reagan Ronald
Carter Jimmy
Stallone Sylvester
UNIX>

A few things that you'll notice about awkrev: Actions can be multiline. You
don't need semicolons to separate lines like in C. However, you can specify
multiple commands on a line and separate them with semi-colons as in C.
And you can block commands with curly braces as in C. If you want a
command to span two lines (this often happens with complex printf
statements), you need to end the first line with a backslash.

Also, you'll notice that awkrev didn't declare the variable i. Awk just figured
out that it's an integer.

Type casting
Awk lets you convert variables from one type to another on the fly. For
example, to convert an integer to a string, you simply use it as a string. String
construction can be done with concatenation, which is often very convenient.
These principles are used in awkcast:
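The source of awkcast is not listed here; a minimal sketch that would produce
the run shown below (an assumption, not necessarily the original program) is:

#!/bin/awk -f

{ for (i = 1; i <= NF; i++) {
    printf "Word %d: as a number: %d, as a string: %s.\n", i, $i, $i
    n = $i "0"     # string concatenation appends a literal 0
    printf "0 appended: number: %d, string %s\n", n + 0, n
  }
}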

UNIX> echo "4 Jim" | awkcast


Word 1: as a number: 4, as a string: 4.
0 appended: number: 40, string 40
Word 2: as a number: 0, as a string: Jim.
0 appended: number: 0, string Jim0
UNIX>

Casting a string to an integer gives it its atoi() value.

BEGIN and END

There are two special patterns, BEGIN and END, which cause the
corresponding actions to be executed before and after any lines are
processed respectively. Therefore, the following program (awkwc) counts the
number of lines and words in the input file.

UNIX> cat awkwc


#!/bin/awk -f

BEGIN { nl = 0; nw = 0 }
{ nl++ ; nw += NF }
END { print "Lines:", nl, "words:", nw }
UNIX> awkwc awkwc
Lines: 5 words: 26
UNIX> wc awkwc
5 26 103 awkwc
UNIX>

next and exit

Awk tries to process each statement on each line. Unlike sed, there is no
``hold space.'' Instead, each statement is processed on the original version of
each line. Two special commands in awk are next and exit. Next specifies to
stop processing the current input line, and to go directly to the next one,
skipping all the rest of the statements. Exit specifies for awk to exit
immediately.

Here are some simple examples. awkpo prints out only the odd numbered
lines (note that this is an awkward way to do this, but it works):

UNIX> cat awkpo


#!/bin/awk -f

BEGIN { ln=0 }
{ ln++
if (ln%2 == 0) next
print $0
}

UNIX> cat -n input


1 Which of these lines doesn't belong:
2
3 Bill Clinton
4 George Bush
5 Ronald Reagan
6 Jimmy Carter
7 Sylvester Stallone

UNIX> cat -n input | awkpo


1 Which of these lines doesn't belong:
3 Bill Clinton
5 Ronald Reagan
7 Sylvester Stallone
UNIX>

awkptR prints out all lines until it reaches a line with a capital R:

UNIX> cat awkptR


#!/bin/awk -f

/R/ { exit }
{ print $0 }

UNIX> awkptR input


Which of these lines doesn't belong:

Bill Clinton
George Bush
UNIX>

Arrays

Arrays in awk are a little odd. First, you don't have to malloc() any storage --
just use it and there it is. Second, arrays can have any indices -- integers,
floating point numbers or strings. This is called ``associative'' indexing, and
can be very convenient. You cannot have multi-dimensional arrays or arrays
of arrays though. To simulate multidimensional arrays, you can just
concatenate the indices.

Take a look at awkgolf. This is typical of quick-and-dirty awk programs that
you sometimes write to look at data. This one processes golf scores. Suppose
you have some score files, as in the files usopen, masters, kemper and
memorial. These files first have the name of the tournament in all caps, and
then scores for a bunch of golfers. Suppose you'd like to see all the golfers
with scores for each tournament in a readable form. This is what awkgolf
does. Let's break it into its four parts.

The first part is the BEGIN line:

BEGIN { nt = 0 ; np = 0 }

This simply initializes two variables: nt is the number of tournaments, and np
is the number of players.

The next line looks a little cryptic:

/^[A-Z]*$/ { this = $0; tourn[nt] = $0 ; nt++; next }

This only works on lines that are all capital letters. These are the lines that
identify tournaments. On these lines, it does the following:

 Sets the this variable to be the tournament name.


 Puts the tournament's name into the tourn array.
 Increments nt variable.
 Skips the rest of the program and goes onto the next line.

The next part works on all lines that contain the pattern '--'. These are the
lines with golfers' scores:

/--/ { golfer = $1
for (i = 2; $i != "--" ; i++) golfer = golfer" "$i
if (isgolfer[golfer] != "yes") {
isgolfer[golfer] = "yes"
g[np] = golfer
np++;
}
score[golfer" "this] = $(i+1)
}
The first two lines of this action set the golfer variable to be the golfer's name.
Note that you can do string comparison in awk using standard boolean
operators, unlike in C where you would have to use strcmp().

The next 5 lines use awk's associative arrays: The array isgolfer is checked
to see if it contains the string ``yes'' under the golfer's name. If so, we have
processed this golfer before. If not, we set the golfer's entry in isgolfer to
``yes,'' set the np-th entry of the array g to be the golfer, and increment np.

Finally, we set the golfer's score for the tournament in the score array. Note
that we don't use double-indirection. Instead, we simply concatenate the
golfer's name and the tournament's name, and use that as the index for the
array.

The last part of the program does the final formatting:

END { printf("%-25s", " ");
      for (j = 0; j < nt; j++) printf("%9s", tourn[j])
      printf("\n")

      for (i = 0; i < np; i++) {
        printf("%-25s", g[i])
        for (j = 0; j < nt; j++) printf("%9s", score[g[i]" "tourn[j]])
        printf("\n")
      }
}
The first three lines print out 25 spaces, and then the names of the
tournaments as held in the tourn array. Then we loop through each golfer,
and print the golfer's name, padded to 25 characters, and then his score in
each tournament. Note that if the golfer didn't play in the tournament, that
entry of the tournament array will be the null string. This is quite convenient,
because we don't have to test for whether the golfer played the tournament --
we can just use awk's default values.

Ok, lets try awkgolf:

UNIX> awkgolf kemper # Note that the output is only sorted because it's
# sorted in the input file
KEMPER
Justin Leonard -10
Greg Norman -7
Nick Faldo -7
Nick Price -7
Loren Roberts -6
Jay Haas -5
Paul Stankowski -5
Lee Janzen -4
Phil Mickelson -4
Davis Love III -3
Tom Lehman 0
Vijay Singh 0
Kirk Triplett 1
Steve Jones 2
Mark O'Meara 5
Don Pooley missed
Ernie Els missed
Fred Couples missed
Hal Sutton missed
Jesper Parnevik missed
Scott McCarron missed
Steve Stricker missed
UNIX> cat masters usopen kemper memorial | awkgolf
MASTERS USOPEN KEMPER MEMORIAL
Tiger Woods 281 6 5
Tommy Tolles 283 2 -11
Tom Watson 284 16 0
Paul Stankowski 285 6 -5 -3
Fred Couples 286 13 missed
Davis Love III 286 5 -3 -7
Justin Leonard 286 9 -10 0
Steve Elkington 287 7
Tom Lehman 287 -2 0 -3
Ernie Els 288 -4 missed -1
Vijay Singh 288 21 0 -14
Jesper Parnevik 289 11 missed -4
Lee Westwood 291 6
Nick Price 291 6 -7
Lee Janzen 292 13 -4 -11
Jim Furyk 293 2 -12
Mark O'Meara 294 9 5 -2
Scott McCarron 294 3 missed missed
Scott Hoch 298 3 -11
Jumbo Ozaki 300 missed
Frank Nobilo 303 9 -10
Bob Tway missed 2 -7
Brad Faxon missed 17 2
David Duval missed 11 -5
Greg Norman missed missed -7 -12
Loren Roberts missed 4 -6
Nick Faldo missed 11 -7
Phil Mickelson missed 10 -4
Steve Jones missed 15 2 3
Steve Stricker missed 9 missed -1
Jay Haas 2 -5 -4
Billy Andrade 4 -7
Hal Sutton 6 missed -1
Kirk Triplett 1 -2
Don Pooley missed -4
UNIX>

File indirection

You can specify that the output of print and printf go to a file with indirection.
For example, to copy standard input to the file f1 you could do:

UNIX> awk '{print $0 > "f1"}' < input


UNIX> cat f1

Which of these lines doesn't belong:

Bill Clinton
George Bush
Ronald Reagan
Jimmy Carter
Sylvester Stallone
UNIX>

Awk without standard input

Sometimes you just want to write a program that doesn't use standard input.
To do this, you just write the whole program as a BEGIN statement, exiting at
the end.
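A trivial sketch of this style (the output text is arbitrary):

#!/bin/awk -f

BEGIN { for (i = 1; i <= 3; i++) print "line", i
        exit }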

Multiline awk programs in the Bourne shell

The Bourne shell lets you define multiline strings simply by putting newlines in
the string (within single or double quotes, of course). This means that you can
embed simple multiline awk scripts in a sh program without having to use
cumbersome backslashes, or intermediate files. For example, shwc works
just like awkwc, but works as a shell script rather than an awk program.
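The listing of shwc is not shown here; a minimal sketch of how such a script
could embed the awkwc program in a single quoted multiline string (an
assumption, not the original) is:

#!/bin/sh
if [ $# -gt 1 ]
then
echo "usage: shwc [ file ]"
exit 1
fi
awk 'BEGIN { nl = 0; nw = 0 }
{ nl++ ; nw += NF }
END { print "Lines:", nl, "words:", nw }' "$@"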

UNIX> shwc awkwc


Lines: 5 words: 26
UNIX> shwc < awkwc
Lines: 5 words: 26
UNIX> shwc awkwc awkwc
usage: shwc [ file ]
UNIX>

Awk's limitations

Awk is useful for simple data processing. It is not useful when things get more
complex for a few reasons. First, if your data file is huge, you'll do better to
write a C program (using for example the fields library from CS302/360)
because it will be more efficient sometimes by a factor of 60 or more. Second,
once you start writing procedure calls in awk, it seems to me you may as well
be writing C code. Third, you often find awk's lack of double indirection and
string processing cumbersome and inefficient.

Awk is not a good language for string processing. Irritatingly, it doesn't let you
get at string elements with array operations. I.e. the following will fail:

UNIX> cat sp.awk


{ s = $1 ; s[0] = 'a' ; print s }
UNIX> awk -f sp.awk input
awk: syntax error near line 1
awk: illegal statement near line 1
UNIX>
Of course, sed is ideal for string processing, so often you can get what you
want with a combination of sed and awk.

Module 15
Database Using Shell Scripts
There are one or two facts about databases. If you know anything at all about
databases you'll know everything that follows.

1. A database consists of one (or more) tables which consist of a
sequence of identically structured rows or records. The rows (records)
are subdivided into fields or columns. A schema is a table that
describes a table or tables.
2. The data in a database is manipulated (updated, queried etc.,) using
commands written in SQL (Structured Query Language). Many people
seem to associate SQL with one particular database package, this is
wrong, all well known database packages (Oracle, MySQL, MS
Access, Postgres, MS SQL Server etc.,) support SQL although there
may be minor differences.
3. Most database packages operate in a client/server fashion. The
database server receives SQL requests via the net and returns results
via the net. The results of such queries will, in general, be sets of rows
or records. The database server is a permanently running programme
in principle similar to a WWW server. One exception to this rule is MS
Access which operates by direct manipulation of the host operating
system files that hold the database tables.
4. How databases actually store their tables, schemas etc., varies from
package to package and is, almost always, of no concern to the user.
For information MS Access stores all the tables and schemas of a
database in a single file whose name conventionally ends in the letters
".mdb". For each table MySQL maintains several Unix file system files,
typically one for the data, one for the schema and one for the index.
Oracle stores everything for all its databases in a group of 4-10 files
that are built on top of the local file system.

A Shell Script (CGI Backend)


#!/bin/sh
PATH=$PATH:/usr/local/mysql/bin
export PATH
echo "Content-type: text/html"
echo
PLACE=`echo $QUERY_STRING | cut -d= -f2`
echo "<html><head><title>Shell Example #3</title></head>"
echo "<body><h1>Shell Example #3</h1><p>Results of database query for"
echo $PLACE
echo "<p>"
echo "use mydatabase;" > /tmp/$$.sql
echo "select latitude,longitude,easting,northing from gazetteer where feature
= '$PLACE';" >> /tmp/$$.sql
mysql -u demo < /tmp/$$.sql > /tmp/$$.res
ROWS=`cat /tmp/$$.res | wc -l`
if [ $ROWS -eq 0 ]
then
echo "No information for" $PLACE
else
echo "<table border=2><tr>"
tail +2 /tmp/$$.res | sed -e 's/^/<tr><td>/
s/ /<td>/g'
echo "</table>"
fi
echo "</body></html>"
rm /tmp/$$.*

Actual database access is performed using the command line MySQL client
programme. To ensure that this can be found the search path is modified by
the second and third lines of the script.

PATH=$PATH:/usr/local/mysql/bin
export PATH

The name of the location being queried is then extracted from the
QUERY_STRING environment variable.

The MySQL command line client can be used non-interactively by arranging
for it to read SQL from its standard input, in this case using redirection from a
file. The required SQL is constructed in a temporary file.

On a normal Unix system any user can create files in the directory /tmp. The
symbol $$ in the file name is replaced by the current process identification
number; this is always unique, so it avoids any problems with two instances of
the back end running simultaneously.

Here is a typical example of the contents of the SQL file.

use mydatabase;
select latitude,longitude,easting,northing from gazetteer where feature =
'Prague';

The output from the MySQL client is also written to a temporary file. Typical
text is shown below (for a different query).

latitude   longitude   easting   northing
195180     -21240      145       487
190860     -8040       384       346
188820     -11160      325       284
197880     -5820       424       563

It will be noted that the output file includes column names and that columns
are separated by TAB characters.

The next step is to determine the number of lines in the output file; this will be
zero if no matches have been found. This is done by arranging for the
standard Unix command wc to read the file and write the number of lines to its
standard output.

The code

if [ $ROWS -eq 0 ]
then
echo "No information for" $PLACE
else
echo "<table border=2><tr>"
tail +2 /tmp/$$.res | sed -e 's/^/<tr><td>/
s/ /<td>/g'
echo "</table>"
fi

operates conditionally on the number of rows. The interesting case arises
when the number of rows is non-zero. In this case the standard Unix
command tail is used to transfer the file, less its first line, to the standard input
of the standard Unix command sed. sed is the Unix non-interactive editor
that is used here to modify the MySQL command line client output by

- Inserting <tr><td> at the start of every line. Remember that the
  metacharacter ^ matches the start of a line in the regular expressions
  used by all Unix editors.
- Replacing all occurrences of TAB characters by the string <td>. The
  final g on the sed sub-command ensures that the substitution is global.

Note that the sed edit script, introduced by the sed command line argument -e
spreads over two lines.

Simple File Creation:


There are two simple ways to create another file, one uses the cat command
in conjunction with the redirect symbol, the other way is to use the echo
command in conjunction with the redirect symbol. The example Indented Cat
is a good example of the cat method in the Pipes and Redirects section. This
example only contains literal text, however. It is more appropriate to see
something like the example below, which shows a variable being used in the
source data block.

Example cat and variables
cat >> $sql0 <<-EOA
SET ECHO OFF
SET FEEDBACK OFF
SET HEADING OFF
SELECT my_package.my_function($column)
FROM v\$database
WHERE name LIKE '%&1%';
EXIT
EOA
sqlplus -s $uid/$password@database @$sql0 $sql_arg_1 > $log0

The file created has its name stored in the variable $sql0 and as we can see
the block between the EOA flags is the data that goes into the file. The data
block is actually a segment of SQL*Plus statements, as indicated by the
filename variable. As is common with SQL*Plus code, the key words are
picked out in ALL CAPS, with objects (tables, procedures, columns, etc.) all in
lower case. The SELECT line contains a reference to a called, packaged,
PL/SQL function which has a column name as an argument. Here the column
name is held in a variable called $column and this will be substituted at script
run-time by the real value.

There are some unfortunate consequences of generating SQL*Plus
statements from within a shell script which you have to be aware of. Firstly,
don't forget to put the EXIT statement at the end of the block or you will end
up with a script that stays in SQL*Plus forever. Secondly, don't forget to put
the semi-colons (;) at the end of every SQL statement, or each statement will
overwrite the previous one or just create one long unprocessable mess.
Thirdly, some internal database tables may contain the dollar symbol, which is
special to the shell, so escape them with the back-slash (\) as shown on the
FROM line.

On the WHERE line there is a reference to a SQL*Plus positional parameter
'&1' which will pick up its value from the variable $sql_arg_1 at run-time as
shown in the last line, just after the end of block flag. Did I say this was a
simple example? Well, at least you don't have to worry about quoting when
using this method. All quotes find their way to the destination file unscathed.
Now to do the same thing using echo instead of cat, see the example below.

Example simple echo

echo "SET ECHO OFF" >> $sql0


echo "SET FEEDBACK OFF" >> $sql0
echo "SET HEADING OFF" >> $sql0
echo "SELECT my_package.my_function($column)" >> $sql0
echo " FROM v\$database" >> $sql0
echo " WHERE name LIKE '%&1%';" >> $sql0
echo "EXIT" >> $sql0
sqlplus -s $uid/$password@database @$sql0 $sql_arg_1 > $log0

Complex File Creation:

So what's the point of all this extra typing? Well for one thing it allows you to
put special bits of code into the block which will only be used at certain times,
by hiding them in complex command groups. This example shows how this is
done below.

Example complex echo forms

echo "SET ECHO OFF" >> $sql0


echo "SET FEEDBACK OFF" >> $sql0
echo "SET HEADING OFF" >> $sql0
echo "SELECT my_package.my_function($column)" >> $sql0
echo " FROM v\$database" >> $sql0
if [ "$db_type" = "m" ]
then
echo " WHERE name = '$db_name';" >> $sql0
else
echo " WHERE name LIKE '%&1%';" >> $sql0
fi
echo "EXIT" >> $sql0
sqlplus -s $uid/$password@database @$sql0 $sql_arg_1 > $log0

This is basically the same block except the WHERE clause has been hidden
inside an if statement. Now, depending on the Database Type in the $db_type
variable, the WHERE clause can take one of two forms. Conveniently, the
additional argument which is not required by SQL*Plus in the first form, is
ignored at execution time, even though it is still available on the last line. This
is common with all scripts, arguments are only used if they are referenced
from within the script.

So there you have the first two ways of creating another file from a script. The
version using cat can only cope with a single output form, the version using
echo can output a multitude of forms depending on the complex command
forms you use. The choice is yours. There are, however, other ways to create
output files. You can use direct generation as in the example List to create a
list of files. Or the indirect method shown in the example Counted List where
lines are built inside a loop construct and then appended to the file to create a
menu file. Or in the example Sorted List where a list of words is sorted into
alphabetic order, duplicates are removed, then the rest stored in a file.

Example list

ls -1 *.log > $lst0

Example counted list

count=1
for file in `ls -1 *.log`
do
echo "$count: $file" >> $mnu0
count=`expr $count + 1`
done

Example sorted list

echo $@ | tr ' ' '\n' | sort -u > $lst0

Module 16
OVERVIEW OF PERL

What is perl?
Perl, sometimes referred to as Practical Extraction and Reporting Language,
is an interpreted programming language with a huge number of uses,
libraries and resources. Arguably one of the most discussed and used
languages on the internet, it is often referred to as the swiss army knife, or
duct tape, of the web.

Perl was first brought into being by Larry Wall circa 1987 as a general
purpose Unix scripting language to make his programming work simpler.
Although it has far surpassed his original creation, Larry Wall still oversees
development of the core language, and the newest version, Perl 6.

Running Perl

The simplest way to run a Perl program is to invoke the Perl interpreter with
the name of the Perl program as an argument:

perl sample.pl

The name of the Perl file is sample.pl, and perl is the name of the Perl
interpreter. This example assumes that Perl is in the execution path; if not,
you will need to supply the full path to Perl too:

/usr/local/bin/perl sample.pl

This is the preferred way of invoking Perl because it eliminates the possibility
that you might accidentally invoke a copy of Perl other than the one you
intended. We will use the full path from now on to avoid any confusion.

This type of invocation is the same on all systems with a command-line
interface. The following line will do the trick on Windows NT, for example:

c:\NTperl\perl sample.pl

Invoking Perl on UNIX

UNIX systems have another way to invoke an interpreter on a script file. Place
a line like

#!/usr/local/bin/perl

at the start of the Perl file. This tells UNIX that the rest of this script file is to be
interpreted by /usr/local/bin/perl. Then make the script itself executable:

chmod +x sample.pl

You can then "execute" the script file directly and let the script file tell the
operating system what interpreter to use while running it.

You can supply Perl command-line arguments on the interpreter invocation
line in UNIX scripts. The following line is a good start to any Perl script:

#!/usr/local/bin/perl -w -t

A Perl Script

A Perl program consists of an ordinary text file containing a series of Perl
commands. Commands are written in what looks like a bastardized amalgam
of C, shell script, and English. In fact, that's pretty much what it is.

Perl code can be quite free-flowing. The broad syntactic rules governing
where a statement starts and ends are:

- Leading white space is ignored. You can start a Perl statement anywhere
  you want: at the beginning of the line, indented for clarity (recommended),
  or even right-justified (definitely frowned on) if you like.
- Commands are terminated with a semicolon.
- White space outside of string literals is irrelevant; one space is as good as
  a hundred. That means you can split statements over several lines for
  clarity.
- Anything after a pound sign (#) is ignored. Use this to pepper your code
  with useful comments.

Here's a Perl statement:

print "My name is Sreedhar\n";

No prizes for guessing what happens when Perl runs this code; it prints

My name is Sreedhar

If the \n doesn't look familiar, don't worry; it simply means that Perl should
print a newline character after the text; in other words, Perl should go to the
start of the next line.

Printing more text is a matter of either stringing together statements or giving
multiple arguments to the print function:

print "My name is Sreedhar,\n";


print "I live in Bangalore,\n",
"I work in a Wipro there.\n";

That's right, print is a function. It may not look like it in any of the examples so
far, where there are no parentheses to delimit the function arguments, but it is
a function, and it takes arguments. You can use parentheses in Perl functions
if you like; it sometimes helps to make an argument list clearer. More
accurately, in this example the function takes a single argument consisting of
an arbitrarily long list. We'll have much more to say about lists and arrays
later, in the "Data Types" section. There will be a few more examples of the
more common functions in the remainder of this chapter, but refer to the
"Functions" chapter for a complete run-down on all of Perl's built-in functions.

So what does a complete Perl program look like? Here's a trivial UNIX
example, complete with the invocation line at the top and a few comments:

#!/usr/local/bin/perl -w          # Show warnings

print "My name is Sreedhar,\n";   # Let's introduce ourselves
print "I live in Bangalore,\n",
      "I work in a Wipro there.\n";   # Remember the line breaks

That's not at all typical of a Perl program though; it's just a linear sequence of
commands with no structural complexity. The "Flow Control" section later in
this overview introduces some of the constructs that make Perl what it is. For
now, we'll stick to simple examples like the preceding for the sake of clarity.

Exercise:

1. Write a shell script to modify all files in a directory.


2. Create a shell script that presents a user screen, which will
allow the user to enter data into a file, delete a record, add a record,
and also allow updating or querying the file.

Appendix A
List of basic UNIX Commands:

The basic UNIX commands include some of the most commonly used commands for
users, and constructs for building shell scripts.

The following charts offer a summary of some simple UNIX commands. These are
certainly not all of the commands available in this robust operating system, but these
will help you get started.

Ten ESSENTIAL UNIX Commands:


These are ten commands that you really need to know in order to get started with
UNIX. They are probably similar to commands you already know for another
operating system.
Command     Example                  Description

1. ls       ls                       Lists files in current directory
            ls -alF                  List in long format
2. cd       cd tempdir               Change directory to tempdir
            cd ..                    Move back one directory
            cd ~dhyatt/web-docs      Move into dhyatt's web-docs directory
3. mkdir    mkdir graphics           Make a directory called graphics
4. rmdir    rmdir emptydir           Remove directory (must be empty)
5. cp       cp file1 web-docs        Copy file into directory
            cp file1 file1.bak       Make backup of file1
6. rm       rm file1.bak             Remove or delete file
            rm *.tmp                 Remove all .tmp files
7. mv       mv old.html new.html     Move or rename files
8. more     more index.html          Look at file, one page at a time
9. lpr      lpr index.html           Send file to printer
10. man     man ls                   Online manual (help)

Ten VALUABLE UNIX Commands:

Once you have mastered the basic UNIX commands, these will be quite valuable in
managing your own account.

Command                   Example                          Description

1. grep <str> <files>     grep "bad word" *                Find which files contain a certain word
2. chmod <opt> <file>     chmod 644 *.html                 Change file permissions to read only
                          chmod 755 file.exe               Change file permissions to executable
3. passwd                 passwd                           Change password
4. ps <opt>               ps aux                           List all running processes by #ID
                          ps aux | grep dhyatt             List process #ID's running by dhyatt
5. kill <opt> <ID>        kill -9 8453                     Kill process with ID #8453
6. gcc (g++) <source>     gcc file.c -o file               Compile a program written in C
                          g++ fil2.cpp -o fil2             Compile a program written in C++
7. gzip <file>            gzip bigfile                     Compress file
                          gunzip bigfile.gz                Uncompress file
8. mail (pine)            mail me@tjhsst.edu < file1       Send file1 by email to someone
                          pine                             Read mail using pine
9. telnet <host>          telnet vortex.tjhsst.edu         Open a connection to vortex
   ssh <host>             ssh -l dhyatt jazz.tjhsst.edu    Open a secure connection to jazz as user dhyatt
10. ftp <host>            ftp station1.tjhsst.edu          Upload or download files to station1
    ncftp <host/directory> ncftp metalab.unc.edu           Connect to archives at UNC

Ten FUN UNIX Commands:

These are ten commands that you might find interesting or amusing. They are
actually quite helpful at times, and should not be considered idle entertainment.

Command                 Example                 Description

1. who                  who                     Lists who is logged on your machine
2. finger               finger                  Lists who is on computers in the lab
3. ytalk <user@place>   ytalk dhyatt@threat     Talk online with dhyatt who is on threat
4. history              history                 Lists commands you've done recently
5. fortune              fortune                 Print random humorous message
6. date                 date                    Print out current date
7. cal <mo> <yr>        cal 9 2000              Print calendar for September 2000
8. xeyes                xeyes &                 Keep track of cursor (in "background")
9. xcalc                xcalc &                 Calculator ("background" process)
10. mpage <opt> <file>  mpage -8 file1 | lpr    Print 8 pages on a single sheet and send
                                                to printer (the font will be small!)

Ten HELPFUL UNIX Commands

These ten commands are very helpful, especially with graphics and word processing
type applications.

Command             Example               Description

1. netscape         netscape &            Run Netscape browser
2. xv               xv &                  Run graphics file converter
3. xfig / xpaint    xfig & (xpaint &)     Run drawing program
4. gimp             gimp &                Run photoshop-type program
5. ispell <fname>   ispell file1          Spell check file1
6. latex <fname>    latex file.tex        Run LaTeX, a scientific document tool
7. xemacs / pico    xemacs (or pico)      Different editors
8. soffice          soffice &             Run StarOffice, a full word processor
9. m-tools (mdir,   mdir a:               DOS commands from UNIX (dir A:)
   mcopy, mdel,     mcopy file1 a:        Copy file1 to A:
   mformat, etc.)
10. gnuplot         gnuplot               Plot data graphically

Ten USEFUL UNIX Commands:

These ten commands are useful for monitoring system access, or simplifying your
own environment.

Command            Example                     Description

1. df              df                          See how much free disk space is available
2. du              du -b subdir                Estimate disk usage of directory in bytes
3. alias           alias lls="ls -alF"         Create new command "lls" for long format of ls
4. xhost           xhost + threat.tjhsst.edu   Permit window to display from x-window
                                               program from threat
                   xhost -                     Allow no x-window access from other systems
5. fold            fold -s file1 | lpr         Break long lines at spaces and send to printer
6. tar             tar -cf subdir.tar subdir   Create an archive called subdir.tar of a directory
                   tar -xvf subdir.tar         Extract files from an archive file
7. ghostview (gv)  gv filename.ps              View a PostScript file
8. ping            ping threat.tjhsst.edu      See if machine is alive
   (traceroute)    traceroute www.yahoo.com    Print data path to a machine
9. top             top                         Print system usage and top resource hogs
10. logout (exit)  logout or exit              How to quit a UNIX shell
