
IT-INFRASTRUCTURE

Lecture Notes 1

Shell and System Programs



Contents
1 The Shell
1.1 Functionality of the Shell
1.2 The system program that implements the shell
1.2.1 Shell Variables
1.2.2 Expansion and Quoting

2 System Programs
2.1 File and directory management
2.1.1 ls
2.1.2 cp
2.1.3 mv
2.1.4 rm
2.1.5 mkdir
2.1.6 cat
2.1.7 chmod
2.1.8 find

3 Filter
3.1 Redirection
3.2 Common Filters
3.2.1 sort
3.2.2 head
3.2.3 tail
3.2.4 grep
3.3 Pipelines

4 Regular Expressions
4.1 sed

5 Shell Programming
5.1 Conditional Execution/Branches
5.2 Loops

A Questions to verify the learning success

Author: Dipl. Ing. T. Zeitlhofer Page 1/24



1 The Shell
The shell is a so-called command interpreter. The user enters commands via the keyboard
and the shell executes them. Commands are either internal functions of the shell (i.e., these
commands are directly interpreted by the shell) or regular programs (i.e., the “command” is
the name of a regular program which is executed by the shell).

1.1 Functionality of the Shell

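Figure 1: Functionality of the shell when executing a program. The shell (parent process,
PID=100) issues the system call fork (1), which creates a child process (PID=200) that still
runs the shell program (2). The child issues the system call exec (3), which loads the new
program COMMAND into the child process (4). Both system calls cross from user space into
kernel space.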

Fig. 1 shows the basic operation of the shell when executing a command (COMMAND).
We assume that the process id (PID)[1] of the shell process is 100. Four basic steps
are necessary to execute the command:

1. At first, the system call fork is called by the shell process.

2. The system call fork creates a new process that is an exact copy
of the calling process. Let’s assume that the new process gets PID=200. Then, the
process with PID=200 is a child process of the shell process (PID=100). The program
(i.e., the set of instructions) that is executed by the child process (PID=200) is still
the program that implements the shell.

3. The child process (PID=200) invokes the system call exec.


4. The system call exec replaces the program that is executed within an
existing process. I.e., a new program (COMMAND) is loaded, initialized, and then
executed within the context of the existing process. In this example, the process with
PID=200 now executes the instructions of COMMAND.

[1] In the context of operating systems (OSs), the term process denotes a loaded (and possibly running)
program plus its current status. The status contains the current values of variables, registers, assigned
memory ranges, ... (i.e., information about all resources used). The process id is a numerical identifier used
by the OS to uniquely identify each process.

As a result, the command is executed within a child process of the shell. During execution, the
child process is typically in control of input (e.g., reading from the keyboard) and output
(writing to the terminal). When the child process has finished, the shell is in control of
input/output again and waits for new commands.
Example 1. From the perspective of the user, executing the command ls (lists all files in
the current directory) works as follows:
user@host$ ls
file1 file2
user@host$

In the first line, the user enters ls and hits the ENTER key. The second line shows the
output of the command (here, two files are found in the current directory). When the third
line is shown, the command ls has finished and the shell waits for new user input.
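
The relationship between the shell and its child process can be observed from within a script. The following is a minimal sketch, assuming that a program named sh is installed; the variable names and the temporary file are illustrative only. The shell records its own PID, starts sh as a child (fork followed by exec), and lets the child report its own PID and the PID of its parent.

```shell
# $$ expands to the PID of the shell that runs this script.
shell_pid=$$

# Temporary file that receives the child's report (illustrative helper).
report=$(mktemp)

# The shell forks a child process and the child execs the program "sh".
# Inside the child, $$ is the child's PID and $PPID is the PID of its parent.
sh -c 'echo "$$ $PPID"' >"$report"

# Split the report into its two fields.
read -r child_pid parent_pid <"$report"
rm -f "$report"

echo "shell: PID=$shell_pid"
echo "child: PID=$child_pid, parent PID=$parent_pid"
```

When run as a script, the child PID differs from the PID of the shell, and the reported parent PID normally equals the PID of the shell.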


1.2 The system program that implements the shell


The shell itself is implemented by a regular system program. Therefore, it is possible to use
different implementations. Various shells exist; one of the first implementations was
the Bourne[2] shell (sh). Other implementations like ksh, zsh, and bash offer an extended set
of features. The name of the latter is an abbreviation for Bourne-again shell. The bash
shell is the standard shell for most GNU/Linux distributions and was for many years also the
default shell of Mac OS X (macOS has used zsh as its default shell since version 10.15). In
the following discussion, we always assume that the bash shell is used.
When the shell starts, it processes some configuration files that allow certain parameters
of the shell to be configured.
Example 2. Whenever the shell waits for user input, a configurable character string, the
prompt, is printed at the beginning of the line. Typically, the prompt shows the name of
the current user and possibly the host name of the current computer. So, the prompt for
the user “jdoe” at the computer with host name “compu” would typically be
jdoe@compu$

Here, the shell waits for user input. 

1.2.1 Shell Variables

As in programming languages, the shell supports the usage of variables. Several properties
of the shell may be controlled via predefined shell variables.
Values are assigned to variables via the command VARIABLE=VALUE. If not otherwise specified,
values are character strings. In contrast to programming languages like C, variables do not
have to be declared before use.
[2] Named after its author, Stephen Bourne.


Example 3. The shell variable PATH specifies the directories where the shell searches for
executable programs. The standard directories are /usr/bin and /bin. To configure the
shell to look into these directories, the PATH variable is set like this:
jdoe@compu$ PATH="/usr/bin:/bin"

The names of the directories are separated by colons (i.e., it is also possible to specify more
than two directories). 
Note that in variable assignments like the one above, there must not be any blank before or after the
assignment operator (=).
To show the current value of a variable, the command echo (which writes character strings
to the terminal) may be used. When referencing (in contrast to defining) a variable, the
character “$” has to be prepended to the variable’s name.
Example 4. The current value of the PATH variable is shown with:
jdoe@compu$ echo $PATH
/usr/bin:/bin


The prompt may be configured via the shell variable PS1. The character string that is
assigned to this variable may contain backslash-escaped special characters like \u which is
replaced by the shell with the name of the current user or \h which is replaced by the host
name. So, the prompt as shown above may be configured by:
jdoe@compu$ PS1="\u@\h$ "
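
Assignment and reference can be tried out together in a short script. This is a minimal sketch; the variable names and the directory /opt/example/bin are hypothetical and only serve as an illustration. The last step uses a parameter expansion (${name##pattern}) that is not covered in these notes; it removes everything up to the last colon.

```shell
# Assignment: no blanks around "="; the value is a character string.
course="IT-Infrastructure"

# Reference: prepend "$" to the variable's name.
echo "$course"

# Build a longer search path from the current one. The directory
# /opt/example/bin is hypothetical and only serves as an illustration.
new_path="$PATH:/opt/example/bin"

# Show the last entry of the colon-separated list.
last_entry=${new_path##*:}
echo "$last_entry"
```

The sketch stores the extended path in a separate variable so that the search path of the running shell stays unchanged.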

1.2.2 Expansion and Quoting

The shell may interpret certain characters or character strings in a special way which is
called expansion. One example is the usage of shell variables as discussed above. E.g.:
jdoe@compu$ a=123; echo $a
123

Here, the first command assigns the value “123” to the shell variable a. In the
second command[3], the shell expands the string “$a”, i.e., it replaces the string with the value
of the variable a. So, the output of the echo command is “123” (and not “$a”). This is the
so-called variable expansion.
With pathname expansion, several characters are treated specially:

* matches any character string (even an empty character string)

? matches any single character (but not an empty string)

[...] matches any single character in the set that is given inside the square brackets
[3] Multiple commands may be separated by semicolons. In that case, the shell executes each command in
turn.


The following examples show the usage of pathname expansion:


jdoe@compu$ ls
f1.c f2.c f2.h f.c
jdoe@compu$ ls *.c
f1.c f2.c f.c
jdoe@compu$ ls f?.c
f1.c f2.c
jdoe@compu$ ls f2.[ch]
f2.c f2.h
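
The same patterns can be checked in a script without touching existing files. The sketch below works in a scratch directory created with mktemp; all variable names are illustrative. It relies on the fact that the shell performs the expansion before the command runs, so the command (here the builtin set) only sees the resulting list of names.

```shell
# Work in a scratch directory so that no existing files are touched.
work=$(mktemp -d)
cd "$work" || exit 1
touch f1.c f2.c f2.h f.c

# The shell replaces each pattern with the list of matching names before
# the command runs. "set --" stores that list in the positional
# parameters, so $# is the number of matches and $* holds the names.
set -- *.c
c_count=$#
echo "*.c     -> $*"

set -- f?.c
q_count=$#
echo "f?.c    -> $*"

set -- f2.[ch]
b_count=$#
echo "f2.[ch] -> $*"

# Quoting suppresses the expansion: the pattern is passed on literally.
set -- "*.c"
quoted=$1
echo "quoted  -> $quoted"

# Return to the previous directory and clean up.
cd "$OLDPWD" || exit 1
rm -r "$work"
```

The order in which matching names are listed depends on the locale, which is why the sketch counts the matches instead of comparing the complete list.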

Sometimes it is necessary to suppress the expansion of special character strings by the shell.
This is achieved by quoting. Three different quoting mechanisms can be distinguished:

Escape Character. The backslash “\” is the general escape character. If a backslash is
put in front of a special character then this character loses its special meaning, e.g.:
jdoe@compu$ a=123; echo \$a
$a

Here, the shell does not expand the string “$a”, because of the backslash in front of
“$”. I.e., “$” has lost its special meaning and the output of the echo command is “$a”
(and not the value of the variable a). The backslash may also be used to escape a
backslash, i.e., “\\” represents a single backslash:
jdoe@compu$ a=123; echo \\\$a
\$a
jdoe@compu$

Note, the first backslash is used to escape the second backslash and the third backslash
is used to escape “$”.
Single Quotes. If a character string is enclosed in single quotes, i.e. '...', all characters
lose their special meaning, e.g.:
jdoe@compu$ a=123; echo '$a'
$a
jdoe@compu$

Note, it is not possible to put a single quote inside single quotes.


Double Quotes. If a character string is enclosed in double quotes, i.e. "...", then all
characters except $, `, !, and \ lose their special meaning. The backtick (or backquote)
character ` is used for command substitution (will be discussed later) and ! is used
for history expansion, e.g., entering !! executes the most recent command
again.
jdoe@compu$ a=123; echo "$a"
123
jdoe@compu$ ls *.c
f1.c f2.c f.c
jdoe@compu$ ls "*.c"
ls: *.c: No such file or directory

Note, in the commands above, “$” does not lose its special meaning inside double
quotes, but “*” loses its special meaning inside double quotes (the error message
shows that ls tried to list a file literally named *.c).
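
The three quoting mechanisms can be compared side by side. The following is a minimal sketch with illustrative variable names; each line captures the output of one echo command so that the results can be printed together. The last part shows a common workaround for the restriction that a single quote cannot appear inside single quotes.

```shell
a=123

unquoted=$(echo $a)      # variable expansion takes place
escaped=$(echo \$a)      # the backslash removes the special meaning of $
single=$(echo '$a')      # single quotes suppress all expansions
double=$(echo "$a")      # double quotes still allow variable expansion

printf '%s\n' "$unquoted" "$escaped" "$single" "$double"

# Workaround for a single quote within a single-quoted string: close the
# quoted string, add an escaped quote, and open a new quoted string.
mixed='it'\''s'
echo "$mixed"
```

The script prints 123, $a, $a, and 123, followed by the string it's.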


2 System Programs
In addition to the shell, various system programs are available with GNU/Linux and other
UNIX-like systems. The “philosophy” is to have specialized programs that are optimized for
single tasks. Then, more complex tasks may be solved by combining multiple programs in a
proper way.
The system programs can be classified according to different types of tasks:

• File and directory management
• Filter
• Software development (compiler, editor)
• Working with text files
• System administration
• Miscellaneous

Typically, the program behavior is controlled via options that are passed to the program on
the command line. The list of supported options may be quite long, so reference
documentation is available (so-called manual pages, or man pages for short). This documentation
provides brief information about the program functionality and all supported options. It is
accessed via the system program man. E.g., the extensive man page for bash is accessed
like this:
jdoe@compu$ man bash

Analogously, the man pages for other programs can be accessed by “man <program name>”.
In the following, several common system programs are presented. Only the basic
functionality is covered; more details are found in the corresponding man pages.

2.1 File and directory management


2.1.1 ls

As already presented above, the command ls lists directory contents. When the current
directory contains two files (file1, file2) and one directory (dir1) then we get the following
output:
jdoe@compu$ ls
dir1 file1 file2
jdoe@compu$

Often, more detailed information (not just the file name) is required. For this, ls supports
the option -l (long listing):
jdoe@compu$ ls -l
drwxr-xr-x 2 jdoe users 4096 2011-01-01 16:28 dir1
-rw-r--r-- 1 jdoe users 30 2011-01-01 16:24 file1
-rw-r--r-- 1 jdoe users 30 2011-01-01 16:24 file2
jdoe@compu$


For each file and directory, the output contains a line with eight fields. The meaning of these
fields is shown in Fig. 2.

drwxr-xr-x 2 jdoe users 4096 2011-01-01 16:28 dir1

drwxr-xr-x   access rights (permissions)
2            number of hard links[4] (for a directory, this is typically 2 plus the number of its subdirectories)
jdoe         associated user (owner)
users        associated user group
4096         size in bytes
2011-01-01   date of last modification
16:28        time of last modification
dir1         file/directory name

Figure 2: Output of ls -l explained.

At this point, we take a closer look at file permissions on UNIX-like systems. Permissions
may be granted for reading (r), writing (w), and executing[5] (x). These permissions can be
assigned separately to the owner, to the associated group, and to all other
users. Fig. 3 explains the meaning of the permissions as found in the output of ls -l.

d rwx r-x r-x

d     indicates special kinds of files, e.g. “d” indicates directories
rwx   permissions for the owner
r-x   (first) permissions for the associated group
r-x   (second) permissions for all other users

Figure 3: File Permissions

2.1.2 cp

The command cp (copy) is used to copy files. So, the following call of cp
jdoe@compu$ ls -l
-rw-r--r-- 1 jdoe users 30 2011-01-01 16:24 file1
jdoe@compu$ cp file1 file2
jdoe@compu$ ls -l
-rw-r--r-- 1 jdoe users 30 2011-01-01 16:24 file1
-rw-r--r-- 1 jdoe users 30 2011-01-01 16:24 file2
jdoe@compu$

creates a new file named “file2” with the same content as “file1”.
[4] A hard link is an additional name for one and the same file. Hard links may be placed into different
directories, but only within the same file system.
[5] For directories, the “execute” permission means that changing into the directory is allowed.


2.1.3 mv

The command mv (move) is used to rename/move files. In contrast to cp, the original file is
removed:
jdoe@compu$ ls -l
-rw-r--r-- 1 jdoe users 30 2011-01-01 16:24 file1
jdoe@compu$ mv file1 file2
jdoe@compu$ ls -l
-rw-r--r-- 1 jdoe users 30 2011-01-01 16:24 file2
jdoe@compu$

2.1.4 rm

The command rm (remove) is used to remove files or directories. The following invocation
of rm removes the file named “file1”:
jdoe@compu$ ls -l
-rw-r--r-- 1 jdoe users 30 2011-01-01 16:24 file1
jdoe@compu$ rm file1
jdoe@compu$ ls -l
jdoe@compu$

To remove a directory, the option -r has to be used, e.g.:


jdoe@compu$ ls -l
drwxr-xr-x 2 jdoe users 4096 2011-01-01 16:28 dir1
jdoe@compu$ rm -r dir1
jdoe@compu$ ls -l
jdoe@compu$

Care has to be taken with the option -r, as all files and subdirectories within the specified
directory are removed recursively. Note, all these system commands typically assume that
you know what you are doing, i.e., you are not prompted with questions like “are you sure?”
or similar.

2.1.5 mkdir

The command mkdir (make directories) is used to create new directories. The following
invocation of mkdir
jdoe@compu$ ls -l
drwxr-xr-x 2 jdoe users 4096 2011-01-01 16:28 dir1
jdoe@compu$ mkdir dir2
jdoe@compu$ ls -l
drwxr-xr-x 2 jdoe users 4096 2011-01-01 16:28 dir1
drwxr-xr-x 2 jdoe users 4096 2011-01-01 17:28 dir2
jdoe@compu$

creates a new directory named “dir2” within the current directory.


2.1.6 cat

The command cat (concatenate files) writes the concatenated contents of all given files to
the standard output. In the following example, each file contains one line of text in the form
“Content of File N ” where N is the number in the file name:
jdoe@compu$ cat file1 file2 file3
Content of File 1
Content of File 2
Content of File 3
jdoe@compu$
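
The commands presented so far can be combined into one short session. The following is a minimal sketch that works in a scratch directory created with mktemp, so no existing files are affected; the variable names are illustrative. The file is created with echo and output redirection (“>”), which is explained in Sec. 3.1.

```shell
# Work in a scratch directory so that no existing files are touched.
work=$(mktemp -d)

mkdir "$work/dir1"                         # create a directory
echo "Content of File 1" >"$work/file1"    # create a file with one line
cp "$work/file1" "$work/file2"             # copy: file1 and file2 exist
mv "$work/file2" "$work/dir1/file3"        # move/rename: file2 is gone

# Concatenate both files and capture what cat writes to STDOUT.
combined=$(cat "$work/file1" "$work/dir1/file3")
echo "$combined"

# List what is left in the scratch directory.
listing=$(ls "$work")
echo "$listing"

# Remove the scratch directory and everything below it.
rm -r "$work"
```

After the last command, neither the files nor the directories exist any longer, and rm did not ask for confirmation.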

2.1.7 chmod

The command chmod (change file mode bits) is used to change file permissions. There are
two possible ways to specify the permissions:

1. The permissions are specified by octal numbers. For this, the symbolic string rwx
is interpreted as a 3-bit binary number. The most significant bit (the leftmost bit)
represents the read permission (r) and the least significant bit (the rightmost bit) rep-
resents the execute permission (x). If the corresponding bit is set then the permission
is granted.
Example 5. The permissions to read and execute, but no write permission, (r-x) are
represented by the 3-bit binary number 101:

2^2 2^1 2^0
 1   0   1
 ↑   ↑   ↑
 r   w   x

The corresponding octal number is 1·4 + 0·2 + 1·1 = 5.

The permissions for the owner, the group, and all other users can be specified by three
consecutive octal numbers. In the following example, the permissions for the file named
“file1” are set to -rwxrw-r--:
jdoe@compu$ chmod 764 file1
jdoe@compu$ ls -l
-rwxrw-r-- 1 jdoe users 30 2011-01-01 16:24 file1
jdoe@compu$

2. The permissions may also be specified symbolically. This symbolic mode is documented
in the man page.
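
Both ways of specifying permissions can be compared on a scratch file. The following is a minimal sketch with illustrative variable names. It first computes the octal digits from the weights r=4, w=2, and x=1, and then sets permissions symbolically, where u, g, and o denote the owner, the group, and all other users. The permission string is read back as the first ten characters of the long listing.

```shell
# Each octal digit is the sum of the weights r=4, w=2, and x=1.
owner=$((4 + 2 + 1))   # rwx
group=$((4 + 2))       # rw-
other=4                # r--
mode="$owner$group$other"
echo "mode: $mode"

# Apply the mode to a scratch file and read back the permission string,
# i.e., the first ten characters of the long listing.
file=$(mktemp)
chmod "$mode" "$file"
perm_octal=$(ls -l "$file" | cut -c1-10)
echo "$perm_octal"

# Symbolic mode: "=" sets exactly the given permissions for u, g, and o.
chmod u=rw,g=r,o= "$file"
perm_symbolic=$(ls -l "$file" | cut -c1-10)
echo "$perm_symbolic"

rm -f "$file"
```

The octal mode 764 yields -rwxrw-r--, and the symbolic mode u=rw,g=r,o= yields -rw-r-----.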

2.1.8 find

The command find searches for files in a directory hierarchy. Various search criteria
can be specified via options. The following example shows how the user “jdoe” may
search for C files (extension .c) within their home directory (and subdirectories thereof):


jdoe@compu$ find ~/ -name "*.c"
/home/jdoe/test1.c
/home/jdoe/test2.c
jdoe@compu$

Note, the tilde (~) is replaced with the name of the user’s home directory (here, /home/jdoe)
by the shell (tilde expansion). Note also, “*.c” has been quoted by double quotes. In this
case, quoting is necessary[6]. Otherwise, the shell would expand the string “*.c” to the
names of all C files in the current directory (if there are any) before find is started.
The option -mtime n selects files with respect to the date of last modification.
The number n specifies a multiple of 24 hours. If this number is given
in the form -n then the search includes files that have been modified within the last
n × 24 hours. Consider two files test1.c and test2.c where test1.c has been modified
within the last 24 hours and test2.c has been modified the day before yesterday. Then,
the following call to find yields test1.c only:
jdoe@compu$ find ~/ -name "*.c" -mtime -1
/home/jdoe/test1.c
jdoe@compu$

The command find supports a lot of additional options and search criteria which are docu-
mented in the man page.
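
Both search criteria can be tried out on a small scratch hierarchy. The following is a minimal sketch; the file names and variable names are illustrative. To obtain an old file, the modification time is set explicitly with touch -t (format CCYYMMDDhhmm), which is not covered in these notes.

```shell
# Build a small scratch hierarchy.
work=$(mktemp -d)
mkdir "$work/sub"
touch "$work/new.c" "$work/sub/notes.txt"

# Set the modification time of old.c to 1 January 2000, 00:00.
touch -t 200001010000 "$work/sub/old.c"

# All C files in the hierarchy (the pattern is quoted, so find receives it
# unexpanded and applies it in every directory).
count_c=$(find "$work" -name '*.c' | wc -l)
echo "C files found: $count_c"

# Only C files that have been modified within the last 24 hours.
recent_c=$(find "$work" -name '*.c' -mtime -1)
echo "recently modified: $recent_c"

rm -r "$work"
```

The first search finds both C files, including the one in the subdirectory; the second search only reports new.c.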

3 Filter
Programs do not read from or write to terminals directly. Instead, three different pseudo files
are available for reading and writing (compare Fig. 4):

Standard Input (STDIN). Data may be read from STDIN.

Standard Output (STDOUT). Data may be written to STDOUT.

Standard Error (STDERR). Error messages are typically written to STDERR.

By default, all three are connected to the user’s terminal. I.e., when a program reads from STDIN
then keyboard input is read and when a program writes to STDOUT or STDERR then the
text is shown on the user’s terminal.

Definition 6. Programs that read from STDIN and write to STDOUT are called filters.

3.1 Redirection
The shell allows STDIN, STDOUT, and STDERR to be redirected from/to regular files
(redirection). The operator “<” is used to redirect the input stream (i.e., STDIN) and “>” is used
to redirect output streams (i.e., STDOUT, STDERR).
[6] Here, it is also possible to use another quoting mechanism like single quotes or the backslash.


Figure 4: Input/Output pseudo files (or streams) and their id numbers: COMMAND reads from
STDIN (0) and writes to STDOUT (1) and STDERR (2).

Example 7. Assume that the current directory contains a file named file1. We can
use ls to list that file:
jdoe@compu$ ls file1
file1

where ls writes the name of the file to STDOUT. Using output redirection:
jdoe@compu$ ls file1 > output

we do not get any visible output on the terminal. Instead, the output of ls has now been
redirected into the file named output[7]. We can verify this with the program cat, which is
the simplest kind of filter program: without parameters, it just writes to STDOUT what it reads from STDIN:
jdoe@compu$ cat < output
file1

Here, we used input redirection to connect STDIN of cat with the file output. That means,
cat effectively reads the content of the file output and writes that to STDOUT. So, the file
output contains the string “file1” which has been produced by ls before.
Note, cat also directly supports reading from files when a filename is provided:
jdoe@compu$ cat output
file1

The effect is the same as with the previous command. But it is worth noting that from the
viewpoint of cat these are two different situations. With the input redirection, cat is not
“aware” that it is reading from a file – it just reads from STDIN. In the last command, a
different code path is triggered in the application cat. Because a parameter is given, that
code path has to open the corresponding file for reading, i.e. cat is not reading from STDIN.

Example 8. As in Example 7, we assume a file named file1 in the current directory. In
addition, we assume that there are no other files here:
jdoe@compu$ ls
file1

When we try to list a nonexisting file:


jdoe@compu$ ls file2
ls: cannot access 'file2': No such file or directory

[7] With output redirection, the corresponding file is created if it did not exist before. Using “>”, the
file will be overwritten if it exists. It is also possible to use “>>” for output redirection. In that case, data
is appended to an already existing file.


we get an error message. This error message is written to STDERR. To redirect STDERR,
we have to prefix the redirection operator with STDERR’s id number (compare Fig. 4).
jdoe@compu$ ls file2 2> errors

Now, there is no visible output. Instead, the error message has been redirected into the file
named errors:
jdoe@compu$ cat errors
ls: cannot access 'file2': No such file or directory

It is also possible to redirect STDOUT and STDERR at the same time:


jdoe@compu$ ls file1 file2 >output 2>errors

Here, STDOUT is redirected into the file output while at the same time STDERR is redi-
rected into the file errors. 
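
The redirections from Examples 7 and 8 can be verified in a script. The following is a minimal sketch that works in a scratch directory; the variable names are illustrative. Since ls signals the missing file through a non-zero exit status, the sketch appends “|| true” so that the script continues in any case.

```shell
# Scratch directory with exactly one file.
work=$(mktemp -d)
touch "$work/file1"

# file2 does not exist, so ls writes one name to STDOUT and one error
# message to STDERR. Both streams are redirected into separate files.
( cd "$work" && ls file1 file2 >output 2>errors ) || true

out=$(cat <"$work/output")     # input redirection: cat reads from STDIN
err=$(cat "$work/errors")      # file name as parameter: cat opens the file

# ">>" appends to an existing file instead of overwriting it.
echo "appended line" >>"$work/output"
line_count=$(wc -l <"$work/output")

echo "STDOUT: $out"
echo "STDERR: $err"
echo "lines in output: $line_count"

rm -r "$work"
```

The file output contains only the name file1, the file errors contains the error message about file2, and after appending, output consists of two lines.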

3.2 Common Filters


3.2.1 sort

The program sort is a typical filter. If called without any additional parameters, sort reads
lines from STDIN until the end of the input is reached (on a terminal, the user signals this by
entering CTRL-D). Then, all the lines that have been
read are written in alphabetically sorted order to STDOUT.
Example 9. Consider the file named file1 that contains three lines. The first line contains
the character “c”, the second line contains the character “b”, and the third line contains the
character “a”, i.e.:
c
b
a

The following command writes all lines of file1 in alphabetically sorted order to STDOUT:
jdoe@compu$ sort <file1
a
b
c

Note, in the command above, sort reads from STDIN, but the shell redirects STDIN from
file1, i.e., sort reads all lines of file1 from STDIN.
If we want to write the sorted lines into a new file named file2, we may also redirect
STDOUT:
jdoe@compu$ sort <file1 >file2

After execution of the command above, file2 contains all lines of file1 in alphabetically
sorted order. Note, this command creates the new file file2. If file2 is an existing file
then it will be overwritten. It is also possible to append data to an existing file (i.e., the file
is not overwritten, but data is appended at the end) by using the operator “>>”.
In this example, the command sort has the characteristics of a filter: data of file1 is read,
manipulated in a certain way (i.e., sorted), and the result is stored in file2. 


Example 10. Now, consider that file1 contains the following three lines:
c 2
b 1
a 10

Using sort, it is also possible to sort data according to certain columns. The column(s) may
be specified via the option -k pos where pos specifies the column:
jdoe@compu$ sort -k 2 < file1
b 1
a 10
c 2

This result shows all lines alphabetically sorted according to the second column. Note, most
likely this is not the intended result. The second column contains just numbers, so numerical
sorting would be more appropriate. For this, sort supports the option -n:
jdoe@compu$ sort -n -k 2 < file1
b 1
c 2
a 10

The command above uses input redirection to numerically sort all lines of file1 according
to the second column. 
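
The difference between alphabetical and numerical sorting can be reproduced without creating a file. The following is a minimal sketch that feeds the three lines of Example 10 to sort via a pipeline (see Sec. 3.3); the variable names are illustrative. The setting LC_ALL=C is an assumption made here to obtain the same alphabetical order on every system.

```shell
# The three lines of Example 10.
data=$(printf '%s\n' 'c 2' 'b 1' 'a 10')

# Sort whole lines alphabetically.
by_line=$(printf '%s\n' "$data" | LC_ALL=C sort)

# Sort alphabetically by the second column: "10" comes before "2".
alpha=$(printf '%s\n' "$data" | LC_ALL=C sort -k 2)

# Sort numerically by the second column: 1, 2, 10.
numeric=$(printf '%s\n' "$data" | LC_ALL=C sort -n -k 2)

printf 'whole lines:\n%s\n' "$by_line"
printf 'column 2, alphabetical:\n%s\n' "$alpha"
printf 'column 2, numerical:\n%s\n' "$numeric"
```

Only the numerical variant places the line “a 10” last.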

3.2.2 head

The command head writes the first n lines of a file to STDOUT. The number of
lines is specified via the option -n N; GNU head also accepts the short form -N (a dash directly
followed by the number), which is used below. The following command writes the first two lines
of file1 from Example 9 to STDOUT:
jdoe@compu$ head -2 file1
c
b

Note, with the command above, no redirection has been used. Instead, the name of the file
(file1) has been passed to head as an additional parameter. In that case, head (like many
other filter programs) does not read from STDIN, but opens file1 for reading. Nevertheless,
we would get the same result when redirecting STDIN from file1.

3.2.3 tail

This command is similar to head, but it shows the last n lines of a file. The following
command just writes the last line of file1 from Example 9 to STDOUT:
jdoe@compu$ tail -1 file1
a


3.2.4 grep

The program grep is a frequently used filter. It searches text files for lines that match
certain patterns. The following command writes all lines of file1 that contain the character
“b” to STDOUT:
jdoe@compu$ grep b file1
b

It is also possible to use patterns that are regular expressions (compare Sec. 4), which makes
grep quite powerful.
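
The filters head, tail, and grep can be tried out together on the file from Example 9. The following is a minimal sketch that recreates the file as a scratch file; the variable names are illustrative. It uses the form -n N for head and tail and reads the file once via a parameter and once via input redirection.

```shell
# Recreate file1 from Example 9 as a scratch file.
file=$(mktemp)
printf '%s\n' c b a >"$file"

first_two=$(head -n 2 "$file")     # first two lines: c, b
last_one=$(tail -n 1 "$file")      # last line: a
with_b=$(grep b "$file")           # lines containing "b"

# The same result is obtained when head reads from STDIN.
from_stdin=$(head -n 2 <"$file")

printf 'head:\n%s\n' "$first_two"
printf 'tail:\n%s\n' "$last_one"
printf 'grep:\n%s\n' "$with_b"

rm -f "$file"
```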

3.3 Pipelines
In addition to redirection to/from files, the shell also allows STDOUT and STDIN
of two different programs to be connected in the form of pipelines, compare Fig. 5.
Example 11. Consider the case that the first two lines of file1 in Example 9 should be
written in alphabetically sorted order to STDOUT. This could be achieved by:
jdoe@compu$ head -2 file1 >tmp
jdoe@compu$ sort tmp
b
c
jdoe@compu$ rm tmp

Note, here, we used a temporary file to store the first two lines of file1. Then, sort is used
to alphabetically sort the lines of the temporary file. A more efficient implementation avoids
the creation of a temporary file by using a pipeline (operator “|”) that connects STDOUT
of head to STDIN of sort:
jdoe@compu$ head -2 file1 | sort
b
c

Figure 5: The pipeline (operator “|”) connects STDOUT of one command (COMMAND 1) with
STDIN of a second command (COMMAND 2). STDERR of both commands is not passed through
the pipeline.
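
Pipelines are not limited to two commands. The following is a minimal sketch with illustrative variable names: the first pipeline repeats Example 11 without a temporary file, and the second one chains three filters and counts the remaining lines with wc -l (a program that is not covered in these notes).

```shell
# The three lines of file1 from Example 9.
lines=$(printf '%s\n' c b a)

# Example 11 without a temporary file: first two lines, sorted.
sorted_head=$(printf '%s\n' "$lines" | head -n 2 | sort)
printf '%s\n' "$sorted_head"

# A longer pipeline: keep lines containing b or c, sort them, count them.
count=$(printf '%s\n' "$lines" | grep '[bc]' | sort | wc -l)
echo "matching lines: $count"
```

Each command in the pipeline reads from STDIN what its predecessor writes to STDOUT; only the output of the last command is captured.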


4 Regular Expressions
Regular expressions are a powerful tool when it comes to pattern matching (typically used
with text files). By using special (“meta”) characters, it is possible to specify patterns that
are not matched literally but according to the special meaning of the characters used. Several
system programs support regular expressions, like grep (Sec. 3.2.4) and sed (Sec. 4.1).
Although most programs support a common subset of regular expressions, there may also
be differences that are documented in the corresponding man pages.
Common regular expressions are:

. A single dot matches any single character except a newline (ASCII: 0x0a)[8].

[...] Matches any one of the characters that are enclosed by square brackets. E.g., the regular
expression [ab] matches either the character a or the character b.
It is also possible to specify ranges by using a dash, e.g.: [a-z] matches any lower
case character.
Another possibility is a complementary character set, e.g.: [^ab] matches any character
except a or b. I.e., if the first character within the square brackets is a caret (^)
then the expression matches any single character that is not listed after the caret.
Sometimes it is necessary to include square brackets in the set. To avoid ambiguities
with the outer brackets, a closing square bracket that is part of the set has to directly
follow the outer opening bracket, e.g.: []a] matches the character a or the character ]. Note, the
expression [a]] matches an a that is followed by a closing square bracket, i.e., the set
only contains the character a.

^ The caret matches the beginning of a line. E.g., the expression ^A matches any capital
A, but only at the beginning of a line.

$ The dollar sign matches the end of a line. E.g., the expression a$ matches any lower
case a, but only at the end of a line.

(...) Several regular expressions may be grouped together by enclosing them in parentheses.
Note, depending on the application it may be necessary to escape the parentheses
with a backslash (i.e. \(...\)).

* The preceding (regular) expression may occur an arbitrary number of times, i.e., k times
in sequence where k ∈ {0, 1, 2, ...}. Note, it may also occur not at all (for k = 0).
E.g., the regular expression ab*c matches an a that is followed by an arbitrary number
of bs which are followed by the character c. Among others, this pattern matches the
strings: ac, abc, and abbbbbc.
[8] On Unix-like systems, the newline is represented by the ASCII linefeed character (hexadecimal: 0x0a,
decimal: 10). On MS-DOS or Microsoft Windows, the ASCII sequence carriage return (hexadecimal:
0x0d, decimal: 13) plus linefeed is used to represent newlines.


+ Similar to *, but for k ∈ {1, 2, ...}. I.e., the preceding element must occur at least once.
E.g., the pattern ab+c matches the strings abc and abbbbbc. However, it does not
match the string ac.

? Similar to * and +, but for k ∈ {0, 1}. I.e., the preceding element is only allowed to occur
once or not at all. E.g., the pattern ab?c matches the strings ac and abc. It does not
match the string abbbbbc.

| The bar specifies an or-relation between two patterns. I.e., it matches the preceding
expression or the succeeding expression. Depending on the application, it may be
necessary to escape the bar (|) with a backslash (i.e. \|).
Example 12. With grep, the pattern abc\|def matches either the string abc or the
string def.
jdoe@compu$ echo "abc" | grep 'abc\|def'
abc


Example 13. As documented in the man page, grep also supports an extended set of
regular expressions (option -E). In this case, the bar (|) is directly interpreted as a
regular expression:
jdoe@compu$ echo "abc" | grep -E 'abc|def'
abc

With regular expressions, the special meaning of characters may be disabled by quoting them
with a backslash. E.g., the expression \. matches just a single dot.
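
The examples given for the individual expressions can be checked with grep. The following is a minimal sketch; the function matches is a hypothetical helper defined only for this illustration. It uses the extended syntax (option -E), requires the pattern to cover the whole line (option -x), and suppresses the output (option -q), so that only the exit status of grep is evaluated.

```shell
# matches PATTERN STRING: succeeds if the whole STRING matches PATTERN.
# (Illustrative helper: -E selects the extended syntax, -x requires the
# match to cover the entire line, -q suppresses all output.)
matches() {
    printf '%s\n' "$2" | grep -Eqx -- "$1"
}

# Compare the three repetition operators on the strings used above.
for s in ac abc abbbbbc; do
    for p in 'ab*c' 'ab+c' 'ab?c'; do
        if matches "$p" "$s"; then
            echo "$p matches $s"
        else
            echo "$p does not match $s"
        fi
    done
done

# A quoted dot only matches a literal dot.
if matches 'a\.c' 'abc'; then
    echo 'a\.c matches abc'
else
    echo 'a\.c does not match abc'
fi
```

Because of the option -x, the helper tests whether the complete string matches; without it, grep would also report lines that merely contain a match.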

4.1 sed
In contrast to interactive editors, which modify text at the position of the cursor, the
program sed (stream editor) modifies text automatically. sed can be
used as a filter that reads input from STDIN and writes the modified text to STDOUT. Text
manipulations are specified in a scripting language that even supports conditional branches
(see the man page for details). Within the scripts, regular expressions are typically used to
select the sections that should be manipulated (comparable to cursor placement with interactive
editors).
A common task with text files is search and replace. For this, sed supports the following
command:

s/REGEXP/REPLACEMENT/FLAGS The substitute command specifies the search pattern
with a regular expression (REGEXP). Within the regular expression, parentheses
((...)) may be used to identify sub-patterns. With basic regular expressions, sed
expects these parentheses to be escaped (\(...\)); with extended regular expressions,
they are written without backslashes.
The matched text is replaced by the string REPLACEMENT. Within REPLACEMENT,
matched sub-patterns may be referenced by numbers, i.e., \1 references the
first sub-pattern, \2 references the second sub-pattern, and so on (up to \9).


Example 14. To demonstrate the usage of the substitute command, we consider the following
task: mark any occurrence of the character o at the beginning of a line and any occurrence
of the character e at the end of a line. We mark the occurrences by putting the respective
characters inside angle brackets, i.e., <o> or <e>.
Which regular expression should we use? The character o at the beginning of a line is
matched by the pattern ^o and the character e at the end of a line is matched by the
pattern e$. To match either, the pattern ^o|e$ can be used. To be able to put the matched
character inside angle brackets, it must be possible to reference the matched character (o or
e) within REPLACEMENT. For this, we have to put the regular expression inside
parentheses, i.e., (^o|e$). Then, the expression \1 references the matched character within
REPLACEMENT. So, to put the matched character inside angle brackets, the REPLACEMENT
string becomes: <\1>.
In the following example, sed is called with the option -r which activates extended regular
expression support. The substitute command is given as the last parameter to sed (within
single quotes, recall “quoting”):
jdoe@compu$ echo "one two three" | sed -r 's/(^o|e$)/<\1>/g'
<o>ne two thre<e>

Note, the substitute command may be followed by zero or more flags. With the substitute
command above, the flag g has been specified which tells sed to replace all matches within
a line (and not just the first). Without this flag, the replacement would have been applied
to the first o, but not to the last e.
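The effect of the g flag and of numbered sub-patterns can be checked with a short sketch. It assumes a sed that accepts the option -E for extended regular expressions (GNU and BSD sed both do; -r is the GNU spelling used in Example 14). The word-swapping command in the last line is an additional illustration that does not appear in the notes above.

```shell
#!/bin/sh
# Without the g flag, only the first match in the line (the leading o) is replaced.
echo "one two three" | sed -E 's/(^o|e$)/<\1>/'        # <o>ne two three

# With the g flag, every match in the line is replaced.
echo "one two three" | sed -E 's/(^o|e$)/<\1>/g'       # <o>ne two thre<e>

# Two sub-patterns: \1 and \2 are used to swap the first two words.
echo "one two three" | sed -E 's/^([a-z]+) ([a-z]+)/\2 \1/'   # two one three
```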


5 Shell Programming
The shell supports control constructs, like conditional execution/branches (e.g.
if-then-else) or loops (e.g. for, while), that are typically found in programming
languages. Together with the support for variables, it is possible to write programs within the
shell, i.e., shell scripts.
A shell script may be directly entered at the command line or it may be stored in a file.
This file can be executed like a regular program, but the commands will be interpreted by
the shell. To make the file executable, two steps are necessary:

1. The file permissions have to allow execution (e.g. -rwxr-xr-x).

2. The first line of the file has to specify the interpreter with a so-called shebang construct.
I.e., the first line starts with the characters #! (sharp and bang), which are followed by the
path of the interpreter. For shell scripts, the interpreter is the program that implements
the shell (e.g. #!/bin/bash). For other scripting languages, like perl, python, ruby, ..., the
corresponding interpreter is also specified by such a shebang line, e.g.: #!/usr/bin/perl.

A simple shell script that just writes “Hello World” to STDOUT is given in Fig. 6.


#!/bin/sh

echo "Hello World"

Figure 6: Shell script that prints “Hello World”. The shebang construct specifies the
interpreter (/bin/sh is the system’s default shell).
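The two steps can be walked through as follows. This is a sketch under a few assumptions that are not part of the notes: the file name hello.sh is arbitrary, and mktemp -d (widely available, but not standardized) is used to work in a fresh temporary directory so that no existing file is overwritten.

```shell
#!/bin/sh
# Work in a temporary directory; $(...) captures the output of mktemp.
dir=$(mktemp -d)
cd "$dir" || exit 1

# Write the script from Fig. 6 into the file hello.sh.
echo '#!/bin/sh' > hello.sh
echo 'echo "Hello World"' >> hello.sh

chmod u+x hello.sh    # step 1: allow the owner to execute the file
ls -l hello.sh        # the permission string now starts with -rwx
./hello.sh            # step 2 at work: the shebang line selects /bin/sh
```

Without the chmod call, the attempt to run ./hello.sh would typically fail with a "Permission denied" error, even though the file content is a valid script.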

It is possible to give parameters to shell scripts. Within the shell script, the parameters can
be accessed by using specially named variables. I.e., the positional parameters are accessed
with $1 (the first parameter), $2 (the second parameter), and so on. Note, the variable $0
contains the file name of the shell script.
Consider a file named hello that contains the following shell script:
#!/bin/sh

echo "Hello $1 $2"

If called with a single parameter “World”, this shell script produces the same
output as the script in Fig. 6, except for a trailing blank (the missing parameter $2
expands to an empty string):
jdoe@compu$ ./hello World
Hello World
jdoe@compu$ ./hello World1 World2
Hello World1 World2

Recall, the shell searches for executable files (i.e. programs) in all directories that are given
in $PATH. Typically, the current directory is not part of $PATH. Therefore, we have to specify
the path to our shell script hello explicitly. Each directory contains two special entries: a single
dot (.) represents the current directory and two dots (..) represent the parent directory.
Therefore, an executable named hello within the current directory can be
started with ./hello.
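The following sketch shows the positional parameters, $0, and the role of $PATH together. The script name hello_args, the temporary directory created with mktemp -d, and the extension of $PATH are assumptions made for this illustration only.

```shell
#!/bin/sh
# Create a small script in a temporary directory.
dir=$(mktemp -d)
cd "$dir" || exit 1
echo '#!/bin/sh' > hello_args
echo 'echo "script: $0, first: $1, second: $2"' >> hello_args
chmod u+x hello_args

./hello_args World1 World2     # script: ./hello_args, first: World1, second: World2
./hello_args "two words" x     # quoting keeps "two words" together as $1

# After appending the directory to $PATH, the shell finds the script
# without the leading ./ (here, $0 contains the path found via $PATH).
PATH="$PATH:$dir"
hello_args World1
```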

5.1 Conditional Execution/Branches


Every program returns an exit status (status code) upon completion. If the program was
successful (i.e., no errors) then the status code is 0. Non-successful commands return non-
zero exit codes.
Example 15. Consider the program grep (Sec. 3.2.4). If the pattern is found then grep
returns the exit code 0. If the pattern is not found then a non-zero exit code is returned. A
program’s exit code is contained in the shell variable $?:
jdoe@compu$ echo "abc" | grep "abc"
abc
jdoe@compu$ echo $?
0
jdoe@compu$ echo "abc" | grep "def"
jdoe@compu$ echo $?
1


The exit code may be used for conditional branches. In the following, cmnd1 and cmnd2
represent two arbitrary programs or pipelines (compare Sec. 3.3).

cmnd1 && cmnd2 This simple form of conditional execution means that cmnd2 is executed
only if cmnd1 has been completed successfully (i.e., exit code 0). The syntax resembles
a logical AND:

    AND (&&)   false   true
    false      false   false
    true       false   true

I.e., if any command is not successful then the result is a non-zero exit status (“false”).
If both commands are successfully completed then the exit status of this construct is
zero (“true”). Note, if cmnd1 exits with a non-zero exit status (“false”) then the exit
status of the whole construct is already known (“false”). Therefore, it is not necessary
to execute cmnd2 in this case.
Example 16. Consider the following commands:
jdoe@compu$ echo "abc" | grep "abc" && echo "found"
abc
found
jdoe@compu$ echo "abc" | grep "def" && echo "found"

In the first line, grep successfully finds the string abc and terminates with exit status
0. Therefore, the second command (echo "found") is also executed.
In the second line, grep does not find the pattern def and returns a non-zero exit
status. Therefore, the second command is not executed (and no output is produced).


cmnd1 || cmnd2 This construct resembles a logical OR:

    OR (||)    false   true
    false      false   true
    true       true    true

I.e., cmnd2 is executed only if cmnd1 returns a non-zero exit status. Note, if the first
command returns the exit status 0 (“true”) then the exit status of the whole construct
is also 0 (“true”). Therefore, it is not necessary to execute cmnd2 in this case.
Example 17. In the following examples, the echo command is executed only if grep
does not find the pattern (non-zero exit status):
jdoe@compu$ echo "abc" | grep "abc" || echo "not found"
abc
jdoe@compu$ echo "abc" | grep "def" || echo "not found"
not found


Note, in the first case, the string abc is the output of grep. 
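If only the decision matters and the matching line should not appear in the output, grep can be told to stay quiet. The sketch below assumes a grep that supports the option -q (print nothing, report the result only through the exit status); this option is not discussed in the notes above.

```shell
#!/bin/sh
# grep -q prints nothing; only the exit status drives && and ||.
echo "abc" | grep -q "abc" && echo "found"        # prints: found
echo "abc" | grep -q "def" || echo "not found"    # prints: not found

# The exit status of the whole construct is available in $?.
echo "abc" | grep -q "def" && echo "found"        # prints nothing
echo "status: $?"                                 # prints: status: 1

# Both operators can be chained. Note that the || branch would also run
# if the command after && failed, so this is not a full if-else.
echo "abc" | grep -q "abc" && echo "found" || echo "not found"   # prints: found
```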

if-then-else Like traditional programming languages, the shell supports if-then-else
constructs. The general syntax is (parts in square brackets are optional):
if cmnd1; then cmnd2; [elif cmnd3; then cmnd4;] ... [else cmndN;] fi

Example 18. The following examples demonstrate the usage of the if-then-else
construct:
jdoe@compu$ if echo "abc" | grep "abc"; \
then echo "found"; else echo "not found"; fi
abc
found
jdoe@compu$ if echo "abc" | grep "def"; \
then echo "found"; else echo "not found"; fi
not found

Note, a single backslash at the end of line is treated as a line continuation, i.e., the
shell ignores the backslash and the following line break altogether (used to break long
lines for better readability). 
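Inside a script file, the same construct is usually spread over several lines, which also makes the optional elif branch easier to read. The following sketch is an illustration only: the function classify is hypothetical (inside a function, $1 refers to the function's first argument), and grep -q is assumed to be available to suppress grep's output.

```shell
#!/bin/sh
# Hypothetical helper: report which of two patterns occurs in the argument.
classify() {
    if echo "$1" | grep -q "abc"; then
        echo "contains abc"
    elif echo "$1" | grep -q "def"; then
        echo "contains def"
    else
        echo "contains neither"
    fi
}

classify "xabcx"    # contains abc
classify "defg"     # contains def
classify "xyz"      # contains neither
```

The branches are tested in order, so a string that contains both abc and def is reported as "contains abc".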

case-esac This construct compares a word with several patterns. Depending
on the matching pattern, a branch is chosen. The general syntax is (parts in square
brackets are optional):
case word in [ [(] pattern [ | pattern ] ... ) cmnd ;; ] ... esac

Example 19. This kind of conditional branch is typically found in shell scripts that
start services after system start. The script in Listing 1 matches the first positional
parameter (“$1”) against the strings “start” and “restart” (start|restart is to be read as
“start” or “restart”). If the script is called with either of these two parameters then the
program myprogram is executed.

#!/bin/sh

case "$1" in
    start|restart)
        myprogram
        ;;
    stop)
        killall myprogram
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}"
        exit 1
        ;;
esac

Listing 1: Usage of case-esac to start or stop the service called “myprogram”.

If $1 does not match “start” or “restart” then it is matched against “stop”. I.e., if the
script is called with the parameter “stop” then any running instance of myprogram is
terminated (the program killall terminates (“kills”) processes by name).
If the shell script is called with any other (or even no) parameter then $1 matches the
wildcard *, i.e., a default branch is taken. In this case, usage information is printed
and the shell script terminates with a non-zero exit status (exit 1).
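The patterns of a case construct are shell patterns, so wildcards can be combined with fixed text. As a small sketch (the function filetype and the file names are hypothetical and serve only as an illustration), a file name can be classified by its extension:

```shell
#!/bin/sh
# Hypothetical helper: classify a file name by its extension.
filetype() {
    case "$1" in
        *.txt|*.md)
            echo "text"
            ;;
        *.sh)
            echo "shell script"
            ;;
        *)
            echo "unknown"
            ;;
    esac
}

filetype notes.txt    # text
filetype hello.sh     # shell script
filetype image.png    # unknown
```

The patterns are tried from top to bottom and only the first matching branch is executed, which is why the wildcard * has to come last.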

5.2 Loops

for Within for-loops, a variable takes on several values in turn. The general syntax is:
for name in word ... ; do cmnd ; done

Thereby, “name” is the name of the variable and “word” is the first value, which may
be followed by additional values (separated by blanks). The command “cmnd” (it is
also possible to specify a list of commands) is executed once for each value of the variable
“name”.
Example 20. In the following example, the variable i takes on the values 1, 2, 3, and 4
in turn:
jdoe@compu$ for i in 1 2 3 4; do echo $i; done
1
2
3
4

I.e., the echo command is called for each value of i in turn. 

Often, for-loops are used to assign a list of increasing numbers to a variable. Such
a list may be generated by the program seq. In general, this program expects two
parameters, the start and the stop value:
jdoe@compu$ seq 1 4
1
2
3
4

So, the program seq may be used to generate the list of values within the for-loop.
This is achieved by command substitution. If a command is surrounded by backticks
(`) like
`cmnd`

then the shell executes cmnd and replaces this construct by the output (STDOUT) of
cmnd.
Example 21. So, an alternative implementation of the for-loop in Example 20 is:
jdoe@compu$ for i in `seq 1 4`; do echo $i; done
1
2
3
4
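Besides backticks, the shell also accepts the form $(cmnd) for command substitution. The following sketch shows both uses side by side; it assumes that seq is installed (as in the examples above), and the variable name last is chosen for illustration only.

```shell
#!/bin/sh
# $(cmnd) is equivalent to `cmnd`: it is replaced by the output of cmnd.
for i in $(seq 1 4); do echo "$i"; done

# Command substitution also works in variable assignments.
last=$(seq 1 4 | tail -n 1)
echo "last: $last"                              # last: 4

# Unlike backticks, the $(...) form can be nested without extra escaping.
echo "outer: $(echo "inner: $(echo value)")"    # outer: inner: value
```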


    shell   mathematics
    -lt     <
    -gt     >
    -le     ≤
    -ge     ≥
    -eq     =
    -ne     ≠

Table 1: Arithmetic comparison operators – shell syntax compared to mathematical notation.

while The general syntax is:


while cmnd1; do cmnd2; done

I.e., the command cmnd2 is executed as long as cmnd1 returns an exit status 0.
An alternative construct (corresponding to a negation of cmnd1’s exit status) is
until cmnd1; do cmnd2; done

I.e., the command cmnd2 is executed as long as cmnd1 returns a non-zero exit status.
Example 22. The following example implements the same functionality as Examples 20
and 21, but with a while-loop:
jdoe@compu$ a=1; while [ $a -lt 5 ]; do
echo $a; a=$((a+1));
done
1
2
3
4

Note, the example above makes use of two shell constructs that are worth mentioning:

1. The expression within the square brackets is an arithmetic comparison, i.e., an
integer value of a is expected which is compared to 5. The operator -lt is an
abbreviation for less than, i.e. in mathematical notation, the expression within
the square brackets is equivalent to a < 5. Tab. 1 shows additional arithmetic
comparison operators. Note, the blanks after [ and before ] are required.
2. The expression $((a+1)) performs numeric calculations with shell variables.
I.e., expressions within double parentheses are subject to arithmetic evaluation.
E.g., if the value of a is 2 then the expression $((a+1)) gives 3. Note, within
the double parentheses the variable a is referenced just by its name (without a
leading $).
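The two constructs can be combined for simple calculations. The following sketch (the variable names sum and i are chosen for illustration) adds up the numbers 1 to 4 with a while-loop and then repeats the counting loop of Example 22 with until and the operator -ge from Tab. 1:

```shell
#!/bin/sh
# Sum the integers 1..4 with a while-loop and arithmetic evaluation.
sum=0
i=1
while [ $i -le 4 ]; do
    sum=$((sum + i))
    i=$((i + 1))
done
echo "sum: $sum"        # sum: 10

# The counting loop of Example 22 written with until:
# the body is executed as long as the comparison fails.
a=1
until [ $a -ge 5 ]; do
    echo $a             # prints 1, 2, 3, 4 on separate lines
    a=$((a + 1))
done
```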


A Questions to verify the learning success


1. Explain the functionality of the shell when executing a program.

2. Explain the usage of shell variables.

3. How do you get the reference documentation for a given system program?

4. What is expansion in the context of the shell?

5. What is quoting and which quoting mechanisms do you know?

6. What is the functionality of the program ls?

7. What is the functionality of the program cp?

8. What is the functionality of the program mv?

9. What is the functionality of the program mkdir?

10. What is the functionality of the program cat?

11. What is the functionality of the program chmod?

12. What is the functionality of the program find?

13. Use the echo command and proper quoting mechanism to write the following strings
literally to standard output:

• "abc"
• #!/bin/sh
• $0
• "$0"
• '$0'
• newline: \n
• * * * * *

14. What are the characteristics of a program that works as a filter?

15. What is the functionality of the program sort?

16. What is the functionality of the program head?

17. What is the functionality of the program tail?

18. What is the functionality of the program grep?

19. What is the functionality of the program sed?


20. What are regular expressions?

21. What does the first line of a shell script contain?

22. With shell programming, which kind of conditional execution/branches do you know?

23. With shell programming, which kind of loops do you know?

24. Explain command substitution with the shell.

25. With shell programming, how do you compare two expressions arithmetically?

26. With shell programming, how do you arithmetically evaluate expressions?
