
Unit-6 Text Manipulation

INSPECTING FILES

Let us first look at commands that allow us to inspect files without altering them. For example, we might want to find out how many words there are in a file, or we might want to locate places in the file which contain a particular text expression. Before going further, we must be clear as to what a text file is. This is a file which contains only printable characters and which is organised around lines. Although in some cases we can alter the files, these commands are really meant to let us look at the files or to find out about them.

File Statistics

The wc command tells you the number of lines, words and characters in a text file.

% wc quotation
       8      43     227 quotation

This means that quotation has 8 lines, 43 words and 227 characters. A word is a string of characters delimited by any combination of one or more spaces, tabs or newlines. If you wish you can make wc operate on the standard input, whereupon you will not find any filename displayed in the output.

% wc
No generalisation is ever wholly true, including this one.
The problem with equality is that we desire it only with our superiors.
^D
       2      22     130

This also means you can use wc in a pipe, either to read from or write to. Thus

% cat quotation | wc
       8      43     227

or

% who | wc
       9      45     333

In both cases the method used is perhaps not the most natural one. For example, to find out the number of users in a system you could say

% who -q
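Since wc is a filter, it fits naturally at the end of any pipeline. As a minimal sketch (the figures shown are purely illustrative), the first number below tells you how many entries your current directory has, because ls writes one name per line when its output goes to a pipe

% ls | wc
      14      14      93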


You can do a wc on several files at a time, and then you get an additional line of output giving the total figures. If you wish, you can find only the number of lines in the input by using the -l option, only the number of words by using -w and only the number of characters by saying -c. These options can be combined in any order. So

% wc -cl quotation
     227       8 quotation

You can see that

% wc -lwc

is the same as wc.
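Here is a quick sketch of what the multi-file form of wc looks like. The second file, notes, and its figures are purely illustrative; only the last line, which gives the column-wise totals, is something wc adds on its own

% wc quotation notes
       8      43     227 quotation
      12      96     518 notes
      20     139     745 total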

Searching for Patterns

We can now come to a few commands which help in locating patterns in files. One such program is grep (for global regular expression printer). It takes one regular expression which you want it to search for, and looks for it in all of the specified files, one by one. Whenever grep finds a line in a file that contains the pattern, it prints the line on the standard output. If more than one file was given to grep to search, the line is preceded by the name of the file in which the match was found, followed by a colon. If only one file was to be searched, then only the line is printed. A word on regular expressions is in order.


A regular expression is a way of specifying a template or pattern which can match several text strings according to certain rules. For specifying the template, some characters are used with a certain meaning. Such characters are called metacharacters. Thus a dot (.) matches any single character. We will not go into the details of the rules governing regular expressions here, because you must have learnt about them in your compiler design course. Regular expressions are used there to specify languages consisting of legal sentences from an alphabet. From such a specification you must have learnt how to construct a lexical analyser which accepts only valid sentences, that is, sentences of the language specified by the regular expression. In the present context, our alphabet is the set of printable characters, and the language is the set of all the text strings that match the regular expression. You should refer to your UNIX manual to find out the exact rules for constructing regular expressions for grep.

Since the C-shell itself attaches a special meaning to many of the metacharacters, you will need to tell the shell not to interpret the regular expression which you are trying to pass to grep. Single quotes are the safest way of telling the shell this. So the regular expression argument to grep should be enclosed in single quotes, although double quotes also do work in many cases. We will examine this matter in the next unit on Shell Programming. Unfortunately the meaning attached to metacharacters in different utilities of UNIX is not always consistent. For example, in grep, as we just saw, an arbitrary single character is matched by a period (.) while in the C-shell this is done by the question mark (?). This is a potential source of confusion, and all the more so because a beginner can find it hard to construct or even interpret a regular expression anyway. However, with practice this difficulty reduces somewhat. Moreover, not all utilities support regular expressions in their fullest manifestation, and the degree of support actually varies amongst them.

By now you will be complaining because you want to see some real examples, not endless commentary on the command. So here we go

% grep Gupta payfile

tells you where the string "Gupta" occurs in the file payfile. As shown here grep is matching a text string exactly. Every line in payfile that contains the given string anywhere will be printed. You can give more than one file as an argument.

% grep Thomas custfile orderfile

If you want to know the line number in the file of the line on which the matches were found, say

% grep -n Australia country
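As a sketch of the multi-file case, the matching lines come back prefixed with the filename and a colon, as described earlier. The contents of custfile and orderfile shown here are entirely hypothetical

% grep Thomas custfile orderfile
custfile:Thomas Mathew, 16 Marine Drive, Bombay
orderfile:A-1047 Thomas Mathew 12 chairs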


To count the number of lines which matched, just say

% grep -c India prodfile


This will not print the lines; only the count will be shown. You can invert the sense of a match like this

% grep -v India prodfile

This command will print lines that do not include the string India. Remember that grep looks for only one regular expression but can look at more than one file. So do not try

% grep Ram Kumar users
grep: can't open Kumar

to look for Ram Kumar in a file users. The command as shown will look for the string Ram in the two files called Kumar and users. Instead you should say

% grep "Ram Kumar" users

whereupon Ram Kumar will be searched for in the file users. There is also an option to turn off case sensitivity. So

% grep -i "Ram Kumar" users

will find any occurrence of Ram Kumar irrespective of case. Thus this would report RAM KuMAr as a match. What if there could be occurrences of the string in the file with an unknown number of spaces between the two words? You will now need to use regular expressions.

% grep "Ram *Kumar" users

matches Ram Kumar in this case. The * metacharacter specifies a closure, meaning that the preceding pattern is to be matched 0 or more times, which is what we want here. grep is line oriented and patterns are not matched across line boundaries. The metacharacters ^ and $ stand for the beginning and the end of a line respectively. So to look for an empty line, say

% grep '^$' users

But if you are looking for blank lines, say

% grep '^[ ^I]*$' users

The [ ^I] is a character class consisting of a space and a tab (^I stands for the tab character), and the * metacharacter is a closure which looks for 0 or more occurrences of these. To see whether khanz is a valid login name, say

% grep '^khanz' /etc/passwd

because the login name is the first field in the passwd file.


You can get every line in a file with line numbering by saying

% grep -n . letter

This is like a cat on the file but with the line numbers displayed too. To find lines containing a number, say

% grep '[0-9]' table

which will find a sequence of one or more digits. We have seen that grep cannot search for more than one regular expression at a time. There is another utility called egrep which can handle regular expressions with alternations. We will not look at it here but you should study the manual entry for it. There is another utility in this family called fgrep which does not handle regular expressions. Since it handles only fixed text strings, however, it is faster. Thus you can say

% fgrep "Ram Kumar" empfile custfile

Another advantage of this command is that you can store a list of words in a file, say search, one word per line. You can then look for the occurrence of any of those words in a file like this

% fgrep -f search story

Usually grep is sufficient for everyday use but whenever needed you can make use of fgrep or egrep.
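Just to give a flavour of the alternation that egrep provides, a pattern like the one below (the syntax is that described in the egrep manual entry) prints every line of prodfile containing either of the two strings

% egrep 'India|Australia' prodfile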


Comparing Files


We will now look at a group of utilities which help us compare two files. While talking of cp in #2.4.8, we did not know of commands which could help us ascertain whether the original and the copy indeed had the same contents. First let us make a copy of the passwd file in our directory and then examine the two

% cp /etc/passwd ~
% cmp /etc/passwd ~/passwd

The cmp command takes two filenames as arguments and prints on the standard output the character offset and the line number of the first position where the two files differ. It is useful in comparing two binary files to see whether they are the same. It is not of much help in comparing two text files to see how they differ, because if you add or delete even one character in one of them, every subsequent position will hold two different characters in the two files. So you can try something like

% cmp /bin/ls /bin/cp
/bin/ls /bin/cp differ: char 27, line 1

to see that they differ. To look at all the differences in two files say

% cmp -l /bin/ls /bin/cp

and you should be flooded with several thousand lines of output, each line containing the byte offset, the character in the first file (ls) represented in octal and the character in the second file (cp) also in octal, for every byte position where the two files differ (almost all in this case, one would imagine) until one or both the files end. If one file is shorter than the other but no differences are detected up to the point the shorter file ends, cmp reports end of file on the relevant file.

Now let us turn our attention back to text files. Suppose we have two text files which are sorted in ascending order. Now try

% comm file1 file2

This produces on the standard output three columns of text. The first column contains lines that are to be found only in the first file and not in the second. The second column likewise contains lines present only in the second file. The third column contains lines common to both files.


This output could be all jumbled up if the files are not sorted. You can suppress the printing of one or more columns like this

% comm -1 file1 file2

This suppresses the printing of lines found only in the first file. So

% comm -3 file1 file2

will print lines only in file1 and those only in file2 but not those that are common to both files (column 3). You can suppress two columns as well. Thus to print only lines in file1, say

% comm -23 file1 file2

As you would expect

% comm -123 file1 file2

will print nothing. cmp and comm are simple commands and it would be easy to write a program to accomplish what they do. We will now take a quick look at a utility which is far more complex. The diff command takes two text files as arguments and brings out the smallest set of differences between them. It can also produce output which can be used by the text editor ed to produce the second file from the first. At the heart of diff is a complex algorithm to find the longest common subsequences in two blocks of text.

Let us look a little more at how these utilities can help you. Suppose you have a file containing the names of a few places you would like to visit, as follows

% cat places
Agra
Cochin
Delhi
Goa
Guwahati
Jhansi
Puri
Secunderabad
^D


Also let there be another file containing the names of places a friend of yours would like to visit.

% cat moreplaces
Agra
Goa
Guwahati
Gwalior
Kochi
Madras
Udaipur
^D

Now you want to plan out an itinerary after discussion. In this discussion, if you both agree about wanting to visit a place, there is no difficulty. Otherwise you will have to decide what to do. So first you need to know whether you disagree at all. We will assume here that both the files are sorted, and later in this unit we shall see how this can easily be done. So to find out about the disagreements, we can say

% cmp places moreplaces
places moreplaces differ: char 6, line 2

Well, it was too much to expect complete agreement. To find out the differences you can use comm.

% comm places moreplaces

Here column 3 will tell you about the places you both agree upon. Now you only have to discuss columns 1 and 2 to arrive at an agreement. But we have still not talked of diff. Say

% diff places moreplaces

This indicates the differences between the files in three ways: a, d and c. The a stands for lines which have been added, d for lines deleted and c for lines changed between the two files. The symbol < refers to the first file and > to the second. We will not discuss the command at length but will see a few options to diff.

% diff -e places moreplaces

produces output in a form suitable for the editor ed. You can save this output to a file and apply the change file to the first file to produce the second. If you are wondering why one would want to do such a thing, you should wait until the unit on programming tools, where we discuss version control. The essence of it is that instead of storing every version of a file completely, one stores only the initial version and all the changes to it. One can always recreate any version by applying the appropriate set of changes to the initial version. There is another option to diff which ignores trailing blanks and treats other strings of blanks as equal

% diff -b places moreplaces

diff can handle files of a limited size only. There is a command called bdiff which can be used for large files, but it just uses diff after breaking up the files into manageable chunks. So differences across chunk boundaries may not come out optimally. Another command is sdiff, which works like diff but places the output from the two files side by side. Lines that are present in one file but not in the other are marked by < and >. Lines that are present in both files but differ somewhat are shown separated by a pipe (|). This command can be used to merge two files into one, keeping the common portion intact and incorporating the differing parts of both files.
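As a brief sketch of the version control idea mentioned above, you can save the ed script that diff -e produces and replay it later to regenerate the second file from the first. The trailing 1,$p command, which prints the edited buffer, and the use of ed - to suppress ed's diagnostics are taken from the standard ed and diff manual entries rather than from anything shown in this unit

% diff -e places moreplaces > changes
% (cat changes; echo '1,$p') | ed - places > newplaces

Now newplaces should have the same contents as moreplaces, which you can verify with cmp.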

OPERATING ON FILES

We can now look at several utilities which will allow us to alter files in some way. The utilities in the previous section, in contrast, allowed us to look at files without manipulating their contents. However, most of these utilities are filters and can write the changed file only to the standard output, which can then be redirected to another disk file. Very few commands allow you to change a file in place.


Printing Files


If there is a long file and you cat it to the screen, the output is difficult to understand because there are no page breaks, headers and the like. If you redirect the output to a printer, the resulting file is a long stream of lines without regard to the page length of your stationery. To get formatted output, you can use the pr command.

% pr places

pr breaks up the file into pages with a header, text and footer area. The header contains a line giving the date, the name of the file and the page number. The length of the page can be altered by the -l option and the header can be set by the -h option. Thus if you want to print Itinerary as the heading instead of the filename places, say

% pr -h Itinerary places

The header and footer can be suppressed by the -t option. You can expand tabs to any desired number of spaces by using the -e option followed by the number. Thus to expand tabs to 4 spaces instead of the default of 8, say

% pr -e4 places

You can give a left margin to your output by using the -o option followed by the number of characters you want to use for the margin. Thus to have a 5 character margin, say

% pr -o5 places

You can also set up double spaced printing by using the -d option. If you want to print in more than one column, just use the -n option where n is the number of columns you want. So to print in two column format, use

% pr -2 places

The column separator is a tab by default but can be changed by the -s option to whatever single character you want. You only have to put your desired separator after the -s. The width of the output can be changed by using the -w option. For example, if you are using 132 column stationery, you can say

% pr -4 -w132 places

which will print the file in 4 column format with the width being 132 characters. If you want to merge several files, you can use the -m option. Thus

% pr -2m places moreplaces

will print the two files, one per column. You can use the -p option to pause after every page if the output is to a terminal. Thus it could be some sort of a substitute for more or pg, although pr will not provide several of the other features that more has (pattern matching, for instance). The output from pr is usually redirected to a printer to produce a hard copy. It is rarely useful to just look at a formatted file on the terminal.
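Since pr writes to the standard output, a typical use is to pipe it straight into the print spooler. A minimal sketch is given below; the spooler command is assumed to be lpr here, and on your system it may well be lp or something else

% pr -h Itinerary -2 places | lpr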

Rearranging Files

There are two commands which will enable you to obtain a vertical section of a text file. This is like implementing the projection of a database relation. Let us say that we have a file studfile containing the names of students and the marks they have obtained in some examination. From this we want to create a file containing only the names. The cut command is well suited to perform such a task. Let us look at a small portion of the file

Ajay Sapra          87
Pappu Ahmed         85
Vinod Bhalla        91

You can see that the names extend from column 1 to 20 and the marks are in columns 21 and 22. To obtain the names alone we can cut out those columns like this

% cut -c 1-20 studfile
Ajay Sapra
Pappu Ahmed
Vinod Bhalla

This gives us columns 1 to 20 of the file studfile on the standard output. Similarly, to get the marks alone (for some analysis, for example) you can say

% cut -c21,22 studfile

or

% cut -c21-22 studfile

or in this case, even

% cut -c21- studfile

This last command cuts out all the columns starting from column number 21. Remember that cut does not affect the original file in any way. It does the transformation only onto the standard output, which can be redirected as always if you want it in a disk file.

Now suppose you want the surnames of all the students in surfile. Can you do this with what you already know? You will find that you cannot achieve the desired result, because the first 20 columns, which contain the name, are actually only one fixed length field (name) of studfile as currently organised. The first name and the surname take up an arbitrary number of columns out of these 20. In other words, the first name and the surname are not of fixed length. So there are no parameters you can give to the -c option of cut which will be correct for all records in this file. In such a case you must tell cut to work with variable length fields rather than column positions. So say

% cut -f1,2 studfile

to try and get the names alone. You might be a trifle surprised at the result because there will be no effect. If so, it was because you expected that the field separator would be a space. But actually cut expects the fields to be separated by tabs by default. To tell it to consider a space (or any other character) as the separator, use the -d flag before specifying the field numbers

% cut -d" " -f2 studfile > surfile; cat surfile
Sapra
Ahmed
Bhalla

You can create another file containing only the first names

% cut -d" " -f1 studfile > firfile

and we might put the marks into a file as well

% cut -d" " -f3- studfile > marksfile


Since every space is now considered to delimit a field, we have to cut out every field from the 3rd field onwards. That is why you will find it necessary to give the hyphen after -f3. We have now separated studfile into three files, each containing one of the fields of the original file. Let us now see how we can put the fields back in a different order. Suppose we want the marks list but with the names given as surname followed by a comma and the first name, and followed by the marks secured. We have all the components available with us in the three files we just created. To put them back we can use the paste command like this

% paste -d", " surfile firfile marksfile
Sapra,Ajay 87
Ahmed,Pappu 85
Bhalla,Vinod 91

What does this command do? It writes lines to the standard output and constructs each line by concatenating the corresponding lines from the files specified, with the delimiter given for that field. Thus each line of the output consists of the lines of the files in the order they are specified on the command line, with the first delimiter being used after the first field, the second after the second, and so on. If only one delimiter is given, it is used to delimit all fields. The default delimiter is a tab character. We could have achieved this result using only two intermediate files, because cut and paste are both filters.

% cut -d" " -f2 studfile | paste -d", " - firfile marksfile

Whenever a command accepts multiple filenames, one can use - to specify that the standard input be used at that point. So we could also have achieved our result using only two intermediate files like this

% cut -d" " -f1 studfile | paste -d", " surfile - marksfile

Both of these pipelines produce the same output as before.
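As one more sketch with cut, recall that the fields of /etc/passwd are separated by colons and that the login name is the first of them, as we saw while discussing grep. So a sorted list of all the login names on your system can be had with

% cut -d":" -f1 /etc/passwd | sort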


Sorting Files


While cut and paste allow you to rearrange a file vertically, it is very common to want to rearrange a file horizontally, that is, to sort it in some order. UNIX has an elaborate sort command which allows you to sort files in various ways with a variety of options. Here we will look at some of the features of the sort command. Consider a file empfile containing the first name, the surname, the date of joining the company, the employee number and the basic salary

Ram Gupta 24/03/84 2038 15200.00
Harish Gupta 18/10/89 5496 4300.00
Thomas Robinson 04/07/87 3562 4800.00
Gopal Das 28/02/91 8764 4400.00
Anil Jain 13/09/85 2867 6500.00

The UNIX sort is based on fields of variable length and the field delimiter can be specified. The default is the space character. Let us see the result of sorting empfile

% sort empfile
Anil Jain 13/09/85 2867 6500.00
Gopal Das 28/02/91 8764 4400.00
Harish Gupta 18/10/89 5496 4300.00
Ram Gupta 24/03/84 2038 15200.00
Thomas Robinson 04/07/87 3562 4800.00

As you can see the result is written to the standard output. sort can read from the standard input and is thus a filter. The default mode of sorting is in the collating sequence of the machine, ASCII for example, and in ascending order starting from the first character of the line. To sort on the surname, which is the second field, you can say

% sort +1 empfile
Gopal Das 28/02/91 8764 4400.00
Ram Gupta 24/03/84 2038 15200.00
Harish Gupta 18/10/89 5496 4300.00
Anil Jain 13/09/85 2867 6500.00
Thomas Robinson 04/07/87 3562 4800.00


So the +1 means that sorting starts at the second field. To sort on multiple field ranges, you can give the field number to stop sorting at

% sort +1 -2 empfile
Gopal Das 28/02/91 8764 4400.00
Harish Gupta 18/10/89 5496 4300.00
Ram Gupta 24/03/84 2038 15200.00
Anil Jain 13/09/85 2867 6500.00
Thomas Robinson 04/07/87 3562 4800.00

Let us now try

% sort +4 empfile
Ram Gupta 24/03/84 2038 15200.00
Harish Gupta 18/10/89 5496 4300.00
Gopal Das 28/02/91 8764 4400.00
Thomas Robinson 04/07/87 3562 4800.00
Anil Jain 13/09/85 2867 6500.00

How has Ram Gupta, with the highest basic salary of Rs 15,200.00, come at the beginning of the list? This is because sort sorts from left to right in the ASCII collating sequence and 1 is smaller than any other digit in this case. So the field starting with 1 appears at the beginning. In other words sort looks at the dictionary order rather than the numeric value of the field. To make sort use the numeric order for numeric fields, say

% sort -n +4 empfile

whereupon the record for Ram Gupta will appear at the end.


Let us now see how to sort on portions of fields. A practical example is when you have the dates given in dd/mm/yy form as above and you want to sort in the ascending order of the date. If the dates were in yy/mm/dd order there would have been no problem. So we need to sort on the 7th and 8th characters of the third field, followed by the 4th and 5th characters and then the 1st and 2nd characters. Note that including the constant "/" character in between will not make a difference, but to illustrate the syntax we will exclude this character.

% sort +2.6 -2.9 +2.3 -2.6 +2.0 -2.3 empfile
Ram Gupta 24/03/84 2038 15200.00
Anil Jain 13/09/85 2867 6500.00
Thomas Robinson 04/07/87 3562 4800.00
Harish Gupta 18/10/89 5496 4300.00
Gopal Das 28/02/91 8764 4400.00

The field delimiter can be any character other than the default space, in which case it has to be specified with the -t option

% sort -t"|" +2 -3 +0 -1 testfile

will sort on the 3rd and 1st fields of testfile, considering the "|" character to be the field delimiter. If there is more than one record with the same value, you can get unique records by using the -u option, and duplicate records will not be repeated in the output. You can specify an output file where the output is to be written, or you can redirect the output if you want.

% sort +2.6 -2.9 +2.3 -2.6 +2.0 -2.3 empfile -o emp.out

writes the result to the file emp.out. The sort command is one of the few utilities which can work in place. So the output file can be the same as the input, but do not try this using redirection unless the shell variable noclobber is set!

% sort +2.6 -2.9 +2.3 -2.6 +2.0 -2.3 empfile -o empfile

Now empfile will have changed after the command completes. This method is particularly useful when you are sorting a large file and do not have space to keep both the unsorted and sorted files on the disk. But remember that sort uses temporary space in the directory /usr/tmp, so there must be enough space there or your sort will abort, though your source file will not be overwritten unless the sort has completed successfully.

The sort command is not limited to sorting one file. You can sort several files in the same manner simultaneously by giving the names of all the files on the command line, but remember that the output will then be all in one file. You can check if a file is sorted in a particular manner by giving the sort command with the -c (check) option. You can sort in reverse order by using the -r option. To merge two or more files that are already sorted, use sort with the -m (merge) option. This is of course much faster than sorting the files from scratch. Incidentally the UNIX sort is not a very efficient one.
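To tie this back to the student file used earlier, here is a small sketch of a merit list, assuming the same +pos key notation used throughout this section: it sorts studfile on the marks field numerically and in reverse order, so the highest marks come first

% sort -nr +2 studfile
Vinod Bhalla        91
Ajay Sapra          87
Pappu Ahmed         85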

Splitting Files

Sometimes one wants to split files into pieces. We will take a practical example later, and first see how we can do the job. Suppose there is a large file called stores consisting of the stores transaction data of a large organisation. If the file is a large one, say with 324532 records, you might at times want to split it. For this you can say

% split stores

and the file will be split into 1000 line pieces. Each piece will be stored in a file. The last piece will have whatever is left after the penultimate piece has been created, which in this case will mean that the last piece has 532 lines. What are the pieces called, you might ask. The files are by default named xaa, xab, xac, ..., xaz, xba, xbb and so on up to xzz. So you cannot split a file into more than 676 pieces using the split command. If you want to, you can specify a prefix different from x to name each of the portions produced by indicating it on the command line. Thus if you want to call the pieces partaa, partab and so on, you can say

% split stores part

You can also change the number of lines that are put into each piece by giving this number on the command line. So

% split -10000 stores part

will split the file into 10000 line pieces instead of the default of 1000. Note that the split is done based on the number of lines in each piece rather than on the size in bytes of each piece. Also there is no way of automatically telling split to produce a specified number of pieces. Thus if you need to split a file into exactly 20 pieces, you will have to first determine the number of lines in the file and then work out a piece size which will give you the number of pieces you want (20 in this case). It should be easy to see that there can be more than one piece size which will produce a specified number of pieces from a file, because the size of the last piece can vary depending on what is left over. Keep in mind that split will also work when the number of characters in a line is not fixed, that is, when you have variable length lines.

There can be various situations where it might be necessary or desirable to split a file into several parts. Let us look at one such situation. Imagine a data file of 100 MB in a partition on the hard disk which has 150 MB of free space left. The file has 100000 records of 1000 bytes each, including the newline. Also assume that the partition which contains the /usr/tmp directory has only 80 MB of space free. We want to sort our 100 MB data file. How can we do this? You know that the sort command uses temporary work space in the /usr/tmp directory and that it needs about 1.2 times the size of the file as work space. So to sort a 100 MB file the /usr/tmp partition must have at least 120 MB of free space. Since this is not so in this case, we cannot sort the file directly, although there is enough space to hold the sorted file. What you can do is to split the source file into two parts of 50 MB containing 50000 records each. Now sort each piece separately in place. There is enough space in /usr/tmp to sort a 50 MB file. Then merge the two pieces together using the -m option to the sort command. This option does not need much work space. By reducing the size of the files to be sorted you can accomplish your goal of sorting the file, as the sketch below shows.
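Here is a minimal sketch of the procedure just described. The piece names follow from the prefix given to split, and the sort key (+0 -1, the first field) is only illustrative; you would use whatever key your application really needs

% split -50000 stores part
% sort +0 -1 partaa -o partaa
% sort +0 -1 partab -o partab
% sort -m partaa partab -o stores.sorted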

Translating Characters

There is a very useful command to translate characters in a text file. Suppose we have a file quotation

% cat quotation
Chess, like music, like love, has the power to make men happy

and we want to change all letters to capitals, or upper case. We can do this easily by using the tr command


% tr '[a-z]' '[A-Z]' < quotation
CHESS, LIKE MUSIC, LIKE LOVE, HAS THE POWER TO MAKE MEN HAPPY

Notice that letters that are already upper case are not affected, because there is no translation specified for them. The tr command takes two arguments which specify character sets. Every character from the first set is replaced by the corresponding character from the second set. The command is a filter and takes input from the standard input and writes to the standard output. If you want to use the command on disk files you will need to redirect the input and the output accordingly, as has been done with the input in the example above. The arguments, that is, the character sets, can be specified either by enumerating them or as ranges. In the example given, both the arguments have been specified as ranges. For this to be possible the characters must be in the ascending order of the ASCII collating sequence without any gaps.

To implement Caesar's cipher, for instance, you can use tr on the source file. In this primitive cipher, every letter of the Roman alphabet is shifted forward by three characters. Thus a becomes d, b becomes e and z becomes c. So try

% tr '[a-z]' 'defghijklmnopqrstuvwxyzabc' < plaintext > ciphertext

Here we have specified the first character set as a range but the same cannot be done for the second. So the second character set has been enumerated in full. The command given will not encipher upper case letters. If you want to change these too, or also want to change digits, you can modify the command appropriately. As usual, if a character in the command has special meaning to the shell, it needs to be escaped. Here we have used single quotes to escape the square brackets, although double quotes could have worked as well. What happens if the number of characters in the two character sets does not tally? Well, if there are more characters in the second set than in the first, there is no problem because there will never be occasion to translate to them. If there are more characters in the first set, the extra characters are ignored. Thus

% tr '[0-9]' '[a-f]' < srcfl > targetfl

will change 0 to a, 1 to b, and so on, making a 5 into an f. The digits 6, 7, 8 and 9 will not be changed because there is no translation specified for them. The command has some other facilities.


We can delete any set of characters from the input by using the -d option of tr and specifying only one character set. So if you want to get rid of punctuation marks like a semicolon, a colon, a dash and a comma, you can say

% tr -d ';:,-' < srcfl > targetfl

The characters can also be specified by giving their octal representation after a backslash. Thus to delete all tab characters, you can say

% tr -d '\011' < srcfl > targetfl

There is also the -s or squeeze option, with which you can collapse or squeeze multiple (consecutive) occurrences of a character to a single occurrence of that character. Thus to replace multiple spaces by a single space, you can say

% tr -s ' ' < srcfl > targetfl

Finally we will look at the complement option specified by -c. This option complements or inverts the character set you specify. While

% tr -d '[0-9]' < srcfl > targetfl

will delete all digits from srcfl and write the result out into targetfl, using the -c option with this will delete everything except digits from the file

% tr -cd '[0-9]' < srcfl > targetfl

leaves only digits from srcfl in targetfl.
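As a parting sketch that ties tr back to wc, the pipeline below breaks the quotation file into one word per line by turning each run of spaces into a newline (\012 is the octal code for a newline, just as \011 was for a tab) and then counts the lines. The figure should roughly agree with what wc -w reports for the same file

% tr -s ' ' '\012' < quotation | wc -l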
