• One advantage of a pipe is that more than one command can be combined in a single command line.
• Another advantage is that a command line containing many commands can be executed as a background process.

The redirection and piping tokens can be chained together to form a complex command. Such a pipeline is known as a chain pipeline. In the example, the listing of the current directory is filtered to keep only the lines which contain .sh, and the output is stored in shlist. This is commonly used in shell scripts and batch files.

5.4 FILTERS

A filter is a command that reads data from the standard input, processes it and sends it to the standard output. A filter is actually a program that can also read directly from files whose names are provided as arguments, process or filter the data, and send the results to the standard output, unless a redirection or a pipeline symbol is used.

Some commonly used filter commands are:
• sort
• cut
• paste
• uniq
• tr
• wc
• cat
• grep

5.4.1 Sort Command

The sort command is used to sort the information in a text file in ascending or descending order. It is also used to merge sorted files. It takes zero, one or more filenames as arguments. The general syntax of sort is:

    sort [options] [field specifiers] [input files]

The field specifiers tell it which fields to use for the sort.

• By default, sort orders the data in the ASCII collating sequence: a space first, next the numerals, then the uppercase letters and finally the lowercase letters. The sequence can be changed by using certain options.
• The sort command without any arguments reads the data to be sorted from the standard input device (the keyboard), sorts the data items and displays the sorted output on the terminal screen.
• sort can take multiple arguments. When multiple filenames are given as arguments, the data from all the files are first appended and then the resultant file is sorted.

sort options

The sort command has several options.
Some of the most common options are given below:

    -d          : Sorts according to dictionary order and ignores the punctuation.
    -f          : Ignores caps while sorting.
    -m          : Merges two or more given sorted files into one sorted output file.
    -n          : Sorts according to numeric order.
    -r          : Sorts in reverse order.
    -u          : Removes duplicates and displays unique values.
    -o outfile  : Places the sorted output in the file outfile.
    filename    : filename is the name of the file that needs to be sorted.
    -k fieldno  : Sorts the data according to the field number specified by fieldno.

Consider the following example:

    $ sort namelist > namelist

This command fails, since the shell creates the output file first, erasing the original data, and only then executes the command. This problem can be overcome by using the -o option, where the input file and output file can be the same, as:

    $ sort -o namelist namelist

or different filenames, as:

    $ sort -o namesorted namelist

Here namelist is the original file and namesorted is the sorted file.

Example 5.3 : sort

    $ cat stdlist
    001 : Sneha : 78
    002 : Asha : 60
    010 : Akhila : 45
    005 : Harshita : 77
    003 : Madhuri : 80
    $ sort stdlist
    001 : Sneha : 78
    002 : Asha : 60
    003 : Madhuri : 80
    005 : Harshita : 77
    010 : Akhila : 45
    $ sort stdlist > newlist
    $ sort list1 list2 > list3
    $ sort -o stdlist stdlist
    $ cat > regno
    22
    10
    25
    1
    15
    $ sort -n regno        # numeric sort
    1
    10
    15
    22
    25
    $ sort -r regno
    25
    22
    15
    10
    1
    $ cat >> regno
    10
    $ cat regno
    22
    10
    25
    1
    15
    10
    $ sort -u regno        # removes duplicates
    1
    10
    15
    22
    25
    $ sort -m -o namelist list1 list2
    $ sort -t ':' -k2 phonelist
    Chitra : 23462346
    Uma : 23492349
    Rama : 23562356
    Sumita : 23602360
    Pratibha : 23612361
    $ sort phonelist
    Chitra : 23462346
    Pratibha : 23612361
    Rama : 23562356
    Sumita : 23602360
    Uma : 23492349

• Merging of two files can be performed by the command:
      $ sort -m -o namelist list1 list2
  where the files list1 and list2 are merged into one file called namelist.
• The -c option can be used to check whether a file is sorted or not.
• Files may be sorted depending on the key field specified. The field to be sorted on is specified using the -k option along with the field number. The -t option is used along with the -k option when sorting on fields which are not separated by the default field separator (blank). In the example, -t is followed by the field separator ':' given within single quotes.

5.4.2 cut command

The cut command splits files vertically. This command can be used to extract the required fields or columns from the file. Consider the file stdlist, which contains 3 fields: register no, name and marks. Each field is separated by the delimiter ':'. The fields are extracted based on the character position or on the field delimiter position. For example:

Example 5.4

    $ cut -c 1-3 stdlist
    001
    002
    010
    005
    003

The above example cuts the first three characters (-c 1-3) of each line from the stdlist file and displays them on the screen. The general form of the cut command is:

    cut [options] [file list]

cut options

    -b list
    -c list
    -f list
    -d delim

The list after -b specifies the byte positions. The list following -c specifies the character positions, such as -c 1-7, which passes the first 7 characters of each line. The list following -f is a list of fields, separated in the file by a delimiter character (-d); for example, -f 1,3 displays the fields 1 and 3 from the given file.

The character following -d is the field delimiter. The default delimiter is the tab character; any other delimiter must be specified using the -d option. delim can be a multi-byte character. delim can be specified within single quotes or by using a backslash (\) character before the delimiter.

Cutting files using delimiters is useful when the file contains variable length records. The output of the cut command can also be redirected to a file for saving, or piped to another program as input.
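The delimiter-based form of cut, and the piping of its output into another filter, can be sketched as follows. This is only a minimal illustration: the file marks.txt and its contents are hypothetical and are not one of the files used in the examples of this chapter.

```shell
# Hypothetical sample data: regno:name:marks (variable-length records).
tmpdir=$(mktemp -d)
printf '%s\n' '010:Akhila:45' '001:Sneha:78' '003:Madhuri:80' > "$tmpdir/marks.txt"

# Extract fields 2 and 3; the delimiter is quoted so the shell passes it unchanged.
names_marks=$(cut -d ':' -f 2,3 "$tmpdir/marks.txt")
echo "$names_marks"

# Extract the first three characters of each line and pipe them to sort.
sorted_regno=$(cut -c 1-3 "$tmpdir/marks.txt" | sort)
echo "$sorted_regno"

rm -r "$tmpdir"
```

Because the fields are located by delimiter rather than by column position, the names may be of any length.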
Example 5.5 : cut

    $ cat stdlist
    001 : Sneha : 78
    002 : Asha : 60
    003 : Madhuri : 80
    005 : Harshita : 77
    010 : Akhila : 45
    $ cut -b 7-14 stdlist
    Sneha
    Asha
    Madhuri
    Harshita
    Akhila
    $ cut -c 1-3 stdlist
    001
    002
    003
    005
    010
    $ cut -d ':' -f 1,3 stdlist
    001 : 78
    002 : 60
    003 : 80
    005 : 77
    010 : 45
    $ cut -d \: -f 1,2 stdlist
    001 : Sneha
    002 : Asha
    003 : Madhuri
    005 : Harshita
    010 : Akhila
    $ date | cut -d ' ' -f 1-3
    Fri Jan 22

5.4.3 paste command

The paste command is used to join the contents of two or more files vertically, that is, column by column: corresponding lines of the files are placed side by side. The contents of the files can thus be viewed side by side by pasting them. Its general format is:

    paste [options] [file list]

    -d : delimiter

paste uses the tab as the default delimiter. As in the cut command, the -d option specifies the delimiter; paste accepts a list of one or more delimiters. The output of the paste command is displayed on the screen, unless redirected or pipelined.

Example 5.6 : paste

    $ cat regno
    001
    002
    003
    $ cat names
    Ash
    Hrithik
    John
    $ cat phone
    23349311
    26781234
    25673332
    $ paste regno names phone > infolist
    $ cat infolist
    001     Ash     23349311
    002     Hrithik 26781234
    003     John    25673332
    $ paste -d \: regno names phone
    001:Ash:23349311
    002:Hrithik:26781234
    003:John:25673332
    $ paste -d \: regno names phone > studlist

5.4.4 uniq command

When duplicate entries are found in a file, they can be removed by the uniq command. The general format of this command is:

    uniq [options] [input file]

• This command takes a single file as its argument. Because uniq compares only adjacent lines, the file must be sorted for all duplicates to be removed.
• When the command is used without any options, the output is displayed without the duplicate entries.
• The uniq command is generally used in building pipelines in shell scripts.
• The options available with the uniq command are as follows:

    -c : Precedes each output line with a count of the number of times the line occurred in the input.
    -d : Selects only the repeated lines, printing one copy of each.
    -u : Selects only the lines which are not repeated (selects only unique lines).
    -i : Ignores case differences when comparing lines.
    --help : Displays a help message.

Example 5.7 : uniq

    $ cat names
    Abhishek
    Ash
    Ash
    Deepika
    Hrithik
    Sonali
    Sonali
    $ uniq names
    Abhishek
    Ash
    Deepika
    Hrithik
    Sonali
    $ uniq -c names
    1 Abhishek
    2 Ash
    1 Deepika
    1 Hrithik
    2 Sonali
    $ uniq -d names
    Ash
    Sonali
    $ uniq -u names
    Abhishek
    Deepika
    Hrithik

5.4.5 wc Command

The wc command is a filter used to count the number of lines, words and characters in one or more files. Its general format is:

    wc [options] [input files]

• It takes one or more filenames as its arguments.
• It gives the output in 4 columns: the first column indicates the number of lines, the second column indicates the number of words, the third column indicates the number of characters and the last column indicates the filename.

wc Options

    -c : Prints the character count, including newlines (\n).
    -l : Prints the line count.
    -L : Prints the length of the longest line.
    -w : Counts words delimited by white space characters or newline characters.

Example 5.8 : wc

    $ wc stdlist
    5 19 90 stdlist
    $ wc -c stdlist
    90 stdlist
    $ wc -l stdlist
    5 stdlist
    $ wc -L stdlist
    17 stdlist
    $ wc -w stdlist
    19 stdlist
    $ wc list1 list2
    5 13 57 list1
    5 12 23 list2

5.4.6 tr - Translating Characters Command

The tr command is used to translate characters. It is a filter that manipulates individual characters in a line. The general syntax of the tr command is:

    tr [options] [expression1] [expression2] < [input file]

where tr takes the input from the input file and replaces the characters of expression1 with those of expression2. A simple form of tr takes two arguments. Each argument may be a character or a string of characters.

• The behaviour of the tr command can be explained with an example:

Example 5.9 : tr

    $ cat list1
    A friend in need is a friend indeed.
    $ tr 'frie' 'FRI' < list1
    A FRIInd In nIId Is a FRIInd IndIId.
    $ tr 'fri' 'FRIE' < list1
    A FRIend In need Is a FRIend Indeed.
In the first example:
    all f are replaced by F
    all r are replaced by R
    all i are replaced by I
Since the number of characters in the second string is less than that of the first string, both i and e are replaced with the last character, I, of the second string.

In the second example:
    all f are replaced with F
    all r are replaced with R
    all i are replaced with I
and the extra character E in the second argument is neglected.

• A range of characters can be specified using the hyphen (-) character.

tr Options

    -c : Complements the set of characters specified by the first string, i.e., it matches all the characters that are not found in the first string and replaces the matched characters with the characters in the second string. In the example, all the spaces, the dot and the newline character are replaced with hyphens.
    -d : Deletes all occurrences of input characters that are specified by the first string.
    -s : Replaces repeated characters with a single character. In the example, multiple spaces have been squeezed to a single space.

The tr command does not affect the contents of the original file. Therefore the translated output can be redirected to another file, if required.

Example 5.10 : tr

    $ cat list2
    The Woods are lovely.
    $ tr W g < list2
    The goods are lovely.
    $ tr 'a-z' 'A-Z' < list2
    THE WOODS ARE LOVELY.
    $ tr 'Wo' 'wO' < list2 > newlist
    $ cat newlist
    The wOOds are lOvely.
    $ tr -d o < list2
    The Wds are lvely.
    $ tr -d 'rie' < list1
    A fnd n nd s a fnd ndd.
    $ cat list3
    Fruits   are   tasty.
    $ tr -c 'a-zA-Z' '-' < list3
    Fruits---are---tasty--
    $ cat list4
    This   Line    Contains     spaces.
    $ tr -s ' ' < list4
    This Line Contains spaces.
    $ who | tr 'a-z' 'A-Z'
    ROOT   :0     JAN 24 11:05
    RAMA   PTS/1  JAN 24 12:40
    STUD2  PTS/2  JAN 24 12:50

5.4.7 cat

The cat command has already been discussed in the earlier chapters. It is used to display the contents of a file, create a new file, append data to an existing file and concatenate two or more files. The cat command thus allows you to view, modify or combine files.
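The uses of cat listed above can be sketched in a short script. The file names part1, part2 and whole are hypothetical, and a here-document stands in for the lines that would normally be typed at the keyboard and ended with Ctrl-D.

```shell
# Hypothetical file names, created in a scratch directory.
tmpdir=$(mktemp -d)

# Create a new file; the here-document replaces keyboard input.
cat > "$tmpdir/part1" <<'EOF'
first line
EOF

# Append data to the existing file.
cat >> "$tmpdir/part1" <<'EOF'
second line
EOF

printf 'third line\n' > "$tmpdir/part2"

# Concatenate the two files into a third one, then display it.
cat "$tmpdir/part1" "$tmpdir/part2" > "$tmpdir/whole"
whole_text=$(cat "$tmpdir/whole")
echo "$whole_text"

rm -r "$tmpdir"
```

Note that > overwrites the target file while >> appends to it, which is why the second line is added with >>.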
cat Options

    Option | Meaning
    -e     | Print $ at end of line.
    -n     | Displays each line of output with its line number at the beginning.
    -s     | Silent (no error messages). On GNU cat, -s instead squeezes repeated blank lines.
    -t     | Print tabs and form feeds.
    -u     | Output is not buffered. Default is buffered output.
    -v     | Print control characters.
    -b     | Numbers the lines (like -n) but omits the line numbers of blank lines.

5.5 REGULAR EXPRESSIONS

A regular expression is defined as a pattern, consisting of a sequence of characters, which is to be matched against a given text.

In Unix, searching could mean searching for one or more records in a database, one or more lines in a text file, or a word, etc. In such cases, patterns that contain part of the target word or phrase to be searched are formed and used during the search process. In Unix these patterns are strings of characters known as regular expressions.

A regular expression consists of atoms and operators, just like a mathematical expression, which contains operands (data) and operators. The atom specifies what to look for and where in the text it is to be matched. The operator, which is not required in all expressions, is used to combine atoms into complex expressions.

5.6 ATOMS

An atom specifies what text is to be matched and where it is to be found. An atom in a regular expression can be of the following types:

1. Single character
2. A dot
3. A class
4. An anchor

[Figure 5.4 Atom types: single character, dot, class, anchor]

5.6.1 Single Character

The simplest atom is a single character. If a regular expression contains one single character, that character is searched for in the text; if it is found, the pattern match is successful. If it is not found, the pattern match is unsuccessful. Figure 5.5 demonstrates a single-character pattern match.

[Figure 5.5 Single-character pattern match: (a) successful pattern match against the string UNIX, (b) unsuccessful pattern match against the string UNIX]

5.6.2 Dot

The meta-character dot is used to match any single character except a newline character (\n). By itself, a dot is of little use since it matches everything. However, it becomes powerful when combined with other atoms in a regular expression. For example, the regular expression

    h.

will match all pairs of characters where the first character is 'h'. Therefore it matches hb, he ... hx, h1, h2, etc. (Fig. 5.6).

[Figure 5.6 Dot atom examples]

5.6.3 Class

Sometimes it is required to match a character from a set of characters. Such a set is known as a character class. The character set used in the matching process is enclosed in square brackets. Figure 5.7 shows an example of a class in a regular expression.

[Figure 5.7 Class atom example, using the string LINUX]

The character class set is a very powerful expression component, and its power is further extended by using 3 additional tokens:

• Ranges (-)
• Exclusion (^)
• Escape (\) characters

A range of text characters is indicated by using a hyphen (-), i.e., the expression [a-z] indicates that the characters a to z are all included within the set (the set of lowercase alphabets).

The exclusion character (^) is used to specify the characters to be excluded from the set. The caret (^), which is the UNIX not operator, is used for this purpose. For example, to specify any character other than the vowels, the regular expression would be [^aeiou].
The escape character (\) is used when the matching character is one of the other two tokens. For example, to use - (hyphen) as a character in the regular expression and not as a token, it can be written as:

    [aeiou\-]

The characters to be matched will be a, e, i, o, u and -. Note that in the POSIX bracket expressions used by grep, a backslash inside the brackets is taken literally, so the portable way to include a hyphen is to place it first or last in the set, as in [aeiou-]. Figure 5.8 illustrates examples of the class atom.

    Regular Expression | Purpose
    [A-Z]              | Any uppercase alphabet
    [a-z]              | Any lowercase alphabet
    [0-9]              | Any digit
    [0-9A-Z]           | Any digit or any capital letter
    [0-9\-]            | A digit or hyphen
    [^0-9]             | Any character except a digit
    [^XYZ]             | Any character except X, Y or Z
    [^*]               | Anything except *

Figure 5.8 Character class examples

5.6.4 Anchors

Anchors are atoms that are not matched with the text, but define where the next character in the pattern must be located in the text. There are 4 types of anchors (Fig. 5.9).

    Anchor | Meaning
    ^      | The caret character. Searches for lines beginning with the specified pattern.
    $      | The dollar character. Searches for all lines ending with the specified pattern.
    \<     | Searches for words beginning with the specified pattern.
    \>     | Searches for words ending with the given pattern.

Figure 5.9 Types of anchor

To search for all words ending with g, the regular expression would be 'g\>'. Similarly, to search for words starting with the pattern 'Ind', the regular expression would be '\<Ind'.

    Regular Expression | Purpose
    Unix               | matches the pattern 'Unix'
    a..b               | matches a, any two characters, and b
    [0-9][0-9]         | matches any two digits
    ^$                 | matches a blank line
    UNIX|unix          | matches UNIX or unix (using the alternation operator |)
    HELLO|hello        | matches HELLO or hello (using the alternation operator)

Figure 5.10 Examples of sequence and alternation operators

5.6.5 Operators

Regular expressions become more powerful when atoms are combined with operators. There are five types of regular expression operators: sequence, alternation, repetition, group and save operators.

The sequence operator means no operator.
That is, if a series of characters is shown in a regular expression, it implies that there is an invisible sequence operator between them (Figure 5.10).

The alternation operator (|) is used to define one or more alternatives. It is usually used for selecting between two or more sequences of characters, as shown in Figure 5.10.

The repetition operator is a set of escaped braces containing two numbers separated by a comma, as shown in Figure 5.11. It specifies that the atom just before the repetition operator may be repeated. The first number (m) indicates the minimum required number of repetitions; the second number (n) indicates the maximum number of times it may appear.

    Syntax: \{m,n\}    Matches the previous character m to n times

    Regular Expression | Matches
    A\{3,5\}           | 'AAA', 'AAAA' or 'AAAAA'
    BA\{2,4\}          | 'BAA', 'BAAA' or 'BAAAA'
    D\{3\}             | only 'DDD'
    D\{3,\}            | 'DDD', 'DDDD' etc. D must be repeated at least 3 times.
    GA\{,3\}           | 'G', 'GA', 'GAA', 'GAAA'. A can be repeated zero to 3 times, but no more.

Figure 5.11 Examples of repetition operator

5.7 grep

Unix provides a special family of commands for handling search requirements, known as the grep family. grep stands for global regular expression print. It is a family of programs or commands used to search input files for all lines that match a specified regular expression, and write them to the standard output file (monitor). Its general syntax is:

    grep [options] [reg exp] [file list]

1. Global specifies that the entire file would be searched for a specific pattern and displayed.
2. grep is invaluable for finding occurrences of a variable in programs, or words in documents, or for selecting parts of the output of a program.
3. grep is a search utility. The only action it performs on a line is to send it to standard output. If the line does not match the regular expression, it is not printed.
4. grep is a filter. It can be used on the left or right-hand side of a pipe.
5. grep cannot be used to add, delete or change a line.
6. grep cannot be used to print only part of a line.
7. grep cannot read only part of a file.

However, many of these limitations can be overcome by combining grep with other utilities.

5.7.1 The grep Command

It is good practice to enclose the regular expression (pattern) used in the grep command within single quotes.

The following database stdlist will be used for all the examples of the grep command:

Example 5.11 : stdlist & list

    $ cat stdlist
    001 : Sneha Iyer : 78 : BCA
    002 : Asha : 60 : BSc
    003 : Madhuri : 80 : BCA
    004 : Lakshmi : 80 : BCA
    005 : Harshita : 77 : BCA
    006 : Archana : 77 : BCA
    007 : Ramya : 78 : BCA
    008 : Rama : 80 : BCA
    009 : Pratibha : 78 : BCA
    010 : Akhila : 45 : BSc
    $ cat list
    001 : sneha Iyer : 78 : BCA
    002 : Asha : 60 : BSc
    003 : Akhila : 45 : BSc

Example 5.12 : grep

    $ grep Sneha stdlist
    001 : Sneha Iyer : 78 : BCA
    $ grep Sneha Iyer stdlist
    grep: Iyer: No such file or directory
    stdlist:001 : Sneha Iyer : 78 : BCA
    $ grep 'Sneha Iyer' stdlist
    001 : Sneha Iyer : 78 : BCA
    $ grep Sarika stdlist        # Sarika is not found
    $ grep Akhila stdlist
    010 : Akhila : 45 : BSc
    $ grep Asha stdlist list
    stdlist:002 : Asha : 60 : BSc
    list:002 : Asha : 60 : BSc

grep can be used to search for a specific pattern in more than one file. When the search is successful (as in the example above), the extracted records are displayed along with their filenames. The filename appears at the beginning of the displayed lines.

5.7.2 grep Options

The most common options used with the grep command are as follows:

    -c : This is the count option. It counts the records or lines that contain the specified pattern in each of the files given as arguments. It displays only the count.
    -e pattern : This option is used to specify multiple search patterns. Each pattern must be preceded by -e. However, using multiple search patterns is more convenient with the fgrep and egrep commands.
    -i : Generally grep differentiates between uppercase and lowercase alphabets. This option ignores case and searches for all patterns specified.
    -l : When the -l option is used, grep displays only the names of the files containing the specified pattern. Both options -i and -l can be used together as -il.
    -n : This option is used to display the line numbers of matching records along with the records.
    -v : This option is known as the inverse option. It prints or displays only those lines or records that do not match the specified pattern.

Example 5.13 : grep options

    $ grep -c 'BSc' stdlist
    2
    $ grep -e Madhuri -e Rama stdlist
    003 : Madhuri : 80 : BCA
    008 : Rama : 80 : BCA
    $ grep -i Sneha stdlist list
    stdlist:001 : Sneha Iyer : 78 : BCA
    list:001 : sneha Iyer : 78 : BCA
    $ grep -il sneha stdlist list
    stdlist
    list
    $ grep -n 'y' stdlist list
    stdlist:1:001 : Sneha Iyer : 78 : BCA
    stdlist:7:007 : Ramya : 78 : BCA
    list:1:001 : sneha Iyer : 78 : BCA
    $ grep -v BCA stdlist
    002 : Asha : 60 : BSc
    010 : Akhila : 45 : BSc

The next example, 5.14, illustrates the usage of atoms with the grep command.

Example 5.14 : grep with atoms

    $ ls -l | grep '^d'
    drwxrwx... rama rama 4096 Jan 6 16:16 testdir

displays all the directories in the current directory.

    $ cat phonelist
    Chitra : 23462346
    Jyothi : 25343232
    Meenakshi : 26467251
    Murugan : 22342234
    Harish : 25254321
    $ grep '^M' phonelist
    Meenakshi : 26467251
    Murugan : 22342234

displays all records starting with 'M'.

    $ grep 'BSc$' stdlist
    002 : Asha : 60 : BSc
    010 : Akhila : 45 : BSc

displays all records ending with 'BSc'.

    $ grep '8...$' emplist.lst
    2525 : Pranav Sharma : director : 8500
    3412 : Sonali Deb : manager : 8900

displays records from emplist.lst where the salary lies between 8000 and 8999. The $ anchors the match at the end of the line, so the search is made in the last field of the file.

    $ grep -l 'manager' *.lst
    emplist.lst
    emp1.lst
    emp2.lst

displays all the filenames ending with lst and containing the pattern 'manager'.

    $ grep '^[A-J]' phonelist
    Chitra : 23462346
    Jyothi : 25343232
    Harish : 25254321
    $ grep '[Hh]arish' phonelist
    Harish : 25254321

displays all records with Harish or harish.

The grep family contains three commands, namely:

• grep
• egrep (extended grep)
• fgrep (fast grep)

fgrep (fast grep) supports only string patterns, but no regular expressions.
grep supports a limited number of regular expressions.
egrep (extended grep) supports most of the regular expressions, but not all of them.
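The atoms, operators and grep variants described in this section can be tried out with a short script. This is only a sketch under stated assumptions: the file emp.txt and its three records are hypothetical, and grep -E and grep -F are used as the standard spellings of egrep and fgrep.

```shell
# Hypothetical employee file, created in a scratch directory.
tmpdir=$(mktemp -d)
f="$tmpdir/emp.txt"
printf '%s\n' \
  '2525 : Pranav Sharma : director : 8500' \
  '3412 : Sonali Deb : manager : 8900' \
  '4100 : harish K. : clerk : 4500' > "$f"

# Anchor ($) and dot atoms: the line ends with 8 followed by any three characters.
high_paid=$(grep -c '8...$' "$f")

# Class atom: matches Harish or harish.
harish=$(grep -c '[Hh]arish' "$f")

# Repetition operator: the line starts (^) with four digits followed by a space.
four_digit=$(grep -c '^[0-9]\{4\} ' "$f")

# Alternation is written with extended regular expressions (egrep, or grep -E).
either=$(grep -E -c 'director|clerk' "$f")

# Fixed-string search (fgrep, or grep -F): the dot is an ordinary character here.
literal_dot=$(grep -F -c 'K.' "$f")

echo "$high_paid $harish $four_digit $either $literal_dot"
rm -r "$tmpdir"
```

Each pattern is enclosed in single quotes so that the shell does not interpret characters such as $, | and \ before grep sees them.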
