Chapter 5: Understanding Text Processing

The Complete Guide to Linux System Administration

Objectives
Use regular expressions in a variety of circumstances
Manipulate text files in complex ways using multiple command-line utilities
Use advanced features of the vi editor
Use the sed and awk text processing utilities

Regular Expressions
Flexible way to encode many types of complex patterns
Use to define pattern in many situations
Parameter to many Linux commands
Within vi editor
Within programming languages
Including shell scripts

Used to match patterns in text

Regular Expressions (continued)


Acceptable syntax varies in small but important ways
Depending on where expression used

Examples:
[Rr]eunion[0-9][0-9].jpg
[Rr]eunion[0-9]{2}.jpg
Reunion-[^d].jpg
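
The patterns above can be tried out with grep -E, which is one of several tools that accept this syntax. The following is a minimal sketch: the file names are hypothetical, and the dot is escaped here (an addition to the slide's patterns) so that it matches only a literal period rather than any character.

```shell
# Hypothetical file names used only for illustration.
names='reunion07.jpg
Reunion12.jpg
Reunion-a.jpg
Reunion-d.jpg
reunion7.jpg'

# [Rr] matches either case of the first letter; [0-9]{2} requires
# exactly two digits. \. matches a literal period.
two_digits=$(printf '%s\n' "$names" | grep -E '^[Rr]eunion[0-9]{2}\.jpg$')

# [^d] matches any single character except d.
not_d=$(printf '%s\n' "$names" | grep -E '^Reunion-[^d]\.jpg$')

printf '%s\n' "$two_digits"
printf '%s\n' "$not_d"
```

The {2} repetition form needs extended syntax (grep -E); plain grep would require \{2\}, which is one example of the small syntax differences noted above.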

Manipulating Files
Command-line utilities useful for:
Searching
Sorting
Reorganizing
Otherwise working with text files

Searching for Patterns with grep


grep
Rapidly scan files for specified pattern
Print out lines of text that contain text matching pattern
Take further action on matching lines of text
Using pipe to connect grep with other filtering commands

Searching for Patterns with grep (continued)


Examples:
grep wilson /etc/passwd
grep 'thomas[Cc]orp' *txt
locate tif | grep frame

Often used at end of pipe
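
A minimal sketch of both uses, searching a file directly and filtering at the end of a pipe. The sample file stands in for /etc/passwd and its contents are invented for illustration. Quoting the pattern keeps the shell from expanding characters such as [ and * before grep sees them.

```shell
# Hypothetical sample data standing in for /etc/passwd.
workdir=$(mktemp -d)
printf '%s\n' \
    'root:x:0:0:root:/root:/bin/bash' \
    'wilson:x:1001:1001:T. Wilson:/home/wilson:/bin/bash' \
    'nwilson:x:1002:1002:N. Wilson:/home/nwilson:/bin/sh' \
    'thomas:x:1003:1003:Thomas:/home/thomas:/bin/bash' > "$workdir/passwd"

# Print every line that contains the text "wilson" (case-sensitive).
matches=$(grep 'wilson' "$workdir/passwd")

# grep at the end of a pipe: filter the output of another command.
# -c counts the matching lines instead of printing them.
bash_users=$(cut -d: -f1,7 "$workdir/passwd" | grep -c '/bin/bash$')

printf '%s\n' "$matches"
echo "bash users: $bash_users"
rm -r "$workdir"
```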

Examining File Contents


head and tail commands:
Display first few lines and last few lines of file
By default include 10 lines
-n option
Specify number of lines

Print output to STDOUT


Redirect as needed

Examining File Contents (continued)


tail -f option
Follows file, printing new lines as they are added to file by other programs
Very useful for tracking log files

wc command
Count number of characters, words, and lines
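
The commands on this and the previous slide can be sketched together as follows. The 25-line file is a hypothetical stand-in for a real log or data file. tail -f is shown only as a comment, because it keeps running until interrupted.

```shell
# Hypothetical 25-line file: one number per line.
workdir=$(mktemp -d)
seq 1 25 > "$workdir/numbers.txt"

# -n sets how many lines to show (the default is 10).
first_three=$(head -n 3 "$workdir/numbers.txt")
last_two=$(tail -n 2 "$workdir/numbers.txt")

# wc -l counts lines; -w and -c count words and bytes.
# tr strips the padding that some wc implementations add.
line_count=$(wc -l < "$workdir/numbers.txt" | tr -d ' ')

printf '%s\n' "$first_three" "$last_two"
echo "lines: $line_count"

# To follow a growing log file (runs until you press Ctrl+C):
#   tail -f /var/log/messages

rm -r "$workdir"
```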

Examining File Contents (continued)


strings command
Extracts text strings from file that includes binary and other non-text data
Provides convenient way to check for information that may not be otherwise available

Manipulating Text Files


Filtering
Modify part of text file by adding, removing, or altering data in file
Based on complex rules or patterns
Use command-line programs to filter text files

sort command
Sort all of the lines in text file

uniq command
Remove adjacent duplicate lines in file (sort first to remove all duplicates)
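
Because uniq compares only neighboring lines, it is usually paired with sort. The sketch below uses an invented word list to show the pairing, plus a common counting idiom.

```shell
# Hypothetical input with repeated, unsorted lines.
workdir=$(mktemp -d)
printf '%s\n' pear apple pear banana apple pear > "$workdir/fruit.txt"

# sort groups identical lines together so uniq can collapse them.
unique_sorted=$(sort "$workdir/fruit.txt" | uniq)

# uniq -c prefixes each line with its count; sort -rn orders by that
# count, largest first; awk picks the word from the top line.
most_common=$(sort "$workdir/fruit.txt" | uniq -c | sort -rn | head -n 1 | awk '{ print $2 }')

printf '%s\n' "$unique_sorted"
echo "most common: $most_common"
rm -r "$workdir"
```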

Manipulating Text Files (continued)


diff command
Displays differences between two files
Output format:
< indicates lines that were not found in second file
> indicates lines that were not found in first file

cmp command
Gives quick check of whether two files are identical
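
A short sketch of diff and cmp on two hypothetical three-line files that differ in their second line. Both commands return a nonzero exit status when the files differ, which is why the diff call is followed by "|| true" here.

```shell
# Two hypothetical files that differ only on line 2.
workdir=$(mktemp -d)
printf '%s\n' alpha beta gamma > "$workdir/old.txt"
printf '%s\n' alpha delta gamma > "$workdir/new.txt"

# diff exits with status 1 when the files differ; "|| true" keeps a
# script that runs under "set -e" from stopping at this line.
changes=$(diff "$workdir/old.txt" "$workdir/new.txt" || true)

# cmp -s prints nothing; the exit status alone reports the result.
if cmp -s "$workdir/old.txt" "$workdir/new.txt"; then
    same=yes
else
    same=no
fi

printf '%s\n' "$changes"
echo "identical: $same"
rm -r "$workdir"
```

In the diff output, the line marked < comes from the first file and the line marked > from the second.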

Manipulating Text Files (continued)


comm command
Compares two sorted files line by line
Output shows lines found only in first file, lines found only in second file, and lines common to both
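
As a minimal sketch, comm prints three columns (only in first file, only in second file, in both), and the -1, -2, and -3 options suppress the corresponding column. The two input files are hypothetical and already sorted.

```shell
# Two hypothetical sorted files with some lines in common.
workdir=$(mktemp -d)
printf '%s\n' apple banana cherry > "$workdir/a.txt"
printf '%s\n' banana cherry date > "$workdir/b.txt"

# -12 hides columns 1 and 2, leaving only the lines common to both.
in_both=$(comm -12 "$workdir/a.txt" "$workdir/b.txt")

# -23 hides columns 2 and 3, leaving lines found only in the first file.
only_first=$(comm -23 "$workdir/a.txt" "$workdir/b.txt")

printf '%s\n' "$in_both"
echo "only in first: $only_first"
rm -r "$workdir"
```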

ispell spell checker
Uses large dictionary to examine text file
Prompts with suggestions

Using sed and awk


sed
Complex filtering program

awk command
Generally used for formatting output

Filtering and Editing Text with sed


sed command
Processes each line in text file according to series of command-line options
Example:
sed -n '/lincoln/p' /tmp/names
Prints to screen all lines of /tmp/names file that contain text lincoln

By default, prints each line to STDOUT
The -n option suppresses this, so only lines selected by p are printed
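
The effect of -n is easiest to see by running the same command with and without it. This sketch uses a hypothetical names file in a temporary directory rather than /tmp/names.

```shell
# Hypothetical stand-in for the /tmp/names file.
workdir=$(mktemp -d)
printf '%s\n' 'abraham lincoln' 'george washington' 'mary todd lincoln' > "$workdir/names"

# With -n, only the lines selected by /lincoln/p are printed.
only_matches=$(sed -n '/lincoln/p' "$workdir/names")

# Without -n, every line is printed once by default and each matching
# line is printed a second time by p: 3 + 2 = 5 lines.
with_default=$(sed '/lincoln/p' "$workdir/names" | wc -l | tr -d ' ')

printf '%s\n' "$only_matches"
echo "lines without -n: $with_default"
rm -r "$workdir"
```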

Filtering and Editing Text with sed (continued)


Substitution command syntax:
/pattern1/s/pattern2/pattern3/g
Watches for lines containing pattern1
Replaces occurrences of pattern2 with pattern3
g option at end of command
Causes sed to replace all occurrences on each line
Means global

Filtering and Editing Text with sed (continued)


Can place operations in file and pass file name to sed command
sed -f nolatin news-article > new_news-article

Ampersand (&) operator within sed command
Refers to text that matches pattern2
s/[0-9]*\.[0-9][0-9]/\$&/g

sed often useful as part of pipeline of Linux commands
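
The & operator can be sketched with a command that puts a dollar sign in front of every decimal amount. The input line is invented, and this version writes the dollar sign without a backslash, since it is an ordinary character in the replacement text.

```shell
# Hypothetical input containing two decimal amounts.
prices='coffee 3.50 and tea 12.25'

# The pattern matches optional digits, a literal period, and two
# digits. In the replacement, & stands for whatever text matched.
with_dollars=$(printf '%s\n' "$prices" | sed 's/[0-9]*\.[0-9][0-9]/$&/g')

echo "$with_dollars"
```

Because sed reads standard input here, the same command drops into the middle of a longer pipeline unchanged.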

Formatting with awk


Processes text
Extracts parts of file
Formats text according to information you provide on command line or in script file

Format output based on fields within line of text
Often can perform same functions with sed or awk

Formatting with awk (continued)


Each field on line is normally separated by whitespace
Can change which character awk uses to separate fields

First field is referred to by $1, second by $2, etc.
Basic format: /pattern/ { actions }
Example: ls -l | awk '{ print $3 $9 }'
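
A minimal sketch of field selection. Fixed sample text stands in for real ls -l output so that the result is predictable; the owner and file names are invented. Note that print $3 $9 joins the two fields with nothing between them, so this sketch uses a comma, which inserts a space.

```shell
# Hypothetical lines in the layout that ls -l produces.
listing='-rw-r--r-- 1 alice staff 120 Jan  5 10:00 notes.txt
lrwxrwxrwx 1 bob staff 9 Jan  6 11:30 latest -> notes.txt'

# Fields are split on runs of whitespace: $3 is the owner and $9 is
# the file name. The comma prints a space between them.
owners=$(printf '%s\n' "$listing" | awk '{ print $3, $9 }')

# -F changes the field separator, here to a colon.
login_shell=$(printf '%s\n' 'root:x:0:0:root:/root:/bin/bash' | awk -F: '{ print $1, $7 }')

printf '%s\n' "$owners"
echo "$login_shell"
```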

Formatting with awk (continued)


Can include regular expression to select which lines awk includes in output:
ls -l | awk '/^l/ {print $3 $9 }'

Use variable or comparison in awk command
Put at beginning of command instead of pattern
ls -l | awk ' $2 > 3 {print $0 }'

Using awk script file:
awk -f awk_command_list text_file
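
The three forms on this slide (regular expression, comparison, and script file) can be sketched side by side. The listing is invented sample text in ls -l layout, and the script file is created in a temporary directory under the slide's name awk_command_list.

```shell
# Hypothetical lines in the layout that ls -l produces.
listing='drwxr-xr-x 5 alice staff 4096 Jan  5 10:00 projects
lrwxrwxrwx 1 bob staff 9 Jan  6 11:30 latest -> notes.txt
-rw-r--r-- 2 alice staff 120 Jan  7 09:15 notes.txt'

# /^l/ selects lines that begin with l, which ls -l uses for symbolic links.
links_only=$(printf '%s\n' "$listing" | awk '/^l/ { print $3, $9 }')

# A comparison in place of the pattern: keep lines whose second
# field (the link count) is greater than 3.
many_links=$(printf '%s\n' "$listing" | awk '$2 > 3 { print $0 }')

# The same kind of program stored in a file and run with -f.
workdir=$(mktemp -d)
printf '%s\n' '/^l/ { print $9 }' > "$workdir/awk_command_list"
from_file=$(printf '%s\n' "$listing" | awk -f "$workdir/awk_command_list")

printf '%s\n' "$links_only" "$many_links" "$from_file"
rm -r "$workdir"
```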

More Advanced Text Editing


vi editor provides advanced text editing features

File Operations in vi
:w command
Write file you are editing

:r file name
Insert another file into file you are editing

:q command
Exit from vi

:wq
Save and exit

Screen Repositioning
Line number and cursor position on line
Shown at bottom right

Use parentheses and curly braces
( and ) move backward or forward one sentence at a time
{ and } move backward or forward one paragraph at a time

Ctrl+f and Ctrl+b key combinations


Move one screen forward and backward

Screen Repositioning (continued)


Shift+G
Takes you to any line in file
Enter line number first, then Shift+G

Mark
Like bookmark
m command followed by name (a-z and 0-9)
Place mark
' (apostrophe) command followed by mark name
Return to mark



Screen Repositioning (continued)


%
Navigate between matching braces, parentheses, etc. in program source code

Shift+J
Joins two lines

More Line-Editing Commands


:h
View vi help file

Ctrl+]
Navigate to hyperlinks in help files

Ctrl+t
Navigate back from links in help files

More Line-Editing Commands (continued)


Forward slash (/)
Search forward from current cursor position
Can use regular expression as search pattern

n key
Move to next occurrence of search pattern

?
Search backwards

N key
Move to previous occurrence of pattern

More Line-Editing Commands (continued)


Search-and-replace operations
Format
:line-number-range s/search-pattern/replacement-text/flags

Example
:1,$ s/^configure/configure/

More Line-Editing Commands (continued)


Shelling out
Execute another Linux command
As if you were at shell prompt
Type :! followed by command
Example: :!ls /etc/samba

Setting vi Options
:set all
View all options currently set in vi
Press spacebar multiple times to see all screens of settings

:set without the word all


Displays all options that current user has set

:set followed by option


To set option

Setting vi Options (continued)


Can automate settings
Define environment variable called EXINIT that contains set command
Executed each time vi started
EXINIT='set nu nosmartindent'

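Place settings in file called .exrc
A .exrc in the current directory (read only when the exrc option is set) overrides information in EXINIT variable
A .exrc in the home directory is read only when EXINIT is not set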

Summary
Regular expressions used in many places to define patterns of information
grep command used to search for lines of text containing pattern defined using regular expression
sed and awk commands support complex scripting language that includes regular expressions

Summary (continued)
vi
Uses complex combinations of commands to reposition cursor within text
Supports search-and-replace operations
set command defines editor settings
