Regular Expressions in Grep (Regex)

https://linuxize.com/post/regular-expressions-in-grep/
grep is one of the most useful and powerful commands in Linux for text processing.
grep searches one or more input files for lines that match a regular expression and
writes each matching line to standard output.
In this article, we’re going to explore the basics of how to use regular expressions in
the GNU version of grep, which is available by default in most Linux operating
systems.
Grep Regular Expression

A regular expression or regex is a pattern that matches a set of strings. A pattern
consists of operators, constructs, literal characters, and meta-characters, which have
special meaning. GNU grep supports three regular expression syntaxes: Basic,
Extended, and Perl-compatible.
In its simplest form, when no regular expression type is given, grep interprets search
patterns as basic regular expressions. To interpret the pattern as an extended regular
expression, use the -E (or --extended-regexp) option.
In GNU’s implementation of grep there is no functional difference between the basic
and extended regular expression syntaxes. The only difference is that in basic regular
expressions the meta-characters ?, +, {, |, (, and ) are interpreted as literal characters.
To keep the meta-characters' special meanings when using basic regular expressions,
the characters must be escaped with a backslash (\). We will explain the meaning of
these and other meta-characters later.
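The difference between the two syntaxes is easiest to see side by side. The following is a minimal sketch that assumes GNU grep; the temporary file and its three sample lines are hypothetical and exist only for illustration:

```shell
# Hypothetical sample input: one line with a literal "+", one with repeated letters.
tmp=$(mktemp)
printf '%s\n' 'a+b' 'aaab' 'b' > "$tmp"

# Basic syntax: "+" is a literal character, so only the line "a+b" matches.
bre_literal=$(grep 'a+b' "$tmp")

# Basic syntax with a backslash: "\+" acts as the "one or more" quantifier.
bre_escaped=$(grep 'a\+b' "$tmp")

# Extended syntax: "+" is a quantifier without any escaping.
ere_plain=$(grep -E 'a+b' "$tmp")

printf 'basic a+b:     %s\n' "$bre_literal"
printf 'basic a\\+b:    %s\n' "$bre_escaped"
printf 'extended a+b:  %s\n' "$ere_plain"

rm -f "$tmp"
```

The first command prints the line "a+b", while the other two print "aaab", because the line "a+b" has no "a" immediately followed by "b".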
Generally, you should always enclose the regular expression in single quotes to avoid
the interpretation and expansion of the meta-characters by the shell.
Literal Matches
The most basic usage of the grep command is to search for a literal character or
series of characters in a file. For example, to display all the lines containing the string
“bash” in the /etc/passwd file, you would run the following command:
grep bash /etc/passwd
The output should look something like this:
root:x:0:0:root:/root:/bin/bash
linuxize:x:1000:1000:linuxize:/home/linuxize:/bin/bash
In this example, the string “bash” is a basic regular expression that consists of four
literal characters. This tells grep to search for a string that has a “b” immediately
followed by “a”, “s”, and “h”.
By default, the grep command is case sensitive. This means that the uppercase and
lowercase characters are treated as distinct.
To ignore case when searching, use the -i option (or --ignore-case).
It is important to note that grep looks for the search pattern as a string, not a word.
So if you were searching for “gnu”, grep will also print the lines where “gnu” is
embedded in larger words, such as “cygnus” or “magnum”.
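Both behaviors can be checked on a small input. This is a sketch only, assuming GNU grep; the sample lines are made up, and -c is used to print the number of matching lines instead of the lines themselves:

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 'GNU grep' 'cygnus' 'magnum' 'linux' > "$tmp"

# Case-sensitive (default): "gnu" is found inside "cygnus" and "magnum",
# but the uppercase "GNU" is not matched.
sensitive=$(grep -c 'gnu' "$tmp")

# With -i the case is ignored, so the "GNU grep" line is counted as well.
insensitive=$(grep -c -i 'gnu' "$tmp")

echo "case-sensitive matches:   $sensitive"
echo "case-insensitive matches: $insensitive"

rm -f "$tmp"
```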
If the search string includes spaces, you need to enclose it in single or double
quotation marks:
grep "Gnome Display Manager" /etc/passwd
Anchoring
Anchors are meta-characters that allow you to specify where in the line the
match must be found.
The ^ (caret) symbol matches the empty string at the beginning of a line. In the
following example, the string “linux” will match only if it occurs at the very beginning
of a line.
grep '^linux' file.txt
The $ (dollar) symbol matches the empty string at the end of a line. To find a
line that ends with the string “linux”, you would use:
grep 'linux$' file.txt
You can also construct a regular expression using both anchors. For example, to find
lines containing only “linux”, run:
grep '^linux$' file.txt
Another useful example is the ^$ pattern that matches all empty lines.
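The four anchored patterns above can be compared on one small file. The sketch below assumes GNU grep and uses invented sample lines, one of which is empty:

```shell
# Hypothetical sample input; the fourth line is empty.
tmp=$(mktemp)
printf '%s\n' 'linux rocks' 'I use linux' 'linux' '' 'unix' > "$tmp"

starts=$(grep -c '^linux' "$tmp")   # lines beginning with "linux"
ends=$(grep -c 'linux$' "$tmp")     # lines ending with "linux"
only=$(grep -c '^linux$' "$tmp")    # lines that are exactly "linux"
empty=$(grep -c '^$' "$tmp")        # empty lines

echo "start: $starts, end: $ends, exact: $only, empty: $empty"

rm -f "$tmp"
```

The line "linux" is counted by the first three patterns, because it begins with, ends with, and consists solely of the string.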
Matching Single Character

The . (period) symbol is a meta-character that matches any single character. For
example, to match anything that begins with “kan” then has two characters and ends
with the string “roo”, you would use the following pattern:
grep 'kan..roo' file.txt
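To see which strings the two periods accept, here is a short sketch, assuming GNU grep and a made-up list of words:

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 'kangaroo' 'kanroo' 'kan--roo' 'kangaroos' > "$tmp"

# Each "." must consume exactly one character, so "kanroo" (nothing between
# "kan" and "roo") does not match, while "kan--roo" does.
dot_matches=$(grep 'kan..roo' "$tmp")
dot_count=$(grep -c 'kan..roo' "$tmp")

printf '%s\n' "$dot_matches"

rm -f "$tmp"
```

Note that "kangaroos" is printed too: the pattern is not anchored, so it only needs to match somewhere within the line.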
Bracket Expressions
Bracket expressions allow you to match any one of a group of characters by enclosing
them in brackets []. For example, to find the lines that contain “accept” or “accent”,
you could use the following expression:
grep 'acce[np]t' file.txt
If the first character inside the brackets is the caret ^, then it matches any single
character not enclosed in the brackets. The following pattern will match any
string that contains “co” followed by any character except “l”, followed by “a”,
such as “coca”, “cobalt” and so on, but will not match the lines containing “cola”:
grep 'co[^l]a' file.txt
Instead of placing characters one by one, you can specify a range of characters inside
the brackets. A range expression is constructed by specifying the first and last
characters of the range separated by a hyphen. For example, [a-e] is equivalent to
[abcde] and [1-3] is equivalent to [123].
The following expression matches each line that starts with a capital letter:
grep '^[A-Z]' file.txt
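The three bracket forms (a list, a negated list, and a range) can be tried together. This sketch assumes GNU grep and an invented word list; LC_ALL=C is set for the range example as a precaution, since the meaning of a range such as [A-Z] can depend on the locale:

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 'accept' 'accent' 'access' 'cola' 'coca' 'cobalt' 'Linux' > "$tmp"

# List: "n" or "p" in the fifth position.
list_count=$(grep -c 'acce[np]t' "$tmp")

# Negated list: any character except "l" between "co" and "a".
negated=$(grep 'co[^l]a' "$tmp")

# Range: lines starting with an uppercase ASCII letter.
capital=$(LC_ALL=C grep '^[A-Z]' "$tmp")

echo "acce[np]t matches $list_count lines"
printf 'co[^l]a matches:\n%s\n' "$negated"
echo "^[A-Z] matches: $capital"

rm -f "$tmp"
```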
grep also supports predefined classes of characters that are enclosed in brackets.
A class name is itself used inside a bracket expression, for example [[:digit:]]. The
following table shows some of the most common character classes:
Character class   Description
[:alnum:]         Alphanumeric characters.
[:alpha:]         Alphabetic characters.
[:blank:]         Space and tab.
[:digit:]         Digits.
[:lower:]         Lowercase letters.
[:upper:]         Uppercase letters.
For a complete list of character classes, check the grep manual.
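As a brief illustration of how the classes are written in practice, the sketch below (assuming GNU grep and made-up sample lines) places each class name inside an outer pair of brackets:

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 'abc' 'abc123' 'ABC' 'a b' > "$tmp"

# Lines containing at least one digit.
with_digit=$(grep '[[:digit:]]' "$tmp")

# Lines that start with an uppercase letter.
starts_upper=$(grep '^[[:upper:]]' "$tmp")

# Lines containing a space or a tab.
with_blank=$(grep '[[:blank:]]' "$tmp")

echo "digit: $with_digit"
echo "upper: $starts_upper"
echo "blank: $with_blank"

rm -f "$tmp"
```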
Quantifiers
Quantifiers allow you to specify the number of occurrences of items that must be
present for a match to occur. The following table shows the quantifiers supported by
GNU grep:
Quantifier   Description
*            Match the preceding item zero or more times.
?            Match the preceding item zero or one time.
+            Match the preceding item one or more times.
{n}          Match the preceding item exactly n times.
{n,}         Match the preceding item at least n times.
{,m}         Match the preceding item at most m times.
{n,m}        Match the preceding item from n to m times.
The * (asterisk) character matches the preceding item zero or more times. The
following will match “right”, “sright”, “ssright” and so on:
grep 's*right' file.txt
Below is a more advanced pattern that matches all lines that start with a capital letter
and end with either a period or a comma. The .* regex matches any number of any
characters:
grep -E '^[A-Z].*[.,]$' file.txt
The ? (question mark) character makes the preceding item optional: it matches
zero or one time. The following will match both “bright” and “right”. The ? character is
escaped with a backslash because we’re using basic regular expressions:
grep 'b\?right' file.txt
Here is the same regex using extended regular expressions:
grep -E 'b?right' file.txt
The + (plus) character matches the preceding item one or more times. The following
will match “sright” and “ssright”, but not “right”:
grep -E 's+right' file.txt
The brace characters {} allow you to specify the exact number, an upper or lower
bound, or a range of occurrences that must occur for a match to happen.
The following matches lines that contain a run of 3 to 9 consecutive digits. Because
the pattern is not anchored, a line with a longer number also matches, since the
number contains such a run:
grep -E '[[:digit:]]{3,9}' file.txt
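The effect of the braces, and of leaving the pattern unanchored, can be checked with a short sketch. It assumes GNU grep and hypothetical sample lines; the second command adds the \b word-boundary expression, described in the section on special backslash expressions, as one possible way to restrict the match to whole numbers:

```shell
# Hypothetical sample input: numbers with 2, 5, and 13 digits.
tmp=$(mktemp)
printf '%s\n' 'id 12' 'id 12345' 'id 1234567890123' > "$tmp"

# Unanchored: the 13-digit number matches too, because it contains
# a run of 3 to 9 digits.
unanchored=$(grep -c -E '[[:digit:]]{3,9}' "$tmp")

# Bounded with \b on both sides: only a whole number of 3 to 9 digits matches.
bounded=$(grep -E '\b[[:digit:]]{3,9}\b' "$tmp")

echo "unanchored matching lines: $unanchored"
echo "bounded match: $bounded"

rm -f "$tmp"
```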
Alternation
The term alternation is a simple “OR”. The alternation operator | (pipe) allows you to
specify different possible matches that can be literal strings or expression sets. This
operator has the lowest precedence of all regular expression operators.
In the example below, we are searching for all occurrences of the words fatal, error,
and critical in the Nginx error log file:
grep 'fatal\|error\|critical' /var/log/nginx/error.log
If you use the extended regular expression, then the operator | should not be
escaped, as shown below:
grep -E 'fatal|error|critical' /var/log/nginx/error.log
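To try alternation without access to a real Nginx log, the following sketch writes a few invented log-like lines to a temporary file and runs both forms against it. It assumes GNU grep; the log content is hypothetical and does not reflect actual Nginx output:

```shell
# Hypothetical log lines, not real Nginx output.
tmp=$(mktemp)
printf '%s\n' \
  '[error] upstream timed out' \
  '[notice] signal process started' \
  '[crit] critical failure' \
  'fatal: cannot bind' > "$tmp"

# Basic syntax: the pipe must be escaped to act as the OR operator.
bre_count=$(grep -c 'fatal\|error\|critical' "$tmp")

# Extended syntax: the pipe is an operator as is.
ere_count=$(grep -c -E 'fatal|error|critical' "$tmp")

echo "basic: $bre_count lines, extended: $ere_count lines"

rm -f "$tmp"
```

Both commands report the same number of lines, since only the "[notice]" line contains none of the three words.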
Grouping
Grouping is a feature of regular expressions that allows you to group patterns
together and reference them as one item. Groups are created using parentheses ().
When using basic regular expressions, the parentheses must be escaped with a
backslash (\).
The following example matches both “fearless” and “less”. The ? quantifier makes the
(fear) group optional:
grep -E '(fear)?less' file.txt
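The same group can be written in both syntaxes. The sketch below assumes GNU grep and a made-up word list, and adds the ^ and $ anchors so that the effect of the optional group is visible; without them, a word such as "careless" would match as well because it contains "less":

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 'fearless' 'less' 'careless' 'fear' > "$tmp"

# Extended syntax: plain parentheses create the group.
ere_group=$(grep -E '^(fear)?less$' "$tmp")

# Basic syntax: the parentheses and the "?" must be escaped.
bre_group=$(grep '^\(fear\)\?less$' "$tmp")

# Unanchored pattern for comparison.
unanchored_count=$(grep -c -E '(fear)?less' "$tmp")

printf 'extended:\n%s\n' "$ere_group"
printf 'basic:\n%s\n' "$bre_group"
echo "unanchored matching lines: $unanchored_count"

rm -f "$tmp"
```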
Special Backslash Expressions

GNU grep includes several meta-characters that consist of a backslash followed by a
regular character. The following table shows some of the most common special
backslash expressions:
Expression   Description
\b           Match a word boundary.
\<           Match an empty string at the beginning of a word.
\>           Match an empty string at the end of a word.
\w           Match a word character (a letter, digit, or underscore).
\s           Match a whitespace character.
The following pattern will match separate words “abject” and “object”. It will not
match the words if embedded in larger words:
grep '\b[ao]bject\b' file.txt
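A quick way to see what the word boundaries exclude is to compare the bounded pattern with the plain one. This is a sketch under the assumption of GNU grep, run on invented sample lines:

```shell
# Hypothetical sample input.
tmp=$(mktemp)
printf '%s\n' 'object' 'abject' 'objection' 'subject' 'an object here' > "$tmp"

# Without boundaries, "objection" matches because it contains "object".
plain=$(grep -c '[ao]bject' "$tmp")

# \b on both sides requires the match to be a whole word.
bounded=$(grep -c '\b[ao]bject\b' "$tmp")

# \< and \> express the same idea with explicit start and end of word.
start_end=$(grep -c '\<[ao]bject\>' "$tmp")

# \s matches whitespace, so only the line with spaces is returned.
with_space=$(grep '\s' "$tmp")

echo "plain: $plain, bounded: $bounded, start/end: $start_end"
echo "contains whitespace: $with_space"

rm -f "$tmp"
```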
Conclusion
Regular expressions are used in text editors, programming languages, and
command-line tools such as grep, sed, and awk. Knowing how to construct regular
expressions can be very helpful when searching text files, writing scripts, or filtering
command output.
If you have any questions or feedback, feel free to leave a comment.