
1/24/2020 Regular Expression - Java Programming Tutorial

yet another insignificant programming notes...

Java Programming Tutorial
Regular Expression (Regex) in Java

TABLE OF CONTENTS

1.  Introduction
2.  Package java.util.regex (JDK 1.4)
3.  Java Regex by Examples
3.1  Example: Find Text
3.2  Example: Find Pattern (Expressed in Regular Expression)
3.3  Example: Find and Replace Text
3.4  Example: Find and Replace with Back References
3.5  Example: Rename Files of a Given Directory
4.  Other Usages of Regex in Java
4.1  The String.split() Method
4.2  The Scanner & useDelimiter()

1.  Introduction
Regular Expression (regex) is extremely useful in programming, especially in processing text files.

I assume that you are familiar with regex and Java. Otherwise, read up on the regex syntax at:
1. My article on "Regular Expressions".
2. The Online Java Tutorial Trail on "Regular Expressions".
3. JavaDoc for java.util.regex Package.
4. JavaDoc for java.util.regex.Pattern Class, which summarizes the regex patterns.

2.  Package java.util.regex (JDK 1.4)


Regular expression support was introduced in Java 1.4 in package java.util.regex. The package's two main classes are:
1. java.util.regex.Pattern: represents a compiled regular expression. You can get a Pattern object via the static method Pattern.compile(String regexStr).
2. java.util.regex.Matcher: an engine that performs matching operations on an input CharSequence (such as String, StringBuffer, StringBuilder, CharBuffer, Segment) by interpreting a pattern.

The steps are:

String regexStr = "......";  // Regex String
String inputStr = "......";  // Input for matching, any CharSequence such as String, StringBuffer, StringBuilder, CharBuffer

// Step 1: Compile a Regex String into a Pattern object
Pattern pattern = Pattern.compile(regexStr);
// Step 2: Allocate a matching engine for the regex pattern, bound to the input string
Matcher matcher = pattern.matcher(inputStr);
// Step 3: Perform the matching and process the matching result
// Perform matching operations using:
//   matcher.find()      : scans the input sequence looking for the next subsequence that matches the pattern
//   matcher.matches()   : attempts to match the entire input sequence
//   matcher.lookingAt() : attempts to match the input sequence, starting at the beginning, against the pattern
//   matcher.replaceAll(replacementStr)   : finds and replaces all matches
//   matcher.replaceFirst(replacementStr) : finds and replaces the first match
// Process the matching result using:
//   matcher.group() : returns the input subsequence matched by the previous match
//   matcher.start() : returns the start index of the previous match
//   matcher.end()   : returns the offset after the last character matched

Check out the JavaDoc for Package java.util.regex.
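
As a minimal sketch of the three steps above, the following program counts the matches in an input. The class name RegexStepsSketch and the helper countMatches() are made up for illustration and are not part of the JDK. The input is passed as a StringBuilder to show that any CharSequence is accepted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexStepsSketch {
    // Hypothetical helper: counts the non-overlapping matches of regexStr in input
    public static int countMatches(String regexStr, CharSequence input) {
        Pattern pattern = Pattern.compile(regexStr);  // Step 1: compile the regex
        Matcher matcher = pattern.matcher(input);     // Step 2: bind the input
        int count = 0;
        while (matcher.find()) {                      // Step 3: perform the matching
            ++count;
        }
        return count;
    }

    public static void main(String[] args) {
        StringBuilder input = new StringBuilder("one two  three");
        System.out.println(countMatches("\\w+", input));  // 3 words
        System.out.println(countMatches("\\s+", input));  // 2 runs of whitespace
    }
}
```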

3.  Java Regex by Examples


https://www.ntu.edu.sg/home/ehchua/programming/java/Java_Regexe.html 1/6

3.1  Example: Find Text


For example, given the input "This is an apple. These are 33 (thirty-three) apples.", you wish to find all occurrences of
pattern "Th" (either case-sensitive or case-insensitive).

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestRegexFindText {
    public static void main(String[] args) {

        // Input String for matching the regex pattern
        String inputStr = "This is an apple. These are 33 (thirty-three) apples.";
        // Regex to be matched
        String regexStr = "Th";

        // Step 1: Compile a regex via static method Pattern.compile(), default is case-sensitive
        Pattern pattern = Pattern.compile(regexStr);
        // Pattern.compile(regex, Pattern.CASE_INSENSITIVE);  // for case-insensitive matching

        // Step 2: Allocate a matching engine from the compiled regex pattern,
        //         and bind to the input string
        Matcher matcher = pattern.matcher(inputStr);

        // Step 3: Perform matching and process the matching results

        // Try Matcher.find(), which finds the next match
        while (matcher.find()) {
            System.out.println("find() found substring \"" + matcher.group()
                    + "\" starting at index " + matcher.start()
                    + " and ending at index " + matcher.end());
        }

        // Try Matcher.matches(), which tries to match the entire input string
        if (matcher.matches()) {
            System.out.println("matches() found substring \"" + matcher.group()
                    + "\" starting at index " + matcher.start()
                    + " and ending at index " + matcher.end());
        } else {
            System.out.println("matches() found nothing");
        }

        // Try Matcher.lookingAt(), which tries to match from the beginning of the input string
        if (matcher.lookingAt()) {
            System.out.println("lookingAt() found substring \"" + matcher.group()
                    + "\" starting at index " + matcher.start()
                    + " and ending at index " + matcher.end());
        } else {
            System.out.println("lookingAt() found nothing");
        }
    }
}

Output
find() found substring "Th" starting at index 0 and ending at index 2
find() found substring "Th" starting at index 18 and ending at index 20
matches() found nothing
lookingAt() found substring "Th" starting at index 0 and ending at index 2

How It Works
Three steps are required to perform regex matching:
1. Allocate a Pattern object. The Pattern class has no public constructor. Instead, you invoke the static method Pattern.compile(regexStr) to compile the regexStr, which returns a Pattern instance.
2. Allocate a Matcher object (a matching engine). Again, the Matcher class has no public constructor. Instead, you invoke the matcher(inputStr) method of the Pattern instance (created in Step 1), which binds the input string to this Matcher.
3. Use the Matcher instance (created in Step 2) to perform the matching and process the matching result. The Matcher class provides a few boolean methods for performing the matches:
   boolean find(): scans the input sequence to look for the next subsequence that matches the pattern. If a match is found, you can use group(), start() and end() to retrieve the matched subsequence and its starting and ending indices, as shown in the above example.
   boolean matches(): tries to match the entire input sequence against the regex pattern. It returns true only if the entire input sequence matches the pattern. That is, it behaves as if the pattern were enclosed by the begin and end position anchors ^ and $.
   boolean lookingAt(): tries to match the input sequence, starting from the beginning, against the regex pattern. It returns true if a prefix of the input sequence matches the pattern. That is, it behaves as if the pattern began with the begin position anchor ^.
To perform case-insensitive matching, use Pattern.compile(regexStr, Pattern.CASE_INSENSITIVE) to create the Pattern instance (as commented out in the above example).
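
To compare the three boolean methods side by side, here is a small sketch. The class MatchModesSketch and the helper modes() are hypothetical names chosen for this illustration. The helper compiles the regex with Pattern.CASE_INSENSITIVE and reports the results of find(), lookingAt() and matches(), each on a fresh Matcher.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

public class MatchModesSketch {
    // Hypothetical helper: returns {find, lookingAt, matches} for a case-insensitive regex
    public static boolean[] modes(String regexStr, String inputStr) {
        Pattern pattern = Pattern.compile(regexStr, Pattern.CASE_INSENSITIVE);
        return new boolean[] {
            pattern.matcher(inputStr).find(),       // match anywhere in the input
            pattern.matcher(inputStr).lookingAt(),  // match a prefix of the input
            pattern.matcher(inputStr).matches()     // match the entire input
        };
    }

    public static void main(String[] args) {
        String inputStr = "This is an apple.";
        System.out.println(Arrays.toString(modes("th", inputStr)));     // [true, true, false]
        System.out.println(Arrays.toString(modes("apple", inputStr)));  // [true, false, false]
        System.out.println(Arrays.toString(modes("th.*", inputStr)));   // [true, true, true]
    }
}
```

Only the last pattern covers the whole input, so it is the only one for which matches() returns true.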

3.2  Example: Find Pattern (Expressed in Regular Expression)


The above example, which finds a particular piece of text in an input sequence, is rather trivial. The power of regex is that you can use it to specify a pattern, e.g.,
1. (\w)+ matches any word (delimited by space), where \w is a metacharacter matching any word character [a-zA-Z0-9_], and + is an occurrence indicator for one or more occurrences.
2. \b[1-9][0-9]*\b matches any number with a non-zero leading digit, separated by spaces from other words, where \b is the position anchor for word boundary, [1-9] is a character class for any character in the range of 1 to 9, and * is an occurrence indicator for zero or more occurrences.

Try changing the regex pattern of the above example to the following and observe the outputs. Take note that you need to use the escape sequence '\\' for '\' inside a Java string.

String regexStr = "\\w+";                 // escape sequence \\ for \

String regexStr = "\\b[1-9][0-9]*\\b";

Output for Regex \w+


find() found substring "This" starting at index 0 and ending at index 4
find() found substring "is" starting at index 5 and ending at index 7
find() found substring "an" starting at index 8 and ending at index 10
find() found substring "apple" starting at index 11 and ending at index 16
find() found substring "These" starting at index 18 and ending at index 23
find() found substring "are" starting at index 24 and ending at index 27
find() found substring "33" starting at index 28 and ending at index 30
find() found substring "thirty" starting at index 32 and ending at index 38
find() found substring "three" starting at index 39 and ending at index 44
find() found substring "apples" starting at index 46 and ending at index 52
matches() found nothing
lookingAt() found substring "This" starting at index 0 and ending at index 4

Output for Regex \b[1-9][0-9]*\b


find() found substring "33" starting at index 28 and ending at index 30
matches() found nothing
lookingAt() found nothing

Check out the Javadoc for the Class java.util.regex.Pattern for the list of regular expression constructs supported by Java.
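
A common use of find() with a pattern is to collect every match into a list. The sketch below assumes a hypothetical helper findAll() in a class named FindAllSketch (neither is from the JDK) and reuses the input string from the example above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindAllSketch {
    // Hypothetical helper: returns all non-overlapping matches of regexStr in inputStr
    public static List<String> findAll(String regexStr, String inputStr) {
        List<String> results = new ArrayList<>();
        Matcher matcher = Pattern.compile(regexStr).matcher(inputStr);
        while (matcher.find()) {
            results.add(matcher.group());
        }
        return results;
    }

    public static void main(String[] args) {
        String inputStr = "This is an apple. These are 33 (thirty-three) apples.";
        System.out.println(findAll("\\b[1-9][0-9]*\\b", inputStr));  // [33]
        System.out.println(findAll("\\b[Tt]h\\w*", inputStr));       // [This, These, thirty, three]
    }
}
```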

3.3  Example: Find and Replace Text


Finding a pattern and replacing it with something else is probably one of the most frequent tasks in text processing. Regex allows you to express the pattern liberally, and also the replacement text/pattern. This is extremely useful in batch processing a huge text document or many text files, for example, searching for stock prices in many online HTML files, or renaming many files in a directory according to a certain pattern.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestRegexFindReplace {
    public static void main(String[] args) {
        String inputStr = "This is an apple. These are 33 (Thirty-three) apples.";
        String regexStr = "apple";         // pattern to be matched
        String replacementStr = "orange";  // replacement pattern

        // Step 1: Allocate a Pattern object to compile a regex
        Pattern pattern = Pattern.compile(regexStr, Pattern.CASE_INSENSITIVE);

        // Step 2: Allocate a Matcher object from the pattern, and provide the input
        Matcher matcher = pattern.matcher(inputStr);

        // Step 3: Perform the matching and process the matching result
        //String outputStr = matcher.replaceAll(replacementStr);   // all matches
        String outputStr = matcher.replaceFirst(replacementStr);   // first match only
        System.out.println(outputStr);
    }
}

Output for replaceAll()


This is an orange. These are 33 (Thirty-three) oranges.

Output for replaceFirst()


This is an orange. These are 33 (Thirty-three) apples.

How It Works
First, create a Pattern object to compile a regex pattern. Next, create a Matcher object from the Pattern and bind it to the input string. The Matcher class provides replaceAll(replacementStr) to replace all the matched subsequences with the replacementStr, and replaceFirst(replacementStr) to replace the first match only.
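
Two related points may be worth a quick sketch. First, the String class offers replaceAll() and replaceFirst() shortcuts that take the regex directly; the embedded flag (?i) is used here for case-insensitive matching. Second, because $ is special in a replacement string, Matcher.quoteReplacement() can be used when the replacement should be taken literally. The class StringReplaceSketch and the helper replaceLiteral() are illustrative names, not JDK APIs.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StringReplaceSketch {
    // Hypothetical helper: replaces all case-insensitive matches with a literal
    // replacement, so that '$' and '\' in it are not interpreted
    public static String replaceLiteral(String inputStr, String regexStr, String literal) {
        Pattern pattern = Pattern.compile(regexStr, Pattern.CASE_INSENSITIVE);
        return pattern.matcher(inputStr).replaceAll(Matcher.quoteReplacement(literal));
    }

    public static void main(String[] args) {
        String inputStr = "This is an apple. These are 33 (Thirty-three) apples.";

        // Shortcuts in the String class; (?i) turns on case-insensitive matching
        System.out.println(inputStr.replaceAll("(?i)apple", "orange"));
        System.out.println(inputStr.replaceFirst("(?i)apple", "orange"));

        // A literal "$1" in the output, not a back reference
        System.out.println(replaceLiteral("an Apple", "apple", "$1 orange"));  // an $1 orange
    }
}
```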

3.4  Example: Find and Replace with Back References


Given the input "One:two:three:four", the following program produces "four-three-two-One" by matching the 4 words separated by
colons, and uses the so-called parenthesized back-references $1, $2, $3 and $4 in the replacement pattern.

import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class TestRegexBackReference {
    public static void main(String[] args) {
        String inputStr = "One:two:three:four";
        String regexStr = "(.+):(.+):(.+):(.+)";  // pattern to be matched
        String replacementStr = "$4-$3-$2-$1";    // replacement pattern with back references

        // Step 1: Allocate a Pattern object to compile a regex
        Pattern pattern = Pattern.compile(regexStr);

        // Step 2: Allocate a Matcher object from the Pattern, and provide the input
        Matcher matcher = pattern.matcher(inputStr);

        // Step 3: Perform the matching and process the matching result
        String outputStr = matcher.replaceAll(replacementStr);      // all matches
        //String outputStr = matcher.replaceFirst(replacementStr);  // first match only
        System.out.println(outputStr);  // Output: four-three-two-One
    }
}

Parentheses () have two meanings in regex:

1. Grouping sub-expressions: For example, xyz+ matches one 'x', one 'y', followed by one or more 'z'. But (xyz)+ matches one or more groups of 'xyz', e.g., 'xyzxyzxyz'.
2. Parenthesized Back Reference: Provide back references to the matched subsequences. The matched subsequence of the first pair of parentheses can be referred to as $1, the second pair as $2, and so on. In the above example, there are 4 pairs of parentheses, which were referenced in the replacement pattern as $1, $2, $3, and $4. You can use groupCount() (of the Matcher) to get the number of capturing groups in the pattern (group 0, the entire match, is not included in this count), and group(groupNumber), start(groupNumber), end(groupNumber) to retrieve each matched subsequence and its indices. In Java, $0 denotes the entire match. Try the following code and check the output:

while (matcher.find()) {
    System.out.println("find() found substring \"" + matcher.group()
            + "\" starting at index " + matcher.start()
            + " and ending at index " + matcher.end());
    System.out.println("Group count is: " + matcher.groupCount());
    // Group 0 is the entire match; groups 1 to groupCount() are the capturing groups
    for (int i = 0; i <= matcher.groupCount(); ++i) {
        System.out.println("Group " + i + ": substring="
                + matcher.group(i) + ", start=" + matcher.start(i)
                + ", end=" + matcher.end(i));
    }
}

find() found substring "One:two:three:four" starting at index 0 and ending at index 18
Group count is: 4
Group 0: substring=One:two:three:four, start=0, end=18
Group 1: substring=One, start=0, end=3
Group 2: substring=two, start=4, end=7
Group 3: substring=three, start=8, end=13
Group 4: substring=four, start=14, end=18
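
Back references can also be used inside the regex itself, written as \1, \2, and so on (\\1 in a Java string). As a hedged sketch, the following program collapses immediately repeated words; the class RepeatedWordSketch and the method collapseRepeats() are names invented for this illustration.

```java
import java.util.regex.Pattern;

public class RepeatedWordSketch {
    // \\1 inside the regex refers back to the text captured by group 1,
    // so this matches a word followed by one or more repeats of itself
    private static final Pattern REPEATED = Pattern.compile("\\b(\\w+)(\\s+\\1\\b)+");

    // Hypothetical helper: keeps only the first copy ($1) of each repeated word
    public static String collapseRepeats(String inputStr) {
        return REPEATED.matcher(inputStr).replaceAll("$1");
    }

    public static void main(String[] args) {
        System.out.println(collapseRepeats("This is is a test test test."));  // This is a test.
    }
}
```

The trailing \b keeps "the then" intact, because "the" followed by "then" is not a whole-word repeat.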

3.5  Example: Rename Files of a Given Directory


The following program renames all the files ending with ".class" in the specified directory so that they end with ".out" instead.

import java.util.regex.Pattern;
import java.util.regex.Matcher;
import java.io.File;

public class RegexRenameFiles {
    public static void main(String[] args) {
        String regexStr = "\\.class$";    // ending with ".class" (the dot is escaped to match literally)
        String replacementStr = ".out";   // replace with ".out"

        // Allocate a Pattern object to compile a regex
        Pattern pattern = Pattern.compile(regexStr, Pattern.CASE_INSENSITIVE);
        Matcher matcher;

        File dir = new File(".");         // directory to be processed
        int count = 0;
        File[] files = dir.listFiles();   // list all files and directories
        if (files == null) {              // not a directory, or an I/O error occurred
            System.out.println("Cannot list " + dir);
            return;
        }
        for (File file : files) {
            if (file.isFile()) {          // file only, not directory
                String inFilename = file.getName();     // get filename, exclude path
                matcher = pattern.matcher(inFilename);  // allocate Matcher with input
                if (matcher.find()) {
                    ++count;
                    String outFilename = matcher.replaceFirst(replacementStr);
                    System.out.print(inFilename + " -> " + outFilename);

                    // File(parent, child) builds the path with the platform's separator
                    if (file.renameTo(new File(dir, outFilename))) {  // execute rename
                        System.out.println(" SUCCESS");
                    } else {
                        System.out.println(" FAIL");
                    }
                }
            }
        }
        System.out.println(count + " files processed");
    }
}

You can use regex to specify the pattern, and back references in the replacement, as in the previous example.
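
For instance, a rename rule with back references could be sketched as follows. The naming scheme ("report_2019.txt" becomes "2019-report.txt"), the class RenamePlanSketch and the method newName() are assumptions made up for this illustration. The sketch only computes the new name and does not touch the file system.

```java
import java.util.regex.Pattern;

public class RenamePlanSketch {
    // Assumed scheme: <name>_<4-digit year>.txt, e.g. "report_2019.txt"
    private static final Pattern NAME = Pattern.compile("^(\\w+)_(\\d{4})\\.txt$");

    // Hypothetical helper: swaps name and year via back references $1 and $2.
    // replaceFirst() returns the input unchanged if the pattern does not match.
    public static String newName(String filename) {
        return NAME.matcher(filename).replaceFirst("$2-$1.txt");
    }

    public static void main(String[] args) {
        System.out.println(newName("report_2019.txt"));  // 2019-report.txt
        System.out.println(newName("notes.txt"));        // notes.txt
    }
}
```

Computing the names first and printing them as a dry run is one way to check a rule before passing the result to File.renameTo().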


4.  Other Usages of Regex in Java

4.1  The String.split() Method


The String class contains a method split(), which takes a regular expression and splits this String object into an array of Strings.

// In String class
public String[] split(String regexStr)

For example,

public class StringSplitTest {
    public static void main(String[] args) {
        String source = "There are thirty-three big-apple";
        String[] tokens = source.split("\\s+|-");  // whitespace(s) or -
        for (String token : tokens) {
            System.out.println(token);
        }
    }
}

There
are
thirty
three
big
apple
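
One detail that may be worth knowing: split(regexStr) discards trailing empty strings. The String class also has an overload split(String regex, int limit), where a negative limit keeps them and a positive limit caps the number of tokens. The sketch below illustrates this; the class SplitLimitSketch and the helper splitKeepEmpty() are illustrative names only.

```java
import java.util.Arrays;

public class SplitLimitSketch {
    // Hypothetical helper: splits and keeps trailing empty tokens (negative limit)
    public static String[] splitKeepEmpty(String source, String regexStr) {
        return source.split(regexStr, -1);
    }

    public static void main(String[] args) {
        String source = "a,b,,c,,";
        System.out.println(Arrays.toString(source.split(",")));             // [a, b, , c]
        System.out.println(Arrays.toString(splitKeepEmpty(source, ",")));   // [a, b, , c, , ]
        System.out.println(Arrays.toString(source.split(",", 2)));          // [a, b,,c,,]
    }
}
```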

4.2  The Scanner & useDelimiter()


The Scanner class, by default, uses whitespace as the delimiter in parsing input tokens. You can set the delimiter to a regex via the useDelimiter() methods:

public Scanner useDelimiter(Pattern pattern)


public Scanner useDelimiter(String pattern)

For example,

import java.util.Scanner;

public class ScannerUseDelimiterTest {
    public static void main(String[] args) {
        String source = "There are thirty-three big-apple";
        Scanner in = new Scanner(source);
        in.useDelimiter("\\s+|-");  // whitespace(s) or -
        while (in.hasNext()) {
            System.out.println(in.next());
        }
        in.close();  // release the scanner when done
    }
}
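
A regex delimiter also works with the typed methods such as hasNextInt() and nextInt(). As a small sketch (the class ScannerNumbersSketch and the helper parseInts() are hypothetical names), the following reads integers separated by commas or semicolons with optional surrounding whitespace:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class ScannerNumbersSketch {
    // Hypothetical helper: reads leading integers separated by ',' or ';'
    public static List<Integer> parseInts(String source) {
        List<Integer> numbers = new ArrayList<>();
        try (Scanner in = new Scanner(source)) {  // closed automatically
            in.useDelimiter("\\s*[,;]\\s*");      // comma or semicolon, optional whitespace around
            while (in.hasNextInt()) {             // stops at the first non-integer token
                numbers.add(in.nextInt());
            }
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(parseInts("10, 20 ;30,40"));  // [10, 20, 30, 40]
    }
}
```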

REFERENCES & RESOURCES

Latest version tested: JDK 10


Last modified: November, 2018

Feedback, comments, corrections, and errata can be sent to Chua Hock-Chuan (ehchua@ntu.edu.sg)
