You are on page 1of 8

SDN Contribution

REGEX TOY: TESTING REGULAR EXPRESSIONS IN


ABAP

Summary
Regular expressions are a powerful tool for processing text-based information effectively and efficiently. The
Regex Toy is a small, interactive tool aimed at ABAP developers who want to test their regular expressions
quickly. It improves upon similar tools available on the Web by reflecting specifically the semantics of regular
expressions in ABAP.

Created on: 4 May 2006

Author Bio
Ralph Benzinger joined SAP in 2003. He is currently a developer in the ABAP language core
development group at Walldorf, where he is working on the ABAP compiler and run-time
environment. Ralph received a Ph.D. in Computer Science from Cornell University in the United
States. Prior to joining SAP, he worked as management consultant in the Business Technology
Office of McKinsey & Company.

© 2006 SAP AG 1
Table of Contents
REGEX TOY: TESTING REGULAR EXPRESSIONS IN ABAP ..................................................... 1

Testing Regular Expressions in ABAP ............................................................................................ 2

The Regex Toy ................................................................................................................................ 3

Summary.......................................................................................................................................... 6

Related Content............................................................................................................................... 7

Copyright.......................................................................................................................................... 8

Testing Regular Expressions in ABAP


Since the introduction of regular expressions (or regexes, for short) into ABAP with NetWeaver Release
2004s, their adoption is steadily picking up as more and more developers discover their power and
usefulness in processing text-based data. Information validation, extraction, and transformation often
become a simple matter of a carefully crafted regex that would otherwise require many lines of ABAP code.
But the many benefits of regular expressions are not my concern here.

Instead, I would like to dwell upon the difficulty in writing correct expressions that all users of regexes can
testify to. The complexity of some of the more advanced features such as submatching, combined with the
relative illegibility of regexes, has more than once lead to unexpected results that kept me scratching my
head. In order to track down why my regex does or doesn't match a given text when it really should or
shouldn't I either meditate over the regex pattern until inspiration hits me, or I fiddle around with both pattern
and text by trial-and-error until I figure out what went wrong. Generally, I choose the latter approach and
even wrote my own tool to assist me.

There are, in fact, many such regex matching tools freely available on the Web, the most popular ones being
The Regex Coach, Visual Regexp, and The Regulator (currently MIA). While all of these tools offer great
functionality, they also uniformly suffer from one major defect – they do not adequately reflect the way
regexes work in ABAP.

There are basically two different "schools" when it comes to regular expressions: Perl and POSIX. Both
flavors differ significantly in how matches are computed: whereas POSIX returns the leftmost-longest match,
Perl computes what I call the leftmost-first match. For example, the statement

FIND REGEX '(a+|[ab]+)b'


IN 'xxxaaabaabbaxxx'
MATCH OFFSET off MATCH LENGTH len.

will find the underlined sequence

xxxaaabaabbaxxx

using POSIX-style matching, but would find the underlined sequence

xxxaaabaabbaxxx

instead when using Perl-style pattern matching.

Simply speaking, POSIX uses backtracking to search exhaustively for the longest match for a given regex,
whereas Perl stops as soon as a match is found. The crux is that ABAP uses POSIX-style regexes, while
most other tools – including the ones mentioned above – use Perl-style regexes. Thus, using these tools for
ABAP development can be extremely misleading.

© 2006 SAP AG 2
The Regex Toy
Fortunately, there is no need to worry here – NetWeaver brings along its own regex tool called The Regex
Toy, although you may need to upgrade your 2004s installation to a more recent support package (SP7, I
believe) in order to use it. To invoke the Regex Toy, follow the link in the ABAP online documentation on
regular expressions, or simply start report DEMO_REGEX_TOY. You'll be presented with the following
screen:

Really straightforward, isn't it? You enter your regex, some text and an optional replacement string at the
top, select among a few options, and voila – your matches (or replacements) will be highlighted in red. The
bottom part will also show any submatches that your very first match captured with parentheses.

Let's look at some examples. Suppose we want to redact all bold text from an HTML document, like this:

REPLACE ALL OCCURRENCES OF REGEX '<b>.*</b>'


IN htmltext WITH 'X'.

Playing around with the Regex Toy quickly shows that this pattern will not do:

© 2006 SAP AG 3
We see that the .* after the <b> tag matches greedily anything up to the very last </b> tag – clearly, this is
not what we had in mind. Fixing this is slightly involved, but regex <b>([^<]|<(?!/b>))*</b> will do:

© 2006 SAP AG 4
As an example for extracting information with submatches, suppose that we want to write a regex that
separates a file path such as /foo/bar/quux.txt into the directory part /foo/bar, the base name quux,
and the extension txt. Our first try would be to match anything up to / as the directory, then anything up to
. as the base name, and the remainder as the extension:

FIND REGEX '(.*)/(.*)\.(.*)' IN text


SUBMATCHES dir base ext.

We then use the Regex Toy to test this expression against a number of sample paths. Eventually, we will
discover that our regex breaks for paths that have a dot inside the directory path but no extension:

© 2006 SAP AG 5
As we can see, everything after the dot is put erroneously into the third submatch, i.e., the extension. This
flaw (along with others) is fixed with the improved regex (.*/)?([^.]*)(\..*)?.

Readers interested in the rationale for these two mildly complex regexes should keep an open eye for further
articles on regular expressions on SDN.

Summary
Of course the Regex Toy is mostly that – a toy rather than a tool. It does, however, all that I needed at the
time when I wrote it. Let me know of any killer features that you would like to see included in a future update.
And I'm positively soliciting creative readers to send me a funny cat limerick to replace my crude poetry!

© 2006 SAP AG 6
Related Content
1. ABAP Online Documentation (index search for "regex")

2. Eddy De Clercq: "Express Yourself Regularly", SDN

3. Wikipedia article on Regular Expressions

© 2006 SAP AG 7
Copyright
© Copyright 2006 SAP AG. All rights reserved.

No part of this publication may be reproduced or transmitted in any form or for any purpose without the express permission of SAP AG.
The information contained herein may be changed without prior notice.

Some software products marketed by SAP AG and its distributors contain proprietary software components of other software vendors.

Microsoft, Windows, Outlook, and PowerPoint are registered trademarks of Microsoft Corporation.

IBM, DB2, DB2 Universal Database, OS/2, Parallel Sysplex, MVS/ESA, AIX, S/390, AS/400, OS/390, OS/400, iSeries, pSeries, xSeries,
zSeries, z/OS, AFP, Intelligent Miner, WebSphere, Netfinity, Tivoli, and Informix are trademarks or registered trademarks of IBM
Corporation in the United States and/or other countries.

Oracle is a registered trademark of Oracle Corporation.

UNIX, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.

Citrix, ICA, Program Neighborhood, MetaFrame, WinFrame, VideoFrame, and MultiWin are trademarks or registered trademarks of
Citrix Systems, Inc.

HTML, XML, XHTML and W3C are trademarks or registered trademarks of W3C®, World Wide Web Consortium, Massachusetts
Institute of Technology.

Java is a registered trademark of Sun Microsystems, Inc.

JavaScript is a registered trademark of Sun Microsystems, Inc., used under license for technology invented and implemented by
Netscape.

MaxDB is a trademark of MySQL AB, Sweden.

SAP, R/3, mySAP, mySAP.com, xApps, xApp, SAP NetWeaver, and other SAP products and services mentioned herein as well as their
respective logos are trademarks or registered trademarks of SAP AG in Germany and in several other countries all over the world. All
other product and service names mentioned are the trademarks of their respective companies. Data contained in this document serves
information purposes only. National product specifications may vary.

These materials are subject to change without notice. These materials are provided by SAP AG and its affiliated companies ("SAP
Group") for informational purposes only, without representation or warranty of any kind, and SAP Group shall not be liable for errors or
omissions with respect to the materials. The only warranties for SAP Group products and services are those that are set forth in the
express warranty statements accompanying such products and services, if any. Nothing herein should be construed as constituting an
additional warranty.

These materials are provided “as is” without a warranty of any kind, either express or implied, including but not limited to, the implied
warranties of merchantability, fitness for a particular purpose, or non-infringement.

SAP shall not be liable for damages of any kind including without limitation direct, special, indirect, or consequential damages that may
result from the use of these materials.

SAP does not warrant the accuracy or completeness of the information, text, graphics, links or other items contained within these
materials. SAP has no control over the information that you may access through the use of hot links contained in these materials and
does not endorse your use of third party web pages nor provide any warranty whatsoever relating to third party web pages.

Any software coding and/or code lines / strings (“Code”) included in this documentation are only examples and are not intended to be
used in a productive system environment. The Code is only intended better explain and visualize the syntax and phrasing rules of
certain coding. SAP does not warrant the correctness and completeness of the Code given herein, and SAP shall not be liable for errors
or damages caused by the usage of the Code, except if such damages were caused by SAP intentionally or grossly negligent.

© 2006 SAP AG 8

You might also like