
Regular Expressions

Software Team

What are Regular Expressions?


A regular expression (regex) is a sequence of characters that defines a search pattern for matching text
Used to parse number descriptions and read raw data

Why use Regular Expressions?


Ex: TAPE AND REEL can be written many ways:
TAPE AND REEL, TAPEANDREEL, T AND R, TR, T R, TANDR, T & R, T&R, TAPE & REEL, TAPE&REEL
One regex matches all of them:
T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?
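A short sketch that checks the pattern against every spelling above. Python's `re` module is our choice for illustration; the slides do not say which regex engine our tools use. `fullmatch` requires the pattern to cover the whole string, and matching is case-sensitive unless `re.IGNORECASE` is passed.

```python
import re

# The TAPE AND REEL pattern from the slide, compiled once for reuse.
TAPE_AND_REEL = re.compile(r"T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?")

variants = [
    "TAPE AND REEL", "TAPEANDREEL", "T AND R", "TR", "T R",
    "TANDR", "T & R", "T&R", "TAPE & REEL", "TAPE&REEL",
]

# fullmatch succeeds only when the pattern covers the entire string.
for text in variants:
    print(f"{text!r:16} {TAPE_AND_REEL.fullmatch(text) is not None}")

# Matching is case-sensitive by default; re.IGNORECASE relaxes that.
print(TAPE_AND_REEL.fullmatch("tape and reel") is not None)  # False
print(re.fullmatch(TAPE_AND_REEL.pattern, "tape and reel", re.IGNORECASE) is not None)  # True
```

Note that the pattern is permissive: because every part after the first T is optional except R, it also accepts strings such as TAPER.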

Terminology
String: the text to search through
Index i: a position between the characters of the string; index 0 is just before the first character
Ex:
Regex: KEMET
String: KEMET electronic components
The match runs from index 0 to index 5
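Python's `re` module (used here only for illustration) reports match positions with the same between-characters indexing: `span()` gives the index before the first matched character and the index after the last one.

```python
import re

text = "KEMET electronic components"
match = re.search(r"KEMET", text)

# span() returns (start, end) as positions between characters.
print(match.span())                      # (0, 5)
print(text[match.start():match.end()])   # KEMET
```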

Group: part of a regex enclosed in parentheses
Ex: in T(APE), (APE) is a group
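A minimal Python sketch (language chosen for illustration) showing that a group both marks part of the pattern and captures the text it matched. A group that is made optional with `?` and does not take part in the match captures `None`.

```python
import re

match = re.search(r"T(APE)", "TAPE AND REEL")
print(match.group(0))  # TAPE  (the whole match)
print(match.group(1))  # APE   (the text captured by the group)

# With (APE)? the group is optional; here it matches nothing.
short = re.fullmatch(r"T(APE)?R", "TR")
print(short.group(1))  # None
```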

Regex Components

String Literals
The most basic form of pattern matching
Matches a regex to an exact string or part of a string
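With case-insensitive matching, regex KEM will match kem
Regex CAP matches the string "The shorthand for capacitance is cap" twice: at characters 18 to 20 and 33 to 35 (that is, between indices 18 and 21, and between 33 and 36)
Regex bcd234 will not match the string abcde12345, because bcd is followed by e, not 2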
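A short Python sketch (our choice of language) of the literal-matching examples. `re.finditer` yields every non-overlapping match; the reported end index is the position after the last matched character, so characters 18 to 20 show up as the span (18, 21).

```python
import re

sentence = "The shorthand for capacitance is cap"

# re.IGNORECASE lets the literal CAP match "cap".
spans = [m.span() for m in re.finditer(r"CAP", sentence, re.IGNORECASE)]
print(spans)  # [(18, 21), (33, 36)]

print(re.search(r"CAP", sentence))            # None: matching is case-sensitive by default
print(re.search(r"bcd234", "abcde12345"))     # None: "bcd" is followed by "e", not "2"
```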

Character Classes
Use square brackets [ ] to list a set of characters; the class matches any one of them in a single character position
Simple classes: place characters side by side; each one is an allowed option
[O0]8[O0]5 matches O8O5 and 0805
[O0]8[O0]5 does not match O8O05
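A minimal Python sketch (language chosen for illustration): each `[O0]` accepts either the letter O or the digit 0 in that one position, so mixed spellings such as O805 also match, while an extra character breaks the match.

```python
import re

size_code = re.compile(r"[O0]8[O0]5")

results = {text: size_code.fullmatch(text) is not None
           for text in ["O8O5", "0805", "O805", "O8O05"]}
for text, ok in results.items():
    print(text, ok)
```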

Ranges: shortcuts for defining a character class that covers a range of values
[a-d] matches one of the letters a, b, c, or d
[1-5] matches one digit from 1 to 5
[a-c1-3] matches a, b, c, 1, 2, or 3
pop[2-5] matches pop3 but not pop6 or poptart
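The range examples as a small Python check (language chosen for illustration). Each pattern is tested with `fullmatch`, so a range stands for exactly one character.

```python
import re

checks = [
    (r"[a-d]", "c"), (r"[a-d]", "e"),
    (r"[1-5]", "3"), (r"[1-5]", "6"),
    (r"[a-c1-3]", "b"), (r"[a-c1-3]", "2"), (r"[a-c1-3]", "d"),
    (r"pop[2-5]", "pop3"), (r"pop[2-5]", "pop6"), (r"pop[2-5]", "poptart"),
]

results = [(p, t, re.fullmatch(p, t) is not None) for p, t in checks]
for pattern, text, ok in results:
    print(f"{pattern:10} {text:8} {ok}")
```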

Predefined Character Classes


Character class shortcuts
1\d3 matches 123 and 183, but not 1a3
5\D8 matches 5!8 and 5a8, but not 528
\w matches A, _, and 1, but not * or !
\W2 matches &2 and %2, but not J2 or 32

Construct   Description
\d          A digit: [0-9]
\D          A non-digit: [^0-9]
\w          A word character: [a-zA-Z_0-9]
\W          A non-word character
\s          A whitespace character
\S          A non-whitespace character
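A Python sketch of the shortcut examples (language chosen for illustration). In Python 3, `\d`, `\w` and `\s` also match non-ASCII digits, letters and spaces by default; the `re.ASCII` flag restricts them to the ASCII definitions in the table.

```python
import re

examples = {
    r"1\d3": ["123", "183", "1a3"],
    r"5\D8": ["5!8", "528", "5a8"],
    r"\w": ["*", "A", "!", "_", "1"],
    r"\W2": ["&2", "J2", "%2", "32"],
}

# Keep only the candidates that each pattern matches in full.
results = {
    pattern: [c for c in candidates if re.fullmatch(pattern, c, re.ASCII)]
    for pattern, candidates in examples.items()
}
for pattern, matched in results.items():
    print(f"{pattern}: {matched}")
```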

Metacharacters
Special characters that change how a pattern is matched
b.t matches bat, bgt, b2t, etc., since . matches any single character
^Volts matches "Volts and Amps" but not "10 Volts"
volt$ matches "1 volt" but not "voltage"
[^456] matches 3 or 9, but not 4
(CAP)|(IND) matches CAP or IND

Metacharacter   Description
. (period)      Any character
^ (caret)       Start of string
$ (dollar)      End of string
[^...]          Negation: any character not listed
x|y             OR operator: x or y
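The metacharacter examples as a Python sketch (language chosen for illustration). `re.search` looks anywhere in the string, so the anchors `^` and `$` are what pin the match to the start or end. By default `.` does not match a newline.

```python
import re

# . matches any single character (except a newline, by default).
dot_hits = [w for w in ["bat", "bgt", "b2t", "bt"] if re.fullmatch(r"b.t", w)]
print(dot_hits)  # ['bat', 'bgt', 'b2t']

# ^ anchors the match to the start of the string, $ to the end.
print(re.search(r"^Volts", "Volts and Amps") is not None)  # True
print(re.search(r"^Volts", "10 Volts") is not None)        # False
print(re.search(r"volt$", "1 volt") is not None)           # True
print(re.search(r"volt$", "voltage") is not None)          # False

# [^456] matches one character that is not 4, 5 or 6.
negated_hits = [c for c in "439" if re.fullmatch(r"[^456]", c)]
print(negated_hits)  # ['3', '9']

# | is OR: either alternative may match.
or_hits = [w for w in ["CAP", "IND", "RES"] if re.fullmatch(r"(CAP)|(IND)", w)]
print(or_hits)  # ['CAP', 'IND']
```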

Quantifiers
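Specify the number of occurrences of the preceding element
(\+|-)?10% matches +10%, -10%, and 10% (the plus sign must be escaped as \+, since a bare + is itself a quantifier)
12\.3(4)* matches 12.3, 12.34, 12.344, 12.3444, etc., but not 12. or 12 (the decimal point is escaped as \. because an unescaped . matches any character)
0\.(3)+ W matches 0.3 W, 0.33 W, 0.333 W, etc.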

Quantifier   Meaning
X?           X, 0 or 1 occurrences
X*           X, 0 or more occurrences
X+           X, 1 or more occurrences
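A Python sketch of the quantifier examples (language chosen for illustration). It also shows why the escapes matter: Python rejects `(+|-)?10%` because the bare `+` has nothing to repeat, and an unescaped `12.3(4)*` would accept 1203.

```python
import re

# A bare + at the start of a group has nothing to repeat, so compiling fails.
try:
    re.compile("(+|-)?10%")
except re.error as err:
    print("invalid pattern:", err)

tolerance = re.compile(r"(\+|-)?10%")  # \+ is a literal plus sign
decimal = re.compile(r"12\.3(4)*")     # \. is a literal decimal point
watts = re.compile(r"0\.(3)+ W")       # + requires at least one 3

tol_hits = [t for t in ["+10%", "-10%", "10%", "*10%"] if tolerance.fullmatch(t)]
dec_hits = [t for t in ["12.3", "12.34", "12.3444", "12.", "12", "1203"] if decimal.fullmatch(t)]
watt_hits = [t for t in ["0.3 W", "0.333 W", "0. W"] if watts.fullmatch(t)]
print(tol_hits)   # ['+10%', '-10%', '10%']
print(dec_hits)   # ['12.3', '12.34', '12.3444']
print(watt_hits)  # ['0.3 W', '0.333 W']
```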

Building the TAPE AND REEL regex step by step
T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?
TAPE AND REEL, TAPEANDREEL, T AND R, TR, T R, TANDR, T & R, T&R, TAPE & REEL, TAPE&REEL

T(APE): the literal T followed by the group APE
T(APE)?: the APE group is optional
T(APE)?(\s): then a whitespace character
T(APE)?(\s)*: any number of whitespace characters, including none
T(APE)?(\s)*((AND)|&): then either AND or &
T(APE)?(\s)*((AND)|&)?: the AND or & is optional
T(APE)?(\s)*((AND)|&)?(\s)*: then optional whitespace again
T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?: finally R, optionally followed by EEL
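To see what each step of the build-up contributes, this Python sketch (language chosen for illustration) runs every intermediate pattern with `re.match`, which anchors at the start of the string, and records how much of the input it consumes. On "TR" the steps without `?` or `*` fail outright, which is why those quantifiers are needed.

```python
import re

# Each step of the build-up, from the first group to the full pattern.
steps = [
    r"T(APE)",
    r"T(APE)?",
    r"T(APE)?(\s)",
    r"T(APE)?(\s)*",
    r"T(APE)?(\s)*((AND)|&)",
    r"T(APE)?(\s)*((AND)|&)?",
    r"T(APE)?(\s)*((AND)|&)?(\s)*",
    r"T(APE)?(\s)*((AND)|&)?(\s)*R(EEL)?",
]

def prefixes(text):
    """Return what each step matches at the start of text (None if it fails)."""
    results = []
    for step in steps:
        m = re.match(step, text)
        results.append(m.group(0) if m else None)
    return results

for text in ["TAPE AND REEL", "TR"]:
    print(text)
    for step, prefix in zip(steps, prefixes(text)):
        print(f"  {step:36} -> {prefix!r}")
```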

Recap
We use regexes as patterns to search through text
Building blocks: character classes, metacharacters, and quantifiers
Regexes can get complicated! This one matches Roman numerals:
M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})
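A Python sketch (language chosen for illustration) that tries the Roman numeral pattern on a few inputs. The `{m,n}` quantifier means "between m and n occurrences", so `M{0,4}` allows up to four thousands. Because every part is optional, the empty string also matches; a real validator would need to reject it separately.

```python
import re

# The recap pattern: thousands, hundreds, tens, ones.
ROMAN = re.compile(r"M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})")

results = {n: ROMAN.fullmatch(n) is not None
           for n in ["XLII", "MCMXCIV", "MMMMCMXCIX", "IIII", "VX", ""]}
for numeral, ok in results.items():
    print(repr(numeral), ok)
```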

So, why is this important for us?


In general, regexes are hard to write by hand
Regexes are a general-purpose tool, not specific to our industry
Value Expressions
10 pF, 100 nF, .1 uF
\vFarad
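The definition of the \vFarad shortcut lives in the definition manager and is not shown in these slides. As a rough, hypothetical sketch of the kind of plain regex such a shortcut stands in for, the Python code below (the pattern, the `PREFIX` table and `to_farads` are all our own illustrations) accepts values like 10 pF, 100 nF and .1 uF and converts them to farads.

```python
import math
import re

# HYPOTHETICAL capacitance pattern; not the definition manager's \vFarad.
# number: 10, 10.5, or .1   prefix: p, n, u (ASCII stand-in for micro), or none
CAPACITANCE = re.compile(r"(?P<number>\d+(?:\.\d*)?|\.\d+)\s*(?P<prefix>[pnu]?)F")

# Metric prefix multipliers for the prefixes the pattern accepts.
PREFIX = {"p": 1e-12, "n": 1e-9, "u": 1e-6, "": 1.0}

def to_farads(text):
    """Parse a value like '100 nF' into farads; return None if it does not match."""
    m = CAPACITANCE.fullmatch(text.strip())
    if m is None:
        return None
    return float(m.group("number")) * PREFIX[m.group("prefix")]

for value in ["10 pF", "100 nF", ".1 uF"]:
    print(value, to_farads(value))

# 100 nF and .1 uF describe the same capacitance.
print(math.isclose(to_farads("100 nF"), to_farads(".1 uF")))  # True
```

Comparing with `math.isclose` rather than `==` avoids surprises from floating-point rounding in the prefix multiplication.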

Shortcut
http://localhost:8080/definition-manager/regex
