
Cheat Sheet

> string <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
> pattern <- "t.m"

[[:digit:]] or \\d    Digits; [0-9]
\\D                   Non-digits; [^0-9]
[[:lower:]]           Lower-case letters; [a-z]
[[:upper:]]           Upper-case letters; [A-Z]
[[:alpha:]]           Alphabetic characters; [A-Za-z]
[[:alnum:]]           Alphanumeric characters; [A-Za-z0-9]
\\w                   Word characters; [A-Za-z0-9_]
\\W                   Non-word characters
[[:xdigit:]]          Hexadecimal digits; [0-9A-Fa-f]
[[:blank:]]           Space and tab
[[:space:]] or \\s    Space, tab, vertical tab, newline, form feed, carriage return
\\S                   Not space; [^[:space:]]
[[:punct:]]           Punctuation characters; !"#$%&'()*+,-./:;<=>?@[\]^_`{|}~
[[:graph:]]           Graphical characters; [[:alnum:][:punct:]]
[[:print:]]           Printable characters; [[:alnum:][:punct:]\\s]
[[:cntrl:]]           Control characters; \n, \r etc.

grep(pattern, string)
    [1] 1 3
grep(pattern, string, value = TRUE)
    [1] "Hiphopopotamus"
    [2] "time for bottomless lyrics"
grepl(pattern, string)
    [1] TRUE FALSE TRUE
stringr::str_detect(string, pattern)
    [1] TRUE FALSE TRUE

regexpr(pattern, string)
    find starting position and length of first match
gregexpr(pattern, string)
    find starting position and length of all matches
stringr::str_locate(string, pattern)
    find starting and end position of first match
stringr::str_locate_all(string, pattern)
    find starting and end position of all matches

regmatches(string, regexpr(pattern, string))
    extract first match
    [1] "tam" "tim"
regmatches(string, gregexpr(pattern, string))
    extract all matches, outputs a list
    [[1]] "tam" [[2]] character(0) [[3]] "tim" "tom"
stringr::str_extract(string, pattern)
    extract first match
    [1] "tam" NA "tim"
stringr::str_extract_all(string, pattern)
    extract all matches, outputs a list
stringr::str_extract_all(string, pattern, simplify = TRUE)
    extract all matches, outputs a matrix
stringr::str_match(string, pattern)
    extract first match + individual character groups
stringr::str_match_all(string, pattern)
    extract all matches + individual character groups

sub(pattern, replacement, string)
    replace first match
gsub(pattern, replacement, string)
    replace all matches
stringr::str_replace(string, pattern, replacement)
    replace first match
stringr::str_replace_all(string, pattern, replacement)
    replace all matches

strsplit(string, pattern) or stringr::str_split(string, pattern)
    split a string using a pattern
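
The base R functions listed above can be tried out on the cheat sheet's own example data. The following is a minimal sketch using base R only; the variable names on the left-hand side and the replacement text "___" are illustrative choices, not part of the original.

```r
# Example data taken from the cheat sheet
string  <- c("Hiphopopotamus", "Rhymenoceros", "time for bottomless lyrics")
pattern <- "t.m"

# Detect: logical vector and indices of elements containing a match
has_match <- grepl(pattern, string)
match_idx <- grep(pattern, string)

# Extract: first match per matching element, then all matches as a list
first_matches <- regmatches(string, regexpr(pattern, string))
all_matches   <- regmatches(string, gregexpr(pattern, string))

# Replace: first match only, then every match ("___" is an arbitrary placeholder)
first_replaced <- sub(pattern, "___", string)
all_replaced   <- gsub(pattern, "___", string)

# Split: the matched text is removed and acts as the separator
pieces <- strsplit(string, pattern)

print(first_matches)
print(all_replaced)
```

Note that regmatches() with regexpr() silently drops elements without a match, which is why only two values come back for three input strings, whereas stringr::str_extract() returns NA in that position.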

\n     New line
\r     Carriage return
\t     Tab
\v     Vertical tab
\f     Form feed

^      Start of the string
$      End of the string
\\b    Empty string at either edge of a word
\\B    NOT the edge of a word
\\<    Beginning of a word
\\>    End of a word

*      Matches at least 0 times
+      Matches at least 1 time
?      Matches at most 1 time; optional string
{n}    Matches exactly n times
{n,}   Matches at least n times
{n,m}  Matches between n and m times

.      Any character except \n
|      Or, e.g. (a|b)
[…]    List permitted characters, e.g. [abc]
[a-z]  Specify character ranges
[^…]   List excluded characters
(…)    Grouping, enables back referencing using \\N where N is an integer
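
As an illustration of the anchors, quantifiers, groups and backreferences listed above, here is a small base R sketch. The test strings and variable names are made up for this example.

```r
# Hypothetical test strings for illustration
x <- c("apple pie", "pineapple", "2019-07-15")

# Anchors: match only at the start or at the end of the string
starts_with_apple <- grepl("^apple", x)
ends_with_apple   <- grepl("apple$", x)

# Word boundaries: "apple" as a whole word only
apple_as_word <- grepl("\\bapple\\b", x)

# Quantifier ?: the "u" is optional
colour_like <- grepl("colou?r", c("color", "colour", "colr"))

# Quantifier {n} with character ranges: a yyyy-mm-dd shaped string
looks_like_date <- grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", x)

# Groups with backreferences in the replacement: reorder to dd/mm/yyyy
reordered <- sub("^([0-9]{4})-([0-9]{2})-([0-9]{2})$", "\\3/\\2/\\1", x[3])

# Backreference inside the pattern: any lower-case letter repeated twice
has_double_letter <- grepl("([a-z])\\1", x)

print(reordered)
```

The date pattern only checks the shape of the string, not whether the month or day values are valid.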

(?=)              Lookahead (requires perl = TRUE), e.g. (?=yx): position followed by 'yx'
(?!)              Negative lookahead (perl = TRUE); position NOT followed by pattern
(?<=)             Lookbehind (perl = TRUE), e.g. (?<=yx): position following 'yx'
(?<!)             Negative lookbehind (perl = TRUE); position NOT following pattern
(?(if)then)       If-then condition (perl = TRUE); use lookaheads, optional characters etc. in the if-clause
(?(if)then|else)  If-then-else condition (perl = TRUE)
For lookarounds and conditionals see, e.g.
http://www.regular-expressions.info/lookaround.html
http://www.regular-expressions.info/conditional.html

By default R uses extended regular expressions. You can switch to PCRE regular expressions using perl = TRUE for base or, in older stringr versions, by wrapping patterns with perl() (deprecated since stringr 1.0.0, which uses ICU regular expressions).
All functions can be used with literal searches using fixed = TRUE for base or by wrapping patterns with fixed() for stringr.
All base functions can be made case insensitive by specifying ignore.case = TRUE.

Metacharacters (. * + etc.) can be used as literal characters by escaping them. Characters can be escaped using \\ or by enclosing them in \\Q...\\E.

By default the asterisk * is greedy, i.e. it always matches the longest possible string. It can be used in lazy mode by adding ?, i.e. *?.
Greedy mode can be turned off using (?U). This switches the syntax, so that (?U)a* is lazy and (?U)a*? is greedy.

Regular expressions can be made case insensitive using (?i). In backreferences, the strings can be converted to lower or upper case using \\L or \\U (e.g. \\L\\1). This requires perl = TRUE.

Regular expressions can conveniently be created using e.g. the packages rex or rebus.
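
The lookarounds, escaping rules, greedy versus lazy matching and case conversions described here can be sketched in base R as follows. All input strings and variable names are invented for illustration, and perl = TRUE is passed wherever PCRE-only syntax is used.

```r
# Lookahead: digits that are followed by "kg"
weights   <- "5kg 12lb 30kg"
kg_values <- regmatches(weights, gregexpr("[0-9]+(?=kg)", weights, perl = TRUE))[[1]]

# Lookbehind: digits that are preceded by a literal dollar sign
price  <- "cost $40 or 15 EUR"
dollar <- regmatches(price, regexpr("(?<=\\$)[0-9]+", price, perl = TRUE))

# Escaping: a literal dot via \\, via \\Q...\\E (PCRE), or via fixed = TRUE
escaped_dot <- grepl("a\\.c", c("a.c", "abc"))
quoted_dot  <- grepl("\\Qa.c\\E", c("a.c", "abc"), perl = TRUE)
fixed_dot   <- grepl(".", "abc", fixed = TRUE)

# Greedy vs. lazy: * takes the longest match, *? the shortest
html        <- "<b>bold</b>"
greedy      <- sub("<.*>", "", html, perl = TRUE)
lazy        <- sub("<.*?>", "", html, perl = TRUE)
ungreedy_sw <- sub("(?U)<.*>", "", html, perl = TRUE)  # (?U) makes * lazy

# Case insensitivity: argument or inline flag
ci_arg  <- grepl("HIP", "Hiphopopotamus", ignore.case = TRUE)
ci_flag <- grepl("(?i)HIP", "Hiphopopotamus", perl = TRUE)

# Case conversion in the replacement: capitalise the first letter of each word
title_case <- gsub("\\b(\\w)", "\\U\\1", "hello world", perl = TRUE)

print(kg_values)
print(title_case)
```

The \\U and \\L conversions are only interpreted in the replacement string when perl = TRUE is set; without it they are not treated as case switches.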

CC BY Ian Kopacka • ian.kopacka@ages.at Updated: 07/19
