String in R
library(tidyverse)
Strings are not glamorous, high-profile components of R, but they do play a big
role in many data cleaning and preparation tasks.
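x <- c("why", "video", "cross", "extra", "deal", "authority")  # example vector (assumed; x is not defined in the original)
str_length(x)     # number of characters in each string
str_sub(x, 1, 2)  # first two characters of each string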
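Most string functions work with regular expressions, a concise language for
describing patterns of text.
There are seven main verbs that work with patterns, including:
• str_detect(x, "[aeiou]") tells you whether each string matches the pattern
• str_count(x, "[aeiou]") counts the matches in each string
• str_subset(x, "[aeiou]") keeps only the strings that match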
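As a minimal sketch of how these three verbs differ, the snippet below applies each of them to a small vector. The vector `words` and its contents are hypothetical and chosen only for illustration; the functions are from stringr.

```r
library(stringr)

# Hypothetical example vector (not from the original slides)
words <- c("apple", "kiwi", "plum", "myth")

# str_detect(): does each string contain a lowercase vowel? Returns a logical vector.
has_vowel <- str_detect(words, "[aeiou]")

# str_count(): how many lowercase vowels does each string contain?
n_vowels <- str_count(words, "[aeiou]")

# str_subset(): keep only the strings that start with a lowercase vowel.
starts_with_vowel <- str_subset(words, "^[aeiou]")

print(has_vowel)          # TRUE TRUE TRUE FALSE
print(n_vowels)           # 2 2 1 0
print(starts_with_vowel)  # "apple"
```

All three take the string vector first and the pattern second, so they can be chained with the pipe in the same way.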
Compared with base R string functions, stringr:
• Uses consistent function and argument names. The first argument is always the vector of
strings to modify, which makes stringr work particularly well in conjunction with the pipe.
• Simplifies string operations by eliminating options that you don’t need 95% of the time.
• Produces outputs that can easily be used as inputs. This includes ensuring that missing
inputs result in missing outputs, and zero-length inputs result in zero-length outputs.
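The last point can be checked directly. The sketch below uses a hypothetical vector `y` containing an `NA`, plus an empty character vector, and shows that `str_length()` and `str_sub()` preserve both.

```r
library(stringr)

# Hypothetical input with a missing value (not from the original slides)
y <- c("a", NA, "abc")

# Missing inputs give missing outputs
lens <- str_length(y)
firsts <- str_sub(y, 1, 2)

# Zero-length inputs give zero-length outputs
empty_len <- str_length(character(0))
empty_sub <- str_sub(character(0), 1, 2)

print(lens)               # 1 NA 3
print(firsts)             # "a" NA "ab"
print(length(empty_len))  # 0
print(length(empty_sub))  # 0
```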
letters %>%
.[1:10] %>%
str_pad(3, "right") %>%
str_c(letters[2:11])
In R, missing values are contagious. If you want them to print as "NA", use
str_replace_na()
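A short sketch of the difference, assuming a hypothetical vector `z` with one missing value: without `str_replace_na()`, the `NA` propagates through `str_c()`; with it, the missing value is turned into the literal string "NA" first.

```r
library(stringr)

# Hypothetical input with a missing value (not from the original slides)
z <- c("abc", NA)

# The NA is contagious: the second result is NA, not a string
contagious <- str_c("|-", z, "-|")

# Convert NA to the string "NA" before combining
printable <- str_c("|-", str_replace_na(z), "-|")

print(contagious)  # "|-abc-|" NA
print(printable)   # "|-abc-|" "|-NA-|"
```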
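time <- "morning"; name <- "Hadley"; day1 <- FALSE  # example values (assumed; not defined in the original)
str_c(
  "Good ", time, " ", name,
  if (day1) " and Have A NICE DAY",  # NULL when day1 is FALSE, so it is dropped
  "."
)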
• str_c(c("Today", "is", "Monday"), collapse = ", ")
# names of states
states <- rownames(USArrests)
# substr
substr(x = states, start = 1, stop = 4)
#> [1] "Alab" "Alas" "Ariz" "Arka" "Cali" "Colo" "Conn" "Dela" "Flor" "Geor"
#> [11] "Hawa" "Idah" "Illi" "Indi" "Iowa" "Kans" "Kent" "Loui" "Main" "Mary"
#> [21] "Mass" "Mich" "Minn" "Miss" "Miss" "Mont" "Nebr" "Neva" "New " "New "
#> [31] "New " "New " "Nort" "Nort" "Ohio" "Okla" "Oreg" "Penn" "Rhod" "Sout"
#> [41] "Sout" "Tenn" "Texa" "Utah" "Verm" "Virg" "Wash" "West" "Wisc" "Wyom"
# abbreviate state names
states2 <- abbreviate(states)
abbreviate(states, minlength = 5)
# number of characters in each state name
state_chars <- nchar(states)
state_chars
# longest name
states[which(state_chars == max(state_chars))]
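The base R calls used on the state names have close stringr counterparts: `str_sub()` for `substr()` and `str_length()` for `nchar()`. The sketch below, with illustrative variable names that are not from the original, runs both versions on the same data and checks that they agree.

```r
library(stringr)

states <- rownames(USArrests)

# First four characters: base R vs. stringr
prefix_base    <- substr(states, 1, 4)
prefix_stringr <- str_sub(states, 1, 4)

# Name lengths: base R vs. stringr
len_base    <- nchar(states)
len_stringr <- str_length(states)

# Both approaches agree element by element on this data
same_prefix <- all(prefix_base == prefix_stringr)
same_length <- all(len_base == len_stringr)

# Longest state name(s), using the stringr lengths
longest <- states[len_stringr == max(len_stringr)]
print(longest)  # "North Carolina" "South Carolina"
```

The practical difference is mostly in consistency: the stringr versions take the string vector first and follow the same naming pattern as the rest of the package.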
Some Computations
summary(nchar(states))
# histogram
hist(nchar(states), las = 1, col = "gray80", main = "Histogram",
     xlab = "number of characters in US State names")
• https://stringr.tidyverse.org/
• https://www.gastonsanchez.com/r4strings/reversing.html
• USArrests