Data Import With The Tidyverse::: Cheat Sheet

Data import with the tidyverse : : CHEAT SHEET
Read Tabular Data with readr

read_*(file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale, n_max = Inf, One of the first steps of a project is to import OTHER TYPES OF DATA
skip = 0, na = c("", "NA"), guess_max = min(1000, n_max), show_col_types = TRUE) See ?read_delim outside data into R. Data is o en stored in Try one of the following
tabular formats, like csv files or spreadsheets. packages to import other types of files:
A|B|C
A B C read_delim("file.txt", delim = "|") Read files with any delimiter. If no The front page of this sheet shows • haven - SPSS, Stata, and SAS files
1 2 3 delimiter is specified, it will automatically guess. how to import and save text files into • DBI - databases
1|2|3 4 5 NA To make file.txt, run: write_file("A|B|C\n1|2|3\n4|5|NA", file = "file.txt")
4|5|NA R using readr. • jsonlite - json
The back page shows how to import • xml2 - XML
A B C read_csv("file.csv") Read a comma delimited file with period • httr - Web APIs
A,B,C spreadsheet data from Excel files
1 2 3 decimal marks. • rvest - HTML (Web Scraping)
1,2,3 4 5 NA write_file("A,B,C\n1,2,3\n4,5,NA", file = "file.csv") using readxl or Google Sheets using
4,5,NA googlesheets4. • readr::read_lines() - text data
read_csv2("file2.csv") Read semicolon delimited files with comma

Column Specification with readr
A B C
A;B;C
1.5 2 3 decimal marks.
1,5;2;3 4.5 5 NA write_file("A;B;C\n1,5;2;3\n4,5;5;NA", file = "file2.csv")
4,5;5;NA Column specifications define what data type each
USEFUL COLUMN ARGUMENTS
column of a file will be imported as. By default
A B C read_tsv("file.tsv") Read a tab delimited file. Also read_table(). readr will generate a column spec when a file is Hide col spec message
ABC read_*(file, show_col_types = FALSE)
1 2 3 read_fwf("file.tsv", fwf_widths(c(2, 2, NA))) Read a fixed width file. read and output a summary.
123 4 5 NA write_file("A\tB\tC\n1\t2\t3\n4\t5\tNA\n", file = "file.tsv")
4 5 NA spec(x) Extract the full column specification for Select columns to import
the given imported data frame. Use names, position, or selection helpers.
read_*(file, col_select = c(age, earn))
USEFUL READ ARGUMENTS spec(x)
# cols(
A B C No header 1 2 3 Skip lines # age = col_integer(), age is an
integer Guess column types
1 2 3 read_csv("file.csv", col_names = FALSE) 4 5 NA read_csv("file.csv", skip = 1) # sex = col_character(),
4 5 NA # earn = col_double() To guess a column type, read_ *() looks at the
A B C Read a subset of lines # ) first 1000 rows of data. Increase with guess_max.
x y z Provide header 1 2 3 read_csv("file.csv", n_max = 1) sex is a read_*(file, guess_max = Inf)
A B C read_csv("file.csv", earn is a double (numeric) character
1 2 3 col_names = c("x", "y", "z")) A B C Read values as missing
4 5 NA
NA 2 3 read_csv("file.csv", na = c("1")) COLUMN TYPES DEFINE COLUMN SPECIFICATION
4 5 NA
Read multiple files into a single table Each column type has a function and Set a default type
read_csv(c(“f1.csv”, “f2.csv”, “f3.csv"), Specify decimal marks corresponding string abbreviation. read_csv(
A;B;C
id = "origin_file") read_delim("file2.csv", locale = file,
1,5;2;3,0 locale(decimal_mark = ",")) • col_logical() - "l"
col_type = list(.default = col_double())
• col_integer() - "i" )
• col_double() - "d"
Save Data with readr • col_number() - "n"
Use column type or string abbreviation
read_csv(
• col_character() - "c" file,
write_*(x, file, na = "NA", append, col_names, quote, escape, eol, num_threads, progress) • col_factor(levels, ordered = FALSE) - "f" col_type = list(x = col_double(), y = "l", z = "_")
• col_datetime(format = "") - "T" )
A B C write_delim(x, file, delim = " ") Write files with any delimiter. • col_date(format = "") - "D" Use a single string of abbreviations
A,B,C • col_time(format = "") - "t"
1 2 3
write_csv(x, file) Write a comma delimited file. # col types: skip, guess, integer, logical, character
4 5 NA 1,2,3 • col_skip() - "-", "_" read_csv(
4,5,NA write_csv2(x, file) Write a semicolon delimited file. • col_guess() - "?" file,
col_type = "_?ilc"
write_tsv(x, file) Write a tab delimited file. )
RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at readr.tidyverse.org • readr 2.0.0 • readxl 1.3.1 • googlesheets4 1.0.0 • Updated: 2021-08
 
 
 

 
 
 
 
 

 
 

 
 

 
 
 
 
 
 

ft

 
 

 
 
 
 

 
 

 
Import Spreadsheets
with readxl with googlesheets4
READ EXCEL FILES READ SHEETS
A B C D E A B C D E
1 x1 x2 x3 x4 x5 x1 x2 x3 x4 x5 1 x1 x2 x3 x4 x5 x1 x2 x3 x4 x5
2 x z 8 x NA z 8 NA 2 x z 8 x NA z 8 NA
3 y 7 9 10 y 7 NA 9 10 READXL COLUMN SPECIFICATION 3 y 7 9 10 y 7 NA 9 10 GOOGLESHEETS4 COLUMN SPECIFICATION
s1 s1
Column specifications define what data type Column specifications define what data type
each column of a file will be imported as. each column of a file will be imported as.
read_excel(path, sheet = NULL, range = NULL) read_sheet(ss, sheet = NULL, range = NULL)
Read a .xls or .xlsx file based on the file extension. Read a sheet from a URL, a Sheet ID, or a dribble
Use the col_types argument of read_excel() to Use the col_types argument of read_sheet()/
See front page for more read arguments. Also from the googledrive package. See front page for
set the column specification. range_read() to set the column specification.
read_xls() and read_xlsx(). more read arguments. Same as range_read().
read_excel("excel_file.xlsx")
Guess column types Guess column types
To guess a column type, read_ excel() looks at SHEETS METADATA To guess a column type read_sheet()/
READ SHEETS the first 1000 rows of data. Increase with the URLs are in the form: range_read() looks at the first 1000 rows of data.
guess_max argument. https://docs.google.com/spreadsheets/d/ Increase with guess_max.
A B C D E read_excel(path, sheet = read_excel(path, guess_max = Inf) read_sheet(path, guess_max = Inf)
NULL) Specify which sheet SPREADSHEET_ID/edit#gid=SHEET_ID
to read by position or name. Set all columns to same type, e.g. character gs4_get(ss) Get spreadsheet meta data. Set all columns to same type, e.g. character
read_excel(path, sheet = 1) read_excel(path, col_types = "text") read_sheet(path, col_types = "c")
s1 s2 s3
read_excel(path, sheet = "s1") gs4_find(...) Get data on all spreadsheet files.
Set each column individually sheet_properties(ss) Get a tibble of properties Set each column individually
read_excel( for each worksheet. Also sheet_names(). # col types: skip, guess, integer, logical, character
excel_sheets(path) Get a
vector of sheet names. path, read_sheets(ss, col_types = "_?ilc")
s1 s2 s3
col_types = c("text", "guess", "guess",“numeric") WRITE SHEETS
excel_sheets("excel_file.xlsx")
) A B C write_sheet(data, ss =
1 x 4 1 1 x 4 NULL, sheet = NULL) COLUMN TYPES
A B C D E To read multiple sheets: 2 y 5 2 2 y 5
Write a data frame into a
COLUMN TYPES l n c D L
A B C D E 1. Get a vector of sheet 3 z 6 3 3 z 6
new or existing Sheet. TRUE 2 hello 1947-01-08 hello
s1
names from the file path. logical numeric text date list FALSE 3.45 world 1956-10-21 1
A B C D E gs4_create(name, ...,
2. Set the vector names to TRUE 2 hello 1947-01-08 hello
s1 s2 A B C D sheets = NULL) Create a
be the sheet names. FALSE 3.45 world 1956-10-21 1 • skip - "_" or "-" • date - "D"
1 new Sheet with a vector
s1 s2 3. Use purrr::map_dfr() to • guess - "?" • datetime - "T"
• skip • logical • date 2 of names, a data frame,
s1 s2 s3 read multiple files into • logical - "l" • character - "c"
• guess • numeric • list s1 or a (named) list of data
one data frame. • integer - "i" • list-column - "L"
• text frames.
• double - "d" • cell - "C" Returns
path <- "your_file_path.xlsx" A B C
sheet_append(ss, data,
x1 x2 x3 1 x1 x2 x3 • numeric - "n" list of raw cell data.
path %>% excel_sheets() %>% Use list for columns that include multiple data 2 1 x 4 sheet = 1) Add rows to
2 y 5
set_names() %>% types. See tidyr and purrr for list-column data. 3 z 6 3 2 y 5 the end of a worksheet. Use list for columns that include multiple data
map_dfr(read_excel, path = path) 4 3 z 6 types. See tidyr and purrr for list-column data.
s1
OTHER USEFUL EXCEL PACKAGES CELL SPECIFICATION FOR READXL AND GOOGLESHEETS4 FILE LEVEL OPERATIONS
For functions to write data to Excel files, see: Use the range argument of readxl::read_excel() or googlesheets4 also o ers ways to modify other
• openxlsx googlesheets4::read_sheet() to read a subset of cells from a aspects of Sheets (e.g. freeze rows, set column
• writexl A B C D E sheet. width, manage (work)sheets). Go to
1 1 2 3 4 5 2 3 4 read_excel(path, range = "Sheet1!B1:D2") googlesheets4.tidyverse.org to read more.
For working with non-tabular Excel data, see: 2 x y z NA y z read_sheet(ss, range = "B1:D2")
• tidyxl 3 6 7 9 10 For whole-file operations (e.g. renaming, sharing,
s1 Also use the range argument with cell specification functions placing within a folder), see the tidyverse
cell_limits(), cell_rows(), cell_cols(), and anchored(). package googledrive at
googledrive.tidyverse.org.
RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • readxl.tidyverse.org and googlesheets4.tidyverse.org • readr 2.0.0 • readxl 1.3.1 • googlesheets4 1.0.0 • Updated: 2021-08
 
 
 
 
 

 
ff

 
 

 
 
 

 
 
 
 
 

 
 

 
 
 
 

Data Import With The Tidyverse::: Cheat Sheet

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Data Import With The Tidyverse::: Cheat Sheet

Uploaded by

Copyright:

Available Formats

Data import with the tidyverse : : CHEAT SHEET

Read Tabular Data with readr

read_csv2("file2.csv") Read semicolon delimited files with comma

You might also like