You are on page 1of 2

Data import with the tidyverse : : CHEAT SHEET

Read Tabular Data with readr


read_*(file, col_names = TRUE, col_types = NULL, col_select = NULL, id = NULL, locale, n_max = Inf, One of the first steps of a project is to import OTHER TYPES OF DATA
skip = 0, na = c("", "NA"), guess_max = min(1000, n_max), show_col_types = TRUE) See ?read_delim outside data into R. Data is o en stored in Try one of the following
tabular formats, like csv files or spreadsheets. packages to import other types of files:
A|B|C
A B C read_delim("file.txt", delim = "|") Read files with any delimiter. If no The front page of this sheet shows • haven - SPSS, Stata, and SAS files
1 2 3 delimiter is specified, it will automatically guess. how to import and save text files into • DBI - databases
1|2|3 4 5 NA To make file.txt, run: write_file("A|B|C\n1|2|3\n4|5|NA", file = "file.txt")
4|5|NA R using readr. • jsonlite - json
The back page shows how to import • xml2 - XML
A B C read_csv("file.csv") Read a comma delimited file with period • httr - Web APIs
A,B,C spreadsheet data from Excel files
1 2 3 decimal marks. • rvest - HTML (Web Scraping)
1,2,3 4 5 NA write_file("A,B,C\n1,2,3\n4,5,NA", file = "file.csv") using readxl or Google Sheets using
4,5,NA googlesheets4. • readr::read_lines() - text data

read_csv2("file2.csv") Read semicolon delimited files with comma


Column Specification with readr
A B C
A;B;C
1.5 2 3 decimal marks.
1,5;2;3 4.5 5 NA write_file("A;B;C\n1,5;2;3\n4,5;5;NA", file = "file2.csv")
4,5;5;NA Column specifications define what data type each
USEFUL COLUMN ARGUMENTS
column of a file will be imported as. By default
A B C read_tsv("file.tsv") Read a tab delimited file. Also read_table(). readr will generate a column spec when a file is Hide col spec message
ABC read_*(file, show_col_types = FALSE)
1 2 3 read_fwf("file.tsv", fwf_widths(c(2, 2, NA))) Read a fixed width file. read and output a summary.
123 4 5 NA write_file("A\tB\tC\n1\t2\t3\n4\t5\tNA\n", file = "file.tsv")
4 5 NA spec(x) Extract the full column specification for Select columns to import
the given imported data frame. Use names, position, or selection helpers.
read_*(file, col_select = c(age, earn))
USEFUL READ ARGUMENTS spec(x)
# cols(
A B C No header 1 2 3 Skip lines # age = col_integer(), age is an
integer Guess column types
1 2 3 read_csv("file.csv", col_names = FALSE) 4 5 NA read_csv("file.csv", skip = 1) # sex = col_character(),
4 5 NA # earn = col_double() To guess a column type, read_ *() looks at the
A B C Read a subset of lines # ) first 1000 rows of data. Increase with guess_max.
x y z Provide header 1 2 3 read_csv("file.csv", n_max = 1) sex is a read_*(file, guess_max = Inf)
A B C read_csv("file.csv", earn is a double (numeric) character
1 2 3 col_names = c("x", "y", "z")) A B C Read values as missing
4 5 NA
NA 2 3 read_csv("file.csv", na = c("1")) COLUMN TYPES DEFINE COLUMN SPECIFICATION
4 5 NA
Read multiple files into a single table Each column type has a function and Set a default type
read_csv(c(“f1.csv”, “f2.csv”, “f3.csv"), Specify decimal marks corresponding string abbreviation. read_csv(
A;B;C
id = "origin_file") read_delim("file2.csv", locale = file,
1,5;2;3,0 locale(decimal_mark = ",")) • col_logical() - "l"
col_type = list(.default = col_double())
• col_integer() - "i" )
• col_double() - "d"
Save Data with readr • col_number() - "n"
Use column type or string abbreviation
read_csv(
• col_character() - "c" file,
write_*(x, file, na = "NA", append, col_names, quote, escape, eol, num_threads, progress) • col_factor(levels, ordered = FALSE) - "f" col_type = list(x = col_double(), y = "l", z = "_")
• col_datetime(format = "") - "T" )
A B C write_delim(x, file, delim = " ") Write files with any delimiter. • col_date(format = "") - "D" Use a single string of abbreviations
A,B,C • col_time(format = "") - "t"
1 2 3
write_csv(x, file) Write a comma delimited file. # col types: skip, guess, integer, logical, character
4 5 NA 1,2,3 • col_skip() - "-", "_" read_csv(
4,5,NA write_csv2(x, file) Write a semicolon delimited file. • col_guess() - "?" file,
col_type = "_?ilc"
write_tsv(x, file) Write a tab delimited file. )

RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • Learn more at readr.tidyverse.org • readr 2.0.0 • readxl 1.3.1 • googlesheets4 1.0.0 • Updated: 2021-08



















ft










Import Spreadsheets
with readxl with googlesheets4
READ EXCEL FILES READ SHEETS
A B C D E A B C D E
1 x1 x2 x3 x4 x5 x1 x2 x3 x4 x5 1 x1 x2 x3 x4 x5 x1 x2 x3 x4 x5
2 x z 8 x NA z 8 NA 2 x z 8 x NA z 8 NA
3 y 7 9 10 y 7 NA 9 10 READXL COLUMN SPECIFICATION 3 y 7 9 10 y 7 NA 9 10 GOOGLESHEETS4 COLUMN SPECIFICATION
s1 s1
Column specifications define what data type Column specifications define what data type
each column of a file will be imported as. each column of a file will be imported as.
read_excel(path, sheet = NULL, range = NULL) read_sheet(ss, sheet = NULL, range = NULL)
Read a .xls or .xlsx file based on the file extension. Read a sheet from a URL, a Sheet ID, or a dribble
Use the col_types argument of read_excel() to Use the col_types argument of read_sheet()/
See front page for more read arguments. Also from the googledrive package. See front page for
set the column specification. range_read() to set the column specification.
read_xls() and read_xlsx(). more read arguments. Same as range_read().
read_excel("excel_file.xlsx")
Guess column types Guess column types
To guess a column type, read_ excel() looks at SHEETS METADATA To guess a column type read_sheet()/
READ SHEETS the first 1000 rows of data. Increase with the URLs are in the form: range_read() looks at the first 1000 rows of data.
guess_max argument. https://docs.google.com/spreadsheets/d/ Increase with guess_max.
A B C D E read_excel(path, sheet = read_excel(path, guess_max = Inf) read_sheet(path, guess_max = Inf)
NULL) Specify which sheet SPREADSHEET_ID/edit#gid=SHEET_ID
to read by position or name. Set all columns to same type, e.g. character gs4_get(ss) Get spreadsheet meta data. Set all columns to same type, e.g. character
read_excel(path, sheet = 1) read_excel(path, col_types = "text") read_sheet(path, col_types = "c")
s1 s2 s3
read_excel(path, sheet = "s1") gs4_find(...) Get data on all spreadsheet files.
Set each column individually sheet_properties(ss) Get a tibble of properties Set each column individually
read_excel( for each worksheet. Also sheet_names(). # col types: skip, guess, integer, logical, character
excel_sheets(path) Get a
vector of sheet names. path, read_sheets(ss, col_types = "_?ilc")
s1 s2 s3
col_types = c("text", "guess", "guess",“numeric") WRITE SHEETS
excel_sheets("excel_file.xlsx")
) A B C write_sheet(data, ss =
1 x 4 1 1 x 4 NULL, sheet = NULL) COLUMN TYPES
A B C D E To read multiple sheets: 2 y 5 2 2 y 5
Write a data frame into a
COLUMN TYPES l n c D L
A B C D E 1. Get a vector of sheet 3 z 6 3 3 z 6
new or existing Sheet. TRUE 2 hello 1947-01-08 hello
s1
names from the file path. logical numeric text date list FALSE 3.45 world 1956-10-21 1
A B C D E gs4_create(name, ...,
2. Set the vector names to TRUE 2 hello 1947-01-08 hello
s1 s2 A B C D sheets = NULL) Create a
be the sheet names. FALSE 3.45 world 1956-10-21 1 • skip - "_" or "-" • date - "D"
1 new Sheet with a vector
s1 s2 3. Use purrr::map_dfr() to • guess - "?" • datetime - "T"
• skip • logical • date 2 of names, a data frame,
s1 s2 s3 read multiple files into • logical - "l" • character - "c"
• guess • numeric • list s1 or a (named) list of data
one data frame. • integer - "i" • list-column - "L"
• text frames.
• double - "d" • cell - "C" Returns
path <- "your_file_path.xlsx" A B C
sheet_append(ss, data,
x1 x2 x3 1 x1 x2 x3 • numeric - "n" list of raw cell data.
path %>% excel_sheets() %>% Use list for columns that include multiple data 2 1 x 4 sheet = 1) Add rows to
2 y 5
set_names() %>% types. See tidyr and purrr for list-column data. 3 z 6 3 2 y 5 the end of a worksheet. Use list for columns that include multiple data
map_dfr(read_excel, path = path) 4 3 z 6 types. See tidyr and purrr for list-column data.
s1

OTHER USEFUL EXCEL PACKAGES CELL SPECIFICATION FOR READXL AND GOOGLESHEETS4 FILE LEVEL OPERATIONS
For functions to write data to Excel files, see: Use the range argument of readxl::read_excel() or googlesheets4 also o ers ways to modify other
• openxlsx googlesheets4::read_sheet() to read a subset of cells from a aspects of Sheets (e.g. freeze rows, set column
• writexl A B C D E sheet. width, manage (work)sheets). Go to
1 1 2 3 4 5 2 3 4 read_excel(path, range = "Sheet1!B1:D2") googlesheets4.tidyverse.org to read more.
For working with non-tabular Excel data, see: 2 x y z NA y z read_sheet(ss, range = "B1:D2")
• tidyxl 3 6 7 9 10 For whole-file operations (e.g. renaming, sharing,
s1 Also use the range argument with cell specification functions placing within a folder), see the tidyverse
cell_limits(), cell_rows(), cell_cols(), and anchored(). package googledrive at
googledrive.tidyverse.org.

RStudio® is a trademark of RStudio, PBC • CC BY SA RStudio • info@rstudio.com • 844-448-1212 • rstudio.com • readxl.tidyverse.org and googlesheets4.tidyverse.org • readr 2.0.0 • readxl 1.3.1 • googlesheets4 1.0.0 • Updated: 2021-08






ff
















You might also like