You are on page 1of 29

JOINING DATA IN R WITH DPLYR

Binds
Joining Data in R with dplyr
Joining Data in R with dplyr

● rbind()

● cbind()

● bind_rows()

● bind_cols()
Joining Data in R with dplyr

bind_rows()
> band1 > band2
name surname name surname
1 John Lennon 1 Mick Jagger
2 Paul McCartney 2 Keith Richards
3 George Harrison 3 Charlie Watts
4 Ringo Starr 4 Ronnie Wood

> bind_rows(band1, band2)


name surname
1 John Lennon
2 Paul McCartney
tables to combine
3 George Harrison
4 Ringo Starr
5 Mick Jagger
6 Keith Richards
7 Charlie Watts
8 Ronnie Wood
Joining Data in R with dplyr

bind_cols()
> band1 > plays1
name surname instrument born
1 John Lennon 1 Guitar 1940
2 Paul McCartney 2 Bass 1942
3 George Harrison 3 Guitar 1943
4 Ringo Starr 4 Drums 1940

> bind_cols(band1, plays1)


name surname instrument born
1 John Lennon Guitar 1940
2 Paul McCartney Bass 1942
3 George Harrison Guitar 1943
4 Ringo Starr Drums 1940
Joining Data in R with dplyr

Benefits of bind_rows() and bind_cols()


● Faster
● Return a tibble
● Can handle lists of data frames
● .id
Joining Data in R with dplyr

bind_rows()
> band1 > band2
name surname name surname
1 John Lennon 1 Mick Jagger
2 Paul McCartney 2 Keith Richards
3 George Harrison 3 Charlie Watts
4 Ringo Starr 4 Ronnie Wood

> bind_rows(Beatles = band1, Stones = band2, .id = "band")


band name surname
1 Beatles John Lennon
2 Beatles Paul McCartney
Label names for new Column name for new
3 Beatles George Harrison
column column
4 Beatles Ringo Starr
5 Stones Mick Jagger
6 Stones Keith Richards
7 Stones Charlie Watts
8 Stones Ronnie Wood
JOINING DATA IN R WITH DPLYR

Let’s practice!
JOINING DATA IN R WITH DPLYR

Build a be!er
data frame
Joining Data in R with dplyr

● data.frame()

● as.data.frame()

● data_frame()

● as_data_frame()
Joining Data in R with dplyr

data.frame() defaults
● Changes strings to factors
● Adds row names
● Changes unusual column names
Joining Data in R with dplyr

data_frame()
> data_frame(
+ Beatles = c("John", "Paul", "George", "Ringo"),
+ Stones = c("Mick", "Keith", "Charlie", "Ronnie"),
+ Zeppelins = c("Robert", "Jimmy", "John Paul", "John")
+ )
# A tibble: 4 × 3
Beatles Stones Zeppelins
<chr> <chr> <chr>
1 John Mick Robert
2 Paul Keith Jimmy
3 George Charlie John Paul
4 Ringo Ronnie John
Joining Data in R with dplyr

data_frame()
data_frame() will not…
● Change the data type of vectors (e.g. strings to factors)
● Add row names
● Change column names
● Recycle vectors greater than length one
Joining Data in R with dplyr

data_frame()
● Evaluates arguments lazily, in order
> data_frame(
+ numbers = 1:5,
+ squares = numbers ^ 2
+ )
# A tibble: 5 × 2
numbers squares
<int> <dbl>
1 1 1
2 2 4
3 3 9
4 4 16
5 5 25

● Returns a tibble
Joining Data in R with dplyr

as_data_frame()
JOINING DATA IN R WITH DPLYR

Let’s practice!
JOINING DATA IN R WITH DPLYR

Working with data types


Joining Data in R with dplyr

> 1 + 1
[1] 2

> "one" + "one"


Error in "one" + "one" : non-numeric argument to binary operator
Joining Data in R with dplyr

Character Character?
Number?

Number
Joining Data in R with dplyr

Atomic data types


> typeof(TRUE) Logical
[1] "logical"

> typeof("hello") Character (i.e. string)


[1] "character"

> typeof(3.14) Double (i.e. numeric w/ decimal)


[1] "double"

> typeof(1L) Integer (i.e. numeric w/o decimal)


[1] "integer"

> typeof(1 + 2i) Complex


[1] "complex"

> typeof(raw(1)) Raw


[1] "raw"
Joining Data in R with dplyr

Classes
> x <- c(1L, 2L, 3L, 2L)
> x
[1] 1 2 3 2
> typeof(x)
[1] "integer"
> class(x)
[1] "integer"

> attributes(x) <- list(class = "factor",


levels = c("A", "B", "C", "D"))
> x
[1] A B C B
Levels: A B C D 1L = A
> typeof(x) 2L = B
[1] "integer" 3L = C
> class(x) 4L = D
[1] "factor"
JOINING DATA IN R WITH DPLYR

Let’s practice!
JOINING DATA IN R WITH DPLYR

dplyr's coercion rules


Joining Data in R with dplyr

Character Character?
Number?

Number
Joining Data in R with dplyr

Integer
Character
(string)
Double as.character()
Logical

Character

Integer
Double
Logical as.numeric()
TRUE -> 1
FALSE -> 0

Double

Integer Logical as.integer()


TRUE -> 1
FALSE -> 0

Integer
Joining Data in R with dplyr

factors
# x is a factor
> x
[1] A B C B
Levels: A B C D

# How x is stored?
> unclass(x)
[1] 1 2 3 2
attr(,"levels")
[1] "A" "B" "C" "D"

> as.character(x)
[1] "A" "B" "C" "B"
> as.numeric(x)
[1] 1 2 3 2
Joining Data in R with dplyr

factors
# y is a factor
> y <- factor(c(5, 6, 7, 6))
> y
[1] 5 6 7 6
Levels: 5 6 7

> unclass(y)
[1] 1 2 3 2
attr(,"levels")
[1] "5" "6" "7"

> as.character(y)
[1] "5" "6" "7" "6"
> as.numeric(y)
[1] 1 2 3 2
> as.numeric(as.character(y))
[1] 5 6 7 6
Joining Data in R with dplyr

dplyr's coercion behavior


● dplyr functions will not automatically coerce data types
● Returns an error
● Expects you to manually coerce data
● Exception: factors
● dplyr converts non-aligning factors to strings
● Gives warning message
JOINING DATA IN R WITH DPLYR

Let’s practice!

You might also like