Intro to R & Tidyverse

Intro to R & Tidy - STAT 5000
Bird
7/14/2021
knitr::opts_chunk$set(echo = TRUE)
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.1 --

## v ggplot2 3.3.5 v purrr 0.3.4
## v tibble 3.1.2 v dplyr 1.0.7
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Welcome to Week 1 Introduction to R and Tidyverse!
The first thing we need to do before we learn to program in R is to learn how to get started.
This is R Markdown, a document format provided by the rmarkdown package for R and built into RStudio. It’s a lot in the beginning, but play with it
for a few hours and it should be easy peasy.
First, download R. https://www.r-project.org/
Second, download R Studio. https://www.rstudio.com/
Third, spend some time with RMarkdown.
# install.packages("rmarkdown")
# library(rmarkdown)
install.packages() downloads and installs a package for you; library() then loads it for the current session.

Once installed, R Markdown will take some time to figure out.
File > New File > R Markdown... will create a neat little file to start out with.
When you try to Knit your file into an even neater little PDF, you will likely encounter an error.
So make sure you read the error messages and debug it (Hint: you need a TeX distribution, such as TinyTeX, to produce the PDF).
Let’s get to some coding though.
This will generally follow along with the textbook so you don’t need to have it yourself.
Once we get to more advanced things, the textbook won’t be able to keep up anyway so you still won’t need
it.
#options(width=70)
ruv <- runif(n=20,min=0,max=1)
round(ruv,4)
## [1] 0.7777 0.5198 0.0846 0.2466 0.5978 0.5376 0.0485 0.7557 0.1694 0.4653
## [11] 0.7779 0.3463 0.9613 0.3301 0.4665 0.2034 0.6380 0.8020 0.5239 0.1976
Here, we assign to ruv a vector of random draws from a uniform distribution via the runif() function.
We draw 20 values, each between a minimum of 0 and a maximum of 1, and round() displays them to 4 decimal places.
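Because runif() draws new values on every run, your numbers will differ from the ones printed above. A minimal sketch of making the draws repeatable with set.seed() follows; the seed value 5000 and the names ruv_a and ruv_b are arbitrary choices for illustration, not something the handout uses.

```r
# Fix the random seed so the same 20 draws come back on every run.
set.seed(5000)  # 5000 is an arbitrary, hypothetical seed value
ruv_a <- runif(n = 20, min = 0, max = 1)

set.seed(5000)  # resetting the seed replays the same sequence
ruv_b <- runif(n = 20, min = 0, max = 1)

identical(ruv_a, ruv_b)        # TRUE: both vectors hold the same draws
all(ruv_a >= 0 & ruv_a <= 1)   # TRUE: every draw lies between min and max
length(ruv_a)                  # 20
```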
R as a simple calculator can be useful from time to time:
(8 * 3) + 12/40 - (7^3) + sqrt(9)
## [1] -315.7
In R Markdown, I only entered my code; knitting runs it and adds the output. The echo=TRUE option keeps the original code in the PDF.
For homework, you can code directly in R Markdown using code chunks. Keeping echo=TRUE will let me see
your code before your output prints as it normally would in R or RStudio.
Text formatting
*italic* or _italic_ renders as italic
**bold** or __bold__ renders as bold
superscript^2^ and subscript~2~ render as a superscript and a subscript
End a line with a single backslash to start a new line.
Headings
# 1st Level Header
## 2nd Level Header
### 3rd Level Header
Lists
* Bulleted list item 1
* Item 2
    + Item 2a
    + Item 2b
1. Numbered list item 1
1. Item 2. The numbers are incremented automatically in the output.
Links and images
<http://example.com>
[linked phrase](http://example.com)
Tables
First Header | Second Header
------------- | -------------
Content Cell | Content Cell
Content Cell | Content Cell
Vectors
x <- 3 # <- is the assignment operator; it stores the value 3 under the name x.
y <- c(2,3,4) # c() stands for combine (concatenate); it joins values into a vector.
z <- c(10,20,30,40)
x+y
## [1] 5 6 7
x+z
## [1] 13 23 33 43
y+z
## Warning in y + z: longer object length is not a multiple of shorter object
## length
## [1] 12 23 34 42
x+c(5,5,5,5,5)
## [1] 8 8 8 8 8
z+c(1,2,3,4,5)
## Warning in z + c(1, 2, 3, 4, 5): longer object length is not a multiple of
## shorter object length
## [1] 11 22 33 44 15
LogicalVector <- (x < y)
LogicalVector
## [1] FALSE FALSE TRUE

typeof(LogicalVector)
## [1] "logical"
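The warnings above come from R's recycling rule: when two vectors differ in length, the shorter one is repeated until it matches the longer one. The sketch below spells that out by hand with rep(); the names y_recycled, manual_sum, and auto_sum are illustrative only.

```r
y <- c(2, 3, 4)
z <- c(10, 20, 30, 40)

# Recycling repeats the shorter vector until it matches the longer one.
y_recycled <- rep(y, length.out = length(z))   # 2 3 4 2
manual_sum <- y_recycled + z                   # 12 23 34 42

# suppressWarnings() hides the length-mismatch warning for this comparison only.
auto_sum <- suppressWarnings(y + z)

identical(manual_sum, auto_sum)                # TRUE
```

A single number such as x <- 3 is just a vector of length 1, which is why x + z adds 3 to every element without any warning.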
Booleans
x <- c(FALSE, TRUE, FALSE)
y <- c(FALSE, TRUE, TRUE)
x & y #are BOTH values true?
## [1] FALSE TRUE FALSE

x | y #are either of them true?
## [1] FALSE TRUE TRUE

x == y #do they equal each other? THIS IS USED A LOT IN PROGRAMMING
## [1] TRUE TRUE FALSE

x != y #do they not equal each other? ALSO USED A LOT
## [1] FALSE FALSE TRUE

print(z)
zDouble <- z
zInteger <- as.integer(zDouble)
typeof(zInteger)
## [1] "integer"
You can also convert a variable to another type (as long as the conversion makes sense) with the following functions:
as.logical()
as.numeric()
as.double()
as.complex() #any physicists here?
as.character()
as.list()
This is especially useful with data when you want to convert entire variables at once.
Perhaps the data came to you in an unclean state.
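As a rough illustration of what "as long as it makes sense" means, the sketch below converts a made-up messy character vector (raw is hypothetical data, not from the course) and shows what happens to entries that cannot be converted.

```r
raw <- c("12", "7.5", "n/a", "40")   # hypothetical messy input

# as.numeric() turns entries it cannot parse into NA and warns about it.
clean <- suppressWarnings(as.numeric(raw))
clean            # 12.0  7.5   NA 40.0
is.na(clean)     # FALSE FALSE  TRUE FALSE

# Converting logicals to numbers: TRUE becomes 1 and FALSE becomes 0.
as.numeric(c(TRUE, FALSE, TRUE))   # 1 0 1

# as.integer() truncates toward zero rather than rounding.
as.integer(c(2.9, -2.9))           # 2 -2
```

Checking is.na() after a conversion is a simple way to find the entries that need cleaning.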
Other Basic Functionality
print(1)
## [1] 1
print("You need to make sure YOU knOW wh4T YOU ARE PRintinG")
## [1] "You need to make sure YOU knOW wh4T YOU ARE PRintinG"
print() displays an object exactly as it is stored; a string keeps its capitalization and spacing.
You can also print a function to see what parameters it takes and other info:
print(exp)
## function (x) .Primitive("exp")

print(log)
## function (x, base = exp(1)) .Primitive("log")

Some math functions to know:
print() – prints objects
log() – computes logarithms
exp() – computes the exponential function
sqrt() – takes the square root
abs() – returns the absolute value
sin() – returns the sine
cos() – returns the cosine
tan() – returns the tangent
asin() – returns the arc-sine
factorial() – returns the factorial
sign() – returns the sign (negative or positive)
round() – rounds the input to the desired number of digits
If you’re interested in heavy mathematical computation, other programming languages are better suited to the intense stuff.
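A few of the functions listed above, called with small inputs so the results are easy to check by hand. This is only a quick sketch; the variable names are illustrative.

```r
log_base10 <- log(100, base = 10)   # 2: logarithm with an explicit base
log_e      <- log(exp(1))           # 1: the default base is e
exp_zero   <- exp(0)                # 1
abs_val    <- abs(-3.2)             # 3.2
signs      <- sign(c(-5, 0, 5))     # -1 0 1
fact5      <- factorial(5)          # 120
rounded    <- round(3.14159, digits = 2)   # 3.14
```

The base argument of log() matches the signature printed by print(log) above: leaving it out gives the natural logarithm.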
Random tid-bits
< less than
<= less than or equal to
> greater than
>= greater than or equal to
== equal
!= not equal
& and
| or
General help and searching for help

?log opens the help page for the log function, for example.
??logit searches the help pages of all installed packages for "logit", for example.
Matrices (and Arrays)
Later, we will move beyond base R structures to things like tibbles.
For now, the basics:
A 2-D array is a matrix in R.
M <- matrix(data=1:24,nrow=4, byrow=TRUE)
This says: fill a matrix with the numbers 1 through 24 using 4 rows (and therefore 6 columns), filling row by row instead of column by column.
M2 <- matrix(data=c(1,2,3,4,5,6),nrow=3, byrow=FALSE)
is.array(M2)
## [1] TRUE
is.matrix(M2)
## [1] TRUE
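To see what byrow changes, the sketch below fills the same six numbers both ways. The names by_row and by_col are illustrative only.

```r
by_row <- matrix(data = 1:6, nrow = 2, byrow = TRUE)
by_col <- matrix(data = 1:6, nrow = 2, byrow = FALSE)

by_row[1, ]   # 1 2 3: the first row is filled before the second
by_col[1, ]   # 1 3 5: the first column is filled before the second

dim(by_row)   # 2 3: ncol is inferred as 6 / 2

# Filling by row gives the transpose of filling a 3-row matrix by column.
identical(by_row, t(matrix(1:6, nrow = 3)))   # TRUE
```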
Now let’s practice making an actual matrix from scratch.
Given data in any format, you can arrange it to fit the code you want to write, i.e. filled by row or by column.
For example, City1 has temperatures of 80, 70, and 75 on three days; City2 and City3 follow in the same layout.
temp.data <- matrix(c(80,70,75,55,56,45,20,22,31), nrow=3, ncol=3, byrow=TRUE,
                    dimnames = list(c("City1","City2","City3"), c("Day1","Day2","Day3")))
temp.data
##       Day1 Day2 Day3
## City1   80   70   75
## City2   55   56   45
## City3   20   22   31
dim(temp.data)
## [1] 3 3
temp.data[2,] #2nd row, all columns
## Day1 Day2 Day3
##   55   56   45
temp.data[2,3] #2nd row, 3rd column
## [1] 45
temp.data[1, ,drop=FALSE] #drop=FALSE will keep this a matrix instead of defaulting into a vector.
##       Day1 Day2 Day3
## City1   80   70   75
Data Frames
NumVec <- c(1,2,3,4)
CharVec <- c("a","b","c","d")
LogVec <- c("TRUE","TRUE","TRUE","FALSE") # quoted values are character strings, not logicals
df <- data.frame(NumVec,CharVec,LogVec)
df
##   NumVec CharVec LogVec
## 1      1       a   TRUE
## 2      2       b   TRUE
## 3      3       c   TRUE
## 4      4       d  FALSE
dfTibble <- as_tibble(df)
dfTibble
## # A tibble: 4 x 3
## NumVec CharVec LogVec
## <dbl> <chr> <chr>
## 1 1 a TRUE
## 2 2 b TRUE
## 3 3 c TRUE
## 4 4 d FALSE
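Notice that the tibble reports LogVec as <chr>, not as a logical column. That is because the values were typed inside quotes. The short sketch below contrasts the two spellings; chr_vec and lgl_vec are illustrative names.

```r
chr_vec <- c("TRUE", "TRUE", "TRUE", "FALSE")   # quoted: character strings
lgl_vec <- c(TRUE, TRUE, TRUE, FALSE)           # unquoted: logical values

typeof(chr_vec)   # "character"
typeof(lgl_vec)   # "logical"

sum(lgl_vec)      # 3: logicals can be counted directly

# as.logical() recovers real logicals from the quoted version.
identical(as.logical(chr_vec), lgl_vec)   # TRUE
```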
In this course, we will usually skip the base R intro material in favor of Tidyverse options like tibbles.
Some data sets ship with R, such as iris, and can be used directly:
as_tibble(iris)
## # A tibble: 150 x 5
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## <dbl> <dbl> <dbl> <dbl> <fct>
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
## # ... with 140 more rows
#or store it
iris_df <- as_tibble(iris)
Tidyverse %>% Pipes
You can read the rest of the chapter if you’d like, but most of your data analysis will be done in Tidyverse.
Let’s check out the basics that will get you on your way.
Let’s see some example code:
iris_df %>%
group_by(Species) %>%
summarize(m = mean(Sepal.Length)) %>%
ungroup()
## Species m
## <fct> <dbl>
## 1 setosa 5.01
## 2 versicolor 5.94
## 3 virginica 6.59
First, %>% is called a pipe. It passes the result on its left to the function on its right as that function's first argument, so a chain of steps reads in the order it runs.
In the above code, group_by() splits the rows into one group per level of Species.
Then, we pipe into summarize(), which reduces each group to a single row of summary statistics, in this case the mean of Sepal.Length.
Last, ungroup() removes any grouping that remains, so later steps treat the result as one ordinary tibble (a specialized data frame). The result is the three-row summary, not the original data.
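To make the pipe less mysterious, here is the same computation written with and without it. This is a sketch that assumes the dplyr package is installed; the names piped and nested are illustrative only.

```r
library(dplyr)

# With pipes: read top to bottom, one step per line.
piped <- iris %>%
  group_by(Species) %>%
  summarize(m = mean(Sepal.Length))

# Without pipes: the same calls, nested inside out.
nested <- summarize(group_by(iris, Species), m = mean(Sepal.Length))

all.equal(piped, nested)   # TRUE: both spellings give the same summary
is_grouped_df(piped)       # FALSE: with a single grouping variable,
                           # summarize() already returns an ungrouped result
```

With one grouping variable, the trailing ungroup() is therefore a harmless safety step; it matters more when you group by several variables.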
Mutate
iris_df_v2 <- iris_df %>% mutate(pl2 = Petal.Length ^ 2,
                                 four_sl = Sepal.Length * 4)
iris_df_v2
## # A tibble: 150 x 7
## Sepal.Length Sepal.Width Petal.Length Petal.Width Species pl2 four_sl
## <dbl> <dbl> <dbl> <dbl> <fct> <dbl> <dbl>
## 1 5.1 3.5 1.4 0.2 setosa 1.96 20.4
## 2 4.9 3 1.4 0.2 setosa 1.96 19.6
## 3 4.7 3.2 1.3 0.2 setosa 1.69 18.8
## 4 4.6 3.1 1.5 0.2 setosa 2.25 18.4
## 5 5 3.6 1.4 0.2 setosa 1.96 20
## 6 5.4 3.9 1.7 0.4 setosa 2.89 21.6
## 7 4.6 3.4 1.4 0.3 setosa 1.96 18.4
## 8 5 3.4 1.5 0.2 setosa 2.25 20
## 9 4.4 2.9 1.4 0.2 setosa 1.96 17.6
## 10 4.9 3.1 1.5 0.1 setosa 2.25 19.6
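mutate() adds columns without changing the number of rows, and a new column can be used by the ones defined after it in the same call. The sketch below assumes dplyr and tibble are installed; petal_area, big_petal, the cutoff of 5, and iris_area are hypothetical names and values chosen for illustration.

```r
library(dplyr)

iris_df <- tibble::as_tibble(iris)

iris_area <- iris_df %>%
  mutate(
    petal_area = Petal.Length * Petal.Width,   # hypothetical new column
    big_petal  = petal_area > 5                # uses the column created just above
  )

nrow(iris_area)   # 150: mutate() keeps every row
ncol(iris_area)   # 7: five original columns plus two new ones
```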
Summarize
Summarize will collapse all rows and return a summary statistic.

iris_df %>%
summarize(avg.sl = mean(Sepal.Length))
## avg.sl
## <dbl>
## 1 5.84
iris_df %>%
summarize(sd.sl = sd(Sepal.Length))
## sd.sl
## <dbl>
## 1 0.828
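summarize() can compute several statistics in one call, and combined with group_by() it returns one row per group instead of one row overall. A minimal sketch, assuming dplyr and tibble are installed; by_species and the column name n are illustrative.

```r
library(dplyr)

iris_df <- tibble::as_tibble(iris)

by_species <- iris_df %>%
  group_by(Species) %>%
  summarize(
    n      = n(),                  # number of rows in each group
    avg.sl = mean(Sepal.Length),
    sd.sl  = sd(Sepal.Length)
  )

nrow(by_species)   # 3: one row per species
by_species$n       # 50 50 50
```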
Filter
Choose specific ROWS
iris_df_setosa_only <- iris_df %>% filter(Species == "setosa")
iris_df_setosa_only
## # A tibble: 50 x 5
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
## 7 4.6 3.4 1.4 0.3 setosa
## 8 5 3.4 1.5 0.2 setosa
## 9 4.4 2.9 1.4 0.2 setosa
## 10 4.9 3.1 1.5 0.1 setosa
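filter() accepts any logical condition, so the comparison operators from earlier (==, !=, >, %in%) all work here. The sketch below assumes dplyr and tibble are installed; the object names and the cutoff of 5 are illustrative.

```r
library(dplyr)

iris_df <- tibble::as_tibble(iris)

# Conditions separated by commas must ALL be true (same as &).
long_setosa <- iris_df %>% filter(Species == "setosa", Sepal.Length > 5)

# != keeps everything except the named level.
not_setosa <- iris_df %>% filter(Species != "setosa")
nrow(not_setosa)   # 100

# %in% matches against several values at once.
two_species <- iris_df %>% filter(Species %in% c("versicolor", "virginica"))
nrow(two_species)  # 100
```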
Select
Choose specific COLUMNS
iris_df %>% select(Sepal.Length, Sepal.Width) %>% glimpse()
## Rows: 150
## Columns: 2
## $ Sepal.Length <dbl> 5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0, 4.4, 4.9, 5.4, 4.~
## $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.~
iris_df %>% select(-Sepal.Length, -Sepal.Width) %>% glimpse()
## Rows: 150
## Columns: 3
## $ Petal.Length <dbl> 1.4, 1.4, 1.3, 1.5, 1.4, 1.7, 1.4, 1.5, 1.4, 1.5, 1.5, 1.~
## $ Petal.Width <dbl> 0.2, 0.2, 0.2, 0.2, 0.2, 0.4, 0.3, 0.2, 0.2, 0.1, 0.2, 0.~
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s~
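Besides listing columns by name, select() understands helper functions that match column names by pattern. A short sketch using starts_with() and ends_with(), assuming dplyr and tibble are installed; the object names and the new column name species are illustrative.

```r
library(dplyr)

iris_df <- tibble::as_tibble(iris)

sepal_cols <- iris_df %>% select(starts_with("Sepal"))
names(sepal_cols)   # "Sepal.Length" "Sepal.Width"

width_cols <- iris_df %>% select(ends_with("Width"))
names(width_cols)   # "Sepal.Width" "Petal.Width"

# select() can also rename a column while keeping it.
renamed <- iris_df %>% select(species = Species, Petal.Length)
names(renamed)      # "species" "Petal.Length"
```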
Arrange
Use this to sort the rows of your data by the values of a variable (ascending by default).
iris_df %>% arrange(Sepal.Length) %>% glimpse()
## Rows: 150
## Columns: 5
## $ Sepal.Length <dbl> 4.3, 4.4, 4.4, 4.4, 4.5, 4.6, 4.6, 4.6, 4.6, 4.7, 4.7, 4.~
## $ Sepal.Width <dbl> 3.0, 2.9, 3.0, 3.2, 2.3, 3.1, 3.4, 3.6, 3.2, 3.2, 3.2, 3.~
## $ Petal.Length <dbl> 1.1, 1.4, 1.3, 1.3, 1.3, 1.5, 1.4, 1.0, 1.4, 1.3, 1.6, 1.~
## $ Petal.Width <dbl> 0.1, 0.2, 0.2, 0.2, 0.3, 0.2, 0.3, 0.2, 0.2, 0.2, 0.2, 0.~
## $ Species <fct> setosa, setosa, setosa, setosa, setosa, setosa, setosa, s~
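To sort from largest to smallest, wrap the variable in desc(); extra variables break ties in the order given. A minimal sketch, assuming dplyr and tibble are installed; the object names are illustrative.

```r
library(dplyr)

iris_df <- tibble::as_tibble(iris)

# Largest sepal length first.
longest_first <- iris_df %>% arrange(desc(Sepal.Length))
longest_first$Sepal.Length[1]   # 7.9, the largest sepal length in iris

# Sort by species, then by descending petal length within each species.
by_species_then_len <- iris_df %>% arrange(Species, desc(Petal.Length))
```

arrange() only reorders rows; it never adds or drops any, so both results still have 150 rows.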