
Lecture 2: R's core concepts

Ben Fanson
Simeon Lisovski

Quick Refresher
1) RStudio
set up a new project
build nested folders for programs, data, figures/tables
tips: tab for autofill, tab+F1 for help page, ctrl-R or ctrl-Enter to submit code
2) Initialization files (Rprofile.site, .Rprofile)

3) Common R mistakes
cAse SenSitive
R uses '/' for file paths (e.g. C:/User/...)
= (assignment) vs == (comparison)
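
The last mistake is easiest to see side by side. A minimal sketch (the object name n_birds is made up for illustration):

```r
# '=' (like '<-') assigns a value to an object
n_birds = 6

# '==' compares two values and returns TRUE or FALSE
is_six   <- n_birds == 6
is_seven <- n_birds == 7

print(is_six)    # TRUE
print(is_seven)  # FALSE
```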



Lecture Outline
1) Data types

2) R classes

3) Subsetting



Helpful references
- Zuur et al. 2009. A Beginner's Guide to R
- http://adv-r.had.co.nz/ (more advanced resource for this lecture)
R as a programming language
1. Interactive language that is executed immediately
[not compiled like SAS, C, SPSS]

2. Matrix language (like Matlab)

3. Object-oriented

What is an object?
In R, an object is anything created by '<-'.
new_object <- 'some information here'

All the objects you create are listed in the
Environment tab in RStudio.

Best to go through different examples of objects to
get a feel for what they are...
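
As a first feel for this, here is a small sketch that creates a few objects and lists them. Only new_object comes from the slide; the other names are made up for illustration.

```r
# each '<-' creates a new object
new_object <- 'some information here'
a_number   <- 42
some_ids   <- c(1, 2, 3)

# ls() lists the objects in the current environment
# (the same objects shown in the Environment tab)
print(ls())
```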



Let's look at a typical dataset
[the dataset is an object]
x <-
Bird_id   Treatment   growth rate
1         t1          12.3
2         t2          10.3
3         t3          14.5
4         t1          14.3
5         t2          9.3
6         t3          15.6

Object Properties
1) 3 columns
2) 6 rows (ignore header)
3) 2 columns are numeric
4) 1 column is character

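
One way this dataset could be typed into R is sketched below. The column name growth_rate (with an underscore) is an assumption, chosen because spaces in column names are awkward to work with; stringsAsFactors = FALSE is set explicitly so Treatment stays a character column in older versions of R too.

```r
# build the example dataset as a single object called x
x <- data.frame(
  Bird_id     = 1:6,
  Treatment   = c('t1', 't2', 't3', 't1', 't2', 't3'),
  growth_rate = c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6),
  stringsAsFactors = FALSE
)

# check the object properties listed above
print(dim(x))                  # 6 rows, 3 columns
print(sapply(x, is.numeric))   # TRUE for Bird_id and growth_rate
print(sapply(x, is.character)) # TRUE for Treatment
```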

could look at each column separately
[3 objects: id, trt, rate]

id <-
Bird_id: 1, 2, 3, 4, 5, 6
Object Properties
1) 1 column
2) 6 rows
3) 6 different bird ids

trt <-
Treatment: t1, t2, t3, t1, t2, t3
Object Properties
1) 1 column
2) 6 rows
3) 3 treatment levels

rate <-
growth rate: 12.3, 10.3, 14.5, 14.3, 9.3, 15.6
Object Properties
1) 1 column
2) 6 rows
3) 6 unique growth rates

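
A minimal sketch of the three single-column objects, using c() to combine the values, with quick checks of the properties listed for each:

```r
# one object per column
id   <- c(1, 2, 3, 4, 5, 6)
trt  <- c('t1', 't2', 't3', 't1', 't2', 't3')
rate <- c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6)

# number of rows (elements) in each object
print(length(id))            # 6

# number of distinct values in each object
print(length(unique(id)))    # 6 different bird ids
print(length(unique(trt)))   # 3 treatment levels
print(length(unique(rate)))  # 6 unique growth rates
```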

could look at each column separately
rate = growth rate (12.3, 10.3, 14.5, 14.3, 9.3, 15.6)

what might you want to know about the object
1) mean
2) standard deviation
3) minimum and maximum values

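
Each of these questions has a built-in R function. A short sketch using the growth rates above:

```r
rate <- c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6)

print(mean(rate))   # mean, about 12.72
print(sd(rate))     # standard deviation
print(min(rate))    # 9.3
print(max(rate))    # 15.6
print(range(rate))  # minimum and maximum in one call
```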

mean of growth rate
mean(rate) = mean(c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6))

what about for treatment?
mean(trt) = mean(c('t1', 't2', 't3', 't1', 't2', 't3'))

The mean of treatment (or of any other character variable) makes no sense, so the
mean() function needs to check the object property that indicates whether the
object is numeric.
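
You can check this property yourself, and see what mean() does when it is handed a character object. A small sketch:

```r
rate <- c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6)
trt  <- c('t1', 't2', 't3', 't1', 't2', 't3')

print(is.numeric(rate))  # TRUE
print(is.numeric(trt))   # FALSE
print(class(trt))        # "character"

# mean() on a character vector returns NA and raises a warning;
# suppressWarnings() is used here only to keep the output tidy
result <- suppressWarnings(mean(trt))
print(result)            # NA
```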






Main R data types
[a property of an object]
1. numeric
1, 3.14, 0.001

2. character (aka strings)
't1', 't2', 'green', 'male'

3. factor (categorical variable with a fixed set of levels)
'low', 'medium', 'high'
stores each level as an integer code
e.g. 'low'=1, 'medium'=2, 'high'=3 (when the levels are given in that order)
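
A short sketch of how a character vector becomes a factor. The object names dose and dose_f and the example values are made up for illustration.

```r
dose <- c('low', 'high', 'medium', 'low')
print(class(dose))         # "character"

# give the levels in the order you want them coded
dose_f <- factor(dose, levels = c('low', 'medium', 'high'))
print(class(dose_f))       # "factor"
print(levels(dose_f))      # "low" "medium" "high"
print(as.integer(dose_f))  # 1 3 2 1

# without levels=, R sorts the levels alphabetically
print(levels(factor(dose)))  # "high" "low" "medium"
```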







R classes
[the structure of an object]
Bird_id   Treatment   growth rate
1         t1          12.3
2         t2          10.3
3         t3          14.5
4         t1          14.3
5         t2          9.3
6         t3          15.6

vs.

Bird_id: 1, 2, 3, 4, 5, 6

Vector
1) Structure
1 column
1 or more rows [called elements]

2) Property
all one data type (e.g. character, numeric)

3) Useful functions
c() # create the vector
class() # what data type is it?
is.vector() # ask if the object is a vector
length() # how many elements?

Treatment: t1, t2, t3, t1, t2, t3
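
The vector functions above can be tried on the treatment column. This sketch also shows what the "all one data type" property means in practice; the object name mixed is made up for illustration.

```r
trt <- c('t1', 't2', 't3', 't1', 't2', 't3')

print(class(trt))      # "character"
print(is.vector(trt))  # TRUE
print(length(trt))     # 6

# a vector holds one data type only: mixing a number and a
# string makes c() convert everything to character
mixed <- c(1, 't1')
print(class(mixed))    # "character"
```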
Data frame
1) Structure
multiple columns
multiple rows

2) Property
mix of data types across columns, but single data type
in each column

3) Useful functions
data.frame() # create a data.frame
is.data.frame() # ask if object is a data.frame
dim() # how many rows and columns are there?
nrow() # how many rows?
ncol() # how many columns?

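
A minimal sketch of the data frame functions, building the bird dataset from three vectors. The column names id, trt and rate are assumptions carried over from the vector examples.

```r
x <- data.frame(
  id   = c(1, 2, 3, 4, 5, 6),
  trt  = c('t1', 't2', 't3', 't1', 't2', 't3'),
  rate = c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6),
  stringsAsFactors = FALSE   # keep trt as character in older R versions
)

print(is.data.frame(x))  # TRUE
print(dim(x))            # 6 3
print(nrow(x))           # 6
print(ncol(x))           # 3

# one data type per column, but types differ across columns
print(sapply(x, class))
```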

Basic R classes
[types of objects]
homogeneous (single data type): vector, matrix, array
heterogeneous (multiple data types): list, data frame
http://adv-r.had.co.nz/Data-structures.html
List (most flexible structure)
Note: Lists are vital to the output from statistical functions (as well as many other functions). Knowing this really
helps with figuring out how to handle this output so you can extract specific information.
1) Structure
multiple elements
potential for nested elements

2) Property
mix of data types

3) Useful functions
list() # creates a list
is.list() # ask if object is a list
str() # what is the structure of the list?




List Example
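
A minimal sketch of a list, with made-up element names and values, showing mixed data types and one nested element:

```r
bird <- list(
  id    = 1,                                    # numeric
  trt   = 't1',                                 # character
  rates = c(12.3, 14.3),                        # numeric vector
  site  = list(name = 'site_A', visits = 3)     # a nested list
)

print(is.list(bird))  # TRUE
print(length(bird))   # 4 top-level elements
str(bird)             # prints the structure, including the nesting
```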
More R classes
So far, we have seen vectors, matrices, arrays, lists and data.frames, but there
are lots of other classes (most stat functions have their own). Many of
these classes are just lists with a specific structure.

What is the point of this???
When you run a statistical analysis, you will want to be able to extract
information from the resulting object. There are already written functions to do
that, but they do not always do what you want.
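
As an illustration, the sketch below fits a linear model to the bird data with lm() (chosen here only as a familiar example of a statistical function) and shows that the result is a list you can dig into.

```r
trt  <- c('t1', 't2', 't3', 't1', 't2', 't3')
rate <- c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6)

fit <- lm(rate ~ trt)

print(class(fit))    # "lm": the function's own class
print(is.list(fit))  # TRUE: underneath, it is a list
print(names(fit))    # the elements you can extract

# an already written extractor function ...
print(coef(fit))
# ... and the same information pulled out of the list directly
print(fit$coefficients)

# some results sit inside the summary object instead
r2 <- summary(fit)$r.squared
print(r2)
```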
Subsetting Vector
x[ position_number ]
growth rate: 12.3, 10.3, 14.5, 14.3, 9.3, 15.6
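
A few common ways to subset a vector by position, sketched with the growth rates:

```r
rate <- c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6)

print(rate[2])          # second element: 10.3
print(rate[c(1, 3)])    # first and third: 12.3 14.5
print(rate[2:4])        # second to fourth: 10.3 14.5 14.3
print(rate[-1])         # everything except the first element

# a logical condition also works inside the brackets
print(rate[rate > 14])  # 14.5 14.3 15.6
```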
Subsetting Matrix
x[ row_number, column_number ]

Arrays are similar... x[ first_dimension, second_dimension, third_dimension, ...]
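
A minimal sketch of row and column subsetting, using a small made-up matrix m and array a filled with consecutive integers:

```r
# matrix() fills column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
m <- matrix(1:6, nrow = 2, ncol = 3)

print(m[2, 3])  # row 2, column 3: 6
print(m[1, ])   # all of row 1: 1 3 5
print(m[, 2])   # all of column 2: 3 4

# an array just adds more dimensions
a <- array(1:24, dim = c(2, 3, 4))
print(a[2, 3, 4])  # last element: 24
```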
Subsetting List
Two main methods...
'[]' # the object returned is a list
'$' # the object returned is in the original class (here, an integer vector)

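
The difference between the two methods shows up in the class of what comes back. A sketch with a made-up list called bird; the double-bracket form at the end is standard R and behaves like '$'.

```r
bird <- list(id = 1:3, trt = c('t1', 't2', 't3'))

# '[]' returns a (smaller) list
by_bracket <- bird['id']
print(class(by_bracket))  # "list"

# '$' returns the element itself, in its original class
by_dollar <- bird$id
print(class(by_dollar))   # "integer"

# '[[ ]]' also returns the element itself
by_double <- bird[['id']]
print(identical(by_dollar, by_double))  # TRUE
```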
Subsetting Data frame
Data frames have both matrix and list properties
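
Because of this, both the matrix-style and the list-style subsetting work on a data frame. A sketch using the bird data, with assumed column names id, trt and rate:

```r
x <- data.frame(
  id   = 1:6,
  trt  = c('t1', 't2', 't3', 't1', 't2', 't3'),
  rate = c(12.3, 10.3, 14.5, 14.3, 9.3, 15.6),
  stringsAsFactors = FALSE
)

# matrix-style: x[ row_number, column_number ]
print(x[2, 3])        # 10.3
print(x[, 'rate'])    # the whole rate column as a vector

# list-style: '$' returns the column itself, '[]' a data frame
print(x$rate)
print(class(x['rate']))  # "data.frame"

# rows can be picked with a condition on a column
t1_rows <- x[x$trt == 't1', ]
print(t1_rows)        # birds 1 and 4
```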
Working with Data frames
For the next few weeks, we will be focusing on data frames, as this
is how your data will be organized.

We will discuss lists more later.
Next Week
R Core Concepts
string functions
dates
regular expressions
Lecture 2: Hands-on Section
Overview of Course Project
Bird Data (go to http://github.com/bfanson/Rcourse_proj)

Filename          Description                           Source
dataBrains.csv    Relative brain size in ~750 species   http://onlinelibrary.wiley.com/doi/10.1111/1365-2656.12255/full
dataGenome.xlsx   Genome size in ~900 species           www.genomesize.com
dataMass.csv      Bird masses (all birds?)              https://ag.purdue.edu/fnr/Documents/WeightBookUpdate.pdf
dataLifespan.txt  Lifespan for ~500 bird species        http://www.demogr.mpg.de/longevityrecords/0303.htm
Objective
We will be looking for correlations among species-level traits in birds (and then
developing post-hoc stories for any correlations)
Getting Started
1) Open Rcourse_proj.R

2) install the 'xlsx' package and add library(xlsx) to your .Rprofile so it loads at start-up

3) close and re-open Rcourse_proj.R
this is to load .Rprofile again

4) download and open Lecture2.r script (we will follow this script)
http://github.com/bfanson/Rcourse_proj