
R Module 1 Notes

I. Getting Started with R

To get started with R, we first discuss its capabilities, describe how to install it,
illustrate some basic commands, and show how to obtain help.

There are several statistical software packages that provide all sorts of analytical and
data management capabilities:
 R (www.r-project.org)
 SAS (www.sas.com)
 SPSS (www.spss.com)
 Stata (www.stata.com)

Practical Issues     R           SAS    SPSS   Stata
Price                ++ (free)   --     -      +
Command Structure    +           +      --     ++
Support              +           -      --     ++
Ease of Teaching     -           --     +      ++

R versus a commercial package:

R: Many different datasets (and other “objects”) available at the same time.
Commercial package: One dataset available at a given time.

R: Datasets can be of any dimension.
Commercial package: Datasets are rectangular.

R: Functions can be modified.
Commercial package: Functions are proprietary.

R: Experience is interactive: you program until you get exactly what you want.
Commercial package: Experience is passive: you choose an analysis, and they give you
everything they think you need.

R: One-stop shopping: almost every analytical tool you can think of is available.
Commercial package: Tend to have limited scope, forcing you to learn additional programs;
extra options cost more and/or require you to learn a different language.

R: R is free and will continue to exist. Nothing can make it go away, and its price will
never increase.
Commercial package: They cost money. There is no guarantee they will continue to exist,
but if they do, you can bet that their prices will always increase.

CAVEAT:

“Using R is a bit akin to smoking. The beginning is difficult, one may get
headaches and even gag the first few times. But in the long run, it
becomes pleasurable and even addictive. Yet, deep down, for those
willing to be honest, there is something not fully healthy in it.”
--Francois Pinard

A. What is R?

• R is a statistical programming environment for applying standard and specialized
statistical tools.
• R is a free, open-source statistical package based on the S language developed
at Bell Labs (later commercially released by MathSoft as S-PLUS).
• R is a programming language, thus generating computer code to complete tasks is
typically required.
• Initially developed by Robert Gentleman and Ross Ihaka of University of Auckland;
now maintained by the “R core development team”
 Since 1997: international R-core team ~15 people & 1000s of code writers
and statisticians happy to share their libraries
• Cross platform compatibility: Windows, MacOS, Linux
• Very powerful for writing programs.
 Many statistical functions are already built in.
 Contributed packages expand the functionality to cutting edge research.

B. Installation

To install R, go to the R homepage at http://www.r-project.org/ and download the latest
version:

1. Select your preferred CRAN mirror.
2. Select the appropriate operating system for your device.
3. Select "install R for the first time".
4. Download the installer and run it on your device.

C. The R Workspace

• The workspace is your current working environment.
• It consists primarily of variables, datasets, and functions.
• Most functionality is provided through built-in and user-created functions; all data objects
are kept in memory during an interactive session.
 Basic functions are available by default.
 Other functions are contained in packages.

 R command window (Console)


◦ Used for entering commands, data manipulations, analyses, graphing
◦ Output: results of analyses, queries, etc. are written here
◦ Toggle through previous commands by using the up and down arrow keys

[Figure: the interactive command window, where commands are typed.]

 R Scripts Window

R scripts
 A script is a text file containing commands that you would otherwise enter on the R command line.
 To place a comment in an R script, use a hash mark (#); R ignores the rest of the line after it.
 To open a new script window, choose File ► New Script from the menu bar.

Tool bar button functions:
• Open: Opens an R file.
• Load Workspace
• Save: Saves the current data.
• Copy
• Paste
• Copy and Paste
• Stop current computation
• Print

 Assignments, Operations, and Functions

• Arithmetic and Mathematical Operations:


 +, -, *, /, ^ are the standard arithmetic operators.
 Mod: %%
 sqrt, exp, log, log10, sin, cos, tan, …
• Other Operations (listed from high to low precedence):
 $               component selection
 [ , [[          subscripts, elements
 :               sequence operator
 %*%             matrix multiplication
 <, >, <=, >=    inequality
 ==, !=          equality comparison
 !               not
 &, |, &&, ||    and, or
 ~               formulas
 <-              assignment (or =)

• Functions:
– Almost everything in R is done through functions. Numeric and character
functions are commonly used in creating or recoding variables.
– Note that while the examples here apply functions to individual variables,
many can be applied to vectors and matrices as well.

• Some Numeric Functions:


Function                   Description
abs(x)                     absolute value: abs(-5.5) is 5.5
sqrt(x)                    square root: sqrt(4) is 2
ceiling(x)                 ceiling(3.475) is 4
floor(x)                   floor(3.475) is 3
trunc(x)                   trunc(5.99) is 5
round(x, digits=n)         round(3.475, digits=2) is 3.48
signif(x, digits=n)        signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x)     also acos(x), cosh(x), acosh(x), etc.
log(x)                     natural logarithm
log10(x)                   common logarithm
exp(x)                     e^x
seq(from, to, by)          generate a sequence
                           indices <- seq(1,10,2)
                           # indices is c(1, 3, 5, 7, 9)
rep(x, ntimes)             repeat x n times
                           y <- rep(1:3, 2)
                           # y is c(1, 2, 3, 1, 2, 3)
cut(x, n)                  divide a continuous variable into a factor
                           with n levels
                           y <- cut(x, 5)
                           y
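
The functions in the table can be tried directly at the console. The short sketch below is only an illustration: the vectors v and scores are made-up data, not part of the notes.

```r
# Rounding family applied to the same value
x <- 3.475
ceiling(x)              # 4
floor(x)                # 3
trunc(5.99)             # 5
signif(x, digits = 2)   # 3.5

# Most numeric functions work element by element on vectors
v <- c(-5.5, 4, 9)      # hypothetical data
abs(v)                  # 5.5 4.0 9.0
sqrt(abs(v))

# seq() and rep() build regular vectors
indices <- seq(1, 10, 2)   # 1 3 5 7 9
y <- rep(1:3, 2)           # 1 2 3 1 2 3

# cut() turns a numeric vector into a factor with n interval levels
scores <- c(1, 12, 25, 37, 48, 50)   # hypothetical data
groups <- cut(scores, 5)
nlevels(groups)            # 5
```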
• Matrix Arithmetic.
 * is element wise multiplication
 %*% is matrix multiplication
• Assignment
 To assign a value to a variable use “<-” or equal (=) character

• Objects can be used in other calculations. To print an object, just enter its name.
• Restrictions on object names:
 Object names cannot contain special symbols such as !, +, -, or #.
 A dot (.) and an underscore (_) are allowed; a name may also start with a
dot.
 Object names can contain a number but cannot start with a number.
 R is case sensitive: X and x are two different objects, as are temp and
temP.
• The assignment operator <-
x <- 25
• assigns the value of 25 to the variable x
y <- 3*x
• assigns the value of 3 times x (75 in this case) to the variable y
r <- 4
area.circle <- pi*r^2
area.circle
• NOTE: R is case-sensitive (y ≠ Y)
• We can evaluate truth or falsity of expressions:
2>1
1>2&2>1
• generate sequences (and perform operations on them)
3*(1:5)

• We can do matrix operations


a <- 1:3
b <- 3:5
a*b
a%*%b
a%*%t(b)
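
To make the difference between the three products concrete, here is a small sketch using the same vectors a and b. The result names (elementwise, inner, outer_prod) are illustrative only.

```r
a <- 1:3
b <- 3:5

# * multiplies element by element
elementwise <- a * b        # 3 8 15

# %*% on two vectors of equal length gives the inner product (a 1 x 1 matrix)
inner <- a %*% b            # 1*3 + 2*4 + 3*5 = 26

# a %*% t(b) gives the 3 x 3 outer product
outer_prod <- a %*% t(b)
outer_prod
dim(outer_prod)             # 3 3
```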

 Workspace

• Objects that you create during an R session are held in memory; the collection of
objects that you currently have is called the workspace.
• This workspace is not saved on disk unless you tell R to do so. This means that your
objects are lost when you close R without saving them, or worse, when R or your
system crashes during a session.
• During your R session you can also explicitly save the workspace image. Go to the
`File' menu and then select `Save Workspace...', or use the save.image function.
## save to the current working directory
save.image("basicR.RData")
## just checking what the current working directory is
getwd()
## save to a specific file and location
save.image("C:\\Program Files\\R\\R-2.5.0\\bin\\basicR.RData")
• If you saved the workspace image under the default name (.RData) in the directory where
R starts, R restores it the next time it starts, so all your previously saved objects are
available again. You can also explicitly load a saved workspace: go to the `File' menu and
select `Load Workspace...', or alternatively:
load("basicR.RData")
• R gets confused if you use a path in your code like c:\mydocuments\myfile.txt,
because R treats "\" as an escape character. It is better to use
c:\\mydocuments\\myfile.txt or c:/mydocuments/myfile.txt.
• To list the objects that you have in your current R session use the function ls or the
function objects :
ls()
objects()
• So to run the function ls we need to enter the name followed by an opening "(" and a
closing ")". Entering only ls prints the object itself, so you will see the underlying R
code of the function ls.
• Most functions in R accept certain arguments.
• For example, one of the arguments of the function ls is pattern, a regular expression.
To list all objects whose names start with the letter "x" (a plain "x" would match names
that contain x anywhere):
x2 = 9
y2 = 10
ls(pattern="^x")
• If you assign a value to an object that already exists then the contents of the object
will be overwritten with the new value (without a warning!).
• Use the function rm to remove one or more objects from your session.

rm(x, x2)
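
The listing, overwriting, and removing steps can be put together in one short session. This is a sketch only; starts_with_x and contains_x are illustrative names, and the output of ls() depends on what else is in your workspace.

```r
# Create a few objects (names follow the examples above)
x  <- 25
x2 <- 9
y2 <- 10

starts_with_x <- ls(pattern = "^x")   # names beginning with "x"
contains_x    <- ls(pattern = "x")    # names containing "x" anywhere
starts_with_x

# Assigning to an existing name silently replaces the old value
x <- "now a character string"
x

# Remove two objects, then list what is left
rm(x, x2)
ls()
```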
• Let us generate two small vectors with data and a scatterplot.
z2 <- c(1,2,3,4,5,6)
z3 <- c(6,8,3,5,7,1)
plot(z2,z3)
title("My first scatterplot")

 Data Sets and Libraries

◦ R comes with a number of sample datasets that you can experiment with. Type data() to
see the available datasets. The result will depend on which packages you have loaded.
◦ Type help(datasetname) for details on a sample dataset.
data()
help(women)

• One of the strengths of R is that the system can easily be extended.

 The system allows you to write new functions and package those functions in a
so called `R package' (or `R library').
 The R package may also contain other R objects, for example data sets or
documentation.

• There is a lively R user community and many R packages have been written and made
available on CRAN for other users.

 To name just a few examples, there are packages for portfolio optimization, drawing
maps, exporting objects to HTML, time series analysis, and spatial statistics, and the list
goes on and on.

• When you install R, a number of packages are installed along with it.

 To use a function in an R package, that package has to be attached to the
system.
 When you start R, not all of the installed packages are attached; only seven
packages are attached to the system by default.

• You can use the function search to see a list of packages that are currently attached to
the system; this list is also called the search path.
search()
• To attach another package to the system you can use the menu or the library function.
library()
library(MASS)
shoes

• Or you can use the Menu: Select the “Packages” in the Menu and select “Load
Package”, a list of available packages on your system will be displayed. Select one and
click “OK”, the package is now attached to your current R session.
• Suppose we want to install a package called Rcmdr:
Choose Rcmdr in the Packages ► Install packages menu,
or alternatively run the command: install.packages("Rcmdr")
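
As a quick check of what attaching does, the sketch below compares the search path before and after library(MASS). It assumes MASS is installed, which is normally the case because it ships with standard R distributions; attached_before and attached_after are illustrative names.

```r
attached_before <- search()

library(MASS)          # attach the MASS package
attached_after <- search()

# Entries that were added to the search path by library(MASS)
setdiff(attached_after, attached_before)

# The dataset shoes from MASS is now visible by name
str(shoes)
```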

D. Getting Help
• R has a very good help system built in.
• If you know which function you want help with, type ? followed by the function name.
For example, for the functions hist and lm:
?hist
args(hist)
?lm
args(lm)

• If you don't know which function to use, then use help.search("<keyword>").


help.search("histogram")
• In Help Option of Menu/Header bar:
 Console (keys to work on R console)
 FAQ on R
 FAQ on R for windows
 Manuals (in portable document format)
 R function
 Html help

• Each of the following tutorials is available in PDF format.


 P. Kuhnert & B. Venables, An Introduction to R: Software for Statistical Modeling
& Computing
 J.H. Maindonald, Using R for Data Analysis and Graphics
 B. Muenchen, R for SAS and SPSS Users
 W.J. Owen, The R Guide
 D. Rossiter, Introduction to the R Project for Statistical Computing for Use at the
ITC
 W.N. Venables & D. M. Smith, An Introduction to R

II. Data Types in R

• R is an object oriented programming language;


 Almost all things in R – functions, datasets, results, etc. – are OBJECTS.
 Graphics are written out and are not stored as objects

• Objects are classified by two criteria:

 MODE: how objects are stored in R - character, numeric, logical, list, &
function
 CLASS: how objects are treated by functions - vector, matrix, array, data frame, factor
(stored with numeric mode), & hundreds of special classes created by specific functions
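
The functions mode() and class() report these two properties for any object. The sketch below uses small made-up objects (num_vec, sex_f, df) purely for illustration.

```r
num_vec <- c(2, 4, -3.6, 12)
mode(num_vec)            # "numeric"
mode("one")              # "character"
mode(TRUE)               # "logical"
mode(list(1, "a"))       # "list"
mode(sqrt)               # "function"

# A factor is stored as numeric codes but treated as its own class
sex_f <- factor(c("male", "female", "male"))
mode(sex_f)              # "numeric"
class(sex_f)             # "factor"

# A data frame is stored as a list of columns
df <- data.frame(id = 1:2, passed = c(TRUE, FALSE))
mode(df)                 # "list"
class(df)                # "data.frame"
```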

• R has a wide variety of data types:


 scalars,
 vectors (numerical, character, logical),
 matrices,
 data frames
 lists

A. Scalar

• Scalars are fixed constants.


# scalar (fixed constants)
3+8
x <- 50
41.3*x
• Thus, we can use R as a calculator directly.
• Or alternatively, generate scalar objects first, perform calculations on the objects (and
yield another object)
• Scalars are the most basic “vectors”.

B. Vectors

• null vector
# null vector using the c() (combine) function
x <- c()

• numeric vector
# numeric vector
a <- c(2,4,-3.6,12) ; a

Note: A semicolon separates multiple statements on one line.

• character vector
# character vector
b <- c("one","two","three")

• logical vector
#logical vector
c1 <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)

• We can refer to elements of a vector using subscripts.


# 3rd and 2nd elements of vector a
a[c(3,2)]

• Also, vectors can be generated using functions such as


# sequence
d=seq( from =1, to =4, by =0.2); d

# replicate
e = rep(NA,5); e

vector1 <- c( seq(0,5),rep(NA,2),-1,-5)

# length
length(d)

• Be careful with assignments using "<-": with a space between the two characters, "< -" means "less than minus" instead.
# assigning a value of 2 to f
f<-2
f
# is f less than negative 2?
f< -2

• Comparison and Logical Operators:
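
The comparison and logical operators listed earlier (<, >, <=, >=, ==, !=, !, &, |, &&, ||) work element by element on vectors. A minimal sketch, assuming the numeric vector a from above before it is sorted:

```r
a <- c(2, 4, -3.6, 12)

a > 3               # FALSE  TRUE FALSE  TRUE
a == 4              # FALSE  TRUE FALSE FALSE
a != 4              #  TRUE FALSE  TRUE  TRUE
!(a > 3)            #  TRUE FALSE  TRUE FALSE
a > 0 & a < 10      #  TRUE  TRUE FALSE FALSE
a < 0 | a > 10      # FALSE FALSE  TRUE  TRUE

# A logical vector can be used as a subscript
a[a > 0 & a < 10]   # 2 4

# && and || combine single TRUE/FALSE values
length(a) == 4 && all(a < 100)   # TRUE
```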

• We can sort:
a
a <-sort(a, decreasing=TRUE); a

• We can identify missing values with is.na function:


a<-c(a,NA, 2,-4, 2, NA,2); a
is.na(a)
• Other helpful functions:
unique(a)
duplicated(a)

• Note: by default, mean of a vector with missing data is missing:


mean(a)
mean(a,na.rm=TRUE)

• To determine number of missing data:


sum(is.na(a))
#Summary statistics
summary(a)
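
The missing-value functions above can be followed step by step in the sketch below. The vector is written out in full so the example runs on its own; n_missing and mean_obs are illustrative names.

```r
a <- c(12, 4, 2, -3.6, NA, 2, -4, 2, NA, 2)

is.na(a)                             # TRUE at positions 5 and 9
which(is.na(a))                      # 5 9
n_missing <- sum(is.na(a))           # 2

mean(a)                              # NA, because of the missing values
mean_obs <- mean(a, na.rm = TRUE)    # 2.05, mean of the 8 observed values

unique(a)                            # 12 4 2 -3.6 NA -4
duplicated(a)                        # TRUE for repeats of an earlier value

# Keep only the observed values
a[!is.na(a)]
```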

EXERCISE
1. Generate a vector e1 of positive even integers less than 100.
2. Remove the values greater than 50 and less than 90, and store these into e2.

C. Matrix

• A matrix is a rectangular array.

• All columns in a matrix must have the same mode (numeric, character, etc.) and the
same length.

• General format is
mymatrix <- matrix(vector, nrow=r, ncol=c, byrow=FALSE,
                   dimnames=list(char_vector_rownames, char_vector_colnames))

byrow=TRUE indicates that the matrix should be filled by rows.
byrow=FALSE indicates that the matrix should be filled by columns (the default).
dimnames provides optional labels for the rows and columns.
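
A small sketch of the effect of byrow and dimnames. The object names and the labels r1, r2, A, B, C are made up for illustration.

```r
# Same data, filled by columns (default) and by rows
by_col <- matrix(1:6, nrow = 2, ncol = 3)
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
by_col    # rows are 1 3 5 and 2 4 6
by_row    # rows are 1 2 3 and 4 5 6

# Row and column labels can then be used as subscripts
labelled <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE,
                   dimnames = list(c("r1", "r2"), c("A", "B", "C")))
labelled
labelled["r2", "B"]   # 5
```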

• Let us generate a 2 by 3 matrix consisting of numbers 10 to 15:


mat_a <- matrix(10:15, nrow = 2, ncol = 3); mat_a

• We can perform matrix operations:


mat_a+mat_a
3*mat_a

• We can transpose a matrix:


mat_a; t(mat_a)

• We can identify its dimensions:


dim(mat_a )

Or alternatively:
nrow(mat_a)
ncol(mat_a)

• Do matrix multiplication
mat_a %*% t(mat_a )

which is different from:


t(mat_a )%*% (mat_a )
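
The two products differ even in shape, as the sketch below shows for the same mat_a. The names prod_2x2 and prod_3x3 are illustrative.

```r
mat_a <- matrix(10:15, nrow = 2, ncol = 3)   # rows: 10 12 14 and 11 13 15

# (2 x 3) times (3 x 2) gives a 2 x 2 matrix
prod_2x2 <- mat_a %*% t(mat_a)
prod_2x2            # rows: 440 476 and 476 515

# (3 x 2) times (2 x 3) gives a 3 x 3 matrix
prod_3x3 <- t(mat_a) %*% mat_a
dim(prod_3x3)       # 3 3
prod_3x3[1, 1]      # 10*10 + 11*11 = 221
```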

• We can also subset matrices


First element in Matrix
mat_a [1,1]

First row
mat_a[1,]

Question: how about second column?

• We can extract elements of a matrix


Extracting 2nd and 3rd elements in first row
mat_a [1,c(2,3)]

Extracting 2nd element in 1st row, and 3rd element in 2nd row;
c(mat_a[1,2], mat_a[2,3])

• To stack two vectors one below the other, use rbind():
mat_b <-rbind(a,a); mat_b

If one vector is shorter than the other, its elements are recycled until the lengths match
(R warns when the longer length is not a multiple of the shorter one):
a
d
mat_c = rbind(a,d); mat_c

• To place two vectors side by side, use cbind():
mat_d <-cbind(a,a); mat_d

• Missing data may also be part of a matrix:


mat_e= matrix (c(9,NA,-2,5,-10, NA), nrow =2, ncol=3, byrow = TRUE ); mat_e

• To see which elements of a matrix are missing, use is.na():


is.na(mat_e)

• To see how many missing values there are, use sum and is.na functions:
sum(is.na(mat_e))

• To obtain the positions of the missing value(s) within the matrix, use the which and is.na
functions:
which(is.na(mat_e))
Note: positions are counted column by column: down the first column, then down the next.
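
The column-by-column counting can be checked on mat_e itself. The sketch below also uses the arr.ind argument of which(), which reports row and column numbers instead; pos and row_col are illustrative names.

```r
mat_e <- matrix(c(9, NA, -2, 5, -10, NA), nrow = 2, ncol = 3, byrow = TRUE)
mat_e    # rows: 9 NA -2 and 5 -10 NA

# Positions counted down the columns: 9, 5, NA, -10, -2, NA
pos <- which(is.na(mat_e))                       # 3 6

# Row and column numbers of the same missing values
row_col <- which(is.na(mat_e), arr.ind = TRUE)   # (1, 2) and (2, 3)
row_col
```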

• EXERCISE
Find the matrix product of M_A and M_B if

M_A= M_B =

D. Arrays

• Extension of matrices but can have more than two dimensions.


• Vector is an array of one dimension
• Matrix is a rectangular array
• See help(array) for details.
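
A minimal sketch of a three-dimensional array, using the array() function with made-up dimensions 2 x 3 x 4:

```r
# 24 values arranged as 4 slices, each a 2 x 3 matrix
arr <- array(1:24, dim = c(2, 3, 4))
dim(arr)            # 2 3 4

# Subscripts work as for matrices, with one index per dimension
arr[1, 2, 1]        # 3
arr[2, 3, 4]        # 24

# Leaving an index empty selects everything along that dimension
first_slice <- arr[, , 1]
first_slice         # the 2 x 3 matrix holding 1 to 6
```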

E. Dataframes

• A data frame is a generalization of a matrix in which different columns can have different
modes (numeric, character, factor, etc.).
d <- 1:5
e <- c("red", NA, "white", "blue", "red")
f <- c(TRUE,TRUE,TRUE,FALSE,TRUE)
mydata <- data.frame(d,e,f)
names(mydata) <- c("ID","Color","Passed") #variable names

• There are a variety of ways to identify the elements of a dataframe.

mydata[2:3] # columns 2 and 3 of dataframe
mydata[c("ID","Color")] # columns ID and Color from dataframe
mydata$Color # variable Color in the dataframe
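
Rows can be selected as well as columns. The sketch below rebuilds the same data frame so it runs on its own; the row selections (passed_rows, later_ids) are additional illustrations, not part of the notes.

```r
d <- 1:5
e <- c("red", NA, "white", "blue", "red")
f <- c(TRUE, TRUE, TRUE, FALSE, TRUE)
mydata <- data.frame(ID = d, Color = e, Passed = f)

dim(mydata)                 # 5 rows, 3 columns
str(mydata)                 # one line per column, showing its type

# Rows where Passed is TRUE, all columns
passed_rows <- mydata[mydata$Passed, ]
passed_rows

# Rows with ID greater than 2, two columns only
later_ids <- mydata[mydata$ID > 2, c("ID", "Color")]
later_ids
```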

F. Lists

• An ordered collection of objects (components).


• A list allows you to gather a variety of (possibly unrelated) objects under one name.
# example of a list with 5 components: a string, a numeric vector, a matrix, a data frame, and a scalar
w <- list(name="Fred", mynumbers=a, mymatrix=mat_a, mydf=mydata,age=28)
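
Components of a list are retrieved by name with $ or [[ ]], or by position. The sketch below uses a shorter, self-contained version of the list w (without the data frame) for illustration.

```r
w <- list(name = "Fred",
          mynumbers = c(2, 4, -3.6, 12),
          mymatrix = matrix(10:15, nrow = 2, ncol = 3),
          age = 28)

names(w)             # "name" "mynumbers" "mymatrix" "age"
length(w)            # 4

w$name               # "Fred"
w[["age"]]           # 28
w[[2]][3]            # third element of the second component: -3.6
w$mymatrix[2, 3]     # 15

# Single brackets return a (shorter) list, double brackets the component itself
class(w["age"])      # "list"
class(w[["age"]])    # "numeric"
```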

G. Factor

• Need to tell R that a variable is nominal by making it a factor.

# variable sex with 20 "male" entries and 30 "female" entries


sex <- c(rep("male",20), rep("female", 30))
summary(sex)

sex <- factor(sex)


# stores sex as 20 2s and 30 1s and associates 1=female, 2=male internally (alphabetically)
# R now treats sex as a nominal variable
summary(sex)
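
The internal coding of a factor can be inspected directly. In the sketch below, sex2 and its explicit level order are an added illustration of how to override the alphabetical default.

```r
sex <- c(rep("male", 20), rep("female", 30))
sex <- factor(sex)

levels(sex)           # "female" "male" (alphabetical)
table(sex)            # female 30, male 20
as.integer(sex)[1]    # 2, because the first entry is "male"

# The level order can be set explicitly instead
sex2 <- factor(sex, levels = c("male", "female"))
levels(sex2)          # "male" "female"
as.integer(sex2)[1]   # 1
```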
