R Programming

University of Trento - FBK
19 February, 2015
Hints on programming
1 Save all your commands in a SCRIPT FILE; they will be useful in the future.
2 Save your script file as often as you can! You sweat a lot writing those instructions; you don't want to lose them!
3 Give meaningful names to variables and functions (avoid "pippo", "pluto", "a", "b", etc.)
4 Use comments to define sections in your script and describe what each section does
If you read the code again after two months, you won't remember what it does unless you re-read every instruction. That is not worth the time: use COMMENTS instead
5 If a value is used in more than one instruction, avoid code repetition and hard-coded values.
BAD:
sum(a[a>0])
GOOD:
thr <- 0
sum(a[a>thr])
Programming with R
The if then else statement

Check whether a condition is TRUE or FALSE
Syntax:
if (expr is TRUE){
  do something
} else {
  do something else
}
expr can be any logical expression, as seen before
A simple if statement:
If the instruction is on one line and there is no else, there is no need for curly brackets
x <- 5
y <- 2
## if (y!=0) xy <- x/y
## xy
A more complex if statement:
x <- 5
y <- 3
if (x > 5){
  xy <- x - y ## expr = TRUE
} else {
  xy <- x + y ## expr = FALSE
}
xy
## [1] 8
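The if statement tests a single TRUE/FALSE value. As a minimal sketch (the vector x and the labels are made up for illustration), several branches can be chained with else if, while the base function ifelse() applies a test to every element of a vector:

```r
x <- c(-2, 0, 3)   # hypothetical example values

# if() needs one TRUE/FALSE, so we test a single element
if (x[1] < 0) {
  sign_first <- "negative"
} else if (x[1] == 0) {
  sign_first <- "zero"
} else {
  sign_first <- "positive"
}

# ifelse() applies the test element by element and returns a vector
signs <- ifelse(x < 0, "negative", ifelse(x == 0, "zero", "positive"))
```

Use if/else to choose which instructions to run, and ifelse() to build a vector of results.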
Testing conditions using combinations of expressions (& |)
a<-2
b<-3
d<-4
# Using & to test two conditions, both true
if(a<b & b<d)
  x<-a+b+d
x
## [1] 9
# Using & to test two conditions, one is false
if(a>b & b<d)
  y<-a-b-d
y
## Error in eval(expr, envir, enclos): object 'y' not found
# Using | to test two conditions, both false
if(a==b | a>d)
  z<-a*b*d
z
## Error in eval(expr, envir, enclos): object 'z' not found
# Using | to test two conditions, one true
if(a<b | a>d)
  z<-a*b*d
z
## [1] 24
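R has two forms of each logical operator. The sketch below, with made-up values, contrasts the element-wise & with the scalar && that is usually preferred inside if(), because it expects single values and stops evaluating as soon as the result is known:

```r
a <- 2; b <- 3; d <- 4
v <- c(1, 5, 10)                 # hypothetical vector

# & works element by element and returns a logical vector
elementwise <- v > 2 & v < 8     # FALSE TRUE FALSE

# && combines single TRUE/FALSE values, as needed by if()
if (a < b && b < d) {
  both_true <- TRUE
} else {
  both_true <- FALSE
}

# && short-circuits: the right-hand side is never evaluated here
short <- FALSE && stop("never evaluated")
```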
Looping
The while() statement

Syntax:
while( expr ){
do something
}
An example
x <- 0 ## set the counter to 0
while( x<5 ){ ## repeat the operation as long as x < 5
x <- x + 1 ## update x
}
x
## [1] 5
Pay attention to the condition: here x is never updated, so x < 5 stays TRUE and the loop never ends

x <- 0
y <- 0
## while (x < 5){
## y <- y + 1
## }
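One way to protect a while() loop against a condition that never becomes FALSE is to add a cap on the number of iterations. This is only a sketch; max_steps is a hypothetical safety limit, not something R requires:

```r
x <- 100
steps <- 0
max_steps <- 1000   # hypothetical safety cap on the number of iterations

# Halve x until it drops to 1 or below, but never loop more than max_steps times
while (x > 1 && steps < max_steps) {
  x <- x / 2
  steps <- steps + 1
}
```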
Looping II
The for() statement

Syntax:
for (i in start:stop ){
do something
}
An example
y <- vector(mode="numeric") ## Allocating an empty vector of mode "numeric"
for (i in 1:5){
y[i] <- i + 2
}
Nested Loops
mat <- matrix(nrow=2,ncol=4)
for (i in 1:2){
for (j in 1:4){
mat[i,j] <- i + j
}
}
mat
## [,1] [,2] [,3] [,4]
## [1,] 2 3 4 5
## [2,] 3 4 5 6
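A common refinement of the loop above, shown here as a sketch, is to allocate the result vector at its final length and to build the index with seq_len() or seq_along(), which behave sensibly when there is nothing to loop over:

```r
n <- 5
y <- numeric(n)            # preallocate a numeric vector of length n
for (i in seq_len(n)) {
  y[i] <- i + 2
}

empty <- numeric(0)
count <- 0
for (i in seq_along(empty)) {
  count <- count + 1       # never runs: there are no elements
}

bad <- 1:length(empty)     # 1:0 counts backwards and gives c(1, 0)
```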
Vectors II
Indexing
Use square brackets [] to access an element of a vector

a[2] ## Extract the second element
## [1] 89
R starts counting from 1

a[0] ## Does not exist!
## integer(0)
We can pass multiple indexes using a range (2:3) or the c() function

a[2:3]
## [1] 89 54
## a[2,3] ## What happens here?
What happens when I use a negative number as an index?

b[-1] ## All but the first element
## [1] 2 3 4 5 6 7 8 9 10
e[-c(1,4)] ## All but the first and the fourth elements

## Error in eval(expr, envir, enclos): object 'e' not found
NB: Do not use c as a variable name
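The vectors used on this slide were defined in an earlier lesson. As a self-contained sketch, the same indexing rules can be tried on a small vector whose values are invented here:

```r
a <- c(12L, 89L, 54L, 7L)   # hypothetical values

second     <- a[2]           # one element
none       <- a[0]           # index 0 selects nothing: integer(0)
middle     <- a[c(2, 3)]     # several positions via c()
drop_first <- a[-1]          # everything except the first element
drop_two   <- a[-c(1, 4)]    # everything except the first and fourth

# a[2, 3] treats a as if it had two dimensions, which is an error for a vector
msg <- tryCatch(a[2, 3], error = function(e) conditionMessage(e))
```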
Subsetting using logical operators
Using logical operators inside indexes

Logical operators can be used to subset a vector
Only the elements for which the condition is TRUE are selected
x <- 5:15
y <- 10
x[x > y]
## [1] 11 12 13 14 15
x[x==y]
## [1] 10
Logical subsetting can also be used on matrices

mymat <- matrix(3:9, ncol=3)
## Warning in matrix(3:9, ncol = 3): data length [7] is not a sub-multiple or multiple of
the number of rows [3]
mymat > 7 ## Get TRUE where mymat is bigger than 7
## [,1] [,2] [,3]
## [1,] FALSE FALSE TRUE
## [2,] FALSE FALSE FALSE
## [3,] FALSE TRUE FALSE
mymat[mymat>7] ## Get the actual values where mymat is bigger than 7
## [1] 8 9
Subsetting using logical operators II
Getting indexes
The which() function

Syntax: which(expr)
works on logical vectors and arrays, such as the result of a comparison on a vector, matrix or data.frame
returns the indexes where expr is TRUE
expr can be any logical expression; combinations of AND (&) and OR (|) are accepted
mymat > 7
## [,1] [,2] [,3]

## [1,] FALSE FALSE TRUE
## [2,] FALSE FALSE FALSE
## [3,] FALSE TRUE FALSE
## Get the indexes where mymat > 7

which(mymat>7)
## [1] 6 7
which(mymat>7, arr.ind=TRUE)
## row col
## [1,] 3 2
## [2,] 1 3
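As a small sketch of combining conditions inside which() (the vector and matrix are made up for illustration), the returned positions can then be used to extract the values:

```r
x <- 5:15
idx  <- which(x > 7 & x %% 2 == 0)   # positions of the even values above 7
vals <- x[idx]                       # the values at those positions

m <- matrix(1:6, nrow = 2)           # hypothetical 2 x 3 matrix
pos <- which(m > 4, arr.ind = TRUE)  # row/column pairs instead of linear positions
```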
Exercises I
1 Given an integer x, find all its divisors.

2 Given an integer x, compute the sum of all its divisors.
3 A perfect number is a number that equals the sum of its divisors (excluding itself). For example, 6 is perfect because 1 + 2 + 3 (its divisors) = 6.
  1 Given an integer, check whether it is perfect.
  2 Given an integer x, find all perfect numbers i < x.
Functions I
Define your own function

We have seen many functions, such as:
sum(mymat)
## [1] 49
mean(mymat)
## [1] 5.4444
Now you can define your own function

myfunction <- function(arg1, arg2){
  ## do something with arg1 and arg2
  return(results)
}
Define a function to convert Fahrenheit to Celsius
FtoC <- function(F){
cels <- (F - 32) * (5/9)
return(cels)
}
FtoC(212)
## [1] 100
Functions II
Define a function that raises a number/vector to a power

Use a default argument
mypow <- function(x, exponent=2){
  res <- x^exponent
  return(res)
}
mypow(2)
## [1] 4
mypow(3,5)
## [1] 243
Variables defined inside a function are valid only inside the function
res
## Error in eval(expr, envir, enclos): object 'res' not found
Use debug() for debugging a function

It runs the function line by line
It lets you inspect the values of the variables inside the function
Each time the function is redefined, debug mode is removed
To leave the debugger and continue execution, type c
debug(mypow)
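The debugging flag can also be checked and removed by hand. The sketch below redefines mypow so that it is self-contained and uses the base functions isdebugged() and undebug(); it never calls the function while the flag is set, so no interactive browser is opened:

```r
mypow <- function(x, exponent = 2) {
  res <- x^exponent
  return(res)
}

debug(mypow)                    # the next call would open the line-by-line browser
flag_on <- isdebugged(mypow)    # TRUE while the flag is set

undebug(mypow)                  # remove the flag without redefining the function
flag_off <- isdebugged(mypow)   # FALSE again

result <- mypow(3, 5)           # runs normally now
```

debugonce(mypow) is a related base function that sets the flag for the next call only.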
Functions III
Function arguments can be passed by position

bt <- read.table("../Lesson1/example1/BodyTemperature.txt",TRUE, " ") ## This will assign the f
## Gender Age HeartRate Temperature
## 1 M 33 69 97.0
## 2 M 32 72 98.8
## 3 M 42 68 96.2
## 4 F 33 75 97.8
## 5 F 26 68 98.8
## 6 M 37 79 101.3
Function arguments can be passed by name

## Call arguments by name (position does not count)
bt <- read.table("../Lesson1/example1/BodyTemperature.txt",sep=" ", header=TRUE)
## Gender Age HeartRate Temperature
## 1 M 33 69 97.0
## 2 M 32 72 98.8
## 3 M 42 68 96.2
## 4 F 33 75 97.8
## 5 F 26 68 98.8
## 6 M 37 79 101.3
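The same rule can be tried without a data file. This sketch redefines the mypow function from the earlier slide in a compact form and calls it by position, by name, and with a mix of both:

```r
mypow <- function(x, exponent = 2) x^exponent   # compact version of mypow

by_position <- mypow(3, 5)                 # x = 3, exponent = 5
by_name     <- mypow(exponent = 5, x = 3)  # order does not matter with names
mixed       <- mypow(3, exponent = 5)      # positional first, then named
swapped     <- mypow(5, 3)                 # by position the order matters: 5^3
```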
Data Exploration and summary statistics
Develop a high-level understanding of the data

Given a data.frame, let's understand the data inside.
What variables do we have?
Do they have meaningful names?
What are the variable types? (numeric, boolean, categorical)
What is the distribution of the data?
Are there any categorical variables?
The aim is to reduce the amount of information and focus only on the key aspects of the data
Working with data objects
As an example, let's work on the BodyTemperature dataset.

bt <- read.table("BodyTemperature.txt", header=TRUE, sep=" ", as.is=TRUE)
head(bt) ## Let's look only at the first rows of the data.frame
## Gender Age HeartRate Temperature
## 1 M 33 69 97.0
## 2 M 32 72 98.8
## 3 M 42 68 96.2
## 4 F 33 75 97.8
## 5 F 26 68 98.8
## 6 M 37 79 101.3
Working with data objects
Get the structure and some useful statistics

str(bt) ## See the structure of the data object
## 'data.frame': 100 obs. of 4 variables:
## $ Gender : chr "M" "M" "M" "F" ...
## $ Age : int 33 32 42 33 26 37 32 45 31 49 ...
## $ HeartRate : int 69 72 68 75 68 79 71 73 77 81 ...
## $ Temperature: num 97 98.8 96.2 97.8 98.8 ...
summary(bt) ## Compute some statistics on each variable in the data.frame
## Gender Age HeartRate Temperature
## Length:100 Min. :21.0 Min. :61.0 Min. : 96.2
## Class :character 1st Qu.:33.8 1st Qu.:69.0 1st Qu.: 97.7
## Mode :character Median :37.0 Median :73.0 Median : 98.3
## Mean :37.6 Mean :73.7 Mean : 98.3
## 3rd Qu.:42.0 3rd Qu.:78.0 3rd Qu.: 98.9
## Max. :50.0 Max. :87.0 Max. :101.3
names(bt) ## Get the variable names
## [1] "Gender" "Age" "HeartRate" "Temperature"
Working with data objects I
Change the variable mode of the columns:

Check the variable modes
is.data.frame(bt) ## Check if the object is a data.frame
## [1] TRUE
is.numeric(bt$Age) ## Check if the mode of the column is numeric
## [1] TRUE
is.character(bt$Gender) ## Check if the mode of the variable Gender is character
## [1] TRUE
The variable Gender is categorical, but it is stored as character

as.factor(bt$Gender) ## Convert the variable Gender into a factor (categorical)
## [1] M M M F F M F F F M M F F F F M F M F F F F F M F M M M M F F F M M M
## [36] F F M F F M M F M M M F F F F M F M M F F F M F F F M M F M M F M M M
## [71] F F M M M M F M F M M F F M F M M M F M F F M M F M F F F M
## Levels: F M
Working with data objects II
Store the changes on the data.frame and check the data.frame

bt$Gender <- as.factor(bt$Gender) ## Store the previous change
str(bt) ## Look at the structure
## 'data.frame': 100 obs. of 4 variables:
## $ Gender : Factor w/ 2 levels "F","M": 2 2 2 1 1 2 1 1 1 2 ...
## $ Age : int 33 32 42 33 26 37 32 45 31 49 ...
## $ HeartRate : int 69 72 68 75 68 79 71 73 77 81 ...
## $ Temperature: num 97 98.8 96.2 97.8 98.8 ...
summary(bt) ## Compute some statistics
## Gender Age HeartRate Temperature
## F:51 Min. :21.0 Min. :61.0 Min. : 96.2
## M:49 1st Qu.:33.8 1st Qu.:69.0 1st Qu.: 97.7
## Median :37.0 Median :73.0 Median : 98.3
## Mean :37.6 Mean :73.7 Mean : 98.3
## 3rd Qu.:42.0 3rd Qu.:78.0 3rd Qu.: 98.9
## Max. :50.0 Max. :87.0 Max. :101.3
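What the conversion to a factor does can be seen on a tiny character vector. This is a sketch with invented values, not the bt data:

```r
gender <- c("M", "M", "F", "M", "F")   # hypothetical character data

gender_f <- as.factor(gender)          # convert to a categorical variable
lv     <- levels(gender_f)             # levels are sorted alphabetically: "F" "M"
counts <- table(gender_f)              # number of observations per level
codes  <- as.integer(gender_f)         # internal integer codes: 2 2 1 2 1
```

The integer codes are what str() prints after the level names for a factor column.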
Exercise II
1 Define a function that converts km to miles and vice versa.

2 Define a function that checks whether a number is perfect (see Exercises I).
3 Define a function that, given a numeric matrix, returns the log of each matrix element that is > 0 and NA otherwise.
4 Get the dataset SAheart_sub.data from the website and check the type of each column.
Add a column of type factor with the value Alcoholic where the alcohol consumption is > 13 and Non-Alcoholic otherwise.
Probability Distributions in R
Probability functions:
Every probability distribution in R has 4 functions, each denoted by a root (e.g. norm for the normal distribution) and a prefix:
p for "probability", the cumulative distribution function (c.d.f.)
$F(x) = P(X \le x)$
q for "quantile", the inverse of the c.d.f.
$x = F^{-1}(p)$
d for "density", the density function (p.d.f.); for the standard normal distribution:
$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$
r for "random", random values drawn from the specified distribution
Example:
For the normal distribution we have the functions: pnorm, qnorm, dnorm, rnorm
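The relationship between the four prefixes can be checked numerically. The following is a minimal sketch for the standard normal distribution; the seed value is arbitrary:

```r
p      <- pnorm(1.5)    # c.d.f.: P(X <= 1.5)
x_back <- qnorm(p)      # quantile function undoes the c.d.f.: back to 1.5

d0 <- dnorm(0)          # density at 0, equal to 1 / sqrt(2 * pi)

set.seed(1)             # arbitrary seed, only to make the draws repeatable
draws <- rnorm(3)       # three random values from N(0, 1)
```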
Probability distribution in R
Available functions
Distribution   p (c.d.f.)   q (quantile)   d (density)   r (random)
Binomial       pbinom       qbinom         dbinom        rbinom
Chi-Square     pchisq       qchisq         dchisq        rchisq
Exponential    pexp         qexp           dexp          rexp
Log Normal     plnorm       qlnorm         dlnorm        rlnorm
Normal         pnorm        qnorm          dnorm         rnorm
Poisson        ppois        qpois          dpois         rpois
Student t      pt           qt             dt            rt
Uniform        punif        qunif          dunif         runif
Check the help (?<function>) for further information on the parameters and the usage of each function.
The Normal Distribution in R
Cumulative Distribution Function
pnorm: computes the cumulative distribution function (c.d.f.) of a normally distributed X
$F(x) = P(X \le x)$
## P(X<=2), X=N(0,1)
pnorm(2)
## [1] 0.97725
## P(X<=12), X=N(10,4); the notation is N(mean, variance), while pnorm takes the standard deviation sd
pnorm(12, mean=10, sd=2)
## [1] 0.84134
What is P(X > 19) where X = N(17.4, 375.67)?
[Figure: Normal Cumulative, the curve of pnorm for x from -4 to 4]
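pnorm returns the lower tail P(X <= x). For an upper-tail probability P(X > x), either subtract from 1 or use the lower.tail argument. The sketch below reuses the N(10, 4) example from this slide rather than the exercise values:

```r
p_lower  <- pnorm(12, mean = 10, sd = 2)                      # P(X <= 12)
p_upper  <- 1 - p_lower                                       # P(X > 12)
p_upper2 <- pnorm(12, mean = 10, sd = 2, lower.tail = FALSE)  # same, computed directly
```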
The quantiles
qnorm: computes the inverse of the c.d.f. Given a number 0 ≤ p ≤ 1, it returns the p-th quantile of the distribution.
$p = F(x)$
$x = F^{-1}(p)$
## x = F^-1(0.95), N(0,1)
qnorm(0.95)
## [1] 1.6449
## x = F^-1(0.95), N(100,625)
qnorm(0.95, mean=100, sd=25)
## [1] 141.12
What is the 85-th quantile of X = N(72, 68)?
[Figure: the pnorm curve for x from -3 to 3, with p = 0.95 marked on the Y axis and qnorm(p) = 1.645 on the X axis]
The Density Function
dnorm: computes the probability density function (p.d.f.) of the normal distribution.
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
## f(0.5), X = N(0,1)
dnorm(0.5)
## [1] 0.35207
## f(-2.5), X = N(-1.5,2)
dnorm(-2.5, mean=-1.5, sd=sqrt(2))
## [1] 0.2197
[Figure: Density Function, the curve of dnorm for x from -4 to 4]
The Random Function
rnorm: simulates random variates from a specified normal distribution.
## Extract 1000 samples X = N(0,1)
x <- rnorm(1000)
## Extract 1000 samples X = N(100,225)
x <- rnorm(1000, mean=100, sd=15)
xx <- seq(min(x), max(x), length.out=100)
hist(x, probability=TRUE)
lines(xx, dnorm(xx, mean=100, sd=15))
[Figure: Histogram of x on the density scale, x from about 60 to 140, with the N(100,225) density curve overlaid]
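Random draws change at every run. If a result must be repeatable, for example in a script shared with others, the generator can be seeded first. A minimal sketch, where the seed value 42 is arbitrary:

```r
set.seed(42)                            # arbitrary seed
x1 <- rnorm(1000, mean = 100, sd = 15)

set.seed(42)                            # same seed again
x2 <- rnorm(1000, mean = 100, sd = 15)

same <- identical(x1, x2)               # TRUE: the two draws are the same
m <- mean(x1)                           # close to 100, but not exactly 100
s <- sd(x1)                             # close to 15
```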
Exercise III
1 Compute the quantiles for p = [0.01, 0.05, 0.1, 0.2, 0.25] given X = N(−2, 8)
2 What is P(X = 1) when X = Bin(25, 0.005)?
3 What is P(13 ≤ X ≤ 22) where X = N(17.46, 375.67)?
Plotting in R
High level plot functions
Function Name   Plot Produced
plot(x,y)       Plot vector x against vector y
boxplot(x)      "Box and whiskers" plot
hist(x)         Histogram of the frequencies of x
barplot(x)      Bar plot of the values of x
pairs(x)        For a matrix or data.frame, plots all bivariate pairs
image(x,y,z)    3D plot using colors instead of lines
Simple visualization on numeric variables
Visualizing two vectors

x <- 1:10
y <- 1:10
plot(x,y)
[Figure: scatter plot of y against x, both ranging from 1 to 10]
Simple visualization on numeric variables
Visualizing two vectors, adding axis labels and changing the plot type
plot(x,y, xlab="X values", ylab="Y values", main="X vs Y", type="b")
[Figure: "X vs Y", points joined by lines, with axes labelled "X values" and "Y values"]
More graphical parameters are described in the help page of par (?par)
Additional parameter to graphical functions
Low level plotting functions

Adding points/lines to an existing graph using points(x,y) and lines(x,y)
Adding text to an existing plot using text(x,y,labels="")
Adding a legend to a plot using legend(x,y,legend="")
plot(x,y)
abline(0,1)
points(2,3, pch=19)
lines(x,y)
text(4,6, labels="Slope=1")
[Figure: the scatter plot of x and y with the line of slope 1, an extra filled point at (2,3) and the label "Slope=1"]
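The slide code does not show legend() in use. As a sketch with invented curves, a legend is matched to the lines by repeating their line types in the same order:

```r
x <- 1:10

plot(x, x, type = "l", lty = 1, xlab = "x", ylab = "value")  # high level: starts the plot
lines(x, x / 2, lty = 2)                                     # low level: adds a second line
points(5, 5, pch = 19)                                       # low level: adds one filled point

# A keyword such as "topleft" can replace the x, y coordinates
lg <- legend("topleft", legend = c("y = x", "y = x/2"), lty = c(1, 2))
```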
Barplot
The function barplot()

It plots one bar for each value it is given (typically the frequencies of the values of a variable)
It is useful for looking at categorical variables
It takes a vector or a matrix as input and uses the values as bar heights
barplot(1:10)
[Figure: bar plot of the values 1 to 10]
Barplot
The function barplot()

Given a matrix as input (Death rates per 1000 population per year in Virginia)
VADeaths
## Rural Male Rural Female Urban Male Urban Female
## 50-54 11.7 8.7 15.4 8.4
## 55-59 18.1 11.7 24.3 13.6
## 60-64 26.9 20.3 37.0 19.3
## 65-69 41.0 30.9 54.6 35.1
## 70-74 66.0 54.3 71.1 50.0
barplot(VADeaths)
[Figure: stacked bar plot of VADeaths, one bar for each of Rural Male, Rural Female, Urban Male and Urban Female]
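With a matrix, barplot() stacks the rows inside each column by default. A possible variation, sketched here, draws the age groups side by side and adds a legend built from the row names; the axis labels are made up:

```r
data(VADeaths, package = "datasets")

# beside = TRUE draws one bar per cell instead of stacking the rows;
# legend.text = TRUE uses the row names (age groups) as legend entries
mids <- barplot(VADeaths, beside = TRUE, legend.text = TRUE,
                xlab = "Population group", ylab = "Deaths per 1000")
```

barplot() returns the bar midpoints, which is useful for adding text or points on top of the bars.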
Visualization on Categorical variables
Summarize the counts for factors
table(bt$Gender) ## Count the occurrences of each factor level
##
## F M
## 51 49
Look at the summary in a bar plot
barplot(table(bt$Gender),
xlab="Gender", ylab="Frequency", main="Summarize Gender variable")
[Figure: bar plot "Summarize Gender variable", Frequency by Gender (F, M)]
Histograms
The function hist()

Normally used to visualize numerical variables
It is similar to a barplot, but values are grouped into bins
For each interval, the bar height corresponds to the frequency (count) of observations in that interval
The heights sum to the sample size
Look at the distribution of the data
How is the heart rate distributed over our dataset?

Histogram of the HeartRate variable using frequency on the Y axis
hist(bt$HeartRate, col="gray80")
[Figure: "Histogram of bt$HeartRate", Frequency on the Y axis, bt$HeartRate from 60 to 90 on the X axis]
Density on the Y axis

hist(bt$HeartRate, col="gray80", freq=FALSE) ## Use parameter freq to change behaviour
[Figure: histogram of bt$HeartRate with Density on the Y axis]
Changing the intervals

hist(bt$HeartRate, col="gray80", breaks=50) ## Use parameter breaks to change intervals
[Figure: histogram of bt$HeartRate with narrower bins, Frequency on the Y axis]
Adding information to the histogram, mean and median

hist(bt$HeartRate, col="gray80", main="Histogram of Heart Rate")
abline(v=mean(bt$HeartRate), lwd=3)
abline(v=median(bt$HeartRate), lty=3, lwd=3)
legend("right", legend=c("Mean", "Median"), lty=c(1,3))
[Figure: "Histogram of Heart Rate" with vertical lines marking the mean (solid) and the median (dotted)]
Boxplots
The function boxplot()

Visualizes the 5-number summary, the range and the quartiles
Boxplots
Look at the boxplot for the HeartRate variable
boxplot(bt$HeartRate, horizontal=TRUE, col="grey80")
[Figure: horizontal boxplot of bt$HeartRate, X axis from 60 to 85]
Boxplots
Look at the boxplot for the HeartRate Variable
boxplot(bt$HeartRate, horizontal=TRUE, col="grey80")

points(bt$HeartRate, rep(1,length(bt$HeartRate)), pch=19) ## See where the data are
abline(h=1, lty=2)
[Figure: horizontal boxplot of bt$HeartRate with the individual observations drawn as points along the dashed line]
Using factors and formula objects
Using a factor as a categorical variable to condition the plot

Conditioning a plot on a factor using a formula object:
bt$HeartRate ~ bt$Gender
The numeric values in bt$HeartRate will be divided according to the categories in bt$Gender
boxplot(bt$HeartRate~bt$Gender, horizontal=TRUE, col="grey80")
[Figure: horizontal boxplots of HeartRate, one for F and one for M]
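A formula can also name the columns directly when the data.frame is passed through the data argument. Since the bt file is not bundled with R, this sketch uses the built-in ToothGrowth dataset as a stand-in:

```r
# len (numeric) is split according to the levels of supp (factor)
b <- boxplot(len ~ supp, data = ToothGrowth, horizontal = TRUE, col = "grey80")

group_names <- b$names   # one box per factor level
```

For the slides' data the equivalent call would be boxplot(HeartRate ~ Gender, data=bt, ...).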
Pairs
The pairs() function
It plots all the possible pairwise comparisons in a data.frame
It allows a fast visual data exploration
pairs(bt) ## Look at all possible comparisons at once
[Figure: scatter-plot matrix of Gender, Age, HeartRate and Temperature]
Normal plot
Let's look at the variable HeartRate vs Temperature

Note the use of ~ in the plot command: with a formula y ~ x, the left-hand side goes on the Y axis
## plot(bt$HeartRate, bt$Temperature)
plot(bt$HeartRate~bt$Temperature, main="Heart Rate vs Temperature")
[Figure: "Heart Rate vs Temperature", scatter plot with bt$Temperature on the X axis and bt$HeartRate on the Y axis]
Multiple plots on the same window
Put more information together on the same plot
par(mfrow=c(2,1)) ## Note mfrow defining 2 rows and 1 column for allowing 2 plots
hist(bt$HeartRate, col="grey80", main="HeartRate histogram")
abline(v=mean(bt$HeartRate), lwd=3)
abline(v=median(bt$HeartRate), lty=3, lwd=3)
legend("right", legend=c("Mean", "Median"), lty=c(1,3))
boxplot(bt$HeartRate~bt$Gender, horizontal=TRUE, col=c( "pink", "blue"))
title("Boxplot for different gender")
points(bt$HeartRate[bt$Gender=="F"], rep(1,length(bt$HeartRate[bt$Gender=="F"])), pch=19)
points(bt$HeartRate[bt$Gender=="M"], rep(2,length(bt$HeartRate[bt$Gender=="M"])), pch=19)
[Figure: top panel, "HeartRate histogram" with the mean and median lines; bottom panel, "Boxplot for different gender" with the data points overlaid for F and M]
Exporting graphs
It is possible to export graphs in different formats
PNG, JPEG, PDF, EPS, TIFF
Look at the help for the functions pdf and png
pdf("myfirstgraph.pdf") ## Start the pdf device
par(mfrow=c(2,1))
hist(bt$HeartRate, col="grey80", main="HeartRate histogram")
boxplot(bt$HeartRate, horizontal=TRUE, col="grey80", main="Boxplot")
dev.off() ## Switch off the device
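The file is only complete after dev.off() has been called. The following sketch writes a small plot to a temporary folder and checks that the file exists; the file name and the page size are arbitrary choices:

```r
out_file <- file.path(tempdir(), "example_plot.pdf")   # hypothetical file name

pdf(out_file, width = 6, height = 4)   # open the pdf device; size is in inches
plot(1:10, type = "b", main = "Example")
dev.off()                              # close the device: the file is finalized here

written <- file.exists(out_file)
```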
Looking at a probability distribution in a plot
What does a sample drawn from a normal distribution look like?

Draw enough samples from a N(0, 1)
Use a histogram to look at the data
x <- seq(-3,3,by=0.1) ## Create a vector of x values
y <- dnorm(x) ## Compute the normal density function over the vector x
plot(x,y,type="l") ## Plot it
[Figure: the N(0, 1) density curve, y against x for x from -3 to 3]
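The code on this slide draws the theoretical density only. To compare it with an actual sample, as the two bullet points suggest, one possible sketch is the following; the sample size, number of bins and seed are arbitrary:

```r
set.seed(1)                      # arbitrary seed for repeatability
samples <- rnorm(5000)           # draw from N(0, 1)

# Histogram on the density scale, so that it is comparable with dnorm
h <- hist(samples, probability = TRUE, breaks = 40)

grid <- seq(-3, 3, by = 0.1)
lines(grid, dnorm(grid))         # overlay the theoretical density

area <- sum(h$density * diff(h$breaks))   # total area of the bars
```

On the density scale the bar areas add up to 1, which is why the histogram and the curve share the same Y axis.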
Data in R
R comes with a lot of datasets included

Look at all the available datasets with:
data() ## See all the available datasets
data(package = .packages(all.available = TRUE)) ## See all the available dataset in all the pav
## Warning in data(package = .packages(all.available = TRUE)): datasets have been moved
from package ’base’ to package ’datasets’
## Warning in data(package = .packages(all.available = TRUE)): datasets have been moved
from package ’stats’ to package ’datasets’
Get the VADeaths dataset from the datasets package

data(VADeaths, package="datasets") ## Load the dataset
## ls() ## Check whether the dataset has been loaded
## ?VADeaths ## Look at the documentation
Exercise I
1 Define a function that transforms Celsius to Fahrenheit

Given the function defined before, think about using an argument to compute the inverse (Fahrenheit to Celsius)
2 Define a function that, given a number, computes the Fibonacci series

What happens if a non-integer or a negative number is given?
3 Define a function that, given a number, checks whether it is a prime number
4 Two integers are "friends" if the ratio between the sum of the divisors and the number itself is the same for both. For example, the sum of the divisors of 6 is 1 + 2 + 3 + 6 = 12. The sum of the divisors of 28 is 1 + 2 + 4 + 7 + 14 + 28 = 56. Then 12 / 6 = 56 / 28 = 2, thus 6 and 28 are "friends".
Define a function that, given 2 numbers as input, checks whether the numbers are "friends".
5 Fix the number of samples to 1000 and draw samples from at least 8 distributions N(m, 1), where m ∈ [−3, 3].
With the same number of samples, draw from at least 8 distributions N(0, s), where s ∈ [0.1, 2].
Plot the results in the same window with 3 different plots: one for N(m, 1), one for N(0, s), and one for N(m, 1) and N(0, s) together. Choose a color for each line
Suggestion: search for "R color charts" on Google and look at the function colors() in R
Plot the different distributions on the same plot
Exercise II
6 Draw an increasing number of samples (10-10000) from a normal distribution and look at the differences in the distribution between sample sizes
7 The dataset Pima.tr collects samples from the US National Institute of Diabetes and Digestive and Kidney Diseases. It includes 200 women of Pima Indian heritage living near Phoenix, Arizona.
Get the dataset from the MASS package or download it from the website.
Describe the dataset: how many variables, which types of variables, how many samples ...
What do the variables mean?
Get the frequencies of the women affected by diabetes.
Explore the dataset using histograms, barplots and plots. For each plot, describe what you see and why you made that plot.
Use the categorical variable type to see if there is any difference in the distributions of the age, bmi, and glu variables
