R programming

University of Trento - FBK

19 February, 2015

Hints on programming

1 Save all your commands in a SCRIPT FILE: they will be useful in the future, you never know...
2 Save your script file as often as you can! You sweat a lot writing those instructions; you don’t want to lose them!
3 Try to give meaningful names to variables and functions (avoid “pippo”, “pluto”, “a”, “b”, etc.)
4 Use comments to define sections in your script and describe what each section does
If you read the code again after 2 months you won’t remember what it does unless you go through all the instructions. It is not worth spending time re-reading code: use COMMENTS instead
5 If a value is used in more than one instruction, avoid code repetition and hard-coded values.
BAD:
sum(a[a>0])

GOOD:
thr <- 0
sum(a[a>thr])

Programming with R

The if then else statement


Check whether a condition is TRUE or FALSE
Syntax:
if (expr is TRUE){ do something
} else { do something else}
expr can be any logical expression, as seen before

A simple if statement:
If the instruction is on one line and there is no else -> no need for curly brackets
x <- 5
y <- 2
## if (y!=0) xy <- x/y
## xy

A more complex if statement:
x <- 5
y <- 3
if (x > 5){
  xy <- x - y ## expr = TRUE
} else {
  xy <- x + y ## expr = FALSE
}
xy
## [1] 8

Testing conditions using combinations of expressions (& |)
a<-2
b<-3
d<-4
# Using & to test two conditions, both true
if(a<b & b<d)
x<-a+b+d
x
## [1] 9

# Using & to test two conditions, one is false
if(a>b & b<d)
y<-a-b-d
y ## The condition is FALSE, so y is never assigned
## Error in eval(expr, envir, enclos): object ’y’ not found
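# Using | to test two conditions, both false
if(a==b | a>d)
z<-a*b*d
z ## Both conditions are FALSE, so z is never assigned
## Error in eval(expr, envir, enclos): object ’z’ not found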
# Using or to test two conditions, one true
if(a<b | a>d)
z<-a*b*d
z
## [1] 24
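
The examples above combine conditions with & and |. As a hedged side note that is not in the slides: R also has && and ||, which compare single values and are the usual choice inside if(), while & and | work element by element on vectors. A minimal sketch (the vector v is made up for illustration):

```r
a <- 2
b <- 3
d <- 4

# && and || compare single values and stop at the first decisive operand
both_true   <- (a < b) && (b < d)
either_true <- (a == b) || (a > d)

# & and | work element by element on whole vectors
v <- c(1, 5, 10)
in_range <- (v > 2) & (v < 8)

both_true    # TRUE
either_true  # FALSE
in_range     # FALSE TRUE FALSE
```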

Looping

The while() statement


Syntax:
while( expr ){
do something
}
An example
x <- 0 ## set the counter to 0
while( x<5 ){ ## repeat the same operation as long as x < 5
x <- x + 1 ## update x
}
x
## [1] 5

Pay attention to the condition: in the loop below x is never updated, so x < 5 stays TRUE and the loop never ends


x <- 0
y <- 0
## while (x < 5){
## y <- y + 1
## }
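
One way to protect against a loop that never ends is to add an iteration limit. This is only a sketch: the name max_iter and its value are assumptions chosen for illustration, not something the slides prescribe.

```r
x <- 0
y <- 0
max_iter <- 1000  # hypothetical safety limit, not part of the slides

while (x < 5) {
  y <- y + 1
  if (y >= max_iter) {
    break         # leave the loop even though x < 5 is still TRUE
  }
}
y                 # the loop stopped because of the limit, not the condition
```

Here x is still 0 when the loop stops, which signals that the condition itself needs fixing.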

Looping II

The for() statement


Syntax:
for (i in start:stop ){
do something
}
An example
y <- vector(mode="numeric") ## Allocating an empty vector of mode "numeric"
for (i in 1:5){
y[i] <- i + 2
}

Nested Loops
mat <- matrix(nrow=2,ncol=4)
for (i in 1:2){
for (j in 1:4){
mat[i,j] <- i + j
}
}
mat
##      [,1] [,2] [,3] [,4]
## [1,]    2    3    4    5
## [2,]    3    4    5    6
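
The nested loop above can also be written without explicit loops. A minimal sketch, assuming base R's outer() function (not mentioned in the slides), which applies an operation to every pair of elements taken from two vectors:

```r
# Loop version, as in the slide: fill the matrix cell by cell
mat_loop <- matrix(nrow = 2, ncol = 4)
for (i in 1:2) {
  for (j in 1:4) {
    mat_loop[i, j] <- i + j
  }
}

# Vectorized version: outer() builds the same 2 x 4 table of i + j
mat_vec <- outer(1:2, 1:4, "+")

all(mat_loop == mat_vec)  # both approaches give the same matrix
```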

Vectors II
Indexing

Use the square brackets [] to access an element of a vector


a[2] ## Extract the second element
## [1] 89

R starts counting from 1


a[0] ## Does not exist!
## integer(0)

We can pass multiple indexes using the c() function or the : operator


a[2:3]
## [1] 89 54
## a[2,3] ## What happens here?

What happens when I use a negative number as index?


b[-1] ## All but the first element
## [1] 2 3 4 5 6 7 8 9 10

b[-c(1,4)] ## All but the first and the fourth elements
## [1] 2 3 5 6 7 8 9 10

NB: Do not use c as variable name
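
The indexing rules above can be tried on a small vector. The values below are made up for illustration (only the second and third match the output shown in the slides):

```r
a <- c(12, 89, 54, 7)   # hypothetical vector

a[2]           # second element
a[c(2, 3)]     # same as a[2:3]
a[0]           # empty vector: there is no element 0
a[-1]          # everything except the first element
a[-c(1, 4)]    # everything except the first and the fourth elements
```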


Subsetting using logical operators

Using logical operators inside indexes


Logical operators can be used to subset a vector
Select only the elements of the vector for which the condition is TRUE
x <- 5:15
y <- 10
x[x > y]
## [1] 11 12 13 14 15
x[x==y]
## [1] 10

They can also be used on matrices


mymat <- matrix(3:9, ncol=3)
## Warning in matrix(3:9, ncol = 3): data length [7] is not a sub-multiple or multiple of the number of rows [3]
mymat > 7 ## Get TRUE where mymat is bigger than 7
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE  TRUE
## [2,] FALSE FALSE FALSE
## [3,] FALSE  TRUE FALSE
mymat[mymat>7] ## Get the actual values where mymat is bigger than 7
## [1] 8 9

Subsetting using logical operators II
Getting indexes

The which() function


Syntax: which(expr)
works on logical vectors, including those obtained from a matrix or a data.frame
returns the indexes where expr is TRUE
expr can be any logical expression; combinations of AND and OR are accepted

mymat > 7
##       [,1]  [,2]  [,3]
## [1,] FALSE FALSE  TRUE
## [2,] FALSE FALSE FALSE
## [3,] FALSE  TRUE FALSE

## Get the indexes where mymat > 7
which(mymat>7)
## [1] 6 7

which(mymat>7, arr.ind=TRUE)
##      row col
## [1,]   3   2
## [2,]   1   3
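
As noted above, which() accepts combinations of conditions. A small sketch on a made-up matrix (m is hypothetical and is not the mymat of the slides):

```r
m <- matrix(1:9, ncol = 3)   # hypothetical 3 x 3 matrix, filled column by column

# Positions where the value is bigger than 3 AND even
idx <- which(m > 3 & m %% 2 == 0)
idx

# The same positions expressed as (row, column) pairs
pos <- which(m > 3 & m %% 2 == 0, arr.ind = TRUE)
pos

m[idx]   # the matching values
```

Without arr.ind the positions are counted column by column, which is why a single number is enough to locate a cell.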

Exercises I

1 Given an integer number x, find all its divisors.
2 Given an integer number x, compute the sum of all its divisors.
3 A perfect number is a number whose sum of the divisors (apart from itself) is equal to the number itself. For example 6 is perfect because 1 + 2 + 3 (the divisors) = 6.
   1 Given an integer number, check if it is perfect.
   2 Given an integer number x, find all perfect numbers i < x.

Functions I

Define your own function


We have seen many functions such as:
sum(mymat)
## [1] 49
mean(mymat)
## [1] 5.4444

Now you can define your custom function


myfunction <- function(arg1, arg2){
do something with arg1 and arg2
return(results)
}
Define a function to convert Fahrenheit to Celsius
FtoC <- function(fahr){ ## avoid F as a name: F is also shorthand for FALSE
  cels <- (fahr - 32) * (5/9)
  return(cels)
}
FtoC(212)
## [1] 100

Functions II

Define a function to compute the power of a number/vector


Use default argument
mypow <- function(x, exponent=2){
res <- x^exponent
return(res)
}
mypow(2)
## [1] 4
mypow(3,5)
## [1] 243

Variables defined inside a function will be valid only inside the function
res
## Error in eval(expr, envir, enclos): object ’res’ not found

Use debug() for debugging a function


It runs the function line by line
It lets you inspect the values of the variables inside the function
Each time the function is redefined, the debug mode is removed
To leave the line-by-line mode and continue to the end type c; to quit type Q

debug(mypow)
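
The debug flag can also be removed without redefining the function. A minimal sketch using base R's undebug() and isdebugged(), which the slides do not mention; the function is not called while flagged, so no interactive session is needed:

```r
mypow <- function(x, exponent = 2) {
  res <- x^exponent
  return(res)
}

debug(mypow)        # flag the function: the next call would run line by line
isdebugged(mypow)   # TRUE

undebug(mypow)      # remove the flag without redefining the function
isdebugged(mypow)   # FALSE

mypow(3, 5)         # runs normally again
```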

Functions III

Function arguments can be passed by position


bt <- read.table("../Lesson1/example1/BodyTemperature.txt",TRUE, " ") ## This will assign the f
##   Gender Age HeartRate Temperature
## 1      M  33        69        97.0
## 2      M  32        72        98.8
## 3      M  42        68        96.2
## 4      F  33        75        97.8
## 5      F  26        68        98.8
## 6      M  37        79       101.3

Function arguments can be passed by name


## Call arguments by name (position does not count)
bt <- read.table("../Lesson1/example1/BodyTemperature.txt",sep=" ", header=TRUE)
##   Gender Age HeartRate Temperature
## 1      M  33        69        97.0
## 2      M  32        72        98.8
## 3      M  42        68        96.2
## 4      F  33        75        97.8
## 5      F  26        68        98.8
## 6      M  37        79       101.3
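
The same rules can be checked on a function that needs no input file. The sketch below redefines the mypow function from the earlier slide so that it runs on its own:

```r
mypow <- function(x, exponent = 2) {
  x^exponent
}

by_position <- mypow(3, 5)                # x = 3, exponent = 5
by_name     <- mypow(exponent = 5, x = 3) # order does not matter
mixed       <- mypow(3, exponent = 5)     # positional first, then named

c(by_position, by_name, mixed)            # all three calls give 243
```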

Data Exploration and summary statistics

Develop high level understanding of the data


Given a data.frame let’s understand the data inside.
What variables do we have?
Do they have meaningful names?
What are the variable types? (numeric, boolean, categorical)
What is the distribution of the data?
Are there any categorical variables?

The aim is to reduce the amount of information and focus only on the key aspects of the data

Working with data objects

As an example let’s work on the BodyTemperature dataset.


bt <- read.table("BodyTemperature.txt", header=TRUE, sep=" ", as.is=TRUE)
head(bt) ## Let's look only at the first rows of the data.frame
##   Gender Age HeartRate Temperature
## 1      M  33        69        97.0
## 2      M  32        72        98.8
## 3      M  42        68        96.2
## 4      F  33        75        97.8
## 5      F  26        68        98.8
## 6      M  37        79       101.3

Working with data objects

Get the structure and some useful statistics


str(bt) ## See the structure of the data object
## 'data.frame': 100 obs. of 4 variables:
##  $ Gender     : chr "M" "M" "M" "F" ...
##  $ Age        : int 33 32 42 33 26 37 32 45 31 49 ...
##  $ HeartRate  : int 69 72 68 75 68 79 71 73 77 81 ...
##  $ Temperature: num 97 98.8 96.2 97.8 98.8 ...

summary(bt) ## Compute some statistics on each variable in the data.frame
##     Gender               Age         HeartRate     Temperature
##  Length:100         Min.   :21.0   Min.   :61.0   Min.   : 96.2
##  Class :character   1st Qu.:33.8   1st Qu.:69.0   1st Qu.: 97.7
##  Mode  :character   Median :37.0   Median :73.0   Median : 98.3
##                     Mean   :37.6   Mean   :73.7   Mean   : 98.3
##                     3rd Qu.:42.0   3rd Qu.:78.0   3rd Qu.: 98.9
##                     Max.   :50.0   Max.   :87.0   Max.   :101.3

names(bt) ## Get the variable names
## [1] "Gender" "Age" "HeartRate" "Temperature"

Working with data objects I

Change the variable mode of the columns:


Check the variable modes
is.data.frame(bt) ## Check if the object is a data.frame

## [1] TRUE
is.numeric(bt$Age) ## Check if the mode of the column is numeric
## [1] TRUE
is.character(bt$Gender) ## Check if the mode of the variable Gender is character
## [1] TRUE

Look at the variable Gender, it is categorical, but it’s stored as character


as.factor(bt$Gender) ## Change variable mode Gender into factor (categorical)
## [1] M M M F F M F F F M M F F F F M F M F F F F F M F M M M M F F F M M M
## [36] F F M F F M M F M M M F F F F M F M M F F F M F F F M M F M M F M M M
## [71] F F M M M M F M F M M F F M F M M M F M F F M M F M F F F M
## Levels: F M

Working with data objects II

Store the changes on the data.frame and check the data.frame


bt$Gender <- as.factor(bt$Gender) ## Store the previous change
str(bt) ## Look at the structure
## 'data.frame': 100 obs. of 4 variables:
## $ Gender : Factor w/ 2 levels "F","M": 2 2 2 1 1 2 1 1 1 2 ...
## $ Age : int 33 32 42 33 26 37 32 45 31 49 ...
## $ HeartRate : int 69 72 68 75 68 79 71 73 77 81 ...
## $ Temperature: num 97 98.8 96.2 97.8 98.8 ...
summary(bt) ## Compute some statistics
##  Gender      Age         HeartRate     Temperature
##  F:51   Min.   :21.0   Min.   :61.0   Min.   : 96.2
##  M:49   1st Qu.:33.8   1st Qu.:69.0   1st Qu.: 97.7
##         Median :37.0   Median :73.0   Median : 98.3
##         Mean   :37.6   Mean   :73.7   Mean   : 98.3
##         3rd Qu.:42.0   3rd Qu.:78.0   3rd Qu.: 98.9
##         Max.   :50.0   Max.   :87.0   Max.   :101.3
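
The character-to-factor conversion can be reproduced without the data file. The data frame below is a made-up stand-in (df and its values are assumptions for illustration, not the BodyTemperature data):

```r
# Hypothetical mini dataset (not the BodyTemperature file)
df <- data.frame(
  Gender = c("M", "F", "F", "M", "F"),
  Age = c(33, 26, 41, 37, 29),
  stringsAsFactors = FALSE
)

is.character(df$Gender)            # stored as character at first
df$Gender <- as.factor(df$Gender)  # convert and store the result

levels(df$Gender)   # categories, in alphabetical order
table(df$Gender)    # number of rows in each category
```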

Exercise II

1 Define a function that converts km to miles and vice versa.
2 Define a function that checks whether a number is perfect (see Exercises I).
3 Define a function that, given a numeric matrix, returns the log of the matrix where the matrix element is > 0 and NA otherwise.
4 Get the dataset SAheart_sub.data from the website and check the type of each column.
Add a column of factor type with Alcoholic where the value of alcohol consumption is > 13 and Non-Alcoholic otherwise.

Probability Distributions in R

Probability functions:
Every probability distribution in R has 4 functions, denoted by a root (e.g. norm for the normal distribution) and a prefix:
p for “probability”, the cumulative distribution function (c.d.f.)
$F(x) = P(X \le x)$

q for “quantile”, the inverse of the c.d.f.
$x = F^{-1}(p)$

d for “density”, the density function (p.d.f.); for the standard normal distribution:
$f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$

r for “random”, random variates drawn from the specified distribution

Example:
For the normal distribution we have the functions: pnorm, qnorm, dnorm, rnorm

Probability distribution in R
Available functions

Distribution   Functions
Binomial       pbinom   qbinom   dbinom   rbinom
Chi-Square     pchisq   qchisq   dchisq   rchisq
Exponential    pexp     qexp     dexp     rexp
Log Normal     plnorm   qlnorm   dlnorm   rlnorm
Normal         pnorm    qnorm    dnorm    rnorm
Poisson        ppois    qpois    dpois    rpois
Student t      pt       qt       dt       rt
Uniform        punif    qunif    dunif    runif

Check the help (?<function>) for further information on the parameters and the usage of each function.
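
How the prefixes relate to each other can be checked numerically. The sketch below uses the normal distribution; the integrate() call is an addition for illustration and is not part of the slides:

```r
p <- pnorm(1.5)    # P(X <= 1.5) for X = N(0,1)
x <- qnorm(p)      # the quantile function takes us back to 1.5

# The c.d.f. is the area under the density up to x
area <- integrate(dnorm, lower = -Inf, upper = 1.5)$value

all.equal(x, 1.5)       # q undoes p
abs(area - p) < 1e-4    # integrating d gives p, up to numerical error
```

The same p/q/d pattern should hold for the other roots in the table, with the parameters described in each help page.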

The Normal Distribution in R
Cumulative Distribution Function

pnorm: computes the cumulative distribution function, where X is normally distributed
$F(x) = P(X \le x)$

## P(X<=2), X=N(0,1)
pnorm(2)
## [1] 0.97725

## P(X<=12), X=N(10,4)
pnorm(12, mean=10, sd=2)
## [1] 0.84134

What is P(X > 19) where X = N (17.4, 375.67)?

[Figure: "Normal Cumulative", pnorm plotted for x from −4 to 4]
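
pnorm gives the lower tail, so upper-tail and interval probabilities have to be derived from it. A minimal sketch with the N(10, 4) example above (the second parameter is the variance, hence sd = 2); the lower.tail argument is a standard pnorm option not shown in the slides:

```r
# X = N(10, 4): variance 4, so sd = 2
upper_a <- 1 - pnorm(12, mean = 10, sd = 2)                  # P(X > 12)
upper_b <- pnorm(12, mean = 10, sd = 2, lower.tail = FALSE)  # same value

# Probability of an interval: P(8 <= X <= 12)
inside <- pnorm(12, mean = 10, sd = 2) - pnorm(8, mean = 10, sd = 2)

c(upper_a, upper_b, inside)
```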

The Normal Distribution in R
The quantiles

qnorm: computes the inverse of the c.d.f. Given a number 0 ≤ p ≤ 1 it returns the p-th quantile of the distribution.
$p = F(x)$
$x = F^{-1}(p)$

## X = F^-1(0.95), N(0,1)
qnorm(0.95)
## [1] 1.6449

## X = F^-1(0.95), N(100,625)
qnorm(0.95, mean=100, sd=25)
## [1] 141.12

What is the 85-th quantile of X = N (72, 68)?

[Figure: "Normal Density", pnorm curve for x from −3 to 3, with p = 0.95 marked on the vertical axis and qnorm(p) = 1.645 on the horizontal axis]

The Normal Distribution in R
The Density Function

dnorm: computes the probability density function (p.d.f.) of the normal distribution.
$f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$

## f(0.5), X = N(0,1)
dnorm(0.5)
## [1] 0.35207

## f(-2.5), X = N(-1.5,2)
dnorm(-2.5, mean=-1.5, sd=sqrt(2))
## [1] 0.2197

[Figure: "Density Function", dnorm plotted for x from −4 to 4]

The Normal Distribution in R
The Random Function

rnorm: simulates random variates having a specified normal distribution.

## Extract 1000 samples X = N(0,1)
x <- rnorm(1000)

## Extract 1000 samples X = N(100,225)
x <- rnorm(1000, mean=100, sd=15)
xx <- seq(min(x), max(x), length=100)
hist(x, probability=TRUE)
lines(xx, dnorm(xx, mean=100, sd=15))

[Figure: "Histogram of x", density histogram of x from 60 to 140 with the normal density curve overlaid]
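
Because rnorm draws random values, each run gives a different sample. A hedged sketch of how to make the draw repeatable with set.seed(), which the slides do not cover; the seed value 42 is arbitrary:

```r
set.seed(42)                             # hypothetical seed; any fixed integer works
x1 <- rnorm(1000, mean = 100, sd = 15)

set.seed(42)                             # same seed, same sequence
x2 <- rnorm(1000, mean = 100, sd = 15)

identical(x1, x2)                        # the two samples are exactly the same
mean(x1)                                 # close to 100
sd(x1)                                   # close to 15
```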

Exercise III

1 Compute the quantiles for p = [0.01, 0.05, 0.1, 0.2, 0.25] given X = N (−2, 8)
2 What is P(X = 1) when X = Bin(25, 0.005)?
3 What is P(13 ≤ X ≤ 22) where X = N (17.46, 375.67)?

Plotting in R

High level plot functions

Function Name   Plot Produced
plot(x,y)       Plot vector x against vector y
boxplot(x)      "Box and whiskers" plot
hist(x)         Histogram of the frequencies of x
barplot(x)      Bar plot of the values of x
pairs(x)        For a matrix or data.frame, plots all bivariate pairs
image(x,y,z)    3D plot using colors instead of lines

Simple visualization on numeric variables

Visualizing two vectors


x <- 1:10
y <- 1:10
plot(x,y)

[Figure: scatter plot of y against x, both ranging from 1 to 10]

Simple visualization on numeric variables

Visualizing two vectors, adding axis labels and changing the line type
plot(x,y, xlab="X values", ylab="Y values", main="X vs Y", type="b")

[Figure: "X vs Y", points joined by lines, Y values against X values]

More graphical parameters are listed in the help page of par (?par)

Additional parameter to graphical functions

Low level plotting functions


Adding points/lines to an existing graph using points(x,y) and lines(x,y)
Adding text to an existing plot using text(x,y,labels="")
Adding a legend to a plot using legend(x,y,legend="")

plot(x,y)
abline(0,1)
points(2,3, pch=19)
lines(x,y)
text(4,6, labels="Slope=1")

[Figure: points of y against x with the line of slope 1, the extra point at (2,3) and the label "Slope=1"]
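
The slide lists legend() but the example does not use it. A minimal sketch that adds a legend to a plot with two lines; the second line (y = x / 2) and the legend position are made up for illustration:

```r
x <- 1:10
plot(x, x, type = "l", xlab = "x", ylab = "y")   # first line: y = x
lines(x, x / 2, lty = 2)                          # second line: y = x / 2
points(2, 3, pch = 19)                            # a single highlighted point

# legend() returns (invisibly) the position of the box it drew
leg <- legend("topleft",
              legend = c("y = x", "y = x / 2"),
              lty = c(1, 2))
```

The lty values in the legend must be given in the same order as the labels so that each label matches its line type.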

Barplot

The function barplot()


It plots the frequencies of the values of a variable
It is useful for looking at categorical values
It takes a vector or a matrix as input and uses the values as bar heights (frequencies)
barplot(1:10)

[Figure: bar plot of the values 1 to 10]

Barplot

The function barplot()


Given a matrix as input (Death rates per 1000 population per year in Virginia)
VADeaths
##       Rural Male Rural Female Urban Male Urban Female
## 50-54       11.7          8.7       15.4          8.4
## 55-59       18.1         11.7       24.3         13.6
## 60-64       26.9         20.3       37.0         19.3
## 65-69       41.0         30.9       54.6         35.1
## 70-74       66.0         54.3       71.1         50.0
barplot(VADeaths)

[Figure: stacked bar plot with one bar per column: Rural Male, Rural Female, Urban Male, Urban Female]
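
With a matrix as input the bars are stacked by default, which makes single age classes hard to compare. A sketch of one alternative using standard barplot() arguments that the slides do not show (beside, legend.text, args.legend):

```r
# beside = TRUE draws one bar per age class instead of stacking them
mids <- barplot(VADeaths,
                beside = TRUE,
                legend.text = rownames(VADeaths),
                args.legend = list(x = "topleft"),
                ylab = "Deaths per 1000 population")

dim(mids)   # one bar midpoint per cell of the input matrix
```

The returned midpoints can be passed to text() or points() to annotate individual bars.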

Visualization on Categorical variables
Summarize the count for factors
table(bt$Gender) ## Collect the factors and count occurrences for each factor
##
##  F  M
## 51 49
Look at the summarization in a bar plot
barplot(table(bt$Gender),
xlab="Gender", ylab="Frequency", main="Summarize Gender variable")

[Figure: "Summarize Gender variable", bar plot of Frequency by Gender (F, M)]

Histograms

The function hist()


Normally used to visualize numerical variables
It is similar to a barplot but values are grouped into bins
For each interval the bar height corresponds to the frequency (count) of observations in that interval
The heights sum to the sample size

Look at the distribution of the data

How is the heart rate distributed in our dataset?


Histogram of the HeartRate variable using frequency on the Y axis
hist(bt$HeartRate, col="gray80")

[Figure: "Histogram of bt$HeartRate", Frequency (0 to 30) against bt$HeartRate (60 to 90)]

Look at the distribution of the data

Density on the Y axis


hist(bt$HeartRate, col="gray80", freq=FALSE) ## Use parameter freq to change behaviour

[Figure: "Histogram of bt$HeartRate", Density (0.00 to 0.06) against bt$HeartRate (60 to 90)]
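
The difference between the two Y axes can be checked on the object that hist() returns. A sketch on simulated values (hr is made up and stands in for bt$HeartRate): the counts add up to the sample size, while with densities it is the bar areas that add up to 1.

```r
set.seed(1)                                  # hypothetical seed, for reproducibility
hr <- round(rnorm(100, mean = 74, sd = 6))   # made-up heart rates, not the real data

h <- hist(hr, plot = FALSE)                  # compute the bins without drawing

sum(h$counts)                                # frequencies add up to the sample size
sum(h$density * diff(h$breaks))              # bar areas add up to 1
```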

Look at the distribution of the data

Changing the intervals


hist(bt$HeartRate, col="gray80", breaks=50) ## Use parameter breaks to change intervals

[Figure: "Histogram of bt$HeartRate" with narrower bins, Frequency (0 to 8) against bt$HeartRate (60 to 85)]

Look at the distribution of the data

Adding information to the histogram, mean and median


hist(bt$HeartRate, col="gray80", main="Histogram of Heart Rate")
abline(v=mean(bt$HeartRate), lwd=3)
abline(v=median(bt$HeartRate), lty=3, lwd=3)
legend("right", legend=c("Mean", "Median"), lty=c(1,3))

[Figure: "Histogram of Heart Rate", Frequency against bt$HeartRate with vertical lines for the Mean (solid) and the Median (dotted)]

Boxplots

The function boxplot()


Visualize the 5-number summary, the range and the quartiles
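
The five numbers drawn by a boxplot can also be printed. A sketch on a few made-up values (hr is hypothetical), using base R's fivenum() and the list returned by boxplot(), neither of which is shown in the slides:

```r
hr <- c(61, 68, 69, 72, 73, 75, 79, 81, 87)   # made-up heart rates

five <- fivenum(hr)   # minimum, lower hinge, median, upper hinge, maximum
five

b <- boxplot(hr, horizontal = TRUE, col = "grey80")
b$stats               # the values drawn by the box and the whiskers
```

The two results agree here because no value lies far enough from the box to be drawn as an outlier; otherwise the whiskers stop short of the minimum or maximum.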

Boxplots

Look at the boxplot for the HeartRate Variable

boxplot(bt$HeartRate, horizontal=TRUE, col="grey80")

[Figure: horizontal boxplot of bt$HeartRate, axis from 60 to 85]

Boxplots

Look at the boxplot for the HeartRate Variable

boxplot(bt$HeartRate, horizontal=TRUE, col="grey80")


points(bt$HeartRate, rep(1,length(bt$HeartRate)), pch=19) ## See where the data are
abline(h=1, lty=2)

[Figure: horizontal boxplot of bt$HeartRate with the individual data points overlaid along a dashed line, axis from 60 to 85]

Using factors and formula objects

Using a factor as categorical variable to condition the plot


Condition a plot on a factor using a formula object:
bt$HeartRate ~ bt$Gender
The numeric values in bt$HeartRate will be divided according to the categories in bt$Gender

boxplot(bt$HeartRate~bt$Gender, horizontal=TRUE, col="grey80")

[Figure: horizontal boxplots of bt$HeartRate, one for M and one for F, axis from 60 to 85]
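
A formula can also refer to column names directly when the data frame is passed through the data argument. A sketch on six rows typed in by hand (df is a hypothetical stand-in for bt):

```r
# Hypothetical mini dataset (not read from the BodyTemperature file)
df <- data.frame(
  HeartRate = c(69, 72, 68, 75, 68, 79),
  Gender = factor(c("M", "M", "M", "F", "F", "M"))
)

# Column names are looked up in df, so there is no need to repeat df$
b <- boxplot(HeartRate ~ Gender, data = df, horizontal = TRUE, col = "grey80")

b$names   # one box per level of the factor
b$n       # number of observations in each box
```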

Pairs

The pairs() function
It plots all the possible pairwise comparisons in a data.frame
It allows a fast visual exploration of the data

pairs(bt) ## Look at all possible comparisons at once

[Figure: scatterplot matrix of Gender, Age, HeartRate and Temperature]

Normal plot

Let’s look at the variable HeartRate vs Temperature


See the use of ~ in the plot command
## plot(bt$Temperature, bt$HeartRate) ## Equivalent call without the formula (x first, then y)
plot(bt$HeartRate~bt$Temperature, main="Heart Rate vs Temperature")

[Figure: "Heart Rate vs Temperature", scatter plot of bt$HeartRate (y axis, 60 to 85) against bt$Temperature (x axis, 96 to 101)]

Multiple plots on the same window
Put more information together on the same plot
par(mfrow=c(2,1)) ## Note mfrow defining 2 rows and 1 column for allowing 2 plots
hist(bt$HeartRate, col="grey80", main="HeartRate histogram")
abline(v=mean(bt$HeartRate), lwd=3)
abline(v=median(bt$HeartRate), lty=3, lwd=3)
legend("right", legend=c("Mean", "Median"), lty=c(1,3))
boxplot(bt$HeartRate~bt$Gender, horizontal=TRUE, col=c( "pink", "blue"))
title("Boxplot for different gender")
points(bt$HeartRate[bt$Gender=="F"], rep(1,length(bt$HeartRate[bt$Gender=="F"])), pch=19)
points(bt$HeartRate[bt$Gender=="M"], rep(2,length(bt$HeartRate[bt$Gender=="M"])), pch=19)

[Figure: two stacked panels. Top: "HeartRate histogram" with Mean and Median lines. Bottom: "Boxplot for different gender" (M, F) with the data points overlaid]

Exporting graphs
It is possible to export graph in different formats
Png, Jpg, Pdf, Eps, Tiff
Look at the help for the functions pdf,png
pdf("myfirstgraph.pdf") ## Start the pdf device
par(mfrow=c(2,1))
hist(bt$HeartRate, col="grey80", main="HeartRate histogram")
boxplot(bt$HeartRate, horizontal=TRUE, col="grey80", main="Boxplot")
dev.off() ## switch off the device
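
The same pattern can be tried without touching the working directory. In this sketch the file goes to R's temporary directory; the file name and the page size are assumptions chosen for illustration:

```r
out_file <- file.path(tempdir(), "example_graph.pdf")  # hypothetical file name

pdf(out_file, width = 7, height = 5)   # page size in inches
plot(1:10, main = "Example plot")
dev.off()                              # the file is complete only after this call

file.exists(out_file)                  # check that the file was written
```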
Looking at a probability distribution in a plot

What does a sample from a normal distribution look like?


Extract enough samples from a N (0, 1)
Use a histogram to look at the data
x <- seq(-3,3,by=0.1) ## Create a vector of x values
y <- dnorm(x) ## Compute the normal density function over the vector x
plot(x,y,type="l") ## Plot it

[Figure: standard normal density curve, y (0.0 to 0.4) against x (−3 to 3)]

Data in R

R comes with a lot of datasets included


Look at all the available datasets with:
data() ## See all the available datasets
data(package = .packages(all.available = TRUE)) ## See all the available dataset in all the pav
## Warning in data(package = .packages(all.available = TRUE)): datasets have been moved from package ’base’ to package ’datasets’
## Warning in data(package = .packages(all.available = TRUE)): datasets have been moved from package ’stats’ to package ’datasets’

Get the VADeaths dataset from the datasets package


data(VADeaths, package="datasets") ## Load the dataset
## ls() ## Check whether the dataset has been loaded
## ?VADeaths ## Look at the documentation

Exercise I

1 Define a function that transforms Celsius to Fahrenheit
Given the function defined before, think about using an argument to compute the inverse (Fahrenheit to Celsius)

2 Define a function that, given a number, computes the Fibonacci series
What can happen if a non-integer or a negative number is given?

3 Define a function that, given a number, checks if it is a prime number

4 Two integer numbers are “friends” if the ratio between the sum of the divisors and the number itself is the same for both. For example the sum of divisors of 6 is 1 + 2 + 3 + 6 = 12. The sum of divisors of 28 is 1 + 2 + 4 + 7 + 14 + 28 = 56. Then 12 / 6 = 56 / 28 = 2, thus 6 and 28 are “friends”.
Define a function that, given 2 numbers as input, checks if the numbers are “friends”.

5 Fix the number of samples to 1000 and extract at least 8 N (m, 1) where m ∈ [−3, 3].
With the same number of samples extract at least 8 N (0, s) where s ∈ [0.1, 2].
Plot the results in the same window with 3 different plots, one for N (m, 1), one for N (0, s) and one for N (m, 1) and N (0, s) together. Decide the color code for each line
Suggestion: search for “R color charts” on Google and look at the function colors() in R

Plot the different distributions on the same plot

Exercise II

6 Extract from a normal distribution an increasing number of samples (10-10000) and look at the differences in the distribution between sample sizes

7 The dataset Pima.tr collects samples from the US National Institute of Diabetes and Digestive and Kidney Diseases. It includes 200 women of Pima Indian heritage living near Phoenix, Arizona.
Get the dataset from the MASS package or download it from the website.
Describe the dataset: how many variables, which types of variables, how many samples ...
What do the variables mean?
Get the frequencies of the women affected by diabetes.
Explore the dataset using histograms, barplots and plots. For each plot, describe what you see and why you made that plot.
Use the categorical variable type to see if there is any difference in the distribution of the age, bmi and glu variables
