1.1 What is R?
R is a language and environment for statistical computing. It has the following features:
## [1] 5 4 3 1 4
# This is equivalent to
x
## [1] 5 4 3 1 4
## [1] 5
Note that R commands are written in a chunk that begins with three backticks and ends with three
backticks. The chunk header {r vect1} specifies that the chunk contains R code and that it is labeled vect1.
All R commands must be written inside a chunk.
There are five basic operators in R, namely +, -, *, /, ^, which are meant for addition, subtraction,
multiplication, division, and exponentiation, respectively. Besides these basic operators, a large number of
functions are also available for arithmetic computations. Let us create a new chunk:
## [1] 17
sum(x^2) # sum of squares
## [1] 67
1/x # inverse of x
sin(x)
cos(x)
mean(x) # mean of x
## [1] 3.4
var(x) #variance of x
## [1] 2.3
sd(x) # SD of x
## [1] 1.516575
## [1] 44.60515
x=c(5,4,3,1,4)
length(x)
## [1] 5
## [1] 6 5 4 2 5
2*x
## [1] 10 8 6 2 8
log(x+1)
sin(2*x)
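As a brief illustration of the arithmetic operators and summary functions used above, the following sketch recomputes the variance and standard deviation of x from their definitions. The intermediate names n, dev, v, and cv are illustrative assumptions and do not appear in the notes.

```r
# Vector used throughout this section
x <- c(5, 4, 3, 1, 4)

# Arithmetic operators work element by element on vectors
x + 1      # addition
x - 1      # subtraction
2 * x      # multiplication
x / 2      # division
x^2        # exponentiation

# Sample variance from its definition:
# sum of squared deviations from the mean, divided by (n - 1)
n   <- length(x)
dev <- x - mean(x)
v   <- sum(dev^2) / (n - 1)

# These agree with the built-in functions
all.equal(v, var(x))
all.equal(sqrt(v), sd(x))

# Coefficient of variation in percent (illustrative name)
cv <- 100 * sd(x) / mean(x)
cv
```

Writing the formula out once makes it easier to see what var() and sd() return; in practice the built-in functions are the ones to use.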
a1 is valid
1a is not valid, as it starts with a digit
$a is not valid, as it begins with a special character
a$ is also not valid, as it contains a special character
A12_5 is valid
AtmaRam is valid
Atma Ram is not valid, as names cannot contain spaces
AtmaRam<-5 is valid
Atma Ram<-5 is not valid
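The naming rules listed above can be checked mechanically. The sketch below relies on the base function make.names(), which rewrites any string into a syntactically valid name; the helper names candidates and is_valid are illustrative assumptions.

```r
# Candidate names taken from the list above
candidates <- c("a1", "1a", "$a", "a$", "A12_5", "AtmaRam", "Atma Ram")

# make.names() rewrites a string into a syntactically valid name,
# so a string that comes back unchanged was already valid
is_valid <- make.names(candidates) == candidates

data.frame(name = candidates, valid = is_valid)
```

This is only a convenience check; the rules themselves are the ones stated in the list.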
1.6 The functions : seq() and rep()
The function seq() generates sequences. The operator : also generates sequences, in steps of 1 (or -1 when
the first value is larger than the second). Let us see some R commands:
x<-1:30
x
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30
rx<-30:1
rx
## [1] 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13
## [19] 12 11 10 9 8 7 6 5 4 3 2 1
## [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
## [19] 19 20 21 22 23 24 25 26 27 28 29 30
rx1=seq(from=30,to=1)
rx1
## [1] 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13
## [19] 12 11 10 9 8 7 6 5 4 3 2 1
## [1] 1.0 1.5 2.0 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0
## [12] 6.5 7.0 7.5 8.0 8.5 9.0 9.5 10.0
## [1] 1 1 1 2 2 2
rep(c(1,2),each=3)
## [1] 1 1 1 2 2 2
tasbiheFatmi=rep(c("subhaallah","alhamdulillah","allahuakbar"),c(33,33,34))
tasbiheFatmi
rep(c("R1","R2","R3"),c(3,3,3))
## [1] "R1" "R1" "R1" "R2" "R2" "R2" "R3" "R3" "R3"
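The commands above can be summarized in one short sketch of the most common arguments of seq() and rep(). The object names s1, s2, s3, r1, r2, and r3 are illustrative assumptions.

```r
# seq() with an explicit step (by) or an explicit number of points (length.out)
s1 <- seq(from = 1, to = 10, by = 0.5)       # 1.0, 1.5, ..., 10.0
s2 <- seq(from = 0, to = 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00

# The colon operator counts down when the first value is larger
s3 <- 5:1                                    # 5 4 3 2 1

# rep() can repeat the whole vector (times) or each element (each)
r1 <- rep(c(1, 2), times = 3)                # 1 2 1 2 1 2
r2 <- rep(c(1, 2), each = 3)                 # 1 1 1 2 2 2

# A vector of counts repeats each element its own number of times
r3 <- rep(c("R1", "R2"), times = c(2, 3))    # "R1" "R1" "R2" "R2" "R2"
```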
x=c(2,5,3,1,8)
x>3
sum(x>3)
## [1] 2
x[x>2]
## [1] 5 3 8
mean(x[x>2])
## [1] 5.333333
mean(x[x<3])
## [1] 1.5
mean(x<3)
## [1] 0.4
## [1] 3
x=c(2,5,3,NA,6)
is.na(x)
mean(x)
## [1] NA
mean(x,na.rm=TRUE)
## [1] 4
!is.na(x)
mean(x[!is.na(x)])
## [1] 4
## [1] NaN 1 1
is.na(x/x)
x=c(2,5,8)
x
## [1] 2 5 8
names(x)=c("Azam","Zuklekha","Ankit")
x
barplot(x)
[Figure: bar plot of x]
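paste(): The function paste() converts its arguments to character and joins them element by element, so it always returns a character vector. The pieces are separated by a space unless sep says otherwise; paste0() uses no separator.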
x
paste(c("Azam","Zuklekha","Ankit"),c(2,5,8),sep=": ")
paste0(c("Azam","Zuklekha","Ankit"),c(2,5,8))
dob=c("10/01/93","12/03/93","15/07/94")
dob
paste(names(x),dob)
paste0(names(x),dob)
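To make the behaviour of paste() concrete, here is a minimal sketch contrasting it with paste0() and c(). The object names people, scores, p1, p2, p3, and c1 are illustrative assumptions.

```r
people <- c("Azam", "Zuklekha", "Ankit")
scores <- c(2, 5, 8)

# Element-wise pasting: numbers are converted to character first
p1 <- paste(people, scores, sep = ": ")  # "Azam: 2" "Zuklekha: 5" "Ankit: 8"

# paste0() is paste() with sep = ""
p2 <- paste0(people, scores)             # "Azam2" "Zuklekha5" "Ankit8"

# collapse joins the resulting vector into a single string
p3 <- paste(people, collapse = ", ")     # "Azam, Zuklekha, Ankit"

# c() on the same inputs also coerces to character,
# but keeps six separate elements instead of joining pairs
c1 <- c(people, scores)
length(c1)                               # 6
```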
letters
## [1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n"
## [15] "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
LETTERS
## [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N"
## [15] "O" "P" "Q" "R" "S" "T" "U" "V" "W" "X" "Y" "Z"
letters[c(3,5)]
LETTERS[1:4]
month.abb
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep"
## [10] "Oct" "Nov" "Dec"
month.abb[c(3,9)]
month.name
substring
substr(month.name,1,3)
## [1] "Jan" "Feb" "Mar" "Apr" "May" "Jun" "Jul" "Aug" "Sep"
## [10] "Oct" "Nov" "Dec"
A factor in R is like a factor in the design of experiments: a categorical variable that takes its values from a fixed set of levels. For example:
grp<-c("control","treatment","control","treatment")
grp
grp=factor(grp)
#factors are efficient ways of storing data
as.integer(grp)
## [1] 1 2 1 2
levels(grp)
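The following sketch shows how a factor stores its levels once and keeps an integer code for each observation. The names counts and grp2 are illustrative assumptions.

```r
grp <- factor(c("control", "treatment", "control", "treatment"))

# Levels are stored once; each observation holds an integer code
levels(grp)       # "control" "treatment"
as.integer(grp)   # 1 2 1 2

# table() counts the observations in each level
counts <- table(grp)
counts

# The level order can be set explicitly; here "treatment" becomes level 1
grp2 <- factor(grp, levels = c("treatment", "control"))
as.integer(grp2)  # 2 1 2 1
```

By default the levels are sorted alphabetically, which is why "control" receives code 1 in the first factor.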
m=matrix(1:6,nrow=2,ncol=3)
m
dim(m)
## [1] 2 3
dimnames(m)=list(LETTERS[1:2],letters[1:3])
m
## a b c
## A 1 3 5
## B 2 4 6
colnames(m)=c("samad","Radha","Umar")
m
rownames(m)=c("Muradabad","Aligarh")
m
matrix(1:6,ncol=3,byrow = TRUE)
## [1] 5
m[,c(1,2)]
## samad Radha
## Muradabad 1 3
## Aligarh 2 4
m[c(1,2),3]
## Muradabad Aligarh
## 5 6
## array()
a1=array(1:8,c(2,2,2))
a1
## , , 1
##
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
##
## , , 2
##
## [,1] [,2]
## [1,] 5 7
## [2,] 6 8
#naming arrays
dim(a1)
## [1] 2 2 2
dimnames(a1)=list(c("a","b"),LETTERS[3:4],c("D1","D2"))
a1
## , , D1
##
## C D
## a 1 3
## b 2 4
##
## , , D2
##
## C D
## a 5 7
## b 6 8
Data frames are objects that generalize matrices: their columns can be of different types from one
another, a flexibility that matrices do not have. Use data.frame() to construct a data frame from
vectors:
colors=c("red","yellow","blue")
numbers=c(1,2,3)
more_numbers=c(5,4,7)
colors_numbers=data.frame(Colors=colors,Numbers=numbers,MoreNUMBERS=more_numbers)
colors_numbers
names(colors_numbers)=c("Colours","NUMBERS","MNumbers")
colors_numbers
DF2=colors_numbers[,c(1,3)]
DF2
## Colours MNumbers
## Row1 red 5
## Row2 yellow 4
## Row3 blue 7
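A short sketch of the difference between a data frame and a matrix, and of the usual ways to access columns and rows. The names df, m, and sub, and the row labels, are illustrative assumptions.

```r
colors  <- c("red", "yellow", "blue")
numbers <- c(1, 2, 3)
df <- data.frame(Colours = colors, Numbers = numbers)

# Each column of a data frame keeps its own type
sapply(df, class)

# A matrix has a single type, so the numbers are coerced to character
m <- cbind(colors, numbers)
typeof(m)   # "character"

# Columns are accessed with $; rows can be filtered by a condition
df$Numbers
sub <- df[df$Numbers > 1, ]
sub

# Row names can be assigned like this
rownames(df) <- paste0("Row", 1:3)
df
```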
histogram
x=c(12,15,13,20,14,16,10,10,8,15)
hist(x,prob=TRUE)
[Figure: Histogram of x, with Density on the vertical axis]
This plot is a histogram. Figure \@ref(fig:hist1) shows that even though the data set is small, the
histogram has a reasonable shape.
x=seq(1,10)
y=x^2-10*x
plot(y~x)
[Figure: scatter plot of y against x]
curve(x^2,from=-2,to=2)
[Figure: curve of x^2 for x from -2 to 2]
curve(expr=sin,from=0,to=6*pi)
[Figure: curve of sin(x) for x from 0 to 6*pi]
curve(dnorm(x),from=-3,to=3)
[Figure: curve of dnorm(x) for x from -3 to 3]
# Plot f1(x) = 2x + 3x^2 from -3 to 3
f1=function(x) 2*x+3*x^2
curve(f1,from=-3,to=3,lwd=3,col="blue")
[Figure: curve of f1(x) for x from -3 to 3]
f1=function(x) 2*x+3*x^2
dump("f1",file="f1.txt")
source("f1.txt")
sink("solar.txt")
x=c(2,3,5)
mean(x)
## [1] 3.333333
# Weights of 30 students
weight<-c(68,53,69.5,55,71,63,76.5,65.5,69,75,76,57,70.5,
71.5,56,81.5,69,59,67.5,61,68,59.5,56.5,73,61,
72.5,71.5,59.5,74.5,63)
dump("weight","weight.dat")
length(weight)
## [1] 30
#create histogram
h1<-hist(weight)
h1
## $breaks
## [1] 50 55 60 65 70 75 80 85
##
## $counts
## [1] 2 6 4 7 8 2 1
##
## $density
## [1] 0.013333333 0.040000000 0.026666667 0.046666667
## [5] 0.053333333 0.013333333 0.006666667
##
## $mids
## [1] 52.5 57.5 62.5 67.5 72.5 77.5 82.5
##
## $xname
## [1] "weight"
##
## $equidist
## [1] TRUE
##
## attr(,"class")
## [1] "histogram"
h1$breaks
## [1] 50 55 60 65 70 75 80 85
[Figure: Histogram of weight, with Frequency on the vertical axis]
[Figure: plot of the cumulative counts cumsum(h1$counts)]
h1$mids
stem(weight)
##
## The decimal point is 1 digit(s) to the right of the |
##
## 5 | 3
## 5 | 56779
## 6 | 001133
## 6 | 688899
## 7 | 0112233
## 7 | 5567
## 8 | 2
1. High-level graphics functions, which always create a new plot. Examples are plot(), hist(), barplot(),
dotchart(), pie(), and curve().
2. Low-level graphics functions, which add to an existing plot. Examples are lines(), points(), mtext(), and
title(); the main title and subtitle are set through the main and sub arguments of title().
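The distinction between the two kinds of functions can be sketched as follows: one high-level call starts the plot, and several low-level calls then draw on top of it. Drawing to a null PDF device is an assumption made here so that the sketch runs without a display, and the annotation text is illustrative.

```r
# Assumption: use a null PDF device so the sketch runs without a display;
# omit pdf(NULL) and dev.off() to see the plot on screen
pdf(NULL)

x <- c(12, 15, 13, 20, 14, 16, 10, 10, 8, 15)

# High-level call: starts a new plot
h <- hist(x, main = "Histogram of x")

# Low-level calls: draw on top of the existing plot
abline(v = mean(x), lty = 2)               # dashed vertical line at the mean
points(x, rep(0, length(x)), pch = 16)     # mark the observations on the axis
mtext("dashed line: mean of x", side = 3, line = 0)

invisible(dev.off())
```

Calling a low-level function before any high-level call gives an error, because there is no plot to add to.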
Example: Bar charts and dot charts. VADeaths, a data set included in R, contains death rates (number of
deaths per 1000 population per year) in various subpopulations within the state of Virginia in 1940. These
may be plotted as a bar chart.
VADeaths
barplot(VADeaths, beside = TRUE, legend=TRUE,ylim=c(0,90),
ylab = "Deaths per 1000", main="Death rates in Virginia")
[Figure: grouped bar plot "Death rates in Virginia", with Deaths per 1000 on the vertical axis and a legend for the age groups 50-54 to 70-74]
[Figure: dot chart "Death rate in Virginia", grouped by Rural Male, Rural Female, Urban Male, and Urban Female, with the age groups 50-54 to 70-74 within each group]
A pie chart displays a vector of numbers by dividing a circular disk into slices whose angles (and hence
areas) are proportional to the numbers. For example, the letter grades assigned to a class might arise in
the proportions shown in the following figure:
groupsizes<-c(18,30,32,10,10)
labels<-c("A","B","C","D","F")
pie(groupsizes, labels, col=c("grey40", "white","grey","black", "grey90"))
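To see what "proportional to each number" means for this example, the sketch below computes the share and the angle of each slice. The names grades, prop, and angles are illustrative assumptions; grades is used in place of labels only to avoid masking the base function labels().

```r
groupsizes <- c(18, 30, 32, 10, 10)
grades     <- c("A", "B", "C", "D", "F")

# Proportion of the whole for each grade
prop <- groupsizes / sum(groupsizes)

# Angle of each slice in degrees; the angles add up to the full circle
angles <- 360 * prop
names(angles) <- grades
angles
sum(angles)
```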
1.23 Histogram—hist()
## [1] -0.01311039
sd(x)
## [1] 0.9967348
hist(x,prob=TRUE)
curve(dnorm(x,mean=0,sd=1),add=TRUE,lwd=2) #to add on histogram
[Figure: Histogram of x with the standard normal density curve overlaid]
## [1] 16.48774
1/0.06
## [1] 16.66667
hist(xe, prob=TRUE)
curve(dexp(x,rate=0.06),add=TRUE)
[Figure: Histogram of xe with the exponential density curve overlaid]
[Figure: Histogram of xw, with Density on the vertical axis]
x<-rnorm(1000)
hist(x,prob=TRUE)
curve(dnorm(x),add=TRUE)
[Figure: Histogram of x with the standard normal density curve overlaid]
## Box plot—boxplot()
y<-rweibull(1000,shape=1.5,scale=50)
hist(y,prob=TRUE)
[Figure: Histogram of y, with Density on the vertical axis]
boxplot(y,horizontal = TRUE)
[Figure: horizontal box plot of y]
median(y)
## [1] 38.06402
quantile(y,prob=c(0.25,.50,.75))
min(y)
## [1] 0.2924755
max(y)
## [1] 196.229
IQR(y)
## [1] 39.28781
LowerWhisker=max(min(y),20.449-1.5*IQR(y))
LowerWhisker
## [1] 0.2924755
UpperWhisker=min(max(y),60.734+1.5*IQR(y))
UpperWhisker
## [1] 119.6657
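The whisker calculation above can be cross-checked against what boxplot() itself uses. Note that a drawn whisker ends at the most extreme observation inside the fence, not at the fence value, and that boxplot() works with the hinges from fivenum(), which are close to but not always identical to the quartiles from quantile(). The seed and the names five, spread, lower_fence, upper_fence, inside, whiskers, and bs are illustrative assumptions.

```r
set.seed(1)  # illustrative seed so the sketch is reproducible
y <- rweibull(1000, shape = 1.5, scale = 50)

# boxplot() uses the hinges from fivenum(), which are close to the quartiles
five   <- fivenum(y)            # min, lower hinge, median, upper hinge, max
spread <- five[4] - five[2]     # hinge spread, close to IQR(y)

# Fences at 1.5 times the spread beyond the hinges
lower_fence <- five[2] - 1.5 * spread
upper_fence <- five[4] + 1.5 * spread

# Each whisker ends at the most extreme observation inside the fences
inside   <- y[y >= lower_fence & y <= upper_fence]
whiskers <- range(inside)
whiskers

# Compare with the values boxplot() itself would use
bs <- boxplot.stats(y)$stats
all.equal(whiskers, bs[c(1, 5)])
```

Observations outside the fences are the points that boxplot() draws individually as outliers.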
names(iris)
head(iris)
tail(iris)
str(iris)
## 'data.frame': 150 obs. of 5 variables:
## $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
## $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
## $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
## $ Species : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
boxplot(Sepal.Length~Species,data=iris)
[Figure: box plots of Sepal.Length by Species]
## [1] 30.8
mean(x)
## [1] -0.06938237
sd(x)
## [1] 0.9779999
[Figures: a sequence of plots of y, ending with a plot of sort(y) against sort(x)]
data("Orange")
names(Orange)
head(Orange)
tail(Orange)
## 32 5 1004 125
## 33 5 1231 142
## 34 5 1372 174
## 35 5 1582 177
str(Orange)
## [1] 1 1 1 1 1 1 1 2 2 2 2 2 2 2 3 3 3 3 3 3 3 4 4 4 4 4 4 4
## [29] 5 5 5 5 5 5 5
## Levels: 3 < 1 < 5 < 2 < 4
unique(as.character(Orange$Tree))
[Figure: scatter plot of circumference against age]
[Figure: scatter plot of circumference against age, with each point drawn as its tree number (1 to 5)]
1.27 QQ plot
z=rnorm(10000,mean=0,sd=1)
quantile(z,prob=c(0.25,.50,.75))
par(mfrow = c(1,4))
X <- rnorm(1000)
A <- rnorm(1000)
qqplot(X, A, main = "A and X are the same")
B <- rnorm(1000, mean = 3, sd = 2)
qqplot(X, B, main = "B is rescaled X")
C <- rt(1000, df = 2)
qqplot(X, C, main = "C has heavier tails")
par(mfrow=c(1,1))
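What a QQ plot draws can be reproduced numerically: it pairs matching quantiles of the two samples. The sketch below does this for X and B and fits a straight line through the pairs. The seed and the names probs, qX, qB, and fit are illustrative assumptions.

```r
set.seed(1)  # illustrative seed
X <- rnorm(1000)
B <- rnorm(1000, mean = 3, sd = 2)

# A QQ plot pairs up matching quantiles of the two samples
probs <- seq(0.05, 0.95, by = 0.05)
qX <- quantile(X, probs)
qB <- quantile(B, probs)

# If B follows a shifted and rescaled version of X's distribution, the pairs
# lie near a straight line with slope close to sd = 2 and intercept close to mean = 3
fit <- coef(lm(qB ~ qX))
fit
```

A heavier-tailed sample such as C would instead bend away from the line at both ends.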
[Figure: three QQ plots against X, titled "A and X are the same", "B is rescaled X", and "C has heavier tails"]
curve(dt(x,df=100),from=-5,to=5)
curve(dnorm(x),add=TRUE,lty=2)
[Figure: curve of dt(x, df = 100) for x from -5 to 5, with the standard normal density overlaid as a dashed line]
1.29 lm
y=c(2,4,5,4,10)
x=c(4,5,10,12,10)
plot(y~x)
[Figure: scatter plot of y against x]
## (Intercept) x
## 1.4713115 0.4303279
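The coefficients shown above can be reproduced from the least-squares formulas, which is a useful check on what lm() computes for a single predictor. The names b0, b1, and fit are illustrative assumptions.

```r
y <- c(2, 4, 5, 4, 10)
x <- c(4, 5, 10, 12, 10)

# Least-squares slope: sum of cross-deviations over sum of squared x-deviations
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# The fitted line passes through the point of means
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)

# Same values from lm()
fit <- lm(y ~ x)
all.equal(unname(coef(fit)), c(b0, b1))
```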
[Figure: scatter plot of y against x]
plot(circumference~age, pch=as.numeric(as.character(Tree)),data=Orange)
abline(lm(circumference~age,data=Orange,subset=Tree=="1"))
abline(lm(circumference~age,data=Orange,subset=Tree=="2"),lty=2,lwd=2)
abline(lm(circumference~age,data=Orange,subset=Tree=="3"),lty=3)
abline(lm(circumference~age,data=Orange,subset=Tree=="4"),lty=4)
abline(lm(circumference~age,data=Orange,subset=Tree=="5"),lty=5)
legend("bottomright",legend = paste("Tree",1:5),lty=1:5,lwd=c(1,1,2,1,1))
[Figure: circumference against age for the Orange data, with a fitted line and a legend entry for each of Tree 1 to Tree 5]
[Figure: layout of a plot, showing the plot region surrounded by Margin 1 (bottom), Margin 2 (left), Margin 3 (top), and Margin 4 (right), each with text lines numbered Line 0 to Line 4; the point (6, 20) is marked in the plot region]
[Figure: histogram titled "Areas of the World's Largest Islands", with Frequency on the vertical axis]
motor<-read.table("motor.txt",header = TRUE)
boxplot(motor, ylab = "Vibration", axes = FALSE)
title("Motor Vibration Data")
mtext("The Effects of Different Bearings", side = 3, line = 0)
box()
axis(side = 2, las = 2)
axis(side = 1, at = 1:5, labels = names(motor))
[Figure: box plots titled "Motor Vibration Data", subtitled "The Effects of Different Bearings", with Vibration on the vertical axis]