SIV2010 MATHEMATICS IN BIOLOGY

REVISION (QUIZ 1) - R
BASIC OPERATIONS
To combine values into a vector
> Contoh <- c(3,4,5,6)
> Contoh
[1] 3 4 5 6
> Contoh[2]
[1] 4
> Contoh[0]
numeric(0)

To read a table from a CSV file
> namabaru <- read.csv(file="simple.csv", header=TRUE, sep=",")
> namabaru
*shows the table*

To summarize the columns
> summary(namabaru)
 trial      mass         velocity
 A:3    Min.   :      Min.   :
 B:3    1st Qu.:      1st Qu.:
        Median :      Median :
        Mean   :      Mean   :
        3rd Qu.:      3rd Qu.:
        Max.   :      Max.   :
// the numeric values are left blank in these notes

To list the files in the current directory
> dir()
[1] "Adobe"                   "ClickCharts"             "Custom Office Templates"
[4] "desktop.ini"             "MATLAB"                  "My Music"

To determine the current working directory
> getwd()
[1] "C:/Users/ASUS/Documents"

To access individual columns in a table
> namabaru$trial
[1] A A B B A B
Levels: A B
> namabaru$mass
[1] 10.0 11.0  5.0  6.0 10.5  7.0

If you are not sure what columns are there
> names(namabaru)
[1] "trial"    "mass"     "velocity"
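
The reading and inspection steps above can be tried without the original file. This is a minimal sketch that writes a stand-in for simple.csv to a temporary file first; the trial and mass values are the ones printed above, while the velocity values and the name csv_path are made up for illustration.

```r
# Hypothetical stand-in for simple.csv; the velocity values are made up.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("trial,mass,velocity",
             "A,10.0,12",
             "A,11.0,14",
             "B,5.0,8",
             "B,6.0,10",
             "A,10.5,13",
             "B,7.0,11"), csv_path)

# stringsAsFactors = TRUE makes trial a factor, as in the output above.
namabaru <- read.csv(file = csv_path, header = TRUE, sep = ",",
                     stringsAsFactors = TRUE)

names(namabaru)          # column names
namabaru$mass            # one column as a numeric vector
levels(namabaru$trial)   # the distinct trial labels
summary(namabaru)        # per-column summary
```

In R 4.0 and later, text columns are read as character vectors unless stringsAsFactors = TRUE is given, which is why the argument is spelled out here.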
All data are stored within the data frame as separate columns. If you are not sure what variables you have, use attributes()
> attributes(tree)
$names
 [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
 [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
[17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
[25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"

$class
[1] "data.frame"

$row.names
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
[27] 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
[53] 53 54

If the panel at the side shows "54 obs. of 28 variables", it means 54 rows and 28 columns.
To find the class of an object
> x <- 28
> class(x)
[1] "numeric"

VECTORS
To assign a single number
> a <- 3
> a
[1] 3

Basic operations also work
> b <- sqrt(a*a+3)
> b
[1] 3.464102

To list the variables you have defined in a particular session
> ls()
[1] "a" "b"

To create a vector
> a <- c(1,2,4,5)
> a
[1] 1 2 4 5

To do basic operations with the vector
> a+1
[1] 2 3 5 6
> mean(a)
[1] 3
> var(a)
[1] 3.333333

To access particular entries in the vector
> a[1]
[1] 1
> a[7]
[1] NA
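
The mean and variance of this vector can be checked by hand. The sketch below recomputes them from the definitions; v_manual is a name introduced here for illustration. It shows that var() divides by n - 1 (sample variance), not by n.

```r
a <- c(1, 2, 4, 5)

a + 1                      # adds 1 to every element: 2 3 5 6
m <- mean(a)               # (1 + 2 + 4 + 5) / 4 = 3

# var() is the sample variance: squared deviations divided by n - 1.
v_manual <- sum((a - m)^2) / (length(a) - 1)   # 10 / 3
v_manual
var(a)                     # same value as v_manual

a[7]                       # an index past the end gives NA, not an error
```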
To create a vector of 10 numbers, initialized to zero
> a <- numeric(10)
> a
 [1] 0 0 0 0 0 0 0 0 0 0

To determine the data type
> typeof(a)
[1] "double"

For strings
> a <- c("hello","there")
> a
[1] "hello" "there"
> typeof(a)
[1] "character"

R can store data as a factor.
> summary(tree$CHBR)
A1 A2 A3 A4 A5 A6 A7 B1 B2 B3 B4 B5 B6 B7 C1 C2 C3 ...
3 1 1 3 1 3 1 1 3 3 3 3 3 3 1 3 1 3 1 1 ...
// the output is cut off in these notes
// A1 is a level of the factor, so the 3 under it means that A1 appears 3 times in the CHBR column
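
How a factor counts its levels can be seen on a small case. This sketch uses a short made-up vector named chbr in place of tree$CHBR, which is not available here.

```r
# Hypothetical labels standing in for a column such as tree$CHBR.
chbr <- factor(c("A1", "A2", "A1", "B1", "A1", "B1"))

levels(chbr)              # the distinct labels: "A1" "A2" "B1"
counts <- summary(chbr)   # count per level: A1 = 3, A2 = 1, B1 = 2
counts
table(chbr)               # the same counts, returned as a table
```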
To create a dataframe
> a <- c(10,20,30,40)
> b <- c('book', 'pen', 'textbook', 'pencil_case')
> c <- c(TRUE,FALSE,TRUE,FALSE)
> d <- c(2.5, 8, 10, 7)
> a
[1] 10 20 30 40
> b
[1] "book"        "pen"         "textbook"    "pencil_case"
> c
[1]  TRUE FALSE  TRUE FALSE
> d
[1]  2.5  8.0 10.0  7.0

Join the variables to create a dataframe
> df <- data.frame(a,b,c,d)
> df
   a           b     c    d
1 10        book  TRUE  2.5
2 20         pen FALSE  8.0
3 30    textbook  TRUE 10.0
4 40 pencil_case FALSE  7.0

To name the columns of the dataframe
> names(df) <- c('ID', 'items', 'store', 'price')
> df
  ID       items store price
1 10        book  TRUE   2.5
2 20         pen FALSE   8.0
3 30    textbook  TRUE  10.0
4 40 pencil_case FALSE   7.0

To see the structure of the dataframe
> str(df)
'data.frame': 4 obs. of 4 variables:
 $ ID   : num 10 20 30 40
 $ items: Factor w/ 4 levels "book","pen","pencil_case",..: 1 2 4 3
 $ store: logi TRUE FALSE TRUE FALSE
 $ price: num 2.5 8 10 7
// in R 4.0 and later, items is shown as chr unless the dataframe is created with stringsAsFactors = TRUE

Selections
1. For all rows, columns 1 to 2
> df[,1:2]
  ID       items
1 10        book
2 20         pen
3 30    textbook
4 40 pencil_case

2. For all columns, rows 1 to 3 only
> df[1:3,]
  ID    items store price
1 10     book  TRUE   2.5
2 20      pen FALSE   8.0
3 30 textbook  TRUE  10.0

3. To select columns by name, with all rows
> df[,c('ID', 'store')]
  ID store
1 10  TRUE
2 20 FALSE
3 30  TRUE
4 40 FALSE

4. To select columns and rows
> df[2:3,3:5]
  store price val
2 FALSE     8  77
3  TRUE    10  88

5. To select data based on criteria (select price above 5)
> df2 <- subset(df, subset = price > 5)
> df2
  ID       items store price val str2
2 20         pen FALSE     8  77    v
3 30    textbook  TRUE    10  88    x
4 40 pencil_case FALSE     7  99    z

// outputs 4 and 5 come from a version of df that also has the columns val and str2

To add a column to an existing table
> Val <- c(1,2,3,4)
> df$val <- Val
> df
  ID       items store price val
1 10        book  TRUE   2.5   1
2 20         pen FALSE   8.0   2
3 30    textbook  TRUE  10.0   3
4 40 pencil_case FALSE   7.0   4
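
The dataframe steps above (create, select, filter, add a column) can be run together as one script. This is a sketch using the same values as the notes; the name expensive is introduced here for illustration.

```r
df <- data.frame(ID    = c(10, 20, 30, 40),
                 items = c("book", "pen", "textbook", "pencil_case"),
                 store = c(TRUE, FALSE, TRUE, FALSE),
                 price = c(2.5, 8, 10, 7))

df[, 1:2]                              # all rows, columns 1 to 2
df[1:3, ]                              # rows 1 to 3, all columns
df[, c("ID", "store")]                 # columns picked by name
expensive <- subset(df, price > 5)     # rows where price is above 5
expensive

df$val <- c(1, 2, 3, 4)                # add a new column
df[2:3, 3:5]                           # rows 2 to 3 of store, price, val
```

Naming the columns inside data.frame() avoids the separate names(df) <- ... step.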

Operations on vectors
> a = c(2,4,6,3,1,5)
> a
[1] 2 4 6 3 1 5
> b = sort(a)
> b
[1] 1 2 3 4 5 6
> c = sort(a,decreasing = TRUE)
> c
[1] 6 5 4 3 2 1
> sum(a)
[1] 21
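
sort() returns the sorted values, while order(), which the gene exercises in these notes rely on, returns the positions that would sort the vector. A short sketch of the difference, with idx as an illustrative name:

```r
a <- c(2, 4, 6, 3, 1, 5)

sort(a)                      # 1 2 3 4 5 6
sort(a, decreasing = TRUE)   # 6 5 4 3 2 1

# order() returns positions rather than values: a[order(a)] equals sort(a).
idx <- order(a)
idx                          # 5 1 4 2 6 3
a[idx]

sum(a)                       # 21
```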

MATRIX
To make a matrix
> matrix_a <- matrix(1:10, byrow = TRUE, nrow = 5)
> matrix_a
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
[4,]    7    8
[5,]    9   10

To get the dimensions of the matrix
> dim(matrix_a)
[1] 5 2
// 5 rows, 2 columns

To combine matrices
> matrix_c <- cbind(matrix_a, c(1:5))
> matrix_c
     [,1] [,2] [,3]
[1,]    1    2    1
[2,]    3    4    2
[3,]    5    6    3
[4,]    7    8    4
[5,]    9   10    5
> matrix_e <- matrix(51:65, byrow = TRUE, nrow = 5)
> matrix_f <- cbind(matrix_a, matrix_e)
> matrix_f
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2   51   52   53
[2,]    3    4   54   55   56
[3,]    5    6   57   58   59
[4,]    7    8   60   61   62
[5,]    9   10   63   64   65

To show elements of a matrix
> matrix_f[1,3]
[1] 51
> matrix_f[1:3,3:4]
     [,1] [,2]
[1,]   51   52
[2,]   54   55
[3,]   57   58
// note that the index labels in the output are renumbered from 1: the columns show [,1] [,2], not [,3] [,4]
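
The effect of byrow, cbind, and the renumbered labels can be seen on a smaller case. A minimal sketch with illustrative names (by_row, by_col, wide, sub):

```r
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # filled across each row
by_col <- matrix(1:6, nrow = 2)                 # default: filled down each column

by_row          # rows are 1 2 3 and 4 5 6
by_col          # rows are 1 3 5 and 2 4 6

wide <- cbind(by_row, by_col)   # same number of rows, columns side by side
dim(wide)                       # 2 6

sub <- wide[1:2, 4:5]           # columns 4 to 5 become [,1] and [,2] in the result
sub
```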

LIST
To create a list
> vect <- 1:5
/// or
> vect <- c(1:5)
> vect
[1] 1 2 3 4 5
> mat <- matrix(1:9, ncol = 5)
> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8    1
// note: without byrow = TRUE, the matrix is filled down the columns first, not across the rows
// 1:9 has only 9 values for the 10 cells, so R reuses the first value (with a warning)
> dim(mat)
[1] 2 5
> df <- EuStockMarkets[1:10,]
> mylist <- list(vect, mat, df)
> mylist
[[1]] // note the double brackets here
[1] 1 2 3 4 5

[[2]]
[,1] [,2] [,3] [,4] [,5]
[1,] 1 3 5 7 9
[2,] 2 4 6 8 1

[[3]]
DAX SMI CAC FTSE
[1,] 1628.75 1678.1 1772.8 2443.6
[2,] 1613.63 1688.5 1750.5 2460.2
[3,] 1606.51 1678.6 1718.0 2448.2
[4,] 1621.04 1684.1 1708.1 2470.4
[5,] 1618.16 1686.6 1723.1 2484.7
[6,] 1610.61 1671.6 1714.3 2466.8
[7,] 1630.75 1682.9 1734.5 2487.9
[8,] 1640.17 1703.6 1757.4 2508.4
[9,] 1635.47 1697.5 1754.0 2510.5
[10,] 1645.89 1716.3 1754.3 2497.4

To print the 3rd element in the list
> mylist[[3]]
// note the double brackets
          DAX    SMI    CAC   FTSE
 [1,] 1628.75 1678.1 1772.8 2443.6
 [2,] 1613.63 1688.5 1750.5 2460.2
 [3,] 1606.51 1678.6 1718.0 2448.2
 [4,] 1621.04 1684.1 1708.1 2470.4
 [5,] 1618.16 1686.6 1723.1 2484.7
 [6,] 1610.61 1671.6 1714.3 2466.8
 [7,] 1630.75 1682.9 1734.5 2487.9
 [8,] 1640.17 1703.6 1757.4 2508.4
 [9,] 1635.47 1697.5 1754.0 2510.5
[10,] 1645.89 1716.3 1754.3 2497.4
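
The double-bracket note can be checked directly: [[ ]] returns the stored element itself, while [ ] returns a shorter list that still wraps it. A small sketch, where inner, outer, and named are illustrative names:

```r
vect <- 1:5
mat  <- matrix(1:10, ncol = 5)
mylist <- list(vect, mat)

inner <- mylist[[2]]   # double brackets: the element itself (a matrix)
outer <- mylist[2]     # single brackets: a list of length 1 holding it

is.matrix(inner)       # TRUE
class(outer)           # "list"
dim(mylist[[2]])       # 2 5

# Named elements can also be reached with $.
named <- list(v = vect, m = mat)
named$v
```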

EXTERNAL
To read data downloaded from the internet
> PATH <- 'https://raw.githubusercontent.com/guru99-edu/R-Programming/master/prison.csv'
> df <- read.csv(PATH)[1:5]
// a lot of data; [1:5] keeps only the first five columns

BASIC PROBABILITY DISTRIBUTIONS


"d" returns the height of the probability density function

"p" returns the cumulative distribution function

"q" returns the inverse cumulative distribution function (quantiles)

"r" returns randomly generated numbers

1. NORMAL DISTRIBUTION
// If you only give the points it assumes you want to use a mean of zero and standard deviation of one.

> dnorm(0)
[1] 0.3989423

> dnorm(0)*sqrt(2*pi)
[1] 1

> dnorm(0,mean=4)
[1] 0.0001338302

> dnorm(0,mean=4,sd=10)
[1] 0.03682701

> v <- c(0,1,2)
> dnorm(v)
[1] 0.39894228 0.24197072 0.05399097

> x <- seq(-20,20,by=.1)
> y <- dnorm(x)
> plot(x,y)
> y <- dnorm(x,mean=2.5,sd=0.1)
> plot(x,y)
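
The d, p, q, r prefixes can be compared side by side for the normal distribution. This is a sketch only; the seed and the sample size of 1000 are arbitrary choices, and q and draws are illustrative names.

```r
# d: density height; for the standard normal, dnorm(0) = 1 / sqrt(2 * pi).
dnorm(0)
1 / sqrt(2 * pi)

# p: cumulative probability P(X <= x).
pnorm(0)                 # 0.5, half the area lies below the mean

# q: inverse of p; qnorm(p) returns the x with pnorm(x) = p.
q <- qnorm(0.975)        # about 1.96
pnorm(q)                 # back to 0.975

# r: random draws; set.seed() makes the draws reproducible.
set.seed(1)
draws <- rnorm(1000, mean = 4, sd = 10)
mean(draws)              # close to 4, not exactly 4
```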
(a) Use the function class to find the class to which the following objects belong: golub, golub[1,1],golub.cl, golub.gnames, apply, exp,
gol.fac, plot, ALL.

1. golub = matrix
2. golub[1,1] = numeric
3. golub.cl = numeric
4. golub.gnames = matrix
5. apply = function
6. exp = function
7. gol.fac = factor
8. plot = StandardGeneric
9. ALL = ExpressionSet
(b) What is the meaning of the following abbreviations: rm, sum, prod, seq, sd, nrow.
Answer:
1. rm = remove. It removes objects (such as variables) from the workspace.
2. sum = summation. It calculates the sum of the vector elements.
3. prod = product. It multiplies together all the values in its arguments.
4. seq = sequence. It creates a sequence of numbers.
5. sd = standard deviation.
6. nrow = number of rows. It returns the number of rows of a matrix or data frame.

(c) For what purpose are the following functions useful: grep, apply, gl, library, source, setwd, history, str.
Answer:
1. grep = grep searches the elements of a character vector for matches to a regular expression.
2. apply = apply applies a function to the rows or columns of a matrix and returns the results.
3. gl = gl creates a factor by specifying the pattern of its levels.
4. library = library loads add-on packages; the library is also the location where installed packages are stored.
5. source = source reads and runs R code from a named file, URL, or connection.
6. setwd = setwd sets or changes the R working directory.
7. history = history prints the most recent commands entered at the command line.
8. str = str compactly displays the structure of an R object.
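
A few of these functions are easier to remember after seeing them on tiny inputs. The sketch below uses made-up gene labels and a small matrix; none of the values come from the golub data.

```r
# grep: indices (or values) of the elements that match a pattern.
genes <- c("CCND3 Cyclin D3", "Zyxin", "Cyclin D1")   # hypothetical labels
grep("Cyclin", genes)                 # 1 3
grep("Cyclin", genes, value = TRUE)   # the matching strings themselves

# gl: a factor with 2 levels, each repeated 3 times.
f <- gl(2, 3, labels = c("ALL", "AML"))
f                                     # ALL ALL ALL AML AML AML

# apply: run a function over rows (1) or columns (2) of a matrix.
m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)                      # row sums: 9 12
apply(m, 2, sum)                      # column sums: 3 7 11

# str: compact description of an object.
str(f)
```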

2. gendat Consider the data in the matrix gendat, constructed in Section 1.6. Its small size has the advantage that you can check your
computations even by a pocket calculator.
(a) Use apply to compute the standard deviation of the persons (column).
Answer:
apply(gendat,2,sd)
(b) Use apply to compute the standard deviation of the genes(row).
Answer: apply(gendat,1,sd)
(c) Order the matrix according to the gene standard deviations (row).
Answer:
sdgenesvalues <- apply(gendat,1,sd) // first store the standard deviation of each row
o <- order(sdgenesvalues,decreasing=TRUE) // row order from the largest to the smallest sd
gendat[o,] // reorder the rows, keep all columns

(d) Which gene has the largest standard deviation?


Answer: Gene 1
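
Since gendat itself is not reproduced in these notes, the same apply and order steps can be tried on a made-up matrix. In this sketch the matrix toy and all of its values are hypothetical, so the most variable gene here need not match the answer above.

```r
# Hypothetical 3 genes x 4 persons matrix; not the gendat values from the book.
toy <- matrix(c(1.0, 1.2, 0.9, 1.1,
                2.0, 5.0, 1.0, 4.0,
                3.0, 3.1, 2.9, 3.0),
              nrow = 3, byrow = TRUE,
              dimnames = list(paste0("gene", 1:3), paste0("p", 1:4)))

sd_persons <- apply(toy, 2, sd)   # one value per column (person)
sd_genes   <- apply(toy, 1, sd)   # one value per row (gene)

o <- order(sd_genes, decreasing = TRUE)   # row positions, most variable first
toy[o, ]                                  # reorder rows, keep all columns
names(which.max(sd_genes))                # gene with the largest sd
```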

3. Computations on gene means of the Golub data.


(a) Use apply to compute the mean gene expression value.
Answer:
data(golub, package = "multtest") //read the golub data
meangolub <- apply(golub,1,mean)
meangolub
(b) Order the data matrix according to the gene means.
Answer:
o <- order(meangolub,decreasing=TRUE)
golub[o,]
(c) Give the names of the three genes with the largest mean expression value.
Answer: golub.gnames[o[1:3],3]
(d) Give the biological names of these genes.
Answer: golub.gnames[o[1:3],2]

4. Computations on gene standard deviations of the Golub data.


(a) Use apply to compute the standard deviation per gene.
Answer: sdgolub <- apply(golub,1,sd)
(b) Select the expression values of the genes with standard deviation larger than two.
Answer: golubsd <- golub[sdgolub>2,]
(c) How many genes have this property?
Answer: 0 genes
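
One way to check a count like this is to sum the logical vector used for the selection, because TRUE counts as 1. The sketch below does this on a randomly generated matrix rather than on golub, so the matrix expr, its size, and the resulting count are illustrative only.

```r
set.seed(42)
# Hypothetical expression matrix: 100 genes (rows) x 10 samples (columns).
expr <- matrix(rnorm(1000), nrow = 100)
# Make the first five genes far more variable than the rest.
expr[1:5, ] <- expr[1:5, ] * 10

sd_gene <- apply(expr, 1, sd)
keep <- sd_gene > 2                      # logical vector, one entry per gene

selected <- expr[keep, , drop = FALSE]   # rows that pass the filter
sum(keep)                                # TRUE counts as 1, so this is the count
nrow(selected)                           # the same number
```

For the exercise itself, the equivalent check would be sum(sdgolub > 2) or nrow(golubsd).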

STRIPCHARTS
A strip chart is the most basic type of plot available.

To read a file and see the names of the columns
> w1 <- read.csv(file="w1.dat",sep=",",head=TRUE)
> names(w1)

To make a stripchart
> stripchart(w1$vals)

To see which points are repeated
> stripchart(w1$vals,method="stack")

To have the boxes moved up and down so that there is more separation between them
> stripchart(w1$vals,method="jitter")

If you want the boxes to be vertical, not horizontal
> stripchart(w1$vals,vertical=TRUE,method="jitter")

To create a stripchart with a title and labels
> stripchart(w1$vals,method="stack", main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')
// or add them to an existing plot afterwards
> title('Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')

HISTOGRAM
It plots the frequencies that data appears within certain ranges.

To plot a histogram, with a title
> hist(w1$vals)
> hist(w1$vals,main="Distribution of w1",xlab="w1")

You can specify the number of breaks to use with the breaks option
> hist(w1$vals,breaks=2)
> hist(w1$vals,breaks=4)
> hist(w1$vals,breaks=6)
> hist(w1$vals,breaks=8)
> hist(w1$vals,breaks=12)

You can also vary the size of the domain using the xlim option
> hist(w1$vals,breaks=12,xlim=c(0,10))
> hist(w1$vals,breaks=12,xlim=c(-1,2))
> hist(lfbm,breaks=4, xlim=c(0,2), ylim=c(0,30))

To add another plot to a histogram
> hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
> stripchart(w1$vals,add=TRUE,at=15.5)
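
What the breaks option does can be inspected without drawing anything. This sketch uses randomly generated numbers as a stand-in for w1$vals, which is not included in these notes; vals and h are illustrative names.

```r
set.seed(7)
vals <- rnorm(50, mean = 1, sd = 0.3)   # hypothetical stand-in for w1$vals

# plot = FALSE returns the bins without drawing anything.
h <- hist(vals, breaks = 8, plot = FALSE)
h$breaks         # bin edges actually used
h$counts         # observations per bin
sum(h$counts)    # 50, every observation falls in a bin

# A single number for breaks is only a suggestion; R rounds to "pretty" edges.
length(h$counts)
```

This is why hist(w1$vals,breaks=8) may not show exactly eight bars.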
BOXPLOTS
A boxplot provides a graphical view of the median, quartiles, maximum, and minimum of a data set.

To create a boxplot
> boxplot(w1$vals)
> boxplot(w1$vals, main='Leaf BioMass in High CO2 Environment', ylab='BioMass of Leaves')

To add a boxplot to a histogram
> hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
> boxplot(w1$vals,horizontal=TRUE,at=16,add=TRUE,axes=FALSE)
> stripchart(w1$vals,add=TRUE,at=15)
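
The numbers behind a boxplot can be printed instead of read off the picture. The sketch below uses a short made-up vector with one extreme value; it shows that the whiskers stop at the most extreme points within 1.5 box lengths, and anything beyond is drawn as a separate outlier.

```r
vals <- c(2, 4, 4, 5, 6, 7, 8, 9, 30)   # hypothetical data with one extreme value

fivenum(vals)                # min, lower hinge, median, upper hinge, max: 2 4 6 8 30
b <- boxplot(vals, plot = FALSE)
b$stats                      # lower whisker, hinges, median, upper whisker: 2 4 6 8 9
b$out                        # points drawn individually as outliers: 30
```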

SCATTERPLOTS
A scatter plot provides a graphical view of the relationship between two sets of numbers.

To plot the values from 2 different columns against each other
> plot(tree$STBM,tree$LFBM)

To find the correlation between the two
> cor(tree$STBM,tree$LFBM)

To annotate the graph
> plot(tree$STBM,tree$LFBM, main="Relationship Between Stem and Leaf Biomass", xlab="Stem Biomass", ylab="Leaf Biomass")
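
The value returned by cor() is the Pearson correlation, which can be rebuilt from the covariance and the standard deviations. A sketch with made-up numbers in place of tree$STBM and tree$LFBM:

```r
# Hypothetical values standing in for tree$STBM and tree$LFBM.
stem <- c(0.5, 0.8, 1.1, 1.5, 2.0)
leaf <- c(0.4, 0.7, 0.9, 1.4, 1.7)

r <- cor(stem, leaf)     # Pearson correlation, between -1 and 1
r

# Same value from the definition: covariance over the product of the sds.
cov(stem, leaf) / (sd(stem) * sd(leaf))

cor(stem, -leaf)         # flipping one variable flips the sign
```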

NORMAL QQ PLOTS
This plot is used to determine if your data is close to being normally distributed.

To create the plot
> qqnorm(w1$vals, main="Normal Q-Q Plot of the Leaf Biomass", xlab="Theoretical Quantiles of the Leaf Biomass", ylab="Sample Quantiles of the Leaf Biomass")

To add the theoretical line that the data should fall on if they were normally distributed
> qqline(w1$vals)
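
The points that qqnorm draws can also be returned as numbers. This sketch uses randomly generated values as a stand-in for w1$vals; the seed and sample size are arbitrary.

```r
set.seed(3)
vals <- rnorm(30, mean = 1, sd = 0.3)   # hypothetical stand-in for w1$vals

# plot.it = FALSE returns the plotted coordinates instead of drawing them.
qq <- qqnorm(vals, plot.it = FALSE)
head(qq$x)     # theoretical normal quantiles
head(qq$y)     # the sample values, in their original order

# For roughly normal data the two sets of quantiles are almost linearly related.
cor(qq$x, qq$y)
```

A correlation near 1 corresponds to points lying close to the qqline.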
