Siv2010 Mathematics in Biology: Revision (Quiz 1) - R

SIV2010 MATHEMATICS IN BIOLOGY
REVISION (QUIZ 1) - R
BASIC OPERATIONS
To combine variables in a vector:
> Contoh <- c(3,4,5,6)
> Contoh
[1] 3 4 5 6
> Contoh[2]
[1] 4
> Contoh[0]
numeric(0)
To read a table from a csv file:
> Namabaru <- read.csv(file="simple.csv", header=TRUE, sep=",")
> Namabaru
*shows the table*
To summarize the columns (only the labels are recorded here, not the values):
> summary(Namabaru)
 trial     mass        velocity
 A:3    Min.   :    Min.   :
 B:3    1st Qu.:    1st Qu.:
        Median :    Median :
        Mean   :    Mean   :
        3rd Qu.:    3rd Qu.:
        Max.   :    Max.   :
To figure out what files are in the current directory:
> dir()
[1] "Adobe"                   "ClickCharts"             "Custom Office Templates"
[4] "desktop.ini"             "MATLAB"                  "My Music"
To determine the current working directory:
> getwd()
[1] "C:/Users/ASUS/Documents"
To access individual columns in a table:
> Namabaru$trial
[1] A A B B A B
Levels: A B
> Namabaru$mass
[1] 10.0 11.0  5.0  6.0 10.5  7.0
If not sure what columns are there:
> names(Namabaru)
[1] "trial"    "mass"     "velocity"
Note: R is case-sensitive, so function names such as summary, dir, getwd and names must be typed in lowercase, and Namabaru must be spelled exactly as it was assigned.
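The read-and-inspect steps above can be tried without the original simple.csv. The following is a minimal sketch that writes a small stand-in file to a temporary location first. The trial and mass values are the ones printed in these notes; the velocity values are made up for illustration, and stringsAsFactors = TRUE is an assumption added so that trial is read as a factor, as in the output shown above.

```r
# Stand-in for simple.csv; the velocity column is hypothetical.
csv_lines <- c(
  "trial,mass,velocity",
  "A,10,12",
  "A,11,14",
  "B,5,8",
  "B,6,10",
  "A,10.5,13",
  "B,7,11"
)
tmp <- tempfile(fileext = ".csv")
writeLines(csv_lines, tmp)

# stringsAsFactors = TRUE makes text columns factors (not the default in R >= 4.0).
namabaru <- read.csv(tmp, header = TRUE, sep = ",", stringsAsFactors = TRUE)

names(namabaru)          # column names
levels(namabaru$trial)   # levels of the factor column
mean(namabaru$mass)      # mean of a numeric column
summary(namabaru)        # per-column summary
```

Calling summary on the data frame gives counts for the factor column and the six-number summary for each numeric column.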
All data are stored within the data frame as separate columns. If not sure what variables you have, use attributes:
> attributes(tree)
$names
 [1] "C"      "N"      "CHBR"   "REP"    "LFBM"   "STBM"   "RTBM"   "LFNCC"
 [9] "STNCC"  "RTNCC"  "LFBCC"  "STBCC"  "RTBCC"  "LFCACC" "STCACC" "RTCACC"
[17] "LFKCC"  "STKCC"  "RTKCC"  "LFMGCC" "STMGCC" "RTMGCC" "LFPCC"  "STPCC"
[25] "RTPCC"  "LFSCC"  "STSCC"  "RTSCC"
$class
[1] "data.frame"
$row.names
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26
[27] 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
[53] 53 54
If the panel at the side shows "54 obs. of 28 variables", it means 54 rows and 28 columns.
To know the class:
> x <- 28
> class(x)
[1] "numeric"
VECTORS
To assign a single number:
> a <- 3
> a
[1] 3
Basic operations also work:
> b <- sqrt(a*a+3)
> b
[1] 3.464102
To list the variables you have defined in a particular session:
> ls()
[1] "a" "b"
To create a vector:
> a <- c(1,2,4,5)
> a
[1] 1 2 4 5
To do basic operations with the vector:
> a+1
[1] 2 3 5 6
> mean(a)
[1] 3
> var(a)
[1] 3.333333
To access particular entries in the vector:
> a[1]
[1] 1
> a[7]
[1] NA
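As a check on the vector operations above, the sample variance can be computed by hand. This is a small sketch, not part of the original notes; manual_var is a name introduced here for illustration.

```r
a <- c(1, 2, 4, 5)

# Sample variance: squared deviations from the mean, divided by n - 1.
n <- length(a)
manual_var <- sum((a - mean(a))^2) / (n - 1)

manual_var                      # same value as var(a)
all.equal(manual_var, var(a))   # TRUE

# Indexing past the end of a vector gives NA rather than an error.
a[7]
is.na(a[7])                     # TRUE
```

The divisor is n - 1, not n, which is why var(a) is 10/3 for these four values.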
To create a vector of 10 numbers, initialized to zero:
> a <- numeric(10)
> a
 [1] 0 0 0 0 0 0 0 0 0 0
To determine the data type:
> typeof(a)
[1] "double"
For strings:
> a <- c("hello","there")
> a
[1] "hello" "there"
> typeof(a)
[1] "character"
R can store data as a factor (the output below is truncated):
> summary(tree$CHBR)
A1 A2 A3 A4 A5 A6 A7 B1 B2 B3 B4 B5 B6 B7 C1 C2 C3 ...
 3  1  1  3  1  3  1  1  3  3  3  3  3  3  1  3  1  3  1  1 ...
Note: A1 is a level of the factor, so the 3 under it means A1 appears 3 times in the CHBR column.
To create a data frame, first create the variables:
> a <- c(10,20,30,40)
> b <- c('book', 'pen', 'textbook', 'pencil_case')
> c <- c(TRUE,FALSE,TRUE,FALSE)
> d <- c(2.5, 8, 10, 7)
> a
[1] 10 20 30 40
> b
[1] "book"        "pen"         "textbook"    "pencil_case"
> c
[1]  TRUE FALSE  TRUE FALSE
> d
[1]  2.5  8.0 10.0  7.0
Join the variables to create a data frame:
> df <- data.frame(a,b,c,d)
> df
   a           b     c    d
1 10        book  TRUE  2.5
2 20         pen FALSE  8.0
3 30    textbook  TRUE 10.0
4 40 pencil_case FALSE  7.0
To name the columns of the data frame:
> names(df) <- c('ID', 'items', 'store', 'price')
> df
  ID       items store price
1 10        book  TRUE   2.5
2 20         pen FALSE   8.0
3 30    textbook  TRUE  10.0
4 40 pencil_case FALSE   7.0
To see the structure of the data frame:
> str(df)
'data.frame': 4 obs. of 4 variables:
 $ ID   : num 10 20 30 40
 $ items: Factor w/ 4 levels "book","pen","pencil_case",..: 1 2 4 3
 $ store: logi TRUE FALSE TRUE FALSE
 $ price: num 2.5 8 10 7
Note: in R 4.0 and later, data.frame() keeps strings as character by default, so items shows as chr unless stringsAsFactors = TRUE is set.
Selections
1. For all rows, columns 1 to 2:
> df[,1:2]
  ID       items
1 10        book
2 20         pen
3 30    textbook
4 40 pencil_case
2. For all columns, rows 1 to 3 only:
> df[1:3,]
  ID    items store price
1 10     book  TRUE   2.5
2 20      pen FALSE   8.0
3 30 textbook  TRUE  10.0
3. To select columns by name, with all rows:
> df[,c('ID', 'store')]
  ID store
1 10  TRUE
2 20 FALSE
3 30  TRUE
4 40 FALSE
4. To select both rows and columns (the outputs in 4 and 5 come from a version of df that has two extra columns, val and str2):
> df[2:3,3:5]
  store price val
2 FALSE     8  77
3  TRUE    10  88
5. To select data based on a criterion, here price above 5:
> df2 <- subset(df, subset = price > 5)
> df2
  ID       items store price val str2
2 20         pen FALSE     8  77    v
3 30    textbook  TRUE    10  88    x
4 40 pencil_case FALSE     7  99    z
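The selection by criterion above can also be written with logical row indexing. The sketch below rebuilds the four-column df from these notes with named columns and compares the two forms; by_subset, by_index and cheap_or_in_store are names introduced here for illustration.

```r
df <- data.frame(
  ID    = c(10, 20, 30, 40),
  items = c("book", "pen", "textbook", "pencil_case"),
  store = c(TRUE, FALSE, TRUE, FALSE),
  price = c(2.5, 8, 10, 7)
)

# subset() and logical row indexing pick the same rows.
by_subset <- subset(df, price > 5)
by_index  <- df[df$price > 5, ]
all.equal(by_subset, by_index)

# Conditions can be combined with & (and) and | (or);
# the second index then chooses the columns to keep.
cheap_or_in_store <- df[df$price < 5 | df$store, c("ID", "items")]
cheap_or_in_store
```

Inside subset the column names can be used directly, while in the bracket form they need the df$ prefix.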
To add a column to an existing table:
> val <- c(1,2,3,4)
> df$val <- val
> df
  ID       items store price val
1 10        book  TRUE   2.5   1
2 20         pen FALSE   8.0   2
3 30    textbook  TRUE  10.0   3
4 40 pencil_case FALSE   7.0   4
Operations on vectors:
> a = c(2,4,6,3,1,5)
> a
[1] 2 4 6 3 1 5
> b = sort(a)
> b
[1] 1 2 3 4 5 6
> c = sort(a,decreasing = TRUE)
> c
[1] 6 5 4 3 2 1
> sum(a)
[1] 21
MATRIX
To make a matrix:
> matrix_a <- matrix(1:10, byrow = TRUE, nrow = 5)
> matrix_a
     [,1] [,2]
[1,]    1    2
[2,]    3    4
[3,]    5    6
[4,]    7    8
[5,]    9   10
To know the matrix dimensions:
> dim(matrix_a)
[1] 5 2
Note: 5 rows, 2 columns.
To combine matrices by column:
> matrix_c <- cbind(matrix_a, c(1:5))
> matrix_c
     [,1] [,2] [,3]
[1,]    1    2    1
[2,]    3    4    2
[3,]    5    6    3
[4,]    7    8    4
[5,]    9   10    5
> matrix_e <- matrix(51:65, byrow = TRUE, nrow = 5)
> matrix_f <- cbind(matrix_a, matrix_e)
> matrix_f
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    2   51   52   53
[2,]    3    4   54   55   56
[3,]    5    6   57   58   59
[4,]    7    8   60   61   62
[5,]    9   10   63   64   65
To show elements of a matrix:
> matrix_f[1,3]
[1] 51
> matrix_f[1:3,3:4]
     [,1] [,2]
[1,]   51   52
[2,]   54   55
[3,]   57   58
Note: the labels are renumbered within the selection, so the columns show as [,1] [,2], not [,3] [,4].
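Two details of matrix construction and indexing are easy to check on a small case. The sketch below is an illustration only; by_row and by_col are names introduced here, and drop = FALSE is an indexing option not used elsewhere in these notes.

```r
by_row <- matrix(1:6, nrow = 2, byrow = TRUE)   # fills across each row
by_col <- matrix(1:6, nrow = 2)                 # default: fills down each column

by_row[1, ]   # 1 2 3
by_col[1, ]   # 1 3 5

# Selecting a single column returns a plain vector unless drop = FALSE.
dim(by_row[, 2])                 # NULL
dim(by_row[, 2, drop = FALSE])   # 2 1
```

Keeping drop = FALSE is useful when later code expects a matrix even if only one column is selected.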
LIST
To create a list, first create the elements:
> vect <- 1:5
or
> vect <- c(1:5)
> vect
[1] 1 2 3 4 5
> mat <- matrix(1:9, ncol = 5)
> mat
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8    1
Note: without byrow = TRUE, the matrix is filled down the columns first, not across the rows. R also warns here because 9 values do not fill a 2 x 5 matrix, so the last cell reuses the first value.
> dim(mat)
[1] 2 5
> df <- EuStockMarkets[1:10,]
> mylist <- list(vect, mat, df)
> mylist
[[1]]
[1] 1 2 3 4 5
[[2]]
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    3    5    7    9
[2,]    2    4    6    8    1
[[3]]
          DAX    SMI    CAC   FTSE
 [1,] 1628.75 1678.1 1772.8 2443.6
 [2,] 1613.63 1688.5 1750.5 2460.2
 [3,] 1606.51 1678.6 1718.0 2448.2
 [4,] 1621.04 1684.1 1708.1 2470.4
 [5,] 1618.16 1686.6 1723.1 2484.7
 [6,] 1610.61 1671.6 1714.3 2466.8
 [7,] 1630.75 1682.9 1734.5 2487.9
 [8,] 1640.17 1703.6 1757.4 2508.4
 [9,] 1635.47 1697.5 1754.0 2510.5
[10,] 1645.89 1716.3 1754.3 2497.4
Note the double brackets around the index of each list element.
To print the 3rd element in the list (note the double brackets):
> mylist[[3]]
          DAX    SMI    CAC   FTSE
 [1,] 1628.75 1678.1 1772.8 2443.6
 [2,] 1613.63 1688.5 1750.5 2460.2
 [3,] 1606.51 1678.6 1718.0 2448.2
 [4,] 1621.04 1684.1 1708.1 2470.4
 [5,] 1618.16 1686.6 1723.1 2484.7
 [6,] 1610.61 1671.6 1714.3 2466.8
 [7,] 1630.75 1682.9 1734.5 2487.9
 [8,] 1640.17 1703.6 1757.4 2508.4
 [9,] 1635.47 1697.5 1754.0 2510.5
[10,] 1645.89 1716.3 1754.3 2497.4
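The difference between double and single brackets on a list can be seen with class. This is a minimal sketch using a smaller list than the one above; the element names vect and mat are labels added here for illustration (the list in the notes is unnamed).

```r
mylist <- list(vect = 1:5, mat = matrix(1:10, ncol = 5))

class(mylist[[1]])   # "integer": the element itself
class(mylist[1])     # "list": a list of length one holding that element

# Named elements can also be reached with $ or by name inside [[ ]].
mylist$vect
dim(mylist[["mat"]])
```

Single brackets always return a list, so functions such as mean or dim usually need the double-bracket form.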
EXTERNAL
To read data downloaded from the internet (the output is long):
> PATH <- 'https://raw.githubusercontent.com/guru99-edu/R-Programming/master/prison.csv'
> df <- read.csv(PATH)[1:5]
BASIC PROBABILITY DISTRIBUTIONS

"d" returns the height of the probability density function
"p" returns the cumulative distribution function
"q" returns the inverse cumulative distribution function (quantiles)
"r" returns randomly generated numbers
1. NORMAL DISTRIBUTION
Note: if you only give the points, R assumes a mean of zero and a standard deviation of one.
> dnorm(0)
[1] 0.3989423
> dnorm(0)*sqrt(2*pi)
[1] 1
> dnorm(0,mean=4)
[1] 0.0001338302
> dnorm(0,mean=4,sd=10)
[1] 0.03682701
> v <- c(0,1,2)
> dnorm(v)
[1] 0.39894228 0.24197072 0.05399097
> x <- seq(-20,20,by=.1)
> y <- dnorm(x)
> plot(x,y)
> y <- dnorm(x,mean=2.5,sd=0.1)
> plot(x,y)
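The examples above only use the "d" function. As a short sketch of the other three prefixes for the normal distribution (pnorm, qnorm and rnorm are standard R functions; the seed and sample size are arbitrary choices made here):

```r
# p: cumulative probability P(X <= x) for a standard normal.
pnorm(0)             # 0.5 by symmetry
pnorm(1.96)          # about 0.975

# q: the inverse of p, so qnorm(pnorm(x)) recovers x.
qnorm(0.5)           # 0
qnorm(pnorm(1.3))    # 1.3 up to rounding

# r: random draws; set.seed makes them repeatable.
set.seed(1)
draws <- rnorm(1000, mean = 4, sd = 10)
length(draws)        # 1000
```

The same four prefixes apply to other distributions, for example dbinom, pbinom, qbinom and rbinom.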
(a) Use the function class to find the class to which the following objects belong: golub, golub[1,1], golub.cl, golub.gnames, apply, exp, gol.fac, plot, ALL.
1. golub = matrix
2. golub[1,1] = numeric
3. golub.cl = numeric
4. golub.gnames = matrix
5. apply = function
6. exp = function
7. gol.fac = factor
8. plot = standardGeneric
9. ALL = ExpressionSet
(b) What is the meaning of the following abbreviations: rm, sum, prod, seq, sd, nrow.
Answer:
1. rm = remove. It removes objects from the workspace.
2. sum = summation. It calculates the sum of the vector elements.
3. prod = product. It multiplies together all the values in its arguments.
4. seq = sequence. It creates a sequence of numbers.
5. sd = standard deviation.
6. nrow = number of rows. It returns the number of rows of a matrix or data frame.
(c) For what purpose are the following functions useful: grep, apply, gl, library, source, setwd, history, str.
Answer:
1. grep = searches the elements of a character vector for matches to a pattern (regular expression).
2. apply = applies a function to the rows or columns of a matrix and returns the results.
3. gl = generates a factor by specifying the pattern of its levels.
4. library = loads add-on packages (the library is also the location where packages are stored).
5. source = reads and runs R code from a named file, URL or connection.
6. setwd = sets or changes the R working directory.
7. history = shows the most recent commands entered at the command line.
8. str = compactly displays the structure of an R object.
2. gendat. Consider the data in the matrix gendat, constructed in Section 1.6. Its small size has the advantage that you can check your computations even with a pocket calculator.
(a) Use apply to compute the standard deviation of the persons (columns).
Answer: apply(gendat,2,sd)
(b) Use apply to compute the standard deviation of the genes (rows).
Answer: apply(gendat,1,sd)
(c) Order the matrix according to the gene standard deviations (rows).
Answer:
sdgenesvalues <- apply(gendat,1,sd) # compute the standard deviation of each gene first
o <- order(sdgenesvalues,decreasing=TRUE) # row order from largest to smallest sd
gendat[o,] # reorder the rows, keeping all columns
(d) Which gene has the largest standard deviation?
Answer: Gene 1
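The apply and order pattern used in these answers can be checked on a matrix small enough to verify by hand. The matrix m below is hypothetical (it is not the gendat matrix from Section 1.6), and row_sd, col_sd and o are names chosen for this sketch.

```r
# Hypothetical 3 x 4 matrix: rows are genes, columns are persons.
m <- matrix(c(1, 2, 3, 4,
              2, 2, 2, 2,
              0, 5, 0, 5),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("gene1", "gene2", "gene3"), paste0("p", 1:4)))

row_sd <- apply(m, 1, sd)   # margin 1 = rows
col_sd <- apply(m, 2, sd)   # margin 2 = columns

o <- order(row_sd, decreasing = TRUE)
m[o, ]                      # rows sorted from most to least variable
names(which.max(row_sd))    # name of the row with the largest standard deviation
```

Here which.max gives the same row as o[1], which is a quick way to answer a "which gene has the largest" question without printing the whole ordered matrix.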
3. Computations on gene means of the Golub data.
(a) Use apply to compute the mean gene expression value.
Answer:
data(golub, package = "multtest") # load the golub data
meangolub <- apply(golub,1,mean)
meangolub
(b) Order the data matrix according to the gene means.
Answer:
o <- order(meangolub,decreasing=TRUE)
golub[o,]
(c) Give the names of the three genes with the largest mean expression value.
Answer: golub.gnames[o[1:3],3]
(d) Give the biological names of these genes.
Answer: golub.gnames[o[1:3],2]
4. Computations on gene standard deviations of the Golub data.
(a) Use apply to compute the standard deviation per gene.
Answer: sdgolub <- apply(golub,1,sd)
(b) Select the expression values of the genes with standard deviation larger than two.
Answer: golubsd <- golub[sdgolub>2,]
(c) How many genes have this property?
Answer: 0 genes (the count can be checked with sum(sdgolub>2))
STRIPCHARTS
A strip chart is the most basic type of plot available.
To read a file and see the names of the columns:
> w1 <- read.csv(file="w1.dat",sep=",",header=TRUE)
> names(w1)
To make a stripchart:
> stripchart(w1$vals)
To see which points are repeated:
> stripchart(w1$vals,method="stack")
To move the boxes up and down so that there is more separation between them:
> stripchart(w1$vals,method="jitter")
If you want the boxes to be vertical, not horizontal:
> stripchart(w1$vals,vertical=TRUE,method="jitter")
To create a stripchart with a title and labels, either pass them to stripchart or add them to an existing plot with title:
> stripchart(w1$vals,method="stack", main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')
> title('Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves')
HISTOGRAM
It plots how often the data fall within certain ranges.
To plot a histogram, optionally with a title:
> hist(w1$vals)
> hist(w1$vals,main="Distribution of w1",xlab="w1")
You can specify the number of breaks with the breaks option:
> hist(w1$vals,breaks=2)
> hist(w1$vals,breaks=4)
> hist(w1$vals,breaks=6)
You can also vary the size of the domain with the xlim option (and the height with ylim):
> hist(w1$vals,breaks=12,xlim=c(0,10))
> hist(w1$vals,breaks=12,xlim=c(-1,2))
> hist(lfbm,breaks=4, xlim=c(0,2), ylim=c(0,30))
To add another plot to a histogram:
> hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
> stripchart(w1$vals,add=TRUE,at=15.5)
BOXPLOTS
A boxplot provides a graphical view of the median, quartiles, maximum, and minimum of a data set.
To create a boxplot:
> boxplot(w1$vals)
> boxplot(w1$vals, main='Leaf BioMass in High CO2 Environment', ylab='BioMass of Leaves')
To add a boxplot to a histogram:
> hist(w1$vals,main='Leaf BioMass in High CO2 Environment',xlab='BioMass of Leaves',ylim=c(0,16))
> boxplot(w1$vals,horizontal=TRUE,at=16,add=TRUE,axes=FALSE)
> stripchart(w1$vals,add=TRUE,at=15)
SCATTERPLOTS
A scatter plot provides a graphical view of the relationship between two sets of numbers.
To plot the values from 2 different columns against each other:
> plot(tree$STBM,tree$LFBM)
To find the correlation between the two:
> cor(tree$STBM,tree$LFBM)
To annotate the graph:
> plot(tree$STBM,tree$LFBM, main="Relationship Between Stem and Leaf Biomass", xlab="Stem Biomass", ylab="Leaf Biomass")
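The value returned by cor is the Pearson correlation, which is the covariance scaled by both standard deviations. The sketch below checks this on simulated data; stem and leaf are made-up stand-ins for tree$STBM and tree$LFBM, not the real measurements.

```r
set.seed(7)
stem <- runif(30, 0, 2)                    # simulated stand-in for tree$STBM
leaf <- 0.5 * stem + rnorm(30, sd = 0.1)   # simulated stand-in for tree$LFBM

r <- cor(stem, leaf)

# Pearson correlation: covariance divided by the product of the standard deviations.
r_manual <- cov(stem, leaf) / (sd(stem) * sd(leaf))
all.equal(r, r_manual)   # TRUE
```

A value near 1 or -1 indicates a strong linear relationship, and a value near 0 indicates little linear relationship.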
NORMAL QQ PLOTS
This plot is used to determine if your data is close to being normally distributed.
To create the plot:
> qqnorm(w1$vals, main="Normal Q-Q Plot of the Leaf Biomass", xlab="Theoretical Quantiles of the Leaf Biomass", ylab="Sample Quantiles of the Leaf Biomass")
To add the theoretical line that the data should fall on if they were normally distributed:
> qqline(w1$vals)
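The plotting calls in the sections above can be tried without w1.dat. The following sketch uses simulated values in place of w1$vals and draws to a null PDF device so that no window or file is needed. Computing the histogram first with plot = FALSE is an approach chosen here to pick a ylim that leaves room for the overlays, rather than hard-coding 16 as in the notes; vals, h and top are names introduced for illustration.

```r
set.seed(42)
vals <- rexp(50)                            # simulated stand-in for w1$vals

pdf(NULL)                                   # null device: nothing is written to disk

# Compute the bins without drawing, to find the tallest bar.
h <- hist(vals, breaks = 6, plot = FALSE)
top <- max(h$counts)

# Leave headroom above the tallest bar for the overlaid plots.
hist(vals, breaks = 6, ylim = c(0, top + 3),
     main = "Simulated values", xlab = "vals")
boxplot(vals, horizontal = TRUE, at = top + 2, add = TRUE, axes = FALSE)
stripchart(vals, add = TRUE, at = top + 1)

# Normal Q-Q plot with its reference line.
qqnorm(vals)
qqline(vals)

invisible(dev.off())

h$breaks        # bin edges actually used (breaks is only a suggestion to hist)
sum(h$counts)   # every value falls in some bin
```

Removing the pdf(NULL) and dev.off() lines shows the plots on screen in an interactive session.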
