
R programming for NGS Data analysis

1. R programming basics
getwd() Function:

The working directory is the default location where R looks for files to read and where it writes files. The getwd() function returns the
absolute path of the current working directory of the R process.

getwd()

Output:

[1] "C:/Users/bioc/Documents"

setwd() Function:

setwd(dir) is used to set the working directory to dir.

setwd("D:/bioc/R/")
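setwd() changes global state for the whole session, so a common pattern is to remember the old directory and switch back afterwards. The sketch below is a minimal illustration using only base R. It uses tempdir() only because that directory always exists, and it relies on setwd() returning the previous directory invisibly.

```r
# Remember where we are so we can come back later
old_wd <- getwd()

# Switch to R's per-session temporary directory (chosen only because it always exists)
previous <- setwd(tempdir())
new_wd <- getwd()
print(new_wd)

# setwd() returned the directory we left, so use it to restore the original location
setwd(previous)
```

In your own scripts you would pass a real project path, such as the "D:/bioc/R/" example above, instead of tempdir().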

dir() Function:

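The dir() function (a synonym of list.files()) lists the files in a directory. When called with no argument, it lists the files in the current working directory.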
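The sketch below shows dir() with its pattern and full.names arguments. It builds a throwaway folder under tempdir() first. The folder name "dir_demo" and the two file names are hypothetical, chosen only for illustration.

```r
# Build a throwaway folder with two small files (names are illustrative only)
demo_dir <- file.path(tempdir(), "dir_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("@read1", "ACGT", "+", "IIII"), file.path(demo_dir, "sample1.fastq"))
writeLines("gene,count", file.path(demo_dir, "counts.csv"))

# All files in the folder, returned in sorted order
all_files <- dir(demo_dir)
print(all_files)

# Only the files whose names match a regular expression
fastq_files <- dir(demo_dir, pattern = "\\.fastq$")
print(fastq_files)

# Full paths are convenient when the files will be read afterwards
full_paths <- dir(demo_dir, full.names = TRUE)
```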

ls() Function:

ls() is a function in R that lists the names of all objects in the current working environment.

rm() Function:

rm() removes objects from the environment. A common use is cleaning the environment before
running code. The command below removes all objects from the R environment.

rm(list = ls())
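ls() and rm() can also be pointed at a specific environment. The sketch below uses a fresh environment created with new.env(), so running it does not wipe your own workspace. The object names counts and genes are hypothetical.

```r
# A separate environment keeps this demo away from your own workspace
env <- new.env()
assign("counts", c(10, 20), envir = env)
assign("genes", c("TP53", "BRCA1"), envir = env)

before <- ls(envir = env)          # "counts" "genes"

# Remove a single object by name
rm("counts", envir = env)
after_one <- ls(envir = env)       # "genes"

# Remove everything ls() reports, the same idea as rm(list = ls())
rm(list = ls(envir = env), envir = env)
after_all <- ls(envir = env)       # character(0)
```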

Help:

help() or ?

help(rlm, package="MASS")

browseVignettes(package="ggplot2") or vignette() or vignette("timedep", package="survival")

data() or data(iris) or help(iris) or args(lm)

demo(ggplot2)

Packages:

Information about the packages available on CRAN can be retrieved with the available.packages() function.

a <- available.packages()

head(rownames(a), 30) # Show the names of the first 30 packages


Packages can be installed with the install.packages() function in R. The following code installs the
ggplot2 package from CRAN.

install.packages("ggplot2") or

install.packages("ggplot2", lib="/data/Rpackages/")

You can install multiple R packages at once with a single call to install.packages(). Place the names of the R
packages in a character vector.

install.packages(c("caret", "ggplot2", "dplyr"))

Load a package to make it available in the current session. The library() function is used to load packages into R. The following
code loads the ggplot2 package into R. The package name does not need to be put in quotes.

library(ggplot2)
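Installing and loading are often combined so that a script only installs what is missing. The sketch below is a hypothetical helper, load_packages(), that is not part of any package. It uses requireNamespace() to test whether a package is installed, install.packages() for the missing ones, and library(..., character.only = TRUE) to attach a name stored in a variable.

```r
# Hypothetical helper (not part of any package): install missing packages, then attach all
load_packages <- function(pkgs) {
  # requireNamespace() returns FALSE instead of failing when a package is absent
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing <- pkgs[!installed]
  if (length(missing) > 0) {
    install.packages(missing)
  }
  # character.only = TRUE lets library() accept a name stored in a variable
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }
  invisible(pkgs)
}

# "stats" and "utils" ship with R, so this call never downloads anything
load_packages(c("stats", "utils"))
```

Calling load_packages(c("caret", "ggplot2", "dplyr")) would install any of those three that are missing from CRAN and then attach them.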

R attributes and Objects:

R has five basic or “atomic” classes of objects:

- Numeric (also known as double), e.g. 1, 1.0, 42.5
- Integer, e.g. 1L, 2L, 42L
- Complex, e.g. 4 + 2i
- Logical, with two possible values: TRUE and FALSE. You can also use T and F, but this is not recommended. NA is
also considered logical.
- Character, e.g. "a", "Statistics", "1 plus 2."

Special numeric values: NaN (Not a Number) represents an undefined value, and Inf and -Inf represent positive and negative infinity.
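The classes above can be checked directly in R. The sketch below uses typeof(), which reports the internal storage type, and class(), which calls doubles "numeric". That difference is why the list gives both names. The last lines produce the special values NaN, Inf and -Inf through arithmetic.

```r
# One example value per atomic class listed above
examples <- list(42.5, 42L, 4 + 2i, TRUE, NA, "Statistics")

# typeof() reports the internal storage type of each value
types <- vapply(examples, typeof, character(1))
print(types)

# class() calls a double "numeric"
num_class <- class(42.5)

# Special values produced by arithmetic
undefined <- 0 / 0     # NaN
pos_inf <- 1 / 0       # Inf
neg_inf <- -1 / 0      # -Inf
is.nan(undefined)
is.infinite(c(pos_inf, neg_inf))
```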

R objects can have attributes, which are like metadata for the object. These metadata are useful because
they help describe the object. Common attributes include:

- names, dimnames
- dimensions (e.g. matrices, arrays)
- class (e.g. integer, numeric)
- length
- other user-defined attributes/metadata

Example: attributes(iris)   # iris is a built-in dataset

Output (only $names is shown here; the full output also lists $class and $row.names):
$names
[1] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width" "Species"
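Attributes can also be set on your own objects. The sketch below adds names to a vector, turns a plain vector into a matrix by giving it a dim attribute, and attaches user-defined metadata with attr(). The gene names and the "units" attribute are hypothetical.

```r
# A named numeric vector: the names are stored as an attribute
counts <- c(12, 0, 7)
names(counts) <- c("geneA", "geneB", "geneC")   # hypothetical gene names
attributes(counts)

# Adding a dim attribute turns a plain vector into a 2 x 3 matrix
m <- 1:6
dim(m) <- c(2, 3)
is.matrix(m)

# Arbitrary user-defined metadata can be attached with attr()
attr(m, "units") <- "read counts"   # hypothetical attribute name and value
attributes(m)
```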

Vector: A vector is a sequence of data elements of the same basic type. The members of a vector are called
components (or elements). Vectors are the most basic R data objects, and there are six types of atomic vectors: logical,
integer, double, complex, character and raw.

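x <- c(1, 2, 3, 4, 5)          # double
x                              # typing just the name auto-prints the object
l <- c(TRUE, FALSE)            # logical
l <- c(T, F)                   # logical (works, but TRUE/FALSE is safer)
ch <- c("a", "b", "c", "d")    # character (named ch to avoid confusion with the c() function)
i <- 1:20                      # integer
cm <- c(2+2i, 3+3i)            # complex
print(l)
print(ch)
print(i)
print(cm)
typeof(ch)                     # storage type of the vector: "character"
is.numeric(x)                  # TRUE
as.character(l)                # converts logical to character: "TRUE" "FALSE"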
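Because every element of an atomic vector must have the same type, c() silently converts mixed inputs to the most flexible type present (logical, then integer, then double, then complex, then character). The sketch below shows this implicit coercion and the explicit as.* conversions. A value that cannot be converted becomes NA.

```r
# Mixing types in c() coerces every element to the most flexible type present
mixed <- c(1, "a", TRUE)
typeof(mixed)                    # "character"
print(mixed)                     # "1" "a" "TRUE"

num_log <- c(1.5, TRUE, FALSE)   # logicals become 1 and 0
print(num_log)

# Explicit conversion with the as.* functions
as.numeric("3.14")
# "x" cannot be converted, so it becomes NA (the warning is suppressed here)
bad <- suppressWarnings(as.numeric(c("1", "2", "x")))
print(bad)
```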

Arithmetic Operations on Vectors:

# Create two vectors.


v1 <- c(1:10)
v2 <- c(101:110)

# Vector addition.
add.result <- v1+v2
print(add.result)

# Vector subtraction.
sub.result <- v2-v1
print(sub.result)

# Vector multiplication.
multi.result <- v1*v2
print(multi.result)

# Vector division.
divi.result <- v2/v1
print(divi.result)
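The operations above work element by element: element k of the result uses v1[k] and v2[k]. A length-one value is recycled across the whole vector. The sketch below also applies this to a counts-per-million (CPM) calculation, a common NGS normalization in which each count is divided by the total and multiplied by one million. The read counts are hypothetical.

```r
v1 <- 1:10
v2 <- 101:110

# Element-wise addition: 1 + 101, 2 + 102, ..., 10 + 110
add.result <- v1 + v2

# A length-one value (2) is recycled for every element
scaled <- v1 * 2

# Hypothetical read counts for three genes, converted to counts per million
counts <- c(geneA = 150, geneB = 30, geneC = 820)
cpm <- counts / sum(counts) * 1e6
print(cpm)
```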

List: A list is a data structure whose elements can be of different (mixed) data types.

x <- list("stat",5.1, TRUE, 1 + 4i)


x
class(x)
typeof(x)
length(x)
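Elements of a list are accessed differently from elements of a vector. [[ ]] extracts a single element, while [ ] returns a shorter list. Lists can also be named, which makes them convenient for grouping related values of different types. The sample_info list below is hypothetical metadata.

```r
x <- list("stat", 5.1, TRUE, 1 + 4i)

# [[ ]] extracts a single element; [ ] returns a list of length 1
second <- x[[2]]     # 5.1
sub <- x[2]
class(sub)           # "list"

# A named list groups related values of different types
sample_info <- list(id = "S1", reads = 1.2e6, paired = TRUE)   # hypothetical metadata
sample_info$reads
names(sample_info)
```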

Factors:

x <- factor(c("male", "female", "male", "male", "female"))


x
table(x)
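A factor stores categorical data as integer codes together with a set of levels. By default the levels are sorted alphabetically. The sketch below inspects the levels and codes of the factor above and sets the level order explicitly. The "control"/"treated" condition labels are hypothetical, used because the first level is often treated as the reference group in statistical models.

```r
x <- factor(c("male", "female", "male", "male", "female"))

levels(x)                # "female" "male": sorted alphabetically by default
codes <- as.integer(x)   # 2 1 2 2 1: the integer codes stored internally
counts <- table(x)

# The level order can be set explicitly (hypothetical condition labels)
condition <- factor(c("treated", "control", "treated"),
                    levels = c("control", "treated"))
levels(condition)
```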

Matrices:

A matrix can be created using the matrix() function, and R can also be used for matrix calculations.
Matrices have rows and columns containing a single data type.

m <- matrix(nrow = 2, ncol = 3)


dim(m)
attributes(m)
m <- matrix(1:20, nrow = 4, ncol = 5)
m

x<-1:3
y<-10:12
z<-30:32
cbind(x,y,z)
rbind(x,y,z)
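A few details help when working with the matrices above. matrix() fills values column by column unless byrow = TRUE, and elements are indexed as [row, column]. For calculations, %*% is true matrix multiplication, while * multiplies element by element. The sketch below demonstrates each of these points.

```r
# matrix() fills values column by column unless byrow = TRUE
m <- matrix(1:20, nrow = 4, ncol = 5)
m_byrow <- matrix(1:20, nrow = 4, ncol = 5, byrow = TRUE)
m[1, 2]          # 5
m_byrow[1, 2]    # 2

# Leaving one index empty selects a whole row or column
row2 <- m[2, ]
col3 <- m[, 3]

# t() transposes; %*% is matrix multiplication, * is element-wise
tm <- t(m)                      # 5 x 4
a <- matrix(1:4, nrow = 2)      # columns (1, 2) and (3, 4)
prod_aa <- a %*% a
elem_aa <- a * a
```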

Arrays: Arrays can store data of a single type in more than two dimensions. An
array can be created using the array() function. It takes vectors as input and uses the values in the dim
parameter to create an array.
# Create two vectors of different lengths.
v1 <- c(1,2,3)
v2 <- 100:110
# Take these vectors as input to create an array.
arr1 <- array(c(v1,v2))
arr1
arr2 <- array(c(v1,v2), dim=c(2,7))
arr2

col.names <- c("Col1","Col2","Col3","Col4","Col5","Col6","Col7")


row.names <- c("Row1","Row2")
matrix.names <- c("Matrix1","Matrix2")
arr3 <- array(c(v1,v2), dim=c(2,7,2), dimnames = list(row.names,col.names,
matrix.names))
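The sketch below rebuilds arr3 from the code above and shows how to index it by position or by dimension name. The 14 input values are recycled to fill the 28 cells, so Matrix2 repeats Matrix1. apply() with margin 3 then summarises each matrix separately.

```r
v1 <- c(1, 2, 3)
v2 <- 100:110
row.names <- c("Row1", "Row2")
col.names <- paste0("Col", 1:7)
matrix.names <- c("Matrix1", "Matrix2")
arr3 <- array(c(v1, v2), dim = c(2, 7, 2),
              dimnames = list(row.names, col.names, matrix.names))

dim(arr3)                                     # 2 7 2
one_cell <- arr3["Row1", "Col2", "Matrix1"]   # 3
second_matrix <- arr3[, , "Matrix2"]          # a 2 x 7 matrix

# 14 values fill 28 cells, so the values are recycled
same <- all(arr3[, , 1] == arr3[, , 2])

# Sum each matrix separately (margin 3 = the third dimension)
sums <- apply(arr3, 3, sum)
```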

Dataframes: Data frames are used to store tabular data in R. They are an important type of object in R
and are used in a variety of statistical modeling applications. Data frames are represented as a special type
of list where every element of the list has to have the same length. Each element of the list can be thought
of as a column and the length of each element of the list is the number of rows. Unlike matrices, data
frames can store different classes of objects in each column. Data frames can also be created by reading files, for example with read.csv().
employee <- c('Ram','Sham','Jadu')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2016-11-1','2015-3-25','2017-3-14'))
employ_data <- data.frame(employee, salary, startdate)
employ_data
View(employ_data)
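View() needs a graphical front end such as RStudio. In a plain console, str() gives a compact summary of a data frame instead. The sketch below rebuilds employ_data and shows column access, row filtering with a logical condition, and adding a column. The dept values are hypothetical, and the dates are written with leading zeros in the standard YYYY-MM-DD form.

```r
employee <- c("Ram", "Sham", "Jadu")
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c("2016-11-01", "2015-03-25", "2017-03-14"))
employ_data <- data.frame(employee, salary, startdate)

# Column types and a preview of the values
str(employ_data)

# Columns can be accessed like list elements
employ_data$salary

# Select rows with a logical condition; the empty column index keeps all columns
high_paid <- employ_data[employ_data$salary > 22000, ]

# Add a new column by assignment (hypothetical department values)
employ_data$dept <- c("IT", "HR", "IT")
dim(employ_data)   # 3 rows, 4 columns
```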
Missing values:
x <- c(100, 200, NA, 300,NA, 400)
b <- is.na(x)
x[!b]
complete.cases(x)
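Missing values propagate through most calculations, so summary functions usually need na.rm = TRUE. complete.cases() is most useful on a whole data frame, where it flags the rows that contain no NA at all. The small data frame below is hypothetical.

```r
x <- c(100, 200, NA, 300, NA, 400)

n_missing <- sum(is.na(x))          # 2
mean(x)                             # NA: missing values propagate
avg <- mean(x, na.rm = TRUE)        # 250, the mean of the four observed values

# complete.cases() on a data frame: TRUE for rows with no NA
df <- data.frame(gene = c("g1", "g2", "g3"), count = c(5, NA, 8))   # hypothetical data
clean <- df[complete.cases(df), ]
nrow(clean)                         # 2
```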
R operators:

R has many built-in operators for performing arithmetic and logical
operations. The main types of operators in R are:

1. Arithmetic Operators
2. Relational Operators
3. Logical Operators
4. Assignment Operators
5. Miscellaneous Operators: :, %in%, %*%
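The sketch below gives one small example for each operator group listed above. The values of a and b are arbitrary.

```r
a <- 7
b <- 2

# 1. Arithmetic: %% is the remainder, %/% is integer division
arith <- c(a + b, a - b, a * b, a / b, a ^ b, a %% b, a %/% b)
print(arith)                  # 9 5 14 3.5 49 1 3

# 2. Relational: each comparison returns TRUE or FALSE
rel <- c(a > b, a < b, a == b, a != b)

# 3. Logical
both <- (a > 5) & (b > 5)     # FALSE
either <- (a > 5) | (b > 5)   # TRUE

# 4. Assignment: <- is the usual form; = and -> also assign
y = 10
20 -> z

# 5. Miscellaneous
seq_1_5 <- 1:5                                  # sequence
is_member <- 3 %in% c(1, 2, 3)                  # TRUE
mat_prod <- matrix(1:4, 2) %*% matrix(1:4, 2)   # matrix multiplication
```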

Files in R:

There are a few very useful functions for reading data into R.

1. read.table() and read.csv() are two popular functions used for reading tabular data into R.
2. readLines() is used for reading lines from a text file.
3. source() is a very useful function for reading in R code files and running the code they contain.
4. dget() reads an R object from a text file written by dput().
5. load() function is used for reading in saved workspaces
6. unserialize() function is used for reading single R objects in binary format.
There are similar functions for writing data to files:

1. write.table() is used for writing tabular data to text files (e.g. CSV).
2. writeLines() function is useful for writing character data line-by-line to a file or connection.
3. dump() is a function for dumping a textual representation of multiple R objects.
4. dput() function is used for outputting a textual representation of an R object.
5. save() is useful for saving an arbitrary number of R objects in binary format to a file.
6. serialize() is used for converting an R object into a binary format for outputting to a connection (or
file).
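The reading and writing functions above come in matching pairs. The sketch below round-trips a small, hypothetical gene-count table (plus a short FASTA-style text) through each pair, writing only to temporary files created with tempfile(). It assumes the code runs at the top level of an R session, because source() and load() recreate objects in the calling environment.

```r
# Hypothetical gene-count table used for the round trips below
counts <- data.frame(gene = c("TP53", "BRCA1"), count = c(120L, 45L))
min_count <- 10

# write.table() / read.csv(): tabular text
csv_path <- tempfile(fileext = ".csv")
write.table(counts, csv_path, sep = ",", row.names = FALSE)
counts_csv <- read.csv(csv_path)

# writeLines() / readLines(): one character element per line
fa_path <- tempfile(fileext = ".fa")
writeLines(c(">seq1", "ACGTACGT"), fa_path)
fa_lines <- readLines(fa_path)

# dput() / dget(): one object as R-readable text
dput_path <- tempfile(fileext = ".R")
dput(counts, file = dput_path)
counts_dget <- dget(dput_path)

# dump() / source(): several objects written as R code, then re-run
dump_path <- tempfile(fileext = ".R")
dump(c("counts", "min_count"), file = dump_path)
rm(min_count)
source(dump_path)               # recreates counts and min_count

# save() / load(): any number of objects in binary format
rdata_path <- tempfile(fileext = ".RData")
save(counts, min_count, file = rdata_path)
loaded <- load(rdata_path)      # returns the names of the restored objects

# serialize() / unserialize(): one object to and from raw bytes
bytes <- serialize(counts, connection = NULL)
counts_restored <- unserialize(bytes)
```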

The read.table() function is one of the most commonly used functions for reading data in R. To get the
help file for read.table(), type ?read.table in the R console.
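NGS count tables are often tab-separated with gene IDs in the first column. The sketch below reads such a table with read.table() using the header, sep and row.names arguments. The table is hypothetical and is supplied through the text argument so the example needs no external file. The same call then reads the identical content from a temporary file.

```r
# Hypothetical tab-separated count table
tab_text <- "gene\tsample1\tsample2
TP53\t120\t98
BRCA1\t45\t60"

counts <- read.table(text = tab_text,
                     header = TRUE,    # first line holds the column names
                     sep = "\t",       # fields are separated by tabs
                     row.names = 1)    # use the first column (gene) as row names

dim(counts)                  # 2 genes x 2 samples
counts["TP53", "sample1"]    # 120

# The same call works on a file path instead of text
tsv_path <- tempfile(fileext = ".tsv")
writeLines(tab_text, tsv_path)
counts_file <- read.table(tsv_path, header = TRUE, sep = "\t", row.names = 1)
```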
