
Introduction to R Programming

Introduction
• The R statistical programming language is a free, open-source package
based on the S language (developed at Bell Labs).
• R was created by Ross Ihaka and Robert Gentleman at the University of
Auckland, New Zealand.
• The language is very powerful for writing programs.
• Many statistical functions are already built in.
• Contributed packages expand the functionality to cutting edge research.
Getting Started
• Download R from www.r-project.org
• Download RStudio from https://www.rstudio.com/
R Operations
• Mathematical Operators in R
– +, -, *, / - Simple mathematical operations
– X^Y - X raised to the power Y
– sqrt(x) - Square root of x
– abs(x) - Absolute value of x
– factorial(x) - Factorial of x
– log(x) - Natural logarithm of x
– cos(x), sin(x), tan(x) - Trigonometric functions
• Logical Operators
– < less than
– <= less than or equal to
– > greater than
– >= greater than or equal to
– == exactly equal to
– != not equal to
– !x Not x
– x | y x OR y
– x & y x AND y
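The operators listed above can be tried directly at the console. The following is a minimal sketch; the variable names x and y are chosen only for illustration.

```r
# Arithmetic operators and built-in math functions
x <- 16
y <- 3
x + y           # 19
x^2             # 256
sqrt(x)         # 4
abs(-y)         # 3
factorial(y)    # 6
log(exp(1))     # 1, because log() is the natural logarithm by default

# Comparison and logical operators return TRUE or FALSE
x > y                 # TRUE
x == y                # FALSE
(x > 10) & (y > 10)   # FALSE: both sides must be TRUE
(x > 10) | (y > 10)   # TRUE: one TRUE side is enough
!(x > 10)             # FALSE
```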
Declaring Variables in R
• Two ways to assign values
I. Using the “=” symbol
>MyVar=10
II. Using the “<-” symbol
>MyVar<-10
• To print the variable
>print(MyVar)
[1] 10
• The ‘>’ symbol is the console prompt; output lines begin with an index such
as [1], the position of the first element shown on that line
Data types in R
• R organizes data in the following structures
– Scalar : A single value; R stores it as a vector of length 1
– Vector : A one-dimensional sequence of values of the same type
• Numeric/Integer Vectors, Character Vectors and Factors
– Matrix : A two-dimensional table of values of the same type
– Array : Like a matrix, but can have more than two dimensions
– List : A general vector whose elements can be of different kinds (including other vectors)
– Data Frame : A two-dimensional table whose columns can have different types
Data types : Examples
• Scalar :
>var1 <- 1
• Vector :
• The “c” (concatenate) command is used to create a vector
>Var2 <- c(1,2,3,4,5)
>Var2 <- c("Apple", "Orange", "Mango")
>Var3 <- c("Hello2", 20, "Hello4", 30) # numbers are coerced to character
• Colon Operator (:) for creating a sequence of numbers
>Var4 <- c(1:15)
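A vector holds values of a single type, so mixing numbers and strings in c() converts everything to character. The sketch below illustrates this; the names nums, mixed and s are hypothetical.

```r
nums  <- c(1, 2, 3, 4, 5)
mixed <- c("Hello2", 20, "Hello4", 30)

class(nums)    # "numeric"
class(mixed)   # "character": the numbers were coerced to strings
mixed          # "Hello2" "20" "Hello4" "30"

s <- 1:15      # the colon operator already returns a vector; c() is optional
length(s)      # 15
```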
Data types : Examples
• Vector : To access elements in vectors (indexing starts at 1)
>Var2 <- c("Apple", "Orange", "Mango")
• Var2[1] = "Apple", Var2[2] = "Orange", Var2[3] = "Mango"

• Factor Vector : Examples of categorical (factor) data: Yes/No, Male/Female, A/B/C/D
>Var3 <- c("A","B","A","B","C")
>Var3 <- as.factor(Var3)
>Var4 <- c("Apple", "Orange", "Mango")
>Var4 <- as.factor(Var4)
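A factor stores each distinct category once as a level and keeps an integer code per observation. A minimal sketch, with the variable name grades chosen for illustration:

```r
grades <- as.factor(c("A", "B", "A", "B", "C"))

levels(grades)       # "A" "B" "C": the distinct categories
nlevels(grades)      # 3
as.integer(grades)   # 1 2 1 2 3: the internal level codes
table(grades)        # counts per category: A 2, B 2, C 1
```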
Data types : Examples
• Matrix:
>A = matrix(c(1:6),nrow=2, ncol=3, byrow = TRUE)
>print(A)
• A[1,2] = 2
• A[2,1] = 4
• A[2,3] = 6
• A[3,1]
– Error in A[3, 1] : subscript out of bounds
• A[ ,1] = 1,4
• A[2, ] = 4,5,6
Data types : Examples
• Arrays:
># Create two vectors of different lengths.
>v1 <- c(5,9,3)
>v2 <- c(10,11,12,13,14,15)
>result <- array(c(v1,v2),dim = c(3,3,2)) # the 9 values are recycled to fill 3*3*2 = 18 cells
>print(result)
• result[1,3,1] = 13
• result[2,1,2] = 9
• result[3,,2] = 3,12,15
Data types : Examples
• Lists:
>n = c(2, 3, 5)
>s = c("aa", "bb", "cc", "dd", "ee")
>b = c(TRUE, FALSE, TRUE, FALSE, FALSE)
>x = list(n, s, b, 3) # x contains copies of n, s,b
• x[2]
[[1]]
[1] "aa" "bb" "cc" "dd" "ee"
• x[c(2, 4)]
[[1]]
[1] "aa" "bb" "cc" "dd" "ee"
[[2]]
[1] 3
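Single brackets return a sub-list, as in the output above, while double brackets return the element itself. The sketch below reuses the list x from this slide to show the difference.

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc", "dd", "ee")
b <- c(TRUE, FALSE, TRUE, FALSE, FALSE)
x <- list(n, s, b, 3)

class(x[2])     # "list": single brackets keep the list wrapper
class(x[[2]])   # "character": double brackets extract the vector itself
x[[2]][1]       # "aa": first element of the extracted vector
length(x)       # 4
```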
Data types : Examples
• Data Frame:
>n = c(2, 3, 5)
>s = c("aa", "bb", "cc")
>b = c(TRUE, FALSE, TRUE)
>df = data.frame(n, s, b)
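Once created, a data frame can be inspected and indexed by row, by column, or by a logical condition. A minimal sketch using the same df as above:

```r
n <- c(2, 3, 5)
s <- c("aa", "bb", "cc")
b <- c(TRUE, FALSE, TRUE)
df <- data.frame(n, s, b)

dim(df)                         # 3 3: three rows, three columns
df$n                            # 2 3 5: one column as a vector
df[2, ]                         # the second row
as.character(df[df$b, "s"])     # "aa" "cc": values of s where b is TRUE
```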
In-built Functions in R
• seq(): Generates a sequence of equally spaced numbers
• It accepts three arguments
– Start element (from)
– Stop element (to)
– Step size (by)
>seq(4,9)
[1] 4 5 6 7 8 9
>seq(4,10,2) # Three arguments are given: step by 2
[1] 4 6 8 10
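The arguments of seq() can also be passed by name, which makes the step explicit. The sketch below additionally shows a negative step and the length.out argument, which are not covered on the slide.

```r
up   <- seq(4, 10, by = 2)          # 4 6 8 10
down <- seq(10, 4, by = -2)         # 10 8 6 4: a negative step counts down
frac <- seq(0, 1, length.out = 5)   # 0.00 0.25 0.50 0.75 1.00
```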
In-built Functions in R
• rep() is used to generate repeated values.
• It has two variants, depending on whether the second argument is
a vector or a single number
>oops <- c(7,9,13)
>rep(oops,3) # It repeats the entire vector oops 3 times
[1] 7 9 13 7 9 13 7 9 13
>rep(oops,1:3)
[1] 7 9 9 13 13 13
Here, each element of oops is repeated according to the vector 1:3:
7 is repeated once, 9 twice, and 13 three times
In-built Functions in R
• Look at following examples of rep() function
>rep(oops,1:4)
Error in rep(oops, 1:4) : invalid 'times' argument
>rep(1:2,c(10,15))
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
>rep(1:2,each=10)
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
>rep(1:2,c(10,10))
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2
In-built Functions in R
> a<-c(201:300)
In-built Functions in R
• Some of the summary commands
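As a sketch, here are a few common summary commands applied to a vector like the one defined above. Which commands the original slide listed is not known, so this selection is an assumption.

```r
a <- c(201:300)

length(a)    # 100
sum(a)       # 25050
mean(a)      # 250.5
median(a)    # 250.5
min(a)       # 201
max(a)       # 300
range(a)     # 201 300
sd(a)        # standard deviation
summary(a)   # minimum, quartiles, mean and maximum in one call
```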
In-built Functions in R
• You can “glue” vectors together, column-wise or row-wise,
using the cbind() and rbind() functions.
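A minimal sketch of both functions, using two hypothetical vectors v1 and v2 of equal length:

```r
v1 <- c(1, 2, 3)
v2 <- c(4, 5, 6)

by_col <- cbind(v1, v2)   # each vector becomes a column: 3 rows, 2 columns
by_row <- rbind(v1, v2)   # each vector becomes a row: 2 rows, 3 columns

dim(by_col)   # 3 2
dim(by_row)   # 2 3
```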
In-built Functions in R
• Matrix Operations
– Functions like rowSums() and rowMeans() are
used to calculate the sum of all row elements
and mean of all row elements respectively.
– Functions like colSums() and colMeans() are
used to calculate the sum of all column
elements and mean of all column elements
respectively.
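Applied to the 2 x 3 matrix A from the matrix example, these functions give the following results (a minimal sketch):

```r
A <- matrix(c(1:6), nrow = 2, ncol = 3, byrow = TRUE)
# A is:
#      1 2 3
#      4 5 6

rowSums(A)    # 6 15
rowMeans(A)   # 2 5
colSums(A)    # 5 7 9
colMeans(A)   # 2.5 3.5 4.5
```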
In-built Functions in R
• The table() command is used to create table objects.
• Creating Contingency Tables from vectors
• The simplest data object from which you can create a contingency table is
a vector
• Syntax is as follows
>table(x) ## where x can be an integer/character vector or a data frame
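A short sketch of table() on one vector and on two vectors. The data (answers, group) is made up for illustration.

```r
answers <- c("Yes", "No", "Yes", "Yes", "No")
group   <- c("A", "A", "B", "B", "B")

counts <- table(answers)           # one-way table: how often each value occurs
counts[["Yes"]]                    # 3

two_way <- table(group, answers)   # two-way contingency table
two_way["B", "Yes"]                # 2: group B answered "Yes" twice
```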
In-built Functions in R
• Built-in Datasets : data()
• To get the dimensions of data : dim(data_name)
• To get first few records of data : head(data_name)
• To get last few records of data : tail(data_name)
• To see the summary of data: summary(data_name)
• To see the structure of data: str(data_name)
• To get the row names of data : rownames(data_name)
• To get the column names of data : colnames(data_name)
• To access particular column : data_set$column_name
• To get the random sample of the data: sample()
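The commands above can be tried on any built-in dataset. The sketch below uses mtcars, which ships with R; choosing this dataset is an assumption made for illustration.

```r
data(mtcars)             # load a built-in dataset

dim(mtcars)              # 32 11: rows and columns
head(mtcars, 3)          # first three records
tail(mtcars, 3)          # last three records
summary(mtcars$mpg)      # summary of a single column
str(mtcars)              # structure: column names and types
colnames(mtcars)[1:3]    # "mpg" "cyl" "disp"

# sample() draws random row numbers, which are then used to pick rows
set.seed(1)
picked <- mtcars[sample(nrow(mtcars), 5), ]
```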
Transformations in R
• Re-assigning variable (here, adding a new column to a data frame)
>names(dow30)
[1] "symbol" "Date" "Open" "High" "Low"
[6] "Close" "Volume" "Adj.Close"
>dow30$mid <- (dow30$High + dow30$Low)/2
>names(dow30)
[1] "symbol" "Date" "Open" "High" "Low"
[6] "Close" "Volume" "Adj.Close" "mid"
Transformations in R
• The transform() function : Used for changing the number or type of
variables in a data frame
transform(data, t1, t2, ...)
>dow30.transformed <- transform(dow30, Date=as.Date(Date),
mid = (High + Low)/2)
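The dow30 data is not bundled with R, so the sketch below applies the same transform() call to a small hypothetical data frame named prices with made-up values.

```r
# Hypothetical stand-in for dow30 with only the columns that are needed
prices <- data.frame(
  Date = c("2020-01-02", "2020-01-03"),
  High = c(10, 12),
  Low  = c(8, 9),
  stringsAsFactors = FALSE
)

prices2 <- transform(prices, Date = as.Date(Date), mid = (High + Low) / 2)

class(prices2$Date)   # "Date": the column type was changed
prices2$mid           # 9.0 10.5: the new column
```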
Transformations in R
• Applying a common function to a set of objects

apply(X, MARGIN, FUN, ...)
– X : a matrix or array; MARGIN : 1 for rows, 2 for columns; FUN : the function to apply
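A minimal sketch of apply() on a small matrix, with MARGIN = 1 applying the function to each row and MARGIN = 2 to each column:

```r
A <- matrix(1:6, nrow = 2, byrow = TRUE)
# A is:
#      1 2 3
#      4 5 6

row_totals <- apply(A, 1, sum)   # 6 15: sum of each row
col_max    <- apply(A, 2, max)   # 4 5 6: maximum of each column
```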


Transformations in R
• Binning
– Another common data transformation is to group a set of
observations into bins (groups) based on value of specific
variables.
– The cut() function in R makes this task simple!
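As a sketch of binning with cut(), the made-up ages below are grouped into three bins. The break points and labels are arbitrary choices for illustration.

```r
ages <- c(5, 17, 30, 45, 70)

# Bins are (0,18], (18,40] and (40,100]; each gets a label
groups <- cut(ages,
              breaks = c(0, 18, 40, 100),
              labels = c("young", "middle", "older"))

as.character(groups)   # "young" "young" "middle" "older" "older"
table(groups)          # young 2, middle 1, older 2
```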
Functions in R
• Functions are smaller groups/blocks of statements into which a large program is
divided.
• Creating a Function
– Define a function with an appropriate name
– Use the function keyword followed by parentheses
– Provide arguments to the function
– Add a return statement if required
• General form
<functionname> <- function(arg1, arg2, ...)
{
statement 1
statement 2
……………..
return(output)
}
• Example
MyFirstFunc <- function()
{
print("hello")
}
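The steps above can be sketched with a function that takes arguments and returns a value. The function name rectangle_area and its default argument are hypothetical.

```r
# Area of a rectangle; height defaults to 1 when it is not supplied
rectangle_area <- function(width, height = 1) {
  area <- width * height
  return(area)
}

rectangle_area(3, 4)   # 12
rectangle_area(5)      # 5: uses the default height
```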
Functions for Visualization in R
• Graphs are drawn by using following techniques
– Using plots for single variable
– Using plots for two variables
– Using plots for multiple variables
• hist(y) : Histograms to display a frequency distribution
• plot(y): Plotting indices to display values of y in sequence
• pie(x) : Plotting compositional graphs such as pie diagrams
• plot(x,y) : Scatter plot of y against x
• abline(lm(waiting ~ duration)) : Adds the fitted linear regression line to the scatter plot
• boxplot(x,y) : Side-by-side box plots comparing the distributions of two variables
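The plotting functions above can be tried on the built-in faithful dataset. This is a sketch: taking duration from the eruptions column is an assumption made to match the variable names on the slide, and the pie chart values are made up.

```r
data(faithful)
duration <- faithful$eruptions   # eruption length in minutes
waiting  <- faithful$waiting     # waiting time until the next eruption

hist(waiting)                    # frequency distribution of one variable
plot(waiting)                    # values plotted in sequence
plot(duration, waiting)          # scatter plot of two variables
fit <- lm(waiting ~ duration)    # linear regression of waiting on duration
abline(fit)                      # add the fitted line to the scatter plot
boxplot(duration, waiting)       # side-by-side box plots
pie(c(A = 10, B = 20, C = 30))   # pie chart of made-up shares
```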
Import and Export files in R
• To read from CSV files, the read.csv() command is used.
• To read from text files, the read.table() command is used.
• To write to CSV files, the write.csv() command is used.
• To write to text files, the write.table() command is used.

read.csv(file, header = TRUE, sep = ",")

read.table(file, header = TRUE, sep = "\t")

write.csv(data, "File_Name.csv")
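A round trip through a temporary file shows the commands working together. This is a sketch: the data frame is made up, and tempfile() is used so that no real file name is assumed.

```r
df <- data.frame(n = c(2, 3, 5), s = c("aa", "bb", "cc"))

# CSV: write the data frame, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(df, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path, header = TRUE)

# Tab-separated text: same idea with write.table() and read.table()
txt_path <- tempfile(fileext = ".txt")
write.table(df, txt_path, sep = "\t", row.names = FALSE)
from_txt <- read.table(txt_path, header = TRUE, sep = "\t")

from_csv$n   # 2 3 5
```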
Machine Learning in R
• kmeans() : K-means clustering (a function in the built-in stats package)
• e1071 : Support vector machines, bagged clustering, naive Bayes classifier
• rpart : Recursive partitioning and regression trees
• nnet : Feed-forward neural networks and multinomial log-linear models
• randomForest : Random forests for classification and regression
• caret : Classification And REgression Training
• glmnet : Lasso and elastic-net regularized generalized linear models
• gbm : Generalized boosted regression models
• arules : Mining association rules and frequent itemsets
• tree : Classification and regression trees
• ipred : Improved predictors
• mboost : Model-based boosting
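Of the entries above, K-means clustering needs no extra package. The sketch below runs kmeans() on the built-in iris measurements; the dataset, the choice of three clusters and the seed are assumptions made for illustration.

```r
set.seed(42)                                # cluster starts are random, so fix the seed
fit <- kmeans(iris[, 1:4], centers = 3)     # cluster the four numeric columns

length(fit$cluster)                 # 150: one cluster label per row
nrow(fit$centers)                   # 3: one centre per cluster
table(fit$cluster, iris$Species)    # compare clusters with the known species
```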
R Packages
• To install or add new R packages
– install.packages("package_name")
• To load the package
– library(package_name)
• To list the packages available in the library
– library()
• To see installed packages on R
– installed.packages()
• CRAN (Comprehensive R Archive Network)
– https://cran.r-project.org/
– Currently, the CRAN package repository features 10480 available packages.
• You can create your own package
Thank You
