
UNIT-I: Introduction of the Language

Introduction

• R is a programming language and software environment for statistical analysis, graphics
representation and reporting.
• R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New
Zealand, in the early 1990s and is currently maintained by the R Development Core
Team.
• R is free, open-source software released under the GNU General Public License (GPL), and
pre-compiled binary versions are provided for operating systems such as Linux, Windows
and Mac.
• The language was named R after the first letter of the first names of its two
authors (Robert Gentleman and Ross Ihaka), and partly as a play on the name of the Bell Labs
language S.

Features of R :
1. An effective data handling and storage facility.
2. R is an interpreted programming language, which means it allows coding in an interactive
manner.
3. R is a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions, and input and output facilities.
4. A large set of operators for calculations on arrays, in particular matrices.
5. A large collection of intermediate tools for data analysis.
6. Excellent graphical facilities for analysing data and displaying the results directly on
screen or as a hard copy.
7. R has an excellent built-in help system.
8. R is compatible with S-plus.
Thus R is useful software for interactive data analysis.

Applications of R Programming in Real World

• Data Science
• Statistical computing (9100+ packages)
• Machine Learning (ML tasks like linear and non-linear regression, decision trees, linear and
non-linear classification and many more)

1.2 R
R - Local Environment Setup
Windows Installation
• You can download the Windows installer version of R from R-3.2.2 for Windows (32/64 bit)
and save it in a local directory.
• It is a Windows installer (.exe) with a name of the form "R-version-win.exe". You can just
double-click and run the installer, accepting the default settings. If your Windows is a 32-bit
version, it installs the 32-bit version. If your Windows is 64-bit, it installs both the 32-bit
and 64-bit versions.
• After installation you can locate the icon to run the program in the directory structure
"R\R-3.2.2\bin\i386\Rgui.exe" under the Windows Program Files. Clicking this icon brings up the
R-GUI, which is the R console for R programming.
KLS GCC BCA V- Semester Statistical Computing & R Programming
R Studio: R Studio is an Integrated Development Environment (IDE) for the R language with an
advanced and more user-friendly GUI. It includes a console and a syntax-highlighting editor that
supports direct code execution, as well as tools for plotting, history, debugging and workspace
management. R Studio allows the user to run R in a more user-friendly environment. It is
open-source (i.e. free) and available at http://www.rstudio.com/.

The fig shows the GUI of R Studio. The R Studio screen has four windows:
1. Console.
2. Workspace and history.
3. Files, plots, packages and help.
4. The R script(s) and data view.
The R script is where you keep a record of your work.
Create a new R script file:
1) File -> New -> R Script,
2) Click on the icon with the “+” sign and select “R Script”
3) Use shortcut as: Ctrl+Shift+N.

Comments
Comments are explanatory text in your R program; the interpreter ignores them while
executing the program. A single-line comment is written using # at the beginning of the statement
as follows:
# My first program in R Programming

R does not support multi-line comments.
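
Because there is no block-comment syntax, a longer note is usually written by starting every line with #. The following is a minimal sketch; the variable names radius and area are illustrative only and do not come from the notes above.

```r
# Compute the area of a circle.
# Each line of this note starts with '#',
# so the interpreter skips all three lines.
radius <- 2
area <- pi * radius^2   # a comment can also follow code on the same line
print(area)
```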

R Data Types:

• Variables are nothing but reserved memory locations to store values. This means that when
you create a variable you reserve some space in memory.
• You may want to store information of various data types such as character, wide character,
integer, floating point, double floating point, Boolean etc. Based on the data type of a
variable, memory is allocated and the type decides what can be stored in the reserved memory.
• In contrast to programming languages like C and Java, in R the variables are not
declared as some data type. Variables are assigned R-objects, and the data type of
the R-object becomes the data type of the variable.



Variable Assignment : Variable names are case sensitive. R does not require variable types to be
declared. In R assignment can be done in three ways:
= Simple assignment
<- Leftward assignment
-> Rightward assignment

Examples:
> x <- 6 # assignment operator: a less-than character (<) and a hyphen (-) with no space
> x
[1] 6
> y = 3 # assignment operator = is used.
> y
[1] 3
> z <<- 9 # assigns in an enclosing (typically the global) environment rather than locally
> z
[1] 9
> 5 -> a # rightward assignment (->): the value on the left is stored in the name on the right
> a
[1] 5
> a <- b <- 7 # chained assignment: the same value is assigned to several variables at once
> a
[1] 7
> b
[1] 7
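
At the console <- and <<- behave the same way; the difference only shows inside a function. The sketch below illustrates this under the assumption that a variable called counter already exists outside the functions; the names counter, bump_local and bump_global are made up for the illustration.

```r
counter <- 0

# '<-' inside a function creates or modifies a local variable only.
bump_local <- function() {
  counter <- counter + 1
  counter
}

# '<<-' searches the enclosing environments for 'counter' and modifies
# the existing variable it finds there (here: the one defined above).
bump_global <- function() {
  counter <<- counter + 1
  counter
}

local_result <- bump_local()    # returns 1, outer counter is still 0
after_local <- counter
global_result <- bump_global()  # returns 1 and sets the outer counter to 1
after_global <- counter
print(c(after_local, after_global))
```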

Variable (Object) Names: Certain short names are already used by R for built-in functions and
constants, so avoid reusing them for your own objects. Some of these names are: c q t C D F I T
Use ? followed by a name to see its help document:

?c ## c means Combine Values into a Vector or List
?q ## q means Terminate an R Session
?t ## t means Matrix Transpose
?C ## C means Sets Contrasts for a Factor
?D ## D means Symbolic and Algorithmic Derivatives of Simple Expressions
?F ## F is shorthand for the logical value FALSE (documented under Logical Vectors)
> F
[1] FALSE
?I ## I means Inhibit Interpretation/Conversion of Objects
?T ## T is shorthand for the logical value TRUE
When character strings are converted with as.logical, c("T", "TRUE", "True", "true") are treated
as TRUE, c("F", "FALSE", "False", "false") as FALSE, and all others as NA.
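
The following sketch shows both points: how as.logical treats different spellings, and why reusing a name such as T is risky. The variable names flags, masked and restored are illustrative only.

```r
# as.logical() converts the recognised spellings and returns NA otherwise.
flags <- as.logical(c("T", "true", "F", "False", "yes"))
print(flags)

# T is an ordinary variable whose default value is TRUE, so it can be
# overwritten by accident; TRUE itself is a reserved word and cannot be.
T <- 0
masked <- isTRUE(T)      # FALSE: T now holds 0
rm(T)                    # remove our copy so the built-in T is visible again
restored <- isTRUE(T)    # TRUE
print(c(masked, restored))
```

For this reason, writing TRUE and FALSE in full is generally the safer habit.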



Basic Data Types in R:
1) numeric
2) integer
3) complex
4) character
5) logical

1) numeric : The default type for numbers in R is numeric. This is similar to float or
double in other languages. It handles integers and decimals, both positive and negative, and
also zero.
Example: 14.3, 23.5, 60
x <- 23.5
print(class(x))
it produces the following result:
[1] "numeric"

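2) integer : Whole numbers stored as integers. A number is stored as an integer when the suffix
L is appended to it.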



Example: 2L, 34L
x <- 2L
print(class(x))
it produces the following result:
[1] "integer"
Example:

> i <- 5L # To set an integer to a variable, append the value with an 'L'.
> i
[1] 5
> is.integer(i) # Testing whether a variable is integer or not
[1] TRUE

3) Complex : The complex data type is used to store complex numbers, that is, values with a real
part and an imaginary part. We use the suffix i to specify the imaginary part.

Example
# 2i represents imaginary part
x <- 3 + 2i
# print class of x
print(class(x))
Output
[1] "complex"
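
The parts of a complex value can be taken apart with the base functions Re, Im and Mod. This is a small sketch; the variable names are illustrative only.

```r
z <- 3 + 2i
real_part <- Re(z)             # 3
imag_part <- Im(z)             # 2
modulus <- Mod(z)              # sqrt(3^2 + 2^2)
# sqrt(-4) on a plain numeric gives NaN with a warning;
# converting to complex first gives the imaginary root.
root <- sqrt(as.complex(-4))
print(root)
```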

4) Character
The character data type is used to store character or string values in a variable.
In programming, a string is a sequence of characters. For example, 'A' is a single character and
"Apple" is a string. You can use single quotes ' ' or double quotes " " to represent strings.
R itself makes no distinction between the two: both quote styles create values of class character.
By convention in these notes, we use:
' ' for single characters
" " for strings
Example: 'a', "BCA", "TRUE", '23.4'

# create a string variable


fruit <- "Apple"
print(class(fruit))

# create a character variable


my_char <- 'A'
print(class(my_char))
Output
[1] "character"
[1] "character"
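
As the identical class output suggests, the choice of quote style is not stored in the value. The sketch below checks this and shows two related points; all variable names are illustrative.

```r
s1 <- 'BCA'
s2 <- "BCA"
same <- identical(s1, s2)          # TRUE: both are the same character value
n_chars <- nchar("Apple")          # 5 characters in the string
n_elems <- length("Apple")         # 1: a single string is a vector of length one
not_number <- class('23.4')        # "character", because of the quotes
as_number <- as.numeric('23.4')    # convert explicitly when a number is needed
print(c(n_chars, n_elems))
```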



5) Logical:
The logical data type in R is also known as boolean data type. It can only have two values: TRUE and
FALSE.
Example:
x <- TRUE
print(x)
print(class(x))
Output
[1] TRUE
[1] "logical"

y <- FALSE
print(y)
print(class(y))

Output
[1] FALSE
[1] "logical"

Arithmetic in R:
In R, standard mathematical rules apply throughout and follow the usual order of operations:
parentheses, exponents, multiplication and division, addition and subtraction (PEMDAS).
Multiplication/division and addition/subtraction of equal precedence are evaluated left to right.
Here are some examples in the console:

R> 2+3
[1] 5
R> 14/6+5
[1] 7.333333
R> 3^2
[1] 9
R> 14/6
[1] 2.333333
R> 2^3
[1] 8
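
The effect of the precedence rules is easiest to see by comparing expressions with and without parentheses. A short sketch, with illustrative variable names:

```r
a <- 14 / 6 + 5        # division first: (14 / 6) + 5
b <- 14 / (6 + 5)      # parentheses force the addition first
c1 <- 2 + 3 * 4        # multiplication before addition: 14
c2 <- (2 + 3) * 4      # 20
d <- -2^2              # the exponent is applied before the unary minus: -4
e <- 2^3^2             # '^' groups right to left: 2^(3^2) = 512
print(c(a, b, c1, c2, d, e))
```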

You can find the square root of any non-negative number with the sqrt
function. You simply provide the desired number as the argument x, as shown here:
R> sqrt(x=9)
[1] 3
R> sqrt(x=5.311)
[1] 2.304561

Input Output functions in R



> #example for scan
> a<-scan()
1: 10
2: 20
3: 30
4: 40
5:
Read 4 items
> a
[1] 10 20 30 40

> #example for readline
> mystring<-readline()
example for readline function
> mystring
[1] "example for readline function"

R has a wide variety of objects for holding data, including scalars, vectors, matrices, arrays,
data frames, and lists. They differ in terms of the type of data they can hold, how they are
created, their structural complexity, and the notation used to identify and access individual
elements.



Vectors:

Vectors are one-dimensional arrays that can hold numeric data, character data, or logical data.
Vectors must be homogeneous, i.e., the data in a given vector must all be of the same type. One
important point is that in R the indexing of a vector starts from 1 and not from 0.

Creating a Vector:

The function for creating a vector is the single letter c, with the desired entries in parentheses
separated by commas.

Example:
> myvec <- c(1,3,11,42)
> myvec
[1] 1 3 11 42

Here are some other examples of each type of vector:


> a <- c(1, 2, 5, 3, 6, -2, 4)
> a
[1] 1 2 5 3 6 -2 4

> b <- c("one", "two", "three")


> b
[1] "one" "two" "three"

> c <- c(TRUE, TRUE, TRUE, FALSE, TRUE, FALSE)


> c
[1] TRUE TRUE TRUE FALSE TRUE FALSE
Here, a is a numeric vector, b is a character vector, and c is a logical vector. Note that the
data in a vector must be of only one type or mode (numeric, character, or logical). You can't mix
modes in the same vector; if you try, R converts every element to a single common mode.
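
What happens when different modes are passed to c can be checked directly. In this sketch the names mixed1 and mixed2 are illustrative.

```r
mixed1 <- c(1, "two", TRUE)     # everything becomes character
mixed2 <- c(1.5, TRUE, FALSE)   # logicals become numeric: TRUE -> 1, FALSE -> 0
print(mixed1)
print(class(mixed1))
print(mixed2)
print(class(mixed2))
```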

Vector entries can be calculations or previously stored items (including vectors themselves).
Example:
> foo <- 32.1
> myvec2 <- c(3,-3,2,3.4,45+5,foo)
> myvec2
[1] 3.0 -3.0 2.0 3.4 50.0 32.1
This code created a new vector assigned to the object myvec2. Some of the entries are defined as
arithmetic expressions, and it's the result of the expression that's stored in the vector. The last
element, foo, is an existing numeric object defined as 32.1.



We can combine two vectors and assign the result to another vector:
> myvec3 <- c(myvec,myvec2)
> myvec3
[1] 1.0 3.0 11.0 42.0 3.0 -3.0 2.0 3.4 50.0 32.1
This code creates and stores yet another vector, myvec3, which contains the entries of myvec and
myvec2 appended together in that order.

Sequences, Repetition, Sorting, and Lengths :

Let's create an equally spaced sequence of increasing or decreasing numeric values. This is
something you'll need often, for example when programming loops or when plotting data points. The
easiest way to create such a sequence, with numeric values separated by intervals of 1, is to use the
colon operator.
Example:
> 3:10
[1] 3 4 5 6 7 8 9 10
The example 3:10 should be read as “from 3 to 10 (by 1).” The result is a numeric vector just as if
you had listed each number manually in parentheses with c.

Sequences with seq


You can also use the seq command, which allows for more flexible creations of sequences. This
ready-to-use function takes in a from value, a to value, and a by value, and it returns the
corresponding sequence as a numeric vector.

Example:
> seq(from=3,to=20,by=3)
[1] 3 6 9 12 15 18

For decreasing sequences, the value of by must be negative. Here's an example:

> seq(from=21,to=0, by=-3)
[1] 21 18 15 12 9 6 3 0
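
Besides by, seq also accepts a length.out argument that fixes the number of values instead of the step size. A brief sketch, with illustrative variable names:

```r
s1 <- seq(from = 0, to = 1, length.out = 5)   # five equally spaced values from 0 to 1
s2 <- seq(from = 3, to = 20, by = 3)          # stops at 18 because 21 would exceed 20
s3 <- 10:3                                    # the colon operator also counts down
print(s1)
print(s2)
print(s3)
```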

Repetition with rep


Sequences are extremely useful, but sometimes you may want simply to repeat a certain value. We
can do this using rep.

Examples:

1.
> myvec_2<-rep(x=5,times=3)
> cat(myvec_2)
5 5 5

2.
> vec_r<-rep(5,times=4)
> cat(vec_r)
5 5 5 5

3.
> y<-1:5
> vec_reach<-rep(y,each=2)
> cat(vec_reach)
1 1 2 2 3 3 4 4 5 5

4.
> x<-1:4
> vec_rlen<-rep(x,length.out=3)
> cat(vec_rlen)
1 2 3

5.
> rep(x=c(3,62,8.3),each=2)
[1] 3.0 3.0 62.0 62.0 8.3 8.3

• The rep function is given a single value or a vector of values as its argument x, as well as a
value for the arguments times, each, or length.out. Arguments are named with =, not <-.
• The value for times provides the number of times to repeat x as a whole, each provides the
number of times to repeat each element of x, and length.out sets the length of the result.
• Examples 1 and 2 above simply repeat a single value three and four times. Example 3 uses each
to repeat each member of a vector, and example 4 uses length.out to cut the repetition off at
a fixed length. Example 5 again uses each, this time on a vector written inline.
• You can also supply both times and each to do both at once.
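
Combining the arguments can be sketched as follows. When times and each are both given, each is applied first and the result is then repeated times times. The variable names are illustrative.

```r
both <- rep(x = c(3, 62, 8.3), times = 2, each = 2)
print(both)

# 'times' may also be a vector with one count per element of x.
per_element <- rep(x = 1:3, times = c(3, 2, 1))
print(per_element)

# 'length.out' recycles x until the requested length is reached.
fixed_len <- rep(x = 1:4, length.out = 6)
print(fixed_len)
```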

Sorting with sort


Sorting a vector in increasing or decreasing order of its elements is another simple operation that
crops up in everyday tasks. The sort function is used to do this.

Examples:
1.
> sort(x=c(1, 4, 5, 2), decreasing = FALSE)
[1] 1 2 4 5

2.
> sort(x=c(1, 4, 5, 2), decreasing = TRUE)
[1] 5 4 2 1

3.
> v1<-c(1,4,5,2,3)
> v2<-c(6, 9, 8, 7)
> sort(x=c(v1,v2),decreasing = TRUE)
[1] 9 8 7 6 5 4 3 2 1

The sort function is pretty straightforward. You supply a vector to the function as the argument x, and
a second argument, decreasing, indicates the order in which you want to sort.
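
Two details are worth checking in practice: decreasing defaults to FALSE, and sort returns a new vector rather than changing its input. A small sketch with made-up data (the vector marks and the other names are illustrative):

```r
marks <- c(72, 45, 90, 61)
ascending <- sort(marks)                       # decreasing = FALSE is the default
descending <- sort(marks, decreasing = TRUE)
print(ascending)
print(descending)
print(marks)                                   # the original order is untouched

# Character vectors are sorted alphabetically.
names_sorted <- sort(c("pear", "apple", "fig"))
print(names_sorted)
```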

Finding a Vector Length with length


The length function determines how many entries exist in a vector given as the argument x.

Examples:
1.
> length(x=c(3,2,8,1))
[1] 4

2.
> length(x=5:13)
[1] 9

3.
> y<-c(1:4,rep(5,times=3),seq(from=6,to=8,by=1))
> length(y)
[1] 10
> cat(y)
1 2 3 4 5 5 5 6 7 8

Note that if you include entries that depend on the evaluation of other functions (in the third
example, calls to rep and seq), length tells you the number of entries after those inner functions
have been executed.

Subsetting and Element Extraction


We can access individual elements by asking R to return the values of vector at specific locations,
done by entering the name of the vector followed by the position in square brackets.
Example:
> myvec<-c(10,20,30,40,50)
> myvec[3] #element at 3rd index
[1] 30

• Indexes allow you to retrieve specific elements from a vector, which is known as subsetting.
Example:
> myvec1<-c(11:17)
> length(myvec1)
[1] 7
> myvec1[length(x=myvec1)]
[1] 17
> cat(myvec1)
11 12 13 14 15 16 17

Because length(x=myvec1) results in the final index of the vector (in this case, 7), entering this
phrase in the square brackets extracts the final element, 17.

Similarly, you could extract the second-to-last element by subtracting 1 from the length; let's try
that, and also assign the result to a new object:

> myvec_len <- length(x=myvec1)
> z <- myvec1[myvec_len-1]
> z
[1] 16
# the second-to-last element of myvec1 is 16

You can also exclude individual elements from the result by using negative versions of the
indexes supplied in the square brackets.
Example:

> myvec2<-c(15:20)
> print(myvec2)
[1] 15 16 17 18 19 20
> myvec2[-4]
[1] 15 16 17 19 20

#excludes element 18 which is present at index 4



Note: Using the square-bracket operator to extract or delete values from a vector does not change the
original vector you are subsetting unless you explicitly overwrite the vector with the subsetted
version.

The index in the square brackets can be the result of an appropriate calculation:

> myvec2[-(length(x=myvec2))]
[1] 15 16 17 18 19
> myvec2
[1] 15 16 17 18 19 20
# excludes element 20, which is at index 6 (the length of myvec2);
# myvec2 itself is unchanged

As with most operations in R, you are not restricted to doing things one by one. You can also subset
objects using vectors of indexes, rather than individual indexes.

> myvec<-c(10,20,30,40,50)
> myvec[c(1,3,5)]
[1] 10 30 50

This returns the first, third, and fifth elements of myvec in one go.
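
A few more index vectors, plus the explicit overwrite needed to change the vector itself, are sketched below. The variable names other than myvec are illustrative.

```r
myvec <- c(10, 20, 30, 40, 50)
first_three <- myvec[1:3]            # a range of indexes: 10 20 30
without_ends <- myvec[-c(1, 5)]      # several negative indexes: 20 30 40
reversed <- myvec[length(myvec):1]   # indexes in reverse order: 50 40 30 20 10

myvec[2] <- 25                       # replace one element in place
myvec <- myvec[-4]                   # overwrite: the 4th element is now really gone
print(myvec)
```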

# Program to read vector elements and display the elements
myvec <- c()
n <- as.integer(readline('Enter the limit '))
i <- 1
while (i <= n) {
  myvec[i] <- as.integer(readline('Enter the element'))
  i <- i + 1
}

i <- 1
while (i <= n) {
  cat('elements in myvec are', myvec[i], '\n')
  i <- i + 1
}

Vector-Oriented Behavior:
• Vector-oriented, vectorized, or element-wise behavior is a key feature of the language.
• R matches up the elements according to their respective positions and performs the operation
on each corresponding pair of elements.

Example: adding two vectors of equal lengths

> v1<-c(1,2,3,4,5)
> v2<-c(6,7,8,9,10)
> v3<-c(v1+v2)
> v3
[1] 7 9 11 13 15
• The situation is made more complicated when using vectors of different lengths, which can
happen in two distinct ways:

1) The first is when the length of the longer vector can be evenly divided by the length of the
shorter vector.
Example:

> a <- c(4, 5, 6, 1, 5, 3)


> b <- c(2, 4, 7)
> res<-c(a+b)
> cat(res)
6 9 13 3 9 10

2) The second is when the length of the longer vector cannot be evenly divided by the length of
the shorter vector; this is usually unintentional on the user's part.

R essentially attempts to replicate, or recycle, the shorter vector as many times as needed
to match the length of the longer vector, before completing the specified operation.

Example:
> a <- c(4, 5, 6, 1)
> b <- c(2, 4, 7)
> res<-c(a+b)
Warning message:
In a + b : longer object length is not a multiple of shorter object length
> cat(res)
6 9 13 3

Here you see that R has matched the first three elements of a with the elements of b and then
reused only the first element of b for the fourth. Because b could not be repeated in full, R
issues the warning message shown above.
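
Recycling is also what makes arithmetic between a vector and a single number work. The sketch below uses made-up data (the vector prices and the other names are illustrative) and captures the warning with tryCatch instead of printing it.

```r
prices <- c(100, 250, 80, 40)
with_tax <- prices * 1.18        # the single value 1.18 is recycled four times
signs <- prices * c(1, -1)       # c(1, -1) is recycled twice: 100 -250 80 -40
print(with_tax)
print(signs)

# Lengths 4 and 3 do not divide evenly, so R raises a warning;
# tryCatch() returns the warning text instead of the sum here.
msg <- tryCatch(c(4, 5, 6, 1) + c(2, 4, 7),
                warning = function(w) conditionMessage(w))
print(msg)
```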

• Another benefit of vector-oriented behavior is that you can use vectorized functions to
complete potentially laborious tasks.
• For example, if you want to sum or multiply all the entries in a numeric vector, you can just
use a built-in function.

> v1<-c(1,2,3)
> v2<-c(4,5,6)
> #You can find the sum of v1 elements with
> sum(v1)
[1] 6
>
> #You can find the product of v2 elements with
> prod(v2)
[1] 120
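
To see what the built-in functions save, compare sum with an explicit loop that does the same work. This is only a sketch; the variable names are illustrative, and mean, max and cumsum are shown as further examples of built-in functions that operate on a whole vector.

```r
v <- c(4, 5, 6)

# Manual version: add the elements one at a time.
total_loop <- 0
for (value in v) {
  total_loop <- total_loop + value
}

# Vectorized version: one call does the same work.
total_vec <- sum(v)
print(c(total_loop, total_vec))

average <- mean(v)        # 5
largest <- max(v)         # 6
running <- cumsum(v)      # running totals: 4 9 15
print(running)
```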



# R program to illustrate the if-else statement and how ifelse() operates on
# vectors of equal and unequal lengths.
cat(" 1. proper variable length \n 2. Improper variable length \n")
choice <- readline("Enter your choice: ")
if(choice == '1')
{
a <- c(4, 5, 6)
b <- c(2, 4, 7)
largest <- ifelse(a>b, a, b)
cat("\n vector 1 =", a)
cat("\n vector 2 =", b)
cat("\n Largest elements are: ", largest)
} else if(choice == '2')
{
a <- c(4, 5, 6)
b <- c(2, 4, 7, 5)
largest <- ifelse(a>b, a, b)
cat("\n vector 1 =", a)
cat("\n vector 2 =", b)
cat("\n Largest elements are: ", largest, "\n")
} else
{
cat("\n Invalid choice")
}
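
The program above mixes two different constructs: if ... else tests a single condition, while ifelse works element by element on a vector. The sketch below runs the equal-length case without keyboard input. Using pmax for the element-wise maximum is an alternative suggested here, not part of the original program, and the variable names are illustrative.

```r
a <- c(4, 5, 6)
b <- c(2, 4, 7)

# ifelse() evaluates the test for every position and picks from a or b.
largest <- ifelse(a > b, a, b)
print(largest)

# pmax() is a base R function that returns the element-wise maximum directly.
largest_pmax <- pmax(a, b)
print(largest_pmax)

# if ... else expects one TRUE or FALSE, so reduce the vector test first.
if (all(a > b)) {
  verdict <- "every element of a is larger"
} else {
  verdict <- "at least one element of b is larger or equal"
}
print(verdict)
```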
