
ST. JOSEPHS PG COLLEGE
Dept. of Business Management – MBA 2nd Year
Subject: BUSINESS ANALYTICS USING ‘R’
LAB MANUAL RECORD
1. What is R Command Prompt and R Script File?
R Command Prompt:
Once the R environment is set up, start the R command prompt by typing the following
command at your system's command prompt:
$ R
This launches the R interpreter and displays the prompt >, where you can start typing
your program as follows:
> myString <- "Hello, World!"
> print(myString)
[1] "Hello, World!"
The first statement defines a string variable myString and assigns it the string
"Hello, World!". The second statement uses print() to display the value stored in
myString.
R Script File:
In practice, you will usually write your programs in script files and then run those
scripts from the command prompt with Rscript, the command-line front end to the R
interpreter. Start by writing the following code in a text file called test.R:

# My first program in R Programming
myString <- "Hello, World!"
print(myString)

Save the above code in the file test.R and run it at the Linux command prompt as shown
below. The syntax is the same on Windows and other systems.
$ Rscript test.R

2. Create a vector “ABC” representing the even months in a year. Print the month which
lies in between July and September.

Sol) > ABC <- c("Feb", "Apr", "June", "Aug", "Oct", "Dec")
> print(ABC)
o/p:
[1] "Feb"  "Apr"  "June" "Aug"  "Oct"  "Dec"
> print(ABC[4])
o/p:
[1] "Aug"
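
As a cross-check, the same vector can be derived from R's built-in month.abb constant rather than typed by hand. This is only a sketch: month.abb uses three-letter abbreviations, so it yields "Jun" where the solution above uses "June", and the name even_months is chosen here for illustration.

```r
# Built-in constant holding the 12 abbreviated month names ("Jan" ... "Dec")
even_months <- month.abb[seq(2, 12, by = 2)]
print(even_months)

# August is month 8, i.e. the 4th even month, which lies between July and September
print(even_months[4])
```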

3. Create a list with the vectors v1, v2, v3 showing the details of five students' like roll
number, name, and height (in centimeters).

Sol) > v1 = c(121418672001:121418672005)
> v2 = c("ABC", "XYZ", "PQR", "STU", "DEF")
> v3 = c(147, 149, 156, 155, 167)
> student_list = list(v1, v2, v3)
> print(student_list)
o/p:
S. VENKATA SIVA KUMAR
[[1]]
[1] 121418672001 121418672002 121418672003 121418672004 121418672005
[[2]]
[1] "ABC" "XYZ" "PQR" "STU" "DEF"
[[3]]
[1] 147 149 156 155 167
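
List elements can also be given names so that each vector is retrieved by name instead of by position. The sketch below reuses the five-student data from the solution; the element names roll_no, name and height_cm are assumptions made for illustration.

```r
v1 <- c(121418672001:121418672005)
v2 <- c("ABC", "XYZ", "PQR", "STU", "DEF")
v3 <- c(147, 149, 156, 155, 167)

# Name each element while building the list
student_list <- list(roll_no = v1, name = v2, height_cm = v3)

# Access by name with $ or by position with [[ ]]
print(student_list$name)
print(student_list[[3]])

# Height of the third student
print(student_list$height_cm[3])
```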

4. Create a 4x5 matrix with the natural numbers from 16 to 35 using “byrow=TRUE” and
“byrow=FALSE”. Also, mention the row names and column names as well.

Sol) First, define the vectors of row names "R1", "R2", "R3", "R4" and column names
"C1", "C2", "C3", "C4", "C5".
(i)
> rownames = c("R1", "R2", "R3", "R4")
> colnames = c("C1", "C2", "C3", "C4", "C5")
> A = matrix(c(16:35), nrow=4, ncol=5, byrow=TRUE, dimnames=list(rownames, colnames))
> print(A)
o/p:
   C1 C2 C3 C4 C5
R1 16 17 18 19 20
R2 21 22 23 24 25
R3 26 27 28 29 30
R4 31 32 33 34 35
(ii)
> B=matrix(c(16:35), nrow=4,ncol=5,byrow=FALSE,
dimnames=list(rownames,colnames))
> print(B)
o/p:
   C1 C2 C3 C4 C5
R1 16 20 24 28 32
R2 17 21 25 29 33
R3 18 22 26 30 34
R4 19 23 27 31 35
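
The two fill orders are related: filling a 4x5 matrix by row gives the same result as filling a 5x4 matrix by column and then transposing it. The sketch below checks this and also shows element access by row and column name. The names row_names, col_names and A2 are illustrative, not part of the original solution.

```r
row_names <- c("R1", "R2", "R3", "R4")
col_names <- c("C1", "C2", "C3", "C4", "C5")

A <- matrix(16:35, nrow = 4, ncol = 5, byrow = TRUE,
            dimnames = list(row_names, col_names))

# Column-wise fill of a 5x4 matrix, then transpose to 4x5
A2 <- t(matrix(16:35, nrow = 5, ncol = 4))
print(all(A == A2))   # TRUE

# Access a single element by its row and column names
print(A["R2", "C3"])  # 23
```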

5. Write the syntax to create an array of dimension (4,4,4) from the vectors v1 and v2, which
hold the values (12,16,8,24) and (23,21,19,17,29,22,19,26). Include the row names (R1,
R2, R3, R4) and column names (C1, C2, C3, C4).

Sol) > rownames = c("R1", "R2", "R3", "R4")
> colnames = c("C1", "C2", "C3", "C4")
> v1 = c(12, 16, 8, 24)
> v2 = c(23, 21, 19, 17, 29, 22, 19, 26)
> array.1 = array(c(v1, v2), dim=c(4,4,4), dimnames=list(rownames, colnames))
> print(array.1)
The 12 values in c(v1, v2) are recycled to fill all 64 cells of the array.
o/p:
, , 1

   C1 C2 C3 C4
R1 12 23 29 12
R2 16 21 22 16
R3  8 19 19  8
R4 24 17 26 24

, , 2

   C1 C2 C3 C4
R1 23 29 12 23
R2 21 22 16 21
R3 19 19  8 19
R4 17 26 24 17

, , 3

   C1 C2 C3 C4
R1 29 12 23 29
R2 22 16 21 22
R3 19  8 19 19
R4 26 24 17 26

, , 4

   C1 C2 C3 C4
R1 12 23 29 12
R2 16 21 22 16
R3  8 19 19  8
R4 24 17 26 24

6. Create a Data frame with the details of 10 employees working in ABC Company. The
employee details are "Emp.Id, Emp.Gender, Emp.Name, Emp.Age, Emp.Experience, and
Emp.Class". (Create the details of employees on your own covering all the above fields).

Sol)
> std_data = data.frame(
+   emp.id = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
+   emp.gender = c("M", "F", "F", "M", "F", "F", "M", "M", "F", "M"),
+   emp.name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
+   emp.age = c(23, 26, 29, 32, 35, 33, 34, 25, 26, 29),
+   emp.exp = c(4, 3, 4, 2, 5, 5, 6, 3, 4, 5),
+   emp.class = c("I", "I", "IV", "III", "III", "II", "I", "II", "III", "I"))

> print(std_data)
o/p:
   emp.id emp.gender emp.name emp.age emp.exp emp.class
1     101          M        A      23       4         I
2     102          F        B      26       3         I
3     103          F        C      29       4        IV
4     104          M        D      32       2       III
5     105          F        E      35       5       III
6     106          F        F      33       5        II
7     107          M        G      34       6         I
8     108          M        H      25       3        II
9     109          F        I      26       4       III
10    110          M        J      29       5         I
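
Once a data frame exists, str() shows the type of each column and logical indexing filters its rows. The following is a minimal sketch using a reduced version of the data above (three columns only, to keep it short); the names emp_data and senior are illustrative.

```r
emp_data <- data.frame(
  emp.id  = 101:110,
  emp.age = c(23, 26, 29, 32, 35, 33, 34, 25, 26, 29),
  emp.exp = c(4, 3, 4, 2, 5, 5, 6, 3, 4, 5)
)

# Structure: column names, types and the first few values
str(emp_data)

# Employees with at least 5 years of experience
senior <- emp_data[emp_data$emp.exp >= 5, ]
print(senior)

# Average age of all employees
print(mean(emp_data$emp.age))
```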

7. Combine two vectors 'v1' and 'v2' representing the even numbers from 1 to 20 and from 31 to
50 using the "cbind" and "rbind" functions.

Sol) > v1 = c(2,4,6,8,10,12,14,16,18,20)
> v2 = c(32,34,36,38,40,42,44,46,48,50)
> x = cbind(v1, v2)
> print(x)
o/p:
      v1 v2
 [1,]  2 32
 [2,]  4 34
 [3,]  6 36
 [4,]  8 38
 [5,] 10 40
 [6,] 12 42
 [7,] 14 44
 [8,] 16 46
 [9,] 18 48
[10,] 20 50

> y=rbind(v1,v2)
> print(y)
o/p:
   [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
v1    2    4    6    8   10   12   14   16   18    20
v2   32   34   36   38   40   42   44   46   48    50
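
The two results are transposes of each other: cbind() treats the vectors as columns and produces a 10x2 matrix, while rbind() treats them as rows and produces a 2x10 matrix. A quick check, using seq() to generate the even numbers as an alternative to typing them out:

```r
v1 <- seq(2, 20, by = 2)
v2 <- seq(32, 50, by = 2)

x <- cbind(v1, v2)  # vectors become columns
y <- rbind(v1, v2)  # vectors become rows

print(dim(x))  # 10  2
print(dim(y))  #  2 10

# Transposing one gives the same values as the other
print(all(t(x) == y))
```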

8. Create a vector "colour" with the strings 'purple', 'yellow', 'yellow', 'green', 'purple',
'green', 'green', 'yellow', 'yellow'. Using the above vector:
(i) Create and print a factor “factorC” to identify the factor levels.
(ii) Identify the number of levels of the factor “factorC”.

Sol)
> colour = c('purple', 'yellow', 'yellow', 'green', 'purple', 'green', 'green', 'yellow', 'yellow')
> factorC = factor(colour)
> print(factorC)
o/p:
[1] purple yellow yellow green purple green green yellow yellow
Levels: green purple yellow
> print(nlevels(factorC))
o/p:
[1] 3
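
Two related functions are often used alongside nlevels(): levels() returns the level names themselves, and table() counts how many times each level occurs. A short sketch on the same vector:

```r
colour <- c("purple", "yellow", "yellow", "green", "purple",
            "green", "green", "yellow", "yellow")
factorC <- factor(colour)

# Level names, sorted alphabetically by default
print(levels(factorC))

# Frequency of each level
print(table(factorC))
```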

9. Create any three vectors, each with at least 10 values, and apply the
following Arithmetic Operators using the three vectors.
(i) '+' (ii) '-' (iii) '*' (iv) '/' (x/y, x/z, y/z)

Sol)
> x = c(5,8,9,12,13,18,19,8,6,3)
> y = c(3,18,12,17,11,21,9,8,16,10)
> z = c(12,23,8,14,21,7,8,9,10,13)
> A = x+y+z
> print(A)
o/p:
[1] 20 49 29 43 45 46 36 25 32 26

> B = x-y-z
> print(B)
o/p:
[1] -10 -33 -11 -19 -19 -10 2 -9 -20 -20

> C = x*y*z
> print(C)
o/p:
[1] 180 3312 864 2856 3003 2646 1368 576 960 390

Here, the division is carried out for each pair of vectors: D1 = x/y, D2 = x/z and D3 = y/z.

> D1 = x/y
> print(D1)
o/p:
[1] 1.6666667 0.4444444 0.7500000 0.7058824 1.1818182 0.8571429 2.1111111 1.0000000
[9] 0.3750000 0.3000000

> D2 = x/z
> print(D2)
o/p:
[1] 0.4166667 0.3478261 1.1250000 0.8571429 0.6190476 2.5714286 2.3750000 0.8888889
[9] 0.6000000 0.2307692

> D3 = y/z
> print(D3)
o/p:
[1] 0.2500000 0.7826087 1.5000000 1.2142857 0.5238095 3.0000000 1.1250000 0.8888889
[9] 1.6000000 0.7692308

10. Importing CSV and Excel files into R.


Reading a CSV File:
The following is a simple example of using the read.csv() function to read a CSV file
from your current working directory:
data <- read.csv("input.csv")
print(data)
When we execute the above code, it produces the following result:
  id     name salary start_date       dept
1  1     Rick 623.30 2012-01-01         IT
2  2      Dan 515.20 2013-09-23 Operations
3  3 Michelle 611.00 2014-11-15         IT
4  4     Ryan 729.00 2014-05-11         HR
5 NA     Gary 843.25 2015-03-27    Finance
6  6     Nina 578.00 2013-05-21         IT
7  7    Simon 632.80 2013-07-30 Operations
8  8     Guru 722.50 2014-06-17    Finance

Reading the Excel File:
The file input.xlsx is read using the read.xlsx() function from the xlsx package, which
must be installed and loaded first. The result is stored as a data frame in the R environment.
# Load the xlsx package (install it once with install.packages("xlsx")).
library("xlsx")
# Read the first worksheet in the file input.xlsx.
data <- read.xlsx("input.xlsx", sheetIndex = 1)
print(data)
When we execute the above code, it produces the following result:
  id     name salary start_date       dept
1  1     Rick 623.30 2012-01-01         IT
2  2      Dan 515.20 2013-09-23 Operations
3  3 Michelle 611.00 2014-11-15         IT
4  4     Ryan 729.00 2014-05-11         HR
5 NA     Gary 843.25 2015-03-27    Finance
6  6     Nina 578.00 2013-05-21         IT
7  7    Simon 632.80 2013-07-30 Operations
8  8     Guru 722.50 2014-06-17    Finance
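
To try read.csv() without an existing input.csv, one can write a small data frame to a temporary file first and read it back. This is a sketch only: the three rows are copied from the output above, and the temporary file path stands in for input.csv.

```r
# Small sample taken from the first rows shown above
sample_data <- data.frame(
  id     = c(1, 2, 3),
  name   = c("Rick", "Dan", "Michelle"),
  salary = c(623.30, 515.20, 611.00)
)

# Write to a temporary CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_data, csv_path, row.names = FALSE)

data <- read.csv(csv_path)
print(data)

# Basic checks on what was read
print(is.data.frame(data))
print(ncol(data))
print(nrow(data))
```

Passing row.names = FALSE to write.csv() keeps R's row numbers out of the file, so the columns read back match the ones written.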

11. Create a Pie Chart with the labels, chart title, rainbow color palette for the set of values
(13, 29, 31, 9, 12, 6) representing the number of working hours of the employees A, B, C,
D, E, and F.
Answer:
Create data for the graph:
> x <- c(13, 29, 31, 9, 12, 6)
> labels <- c("A", "B", "C", "D", "E", "F")
Give the chart file a name:
> png(file = "employees.png")
Plot the chart with title and rainbow color palette:
> pie(x, labels, main = "Employees Working Hours", col = rainbow(length(x)))
Save the file:
> dev.off()
Output: the pie chart is saved as employees.png in the current working directory.

12. Create a Bar Chart with labels, title and colors.


Bar Chart Labels, Title and Colors:
The features of the bar chart can be extended by adding more parameters.
The main parameter adds a title. The col parameter sets the color of the
bars. The names.arg parameter is a vector with the same number of values as the input
vector; it labels each bar.
Example:
The script below creates the bar chart and saves it in the current R working directory.
# Create the data for the chart
H <- c(7,12,28,3,41)
M <- c("Mar","Apr","May","Jun","Jul")
# Give the chart file a name
png(file = "barchart_months_revenue.png")
# Plot the bar chart
barplot(H,names.arg=M,xlab="Month",ylab="Revenue",col="blue",
main="Revenue chart",border="red")
# Save the file
dev.off()
When we execute the above code, it saves the bar chart as barchart_months_revenue.png.

13. Create a Pie Chart with the labels, chart title, rainbow color palette for the set of values
(13, 29, 31, 9, 12, 6) representing the number of working hours of the employees A, B, C,
D, E, and F.
Answer:
Create data for the graph:
> x <- c(13, 29, 31, 9, 12, 6)
> labels <- c("A", "B", "C", "D", "E", "F")
Give the chart file a name:
> png(file = "employees.png")
Plot the chart with title and rainbow color palette:
> pie(x, labels, main = "Employees Working Hours", col = rainbow(length(x)))
Save the file:
> dev.off()
Output: the pie chart is saved as employees.png in the current working directory.

14. Define Mean and write the basic syntax for Mean using the arguments 'trim' and
'na.rm=TRUE' with an example.
Answer:
Mean:
It is calculated by taking the sum of the values and dividing by the number of values in
a data series.
The function mean() is used to calculate this in R.
Syntax:
The basic syntax for calculating mean in R is:
mean(x, trim = 0, na.rm = FALSE, ...)
Following is the description of the parameters used:
· x is the input vector.
· trim is the fraction (0 to 0.5) of observations dropped from each end of the sorted vector.
· na.rm, when TRUE, removes the missing values (NA) from the input vector.

Example:
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find Mean.
result.mean <- mean(x)
print(result.mean)
When we execute the above code, it produces the following result:
[1] 8.22

Applying Trim Option:
When the trim parameter is supplied, the values in the vector are sorted and the
required number of observations is dropped from each end before calculating the mean.
When trim = 0.3, 3 values from each end (30% of the 10 values) are dropped from the
calculation of the mean.
In this case the sorted vector is (−21, −5, 2, 3, 4.2, 7, 8, 12, 18, 54) and the values removed
from the vector for calculating the mean are (−21, −5, 2) from the left and (12, 18, 54) from
the right.
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)

# Find Mean.
result.mean <- mean(x,trim = 0.3)
print(result.mean)
When we execute the above code, it produces the following result −
[1] 5.55
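
The question also asks about na.rm = TRUE, which the examples above do not show. A minimal sketch, assuming the same vector with one missing value (NA) appended:

```r
# Same values as above, plus one missing value
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5, NA)

# Without na.rm the result is NA, because one value is unknown
print(mean(x))

# With na.rm = TRUE the NA is dropped before the mean is computed
result.mean <- mean(x, na.rm = TRUE)
print(result.mean)
```

With the NA removed, the remaining ten values are the same as in the first example, so the result is again 8.22.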

15. Create a syntax for MODE using user defined function and calculate the Mode value
for the set of frequencies assigned to a vector V= c (2, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 1).
Answer:
Mode:
The mode is the value that has the highest number of occurrences in a set of data. Unlike
mean and median, the mode can be found for both numeric and character data.
R does not have a standard built-in function to calculate the statistical mode (the
built-in mode() returns the storage mode of an object). So we create a user-defined
function to calculate the mode of a data set in R. This function takes a vector as input and
gives the mode value as output.
The user-defined function for finding the mode of the given frequency data is created as
follows:
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
Then create a vector 'v' for the given frequency data:
> v = c(2, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 1)
> getmode(v)
Output:
[1] 2
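
Because the function relies only on unique() and match(), it also works on character data. A short sketch follows; the character vector charv is made up for illustration.

```r
getmode <- function(v) {
  uniqv <- unique(v)
  # Count occurrences of each unique value and return the most frequent one
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Numeric data from the question
v <- c(2, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 1)
print(getmode(v))

# Hypothetical character data
charv <- c("o", "it", "the", "it", "it")
print(getmode(charv))
```

If two values are tied for the highest count, which.max() picks the first one, so this function returns whichever tied value appears first in the vector.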

16. If a vector X = c (10, 11, 15, 13, 12, 19, 23, 8, 11, 16, 18), then apply the following
functions to find out (a) Maximum (b) Minimum (c) Mean (d) Median using R syntax.
Answer:
Define the vector 'x':
> x = c(10, 11, 15, 13, 12, 19, 23, 8, 11, 16, 18)
(a) To find the Maximum of 'x':
> max(x)
Output:
[1] 23
(b) To find the Minimum of 'x':
> min(x)
Output:
[1] 8
(c) To find the Mean of 'x':
> mean(x)
Output:
[1] 14.18182
(d) To find the Median of 'x':
> median(x)
Output:
[1] 13

17. If X = the set of even values from 32 to 56, then write the syntax for calculating the
following: (a) Mean (b) Median (c) Standard Deviation (d) Variance
Answer:
The even values from 32 to 56 (inclusive) are: 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52,
54, 56
Now, assign the set of even values to the vector 'x' in the R console:
> x = c(32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56)
(a) To find the Mean of 'x':
> mean(x)
Output:
[1] 44
(b) To find the Median of 'x':
> median(x)
Output:
[1] 44
(c) To find the Standard Deviation of 'x':
> sd(x)
Output:
[1] 7.788881
(d) To find the Variance of 'x':
> var(x)
Output:
[1] 60.66667
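
The vector of even values can also be generated with seq() instead of being typed out, and the standard deviation is the square root of the variance. A brief check of both points:

```r
# Even values from 32 to 56, inclusive
x <- seq(32, 56, by = 2)
print(length(x))  # 13 values

# sd() is the square root of var(); both use the sample (n - 1) formula
print(var(x))
print(sd(x))
print(all.equal(sd(x), sqrt(var(x))))
```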

18. Write about Regression Analysis and the steps to establish Regression Analysis in R.
Regression analysis is a widely used statistical tool for establishing a relationship model
between two variables. One of these variables is called the predictor variable, whose value is
gathered through experiments. The other is called the response variable, whose
value is derived from the predictor variable.
In Linear Regression these two variables are related through an equation in which the
exponent (power) of both variables is 1. Mathematically, a linear relationship
represents a straight line when plotted as a graph. A non-linear relationship, where the
exponent of any variable is not equal to 1, creates a curve.
The general mathematical equation for a linear regression is: y = ax + b
Following is the description of the parameters used:
· y is the response variable.
· x is the predictor variable.
· a and b are constants called the coefficients (a is the slope and b is the intercept).
Steps to Establish a Regression:
A simple example of regression is predicting the weight of a person when their height is
known. To do this we need the relationship between the height and weight of a
person.
The steps to create the relationship are:
· Carry out the experiment of gathering a sample of observed values of height and
corresponding weight.
· Create a relationship model using the lm() function in R.
· Find the coefficients from the model created and write the mathematical equation
using them.
· Get a summary of the relationship model to see the errors in prediction, also
called residuals.
· To predict the weight of new persons, use the predict() function in R.
Input Data:
Below is the sample data representing the observations −
# Values of height
151, 174, 138, 186, 128, 136, 179, 163, 152, 131
# Values of weight.
63, 81, 56, 91, 47, 57, 76, 72, 62, 48

lm() Function:
This function creates the relationship model between the predictor and the response
variable.
Syntax: The basic syntax for the lm() function in linear regression is:
lm(formula, data)
Following is the description of the parameters used:
· formula is a symbolic description of the relation between x and y, such as y ~ x.
· data is an optional data frame containing the variables named in the formula.
Create Relationship Model & Get the Coefficients:
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Apply the lm() function.
relation <- lm(y~x)

print(relation)
When we execute the above code, it produces the following result:
Call:
lm(formula = y ~ x)

Coefficients:
(Intercept)            x
   -38.4551       0.6746
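
The remaining steps listed above (getting a summary and predicting for a new person) can be sketched as follows. The new height of 170 is an assumed value chosen for illustration; predict() expects it in a data frame whose column name matches the predictor x.

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)  # height
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)            # weight

relation <- lm(y ~ x)

# Summary of the model: coefficients, residuals, R-squared
print(summary(relation))

# Predict the weight of a person with height 170 (assumed example value)
new_person <- data.frame(x = 170)
result <- predict(relation, new_person)
print(result)
```

With the coefficients shown above, the prediction corresponds to roughly -38.4551 + 0.6746 * 170, which is about 76.2.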
