JOSEPHS PG COLLEGE
Dept. of Business Management – MBA 2nd Year
Subject: BUSINESS ANALYTICS USING ‘R’
LAB MANUAL RECORD
1. What is R Command Prompt and R Script File?
R Command Prompt:
Once the R environment is set up, start the R command prompt by typing the following
command at your system's command prompt:
$ R
This launches the R interpreter and gives you a > prompt where you can start typing
your program, as follows:
> myString <- "Hello, World!"
> print(myString)
[1] "Hello, World!"
Here the first statement defines a string variable myString and assigns it the string
"Hello, World!". The next statement uses print() to display the value stored in
myString.
R Script File:
Usually, you will write your programs in script files and then execute those scripts at
the command prompt with the help of the R interpreter called Rscript. Start by writing
the following code in a text file called test.R:
myString <- "Hello, World!"
print(myString)
Save the above code in the file test.R and execute it at the Linux command prompt as
given below. The syntax remains the same on Windows or any other system.
$ Rscript test.R
2. Create a vector “ABC” representing the even months in a year. Print the month which
lies in between July and September.
Sol) > ABC <- c("Feb", "Apr", "June", "Aug", "Oct", "Dec")
> print(ABC)
o/p:
[1] "Feb"  "Apr"  "June" "Aug"  "Oct"  "Dec"
> print(ABC[4])
o/p:
[1] "Aug"
3. Create a list with the vectors v1, v2, v3 showing the details of five students: roll
number, name, and height (in centimeters).
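A minimal sketch of one possible answer follows. The roll numbers, names, and heights are made-up values for illustration only, and the list element names (roll_no, name, height_cm) are assumptions rather than part of the question.

```r
# Hypothetical details of five students (all values are invented for illustration).
v1 <- c(101, 102, 103, 104, 105)                      # roll numbers
v2 <- c("Asha", "Ravi", "Kiran", "Meena", "Suresh")   # names
v3 <- c(152.5, 168.0, 171.2, 158.0, 175.0)            # heights in centimeters

# Combine the three vectors into one named list.
students <- list(roll_no = v1, name = v2, height_cm = v3)
print(students)

# Access one component of the list, for example the second student's name.
print(students$name[2])
```

A list is used here because it can hold vectors of different types (numeric and character) side by side.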
4. Create a 4x5 matrix with the natural numbers from 16 to 35 using “byrow=TRUE” and
“byrow=FALSE”. Also, mention the row names and column names as well.
Sol) First, define the vectors with row names “R1”, “R2”, “R3”, “R4” and column names
“C1”, “C2”, “C3”, “C4”, “C5”.
(i)
> rownames = c("R1", "R2", "R3", "R4")
> colnames = c("C1", "C2", "C3", "C4", "C5")
> A = matrix(c(16:35), nrow=4, ncol=5, byrow=TRUE, dimnames=list(rownames, colnames))
> print(A)
o/p:
   C1 C2 C3 C4 C5
R1 16 17 18 19 20
R2 21 22 23 24 25
R3 26 27 28 29 30
R4 31 32 33 34 35
(ii)
> B=matrix(c(16:35), nrow=4,ncol=5,byrow=FALSE,
dimnames=list(rownames,colnames))
> print(B)
o/p:
   C1 C2 C3 C4 C5
R1 16 20 24 28 32
R2 17 21 25 29 33
R3 18 22 26 30 34
R4 19 23 27 31 35
5. Write the syntax to create an array of dimension (4,4,4) from the vectors v1 and v2, which
hold the values (12,16,8,24) and (23,21,19,17,29,22,19,26). Mention the row names (R1,
R2, R3, R4) and column names (C1, C2, C3, C4) without fail.
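A minimal sketch of one way to build the array asked for above. It assumes the two vectors are simply concatenated and that R recycles the 12 values to fill all 64 cells. The matrix names M1 to M4 are hypothetical, since the question only specifies row and column names.

```r
# Vectors given in the question.
v1 <- c(12, 16, 8, 24)
v2 <- c(23, 21, 19, 17, 29, 22, 19, 26)

# Row and column names from the question; matrix names are an assumption.
row_names <- c("R1", "R2", "R3", "R4")
col_names <- c("C1", "C2", "C3", "C4")
mat_names <- c("M1", "M2", "M3", "M4")

# array() fills column by column and recycles the data to fill 4 x 4 x 4 cells.
arr <- array(c(v1, v2), dim = c(4, 4, 4),
             dimnames = list(row_names, col_names, mat_names))
print(arr)
```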
Sol)
> # Data frame of employee details: ID, gender, name, age, experience, and class
> std_data = data.frame(
+   emp.id = c(101, 102, 103, 104, 105, 106, 107, 108, 109, 110),
+   emp.gender = c("M", "F", "F", "M", "F", "F", "M", "M", "F", "M"),
+   emp.name = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J"),
+   emp.age = c(23, 26, 29, 32, 35, 33, 34, 25, 26, 29),
+   emp.exp = c(4, 3, 4, 2, 5, 5, 6, 3, 4, 5),
+   emp.class = c("I", "I", "IV", "III", "III", "II", "I", "II", "III", "I"))
> print(std_data)
o/p:
   emp.id emp.gender emp.name emp.age emp.exp emp.class
1     101          M        A      23       4         I
2     102          F        B      26       3         I
3     103          F        C      29       4        IV
4     104          M        D      32       2       III
5     105          F        E      35       5       III
6     106          F        F      33       5        II
7     107          M        G      34       6         I
8     108          M        H      25       3        II
9     109          F        I      26       4       III
10    110          M        J      29       5         I
7. Combine two vectors 'v1' and 'v2' representing the even numbers from 1 to 20 and from 31 to
50 using the “cbind” and “rbind” functions.
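A minimal sketch of one possible answer. It assumes the even numbers are generated with seq(), which gives ten values in each vector; the result names by_columns and by_rows are illustrative.

```r
# Even numbers from 1 to 20 and from 31 to 50.
v1 <- seq(2, 20, by = 2)
v2 <- seq(32, 50, by = 2)

# cbind() places the vectors side by side as columns (10 rows x 2 columns).
by_columns <- cbind(v1, v2)
print(by_columns)

# rbind() stacks the vectors as rows (2 rows x 10 columns).
by_rows <- rbind(v1, v2)
print(by_rows)
```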
8. Create a vector “colour” with the strings 'purple', 'yellow', 'yellow', 'green', 'purple',
'green', 'green', 'yellow', 'yellow'. Using the above vector:
(i) Create and print a factor “factorC” to identify the factor levels.
(ii) Identify the number of levels of the factor “factorC”.
Sol)
> colour = c('purple', 'yellow', 'yellow', 'green', 'purple', 'green', 'green', 'yellow', 'yellow')
> factorC = factor(colour)
> print(factorC)
o/p:
[1] purple yellow yellow green purple green green yellow yellow
Levels: green purple yellow
> print(nlevels(factorC))
o/p:
[1] 3
9. Create any three vectors with at least 10 values each and apply the following
Arithmetic Operators using the three vectors.
(i) '+' (ii) '-' (iii) '*' (iv) '/' (x/y, x/z, y/z)
Sol)
> x = c(5, 8, 9, 12, 13, 18, 19, 8, 6, 3)
> y = c(3, 18, 12, 17, 11, 21, 9, 8, 16, 10)
> z = c(12, 23, 8, 14, 21, 7, 8, 9, 10, 13)
> A = x+y+z
> print(A)
o/p:
[1] 20 49 29 43 45 46 36 25 32 26
> B = x-y-z
> print(B)
o/p:
[1] -10 -33 -11 -19 -19 -10 2 -9 -20 -20
> C = x*y*z
> print(C)
o/p:
[1] 180 3312 864 2856 3003 2646 1368 576 960 390
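> D1 = x/y
> print(D1)
o/p:
[1] 1.6666667 0.4444444 0.7500000 0.7058824 1.1818182 0.8571429 2.1111111 1.0000000
[9] 0.3750000 0.3000000
> D2 = x/z
> print(D2)
o/p:
[1] 0.4166667 0.3478261 1.1250000 0.8571429 0.6190476 2.5714286 2.3750000 0.8888889
[9] 0.6000000 0.2307692
> D3 = y/z
> print(D3)
o/p:
[1] 0.2500000 0.7826087 1.5000000 1.2142857 0.5238095 3.0000000 1.1250000 0.8888889
[9] 1.6000000 0.7692308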
11. Create a Pie Chart with labels, a chart title, and the rainbow color palette for the set of
values (13, 29, 31, 9, 12, 6) representing the number of working hours of the employees A, B, C,
D, and E.
Answer:
Create data for the graph:
> x <- c(13, 29, 31, 9, 12, 6)
> labels <- c("A", "B", "C", "D", "E")
S. VENKATA SIVA KUMAR
Give the chart file a name:
> png(file = "employees.png")
Output:
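The record stops before the chart is actually drawn, so the following is a minimal sketch of a complete version rather than the original solution. Two things are assumptions: the question gives six values but names only five employees, so a sixth label "F" is added here, and the chart title and output location are illustrative.

```r
# Working hours from the question (six values).
x <- c(13, 29, 31, 9, 12, 6)

# Assumption: a sixth label "F" is added so that every slice has a label.
labels <- c("A", "B", "C", "D", "E", "F")

# Hypothetical output location; a temporary folder avoids cluttering the working directory.
out_file <- file.path(tempdir(), "employees.png")

# Open the PNG device, draw the pie chart with a title and rainbow colours, then close the device.
png(file = out_file)
pie(x, labels = labels, main = "Working hours of employees",
    col = rainbow(length(x)))
dev.off()
```

The call to dev.off() is what writes the file to disk; without it the image stays empty or incomplete.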
14. Define Mean and write the basic syntax for Mean using the arguments 'trim' and
'na.rm=TRUE', with an example.
Answer:
Mean:
The mean is calculated by taking the sum of the values and dividing it by the number of
values in a data series.
The function mean() is used to calculate this in R.
Syntax:
The basic syntax for calculating mean in R is:
mean(x, trim = 0, na.rm = FALSE, ...)
Following is the description of the parameters used:
· x is the input vector.
· trim is the fraction (0 to 0.5) of observations dropped from each end of the sorted vector
before the mean is computed.
· na.rm, when set to TRUE, removes the missing values (NA) from the input vector.
Example:
# Create a vector.
x <- c(12,7,3,4.2,18,2,54,-21,8,-5)
# Find Mean.
result.mean <- mean(x)
print(result.mean)
When we execute the above code, it produces the following result −
[1] 8.22
# Find the trimmed mean (drops 3 values from each end of the sorted vector).
result.mean <- mean(x,trim = 0.3)
print(result.mean)
When we execute the above code, it produces the following result −
[1] 5.55
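The question also asks for na.rm=TRUE, which the example above does not show. A minimal sketch, assuming the same vector with one missing value (NA) appended:

```r
# Same vector as above, with one missing value added at the end.
x <- c(12, 7, 3, 4.2, 18, 2, 54, -21, 8, -5, NA)

# Without na.rm, a single NA makes the mean NA.
result.na <- mean(x)
print(result.na)

# With na.rm = TRUE, the NA is dropped before the mean is computed.
result.mean <- mean(x, na.rm = TRUE)
print(result.mean)
```

The first call prints NA and the second prints 8.22, the same mean as for the vector without the missing value.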
15. Create a syntax for MODE using user defined function and calculate the Mode value
for the set of frequencies assigned to a vector V= c (2, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 1).
Answer:
Mode:
The mode is the value that has the highest number of occurrences in a set of data. Unlike
the mean and median, the mode can be found for both numeric and character data.
R does not have a standard in-built function to calculate mode. So we create a user
function to calculate mode of a data set in R. This function takes the vector as input and
gives the mode value as output.
The user defined function for finding out the mode of given frequency data is created as
follows:
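getmode <- function(v) {
  # Distinct values in the order they first appear.
  uniqv <- unique(v)
  # Count how often each distinct value occurs and return the most frequent one.
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
Then create a vector 'v' for the given frequency data: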
> v = c (2, 1, 1, 2, 2, 3, 3, 2, 2, 3, 3, 2, 2, 1)
> getmode(v)
Output:
[1] 2
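Since the mode also applies to character data, here is a minimal sketch that reuses the same user-defined function on the colour strings from question 8. Pairing the two is an illustration and not part of the original answer.

```r
# User-defined mode function, repeated here so the example runs on its own.
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Character data: the colour strings from question 8.
colour <- c("purple", "yellow", "yellow", "green", "purple",
            "green", "green", "yellow", "yellow")

# "yellow" occurs four times, more often than any other colour.
colour_mode <- getmode(colour)
print(colour_mode)
```

If two values tie for the highest count, which.max() returns the first of them, so this function reports only one mode.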
16. If a vector X = c (10, 11, 15, 13, 12, 19, 23, 8, 11, 16, 18), then apply the following
functions to find out (a) Maximum (b) Minimum (c) Mean (d) Median using R syntax.
Answer:
Define the vector 'x':
> x = c(10, 11, 15, 13, 12, 19, 23, 8, 11, 16, 18)
(a) To find Maximum of 'x':
> max(x)
Output:
[1] 23
(b) To find Minimum of 'x':
> min(x)
Output:
[1] 8
(c) To find Mean of 'x':
> mean(x)
Output:
[1] 14.18182
(d) To find Median of 'x':
> median(x)
Output:
[1] 13
17. If X = the set of even values from 32 to 56 (inclusive), then write the syntax for
calculating the following: (a) Mean (b) Median (c) Standard Deviation (d) Variance
Answer:
The even values from 32 to 56 are: 32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52,
54, 56
Now, assign this set of even values to the vector 'x' in the R console:
> x = c(32, 34, 36, 38, 40, 42, 44, 46, 48, 50, 52, 54, 56)
(a) To find Mean of 'x':
> mean(x)
Output:
[1] 44
(b) To find Median of 'x':
> median(x)
Output:
[1] 44
(c) To find Standard Deviation of 'x':
> sd(x)
Output:
[1] 7.788881
(d) To find Variance of 'x':
> var(x)
Output:
[1] 60.66667
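As an alternative to typing the values by hand, the vector can be generated. This is a minimal sketch, assuming seq() with a step of 2; the name stats is illustrative.

```r
# Generate the even values from 32 to 56 instead of listing them.
x <- seq(32, 56, by = 2)

# Collect the four statistics in one named vector.
stats <- c(mean = mean(x), median = median(x), sd = sd(x), variance = var(x))
print(stats)
```

Note that sd() and var() in R use the sample formulas (dividing by n - 1), so the variance here is 728 / 12.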
18. Write about Regression Analysis and the steps to establish Regression Analysis in R.
Regression analysis is a widely used statistical tool for establishing a relationship model
between two variables. One of these variables is called the predictor variable, whose value
is gathered through experiments. The other is called the response variable, whose
value is derived from the predictor variable.
In Linear Regression these two variables are related through an equation in which the
exponent (power) of both variables is 1. Mathematically, a linear relationship
represents a straight line when plotted as a graph. A non-linear relationship, where the
exponent of any variable is not equal to 1, creates a curve.
The general mathematical equation for a linear regression is: y = ax + b
Following is the description of the parameters used −
· y is the response variable.
· x is the predictor variable.
· a and b are constants which are called the coefficients.
Steps to Establish a Regression:
A simple example of regression is predicting the weight of a person when their height is
known. To do this we need the relationship between the height and weight of a
person.
The steps to create the relationship are:
· Carry out the experiment of gathering a sample of observed values of height and
corresponding weight.
· Create a relationship model using the lm() function in R.
· Find the coefficients from the model created and build the mathematical equation
using these.
· Get a summary of the relationship model to know the average error in prediction,
also called the residuals.
· To predict the weight of new persons, use the predict() function in R.
Input Data:
Below is the sample data representing the observations −
# Values of height
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
# Values of weight.
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function to create the relationship model.
relation <- lm(y ~ x)
print(relation)
When we execute the above code, it produces the following result −
Call:
lm(formula = y ~ x)
Coefficients:
(Intercept) x
-38.4551 0.6746
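The remaining steps (reading the coefficients, getting the summary, and predicting a new value) can be sketched as follows. The new height of 170 is a hypothetical input chosen for illustration, and the names new_person and predicted_weight are assumptions.

```r
# Sample data from above: heights (predictor) and weights (response).
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)

# Fit the linear model y = ax + b.
relation <- lm(y ~ x)

# Coefficients: the intercept is b and the x coefficient is the slope a.
print(coef(relation))

# Summary of the model, including the residuals.
print(summary(relation))

# Predict the weight of a new person; the column name must match the predictor x.
new_person <- data.frame(x = 170)   # hypothetical height
predicted_weight <- predict(relation, new_person)
print(predicted_weight)
```

With the coefficients shown above, the prediction is roughly -38.4551 + 0.6746 * 170, or about 76.2.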