
RN SHETTY TRUST®

RNS INSTITUTE OF TECHNOLOGY


Autonomous Institution Affiliated to VTU, Recognized by GOK, Approved by AICTE
(NAAC ‘A+ Grade’ Accredited, NBA Accredited (UG - CSE, ECE, ISE, EIE and EEE))
Channasandra, Dr. Vishnuvardhan Road, Bengaluru - 560 098
Ph: (080)28611880, 28611881 URL: www.rnsit.ac.in

DEPARTMENT OF CSE (Data Science)

DATA ANALYTICS WITH R LAB MANUAL


(BDS306C)
(As per Visvesvaraya Technological University Course type- ESC)

Compiled by

DEPARTMENT OF CSE (Data Science)


R N S Institute of Technology
Bengaluru-98

Name:

USN:

DEPARTMENT OF CSE (Data Science)

Vision of the Department

Empowering students to solve complex real-time computing problems
involving high volume multi-dimensional data.

Mission of the Department

- Provide quality education in both theoretical and applied Computer
  Science to solve real world problems.
- Conduct research to develop algorithms that solve complex
  problems involving multi-dimensional high volume data through
  intelligent inferencing.
- Develop good linkages with industry and research organizations to
  expose students to global problems and find optimal solutions.
- Create confident graduates who can contribute to the nation
  through high levels of commitment, following ethical practices and
  with integrity.

Disclaimer

The information contained in this document is the proprietary and exclusive property of
RNS Institute except as otherwise indicated. No part of this document, in whole or in
part, may be reproduced, stored, transmitted, or used for course material development
purposes without the prior written permission of RNS Institute of Technology.

The information contained in this document is subject to change without notice. The
information in this document is provided for informational purposes only.

Trademark

Edition: 2023-24

Document Owner
The primary contact for questions regarding this document is:

Author(s): 1. Dr. Mohan H S
           2. Prof. Saravana M K

Department: CSE (Data Science)


Contact email ids : hod.datascience@rnsit.ac.in
saravana.mk@rnsit.ac.in

COURSE OUTCOMES
Course Outcomes: At the end of this course, students are able to:
CO1: Describe the structures of R Programming
CO2: Illustrate the basics of Data Preparation with real world examples
CO3: Apply the Graphical Packages of R for visualization
CO4: Apply various Statistical Analysis methods for data analytics.

COs and POs Mapping of lab Component

COURSE OUTCOMES PO1 PO2 PO3 PO4 PO5 PO6 PO7 PO8 PO9 PO10 PO11 PO12 PSO1 PSO2 PSO3 PSO4

CO1 3 3 3
CO2 3 3 3 3
CO3 3 3 3 3
CO4 3 3 3 3

Mapping of ‘Graduate Attributes’ (GAs) and ‘Program Outcomes’ (POs)

Graduate Attributes (GAs) (As per Washington Accord Accreditation): Program Outcomes (POs) (As per NBA New Delhi)

Engineering Knowledge: Apply the knowledge of mathematics, science, engineering fundamentals
and an engineering specialization to the solution of complex engineering problems.

Problem Analysis: Identify, formulate, review research literature and analyze complex
engineering problems reaching substantiated conclusions using first principles of
mathematics, natural sciences and engineering sciences.

Design/Development of solutions: Design solutions for complex engineering problems and design
system components or processes that meet the specified needs with appropriate considerations
for the public health and safety and the cultural, societal and environmental considerations.

Conduct Investigation of complex problems: Use research-based knowledge and research methods
including design of experiments, analysis and interpretation of data and synthesis of the
information to provide valid conclusions.

Modern Tool Usage: Create, select and apply appropriate techniques, resources and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.

The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequential responsibilities
relevant to the professional engineering practice.

Environment and sustainability: Understand the impact of the professional engineering solutions
in societal and environmental context and demonstrate the knowledge of and need for
sustainable development.

Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.

Individual and team work: Function effectively as an individual and as a member or leader in
diverse teams and in multidisciplinary settings.

Communication: Communicate effectively on complex engineering activities with the engineering
community and with society at large, such as being able to comprehend and write effective
reports and design documentation, make effective presentations and give and receive clear
instructions.

Project management & finance: Demonstrate knowledge and understanding of the engineering and
management principles and apply these to one's own work, as a member and leader in a team, to
manage projects and in multidisciplinary environments.

Life Long Learning: Recognize the need for and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.
REVISED BLOOMS TAXONOMY (RBT)

PROGRAM LIST

Sl. No.  Program Description

1  Sample Programs

2  Demonstrate the steps for installation of R and R Studio. Perform the following:
   a) Assign different types of values to variables and display the type of each variable.
      Assign different types such as Double, Integer, Logical, Complex and Character
      and understand the difference between each data type.
   b) Demonstrate Arithmetic and Logical Operations with simple examples.
   c) Demonstrate generation of sequences and creation of vectors.
   d) Demonstrate Creation of Matrices.
   e) Demonstrate the Creation of Matrices from Vectors using Binding Function.
   f) Demonstrate element extraction from vectors, matrices and arrays.

3  Assess the Financial Statement of an Organization being supplied with 2
   vectors of data: Monthly Revenue and Monthly Expenses for the Financial
   Year. (You can create your own sample data vector for this experiment.)
   Calculate the following financial metrics:
   a) Profit for each month.
   b) Profit after tax for each month (Tax Rate is 30%).
   c) Profit margin for each month, equal to profit after tax divided by revenue.
   d) Good Months – where the profit after tax was greater than the mean for the year.
   e) Bad Months – where the profit after tax was less than the mean for the year.
   f) The best month – where the profit after tax was max for the year.
   g) The worst month – where the profit after tax was min for the year.
   Note:
   a) All Results need to be presented as vectors.
   b) Results for Dollar values need to be calculated with $0.01 precision, but need
      to be presented in Units of $1000 (i.e., 1k) with no decimal points.
   c) Results for the profit margin ratio need to be presented in units of % with no
      decimal point.
   d) It is okay for tax to be negative for any given month (deferred tax asset).
   e) Generate CSV file for the data.

4  Develop a program to create two 3 X 3 matrices A and B and perform the
   following operations: a) Transpose of the matrix b) Addition c) Subtraction
   d) Multiplication

5  Develop a program to find the factorial of a given number using recursive
   function calls.

6  Develop an R Program using functions to find all the prime numbers up to a
   specified number by the method of Sieve of Eratosthenes.

7  The built-in data set mammals contains data on body weight versus brain weight.
   Develop R commands to:
   a) Find the Pearson and Spearman correlation coefficients. Are they similar?
   b) Plot the data using the plot command.
   c) Plot the logarithm (log) of each variable and see if that makes a difference.

8  Develop R program to create a Data Frame with following details and do the
   following operations.

   itemCode  itemCategory      itemPrice
   1001      Electronics       700
   1002      Desktop Supplies  300
   1003      Office Supplies   350
   1004      USB               400
   1005      CD Drive          800

   a) Subset the Data frame and display the details of only those items whose price
      is greater than or equal to 350.
   b) Subset the Data frame and display only the items where the category is either
      “Office Supplies” or “Desktop Supplies”.
   c) Create another Data Frame called “item-details” with three different fields
      itemCode, ItemQtyonHand and ItemReorderLvl and merge the two frames.

9  Let us use the built-in dataset airquality, which has Daily air quality
   measurements in New York, May to September 1973. Develop R program to
   generate histogram by using appropriate arguments for the following
   statements.
   a) Assigning names, using the airquality data set.
   b) Change colors of the Histogram.
   c) Remove Axis and Add labels to Histogram.
   d) Change Axis limits of a Histogram.
   e) Add Density curve to the histogram.

10 Design a data frame in R for storing about 20 employee details. Create a CSV
   file named “input.csv” that defines all the required information about the
   employee such as id, name, salary, start_date, dept. Import into R and do the
   following analysis.
   a) Find the total number of rows & columns.
   b) Find the maximum salary.
   c) Retrieve the details of the employee with maximum salary.
   d) Retrieve all the employees working in the IT Department.
   e) Retrieve the employees in the IT Department whose salary is greater than
      20000 and write these details into another file “output.csv”.

11 Use the built-in dataset mtcars, which is a popular dataset consisting of the
   design and fuel consumption patterns of 32 different automobiles. The data was
   extracted from the 1974 Motor Trend US magazine, and comprises fuel
   consumption and 10 aspects of automobile design and performance for 32
   automobiles (1973-74 models). Format: a data frame with 32 observations on
   11 variables: [1] mpg Miles/(US) gallon, [2] cyl Number of cylinders, [3] disp
   Displacement (cu.in.), [4] hp Gross horsepower, [5] drat Rear axle
   ratio, [6] wt Weight (lb/1000), [7] qsec 1/4 mile time, [8] vs V/S, [9] am
   Transmission (0 = automatic, 1 = manual), [10] gear Number of forward gears,
   [11] carb Number of carburetors.
   Develop R program to solve the following:
   a) What is the total number of observations and variables in the dataset?
   b) Find the car with the largest hp and the least hp using suitable functions.
   c) Plot histogram / density for each variable and determine whether continuous
      variables are normally distributed or not. If not, what is their skewness?
   d) What is the average difference of gross horse power (hp) between
      automobiles with 3 and 4 number of cylinders (cyl)? Also determine the
      difference in their standard deviations.
   e) Which pair of variables has the highest Pearson correlation?

12 Demonstrate the progression of salary with years of experience using a suitable
   data set (You can create your own dataset). Plot the graph visualizing the best
   fit line on the plot of the given data points. Plot a curve of Actual Values vs.
   Predicted values to show their correlation and performance of the model.
   Interpret the meaning of the slope and y-intercept of the line with respect to the
   given data. Implement using lm function. Save the graphs and coefficients in
   files. Attach the predicted values of salaries as a new column to the original
   data set and save the data as a new CSV file.

13 Additional Programs

14 Viva Questions

Sample Programs:

1. Write a program to list the distinct values in a vector from a given vector.

v = c(10, 10, 10, 20, 30, 40, 40, 40, 50)


print("Original vector:")
print(v)
print("Distinct values of the said vector:")
print(unique(v))

Output:

[1] "Original vector:"


> print(v)
[1] 10 10 10 20 30 40 40 40 50
> print("Distinct values of the said vector:")
[1] "Distinct values of the said vector:"
> print(unique(v))
[1] 10 20 30 40 50

2. Write a program to find the elements of a given vector that are not in another given
vector.

a = c(0, 10, 10, 10, 20, 30, 40, 40, 40, 50, 60)
b = c(10, 10, 20, 30, 40, 40, 50)
print("Original vector-1:")
print(a)
print("Original vector-2:")
print(b)
print("Elements of a that are not in b:")
result = setdiff(a, b)
print(result)

Output:

[1] "Original vector-1:"


> print(a)
[1] 0 10 10 10 20 30 40 40 40 50 60
> print("Original vector-2:")
[1] "Original vector-2:"
> print(b)
[1] 10 10 20 30 40 40 50
> print("Elements of a that are not in b:")
[1] "Elements of a that are not in b:"
> result = setdiff(a, b)
> print(result)
[1] 0 60
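
setdiff() treats its arguments as sets, so each missing value is reported once even if it occurs several times in the first vector. When every occurrence should be kept, logical indexing with %in% is one possible alternative. The sketch below is illustrative only; the names a2, set_result and kept are assumptions, and a second 60 is appended to the vector so that the difference becomes visible.

```r
a <- c(0, 10, 10, 10, 20, 30, 40, 40, 40, 50, 60)
b <- c(10, 10, 20, 30, 40, 40, 50)

# setdiff() returns each value of a that is absent from b exactly once
set_result <- setdiff(a, b)

# Append a second 60 (illustrative) so that duplicates matter
a2 <- c(a, 60)

# Logical indexing keeps every occurrence of a value that is absent from b
kept <- a2[!(a2 %in% b)]

print(set_result)
print(kept)
```

Here set_result holds 0 and 60, while kept holds 0, 60 and 60, because both copies of 60 survive the logical filter.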

3. Write an R program to access the last value in a given vector.

x = c(10, 20, 30, 20, 20, 25, 9, 26)


print("Original Vectors:")
print(x)
print("Access the last value of the said vector:")
print(tail(x, n=1))

Output:

[1] "Original Vectors:"


> print(x)
[1] 10 20 30 20 20 25 9 26
> print("Access the last value of the said vector:")
[1] "Access the last value of the said vector:"
> print(tail(x, n=1))
[1] 26

4. Write an R program to find the nth highest value in a given vector.

x = c(10, 20, 30, 20, 20, 25, 9, 26)


print("Original Vectors:")
print(x)
print("nth highest value in a given vector:")
print("n = 1")
n=1
print(sort(x, TRUE)[n])
print("n = 2")
n=2
print(sort(x, TRUE)[n])
print("n = 3")
n=3
print(sort(x, TRUE)[n])
print("n = 4")
n=4
print(sort(x, TRUE)[n])

Output:
[1] "Original Vectors:"
> print(x)
[1] 10 20 30 20 20 25 9 26
> print("nth highest value in a given vector:")
[1] "nth highest value in a given vector:"
> print("n = 1")
[1] "n = 1"
>n=1
> print(sort(x, TRUE)[n])
[1] 30
> print("n = 2")
[1] "n = 2"
>n=2
> print(sort(x, TRUE)[n])
[1] 26
> print("n = 3")
[1] "n = 3"
>n=3
> print(sort(x, TRUE)[n])
[1] 25
> print("n = 4")
[1] "n = 4"
>n=4
> print(sort(x, TRUE)[n])
[1] 20

5. Write an R program to create a vector of a specified type and length. Create vectors of
numeric, complex, logical and character types of length 5.

x = vector("numeric", 5)
print("Numeric Type:")
print(x)
c = vector("complex", 5)
print("Complex Type:")
print(c)
l = vector("logical", 5)
print("Logical Type:")
print(l)
chr = vector("character", 5)
print("Character Type:")
print(chr)

Output:
> x = vector("numeric", 5)
> print("Numeric Type:")
[1] "Numeric Type:"
> print(x)
[1] 0 0 0 0 0
> c = vector("complex", 5)
> print("Complex Type:")
[1] "Complex Type:"
> print(c)
[1] 0+0i 0+0i 0+0i 0+0i 0+0i
> l = vector("logical", 5)
> print("Logical Type:")
[1] "Logical Type:"
> print(l)
[1] FALSE FALSE FALSE FALSE FALSE
> chr = vector("character", 5)
> print("Character Type:")
[1] "Character Type:"
> print(chr)
[1] "" "" "" "" ""

6. Write a program to find the factors of a given number

print_factors = function(n) {
print(paste("The factors of",n,"are:"))
for(i in 1:n) {
if((n %% i) == 0) {
print(i)
}}}
print_factors(4)
print_factors(7)
print_factors(12)

Output:
> print_factors(4)
[1] "The factors of 4 are:"
[1] 1
[1] 2
[1] 4
> print_factors(7)
[1] "The factors of 7 are:"
[1] 1
[1] 7
> print_factors(12)
[1] "The factors of 12 are:"
[1] 1
[1] 2
[1] 3
[1] 4
[1] 6
[1] 12
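
The loop above prints each factor as it is found. Because the modulo operator %% is vectorised, the same test can be applied to all candidates at once and the factors returned as a vector. The helper name get_factors is hypothetical and is used here only for illustration.

```r
# Hypothetical helper: returns the factors of n as a vector instead of printing them
get_factors <- function(n) {
  candidates <- seq_len(n)              # 1, 2, ..., n
  candidates[n %% candidates == 0]      # keep only the exact divisors
}

print(get_factors(4))
print(get_factors(7))
print(get_factors(12))
```

Returning a vector makes the result reusable, for example length(get_factors(n)) == 2 is one way to test whether n is prime.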

7. Write a script that will print "Even Number" if the variable x is an even number, otherwise
print "Not Even":

x <- 3 # Change x to test


if (x%%2 == 0){
print('Even Number')
}else{
print('Not Even')
}

Output:
> x <- 3 # Change x to test
> if (x%%2 == 0){
+ print('Even Number')
+ }else{
+ print('Not Even')
+ }
[1] "Not Even"

8. Write a script that will print 'Is a Matrix' if the variable x is a matrix, otherwise print "Not
a Matrix". Hint: You may want to check out help (is.matrix)

x <- matrix()
if (is.matrix(x)){
print('Is a Matrix')
}else{
print("Not a Matrix")
}

Output:
> x <- matrix()
> if (is.matrix(x)){ print('Is a Matrix')
+ }else{
+ print("Not a Matrix")
+ }
[1] "Is a Matrix"

9. R program to find inverse of a Matrix

# Create 3 different vectors using the combine function c().


a1 <- c(3, 2, 5)
a2 <- c(2, 3, 2)
a3 <- c(5, 2, 4)
# bind the three vectors into a matrix using rbind() which is row-wise binding
A <- rbind (a1, a2, a3)
# print the original matrix
print(A)
# Use the solve() function to calculate the inverse.
T1 <- solve(A)
# print the inverse of the matrix.
print(T1)

Output:
> A <- rbind (a1, a2, a3)
> # print the original matrix
> print(A)
[,1] [,2] [,3]
a1 3 2 5
a2 2 3 2
a3 5 2 4

> # Use the solve() function to calculate the inverse.
> T1 <- solve(A)
> # print the inverse of the matrix.
> print(T1)
a1 a2 a3
[1,] -0.29629630 -0.07407407 0.4074074
[2,] -0.07407407 0.48148148 -0.1481481
[3,] 0.40740741 -0.14814815 -0.1851852
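
A quick way to check an inverse is to multiply it by the original matrix; the product should be the identity matrix up to floating-point rounding. The following sketch repeats the matrix from this program; the names check and is_identity are assumptions introduced for the example.

```r
a1 <- c(3, 2, 5)
a2 <- c(2, 3, 2)
a3 <- c(5, 2, 4)
A <- rbind(a1, a2, a3)
T1 <- solve(A)

# Matrix product of A and its inverse
check <- A %*% T1
print(round(check, 10))

# all.equal() compares with a small numerical tolerance;
# unname() drops the row/column labels before comparing with diag(3)
is_identity <- isTRUE(all.equal(unname(check), diag(3)))
print(is_identity)
```

Note that %*% is the matrix product; the plain * operator would multiply the two matrices element by element instead.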

10. R Program to find determinant of a matrix

# Create 3 different vectors.


a1 <- c(3, 2, 8)
a2 <- c(6, 3, 2)
a3 <- c(5, 2, 4)

# Bind the 3 vectors row-wise using the rbind() function.
A <- rbind(a1, a2, a3)
# determinant of matrix
print(det(A))

Output:
> # determinant of matrix
> print(det(A))
[1] -28

11. R Program to draw the sine wave

t=seq(0,10,0.1)
y=sin(t)
plot(t,y,type="l", xlab="time", ylab="Sine wave")

Output:

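library(ggplot2)
# qplot() is deprecated since ggplot2 3.4.0; ggplot() draws the same path plot
ggplot(data.frame(t, y), aes(t, y)) + geom_path() + labs(x = "time", y = "Sine wave")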

Output:

12. Write an R program to create a Data Frame

df <- data.frame(
Training = c("Strength", "Stamina", "other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# print the dataframe
print(df)

Output:
> print(df)
Training Pulse Duration
1 Strength 100 60
2 Stamina 150 30
3 other 120 45

13. Write an R program to create a summary of the above Data Frame

summary(df)

Output:
> summary(df)
   Training             Pulse          Duration
 Length:3           Min.   :100.0   Min.   :30.0
 Class :character   1st Qu.:110.0   1st Qu.:37.5
 Mode  :character   Median :120.0   Median :45.0
                    Mean   :123.3   Mean   :45.0
                    3rd Qu.:135.0   3rd Qu.:52.5
                    Max.   :150.0   Max.   :60.0

14. Write an R program to use single brackets [ ], double brackets [[ ]] or $ to access columns
from a data frame:

df <- data.frame(
Training = c("Strength", "Stamina", "other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# print the dataframe
print(df)

df[1]
df[["Training"]]
df$Training

Output:

> df[1]
Training
1 Strength
2 Stamina
3 other
> df[["Training"]]
[1] "Strength" "Stamina" "other"
> df$Training
[1] "Strength" "Stamina" "other"

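15. Write an R program to use the rbind() function to add new rows to a Data Frame:

df <- data.frame(
Training = c("Strength", "Stamina", "other"),
Pulse = c(100, 150, 120),
Duration = c(60, 30, 45)
)
# Add a new row; passing it as a data frame keeps Pulse and Duration numeric
# (a plain c("strength", 110, 110) would coerce every column to character)
New_row_DF <- rbind(df, data.frame(Training = "strength", Pulse = 110, Duration = 110))
# Print the data frame with the new row
New_row_DF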

Output:

> # Print the new row


> New_row_DF
Training Pulse Duration
1 Strength 100 60
2 Stamina 150 30
3 other 120 45
4 strength 110 110

Program 1

AIM: Demonstrate the steps for installation of R and R Studio. Perform the following:
a) Assign different type of values to variables and display the type of variable. Assign
different types such as Double, Integer, Logical, Complex and Character and understand
the difference between each data type.
b) Demonstrate Arithmetic and Logical Operations with simple examples.
c) Demonstrate generation of sequences and creation of vectors.
d) Demonstrate Creation of Matrices.
e) Demonstrate the Creation of Matrices from Vectors using Binding Function.
f) Demonstrate element extraction from vectors, matrices and arrays.

Source code:

a) Assign different type of values to variables and display the type of variable. Assign
different types such as Double, Integer, Logical, Complex and Character and
understand the difference between each data type.

# Assigning different types of values to variables


double_var <- 3.14
integer_var <- 42L
logical_var <- TRUE
complex_var <- 1 + 2i
character_var <- "Hello, World!"

# Displaying the values and types of variables


cat("Double Variable:", double_var, "- Type:", typeof(double_var), "\n")
cat("Integer Variable:", integer_var, "- Type:", typeof(integer_var), "\n")
cat("Logical Variable:", logical_var, "- Type:", typeof(logical_var), "\n")
cat("Complex Variable:", complex_var, "- Type:", typeof(complex_var), "\n")
cat("Character Variable:", character_var, "- Type:", typeof(character_var), "\n")

Output:

> cat("Double Variable:", double_var, "- Type:", typeof(double_var), "\n")

Double Variable: 3.14 - Type: double

> cat("Integer Variable:", integer_var, "- Type:", typeof(integer_var), "\n")

Integer Variable: 42 - Type: integer


> cat("Logical Variable:", logical_var, "- Type:", typeof(logical_var), "\n")

Logical Variable: TRUE - Type: logical

> cat("Complex Variable:", complex_var, "- Type:", typeof(complex_var), "\n")

Complex Variable: 1+2i - Type: complex

> cat("Character Variable:", character_var, "- Type:", typeof(character_var), "\n")

Character Variable: Hello, World! - Type: character
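
To make the difference between the data types more concrete, the short sketch below compares typeof() with class() and shows how R coerces values when types are mixed. The variable names (x, y, mixed, with_text) are illustrative assumptions and are not part of the program above.

```r
x <- 42      # a number without the L suffix is stored as a double
y <- 42L     # the L suffix requests an integer
cat(typeof(x), typeof(y), "\n")

# class() reports "numeric" for doubles, while typeof() reports the storage type
cat(class(x), class(y), "\n")

# Mixing types in c() coerces all elements to the most general type
mixed <- c(TRUE, 2L, 3.5)        # logical and integer become double
with_text <- c(1, "a")           # numbers become character
cat(typeof(mixed), typeof(with_text), "\n")

# is.*() functions test a type; as.*() functions convert between types
cat(is.integer(y), as.integer("7") + 1L, "\n")
```

The coercion order is logical, integer, double, complex, character, which is why a single string in a vector turns every element into a string.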

b) Demonstrate Arithmetic and Logical Operations with simple examples.

# Arithmetic operations
num1 <- 10
num2 <- 5

sum_result <- num1 + num2


difference_result <- num1 - num2
product_result <- num1 * num2
quotient_result <- num1 / num2
remainder_result <- num1 %% num2

# Displaying arithmetic results


cat("Arithmetic Operations:\n")
cat("Sum:", sum_result, "\n")

cat("Difference:", difference_result, "\n")
cat("Product:", product_result, "\n")
cat("Quotient:", quotient_result, "\n")
cat("Remainder:", remainder_result, "\n\n")

# Logical operations
logical_var1 <- TRUE
logical_var2 <- FALSE

and_result <- logical_var1 & logical_var2


or_result <- logical_var1 | logical_var2
not_result <- !logical_var1

# Displaying logical results


cat("Logical Operations:\n")
cat("AND:", and_result, "\n")
cat("OR:", or_result, "\n")
cat("NOT:", not_result, "\n")

Output:

# Displaying arithmetic results

> cat("Arithmetic Operations:\n")

Arithmetic Operations:

> cat("Sum:", sum_result, "\n")

Sum: 15

> cat("Difference:", difference_result, "\n")

Difference: 5

> cat("Product:", product_result, "\n")

Product: 50

> cat("Quotient:", quotient_result, "\n")

Quotient: 2

> cat("Remainder:", remainder_result, "\n\n")

Remainder: 0

# Displaying logical results

> cat("Logical Operations:\n")

Logical Operations:

> cat("AND:", and_result, "\n")

AND: FALSE

> cat("OR:", or_result, "\n")

OR: TRUE

> cat("NOT:", not_result, "\n")

NOT: FALSE

c) Demonstrate generation of sequences and creation of vectors.

print ("Sequence of numbers from 20 to 50:")


print (seq (20,50))
print ("Mean of numbers from 20 to 60:")
print (mean (20:60))
print ("Sum of numbers from 51 to 91:")
print (sum (51:91))

Output:

[1] "Sequence of numbers from 20 to 50:"


> print (seq (20,50))
[1] 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
[23] 42 43 44 45 46 47 48 49 50
> print ("Mean of numbers from 20 to 60:")
[1] "Mean of numbers from 20 to 60:"
> print (mean (20:60))
[1] 40
> print ("Sum of numbers from 51 to 91:")
[1] "Sum of numbers from 51 to 91:"
> print (sum (51:91))
[1] 2911

#Example 1
V1 = c(10, 20, 30, 40, 50, 60)
print('Content of the vector 1:')
print(V1)
#Example 2
V2 = seq(2,15)
print ("Content of the vector 2:")
print (V2)
#Example 3
V3= sample (-50:50, 10, replace=TRUE)
print ("Content of the vector 3:")

print (V3)

Output:
[1] "Content of the vector 1:"
> print (V1)
[1] 10 20 30 40 50 60
[1] "Content of the vector 2:"
> print (V2)
[1] 2 3 4 5 6 7 8 9 10 11 12 13 14 15
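
Besides the default step of 1, seq() accepts a step size or a target length, and rep() builds vectors by repetition. Since sample() draws random values, its result changes between runs unless a seed is set. The sketch below illustrates these options; the variable names and the seed value are assumptions chosen for the example.

```r
s_by  <- seq(0, 1, by = 0.25)          # fixed step size
s_len <- seq(0, 10, length.out = 5)    # fixed number of equally spaced points
r_times <- rep(c(1, 2), times = 3)     # repeat the whole vector three times
r_each  <- rep(c(1, 2), each = 3)      # repeat each element three times

print(s_by)
print(s_len)
print(r_times)
print(r_each)

# Setting a seed (the value 1 is arbitrary) makes the random vector reproducible
set.seed(1)
V3 <- sample(-50:50, 10, replace = TRUE)
print(V3)
```

Running the last three lines again with the same seed returns the same ten values.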

d) Demonstrate Creation of Matrices.

#Example 1
m1 = matrix(1:20, nrow=5, ncol=4)
print("5 × 4 matrix:")
print(m1)

#Example 2
cells = c(1,3,5,7,8,9,11,12,14)
rnames = c("Row1", "Row2", "Row3")
cnames = c("Col1", "Col2", "Col3")
m2 = matrix(cells, nrow=3, ncol=3, byrow=TRUE, dimnames=list(rnames, cnames))
print("3 × 3 matrix with labels, filled by rows: ")
print(m2)
print("3 × 3 matrix with labels, filled by columns: ")
m3 = matrix(cells, nrow=3, ncol=3, byrow=FALSE, dimnames=list(rnames, cnames))
print(m3)
Output:
[1] "5 × 4 matrix:"

> print(m1)

[,1] [,2] [,3] [,4]

[1,] 1 6 11 16
[2,] 2 7 12 17
[3,] 3 8 13 18
[4,] 4 9 14 19
[5,] 5 10 15 20

> print("3 × 3 matrix with labels, filled by rows: ")

[1] "3 × 3 matrix with labels, filled by rows: "

> print(m2)

Col1 Col2 Col3

Row1 1 3 5
Row2 7 8 9
Row3 11 12 14

> print("3 × 3 matrix with labels, filled by columns: ")

[1] "3 × 3 matrix with labels, filled by columns: "


> print(m3)

Col1 Col2 Col3

Row1 1 7 11
Row2 3 8 12

Row3 5 9 14

e) Demonstrate the Creation of Matrices from Vectors using Binding Function.

a<-c(1,2,3)
b<-c(4,5,6)
c<-c(7,8,9)
m<-cbind(a,b,c)
print("Content of the matrix:")
print(m)

Output:
[1] "Content of the matrix:"
> print(m)
     a b c
[1,] 1 4 7
[2,] 2 5 8
[3,] 3 6 9
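
cbind() places each vector in its own column; its counterpart rbind() places each vector in its own row. The sketch below shows both on the same data. The third vector is named c3 here (an assumption made for this example) so that it does not share a name with the c() function.

```r
a  <- c(1, 2, 3)
b  <- c(4, 5, 6)
c3 <- c(7, 8, 9)

m_cols <- cbind(a, b, c3)   # vectors become columns
m_rows <- rbind(a, b, c3)   # vectors become rows

print(m_cols)
print(m_rows)

# Row binding gives the transpose of column binding
print(identical(unname(m_rows), unname(t(m_cols))))
```

Both functions label the result with the names of the vectors that were bound, as column names for cbind() and as row names for rbind().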

f) Demonstrate element extraction from vectors, matrices and arrays.


# Creating a vector
my_vector <- c(1, 2, 3, 4, 5)

# Extracting the third element from the vector


element_from_vector <- my_vector[3]
print(paste("Element from vector:", element_from_vector))

# Creating a matrix
my_matrix <- matrix(1:9, nrow = 3)

# Extracting the element in the second row and third column of the matrix
element_from_matrix <- my_matrix[2, 3]
print(paste("Element from matrix:", element_from_matrix))
# Creating a 3D array
my_array <- array(1:27, dim = c(3, 3, 3))
# Extracting the element in the second row, second column, and third depth of the array
element_from_array <- my_array[2, 2, 3]
print(paste("Element from array:", element_from_array))

Output:

[1] "Element from vector: 3"

[1] "Element from matrix: 8"

[1] "Element from array: 23"
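
The program above extracts one element at a time. Indexing also accepts ranges, negative positions and logical conditions, and leaving an index empty selects a whole row or column. The following sketch reuses my_vector and my_matrix; all other names are assumptions made for illustration.

```r
my_vector <- c(1, 2, 3, 4, 5)
my_matrix <- matrix(1:9, nrow = 3)

first_three <- my_vector[1:3]             # a range of positions
without_2nd <- my_vector[-2]              # a negative index drops that position
above_two   <- my_vector[my_vector > 2]   # a logical condition selects matching elements

second_row <- my_matrix[2, ]              # whole row, returned as a vector
third_col  <- my_matrix[, 3]              # whole column, returned as a vector
sub_block  <- my_matrix[1:2, 2:3]         # 2 x 2 sub-matrix

print(first_three)
print(without_2nd)
print(above_two)
print(second_row)
print(third_col)
print(sub_block)
```

Adding drop = FALSE, as in my_matrix[2, , drop = FALSE], keeps a single row or column as a matrix rather than simplifying it to a vector.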

Program 2

AIM: Assess the Financial Statement of an Organization being supplied with 2 vectors of
data: Monthly Revenue and Monthly Expenses for the Financial Year. (You can create
your own sample data vector for this experiment.) Calculate the following financial
metrics:
a. Profit for each month.
b. Profit after tax for each month (Tax Rate is 30%).
c. Profit margin for each month, equal to profit after tax divided by revenue.
d. Good Months – where the profit after tax was greater than the mean for the year.
e. Bad Months – where the profit after tax was less than the mean for the year.
f. The best month – where the profit after tax was max for the year.
g. The worst month – where the profit after tax was min for the year.

Note:
a. All Results need to be presented as vectors
b. Results for Dollar values need to be calculated with $0.01 precision, but need to be
presented in Units of $1000 (i.e., 1k) with no decimal points
c. Results for the profit margin ratio need to be presented in units of % with no decimal
point.
d. It is okay for tax to be negative for any given month (deferred tax asset)
e. Generate CSV file for the data.

# Sample data for Monthly Revenue and Monthly Expenses


monthly_revenue <- c(50000, 55000, 60000, 58000, 62000, 65000, 70000, 75000, 78000,
80000, 85000, 90000)
monthly_expenses <- c(35000, 38000, 40000, 42000, 45000, 48000, 50000, 52000,
55000, 58000, 60000, 62000)

# Calculate Profit for each month


profit <- monthly_revenue - monthly_expenses

# Tax Rate
tax_rate <- 0.3

# Calculate Profit After Tax for each month


profit_after_tax <- profit * (1 - tax_rate)

# Calculate Profit Margin for each month


profit_margin <- round(profit_after_tax / monthly_revenue * 100, 0)

# Calculate Mean Profit After Tax for the year


mean_profit_after_tax <- round(mean(profit_after_tax), 2)

# Identify Good Months, Bad Months, Best Month, and Worst Month
good_months <- which(profit_after_tax > mean_profit_after_tax)
bad_months <- which(profit_after_tax < mean_profit_after_tax)
best_month <- which.max(profit_after_tax)
worst_month <- which.min(profit_after_tax)

# Displaying the results as vectors


results_vector <- c("Monthly Revenue" = monthly_revenue / 1000,
"Monthly Expenses" = monthly_expenses / 1000,
"Profit" = round(profit / 1000, 2),
"Profit After Tax" = round(profit_after_tax / 1000, 2),
"Profit Margin" = paste0(profit_margin, "%"),
"Mean Profit After Tax for the Year" = mean_profit_after_tax,
"Good Months" = good_months,
"Bad Months" = bad_months,
"Best Month" = best_month,
"Worst Month" = worst_month)

# Displaying the results vector


print(results_vector)

# Generate CSV file


results_data <- data.frame(Month = 1:12,
Revenue = monthly_revenue,
Expenses = monthly_expenses,
Profit = profit,
Profit_After_Tax = profit_after_tax,
Profit_Margin = profit_margin)

write.csv(results_data, "financial_results.csv", row.names = FALSE)

Output:
Monthly Revenue1 Monthly Revenue2
"50" "55"
Monthly Revenue3 Monthly Revenue4
"60" "58"
Monthly Revenue5 Monthly Revenue6
"62" "65"
Monthly Revenue7 Monthly Revenue8
"70" "75"
Monthly Revenue9 Monthly Revenue10
"78" "80"
Monthly Revenue11 Monthly Revenue12
"85" "90"
Monthly Expenses1 Monthly Expenses2
"35" "38"
Monthly Expenses3 Monthly Expenses4
"40" "42"
Monthly Expenses5 Monthly Expenses6
"45" "48"
Monthly Expenses7 Monthly Expenses8
"50" "52"
Monthly Expenses9 Monthly Expenses10
"55" "58"
Monthly Expenses11 Monthly Expenses12
"60" "62"

Profit1 Profit2
"15" "17"
Profit3 Profit4
"20" "16"
Profit5 Profit6
"17" "17"
Profit7 Profit8
"20" "23"
Profit9 Profit10
"23" "22"
Profit11 Profit12
"25" "28"

Profit After Tax1 Profit After Tax2


"10.5" "11.9"
Profit After Tax3 Profit After Tax4
"14" "11.2"
Profit After Tax5 Profit After Tax6
"11.9" "11.9"
Profit After Tax7 Profit After Tax8
"14" "16.1"
Profit After Tax9 Profit After Tax10
"16.1" "15.4"
Profit After Tax11 Profit After Tax12
"17.5" "19.6"

Profit Margin1 Profit Margin2


"21%" "22%"
Profit Margin3 Profit Margin4
"23%" "19%"
Profit Margin5 Profit Margin6
"19%" "18%"
Profit Margin7 Profit Margin8
"20%" "21%"
Profit Margin9 Profit Margin10
"21%" "19%"
Profit Margin11 Profit Margin12
"21%" "22%"

Mean Profit After Tax for the Year Good Months1


"14175" "8"
Good Months2 Good Months3
"9" "10"
Good Months4 Good Months5
"11" "12"
Bad Months1 Bad Months2
"1" "2"
Bad Months3 Bad Months4
"3" "4"
Bad Months5 Bad Months6
"5" "6"
Bad Months7 Best Month
"7" "12"
Worst Month
"1"

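Notes (b) and (c) ask for dollar values in units of $1000 with no decimal points and for margins as whole percentages, whereas the output above keeps one decimal place and combines all results into a single character vector. One possible way to keep each metric as its own numeric vector and round only for presentation is sketched below. It reuses the sample data of this program; the names ending in _k and _pct are assumptions introduced for the example.

```r
monthly_revenue <- c(50000, 55000, 60000, 58000, 62000, 65000,
                     70000, 75000, 78000, 80000, 85000, 90000)
monthly_expenses <- c(35000, 38000, 40000, 42000, 45000, 48000,
                      50000, 52000, 55000, 58000, 60000, 62000)
tax_rate <- 0.3

# Calculations are kept at $0.01 precision
profit <- monthly_revenue - monthly_expenses
profit_after_tax <- round(profit * (1 - tax_rate), 2)

# Presentation values: units of $1000 and whole percentages, no decimals
profit_k <- round(profit / 1000)
profit_after_tax_k <- round(profit_after_tax / 1000)
profit_margin_pct <- round(profit_after_tax / monthly_revenue * 100)

# Good months are identified on the precise values, not on the rounded ones
good_months <- which(profit_after_tax > mean(profit_after_tax))

print(profit_k)
print(profit_after_tax_k)
print(paste0(profit_margin_pct, "%"))
print(good_months)
```

Keeping the metrics in separate vectors avoids the coercion to character that happens when numbers and percentage strings are combined with c(). Values that fall exactly on a half, such as 10.5, follow R's round-half-to-even rule when rounded to whole numbers.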
Program 3
Develop a program to create two 3 X 3 matrices A and B and perform the following
operations:
a) Transpose of the matrix b) Addition c) Subtraction d) Multiplication

# Function to read a matrix from the user


read_matrix <- function(rows, cols) {
cat("Enter matrix elements row-wise:\n")
matrix_elements <- numeric()

for (i in 1:rows) {
row_input <- numeric()

for (j in 1:cols) {
prompt <- paste("Enter element at position [", i, ",", j, "]: ")
element <- as.numeric(readline(prompt))
row_input <- c(row_input, element)
}

matrix_elements <- c(matrix_elements, row_input)


}

matrix(matrix_elements, nrow = rows, byrow = TRUE)


}

# Read matrices A and B from the user


rows <- 3
cols <- 3

cat("Enter Matrix A:\n")


A <- read_matrix(rows, cols)

cat("\nEnter Matrix B:\n")


B <- read_matrix(rows, cols)

# Displaying matrices A and B


cat("\nMatrix A:\n")
print(A)
cat("\nMatrix B:\n")
print(B)
# Transpose of matrices A and B
cat("\nTranspose of Matrix A:\n")
print(t(A))

cat("\nTranspose of Matrix B:\n")
print(t(B))

# Addition of matrices A and B


cat("\nAddition of Matrices A and B:\n")
print(A + B)

# Subtraction of matrices A and B


cat("\nSubtraction of Matrices A and B:\n")
print(A - B)

# Multiplication of matrices A and B


cat("\nMultiplication of Matrices A and B:\n")
print(A %*% B)

Output:
> A <- read_matrix(rows, cols)

Enter matrix elements row-wise:


Enter element at position [ 1 , 1 ]: 2
Enter element at position [ 1 , 2 ]: 3
Enter element at position [ 1 , 3 ]: 4
Enter element at position [ 2 , 1 ]: 5
Enter element at position [ 2 , 2 ]: 6
Enter element at position [ 2 , 3 ]: 7
Enter element at position [ 3 , 1 ]: 7
Enter element at position [ 3 , 2 ]: 8
Enter element at position [ 3 , 3 ]: 9

> B <- read_matrix(rows, cols)

Enter matrix elements row-wise:


Enter element at position [ 1 , 1 ]: 1
Enter element at position [ 1 , 2 ]: 0
Enter element at position [ 1 , 3 ]: 0
Enter element at position [ 2 , 1 ]: 0
Enter element at position [ 2 , 2 ]: 1
Enter element at position [ 2 , 3 ]: 0
Enter element at position [ 3 , 1 ]: 0
Enter element at position [ 3 , 2 ]: 0
Enter element at position [ 3 , 3 ]: 1

> print(A)

[,1] [,2] [,3]

[1,] 2 3 4
[2,] 5 6 7
[3,] 7 8 9

Matrix B:

>B
[,1] [,2] [,3]

[1,] 1 0 0
[2,] 0 1 0
[3,] 0 0 1

> print(t(A))

[,1] [,2] [,3]

[1,] 2 5 7
[2,] 3 6 8
[3,] 4 7 9

> print(A + B)

[,1] [,2] [,3]

[1,] 3 3 4
[2,] 5 7 7
[3,] 7 8 10

> print(A %*% B)

[,1] [,2] [,3]

[1,] 2 3 4
[2,] 5 6 7
[3,] 7 8 9

> print(A - B)
[,1] [,2] [,3]

[1,] 1 3 4
[2,] 5 5 7
[3,] 7 8 8

Program 4
Develop a program to find the factorial of given number using recursive function calls.

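# Recursive function to calculate the factorial of a non-negative integer.
# It is named recursive_factorial so that it does not mask base R's factorial().
recursive_factorial <- function(n) {
  # Guard against missing or negative input, which would otherwise recurse forever
  if (is.na(n) || n < 0) {
    stop("Input must be a non-negative integer.")
  }
  if (n == 0 || n == 1) {
    return(1)
  }
  n * recursive_factorial(n - 1)
}

# Get user input for the number
num <- as.integer(readline("Enter a number to calculate factorial: "))

# Calculate factorial using the recursive function
result <- recursive_factorial(num)

# Display the result
cat("Factorial of", num, "is:", result, "\n")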

Output:
> Enter a number to calculate factorial: 5

Factorial of 5 is: 120

> Enter a number to calculate factorial: 10

Factorial of 10 is: 3628800
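
Note: A recursive factorial can be cross-checked against base R's built-in factorial(). The sketch below is a minimal illustration; fact_rec is a hypothetical name chosen for this example. It also shows that the result is stored as a double, so it overflows to Inf just above 170!.

```r
# fact_rec is a hypothetical helper used only for this illustration
fact_rec <- function(n) {
  if (n <= 1) return(1)
  n * fact_rec(n - 1)
}

# Cross-check against base R's factorial() for 0 to 20
inputs <- 0:20
recursive_results <- sapply(inputs, fact_rec)
base_results <- factorial(inputs)
cat("Matches base R:", isTRUE(all.equal(recursive_results, base_results)), "\n")

# Doubles can hold 170! but not 171!
cat("170! is finite:", is.finite(fact_rec(170)), "\n")
cat("171! is finite:", is.finite(fact_rec(171)), "\n")
```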

Program 5
Develop an R Program using functions to find all the prime numbers up to a specified
number by the method of Sieve of Eratosthenes.

# Function to find all prime numbers up to a specified number
# using the Sieve of Eratosthenes
prime_numbers <- function(n) {
  if (n >= 2) {
    x <- seq(2, n)          # candidates that have not been crossed out yet
    prime_nums <- c()
    for (i in seq(2, n)) {
      if (any(x == i)) {    # i was never crossed out, so it is prime
        prime_nums <- c(prime_nums, i)
        # Cross out every multiple of i, but keep i itself
        x <- c(x[(x %% i) != 0], i)
      }
    }
    return(prime_nums)
  } else {
    stop("Input number should be at least 2.")
  }
}
# Get user input for the specified number
limit <- as.integer(readline("Enter a number to find all prime numbers up to: "))

# Find prime numbers using the Sieve of Eratosthenes


primes <- prime_numbers(limit)

# Display the prime numbers


cat("Prime numbers up to", limit, "are:", primes, "\n")
Output:
> Enter a number to find all prime numbers up to: 10
Prime numbers up to 10 are: 2 3 5 7
> Enter a number to find all prime numbers up to: 20
Prime numbers up to 20 are: 2 3 5 7 11 13 17 19
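
Note: The Sieve of Eratosthenes is often written with a logical vector that marks each number as prime or composite, crossing out multiples starting from the square of each prime. The sketch below is one such variant, given only for comparison; sieve_primes is a hypothetical name and is not part of the program above.

```r
# sieve_primes is a hypothetical name; returns all primes <= n
sieve_primes <- function(n) {
  if (n < 2) return(integer(0))
  is_prime <- rep(TRUE, n)   # is_prime[k] says whether k is still a candidate
  is_prime[1] <- FALSE       # 1 is not prime
  p <- 2
  while (p * p <= n) {
    if (is_prime[p]) {
      # Cross out p^2, p^2 + p, ... ; smaller multiples were already removed
      is_prime[seq(p * p, n, by = p)] <- FALSE
    }
    p <- p + 1
  }
  which(is_prime)
}

cat("Prime numbers up to 20 are:", sieve_primes(20), "\n")
```

Starting at the square of each prime and stopping once the square exceeds n avoids re-testing numbers that have already been crossed out.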

Program 6
The built-in data set mammals (from the MASS package) contains data on body weight versus brain weight.
Develop R commands to:
a) Find the Pearson and Spearman correlation coefficients. Are they similar?
b) Plot the data using the plot command.
c) Plot the logarithm (log) of each variable and see if that makes a difference.

# Only needed the first time (MASS is a recommended package that normally ships with R)
install.packages("MASS", repos = "http://lib.stat.cmu.edu/R/CRAN")

# Load the mammals dataset
data("mammals", package = "MASS")
head(mammals, 10)

# a) Find the Pearson and Spearman correlation coefficients
pearson_corr <- cor(mammals$body, mammals$brain, method = "pearson")
spearman_corr <- cor(mammals$body, mammals$brain, method = "spearman")

# Print the correlation coefficients


cat("Pearson correlation coefficient:", pearson_corr, "\n")
cat("Spearman correlation coefficient:", spearman_corr, "\n")

# Check if they are similar


if (abs(pearson_corr - spearman_corr) < 0.01) {
cat("The Pearson and Spearman correlation coefficients are similar.\n")
} else {
cat("The Pearson and Spearman correlation coefficients are different.\n")
}

# b) Plot the data using the plot command


plot(mammals$body, mammals$brain,
xlab = "Body Weight", ylab = "Brain Weight",
main = "Body Weight vs Brain Weight")

# c) Plot the logarithm (log) of each variable and see if that makes a difference
plot(log(mammals$body), log(mammals$brain),
xlab = "Log Body Weight", ylab = "Log Brain Weight",
main = "Log Body Weight vs Log Brain Weight")

Note: The Pearson and Spearman correlation coefficients are two widely used measures of
the relationship between two variables. The Pearson coefficient assesses the linear
relationship between the variables, while the Spearman coefficient evaluates the
monotonic relationship.

Output:

Pearson correlation coefficient: 0.9341638

Spearman correlation coefficient: 0.9534986

The Pearson and Spearman correlation coefficients are different.

Note: Both coefficients show a strong positive correlation between the body and brain
weights of the mammals, but their values differ slightly. The Pearson coefficient measures
the strength of the linear relationship, in which the variables change at a constant rate
relative to each other. The Spearman coefficient measures the strength of the monotonic
relationship, in which the variables tend to move in the same (or opposite) direction but
not necessarily at a constant rate.
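
Note: Part (c) asks whether taking logarithms makes a difference. As a minimal sketch, assuming the MASS package is installed, both coefficients can be recomputed on the log-transformed variables. The variable names used here are illustrative.

```r
# Load the data (assumes the MASS package is available)
data("mammals", package = "MASS")

# Correlations on the raw values
pearson_raw  <- cor(mammals$body, mammals$brain, method = "pearson")
spearman_raw <- cor(mammals$body, mammals$brain, method = "spearman")

# Correlations on the log-transformed values
pearson_log  <- cor(log(mammals$body), log(mammals$brain), method = "pearson")
spearman_log <- cor(log(mammals$body), log(mammals$brain), method = "spearman")

cat("Pearson  (raw / log):", pearson_raw, "/", pearson_log, "\n")
cat("Spearman (raw / log):", spearman_raw, "/", spearman_log, "\n")
```

Because log is a strictly increasing function, it preserves the ranks of the observations, so the Spearman coefficient does not change. Only the Pearson coefficient responds to the transformation.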

Program 7
Develop R program to create a Data Frame with following details and do the following
operations.

itemCode   itemCategory       itemPrice
1001       Electronics        700
1002       Desktop Supplies   300
1003       Office Supplies    350
1004       USB                400
1005       CD Drive           800

a) Subset the Data frame and display the details of only those items whose price is greater
than or equal to 350.
b) Subset the Data frame and display only the items where the category is either “Office
Supplies” or “Desktop Supplies”.
c) Create another Data Frame called “item-details” with three different fields itemCode,
ItemQtyonHand and ItemReorderLvl and merge the two frames.

# Creating a Data Frame


items <- data.frame(
  itemCode = c(1001, 1002, 1003, 1004, 1005),
  itemCategory = c("Electronics", "Desktop Supplies", "Office Supplies", "USB",
                   "CD Drive"),
  itemPrice = c(700, 300, 350, 400, 800)
)
# Display the dataframe
cat("Original Data Frame:\n")
print(items)

# a) Subset the Data frame for items whose price is greater than or equal to 350
highPricedItems <- subset(items, itemPrice >= 350)
cat("\nHigh Priced Items:\n")
print(highPricedItems)

# b) Subset the Data frame for items with category "Office Supplies" or "Desktop Supplies"
officeDesktopItems <- subset(items,
itemCategory %in% c("Office Supplies", "Desktop Supplies"))
cat("\nOffice and Desktop Supplies:\n")
print(officeDesktopItems)

# c) Create another Data Frame "item-details"


itemDetails <- data.frame(
itemCode = c(1001, 1002, 1003, 1004, 1005),
itemQtyOnHand = c(20, 30, 25, 15, 10),
itemReorderLvl = c(5, 10, 8, 7, 3)
)
# Merge the two Data Frames based on the common column "itemCode"
mergedItems <- merge(items, itemDetails, by = "itemCode")
cat("\nMerged Data Frame:\n")
print(mergedItems)
print(Sys.Date())
print(Sys.time())

Output:

> # Display the dataframe


> cat("Original Data Frame:\n")
Original Data Frame:
> print(items)

itemCode itemCategory itemPrice


1 1001 Electronics 700
2 1002 Desktop Supplies 300
3 1003 Office Supplies 350
4 1004 USB 400
5 1005 CD Drive 800

> # a) Subset the Data frame for items whose price is greater than or equal to 350
High Priced Items:

> print(highPricedItems)
itemCode itemCategory itemPrice
1 1001 Electronics 700
3 1003 Office Supplies 350
4 1004 USB 400
5 1005 CD Drive 800

> # b) Subset the Data frame for items with category "Office Supplies" or "Desktop
Supplies"

> cat("\nOffice and Desktop Supplies:\n")


Office and Desktop Supplies:

> print(officeDesktopItems)
itemCode itemCategory itemPrice
2 1002 Desktop Supplies 300
3 1003 Office Supplies 350

> # c) Create another Data Frame "item-details"

> # Merge the two Data Frames based on the common column "itemCode"

Merged Data Frame:


itemCode itemCategory itemPrice itemQtyOnHand itemReorderLvl
1 1001 Electronics 700 20 5
2 1002 Desktop Supplies 300 30 10
3 1003 Office Supplies 350 25 8
4 1004 USB 400 15 7
5 1005 CD Drive 800 10 3
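
Note: By default, merge() keeps only the rows whose key appears in both data frames. In the program above every itemCode appears in both frames, so no rows are lost. The sketch below uses small made-up frames (item code 1006 is hypothetical) to show how the all.x and all arguments change the result when the keys do not match completely.

```r
# Small illustrative frames; 1002 has no details and 1006 has no item record
items_small <- data.frame(itemCode = c(1001, 1002, 1003),
                          itemPrice = c(700, 300, 350))
details_small <- data.frame(itemCode = c(1001, 1003, 1006),
                            itemQtyOnHand = c(20, 25, 40))

inner_merge <- merge(items_small, details_small, by = "itemCode")                # matching keys only
left_merge  <- merge(items_small, details_small, by = "itemCode", all.x = TRUE)  # keep every item
full_merge  <- merge(items_small, details_small, by = "itemCode", all = TRUE)    # keep everything

print(inner_merge)
print(left_merge)
print(full_merge)
```

Rows without a partner are filled with NA in the columns that come from the other data frame.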

Program 8
Let us use the built-in dataset airquality, which has daily air quality measurements in
New York, May to September 1973. Develop an R program to generate histograms by using
appropriate arguments for the following statements.
a) Assigning names, using the airquality data set.
b) Change colors of the Histogram
c) Remove Axis and Add labels to Histogram
d) Change Axis limits of a Histogram
e) Add Density curve to the histogram

#Load the airquality dataset


data(airquality)
# a) Assigning names using the air quality data set
names(airquality) <- c("Ozone", "Solar_Rad", "Wind", "Temp", "Month", "Day")

# Arrange the plots in a 3 x 2 grid on a single page
par(mfrow = c(3, 2))

# b) Change colors of the Histogram


hist(airquality$Ozone, col = "skyblue", main = "Histogram of Ozone Levels", xlab =
"Ozone Levels")

# c) Remove Axis and Add labels to Histogram


hist(airquality$Ozone, col = "lightgreen", axes = FALSE,
main = "Customized Histogram of Ozone Levels",
xlab = "Ozone Levels",
ylab = "Frequency")

# d) Change Axis limits of a Histogram


hist(airquality$Ozone, col = "lightcoral",
xlim = c(0, 150), ylim = c(0, 25),
xlab = "Ozone Levels", ylab = "Frequency",
main = "Histogram of Ozone Levels")
# e) Add Density curve to the histogram
non_missing_values <- airquality$Ozone[!is.na(airquality$Ozone)]
hist(non_missing_values, col = "lightblue", prob = TRUE,
main = "Histogram of Ozone Levels with Density Curve",
xlab = "Ozone Levels", ylab = "Density")

# Add density curve


lines(density(non_missing_values), col = "red", lwd = 2)
# Reset par to default settings
par(mfrow = c(1, 1))

Note: In this code, the par(mfrow = c(3, 2)) command sets up a 3x2 grid for plotting, and
par(mfrow = c(1, 1)) resets it to the default settings after all plots are done.
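
Note: par() returns the previous values of the settings it changes, so the layout can be restored exactly instead of being reset by hand. The sketch below is one way to do this, assuming a 2 x 2 grid for the four histograms. It draws to a null PDF device (pdf(NULL)) only so that it runs without a display; in an interactive session that line and the dev.off() call can be omitted. The plot titles are illustrative.

```r
data(airquality)
ozone <- airquality$Ozone[!is.na(airquality$Ozone)]

pdf(NULL)                          # assumption: null device, nothing is written to disk
old_par <- par(mfrow = c(2, 2))    # par() hands back the previous settings

hist(ozone, col = "skyblue", main = "Changed colour", xlab = "Ozone Levels")
hist(ozone, col = "lightgreen", axes = FALSE, main = "Axes removed",
     xlab = "Ozone Levels", ylab = "Frequency")
hist(ozone, col = "lightcoral", xlim = c(0, 150), ylim = c(0, 25),
     main = "Changed axis limits", xlab = "Ozone Levels")
hist(ozone, col = "lightblue", prob = TRUE, main = "With density curve",
     xlab = "Ozone Levels", ylab = "Density")
lines(density(ozone), col = "red", lwd = 2)

par(old_par)                       # restore whatever layout was active before
restored_mfrow <- par("mfrow")
invisible(dev.off())
```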

Output:

Program 9
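AIM: Design a data frame in R for storing the details of 20 employees (id, name, salary,
start_date, dept), write it to a CSV file named "input.csv", read the file back into R
and do the following analysis:
a) Find the total number of rows and columns.
b) Find the maximum salary.
c) Retrieve the details of the employee with the maximum salary.
d) Retrieve all the employees working in the IT Department.
e) Retrieve the employees in the IT Department whose salary is greater than 20000 and
write these details into another file "output.csv".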

# Create a sample employee data frame


employee_data <- data.frame(
id = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20),
name = c("John", "Jane", "Bob", "Alice", "Eva", "Mike", "Sara", "Tom", "Linda",
"Chris",
"David", "Sophie", "Alex", "Emily", "Peter", "Olivia", "Ryan", "Mia", "Daniel",
"Grace"),
salary = c(50000, 60000, 75000, 90000, 80000, 70000, 85000, 95000, 55000, 72000,
68000, 82000, 60000, 55000, 75000, 88000, 92000, 60000, 80000, 77000),
start_date = as.Date(c("2022-01-01", "2021-11-15", "2022-02-10", "2020-05-20",
"2021-03-10", "2022-04-01", "2021-09-03", "2020-12-15",
"2021-01-10", "2022-06-05", "2021-08-10", "2020-10-15",
"2022-03-02", "2021-02-28", "2021-07-12", "2020-06-25",
"2022-05-10", "2021-04-05", "2022-03-15", "2020-11-30")),
dept = c("IT", "HR", "Finance", "IT", "Marketing", "IT", "Finance", "IT", "HR", "IT",
"Finance", "Marketing", "IT", "HR", "IT", "Marketing", "Finance", "IT",
"Marketing", "IT")
)

# Export the data frame to a CSV file


write.csv(employee_data, file = "input.csv", row.names = FALSE)

# Read the CSV file into an R data frame


employee_data <- read.csv("input.csv")

# a) Find the total number of rows & columns


num_rows <- nrow(employee_data)
num_cols <- ncol(employee_data)
cat("Total number of rows:", num_rows, "\n")
cat("Total number of columns:", num_cols, "\n")

# b) Find the maximum salary


max_salary <- max(employee_data$salary)
cat("Maximum salary:", max_salary, "\n")

# c) Retrieve the details of the employee with the maximum salary


employee_max_salary <- employee_data[employee_data$salary == max_salary, ]
cat("Details of employee with maximum salary:\n")
print(employee_max_salary)

# d) Retrieve all the employees working in the IT Department


it_department <- subset(employee_data, dept == "IT")
cat("Employees working in the IT Department:\n")
print(it_department)

# e) Retrieve the employees in the IT Department whose salary is greater than 20000
it_high_salary <- subset(it_department, salary > 20000)

# Write these details into another file "output.csv"


write.csv(it_high_salary, file = "output.csv", row.names = FALSE)
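
Note: A CSV file stores dates as plain text, so a Date column written with write.csv() does not come back as a Date when the file is read with read.csv(). The sketch below shows the round trip on three of the employees above and converts the column back with as.Date(). It writes to a temporary file (an assumption made so that the example does not overwrite input.csv).

```r
# Three rows taken from the employee data above
emp <- data.frame(
  id = c(1, 2, 3),
  name = c("John", "Jane", "Bob"),
  salary = c(50000, 60000, 75000),
  start_date = as.Date(c("2022-01-01", "2021-11-15", "2022-02-10")),
  dept = c("IT", "HR", "Finance")
)

csv_path <- tempfile(fileext = ".csv")   # temporary file used only for this sketch
write.csv(emp, file = csv_path, row.names = FALSE)

emp_in <- read.csv(csv_path)
cat("Class after read.csv:", class(emp_in$start_date), "\n")

# Convert the text back into Date values
emp_in$start_date <- as.Date(emp_in$start_date)
cat("Class after as.Date: ", class(emp_in$start_date), "\n")
```

Once the column is a Date again, comparisons such as emp_in$start_date >= as.Date("2022-01-01") work as expected.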

Output:

# a) Find the total number of rows & columns
> cat("Total number of rows:", num_rows, "\n")
Total number of rows: 20
> cat("Total number of columns:", num_cols, "\n")
Total number of columns: 5

# b) Find the maximum salary


> max_salary <- max(employee_data$salary)
> cat("Maximum salary:", max_salary, "\n")
Maximum salary: 95000

# c) Retrieve the details of the employee with the maximum salary


> cat("Details of employee with maximum salary:\n")
Details of employee with maximum salary:
> print(employee_max_salary)
id name salary start_date dept
8 Tom 95000 2020-12-15 IT

# d) Retrieve all the employees working in the IT Department


Employees working in the IT Department:
> print(it_department)
id name salary start_date dept
1 John 50000 2022-01-01 IT
4 Alice 90000 2020-05-20 IT
6 Mike 70000 2022-04-01 IT
8 Tom 95000 2020-12-15 IT
10 Chris 72000 2022-06-05 IT
13 Alex 60000 2022-03-02 IT
15 Peter 75000 2021-07-12 IT
18 Mia 60000 2021-04-05 IT
20 Grace 77000 2020-11-30 IT

Program 10
AIM: Use the built-in dataset mtcars, a popular dataset describing the design and fuel
consumption patterns of 32 different automobiles. The data was extracted from the 1974
Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile
design and performance for 32 automobiles (1973-74 models).

Format: a data frame with 32 observations on 11 variables:
[1] mpg Miles/(US) gallon
[2] cyl Number of cylinders
[3] disp Displacement (cu.in.)
[4] hp Gross horsepower
[5] drat Rear axle ratio
[6] wt Weight (lb/1000)
[7] qsec 1/4 mile time
[8] vs V/S
[9] am Transmission (0 = automatic, 1 = manual)
[10] gear Number of forward gears
[11] carb Number of carburetors

Develop R program, to solve the following:

a) What is the total number of observations and variables in the dataset?


b) Find the car with the largest hp and the least hp using suitable functions
c) Plot histogram / density for each variable and determine whether continuous variables
are normally distributed or not. If not, what is their skewness?
d) What is the average difference of gross horse power(hp) between automobiles with 3
and 4 number of cylinders(cyl)? Also determine the difference in their standard
deviations.
e) Which pair of variables has the highest Pearson correlation?

# Load the mtcars dataset


data(mtcars)
# a) What is the total number of observations and variables in the dataset?
num_observations <- nrow(mtcars)
num_variables <- ncol(mtcars)
cat("Total number of observations:", num_observations, "\n")
cat("Total number of variables:", num_variables, "\n")

# b) Find the car with the largest hp and the least hp


car_max_hp <- mtcars[which.max(mtcars$hp), ]
car_min_hp <- mtcars[which.min(mtcars$hp), ]
cat("Car with the largest hp:\n")
print(car_max_hp)
cat("Car with the least hp:\n")
print(car_min_hp)

# c) Plot histogram / density for each variable


par(mfrow = c(3, 4)) # Set up a 3x4 grid for plotting

for (col in names(mtcars)) {


hist(mtcars[, col], main = col, col = "lightblue", xlab = col)
}
# Reset par to default settings
par(mfrow = c(1, 1))
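
Note: Part (c) also asks whether the continuous variables are normally distributed and, if not, what their skewness is. The histograms give a visual impression; the sketch below adds two numerical checks. It assumes that mpg, disp, hp, drat, wt and qsec are treated as the continuous variables, uses the moment-based definition of skewness, and uses shapiro.test() from base R as the normality test. sample_skewness is a hypothetical helper written for this example.

```r
data(mtcars)

# Moment-based skewness: third central moment divided by the
# second central moment raised to the power 1.5
sample_skewness <- function(x) {
  x <- x[!is.na(x)]
  centred <- x - mean(x)
  mean(centred^3) / (mean(centred^2))^1.5
}

# Assumption: these six columns are treated as continuous
continuous_vars <- c("mpg", "disp", "hp", "drat", "wt", "qsec")

normality_report <- data.frame(
  variable  = continuous_vars,
  skewness  = sapply(continuous_vars, function(v) sample_skewness(mtcars[[v]])),
  shapiro_p = sapply(continuous_vars, function(v) shapiro.test(mtcars[[v]])$p.value),
  row.names = NULL
)
print(normality_report)
```

A skewness near 0 indicates a roughly symmetric distribution, a positive value a longer right tail, and a negative value a longer left tail. A small Shapiro-Wilk p-value (for example below 0.05) is evidence against normality.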
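# d) Average difference of gross horsepower (hp) between automobiles with 3 and 4 cylinders (cyl)
# Note: mtcars contains only 4-, 6- and 8-cylinder cars, so the cyl == 3 subset is empty
# and the two results below are NaN and NA.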
average_diff_hp <- mean(mtcars$hp[mtcars$cyl == 3]) - mean(mtcars$hp[mtcars$cyl ==
4])
std_diff_hp <- sd(mtcars$hp[mtcars$cyl == 3]) - sd(mtcars$hp[mtcars$cyl == 4])
cat("Average difference of hp between 3 and 4 cylinders:", average_diff_hp, "\n")
cat("Difference in standard deviations of hp between 3 and 4 cylinders:", std_diff_hp,
"\n")
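
Note: The cyl column of mtcars takes only the values 4, 6 and 8, so a comparison that involves 3-cylinder cars has no data to work with. The sketch below checks the available values first and then compares two groups that do exist. The choice of 4 and 6 cylinders is an assumption made purely for illustration, and compare_hp is a hypothetical helper that is not part of the original program.

```r
data(mtcars)

# Which cylinder counts are actually present?
cat("Cylinder counts in mtcars:", sort(unique(mtcars$cyl)), "\n")

# compare_hp: difference in mean and standard deviation of hp between two groups
compare_hp <- function(data, group_col, g1, g2) {
  hp1 <- data$hp[data[[group_col]] == g1]
  hp2 <- data$hp[data[[group_col]] == g2]
  if (length(hp1) == 0 || length(hp2) == 0) {
    stop("One of the requested groups has no observations.")
  }
  c(mean_diff = mean(hp1) - mean(hp2),
    sd_diff   = sd(hp1) - sd(hp2))
}

# Assumption: compare 4-cylinder and 6-cylinder cars
hp_comparison <- compare_hp(mtcars, "cyl", 4, 6)
print(hp_comparison)
```

The same helper can be called with "gear" as the grouping column, for example compare_hp(mtcars, "gear", 3, 4), if the comparison is meant to be between cars with 3 and 4 forward gears.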

# e) Which pair of variables has the highest Pearson correlation?


cor_matrix <- cor(mtcars)
max_correlation <- max(cor_matrix[upper.tri(cor_matrix)])
indices <- which(cor_matrix == max_correlation, arr.ind = TRUE)
cat("Pair of variables with the highest Pearson correlation:\n")
print(names(mtcars)[indices[, 2:1]])
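
Note: A correlation matrix is symmetric, so searching the whole matrix for the maximum finds each pair twice (once as [i, j] and once as [j, i]). One way to report the pair only once, sketched below, is to blank out the lower triangle and the diagonal before searching. The variable names are illustrative.

```r
data(mtcars)

cor_matrix <- cor(mtcars)

# Keep only the upper triangle so that each pair appears exactly once
cor_upper <- cor_matrix
cor_upper[lower.tri(cor_upper, diag = TRUE)] <- NA

max_cor <- max(cor_upper, na.rm = TRUE)
idx <- which(cor_upper == max_cor, arr.ind = TRUE)

best_pair <- c(rownames(cor_upper)[idx[1, "row"]],
               colnames(cor_upper)[idx[1, "col"]])
cat("Pair with the highest Pearson correlation:", best_pair,
    "(r =", round(max_cor, 3), ")\n")
```

To find the strongest relationship regardless of sign, the same search can be run on abs(cor_upper).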

Output:

Total number of observations: 32

Total number of variables: 11

> # b) Find the car with the largest hp and the least hp

Car with the largest hp:

mpg cyl disp hp drat wt qsec vs am gear carb

Maserati Bora 15 8 301 335 3.54 3.57 14.6 0 1 5 8

Car with the least hp:

mpg cyl disp hp drat wt qsec vs am gear carb

Honda Civic 30.4 4 75.7 52 4.93 1.615 18.52 1 1 4 2


> # c) Plot histogram / density for each variable

# d) Average difference of gross horsepower (hp) between automobiles with 3 and 4
cylinders (cyl)

Average difference of hp between 3 and 4 cylinders: NaN


Difference in standard deviations of hp between 3 and 4 cylinders: NA
# e) Which pair of variables has the highest Pearson correlation?
Pair of variables with the highest Pearson correlation:

[1] "cyl" "disp" "disp" "cyl"


Program 11
AIM: Demonstrate the progression of salary with years of experience using a suitable
data set (You can create your own dataset). Plot the graph visualizing the best fit line on
the plot of the given data points. Plot a curve of Actual Values vs. Predicted values to
show their correlation and performance of the model.
Interpret the meaning of the slope and y-intercept of the line with respect to the given
data. Implement using lm function. Save the graphs and coefficients in files. Attach the
predicted values of salaries as a new column to the original data set and save the data as a
new CSV file.

# Set seed for reproducibility


set.seed(123)

# Create a synthetic dataset


years_of_experience <- seq(1, 20, by = 1)
salary <- 30000 + 1500 * years_of_experience + rnorm(length(years_of_experience),
mean = 0, sd = 3000)

# Combine into a data frame


salary_data <- data.frame(Experience = years_of_experience, Salary = salary)
print (salary_data)

# Fit a linear regression model


lm_model <- lm(Salary ~ Experience, data = salary_data)

# Get model coefficients


coefficients <- coef(lm_model)

# Plot the data points and the best-fit line


plot(salary_data$Experience, salary_data$Salary,
main = "Salary vs Years of Experience",
xlab = "Years of Experience", ylab = "Salary", col = "blue", pch = 16)
abline(lm_model, col = "red", lwd = 2)

# Save the plot


png("salary_vs_experience_plot.png")
plot(salary_data$Experience, salary_data$Salary,
main = "Salary vs Years of Experience",
xlab = "Years of Experience", ylab = "Salary", col = "blue", pch = 16)
abline(lm_model, col = "red", lwd = 2)
dev.off()

# Plot Actual vs Predicted values


predicted_values <- predict(lm_model, newdata = salary_data)
plot(salary_data$Salary, predicted_values, main = "Actual vs Predicted Values",
xlab = "Actual Salary", ylab = "Predicted Salary", col = "blue", pch = 16)

# Save the plot


png("actual_vs_predicted_plot.png")
plot(salary_data$Salary, predicted_values, main = "Actual vs Predicted Values", xlab =
"Actual Salary", ylab = "Predicted Salary", col = "blue", pch = 16)
dev.off()

# Interpretation of slope and y-intercept


cat("Interpretation of the linear regression model:\n")
cat("Slope (Coefficient for Experience):", coefficients["Experience"], "\n")
cat("Y-Intercept:", coefficients["(Intercept)"], "\n")

# Attach predicted values as a new column to the original data set


salary_data$Predicted_Salary <- predicted_values

# Save the data with predicted values as a new CSV file


write.csv(salary_data, file = "salary_data_with_predictions.csv", row.names = FALSE)
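
Note: The aim also asks for the coefficients to be saved in a file and for the slope and y-intercept to be interpreted. The sketch below is one way to do both. It rebuilds the same synthetic data so that it runs on its own, and writes to a file in the temporary directory; the file name salary_model_coefficients.csv is an assumption made for this example.

```r
# Rebuild the same synthetic data set and model
set.seed(123)
years_of_experience <- seq(1, 20, by = 1)
salary <- 30000 + 1500 * years_of_experience +
  rnorm(length(years_of_experience), mean = 0, sd = 3000)
salary_data <- data.frame(Experience = years_of_experience, Salary = salary)
lm_model <- lm(Salary ~ Experience, data = salary_data)

# Put the coefficients into a small table and save it
coef_table <- data.frame(term = names(coef(lm_model)),
                         estimate = unname(coef(lm_model)))
coef_path <- file.path(tempdir(), "salary_model_coefficients.csv")  # assumed file name
write.csv(coef_table, file = coef_path, row.names = FALSE)

# Interpret the fitted line
slope <- coef(lm_model)[["Experience"]]
intercept <- coef(lm_model)[["(Intercept)"]]
r_squared <- summary(lm_model)$r.squared

cat("Each additional year of experience raises the predicted salary by about",
    round(slope, 2), "\n")
cat("The predicted salary at zero years of experience is about",
    round(intercept, 2), "\n")
cat("R-squared of the fit:", round(r_squared, 3), "\n")
```

The slope is the expected change in salary for one more year of experience, and the y-intercept is the salary the model predicts for someone with no experience. R-squared is the proportion of the variation in salary that the line explains.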

Output:
salary_data
Experience Salary Predicted_Salary
1 1 29818.57 32382.14
2 2 32309.47 33834.01
3 3 39176.12 35285.88
4 4 36211.53 36737.74
5 5 37887.86 38189.61
6 6 44145.19 39641.47
7 7 41882.75 41093.34
8 8 38204.82 42545.21
9 9 41439.44 43997.07
10 10 43663.01 45448.94
11 11 50172.25 46900.80
12 12 49079.44 48352.67
13 13 50702.31 49804.54
14 14 51332.05 51256.40
15 15 50832.48 52708.27
16 16 59360.74 54160.13
17 17 56993.55 55612.00
18 18 51100.15 57063.87
19 19 60604.07 58515.73
20 20 58581.63 59967.60

> # Interpretation of slope and y-intercept
Interpretation of the linear regression model:
Slope (Coefficient for Experience): 1451.866
Y-Intercept: 30930.28

Additional Programs
1. Write an R Program to find the Factors of a Number.
2. Write an R Program to find the Factorial of a Number.
3. Write an R Program to find the Factorial of a Number Using Recursion.
4. Write an R program to convert Decimal into Binary using Recursion.
5. Write an R Program to check Armstrong Number.
6. Write an R Program to add Two Vectors.
7. Write an R Program to print the Fibonacci Sequence Using Recursion.
8. Write an R Program to print the Fibonacci Sequence.
9. Write an R Program to find H.C.F. or G.C.D.
10. Write an R "Hello World" Program.
11. Write an R Program to check for Leap Year.
12. Write an R Program to find L.C.M.
13. Write an R Program to find Minimum and Maximum value in a Vector.
14. Write an R Program to print Multiplication Table
15. Write an R Program to generate Random Number from Standard Distributions
16. Write an R Program to sample from a Population
17. Write an R Program to make a Simple Calculator
18. Write an R Program to sort a Vector
19. Write an R Program to find the Sum, Mean and Product of a Vector
20. Write an R Program to find Sum of Natural Numbers Using Recursion
21. Write an R Program to find the Sum of Natural Numbers

Viva Questions:

1. What are the main features of R?


2. What is the difference between R and other programming languages?
3. Explain the basics of R programming syntax.
4. How do you create variables and assign values in R?
5. Explain the different data types in R and how to handle them.
6. How will you import and export data in R?
7. What is a data frame in R?
8. How will you perform basic statistical analysis in R?
9. What are the different data visualization techniques in R?
10. Explain the concept of packages in R and how to use them.
11. What are the different control structures in R and how are they used?
12. Explain the concept of functions in R and how to write them.
13. Explain the concept of object-oriented programming in R.
14. What are the different classes in R and how do you create them?
15. Explain the concept of exception handling in R.
16. Explain the concept of regular expressions in R and how to use them.
17. Explain the concept of time series analysis in R.
18. What is Shiny in R and how is it used for web development?
19. Write a function in R to calculate the mean of a given vector of numbers.
20. How will you create a scatter plot of two given vectors of numeric data in R?
21. How will you create a histogram of a given vector of numeric data in R?
22. How will you generate random numbers from a specified distribution in R?
23. How will you calculate the standard deviation of a given vector of numbers in R?
24. How will you compute the correlation between two given vectors of numeric data
in R?
25. How will you create a bar plot of a given vector of categorical data in R?
