
Institute’s Vision

To be an organisation with potential for excellence in engineering and management for the advancement of society and humankind.

Institute’s Mission

To excel in academics, practical engineering, management and to commence research endeavours.

To prepare students for future opportunities.

To nurture students with social and ethical responsibilities.


Department’s Vision

To create IT graduates with ethical and employable skills.

Department’s Mission

To imbibe problem solving and analytical skills through teaching learning process.
To impart technical and managerial skills to meet the industry requirement.
To encourage ethical and value-based education.
Excelsior Education Society's
K. C. COLLEGE OF ENGINEERING
AND MANAGEMENT STUDIES AND
RESEARCH THANE (EAST).

Certificate

This is to certify that Mr. / Ms. Snehal S. Fadale

of Semester VIII Branch I.T. Roll No. 04

has performed and successfully completed all the practicals in the subject
of R Programming Lab
for the academic year 2020 to 2021 as prescribed by University of Mumbai.

DATE: - 19/05/2021.

Practical In charge Internal Examiner

__ __ __ _ __ __ __

Head of Department External Examiner

COLLEGE SEAL
Program Outcomes

Engineering Graduates will be able to:

1. Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.

2. Problem analysis: Identify, formulate, review research literature, and analyze complex engineering problems reaching substantiated conclusions using first principles of mathematics, natural sciences, and engineering sciences.

3. Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety, and the cultural, societal, and environmental considerations.

4. Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions.

5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern engineering and IT tools including prediction and modelling to complex engineering activities with an understanding of the limitations.

6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.

7. Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts, and demonstrate the knowledge of, and need for sustainable development.

8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and norms of the engineering practice.

9. Individual and team work: Function effectively as an individual, and as a member or leader in diverse teams, and in multidisciplinary settings.

10. Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as, being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.

11. Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one’s own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.

12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in independent and life-long learning in the broadest context of technological change.
Department of Information Technology

Subject: R Programming Lab

Semester: VIII

Class: BE

Course Outcomes / Lab Outcomes

Course Code (ITL804)    Course Outcomes

At the end of the experiments, the student will be able to:

ITL804.1    Install and use R for simple programming tasks
ITL804.2    Extend the functionality of R by using add-on packages
ITL804.3    Extract data from files and other sources and perform various data manipulation tasks on them.
ITL804.4    Code statistical functions in R
ITL804.5    Use R Graphics and Tables to visualize results of various statistical operations on data
ITL804.6    Apply the knowledge of R gained to data Analytics for real life applications.
Rubrics for Practical

Rubrics Description | Maximum Marks Weight | 15-12 | 12-9 | 9-6 | 6-0
Implementation (R1) | 5 | Successful completion with accurate output (5-4) | Output correct but not precise (4-3) | Few errors in the output (3-2) | Incorrect output (2-0)
Understanding (R2) | 5 | Understanding of the experiment and correct conclusion drawn (5-4) | Understands the experiment but conclusion less accurate (4-3) | Improper conclusion (3-2) | No conclusion (2-0)
Punctuality and Discipline (R3) | 5 | Submission within a week (5-4) | Submission after a week (4-3) | Submission after two weeks (3-2) | Submission after three weeks and more (2-0)
CONTENTS

Sr. No. | Name of Experiment | Date of Conduction | Date of Submission
1 | Installation of R, Exploration of R Tools. | 10/3/2021 | 17/3/2021
2 | Perform Basic Operation On R: Operators; Built-in Functions; Data Structures in R; Data Types - Vectors; R Data Structures - Matrices | 17/3/2021 | 31/3/2021
3 | Grouping Loops and Conditional Execution. | 31/3/2021 | 7/4/2021
4 | Exploratory data analysis using data Frames | 7/4/2021 | 14/4/2021
5 | Graphics in R (Graphs, plot) | 14/4/2021 | 21/4/2021
6 | Regression and Correlation | 21/4/2021 | 28/4/2021
7 | Mini Project | 28/4/2021 | 5/5/2021
Total Grade / Marks:

Avg. marks of Experiments Avg. marks of Assignments


(A) (B) Total Marks

(A+B)
Obtained Out of Obtained Out of

Practical Incharge Date


EXPERIMENT NO. - 1

Aim of the experiment: Installation of R, Exploration of R Tools.

Lab Outcome:
ITL804.1: - Install and use R for simple programming tasks
ITL804.2: - Extend the functionality of R by using add-on packages
ITL804.3: - Extract data from files and other sources and perform various data manipulation tasks on them.
Date of Conduction: 10/3/2021 Date of Submission: 17/3/2021

Implementation Understanding Punctuality &Discipline Total


(5) (5) (5) (15)

Practical Incharge
EXPERIMENT NO. 1

AIM: Installation of R, Exploration of R Tools.

THEORY:

What is R?
R is a language and environment for statistical computing and graphics. It is a GNU project which
is similar to the S language and environment which was developed at Bell Laboratories (formerly
AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a
different implementation of S. There are some important differences, but much code written for S
runs unaltered under R.

The R environment
R is an integrated suite of software facilities for data manipulation, calculation and graphical
display. It includes
• an effective data handling and storage facility,
• a suite of operators for calculations on arrays, in particular matrices,
• a large, coherent, integrated collection of intermediate tools for data analysis,
• graphical facilities for data analysis and display either on-screen or on hardcopy, and
• a well-developed, simple and effective programming language which includes
conditionals, loops, user-defined recursive functions and input and output facilities.
Installation of R and R Studio:

1. Install R (https://www.r-project.org/)
2. Install R Studio (https://rstudio.com/products/rstudio/download/)
3. Initial Startup

4. Exploration of R Tools:
5. Installing Packages
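
Packages can also be installed from the R console instead of the RStudio menus. The following is a minimal sketch, assuming a helper called ensure_package (a hypothetical name chosen for this illustration, not part of R). It installs a package only when it is missing and then attaches it. The demonstration uses stats, which ships with R, so nothing is downloaded.

```r
# ensure_package() is a hypothetical helper written for this illustration.
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE when the package is not installed.
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE lets library() accept the name as a string.
  library(pkg, character.only = TRUE)
}

print(R.version.string)   # version of the running R installation
ensure_package("stats")   # already installed, so it is only attached
```

Calling ensure_package("ggplot2") in the same way would fetch the package from CRAN the first time it is needed.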
CONCLUSION: Thus, we have successfully installed R and R Studio.

LAB OUTCOME:
ITL804.1: - Install and use R for simple programming tasks
ITL804.2: - Extend the functionality of R by using add-on packages
ITL804.3: - Extract data from files and other sources and perform various data manipulation tasks on them.
EXPERIMENT NO. - 2

Aim of the experiment: Perform Basic Operation On R

Operators
Built-in Functions
Data Structures in R
Data Types - Vectors
R Data Structures - Matrices

Lab Outcome:
ITL804.1: - Install and use R for simple programming tasks
ITL804.2: - Extend the functionality of R by using add-on packages
ITL804.3: - Extract data from files and other sources and perform various data manipulation tasks on them.

Date of Conduction: 17/3/2021 Date of Submission: 31/3/2021

Implementation Understanding Punctuality &Discipline Total


(5) (5) (5) (15)

Practical Incharge
Experiment No. 2
AIM: Perform Basic Operation On R.

✔ Operators
✔ Built-in Functions
✔ Data Structures in R
✔ Data Types - Vectors
✔ R Data Structures - Matrices
THEORY:
What Are Operators In R?

Operator            Function                                       Example

+ - * / %% ^        arithmetic                                     1+1; 2*9; 3-7
> >= < <= == !=     relational
! & |               logical
~                   model formulae
<- -> =             assignment                                     a <- 5, x <- 6 or x = 6
$                   list indexing (the ‘element name’ operator)
:                   create a sequence                              10:20
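
The operator families in the table can be tried directly at the console. The snippet below is a small sketch; the variable names a, b, s and lst and their values are chosen here purely for illustration.

```r
a <- 7
b <- 2

a + b               # arithmetic: addition
a %% b              # arithmetic: remainder of 7 divided by 2
a ^ b               # arithmetic: exponentiation

a >= b              # relational: TRUE
a != b              # relational: TRUE

(a > 5) & (b > 5)   # logical AND: FALSE
(a > 5) | (b > 5)   # logical OR: TRUE
!(a > 5)            # logical NOT: FALSE

s <- 10:20          # the colon operator creates a sequence
length(s)           # 11 elements

lst <- list(x = 1, y = "two")
lst$y               # `$` extracts a list element by name
```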

What Are Built In Functions In R ?

Function                  Meaning
log(x):                   log to base e of x
exp(x):                   antilog of x (e^x)
log(x, n):                log to base n of x
log10(x):                 log to base 10 of x
sqrt(x):                  square root of x
factorial(x):             x!
floor(x):                 greatest integer <= x
ceiling(x):               smallest integer >= x
trunc(x):                 closest integer to x between x and 0
round(x, digits=0):       round the value of x to an integer
signif(x, digits=6):      give x to 6 significant digits
runif(n):                 generates n random numbers between 0 and 1 from a uniform distribution
cos(x), sin(x), tan(x):   cosine, sine and tangent of x (x in radians)
abs(x):                   the absolute value of x, ignoring the minus sign if there is one

> log(10)
[1] 2.302585
> exp(1)
[1] 2.718282
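
A few of the built-in functions are easier to tell apart when applied to the same number. This is an illustrative sketch; the value 2.567 and the seed are arbitrary choices made here.

```r
x <- 2.567

floor(x)                # 2
ceiling(x)              # 3
trunc(-x)               # -2, truncation moves toward 0
round(x, digits = 1)    # 2.6
signif(x, digits = 2)   # 2.6, two significant digits
sqrt(16)                # 4
factorial(5)            # 120
log(100, 10)            # 2, log of 100 to base 10
abs(-3.5)               # 3.5

set.seed(1)             # fix the seed so the random draws are repeatable
u <- runif(3)           # three values between 0 and 1
print(u)
```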

R Data Structures

• Data Type – Vectors: Vectors are variables holding one or more values of the
same type: logical, integer, double (real), complex, character (string). A vector can also have length 0.
• a <- 1.3
• a <- 5:10
• a <- c(5, 6, 7, 8, 9, 10) using the concatenation function c

Example: Creating a vector: A is of length 5 and B is of length 2. B is recycled to match the length of A, and R warns that the longer length is not a multiple of the shorter one.

A <- 1:5
B <- c(6,7)
A*B
[1] 6 14 18 28 30
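
The five-element result comes from R's recycling rule. The sketch below assumes A has length 5 and B has length 2, and spells out the recycled vector explicitly; B_recycled and product are names introduced here for illustration.

```r
A <- 1:5
B <- c(6, 7)

# B is recycled to 6, 7, 6, 7, 6 before the element-wise product.
# R emits a warning because 5 is not a multiple of 2, so it is
# suppressed here to keep the output clean.
product <- suppressWarnings(A * B)
print(product)

# Writing the recycled vector out explicitly gives the same result.
B_recycled <- rep(B, length.out = length(A))
identical(product, A * B_recycled)
```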

• Types Of Vectors:

1. Numeric vector : a <- c(4,3,6.3,6,-8,9) (by default the type is double)
2. Character vector : b <- c("nine","two","eight")
3. Logical vector : c <- c(TRUE,TRUE,TRUE,FALSE,TRUE,FALSE)

Vector Functions:

Operation       Meaning
max(x):         maximum value in x
min(x):         minimum value in x
sum(x):         total of all the values in x
mean(x):        arithmetic average of the values in x
median(x):      median value in x
range(x):       vector of min(x) and max(x)
var(x):         sample variance of x
cor(x,y):       correlation between vectors x and y
sort(x):        a sorted version of x
rank(x):        vector of the ranks of the values in x
order(x):       an integer vector containing the permutation to sort x into ascending order

Example:
Output:

Sales   Ranks   Sorted   Ordered
100     4       25       6
50      2       50       2
75      3       75       3
150     5       100      1
200     6       150      4
25      1       200      5
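
The table above can be reproduced with rank(), sort() and order(). This is a sketch that assumes the Sales column is the input vector; the names sales and result are introduced here.

```r
sales <- c(100, 50, 75, 150, 200, 25)

result <- data.frame(
  Sales   = sales,
  Ranks   = rank(sales),    # position each value would take after sorting
  Sorted  = sort(sales),    # values in ascending order
  Ordered = order(sales)    # indices that would sort the vector
)
print(result)

# order() and sort() are linked: indexing by order() reproduces sort().
identical(sales[order(sales)], sort(sales))
```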

Matrices:

A matrix is a vector with two additional attributes: the number of rows and the number of columns.

All columns in a matrix must have the same mode (numeric, character, etc.) and the same
length.

• A matrix is created with the matrix() function:

mymatrix <- matrix(vector, nrow=number_of_rows, ncol=number_of_columns,
                   byrow=logical_value,
                   dimnames=list(char_vector_rownames, char_vector_colnames))

Example: y <- matrix(c(1,2,3,4),nrow=2,ncol=2)

     [,1] [,2]
[1,]    1    3
[2,]    2    4
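
By default matrix() fills column by column, which is why 3 appears in the first row of y. The sketch below shows the effect of the byrow and dimnames arguments; the row and column labels are made up for illustration.

```r
# byrow = TRUE fills the matrix row by row instead of column by column.
m <- matrix(c(1, 2, 3, 4), nrow = 2, ncol = 2, byrow = TRUE,
            dimnames = list(c("r1", "r2"), c("c1", "c2")))
print(m)

dim(m)          # number of rows and columns
m["r1", "c2"]   # element selected by row and column name
t(m)            # transpose of m
```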

Adding Rows And Columns To The Matrix

colnames(X) <- c(1:4,"variance")
rownames(X) <- c(1:3,"mean")
X

       1    2   3   4   variance
1     50   90  50  80   425.0000
2     60  100  90  30  1000.0000
3     40   80  10  70  1000.0000
mean  50   90  50  60   358.3333
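
The listing above does not show how X was built. One construction that reproduces the printed values is sketched below, under the assumption that the row of column means was appended first and the column of row variances second (this order is what yields 358.3333 in the bottom-right cell).

```r
# Starting 3 x 4 matrix, taken from the first three rows of the table.
X <- matrix(c(50,  90, 50, 80,
              60, 100, 90, 30,
              40,  80, 10, 70), nrow = 3, byrow = TRUE)

X <- rbind(X, apply(X, 2, mean))   # add a row holding each column's mean
X <- cbind(X, apply(X, 1, var))    # add a column holding each row's variance

colnames(X) <- c(1:4, "variance")
rownames(X) <- c(1:3, "mean")
X
```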
Operators:

Arithmetic Operator:

Relational Operator:
Logical Operator:

Assignment Operator:
Miscellaneous Operator:

Built-In Functions in R:
R Data Structure (Data Types – Vector):
R Data Structure - Matrices:
CONCLUSION: Hence, we learned about all the types of basic operations in R Programming
along with their use, implementation, and examples.

LAB OUTCOME:

ITL804.1: - Install and use R for simple programming tasks


ITL804.2: - Extend the functionality of R by using add-on packages
ITL804.3: - Extract data from files and other sources and perform various data manipulation tasks on
them.
EXPERIMENT NO. - 3

Aim of the experiment: Grouping Loops and Conditional Execution

Lab Outcome: ITL804.1: - Install and use R for simple programming tasks
ITL804.4: - Code statistical functions in R

Date of Conduction: 31/3/2021 Date of Submission: 7/4/2021

Implementation Understanding Punctuality &Discipline Total


(5) (5) (5) (15)

Practical Incharge
EXPERIMENT NO. 3
AIM: Grouping Loops and Conditional Execution
THEORY:
Control Flow and loops
1) IF ELSE:
Syntax: ifelse(test_expression, x, y)
Examples:
1) a = c(5,7,2,9)
ifelse(a %% 2 == 0,"even","odd")
2) x <- 5
if(x > 0)
{
print("Positive number")
}

3) x <- 5
if(x > 0)
{
print("Non-negative number")
} else {
print("Negative number")
}
2) For loop

x <- c(2,5,3,9,8,11,6)
count <- 0
for (val in x)
{
if(val %% 2 == 0)
count = count+1
}
print(count)
3) While loop
i <- 1
while (i < 6)
{
print(i)
i = i+1
}
4) Break statement
x <- 1:5
for (val in x)
{
if (val == 3)
{
break
}
print(val)
}
5) Next statement
x <- 1:5
for (val in x)
{
if (val == 3)
{
next
}
print(val)
}
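
The two loops above differ only in the keyword used when val reaches 3. The sketch below collects the values each loop would print, so the effect of break and next can be compared side by side; before_break and after_next are names introduced here.

```r
x <- 1:5

# break leaves the loop entirely when val reaches 3.
before_break <- c()
for (val in x) {
  if (val == 3) break
  before_break <- c(before_break, val)
}

# next skips only the current iteration and continues with 4 and 5.
after_next <- c()
for (val in x) {
  if (val == 3) next
  after_next <- c(after_next, val)
}

print(before_break)   # 1 2
print(after_next)     # 1 2 4 5
```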

Functions:
pow <- function(x, y)
{
# Raise x to the power y and print the result as a sentence.
result <- x^y
print(paste(x,"raised to the power", y, "is", result))
}
> pow(8, 2)
[1] "8 raised to the power 2 is 64"
> pow(2, 8)
[1] "2 raised to the power 8 is 256"

Function with return value:


check <- function(x) {
# Classify a single number as positive, negative or zero.
if (x > 0) {
result <- "Positive"
} else if (x < 0) {
result <- "Negative"
} else {
result <- "Zero"
}
return(result)
}

OUTPUT:
1) IF-ELSE
a = c (5,7,2,9)
ifelse (a %% 2 == 0,"Even","Odd")

IF-ELSE
x = -5
if (x > 0)
{
print ("Non-negative number")
} else {
print ("Negative number")
}
2) FOR LOOP
x = c (2,5,3,9,8,11,6)
count = 0
for (val in x)
{
if (val %% 2 == 0)
count = count+1
}
print(count)

3) WHILE LOOP
i=1
while (i < 6)
{
print(i)
i = i+1
}

4) BREAK STATEMENT
x = 1:5
for (val in x)
{
if (val == 3)
{
break
}
print(val)
}
5) NEXT STATEMENT
x = 1:5
for (val in x)
{
if (val == 3)
{
next
}
print(val)
}

6) FUNCTIONS:
pow = function(x, y)
{
result = x^y
print(paste(x,"raised to the power", y, "is", result))
}

pow(8, 2)
[1] "8 raised to the power 2 is 64"
pow(2, 8)
[1] "2 raised to the power 8 is 256"
7) FUNCTION WITH RETURN VALUE:
check = function(x) {
if (x > 0)
{
result = "Positive"
}
else
if (x < 0)
{
result = "Negative"
}
else
{
result = "Zero"
}
return(result)
}
check(x=5)
check(x=-2)
check(x=0)
CONCLUSION: Thus, we have performed Grouping Loops and Conditional Execution
operations in R.

LAB OUTCOME:
ITL804.1: - Install and use R for simple programming tasks
ITL804.4: - Code statistical functions in R
EXPERIMENT NO. - 4

Aim of the experiment: Exploratory data analysis using data Frames

Lab Outcome: ITL804.1: - Install and use R for simple programming tasks
ITL804.4: - Code statistical functions in R

Date of Conduction: 7/4/2021 Date of Submission: 14/4/2021

Implementation Understanding Punctuality &Discipline Total


(5) (5) (5) (15)

Practical Incharge
EXPERIMENT NO. 4

AIM: Exploratory data analysis using data Frames

THEORY:

With the help of data frames, we can

1) Create vectors,
2) put the vectors into a data frame
3) Perform operation on the data frame
4) We can save the data frame into a file
5) Import of the file into R
6) Examine the data frames that came with R

Example:

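# Create one vector per column.
names <- c('Mahi','Sourav','Azhar','Sunny','Pataudi','Dravid')
played <- c(45,49,47,47,40,25)
won <- c(22,21,14,9,9,8)
lost <- c(12,13,14,8,19,6)
Y <- c(2012,2004,2000,1980,1965,2008)

# Combine the vectors into a data frame.
captaincy <- data.frame(names, Y, played, won, lost)

# Inspect individual columns.
captaincy$names
captaincy$won
captaincy$played

# Ratio of matches won to matches played, stored as a new column.
ratio <- captaincy$won/captaincy$played
print(ratio)
captaincy$victory <- ratio

# Print numbers with 2 significant digits from here on.
options(digits=2)
View(captaincy)

# Mean of a column, checked by hand.
mean(captaincy$played)
(45+49+47+47+40+25) / 6

# Scatter plot of win ratio against year.
plot(captaincy$Y, ratio)

# Bar chart of matches played per captain (requires the ggplot2 package).
library(ggplot2)
ggplot(data=captaincy, aes(x=names, y=played)) +
  geom_bar(stat='identity')

# Save the data frame as a comma-separated file and read it back.
write.table(captaincy, file="data1.csv", row.names=TRUE, na="", col.names=TRUE, sep=",")
read.table(file="data1.csv", header=TRUE, sep=",")

# List the datasets that ship with R and view one of them.
data()
View(CO2)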
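
The same workflow can be run without RStudio or ggplot2. The following is a reduced sketch using only base R: it keeps three of the columns, uses str() and summary() in place of View(), and writes the CSV to a temporary file (an assumption made here so the example does not touch the working directory).

```r
names  <- c("Mahi", "Sourav", "Azhar", "Sunny", "Pataudi", "Dravid")
played <- c(45, 49, 47, 47, 40, 25)
won    <- c(22, 21, 14, 9, 9, 8)
captaincy <- data.frame(names, played, won, stringsAsFactors = FALSE)

# Add a derived column and summarise it.
captaincy$victory <- captaincy$won / captaincy$played
str(captaincy)
summary(captaincy$victory)

# Captain with the highest win ratio.
best <- captaincy$names[which.max(captaincy$victory)]
print(best)

# Round trip through a CSV file in a temporary location.
csv_path <- tempfile(fileext = ".csv")
write.csv(captaincy, csv_path, row.names = FALSE)
restored <- read.csv(csv_path, stringsAsFactors = FALSE)
all.equal(captaincy$victory, restored$victory)
```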

Output:

Create Dataset:
Convert Dataset into Data Frame:
Checking Individual Columns:
Ratio of Two Numeric Columns:
Mean Value of Data:
Plot Numeric Values:
Bar Plot:
Create New CSV File to Store Data:
Read from the New CSV File:
View Existing Datasets in RStudio:
View Any Dataset:

CONCLUSION: Thus, we have performed exploratory data analysis using data frames in R.

LAB OUTCOME:
ITL804.1: - Install and use R for simple programming tasks
ITL804.4: - Code statistical functions in R
EXPERIMENT NO. - 5

Aim of the experiment: Graphics in R (Graphs, plot)

Lab Outcome:
ITL804.1: - Install and use R for simple programming tasks
ITL804.3: - Extract data from files and other sources and perform various data manipulation tasks on them.
ITL804.4: - Code statistical functions in R
ITL804.5: - Use R Graphics and Tables to visualize results of various statistical operations on data

Date of Conduction: 14/4/2021 Date of Submission: 21/4/2021

Implementation Understanding Punctuality &Discipline Total


(5) (5) (5) (15)

Practical Incharge
EXPERIMENT NO. 5

AIM: Graphics in R (Graphs, plot)

THEORY:

Graphics in R

1) Bar Plot

max.temp <- c(22, 27, 26, 24, 23, 26, 28)

barplot(max.temp)

# barchart with added parameters

barplot(max.temp, main = "Maximum Temperatures in a Week", xlab = "Degree Celsius",
ylab = "Day", names.arg = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
col = "darkred", horiz = TRUE)

2) Table

age <- c(17,18,18,17,18,19,18,16,18,18)


table(age)
barplot(table(age), main="Age Count of 10 Students", xlab="Age", ylab="Count", border="red",
col="blue", density=10 )

3) Histogram
str(airquality)
Temperature <- airquality$Temp
hist(Temperature)
hist(Temperature, main="Maximum daily temperature at La Guardia Airport",
xlab="Temperature in degrees Fahrenheit", xlim=c(50,100), col="darkmagenta", freq=FALSE )
4) Pie Chart
e=(c("Housing","Food","Cloths","Entertainment","Other"))
v=(c(600,300,150,100,200))
pie(v,labels=e,main="pie chart")
pct=round(v/sum(v)*100)
e=paste(e,pct)
e=paste(e,"%",sep=" ")
pie(v,labels=e,main="pie chart",col=rainbow(length(e)))
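
When a script runs outside RStudio, plots are usually sent to a file. The sketch below repeats the pie chart with the percentage labels built in one step and draws it into a temporary PDF; the names slice_labels and out_file, and the choice of the pdf() device, are assumptions made for this illustration.

```r
v <- c(600, 300, 150, 100, 200)
e <- c("Housing", "Food", "Cloths", "Entertainment", "Other")

# Percentage share of each slice, rounded to whole numbers.
pct <- round(v / sum(v) * 100)
slice_labels <- paste0(e, " ", pct, "%")
print(slice_labels)

# Draw into a PDF file instead of the screen; dev.off() closes the file.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
pie(v, labels = slice_labels, main = "pie chart", col = rainbow(length(v)))
dev.off()
file.exists(out_file)
```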

Output:
1. Bar Plot
max.temp = c(22,27,26,24,23,26,28)
barplot(max.temp)
# barchart with added parameters
barplot(max.temp,main = "Maximum Temperature In A Week", xlab ="Degree Celsius",
ylab = "Day",names.arg = c("Sun","Mon","Tue","Wed","Thur","Fri","Sat"), col = "darkred",
horiz = TRUE)
2. Table
age = c (17,18,18,17,18,19,18,16,18,18)
table(age)
barplot(table(age), main = "Age Count of 10 Students", xlab = "Age", ylab = "Count",
border= "black", col= "red", density=10)
3. Histogram
str(airquality)
Temperature = airquality$Temp
hist(Temperature)
hist(Temperature, main = "Maximum daily temperature at La Guardia Airport",
xlab= "Temperature in degrees Fahrenheit", xlim=c(50,100), col= "darkmagenta",
freq=FALSE )

4. Pie Chart
e=(c("Housing","Food","Cloths","Entertainment","Other"))
v=(c(600,300,150,100,200))
pie(v,labels=e,main="pie chart")

pct=round(v/sum(v)*100)
e=paste(e,pct)
e=paste(e,"%",sep=" ")
pie(v,labels=e,main="pie chart",col=rainbow(length(e)))
CONCLUSION: Thus, we have created bar plots, tables, histograms and pie charts using the graphics functions of R.

LAB OUTCOME:

ITL804.1: - Install and use R for simple programming tasks


ITL804.3: - Extract data from files and other sources and perform various data manipulation tasks
on them.
ITL804.4: - Code statistical functions in R
ITL804.5: - Use R Graphics and Tables to visualize results of various statistical operations on data
EXPERIMENT NO. - 6

Aim of the experiment: Regression and Correlation

Lab Outcome: ITL804.4: - Code statistical functions in R


ITL804.5: - Use R Graphics and Tables to visualize results of
various statistical operations on data

Date of Conduction: 21/4/2021 Date of Submission: 28/4/2021

Implementation Understanding Punctuality &Discipline Total


(5) (5) (5) (15)

Practical Incharge
EXPERIMENT NO. 6
AIM: Regression and Correlation
THEORY:
Regression
1) Linear Regression
y=mx+c
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation <- lm(y~x)
print(relation)
# Find weight of a person with height 170.
a <- data.frame(x = 170)
result <- predict(relation,a)
print(result)
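
The aim also covers correlation, which the listing above does not compute. As a hedged sketch using the same height and weight data, cor() gives the Pearson coefficient, and the slope and intercept of y = mx + c can be checked against lm() from their least-squares definitions. The names r, m and c0 are introduced here (c0 is used because c is already the name of a base R function).

```r
x <- c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)  # height in cm
y <- c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)            # weight in kg

r <- cor(x, y)                  # Pearson correlation coefficient
print(r)

relation <- lm(y ~ x)

# Least-squares estimates computed directly from their definitions.
m  <- cov(x, y) / var(x)        # slope
c0 <- mean(y) - m * mean(x)     # intercept
print(c(intercept = c0, slope = m))

# Both routes should agree, and r^2 should equal the model's R-squared.
all.equal(unname(coef(relation)), c(c0, m))
all.equal(r^2, summary(relation)$r.squared)
```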

2) Plotting
# Give the chart file a name.
png(file = "linearregression.png")
# Plot the chart.
plot(y,x,col = "blue",main = "Height & Weight Regression",cex = 1.3,pch = 16,xlab =
"Weight in Kg",ylab = "Height in cm")
# Add the fitted regression line to the plot.
abline(lm(x~y))
# Save the file.
dev.off()
3) Multiple Regression:
y = a + b1*x1 + b2*x2 + ... + bn*xn
input <- mtcars[,c("mpg","disp","hp","wt")]
# Create the relationship model.
model <- lm(mpg~disp+hp+wt, data = input)
# Show the model.
print(model)
# Get the Intercept and coefficients as vector elements.
cat("# # # # The Coefficient Values # # # ","\n")
a <- coef(model)[1]
print(a)
Xdisp <- coef(model)[2]
Xhp <- coef(model)[3]
Xwt <- coef(model)[4]
print(Xdisp)
print(Xhp)
print(Xwt)
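
Once the coefficients are known, a prediction is just the regression equation evaluated at new inputs. The sketch below checks this against predict(); the car described by new_car is hypothetical and its values are chosen only for illustration.

```r
input <- mtcars[, c("mpg", "disp", "hp", "wt")]
model <- lm(mpg ~ disp + hp + wt, data = input)
b <- coef(model)   # named vector: (Intercept), disp, hp, wt

# Hypothetical car used only for illustration.
new_car <- data.frame(disp = 221, hp = 102, wt = 2.91)

# Prediction written out as y = a + b1*x1 + b2*x2 + b3*x3.
manual <- b["(Intercept)"] + b["disp"] * new_car$disp +
  b["hp"] * new_car$hp + b["wt"] * new_car$wt

# predict() evaluates the same equation.
fitted_value <- predict(model, newdata = new_car)
all.equal(unname(manual), unname(fitted_value))
```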

OUTPUT:
1) Linear Regression
x = c(151, 174, 138, 186, 128, 136, 179, 163, 152, 131)
y = c(63, 81, 56, 91, 47, 57, 76, 72, 62, 48)
# Apply the lm() function.
relation = lm(y~x)
print(relation)
# Find weight of a person with height 170.
a = data.frame(x = 170)
result = predict(relation,a)
print(result)
plot(x,y,main="relation")
abline(relation, col = 2,lwd = 3)
2) Plotting
plot(y,x, main="Height & Weight Regression", xlab="Weight in Kg", ylab="Height in cm",col =
"blue",cex = 1.3,pch = 16)
abline(lm(x~y))
3) Multiple Regression:
input = mtcars[c("mpg","disp","hp","wt")]
# Create the relationship model.
model = lm(mpg~disp+hp+wt, data = input)
# Show the model.
print(model)
plot(model)
# Get the Intercept and coefficients as vector elements.
cat("# # # # The Coefficient Values # # #","\n")
a = coef(model)[1]
print(a)
Xdisp = coef(model)[2]
Xhp = coef(model)[3]
Xwt = coef(model)[4]
print(Xdisp)
print(Xhp)
print(Xwt)
CONCLUSION: Thus, we have performed regression and correlation analysis in R.

LAB OUTCOME:

ITL804.4: - Code statistical functions in R


ITL804.5: - Use R Graphics and Tables to visualize results of various statistical operations on data
EXPERIMENT NO. - 7

Aim of the experiment: Mini Project on Data Science

Lab Outcome:
ITL804.5: - Use R Graphics and Tables to visualize results of various statistical operations on data
ITL804.6: - Apply the knowledge of R gained to data Analytics for real life applications.

Date of Conduction: 28/4/2021 Date of Submission: 05/05/2021

Implementation Understanding Punctuality &Discipline Total


(5) (5) (5) (15)

Practical Incharge
EXPERIMENT NO. – 7
AIM: Mini Project on Data Science (Movie Recommendation System)
CODE:
library(recommenderlab)
library(ggplot2)
library(data.table)
library(reshape2)

setwd("C:/Users/91986/OneDrive/Desktop/FINAL YEAR/Submission/R")
movie_data <- read.csv("movies.csv",stringsAsFactors=FALSE)
rating_data <- read.csv("ratings.csv")
str(movie_data)

summary(movie_data) #Author DataFlair

head(movie_data)

summary(rating_data) #Author DataFlair

head(rating_data)

movie_genre <- as.data.frame(movie_data$genres, stringsAsFactors=FALSE)


library(data.table)
movie_genre2 <- as.data.frame(tstrsplit(movie_genre[,1], '[|]',
type.convert=TRUE),
stringsAsFactors=FALSE) #DataFlair
colnames(movie_genre2) <- c(1:10)

list_genre <- c("Action", "Adventure", "Animation", "Children",


"Comedy", "Crime","Documentary", "Drama", "Fantasy",
"Film-Noir", "Horror", "Musical", "Mystery","Romance",
"Sci-Fi", "Thriller", "War", "Western")
genre_mat1 <- matrix(0,10330,18)
genre_mat1[1,] <- list_genre
colnames(genre_mat1) <- list_genre

for (index in 1:nrow(movie_genre2)) {


for (col in 1:ncol(movie_genre2)) {
gen_col = which(genre_mat1[1,] == movie_genre2[index,col]) #Author DataFlair
genre_mat1[index+1,gen_col] <- 1
}
}
# Remove the first row, which held the genre list.
genre_mat2 <- as.data.frame(genre_mat1[-1,], stringsAsFactors=FALSE)
for (col in 1:ncol(genre_mat2)) {
genre_mat2[,col] <- as.integer(genre_mat2[,col]) #convert from characters to integers
}
str(genre_mat2)

SearchMatrix <- cbind(movie_data[,1:2], genre_mat2[])


head(SearchMatrix) #DataFlair
ratingMatrix <- dcast(rating_data, userId~movieId, value.var = "rating", na.rm=FALSE)
ratingMatrix <- as.matrix(ratingMatrix[,-1]) #remove userIds
#Convert rating matrix into a recommenderlab sparse matrix
ratingMatrix <- as(ratingMatrix, "realRatingMatrix")
ratingMatrix

recommendation_model <- recommenderRegistry$get_entries(dataType = "realRatingMatrix")


names(recommendation_model)

lapply(recommendation_model, "[[", "description")

recommendation_model$IBCF_realRatingMatrix$parameters

similarity_mat <- similarity(ratingMatrix[1:4, ],


method = "cosine",
which = "users")
as.matrix(similarity_mat)

image(as.matrix(similarity_mat), main = "User's Similarities")

movie_similarity <- similarity(ratingMatrix[, 1:4], method =


"cosine", which = "items")
as.matrix(movie_similarity)

image(as.matrix(movie_similarity), main = "Movies similarity")

rating_values <- as.vector(ratingMatrix@data)


unique(rating_values) # extracting unique ratings

Table_of_Ratings <- table(rating_values) # creating a count of movie ratings


Table_of_Ratings

library(ggplot2)
movie_views <- colCounts(ratingMatrix) # count views for each movie
table_views <- data.frame(movie = names(movie_views),
views = movie_views) # create dataframe of views
table_views <- table_views[order(table_views$views,
decreasing = TRUE), ] # sort by number of views
table_views$title <- NA
for (index in 1:10325){
table_views[index,3] <- as.character(subset(movie_data,
movie_data$movieId == table_views[index,1])$title)
}
table_views[1:6,]

ggplot(table_views[1:6, ], aes(x = title, y = views)) +


geom_bar(stat="identity", fill = 'steelblue') +
geom_text(aes(label=views), vjust=-0.3, size=3.5) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +

ggtitle("Total Views of the Top Films")

image(ratingMatrix[1:20, 1:25], axes = FALSE, main = "Heatmap of the first 20 rows and 25 columns")

movie_ratings <- ratingMatrix[rowCounts(ratingMatrix) > 50,


colCounts(ratingMatrix) > 50]
movie_ratings

minimum_movies<- quantile(rowCounts(movie_ratings), 0.98)


minimum_users <- quantile(colCounts(movie_ratings), 0.98)
image(movie_ratings[rowCounts(movie_ratings) > minimum_movies,
colCounts(movie_ratings) > minimum_users],
main = "Heatmap of the top users and movies")

average_ratings <- rowMeans(movie_ratings)


qplot(average_ratings, fill=I("steelblue"), col=I("red")) +
ggtitle("Distribution of the average rating per user")

normalized_ratings <- normalize(movie_ratings)


sum(rowMeans(normalized_ratings) > 0.00001)

image(normalized_ratings[rowCounts(normalized_ratings) > minimum_movies,


colCounts(normalized_ratings) > minimum_users],
main = "Normalized Ratings of the Top Users")

binary_minimum_movies <- quantile(rowCounts(movie_ratings), 0.95)


binary_minimum_users <- quantile(colCounts(movie_ratings), 0.95)
#movies_watched <- binarize(movie_ratings, minRating = 1)

good_rated_films <- binarize(movie_ratings, minRating = 3)


image(good_rated_films[rowCounts(movie_ratings) > binary_minimum_movies,
colCounts(movie_ratings) > binary_minimum_users],
main = "Heatmap of the top users and movies")

sampled_data<- sample(x = c(TRUE, FALSE),


size = nrow(movie_ratings),
replace = TRUE,
prob = c(0.8, 0.2))
training_data <- movie_ratings[sampled_data, ]
testing_data <- movie_ratings[!sampled_data, ]

recommendation_system <- recommenderRegistry$get_entries(dataType ="realRatingMatrix")


recommendation_system$IBCF_realRatingMatrix$parameters

recommen_model <- Recommender(data = training_data,


method = "IBCF",
parameter = list(k = 30))
recommen_model
class(recommen_model)

model_info <- getModel(recommen_model)


class(model_info$sim)
dim(model_info$sim)
top_items <- 20
image(model_info$sim[1:top_items, 1:top_items],
main = "Heatmap of the first rows and columns")

sum_rows <- rowSums(model_info$sim > 0)


table(sum_rows)
sum_cols <- colSums(model_info$sim > 0)
qplot(sum_cols, fill=I("steelblue"), col=I("red"))+ ggtitle("Distribution of the column count")

top_recommendations <- 10 # the number of items to recommend to each user


predicted_recommendations <- predict(object = recommen_model,
newdata = testing_data,
n = top_recommendations)
predicted_recommendations

user1 <- predicted_recommendations@items[[1]] # recommendation for the first user


movies_user1 <- predicted_recommendations@itemLabels[user1]
movies_user2 <- movies_user1
for (index in 1:10){
movies_user2[index] <- as.character(subset(movie_data,
movie_data$movieId == movies_user1[index])$title)
}
movies_user2

recommendation_matrix <- sapply(predicted_recommendations@items,


function(x){ as.integer(colnames(movie_ratings)[x]) }) # matrix with the recommendations for each user
#dim(recc_matrix)
recommendation_matrix[,1:4]
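
The listing relies on cosine similarity between users and between movies. As a rough illustration of what that measure computes, the base R sketch below defines cosine_sim, a hypothetical helper that is not a recommenderlab function, and applies it to made-up ratings. It uses only the positions both users have rated, which is an assumption of this sketch; recommenderlab's own handling of missing ratings may differ.

```r
# cosine_sim() is a hypothetical helper, not part of recommenderlab.
# It compares two rating vectors on the items both users have rated.
cosine_sim <- function(u, v) {
  both <- !is.na(u) & !is.na(v)
  sum(u[both] * v[both]) /
    (sqrt(sum(u[both]^2)) * sqrt(sum(v[both]^2)))
}

# Made-up ratings of four movies by three users; NA means "not rated".
user_a <- c(5, 3, NA, 1)
user_b <- c(4, NA, 2, 1)
user_c <- c(5, 3, NA, 1)

cosine_sim(user_a, user_c)   # identical ratings give similarity 1
cosine_sim(user_a, user_b)   # similar but not identical tastes
```

Item-based collaborative filtering (the "IBCF" method used above) applies the same idea to columns of the rating matrix, comparing movies instead of users.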

OUTPUT:
CONCLUSION: Thus, we have implemented mini-project on movie recommendation
system for data science in R.

LAB OUTCOME:

ITL804.5: - Use R Graphics and Tables to visualize results of various statistical operations on data
ITL804.6: - Apply the knowledge of R gained to data Analytics for real life applications.

--END--
