
INTRODUCTION TO R PROGRAMMING LAB FILE

(MS113)

SUBMITTED BY: SUBMITTED TO:


Pushpa Kumari Dr. J.K. Sahu
Enrolment No: 00318419823 Professor (R Language)
(1st Semester) MBA Analytics (2023-25)

ARMY INSTITUTE OF MANAGEMENT & TECHNOLOGY


Plot No.M-1, Pocket P-5, Gautam Buddha Nagar, Greater Noida -201310 (U.P)
TABLE OF CONTENTS
S.No Particulars

1 Introduction to R Language
2 Vectors
3 Lists
4 Matrices
5 Factors
6 Data Frames
7 Charts and Graphs in R
8 For Loop
9 Descriptive Statistics using R
10 Various types of Distribution
11 Hypothesis Testing
12 ANOVA
13 Regression Analysis
14 Logistic Regression
15 Neural Network

INTRODUCTION TO R LANGUAGE

What is R Programming Language?

• R is a popular programming language used for statistical computing and
graphical presentation.
• R is an object-oriented language, which means every operation in R is
performed around objects. These objects can be anything that can be stored
in a variable, like one-dimensional data structures, two-dimensional data
structures, user-defined functions, etc.
Why Use R?
• It is a great resource for data analysis, data visualization, data science and
machine learning.
• It provides many statistical techniques (such as statistical tests, classification,
clustering and data reduction).
• It is easy to draw graphs in R, like pie charts, histograms, box plots, scatter plots,
etc.
• It is open-source and free.
• It has many packages that can be used to solve different problems.
Data Structures
A data structure is a particular way of organizing data in a computer so that it can
be used effectively. The idea is to reduce the space and time complexities of
different tasks. The most essential data structures used in R include vectors, lists,
matrices, factors and data frames, each covered in the sections that follow.

VECTORS

1. Vector Creation and Modification:


• c(): Combine vectors.
• seq(): Create sequences.
• rep(): Repeat values.
• vector(): Create empty vectors of a specified length.
• append(): Add elements to a vector.
• length(): Get the length of a vector.
• names(): Get or set names for vector elements.
• dim(): Get or set dimensions for vectors (used in multidimensional
arrays).
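
The functions listed above can be tried out with a short sketch like the one below. The variable names (evens, scores and so on) are only illustrative and are not part of the original lab file.

```r
# Sequence from 2 to 10 in steps of 2
evens <- seq(2, 10, by = 2)
evens                          # 2 4 6 8 10

# Repeat a single value, and repeat a whole vector
zeros <- rep(0, times = 3)     # 0 0 0
ab <- rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"

# Numeric vector of length 3 (filled with zeros by default)
empty_num <- vector("numeric", length = 3)

# Append an element, then check the length
evens_plus <- append(evens, 12)
length(evens_plus)             # 6

# Attach names to the elements of a vector
scores <- c(90, 85, 78)
names(scores) <- c("math", "physics", "chemistry")
scores
```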

1. Creating Vectors

# Creating a numeric vector


numeric_vector <- c(1, 2, 3, 4, 5)
numeric_vector

Output:-

[1] 1 2 3 4 5

# Creating a character vector


character_vector <- c("apple", "banana", "cherry")
character_vector

Output:-

[1] "apple" "banana" "cherry"

# Creating a logical vector


logical_vector <- c(TRUE, FALSE, TRUE, TRUE, FALSE)
logical_vector

Output:-

[1] TRUE FALSE TRUE TRUE FALSE

Explanation:
• c() is used to concatenate elements into a vector.
• numeric_vector contains numeric values.
• character_vector contains character strings.
• logical_vector contains logical values (TRUE and FALSE).

2. Vector Indexing and Subsetting:


• [ ]: Indexing and subsetting using numeric or logical indices.
• $: Extract elements by name from lists and data frames (named atomic vectors use [ ] or [[ ]] with the name instead).
• subset(): Subsetting vectors based on conditions.
• which(): Find indices of elements that meet a condition.

Vector Indexing and Subsetting

# Creating a numeric vector
numbers <- c(10, 20, 30, 40, 50)
numbers

Output:-

[1] 10 20 30 40 50

# Indexing: Accessing the third element


third_element <- numbers[3]
third_element

Output:-
[1] 30

# Subsetting: Extracting elements greater than 30


subset_vector <- numbers[numbers > 30]
subset_vector
Output:-
[1] 40 50
Explanation:
• numbers[3] retrieves the third element (30) from the numbers
vector.
• numbers[numbers > 30] creates a subset vector containing
elements greater than 30 (c(40, 50)).
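
As a further sketch of the indexing tools listed in this section, the example below contrasts which() (positions) with subset() (values) and shows name-based access. The prices vector is a made-up illustration.

```r
numbers <- c(10, 20, 30, 40, 50)

# which() returns the positions (not the values) that satisfy the condition
positions <- which(numbers > 30)
positions                          # 4 5

# subset() returns the values that satisfy the condition
big_values <- subset(numbers, numbers > 30)
big_values                         # 40 50

# Negative indices drop elements
numbers[-1]                        # 20 30 40 50

# Named atomic vectors are indexed with [ ] or [[ ]], not with $
prices <- c(apple = 1.5, banana = 0.5)
prices["banana"]                   # keeps the name
prices[["banana"]]                 # 0.5, name dropped
```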

3. Vector Operations:
• Arithmetic operations (+, -, *, /) and related functions (sum(),
mean(), min(), max(), median(), etc.).
• sort(): Sort a vector.
• rank(): Compute the rank of elements in a vector.

• unique(): Find unique elements in a vector.
• duplicated(): Identify duplicated elements.
• diff(): Calculate differences between consecutive elements.
• cumsum(): Calculate cumulative sums.
• cumprod(): Calculate cumulative products.

Vector Operations

# Creating numeric vectors


x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)

# Addition of vectors

sum_vector <- x + y
sum_vector

Output:-
[1] 6 6 6 6 6

# Multiplication of vectors

product_vector <- x * y
product_vector

Output:-
[1] 5 8 9 8 5

Explanation:
• x + y performs element-wise addition, resulting in sum_vector (c(6,
6, 6, 6, 6)).

• x * y performs element-wise multiplication, resulting in
product_vector (c(5, 8, 9, 8, 5)).
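
The remaining functions from the list above can be illustrated on a small vector. This is only a sketch; the vector v is chosen for illustration so that it contains a repeated value.

```r
v <- c(4, 2, 7, 2, 9)

sort(v)          # 2 2 4 7 9
rank(v)          # 3.0 1.5 4.0 1.5 5.0 (tied values share the average rank)
unique(v)        # 4 2 7 9
duplicated(v)    # FALSE FALSE FALSE TRUE FALSE
diff(v)          # -2 5 -5 7
cumsum(v)        # 4 6 13 15 24
cumprod(v)       # 4 8 56 112 1008

# Recycling: a single value is reused for every element
v * 10           # 40 20 70 20 90
```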

4. Vector Functions and Manipulations:


• paste(): Concatenate strings in a character vector.
• tolower(), toupper(): Convert character vectors to lowercase or
uppercase.
• rev(): Reverse the order of elements in a vector.
• substring(): Extract substrings from character vectors.
• substring<-(): Modify substrings in character vectors.
• as.numeric(), as.character(), as.logical(): Convert vectors to
different data types.
• factor(): Create categorical factors.
• levels(): Get or set levels of a factor.
• table(): Create frequency tables.
• aggregate(): Perform data aggregation on vectors.
• cut(): Convert numeric vectors to factor levels based on intervals.
• cummax(), cummin(): Calculate cumulative maximum or minimum
values.

Applying Functions to Vectors

# Creating a numeric vector


grades <- c(90, 85, 78, 92, 88)
# Calculating the mean
mean_grade <- mean(grades)
mean_grade

Output:-
[1] 86.6

# Sorting the vector
sorted_grades <- sort(grades)
sorted_grades

Output:-
[1] 78 85 88 90 92

Explanation:
• mean(grades) calculates the mean of the grades vector.
• sort(grades) arranges the elements of grades in ascending order,
creating sorted_grades.

Vectorized Functions

# Creating a character vector


fruits <- c("apple", "banana", "cherry", "date", "elderberry")
# Applying the paste() function
paste_result <- paste("I like", fruits, "and", "I eat them.")
# Creating a factor from a vector
fruit_factor <- factor(fruits)

Explanation:
• paste() combines elements from fruits with additional text.
• factor(fruits) converts fruits into a factor, useful for categorical
data.
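
A few of the other manipulation functions listed in this section can be sketched as follows. The break points and labels passed to cut() are arbitrary choices made for this illustration.

```r
fruits <- c("apple", "banana", "cherry")

toupper(fruits)                     # "APPLE" "BANANA" "CHERRY"
rev(fruits)                         # "cherry" "banana" "apple"
substring(fruits, 1, 3)             # "app" "ban" "che"
joined <- paste(fruits, collapse = ", ")
joined                              # "apple, banana, cherry"

# cut() bins numeric values into intervals (0,80], (80,90], (90,100]
grades <- c(90, 85, 78, 92, 88)
bands <- cut(grades, breaks = c(0, 80, 90, 100), labels = c("C", "B", "A"))
bands

# table() counts how many values fall in each band
table(bands)
```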

Lists

A list is a generic object consisting of an ordered collection of objects. Lists are
heterogeneous, one-dimensional data structures. A list can hold vectors, matrices,
character strings, functions, and so on.

# Create an empty list:

empty_list <- list()


empty_list

# Create a list with numeric values:


num_list <- list(1, 2, 3, 4, 5)
num_list

Output:-

> num_list <- list(1, 2, 3, 4, 5)


> num_list
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

>

# Create a list with a vector:


vector_list <- list(vec = c(1, 2, 3))
vector_list
> vector_list <- list(vec = c(1, 2, 3))
> vector_list
$vec
[1] 1 2 3

# Create a list with named elements:

named_list <- list(name = "Pushpa", age = 23, city = "Ambala")


named_list

> named_list <- list(name = "Pushpa", age = 23, city = "Ambala")


> named_list
$name
[1] "Pushpa"

$age
[1] 23

$city
[1] "Ambala"

# Create a list with a data frame:

data_frame_list <- list(data = data.frame(x = 1:3, y = c("A", "B", "C")))
data_frame_list

> data_frame_list <- list(data = data.frame(x = 1:3, y = c("A", "B", "C")))


> data_frame_list
$data
  x y
1 1 A
2 2 B
3 3 C

>

# Create a nested list:


nested_list <- list(list(1, 2), list("a", "b", "c"))
nested_list
> nested_list <- list(list(1, 2), list("a", "b", "c"))
> nested_list
[[1]]
[[1]][[1]]
[1] 1

[[1]][[2]]
[1] 2

[[2]]
[[2]][[1]]
[1] "a"

[[2]][[2]]
[1] "b"

[[2]][[3]]
[1] "c"

>

# Access elements of a list by index:


item <- num_list[[2]]
item

# Add elements to a list using indexing:

num_list[[6]] <- 6
num_list

> item <- num_list[[2]]


> item
[1] 2
>
> num_list[[6]] <- 6
> num_list
[[1]]
[1] 1

[[2]]
[1] 2

[[3]]
[1] 3

[[4]]
[1] 4

[[5]]
[1] 5

[[6]]
[1] 6

>

# Add a new element to a list:

named_list[["gender"]] <- "Female"


named_list

# Remove an element from a list:

named_list$gender <- NULL

named_list

> named_list[["gender"]] <- "Female"


> named_list
$name
[1] "Pushpa"

$age
[1] 23

$city
[1] "Ambala"

$gender
[1] "Female"

>
> named_list$gender <- NULL
> named_list
$name
[1] "Pushpa"

$age
[1] 23

$city
[1] "Ambala"

>

# Combine two lists:

combined_list <- c(vector_list,named_list)


combined_list

> combined_list <- c(vector_list,named_list)


> combined_list
$vec
[1] 1 2 3

$name
[1] "Pushpa"

$age
[1] 23

$city
[1] "Ambala"

>

# Convert a list to a vector:


vec <- unlist(num_list)
vec
> vec <- unlist(num_list)
> vec
[1] 1 2 3 4 5 6

>

# Apply a function to each element of a list:


squared_list <- lapply(num_list, function(x) x^2)

squared_list

> squared_list <- lapply(num_list, function(x) x^2)


> squared_list
[[1]]
[1] 1

[[2]]
[1] 4

[[3]]
[1] 9

[[4]]
[1] 16

[[5]]
[1] 25

[[6]]
[1] 36

>
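
As a related sketch, sapply() can be used in place of lapply() when a plain vector is preferred over a list. The example also contrasts single and double brackets; num_list is rebuilt here so the snippet runs on its own.

```r
num_list <- list(1, 2, 3, 4, 5, 6)

# lapply() always returns a list
squared_list <- lapply(num_list, function(x) x^2)
class(squared_list)         # "list"

# sapply() simplifies the result to a vector when every result has length one
squared_vec <- sapply(num_list, function(x) x^2)
squared_vec                 # 1 4 9 16 25 36

# [ ] keeps the list wrapper, [[ ]] extracts the element itself
class(num_list[2])          # "list"
class(num_list[[2]])        # "numeric"
```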

# Create a list with a mix of data types:


mixed_list <- list("text", 42, TRUE, c(1, 2, 3))
mixed_list

> mixed_list <- list("text", 42, TRUE, c(1, 2, 3))


> mixed_list
[[1]]
[1] "text"

[[2]]
[1] 42

[[3]]
[1] TRUE

[[4]]
[1] 1 2 3

>

# Create a list of lists with named inner lists:


list_of_named_lists <- list(list(a = 1, b = 2), list(x = "apple", y = "banana"))
list_of_named_lists
> list_of_named_lists <- list(list(a = 1, b = 2), list(x = "apple", y = "banana"))
> list_of_named_lists
[[1]]
[[1]]$a
[1] 1

[[1]]$b
[1] 2

[[2]]
[[2]]$x
[1] "apple"

[[2]]$y
[1] "banana"

>

# Convert a list to a data frame:


data_frame_from_list <- as.data.frame(list(x = 1:3, y = c("A", "B", "C")))
data_frame_from_list
> data_frame_from_list <- as.data.frame(list(x = 1:3, y = c("A", "B", "C")))
> data_frame_from_list
  x y
1 1 A
2 2 B
3 3 C

>

# Convert a list to a matrix:
matrix_from_list <- matrix(unlist(num_list), nrow = 2)
matrix_from_list
> matrix_from_list <- matrix(unlist(num_list), nrow = 2)
> matrix_from_list
     [,1] [,2] [,3]
[1,]    1    3    5
[2,]    2    4    6

>

# Create a list with a named vector inside:


named_vector_list <- list(vector = c(a = 1, b = 2, c = 3))
named_vector_list
> named_vector_list <- list(vector = c(a = 1, b = 2, c = 3))
> named_vector_list
$vector
a b c
1 2 3

>

Matrices

A matrix is a rectangular arrangement of numbers in rows and columns. Matrices
are two-dimensional, homogeneous data structures. In a matrix, rows run
horizontally and columns run vertically.

# Create a 2x2 matrix named 'matrix1' with the values 1, 2, 3, and 4.


matrix1 <- matrix(c(1, 2, 3, 4), nrow = 2)
matrix1
> matrix1 <- matrix(c(1, 2, 3, 4), nrow = 2)
> matrix1
     [,1] [,2]
[1,]    1    3
[2,]    2    4

>

# Access the element in the first row and second column of 'matrix1'.
element <- matrix1[1, 2]
element
> element <- matrix1[1, 2]
> element
[1] 3

>

# Create a 3x3 matrix named 'matrix2' with values from 1 to 9.


matrix2 <- matrix(1:9, nrow = 3)
matrix2
> matrix2 <- matrix(1:9, nrow = 3)
> matrix2
     [,1] [,2] [,3]
[1,]    1    4    7
[2,]    2    5    8
[3,]    3    6    9

>

# Find the dimensions (number of rows and columns) of 'matrix2'.


dimensions <- dim(matrix2)
dimensions
> dimensions <- dim(matrix2)
> dimensions
[1] 3 3

>

# Create a matrix named 'matrix3' with all elements set to 0.


matrix3 <- matrix(0, nrow = 2, ncol = 2)
matrix3
> matrix3 <- matrix(0, nrow = 2, ncol = 2)
> matrix3
     [,1] [,2]
[1,]    0    0
[2,]    0    0

>

# Find the transpose of 'matrix2'. Store the result in a new matrix
# named 'transpose_matrix'.
transpose_matrix <- t(matrix2)
transpose_matrix
> transpose_matrix <- t(matrix2)
> transpose_matrix
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9

>

# Calculate the determinant of 'matrix1'.


determinant <- det(matrix1)
determinant
> determinant <- det(matrix1)
> determinant
[1] -2

>

# Create a 3x2 matrix named 'matrix4' with values from 1 to 6.


matrix4 <- matrix(1:6, nrow = 3, ncol = 2)
matrix4
> matrix4 <- matrix(1:6, nrow = 3, ncol = 2)
> matrix4
     [,1] [,2]
[1,]    1    4
[2,]    2    5
[3,]    3    6

>

# Multiply 'matrix2' by 'matrix4'.
product_matrix <- matrix2 %*% matrix4
product_matrix
> product_matrix <- matrix2 %*% matrix4
> product_matrix
     [,1] [,2]
[1,]   30   66
[2,]   36   81
[3,]   42   96

>

# Extract the second row from 'product_matrix'.


second_row <- product_matrix[2, ]
second_row
> second_row <- product_matrix[2, ]
> second_row
[1] 36 81

>

# Extract the third column from 'matrix2'.


third_column <- matrix2[, 3]
third_column
> third_column <- matrix2[, 3]
> third_column
[1] 7 8 9

>

# Create a diagonal matrix named 'diagonal_matrix' with values 1, 2,
# and 3 on the main diagonal.
diagonal_matrix <- diag(1:3)
diagonal_matrix
> diagonal_matrix <- diag(1:3)
> diagonal_matrix
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    2    0
[3,]    0    0    3

>

# Calculate the sum of all elements in 'diagonal_matrix'.


sum_diagonal <- sum(diagonal_matrix)
sum_diagonal

> sum_diagonal <- sum(diagonal_matrix)


> sum_diagonal
[1] 6

>

# Check if 'matrix1' is symmetric.


is_symmetric <- identical(matrix1, t(matrix1))
is_symmetric
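
As a sketch of what the symmetry check returns, the example below rebuilds matrix1 and also uses base R's isSymmetric(). The matrix sym is an extra illustration that is not part of the original exercise.

```r
matrix1 <- matrix(c(1, 2, 3, 4), nrow = 2)

# matrix1 differs from its transpose, so it is not symmetric
identical(matrix1, t(matrix1))   # FALSE
isSymmetric(matrix1)             # FALSE

# A matrix plus its own transpose is always symmetric
sym <- matrix1 + t(matrix1)
isSymmetric(sym)                 # TRUE

# Element-wise product (*) versus matrix product (%*%)
elementwise <- matrix1 * matrix1
matprod <- matrix1 %*% matrix1
elementwise
matprod
```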

# Check if 'matrix2' is a square matrix.

is_square <- dim(matrix2)[1] == dim(matrix2)[2]
is_square
> is_square <- dim(matrix2)[1] == dim(matrix2)[2]
> is_square
[1] TRUE

>

# Find the minimum value in 'matrix3'.


min_value <- min(matrix3)
min_value
> min_value <- min(matrix3)
> min_value
[1] 0

>

# Calculate the row sums of 'matrix4'.


row_sums_matrix4 <- rowSums(matrix4)
row_sums_matrix4
> row_sums_matrix4 <- rowSums(matrix4)
> row_sums_matrix4
[1] 5 7 9

>

Factors

Factors are data objects used to categorize data and store it as levels. They can
be built from both strings and integers. They are useful for columns that have a
limited number of unique values, such as "Male" and "Female", or TRUE and FALSE.
They are also useful in data analysis for statistical modeling.
# You can create a factor by using the factor() function.
color <- factor(c("red", "green", "blue", "red", "green"))
color
> color <- factor(c("red", "green", "blue", "red", "green"))
> color
[1] red green blue red green
Levels: blue green red

>

# You can use the levels() function to check the levels of a factor.
levels(color)
> levels(color)
[1] "blue" "green" "red"
# Number of levels
num_levels <- nlevels(color)
print(num_levels)

# Total number of elements
num_elements <- length(color)
print(num_elements)
> num_levels <- nlevels(color)
> print(num_levels)
[1] 3
> # Total number of elements
> num_elements <- length(color)
> print(num_elements)
[1] 5

>
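
By default the levels are sorted alphabetically, as in the color example above. The sketch below shows how a custom order can be imposed; the sizes vector is a made-up illustration.

```r
# 'levels' fixes a custom order; 'ordered = TRUE' makes the levels comparable
sizes <- factor(c("medium", "small", "large", "small"),
                levels = c("small", "medium", "large"),
                ordered = TRUE)
levels(sizes)          # "small" "medium" "large"

# Frequency of each level
table(sizes)

# Ordered factors support comparisons
sizes < "large"        # TRUE TRUE FALSE TRUE

# Underlying integer codes
as.integer(sizes)      # 2 1 3 1
```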

Data frames

In R, a data frame is a two-dimensional tabular data structure that is used to store
and manipulate data. It is similar to a spreadsheet or a SQL table, where data is
arranged in rows and columns.

# Create a data frame with three columns


df <- data.frame(Name = c("Pushpa", "neeraj", "ankit"), Age = c(23, 23,
23), Salary = c(50000, 60000, 70000))
# View the data frame
df
> df <- data.frame(Name = c("Pushpa", "neeraj", "ankit"), Age = c(23, 23, 23), Salary = c(50000, 60000, 70000))
> df
Name Age Salary
1 Pushpa 23 50000
2 neeraj 23 60000
3 ankit 23 70000

>

The structure of the data frame can be seen by using str() function.
The statistical summary and nature of the data can be obtained by
applying summary() function.
> str(df)

'data.frame': 3 obs. of 3 variables:
 $ Name  : chr "Pushpa" "neeraj" "ankit"
 $ Age   : num 23 23 23
 $ Salary: num 50000 60000 70000
> summary(df)
     Name                Age         Salary
 Length:3           Min.   :23   Min.   :50000
 Class :character   1st Qu.:23   1st Qu.:55000
 Mode  :character   Median :23   Median :60000
                    Mean   :23   Mean   :60000
                    3rd Qu.:23   3rd Qu.:65000
                    Max.   :23   Max.   :70000

>

# Add the "dept" column.


df$dept <- c("IT","Operations","IT")
df

> df$dept <- c("IT","Operations","IT")


> df
Name Age Salary dept
1 Pushpa 23 50000 IT
2 neeraj 23 60000 Operations
3 ankit 23 70000 IT

>
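
Once a data frame exists, rows and columns can be selected much like matrix elements. The following is a minimal sketch that rebuilds the same df (including the dept column) so that it runs on its own; the salary threshold of 55000 is an arbitrary illustrative value.

```r
df <- data.frame(Name = c("Pushpa", "neeraj", "ankit"),
                 Age = c(23, 23, 23),
                 Salary = c(50000, 60000, 70000),
                 dept = c("IT", "Operations", "IT"))

# One column as a vector
df$Salary

# Rows that satisfy a condition, keeping only selected columns
high_paid <- df[df$Salary > 55000, c("Name", "Salary")]
high_paid

# Mean salary per department
dept_means <- tapply(df$Salary, df$dept, mean)
dept_means

# Number of rows and columns
nrow(df)
ncol(df)
```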

CHARTS AND GRAPHS IN R

R is a powerful tool for creating a wide variety of charts and graphs.

Line chart

x <- 1:10
y <- c(3, 5, 6, 7, 8, 9, 8, 7, 6, 5)
plot(x, y, type = "l", xlab = "X-axis label", ylab = "Y-axis label",
     main = "MY FIRST GRAPH IN R")

Bar chart

x <- c("A", "B", "C", "D")


y <- c(20, 30, 10, 15)
barplot(y, names.arg = x, xlab = "X-axis label", ylab = "Y-axis label",
        main = "MY BAR CHART IN R")

Scatter plot
x <- c(1, 2, 3, 4, 5)
y <- c(5, 4, 3, 2, 1)
plot(x, y, xlab = "X-axis label", ylab = "Y-axis label",
     main = "MY SCATTER PLOT IN R")

Histogram

x <- rnorm(100)
hist(x, xlab = "X-axis label", ylab = "Y-axis label",
     main = "Histogram example")

Pie chart
# Example data
labels <- c("A", "B", "C", "D", "E")
values <- c(20, 30, 10, 15, 25)
colors <- c("#E69F00", "#56B4E9", "#009E73", "#F0E442", "#0072B2")
# Create a pie chart with custom colors
pie(values, labels = labels, col = colors)

Stacked bar chart
# Example data
labels <- c("A", "B", "C", "D")
values1 <- c(20, 30, 10, 15)
values2 <- c(10, 25, 15, 20)
values3 <- c(5, 15, 10, 5)

# Create a stacked bar chart


barplot(rbind(values1, values2, values3), beside = FALSE,
        col = c("#FFC000", "#5B9BD5", "#70AD47"),
        legend.text = c("Group 1", "Group 2", "Group 3"),
        args.legend = list(x = "topright"),
        xlab = "X-axis label", ylab = "Y-axis label",
        main = "Stacked bar chart example", names.arg = labels)

To change the colors, replace only the col argument of the barplot() call:

col = c("red", "black", "purple")

Kernel density plot in R
# Generate example data
set.seed(123)
data <- rnorm(1000)

# Estimate the density using the density() function


density_data <- density(data)

# Plot the density using the plot() function


plot(density_data, main = "Kernel Density Plot Example", xlab = "X-axis label",
     ylab = "Density")

For Loop

A for loop in R is a control flow structure that allows you to execute a block of code
repeatedly for a specified number of iterations. It is particularly useful when you
need to perform a task a fixed number of times or when you want to iterate over
elements in a vector or other data structures.
• variable: This is a loop variable that takes on the values from the sequence
during each iteration.
• sequence: A sequence of values, typically a vector, over which the loop
variable will iterate.
# Use a for loop to generate a sequence of numbers
for (i in 1:5) {
  cat(i, "JK ")
}
for (i in 1:5) {
  cat(i, "JK ", "\n")
}
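
The general form is for (variable in sequence) { body }. As a small sketch, the loop below iterates over the elements of a vector rather than a fixed range; the fruits vector and the total accumulator are illustrative names only.

```r
fruits <- c("apple", "banana", "cherry")

# seq_along() yields 1, 2, ..., length(fruits) and is safe for empty vectors
for (i in seq_along(fruits)) {
  cat(i, ":", fruits[i], "\n")
}

# Accumulating a result inside a loop
total <- 0
for (value in c(10, 20, 30)) {
  total <- total + value
}
total   # 60
```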

# Create a 9x9 multiplication table
for (i in 1:9) {
  for (j in 1:9) {
    cat(i, "x", j, "=", i * j, "\t")
  }
  cat("\n")
}

Descriptive Statistics using R

Descriptive statistics is a fundamental aspect of data analysis, and R is a powerful tool for
performing such analyses. To demonstrate descriptive statistics using the mtcars dataset in R, we
will cover some of the most commonly used functions and techniques.

Steps to be followed as under:-

Step 1: Load the Data


First, you need to load the mtcars dataset, which is a built-in dataset in R.
data(mtcars)
library(dplyr)
library(ggplot2)
mtcars
Output

Step 2: Summary Statistics
You can use the summary() function to obtain an overview of the dataset. This function provides
statistics for each variable in the dataset.
summary(mtcars)
This will display descriptive statistics for each variable, such as mean, median, quartiles, and
min/max values.

Step 3: Measures of Central Tendency
To calculate the mean, median, and mode, you can use the following functions:
### mean ###
mean(mtcars$mpg)
Output:-

### median ###


median(mtcars$mpg)
median(mtcars$qsec)
Output:-

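####### mode #######
# Note: base R's mode() returns the storage mode of an object (for mtcars, "list"),
# not the statistical mode, so a user-defined Mode() function is used below.
mode(mtcars)
Output:-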

Mode <- function(x) {
  uniq_x <- unique(x)
  uniq_x[which.max(tabulate(match(x, uniq_x)))]
}
#### Calculate the mode for the 'mpg' variable in the mtcars dataset #####
mode_mpg <- Mode(mtcars$mpg)

##### Print the mode ####


cat("Mode of 'mpg' in mtcars dataset:", mode_mpg, "\n")
Output:-

######## range #######


range(mtcars$mpg)
Output:

###### variance #######


var(mtcars$mpg)
Output:

####### standard deviation #######


sd(mtcars$mpg)
Output:

###### table ########
table(mtcars$mpg)
Output:

######## moments command ######


install.packages("moments")
library(moments)
a <- moment(mtcars$mpg, order = 2)
a
moment(mtcars$mpg, order = 3)
Output:

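##### for kurtosis command #####


install.packages("e1071")
library(e1071)
# Both 'moments' and 'e1071' define kurtosis(), so the package is named explicitly
e1071::kurtosis(mtcars$mpg)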
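
For readers who want to see what these packages compute, skewness and kurtosis can also be sketched in base R from the central moments. This is a hand-rolled illustration under the moment-based definitions (dividing by n); packages may apply different small-sample corrections, so their results can differ slightly.

```r
x <- mtcars$mpg
n <- length(x)
m <- mean(x)

# Central moments of order 2, 3 and 4 (dividing by n, not n - 1)
m2 <- sum((x - m)^2) / n
m3 <- sum((x - m)^3) / n
m4 <- sum((x - m)^4) / n

# Moment-based skewness and kurtosis
skewness_mpg <- m3 / m2^(3 / 2)
kurtosis_mpg <- m4 / m2^2              # equals 3 for a normal distribution
excess_kurtosis_mpg <- kurtosis_mpg - 3

skewness_mpg
kurtosis_mpg
excess_kurtosis_mpg
```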

####### test of normality ######
library(ggplot2)
qqnorm(mtcars$mpg)
qqline(mtcars$mpg)

Output:

Various types of Distribution

Binomial Distribution
The binomial distribution model deals with finding the probability of success of an event which has only
two possible outcomes in a series of experiments.

Characteristics of Binomial Distribution:-

• It is a discrete distribution.
• The number of trials is pre-determined.
• Each trial must result in only two possible outcomes, i.e. either success (probability ‘p’) or
failure (probability ‘q’ = 1 - p).
• The probability of success must be the same in each trial.
• Trials must be statistically independent.
• In such a situation, if the number of trials = n, the probability of success = p and the
probability of failure = q = (1 - p), then the probability of exactly ‘r’ successes is
$P(X = r) = \binom{n}{r} p^{r} q^{n-r}$
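
The binomial probability of exactly r successes, nCr × p^r × q^(n-r), can be checked against R's built-in dbinom(). The values n = 10, p = 0.5 and r = 3 below are arbitrary choices for illustration.

```r
n <- 10    # number of trials
p <- 0.5   # probability of success in each trial
r <- 3     # number of successes of interest

# Direct use of the formula nCr * p^r * q^(n - r)
by_formula <- choose(n, r) * p^r * (1 - p)^(n - r)

# Built-in binomial probability mass function
by_dbinom <- dbinom(r, size = n, prob = p)

by_formula   # 0.1171875
by_dbinom    # same value

# Probability of at most 3 successes (cumulative)
at_most_3 <- pbinom(3, size = n, prob = p)
at_most_3    # 0.171875
```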

Normal Distribution
In a random collection of data from independent sources, it is generally observed that the distribution of
data is approximately normal.

That is, plotting a graph with the value of the variable on the horizontal axis and the count of
the values on the vertical axis gives a bell-shaped curve.
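
As a brief sketch, R's dnorm(), pnorm() and qnorm() give the density, cumulative probability and quantiles of the normal distribution. The standard normal (mean 0, standard deviation 1) is assumed here.

```r
# Density of the standard normal at its mean
dnorm(0)                       # about 0.3989

# Probability of falling within one standard deviation of the mean
within_one_sd <- pnorm(1) - pnorm(-1)
within_one_sd                  # about 0.6827

# 97.5th percentile of the standard normal
q975 <- qnorm(0.975)
q975                           # about 1.96

# Draw the bell-shaped curve
curve(dnorm(x), from = -4, to = 4, xlab = "x", ylab = "Density",
      main = "Standard normal density")
```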

Poisson Distribution
The Poisson distribution is the probability distribution of independent event occurrences in an
interval. If λ is the mean occurrence per interval, then the probability of having x occurrences
within a given interval is:
$P(X = x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$, for x = 0, 1, 2, and so on.
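
The standard Poisson probability, λ^x × e^(-λ) / x!, can be compared with R's built-in dpois(). The values λ = 4 and x = 2 are arbitrary illustrative choices.

```r
lambda <- 4   # mean number of occurrences per interval
x <- 2        # number of occurrences of interest

# Direct use of the formula lambda^x * exp(-lambda) / x!
by_formula <- lambda^x * exp(-lambda) / factorial(x)

# Built-in Poisson probability mass function
by_dpois <- dpois(x, lambda)

by_formula    # about 0.1465
by_dpois      # same value

# Probability of at most 2 occurrences (cumulative)
ppois(2, lambda)
```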

Hypothesis Testing
What is a Hypothesis?
A hypothesis, in statistics, is a claim or statement about a property of a population.

What is Hypothesis Testing?
Hypothesis testing is a procedure for testing that claim or statement using sample data.
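
As a minimal sketch of testing a claim in R, a one-sample t-test on mtcars$mpg is shown below. The claimed mean of 20 and the 5% significance level are arbitrary illustrative choices, not values from the original lab file.

```r
# Claim to be tested: the mean mpg equals 20
# (the alternative is that the mean mpg differs from 20)
result <- t.test(mtcars$mpg, mu = 20)

result$statistic   # t statistic
result$p.value     # p-value
result$conf.int    # 95% confidence interval for the mean

# Decision at the 5% significance level
alpha <- 0.05
if (result$p.value < alpha) {
  cat("Reject the claim that the mean is 20\n")
} else {
  cat("Fail to reject the claim that the mean is 20\n")
}
```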

Types of Hypothesis:-

ANOVA
ANOVA, or analysis of variance, is a statistical test that evaluates how a quantitative
dependent variable is affected by one or more categorical independent variables. In simpler
terms, it is a form of hypothesis testing that compares group means by analyzing the
variance within and between groups.

TYPES OF ANOVA:

1. One-Way Analysis of Variance (ANOVA):

Concept:
One-Way ANOVA is used when comparing means across three or more groups that are
independent of each other. It helps determine whether there are any statistically significant
differences among the means of the groups.
Application:
Examples in business include comparing the average sales performance of different regions,
analyzing the effectiveness of various marketing strategies, or evaluating employee
performance across multiple departments.

Key Steps:
a. Formulate hypotheses.
b. Collect data from multiple groups.
c. Calculate the F-statistic.
d. Compare the calculated F-statistic with the critical value.
e. Draw conclusions based on the comparison
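
The key steps above can be sketched in R with aov(). The choice of the built-in mtcars data, with mpg compared across the three cylinder groups, is an assumption made for illustration; R reports a p-value, which replaces the manual comparison with a critical value.

```r
# cyl is numeric in mtcars, so convert it to a factor (the grouping variable)
cars <- mtcars
cars$cyl <- factor(cars$cyl)

# One-way ANOVA: does mean mpg differ between the cylinder groups?
one_way <- aov(mpg ~ cyl, data = cars)
summary(one_way)

# Extract the F statistic and p-value from the ANOVA table
tab <- summary(one_way)[[1]]
f_value <- tab[["F value"]][1]
p_value <- tab[["Pr(>F)"]][1]
f_value
p_value
```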

2. Two-Way Analysis of Variance (ANOVA):


Concept: Two-Way ANOVA extends the analysis to two independent variables, allowing the
examination of the interaction effect between these variables on the dependent variable. It
assesses whether there are any significant differences due to the main effects of each variable
or their interaction.
Application:
In a business context, Two-Way ANOVA might be used to study the impact of both pricing
strategy and advertising channels on sales.
Key Steps:
a. Formulate hypotheses for main effects and interaction effects.
b. Collect data, considering two independent variables.
c. Calculate the F-statistic for main effects and interaction effects.
d. Compare the calculated F-statistics with critical values.
e. Interpret results and draw conclusions.
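
A two-way ANOVA with an interaction term can be sketched in the same way. The built-in ToothGrowth dataset (tooth length by supplement type and dose) is used here purely as an illustrative assumption.

```r
tg <- ToothGrowth
tg$dose <- factor(tg$dose)   # treat dose as a categorical variable

# supp * dose expands to supp + dose + supp:dose (two main effects plus interaction)
two_way <- aov(len ~ supp * dose, data = tg)
summary(two_way)

# The ANOVA table has one row per effect plus one for the residuals
two_way_table <- summary(two_way)[[1]]
nrow(two_way_table)          # 4
```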

Differences between One-Way and Two-Way ANOVA:


One-Way ANOVA considers one independent variable, while Two-Way ANOVA involves two.
Two-Way ANOVA explores the interaction effect between the independent variables. Two-Way
ANOVA is more complex but provides a deeper understanding of the relationships.

Regression Analysis
Regression analysis is a statistical method used in various fields to examine the
relationship between one dependent variable and one or more independent
variables.
The goal is to understand how changes in the independent variables correlate
with changes in the dependent variable.
The regression model generates an equation that represents the relationship
between the variables.
The equation takes the form of Y=a+bX+e, where Y is the dependent variable, X is
the independent variable, a is the intercept, b is the slope, and e is the error term
accounting for unobserved factors affecting Y.
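
To connect the equation Y = a + bX + e to R, the sketch below fits a simple regression with a single predictor. Using mpg as Y and wt as X from mtcars is an illustrative choice.

```r
# Simple linear regression: mpg (Y) explained by weight (X)
fit <- lm(mpg ~ wt, data = mtcars)

a <- coef(fit)[["(Intercept)"]]   # intercept
b <- coef(fit)[["wt"]]            # slope (negative: heavier cars have lower mpg)
e <- resid(fit)                   # error term for each observation

# Every observation satisfies Y = a + bX + e
reconstructed <- unname(a + b * mtcars$wt + e)
all.equal(mtcars$mpg, reconstructed)   # TRUE

# Predicted mpg for a hypothetical car with wt = 3 (3000 lbs)
pred <- predict(fit, newdata = data.frame(wt = 3))
pred
```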

Regression Analysis in R :-

# Load the dataset


data(mtcars)
str(mtcars)
# Perform linear regression
lm_model <- lm(mpg ~ wt + hp + disp, data = mtcars)

# Coefficients
coefficients <- coef(lm_model)
print(coefficients)

# Intercept
intercept <- coefficients[1]
print(intercept)

# Residuals
residuals <- resid(lm_model)
print(residuals)

# R-squared
r_squared <- summary(lm_model)$r.squared
print(r_squared)

# p-values
p_values <- summary(lm_model)$coefficients[, "Pr(>|t|)"]
print(p_values)

# F-statistic
f_statistic <- summary(lm_model)$fstatistic
print(f_statistic)

# Standard errors
standard_errors <- summary(lm_model)$coefficients[, "Std. Error"]
print(standard_errors)

Output:-

Logistic Regression
Logistic regression is a statistical technique used for predicting binary outcomes.
It is a popular method for classification problems where the dependent variable is
categorical and can take only two values, such as "yes" or "no," "success" or
"failure," or "spam" or "not spam."

Logistic regression analysis is generally used to find the probability of an event. It
is used when the dependent variable is dichotomous or binary.
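
A minimal sketch of logistic regression in R uses glm() with family = binomial. Here the binary variable am from mtcars (0 = automatic, 1 = manual) is modeled from wt; this choice of variables and the 0.5 cut-off are assumptions made for illustration.

```r
# Binary outcome: am is 0 (automatic) or 1 (manual)
logit_model <- glm(am ~ wt, data = mtcars, family = binomial)
summary(logit_model)

# Predicted probability of a manual transmission for a hypothetical car with wt = 3
prob <- predict(logit_model, newdata = data.frame(wt = 3), type = "response")
prob

# Convert the probability into a class label using a 0.5 cut-off
predicted_class <- ifelse(prob > 0.5, 1, 0)
predicted_class
```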

Neural Network
What is a Neural Network?
A neural network is modeled loosely on the human nervous system, which is made up of
interconnected neurons. In other words, a neural network is made up of interconnected information
processing units.

A neural network helps us to extract meaningful information and detect hidden patterns from complex data
sets.

This method is developed to solve problems that are easy for humans and difficult for machines.

For example, identifying pictures like dogs and cats. These problems are often referred to as pattern
recognition.
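
To make the idea of an "information processing unit" concrete, the sketch below computes the output of a single artificial neuron in base R. The inputs, weights, bias and the sigmoid activation are all hypothetical values chosen for illustration; this is not a trained network.

```r
# Sigmoid activation squashes any number into the range (0, 1)
sigmoid <- function(z) 1 / (1 + exp(-z))

# Hypothetical inputs, weights and bias for one neuron
inputs  <- c(0.5, -1.0, 2.0)
weights <- c(0.4, 0.3, 0.1)
bias    <- 0.1

# Weighted sum of the inputs plus the bias, followed by the activation
z <- sum(inputs * weights) + bias
output <- sigmoid(z)

z        # 0.2
output   # about 0.55
```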

Neural Network Layers:
